Selection MemMap Helper


Memory-mapping operations helper for the feature selection system.

Provides memory-mapped matrix operations for efficient handling of large datasets in the feature selection system.

class mdxplain.pipeline.helper.selection_memmap_helper.SelectionMemmapHelper

Helper class for memory-mapping operations in the feature selection system.

Provides static methods for creating memory-mapped selections, horizontally stacking matrices while preserving memmap nature, and creating frame selections efficiently.

static create_memmap_selection(data: ndarray, indices: List[int], name: str, data_type: str, feature_type: str, cache_dir: str, chunk_size: int) -> List[ndarray]

Create memory-efficient selection using chunk-wise processing.

This method avoids loading entire columns into RAM by processing data in chunks and writing directly to a memmap output file.

Parameters

data : numpy.ndarray or memmap

Source data to select from

indices : list

Column indices to select

name : str

Selection name for cache file naming

data_type : str

Type of data (for cache naming)

feature_type : str

Feature type name (for cache naming)

cache_dir : str

Cache directory for memmap files

chunk_size : int

Chunk size for processing

Returns

list

List containing the memmap-selected data matrix
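
The chunk-wise column selection described above can be sketched with plain NumPy. This is a minimal illustration, not the mdxplain implementation: the function name `select_columns_to_memmap`, the `out_path` argument, and the file name `example_selection.dat` are hypothetical, and the way the real method derives the cache file name from `name`, `data_type`, and `feature_type` is not shown here. The sketch also assumes that chunking runs over rows.

```python
import os
import tempfile

import numpy as np


def select_columns_to_memmap(data, indices, out_path, chunk_size):
    """Copy the selected columns of `data` into a new memmap, chunk by chunk.

    Hypothetical helper for illustration only; not the mdxplain code.
    """
    n_rows = data.shape[0]
    # Output file is created on disk with the final shape up front.
    out = np.memmap(out_path, dtype=data.dtype, mode="w+",
                    shape=(n_rows, len(indices)))
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        # Only a block of at most chunk_size rows is materialized in RAM.
        out[start:stop] = data[start:stop][:, indices]
    out.flush()
    return out


# Small demonstration with an in-memory source array.
cache_dir = tempfile.mkdtemp()
data = np.arange(20, dtype=np.float32).reshape(5, 4)
selected = select_columns_to_memmap(
    data, [0, 2], os.path.join(cache_dir, "example_selection.dat"), chunk_size=2
)
```

Slicing rows first and then applying the column indices keeps the temporary copy limited to one chunk, which is the property the method description relies on.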

static memmap_hstack(matrices: list, name: str, cache_dir: str, chunk_size: int) -> ndarray

Horizontally stack matrices while preserving memmap nature.

Parameters

matrices : list

List of matrices to stack (all are memmap)

name : str

Name of the selection for cache file naming

cache_dir : str

Cache directory for memmap files

chunk_size : int

Chunk size for processing

Returns

numpy.ndarray

Stacked matrix stored as memmap
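
One way to stack matrices horizontally without building the full result in RAM is to preallocate a memmap of the combined width and copy each input into its column range in row chunks. The sketch below assumes that approach; `hstack_to_memmap`, `out_path`, and the file name are hypothetical and do not describe how mdxplain names or lays out its cache files.

```python
import os
import tempfile

import numpy as np


def hstack_to_memmap(matrices, out_path, chunk_size):
    """Stack 2D matrices side by side into a memmap, copying in row chunks.

    Hypothetical helper for illustration only; not the mdxplain code.
    """
    n_rows = matrices[0].shape[0]
    if any(m.shape[0] != n_rows for m in matrices):
        raise ValueError("all matrices must have the same number of rows")
    total_cols = sum(m.shape[1] for m in matrices)
    # Pick a dtype that can hold every input without loss.
    dtype = np.result_type(*(m.dtype for m in matrices))
    out = np.memmap(out_path, dtype=dtype, mode="w+",
                    shape=(n_rows, total_cols))
    col = 0
    for m in matrices:
        width = m.shape[1]
        for start in range(0, n_rows, chunk_size):
            stop = min(start + chunk_size, n_rows)
            # Copy one row chunk of this matrix into its column range.
            out[start:stop, col:col + width] = m[start:stop]
        col += width
    out.flush()
    return out


# Small demonstration with two in-memory arrays.
cache_dir = tempfile.mkdtemp()
a = np.ones((4, 2), dtype=np.float32)
b = np.arange(12, dtype=np.float32).reshape(4, 3)
stacked = hstack_to_memmap(
    [a, b], os.path.join(cache_dir, "example_hstack.dat"), chunk_size=3
)
```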

static create_memmap_frame_selection(data: ndarray, frame_indices: List[int], name: str, cache_dir: str, chunk_size: int) -> ndarray

Create memory-efficient frame selection using chunk-wise processing.

This method avoids loading entire rows into RAM by processing data in chunks and writing directly to a memmap output file. Handles frame selection (row-wise) efficiently for large datasets.

Parameters

data : numpy.ndarray or memmap

Source data to select from

frame_indices : list

Row indices to select

name : str

Selection name for cache file naming

cache_dir : str

Cache directory for memmap files

chunk_size : int

Chunk size for processing

Chunk size for processing

Returns

numpy.ndarray

Memmap array with selected frames
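
Row-wise (frame) selection can follow the same pattern, with the chunking applied to the list of requested frame indices rather than to the source rows. The following is a sketch under that assumption; `select_frames_to_memmap`, `out_path`, and the file name are hypothetical and not part of the documented API.

```python
import os
import tempfile

import numpy as np


def select_frames_to_memmap(data, frame_indices, out_path, chunk_size):
    """Copy the selected rows of `data` into a new memmap, chunk by chunk.

    Hypothetical helper for illustration only; not the mdxplain code.
    """
    frame_indices = np.asarray(frame_indices, dtype=np.intp)
    out = np.memmap(out_path, dtype=data.dtype, mode="w+",
                    shape=(len(frame_indices),) + data.shape[1:])
    for start in range(0, len(frame_indices), chunk_size):
        stop = start + chunk_size
        # Fancy indexing copies at most chunk_size rows into RAM at a time;
        # the slices clip automatically at the end of the index list.
        out[start:stop] = data[frame_indices[start:stop]]
    out.flush()
    return out


# Small demonstration: pick frames 5, 0, and 3 in that order.
cache_dir = tempfile.mkdtemp()
frames_source = np.arange(24, dtype=np.float64).reshape(6, 4)
frames = select_frames_to_memmap(
    frames_source, [5, 0, 3],
    os.path.join(cache_dir, "example_frames.dat"), chunk_size=2
)
```

The output preserves the order of `frame_indices`, so callers can reorder or repeat frames as needed.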