Selection MemMap Helper
Memory-mapping operations helper for the feature selection system.
Provides memory-mapped matrix operations for efficient handling of large datasets in the feature selection system.
- class mdxplain.pipeline.helper.selection_memmap_helper.SelectionMemmapHelper
Helper class for memory-mapping operations in the feature selection system.
Provides static methods for creating memory-mapped selections, horizontally stacking matrices while preserving memmap nature, and creating frame selections efficiently.
- static create_memmap_selection(data: ndarray, indices: List[int], name: str, data_type: str, feature_type: str, cache_dir: str, chunk_size: int) → List[ndarray]
Create memory-efficient selection using chunk-wise processing.
This method avoids loading entire columns into RAM by processing data in chunks and writing directly to a memmap output file.
Parameters
- data : numpy.ndarray or memmap
Source data to select from
- indices : list
Column indices to select
- name : str
Selection name for cache file naming
- data_type : str
Type of data (for cache naming)
- feature_type : str
Feature type name (for cache naming)
- cache_dir : str
Cache directory for memmap files
- chunk_size : int
Chunk size for processing
Returns
- list
List containing the memmap-selected data matrix
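The chunk-wise column selection described above can be illustrated with plain NumPy. This is a minimal sketch of the general technique, not the mdxplain implementation: the function name `select_columns_chunked` and the explicit `out_path` argument are hypothetical, it treats `chunk_size` as a number of rows (an assumption), and it returns the array directly rather than wrapped in a list.

```python
import os
import tempfile

import numpy as np


def select_columns_chunked(data, indices, out_path, chunk_size):
    """Copy the chosen columns of `data` into a new memmap, chunk_size rows at a time.

    Hypothetical helper for illustration; not part of mdxplain.
    """
    n_rows = data.shape[0]
    out = np.memmap(out_path, dtype=data.dtype, mode="w+",
                    shape=(n_rows, len(indices)))
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        # Only the rows start..stop of the selected columns are copied into RAM.
        out[start:stop, :] = data[start:stop][:, indices]
    out.flush()  # make sure the selection is written to the backing file
    return out


# Small in-memory array standing in for a large memmap source.
cache_dir = tempfile.mkdtemp()
source = np.arange(40, dtype=np.float32).reshape(10, 4)
selected = select_columns_chunked(
    source, [0, 2], os.path.join(cache_dir, "demo_selection.dat"), chunk_size=3
)
```

Peak memory in this sketch scales with `chunk_size * len(indices)` rather than with the full number of rows, which is the property the method description relies on.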
- static memmap_hstack(matrices: list, name: str, cache_dir: str, chunk_size: int) → ndarray
Horizontally stack matrices while preserving memmap nature.
Parameters
- matrices : list
List of matrices to stack (all are memmap)
- name : str
Name of the selection for cache file naming
- cache_dir : str
Cache directory for memmap files
- chunk_size : int
Chunk size for processing
Returns
- numpy.ndarray
Stacked matrix stored as memmap
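A horizontal stack that keeps its result on disk can be sketched as follows. The function `hstack_chunked` and its `out_path` argument are hypothetical names chosen for illustration, and the sketch assumes all inputs share the same number of rows; how mdxplain derives the cache file name from `name` and `cache_dir` is not shown here.

```python
import os
import tempfile

import numpy as np


def hstack_chunked(matrices, out_path, chunk_size):
    """Write the matrices side by side into one memmap, chunk_size rows at a time.

    Hypothetical helper for illustration; not part of mdxplain.
    """
    n_rows = matrices[0].shape[0]
    widths = [m.shape[1] for m in matrices]
    # Use a dtype that can hold the values of every input matrix.
    dtype = np.result_type(*[m.dtype for m in matrices])
    out = np.memmap(out_path, dtype=dtype, mode="w+",
                    shape=(n_rows, sum(widths)))
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        col = 0
        for matrix, width in zip(matrices, widths):
            # Copy one row block of one input into its column range.
            out[start:stop, col:col + width] = matrix[start:stop]
            col += width
    out.flush()
    return out


cache_dir = tempfile.mkdtemp()
left = np.ones((5, 2), dtype=np.float32)
right = np.arange(15, dtype=np.float32).reshape(5, 3)
stacked = hstack_chunked(
    [left, right], os.path.join(cache_dir, "demo_hstack.dat"), chunk_size=2
)
```

Unlike `numpy.hstack`, which builds the combined array in RAM, this approach only ever holds one row block of one input in memory.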
- static create_memmap_frame_selection(data: ndarray, frame_indices: List[int], name: str, cache_dir: str, chunk_size: int) → ndarray
Create memory-efficient frame selection using chunk-wise processing.
This method avoids loading entire rows into RAM by processing data in chunks and writing directly to a memmap output file. Handles frame selection (row-wise) efficiently for large datasets.
Parameters
- data : numpy.ndarray or memmap
Source data to select from
- frame_indices : list
Row indices to select
- name : str
Selection name for cache file naming
- cache_dir : str
Cache directory for memmap files
- chunk_size : int
Chunk size for processing
Returns
- numpy.ndarray
Memmap array with selected frames
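Row-wise (frame) selection follows the same pattern, except that the loop walks over the list of frame indices instead of over contiguous rows. The sketch below is an assumption about how such a selection could be written with NumPy; `select_frames_chunked` and `out_path` are hypothetical names and the code is not taken from mdxplain.

```python
import os
import tempfile

import numpy as np


def select_frames_chunked(data, frame_indices, out_path, chunk_size):
    """Copy the chosen rows of `data` into a new memmap, chunk_size frames at a time.

    Hypothetical helper for illustration; not part of mdxplain.
    """
    frame_indices = np.asarray(frame_indices, dtype=np.intp)
    n_frames = len(frame_indices)
    out = np.memmap(out_path, dtype=data.dtype, mode="w+",
                    shape=(n_frames,) + data.shape[1:])
    for start in range(0, n_frames, chunk_size):
        stop = min(start + chunk_size, n_frames)
        # Fancy indexing loads only this block of requested frames into RAM.
        out[start:stop] = data[frame_indices[start:stop]]
    out.flush()
    return out


cache_dir = tempfile.mkdtemp()
trajectory = np.arange(24, dtype=np.float64).reshape(8, 3)
frames = select_frames_chunked(
    trajectory, [6, 1, 4, 4], os.path.join(cache_dir, "demo_frames.dat"), chunk_size=2
)
```

The output preserves the order of `frame_indices`, including repeated frames, so it matches what `data[frame_indices]` would return for an in-memory array.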