Post Selection Reduction Helper
Helper for applying post-selection reduction.
This module provides the PostSelectionReductionHelper class that applies statistical reduction to feature selections. The reduction is applied ONLY to the specific selection where it’s defined, not to all selections of that feature type.
- class mdxplain.feature_selection.helper.post_selection_reduction_helper.PostSelectionReductionHelper
Helper for applying post-selection reduction.
Important: Reduction is applied ONLY to the specific selection where it’s defined, not to all selections of that feature type.
This helper applies statistical reduction to features after initial selection. It uses the appropriate calculator for each feature type to compute reduction metrics and filters features based on threshold criteria.
The reduction process:
1. Get feature data for each trajectory
2. Calculate which columns to remove using the calculator
3. Apply the reduction mode (intersection/union/pooled or per_trajectory)
4. Update trajectory_results with the reduced indices
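As an illustration of step 3, the sketch below shows one plausible way the reduction modes could combine per-trajectory results. It is not mdxplain's implementation: the function names `columns_to_keep` and `combine_kept_columns`, the variance-threshold metric, and the exact semantics assigned to each mode (in particular "pooled" meaning the metric is computed over all frames concatenated) are assumptions made for this example.

```python
import numpy as np


def columns_to_keep(data, threshold):
    """Hypothetical metric: keep columns whose variance exceeds threshold."""
    return set(np.flatnonzero(data.var(axis=0) > threshold).tolist())


def combine_kept_columns(per_traj_data, threshold, mode):
    """Return {trajectory index: sorted kept column indices} (assumed semantics)."""
    kept = {t: columns_to_keep(d, threshold) for t, d in per_traj_data.items()}
    if mode == "per_trajectory":
        # Each trajectory keeps its own column set.
        return {t: sorted(cols) for t, cols in kept.items()}
    if mode == "intersection":
        # Keep a column only if every trajectory keeps it.
        common = set.intersection(*kept.values())
    elif mode == "union":
        # Keep a column if at least one trajectory keeps it.
        common = set.union(*kept.values())
    elif mode == "pooled":
        # Compute the metric once over all frames stacked together.
        pooled = np.concatenate(list(per_traj_data.values()), axis=0)
        common = columns_to_keep(pooled, threshold)
    else:
        raise ValueError(f"unknown reduction mode: {mode!r}")
    # The shared column set is applied to every trajectory.
    return {t: sorted(common) for t in per_traj_data}


# Two toy trajectories with 2 frames and 3 feature columns each.
toy_data = {
    0: np.array([[0.0, 1.0, 0.0], [2.0, 1.0, 4.0]]),
    1: np.array([[5.0, 1.0, 0.0], [5.0, 1.0, 4.0]]),
}
for mode in ("intersection", "union", "pooled", "per_trajectory"):
    print(mode, combine_kept_columns(toy_data, 0.5, mode))
```

In this toy data, column 0 varies only in trajectory 0, so the modes give different answers for it. That difference is why the choice of mode matters when trajectories behave differently.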
- static apply_reduction(pipeline_data: PipelineData, feature_key: str, selection_dict: Dict[str, Any], trajectory_results: Dict[int, Dict], selected_traj_indices: List[int], use_memmap: bool = False, chunk_size: int = 2000, cache_dir: str = './cache') → Dict[int, Dict]
Apply reduction to a specific selection.
Process:
1. Get feature data for each trajectory
2. Calculate which columns to remove using the calculator
3. Apply the reduction mode (intersection/union/pooled or per_trajectory)
4. Update trajectory_results
Parameters
- pipeline_data : PipelineData
Pipeline data container
- feature_key : str
Feature type key (e.g., "distances", "contacts")
- selection_dict : dict
Selection configuration, including the reduction config
- trajectory_results : dict
Current selection results per trajectory
- selected_traj_indices : list
Trajectory indices for this selection
- use_memmap : bool, default=False
Whether to use memory-mapped files for large data processing
- chunk_size : int, default=2000
Size of data chunks for memory-efficient processing
- cache_dir : str, default="./cache"
Directory for temporary cache files
Returns
- dict
Updated trajectory_results with reduced indices
Examples
>>> results = PostSelectionReductionHelper.apply_reduction(
...     pipeline_data, "distances", selection_dict, results, [0, 1])
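The use_memmap, chunk_size, and cache_dir parameters point to chunked processing of memory-mapped feature data. The following is a minimal sketch of that general technique using NumPy only. It does not reproduce mdxplain's internals: the function `chunked_column_variance`, the file name `features.dat`, and the use of variance as the metric are all hypothetical choices for illustration.

```python
import os
import tempfile

import numpy as np


def chunked_column_variance(data, chunk_size=2000):
    """Per-column variance, reading at most chunk_size rows at a time."""
    n_frames, n_cols = data.shape
    total = np.zeros(n_cols)
    total_sq = np.zeros(n_cols)
    for start in range(0, n_frames, chunk_size):
        # Only this slice is loaded into memory as a regular array.
        chunk = np.asarray(data[start:start + chunk_size], dtype=np.float64)
        total += chunk.sum(axis=0)
        total_sq += (chunk ** 2).sum(axis=0)
    mean = total / n_frames
    # Population variance via E[x^2] - E[x]^2 (simple, but less stable
    # numerically than a streaming Welford update).
    return total_sq / n_frames - mean ** 2


# A temporary directory stands in for cache_dir in this sketch.
with tempfile.TemporaryDirectory() as cache_dir:
    path = os.path.join(cache_dir, "features.dat")  # hypothetical file name
    mm = np.memmap(path, dtype=np.float64, mode="w+", shape=(5000, 3))
    mm[:] = np.random.default_rng(0).normal(size=(5000, 3))
    mm.flush()

    variances = chunked_column_variance(mm, chunk_size=2000)
    reference = np.asarray(mm).var(axis=0)  # in-memory result for comparison
    del mm  # release the mapping before the directory is removed

print(np.allclose(variances, reference))
```

With chunk_size=2000 and 5000 frames, the loop reads three slices (2000, 2000, and 1000 rows), so peak memory is bounded by the chunk size rather than the trajectory length.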