Post Selection Reduction Helper


Helper for applying post-selection reduction.

This module provides the PostSelectionReductionHelper class that applies statistical reduction to feature selections. The reduction is applied ONLY to the specific selection where it’s defined, not to all selections of that feature type.

class mdxplain.feature_selection.helper.post_selection_reduction_helper.PostSelectionReductionHelper

Helper for applying post-selection reduction.

Important: Reduction is applied ONLY to the specific selection where it’s defined, not to all selections of that feature type.

This helper applies statistical reduction to features after initial selection. It uses the appropriate calculator for each feature type to compute reduction metrics and filters features based on threshold criteria.
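As a rough illustration of threshold-based filtering, the sketch below flags columns whose variance falls below a minimum value. The function name columns_to_remove, the choice of variance as the metric, and the threshold_min parameter are assumptions made for this example. The calculators and metrics that mdxplain actually uses are not described on this page.

```python
from statistics import pvariance
from typing import List, Sequence


def columns_to_remove(data: Sequence[Sequence[float]], threshold_min: float) -> List[int]:
    """Return indices of columns whose population variance is below threshold_min.

    Hypothetical helper for illustration only; not part of mdxplain.
    """
    n_cols = len(data[0])
    remove = []
    for col in range(n_cols):
        # Collect this feature's value in every frame (row).
        values = [row[col] for row in data]
        if pvariance(values) < threshold_min:
            remove.append(col)
    return remove


# Three frames, three features: column 0 is constant, column 2 barely changes.
frames = [
    [1.0, 5.0, 0.2],
    [1.0, 7.0, 0.2],
    [1.0, 9.0, 0.3],
]
removed = columns_to_remove(frames, threshold_min=0.01)
```

Here only column 1 varies enough to survive, so columns 0 and 2 are flagged for removal.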

The reduction process:

1. Get feature data for each trajectory
2. Calculate which columns to remove using calculator
3. Apply reduction mode (intersection/union/pooled or per_trajectory)
4. Update trajectory_results with reduced indices
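One plausible reading of the reduction modes in step 3 is sketched below: each trajectory yields its own set of columns to keep, and the mode decides how those sets are combined. The function combine_kept_columns and the interpretation of "intersection" and "union" as set operations across trajectories are assumptions, not the library's documented behavior. The "pooled" mode is left out because its semantics are not specified here.

```python
from typing import Dict, List, Set


def combine_kept_columns(kept: Dict[int, Set[int]], mode: str) -> Dict[int, List[int]]:
    """Combine per-trajectory kept-column sets according to a reduction mode.

    Hypothetical helper for illustration only; not part of mdxplain.
    """
    if not kept:
        return {}
    if mode == "per_trajectory":
        # Every trajectory keeps its own reduced column set.
        return {traj: sorted(cols) for traj, cols in kept.items()}
    if mode == "intersection":
        # Keep only columns that survived in every trajectory.
        common = set.intersection(*kept.values())
    elif mode == "union":
        # Keep columns that survived in at least one trajectory.
        common = set.union(*kept.values())
    else:
        raise ValueError(f"Unsupported mode in this sketch: {mode!r}")
    # Shared modes assign the same column set to all trajectories.
    return {traj: sorted(common) for traj in kept}


kept = {0: {0, 1, 2}, 1: {1, 2, 3}}
by_intersection = combine_kept_columns(kept, "intersection")
by_union = combine_kept_columns(kept, "union")
by_trajectory = combine_kept_columns(kept, "per_trajectory")
```

In this sketch the shared modes leave every trajectory with an identical feature set, while per_trajectory lets the sets differ.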

static apply_reduction(pipeline_data: PipelineData, feature_key: str, selection_dict: Dict[str, Any], trajectory_results: Dict[int, Dict], selected_traj_indices: List[int], use_memmap: bool = False, chunk_size: int = 2000, cache_dir: str = './cache') → Dict[int, Dict]

Apply reduction to a specific selection.

Process:

1. Get feature data for each trajectory
2. Calculate which columns to remove using calculator
3. Apply reduction mode (intersection/union/pooled or per_trajectory)
4. Update trajectory_results

Parameters

pipeline_data (PipelineData): Pipeline data container
feature_key (str): Feature type key (e.g., "distances", "contacts")
selection_dict (dict): Selection configuration with reduction config
trajectory_results (dict): Current selection results per trajectory
selected_traj_indices (list): Trajectory indices for this selection
use_memmap (bool, default=False): Whether to use memory-mapped files for large data processing
chunk_size (int, default=2000): Size of data chunks for memory-efficient processing
cache_dir (str, default="./cache"): Directory for temporary cache files

Returns

dict: Updated trajectory_results with reduced indices

Examples

>>> results = PostSelectionReductionHelper.apply_reduction(
...     pipeline_data, "distances", selection_dict, results, [0, 1])