Post Selection Reduction Helper

Helper for applying post-selection reduction.

This module provides the PostSelectionReductionHelper class that applies statistical reduction to feature selections. The reduction is applied ONLY to the specific selection where it’s defined, not to all selections of that feature type.

class mdxplain.feature_selection.helper.post_selection_reduction_helper.PostSelectionReductionHelper

Important: Reduction is applied ONLY to the specific selection where it’s defined, not to all selections of that feature type.

This helper applies statistical reduction to features after initial selection. It uses the appropriate calculator for each feature type to compute reduction metrics and filters features based on threshold criteria.
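As an illustration of threshold-based filtering, the sketch below marks columns for removal when a per-column metric falls below a threshold. It is not mdxplain's implementation: the function name `columns_to_remove`, the choice of population variance as the metric, and the "below threshold" rule are all assumptions made for this example. The actual calculators may use different metrics and criteria.

```python
from statistics import pvariance


def columns_to_remove(rows, threshold):
    """Return indices of columns whose population variance is below threshold.

    Hypothetical helper for illustration only; `rows` is a list of frames,
    each frame a list with one value per feature column.
    """
    n_cols = len(rows[0])
    remove = []
    for col in range(n_cols):
        values = [row[col] for row in rows]
        # Assumed criterion: a near-constant column carries little information.
        if pvariance(values) < threshold:
            remove.append(col)
    return remove


# Three frames, three feature columns; column 0 never changes.
data = [
    [1.0, 5.0, 0.0],
    [1.0, 7.0, 4.0],
    [1.0, 6.0, 8.0],
]
low_variance = columns_to_remove(data, threshold=0.5)
```

Here only column 0 is flagged, because its variance is zero while the other two columns vary by more than the threshold.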

The reduction process:

1. Get feature data for each trajectory
2. Calculate which columns to remove using calculator
3. Apply reduction mode (intersection/union/pooled or per_trajectory)
4. Update trajectory_results with reduced indices

static apply_reduction(pipeline_data: PipelineData, feature_key: str, selection_dict: Dict[str, Any], trajectory_results: Dict[int, Dict], selected_traj_indices: List[int], use_memmap: bool = False, chunk_size: int = 2000, cache_dir: str = './cache') -> Dict[int, Dict]

Apply reduction to a specific selection.

Process:

1. Get feature data for each trajectory
2. Calculate which columns to remove using calculator
3. Apply reduction mode (intersection/union/pooled or per_trajectory)
4. Update trajectory_results
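The reduction-mode step can be pictured with a small sketch that combines per-trajectory results. It rests on assumptions not confirmed by this page: that "intersection" keeps only columns that survive in every trajectory, that "union" keeps columns that survive in at least one, and that "per_trajectory" leaves each trajectory's result untouched. The function `combine_kept` and its input layout are hypothetical, and the "pooled" mode is left out because it would presumably require computing the metric on data pooled across trajectories.

```python
def combine_kept(per_traj_kept, mode):
    """Combine kept column indices across trajectories.

    Hypothetical helper for illustration only.
    per_traj_kept maps trajectory index -> set of kept column indices.
    Returns trajectory index -> sorted list of kept column indices.
    """
    if not per_traj_kept:
        return {}
    if mode == "per_trajectory":
        # Each trajectory keeps its own result.
        return {traj: sorted(cols) for traj, cols in per_traj_kept.items()}

    kept_sets = list(per_traj_kept.values())
    if mode == "intersection":
        # Assumed meaning: a column must survive in every trajectory.
        common = set.intersection(*kept_sets)
    elif mode == "union":
        # Assumed meaning: a column must survive in at least one trajectory.
        common = set.union(*kept_sets)
    else:
        raise ValueError(f"unsupported mode in this sketch: {mode!r}")

    # Every trajectory receives the same combined column set.
    return {traj: sorted(common) for traj in per_traj_kept}


kept = {0: {0, 1, 2}, 1: {1, 2, 3}}
by_intersection = combine_kept(kept, "intersection")
by_union = combine_kept(kept, "union")
by_trajectory = combine_kept(kept, "per_trajectory")
```

Under these assumptions, the combined modes give every trajectory the same column set, which keeps feature matrices aligned across trajectories, whereas per_trajectory can yield a different set for each one.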

Parameters

pipeline_data : PipelineData

Pipeline data container

feature_key : str

Feature type key (e.g., "distances", "contacts")

selection_dict : dict

Selection configuration with reduction config

trajectory_results : dict

Current selection results per trajectory

selected_traj_indices : list

Trajectory indices for this selection

use_memmap : bool, default=False

Whether to use memory-mapped files for large data processing

chunk_size : int, default=2000

Size of data chunks for memory-efficient processing

cache_dir : str, default="./cache"

Directory for temporary cache files

Returns

dict

Updated trajectory_results with reduced indices

Examples

>>> results = PostSelectionReductionHelper.apply_reduction(
...     pipeline_data, "distances", selection_dict, results, [0, 1])