Feature Selection Manager
Feature selector manager for creating custom feature data matrices.
This module provides the FeatureSelectorManager class that manages multiple named feature selector configurations and applies them to create custom feature data matrices from computed features.
- class mdxplain.feature_selection.manager.feature_selector_manager.FeatureSelectorManager(use_memmap: bool = False, chunk_size: int = 2000, cache_dir: str = './cache')
Manager for creating and applying feature selector configurations.
This manager creates, stores, and applies named feature selector configurations to generate custom feature data matrices. Each selector can combine multiple feature types with different selection criteria and data preferences.
The manager follows the same pattern as other managers (ClusterManager, DecompositionManager) by storing named entity instances and providing methods to create, configure, and apply them.
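The named-entity pattern described above can be sketched as follows. This is a minimal illustration, not mdxplain's implementation: `SelectorRegistry` and its plain-dict storage are hypothetical stand-ins for the manager and PipelineData. Only the behavior (unique names, ValueError on duplicate or unknown names) is taken from the method documentation on this page.

```python
class SelectorRegistry:
    """Hypothetical stand-in for a manager that stores named configurations."""

    def __init__(self):
        # name -> list of selection dicts (assumed layout, for illustration only)
        self._selectors = {}

    def create(self, name):
        # Names must be unique, mirroring the documented ValueError.
        if name in self._selectors:
            raise ValueError(f"Selector '{name}' already exists")
        self._selectors[name] = []

    def add_selection(self, name, feature_type, selection="all"):
        # Selections can only be added to an existing configuration.
        if name not in self._selectors:
            raise ValueError(f"Selector '{name}' does not exist")
        self._selectors[name].append(
            {"feature_type": feature_type, "selection": selection}
        )

    def list_selectors(self):
        return list(self._selectors)

    def remove_selector(self, name):
        if name not in self._selectors:
            raise ValueError(f"Selector '{name}' does not exist")
        del self._selectors[name]


registry = SelectorRegistry()
registry.create("analysis")
registry.add_selection("analysis", "distances", "res ALA")
print(registry.list_selectors())  # ['analysis']
```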
- __init__(use_memmap: bool = False, chunk_size: int = 2000, cache_dir: str = './cache') -> None
Initialize feature selector manager.
Parameters
- use_memmap : bool, default=False
Whether to use memory mapping for large datasets
- chunk_size : int, default=2000
Chunk size for processing large datasets
- cache_dir : str, default="./cache"
Directory for caching temporary files
Returns
- None
Initializes FeatureSelectorManager with configuration parameters
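To illustrate what a chunk size means in practice, the sketch below splits a frame range into consecutive chunks of at most `chunk_size` frames. `iter_chunks` is a hypothetical helper, not part of mdxplain; how the library itself chunks and memory-maps data is not specified on this page.

```python
def iter_chunks(n_frames, chunk_size=2000):
    """Yield (start, stop) index pairs that together cover range(n_frames)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_frames, chunk_size):
        # The last chunk may be shorter than chunk_size.
        yield start, min(start + chunk_size, n_frames)


chunks = list(iter_chunks(5000, chunk_size=2000))
print(chunks)  # [(0, 2000), (2000, 4000), (4000, 5000)]
```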
- create(pipeline_data: PipelineData, name: str) -> None
Create a new feature selector configuration.
Creates a new FeatureSelectorData instance with the given name and stores it in the pipeline data for later configuration and use.
Warning
When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.
Pipeline mode:
>>> pipeline = PipelineManager()
>>> pipeline.feature_selector.create("my_analysis")  # NO pipeline_data parameter
Standalone mode:
>>> pipeline_data = PipelineData()
>>> manager = FeatureSelectorManager()
>>> manager.create(pipeline_data, "my_analysis")  # pipeline_data required
Parameters
- pipeline_data : PipelineData
Pipeline data object to store the selector configuration
- name : str
Name identifier for the feature selector configuration
Returns
- None
Creates and stores new FeatureSelectorData in pipeline_data
Raises
- ValueError
If selector with given name already exists
Examples
>>> manager = FeatureSelectorManager()
>>> pipeline_data = PipelineData()
>>> manager.create(pipeline_data, "protein_analysis")
>>> print("protein_analysis" in pipeline_data.selected_feature_data)
True
- property add: FeatureSelectorAddService
Property for IDE-supported feature selector add operations.
Returns a service object that provides feature-type-specific add methods with reduction options. This follows the consistent pattern where the manager is passed explicitly and AutoInjectProxy handles pipeline_data.
Technical Note on AutoInjectProxy:
The AutoInjectProxy mechanism in PipelineManager automatically intercepts method and property accesses on managers. When this property is accessed:
1. AutoInjectProxy detects the property access.
2. It calls this property to get the service class and parameters.
3. It injects pipeline_data as the second parameter.
4. It returns FeatureSelectorAddService(self, pipeline_data) to the user.
This design provides:
- A clean API in which users do not need to pass pipeline_data
- Proper type hints for IDE autocomplete support
- Consistent parameter passing across all manager services
Pattern used: manager explicitly passed, pipeline_data injected by AutoInjectProxy
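The injection idea can be sketched with a small proxy object. This is an illustration under stated assumptions, not the actual AutoInjectProxy: `InjectingProxy` and `DemoManager` are hypothetical names, a plain dict stands in for PipelineData, and only the method-call case is covered (the property-to-service case described above is omitted).

```python
import functools


class InjectingProxy:
    """Hypothetical proxy that wraps a manager and injects pipeline_data."""

    def __init__(self, manager, pipeline_data):
        self._manager = manager
        self._pipeline_data = pipeline_data

    def __getattr__(self, name):
        # Only called for attributes not found on the proxy itself.
        attr = getattr(self._manager, name)
        if callable(attr):
            # Pre-bind pipeline_data as the first positional argument.
            return functools.partial(attr, self._pipeline_data)
        return attr


class DemoManager:
    """Toy manager whose methods take pipeline_data first (standalone style)."""

    def create(self, pipeline_data, name):
        if name in pipeline_data:
            raise ValueError(f"Selector '{name}' already exists")
        pipeline_data[name] = []

    def list_selectors(self, pipeline_data):
        return list(pipeline_data)


pipeline_data = {}  # stand-in for PipelineData
proxy = InjectingProxy(DemoManager(), pipeline_data)
proxy.create("my_analysis")    # no pipeline_data argument needed
print(proxy.list_selectors())  # ['my_analysis']
```

Calling through the proxy matches the "pipeline mode" examples on this page, while calling `DemoManager` methods directly corresponds to "standalone mode".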
Returns
- FeatureSelectorAddService
Service with feature-type properties (distances, contacts, etc.)
Examples
>>> pipeline.feature_selector.add.distances("test", "res ALA")
>>> pipeline.feature_selector.add.distances.with_cv_reduction("test", "res ALA", threshold_min=0.1)
- add_selection(pipeline_data: PipelineData, name: str, feature_type: str | object, selection: str = 'all', use_reduced: bool = False, common_denominator: bool = True, traj_selection: int | str | List[int | str] | 'all' = 'all', require_all_partners: bool = False, reduction: Dict[str, Any] | None = None) -> None
Add a feature selection to a named selector configuration.
Adds selection criteria for a feature type to an existing selector configuration. Multiple selections can be added for the same feature type with different criteria.
Warning
When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.
Pipeline mode:
>>> pipeline = PipelineManager()
>>> pipeline.feature_selector.add_selection("analysis", "distances", "res ALA")  # NO pipeline_data
>>> pipeline.feature_selector.add_selection("analysis", "distances", "all",
...     reduction={"metric": "max", "threshold_min": 10.0})
Standalone mode:
>>> pipeline_data = PipelineData()
>>> manager = FeatureSelectorManager()
>>> manager.add_selection(pipeline_data, "analysis", "distances", "res ALA")  # pipeline_data required
>>> manager.add_selection(pipeline_data, "analysis", "distances", "all",
...     reduction={"metric": "mean", "threshold_min": 7.0})
Parameters
- pipeline_data : PipelineData
Pipeline data object containing selector configurations
- name : str
Name of the existing feature selector configuration
- feature_type : str or object
Feature type to select from (e.g., "distances", "contacts")
- selection : str, default="all"
Selection criteria string (e.g., "res ALA", "resid 123-140", "7x50-8x50", "all")
- use_reduced : bool, default=False
Whether to use reduced data (True) or original data (False)
- common_denominator : bool, default=True
Whether to find common features across trajectories in traj_selection. If True, only features present in ALL selected trajectories are included. If False, union of all features from selected trajectories is used.
- traj_selection : int, str, list, or "all", default="all"
Selection of trajectories to process:
int: trajectory index
str: trajectory name, tag (prefixed with "tag:"), or "all"
list: list of indices/names/tags
"all": all trajectories (default)
- require_all_partners : bool, default=False
For pairwise features, require all partners to be present in selection
- reduction : dict, optional
Post-selection reduction configuration with keys:
metric : str - Reduction metric (e.g., "max", "min", "mean", "std")
threshold_min : float, optional - Minimum threshold
threshold_max : float, optional - Maximum threshold
cross_trajectory : bool, optional - Deprecated legacy flag; use the explicit mode flags instead
cross_trajectory_intersection : bool, optional - Require thresholds in ALL trajectories
cross_trajectory_union : bool, optional - Keep features that pass in ANY trajectory
cross_trajectory_pooled : bool, optional - Pool trajectories, then reduce once
Only one of the three cross_trajectory_* mode flags can be True; the default is per_trajectory.
Additional metric-specific parameters (e.g., transition_threshold)
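The difference between the intersection, union, and pooled modes can be illustrated with a small pure-Python sketch. Everything here is an assumption for illustration: `passes` and `reduce_features` are hypothetical helpers, the nested-dict data layout is invented, and a feature is assumed to "pass" when its metric lies between threshold_min and threshold_max. The default per_trajectory mode is not modeled.

```python
from statistics import mean, pstdev

# Metric names follow the examples in the parameter description.
METRICS = {"max": max, "min": min, "mean": mean, "std": pstdev}


def passes(values, metric, threshold_min=None, threshold_max=None):
    """Return True if the metric of `values` lies within the thresholds."""
    score = METRICS[metric](values)
    if threshold_min is not None and score < threshold_min:
        return False
    if threshold_max is not None and score > threshold_max:
        return False
    return True


def reduce_features(data, mode, metric, **thresholds):
    """data: {trajectory name: {feature name: [value per frame]}}."""
    # For simplicity, assume every trajectory has the same feature names.
    features = sorted(next(iter(data.values())))
    if mode == "pooled":
        # Concatenate the frames of all trajectories, then test once.
        return [f for f in features
                if passes([v for traj in data.values() for v in traj[f]],
                          metric, **thresholds)]
    if mode not in ("intersection", "union"):
        raise ValueError(f"unknown mode: {mode!r}")
    # intersection: pass in ALL trajectories; union: pass in ANY trajectory.
    combine = all if mode == "intersection" else any
    return [f for f in features
            if combine(passes(traj[f], metric, **thresholds)
                       for traj in data.values())]


data = {
    "traj_A": {"d1": [5.0, 12.0], "d2": [3.0, 4.0], "d3": [8.0, 9.0]},
    "traj_B": {"d1": [11.0, 6.0], "d2": [2.0, 15.0], "d3": [7.0, 8.0]},
}
kept_all = reduce_features(data, "intersection", "max", threshold_min=10.0)
kept_any = reduce_features(data, "union", "max", threshold_min=10.0)
kept_pooled = reduce_features(data, "pooled", "max", threshold_min=10.0)
print(kept_all)     # ['d1']
print(kept_any)     # ['d1', 'd2']
print(kept_pooled)  # ['d1', 'd2']
```

In this toy data, d2 exceeds the threshold only in traj_B, so it survives the union and pooled modes but not the intersection mode.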
Returns
- None
Adds selection configuration to the named selector
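The effect of common_denominator on the feature set can be pictured with plain set operations. This is a sketch only: `combine_features` is a hypothetical helper and the feature names are made up; it shows the documented intersection (True) versus union (False) behavior, not how mdxplain computes it.

```python
def combine_features(features_per_traj, common_denominator=True):
    """features_per_traj: {trajectory name: iterable of feature names}."""
    sets = [set(names) for names in features_per_traj.values()]
    if not sets:
        return []
    if common_denominator:
        # Keep only features present in ALL selected trajectories.
        combined = set.intersection(*sets)
    else:
        # Keep every feature present in ANY selected trajectory.
        combined = set.union(*sets)
    return sorted(combined)


features = {
    "traj_0": ["ALA1-GLY5", "ALA1-LEU9", "GLY5-LEU9"],
    "traj_1": ["ALA1-GLY5", "GLY5-LEU9", "GLY5-SER12"],
}
common = combine_features(features, common_denominator=True)
union = combine_features(features, common_denominator=False)
print(common)  # ['ALA1-GLY5', 'GLY5-LEU9']
print(union)   # ['ALA1-GLY5', 'ALA1-LEU9', 'GLY5-LEU9', 'GLY5-SER12']
```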
Raises
- ValueError
If selector with given name does not exist
Examples
>>> manager = FeatureSelectorManager()
>>> pipeline_data = PipelineData()
>>> manager.create(pipeline_data, "analysis")
>>> manager.add_selection(pipeline_data, "analysis", "distances", "res ALA")
>>> manager.add_selection(pipeline_data, "analysis", "contacts", "resid 120-140", use_reduced=True)
- select(pipeline_data: PipelineData, name: str, reference_traj: int | str = None) -> None
Apply a named selector configuration to create selected feature matrix.
Applies all selection criteria from a named selector configuration to the computed features in pipeline_data and stores the results for later access through data access methods.
Warning
When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.
Pipeline mode:
>>> pipeline = PipelineManager()
>>> pipeline.feature_selector.select("my_analysis")  # NO pipeline_data parameter
>>> # Or with specific reference trajectory
>>> pipeline.feature_selector.select("my_analysis", reference_traj=2)
Standalone mode:
>>> pipeline_data = PipelineData()
>>> manager = FeatureSelectorManager()
>>> manager.select(pipeline_data, "my_analysis")  # pipeline_data required
>>> # Or with specific reference trajectory
>>> manager.select(pipeline_data, "my_analysis", reference_traj="system_A")
Parameters
- pipeline_data : PipelineData
Pipeline data object containing features and selector configurations
- name : str
Name of the feature selector configuration to apply
- reference_traj : Union[int, str], default=None
Trajectory index or name to use as reference for metadata extraction. If None, uses the first trajectory from the selection.
Returns
- None
Applies selections and stores results with reference trajectory info
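The defaulting rule for reference_traj can be sketched as below. `resolve_reference` is a hypothetical helper and the list-based trajectory bookkeeping is assumed for illustration; only the rule "None means the first trajectory from the selection" comes from the parameter description.

```python
def resolve_reference(selected_indices, traj_names, reference_traj=None):
    """Return the index of the trajectory used as the metadata reference."""
    if reference_traj is None:
        # Documented default: first trajectory from the selection.
        return selected_indices[0]
    if isinstance(reference_traj, str):
        # list.index raises ValueError for an unknown name.
        return traj_names.index(reference_traj)
    return reference_traj


names = ["system_A", "system_B", "system_C"]
print(resolve_reference([1, 2], names))              # 1
print(resolve_reference([1, 2], names, "system_A"))  # 0
print(resolve_reference([1, 2], names, 2))           # 2
```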
Raises
- ValueError
If selector with given name does not exist
- ValueError
If required features are not computed
- ValueError
If required data types (original/reduced) are not available
Examples
>>> manager = FeatureSelectorManager()
>>> pipeline_data = PipelineData()
>>> # ... configure selector and compute features ...
>>> manager.select(pipeline_data, "protein_analysis")
>>> # Results available via pipeline_data.selected_data["protein_analysis"]
- list_selectors(pipeline_data: PipelineData) -> List[str]
Get list of all configured feature selector names.
Warning
When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.
Pipeline mode:
>>> pipeline = PipelineManager()
>>> pipeline.feature_selector.list_selectors()  # NO pipeline_data parameter
Standalone mode:
>>> pipeline_data = PipelineData()
>>> manager = FeatureSelectorManager()
>>> manager.list_selectors(pipeline_data)  # pipeline_data required
Parameters
- pipeline_data : PipelineData
Pipeline data object containing selector configurations
Returns
- List[str]
List of configured feature selector names
Examples
>>> manager = FeatureSelectorManager()
>>> pipeline_data = PipelineData()
>>> manager.create(pipeline_data, "analysis1")
>>> manager.create(pipeline_data, "analysis2")
>>> selectors = manager.list_selectors(pipeline_data)
>>> print(sorted(selectors))
['analysis1', 'analysis2']
- get_selector_summary(pipeline_data: PipelineData, name: str) -> dict
Get summary information about a named selector configuration.
Warning
When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.
Pipeline mode:
>>> pipeline = PipelineManager()
>>> pipeline.feature_selector.get_selector_summary("analysis")  # NO pipeline_data parameter
Standalone mode:
>>> pipeline_data = PipelineData()
>>> manager = FeatureSelectorManager()
>>> manager.get_selector_summary(pipeline_data, "analysis")  # pipeline_data required
Parameters
- pipeline_data : PipelineData
Pipeline data object containing selector configurations
- name : str
Name of the feature selector configuration
Returns
- dict
Summary information about the selector configuration
Raises
- ValueError
If selector with given name does not exist
Examples
>>> manager = FeatureSelectorManager()
>>> summary = manager.get_selector_summary(pipeline_data, "analysis")
>>> print(summary['feature_count'])
2
- remove_selector(pipeline_data: PipelineData, name: str) -> None
Remove a named selector configuration.
Warning
When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.
Pipeline mode:
>>> pipeline = PipelineManager()
>>> pipeline.feature_selector.remove_selector("old_analysis")  # NO pipeline_data parameter
Standalone mode:
>>> pipeline_data = PipelineData()
>>> manager = FeatureSelectorManager()
>>> manager.remove_selector(pipeline_data, "old_analysis")  # pipeline_data required
Parameters
- pipeline_data : PipelineData
Pipeline data object containing selector configurations
- name : str
Name of the feature selector configuration to remove
Returns
- None
Removes selector configuration and any associated selected data
Raises
- ValueError
If selector with given name does not exist
Examples
>>> manager = FeatureSelectorManager()
>>> manager.remove_selector(pipeline_data, "old_analysis")
- save(pipeline_data: PipelineData, save_path: str) -> None
Save all feature selector data to a single file.
Warning
When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.
Pipeline mode:
>>> pipeline = PipelineManager()
>>> pipeline.feature_selector.save('feature_selector.npy')  # NO pipeline_data parameter
Standalone mode:
>>> pipeline_data = PipelineData()
>>> manager = FeatureSelectorManager()
>>> manager.save(pipeline_data, 'feature_selector.npy')  # pipeline_data required
Parameters
- pipeline_data : PipelineData
Pipeline data container with feature selector data
- save_path : str
Path of the single file in which all feature selector data is saved
Returns
- None
Saves all feature selector data to the specified file
Examples
>>> manager.save(pipeline_data, 'feature_selector.npy')
- load(pipeline_data: PipelineData, load_path: str) -> None
Load all feature selector data from a single file.
Warning
When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.
Pipeline mode:
>>> pipeline = PipelineManager()
>>> pipeline.feature_selector.load('feature_selector.npy')  # NO pipeline_data parameter
Standalone mode:
>>> pipeline_data = PipelineData()
>>> manager = FeatureSelectorManager()
>>> manager.load(pipeline_data, 'feature_selector.npy')  # pipeline_data required
Parameters
- pipeline_data : PipelineData
Pipeline data container to load feature selector data into
- load_path : str
Path to saved feature selector data file
Returns
- None
Loads all feature selector data from the specified file
Examples
>>> manager.load(pipeline_data, 'feature_selector.npy')
- print_info(pipeline_data: PipelineData) -> None
Print feature selector data information.
Warning
When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.
Pipeline mode:
>>> pipeline = PipelineManager()
>>> pipeline.feature_selector.print_info()  # NO pipeline_data parameter
Standalone mode:
>>> pipeline_data = PipelineData()
>>> manager = FeatureSelectorManager()
>>> manager.print_info(pipeline_data)  # pipeline_data required
Parameters
- pipeline_data : PipelineData
Pipeline data container with feature selector data
Returns
- None
Prints feature selector data information to the console
Examples
>>> manager.print_info(pipeline_data)