Feature Selection Manager

Feature selector manager for creating custom feature data matrices.

This module provides the FeatureSelectorManager class that manages multiple named feature selector configurations and applies them to create custom feature data matrices from computed features.

class mdxplain.feature_selection.manager.feature_selector_manager.FeatureSelectorManager(use_memmap: bool = False, chunk_size: int = 2000, cache_dir: str = './cache')

Manager for creating and applying feature selector configurations.

This manager creates, stores, and applies named feature selector configurations to generate custom feature data matrices. Each selector can combine multiple feature types with different selection criteria and data preferences.

The manager follows the same pattern as other managers (ClusterManager, DecompositionManager) by storing named entity instances and providing methods to create, configure, and apply them.

__init__(use_memmap: bool = False, chunk_size: int = 2000, cache_dir: str = './cache') → None

Initialize feature selector manager.

Parameters

use_memmap (bool, default=False): Whether to use memory mapping for large datasets
chunk_size (int, default=2000): Chunk size for processing large datasets
cache_dir (str, default="./cache"): Directory for caching temporary files

Returns

None: Initializes FeatureSelectorManager with configuration parameters

create(pipeline_data: PipelineData, name: str) → None

Create a new feature selector configuration.

Creates a new FeatureSelectorData instance with the given name and stores it in the pipeline data for later configuration and use.

Warning

When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.

Pipeline mode:

>>> pipeline = PipelineManager()
>>> pipeline.feature_selector.create("my_analysis")  # NO pipeline_data parameter

Standalone mode:

>>> pipeline_data = PipelineData()
>>> manager = FeatureSelectorManager()
>>> manager.create(pipeline_data, "my_analysis")  # pipeline_data required

Parameters

pipeline_data (PipelineData): Pipeline data object to store the selector configuration
name (str): Name identifier for the feature selector configuration

Returns

None: Creates and stores new FeatureSelectorData in pipeline_data

Raises

ValueError: If selector with given name already exists
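
The create-and-store behavior and its duplicate-name error can be sketched with stand-in classes. SketchSelectorData and SketchPipelineData below are hypothetical placeholders, not the mdxplain classes; only the selected_feature_data attribute name is taken from the example in this section, and the real implementation may differ.

```python
class SketchSelectorData:
    """Hypothetical stand-in for FeatureSelectorData."""

    def __init__(self, name):
        self.name = name
        self.selections = {}  # feature type -> list of selection dicts


class SketchPipelineData:
    """Hypothetical stand-in for PipelineData."""

    def __init__(self):
        self.selected_feature_data = {}  # selector name -> selector data


def create(pipeline_data, name):
    """Store a new, empty selector under `name`; reject duplicates."""
    if name in pipeline_data.selected_feature_data:
        raise ValueError(f"Feature selector '{name}' already exists.")
    pipeline_data.selected_feature_data[name] = SketchSelectorData(name)


data = SketchPipelineData()
create(data, "protein_analysis")

# A second create() with the same name is expected to fail.
try:
    create(data, "protein_analysis")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

Choosing a fresh name, or removing the old selector first, avoids the error.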

Examples

>>> manager = FeatureSelectorManager()
>>> pipeline_data = PipelineData()
>>> manager.create(pipeline_data, "protein_analysis")
>>> print("protein_analysis" in pipeline_data.selected_feature_data)
True

property add: FeatureSelectorAddService

Property for IDE-supported feature selector add operations.

Returns a service object that provides feature-type-specific add methods with reduction options. This follows the consistent pattern where the manager is passed explicitly and AutoInjectProxy handles pipeline_data.

Technical Note on AutoInjectProxy:

The AutoInjectProxy mechanism in PipelineManager automatically intercepts method and property accesses on managers. When this property is accessed:

AutoInjectProxy detects the property access
Calls this property to get the service class and parameters
Automatically injects pipeline_data as the second parameter
Returns FeatureSelectorAddService(self, pipeline_data) to the user

This design provides:

Clean API without users needing to pass pipeline_data
Proper type hints for IDE autocomplete support
Consistent parameter passing across all manager services

Pattern used: manager explicitly passed, pipeline_data injected by AutoInjectProxy
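
The injection idea can be illustrated with a small proxy. This is a minimal sketch under assumptions, not the actual AutoInjectProxy: SketchManager and SketchAutoInjectProxy are hypothetical names, a plain dict stands in for PipelineData, and only method calls are covered (the property case described above would wrap the returned service in the same way).

```python
import functools


class SketchManager:
    """Hypothetical manager whose methods take pipeline_data first."""

    def create(self, pipeline_data, name):
        pipeline_data.setdefault("selectors", {})[name] = []

    def list_selectors(self, pipeline_data):
        return list(pipeline_data.get("selectors", {}))


class SketchAutoInjectProxy:
    """Wrap a manager and pre-fill pipeline_data on every method call."""

    def __init__(self, manager, pipeline_data):
        self._manager = manager
        self._pipeline_data = pipeline_data

    def __getattr__(self, attr):
        # Only invoked for names not found on the proxy itself.
        target = getattr(self._manager, attr)
        if callable(target):
            # Bind pipeline_data as the first argument after self.
            return functools.partial(target, self._pipeline_data)
        return target


pipeline_data = {}  # a plain dict stands in for PipelineData
feature_selector = SketchAutoInjectProxy(SketchManager(), pipeline_data)
feature_selector.create("my_analysis")  # no pipeline_data argument
names = feature_selector.list_selectors()
```

With this kind of wrapper, passing pipeline_data explicitly in pipeline mode would shift every positional argument by one, which is why the warnings in this section ask users to omit it.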

Returns

FeatureSelectorAddService: Service with feature-type properties (distances, contacts, etc.)

Examples

>>> pipeline.feature_selector.add.distances("test", "res ALA")
>>> pipeline.feature_selector.add.distances.with_cv_reduction("test", "res ALA", threshold_min=0.1)

add_selection(pipeline_data: PipelineData, name: str, feature_type: str | object, selection: str = 'all', use_reduced: bool = False, common_denominator: bool = True, traj_selection: int | str | List[int | str] | 'all' = 'all', require_all_partners: bool = False, reduction: Dict[str, Any] | None = None) → None

Add a feature selection to a named selector configuration.

Adds selection criteria for a feature type to an existing selector configuration. Multiple selections can be added for the same feature type with different criteria.

Warning

When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.

Pipeline mode:

>>> pipeline = PipelineManager()
>>> pipeline.feature_selector.add_selection("analysis", "distances", "res ALA")  # NO pipeline_data
>>> pipeline.feature_selector.add_selection("analysis", "distances", "all",
...     reduction={"metric": "max", "threshold_min": 10.0})

Standalone mode:

>>> pipeline_data = PipelineData()
>>> manager = FeatureSelectorManager()
>>> manager.add_selection(pipeline_data, "analysis", "distances", "res ALA")  # pipeline_data required
>>> manager.add_selection(pipeline_data, "analysis", "distances", "all",
...     reduction={"metric": "mean", "threshold_min": 7.0})

Parameters

pipeline_data (PipelineData)

Pipeline data object containing selector configurations

name (str)

Name of the existing feature selector configuration

feature_type (str or object)

Feature type to select from (e.g., "distances", "contacts")

selection (str, default="all")

Selection criteria string (e.g., "res ALA", "resid 123-140", "7x50-8x50", "all")

use_reduced (bool, default=False)

Whether to use reduced data (True) or original data (False)

common_denominator (bool, default=True)

Whether to find common features across trajectories in traj_selection. If True, only features present in ALL selected trajectories are included. If False, union of all features from selected trajectories is used.

traj_selection (int, str, list, or "all", default="all")

Selection of trajectories to process:

int: trajectory index
str: trajectory name, tag (prefixed with "tag:"), or "all"
list: list of indices/names/tags
"all": all trajectories (default)

require_all_partners (bool, default=False)

For pairwise features, require all partners to be present in selection

reduction (dict, optional)

Post-selection reduction configuration with keys:

metric : str - Reduction metric (e.g., “max”, “min”, “mean”, “std”)
threshold_min : float, optional - Minimum threshold
threshold_max : float, optional - Maximum threshold
cross_trajectory : bool, optional - Deprecated legacy flag; use explicit mode flags
cross_trajectory_intersection : bool, optional - Require thresholds in ALL trajectories
cross_trajectory_union : bool, optional - Keep features that pass in ANY trajectory
cross_trajectory_pooled : bool, optional - Pool trajectories then reduce once
Additional metric-specific parameters (e.g., transition_threshold)

Only one of the three mode flags (cross_trajectory_intersection, cross_trajectory_union, cross_trajectory_pooled) can be True; the default is per_trajectory.
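
The difference between the three cross-trajectory modes can be sketched in plain Python. This is an illustration under assumptions, not the library's reduction code: the function names, the data layout (trajectory name mapped to per-frame values per feature), and the feature names d1 to d3 are hypothetical, and the default per-trajectory behavior is not modeled.

```python
import statistics

# Reduction metrics named in the parameter description.
METRICS = {
    "max": max,
    "min": min,
    "mean": statistics.fmean,
    "std": statistics.pstdev,
}


def passes(values, metric, threshold_min=None, threshold_max=None):
    """Reduce per-frame values to one score and test it against the thresholds."""
    score = METRICS[metric](values)
    if threshold_min is not None and score < threshold_min:
        return False
    if threshold_max is not None and score > threshold_max:
        return False
    return True


def reduce_features(trajs, mode, **reduction):
    """Return the sorted feature names kept under one cross-trajectory mode."""
    features = sorted(set.intersection(*(set(t) for t in trajs.values())))
    kept = []
    for feat in features:
        per_traj = [t[feat] for t in trajs.values()]
        if mode == "intersection":  # must pass in ALL trajectories
            keep = all(passes(v, **reduction) for v in per_traj)
        elif mode == "union":  # must pass in ANY trajectory
            keep = any(passes(v, **reduction) for v in per_traj)
        elif mode == "pooled":  # concatenate frames, then reduce once
            pooled = [x for v in per_traj for x in v]
            keep = passes(pooled, **reduction)
        else:
            raise ValueError(f"Unknown mode: {mode}")
        if keep:
            kept.append(feat)
    return kept


trajs = {
    "traj_a": {"d1": [4.0, 12.0], "d2": [3.0, 5.0], "d3": [11.0, 13.0]},
    "traj_b": {"d1": [2.0, 6.0], "d2": [4.0, 6.0], "d3": [10.0, 15.0]},
}
config = {"metric": "mean", "threshold_min": 7.0}
kept_all = reduce_features(trajs, "intersection", **config)  # ['d3']
kept_any = reduce_features(trajs, "union", **config)  # ['d1', 'd3']
kept_pooled = reduce_features(trajs, "pooled", **config)  # ['d3']
```

Here d1 has a mean of 8.0 in traj_a but 4.0 in traj_b, so it survives only in union mode; pooling all four frames gives a mean of 6.0, which falls below the threshold.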

Returns

None: Adds selection configuration to the named selector

Raises

ValueError: If selector with given name does not exist
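
The effect of common_denominator together with traj_selection can be sketched as a set operation over per-trajectory feature names. The helper combine_features, the trajectory names, and the feature names are hypothetical; the sketch resolves trajectories by name only and ignores indices and tags.

```python
def combine_features(features_by_traj, traj_selection="all", common_denominator=True):
    """Return sorted feature names for the selected trajectories."""
    if traj_selection == "all":
        chosen = list(features_by_traj)
    elif isinstance(traj_selection, str):
        chosen = [traj_selection]
    else:
        chosen = list(traj_selection)
    sets = [set(features_by_traj[name]) for name in chosen]
    if not sets:
        return []
    if common_denominator:
        return sorted(set.intersection(*sets))  # present in ALL trajectories
    return sorted(set.union(*sets))  # present in ANY trajectory


features_by_traj = {
    "system_A": ["d1", "d2", "d3"],
    "system_B": ["d2", "d3", "d4"],
    "system_C": ["d3", "d5"],
}
common_all = combine_features(features_by_traj)
common_ab = combine_features(features_by_traj, ["system_A", "system_B"])
union_ab = combine_features(
    features_by_traj, ["system_A", "system_B"], common_denominator=False
)
```

Restricting traj_selection changes which feature sets enter the intersection or union, so the same selection string can yield different columns for different trajectory subsets.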

Examples

>>> manager = FeatureSelectorManager()
>>> pipeline_data = PipelineData()
>>> manager.create(pipeline_data, "analysis")
>>> manager.add_selection(pipeline_data, "analysis", "distances", "res ALA")
>>> manager.add_selection(pipeline_data, "analysis", "contacts", "resid 120-140", use_reduced=True)

select(pipeline_data: PipelineData, name: str, reference_traj: int | str = None) → None

Apply a named selector configuration to create a selected feature matrix.

Applies all selection criteria from a named selector configuration to the computed features in pipeline_data and stores the results for later access through data access methods.
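
Conceptually, applying a selector picks columns from each computed feature type and joins them into one matrix. The sketch below assumes a simplified layout (column names plus rows of floats per feature type) and a hypothetical helper select_columns; it does not reproduce mdxplain's internal data structures, memory mapping, or reference-trajectory handling.

```python
def select_columns(features, selections):
    """Concatenate the chosen columns of each feature type into one matrix.

    features: feature type -> (column names, rows as lists of floats)
    selections: feature type -> list of column names to keep
    """
    matrix, labels = None, []
    for feature_type, wanted in selections.items():
        if feature_type not in features:
            raise ValueError(f"Feature '{feature_type}' has not been computed.")
        names, rows = features[feature_type]
        idx = [names.index(name) for name in wanted]
        block = [[row[i] for i in idx] for row in rows]
        if matrix is None:
            matrix = block
        else:
            # Append this block's columns to every frame (row).
            matrix = [left + right for left, right in zip(matrix, block)]
        labels += [f"{feature_type}:{name}" for name in wanted]
    return matrix or [], labels


features = {
    "distances": (["d1", "d2", "d3"], [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]),
    "contacts": (["c1", "c2"], [[0.0, 1.0], [1.0, 1.0]]),
}
selections = {"distances": ["d1", "d3"], "contacts": ["c2"]}
matrix, labels = select_columns(features, selections)
```

Each row of the result is one frame and each column one selected feature, labeled with its feature type so that columns from different types stay distinguishable.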

Warning

When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.

Pipeline mode:

>>> pipeline = PipelineManager()
>>> pipeline.feature_selector.select("my_analysis")  # NO pipeline_data parameter
>>> # Or with specific reference trajectory
>>> pipeline.feature_selector.select("my_analysis", reference_traj=2)

Standalone mode:

>>> pipeline_data = PipelineData()
>>> manager = FeatureSelectorManager()
>>> manager.select(pipeline_data, "my_analysis")  # pipeline_data required
>>> # Or with specific reference trajectory
>>> manager.select(pipeline_data, "my_analysis", reference_traj="system_A")

Parameters

pipeline_data (PipelineData): Pipeline data object containing features and selector configurations
name (str): Name of the feature selector configuration to apply
reference_traj (Union[int, str], default=None): Trajectory index or name to use as reference for metadata extraction. If None, uses the first trajectory from the selection.

Returns

None: Applies selections and stores results with reference trajectory info

Raises

ValueError: If selector with given name does not exist
ValueError: If required features are not computed
ValueError: If required data types (original/reduced) are not available

Examples

>>> manager = FeatureSelectorManager()
>>> pipeline_data = PipelineData()
>>> # ... configure selector and compute features ...
>>> manager.select(pipeline_data, "protein_analysis")
>>> # Results available via pipeline_data.selected_data["protein_analysis"]

list_selectors(pipeline_data: PipelineData) → List[str]

Get list of all configured feature selector names.

Warning

When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.

Pipeline mode:

>>> pipeline = PipelineManager()
>>> pipeline.feature_selector.list_selectors()  # NO pipeline_data parameter

Standalone mode:

>>> pipeline_data = PipelineData()
>>> manager = FeatureSelectorManager()
>>> manager.list_selectors(pipeline_data)  # pipeline_data required

Parameters

pipeline_data (PipelineData): Pipeline data object containing selector configurations

Returns

List[str]: List of configured feature selector names

Examples

>>> manager = FeatureSelectorManager()
>>> pipeline_data = PipelineData()
>>> manager.create(pipeline_data, "analysis1")
>>> manager.create(pipeline_data, "analysis2")
>>> selectors = manager.list_selectors(pipeline_data)
>>> print(sorted(selectors))
['analysis1', 'analysis2']

get_selector_summary(pipeline_data: PipelineData, name: str) → dict

Get summary information about a named selector configuration.

Warning

When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.

Pipeline mode:

>>> pipeline = PipelineManager()
>>> pipeline.feature_selector.get_selector_summary("analysis")  # NO pipeline_data parameter

Standalone mode:

>>> pipeline_data = PipelineData()
>>> manager = FeatureSelectorManager()
>>> manager.get_selector_summary(pipeline_data, "analysis")  # pipeline_data required

Parameters

pipeline_data (PipelineData): Pipeline data object containing selector configurations
name (str): Name of the feature selector configuration

Returns

dict: Summary information about the selector configuration

Raises

ValueError: If selector with given name does not exist

Examples

>>> manager = FeatureSelectorManager()
>>> summary = manager.get_selector_summary(pipeline_data, "analysis")
>>> print(summary['feature_count'])
2

remove_selector(pipeline_data: PipelineData, name: str) → None

Remove a named selector configuration.

Warning

When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.

Pipeline mode:

>>> pipeline = PipelineManager()
>>> pipeline.feature_selector.remove_selector("old_analysis")  # NO pipeline_data parameter

Standalone mode:

>>> pipeline_data = PipelineData()
>>> manager = FeatureSelectorManager()
>>> manager.remove_selector(pipeline_data, "old_analysis")  # pipeline_data required

Parameters

pipeline_data (PipelineData): Pipeline data object containing selector configurations
name (str): Name of the feature selector configuration to remove

Returns

None: Removes selector configuration and any associated selected data

Raises

ValueError: If selector with given name does not exist

Examples

>>> manager = FeatureSelectorManager()
>>> manager.remove_selector(pipeline_data, "old_analysis")

save(pipeline_data: PipelineData, save_path: str) → None

Save all feature selector data to a single file.

Warning

When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.

Pipeline mode:

>>> pipeline = PipelineManager()
>>> pipeline.feature_selector.save('feature_selector.npy')  # NO pipeline_data parameter

Standalone mode:

>>> pipeline_data = PipelineData()
>>> manager = FeatureSelectorManager()
>>> manager.save(pipeline_data, 'feature_selector.npy')  # pipeline_data required

Parameters

pipeline_data (PipelineData): Pipeline data container with feature selector data
save_path (str): Path where all feature selector data is saved as one file

Returns

None: Saves all feature selector data to the specified file

Examples

>>> manager.save(pipeline_data, 'feature_selector.npy')

load(pipeline_data: PipelineData, load_path: str) → None

Load all feature selector data from a single file.

Warning

When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.

Pipeline mode:

>>> pipeline = PipelineManager()
>>> pipeline.feature_selector.load('feature_selector.npy')  # NO pipeline_data parameter

Standalone mode:

>>> pipeline_data = PipelineData()
>>> manager = FeatureSelectorManager()
>>> manager.load(pipeline_data, 'feature_selector.npy')  # pipeline_data required

Parameters

pipeline_data (PipelineData): Pipeline data container to load feature selector data into
load_path (str): Path to saved feature selector data file

Returns

None: Loads all feature selector data from the specified file

Examples

>>> manager.load(pipeline_data, 'feature_selector.npy')

print_info(pipeline_data: PipelineData) → None

Print feature selector data information.

Warning

When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.

Pipeline mode:

>>> pipeline = PipelineManager()
>>> pipeline.feature_selector.print_info()  # NO pipeline_data parameter

Standalone mode:

>>> pipeline_data = PipelineData()
>>> manager = FeatureSelectorManager()
>>> manager.print_info(pipeline_data)  # pipeline_data required

Parameters

pipeline_data (PipelineData): Pipeline data container with feature selector data

Returns

None: Prints feature selector data information to the console

Examples

>>> manager.print_info(pipeline_data)