Feature Selection Manager

Feature selector manager for creating custom feature data matrices.

This module provides the FeatureSelectorManager class that manages multiple named feature selector configurations and applies them to create custom feature data matrices from computed features.

class mdxplain.feature_selection.manager.feature_selector_manager.FeatureSelectorManager(use_memmap: bool = False, chunk_size: int = 2000, cache_dir: str = './cache')

Manager for creating and applying feature selector configurations.

This manager creates, stores, and applies named feature selector configurations to generate custom feature data matrices. Each selector can combine multiple feature types with different selection criteria and data preferences.

The manager follows the same pattern as other managers (ClusterManager, DecompositionManager) by storing named entity instances and providing methods to create, configure, and apply them.
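The named-entity pattern described above can be sketched roughly as follows. This is a minimal illustration and not mdxplain's implementation: ToyPipelineData, SelectorConfig, and ToySelectorManager are hypothetical stand-ins, and only the create/list/remove behaviour documented on this page is modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SelectorConfig:
    """Hypothetical stand-in for one named selector configuration."""
    name: str
    selections: List[dict] = field(default_factory=list)


@dataclass
class ToyPipelineData:
    """Hypothetical stand-in for the shared data container."""
    selectors: Dict[str, SelectorConfig] = field(default_factory=dict)


class ToySelectorManager:
    """The manager holds no selectors itself; they live in the data container."""

    def create(self, data: ToyPipelineData, name: str) -> None:
        if name in data.selectors:
            raise ValueError(f"Selector '{name}' already exists")
        data.selectors[name] = SelectorConfig(name)

    def list_selectors(self, data: ToyPipelineData) -> List[str]:
        return list(data.selectors)

    def remove_selector(self, data: ToyPipelineData, name: str) -> None:
        if name not in data.selectors:
            raise ValueError(f"Selector '{name}' does not exist")
        del data.selectors[name]


data = ToyPipelineData()
manager = ToySelectorManager()
manager.create(data, "analysis1")
manager.create(data, "analysis2")
names = sorted(manager.list_selectors(data))

# Creating the same name twice is rejected, mirroring the documented ValueError.
try:
    manager.create(data, "analysis1")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

Keeping the named configurations in the data container rather than on the manager is what allows the same manager instance to serve several independent pipelines in this sketch.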

__init__(use_memmap: bool = False, chunk_size: int = 2000, cache_dir: str = './cache') -> None

Initialize feature selector manager.

Parameters

use_memmap : bool, default=False

Whether to use memory mapping for large datasets

chunk_size : int, default=2000

Chunk size for processing large datasets

cache_dir : str, default="./cache"

Directory for caching temporary files

Returns

None

Initializes FeatureSelectorManager with configuration parameters

create(pipeline_data: PipelineData, name: str) -> None

Create a new feature selector configuration.

Creates a new FeatureSelectorData instance with the given name and stores it in the pipeline data for later configuration and use.

Warning

When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.

Pipeline mode:

>>> pipeline = PipelineManager()
>>> pipeline.feature_selector.create("my_analysis")  # NO pipeline_data parameter

Standalone mode:

>>> pipeline_data = PipelineData()
>>> manager = FeatureSelectorManager()
>>> manager.create(pipeline_data, "my_analysis")  # pipeline_data required

Parameters

pipeline_data : PipelineData

Pipeline data object to store the selector configuration

name : str

Name identifier for the feature selector configuration

Returns

None

Creates and stores new FeatureSelectorData in pipeline_data

Raises

ValueError

If selector with given name already exists

Examples

>>> manager = FeatureSelectorManager()
>>> pipeline_data = PipelineData()
>>> manager.create(pipeline_data, "protein_analysis")
>>> print("protein_analysis" in pipeline_data.selected_feature_data)
True

property add: FeatureSelectorAddService

Property for IDE-supported feature selector add operations.

Returns a service object that provides feature-type-specific add methods with reduction options. This follows the consistent pattern where the manager is passed explicitly and AutoInjectProxy handles pipeline_data.

Technical Note on AutoInjectProxy:

The AutoInjectProxy mechanism in PipelineManager automatically intercepts method and property accesses on managers. When this property is accessed:

  1. AutoInjectProxy detects the property access

  2. Calls this property to get the service class and parameters

  3. Automatically injects pipeline_data as the second parameter

  4. Returns FeatureSelectorAddService(self, pipeline_data) to the user

This design provides:

  • Clean API without users needing to pass pipeline_data

  • Proper type hints for IDE autocomplete support

  • Consistent parameter passing across all manager services

Pattern used: manager explicitly passed, pipeline_data injected by AutoInjectProxy
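The injection steps listed above can be illustrated with a small self-contained sketch. It is not the AutoInjectProxy source: ToyAutoInjectProxy, ToyManager, and ToyAddService are hypothetical names, a plain dict stands in for PipelineData, and the property is assumed to return a factory that the proxy completes with pipeline_data.

```python
import functools


class ToyAddService:
    """Hypothetical service: receives the manager and pipeline_data explicitly."""

    def __init__(self, manager, pipeline_data):
        self._manager = manager
        self._pipeline_data = pipeline_data

    def distances(self, name, selection="all"):
        self._manager.add_selection(self._pipeline_data, name, "distances", selection)


class ToyManager:
    def add_selection(self, pipeline_data, name, feature_type, selection="all"):
        pipeline_data.setdefault(name, []).append((feature_type, selection))

    @property
    def add(self):
        # The manager is bound here; pipeline_data is left for the proxy to supply.
        return functools.partial(ToyAddService, self)


class ToyAutoInjectProxy:
    """Wraps a manager and supplies pipeline_data on every access."""

    def __init__(self, manager, pipeline_data):
        self._manager = manager
        self._pipeline_data = pipeline_data

    def __getattr__(self, attr):
        class_attr = getattr(type(self._manager), attr, None)
        target = getattr(self._manager, attr)
        if isinstance(class_attr, property):
            # Property access: finish the service factory with pipeline_data.
            return target(self._pipeline_data)
        if callable(target):
            # Method access: pre-fill pipeline_data as the first argument.
            return functools.partial(target, self._pipeline_data)
        return target


pipeline_data = {}
feature_selector = ToyAutoInjectProxy(ToyManager(), pipeline_data)
feature_selector.add_selection("test", "contacts", "resid 120-140")
feature_selector.add.distances("test", "res ALA")
```

In both calls the user never passes pipeline_data; the proxy adds it, which is the behaviour the warnings on this page describe for pipeline mode.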

Returns

FeatureSelectorAddService

Service with feature-type properties (distances, contacts, etc.)

Examples

>>> pipeline.feature_selector.add.distances("test", "res ALA")
>>> pipeline.feature_selector.add.distances.with_cv_reduction("test", "res ALA", threshold_min=0.1)

add_selection(pipeline_data: PipelineData, name: str, feature_type: str | object, selection: str = 'all', use_reduced: bool = False, common_denominator: bool = True, traj_selection: int | str | List[int | str] | 'all' = 'all', require_all_partners: bool = False, reduction: Dict[str, Any] | None = None) -> None

Add a feature selection to a named selector configuration.

Adds selection criteria for a feature type to an existing selector configuration. Multiple selections can be added for the same feature type with different criteria.

Warning

When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.

Pipeline mode:

>>> pipeline = PipelineManager()
>>> pipeline.feature_selector.add_selection("analysis", "distances", "res ALA")  # NO pipeline_data
>>> pipeline.feature_selector.add_selection("analysis", "distances", "all",
...     reduction={"metric": "max", "threshold_min": 10.0})

Standalone mode:

>>> pipeline_data = PipelineData()
>>> manager = FeatureSelectorManager()
>>> manager.add_selection(pipeline_data, "analysis", "distances", "res ALA")  # pipeline_data required
>>> manager.add_selection(pipeline_data, "analysis", "distances", "all",
...     reduction={"metric": "mean", "threshold_min": 7.0})

Parameters

pipeline_data : PipelineData

Pipeline data object containing selector configurations

name : str

Name of the existing feature selector configuration

feature_type : str or object

Feature type to select from (e.g., "distances", "contacts")

selection : str, default="all"

Selection criteria string (e.g., "res ALA", "resid 123-140", "7x50-8x50", "all")

use_reduced : bool, default=False

Whether to use reduced data (True) or original data (False)

common_denominator : bool, default=True

Whether to find common features across trajectories in traj_selection. If True, only features present in ALL selected trajectories are included. If False, union of all features from selected trajectories is used.
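The difference between the two settings can be sketched with plain sets. This is only an illustration of intersection versus union: combine_feature_names and the feature labels are hypothetical, and the library's internal representation of features is not shown on this page.

```python
from typing import Dict, List, Set


def combine_feature_names(
    per_trajectory: Dict[str, Set[str]], common_denominator: bool = True
) -> List[str]:
    """Combine per-trajectory feature names by intersection or union."""
    feature_sets = list(per_trajectory.values())
    if not feature_sets:
        return []
    if common_denominator:
        combined = set.intersection(*feature_sets)  # present in ALL trajectories
    else:
        combined = set.union(*feature_sets)  # present in ANY trajectory
    return sorted(combined)


# Hypothetical feature labels for two trajectories.
features = {
    "traj_0": {"ALA1-GLY5", "ALA1-LEU9", "GLY5-LEU9"},
    "traj_1": {"ALA1-GLY5", "GLY5-LEU9", "GLY5-SER12"},
}
common = combine_feature_names(features, common_denominator=True)
everything = combine_feature_names(features, common_denominator=False)
```

Here common holds the two features shared by both trajectories, while everything holds all four distinct features.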

traj_selection : int, str, list, or "all", default="all"

Selection of trajectories to process:

  • int: trajectory index

  • str: trajectory name, tag (prefixed with "tag:"), or "all"

  • list: list of indices/names/tags

  • "all": all trajectories (default)
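One way to read the accepted forms is as a resolver that maps each of them to a list of trajectory indices. The sketch below is an assumption-laden illustration: resolve_traj_selection, the names list, and the tags mapping are hypothetical, and raising ValueError for unknown names or out-of-range indices is a guess rather than documented behaviour.

```python
from typing import Dict, List, Sequence, Union

Selector = Union[int, str, Sequence[Union[int, str]]]


def resolve_traj_selection(
    selection: Selector, names: List[str], tags: Dict[str, List[str]]
) -> List[int]:
    """Resolve an index, name, 'tag:' string, list, or 'all' to trajectory indices."""
    if selection == "all":
        return list(range(len(names)))
    if isinstance(selection, int):
        if not 0 <= selection < len(names):
            raise ValueError(f"Trajectory index {selection} out of range")
        return [selection]
    if isinstance(selection, str):
        if selection.startswith("tag:"):
            tag = selection[len("tag:"):]
            return [i for i, name in enumerate(names) if tag in tags.get(name, [])]
        if selection not in names:
            raise ValueError(f"Unknown trajectory name '{selection}'")
        return [names.index(selection)]
    # A list: resolve every item, keeping first-seen order without duplicates.
    resolved: List[int] = []
    for item in selection:
        for index in resolve_traj_selection(item, names, tags):
            if index not in resolved:
                resolved.append(index)
    return resolved


# Hypothetical trajectory names and tags.
names = ["system_A", "system_B", "system_C"]
tags = {"system_A": ["apo"], "system_B": ["holo"], "system_C": ["apo"]}

by_index = resolve_traj_selection(1, names, tags)
by_name = resolve_traj_selection("system_C", names, tags)
by_tag = resolve_traj_selection("tag:apo", names, tags)
mixed = resolve_traj_selection([1, "tag:apo"], names, tags)
```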

require_all_partners : bool, default=False

For pairwise features, require all partners to be present in selection

reduction : dict, optional

Post-selection reduction configuration with keys:

  • metric : str - Reduction metric (e.g., "max", "min", "mean", "std")

  • threshold_min : float, optional - Minimum threshold

  • threshold_max : float, optional - Maximum threshold

  • cross_trajectory : bool, optional - Deprecated legacy flag; use explicit mode flags

  • cross_trajectory_intersection : bool, optional - Require thresholds in ALL trajectories

  • cross_trajectory_union : bool, optional - Keep features that pass in ANY trajectory

  • cross_trajectory_pooled : bool, optional - Pool trajectories, then reduce once

  • Additional metric-specific parameters (e.g., transition_threshold)

At most one of the three cross_trajectory_* mode flags can be True; if none is set, the default mode is per_trajectory.
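The three cross-trajectory modes can be compared on a toy data set. This is a sketch under stated assumptions, not the library's reduction code: reduce_features and passes are hypothetical helpers, thresholds are treated as inclusive, every trajectory is assumed to carry the same features, and only the "max", "min", and "mean" metrics are modelled.

```python
from statistics import mean
from typing import Dict, List, Optional, Set

METRICS = {"max": max, "min": min, "mean": mean}


def passes(values: List[float], metric: str,
           threshold_min: Optional[float], threshold_max: Optional[float]) -> bool:
    """Check one feature's metric value against the (inclusive) thresholds."""
    score = METRICS[metric](values)
    if threshold_min is not None and score < threshold_min:
        return False
    if threshold_max is not None and score > threshold_max:
        return False
    return True


def reduce_features(data: Dict[str, Dict[str, List[float]]], metric: str, mode: str,
                    threshold_min: Optional[float] = None,
                    threshold_max: Optional[float] = None) -> Set[str]:
    """data[trajectory][feature] holds that feature's values over frames."""
    if mode == "pooled":
        # Concatenate all trajectories per feature, then apply the metric once.
        pooled: Dict[str, List[float]] = {}
        for features in data.values():
            for name, values in features.items():
                pooled.setdefault(name, []).extend(values)
        return {n for n, v in pooled.items()
                if passes(v, metric, threshold_min, threshold_max)}
    passing = [
        {n for n, v in features.items()
         if passes(v, metric, threshold_min, threshold_max)}
        for features in data.values()
    ]
    if mode == "intersection":
        return set.intersection(*passing)  # must pass in ALL trajectories
    if mode == "union":
        return set.union(*passing)  # may pass in ANY trajectory
    raise ValueError(f"Unknown mode '{mode}'")


# Hypothetical distance values for four features in two trajectories.
data = {
    "traj_0": {"d1": [2.0, 4.0], "d2": [8.0, 9.0], "d3": [6.0, 6.0], "d4": [4.0, 4.0]},
    "traj_1": {"d1": [1.0, 3.0], "d2": [9.0, 12.0], "d3": [9.0, 9.0], "d4": [9.0, 9.0]},
}
kept_all = reduce_features(data, "mean", "intersection", threshold_min=7.0)
kept_any = reduce_features(data, "mean", "union", threshold_min=7.0)
kept_pooled = reduce_features(data, "mean", "pooled", threshold_min=7.0)
```

With a mean threshold of 7.0, only d2 passes in both trajectories, d3 and d4 pass in one of them, and pooling keeps d3 (pooled mean 7.5) but drops d4 (pooled mean 6.5), so the three modes give three different feature sets.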

Returns

None

Adds selection configuration to the named selector

Raises

ValueError

If selector with given name does not exist

Examples

>>> manager = FeatureSelectorManager()
>>> pipeline_data = PipelineData()
>>> manager.create(pipeline_data, "analysis")
>>> manager.add_selection(pipeline_data, "analysis", "distances", "res ALA")
>>> manager.add_selection(pipeline_data, "analysis", "contacts", "resid 120-140", use_reduced=True)

select(pipeline_data: PipelineData, name: str, reference_traj: int | str = None) -> None

Apply a named selector configuration to create a selected feature matrix.

Applies all selection criteria from a named selector configuration to the computed features in pipeline_data and stores the results for later access through data access methods.

Warning

When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.

Pipeline mode:

>>> pipeline = PipelineManager()
>>> pipeline.feature_selector.select("my_analysis")  # NO pipeline_data parameter
>>> # Or with specific reference trajectory
>>> pipeline.feature_selector.select("my_analysis", reference_traj=2)

Standalone mode:

>>> pipeline_data = PipelineData()
>>> manager = FeatureSelectorManager()
>>> manager.select(pipeline_data, "my_analysis")  # pipeline_data required
>>> # Or with specific reference trajectory
>>> manager.select(pipeline_data, "my_analysis", reference_traj="system_A")

Parameters

pipeline_data : PipelineData

Pipeline data object containing features and selector configurations

name : str

Name of the feature selector configuration to apply

reference_traj : int or str, default=None

Trajectory index or name to use as reference for metadata extraction. If None, uses the first trajectory from the selection.

Returns

None

Applies selections and stores results with reference trajectory info

Raises

ValueError

If selector with given name does not exist

ValueError

If required features are not computed

ValueError

If required data types (original/reduced) are not available

Examples

>>> manager = FeatureSelectorManager()
>>> pipeline_data = PipelineData()
>>> # ... configure selector and compute features ...
>>> manager.select(pipeline_data, "protein_analysis")
>>> # Results available via pipeline_data.selected_data["protein_analysis"]

list_selectors(pipeline_data: PipelineData) -> List[str]

Get list of all configured feature selector names.

Warning

When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.

Pipeline mode:

>>> pipeline = PipelineManager()
>>> pipeline.feature_selector.list_selectors()  # NO pipeline_data parameter

Standalone mode:

>>> pipeline_data = PipelineData()
>>> manager = FeatureSelectorManager()
>>> manager.list_selectors(pipeline_data)  # pipeline_data required

Parameters

pipeline_data : PipelineData

Pipeline data object containing selector configurations

Returns

List[str]

List of configured feature selector names

Examples

>>> manager = FeatureSelectorManager()
>>> pipeline_data = PipelineData()
>>> manager.create(pipeline_data, "analysis1")
>>> manager.create(pipeline_data, "analysis2")
>>> selectors = manager.list_selectors(pipeline_data)
>>> print(sorted(selectors))
['analysis1', 'analysis2']

get_selector_summary(pipeline_data: PipelineData, name: str) -> dict

Get summary information about a named selector configuration.

Warning

When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.

Pipeline mode:

>>> pipeline = PipelineManager()
>>> pipeline.feature_selector.get_selector_summary("analysis")  # NO pipeline_data parameter

Standalone mode:

>>> pipeline_data = PipelineData()
>>> manager = FeatureSelectorManager()
>>> manager.get_selector_summary(pipeline_data, "analysis")  # pipeline_data required

Parameters

pipeline_data : PipelineData

Pipeline data object containing selector configurations

name : str

Name of the feature selector configuration

Returns

dict

Summary information about the selector configuration

Raises

ValueError

If selector with given name does not exist

Examples

>>> manager = FeatureSelectorManager()
>>> summary = manager.get_selector_summary(pipeline_data, "analysis")
>>> print(summary['feature_count'])
2

remove_selector(pipeline_data: PipelineData, name: str) -> None

Remove a named selector configuration.

Warning

When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.

Pipeline mode:

>>> pipeline = PipelineManager()
>>> pipeline.feature_selector.remove_selector("old_analysis")  # NO pipeline_data parameter

Standalone mode:

>>> pipeline_data = PipelineData()
>>> manager = FeatureSelectorManager()
>>> manager.remove_selector(pipeline_data, "old_analysis")  # pipeline_data required

Parameters

pipeline_data : PipelineData

Pipeline data object containing selector configurations

name : str

Name of the feature selector configuration to remove

Returns

None

Removes selector configuration and any associated selected data

Raises

ValueError

If selector with given name does not exist

Examples

>>> manager = FeatureSelectorManager()
>>> manager.remove_selector(pipeline_data, "old_analysis")

save(pipeline_data: PipelineData, save_path: str) -> None

Save all feature selector data to a single file.

Warning

When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.

Pipeline mode:

>>> pipeline = PipelineManager()
>>> pipeline.feature_selector.save('feature_selector.npy')  # NO pipeline_data parameter

Standalone mode:

>>> pipeline_data = PipelineData()
>>> manager = FeatureSelectorManager()
>>> manager.save(pipeline_data, 'feature_selector.npy')  # pipeline_data required

Parameters

pipeline_data : PipelineData

Pipeline data container with feature selector data

save_path : str

Path of the file in which to save all feature selector data

Returns

None

Saves all feature selector data to the specified file

Examples

>>> manager.save(pipeline_data, 'feature_selector.npy')

load(pipeline_data: PipelineData, load_path: str) -> None

Load all feature selector data from a single file.

Warning

When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.

Pipeline mode:

>>> pipeline = PipelineManager()
>>> pipeline.feature_selector.load('feature_selector.npy')  # NO pipeline_data parameter

Standalone mode:

>>> pipeline_data = PipelineData()
>>> manager = FeatureSelectorManager()
>>> manager.load(pipeline_data, 'feature_selector.npy')  # pipeline_data required

Parameters

pipeline_data : PipelineData

Pipeline data container to load feature selector data into

load_path : str

Path to saved feature selector data file

Returns

None

Loads all feature selector data from the specified file

Examples

>>> manager.load(pipeline_data, 'feature_selector.npy')

print_info(pipeline_data: PipelineData) -> None

Print feature selector data information.

Warning

When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.

Pipeline mode:

>>> pipeline = PipelineManager()
>>> pipeline.feature_selector.print_info()  # NO pipeline_data parameter

Standalone mode:

>>> pipeline_data = PipelineData()
>>> manager = FeatureSelectorManager()
>>> manager.print_info(pipeline_data)  # pipeline_data required

Parameters

pipeline_data : PipelineData

Pipeline data container with feature selector data

Returns

None

Prints feature selector data information to the console

Examples

>>> manager.print_info(pipeline_data)