Feature Manager
FeatureManager is a manager for feature data objects.
It is used to add, reset, and reduce features to the pipeline data.
- class mdxplain.feature.manager.feature_manager.FeatureManager(use_memmap: bool = False, chunk_size: int = 2000, cache_dir: str = './cache')
Manager for feature data objects.
This class provides methods to add, reset, and reduce features in the pipeline data. It handles feature dependencies, computation, and storage.
Examples
>>> pipeline_data = PipelineData()
>>> feature_manager = FeatureManager()
>>> feature_manager.add_feature(pipeline_data, feature_type.Distances())
>>> feature_manager.reduce_data(pipeline_data, feature_type.Distances(), metric="cv", threshold_min=0.1)
>>> feature_manager.reset_features(pipeline_data, "distances")
>>> feature_manager.print_info(pipeline_data)
>>> feature_manager.save(pipeline_data, 'features.npy')
>>> feature_manager.load(pipeline_data, 'features.npy')
- __init__(use_memmap: bool = False, chunk_size: int = 2000, cache_dir: str = './cache') -> None
Initialize feature manager.
Parameters
- use_memmap : bool, default=False
Whether to use memory mapping for feature data
- chunk_size : int, default=2000
Number of frames processed per chunk
- cache_dir : str, default='./cache'
Cache path for feature data
Returns
- None
Initializes FeatureManager instance with specified configuration
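The chunk_size parameter suggests that feature data is processed in blocks of frames rather than all at once. How FeatureManager does this internally is not shown here; the following is only a minimal sketch of the general idea, with iter_chunks and chunked_column_means as hypothetical helpers that are not part of mdxplain.

```python
from typing import Iterator, List, Sequence, Tuple


def iter_chunks(n_frames: int, chunk_size: int = 2000) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) frame ranges covering n_frames in blocks of chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_frames, chunk_size):
        yield start, min(start + chunk_size, n_frames)


def chunked_column_means(frames: Sequence[Sequence[float]], chunk_size: int = 2000) -> List[float]:
    """Per-feature mean, accumulated one chunk of frames at a time (hypothetical helper)."""
    n_features = len(frames[0])
    totals = [0.0] * n_features
    for start, stop in iter_chunks(len(frames), chunk_size):
        # Only the rows of the current chunk are read in this iteration.
        for row in frames[start:stop]:
            for j, value in enumerate(row):
                totals[j] += value
    return [total / len(frames) for total in totals]
```

With a memory-mapped array as input, a loop of this shape would only need to read one chunk of frames from disk per iteration.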
- reset_features(pipeline_data: PipelineData, feature_type: str | Any | None = None, strict: bool = True) -> None
Reset calculated features and clear feature data.
This method removes computed features and their associated data, requiring features to be recalculated from scratch. Use this method after trajectory modifications that invalidate existing features. Can reset all features or specific feature types.
Warning
When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.
Pipeline mode:
>>> pipeline = PipelineManager()
>>> pipeline.feature.reset_features()  # NO pipeline_data parameter
Standalone mode:
>>> pipeline_data = PipelineData()
>>> manager = FeatureManager()
>>> manager.reset_features(pipeline_data)  # pipeline_data required
Parameters
- pipeline_data : PipelineData
Pipeline data object
- feature_type : str, FeatureTypeBase, list, or None, default=None
Feature type(s) to reset. If None, resets all features. Supports the following formats:
"distances" (string)
feature_type.Distances() (instance)
feature_type.Distances (class)
["distances", "contacts"] (list of any of the above)
- strict : bool, default=True
Whether to raise ValueError if a requested feature type does not exist. If False, non-existent feature types are skipped with a warning.
Returns
- None
Clears specified feature data from pipeline_data.feature_data in-place
Examples
>>> pipeline_data = PipelineData()
>>> feature_manager = FeatureManager()
>>> feature_manager.add_feature(pipeline_data, feature_type.Distances())
>>> feature_manager.add_feature(pipeline_data, feature_type.Contacts())
>>> # Reset all features
>>> feature_manager.reset_features(pipeline_data)
>>> # Reset specific feature type
>>> feature_manager.reset_features(pipeline_data, "distances")
>>> # Reset multiple feature types
>>> feature_manager.reset_features(pipeline_data, ["distances", "contacts"])
>>> # Strict mode - raise error for non-existent features
>>> feature_manager.reset_features(pipeline_data, "nonexistent", strict=True)  # Raises ValueError
Notes
Selected feature data is permanently deleted
Memory-mapped feature files remain on disk but are no longer referenced
Features must be recalculated after reset
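The accepted feature_type formats and the effect of the strict flag can be illustrated with a small stand-alone sketch. The Distances and Contacts classes, to_key, and reset below are hypothetical stand-ins that operate on a plain dict; they are not the library's implementation, and the mapping from class to key is an assumption.

```python
import warnings
from typing import Any, Dict, List


class Distances:  # stand-in for a feature type class, not the library's
    name = "distances"


class Contacts:  # stand-in for a feature type class, not the library's
    name = "contacts"


def to_key(item: Any) -> str:
    """Map a string, a class, or an instance to a lowercase feature key."""
    if isinstance(item, str):
        return item.lower()
    cls = item if isinstance(item, type) else type(item)
    return getattr(cls, "name", cls.__name__).lower()


def reset(store: Dict[str, Any], feature_type: Any = None, strict: bool = True) -> List[str]:
    """Delete the selected keys from store in place and return the removed keys."""
    if feature_type is None:
        removed = list(store)
        store.clear()
        return removed
    items = feature_type if isinstance(feature_type, (list, tuple)) else [feature_type]
    keys = [to_key(item) for item in items]
    missing = [key for key in keys if key not in store]
    if missing and strict:
        # Validate before deleting anything so a failed call leaves store untouched.
        raise ValueError(f"Unknown feature type(s): {missing}")
    removed = []
    for key in keys:
        if key in store:
            del store[key]
            removed.append(key)
        else:
            warnings.warn(f"Feature '{key}' does not exist; skipping")
    return removed
```

Checking for missing keys before deleting is a design choice of this sketch: in strict mode, a call that raises does not leave the store half-reset.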
- add_feature(pipeline_data: PipelineData, feature_type: FeatureTypeBase, traj_selection: str | int | List = 'all', force: bool = False, force_original: bool = True) -> None
Add and compute a feature for the loaded trajectories.
This method creates a FeatureData instance for the specified feature type, handles dependency checking, and computes the feature data. Features with dependencies (like Contacts depending on Distances) will automatically use the required input data.
Warning
When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.
Pipeline mode:
>>> pipeline = PipelineManager()
>>> pipeline.feature.add_feature(feature_type.Distances())  # NO pipeline_data parameter
Standalone mode:
>>> pipeline_data = PipelineData()
>>> manager = FeatureManager()
>>> manager.add_feature(pipeline_data, feature_type.Distances())  # pipeline_data required
Parameters
- pipeline_data : PipelineData
Pipeline data object
- feature_type : FeatureTypeBase
Feature type instance (e.g., Distances(), Contacts()). The feature type determines what kind of analysis will be performed.
- traj_selection : int, str, list, or "all", default="all"
Selection of trajectories to compute features for:
int: trajectory index
str: trajectory name, tag (prefixed with "tag:"), or "all"
list: list of indices/names/tags
"all": all trajectories (default)
- force : bool, default=False
Whether to force recomputation of the feature even if it already exists.
- force_original : bool, default=True
For features that take other features as input, whether to compute from the original (unreduced) input data instead of the reduced data.
Returns
- None
Adds computed feature to pipeline data and creates analysis methods
Raises
- ValueError
If the feature already exists with computed data, if required dependencies are missing, or if trajectories are not loaded.
Examples
>>> pipeline_data = PipelineData()
>>> feature_manager = FeatureManager()
>>> feature_manager.add_feature(pipeline_data, feature_type.Distances())
>>> feature_manager.add_feature(pipeline_data, feature_type.Contacts())
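The traj_selection formats described for add_feature (index, name, "tag:" prefix, list, or "all") could be resolved to trajectory indices roughly as follows. This is a sketch under assumptions: resolve_selection, the names list, and the tags mapping are hypothetical, and the library's own resolution rules may differ.

```python
from typing import Dict, List, Sequence, Union

Selection = Union[int, str, Sequence[Union[int, str]]]


def resolve_selection(selection: Selection, names: List[str], tags: Dict[str, List[int]]) -> List[int]:
    """Resolve a selection to a sorted list of unique trajectory indices."""
    if isinstance(selection, (list, tuple)):
        # A list may mix indices, names, and tags; merge the results.
        found = set()
        for item in selection:
            found.update(resolve_selection(item, names, tags))
        return sorted(found)
    if isinstance(selection, int):
        if not 0 <= selection < len(names):
            raise ValueError(f"Trajectory index {selection} out of range")
        return [selection]
    if selection == "all":
        return list(range(len(names)))
    if selection.startswith("tag:"):
        tag = selection[len("tag:"):]
        if tag not in tags:
            raise ValueError(f"Unknown tag '{tag}'")
        return sorted(tags[tag])
    if selection in names:
        return [names.index(selection)]
    raise ValueError(f"Unknown trajectory '{selection}'")


names = ["apo", "holo", "mutant"]   # hypothetical trajectory names
tags = {"wt": [0, 1]}               # hypothetical tag -> indices mapping
print(resolve_selection(["tag:wt", 2], names, tags))
```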
- reset_reduction(pipeline_data: PipelineData, feature_type: FeatureTypeBase) -> None
Reset to using full original data instead of reduced dataset.
Supports all three input variants:
feature_type.Distances() (instance)
feature_type.Distances (class with metaclass)
"distances" (string)
Warning
When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.
Pipeline mode:
>>> pipeline = PipelineManager()
>>> pipeline.feature.reset_reduction("distances")  # NO pipeline_data parameter
Standalone mode:
>>> pipeline_data = PipelineData()
>>> manager = FeatureManager()
>>> manager.reset_reduction(pipeline_data, "distances")  # pipeline_data required
Parameters
- pipeline_data : PipelineData
Pipeline data object
- feature_type : FeatureTypeBase, FeatureTypeBase class, or str
Feature type instance, class, or string, e.g. Distances(), Distances, "distances"
Returns
- None
Clears reduced_data and prints confirmation with shape info
Examples
>>> pipeline_data = PipelineData()
>>> feature_manager = FeatureManager()
>>> feature_manager.add_feature(pipeline_data, feature_type.Distances())
>>> feature_manager.reduce_data(pipeline_data, feature_type.Distances, metric="cv", threshold_min=0.1)
>>> feature_manager.reset_reduction(pipeline_data, "distances")  # string
- reduce_data(pipeline_data: PipelineData, feature_type: FeatureTypeBase, metric: str | Callable[[np.ndarray], np.ndarray], traj_selection: str | int | List = 'all', threshold_min: float | None = None, threshold_max: float | None = None, transition_threshold: float = 2.0, window_size: int = 10, transition_mode: str = 'window', lag_time: int = 1, cross_trajectory: bool = False) -> None
Filter features based on statistical criteria.
Supports all three input variants:
feature_type.Distances() (instance)
feature_type.Distances (class with metaclass)
"distances" (string)
Warning
When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.
Pipeline mode:
>>> pipeline = PipelineManager()
>>> pipeline.feature.reduce_data("distances", metric="cv")  # NO pipeline_data parameter
Standalone mode:
>>> pipeline_data = PipelineData()
>>> manager = FeatureManager()
>>> manager.reduce_data(pipeline_data, "distances", metric="cv")  # pipeline_data required
Parameters
- pipeline_data : PipelineData
Pipeline data object
- feature_type : FeatureTypeBase, FeatureTypeBase class, or str
Feature type instance, class, or string, e.g. Distances(), Distances, "distances"
- metric : str or ReduceMetrics
Statistical filtering metric ('cv', 'frequency', 'transitions', etc.)
- threshold_min : float, optional
Minimum value to keep (features >= threshold_min)
- threshold_max : float, optional
Maximum value to keep (features <= threshold_max)
- transition_threshold : float, default=2.0
Distance change threshold for transition detection (Angstrom)
- window_size : int, default=10
Number of frames for transition window analysis
- transition_mode : str, default="window"
Mode for transition detection ('window', 'lag')
- lag_time : int, default=1
Lag time for transition detection
- cross_trajectory : bool, default=False
If True, find common features across all selected trajectories. If False, reduce each trajectory independently.
Returns
- None
Stores filtered data in self.reduced_data and prints reduction summary
Raises
- ValueError
If the feature has no data, if the reduction has already been performed, or if threshold parameters are invalid (threshold_min > threshold_max)
Examples
>>> pipeline_data = PipelineData()
>>> feature_manager = FeatureManager()
>>> feature_manager.add_feature(pipeline_data, feature_type.Distances())  # instance
>>> feature_manager.reduce_data(pipeline_data, feature_type.Distances,  # class
...                             metric="cv", threshold_min=0.1)
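To make the threshold and cross_trajectory semantics concrete, the following sketch filters feature columns by coefficient of variation (standard deviation divided by the absolute mean). It is not the library's implementation: keep_features and reduce_trajectories are hypothetical names, the population standard deviation is an arbitrary choice, and treating "common features" as the intersection of the per-trajectory results is an assumption.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence, Set

Frames = Sequence[Sequence[float]]  # rows are frames, columns are features


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Population std divided by |mean|; 0.0 for a zero mean (sketch convention)."""
    mu = mean(values)
    return pstdev(values) / abs(mu) if mu != 0 else 0.0


def keep_features(frames: Frames, threshold_min: Optional[float] = None,
                  threshold_max: Optional[float] = None) -> Set[int]:
    """Indices of feature columns whose CV lies within [threshold_min, threshold_max]."""
    if threshold_min is not None and threshold_max is not None and threshold_min > threshold_max:
        raise ValueError("threshold_min must not exceed threshold_max")
    kept = set()
    for j in range(len(frames[0])):
        cv = coefficient_of_variation([row[j] for row in frames])
        if threshold_min is not None and cv < threshold_min:
            continue
        if threshold_max is not None and cv > threshold_max:
            continue
        kept.add(j)
    return kept


def reduce_trajectories(trajs: Sequence[Frames], threshold_min: Optional[float] = None,
                        threshold_max: Optional[float] = None,
                        cross_trajectory: bool = False) -> List[List[int]]:
    """Kept feature indices per trajectory; shared across trajectories if requested."""
    kept = [keep_features(t, threshold_min, threshold_max) for t in trajs]
    if cross_trajectory and kept:
        # Assumption: "common features" means the intersection over all trajectories.
        common = set.intersection(*kept)
        kept = [common for _ in kept]
    return [sorted(k) for k in kept]
```

Under this reading, cross_trajectory=True yields the same feature set for every trajectory, which keeps the reduced data comparable column by column.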
- print_info(pipeline_data: PipelineData) -> None
Print feature information.
Warning
When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.
Pipeline mode:
>>> pipeline = PipelineManager()
>>> pipeline.feature.print_info()  # NO pipeline_data parameter
Standalone mode:
>>> pipeline_data = PipelineData()
>>> manager = FeatureManager()
>>> manager.print_info(pipeline_data)  # pipeline_data required
Parameters
- pipeline_data : PipelineData
Pipeline data container with feature data
Returns
- None
Prints feature information to console
Examples
>>> pipeline_data = PipelineData()
>>> feature_manager = FeatureManager()
>>> feature_manager.print_info(pipeline_data)
- save(pipeline_data: PipelineData, save_path: str) -> None
Save all feature data to a single file.
Warning
When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.
Pipeline mode:
>>> pipeline = PipelineManager()
>>> pipeline.feature.save('features.npy')  # NO pipeline_data parameter
Standalone mode:
>>> pipeline_data = PipelineData()
>>> manager = FeatureManager()
>>> manager.save(pipeline_data, 'features.npy')  # pipeline_data required
Parameters
- pipeline_data : PipelineData
Pipeline data container with feature data
- save_path : str
Path of the single file in which all feature data is saved
Returns
- None
Saves all feature data to the specified file
Examples
>>> feature_manager.save(pipeline_data, 'features.npy')
- load(pipeline_data: PipelineData, load_path: str) -> None
Load all feature data from a single file.
Warning
When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.
Pipeline mode:
>>> pipeline = PipelineManager()
>>> pipeline.feature.load('features.npy')  # NO pipeline_data parameter
Standalone mode:
>>> pipeline_data = PipelineData()
>>> manager = FeatureManager()
>>> manager.load(pipeline_data, 'features.npy')  # pipeline_data required
Parameters
- pipeline_data : PipelineData
Pipeline data container to load feature data into
- load_path : str
Path to saved feature data file
Returns
- None
Loads all feature data from the specified file
Examples
>>> feature_manager.load(pipeline_data, 'features.npy')
- property add: FeatureAddService
Service for adding features with simplified syntax.
Provides an intuitive interface for adding features without requiring explicit feature type instantiation or imports.
Returns
- FeatureAddService
Service instance for adding features with combined parameters
Examples
>>> # Add different feature types
>>> pipeline.feature.add.distances(excluded_neighbors=2)
>>> pipeline.feature.add.contacts(cutoff=4.5, traj_selection=[0,1,2])
>>> pipeline.feature.add.torsions(calculate_chi=False, force=True)
>>> pipeline.feature.add.dssp(simplified=True)
>>> pipeline.feature.add.sasa(probe_radius=0.12)
>>> pipeline.feature.add.coordinates(atom_selection="backbone")
Notes
Pipeline data is automatically injected by AutoInjectProxy. All feature type parameters are combined with add_feature parameters.
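The injection behaviour behind the pipeline-mode calls can be pictured with a small proxy that prepends pipeline_data to every method call. The sketch below is illustrative only: InjectingProxy and ToyManager are hypothetical names, and the actual AutoInjectProxy may be implemented differently.

```python
from functools import wraps
from typing import Any


class InjectingProxy:
    """Hypothetical proxy that passes pipeline_data as the first argument."""

    def __init__(self, manager: Any, pipeline_data: Any) -> None:
        self._manager = manager
        self._pipeline_data = pipeline_data

    def __getattr__(self, name: str) -> Any:
        # Only called for names not found on the proxy itself.
        attr = getattr(self._manager, name)
        if not callable(attr):
            return attr

        @wraps(attr)
        def call(*args: Any, **kwargs: Any) -> Any:
            return attr(self._pipeline_data, *args, **kwargs)

        return call


class ToyManager:
    """Stand-in manager whose methods expect pipeline_data first."""

    def add_feature(self, pipeline_data: dict, name: str) -> None:
        pipeline_data.setdefault("features", []).append(name)


data: dict = {}
feature = InjectingProxy(ToyManager(), data)
feature.add_feature("distances")  # caller does not pass pipeline_data
print(data)
```

A proxy of this kind also shows why passing pipeline_data explicitly in pipeline mode is an error: the object would arrive twice and shift every other positional argument.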
- property reduce: FeatureReduceService
Service for reducing features with type-specific metrics.
Provides type-specific reduction metrics tailored to each feature type, such as coefficient of variation for distances or frequency for contacts.
Returns
- FeatureReduceService
Service instance for feature reduction with type-specific metrics
Examples
>>> # Distance-specific metrics
>>> pipeline.feature.reduce.distances.cv(threshold_min=0.1)
>>> pipeline.feature.reduce.distances.transitions(window_size=20)
>>> # Contact-specific metrics
>>> pipeline.feature.reduce.contacts.frequency(threshold_min=0.5)
>>> pipeline.feature.reduce.contacts.stability(threshold_max=0.8)
>>> # Torsion-specific metrics (circular statistics)
>>> pipeline.feature.reduce.torsions.circular_std(threshold_max=30.0)
Notes
Pipeline data is automatically injected by AutoInjectProxy. Each feature type has its own specialized metrics.
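Torsion angles wrap around at ±180°, which is why a circular statistic is used for them instead of an ordinary standard deviation. As an illustration, one common definition of the circular standard deviation is sqrt(-2 ln R), where R is the mean resultant length. The function circular_std_deg below is a hypothetical helper; whether the library's circular_std metric uses this exact definition or works in degrees is an assumption.

```python
import math
from typing import Sequence


def circular_std_deg(angles_deg: Sequence[float]) -> float:
    """Circular standard deviation in degrees, defined as sqrt(-2 ln R)."""
    n = len(angles_deg)
    sin_mean = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    cos_mean = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    # Mean resultant length R in [0, 1]; clip to guard against rounding above 1.
    r = min(1.0, math.hypot(sin_mean, cos_mean))
    if r == 0.0:
        return math.inf  # angles spread so evenly that no mean direction exists
    return math.degrees(math.sqrt(-2.0 * math.log(r)))


# 179 and -179 degrees are only 2 degrees apart on the circle, so the
# circular spread is small even though the raw values differ by 358.
print(circular_std_deg([179.0, -179.0]))
```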
- property analysis: FeatureAnalysisService
Service for analyzing features with type-specific methods.
Provides analysis operations tailored to each feature type, such as circular statistics for torsions or contact frequency analysis.
Parameters
None
Returns
- FeatureAnalysisService
Service instance for feature analysis with type-specific methods
Examples
>>> # Analyze distances
>>> pipeline.feature.analysis.distances.compute_mean()
>>> pipeline.feature.analysis.distances.compute_std()