Feature Manager

FeatureManager manages feature data objects.

It adds, resets, and reduces features in the pipeline data.

class mdxplain.feature.manager.feature_manager.FeatureManager(use_memmap: bool = False, chunk_size: int = 2000, cache_dir: str = './cache')

Manager for feature data objects.

This class provides methods to add, reset, and reduce features in the pipeline data. It handles feature dependencies, computation, and storage.

Examples

>>> pipeline_data = PipelineData()
>>> feature_manager = FeatureManager()
>>> feature_manager.add_feature(pipeline_data, feature_type.Distances())
>>> feature_manager.reduce_data(pipeline_data, feature_type.Distances(), metric="cv", threshold_min=0.1)
>>> feature_manager.reset_features(pipeline_data, "distances")
>>> feature_manager.print_info(pipeline_data)
>>> feature_manager.save(pipeline_data, 'features.npy')
>>> feature_manager.load(pipeline_data, 'features.npy')

__init__(use_memmap: bool = False, chunk_size: int = 2000, cache_dir: str = './cache') -> None

Initialize feature manager.

Parameters

use_memmap : bool, default=False

Whether to use memory mapping for feature data

chunk_size : int, default=2000

Processing chunk size

cache_dir : str, default="./cache"

Cache path for feature data
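
How these three settings interact can be pictured with plain NumPy. The sketch below is not mdxplain's implementation: the function `compute_in_chunks`, the file name, and the squared-column "feature" are hypothetical stand-ins. It only shows the general technique of writing results chunk by chunk into a disk-backed array so that roughly chunk_size frames are processed at a time.

```python
import os
import tempfile

import numpy as np


def compute_in_chunks(frames, n_features, cache_dir, chunk_size=2000):
    """Fill a disk-backed array chunk by chunk (illustrative only)."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, "feature_sketch.dat")  # hypothetical file name
    out = np.memmap(path, dtype=np.float32, mode="w+",
                    shape=(len(frames), n_features))
    for start in range(0, len(frames), chunk_size):
        stop = min(start + chunk_size, len(frames))
        # Stand-in "feature": the first n_features columns of each frame, squared.
        out[start:stop] = frames[start:stop, :n_features] ** 2
    out.flush()
    return out


rng = np.random.default_rng(0)
frames = rng.random((5000, 8)).astype(np.float32)
with tempfile.TemporaryDirectory() as cache_dir:
    result = compute_in_chunks(frames, n_features=4,
                               cache_dir=cache_dir, chunk_size=2000)
    matches = bool(np.allclose(result, frames[:, :4] ** 2))
    result_shape = result.shape
    del result  # release the memmap before the directory is removed
```

With 5000 frames and chunk_size=2000 the loop runs three times (2000, 2000, and 1000 frames), and the result lives in a file under cache_dir rather than in RAM.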

Returns

None

Initializes FeatureManager instance with specified configuration

reset_features(pipeline_data: PipelineData, feature_type: str | Any | None = None, strict: bool = True) -> None

Reset calculated features and clear feature data.

This method removes computed features and their associated data, requiring features to be recalculated from scratch. Use this method after trajectory modifications that invalidate existing features. Can reset all features or specific feature types.

Warning

When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.

Pipeline mode:

>>> pipeline = PipelineManager()
>>> pipeline.feature.reset_features()  # NO pipeline_data parameter

Standalone mode:

>>> pipeline_data = PipelineData()
>>> manager = FeatureManager()
>>> manager.reset_features(pipeline_data)  # pipeline_data required
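
The difference between the two modes can be illustrated with a small proxy that prepends a shared data object to every method call. This is a minimal sketch of the general pattern, not the library's proxy: `AutoInjectSketch` and `ToyManager` are hypothetical names, and a plain dict stands in for PipelineData.

```python
import functools


class AutoInjectSketch:
    """Forward method calls to a manager, prepending a shared data object."""

    def __init__(self, manager, pipeline_data):
        self._manager = manager
        self._pipeline_data = pipeline_data

    def __getattr__(self, name):
        attr = getattr(self._manager, name)
        if callable(attr):
            # Bind pipeline_data as the first positional argument.
            return functools.partial(attr, self._pipeline_data)
        return attr


class ToyManager:
    """Hypothetical manager whose methods take the data object first."""

    def reset_features(self, pipeline_data, feature_type=None):
        if feature_type is None:
            pipeline_data.clear()
        else:
            pipeline_data.pop(feature_type, None)


data = {"distances": [1.0, 2.0], "contacts": [0, 1]}
proxy = AutoInjectSketch(ToyManager(), data)

proxy.reset_features("distances")   # pipeline mode: no data argument
after_proxy = dict(data)

ToyManager().reset_features(data)   # standalone mode: data passed explicitly
```

With a proxy of this kind, passing the data object yourself would shift every argument by one position, which is one plausible reason the warning above is needed.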

Parameters

pipeline_data : PipelineData

Pipeline data object

feature_type : str, FeatureTypeBase, list, or None, default=None

Feature type(s) to reset. If None, all features are reset. The following formats are supported:

  • "distances" (string)

  • feature_type.Distances() (instance)

  • feature_type.Distances (class)

  • ["distances", "contacts"] (list of any of the above)

strict : bool, default=True

Whether to raise ValueError if feature_type does not exist. If False, non-existent features are skipped with a warning.
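
The accepted feature_type formats and the effect of strict can be sketched as follows. This is an illustration under assumptions, not the library's code: a dict stands in for the feature store, `to_key` and `reset_sketch` are hypothetical helpers, and the lowercase class name is assumed to be the string key.

```python
import warnings


class Distances:
    """Toy stand-in for a feature type class."""


def to_key(feature_type):
    """Map a string, class, or instance to a lowercase key (assumed convention)."""
    if isinstance(feature_type, str):
        return feature_type.lower()
    if isinstance(feature_type, type):
        return feature_type.__name__.lower()
    return type(feature_type).__name__.lower()


def reset_sketch(feature_data, feature_type=None, strict=True):
    """Remove all features, or only the requested ones, from a dict in place."""
    if feature_type is None:
        feature_data.clear()
        return
    items = feature_type if isinstance(feature_type, list) else [feature_type]
    for item in items:
        key = to_key(item)
        if key in feature_data:
            del feature_data[key]
        elif strict:
            raise ValueError(f"Feature '{key}' does not exist")
        else:
            warnings.warn(f"Feature '{key}' does not exist; skipping")


store = {"distances": [1.0], "contacts": [1]}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Non-strict: the unknown name produces a warning and the loop continues.
    reset_sketch(store, [Distances(), "nonexistent"], strict=False)
```

After the call only "contacts" remains in the store and one warning has been recorded; the same call with strict=True would raise ValueError at "nonexistent".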

Returns

None

Clears specified feature data from pipeline_data.feature_data in-place

Examples

>>> pipeline_data = PipelineData()
>>> feature_manager = FeatureManager()
>>> feature_manager.add_feature(pipeline_data, feature_type.Distances())
>>> feature_manager.add_feature(pipeline_data, feature_type.Contacts())
>>> # Reset all features
>>> feature_manager.reset_features(pipeline_data)
>>> # Reset specific feature type
>>> feature_manager.reset_features(pipeline_data, "distances")
>>> # Reset multiple feature types
>>> feature_manager.reset_features(pipeline_data, ["distances", "contacts"])
>>> # Strict mode - raise error for non-existent features
>>> feature_manager.reset_features(pipeline_data, "nonexistent", strict=True)  # Raises ValueError

Notes

  • Selected feature data is permanently deleted

  • Memory-mapped feature files remain on disk but are no longer referenced

  • Features must be recalculated after reset

add_feature(pipeline_data: PipelineData, feature_type: FeatureTypeBase, traj_selection: str | int | List = 'all', force: bool = False, force_original: bool = True) -> None

Add and compute a feature for the loaded trajectories.

This method creates a FeatureData instance for the specified feature type, handles dependency checking, and computes the feature data. Features with dependencies (like Contacts depending on Distances) will automatically use the required input data.

Warning

When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.

Pipeline mode:

>>> pipeline = PipelineManager()
>>> pipeline.feature.add_feature(feature_type.Distances())  # NO pipeline_data parameter

Standalone mode:

>>> pipeline_data = PipelineData()
>>> manager = FeatureManager()
>>> manager.add_feature(pipeline_data, feature_type.Distances())  # pipeline_data required

Parameters

pipeline_data : PipelineData

Pipeline data object

feature_type : FeatureTypeBase

Feature type instance (e.g., Distances(), Contacts()). The feature type determines what kind of analysis will be performed.

traj_selection : int, str, list, or "all", default="all"

Selection of trajectories to compute features for:

  • int: trajectory index

  • str: trajectory name, tag (prefixed with "tag:"), or "all"

  • list: list of indices/names/tags

  • "all": all trajectories (default)
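
One way to picture how such a selection resolves to trajectory indices is the sketch below. It is an assumption-laden illustration: `resolve_selection`, the `names` list, and the per-trajectory `tags` sets are hypothetical, and the library's actual lookup rules may differ.

```python
def resolve_selection(selection, names, tags):
    """Return trajectory indices for an int, str, list, or "all"."""
    if selection == "all":
        return list(range(len(names)))
    if isinstance(selection, int):
        return [selection]
    if isinstance(selection, str):
        if selection.startswith("tag:"):
            wanted = selection[len("tag:"):]
            return [i for i, t in enumerate(tags) if wanted in t]
        return [names.index(selection)]
    # A list may mix indices, names, and tags; merge and de-duplicate.
    merged = set()
    for item in selection:
        merged.update(resolve_selection(item, names, tags))
    return sorted(merged)


names = ["apo_run1", "apo_run2", "holo_run1"]
tags = [{"apo"}, {"apo"}, {"holo"}]

by_tag = resolve_selection("tag:apo", names, tags)
mixed = resolve_selection([2, "apo_run1", "tag:apo"], names, tags)
```

Here the tag selection resolves to the two "apo" trajectories, and the mixed list resolves to all three without duplicates.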

force : bool, default=False

Whether to force recomputation of the feature even if it already exists.

force_original : bool, default=True

For features that use other features as input, whether to compute from the original input data rather than from its reduced data.
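
The role of force_original for a dependent feature can be sketched with contacts derived from distances. Everything here is a simplified assumption: the dict layout with "data" and "reduced_data" keys, the helper `contacts_from_distances`, and the `<=` comparison against the cutoff are hypothetical.

```python
def contacts_from_distances(distance_feature, cutoff=4.5, force_original=True):
    """Threshold distances into 0/1 contacts (illustrative only)."""
    reduced = distance_feature.get("reduced_data")
    if force_original or reduced is None:
        source = distance_feature["data"]   # full, unreduced input
    else:
        source = reduced                     # reduced input
    return [[1 if d <= cutoff else 0 for d in frame] for frame in source]


distances = {
    "data": [[3.0, 6.0, 4.0], [5.0, 4.4, 9.0]],   # 2 frames x 3 pairs
    "reduced_data": [[3.0, 4.0], [5.0, 9.0]],      # middle pair filtered out
}

full = contacts_from_distances(distances)
small = contacts_from_distances(distances, force_original=False)
```

With force_original=True the contacts cover all three pairs; with False they cover only the two pairs that survived the reduction.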

Returns

None

Adds computed feature to pipeline data and creates analysis methods

Raises

ValueError

If the feature already exists with computed data, if required dependencies are missing, or if trajectories are not loaded.

Examples

>>> pipeline_data = PipelineData()
>>> feature_manager = FeatureManager()
>>> feature_manager.add_feature(pipeline_data, feature_type.Distances())
>>> feature_manager.add_feature(pipeline_data, feature_type.Contacts())

reset_reduction(pipeline_data: PipelineData, feature_type: FeatureTypeBase) -> None

Revert to the full original data instead of the reduced dataset.

Supports all three input variants:

  • feature_type.Distances() (instance)

  • feature_type.Distances (class with metaclass)

  • "distances" (string)

Warning

When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.

Pipeline mode:

>>> pipeline = PipelineManager()
>>> pipeline.feature.reset_reduction("distances")  # NO pipeline_data parameter

Standalone mode:

>>> pipeline_data = PipelineData()
>>> manager = FeatureManager()
>>> manager.reset_reduction(pipeline_data, "distances")  # pipeline_data required

Parameters

pipeline_data : PipelineData

Pipeline data object

feature_type : FeatureTypeBase, FeatureTypeBase class, or str

Feature type instance, class, or string, e.g. Distances(), Distances, or "distances"

Returns

None

Clears reduced_data and prints confirmation with shape info

Examples

>>> pipeline_data = PipelineData()
>>> feature_manager = FeatureManager()
>>> feature_manager.add_feature(pipeline_data, feature_type.Distances())
>>> feature_manager.reduce_data(pipeline_data, feature_type.Distances,
...                             metric="cv", threshold_min=0.1)
>>> feature_manager.reset_reduction(pipeline_data, "distances")  # string

reduce_data(pipeline_data: PipelineData, feature_type: FeatureTypeBase, metric: str | Callable[[np.ndarray], np.ndarray], traj_selection: str | int | List = 'all', threshold_min: float | None = None, threshold_max: float | None = None, transition_threshold: float = 2.0, window_size: int = 10, transition_mode: str = 'window', lag_time: int = 1, cross_trajectory: bool = False) -> None

Filter features based on statistical criteria.

Supports all three input variants:

  • feature_type.Distances() (instance)

  • feature_type.Distances (class with metaclass)

  • "distances" (string)

Warning

When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.

Pipeline mode:

>>> pipeline = PipelineManager()
>>> pipeline.feature.reduce_data("distances", metric="cv")  # NO pipeline_data parameter

Standalone mode:

>>> pipeline_data = PipelineData()
>>> manager = FeatureManager()
>>> manager.reduce_data(pipeline_data, "distances", metric="cv")  # pipeline_data required

Parameters

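pipeline_data : PipelineData

Pipeline data object

feature_type : FeatureTypeBase, FeatureTypeBase class, or str

Feature type instance, class, or string, e.g. Distances(), Distances, or "distances"

metric : str or ReduceMetrics

Statistical filtering metric ('cv', 'frequency', 'transitions', etc.)

traj_selection : int, str, list, or "all", default="all"

Selection of trajectories to reduce, in the same formats accepted by add_feature

threshold_min : float, optional

Minimum metric value to keep (features with metric >= threshold_min are retained)

threshold_max : float, optional

Maximum metric value to keep (features with metric <= threshold_max are retained)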
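
Threshold-based filtering with the 'cv' metric can be sketched in NumPy as follows. This is a minimal illustration, not the library's routine: `cv_filter` is a hypothetical helper, and the coefficient of variation is taken as the per-column standard deviation divided by the mean.

```python
import numpy as np


def cv_filter(data, threshold_min=None, threshold_max=None):
    """Keep columns whose coefficient of variation lies within the bounds."""
    if (threshold_min is not None and threshold_max is not None
            and threshold_min > threshold_max):
        raise ValueError("threshold_min must not exceed threshold_max")
    cv = data.std(axis=0) / data.mean(axis=0)
    keep = np.ones(data.shape[1], dtype=bool)
    if threshold_min is not None:
        keep &= cv >= threshold_min
    if threshold_max is not None:
        keep &= cv <= threshold_max
    return data[:, keep], np.flatnonzero(keep)


# 3 frames x 3 features: nearly constant, strongly varying, exactly constant.
data = np.array([
    [10.0, 5.0, 2.0],
    [10.1, 8.0, 2.0],
    [9.9, 2.0, 2.0],
])
reduced, kept = cv_filter(data, threshold_min=0.1)
```

Only the middle column varies enough to pass threshold_min=0.1, so the reduced array keeps a single feature.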

transition_threshold : float, default=2.0

Distance change threshold for transition detection (Angstrom)

window_size : int, default=10

Number of frames for transition window analysis

transition_mode : str, default="window"

Mode for transition detection ('window' or 'lag')

lag_time : int, default=1

Lag time for transition detection
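
The exact definitions of the two transition modes are not spelled out here, so the following is only one plausible reading, labeled as an assumption throughout: in 'lag' mode a transition is a change larger than the threshold between frames lag_time apart, and in 'window' mode it is a non-overlapping window whose value range exceeds the threshold. `count_transitions` is a hypothetical helper.

```python
def count_transitions(series, threshold=2.0, mode="window",
                      window_size=10, lag_time=1):
    """Count threshold-exceeding changes in one feature's time series."""
    if mode == "lag":
        # Compare each frame with the frame lag_time steps later.
        pairs = zip(series, series[lag_time:])
        return sum(1 for a, b in pairs if abs(b - a) > threshold)
    if mode == "window":
        # Count non-overlapping windows whose value range exceeds the threshold.
        count = 0
        for start in range(0, len(series), window_size):
            window = series[start:start + window_size]
            if max(window) - min(window) > threshold:
                count += 1
        return count
    raise ValueError(f"Unknown transition_mode: {mode!r}")


# A slow drift of 0.5 per frame, 3.5 in total.
series = [3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5]
lag_count = count_transitions(series, mode="lag", lag_time=1)
window_count = count_transitions(series, mode="window", window_size=10)
```

Under this reading the slow drift is invisible in lag mode with lag_time=1 but registers in window mode, which shows why the choice of mode and its parameters matters.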

cross_trajectory : bool, default=False

If True, find common features across all selected trajectories. If False, reduce each trajectory independently.
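
The two behaviours can be contrasted with a small set-based sketch. It assumes that "common features" means the intersection of the features that pass the filter in each trajectory; that interpretation, the helper `select_features`, and the input layout are assumptions, not confirmed library behaviour.

```python
def select_features(kept_per_traj, cross_trajectory=False):
    """kept_per_traj maps trajectory index -> set of feature indices that passed."""
    if not cross_trajectory:
        # Each trajectory keeps its own feature set.
        return {traj: sorted(kept) for traj, kept in kept_per_traj.items()}
    # Keep only features that passed in every trajectory.
    common = set.intersection(*kept_per_traj.values())
    return {traj: sorted(common) for traj in kept_per_traj}


kept_per_traj = {0: {0, 2, 5}, 1: {2, 3, 5}, 2: {2, 5, 7}}
independent = select_features(kept_per_traj)
shared = select_features(kept_per_traj, cross_trajectory=True)
```

Independent reduction leaves each trajectory with a different feature set, while the cross-trajectory variant gives every trajectory the same columns (here features 2 and 5), which keeps the reduced arrays comparable.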

Returns

None

Stores filtered data in self.reduced_data and prints reduction summary

Raises

ValueError

If the feature has no data, if the reduction has already been performed, or if threshold parameters are invalid (threshold_min > threshold_max)

Examples

>>> pipeline_data = PipelineData()
>>> feature_manager = FeatureManager()
>>> feature_manager.add_feature(pipeline_data, feature_type.Distances())  # instance
>>> feature_manager.reduce_data(pipeline_data, feature_type.Distances,   # class
...                             metric="cv", threshold_min=0.1)

print_info(pipeline_data: PipelineData) -> None

Print feature information.

Warning

When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.

Pipeline mode:

>>> pipeline = PipelineManager()
>>> pipeline.feature.print_info()  # NO pipeline_data parameter

Standalone mode:

>>> pipeline_data = PipelineData()
>>> manager = FeatureManager()
>>> manager.print_info(pipeline_data)  # pipeline_data required

Parameters

pipeline_data : PipelineData

Pipeline data container with feature data

Returns

None

Prints feature information to console

Examples

>>> pipeline_data = PipelineData()
>>> feature_manager = FeatureManager()
>>> feature_manager.print_info(pipeline_data)

save(pipeline_data: PipelineData, save_path: str) -> None

Save all feature data to a single file.

Warning

When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.

Pipeline mode:

>>> pipeline = PipelineManager()
>>> pipeline.feature.save('features.npy')  # NO pipeline_data parameter

Standalone mode:

>>> pipeline_data = PipelineData()
>>> manager = FeatureManager()
>>> manager.save(pipeline_data, 'features.npy')  # pipeline_data required

Parameters

pipeline_data : PipelineData

Pipeline data container with feature data

save_path : str

Path of the file in which all feature data is saved

Returns

None

Saves all feature data to the specified file

Examples

>>> feature_manager.save(pipeline_data, 'features.npy')

load(pipeline_data: PipelineData, load_path: str) -> None

Load all feature data from a single file.

Warning

When using PipelineManager, do NOT provide the pipeline_data parameter. The PipelineManager automatically injects this parameter.

Pipeline mode:

>>> pipeline = PipelineManager()
>>> pipeline.feature.load('features.npy')  # NO pipeline_data parameter

Standalone mode:

>>> pipeline_data = PipelineData()
>>> manager = FeatureManager()
>>> manager.load(pipeline_data, 'features.npy')  # pipeline_data required

Parameters

pipeline_data : PipelineData

Pipeline data container to load feature data into

load_path : str

Path to the saved feature data file

Returns

None

Loads all feature data from the specified file

Examples

>>> feature_manager.load(pipeline_data, 'features.npy')

property add: FeatureAddService

Service for adding features with simplified syntax.

Provides an intuitive interface for adding features without requiring explicit feature type instantiation or imports.

Returns

FeatureAddService

Service instance for adding features with combined parameters

Examples

>>> # Add different feature types
>>> pipeline.feature.add.distances(excluded_neighbors=2)
>>> pipeline.feature.add.contacts(cutoff=4.5, traj_selection=[0,1,2])
>>> pipeline.feature.add.torsions(calculate_chi=False, force=True)
>>> pipeline.feature.add.dssp(simplified=True)
>>> pipeline.feature.add.sasa(probe_radius=0.12)
>>> pipeline.feature.add.coordinates(atom_selection="backbone")

Notes

Pipeline data is automatically injected by AutoInjectProxy. All feature type parameters are combined with add_feature parameters.
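
How feature type parameters and add_feature parameters might be combined in a single call can be sketched with signature inspection. This is an illustrative assumption about the mechanism, not the service's code: `Contacts`, `add_feature`, and `add_combined` below are toy stand-ins.

```python
import inspect


class Contacts:
    """Toy feature type with one constructor parameter."""

    def __init__(self, cutoff=4.5):
        self.cutoff = cutoff


def add_feature(store, feature, traj_selection="all", force=False):
    """Toy add_feature that just records what it was asked to do."""
    store[type(feature).__name__.lower()] = (feature, traj_selection, force)


def add_combined(store, feature_cls, **kwargs):
    """Split kwargs between the feature constructor and add_feature."""
    ctor_names = set(inspect.signature(feature_cls.__init__).parameters) - {"self"}
    ctor_kwargs = {k: v for k, v in kwargs.items() if k in ctor_names}
    add_kwargs = {k: v for k, v in kwargs.items() if k not in ctor_names}
    add_feature(store, feature_cls(**ctor_kwargs), **add_kwargs)


store = {}
add_combined(store, Contacts, cutoff=4.5, traj_selection=[0, 1, 2])
feature, selection, force = store["contacts"]
```

The caller passes cutoff and traj_selection side by side; the helper routes cutoff to the constructor and traj_selection to add_feature, leaving force at its default.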

property reduce: FeatureReduceService

Service for reducing features with type-specific metrics.

Provides type-specific reduction metrics tailored to each feature type, such as coefficient of variation for distances or frequency for contacts.

Returns

FeatureReduceService

Service instance for feature reduction with type-specific metrics

Examples

>>> # Distance-specific metrics
>>> pipeline.feature.reduce.distances.cv(threshold_min=0.1)
>>> pipeline.feature.reduce.distances.transitions(window_size=20)
>>> # Contact-specific metrics
>>> pipeline.feature.reduce.contacts.frequency(threshold_min=0.5)
>>> pipeline.feature.reduce.contacts.stability(threshold_max=0.8)
>>> # Torsion-specific metrics (circular statistics)
>>> pipeline.feature.reduce.torsions.circular_std(threshold_max=30.0)

Notes

Pipeline data is automatically injected by AutoInjectProxy. Each feature type has its own specialized metrics.
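
Torsion angles need circular statistics because they wrap around at ±180°. The sketch below computes the standard circular standard deviation, sqrt(-2 ln R), where R is the mean resultant length. Whether the library uses exactly this formula, and whether its thresholds are in degrees, are assumptions; `circular_std_deg` is a hypothetical helper.

```python
import cmath
import math


def circular_std_deg(angles_deg):
    """Circular standard deviation of angles given in degrees."""
    unit_vectors = [cmath.exp(1j * math.radians(a)) for a in angles_deg]
    resultant = abs(sum(unit_vectors)) / len(unit_vectors)
    resultant = min(resultant, 1.0)  # guard against rounding slightly above 1
    return math.degrees(math.sqrt(-2.0 * math.log(resultant)))


wrapped = [179.0, -179.0, 178.0, -178.0]   # tight cluster around 180 degrees
spread = [0.0, 90.0, 180.0, -90.0, 45.0]   # angles all around the circle

wrapped_std = circular_std_deg(wrapped)
spread_std = circular_std_deg(spread)
```

A linear standard deviation would report the wrapped cluster as highly variable, whereas the circular version gives it a spread of only a few degrees and the scattered angles a large one. Note that the formula is undefined when the unit vectors cancel exactly (R = 0).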

property analysis: FeatureAnalysisService

Service for analyzing features with type-specific methods.

Provides analysis operations tailored to each feature type, such as circular statistics for torsions or contact frequency analysis.

Parameters

None

Returns

FeatureAnalysisService

Service instance for feature analysis with type-specific methods

Examples

>>> # Analyze distances
>>> pipeline.feature.analysis.distances.compute_mean()
>>> pipeline.feature.analysis.distances.compute_std()