Feature Entities


Feature data container for trajectory-specific computed features.

Container for trajectory-specific feature data (e.g., distances, contacts) together with its associated calculator. Storing feature data per trajectory allows mixed systems, such as different proteins, to be processed in a single pipeline.

class mdxplain.feature.entities.feature_data.FeatureData(feature_type: FeatureTypeBase, use_memmap: bool = False, cache_path: str = './cache', chunk_size: int = 2000, trajectory_name: str | None = None)

Internal container for trajectory-specific computed feature data.

Stores feature data per trajectory, which allows mixed systems with different proteins to be processed in a single pipeline. Each trajectory computes its own features, and these are combined at the selection level.
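
Because every trajectory keeps its own feature array, two trajectories of different proteins can have different numbers of features. The sketch below illustrates what combining them at the selection level could look like. It assumes a selection is simply a list of feature names that each trajectory resolves against its own names; the `select` helper and the dict layout are hypothetical and not part of mdxplain.

```python
import numpy as np


def select(per_traj_data, per_traj_names, wanted):
    """Hypothetical helper: stack frames from several trajectories.

    per_traj_data:  dict of trajectory name -> (n_frames, n_features) array
    per_traj_names: dict of trajectory name -> list of feature names
    wanted:         feature names to keep; each must exist in every trajectory
    """
    blocks = []
    for traj, data in per_traj_data.items():
        index = {name: i for i, name in enumerate(per_traj_names[traj])}
        cols = [index[name] for name in wanted]  # KeyError if a feature is missing
        blocks.append(data[:, cols])
    # After selection the columns line up, so frames can be concatenated
    return np.vstack(blocks)


# Two hypothetical systems with different feature counts and orderings
data = {
    "wt": np.arange(6.0).reshape(2, 3),      # 2 frames x 3 features
    "mutant": np.arange(8.0).reshape(4, 2),  # 4 frames x 2 features
}
names = {
    "wt": ["ALA1-VAL2", "ALA1-GLY3", "VAL2-GLY3"],
    "mutant": ["VAL2-GLY3", "ALA1-VAL2"],
}
combined = select(data, names, ["ALA1-VAL2", "VAL2-GLY3"])
print(combined.shape)  # (6, 2)
```

Per-trajectory storage means no trajectory has to be padded or truncated to match another; alignment only happens when a selection names the shared features.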

Attributes

feature_type : FeatureTypeBase

Feature type object used to compute the data

use_memmap : bool

Whether memory mapping is used for large datasets

cache_path : str or None

Path for memory-mapped cache files

reduced_cache_path : str or None

Path for memory-mapped reduced cache files

chunk_size : int

Chunk size for processing large datasets

data : np.ndarray or None

Original feature data array

feature_metadata : dict or None

Metadata including feature definitions, hyperparameters, etc.

reduced_data : np.ndarray or None

Reduced feature data array (e.g., after PCA)

reduced_feature_metadata : dict or None

Metadata for reduced features

reduction_info : dict or None

Information about the reduction method and parameters

analysis : Any

Facade for analysis methods on this feature data

__init__(feature_type: FeatureTypeBase, use_memmap: bool = False, cache_path: str = './cache', chunk_size: int = 2000, trajectory_name: str | None = None) -> None

Initialize feature data container.

Parameters

feature_type : FeatureTypeBase

Feature type object used to compute the data

use_memmap : bool, default=False

Whether to use memory mapping

cache_path : str, default='./cache'

Cache directory path for memory-mapped files

chunk_size : int, default=2000

Chunk size for processing large datasets. Smaller chunks use less memory but may be slower. If None, automatic chunking is used.

trajectory_name : str, optional

Trajectory name used to create unique cache filenames

Returns

None

Initializes feature data container
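
The `use_memmap`, `cache_path` and `chunk_size` parameters describe a common pattern: process the trajectory one block of frames at a time and write each result into a disk-backed array. The sketch below illustrates that pattern with NumPy's `np.memmap`. The function name, the cache filename and the choice of pairwise atom distances as the feature are assumptions for illustration, not a description of how mdxplain computes features internally.

```python
import os
import tempfile

import numpy as np


def compute_distances_chunked(coords, pairs, cache_path, chunk_size=2000, use_memmap=True):
    """Hypothetical helper: pairwise atom distances, computed chunk by chunk.

    coords: (n_frames, n_atoms, 3) array of positions
    pairs:  (n_pairs, 2) integer array of atom indices
    """
    n_frames = coords.shape[0]
    shape = (n_frames, len(pairs))
    if use_memmap:
        os.makedirs(cache_path, exist_ok=True)
        # Disk-backed output; the filename is an illustrative assumption
        out = np.memmap(os.path.join(cache_path, "distances.dat"),
                        dtype=np.float32, mode="w+", shape=shape)
    else:
        out = np.empty(shape, dtype=np.float32)
    for start in range(0, n_frames, chunk_size):
        stop = min(start + chunk_size, n_frames)
        chunk = coords[start:stop]  # only this block of frames is processed at once
        diff = chunk[:, pairs[:, 0]] - chunk[:, pairs[:, 1]]
        out[start:stop] = np.linalg.norm(diff, axis=-1)
    if use_memmap:
        out.flush()  # push pending writes to the cache file
    return out


rng = np.random.default_rng(0)
coords = rng.random((5, 4, 3))
pairs = np.array([[0, 1], [2, 3]])
with tempfile.TemporaryDirectory() as tmp:
    dist = compute_distances_chunked(coords, pairs, tmp, chunk_size=2)
    result = np.array(dist)  # copy into RAM before the cache directory is removed
    del dist
print(result.shape)  # (5, 2)
```

Peak memory is governed by `chunk_size` rather than by trajectory length, which is the trade-off the `chunk_size` description refers to: smaller chunks mean more loop iterations but smaller intermediate arrays.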

get_data(force_original: bool = False) -> ndarray

Get current dataset (reduced if available, else original).

Parameters

force_original : bool, default=False

If True, return the original data even when reduced data is available

Returns

numpy.ndarray

Feature data array for this trajectory

Examples

>>> # Get current data
>>> data = feature_data.get_data()
>>> print(f"Data shape: {data.shape}")
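
The selection rule behind `get_data` can be sketched with a hypothetical minimal class. It assumes that "reduced data available" means `reduced_data` is not None, and it uses a plain SVD-based PCA only to produce a reduced array; neither detail is taken from mdxplain's source.

```python
import numpy as np


class MiniFeatureData:
    """Hypothetical stand-in that models only the data-selection rule."""

    def __init__(self, data):
        self.data = data
        self.reduced_data = None

    def get_data(self, force_original=False):
        # Prefer reduced data unless the caller explicitly asks for the original
        if self.reduced_data is not None and not force_original:
            return self.reduced_data
        return self.data


rng = np.random.default_rng(0)
original = rng.random((1000, 250))
fd = MiniFeatureData(original)
print(fd.get_data().shape)  # (1000, 250): nothing reduced yet

# PCA via SVD of the mean-centered data, keeping 100 components
centered = original - original.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
fd.reduced_data = centered @ vt[:100].T

print(fd.get_data().shape)                     # (1000, 100)
print(fd.get_data(force_original=True).shape)  # (1000, 250)
```

`get_feature_metadata` and `get_feature_names` follow the same pattern, choosing between the reduced and original metadata.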

get_feature_metadata(force_original: bool = False) -> Dict[str, Any]

Get current feature metadata (reduced if available, else original).

This method returns the metadata corresponding to the current active dataset. If reduced data is available, it returns the reduced metadata. Otherwise, it returns the original metadata.

Parameters

force_original : bool, default=False

If True, return the metadata of the original data even when reduced data is available

Returns

dict or None

Feature metadata dict for this trajectory, or None if no metadata is available

Examples

>>> # Get current metadata
>>> metadata = feature_data.get_feature_metadata()
>>> print(f"Features: {len(metadata['features'])}")

get_feature_names(force_original: bool = False) -> List[str]

Extract feature names from metadata.

This method generates human-readable feature names from the structured metadata. For pair-based features, it creates names by joining the full names of the involved partners.

Parameters

force_original : bool, default=False

If True, return the feature names of the original data even when reduced data is available

Returns

list or None

List of feature names extracted from metadata, or None if not available

Examples

>>> feature_data = traj.get_feature("distances")
>>> names = feature_data.get_feature_names()
>>> print(f"First few feature names: {names[:3]}")
>>> # Output: ['ALA1-VAL2', 'ALA1-GLY3', 'VAL2-GLY3']
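
The naming rule for pair-based features can be sketched as below. The metadata layout (a `features` list whose entries are lists of partner dicts with a `full_name` key) and the `-` separator are assumptions chosen to reproduce the example output above; the real metadata schema may differ.

```python
def feature_names_from_metadata(metadata, sep="-"):
    """Hypothetical helper: build names by joining partner full names."""
    if not metadata or "features" not in metadata:
        return None  # mirrors the documented "None if not available"
    names = []
    for feature in metadata["features"]:
        # Each pair-based feature lists its partners; join their full names
        names.append(sep.join(partner["full_name"] for partner in feature))
    return names


metadata = {
    "features": [
        [{"full_name": "ALA1"}, {"full_name": "VAL2"}],
        [{"full_name": "ALA1"}, {"full_name": "GLY3"}],
        [{"full_name": "VAL2"}, {"full_name": "GLY3"}],
    ]
}
print(feature_names_from_metadata(metadata))
# ['ALA1-VAL2', 'ALA1-GLY3', 'VAL2-GLY3']
```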

save(save_path: str) -> None

Save FeatureData object to disk.

Parameters

save_path : str

Path at which to save the FeatureData object

Returns

None

Saves the FeatureData object to the specified path

Examples

>>> feature_data.save('analysis_results/distances.pkl')

load(load_path: str) -> None

Load FeatureData object from disk.

Parameters

load_path : str

Path to the saved FeatureData file

Returns

None

Loads the FeatureData object from the specified path

Examples

>>> feature_data.load('analysis_results/distances.pkl')
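
Because `load` returns None, it presumably restores state into the existing object rather than constructing a new one. A minimal round-trip sketch of that contract follows. It assumes pickle as the storage format (suggested only by the `.pkl` extension in the examples) and uses a hypothetical `SavableFeatureData` class.

```python
import os
import pickle
import tempfile


class SavableFeatureData:
    """Hypothetical stand-in illustrating the save/load contract only."""

    def __init__(self, feature_type=None, data=None):
        self.feature_type = feature_type
        self.data = data

    def save(self, save_path):
        # Create parent directories, e.g. 'analysis_results/'
        os.makedirs(os.path.dirname(save_path) or ".", exist_ok=True)
        with open(save_path, "wb") as fh:
            pickle.dump(self.__dict__, fh)

    def load(self, load_path):
        # Returns None, like the documented method: state is restored in place
        with open(load_path, "rb") as fh:
            self.__dict__.update(pickle.load(fh))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "analysis_results", "distances.pkl")
    SavableFeatureData("distances", [[1.0, 2.0]]).save(path)
    restored = SavableFeatureData()
    restored.load(path)

print(restored.feature_type, restored.data)  # distances [[1.0, 2.0]]
```

With `use_memmap=True`, a real implementation would also have to decide whether to copy memory-mapped arrays into the saved file or only record their cache paths; this sketch does not address that.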

print_info() -> None

Print comprehensive feature information.

Parameters

None

Returns

None

Prints feature information to console

Examples

>>> feature_data.print_info()
=== FeatureData ===
Feature Type: Distances
Original Data: 1000 frames x 250 features
Reduced Data: 1000 frames x 100 features (PCA)
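
The summary shown above can be reproduced from the array shapes alone. The hypothetical `format_info` helper below is a sketch that assumes only the fields visible in the example output; the real `print_info` may report more.

```python
import numpy as np


def format_info(feature_type, data, reduced_data=None, reduction_method=None):
    """Hypothetical helper: build the summary text printed by print_info()."""
    lines = [
        "=== FeatureData ===",
        f"Feature Type: {feature_type}",
        f"Original Data: {data.shape[0]} frames x {data.shape[1]} features",
    ]
    # The reduced-data line only appears once a reduction has been applied
    if reduced_data is not None:
        method = f" ({reduction_method})" if reduction_method else ""
        lines.append(
            f"Reduced Data: {reduced_data.shape[0]} frames x "
            f"{reduced_data.shape[1]} features{method}"
        )
    return "\n".join(lines)


print(format_info("Distances", np.zeros((1000, 250)), np.zeros((1000, 100)), "PCA"))
```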