Feature Entities

Container for trajectory-specific computed feature data (for example, distances or contacts) together with the associated calculator.

class mdxplain.feature.entities.feature_data.FeatureData(feature_type: FeatureTypeBase, use_memmap: bool = False, cache_path: str = './cache', chunk_size: int = 2000, trajectory_name: str | None = None)

Internal container for trajectory-specific computed feature data.

Stores per-trajectory feature data, which allows mixed systems with different proteins in a single pipeline. Each trajectory computes its own features, and these are combined at the selection level.

Attributes

feature_type (FeatureTypeBase): Feature type object to compute data
use_memmap (bool): Whether memory mapping is used for large datasets
cache_path (str or None): Path for memory-mapped cache files
reduced_cache_path (str or None): Path for memory-mapped reduced cache files
chunk_size (int): Chunk size for processing large datasets
data (np.ndarray or None): Original feature data array
feature_metadata (dict or None): Metadata including feature definitions, hyperparameters, etc.
reduced_data (np.ndarray or None): Reduced feature data array (e.g., after PCA)
reduced_feature_metadata (dict or None): Metadata for reduced features
reduction_info (dict or None): Information about the reduction method and parameters
analysis (Any): Facade for analysis methods on this feature data

__init__(feature_type: FeatureTypeBase, use_memmap: bool = False, cache_path: str = './cache', chunk_size: int = 2000, trajectory_name: str | None = None) → None

Initialize feature data container.

Parameters

feature_type (FeatureTypeBase): Feature type object to compute data
use_memmap (bool, default=False): Whether to use memory mapping
cache_path (str, optional): Cache directory path for memmap
chunk_size (int, optional): Chunk size for processing large datasets. Smaller chunks use less memory but may be slower. If None, uses automatic chunking.
trajectory_name (str, optional): Trajectory name for unique cache filenames

Returns

None: Initializes feature data container
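
The interplay of use_memmap, cache_path and chunk_size can be illustrated with plain NumPy. The following is only a minimal sketch of the general technique (a disk-backed array filled chunk by chunk), not the mdxplain implementation; the file name, array sizes and the random stand-in computation are all assumptions made for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical sizes; chunk_size mirrors the documented default of 2000.
n_frames, n_features, chunk_size = 5000, 8, 2000

# Hypothetical cache location and file name (not the names mdxplain uses).
cache_dir = tempfile.mkdtemp()
cache_file = os.path.join(cache_dir, "traj0_distances.dat")

# Disk-backed array: the full array lives on disk, not in RAM.
data = np.memmap(cache_file, dtype="float32", mode="w+",
                 shape=(n_frames, n_features))

rng = np.random.default_rng(0)
n_chunks = 0
for start in range(0, n_frames, chunk_size):
    stop = min(start + chunk_size, n_frames)
    # Stand-in for the real per-frame feature computation.
    data[start:stop] = rng.random((stop - start, n_features), dtype=np.float32)
    n_chunks += 1

data.flush()  # write pending changes to the cache file
print(f"{n_chunks} chunks written, shape {data.shape}")
```

A smaller chunk_size lowers the peak memory needed for each slice at the cost of more loop iterations, which matches the trade-off described for the chunk_size parameter.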

get_data(force_original: bool = False) → ndarray

Get current dataset (reduced if available, else original).

Parameters

force_original (bool, default=False): Whether to force using the original data instead of the reduced data

Returns

numpy.ndarray: Feature data array for this trajectory

Examples

>>> # Get current data
>>> data = feature_data.get_data()
>>> print(f"Data shape: {data.shape}")
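
The "reduced if available, else original" rule can be sketched with a small stand-in class. FeatureStore below is hypothetical and only mirrors the documented behavior of get_data and force_original; it is not the mdxplain class, and the array shapes are made up.

```python
import numpy as np


class FeatureStore:
    """Hypothetical stand-in illustrating the documented fallback rule."""

    def __init__(self, data, reduced_data=None):
        self.data = data                  # original feature array
        self.reduced_data = reduced_data  # e.g. after a reduction step

    def get_data(self, force_original=False):
        # Prefer the reduced array unless the caller asks for the original.
        if self.reduced_data is not None and not force_original:
            return self.reduced_data
        return self.data


original = np.zeros((1000, 250))
reduced = np.zeros((1000, 100))

store = FeatureStore(original, reduced)
print(store.get_data().shape)                     # reduced array
print(store.get_data(force_original=True).shape)  # original array
```

When no reduction has been applied, both calls return the original array, so callers do not need to check which dataset is active.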

get_feature_metadata(force_original: bool = False) → Dict[str, Any]

Get current feature metadata (reduced if available, else original).

This method returns the metadata corresponding to the current active dataset. If reduced data is available, it returns the reduced metadata. Otherwise, it returns the original metadata.

Parameters

force_original (bool, default=False): Whether to force returning the original metadata instead of the reduced metadata

Returns

dict or None: Feature metadata dict for this trajectory, or None if no metadata is available

Examples

>>> # Get current metadata
>>> metadata = feature_data.get_feature_metadata()
>>> print(f"Features: {len(metadata['features'])}")

get_feature_names(force_original: bool = False) → List[str]

Extract feature names from metadata.

This method generates human-readable feature names from the structured metadata. For pair-based features, it creates names by joining the full names of the involved partners.

Parameters

force_original (bool, default=False): Whether to return the feature names for the original data

Returns

list or None: List of feature names extracted from metadata, or None if not available

Examples

>>> feature_data = traj.get_feature("distances")
>>> names = feature_data.get_feature_names()
>>> print(f"First few feature names: {names[:3]}")
>>> # Output: ['ALA1-VAL2', 'ALA1-GLY3', 'VAL2-GLY3']
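
How pair-based names such as 'ALA1-VAL2' can be derived from structured metadata is sketched below. The metadata layout is an assumption: only the 'features' key appears in the examples on this page, while the 'partners' and 'full_name' keys and the helper function are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, List, Optional


def names_from_metadata(metadata: Optional[Dict[str, Any]]) -> Optional[List[str]]:
    """Hypothetical helper: join partner names into one label per feature."""
    if metadata is None or "features" not in metadata:
        return None
    names = []
    for feature in metadata["features"]:
        # Assumed layout: each feature lists its partners with a full name.
        partner_names = [p["full_name"] for p in feature["partners"]]
        names.append("-".join(partner_names))
    return names


# Made-up metadata for three residue pairs.
metadata = {
    "features": [
        {"partners": [{"full_name": "ALA1"}, {"full_name": "VAL2"}]},
        {"partners": [{"full_name": "ALA1"}, {"full_name": "GLY3"}]},
        {"partners": [{"full_name": "VAL2"}, {"full_name": "GLY3"}]},
    ]
}

print(names_from_metadata(metadata))
```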

save(save_path: str) → None

Save FeatureData object to disk.

Parameters

save_path (str): Path where to save the FeatureData object

Returns

None: Saves the FeatureData object to the specified path

Examples

>>> feature_data.save('analysis_results/distances.pkl')

load(load_path: str) → None

Load FeatureData object from disk.

Parameters

load_path (str): Path to the saved FeatureData file

Returns

None: Loads the FeatureData object from the specified path

Examples

>>> feature_data.load('analysis_results/distances.pkl')
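
Because load returns None, it presumably restores the saved state into the object it is called on rather than returning a new one. The round trip below is a minimal sketch of that pattern using pickle, which is an assumption suggested only by the .pkl extension in the examples; the actual on-disk format is not stated here, and FeatureStore is a hypothetical stand-in.

```python
import os
import pickle
import tempfile


class FeatureStore:
    """Hypothetical stand-in with in-place save/load semantics."""

    def __init__(self, data=None, feature_metadata=None):
        self.data = data
        self.feature_metadata = feature_metadata

    def save(self, save_path):
        # Create the target directory if needed, then pickle the state.
        os.makedirs(os.path.dirname(save_path) or ".", exist_ok=True)
        with open(save_path, "wb") as fh:
            pickle.dump(self.__dict__, fh)

    def load(self, load_path):
        # Restore the saved state into this object; returns None.
        with open(load_path, "rb") as fh:
            self.__dict__.update(pickle.load(fh))


path = os.path.join(tempfile.mkdtemp(), "analysis_results", "distances.pkl")

saved = FeatureStore(data=[[0.1, 0.2], [0.3, 0.4]],
                     feature_metadata={"features": ["ALA1-VAL2", "ALA1-GLY3"]})
saved.save(path)

restored = FeatureStore()  # empty container
restored.load(path)        # filled in place
print(restored.feature_metadata["features"])
```

As with any pickle-based format, files should only be loaded from trusted sources.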

print_info() → None

Print comprehensive feature information.

Parameters

None

Returns

None: Prints feature information to console

Examples

>>> feature_data.print_info()
=== FeatureData ===
Feature Type: Distances
Original Data: 1000 frames x 250 features
Reduced Data: 1000 frames x 100 features (PCA)