Feature Entities
Feature data container for trajectory-specific computed features.
Container for trajectory-specific feature data (e.g., distances, contacts) together with its associated calculator. Storing feature data per trajectory enables mixed systems with different proteins in a single pipeline.
- class mdxplain.feature.entities.feature_data.FeatureData(feature_type: FeatureTypeBase, use_memmap: bool = False, cache_path: str = './cache', chunk_size: int = 2000, trajectory_name: str | None = None)
Internal container for trajectory-specific computed feature data.
Stores per-trajectory feature data, enabling mixed systems with different proteins in a single pipeline. Each trajectory computes its own features, which are combined at the selection level.
Attributes
- feature_type : FeatureTypeBase
Feature type object used to compute the data
- use_memmap : bool
Whether memory mapping is used for large datasets
- cache_path : str or None
Path for memory-mapped cache files
- reduced_cache_path : str or None
Path for memory-mapped reduced cache files
- chunk_size : int
Chunk size for processing large datasets
- data : np.ndarray or None
Original feature data array
- feature_metadata : dict or None
Metadata including feature definitions, hyperparameters, etc.
- reduced_data : np.ndarray or None
Reduced feature data array (e.g., after PCA)
- reduced_feature_metadata : dict or None
Metadata for reduced features
- reduction_info : dict or None
Information about the reduction method and parameters
- analysis : Any
Facade for analysis methods on this feature data
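The relationship between data, reduced_data, and reduction_info can be illustrated with a PCA reduction. The following is a minimal sketch only: reduce_with_pca is a hypothetical helper (not part of mdxplain), PCA is computed with a plain NumPy SVD, and the keys of the returned info dict are assumptions chosen for illustration.

```python
import numpy as np


def reduce_with_pca(data: np.ndarray, n_components: int):
    """Hypothetical helper: project feature data onto its leading principal components.

    Returns arrays shaped like the reduced_data and reduction_info attributes.
    """
    # PCA is defined on mean-centered data, so center each feature column
    mean = data.mean(axis=0)
    centered = data - mean
    # Rows of vt are the principal axes, ordered by decreasing singular value
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    reduced = centered @ components.T
    # Fraction of total variance captured by each component
    explained = (s ** 2) / np.sum(s ** 2)
    reduction_info = {  # assumed keys, for illustration only
        "method": "PCA",
        "n_components": n_components,
        "explained_variance_ratio": explained[:n_components],
    }
    return reduced, reduction_info


rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 250))  # 1000 frames x 250 features
reduced_data, reduction_info = reduce_with_pca(data, n_components=100)
print(reduced_data.shape)  # (1000, 100)
```

The original array is left untouched, which is what allows the container to fall back to it when the reduced data is not wanted.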
- __init__(feature_type: FeatureTypeBase, use_memmap: bool = False, cache_path: str = './cache', chunk_size: int = 2000, trajectory_name: str | None = None) -> None
Initialize feature data container.
Parameters
- feature_type : FeatureTypeBase
Feature type object used to compute the data
- use_memmap : bool, default=False
Whether to use memory mapping
- cache_path : str, default='./cache'
Cache directory path for memmap files
- chunk_size : int, default=2000
Chunk size for processing large datasets. Smaller chunks use less memory but may be slower. If None, automatic chunking is used.
- trajectory_name : str, optional
Trajectory name used to make cache filenames unique
Returns
- None
Initializes feature data container
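To show how use_memmap, cache_path, chunk_size, and trajectory_name interact, here is a hedged sketch of chunked feature computation into a memory-mapped cache file. compute_in_chunks is hypothetical, the toy feature (distance between atoms 0 and 1) stands in for a real feature type, and the cache filename pattern is an assumption; mdxplain's actual caching layout may differ.

```python
import os
import tempfile

import numpy as np


def compute_in_chunks(coords, chunk_size=2000, use_memmap=False,
                      cache_path="./cache", trajectory_name=None):
    """Hypothetical helper: per-frame distance between atoms 0 and 1.

    coords has shape (n_frames, n_atoms, 3); the output is filled
    block by block, chunk_size frames at a time.
    """
    n_frames = coords.shape[0]
    if use_memmap:
        os.makedirs(cache_path, exist_ok=True)
        # The trajectory name keeps cache files of different trajectories apart
        fname = os.path.join(cache_path, f"{trajectory_name or 'traj'}_distances.npy")
        out = np.lib.format.open_memmap(fname, mode="w+", dtype=np.float32,
                                        shape=(n_frames, 1))
    else:
        out = np.empty((n_frames, 1), dtype=np.float32)
    for start in range(0, n_frames, chunk_size):
        stop = min(start + chunk_size, n_frames)
        block = coords[start:stop]
        out[start:stop, 0] = np.linalg.norm(block[:, 0] - block[:, 1], axis=1)
    if use_memmap:
        out.flush()  # make sure every chunk has been written to the cache file
    return out


coords = np.random.default_rng(1).normal(size=(5000, 2, 3))
with tempfile.TemporaryDirectory() as tmp:
    result = compute_in_chunks(coords, chunk_size=2000, use_memmap=True,
                               cache_path=tmp, trajectory_name="protA_run1")
    expected = np.linalg.norm(coords[:, 0] - coords[:, 1], axis=1)
    shape = result.shape
    ok = np.allclose(result[:, 0], expected)
    cache_files = os.listdir(tmp)
    del result  # release the memory map before the directory is removed
print(shape, ok, cache_files)
```

With 5000 frames and chunk_size=2000, the loop runs three times (2000, 2000, and 1000 frames), trading a few extra iterations for a bounded working set.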
- get_data(force_original: bool = False) -> ndarray
Get current dataset (reduced if available, else original).
Parameters
- force_original : bool, default=False
Whether to force using the original data instead of the reduced data
Returns
- numpy.ndarray
Feature data array for this trajectory
Examples
>>> # Get current data
>>> data = feature_data.get_data()
>>> print(f"Data shape: {data.shape}")
- get_feature_metadata(force_original: bool = False) -> Dict[str, Any]
Get current feature metadata (reduced if available, else original).
This method returns the metadata corresponding to the current active dataset. If reduced data is available, it returns the reduced metadata. Otherwise, it returns the original metadata.
Parameters
- force_original : bool, default=False
Whether to return the original metadata instead of the reduced metadata
Returns
- dict or None
Feature metadata dict for this trajectory, or None if no metadata is available
Examples
>>> # Get current metadata
>>> metadata = feature_data.get_feature_metadata()
>>> print(f"Features: {len(metadata['features'])}")
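The "reduced if available, else original" rule shared by get_data and get_feature_metadata can be sketched with a small stand-in class. FeatureDataSketch is hypothetical and reproduces only the fallback logic described above; following the docstring, both methods key on whether reduced data exists, so the returned array and metadata always describe the same columns.

```python
from typing import Any, Dict, Optional

import numpy as np


class FeatureDataSketch:
    """Hypothetical stand-in showing only the reduced-vs-original fallback."""

    def __init__(self, data: np.ndarray, feature_metadata: Dict[str, Any]):
        self.data = data
        self.feature_metadata = feature_metadata
        self.reduced_data: Optional[np.ndarray] = None
        self.reduced_feature_metadata: Optional[Dict[str, Any]] = None

    def _use_reduced(self, force_original: bool) -> bool:
        # Reduced results win unless they are missing or explicitly bypassed
        return self.reduced_data is not None and not force_original

    def get_data(self, force_original: bool = False) -> np.ndarray:
        return self.reduced_data if self._use_reduced(force_original) else self.data

    def get_feature_metadata(self, force_original: bool = False):
        if self._use_reduced(force_original):
            return self.reduced_feature_metadata
        return self.feature_metadata


fd = FeatureDataSketch(np.zeros((1000, 250)), {"features": [None] * 250})
fd.reduced_data = np.zeros((1000, 100))
fd.reduced_feature_metadata = {"features": [None] * 100}
print(fd.get_data().shape)                     # (1000, 100)
print(fd.get_data(force_original=True).shape)  # (1000, 250)
```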
- get_feature_names(force_original: bool = False) -> List[str]
Extract feature names from metadata.
This method generates human-readable feature names from the structured metadata. For pair-based features, it creates names by joining the full names of the involved partners.
Parameters
- force_original : bool, default=False
Whether to return the feature names for the original data
Returns
- list or None
List of feature names extracted from metadata, or None if not available
Examples
>>> feature_data = traj.get_feature("distances")
>>> names = feature_data.get_feature_names()
>>> print(f"First few feature names: {names[:3]}")
First few feature names: ['ALA1-VAL2', 'ALA1-GLY3', 'VAL2-GLY3']
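Name generation for pair-based features amounts to joining the partners' full names. The sketch below assumes a metadata layout in which metadata["features"] is a list whose entries are lists of partner dicts with a "full_name" key; that layout and the helper feature_names_from_metadata are hypothetical, chosen so the output matches the example above.

```python
from typing import Any, Dict, List


def feature_names_from_metadata(metadata: Dict[str, Any]) -> List[str]:
    """Hypothetical: join the full names of each feature's partners with '-'."""
    return ["-".join(partner["full_name"] for partner in feature)
            for feature in metadata["features"]]


# Assumed metadata layout: all residue pairs (i < j) of a three-residue peptide
residues = ["ALA1", "VAL2", "GLY3"]
metadata = {
    "features": [
        [{"full_name": residues[i]}, {"full_name": residues[j]}]
        for i in range(len(residues))
        for j in range(i + 1, len(residues))
    ]
}
print(feature_names_from_metadata(metadata))
# ['ALA1-VAL2', 'ALA1-GLY3', 'VAL2-GLY3']
```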
- save(save_path: str) -> None
Save FeatureData object to disk.
Parameters
- save_path : str
Path where to save the FeatureData object
Returns
- None
Saves the FeatureData object to the specified path
Examples
>>> feature_data.save('analysis_results/distances.pkl')
- load(load_path: str) -> None
Load FeatureData object from disk.
Parameters
- load_path : str
Path to the saved FeatureData file
Returns
- None
Loads the FeatureData object from the specified path
Examples
>>> feature_data.load('analysis_results/distances.pkl')
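Because load returns None, it presumably restores state into the instance it is called on. The following sketch shows one way such a save/load pair could work; the pickle format is only an assumption suggested by the .pkl extension in the examples, and PicklableContainer is a hypothetical stand-in rather than the FeatureData implementation.

```python
import os
import pickle
import tempfile


class PicklableContainer:
    """Hypothetical stand-in; pickle is an assumed serialization format."""

    def __init__(self, feature_type: str = "distances"):
        self.feature_type = feature_type
        self.data = None

    def save(self, save_path: str) -> None:
        # Create parent directories so paths like 'analysis_results/x.pkl' work
        os.makedirs(os.path.dirname(save_path) or ".", exist_ok=True)
        with open(save_path, "wb") as fh:
            pickle.dump(self.__dict__, fh)

    def load(self, load_path: str) -> None:
        # Returns None: the existing instance is updated in place
        with open(load_path, "rb") as fh:
            self.__dict__.update(pickle.load(fh))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "analysis_results", "distances.pkl")
    original = PicklableContainer()
    original.data = [[1.0, 2.0], [3.0, 4.0]]
    original.save(path)
    restored = PicklableContainer(feature_type="placeholder")
    restored.load(path)
print(restored.feature_type, restored.data)
# distances [[1.0, 2.0], [3.0, 4.0]]
```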
- print_info() -> None
Print comprehensive feature information.
Parameters
None
Returns
- None
Prints feature information to console
Examples
>>> feature_data.print_info()
=== FeatureData ===
Feature Type: Distances
Original Data: 1000 frames x 250 features
Reduced Data: 1000 frames x 100 features (PCA)
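The summary printed by print_info can be reproduced from the array shapes alone. This is a hedged sketch: format_feature_info is a hypothetical helper that mirrors the layout of the example output, and it omits the "Reduced Data" line when no reduction has been applied (an assumed behavior).

```python
from typing import Optional

import numpy as np


def format_feature_info(feature_type: str, data: np.ndarray,
                        reduced_data: Optional[np.ndarray] = None,
                        method: Optional[str] = None) -> str:
    """Hypothetical: build a summary in the layout of the print_info example."""
    lines = [
        "=== FeatureData ===",
        f"Feature Type: {feature_type}",
        f"Original Data: {data.shape[0]} frames x {data.shape[1]} features",
    ]
    if reduced_data is not None:
        lines.append(f"Reduced Data: {reduced_data.shape[0]} frames x "
                     f"{reduced_data.shape[1]} features ({method})")
    return "\n".join(lines)


summary = format_feature_info("Distances", np.zeros((1000, 250)),
                              np.zeros((1000, 100)), method="PCA")
print(summary)
```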