Distances Data
Distance feature type implementation for molecular dynamics analysis.
Distance feature type implementation with pairwise distance calculations for analyzing molecular dynamics trajectories using MDTraj.
- class mdxplain.feature.feature_type.distances.distances.Distances(excluded_neighbors: int = 1, use_pbc: bool = True)
Distance feature type for calculating pairwise distances between atoms/residues.
Computes all pairwise distances from molecular dynamics trajectories using MDTraj’s distance calculation functions. The reference trajectory determines which atom pairs are computed and provides the basis for feature naming. This ensures consistent feature names when comparing different trajectories.
This is a base feature type with no dependencies - other features like contacts use distance data as input.
Uses MDTraj's distance functions under the hood.
Examples
>>> # Basic distance calculation via TrajectoryData
>>> traj_data = TrajectoryData()
>>> traj_data.load_trajectories('simulation/')
>>> traj_data.add_feature(Distances())

>>> # Distance calculation with memory mapping
>>> distances = Distances()
>>> traj_data.add_feature(distances, use_memmap=True, cache_path='./cache/')

>>> # Distance calculation with reference trajectory for consistent naming
>>> ref_traj = traj_data.trajectories[7]  # Use specific trajectory as reference
>>> traj_data.add_feature(Distances(ref=ref_traj))
- __init__(excluded_neighbors: int = 1, use_pbc: bool = True) -> None
Initialize distance feature type with optional reference trajectory.
Parameters
- excluded_neighbors : int, default=1
Number of sequence neighbors to exclude from the distance calculation. Chain breaks, detected as a jump in the seqid of a residue, are automatically excluded. If 0, all pairs are computed. If 1, nearest neighbors are excluded. If 2, nearest neighbors and their neighbors are excluded, and so on.
- use_pbc : bool, default=True
If True and the trajectory contains unitcell information, distances are computed under the minimum image convention (accounting for periodic boundary conditions).
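To illustrate what the minimum image convention means for use_pbc=True, here is a minimal sketch for an orthorhombic (rectangular) box. The function name minimum_image_distance and the box representation are hypothetical and chosen only for illustration; this is not the mdxplain or MDTraj implementation, and non-rectangular unit cells need a more general treatment.

```python
import math


def minimum_image_distance(a, b, box_lengths):
    """Distance between points a and b in a periodic rectangular box.

    Hypothetical helper for illustration only; assumes an orthorhombic
    box whose edge lengths are given in box_lengths (same units as a, b).
    """
    total = 0.0
    for xa, xb, length in zip(a, b, box_lengths):
        delta = xb - xa
        # Shift the displacement by whole box lengths so that it lies
        # within half a box length of zero (the nearest periodic image).
        delta -= length * round(delta / length)
        total += delta * delta
    return math.sqrt(total)


# Two atoms near opposite faces of a 10 x 10 x 10 box.
a = (0.5, 5.0, 5.0)
b = (9.5, 5.0, 5.0)
plain = math.dist(a, b)  # ignores periodicity
wrapped = minimum_image_distance(a, b, (10.0, 10.0, 10.0))  # nearest image
print(plain, wrapped)
```

With periodic boundaries the two atoms are close neighbors across the box face, which is why the wrapped distance is much shorter than the plain Euclidean one.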
Returns
None
Examples
>>> # Use first trajectory as reference (automatic)
>>> distances = Distances()

>>> # Use custom diagonal offset
>>> distances = Distances(excluded_neighbors=2)

>>> # Disable periodic boundary conditions
>>> distances = Distances(use_pbc=False)
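The effect of excluded_neighbors on the set of pairs can be sketched as follows. This assumes the parameter behaves as its name and the "diagonal offset" example suggest, namely that pairs whose sequence separation is at most excluded_neighbors are skipped. The helper residue_pairs is hypothetical, works on plain indices, and ignores chain breaks; it is not the library's pair-selection code.

```python
def residue_pairs(n_residues, excluded_neighbors=1):
    """List index pairs (i, j) with j - i > excluded_neighbors.

    Hypothetical helper: assumes excluded_neighbors acts as a diagonal
    offset on the residue-residue matrix and ignores chain breaks.
    """
    if excluded_neighbors < 0:
        raise ValueError("excluded_neighbors must be non-negative")
    return [
        (i, j)
        for i in range(n_residues)
        for j in range(i + excluded_neighbors + 1, n_residues)
    ]


all_pairs = residue_pairs(5, excluded_neighbors=0)      # every unique pair
default_pairs = residue_pairs(5, excluded_neighbors=1)  # skips (i, i + 1)
print(len(all_pairs), len(default_pairs))
```

Under this assumption, larger values shrink the feature set by dropping pairs that are close in sequence and therefore carry little structural information.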
- init_calculator(use_memmap: bool = False, cache_path: str = './cache', chunk_size: int = 2000) -> None
Initialize the distance calculator with specified configuration.
Parameters
- use_memmap : bool, default=False
Whether to use memory mapping for large datasets.
- cache_path : str, default='./cache'
Directory path for storing cache files when using memory mapping.
- chunk_size : int, default=2000
Number of frames to process per chunk (None for automatic sizing).
Returns
None
Examples
>>> # Basic initialization
>>> distances.init_calculator()

>>> # With memory mapping for large datasets
>>> distances.init_calculator(use_memmap=True, cache_path='./cache/')

>>> # With custom chunk size
>>> distances.init_calculator(chunk_size=500)
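The role of chunk_size can be pictured as splitting the frame axis into consecutive blocks so that only one block needs to be held in memory at a time. The sketch below uses a hypothetical helper, frame_chunks, that is not part of mdxplain; how the library actually schedules its chunks is an assumption here.

```python
def frame_chunks(n_frames, chunk_size=2000):
    """Yield (start, stop) frame index ranges of at most chunk_size frames."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_frames, chunk_size):
        # The last chunk may be shorter than chunk_size.
        yield start, min(start + chunk_size, n_frames)


chunks = list(frame_chunks(4500, chunk_size=2000))
print(chunks)
```

Smaller chunks lower peak memory use at the cost of more iterations, which is the trade-off to consider when combining chunk_size with use_memmap=True.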
- compute(input_data: Trajectory, feature_metadata: Dict[str, Any]) -> Tuple[ndarray, Dict[str, Any]]
Compute pairwise distances from molecular dynamics trajectories.
Parameters
- input_data : mdtraj.Trajectory
MD trajectory to compute distances from.
- feature_metadata : dict, optional
Not used for distances (base feature type).
Returns
- tuple[numpy.ndarray, dict]
Tuple containing (distance_matrix, feature_metadata) where distance_matrix is shape (n_frames, n_pairs) and feature_metadata is structured metadata with features in same order as data columns
Examples
>>> # Compute distances from trajectories
>>> distances = Distances()
>>> distances.init_calculator()
>>> data, metadata = distances.compute(input_data=trajectories, feature_metadata={})
>>> print(f"Distance matrix shape: {data.shape}")
>>> print(f"Metadata keys: {list(metadata)}")

>>> # Using memory mapping for large datasets
>>> distances.init_calculator(use_memmap=True, cache_path='./cache/')
>>> data, metadata = distances.compute(input_data=large_trajectories, feature_metadata={})
Raises
- ValueError
If trajectories have different numbers of residues, or if the calculator is not initialized.
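The (n_frames, n_pairs) layout of the returned array can be illustrated with a small pure-Python stand-in. The function distance_matrix below is hypothetical and ignores periodic boundaries; it only shows how rows map to frames and columns to pairs, and is not the code that compute runs.

```python
import math


def distance_matrix(frames, pairs):
    """Return one row per frame and one column per (i, j) pair.

    frames: list of frames, each a list of (x, y, z) coordinates.
    pairs:  list of (i, j) index pairs; column order follows this list.
    Hypothetical stand-in for illustration, without periodic boundaries.
    """
    return [
        [math.dist(frame[i], frame[j]) for i, j in pairs]
        for frame in frames
    ]


frames = [
    [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 2.0)],  # frame 0
    [(0.0, 0.0, 0.0), (6.0, 8.0, 0.0), (0.0, 0.0, 1.0)],  # frame 1
]
pairs = [(0, 1), (0, 2)]
data = distance_matrix(frames, pairs)
print(len(data), len(data[0]))  # n_frames, n_pairs
```

Because the column order follows the pair list, any metadata describing the features has to be kept in that same order to stay aligned with the data.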
- get_dependencies() -> List[str]
Get list of feature type dependencies for distance calculations.
Parameters
None
Returns
- List[str]
Empty list as distances are a base feature with no dependencies
Examples
>>> distances = Distances()
>>> print(distances.get_dependencies())
[]