Distances Data


Distance feature type implementation for molecular dynamics analysis.

Distance feature type implementation with pairwise distance calculations for analyzing molecular dynamics trajectories using MDTraj.

class mdxplain.feature.feature_type.distances.distances.Distances(excluded_neighbors: int = 1, use_pbc: bool = True)

Distance feature type for calculating pairwise distances between atoms/residues.

Computes pairwise distances from molecular dynamics trajectories using MDTraj’s distance calculation functions. The reference trajectory determines which atom pairs are computed and provides the basis for feature naming, which keeps feature names consistent when comparing different trajectories.

This is a base feature type with no dependencies; other features, such as contacts, use distance data as input.

Uses MDTraj’s distance functions under the hood.
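
As a rough illustration of the output layout, the sketch below computes every unordered pair distance per frame in pure Python. It is not the library's implementation (which delegates to MDTraj); it assumes each frame is a list of (x, y, z) coordinates and ignores periodic boundaries and neighbor exclusion. `pairwise_distance_matrix` is a hypothetical name.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def pairwise_distance_matrix(
    frames: Sequence[Sequence[Coord]],
) -> Tuple[List[Tuple[int, int]], List[List[float]]]:
    """Return (pairs, matrix), where matrix[f][p] is the distance of pair p in frame f."""
    if not frames:
        return [], []
    n_atoms = len(frames[0])
    # Fix the pair order once so every frame produces its columns in the same
    # order; this keeps column k referring to the same pair everywhere.
    pairs = list(combinations(range(n_atoms), 2))
    matrix = []
    for frame in frames:
        # math.dist returns the Euclidean distance between two points.
        matrix.append([math.dist(frame[i], frame[j]) for i, j in pairs])
    return pairs, matrix


# One frame with three atoms forming a 3-4-5 right triangle.
pairs, matrix = pairwise_distance_matrix([[(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]])
```

Here `matrix` has shape (1, 3): one frame and n(n-1)/2 = 3 pairs for n = 3 atoms, with distances 3.0, 5.0 and 4.0 for pairs (0, 1), (0, 2) and (1, 2).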

Examples

>>> # Basic distance calculation via TrajectoryData
>>> traj_data = TrajectoryData()
>>> traj_data.load_trajectories('simulation/')
>>> traj_data.add_feature(Distances())
>>> # Distance calculation with memory mapping
>>> distances = Distances()
>>> traj_data.add_feature(distances, use_memmap=True, cache_path='./cache/')
>>> # Distance calculation with reference trajectory for consistent naming
>>> ref_traj = traj_data.trajectories[7]  # Use specific trajectory as reference
>>> traj_data.add_feature(Distances(ref=ref_traj))

__init__(excluded_neighbors: int = 1, use_pbc: bool = True) → None

Initialize the distance feature type with neighbor-exclusion and periodic-boundary settings.

Parameters

excluded_neighbors : int, default=1

Number of sequence neighbors to exclude from the distance calculation, i.e. a diagonal offset on the residue pair matrix. Chain breaks, detected as a jump in a residue's seqid, are excluded automatically. If 0, all pairs are computed. If 1, pairs of directly adjacent residues are excluded. If 2, pairs up to two residues apart are excluded, and so on.

use_pbc : bool, default=True

If True and the trajectory contains unitcell information, distances are computed under the minimum image convention (accounting for periodic boundary conditions).
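
One way to read `excluded_neighbors` is as a diagonal offset on the residue pair matrix: within a chain, pairs whose indices differ by at most `excluded_neighbors` are skipped. The sketch below assumes that a seqid jump other than +1 marks a chain break and that residues on opposite sides of a break are never counted as neighbors; both `segment_ids` and `select_residue_pairs` are hypothetical helpers, not part of mdxplain.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def segment_ids(seqids: Sequence[int]) -> List[int]:
    """Assign a segment index to each residue.

    Assumption: any seqid step other than +1 is treated as a chain break
    and starts a new segment.
    """
    segments = [0]
    for prev, cur in zip(seqids, seqids[1:]):
        segments.append(segments[-1] + (1 if cur - prev != 1 else 0))
    return segments


def select_residue_pairs(seqids: Sequence[int], excluded_neighbors: int) -> List[Tuple[int, int]]:
    """Return residue index pairs (i, j), i < j, that survive neighbor exclusion."""
    segments = segment_ids(seqids)
    pairs = []
    for i, j in combinations(range(len(seqids)), 2):
        same_segment = segments[i] == segments[j]
        # Within one segment, skip pairs at most `excluded_neighbors` apart;
        # pairs spanning a (assumed) chain break are always kept.
        if same_segment and j - i <= excluded_neighbors:
            continue
        pairs.append((i, j))
    return pairs


contiguous = select_residue_pairs([1, 2, 3, 4], excluded_neighbors=1)
# [(0, 2), (0, 3), (1, 3)]
with_break = select_residue_pairs([1, 2, 10, 11], excluded_neighbors=1)
# [(0, 2), (0, 3), (1, 2), (1, 3)]
```

With `excluded_neighbors=0` every pair is kept, which matches the "all pairs are computed" case above.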

Returns

None

Examples

>>> # Use first trajectory as reference (automatic)
>>> distances = Distances()
>>> # Exclude pairs up to two residues apart in sequence (diagonal offset of 2)
>>> distances = Distances(excluded_neighbors=2)
>>> # Disable periodic boundary conditions
>>> distances = Distances(use_pbc=False)
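
With `use_pbc=True`, distances follow the minimum image convention. For an orthorhombic (rectangular) box this amounts to shifting each displacement component by whole box lengths into [-L/2, L/2] before taking the norm. The sketch below covers only that orthorhombic case (MDTraj also supports triclinic cells); `minimum_image_distance` is a hypothetical helper.

```python
import math
from typing import Sequence


def minimum_image_distance(a: Sequence[float], b: Sequence[float], box: Sequence[float]) -> float:
    """Distance between points a and b in an orthorhombic periodic box with edge lengths `box`."""
    delta = []
    for ai, bi, length in zip(a, b, box):
        d = bi - ai
        # Subtract the nearest whole number of box lengths so the component
        # lies in [-length/2, length/2].
        d -= length * round(d / length)
        delta.append(d)
    return math.sqrt(sum(d * d for d in delta))


d = minimum_image_distance((0.5, 0.0, 0.0), (9.5, 0.0, 0.0), box=(10.0, 10.0, 10.0))
# 1.0 (the plain Euclidean distance, as with use_pbc=False, would be 9.0)
```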

init_calculator(use_memmap: bool = False, cache_path: str = './cache', chunk_size: int = 2000) → None

Initialize the distance calculator with specified configuration.

Parameters

use_memmap : bool, default=False

Whether to use memory mapping for large datasets.

cache_path : str, default='./cache'

Directory path for storing cache files when using memory mapping.

chunk_size : int, default=2000

Number of frames to process per chunk (None for automatic sizing).
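
Chunked processing bounds peak memory by featurizing `chunk_size` frames at a time and writing each block into a preallocated output; memory mapping lets that output live on disk when it exceeds RAM. The pure-Python sketch below shows only the chunk loop, under the assumption that the calculator works this way; `iter_chunks` and `compute_in_chunks` are hypothetical, and a plain list stands in for the memory-mapped array.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def iter_chunks(n_frames: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) frame ranges that cover 0..n_frames in steps of chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_frames, chunk_size):
        yield start, min(start + chunk_size, n_frames)


def compute_in_chunks(
    frames: Sequence,
    chunk_size: int,
    featurize: Callable[[Sequence], List[List[float]]],
) -> List[List[float]]:
    """Apply `featurize` to successive frame chunks and collect one row per frame."""
    # Preallocated output with one slot per frame; with memory mapping an
    # on-disk array (e.g. numpy.memmap) would play this role instead.
    out: list = [None] * len(frames)
    for start, stop in iter_chunks(len(frames), chunk_size):
        # Only frames[start:stop] are featurized at any one time.
        out[start:stop] = featurize(frames[start:stop])
    return out
```

For example, `list(iter_chunks(5, 2))` gives `[(0, 2), (2, 4), (4, 5)]`: the last chunk is simply shorter.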

Returns

None

Examples

>>> # Basic initialization
>>> distances.init_calculator()
>>> # With memory mapping for large datasets
>>> distances.init_calculator(use_memmap=True, cache_path='./cache/')
>>> # With custom chunk size
>>> distances.init_calculator(chunk_size=500)

compute(input_data: Trajectory, feature_metadata: Dict[str, Any]) → Tuple[ndarray, Dict[str, Any]]

Compute pairwise distances from molecular dynamics trajectories.

Parameters

input_data : mdtraj.Trajectory

MD trajectory to compute distances from.

feature_metadata : dict, optional

Not used for distances (base feature type).

Returns

tuple[numpy.ndarray, dict]

Tuple of (distance_matrix, feature_metadata), where distance_matrix has shape (n_frames, n_pairs) and feature_metadata is structured metadata whose features are listed in the same order as the data columns.
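
The key contract is that metadata entry k describes data column k. The sketch below builds such column-aligned metadata from a fixed pair list; the `residue_labels` input, the "A-B" naming scheme, and the `{"features": [...]}` layout are hypothetical and do not reflect the library's actual metadata structure.

```python
from typing import Dict, List, Sequence, Tuple


def build_feature_metadata(
    pairs: Sequence[Tuple[int, int]],
    residue_labels: Sequence[str],
) -> Dict[str, List[dict]]:
    """Build one metadata entry per data column, in column order (hypothetical layout)."""
    features = []
    for column, (i, j) in enumerate(pairs):
        # Entry number `column` describes column `column` of the
        # (n_frames, n_pairs) distance matrix.
        features.append(
            {"column": column, "pair": (i, j), "name": f"{residue_labels[i]}-{residue_labels[j]}"}
        )
    return {"features": features}


# Labels would come from the reference trajectory's topology, so every
# trajectory sharing that topology gets identical feature names.
metadata = build_feature_metadata([(0, 1), (0, 2)], ["ALA1", "GLY2", "SER3"])
```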

Examples

>>> # Compute distances from trajectories
>>> distances = Distances()
>>> distances.init_calculator()
>>> data, metadata = distances.compute(input_data=trajectories)
>>> print(f"Distance matrix shape: {data.shape}")
>>> print(f"Metadata keys: {list(metadata)}")
>>> # Using memory mapping for large datasets
>>> distances.init_calculator(use_memmap=True, cache_path='./cache/')
>>> data, metadata = distances.compute(input_data=large_trajectories)

Raises

ValueError

If trajectories have different numbers of residues, or if the calculator is not initialized.
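
The two error conditions can be pictured as guard checks that run before any distances are computed. A minimal sketch with a hypothetical `validate_inputs` helper; the library's actual checks and messages may differ.

```python
from typing import Sequence


def validate_inputs(residue_counts: Sequence[int], calculator_initialized: bool) -> None:
    """Raise ValueError for the two documented failure modes (hypothetical helper)."""
    if not calculator_initialized:
        raise ValueError("calculator is not initialized; call init_calculator() first")
    distinct = sorted(set(residue_counts))
    if len(distinct) > 1:
        # Differing residue counts would make pair columns incomparable.
        raise ValueError(f"trajectories have different numbers of residues: {distinct}")
```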

get_dependencies() → List[str]

Get list of feature type dependencies for distance calculations.

Parameters

None

Returns

List[str]

Empty list as distances are a base feature with no dependencies

Examples

>>> distances = Distances()
>>> print(distances.get_dependencies())
[]
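
Because `get_dependencies()` returns an empty list, distances can always be computed first, and feature types that consume distance data (such as contacts) come afterwards. A minimal sketch of that ordering using the standard library's `graphlib.TopologicalSorter` (Python 3.9+); apart from `'distances'` having no dependencies, the dependency map and the `'contacts'` type name are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def computation_order(dependencies: Dict[str, List[str]]) -> List[str]:
    """Order feature types so each one comes after the types it depends on."""
    # TopologicalSorter takes a mapping of node -> predecessors.
    return list(TopologicalSorter(dependencies).static_order())


# 'distances' is a base type with no dependencies; 'contacts' is assumed
# to consume distance data.
order = computation_order({"distances": [], "contacts": ["distances"]})
```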

classmethod get_type_name() → str

Get the type name of this feature type (class method).

Parameters

None

Returns

str

String identifier ‘distances’

Examples

>>> print(Distances.get_type_name())
'distances'

get_input()

Get the input feature type that distances depend on.

Parameters

None

Returns

None

None since distances are a base feature with no input dependencies

Examples

>>> distances = Distances()
>>> print(distances.get_input())
None