Reference Structure Helper

Helpers for building reference structures in a memory-efficient way.

class mdxplain.analysis.structure.helper.reference_structure_helper.ReferenceStructureHelper

Collection of reference structure utilities for structure analysis.

The helper methods compute mean, median, and MAD-based reference structures in a streaming fashion, so that large trajectories can be processed without exhausting memory.

static get_mean_coordinates(trajectories: Iterable, atom_chunk_size: int, use_memmap: bool, cross_trajectory: bool = True, frame_chunk_size: int = 2000) → ndarray | dict[int, ndarray]

Calculate mean coordinates across trajectories using atom-batching.

Parameters

trajectories (Iterable): Collection of trajectory-like objects supporting slicing.
atom_chunk_size (int): Number of atoms processed per batch.
use_memmap (bool): Whether to use memory-mapped processing with frame chunks.
cross_trajectory (bool, optional): If True, compute one reference across all trajectories combined. If False, compute a separate reference per trajectory. Defaults to True.
frame_chunk_size (int, optional): Number of frames to process per chunk when use_memmap=True. Defaults to 2000.

Returns

numpy.ndarray or dict[int, numpy.ndarray]: If cross_trajectory=True, the mean coordinates with shape (n_atoms, 3). If cross_trajectory=False, a dict mapping each trajectory index to that trajectory's mean coordinates.

Raises

ValueError: If no frames are available across all trajectories.

Examples

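>>> import mdtraj as md
>>> import numpy as np
>>> from mdxplain.analysis.structure.helper.reference_structure_helper import (
...     ReferenceStructureHelper,
... )
>>> # Build a minimal one-atom topology
>>> topology = md.Topology()
>>> chain = topology.add_chain()
>>> residue = topology.add_residue("ALA", chain)
>>> _ = topology.add_atom("CA", md.element.carbon, residue)  # discard the returned Atom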
>>> coords = np.random.rand(100, 1, 3).astype(np.float32)
>>> traj = md.Trajectory(coords, topology)
>>> mean_ref = ReferenceStructureHelper.get_mean_coordinates(
...     [traj], atom_chunk_size=50, use_memmap=False
... )
>>> mean_ref.shape
(1, 3)

Notes

When processing large trajectories with limited memory, set use_memmap=True and lower frame_chunk_size to reduce the number of frames held in memory at once. The atom_chunk_size parameter trades memory for speed: smaller batches lower peak memory but require more passes over the data, while larger batches do the opposite.
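
The chunking described in these notes can be sketched in plain NumPy. This is an illustration only, not the mdxplain implementation: streaming_mean is a hypothetical function, and plain arrays of shape (n_frames, n_atoms, 3) stand in for trajectory coordinates. It assumes that the combined reference pools all frames, so each trajectory contributes in proportion to its frame count.

```python
import numpy as np


def streaming_mean(arrays, atom_chunk_size, frame_chunk_size=2000):
    """Frame-weighted mean over all arrays, returned with shape (n_atoms, 3).

    Hypothetical helper: `arrays` is a list of (n_frames, n_atoms, 3) arrays
    standing in for trajectory coordinates.
    """
    total_frames = sum(a.shape[0] for a in arrays)
    if total_frames == 0:
        raise ValueError("No frames available across all trajectories.")

    n_atoms = arrays[0].shape[1]
    mean = np.empty((n_atoms, 3), dtype=np.float64)

    # Outer loop: one batch of atoms at a time.
    for a0 in range(0, n_atoms, atom_chunk_size):
        a1 = min(a0 + atom_chunk_size, n_atoms)
        acc = np.zeros((a1 - a0, 3), dtype=np.float64)  # running coordinate sum

        # Inner loops: only one (frame chunk x atom batch) block is read at a time.
        for arr in arrays:
            for f0 in range(0, arr.shape[0], frame_chunk_size):
                block = arr[f0:f0 + frame_chunk_size, a0:a1]
                acc += block.sum(axis=0, dtype=np.float64)

        mean[a0:a1] = acc / total_frames
    return mean


rng = np.random.default_rng(0)
trajs = [
    rng.random((250, 7, 3), dtype=np.float32),
    rng.random((100, 7, 3), dtype=np.float32),
]
mean_ref = streaming_mean(trajs, atom_chunk_size=3, frame_chunk_size=64)
```

Because only a running sum is kept per atom batch, peak memory in this sketch scales with frame_chunk_size × atom_chunk_size rather than with the total number of frames.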

static get_median_coordinates(trajectories: Iterable, atom_chunk_size: int, use_memmap: bool, cross_trajectory: bool = True, frame_chunk_size: int = 2000) → ndarray | dict[int, ndarray]

Calculate median coordinates across trajectories using atom-batching.

Parameters

trajectories (Iterable): Collection of trajectory-like objects supporting slicing.
atom_chunk_size (int): Number of atoms processed per batch.
use_memmap (bool): Whether to use memory-mapped processing with frame chunks.
cross_trajectory (bool, optional): If True, compute one reference across all trajectories combined. If False, compute a separate reference per trajectory. Defaults to True.
frame_chunk_size (int, optional): Number of frames to process per chunk when use_memmap=True. Defaults to 2000.

Returns

numpy.ndarray or dict[int, numpy.ndarray]: If cross_trajectory=True, the median coordinates with shape (n_atoms, 3). If cross_trajectory=False, a dict mapping each trajectory index to that trajectory's median coordinates.

Raises

ValueError: If no frames are available across all trajectories.

Examples

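>>> import mdtraj as md
>>> import numpy as np
>>> from mdxplain.analysis.structure.helper.reference_structure_helper import (
...     ReferenceStructureHelper,
... )
>>> # Build a minimal one-atom topology
>>> topology = md.Topology()
>>> chain = topology.add_chain()
>>> residue = topology.add_residue("ALA", chain)
>>> _ = topology.add_atom("CA", md.element.carbon, residue)  # discard the returned Atom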
>>> coords = np.random.rand(100, 1, 3).astype(np.float32)
>>> traj = md.Trajectory(coords, topology)
>>> median_ref = ReferenceStructureHelper.get_median_coordinates(
...     [traj], atom_chunk_size=50, use_memmap=False
... )
>>> median_ref.shape
(1, 3)

Notes

Unlike the mean, the median cannot be accumulated as a running sum: all frames for each atom batch must be collected before the median is taken, so this method may use more memory than the mean calculation. For very large trajectories, reduce atom_chunk_size, or enable use_memmap with a smaller frame_chunk_size.
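
The memory behaviour described in these notes can be illustrated with a minimal NumPy sketch. It is not the mdxplain implementation: batched_median is a hypothetical function, plain arrays of shape (n_frames, n_atoms, 3) stand in for trajectory coordinates, and the sketch assumes the combined median is taken over the pooled frames of all trajectories.

```python
import numpy as np


def batched_median(arrays, atom_chunk_size):
    """Median over the pooled frames of all arrays, shape (n_atoms, 3).

    Hypothetical helper: `arrays` is a list of (n_frames, n_atoms, 3) arrays
    standing in for trajectory coordinates.
    """
    if sum(a.shape[0] for a in arrays) == 0:
        raise ValueError("No frames available across all trajectories.")

    n_atoms = arrays[0].shape[1]
    median = np.empty((n_atoms, 3), dtype=np.float64)

    for a0 in range(0, n_atoms, atom_chunk_size):
        a1 = min(a0 + atom_chunk_size, n_atoms)
        # Every frame of this atom batch has to be in memory at once,
        # which is what makes the median costlier than a running mean.
        batch = np.concatenate([arr[:, a0:a1] for arr in arrays], axis=0)
        median[a0:a1] = np.median(batch, axis=0)
    return median


rng = np.random.default_rng(1)
trajs = [
    rng.random((120, 5, 3), dtype=np.float32),
    rng.random((81, 5, 3), dtype=np.float32),
]
median_ref = batched_median(trajs, atom_chunk_size=2)
```

In this sketch the largest temporary array holds total_frames × atom_chunk_size × 3 values. With float32 coordinates, 1,000,000 frames and atom_chunk_size=50 would need about 600 MB for a single batch, which is why lowering atom_chunk_size is the first lever to try.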