Reference Structure Helper
Helpers for building reference structures in a memory-efficient way.
- class mdxplain.analysis.structure.helper.reference_structure_helper.ReferenceStructureHelper
Collection of reference structure utilities for structure analysis.
The helper methods compute mean, median, and MAD-based reference structures in a streaming fashion, so large trajectories can be processed without exhausting memory.
- static get_mean_coordinates(trajectories: Iterable, atom_chunk_size: int, use_memmap: bool, cross_trajectory: bool = True, frame_chunk_size: int = 2000) -> ndarray | dict[int, ndarray]
Calculate mean coordinates across trajectories using atom batching.
Parameters
- trajectories : Iterable
Collection of trajectory-like objects supporting slicing.
- atom_chunk_size : int
Number of atoms processed per batch.
- use_memmap : bool
Whether to use memory-mapped processing with frame chunks.
- cross_trajectory : bool, optional
If True, compute one reference across all trajectories combined. If False, compute a separate reference per trajectory. Defaults to True.
- frame_chunk_size : int, optional
Number of frames to process per chunk when use_memmap=True. Defaults to 2000.
Returns
- numpy.ndarray or dict[int, numpy.ndarray]
If cross_trajectory=True: Mean coordinates with shape (n_atoms, 3).
If cross_trajectory=False: Dict mapping trajectory index to mean coordinates.
Raises
- ValueError
If no frames are available across all trajectories.
Examples
>>> import mdtraj as md
>>> import numpy as np
>>> from mdxplain.analysis.structure.helper.reference_structure_helper import ReferenceStructureHelper
>>> topology = md.Topology()
>>> chain = topology.add_chain()
>>> residue = topology.add_residue("ALA", chain)
>>> atom = topology.add_atom("CA", md.element.carbon, residue)
>>> coords = np.random.rand(100, 1, 3).astype(np.float32)
>>> traj = md.Trajectory(coords, topology)
>>> mean_ref = ReferenceStructureHelper.get_mean_coordinates(
...     [traj], atom_chunk_size=50, use_memmap=False
... )
>>> mean_ref.shape
(1, 3)
Notes
When processing large trajectories with limited memory, set use_memmap=True and lower frame_chunk_size to reduce peak memory usage. The atom_chunk_size parameter trades memory for speed: smaller batches lower peak memory but leave more batches to process.
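The batching described in these notes can be sketched with plain NumPy. This is an illustrative sketch only, not the mdxplain implementation: chunked_mean is a hypothetical function, it takes raw coordinate arrays of shape (n_frames, n_atoms, 3) rather than trajectory objects, and it does not model memory mapping.

```python
import numpy as np


def chunked_mean(arrays, atom_chunk_size, frame_chunk_size=2000, cross_trajectory=True):
    """Hypothetical helper: mean coordinates from (n_frames, n_atoms, 3) arrays."""
    arrays = list(arrays)
    if sum(a.shape[0] for a in arrays) == 0:
        raise ValueError("No frames available across all trajectories.")

    def mean_of(group):
        # Assumes the group holds at least one frame in total.
        n_atoms = group[0].shape[1]
        n_frames = sum(a.shape[0] for a in group)
        result = np.empty((n_atoms, 3), dtype=np.float64)
        for start in range(0, n_atoms, atom_chunk_size):
            stop = min(start + atom_chunk_size, n_atoms)
            # Running sum for this atom batch; only one frame chunk of one
            # atom batch is touched at a time.
            total = np.zeros((stop - start, 3), dtype=np.float64)
            for a in group:
                for f0 in range(0, a.shape[0], frame_chunk_size):
                    block = a[f0:f0 + frame_chunk_size, start:stop]
                    total += block.sum(axis=0, dtype=np.float64)
            result[start:stop] = total / n_frames
        return result

    if cross_trajectory:
        return mean_of(arrays)
    # One reference per input, keyed by its position in the iterable.
    return {i: mean_of([a]) for i, a in enumerate(arrays)}
```

Because a mean only needs a running sum and a frame count, both chunk sizes can be lowered freely without changing the result; they only affect how much data is held at once.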
- static get_median_coordinates(trajectories: Iterable, atom_chunk_size: int, use_memmap: bool, cross_trajectory: bool = True, frame_chunk_size: int = 2000) -> ndarray | dict[int, ndarray]
Calculate median coordinates across trajectories using atom batching.
Parameters
- trajectories : Iterable
Collection of trajectory-like objects supporting slicing.
- atom_chunk_size : int
Number of atoms processed per batch.
- use_memmap : bool
Whether to use memory-mapped processing with frame chunks.
- cross_trajectory : bool, optional
If True, compute one reference across all trajectories combined. If False, compute a separate reference per trajectory. Defaults to True.
- frame_chunk_size : int, optional
Number of frames to process per chunk when use_memmap=True. Defaults to 2000.
Returns
- numpy.ndarray or dict[int, numpy.ndarray]
If cross_trajectory=True: Median coordinates with shape (n_atoms, 3).
If cross_trajectory=False: Dict mapping trajectory index to median coordinates.
Raises
- ValueError
If no frames are available across all trajectories.
Examples
>>> import mdtraj as md
>>> import numpy as np
>>> from mdxplain.analysis.structure.helper.reference_structure_helper import ReferenceStructureHelper
>>> topology = md.Topology()
>>> chain = topology.add_chain()
>>> residue = topology.add_residue("ALA", chain)
>>> atom = topology.add_atom("CA", md.element.carbon, residue)
>>> coords = np.random.rand(100, 1, 3).astype(np.float32)
>>> traj = md.Trajectory(coords, topology)
>>> median_ref = ReferenceStructureHelper.get_median_coordinates(
...     [traj], atom_chunk_size=50, use_memmap=False
... )
>>> median_ref.shape
(1, 3)
Notes
Median calculation requires collecting all frames for each atom batch, so it may use more memory than mean calculation. For very large trajectories, reduce atom_chunk_size or enable use_memmap with a smaller frame_chunk_size.
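The memory cost noted here can be seen in a minimal NumPy sketch. It is not the library's code: batched_median is a hypothetical function that works on raw (n_frames, n_atoms, 3) arrays, and it assumes all arrays share the same atom count.

```python
import numpy as np


def batched_median(arrays, atom_chunk_size):
    """Hypothetical helper: per-atom median over all frames of all arrays."""
    arrays = list(arrays)
    if sum(a.shape[0] for a in arrays) == 0:
        raise ValueError("No frames available across all trajectories.")
    n_atoms = arrays[0].shape[1]
    result = np.empty((n_atoms, 3), dtype=np.float64)
    for start in range(0, n_atoms, atom_chunk_size):
        stop = min(start + atom_chunk_size, n_atoms)
        # Unlike a mean, a median cannot be built from running sums, so every
        # frame of this atom batch is gathered before reducing. Peak memory
        # scales with total_frames * (stop - start) * 3 values.
        batch = np.concatenate([a[:, start:stop] for a in arrays], axis=0)
        result[start:stop] = np.median(batch, axis=0)
    return result
```

In this sketch the only lever on peak memory is atom_chunk_size, since the frame axis has to be complete for each batch.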