Parallel Operations Helper
Parallel operations for MDTraj methods using Dask arrays.
Implements memory-efficient, parallelized versions of common MDTraj operations.
- class mdxplain.trajectory.helper.dask_trajectory_helper.parallel_operations_helper.ParallelOperationsHelper(zarr_path: str, topology: Topology, n_workers: int | None = None, chunk_size: int = 1000, cache_dir: str = './cache')
Parallel implementations of MDTraj operations using Dask arrays.
All operations respect memory constraints and process data in chunks.
- __init__(zarr_path: str, topology: Topology, n_workers: int | None = None, chunk_size: int = 1000, cache_dir: str = './cache')
Initialize parallel operations.
Parameters
- zarr_path : str
Path to Zarr store containing trajectory data
- topology : md.Topology
MDTraj topology
- n_workers : int, optional
Number of parallel workers (defaults to CPU count)
- chunk_size : int, default=1000
Number of frames per chunk for memory management
- cache_dir : str, default='./cache'
Directory for temporary files during operations
Returns
- None
Initializes parallel operations
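How chunk_size and n_workers translate into units of work can be sketched as follows. This is an illustration only: it assumes chunks are contiguous frame ranges and that the worker default comes from os.cpu_count(). The names frame_chunks and resolve_workers are hypothetical and are not part of mdxplain.

```python
import os


def frame_chunks(n_frames, chunk_size=1000):
    """Yield (start, stop) frame ranges holding at most chunk_size frames."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_frames, chunk_size):
        yield start, min(start + chunk_size, n_frames)


def resolve_workers(n_workers=None):
    """Fall back to the CPU count when no worker count is given (assumed behaviour)."""
    return n_workers if n_workers is not None else (os.cpu_count() or 1)


chunks = list(frame_chunks(2500, chunk_size=1000))
print(chunks)  # [(0, 1000), (1000, 2000), (2000, 2500)]
```

The last chunk is simply shorter than the others, so the total frame count does not need to be a multiple of chunk_size.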
- cleanup() → None
Release in-memory and zarr-store references held by this helper.
Returns
- None
Clears Dask array references and closes store handles when possible.
- center_coordinates(result_path: str, mass_weighted: bool = False) → Group
Center coordinates at the origin by applying the MDTraj center_coordinates method to one chunk of frames at a time.
Parameters
- result_path : str
Path to store result Zarr
- mass_weighted : bool, default=False
Use mass-weighted centering
Returns
- zarr.Group
New Zarr store with centered coordinates
Examples
>>> parallel_ops = ParallelOperationsHelper('trajectory.zarr', topology, chunk_size=500)
>>> centered_store = parallel_ops.center_coordinates('centered.zarr', mass_weighted=True)
>>> print(f"Centered trajectory stored at {centered_store.path}")
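Centering works frame by frame, which is why it can be applied chunk by chunk without changing the result. The NumPy sketch below illustrates that property on an in-memory array. It is not the helper's implementation: center_chunk and center_chunkwise are hypothetical names, and the coordinates are random test data.

```python
import numpy as np


def center_chunk(xyz, masses=None):
    """Shift every frame so its (optionally mass-weighted) center is at the origin.

    xyz has shape (n_frames, n_atoms, 3).
    """
    if masses is None:
        centers = xyz.mean(axis=1, keepdims=True)
    else:
        weights = masses / masses.sum()
        centers = np.einsum("fad,a->fd", xyz, weights)[:, None, :]
    return xyz - centers


def center_chunkwise(xyz, chunk_size, masses=None):
    """Center a trajectory one block of frames at a time."""
    out = np.empty_like(xyz)
    for start in range(0, xyz.shape[0], chunk_size):
        stop = start + chunk_size
        out[start:stop] = center_chunk(xyz[start:stop], masses)
    return out


rng = np.random.default_rng(0)
xyz = rng.normal(size=(10, 5, 3))
centered = center_chunkwise(xyz, chunk_size=4)
# Chunked and unchunked centering agree because each frame is independent.
print(np.allclose(centered, center_chunk(xyz)))
```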
- superpose(result_path: str, reference_traj: Trajectory, atom_indices: ndarray | None = None, ref_atom_indices: ndarray | None = None) → Group
Superpose the trajectory onto a reference trajectory by applying the MDTraj superpose method to one chunk of frames at a time.
Parameters
- result_path : str
Path to store result Zarr
- reference_traj : md.Trajectory
Reference trajectory (single frame) to align to
- atom_indices : np.ndarray, optional
Atoms to use for alignment on this trajectory
- ref_atom_indices : np.ndarray, optional
Atoms to use for alignment on the reference trajectory
Returns
- zarr.Group
New Zarr store with superposed coordinates
Raises
- ValueError
If reference trajectory has wrong number of atoms or frames
- OSError
If cache directory is not writable
Examples
>>> import mdtraj as md
>>> parallel_ops = ParallelOperationsHelper('trajectory.zarr', topology)
>>> # Create reference trajectory (single frame)
>>> ref_traj = md.load_frame('reference.pdb', 0)
>>> aligned_store = parallel_ops.superpose('aligned.zarr', ref_traj)
>>> # Superpose using only backbone atoms
>>> backbone_indices = topology.select('backbone')
>>> aligned_store = parallel_ops.superpose('aligned.zarr', ref_traj, backbone_indices)
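Because every frame is fitted to the same single-frame reference, frames can be aligned independently and therefore in chunks. The sketch below shows this with a plain Kabsch fit in NumPy. This is a stand-in chosen for brevity, not the routine MDTraj uses internally, and kabsch_superpose and superpose_chunkwise are hypothetical names.

```python
import numpy as np


def kabsch_superpose(frame, ref, atom_indices=None, ref_atom_indices=None):
    """Return frame rotated and translated onto ref (both shaped (n_atoms, 3))."""
    idx = slice(None) if atom_indices is None else atom_indices
    ref_idx = idx if ref_atom_indices is None else ref_atom_indices
    mobile, target = frame[idx], ref[ref_idx]
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    # Covariance between the centered alignment atoms.
    h = (mobile - mobile_center).T @ (target - target_center)
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return (frame - mobile_center) @ rot.T + target_center


def superpose_chunkwise(xyz, ref, chunk_size, atom_indices=None):
    """Align every frame to ref, visiting the frames one chunk at a time."""
    out = np.empty_like(xyz)
    n_frames = xyz.shape[0]
    for start in range(0, n_frames, chunk_size):
        for i in range(start, min(start + chunk_size, n_frames)):
            out[i] = kabsch_superpose(xyz[i], ref, atom_indices)
    return out


# Test data: the reference rotated about z and translated, one pose per frame.
rng = np.random.default_rng(1)
ref = rng.normal(size=(6, 3))
frames = []
for k, a in enumerate(np.linspace(0.1, 1.0, 5)):
    rz = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a), np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
    frames.append(ref @ rz.T + k)
xyz = np.stack(frames)
aligned = superpose_chunkwise(xyz, ref, chunk_size=2)
print(np.allclose(aligned, ref[None]))
```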
- smooth(result_path: str, width: int, order: int | None = None, atom_indices: ndarray | None = None) → Group
Apply a smoothing filter using the MDTraj smooth method with atom-wise chunking.
Processes atoms in chunks to reduce memory usage. Each atom is still smoothed across all frames, as MDTraj’s algorithm requires.
Parameters
- result_path : str
Path to store result Zarr
- width : int
Smoothing window width
- order : int, optional
Order of the low-pass (Butterworth) filter that the MDTraj smooth method applies
- atom_indices : np.ndarray, optional
Atoms to smooth (default: all)
Returns
- zarr.Group
New Zarr store with smoothed coordinates
Raises
- ValueError
If width is invalid for smoothing algorithm
- OSError
If cache directory is not writable
Examples
>>> parallel_ops = ParallelOperationsHelper('trajectory.zarr', topology)
>>> # Apply smoothing with width 5 to all atoms
>>> smoothed_store = parallel_ops.smooth('smooth.zarr', width=5)
>>> # Apply smoothing only to protein atoms
>>> protein_indices = topology.select('protein')
>>> smoothed_store = parallel_ops.smooth('smooth.zarr', 5, atom_indices=protein_indices)
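A temporal filter needs every frame of an atom at once, so the natural chunking axis here is atoms rather than frames. The sketch below demonstrates that idea with a simple moving average standing in for the real filter. The moving average is an assumption made to keep the example dependency-free, and moving_average and smooth_atomwise are hypothetical names.

```python
import numpy as np


def moving_average(xyz, width):
    """Average each coordinate over a window of `width` frames.

    Stand-in for the real filter; edges are padded by repeating the end frames.
    """
    if width < 2:
        raise ValueError("width must be at least 2")
    left = width // 2
    right = width - 1 - left
    pad = [(left, right)] + [(0, 0)] * (xyz.ndim - 1)
    padded = np.pad(xyz, pad, mode="edge")
    csum = np.cumsum(padded, axis=0)
    zeros = np.zeros((1,) + xyz.shape[1:], dtype=csum.dtype)
    csum = np.concatenate([zeros, csum], axis=0)
    # Difference of cumulative sums gives the windowed sum for each frame.
    return (csum[width:] - csum[:-width]) / width


def smooth_atomwise(xyz, width, atom_chunk=2):
    """Filter along the frame axis, loading only a few atoms at a time."""
    out = np.empty_like(xyz)
    for start in range(0, xyz.shape[1], atom_chunk):
        stop = start + atom_chunk
        out[:, start:stop] = moving_average(xyz[:, start:stop], width)
    return out


rng = np.random.default_rng(2)
xyz = rng.normal(size=(20, 5, 3))
smoothed = smooth_atomwise(xyz, width=5, atom_chunk=2)
# Atom-wise chunking gives the same result as filtering all atoms together.
print(np.allclose(smoothed, moving_average(xyz, 5)))
```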
- atom_slice(result_path: str, atom_indices: ndarray) → Group
Create an atom slice by applying the MDTraj atom_slice method to one chunk of frames at a time.
Parameters
- result_path : str
Path to store result Zarr
- atom_indices : np.ndarray
Indices of atoms to keep
Returns
- zarr.Group
New Zarr store with selected atoms
Raises
- IndexError
If atom_indices contains invalid atom indices
- OSError
If cache directory is not writable
Examples
>>> import numpy as np
>>> parallel_ops = ParallelOperationsHelper('trajectory.zarr', topology)
>>> # Select only first 100 atoms
>>> atom_indices = np.arange(100)
>>> sliced_store = parallel_ops.atom_slice('slice.zarr', atom_indices)
>>> # Select only CA atoms
>>> ca_indices = topology.select('name CA')
>>> ca_store = parallel_ops.atom_slice('slice.zarr', ca_indices)
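On the coordinate array, an atom slice amounts to index selection along the atom axis, repeated for each block of frames. The sketch below shows that selection together with the kind of bounds check that would lead to an IndexError. It assumes in-memory NumPy data, and atom_slice_chunkwise is a hypothetical name rather than the helper's code.

```python
import numpy as np


def atom_slice_chunkwise(xyz, atom_indices, chunk_size):
    """Keep only the requested atoms, copying one block of frames at a time."""
    atom_indices = np.asarray(atom_indices)
    n_atoms = xyz.shape[1]
    if atom_indices.size and (atom_indices.min() < 0 or atom_indices.max() >= n_atoms):
        raise IndexError(f"atom indices must lie in [0, {n_atoms})")
    out = np.empty((xyz.shape[0], atom_indices.size, 3), dtype=xyz.dtype)
    for start in range(0, xyz.shape[0], chunk_size):
        stop = start + chunk_size
        out[start:stop] = xyz[start:stop][:, atom_indices]
    return out


xyz = np.arange(4 * 6 * 3, dtype=float).reshape(4, 6, 3)
sliced = atom_slice_chunkwise(xyz, [0, 2, 5], chunk_size=3)
print(sliced.shape)  # (4, 3, 3)
```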
- image_molecules(result_path: str, anchor_molecules: ndarray | None = None, other_molecules: ndarray | None = None, sorted_bonds: ndarray | None = None, make_whole: bool = True) → Group
Apply periodic boundary condition imaging to molecules.
Recenters molecules and applies PBC using MDTraj’s image_molecules method. Processes trajectory in chunks for memory efficiency.
Parameters
- result_path : str
Path to store result Zarr
- anchor_molecules : np.ndarray, optional
Indices of molecules to anchor at the origin
- other_molecules : np.ndarray, optional
Indices of other molecules to image relative to anchors
- sorted_bonds : np.ndarray, optional
Pre-sorted bond array for performance optimization
- make_whole : bool, default=True
Make molecules whole across PBC before imaging
Returns
- zarr.Group
New Zarr store with imaged coordinates
Raises
- ValueError
If anchor_molecules or other_molecules contain invalid indices
- OSError
If cache directory is not writable
Examples
>>> import numpy as np
>>> helper = ParallelOperationsHelper('trajectory.zarr', topology)
>>> # Apply default imaging (all molecules)
>>> imaged_store = helper.image_molecules('imaged.zarr')
>>> # Image with specific anchor molecules
>>> protein_molecules = np.array([0, 1, 2])
>>> imaged_store = helper.image_molecules('imaged.zarr', anchor_molecules=protein_molecules)
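The core idea of imaging relative to an anchor can be sketched for a single frame in an orthorhombic box: each molecule is shifted by whole box lengths until it sits at the periodic image nearest the anchor. This is a deliberately simplified illustration, not MDTraj's algorithm. It does not make molecules whole or handle triclinic cells, and image_frame is a hypothetical name.

```python
import numpy as np


def image_frame(xyz, box, molecules, anchor=0):
    """Move whole molecules to the periodic image nearest the anchor molecule.

    xyz: (n_atoms, 3) coordinates, box: (3,) orthorhombic box lengths,
    molecules: list of integer index arrays, one per molecule.
    """
    out = xyz.copy()
    anchor_center = xyz[molecules[anchor]].mean(axis=0)
    for k, atoms in enumerate(molecules):
        if k == anchor:
            continue
        delta = xyz[atoms].mean(axis=0) - anchor_center
        # Whole box lengths separating this molecule from the anchor.
        shift = np.round(delta / box) * box
        out[atoms] -= shift
    return out


box = np.array([10.0, 10.0, 10.0])
xyz = np.array([[1.0, 1.0, 1.0], [2.0, 1.0, 1.0],    # anchor molecule
                [9.5, 1.0, 1.0], [9.5, 2.0, 1.0]])   # molecule near the far box face
molecules = [np.array([0, 1]), np.array([2, 3])]
imaged = image_frame(xyz, box, molecules)
print(imaged[2:])  # second molecule now sits at x = -0.5, next to the anchor
```

Each frame is processed on its own, so a chunk of frames can be handled by looping this function over the frames in the chunk.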
- remove_solvent(result_path: str, exclude: list | None = None) → Group
Remove solvent atoms from trajectory.
Creates new trajectory without solvent atoms using MDTraj’s remove_solvent method. Processes trajectory in chunks for memory efficiency.
Parameters
- result_path : str
Path to store result Zarr
- exclude : list, optional
List of solvent residue names to KEEP (not remove). If None, removes all recognized solvent molecules.
Returns
- zarr.Group
New Zarr store with non-solvent atoms only
Raises
- ValueError
If exclude contains invalid residue names
- OSError
If cache directory is not writable
Examples
>>> helper = ParallelOperationsHelper('trajectory.zarr', topology)
>>> # Remove all solvent
>>> no_solvent_store = helper.remove_solvent('no_solvent.zarr')
>>> # Keep water but remove other solvent
>>> keep_water_store = helper.remove_solvent('keep_water.zarr', exclude=['HOH', 'WAT'])
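The semantics of exclude are easy to misread, since the listed names are the ones that survive. The sketch below spells this out on a toy list of per-atom residue names. The solvent set is a short, hypothetical list used for illustration and is not MDTraj's actual table, and non_solvent_indices is a hypothetical name.

```python
# Hypothetical, abbreviated solvent list for illustration only.
SOLVENT_RESIDUES = {"HOH", "WAT", "NA", "CL"}


def non_solvent_indices(residue_names, exclude=None):
    """Return the indices of atoms to keep.

    residue_names: residue name of each atom, in atom order.
    exclude: solvent residue names that should be kept rather than removed.
    """
    removable = SOLVENT_RESIDUES - set(exclude or [])
    return [i for i, name in enumerate(residue_names) if name not in removable]


names = ["ALA", "ALA", "GLY", "HOH", "HOH", "NA", "CL"]
print(non_solvent_indices(names))                   # [0, 1, 2]
print(non_solvent_indices(names, exclude=["HOH"]))  # [0, 1, 2, 3, 4]
```

The resulting index list is the same for every frame, so it can be computed once and then used to slice each chunk of coordinates.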