Parallel Operations Helper

Parallel operations for MDTraj methods using Dask arrays.

Implements memory-efficient, parallelized versions of common MDTraj operations.

class mdxplain.trajectory.helper.dask_trajectory_helper.parallel_operations_helper.ParallelOperationsHelper(zarr_path: str, topology: Topology, n_workers: int | None = None, chunk_size: int = 1000, cache_dir: str = './cache')

Parallel implementations of MDTraj operations using Dask arrays.

All operations respect memory constraints and process data in chunks.

__init__(zarr_path: str, topology: Topology, n_workers: int | None = None, chunk_size: int = 1000, cache_dir: str = './cache')

Initialize parallel operations.

Parameters

zarr_path (str): Path to Zarr store containing trajectory data
topology (md.Topology): MDTraj topology
n_workers (int, optional): Number of parallel workers (defaults to CPU count)
chunk_size (int, default=1000): Number of frames per chunk for memory management
cache_dir (str, default='./cache'): Directory for temporary files during operations

Returns

None: Initializes parallel operations
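
As a rough illustration of how n_workers and chunk_size might be interpreted, the sketch below resolves a missing worker count to the CPU count and splits a frame range into chunks. The helper names resolve_workers and frame_chunks are hypothetical and are not part of mdxplain.

```python
import os


def resolve_workers(n_workers=None):
    """Fall back to the CPU count when no worker count is given."""
    if n_workers is None:
        return os.cpu_count() or 1
    return n_workers


def frame_chunks(n_frames, chunk_size=1000):
    """Yield (start, stop) frame ranges that together cover n_frames."""
    for start in range(0, n_frames, chunk_size):
        yield start, min(start + chunk_size, n_frames)


chunks = list(frame_chunks(2500, chunk_size=1000))
print(chunks)  # [(0, 1000), (1000, 2000), (2000, 2500)]
```

The last chunk is shorter than chunk_size whenever the frame count is not an exact multiple of it.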

cleanup() → None

Release in-memory and zarr-store references held by this helper.

Returns

None: Clears Dask array references and closes store handles when possible.

center_coordinates(result_path: str, mass_weighted: bool = False) → Group

Center coordinates at the origin by applying the MDTraj method to one chunk of frames at a time.

Parameters

result_path (str): Path to store result Zarr
mass_weighted (bool, default=False): Use mass-weighted centering

Returns

zarr.Group: New Zarr store with centered coordinates

Examples

>>> parallel_ops = ParallelOperationsHelper('trajectory.zarr', topology, chunk_size=500)
>>> centered_store = parallel_ops.center_coordinates('centered.zarr', mass_weighted=True)
>>> print(f"Centered trajectory stored at {centered_store.path}")
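
Centering can be done chunk by chunk because each frame is centered independently of the others. The NumPy sketch below illustrates that idea on an in-memory array; it is a stand-in that assumes per-frame centering and does not reproduce the helper's Zarr or Dask handling. The function names are hypothetical.

```python
import numpy as np


def center_chunk(xyz, masses=None):
    """Center each frame of a (n_frames, n_atoms, 3) chunk at the origin."""
    if masses is None:
        # Geometric center of every frame.
        centers = xyz.mean(axis=1, keepdims=True)
    else:
        # Center of mass of every frame.
        weights = masses / masses.sum()
        centers = np.einsum("fax,a->fx", xyz, weights)[:, None, :]
    return xyz - centers


def center_chunkwise(xyz, chunk_size, masses=None):
    """Apply center_chunk to consecutive blocks of frames."""
    out = np.empty_like(xyz)
    for start in range(0, xyz.shape[0], chunk_size):
        stop = min(start + chunk_size, xyz.shape[0])
        out[start:stop] = center_chunk(xyz[start:stop], masses)
    return out


rng = np.random.default_rng(0)
xyz = rng.normal(size=(10, 4, 3)) + 5.0
masses = np.array([12.0, 1.0, 16.0, 14.0])
centered = center_chunkwise(xyz, chunk_size=3)
weighted = center_chunkwise(xyz, chunk_size=3, masses=masses)
```

The chunked result is identical to centering all frames at once, which is why the chunk size only affects memory use and not the output.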

superpose(result_path: str, reference_traj: Trajectory, atom_indices: ndarray | None = None, ref_atom_indices: ndarray | None = None) → Group

Superpose the trajectory onto a reference trajectory by applying the MDTraj method to one chunk of frames at a time.

Parameters

result_path (str): Path to store result Zarr
reference_traj (md.Trajectory): Reference trajectory (single frame) to align to
atom_indices (np.ndarray, optional): Atoms to use for alignment on this trajectory
ref_atom_indices (np.ndarray, optional): Atoms to use for alignment on the reference trajectory

Returns

zarr.Group: New Zarr store with superposed coordinates

Raises

ValueError: If reference trajectory has wrong number of atoms or frames
OSError: If cache directory is not writable

Examples

>>> parallel_ops = ParallelOperationsHelper('trajectory.zarr', topology)
>>> # Create reference trajectory (single frame)
>>> ref_traj = md.load_frame('reference.pdb', 0)
>>> aligned_store = parallel_ops.superpose('aligned.zarr', ref_traj)
>>> # Superpose using only backbone atoms
>>> backbone_indices = topology.select('backbone')
>>> aligned_store = parallel_ops.superpose('aligned.zarr', ref_traj, backbone_indices)
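
Because every frame is aligned to the same single reference frame, superposition also splits cleanly into frame chunks. The sketch below uses the Kabsch algorithm (SVD-based) as a stand-in for the alignment step; it is not necessarily the algorithm MDTraj uses internally, and the function names are hypothetical.

```python
import numpy as np


def kabsch_align(frame, reference):
    """Rigidly align one (n_atoms, 3) frame onto a reference frame."""
    ref_center = reference.mean(axis=0)
    p = frame - frame.mean(axis=0)
    q = reference - ref_center
    # The SVD of the covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip one axis if needed so the result is a proper rotation.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p @ rotation.T + ref_center


def superpose_chunkwise(xyz, reference, chunk_size):
    """Align every frame of xyz to reference, one chunk at a time."""
    if reference.shape != xyz.shape[1:]:
        raise ValueError("reference must be a single frame with matching atoms")
    out = np.empty_like(xyz)
    for start in range(0, xyz.shape[0], chunk_size):
        stop = min(start + chunk_size, xyz.shape[0])
        for i in range(start, stop):
            out[i] = kabsch_align(xyz[i], reference)
    return out


# Build test frames by rotating and translating the reference.
rng = np.random.default_rng(1)
reference = rng.normal(size=(6, 3))
frames = []
for angle in np.linspace(0.1, 1.0, 5):
    c, s = np.cos(angle), np.sin(angle)
    rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    frames.append(reference @ rot_z.T + np.array([1.0, -2.0, 0.5]))
xyz = np.stack(frames)
aligned = superpose_chunkwise(xyz, reference, chunk_size=2)
```

The shape check mirrors the documented ValueError for a reference with the wrong number of atoms or frames.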

smooth(result_path: str, width: int, order: int | None = None, atom_indices: ndarray | None = None) → Group

Apply a smoothing filter using the MDTraj method with atom-wise chunking.

Atoms are processed in chunks to reduce memory usage. Each atom is still smoothed across all frames, as MDTraj's algorithm requires.

Parameters

result_path (str): Path to store result Zarr
width (int): Smoothing window width
order (int, optional): Order of the low-pass (Butterworth) filter applied by MDTraj's smooth
atom_indices (np.ndarray, optional): Atoms to smooth (default: all)

Returns

zarr.Group: New Zarr store with smoothed coordinates

Raises

ValueError: If width is invalid for smoothing algorithm
OSError: If cache directory is not writable

Examples

>>> parallel_ops = ParallelOperationsHelper('trajectory.zarr', topology)
>>> # Apply smoothing with width 5 to all atoms
>>> smoothed_store = parallel_ops.smooth('smooth.zarr', width=5)
>>> # Apply smoothing only to protein atoms
>>> protein_indices = topology.select('protein')
>>> smoothed_store = parallel_ops.smooth('smooth.zarr', 5, atom_indices=protein_indices)
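
Smoothing filters along the time axis, so splitting by frames would break the filter at chunk borders, whereas splitting by atoms does not. The sketch below shows this with a simple moving average as a stand-in filter; it is not the filter MDTraj applies, and the function names are hypothetical.

```python
import numpy as np


def moving_average(xyz, width):
    """Smooth a (n_frames, n_atoms, 3) array along the frame axis."""
    kernel = np.ones(width) / width
    return np.apply_along_axis(
        lambda series: np.convolve(series, kernel, mode="same"), 0, xyz
    )


def smooth_atomwise(xyz, width, atom_chunk):
    """Smooth all frames, loading only atom_chunk atoms at a time."""
    out = np.empty_like(xyz)
    for start in range(0, xyz.shape[1], atom_chunk):
        stop = min(start + atom_chunk, xyz.shape[1])
        out[:, start:stop] = moving_average(xyz[:, start:stop], width)
    return out


rng = np.random.default_rng(2)
xyz = rng.normal(size=(50, 7, 3))
chunked = smooth_atomwise(xyz, width=5, atom_chunk=3)
full = moving_average(xyz, width=5)
```

Here chunked and full agree exactly, since no atom's time series is ever split.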

atom_slice(result_path: str, atom_indices: ndarray) → Group

Create an atom slice by applying the MDTraj method to one chunk of frames at a time.

Parameters

result_path (str): Path to store result Zarr
atom_indices (np.ndarray): Indices of atoms to keep

Returns

zarr.Group: New Zarr store with selected atoms

Raises

IndexError: If atom_indices contains invalid atom indices
OSError: If cache directory is not writable

Examples

>>> parallel_ops = ParallelOperationsHelper('trajectory.zarr', topology)
>>> # Select only first 100 atoms
>>> atom_indices = np.arange(100)
>>> sliced_store = parallel_ops.atom_slice('slice.zarr', atom_indices)
>>> # Select only CA atoms
>>> ca_indices = topology.select('name CA')
>>> ca_store = parallel_ops.atom_slice('slice.zarr', ca_indices)
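
The following sketch illustrates how index validation and chunkwise slicing could fit together on an in-memory array. It assumes that negative or out-of-range indices are rejected up front; the actual checks in the helper may differ, and the function names are hypothetical.

```python
import numpy as np


def validate_atom_indices(atom_indices, n_atoms):
    """Return atom_indices as an int array or raise IndexError."""
    atom_indices = np.asarray(atom_indices, dtype=int)
    if atom_indices.size and (
        atom_indices.min() < 0 or atom_indices.max() >= n_atoms
    ):
        raise IndexError(f"atom indices must lie in [0, {n_atoms - 1}]")
    return atom_indices


def atom_slice_chunkwise(xyz, atom_indices, chunk_size):
    """Keep only the selected atoms, copying one frame chunk at a time."""
    atom_indices = validate_atom_indices(atom_indices, xyz.shape[1])
    out = np.empty((xyz.shape[0], atom_indices.size, 3), dtype=xyz.dtype)
    for start in range(0, xyz.shape[0], chunk_size):
        stop = min(start + chunk_size, xyz.shape[0])
        out[start:stop] = xyz[start:stop][:, atom_indices]
    return out


xyz = np.arange(5 * 4 * 3, dtype=float).reshape(5, 4, 3)
sliced = atom_slice_chunkwise(xyz, [0, 2], chunk_size=2)
print(sliced.shape)  # (5, 2, 3)
```

Validating before the loop means a bad index fails immediately rather than after part of the output has been written.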

image_molecules(result_path: str, anchor_molecules: ndarray | None = None, other_molecules: ndarray | None = None, sorted_bonds: ndarray | None = None, make_whole: bool = True) → Group

Apply periodic boundary condition imaging to molecules.

Recenters molecules and applies PBC using MDTraj’s image_molecules method. Processes trajectory in chunks for memory efficiency.

Parameters

result_path (str): Path to store result Zarr
anchor_molecules (np.ndarray, optional): Indices of molecules to anchor at the origin
other_molecules (np.ndarray, optional): Indices of other molecules to image relative to anchors
sorted_bonds (np.ndarray, optional): Pre-sorted bond array for performance optimization
make_whole (bool, default=True): Make molecules whole across PBC before imaging

Returns

zarr.Group: New Zarr store with imaged coordinates

Raises

ValueError: If anchor_molecules or other_molecules contain invalid indices
OSError: If cache directory is not writable

Examples

>>> helper = ParallelOperationsHelper('trajectory.zarr', topology)
>>> # Apply default imaging (all molecules)
>>> imaged_store = helper.image_molecules('imaged.zarr')
>>> # Image with specific anchor molecules
>>> protein_molecules = np.array([0, 1, 2])
>>> imaged_store = helper.image_molecules('imaged.zarr', anchor_molecules=protein_molecules)
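
To give a feel for what imaging means, the sketch below wraps whole molecules back into an orthorhombic box by shifting each molecule by an integer number of box lengths. This is a deliberately simplified illustration, not MDTraj's image_molecules algorithm: it ignores anchors, bond information, and non-orthorhombic cells, and the function name is hypothetical.

```python
import numpy as np


def wrap_molecules(frame, molecules, box):
    """Shift whole molecules so their centers lie inside [0, box)."""
    wrapped = frame.copy()
    for atom_ids in molecules:
        center = frame[atom_ids].mean(axis=0)
        # Whole number of box lengths separating the center from the box.
        shift = np.floor(center / box) * box
        wrapped[atom_ids] -= shift
    return wrapped


box = np.array([10.0, 10.0, 10.0])
frame = np.array([
    [11.0, 1.0, 1.0],   # molecule 0, outside the box on +x
    [12.0, 1.0, 1.0],
    [-1.0, 5.0, 5.0],   # molecule 1, outside the box on -x
    [-2.0, 5.0, 5.0],
])
molecules = [[0, 1], [2, 3]]
wrapped = wrap_molecules(frame, molecules, box)
print(wrapped[:, 0])  # [1. 2. 9. 8.]
```

Each molecule moves as a unit, so intramolecular distances are unchanged, and since frames are treated independently the same operation can run per chunk of frames.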

remove_solvent(result_path: str, exclude: list | None = None) → Group

Remove solvent atoms from trajectory.

Creates new trajectory without solvent atoms using MDTraj’s remove_solvent method. Processes trajectory in chunks for memory efficiency.

Parameters

result_path (str): Path to store result Zarr
exclude (list, optional): List of solvent residue names to KEEP (not remove). If None, removes all recognized solvent molecules.

Returns

zarr.Group: New Zarr store with non-solvent atoms only

Raises

ValueError: If exclude contains invalid residue names
OSError: If cache directory is not writable

Examples

>>> helper = ParallelOperationsHelper('trajectory.zarr', topology)
>>> # Remove all solvent
>>> no_solvent_store = helper.remove_solvent('no_solvent.zarr')
>>> # Keep water but remove other solvent
>>> keep_water_store = helper.remove_solvent('keep_water.zarr', exclude=['HOH', 'WAT'])
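
The meaning of exclude is easy to misread, since names listed there are kept rather than removed. The pure-Python sketch below illustrates that behavior; SOLVENT_NAMES is a hypothetical, shortened list for illustration and is not the set of residue names MDTraj recognizes as solvent.

```python
# Hypothetical subset of solvent residue names, for illustration only.
SOLVENT_NAMES = {"HOH", "WAT", "NA", "CL"}


def atoms_to_keep(residue_names, exclude=None):
    """Return indices of atoms that survive solvent removal."""
    # Names in exclude are taken out of the removal set, so they are kept.
    removable = SOLVENT_NAMES - set(exclude or [])
    return [i for i, name in enumerate(residue_names) if name not in removable]


# One residue name per atom.
residue_names = ["ALA", "ALA", "HOH", "NA", "GLY", "HOH"]
print(atoms_to_keep(residue_names))                   # [0, 1, 4]
print(atoms_to_keep(residue_names, exclude=["HOH"]))  # [0, 1, 2, 4, 5]
```

The resulting index list is the same for every frame, so it can be computed once from the topology and then applied to each chunk.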