Parallel Operations Helper

Parallel operations for MDTraj methods using Dask arrays.

Implements memory-efficient, parallelized versions of common MDTraj operations.

class mdxplain.trajectory.helper.dask_trajectory_helper.parallel_operations_helper.ParallelOperationsHelper(zarr_path: str, topology: Topology, n_workers: int | None = None, chunk_size: int = 1000, cache_dir: str = './cache')

Parallel implementations of MDTraj operations using Dask arrays.

All operations respect memory constraints and process data in chunks.
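
As a minimal sketch of what processing data in chunks means here, the frame axis can be split into half-open ranges of at most chunk_size frames. The helper name frame_chunks is hypothetical and is not part of this class; the actual implementation may partition frames differently (for example via Dask array chunks).

```python
from typing import Iterator, Tuple


def frame_chunks(n_frames: int, chunk_size: int = 1000) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, stop) frame ranges that together cover n_frames."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, n_frames, chunk_size):
        # The last chunk is shorter when n_frames is not a multiple of chunk_size.
        yield start, min(start + chunk_size, n_frames)


# 2500 frames with the default chunk size: two full chunks and one partial chunk.
ranges = list(frame_chunks(2500, chunk_size=1000))
```

Only one such range needs to be held in memory at a time, which is what keeps peak memory bounded by the chunk size rather than by the trajectory length.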

__init__(zarr_path: str, topology: Topology, n_workers: int | None = None, chunk_size: int = 1000, cache_dir: str = './cache')

Initialize parallel operations.

Parameters

zarr_path : str

Path to Zarr store containing trajectory data

topology : md.Topology

MDTraj topology

n_workers : int, optional

Number of parallel workers (defaults to CPU count)

chunk_size : int, default=1000

Number of frames per chunk for memory management

cache_dir : str, default='./cache'

Directory for temporary files during operations

Returns

None

Initializes parallel operations
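
When choosing chunk_size, the coordinate memory of one chunk can be estimated as frames × atoms × 3 × bytes per value. The sketch below assumes 4-byte float32 coordinates (the dtype MDTraj uses for xyz; the dtype of the Zarr store is not stated here) and a hypothetical 50,000-atom system. The name chunk_bytes is illustrative and not part of this class.

```python
def chunk_bytes(chunk_size: int, n_atoms: int, bytes_per_value: int = 4) -> int:
    """Estimate the coordinate memory of one chunk: frames x atoms x 3 x bytes."""
    return chunk_size * n_atoms * 3 * bytes_per_value


# Default chunk of 1000 frames for a hypothetical 50,000-atom system.
default_chunk = chunk_bytes(1000, 50_000)   # 600,000,000 bytes, about 0.6 GB
# Halving chunk_size halves the estimate.
small_chunk = chunk_bytes(500, 50_000)      # 300,000,000 bytes, about 0.3 GB
```

This is a lower bound per worker: operations that copy coordinates need additional working memory, and several workers may each hold a chunk at the same time.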

cleanup() -> None

Release in-memory and zarr-store references held by this helper.

Returns

None

Clears Dask array references and closes store handles when possible.

center_coordinates(result_path: str, mass_weighted: bool = False) -> Group

Center coordinates at the origin, applying the MDTraj method chunk by chunk.

Parameters

result_path : str

Path to store result Zarr

mass_weighted : bool, default=False

Use mass-weighted centering

Returns

zarr.Group

New Zarr store with centered coordinates

Examples

>>> parallel_ops = ParallelOperationsHelper('trajectory.zarr', topology, chunk_size=500)
>>> centered_store = parallel_ops.center_coordinates('centered.zarr', mass_weighted=True)
>>> print(f"Centered trajectory stored at {centered_store.path}")
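
Centering can be applied chunk by chunk because each frame is shifted independently of all others. The following is a minimal sketch of that idea using plain Python lists in place of arrays; center_frame and center_chunked are hypothetical names, and the code illustrates the arithmetic only, not the MDTraj implementation.

```python
from typing import List, Optional, Sequence

Frame = List[List[float]]  # one frame: n_atoms rows of [x, y, z]


def center_frame(frame: Frame, masses: Optional[Sequence[float]] = None) -> Frame:
    """Shift one frame so its center (optionally mass-weighted) sits at the origin."""
    weights = list(masses) if masses is not None else [1.0] * len(frame)
    total = sum(weights)
    center = [
        sum(w * atom[axis] for w, atom in zip(weights, frame)) / total
        for axis in range(3)
    ]
    return [[atom[axis] - center[axis] for axis in range(3)] for atom in frame]


def center_chunked(frames: List[Frame], chunk_size: int,
                   masses: Optional[Sequence[float]] = None) -> List[Frame]:
    """Center frames chunk by chunk; each frame is independent of the others."""
    centered: List[Frame] = []
    for start in range(0, len(frames), chunk_size):
        chunk = frames[start:start + chunk_size]
        centered.extend(center_frame(frame, masses) for frame in chunk)
    return centered


# Two atoms on the x axis, repeated over three frames.
frames = [[[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]] for _ in range(3)]
geometric = center_chunked(frames, chunk_size=2)
weighted = center_chunked(frames, chunk_size=2, masses=[1.0, 3.0])
```

With equal weights the two atoms end up at x = -1 and x = 1; with masses 1 and 3 the center moves toward the heavier atom, giving x = -1.5 and x = 0.5. The result does not depend on the chunk size.
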

superpose(result_path: str, reference_traj: Trajectory, atom_indices: ndarray | None = None, ref_atom_indices: ndarray | None = None) -> Group

Superpose the trajectory onto a reference trajectory, applying the MDTraj method chunk by chunk.

Parameters

result_path : str

Path to store result Zarr

reference_traj : md.Trajectory

Reference trajectory (single frame) to align to

atom_indices : np.ndarray, optional

Atoms to use for alignment on this trajectory

ref_atom_indices : np.ndarray, optional

Atoms to use for alignment on the reference trajectory

Returns

zarr.Group

New Zarr store with superposed coordinates

Raises

ValueError

If reference trajectory has wrong number of atoms or frames

OSError

If cache directory is not writable

Examples

>>> import mdtraj as md
>>> parallel_ops = ParallelOperationsHelper('trajectory.zarr', topology)
>>> # Create reference trajectory (single frame)
>>> ref_traj = md.load_frame('reference.pdb', 0)
>>> aligned_store = parallel_ops.superpose('aligned.zarr', ref_traj)
>>> # Superpose using only backbone atoms, writing to a separate store
>>> backbone_indices = topology.select('backbone')
>>> backbone_store = parallel_ops.superpose('aligned_backbone.zarr', ref_traj, atom_indices=backbone_indices)

smooth(result_path: str, width: int, order: int | None = None, atom_indices: ndarray | None = None) -> Group

Apply MDTraj’s smoothing filter with atom-wise chunking.

Atoms are processed in chunks to reduce memory usage, while each atom is smoothed across all of its frames, as MDTraj’s algorithm requires.

Parameters

result_path : str

Path to store result Zarr

width : int

Smoothing window width

order : int, optional

Filter order passed through to MDTraj’s smooth, which applies a zero-delay Butterworth low-pass filter

atom_indices : np.ndarray, optional

Atoms to smooth (default: all)

Returns

zarr.Group

New Zarr store with smoothed coordinates

Raises

ValueError

If width is invalid for smoothing algorithm

OSError

If cache directory is not writable

Examples

>>> parallel_ops = ParallelOperationsHelper('trajectory.zarr', topology)
>>> # Apply smoothing with width 5 to all atoms
>>> smoothed_store = parallel_ops.smooth('smooth.zarr', width=5)
>>> # Apply smoothing only to protein atoms
>>> protein_indices = topology.select('protein')
>>> protein_store = parallel_ops.smooth('smooth_protein.zarr', width=5, atom_indices=protein_indices)
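
Smoothing filters along the time axis, so a chunk must contain every frame of the atoms it covers; this is why the chunking is over atoms rather than frames. The sketch below illustrates this with a simple centered moving average as a stand-in filter. It is not the filter MDTraj applies, and the names moving_average and smooth_by_atom_chunks are hypothetical.

```python
from typing import List


def moving_average(series: List[float], width: int) -> List[float]:
    """Centered moving average, truncated at the edges (stand-in filter only)."""
    half = width // 2
    out = []
    for i in range(len(series)):
        window = series[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def smooth_by_atom_chunks(coords: List[List[float]], width: int,
                          atoms_per_chunk: int) -> List[List[float]]:
    """coords[atom][frame] holds one coordinate component over all frames."""
    smoothed: List[List[float]] = []
    for start in range(0, len(coords), atoms_per_chunk):
        # Each chunk holds a few atoms but ALL frames for those atoms.
        atom_chunk = coords[start:start + atoms_per_chunk]
        smoothed.extend(moving_average(series, width) for series in atom_chunk)
    return smoothed


# Three atoms, three frames, one coordinate component each.
coords = [[0.0, 3.0, 0.0], [1.0, 1.0, 1.0], [0.0, 0.0, 6.0]]
one_atom_per_chunk = smooth_by_atom_chunks(coords, width=3, atoms_per_chunk=1)
two_atoms_per_chunk = smooth_by_atom_chunks(coords, width=3, atoms_per_chunk=2)
```

Because each atom's time series is filtered independently, the number of atoms per chunk changes only the memory footprint, not the result.
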

atom_slice(result_path: str, atom_indices: ndarray) -> Group

Create an atom slice, applying the MDTraj method chunk by chunk.

Parameters

result_path : str

Path to store result Zarr

atom_indices : np.ndarray

Indices of atoms to keep

Returns

zarr.Group

New Zarr store with selected atoms

Raises

IndexError

If atom_indices contains invalid atom indices

OSError

If cache directory is not writable

Examples

>>> import numpy as np
>>> parallel_ops = ParallelOperationsHelper('trajectory.zarr', topology)
>>> # Select only the first 100 atoms
>>> atom_indices = np.arange(100)
>>> sliced_store = parallel_ops.atom_slice('slice.zarr', atom_indices)
>>> # Select only CA atoms, writing to a separate store
>>> ca_indices = topology.select('name CA')
>>> ca_store = parallel_ops.atom_slice('ca_slice.zarr', ca_indices)
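
An atom slice keeps the same atoms in every frame, so indices can be validated once before any chunk is processed. The following is a minimal sketch with plain Python lists; slice_atoms is a hypothetical name, and treating negative indices as invalid is an assumption rather than documented behavior of this class.

```python
from typing import List, Sequence

Frame = List[List[float]]  # one frame: n_atoms rows of [x, y, z]


def slice_atoms(frames: List[Frame], atom_indices: Sequence[int]) -> List[Frame]:
    """Keep only the given atoms in every frame, validating indices first."""
    if not frames:
        return []
    n_atoms = len(frames[0])
    # Assumption: negative indices are rejected instead of wrapping around.
    invalid = [i for i in atom_indices if not 0 <= i < n_atoms]
    if invalid:
        raise IndexError(f"atom indices out of range for {n_atoms} atoms: {invalid}")
    return [[frame[i] for i in atom_indices] for frame in frames]


# Two frames with three atoms each; keep atoms 0 and 2.
frames = [
    [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]],
    [[0.5, 0.5, 0.5], [1.5, 1.5, 1.5], [2.5, 2.5, 2.5]],
]
sliced = slice_atoms(frames, [0, 2])
```

The output has the same number of frames as the input and one row per requested atom, in the order the indices were given.
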

image_molecules(result_path: str, anchor_molecules: ndarray | None = None, other_molecules: ndarray | None = None, sorted_bonds: ndarray | None = None, make_whole: bool = True) -> Group

Apply periodic boundary condition imaging to molecules.

Recenters molecules and applies PBC using MDTraj’s image_molecules method. Processes trajectory in chunks for memory efficiency.

Parameters

result_path : str

Path to store result Zarr

anchor_molecules : np.ndarray, optional

Indices of molecules to anchor at the origin

other_molecules : np.ndarray, optional

Indices of other molecules to image relative to anchors

sorted_bonds : np.ndarray, optional

Pre-sorted bond array for performance optimization

make_whole : bool, default=True

Make molecules whole across PBC before imaging

Returns

zarr.Group

New Zarr store with imaged coordinates

Raises

ValueError

If anchor_molecules or other_molecules contain invalid indices

OSError

If cache directory is not writable

Examples

>>> import numpy as np
>>> helper = ParallelOperationsHelper('trajectory.zarr', topology)
>>> # Apply default imaging (all molecules)
>>> imaged_store = helper.image_molecules('imaged.zarr')
>>> # Image with specific anchor molecules, writing to a separate store
>>> protein_molecules = np.array([0, 1, 2])
>>> anchored_store = helper.image_molecules('imaged_anchored.zarr', anchor_molecules=protein_molecules)

remove_solvent(result_path: str, exclude: list | None = None) -> Group

Remove solvent atoms from trajectory.

Creates new trajectory without solvent atoms using MDTraj’s remove_solvent method. Processes trajectory in chunks for memory efficiency.

Parameters

result_path : str

Path to store result Zarr

exclude : list, optional

List of solvent residue names to KEEP (not remove). If None, removes all recognized solvent molecules.

Returns

zarr.Group

New Zarr store with non-solvent atoms only

Raises

ValueError

If exclude contains invalid residue names

OSError

If cache directory is not writable

Examples

>>> helper = ParallelOperationsHelper('trajectory.zarr', topology)
>>> # Remove all solvent
>>> no_solvent_store = helper.remove_solvent('no_solvent.zarr')
>>> # Keep water but remove other solvent
>>> keep_water_store = helper.remove_solvent('keep_water.zarr', exclude=['HOH', 'WAT'])
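
The exclude semantics are easy to misread, since excluded names are the ones that stay. The sketch below illustrates the selection logic on a list of per-atom residue names. SOLVENT_NAMES is a small illustrative subset chosen for this example, not MDTraj's full list of recognized solvent residues, and non_solvent_atoms is a hypothetical name.

```python
from typing import List, Optional, Sequence

# Illustrative subset only; MDTraj recognizes a longer list of solvent residues.
SOLVENT_NAMES = {"HOH", "WAT", "NA", "CL"}


def non_solvent_atoms(residue_names: Sequence[str],
                      exclude: Optional[Sequence[str]] = None) -> List[int]:
    """Return indices of atoms to keep; names in exclude are never removed."""
    removable = SOLVENT_NAMES - set(exclude or [])
    return [i for i, name in enumerate(residue_names) if name not in removable]


# Residue name of each atom in a tiny hypothetical system.
names = ["ALA", "ALA", "HOH", "NA", "GLY", "WAT"]
keep_default = non_solvent_atoms(names)
keep_water = non_solvent_atoms(names, exclude=["HOH", "WAT"])
```

With no exclude list, only the ALA and GLY atoms remain. With exclude=['HOH', 'WAT'], the water atoms are kept as well and only the NA atom is removed. Since the selection depends on topology alone, it can be computed once and reused for every chunk of frames.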