Centroid Helper


Centroid calculation helper for trajectory analysis.

This module provides utilities for computing centroids (mean frames) and for finding the frame closest to a centroid. These utilities are used to select representative frames in clustering and other analyses.

class mdxplain.pipeline.helper.centroid_helper.CentroidHelper

Helper class for centroid calculations.

Provides a method to find the frame closest to the centroid (mean), with optional memory-efficient chunked processing for large datasets.

Examples

>>> # Find centroid frame with fast numpy
>>> best_idx = CentroidHelper.find_centroid(
...     selected_data, use_memmap=False
... )
>>> # Find centroid frame with memmap-safe chunked processing
>>> best_idx = CentroidHelper.find_centroid(
...     selected_data, use_memmap=True, chunk_size=1000
... )
static find_centroid(selected_data: ndarray, use_memmap: bool = False, chunk_size: int = 1000) -> int

Find frame closest to centroid (mean).

Computes the centroid (mean) of all frames and finds the frame with the smallest Euclidean distance to it. Uses fast in-memory NumPy operations when use_memmap=False, or chunked processing for large memory-mapped datasets when use_memmap=True.
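The computation described above can be illustrated with a minimal sketch. It is not the library's implementation; `closest_to_mean` is a hypothetical helper name, and the sketch assumes the whole array fits in memory.

```python
import numpy as np


def closest_to_mean(data: np.ndarray) -> int:
    """Return the row index of `data` nearest to the mean row (hypothetical helper)."""
    # Centroid: per-feature mean over all frames, shape (n_features,)
    centroid = data.mean(axis=0)
    # Euclidean distance of every frame to the centroid, shape (n_frames,)
    distances = np.linalg.norm(data - centroid, axis=1)
    return int(np.argmin(distances))


# Three frames with two features each; the centroid is (2, 2)
frames = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
idx = closest_to_mean(frames)
```

Here the second frame, (1, 1), lies nearest to the centroid (2, 2), so `idx` is 1. Note that the returned frame is an actual member of the data, whereas the centroid itself usually is not.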

Parameters

selected_data : np.ndarray

Data array with shape (n_frames, n_features)

use_memmap : bool, default=False

Whether to process the data in chunks, as intended for large memory-mapped (memmap) arrays

chunk_size : int, default=1000

Number of frames to process per chunk when use_memmap=True

Returns

int

Index, local to selected_data, of the frame closest to the centroid

Examples

>>> import numpy as np
>>> from mdxplain.pipeline.helper.centroid_helper import CentroidHelper
>>> # Fast mode for standard numpy arrays
>>> data = np.random.rand(1000, 100)
>>> idx = CentroidHelper.find_centroid(data, use_memmap=False)
>>> print(idx)  # Index between 0 and 999
>>> # Memmap mode for large datasets
>>> idx = CentroidHelper.find_centroid(
...     large_memmap_data, use_memmap=True, chunk_size=500
... )
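
One way the memmap mode could work is a two-pass scan, sketched below. This is an assumption about the general technique, not the library's actual code: `closest_to_mean_chunked` is a hypothetical name, and the file `frames.dat` is created only for the demonstration.

```python
import os
import tempfile

import numpy as np


def closest_to_mean_chunked(data: np.ndarray, chunk_size: int = 1000) -> int:
    """Two-pass chunked search (hypothetical helper, not the library's code)."""
    n_frames, n_features = data.shape

    # Pass 1: accumulate per-feature sums chunk by chunk to obtain the mean.
    total = np.zeros(n_features, dtype=np.float64)
    for start in range(0, n_frames, chunk_size):
        chunk = np.asarray(data[start:start + chunk_size], dtype=np.float64)
        total += chunk.sum(axis=0)
    centroid = total / n_frames

    # Pass 2: keep the smallest squared distance seen so far.
    # Squared distances give the same argmin as Euclidean distances.
    best_idx, best_dist = -1, np.inf
    for start in range(0, n_frames, chunk_size):
        chunk = np.asarray(data[start:start + chunk_size], dtype=np.float64)
        sq_dist = ((chunk - centroid) ** 2).sum(axis=1)
        local = int(np.argmin(sq_dist))
        if sq_dist[local] < best_dist:
            best_dist = float(sq_dist[local])
            best_idx = start + local  # convert chunk-local index to array index
    return best_idx


# Demo on a small disk-backed array inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frames.dat")
    mm = np.memmap(path, dtype=np.float32, mode="w+", shape=(9, 3))
    mm[:] = np.arange(27, dtype=np.float32).reshape(9, 3)
    mm.flush()
    chunked_idx = closest_to_mean_chunked(mm, chunk_size=4)
    del mm  # release the memmap before the directory is removed
```

In the demo the middle row equals the mean of all rows, so `chunked_idx` is 4. Only one chunk of `chunk_size` frames is held in memory at a time, at the cost of reading the data twice.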