Frame Selection Helper

Frame selection helper for data selector operations.

This module provides helper methods for selecting frames based on tags and cluster assignments, extracting common logic from DataSelectorManager methods.

class mdxplain.data_selector.helper.frame_selection_helper.FrameSelectionHelper

Helper class for frame selection operations.

Provides static methods for selecting trajectory frames based on various criteria such as tags and cluster assignments. These methods extract common logic from DataSelectorManager to improve code organization and reusability.

Examples

>>> # Select frames by tags
>>> indices = FrameSelectionHelper.select_frames_by_tags(
...     trajectory_data, ["system_A", "biased"], match_all=True
... )

>>> # Select frames by cluster
>>> indices = FrameSelectionHelper.select_frames_by_cluster(
...     labels, [0, 1, 2]
... )

static select_frames_by_tags(trajectory_data: TrajectoryData, tags: List[str], match_all: bool, stride: int = 1) → Dict[int, List[int]]

Select frames from trajectories that match tag criteria.

Returns all frames from trajectories whose tags match the criteria, optionally applying stride for sparse sampling.

Parameters

trajectory_data (TrajectoryData): Trajectory data object containing trajectory tags
tags (List[str]): List of tags to search for in trajectory tags
match_all (bool): If True, trajectory must have ALL tags. If False, ANY tag matches.
stride (int, default=1): Minimum distance between consecutive frames (per trajectory). stride=1 returns all frames, stride=10 returns every 10th frame.

Returns

Dict[int, List[int]]: Dictionary mapping trajectory indices to their frame indices

Examples

>>> # Select every 10th frame from matching trajectories
>>> frames = FrameSelectionHelper.select_frames_by_tags(
...     trajectory_data, ["system_A"], match_all=False, stride=10
... )

static select_frames_by_cluster(labels: List[int], cluster_ids: List[int], frame_mapping: Dict[int, int] | None = None, stride: int = 1) → Dict[int, List[int]]

Select frames based on cluster assignments.

Requires frame_mapping for trajectory-specific selection. Optionally applies stride for sparse sampling per trajectory.

Parameters

labels (List[int]): List of cluster labels for each frame
cluster_ids (List[int]): List of cluster IDs to select frames from
frame_mapping (Dict[int, tuple], optional): Mapping from global frame index to (traj_idx, local_frame_idx)
stride (int, default=1): Minimum distance between consecutive frames (per trajectory). Applied after cluster selection to maintain cluster representation.

Returns

Dict[int, List[int]]: Dictionary mapping trajectory indices to their selected frame indices

Examples

>>> # Select every 5th frame from clusters (per trajectory)
>>> frames = FrameSelectionHelper.select_frames_by_cluster(
...     labels, [0, 1], frame_mapping, stride=5
... )

static resolve_cluster_ids(cluster_data: ClusterData, cluster_ids: List[int | str], clustering_name: str) → List[int]

Convert cluster names to numeric IDs if necessary.

Processes a list of cluster identifiers, converting string cluster names to their corresponding numeric IDs using the cluster’s name mappings. Numeric IDs are passed through unchanged.

Parameters

cluster_data (ClusterData): Cluster data object containing labels and optional cluster names
cluster_ids (List[Union[int, str]]): List of cluster identifiers (numeric IDs or string names)
clustering_name (str): Name of the clustering (used for error messages)

Returns

List[int]: List of numeric cluster IDs corresponding to input identifiers

Examples

>>> # Convert mixed IDs and names
>>> resolved = FrameSelectionHelper.resolve_cluster_ids(
...     cluster_data, [0, "folded", 2], "conformations"
... )

static select_frames_by_indices(input_data: str | dict | List[int], trajectory_data: TrajectoryData) → Dict[int, List[int]]

Parse trajectory and frame selections from various input formats.

Structure:

input_data : Dict[traj_selection, frame_selection]

traj_selection:

int: trajectory index (0, 1, 2, …)
str: trajectory name ("system_A"), tag ("tag:biased"), or pattern ("system_*")
A single key can resolve to multiple trajectories (e.g., a tag applies its frame selection to every matching trajectory)

frame_selection:

int: single frame (42)
List[int]: explicit frames ([10, 20, 30])
str: various formats:
    - Single: "42"
    - Range: "10-20" → [10, 11, …, 20]
    - Comma list: "10,20,30" → [10, 20, 30]
    - Combined: "10-20,30-40,50" → [10…20, 30…40, 50]
    - All: "all" → all frames in trajectory
dict: with stride support:
    - {"frames": frame_selection, "stride": N}
    - stride = minimum distance between consecutive frames
    - Example: {"frames": "0-100", "stride": 10} → [0, 10, 20, …, 100]
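The frame_selection formats listed above can be parsed with a small recursive function. This is a minimal sketch, not the library's parser: the function name, the `n_frames` argument, and the greedy reading of stride are assumptions, and range ends are treated as inclusive as in the "10-20" example.

```python
from typing import List, Union

FrameSpec = Union[int, str, List[int], dict]


def parse_frames_sketch(spec: FrameSpec, n_frames: int) -> List[int]:
    """Illustrative parser for the frame_selection formats."""
    if isinstance(spec, dict):
        # {"frames": ..., "stride": N}: parse the inner spec, then thin it.
        frames = parse_frames_sketch(spec["frames"], n_frames)
        stride = int(spec.get("stride", 1))
        if stride < 1:
            raise ValueError("stride must be >= 1")
        kept: List[int] = []
        for frame in sorted(frames):
            if not kept or frame - kept[-1] >= stride:
                kept.append(frame)
        return kept
    if isinstance(spec, int):
        return [spec]
    if isinstance(spec, list):
        return list(spec)
    text = spec.strip()
    if text.lower() == "all":
        return list(range(n_frames))
    frames_set = set()
    for part in text.split(","):
        part = part.strip()
        if "-" in part:
            # Ranges are inclusive on both ends.
            start, end = part.split("-", 1)
            frames_set.update(range(int(start), int(end) + 1))
        else:
            frames_set.add(int(part))
    return sorted(frames_set)


print(parse_frames_sketch("10,20,30", 100))   # [10, 20, 30]
print(parse_frames_sketch("all", 5))          # [0, 1, 2, 3, 4]
print(parse_frames_sketch({"frames": "0-100", "stride": 10}, 200))
```

The last call prints the frames 0, 10, 20 and so on up to and including 100, matching the stride example in the list above.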

Parameters

input_data (dict): Dictionary with trajectory keys and frame specifications
trajectory_data (TrajectoryData): Trajectory data object for validation and resolution

Returns

Dict[int, List[int]]: Dictionary mapping trajectory indices to frame indices

Examples

>>> # Combined ranges
>>> frames = FrameSelectionHelper.select_frames_by_indices(
...     {0: "10-20,30-40,50"}, trajectory_data
... )

>>> # All frames
>>> frames = FrameSelectionHelper.select_frames_by_indices(
...     {"tag:biased": "all"}, trajectory_data
... )

>>> # With stride
>>> frames = FrameSelectionHelper.select_frames_by_indices({
...     0: {"frames": "0-1000", "stride": 50}
... }, trajectory_data)

>>> # Complex example
>>> frames = FrameSelectionHelper.select_frames_by_indices({
...     "system_A": {"frames": "10-20,100-200", "stride": 5},
...     "tag:biased": "all",
...     1: [42, 84, 126]
... }, trajectory_data)
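The trajectory keys used in these examples (index, name, "tag:" prefix, wildcard pattern) each have to be turned into one or more trajectory indices. The sketch below shows one way this could work. It assumes trajectory names and tags are available as plain lists (`names` and `tags` are hypothetical) and that patterns follow shell-style globbing via the standard-library `fnmatch` module; the library's actual matching rules may differ.

```python
from fnmatch import fnmatchcase
from typing import List, Union


def resolve_trajectories_sketch(
    key: Union[int, str],
    names: List[str],
    tags: List[List[str]],
) -> List[int]:
    """Illustrative stand-in: resolve one traj_selection key to indices."""
    if isinstance(key, int):
        if not 0 <= key < len(names):
            raise ValueError(f"Trajectory index {key} out of range")
        return [key]
    if key.startswith("tag:"):
        # "tag:biased" selects every trajectory carrying the tag.
        tag = key[len("tag:"):]
        matches = [i for i, own in enumerate(tags) if tag in own]
    else:
        # A plain name matches itself; "system_*" matches by glob.
        matches = [i for i, name in enumerate(names) if fnmatchcase(name, key)]
    if not matches:
        raise ValueError(f"No trajectory matches '{key}'")
    return matches


names = ["system_A", "system_B", "control"]
tags = [["biased"], [], ["biased", "ref"]]

print(resolve_trajectories_sketch(1, names, tags))             # [1]
print(resolve_trajectories_sketch("system_A", names, tags))    # [0]
print(resolve_trajectories_sketch("system_*", names, tags))    # [0, 1]
print(resolve_trajectories_sketch("tag:biased", names, tags))  # [0, 2]
```

Because a tag or pattern can resolve to several trajectories, the frame selection attached to that key would then be applied to each resolved index in turn.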

static validate_selector_exists(pipeline_data: PipelineData, name: str) → None

Validate that a data selector with the given name exists.

Parameters

pipeline_data (PipelineData): Pipeline data object containing data selectors
name (str): Name of the data selector to validate

Returns

None: Returns nothing; raises ValueError if the selector is not found.

static validate_trajectories_loaded(pipeline_data: PipelineData) → None

Validate that trajectory data is available for frame selection.

Parameters

pipeline_data (PipelineData): Pipeline data object to check for trajectory data

Returns

None: Returns nothing; raises ValueError if no trajectories are loaded.

static validate_clustering_exists(pipeline_data: PipelineData, clustering_name: str) → None

Validate that a clustering result with the given name exists.

Parameters

pipeline_data (PipelineData): Pipeline data object containing cluster data
clustering_name (str): Name of the clustering result to validate

Returns

None: Returns nothing; raises ValueError if the clustering is not found.
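The three validators share a fail-fast pattern: return None on success and raise ValueError otherwise, so a caller can run them before any selection work starts. The sketch below imitates that pattern with a stand-in class. `PipelineStub` and its attribute names are invented for illustration and do not reflect the real PipelineData layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PipelineStub:
    """Hypothetical stand-in for PipelineData; attribute names are invented."""
    selectors: Dict[str, dict] = field(default_factory=dict)
    clusterings: Dict[str, List[int]] = field(default_factory=dict)
    n_trajectories: int = 0


def check_ready(pipeline: PipelineStub, selector: str, clustering: str) -> None:
    """Raise ValueError on the first unmet precondition, else return None."""
    if pipeline.n_trajectories == 0:
        raise ValueError("No trajectories loaded")
    if selector not in pipeline.selectors:
        raise ValueError(f"Data selector '{selector}' not found")
    if clustering not in pipeline.clusterings:
        raise ValueError(f"Clustering '{clustering}' not found")


def first_error(pipeline: PipelineStub, selector: str, clustering: str) -> Optional[str]:
    """Return the validation message, or None when every check passes."""
    try:
        check_ready(pipeline, selector, clustering)
    except ValueError as exc:
        return str(exc)
    return None


ready = PipelineStub(
    selectors={"folded_frames": {}},
    clusterings={"conformations": [0, 1, 1]},
    n_trajectories=2,
)
print(first_error(ready, "folded_frames", "conformations"))  # None
print(first_error(PipelineStub(), "folded_frames", "conformations"))
print(first_error(ready, "folded_frames", "missing"))
```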