Frame Selection Helper

Frame selection helper for data selector operations.

This module provides helper methods for selecting frames based on tags and cluster assignments, extracting common logic from DataSelectorManager methods.

class mdxplain.data_selector.helper.frame_selection_helper.FrameSelectionHelper

Helper class for frame selection operations.

Provides static methods for selecting trajectory frames based on various criteria such as tags and cluster assignments. These methods extract common logic from DataSelectorManager to improve code organization and reusability.

Examples

>>> # Select frames by tags
>>> indices = FrameSelectionHelper.select_frames_by_tags(
...     trajectory_data, ["system_A", "biased"], match_all=True
... )
>>> # Select frames by cluster
>>> indices = FrameSelectionHelper.select_frames_by_cluster(
...     labels, [0, 1, 2]
... )
static select_frames_by_tags(trajectory_data: TrajectoryData, tags: List[str], match_all: bool, stride: int = 1) → Dict[int, List[int]]

Select frames from trajectories that match tag criteria.

Returns all frames from trajectories whose tags match the criteria, optionally applying stride for sparse sampling.

Parameters

trajectory_data : TrajectoryData

Trajectory data object containing trajectory tags

tags : List[str]

List of tags to search for in trajectory tags

match_all : bool

If True, trajectory must have ALL tags. If False, ANY tag matches.

stride : int, default=1

Minimum distance between consecutive frames (per trajectory). stride=1 returns all frames; stride=10 returns every 10th frame.

Returns

Dict[int, List[int]]

Dictionary mapping trajectory indices to their frame indices

Examples

>>> # Select every 10th frame from matching trajectories
>>> frames = FrameSelectionHelper.select_frames_by_tags(
...     trajectory_data, ["system_A"], match_all=False, stride=10
... )
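
The matching rule described above can be illustrated with a small standalone sketch. It does not use mdxplain: plain dictionaries stand in for TrajectoryData, and the names select_by_tags_sketch, trajectory_tags, and n_frames are hypothetical, introduced only for illustration. It assumes that stride on a fully selected trajectory amounts to taking every stride-th frame starting at frame 0.

```python
from typing import Dict, List, Set


def select_by_tags_sketch(
    trajectory_tags: Dict[int, Set[str]],
    n_frames: Dict[int, int],
    tags: List[str],
    match_all: bool,
    stride: int = 1,
) -> Dict[int, List[int]]:
    """Illustrative stand-in; plain dicts replace TrajectoryData (assumption)."""
    wanted = set(tags)
    selected: Dict[int, List[int]] = {}
    for traj_idx, traj_tags in trajectory_tags.items():
        if match_all:
            # ALL requested tags must be present on the trajectory
            matches = wanted <= traj_tags
        else:
            # ANY requested tag is enough
            matches = bool(wanted & traj_tags)
        if matches:
            # Every stride-th frame of the matching trajectory
            selected[traj_idx] = list(range(0, n_frames[traj_idx], stride))
    return selected


tags_by_traj = {0: {"system_A", "biased"}, 1: {"system_A"}, 2: {"system_B"}}
frame_counts = {0: 25, 1: 5, 2: 5}
strict = select_by_tags_sketch(
    tags_by_traj, frame_counts, ["system_A", "biased"], match_all=True, stride=10
)
loose = select_by_tags_sketch(
    tags_by_traj, frame_counts, ["system_A", "biased"], match_all=False
)
```

With match_all=True only trajectory 0 carries both tags; with match_all=False trajectory 1 is selected as well.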
static select_frames_by_cluster(labels: List[int], cluster_ids: List[int], frame_mapping: Dict[int, int] | None = None, stride: int = 1) → Dict[int, List[int]]

Select frames based on cluster assignments.

Requires frame_mapping for trajectory-specific selection. Optionally applies stride for sparse sampling per trajectory.

Parameters

labels : List[int]

List of cluster labels for each frame

cluster_ids : List[int]

List of cluster IDs to select frames from

frame_mapping : Dict[int, tuple], optional

Mapping from global frame index to (traj_idx, local_frame_idx)

stride : int, default=1

Minimum distance between consecutive frames (per trajectory). Applied after cluster selection to maintain cluster representation.

Returns

Dict[int, List[int]]

Dictionary mapping trajectory indices to their selected frame indices

Examples

>>> # Select every 5th frame from clusters (per trajectory)
>>> frames = FrameSelectionHelper.select_frames_by_cluster(
...     labels, [0, 1], frame_mapping, stride=5
... )
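
The following standalone sketch shows one way the cluster selection and per-trajectory stride could fit together. It is not the library's implementation. Two assumptions are made here: "minimum distance" is read as a greedy filter that keeps a frame only if it lies at least stride frames after the previously kept one, and when no frame_mapping is given all frames are assigned to trajectory 0. The function name select_by_cluster_sketch is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def select_by_cluster_sketch(
    labels: List[int],
    cluster_ids: List[int],
    frame_mapping: Optional[Dict[int, Tuple[int, int]]] = None,
    stride: int = 1,
) -> Dict[int, List[int]]:
    """Illustrative stand-in for cluster-based frame selection."""
    wanted = set(cluster_ids)
    per_traj: Dict[int, List[int]] = {}
    for global_idx, label in enumerate(labels):
        if label not in wanted:
            continue
        if frame_mapping is not None:
            traj_idx, local_idx = frame_mapping[global_idx]
        else:
            # Assumption: without a mapping, treat all frames as trajectory 0
            traj_idx, local_idx = 0, global_idx
        per_traj.setdefault(traj_idx, []).append(local_idx)

    # Stride is applied after cluster selection, separately per trajectory
    result: Dict[int, List[int]] = {}
    for traj_idx, frames in per_traj.items():
        kept: List[int] = []
        for frame in sorted(frames):
            if not kept or frame - kept[-1] >= stride:
                kept.append(frame)
        result[traj_idx] = kept
    return result


labels = [0, 0, 1, 2, 0, 1]
mapping = {0: (0, 0), 1: (0, 1), 2: (0, 2), 3: (1, 0), 4: (1, 1), 5: (1, 2)}
sparse = select_by_cluster_sketch(labels, [0, 1], mapping, stride=2)
```

Applying stride after the cluster filter means the spacing is measured between frames that actually belong to the requested clusters, not between frames of the original trajectory.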
static resolve_cluster_ids(cluster_data: ClusterData, cluster_ids: List[int | str], clustering_name: str) → List[int]

Convert cluster names to numeric IDs if necessary.

Processes a list of cluster identifiers, converting string cluster names to their corresponding numeric IDs using the cluster’s name mappings. Numeric IDs are passed through unchanged.

Parameters

cluster_data : ClusterData

Cluster data object containing labels and optional cluster names

cluster_ids : List[Union[int, str]]

List of cluster identifiers (numeric IDs or string names)

clustering_name : str

Name of the clustering (used for error messages)

Returns

List[int]

List of numeric cluster IDs corresponding to input identifiers

Examples

>>> # Convert mixed IDs and names
>>> resolved = FrameSelectionHelper.resolve_cluster_ids(
...     cluster_data, [0, "folded", 2], "conformations"
... )
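
A minimal sketch of the name resolution, written without mdxplain: a plain name_to_id dictionary stands in for the name mappings held by ClusterData, and the function name resolve_cluster_ids_sketch is hypothetical. Raising ValueError for an unknown name is an assumption; the documentation above only states that clustering_name is used in error messages.

```python
from typing import Dict, List, Union


def resolve_cluster_ids_sketch(
    name_to_id: Dict[str, int],
    cluster_ids: List[Union[int, str]],
    clustering_name: str,
) -> List[int]:
    """Illustrative stand-in; a dict replaces ClusterData (assumption)."""
    resolved: List[int] = []
    for identifier in cluster_ids:
        if isinstance(identifier, str):
            if identifier not in name_to_id:
                # Assumption: unknown names are reported via ValueError
                raise ValueError(
                    f"Cluster name '{identifier}' not found in "
                    f"clustering '{clustering_name}'."
                )
            resolved.append(name_to_id[identifier])
        else:
            # Numeric IDs pass through unchanged
            resolved.append(int(identifier))
    return resolved


names = {"folded": 1, "unfolded": 3}
resolved = resolve_cluster_ids_sketch(names, [0, "folded", 2], "conformations")
```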
static select_frames_by_indices(input_data: str | dict | List[int], trajectory_data: TrajectoryData) → Dict[int, List[int]]

Parse trajectory and frame selections from various input formats.

Structure:

input_data : Dict[traj_selection, frame_selection]

traj_selection:

  • int: trajectory index (0, 1, 2…)

  • str: trajectory name (“system_A”), tag (“tag:biased”), pattern (“system_*”)

  • Can resolve to multiple trajectories (e.g., tags apply frames to all matching)

frame_selection:

  • int: single frame (42)

  • List[int]: explicit frames ([10, 20, 30])

  • str: various formats:

    • Single: “42”

    • Range: “10-20” → [10, 11, …, 20]

    • Comma list: “10,20,30” → [10, 20, 30]

    • Combined: “10-20,30-40,50” → [10…20, 30…40, 50]

    • All: “all” → all frames in trajectory

  • dict: with stride support:

    • {“frames”: frame_selection, “stride”: N}

    • stride = minimum distance between consecutive frames

    • Example: {“frames”: “0-100”, “stride”: 10} → [0, 10, 20, …, 100]

Parameters

input_data : dict

Dictionary with trajectory keys and frame specifications

trajectory_data : TrajectoryData

Trajectory data object for validation and resolution

Returns

Dict[int, List[int]]

Dictionary mapping trajectory indices to frame indices

Examples

>>> # Combined ranges
>>> frames = FrameSelectionHelper.select_frames_by_indices(
...     {0: "10-20,30-40,50"}, trajectory_data
... )
>>> # All frames
>>> frames = FrameSelectionHelper.select_frames_by_indices(
...     {"tag:biased": "all"}, trajectory_data
... )
>>> # With stride
>>> frames = FrameSelectionHelper.select_frames_by_indices({
...     0: {"frames": "0-1000", "stride": 50}
... }, trajectory_data)
>>> # Complex example
>>> frames = FrameSelectionHelper.select_frames_by_indices({
...     "system_A": {"frames": "10-20,100-200", "stride": 5},
...     "tag:biased": "all",
...     1: [42, 84, 126]
... }, trajectory_data)
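
The frame_selection formats listed under Structure can be illustrated with a small standalone parser. This is a sketch under stated assumptions, not the library's parser: the name parse_frame_spec_sketch and its n_frames argument are hypothetical, ranges are treated as inclusive as in the “10-20” example, results are sorted and de-duplicated, and stride is read as a greedy minimum-distance filter. Resolving trajectory keys (names, tags, patterns) is left out.

```python
from typing import Dict, List, Union

FrameSpec = Union[int, str, List[int], Dict[str, object]]


def parse_frame_spec_sketch(spec: FrameSpec, n_frames: int) -> List[int]:
    """Illustrative parser for a single frame_selection value."""
    if isinstance(spec, dict):
        # {"frames": ..., "stride": N}: parse the inner spec, then thin it out
        frames = parse_frame_spec_sketch(spec["frames"], n_frames)
        stride = int(spec.get("stride", 1))
        kept: List[int] = []
        for frame in frames:
            if not kept or frame - kept[-1] >= stride:
                kept.append(frame)
        return kept
    if isinstance(spec, int):
        return [spec]
    if isinstance(spec, list):
        return sorted(set(spec))
    text = spec.strip()
    if text == "all":
        return list(range(n_frames))
    collected = set()
    for part in text.split(","):
        part = part.strip()
        if "-" in part:
            # Inclusive range, e.g. "10-20" covers 10 through 20
            start, end = part.split("-")
            collected.update(range(int(start), int(end) + 1))
        else:
            collected.add(int(part))
    return sorted(collected)


combined = parse_frame_spec_sketch("10-12,30,20-21", n_frames=100)
strided = parse_frame_spec_sketch({"frames": "0-100", "stride": 10}, n_frames=200)
```

Under these assumptions the strided example reproduces the [0, 10, 20, …, 100] result given in the Structure list.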
static validate_selector_exists(pipeline_data: PipelineData, name: str) → None

Validate that a data selector with given name exists.

Parameters

pipeline_data : PipelineData

Pipeline data object containing data selectors

name : str

Name of the data selector to validate

Returns

None

Returns nothing; raises ValueError if the selector is not found.

static validate_trajectories_loaded(pipeline_data: PipelineData) → None

Validate that trajectory data is available for frame selection.

Parameters

pipeline_data : PipelineData

Pipeline data object to check for trajectory data

Returns

None

Returns nothing; raises ValueError if no trajectories are loaded.

static validate_clustering_exists(pipeline_data: PipelineData, clustering_name: str) → None

Validate that a clustering result with given name exists.

Parameters

pipeline_data : PipelineData

Pipeline data object containing cluster data

clustering_name : str

Name of the clustering result to validate

Returns

None

Returns nothing; raises ValueError if the clustering is not found.
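
The name-based validators share a simple pattern: return None on success and raise ValueError otherwise. The sketch below shows that pattern in isolation. The function validate_name_exists_sketch, its registry dictionary, and the wording of the error message are hypothetical; how PipelineData actually stores selectors and clusterings is not documented here.

```python
from typing import Any, Dict


def validate_name_exists_sketch(
    registry: Dict[str, Any], name: str, kind: str
) -> None:
    """Illustrative validator; a dict replaces PipelineData (assumption)."""
    if name not in registry:
        available = ", ".join(sorted(registry)) or "none"
        raise ValueError(f"{kind} '{name}' not found. Available: {available}.")


selectors = {"folded_frames": object(), "biased_runs": object()}

# Passes silently when the name is registered
validate_name_exists_sketch(selectors, "folded_frames", "Data selector")

# Raises ValueError with a descriptive message otherwise
try:
    validate_name_exists_sketch(selectors, "unfolded_frames", "Data selector")
except ValueError as exc:
    message = str(exc)
```

Listing the available names in the message is a design choice of this sketch; it tends to make typos in selector or clustering names easier to spot.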