Frame Selection Helper
Frame selection helper for data selector operations.
This module provides helper methods for selecting frames based on tags and cluster assignments, extracting common logic from DataSelectorManager methods.
- class mdxplain.data_selector.helper.frame_selection_helper.FrameSelectionHelper
Helper class for frame selection operations.
Provides static methods for selecting trajectory frames based on various criteria such as tags and cluster assignments. These methods extract common logic from DataSelectorManager to improve code organization and reusability.
Examples
>>> # Select frames by tags
>>> indices = FrameSelectionHelper.select_frames_by_tags(
...     trajectory_data, ["system_A", "biased"], match_all=True
... )

>>> # Select frames by cluster
>>> indices = FrameSelectionHelper.select_frames_by_cluster(
...     labels, [0, 1, 2]
... )
- static select_frames_by_tags(trajectory_data: TrajectoryData, tags: List[str], match_all: bool, stride: int = 1) → Dict[int, List[int]]
Select frames from trajectories that match tag criteria.
Returns all frames from trajectories whose tags match the criteria, optionally applying stride for sparse sampling.
Parameters
- trajectory_data : TrajectoryData
Trajectory data object containing trajectory tags
- tags : List[str]
List of tags to search for in trajectory tags
- match_all : bool
If True, trajectory must have ALL tags. If False, ANY tag matches.
- stride : int, default=1
Minimum distance between consecutive frames (per trajectory). stride=1 returns all frames, stride=10 returns every 10th frame.
Returns
- Dict[int, List[int]]
Dictionary mapping trajectory indices to their frame indices
Examples
>>> # Select every 10th frame from matching trajectories
>>> frames = FrameSelectionHelper.select_frames_by_tags(
...     trajectory_data, ["system_A"], match_all=False, stride=10
... )
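The matching rule described above can be illustrated with a minimal sketch. It is not the mdxplain implementation: the function name `sketch_select_by_tags` and the plain dictionaries `trajectory_tags` and `n_frames` are hypothetical stand-ins for whatever `TrajectoryData` stores, and the stride is assumed to start counting at frame 0 of each trajectory.

```python
from typing import Dict, List, Set


def sketch_select_by_tags(
    trajectory_tags: Dict[int, Set[str]],
    n_frames: Dict[int, int],
    tags: List[str],
    match_all: bool,
    stride: int = 1,
) -> Dict[int, List[int]]:
    """Illustrative stand-in only; inputs are hypothetical plain dicts."""
    wanted = set(tags)
    selected: Dict[int, List[int]] = {}
    for traj_idx, traj_tags in trajectory_tags.items():
        if match_all:
            # ALL requested tags must be present on the trajectory.
            matches = wanted <= traj_tags
        else:
            # ANY single requested tag is enough.
            matches = bool(wanted & traj_tags)
        if matches:
            # Assumption: stride is applied as a step starting at frame 0.
            selected[traj_idx] = list(range(0, n_frames[traj_idx], stride))
    return selected


trajectory_tags = {0: {"system_A", "biased"}, 1: {"system_A"}, 2: {"system_B"}}
n_frames = {0: 25, 1: 12, 2: 30}

all_match = sketch_select_by_tags(
    trajectory_tags, n_frames, ["system_A", "biased"], match_all=True, stride=10
)
any_match = sketch_select_by_tags(
    trajectory_tags, n_frames, ["system_A", "biased"], match_all=False, stride=10
)
print(all_match)  # {0: [0, 10, 20]}
print(any_match)  # {0: [0, 10, 20], 1: [0, 10]}
```

With `match_all=True` only trajectory 0 carries both tags; with `match_all=False` trajectory 1 also qualifies because it carries one of them.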
- static select_frames_by_cluster(labels: List[int], cluster_ids: List[int], frame_mapping: Dict[int, int] | None = None, stride: int = 1) → Dict[int, List[int]]
Select frames based on cluster assignments.
Requires frame_mapping for trajectory-specific selection. Optionally applies stride for sparse sampling per trajectory.
Parameters
- labels : List[int]
List of cluster labels for each frame
- cluster_ids : List[int]
List of cluster IDs to select frames from
- frame_mapping : Dict[int, tuple], optional
Mapping from global frame index to (traj_idx, local_frame_idx)
- stride : int, default=1
Minimum distance between consecutive frames (per trajectory). Applied after cluster selection to maintain cluster representation.
Returns
- Dict[int, List[int]]
Dictionary mapping trajectory indices to their selected frame indices
Examples
>>> # Select every 5th frame from clusters (per trajectory)
>>> frames = FrameSelectionHelper.select_frames_by_cluster(
...     labels, [0, 1], frame_mapping, stride=5
... )
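As a rough illustration of how cluster labels, `frame_mapping`, and `stride` could interact, the sketch below groups selected frames by trajectory and then thins them. It is an assumption-laden stand-in rather than the library's code: `sketch_select_by_cluster` is a hypothetical name, and stride is interpreted literally as a minimum gap between kept local frame indices, which may differ from the real behaviour.

```python
from typing import Dict, List, Tuple


def sketch_select_by_cluster(
    labels: List[int],
    cluster_ids: List[int],
    frame_mapping: Dict[int, Tuple[int, int]],
    stride: int = 1,
) -> Dict[int, List[int]]:
    """Illustrative stand-in only, not the mdxplain implementation."""
    wanted = set(cluster_ids)

    # Step 1: collect local frame indices per trajectory for wanted clusters.
    per_traj: Dict[int, List[int]] = {}
    for global_idx, label in enumerate(labels):
        if label in wanted:
            traj_idx, local_idx = frame_mapping[global_idx]
            per_traj.setdefault(traj_idx, []).append(local_idx)

    # Step 2 (assumed semantics): keep a frame only if it lies at least
    # `stride` frames after the previously kept frame of that trajectory.
    result: Dict[int, List[int]] = {}
    for traj_idx, frames in per_traj.items():
        kept: List[int] = []
        for frame in sorted(frames):
            if not kept or frame - kept[-1] >= stride:
                kept.append(frame)
        result[traj_idx] = kept
    return result


# Two trajectories with four frames each, indexed globally 0..7.
labels = [0, 0, 1, 2, 0, 1, 1, 0]
frame_mapping = {
    0: (0, 0), 1: (0, 1), 2: (0, 2), 3: (0, 3),
    4: (1, 0), 5: (1, 1), 6: (1, 2), 7: (1, 3),
}

dense = sketch_select_by_cluster(labels, [0, 1], frame_mapping)
sparse = sketch_select_by_cluster(labels, [0, 1], frame_mapping, stride=2)
print(dense)   # {0: [0, 1, 2], 1: [0, 1, 2, 3]}
print(sparse)  # {0: [0, 2], 1: [0, 2]}
```

Applying the stride after cluster membership is resolved, as the parameter description states, means thinning never removes a cluster's frames before they have been considered.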
- static resolve_cluster_ids(cluster_data: ClusterData, cluster_ids: List[int | str], clustering_name: str) → List[int]
Convert cluster names to numeric IDs if necessary.
Processes a list of cluster identifiers, converting string cluster names to their corresponding numeric IDs using the cluster’s name mappings. Numeric IDs are passed through unchanged.
Parameters
- cluster_data : ClusterData
Cluster data object containing labels and optional cluster names
- cluster_ids : List[Union[int, str]]
List of cluster identifiers (numeric IDs or string names)
- clustering_name : str
Name of the clustering (used for error messages)
Returns
- List[int]
List of numeric cluster IDs corresponding to input identifiers
Examples
>>> # Convert mixed IDs and names
>>> resolved = FrameSelectionHelper.resolve_cluster_ids(
...     cluster_data, [0, "folded", 2], "conformations"
... )
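The name-to-ID conversion can be pictured with the following sketch. It assumes the name mappings are available as a plain `name_to_id` dictionary (a hypothetical stand-in for whatever `ClusterData` exposes) and that an unknown name raises `ValueError`; neither detail is confirmed by this page.

```python
from typing import Dict, List, Union


def sketch_resolve_cluster_ids(
    name_to_id: Dict[str, int],
    cluster_ids: List[Union[int, str]],
    clustering_name: str,
) -> List[int]:
    """Illustrative stand-in only, not the mdxplain implementation."""
    resolved: List[int] = []
    for identifier in cluster_ids:
        if isinstance(identifier, str):
            # String identifiers are looked up in the name mapping.
            if identifier not in name_to_id:
                # Assumption: the real method may use another exception type.
                raise ValueError(
                    f"Cluster name '{identifier}' not found in "
                    f"clustering '{clustering_name}'."
                )
            resolved.append(name_to_id[identifier])
        else:
            # Numeric identifiers pass through unchanged.
            resolved.append(int(identifier))
    return resolved


name_to_id = {"folded": 1, "unfolded": 3}
resolved = sketch_resolve_cluster_ids(
    name_to_id, [0, "folded", 2], "conformations"
)
print(resolved)  # [0, 1, 2]
```

The `clustering_name` argument only appears in the error message here, which matches its documented purpose.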
- static select_frames_by_indices(input_data: str | dict | List[int], trajectory_data: TrajectoryData) → Dict[int, List[int]]
Parse trajectory and frame selections from various input formats.
Structure:
    input_data : Dict[traj_selection, frame_selection]
    traj_selection:
        int: trajectory index (0, 1, 2, …)
        str: trajectory name ("system_A"), tag ("tag:biased"), pattern ("system_*")
        Can resolve to multiple trajectories (e.g., a tag applies the frame selection to every matching trajectory)
    frame_selection:
        int: single frame (42)
        List[int]: explicit frames ([10, 20, 30])
        str: various formats:
            Single: "42"
            Range: "10-20" → [10, 11, …, 20]
            Comma list: "10,20,30" → [10, 20, 30]
            Combined: "10-20,30-40,50" → [10…20, 30…40, 50]
            All: "all" → all frames in trajectory
        dict: with stride support:
            {"frames": frame_selection, "stride": N}
            stride = minimum distance between consecutive frames
            Example: {"frames": "0-100", "stride": 10} → [0, 10, 20, …, 100]
Parameters
- input_data : dict
Dictionary with trajectory keys and frame specifications
- trajectory_data : TrajectoryData
Trajectory data object for validation and resolution
Returns
- Dict[int, List[int]]
Dictionary mapping trajectory indices to frame indices
Examples
>>> # Combined ranges
>>> frames = FrameSelectionHelper.select_frames_by_indices(
...     {0: "10-20,30-40,50"}, trajectory_data
... )

>>> # All frames
>>> frames = FrameSelectionHelper.select_frames_by_indices(
...     {"tag:biased": "all"}, trajectory_data
... )

>>> # With stride
>>> frames = FrameSelectionHelper.select_frames_by_indices({
...     0: {"frames": "0-1000", "stride": 50}
... }, trajectory_data)

>>> # Complex example
>>> frames = FrameSelectionHelper.select_frames_by_indices({
...     "system_A": {"frames": "10-20,100-200", "stride": 5},
...     "tag:biased": "all",
...     1: [42, 84, 126]
... }, trajectory_data)
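The frame_selection grammar listed under "Structure" can be made concrete with a small parser sketch. This is only one plausible reading of the documented formats, not the library's parser: `sketch_parse_frames` and `sketch_apply_stride` are hypothetical names, `n_frames` stands in for the trajectory length needed by "all", ranges are treated as inclusive as in the "10-20" example, and resolving trajectory keys (names, tags, patterns) is deliberately left out.

```python
from typing import Dict, List, Union

FrameSpec = Union[int, str, List[int], Dict[str, object]]


def sketch_apply_stride(frames: List[int], stride: int) -> List[int]:
    """Keep frames that are at least `stride` apart (assumed semantics)."""
    kept: List[int] = []
    for frame in sorted(frames):
        if not kept or frame - kept[-1] >= stride:
            kept.append(frame)
    return kept


def sketch_parse_frames(spec: FrameSpec, n_frames: int) -> List[int]:
    """Illustrative stand-in only; negative indices are not handled."""
    if isinstance(spec, dict):
        # {"frames": frame_selection, "stride": N}
        frames = sketch_parse_frames(spec["frames"], n_frames)
        return sketch_apply_stride(frames, int(spec.get("stride", 1)))
    if isinstance(spec, int):
        return [spec]
    if isinstance(spec, list):
        return list(spec)

    text = spec.strip()
    if text == "all":
        return list(range(n_frames))

    frames: List[int] = []
    for part in text.split(","):
        part = part.strip()
        if "-" in part:
            # Inclusive range, e.g. "10-20" covers 10 through 20.
            start, end = part.split("-")
            frames.extend(range(int(start), int(end) + 1))
        else:
            frames.append(int(part))
    return frames


combined = sketch_parse_frames("10-20,30-40,50", n_frames=100)
strided = sketch_parse_frames({"frames": "0-100", "stride": 10}, n_frames=200)
print(len(combined))  # 23
print(strided)        # [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
```

The strided result reproduces the documented example `{"frames": "0-100", "stride": 10} → [0, 10, 20, …, 100]`.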
- static validate_selector_exists(pipeline_data: PipelineData, name: str) → None
Validate that a data selector with given name exists.
Parameters
- pipeline_data : PipelineData
Pipeline data object containing data selectors
- name : str
Name of the data selector to validate
Returns
- None
Returns nothing; raises ValueError if the selector is not found.
- static validate_trajectories_loaded(pipeline_data: PipelineData) → None
Validate that trajectory data is available for frame selection.
Parameters
- pipeline_data : PipelineData
Pipeline data object to check for trajectory data
Returns
- None
Returns nothing; raises ValueError if no trajectories are loaded.
- static validate_clustering_exists(pipeline_data: PipelineData, clustering_name: str) → None
Validate that a clustering result with given name exists.
Parameters
- pipeline_data : PipelineData
Pipeline data object containing cluster data
- clustering_name : str
Name of the clustering result to validate
Returns
- None
Returns nothing; raises ValueError if the clustering is not found.
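The three validators share a guard-clause pattern: return `None` on success and raise `ValueError` otherwise. The sketch below shows that pattern against a hypothetical `SketchPipelineData` dataclass; the attribute names `selectors`, `clusterings`, and `trajectories` are assumptions made for illustration and may not match the real `PipelineData`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchPipelineData:
    """Hypothetical stand-in; real PipelineData attributes may differ."""

    selectors: Dict[str, object] = field(default_factory=dict)
    clusterings: Dict[str, object] = field(default_factory=dict)
    trajectories: List[object] = field(default_factory=list)


def sketch_validate_selector_exists(data: SketchPipelineData, name: str) -> None:
    if name not in data.selectors:
        raise ValueError(f"Data selector '{name}' not found.")


def sketch_validate_trajectories_loaded(data: SketchPipelineData) -> None:
    if not data.trajectories:
        raise ValueError("No trajectories loaded.")


def sketch_validate_clustering_exists(
    data: SketchPipelineData, clustering_name: str
) -> None:
    if clustering_name not in data.clusterings:
        raise ValueError(f"Clustering '{clustering_name}' not found.")


data = SketchPipelineData(selectors={"folded_frames": object()})

# Passes silently because the selector is registered.
sketch_validate_selector_exists(data, "folded_frames")

# The remaining checks fail; collect their messages instead of aborting.
errors: List[str] = []
checks = [
    lambda: sketch_validate_trajectories_loaded(data),
    lambda: sketch_validate_clustering_exists(data, "conformations"),
]
for check in checks:
    try:
        check()
    except ValueError as exc:
        errors.append(str(exc))

print(errors)
```

Running the validators before any selection work keeps failures early and the error messages specific to the missing piece.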