Data Selector Data
Data selector data entity for storing frame selection configurations.
This module contains the DataSelectorData class that stores frame selection configurations including selected frame indices and selection criteria. This is the row-selection counterpart to FeatureSelectorData (which selects columns).
- class mdxplain.data_selector.entities.data_selector_data.DataSelectorData(name: str)
Data entity for storing trajectory-specific data selector configurations.
Stores frame indices per trajectory and selection criteria for trajectory frame selection. This entity serves as the counterpart to FeatureSelectorData, focusing on row selection instead of column selection.
DataSelectorData uses a two-level dictionary structure in which each trajectory has its own list of frame indices, in line with the trajectory-specific architecture.
Attributes
- name : str
Name identifier for this data selector configuration
- trajectory_frames : Dict[int, List[int]]
Dictionary mapping trajectory indices to lists of selected frame indices
- selection_criteria : Dict[str, Any]
Dictionary containing selection criteria and operations history
Examples
>>> selector_data = DataSelectorData("folded_frames")
>>> selector_data.set_trajectory_frames({0: [0, 5, 12], 1: [18, 25]})
>>> print(f"Selected {selector_data.n_selected_frames} frames")
Selected 5 frames
>>> # With selection criteria
>>> criteria = {"type": "cluster", "clustering": "conformations", "cluster_ids": [0]}
>>> selector_data.append_selection_criteria(criteria)
>>> print(selector_data.get_selection_info())
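The per-trajectory layout described above can be pictured with plain Python containers. The following is a minimal sketch, not mdxplain code: `count_selected_frames` is a hypothetical helper that mirrors what `n_selected_frames` is documented to return, assuming the count is simply the sum of the list lengths.

```python
from typing import Dict, List

# Outer key: trajectory index; value: selected frame indices in that trajectory.
trajectory_frames: Dict[int, List[int]] = {0: [0, 5, 12], 1: [18, 25]}


def count_selected_frames(frames: Dict[int, List[int]]) -> int:
    """Hypothetical helper: total number of frames across all trajectories."""
    return sum(len(indices) for indices in frames.values())


total = count_selected_frames(trajectory_frames)
print(f"Selected {total} frames")
```

Keeping the indices grouped by trajectory means a frame index is always interpreted relative to its own trajectory, so the same index can appear under several keys without ambiguity.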
- __init__(name: str) -> None
Initialize data selector data with given name.
Parameters
- name : str
Name identifier for this data selector configuration
Returns
- None
Initializes empty DataSelectorData with given name
Examples
>>> selector_data = DataSelectorData("active_site_frames")
>>> print(selector_data.name)
active_site_frames
>>> print(len(selector_data.trajectory_frames))
0
- property n_selected_frames: int
Get total number of selected frames across all trajectories.
Returns
- int
Total number of frames in the selection
Examples
>>> selector_data.set_trajectory_frames({0: [1, 5, 10], 1: [2, 7]})
>>> print(selector_data.n_selected_frames)
5
- set_trajectory_frames(trajectory_frames: Dict[int, List[int]]) -> None
Set the selected frame indices per trajectory.
Parameters
- trajectory_frames : Dict[int, List[int]]
Dictionary mapping trajectory indices to frame indices
Returns
- None
Updates the trajectory_frames attribute
Examples
>>> selector_data = DataSelectorData("test")
>>> selector_data.set_trajectory_frames({0: [0, 10, 20], 1: [5, 15]})
>>> print(selector_data.trajectory_frames)
{0: [0, 10, 20], 1: [5, 15]}
- get_trajectory_frames() -> Dict[int, List[int]]
Get the selected frame indices per trajectory.
Returns
- Dict[int, List[int]]
Dictionary mapping trajectory indices to frame indices
Examples
>>> traj_frames = selector_data.get_trajectory_frames()
>>> print(f"Trajectory 0 frames: {traj_frames.get(0, [])}")
- append_selection_criteria(criteria: Dict[str, Any]) -> None
Add selection criteria to the chronological operations list.
Creates the operations list on the first call and appends to it on subsequent calls. The chronological order of operations is preserved so that the frame selection process is fully reproducible.
Parameters
- criteria : Dict[str, Any]
Dictionary containing selection criteria for this operation
Returns
- None
Updates the selection_criteria attribute with operations list
Examples
>>> # First operation - creates operations list
>>> selector_data.append_selection_criteria({
...     "type": "cluster", "clustering": "conformations",
...     "cluster_ids": [0], "mode": "add"
... })
>>> # Second operation - appends to list
>>> selector_data.append_selection_criteria({
...     "type": "tags", "tags": ["system_A"], "mode": "intersect"
... })
>>> # Result: {"operations": [operation1, operation2]}
- get_selection_criteria() -> Dict[str, Any]
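The create-on-first-call, append-afterwards behaviour can be sketched with a plain dictionary. This is an illustration only: `append_criteria` is a hypothetical stand-in, and the `"operations"` key is taken from the result shown in the example above rather than from the library source.

```python
from typing import Any, Dict


def append_criteria(selection_criteria: Dict[str, Any], criteria: Dict[str, Any]) -> None:
    """Hypothetical stand-in: record one operation in chronological order."""
    # setdefault creates the "operations" list on the first call only;
    # later calls reuse the existing list and append to its end.
    selection_criteria.setdefault("operations", []).append(criteria)


selection_criteria: Dict[str, Any] = {}
append_criteria(
    selection_criteria,
    {"type": "cluster", "clustering": "conformations", "cluster_ids": [0], "mode": "add"},
)
append_criteria(
    selection_criteria,
    {"type": "tags", "tags": ["system_A"], "mode": "intersect"},
)

# The order of the list reflects the order of the calls.
op_types = [op["type"] for op in selection_criteria["operations"]]
print(op_types)
```

Because the list is only ever appended to, replaying the operations in order should reproduce the same selection, which is the reproducibility property the description refers to.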
Get the selection criteria.
Returns
- Dict[str, Any]
Dictionary containing selection criteria
Examples
>>> criteria = selector_data.get_selection_criteria()
>>> print(f"Selection type: {criteria.get('type', 'unknown')}")
- add_trajectory_frames(trajectory_frames: Dict[int, List[int]]) -> None
Add additional frame indices to existing trajectories.
Parameters
- trajectory_frames : Dict[int, List[int]]
Dictionary mapping trajectory indices to additional frame indices
Returns
- None
Extends existing trajectory frame lists with new indices
Examples
>>> selector_data.set_trajectory_frames({0: [0, 5, 10]})
>>> selector_data.add_trajectory_frames({0: [15, 20], 1: [2, 7]})
>>> print(selector_data.trajectory_frames)
{0: [0, 5, 10, 15, 20], 1: [2, 7]}
- remove_duplicates() -> None
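The merge behaviour shown in the example, where known trajectories are extended and unknown ones are created, can be sketched as follows. `add_frames` is a hypothetical stand-in, and it assumes that no de-duplication happens at this stage.

```python
from typing import Dict, List


def add_frames(existing: Dict[int, List[int]], extra: Dict[int, List[int]]) -> None:
    """Hypothetical stand-in: extend per-trajectory frame lists in place."""
    for traj_idx, indices in extra.items():
        # Trajectories not seen before get a fresh list; known ones are extended.
        existing.setdefault(traj_idx, []).extend(indices)


frames: Dict[int, List[int]] = {0: [0, 5, 10]}
add_frames(frames, {0: [15, 20], 1: [2, 7]})
print(frames)
```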
Remove duplicate frame indices and sort the lists for each trajectory.
Returns
- None
Removes duplicates from trajectory frame lists and sorts them
Examples
>>> selector_data.trajectory_frames = {0: [5, 1, 5, 3], 1: [8, 1, 8]}
>>> selector_data.remove_duplicates()
>>> print(selector_data.trajectory_frames)
{0: [1, 3, 5], 1: [1, 8]}
- clear_selection() -> None
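One straightforward way to obtain the result shown above is to pass each list through `set` and `sorted`. This is a sketch of the idea under that assumption, with `dedupe_and_sort` as a hypothetical helper. It is not necessarily how the library implements it.

```python
from typing import Dict, List


def dedupe_and_sort(frames: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Hypothetical helper: unique, ascending frame indices per trajectory."""
    # set() drops repeated indices; sorted() returns them as an ascending list.
    return {traj_idx: sorted(set(indices)) for traj_idx, indices in frames.items()}


cleaned = dedupe_and_sort({0: [5, 1, 5, 3], 1: [8, 1, 8]})
print(cleaned)
```

Duplicates are removed within each trajectory only, so index 1 survives in both trajectory 0 and trajectory 1.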
Clear all selected frame indices and criteria.
Returns
- None
Resets trajectory_frames and selection_criteria to empty
Examples
>>> selector_data.clear_selection()
>>> print(selector_data.n_selected_frames)
0
>>> print(selector_data.selection_criteria)
{}
- get_selection_info() -> Dict[str, Any]
Get summary information about this selection.
Returns
- Dict[str, Any]
Dictionary with selection summary information
Examples
>>> info = selector_data.get_selection_info()
>>> print(f"Name: {info['name']}")
>>> print(f"Frames: {info['n_frames']}")
>>> print(f"Type: {info['selection_type']}")
- is_empty() -> bool
Check if the selection is empty.
Returns
- bool
True if no frames are selected, False otherwise
Examples
>>> empty_selector = DataSelectorData("empty")
>>> print(empty_selector.is_empty())
True
- save(save_path: str) -> None
Save DataSelectorData object to disk.
Parameters
- save_path : str
Path where to save the DataSelectorData object
Returns
- None
Saves the DataSelectorData object to the specified path
Examples
>>> data_selector_data.save('analysis_results/folded_frames.pkl')
- load(load_path: str) -> None
Load DataSelectorData object from disk.
Parameters
- load_path : str
Path to the saved DataSelectorData file
Returns
- None
Loads the DataSelectorData object from the specified path
Examples
>>> data_selector_data.load('analysis_results/folded_frames.pkl')
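The `.pkl` extension in the examples suggests pickle-based persistence, although the on-disk format is not stated here. Assuming that, a save/load round trip of the same kind of state can be sketched with the standard library alone. The `state` dictionary is a hypothetical stand-in for the object's attributes.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the attributes a DataSelectorData object holds.
state = {
    "name": "folded_frames",
    "trajectory_frames": {0: [0, 5, 12], 1: [18, 25]},
    "selection_criteria": {},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "folded_frames.pkl")
    # Write the state to disk, then read it back from the same path.
    with open(path, "wb") as handle:
        pickle.dump(state, handle)
    with open(path, "rb") as handle:
        restored = pickle.load(handle)

print(restored == state)
```

Pickle keeps the integer trajectory keys as integers, which a JSON round trip would not. Pickle files should only be loaded from trusted sources.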
- print_info() -> None
Print comprehensive data selector information.
Parameters
None
Returns
- None
Prints data selector information to console
Examples
>>> data_selector_data.print_info()
=== DataSelectorData ===
Name: folded_frames
Selected Frames: 50 frames from 3 trajectories
Selection Type: cluster
Frame Distribution: traj0:0-45 (15), traj1:10-89 (20), traj2:5-95 (15)