Data Selector Data

Data selector data entity for storing frame selection configurations.

This module contains the DataSelectorData class that stores frame selection configurations including selected frame indices and selection criteria. This is the row-selection counterpart to FeatureSelectorData (which selects columns).

class mdxplain.data_selector.entities.data_selector_data.DataSelectorData(name: str)

Data entity for storing trajectory-specific data selector configurations.

Stores frame indices per trajectory and selection criteria for trajectory frame selection. This entity serves as the counterpart to FeatureSelectorData, focusing on row selection instead of column selection.

DataSelectorData works with the 2-level dictionary structure in which each trajectory has its own list of frame indices, in line with the trajectory-specific architecture.

Attributes

name : str

Name identifier for this data selector configuration

trajectory_frames : Dict[int, List[int]]

Dictionary mapping trajectory indices to lists of selected frame indices

selection_criteria : Dict[str, Any]

Dictionary containing selection criteria and operations history
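
As a rough illustration of the attribute layout described above, the following sketch uses plain Python containers instead of the DataSelectorData class itself. The variable names mirror the attributes, the values are made up, and the "operations" key follows the result shown for append_selection_criteria.

```python
from typing import Any, Dict, List

# Stand-in for the trajectory_frames attribute:
# trajectory index -> selected frame indices (illustrative values).
trajectory_frames: Dict[int, List[int]] = {
    0: [0, 5, 12],
    1: [18, 25],
}

# Stand-in for the selection_criteria attribute (illustrative values).
selection_criteria: Dict[str, Any] = {
    "operations": [
        {"type": "cluster", "clustering": "conformations", "cluster_ids": [0]},
    ]
}

# Lookups are per trajectory; a trajectory without selected frames has no entry.
frames_traj_1 = trajectory_frames.get(1, [])
frames_traj_7 = trajectory_frames.get(7, [])
print(frames_traj_1)
print(frames_traj_7)
```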

Examples

>>> selector_data = DataSelectorData("folded_frames")
>>> selector_data.set_trajectory_frames({0: [0, 5, 12], 1: [18, 25]})
>>> print(f"Selected {selector_data.n_selected_frames} frames")
Selected 5 frames
>>> # With selection criteria
>>> criteria = {"type": "cluster", "clustering": "conformations", "cluster_ids": [0]}
>>> selector_data.append_selection_criteria(criteria)
>>> print(selector_data.get_selection_info())
__init__(name: str) -> None

Initialize data selector data with given name.

Parameters

name : str

Name identifier for this data selector configuration

Returns

None

Initializes empty DataSelectorData with given name

Examples

>>> selector_data = DataSelectorData("active_site_frames")
>>> print(selector_data.name)
active_site_frames
>>> print(len(selector_data.trajectory_frames))
0
property n_selected_frames: int

Get total number of selected frames across all trajectories.

Returns

int

Total number of frames in the selection

Examples

>>> selector_data.set_trajectory_frames({0: [1, 5, 10], 1: [2, 7]})
>>> print(selector_data.n_selected_frames)
5
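
The total is presumably the sum of the per-trajectory list lengths. A minimal sketch of that computation on a plain dictionary follows; count_selected_frames is a hypothetical helper, not part of mdxplain, and it assumes duplicates are counted as they appear in the lists.

```python
from typing import Dict, List


def count_selected_frames(trajectory_frames: Dict[int, List[int]]) -> int:
    """Sum the lengths of all per-trajectory frame lists (hypothetical helper)."""
    return sum(len(frames) for frames in trajectory_frames.values())


total = count_selected_frames({0: [1, 5, 10], 1: [2, 7]})
print(total)
```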
set_trajectory_frames(trajectory_frames: Dict[int, List[int]]) -> None

Set the selected frame indices per trajectory.

Parameters

trajectory_frames : Dict[int, List[int]]

Dictionary mapping trajectory indices to frame indices

Returns

None

Updates the trajectory_frames attribute

Examples

>>> selector_data = DataSelectorData("test")
>>> selector_data.set_trajectory_frames({0: [0, 10, 20], 1: [5, 15]})
>>> print(selector_data.trajectory_frames)
{0: [0, 10, 20], 1: [5, 15]}
get_trajectory_frames() -> Dict[int, List[int]]

Get the selected frame indices per trajectory.

Returns

Dict[int, List[int]]

Dictionary mapping trajectory indices to frame indices

Examples

>>> traj_frames = selector_data.get_trajectory_frames()
>>> print(f"Trajectory 0 frames: {traj_frames.get(0, [])}")
append_selection_criteria(criteria: Dict[str, Any]) -> None

Append selection criteria to the chronological operations list.

Creates the operations list on the first call and appends to it on subsequent calls. The chronological order of operations is preserved so that the frame selection process is fully reproducible.

Parameters

criteria : Dict[str, Any]

Dictionary containing selection criteria for this operation

Returns

None

Updates the selection_criteria attribute with operations list

Examples

>>> # First operation - creates operations list
>>> selector_data.append_selection_criteria({
...     "type": "cluster", "clustering": "conformations", 
...     "cluster_ids": [0], "mode": "add"
... })
>>> # Second operation - appends to list
>>> selector_data.append_selection_criteria({
...     "type": "tags", "tags": ["system_A"], "mode": "intersect"
... })
>>> # Result: {"operations": [operation1, operation2]}
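
The create-then-append behaviour described above can be sketched on a plain dictionary. append_criteria is a hypothetical stand-in, assuming the criteria live under an "operations" key as in the result comment above; the actual implementation may differ.

```python
from typing import Any, Dict


def append_criteria(selection_criteria: Dict[str, Any], criteria: Dict[str, Any]) -> None:
    """Append one operation, creating the 'operations' list on first use."""
    selection_criteria.setdefault("operations", []).append(criteria)


history: Dict[str, Any] = {}
append_criteria(history, {"type": "cluster", "cluster_ids": [0], "mode": "add"})
append_criteria(history, {"type": "tags", "tags": ["system_A"], "mode": "intersect"})

# Operations stay in the order in which they were applied.
op_types = [op["type"] for op in history["operations"]]
print(op_types)
```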
get_selection_criteria() -> Dict[str, Any]

Get the selection criteria.

Returns

Dict[str, Any]

Dictionary containing selection criteria

Examples

>>> criteria = selector_data.get_selection_criteria()
>>> print(f"Operations: {criteria.get('operations', [])}")
add_trajectory_frames(trajectory_frames: Dict[int, List[int]]) -> None

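Add additional frame indices to the selection. Frame lists of existing trajectories are extended; in the example below, a trajectory not yet present (1) is added as well.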

Parameters

trajectory_frames : Dict[int, List[int]]

Dictionary mapping trajectory indices to additional frame indices

Returns

None

Extends existing trajectory frame lists with new indices

Examples

>>> selector_data.set_trajectory_frames({0: [0, 5, 10]})
>>> selector_data.add_trajectory_frames({0: [15, 20], 1: [2, 7]})
>>> print(selector_data.trajectory_frames)
{0: [0, 5, 10, 15, 20], 1: [2, 7]}
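
A minimal sketch of the extend semantics shown in the example, written against a plain dictionary. extend_frames is a hypothetical helper and only reproduces the behaviour visible above: new indices are appended, and missing trajectories are created.

```python
from typing import Dict, List


def extend_frames(existing: Dict[int, List[int]], additional: Dict[int, List[int]]) -> None:
    """Extend per-trajectory frame lists in place, creating missing entries."""
    for traj_idx, frames in additional.items():
        existing.setdefault(traj_idx, []).extend(frames)


selected = {0: [0, 5, 10]}
extend_frames(selected, {0: [15, 20], 1: [2, 7]})
print(selected)
```

Note that this sketch does not deduplicate; in the documented API that step belongs to remove_duplicates.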
remove_duplicates() -> None

Remove duplicate frame indices and sort the lists for each trajectory.

Returns

None

Removes duplicates from trajectory frame lists and sorts them

Examples

>>> selector_data.trajectory_frames = {0: [5, 1, 5, 3], 1: [8, 1, 8]}
>>> selector_data.remove_duplicates()
>>> print(selector_data.trajectory_frames)
{0: [1, 3, 5], 1: [1, 8]}
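
One way to obtain the result above is to pass each list through a set and sort it. The sketch below uses a hypothetical helper, dedupe_and_sort, that returns a new dictionary, whereas the documented method updates trajectory_frames in place.

```python
from typing import Dict, List


def dedupe_and_sort(trajectory_frames: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Return a copy with duplicates removed and each frame list sorted."""
    return {
        traj_idx: sorted(set(frames))
        for traj_idx, frames in trajectory_frames.items()
    }


cleaned = dedupe_and_sort({0: [5, 1, 5, 3], 1: [8, 1, 8]})
print(cleaned)
```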
clear_selection() -> None

Clear all selected frame indices and criteria.

Returns

None

Resets trajectory_frames and selection_criteria to empty

Examples

>>> selector_data.clear_selection()
>>> print(selector_data.n_selected_frames)
0
>>> print(selector_data.selection_criteria)
{}
get_selection_info() -> Dict[str, Any]

Get summary information about this selection.

Returns

Dict[str, Any]

Dictionary with selection summary information

Examples

>>> info = selector_data.get_selection_info()
>>> print(f"Name: {info['name']}")
>>> print(f"Frames: {info['n_frames']}")
>>> print(f"Type: {info['selection_type']}")
is_empty() -> bool

Check if the selection is empty.

Returns

bool

True if no frames are selected, False otherwise

Examples

>>> empty_selector = DataSelectorData("empty")
>>> print(empty_selector.is_empty())
True
save(save_path: str) -> None

Save DataSelectorData object to disk.

Parameters

save_path : str

Path where to save the DataSelectorData object

Returns

None

Saves the DataSelectorData object to the specified path

Examples

>>> data_selector_data.save('analysis_results/folded_frames.pkl')
load(load_path: str) -> None

Load DataSelectorData object from disk.

Parameters

load_path : str

Path to the saved DataSelectorData file

Returns

None

Loads the DataSelectorData object from the specified path

Examples

>>> data_selector_data.load('analysis_results/folded_frames.pkl')
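
The on-disk format is not specified here. The .pkl extension in the examples suggests pickle-style serialization, but that is an assumption. Under that assumption, a save/load round trip of the stored state could look roughly like this, using a plain dictionary and a temporary directory instead of the real class and paths.

```python
import os
import pickle
import tempfile

# Plain-dict stand-in for the state a DataSelectorData object holds.
state = {
    "name": "folded_frames",
    "trajectory_frames": {0: [0, 5, 12], 1: [18, 25]},
    "selection_criteria": {},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "folded_frames.pkl")
    # Save: write the state to disk.
    with open(path, "wb") as fh:
        pickle.dump(state, fh)
    # Load: read it back into a new object.
    with open(path, "rb") as fh:
        restored = pickle.load(fh)

print(restored == state)
```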
print_info() -> None

Print comprehensive data selector information.

Parameters

None

Returns

None

Prints data selector information to console

Examples

>>> data_selector_data.print_info()
=== DataSelectorData ===
Name: folded_frames
Selected Frames: 50 frames from 3 trajectories
Selection Type: cluster
Frame Distribution: traj0:0-45 (15), traj1:10-89 (20), traj2:5-95 (15)
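
The "Frame Distribution" line appears to list, per trajectory, the lowest and highest selected frame index followed by the frame count in parentheses. That reading is an assumption based on the sample output. A sketch of a formatter following it, with format_frame_distribution as a hypothetical helper:

```python
from typing import Dict, List


def format_frame_distribution(trajectory_frames: Dict[int, List[int]]) -> str:
    """Format each trajectory as 'traj<idx>:<min>-<max> (<count>)'."""
    parts = []
    for traj_idx in sorted(trajectory_frames):
        frames = trajectory_frames[traj_idx]
        if not frames:
            continue  # skip trajectories without selected frames
        parts.append(f"traj{traj_idx}:{min(frames)}-{max(frames)} ({len(frames)})")
    return ", ".join(parts)


line = format_frame_distribution({0: [0, 20, 45], 1: [10, 89]})
print(line)
```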