Feature Selection Entities


Feature selector data entity for storing selection configurations.

This module contains the FeatureSelectorData class that stores feature selection configurations including selection criteria and data type preferences.

class mdxplain.feature_selection.entities.feature_selector_data.FeatureSelectorData(name: str)

Data entity for storing feature selector configurations.

Stores selection criteria for different feature types, allowing multiple selections per feature type with different data preferences (original vs reduced).

This entity follows the same pattern as ClusterData and DecompositionData, providing a container for feature selector configurations and results that can be stored and retrieved by name.

Attributes

name : str

Name identifier for this feature selector configuration

selections : Dict[str, List[dict]]

Dictionary mapping feature type keys to lists of selection dictionaries. Each selection dict contains:

  • ‘selection’: Selection criteria string (e.g., “res ALA”, “resid 123-140”)

  • ‘use_reduced’: Boolean flag for data type preference

selection_results : Dict[str, dict]

Dictionary mapping feature type keys to computed selection results. Each result dict contains:

  • ‘indices’: List of selected column indices

  • ‘use_reduced’: List of boolean flags for reduced data usage

reference_trajectory : int, optional

Trajectory index to use as reference for metadata extraction
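
As a rough picture of the two dictionaries described above, the following sketch builds them as plain Python literals. The feature keys, selection strings, and index values are illustrative only, and the flat result layout follows the attribute description rather than any verified internal format.

```python
from typing import Dict, List

# Hypothetical contents of `selections`: one list of selection dicts per feature type.
selections: Dict[str, List[dict]] = {
    "distances": [
        {"selection": "res ALA", "use_reduced": False},
        {"selection": "resid 123-140", "use_reduced": False},
    ],
    "contacts": [
        {"selection": "resid 120-140", "use_reduced": True},
    ],
}

# Hypothetical contents of `selection_results`: column indices plus one flag per index.
# All numbers here are made up for illustration.
selection_results: Dict[str, dict] = {
    "distances": {"indices": [0, 4, 7], "use_reduced": [False, False, False]},
    "contacts": {"indices": [2, 3], "use_reduced": [True, True]},
}

for feature_key, entries in selections.items():
    n_selected = len(selection_results[feature_key]["indices"])
    print(f"{feature_key}: {len(entries)} selection(s), {n_selected} column(s)")
```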

Examples

>>> selector_data = FeatureSelectorData("my_analysis")
>>> selector_data.add_selection("distances", "res ALA", use_reduced=False)
>>> selector_data.add_selection("contacts", "resid 120-140", use_reduced=True)
__init__(name: str)

Initialize feature selector data with given name.

Parameters

name : str

Name identifier for this feature selector configuration

Returns

None

Initializes empty FeatureSelectorData with given name

Examples

>>> selector_data = FeatureSelectorData("protein_analysis")
>>> print(selector_data.name)
protein_analysis
add_selection(feature_key: str, selection: str, use_reduced: bool = False, common_denominator: bool = True, traj_selection: int | str | List[int | str] = 'all', require_all_partners: bool = False) -> None

Add a selection configuration for a feature type.

Parameters

feature_key : str

Feature type key (e.g., “distances”, “contacts”)

selection : str

Selection criteria string (e.g., “res ALA”, “resid 123-140”, “7x50-8x50”, “all”)

use_reduced : bool, default=False

Whether to use reduced data (True) or original data (False)

common_denominator : bool, default=True

Whether to find common features across trajectories in traj_selection

traj_selection : int, str, list, or “all”, default=“all”

Selection of trajectories to process

require_all_partners : bool, default=False

For pairwise features, require all partners to be present in selection

Returns

None

Adds selection configuration to the selections dictionary

Examples

>>> selector_data = FeatureSelectorData("analysis")
>>> selector_data.add_selection("distances", "res ALA")
>>> selector_data.add_selection("contacts", "resid 120-140", use_reduced=True)
>>> print(len(selector_data.selections["distances"]))
1
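
The common_denominator option is described only as finding common features across the selected trajectories. One way to picture that idea is a set intersection over per-trajectory feature labels. The sketch below is an assumption about the concept, not mdxplain's implementation; the helper `common_features` and the labels are hypothetical.

```python
from typing import Dict, List


def common_features(features_per_traj: Dict[int, List[str]]) -> List[str]:
    """Return the feature labels present in every trajectory (hypothetical helper)."""
    if not features_per_traj:
        return []
    label_sets = [set(labels) for labels in features_per_traj.values()]
    shared = set.intersection(*label_sets)
    # Keep the order of the first trajectory so the result is reproducible.
    first = next(iter(features_per_traj.values()))
    return [label for label in first if label in shared]


# Made-up pairwise feature labels for two trajectories.
features = {
    0: ["ALA1-GLY5", "ALA1-HIS9", "GLY5-HIS9"],
    1: ["ALA1-GLY5", "GLY5-HIS9", "GLY5-SER12"],
}
print(common_features(features))
```

Only labels found in both trajectories survive, so a feature that exists in a single trajectory would be dropped under this reading of the option.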
get_selections(feature_key: str) -> List[dict]

Get all selection configurations for a feature type.

Parameters

feature_key : str

Feature type key to get selections for

Returns

List[dict]

List of selection dictionaries for the feature type. Returns empty list if feature_key not found.

Examples

>>> selector_data = FeatureSelectorData("analysis")
>>> selector_data.add_selection("distances", "res ALA")
>>> selector_data.add_selection("distances", "res HIS")
>>> selections = selector_data.get_selections("distances")
>>> print(len(selections))
2
has_feature(feature_key: str) -> bool

Check if feature type has any selections configured.

Parameters

feature_key : str

Feature type key to check

Returns

bool

True if feature type has selections, False otherwise

Examples

>>> selector_data = FeatureSelectorData("analysis")
>>> selector_data.add_selection("distances", "res ALA")
>>> print(selector_data.has_feature("distances"))
True
>>> print(selector_data.has_feature("contacts"))
False
get_feature_keys() -> List[str]

Get list of all configured feature type keys.

Returns

List[str]

List of feature type keys that have selections configured

Examples

>>> selector_data = FeatureSelectorData("analysis")
>>> selector_data.add_selection("distances", "res ALA")
>>> selector_data.add_selection("contacts", "resid 120")
>>> keys = selector_data.get_feature_keys()
>>> print(sorted(keys))
['contacts', 'distances']
clear_selections(feature_key: str = None) -> None

Clear selections for a feature type or all selections.

Parameters

feature_key : str, optional

Feature type key to clear selections for. If None, clears all selections.

Returns

None

Clears specified or all selections

Examples

>>> selector_data = FeatureSelectorData("analysis")
>>> selector_data.add_selection("distances", "res ALA")
>>> selector_data.add_selection("contacts", "resid 120")
>>> selector_data.clear_selections("distances")  # Clear only distances
>>> print(selector_data.has_feature("distances"))
False
>>> selector_data.clear_selections()  # Clear all
>>> print(len(selector_data.get_feature_keys()))
0
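
The selection-management methods documented so far can be pictured with a small stand-in class built on a dictionary of lists. This is a minimal sketch that assumes the behavior stated in the docstrings; `SelectionStore` is a hypothetical name and the code is not taken from mdxplain.

```python
from typing import Dict, List, Optional


class SelectionStore:
    """Hypothetical stand-in mirroring the documented selection methods."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.selections: Dict[str, List[dict]] = {}

    def add_selection(self, feature_key: str, selection: str, use_reduced: bool = False) -> None:
        # Several selections per feature type are kept in insertion order.
        self.selections.setdefault(feature_key, []).append(
            {"selection": selection, "use_reduced": use_reduced}
        )

    def get_selections(self, feature_key: str) -> List[dict]:
        # Unknown keys yield an empty list, as documented for get_selections.
        return self.selections.get(feature_key, [])

    def has_feature(self, feature_key: str) -> bool:
        return bool(self.selections.get(feature_key))

    def get_feature_keys(self) -> List[str]:
        return list(self.selections)

    def clear_selections(self, feature_key: Optional[str] = None) -> None:
        # None clears everything; otherwise only the given feature type is removed.
        if feature_key is None:
            self.selections.clear()
        else:
            self.selections.pop(feature_key, None)


store = SelectionStore("analysis")
store.add_selection("distances", "res ALA")
store.add_selection("distances", "res HIS")
store.add_selection("contacts", "resid 120", use_reduced=True)
store.clear_selections("distances")
print(store.has_feature("distances"), store.get_feature_keys())
```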
get_summary() -> dict

Get summary information about the selector configuration.

Returns

dict

Summary dictionary with configuration statistics

Examples

>>> selector_data = FeatureSelectorData("analysis")
>>> selector_data.add_selection("distances", "res ALA")
>>> selector_data.add_selection("contacts", "resid 120", use_reduced=True)
>>> summary = selector_data.get_summary()
>>> print(summary['name'])
analysis
>>> print(summary['feature_count'])
2
store_results(feature_key: str, result_data: dict) -> None

Store selection results for a feature type.

Stores the computed indices and metadata for a feature type selection. This method is called by FeatureSelectorManager after processing selections.

Parameters

feature_key : str

Feature type key to store results for

result_data : dict

Result dictionary containing:

  • ‘indices’: List of selected column indices

  • ‘use_reduced’: List of boolean flags for reduced data usage

Returns

None

Stores results in selection_results dictionary

Examples

>>> selector_data = FeatureSelectorData("analysis")
>>> traj_idx = 0  # trajectory index used as the dictionary key
>>> results = {"trajectory_indices": {traj_idx: {"indices": [...], "use_reduced": [...]}}}
>>> selector_data.store_results("distances", results)
>>> print(selector_data.has_results("distances"))
True
get_results(feature_key: str) -> dict

Get selection results for a feature type.

Retrieves the stored selection results (indices and flags) for a specific feature type. Returns empty dict if no results found.

Parameters

feature_key : str

Feature type key to get results for

Returns

dict

Result dictionary with ‘indices’ and ‘use_reduced’ keys, or empty dict if no results stored

Examples

>>> selector_data = FeatureSelectorData("analysis")
>>> results = selector_data.get_results("distances")
>>> if results:
...     print(f"Selected {len(results['indices'])} features")
get_all_results() -> Dict[str, dict]

Get all stored selection results.

Returns the complete selection_results dictionary containing results for all feature types.

Returns

Dict[str, dict]

Dictionary mapping feature keys to their result dictionaries

Examples

>>> selector_data = FeatureSelectorData("analysis")
>>> all_results = selector_data.get_all_results()
>>> for feature_key, results in all_results.items():
...     print(f"{feature_key}: {len(results['indices'])} selected")
has_results(feature_key: str = None) -> bool

Check if selection results are available.

Parameters

feature_key : str, optional

Feature type key to check. If None, checks if any results exist.

Returns

bool

True if results are available, False otherwise

Examples

>>> selector_data = FeatureSelectorData("analysis")
>>> print(selector_data.has_results())  # Any results?
False
>>> print(selector_data.has_results("distances"))  # Specific feature?
False
clear_results(feature_key: str = None) -> None

Clear stored selection results.

Parameters

feature_key : str, optional

Feature type key to clear results for. If None, clears all results.

Returns

None

Clears specified or all selection results

Examples

>>> selector_data = FeatureSelectorData("analysis")
>>> selector_data.clear_results("distances")  # Clear only distances
>>> selector_data.clear_results()  # Clear all results
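
The result methods follow the same store, query, and clear pattern as the selection methods. The sketch below shows one way that lifecycle could behave, assuming the semantics given in the docstrings above; `ResultStore` is a hypothetical stand-in and the stored values are made up.

```python
from typing import Dict, Optional


class ResultStore:
    """Hypothetical stand-in mirroring the documented result methods."""

    def __init__(self) -> None:
        self.selection_results: Dict[str, dict] = {}

    def store_results(self, feature_key: str, result_data: dict) -> None:
        self.selection_results[feature_key] = result_data

    def get_results(self, feature_key: str) -> dict:
        # An empty dict signals that nothing has been stored for this key.
        return self.selection_results.get(feature_key, {})

    def has_results(self, feature_key: Optional[str] = None) -> bool:
        # With no key, report whether any results exist at all.
        if feature_key is None:
            return len(self.selection_results) > 0
        return feature_key in self.selection_results

    def clear_results(self, feature_key: Optional[str] = None) -> None:
        if feature_key is None:
            self.selection_results.clear()
        else:
            self.selection_results.pop(feature_key, None)


results = ResultStore()
print(results.has_results())
results.store_results("distances", {"indices": [0, 4, 7], "use_reduced": [False, False, False]})
print(results.has_results("distances"), results.has_results("contacts"))
results.clear_results("distances")
print(results.get_results("distances"))
```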
set_reference_trajectory(reference_traj: int) -> None

Set reference trajectory for metadata extraction.

Parameters

reference_traj : int

Trajectory index to use as reference for metadata

Returns

None

Sets reference trajectory in the selector data

get_reference_trajectory() -> int | None

Get reference trajectory for metadata extraction.

Returns

Optional[int]

Reference trajectory index, or None if not set

set_n_columns(n_columns: int) -> None

Set total number of columns in the final selection matrix.

Parameters

n_columns : int

Total number of columns across all features and trajectories

Returns

None

Sets the n_columns attribute

get_n_columns() -> int | None

Get total number of columns in the final selection matrix.

Returns

Optional[int]

Total number of columns, or None if not calculated yet

save(save_path: str) -> None

Save FeatureSelectorData object to disk.

Parameters

save_path : str

Path where to save the FeatureSelectorData object

Returns

None

Saves the FeatureSelectorData object to the specified path

Examples

>>> feature_selector_data.save('analysis_results/feature_selection.pkl')
load(load_path: str) -> None

Load FeatureSelectorData object from disk.

Parameters

load_path : str

Path to the saved FeatureSelectorData file

Returns

None

Loads the FeatureSelectorData object from the specified path

Examples

>>> feature_selector_data.load('analysis_results/feature_selection.pkl')
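
Because load returns None, it presumably restores state into the existing object instead of returning a new one. The sketch below illustrates that in-place pattern with a save and load round trip. It assumes pickle serialization, which the `.pkl` extension in the examples suggests but the documentation does not confirm; `PicklableConfig` is a hypothetical stand-in, not the mdxplain class.

```python
import os
import pickle
import tempfile


class PicklableConfig:
    """Hypothetical stand-in showing in-place save/load; not the mdxplain class."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.selections = {}

    def save(self, save_path: str) -> None:
        # Assumption: the object's attributes are serialized with pickle.
        with open(save_path, "wb") as handle:
            pickle.dump(self.__dict__, handle)

    def load(self, load_path: str) -> None:
        # Restores state into this instance, matching the documented None return.
        with open(load_path, "rb") as handle:
            self.__dict__.update(pickle.load(handle))


original = PicklableConfig("analysis")
original.selections["distances"] = [{"selection": "res ALA", "use_reduced": False}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "feature_selection.pkl")
    original.save(path)
    restored = PicklableConfig("placeholder")
    restored.load(path)

print(restored.name, restored.selections)
```

If the real class does use pickle, the usual caution applies: unpickling can execute arbitrary code, so only files from trusted sources should be loaded.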
print_info() -> None

Print comprehensive feature selector information.

Parameters

None

Returns

None

Prints feature selector information to console

Examples

>>> feature_selector_data.print_info()
=== FeatureSelectorData ===
Name: analysis
Feature Types: 2 (distances, contacts)
Total Selections: 3
Selection Results: Available for 2 feature types