Feature Selection Entities
Feature selector data entity for storing selection configurations.
This module contains the FeatureSelectorData class, which stores feature selection configurations, including selection criteria and data type preferences.
- class mdxplain.feature_selection.entities.feature_selector_data.FeatureSelectorData(name: str)
Data entity for storing feature selector configurations.
Stores selection criteria for different feature types, allowing multiple selections per feature type with different data preferences (original vs reduced).
This entity follows the same pattern as ClusterData and DecompositionData, providing a container for feature selector configurations and results that can be stored and retrieved by name.
Attributes
- name : str
Name identifier for this feature selector configuration
- selections : Dict[str, List[dict]]
Dictionary mapping feature type keys to lists of selection dictionaries. Each selection dict contains:
‘selection’: Selection criteria string (e.g., “res ALA”, “resid 123-140”)
‘use_reduced’: Boolean flag for data type preference
- selection_results : Dict[str, dict]
Dictionary mapping feature type keys to computed selection results. Each result dict contains:
‘indices’: List of selected column indices
‘use_reduced’: List of boolean flags for reduced data usage
- reference_trajectory : int, optional
Trajectory index to use as reference for metadata extraction
Examples
>>> selector_data = FeatureSelectorData("my_analysis")
>>> selector_data.add_selection("distances", "res ALA", use_reduced=False)
>>> selector_data.add_selection("contacts", "resid 120-140", use_reduced=True)
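The two dictionaries described under Attributes can be pictured with plain Python literals. This is a minimal sketch of the documented shapes only: the selection strings and indices are illustrative values, not output from mdxplain, and pairing each selected index with exactly one `use_reduced` flag is an assumption.

```python
from typing import Dict, List

# Illustrative contents of FeatureSelectorData.selections:
# feature type key -> list of selection dicts (one per add_selection call).
selections: Dict[str, List[dict]] = {
    "distances": [
        {"selection": "res ALA", "use_reduced": False},
    ],
    "contacts": [
        {"selection": "resid 120-140", "use_reduced": True},
        {"selection": "res HIS", "use_reduced": False},
    ],
}

# Illustrative contents of FeatureSelectorData.selection_results:
# feature type key -> dict with selected column indices and per-column flags.
selection_results: Dict[str, dict] = {
    "distances": {"indices": [0, 4, 7], "use_reduced": [False, False, False]},
}

# Assumption: one use_reduced flag per selected column index.
for key, result in selection_results.items():
    assert len(result["indices"]) == len(result["use_reduced"])

total = sum(len(entries) for entries in selections.values())
print(f"{len(selections)} feature types, {total} selections")
```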
- __init__(name: str)
Initialize feature selector data with given name.
Parameters
- name : str
Name identifier for this feature selector configuration
Returns
- None
Initializes an empty FeatureSelectorData with the given name
Examples
>>> selector_data = FeatureSelectorData("protein_analysis")
>>> print(selector_data.name)
protein_analysis
- add_selection(feature_key: str, selection: str, use_reduced: bool = False, common_denominator: bool = True, traj_selection: int | str | List[int | str] = 'all', require_all_partners: bool = False) → None
Add a selection configuration for a feature type.
Parameters
- feature_key : str
Feature type key (e.g., “distances”, “contacts”)
- selection : str
Selection criteria string (e.g., “res ALA”, “resid 123-140”, “7x50-8x50”, “all”)
- use_reduced : bool, default=False
Whether to use reduced data (True) or original data (False)
- common_denominator : bool, default=True
Whether to find common features across trajectories in traj_selection
- traj_selection : int, str, list, or “all”, default=“all”
Selection of trajectories to process
- require_all_partners : bool, default=False
For pairwise features, require all partners to be present in the selection
Returns
- None
Adds selection configuration to the selections dictionary
Examples
>>> selector_data = FeatureSelectorData("analysis")
>>> selector_data.add_selection("distances", "res ALA")
>>> selector_data.add_selection("contacts", "resid 120-140", use_reduced=True)
>>> print(len(selector_data.selections["distances"]))
1
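The key behavior of add_selection is that repeated calls for the same feature key accumulate rather than overwrite, which is what allows several selections per feature type. The class below is a hypothetical stand-in (`SelectorSketch` is not part of mdxplain) that mirrors the documented signature; storing every argument in the selection dict is an assumption, since the documentation only guarantees the ‘selection’ and ‘use_reduced’ keys.

```python
from typing import Dict, List, Union

TrajSelection = Union[int, str, List[Union[int, str]]]


class SelectorSketch:
    """Hypothetical stand-in mirroring the documented add_selection behavior."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.selections: Dict[str, List[dict]] = {}

    def add_selection(
        self,
        feature_key: str,
        selection: str,
        use_reduced: bool = False,
        common_denominator: bool = True,
        traj_selection: TrajSelection = "all",
        require_all_partners: bool = False,
    ) -> None:
        # Assumption: every argument is kept in the selection dict; the
        # documentation only guarantees 'selection' and 'use_reduced'.
        entry = {
            "selection": selection,
            "use_reduced": use_reduced,
            "common_denominator": common_denominator,
            "traj_selection": traj_selection,
            "require_all_partners": require_all_partners,
        }
        # Repeated calls for the same key append instead of overwriting.
        self.selections.setdefault(feature_key, []).append(entry)


sketch = SelectorSketch("analysis")
sketch.add_selection("distances", "res ALA")
sketch.add_selection("distances", "resid 123-140", use_reduced=True)
print(len(sketch.selections["distances"]))  # 2
```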
- get_selections(feature_key: str) → List[dict]
Get all selection configurations for a feature type.
Parameters
- feature_key : str
Feature type key to get selections for
Returns
- List[dict]
List of selection dictionaries for the feature type. Returns an empty list if feature_key is not found.
Examples
>>> selector_data = FeatureSelectorData("analysis")
>>> selector_data.add_selection("distances", "res ALA")
>>> selector_data.add_selection("distances", "res HIS")
>>> selections = selector_data.get_selections("distances")
>>> print(len(selections))
2
- has_feature(feature_key: str) → bool
Check if feature type has any selections configured.
Parameters
- feature_key : str
Feature type key to check
Returns
- bool
True if feature type has selections, False otherwise
Examples
>>> selector_data = FeatureSelectorData("analysis")
>>> selector_data.add_selection("distances", "res ALA")
>>> print(selector_data.has_feature("distances"))
True
>>> print(selector_data.has_feature("contacts"))
False
- get_feature_keys() → List[str]
Get list of all configured feature type keys.
Returns
- List[str]
List of feature type keys that have selections configured
Examples
>>> selector_data = FeatureSelectorData("analysis")
>>> selector_data.add_selection("distances", "res ALA")
>>> selector_data.add_selection("contacts", "resid 120")
>>> keys = selector_data.get_feature_keys()
>>> print(sorted(keys))
['contacts', 'distances']
- clear_selections(feature_key: str = None) → None
Clear selections for a feature type or all selections.
Parameters
- feature_key : str, optional
Feature type key to clear selections for. If None, clears all selections.
Returns
- None
Clears specified or all selections
Examples
>>> selector_data = FeatureSelectorData("analysis")
>>> selector_data.add_selection("distances", "res ALA")
>>> selector_data.add_selection("contacts", "resid 120")
>>> selector_data.clear_selections("distances")  # Clear only distances
>>> print(selector_data.has_feature("distances"))
False
>>> selector_data.clear_selections()  # Clear all
>>> print(len(selector_data.get_feature_keys()))
0
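The query and clear methods above are thin views over the selections dictionary: lookups of unknown keys return an empty result instead of raising, and passing no key to clear_selections wipes everything. The following is a minimal hypothetical stand-in (`SelectionQuerySketch` is not part of mdxplain) that reproduces the documented outcomes; treating a missing key in clear_selections as a no-op is an assumption.

```python
from typing import Dict, List, Optional


class SelectionQuerySketch:
    """Hypothetical stand-in for the selection query and clear methods."""

    def __init__(self) -> None:
        self.selections: Dict[str, List[dict]] = {}

    def add_selection(self, feature_key: str, selection: str, use_reduced: bool = False) -> None:
        self.selections.setdefault(feature_key, []).append(
            {"selection": selection, "use_reduced": use_reduced}
        )

    def get_selections(self, feature_key: str) -> List[dict]:
        # Unknown keys yield an empty list rather than raising KeyError.
        return self.selections.get(feature_key, [])

    def has_feature(self, feature_key: str) -> bool:
        return len(self.get_selections(feature_key)) > 0

    def get_feature_keys(self) -> List[str]:
        return list(self.selections.keys())

    def clear_selections(self, feature_key: Optional[str] = None) -> None:
        if feature_key is None:
            self.selections.clear()  # no key given: drop every feature type
        else:
            # Assumption: clearing a key that was never added is a no-op.
            self.selections.pop(feature_key, None)


s = SelectionQuerySketch()
s.add_selection("distances", "res ALA")
s.add_selection("contacts", "resid 120")
print(s.get_selections("angles"))  # []
s.clear_selections("distances")
print(s.get_feature_keys())  # ['contacts']
```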
- get_summary() → dict
Get summary information about the selector configuration.
Returns
- dict
Summary dictionary with configuration statistics
Examples
>>> selector_data = FeatureSelectorData("analysis")
>>> selector_data.add_selection("distances", "res ALA")
>>> selector_data.add_selection("contacts", "resid 120", use_reduced=True)
>>> summary = selector_data.get_summary()
>>> print(summary['name'])
analysis
>>> print(summary['feature_count'])
2
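Only the ‘name’ and ‘feature_count’ keys of the summary are documented. The hypothetical helper below (`summarize` is not an mdxplain function) sketches one plausible way to derive such statistics from the selections dictionary, assuming that feature_count counts distinct feature types rather than individual selections; the print_info output further down draws the same distinction. The ‘total_selections’ key is invented for illustration.

```python
from typing import Dict, List


def summarize(name: str, selections: Dict[str, List[dict]]) -> dict:
    """Hypothetical summary builder; only 'name' and 'feature_count' are documented keys."""
    return {
        "name": name,
        # Assumption: feature_count counts distinct feature types.
        "feature_count": len(selections),
        # Hypothetical extra key: number of individual selection entries.
        "total_selections": sum(len(entries) for entries in selections.values()),
    }


summary = summarize(
    "analysis",
    {
        "distances": [{"selection": "res ALA", "use_reduced": False}],
        "contacts": [
            {"selection": "resid 120", "use_reduced": True},
            {"selection": "res HIS", "use_reduced": False},
        ],
    },
)
print(summary)
```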
- store_results(feature_key: str, result_data: dict) → None
Store selection results for a feature type.
Stores the computed indices and metadata for a feature type selection. This method is called by FeatureSelectorManager after processing selections.
Parameters
- feature_key : str
Feature type key to store results for
- result_data : dict
Result dictionary containing:
‘indices’: List of selected column indices
‘use_reduced’: List of boolean flags for reduced data usage
Returns
- None
Stores results in selection_results dictionary
Examples
>>> selector_data = FeatureSelectorData("analysis")
>>> results = {"trajectory_indices": {traj_idx: {"indices": [...], "use_reduced": [...]}}}
>>> selector_data.store_results("distances", results)
>>> print(selector_data.has_results("distances"))
True
- get_results(feature_key: str) → dict
Get selection results for a feature type.
Retrieves the stored selection results (indices and flags) for a specific feature type. Returns an empty dict if no results are found.
Parameters
- feature_key : str
Feature type key to get results for
Returns
- dict
Result dictionary with ‘indices’ and ‘use_reduced’ keys, or an empty dict if no results are stored
Examples
>>> selector_data = FeatureSelectorData("analysis")
>>> results = selector_data.get_results("distances")
>>> if results:
...     print(f"Selected {len(results['indices'])} features")
- get_all_results() → Dict[str, dict]
Get all stored selection results.
Returns the complete selection_results dictionary containing results for all feature types.
Returns
- Dict[str, dict]
Dictionary mapping feature keys to their result dictionaries
Examples
>>> selector_data = FeatureSelectorData("analysis")
>>> all_results = selector_data.get_all_results()
>>> for feature_key, results in all_results.items():
...     print(f"{feature_key}: {len(results['indices'])} selected")
- has_results(feature_key: str = None) → bool
Check if selection results are available.
Parameters
- feature_key : str, optional
Feature type key to check. If None, checks if any results exist.
Returns
- bool
True if results are available, False otherwise
Examples
>>> selector_data = FeatureSelectorData("analysis")
>>> print(selector_data.has_results())  # Any results?
False
>>> print(selector_data.has_results("distances"))  # Specific feature?
False
- clear_results(feature_key: str = None) → None
Clear stored selection results.
Parameters
- feature_key : str, optional
Feature type key to clear results for. If None, clears all results.
Returns
- None
Clears specified or all selection results
Examples
>>> selector_data = FeatureSelectorData("analysis")
>>> selector_data.clear_results("distances")  # Clear only distances
>>> selector_data.clear_results()  # Clear all results
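The result methods mirror the selection methods, but operate on selection_results and hold one result dict per feature type. The class below is a hypothetical stand-in (`ResultsSketch` is not part of mdxplain) that stores whatever dict it receives; it uses the flat ‘indices’/‘use_reduced’ shape described for get_results, although the store_results example uses a nested ‘trajectory_indices’ layout. Replacing an earlier result on a repeated store_results call is an assumption.

```python
from typing import Dict, Optional


class ResultsSketch:
    """Hypothetical stand-in for the result-storage methods."""

    def __init__(self) -> None:
        self.selection_results: Dict[str, dict] = {}

    def store_results(self, feature_key: str, result_data: dict) -> None:
        # Assumption: a later call for the same key replaces the earlier result.
        self.selection_results[feature_key] = result_data

    def get_results(self, feature_key: str) -> dict:
        return self.selection_results.get(feature_key, {})

    def get_all_results(self) -> Dict[str, dict]:
        return self.selection_results

    def has_results(self, feature_key: Optional[str] = None) -> bool:
        if feature_key is None:
            return bool(self.selection_results)  # results for any feature type?
        return feature_key in self.selection_results

    def clear_results(self, feature_key: Optional[str] = None) -> None:
        if feature_key is None:
            self.selection_results.clear()
        else:
            self.selection_results.pop(feature_key, None)


r = ResultsSketch()
print(r.has_results())  # False
r.store_results("distances", {"indices": [0, 3], "use_reduced": [False, False]})
print(r.has_results("distances"), r.has_results("contacts"))  # True False
r.clear_results()
print(r.get_results("distances"))  # {}
```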
- set_reference_trajectory(reference_traj: int) → None
Set reference trajectory for metadata extraction.
Parameters
- reference_traj : int
Trajectory index to use as reference for metadata
Returns
- None
Sets reference trajectory in the selector data
- get_reference_trajectory() → int | None
Get reference trajectory for metadata extraction.
Returns
- Optional[int]
Reference trajectory index, or None if not set
- set_n_columns(n_columns: int) → None
Set total number of columns in the final selection matrix.
Parameters
- n_columns : int
Total number of columns across all features and trajectories
Returns
- None
Sets the n_columns attribute
- get_n_columns() → int | None
Get total number of columns in the final selection matrix.
Returns
- Optional[int]
Total number of columns, or None if not calculated yet
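The documentation says n_columns is the total column count across all features but does not show how it is derived. The hypothetical helper below (`count_columns` is not an mdxplain function) sketches one plausible computation, assuming the flat result layout documented for get_results, so that each feature type contributes one column per selected index. It returns None for an empty result set, mirroring get_n_columns() before a value has been set.

```python
from typing import Dict, Optional


def count_columns(selection_results: Dict[str, dict]) -> Optional[int]:
    """Hypothetical helper: total selected columns over all feature types.

    Assumes one 'indices' list per feature type; returns None when nothing
    is stored, like get_n_columns() before set_n_columns() has been called.
    """
    if not selection_results:
        return None
    return sum(len(result["indices"]) for result in selection_results.values())


results = {
    "distances": {"indices": [0, 4, 7], "use_reduced": [False] * 3},
    "contacts": {"indices": [2, 5], "use_reduced": [True] * 2},
}
print(count_columns(results))  # 5
print(count_columns({}))  # None
```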
- save(save_path: str) → None
Save FeatureSelectorData object to disk.
Parameters
- save_path : str
Path at which to save the FeatureSelectorData object
Returns
- None
Saves the FeatureSelectorData object to the specified path
Examples
>>> feature_selector_data.save('analysis_results/feature_selection.pkl')
- load(load_path: str) → None
Load FeatureSelectorData object from disk.
Parameters
- load_path : str
Path to the saved FeatureSelectorData file
Returns
- None
Loads the FeatureSelectorData object from the specified path
Examples
>>> feature_selector_data.load('analysis_results/feature_selection.pkl')
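Because load returns None, it presumably restores state into an existing instance rather than returning a new object. The class below is a hypothetical stand-in (`PersistSketch` is not part of mdxplain) showing that round trip with Python's standard pickle module; using pickle is an assumption based only on the .pkl suffix in the examples, and creating missing parent directories in save is likewise an assumption.

```python
import os
import pickle
import tempfile


class PersistSketch:
    """Hypothetical stand-in showing pickle-based save and in-place load."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.selections = {}

    def save(self, save_path: str) -> None:
        # Assumption: create the parent directory if it does not exist yet.
        directory = os.path.dirname(save_path)
        if directory:
            os.makedirs(directory, exist_ok=True)
        with open(save_path, "wb") as fh:
            pickle.dump(self.__dict__, fh)

    def load(self, load_path: str) -> None:
        # load() returns None, so the loaded state replaces this object's attributes.
        with open(load_path, "rb") as fh:
            state = pickle.load(fh)
        self.__dict__.clear()
        self.__dict__.update(state)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "analysis_results", "feature_selection.pkl")
    original = PersistSketch("analysis")
    original.selections["distances"] = [{"selection": "res ALA", "use_reduced": False}]
    original.save(path)

    restored = PersistSketch("placeholder")
    restored.load(path)
    print(restored.name, restored.selections)
```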
- print_info() → None
Print comprehensive feature selector information.
Parameters
None
Returns
- None
Prints feature selector information to console
Examples
>>> feature_selector_data.print_info()
=== FeatureSelectorData ===
Name: analysis
Feature Types: 2 (distances, contacts)
Total Selections: 3
Selection Results: Available for 2 feature types
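The print_info example output can be reproduced from the selections and selection_results dictionaries. The hypothetical formatter below (`format_info` is not an mdxplain function) is a sketch of that layout only; what the real method prints when no results exist is not documented, so omitting the results line in that case is an assumption.

```python
from typing import Dict, List


def format_info(name: str, selections: Dict[str, List[dict]], results: Dict[str, dict]) -> str:
    """Hypothetical formatter reproducing the print_info layout shown above."""
    keys = list(selections)
    total = sum(len(entries) for entries in selections.values())
    lines = [
        "=== FeatureSelectorData ===",
        f"Name: {name}",
        f"Feature Types: {len(keys)} ({', '.join(keys)})",
        f"Total Selections: {total}",
    ]
    # Assumption: the results line is omitted when no results are stored.
    if results:
        lines.append(f"Selection Results: Available for {len(results)} feature types")
    return "\n".join(lines)


text = format_info(
    "analysis",
    {
        "distances": [{"selection": "res ALA"}, {"selection": "res HIS"}],
        "contacts": [{"selection": "resid 120"}],
    },
    {"distances": {"indices": [0]}, "contacts": {"indices": [1]}},
)
print(text)
```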