Representative Finder Helper
Representative frame finder for structure visualization.
This module provides utilities for finding representative frames from DataSelectors, supporting both “best” (feature-based) and “centroid” (distance-based) selection modes. Includes memmap-safe implementations.
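The "centroid" mode and the memmap-safe requirement can be illustrated together. The sketch below is a minimal, hypothetical implementation (`find_centroid_index_chunked` is not part of mdxplain). It assumes the selected frames are stacked in a 2-D array `X` of shape (n_frames, n_features), uses plain Euclidean distance (so periodic features are not treated specially), and reads at most `chunk_size` rows at a time. Because of that, it also works on a `numpy.memmap` without loading the whole array into memory.

```python
import numpy as np


def find_centroid_index_chunked(X: np.ndarray, chunk_size: int = 1000) -> int:
    """Return the row index of X closest (Euclidean) to the mean row.

    Hypothetical sketch: works on in-memory arrays and on np.memmap alike,
    because it only ever materializes ``chunk_size`` rows at a time.
    """
    n_frames = X.shape[0]
    if n_frames == 0:
        raise ValueError("X must contain at least one frame")

    # Pass 1: accumulate the per-feature sum chunk by chunk.
    total = np.zeros(X.shape[1], dtype=np.float64)
    for start in range(0, n_frames, chunk_size):
        total += np.asarray(X[start:start + chunk_size], dtype=np.float64).sum(axis=0)
    centroid = total / n_frames

    # Pass 2: keep the frame with the smallest squared distance to the centroid.
    best_idx, best_dist = -1, np.inf
    for start in range(0, n_frames, chunk_size):
        chunk = np.asarray(X[start:start + chunk_size], dtype=np.float64)
        dists = ((chunk - centroid) ** 2).sum(axis=1)
        local = int(np.argmin(dists))
        if dists[local] < best_dist:
            best_idx, best_dist = start + local, float(dists[local])
    return best_idx
```

Two passes keep memory usage proportional to `chunk_size` rather than to the number of frames. The returned value is a row in `X`, which would still have to be mapped back to a (trajectory, frame) pair.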
- class mdxplain.feature_importance.helper.representative_finder_helper.RepresentativeFinderHelper
Helper for finding representative frames from DataSelectors.
Provides methods to find frames that best represent a DataSelector, either by maximizing alignment with top important features (“best”) or by finding the centroid frame (“centroid”).
Examples
>>> # Find best representative for a comparison
>>> traj_idx, frame_idx = RepresentativeFinderHelper.find_best_representative(
...     pipeline_data, fi_data, "cluster_0_vs_rest", n_top=10
... )
>>> # Find centroid frame
>>> traj_idx, frame_idx = RepresentativeFinderHelper.find_centroid_frame(
...     pipeline_data, "cluster_0", "my_features"
... )
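This section does not spell out how the "best" mode measures "alignment with top important features". Because the tree-based method below contrasts itself with median values, one plausible reading is a median-based score. The sketch below is hypothetical: `best_frame_by_median_alignment`, its arguments, and the midpoint rule are assumptions, not the mdxplain API. It keeps the `n_top` features with the highest importance and checks, for each one, whether the group's median lies above or below the median of the remaining frames. It then rewards frames that sit on the group's side of the midpoint, weighted by importance.

```python
import numpy as np


def best_frame_by_median_alignment(X_group, X_rest, importances, n_top=10) -> int:
    """Hypothetical 'best' scorer: return the index (into X_group) of the most aligned frame."""
    X_group = np.asarray(X_group, dtype=float)
    X_rest = np.asarray(X_rest, dtype=float)
    importances = np.asarray(importances, dtype=float)

    # Indices of the n_top most important features, highest importance first.
    top = np.argsort(importances)[::-1][:n_top]

    med_group = np.median(X_group[:, top], axis=0)
    med_rest = np.median(X_rest[:, top], axis=0)
    # +1 where the group tends to have larger values than the rest, -1 where smaller.
    direction = np.sign(med_group - med_rest)
    midpoint = (med_group + med_rest) / 2.0

    # A feature counts as aligned when the frame lies on the group's side of the
    # midpoint; features whose medians coincide (direction 0) never count.
    aligned = (np.sign(X_group[:, top] - midpoint) == direction) & (direction != 0)
    scores = (aligned * importances[top]).sum(axis=1)
    return int(np.argmax(scores))
```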
- static find_best_tree_based(pipeline_data: PipelineData, fi_data: FeatureImportanceData, comparison_identifier: str, n_top: int = 10, use_memmap: bool = False, chunk_size: int = 1000) → Tuple[int, int]
Find frame using tree-based scoring from Decision Tree splits.
Analyzes Decision Tree split rules to find frames that most strongly exhibit the top important features. Uses actual tree thresholds and split directions rather than median values.
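In scikit-learn, a fitted tree exposes its splits through `tree_.feature` and `tree_.threshold`, and a sample is sent to `tree_.children_left` when `x[feature] <= threshold`. The sketch below assumes the relevant splits on the top features have already been extracted into `(feature_index, threshold, side)` triples. Both this rule format and `score_frames_by_rules` are hypothetical. Each frame is scored by how far past each threshold it lies on the side that favors the target class.

```python
from typing import Sequence, Tuple

import numpy as np

# Hypothetical rule format: (feature_index, threshold, side).
# side == "left": the target class is favored when x[feature] <= threshold
# (scikit-learn's left branch); side == "right": when x[feature] > threshold.
Rule = Tuple[int, float, str]


def score_frames_by_rules(X: np.ndarray, rules: Sequence[Rule]) -> np.ndarray:
    """Sum, over all rules, each frame's signed margin past the threshold.

    Margins are divided by the feature's standard deviation so features on
    different scales contribute comparably; frames on the wrong side of a
    split receive a negative contribution.
    """
    X = np.asarray(X, dtype=float)
    scores = np.zeros(X.shape[0])
    for feat, thr, side in rules:
        col = X[:, feat]
        scale = col.std() or 1.0  # avoid division by zero for constant features
        margin = (thr - col) if side == "left" else (col - thr)
        scores += margin / scale
    return scores
```

The representative would then be `int(np.argmax(scores))`. Standardizing by the standard deviation is only one possible choice, and this sketch does not handle periodic features specially.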
Parameters
- pipeline_data : PipelineData
Pipeline data object
- fi_data : FeatureImportanceData
Feature importance data with Decision Tree model
- comparison_identifier : str
Sub-comparison identifier, e.g. "cluster_0_vs_rest"
- n_top : int, default=10
Number of top features to consider
- use_memmap : bool, default=False
Whether to use memmap-safe processing
- chunk_size : int, default=1000
Chunk size for memmap processing
Returns
- Tuple[int, int]
(trajectory_index, frame_index) of best representative
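The method returns a `(trajectory_index, frame_index)` pair. Suppose frames from all trajectories are stacked into one array in trajectory order. That storage layout is an assumption, since it is not documented here. Under it, a row index in the stacked array can be mapped back using a cumulative sum of trajectory lengths. `global_to_traj_frame` is a hypothetical helper:

```python
from typing import Sequence, Tuple

import numpy as np


def global_to_traj_frame(global_idx: int, traj_lengths: Sequence[int]) -> Tuple[int, int]:
    """Map a row index in a concatenated frame array to (trajectory_index, frame_index)."""
    ends = np.cumsum(traj_lengths)  # exclusive end offset of each trajectory
    if not 0 <= global_idx < ends[-1]:
        raise IndexError(f"global index {global_idx} out of range")
    # side="right" skips past trajectories that end exactly at global_idx,
    # including empty ones.
    traj_idx = int(np.searchsorted(ends, global_idx, side="right"))
    start = 0 if traj_idx == 0 else int(ends[traj_idx - 1])
    return traj_idx, int(global_idx - start)
```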
Examples
>>> traj_idx, frame_idx = RepresentativeFinderHelper.find_best_tree_based(
...     pipeline_data, fi_data, "cluster_0_vs_rest", n_top=10
... )
Notes
- Uses sklearn DecisionTree split thresholds
- Handles periodic features with circular distance
- Scores frames by alignment with tree rules
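For periodic features such as dihedral angles, the circular distance is the shorter way around the circle. For example, 179° and -179° are 2° apart, not 358°. The sketch below is hypothetical: `circular_distance` is not an mdxplain function. It assumes angles in radians with a default period of 2π, and the period can be overridden for degrees.

```python
import numpy as np


def circular_distance(a, b, period: float = 2 * np.pi):
    """Shortest distance between a and b on a circle with the given period."""
    half = period / 2.0
    # Wrap the raw difference into [-half, half) before taking its magnitude.
    diff = (np.asarray(a, dtype=float) - np.asarray(b, dtype=float) + half) % period - half
    return np.abs(diff)
```

The function broadcasts like any NumPy expression, so it can compare whole feature columns against a reference value in one call.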