Representative Finder Helper


Representative frame finder for structure visualization.

This module provides utilities for finding representative frames from DataSelectors. It supports two selection modes, “best” (feature-based) and “centroid” (distance-based), and includes memmap-safe implementations.

class mdxplain.feature_importance.helper.representative_finder_helper.RepresentativeFinderHelper

Helper for finding representative frames from DataSelectors.

Provides methods to find frames that best represent a DataSelector, either by maximizing alignment with top important features (“best”) or by finding the centroid frame (“centroid”).

Examples

>>> # Find best representative for a comparison
>>> traj_idx, frame_idx = RepresentativeFinderHelper.find_best_representative(
...     pipeline_data, fi_data, "cluster_0_vs_rest", n_top=10
... )

>>> # Find centroid frame
>>> traj_idx, frame_idx = RepresentativeFinderHelper.find_centroid_frame(
...     pipeline_data, "cluster_0", "my_features"
... )
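
The “centroid” mode can be pictured with a small NumPy sketch. This is not the library's implementation: it assumes that "centroid frame" means the frame whose feature vector is closest (Euclidean distance) to the mean feature vector, and the function name `find_centroid_row_sketch` is hypothetical. It returns a plain row index, and the mapping back to (trajectory_index, frame_index) is not shown.

```python
import numpy as np


def find_centroid_row_sketch(features: np.ndarray) -> int:
    """Return the index of the row closest to the mean feature vector.

    Assumption: "centroid" is taken as the arithmetic mean over frames and
    closeness is measured with the Euclidean norm.
    """
    centroid = features.mean(axis=0)
    distances = np.linalg.norm(features - centroid, axis=1)
    return int(np.argmin(distances))


# Toy data: 4 frames, 2 features each (hypothetical values).
frames = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [10.0, 10.0]])
centroid_idx = find_centroid_row_sketch(frames)
print(centroid_idx)
```

The outlier in the last row pulls the mean to (3.25, 3.25), so the third frame (index 2) is the nearest one. A medoid-style definition (minimum summed distance to all other frames) would be an equally plausible reading of "centroid".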

static find_best_tree_based(pipeline_data: PipelineData, fi_data: FeatureImportanceData, comparison_identifier: str, n_top: int = 10, use_memmap: bool = False, chunk_size: int = 1000) → Tuple[int, int]

Find frame using tree-based scoring from Decision Tree splits.

Analyzes Decision Tree split rules to find frames that most strongly exhibit the top important features. Uses actual tree thresholds and split directions rather than median values.

Parameters

pipeline_data (PipelineData): Pipeline data object
fi_data (FeatureImportanceData): Feature importance data with Decision Tree model
comparison_identifier (str): Sub-comparison identifier
n_top (int, default=10): Number of top features to consider
use_memmap (bool, default=False): Whether to use memmap-safe processing
chunk_size (int, default=1000): Chunk size for memmap processing

Returns

Tuple[int, int]: (trajectory_index, frame_index) of best representative

Examples

>>> traj_idx, frame_idx = RepresentativeFinderHelper.find_best_tree_based(
...     pipeline_data, fi_data, "cluster_0_vs_rest", n_top=10
... )
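
The `use_memmap` and `chunk_size` parameters suggest that frames are scored in blocks rather than all at once. The sketch below illustrates that general pattern under stated assumptions and does not reproduce the library's code: `argmax_score_chunked` and the row-sum scoring function are hypothetical, and the memmap file is a temporary one created only for the demonstration.

```python
import os
import tempfile

import numpy as np


def argmax_score_chunked(data, score_fn, chunk_size=1000):
    """Find the best-scoring row while holding only one chunk in memory."""
    best_idx, best_score = -1, -np.inf
    for start in range(0, data.shape[0], chunk_size):
        # Copy just this slice of the (possibly memory-mapped) array into RAM.
        chunk = np.asarray(data[start:start + chunk_size])
        scores = score_fn(chunk)
        local = int(np.argmax(scores))
        if scores[local] > best_score:
            best_idx, best_score = start + local, float(scores[local])
    return best_idx, best_score


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "features.dat")
    mm = np.memmap(path, dtype=np.float64, mode="w+", shape=(2500, 3))
    mm[:] = 0.0
    mm[1234] = [1.0, 2.0, 3.0]  # the only non-zero frame
    mm.flush()
    best_idx, best_score = argmax_score_chunked(
        mm, lambda chunk: chunk.sum(axis=1), chunk_size=1000
    )
    del mm  # release the file handle before the directory is removed

print(best_idx, best_score)
```

Only the running best index and score are carried between chunks, so peak memory is bounded by `chunk_size` rows regardless of how many frames the file holds.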

Notes

Uses sklearn DecisionTree split thresholds
Handles periodic features with circular distance
Scores frames by alignment with tree rules
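
The notes above can be sketched as follows. This is a simplified illustration, not the actual scoring used by `find_best_tree_based`: the helper names, the `(feature_index, threshold, direction)` split format, the additive margin score, and the 360-degree period are all assumptions. With a fitted scikit-learn tree, the feature indices and thresholds would typically be read from `clf.tree_.feature` and `clf.tree_.threshold`; here they are hard-coded so the example stays self-contained.

```python
import numpy as np


def circular_diff(values, threshold, period=360.0):
    """Signed difference values - threshold, wrapped into [-period/2, period/2)."""
    return (values - threshold + period / 2.0) % period - period / 2.0


def score_frames_sketch(features, splits, periodic=(), period=360.0):
    """Sum, over split rules, how far each frame lies on the preferred side.

    splits: iterable of (feature_index, threshold, direction), where
    direction +1 prefers values above the threshold and -1 prefers values
    at or below it. Per-feature scaling is deliberately omitted here.
    """
    scores = np.zeros(features.shape[0])
    for feat, thr, direction in splits:
        col = features[:, feat]
        if feat in periodic:
            diff = circular_diff(col, thr, period)
        else:
            diff = col - thr
        scores += direction * diff
    return scores


# Toy data: feature 0 is a distance, feature 1 is an angle in degrees.
features = np.array([[1.0, 170.0], [5.0, -175.0], [3.0, 10.0]])
splits = [(0, 2.0, +1), (1, 175.0, +1)]
scores = score_frames_sketch(features, splits, periodic={1})
best = int(np.argmax(scores))
print(best, scores)
```

The second frame wins because -175 degrees is only 10 degrees past the 175-degree threshold once the angle is wrapped, whereas a plain subtraction would put it 350 degrees on the wrong side.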