Representative Finder Helper


Representative frame finder for structure visualization.

This module provides utilities for finding representative frames from DataSelectors. It supports two selection modes, “best” (feature-based) and “centroid” (distance-based), and includes memmap-safe implementations.

class mdxplain.feature_importance.helper.representative_finder_helper.RepresentativeFinderHelper

Helper for finding representative frames from DataSelectors.

Provides methods to find frames that best represent a DataSelector, either by maximizing alignment with top important features (“best”) or by finding the centroid frame (“centroid”).

Examples

>>> # Find best representative for a comparison
>>> traj_idx, frame_idx = RepresentativeFinderHelper.find_best_representative(
...     pipeline_data, fi_data, "cluster_0_vs_rest", n_top=10
... )
>>> # Find centroid frame
>>> traj_idx, frame_idx = RepresentativeFinderHelper.find_centroid_frame(
...     pipeline_data, "cluster_0", "my_features"
... )

static find_best_tree_based(pipeline_data: PipelineData, fi_data: FeatureImportanceData, comparison_identifier: str, n_top: int = 10, use_memmap: bool = False, chunk_size: int = 1000) -> Tuple[int, int]

Find frame using tree-based scoring from Decision Tree splits.

Analyzes Decision Tree split rules to find frames that most strongly exhibit the top important features. Uses actual tree thresholds and split directions rather than median values.
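The idea of scoring frames against tree splits can be illustrated with a minimal sketch. It is not the module's implementation: the function names `collect_split_rules` and `score_frames` are hypothetical, and the unscaled signed margin is an assumed scoring rule chosen for brevity. The input arrays mirror the attributes that a fitted scikit-learn tree exposes as `clf.tree_.feature`, `clf.tree_.threshold`, `clf.tree_.children_left`, `clf.tree_.children_right` and `clf.tree_.value`; here they are written by hand for a one-split tree so the example runs without a trained model.

```python
import numpy as np

LEAF = -2  # scikit-learn marks leaf nodes with feature index -2


def collect_split_rules(feature, threshold, children_left, children_right,
                        value, target_class, top_features):
    """Return (feature_idx, threshold, direction) for splits on top features.

    direction is +1.0 if the right child (x > threshold) is richer in the
    target class, otherwise -1.0.
    """
    rules = []
    for node in range(len(feature)):
        f = int(feature[node])
        if f == LEAF or f not in top_features:
            continue
        left, right = children_left[node], children_right[node]
        # Normalise so this works for class counts and class fractions alike.
        frac_left = value[left, 0, target_class] / value[left, 0].sum()
        frac_right = value[right, 0, target_class] / value[right, 0].sum()
        direction = 1.0 if frac_right > frac_left else -1.0
        rules.append((f, float(threshold[node]), direction))
    return rules


def score_frames(X, rules):
    """Sum signed margins: frames far on the target side of a split score high."""
    scores = np.zeros(X.shape[0])
    for f, thr, direction in rules:
        scores += direction * (X[:, f] - thr)
    return scores


# Hand-written stump mimicking clf.tree_ arrays: node 0 splits feature 1 at 0.5.
feature = np.array([1, LEAF, LEAF])
threshold = np.array([0.5, -2.0, -2.0])
children_left = np.array([1, -1, -1])
children_right = np.array([2, -1, -1])
value = np.array([[[5.0, 5.0]], [[5.0, 0.0]], [[0.0, 5.0]]])

rules = collect_split_rules(feature, threshold, children_left, children_right,
                            value, target_class=1, top_features={1})
X = np.array([[0.0, 0.2], [0.0, 0.9], [0.0, 0.6]])  # 3 frames, 2 features
best_frame = int(np.argmax(score_frames(X, rules)))
```

In this sketch the frame whose feature 1 lies furthest on the target-class side of the threshold wins. A real scoring scheme would likely also normalise features of different scales and weight each rule by feature importance; both are left out here.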

Parameters

pipeline_data : PipelineData

Pipeline data object

fi_data : FeatureImportanceData

Feature importance data with Decision Tree model

comparison_identifier : str

Sub-comparison identifier

n_top : int, default=10

Number of top features to consider

use_memmap : bool, default=False

Whether to use memmap-safe processing

chunk_size : int, default=1000

Chunk size for memmap processing

Returns

Tuple[int, int]

(trajectory_index, frame_index) of best representative
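The following sketch shows one way chunked, memmap-friendly processing (as suggested by `use_memmap` and `chunk_size`) could produce a `(trajectory_index, frame_index)` pair. It is an illustration under assumptions, not the module's code: `chunked_argmax` and `to_traj_frame` are hypothetical helpers, the row-sum score is a stand-in for the real frame score, and the frames are assumed to be stored trajectory after trajectory in a single matrix.

```python
import os
import tempfile

import numpy as np


def chunked_argmax(data, score_fn, chunk_size=1000):
    """Return the row index with the highest score, reading chunk_size rows at a time."""
    best_idx, best_score = -1, -np.inf
    for start in range(0, data.shape[0], chunk_size):
        # Only this slice is copied into memory.
        chunk = np.asarray(data[start:start + chunk_size])
        scores = score_fn(chunk)
        local = int(np.argmax(scores))
        if scores[local] > best_score:
            best_idx, best_score = start + local, float(scores[local])
    return best_idx


def to_traj_frame(global_idx, traj_lengths):
    """Map a row of the concatenated matrix to (trajectory_index, frame_index)."""
    offsets = np.cumsum(traj_lengths)
    traj_idx = int(np.searchsorted(offsets, global_idx, side="right"))
    first_row = int(offsets[traj_idx - 1]) if traj_idx > 0 else 0
    return traj_idx, int(global_idx - first_row)


# Demo: 10 frames x 2 features on disk, two trajectories of 6 and 4 frames.
path = os.path.join(tempfile.mkdtemp(), "features.dat")
mm = np.memmap(path, dtype="float64", mode="w+", shape=(10, 2))
mm[:] = np.arange(20, dtype="float64").reshape(10, 2)
mm[7] = [100.0, 100.0]  # make global frame 7 the clear winner
mm.flush()

best = chunked_argmax(mm, lambda c: c.sum(axis=1), chunk_size=3)
traj_frame = to_traj_frame(best, traj_lengths=[6, 4])
```

Keeping only the running best index and score means memory use depends on `chunk_size`, not on the total number of frames.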

Examples

>>> traj_idx, frame_idx = RepresentativeFinderHelper.find_best_tree_based(
...     pipeline_data, fi_data, "cluster_0_vs_rest", n_top=10
... )

Notes

  • Uses sklearn DecisionTree split thresholds

  • Handles periodic features with circular distance

  • Scores frames by alignment with tree rules
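As an illustration of the circular distance mentioned above, the sketch below computes the shortest separation between two values of a periodic feature. The function name `circular_distance` and the 360-degree default period are assumptions made for the example; the module may use radians or a different formulation.

```python
import math


def circular_distance(a, b, period=360.0):
    """Shortest distance between two values on a circle of the given period."""
    diff = math.fmod(abs(a - b), period)
    return min(diff, period - diff)


# Torsion angles of 179 and -179 degrees are 2 degrees apart, not 358.
d_wrap = circular_distance(179.0, -179.0)
d_plain = circular_distance(10.0, 350.0)
```

A plain absolute difference would treat 179 and -179 degrees as far apart, which would distort any comparison of a periodic feature against a threshold.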