Top Features Utils

Central utilities for top features extraction and formatting.

This module provides utilities for extracting the most important features from a feature importance analysis and formatting them for display. It is used by the feature_importance and structure_visualization modules.

class mdxplain.utils.top_features_utils.TopFeaturesUtils

Utility class for extracting and formatting top important features.

Provides static methods for extracting top features from feature importance data and formatting them with human-readable names and metadata. Breaks down complex extraction into smaller, focused methods.

Examples

>>> # Get top features for specific comparison
>>> features = TopFeaturesUtils.get_top_features_with_names(
...     pipeline_data, fi_data, "cluster_0_vs_rest", 5
... )
>>> # Get top features averaged across all comparisons
>>> features = TopFeaturesUtils.get_top_features_with_names(
...     pipeline_data, fi_data, n=10
... )

static get_top_features_with_names(pipeline_data: PipelineData, fi_data: FeatureImportanceData, comparison_identifier: str | None = None, n: int = 10) -> List[Dict[str, Any]]

Get top features with complete name mapping and formatting.

Extracts top N most important features from feature importance data, retrieves metadata, and formats the results with human-readable names and types. This is the main entry point for top features processing.

Parameters

pipeline_data : PipelineData

Pipeline data object containing feature metadata

fi_data : FeatureImportanceData

Feature importance data object with importance scores

comparison_identifier : str, optional

Specific sub-comparison to get features from. If None, features are ranked by their average importance across all sub-comparisons.

n : int, default=10

Number of top features to return

Returns

List[Dict[str, Any]]

List of dictionaries with complete feature information. Each dictionary contains:

  • feature_index: int - Feature index in feature array

  • importance_score: float - Importance score

  • rank: int - Rank (1-indexed)

  • feature_name: str - Human-readable feature name

  • feature_type: str - Feature type (e.g., “distances”, “torsions”)

  • residue_seqids: List[int] - List of residue sequence IDs involved

  • residue_indices: List[int] - List of residue indices involved
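
As a minimal sketch of how the returned list might be consumed, the following formats each entry as one line of a ranked summary. The sample entries (including the feature names) and the `format_feature_row` helper are hypothetical and are not part of mdxplain; only the dictionary keys are taken from the description above.

```python
from typing import Any, Dict, List

# Hypothetical sample shaped like the documented return value.
# All values, including the feature names, are illustrative only.
sample_features: List[Dict[str, Any]] = [
    {
        "feature_index": 42,
        "importance_score": 0.31,
        "rank": 1,
        "feature_name": "ALA5-LEU20",
        "feature_type": "distances",
        "residue_seqids": [5, 20],
        "residue_indices": [4, 19],
    },
    {
        "feature_index": 7,
        "importance_score": 0.12,
        "rank": 2,
        "feature_name": "PHE8 phi",
        "feature_type": "torsions",
        "residue_seqids": [8],
        "residue_indices": [7],
    },
]


def format_feature_row(feature: Dict[str, Any]) -> str:
    """Format one feature dictionary as a summary line (hypothetical helper)."""
    seqids = ",".join(str(s) for s in feature["residue_seqids"])
    return (
        f"{feature['rank']:>2}. {feature['feature_name']} "
        f"[{feature['feature_type']}] "
        f"score={feature['importance_score']:.3f} residues={seqids}"
    )


rows = [format_feature_row(f) for f in sample_features]
for row in rows:
    print(row)
```

Because every entry carries its own rank and metadata, downstream code can filter by feature_type or residue without going back to the feature importance data.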

Notes

  • Requires feature_selector to be set in fi_data for name mapping

  • If comparison_identifier is None, uses the average importance across all sub-comparisons

  • Feature names are extracted from metadata using FeatureMetadataUtils

  • Delegates to smaller methods for better code organization
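
The selection behaviour described in these notes (one sub-comparison when comparison_identifier is given, otherwise the average across all sub-comparisons) can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: `select_top_features` and the `scores_by_comparison` mapping are hypothetical, and the real FeatureImportanceData object may store its scores differently.

```python
from typing import Dict, List, Optional, Tuple


def select_top_features(
    scores_by_comparison: Dict[str, List[float]],
    comparison_identifier: Optional[str] = None,
    n: int = 10,
) -> List[Tuple[int, int, float]]:
    """Return (rank, feature_index, score) tuples for the n highest scores.

    Hypothetical helper: scores_by_comparison maps a sub-comparison name
    to one importance score per feature.
    """
    if comparison_identifier is not None:
        # Use the scores of the requested sub-comparison only.
        scores = list(scores_by_comparison[comparison_identifier])
    else:
        # Average each feature's score across all sub-comparisons.
        columns = zip(*scores_by_comparison.values())
        scores = [sum(col) / len(col) for col in columns]

    # Sort feature indices by descending score; ranks are 1-indexed.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(rank, idx, scores[idx]) for rank, idx in enumerate(order[:n], start=1)]


# Illustrative scores for three features in two sub-comparisons.
scores_by_comparison = {
    "cluster_0_vs_rest": [0.25, 1.0, 0.5],
    "cluster_1_vs_rest": [0.75, 0.5, 0.0],
}
single = select_top_features(scores_by_comparison, "cluster_0_vs_rest", n=2)
averaged = select_top_features(scores_by_comparison, n=2)
```

In this toy data, feature 1 ranks first either way, but the second place changes from feature 2 to feature 0 once scores are averaged, which is why the choice of comparison_identifier matters when interpreting the ranking.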