Top Features Utils
Central utilities for top features extraction and formatting.
This module provides utilities for extracting the most important features from a feature importance analysis and formatting them for display. It is used by the feature_importance and structure_visualization modules.
- class mdxplain.utils.top_features_utils.TopFeaturesUtils
Utility class for extracting and formatting the most important features.
Provides static methods that extract the top features from feature importance data and format them with human-readable names and metadata. The extraction is split into smaller, focused methods.
Examples
>>> # Get top features for a specific comparison
>>> features = TopFeaturesUtils.get_top_features_with_names(
...     pipeline_data, fi_data, "cluster_0_vs_rest", 5
... )

>>> # Get top features averaged across all comparisons
>>> features = TopFeaturesUtils.get_top_features_with_names(
...     pipeline_data, fi_data, n=10
... )
- static get_top_features_with_names(pipeline_data: PipelineData, fi_data: FeatureImportanceData, comparison_identifier: str | None = None, n: int = 10) -> List[Dict[str, Any]]
Get top features with complete name mapping and formatting.
Extracts top N most important features from feature importance data, retrieves metadata, and formats the results with human-readable names and types. This is the main entry point for top features processing.
Parameters
- pipeline_data : PipelineData
Pipeline data object containing feature metadata.
- fi_data : FeatureImportanceData
Feature importance data object with importance scores.
- comparison_identifier : str, optional
Specific sub-comparison to get features from. If None, returns the average importance across all sub-comparisons.
- n : int, default=10
Number of top features to return.
Returns
- List[Dict[str, Any]]
List of dictionaries with complete feature information. Each dictionary contains:
- feature_index: int - Feature index in the feature array
- importance_score: float - Importance score
- rank: int - Rank (1-indexed)
- feature_name: str - Human-readable feature name
- feature_type: str - Feature type (e.g., "distances", "torsions")
- residue_seqids: List[int] - List of residue sequence IDs involved
- residue_indices: List[int] - List of residue indices involved
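As a rough illustration of how the returned list might be consumed, the sketch below builds two hand-written dictionaries with the documented keys and formats them for display. The sample values and the helper names `summarize` and `group_by_type` are hypothetical and not part of mdxplain; only the dictionary keys come from the description above.

```python
from typing import Any, Dict, List

# Hypothetical sample shaped like the documented return value.
# All values below are invented for illustration.
features: List[Dict[str, Any]] = [
    {
        "feature_index": 42,
        "importance_score": 0.31,
        "rank": 1,
        "feature_name": "ALA10-GLY25",
        "feature_type": "distances",
        "residue_seqids": [10, 25],
        "residue_indices": [9, 24],
    },
    {
        "feature_index": 7,
        "importance_score": 0.12,
        "rank": 2,
        "feature_name": "PHE33 chi1",
        "feature_type": "torsions",
        "residue_seqids": [33],
        "residue_indices": [32],
    },
]


def summarize(features: List[Dict[str, Any]]) -> List[str]:
    """Return one formatted line per feature (hypothetical helper)."""
    lines = []
    for f in features:
        residues = ", ".join(str(s) for s in f["residue_seqids"])
        lines.append(
            f'{f["rank"]:>2}. {f["feature_name"]} ({f["feature_type"]}) '
            f'score={f["importance_score"]:.3f} residues=[{residues}]'
        )
    return lines


def group_by_type(features: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group feature names by their feature_type (hypothetical helper)."""
    grouped: Dict[str, List[str]] = {}
    for f in features:
        grouped.setdefault(f["feature_type"], []).append(f["feature_name"])
    return grouped


for line in summarize(features):
    print(line)
```

In real use, `features` would be the list returned by `get_top_features_with_names` rather than hand-written data.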
Examples
>>> # Get top features for a specific comparison
>>> features = TopFeaturesUtils.get_top_features_with_names(
...     pipeline_data, fi_data, "cluster_0_vs_rest", 5
... )

>>> # Get top features averaged across all comparisons
>>> features = TopFeaturesUtils.get_top_features_with_names(
...     pipeline_data, fi_data, n=10
... )
Notes
- Requires feature_selector to be set in fi_data so that feature indices can be mapped to names.
- If comparison_identifier is None, the importance averaged across all sub-comparisons is used.
- Feature names are extracted from metadata using FeatureMetadataUtils.
- Delegates to smaller methods for better code organization.
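The selection behaviour described above (use one sub-comparison if given, otherwise average across all of them, then keep the n highest scores) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the mdxplain implementation: the function `select_top_features` and the input format (a dictionary mapping comparison identifiers to per-feature score lists) are hypothetical.

```python
from typing import Any, Dict, List, Optional


def select_top_features(
    scores_by_comparison: Dict[str, List[float]],
    comparison_identifier: Optional[str] = None,
    n: int = 10,
) -> List[Dict[str, Any]]:
    """Hypothetical sketch of top-n selection over importance scores."""
    if comparison_identifier is not None:
        # Use the scores of the requested sub-comparison only.
        scores = list(scores_by_comparison[comparison_identifier])
    else:
        # Average each feature's score across all sub-comparisons.
        columns = zip(*scores_by_comparison.values())
        scores = [sum(col) / len(col) for col in columns]

    # Sort feature indices by descending score and keep the first n.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]

    # Ranks are 1-indexed, matching the documented return format.
    return [
        {"feature_index": i, "importance_score": scores[i], "rank": rank}
        for rank, i in enumerate(order, start=1)
    ]


# Invented scores for three features in two sub-comparisons.
example_scores = {
    "cluster_0_vs_rest": [0.1, 0.7, 0.2],
    "cluster_1_vs_rest": [0.5, 0.1, 0.9],
}

specific = select_top_features(example_scores, "cluster_0_vs_rest", n=2)
averaged = select_top_features(example_scores, n=2)
```

With these invented scores, the specific comparison ranks feature 1 first, while the averaged view ranks feature 2 first. The two modes can disagree in this way, which is why the choice of comparison_identifier matters.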