Top Features Helper
Top features helper for feature importance operations.
This module provides helper methods for extracting and formatting top important features from feature importance analysis results.
The main functionality is delegated to the central utils.top_features_utils module to avoid code redundancy across modules.
- class mdxplain.feature_importance.helper.top_features_helper.TopFeaturesHelper
Helper class for processing top important features.
Provides static methods for extracting top features from feature importance data and formatting them with human-readable names and metadata. These methods extract common logic from FeatureImportanceManager.
Examples
>>> # Get top features for a specific comparison
>>> top_features = TopFeaturesHelper.get_top_features_for_comparison(
...     fi_data, "folded_vs_rest", 5
... )

>>> # Get top features averaged across comparisons
>>> top_features = TopFeaturesHelper.get_top_features_averaged(
...     fi_data, 10
... )
- static get_top_features_for_comparison(fi_data: FeatureImportanceData, comparison_identifier: str, n: int) -> List[Tuple[int, float]]
Get top N features for a specific comparison.
Extracts the top N most important features for a specific sub-comparison from feature importance data.
Parameters
- fi_data : FeatureImportanceData
Feature importance data object
- comparison_identifier : str
Name/identifier of the specific comparison
- n : int
Number of top features to return
Returns
- List[Tuple[int, float]]
List of (feature_index, importance_score) tuples
Examples
>>> indices_scores = TopFeaturesHelper.get_top_features_for_comparison(
...     fi_data, "folded_vs_rest", 5
... )
>>> for idx, score in indices_scores:
...     print(f"Feature {idx}: {score:.3f}")
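The top-N extraction described above can be illustrated with a minimal sketch. It is not the mdxplain implementation: it assumes the importance scores for one comparison are available as a 1-D NumPy array and that "top" means the largest raw score. The function name `top_n_features` and the sample scores are hypothetical.

```python
from typing import List, Tuple

import numpy as np


def top_n_features(importances: np.ndarray, n: int) -> List[Tuple[int, float]]:
    """Return the n largest (feature_index, importance_score) pairs, highest first.

    Hypothetical helper for illustration; assumes a 1-D array of raw scores.
    """
    # argsort is ascending, so reverse it and keep the first n indices
    order = np.argsort(importances)[::-1][:n]
    return [(int(i), float(importances[i])) for i in order]


# Hypothetical importance scores for a single comparison
scores = np.array([0.10, 0.85, 0.05, 0.72, 0.30])
top = top_n_features(scores, 3)
for idx, score in top:
    print(f"Feature {idx}: {score:.3f}")
```

The returned list has the same `(feature_index, importance_score)` shape as the documented return value, so it can be consumed by the same loop as in the example above.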
- static get_top_features_averaged(fi_data: FeatureImportanceData, n: int) -> List[Tuple[int, float]]
Get top N features averaged across all comparisons.
Computes average importance across all sub-comparisons and returns the top N features based on average importance.
Parameters
- fi_data : FeatureImportanceData
Feature importance data object
- n : int
Number of top features to return
Returns
- List[Tuple[int, float]]
List of (feature_index, average_importance_score) tuples
Examples
>>> indices_scores = TopFeaturesHelper.get_top_features_averaged(
...     fi_data, 10
... )
>>> for idx, score in indices_scores:
...     print(f"Feature {idx}: {score:.3f}")
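As a rough sketch of the averaging step, the snippet below assumes that each sub-comparison contributes one importance vector of equal length and that the average is a plain unweighted mean. The dictionary layout, the function name `top_n_averaged`, and the comparison name "unfolded_vs_rest" are assumptions made for illustration; the actual FeatureImportanceData structure may differ.

```python
from typing import Dict, List, Tuple

import numpy as np


def top_n_averaged(
    per_comparison: Dict[str, np.ndarray], n: int
) -> List[Tuple[int, float]]:
    """Average importances over all comparisons, then return the top n features.

    Hypothetical helper; assumes every array has the same length.
    """
    # Stack to shape (n_comparisons, n_features) and average over comparisons
    mean_importance = np.mean(np.stack(list(per_comparison.values())), axis=0)
    # Highest average importance first
    order = np.argsort(mean_importance)[::-1][:n]
    return [(int(i), float(mean_importance[i])) for i in order]


# Hypothetical per-comparison importance vectors
per_comparison = {
    "folded_vs_rest": np.array([0.5, 1.0, 0.0]),
    "unfolded_vs_rest": np.array([0.0, 0.5, 0.25]),
}
averaged_top = top_n_averaged(per_comparison, 2)
```

A feature that dominates a single comparison can still rank below one that is moderately important everywhere, which is the main behavioral difference from the per-comparison ranking.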
- static format_features_with_names(indices_scores: List[Tuple[int, float]], feature_metadata: List[Any] | None) -> List[Dict[str, Any]]
Format feature indices and scores with human-readable names.
Converts a list of (index, score) tuples into dictionaries containing detailed feature information including names and types.
Parameters
- indices_scores : List[Tuple[int, float]]
List of (feature_index, importance_score) tuples
- feature_metadata : list or None
Feature metadata from pipeline for name mapping
Returns
- List[Dict[str, Any]]
List of feature info dictionaries with names and metadata
Examples
>>> formatted = TopFeaturesHelper.format_features_with_names(
...     [(42, 0.85), (15, 0.72)], metadata
... )
>>> print(formatted[0]["feature_name"])  # "CA_distance_ALA_15_GLU_89"
>>> print(formatted[0]["importance_score"])  # 0.85
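The index-to-name mapping can be sketched as follows. This is an illustration only: it assumes the metadata is a plain list of name strings indexed by feature index, which may not match the real pipeline metadata. The keys "feature_name" and "importance_score" come from the example above; the "feature_index" key, the `feature_<idx>` fallback name, and the function name `format_with_names` are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


def format_with_names(
    indices_scores: List[Tuple[int, float]],
    feature_metadata: Optional[List[Any]],
) -> List[Dict[str, Any]]:
    """Turn (index, score) tuples into dictionaries with readable names.

    Hypothetical helper; assumes metadata entries are name-like objects.
    """
    formatted = []
    for idx, score in indices_scores:
        if feature_metadata is not None and 0 <= idx < len(feature_metadata):
            name = str(feature_metadata[idx])
        else:
            # Assumed fallback when no metadata entry is available
            name = f"feature_{idx}"
        formatted.append(
            {"feature_index": idx, "feature_name": name, "importance_score": score}
        )
    return formatted


# Hypothetical metadata: one name per feature index
metadata = ["dist_a", "dist_b", "dist_c"]
formatted = format_with_names([(2, 0.85), (0, 0.72)], metadata)
fallback = format_with_names([(2, 0.85)], None)
```

Handling `None` explicitly mirrors the documented `List[Any] | None` parameter type, so callers without pipeline metadata still get a usable result.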
- static get_top_features_with_names(pipeline_data: PipelineData, fi_data: FeatureImportanceData, comparison_identifier: str | None = None, n: int = 10) -> List[Dict[str, Any]]
Get top features with complete name mapping and formatting.
Complete method that extracts top features, retrieves metadata, and formats the results with human-readable names and types. Uses central utils.top_features_utils to avoid code redundancy.
Parameters
- pipeline_data : PipelineData
Pipeline data object containing metadata
- fi_data : FeatureImportanceData
Feature importance data object
- comparison_identifier : str, optional
Specific sub-comparison to get features from. If None, returns the average across all sub-comparisons.
- n : int, default=10
Number of top features to return
Returns
- List[Dict[str, Any]]
List of dictionaries with complete feature information
Examples
>>> # Get top features for a specific comparison
>>> top_features = TopFeaturesHelper.get_top_features_with_names(
...     pipeline_data, fi_data, "folded_vs_rest", 5
... )

>>> # Get top features averaged across all comparisons
>>> top_features = TopFeaturesHelper.get_top_features_with_names(
...     pipeline_data, fi_data, n=10
... )
Notes
Delegates to central TopFeaturesUtils.get_top_features_with_names() to maintain consistency and avoid code duplication across modules.