Top Features Helper


Top features helper for feature importance operations.

This module provides helper methods for extracting and formatting top important features from feature importance analysis results.

Delegates the main functionality to the central utils.top_features_utils module to avoid redundant code across modules.

class mdxplain.feature_importance.helper.top_features_helper.TopFeaturesHelper

Helper class for processing top important features.

Provides static methods for extracting top features from feature importance data and formatting them with human-readable names and metadata. These methods factor out logic shared with FeatureImportanceManager.

Examples

>>> # Get top features for specific comparison
>>> top_features = TopFeaturesHelper.get_top_features_for_comparison(
...     fi_data, "folded_vs_rest", 5
... )
>>> # Get top features averaged across comparisons
>>> top_features = TopFeaturesHelper.get_top_features_averaged(
...     fi_data, 10
... )
static get_top_features_for_comparison(fi_data: FeatureImportanceData, comparison_identifier: str, n: int) -> List[Tuple[int, float]]

Get top N features for a specific comparison.

Extracts the top N most important features for a specific sub-comparison from feature importance data.

Parameters

fi_data : FeatureImportanceData

Feature importance data object

comparison_identifier : str

Name/identifier of the specific comparison

n : int

Number of top features to return

Returns

List[Tuple[int, float]]

List of (feature_index, importance_score) tuples

Examples

>>> indices_scores = TopFeaturesHelper.get_top_features_for_comparison(
...     fi_data, "folded_vs_rest", 5
... )
>>> for idx, score in indices_scores:
...     print(f"Feature {idx}: {score:.3f}")
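The selection step described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the internals of FeatureImportanceData are not shown in this page, so the sketch assumes a hypothetical stand-in (a dict mapping each comparison name to a list of per-feature scores), and the function name `top_n_for_comparison` is likewise made up for the example.

```python
from typing import Dict, List, Sequence, Tuple


def top_n_for_comparison(
    importances: Dict[str, Sequence[float]], comparison: str, n: int
) -> List[Tuple[int, float]]:
    """Return the n highest-scoring (feature_index, score) pairs."""
    if comparison not in importances:
        raise KeyError(f"Unknown comparison: {comparison!r}")
    scores = importances[comparison]
    # Pair each score with its feature index, then sort by score, highest first.
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return [(idx, float(score)) for idx, score in ranked[:n]]


# Hypothetical stand-in for FeatureImportanceData: comparison name -> scores.
importances = {
    "folded_vs_rest": [0.10, 0.85, 0.05, 0.72],
    "unfolded_vs_rest": [0.30, 0.20, 0.40, 0.10],
}
top_two = top_n_for_comparison(importances, "folded_vs_rest", 2)
for idx, score in top_two:
    print(f"Feature {idx}: {score:.3f}")
```

Sorting index/score pairs keeps the original feature index attached to each score, which is what later name lookups need.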
static get_top_features_averaged(fi_data: FeatureImportanceData, n: int) -> List[Tuple[int, float]]

Get top N features averaged across all comparisons.

Computes average importance across all sub-comparisons and returns the top N features based on average importance.

Parameters

fi_data : FeatureImportanceData

Feature importance data object

n : int

Number of top features to return

Returns

List[Tuple[int, float]]

List of (feature_index, average_importance_score) tuples

Examples

>>> indices_scores = TopFeaturesHelper.get_top_features_averaged(
...     fi_data, 10
... )
>>> for idx, score in indices_scores:
...     print(f"Feature {idx}: {score:.3f}")
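To make the averaging step concrete, the following sketch computes the per-feature mean across all comparisons and then ranks by that mean. It assumes the same hypothetical dict-of-lists stand-in for FeatureImportanceData and an unweighted arithmetic mean; the actual method may store or weight scores differently. The name `top_n_averaged` is illustrative only.

```python
from typing import Dict, List, Sequence, Tuple


def top_n_averaged(
    importances: Dict[str, Sequence[float]], n: int
) -> List[Tuple[int, float]]:
    """Rank features by their mean score across all comparisons."""
    if not importances:
        return []
    # zip(*...) regroups the per-comparison lists into per-feature columns.
    per_feature = zip(*importances.values())
    averages = [sum(column) / len(column) for column in per_feature]
    ranked = sorted(enumerate(averages), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


# Hypothetical stand-in for FeatureImportanceData: comparison name -> scores.
importances = {
    "folded_vs_rest": [0.10, 0.85, 0.05, 0.72],
    "unfolded_vs_rest": [0.30, 0.20, 0.40, 0.10],
}
top_two_avg = top_n_averaged(importances, 2)
for idx, score in top_two_avg:
    print(f"Feature {idx}: {score:.3f}")
```

Note that a feature ranked highly in only one comparison can still drop out of the averaged list, so the two methods are not interchangeable.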
static format_features_with_names(indices_scores: List[Tuple[int, float]], feature_metadata: List[Any] | None) -> List[Dict[str, Any]]

Format feature indices and scores with human-readable names.

Converts a list of (index, score) tuples into dictionaries containing detailed feature information including names and types.

Parameters

indices_scores : List[Tuple[int, float]]

List of (feature_index, importance_score) tuples

feature_metadata : list or None

Feature metadata from pipeline for name mapping

Returns

List[Dict[str, Any]]

List of feature info dictionaries with names and metadata

Examples

>>> formatted = TopFeaturesHelper.format_features_with_names(
...     [(42, 0.85), (15, 0.72)], metadata
... )
>>> print(formatted[0]["feature_name"])  # "CA_distance_ALA_15_GLU_89"
>>> print(formatted[0]["importance_score"])  # 0.85
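The mapping from (index, score) tuples to dictionaries might look roughly like the sketch below. Only the keys "feature_name" and "importance_score" appear in the example above; the metadata layout (a list of dicts with "name" and "type" keys), the extra output keys, and the fallback name used when metadata is missing are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple


def format_with_names(
    indices_scores: List[Tuple[int, float]],
    feature_metadata: Optional[List[Dict[str, Any]]],
) -> List[Dict[str, Any]]:
    """Attach a readable name and type to each (index, score) pair."""
    formatted = []
    for idx, score in indices_scores:
        # Fall back to a generic label when no metadata entry is available.
        name, feature_type = f"feature_{idx}", "unknown"
        if feature_metadata is not None and 0 <= idx < len(feature_metadata):
            entry = feature_metadata[idx]
            name = entry.get("name", name)
            feature_type = entry.get("type", feature_type)
        formatted.append(
            {
                "feature_index": idx,
                "feature_name": name,
                "feature_type": feature_type,
                "importance_score": score,
            }
        )
    return formatted


# Hypothetical metadata: one dict per feature, in feature-index order.
metadata = [
    {"name": "CA_distance_ALA_15_GLU_89", "type": "distances"},
    {"name": "CA_distance_GLY_3_LYS_40", "type": "distances"},
]
formatted = format_with_names([(0, 0.85), (1, 0.72), (7, 0.10)], metadata)
unnamed = format_with_names([(0, 0.85)], None)
```

Passing None for the metadata still yields a usable result, which matches the `List[Any] | None` annotation in the signature.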
static get_top_features_with_names(pipeline_data: PipelineData, fi_data: FeatureImportanceData, comparison_identifier: str | None = None, n: int = 10) -> List[Dict[str, Any]]

Get top features with complete name mapping and formatting.

Convenience method that extracts the top features, retrieves their metadata, and formats the results with human-readable names and types. Delegates to the central utils.top_features_utils module to avoid redundant code.

Parameters

pipeline_data : PipelineData

Pipeline data object containing metadata

fi_data : FeatureImportanceData

Feature importance data object

comparison_identifier : str, optional

Specific sub-comparison to get features from. If None, returns average across all sub-comparisons.

n : int, default=10

Number of top features to return

Returns

List[Dict[str, Any]]

List of dictionaries with complete feature information

Examples

>>> # Get top features for specific comparison
>>> top_features = TopFeaturesHelper.get_top_features_with_names(
...     pipeline_data, fi_data, "folded_vs_rest", 5
... )
>>> # Get top features averaged across all comparisons
>>> top_features = TopFeaturesHelper.get_top_features_with_names(
...     pipeline_data, fi_data, n=10
... )

Notes

Delegates to central TopFeaturesUtils.get_top_features_with_names() to maintain consistency and avoid code duplication across modules.
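The optional comparison_identifier implies a simple dispatch: use one comparison's scores when an identifier is given, otherwise average across all of them. The sketch below illustrates that branch only. It is not the body of TopFeaturesUtils.get_top_features_with_names(); the dict-of-lists input and the names `select_scores` and `top_features` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def select_scores(
    importances: Dict[str, Sequence[float]],
    comparison_identifier: Optional[str] = None,
) -> List[float]:
    """Pick one comparison's scores, or the per-feature mean if None."""
    if comparison_identifier is not None:
        return list(importances[comparison_identifier])
    # No identifier given: average each feature over every comparison.
    columns = zip(*importances.values())
    return [sum(column) / len(column) for column in columns]


def top_features(
    importances: Dict[str, Sequence[float]],
    comparison_identifier: Optional[str] = None,
    n: int = 10,
) -> List[Tuple[int, float]]:
    """Rank features from either a single comparison or the average."""
    scores = select_scores(importances, comparison_identifier)
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


# Hypothetical stand-in for FeatureImportanceData: comparison name -> scores.
importances = {
    "folded_vs_rest": [0.10, 0.85, 0.05, 0.72],
    "unfolded_vs_rest": [0.30, 0.20, 0.40, 0.10],
}
specific = top_features(importances, "unfolded_vs_rest", n=1)
averaged = top_features(importances, n=1)
```

Keeping the branch in one place means callers only decide whether to pass an identifier, and the ranking logic stays identical for both cases.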