Feature Importance Add Service

Factory for adding feature importance analyzers with simplified syntax.

class mdxplain.feature_importance.services.feature_importance_add_service.FeatureImportanceAddService(manager: FeatureImportanceManager, pipeline_data: PipelineData)

Service for adding feature importance analyzers without explicit type instantiation.

This service lets users add feature importance analyzers without importing or instantiating analyzer types directly. Each method accepts the parameters of the analyzer type together with the parameters of manager.add_analysis.
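
The delegation described above can be sketched as follows. This is only an illustration of the pattern, not the mdxplain implementation: DecisionTreeType, Manager, AddService, and the add_analysis signature shown here are hypothetical stand-ins invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecisionTreeType:
    """Hypothetical stand-in for an analyzer type that holds model parameters."""
    max_depth: Optional[int] = 3
    criterion: str = "gini"


class Manager:
    """Hypothetical stand-in for FeatureImportanceManager."""

    def __init__(self):
        self.analyses = {}

    def add_analysis(self, comparison_name, analyzer_type, analysis_name, force=False):
        # Refuse to overwrite an existing analysis unless force=True.
        if analysis_name in self.analyses and not force:
            raise ValueError(f"Analysis '{analysis_name}' already exists")
        self.analyses[analysis_name] = (comparison_name, analyzer_type)


class AddService:
    """Hypothetical factory: builds the analyzer type, then delegates to the manager."""

    def __init__(self, manager):
        self._manager = manager

    def decision_tree(self, comparison_name, analysis_name, max_depth=3,
                      criterion="gini", force=False):
        # Analyzer-type parameters and manager parameters arrive in one call.
        analyzer_type = DecisionTreeType(max_depth=max_depth, criterion=criterion)
        self._manager.add_analysis(
            comparison_name, analyzer_type, analysis_name, force=force
        )


manager = Manager()
add = AddService(manager)
add.decision_tree("my_comparison", "tree_analysis", max_depth=5)
print(manager.analyses["tree_analysis"])
```

The caller never touches DecisionTreeType; the service method is the only place where the analyzer type is constructed.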

Examples

>>> pipeline.feature_importance.add.decision_tree("my_comparison", max_depth=5, analysis_name="tree_analysis")

__init__(manager: FeatureImportanceManager, pipeline_data: PipelineData) -> None

Initialize factory with manager and pipeline data.

Parameters

manager : FeatureImportanceManager

Feature importance manager instance

pipeline_data : PipelineData

Pipeline data container (injected by AutoInjectProxy)

Returns

None

decision_tree(comparison_name: str, analysis_name: str, criterion: str = 'gini', splitter: str = 'best', max_depth: int | None = 3, min_samples_split: int = 2, min_samples_leaf: int = 1, min_weight_fraction_leaf: float = 0.0, max_features: str | None = None, random_state: int | None = None, max_leaf_nodes: int | None = None, min_impurity_decrease: float = 0.0, class_weight: str | None = None, ccp_alpha: float = 0.0, max_samples: int | None = None, force: bool = False) -> None

Add Decision Tree feature importance analysis.

A decision tree classifier scores each feature by how much it reduces impurity at the tree's splits. The resulting ranking is interpretable and shows which molecular features matter most for distinguishing between states.
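
The impurity-based scoring described above can be illustrated with scikit-learn directly. The sketch below uses a synthetic feature matrix (an assumption made for the example, not mdxplain data) in which only the first column separates the two states, so that column should receive essentially all of the importance.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_frames = 400

# Synthetic data: column 0 determines the state, columns 1 and 2 are noise.
X = rng.normal(size=(n_frames, 3))
y = (X[:, 0] > 0).astype(int)

tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, y)

# feature_importances_ holds the normalized total impurity reduction per feature.
importances = tree.feature_importances_
ranking = np.argsort(importances)[::-1]
for idx in ranking:
    print(f"feature {idx}: importance {importances[idx]:.3f}")
```

The importances are normalized to sum to 1, so they are best read as relative rankings within one analysis rather than as absolute effect sizes.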

Parameters

comparison_name : str

Name of the comparison to analyze

analysis_name : str

Name to store the analysis results

criterion : str, default="gini"

Function to measure quality of splits ("gini" or "entropy")

splitter : str, default="best"

Strategy to split at each node ("best" or "random")

max_depth : int, optional, default=3

Maximum depth of tree. None means unlimited depth

min_samples_split : int, default=2

Minimum number of samples required to split an internal node

min_samples_leaf : int, default=1

Minimum number of samples required at a leaf node

min_weight_fraction_leaf : float, default=0.0

Minimum weighted fraction of the total sample weight required at a leaf node

max_features : str, optional

Number of features to consider when looking for the best split ("sqrt", "log2", or None to consider all features)

random_state : int, optional

Random state for reproducible results

max_leaf_nodes : int, optional

Maximum number of leaf nodes; the tree is grown in best-first fashion. None means unlimited.

min_impurity_decrease : float, default=0.0

Minimum impurity decrease required to make a split

class_weight : str, optional

Weights associated with classes ("balanced" or None)

ccp_alpha : float, default=0.0

Complexity parameter for Minimal Cost-Complexity Pruning

max_samples : int, optional

Maximum number of samples to use for training. If None, automatically calculated based on max_memory_gb from pipeline. Use this to manually override memory-based sampling (e.g., max_samples=50000 for large datasets).

force : bool, default=False

Whether to overwrite existing analysis with same name
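
To make the effect of max_samples concrete, the sketch below caps a dataset at a fixed number of rows by drawing indices without replacement. The subsample helper is hypothetical and written only for illustration; how mdxplain selects samples, and how it derives a limit from max_memory_gb, is not documented here. Unlike the service, this helper treats None as "no cap".

```python
import numpy as np


def subsample(X, y, max_samples, random_state=None):
    """Hypothetical helper: keep at most max_samples rows, drawn without replacement."""
    if max_samples is None or X.shape[0] <= max_samples:
        return X, y
    rng = np.random.default_rng(random_state)
    # Sort the chosen indices so the retained frames stay in their original order.
    keep = np.sort(rng.choice(X.shape[0], size=max_samples, replace=False))
    return X[keep], y[keep]


# Toy dataset: 100,000 frames with 2 features and alternating labels.
X = np.arange(200_000, dtype=float).reshape(100_000, 2)
y = np.arange(100_000) % 2

X_small, y_small = subsample(X, y, max_samples=50_000, random_state=42)
print(X_small.shape, y_small.shape)
```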

Returns

None

Adds Decision Tree feature importance results to pipeline data

Examples

>>> # Basic decision tree analysis
>>> pipeline.feature_importance.add.decision_tree(
...     "folded_vs_unfolded", 
...     "basic_tree"
... )
>>> # Decision tree with custom parameters
>>> pipeline.feature_importance.add.decision_tree(
...     "state_comparison",
...     "deep_tree",
...     max_depth=10,
...     min_samples_split=20,
...     random_state=42
... )
>>> # Decision tree with entropy criterion
>>> pipeline.feature_importance.add.decision_tree(
...     "conformational_states",
...     "entropy_tree",
...     criterion="entropy",
...     max_features="sqrt",
...     class_weight="balanced"
... )

Notes

Decision trees provide interpretable feature importance based on split criteria. Higher importance scores indicate features that contribute more to reducing impurity when making classification decisions.

Uses sklearn.tree.DecisionTreeClassifier internally.
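
Because the model parameters share their names with sklearn.tree.DecisionTreeClassifier, the settings from the entropy example above can be tried on the estimator directly. This is a sketch on synthetic, imbalanced data (an assumption for the example); it does not reproduce any preprocessing or sampling that mdxplain may perform.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)

# Synthetic, imbalanced two-state data: features 1 and 2 carry the signal.
X = rng.normal(size=(500, 4))
y = (X[:, 1] + 0.5 * X[:, 2] > 1.0).astype(int)

clf = DecisionTreeClassifier(
    criterion="entropy",      # information gain instead of Gini impurity
    max_features="sqrt",      # consider sqrt(n_features) candidates per split
    class_weight="balanced",  # reweight classes inversely to their frequency
    max_depth=3,
    random_state=42,
)
clf.fit(X, y)

for idx, score in enumerate(clf.feature_importances_):
    print(f"feature {idx}: importance {score:.3f}")
```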