Feature Importance Add Service


Factory for adding feature importance analyzers with simplified syntax.

class mdxplain.feature_importance.services.feature_importance_add_service.FeatureImportanceAddService(manager: FeatureImportanceManager, pipeline_data: PipelineData)

Service for adding feature importance analyzers without explicit type instantiation.

This service provides an intuitive interface for adding feature importance analyzers without requiring users to import and instantiate analyzer types directly. Each method accepts the analyzer type's own parameters together with the parameters of manager.add_analysis (such as comparison_name, analysis_name, and force) in a single call.
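The factory pattern described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not mdxplain's implementation: `DecisionTreeAnalyzer`, `StubManager`, `AddServiceSketch`, and the argument order of `add_analysis` are all assumptions made for the example. It shows the division of work: analyzer-specific parameters are packed into an analyzer object, while naming and overwrite options are forwarded to the manager.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DecisionTreeAnalyzer:
    """Hypothetical stand-in for an analyzer type holding its own parameters."""
    params: Dict[str, Any]


@dataclass
class StubManager:
    """Hypothetical manager that just records every add_analysis call."""
    calls: List[Dict[str, Any]] = field(default_factory=list)

    def add_analysis(self, pipeline_data, comparison_name, analyzer_type,
                     analysis_name, force=False):
        self.calls.append({
            "comparison_name": comparison_name,
            "analyzer_type": analyzer_type,
            "analysis_name": analysis_name,
            "force": force,
        })


class AddServiceSketch:
    """Hypothetical factory: builds the analyzer type, then delegates to the manager."""

    def __init__(self, manager, pipeline_data):
        self._manager = manager
        self._pipeline_data = pipeline_data

    def decision_tree(self, comparison_name, analysis_name, max_depth=3,
                      force=False, **tree_params):
        # Analyzer-specific parameters go into the analyzer type ...
        analyzer = DecisionTreeAnalyzer({"max_depth": max_depth, **tree_params})
        # ... while comparison/analysis naming and force go to the manager.
        self._manager.add_analysis(
            self._pipeline_data, comparison_name, analyzer,
            analysis_name, force=force,
        )


manager = StubManager()
service = AddServiceSketch(manager, pipeline_data=None)
service.decision_tree("my_comparison", max_depth=5, analysis_name="tree_analysis")
print(manager.calls[0]["analyzer_type"].params)  # {'max_depth': 5}
```

The user never touches `DecisionTreeAnalyzer` directly, which is the convenience the service is meant to provide.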

Examples

>>> pipeline.feature_importance.add.decision_tree("my_comparison", max_depth=5, analysis_name="tree_analysis")

__init__(manager: FeatureImportanceManager, pipeline_data: PipelineData) → None

Initialize factory with manager and pipeline data.

Parameters

manager (FeatureImportanceManager): Feature importance manager instance
pipeline_data (PipelineData): Pipeline data container (injected by AutoInjectProxy)

Returns

None

decision_tree(comparison_name: str, analysis_name: str, criterion: str = 'gini', splitter: str = 'best', max_depth: int | None = 3, min_samples_split: int = 2, min_samples_leaf: int = 1, min_weight_fraction_leaf: float = 0.0, max_features: str | None = None, random_state: int | None = None, max_leaf_nodes: int | None = None, min_impurity_decrease: float = 0.0, class_weight: str | None = None, ccp_alpha: float = 0.0, max_samples: int | None = None, force: bool = False) → None

Add Decision Tree feature importance analysis.

A decision tree classifier scores each feature by the total impurity reduction achieved by the splits that use it, weighted by the fraction of samples reaching each split and normalized so that all scores sum to 1. The resulting rankings are interpretable and show which molecular features matter most for distinguishing between different states.
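To make "reducing impurity" concrete, the sketch below computes Gini impurity, entropy, and the weighted impurity decrease of a single split in plain Python. The toy labels are invented for illustration, and the functions follow the textbook definitions behind the `criterion` options rather than sklearn's internal code.

```python
import math
from collections import Counter
from typing import Callable, Sequence


def gini(labels: Sequence[str]) -> float:
    """Gini impurity: 1 - sum_k p_k^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy in bits: -sum_k p_k * log2(p_k)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def impurity_decrease(parent: Sequence[str], left: Sequence[str],
                      right: Sequence[str],
                      impurity: Callable[[Sequence[str]], float] = gini) -> float:
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    return (impurity(parent)
            - len(left) / n * impurity(left)
            - len(right) / n * impurity(right))


# Toy node: 4 "folded" and 4 "unfolded" frames, split on one feature threshold.
parent = ["folded"] * 4 + ["unfolded"] * 4
left = ["folded"] * 3 + ["unfolded"] * 1
right = ["folded"] * 1 + ["unfolded"] * 3

print(gini(parent))                                             # 0.5
print(impurity_decrease(parent, left, right))                   # 0.125
print(round(impurity_decrease(parent, left, right, entropy), 3))  # 0.189
```

A feature whose threshold separates the two states more cleanly would produce a larger decrease and therefore a higher importance.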

Parameters

comparison_name (str): Name of the comparison to analyze
analysis_name (str): Name under which the analysis results are stored
criterion (str, default="gini"): Function used to measure the quality of a split ("gini" or "entropy")
splitter (str, default="best"): Strategy used to choose the split at each node ("best" or "random")
max_depth (int, optional, default=3): Maximum depth of the tree. None means unlimited depth
min_samples_split (int, default=2): Minimum number of samples required to split an internal node
min_samples_leaf (int, default=1): Minimum number of samples required at a leaf node
min_weight_fraction_leaf (float, default=0.0): Minimum weighted fraction of the total sum of sample weights required at a leaf node
max_features (str, optional): Number of features to consider when looking for the best split (e.g. "sqrt" or "log2"); None considers all features
random_state (int, optional): Random seed for reproducible results
max_leaf_nodes (int, optional): Maximum number of leaf nodes, with the tree grown in best-first fashion. None means an unlimited number of leaf nodes
min_impurity_decrease (float, default=0.0): Minimum impurity decrease required to make a split
class_weight (str, optional): Weights associated with classes ("balanced" or None)
ccp_alpha (float, default=0.0): Complexity parameter for Minimal Cost-Complexity Pruning
max_samples (int, optional): Maximum number of samples used for training. If None, it is calculated automatically from max_memory_gb in the pipeline. Set it to override memory-based sampling manually (e.g., max_samples=50000 for large datasets).
force (bool, default=False): Whether to overwrite an existing analysis with the same name

Returns

None: Adds Decision Tree feature importance results to pipeline data

Examples

>>> # Basic decision tree analysis
>>> pipeline.feature_importance.add.decision_tree(
...     "folded_vs_unfolded", 
...     "basic_tree"
... )

>>> # Decision tree with custom parameters
>>> pipeline.feature_importance.add.decision_tree(
...     "state_comparison",
...     "deep_tree",
...     max_depth=10,
...     min_samples_split=20,
...     random_state=42
... )

>>> # Decision tree with entropy criterion
>>> pipeline.feature_importance.add.decision_tree(
...     "conformational_states",
...     "entropy_tree",
...     criterion="entropy",
...     max_features="sqrt",
...     class_weight="balanced"
... )

Notes

Decision trees provide interpretable feature importance based on split criteria. Higher importance scores indicate features that contribute more to reducing impurity when making classification decisions.
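Per-split decreases are combined into one score per feature by weighting each node's decrease by the fraction of samples that reach it, summing per feature, and normalizing the totals to 1. This is the mean decrease in impurity that sklearn reports as `feature_importances_`. The sketch below reproduces that aggregation on a hand-written two-split tree; the feature names and split values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def mdi_importances(splits: List[Tuple[str, int, float]],
                    n_total: int) -> Dict[str, float]:
    """Mean decrease in impurity from (feature, n_node_samples, impurity_decrease) splits."""
    raw: Dict[str, float] = defaultdict(float)
    for feature, n_node, decrease in splits:
        # Weight each node's decrease by the fraction of samples reaching that node.
        raw[feature] += n_node / n_total * decrease
    total = sum(raw.values())
    # Normalize so the importances of all used features sum to 1.
    return {f: v / total for f, v in raw.items()} if total > 0 else dict(raw)


# Hypothetical tree on 8 frames: the root splits on "contact_12" (Gini decrease 0.125),
# then its impure left child (4 frames, Gini 0.375) is split perfectly on "torsion_5".
splits = [("contact_12", 8, 0.125), ("torsion_5", 4, 0.375)]
print(mdi_importances(splits, n_total=8))  # {'contact_12': 0.4, 'torsion_5': 0.6}
```

Note that the deeper split ends up more important here because it removes all remaining impurity, even though it sees only half of the frames.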

Uses sklearn.tree.DecisionTreeClassifier internally.
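Twelve of the `decision_tree()` parameters share their names with `DecisionTreeClassifier` constructor arguments, whereas `comparison_name`, `analysis_name`, `max_samples`, and `force` are not sklearn arguments and have to be handled by the wrapper itself. A minimal sketch of such routing, using a hypothetical helper `split_params`, might look like this:

```python
from typing import Any, Dict, Tuple

# decision_tree() parameters that are also DecisionTreeClassifier constructor arguments.
SKLEARN_TREE_PARAMS = (
    "criterion", "splitter", "max_depth", "min_samples_split", "min_samples_leaf",
    "min_weight_fraction_leaf", "max_features", "random_state", "max_leaf_nodes",
    "min_impurity_decrease", "class_weight", "ccp_alpha",
)


def split_params(params: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: separate estimator kwargs from wrapper-only options."""
    estimator_kwargs = {k: v for k, v in params.items() if k in SKLEARN_TREE_PARAMS}
    wrapper_kwargs = {k: v for k, v in params.items() if k not in SKLEARN_TREE_PARAMS}
    return estimator_kwargs, wrapper_kwargs


params = dict(comparison_name="state_comparison", analysis_name="deep_tree",
              max_depth=10, min_samples_split=20, random_state=42,
              max_samples=None, force=False)
estimator_kwargs, wrapper_kwargs = split_params(params)
print(estimator_kwargs)  # {'max_depth': 10, 'min_samples_split': 20, 'random_state': 42}
```

With scikit-learn installed, `DecisionTreeClassifier(**estimator_kwargs)` would then build the estimator, while `max_samples` and `force` steer sampling and result storage.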