Feature Importance Add Service
Factory for adding feature importance analyzers with simplified syntax.
- class mdxplain.feature_importance.services.feature_importance_add_service.FeatureImportanceAddService(manager: FeatureImportanceManager, pipeline_data: PipelineData)
Service for adding feature importance analyzers without explicit type instantiation.
This service provides an intuitive interface for adding feature importance analyzers without requiring users to import and instantiate analyzer types directly. Each method accepts the analyzer's own parameters together with the parameters of manager.add_analysis in a single call.
Examples
>>> pipeline.feature_importance.add.decision_tree("my_comparison", max_depth=5, analysis_name="tree_analysis")
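The forwarding pattern described above might look roughly like the following sketch. Every class here, and the `add_analysis` signature, is a hypothetical stand-in chosen for illustration; none of it is taken from mdxplain's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DecisionTreeType:
    """Hypothetical analyzer type that only stores its hyperparameters."""
    params: dict[str, Any]


@dataclass
class FakeManager:
    """Hypothetical stand-in for a manager exposing add_analysis."""
    analyses: dict[str, tuple[str, DecisionTreeType]] = field(default_factory=dict)

    def add_analysis(self, comparison_name, analyzer_type, analysis_name, force=False):
        # Refuse to overwrite an existing analysis unless force is set.
        if analysis_name in self.analyses and not force:
            raise ValueError(f"Analysis '{analysis_name}' already exists")
        self.analyses[analysis_name] = (comparison_name, analyzer_type)


class AddService:
    """Hypothetical service with one method per analyzer type."""

    def __init__(self, manager):
        self._manager = manager

    def decision_tree(self, comparison_name, analysis_name, force=False, **tree_params):
        # Analyzer hyperparameters and manager arguments arrive in one call;
        # the service builds the analyzer type so the caller never imports it.
        analyzer_type = DecisionTreeType(params=tree_params)
        self._manager.add_analysis(
            comparison_name, analyzer_type, analysis_name, force=force
        )


manager = FakeManager()
service = AddService(manager)
service.decision_tree("my_comparison", max_depth=5, analysis_name="tree_analysis")
```

In this sketch the caller passes only strings and keyword arguments; the analyzer type is constructed inside the service method.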
- __init__(manager: FeatureImportanceManager, pipeline_data: PipelineData) None
Initialize factory with manager and pipeline data.
Parameters
- manager : FeatureImportanceManager
Feature importance manager instance
- pipeline_data : PipelineData
Pipeline data container (injected by AutoInjectProxy)
Returns
None
- decision_tree(comparison_name: str, analysis_name: str, criterion: str = 'gini', splitter: str = 'best', max_depth: int | None = 3, min_samples_split: int = 2, min_samples_leaf: int = 1, min_weight_fraction_leaf: float = 0.0, max_features: str | None = None, random_state: int | None = None, max_leaf_nodes: int | None = None, min_impurity_decrease: float = 0.0, class_weight: str | None = None, ccp_alpha: float = 0.0, max_samples: int | None = None, force: bool = False) None
Add Decision Tree feature importance analysis.
A decision tree classifier scores each feature by how much it contributes to reducing impurity at the tree's splits. The resulting ranking is interpretable and shows which molecular features matter most for distinguishing between different states.
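As a worked illustration of "reducing impurity at a split", the sketch below computes the Gini impurity decrease for two candidate splits by hand. The labels and helper functions are made up for this example and are not part of mdxplain.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Hypothetical labels for eight frames drawn from two states.
parent = ["folded"] * 4 + ["unfolded"] * 4

# Split on an informative feature: both children are pure.
good = impurity_decrease(parent, ["folded"] * 4, ["unfolded"] * 4)

# Split on an uninformative feature: both children keep the 50/50 mix.
bad = impurity_decrease(
    parent,
    ["folded", "folded", "unfolded", "unfolded"],
    ["folded", "folded", "unfolded", "unfolded"],
)

print(good)  # 0.5
print(bad)   # 0.0
```

A feature whose splits yield large decreases like `good` accumulates a high importance score; a feature that only offers splits like `bad` contributes nothing.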
Parameters
- comparison_name : str
Name of the comparison to analyze
- analysis_name : str
Name under which the analysis results are stored
- criterion : str, default="gini"
Function used to measure split quality ("gini" or "entropy")
- splitter : str, default="best"
Strategy used to choose the split at each node ("best" or "random")
- max_depth : int, optional, default=3
Maximum depth of the tree. None means unlimited depth
- min_samples_split : int, default=2
Minimum number of samples required to split an internal node
- min_samples_leaf : int, default=1
Minimum number of samples required at a leaf node
- min_weight_fraction_leaf : float, default=0.0
Minimum fraction of the total sample weight required at a leaf node
- max_features : str, optional
Number of features to consider when looking for the best split
- random_state : int, optional
Random seed for reproducible results
- max_leaf_nodes : int, optional
Maximum number of leaf nodes; the tree is grown in best-first fashion
- min_impurity_decrease : float, default=0.0
Minimum impurity decrease required to make a split
- class_weight : str, optional
Weights associated with classes ("balanced" or None)
- ccp_alpha : float, default=0.0
Complexity parameter for Minimal Cost-Complexity Pruning
- max_samples : int, optional
Maximum number of samples to use for training. If None, the value is calculated automatically from the pipeline's max_memory_gb. Set it to override memory-based sampling manually (e.g., max_samples=50000 for large datasets).
- force : bool, default=False
Whether to overwrite an existing analysis with the same name
Returns
- None
Adds Decision Tree feature importance results to pipeline data
Examples
>>> # Basic decision tree analysis
>>> pipeline.feature_importance.add.decision_tree(
...     "folded_vs_unfolded",
...     "basic_tree"
... )

>>> # Decision tree with custom parameters
>>> pipeline.feature_importance.add.decision_tree(
...     "state_comparison",
...     "deep_tree",
...     max_depth=10,
...     min_samples_split=20,
...     random_state=42
... )

>>> # Decision tree with entropy criterion
>>> pipeline.feature_importance.add.decision_tree(
...     "conformational_states",
...     "entropy_tree",
...     criterion="entropy",
...     max_features="sqrt",
...     class_weight="balanced"
... )
Notes
Decision trees provide interpretable feature importance based on split criteria. Higher importance scores indicate features that contribute more to reducing impurity when making classification decisions.
Uses sklearn.tree.DecisionTreeClassifier internally.
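For orientation, the following minimal sketch shows how impurity-based importances can be read from scikit-learn's `DecisionTreeClassifier` directly. The synthetic data is an assumption made for illustration, and the sketch does not reproduce how mdxplain prepares data or stores results.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for molecular features: 200 frames, 4 features.
X = rng.normal(size=(200, 4))
# Labels depend only on feature 2, so it should dominate the ranking.
y = (X[:, 2] > 0).astype(int)

tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
tree.fit(X, y)

importances = tree.feature_importances_  # normalized to sum to 1
ranking = np.argsort(importances)[::-1]  # most important feature first
print(ranking[0], importances.round(3))
```

Because the labels here are a function of a single feature, that feature receives essentially all of the importance; real trajectories typically spread importance across several correlated features.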