Feature Importance Facade
Facade for feature importance visualization.
Provides simplified interface for creating plots based on feature importance analysis results. Coordinates between feature importance data and specialized plotters.
- class mdxplain.plots.services.feature_importance_facade.FeatureImportanceFacade(manager, pipeline_data)
Facade for feature importance visualization.
Provides high-level interface for creating visualizations from feature importance analysis results. Simplifies access to specialized plotters while managing pipeline data and configuration.
Examples
>>> # Access via plots manager
>>> facade = plots_manager.feature_importance
>>> fig = facade.violins("tree_analysis", n_top=10)
>>> fig = facade.densities("tree_analysis", n_top=10)
>>> fig = facade.time_series("tree_analysis", n_top=5)
- __init__(manager, pipeline_data) None
Initialize feature importance facade.
Parameters
- manager : PlotsManager
Plots manager instance
- pipeline_data : PipelineData
Pipeline data container
Returns
- None
Initializes FeatureImportanceFacade instance
- violins(feature_importance_name: str, n_top: int = 10, contact_transformation: bool = True, max_cols: int = 4, long_labels: bool = False, contact_threshold: float | None = 4.5, title: str | None = None, legend_title: str | None = None, legend_labels: Dict[str, str] | None = None, save_fig: bool = False, filename: str | None = None, file_format: str = 'png', dpi: int = 300, title_fontsize: int | None = None, subplot_title_fontsize: int | None = None, ylabel_fontsize: int | None = None, tick_fontsize: int | None = None, legend_fontsize: int | None = None, legend_title_fontsize: int | None = None) Figure
Create violin plots from feature importance analysis.
Visualizes the distribution of feature values showing separate violins for each DataSelector group with cluster-consistent colors.
Parameters
- feature_importance_name : str
Name of feature importance analysis
- n_top : int, default=10
Number of top features per comparison
- contact_transformation : bool, default=True
If True, automatically convert contact features to distances. If False, plot contacts as binary values with Gaussian smoothing.
- max_cols : int, default=4
Maximum number of columns in grid layout. Each (Feature, DataSelector) combination gets its own subplot arranged in a grid.
- long_labels : bool, default=False
If True, use long descriptive labels for discrete features (e.g., "Contact"/"Non-Contact", "Alpha helix"/"Loop"). If False, use short labels (e.g., "C"/"NC", "H"/"C"). Automatically adjusts subplot spacing when True to prevent overlap.
- contact_threshold : float, optional
Distance threshold in Angstrom for drawing contact threshold line on distance features. If provided, draws a red dashed horizontal line at this distance value. Common value: 4.5 Å (default cutoff for contacts).
- title : str, optional
Custom plot title. Auto-generated if None.
- legend_title : str, optional
Custom title for the legend.
- legend_labels : Dict[str, str], optional
Custom labels for the legend.
- save_fig : bool, default=False
Save figure to file
- filename : str, optional
Custom filename. Auto-generated if None.
- file_format : str, default="png"
File format for saving (png, pdf, svg, etc.)
- dpi : int, default=300
Resolution for saved figure in dots per inch
- title_fontsize : int, optional
Font size for the main title.
- subplot_title_fontsize : int, optional
Font size for the subplot titles.
- ylabel_fontsize : int, optional
Font size for the y-axis labels.
- tick_fontsize : int, optional
Font size for the tick labels.
- legend_fontsize : int, optional
Font size for the legend entries.
- legend_title_fontsize : int, optional
Font size for the legend title.
Returns
- matplotlib.figure.Figure
Figure object containing violin plots
Raises
- ValueError
If parameters are invalid or required parameters are missing for the chosen mode
Examples
>>> # Basic violin plot
>>> fig = facade.violins(
...     feature_importance_name="tree_analysis",
...     n_top=10
... )

>>> # With long descriptive labels for discrete features
>>> fig = facade.violins(
...     feature_importance_name="tree_analysis",
...     n_top=10,
...     long_labels=True
... )

>>> # Save to file
>>> fig = facade.violins(
...     feature_importance_name="tree_analysis",
...     n_top=10,
...     save_fig=True,
...     filename="important_features.pdf",
...     file_format="pdf"
... )
Notes
- Each (Feature, DataSelector) combination gets its own subplot
- contact_transformation=True: Converts boolean contacts to distances
- contact_transformation=False: Visualizes binary contacts with Gaussian smoothing (tall+wide peaks for dominant states, short+narrow for rare states)
- Uses DataSelector-based color mapping for cluster consistency
- Y-axis shows feature values with units (Distance, Angle, etc.)
- Each violin is centered in its subplot showing the full distribution
- Grid layout controlled by max_cols parameter (default: 4 columns)
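The effect of max_cols on the grid can be illustrated with a minimal sketch. The helper grid_shape below is hypothetical and not part of mdxplain; it assumes the simplest rule (fill rows left to right, never exceed max_cols columns), and the library's actual layout logic may differ.

```python
import math


def grid_shape(n_plots: int, max_cols: int = 4) -> tuple[int, int]:
    """Return (n_rows, n_cols) for n_plots subplots with at most max_cols columns.

    Hypothetical helper for illustration only; not an mdxplain function.
    """
    if n_plots < 1 or max_cols < 1:
        raise ValueError("n_plots and max_cols must be positive")
    # Never allocate more columns than there are plots.
    n_cols = min(n_plots, max_cols)
    # Enough rows to hold every plot; the last row may be partly empty.
    n_rows = math.ceil(n_plots / n_cols)
    return n_rows, n_cols


# Ten (Feature, DataSelector) subplots with the default max_cols=4
print(grid_shape(10, max_cols=4))  # (3, 4)
```

With fewer subplots than max_cols, this sketch collapses to a single row, e.g. grid_shape(3) gives one row of three columns.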
- densities(feature_importance_name: str, n_top: int = 10, contact_transformation: bool = True, max_cols: int = 4, long_labels: bool = False, kde_bandwidth: str | float = 'scott', base_sigma: float = 0.05, max_sigma: float = 0.12, alpha: float = 0.3, line_width: float = 2.0, contact_threshold: float | None = 4.5, title: str | None = None, legend_title: str | None = None, legend_labels: Dict[str, str] | None = None, save_fig: bool = False, filename: str | None = None, file_format: str = 'png', dpi: int = 300, title_fontsize: int | None = None, subplot_title_fontsize: int | None = None, xlabel_fontsize: int | None = None, ylabel_fontsize: int | None = None, tick_fontsize: int | None = None, legend_fontsize: int | None = None, legend_title_fontsize: int | None = None, fill: bool = True, discrete_plot_mode: str = 'density', colors: str | Dict[str, str] | None = None, vertical_markers: Dict[int | str, float | List[float]] | None = None, vertical_marker_labels: str | Dict[int | str, str] | None = None, vertical_marker_label_colors: str | Dict[str, str] | None = None) Figure
Create density plots from feature importance analysis.
Visualizes feature distributions as overlaid density curves in grid layout. Each feature gets one grid cell with curves for each DataSelector group using cluster-consistent colors.
Parameters
- feature_importance_name : str
Name of feature importance analysis
- n_top : int, default=10
Number of top features per comparison
- contact_transformation : bool, default=True
If True, convert boolean contact features (0/1) to continuous distances for smoother visualization. If False, plot contacts using Gaussian smoothing with height-dependent widths.
- max_cols : int, default=4
Maximum number of columns in grid layout. Actual layout may use fewer columns to maintain roughly square overall shape.
- long_labels : bool, default=False
If True, use long descriptive labels for discrete features (e.g., "Contact"/"Non-Contact", "Alpha helix"/"Loop"). If False, use short labels (e.g., "C"/"NC", "H"/"C").
- kde_bandwidth : str or float, default="scott"
KDE bandwidth for continuous features:
  - "scott": Scott's rule (automatic bandwidth selection)
  - "silverman": Silverman's rule
  - float: Manual bandwidth value
- base_sigma : float, default=0.05
Minimum Gaussian width for binary contact features (narrowest peak)
- max_sigma : float, default=0.12
Maximum Gaussian width for binary contact features (widest peak)
- alpha : float, default=0.3
Transparency for filled density curves (0=transparent, 1=opaque)
- line_width : float, default=2.0
Width of density curve contour lines
- fill : bool, default=True
If True, draw filled density areas in addition to contour lines. If False, draw contour lines only.
- discrete_plot_mode : str, default="density"
Rendering mode for discrete features:
  - "density": Gaussian-smoothed discrete distributions
  - "bar": grouped probability bars (recommended for discrete features)
- colors : str or Dict[str, str], optional
Color configuration for DataSelectors:
  - str: matplotlib colormap name (e.g., "tab10")
  - dict: explicit DataSelector -> color mapping
  - None: automatic cluster-consistent DataSelector mapping (cluster_* names keep their cluster color)
- vertical_markers : Dict[int or str, float or List[float]], optional
Optional vertical guide markers keyed by DataSelector.
- vertical_marker_labels : str or dict, optional
Optional labels for marker legend entries. Use one shared label string or dict[key] = label.
- vertical_marker_label_colors : str or dict, optional
Optional legend color override for marker labels: one shared color or dict[label] = color.
- contact_threshold : float, optional
Distance threshold in Angstrom for drawing contact threshold line.
- title : str, optional
Custom plot title. Auto-generated if None.
- legend_title : str, optional
Custom title for DataSelector legend. If None, uses "DataSelectors".
- legend_labels : Dict[str, str], optional
Custom labels for DataSelectors in legend. Maps original names to display names. Example: {"cluster_0": "Inactive", "cluster_1": "Active"}
- save_fig : bool, default=False
Save figure to file
- filename : str, optional
Custom filename. Auto-generated if None.
- file_format : str, default="png"
File format for saving (png, pdf, svg, etc.)
- dpi : int, default=300
Resolution for saved figure in dots per inch
- title_fontsize : int, optional
Font size for the main title.
- subplot_title_fontsize : int, optional
Font size for the subplot titles.
- xlabel_fontsize : int, optional
Font size for the x-axis labels.
- ylabel_fontsize : int, optional
Font size for the y-axis labels.
- tick_fontsize : int, optional
Font size for the tick labels.
- legend_fontsize : int, optional
Font size for the legend entries.
- legend_title_fontsize : int, optional
Font size for the legend title.
Returns
- matplotlib.figure.Figure
Figure object containing density plots in grid layout
Raises
- ValueError
If parameters are invalid or required parameters are missing for the chosen mode
Examples
>>> # Basic density plot
>>> fig = facade.densities(
...     feature_importance_name="tree_analysis",
...     n_top=10
... )

>>> # Plot binary contacts without distance transformation
>>> fig = facade.densities(
...     feature_importance_name="tree_analysis",
...     n_top=10,
...     contact_transformation=False,
...     base_sigma=0.04,
...     max_sigma=0.15
... )

>>> # Custom KDE bandwidth for continuous features
>>> fig = facade.densities(
...     feature_importance_name="tree_analysis",
...     n_top=10,
...     kde_bandwidth=0.5
... )

>>> # Save to file with custom layout
>>> fig = facade.densities(
...     feature_importance_name="tree_analysis",
...     n_top=15,
...     max_cols=5,
...     save_fig=True,
...     filename="density_plots.pdf",
...     file_format="pdf"
... )
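
The vertical_markers parameter accepts either a single float or a list of floats per key. The sketch below shows one way such input could be brought into a uniform shape before drawing. The function normalize_markers is hypothetical and not part of mdxplain, and the keys "cluster_0" and "cluster_1" are placeholder DataSelector names.

```python
from typing import Dict, List, Union

MarkerValue = Union[float, List[float]]
MarkerKey = Union[int, str]


def normalize_markers(
    markers: Dict[MarkerKey, MarkerValue],
) -> Dict[MarkerKey, List[float]]:
    """Turn every marker value into a list of floats.

    Hypothetical helper for illustration only; not an mdxplain function.
    """
    normalized: Dict[MarkerKey, List[float]] = {}
    for key, value in markers.items():
        if isinstance(value, (int, float)):
            # A single position becomes a one-element list.
            normalized[key] = [float(value)]
        else:
            normalized[key] = [float(v) for v in value]
    return normalized


# One marker for cluster_0, two markers for cluster_1 (placeholder names)
markers = normalize_markers({"cluster_0": 4.5, "cluster_1": [3.0, 6.0]})
print(markers)
```

After normalization, a plotting loop can treat every key the same way and draw one vertical line per list entry.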
Notes
Binary Contact Features:
- When contact_transformation=True: Converts to distances (default)
- When contact_transformation=False: Uses Gaussian smoothing where:
  - Dominant states (high probability) → tall AND wide peaks
  - Rare states (low probability) → short AND narrow peaks
  - This prevents visual overlap when multiple DataSelectors are plotted
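
The height-dependent widths can be pictured with a small sketch. It assumes, purely for illustration, that the Gaussian width is interpolated linearly between base_sigma and max_sigma by the state probability and that the peak height equals that probability; the actual mapping used by mdxplain is not documented here. The functions state_sigma and binary_density are hypothetical names.

```python
import math


def state_sigma(probability: float, base_sigma: float = 0.05,
                max_sigma: float = 0.12) -> float:
    """Width of the peak for a state with the given probability.

    Assumption: linear interpolation between base_sigma and max_sigma.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return base_sigma + probability * (max_sigma - base_sigma)


def binary_density(x: float, p_contact: float, base_sigma: float = 0.05,
                   max_sigma: float = 0.12) -> float:
    """Smoothed curve for a binary feature evaluated at x.

    One Gaussian sits at 0 (non-contact) and one at 1 (contact). Each peak
    is scaled by its state probability, so dominant states are both taller
    and wider than rare ones.
    """
    total = 0.0
    for center, prob in ((0.0, 1.0 - p_contact), (1.0, p_contact)):
        sigma = state_sigma(prob, base_sigma, max_sigma)
        total += prob * math.exp(-0.5 * ((x - center) / sigma) ** 2)
    return total


# A contact formed in 90% of frames: tall, wide peak at 1; small, narrow peak at 0
print(binary_density(1.0, p_contact=0.9), binary_density(0.0, p_contact=0.9))
```

Because rare states get narrow peaks, curves from several DataSelectors placed on the same axis stay distinguishable around 0 and 1.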
Continuous Features:
- Uses standard Kernel Density Estimation (KDE)
- Automatic bandwidth selection via Scott's or Silverman's rule
- Manual bandwidth control available via kde_bandwidth parameter
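
For orientation, the two automatic rules can be written out as bandwidth factors. The formulas below follow the definitions used by scipy.stats.gaussian_kde, where the factor scales the standard deviation of the data; whether mdxplain delegates to SciPy internally is an assumption. The function names are illustrative.

```python
def scott_factor(n_samples: int, n_dims: int = 1) -> float:
    """Scott's rule: n ** (-1 / (d + 4))."""
    return n_samples ** (-1.0 / (n_dims + 4))


def silverman_factor(n_samples: int, n_dims: int = 1) -> float:
    """Silverman's rule: (n * (d + 2) / 4) ** (-1 / (d + 4))."""
    return (n_samples * (n_dims + 2) / 4.0) ** (-1.0 / (n_dims + 4))


# Both factors shrink as the number of frames grows, giving sharper curves
for n in (100, 1_000, 10_000):
    print(n, round(scott_factor(n), 4), round(silverman_factor(n), 4))
```

For one-dimensional data the Silverman factor is slightly larger than the Scott factor at the same sample size, so it yields marginally smoother curves.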
Grid Layout:
- Features grouped by type where possible
- max_cols controls maximum columns (default: 4)
- Layout algorithm maintains roughly square overall shape
- Each grid cell shows one feature with overlaid curves
Color Mapping:
- Uses DataSelector-based colors for cluster consistency
- Same colors across all plots in pipeline for same DataSelectors
- Filled curves with transparency (alpha) + solid contour lines
- time_series(feature_importance_name: str, n_top: int = 5, traj_selection: int | str | List | all = 'all', use_time: bool = True, tags_for_coloring: List[str] | None = None, allow_multi_tag_plotting: bool = False, clustering_name: str | None = None, membership_per_feature: bool = False, membership_traj_selection: str | int | List = 'all', contact_transformation: bool = True, max_cols: int = 2, long_labels: bool = False, subplot_height: float = 2.5, membership_bar_height: float | None = None, show_legend: bool = True, contact_threshold: float | None = 4.5, title: str | None = None, save_fig: bool = False, filename: str | None = None, file_format: str = 'png', dpi: int = 300, smoothing: bool = True, smoothing_method: str = 'savitzky', smoothing_window: int = 51, smoothing_polyorder: int = 3, show_unsmoothed_background: bool = True, title_fontsize: int | None = None, subplot_title_fontsize: int | None = None, xlabel_fontsize: int | None = None, ylabel_fontsize: int | None = None, tick_fontsize: int | None = None, legend_fontsize: int | None = None, legend_title_fontsize: int | None = None, discrete_plot_style: str = 'step', discrete_layout: str = 'auto', discrete_offset_span: float = 0.28, discrete_auto_offset_threshold: int = 15, thickness: float = 1.0, colors: str | Dict[str, str] | None = None, vertical_markers: Dict[int | str, float | List[float]] | None = None, vertical_marker_labels: str | Dict[int | str, str] | None = None, vertical_marker_label_colors: str | Dict[str, str] | None = None, vertical_marker_mode: str = 'auto') Figure
Create time series plots from feature importance analysis.
Visualizes temporal evolution of important features as line plots with one subplot per feature. Each trajectory is shown as a separate line, optionally colored by trajectory number or tags. Can include cluster membership visualization as colored bars below plots.
Parameters
- feature_importance_name : str
Name of feature importance analysis
- n_top : int, default=5
Number of top features per sub-comparison. Union taken across all sub-comparisons to determine features to plot.
- traj_selection : int, str, list, or "all", default="all"
Trajectories to plot. Can be indices, names, tags, or "all".
- use_time : bool, default=True
If True, use Time (ns) for x-axis. If False, use frame numbers.
- tags_for_coloring : list of str, optional
Tags to use for trajectory coloring. If set, automatically enables tag-based coloring. Trajectories grouped by shared tags from this list.
- allow_multi_tag_plotting : bool, default=False
How to handle trajectories with multiple matching tags:
  - False: Exclude trajectories with multiple tags
  - True: Plot such trajectories multiple times (once per tag)
- clustering_name : str, optional
Name of clustering analysis for membership visualization. If None, no membership bars shown.
- membership_per_feature : bool, default=False
If True, show membership bar below each feature subplot. If False, show single membership bar at bottom of figure.
- membership_traj_selection : int, str, list, or "all", default="all"
Trajectories to include in membership visualization. Can differ from main traj_selection.
- contact_transformation : bool, default=True
If True, automatically convert contact features to distances.
- max_cols : int, default=2
Maximum number of columns in grid layout. Each feature gets one grid cell arranged in rows and columns.
- long_labels : bool, default=False
If True, use long descriptive labels for discrete features (e.g., "Contact"/"Non-Contact"). If False, use short labels (e.g., "C"/"NC"). Applies to binary/discrete features only.
- subplot_height : float, default=2.5
Height per feature subplot in inches
- membership_bar_height : float, optional
Height per trajectory in membership bar in inches. Default: 0.25 (membership_per_feature=True) or 0.5 (False)
- show_legend : bool, default=True
Show legend for trajectory/tag colors
- contact_threshold : Optional[float], default=4.5
Distance threshold in Angstrom for drawing contact threshold line on distance features. If provided, draws a red dashed horizontal line.
- title : str, optional
Custom plot title. Auto-generated if None.
- save_fig : bool, default=False
Save figure to file
- filename : str, optional
Custom filename. Auto-generated if None.
- file_format : str, default="png"
File format for saving (png, pdf, svg, etc.)
- dpi : int, default=300
Resolution for saved figure in dots per inch
- smoothing : bool, default=True
Enable or disable data smoothing for continuous features. Discrete features are always plotted without smoothing.
- smoothing_method : str, default="savitzky"
Smoothing method ("moving_average" or "savitzky")
- smoothing_window : int, default=51
Window size for smoothing in frames
- smoothing_polyorder : int, default=3
Polynomial order for Savitzky-Golay filter (ignored for moving_average)
- show_unsmoothed_background : bool, default=True
Show unsmoothed data as transparent background line when smoothing is enabled
- discrete_plot_style : str, default="step"
Rendering style for discrete features: "line", "step", "segments", or "scatter".
- discrete_layout : str, default="auto"
Discrete rendering layout mode: "auto", "overlay", "offset", or "occupancy". In "occupancy", discrete lines represent states as probabilities over time instead of individual trajectories.
- discrete_offset_span : float, default=0.28
Vertical half-span for discrete "offset" layout.
- discrete_auto_offset_threshold : int, default=15
Number of discrete traces at which "auto" switches to "offset".
- thickness : float, default=1.0
Global rendering thickness for all feature traces: marker size factor for "scatter" and line width for line-based styles.
- colors : str or Dict[str, str], optional
Color configuration for trajectories/tags:
  - str: matplotlib colormap name (e.g., "tab20")
  - dict: explicit mapping (trajectory_name -> color or tag -> color)
  - None: automatic palette assignment. Uses tag colors if tag coloring is active, otherwise trajectory colors.
- vertical_markers : Dict[int or str, float or List[float]], optional
Optional vertical guide markers. Keys are trajectory selectors or tag names (depending on vertical_marker_mode), values are x-positions where colored vertical lines are drawn.
- vertical_marker_labels : str or dict, optional
Optional legend labels for marker lines. Use one shared label string or dict[key] = label.
- vertical_marker_label_colors : str or dict, optional
Optional legend color override for marker labels: one shared color or dict[label] = color.
- vertical_marker_mode : str, default="auto"
Marker key interpretation mode: "auto", "trajectory", or "tag". In "auto", tag mode is used when tag coloring is active. In "trajectory" mode with tag coloring enabled, the first matching tag color per trajectory is used.
- title_fontsize : int, optional
Font size for main title.
- subplot_title_fontsize : int, optional
Font size for subplot titles.
- xlabel_fontsize : int, optional
Font size for x-axis labels.
- ylabel_fontsize : int, optional
Font size for y-axis labels.
- tick_fontsize : int, optional
Font size for tick labels.
- legend_fontsize : int, optional
Font size for legend entries.
- legend_title_fontsize : int, optional
Font size for legend title.
Returns
- matplotlib.figure.Figure
Figure object containing time series plots
Raises
- ValueError
If parameters are invalid or no trajectories remain after filtering
Examples
>>> # Basic time series plot with top 5 features
>>> fig = facade.time_series(
...     feature_importance_name="tree_analysis",
...     n_top=5
... )

>>> # Color by trajectory tags (setting tags_for_coloring enables tag-based coloring)
>>> fig = facade.time_series(
...     feature_importance_name="tree_analysis",
...     n_top=5,
...     tags_for_coloring=["system_A", "system_B"]
... )

>>> # Add cluster membership visualization
>>> fig = facade.time_series(
...     feature_importance_name="tree_analysis",
...     n_top=5,
...     clustering_name="dbscan_clustering",
...     membership_per_feature=True
... )

>>> # Plot specific trajectories with frame numbers
>>> fig = facade.time_series(
...     feature_importance_name="tree_analysis",
...     n_top=5,
...     traj_selection=[0, 1, 2],
...     use_time=False
... )

>>> # Custom layout with long labels
>>> fig = facade.time_series(
...     feature_importance_name="tree_analysis",
...     n_top=10,
...     max_cols=3,
...     long_labels=True,
...     subplot_height=3.0
... )

>>> # Save high-resolution PDF
>>> fig = facade.time_series(
...     feature_importance_name="tree_analysis",
...     n_top=10,
...     save_fig=True,
...     filename="feature_timeseries.pdf",
...     file_format="pdf",
...     dpi=600
... )
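
The interaction between tags_for_coloring and allow_multi_tag_plotting can be sketched as a grouping step. The function group_by_tags and the trajectory names below are hypothetical and only mirror the documented behavior (exclude multi-tag trajectories when False, repeat them once per tag when True); mdxplain's internal implementation may look different.

```python
from typing import Dict, List


def group_by_tags(
    traj_tags: Dict[str, List[str]],
    tags_for_coloring: List[str],
    allow_multi_tag_plotting: bool = False,
) -> Dict[str, List[str]]:
    """Group trajectory names by the coloring tag they carry.

    Hypothetical helper for illustration only; not an mdxplain function.
    """
    groups: Dict[str, List[str]] = {tag: [] for tag in tags_for_coloring}
    for traj, tags in traj_tags.items():
        matches = [tag for tag in tags_for_coloring if tag in tags]
        if len(matches) > 1 and not allow_multi_tag_plotting:
            # Ambiguous trajectory: skipped unless multi-tag plotting is allowed.
            continue
        for tag in matches:
            groups[tag].append(traj)
    return groups


# Placeholder trajectories; traj_2 carries both tags
traj_tags = {
    "traj_0": ["system_A"],
    "traj_1": ["system_B"],
    "traj_2": ["system_A", "system_B"],
}
strict = group_by_tags(traj_tags, ["system_A", "system_B"])
repeated = group_by_tags(traj_tags, ["system_A", "system_B"],
                         allow_multi_tag_plotting=True)
print(strict)
print(repeated)
```

In the strict case traj_2 is left out entirely; with multi-tag plotting allowed it appears in both groups and would be drawn twice.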
Notes
- Each feature gets its own subplot with all trajectories overlaid
- X-axis shows either Time (ns) or frame numbers
- Contact features automatically converted to distances (default)
- Cluster membership shown as colored horizontal bars
- Memory-efficient rendering using block optimization
- Supports trajectory filtering by index, name, or tags
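
As a rough picture of what the "moving_average" smoothing option does to a trace, here is a minimal centered moving average in plain Python. It is a sketch under stated assumptions: the odd-window requirement and the shrinking window at the edges are choices made for this example, not documented mdxplain behavior. For the "savitzky" option, scipy.signal.savgol_filter(x, window_length, polyorder) is the usual SciPy routine, though whether mdxplain calls it is likewise an assumption.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int = 51) -> List[float]:
    """Centered moving average that preserves the series length.

    Near the edges the window shrinks to the available frames.
    Hypothetical helper for illustration only; not an mdxplain function.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed: List[float] = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# A noisy toy trace; the smoothed curve has the same number of frames
raw = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
smooth = moving_average(raw, window=3)
print(smooth)
```

Plotting raw with low opacity behind smooth reproduces the idea of show_unsmoothed_background=True.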
- decision_trees(feature_importance_name: str, max_depth_display: int | None = None, max_cols: int = 2, subplot_width: float = 10.0, subplot_height: float = 8.0, title: str | None = None, save_fig: bool = False, filename: str | None = None, file_format: str = 'png', dpi: int = 300, render: bool = True, separate_trees: bool | str = 'auto', width_scale_factor: float = 1.0, height_scale_factor: float = 1.0, short_labels: bool | None = None, short_naming: bool | None = None, short_layout: bool = False, short_edge_labels: bool | None = None, wrap_length: int = 40, hide_node_frames: bool | None = None, show_edge_symbols: bool | None = None, hide_feature_type_prefix: bool | None = None, hide_path: bool | None = None, edge_symbol_fontsize: int | None = None) Figure | List[str] | None
Create decision tree visualizations from feature importance analysis.
Plots the trained decision tree models from feature importance analysis in a grid layout, with one tree per sub-comparison. Only works with decision_tree analyzer type.
Parameters
- feature_importance_name : str
Name of feature importance analysis (must use decision_tree analyzer)
- max_depth_display : int, optional
Maximum tree depth to display for clarity. None shows full tree. Useful for limiting visualization of very deep trees.
- max_cols : int, default=2
Maximum number of columns in grid layout
- subplot_width : float, default=10.0
Width of each tree subplot in inches
- subplot_height : float, default=8.0
Height of each tree subplot in inches
- title : str, optional
Custom plot title. Auto-generated if None.
- save_fig : Union[bool, str], default=False
Whether to save figure/trees to file(s):
  - "auto": True if render=False (prevents no output), else False
  - True: Always save
  - False: Never save (requires render=True)
- filename : str, optional
Custom filename for grid mode. Auto-generated if None.
- file_format : str, default="png"
File format for saving (png, pdf, svg, etc.)
- dpi : int, default=300
Resolution for saved figure(s) in dots per inch
- render : Union[bool, str], default=True
Whether to display in Jupyter:
  - "auto": False if grid too large (>50 inches), True for separate trees
  - True: Always display
  - False: Never display (requires save_fig=True)
- separate_trees : Union[bool, str], default="auto"
Tree layout mode:
  - "auto": True if depth > 5 OR comparisons > 4
  - True: Each tree as separate plot (prevents RAM issues)
  - False: Grid layout (all trees in one figure)
- width_scale_factor : float, default=1.0
Multiplicative factor for figure width (use >1.0 for wider boxes)
- height_scale_factor : float, default=1.0
Multiplicative factor for figure height (use >1.0 for taller boxes)
- short_labels : bool, optional, default=None
Use short discrete labels (NC vs Non-Contact) for feature values. If None, determined by short_layout.
- short_naming : bool, optional, default=None
Truncate class/selector names to 16 chars with […] pattern. If None, determined by short_layout.
- short_layout : bool, default=False
Minimal tree layout + enables all short options (if not explicitly set)
- short_edge_labels : bool, optional, default=None
Show only values/conditions on edges (e.g., ‘Contact’ or ‘≤ 3.50 Å’) instead of full format ‘contact: Leu13-ARG31 = Contact’. If None, determined by short_layout.
- wrap_length : int, default=40
Maximum line length for text wrapping in node labels, class lines, feature lines, and edge labels. Text longer than this will wrap at spaces (colons, equals signs, etc.).
- hide_node_frames : bool, optional, default=None
Hide frame counts in non-root nodes, showing only percentages (e.g., ‘State_A: 53.3%’ instead of ‘State_A: 80 / 150 (53.3%)’). If None, determined by short_layout.
- show_edge_symbols : bool, optional, default=None
Show only symbols on edges (✓ for left/true, ✗ for right/false) instead of text labels. If None, determined by short_layout.
- hide_feature_type_prefix : bool, optional, default=None
Hide feature type prefix in labels (e.g., ‘ALA_5-GLU_10’ instead of ‘contacts: ALA_5-GLU_10’). If None, determined by short_layout.
- hide_path : bool, optional, default=None
Hide decision path in non-root nodes (shown at top of each node). If None, set to True when short_layout=True.
- edge_symbol_fontsize : int, optional
Font size for edge symbols when show_edge_symbols=True. If None, uses the default edge label font size.
Returns
- matplotlib.figure.Figure, List[str], or None
  - Figure: Grid mode with render=True
  - List[str]: Separate trees with save_fig=True (filenames)
  - None: render=False or separate trees without saving
Raises
- ValueError
If feature_importance_name not found, analyzer type is not “decision_tree”, models not available in metadata, or both render and save_fig are False (no output method)
Examples
>>> # Basic decision tree visualization
>>> fig = facade.decision_trees(
...     feature_importance_name="tree_analysis"
... )

>>> # Limit tree depth for clarity
>>> fig = facade.decision_trees(
...     feature_importance_name="tree_analysis",
...     max_depth_display=3
... )

>>> # Custom layout with larger subplots
>>> fig = facade.decision_trees(
...     feature_importance_name="tree_analysis",
...     max_cols=3,
...     subplot_width=12.0,
...     subplot_height=10.0
... )

>>> # Save as PDF
>>> fig = facade.decision_trees(
...     feature_importance_name="tree_analysis",
...     save_fig=True,
...     filename="decision_trees.pdf",
...     file_format="pdf"
... )
Notes
- Red-highlighted node shows the split with maximum discriminative score
- Edge labels are feature-type-specific (e.g., "Formed"/"Broken" for contacts)
- Node sizes automatically adjusted to prevent overlap
- Only available for decision_tree analyzer type
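
The "auto" rules described for separate_trees and save_fig, together with the documented ValueError when no output method remains, can be summarized in a short sketch. All three functions are hypothetical and only restate the rules from the parameter descriptions; they are not mdxplain's actual code.

```python
from typing import Union


def resolve_separate_trees(separate_trees: Union[bool, str],
                           max_depth: int, n_comparisons: int) -> bool:
    """'auto' picks separate plots for deep trees or many comparisons."""
    if separate_trees == "auto":
        return max_depth > 5 or n_comparisons > 4
    return bool(separate_trees)


def resolve_save_fig(save_fig: Union[bool, str], render: bool) -> bool:
    """'auto' saves only when nothing would be displayed."""
    if save_fig == "auto":
        return not render
    return bool(save_fig)


def check_output(render: bool, save_fig: bool) -> None:
    """Reject the combination that produces no output at all."""
    if not render and not save_fig:
        raise ValueError("Both render and save_fig are False: no output method")


# A depth-6 tree with a single comparison is plotted separately under 'auto'
separate = resolve_separate_trees("auto", max_depth=6, n_comparisons=1)
save = resolve_save_fig("auto", render=False)
check_output(render=False, save_fig=save)
print(separate, save)  # True True
```

Passing explicit booleans bypasses the heuristics, which is useful when a large grid is wanted in a single saved file.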