Feature Importance Facade

Facade for feature importance visualization.

Provides a simplified interface for creating plots from feature importance analysis results, coordinating between the stored feature importance data and the specialized plotters.

class mdxplain.plots.services.feature_importance_facade.FeatureImportanceFacade(manager, pipeline_data)

Facade for feature importance visualization.

Provides a high-level interface for creating visualizations from feature importance analysis results. Simplifies access to the specialized plotters while managing pipeline data and configuration.

Examples

>>> # Access via plots manager
>>> facade = plots_manager.feature_importance
>>> fig = facade.violins("tree_analysis", n_top=10)
>>> fig = facade.densities("tree_analysis", n_top=10)
>>> fig = facade.time_series("tree_analysis", n_top=5)
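The facade pattern described above can be illustrated with a minimal, self-contained sketch. All class names and the pipeline data layout here are hypothetical and do not reflect mdxplain's internals. The sketch omits the manager argument and returns strings instead of matplotlib figures, so it shows only how a facade validates its input and delegates to a specialized plotter.

```python
from typing import Any, Dict


class ViolinPlotterSketch:
    """Hypothetical stand-in for a specialized plotter class."""

    def __init__(self, pipeline_data: Dict[str, Any]) -> None:
        self.pipeline_data = pipeline_data

    def plot(self, feature_importance_name: str, n_top: int = 10) -> str:
        # A real plotter would build and return a matplotlib Figure here;
        # this sketch returns a description string instead.
        ranking = self.pipeline_data["feature_importance"][feature_importance_name]
        return "violins for " + ", ".join(ranking[:n_top])


class FeatureImportanceFacadeSketch:
    """Minimal facade: validates the analysis name, then delegates to a plotter."""

    def __init__(self, pipeline_data: Dict[str, Any]) -> None:
        self.pipeline_data = pipeline_data

    def _require(self, name: str) -> None:
        if name not in self.pipeline_data["feature_importance"]:
            raise ValueError(f"Feature importance analysis '{name}' not found")

    def violins(self, feature_importance_name: str, n_top: int = 10) -> str:
        self._require(feature_importance_name)
        # The caller never touches the plotter class directly.
        return ViolinPlotterSketch(self.pipeline_data).plot(feature_importance_name, n_top)


# Hypothetical pipeline data: analysis name -> features ranked by importance.
data = {"feature_importance": {"tree_analysis": ["d1", "d2", "d3"]}}
facade = FeatureImportanceFacadeSketch(data)
print(facade.violins("tree_analysis", n_top=2))  # violins for d1, d2
```

Keeping plotter construction inside the facade means a notebook user only needs the analysis name, which matches how the examples above call the real facade.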

__init__(manager, pipeline_data) → None

Initialize feature importance facade.

Parameters

manager (PlotsManager): Plots manager instance
pipeline_data (PipelineData): Pipeline data container

Returns

None: Initializes FeatureImportanceFacade instance

violins(feature_importance_name: str, n_top: int = 10, contact_transformation: bool = True, max_cols: int = 4, long_labels: bool = False, contact_threshold: float | None = 4.5, title: str | None = None, legend_title: str | None = None, legend_labels: Dict[str, str] | None = None, save_fig: bool = False, filename: str | None = None, file_format: str = 'png', dpi: int = 300, title_fontsize: int | None = None, subplot_title_fontsize: int | None = None, ylabel_fontsize: int | None = None, tick_fontsize: int | None = None, legend_fontsize: int | None = None, legend_title_fontsize: int | None = None) → Figure

Create violin plots from feature importance analysis.

Visualizes the distributions of feature values, with a separate violin for each DataSelector group drawn in cluster-consistent colors.

Parameters

feature_importance_name (str): Name of the feature importance analysis
n_top (int, default=10): Number of top features per comparison
contact_transformation (bool, default=True): If True, automatically convert contact features to distances. If False, plot contacts as binary values with Gaussian smoothing.
max_cols (int, default=4): Maximum number of columns in the grid layout. Each (Feature, DataSelector) combination gets its own subplot arranged in a grid.
long_labels (bool, default=False): If True, use long descriptive labels for discrete features (e.g., “Contact”/“Non-Contact”, “Alpha helix”/“Loop”). If False, use short labels (e.g., “C”/“NC”, “H”/“C”). Subplot spacing is adjusted automatically when True to prevent overlap.
contact_threshold (float or None, default=4.5): Distance threshold in Angstrom for drawing a contact threshold line on distance features. If set, draws a red dashed horizontal line at this distance; if None, no line is drawn. The default of 4.5 Å is a common contact cutoff.
title (str, optional): Custom plot title. Auto-generated if None.
legend_title (str, optional): Custom title for the legend.
legend_labels (Dict[str, str], optional): Custom labels for the legend.
save_fig (bool, default=False): Save figure to file
filename (str, optional): Custom filename. Auto-generated if None.
file_format (str, default=“png”): File format for saving (png, pdf, svg, etc.)
dpi (int, default=300): Resolution for saved figure in dots per inch
title_fontsize (int, optional): Font size for the main title.
subplot_title_fontsize (int, optional): Font size for the subplot titles.
ylabel_fontsize (int, optional): Font size for the y-axis labels.
tick_fontsize (int, optional): Font size for the tick labels.
legend_fontsize (int, optional): Font size for the legend entries.
legend_title_fontsize (int, optional): Font size for the legend title.

Returns

matplotlib.figure.Figure: Figure object containing violin plots

Raises

ValueError: If parameters are invalid or required parameters are missing for the chosen mode

Examples

>>> # Basic violin plot
>>> fig = facade.violins(
...     feature_importance_name="tree_analysis",
...     n_top=10
... )

>>> # With long descriptive labels for discrete features
>>> fig = facade.violins(
...     feature_importance_name="tree_analysis",
...     n_top=10,
...     long_labels=True
... )

>>> # Save to file
>>> fig = facade.violins(
...     feature_importance_name="tree_analysis",
...     n_top=10,
...     save_fig=True,
...     filename="important_features.pdf",
...     file_format="pdf"
... )

Notes

Each (Feature, DataSelector) combination gets its own subplot
contact_transformation=True: Converts boolean contacts to distances
contact_transformation=False: Visualizes binary contacts with Gaussian smoothing (tall+wide peaks for dominant states, short+narrow for rare states)
Uses DataSelector-based color mapping for cluster consistency
Y-axis shows feature values with units (Distance, Angle, etc.)
Each violin is centered in its subplot showing the full distribution
Grid layout controlled by max_cols parameter (default: 4 columns)
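One way to picture the grid described in these notes is the sketch below, which assigns each (Feature, DataSelector) pair its own cell in row-major order, capped at max_cols columns. The function name violin_grid and the feature-major ordering of pairs are assumptions for illustration, not the library's layout code.

```python
import math
from itertools import product
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def violin_grid(features: List[str], selectors: List[str],
                max_cols: int = 4) -> Tuple[int, int, Dict[Tuple[str, str], Cell]]:
    """Place every (feature, selector) pair in its own (row, col) grid cell."""
    if max_cols < 1:
        raise ValueError("max_cols must be >= 1")
    # Assumption: pairs are ordered feature-major (all selectors of one feature together).
    pairs = list(product(features, selectors))
    if not pairs:
        return 0, 0, {}
    n_cols = min(max_cols, len(pairs))
    n_rows = math.ceil(len(pairs) / n_cols)
    # divmod(index, n_cols) fills the grid row by row.
    positions = {pair: divmod(i, n_cols) for i, pair in enumerate(pairs)}
    return n_rows, n_cols, positions


rows, cols, cells = violin_grid(["d1", "d2", "d3"], ["cluster_0", "cluster_1"])
print(rows, cols, cells[("d3", "cluster_1")])  # 2 4 (1, 1)
```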

densities(feature_importance_name: str, n_top: int = 10, contact_transformation: bool = True, max_cols: int = 4, long_labels: bool = False, kde_bandwidth: str | float = 'scott', base_sigma: float = 0.05, max_sigma: float = 0.12, alpha: float = 0.3, line_width: float = 2.0, contact_threshold: float | None = 4.5, title: str | None = None, legend_title: str | None = None, legend_labels: Dict[str, str] | None = None, save_fig: bool = False, filename: str | None = None, file_format: str = 'png', dpi: int = 300, title_fontsize: int | None = None, subplot_title_fontsize: int | None = None, xlabel_fontsize: int | None = None, ylabel_fontsize: int | None = None, tick_fontsize: int | None = None, legend_fontsize: int | None = None, legend_title_fontsize: int | None = None, fill: bool = True, discrete_plot_mode: str = 'density', colors: str | Dict[str, str] | None = None, vertical_markers: Dict[int | str, float | List[float]] | None = None, vertical_marker_labels: str | Dict[int | str, str] | None = None, vertical_marker_label_colors: str | Dict[str, str] | None = None) → Figure

Create density plots from feature importance analysis.

Visualizes feature distributions as overlaid density curves in a grid layout. Each feature gets one grid cell containing one curve per DataSelector group, drawn in cluster-consistent colors.

Parameters

feature_importance_name (str)

Name of feature importance analysis

n_top (int, default=10)

Number of top features per comparison

contact_transformation (bool, default=True)

If True, convert boolean contact features (0/1) to continuous distances for smoother visualization. If False, plot contacts using Gaussian smoothing with height-dependent widths.

max_cols (int, default=4)

Maximum number of columns in the grid layout. The actual layout may use fewer columns to keep the overall shape roughly square.

long_labels (bool, default=False)

If True, use long descriptive labels for discrete features (e.g., “Contact”/“Non-Contact”, “Alpha helix”/“Loop”). If False, use short labels (e.g., “C”/“NC”, “H”/“C”).

kde_bandwidth (str or float, default=“scott”)

KDE bandwidth for continuous features:

“scott”: Scott’s rule (automatic bandwidth selection)
“silverman”: Silverman’s rule
float: Manual bandwidth value

base_sigma (float, default=0.05)

Minimum Gaussian width for binary contact features (narrowest peak)

max_sigma (float, default=0.12)

Maximum Gaussian width for binary contact features (widest peak)

alpha (float, default=0.3)

Transparency for filled density curves (0=transparent, 1=opaque)

line_width (float, default=2.0)

Width of density curve contour lines

fill (bool, default=True)

If True, draw filled density areas in addition to contour lines. If False, draw contour lines only.

discrete_plot_mode (str, default=“density”)

Rendering mode for discrete features:

“density”: Gaussian-smoothed discrete distributions
“bar”: grouped probability bars (recommended for discrete features)

colors (str or Dict[str, str], optional)

Color configuration for DataSelectors:

str: matplotlib colormap name (e.g., “tab10”)
dict: explicit DataSelector -> color mapping
None: automatic cluster-consistent DataSelector mapping (cluster_* names keep their cluster color)

vertical_markers (Dict[int or str, float or List[float]], optional)

Optional vertical guide markers keyed by DataSelector. Each value is a single x-position or a list of x-positions at which a vertical line is drawn.

vertical_marker_labels (str or dict, optional)

Optional labels for marker legend entries. Use one shared label string or dict[key] = label.

vertical_marker_label_colors (str or dict, optional)

Optional legend color override for marker labels: one shared color or dict[label] = color.

contact_threshold (float or None, default=4.5)

Distance threshold in Angstrom for drawing a contact threshold line on distance features. If None, no line is drawn.

title (str, optional)

Custom plot title. Auto-generated if None.

legend_title (str, optional)

Custom title for DataSelector legend. If None, uses “DataSelectors”.

legend_labels (Dict[str, str], optional)

Custom labels for DataSelectors in the legend. Maps original names to display names. Example: {"cluster_0": "Inactive", "cluster_1": "Active"}

save_fig (bool, default=False)

Save figure to file

filename (str, optional)

Custom filename. Auto-generated if None.

file_format (str, default=“png”)

File format for saving (png, pdf, svg, etc.)

dpi (int, default=300)

Resolution for saved figure in dots per inch

title_fontsize (int, optional)

Font size for the main title.

subplot_title_fontsize (int, optional)

Font size for the subplot titles.

xlabel_fontsize (int, optional)

Font size for the x-axis labels.

ylabel_fontsize (int, optional)

Font size for the y-axis labels.

tick_fontsize (int, optional)

Font size for the tick labels.

legend_fontsize (int, optional)

Font size for the legend entries.

legend_title_fontsize (int, optional)

Font size for the legend title.

Returns

matplotlib.figure.Figure: Figure object containing density plots in grid layout

Raises

ValueError: If parameters are invalid or required parameters are missing for the chosen mode

Examples

>>> # Basic density plot
>>> fig = facade.densities(
...     feature_importance_name="tree_analysis",
...     n_top=10
... )

>>> # Plot binary contacts without distance transformation
>>> fig = facade.densities(
...     feature_importance_name="tree_analysis",
...     n_top=10,
...     contact_transformation=False,
...     base_sigma=0.04,
...     max_sigma=0.15
... )

>>> # Custom KDE bandwidth for continuous features
>>> fig = facade.densities(
...     feature_importance_name="tree_analysis",
...     n_top=10,
...     kde_bandwidth=0.5
... )

>>> # Save to file with custom layout
>>> fig = facade.densities(
...     feature_importance_name="tree_analysis",
...     n_top=15,
...     max_cols=5,
...     save_fig=True,
...     filename="density_plots.pdf",
...     file_format="pdf"
... )

Notes

Binary Contact Features:

When contact_transformation=True: Converts to distances (default)
When contact_transformation=False: Uses Gaussian smoothing where:
- Dominant states (high probability) → tall AND wide peaks
- Rare states (low probability) → short AND narrow peaks
- This prevents visual overlap when multiple DataSelectors are plotted
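The height-dependent widths can be sketched as follows. This assumes each peak is an unnormalized Gaussian whose height equals the state probability and whose width is interpolated linearly between base_sigma and max_sigma; the actual mdxplain formula may differ, and binary_state_curve is a hypothetical name.

```python
import math
from typing import List, Sequence


def binary_state_curve(values: Sequence[int], xs: Sequence[float],
                       base_sigma: float = 0.05, max_sigma: float = 0.12) -> List[float]:
    """Gaussian-smoothed curve for a 0/1 feature with probability-dependent peaks."""
    if not values:
        raise ValueError("values must not be empty")
    curve = [0.0] * len(xs)
    for state in (0, 1):
        p = sum(1 for v in values if v == state) / len(values)
        if p == 0.0:
            continue
        # Assumption: width interpolates linearly from base_sigma (p=0) to max_sigma (p=1).
        sigma = base_sigma + p * (max_sigma - base_sigma)
        for i, x in enumerate(xs):
            # Unnormalized Gaussian whose peak height equals p: dominant states are
            # both taller and wider, rare states both shorter and narrower.
            curve[i] += p * math.exp(-0.5 * ((x - state) / sigma) ** 2)
    return curve


contacts = [1] * 8 + [0] * 2  # contact formed in 80% of frames
print([round(y, 3) for y in binary_state_curve(contacts, [0.0, 0.5, 1.0])])  # [0.2, 0.0, 0.8]
```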

Continuous Features:

Uses standard Kernel Density Estimation (KDE)
Automatic bandwidth selection via Scott’s or Silverman’s rule
Manual bandwidth control available via kde_bandwidth parameter
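For continuous features, the two bandwidth rules can be written out for one-dimensional data. The sketch uses the 1-D forms of the factors as defined for scipy.stats.gaussian_kde (Scott: n^(-1/5); Silverman: (3n/4)^(-1/5)), each multiplied by the sample standard deviation. Treating a float as an absolute bandwidth is an assumption: scipy.stats.gaussian_kde uses a scalar bw_method as a scaling factor instead, so which convention mdxplain follows should be checked.

```python
import math
import statistics
from typing import List, Sequence, Union


def kde_bandwidth(samples: Sequence[float], rule: Union[str, float] = "scott") -> float:
    """Bandwidth h for a 1-D Gaussian KDE."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    if not isinstance(rule, str):
        # Assumption: a number is taken as the absolute bandwidth.
        return float(rule)
    std = statistics.stdev(samples)  # sample standard deviation (n - 1 denominator)
    if std == 0:
        raise ValueError("samples have zero variance")
    if rule == "scott":
        factor = n ** (-1 / 5)
    elif rule == "silverman":
        factor = (n * 3 / 4) ** (-1 / 5)
    else:
        raise ValueError(f"unknown bandwidth rule: {rule!r}")
    return factor * std


def gaussian_kde_1d(samples: Sequence[float], xs: Sequence[float],
                    rule: Union[str, float] = "scott") -> List[float]:
    """Evaluate the KDE: the average of Gaussian kernels centred on each sample."""
    h = kde_bandwidth(samples, rule)
    norm = 1.0 / (len(samples) * h * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) for x in xs]
```

In one dimension Silverman's factor is slightly larger than Scott's, so it produces a somewhat smoother curve for the same data.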

Grid Layout:

Features grouped by type where possible
max_cols controls maximum columns (default: 4)
Layout algorithm maintains roughly square overall shape
Each grid cell shows one feature with overlaid curves
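A simple rule that keeps the grid roughly square is to aim for about √n columns and cap them at max_cols. The sketch below implements that heuristic; it is an assumed stand-in for the layout algorithm, which may also group features by type.

```python
import math
from typing import Tuple


def square_grid(n_panels: int, max_cols: int = 4) -> Tuple[int, int]:
    """Return (rows, cols) for n_panels cells, keeping the grid close to square."""
    if n_panels < 1 or max_cols < 1:
        raise ValueError("n_panels and max_cols must be >= 1")
    # Assumption: roughly sqrt(n) columns, never more than max_cols.
    n_cols = min(max_cols, math.ceil(math.sqrt(n_panels)))
    n_rows = math.ceil(n_panels / n_cols)
    return n_rows, n_cols


print(square_grid(9), square_grid(20))  # (3, 3) (5, 4)
```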

Color Mapping:

Uses DataSelector-based colors for cluster consistency
Same colors across all plots in pipeline for same DataSelectors
Filled curves with transparency (alpha) + solid contour lines
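Cluster-consistent coloring can be sketched by pinning each cluster_<k> selector to palette slot k and giving every other selector the first unused palette color. The regular expression, palette, and fallback order are assumptions for illustration; the overrides argument plays the role of an explicit colors dict.

```python
import re
from typing import Dict, Optional, Sequence

_CLUSTER_NAME = re.compile(r"^cluster_(\d+)$")


def selector_colors(selectors: Sequence[str], palette: Sequence[str],
                    overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Map DataSelector names to colors, pinning cluster_<k> to palette slot k."""
    overrides = overrides or {}
    colors: Dict[str, str] = {}
    for name in selectors:
        match = _CLUSTER_NAME.match(name)
        if name in overrides:
            colors[name] = overrides[name]
        elif match:
            # Same cluster index -> same color in every plot of the pipeline.
            colors[name] = palette[int(match.group(1)) % len(palette)]
    # Remaining selectors take palette colors not already in use.
    free = [c for c in palette if c not in colors.values()]
    rest = [n for n in selectors if n not in colors]
    for i, name in enumerate(rest):
        colors[name] = free[i % len(free)] if free else palette[i % len(palette)]
    return colors


palette = ["tab:blue", "tab:orange", "tab:green", "tab:red"]
print(selector_colors(["cluster_2", "folded", "cluster_0"], palette))
```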

time_series(feature_importance_name: str, n_top: int = 5, traj_selection: int | str | List | all = 'all', use_time: bool = True, tags_for_coloring: List[str] | None = None, allow_multi_tag_plotting: bool = False, clustering_name: str | None = None, membership_per_feature: bool = False, membership_traj_selection: str | int | List = 'all', contact_transformation: bool = True, max_cols: int = 2, long_labels: bool = False, subplot_height: float = 2.5, membership_bar_height: float | None = None, show_legend: bool = True, contact_threshold: float | None = 4.5, title: str | None = None, save_fig: bool = False, filename: str | None = None, file_format: str = 'png', dpi: int = 300, smoothing: bool = True, smoothing_method: str = 'savitzky', smoothing_window: int = 51, smoothing_polyorder: int = 3, show_unsmoothed_background: bool = True, title_fontsize: int | None = None, subplot_title_fontsize: int | None = None, xlabel_fontsize: int | None = None, ylabel_fontsize: int | None = None, tick_fontsize: int | None = None, legend_fontsize: int | None = None, legend_title_fontsize: int | None = None, discrete_plot_style: str = 'step', discrete_layout: str = 'auto', discrete_offset_span: float = 0.28, discrete_auto_offset_threshold: int = 15, thickness: float = 1.0, colors: str | Dict[str, str] | None = None, vertical_markers: Dict[int | str, float | List[float]] | None = None, vertical_marker_labels: str | Dict[int | str, str] | None = None, vertical_marker_label_colors: str | Dict[str, str] | None = None, vertical_marker_mode: str = 'auto') → Figure

Create time series plots from feature importance analysis.

Visualizes temporal evolution of important features as line plots with one subplot per feature. Each trajectory is shown as a separate line, optionally colored by trajectory number or tags. Can include cluster membership visualization as colored bars below plots.

Parameters

feature_importance_name (str)

Name of feature importance analysis

n_top (int, default=5)

Number of top features per sub-comparison. The union of these features across all sub-comparisons determines which features are plotted.

traj_selection (int, str, list, or “all”, default=“all”)

Trajectories to plot. Can be indices, names, tags, or “all”.

use_time (bool, default=True)

If True, use Time (ns) for x-axis. If False, use frame numbers.

tags_for_coloring (list of str, optional)

Tags to use for trajectory coloring. If set, automatically enables tag-based coloring. Trajectories grouped by shared tags from this list.

allow_multi_tag_plotting (bool, default=False)

How to handle trajectories with multiple matching tags:

False: Exclude trajectories with multiple tags
True: Plot such trajectories multiple times (once per tag)

clustering_name (str, optional)

Name of clustering analysis for membership visualization. If None, no membership bars shown.

membership_per_feature (bool, default=False)

If True, show membership bar below each feature subplot. If False, show single membership bar at bottom of figure.

membership_traj_selection (int, str, list, or “all”, default=“all”)

Trajectories to include in membership visualization. Can differ from main traj_selection.

contact_transformation (bool, default=True)

If True, automatically convert contact features to distances.

max_cols (int, default=2)

Maximum number of columns in grid layout. Each feature gets one grid cell arranged in rows and columns.

long_labels (bool, default=False)

If True, use long descriptive labels for discrete features (e.g., “Contact”/“Non-Contact”). If False, use short labels (e.g., “C”/“NC”). Applies to binary/discrete features only.

subplot_height (float, default=2.5)

Height per feature subplot in inches

membership_bar_height (float, optional)

Height per trajectory in the membership bar, in inches. If None, defaults to 0.25 when membership_per_feature=True and 0.5 otherwise.

show_legend (bool, default=True)

Show legend for trajectory/tag colors

contact_threshold (float or None, default=4.5)

Distance threshold in Angstrom for drawing a contact threshold line on distance features. If set, draws a red dashed horizontal line at this distance; if None, no line is drawn.

title (str, optional)

Custom plot title. Auto-generated if None.

save_fig (bool, default=False)

Save figure to file

filename (str, optional)

Custom filename. Auto-generated if None.

file_format (str, default=“png”)

File format for saving (png, pdf, svg, etc.)

dpi (int, default=300)

Resolution for saved figure in dots per inch

smoothing (bool, default=True)

Enable or disable data smoothing for continuous features. Discrete features are always plotted without smoothing.

smoothing_method (str, default=“savitzky”)

Smoothing method (“moving_average” or “savitzky”)

smoothing_window (int, default=51)

Window size for smoothing in frames

smoothing_polyorder (int, default=3)

Polynomial order for Savitzky-Golay filter (ignored for moving_average)

show_unsmoothed_background (bool, default=True)

Show unsmoothed data as transparent background line when smoothing is enabled

discrete_plot_style (str, default=“step”)

Rendering style for discrete features: “line”, “step”, “segments”, or “scatter”.

discrete_layout (str, default=“auto”)

Discrete rendering layout mode: “auto”, “overlay”, “offset”, or “occupancy”. In “occupancy”, discrete lines represent states as probabilities over time instead of individual trajectories.

discrete_offset_span (float, default=0.28)

Vertical half-span for the discrete “offset” layout.

discrete_auto_offset_threshold (int, default=15)

Number of discrete traces at which “auto” switches to “offset”.

thickness (float, default=1.0)

Global rendering thickness for all feature traces: marker size factor for “scatter” and line width for line-based styles.

colors (str or Dict[str, str], optional)

Color configuration for trajectories/tags:

str: matplotlib colormap name (e.g., “tab20”)
dict: explicit mapping (trajectory_name -> color or tag -> color)
None: automatic palette assignment. Uses tag colors if tag coloring is active, otherwise trajectory colors.

vertical_markers (Dict[int or str, float or List[float]], optional)

Optional vertical guide markers. Keys are trajectory selectors or tag names (depending on vertical_marker_mode), values are x-positions where colored vertical lines are drawn.

vertical_marker_labels (str or dict, optional)

Optional legend labels for marker lines. Use one shared label string or dict[key] = label.

vertical_marker_label_colors (str or dict, optional)

Optional legend color override for marker labels: one shared color or dict[label] = color.

vertical_marker_mode (str, default=“auto”)

Marker key interpretation mode: “auto”, “trajectory”, or “tag”. In “auto”, tag mode is used when tag coloring is active. In “trajectory” mode with tag coloring enabled, the first matching tag color per trajectory is used.

title_fontsize (int, optional)

Font size for main title.

subplot_title_fontsize (int, optional)

Font size for subplot titles.

xlabel_fontsize (int, optional)

Font size for x-axis labels.

ylabel_fontsize (int, optional)

Font size for y-axis labels.

tick_fontsize (int, optional)

Font size for tick labels.

legend_fontsize (int, optional)

Font size for legend entries.

legend_title_fontsize (int, optional)

Font size for legend title.
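The moving_average smoothing option can be sketched as a centered window that is truncated at the trajectory edges. Shrinking the window to an odd size that fits short trajectories is an assumption. For the savitzky option, scipy.signal.savgol_filter(x, window_length, polyorder) implements the Savitzky-Golay filter, though whether mdxplain calls it is not stated here.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int = 51) -> List[float]:
    """Centered moving average over `window` frames, truncated at the edges."""
    n = len(values)
    if n == 0:
        return []
    # Assumption: clamp the window to the trajectory length and make it odd.
    window = max(1, min(window, n))
    if window % 2 == 0:
        window -= 1
    half = window // 2
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


print(moving_average([0, 0, 3, 0, 0], window=3))  # [0.0, 1.0, 1.0, 1.0, 0.0]
```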

Returns

matplotlib.figure.Figure: Figure object containing time series plots

Raises

ValueError: If parameters are invalid or no trajectories remain after filtering

Examples

>>> # Basic time series plot with top 5 features
>>> fig = facade.time_series(
...     feature_importance_name="tree_analysis",
...     n_top=5
... )

>>> # Color by trajectory tags (setting tags_for_coloring enables tag-based coloring)
>>> fig = facade.time_series(
...     feature_importance_name="tree_analysis",
...     n_top=5,
...     tags_for_coloring=["system_A", "system_B"]
... )

>>> # Add cluster membership visualization
>>> fig = facade.time_series(
...     feature_importance_name="tree_analysis",
...     n_top=5,
...     clustering_name="dbscan_clustering",
...     membership_per_feature=True
... )

>>> # Plot specific trajectories with frame numbers
>>> fig = facade.time_series(
...     feature_importance_name="tree_analysis",
...     n_top=5,
...     traj_selection=[0, 1, 2],
...     use_time=False
... )

>>> # Custom layout with long labels
>>> fig = facade.time_series(
...     feature_importance_name="tree_analysis",
...     n_top=10,
...     max_cols=3,
...     long_labels=True,
...     subplot_height=3.0
... )

>>> # Save high-resolution PDF
>>> fig = facade.time_series(
...     feature_importance_name="tree_analysis",
...     n_top=10,
...     save_fig=True,
...     filename="feature_timeseries.pdf",
...     file_format="pdf",
...     dpi=600
... )
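The rule stated for n_top, taking the top features of every sub-comparison and plotting their union, can be sketched like this. The score dictionary layout and the first-seen ordering of the result are assumptions.

```python
from typing import Dict, List


def top_feature_union(scores: Dict[str, Dict[str, float]], n_top: int = 5) -> List[str]:
    """Union of the n_top highest-scoring features over all sub-comparisons."""
    selected: List[str] = []
    for feature_scores in scores.values():
        ranked = sorted(feature_scores, key=lambda f: feature_scores[f], reverse=True)
        for feature in ranked[:n_top]:
            if feature not in selected:  # keep first-seen order, skip duplicates
                selected.append(feature)
    return selected


# Hypothetical importance scores per sub-comparison.
scores = {
    "cluster_0_vs_rest": {"d1": 0.5, "d2": 0.3, "d3": 0.2},
    "cluster_1_vs_rest": {"d3": 0.6, "d1": 0.3, "d4": 0.1},
}
print(top_feature_union(scores, n_top=2))  # ['d1', 'd2', 'd3']
```

Because of the union, the number of plotted features can exceed n_top when sub-comparisons disagree on their most important features.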

Notes

Each feature gets its own subplot with all trajectories overlaid
X-axis shows either Time (ns) or frame numbers
Contact features automatically converted to distances (default)
Cluster membership shown as colored horizontal bars
Memory-efficient rendering using block optimization
Supports trajectory filtering by index, name, or tags
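Trajectory filtering by index, name, or tag can be sketched as a small resolver. The precedence used here (integers are indices, known names are matched before tags) and the helper name resolve_trajectories are assumptions; the empty-result error mirrors the ValueError documented for this method.

```python
from typing import Dict, List, Sequence, Union

Selection = Union[int, str, Sequence[Union[int, str]]]


def resolve_trajectories(selection: Selection, names: Sequence[str],
                         tags: Dict[str, Sequence[str]]) -> List[int]:
    """Turn an index, name, tag, list of those, or "all" into trajectory indices."""
    if selection == "all":
        return list(range(len(names)))
    items = [selection] if isinstance(selection, (int, str)) else list(selection)
    result: List[int] = []
    for item in items:
        if isinstance(item, int):
            matches = [item]
        elif item in names:
            matches = [names.index(item)]
        else:
            # Assumption: a string that is not a trajectory name is treated as a tag.
            matches = [i for i, n in enumerate(names) if item in tags.get(n, ())]
        for idx in matches:
            if not 0 <= idx < len(names):
                raise ValueError(f"trajectory index out of range: {idx}")
            if idx not in result:
                result.append(idx)
    if not result:
        raise ValueError("no trajectories remain after filtering")
    return result


names = ["wt_1", "wt_2", "mut_1"]
tags = {"wt_1": ["system_A"], "wt_2": ["system_A"], "mut_1": ["system_B"]}
print(resolve_trajectories("system_A", names, tags))  # [0, 1]
```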

decision_trees(feature_importance_name: str, max_depth_display: int | None = None, max_cols: int = 2, subplot_width: float = 10.0, subplot_height: float = 8.0, title: str | None = None, save_fig: bool = False, filename: str | None = None, file_format: str = 'png', dpi: int = 300, render: bool = True, separate_trees: bool | str = 'auto', width_scale_factor: float = 1.0, height_scale_factor: float = 1.0, short_labels: bool | None = None, short_naming: bool | None = None, short_layout: bool = False, short_edge_labels: bool | None = None, wrap_length: int = 40, hide_node_frames: bool | None = None, show_edge_symbols: bool | None = None, hide_feature_type_prefix: bool | None = None, hide_path: bool | None = None, edge_symbol_fontsize: int | None = None) → Figure | List[str] | None

Create decision tree visualizations from feature importance analysis.

Plots the trained decision tree models from a feature importance analysis, one tree per sub-comparison, either in a grid layout or as separate plots (see separate_trees). Only works with the decision_tree analyzer type.

Parameters

feature_importance_name (str)

Name of feature importance analysis (must use decision_tree analyzer)

max_depth_display (int, optional)

Maximum tree depth to display for clarity. None shows full tree. Useful for limiting visualization of very deep trees.

max_cols (int, default=2)

Maximum number of columns in grid layout

subplot_width (float, default=10.0)

Width of each tree subplot in inches

subplot_height (float, default=8.0)

Height of each tree subplot in inches

title (str, optional)

Custom plot title. Auto-generated if None.

save_fig (bool or “auto”, default=False)

Whether to save the figure or trees to file(s):

“auto”: True if render=False (so that some output is produced), else False
True: Always save
False: Never save (requires render=True)

filename (str, optional)

Custom filename for grid mode. Auto-generated if None.

file_format (str, default=“png”)

File format for saving (png, pdf, svg, etc.)

dpi (int, default=300)

Resolution for saved figure(s) in dots per inch

render (bool or “auto”, default=True)

Whether to display the output in Jupyter:

“auto”: False if the grid figure would exceed 50 inches, True for separate trees
True: Always display
False: Never display (requires save_fig=True)

separate_trees (bool or “auto”, default=“auto”)

Tree layout mode:

“auto”: True if depth > 5 OR comparisons > 4
True: Each tree as separate plot (prevents RAM issues)
False: Grid layout (all trees in one figure)

width_scale_factor (float, default=1.0)

Multiplicative factor for figure width (use >1.0 for wider boxes)

height_scale_factor (float, default=1.0)

Multiplicative factor for figure height (use >1.0 for taller boxes)

short_labels (bool or None, default=None)

Use short discrete labels (NC vs Non-Contact) for feature values. If None, determined by short_layout.

short_naming (bool or None, default=None)

Truncate class/selector names to 16 chars with […] pattern. If None, determined by short_layout.

short_layout (bool, default=False)

Minimal tree layout + enables all short options (if not explicitly set)

short_edge_labels (bool or None, default=None)

Show only values/conditions on edges (e.g., ‘Contact’ or ‘≤ 3.50 Å’) instead of full format ‘contact: Leu13-ARG31 = Contact’. If None, determined by short_layout.

wrap_length (int, default=40)

Maximum line length for text wrapping in node labels, class lines, feature lines, and edge labels. Longer text is wrapped at natural break points such as spaces, colons, and equals signs.

hide_node_frames (bool or None, default=None)

Hide frame counts in non-root nodes, showing only percentages (e.g., ‘State_A: 53.3%’ instead of ‘State_A: 80 / 150 (53.3%)’). If None, determined by short_layout.

show_edge_symbols (bool or None, default=None)

Show only symbols on edges (✓ for left/true, ✗ for right/false) instead of text labels. If None, determined by short_layout.

hide_feature_type_prefix (bool or None, default=None)

Hide feature type prefix in labels (e.g., ‘ALA_5-GLU_10’ instead of ‘contacts: ALA_5-GLU_10’). If None, determined by short_layout.

hide_path (bool or None, default=None)

Hide decision path in non-root nodes (shown at top of each node). If None, set to True when short_layout=True.

edge_symbol_fontsize (int, optional)

Font size for edge symbols when show_edge_symbols=True. If None, uses the default edge label font size.
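The “auto” rules for separate_trees, render, and save_fig can be combined into one resolver. This is a sketch under assumptions: resolve_tree_output is a hypothetical helper, grid_size_in stands in for whichever figure dimension is compared against the 50-inch limit, and the defaults follow the method signature (render=True, save_fig=False).

```python
from typing import Tuple, Union

Flag = Union[bool, str]


def resolve_tree_output(max_depth: int, n_comparisons: int, grid_size_in: float,
                        separate_trees: Flag = "auto", render: Flag = True,
                        save_fig: Flag = False) -> Tuple[bool, bool, bool]:
    """Resolve "auto" flags into concrete (separate_trees, render, save_fig)."""
    if separate_trees == "auto":
        separate_trees = max_depth > 5 or n_comparisons > 4
    if render == "auto":
        # Separate trees are shown; a combined grid larger than 50 inches is not.
        render = bool(separate_trees) or grid_size_in <= 50
    if save_fig == "auto":
        save_fig = not render  # save whenever nothing is displayed
    if not render and not save_fig:
        raise ValueError("render and save_fig are both False: no output method")
    return bool(separate_trees), bool(render), bool(save_fig)


print(resolve_tree_output(3, 4, 64.0, render="auto", save_fig="auto"))  # (False, False, True)
```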

Returns

matplotlib.figure.Figure, List[str], or None

Figure: Grid mode with render=True
List[str]: Separate trees with save_fig=True (filenames)
None: render=False or separate trees without saving

Raises

ValueError: If feature_importance_name not found, analyzer type is not “decision_tree”, models not available in metadata, or both render and save_fig are False (no output method)

Examples

>>> # Basic decision tree visualization
>>> fig = facade.decision_trees(
...     feature_importance_name="tree_analysis"
... )

>>> # Limit tree depth for clarity
>>> fig = facade.decision_trees(
...     feature_importance_name="tree_analysis",
...     max_depth_display=3
... )

>>> # Custom layout with larger subplots
>>> fig = facade.decision_trees(
...     feature_importance_name="tree_analysis",
...     max_cols=3,
...     subplot_width=12.0,
...     subplot_height=10.0
... )

>>> # Save as PDF
>>> fig = facade.decision_trees(
...     feature_importance_name="tree_analysis",
...     save_fig=True,
...     filename="decision_trees.pdf",
...     file_format="pdf"
... )

Notes

Red-highlighted node shows the split with maximum discriminative score
Edge labels are feature-type-specific (e.g., “Formed”/“Broken” for contacts)
Node sizes automatically adjusted to prevent overlap
Only available for decision_tree analyzer type