Welcome to mdxplain

mdxplain is a Python toolkit for interpretable analysis of molecular dynamics trajectories. It processes large datasets memory-efficiently and uses machine learning models, in particular decision trees, to identify key conformational features and deliver results that are easy to understand. Its modular workflows streamline complex analysis pipelines.

The source code is available on GitHub at https://github.com/maximilian-salomon/mdxplain.

[Figure: mdxplain graphical abstract]

Quick Start Installation

mdxplain uses a Makefile for streamlined installation and development workflows. For detailed instructions, please refer to the How to Install mdxplain guide.

git clone https://github.com/maximilian-salomon/mdxplain.git
cd mdxplain

Python Virtual Environment (Alternative)

Create a new virtual environment with production dependencies, Jupyter, and nglview:

make setup-venv
source mdxplain-venv/bin/activate

Install in Existing Environment

Install the mdxplain core package into your currently active environment:

# Without Jupyter notebooks and nglview
make install

# With Jupyter notebooks and nglview
make install-jupyter

Optional: PyMOL Installation

PyMOL is not included in the standard installation to avoid compatibility issues due to its complex system-level dependencies. If you need PyMOL for 3D structure visualization, install it separately:

make install-pymol

Note: For system-specific installation instructions, please refer to the official PyMOL documentation. Alternatively, you can install PyMOL independently and load mdxplain’s generated PyMOL scripts manually.
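After installation, it can be useful to confirm which packages are visible to the active Python interpreter. The following is a minimal sketch using only the standard library; the helper name check_modules is hypothetical, and the module names nglview and pymol are assumed to be the import names of the optional packages.

```python
from importlib.util import find_spec


def check_modules(names):
    """Return a dict mapping each top-level module name to True if it is importable."""
    return {name: find_spec(name) is not None for name in names}


# mdxplain is the core package; nglview and pymol are optional extras.
status = check_modules(["mdxplain", "nglview", "pymol"])
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

A missing entry for nglview or pymol is expected when only the core package was installed with make install.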

Workflow Example

This example demonstrates a complete analysis pipeline from trajectory loading to interpretable results.

Setup and Data Loading

from mdxplain import PipelineManager

# Initialize pipeline and load trajectory data
pipeline = PipelineManager(show_progress=False)
pipeline.trajectory.load_trajectories(data_input="data/2RJY")
pipeline.trajectory.add_labels(traj_selection="all")

Feature Extraction and Selection

Extract molecular features and create a custom feature subset.

# Add distance and contact features
pipeline.feature.add.distances()
pipeline.feature.add.contacts(cutoff=4.5)

# Create and configure feature selector
pipeline.feature_selector.create("contacts_only")
pipeline.feature_selector.add.contacts("contacts_only", "all")
pipeline.feature_selector.select("contacts_only")
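To illustrate what a contact feature with a cutoff represents, the sketch below converts pairwise distances into binary contacts in plain Python. It is not mdxplain's implementation: the function names are hypothetical, distances are assumed to be in ångström, and treating a distance equal to the cutoff as a contact is an assumption.

```python
def distances_to_contacts(distances, cutoff=4.5):
    """Convert per-frame pair distances into binary contacts (1 = in contact).

    Assumption: a pair counts as a contact when its distance is <= cutoff.
    """
    return [[1 if d <= cutoff else 0 for d in frame] for frame in distances]


def contact_frequency(contacts):
    """Fraction of frames in which each pair is in contact."""
    n_frames = len(contacts)
    return [sum(column) / n_frames for column in zip(*contacts)]


# Three frames (rows) and three residue pairs (columns), made-up values.
distances = [
    [3.8, 6.2, 4.5],
    [4.1, 5.9, 7.0],
    [9.3, 4.4, 4.6],
]
contacts = distances_to_contacts(distances, cutoff=4.5)
frequencies = contact_frequency(contacts)
print(contacts)
print(frequencies)
```

Binary contacts discard the exact distance, which makes later decision-tree splits easy to read: a pair is either formed or broken.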

Dimensionality Reduction and Clustering

Reduce the feature space and identify conformational states.

# Apply Contact Kernel PCA
pipeline.decomposition.add.contact_kernel_pca(
   n_components=4,
   selection_name="contacts_only",
   decomposition_name="ContactKernelPCA",
)

# Cluster using Density Peak Algorithm
pipeline.clustering.add.dpa(
   "ContactKernelPCA",
   Z=2.5,
   cluster_name="DPA_ContactKPC"
)
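The idea behind density-peak clustering can be sketched in a few lines. The example below is a simplified version of the classic density-peak scheme of Rodriguez and Laio, not the algorithm behind pipeline.clustering.add.dpa, and it does not model the Z parameter. The function name, the Gaussian density kernel, the bandwidth d_c, and the fixed number of clusters are all assumptions made for illustration.

```python
import math


def density_peak_clusters(points, d_c, n_clusters):
    """Simplified density-peak clustering; returns one cluster label per point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: Gaussian-weighted count of neighbours.
    rho = [
        sum(math.exp(-((dist[i][j] / d_c) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]
    order = sorted(range(n), key=lambda i: -rho[i])  # densest point first

    # delta: distance to the nearest point that ranks higher in density.
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # convention for the global density maximum
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Cluster centres combine high density with a large delta.
    centers = sorted(order, key=lambda i: -rho[i] * delta[i])[:n_clusters]
    labels = [-1] * n
    for cluster_id, center in enumerate(centers):
        labels[center] = cluster_id

    # Every other point inherits the label of its denser nearest neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


# Two well-separated groups in a made-up 2D projection.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (5.05, 5.05),
]
labels = density_peak_clusters(points, d_c=0.2, n_clusters=2)
print(labels)
```

In this sketch the number of clusters is fixed by hand; methods in the density-peak family typically decide which peaks are significant from the data instead.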

Feature Importance Analysis

Identify which molecular features distinguish conformational states.

# Setup comparison groups from clusters
pipeline.data_selector.create_from_clusters(
   group_name="cluster",
   clustering_name="DPA_ContactKPC"
)

# Create one-vs-rest comparison
pipeline.comparison.create_comparison(
   name="cluster_comparison",
   mode="one_vs_rest",
   feature_selector="contacts_only",
   data_selector_groups="cluster"
)

# Add decision tree for interpretability
pipeline.feature_importance.add.decision_tree(
   comparison_name="cluster_comparison",
   analysis_name="feature_importance"
)
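The combination of a one-vs-rest comparison and a decision tree can be illustrated with a single split. The sketch below builds binary labels for one cluster against all others and ranks two binary contact features by their Gini impurity decrease, which is the criterion commonly used by decision trees. It is a generic illustration, not mdxplain's implementation; all function names, feature names, and data are hypothetical.

```python
def one_vs_rest_labels(cluster_labels, target):
    """Label frames of the target cluster as 1 and all other frames as 0."""
    return [1 if c == target else 0 for c in cluster_labels]


def gini(labels):
    """Gini impurity of a list of binary labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def split_gain(feature, labels):
    """Impurity decrease obtained by splitting on a binary feature."""
    left = [y for x, y in zip(feature, labels) if x == 0]
    right = [y for x, y in zip(feature, labels) if x == 1]
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
    return gini(labels) - weighted


# Six frames assigned to three clusters, with two made-up contact features.
clusters = [0, 0, 0, 1, 1, 2]
contacts = {
    "pair_A": [1, 1, 1, 0, 0, 0],  # formed only in cluster 0
    "pair_B": [1, 0, 1, 0, 1, 0],  # weakly related to cluster 0
}

y = one_vs_rest_labels(clusters, target=0)
gains = {name: split_gain(column, y) for name, column in contacts.items()}
best = max(gains, key=gains.get)
print(gains, best)
```

Here pair_A separates cluster 0 from the rest perfectly, so a tree would split on it first. That single readable rule is what makes the result interpretable.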

Visualization

# Cluster membership visualization
fig = pipeline.plots.clustering.membership(clustering_name="DPA_ContactKPC")

# Free energy landscape
fig = pipeline.plots.landscape(
   decomposition_name="ContactKernelPCA",
   dimensions=[0, 1]
)

# Decision tree visualization
fig = pipeline.plots.feature_importance.decision_trees(
   feature_importance_name="feature_importance",
   short_layout=True
)

# Feature distribution comparison
fig = pipeline.plots.feature_importance.violins(
   feature_importance_name="feature_importance"
)

# Time-resolved feature importance
fig = pipeline.plots.feature_importance.time_series(
   feature_importance_name="feature_importance",
   membership_per_feature=True,
   clustering_name="DPA_ContactKPC"
)
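A free energy landscape is commonly estimated from how often each region of the reduced space is visited, using F = -k_B T ln(p). The sketch below applies this relation to a one-dimensional histogram; the landscape plot above works on two dimensions, and whether mdxplain uses exactly this estimator is an assumption. The function name, the bin width, and the temperature of 300 K are illustrative choices.

```python
import math
from collections import Counter

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol*K)


def free_energy_profile(values, bin_width, temperature=300.0):
    """Estimate relative free energies (kJ/mol) per histogram bin.

    Only visited bins are returned; the most populated bin is set to zero.
    """
    counts = Counter(math.floor(v / bin_width) for v in values)
    total = len(values)
    kt = K_B * temperature
    energy = {b: -kt * math.log(c / total) for b, c in counts.items()}
    f_min = min(energy.values())
    return {b: f - f_min for b, f in sorted(energy.items())}


# Made-up projections of seven frames onto one component.
projection = [0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 2.5]
profile = free_energy_profile(projection, bin_width=1.0)
for bin_index, f in profile.items():
    print(f"bin {bin_index}: {f:.2f} kJ/mol")
```

Bins that are visited half as often sit k_B T ln(2) higher in free energy, so rarely sampled regions appear as barriers or high-energy states.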

3D Visualization with PyMOL or NGLView

pipeline.structure_visualization.feature_importance.create_pdb_with_beta_factors(
   structure_viz_name="structure_viz",
   feature_importance_name="feature_importance",
)
# Create a PyMOL script, then run it with: pymol structure_viz.pml
pipeline.structure_visualization.feature_importance.create_pymol_script(
   structure_viz_name="structure_viz"
)
# Or use nglview in Jupyter notebooks
ui, view = pipeline.structure_visualization.feature_importance.visualize_nglview_jupyter(
   structure_viz_name="structure_viz",
)
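Storing per-residue scores in the B-factor field is a common way to color a structure by an arbitrary quantity. In the PDB format, the temperature factor occupies columns 61 to 66 of ATOM and HETATM records and the residue number occupies columns 23 to 26. The sketch below writes made-up scores into that field; it is not mdxplain's code, and the function names and the default score of 0.0 for unlisted residues are assumptions.

```python
def set_b_factor(pdb_line, value):
    """Return the record with its B-factor field (columns 61-66) replaced."""
    if not pdb_line.startswith(("ATOM", "HETATM")):
        return pdb_line
    padded = pdb_line.rstrip("\n").ljust(80)
    return padded[:60] + f"{value:6.2f}" + padded[66:]


def write_scores(pdb_lines, residue_scores):
    """Write one score per residue number into the B-factor field."""
    result = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            residue_number = int(line[22:26])  # columns 23-26
            line = set_b_factor(line, residue_scores.get(residue_number, 0.0))
        result.append(line)
    return result


# Two made-up C-alpha records and hypothetical importance scores.
pdb_lines = [
    "ATOM      1  CA  ALA A   1      11.104   6.134  -6.504  1.00  0.00",
    "ATOM      2  CA  GLY A   2      12.560   7.300  -5.900  1.00  0.00",
]
scored = write_scores(pdb_lines, {1: 0.87})
for line in scored:
    print(line.rstrip())
```

Viewers such as PyMOL and NGLView can then color residues by B-factor, so the most important residues stand out on the 3D structure.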