Feature Add Service

Factory for adding features with simplified syntax.

class mdxplain.feature.services.feature_add_service.FeatureAddService(manager: FeatureManager, pipeline_data: PipelineData)

Service for adding features without explicit type instantiation.

This service provides an intuitive interface for adding features to the pipeline without requiring users to import and instantiate feature types directly. All feature type parameters are combined with add_feature parameters.

Examples

>>> pipeline.feature.add.distances(excluded_neighbors=2)
>>> pipeline.feature.add.contacts(cutoff=5.0, traj_selection=[0,1,2])
>>> pipeline.feature.add.torsions(calculate_chi=False, force=True)
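
The idea of combining feature type parameters with add_feature parameters can be illustrated with a small stand-alone sketch. Every name below (DistancesSpec, RecordingManager, MiniAddService) is hypothetical and exists only for illustration; this is not the mdxplain implementation, just one way such a service could split its keyword arguments.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class DistancesSpec:
    """Hypothetical stand-in for a feature type object."""
    excluded_neighbors: int = 1
    use_pbc: bool = True


class RecordingManager:
    """Hypothetical stand-in for a feature manager; it only records calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[Any, Dict[str, Any]]] = []

    def add_feature(self, feature_type: Any, **options: Any) -> None:
        self.calls.append((feature_type, options))


class MiniAddService:
    """Builds the feature type internally and forwards the shared options."""

    def __init__(self, manager: RecordingManager) -> None:
        self._manager = manager

    def distances(self, excluded_neighbors: int = 1, use_pbc: bool = True,
                  traj_selection: Any = "all", force: bool = False) -> None:
        # Type-specific parameters configure the feature type object.
        spec = DistancesSpec(excluded_neighbors=excluded_neighbors, use_pbc=use_pbc)
        # Shared parameters are passed on to the manager call.
        self._manager.add_feature(spec, traj_selection=traj_selection, force=force)


manager = RecordingManager()
MiniAddService(manager).distances(excluded_neighbors=2, force=True)
print(manager.calls)
```

The caller never imports or instantiates DistancesSpec; the service method does that on their behalf.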

__init__(manager: FeatureManager, pipeline_data: PipelineData) -> None

Initialize factory with manager and pipeline data.

Parameters

manager : FeatureManager

Feature manager instance

pipeline_data : PipelineData

Pipeline data container (injected by AutoInjectProxy)

Returns

None

distances(excluded_neighbors: int = 1, use_pbc: bool = True, traj_selection: str | int | List = 'all', force: bool = False, force_original: bool = True) -> None

Add distances feature type.

Computes all pairwise distances from molecular dynamics trajectories. This is a base feature type with no dependencies.

Parameters

excluded_neighbors : int, default=1

Number of nearest neighbors to exclude from distance calculation. Chain breaks are automatically excluded based on sequence ID jumps. 0 = all pairs, 1 = exclude direct neighbors, 2 = exclude up to 2nd neighbors, etc.

use_pbc : bool, default=True

Use periodic boundary conditions when computing distances

traj_selection : str, int, list, default="all"

Which trajectories to compute features for ("all", index, list of indices, or trajectory names)

force : bool, default=False

Force recalculation even if feature already exists

force_original : bool, default=True

Whether to force using original trajectory data instead of reduced data

Returns

None

Adds distance features to pipeline data

Examples

>>> # Basic distance calculation
>>> pipeline.feature.add.distances()
>>> # With custom neighbor exclusion
>>> pipeline.feature.add.distances(excluded_neighbors=2)
>>> # Without periodic boundary conditions
>>> pipeline.feature.add.distances(use_pbc=False)
>>> # For specific trajectories only
>>> pipeline.feature.add.distances(traj_selection=[0,1,2], force=True)

Notes

Distance features are computed using MDTraj and returned in Angstroms. Missing pairs (due to chain breaks) are handled automatically.
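
The effect of excluded_neighbors on which pairs are kept can be sketched as follows. This is a minimal illustration, assuming pairs are indexed by consecutive sequence positions and ignoring chain breaks; the helper residue_pairs is hypothetical and not part of mdxplain.

```python
from itertools import combinations
from typing import List, Tuple


def residue_pairs(n_residues: int, excluded_neighbors: int = 1) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, separated by more than `excluded_neighbors`."""
    if excluded_neighbors < 0:
        raise ValueError("excluded_neighbors must be non-negative")
    return [
        (i, j)
        for i, j in combinations(range(n_residues), 2)
        if j - i > excluded_neighbors  # 0 keeps all pairs, 1 drops direct neighbors
    ]


for k in (0, 1, 2):
    print(k, residue_pairs(5, excluded_neighbors=k))
```

Raising excluded_neighbors shrinks the feature set, which matters because the number of pairs grows quadratically with the number of residues.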

contacts(cutoff: float = 4.5, traj_selection: str | int | List = 'all', force: bool = False, force_original: bool = True) -> None

Add contacts feature type.

Computes binary contact matrices from distance data using a distance threshold. Requires distances feature to be computed first as input dependency.

Parameters

cutoff : float, default=4.5

Distance cutoff in Angstroms for contact determination

traj_selection : str, int, list, default="all"

Which trajectories to compute features for

force : bool, default=False

Force recalculation even if feature already exists

force_original : bool, default=True

Whether to force using original trajectory data

Returns

None

Adds contact features to pipeline data

Examples

>>> # Standard contacts with 4.5Å threshold
>>> pipeline.feature.add.contacts()
>>> # Longer-range contacts
>>> pipeline.feature.add.contacts(cutoff=6.0)
>>> # Force recalculation for all trajectories
>>> pipeline.feature.add.contacts(cutoff=5.0, traj_selection="all", force=True)

Notes

Contacts are binary (0 or 1), indicating whether a pair is within the cutoff distance. This feature depends on distances, so add the distances feature before adding contacts.
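
The thresholding step can be sketched in plain Python. This is only an illustration: the helper to_contacts is hypothetical, and treating a distance exactly equal to the cutoff as a contact is an assumption, since the source does not say whether the comparison is inclusive.

```python
from typing import List, Sequence


def to_contacts(distances: Sequence[Sequence[float]], cutoff: float = 4.5) -> List[List[int]]:
    """Map each distance (in Angstroms) to 1 if it is within `cutoff`, else 0."""
    # Assumption: the comparison is inclusive (d <= cutoff).
    return [[1 if d <= cutoff else 0 for d in frame] for frame in distances]


# Two frames, three pair distances each (made-up values).
frames = [
    [3.8, 4.5, 7.2],
    [5.1, 4.4, 6.0],
]
print(to_contacts(frames))
print(to_contacts(frames, cutoff=6.0))
```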

torsions(calculate_phi: bool = True, calculate_psi: bool = True, calculate_omega: bool = True, calculate_chi: bool = True, use_pbc: bool = True, traj_selection: str | int | List = 'all', force: bool = False, force_original: bool = True) -> None

Add torsions feature type.

Computes dihedral torsion angles including backbone (phi, psi, omega) and side chain angles (chi1-4). All angles are returned in degrees (-180 to +180).

Parameters

calculate_phi : bool, default=True

Whether to compute phi backbone angles

calculate_psi : bool, default=True

Whether to compute psi backbone angles

calculate_omega : bool, default=True

Whether to compute omega backbone angles

calculate_chi : bool, default=True

Whether to compute side chain chi angles (chi1, chi2, chi3, chi4)

use_pbc : bool, default=True

Use periodic boundary conditions when computing torsion angles

traj_selection : str, int, list, default="all"

Which trajectories to compute features for

force : bool, default=False

Force recalculation even if feature already exists

force_original : bool, default=True

Whether to force using original trajectory data

Returns

None

Adds torsion features to pipeline data

Examples

>>> # All angles (default)
>>> pipeline.feature.add.torsions()
>>> # Only backbone angles
>>> pipeline.feature.add.torsions(calculate_chi=False)
>>> # Without periodic boundary conditions
>>> pipeline.feature.add.torsions(use_pbc=False)
>>> # Only phi and psi
>>> pipeline.feature.add.torsions(
...     calculate_phi=True,
...     calculate_psi=True,
...     calculate_omega=False,
...     calculate_chi=False
... )
>>> # Only side chain chi angles
>>> pipeline.feature.add.torsions(
...     calculate_phi=False,
...     calculate_psi=False,
...     calculate_omega=False,
...     calculate_chi=True
... )

Notes

All angles are computed and returned in degrees. Uses the MDTraj library for angle calculations. Circular statistics should be used for analysis of torsion angles.
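
To see why circular statistics matter, consider averaging two angles that sit on either side of the ±180 boundary. The sketch below uses the standard circular mean (atan2 of the mean sine and mean cosine); the helper name circular_mean_deg is hypothetical and not part of mdxplain.

```python
import math
from typing import Sequence


def circular_mean_deg(angles_deg: Sequence[float]) -> float:
    """Circular mean of angles given in degrees, returned in [-180, 180]."""
    if not angles_deg:
        raise ValueError("at least one angle is required")
    radians = [math.radians(a) for a in angles_deg]
    mean_sin = sum(math.sin(r) for r in radians) / len(radians)
    mean_cos = sum(math.cos(r) for r in radians) / len(radians)
    return math.degrees(math.atan2(mean_sin, mean_cos))


angles = [170.0, -170.0]
print(sum(angles) / len(angles))   # arithmetic mean, far from both inputs
print(circular_mean_deg(angles))   # circular mean, at the +/-180 boundary
```

The arithmetic mean of 170 and -170 is 0, which points in the opposite direction from both inputs, while the circular mean lands at the boundary between them.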

dssp(simplified: bool = False, encoding: str = 'char', traj_selection: str | int | List = 'all', force: bool = False, force_original: bool = True) -> None

Add DSSP secondary structure feature type.

Computes secondary structure classification using the DSSP algorithm. Provides either 8-class (H, B, E, G, I, T, S, C) or simplified 3-class (H, E, C) output.

Parameters

simplified : bool, default=False

Use simplified 3-state classification (H=helix, E=sheet, C=coil). If False, uses full 8-state DSSP classification.

encoding : str, default='char'

Output encoding format:

  • 'char': Character codes ('H', 'E', 'C', etc.)

  • 'onehot': One-hot encoded binary vectors

  • 'integer': Integer class indices (0, 1, 2, …)

traj_selection : str, int, list, default="all"

Which trajectories to compute features for

force : bool, default=False

Force recalculation even if feature already exists

force_original : bool, default=True

Whether to force using original trajectory data

Returns

None

Adds DSSP features to pipeline data

Examples

>>> # Full 8-state DSSP classification
>>> pipeline.feature.add.dssp()
>>> # Simplified 3-state classification
>>> pipeline.feature.add.dssp(simplified=True)
>>> # One-hot encoded output
>>> pipeline.feature.add.dssp(encoding='onehot')
>>> # Integer encoded with simplified classification
>>> pipeline.feature.add.dssp(simplified=True, encoding='integer')
>>> # Force recalculation for specific trajectories
>>> pipeline.feature.add.dssp(simplified=False, traj_selection=[0,1], force=True)

Notes

Uses MDTraj's DSSP implementation. The full 8-state codes are: H (α-helix), B (β-bridge), E (β-sheet), G (3-10 helix), I (π-helix), T (turn), S (bend), C (coil). Simplified mode maps: H,G,I → H; B,E → E; T,S,C → C.
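
The 8-to-3 state mapping and the three encodings can be sketched as follows. The mapping table is taken from the note above; the class order ("H", "E", "C") used for the integer and one-hot encodings is an assumption made for illustration, and the helper names are hypothetical.

```python
from typing import Dict, List, Sequence

# Mapping from the note above: H,G,I -> H; B,E -> E; T,S,C -> C.
SIMPLIFY: Dict[str, str] = {
    "H": "H", "G": "H", "I": "H",
    "B": "E", "E": "E",
    "T": "C", "S": "C", "C": "C",
}

# Assumed class order; the order used by the library may differ.
CLASSES: Sequence[str] = ("H", "E", "C")


def simplify(codes: Sequence[str]) -> List[str]:
    """Reduce 8-state codes to the 3-state alphabet."""
    return [SIMPLIFY[c] for c in codes]


def to_integer(codes: Sequence[str]) -> List[int]:
    """Encode 3-state codes as class indices."""
    return [CLASSES.index(c) for c in codes]


def to_onehot(codes: Sequence[str]) -> List[List[int]]:
    """Encode 3-state codes as one-hot vectors."""
    return [[1 if c == cls else 0 for cls in CLASSES] for c in codes]


full = list("HGTEBSC")  # made-up 8-state assignment for seven residues
simple = simplify(full)
print(simple)
print(to_integer(simple))
print(to_onehot(simple))
```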

sasa(mode: str = 'residue', probe_radius: float = 0.14, traj_selection: str | int | List = 'all', force: bool = False, force_original: bool = True) -> None

Add SASA (Solvent Accessible Surface Area) feature type.

Computes solvent accessible surface area per residue or per atom (see mode) using the Shrake-Rupley algorithm implemented in MDTraj.

Parameters

mode : str, default='residue'

Level of SASA calculation:

  • 'residue': SASA per residue (sum of constituent atoms)

  • 'atom': SASA per individual atom

probe_radius : float, default=0.14

Probe radius in nanometers (water molecule radius). Standard values: 0.14 nm (water), 0.12 nm (smaller probe).

traj_selection : str, int, list, default="all"

Which trajectories to compute features for

force : bool, default=False

Force recalculation even if feature already exists

force_original : bool, default=True

Whether to force using original trajectory data

Returns

None

Adds SASA features to pipeline data

Examples

>>> # Standard SASA with water probe
>>> pipeline.feature.add.sasa()
>>> # Custom probe radius
>>> pipeline.feature.add.sasa(probe_radius=0.12)
>>> # Atom-level SASA calculation
>>> pipeline.feature.add.sasa(mode='atom')
>>> # Atom-level with custom probe
>>> pipeline.feature.add.sasa(mode='atom', probe_radius=0.12)
>>> # Force recalculation for all trajectories
>>> pipeline.feature.add.sasa(probe_radius=0.14, traj_selection="all", force=True)

Notes

SASA values are returned in Ų. Higher values indicate more solvent exposure. Mode ‘residue’ provides per-residue values, ‘atom’ provides per-atom values. Useful for identifying buried vs. exposed residues and conformational changes affecting protein-solvent interactions.
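
The relationship between the two modes (per-residue SASA as the sum over a residue's atoms) can be sketched like this. The helper residue_sasa and the input values are hypothetical; the sketch only illustrates the aggregation, not how the areas themselves are computed.

```python
from collections import defaultdict
from typing import Dict, Sequence


def residue_sasa(atom_sasa: Sequence[float], atom_to_residue: Sequence[int]) -> Dict[int, float]:
    """Sum per-atom SASA values into per-residue totals."""
    if len(atom_sasa) != len(atom_to_residue):
        raise ValueError("atom_sasa and atom_to_residue must have the same length")
    totals: Dict[int, float] = defaultdict(float)
    for area, residue_index in zip(atom_sasa, atom_to_residue):
        totals[residue_index] += area
    return dict(totals)


# Five atoms of a single frame, belonging to two residues (made-up values).
atom_sasa = [1.5, 0.0, 2.5, 4.0, 1.0]
atom_to_residue = [0, 0, 0, 1, 1]
print(residue_sasa(atom_sasa, atom_to_residue))
```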

coordinates(atom_selection: str = 'name CA', traj_selection: str | int | List = 'all', force: bool = False, force_original: bool = True) -> None

Add coordinates feature type.

Extracts 3D coordinates (x, y, z) for selected atoms from trajectories. Useful for structural analysis and dimensionality reduction.

Parameters

atom_selection : str, default="name CA"

MDTraj atom selection string specifying which atoms to extract. Examples: "name CA", "backbone", "protein", "resid 10 to 50 and name CA"

traj_selection : str, int, list, default="all"

Which trajectories to compute features for

force : bool, default=False

Force recalculation even if feature already exists

force_original : bool, default=True

Whether to force using original trajectory data

Returns

None

Adds coordinate features to pipeline data

Examples

>>> # CA atoms only (default)
>>> pipeline.feature.add.coordinates()
>>> # All backbone atoms
>>> pipeline.feature.add.coordinates(atom_selection="backbone")
>>> # Specific residue range
>>> pipeline.feature.add.coordinates(
...     atom_selection="resid 1 to 100 and name CA",
...     traj_selection=[0,1,2]
... )
>>> # All protein atoms
>>> pipeline.feature.add.coordinates(atom_selection="protein")

Notes

Coordinates are returned in nanometers as (x, y, z) triplets for each selected atom. The resulting feature matrix has shape (n_frames, n_atoms * 3). Consider alignment/centering for meaningful coordinate-based analysis.
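
The (n_frames, n_atoms * 3) layout and a simple per-frame centering step can be sketched in plain Python. The helpers center_frame and flatten_frames are hypothetical. Centering removes only translation; removing rotation would additionally require a structural superposition, which is not shown here.

```python
from typing import List, Sequence, Tuple

Atom = Tuple[float, float, float]


def center_frame(frame: Sequence[Atom]) -> List[Atom]:
    """Shift a frame so that its centroid sits at the origin."""
    n = len(frame)
    cx = sum(atom[0] for atom in frame) / n
    cy = sum(atom[1] for atom in frame) / n
    cz = sum(atom[2] for atom in frame) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in frame]


def flatten_frames(frames: Sequence[Sequence[Atom]]) -> List[List[float]]:
    """Flatten (n_frames, n_atoms, 3) into (n_frames, n_atoms * 3)."""
    return [[value for atom in frame for value in atom] for frame in frames]


# Two frames of a two-atom system; the second is a translated copy of the first.
frames = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(1.0, 1.0, 1.0), (3.0, 1.0, 1.0)],
]
matrix = flatten_frames([center_frame(f) for f in frames])
print(len(matrix), len(matrix[0]))
print(matrix)
```

After centering, the two rows are identical, which shows how an uncorrected translation would otherwise appear as spurious variation in the feature matrix.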