Feature Add Service

Factory for adding features with simplified syntax.

class mdxplain.feature.services.feature_add_service.FeatureAddService(manager: FeatureManager, pipeline_data: PipelineData)

Service for adding features without explicit type instantiation.

This service provides an intuitive interface for adding features to the pipeline without requiring users to import and instantiate feature types directly. Each method accepts the parameters of its feature type alongside the add_feature parameters in a single call.

Examples

>>> pipeline.feature.add.distances(excluded_neighbors=2)
>>> pipeline.feature.add.contacts(cutoff=5.0, traj_selection=[0,1,2])
>>> pipeline.feature.add.torsions(calculate_chi=False, force=True)
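
The forwarding idea described above can be sketched with a toy example. Everything in it (ToyDistances, ToyManager, ToyAddService) is hypothetical and written only for illustration; it is not the mdxplain implementation, and the real manager may accept different arguments.

```python
from typing import Any, Dict, List


class ToyDistances:
    """Hypothetical stand-in for a feature type (not the mdxplain class)."""

    def __init__(self, excluded_neighbors: int = 1, use_pbc: bool = True) -> None:
        self.excluded_neighbors = excluded_neighbors
        self.use_pbc = use_pbc


class ToyManager:
    """Hypothetical manager that only records what it was asked to add."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def add_feature(self, feature_type: Any, traj_selection: Any = "all",
                    force: bool = False) -> None:
        self.calls.append(
            {"feature_type": feature_type,
             "traj_selection": traj_selection,
             "force": force}
        )


class ToyAddService:
    """Hypothetical service: one method per feature type, one flat signature."""

    def __init__(self, manager: ToyManager) -> None:
        self._manager = manager

    def distances(self, excluded_neighbors: int = 1, use_pbc: bool = True,
                  traj_selection: Any = "all", force: bool = False) -> None:
        # Feature-type parameters go to the type constructor ...
        feature_type = ToyDistances(excluded_neighbors=excluded_neighbors,
                                    use_pbc=use_pbc)
        # ... and the remaining parameters go to the manager call.
        self._manager.add_feature(feature_type,
                                  traj_selection=traj_selection,
                                  force=force)


manager = ToyManager()
service = ToyAddService(manager)
service.distances(excluded_neighbors=2, force=True)
```

The caller never imports or instantiates the feature type; the service splits the flat argument list between the type constructor and the manager.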

__init__(manager: FeatureManager, pipeline_data: PipelineData) → None

Initialize factory with manager and pipeline data.

Parameters

manager (FeatureManager): Feature manager instance
pipeline_data (PipelineData): Pipeline data container (injected by AutoInjectProxy)

Returns

None

distances(excluded_neighbors: int = 1, use_pbc: bool = True, traj_selection: str | int | List = 'all', force: bool = False, force_original: bool = True) → None

Add distances feature type.

Computes all pairwise distances from molecular dynamics trajectories. This is a base feature type with no dependencies.

Parameters

excluded_neighbors (int, default=1): Number of nearest neighbors to exclude from distance calculation. Chain breaks are automatically excluded based on sequence ID jumps. 0 = all pairs, 1 = exclude direct neighbors, 2 = exclude up to 2nd neighbors, etc.
use_pbc (bool, default=True): Use periodic boundary conditions when computing distances
traj_selection (str, int, list, default="all"): Which trajectories to compute features for ("all", index, list of indices, or trajectory names)
force (bool, default=False): Force recalculation even if feature already exists
force_original (bool, default=True): Whether to force using original trajectory data instead of reduced data

Returns

None: Adds distance features to pipeline data

Examples

>>> # Basic distance calculation
>>> pipeline.feature.add.distances()

>>> # With custom neighbor exclusion
>>> pipeline.feature.add.distances(excluded_neighbors=2)

>>> # Without periodic boundary conditions
>>> pipeline.feature.add.distances(use_pbc=False)

>>> # For specific trajectories only
>>> pipeline.feature.add.distances(traj_selection=[0,1,2], force=True)

Notes

Distance features are computed using MDTraj and returned in Angstroms. Missing pairs (due to chain breaks) are handled automatically.
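
To make the excluded_neighbors rule concrete, the following is a minimal sketch of one way to enumerate residue pairs under that rule. The function residue_pairs is hypothetical and not part of mdxplain; it assumes a single unbroken chain with consecutive residue indices and ignores the chain-break handling mentioned above.

```python
from itertools import combinations
from typing import List, Tuple


def residue_pairs(n_residues: int, excluded_neighbors: int = 1) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, separated by more than excluded_neighbors.

    Hypothetical helper: 0 keeps all pairs, 1 drops direct neighbors,
    2 drops pairs up to the 2nd neighbor, and so on.
    """
    return [
        (i, j)
        for i, j in combinations(range(n_residues), 2)
        if j - i > excluded_neighbors
    ]


all_pairs = residue_pairs(4, excluded_neighbors=0)
no_direct_neighbors = residue_pairs(4, excluded_neighbors=1)
```

For four residues, excluding direct neighbors leaves only the pairs whose indices differ by at least two.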

contacts(cutoff: float = 4.5, traj_selection: str | int | List = 'all', force: bool = False, force_original: bool = True) → None

Add contacts feature type.

Computes binary contact matrices from distance data using a distance cutoff. Requires the distances feature to be computed first, because it is the input dependency.

Parameters

cutoff (float, default=4.5): Distance cutoff in Angstroms for contact determination
traj_selection (str, int, list, default="all"): Which trajectories to compute features for
force (bool, default=False): Force recalculation even if feature already exists
force_original (bool, default=True): Whether to force using original trajectory data

Returns

None: Adds contact features to pipeline data

Examples

>>> # Standard contacts with 4.5Å threshold
>>> pipeline.feature.add.contacts()

>>> # Longer-range contacts
>>> pipeline.feature.add.contacts(cutoff=6.0)

>>> # Force recalculation for all trajectories
>>> pipeline.feature.add.contacts(cutoff=5.0, traj_selection="all", force=True)

Notes

Contacts are binary (0 or 1) indicating whether atom pairs are within threshold distance. This feature depends on distances - ensure distances are computed first.
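
The thresholding step can be illustrated with a small sketch. The function to_contacts is hypothetical, and treating a distance exactly equal to the cutoff as a contact (<=) is an assumption; the library may use a strict comparison.

```python
from typing import List


def to_contacts(distances: List[List[float]], cutoff: float = 4.5) -> List[List[int]]:
    """Convert per-frame pair distances (Angstroms) into 0/1 contact flags.

    Hypothetical helper; the inclusive comparison is an assumption.
    """
    return [[1 if d <= cutoff else 0 for d in frame] for frame in distances]


# Two frames, three atom pairs each (made-up distances in Angstroms).
example_distances = [[3.0, 4.5, 7.2], [5.1, 4.0, 6.0]]
contacts_default = to_contacts(example_distances)
contacts_long_range = to_contacts(example_distances, cutoff=6.0)
```

Raising the cutoff can only turn zeros into ones, which is why a longer cutoff yields a denser contact matrix.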

torsions(calculate_phi: bool = True, calculate_psi: bool = True, calculate_omega: bool = True, calculate_chi: bool = True, use_pbc: bool = True, traj_selection: str | int | List = 'all', force: bool = False, force_original: bool = True) → None

Add torsions feature type.

Computes dihedral torsion angles including backbone (phi, psi, omega) and side chain angles (chi1-4). All angles are returned in degrees (-180 to +180).

Parameters

calculate_phi (bool, default=True): Whether to compute phi backbone angles
calculate_psi (bool, default=True): Whether to compute psi backbone angles
calculate_omega (bool, default=True): Whether to compute omega backbone angles
calculate_chi (bool, default=True): Whether to compute side chain chi angles (chi1, chi2, chi3, chi4)
use_pbc (bool, default=True): Use periodic boundary conditions when computing torsion angles
traj_selection (str, int, list, default="all"): Which trajectories to compute features for
force (bool, default=False): Force recalculation even if feature already exists
force_original (bool, default=True): Whether to force using original trajectory data

Returns

None: Adds torsion features to pipeline data

Examples

>>> # All angles (default)
>>> pipeline.feature.add.torsions()

>>> # Only backbone angles
>>> pipeline.feature.add.torsions(calculate_chi=False)

>>> # Without periodic boundary conditions
>>> pipeline.feature.add.torsions(use_pbc=False)

>>> # Only phi and psi
>>> pipeline.feature.add.torsions(
...     calculate_phi=True,
...     calculate_psi=True,
...     calculate_omega=False,
...     calculate_chi=False
... )

>>> # Only side chain chi angles
>>> pipeline.feature.add.torsions(
...     calculate_phi=False,
...     calculate_psi=False,
...     calculate_omega=False,
...     calculate_chi=True
... )

Notes

All angles are computed and returned in degrees. Uses the MDTraj library for angle calculations. Circular statistics should be used for analysis of torsion angles.
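
As an illustration of why circular statistics matter here, the sketch below computes a circular mean for angles in degrees using only the standard library. The function circular_mean_deg is a hypothetical helper, not part of mdxplain.

```python
import math
from typing import Sequence


def circular_mean_deg(angles_deg: Sequence[float]) -> float:
    """Circular mean of angles in degrees, returned in the range [-180, 180]."""
    # Average the unit vectors instead of the raw angle values.
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(sin_sum, cos_sum))


# Two angles that sit on either side of the +/-180 degree wrap-around.
wrapped = [170.0, -170.0]
arithmetic_mean = sum(wrapped) / len(wrapped)
circular_mean = circular_mean_deg(wrapped)
```

The arithmetic mean of these two angles is 0 degrees, which points in the opposite direction, whereas the circular mean lands at the wrap-around point (180 degrees in magnitude).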

dssp(simplified: bool = False, encoding: str = 'char', traj_selection: str | int | List = 'all', force: bool = False, force_original: bool = True) → None

Add DSSP secondary structure feature type.

Computes secondary structure classification using the DSSP algorithm. Provides either 8-class (H, B, E, G, I, T, S, C) or simplified 3-class (H, E, C) output.

Parameters

simplified (bool, default=False)

Use simplified 3-state classification (H=helix, E=sheet, C=coil). If False, uses the full 8-state DSSP classification.

encoding (str, default='char')

Output encoding format:

'char': Character codes ('H', 'E', 'C', etc.)
'onehot': One-hot encoded binary vectors
'integer': Integer class indices (0, 1, 2, …)

traj_selection (str, int, list, default="all")

Which trajectories to compute features for

force (bool, default=False)

Force recalculation even if feature already exists

force_original (bool, default=True)

Whether to force using original trajectory data

Returns

None: Adds DSSP features to pipeline data

Examples

>>> # Full 8-state DSSP classification
>>> pipeline.feature.add.dssp()

>>> # Simplified 3-state classification
>>> pipeline.feature.add.dssp(simplified=True)

>>> # One-hot encoded output
>>> pipeline.feature.add.dssp(encoding='onehot')

>>> # Integer encoded with simplified classification
>>> pipeline.feature.add.dssp(simplified=True, encoding='integer')

>>> # Force recalculation for specific trajectories
>>> pipeline.feature.add.dssp(simplified=False, traj_selection=[0,1], force=True)

Notes

Uses MDTraj's DSSP implementation. The full 8-state codes are: H (α-helix), B (β-bridge), E (β-sheet), G (3-10 helix), I (π-helix), T (turn), S (bend), C (coil). Simplified mode maps: H,G,I → H; B,E → E; T,S,C → C.
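
The simplified mapping and the three encodings can be sketched in a few lines. All names below (SIMPLIFY, CLASSES_3, simplify, encode_integer, encode_onehot) are hypothetical, and the class order used for the integer and one-hot encodings is an assumption; the order used by the library is not stated here.

```python
from typing import Dict, List, Sequence

# Mapping from the 8-state codes to the 3-state codes, as listed in the notes.
SIMPLIFY: Dict[str, str] = {
    "H": "H", "G": "H", "I": "H",
    "B": "E", "E": "E",
    "T": "C", "S": "C", "C": "C",
}

# Assumed class order for integer and one-hot encoding (illustrative only).
CLASSES_3: Sequence[str] = ("H", "E", "C")


def simplify(codes: Sequence[str]) -> List[str]:
    """Collapse 8-state DSSP codes into the 3-state alphabet."""
    return [SIMPLIFY[c] for c in codes]


def encode_integer(codes: Sequence[str], classes: Sequence[str]) -> List[int]:
    """Replace each code with its index in `classes`."""
    return [classes.index(c) for c in codes]


def encode_onehot(codes: Sequence[str], classes: Sequence[str]) -> List[List[int]]:
    """Replace each code with a binary vector of length len(classes)."""
    return [[1 if c == cls else 0 for cls in classes] for c in codes]


full_codes = ["H", "G", "T", "E", "B", "S"]
simple_codes = simplify(full_codes)
as_integers = encode_integer(simple_codes, CLASSES_3)
as_onehot = encode_onehot(simple_codes, CLASSES_3)
```

One-hot encoding multiplies the number of feature columns by the number of classes, which is worth keeping in mind when choosing between the 8-state and 3-state output.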

sasa(mode: str = 'residue', probe_radius: float = 0.14, traj_selection: str | int | List = 'all', force: bool = False, force_original: bool = True) → None

Add SASA (Solvent Accessible Surface Area) feature type.

Computes solvent accessible surface area per residue or per atom (see mode) using the Shrake-Rupley algorithm implemented in MDTraj.

Parameters

mode (str, default='residue')

Level of SASA calculation:

'residue': SASA per residue (sum of constituent atoms)
'atom': SASA per individual atom

probe_radius (float, default=0.14)

Probe radius in nanometers (water molecule radius). Standard values: 0.14 nm (water), 0.12 nm (smaller probe).

traj_selection (str, int, list, default="all")

Which trajectories to compute features for

force (bool, default=False)

Force recalculation even if feature already exists

force_original (bool, default=True)

Whether to force using original trajectory data

Returns

None: Adds SASA features to pipeline data

Examples

>>> # Standard SASA with water probe
>>> pipeline.feature.add.sasa()

>>> # Custom probe radius
>>> pipeline.feature.add.sasa(probe_radius=0.12)

>>> # Atom-level SASA calculation
>>> pipeline.feature.add.sasa(mode='atom')

>>> # Atom-level with custom probe
>>> pipeline.feature.add.sasa(mode='atom', probe_radius=0.12)

>>> # Force recalculation for all trajectories
>>> pipeline.feature.add.sasa(probe_radius=0.14, traj_selection="all", force=True)

Notes

SASA values are returned in Å². Higher values indicate more solvent exposure. Mode ‘residue’ provides per-residue values, ‘atom’ provides per-atom values. Useful for identifying buried vs. exposed residues and conformational changes affecting protein-solvent interactions.
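
The relationship between the two modes (a residue value is the sum over its atoms) can be sketched as follows. The function residue_sasa and the atom-to-residue list are hypothetical and use made-up numbers; this is not how mdxplain or MDTraj implement the calculation.

```python
from collections import defaultdict
from typing import Dict, Sequence


def residue_sasa(atom_sasa: Sequence[float],
                 atom_to_residue: Sequence[int]) -> Dict[int, float]:
    """Sum per-atom SASA values into per-residue totals for one frame.

    Hypothetical helper: atom_to_residue[i] is the residue index of atom i.
    """
    totals: Dict[int, float] = defaultdict(float)
    for value, residue_index in zip(atom_sasa, atom_to_residue):
        totals[residue_index] += value
    return dict(totals)


# Three atoms: the first two belong to residue 0, the third to residue 1.
per_atom = [1.0, 2.0, 0.5]
per_residue = residue_sasa(per_atom, [0, 0, 1])
```

Residue mode therefore produces fewer columns than atom mode while preserving the total exposed area per frame.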

coordinates(atom_selection: str = 'name CA', traj_selection: str | int | List = 'all', force: bool = False, force_original: bool = True) → None

Add coordinates feature type.

Extracts 3D coordinates (x, y, z) for selected atoms from trajectories. Useful for structural analysis and dimensionality reduction.

Parameters

atom_selection (str, default="name CA"): MDTraj atom selection string specifying which atoms to extract. Examples: "name CA", "backbone", "protein", "resid 10 to 50 and name CA"
traj_selection (str, int, list, default="all"): Which trajectories to compute features for
force (bool, default=False): Force recalculation even if feature already exists
force_original (bool, default=True): Whether to force using original trajectory data

Returns

None: Adds coordinate features to pipeline data

Examples

>>> # CA atoms only (default)
>>> pipeline.feature.add.coordinates()

>>> # All backbone atoms
>>> pipeline.feature.add.coordinates(atom_selection="backbone")

>>> # Specific residue range
>>> pipeline.feature.add.coordinates(
...     atom_selection="resid 1 to 100 and name CA",
...     traj_selection=[0,1,2]
... )

>>> # All protein atoms
>>> pipeline.feature.add.coordinates(atom_selection="protein")

Notes

Coordinates are returned in nanometers as (x, y, z) triplets for each selected atom. The resulting feature matrix has shape (n_frames, n_atoms * 3). Consider alignment/centering for meaningful coordinate-based analysis.
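
The (n_frames, n_atoms * 3) layout and the centering step mentioned above can be sketched with plain lists. The helpers flatten_frame and center_frame are hypothetical; the ordering x, y, z per atom within a row is an assumption, and centering on the centroid is only one simple option (it removes translation but not rotation).

```python
from typing import List, Sequence, Tuple

Atom = Tuple[float, float, float]


def center_frame(coords: Sequence[Atom]) -> List[Atom]:
    """Shift one frame so that the centroid of the selected atoms is at the origin."""
    n_atoms = len(coords)
    centroid = [sum(atom[k] for atom in coords) / n_atoms for k in range(3)]
    return [
        (atom[0] - centroid[0], atom[1] - centroid[1], atom[2] - centroid[2])
        for atom in coords
    ]


def flatten_frame(coords: Sequence[Atom]) -> List[float]:
    """Flatten (n_atoms, 3) coordinates into one row of length n_atoms * 3."""
    return [value for atom in coords for value in atom]


# One frame with two atoms (made-up coordinates in nanometers).
frame = [(0.0, 0.0, 0.0), (2.0, 2.0, 2.0)]
centered = center_frame(frame)
feature_row = flatten_frame(centered)
```

Stacking one such row per frame gives the feature matrix shape described in the notes.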