DSSP Data

DSSP feature type implementation for molecular dynamics analysis. It computes secondary structure assignments with the DSSP algorithm and supports multiple encoding formats.

class mdxplain.feature.feature_type.dssp.dssp.DSSP(simplified: bool = False, encoding: str = 'integer')

DSSP feature type for computing secondary structure assignments.

Computes secondary structure using the DSSP (Dictionary of Secondary Structure in Proteins) algorithm. Supports both simplified classification (H/E/C for helix/sheet/coil) and full classification with all 8 DSSP classes. Multiple encoding formats are available for different analysis needs.

This is a base feature type with no dependencies that provides structural classification information for protein analysis.

Uses MDTraj for the DSSP calculations under the hood.
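
As a rough illustration of how the two classification levels relate, the sketch below collapses full 8-class codes into the simplified 3-class scheme. The grouping (H, G, I to H; B, E to E; everything else to C) follows the simplified scheme documented for MDTraj's DSSP routine. The names FULL_TO_SIMPLIFIED and simplify_codes are hypothetical helpers for this example and are not part of mdxplain.

```python
import numpy as np

# Grouping of the eight DSSP classes into three, following the simplified
# scheme documented by MDTraj. Hypothetical helper, not mdxplain API.
FULL_TO_SIMPLIFIED = {
    "H": "H", "G": "H", "I": "H",   # alpha, 3-10 and pi helices -> helix
    "E": "E", "B": "E",             # extended strand and isolated bridge -> sheet
    "T": "C", "S": "C", "C": "C",   # turn, bend and coil -> coil/other
}


def simplify_codes(full_codes: np.ndarray) -> np.ndarray:
    """Map an array of 8-class DSSP characters to 3-class characters."""
    flat = [FULL_TO_SIMPLIFIED[code] for code in full_codes.ravel()]
    return np.array(flat).reshape(full_codes.shape)


# One frame, eight residues, one residue per full DSSP class.
full = np.array([["H", "G", "T", "E", "B", "S", "I", "C"]])
simple = simplify_codes(full)
print(simple)
```

The output keeps the (n_frames, n_residues) shape of the input; only the alphabet shrinks from eight symbols to three.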

Examples

>>> # Simplified DSSP with one-hot encoding
>>> dssp = DSSP(simplified=True, encoding='onehot')
>>> pipeline.feature.add_feature(dssp)
>>> # Full DSSP classification with integer encoding
>>> dssp = DSSP(simplified=False, encoding='integer')
>>> pipeline.feature.add_feature(dssp)
>>> # Character encoding for visualization
>>> dssp = DSSP(simplified=True, encoding='char')
>>> pipeline.feature.add_feature(dssp)

__init__(simplified: bool = False, encoding: str = 'integer') -> None

Initialize DSSP feature type with classification and encoding parameters.

Parameters

simplified : bool, default=False

Secondary structure classification level:

  • True: Simplified 3-class (H=helix, E=sheet, C=coil/other)

  • False: Full 8-class DSSP (H, B, E, G, I, T, S, C)

encoding : str, default='integer'

Output encoding format:

  • 'onehot': One-hot encoded binary vectors

  • 'integer': Integer class indices (0, 1, 2, …) [default]

  • 'char': Character codes ('H', 'E', 'C', etc.)
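
The relationship between the character and integer encodings can be pictured as a lookup from class character to class index. The sketch below is a minimal illustration for the simplified scheme. The class order (H, E, C) and the names CLASSES, CHAR_TO_INDEX and chars_to_integers are assumptions made for this example; the index order mdxplain actually assigns is not stated here and may differ.

```python
import numpy as np

# Assumed class order for the simplified scheme; the real index order used
# by mdxplain may differ.
CLASSES = ["H", "E", "C"]
CHAR_TO_INDEX = {code: idx for idx, code in enumerate(CLASSES)}


def chars_to_integers(codes: np.ndarray) -> np.ndarray:
    """Convert an (n_frames, n_residues) character array to class indices."""
    indices = [CHAR_TO_INDEX[code] for code in codes.ravel()]
    return np.array(indices, dtype=np.int64).reshape(codes.shape)


# Two frames, four residues.
chars = np.array([["H", "H", "E", "C"],
                  ["H", "C", "E", "C"]])
ints = chars_to_integers(chars)
print(ints)
```

Integer indices are compact and convenient for counting or indexing, while the character form is easier to read in plots and logs.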

Returns

None

Examples

>>> # Default: Full classification with integer encoding
>>> dssp = DSSP()
>>> # Simplified for basic analysis
>>> dssp = DSSP(simplified=True)
>>> # Integer encoding for machine learning
>>> dssp = DSSP(simplified=False, encoding='integer')
>>> # Character codes for visualization
>>> dssp = DSSP(simplified=True, encoding='char')

Raises

ValueError

If encoding is not one of 'onehot', 'integer', or 'char'

init_calculator(use_memmap: bool = False, cache_path: str = './cache', chunk_size: int = 2000) -> None

Initialize the DSSP calculator with specified configuration.

Parameters

use_memmap : bool, default=False

Whether to use memory mapping for large datasets

cache_path : str, default='./cache'

Directory path for storing cache files when using memory mapping

chunk_size : int, default=2000

Number of frames to process per chunk for memory-efficient processing
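
To make the interplay of these three parameters more concrete, the sketch below fills a disk-backed NumPy array one chunk of frames at a time. It illustrates the general technique only: np.memmap is a standard NumPy facility, but the function fill_in_chunks, the file name dssp_sketch.dat and the placeholder computation are hypothetical, and mdxplain's internal implementation may look quite different.

```python
import os
import tempfile

import numpy as np


def fill_in_chunks(n_frames, n_residues, chunk_size, cache_path):
    """Write per-frame results into a disk-backed array, chunk by chunk."""
    os.makedirs(cache_path, exist_ok=True)
    out_file = os.path.join(cache_path, "dssp_sketch.dat")  # hypothetical name
    out = np.memmap(out_file, dtype=np.int8, mode="w+",
                    shape=(n_frames, n_residues))
    for chunk_idx, start in enumerate(range(0, n_frames, chunk_size)):
        stop = min(start + chunk_size, n_frames)
        # Placeholder for the real per-chunk DSSP computation: here each
        # chunk simply records its own index so the chunking is visible.
        out[start:stop] = chunk_idx
    out.flush()
    return out


with tempfile.TemporaryDirectory() as tmp:
    result = fill_in_chunks(n_frames=5000, n_residues=10,
                            chunk_size=2000, cache_path=tmp)
    shape = result.shape
    chunk_ids = np.unique(np.asarray(result[:, 0])).tolist()
    del result  # release the file handle before the directory is removed

print(shape, chunk_ids)
```

Only one chunk of frames needs to be held in memory at a time; the full result lives in the file under the cache directory. A smaller chunk size lowers peak memory use at the cost of more iterations.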

Returns

None

Examples

>>> # Basic initialization
>>> dssp.init_calculator()
>>> # With memory mapping for large datasets
>>> dssp.init_calculator(use_memmap=True, cache_path='./cache/')
>>> # With custom chunk size
>>> dssp.init_calculator(chunk_size=1000)

compute(input_data: Trajectory, feature_metadata: Dict[str, Any]) -> Tuple[ndarray, Dict[str, Any]]

Compute DSSP secondary structure assignments from trajectory.

Parameters

input_data : mdtraj.Trajectory

MD trajectory to compute DSSP from

feature_metadata : dict

Residue metadata (passed through for residue-level analysis)

Returns

tuple[numpy.ndarray, dict]

Tuple containing (dssp_array, feature_metadata) where dssp_array format depends on encoding:

  • 'onehot': (n_frames, n_residues * n_classes)

  • 'integer': (n_frames, n_residues)

  • 'char': (n_frames, n_residues) with string dtype
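
The one-hot shape (n_frames, n_residues * n_classes) arises from expanding each class index into a binary vector and flattening the result per frame. The sketch below shows one way this could be done with NumPy. The helper name one_hot_flat and the residue-major column order (all class columns of residue 0 first, then residue 1, and so on) are assumptions for illustration; the column order produced by mdxplain is not specified here.

```python
import numpy as np


def one_hot_flat(indices: np.ndarray, n_classes: int) -> np.ndarray:
    """One-hot encode (n_frames, n_residues) indices and flatten per frame."""
    n_frames, n_residues = indices.shape
    # Row i of the identity matrix is the one-hot vector for class i, so
    # indexing with the class indices yields (n_frames, n_residues, n_classes).
    one_hot = np.eye(n_classes, dtype=np.int8)[indices]
    return one_hot.reshape(n_frames, n_residues * n_classes)


# Two frames, four residues, three classes (simplified scheme).
indices = np.array([[0, 1, 2, 2],
                    [0, 0, 1, 2]])
encoded = one_hot_flat(indices, n_classes=3)
print(encoded.shape)
print(encoded[0])
```

Each frame row contains exactly one 1 per residue, so every row sums to n_residues.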

Examples

>>> # Compute simplified DSSP with one-hot encoding
>>> dssp = DSSP(simplified=True, encoding='onehot')
>>> dssp.init_calculator()
>>> data, metadata = dssp.compute(trajectory, res_metadata)
>>> print(f"DSSP array shape: {data.shape}")  # (n_frames, n_residues * 3)
>>> # Compute full DSSP with character encoding
>>> dssp = DSSP(simplified=False, encoding='char')
>>> dssp.init_calculator()
>>> data, metadata = dssp.compute(trajectory, res_metadata)
>>> print(f"Secondary structure codes: {data[0]}")  # ['H', 'H', 'E', 'C', ...]

Raises

ValueError

If calculator is not initialized

get_dependencies() -> List[str]

Get list of feature type dependencies for DSSP calculations.

Parameters

None

Returns

List[str]

Empty list as DSSP is a base feature with no dependencies

Examples

>>> dssp = DSSP()
>>> print(dssp.get_dependencies())
[]

classmethod get_type_name() -> str

Return unique string identifier for the DSSP feature type.

Parameters

None

Returns

str

String identifier 'dssp' used as key in feature dictionaries

Examples

>>> print(DSSP.get_type_name())
dssp

get_input()

Get the input feature type that DSSP depends on.

Parameters

None

Returns

None

None since DSSP is a base feature with no input dependencies

Examples

>>> dssp = DSSP()
>>> print(dssp.get_input())
None