DSSP Encoding Helper

DSSP encoding helper utilities for molecular dynamics trajectory analysis.

Helper functions for encoding DSSP secondary structure assignments into different formats (character, integer, one-hot) with memory-efficient chunk-wise processing for large datasets.

class mdxplain.feature.feature_type.dssp.helper.dssp_encoding_helper.DSSPEncodingHelper

Helper class for DSSP encoding operations.

Provides static methods for converting DSSP assignments between different encoding formats with support for memory mapping and chunk-wise processing for large trajectory datasets.

Examples

>>> # Character encoding (dssp_data is already cleaned of spaces)
>>> char_data = DSSPEncodingHelper.encode_char_chunked(
...     dssp_data, 1000, "./cache"
... )
>>> # Integer encoding for classification
>>> int_data = DSSPEncodingHelper.encode_integer_chunked(
...     dssp_data, classes, 1000, "./cache"
... )

static encode_char_chunked(dssp_data: ndarray, chunk_size: int, cache_path: str) -> ndarray

Encode DSSP to character format with chunk-wise processing.

Parameters

dssp_data : numpy.ndarray

Input DSSP data with shape (n_frames, n_residues)

chunk_size : int

Number of frames to process per chunk

cache_path : str

Directory path for cache files

Returns

numpy.ndarray

Character-encoded DSSP data

Notes

Uses memory mapping for efficient processing of large datasets. Space conversion happens centrally in dssp_calculator before this method is called, so dssp_data is already cleaned.

static encode_integer(dssp_data: ndarray, classes: list) -> ndarray

Encode DSSP to integer format in-memory.

Parameters

dssp_data : numpy.ndarray

Input DSSP data with shape (n_frames, n_residues)

classes : list

List of class labels for integer mapping

Returns

numpy.ndarray

Integer-encoded DSSP data

Notes

Intended for small datasets that fit in memory. Uses vectorized operations for efficient conversion.
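As a rough illustration of what a vectorized integer mapping can look like, the following sketch maps each DSSP character to its position in `classes` using NumPy boolean masks. The function name `encode_integer_sketch`, the `int8` output dtype, and the choice of `-1` for labels missing from `classes` are assumptions made for this example, not the library's documented behavior.

```python
import numpy as np


def encode_integer_sketch(dssp_data, classes):
    """Hypothetical stand-in: map each label to its index in `classes`."""
    # Assumption: labels that do not appear in `classes` stay at -1.
    encoded = np.full(dssp_data.shape, -1, dtype=np.int8)
    for index, label in enumerate(classes):
        # One vectorized mask per class instead of a per-element Python loop.
        encoded[dssp_data == label] = index
    return encoded


# Two frames, three residues, using a simplified three-class alphabet.
dssp = np.array([["H", "E", "C"], ["C", "H", "H"]])
codes = encode_integer_sketch(dssp, ["H", "E", "C"])
```

The loop runs once per class rather than once per array element, so the cost is dominated by a few whole-array comparisons.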

static encode_integer_chunked(dssp_data: ndarray, classes: list, chunk_size: int, cache_path: str) -> ndarray

Encode DSSP to integer format with chunk-wise processing.

Parameters

dssp_data : numpy.ndarray

Input DSSP data with shape (n_frames, n_residues)

classes : list

List of class labels for integer mapping

chunk_size : int

Number of frames to process per chunk

cache_path : str

Directory path for cache files

Returns

numpy.ndarray

Integer-encoded DSSP data

Notes

Uses memory mapping for efficient processing of large datasets. Processes data in chunks of chunk_size frames so that the full encoded array never has to be held in memory at once.
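The chunk-wise, memory-mapped pattern described in these notes can be sketched as follows. This is only an illustration under stated assumptions: the function name `encode_integer_chunked_sketch`, the cache file name `dssp_integer_sketch.dat`, the `int8` dtype, and the `-1` value for unknown labels are all hypothetical and may differ from what the helper actually does.

```python
import os
import tempfile

import numpy as np


def encode_integer_chunked_sketch(dssp_data, classes, chunk_size, cache_path):
    """Hypothetical stand-in: integer-encode frames chunk by chunk."""
    os.makedirs(cache_path, exist_ok=True)
    # Assumption: the output lives in a single file inside cache_path.
    out_file = os.path.join(cache_path, "dssp_integer_sketch.dat")
    out = np.memmap(out_file, dtype=np.int8, mode="w+", shape=dssp_data.shape)

    n_frames = dssp_data.shape[0]
    for start in range(0, n_frames, chunk_size):
        stop = min(start + chunk_size, n_frames)
        chunk = dssp_data[start:stop]
        # Only one chunk of encoded values is held in RAM at a time.
        encoded = np.full(chunk.shape, -1, dtype=np.int8)
        for index, label in enumerate(classes):
            encoded[chunk == label] = index
        out[start:stop] = encoded

    out.flush()  # push pending writes to the backing file
    return out


cache_dir = tempfile.mkdtemp()
dssp = np.array([["H", "E", "C"], ["C", "H", "H"], ["E", "E", "C"]])
encoded = encode_integer_chunked_sketch(
    dssp, ["H", "E", "C"], chunk_size=2, cache_path=cache_dir
)
```

With `chunk_size=2` the three frames are written in two passes (frames 0 to 1, then frame 2), and the returned array is backed by the file on disk rather than by RAM.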

static encode_onehot_chunked(dssp_data: ndarray, classes: list, chunk_size: int, cache_path: str) -> ndarray

Encode DSSP to one-hot format, using chunk-wise or direct processing depending on the dataset.

Parameters

dssp_data : numpy.ndarray

Input DSSP data with shape (n_frames, n_residues)

classes : list

List of class labels for one-hot encoding

chunk_size : int

Number of frames to process per chunk

cache_path : str

Directory path for cache files

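use_memmap : bool, default=True

Whether to use memory mapping for the output array. This parameter is documented here but does not appear in the signature above, so it may not be accepted by the current implementation.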

Returns

numpy.ndarray

One-hot encoded DSSP data with shape (n_frames, n_residues * n_classes)

Notes

Uses chunk-wise processing for large datasets or when use_memmap=True. For small datasets with use_memmap=False, uses direct vectorized operations.

static encode_onehot_direct(dssp_data: ndarray, classes: list) -> ndarray

Encode DSSP data from mdtraj output directly to one-hot format, in memory.

Parameters

dssp_data : numpy.ndarray

Input DSSP data with shape (n_frames, n_residues)

classes : list

List of class labels

Returns

numpy.ndarray

One-hot encoded DSSP data with shape (n_frames, n_residues * n_classes)
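
To make the output shape (n_frames, n_residues * n_classes) concrete, here is a minimal sketch of a direct one-hot encoding. The function name `encode_onehot_sketch`, the `int8` dtype, and the residue-major column order (all class columns for residue 0, then residue 1, and so on) are assumptions for illustration. The actual column layout of the helper is not stated in this documentation.

```python
import numpy as np


def encode_onehot_sketch(dssp_data, classes):
    """Hypothetical stand-in: one-hot encode labels, flattened per frame."""
    n_frames, n_residues = dssp_data.shape
    n_classes = len(classes)
    onehot = np.zeros((n_frames, n_residues, n_classes), dtype=np.int8)
    for index, label in enumerate(classes):
        # Set the class column to 1 wherever the residue carries this label.
        onehot[:, :, index] = dssp_data == label
    # Assumption: flatten so each residue contributes n_classes adjacent columns.
    return onehot.reshape(n_frames, n_residues * n_classes)


# Two frames, two residues, three classes -> output shape (2, 6).
dssp = np.array([["H", "C"], ["E", "H"]])
onehot = encode_onehot_sketch(dssp, ["H", "E", "C"])
```

In the first frame, residue 0 is `H` and residue 1 is `C`, so under the assumed layout the row reads `[1, 0, 0, 0, 0, 1]`.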