DSSP Encoding Helper
DSSP encoding helper utilities for molecular dynamics trajectory analysis.
Helper functions for encoding DSSP secondary structure assignments into different formats (character, integer, one-hot) with memory-efficient chunk-wise processing for large datasets.
- class mdxplain.feature.feature_type.dssp.helper.dssp_encoding_helper.DSSPEncodingHelper
Helper class for DSSP encoding operations.
Provides static methods for converting DSSP assignments between different encoding formats with support for memory mapping and chunk-wise processing for large trajectory datasets.
Examples
>>> # Character encoding (spaces are already converted upstream)
>>> char_data = DSSPEncodingHelper.encode_char_chunked(
...     dssp_data, 1000, "./cache"
... )
>>> # Integer encoding for classification
>>> int_data = DSSPEncodingHelper.encode_integer_chunked(
...     dssp_data, classes, 1000, "./cache"
... )
- static encode_char_chunked(dssp_data: ndarray, chunk_size: int, cache_path: str) → ndarray
Encode DSSP to character format with chunk-wise processing.
Parameters
- dssp_data : numpy.ndarray
Input DSSP data with shape (n_frames, n_residues)
- chunk_size : int
Number of frames to process per chunk
- cache_path : str
Directory path for cache files
Returns
- numpy.ndarray
Character-encoded DSSP data
Notes
Uses memory mapping for efficient processing of large datasets. Space conversion happens centrally in dssp_calculator before this method is called, so dssp_data is already cleaned.
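The chunk-wise, memory-mapped pattern described in these notes can be sketched as follows. This is a minimal illustration under stated assumptions, not mdxplain's implementation: the function name encode_char_chunked_sketch and the cache file name dssp_char.dat are hypothetical, and the sketch assumes dssp_data is a fixed-width string array whose dtype can be reused for the output.

```python
import os
import tempfile

import numpy as np


def encode_char_chunked_sketch(dssp_data, chunk_size, cache_path):
    """Copy DSSP character codes into a disk-backed array, one chunk at a time.

    Hypothetical sketch, not mdxplain's code: the cache file name and the
    reuse of the input dtype are assumptions. dssp_data must be a
    fixed-width string array (object arrays cannot be memory-mapped).
    """
    n_frames, n_residues = dssp_data.shape
    os.makedirs(cache_path, exist_ok=True)
    out_file = os.path.join(cache_path, "dssp_char.dat")  # assumed file name
    out = np.memmap(out_file, dtype=dssp_data.dtype, mode="w+",
                    shape=(n_frames, n_residues))
    for start in range(0, n_frames, chunk_size):
        end = min(start + chunk_size, n_frames)
        # Copy one block of frames; if dssp_data is itself memory-mapped,
        # only this block is read from disk.
        out[start:end] = dssp_data[start:end]
    out.flush()  # push pending writes to the cache file
    return out


if __name__ == "__main__":
    frames = np.array([["H", "E", "C"], ["C", "C", "H"], ["E", "E", "E"]])
    encoded = encode_char_chunked_sketch(frames, 2, tempfile.mkdtemp())
    print(encoded.shape)  # (3, 3)
```

Because the output lives in a file under cache_path, peak memory is bounded by chunk_size rather than by the trajectory length.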
- static encode_integer(dssp_data: ndarray, classes: list) → ndarray
Encode DSSP to integer format in-memory.
Parameters
- dssp_data : numpy.ndarray
Input DSSP data with shape (n_frames, n_residues)
- classes : list
List of class labels for integer mapping
Returns
- numpy.ndarray
Integer-encoded DSSP data
Notes
Intended for small datasets that fit in memory. Uses vectorized NumPy operations for the conversion.
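One way to perform the in-memory integer mapping with vectorized operations is a sorted lookup via np.searchsorted. This is a hedged sketch, not the library's code: the name encode_integer_sketch, the int8 output dtype, and raising ValueError on unknown codes are assumptions.

```python
import numpy as np


def encode_integer_sketch(dssp_data, classes):
    """Map every DSSP code to its position in `classes`, fully vectorized.

    Hypothetical sketch, not mdxplain's code. The int8 output dtype is an
    assumption that holds while len(classes) <= 127 (DSSP alphabets are
    far smaller).
    """
    classes_arr = np.asarray(classes)
    order = np.argsort(classes_arr)          # permutation that sorts the labels
    sorted_classes = classes_arr[order]
    # Binary-search every code against the sorted labels in one call.
    pos = np.searchsorted(sorted_classes, dssp_data)
    pos = np.clip(pos, 0, len(classes_arr) - 1)
    if not np.all(sorted_classes[pos] == dssp_data):
        raise ValueError("dssp_data contains codes that are not in classes")
    # Undo the sort so indices refer to the caller's class order.
    return order[pos].astype(np.int8)


if __name__ == "__main__":
    codes = np.array([["H", "C"], ["E", "H"]])
    print(encode_integer_sketch(codes, ["H", "E", "C"]).tolist())  # [[0, 2], [1, 0]]
```

The lookup costs one binary search per element, and the whole array is processed in a single NumPy call rather than a Python loop.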
- static encode_integer_chunked(dssp_data: ndarray, classes: list, chunk_size: int, cache_path: str) → ndarray
Encode DSSP to integer format with chunk-wise processing.
Parameters
- dssp_data : numpy.ndarray
Input DSSP data with shape (n_frames, n_residues)
- classes : list
List of class labels for integer mapping
- chunk_size : int
Number of frames to process per chunk
- cache_path : str
Directory path for cache files
Returns
- numpy.ndarray
Integer-encoded DSSP data
Notes
Uses memory mapping for efficient processing of large datasets. Processes the frames in chunks so the conversion does not run out of memory.
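A chunked variant can build one boolean mask per class for each block of frames and write the result into a memory-mapped file. This is a minimal sketch under assumptions: encode_integer_chunked_sketch, the file name dssp_int.dat, the int8 dtype, and the -1 marker for codes missing from classes are all hypothetical choices, not documented behavior.

```python
import os
import tempfile

import numpy as np


def encode_integer_chunked_sketch(dssp_data, classes, chunk_size, cache_path):
    """Integer-encode DSSP codes chunk by chunk into a memory-mapped file.

    Hypothetical sketch, not mdxplain's code: the file name, the int8 dtype
    and the -1 marker for unknown codes are all assumptions.
    """
    n_frames, n_residues = dssp_data.shape
    os.makedirs(cache_path, exist_ok=True)
    out = np.memmap(os.path.join(cache_path, "dssp_int.dat"), dtype=np.int8,
                    mode="w+", shape=(n_frames, n_residues))
    for start in range(0, n_frames, chunk_size):
        chunk = np.asarray(dssp_data[start:start + chunk_size])
        encoded = np.full(chunk.shape, -1, dtype=np.int8)  # -1 = code not in classes
        for idx, label in enumerate(classes):
            encoded[chunk == label] = idx  # one boolean mask per class
        out[start:start + chunk.shape[0]] = encoded
    out.flush()
    return out


if __name__ == "__main__":
    codes = np.array([["H", "E"], ["C", "C"], ["E", "H"], ["H", "H"], ["C", "E"]])
    result = encode_integer_chunked_sketch(codes, ["H", "E", "C"], 2, tempfile.mkdtemp())
    print(result.tolist())  # [[0, 1], [2, 2], [1, 0], [0, 0], [2, 1]]
```

Looping over classes is cheap here because DSSP alphabets contain only a handful of codes; the per-chunk work stays vectorized.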
- static encode_onehot_chunked(dssp_data: ndarray, classes: list, chunk_size: int, cache_path: str) → ndarray
Encode DSSP to one-hot format, choosing between chunk-wise and direct vectorized processing.
Parameters
- dssp_data : numpy.ndarray
Input DSSP data with shape (n_frames, n_residues)
- classes : list
List of class labels for one-hot encoding
- chunk_size : int
Number of frames to process per chunk
- cache_path : str
Directory path for cache files
- use_memmap : bool, default=True
Whether to use memory mapping for the output array
Returns
- numpy.ndarray
One-hot encoded DSSP data with shape (n_frames, n_residues * n_classes)
Notes
Uses chunk-wise processing for large datasets or when use_memmap=True. For small datasets with use_memmap=False, uses direct vectorized operations.
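The dispatch described in these notes can be sketched as below. This is a hedged illustration, not the library's implementation: the function name encode_onehot_chunked_sketch, the file name dssp_onehot.dat, the uint8 dtype, the residue-major column order, and the rule that "small" means n_frames <= chunk_size are all assumptions.

```python
import os
import tempfile

import numpy as np


def encode_onehot_chunked_sketch(dssp_data, classes, chunk_size, cache_path,
                                 use_memmap=True):
    """One-hot encode DSSP codes to shape (n_frames, n_residues * n_classes).

    Hypothetical sketch, not mdxplain's code. Assumptions: "small" means
    n_frames <= chunk_size, columns are grouped per residue
    (res0_cls0, res0_cls1, ..., res1_cls0, ...), and the dtype is uint8.
    """
    n_frames, n_residues = dssp_data.shape
    classes_arr = np.asarray(classes)
    shape = (n_frames, n_residues * len(classes_arr))

    if not use_memmap and n_frames <= chunk_size:
        # Small input, no memmap requested: one broadcast comparison.
        onehot = dssp_data[:, :, None] == classes_arr[None, None, :]
        return onehot.reshape(shape).astype(np.uint8)

    if use_memmap:
        os.makedirs(cache_path, exist_ok=True)
        out = np.memmap(os.path.join(cache_path, "dssp_onehot.dat"),
                        dtype=np.uint8, mode="w+", shape=shape)
    else:
        out = np.empty(shape, dtype=np.uint8)

    for start in range(0, n_frames, chunk_size):
        chunk = np.asarray(dssp_data[start:start + chunk_size])
        # (chunk_frames, n_residues, n_classes) boolean block, flattened per frame.
        block = chunk[:, :, None] == classes_arr[None, None, :]
        out[start:start + chunk.shape[0]] = block.reshape(chunk.shape[0], -1)

    if use_memmap:
        out.flush()
    return out


if __name__ == "__main__":
    codes = np.array([["H", "C"], ["E", "H"], ["C", "E"]])
    print(encode_onehot_chunked_sketch(codes, ["H", "E", "C"], 2,
                                       tempfile.mkdtemp()).tolist())
```

The one-hot output is n_classes times wider than the input, which is why the chunked, disk-backed path matters most for this format.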
- static encode_onehot_direct(dssp_data: ndarray, classes: list) → ndarray
Encode DSSP data from mdtraj output directly to one-hot format, without chunk-wise processing.
Parameters
- dssp_data : numpy.ndarray
Input DSSP data with shape (n_frames, n_residues)
- classes : list
List of class labels
Returns
- numpy.ndarray
One-hot encoded DSSP data with shape (n_frames, n_residues * n_classes)
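A direct in-memory encoding can map each distinct code to a class index with np.unique and then index rows of an identity matrix. This is a minimal sketch, not mdxplain's method: the name encode_onehot_direct_sketch, the uint8 dtype, the residue-major column order, and raising KeyError for unknown codes are assumptions.

```python
import numpy as np


def encode_onehot_direct_sketch(dssp_data, classes):
    """In-memory one-hot encoding via np.unique and identity-matrix indexing.

    Hypothetical sketch, not mdxplain's code; the per-residue column order
    and uint8 dtype are assumptions. Unknown codes raise KeyError.
    """
    lookup = {label: i for i, label in enumerate(classes)}
    # Only the distinct codes (a handful) go through the Python dict.
    uniq, inverse = np.unique(dssp_data, return_inverse=True)
    class_of_uniq = np.array([lookup[u] for u in uniq], dtype=np.intp)
    # reshape handles NumPy versions that return `inverse` flattened.
    idx = class_of_uniq[inverse].reshape(dssp_data.shape)
    # Row k of the identity matrix is the one-hot vector for class k.
    onehot = np.eye(len(classes), dtype=np.uint8)[idx]
    return onehot.reshape(dssp_data.shape[0], -1)


if __name__ == "__main__":
    codes = np.array([["H", "C"], ["E", "H"], ["C", "E"]])
    print(encode_onehot_direct_sketch(codes, ["H", "E", "C"]).tolist())
    # [[1, 0, 0, 0, 0, 1], [0, 1, 0, 1, 0, 0], [0, 0, 1, 0, 1, 0]]
```

Each residue contributes a block of n_classes columns containing exactly one 1, so the output width is n_residues * n_classes as stated in the return description.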