Decomposition Types Base

Abstract base class defining the interface for all decomposition types.

Defines the interface that all decomposition types (PCA, KernelPCA, etc.) must implement for consistency across different dimensionality reduction methods.

class mdxplain.decomposition.decomposition_type.interfaces.decomposition_type_base.DecompositionTypeBase

Abstract base class for all decomposition types.

Defines the interface that all decomposition types (PCA, KernelPCA, etc.) must implement. Each decomposition type encapsulates computation logic for a specific type of dimensionality reduction analysis.

Examples

>>> class MyDecomposition(DecompositionTypeBase):
...     @classmethod
...     def get_type_name(cls) -> str:
...         return 'my_decomposition'
...     def init_calculator(self, **kwargs):
...         self.calculator = MyCalculator(**kwargs)
...     def compute(self, data, **kwargs):
...         return self.calculator.compute(data, **kwargs)
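The example above depends on mdxplain and on a MyCalculator class that is not defined here. The following is a minimal, self-contained sketch of the same contract. DecompositionTypeBaseSketch, MeanCenterCalculator, and MeanCenter are hypothetical stand-ins written for illustration, and representing the "empty calculator" as None is an assumption; none of this is the mdxplain implementation.

```python
from abc import ABC, abstractmethod

import numpy as np


class DecompositionTypeBaseSketch(ABC):
    """Hypothetical stand-in that mirrors the documented interface."""

    def __init__(self) -> None:
        # Assumption: the "empty calculator" is represented by None.
        self.calculator = None

    @classmethod
    @abstractmethod
    def get_type_name(cls) -> str:
        """Return the unique string identifier of the type."""

    @abstractmethod
    def init_calculator(self, use_memmap=False, cache_path="./cache", chunk_size=2000):
        """Create the calculator and store it on self.calculator."""

    @abstractmethod
    def compute(self, data):
        """Return (transformed_data, metadata)."""

    def get_required_feature_type(self):
        # Default: no specific feature type is required.
        return None


class MeanCenterCalculator:
    """Toy calculator: subtracts the per-feature mean."""

    def compute(self, data):
        centered = data - data.mean(axis=0)
        return centered, {"method": "mean_center"}


class MeanCenter(DecompositionTypeBaseSketch):
    @classmethod
    def get_type_name(cls) -> str:
        return "mean_center"

    def init_calculator(self, use_memmap=False, cache_path="./cache", chunk_size=2000):
        # The toy calculator ignores the memmap and chunking options.
        self.calculator = MeanCenterCalculator()

    def compute(self, data):
        if self.calculator is None:
            raise ValueError("Calculator not initialized; call init_calculator() first.")
        return self.calculator.compute(data)
```

Because the base class declares abstract methods, instantiating it directly raises TypeError; only subclasses that implement get_type_name(), init_calculator(), and compute() can be created.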

__init__() → None

Initialize the decomposition type.

Sets up the decomposition type instance with an empty calculator that will be initialized later through init_calculator().

Parameters

None

Returns

None

Examples

>>> # Create decomposition type instance
>>> decomp = MyDecomposition()
>>> print(f"Type: {decomp.get_type_name()}")

abstractmethod classmethod get_type_name() → str

Return unique string identifier for this decomposition type.

Used as the key for storing decomposition results in TrajectoryData dictionaries and for type identification.

Parameters

cls (type): The decomposition type class

Returns

str: Unique string identifier (e.g., ‘pca’, ‘kernel_pca’)

Examples

>>> print(PCA.get_type_name())
pca
>>> print(KernelPCA.get_type_name())
kernel_pca
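Since the type name serves as a dictionary key, each decomposition type needs a distinct name. A minimal sketch of a name-keyed lookup follows; PCASketch, KernelPCASketch, and build_registry are hypothetical names used for illustration and are not part of mdxplain.

```python
class PCASketch:
    @classmethod
    def get_type_name(cls) -> str:
        return "pca"


class KernelPCASketch:
    @classmethod
    def get_type_name(cls) -> str:
        return "kernel_pca"


def build_registry(types):
    """Map each type name to its class, rejecting duplicate names."""
    registry = {}
    for decomposition_type in types:
        name = decomposition_type.get_type_name()
        if name in registry:
            raise ValueError(f"Duplicate decomposition type name: {name!r}")
        registry[name] = decomposition_type
    return registry


registry = build_registry([PCASketch, KernelPCASketch])

# Results can be stored and looked up under the same key.
results = {PCASketch.get_type_name(): "placeholder for PCA results"}
```

Calling get_type_name() on the class rather than on an instance means the key is available without constructing the decomposition type first.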

abstractmethod init_calculator(use_memmap: bool = False, cache_path: str = './cache', chunk_size: int = 2000) → None

Initialize the calculator instance for this decomposition type.

Parameters

use_memmap (bool, default=False): Whether to use memory mapping for efficient handling of large datasets
cache_path (str, default='./cache'): Directory path for cache files
chunk_size (int, default=2000): Number of samples to process per chunk for incremental computation

Returns

None: Sets self.calculator to initialized calculator instance

Examples

>>> # Basic initialization
>>> pca = PCA()
>>> pca.init_calculator()

>>> # With memory mapping for large datasets
>>> pca.init_calculator(
...     use_memmap=True,
...     cache_path='./cache/',
...     chunk_size=1000
... )
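To illustrate what memory mapping and chunked processing mean in general, here is a sketch that uses numpy.memmap and a fixed chunk size to compute per-feature means without loading the full array at once. It is an assumption-laden illustration of the idea behind use_memmap, cache_path, and chunk_size, not the calculator that mdxplain creates; chunked_column_means and the file name data.dat are hypothetical.

```python
import os
import tempfile

import numpy as np


def chunked_column_means(data, chunk_size=2000):
    """Compute per-feature means by reading chunk_size rows at a time."""
    n_samples, n_features = data.shape
    total = np.zeros(n_features, dtype=np.float64)
    for start in range(0, n_samples, chunk_size):
        # Only this slice is materialized in memory.
        chunk = np.asarray(data[start:start + chunk_size], dtype=np.float64)
        total += chunk.sum(axis=0)
    return total / n_samples


with tempfile.TemporaryDirectory() as cache_path:
    path = os.path.join(cache_path, "data.dat")
    # Disk-backed array standing in for a large dataset.
    mm = np.memmap(path, dtype=np.float64, mode="w+", shape=(5000, 8))
    mm[:] = np.random.default_rng(0).random((5000, 8))
    mm.flush()

    means = chunked_column_means(mm, chunk_size=1000)
    expected = np.asarray(mm).mean(axis=0)
    del mm  # release the mapping before the directory is removed
```

A smaller chunk size lowers peak memory use at the cost of more read operations.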

abstractmethod compute(data: ndarray) → Tuple[ndarray, Dict[str, Any]]

Compute decomposition using the initialized calculator.

Parameters

data (numpy.ndarray): Input data matrix to decompose, shape (n_samples, n_features)

Returns

Tuple[numpy.ndarray, Dict[str, Any]]: Tuple containing:

transformed_data: Decomposed data matrix (n_samples, n_components)
metadata: Dictionary with transformation information including hyperparameters, explained variance, components, etc.

Examples

>>> # Compute PCA decomposition
>>> import numpy as np
>>> pca = PCA()
>>> pca.init_calculator()
>>> data = np.random.rand(100, 50)
>>> transformed, metadata = pca.compute(data, n_components=10)
>>> print(f"Transformed shape: {transformed.shape}")

Raises

ValueError: If calculator is not initialized or input data is invalid
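The return contract described above can be sketched with a plain NumPy PCA based on the singular value decomposition. The function pca_compute_sketch and the metadata keys shown are assumptions chosen for illustration; the actual keys and validation rules used by mdxplain may differ.

```python
import numpy as np


def pca_compute_sketch(data, n_components=2):
    """Return (transformed_data, metadata) for a simple SVD-based PCA."""
    data = np.asarray(data, dtype=np.float64)
    if data.ndim != 2:
        raise ValueError("data must be 2-D with shape (n_samples, n_features)")
    n_samples, n_features = data.shape
    if n_samples < 2:
        raise ValueError("at least two samples are required")
    if not 1 <= n_components <= min(n_samples, n_features):
        raise ValueError("n_components must be in [1, min(n_samples, n_features)]")

    # Center the data, then decompose it.
    centered = data - data.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)

    components = vt[:n_components]            # (n_components, n_features)
    transformed = centered @ components.T     # (n_samples, n_components)
    variance = singular_values ** 2 / (n_samples - 1)

    metadata = {
        "hyperparameters": {"n_components": n_components},
        "components": components,
        "explained_variance": variance[:n_components],
        "explained_variance_ratio": (variance / variance.sum())[:n_components],
    }
    return transformed, metadata


rng = np.random.default_rng(0)
transformed, metadata = pca_compute_sketch(rng.random((100, 50)), n_components=10)
```

Returning the metadata alongside the transformed matrix keeps the hyperparameters and fitted quantities together with the result they describe.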

get_required_feature_type() → str | None

Return required feature type for this decomposition method.

Some decomposition methods require specific feature types (e.g., DiffusionMaps requires ‘coordinates’). If a specific feature type is required, the DecompositionManager will validate that the FeatureSelector contains only features of this type.

Parameters

None

Returns

Optional[str]: Required feature type name, or None if no specific type required

Examples

>>> # Most decompositions work with any feature type
>>> pca = PCA()
>>> print(pca.get_required_feature_type())
None

>>> # DiffusionMaps requires coordinate features
>>> diffmaps = DiffusionMaps()
>>> print(diffmaps.get_required_feature_type())
coordinates
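The validation step described for this method can be sketched as a simple check of the selected feature types against the required one. The function validate_feature_types and the feature type name 'distances' are hypothetical; how DecompositionManager and FeatureSelector actually perform this check is not shown in this document.

```python
def validate_feature_types(required_type, selected_feature_types):
    """Raise ValueError if any selected feature type differs from required_type.

    required_type is the value returned by get_required_feature_type();
    None means any feature type is accepted.
    """
    if required_type is None:
        return
    mismatched = sorted(set(selected_feature_types) - {required_type})
    if mismatched:
        raise ValueError(
            f"Decomposition requires only {required_type!r} features, "
            f"but the selection also contains: {mismatched}"
        )


# No requirement: any mix of feature types passes.
validate_feature_types(None, ["coordinates", "distances"])

# Requirement met: only coordinate features are selected.
validate_feature_types("coordinates", ["coordinates"])
```

A selection that mixes in another feature type, such as ["coordinates", "distances"] with a required type of 'coordinates', would raise ValueError in this sketch.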