PCA Calculator
PCA calculator for dimensionality reduction of molecular dynamics data.
Implements PCA computation with support for incremental processing for large datasets using sklearn’s PCA and IncrementalPCA.
- class mdxplain.decomposition.decomposition_type.pca.pca_calculator.PCACalculator(use_memmap: bool = False, cache_path: str = './cache', chunk_size: int = 2000)
Calculator for Principal Component Analysis (PCA) decomposition.
Implements PCA computation with support for both standard and incremental processing for large datasets. Uses sklearn’s PCA for standard computation and sklearn’s IncrementalPCA for chunk-wise processing when memory mapping is enabled.
Examples
>>> import numpy as np
>>> from mdxplain.decomposition.decomposition_type.pca.pca_calculator import PCACalculator
>>> # Standard PCA computation
>>> calc = PCACalculator()
>>> data = np.random.rand(1000, 100)
>>> transformed, metadata = calc.compute(data, n_components=10)

>>> # Incremental PCA for large datasets
>>> calc = PCACalculator(use_memmap=True, chunk_size=200)
>>> large_data = np.random.rand(10000, 500)
>>> transformed, metadata = calc.compute(large_data, n_components=50)
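The split between standard and chunk-wise computation described above can be sketched directly with scikit-learn. This is a minimal illustration, not mdxplain's implementation: the helper `fit_transform_pca` and its `incremental` flag are hypothetical names, and the only library calls used are `PCA`, `IncrementalPCA`, and `sklearn.utils.gen_batches`.

```python
import numpy as np
from sklearn.decomposition import PCA, IncrementalPCA
from sklearn.utils import gen_batches


def fit_transform_pca(data, n_components, incremental=False, chunk_size=2000):
    """Hypothetical helper (not part of mdxplain): standard or chunk-wise PCA."""
    if not incremental:
        # Standard PCA sees the whole matrix at once.
        model = PCA(n_components=n_components)
        return model.fit_transform(data), model

    model = IncrementalPCA(n_components=n_components)
    n_samples = data.shape[0]
    # min_batch_size folds a short trailing chunk into the final batch,
    # because partial_fit needs at least n_components rows per batch.
    batches = list(gen_batches(n_samples, chunk_size, min_batch_size=n_components))
    # First pass: update the model one chunk at a time.
    for batch in batches:
        model.partial_fit(data[batch])
    # Second pass: project each chunk onto the fitted components.
    transformed = np.vstack([model.transform(data[batch]) for batch in batches])
    return transformed, model


rng = np.random.default_rng(0)
data = rng.random((1000, 20))
standard, pca_model = fit_transform_pca(data, n_components=5)
chunked, ipca_model = fit_transform_pca(
    data, n_components=5, incremental=True, chunk_size=300
)
```

Incremental PCA approximates the full decomposition, so the two projections are not expected to match exactly, and component signs may differ between the two models.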
- __init__(use_memmap: bool = False, cache_path: str = './cache', chunk_size: int = 2000) → None
Initialize PCA calculator.
Parameters
- use_memmap : bool, default=False
Whether to use memory mapping and incremental computation
- cache_path : str, optional
Path for memory-mapped cache files (not used for PCA)
- chunk_size : int, optional
Size of chunks for incremental PCA processing
Returns
- None
Initializes PCA calculator with specified configuration
Examples
>>> # Standard PCA
>>> calc = PCACalculator()

>>> # Incremental PCA for large datasets
>>> calc = PCACalculator(use_memmap=True, chunk_size=1000)
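To show why memory mapping and chunk size go together, here is a small sketch that fits scikit-learn's `IncrementalPCA` on a NumPy memmap one chunk at a time. It is an illustration under assumptions, not the calculator's internal code: the file name `features.dat`, the array sizes, and the loop structure are all made up for the example.

```python
import os
import tempfile

import numpy as np
from sklearn.decomposition import IncrementalPCA

n_samples, n_features, n_components, chunk_size = 2000, 30, 5, 500

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "features.dat")  # hypothetical file name

    # Write example data to disk so it can be read back lazily.
    writer = np.memmap(path, dtype="float64", mode="w+",
                       shape=(n_samples, n_features))
    writer[:] = np.random.default_rng(0).random((n_samples, n_features))
    writer.flush()
    del writer

    # Reopen read-only; only the rows sliced below are loaded into memory.
    reader = np.memmap(path, dtype="float64", mode="r",
                       shape=(n_samples, n_features))
    ipca = IncrementalPCA(n_components=n_components)
    # n_samples is a multiple of chunk_size here, so every chunk has
    # at least n_components rows, as partial_fit requires.
    for start in range(0, n_samples, chunk_size):
        ipca.partial_fit(reader[start:start + chunk_size])

    first_chunk = ipca.transform(reader[:chunk_size])
    del reader  # release the file before the temporary directory is removed
```

Smaller chunks lower peak memory use but increase the number of update steps; each chunk still needs at least `n_components` rows.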
- compute(data: ndarray, **kwargs) → Tuple[ndarray, Dict[str, Any]]
Compute PCA decomposition of input data.
Performs Principal Component Analysis on the input data matrix, using either standard PCA or incremental PCA based on the configuration settings.
Parameters
- data : numpy.ndarray
Input data matrix to decompose, shape (n_samples, n_features)
- **kwargs : dict
PCA parameters:
- n_components : int, optional
Number of components to keep (default: min(n_samples, n_features))
- random_state : int, optional
Random state for reproducible results
Returns
- Tuple[numpy.ndarray, Dict]
Tuple containing:
- transformed_data: PCA-transformed data, shape (n_samples, n_components)
- metadata: Dictionary with PCA information including components, explained variance ratio, and hyperparameters
Examples
>>> # Compute PCA with 10 components
>>> calc = PCACalculator()
>>> data = np.random.rand(500, 100)
>>> transformed, metadata = calc.compute(data, n_components=10)
>>> print(f"Explained variance: {metadata['explained_variance_ratio']}")
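A common follow-up to printing the explained variance ratio is to ask how many components are needed to reach a target share of the variance. The sketch below uses only NumPy; the helper `components_for_variance` is a hypothetical name, and the `ratio` values are made-up numbers standing in for `metadata['explained_variance_ratio']`.

```python
import numpy as np


def components_for_variance(ratio, threshold=0.9):
    """Smallest number of leading components whose cumulative explained
    variance ratio reaches `threshold`, or None if it is never reached.

    Hypothetical helper for illustration; not part of mdxplain.
    """
    cumulative = np.cumsum(ratio)
    reached = np.nonzero(cumulative >= threshold)[0]
    return int(reached[0]) + 1 if reached.size else None


# Illustrative values only, not the result of a real decomposition.
ratio = np.array([0.5, 0.3, 0.15, 0.05])
needed = components_for_variance(ratio, threshold=0.9)
not_reached = components_for_variance(ratio[:2], threshold=0.99)
```

With these illustrative numbers the cumulative sums are roughly 0.5, 0.8, 0.95, and 1.0, so three components are needed for a 0.9 threshold, and the first two alone never reach 0.99.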
Raises
- ValueError
If input data is invalid or n_components is too large
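The documentation does not spell out what counts as invalid input, so the following pre-check is only a guess at reasonable conditions: a 2D array, finite values, and `n_components` no larger than min(n_samples, n_features), which matches the documented default. The function `check_pca_input` is a hypothetical helper, not part of mdxplain.

```python
import numpy as np


def check_pca_input(data, n_components=None):
    """Validate data and n_components before running PCA.

    Hypothetical pre-check; the exact rules are assumptions.
    Returns the number of components that would be used.
    """
    data = np.asarray(data)
    if data.ndim != 2:
        raise ValueError(f"expected a 2D array, got {data.ndim}D")
    if not np.all(np.isfinite(data)):
        raise ValueError("data contains NaN or infinite values")

    limit = min(data.shape)  # min(n_samples, n_features)
    if n_components is None:
        return limit
    if not 1 <= n_components <= limit:
        raise ValueError(
            f"n_components={n_components} must be between 1 and {limit}"
        )
    return n_components


default_components = check_pca_input(np.zeros((5, 3)))

try:
    check_pca_input(np.zeros((5, 3)), n_components=4)
    too_large_rejected = False
except ValueError:
    too_large_rejected = True
```

For chunk-wise processing, scikit-learn's `IncrementalPCA.partial_fit` additionally requires each batch to contain at least `n_components` rows, so `chunk_size` is a further bound worth checking in that mode.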