Decomposition Type Helper

Helper classes for decomposition type calculations.

Automatic Parameter Helper

Automatic parameter calculation helper for decomposition methods.

class mdxplain.decomposition.decomposition_type.helper.automatic_parameter_helper.AutomaticParameterHelper

Helper for automatic parameter calculation in decomposition methods.

Provides methods for automatic calculation of n_components via elbow detection, gamma parameters for kernel methods, and variance computation for large memory-mapped datasets.

Examples

>>> import numpy as np
>>> from mdxplain.decomposition.decomposition_type.helper.automatic_parameter_helper import AutomaticParameterHelper
>>> # Elbow detection for PCA
>>> helper = AutomaticParameterHelper()
>>> variance_ratios = np.array([0.4, 0.3, 0.15, 0.1, 0.05])
>>> n = helper.find_elbow(variance_ratios, max_components=5)
>>> # Gamma calculation for KernelPCA
>>> data = np.random.rand(1000, 100)
>>> gamma = helper.calculate_gamma_scale(data)
>>> # Variance for large datasets (assumes data.dat already exists on disk)
>>> large_data = np.memmap('data.dat', dtype='float64', mode='r', shape=(10000, 500))
>>> var = helper.compute_variance_chunked(large_data, chunk_size=1000)

static find_elbow(values: ndarray, sensitivity: float = 1.0, max_components: int = None, offset: int | float = 0) -> int

Find the elbow point of a decreasing curve using the kneed library (Kneedle algorithm).

Parameters

values : numpy.ndarray

Decreasing values (variance ratios or eigenvalues)

sensitivity : float, default=1.0

Kneed sensitivity parameter (S); larger values make knee detection more conservative

max_components : int, optional

Maximum number of components, used for the warning check

offset : int or float, default=0

Adjustment to the elbow position:

  • int: Direct addition/subtraction of components (e.g., -2 selects 2 fewer components, +3 selects 3 more)

  • float: Percentage-based adjustment (e.g., -0.5 selects 50% fewer, 0.5 selects 50% more)
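
The two offset modes described above can be pictured with a minimal sketch. The name `apply_offset`, the rounding of fractional results, and the clamping to the range 1..max_components are assumptions made for illustration; the source does not state how find_elbow rounds or bounds the adjusted value.

```python
import numbers
from typing import Optional, Union


def apply_offset(
    elbow: int,
    offset: Union[int, float] = 0,
    max_components: Optional[int] = None,
) -> int:
    """Hypothetical helper: shift a 1-indexed elbow position by an offset."""
    if isinstance(offset, numbers.Integral):
        # Integer offset: add or subtract whole components
        adjusted = elbow + int(offset)
    else:
        # Float offset: relative change, e.g. -0.5 means 50% fewer
        # (rounding to the nearest integer is an assumption)
        adjusted = int(round(elbow * (1.0 + offset)))

    # Assumption: always keep at least one component
    adjusted = max(1, adjusted)
    # Assumption: never exceed max_components when it is given
    if max_components is not None:
        adjusted = min(adjusted, max_components)
    return adjusted
```

Under these assumptions, an elbow at 6 components becomes 4 with offset=-2 and 3 with offset=-0.5.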

Returns

int

Optimal number of components (1-indexed)

Examples

>>> import numpy as np
>>> helper = AutomaticParameterHelper()
>>> variances = np.array([0.4, 0.3, 0.15, 0.1, 0.05])
>>> n = helper.find_elbow(variances)
>>> print(f"Optimal: {n}")
>>> # Select 2 fewer components than elbow
>>> n = helper.find_elbow(variances, offset=-2)
>>> # Select 50% fewer components than elbow
>>> n = helper.find_elbow(variances, offset=-0.5)
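
For intuition about elbow detection, the sketch below picks the point of a decreasing curve that lies farthest below the straight line joining its first and last points. It is a simplified stand-in, not the kneed algorithm that find_elbow relies on: it has no sensitivity parameter and no smoothing, and `chord_elbow` is a hypothetical name.

```python
import numpy as np


def chord_elbow(values: np.ndarray) -> int:
    """Hypothetical helper: 1-indexed elbow via the largest gap below the chord."""
    y = np.asarray(values, dtype=float)
    n = y.size
    if n == 0:
        raise ValueError("values must not be empty")
    if n < 3:
        # Too few points to define an elbow; keep everything
        return n

    span = y.max() - y.min()
    if span == 0:
        # Flat curve: no elbow, fall back to a single component
        return 1

    # Normalize both axes to [0, 1] so the result does not depend on scale
    x_n = np.arange(n, dtype=float) / (n - 1)
    y_n = (y - y.min()) / span

    # Straight line (chord) from the first to the last point
    chord = y_n[0] + (y_n[-1] - y_n[0]) * x_n

    # Positive where the curve sags below the chord
    gap = chord - y_n
    return int(np.argmax(gap)) + 1


print(chord_elbow(np.array([0.4, 0.3, 0.15, 0.1, 0.05])))  # prints 3
```

The value printed here comes from the simplified rule only; find_elbow may return a different number for the same input.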

static calculate_gamma_scale(data: ndarray, use_memmap: bool = False, chunk_size: int = 1000) -> float

Calculate the gamma parameter using the "scale" method.

Parameters

data : numpy.ndarray

Input data matrix

use_memmap : bool, default=False

Whether to compute the variance chunk-wise (intended for memory-mapped data)

chunk_size : int, default=1000

Number of samples per chunk for the chunked variance calculation

Returns

float

Gamma parameter value

Examples

>>> import numpy as np
>>> helper = AutomaticParameterHelper()
>>> data = np.random.rand(1000, 50)
>>> gamma = helper.calculate_gamma_scale(data)
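
The method name suggests the "scale" heuristic known from scikit-learn, where gamma = 1 / (n_features * variance of the data). Whether calculate_gamma_scale uses exactly this formula, and whether it takes the variance over all entries or the mean of per-feature variances, is an assumption here. The sketch below follows the scikit-learn convention; `gamma_scale_sketch` is a hypothetical name.

```python
import numpy as np


def gamma_scale_sketch(data: np.ndarray) -> float:
    """Hypothetical helper: gamma = 1 / (n_features * variance)."""
    data = np.asarray(data, dtype=float)
    if data.ndim != 2:
        raise ValueError("data must be a 2D array (n_samples, n_features)")

    n_features = data.shape[1]
    # Variance over all entries, as scikit-learn does for gamma='scale'
    variance = float(data.var())

    if variance == 0.0:
        # Assumption: fall back to 1.0 for constant data (scikit-learn's choice)
        return 1.0
    return 1.0 / (n_features * variance)
```

For data with unit variance and 2 features this gives gamma = 0.5.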

static calculate_gamma_auto(n_features: int) -> float

Calculate the gamma parameter using the "auto" method.

Parameters

n_features : int

Number of features in the dataset

Returns

float

Gamma parameter value

Examples

>>> helper = AutomaticParameterHelper()
>>> gamma = helper.calculate_gamma_auto(100)
>>> print(gamma)
0.01
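
The documented output above (0.01 for 100 features) is consistent with gamma = 1 / n_features, which is also the "auto" heuristic in scikit-learn. A minimal sketch under that assumption follows; `gamma_auto_sketch` is a hypothetical name, and the input check is an addition for illustration.

```python
def gamma_auto_sketch(n_features: int) -> float:
    """Hypothetical helper: gamma = 1 / n_features."""
    if n_features <= 0:
        raise ValueError("n_features must be a positive integer")
    return 1.0 / n_features


print(gamma_auto_sketch(100))  # prints 0.01
```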

static compute_variance_chunked(data: ndarray, chunk_size: int = 1000) -> float

Compute mean variance using chunk-wise processing.

Parameters

data : numpy.ndarray

Input data matrix

chunk_size : int, default=1000

Number of samples to process per chunk

Returns

float

Mean variance across all features

Examples

>>> import numpy as np
>>> helper = AutomaticParameterHelper()
>>> data = np.random.rand(10000, 100)
>>> var = helper.compute_variance_chunked(data, chunk_size=1000)
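
One way to compute a mean variance without loading the whole array is to merge per-chunk means and squared deviations as the chunks are read. The sketch below does this with the pairwise update of Chan et al.; it is not necessarily how compute_variance_chunked is implemented. The name `mean_variance_chunked` is hypothetical, and the use of the population variance (ddof=0) is an assumption.

```python
import numpy as np


def mean_variance_chunked(data, chunk_size: int = 1000) -> float:
    """Hypothetical helper: mean of per-feature variances, read chunk by chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    n_samples, n_features = data.shape
    if n_samples == 0:
        raise ValueError("data must contain at least one sample")

    count = 0
    mean = np.zeros(n_features)  # running per-feature mean
    m2 = np.zeros(n_features)    # running per-feature sum of squared deviations

    for start in range(0, n_samples, chunk_size):
        # Only this slice is materialized in memory (works for np.memmap too)
        chunk = np.asarray(data[start:start + chunk_size], dtype=np.float64)
        n_b = chunk.shape[0]
        mean_b = chunk.mean(axis=0)
        m2_b = ((chunk - mean_b) ** 2).sum(axis=0)

        # Merge the chunk statistics into the running statistics
        delta = mean_b - mean
        total = count + n_b
        mean = mean + delta * (n_b / total)
        m2 = m2 + m2_b + delta ** 2 * (count * n_b / total)
        count = total

    # Population variance per feature, then averaged over features
    return float((m2 / count).mean())
```

On an in-memory array the result should agree with `data.var(axis=0).mean()` up to floating-point error, which gives a simple way to check the chunked version on a small sample.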