Calculator Base

Abstract base class for clustering calculators.

Defines the interface that all clustering calculators must implement for consistency across different clustering methods.

class mdxplain.clustering.cluster_type.interfaces.calculator_base.CalculatorBase(cache_path: str = './cache', max_memory_gb: float = 2.0, chunk_size: int = 1000, use_memmap: bool = False, max_blas_threads: int | None = 1, auto_limit_blas: bool = True)

Abstract base class for clustering calculators.

Defines the interface that all clustering calculators (DBSCAN, HDBSCAN, DPA) must implement for consistency across different clustering methods.

Examples

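>>> import numpy as np
>>> class MyCalculator(CalculatorBase):
...     def __init__(self, cache_path="./cache"):
...         super().__init__(cache_path)
...     def compute(self, data, center_method="centroid", **kwargs):
...         # Placeholder: put every sample in one cluster; replace with real logic
...         cluster_labels = np.zeros(len(data), dtype=int)
...         metadata = {"n_clusters": 1, "silhouette_score": None,
...                     "centers": data.mean(axis=0, keepdims=True)}
...         return cluster_labels, metadata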

__init__(cache_path: str = './cache', max_memory_gb: float = 2.0, chunk_size: int = 1000, use_memmap: bool = False, max_blas_threads: int | None = 1, auto_limit_blas: bool = True) → None

Initialize the clustering calculator.

Parameters

cache_path (str, optional): Path for cache files. Default is './cache'.
max_memory_gb (float, optional): Maximum memory threshold in GB for standard clustering methods. Default is 2.0.
chunk_size (int, optional): Chunk size for processing large datasets. Default is 1000.
use_memmap (bool, optional): Whether to use memory mapping for large datasets. Default is False.
max_blas_threads (int or None, default=1): Preferred BLAS/OpenMP thread limit. If None, a safe default is used instead. Thread limiting is disabled entirely when auto_limit_blas=False.
auto_limit_blas (bool, default=True): Apply a safe thread policy: limit BLAS to 1 thread when n_jobs != 1; otherwise use max_blas_threads (falling back to 2 when it is None).

Returns

None: Initializes calculator with specified configuration

Examples

>>> # Basic initialization
>>> calc = MyCalculator()

>>> # With custom parameters
>>> calc = MyCalculator(cache_path="./my_cache/", max_memory_gb=4.0, 
...                     chunk_size=5000, use_memmap=True)
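
The thread policy described for max_blas_threads and auto_limit_blas can be sketched as a small decision function. This is only an illustration of the documented rules, not the library's implementation: the helper name resolve_blas_threads is hypothetical, and passing n_jobs as an argument is an assumption, since the constructor signature does not show where n_jobs comes from.

```python
from typing import Optional


def resolve_blas_threads(
    n_jobs: int,
    max_blas_threads: Optional[int] = 1,
    auto_limit_blas: bool = True,
) -> Optional[int]:
    """Return the BLAS thread limit to apply, or None for no limit.

    Hypothetical helper that mirrors the documented policy only.
    """
    if not auto_limit_blas:
        # Thread limiting disabled: leave the BLAS library's own default.
        return None
    if n_jobs != 1:
        # Several workers: one BLAS thread each avoids oversubscription.
        return 1
    # Single worker: honor the preference, falling back to 2 when unset.
    return 2 if max_blas_threads is None else max_blas_threads


print(resolve_blas_threads(n_jobs=4))                         # -> 1
print(resolve_blas_threads(n_jobs=1, max_blas_threads=None))  # -> 2
print(resolve_blas_threads(n_jobs=1, max_blas_threads=8))     # -> 8
```

Restricting BLAS to one thread whenever several workers run in parallel keeps the total thread count close to the number of workers.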

abstractmethod compute(data: ndarray, center_method: str = 'centroid', **kwargs) → Tuple[ndarray, Dict[str, Any]]

Compute clustering of input data.

This method performs the actual clustering computation and returns the cluster labels along with metadata about the clustering process.

Parameters

data (numpy.ndarray)

Input data matrix to cluster, shape (n_samples, n_features)

center_method (str, optional)

Method for calculating cluster centers. Default is "centroid". Options:

"centroid": Representative member (medoid), the point closest to the cluster mean
"mean": Average of cluster members
"median": Coordinate-wise median (robust to outliers)
"density_peak": Point with highest local density
"median_centroid": Medoid relative to the median (more robust to outliers)
"rmsd_centroid": Centroid computed with the RMSD metric (for structural data)

Note: If the algorithm provides built-in centers (e.g. some sklearn models), those are always used regardless of this parameter.

kwargs (dict)

Additional parameters specific to the clustering method
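
Four of the center_method options listed above can be illustrated with NumPy for a single cluster. This is a sketch under stated assumptions, not the library's code: the function name cluster_center is hypothetical, Euclidean distance is assumed for picking the closest member, and "median_centroid" is read as the member closest to the coordinate-wise median. "density_peak" and "rmsd_centroid" are left out because the text does not specify how they are computed.

```python
import numpy as np


def cluster_center(points: np.ndarray, method: str = "centroid") -> np.ndarray:
    """Return one center for a cluster of shape (n_members, n_features)."""
    if method == "mean":
        return points.mean(axis=0)
    if method == "median":
        return np.median(points, axis=0)
    if method in ("centroid", "median_centroid"):
        # Medoid-style center: the actual member closest to a reference point.
        if method == "centroid":
            reference = points.mean(axis=0)
        else:
            reference = np.median(points, axis=0)
        # Assumption: Euclidean distance decides which member is closest.
        distances = np.linalg.norm(points - reference, axis=1)
        return points[np.argmin(distances)]
    raise ValueError(f"Unsupported center_method in this sketch: {method!r}")


# One cluster with an outlier at x = 20.
cluster = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [20.0, 0.0]])
print(cluster_center(cluster, "mean"))             # mean point, not a member
print(cluster_center(cluster, "centroid"))         # member nearest the mean
print(cluster_center(cluster, "median_centroid"))  # member nearest the median
```

With the outlier present, the mean is pulled to x = 5.2, "centroid" picks the member at x = 3, and "median_centroid" picks the member at x = 2. This shows why the median-based variants are described as more robust.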

Returns

Tuple[numpy.ndarray, Dict[str, Any]]

Tuple containing:

cluster_labels: Cluster label for each sample, shape (n_samples,)
metadata: Dictionary with clustering information such as hyperparameters, number of clusters, silhouette score, and cluster centers

Examples

>>> # Compute clustering
>>> import numpy as np
>>> calc = MyCalculator()
>>> data = np.random.rand(100, 50)
>>> labels, metadata = calc.compute(data, center_method="centroid", eps=0.5, min_samples=5)
>>> print(f"Number of clusters: {metadata['n_clusters']}")
>>> print(f"Silhouette score: {metadata['silhouette_score']}")
>>> print(f"Centers shape: {metadata['centers'].shape}")
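
As a rough illustration of how a metadata dictionary like the one accessed above might be assembled from labels, the sketch below computes the cluster count and mean centers. The function name summarize_clustering and the "n_noise" key are hypothetical. Treating -1 as a noise label is also an assumption, borrowed from scikit-learn's DBSCAN convention and not stated in this reference. The silhouette score is omitted to keep the example NumPy-only.

```python
import numpy as np


def summarize_clustering(data: np.ndarray, labels: np.ndarray) -> dict:
    """Build a small metadata dictionary from data and per-sample labels."""
    # Assumption: -1 marks noise points and is not counted as a cluster.
    cluster_ids = np.unique(labels[labels >= 0])
    # One mean center per cluster, stacked to (n_clusters, n_features).
    centers = np.array([data[labels == cid].mean(axis=0) for cid in cluster_ids])
    return {
        "n_clusters": int(cluster_ids.size),
        "n_noise": int(np.sum(labels == -1)),  # hypothetical extra key
        "centers": centers,
    }


data = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0], [50.0, 50.0]])
labels = np.array([0, 0, 1, 1, -1])
meta = summarize_clustering(data, labels)
print(f"Number of clusters: {meta['n_clusters']}")   # -> Number of clusters: 2
print(f"Centers shape: {meta['centers'].shape}")     # -> Centers shape: (2, 2)
```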