Calculator Base

Abstract base class for clustering calculators.

Defines the interface that all clustering calculators must implement for consistency across different clustering methods.

class mdxplain.clustering.cluster_type.interfaces.calculator_base.CalculatorBase(cache_path: str = './cache', max_memory_gb: float = 2.0, chunk_size: int = 1000, use_memmap: bool = False, max_blas_threads: int | None = 1, auto_limit_blas: bool = True)

Abstract base class for clustering calculators.

Defines the interface that all clustering calculators (DBSCAN, HDBSCAN, DPA) must implement for consistency across different clustering methods.

Examples

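>>> from mdxplain.clustering.cluster_type.interfaces.calculator_base import CalculatorBase
>>> class MyCalculator(CalculatorBase):
...     def __init__(self, cache_path="./cache", **kwargs):
...         # Forward the remaining options (max_memory_gb, chunk_size, ...)
...         super().__init__(cache_path, **kwargs)
...     def compute(self, data, center_method="centroid", **kwargs):
...         # Implement the clustering logic here; it must produce
...         # cluster_labels (n_samples,) and a metadata dictionary
...         return cluster_labels, metadata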
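The abstract-interface pattern described above can be illustrated without mdxplain installed. The following is a minimal sketch using a hypothetical stand-in class, `CalculatorBaseSketch`; it is not the library's implementation, and the attribute names it stores are assumptions made for illustration. It shows that a base class with an abstract `compute` cannot be instantiated, while a subclass that implements `compute` can.

```python
from abc import ABC, abstractmethod


class CalculatorBaseSketch(ABC):
    """Hypothetical stand-in for CalculatorBase (not the mdxplain class)."""

    def __init__(self, cache_path="./cache", max_memory_gb=2.0,
                 chunk_size=1000, use_memmap=False):
        # Attribute names are assumptions made for this sketch.
        self.cache_path = cache_path
        self.max_memory_gb = max_memory_gb
        self.chunk_size = chunk_size
        self.use_memmap = use_memmap

    @abstractmethod
    def compute(self, data, center_method="centroid", **kwargs):
        """Return a (cluster_labels, metadata) tuple."""


class SingleClusterCalculator(CalculatorBaseSketch):
    """Toy subclass that assigns every sample to cluster 0."""

    def compute(self, data, center_method="centroid", **kwargs):
        labels = [0] * len(data)
        metadata = {
            "n_clusters": 1 if labels else 0,
            "hyperparameters": dict(kwargs),
        }
        return labels, metadata


# The abstract base cannot be instantiated directly.
try:
    CalculatorBaseSketch()
    base_is_abstract = False
except TypeError:
    base_is_abstract = True

labels, metadata = SingleClusterCalculator().compute([[0.0, 1.0], [2.0, 3.0]])
```

Because the subclass does not override `__init__`, it inherits every constructor option from the base, which is one simple way to keep the configuration parameters consistent across calculators.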
__init__(cache_path: str = './cache', max_memory_gb: float = 2.0, chunk_size: int = 1000, use_memmap: bool = False, max_blas_threads: int | None = 1, auto_limit_blas: bool = True) → None

Initialize the clustering calculator.

Parameters

cache_path : str, optional

Path for cache files. Default is './cache'.

max_memory_gb : float, optional

Maximum memory threshold in GB for standard clustering methods. Default is 2.0.

chunk_size : int, optional

Chunk size for processing large datasets. Default is 1000.

use_memmap : bool, optional

Whether to use memory mapping for large datasets. Default is False.

max_blas_threads : int or None, default=1

Preferred BLAS/OpenMP thread limit. Pass None to fall back to a safe default, or set auto_limit_blas=False to disable thread limiting entirely.

auto_limit_blas : bool, default=True

Apply a safe thread policy: limit BLAS to 1 thread when n_jobs != 1; otherwise use max_blas_threads (falling back to 2 when it is None).

Returns

None

Initializes calculator with specified configuration

Examples

>>> # Basic initialization
>>> calc = MyCalculator()
>>> # With custom parameters
>>> calc = MyCalculator(cache_path="./my_cache/", max_memory_gb=4.0, 
...                     chunk_size=5000, use_memmap=True)
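The thread policy described for `max_blas_threads` and `auto_limit_blas` can be written out as a small decision function. This is a sketch of the documented rules only: `resolve_blas_threads` is a hypothetical helper, not part of mdxplain, and it assumes `n_jobs` is supplied by the concrete clustering method, since it is not a constructor argument.

```python
def resolve_blas_threads(n_jobs, max_blas_threads=1, auto_limit_blas=True):
    """Hypothetical helper mirroring the documented thread policy.

    Returns the BLAS thread limit to apply, or None when no limit is applied.
    """
    if not auto_limit_blas:
        # Thread limiting is disabled entirely.
        return None
    if n_jobs != 1:
        # Parallel jobs: keep BLAS single-threaded.
        return 1
    if max_blas_threads is None:
        # Documented fallback when no preference is given.
        return 2
    return max_blas_threads
```

For instance, `resolve_blas_threads(n_jobs=4, max_blas_threads=8)` yields 1, while `resolve_blas_threads(n_jobs=1, max_blas_threads=8)` yields 8. The first case is presumably meant to keep parallel workers from each starting their own pool of BLAS threads.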
abstractmethod compute(data: ndarray, center_method: str = 'centroid', **kwargs) → Tuple[ndarray, Dict[str, Any]]

Compute clustering of input data.

This method performs the actual clustering computation and returns the cluster labels along with metadata about the clustering process.

Parameters

data : numpy.ndarray

Input data matrix to cluster, shape (n_samples, n_features)

center_method : str, optional

Method for calculating cluster centers. Default is “centroid”:

  • “centroid”: Representative point (medoid - closest to mean)

  • “mean”: Average of cluster members

  • “median”: Coordinate-wise median (robust to outliers)

  • “density_peak”: Point with highest local density

  • “median_centroid”: Medoid from median (more robust)

  • “rmsd_centroid”: Centroid using RMSD metric (structural)

NOTE: If the algorithm provides built-in centers (e.g. some sklearn models), those are ALWAYS used, regardless of this parameter.

kwargs : dict

Additional parameters specific to the clustering method
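Four of the `center_method` options listed above can be illustrated for a single cluster. This is a minimal sketch, not the library's code: `cluster_center` is a hypothetical helper, Euclidean distance is an assumption, and plain Python lists stand in for `numpy.ndarray` to keep the example dependency-free. "median_centroid" is read here as the member closest to the coordinate-wise median; "density_peak" and "rmsd_centroid" are omitted because they need details the documentation does not give.

```python
import math
from statistics import fmean, median


def cluster_center(points, center_method="centroid"):
    """Illustrative center of one cluster given as a list of equal-length rows."""
    columns = list(zip(*points))
    mean_point = [fmean(col) for col in columns]
    median_point = [median(col) for col in columns]

    if center_method == "mean":
        return mean_point
    if center_method == "median":
        return median_point
    if center_method == "centroid":
        # Medoid: the actual cluster member closest to the mean.
        return list(min(points, key=lambda p: math.dist(p, mean_point)))
    if center_method == "median_centroid":
        # Actual cluster member closest to the coordinate-wise median.
        return list(min(points, key=lambda p: math.dist(p, median_point)))
    raise ValueError(f"center_method not covered by this sketch: {center_method!r}")


# One outlier at x = 14 pulls the mean, but not the median.
cluster = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [14.0, 0.0]]
mean_center = cluster_center(cluster, "mean")                # [4.0, 0.0]
median_center = cluster_center(cluster, "median")            # [2.0, 0.0]
medoid_center = cluster_center(cluster, "centroid")          # [3.0, 0.0]
robust_medoid = cluster_center(cluster, "median_centroid")   # [2.0, 0.0]
```

The two medoid variants always return an existing sample, which matters when the center must correspond to a real data point rather than an averaged one.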

Returns

Tuple[numpy.ndarray, Dict]

Tuple containing:

  • cluster_labels: Cluster labels for each sample (n_samples,)

  • metadata: Dictionary with clustering information including hyperparameters, number of clusters, silhouette score, cluster centers, etc.
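Part of such a metadata dictionary can be derived from the label array alone. The sketch below uses a hypothetical helper, `summarize_labels`. Only the `n_clusters` key appears in this documentation; `n_noise` and `cluster_sizes` are illustrative names, and treating `-1` as the noise label follows the scikit-learn DBSCAN convention, which is assumed rather than confirmed for this library.

```python
from collections import Counter


def summarize_labels(labels, noise_label=-1):
    """Hypothetical helper: derive simple metadata entries from cluster labels."""
    counts = Counter(int(label) for label in labels)
    # Noise points (assumed label -1) are counted separately, not as a cluster.
    n_noise = counts.pop(noise_label, 0)
    return {
        "n_clusters": len(counts),
        "n_noise": n_noise,
        "cluster_sizes": dict(sorted(counts.items())),
    }


summary = summarize_labels([0, 0, 1, -1, 1, 2])
```

Here `summary["n_clusters"]` is 3 and one sample is counted as noise.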

Examples

>>> # Compute clustering
>>> import numpy as np
>>> calc = MyCalculator()
>>> data = np.random.rand(100, 50)
>>> labels, metadata = calc.compute(data, center_method="centroid", eps=0.5, min_samples=5)
>>> print(f"Number of clusters: {metadata['n_clusters']}")
>>> print(f"Silhouette score: {metadata['silhouette_score']}")
>>> print(f"Centers shape: {metadata['centers'].shape}")