Cluster Type Base

Abstract base class defining the interface for all cluster types.

Defines the interface that all cluster types (DBSCAN, HDBSCAN, DPA, etc.) must implement for consistency across different clustering methods.

class mdxplain.clustering.cluster_type.interfaces.cluster_type_base.ClusterTypeBase

Abstract base class for all cluster types.

Defines the interface that all cluster types (DBSCAN, HDBSCAN, DPA, etc.) must implement. Each cluster type encapsulates computation logic for a specific type of clustering analysis.

Examples

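>>> from mdxplain.clustering.cluster_type.interfaces.cluster_type_base import ClusterTypeBase
>>> class MyCluster(ClusterTypeBase):
...     @classmethod
...     def get_type_name(cls) -> str:
...         return 'my_cluster'
...     def init_calculator(self, **kwargs):
...         # MyCalculator is a placeholder for a user-defined calculator class
...         self.calculator = MyCalculator(**kwargs)
...     def compute(self, data, **kwargs):
...         return self.calculator.compute(data, **kwargs)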
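
The contract described above can be tried out without mdxplain installed. The following is a minimal sketch, assuming a stand-in base class built with Python's abc module. SketchClusterBase, ThresholdCalculator, and ThresholdCluster are hypothetical names that only mirror the documented method signatures; the real ClusterTypeBase may differ in detail.

```python
from abc import ABC, abstractmethod


class SketchClusterBase(ABC):
    """Hypothetical stand-in that mirrors the documented interface."""

    def __init__(self) -> None:
        # The calculator stays empty until init_calculator() is called.
        self.calculator = None

    @classmethod
    @abstractmethod
    def get_type_name(cls) -> str: ...

    @abstractmethod
    def init_calculator(self, cache_path: str = "./cache") -> None: ...

    @abstractmethod
    def compute(self, data, center_method: str = "centroid"): ...


class ThresholdCalculator:
    """Toy calculator: label 0 if the first feature is below 0.5, else 1."""

    def compute(self, data):
        labels = [0 if row[0] < 0.5 else 1 for row in data]
        return labels, {"n_clusters": len(set(labels))}


class ThresholdCluster(SketchClusterBase):
    @classmethod
    def get_type_name(cls) -> str:
        return "threshold"

    def init_calculator(self, cache_path: str = "./cache") -> None:
        # This toy calculator does not cache anything, so cache_path is unused.
        self.calculator = ThresholdCalculator()

    def compute(self, data, center_method: str = "centroid"):
        # center_method is accepted for signature parity but ignored here.
        return self.calculator.compute(data)


cluster = ThresholdCluster()
cluster.init_calculator()
labels, metadata = cluster.compute([[0.1, 0.2], [0.9, 0.8], [0.3, 0.3]])
```

Because all three methods are declared abstract, Python refuses to instantiate the base class directly or any subclass that leaves one of them unimplemented.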

__init__() → None

Initialize the cluster type.

Sets up the cluster type instance with an empty calculator that will be initialized later through init_calculator().

Parameters

None

Returns

None

Examples

>>> # Create cluster type instance
>>> cluster = MyCluster()
>>> print(f"Type: {cluster.get_type_name()}")
Type: my_cluster

abstractmethod classmethod get_type_name() → str

Return unique string identifier for this cluster type.

Used as the key for storing clustering results in TrajectoryData dictionaries and for type identification.
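
To illustrate why the identifier must be unique, here is a minimal sketch in which a plain dict stands in for the results storage. DBSCANSketch, HDBSCANSketch, and store_result are hypothetical names, and the actual layout of the TrajectoryData dictionaries is not specified here.

```python
class DBSCANSketch:
    @classmethod
    def get_type_name(cls) -> str:
        return "dbscan"


class HDBSCANSketch:
    @classmethod
    def get_type_name(cls) -> str:
        return "hdbscan"


def store_result(results: dict, cluster_type, labels) -> None:
    """Store labels under the cluster type's name, refusing duplicate keys."""
    key = cluster_type.get_type_name()
    if key in results:
        raise KeyError(f"Result for cluster type {key!r} already stored")
    results[key] = {"labels": labels}


# A plain dict stands in for the results storage (assumption).
results = {}
store_result(results, DBSCANSketch, [0, 0, 1])
store_result(results, HDBSCANSketch, [0, 1, 1])
```

If two cluster types returned the same name, the second result would collide with the first, which is why this sketch raises instead of silently overwriting.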

Parameters

cls : type

The cluster type class

Returns

str: Unique string identifier (e.g., 'dbscan', 'hdbscan', 'dpa')

Examples

>>> DBSCAN.get_type_name()
'dbscan'
>>> HDBSCAN.get_type_name()
'hdbscan'

abstractmethod init_calculator(cache_path: str = './cache') → None

Initialize the calculator instance for this cluster type.

Parameters

cache_path : str, optional

Directory path for cache files

Returns

None: Sets self.calculator to an initialized calculator instance

Examples

>>> # Basic initialization
>>> dbscan = DBSCAN()
>>> dbscan.init_calculator()

>>> # With custom cache path
>>> dbscan.init_calculator(cache_path='./my_cache/')

abstractmethod compute(data: ndarray, center_method: str = 'centroid') → Tuple[ndarray, Dict[str, Any]]

Compute clustering using the initialized calculator.

Parameters

data : numpy.ndarray

Input data matrix to cluster, shape (n_samples, n_features)

center_method : str, optional

Method for calculating cluster centers, default="centroid":

"centroid": Representative point (medoid - closest to mean)
"mean": Average of cluster members
"median": Coordinate-wise median (robust to outliers)
"density_peak": Point with highest local density
"median_centroid": Medoid from median (more robust to outliers)
"rmsd_centroid": Centroid using RMSD metric (better for structural comparisons)

NOTE: If the algorithm provides built-in centers (e.g. some sklearn models), those are ALWAYS used, regardless of this parameter.
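
The difference between the center methods is easiest to see on a small cluster with one outlier. The sketch below is an illustration under stated assumptions, not mdxplain's implementation: it uses plain Python lists instead of ndarrays to stay dependency-free, assumes Euclidean distance for the medoid search, and covers only four of the options. "density_peak" and "rmsd_centroid" are left out because they depend on a density estimate and an RMSD metric that are not defined in this section. The helper names are hypothetical.

```python
import math
import statistics


def column_stat(points, stat):
    """Apply stat to each coordinate (column) of the points."""
    return [stat(column) for column in zip(*points)]


def closest_member(points, target):
    """Return the cluster member nearest to target (Euclidean distance)."""
    return min(points, key=lambda p: math.dist(p, target))


def cluster_center(points, center_method="centroid"):
    """Compute a cluster center for a subset of the documented methods."""
    if center_method == "mean":
        return column_stat(points, statistics.fmean)
    if center_method == "median":
        return column_stat(points, statistics.median)
    if center_method == "centroid":
        # Medoid: the actual member closest to the mean.
        return closest_member(points, column_stat(points, statistics.fmean))
    if center_method == "median_centroid":
        # Medoid: the actual member closest to the coordinate-wise median.
        return closest_member(points, column_stat(points, statistics.median))
    raise ValueError(f"center_method not covered by this sketch: {center_method!r}")


# Four members close together plus one outlier at x = 14.
members = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [14.0, 0.0]]

mean_center = cluster_center(members, "mean")                 # [4.0, 0.0]
median_center = cluster_center(members, "median")             # [2.0, 0.0]
centroid_center = cluster_center(members, "centroid")         # [3.0, 0.0]
median_medoid = cluster_center(members, "median_centroid")    # [2.0, 0.0]
```

The outlier drags the mean to x = 4.0, a point that is not a cluster member. "centroid" snaps back to the nearest real member, while the median-based variants are not shifted by the outlier at all.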

Returns

Tuple[numpy.ndarray, Dict]

Tuple containing:

cluster_labels: Cluster labels for each sample (n_samples,)
metadata: Dictionary with clustering information including hyperparameters, number of clusters, silhouette score, cluster centers, etc.
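
As a rough illustration of how the returned labels relate to summary metadata, the sketch below derives cluster counts from a label sequence. It assumes that noise points carry the label -1, which is the scikit-learn DBSCAN convention; whether mdxplain follows it is not stated here. summarize_labels and the n_noise and cluster_sizes keys are hypothetical.

```python
from collections import Counter


def summarize_labels(labels, noise_label=-1):
    """Count clusters and noise points in a sequence of cluster labels."""
    counts = Counter(labels)
    # Noise points do not form a cluster, so remove them before counting.
    n_noise = counts.pop(noise_label, 0)
    return {
        "n_clusters": len(counts),
        "n_noise": n_noise,
        "cluster_sizes": dict(sorted(counts.items())),
    }


summary = summarize_labels([0, 0, 1, -1, 1, 1, 2, -1])
```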

Examples

>>> # Compute DBSCAN clustering with default center method
>>> dbscan = DBSCAN(eps=0.5, min_samples=5)
>>> dbscan.init_calculator()
>>> import numpy as np
>>> data = np.random.rand(100, 50)
>>> labels, metadata = dbscan.compute(data)
>>> print(f"Number of clusters: {metadata['n_clusters']}")
>>> print(f"Centers shape: {metadata['centers'].shape}")

>>> # Compute with custom center method
>>> labels, metadata = dbscan.compute(data, center_method="mean")

Raises

ValueError: If calculator is not initialized or input data is invalid
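
A subclass could enforce these conditions with a small guard at the start of compute(). The following is a minimal sketch under assumptions: validate_compute_inputs is a hypothetical helper, and the exact checks mdxplain performs on the input data are not specified here.

```python
def validate_compute_inputs(calculator, data):
    """Raise ValueError for the two documented failure modes.

    Returns (n_samples, n_features) when the inputs look usable.
    """
    if calculator is None:
        raise ValueError("Calculator not initialized; call init_calculator() first.")
    rows = list(data)
    if not rows:
        raise ValueError("Input data is empty.")
    n_features = len(rows[0])
    # Every row must have the same, non-zero number of features.
    if n_features == 0 or any(len(row) != n_features for row in rows):
        raise ValueError("Input data must have shape (n_samples, n_features).")
    return len(rows), n_features


shape = validate_compute_inputs(object(), [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
```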