Cluster Type Base

Abstract base class defining the interface for all cluster types.

Defines the interface that all cluster types (DBSCAN, HDBSCAN, DPA, etc.) must implement for consistency across different clustering methods.

class mdxplain.clustering.cluster_type.interfaces.cluster_type_base.ClusterTypeBase

Abstract base class for all cluster types.

Defines the interface that all cluster types (DBSCAN, HDBSCAN, DPA, etc.) must implement. Each cluster type encapsulates computation logic for a specific type of clustering analysis.

Examples

>>> class MyCluster(ClusterTypeBase):
...     @classmethod
...     def get_type_name(cls) -> str:
...         return 'my_cluster'
...     def init_calculator(self, **kwargs):
...         self.calculator = MyCalculator(**kwargs)
...     def compute(self, data, **kwargs):
...         return self.calculator.compute(data, **kwargs)
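
The interface described above can be sketched with Python's standard abc module. This is a minimal illustration under stated assumptions, not the mdxplain source: the names SketchClusterTypeBase and IncompleteCluster are hypothetical, the method signatures are copied from this page, and data is annotated as Any only to keep the sketch free of third-party imports.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Tuple


class SketchClusterTypeBase(ABC):
    """Hypothetical stand-in for ClusterTypeBase, for illustration only."""

    def __init__(self) -> None:
        # The calculator starts empty and is set later by init_calculator().
        self.calculator = None

    @classmethod
    @abstractmethod
    def get_type_name(cls) -> str:
        """Return the unique string identifier of the cluster type."""

    @abstractmethod
    def init_calculator(self, cache_path: str = "./cache") -> None:
        """Create the calculator and store it on self.calculator."""

    @abstractmethod
    def compute(
        self, data: Any, center_method: str = "centroid"
    ) -> Tuple[Any, Dict[str, Any]]:
        """Return (cluster_labels, metadata) for the given data."""


class IncompleteCluster(SketchClusterTypeBase):
    # Only one of the three abstract methods is implemented.
    @classmethod
    def get_type_name(cls) -> str:
        return "incomplete"


# Python refuses to instantiate a subclass that leaves abstract methods open.
try:
    IncompleteCluster()
    instantiated = True
except TypeError:
    instantiated = False
```

Marking the three methods as abstract means a subclass that omits any of them fails at instantiation time rather than later, when compute() is first called.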
__init__() -> None

Initialize the cluster type.

Sets up the cluster type instance with an empty calculator that will be initialized later through init_calculator().

Parameters

None

Returns

None

Examples

>>> # Create a cluster type instance (MyCluster as defined in the class example)
>>> cluster = MyCluster()
>>> print(f"Type: {cluster.get_type_name()}")
Type: my_cluster
abstractmethod classmethod get_type_name() -> str

Return unique string identifier for this cluster type.

Used as the key for storing clustering results in TrajectoryData dictionaries and for type identification.

Parameters

cls : type

The cluster type class

Returns

str

Unique string identifier (e.g., ‘dbscan’, ‘hdbscan’, ‘dpa’)

Examples

>>> print(DBSCAN.get_type_name())
dbscan
>>> print(HDBSCAN.get_type_name())
hdbscan
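
Because the type name serves as a dictionary key, each cluster type needs a distinct value. The sketch below illustrates that idea only: SketchDBSCAN, SketchHDBSCAN and the results dictionary are hypothetical stand-ins, and the real layout of the TrajectoryData dictionaries is not shown on this page.

```python
from typing import Any, Dict


class SketchDBSCAN:
    """Hypothetical stand-in for a concrete cluster type."""

    @classmethod
    def get_type_name(cls) -> str:
        return "dbscan"


class SketchHDBSCAN:
    """Hypothetical stand-in for a second concrete cluster type."""

    @classmethod
    def get_type_name(cls) -> str:
        return "hdbscan"


# Assumed storage layout: one entry per cluster type, keyed by its type name.
results: Dict[str, Dict[str, Any]] = {}
for cluster_cls in (SketchDBSCAN, SketchHDBSCAN):
    results[cluster_cls.get_type_name()] = {"labels": [], "metadata": {}}
```

If two cluster types returned the same name, the second assignment would silently overwrite the first entry, which is why the identifier is required to be unique.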
abstractmethod init_calculator(cache_path: str = './cache') -> None

Initialize the calculator instance for this cluster type.

Parameters

cache_path : str, optional

Directory path for cache files, default='./cache'

Returns

None

Sets self.calculator to the initialized calculator instance.

Examples

>>> # Basic initialization
>>> dbscan = DBSCAN()
>>> dbscan.init_calculator()
>>> # With custom cache path
>>> dbscan.init_calculator(cache_path='./my_cache/')
abstractmethod compute(data: ndarray, center_method: str = 'centroid') -> Tuple[ndarray, Dict[str, Any]]

Compute clustering using the initialized calculator.

Parameters

data : numpy.ndarray

Input data matrix to cluster, shape (n_samples, n_features)

center_method : str, optional

Method for calculating cluster centers, default=“centroid”:

  • “centroid”: Representative point (medoid), the cluster member closest to the mean

  • “mean”: Average of cluster members

  • “median”: Coordinate-wise median (robust to outliers)

  • “density_peak”: Point with highest local density

  • “median_centroid”: Medoid relative to the median, i.e. the cluster member closest to the coordinate-wise median (more robust to outliers)

  • “rmsd_centroid”: Centroid using RMSD metric (better for structural comparisons)

NOTE: If the algorithm provides built-in centers (e.g. some sklearn models), those are ALWAYS used, regardless of this parameter.
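
The difference between several of these options can be sketched with NumPy for a single cluster. This is an illustration under stated assumptions, not the mdxplain implementation: the helper sketch_center is hypothetical, Euclidean distance is assumed for the medoid-style options, and “density_peak” and “rmsd_centroid” are left out because this page does not describe how they are computed.

```python
import numpy as np


def sketch_center(members: np.ndarray, method: str = "centroid") -> np.ndarray:
    """Return one center for a single cluster, members shape (n_members, n_features)."""
    if method == "mean":
        # Average of the cluster members; need not coincide with any member.
        return members.mean(axis=0)
    if method == "median":
        # Coordinate-wise median; less sensitive to outliers than the mean.
        return np.median(members, axis=0)
    if method in ("centroid", "median_centroid"):
        # Medoid-style options return an actual member: the one closest to
        # the mean ("centroid") or to the median ("median_centroid").
        if method == "centroid":
            target = members.mean(axis=0)
        else:
            target = np.median(members, axis=0)
        distances = np.linalg.norm(members - target, axis=1)
        return members[np.argmin(distances)]
    raise ValueError(f"Unsupported center_method in this sketch: {method!r}")


# Five points on a line, with one outlier at x = 14.
members = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [14.0, 0.0]])
mean_center = sketch_center(members, "mean")
median_center = sketch_center(members, "median")
centroid_center = sketch_center(members, "centroid")
median_centroid_center = sketch_center(members, "median_centroid")
```

In this toy cluster the outlier pulls the mean to x = 4, so “centroid” selects the member at x = 3, while the median-based options stay at x = 2.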

Returns

Tuple[numpy.ndarray, Dict[str, Any]]

Tuple containing:

  • cluster_labels: Cluster label for each sample, shape (n_samples,)

  • metadata: Dictionary with clustering information including hyperparameters, number of clusters, silhouette score, cluster centers, etc.

Examples

>>> import numpy as np
>>> # Compute DBSCAN clustering with the default center method
>>> dbscan = DBSCAN(eps=0.5, min_samples=5)
>>> dbscan.init_calculator()
>>> data = np.random.rand(100, 50)
>>> labels, metadata = dbscan.compute(data)
>>> print(f"Number of clusters: {metadata['n_clusters']}")
>>> print(f"Centers shape: {metadata['centers'].shape}")
>>> # Compute with a custom center method
>>> labels, metadata = dbscan.compute(data, center_method="mean")

Raises

ValueError

If the calculator is not initialized or the input data is invalid
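
A minimal sketch of how an implementation might enforce this contract follows. Everything in it is an assumption made for illustration: the class SketchCluster, its placeholder calculator, the empty-input check and the error messages are hypothetical, and a concrete mdxplain cluster type may validate its input differently.

```python
from typing import Any, Dict, List, Sequence, Tuple


class SketchCluster:
    """Hypothetical cluster type used only to illustrate the ValueError contract."""

    def __init__(self) -> None:
        self.calculator = None

    def init_calculator(self, cache_path: str = "./cache") -> None:
        # Placeholder calculator: assigns every sample to cluster 0.
        self.calculator = lambda data: [0] * len(data)

    def compute(
        self, data: Sequence[Sequence[float]], center_method: str = "centroid"
    ) -> Tuple[List[int], Dict[str, Any]]:
        if self.calculator is None:
            raise ValueError("Calculator not initialized; call init_calculator() first.")
        if len(data) == 0:
            raise ValueError("Input data is empty.")
        labels = self.calculator(data)
        metadata = {"n_clusters": len(set(labels)), "center_method": center_method}
        return labels, metadata


cluster = SketchCluster()

# Calling compute() before init_calculator() raises ValueError.
try:
    cluster.compute([[0.0, 1.0]])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# After initialization the same call succeeds.
cluster.init_calculator()
labels, metadata = cluster.compute([[0.0, 1.0], [1.0, 0.0]])
```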