Cluster Type Base
Abstract base class defining the interface for all cluster types.
Defines the interface that all cluster types (DBSCAN, HDBSCAN, DPA, etc.) must implement for consistency across different clustering methods.
- class mdxplain.clustering.cluster_type.interfaces.cluster_type_base.ClusterTypeBase
Abstract base class for all cluster types.
Defines the interface that all cluster types (DBSCAN, HDBSCAN, DPA, etc.) must implement. Each cluster type encapsulates computation logic for a specific type of clustering analysis.
Examples
>>> class MyCluster(ClusterTypeBase):
...     @classmethod
...     def get_type_name(cls) -> str:
...         return 'my_cluster'
...     def init_calculator(self, **kwargs):
...         self.calculator = MyCalculator(**kwargs)
...     def compute(self, data, **kwargs):
...         return self.calculator.compute(data, **kwargs)
- __init__() -> None
Initialize the cluster type.
Sets up the cluster type instance with an empty calculator that will be initialized later through init_calculator().
Parameters
None
Returns
None
Examples
>>> # Create cluster type instance
>>> cluster = MyCluster()
>>> print(f"Type: {cluster.get_type_name()}")
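The deferred-calculator lifecycle described above can be sketched without mdxplain installed. In the sketch below, `ClusterTypeSketch`, `EchoCalculator`, and `EchoCluster` are hypothetical stand-ins, and representing the "empty" calculator as `None` is an assumption rather than something this page confirms.

```python
from abc import ABC, abstractmethod


class ClusterTypeSketch(ABC):
    """Hypothetical stand-in for ClusterTypeBase (not the mdxplain class)."""

    def __init__(self) -> None:
        # Assumption: an "empty" calculator is represented as None.
        self.calculator = None

    @classmethod
    @abstractmethod
    def get_type_name(cls) -> str: ...

    @abstractmethod
    def init_calculator(self, cache_path: str = "./cache") -> None: ...

    @abstractmethod
    def compute(self, data, center_method: str = "centroid"): ...


class EchoCalculator:
    """Hypothetical calculator that assigns every sample to cluster 0."""

    def __init__(self, cache_path: str) -> None:
        self.cache_path = cache_path

    def compute(self, data, center_method: str = "centroid"):
        labels = [0 for _ in data]
        return labels, {"n_clusters": 1, "center_method": center_method}


class EchoCluster(ClusterTypeSketch):
    @classmethod
    def get_type_name(cls) -> str:
        return "echo"

    def init_calculator(self, cache_path: str = "./cache") -> None:
        self.calculator = EchoCalculator(cache_path)

    def compute(self, data, center_method: str = "centroid"):
        return self.calculator.compute(data, center_method=center_method)


cluster = EchoCluster()
calculator_before = cluster.calculator  # still None: nothing initialized yet
cluster.init_calculator()
labels, metadata = cluster.compute([[0.0, 1.0], [2.0, 3.0]])

# The base class cannot be instantiated because its methods are abstract.
try:
    ClusterTypeSketch()
    base_instantiable = True
except TypeError:
    base_instantiable = False
```

Keeping construction separate from `init_calculator()` lets an instance exist and report its type name before a cache path is known.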
- abstractmethod classmethod get_type_name() -> str
Return unique string identifier for this cluster type.
Used as the key for storing clustering results in TrajectoryData dictionaries and for type identification.
Parameters
- cls : type
The cluster type class
Returns
- str
Unique string identifier (e.g., 'dbscan', 'hdbscan', 'dpa')
Examples
>>> print(DBSCAN.get_type_name())
dbscan
>>> print(HDBSCAN.get_type_name())
hdbscan
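Because the type name doubles as a storage key, two cluster types returning the same name would overwrite each other's results. A minimal sketch of that keying, using a plain dictionary in place of the TrajectoryData storage (whose actual layout is not shown on this page); `DbscanSketch` and `HdbscanSketch` are hypothetical stand-ins:

```python
from typing import Any, Dict


class DbscanSketch:
    """Hypothetical cluster type; only the type name matters here."""

    @classmethod
    def get_type_name(cls) -> str:
        return "dbscan"


class HdbscanSketch:
    """Hypothetical cluster type; only the type name matters here."""

    @classmethod
    def get_type_name(cls) -> str:
        return "hdbscan"


# Plain dict standing in for the result storage (assumed layout).
results: Dict[str, Dict[str, Any]] = {}

for cluster_type in (DbscanSketch, HdbscanSketch):
    key = cluster_type.get_type_name()
    if key in results:
        # Identifiers must be unique, otherwise results would be overwritten.
        raise ValueError(f"Duplicate cluster type name: {key!r}")
    results[key] = {"labels": [], "metadata": {}}

stored_keys = sorted(results)
```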
- abstractmethod init_calculator(cache_path: str = './cache') -> None
Initialize the calculator instance for this cluster type.
Parameters
- cache_path : str, optional
Directory path for cache files
Returns
- None
Sets self.calculator to initialized calculator instance
Examples
>>> # Basic initialization
>>> dbscan = DBSCAN()
>>> dbscan.init_calculator()
>>> # With custom cache path
>>> dbscan.init_calculator(cache_path='./my_cache/')
- abstractmethod compute(data: ndarray, center_method: str = 'centroid') -> Tuple[ndarray, Dict[str, Any]]
Compute clustering using the initialized calculator.
Parameters
- data : numpy.ndarray
Input data matrix to cluster, shape (n_samples, n_features)
- center_method : str, optional
Method for calculating cluster centers, default="centroid":
"centroid": Representative point (medoid, the member closest to the mean)
"mean": Average of cluster members
"median": Coordinate-wise median (robust to outliers)
"density_peak": Point with highest local density
"median_centroid": Medoid relative to the median (more robust to outliers)
"rmsd_centroid": Centroid using the RMSD metric (better for structural comparisons)
NOTE: If the algorithm has built-in centers (e.g., some sklearn models), those are always used, regardless of this parameter.
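The difference between the simpler center methods is easiest to see on a small cluster containing one outlier. The following NumPy sketch is an illustration of the definitions listed above, not mdxplain's implementation; `cluster_center` is a hypothetical helper, Euclidean distance is an assumption, and "density_peak" and "rmsd_centroid" are left out because their exact definitions are not given here.

```python
import numpy as np


def cluster_center(members: np.ndarray, center_method: str = "centroid") -> np.ndarray:
    """Return one center for a (n_members, n_features) array (illustrative only)."""
    if center_method == "mean":
        return members.mean(axis=0)
    if center_method == "median":
        return np.median(members, axis=0)
    if center_method in ("centroid", "median_centroid"):
        # Medoid-style center: the actual member closest to a reference point.
        if center_method == "centroid":
            reference = members.mean(axis=0)
        else:
            reference = np.median(members, axis=0)
        # Assumption: Euclidean distance to the reference point.
        distances = np.linalg.norm(members - reference, axis=1)
        return members[np.argmin(distances)]
    raise ValueError(f"Unsupported center_method in this sketch: {center_method!r}")


# Four nearby points plus one outlier at (13, 13).
members = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [4.0, 4.0], [13.0, 13.0]])

mean_center = cluster_center(members, "mean")
median_center = cluster_center(members, "median")
centroid_center = cluster_center(members, "centroid")
median_centroid_center = cluster_center(members, "median_centroid")
```

With this data the outlier pulls the mean to (4, 4), so "centroid" selects the member at (4, 4), while "median" and "median_centroid" both land on (2, 2).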
Returns
- Tuple[numpy.ndarray, Dict]
Tuple containing:
cluster_labels: Cluster labels for each sample (n_samples,)
metadata: Dictionary with clustering information including hyperparameters, number of clusters, silhouette score, cluster centers, etc.
Examples
>>> import numpy as np
>>> # Compute DBSCAN clustering with default center method
>>> dbscan = DBSCAN(eps=0.5, min_samples=5)
>>> dbscan.init_calculator()
>>> data = np.random.rand(100, 50)
>>> labels, metadata = dbscan.compute(data)
>>> print(f"Number of clusters: {metadata['n_clusters']}")
>>> print(f"Centers shape: {metadata['centers'].shape}")
>>> # Compute with custom center method
>>> labels, metadata = dbscan.compute(data, center_method="mean")
Raises
- ValueError
If the calculator is not initialized or the input data is invalid
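A subclass might guard `compute()` along the following lines. This is a sketch only: `validate_compute_inputs` is a hypothetical helper, and the specific checks (2D array, at least one sample, finite values) are assumptions about what "invalid" means, since this page does not spell them out.

```python
import numpy as np


def validate_compute_inputs(calculator, data) -> None:
    """Raise ValueError for an uninitialized calculator or bad data (assumed checks)."""
    if calculator is None:
        raise ValueError("Calculator not initialized; call init_calculator() first.")
    if not isinstance(data, np.ndarray) or data.ndim != 2:
        raise ValueError("data must be a 2D numpy array of shape (n_samples, n_features).")
    if data.shape[0] == 0:
        raise ValueError("data must contain at least one sample.")
    if not np.all(np.isfinite(data)):
        raise ValueError("data must not contain NaN or infinite values.")


def raises_value_error(calculator, data) -> bool:
    """Report whether validation rejects the given calculator/data pair."""
    try:
        validate_compute_inputs(calculator, data)
    except ValueError:
        return True
    return False


good = np.random.default_rng(0).random((100, 50))

uninitialized_fails = raises_value_error(None, good)             # no calculator yet
wrong_shape_fails = raises_value_error(object(), good.ravel())   # 1D instead of 2D
valid_passes = not raises_value_error(object(), good)            # well-formed input
```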