HDBSCAN Add Service
Service for adding HDBSCAN clustering with flexible center method selection.
- class mdxplain.clustering.services.hdbscan_add_service.HDBSCANAddService(manager: ClusterManager, pipeline_data: PipelineData)
Service for adding HDBSCAN clustering.
Uses centroid (medoid) center calculation by default. The centroid is the actual data point closest to the cluster mean, ensuring the cluster center is a real conformational state from the trajectory.
For alternative center methods, use:
.with_mean_centers() - Arithmetic mean (may not be a real state)
.with_median_centers() - Feature-wise median (robust to outliers)
.with_density_peak_centers() - Highest-density point
.with_median_centroid_centers() - Medoid from median
.with_rmsd_centroid_centers() - RMSD-based centroid
Examples
>>> # Standard call with default centroid centers
>>> pipeline.clustering.add.hdbscan("features", min_cluster_size=10)
>>> # Explicit center method selection
>>> pipeline.clustering.add.hdbscan.with_median_centers("features", min_cluster_size=10)
>>> pipeline.clustering.add.hdbscan.with_density_peak_centers("features", min_cluster_size=10)
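The difference between the mean, median, and centroid (medoid) centers described above can be illustrated with a small standard-library sketch. The helper names `feature_wise` and `closest_point` are hypothetical and not part of mdxplain, and reading "medoid from median" as "the data point closest to the feature-wise median" is an assumption. Density-peak and RMSD-based centers are not covered here.

```python
import math
from statistics import mean, median


def feature_wise(points, reducer):
    """Apply a reducer (e.g. mean or median) to each feature column."""
    return [reducer(column) for column in zip(*points)]


def closest_point(points, target):
    """Return the actual data point with the smallest Euclidean distance to target."""
    return min(points, key=lambda p: math.dist(p, target))


# Toy cluster of five 2-D frames; the last one is an outlier.
frames = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (14.0, 0.0)]

mean_center = feature_wise(frames, mean)        # arithmetic mean, not a real frame
median_center = feature_wise(frames, median)    # feature-wise median
centroid = closest_point(frames, mean_center)   # real frame closest to the mean
median_centroid = closest_point(frames, median_center)  # assumed "medoid from median"
```

In this toy cluster the outlier pulls the mean to a position that matches no frame, while the centroid is always one of the input frames and the median stays with the bulk of the data.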
- __init__(manager: ClusterManager, pipeline_data: PipelineData) -> None
Initialize HDBSCAN service.
Parameters
- manager : ClusterManager
Cluster manager instance
- pipeline_data : PipelineData
Pipeline data container
Returns
None
- with_centroid_centers(selection_name: str, min_cluster_size: int = 5, min_samples: int | None = None, cluster_selection_epsilon: float = 0.0, cluster_selection_method: str = 'eom', method: str = 'standard', sample_fraction: float = 0.1, knn_neighbors: int = 5, n_jobs: int = -1, max_blas_threads: int | None = 1, auto_limit_blas: bool = True, use_decomposed: bool = True, cluster_name: str | None = None, data_selector_name: str | None = None, force: bool = False, override_cache: bool = False) -> None
Add HDBSCAN with centroid (medoid) centers.
Parameters
- selection_name : str
Name of the feature selection to cluster
- min_cluster_size : int, default=5
Minimum cluster size
- min_samples : int, optional
Minimum samples for a core point
- cluster_selection_epsilon : float, default=0.0
Cluster selection threshold
- cluster_selection_method : str, default="eom"
Cluster selection method
- method : str, default="standard"
Clustering method
- sample_fraction : float, default=0.1
Sampling fraction
- knn_neighbors : int, default=5
Number of k-NN neighbors
- n_jobs : int, default=-1
Number of parallel jobs for core distance computation. -1 uses all processors.
- max_blas_threads : int or None, default=1
Preferred BLAS/OpenMP thread limit. Pass None to fall back to a safe default, or set auto_limit_blas=False to disable thread limiting.
- auto_limit_blas : bool, default=True
Apply a safe thread policy: limit BLAS to 1 thread when n_jobs != 1; otherwise use max_blas_threads (falling back to 2 when it is None)
- use_decomposed : bool, default=True
Use decomposed data if available
- cluster_name : str, optional
Name for the clustering result
- data_selector_name : str, optional
Data selector to apply
- force : bool, default=False
Force recalculation
- override_cache : bool, default=False
Override cache settings
Returns
None
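The interaction between n_jobs, max_blas_threads, and auto_limit_blas described in the parameter list above can be sketched as a small decision function. This is one reading of the documented policy, not mdxplain's actual code. The function name `resolve_blas_threads` is hypothetical, and returning None to mean "no limit applied" is an assumption.

```python
from typing import Optional


def resolve_blas_threads(
    n_jobs: int,
    max_blas_threads: Optional[int] = 1,
    auto_limit_blas: bool = True,
) -> Optional[int]:
    """Return the BLAS/OpenMP thread cap to apply, or None for no cap.

    Hypothetical helper mirroring the documented policy.
    """
    if not auto_limit_blas:
        # Thread limiting is disabled altogether.
        return None
    if n_jobs != 1:
        # Parallel jobs are in use: keep BLAS single-threaded to avoid
        # oversubscribing the CPU.
        return 1
    # Single job: honor the preferred limit, falling back to 2 when unset.
    return 2 if max_blas_threads is None else max_blas_threads
```

With the defaults (n_jobs=-1, auto_limit_blas=True) this yields a cap of 1, which is consistent with the default max_blas_threads=1. Whether the library applies the cap through a tool such as threadpoolctl is not stated here.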
- with_mean_centers(selection_name: str, min_cluster_size: int = 5, min_samples: int | None = None, cluster_selection_epsilon: float = 0.0, cluster_selection_method: str = 'eom', method: str = 'standard', sample_fraction: float = 0.1, knn_neighbors: int = 5, n_jobs: int = -1, max_blas_threads: int | None = 1, auto_limit_blas: bool = True, use_decomposed: bool = True, cluster_name: str | None = None, data_selector_name: str | None = None, force: bool = False, override_cache: bool = False) -> None
Add HDBSCAN with mean centers.
Parameters
- selection_name : str
Name of the feature selection to cluster
- min_cluster_size : int, default=5
Minimum cluster size
- min_samples : int, optional
Minimum samples for a core point
- cluster_selection_epsilon : float, default=0.0
Cluster selection threshold
- cluster_selection_method : str, default="eom"
Cluster selection method
- method : str, default="standard"
Clustering method
- sample_fraction : float, default=0.1
Sampling fraction
- knn_neighbors : int, default=5
Number of k-NN neighbors
- n_jobs : int, default=-1
Number of parallel jobs for core distance computation. -1 uses all processors.
- max_blas_threads : int or None, default=1
Preferred BLAS/OpenMP thread limit. Pass None to fall back to a safe default, or set auto_limit_blas=False to disable thread limiting.
- auto_limit_blas : bool, default=True
Apply a safe thread policy: limit BLAS to 1 thread when n_jobs != 1; otherwise use max_blas_threads (falling back to 2 when it is None)
- use_decomposed : bool, default=True
Use decomposed data if available
- cluster_name : str, optional
Name for the clustering result
- data_selector_name : str, optional
Data selector to apply
- force : bool, default=False
Force recalculation
- override_cache : bool, default=False
Override cache settings
Returns
None
- with_median_centers(selection_name: str, min_cluster_size: int = 5, min_samples: int | None = None, cluster_selection_epsilon: float = 0.0, cluster_selection_method: str = 'eom', method: str = 'standard', sample_fraction: float = 0.1, knn_neighbors: int = 5, n_jobs: int = -1, max_blas_threads: int | None = 1, auto_limit_blas: bool = True, use_decomposed: bool = True, cluster_name: str | None = None, data_selector_name: str | None = None, force: bool = False, override_cache: bool = False) -> None
Add HDBSCAN with median centers.
Parameters
- selection_name : str
Name of the feature selection to cluster
- min_cluster_size : int, default=5
Minimum cluster size
- min_samples : int, optional
Minimum samples for a core point
- cluster_selection_epsilon : float, default=0.0
Cluster selection threshold
- cluster_selection_method : str, default="eom"
Cluster selection method
- method : str, default="standard"
Clustering method
- sample_fraction : float, default=0.1
Sampling fraction
- knn_neighbors : int, default=5
Number of k-NN neighbors
- n_jobs : int, default=-1
Number of parallel jobs for core distance computation. -1 uses all processors.
- max_blas_threads : int or None, default=1
Preferred BLAS/OpenMP thread limit. Pass None to fall back to a safe default, or set auto_limit_blas=False to disable thread limiting.
- auto_limit_blas : bool, default=True
Apply a safe thread policy: limit BLAS to 1 thread when n_jobs != 1; otherwise use max_blas_threads (falling back to 2 when it is None)
- use_decomposed : bool, default=True
Use decomposed data if available
- cluster_name : str, optional
Name for the clustering result
- data_selector_name : str, optional
Data selector to apply
- force : bool, default=False
Force recalculation
- override_cache : bool, default=False
Override cache settings
Returns
None
- with_density_peak_centers(selection_name: str, min_cluster_size: int = 5, min_samples: int | None = None, cluster_selection_epsilon: float = 0.0, cluster_selection_method: str = 'eom', method: str = 'standard', sample_fraction: float = 0.1, knn_neighbors: int = 5, n_jobs: int = -1, max_blas_threads: int | None = 1, auto_limit_blas: bool = True, use_decomposed: bool = True, cluster_name: str | None = None, data_selector_name: str | None = None, force: bool = False, override_cache: bool = False) -> None
Add HDBSCAN with density peak centers.
Parameters
- selection_name : str
Name of the feature selection to cluster
- min_cluster_size : int, default=5
Minimum cluster size
- min_samples : int, optional
Minimum samples for a core point
- cluster_selection_epsilon : float, default=0.0
Cluster selection threshold
- cluster_selection_method : str, default="eom"
Cluster selection method
- method : str, default="standard"
Clustering method
- sample_fraction : float, default=0.1
Sampling fraction
- knn_neighbors : int, default=5
Number of k-NN neighbors
- n_jobs : int, default=-1
Number of parallel jobs for core distance computation. -1 uses all processors.
- max_blas_threads : int or None, default=1
Preferred BLAS/OpenMP thread limit. Pass None to fall back to a safe default, or set auto_limit_blas=False to disable thread limiting.
- auto_limit_blas : bool, default=True
Apply a safe thread policy: limit BLAS to 1 thread when n_jobs != 1; otherwise use max_blas_threads (falling back to 2 when it is None)
- use_decomposed : bool, default=True
Use decomposed data if available
- cluster_name : str, optional
Name for the clustering result
- data_selector_name : str, optional
Data selector to apply
- force : bool, default=False
Force recalculation
- override_cache : bool, default=False
Override cache settings
Returns
None
- with_median_centroid_centers(selection_name: str, min_cluster_size: int = 5, min_samples: int | None = None, cluster_selection_epsilon: float = 0.0, cluster_selection_method: str = 'eom', method: str = 'standard', sample_fraction: float = 0.1, knn_neighbors: int = 5, n_jobs: int = -1, max_blas_threads: int | None = 1, auto_limit_blas: bool = True, use_decomposed: bool = True, cluster_name: str | None = None, data_selector_name: str | None = None, force: bool = False, override_cache: bool = False) -> None
Add HDBSCAN with median centroid centers.
Parameters
- selection_name : str
Name of the feature selection to cluster
- min_cluster_size : int, default=5
Minimum cluster size
- min_samples : int, optional
Minimum samples for a core point
- cluster_selection_epsilon : float, default=0.0
Cluster selection threshold
- cluster_selection_method : str, default="eom"
Cluster selection method
- method : str, default="standard"
Clustering method
- sample_fraction : float, default=0.1
Sampling fraction
- knn_neighbors : int, default=5
Number of k-NN neighbors
- n_jobs : int, default=-1
Number of parallel jobs for core distance computation. -1 uses all processors.
- max_blas_threads : int or None, default=1
Preferred BLAS/OpenMP thread limit. Pass None to fall back to a safe default, or set auto_limit_blas=False to disable thread limiting.
- auto_limit_blas : bool, default=True
Apply a safe thread policy: limit BLAS to 1 thread when n_jobs != 1; otherwise use max_blas_threads (falling back to 2 when it is None)
- use_decomposed : bool, default=True
Use decomposed data if available
- cluster_name : str, optional
Name for the clustering result
- data_selector_name : str, optional
Data selector to apply
- force : bool, default=False
Force recalculation
- override_cache : bool, default=False
Override cache settings
Returns
None
- with_rmsd_centroid_centers(selection_name: str, min_cluster_size: int = 5, min_samples: int | None = None, cluster_selection_epsilon: float = 0.0, cluster_selection_method: str = 'eom', method: str = 'standard', sample_fraction: float = 0.1, knn_neighbors: int = 5, n_jobs: int = -1, max_blas_threads: int | None = 1, auto_limit_blas: bool = True, use_decomposed: bool = True, cluster_name: str | None = None, data_selector_name: str | None = None, force: bool = False, override_cache: bool = False) -> None
Add HDBSCAN with RMSD centroid centers.
Parameters
- selection_name : str
Name of the feature selection to cluster
- min_cluster_size : int, default=5
Minimum cluster size
- min_samples : int, optional
Minimum samples for a core point
- cluster_selection_epsilon : float, default=0.0
Cluster selection threshold
- cluster_selection_method : str, default="eom"
Cluster selection method
- method : str, default="standard"
Clustering method
- sample_fraction : float, default=0.1
Sampling fraction
- knn_neighbors : int, default=5
Number of k-NN neighbors
- n_jobs : int, default=-1
Number of parallel jobs for core distance computation. -1 uses all processors.
- max_blas_threads : int or None, default=1
Preferred BLAS/OpenMP thread limit. Pass None to fall back to a safe default, or set auto_limit_blas=False to disable thread limiting.
- auto_limit_blas : bool, default=True
Apply a safe thread policy: limit BLAS to 1 thread when n_jobs != 1; otherwise use max_blas_threads (falling back to 2 when it is None)
- use_decomposed : bool, default=True
Use decomposed data if available
- cluster_name : str, optional
Name for the clustering result
- data_selector_name : str, optional
Data selector to apply
- force : bool, default=False
Force recalculation
- override_cache : bool, default=False
Override cache settings
Returns
None