DBSCAN Add Service


Service for adding DBSCAN clustering with flexible center method selection.

class mdxplain.clustering.services.dbscan_add_service.DBSCANAddService(manager: ClusterManager, pipeline_data: PipelineData)

Service for adding DBSCAN clustering.

Uses centroid (medoid) center calculation by default. The centroid is the actual data point closest to the cluster mean, ensuring the cluster center is a real conformational state from the trajectory.
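
The sketch below shows the centroid (medoid) rule described above. It assumes Euclidean distance in feature space and clusters given as plain Python lists of coordinate tuples. `centroid_index` is a hypothetical helper for illustration and is not part of the mdxplain API.

```python
import math


def centroid_index(points):
    """Hypothetical helper: index of the member closest to the cluster mean."""
    dim = len(points[0])
    # Feature-wise arithmetic mean of all cluster members
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    # The centroid (medoid) is the real data point nearest to that mean
    return min(range(len(points)), key=lambda i: math.dist(points[i], mean))


cluster = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.6, 0.4)]
print(centroid_index(cluster))  # 4, i.e. the member (0.6, 0.4)
```

The result is an index into the cluster members, so the chosen center is always one of the input frames.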

For alternative center methods, use:

  • .with_mean_centers() - Arithmetic mean (may not correspond to a real state)

  • .with_median_centers() - Feature-wise median (robust to outliers)

  • .with_density_peak_centers() - Highest density point

  • .with_median_centroid_centers() - Medoid from median (data point closest to the feature-wise median)

  • .with_rmsd_centroid_centers() - RMSD-based centroid

Examples

>>> # Standard call with default centroid centers
>>> pipeline.clustering.add.dbscan("features", eps=0.5, min_samples=5)
>>> # Explicit center method selection
>>> pipeline.clustering.add.dbscan.with_median_centers("features", eps=0.5)
>>> pipeline.clustering.add.dbscan.with_density_peak_centers("features", eps=0.5)
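
The eps and min_samples arguments in these examples carry their standard DBSCAN meaning. For reference, the sketch below is a textbook pure-Python DBSCAN with Euclidean distance, an O(n²) neighbor search, and noise labeled -1 (the scikit-learn convention). It is illustrative only: it is not mdxplain's implementation, and the `dbscan` function here is hypothetical.

```python
import math

NOISE = -1


def dbscan(points, eps=0.5, min_samples=5):
    """Hypothetical textbook DBSCAN: cluster ids 0, 1, ... or NOISE (-1)."""
    n = len(points)
    # Neighborhood of each point: all indices within eps (itself included)
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # Core points have at least min_samples points in their neighborhood
    is_core = [len(nb) >= min_samples for nb in neighbors]
    labels = [None] * n
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None or not is_core[i]:
            continue
        # Start a new cluster at an unassigned core point and expand it
        labels[i] = cluster_id
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] is None:
                labels[j] = cluster_id
                # Only core points propagate the cluster further
                if is_core[j]:
                    queue.extend(neighbors[j])
        cluster_id += 1
    # Points never reached from any core point are noise
    return [NOISE if lab is None else lab for lab in labels]


group_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
group_b = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (5.05, 5.05)]
outlier = [(10.0, 0.0)]
labels = dbscan(group_a + group_b + outlier, eps=0.5, min_samples=5)
print(labels)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1]
```

In this sketch a point counts itself toward min_samples, matching scikit-learn's definition; how mdxplain counts it is not stated in this reference.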

__init__(manager: ClusterManager, pipeline_data: PipelineData) -> None

Initialize DBSCAN service.

Parameters

manager : ClusterManager

Cluster manager instance

pipeline_data : PipelineData

Pipeline data container

Returns

None

with_centroid_centers(selection_name: str, eps: float = 0.5, min_samples: int = 5, method: str = 'standard', sample_fraction: float = 0.1, knn_neighbors: int = 5, n_jobs: int = -1, max_blas_threads: int | None = 1, auto_limit_blas: bool = True, use_decomposed: bool = True, cluster_name: str | None = None, data_selector_name: str | None = None, force: bool = False, override_cache: bool = False) -> None

Add DBSCAN with centroid (medoid) centers.

Parameters

selection_name : str

Name of the feature selection to cluster

eps : float, default=0.5

Maximum distance between two samples for one to be considered in the neighborhood of the other

min_samples : int, default=5

Minimum number of samples in a neighborhood for a point to be considered a core point

method : str, default="standard"

Clustering method: "standard" or "knn_sampling"

sample_fraction : float, default=0.1

Fraction of samples used by the "knn_sampling" method

knn_neighbors : int, default=5

Number of neighbors used for k-NN sampling

n_jobs : int, default=-1

Number of parallel jobs (-1 uses all processors)

max_blas_threads : int or None, default=1

Preferred BLAS/OpenMP thread limit. Set to None to fall back to a safe default; set auto_limit_blas=False to disable thread limiting entirely

auto_limit_blas : bool, default=True

If True, apply a safe thread policy: use 1 BLAS thread when n_jobs != 1; otherwise use max_blas_threads (falling back to 2 when it is None)

use_decomposed : bool, default=True

Use decomposed data if available

cluster_name : str, optional

Name for the clustering result

data_selector_name : str, optional

Name of the data selector to apply

force : bool, default=False

Force recalculation

override_cache : bool, default=False

Override cache settings

Returns

None
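
The interplay of n_jobs, max_blas_threads, and auto_limit_blas described in the parameter list above can be restated as a small decision function. `resolve_blas_threads` is hypothetical, and returning None to mean "no limit" is an assumption of this sketch.

```python
def resolve_blas_threads(n_jobs=-1, max_blas_threads=1, auto_limit_blas=True):
    """Hypothetical helper: BLAS/OpenMP thread cap, or None for no limit."""
    if not auto_limit_blas:
        # Thread limiting disabled entirely (None = no cap, an assumption)
        return None
    if n_jobs != 1:
        # Parallel jobs: keep BLAS single-threaded to avoid oversubscription
        return 1
    # Serial run: honor the preferred limit, falling back to 2 when None
    return 2 if max_blas_threads is None else max_blas_threads


print(resolve_blas_threads())                                # 1
print(resolve_blas_threads(n_jobs=1, max_blas_threads=None))  # 2
```

Limiting BLAS to one thread while several jobs run in parallel avoids oversubscribing CPU cores. Such a cap is commonly applied with threadpoolctl's `threadpool_limits` context manager; whether mdxplain uses that library is not stated here.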

with_mean_centers(selection_name: str, eps: float = 0.5, min_samples: int = 5, method: str = 'standard', sample_fraction: float = 0.1, knn_neighbors: int = 5, n_jobs: int = -1, max_blas_threads: int | None = 1, auto_limit_blas: bool = True, use_decomposed: bool = True, cluster_name: str | None = None, data_selector_name: str | None = None, force: bool = False, override_cache: bool = False) -> None

Add DBSCAN with mean centers.

Parameters

selection_name : str

Name of the feature selection to cluster

eps : float, default=0.5

Maximum distance between two samples for one to be considered in the neighborhood of the other

min_samples : int, default=5

Minimum number of samples in a neighborhood for a point to be considered a core point

method : str, default="standard"

Clustering method: "standard" or "knn_sampling"

sample_fraction : float, default=0.1

Fraction of samples used by the "knn_sampling" method

knn_neighbors : int, default=5

Number of neighbors used for k-NN sampling

n_jobs : int, default=-1

Number of parallel jobs (-1 uses all processors)

max_blas_threads : int or None, default=1

Preferred BLAS/OpenMP thread limit. Set to None to fall back to a safe default; set auto_limit_blas=False to disable thread limiting entirely

auto_limit_blas : bool, default=True

If True, apply a safe thread policy: use 1 BLAS thread when n_jobs != 1; otherwise use max_blas_threads (falling back to 2 when it is None)

use_decomposed : bool, default=True

Use decomposed data if available

cluster_name : str, optional

Name for the clustering result

data_selector_name : str, optional

Name of the data selector to apply

force : bool, default=False

Force recalculation

override_cache : bool, default=False

Override cache settings

Returns

None
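
To illustrate why a mean center need not be a real conformational state, the sketch below computes the feature-wise arithmetic mean of a cluster whose members sit at two separated points. `mean_center` is a hypothetical helper, and clusters are assumed to be lists of coordinate tuples.

```python
def mean_center(points):
    """Hypothetical helper: feature-wise arithmetic mean of the members."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


# Members split between two separated sub-states
cluster = [(0.0, 0.0), (0.0, 0.0), (2.0, 2.0), (2.0, 2.0)]
center = mean_center(cluster)
print(center)             # (1.0, 1.0)
print(center in cluster)  # False: no member actually sits at the mean
```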

with_median_centers(selection_name: str, eps: float = 0.5, min_samples: int = 5, method: str = 'standard', sample_fraction: float = 0.1, knn_neighbors: int = 5, n_jobs: int = -1, max_blas_threads: int | None = 1, auto_limit_blas: bool = True, use_decomposed: bool = True, cluster_name: str | None = None, data_selector_name: str | None = None, force: bool = False, override_cache: bool = False) -> None

Add DBSCAN with median centers.

Parameters

selection_name : str

Name of the feature selection to cluster

eps : float, default=0.5

Maximum distance between two samples for one to be considered in the neighborhood of the other

min_samples : int, default=5

Minimum number of samples in a neighborhood for a point to be considered a core point

method : str, default="standard"

Clustering method: "standard" or "knn_sampling"

sample_fraction : float, default=0.1

Fraction of samples used by the "knn_sampling" method

knn_neighbors : int, default=5

Number of neighbors used for k-NN sampling

n_jobs : int, default=-1

Number of parallel jobs (-1 uses all processors)

max_blas_threads : int or None, default=1

Preferred BLAS/OpenMP thread limit. Set to None to fall back to a safe default; set auto_limit_blas=False to disable thread limiting entirely

auto_limit_blas : bool, default=True

If True, apply a safe thread policy: use 1 BLAS thread when n_jobs != 1; otherwise use max_blas_threads (falling back to 2 when it is None)

use_decomposed : bool, default=True

Use decomposed data if available

cluster_name : str, optional

Name for the clustering result

data_selector_name : str, optional

Name of the data selector to apply

force : bool, default=False

Force recalculation

override_cache : bool, default=False

Override cache settings

Returns

None
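
A sketch of the feature-wise median center, assuming clusters are lists of coordinate tuples; `median_center` is hypothetical. The last member is an outlier that would pull an arithmetic mean to roughly (10.8, 10.8), while the median stays near (1, 1).

```python
import statistics


def median_center(points):
    """Hypothetical helper: median taken independently for each feature."""
    # zip(*points) yields one tuple per feature (column)
    return tuple(statistics.median(column) for column in zip(*points))


# The last member is an outlier far from the rest of the cluster
cluster = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0), (50.0, 50.0)]
print(median_center(cluster))  # (1.1, 1.0)
```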

with_density_peak_centers(selection_name: str, eps: float = 0.5, min_samples: int = 5, method: str = 'standard', sample_fraction: float = 0.1, knn_neighbors: int = 5, n_jobs: int = -1, max_blas_threads: int | None = 1, auto_limit_blas: bool = True, use_decomposed: bool = True, cluster_name: str | None = None, data_selector_name: str | None = None, force: bool = False, override_cache: bool = False) -> None

Add DBSCAN with density peak centers.

Parameters

selection_name : str

Name of the feature selection to cluster

eps : float, default=0.5

Maximum distance between two samples for one to be considered in the neighborhood of the other

min_samples : int, default=5

Minimum number of samples in a neighborhood for a point to be considered a core point

method : str, default="standard"

Clustering method: "standard" or "knn_sampling"

sample_fraction : float, default=0.1

Fraction of samples used by the "knn_sampling" method

knn_neighbors : int, default=5

Number of neighbors used for k-NN sampling

n_jobs : int, default=-1

Number of parallel jobs (-1 uses all processors)

max_blas_threads : int or None, default=1

Preferred BLAS/OpenMP thread limit. Set to None to fall back to a safe default; set auto_limit_blas=False to disable thread limiting entirely

auto_limit_blas : bool, default=True

If True, apply a safe thread policy: use 1 BLAS thread when n_jobs != 1; otherwise use max_blas_threads (falling back to 2 when it is None)

use_decomposed : bool, default=True

Use decomposed data if available

cluster_name : str, optional

Name for the clustering result

data_selector_name : str, optional

Name of the data selector to apply

force : bool, default=False

Force recalculation

override_cache : bool, default=False

Override cache settings

Returns

None
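
This reference does not define "density" for density peak centers. The sketch below assumes local density is the number of cluster members within a fixed radius of each point (for example, the DBSCAN eps) and picks the member with the highest count. Both this rule and the helper name `density_peak_index` are assumptions.

```python
import math


def density_peak_index(points, radius):
    """Hypothetical rule: member with the most neighbors within `radius`."""
    def local_density(i):
        # Count members (including the point itself) within the radius
        return sum(1 for q in points if math.dist(points[i], q) <= radius)

    return max(range(len(points)), key=local_density)


cluster = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.1, 0.1), (1.5, 0.0)]
print(density_peak_index(cluster, radius=0.15))  # 3, the member (0.1, 0.1)
```

Ties go to the first member with the maximal count, because `max` returns the first maximum it encounters.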

with_median_centroid_centers(selection_name: str, eps: float = 0.5, min_samples: int = 5, method: str = 'standard', sample_fraction: float = 0.1, knn_neighbors: int = 5, n_jobs: int = -1, max_blas_threads: int | None = 1, auto_limit_blas: bool = True, use_decomposed: bool = True, cluster_name: str | None = None, data_selector_name: str | None = None, force: bool = False, override_cache: bool = False) -> None

Add DBSCAN with median centroid centers.

Parameters

selection_name : str

Name of the feature selection to cluster

eps : float, default=0.5

Maximum distance between two samples for one to be considered in the neighborhood of the other

min_samples : int, default=5

Minimum number of samples in a neighborhood for a point to be considered a core point

method : str, default="standard"

Clustering method: "standard" or "knn_sampling"

sample_fraction : float, default=0.1

Fraction of samples used by the "knn_sampling" method

knn_neighbors : int, default=5

Number of neighbors used for k-NN sampling

n_jobs : int, default=-1

Number of parallel jobs (-1 uses all processors)

max_blas_threads : int or None, default=1

Preferred BLAS/OpenMP thread limit. Set to None to fall back to a safe default; set auto_limit_blas=False to disable thread limiting entirely

auto_limit_blas : bool, default=True

If True, apply a safe thread policy: use 1 BLAS thread when n_jobs != 1; otherwise use max_blas_threads (falling back to 2 when it is None)

use_decomposed : bool, default=True

Use decomposed data if available

cluster_name : str, optional

Name for the clustering result

data_selector_name : str, optional

Name of the data selector to apply

force : bool, default=False

Force recalculation

override_cache : bool, default=False

Override cache settings

Returns

None
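
Reading "medoid from median" by analogy with the centroid definition at the top of this page, the sketch assumes the median centroid is the actual member closest to the feature-wise median. That interpretation and the helper `median_centroid_index` are assumptions, not documented mdxplain behavior.

```python
import math
import statistics


def median_centroid_index(points):
    """Hypothetical helper: member closest to the feature-wise median."""
    # Feature-wise median; usually not itself a member of the cluster
    median = [statistics.median(column) for column in zip(*points)]
    # Snap to the nearest real data point (Euclidean distance)
    return min(range(len(points)), key=lambda i: math.dist(points[i], median))


cluster = [(0.0, 0.0), (1.0, 0.2), (0.9, 1.0), (1.1, 0.9), (9.0, 9.0)]
print(median_centroid_index(cluster))  # 3: median is (1.0, 0.9), nearest member (1.1, 0.9)
```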

with_rmsd_centroid_centers(selection_name: str, eps: float = 0.5, min_samples: int = 5, method: str = 'standard', sample_fraction: float = 0.1, knn_neighbors: int = 5, n_jobs: int = -1, max_blas_threads: int | None = 1, auto_limit_blas: bool = True, use_decomposed: bool = True, cluster_name: str | None = None, data_selector_name: str | None = None, force: bool = False, override_cache: bool = False) -> None

Add DBSCAN with RMSD centroid centers.

Parameters

selection_name : str

Name of the feature selection to cluster

eps : float, default=0.5

Maximum distance between two samples for one to be considered in the neighborhood of the other

min_samples : int, default=5

Minimum number of samples in a neighborhood for a point to be considered a core point

method : str, default="standard"

Clustering method: "standard" or "knn_sampling"

sample_fraction : float, default=0.1

Fraction of samples used by the "knn_sampling" method

knn_neighbors : int, default=5

Number of neighbors used for k-NN sampling

n_jobs : int, default=-1

Number of parallel jobs (-1 uses all processors)

max_blas_threads : int or None, default=1

Preferred BLAS/OpenMP thread limit. Set to None to fall back to a safe default; set auto_limit_blas=False to disable thread limiting entirely

auto_limit_blas : bool, default=True

If True, apply a safe thread policy: use 1 BLAS thread when n_jobs != 1; otherwise use max_blas_threads (falling back to 2 when it is None)

use_decomposed : bool, default=True

Use decomposed data if available

cluster_name : str, optional

Name for the clustering result

data_selector_name : str, optional

Name of the data selector to apply

force : bool, default=False

Force recalculation

override_cache : bool, default=False

Override cache settings

Returns

None
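
The RMSD centroid criterion is not spelled out in this reference. A common choice, assumed here, is the frame with the smallest total RMSD to all other frames in the cluster. The sketch also assumes frames are already aligned (no Kabsch superposition) and that every frame lists the same atoms in the same order; both helpers are hypothetical.

```python
import math


def rmsd(frame_a, frame_b):
    """Hypothetical helper: RMSD of two aligned frames, no superposition."""
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(frame_a, frame_b))
    return math.sqrt(squared / len(frame_a))


def rmsd_centroid_index(frames):
    """Hypothetical rule: frame with the smallest summed RMSD to all frames."""
    # Including the frame itself adds 0, so the ranking is unaffected
    return min(
        range(len(frames)),
        key=lambda i: sum(rmsd(frames[i], other) for other in frames),
    )


# Three 3-atom frames; only the middle atom moves along y
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.2, 0.0), (2.0, 0.0, 0.0)],
]
print(rmsd_centroid_index(frames))  # 1, the intermediate frame
```

Real trajectory analysis normally superimposes each frame pair before computing RMSD; that step is omitted here to keep the sketch self-contained.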