DPA Add Service


Service for adding DPA clustering with flexible center method selection.

class mdxplain.clustering.services.dpa_add_service.DPAAddService(manager: ClusterManager, pipeline_data: PipelineData)

Service for adding DPA clustering.

Uses centroid (medoid) center calculation by default. The centroid is the actual data point closest to the cluster mean, ensuring the cluster center is a real conformational state from the trajectory.
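The centroid (medoid) rule can be sketched in a few lines of NumPy. This is a minimal illustration, not mdxplain's implementation: the helper name `centroid_center`, the `(n_frames, n_features)` array layout, and the use of Euclidean distance to the mean are assumptions.

```python
import numpy as np

# Hypothetical helper illustrating the centroid (medoid) rule; not part of mdxplain.
def centroid_center(X: np.ndarray, labels: np.ndarray, cluster_id: int) -> int:
    """Return the row index of the cluster member closest to the cluster mean."""
    members = np.flatnonzero(labels == cluster_id)
    cluster = X[members]
    mean = cluster.mean(axis=0)
    # Euclidean distance of each member to the arithmetic mean
    dists = np.linalg.norm(cluster - mean, axis=1)
    # Map the position inside the cluster back to a global frame index
    return int(members[np.argmin(dists)])

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
labels = np.array([0, 0, 0, 1, 1])
print(centroid_center(X, labels, 0))  # 0
```

Because the returned value is a frame index rather than a coordinate vector, the center can be looked up directly in the trajectory.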

For alternative center methods, use:

.with_mean_centers() - Arithmetic mean (may not correspond to a real conformational state)
.with_median_centers() - Feature-wise median (robust to outliers)
.with_density_peak_centers() - Point with the highest estimated density
.with_median_centroid_centers() - Medoid relative to the median (data point closest to the feature-wise median)
.with_rmsd_centroid_centers() - RMSD-based centroid

Examples

>>> # Standard call with default centroid centers
>>> pipeline.clustering.add.dpa("features", Z=2.0)

>>> # Explicit center method selection
>>> pipeline.clustering.add.dpa.with_median_centers("features", Z=2.0)
>>> pipeline.clustering.add.dpa.with_density_peak_centers("features", Z=2.0)
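The Z argument in these examples sets how strict DPA is about keeping density peaks apart. As we understand the DPA merging rule, a peak is kept only if its log-density exceeds that of the saddle point separating it from a neighboring peak by more than Z times their combined error. The sketch below encodes that simplified test; the function name and its scalar inputs are hypothetical, and it only illustrates the role of Z rather than reproducing the full DPA merging procedure.

```python
# Hypothetical helper illustrating the role of Z; not part of mdxplain or DPA.
def should_merge(log_rho_peak: float, err_peak: float,
                 log_rho_saddle: float, err_saddle: float, Z: float) -> bool:
    """Return True if a peak is not statistically distinct from its saddle point."""
    # The peak counts as genuine only if its log-density exceeds the saddle's
    # by more than Z times the combined uncertainty.
    return (log_rho_peak - log_rho_saddle) < Z * (err_peak + err_saddle)

# Same peak and saddle, two different thresholds
print(should_merge(3.0, 0.2, 2.5, 0.1, Z=1.0))  # False: 0.5 is not below 0.3, peak kept
print(should_merge(3.0, 0.2, 2.5, 0.1, Z=2.0))  # True: 0.5 is below 0.6, peak merged
```

Raising Z from 1.0 to 2.0 turns the same peak from distinct into merged, which is why larger Z values tend to yield fewer, more robust clusters.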

__init__(manager: ClusterManager, pipeline_data: PipelineData) → None

Initialize DPA service.

Parameters

manager (ClusterManager): Cluster manager instance
pipeline_data (PipelineData): Pipeline data container

Returns

None

with_centroid_centers(selection_name: str, Z: float = 1.0, metric: str = 'euclidean', affinity: str = 'nearest_neighbors', density_algo: str = 'PAk', k_max: int = 1000, D_thr: float = 23.92812698, dim_algo: str = 'twoNN', blockAn: bool = True, block_ratio: int = 20, frac: float = 1.0, halos: bool = False, method: str = 'standard', sample_fraction: float = 0.1, knn_neighbors: int = 5, n_jobs: int = -1, max_blas_threads: int | None = 1, auto_limit_blas: bool = True, use_decomposed: bool = True, cluster_name: str | None = None, data_selector_name: str | None = None, force: bool = False, override_cache: bool = False) → None

Add DPA with centroid (medoid) centers.

Parameters

selection_name (str): Name of the feature selection to cluster
Z (float, default=1.0): Z-score threshold
metric (str, default="euclidean"): Distance metric
affinity (str, default="nearest_neighbors"): Affinity type
density_algo (str, default="PAk"): Density estimation algorithm
k_max (int, default=1000): Maximum k for density estimation
D_thr (float, default=23.92812698): Density threshold
dim_algo (str, default="twoNN"): Intrinsic dimension estimation algorithm
blockAn (bool, default=True): Block analysis
block_ratio (int, default=20): Block ratio
frac (float, default=1.0): Fraction of data
halos (bool, default=False): Include halos
method (str, default="standard"): Clustering method
sample_fraction (float, default=0.1): Sampling fraction
knn_neighbors (int, default=5): Number of k-NN neighbors
n_jobs (int, default=-1): Number of parallel jobs (-1 uses all processors)
max_blas_threads (int or None, default=1): Preferred BLAS/OpenMP thread limit; None falls back to a safe default. Set auto_limit_blas=False to disable thread limiting altogether
auto_limit_blas (bool, default=True): Apply a safe thread policy: use 1 BLAS thread when n_jobs != 1, otherwise use max_blas_threads (or 2 when it is None)
use_decomposed (bool, default=True): Use decomposed data if available
cluster_name (str, optional): Name for the clustering result
data_selector_name (str, optional): Data selector to apply
force (bool, default=False): Force recalculation
override_cache (bool, default=False): Override cache settings

Returns

None
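The thread policy described for max_blas_threads and auto_limit_blas can be written as a small pure function. This is a sketch of the documented rule only; the function name is hypothetical, and returning None to mean "no limit" is an assumption about how disabled limiting is represented.

```python
from typing import Optional

def resolve_blas_threads(n_jobs: int, max_blas_threads: Optional[int] = 1,
                         auto_limit_blas: bool = True) -> Optional[int]:
    """Hypothetical helper mirroring the documented thread policy.

    Returns the BLAS/OpenMP thread limit, or None for "do not limit".
    """
    if not auto_limit_blas:
        return None  # thread limiting disabled
    if n_jobs != 1:
        # Parallel workers already occupy several cores; one BLAS thread per
        # worker is the usual way to avoid oversubscription
        return 1
    # Single job: honor the preferred limit, falling back to 2 when None
    return max_blas_threads if max_blas_threads is not None else 2

print(resolve_blas_threads(n_jobs=-1, max_blas_threads=8))    # 1
print(resolve_blas_threads(n_jobs=1, max_blas_threads=None))  # 2
```

A resolved limit could then be applied with threadpoolctl, for example `with threadpool_limits(limits=n, user_api="blas"):`; whether mdxplain uses threadpoolctl internally is not stated here.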

with_mean_centers(selection_name: str, Z: float = 1.0, metric: str = 'euclidean', affinity: str = 'nearest_neighbors', density_algo: str = 'PAk', k_max: int = 1000, D_thr: float = 23.92812698, dim_algo: str = 'twoNN', blockAn: bool = True, block_ratio: int = 20, frac: float = 1.0, halos: bool = False, method: str = 'standard', sample_fraction: float = 0.1, knn_neighbors: int = 5, n_jobs: int = -1, max_blas_threads: int | None = 1, auto_limit_blas: bool = True, use_decomposed: bool = True, cluster_name: str | None = None, data_selector_name: str | None = None, force: bool = False, override_cache: bool = False) → None

Add DPA with mean centers.

Parameters

selection_name (str): Name of the feature selection to cluster
Z (float, default=1.0): Z-score threshold
metric (str, default="euclidean"): Distance metric
affinity (str, default="nearest_neighbors"): Affinity type
density_algo (str, default="PAk"): Density estimation algorithm
k_max (int, default=1000): Maximum k for density estimation
D_thr (float, default=23.92812698): Density threshold
dim_algo (str, default="twoNN"): Intrinsic dimension estimation algorithm
blockAn (bool, default=True): Block analysis
block_ratio (int, default=20): Block ratio
frac (float, default=1.0): Fraction of data
halos (bool, default=False): Include halos
method (str, default="standard"): Clustering method
sample_fraction (float, default=0.1): Sampling fraction
knn_neighbors (int, default=5): Number of k-NN neighbors
n_jobs (int, default=-1): Number of parallel jobs (-1 uses all processors)
max_blas_threads (int or None, default=1): Preferred BLAS/OpenMP thread limit; None falls back to a safe default. Set auto_limit_blas=False to disable thread limiting altogether
auto_limit_blas (bool, default=True): Apply a safe thread policy: use 1 BLAS thread when n_jobs != 1, otherwise use max_blas_threads (or 2 when it is None)
use_decomposed (bool, default=True): Use decomposed data if available
cluster_name (str, optional): Name for the clustering result
data_selector_name (str, optional): Data selector to apply
force (bool, default=False): Force recalculation
override_cache (bool, default=False): Override cache settings

Returns

None
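with_mean_centers() uses the arithmetic mean, which may not correspond to a real conformational state. A minimal NumPy sketch (hypothetical helper `mean_center`, frames stored as rows) shows this on four frames at the corners of a square: their mean lies at the middle, where no frame exists.

```python
import numpy as np

# Hypothetical helper: arithmetic mean of a cluster (illustration only).
def mean_center(X, labels, cluster_id):
    return X[labels == cluster_id].mean(axis=0)

# Four frames at the corners of a square, all in cluster 0
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
labels = np.zeros(len(X), dtype=int)
center = mean_center(X, labels, 0)
print(center)  # [1. 1.]
print(any(np.allclose(center, row) for row in X))  # False: no frame sits at the mean
```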

with_median_centers(selection_name: str, Z: float = 1.0, metric: str = 'euclidean', affinity: str = 'nearest_neighbors', density_algo: str = 'PAk', k_max: int = 1000, D_thr: float = 23.92812698, dim_algo: str = 'twoNN', blockAn: bool = True, block_ratio: int = 20, frac: float = 1.0, halos: bool = False, method: str = 'standard', sample_fraction: float = 0.1, knn_neighbors: int = 5, n_jobs: int = -1, max_blas_threads: int | None = 1, auto_limit_blas: bool = True, use_decomposed: bool = True, cluster_name: str | None = None, data_selector_name: str | None = None, force: bool = False, override_cache: bool = False) → None

Add DPA with median centers.

Parameters

selection_name (str): Name of the feature selection to cluster
Z (float, default=1.0): Z-score threshold
metric (str, default="euclidean"): Distance metric
affinity (str, default="nearest_neighbors"): Affinity type
density_algo (str, default="PAk"): Density estimation algorithm
k_max (int, default=1000): Maximum k for density estimation
D_thr (float, default=23.92812698): Density threshold
dim_algo (str, default="twoNN"): Intrinsic dimension estimation algorithm
blockAn (bool, default=True): Block analysis
block_ratio (int, default=20): Block ratio
frac (float, default=1.0): Fraction of data
halos (bool, default=False): Include halos
method (str, default="standard"): Clustering method
sample_fraction (float, default=0.1): Sampling fraction
knn_neighbors (int, default=5): Number of k-NN neighbors
n_jobs (int, default=-1): Number of parallel jobs (-1 uses all processors)
max_blas_threads (int or None, default=1): Preferred BLAS/OpenMP thread limit; None falls back to a safe default. Set auto_limit_blas=False to disable thread limiting altogether
auto_limit_blas (bool, default=True): Apply a safe thread policy: use 1 BLAS thread when n_jobs != 1, otherwise use max_blas_threads (or 2 when it is None)
use_decomposed (bool, default=True): Use decomposed data if available
cluster_name (str, optional): Name for the clustering result
data_selector_name (str, optional): Data selector to apply
force (bool, default=False): Force recalculation
override_cache (bool, default=False): Override cache settings

Returns

None
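The feature-wise median is what makes with_median_centers() robust to outliers. In this sketch (hypothetical helper `median_center`, one feature per column), a single outlying frame drags the mean to roughly 10.8 while the median stays at 1.1 with the bulk of the data.

```python
import numpy as np

# Hypothetical helper: feature-wise median of a cluster (illustration only).
def median_center(X, labels, cluster_id):
    return np.median(X[labels == cluster_id], axis=0)

X = np.array([[1.0], [1.2], [0.9], [1.1], [50.0]])  # one outlier frame
labels = np.zeros(len(X), dtype=int)
print(X.mean(axis=0))               # mean is pulled toward the outlier
print(median_center(X, labels, 0))  # median stays with the bulk of the data
```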

with_density_peak_centers(selection_name: str, Z: float = 1.0, metric: str = 'euclidean', affinity: str = 'nearest_neighbors', density_algo: str = 'PAk', k_max: int = 1000, D_thr: float = 23.92812698, dim_algo: str = 'twoNN', blockAn: bool = True, block_ratio: int = 20, frac: float = 1.0, halos: bool = False, method: str = 'standard', sample_fraction: float = 0.1, knn_neighbors: int = 5, n_jobs: int = -1, max_blas_threads: int | None = 1, auto_limit_blas: bool = True, use_decomposed: bool = True, cluster_name: str | None = None, data_selector_name: str | None = None, force: bool = False, override_cache: bool = False) → None

Add DPA with density peak centers.

Parameters

selection_name (str): Name of the feature selection to cluster
Z (float, default=1.0): Z-score threshold
metric (str, default="euclidean"): Distance metric
affinity (str, default="nearest_neighbors"): Affinity type
density_algo (str, default="PAk"): Density estimation algorithm
k_max (int, default=1000): Maximum k for density estimation
D_thr (float, default=23.92812698): Density threshold
dim_algo (str, default="twoNN"): Intrinsic dimension estimation algorithm
blockAn (bool, default=True): Block analysis
block_ratio (int, default=20): Block ratio
frac (float, default=1.0): Fraction of data
halos (bool, default=False): Include halos
method (str, default="standard"): Clustering method
sample_fraction (float, default=0.1): Sampling fraction
knn_neighbors (int, default=5): Number of k-NN neighbors
n_jobs (int, default=-1): Number of parallel jobs (-1 uses all processors)
max_blas_threads (int or None, default=1): Preferred BLAS/OpenMP thread limit; None falls back to a safe default. Set auto_limit_blas=False to disable thread limiting altogether
auto_limit_blas (bool, default=True): Apply a safe thread policy: use 1 BLAS thread when n_jobs != 1, otherwise use max_blas_threads (or 2 when it is None)
use_decomposed (bool, default=True): Use decomposed data if available
cluster_name (str, optional): Name for the clustering result
data_selector_name (str, optional): Data selector to apply
force (bool, default=False): Force recalculation
override_cache (bool, default=False): Override cache settings

Returns

None
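with_density_peak_centers() picks the highest-density point of each cluster. The sketch below swaps in a plain k-nearest-neighbor density estimate in place of the PAk estimator that DPA uses by default: since such an estimate is proportional to k / r_k^d, the densest member is the one with the smallest distance r_k to its k-th neighbor. The helper name and the choice k=2 are assumptions, and the cluster must have more than k members.

```python
import numpy as np

# Hypothetical helper: highest-density member under a simple kNN estimate
# (a stand-in for PAk, not mdxplain's implementation).
def density_peak_center(X: np.ndarray, labels: np.ndarray, cluster_id: int, k: int = 2) -> int:
    members = np.flatnonzero(labels == cluster_id)
    cluster = X[members]
    if len(cluster) <= k:
        raise ValueError("cluster needs more than k members")
    # Pairwise Euclidean distances between cluster members
    diff = cluster[:, None, :] - cluster[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    # Column 0 of each sorted row is the point itself (distance 0),
    # so column k is the distance to the k-th nearest neighbor
    r_k = np.sort(dists, axis=1)[:, k]
    # Density ~ k / r_k**d, so the densest point has the smallest r_k
    return int(members[np.argmin(r_k)])

X = np.array([[0.0], [0.1], [0.2], [1.0], [3.0]])
labels = np.zeros(len(X), dtype=int)
print(density_peak_center(X, labels, 0))  # 1
```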

with_median_centroid_centers(selection_name: str, Z: float = 1.0, metric: str = 'euclidean', affinity: str = 'nearest_neighbors', density_algo: str = 'PAk', k_max: int = 1000, D_thr: float = 23.92812698, dim_algo: str = 'twoNN', blockAn: bool = True, block_ratio: int = 20, frac: float = 1.0, halos: bool = False, method: str = 'standard', sample_fraction: float = 0.1, knn_neighbors: int = 5, n_jobs: int = -1, max_blas_threads: int | None = 1, auto_limit_blas: bool = True, use_decomposed: bool = True, cluster_name: str | None = None, data_selector_name: str | None = None, force: bool = False, override_cache: bool = False) → None

Add DPA with median centroid centers.

Parameters

selection_name (str): Name of the feature selection to cluster
Z (float, default=1.0): Z-score threshold
metric (str, default="euclidean"): Distance metric
affinity (str, default="nearest_neighbors"): Affinity type
density_algo (str, default="PAk"): Density estimation algorithm
k_max (int, default=1000): Maximum k for density estimation
D_thr (float, default=23.92812698): Density threshold
dim_algo (str, default="twoNN"): Intrinsic dimension estimation algorithm
blockAn (bool, default=True): Block analysis
block_ratio (int, default=20): Block ratio
frac (float, default=1.0): Fraction of data
halos (bool, default=False): Include halos
method (str, default="standard"): Clustering method
sample_fraction (float, default=0.1): Sampling fraction
knn_neighbors (int, default=5): Number of k-NN neighbors
n_jobs (int, default=-1): Number of parallel jobs (-1 uses all processors)
max_blas_threads (int or None, default=1): Preferred BLAS/OpenMP thread limit; None falls back to a safe default. Set auto_limit_blas=False to disable thread limiting altogether
auto_limit_blas (bool, default=True): Apply a safe thread policy: use 1 BLAS thread when n_jobs != 1, otherwise use max_blas_threads (or 2 when it is None)
use_decomposed (bool, default=True): Use decomposed data if available
cluster_name (str, optional): Name for the clustering result
data_selector_name (str, optional): Data selector to apply
force (bool, default=False): Force recalculation
override_cache (bool, default=False): Override cache settings

Returns

None
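with_median_centroid_centers() combines the median and medoid ideas. Assuming "medoid from median" means the frame closest to the feature-wise median (our reading, not confirmed by this page), a sketch with the hypothetical helper `median_centroid_center` looks like this. On an example with one outlier, the median-based medoid is the central frame at 1.1, whereas the mean-based medoid would be the frame at 1.2.

```python
import numpy as np

# Hypothetical helper: member closest to the feature-wise median
# (our reading of "medoid from median", not mdxplain's implementation).
def median_centroid_center(X, labels, cluster_id):
    members = np.flatnonzero(labels == cluster_id)
    cluster = X[members]
    median = np.median(cluster, axis=0)
    # Euclidean distance of each member to the median vector
    dists = np.linalg.norm(cluster - median, axis=1)
    return int(members[np.argmin(dists)])

X = np.array([[1.0], [1.2], [0.9], [1.1], [50.0]])  # one outlier frame
labels = np.zeros(len(X), dtype=int)
print(median_centroid_center(X, labels, 0))  # 3 (the frame at 1.1)
```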

with_rmsd_centroid_centers(selection_name: str, Z: float = 1.0, metric: str = 'euclidean', affinity: str = 'nearest_neighbors', density_algo: str = 'PAk', k_max: int = 1000, D_thr: float = 23.92812698, dim_algo: str = 'twoNN', blockAn: bool = True, block_ratio: int = 20, frac: float = 1.0, halos: bool = False, method: str = 'standard', sample_fraction: float = 0.1, knn_neighbors: int = 5, n_jobs: int = -1, max_blas_threads: int | None = 1, auto_limit_blas: bool = True, use_decomposed: bool = True, cluster_name: str | None = None, data_selector_name: str | None = None, force: bool = False, override_cache: bool = False) → None

Add DPA with RMSD centroid centers.

Parameters

selection_name (str): Name of the feature selection to cluster
Z (float, default=1.0): Z-score threshold
metric (str, default="euclidean"): Distance metric
affinity (str, default="nearest_neighbors"): Affinity type
density_algo (str, default="PAk"): Density estimation algorithm
k_max (int, default=1000): Maximum k for density estimation
D_thr (float, default=23.92812698): Density threshold
dim_algo (str, default="twoNN"): Intrinsic dimension estimation algorithm
blockAn (bool, default=True): Block analysis
block_ratio (int, default=20): Block ratio
frac (float, default=1.0): Fraction of data
halos (bool, default=False): Include halos
method (str, default="standard"): Clustering method
sample_fraction (float, default=0.1): Sampling fraction
knn_neighbors (int, default=5): Number of k-NN neighbors
n_jobs (int, default=-1): Number of parallel jobs (-1 uses all processors)
max_blas_threads (int or None, default=1): Preferred BLAS/OpenMP thread limit; None falls back to a safe default. Set auto_limit_blas=False to disable thread limiting altogether
auto_limit_blas (bool, default=True): Apply a safe thread policy: use 1 BLAS thread when n_jobs != 1, otherwise use max_blas_threads (or 2 when it is None)
use_decomposed (bool, default=True): Use decomposed data if available
cluster_name (str, optional): Name for the clustering result
data_selector_name (str, optional): Data selector to apply
force (bool, default=False): Force recalculation
override_cache (bool, default=False): Override cache settings

Returns

None
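For with_rmsd_centroid_centers(), this page does not spell out the exact definition. One common choice, assumed here, is the frame with the smallest summed RMSD to all other frames in the cluster. The sketch computes RMSD directly on coordinates without superposition, so it presumes the frames are already aligned; the helper names and the `(n_frames, n_atoms, 3)` layout are hypothetical.

```python
import numpy as np

# Hypothetical helpers illustrating an RMSD-based centroid; not mdxplain's implementation.
def pairwise_rmsd(coords: np.ndarray) -> np.ndarray:
    """coords: (n_frames, n_atoms, 3). RMSD without superposition (frames assumed aligned)."""
    diff = coords[:, None, :, :] - coords[None, :, :, :]
    # Squared distance per atom, averaged over atoms, then square-rooted
    return np.sqrt((diff ** 2).sum(axis=-1).mean(axis=-1))

def rmsd_centroid_center(coords: np.ndarray, labels: np.ndarray, cluster_id: int) -> int:
    members = np.flatnonzero(labels == cluster_id)
    d = pairwise_rmsd(coords[members])
    # Frame with the smallest summed RMSD to the rest of its cluster
    return int(members[np.argmin(d.sum(axis=1))])

# Five copies of a 3-atom structure, rigidly shifted along x
base = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
shifts = [0.0, 0.5, 1.0, 1.5, 4.0]
coords = np.stack([base + np.array([s, 0.0, 0.0]) for s in shifts])
labels = np.zeros(len(shifts), dtype=int)
print(rmsd_centroid_center(coords, labels, 0))  # 2
```

With a uniform shift, the RMSD between two frames equals the difference of their shifts, so the frame shifted by 1.0 has the smallest total (5.0, versus 5.5 for its neighbors).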