Clustering Entities
Clustering entities module.
Contains data classes for clustering results and metadata.
Cluster Data
Container for computed clustering results (DBSCAN, HDBSCAN, DPA). It stores the cluster labels together with the metadata and hyperparameters used to compute them.
- class mdxplain.clustering.entities.cluster_data.ClusterData(cluster_type: str, cache_path: str = './cache')
Container for clustering results with metadata and hyperparameters.
Stores results from clustering methods (DBSCAN, HDBSCAN, DPA) along with clustering metadata and hyperparameters used for computation.
Attributes
- cluster_type : str
Type of clustering algorithm used (e.g., "dbscan", "hdbscan", "dpa")
- cache_path : str
Path for cached results
- labels : numpy.ndarray or None
Array of cluster labels for each trajectory frame, or None if not computed
- metadata : dict or None
Dictionary containing clustering parameters, metrics, and other metadata, or None if not computed
- frame_mapping : dict or None
Mapping from global_frame_index to (trajectory_index, local_frame_index), or None if not computed
- __init__(cluster_type: str, cache_path: str = './cache')
Initialize cluster data container.
Parameters
- cluster_type : str
Type of clustering algorithm used (e.g., "dbscan", "hdbscan", "dpa")
- cache_path : str, optional
Path for cached results. Default: "./cache"
Returns
- None
Initializes cluster data container
Examples
>>> # Basic initialization
>>> cluster_data = ClusterData("dbscan")

>>> # With custom cache path
>>> cluster_data = ClusterData(
...     "hdbscan",
...     cache_path="./cache/clustering"
... )
- get_labels() -> ndarray | None
Get cluster labels for each trajectory frame.
Returns
- numpy.ndarray or None
Array of cluster labels corresponding to trajectory frame indices, or None if clustering has not been computed yet
Examples
>>> cluster_data = ClusterData("dbscan")
>>> labels = cluster_data.get_labels()
>>> if labels is not None:
...     print(f"Number of frames: {len(labels)}")
- get_metadata() -> Dict[str, Any] | None
Get clustering metadata including parameters and metrics.
Returns
- dict or None
Dictionary containing clustering parameters, metrics, and other metadata, or None if clustering has not been computed yet
Examples
>>> cluster_data = ClusterData("dbscan")
>>> metadata = cluster_data.get_metadata()
>>> if metadata is not None:
...     print(f"Number of clusters: {metadata.get('n_clusters', 'Unknown')}")
...     print(f"Algorithm: {metadata.get('algorithm', 'Unknown')}")
- get_cluster_type() -> str
Get the clustering algorithm type.
Returns
- str
The type of clustering algorithm used
Examples
>>> cluster_data = ClusterData("dbscan")
>>> print(cluster_data.get_cluster_type())
dbscan
- get_cache_path() -> str
Get the cache path for clustering results.
Returns
- str
Path where clustering results are cached
Examples
>>> cluster_data = ClusterData("dbscan", cache_path="./my_cache")
>>> print(cluster_data.get_cache_path())
./my_cache
- has_data() -> bool
Check if clustering data has been computed and stored.
Returns
- bool
True if both labels and metadata are available, False otherwise
Examples
>>> cluster_data = ClusterData("dbscan")
>>> print(cluster_data.has_data())
False
>>> # After clustering computation...
>>> # cluster_data.labels = computed_labels
>>> # cluster_data.metadata = computed_metadata
>>> # print(cluster_data.has_data())
>>> # True
- get_n_clusters() -> int | None
Get the number of clusters found.
Returns
- int or None
Number of clusters found, or None if clustering has not been computed or if the information is not available in metadata
Examples
>>> cluster_data = ClusterData("dbscan")
>>> n_clusters = cluster_data.get_n_clusters()
>>> if n_clusters is not None:
...     print(f"Found {n_clusters} clusters")
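How the cluster count is derived is not stated here. As a minimal sketch, assuming the scikit-learn convention in which DBSCAN and HDBSCAN mark noise frames with the label -1, a count can be recovered from a label array as shown below. The helper name count_clusters is hypothetical and not part of mdxplain.

```python
from typing import Iterable, Optional


def count_clusters(
    labels: Optional[Iterable[int]], noise_label: int = -1
) -> Optional[int]:
    """Count distinct cluster labels, ignoring the noise label.

    Hypothetical helper, not part of mdxplain. Assumes noise frames carry
    ``noise_label`` (-1 in scikit-learn's DBSCAN and HDBSCAN).
    """
    if labels is None:
        return None  # mirrors the "clustering not computed" case
    unique = {int(label) for label in labels}
    unique.discard(noise_label)  # noise is not a cluster
    return len(unique)


labels = [0, 0, 1, -1, 2, 1, -1]
print(count_clusters(labels))  # 3
```

Whether noise frames count as a cluster is a design choice; this sketch excludes them, so an all-noise result yields 0 rather than 1.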
- get_n_frames() -> int | None
Get the number of trajectory frames that were clustered.
Returns
- int or None
Number of trajectory frames, or None if clustering has not been computed
Examples
>>> cluster_data = ClusterData("dbscan")
>>> n_frames = cluster_data.get_n_frames()
>>> if n_frames is not None:
...     print(f"Clustered {n_frames} frames")
- get_frame_mapping() -> Dict[int, tuple] | None
Get frame mapping from global frame indices to trajectory origins.
Returns
- dict or None
Mapping from global_frame_index to (trajectory_index, local_frame_index), or None if clustering has not been computed or mapping is not available
Examples
>>> cluster_data = ClusterData("dbscan")
>>> frame_mapping = cluster_data.get_frame_mapping()
>>> if frame_mapping is not None:
...     print(f"Frame 0 comes from: {frame_mapping[0]}")  # (traj_idx, local_frame_idx)
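The frame mapping is most useful in combination with the labels, because it lets each cluster member be traced back to its source trajectory. The sketch below illustrates this with plain Python data, assuming that labels[i] belongs to global frame index i. The helper name frames_by_cluster is hypothetical and not part of mdxplain.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

FrameOrigin = Tuple[int, int]  # (trajectory_index, local_frame_index)


def frames_by_cluster(
    labels: Sequence[int], frame_mapping: Dict[int, FrameOrigin]
) -> Dict[int, List[FrameOrigin]]:
    """Group frame origins by cluster label.

    Hypothetical helper, not part of mdxplain. Assumes labels[i] is the
    label of global frame index i.
    """
    members: Dict[int, List[FrameOrigin]] = defaultdict(list)
    for global_idx, label in enumerate(labels):
        members[int(label)].append(frame_mapping[global_idx])
    return dict(members)


# Stand-in values for get_labels() and get_frame_mapping()
labels = [0, 0, 1, -1]
frame_mapping = {0: (0, 10), 1: (0, 11), 2: (1, 5), 3: (1, 6)}

members = frames_by_cluster(labels, frame_mapping)
print(members[0])  # [(0, 10), (0, 11)]
```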
- set_frame_mapping(frame_mapping: Dict[int, tuple]) -> None
Set frame mapping from global frame indices to trajectory origins.
Parameters
- frame_mapping : dict
Mapping from global_frame_index to (trajectory_index, local_frame_index)
Returns
- None
Sets the frame mapping for trajectory tracking
Examples
>>> cluster_data = ClusterData("dbscan")
>>> mapping = {0: (0, 10), 1: (0, 11), 2: (1, 5)}  # global -> (traj, local)
>>> cluster_data.set_frame_mapping(mapping)
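A mapping of this shape can be built by hand when needed. The following sketch covers the simplest case, assuming the trajectories are concatenated in order and every frame is included; a mapping such as the one above, where local indices start at 10, would come from a frame selection instead. The helper name build_frame_mapping is hypothetical and not part of mdxplain.

```python
from typing import Dict, Sequence, Tuple


def build_frame_mapping(
    frames_per_trajectory: Sequence[int],
) -> Dict[int, Tuple[int, int]]:
    """Map global frame indices to (trajectory_index, local_frame_index).

    Hypothetical helper, not part of mdxplain. Assumes trajectories are
    concatenated in the given order with no frames skipped.
    """
    mapping: Dict[int, Tuple[int, int]] = {}
    global_idx = 0
    for traj_idx, n_frames in enumerate(frames_per_trajectory):
        for local_idx in range(n_frames):
            mapping[global_idx] = (traj_idx, local_idx)
            global_idx += 1
    return mapping


# Two trajectories with 3 and 2 frames
mapping = build_frame_mapping([3, 2])
print(mapping)
# {0: (0, 0), 1: (0, 1), 2: (0, 2), 3: (1, 0), 4: (1, 1)}
```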
- get_centers() -> ndarray | None
Get cluster centers from metadata.
Returns
- numpy.ndarray or None
Array of cluster centers, shape (n_clusters, n_features), or None if centers have not been calculated
Examples
>>> cluster_data = ClusterData("dbscan")
>>> centers = cluster_data.get_centers()
>>> if centers is not None:
...     print(f"Number of centers: {len(centers)}")
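This page does not say how the centers are calculated, and density-based methods such as DBSCAN do not define centers on their own. As an illustration only, the sketch below uses one common choice, the per-cluster centroid (feature-wise mean), and skips noise frames under the assumption that they are labeled -1. The helper name cluster_centroids is hypothetical and not part of mdxplain, which may use a different definition such as a medoid.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def cluster_centroids(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    noise_label: int = -1,
) -> Dict[int, List[float]]:
    """Compute the feature-wise mean of each cluster.

    Hypothetical helper, not part of mdxplain. Assumes features[i] and
    labels[i] describe the same frame and that noise uses ``noise_label``.
    """
    groups: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for row, label in zip(features, labels):
        if label != noise_label:
            groups[int(label)].append(row)

    centroids: Dict[int, List[float]] = {}
    for label, rows in groups.items():
        n_rows = len(rows)
        # zip(*rows) iterates over feature columns
        centroids[label] = [sum(column) / n_rows for column in zip(*rows)]
    return centroids


features = [[0.0, 0.0], [2.0, 2.0], [10.0, 10.0], [50.0, 50.0]]
labels = [0, 0, 1, -1]
print(cluster_centroids(features, labels))
# {0: [1.0, 1.0], 1: [10.0, 10.0]}
```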
- save(save_path: str) -> None
Save ClusterData object to disk.
Parameters
- save_path : str
Path to which the ClusterData object is saved
Returns
- None
Saves the ClusterData object to the specified path
Examples
>>> cluster_data.save('analysis_results/dbscan_clustering.pkl')