Clustering Entities


Clustering entities module.

Contains data classes for clustering results and metadata.

Cluster Data

Cluster data container for computed clustering results.

Container for clustering results (DBSCAN, HDBSCAN, DPA) with associated metadata and hyperparameters. Stores the per-frame cluster labels alongside that information.

class mdxplain.clustering.entities.cluster_data.ClusterData(cluster_type: str, cache_path: str = './cache')

Container for clustering results with metadata and hyperparameters.

Stores results from clustering methods (DBSCAN, HDBSCAN, DPA) along with clustering metadata and hyperparameters used for computation.

Attributes

cluster_type : str

Type of clustering algorithm used (e.g., “dbscan”, “hdbscan”, “dpa”)

cache_path : str

Path for cached results

labels : numpy.ndarray or None

Array of cluster labels for each trajectory frame, or None if not computed

metadata : dict or None

Dictionary containing clustering parameters, metrics, and other metadata, or None if not computed

frame_mapping : dict or None

Mapping from global_frame_index to (trajectory_index, local_frame_index), or None if not computed

__init__(cluster_type: str, cache_path: str = './cache')

Initialize cluster data container.

Parameters

cluster_type : str

Type of clustering algorithm used (e.g., “dbscan”, “hdbscan”, “dpa”)

cache_path : str, optional

Path for cached results, default="./cache"

Returns

None

Initializes cluster data container

Examples

>>> # Basic initialization
>>> cluster_data = ClusterData("dbscan")
>>> # With custom cache path
>>> cluster_data = ClusterData(
...     "hdbscan",
...     cache_path="./cache/clustering"
... )
get_labels() -> ndarray | None

Get cluster labels for each trajectory frame.

Returns

numpy.ndarray or None

Array of cluster labels corresponding to trajectory frame indices, or None if clustering has not been computed yet

Examples

>>> cluster_data = ClusterData("dbscan")
>>> labels = cluster_data.get_labels()
>>> if labels is not None:
...     print(f"Number of frames: {len(labels)}")
get_metadata() -> Dict[str, Any] | None

Get clustering metadata including parameters and metrics.

Returns

dict or None

Dictionary containing clustering parameters, metrics, and other metadata, or None if clustering has not been computed yet

Examples

>>> cluster_data = ClusterData("dbscan")
>>> metadata = cluster_data.get_metadata()
>>> if metadata is not None:
...     print(f"Number of clusters: {metadata.get('n_clusters', 'Unknown')}")
...     print(f"Algorithm: {metadata.get('algorithm', 'Unknown')}")
get_cluster_type() -> str

Get the clustering algorithm type.

Returns

str

The type of clustering algorithm used

Examples

>>> cluster_data = ClusterData("dbscan")
>>> print(cluster_data.get_cluster_type())
dbscan
get_cache_path() -> str

Get the cache path for clustering results.

Returns

str

Path where clustering results are cached

Examples

>>> cluster_data = ClusterData("dbscan", cache_path="./my_cache")
>>> print(cluster_data.get_cache_path())
./my_cache
has_data() -> bool

Check if clustering data has been computed and stored.

Returns

bool

True if both labels and metadata are available, False otherwise

Examples

>>> cluster_data = ClusterData("dbscan")
>>> print(cluster_data.has_data())
False
>>> # After clustering computation...
>>> # cluster_data.labels = computed_labels
>>> # cluster_data.metadata = computed_metadata
>>> # print(cluster_data.has_data())
>>> # True
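The rule stated above (data counts as present only when both labels and metadata are set) can be illustrated with a small stand-in class. This is a sketch under assumptions, not mdxplain's implementation: `ClusterResultSketch` is a hypothetical name, and plain lists stand in for NumPy arrays.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ClusterResultSketch:
    """Hypothetical stand-in for a clustering result container."""

    cluster_type: str
    labels: Optional[List[int]] = None
    metadata: Optional[Dict[str, Any]] = None

    def has_data(self) -> bool:
        # Both pieces must be present; labels alone are not enough.
        return self.labels is not None and self.metadata is not None


empty = ClusterResultSketch("dbscan")
labels_only = ClusterResultSketch("dbscan", labels=[0, 0, 1, -1])
complete = ClusterResultSketch(
    "dbscan", labels=[0, 0, 1, -1], metadata={"n_clusters": 2}
)
```

Here only `complete` would report data as available; `empty` and `labels_only` would not.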
get_n_clusters() -> int | None

Get the number of clusters found.

Returns

int or None

Number of clusters found, or None if clustering has not been computed or if the information is not available in metadata

Examples

>>> cluster_data = ClusterData("dbscan")
>>> n_clusters = cluster_data.get_n_clusters()
>>> if n_clusters is not None:
...     print(f"Found {n_clusters} clusters")
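When the cluster count is not available from metadata, it can be recomputed from the label array. The sketch below assumes that noise frames carry the label -1, as scikit-learn's DBSCAN does; whether mdxplain counts clusters the same way is not stated in the source. `count_clusters` is a hypothetical helper, and a plain list stands in for the label array.

```python
from typing import Iterable


def count_clusters(labels: Iterable[int], noise_label: int = -1) -> int:
    """Count distinct cluster labels, ignoring the assumed noise label."""
    return len({label for label in labels if label != noise_label})


# Six frames: clusters 0, 1 and 2, plus two noise frames.
example_labels = [0, 0, 1, -1, 2, -1]
n_clusters = count_clusters(example_labels)
```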
get_n_frames() -> int | None

Get the number of trajectory frames that were clustered.

Returns

int or None

Number of trajectory frames, or None if clustering has not been computed

Examples

>>> cluster_data = ClusterData("dbscan")
>>> n_frames = cluster_data.get_n_frames()
>>> if n_frames is not None:
...     print(f"Clustered {n_frames} frames")
get_frame_mapping() -> Dict[int, tuple] | None

Get frame mapping from global frame indices to trajectory origins.

Returns

dict or None

Mapping from global_frame_index to (trajectory_index, local_frame_index), or None if clustering has not been computed or mapping is not available

Examples

>>> cluster_data = ClusterData("dbscan")
>>> frame_mapping = cluster_data.get_frame_mapping()
>>> if frame_mapping is not None:
...     print(f"Frame 0 comes from: {frame_mapping[0]}")  # (traj_idx, local_frame_idx)
set_frame_mapping(frame_mapping: Dict[int, tuple]) -> None

Set frame mapping from global frame indices to trajectory origins.

Parameters

frame_mapping : dict

Mapping from global_frame_index to (trajectory_index, local_frame_index)

Returns

None

Sets the frame mapping for trajectory tracking

Examples

>>> cluster_data = ClusterData("dbscan")
>>> mapping = {0: (0, 10), 1: (0, 11), 2: (1, 5)}  # global -> (traj, local)
>>> cluster_data.set_frame_mapping(mapping)
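A mapping of this shape can be built from per-trajectory frame counts. The sketch below assumes that every frame of every trajectory is kept and concatenated in order; the example above (local indices 10, 11, 5) suggests real mappings may also cover subselected frames, which this sketch does not handle. `build_frame_mapping` is a hypothetical helper, not part of mdxplain.

```python
from typing import Dict, List, Tuple


def build_frame_mapping(
    frames_per_trajectory: List[int],
) -> Dict[int, Tuple[int, int]]:
    """Map global frame index -> (trajectory_index, local_frame_index)."""
    mapping: Dict[int, Tuple[int, int]] = {}
    global_idx = 0
    for traj_idx, n_frames in enumerate(frames_per_trajectory):
        for local_idx in range(n_frames):
            mapping[global_idx] = (traj_idx, local_idx)
            global_idx += 1
    return mapping


# Two trajectories with 3 and 2 frames respectively.
mapping = build_frame_mapping([3, 2])
```

The resulting dictionary could then be passed to `set_frame_mapping`.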
get_centers() -> ndarray | None

Get cluster centers from metadata.

Returns

numpy.ndarray or None

Array of cluster centers, shape (n_clusters, n_features), or None if centers have not been calculated

Examples

>>> cluster_data = ClusterData("dbscan")
>>> centers = cluster_data.get_centers()
>>> if centers is not None:
...     print(f"Number of centers: {len(centers)}")
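The source does not say how centers are calculated. As one possible illustration of the (n_clusters, n_features) shape, the sketch below takes the per-cluster mean of the feature vectors and skips frames with an assumed noise label of -1. `cluster_means` is a hypothetical helper, nested lists stand in for NumPy arrays, and other definitions of a center (a medoid or a density peak, for instance) would give different results.

```python
from typing import Dict, List, Sequence


def cluster_means(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    noise_label: int = -1,
) -> List[List[float]]:
    """Return one mean feature vector per cluster, ordered by label."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for row, label in zip(features, labels):
        if label == noise_label:
            continue  # noise frames do not contribute to any center
        if label not in sums:
            sums[label] = [0.0] * len(row)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], row)]
        counts[label] += 1
    return [
        [s / counts[label] for s in sums[label]] for label in sorted(sums)
    ]


features = [[0.0, 0.0], [2.0, 2.0], [10.0, 10.0], [5.0, 5.0]]
labels = [0, 0, 1, -1]
centers = cluster_means(features, labels)
```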
save(save_path: str) -> None

Save ClusterData object to disk.

Parameters

save_path : str

Path where to save the ClusterData object

Returns

None

Saves the ClusterData object to the specified path

Examples

>>> cluster_data.save('analysis_results/dbscan_clustering.pkl')
load(load_path: str) -> None

Load ClusterData object from disk.

Parameters

load_path : str

Path to the saved ClusterData file

Returns

None

Loads the ClusterData object from the specified path

Examples

>>> cluster_data.load('analysis_results/dbscan_clustering.pkl')
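The `.pkl` extension in these examples suggests pickle-based serialization, though the source does not confirm the on-disk format. The sketch below shows a save and load round trip with Python's standard `pickle` module on a plain dictionary; `save_results` and `load_results` are hypothetical helpers, not mdxplain functions.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any


def save_results(results: Any, save_path) -> None:
    """Pickle results to save_path, creating parent directories."""
    path = Path(save_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as handle:
        pickle.dump(results, handle)


def load_results(load_path) -> Any:
    """Read back an object written by save_results."""
    with Path(load_path).open("rb") as handle:
        return pickle.load(handle)


original = {
    "cluster_type": "dbscan",
    "labels": [0, 0, 1, -1],
    "metadata": {"n_clusters": 2},
}

# Use a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "analysis_results" / "dbscan_clustering.pkl"
    save_results(original, target)
    restored = load_results(target)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.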
print_info() -> None

Print comprehensive cluster information.

Parameters

None

Returns

None

Prints cluster information to console

Examples

>>> cluster_data.print_info()
=== ClusterData ===
Cluster Type: DBSCAN
Number of Clusters: 5
Number of Frames: 1000
Noise Points: 127 (12.7%)
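The noise line in the sample output can be reproduced from a label array. The sketch below assumes that noise frames are labeled -1 and that the percentage is taken relative to the total number of frames; `noise_summary` is a hypothetical helper, and the label list is synthetic data chosen to match the figures shown above.

```python
from typing import Sequence


def noise_summary(labels: Sequence[int], noise_label: int = -1) -> str:
    """Format the noise count and its share of all frames."""
    n_frames = len(labels)
    n_noise = sum(1 for label in labels if label == noise_label)
    percent = 100.0 * n_noise / n_frames if n_frames else 0.0
    return f"Noise Points: {n_noise} ({percent:.1f}%)"


# Synthetic labels: 127 noise frames out of 1000.
synthetic_labels = [-1] * 127 + [0] * 873
summary = noise_summary(synthetic_labels)
```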