Group Creation Helper

Helper for creating data selector groups from clusters and tags.

This module provides utilities to automatically create multiple data selectors from clustering results or trajectory tags.

class mdxplain.data_selector.helper.group_creation_helper.GroupCreationHelper

Helper class for creating data selector groups.

This class provides static methods to automatically create multiple data selectors from clustering results or tags, organizing them into named groups.

Examples

>>> # Create selectors for all clusters
>>> group = GroupCreationHelper.create_cluster_selectors(
...     pipeline_data, manager, "clusters", "my_clustering"
... )
>>> print(group.selector_names)
['clusters_0', 'clusters_1', 'clusters_2']

static create_cluster_selectors(pipeline_data: PipelineData, manager: DataSelectorManager, group_name: str, clustering_name: str, cluster_ids: List[int] | None = None, noise_id: int | None = -1, min_cluster_size: int | None = 2, force: bool = False) → DataSelectorGroup

Create data selectors for clusters.

Creates one data selector per cluster using the manager. Noise clusters are filtered out by default.

Parameters

pipeline_data : PipelineData

Pipeline data object containing clustering results

manager : DataSelectorManager

Manager instance for creating and managing selectors

group_name : str

Name for the selector group

clustering_name : str

Name of the clustering to use for selector creation

cluster_ids : List[int], optional

Specific cluster IDs to include. If None, includes all non-noise clusters.

noise_id : int or None, default=-1

Cluster ID that represents noise/outliers to filter out.

  • If int: Filters out this specific cluster ID (e.g., -1, the noise label used by scikit-learn's DBSCAN)

  • If None: No filtering, creates selectors for ALL cluster IDs

min_cluster_size : int or None, default=2

Minimum number of frames a cluster must contain to be included. The default of 2 excludes single-frame clusters, because decision trees need at least 2 frames. If None, no size filter is applied; noise filtering via noise_id still applies.

force : bool, default=False

Whether to overwrite existing selectors with the same names. If False, raises ValueError when a selector already exists.
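The combined effect of cluster_ids, noise_id, and min_cluster_size can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the function name filter_cluster_ids, the flat per-frame labels list, and the choice to apply the noise filter even to explicitly listed cluster_ids are all hypothetical.

```python
from collections import Counter
from typing import List, Optional


def filter_cluster_ids(
    labels: List[int],
    cluster_ids: Optional[List[int]] = None,
    noise_id: Optional[int] = -1,
    min_cluster_size: Optional[int] = 2,
) -> List[int]:
    """Return the cluster IDs that would get a selector (hypothetical helper)."""
    counts = Counter(labels)  # number of frames per cluster ID
    candidates = sorted(counts) if cluster_ids is None else list(cluster_ids)
    kept = []
    for cid in candidates:
        if noise_id is not None and cid == noise_id:
            continue  # drop the noise cluster
        if min_cluster_size is not None and counts[cid] < min_cluster_size:
            continue  # drop clusters with too few frames
        kept.append(cid)
    return kept


# Per-frame labels: cluster 0 has 3 frames, 1 has 2, 2 has 1, noise (-1) has 2.
labels = [0, 0, 1, -1, 1, 2, 0, -1]
default_ids = filter_cluster_ids(labels)
all_ids = filter_cluster_ids(labels, noise_id=None, min_cluster_size=None)
print(default_ids)  # [0, 1]
print(all_ids)      # [-1, 0, 1, 2]
```

With the defaults, the noise cluster and the single-frame cluster 2 are both skipped; setting both filters to None keeps every cluster ID.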

Returns

DataSelectorGroup

Created group containing all generated selector names. Access the names via the group.selector_names attribute.
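The examples on this page suggest that selector names follow the pattern "<group_name>_<cluster_id>". A minimal sketch of that naming scheme, assuming the pattern holds in general; SelectorGroupSketch and build_group are hypothetical stand-ins and not part of mdxplain:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SelectorGroupSketch:
    """Hypothetical stand-in for DataSelectorGroup: a name plus selector names."""

    name: str
    selector_names: List[str] = field(default_factory=list)


def build_group(group_name: str, cluster_ids: List[int]) -> SelectorGroupSketch:
    # Naming pattern "<group_name>_<cluster_id>" is inferred from the examples.
    names = [f"{group_name}_{cid}" for cid in cluster_ids]
    return SelectorGroupSketch(name=group_name, selector_names=names)


group = build_group("clusters", [0, 1, 2])
print(group.selector_names)  # ['clusters_0', 'clusters_1', 'clusters_2']
```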

Raises

ValueError

If clustering_name does not exist in pipeline_data

ValueError

If a selector with the same name already exists and force is False
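The force semantics amount to an overwrite guard. A minimal sketch, assuming selectors are kept in a name-keyed mapping; the registry dict and register_selector are hypothetical and only illustrate the documented behavior:

```python
from typing import Dict, List


def register_selector(
    registry: Dict[str, List[int]],
    name: str,
    frames: List[int],
    force: bool = False,
) -> None:
    """Store a selector under `name`, refusing to overwrite unless force=True."""
    if name in registry and not force:
        raise ValueError(
            f"Selector '{name}' already exists; pass force=True to overwrite."
        )
    registry[name] = frames


registry: Dict[str, List[int]] = {}
register_selector(registry, "clusters_0", [0, 1, 6])

try:
    # Second registration under the same name fails without force.
    register_selector(registry, "clusters_0", [2, 3])
except ValueError as err:
    print(err)

# With force=True the existing entry is replaced.
register_selector(registry, "clusters_0", [2, 3], force=True)
print(registry["clusters_0"])  # [2, 3]
```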

Examples

>>> # Create selectors for all non-noise clusters
>>> group = GroupCreationHelper.create_cluster_selectors(
...     pipeline_data, manager, "clusters", "dbscan_clustering"
... )
>>> print(group.selector_names)
['clusters_0', 'clusters_1', 'clusters_2']
>>> # Create selectors for specific clusters only
>>> group = GroupCreationHelper.create_cluster_selectors(
...     pipeline_data, manager, "folded", "clustering",
...     cluster_ids=[0, 1]
... )
>>> # Include ALL clusters (even noise)
>>> group = GroupCreationHelper.create_cluster_selectors(
...     pipeline_data, manager, "all_states", "clustering",
...     noise_id=None
... )

static create_tag_selectors(pipeline_data: PipelineData, manager: DataSelectorManager, group_name: str, tags: List[str] | None = None, force: bool = False) → DataSelectorGroup

Create data selectors for trajectory tags.

Creates one data selector per tag using the manager. Each selector contains all frames from trajectories that carry the given tag. The helper delegates selector creation to the manager and does not duplicate its logic.
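The idea of "all frames from trajectories with a tag" can be sketched as follows. This is an illustration only: frames_by_tag, the list-of-tag-lists input, and the global frame indexing across concatenated trajectories are assumptions, not the library's data model.

```python
from typing import Dict, List, Optional


def frames_by_tag(
    trajectory_tags: List[List[str]],
    trajectory_lengths: List[int],
    tags: Optional[List[str]] = None,
) -> Dict[str, List[int]]:
    """Map each tag to the global frame indices of trajectories carrying it."""
    if tags is None:
        # Collect every tag found in the trajectories, in first-seen order.
        tags = list(dict.fromkeys(t for traj in trajectory_tags for t in traj))
    result: Dict[str, List[int]] = {tag: [] for tag in tags}
    offset = 0  # global index of the current trajectory's first frame
    for traj_tags, n_frames in zip(trajectory_tags, trajectory_lengths):
        frame_ids = range(offset, offset + n_frames)
        for tag in tags:
            if tag in traj_tags:
                result[tag].extend(frame_ids)
        offset += n_frames
    return result


# Three trajectories with 3, 2, and 2 frames.
trajectory_tags = [["system_A"], ["system_B"], ["system_A", "biased"]]
trajectory_lengths = [3, 2, 2]
selected = frames_by_tag(
    trajectory_tags, trajectory_lengths, tags=["system_A", "system_B"]
)
print(selected)  # {'system_A': [0, 1, 2, 5, 6], 'system_B': [3, 4]}
```

Passing tags=None in this sketch would also produce an entry for "biased", mirroring the documented behavior of creating selectors for all available tags.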

Parameters

pipeline_data : PipelineData

Pipeline data object containing trajectory tag information

manager : DataSelectorManager

Manager instance for creating and managing selectors

group_name : str

Name for the selector group

tags : List[str], optional

Specific tags to create selectors for. If None, creates selectors for all available tags found in trajectories.

force : bool, default=False

Whether to overwrite existing selectors with the same names. If False, raises ValueError when a selector already exists.

Returns

DataSelectorGroup

Created group containing all generated selector names. Access the names via the group.selector_names attribute.

Raises

ValueError

If a selector with the same name already exists and force is False

Examples

>>> # Create selectors for specific tags
>>> group = GroupCreationHelper.create_tag_selectors(
...     pipeline_data, manager, "systems",
...     tags=["system_A", "system_B"]
... )
>>> print(group.selector_names)
['systems_system_A', 'systems_system_B']
>>> # Create selectors for all available tags
>>> group = GroupCreationHelper.create_tag_selectors(
...     pipeline_data, manager, "conditions", tags=None
... )
>>> print(group.selector_names)
['conditions_wild_type', 'conditions_mutant', 'conditions_biased']