Group Creation Helper
Helper for creating data selector groups from clusters and tags.
This module provides utilities to automatically create multiple data selectors from clustering results or trajectory tags.
- class mdxplain.data_selector.helper.group_creation_helper.GroupCreationHelper
Helper class for creating data selector groups.
This class provides static methods to automatically create multiple data selectors from clustering results or tags, organizing them into named groups.
Examples
>>> # Create selectors for all clusters
>>> group = GroupCreationHelper.create_cluster_selectors(
...     pipeline_data, manager, "clusters", "my_clustering"
... )
>>> print(group.selector_names)
['clusters_0', 'clusters_1', 'clusters_2']
- static create_cluster_selectors(pipeline_data: PipelineData, manager: DataSelectorManager, group_name: str, clustering_name: str, cluster_ids: List[int] | None = None, noise_id: int | None = -1, min_cluster_size: int | None = 2, force: bool = False) -> DataSelectorGroup
Create data selectors for clusters.
Creates one data selector per cluster using the manager. Noise clusters are filtered out by default.
Parameters
- pipeline_data : PipelineData
Pipeline data object containing clustering results.
- manager : DataSelectorManager
Manager instance for creating and managing selectors.
- group_name : str
Name for the selector group.
- clustering_name : str
Name of the clustering to use for selector creation.
- cluster_ids : List[int], optional
Specific cluster IDs to include. If None, includes all non-noise clusters.
- noise_id : int or None, default=-1
Cluster ID that represents noise/outliers to filter out.
If int: filters out this specific cluster ID (e.g., -1 for sklearn).
If None: no filtering; selectors are created for ALL cluster IDs.
- min_cluster_size : int or None, default=2
Minimum number of frames a cluster must contain to be included. The default of 2 excludes single-frame clusters, because decision trees need at least 2. If None, no size filter is applied (noise filtering still applies).
- force : bool, default=False
Whether to overwrite existing selectors with the same names. If False, raises ValueError when a selector already exists.
Returns
- DataSelectorGroup
Created group containing all generated selector names. Access selector names via group.selector_names attribute.
Raises
- ValueError
If clustering_name does not exist in pipeline_data
- ValueError
If selector already exists and force is False
Examples
>>> # Create selectors for all non-noise clusters
>>> group = GroupCreationHelper.create_cluster_selectors(
...     pipeline_data, manager, "clusters", "dbscan_clustering"
... )
>>> print(group.selector_names)
['clusters_0', 'clusters_1', 'clusters_2']

>>> # Create selectors for specific clusters only
>>> group = GroupCreationHelper.create_cluster_selectors(
...     pipeline_data, manager, "folded", "clustering",
...     cluster_ids=[0, 1]
... )

>>> # Include ALL clusters (even noise)
>>> group = GroupCreationHelper.create_cluster_selectors(
...     pipeline_data, manager, "all_states", "clustering",
...     noise_id=None
... )
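The interaction of cluster_ids, noise_id and min_cluster_size can be hard to picture from the parameter list alone. The following is a minimal, standalone sketch of that filtering, not mdxplain's actual implementation. The functions select_cluster_ids and selector_names are hypothetical, the "<group_name>_<cluster_id>" naming is inferred from the example output above, and applying the noise and size filters to an explicit cluster_ids list is an assumption.

```python
from collections import Counter
from typing import List, Optional, Sequence


def select_cluster_ids(
    labels: Sequence[int],
    cluster_ids: Optional[List[int]] = None,
    noise_id: Optional[int] = -1,
    min_cluster_size: Optional[int] = 2,
) -> List[int]:
    """Hypothetical helper: return the cluster IDs that would get a selector."""
    counts = Counter(labels)  # number of frames per cluster ID
    # Start from every observed ID unless the caller names specific ones.
    candidates = sorted(counts) if cluster_ids is None else list(cluster_ids)
    kept = []
    for cid in candidates:
        if noise_id is not None and cid == noise_id:
            continue  # skip the noise/outlier cluster
        if min_cluster_size is not None and counts.get(cid, 0) < min_cluster_size:
            continue  # skip clusters with too few frames
        kept.append(cid)
    return kept


def selector_names(group_name: str, ids: List[int]) -> List[str]:
    """Hypothetical helper: build names in the assumed '<group>_<id>' pattern."""
    return [f"{group_name}_{cid}" for cid in ids]


# Per-frame cluster labels: cluster 2 has one frame, -1 marks noise.
labels = [0, 0, 1, 1, 1, 2, -1, -1, -1]

default_ids = select_cluster_ids(labels)
all_ids = select_cluster_ids(labels, noise_id=None, min_cluster_size=None)

print(selector_names("clusters", default_ids))  # ['clusters_0', 'clusters_1']
print(all_ids)                                  # [-1, 0, 1, 2]
```

With the defaults, the noise cluster and the single-frame cluster 2 are both dropped; passing noise_id=None and min_cluster_size=None keeps every observed ID.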
- static create_tag_selectors(pipeline_data: PipelineData, manager: DataSelectorManager, group_name: str, tags: List[str] | None = None, force: bool = False) -> DataSelectorGroup
Create data selectors for trajectory tags.
Creates one data selector per tag using the manager. Each selector contains all frames from trajectories with the specified tag. The helper contains no selection logic of its own; selector creation is delegated entirely to the manager.
Parameters
- pipeline_data : PipelineData
Pipeline data object containing trajectory tag information.
- manager : DataSelectorManager
Manager instance for creating and managing selectors.
- group_name : str
Name for the selector group.
- tags : List[str], optional
Specific tags to create selectors for. If None, creates selectors for all available tags found in trajectories.
- force : bool, default=False
Whether to overwrite existing selectors with the same names. If False, raises ValueError when a selector already exists.
Returns
- DataSelectorGroup
Created group containing all generated selector names. Access selector names via group.selector_names attribute.
Raises
- ValueError
If selector already exists and force is False
Examples
>>> # Create selectors for specific tags
>>> group = GroupCreationHelper.create_tag_selectors(
...     pipeline_data, manager, "systems",
...     tags=["system_A", "system_B"]
... )
>>> print(group.selector_names)
['systems_system_A', 'systems_system_B']

>>> # Create selectors for all available tags
>>> group = GroupCreationHelper.create_tag_selectors(
...     pipeline_data, manager, "conditions", tags=None
... )
>>> print(group.selector_names)
['conditions_wild_type', 'conditions_mutant', 'conditions_biased']
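To illustrate the behaviour described for create_tag_selectors (one selector per tag, every frame of each tagged trajectory, ValueError on a name clash unless force is set), here is a rough standalone sketch. It does not use mdxplain: build_tag_selectors, the plain-dict registry, and the (trajectory, frame) tuples are all hypothetical stand-ins for the manager and pipeline data, and the "<group_name>_<tag>" naming is inferred from the example output above.

```python
from typing import Dict, List, Optional, Tuple


def build_tag_selectors(
    trajectory_tags: Dict[int, List[str]],
    trajectory_frames: Dict[int, int],
    registry: Dict[str, List[Tuple[int, int]]],
    group_name: str,
    tags: Optional[List[str]] = None,
    force: bool = False,
) -> List[str]:
    """Hypothetical stand-in: register one frame list per tag, return the names."""
    if tags is None:
        # Collect every tag found on any trajectory, keeping first-seen order.
        tags = list(dict.fromkeys(
            tag for traj_tags in trajectory_tags.values() for tag in traj_tags
        ))
    names = []
    for tag in tags:
        name = f"{group_name}_{tag}"  # assumed naming pattern
        if name in registry and not force:
            raise ValueError(f"Selector '{name}' already exists; use force=True")
        # All frames of every trajectory that carries this tag.
        registry[name] = [
            (traj, frame)
            for traj, traj_tags in trajectory_tags.items()
            if tag in traj_tags
            for frame in range(trajectory_frames[traj])
        ]
        names.append(name)
    return names


# Three toy trajectories: tags per trajectory and frame counts per trajectory.
tags_by_traj = {0: ["system_A"], 1: ["system_B"], 2: ["system_A"]}
frames_by_traj = {0: 3, 1: 2, 2: 1}
registry: Dict[str, List[Tuple[int, int]]] = {}

names = build_tag_selectors(tags_by_traj, frames_by_traj, registry, "systems")
print(names)  # ['systems_system_A', 'systems_system_B']
```

Calling build_tag_selectors a second time with the same group name raises ValueError, while passing force=True overwrites the existing entries, mirroring the documented force semantics.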