eschr.tl.consensus_cluster

eschr.tl.consensus_cluster(adata, zarr_loc, reduction='pca', metric=None, ensemble_size=150, auto_stop=False, k_range=(15, 150), la_res_range=(25, 175), nprocs=None, return_multires=False)

Run ensemble of clusterings and find consensus.

Runs an ensemble of Leiden clusterings on random subsamples of the input data, with hyperparameters drawn at random from the default (or user-specified) ranges. It then builds a bipartite graph from these results, in which each data instance has an edge to every cluster it was assigned to across the ensemble of clusterings. Bipartite community detection is run on the resulting graph to obtain the final hard and soft clusters.
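The bipartite graph construction described above can be sketched in plain Python. This is an illustration only, not the library's implementation: toy_clustering is a hypothetical stand-in for a Leiden run, the subsample fraction and toy data are assumptions, and the final bipartite community detection step is omitted.

```python
import random
from collections import defaultdict


def toy_clustering(values, rng):
    """Hypothetical stand-in for one Leiden run: split points at a jittered threshold."""
    threshold = 5.0 + rng.uniform(-1.0, 1.0)
    return {i: int(v > threshold) for i, v in values.items()}


def build_bipartite_edges(data, ensemble_size=10, subsample_frac=0.8, seed=0):
    """Return (data instance, cluster node) edges collected across the ensemble."""
    rng = random.Random(seed)
    n_sub = max(1, int(subsample_frac * len(data)))
    edges = []
    for member in range(ensemble_size):
        # Each ensemble member only sees a random subsample of the data.
        sampled = rng.sample(range(len(data)), n_sub)
        labels = toy_clustering({i: data[i] for i in sampled}, rng)
        for i, label in labels.items():
            # A cluster node is identified by (ensemble member, cluster label).
            edges.append((i, (member, label)))
    return edges


data = [0.0, 1.0, 2.0, 3.0, 7.0, 8.0, 9.0, 10.0]
edges = build_bipartite_edges(data)

# Group the cluster nodes each data instance is connected to.
clusters_per_point = defaultdict(set)
for point, cluster_node in edges:
    clusters_per_point[point].add(cluster_node)
```

Each data instance ends up with at most one edge per ensemble member, and instances left out of a subsample simply have no edge for that member.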

Parameters:
  • adata (anndata.AnnData) – AnnData object containing preprocessed data to be clustered in slot .X

  • zarr_loc (str) – Path to save zarr store which will hold the data to be clustered.

  • reduction ({'all', 'pca'}) – Method used for feature extraction/selection/dimensionality reduction, or 'all' to use all features. Currently only PCA is supported; alternative options will be added in future releases. Once other options are added, the default will be to randomly select a reduction for each ensemble member. For datasets with fewer than 10 features, all features are used.

  • metric ({'euclidean', 'cosine', None}) – Metric used for neighborhood graph construction. Can be “euclidean”, “cosine”, or None. Default is None, in which case the metric is randomly selected for each ensemble member. Other metrics will be added in future releases.

  • ensemble_size (int, default=150) – Number of clusterings to run in the ensemble.

  • k_range (tuple of (int, int)) – Lower and upper limits from which k is randomly selected for neighborhood graph construction.

  • la_res_range (tuple of (int, int)) – Lower and upper limits from which the resolution parameter is randomly selected for Leiden community detection.

  • nprocs (int, default=None) – Number of processes to run in parallel. If None, the value is set with multiprocessing.cpu_count(), which reports the number of available cores. A specified value is checked against that count, and the smaller of the two is used.

  • return_multires (bool, default=False) – Whether to add consensus results from all tested resolutions to the adata object. Default is False, as this can add substantial memory usage.
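The interplay of nprocs, k_range, la_res_range and metric can be sketched as follows. This is a minimal sketch under stated assumptions, not the library's code: resolve_nprocs and sample_member_params are hypothetical helper names, and uniform integer sampling within each range is assumed because the documentation does not state the sampling distribution.

```python
import multiprocessing
import random


def resolve_nprocs(nprocs=None):
    """Cap the requested process count at the number of detected cores."""
    available = multiprocessing.cpu_count()
    if nprocs is None:
        return available
    return min(nprocs, available)


def sample_member_params(k_range=(15, 150), la_res_range=(25, 175),
                         metric=None, rng=None):
    """Draw hyperparameters for one ensemble member (uniform sampling is assumed)."""
    rng = rng or random.Random()
    return {
        "k": rng.randint(*k_range),
        "la_res": rng.randint(*la_res_range),
        # With metric=None, a metric is picked at random for each member.
        "metric": metric if metric is not None
        else rng.choice(["euclidean", "cosine"]),
    }


rng = random.Random(0)
member_params = [sample_member_params(rng=rng) for _ in range(5)]
nprocs = resolve_nprocs(4)
```

Passing an explicit metric fixes it for every ensemble member, while k and the resolution are still drawn anew each time.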

Returns:
  The anndata.AnnData object, modified in place.
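A call using the documented signature might look like the sketch below. The wrapper name run_consensus and the path "eschr_data.zarr" are hypothetical, the import path is assumed from the function's qualified name, and the import is deferred so the snippet can be loaded without eschr installed. Because the object is modified in place, the return value of consensus_cluster is not used.

```python
# Keyword arguments mirroring the documented defaults.
CLUSTER_KWARGS = {
    "reduction": "pca",
    "metric": None,            # random metric per ensemble member
    "ensemble_size": 150,
    "k_range": (15, 150),
    "la_res_range": (25, 175),
    "nprocs": None,            # use all detected cores
    "return_multires": False,
}


def run_consensus(adata, zarr_loc="eschr_data.zarr"):
    """Run consensus clustering on a preprocessed AnnData object.

    zarr_loc is a hypothetical path; the zarr store is written there.
    """
    # Deferred import: requires the eschr package to be installed.
    from eschr.tl import consensus_cluster

    consensus_cluster(adata, zarr_loc, **CLUSTER_KWARGS)
    return adata  # same object, now holding the clustering results
```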