
ESCHR analysis of mouse hematopoiesis

Finding population structures in scRNA-seq data of myeloid and erythroid differentiation from Paul et al. (2015).

Setup environment

!pip install git+https://github.com/zunderlab/eschr.git
Successfully installed anndata-0.11.3 annoy-1.17.3 array-api-compat-1.10.0 asciitree-0.3.3 eschr-1.0.1 fasteners-0.19 igraph-0.11.8 legacy-api-wrap-1.4.1 leidenalg-0.10.2 numcodecs-0.15.0 pynndescent-0.5.13 scanpy-1.10.4 session-info-1.0.0 sklearn-ann-0.1.2 stdlib_list-0.11.0 texttable-1.7.0 umap-learn-0.5.7 zarr-2.18.4
import eschr as es
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib import rcParams
import scanpy as sc
sc.settings.verbosity = 0  # verbosity: errors (0), warnings (1), info (2), hints (3)
#sc.logging.print_versions()
results_file = './write/paul15.h5ad'
sc.settings.set_figure_params(dpi=80, frameon=False, figsize=(3, 3), facecolor='white')  # low dpi (dots per inch) yields small inline figures

Read in and preprocess data

adata = sc.datasets.paul15()
adata
AnnData object with n_obs × n_vars = 2730 × 3451
    obs: 'paul15_clusters'
    uns: 'iroot'

Apply a simple preprocessing recipe.

sc.pp.recipe_zheng17(adata)
adata
AnnData object with n_obs × n_vars = 2730 × 999
    obs: 'paul15_clusters', 'n_counts_all'
    var: 'n_counts', 'mean', 'std'
    uns: 'iroot', 'log1p'
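
The drop from 3451 to 999 genes comes from the gene selection inside the recipe. As a rough illustration of the kinds of steps a recipe like this applies (per-cell normalization, selection of highly variable genes, log transform, scaling), here is a minimal NumPy sketch on a toy count matrix. It is not Scanpy's implementation: the function name `toy_recipe`, the plain variance-to-mean dispersion ranking, and the random toy data are all assumptions made for illustration.

```python
import numpy as np

def toy_recipe(counts, n_top_genes):
    """Illustrative preprocessing of a cells x genes count matrix (not Scanpy's code)."""
    counts = np.asarray(counts, dtype=float)
    # Drop genes that were never detected.
    counts = counts[:, counts.sum(axis=0) > 0]
    # Normalize every cell to the median total count across cells.
    totals = counts.sum(axis=1, keepdims=True)
    norm = counts / totals * np.median(totals)
    # Rank genes by dispersion (variance / mean) and keep the top n_top_genes.
    dispersion = norm.var(axis=0) / norm.mean(axis=0)
    keep = np.sort(np.argsort(dispersion)[::-1][:n_top_genes])
    # Log-transform, then scale each kept gene to zero mean and unit variance.
    logged = np.log1p(norm[:, keep])
    std = logged.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant genes
    return (logged - logged.mean(axis=0)) / std

# Hypothetical toy data: 200 cells x 50 genes with gene-specific Poisson rates.
rng = np.random.default_rng(0)
toy_counts = rng.poisson(lam=rng.uniform(0.5, 5.0, size=50), size=(200, 50))
processed = toy_recipe(toy_counts, n_top_genes=10)
print(processed.shape)  # (200, 10)
```

In the notebook itself, `sc.pp.recipe_zheng17(adata)` performs the real preprocessing; the sketch only shows why the number of genes shrinks while the number of cells stays the same.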

Run ESCHR analysis

# Specify the path for creating the zarr store that
# will be used for interacting with your data.
zarr_loc = "./data/paul15.zarr"
# Now you can run the method with your prepped data!
# (add any optional hyperparameter specifications,
# but bear in mind the method was designed to work for
# diverse datasets with the default settings.)

# Note that Colab provides very few CPU cores (2 in the run below),
# so runtime will be substantially longer than on a machine with
# many cores, because the ensemble stage relies on multi-process
# parallelization.

adata = es.tl.consensus_cluster(
            adata=adata,
            zarr_loc=zarr_loc
        )
Multiprocessing will use 2 cores
making zarr
making new zarr
starting ensemble clustering multiprocess
Ensemble clustering finished in 152.601823728 seconds
starting consensus multiprocess
Final res: 0.35
Consensus clustering finished in 131.646210789 seconds
Final Clustering:
n hard clusters: 10
n soft clusters: 46
Full runtime: 284.5486943721771
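
The later cells read three results from the returned object: `adata.obs['hard_clusters']`, `adata.obs['uncertainty_score']`, and `adata.obsm['soft_membership_matrix']`. As a toy sketch of how such quantities can relate to one another, the example below assumes that a hard label is the cluster with the largest soft membership and that uncertainty is one minus that largest membership. Both definitions are assumptions for illustration and are not confirmed by this notebook as ESCHR's internals; the variable names are hypothetical.

```python
import numpy as np

# Toy soft membership matrix: 4 cells x 3 clusters, each row sums to 1.
soft = np.array([
    [0.875, 0.125, 0.00],
    [0.500, 0.500, 0.00],
    [0.000, 0.250, 0.75],
    [0.000, 0.000, 1.00],
])

# Assumed definition: hard label = column with the largest membership
# (ties resolve to the first such column).
hard = soft.argmax(axis=1)

# Assumed definition: uncertainty = 1 - largest membership in the row.
uncertainty = 1.0 - soft.max(axis=1)

print(hard.tolist())         # [0, 0, 2, 2]
print(uncertainty.tolist())  # [0.125, 0.5, 0.25, 0.0]
```

In this toy example the second column never holds the largest membership, so only two distinct hard labels appear even though the matrix has three columns. Whether the same effect accounts for the hard and soft cluster counts reported above is not established here.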

ESCHR visualizations

# Plot soft membership matrix heatmap visualization
es.pl.smm_heatmap(adata)
# Plot UMAP layout with points colored by
# ESCHR hard cluster labels and by uncertainty scores
es.pl.umap_heatmap(adata)
'X_umap'
No umap found - running umap...
[Figure: ESCHR plot output]

Scanpy visualizations

# You can also use Scanpy to prepare a umap layout
# (or swap in your favorite 2D layout)
# to visualize the clustering results
sc.tl.pca(adata, n_comps=50)
sc.pp.neighbors(adata, n_neighbors=4, n_pcs=20)
sc.tl.umap(adata)
sc.pl.umap(adata, color=['hard_clusters', 'uncertainty_score', 'paul15_clusters'], show=True, color_map='viridis_r', s=10)
# Copy columns 7 and 8 of the soft membership matrix into obs
# so they can be used as color keys in the UMAP plot
adata.obs['eschr_clust_7_membership'] = adata.obsm['soft_membership_matrix'][:, 7]
adata.obs['eschr_clust_8_membership'] = adata.obsm['soft_membership_matrix'][:, 8]
sc.pl.umap(adata, color=['eschr_clust_7_membership', 'eschr_clust_8_membership'], show=True, s=10)
# Marker genes for each expected cell type
cell_type_markers = {"Erythroid": ['Car1', 'Klf1'],
                     "Megakaryocytes": ['Pf4', 'Sdpr'],
                     "Dendritic cells": ['Cd74', 'H2-Aa'],
                     "Monocytes": ['Irf8', 'Flt3'],
                     "Neutrophils": ['Elane', 'Cebpe'],
                     "Basophils": ['Prss34', 'Lmo4'],
                     "Eosinophils": ['Prg2'],
                     "Natural killer cells": ['Xcl1', 'Ccl5']}
# Plot one UMAP panel set per cell type, colored by marker expression
for markers_ls in cell_type_markers.values():
    sc.pl.umap(adata, color=markers_ls, show=True, s=10)
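
Because preprocessing reduced the data to 999 genes, a marker gene may no longer be present in `adata.var_names`, and coloring a plot by a missing gene name raises an error. A minimal sketch of checking a marker dictionary against the available gene names before plotting follows. The helper `present_markers` and the toy gene list are hypothetical and are not part of ESCHR or Scanpy.

```python
def present_markers(markers, gene_names):
    """Split each cell type's markers into those found in and missing from gene_names."""
    available = set(gene_names)
    found, missing = {}, {}
    for cell_type, genes in markers.items():
        hits = [g for g in genes if g in available]
        misses = [g for g in genes if g not in available]
        # Only record non-empty lists so callers can loop over the results directly.
        if hits:
            found[cell_type] = hits
        if misses:
            missing[cell_type] = misses
    return found, missing

# Hypothetical toy inputs for illustration.
toy_genes = ["Car1", "Klf1", "Pf4", "Elane"]
toy_markers = {
    "Erythroid": ["Car1", "Klf1"],
    "Megakaryocytes": ["Pf4", "Sdpr"],
    "Eosinophils": ["Prg2"],
}
found, missing = present_markers(toy_markers, toy_genes)
print(found)    # {'Erythroid': ['Car1', 'Klf1'], 'Megakaryocytes': ['Pf4']}
print(missing)  # {'Megakaryocytes': ['Sdpr'], 'Eosinophils': ['Prg2']}
```

With the notebook's objects, the call would be `present_markers(cell_type_markers, adata.var_names)`, and the plotting loop could then iterate over `found.values()`.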

Downstream scverse analyses with ESCHR clusters

ESCHR clusters can also be used in downstream analyses with other scverse tools, such as PAGA (partition-based graph abstraction).

sc.tl.paga(adata, groups='hard_clusters')
plt.rcParams["figure.figsize"] = [4,4]
sc.pl.paga(adata, color=['hard_clusters'], threshold=0.2)
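
The `threshold=0.2` argument tells the plot not to draw edges whose weight is below 0.2, which keeps weakly connected cluster pairs from cluttering the graph. A small sketch of that filtering on a toy symmetric weight matrix follows. The matrix values and variable names are hypothetical, and the code only mimics the idea of the cutoff rather than reproducing Scanpy's plotting code.

```python
import numpy as np

# Toy symmetric connectivity matrix between 3 clusters (hypothetical values).
weights = np.array([
    [0.0, 0.8, 0.1],
    [0.8, 0.0, 0.3],
    [0.1, 0.3, 0.0],
])
threshold = 0.2

# Visit each cluster pair once (upper triangle) and keep edges at or above the cutoff.
rows, cols = np.triu_indices_from(weights, k=1)
edges = [
    (int(i), int(j), float(weights[i, j]))
    for i, j in zip(rows, cols)
    if weights[i, j] >= threshold
]
print(edges)  # [(0, 1, 0.8), (1, 2, 0.3)]
```

Lowering the threshold toward 0 would show every edge, including the weak link between clusters 0 and 2 in this toy matrix.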