
ESCHR analysis of mouse hematopoiesis

Finding population structures in scRNA-seq data of myeloid and erythroid differentiation from Paul et al. (2015).

Setup environment

!pip install git+https://github.com/zunderlab/eschr.git@change_api
  Resolved https://github.com/zunderlab/eschr.git to commit 6cbb9671b410327f6ccf1760dcde4d670d0465cc
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires pandas==2.0.3, but you have pandas 1.5.3 which is incompatible.
Successfully installed anndata-0.7.8 asciitree-0.3.3 eschr-0.0.1 fasteners-0.19 igraph-0.10.4 leidenalg-0.10.2 nmslib-2.1.1 numcodecs-0.12.1 pandas-1.5.3 pybind11-2.6.1 pynndescent-0.5.12 scanpy-1.9.8 session-info-1.0.0 stdlib_list-0.10.0 texttable-1.7.0 umap-learn-0.5.2 xlrd-1.2.0 zarr-2.17.2
import eschr as es
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib import rcParams
import scanpy as sc
sc.settings.verbosity = 0  # verbosity: errors (0), warnings (1), info (2), hints (3)
#sc.logging.print_versions()
results_file = './write/paul15.h5ad'
sc.settings.set_figure_params(dpi=80, frameon=False, figsize=(3, 3), facecolor='white')  # low dpi (dots per inch) yields small inline figures

Read in and preprocess data

adata = sc.datasets.paul15()
adata
AnnData object with n_obs × n_vars = 2730 × 3451
    obs: 'paul15_clusters'
    uns: 'iroot'

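Apply a simple preprocessing recipe (Scanpy's recipe_zheng17), which normalizes counts per cell, keeps only highly variable genes (999 here), log-transforms, and scales the data.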

sc.pp.recipe_zheng17(adata)
adata
AnnData object with n_obs × n_vars = 2730 × 999
    obs: 'paul15_clusters', 'n_counts_all'
    var: 'n_counts', 'mean', 'std'
    uns: 'iroot', 'log1p'
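
To make the preprocessing steps more concrete, here is a minimal NumPy sketch of per-cell normalization, log transformation, and per-gene scaling on a toy count matrix. It is a simplified illustration, not Scanpy's implementation: it omits gene filtering and highly variable gene selection, and the function name normalize_log_scale, the toy data, and the clipping value are hypothetical choices made for this example.

```python
import numpy as np

def normalize_log_scale(counts, max_value=10.0):
    """Toy normalization, log transform, and gene scaling (illustration only)."""
    counts = np.asarray(counts, dtype=float)
    # Rescale each cell (row) so its counts sum to the median library size.
    # Assumes every cell has at least one count.
    totals = counts.sum(axis=1, keepdims=True)
    normalized = counts / totals * np.median(totals)
    # Log-transform to compress the dynamic range.
    logged = np.log1p(normalized)
    # Center each gene (column), scale it to unit variance, then clip.
    mean = logged.mean(axis=0)
    std = logged.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # avoid division by zero for constant genes
    scaled = (logged - mean) / std
    return np.clip(scaled, -max_value, max_value)

# Made-up counts: 4 cells (rows) x 3 genes (columns).
toy_counts = np.array([[1, 0, 3], [2, 2, 0], [0, 5, 5], [4, 1, 1]])
scaled = normalize_log_scale(toy_counts)
print(scaled.shape)
```

After this kind of scaling, every gene has mean 0 and unit variance across cells, so no single highly expressed gene dominates distance computations downstream.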

Run ESCHR analysis

# Specify the path for creating the zarr store that
# will be used for interacting with your data.
zarr_loc = "./data/paul15.zarr"
# Now you can run the method with your prepped data!
# (add any optional hyperparameter specifications,
# but bear in mind the method was designed to work for
# diverse datasets with the default settings.)

# Note that Colab offers only a few cores (this run used 2), so runtime
# will be substantially longer than on a machine with many cores,
# because the multi-process parallelization in the ensemble stage
# has little to parallelize over.

adata = es.tl.consensus_cluster(
    adata=adata,
    zarr_loc=zarr_loc,
)
Multiprocessing will use 2 cores
making zarr
storing zarr data object as ./data/paul15.zarr
starting ensemble clustering multiprocess
Ensemble clustering finished in 301.419547791 seconds
starting consensus multiprocess
Final res: 0.35
Consensus clustering finished in 79.92671029600001 seconds
Final Clustering:
n hard clusters: 9
n soft clusters: 36
Full runtime: 381.61227440834045
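
Because runtime depends on how many cores are available, it can help to check the core count before starting a long run. The snippet below is a small sketch using only the Python standard library. How ESCHR itself chooses the number of worker processes is not shown in this notebook, so this reports what the machine offers rather than what ESCHR will use.

```python
import os

# os.cpu_count() may return None when the count cannot be determined,
# so fall back to 1 in that case.
n_cores = os.cpu_count() or 1

if n_cores <= 2:
    print(f"Only {n_cores} core(s) available; expect a long ensemble stage.")
else:
    print(f"{n_cores} cores available for multiprocessing.")
```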

ESCHR visualizations

# Plot soft membership matrix heatmap visualization
es.pl.smm_heatmap(adata)
# Plot UMAP layout with points colored by
# ESCHR hard cluster labels and by uncertainty scores
es.pl.umap_heatmap(adata)
'X_umap'
No umap found - running umap...
[Figure: UMAP layout with points colored by ESCHR hard cluster labels and by uncertainty scores]
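
The relationship between a soft membership matrix, hard cluster labels, and a per-cell uncertainty score can be sketched on toy data. The example below uses one common convention: the hard label is the cluster with the highest membership, and uncertainty is one minus that highest membership. Whether ESCHR computes its 'hard_clusters' and 'uncertainty_score' columns in exactly this way is an assumption here, and the matrix values are made up.

```python
import numpy as np

# Toy soft membership matrix: 4 cells (rows) x 3 clusters (columns).
# Each row sums to 1; the values are invented for illustration.
soft_membership = np.array([
    [0.90, 0.10, 0.00],
    [0.50, 0.50, 0.00],
    [0.05, 0.15, 0.80],
    [0.00, 1.00, 0.00],
])

# Assumption: hard label = index of the cluster with the highest membership
# (ties resolve to the first such cluster).
hard_labels = soft_membership.argmax(axis=1)

# Assumption: uncertainty = 1 - highest membership for that cell.
uncertainty = 1.0 - soft_membership.max(axis=1)

print(hard_labels)
print(uncertainty)
```

Under this convention, a cell split evenly between two clusters (the second row) gets a high uncertainty, while a cell belonging entirely to one cluster (the last row) gets an uncertainty of zero.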

Scanpy visualizations

# You can also use Scanpy to prepare a umap layout
# (or swap in your favorite 2D layout)
# to visualize the clustering results
sc.tl.pca(adata, n_comps=50)
sc.pp.neighbors(adata, n_neighbors=4, n_pcs=20)
sc.tl.umap(adata)
# Compare ESCHR hard clusters and uncertainty scores with the published annotation
sc.pl.umap(adata, color=['hard_clusters', 'uncertainty_score', 'paul15_clusters'], show=True, color_map='viridis_r', s=10)
# Copy the membership values for clusters 7 and 8 into obs so they can be plotted
adata.obs['eschr_clust_7_membership'] = adata.obsm['soft_membership_matrix'][:, 7]
adata.obs['eschr_clust_8_membership'] = adata.obsm['soft_membership_matrix'][:, 8]
sc.pl.umap(adata, color=['eschr_clust_7_membership', 'eschr_clust_8_membership'], show=True, s=10)
# Marker genes for each expected cell type
cell_type_markers = {"Erythroid": ['Car1', 'Klf1'],
                     "Megakaryocytes": ['Pf4', 'Sdpr'],
                     "Dendritic cells": ['Cd74', 'H2-Aa'],
                     "Monocytes": ['Irf8', 'Flt3'],
                     "Neutrophils": ['Elane', 'Cebpe'],
                     "Basophils": ['Prss34', 'Lmo4'],
                     "Eosinophils": ['Prg2'],
                     "Natural killer cells": ['Xcl1', 'Ccl5']}
# Plot one UMAP panel set per cell type, colored by marker expression
for markers_ls in cell_type_markers.values():
    sc.pl.umap(adata, color=markers_ls, show=True, s=10)
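
Since preprocessing reduced the data to 999 genes, a marker that did not survive gene selection would likely make the plotting call fail on another dataset or with different preprocessing settings. A minimal sketch of a guard is shown below. The helper name keep_available_markers is hypothetical, and the toy gene list stands in for adata.var_names.

```python
def keep_available_markers(markers_by_type, available_genes):
    """Drop marker genes that are absent from the dataset."""
    available = set(available_genes)
    filtered = {}
    for cell_type, genes in markers_by_type.items():
        present = [gene for gene in genes if gene in available]
        if present:  # skip cell types with no remaining markers
            filtered[cell_type] = present
    return filtered

# Toy inputs: in the notebook these would be cell_type_markers and adata.var_names.
toy_markers = {"Erythroid": ["Car1", "Klf1"], "Eosinophils": ["Prg2"]}
toy_genes = ["Car1", "Elane", "Klf1"]
print(keep_available_markers(toy_markers, toy_genes))
```

In the notebook, the filtered dictionary could be looped over in place of cell_type_markers, so that only genes present in the preprocessed data are passed to sc.pl.umap.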

Downstream scverse analyses with ESCHR clusters

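ESCHR hard cluster labels can be passed to external downstream analyses such as PAGA (partition-based graph abstraction), which here summarizes the connectivity between the ESCHR clusters as a graph.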

sc.tl.paga(adata, groups='hard_clusters')
plt.rcParams["figure.figsize"] = [4,4]
sc.pl.paga(adata, color=['hard_clusters'], threshold=0.2)