ESCHR analysis of mouse hematopoiesis
Finding population structures in scRNA-seq data on myeloid and erythroid differentiation from Paul et al. (2015).
Set up the environment
!pip install git+https://github.com/zunderlab/eschr.git
Successfully installed anndata-0.7.8 asciitree-0.3.3 eschr-0.0.1 fasteners-0.19 igraph-0.10.4 leidenalg-0.10.2 nmslib-2.1.1 numcodecs-0.12.1 pybind11-2.6.1 pynndescent-0.5.11 scanpy-1.9.8 session-info-1.0.0 stdlib_list-0.10.0 texttable-1.7.0 umap-learn-0.5.2 xlrd-1.2.0 zarr-2.17.1
import eschr as es
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib import rcParams
import scanpy as sc
sc.settings.verbosity = 0 # verbosity: errors (0), warnings (1), info (2), hints (3)
#sc.logging.print_versions()
results_file = './write/paul15.h5ad'
sc.settings.set_figure_params(dpi=80, frameon=False, figsize=(3, 3), facecolor='white') # low dpi (dots per inch) yields small inline figures
Read in and preprocess data
adata = sc.datasets.paul15()
adata
AnnData object with n_obs × n_vars = 2730 × 3451
obs: 'paul15_clusters'
uns: 'iroot'
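Apply a simple preprocessing recipe, Scanpy's sc.pp.recipe_zheng17 (after Zheng et al. 2017). It normalizes the counts, selects highly variable genes, and log-transforms and scales the data. Here this reduces the matrix from 3451 to 999 genes.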
sc.pp.recipe_zheng17(adata)
adata
AnnData object with n_obs × n_vars = 2730 × 999
obs: 'paul15_clusters', 'n_counts_all'
var: 'n_counts', 'mean', 'std'
uns: 'iroot', 'log1p'
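The following is a minimal, dependency-free sketch of a Zheng17-style recipe applied to a toy count matrix. It is meant only to make the steps concrete. It is a simplification, not Scanpy's implementation. The helper name zheng17_like is hypothetical, and genes are ranked by the variance of normalized counts instead of the 'cell_ranger' dispersion criterion that, as far as we understand, Scanpy uses.

```python
import math
import statistics


def zheng17_like(counts, n_top_genes):
    """Simplified Zheng17-style preprocessing (hypothetical helper).

    counts: dense cells x genes matrix as a list of lists of raw counts.
    Returns (scaled matrix, indices of the kept genes in the input).
    """
    # 1. Drop genes that are never detected in any cell.
    n_genes = len(counts[0])
    keep = [j for j in range(n_genes) if any(row[j] > 0 for row in counts)]
    counts = [[row[j] for j in keep] for row in counts]

    def normalize(mat):
        # Rescale each cell so its total equals the median cell total.
        totals = [sum(row) for row in mat]
        target = statistics.median(totals)
        return [[x * target / t if t > 0 else 0.0 for x in row]
                for row, t in zip(mat, totals)]

    # 2. Normalize, then rank genes by variance (simplified HVG selection).
    norm = normalize(counts)
    variances = [statistics.pvariance([row[j] for row in norm])
                 for j in range(len(keep))]
    ranked = sorted(range(len(keep)), key=lambda j: variances[j], reverse=True)
    top = sorted(ranked[:n_top_genes])

    # 3. Renormalize the raw counts restricted to the selected genes.
    subset = normalize([[row[j] for j in top] for row in counts])

    # 4. log1p transform, then 5. scale each gene to mean 0, variance 1.
    logged = [[math.log1p(x) for x in row] for row in subset]
    scaled_cols = []
    for j in range(len(top)):
        col = [row[j] for row in logged]
        mu = statistics.mean(col)
        sd = statistics.stdev(col) if len(col) > 1 else 0.0
        sd = sd if sd > 0 else 1.0  # constant genes end up all zeros
        scaled_cols.append([(x - mu) / sd for x in col])
    scaled = [list(row) for row in zip(*scaled_cols)]
    return scaled, [keep[j] for j in top]


# Toy data: 4 cells x 4 genes; gene 1 is never detected.
toy_counts = [[1, 0, 5, 2],
              [3, 0, 1, 0],
              [0, 0, 8, 4],
              [2, 0, 0, 1]]
scaled, kept_genes = zheng17_like(toy_counts, n_top_genes=2)
print(kept_genes)
```

The genes are renormalized after selection because the original per-cell totals included genes that have since been dropped.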
Run ESCHR analysis
# Create the zarr store that ESCHR will use to access your data
zarr_loc = "./data/paul15.zarr"
es.make_zarr(data=adata.X, zarr_loc=zarr_loc)
# Initialize a ConsensusCluster instance.
# You can try changing any of the optional hyperparameters, but bear in
# mind that the method was designed to work across diverse datasets
# with the default settings.
# Note: Colab runs on a single core, so the runtime will be substantially
# longer than on a multi-core machine. The multiprocess parallelization
# of the ensemble stage cannot take effect on a single core.
cc_obj = es.ConsensusCluster(zarr_loc=zarr_loc)
# Now you can run the method with your prepped data:
cc_obj.consensus_cluster()
starting ensemble clustering multiprocess
Ensemble clustering finished in -1710787346.8872783 seconds
starting consensus multiprocess
Program finished in 107.39319952300002 seconds
Final res: 0.425
Final Clustering:
n hard clusters: 11
n soft clusters: 52
Full runtime: 500.18497228622437
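The log above shows that ESCHR runs an ensemble clustering stage followed by a consensus stage. The sketch below illustrates the general idea of ensemble consensus with a co-association matrix. This is a generic textbook technique and not ESCHR's own algorithm. The matrix records the fraction of ensemble runs in which two cells share a cluster, and the sketch groups cells that are co-clustered in more than half of the runs. The function names and the 0.5 threshold are hypothetical.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of ensemble runs in which each pair of cells shares a cluster.

    labelings: list of runs, each a list with one cluster label per cell.
    Labels only need to be consistent within a run, not across runs.
    """
    n_runs = len(labelings)
    n_cells = len(labelings[0])
    counts = [[0] * n_cells for _ in range(n_cells)]
    for labels in labelings:
        for i, j in combinations(range(n_cells), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    co = [[c / n_runs for c in row] for row in counts]
    for i in range(n_cells):
        co[i][i] = 1.0  # a cell always co-clusters with itself
    return co


def consensus_labels(co, threshold=0.5):
    """Group cells linked by co-association > threshold (union-find)."""
    n = len(co)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)
    # Relabel the groups 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


ensemble = [[0, 0, 1, 1, 2],
            [0, 0, 1, 1, 1],
            [1, 1, 0, 0, 2]]  # run 3 uses permuted label names
co = co_association(ensemble)
print(consensus_labels(co))  # → [0, 0, 1, 1, 2]
```

Pairwise co-association does not depend on how each run names its clusters, so run-to-run label permutations do not matter. The trade-off is an n_cells x n_cells matrix, which becomes expensive for large datasets.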
# For most built-in visualizations, and for compatibility with the
# scverse suite of tools, next generate an AnnData object containing
# all ESCHR outputs. The ConsensusCluster method make_adata does this:
# it adds the AnnData object as an attribute of the ConsensusCluster
# object, and it can also add the ESCHR clustering results to an
# existing AnnData object:
adata = cc_obj.make_adata(data=adata, return_adata=True)
adata
AnnData object with n_obs × n_vars = 2730 × 999
obs: 'paul15_clusters', 'n_counts_all', 'hard_clusters', 'uncertainty_score'
var: 'n_counts', 'mean', 'std'
uns: 'iroot', 'log1p'
obsm: 'soft_membership_matrix'
ESCHR visualizations
Scanpy visualizations
# You can also use Scanpy to compute a UMAP layout
# (or swap in your favorite 2D layout)
# to visualize the clustering results
sc.tl.pca(adata, n_comps=50)
sc.pp.neighbors(adata, n_neighbors=4, n_pcs=20)
sc.tl.umap(adata)
sc.pl.umap(adata, color=['hard_clusters', 'uncertainty_score', 'paul15_clusters'], show=True, color_map='viridis_r', s=10)
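In the code above, sc.pp.neighbors builds a k-nearest-neighbor graph on the first 20 principal components, and UMAP (and later PAGA) work from that graph. The brute-force sketch below shows only the core step of finding each point's nearest neighbors by Euclidean distance. Scanpy additionally turns the neighbors into weighted connectivities and may use approximate search on larger datasets. The function knn_indices is hypothetical.

```python
import math


def knn_indices(points, k):
    """Brute-force k nearest neighbours by Euclidean distance.

    Returns, for each point, the indices of its k closest other points.
    """
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j)
                       for j, q in enumerate(points) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


# Toy 2D "PCA coordinates": two well-separated pairs of cells.
coords = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
print(knn_indices(coords, k=1))  # → [[1], [0], [3], [2]]
```

A very small k, such as the n_neighbors=4 used here, emphasizes local structure in the layout at the cost of global connectivity.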
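# Copy the soft memberships for ESCHR clusters 8 and 9 (columns 8 and 9
# of the soft membership matrix) into .obs so they can be plotted
adata.obs['eschr_clust_8_membership'] = adata.obsm['soft_membership_matrix'][:, 8]
adata.obs['eschr_clust_9_membership'] = adata.obsm['soft_membership_matrix'][:, 9]
sc.pl.umap(adata, color=['eschr_clust_8_membership', 'eschr_clust_9_membership'], show=True, s=10)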
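You can also compare the two membership columns directly. For example, you might list the cells that belong partly to both cluster 8 and cluster 9. The following is a minimal sketch. The function shared_cells and its 0.2 cutoff are hypothetical choices, not ESCHR defaults.

```python
def shared_cells(memb_a, memb_b, threshold=0.2):
    """Positions of cells whose membership exceeds `threshold` in both clusters."""
    return [i for i, (a, b) in enumerate(zip(memb_a, memb_b))
            if a > threshold and b > threshold]


# Toy membership values for four cells in two clusters.
memb_8 = [0.90, 0.50, 0.10, 0.30]
memb_9 = [0.05, 0.40, 0.80, 0.60]
print(shared_cells(memb_8, memb_9))  # → [1, 3]
```

In the notebook, the toy lists could be replaced with adata.obs['eschr_clust_8_membership'] and adata.obs['eschr_clust_9_membership']. The returned positions can then be mapped to cell names through adata.obs_names.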
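cell_type_markers = {"Erythroid": ['Car1', 'Klf1'],
                     "Megakaryocytes": ['Pf4', 'Sdpr'],
                     "Dendritic cells": ['Cd74', 'H2-Aa'],
                     "Monocytes": ['Irf8', 'Flt3'],
                     "Neutrophils": ['Elane', 'Cebpe'],
                     "Basophils": ['Prss34', 'Lmo4'],
                     "Eosinophils": ['Prg2'],
                     "Natural killer cells": ['Xcl1', 'Ccl5']}
# Plot the markers for each cell type, skipping any gene that was
# removed during gene filtering (i.e. not in adata.var_names)
for markers_ls in cell_type_markers.values():
    markers_ls = [g for g in markers_ls if g in adata.var_names]
    if markers_ls:
        sc.pl.umap(adata, color=markers_ls, show=True, s=10)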
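The preprocessing recipe keeps only about 1,000 highly variable genes, so some markers may be missing from adata.var_names. The small helper below reports which markers survived filtering. It is hypothetical and not part of ESCHR or Scanpy, and the toy gene names are placeholders.

```python
def split_markers(markers, available):
    """Split each cell type's markers into genes present in / missing from `available`."""
    available = set(available)
    present, missing = {}, {}
    for cell_type, genes in markers.items():
        present[cell_type] = [g for g in genes if g in available]
        missing[cell_type] = [g for g in genes if g not in available]
    return present, missing


# Toy example with placeholder gene names.
present, missing = split_markers({"TypeA": ["g1", "g2"], "TypeB": ["g3"]},
                                 ["g1", "g3"])
print(missing)  # → {'TypeA': ['g2'], 'TypeB': []}
```

In the notebook, you could call split_markers(cell_type_markers, adata.var_names) to see which markers cannot be plotted.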
Downstream scverse analyses with ESCHR clusters
ESCHR hard clusters can be used in external downstream analyses, such as PAGA (partition-based graph abstraction) in Scanpy.
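# PAGA expects the grouping column to be categorical
adata.obs['hard_clusters'] = adata.obs['hard_clusters'].astype('category')
sc.tl.paga(adata, groups='hard_clusters')
sc.pl.paga(adata)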
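PAGA summarizes the kNN graph at the level of clusters and scores how strongly each pair of clusters is connected. As its authors describe it, that score relates the observed inter-cluster edges to the number expected under a random model. The sketch below is a simplified stand-in and not PAGA's statistic: it only counts the raw kNN edges that link cells in different clusters. The function name is hypothetical.

```python
from collections import Counter


def cluster_edge_counts(neighbours, labels):
    """Count kNN edges joining cells of different clusters.

    neighbours: for each cell, a list of neighbour indices (directed edges).
    labels: cluster label of each cell.
    Returns a Counter keyed by sorted (cluster_a, cluster_b) pairs; each
    directed edge is counted once, so a mutual neighbour pair counts twice.
    """
    counts = Counter()
    for i, nbrs in enumerate(neighbours):
        for j in nbrs:
            a, b = labels[i], labels[j]
            if a != b:
                counts[tuple(sorted((a, b)))] += 1
    return counts


# Toy graph: cell 3 (cluster 1) has a neighbour in cluster 0.
neighbours = [[1], [0], [3], [1]]
labels = [0, 0, 1, 1]
print(cluster_edge_counts(neighbours, labels))  # → Counter({(0, 1): 1})
```

Raw counts favor large clusters, which is why PAGA normalizes against an expected edge count instead.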