Accelerating Single Cell Genomic Analysis using RAPIDS

The human body is made up of nearly 40 trillion cells, of many different types. Recent advances in experimental biology have made it possible to explore the genetic material of single cells. With the birth of this new field of single-cell genomics, scientists can now probe the DNA and RNA of individual cells in the human body.

Single-cell genomic analysis has identified new types of cells in the human body, revealed what makes these cells different from each other, and shown how different types of cells respond to disease or drugs. Single-cell genomics has also proven key in the current COVID-19 pandemic, identifying cells susceptible to infection and revealing changes in the immune systems of infected patients.

Figure 1. Workflow for a single-cell RNA sequencing experiment. Individual cells are isolated and gene activity is measured in each cell. Cells with similar gene activity are clustered together to identify the various types of cells in the population.

The availability of single-cell data is continuously increasing, as are dataset sizes, with recent experiments sequencing millions of cells. Analysis of this data is often exploratory and benefits from being interactive: researchers need to identify different types of cells at finer scales, compare the cell types, and visualize the relationships between them. Current workflows are still too slow for this kind of interactive analysis.

RAPIDS: Accelerating data science with GPUs

RAPIDS is a suite of open-source libraries that can speed up end-to-end data science workflows through the power of GPU acceleration. RAPIDS makes it possible to perform interactive data analysis on large datasets using Python APIs that closely resemble NumPy, Pandas, and scikit-learn.

Consider a typical workflow for single-cell analysis. It begins with a matrix that holds the count of each gene encountered in each cell. Preprocessing steps filter out noise, then the data is normalized to obtain the activity of every human gene in every individual cell of the dataset. Machine learning is also commonly used in this step to correct artifacts from data collection. Next, you perform dimensionality reduction, followed by clustering and visualization, to identify clusters of cells with similar genetic activity. Finally, you compare the genetic activity of these cell clusters to understand why different types of cells behave and respond differently.
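As a rough illustration of the filtering and normalization steps described above, here is a minimal pure-Python sketch. The post does not spell out the exact preprocessing used, so this assumes one common choice: drop cells that express too few genes, scale each remaining cell to a fixed total count, and apply a log transform. The function name `preprocess` and the parameters `min_genes` and `target_sum` are hypothetical and chosen only for this example; a real workflow would use Scanpy or RAPIDS on a sparse matrix instead of Python lists.

```python
import math


def preprocess(counts, min_genes=1, target_sum=10_000.0):
    """Filter and normalize a cells-by-genes matrix of raw counts.

    counts: list of rows, one per cell; each row holds the raw count per gene.
    min_genes and target_sum are illustrative values, not taken from the post.
    """
    processed = []
    for cell in counts:
        n_expressed = sum(1 for c in cell if c > 0)
        total = sum(cell)
        # Filter: skip cells with too few expressed genes (likely noise).
        if n_expressed < min_genes or total == 0:
            continue
        # Normalize: scale the cell to a fixed total, then log-transform.
        processed.append([math.log1p(c / total * target_sum) for c in cell])
    return processed


raw = [
    [0, 0, 0],    # empty measurement, removed by the filter
    [1, 3, 0],
    [10, 30, 0],  # same proportions as the previous cell, sequenced 10x deeper
]
normalized = preprocess(raw)
print(len(normalized), "cells kept")
```

After normalization, the two remaining cells have matching values even though one was sequenced more deeply, which is the purpose of scaling each cell to the same total.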

Figure 2. Pipeline showing the steps in the analysis of single-cell RNA sequencing data. Starting from a matrix of gene activity in every cell, RAPIDS libraries can be used to perform data processing, dimensionality reduction, clustering, and visualization, and to discover genes with differential activity across clusters.

We released a GPU-accelerated version of this exact workflow in the clara-parabricks/rapids-single-cell-examples GitHub repo. The repo contains an example notebook that uses RAPIDS and Scanpy to analyze a dataset of 70,000 human lung cells, to identify cells that are susceptible to COVID-19. Scanpy is a toolkit for analyzing single-cell gene expression data, with options to accelerate specific commands using RAPIDS. We also have a CPU version of this notebook in the repo for comparison.

For example, running UMAP to visualize almost 70,000 cells with RAPIDS requires the following command:

sc.tl.umap(adata, min_dist=umap_min_dist, spread=umap_spread, method='rapids')

Figure 3. UMAP visualization of approximately 70,000 cells from human lung samples, grouped into 35 clusters, created by RAPIDS. Cells are labeled by Louvain clustering.

Generating this UMAP visualization takes one second using RAPIDS, compared to 80 seconds on a CPU. In fact, RAPIDS can accelerate the entire single-cell analysis workflow, making it possible to do interactive exploratory data analysis even on large datasets.
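To compare the GPU and CPU paths of the UMAP command yourself, one option is to wrap the call in a small helper that switches the `method` argument. The following is a sketch, not code from the notebooks: the helper names `umap_kwargs` and `run_umap` are hypothetical, the `min_dist` and `spread` defaults are placeholders for the notebook's `umap_min_dist` and `umap_spread`, and whether `method='rapids'` is available may depend on the installed Scanpy version.

```python
def umap_kwargs(use_gpu, min_dist=0.5, spread=1.0):
    """Build keyword arguments for Scanpy's sc.tl.umap.

    min_dist and spread are placeholder values; the notebook defines its own
    umap_min_dist and umap_spread.
    """
    return {
        "min_dist": min_dist,
        "spread": spread,
        # 'rapids' selects the GPU path; 'umap' is the CPU implementation.
        "method": "rapids" if use_gpu else "umap",
    }


def run_umap(adata, use_gpu=True):
    """Compute a UMAP embedding for an AnnData object on GPU or CPU."""
    import scanpy as sc  # imported lazily so umap_kwargs works without Scanpy

    # Assumes the neighbor graph already exists (for example, from sc.pp.neighbors).
    sc.tl.umap(adata, **umap_kwargs(use_gpu))
    return adata.obsm["X_umap"]


print(umap_kwargs(use_gpu=True))
print(umap_kwargs(use_gpu=False))
```

Timing `run_umap(adata, use_gpu=True)` against `run_umap(adata, use_gpu=False)` on the same data gives a like-for-like comparison of the two paths.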

| Instance | m5a.12xlarge | p3.2xlarge | Acceleration Factor |
|---|---|---|---|
| CPU/GPU type | Intel Xeon Platinum 8000, 48 vCPUs | V100-16GB | |
| Preprocessing | 311 | 84 | 4 |
| PCA | 18 | 3.4 | 5 |
| t-SNE | 208 | 2.2 | 95 |
| k-Means clustering | 31 | 0.4 | 78 |
| KNN | 25 | 6.1 | 4 |
| UMAP | 80 | 1 | 80 |
| Louvain clustering | 17 | 0.3 | 57 |
| Differential Gene Expression | 54 | 10.8 | 5 |
| End-to-end | 787 (13 min) | 134 (2 min) | 6 |
| Instance Price/hr ($) | 2.064 | 3.06 | |
| Total Run Cost ($) | 0.451 | 0.114 | 4 |

Table 1. CPU runtime, GPU runtime, and GPU acceleration for each step in the analysis of approximately 70,000 human lung cells. All times are in seconds.

Analyzing one million cells in 11 minutes

Next, we applied our RAPIDS analysis workflow to one of the largest single-cell datasets available: one million mouse brain cells sequenced by 10X Genomics. For more information, see the 1M_brain_gpu_analysis_uvm.ipynb Jupyter notebook.

With this scale of data, analysis on CPUs becomes impractically slow; our end-to-end workflow took over three hours to run on an AWS M5a CPU instance. This makes interactive analysis virtually impossible. On the other hand, we observed even higher GPU acceleration on this larger dataset and were able to analyze the entire dataset in just over 11 minutes on a single GPU. Running the RAPIDS analysis on AWS was also 3x cheaper than the CPU version!

| AWS Instance | m5a.12xlarge | p3.8xlarge | Acceleration Factor |
|---|---|---|---|
| CPU/GPU type | Intel Xeon Platinum 8000, 48 vCPUs | V100-16GB | |
| Preprocessing | 4033 | 323 | 12.5 |
| PCA | 34 | 20.6 | 1.7 |
| t-SNE | 5417 | 41 | 132.1 |
| k-Means clustering | 106 | 2.1 | 50.5 |
| KNN | 585 | 53.4 | 11.0 |
| UMAP | 1751 | 20.3 | 86.3 |
| Louvain clustering | 597 | 2.5 | 238.8 |
| End-to-end | 13002 | 672.7 | 19.3 |
| Instance Price/hr ($) | 2.064 | 12.24 | |
| Total Run Cost ($) | 7.455 | 2.287 | 3.3 |

Table 2. CPU runtime, GPU runtime, and GPU acceleration for each step in the analysis of 1 million mouse brain cells. All times are in seconds.
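The acceleration factors and run costs in Tables 1 and 2 can be reproduced from the end-to-end times and hourly instance prices. The short sketch below assumes the run cost is simply the runtime multiplied by the hourly price, which matches the table values to within rounding; the function names `speedup` and `run_cost` are chosen for this example only.

```python
def speedup(cpu_seconds, gpu_seconds):
    """Acceleration factor: how many times faster the GPU run is."""
    return cpu_seconds / gpu_seconds


def run_cost(seconds, price_per_hour):
    """Cost in dollars, assuming billing proportional to runtime."""
    return seconds / 3600 * price_per_hour


# (end-to-end seconds, instance price per hour in $), taken from Tables 1 and 2.
runs = {
    "70,000 lung cells": {"cpu": (787, 2.064), "gpu": (134, 3.06)},
    "1M brain cells": {"cpu": (13002, 2.064), "gpu": (672.7, 12.24)},
}

for name, run in runs.items():
    cpu_seconds, cpu_price = run["cpu"]
    gpu_seconds, gpu_price = run["gpu"]
    cpu_cost = run_cost(cpu_seconds, cpu_price)
    gpu_cost = run_cost(gpu_seconds, gpu_price)
    print(
        f"{name}: {speedup(cpu_seconds, gpu_seconds):.1f}x faster, "
        f"CPU ${cpu_cost:.3f} vs GPU ${gpu_cost:.3f} "
        f"({cpu_cost / gpu_cost:.1f}x cheaper)"
    )
```

Although the GPU instances cost more per hour, the much shorter runtime makes the total cost of each run lower.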

A GPU-powered cell browser for interactive single-cell analysis

As I mentioned earlier, the speed of data analysis with RAPIDS enables researchers to analyze data interactively in real time. We made this process even easier by developing a GPU-powered interactive cell browser that runs within a Jupyter notebook. Within this cell browser, you can visualize all the cells in a dataset and cluster your data through point-and-click interaction. Using RAPIDS, these steps run in real time.

In this post, I show how you can easily select a group of cells and perform UMAP and Louvain clustering to identify subpopulations within this cell type.

Figure 4. Point-and-click re-clustering of a selected group of cells in real time, by using RAPIDS in an interactive cell browser.
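The select-and-recluster interaction can be approximated outside the browser with a few lines of Python. This is only a sketch under stated assumptions and not the cell browser's actual implementation: `select_in_box` and `recluster` are hypothetical helpers, the rectangle selection stands in for the mouse selection, and the `'rapids'` options of the Scanpy calls may depend on the installed Scanpy version.

```python
def select_in_box(embedding, x_range, y_range):
    """Return indices of points whose 2-D coordinates fall inside a rectangle."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    return [
        i
        for i, (x, y) in enumerate(embedding)
        if x_min <= x <= x_max and y_min <= y <= y_max
    ]


def recluster(adata, indices):
    """Re-run neighbors, Louvain clustering, and UMAP on the selected cells only."""
    import scanpy as sc  # imported lazily so select_in_box works without Scanpy

    subset = adata[indices].copy()
    sc.pp.neighbors(subset)
    sc.tl.louvain(subset, flavor="rapids")
    sc.tl.umap(subset, method="rapids")
    return subset


# Toy 2-D coordinates standing in for a UMAP embedding.
points = [(0.1, 0.2), (5.0, 5.0), (0.4, 0.3), (-3.0, 1.0)]
selected = select_in_box(points, x_range=(0.0, 1.0), y_range=(0.0, 1.0))
print(selected)
```

With a real dataset, you would pass the rows of `adata.obsm["X_umap"]` to `select_in_box` and then call `recluster(adata, selected)` to look for subpopulations within the selected group.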

Conclusion

In this post, you saw how easy it is to use RAPIDS to accelerate single-cell genomic analysis on GPUs. With RAPIDS, it becomes easy to explore the data interactively in real time, cluster cells at different scales, and re-analyze large datasets with different parameters. All of this enables faster scientific discoveries. 

In addition to the APIs covered, RAPIDS has a large library of other algorithms that you might find useful in your work. For more information, see the clara-parabricks/rapids-single-cell-examples GitHub repo for this work as well as RAPIDS.