Cell Imaging Feature Extraction and Morphology Clustering for Spatial Omics

Live cell image showing cell segmentations.

VISTA-2D is a new foundational model from NVIDIA that can quickly and accurately perform cell segmentation, a fundamental task in cell imaging and spatial omics workflows that is critical to the accuracy of all downstream tasks. 

The VISTA-2D model uses an image encoder to create image embeddings, which it then decodes into segmentation masks (Figure 1). Because the masks are derived from these embeddings, the embeddings must encode information about the morphology of each cell.

Diagram shows the data that VISTA-2D was trained on as input, the architecture of the VISTA-2D model, and the output segmentation maps.
Figure 1. VISTA-2D network architecture

If you could generate an embedding for every segmented cell, you could cluster those embeddings to automatically group together cells with similar morphologies.

In this post, I walk you through an accompanying Jupyter notebook in depth to show how these tools can be used to first segment cells and extract their spatial features using VISTA-2D, then cluster these cell features using RAPIDS. This creates an automated pipeline to quickly classify cell types. 

Prerequisites

To follow along with this tutorial, you’ll need the following resources:

  • A basic familiarity with Python, Jupyter, and Docker
  • Docker version 19.03+ 

Starting the notebook 

The code for this Jupyter notebook is in the /clara-parabricks-workflows/vista2d_rapids_clustering GitHub repo and runs within the PyTorch Docker container from NGC. The notebook was built using the 24.03-py3 tag for this container. Run the container using the following command:

docker run --rm -it \
    -v /path/to/this/repo/:/workspace \
    -p 8888:8888 \
    --gpus all \
    nvcr.io/nvidia/pytorch:24.03-py3 \
    /bin/bash

This command does the following:

  • Starts the PyTorch container from NGC in interactive mode and removes it on exit (--rm -it).
  • Mounts the folder for the repo into the container at /workspace.
  • Maps port 8888 on the host machine to port 8888 inside the container.
  • Makes all available GPUs visible to the container.
  • Returns a Bash terminal inside the container.

Next, you need a few additional Python packages, which can be found in requirements.txt.

fastremap
tifffile 
monai
plotly

These packages are mostly for helper functions and plotting, which becomes more apparent later in this post. For now, they can be installed on top of the Docker container: 

pip install -r requirements.txt 
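
Before starting the notebook, it can help to confirm that the packages installed cleanly. The following is a minimal sketch and not part of the original notebook. The helper name missing_packages is hypothetical, and the sketch assumes that each package is imported under the same name that appears in requirements.txt.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if find_spec(name) is None]


# Packages listed in requirements.txt
required = ["fastremap", "tifffile", "monai", "plotly"]
missing = missing_packages(required)
print("Missing packages:", missing if missing else "none")
```

If the list is not empty, rerun the pip command inside the container before continuing.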

Next, start the notebook: 

jupyter notebook 

Now the notebook server is running and the notebook can be accessed using a web browser, either on the same machine on which the server is running or on a separate machine. 

In a browser, enter the IP address of the machine on which the server is running, followed by a colon and port 8888:

<ip-address>:8888

Now the notebook is ready to run. For more information, see the /clara-parabricks-workflows/vista2d_rapids_clustering GitHub repo.

Cell segmentation and feature extraction with VISTA-2D 

The first half of this notebook uses data from Live Cell in combination with VISTA-2D to segment the cells in the image and extract features using the encoding layer from the VISTA-2D model itself. 

First, load a VISTA-2D model checkpoint, as this notebook does not focus on training the model, but on using it for the purposes of feature extraction. 

model_ckpt = "cell_vista_segmentation/results/model.pt"

Next, import the helper functions, which live in a separate module so that the main notebook stays short.

from segmentation import segment_cells, plot_segmentation, feature_extract

The next sections offer more information about what these helper functions do. They can all be found in segmentation.py.

segment_cells 

This function takes the cell image and runs it through VISTA-2D from start to finish. This results in two additional images, one for the full segmentation, and one with every cell labeled from 1 to the number of cells that were found in the image (referred to in the notebook as pred_mask). This enables the cells to be individually indexed for feature extraction down the line. 

img_path="example_livecell_image.tif"
patch, segmentation, pred_mask = segment_cells(img_path, model_ckpt)
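
To make the labeling convention concrete, here is a small sketch that uses a toy array in place of pred_mask. The array values are invented for illustration, and the sketch assumes that labels are contiguous integers, with 0 for the background.

```python
import numpy as np

# Toy stand-in for pred_mask: 0 is background, 1..N are individual cells
toy_mask = np.array([
    [0, 1, 1, 0, 0],
    [0, 1, 0, 2, 2],
    [3, 0, 0, 2, 2],
])

num_cells = int(toy_mask.max())          # highest label equals the cell count
cell_2_pixels = toy_mask == 2            # boolean mask selecting only cell 2
cell_2_area = int(cell_2_pixels.sum())   # number of pixels covered by cell 2

print(num_cells, cell_2_area)
```

Comparing the mask against a single label is what lets each cell be pulled out on its own later in the pipeline.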

plot_segmentation

This function takes the output of segment_cells and displays the images so they can be visually verified for accuracy in the segmentation and prediction masks. Figure 2 shows an example of what the output should look like using the cell image provided in the notebook. 

plot_segmentation(patch, segmentation, pred_mask)

Three images show the result of the VISTA-2D segmentation: the original cell image, segmentation of all the cells from the background, and individual masks for each cell.
Figure 2. Example output of plot_segmentation for the cell image provided in the notebook

feature_extract

This function takes every individual cell segmentation and generates a feature vector. Each cell is contained in a square mask cropped to just fit the one cell and any surrounding background. It uses the first half of the VISTA-2D model as an encoder to generate these feature vectors. 

The idea is that the resulting vector contains all the information that’s needed for cell segmentation and thus must also contain information about the morphology of each cell. This information, as a vector, can easily be plugged into a clustering algorithm. Cells with similar morphologies should have similar feature vectors and be assigned to the same cluster.

cell_features = feature_extract(pred_mask, patch, model_ckpt) 

This results in a matrix that has num_cells rows and 1024 columns, which is the length of the encoding vector for each cell. 
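
The per-cell cropping step can be pictured with the following sketch. It is an assumption about how such a crop could be done, not the code in segmentation.py. The function square_crop is hypothetical, and it works on a 2D NumPy image with a label mask of the same shape.

```python
import numpy as np


def square_crop(image, pred_mask, cell_id):
    """Crop a square window from image that just fits the cell with cell_id."""
    rows, cols = np.nonzero(pred_mask == cell_id)
    if rows.size == 0:
        raise ValueError(f"cell {cell_id} not found in mask")

    # Tight bounding box around the cell
    r0, r1 = rows.min(), rows.max() + 1
    c0, c1 = cols.min(), cols.max() + 1

    # Grow the shorter side so that the window is square
    side = max(r1 - r0, c1 - c0)

    # Shift the window back inside the image if it would run off the edge
    # (if side exceeds an image dimension, the crop is truncated instead)
    r0 = max(0, min(r0, image.shape[0] - side))
    c0 = max(0, min(c0, image.shape[1] - side))
    return image[r0:r0 + side, c0:c0 + side]


image = np.arange(36).reshape(6, 6)
mask = np.zeros((6, 6), dtype=int)
mask[1:3, 1:5] = 1                      # one cell, 2 rows by 4 columns
crop = square_crop(image, mask, cell_id=1)
print(crop.shape)
```

Each crop like this would then be passed through the encoder to produce one row of the feature matrix.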

Now that you have the feature vectors for every cell, it’s time to run them all through a clustering algorithm using RAPIDS. 

Clustering with RAPIDS 

RAPIDS is a suite of GPU-accelerated data science and machine learning libraries with APIs that match commonly used Python libraries, such as pandas and scikit-learn. In this notebook, you use only the dimensionality reduction and clustering algorithms from RAPIDS cuML, but many more are available.

from cuml import TruncatedSVD, DBSCAN

TruncatedSVD

The feature vectors that you get from VISTA-2D are of length 1024. However, as you only have on the order of 80 cells in the image, there are far more features than samples, and it doesn’t make sense to cluster on this many features.

You can use dimensionality reduction algorithms to reduce the length of these embeddings while minimizing the information lost. In this notebook, use the Truncated SVD algorithm to reduce the dimensions from 1024 to 3. This also makes it easier to plot the clusters, as you can visualize the clusters in 3D space. 

dim_red_model = TruncatedSVD(n_components=3)
X = dim_red_model.fit_transform(cell_features)

This results in the new matrix of feature vectors, X, which is now of size [num_cells, 3] instead of the original vectors in cell_features, which were of size [num_cells, 1024].
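
To see what the reduction does, the following CPU-only sketch reproduces the same idea with NumPy. It is an illustration under stated assumptions: the feature matrix is random stand-in data, and the signs of the components may differ from what cuML returns.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(80, 1024))   # random stand-in for cell_features

# Thin SVD: features = U @ diag(S) @ Vt, singular values sorted descending
U, S, Vt = np.linalg.svd(features, full_matrices=False)

k = 3
X_sketch = U[:, :k] * S[:k]              # coordinates along the top-k components

# Fraction of the total squared singular values kept by the top-k components
retained = float((S[:k] ** 2).sum() / (S ** 2).sum())
print(X_sketch.shape, round(retained, 3))
```

Checking the retained fraction on your own features is one way to judge how much information three components preserve.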

DBSCAN

There are many clustering algorithms available in RAPIDS. For this notebook, I chose DBSCAN. Here, you set eps (the maximum distance between two points for them to be considered neighbors) to 0.003 and min_samples (the minimum number of samples needed to form a cluster) to 2.

model = DBSCAN(eps=0.003, min_samples=2)
labels = model.fit_predict(X)
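
For intuition about how eps and min_samples interact, here is a brute-force sketch of the DBSCAN idea in NumPy. It is not how cuML implements the algorithm, and the function name dbscan_sketch is hypothetical. It assumes that a point counts as its own neighbor, so min_samples=2 means a point plus one other point within eps.

```python
import numpy as np


def dbscan_sketch(points, eps, min_samples):
    """Label each point with a cluster index, or -1 for noise."""
    points = np.asarray(points, dtype=float)
    n = len(points)

    # Pairwise Euclidean distances (O(n^2) memory, fine for small n)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    is_core = [len(nb) >= min_samples for nb in neighbors]

    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        # Grow a new cluster outward from this unvisited core point
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue  # border points join a cluster but do not extend it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels


toy_points = [[0, 0], [0, 0.001], [0.002, 0], [1, 1], [1, 1.001], [5, 5]]
toy_labels = dbscan_sketch(toy_points, eps=0.003, min_samples=2)
print(toy_labels)
```

In this toy data, the isolated point at (5, 5) has no neighbor within eps, so it keeps the noise label -1.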

Running fit_predict now yields a cluster label for each cell in the image. If you convert the list of labels to a dictionary of labels, it is easier to see which cells have been assigned to which clusters. 

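import numpy as np

# Background is 0, so cell IDs start at 1
labels_dict = {x:np.add(np.where(labels==x),1) for x in np.unique(labels)}

# Label -1 marks noise points that DBSCAN did not assign to any cluster,
# so remove it (the default avoids a KeyError when there is no noise)
labels_dict.pop(-1, None)
labels_dict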
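
If the dictionary comprehension is hard to read, the same grouping can be written as an explicit loop. This is an alternative sketch with a hypothetical helper name, group_cells_by_cluster. It returns plain Python lists rather than the NumPy arrays stored in labels_dict, so it is not a drop-in replacement for the plotting code that follows.

```python
from collections import defaultdict


def group_cells_by_cluster(labels):
    """Map each cluster label to the list of cell IDs assigned to it."""
    clusters = defaultdict(list)
    for index, label in enumerate(labels):
        if label == -1:
            continue                              # skip noise points
        clusters[int(label)].append(index + 1)    # cell IDs start at 1
    return dict(clusters)


example = group_cells_by_cluster([0, 0, -1, 1, 0, 1])
print(example)
```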

Lastly, you can use Plotly to configure a 3D interactive plot to show where each cell was clustered. 

import plotly.graph_objects as go
import plotly.offline

data = []

for l in labels_dict.keys():
    
    cluster_indices = labels_dict[l][0]-1

    # Configure the trace
    trace = go.Scatter3d(
        x=X[cluster_indices,0],  
        y=X[cluster_indices,1],  
        z=X[cluster_indices,2],
        name="Cluster "+str(l),
        mode='markers',
        marker={
            'size': 10,
            'opacity': 0.8,
        }
    )
    
    data.append(trace)

# Configure the layout
layout = go.Layout(
    margin={'l': 0, 'r': 0, 'b': 0, 't': 0}
)

plot_figure = go.Figure(data=data, layout=layout)

# Render the plot
plotly.offline.iplot(plot_figure)

A 3D scatter plot where each color represents a different cluster found by RAPIDS.
Figure 3. Interactive 3D diagram that results from the plot of the clustered feature vectors
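
To relate the clusters back to the original image, one option is to paint each cell in the mask with its cluster label. The following is a sketch that goes one step beyond the notebook. The helper cluster_map is hypothetical, and it assumes that the mask and the labels are NumPy integer arrays and that cell i in the mask corresponds to entry i-1 of the labels.

```python
import numpy as np


def cluster_map(pred_mask, labels):
    """Return an image where 0 is background, 1 is noise, 2+ are clusters."""
    labels = np.asarray(labels)
    # Lookup table indexed by cell ID; entry 0 is the background
    lut = np.concatenate(([0], labels + 2))
    return lut[pred_mask]


toy_mask = np.array([[0, 1], [2, 3]])
toy_labels = [0, -1, 1]            # cell 2 was labeled as noise
painted = cluster_map(toy_mask, toy_labels)
print(painted)
```

Displaying the painted image next to the original makes it easier to check visually whether cells in the same cluster really do look alike.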

Conclusion

In this post, I showed you how you can use the VISTA-2D model to segment cells in an image and extract feature vectors from each of those segmented cells. I also demonstrated how to use RAPIDS to run clustering on those vectors. 
