Plot PCA, UMAP and tSNE
We have already generated PCA, UMAP and tSNE embeddings for the dataset. However, it would be helpful to plot these embeddings and understand how our clusters look with different dimensionality reduction methods.
For users running the workflow:
For the tSNE plot, scroll to the first “Scanpy PlotEmbed” component of the workflow. All parameters are left at their default values.
You can add a title under “Figure title” and use the edit button next to any other parameter to change its value.
For the UMAP plot, scroll to the second “Scanpy PlotEmbed” component of the workflow. Again, all parameters are left at their default values.
The third “Scanpy PlotEmbed” component plots the PCA. The same options apply here.
For users running each step:
The first plot to generate is the tSNE plot
For the tSNE plot, open up the tool “Scanpy PlotEmbed” and enter tsne under “name of the embedding to plot”
Under “Field for gene symbols”, enter Symbol
You can add a title under “Figure title”
You can change any other plot parameters that you would like to modify
Click on “Execute”
The next plot to generate is the UMAP plot
For the UMAP plot, open up the tool “Scanpy PlotEmbed” and enter umap under “name of the embedding to plot”
Under “Field for gene symbols”, enter Symbol
You can add a title under “Figure title”
You can change any other plot parameters that you would like to modify
Click on “Execute”
The final plot to generate is the PCA plot
For the PCA plot, open up the tool “Scanpy PlotEmbed” and enter pca under “name of the embedding to plot”
Under “Field for gene symbols”, enter Symbol
You can add a title under “Figure title”
You can change any other plot parameters that you would like to modify
Click on “Execute”
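For readers who prefer scripting, the three plots above can be approximated directly in Scanpy. The following is a minimal sketch, not the tool's actual implementation: it assumes the Galaxy tool corresponds to Scanpy's `sc.pl.embedding` function, that the AnnData object already holds the tSNE, UMAP and PCA embeddings, and that the cluster labels live in a column called `louvain` (an assumed name; use whatever your clustering step produced). The helper names `embedding_plot_kwargs` and `plot_all_embeddings` are hypothetical.

```python
def embedding_plot_kwargs(basis, color="louvain", gene_symbols="Symbol", title=None):
    """Collect keyword arguments for scanpy.pl.embedding.

    color="louvain" is an assumed name for the cluster column in adata.obs.
    gene_symbols="Symbol" mirrors the "Field for gene symbols" setting.
    """
    kwargs = {"basis": basis, "color": color, "gene_symbols": gene_symbols}
    if title is not None:
        # Corresponds to the optional "Figure title" field
        kwargs["title"] = title
    return kwargs


def plot_all_embeddings(adata, bases=("tsne", "umap", "pca")):
    """Draw one scatter plot per embedding stored in adata.obsm."""
    # Imported here so the helper above can be used without Scanpy installed
    import scanpy as sc

    for basis in bases:
        kwargs = embedding_plot_kwargs(basis, title=f"{basis} embedding")
        # show=False returns the matplotlib axes instead of displaying the plot
        sc.pl.embedding(adata, show=False, **kwargs)
```

Calling `plot_all_embeddings(adata)` on a processed AnnData object would draw the tSNE, UMAP and PCA plots in the same order as the steps above.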
Your PCA, tSNE, and UMAP plots are now ready to view. Each plot shows the data in a reduced number of dimensions, giving a broad view of how the clusters are arranged. The PCA plot is generally less helpful than the UMAP and tSNE plots because it shows only one principal component (PC) against another (usually PC1 vs. PC2), as in Figure 1. UMAP and tSNE plots give more insight into the clusters, such as how they are distributed, and can also help reveal a batch effect. A batch effect is technical variation between samples, such as systematically higher gene expression in some samples than in others, that does not reflect the underlying biology. Let’s look at some plots for the PCA, tSNE and UMAP dimensionality reductions.
The PCA plot above shows only the first PC against the second, so it captures just part of the variation in the data
Because two PCs rarely capture the full complexity of the data, the UMAP and tSNE plots are usually more informative
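One way to judge how much a PC1 vs. PC2 plot leaves out is to add up the fraction of variance explained by the first two PCs. The sketch below uses made-up variance ratios purely for illustration; they are not taken from this dataset. In Scanpy, the real values are typically stored in `adata.uns["pca"]["variance_ratio"]` after PCA has been run. The function name `cumulative_fraction` is hypothetical.

```python
def cumulative_fraction(variance_ratio, n_components=2):
    """Fraction of total variance covered by the first n_components PCs."""
    return sum(variance_ratio[:n_components])


# Hypothetical per-PC variance ratios, for illustration only
example_ratios = [0.12, 0.08, 0.05, 0.04, 0.03]

covered = cumulative_fraction(example_ratios)
print(f"PC1 + PC2 explain {covered:.0%} of the variance")
```

If the first two PCs cover only a small share of the variance, a two-dimensional PCA plot will hide much of the structure that UMAP and tSNE try to preserve.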
The tSNE plot above shows the 7 clusters obtained from clustering
The UMAP plot above shows the same 7 clusters
The UMAP or tSNE plot can also be used to look for batch effects in your data. Ideally, each cluster contains cells from all samples in roughly equal proportions. In the UMAP plot below, however, the marked cluster consists mainly of cells from sample N705. When one sample is overrepresented in a cluster or cell type like this, it may indicate a batch effect, which we want to minimize. You should examine your UMAP to check for such patterns. Tools such as Harmony and Seurat’s CCA method can be used to correct batch effects.
The marked area appears to consist mostly of cells from N705
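Visual inspection can be backed up with a simple count of how many cells each sample contributes to each cluster. The sketch below is one possible way to do this, not part of the workflow itself. The labels are toy values for illustration only (N701 and N702 are invented sample names alongside N705), the 0.75 threshold is arbitrary, and the function names are hypothetical. With real data, the two label lists would come from the cluster and sample columns of `adata.obs`.

```python
from collections import Counter, defaultdict


def sample_fractions_per_cluster(clusters, samples):
    """Return {cluster: {sample: fraction of that cluster's cells}}."""
    counts = defaultdict(Counter)
    for cluster, sample in zip(clusters, samples):
        counts[cluster][sample] += 1
    return {
        cluster: {s: n / sum(c.values()) for s, n in c.items()}
        for cluster, c in counts.items()
    }


def dominated_clusters(fractions, threshold=0.8):
    """List (cluster, sample) pairs where one sample exceeds the threshold."""
    return [
        (cluster, sample)
        for cluster, per_sample in fractions.items()
        for sample, fraction in per_sample.items()
        if fraction > threshold
    ]


# Toy labels, for illustration only: one entry per cell
clusters = ["0", "0", "0", "0", "1", "1", "1", "1", "1"]
samples = ["N701", "N702", "N705", "N701", "N705", "N705", "N705", "N705", "N702"]

fractions = sample_fractions_per_cluster(clusters, samples)
print(dominated_clusters(fractions, threshold=0.75))
```

A cluster flagged this way is only a candidate for closer inspection: a sample can dominate a cluster for biological reasons as well as technical ones.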