Extended Data Fig. 4: UMI curves from the raw data together with various CellBender outputs for the pbmc8k and rat6k datasets.

(a-d) pbmc8k and (e-h) rat6k. (a,e) The raw UMI curves, annotated with the regions corresponding to cells and empty droplets. Notably, the distinction is much harder to draw in (e), the nuclei dataset extracted from heart tissue. (b,f) Cell probabilities inferred by CellBender, shown on the same UMI curves as in (a) and (e), respectively. The region of transition from “surely-cell” to “surely-empty” is much broader in the snRNA-seq dataset. (c,g) First two principal components of the latent gene expression embedding inferred by CellBender, colored by Leiden clustering from a separate scanpy analysis. The structure of the embedding closely reflects the cluster labels assigned by that separate analysis. (d,h) Scatter plots showing the removal of each gene by CellBender (each dot is a gene; MALAT1 is off-scale). Several of the top denoised genes are labeled.