Extended Data Fig. 5: Evaluating PENCIL on simulated datasets with batch effects.
From: Supervised learning of high-confidence phenotypic subpopulations from single-cell data

a, UMAP based on the manually curated genes, showing cells of the two conditions from two batches, with the batches separated by the dashed line. b, UMAP based on the manually curated genes, showing cells of the two conditions after batch correction. c, UMAP based on the top 3,000 MVGs, showing all cells. d, Genes selected by PENCIL. e, UMAP based on the PENCIL-selected genes, showing the PENCIL-selected cells. f, Venn diagram showing the overlap between the ground-truth phenotype-enriched subpopulations and the PENCIL-selected cells. g, Box plots comparing the performance of PENCIL, Milo, DAseq and MELD on simulated datasets with batch effects at mixing rates of 0, 0.1, 0.2 and 0.3 (n = 50 simulations). In the box plots, the center line represents the median and the box bounds represent the upper and lower quartiles. Whiskers extend to the largest and smallest values lying within 1.5 times the interquartile range of the quartiles.
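The whisker rule in panel g is the standard Tukey box-plot convention. The sketch below illustrates it for a single box. It assumes quartiles computed with NumPy's default linear interpolation, because the caption does not state which quartile definition the plotting software used. The function name `box_plot_stats` and the toy input values are hypothetical and are not data from the figure.

```python
import numpy as np

def box_plot_stats(values, whis=1.5):
    """Median, quartiles and Tukey-style whiskers for one box.

    Assumption: quartiles use NumPy's default (linear) percentile method;
    the figure's actual quartile definition is not stated in the caption.
    """
    x = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    # Fences lie whis * IQR (1.5 x IQR by default) beyond the quartiles.
    lo_fence = q1 - whis * iqr
    hi_fence = q3 + whis * iqr
    # Whiskers end at the most extreme observations still inside the fences.
    whisker_low = x[x >= lo_fence].min()
    whisker_high = x[x <= hi_fence].max()
    # Points beyond the fences would be drawn individually as outliers.
    outliers = x[(x < lo_fence) | (x > hi_fence)]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": whisker_low,
        "whisker_high": whisker_high,
        "outliers": outliers,
    }

# Hypothetical toy scores, not taken from the figure.
stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats["q1"], stats["median"], stats["q3"])  # 3.25 5.5 7.75
print(stats["whisker_low"], stats["whisker_high"], stats["outliers"])
```

For these toy values the IQR is 4.5, so the upper fence sits at 14.5. The whisker therefore stops at 9 rather than at the extreme value 100, which is reported as an outlier.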