Extended Data Fig. 1: A simple simulation of cells from two conditions and three cell types, in which each cell expresses only two genes (X_1 and X_2).
From: Supervised learning of high-confidence phenotypic subpopulations from single-cell data

a, Cells from the two conditions plotted on the two genes and colored by condition label. b, Standard clustering of the cells; cell numbers are shown in parentheses. c, Percentage of cells from each condition within each cluster. d, Phenotypic subpopulations identified by the clustering-based method. e, Prediction model learned by PENCIL; the orange line is the decision boundary where the prediction score h(x) = 0, which separates the two conditions. Cells are colored by condition label as in a. f, Rejection model learned by PENCIL; the green curve is the boundary where the confidence score r(x) = 0, which determines which cells are rejected. Cells are colored by condition label as in a. g, Phenotypic subpopulations identified by PENCIL.
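The following is a minimal sketch of how the two learned functions in panels e and f could be combined into subpopulation calls like those in panel g. It assumes, as in learning-with-rejection formulations, that a cell is rejected when r(x) ≤ 0 and that a retained cell is assigned to a condition by the sign of h(x). The functions `h` and `r` are hypothetical stand-ins (a straight line and a circle in the X_1/X_2 plane), not PENCIL's trained models, and `assign_subpopulations` is a hypothetical helper name.

```python
import numpy as np


def assign_subpopulations(h_scores, r_scores, labels=("condition A", "condition B")):
    """Map prediction scores h(x) and confidence scores r(x) to per-cell calls.

    Assumption (not confirmed by the caption): a cell is rejected when
    r(x) <= 0; otherwise it goes to labels[1] if h(x) > 0, else labels[0].
    """
    h_vals = np.asarray(h_scores, dtype=float)
    r_vals = np.asarray(r_scores, dtype=float)
    # Classify every cell by which side of the h(x) = 0 boundary it falls on.
    calls = np.where(h_vals > 0, labels[1], labels[0]).astype(object)
    # Overwrite low-confidence cells (r(x) <= 0) with a rejection label.
    calls[r_vals <= 0] = "rejected"
    return calls


def h(x):
    """Hypothetical linear prediction model: h(x) = 0 is the line X_1 = X_2."""
    return x[:, 0] - x[:, 1]


def r(x):
    """Hypothetical confidence model: r(x) = 0 is the unit circle."""
    return np.sum(x ** 2, axis=1) - 1.0


# Three cells described by their two gene values (X_1, X_2).
cells = np.array([[2.0, 0.0], [0.0, 2.0], [0.1, 0.2]])
print(list(assign_subpopulations(h(cells), r(cells))))
```

With these toy functions, the first two cells lie outside the circle and are classified by the sign of h(x), while the third lies inside it (r(x) = -0.95) and is rejected. Keeping prediction and rejection as separate functions mirrors how panels e and f show two distinct boundaries.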