Extended Data Fig. 1: Identifying discrete clusters and continuous expression programs. | Nature Genetics


From: Pan-cancer single-cell RNA-seq identifies recurring programs of cellular heterogeneity


a, Performance of DBSCAN using different sizes of epsilon neighborhood (eps) and minimum numbers of points required to form a dense region (MinPts). We randomly selected cells from two different cell lines and tested the ability of DBSCAN to distinguish between them (two-sided Fisher’s exact test, P < 0.001) using different parameter combinations. The procedure was repeated 1,000 times, and the combination yielding the highest rate of correct classification was applied in subsequent analyses. b, t-SNE plots for two additional example cell lines from each of the four classes defined by the presence and number of discrete subpopulations identified by DBSCAN (as in Fig. 2b). c, d, Identification of discrete programs of heterogeneity, as in Fig. 2b, using less stringent eps values (1.2 and 1.5) highlights common trends. e, Number of heterogeneity programs identified per cell line using NMF. NMF was applied to each cell line with k (number of factors) of 6–9, and gene programs identified as variable with two or more values of k were retained (left panel, n = 1,445). To identify common expression programs varying within multiple cell lines, we excluded programs with limited similarity to all other programs as well as those associated with technical confounders (right panel, n = 800). f, Pairwise similarities between programs identified by NMF across all the cell lines analyzed, with cell lines ordered by hierarchical clustering. Programs with limited similarity to all other programs were excluded. The top panel indicates correlations between program scores and cell complexity (that is, the number of genes detected per cell). The cluster of programs that correlates with complexity (indicated by dashed lines) was excluded from subsequent analyses.
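The parameter search in panel a can be illustrated with a small, dependency-free sketch. It is a minimal illustration under stated assumptions, not the authors' code. The caption does not give the input space, the exact success criterion, or the parameter grid. Here DBSCAN is assumed to run on 2-D embedding coordinates (for example t-SNE). A trial is assumed to count as correct when DBSCAN returns exactly two non-noise clusters whose membership is associated with cell-line origin by a two-sided Fisher's exact test at P < 0.001. The helpers `separates_lines`, `sample_two_lines` and `sweep` are hypothetical names, and the toy Gaussian "cell lines" stand in for real data.

```python
import math
import random
from itertools import product


def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN on 2-D points; returns one label per point, -1 = noise."""
    n = len(points)
    labels = [None] * n  # None marks a point that has not been visited yet

    def neighbors(i):
        # The eps-neighbourhood includes the point itself (as in scikit-learn)
        xi, yi = points[i]
        return [j for j in range(n)
                if math.hypot(points[j][0] - xi, points[j][1] - yi) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point -> border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbors(j)
            if len(j_seeds) >= min_pts:
                queue.extend(j_seeds)  # j is a core point: keep expanding
    return labels


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def p_table(x):
        # Hypergeometric probability of the table with top-left cell x (margins fixed)
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_obs = p_table(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one
    return sum(p for p in map(p_table, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


def separates_lines(labels, origin, alpha=1e-3):
    """ASSUMED success criterion: exactly two non-noise clusters whose membership
    is associated with cell-line origin (two-sided Fisher P < alpha)."""
    clusters = sorted({lab for lab in labels if lab != -1})
    if len(clusters) != 2:
        return False
    (a, b), (c, d) = [[sum(1 for lab, o in zip(labels, origin) if lab == k and o == line)
                       for line in (0, 1)] for k in clusters]
    return fisher_exact_two_sided(a, b, c, d) < alpha


def sample_two_lines(rng, n_per_line=40):
    """HYPOTHETICAL stand-in for 2-D embeddings of cells from two cell lines."""
    pts, origin = [], []
    for line, (cx, cy) in enumerate([(0.0, 0.0), (6.0, 0.0)]):
        for _ in range(n_per_line):
            pts.append((rng.gauss(cx, 0.7), rng.gauss(cy, 0.7)))
            origin.append(line)
    return pts, origin


def sweep(eps_grid, minpts_grid, n_rep=1000, seed=0):
    """Rate of correct separation per (eps, MinPts); returns (best combo, rates)."""
    rng = random.Random(seed)
    success = {combo: 0 for combo in product(eps_grid, minpts_grid)}
    for _ in range(n_rep):
        pts, origin = sample_two_lines(rng)  # same cells scored under every combo
        for eps, min_pts in success:
            if separates_lines(dbscan(pts, eps, min_pts), origin):
                success[(eps, min_pts)] += 1
    rates = {combo: hits / n_rep for combo, hits in success.items()}
    best = max(rates, key=rates.get)
    return best, rates
```

For example, `best, rates = sweep([0.5, 1.0, 1.5, 10.0], [3, 5, 10], n_rep=20)` scores a small grid. The default `n_rep=1000` mirrors the 1,000 repetitions in the caption, but it is slow with this pure-Python DBSCAN. Scoring every parameter combination on the same sampled cells keeps the comparison paired, so differences in rate reflect the parameters rather than the random draw.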
