Extended Data Fig. 5: Batch-effect correction displayed by cell type identity.
From: A multicenter study benchmarking single-cell RNA sequencing technologies using reference samples

Batch-effect correction was performed for four scenarios. (a) Scenario 1: all 20 scRNA-seq datasets, both mixed and non-mixed, were combined. These contained large proportions of two dissimilar cell types (sample A, breast cancer cell line HCC1395; sample B, B-lymphocyte line HCC1395BL). Each dataset from the 10x platform was down-sampled to 1,200 cells. (b) Scenario 2: five datasets (10X_LLU_A, 10X_NCI_A, C1_FDA_HT_A, C1_LLU_A, and ICELL8_SE_A) from the breast cancer cells (sample A, HCC1395), generated separately at four centers (LLU, NCI, FDA, and TBU) on four platforms (10x, Fluidigm C1, Fluidigm C1_HT, and TBU ICELL8). (c) Scenario 3: five datasets (10X_LLU_B, 10X_NCI_B, C1_FDA_HT_B, C1_LLU_B, and ICELL8_SE_B) from the B lymphocytes (sample B, HCC1395BL), generated separately at four centers (LLU, NCI, FDA, and TBU) on four platforms (10x, Fluidigm C1, Fluidigm C1_HT, and TBU ICELL8). (d) Scenario 4: four datasets (10X_LLU_Mix10, 10X_NCI_M_Mix5, 10X_NCI_M_Mix5_F, and 10X_NCI_M_Mix5_F2) in which breast cancer cells (sample A, HCC1395) were spiked at 5% or 10% into B lymphocytes (sample B, HCC1395BL), analyzed on the 10x Genomics platform at two centers (LLU and NCI) in four separate batches. *For BBKNN, only UMAP embeddings were available and are shown in (a–d); all other panels are t-SNE plots. HCC1395 breast cancer cells (sample A) are shown in red and HCC1395BL B lymphocytes (sample B) in blue. Batch-correction methods included Seurat v3.1, fastMNN (SeuratWrappers v0.1.0), Scanorama v1.4, BBKNN v1.3.5, Harmony v0.99.9, limma v3.40.4, and ComBat (sva v3.32.1). The top 2,000 highly variable genes (HVGs) were used as the gene set for batch correction. All 10x data were preprocessed with Cell Ranger v3.1.
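The two preprocessing steps named in the caption (down-sampling each 10x dataset to 1,200 cells and restricting to the top 2,000 HVGs before correction) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline. The function names are hypothetical. Count matrices are assumed to be cells × genes with a shared gene order. 10x datasets are identified by a hypothetical `10X_` name prefix. HVGs are ranked by the variance of log-normalized counts on the combined matrix, which is a simplified stand-in for Seurat's own variable-feature selection and not the method used in the study.

```python
import numpy as np


def downsample_cells(counts, n_cells=1200, seed=0):
    """Randomly keep at most n_cells rows (cells), sampled without replacement.

    Assumption: counts is a cells x genes array of raw UMI counts.
    """
    if counts.shape[0] <= n_cells:
        return counts  # small datasets are kept whole
    rng = np.random.default_rng(seed)
    keep = rng.choice(counts.shape[0], size=n_cells, replace=False)
    return counts[np.sort(keep)]  # sort to preserve the original cell order


def top_variable_genes(counts, n_genes=2000):
    """Indices of the n_genes genes with the highest variance of log1p(CP10k).

    Simplified stand-in for HVG selection; NOT Seurat's 'vst' procedure.
    """
    lib = counts.sum(axis=1, keepdims=True)
    lib[lib == 0] = 1  # avoid division by zero for empty cells
    norm = np.log1p(counts / lib * 1e4)  # counts per 10k, then log1p
    var = norm.var(axis=0)
    n = min(n_genes, counts.shape[1])
    return np.argsort(var)[::-1][:n]  # highest-variance genes first


def prepare_for_batch_correction(datasets, n_cells=1200, n_genes=2000, seed=0):
    """Down-sample 10x datasets, pool all cells, and keep only the HVGs.

    Hypothetical helper. datasets maps dataset name -> cells x genes counts.
    Returns (HVG-restricted matrix, per-cell batch labels, HVG indices).
    """
    mats, batches = [], []
    for name, counts in datasets.items():
        if name.startswith("10X_"):  # hypothetical rule for spotting 10x data
            counts = downsample_cells(counts, n_cells, seed)
        mats.append(counts)
        batches.extend([name] * counts.shape[0])
    combined = np.vstack(mats)
    hvg = top_variable_genes(combined, n_genes)
    return combined[:, hvg], np.array(batches), hvg


if __name__ == "__main__":
    # Synthetic Poisson counts for illustration only (not study data).
    rng = np.random.default_rng(1)
    data = {
        "10X_LLU_A": rng.poisson(1.0, size=(1500, 3000)),
        "C1_LLU_A": rng.poisson(1.0, size=(80, 3000)),
    }
    X, batch, hvg = prepare_for_batch_correction(data)
    print(X.shape)  # (1280, 2000): 1,200 down-sampled 10x cells + 80 C1 cells
```

The per-cell `batch` labels are kept alongside the matrix because batch-correction methods such as ComBat or Harmony take a batch covariate as input; the correction step itself is not sketched here.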