Skip to main content

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • Article
  • Published:

Recovery of biological signals lost in single-cell batch integration with CellANOVA

Abstract

Data integration to align cells across batches has become a cornerstone of single-cell data analysis, critically affecting downstream results. Currently, there are no guidelines for when the biological differences between samples are separable from batch effects. Here we show that current paradigms for single-cell data integration remove biologically meaningful variation and introduce distortion. We present a statistical model and computationally scalable algorithm, CellANOVA (cell state space analysis of variance), that harnesses experimental design to explicitly recover biological signals that are erased during single-cell data integration. CellANOVA uses a ‘pool-of-controls’ design concept, applicable across diverse settings, to separate unwanted variation from biological variation of interest and allow the recovery of subtle biological signals. We apply CellANOVA to diverse contexts and validate the recovered biological signals by orthogonal assays. In particular, we show that CellANOVA is effective in the challenging case of single-cell and single-nucleus data integration, where it recovers subtle biological signals that can be validated and replicated by external data.

This is a preview of subscription content, access via your institution

Access options

Buy this article

Prices may be subject to local taxes which are calculated during checkout

Fig. 1: Examples of control pool construction and integration results.
Fig. 2: Overview of CellANOVA.
Fig. 3: Benchmarking of CellANOVA and other integration methods on batch effect correction and data distortion.
Fig. 4: Benchmarking of CellANOVA and other integration methods on biological signal preservation.
Fig. 5: Evaluation of CellANOVA in multimodal data integration.
Fig. 6: Descriptive analysis of CellANOVA-estimated batch effects.

Similar content being viewed by others

Data availability

The mouse radiation therapy dataset is available from the GEO under accession number GSE280883. The JAK inhibition and PD1 immunotherapy trial dataset was obtained from Mathew et al.30 and is availabe from the Gene Expression Omnibus (GEO) under accession number GSE266219. The T1D dataset was obtained from Fasolino et al.29 and is available from the GEO under accession number GSE148073. The human kidney multiomics atlas dataset was obtained from Abedini et al.31 and is available from the GEO under accession number GSE211785. The scRNA-seq data by Lake et al.36 can be downloaded from KPMP (https://www.kpmp.org.; DK133081, DK133091, DK133092, DK133093, DK133095, DK1330971, DK114866, DK114908, DK133090, DK133113, DK133766, DK133768, DK114907, DK114920, DK114923, DK114933 and DK114886). The NSCLC T cell dataset was obtained from Salcher et al.32 and downloaded from another study59.

Code availability

All code used in this study, including the CellANOVA software and the analysis code, can be found on GitHub (https://github.com/Janezjz/cellanova).

References

  1. Hicks, S. C., Townes, F. W., Teng, M. & Irizarry, R. A. Missing data and technical variability in single-cell RNA-sequencing experiments. Biostatistics 19, 562–578 (2018).

    Article  PubMed  Google Scholar 

  2. Tung, P.-Y. et al. Batch effects and the effective design of single-cell gene expression studies. Sci. Rep. 7, 39921 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  3. Luecken, M. D. & Theis, F. J. Current best practices in single-cell RNA-seq analysis: a tutorial. Mol. Syst. Biol. 15, e8746 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  4. Zhang, Y. et al. Sample-multiplexing approaches for single-cell sequencing. Cell. Mol. Life Sci. 79, 466 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Kang, H. M. et al. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation. Nat. Biotechnol. 36, 89–94 (2018).

    Article  CAS  PubMed  Google Scholar 

  6. Stoeckius, M. et al. Cell hashing with barcoded antibodies enables multiplexing and doublet detection for single cell genomics. Genome Biol. 19, 224 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  7. Xu, J. et al. Genotype-free demultiplexing of pooled single-cell RNA-seq. Genome Biol. 20, 290 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Gehring, J., Hwee Park, J., Chen, S., Thomson, M. & Pachter, L. Highly multiplexed single-cell RNA-seq by DNA oligonucleotide tagging of cellular proteins. Nat. Biotechnol. 38, 35–38 (2020).

  9. McGinnis, C. S. et al. MULTI-seq: sample multiplexing for single-cell RNA sequencing using lipid-tagged indices. Nat. Methods 16, 619–626 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Heaton, H. et al. Souporcell: robust clustering of single-cell RNA-seq data by genotype without reference genotypes. Nat. Methods 17, 615–620 (2020).

    Article  CAS  PubMed  Google Scholar 

  11. Haghverdi, L., Lun, A. T., Morgan, M. D. & Marioni, J. C. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 36, 421–427 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. Methods 15, 1053–1058 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with harmony. Nat. Methods 16, 1289–1296 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  14. Welch, J. D. et al. Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell 177, 1873–1887 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  15. Hie, B., Bryson, B. & Berger, B. Efficient integration of heterogeneous single-cell transcriptomes using Scanorama. Nat. Biotechnol. 37, 685–691 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  16. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  18. Lin, Y. et al. scMerge leverages factor analysis, stable expression, and pseudoreplication to merge multiple single-cell RNA-seq datasets. Proc. Natl Acad. Sci. USA 116, 9775–9784 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. Huang, M., Zhang, Z. & Zhang, N. R. Dimension reduction and denoising of single-cell rna sequencing data in the presence of observed confounding variables. Preprint at bioRxiv https://doi.org/10.1101/2020.08.03.234765 (2020).

  20. Song, F., Chan, G. M. A. & Wei, Y. Flexible experimental designs for valid single-cell RNA-sequencing experiments allowing batch effects correction. Nat. Commun. 11, 3274 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  21. Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).

    Article  CAS  PubMed  Google Scholar 

  22. Tran, H. T. N. et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biol. 21, 12 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Goh, W. W. B., Wang, W. & Wong, L. Why batch effects matter in omics data, and how to avoid them. Trends Biotechnol. 35, 498–507 (2017).

    Article  CAS  PubMed  Google Scholar 

  24. Leek, J. T. et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat. Rev. Genet. 11, 733–739 (2010).

    Article  CAS  PubMed  Google Scholar 

  25. Molania, R. et al. Removing unwanted variation from large-scale RNA sequencing data with PRPS. Nat. Biotechnol. 41, 82–95 (2023).

    Article  CAS  PubMed  Google Scholar 

  26. Leek, J. T., Johnson, W. E., Parker, H. S., Jaffe, A. E. & Storey, J. D. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28, 882–883 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  27. Sun, Y., Zhang, N. R. & Owen, A. B. Multiple hypothesis testing adjusted for latent variables, with an application to the AGEMAP gene expression data. Ann. Appl. Stat. 6, 1664–1688 (2012).

  28. Risso, D., Ngai, J., Speed, T. P. & Dudoit, S. Normalization of RNA-seq data using factor analysis of control genes or samples. Nat. Biotechnol. 32, 896–902 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  29. Fasolino, M. et al. Single-cell multi-omics analysis of human pancreatic islets reveals novel cellular states in type 1 diabetes. Nat. Metab. 4, 284–299 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  30. Mathew, D. et al. Combined JAK inhibition and PD-1 immunotherapy for non-small cell lung cancer patients. Science 384, eadf1329 (2024).

  31. Abedini, A. et al. Single-cell multi-omic and spatial profiling of human kidneys implicates the fibrotic microenvironment in kidney disease progression. Nat. Genet. 56, 1712–1724 (2024).

  32. Salcher, S. et al. High-resolution single-cell atlas reveals diversity and plasticity of tissue-resident neutrophils in non-small cell lung cancer. Cancer Cell 40, 1503–1520 (2022).

    Article  Google Scholar 

  33. Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  34. Liu, J. et al. Jointly defining cell types from multiple single-cell datasets using LIGER. Nat. Protoc. 15, 3632–3662 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  35. Kang, J. B. et al. Efficient and precise single-cell reference atlas mapping with Symphony. Nat. Commun. 12, 5890 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  36. Lake, B. B. et al. An atlas of healthy and injured cell states and niches in the human kidney. Nature 619, 585–594 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  37. Liberzon, A. et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 1, 417–425 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  38. Schröder, M. & Kaufman, R. J. The mammalian unfolded protein response. Annu. Rev. Biochem. 74, 739–789 (2005).

    Article  PubMed  Google Scholar 

  39. Wang, J., Yang, X. & Zhang, J. Bridges between mitochondrial oxidative stress, ER stress and mTOR signaling in pancreatic β cells. Cell. Signal. 28, 1099–1104 (2016).

    Article  CAS  PubMed  Google Scholar 

  40. Sarbassov, D. D., Ali, S. M. & Sabatini, D. M. Growing roles for the mTOR pathway. Curr. Opin. Cell Biol. 17, 596–603 (2005).

    Article  CAS  PubMed  Google Scholar 

  41. Wang, X. & Proud, C. G. The mTOR pathway in the control of protein synthesis. Physiology 21, 362–369 (2006).

    Article  CAS  PubMed  Google Scholar 

  42. Thompson, E. B. The many roles of c-Myc in apoptosis. Ann. Rev. Physiol. 60, 575–600 (1998).

    Article  CAS  Google Scholar 

  43. Dang, C. V. c-Myc target genes involved in cell growth, apoptosis, and metabolism. Mol. Cell. Biol. 19, 1–11 (1999).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  44. Yates, F. The analysis of multiple classifications with unequal numbers in the different classes. J. Am. Stat. Assoc. 29, 51–66 (1934).

    Article  Google Scholar 

  45. Fisher, R. A. Statistical Methods for Research Workers (Oliver and Boyd, 1970).

  46. Mootha, V. K. et al. PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet. 34, 267–273 (2003).

    Article  CAS  PubMed  Google Scholar 

  47. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  48. Wolf, F. A. et al. PAGA: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells. Genome Biol. 20, 59 (2019).

  49. Haghverdi, L., Büttner, M., Wolf, F. A., Buettner, F. & Theis, F. J. Diffusion pseudotime robustly reconstructs lineage branching. Nat. Methods 13, 845–848 (2016).

    Article  CAS  PubMed  Google Scholar 

  50. Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33, 495–502 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  51. Xu, C. et al. Probabilistic harmonization and annotation of single-cell transcriptomics data with deep generative models. Mol. Syst. Biol. 17, e9620 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  52. Lotfollahi, M., Wolf, F. A. & Theis, F. J. scGen predicts single-cell perturbation responses. Nat. Methods 16, 715–721 (2019).

    Article  CAS  PubMed  Google Scholar 

  53. Rousseeuw, P. J. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987).

    Article  Google Scholar 

  54. Büttner, M., Miao, Z., Wolf, F. A., Teichmann, S. A. & Theis, F. J. A test metric for assessing single-cell RNA-seq batch correction. Nat. Methods 16, 43–49 (2019).

    Article  PubMed  Google Scholar 

  55. Cover, T. M. & Thomas, J. A. Elements of Information Theory 2nd edn (Wiley, 2006).

  56. Hubert, L. & Arabie, P. Comparing partitions. J. Classif. 2, 193–218 (1985).

    Article  Google Scholar 

  57. Pearson, K. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philos. Mag. 50, 157–175 (1900).

    Article  Google Scholar 

  58. Monaco, G. et al. RNA-seq signatures normalized by mRNA abundance allow absolute deconvolution of human immune cell types. Cell Rep. 26, 1627–1640 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  59. Megill, C. et al. cellxgene: a performant, scalable exploration platform for high dimensional sparse matrices. Preprint at bioRxiv https://doi.org/10.1101/2021.04.05.438318 (2021).

Download references

Acknowledgements

We would like to acknowledge support from the National Science Foundation (NSF; DMS-2345215 and DMS-2210104 to Z.M. and DMS-2245575 to Z.M., Z.Z. and N.R.Z.), National Institutes of Health (R01-HG006137, U2C-CA233285, and 5R01GM125301 to N.R.Z., 5R01DK087635 to K.S. and TL1TR001880 to T.L.L.), Mark Foundation Center for Radiobiology and Immunology (to N.R.Z. and A.M.) and Parker Institute for Cancer Immunotherapy (to D.M.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding organizations.

Author information

Authors and Affiliations

Authors

Contributions

Conceptualization, Z.Z., Z.M. and N.R.Z. Algorithm development, Z.Z., Z.M. and N.R.Z. Software implementation, Z.Z. Design of computational experiments, Z.Z., Z.M. and N.R.Z. Immunotherapy trial experiment, D.M. Mouse irradiation experiment, C.M.M. and T.L.L. Kidney disease experiment, K.S. Data analysis, Z.Z., with help from D.M., T.L.L., K.M., S.H., Z.M. and N.R.Z. Manuscript writing, Z.Z., Z.M. and N.R.Z., with feedback from T.L., D.M. and K.M. Supervision, A.J.M., E.J.W., K.S., Z.M. and N.R.Z.

Corresponding authors

Correspondence to Zongming Ma or Nancy R. Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Biotechnology thanks Jinmiao Chen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Benchmarking of Harmony and CellANOVA integrations of single-cell samples measured with multiple technologies on the NSCLC atlas dataset.

Benchmarking of Harmony and CellANOVA integrations of single-cell samples measured with multiple technologies on the NSCLC atlas dataset32. Harmony is used in the first step of CellANOVA. (a) UMAP visualizations of uncorrected data, Harmony-integrated data, and CellANOVA-integrated data. Each sample was treated as a separate batch. UMAPs were colored by technology, sample origin, and cell type in the top, the middle, and the bottom row, respectively. (b) Comparisons of batch correction (left), gene-wise distortion (middle) and global distortion (right) between Harmony and CellANOVA (scRNA-seq data: 20000 cells, 26023 genes; snRNA-seq data: 20000 cells, 30899 genes.) Left: Boxplots of iLISI scores across cells of each hold-out sample based on uncorrected data, Harmony-integrated data, and CellANOVA-integrated data. In each box plot, the box covers the middle 50% of iLISI scores and the whiskers stop at the largest iLISI score value below the upper quartile plus 1.5 * IQR and the smallest iLISI score value above the lower quartile minus 1.5 * IQR. Middle: Comparisons of p-values obtained from DE analysis pre- and post-Harmony integration and pre- and post-CellANOVA integration. Red dashed line corresponds to equal p-values pre- and post-integration. P-values for DEG analysis are calculated using a two-sided Wilcoxon signed-rank test. Right: Violin plots of Pearson correlations between pre- and post-integration gene expressions per cell with Harmony integration (top) and with CellANOVA integration (bottom). (c) Signal preservation evaluation of batch-corrected embeddings by out-of-batch nearest neighbor proportion distributions. We focused on the combined group of ‘T cell CD8’, ‘T cell CD4’, and ‘T regulatory’ cells, considering sample origin as the primary biological condition of interest. For cells with each origin, we computed out-of-batch nearest neighbor proportions across all sample origins. With successful signal preservation, cells with the same biological condition should be enriched in out-of-batch nearest neighbors.

Extended Data Fig. 2 Top ten upregulated pathways the disease condition for each specific cell type after Harmony integration without CellANOVA recovery.

Top ten upregulated pathways identified within the disease condition for each specific cell type in the Abedini et al. data and the KPMP data after Harmony integration without CellANOVA recovery. P-values for DEG analysis are calculated using a two-sided Wilcoxon signed-rank test.

Extended Data Fig. 3 Additional evaluation of CellANVOA in multimodal data integration.

Left: Scatter plots of epithelial-mesenchymal transition (EMT) pathway activity versus fibroblast ratio, with p-values obtained from a two-sided t-test of slope in linear regression. Middle: Epithelial-mesenchymal transition pathway activity. Right: Abundance of fibroblast cells from deconvolution.

Extended Data Fig. 4 Comparison of pathway enrichment analysis based on scRNA-seq versus flow cytometry of corresponding markers in NSCLC immunotherapy trial data.

(a) G2-M checkpoint pathway enrichment (scRNA-seq) versus Ki-67 frequency (flow cytometry). A positive normalized enrichment score (NES) from GSEA indicates higher pathway enrichment in the later time points. Both Ki-67 and G2-M checkpoint pathway activity measure cell proliferation. For the aPD1 group, comparing Ki-67 frequency (flow cytometry), the Chi-squared test yielded a p-value of 0.03824 between cycle 1 and cycle 2, and a p-value of 0.00043 between cycle 2 and cycle 4. (b) Interferon alpha/gamma response pathway enrichment (scRNA-seq) versus ISG15 mean fluorescence intensity (flow cytometry). For the aPD1 group, comparing between cycle 1 and cycle 2, the Chi-squared test yielded a p-value of 0.07008. (MFI: mean fluorescence intensity). (c) Cell-subtype-specific gene set enrichment analysis within each response group between cycle 2 and cycle 4 after Harmony-based CellANOVA integration. Top 5 up-regulated and down-regulated pathways in cycle 4 compared to cycle 2 are shown. P-value for DEG analysis are calculated using a two-sided Wilcoxon signed-rank test.

Extended Data Fig. 5 Benchmarking analysis on type 1 diabetes dataset.

(a) Comparison of methods from three complementary perspectives: batch correction, cell type prediction, and preservation of biological signals beyond cell type. (b) Performance of each method measured by different metrics from each perspective and overall performance percentiles of those methods. Performance scores for individual metrics are rank-based and have been normalized within a 0 to 1 range. For each metric, a higher score indicates better performance.

Extended Data Fig. 6 Benchmarking analysis on immunotherapy trial dataset.

(a) Comparison of methods from three complementary perspectives: batch correction, cell type prediction, and preservation of biological signals beyond cell type. (b) Performance of each method measured by different metrics from each perspective and overall performance percentiles of those methods. Performance scores for individual metrics are rank-based and have been normalized within a 0 to 1 range. For each metric, a higher score indicates better performance.

Extended Data Fig. 7 Benchmarking analysis on mouse radiation dataset.

(a) Comparison of methods from three complementary perspectives: batch correction, cell type prediction, and preservation of biological signals beyond cell type. (b) Performance of each method measured by different metrics from each perspective and overall performance percentiles of those methods. Performance scores for individual metrics are rank-based and have been normalized within a 0 to 1 range. For each metric, a higher score indicates better performance.

Extended Data Fig. 8 Benchmarking analysis on human kidney multi-omics dataset.

(a) Comparison of methods from three complementary perspectives: batch correction, cell type prediction, and preservation of biological signals beyond cell type. (b) Performance of each method measured by different metrics from each perspective and overall performance percentiles of those methods. Performance scores for individual metrics are rank-based and have been normalized within a 0 to 1 range. For each metric, a higher score indicates better performance.

Extended Data Fig. 9 Additional benchmarking of CellANOVA and other batch correction methods on real and simulated datasets.

(a) UMAP visualizations of the ‘hold-one-celltype-out’ experiment using immunotherapy data before and after batch correction, colored by cell type, condition, and batch. (b) Box plots showing the LISI score of cells from the control pool (left) and cells belonging to treatment-specific cell type Terminal Effector (right) in the experiment in (a) (30153 cells, 18126 genes). In each box plot, the box covers the middle 50% of iLISI scores and the whiskers stop at the largest iLISI score value below the upper quartile plus 1.5 * IQR and the smallest iLISI score value above the lower quartile minus 1.5 * IQR. (c) UMAP visualizations of simulated data before and after batch correction, colored by cell type, condition, and batch. The simulation setting is described in the Section ‘Additional benchmarks and cell type hold-out experiments’ of the main article. (d) Box plots showing the LISI score of cells from the control pool (left) and cells belonging to treatment-specific cell type CT6 (right) in the simulation in (b). Box plots constructed in the same way as in (b). (simulation data: 21000 cells, 5000 genes). (e) ROC curves obtained from differential expression analysis between control and treatment groups, using the batch-corrected expressions of CT6 cells from different integration methods.

Extended Data Fig. 10 Descriptive analysis of CellANOVA-estimated batch effects with initial integration by Seurat.

(a) UMAP visualizations of batch effects estimated by Seurat-based CellANOVA on three datasets, colored by batch and cell type. (b) Top ten batch-affected pathways of each study based on batch-susceptibility score (BSS) with Seurat-based CellANOVA. P-values are calculated using a two-sided Wilcoxon signed-rank test.

Supplementary information

Supplementary Information

Supplementary Figs. 1–22 and Discussion.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Zhang, Z., Mathew, D., Lim, T.L. et al. Recovery of biological signals lost in single-cell batch integration with CellANOVA. Nat Biotechnol (2024). https://doi.org/10.1038/s41587-024-02463-1

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1038/s41587-024-02463-1

Search

Quick links

Nature Briefing

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

Get the most important science stories of the day, free in your inbox. Sign up for Nature Briefing