Skip to main content

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • Article
  • Published:

Multipotent lineage potential in B cell acute lymphoblastic leukemia is associated with distinct cellular origins and clinical features

Abstract

Developmental origins and their associations with lineage plasticity and treatment response in B-cell progenitor acute lymphoblastic leukemia (B-ALL) are mostly unexplored. Here, we integrated single-cell transcriptome sequencing (scRNA-seq) of 89 B-ALL samples with a single-cell atlas of normal human B cell development incorporating functional and molecular assays. We observed subtype- and sample-dependent correlation with normal developmental stage, with intra-subtype and intra-patient heterogeneity. We show that subtypes prone to shift from the B-lineage (for example BCR::ABL1, KMT2A-R and DUX4-R B-ALL) are enriched for multipotent progenitors and show this developmental stage exhibits CEBPA activation and retains myeloid potential, providing a mechanistic explanation for this clinical observation. We developed a ‘multipotency score’ most enriched in subtypes exhibiting lineage plasticity that was independently associated with inferior survival. Thus, multipotent B-ALL states reflect the early progenitor origins of a subset of patients with B-ALL and may be relevant for understanding lineage shifting following conventional chemotherapy or immunotherapies.

This is a preview of subscription content, access via your institution

Access options

Buy this article

Prices may be subject to local taxes which are calculated during checkout

Fig. 1: Single-cell RNA sequencing landscape of B-ALL.
Fig. 2: Single-cell genomic and transcriptomic heterogeneity.
Fig. 3: Characterization of normal human B cell development.
Fig. 4: Heterogeneity in B cell developmental states within B-ALL.
Fig. 5: B-ALL developmental state abundance and multipotency score.
Fig. 6: B-ALL developmental states refine existing genomic subtypes.
Fig. 7: Clinical associations with B-ALL developmental states.
Fig. 8: Single-cell RNA-seq analysis of nonmalignant cells in B-ALL samples.

Similar content being viewed by others

Data availability

Single-cell RNA and DNA sequences have been deposited in the European Genome–phenome Archive (EGA) under accession no. EGAS00001007512 and at Alex’s Lemonade Stand Foundation for Childhood Cancer single-cell portal (https://scpca.alexslemonade.org/projects/SCPCP000008). Access to the EGA dataset is currently available and can be requested by submitting an application to the Data Access Committee (EGAC00001002449), which is chaired by C.G.M. All requests from investigators seeking to use the data to examine scientific questions are approved and data released according to the terms outlined in the data access agreements. Feature-barcode matrix, feature and barcode sequences from the Cell Ranger output of scRNA-seq data have also been deposited to the Gene Expression Omnibus under accession no. GSE241405. RNA-seq and ATAC-seq data from immunophenotypically sorted populations have been deposited to GSE125345 and GSE285437, respectively. The scRNA-seq UMAPs can be interactively visualized at https://proteinpaint.stjude.org/BALLscrna. Source data are provided with this paper.

Code availability

The scripts for the BM map are available at https://github.com/andygxzeng/BoneMarrowMap and those for the B cell developmental map can be found at https://github.com/andygxzeng/b_development_map. The scripts used to generate the figures are available at https://github.com/gaoqs313/BALL-Single-Cell-Landscape.

References

  1. Arber, D. A. et al. International Consensus Classification of Myeloid Neoplasms and Acute Leukemias: integrating morphologic, clinical, and genomic data. Blood 140, 1200–1228 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  2. Alaggio, R. et al. The 5th edition of the World Health Organization classification of haematolymphoid tumours: lymphoid neoplasms. Leukemia 36, 1720–1748 (2022).

    Article  PubMed  PubMed Central  Google Scholar 

  3. Duncavage, E. J. et al. Genomic profiling for clinical decision making in myeloid neoplasms and acute leukemia. Blood 140, 2228–2247 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Iacobucci, I., Kimura, S. & Mullighan, C. G. Biologic and therapeutic implications of genomic alterations in acute lymphoblastic leukemia. J. Clin. Med. 10, 3792 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Brady, S. W. et al. The genomic landscape of pediatric acute lymphoblastic leukemia. Nat. Genet. 54, 1376–1389 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Jeha, S. et al. Clinical significance of novel subtypes of acute lymphoblastic leukemia in the context of minimal residual disease-directed therapy. Blood Cancer Discov. 2, 326–337 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  7. Waanders, E. et al. Mutational landscape and patterns of clonal evolution in relapsed pediatric acute lymphoblastic leukemia. Blood Cancer Discov. 1, 96–111 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Dobson, S. M. et al. Relapse-fated latent diagnosis subclones in acute B lineage leukemia are drug tolerant and possess distinct metabolic programs. Cancer Discov. 10, 568–587 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  9. Zeng, A. G. X. et al. A cellular hierarchy framework for understanding heterogeneity and predicting drug response in acute myeloid leukemia. Nat. Med. 28, 1212–1223 (2022).

    Article  CAS  PubMed  Google Scholar 

  10. van Galen, P. et al. Single-cell RNA-seq reveals aml hierarchies relevant to disease progression and immunity. Cell 176, 1265–1281 e1224 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  11. Tirtakusuma, R. et al. Epigenetic regulator genes direct lineage switching in MLL/AF4 leukemia. Blood 140, 1875–1890 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Novakova, M. et al. DUX4r, ZNF384r and PAX5-P80R mutated B-cell precursor acute lymphoblastic leukemia frequently undergo monocytic switch. Haematologica 106, 2066–2075 (2021).

    Article  CAS  PubMed  Google Scholar 

  13. Iacobucci, I. & Mullighan, C. G. KMT2A-rearranged leukemia: the shapeshifter. Blood 140, 1833–1835 (2022).

    Article  CAS  PubMed  Google Scholar 

  14. Kim, J. C. et al. Transcriptomic classes of BCR-ABL1 lymphoblastic leukemia. Nat. Genet. 55, 1186–1197 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  15. Caron, M. et al. Single-cell analysis of childhood leukemia reveals a link between developmental states and ribosomal protein expression as a source of intra-individual heterogeneity. Sci. Rep. 10, 8079 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  16. Turati, V. A. et al. Chemotherapy induces canalization of cell state in childhood B-cell precursor acute lymphoblastic leukemia. Nat. Cancer 2, 835–852 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Wu, L. et al. Single-cell transcriptome analysis identifies ligand-receptor pairs associated with BCP-ALL prognosis. Front. Oncol. 11, 639013 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  18. Khabirova, E. et al. Single-cell transcriptomics reveals a distinct developmental state of KMT2A-rearranged infant B-cell acute lymphoblastic leukemia. Nat. Med. 28, 743–751 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. O’Byrne, S. et al. Discovery of a CD10-negative B-progenitor in human fetal life identifies unique ontogeny-related developmental programs. Blood 134, 1059–1071 (2019).

    Article  PubMed  Google Scholar 

  20. Jackson, T. R., Ling, R. E. & Roy, A. The origin of B-cells: human fetal B cell development and implications for the pathogenesis of childhood acute lymphoblastic leukemia. Front. Immunol. 12, 637975 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  21. Patel, A. P. et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science 344, 1396–1401 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  22. Gu, Z. et al. PAX5-driven subtypes of B-progenitor acute lymphoblastic leukemia. Nat. Genet. 51, 296–307 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Kimura, S. et al. Enhancer retargeting of CDX2 and UBTF::ATXN7L3 define a subtype of high-risk B-progenitor acute lymphoblastic leukemia. Blood 139, 3519–3531 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  24. Puram, S. V. et al. Single-cell transcriptomic analysis of primary and metastatic tumor ecosystems in head and neck cancer. Cell 171, 1611–1624 e1624 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Hovestadt, V. et al. Resolving medulloblastoma cellular architecture by single-cell genomics. Nature 572, 74–79 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. Kotliar, D. et al. Identifying gene expression programs of cell-type identity and cellular activity with single-cell RNA-seq. eLife 8, e43803 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  27. Lee, R. D. et al. Single-cell analysis identifies dynamic gene expression networks that govern B cell development and transformation. Nat. Commun. 12, 6843 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. Oetjen, K. A. et al. Human bone marrow assessment by single-cell RNA sequencing, mass cytometry, and flow cytometry. JCI Insight 3, e124928 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  29. Ainciburu, M. et al. Uncovering perturbations in human hematopoiesis associated with healthy aging and myeloid malignancies at single-cell resolution. eLife 12, e79363 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  30. Setty, M. et al. Characterization of cell fate probabilities in single-cell data with Palantir. Nat. Biotechnol. 37, 451–460 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  31. Popescu, D. M. et al. Decoding human fetal liver haematopoiesis. Nature 574, 365–371 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  32. Jardine, L. et al. Blood and immune development in human fetal bone marrow and Down syndrome. Nature 598, 327–331 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. Roy, A. et al. Transitions in lineage specification and gene regulatory networks in hematopoietic stem/progenitor cells over human development. Cell Rep. 36, 109698 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  34. Li, B. et al. HCA Data Portal: census of immune cells (Human Cell Atlas, 2019); https://explore.data.humancellatlas.org/projects/cc95ff89-2e68-4a08-a234-480eca21ce79

  35. Triana, S. et al. Single-cell proteo-genomic reference maps of the hematopoietic system enable the purification and massive profiling of precisely defined cell states. Nat. Immunol. 22, 1577–1589 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  36. Kondo, M., Weissman, I. L. & Akashi, K. Identification of clonogenic common lymphoid progenitors in mouse bone marrow. Cell 91, 661–672 (1997).

    Article  CAS  PubMed  Google Scholar 

  37. Hao, Q. L. et al. Identification of a novel, human multilymphoid progenitor in cord blood. Blood 97, 3683–3690 (2001).

    Article  CAS  PubMed  Google Scholar 

  38. Van de Sande, B. et al. A scalable SCENIC workflow for single-cell gene regulatory network analysis. Nat. Protoc. 15, 2247–2276 (2020).

    Article  PubMed  Google Scholar 

  39. Doulatov, S. et al. Revised map of the human progenitor hierarchy shows the origin of macrophages and dendritic cells in early lymphoid development. Nat. Immunol. 11, 585–593 (2010).

    Article  CAS  PubMed  Google Scholar 

  40. Avellino, R. et al. An autonomous CEBPA enhancer specific for myeloid-lineage priming and neutrophilic differentiation. Blood 127, 2991–3003 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  41. Barnes, J. L. et al. Early human lung immune cell development and its role in epithelial cell fate. Sci. Immunol. 8, eadf9988 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  42. Alexander, T. B. et al. The genetic basis and cell of origin of mixed phenotype acute leukaemia. Nature 562, 373–379 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  43. Dickerson, K. M. et al. ZNF384 fusion oncoproteins drive lineage aberrancy in acute leukemia. Blood Cancer Discov. 3, 240–263 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  44. Mehtonen, J. et al. Single cell characterization of B-lymphoid differentiation and leukemic cell states during chemotherapy in ETV6-RUNX1-positive pediatric leukemia identifies drug-targetable transcription factor activities. Genome Med. 12, 99 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  45. Newman, A. M. et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat. Biotechnol. 37, 773–782 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  46. Chu, T., Wang, Z., Pe’er, D. & Danko, C. G. Cell type and gene expression deconvolution with BayesPrism enables Bayesian integrative analysis across bulk and single-cell RNA sequencing in oncology. Nat. Cancer 3, 505–517 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  47. Meyer, C. et al. The KMT2A recombinome of acute leukemias in 2023. Leukemia 37, 988–1005 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  48. Montefiori, L. E. et al. Enhancer hijacking drives oncogenic BCL11B expression in lineage-ambiguous stem cell leukemia. Cancer Discov. 11, 2846–2867 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  49. Weinberg, O. K. et al. The International Consensus Classification of acute leukemias of ambiguous lineage. Blood 141, 2275–2277 (2023).

    CAS  PubMed  Google Scholar 

  50. Lee, S. H. R. et al. Pharmacotypes across the genomic landscape of pediatric acute lymphoblastic leukemia and impact on treatment response. Nat. Med. 29, 170–179 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  51. Witkowski, M. T. et al. Extensive remodeling of the immune microenvironment in B cell acute lymphoblastic leukemia. Cancer Cell 37, 867–882 e812 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  52. Bastian, L. et al. Developmental trajectories and cooperating genomic events define molecular subtypes of BCR::ABL1-positive ALL. Blood 143, 1391–1398 (2024).

    Article  CAS  PubMed  Google Scholar 

  53. Pui, C. H. et al. Treating childhood acute lymphoblastic leukemia without cranial irradiation. N. Engl. J. Med. 360, 2730–2741 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  54. Jeha, S. et al. Improved CNS control of childhood acute lymphoblastic leukemia without cranial irradiation: St Jude total therapy study 16. J. Clin. Oncol. 37, 3377–3391 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  55. Inaba, H., Azzato, E. M. & Mullighan, C. G. Integration of next-generation sequencing to treat acute lymphoblastic leukemia with targetable lesions: the St. Jude Children’s Research Hospital approach. Front. Pediatr. 5, 258 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  56. Gao, Q. et al. The genomic landscape of acute lymphoblastic leukemia with intrachromosomal amplification of chromosome 21. Blood 142, 711–723 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  57. Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587 e3529 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  58. McGinnis, C. S., Murrow, L. M. & Gartner, Z. J. DoubletFinder: doublet detection in single-cell RNA sequencing data using artificial nearest neighbors. Cell. Syst. 8, 329–337 e324 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  59. Young, M. D. & Behjati, S. SoupX removes ambient RNA contamination from droplet-based single-cell RNA sequencing data. Gigascience 9, giaa151 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  60. Notta, F. et al. Distinct routes of lineage development reshape the human blood hierarchy across ontogeny. Science 351, aab2116 (2016).

    Article  PubMed  Google Scholar 

  61. Wagenblast, E. et al. Functional profiling of single CRISPR/Cas9-edited human long-term hematopoietic stem cells. Nat. Commun. 10, 4730 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  62. Karamitros, D. et al. Single-cell analysis reveals the continuum of human lympho-myeloid progenitor cells. Nat. Immunol. 19, 85–97 (2018).

    Article  CAS  PubMed  Google Scholar 

  63. Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  64. Zeng, A. G. X. et al. Single-cell Transcriptional Atlas of Human Hematopoiesis Reveals Genetic and Hierarchy-Based Determinants of Aberrant AML Differentiation. Blood Cancer Discov. OF1–OF18 (2025).

  65. Granja, J. M. et al. Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nat. Biotechnol. 37, 1458–1465 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  66. Mende, N. et al. Unique molecular and functional features of extramedullary hematopoietic stem and progenitor cell reservoirs in humans. Blood 139, 3387–3401 (2022).

    Article  CAS  PubMed  Google Scholar 

  67. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  68. Bueno, C. et al. CD34+CD19CD22+ B-cell progenitors may underlie phenotypic escape in patients treated with CD19-directed therapies. Blood 140, 38–44 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  69. Choudhary, S. & Satija, R. Comparison and evaluation of statistical error models for scRNA-seq. Genome Biol. 23, 27 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  70. Zeng, A. G. X. Developmental state quantification from bulk transcriptomes in B-ALL. Protocols.io https://doi.org/10.17504/protocols.io.dm6gpde9jgzp/v1 (2025).

  71. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).

    Article  PubMed  PubMed Central  Google Scholar 

Download references

Acknowledgements

This study was supported by the American Lebanese Syrian Associated Charities of St. Jude Children’s Research Hospital; the Alex’s Lemonade Stand Foundation for Childhood Cancer (C.G.M.); National Cancer Institute grants P30 CA021765 (C.G.M.), R35 CA197695 (C.G.M.); U10 CA180888 (Southwest Oncology Group National Clinical Trials Network grant), U10 CA180820 and UG1 CA189859; St Baldrick’s Foundation Robert J. Arceci Innovation Award (C.G.M.); the Henry Schueler 41&9 Foundation (C.G.M.); University of Toronto MD/PhD studentship award (A.G.X.Z.); Princess Margaret Cancer Foundation (J.E.D.); Ontario Institute for Cancer Research through funding provided by the Government of Ontario (J.E.D.); Canadian Institutes for Health Research RN380110-409786 (J.E.D.); International Development Research Centre Ottawa Canada (J.E.D.); Canadian Cancer Society 703212 (J.E.D.); Terry Fox New Frontiers Program project grant 1106 (J.E.D.); University of Toronto’s Medicine by Design initiative with funding from the Canada First Research Excellence Fund (J.E.D.); The Ontario Ministry of Health (J.E.D.); Canada Research Chair (J.E.D.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. We thank K. Itoh (Department of Biology, Faculty of Science, Niigata University, Japan) for providing MS-5 stromal cells.

Author information

Authors and Affiliations

Authors

Contributions

I.I. contributed to study design, led the study, performed experiments, analyzed data and wrote the manuscript; A.G.X.Z. contributed to study design, developed B cell developmental maps, analyzed data and wrote the manuscript; Q.G. analyzed data and wrote the manuscript; L.G.P. performed functional experiments and wrote the manuscript; P.B. contributed to perform single-cell experiments; S.S. analyzed data; A.M. analyzed data; V.V. performed and analyzed data; M.C.S. performed and analyzed data; C.C. reviewed statistical analysis; C.Q. performed data analysis; C.B. contributed to sample collection; M.L. provided clinical data; M.T.W. contributed to data analysis; X.Z. data visualization; A.Z.P. data visualization; K.G. data visualization; A.S.A. provided clinical data; S.M.L. provided clinical data; M.R.L. provided clinical data; J.M.R. provided clinical data; E.M.P. provided clinical data; W.S. provided clinical data; J.E.D. contributed to study design, data analysis and revised the manuscript for final approval; and C.G.M. conceived, designed and led the study and wrote the manuscript. All authors contributed to manuscript writing and gave final approval.

Corresponding authors

Correspondence to John E. Dick or Charles G. Mullighan.

Ethics declarations

Competing interests

I.I. reports consultation honoraria from Arima and travel expenses reimbursed by Mission Bio and Takara for invited talks. C.G.M. received research funding from AbbVie and Pfizer, honoraria from Amgen and Illumina, and royalty payments from Cyrus. He is on an advisory board for Illumina. J.E.D. received research funding from BMS/Celgene and intellectual property licenses from Pfizer/Trillium Therapeutics. A.S.A. reported advisory board membership for Nkarta, Pfizer, Novartis and Jazz Pharmaceuticals; honoraria from KSA, PER, MD Education, ALF Medscape/Global Activity, Web med, Onc live AML Geronimo; steering committee membership for Glycomimetics; and royalty payments from Springer. The other authors declare no competing interests.

Peer review

Peer review information

Nature Cancer thanks Nora Heisterkamp and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Genetic heterogeneity elucidated by inferCNV.

a) Copy number alterations from whole genome sequencing or SNP array in B-ALL samples analyzed by scRNA-seq. Copy number changes are depicted in red for copy number gains and in blue for copy number losses. Only samples with genomic data available are included. b) PCA plots of pairwise view of the top three PCs for cells (n = 489,019) from the 89 single-cell RNA-seq samples colored by cell types. c,d) Bar plots showing inferCNVs in near haploid (c) and hyperdiploid (d) B-ALL. InferCNV heatmap (e) of chromosome gains and UMAP (f) of SJBALL030285_D1. g,h) InferCNV heatmap from SJALL040066 (g) and SJALL040070 (h) with TCF3::PBX1 derived from the unbalanced translocation der(19)t(1;19)(q23;p13.3). SJALL040066 (g) has three CNV clusters sharing gain from PBX1 to 1q telomere, and chr7p loss/7q gain with additional alterations undetectable by bulk sequencing: chr9p loss and chr22 gain in clone 2 (C2); chr5 gain and chr8 gain in clone 3 (C3). SJALL040070 (f) has one CNV clone with gain of 1q region from PBX1 to telomere. i) InferCNV heatmap from SJE2A067 with TCF3::PBX1 fusion derived from balanced translocation t(1;19)(q23;p13.3), showing 3 clusters having distinct gene expression profiles: clone 2 (C2) is the founder clone; clone 1 (C1) is characterized by a gain of a region near the PBX1 gene (chr1: 145002063-206421902); clone 3 (C3) shows loss of chr13. A subset of cells in each sample in e, g, h and i show increased expression of genes on chr6p reflecting cell-cycle associated expression of histone genes rather than copy number variation. j) UMAP representation of cells from SJE2A067 colored by cell type (upper panel), inferCNV group (middle panel) and detection of TCF3::PBX1 fusion (lower panel). k) Scatterplot from scWGS in SJE2A067 showing copy numbers of chr1p, 1q regions upstream and downstream of PBX1, chr11p, chr13q and chr19p. Dotted line indicates diploid DNA content. This analysis revealed multiple distinct clones demonstrating the complex patterns of genetic evolution in samples with TCF3::PBX1. l) Screenshot from UCSC Genome Brower showing genomic coordinates (hg38) of alterations on chr1 in SJE2A063 with amplification of regions both upstream and downstream of PBX1 as derived from two independent events and in SJE2A067 with a single alteration encompassing the PBX1 gene. m) Heatmap showing the pairwise correlations of the sample-level metaprograms using consensus non-negative matrix factorization (cNMF). Each panel includes samples from the same subtype. Clustering identified five coherent malignant gene expression signatures, including A - Cell Cycle (S), B - Cell Cycle (G2/M), C - Metabolism, D - Differentiation, and E - Inflammation. n) Distribution of the number of subtypes expressing the top 30 signature genes in the five metaprograms of different subtypes indicating that most of the signature genes are shared for A - Cell Cycle (S), B - Cell Cycle (G2/M), C - Metabolism, but they are subtype-specific for D - Cell Differentiation, and E - Inflammation.

Source data

Extended Data Fig. 2 Transcriptional and functional characterization of human B cell development.

a-c) Centered log ratio (CLR)-normalized protein levels of key cell surface markers across B cell development from combined scRNA-seq and protein expression (AbSeq) data. a) 5,102 adult bone marrow cells profiled through BD Rhapsody from Triana et al.35. These cell states were classified through reference map projection onto the B cell development atlas. b) 4,189 fetal liver cells and c) 7,782 fetal bone marrow cells profiled through CITE-seq from Jardine et al.32. d,e) Differential expression (DE) results between pre-natal and post-natal samples across human B cell development. Single cells belonging to each cell state from each donor from our B cell development atlas were pooled into pseudobulk profiles prior to DE analysis using DESeq2 with P values determined by a Wald test and subject to Benjamini-Hochberg correction to obtain FDR values. d) The number of DE genes at FDR < 0.01 specific to either pre-natal or post-natal samples is shown. e) DE results in CLPs from pre-natal compared to post-natal cells. Genes that are significantly DE at log2 fold change > 1 and FDR < 0.01 are depicted in red. MME, which encodes CD10, is significantly upregulated within post-natal CLPs and shown within a box for emphasis. f,g) cNMF signature analysis in normal B-lymphoid development including the S and G2/M-phases of cell cycle (f) and the early myeloid, plasmacytoid DC, and mature B programs (g). h) CD7 and IL7R expression by cell state across the B cell development atlas. i) Analysis of CD7 protein expression by flow cytometry in the indicated cell populations (for each population n = 2). The bar plots represent mean values. j) Analysis of IL7R protein expression by flow cytometry in the indicated cell populations. k) RAG1/RAG2 gene expression in B cell developmental cell types from bone marrow, fetal bone marrow and fetal liver. Single-cell transcriptomes (n = 90 donors) were pooled together based on tissue source, study, and sequencing into pseudobulk profiles for visualization of vst-normalized RAG1 and RAG2 expression patterns. Box plots indicate the range of the central 50% of the data, with the central line marking the median. Whiskers extend from each box to 1.5x the interquartile range. The y axis shows variance stabilizing transformation (vst) normalized gene expression levels for RAG1 and RAG2 within each population.

Source data

Extended Data Fig. 3 Functional modeling of normal flow-sorted populations.

a) Two hundred cells from the indicated populations were sorted and cultured on MS-5 stroma cells for 3, 7 or 14 days (see Fig. 3g for 7 and 14 days) in Ly-My or LyP promoting media and resulting colonies analyzed by flow cytometry. The proportion of each population is represented in the pie charts. b) Representative FACS plots from individual wells summarized in (a) at day 14 of culture. c) Total number of cells from (a) at day 14 of culture. N = 2 cord blood pools for all groups (a, b, c). The dots in the plots represent technical replicates (Ly-My: LMPP, n = 4; MLP, n = 4; CLP, n = 4; pre-pro-B, n = 3; pro-B, n = 8. LyP: LMPP, n = 4; MLP, n = 4; CLP, n = 4; pre-pro-B, n = 3; pro-B, n = 3). d) Engraftment analysis of indicated cell populations 2 weeks after intra-femoral injection into NSG mice. Injected cells were (number/mouse): LMPP: 3,000; MLP: 4,000; CLP: 6,000; pre-pro-B: 6,000; pro-B: 10,000. The percentage of human CD45 positive (CD45 + ) cells and relative engraftment normalized by the number of cells injected in the right femur (injected bone) are shown. The dots in the plots represent different injected mice from 3 human cord blood donors (n = 23 for LMPP and MLP; n = 14 for CLP; n = 3 for pre-pro-B; n = 9 for pro-B). Two-tailed P values are from unpaired t test. e) Engraftment levels in the left femur. The dots in the plot represent different injected mice from 3 human cord blood donors (n = 13 for LMPP and MLP; n = 9 for CLP; n = 3 for pre-pro-B; n = 9 for pro-B). Two-tailed P values are from unpaired t test. f) Percentage of CD33 + , CD19 + , CD56+ and CD1a+ cells in the right femur from panel (d). The dots in the plots represent different injected mice from 3 human cord blood donors (n = 23 for LMPP and MLP; n = 14 for CLP; n = 3 for pre-pro-B; n = 9 for pro-B). g) Human CB (n = 5), adult bone marrow (BM)(n = 8) and mobilized peripheral blood (MPB)(n = 7) were stained and analyzed by flow cytometry with the indicated cell surface markers. The percentage of CD33+ and CD33- cells in each population is represented. Each histogram bar represents an individual donor. h) Single cells from indicated populations were sorted into MS-5 stroma cells and cultured for 16-17 days with Ly-My or LyP media (Supplementary Table 14). Colonies were scored under the microscope and analyzed by flow cytometry for differentiation markers. The percentage of each colony type resulting from each starting population as labels on the x axis is shown. i) From (h) the clonogenic potential (wells with colonies/seeded wells multiplied by 100; bar plots represent the mean with SD) and the number of B, Myeloid (M) or NK cells/colony are shown for both media (LMPP CD33 + , n = 4 cord blood pools; MLP CD33 + , n = 4 cord blood pools; MLP CD33-, n = 3 cord blood pools; CLP CD33 + , n = 5 cord blood pools; CLP CD33-, n = 5 cord blood pools; pre-pro-B CD33 + , n = 2 cord blood pools; pre-pro-B CD33-, n = 3 cord blood pools; pro-B, n = 3 cord blood pools).

Source data

Extended Data Fig. 4 B-ALL developmental states within B-ALL samples.

a,b) Validation through projection of human fetal lung data. a) 9,491 single cell transcriptomes mapped to human B cell development from fetal lung tissue profiled in Barnes et al.41. Projected cells are colored by cell type labels assigned in the original publication. Reference single-cell transcriptomes from our B cell development atlas are depicted in gray. b) Density plots depicting projected single-cell transcriptomes split by each author assigned cell type within the query fetal lung scRNA-seq data. Darker shades of red depict higher density of query cells within a given region along the reference atlas. c) UMAP embedding based on cell state composition from 89 B-ALL samples. Color scheme depicts centered log ratio (CLR) normalized abundance of each mapped cell population for each sample. d) Consolidation of similar B-ALL cell states into broader developmental states. To identify B-ALL states with correlated abundance across patient samples, NMF was performed on the normalized abundance scores of all mapped cell states from B-ALL bone marrow samples and NMF modules were interpreted based on the weights assigned to the cell states that comprise each module. e) For NMF modules that correspond to a distinct stage along B-lymphoid development, termed ‘Developmental State’, feature weights are visualized in this heatmap by constituent cell type. f) Abundance of broad developmental states defined by NMF spanning early myeloid progenitors and plasmacytoid DC cells. Color scheme denotes scaled abundance (CLR) of each indicated population. g) Remaining genomic subtypes (DUX4-R, BCR::ABL1-like (CRLF2-R), BCR::ABL1-like (non-CRLF2-R), iAMP21, PAX5alt and near haploid) are overlaid based on the cell composition of their constituent patient samples. h,i) Developmental state in ETV6::RUNX1 B-ALL. h) Cell state composition-based clustering of 8 ETV6::RUNX1 B-ALL samples from Mehtonen et al.44 and 7 ETV6::RUNX1-like B-ALL samples from this study. Each column represents a scRNA-seq sample. i) Projection results on the B cell development map of ETV6::RUNX1 B-ALL samples from Mehtonen et al.44.

Extended Data Fig. 5 Benchmarking gene expression deconvolution approaches in B-ALL.

a) Overview of benchmarking inference of developmental state abundance in B-ALL across various gene expression deconvolution approaches. This spans the stages of 1) feature selection and signature matrix generation, 2) application of gene expression deconvolution algorithm, and 3) benchmarking deconvolution accuracy using 85 patient samples with matched scRNA-seq and bulk RNA-seq. For the feature selection step, patient/donor ID was controlled for as a covariate in order to emphasize intra-donor heterogeneity in identifying transcriptional markers of normal and leukemic cell states along B-cell development. b,c) Association between predicted abundance (bulk RNA-seq deconvolution) and observed abundance (scRNA-seq) of B-ALL developmental states across 85 matched patient samples, shown for each deconvolution method and split by signature matrix. Associations shown include the Pearson correlation (b) as well as the coefficient of determination denoted as R-squared (c). Lines between points in the boxplots link the performance for each B-ALL developmental state (HSC/MPP, myeloid progenitor, pre-pDC, early lymphoid, pro-B, pre-B, mature B) across quantification methods. P values denote results from two-tailed paired t-tests comparing our LASSO regression approach with CIBERSORTx deconvolution. Box plots indicate the range of the central 50% of the data, with the central line marking the median. Whiskers extend from each box to 1.5x the interquartile range. d) Scatterplots of predicted abundance (bulk RNA-seq) and observed abundance (scRNA-seq) for each quantification method, with each dot representing one of the 85 patient samples with matched scRNA-seq and bulk RNA-seq. For each method, the B-ALL biologically relevant genes signature matrix was used. For each association, the linear regression line shaded with the 95% CI, as well as r and P values from Pearson correlation, are shown.

Source data

Extended Data Fig. 6 Quantification of B-ALL developmental states and multipotency score.

a) Development of a 99-gene regression model to quantify PC1, hereafter named the ‘multipotency score’. Correlation results for 10-fold cross validation with 10 repeats are shown alongside correlation for the final model trained on all 2,046 samples. r and P values from Pearson correlation are shown. b) B-ALL multipotency score validation on donor-level pseudobulk profiles from normal B cell development atlas. Single-cell transcriptomes spanning n = 90 donors were pooled together based on tissue source, study, and sequencing technology into pseudobulk profiles for visualization of multipotency score enrichment. For the specific number of pseudobulk profiles as well as the number of cells for each cell state along B cell development, refer to Supplementary Table 11. Box plots indicate the range of the central 50% of the data, with the central line marking the median. Whiskers extend from each box to 1.5x the interquartile range. c) Ridge plot comparing the inferred abundance of HSC/MPP and pre-pDC states by genomic subtype across 2,046 B-ALL patient samples. d,e) Association plots between multipotency score or developmental state abundance and driver alterations (d) or gene fusions (e). The magnitude of each association, quantified as the –log10 (P value), is depicted through the size and color intensity of each dot. The direction of the association is depicted through the color, wherein higher abundance is green and lower abundance is purple. Only associations with an FDR-corrected P value < 0.05 are depicted. For driver alterations (d), abundances of samples with each alteration were compared to all other samples, and genomic subtype was adjusted for as a covariate. For gene fusions (e), abundances of samples with each fusion were compared to abundances from samples with no gene fusions. f-h) Differences in developmental state abundance (myeloid progenitors and pre-pDCs) between transcriptional subtypes of DUX4-R (total n = 112; DUX4-R early/multipotent = 70; DUX4-R committed  = 42). (f), KMT2A-R (total n = 144; KMT2A-R early/multipotent = 125; KMT2A-R committed = 19). (g) and BCR::ABL1 (total n = 127; BCR::ABL1 early/multipotent = 32; BCR::ABL1 Inter = 26; BCR::ABL1 Ccommitted = 69). (h). i) Multipotency score and developmental state abundance explains differences between Early-Pro (early/multipotent, n = 24), Inter-Pro (n = 8), and Late-Pro (committed, n = 22) transcriptional subgroups of BCR::ABL1 (n = 54) from Kim et al.14. For each comparison, P values from a two-tailed Wilcoxon rank-sum test are reported. Box plots indicate the range of the central 50% of the data, with the central line marking the median. Whiskers extend from each box to 1.5x the interquartile range.

Source data

Extended Data Fig. 7 B-ALL developmental states refine existing genomic subtypes.

a-c) Association between genetic alterations and multipotency score or inferred abundance of B-ALL developmental states from patients with DUX4-R (a), KMT2A-R (b) and BCR::ABL1 (c) B-ALL. The magnitude of each association, quantified as the –log10 (P value), is depicted through the size and color intensity of each dot. The direction of the association is depicted through the color, wherein higher abundance is green and lower abundance is purple. Only associations with an unadjusted P < 0.05 are depicted, with a star denoting at FDR < 0.05. Within each subtype, samples with each alteration were compared against all other samples. BCR::ABL1 transcript type (p190, p210) is not shown in (c) due to a lack of significant associations at P < 0.05. d) Violin plot of the AUCell enrichment scores for the four indicated DUX4-R subtype-level cNMF metaprograms. e) GSEA on the genes from (d) ranked by their contribution to Differentiation and Inflammation programs in DUX4-R transcriptional subgroups. f) Violin plots of gene expression pattern for twelve selected genes in 10 DUX4-R patients. g) Scatter dot plots of CD44 (left) and CD10 (right) protein expression by flow cytometric analysis in DUX4-R early/multipotent (n = 6) and DUX4-R Committed (n = 4) B-ALL samples. Two-tailed P value is from unpaired t test. Each point in the scatter plot represents individual samples. The bar plot represents the mean with error bars indicating the standard deviation. h) KMT2A gene fusion partners in Early/Multipotent and Committed KMT2A-R subgroups. i) Association between KMT2A fusion partner and multipotency score or developmental state abundance. KMT2A fusion partners (AFF1, n = 93; EPS15, n = 4; MLLT1, n = 19; MLLT3, n = 11; MLLT10, n = 10) harbored by patients within each KMT2A-R subgroup are shown alongside the inferred abundance of early lymphoid, myeloid progenitor and pre-B states. For each comparison, P values from a two-tailed Wilcoxon rank-sum test are reported. Box plots indicate the range of the central 50% of the data, with the central line marking the median. Whiskers extend from each box to 1.5x the interquartile range. j) Oncoprint showing co-lesions in pediatric BCR::ABL1 patients for which both SNVs/indels and bulk RNA-seq data were available (BCR::ABL1 early/multipotent, n = 9; BCR::ABL1 inter-pro, n = 6; BCR::ABL1 committed, n = 28. Only significant P values from Fisher's exact test of early/multipotent versus committed are shown.

Source data

Extended Data Fig. 8 Abundance of cell types in B-ALL and MPAL.

Comparison of B cell Developmental State abundance between uniformly processed MPAL (n = 66) and B-ALL (n = 1,153) patient samples. Abundance of HSC/MPP/LMPP, early lymphoid, myeloid progenitor, pre-pDC, pro-B and pre-B in early/multipotent (early/multip.) and committed subgroups of BCR::ABL1 (n = 50 and n = 57, respectively), KMT2A-R (n = 69 and n = 10, respectively) and ZNF384-R (n = 53) with B-ALL and in ‘Other B-ALL’ (n = 793, non-BCR::ABL1, non-KMT2A-R and non-ZNF384-R), and in MPAL patients with same genetic drivers: BCR::ABL1 (n = 4), KMT2A-R (n = 13) and ZNF384-R (n = 13) or ‘Other’ (n = 36). For each comparison, P values from a two-tailed Wilcoxon rank-sum test are reported. Box plots indicate the range of the central 50% of the data, with the central line marking the median. Whiskers extend from each box to 1.5x the interquartile range.

Source data

Extended Data Fig. 9 B-ALL developmental states are associated with clinical outcomes.

a-d) Association between age at diagnosis and inferred abundance of HSC/MPP (a), myeloid progenitor (b), pre-pDC (c) and pre-B developmental states (d). For each association, the LOESS regression line shaded with the 95% CI is shown for 2,019 B-ALL with available age information. e-h) Association between developmental state abundance and clinical characteristics in B-ALL. For each comparison, P values from a two-tailed Wilcoxon rank-sum test are reported. Box plots indicate the range of the central 50% of the data, with the central line marking the median. Whiskers extend from each box to 1.5x the interquartile range. e) Association between B-ALL clinical risk groups and inferred abundance of HSC/MPP, myeloid progenitor, and pre-pDC developmental states among 2,022 B-ALL patients annotated for clinical risk (childhood high risk, HR, n = 680; childhood standard risk, SR, n = 527; AYA, n = 430; adult, n = 385). f,g) Association between measurable residual disease (MRD) status at day 29 of induction and inferred abundance of indicated developmental states (f) as well as of early lymphoid, pro-B and pre-B developmental states (g) at diagnosis (n = 1,197 pediatric patients with MRD status (Negative < 0.01%, n = 787; Positive > 0.01%, n = 410). h) Association between MRD levels at day 29 and inferred developmental state abundance at diagnosis (n = 794 pediatric B-ALL patients with MRD levels available: < 0.01%, n = 448; 0.01–0.1%, n = 148; 0.1–1%, n = 104; 1–10%, n = 54). i-k) Association of B-ALL developmental states with survival outcomes across different subsets of B-ALL patients. Hazard ratios from univariate cox regression, or multivariable cox regression when indicated, related to OS and EFS are reported for each standard deviation increase in inferred abundance. Error bars depict the 95% confidence interval for each variable of interest, presented as the hazard ratio ±− 1.96 standard errors. i,j) For each developmental state and multipotency score, association with survival outcomes is performed within 1,039 pediatric B-ALL patients (i) and within 649 pediatric B-ALL patients with negative MRD (j). COG, Children's Oncology Group. k) Association of B-ALL developmental states with survival outcomes within genetically diverse adult B-ALL patients. ECOG-ACRIN, Eastern Cooperative Oncology Group-American College of Radiology Imaging Network Cancer Research Group. Hazard ratios for overall survival and event-free survival from univariate Cox regression (n = 324 for each developmental state/multipotency score, left panel) or adjusted hazard ratios (n = 312 for each developmental state/multipotency score, right panel) from multivariable Cox regression accounting for age, sex, WBC, genomic and clinical risk group as independent covariates are reported for each standard deviation increase in inferred abundance.

Source data

Extended Data Fig. 10 Multipotency score and outcome in B-ALL.

a-f) OS and EFS Kaplan–Meier plots evaluating the multipotency score within genomic risk group categories (a,b), clinical risk groups (c,d) and MRD categories (e,f). Patients were assigned to high (red) and low (blue) multipotency score categories based on a median split. Hazard ratios and 95% confidence intervals for high vs low multipotency score are shown within each category. P values were derived from a Wald test. Sample sizes for genomic risk: Favorable (Low = 283, High = 258), Intermediate (Low = 156, High = 117), Unfavorable (Low = 66, High = 108), Unclassified (Low = 15, High = 36). Sample sizes for clinical risk: Childhood Standard Risk (Low = 230, High = 122), Childhood High Risk (Low = 250, High = 273), Adolescent/Young Adult (Low = 31, High = 109). Sample sizes for MRD: Negative (Low = 373, High = 277), Positive (Low = 90, High = 175). g,h) Association of B-ALL developmental states and of multipotency score with OS and EFS outcomes within 47 pediatric BCR::ABL1 patients (g) and 41 adult BCR::ABL1 patients from Kim et al.14 (h). For each developmental state (n = 47 in panel g, n = 41 in panel h). Hazard ratios from multivariable Cox regression, adjusting for age, sex, WBC, clinical risk group and genomic subtype, with overall survival and event-free survival are reported for each standard deviation increase in inferred abundance. Error bars depict the 95% confidence interval of the hazard ratio for each variable of interest.

Supplementary information

Supplementary Information

Supplementary Figs. 1–4.

Reporting Summary

Software policy.

Supplementary Tables

Supplementary Tables 1–26.

Source data

Source Data Fig. 1

Source data for Fig. 1b.

Source Data Fig. 2

Source data for Fig. 2a,f,i.

Source Data Fig. 3

Source data for Fig. 3g.

Source Data Fig. 4

Source data for Fig. 4d.

Source Data Fig. 5

Source data for Fig. 5b,e–j.

Source Data Fig. 6

Source data for Fig. 6a,c,e,g–i.

Source Data Fig. 7

Source data for Fig. 7a–e.

Source Data Fig. 8

Source data for Fig. 8a–d.

Source Data Extended Data Fig. 1

Source data for Extended Data Fig. 1c,d,k,m.

Source Data Extended Data Fig. 2

Source data for Extended Data Fig. 2e,i,k.

Source Data Extended Data Fig. 3

Source data for Extended Data Fig. 3e–i.

Source Data Extended Data Fig. 5

Source data for Extended Data Fig. 5b–d.

Source Data Extended Data Fig. 6

Source data for Extended Data Fig. 6a–i.

Source Data Extended Data Fig. 7

Source data for Extended Data Fig. 7a,–c,g,i.

Source Data Extended Data Fig. 8

Source data for Extended Data Fig. 8.

Source Data Extended Data Fig. 9

Source data for Extended Data Fig. 9a–i,k.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Iacobucci, I., Zeng, A.G.X., Gao, Q. et al. Multipotent lineage potential in B cell acute lymphoblastic leukemia is associated with distinct cellular origins and clinical features. Nat Cancer (2025). https://doi.org/10.1038/s43018-025-00987-2

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1038/s43018-025-00987-2

Search

Quick links

Nature Briefing: Cancer

Sign up for the Nature Briefing: Cancer newsletter — what matters in cancer research, free to your inbox weekly.

Get what matters in cancer research, free to your inbox weekly. Sign up for Nature Briefing: Cancer