Skip to main content

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • Article
  • Published:

A likelihood-based framework for demographic inference from genealogical trees

An Author Correction to this article was published on 17 June 2025

This article has been updated

Abstract

The demographic history of a population underlies patterns of genetic variation and is encoded in the gene-genealogical trees of the sampled haplotypes. Here we propose a demographic inference framework called the genealogical likelihood (gLike). Our method uses a graph-based structure to summarize the relationships among all lineages in a gene-genealogical tree with all possible trajectories of population memberships through time and derives the full likelihood across trees under a parameterized demographic model. We show through simulations and empirical applications that for populations that have experienced multiple admixtures, gLike can accurately estimate dozens of demographic parameters, including ancestral population sizes, admixture timing and admixture proportions, and it outperforms conventional demographic inference methods using the site frequency spectrum. Taken together, our proposed gLike framework harnesses underused genealogical information to offer high sensitivity and accuracy in inferring complex demographies for humans and other species.

This is a preview of subscription content, access via your institution

Access options

Buy this article

Prices may be subject to local taxes which are calculated during checkout

Fig. 1: A schematic of the major steps of the gLike algorithm with examples.
Fig. 2: gLike accurately reconstructs three-way admixture without ancestral population samples.
Fig. 3: gLike distinguishes three-way admixture from two-way admixture.
Fig. 4: gLike reconstructs the American admixture demography.
Fig. 5: gLike reconstructs the ancient Europe demography.
Fig. 6: Parameter estimations for the demographic histories of Latinos and Native Hawaiians.

Similar content being viewed by others

Data availability

The individual-level genetic data for Native Hawaiian and Latino datasets were derived from the Multiethnic Cohort and are available on dbGaP (accession numbers phs000220.v2.p2 and phs002183.v1.p1).

Code availability

The gLike package is available on its GitHub page (https://github.com/Ephraim-usc/glike). The version of gLike as well as codes used for simulation and plotting presented in this study can also be found on Zenodo (https://doi.org/10.5281/zenodo.14708630)52.

Change history

References

  1. Schlebusch, C. M. & Jakobsson, M. Tales of human migration, admixture, and selection in Africa. Annu. Rev. Genom. Hum. Genet. 19, 405–428 (2018).

    Article  CAS  Google Scholar 

  2. Chiang, C. W. K. et al. Genomic history of the Sardinian population. Nat. Genet. 50, 1426–1434 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  3. Micheletti, S. J. et al. Genetic consequences of the transatlantic slave trade in the Americas. Am. J. Hum. Genet. 107, 265–277 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Patterson, N. et al. Ancient admixture in human history. Genetics 192, 1065–1093 (2012).

    Article  PubMed  PubMed Central  Google Scholar 

  5. Peter, B. M. Admixture, population structure, and F-statistics. Genetics 202, 1485–1501 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Lipson, M. Applying f4‐statistics and admixture graphs: theory and examples. Mol. Ecol. Resour. 20, 1658–1667 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  7. Lohmueller, K. E. The distribution of deleterious genetic variation in human populations. Curr. Opin. Genet. Dev. 29, 139–146 (2014).

    Article  CAS  PubMed  Google Scholar 

  8. Keinan, A. & Clark, A. G. Recent explosive human population growth has resulted in an excess of rare genetic variants. Science 336, 740–743 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  9. Wang, S. R. et al. Simulation of Finnish population history, guided by empirical genetic data, to assess power of rare-variant tests in Finland. Am. J. Hum. Genet. 94, 710–720 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Medina-Muñoz, S. G. et al. Demographic modeling of admixed Latin American populations from whole genomes. Am. J. Hum. Genet. 110, 1804–1816 (2023).

    Article  PubMed  PubMed Central  Google Scholar 

  11. Uricchio, L. H., Zaitlen, N. A., Ye, C. J., Witte, J. S. & Hernandez, R. D. Selection and explosive growth alter genetic architecture and hamper the detection of causal rare variants. Genome Res. 26, 863–873 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Li, H. & Durbin, R. Inference of human population history from individual whole-genome sequences. Nature 475, 493–496 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Sheehan, S., Harris, K. & Song, Y. S. Estimating variable effective population sizes from multiple genomes: a sequentially Markov conditional sampling distribution approach. Genetics 194, 647–662 (2013).

    Article  PubMed  PubMed Central  Google Scholar 

  14. Schiffels, S. & Durbin, R. Inferring human population size and separation history from multiple genome sequences. Nat. Genet. 46, 919–925 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  15. Terhorst, J., Kamm, J. A. & Song, Y. S. Robust and scalable inference of population history from hundreds of unphased whole genomes. Nat. Genet. 49, 303–309 (2017).

    Article  CAS  PubMed  Google Scholar 

  16. Palamara, P. F., Lencz, T., Darvasi, A. & Pe’er, I. Length distributions of identity by descent reveal fine-scale demographic history. Am. J. Hum. Genet. 91, 809–822 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Browning, S. R. et al. Ancestry-specific recent effective population size in the Americas. PLoS Genet. 14, e1007385 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  18. Gutenkunst, R. N., Hernandez, R. D., Williamson, S. H. & Bustamante, C. D. Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet. 5, e1000695 (2009).

    Article  PubMed  PubMed Central  Google Scholar 

  19. Bhaskar, A., Wang, Y. X. R. & Song, Y. S. Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data. Genome Res. 25, 268–279 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  20. Kamm, J., Terhorst, J., Durbin, R. & Song, Y. S. Efficiently inferring the demographic history of many populations with allele count data. J. Am. Stat. Assoc. 115, 1472–1487 (2020).

    Article  CAS  PubMed  Google Scholar 

  21. Excoffier, L. et al. fastsimcoal2: demographic inference under complex evolutionary scenarios. Bioinformatics 37, 4882–4885 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  22. Excoffier, L., Dupanloup, I., Huerta-Sánchez, E., Sousa, V. C. & Foll, M. Robust demographic Inference from genomic and SNP data. PLoS Genet. 9, e1003905 (2013).

    Article  PubMed  PubMed Central  Google Scholar 

  23. Liu, X. & Fu, Y.-X. Stairway Plot 2: demographic history inference with folded SNP frequency spectra. Genome Biol 21, 280 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  24. McVean, G. A genealogical interpretation of principal components analysis. PLoS Genet. 5, e1000686 (2009).

    Article  PubMed  PubMed Central  Google Scholar 

  25. Opgen-Rhein, R., Fahrmeir, L. & Strimmer, K. Inference of demographic history from genealogical trees using reversible jump Markov chain Monte Carlo. BMC Evol. Biol. 5, 6 (2005).

    Article  PubMed  PubMed Central  Google Scholar 

  26. Fan, C., Mancuso, N. & Chiang, C. W. K. A genealogical estimate of genetic relationships. Am. J. Hum. Genet. 109, 812–824 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  27. Hudson, R. R. Gene genealogies and the coalescent process. In Oxford Surveys in Evolutionary Biology Vol. 7 (eds Futuyma, D. & Antonovics, J.) 1–44 (Oxford Univ. Press, 1990).

  28. Griffiths, R. C. & Marjoram, P. Ancestral inference from samples of DNA sequences with recombination. J. Comput. Biol. 3, 479–502 (1996).

    Article  CAS  PubMed  Google Scholar 

  29. Lewanski, A. L., Grundler, M. C. & Bradburd, G. S. The era of the ARG: an introduction to ancestral recombination graphs and their significance in empirical evolutionary genomics. PLoS Genet. 20, e1011110 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  30. Brandt, Y. C. et al. Evaluation of methods for estimating coalescence times using ancestral recombination graphs. Genetics 221, iyac044 (2022).

    Article  Google Scholar 

  31. Peng, D., Mulder, O. J. & Edge, M. D. Evaluating ARG-estimation methods in the context of estimating population-mean polygenic score histories. Genetics https://doi.org/10.1093/genetics/iyaf033 (2025).

  32. Brandt, D. Y. C., Huber, C. D., Chiang, C. W. K. & Ortega-Del Vecchyo, D. The promise of inferring the past using the ancestral recombination graph. Genome Biol. Evol. 16, evae005 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  33. Pearson, A. & Durbin, R. Local ancestry inference for complex population histories. Preprint at bioRxiv https://doi.org/10.1101/2023.03.06.529121 (2023).

  34. Wang, Z. et al. Automatic inference of demographic parameters using generative adversarial networks. Mol. Ecol. Resour. 21, 2689–2705 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  35. Adrion, J. R. et al. A community-maintained standard library of population genetic models. eLife 9, e54967 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  36. Sun, H. et al. The impact of global and local Polynesian genetic ancestry on complex traits in Native Hawaiians. PLoS Genet. 17, e1009273 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  37. Jeon, S. et al. Genome-wide trans-ethnic meta-analysis identifies novel susceptibility loci for childhood acute lymphoblastic leukemia. Leukemia 36, 865–868 (2022).

    Article  CAS  PubMed  Google Scholar 

  38. Myers, S., Fefferman, C. & Patterson, N. Can one learn history from the allelic spectrum? Theor. Popul. Biol. 73, 342–348 (2008).

    Article  PubMed  Google Scholar 

  39. Kubatko, L. S., Carstens, B. C. & Knowles, L. L. STEM: species tree estimation using maximum likelihood for gene trees under coalescence. Bioinformatics 25, 971–973 (2009).

    Article  CAS  PubMed  Google Scholar 

  40. Pickrell, J. K. & Pritchard, J. K. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 8, e1002967 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  41. Molloy, E. K., Durvasula, A. & Sankararaman, S. Advancing admixture graph estimation via maximum likelihood network orientation. Bioinformatics 37, i142–i150 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  42. Nielsen, S. V. et al. Bayesian inference of admixture graphs on Native American and Arctic populations. PLoS Genet. 19, e1010410 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  43. Deng, Y., Nielsen, R. & Song, Y. S. Robust and accurate Bayesian inference of genome-wide genealogies for large samples. Preprint at bioRxiv https://doi.org/10.1101/2024.03.16.585351 (2024).

  44. Kelleher, J., Etheridge, A. M. & McVean, G. Efficient coalescent simulation and genealogical analysis for large sample sizes. PLoS Comput. Biol. 12, e1004842 (2016).

    Article  PubMed  PubMed Central  Google Scholar 

  45. Kelleher, J. et al. Inferring whole-genome histories in large population datasets. Nat. Genet. 51, 1330–1338 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  46. Wohns, A. W. et al. A unified genealogy of modern and ancient genomes. Science https://doi.org/10.1126/science.abi8264 (2022).

  47. Speidel, L., Forest, M., Shi, S. & Myers, S. R. A method for genome-wide genealogy estimation for thousands of samples. Nat. Genet. 51, 1321–1329 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  48. Sheng, X. et al. Inverted genomic regions between reference genome builds in humans impact imputation accuracy and decrease the power of association testing. HGG Adv. 4, 100159 (2023).

    CAS  PubMed  Google Scholar 

  49. Loh, P.-R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  50. Woerner, A. E., Veeramah, K. R., Watkins, J. C. & Hammer, M. F. The role of phylogenetically conserved elements in shaping patterns of human genomic diversity. Mol. Biol. Evol. 35, 2284–2295 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  51. Lin, M. et al. Population-specific reference panels are crucial for genetic analyses: an example of the CREBRF locus in Native Hawaiians. Hum. Mol. Genet. 29, 2275–2284 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  52. Fan, C. Ephraim-usc/glike: v1.0. Zenodo https://doi.org/10.5281/zenodo.14708630 (2025).

Download references

Acknowledgements

We would like to thank I. Mathieson, S. Mathieson and L. Speidel for discussions and advice. Research reported in this publication was supported by National Institute of Health under award number R35GM142783 and R01HG12605 to C.W.K.C., R35GM137758 to M.D.E., R01HG012133 and P01CA196569 to N.M. and by Programa de Apoyo a Proyectos de Investigación e Innovación Tecnológica–Universidad Nacional Autónoma de México (PAPIIT–UNAM) under award number IN215524 to D.O.-D.V. The funders had no role in the study design, data collection and analysis, decision to publish or preparation of the manuscript. Computation for this work was supported by the University of Southern California’s Center for Advanced Research Computing (https://carc.usc.edu).

Author information

Authors and Affiliations

Authors

Contributions

C.W.K.C., D.O.-D.V. and C.D.H. conceived of the study. C.F. and C.W.K.C. designed the study. C.F., J.L.C. and B.L.D. performed the analysis. B.L.D. curated the data. C.F., M.D.E., N.A.M. and C.W.K.C. interpreted the data. C.F., J.L.C., M.D.E. and C.W.K.C. wrote the manuscript with input from all co-authors.

Corresponding authors

Correspondence to Caoqi Fan or Charleston W. K. Chiang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Laurent Excoffier, Harald Ringbauer and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 The expected coalescence distribution based on the inferred demography matches the simulated input.

(A) We simulated 100 equal-distant trees of 1000 haplotypes were simulated on a 30 Mb chromosome, under the same demography as in Fig. 2a. The demography is inferred by gLike on the true trees with default settings, and the expected coalescence distribution is computed by simulation of 10,000 trees under the inferred demography. The two distributions are highly consistent, except for small random fluctuations on the observed distribution. (B) The same experiment as in (A), but tsdate reconstruction is applied to the observed trees, the parameters are then inferred by gLike on the reconstructed trees, and tsdate reconstruction is again applied to the simulated trees under the inferred parameters. Vertical dash lines indicate t1, t2, and t3 in the simulated demography, corresponding to the time of the more recent admixture event, the more distant admixture event, and the split of three ancestral populations, respectively.

Extended Data Fig. 2 The inferred distribution of coalescence under the three-way admixture demography as function of input parameter Ne during ARG inference.

For ARG inference based on (A) tsdate, (B) Relate, and (C) Relate with branch sampling, the left panels show the times of coalescences (that is, inner nodes) in ascending order in a genealogical tree of 1000 haplotypes simulated under the three-way admixture demography as in Fig. 2a. Different color bands show 2 times standard deviation across 50 independent simulations. Right panels show TMRCA in the true tree versus the reconstructed tree, using Ne = 10,000, which we use as default for all ARG inference in this study. Results from 50 independent simulations are pooled for display.

Extended Data Fig. 3 Log likelihood distribution around Nooa values for different thresholds of maximum number of edges connecting all states between two time points.

For computational efficiency, if the total connections between two adjacent layers of the GOS exceeds a customizable hyperparameter, κ, gLike will approximate via sampleing (see Methods for details). Here we evaluate the impact of setting this threshold, κ, on the apparent biased estimate of Nooa parameter in Fig. 4. A total of 50 replicate experiments were conducted in each panel. Solid circles and error bars indicate mean and standard deviation, respectively, across the replicates. In each replicate experiment, 100 equally distant trees of 1000 haplotypes were simulated on a 30 Mb chromosome from population ADMIX under the same demography as in Fig. 4. The log-likelihood (logP) of observing these 100 trees were calculated by gLike assuming different Nooa values and all other demographic parameters fixed at true values. The logP calculated from the true Nooa = 1867 were subtracted from all logP values, for comparability between replicates. As we increased the default threshold for connections before gLike begin approximating the likelihood, the maximum likelihood estimate (dashed line) also tended towards the true value (solid line), suggesting that the exact computation of likelihood is unbiased, though approximation for computational reasons could lead to bias.

Extended Data Fig. 4 Average gLike runtime on a single genealogical tree with varying sample sizes.

50 replicate experiments were conducted for each sample size. Solid circles indicate the average runtime on each tree, and squares indicate the average time spent on scipy logsumexp function for each tree. Error bars indicate the standard deviation across 50 replicates. In each replicate experiment, 100 equally distant trees of 1000 haplotypes were simulated on a 30 Mb chromosome under the same three-way admixture demography as in Fig. 2.

Extended Data Fig. 5 Robustness of gLike against misspecified continuous migrations.

The same experiment in Fig. 4a except that the true demography contains AFR-EUR, AFR-ASIA, EUR-ASIA and AFR-OOA continuous migrations that are set to be 1x (A), 10x (B) and 100x (C) of their rates as in the stdpopsim 4B11 model. gLike was applied on the true trees in the same way as in Fig. 4a, assuming no continuous migrations. Note that the 1x continuous migrations have no visible impact on the results, while 100x continuous migrations lead to considerable underestimations of \({t}_{3}\), \({t}_{4}\) and \({N}_{{\rm{afr}}}\), due to the accumulation of coalescences earlier than expected in a migration-free demography. Boxplots display the first, second (the median), and third quartiles of the data, with whiskers extending from the box to the farthest data point lying within 1.5x of the inter-quartile range.

Extended Data Fig. 6 Unidentifiability between population sizes and growth rates.

The log-likelihood of the gLike model on the population sizes (at time of admixture) and growth rates of the Latinos and Native Hawaiians in a grid of possible parameters. All other parameters were fixed at their estimates shown in Fig. 6. This result indicates the potential bias when estimating entangled parameters, because the hill-climbing optimization could stop anywhere along the red curve, depending on the initial values.

Supplementary information

Supplementary Information

Supplementary Notes, Supplementary Figs. 1–11 and Supplementary Tables 1–6.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Fan, C., Cahoon, J.L., Dinh, B.L. et al. A likelihood-based framework for demographic inference from genealogical trees. Nat Genet 57, 865–874 (2025). https://doi.org/10.1038/s41588-025-02129-x

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1038/s41588-025-02129-x

Search

Quick links

Nature Briefing

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

Get the most important science stories of the day, free in your inbox. Sign up for Nature Briefing