A likelihood-based framework for demographic inference from genealogical trees

Fan, Caoqi; Cahoon, Jordan L.; Dinh, Bryan L.; Ortega-Del Vecchyo, Diego; Huber, Christian D.; Edge, Michael D.; Mancuso, Nicholas; Chiang, Charleston W. K.

doi:10.1038/s41588-025-02129-x

Article
Published: 20 March 2025

Nature Genetics volume 57, pages 865–874 (2025)

An Author Correction to this article was published on 17 June 2025

This article has been updated

Abstract

The demographic history of a population underlies patterns of genetic variation and is encoded in the gene-genealogical trees of the sampled haplotypes. Here we propose a demographic inference framework called the genealogical likelihood (gLike). Our method uses a graph-based structure to summarize the relationships among all lineages in a gene-genealogical tree with all possible trajectories of population memberships through time and derives the full likelihood across trees under a parameterized demographic model. We show through simulations and empirical applications that for populations that have experienced multiple admixtures, gLike can accurately estimate dozens of demographic parameters, including ancestral population sizes, admixture timing and admixture proportions, and it outperforms conventional demographic inference methods using the site frequency spectrum. Taken together, our proposed gLike framework harnesses underused genealogical information to offer high sensitivity and accuracy in inferring complex demographies for humans and other species.

**Fig. 1: A schematic of the major steps of the gLike algorithm with examples.**

**Fig. 2: gLike accurately reconstructs three-way admixture without ancestral population samples.**

**Fig. 3: gLike distinguishes three-way admixture from two-way admixture.**

**Fig. 4: gLike reconstructs the American admixture demography.**

**Fig. 5: gLike reconstructs the ancient Europe demography.**

**Fig. 6: Parameter estimations for the demographic histories of Latinos and Native Hawaiians.**

Data availability

The individual-level genetic data for Native Hawaiian and Latino datasets were derived from the Multiethnic Cohort and are available on dbGaP (accession numbers phs000220.v2.p2 and phs002183.v1.p1).

Code availability

The gLike package is available on GitHub (https://github.com/Ephraim-usc/glike). The version of gLike used in this study, together with the code used for the simulations and plots, is also archived on Zenodo (https://doi.org/10.5281/zenodo.14708630)⁵².

Change history

17 June 2025
A Correction to this paper has been published: https://doi.org/10.1038/s41588-025-02257-4

References

Schlebusch, C. M. & Jakobsson, M. Tales of human migration, admixture, and selection in Africa. Annu. Rev. Genom. Hum. Genet. 19, 405–428 (2018).
Chiang, C. W. K. et al. Genomic history of the Sardinian population. Nat. Genet. 50, 1426–1434 (2018).
Micheletti, S. J. et al. Genetic consequences of the transatlantic slave trade in the Americas. Am. J. Hum. Genet. 107, 265–277 (2020).
Patterson, N. et al. Ancient admixture in human history. Genetics 192, 1065–1093 (2012).
Peter, B. M. Admixture, population structure, and F-statistics. Genetics 202, 1485–1501 (2016).
Lipson, M. Applying f₄‐statistics and admixture graphs: theory and examples. Mol. Ecol. Resour. 20, 1658–1667 (2020).
Lohmueller, K. E. The distribution of deleterious genetic variation in human populations. Curr. Opin. Genet. Dev. 29, 139–146 (2014).
Keinan, A. & Clark, A. G. Recent explosive human population growth has resulted in an excess of rare genetic variants. Science 336, 740–743 (2012).
Wang, S. R. et al. Simulation of Finnish population history, guided by empirical genetic data, to assess power of rare-variant tests in Finland. Am. J. Hum. Genet. 94, 710–720 (2014).
Medina-Muñoz, S. G. et al. Demographic modeling of admixed Latin American populations from whole genomes. Am. J. Hum. Genet. 110, 1804–1816 (2023).
Uricchio, L. H., Zaitlen, N. A., Ye, C. J., Witte, J. S. & Hernandez, R. D. Selection and explosive growth alter genetic architecture and hamper the detection of causal rare variants. Genome Res. 26, 863–873 (2016).
Li, H. & Durbin, R. Inference of human population history from individual whole-genome sequences. Nature 475, 493–496 (2011).
Sheehan, S., Harris, K. & Song, Y. S. Estimating variable effective population sizes from multiple genomes: a sequentially Markov conditional sampling distribution approach. Genetics 194, 647–662 (2013).
Schiffels, S. & Durbin, R. Inferring human population size and separation history from multiple genome sequences. Nat. Genet. 46, 919–925 (2014).
Terhorst, J., Kamm, J. A. & Song, Y. S. Robust and scalable inference of population history from hundreds of unphased whole genomes. Nat. Genet. 49, 303–309 (2017).
Palamara, P. F., Lencz, T., Darvasi, A. & Pe’er, I. Length distributions of identity by descent reveal fine-scale demographic history. Am. J. Hum. Genet. 91, 809–822 (2012).
Browning, S. R. et al. Ancestry-specific recent effective population size in the Americas. PLoS Genet. 14, e1007385 (2018).
Gutenkunst, R. N., Hernandez, R. D., Williamson, S. H. & Bustamante, C. D. Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet. 5, e1000695 (2009).
Bhaskar, A., Wang, Y. X. R. & Song, Y. S. Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data. Genome Res. 25, 268–279 (2015).
Kamm, J., Terhorst, J., Durbin, R. & Song, Y. S. Efficiently inferring the demographic history of many populations with allele count data. J. Am. Stat. Assoc. 115, 1472–1487 (2020).
Excoffier, L. et al. fastsimcoal2: demographic inference under complex evolutionary scenarios. Bioinformatics 37, 4882–4885 (2021).
Excoffier, L., Dupanloup, I., Huerta-Sánchez, E., Sousa, V. C. & Foll, M. Robust demographic inference from genomic and SNP data. PLoS Genet. 9, e1003905 (2013).
Liu, X. & Fu, Y.-X. Stairway Plot 2: demographic history inference with folded SNP frequency spectra. Genome Biol. 21, 280 (2020).
McVean, G. A genealogical interpretation of principal components analysis. PLoS Genet. 5, e1000686 (2009).
Opgen-Rhein, R., Fahrmeir, L. & Strimmer, K. Inference of demographic history from genealogical trees using reversible jump Markov chain Monte Carlo. BMC Evol. Biol. 5, 6 (2005).
Fan, C., Mancuso, N. & Chiang, C. W. K. A genealogical estimate of genetic relationships. Am. J. Hum. Genet. 109, 812–824 (2022).
Hudson, R. R. Gene genealogies and the coalescent process. In Oxford Surveys in Evolutionary Biology Vol. 7 (eds Futuyma, D. & Antonovics, J.) 1–44 (Oxford Univ. Press, 1990).
Griffiths, R. C. & Marjoram, P. Ancestral inference from samples of DNA sequences with recombination. J. Comput. Biol. 3, 479–502 (1996).
Lewanski, A. L., Grundler, M. C. & Bradburd, G. S. The era of the ARG: an introduction to ancestral recombination graphs and their significance in empirical evolutionary genomics. PLoS Genet. 20, e1011110 (2024).
Brandt, Y. C. et al. Evaluation of methods for estimating coalescence times using ancestral recombination graphs. Genetics 221, iyac044 (2022).
Peng, D., Mulder, O. J. & Edge, M. D. Evaluating ARG-estimation methods in the context of estimating population-mean polygenic score histories. Genetics https://doi.org/10.1093/genetics/iyaf033 (2025).
Brandt, D. Y. C., Huber, C. D., Chiang, C. W. K. & Ortega-Del Vecchyo, D. The promise of inferring the past using the ancestral recombination graph. Genome Biol. Evol. 16, evae005 (2024).
Pearson, A. & Durbin, R. Local ancestry inference for complex population histories. Preprint at bioRxiv https://doi.org/10.1101/2023.03.06.529121 (2023).
Wang, Z. et al. Automatic inference of demographic parameters using generative adversarial networks. Mol. Ecol. Resour. 21, 2689–2705 (2021).
Adrion, J. R. et al. A community-maintained standard library of population genetic models. eLife 9, e54967 (2020).
Sun, H. et al. The impact of global and local Polynesian genetic ancestry on complex traits in Native Hawaiians. PLoS Genet. 17, e1009273 (2021).
Jeon, S. et al. Genome-wide trans-ethnic meta-analysis identifies novel susceptibility loci for childhood acute lymphoblastic leukemia. Leukemia 36, 865–868 (2022).
Myers, S., Fefferman, C. & Patterson, N. Can one learn history from the allelic spectrum? Theor. Popul. Biol. 73, 342–348 (2008).
Kubatko, L. S., Carstens, B. C. & Knowles, L. L. STEM: species tree estimation using maximum likelihood for gene trees under coalescence. Bioinformatics 25, 971–973 (2009).
Pickrell, J. K. & Pritchard, J. K. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 8, e1002967 (2012).
Molloy, E. K., Durvasula, A. & Sankararaman, S. Advancing admixture graph estimation via maximum likelihood network orientation. Bioinformatics 37, i142–i150 (2021).
Nielsen, S. V. et al. Bayesian inference of admixture graphs on Native American and Arctic populations. PLoS Genet. 19, e1010410 (2023).
Deng, Y., Nielsen, R. & Song, Y. S. Robust and accurate Bayesian inference of genome-wide genealogies for large samples. Preprint at bioRxiv https://doi.org/10.1101/2024.03.16.585351 (2024).
Kelleher, J., Etheridge, A. M. & McVean, G. Efficient coalescent simulation and genealogical analysis for large sample sizes. PLoS Comput. Biol. 12, e1004842 (2016).
Kelleher, J. et al. Inferring whole-genome histories in large population datasets. Nat. Genet. 51, 1330–1338 (2019).
Wohns, A. W. et al. A unified genealogy of modern and ancient genomes. Science https://doi.org/10.1126/science.abi8264 (2022).
Speidel, L., Forest, M., Shi, S. & Myers, S. R. A method for genome-wide genealogy estimation for thousands of samples. Nat. Genet. 51, 1321–1329 (2019).
Sheng, X. et al. Inverted genomic regions between reference genome builds in humans impact imputation accuracy and decrease the power of association testing. HGG Adv. 4, 100159 (2023).
Loh, P.-R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448 (2016).
Woerner, A. E., Veeramah, K. R., Watkins, J. C. & Hammer, M. F. The role of phylogenetically conserved elements in shaping patterns of human genomic diversity. Mol. Biol. Evol. 35, 2284–2295 (2018).
Lin, M. et al. Population-specific reference panels are crucial for genetic analyses: an example of the CREBRF locus in Native Hawaiians. Hum. Mol. Genet. 29, 2275–2284 (2020).
Fan, C. Ephraim-usc/glike: v1.0. Zenodo https://doi.org/10.5281/zenodo.14708630 (2025).

Acknowledgements

We thank I. Mathieson, S. Mathieson and L. Speidel for discussions and advice. Research reported in this publication was supported by the National Institutes of Health under award numbers R35GM142783 and R01HG12605 to C.W.K.C., R35GM137758 to M.D.E., and R01HG012133 and P01CA196569 to N.M., and by the Programa de Apoyo a Proyectos de Investigación e Innovación Tecnológica–Universidad Nacional Autónoma de México (PAPIIT–UNAM) under award number IN215524 to D.O.-D.V. The funders had no role in the study design, data collection and analysis, decision to publish or preparation of the manuscript. Computation for this work was supported by the University of Southern California’s Center for Advanced Research Computing (https://carc.usc.edu).

Author information

Authors and Affiliations

Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
Caoqi Fan, Bryan L. Dinh, Nicholas Mancuso & Charleston W. K. Chiang
Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
Caoqi Fan, Jordan L. Cahoon, Bryan L. Dinh, Michael D. Edge, Nicholas Mancuso & Charleston W. K. Chiang
Department of Computer Science, University of Southern California, Los Angeles, CA, USA
Jordan L. Cahoon
Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Querétaro, México
Diego Ortega-Del Vecchyo
Department of Biology, Penn State University, University Park, PA, USA
Christian D. Huber

Contributions

C.W.K.C., D.O.-D.V. and C.D.H. conceived of the study. C.F. and C.W.K.C. designed the study. C.F., J.L.C. and B.L.D. performed the analysis. B.L.D. curated the data. C.F., M.D.E., N.A.M. and C.W.K.C. interpreted the data. C.F., J.L.C., M.D.E. and C.W.K.C. wrote the manuscript with input from all co-authors.

Corresponding authors

Correspondence to Caoqi Fan or Charleston W. K. Chiang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Laurent Excoffier, Harald Ringbauer and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 The expected coalescence distribution based on the inferred demography matches the simulated input.

(A) We simulated 100 equally distant trees of 1000 haplotypes on a 30 Mb chromosome under the same demography as in Fig. 2a. The demography was inferred by gLike on the true trees with default settings, and the expected coalescence distribution was computed by simulating 10,000 trees under the inferred demography. The two distributions are highly consistent, except for small random fluctuations in the observed distribution. (B) The same experiment as in (A), except that tsdate reconstruction was applied to the observed trees, the parameters were then inferred by gLike on the reconstructed trees, and tsdate reconstruction was again applied to the trees simulated under the inferred parameters. Vertical dashed lines indicate t₁, t₂ and t₃ in the simulated demography, corresponding to the time of the more recent admixture event, the more distant admixture event and the split of the three ancestral populations, respectively.
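The comparison in this caption, an observed coalescence distribution set against the one expected from simulation, can be illustrated on a much simpler model. The sketch below is a toy under stated assumptions, not gLike or the paper's demography: it assumes a single panmictic population of constant haploid size N (so each pair of lineages coalesces at rate 1/N per generation), and the function names are hypothetical. It simulates many trees and compares the mean time to the most recent common ancestor with the standard expectation 2N(1 − 1/n).

```python
import random

def simulate_coalescence_times(n, N, rng):
    """Return the n-1 coalescence times (generations ago) of one tree, ascending.

    Toy assumption: one population of constant haploid size N, so k lineages
    coalesce at total rate k*(k-1)/2/N per generation.
    """
    times, t = [], 0.0
    for k in range(n, 1, -1):
        t += rng.expovariate(k * (k - 1) / 2 / N)
        times.append(t)
    return times

def mean_tmrca(n, N, n_trees, seed=0):
    """Average TMRCA (the last coalescence time) over n_trees simulated trees."""
    rng = random.Random(seed)
    total = sum(simulate_coalescence_times(n, N, rng)[-1] for _ in range(n_trees))
    return total / n_trees

def expected_tmrca(n, N):
    """Closed-form expectation under the same toy model: 2N(1 - 1/n)."""
    return 2 * N * (1 - 1 / n)

simulated = mean_tmrca(n=10, N=1000, n_trees=5000)
expected = expected_tmrca(n=10, N=1000)
print(f"simulated mean TMRCA: {simulated:.1f}, expected: {expected:.1f}")
```

With a multi-population demography there is no simple closed form, which is presumably why the caption reports an expectation obtained by simulating trees under the inferred model.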

Extended Data Fig. 2 The inferred distribution of coalescence under the three-way admixture demography as a function of the input parameter Ne during ARG inference.

For ARG inference based on (A) tsdate, (B) Relate and (C) Relate with branch sampling, the left panels show the times of coalescences (that is, inner nodes) in ascending order in a genealogical tree of 1000 haplotypes simulated under the three-way admixture demography as in Fig. 2a. Colored bands show two standard deviations across 50 independent simulations. The right panels show the TMRCA in the true tree versus the reconstructed tree, using Ne = 10,000, which we use as the default for all ARG inference in this study. Results from 50 independent simulations are pooled for display.

Extended Data Fig. 3 Log likelihood distribution around N_ooa values for different thresholds of maximum number of edges connecting all states between two time points.

For computational efficiency, if the total number of connections between two adjacent layers of the GOS exceeds a customizable hyperparameter, κ, gLike approximates the likelihood via sampling (see Methods for details). Here we evaluate the impact of this threshold, κ, on the apparently biased estimate of the N_ooa parameter in Fig. 4. A total of 50 replicate experiments were conducted in each panel. Solid circles and error bars indicate the mean and standard deviation, respectively, across the replicates. In each replicate experiment, 100 equally distant trees of 1000 haplotypes were simulated on a 30 Mb chromosome from population ADMIX under the same demography as in Fig. 4. The log-likelihood (logP) of observing these 100 trees was calculated by gLike assuming different N_ooa values, with all other demographic parameters fixed at their true values. The logP calculated at the true N_ooa = 1867 was subtracted from all logP values for comparability between replicates. As we increased the threshold for connections before gLike begins approximating the likelihood, the maximum likelihood estimate (dashed line) tended towards the true value (solid line), suggesting that the exact computation of the likelihood is unbiased, though approximation for computational reasons could lead to bias.
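The normalization described in this caption (evaluate logP on a grid of candidate values with everything else fixed, then subtract the logP at the true value) can be sketched on a toy model. The code below is not gLike's likelihood: it assumes a single population of constant haploid size N, uses only the coalescence times of each tree, and the names `simulate_times`, `coalescent_loglik`, `relative_profile` and the value `TRUE_N = 2000` are hypothetical choices for illustration.

```python
import math
import random

def simulate_times(n, N, rng):
    """Ascending coalescence times for n lineages, constant haploid size N (toy model)."""
    times, t = [], 0.0
    for k in range(n, 1, -1):
        t += rng.expovariate(k * (k - 1) / 2 / N)
        times.append(t)
    return times

def coalescent_loglik(times, N):
    """Log-density of the coalescence times of one tree under constant size N."""
    n = len(times) + 1
    logp, prev = 0.0, 0.0
    for k, t in zip(range(n, 1, -1), times):
        rate = k * (k - 1) / 2 / N      # total coalescence rate with k lineages
        logp += math.log(rate) - rate * (t - prev)
        prev = t
    return logp

def relative_profile(trees, grid, true_N):
    """logP summed over trees at each grid value, minus logP at the true value."""
    def total(N):
        return sum(coalescent_loglik(times, N) for times in trees)
    ref = total(true_N)
    return {N: total(N) - ref for N in grid}

rng = random.Random(1)
TRUE_N = 2000  # hypothetical value, not taken from the paper
trees = [simulate_times(20, TRUE_N, rng) for _ in range(100)]
profile = relative_profile(trees, [1000, 1500, 2000, 2500, 3000], TRUE_N)
best_N = max(profile, key=profile.get)
for N, value in profile.items():
    print(N, round(value, 2))
```

By construction the curve passes through zero at the true value, so replicates can be overlaid on a common scale. For an unbiased likelihood the maximum should sit near the true value, which is the check the caption describes.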

Extended Data Fig. 4 Average gLike runtime on a single genealogical tree with varying sample sizes.

Fifty replicate experiments were conducted for each sample size. Solid circles indicate the average runtime per tree, and squares indicate the average time spent in the SciPy logsumexp function per tree. Error bars indicate the standard deviation across the 50 replicates. In each replicate experiment, 100 equally distant trees of 1000 haplotypes were simulated on a 30 Mb chromosome under the same three-way admixture demography as in Fig. 2.
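The caption singles out the time spent in logsumexp. As background, the sketch below shows what that operation computes and why a likelihood that sums over many hidden configurations is usually accumulated in log space. It is a plain-Python stand-in written for illustration. SciPy provides the same operation as `scipy.special.logsumexp`, and how gLike calls it internally is not described in this preview, so nothing here should be read as gLike's code.

```python
import math

def logsumexp(log_values):
    """Numerically stable log(sum(exp(x) for x in log_values))."""
    m = max(log_values)
    if m == -math.inf:
        # Every term has probability zero, so the sum is zero as well.
        return -math.inf
    # Factor out the largest term so the remaining exponentials are at most 1.
    return m + math.log(sum(math.exp(x - m) for x in log_values))

# Log-probabilities of three mutually exclusive hidden configurations
# (illustrative numbers only).
log_terms = [-1000.0, -1001.0, -1002.0]

naive_sum = sum(math.exp(x) for x in log_terms)  # underflows to 0.0
stable = logsumexp(log_terms)
print(naive_sum, stable)
```

The naive sum underflows to zero, so its logarithm is undefined, whereas the stable version returns a finite value close to the largest term.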

Extended Data Fig. 5 Robustness of gLike against misspecified continuous migrations.

The same experiment as in Fig. 4a, except that the true demography contains AFR-EUR, AFR-ASIA, EUR-ASIA and AFR-OOA continuous migrations set to 1x (A), 10x (B) and 100x (C) of their rates in the stdpopsim 4B11 model. gLike was applied to the true trees in the same way as in Fig. 4a, assuming no continuous migration. Note that 1x continuous migration has no visible impact on the results, whereas 100x continuous migration leads to considerable underestimation of t₃, t₄ and N_afr, due to the accumulation of coalescences earlier than expected in a migration-free demography. Boxplots display the first, second (the median) and third quartiles of the data, with whiskers extending from the box to the farthest data point lying within 1.5x of the interquartile range.

Extended Data Fig. 6 Unidentifiability between population sizes and growth rates.

The log-likelihood under the gLike model, evaluated over a grid of population sizes (at the time of admixture) and growth rates for Latinos and Native Hawaiians. All other parameters were fixed at their estimates shown in Fig. 6. This result indicates the potential for bias when estimating entangled parameters, because the hill-climbing optimization could stop anywhere along the red curve, depending on the initial values.
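One simplified way to see how such a ridge can arise is sketched below. It is an illustration under assumptions that are not taken from the paper: the population is taken to grow exponentially forward in time since admixture, N(t) = N_T · exp(r · (T − t)) for t generations ago with 0 ≤ t ≤ T, and only the cumulative coalescent intensity ∫₀ᵀ dt / N(t) = (1 − e^(−rT)) / (r · N_T) is compared. The function names and all numbers are hypothetical. Many different (N_T, r) pairs give the same intensity, so data that mainly constrain this quantity cannot separate the two parameters. The full tree likelihood carries more information than this single summary, so the real ridge need not follow this curve exactly.

```python
import math

def cumulative_coalescent_intensity(N_T, r, T):
    """Integral of 1/N(t) over 0 <= t <= T for N(t) = N_T * exp(r * (T - t))."""
    if r == 0:
        return T / N_T
    return (1 - math.exp(-r * T)) / (r * N_T)

def size_on_ridge(target, r, T):
    """Size at time T that reproduces the target intensity for growth rate r."""
    if r == 0:
        return T / target
    return (1 - math.exp(-r * T)) / (r * target)

T = 15  # generations since admixture (illustrative)
target = cumulative_coalescent_intensity(N_T=5000, r=0.1, T=T)

# Each growth rate is paired with the size that yields the same intensity.
ridge = {r: size_on_ridge(target, r, T) for r in (0.0, 0.05, 0.1, 0.2)}
for r, N_T in ridge.items():
    print(f"r = {r:.2f}  N_T = {N_T:.0f}")
```

Along this curve a faster growth rate is offset by a smaller size at the time of admixture, which is the kind of trade-off an optimizer cannot resolve from a flat likelihood ridge.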

Supplementary information

Supplementary Information

Supplementary Notes, Supplementary Figs. 1–11 and Supplementary Tables 1–6.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fan, C., Cahoon, J.L., Dinh, B.L. et al. A likelihood-based framework for demographic inference from genealogical trees. Nat Genet 57, 865–874 (2025). https://doi.org/10.1038/s41588-025-02129-x

Received: 23 October 2023
Accepted: 14 February 2025
Published: 20 March 2025
Issue Date: April 2025
DOI: https://doi.org/10.1038/s41588-025-02129-x
