Fig. 1: Twigstats performance on simulated data. | Nature

From: High-resolution genomic history of early medieval Europe

a, A diagram of the Twigstats approach. We first construct genealogies from genetic variation data and then use Twigstats to compute f2-statistics between pairs of groups, which are passed to ADMIXTOOLS2. b, Admixture proportions inferred from an f4-ratio statistic or a non-negative least squares (NNLS) method. Source groups P1 and P2 split 250 generations ago and mix 50 generations ago to form PX, where P2 contributes proportion α and P1 contributes 1 − α. Effective population sizes are equal and constant except for a recent bottleneck in P2 (see Methods for simulation details). The Twigstats cut-off is set to 500 generations and the rare variant cut-off is set to 5%. We additionally infer admixture proportions by generating ‘first coalescence profiles’ for each population and modelling PX as a mixture of sources P1 and P2 using NNLS (Methods). We sample 20 haploid sequences from each population. Data are mean ± 2 s.e. around the point estimate. c, The fold improvement of s.e. relative to the genotype case as a function of the Twigstats cut-off time, for the same simulation as in b and averaged across different true admixture proportions. The dashed line shows the best fold improvement of s.e. obtained when ascertaining genotypes by frequency, evaluated across different frequency cut-offs. d, The optimal Twigstats cut-off, defined as the cut-off giving the largest reduction in s.e. relative to the genotype case, as a function of source split time in simulations using true trees. The dashed line indicates our theoretical prediction (Supplementary Note).
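
The NNLS step described in b can be illustrated with a minimal sketch. This is not the Twigstats or ADMIXTOOLS2 implementation: the function name `nnls_two_sources` is hypothetical, the profile vectors are invented toy numbers rather than real first coalescence profiles, and normalising the fitted weights to sum to one is an assumption made here for illustration. The sketch fits a target profile for PX as a non-negative combination of the P1 and P2 profiles and reads off the share attributed to P2 as the estimate of α.

```python
def nnls_two_sources(target, src1, src2):
    """Return weights (w1, w2) >= 0 minimising ||target - w1*src1 - w2*src2||^2.

    With only two sources, the optimum is either the unconstrained
    least-squares solution (if both weights are non-negative) or lies on a
    boundary where one weight is zero. Both sources must be non-zero vectors.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    a11, a12, a22 = dot(src1, src1), dot(src1, src2), dot(src2, src2)
    b1, b2 = dot(src1, target), dot(src2, target)

    # Boundary candidates: neither source, only source 1, only source 2.
    candidates = [
        (0.0, 0.0),
        (max(b1 / a11, 0.0), 0.0),
        (0.0, max(b2 / a22, 0.0)),
    ]

    # Interior candidate: solve the 2x2 normal equations when well-conditioned.
    det = a11 * a22 - a12 * a12
    if abs(det) > 1e-12:
        w1 = (b1 * a22 - b2 * a12) / det
        w2 = (a11 * b2 - a12 * b1) / det
        if w1 >= 0 and w2 >= 0:
            candidates.append((w1, w2))

    def rss(w):
        return sum(
            (t - w[0] * s1 - w[1] * s2) ** 2
            for t, s1, s2 in zip(target, src1, src2)
        )

    return min(candidates, key=rss)


# Toy profiles (hypothetical numbers, for illustration only).
profile_p1 = [0.50, 0.30, 0.15, 0.05]
profile_p2 = [0.10, 0.20, 0.30, 0.40]

# Construct a target that is exactly 70% P1 and 30% P2.
true_alpha = 0.3
profile_px = [
    (1 - true_alpha) * a + true_alpha * b
    for a, b in zip(profile_p1, profile_p2)
]

w1, w2 = nnls_two_sources(profile_px, profile_p1, profile_p2)
alpha_hat = w2 / (w1 + w2)  # assumed normalisation: P2's share of total weight
print(f"estimated alpha = {alpha_hat:.3f}")
```

With noise-free toy input the fit recovers the mixing proportion used to build the target; with real data the estimate would carry sampling error, which is what the s.e. bars in b summarise.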
