Extended Data Fig. 1: A comparison of the standard and little bootstrap approaches.
From: Fast and accurate bootstrap confidence limits on genome-scale phylogenies using little bootstraps

Steps of (a) the standard phylogeny bootstrap (BS) and (b) the little bootstraps (little BS) approach. Shaded boxes represent sequence alignments, with box width representing sequence length. In standard BS, L sites are randomly sampled with replacement from the original dataset containing L sites. In this resampling process, ~63.2% of the data points (refs. 17,30) are expected to be represented in a bootstrap replicate dataset. Each replicate dataset is compressed into a weighted resample that contains only the distinct site configurations and a vector of their counts (represented by stacks of dots). An ML tree is inferred from each replicate dataset, and the BCL for a species group is the proportion of bootstrap replicate phylogenies in which that group appears. In little BS, L sites are randomly sampled with replacement from a little dataset consisting of only \(l = L^{g}\) sites, which produces the bootstrap replicate datasets. Because \(l \ll L\), each site is represented many times in the little bootstrap replicate datasets; we refer to this as upsampling, which changes the frequency of unique site configurations. Because of upsampling, the stacks of dots are much higher for little BS than for standard BS, which involves only resampling. The number of distinct site configurations in an upsampled dataset is smaller than in a standard bootstrap replicate dataset because \(l \ll L\).
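The two resampling schemes described in this legend can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the values L = 10,000 and g = 0.7, and the choice to draw the little dataset without replacement are assumptions made for illustration only. For simplicity, sites are tracked by alignment position rather than by distinct site configuration, so the counts play the role of the "stacks of dots". The ~63.2% figure follows from \(1 - (1 - 1/L)^{L} \approx 1 - e^{-1}\) for large L.

```python
import math
import random
from collections import Counter


def standard_replicate(L, rng):
    """Standard BS: draw L site indices with replacement from all L sites.

    Returns a Counter mapping site index -> number of times it was drawn.
    """
    return Counter(rng.randrange(L) for _ in range(L))


def little_replicate(L, g, rng):
    """Little BS: draw L site indices with replacement from l = L**g sites.

    Assumption: the little dataset is chosen without replacement from the
    original sites; the legend does not specify this step.
    """
    l = max(1, round(L ** g))
    little_sites = rng.sample(range(L), l)
    return Counter(rng.choice(little_sites) for _ in range(L))


rng = random.Random(0)   # fixed seed so the sketch is repeatable
L, g = 10_000, 0.7       # illustrative values, not taken from the paper

standard = standard_replicate(L, rng)
little = little_replicate(L, g, rng)

# Fraction of original sites represented in the standard replicate,
# compared with the large-L expectation 1 - 1/e.
frac_represented = len(standard) / L
expected_frac = 1 - math.exp(-1)

# Mean count per represented site ("stack height") in each scheme.
mean_stack_standard = L / len(standard)
mean_stack_little = L / len(little)

print(f"standard BS: {frac_represented:.3f} of sites represented "
      f"(expected about {expected_frac:.3f})")
print(f"mean stack height: standard {mean_stack_standard:.2f}, "
      f"little {mean_stack_little:.2f}")
```

Both replicates contain L sites in total, but the little BS replicate spreads them over at most l positions, so its per-site counts are far larger than those of the standard replicate.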