Fig. 3: Estimation of the number of genomovars making up the natural Salinibacter ruber population.
From: Towards estimating the number of strains that make up a natural bacterial population

Metagenomic reads from each sample (Panels B–D) or all samples combined (Panel A) of the control pond were mapped to the Sal. ruber genomes, preserving all matches with identity ≥99.3%. The mapping file was manipulated to remove one target genome at a time (in random order) while recording the number of unique reads mapping at each step, and this process was repeated 100 times to reduce the impact of randomization on the estimates obtained (below). The number of reads was then expressed as a fraction of the maximum number of reads from the Sal. ruber species by dividing the observed counts by the total number of reads mapping to any reference genome with identity ≥95%. The logarithm of the number of total (dereplicated) genomes used was then expressed as a function of the fraction of Sal. ruber reads captured by the genomes, and a linear regression was determined by unweighted least squares and evaluated using Pearson correlation for the region between 20 and 100 genomes. This trendline was extrapolated to 100% coverage of the genomovar diversity (i.e., all reads from the species) to provide an estimate of the number of genomovars represented (Y-axis) in the total sequenced fraction (X-axis). Filled dots represent the fraction of the total Sal. ruber reads captured by the genomovars used, and the shaded bands around the observed subsamples represent the central inter-quantile ranges at 100%, 80%, 60%, 40%, and 20%. Source data are provided as Source Data 2.
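
The procedure described in this legend can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the dictionary input (genome name mapped to the set of read IDs passing the ≥99.3% identity filter), and the use of a base-10 logarithm are assumptions made for illustration. The sketch also adds genomes in random order rather than removing them, which should give an equivalent curve read in the opposite direction, and it averages the permutations instead of reporting quantile bands.

```python
import math
import random


def mean_rarefaction(reads_by_genome, total_species_reads, n_perm=100, seed=0):
    """Mean fraction of species reads captured by the first k genomes.

    reads_by_genome: hypothetical input, genome name -> set of read IDs
    mapping to that genome at the strict identity cutoff.
    total_species_reads: number of reads mapping to any reference genome
    at the looser species-level cutoff (the denominator in the legend).
    """
    rng = random.Random(seed)
    genomes = list(reads_by_genome)
    sums = [0.0] * len(genomes)
    for _ in range(n_perm):
        rng.shuffle(genomes)  # new random genome order for this replicate
        seen = set()
        for k, genome in enumerate(genomes):
            seen |= reads_by_genome[genome]  # count each read only once
            sums[k] += len(seen) / total_species_reads
    return [s / n_perm for s in sums]


def extrapolate_genomovars(fractions, k_min=20, k_max=100):
    """Fit log10(k) = intercept + slope * fraction by unweighted least squares.

    fractions[k - 1] is the fraction of reads captured by k genomes.
    Only points with k_min <= k <= k_max enter the fit. Returns the number
    of genomes predicted at a fraction of 1.0 and the Pearson correlation.
    The base-10 logarithm is an assumption; the legend does not state the base.
    """
    pts = [(f, math.log10(k))
           for k, f in enumerate(fractions, start=1)
           if k_min <= k <= k_max]
    n = len(pts)
    x_bar = sum(x for x, _ in pts) / n
    y_bar = sum(y for _, y in pts) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in pts)
    syy = sum((y - y_bar) ** 2 for _, y in pts)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pts)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    pearson_r = sxy / math.sqrt(sxx * syy)
    # Extrapolate the trendline to 100% of the species reads (fraction = 1.0).
    return 10 ** (intercept + slope), pearson_r
```

Passing the output of `mean_rarefaction` to `extrapolate_genomovars` would return an extrapolated genome count together with the correlation of the fitted region. Any value obtained this way depends on the fitting window and on the assumption that the log-linear trend holds beyond the sampled genomes.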