Fig. 4: Record-matching accuracy in subsamples of varying size.
From: Record-matching of STR profiles with fragmentary genomic SNP data

The figure uses individuals from the 1000 Genomes Project with full coverage (c = 1), 15 CODIS loci, and ∆true = ∆test = same individual. For each subsample size in {1000, 1300, 1600, 1900, 2200}, we randomly sampled 100 sets of individuals from the 2504 individuals in the 1000 Genomes dataset and performed record-matching on each reduced dataset, assigning 75% of the individuals to the training set and 25% to the test set. Green points use all 2504 individuals in the 1000 Genomes dataset and show the values for the 100 replicates summarized in Table 1. A One-to-one matching. B One-to-many matching with a query SNP profile. C One-to-many matching with a query STR profile. D Needle-in-haystack matching. The pink line indicates the median one-to-one matching accuracy across the 100 trials. For comparison, blue points show the corresponding results using the full HGDP dataset of 872 individuals, reporting the values for the 100 replicates summarized in the upper left corner of Table S3.
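The subsampling design in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It draws 100 random subsets of each size without replacement, splits each subset 75% training and 25% test, and takes the median accuracy per size. The function names, the integer placeholder IDs, the seed, and the `accuracy_fn` callback are hypothetical. The callback stands in for the record-matching step, which this caption does not specify.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

# Design parameters taken from the caption.
SUBSAMPLE_SIZES = [1000, 1300, 1600, 1900, 2200]
N_REPLICATES = 100
TRAIN_FRACTION = 0.75

Split = Tuple[List[int], List[int]]


def make_replicates(ids: Sequence[int], size: int, n_replicates: int,
                    rng: random.Random) -> List[Split]:
    """Draw n_replicates subsets of `size` individuals (without replacement),
    each split into a training set (75%) and a test set (25%)."""
    n_train = round(TRAIN_FRACTION * size)
    splits: List[Split] = []
    for _ in range(n_replicates):
        # rng.sample returns the chosen individuals in random order,
        # so slicing the result yields a random train/test split.
        subset = rng.sample(list(ids), size)
        splits.append((subset[:n_train], subset[n_train:]))
    return splits


def median_accuracy_by_size(ids: Sequence[int],
                            accuracy_fn: Callable[[List[int], List[int]], float],
                            seed: int = 0) -> Dict[int, float]:
    """For each subsample size, return the median accuracy over the replicates.
    `accuracy_fn(train, test)` is a hypothetical stand-in for record-matching."""
    rng = random.Random(seed)
    medians: Dict[int, float] = {}
    for size in SUBSAMPLE_SIZES:
        accuracies = [accuracy_fn(train, test)
                      for train, test in make_replicates(ids, size, N_REPLICATES, rng)]
        medians[size] = statistics.median(accuracies)
    return medians


if __name__ == "__main__":
    # Placeholder IDs for the 2504 individuals in the 1000 Genomes dataset.
    all_ids = list(range(2504))

    def dummy_accuracy(train: List[int], test: List[int]) -> float:
        # Hypothetical placeholder: returns the test fraction, not a real accuracy.
        return len(test) / (len(train) + len(test))

    print(median_accuracy_by_size(all_ids, dummy_accuracy))
```

Because every subsample size in the caption is a multiple of 4, each 75%/25% split is exact (for example, 1300 individuals give 975 for training and 325 for testing).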