Fig. 2: Schematic for simulating fragmentary SNP datasets for the individuals in the test set. | European Journal of Human Genetics

From: Record-matching of STR profiles with fragmentary genomic SNP data

(A) An example of the SNPs in a 1-Mb window (green) of a Codis locus (red) for two individuals. We denote by \(N_{\mathrm{all}} = 27{,}185{,}239\) the total number of SNPs in the whole genome at full coverage (\(c = 1\)). \(D_\ell\) (\(\ell = 1, 2, \ldots, L\)) denotes the number of SNPs in the 1-Mb window of the \(\ell\)th Codis locus, and \(N_{\mathrm{win}} = \sum_{\ell=1}^{L} D_\ell = 161{,}968\) is the total number of SNPs across all \(L\) 1-Mb windows (Table S2). The symbol '|' indicates phased genotypes.

(B) The simulated set of fragmentary SNPs for the individuals in (A). The symbol '/' indicates unphased genotypes.

(C) The simulation pipeline for generating fragmentary SNPs from the 1000 Genomes dataset. For a given sequencing coverage \(c\), the total number of SNPs sequenced from the whole genome is \(N_{\mathrm{all}}^{(c)} = [N_{\mathrm{all}}\, c]\). For each \(c\), we repeat the following procedure 100 times to generate 100 random sets of fragmentary SNPs. We first sample \(X\), the combined number of sequenced SNPs across the \(L\) 1-Mb windows, from a binomial distribution with parameters \(N_{\mathrm{all}}^{(c)}\) and \(f = N_{\mathrm{win}}/N_{\mathrm{all}} \approx 0.006\). Using the sampled value of \(X\), for each test individual \(i\) (\(i = 1, 2, \ldots, I\)), we generate random sets of sequenced SNPs in the 1-Mb windows. We first sample the individual-specific vector \(\boldsymbol{d}^{(i)} = (d_1^{(i)}, d_2^{(i)}, \ldots, d_L^{(i)})\), whose \(\ell\)th entry is the number of sequenced SNPs in the \(\ell\)th window, from a multinomial distribution with parameters \(X\) and \((D_1/N_{\mathrm{win}}, D_2/N_{\mathrm{win}}, \ldots, D_L/N_{\mathrm{win}})\). Then, for each \(\ell\) (\(\ell = 1, 2, \ldots, L\)), we sample \(d_\ell^{(i)}\) SNPs uniformly at random without replacement from the \(D_\ell\) SNPs in the \(\ell\)th window of the full-coverage set.
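
The two-stage sampling in (C) implies a simple sanity check on the expected counts. The following worked equation uses only standard properties of the binomial and multinomial distributions; it is an illustration, not a result stated in the article:

```latex
\[
\mathbb{E}[X] = N_{\mathrm{all}}^{(c)}\, f = [N_{\mathrm{all}}\, c]\,\frac{N_{\mathrm{win}}}{N_{\mathrm{all}}} \approx c\, N_{\mathrm{win}},
\qquad
\mathbb{E}\big[d_\ell^{(i)}\big] = \mathbb{E}\Big[\mathbb{E}\big[d_\ell^{(i)} \mid X\big]\Big] = \mathbb{E}[X]\,\frac{D_\ell}{N_{\mathrm{win}}} \approx c\, D_\ell .
\]
```

With the values above, \(f = 161{,}968/27{,}185{,}239 \approx 0.00596\). At a hypothetical coverage \(c = 0.1\), this gives \(\mathbb{E}[X] \approx 0.1 \times 161{,}968 \approx 16{,}197\) SNPs across all windows. Each window is therefore expected to retain roughly a fraction \(c\) of its SNPs, and the approximation is exact up to the rounding in \([N_{\mathrm{all}}\, c]\).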

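One replicate of the pipeline in (C) can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not the authors' code. The function names (`binomial`, `multinomial`, `simulate_replicate`, `unphase`) are hypothetical. SNPs are represented by their indices 0 to \(D_\ell - 1\) within each window. \([\cdot]\) is assumed to mean rounding to the nearest integer. The multinomial draw uses sequential conditional binomials, a standard exact construction. As in the caption, \(X\) is drawn once per replicate and shared by all individuals. The sketch does not handle the high-coverage case \(d_\ell^{(i)} > D_\ell\); `random.sample` raises `ValueError` there. The `unphase` helper mirrors the '|' to '/' change between (A) and (B). Its numeric sorting of alleles is an assumption, since allele order carries no information in an unphased call.

```python
import random
from typing import List, Sequence


def binomial(rng: random.Random, n: int, p: float) -> int:
    """Draw one value from Binomial(n, p)."""
    if hasattr(rng, "binomialvariate"):  # available on Python >= 3.12
        return rng.binomialvariate(n, p)
    # Fallback: count successes in n Bernoulli(p) trials (exact, but slow for large n).
    return sum(rng.random() < p for _ in range(n))


def multinomial(rng: random.Random, n: int, probs: Sequence[float]) -> List[int]:
    """Draw one count vector from Multinomial(n, probs) via conditional binomials."""
    counts: List[int] = []
    remaining_n, remaining_p = n, 1.0
    for p in probs[:-1]:
        if remaining_n == 0 or remaining_p <= 0.0:
            counts.append(0)
            continue
        # Given what is left, this category receives Binomial(remaining_n, p / remaining_p).
        k = binomial(rng, remaining_n, min(1.0, p / remaining_p))
        counts.append(k)
        remaining_n -= k
        remaining_p -= p
    counts.append(remaining_n)  # the last window takes whatever remains
    return counts


def simulate_replicate(
    rng: random.Random,
    n_all: int,
    window_sizes: Sequence[int],
    c: float,
    n_individuals: int,
) -> List[List[List[int]]]:
    """One replicate: per individual, per window, sorted indices of sequenced SNPs."""
    n_win = sum(window_sizes)                  # N_win = D_1 + ... + D_L
    n_all_c = round(n_all * c)                 # N_all^(c) = [N_all c]; rounding is assumed
    f = n_win / n_all                          # fraction of genome-wide SNPs in the windows
    x = binomial(rng, n_all_c, f)              # X, shared by all individuals in this replicate
    probs = [d / n_win for d in window_sizes]  # (D_1/N_win, ..., D_L/N_win)
    replicate = []
    for _ in range(n_individuals):
        d_i = multinomial(rng, x, probs)       # individual-specific vector d^(i)
        # Sample d_l of the D_l SNPs in each window uniformly without replacement;
        # random.sample raises ValueError if d_l > D_l (not handled in this sketch).
        replicate.append(
            [sorted(rng.sample(range(size), d)) for size, d in zip(window_sizes, d_i)]
        )
    return replicate


def unphase(genotype: str) -> str:
    """Drop phase from a call with numeric alleles, e.g. '1|0' -> '0/1' (sorted, assumed)."""
    return "/".join(sorted(genotype.split("|"), key=int))


if __name__ == "__main__":
    # Hypothetical toy sizes, not the values in Table S2.
    rng = random.Random(2023)
    replicates = [
        simulate_replicate(rng, n_all=10_000, window_sizes=[120, 80, 200], c=0.1, n_individuals=3)
        for _ in range(100)
    ]
    print([len(w) for w in replicates[0][0]], unphase("1|0"))
```

Drawing \(X\) once and then splitting it across windows per individual makes every individual in a replicate carry the same total number of window SNPs, while their per-window counts and the specific SNPs differ.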