Figure 9
From: Large scale proteomic studies create novel privacy considerations

How the matching technique can be used to identify mislabeled omics data. (A) Two subjects (1 and 2) were enrolled at the same clinical center at a baseline visit. At baseline, their plasma proteomes matched (P = 1) a different subject's genome from the same clinical center, but at subsequent visits their plasma proteomes matched the correct genome. A second example is two subjects (3 and 4) from a different clinical center who appear to have had their plasma samples swapped at their year 1 visit. This pattern suggests that plasma samples were swapped at a single clinical center during a single visit and should be relabeled. (B) A subject (Subject A) whose proteomes at multiple visits all mapped consistently to the genome of a different person (Subject B). This suggests that the DNA sample used for genotyping was swapped and that the DNA genotype data from Subject A should be labeled as coming from Subject B. Note that the x-axis in all panels is shown on a log scale because the probability of a proteome matching an unrelated genome is essentially zero (e.g., P < 10^-40).
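The reasoning in panels (A) and (B) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name classify_mismatches, the input layout (a mapping from a (subject, visit) pair to the subject whose genome that proteome matched best), and the label strings are all hypothetical, and the sketch assumes that a plasma swap shows up as a reciprocal mismatch between two subjects at the same visit.

```python
from collections import defaultdict


def classify_mismatches(best_match):
    """Classify likely label errors from proteome-to-genome matches.

    best_match: dict mapping (subject, visit) -> subject whose genome the
    proteome from that visit matched best (hypothetical input layout).
    """
    # Group the matched genome by subject, then by visit.
    by_subject = defaultdict(dict)
    for (subject, visit), matched in best_match.items():
        by_subject[subject][visit] = matched

    findings = {}
    for subject, visits in by_subject.items():
        # Visits at which the proteome matched someone else's genome.
        wrong = {v: m for v, m in visits.items() if m != subject}
        if not wrong:
            continue

        # Panel (B) pattern: every visit maps to the same other genome,
        # so the genotyped DNA sample is the more likely mislabel.
        all_wrong = len(wrong) == len(visits) and len(visits) > 1
        if all_wrong and len(set(wrong.values())) == 1:
            findings[subject] = ("dna_swap", next(iter(wrong.values())))
            continue

        # Panel (A) pattern: a reciprocal mismatch at a single visit,
        # so the two plasma samples from that visit were likely swapped.
        label = ("unresolved",)
        for visit, other in wrong.items():
            if best_match.get((other, visit)) == subject:
                label = ("plasma_swap", other, visit)
                break
        findings[subject] = label
    return findings


# Illustrative input only; these are not data from the study.
example = {
    ("S1", "baseline"): "S2", ("S1", "year1"): "S1",
    ("S2", "baseline"): "S1", ("S2", "year1"): "S2",
    ("A", "baseline"): "B", ("A", "year1"): "B",
}
result = classify_mismatches(example)
```

With this made-up input, S1 and S2 are flagged as a baseline plasma swap and Subject A is flagged as a probable DNA swap with Subject B. A real analysis would also need to apply a threshold on the match probability before accepting a best match.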