Figure 2

Optical mapping unambiguously reveals the correct structures and haplotypes in 22q11.2. Illumina sequencing reads of the 11744C genome map to the hg38 reference genome 22q11.2 sequence, including the SD22-3 segmental duplication (mustard). Inspection of SD22-3 in the UCSC Genome Browser’s segmental duplication track shows that reads mapping to several loci within SD22-3 would also map to other segmental duplications in 22q11.2 with >98% identity. A small section of SD22-3 is indicated in the figure, where reads mapping from 18,560,037 to 18,560,186 would also map to two LCR22D loci with 100% identity. Optical mapping of the 11744C genome indicates the complete absence of SD22-3 from both of its haplotypes (blue contigs) when aligned to the hg38 reference map (green). One haplotype consists of two copies of SD22-4, a previously characterized 160 kbp element, one inverted and one in the reference orientation. The other haplotype consists of three copies of SD22-4, all inverted relative to the reference. Anchor regions outside the segmental duplications (green) validate correct mapping of the contigs and provide clear evidence of these two haplotypes. SD22-3 therefore does not exist as a gross structure in this genome, contrary to what the Illumina short-read alignments indicated.