Fig. 2: The impact of copy number variants (CNV) deleting or duplicating full repeat units and the importance of hidden split reads in detecting such CNVs, studied on HG002. | Nature Communications

From: SurVIndel2: improving copy number variant calling from next-generation sequencing using hidden split reads

a We consider a CNV to be contained in a tandem repeat if at least 90% of the CNV is covered by the repetitive region. Nearly two-thirds of the deletions and more than 93% of the tandem duplications in this dataset are contained in a tandem repeat.

b For every CNV that overlaps with a tandem repeat, we calculated the number of repeat units that are deleted or inserted by dividing the length of the CNV by the length of the repeat unit. Bars representing an integer number of copies (i.e., the length of the CNV is a multiple of the length of the repeat unit) are coloured red. Most CNVs add or delete an integer number of repeat units.

c, d Number of deletions (c) and tandem duplications (d) inside and outside of tandem repeats that are supported by split reads (SR), hidden split reads (HSR), and neither. Hidden split reads have the potential to recover a significant number of deletions and duplications missed by regular split reads.

e HSR scores of the hidden split reads supporting the existing CNVs in HG002. Most reads have a low (< 20) HSR score.

f We found all hidden split reads in 10,000 randomly sampled repetitive regions, and for each, we tried to predict a CNV by finding the optimal split alignment. Most CNVs (> 91%) detected this way were false positives. This shows that using hidden split reads to detect CNVs is challenging, and effective filters must be employed to distinguish real from false CNVs.
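The two definitions used in panels a and b can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the half-open coordinate convention, and the restriction to a single repeat region per CNV are assumptions made for illustration only.

```python
def overlap_fraction(cnv_start, cnv_end, rep_start, rep_end):
    """Fraction of the CNV covered by one repeat region.

    Assumption: both intervals are half-open, [start, end).
    """
    cnv_len = cnv_end - cnv_start
    if cnv_len <= 0:
        raise ValueError("CNV must have positive length")
    # Length of the intersection of the two intervals (0 if disjoint).
    covered = max(0, min(cnv_end, rep_end) - max(cnv_start, rep_start))
    return covered / cnv_len


def is_contained(cnv_start, cnv_end, rep_start, rep_end, min_fraction=0.9):
    """Panel a: the CNV counts as contained if at least 90% of it is covered."""
    return overlap_fraction(cnv_start, cnv_end, rep_start, rep_end) >= min_fraction


def repeat_units_changed(cnv_len, unit_len):
    """Panel b: number of repeat units deleted or inserted by the CNV.

    Returns (copies, is_integer), where is_integer is True when the CNV
    length is an exact multiple of the repeat unit length.
    """
    if unit_len <= 0:
        raise ValueError("repeat unit must have positive length")
    return cnv_len / unit_len, cnv_len % unit_len == 0
```

For example, under these assumptions a 60 bp deletion in a repeat with a 20 bp unit removes exactly three units, whereas a 50 bp deletion in the same repeat removes 2.5 units and would not fall in a red bar.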