Fig. 3: Benchmarking the specificity of SAVANA against existing algorithms using sequencing replicates of matched germline controls.

a, Schematic representation of the COLO829BL normal flow cell replicate analysis strategy implemented to quantify the false-positive rate of somatic SV detection algorithms. Created with BioRender.com. b, The number of somatic SVs detected in the COLO829BL cell line when running the benchmarked algorithms with a normal replicate as the tumor sample. The number on top of each bar indicates the number of false-positive calls for each algorithm. c, Schematic representation of the replicate analysis strategy implemented to quantify the false-positive rate of somatic SV detection algorithms. Created with BioRender.com. d, The distribution of false-positive SV calls detected when running the benchmarked SV detection algorithms on replicates of 37 whole-blood normal samples with at least 30× coverage, generated in silico by splitting sequencing reads randomly into two BAM files. Each dot represents one blood sample. The significance in d was assessed using the two-sided Wilcoxon’s rank test (****P < 0.0001). The P values for the comparison of SAVANA against cuteSV, Sniffles2, SVIM, Severus, SVision-pro and NanomonSV were P = 2.5 × 10⁻¹², P = 2.5 × 10⁻¹², P = 2.5 × 10⁻¹², P = 2.5 × 10⁻¹², P = 1.4 × 10⁻¹¹ and P = 7 × 10⁻¹², respectively. The box plots in d show the median, first and third quartiles (boxes), and the whiskers encompass observations within 1.5× the interquartile range from the first and third quartiles.
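The in silico replicate generation described for panel d can be illustrated with a minimal sketch. The caption states only that reads were split randomly into two BAM files; the hash-based assignment, the function names `assign_replicate` and `split_reads`, and the `(read_name, alignment)` record shape below are hypothetical choices made for illustration, not the authors' implementation. Assigning by read name is one way to keep all alignments of a read (for example, primary and supplementary) in the same replicate. The sketch uses plain Python records so that it stays self-contained; in practice the records would be read from and written to BAM files with a library such as pysam.

```python
import hashlib
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (read_name, alignment); hypothetical record shape


def assign_replicate(read_name: str, seed: int = 0) -> int:
    """Return 0 or 1 for a read name, pseudo-randomly but reproducibly.

    Hashing the name (rather than drawing a random number per record)
    sends every alignment of the same read to the same replicate.
    """
    digest = hashlib.sha256(f"{seed}:{read_name}".encode()).digest()
    return digest[0] & 1  # lowest bit of the first byte picks the replicate


def split_reads(records: Iterable[Record], seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Split alignment records into two replicates by read name."""
    replicates: Tuple[List[Record], List[Record]] = ([], [])
    for name, alignment in records:
        replicates[assign_replicate(name, seed)].append((name, alignment))
    return replicates


if __name__ == "__main__":
    # Toy input: each read has a primary and a supplementary alignment.
    toy = [(f"read{i}", kind) for i in range(10) for kind in ("primary", "supplementary")]
    rep_a, rep_b = split_reads(toy, seed=42)
    print(len(rep_a), len(rep_b))
```

One replicate would then be passed to the SV caller as the "tumor" and the other as the matched normal, so that any somatic call is by construction a false positive.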