Fig. 4: Stratifications reveal nuances in precision and recall performance when benchmarking using hap.py. | Nature Communications

From: The GIAB genomic stratifications resource for human reference genomes

A Performance within important stratifications using the assembly-based HG002 benchmark with GRCh37, GRCh38, or CHM13 as the reference and a HiFi-DeepVariant query callset. B The CHM13 performance results from (A) compared with the same benchmarking pipeline restricted to nonsyntenic regions relative to GRCh38. C Performance within all autosomes or tandem repeat/homopolymer regions for ONT callsets created with either guppy4+clair1 or guppy5+clair3. D Comparison of HiFi and Illumina callsets on CHM13 using the Q100 benchmark. Each bar is the mean of the given metric, which is also shown as text. Error bars are 95% binomial confidence intervals computed with the Wilson method (see Methods). Stratification labels on the y axes: LowMap = low-mappability regions (100 and 250 bp sizes); High/Low GC = GC content < 25% or > 65%; SegDup = segmental duplications ≥ 1 kb; TRs = tandem repeats; HPs = homopolymers ≥ 7 bp or imperfect homopolymers ≥ 11 bp; Difficult = SegDup+LowMap+HPs+TRs+XY PAR/XTR/Ampliconic+High/Low GC; Autosomes = all autosomal regions; 2-mer TRs = tandem repeats with unit size 2; Short TRs = tandem repeats < 50 bp long.
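
The caption names the Wilson method for the 95% binomial confidence intervals but does not spell out the calculation. The following is a minimal sketch of the standard Wilson score interval, assuming each metric is treated as a binomial proportion (for example, recall as TP out of TP + FN). The function name `wilson_interval` and the counts in the usage lines are hypothetical and are not taken from the paper; the authors' Methods may differ in detail.

```python
import math


def wilson_interval(successes, trials, z=1.959963984540054):
    """Return the (lower, upper) Wilson score interval for a binomial proportion.

    z is the standard normal quantile; the default corresponds to a
    two-sided 95% interval.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")

    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    # The Wilson interval is centered on a value shrunk toward 0.5.
    center = (p + z2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical counts for one stratification (illustrative only).
true_positives = 9_850
false_negatives = 150
recall = true_positives / (true_positives + false_negatives)
recall_low, recall_high = wilson_interval(true_positives, true_positives + false_negatives)
print(f"recall = {recall:.4f}, 95% CI = [{recall_low:.4f}, {recall_high:.4f}]")
```

Unlike the simple normal-approximation interval, the Wilson interval stays within [0, 1] and remains usable when the proportion is close to 0 or 1, which is the typical situation for precision and recall in variant benchmarking.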
