Extended Data Fig. 1: HiDEF-seq library preparation and sequencing metrics. | Nature

From: DNA mismatch and damage patterns revealed by single-molecule sequencing

a, Representative DNA sizing electropherogram after Hpy166II restriction enzyme digestion (top) and after completion of the HiDEF-seq library preparation, which removes fragments <1 kb (bottom). b, Two-dimensional histogram, for all molecules from a representative HiDEF-seq sequencing run, of each molecule’s longest strand read length (bp, base pairs) versus its total polymerase read length (PRL). The dashed line signifies the expected strand length distribution. The red diagonal line reflects the 18% of molecules with <1 strand pass, which is typical in PacBio sequencing. c, Histogram (200 bp bins) of molecule consensus sequence lengths (i.e., molecule sizes) for representative HiDEF-seq samples (n = 51). The line and shaded region show the average and standard deviation, respectively, across samples for each bin. The average of these samples’ median lengths is 1.7 kilobases (kb). d, Histogram as in panel (c), showing that HiDEF-seq (n = 51 representative samples) yields smaller molecule lengths than standard PacBio (HiFi) samples (n = 10 samples). The averages of the samples’ median lengths are 1.7 kb and 18.3 kb for HiDEF-seq and HiFi, respectively. e, Two-dimensional histogram of the number of passes (bin width of 5 passes) versus consensus sequence length (bin width of 200 bp) for molecules from the 51 representative HiDEF-seq samples plotted in panels (c,d). Bins are coloured if they contain at least one molecule. f, Box plots of the fraction of a molecule’s consensus sequence bases (average of forward and reverse strands) that have the maximum predicted quality (quality = 93, as predicted by ccs; Methods) versus the number of passes per strand, across all molecules of the same samples included in panels (c–e). Note: 93 is the quality required for HiDEF-seq analysis. This plot illustrates that the number of passes is a key determinant of consensus quality in both HiDEF-seq and HiFi.
b, Plot generated by SMRT Link (Pacific Biosciences) software. c–e, The single-molecule consensus sequence length is the average of the forward and reverse strand lengths. Bin values are normalized to the bin with the highest molecule count. e,f, The number of passes per strand is the average of the forward and reverse strand ‘ec’ tags (Methods). c–f, Plots show data for HiDEF-seq molecules that are output by the primary data processing step of the HiDEF-seq analysis pipeline and for standard PacBio HiFi molecules that are output by the ccs HiFi pipeline (Methods). f, Box plot: middle line, median; boxes, 1st and 3rd quartiles; whiskers, the maximum and minimum values within 1.5 × the interquartile range. X-axis: square brackets and parentheses signify inclusion and exclusion of interval endpoints, respectively.
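The per-molecule quantities and plot statistics described in this legend can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the plain-list inputs and the quartile interpolation method are all assumptions made for illustration, and the sketch only mirrors the definitions stated above (strand averaging, 200 bp bins normalized to the fullest bin, fraction of bases at quality 93, and whiskers within 1.5 × the interquartile range).

```python
from statistics import quantiles


def strand_average(forward_value, reverse_value):
    """Average a per-strand value (e.g. strand length or 'ec' tag) over both strands."""
    return (forward_value + reverse_value) / 2


def normalized_histogram(lengths, bin_width=200):
    """Count molecules per bin and scale so the fullest bin equals 1.0."""
    counts = {}
    for length in lengths:
        bin_start = int(length // bin_width) * bin_width
        counts[bin_start] = counts.get(bin_start, 0) + 1
    peak = max(counts.values())
    return {start: n / peak for start, n in sorted(counts.items())}


def fraction_max_quality(base_qualities, max_quality=93):
    """Fraction of consensus bases whose predicted quality equals the maximum."""
    return sum(q == max_quality for q in base_qualities) / len(base_qualities)


def box_plot_stats(values):
    """Median, quartiles, and whiskers at the extreme values within 1.5 x IQR.

    The quartile interpolation ("inclusive") is an assumption; the legend
    does not say which method was used.
    """
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }
```

For example, `strand_average(1700, 1702)` gives a consensus length of 1701.0, and `box_plot_stats([1, 2, 3, 4, 5, 100])` places the upper whisker at 5 because 100 lies beyond 1.5 × the interquartile range above the third quartile.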
