Fig. 3: The strategy of de novo calling and phasing of hetSNPs across the whole genome.

a Pipeline for two-round haplotype reconstruction to call more accurate and complete haplotype-resolved SNPs. Calling refers to the process of identifying SNPs relative to the reference genome (GRCh38). b The consistency of SNP calling between NanoStrand-seq and GIAB. c The consistency of SNP positions between NanoStrand-seq and GIAB. d The precision of genotyping using NanoStrand-seq. Genotyping refers to the process of determining the genotype at each locus for the homologous alleles. e The Hamming error rate and recall rate of hetSNPs on each chromosome, using the GIAB dataset as the benchmark. f Total number of detected and phased SVs (including both insertions and deletions) and SNPs compared to the reference genome (GRCh38). The genotyping and phasing of reference SVs were determined by PacBio CCS reads and the nearest hetSNPs annotated by GIAB. g Schematic of phasing with known hetSNPs (derived from GIAB) using NanoStrand-seq long reads. h Recall rate and Hamming error rate of de novo hetSNP phasing by NanoStrand-seq alone and by a strategy that combines NanoStrand-seq with known hetSNPs (from GIAB) linked on the same reads as hetSNPs called by NanoStrand-seq.
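
The caption does not define the two metrics reported in panels e and h. The sketch below uses one common definition, as an assumption rather than the authors' exact procedure: recall is the fraction of benchmark hetSNPs that are also phased in the call set, and the Hamming error rate is the fraction of shared hetSNPs whose phase disagrees with the benchmark after choosing the better of the two haplotype orientations. The function name `hamming_error_and_recall` and the `position -> "0|1"` input layout are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple

# Phased heterozygous genotypes in VCF-style notation (assumed biallelic).
HET_PHASES = {"0|1", "1|0"}


def hamming_error_and_recall(
    called: Dict[int, str], benchmark: Dict[int, str]
) -> Tuple[float, float]:
    """Return (Hamming error rate, recall) for one chromosome.

    `called` and `benchmark` map a SNP position to its phased genotype.
    Only sites that are phased hetSNPs in the benchmark are evaluated.
    """
    truth = {pos: gt for pos, gt in benchmark.items() if gt in HET_PHASES}
    if not truth:
        raise ValueError("benchmark contains no phased hetSNPs")

    # Benchmark hetSNPs that the call set also reports as phased hetSNPs.
    shared = [pos for pos in truth if called.get(pos) in HET_PHASES]
    recall = len(shared) / len(truth)
    if not shared:
        return 0.0, recall

    mismatches = sum(1 for pos in shared if called[pos] != truth[pos])
    # Haplotype labels are arbitrary, so also score the flipped orientation
    # and keep whichever orientation disagrees at fewer sites.
    mismatches = min(mismatches, len(shared) - mismatches)
    return mismatches / len(shared), recall


if __name__ == "__main__":
    benchmark = {100: "0|1", 200: "1|0", 300: "0|1", 400: "1|0", 500: "0|1"}
    called = {100: "1|0", 200: "0|1", 300: "1|0", 400: "1|0"}
    error_rate, recall = hamming_error_and_recall(called, benchmark)
    print(f"Hamming error rate = {error_rate:.2f}, recall = {recall:.2f}")
```

In the toy input, three of the four shared sites carry the opposite label to the benchmark, so the flipped orientation is the better match and only one site counts as an error. Evaluating per chromosome, as in panel e, amounts to calling the function once per chromosome.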