Fig. 3: Long reads improve gene isoform quantification by reducing data deconvolution uncertainty.

a, Left, schematic illustration of the gene isoform structure and corresponding aligned short reads of FAM219A. Right, raincloud plots show the median MARD of FAM219A quantified by miniQuant-L and kallisto under GENCODE (top) and sample-specific (bottom) annotation (n = 200 simulations). b, Bar charts show the ΔMARD between five short-read-based tools and miniQuant-L within eight different K-value groups under GENCODE and sample-specific annotation, respectively. c, Comparison of median MARD between seven long-read-based tools and miniQuant-L across nine different sequencing depths on cDNA-ONT data under GENCODE annotation. d, Comparison of median MARD among cDNA-PacBio, cDNA-ONT and dRNA-ONT protocols across sequencing depths (n = 9) under GENCODE and sample-specific annotations. e, Two-dimensional scatterplot shows the ΔMARD between kallisto and miniQuant-L against gene expression levels across different short-read and long-read sequencing depth combinations. SR, short-read pairs. LR, long reads. f, Coverage tracks and gene isoform structures of three indicated genes, OR1I1, CPLANE2 and GCLC, from Set 1, Set 2 and Set 3 in e (left), respectively. The K-value and TPM are labeled in brackets under the gene symbols. CPLANE2 is described in Supplementary Note 5. In GCLC, lowly expressed (TPM < 1) gene isoforms are shaded in gray. g, Violin plots show the MARD of genes with low K-values (≤2) (n = 2,826 genes) and high K-values (>25) (n = 3,339 genes) by miniQuant-L and kallisto across 11 short-read and long-read sequencing depths. In box plots, the hinges represent the first and third quartiles, the center line represents the median, and the whiskers extend to the smallest and largest datapoints within 1.5 times the interquartile range from the hinges.
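
The box plot convention stated at the end of the legend can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used to draw the figure: the function name `box_plot_stats` is hypothetical, and the quartile interpolation method (`method="inclusive"`) is an assumption, since the legend does not say which plotting software or quantile definition was used.

```python
from statistics import quantiles


def box_plot_stats(values):
    """Summarize values with the box plot convention described in the legend.

    Hypothetical helper; needs at least two values. The quartile
    interpolation method is an assumption, not taken from the source.
    """
    data = sorted(float(v) for v in values)

    # Hinges (first and third quartiles) and center line (median).
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    # Fences lie 1.5 x IQR beyond the hinges.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr

    # Whiskers end at the most extreme datapoints still inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": outliers,
    }
```

For the toy input `[1, 2, 3, 4, 5, 6, 7, 8, 9, 100]`, the hinges are 3.25 and 7.75, the whiskers end at 1 and 9, and 100 falls outside the upper fence, so it would be drawn as an individual point rather than extending the whisker.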