Fig. 1: K-value indexes the uncertainty in gene isoform quantification.

a, Schematic of gene isoform structure and the corresponding assignment of short and long reads. Exonic sequences shared among isoforms create uncertainty in assigning reads to their isoforms of origin, especially for short reads because of their limited length. b, Overview of the statistical models used in gene isoform abundance estimation. Existing methods estimate gene isoform abundance using region-based or read-/fragment-based models, both of which can be unified into a GLM framework b = Aϕ. c, Top, design of a new statistic, the K-value, to index the gene isoform quantification error in RNA-seq data. Middle, for a full-column-rank matrix A, when \(\Vert \delta \mathbf{A}\Vert_{2} \ll \Vert \mathbf{A}\Vert_{2}\), K(A) is low and the error b − Aϕ is small, the relative quantification error \(\frac{\Vert \delta \mathbf{\upphi}\Vert_{2}}{\Vert \mathbf{\upphi}\Vert_{2}}\) is bounded approximately by \(K(\mathbf{A}) \times \left(\frac{\Vert \delta \mathbf{b}\Vert_{2}}{\Vert \mathbf{b}\Vert_{2}}+\frac{\Vert \delta \mathbf{A}\Vert_{2}}{\Vert \mathbf{A}\Vert_{2}}\right)\) (Supplementary Note 3). Bottom, a higher K-value indicates that the model is more sensitive to perturbations and that the quantification results are therefore less reliable. For a fixed value of ϕ and the same level of perturbation on b, the linear model b = Aϕ tends to have larger estimation errors δϕ when the K-value of A is larger.
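
The behaviour described in panel c can be illustrated numerically with a minimal sketch. It assumes that K(A) can be computed as the 2-norm condition number of A (the ratio of its largest to smallest singular value), and that only b is perturbed (δA = 0). The two toy design matrices, the abundance vector and the perturbation are hypothetical values chosen for illustration and are not taken from the paper.

```python
import numpy as np


def k_value(A):
    """Assumed definition: 2-norm condition number of a full-column-rank A."""
    singular_values = np.linalg.svd(A, compute_uv=False)  # sorted descending
    return singular_values[0] / singular_values[-1]


def relative_error(A, phi, delta_b):
    """Relative error ||delta_phi|| / ||phi|| after perturbing b = A @ phi."""
    b = A @ phi
    phi_hat, *_ = np.linalg.lstsq(A, b + delta_b, rcond=None)
    return np.linalg.norm(phi_hat - phi) / np.linalg.norm(phi)


def bound(A, phi, delta_b):
    """K(A) * ||delta_b|| / ||b||, the bound with delta_A = 0."""
    b = A @ phi
    return k_value(A) * np.linalg.norm(delta_b) / np.linalg.norm(b)


# Hypothetical designs: rows are regions, columns are two isoforms.
A_distinct = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # isoforms differ
A_similar = np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 0.9]])   # isoforms nearly identical

phi = np.array([10.0, 5.0])           # fixed true abundances (hypothetical)
delta_b = np.array([0.1, -0.1, 0.1])  # same perturbation applied to both models

for name, A in [("distinct", A_distinct), ("similar", A_similar)]:
    print(name, k_value(A), relative_error(A, phi, delta_b), bound(A, phi, delta_b))
```

In this toy setting, the design with nearly identical isoform columns has the larger K-value and the larger relative error in the estimated ϕ under the same perturbation of b, in line with the trend described for the bottom of panel c.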