Extended Data Fig. 2: Classification and quantification of splice junction classes across datasets. | Nature Genetics

From: Global impact of unproductive splicing on human gene expression

(a) The percentage of splice junctions in each sample that are uniquely attributable to transcripts tagged as ‘nonsense_mediated_decay’ (Gencode v37). Box-and-whisker plots show quartiles for LCL samples (individual jittered points) in each RNA-seq data type (n=86, 66, 66, and 462 for naRNA, 4sU 30min, 4sU 60min, and steady-state RNA-seq, respectively). The median for each data type is labeled.
(b) Splice junctions (arcs) overlapping the NUP42 gene illustrate the approach (Supplemental Methods) for classifying splice junctions. Annotated splice donors and splice acceptors are marked with vertical dashed lines in dark and light gray, respectively. Annotated productive junctions are defined by their presence in at least one transcript with the value ‘protein_coding’ in the Gencode transcript type tag. Unannotated productive junctions are not present in any Gencode transcript and skip exons in the principal isoform such that the reading frame is maintained (that is, the splice junction marked with 1). Annotated unproductive junctions are unique to Gencode transcripts not tagged with ‘protein_coding’. Splice junction 2 is unique to NUP42-207, a transcript tagged as ‘retained_intron’. This splice junction uses a deep intronic 5′ss, creating a premature termination codon. Junctions 3 and 5 are unique to transcripts tagged as ‘nonsense_mediated_decay’, and junction 4 is unique to a transcript tagged as ‘processed_transcript’. All other junctions are classified as Unannotated unproductive. We attempted to translate the transcripts that use these junctions and found that they overwhelmingly introduce frameshifts or in-frame stop codons (Supplemental Methods); for example, splice junction 6 is predicted to introduce a frameshift.
(c) Similar to (b), but each sample is represented as a column, grouped by dataset type, showing the fraction of splice junction reads that are productive (annotated or unannotated, classified as in (b); blue) or unproductive (annotated or unannotated, classified as in (b); red). The median in each dataset is marked with a dashed line and labeled.
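
The four junction classes described in (b), and the per-sample read fraction plotted in (c), can be illustrated with a minimal Python sketch. This is not the authors' pipeline (see Supplemental Methods for that); it is a simplified illustration that assumes each junction already comes with the set of Gencode transcript type tags of the transcripts containing it. The function names, the `maintains_frame` flag, the `canonical` junction, and all read counts below are hypothetical.

```python
from collections import Counter

PRODUCTIVE_TAG = "protein_coding"  # Gencode transcript type tag named in the legend


def classify_junction(transcript_types, maintains_frame=False):
    """Assign one splice junction to one of the four classes in panel (b).

    transcript_types: set of Gencode transcript type tags of every annotated
        transcript that contains the junction (empty set if unannotated).
    maintains_frame: hypothetical precomputed flag, used only for unannotated
        junctions, saying whether the junction skips exons of the principal
        isoform while keeping the reading frame.
    """
    if transcript_types:
        if PRODUCTIVE_TAG in transcript_types:
            return "annotated_productive"
        return "annotated_unproductive"
    if maintains_frame:
        return "unannotated_productive"
    return "unannotated_unproductive"


def unproductive_fraction(junction_reads, classes):
    """Fraction of one sample's junction reads that fall in unproductive classes."""
    totals = Counter()
    for junction, reads in junction_reads.items():
        totals[classes[junction]] += reads
    total = sum(totals.values())
    if total == 0:
        return 0.0
    unproductive = (totals["annotated_unproductive"]
                    + totals["unannotated_unproductive"])
    return unproductive / total


# Junctions 1-6 follow the tags given in the legend; "canonical" is a made-up
# junction shared by a protein_coding and an NMD transcript.
classes = {
    "canonical": classify_junction({"protein_coding", "nonsense_mediated_decay"}),
    1: classify_junction(set(), maintains_frame=True),
    2: classify_junction({"retained_intron"}),
    3: classify_junction({"nonsense_mediated_decay"}),
    4: classify_junction({"processed_transcript"}),
    5: classify_junction({"nonsense_mediated_decay"}),
    6: classify_junction(set(), maintains_frame=False),
}

# Made-up read counts for a single sample, for illustration only.
sample_reads = {"canonical": 75, 1: 10, 2: 5, 3: 5, 4: 0, 5: 0, 6: 5}
fraction = unproductive_fraction(sample_reads, classes)
```

Checking for the ‘protein_coding’ tag first means a junction shared between a coding and a non-coding transcript counts as productive, which matches the legend's requirement that annotated unproductive junctions be unique to transcripts not tagged ‘protein_coding’.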
