Assessing the diagnostic impact of blood transcriptome profiling in a pediatric cohort previously assessed by genome sequencing

Hou, Huayun; Yuki, Kyoko E.; Costain, Gregory; Szuto, Anna; Barnes, Sierra; Ramani, Arun K.; Celik, Alper; Braga, Michael; Gloven-Brown, Meagan; Stavropoulos, Dimitri J.; Bowdin, Sarah; Cohn, Ronald D.; Mendoza-Londono, Roberto; Scherer, Stephen W.; Brudno, Michael; Marshall, Christian R.; Stephen Meyn, M.; Shlien, Adam; Dowling, James J.; Wilson, Michael D.; Kyriakopoulou, Lianna

doi:10.1038/s41525-025-00505-4

Download PDF

Article
Open access
Published: 01 July 2025

Assessing the diagnostic impact of blood transcriptome profiling in a pediatric cohort previously assessed by genome sequencing

Huayun Hou¹^na1,
Kyoko E. Yuki^1,2^na1,
Gregory Costain^1,3,4,5,
Anna Szuto^1,3,5,
Sierra Barnes²,
Arun K. Ramani⁶,
Alper Celik⁶,
Michael Braga²,
Meagan Gloven-Brown²,
Dimitri J. Stavropoulos^2,7,
Sarah Bowdin^2,8,
Ronald D. Cohn^1,4,5,9,
Roberto Mendoza-Londono^3,4,
Stephen W. Scherer^1,5,10,
Michael Brudno^1,11,12,13,
Christian R. Marshall^2,7,
M. Stephen Meyn^14,15,
Adam Shlien^1,2,7,
James J. Dowling^1,5,16,
Michael D. Wilson^1,5 &
…
Lianna Kyriakopoulou^2,7

npj Genomic Medicine volume 10, Article number: 51 (2025) Cite this article

Subjects

Abstract

Despite advances in genome sequencing, many individuals with rare genetic disorders remain undiagnosed. Transcriptional profiling via RNA-seq can reveal functional impacts of DNA variants and improve diagnosis. We assessed blood-derived RNA-seq in the largely undiagnosed SickKids Genome Clinic cohort (n = 134), which has been subjected to multiple analyses benchmarking the utility of genome sequencing. Our RNA-centric analysis identifies gene expression outliers, aberrant splicing, and allele-specific expression. In one-third of diagnosed individuals (20/61), RNA-seq reinforced DNA-based findings. In 2/61 cases, RNA-seq revised diagnoses (EPG5 to LZTR1 in an individual with a Noonan syndrome-like disorder) and discovered an additional relevant gene (CEP120 in addition to SON in an individual with ZTTK syndrome). Additionally, ~7% (5/73) of undiagnosed cases had at least one plausible candidate gene identified. This study highlights both the benefits and limitations of whole-blood RNA profiling in refining genetic diagnoses and uncovering novel disease mechanisms.

Introduction

Exome and genome sequencing (GS) technologies have transformed our ability to identify causal genetic variants and make diagnoses for rare diseases (e.g¹.; reviewed by refs. ^2,3). Despite this progress, more than half of individuals with a genetic disorder remain without a diagnosis. This can be partly attributed to our limited understanding of the downstream consequences of genetic variation and the lack of functional evidence for its interpretation. RNA sequencing (RNA-seq) has the potential to identify clinically relevant aberrations and provide a high throughput functional platform for those variants related to RNA expression levels and/or RNA processing.

Several studies have used transcriptome profiling to assess its potential diagnostic utility in rare disorders^{4,5,6,7,8,9,10,11,12,13,14,15}. For example, RNA-seq on muscle biopsies identified molecular diagnoses in up to 35% of individuals with neuromuscular disorders^8,9,12. RNA-seq using fibroblasts yielded a diagnostic rate of 10–16% in individuals with mitochondrial diseases^13,14. Fresard et al. (2019) demonstrated that RNA-seq in whole blood increased the absolute diagnostic rate by 7.5% in a phenotypically heterogeneous cohort with suspected rare diseases¹¹. While these studies support the ability of RNA-seq to improve diagnostic rates, there are few studies benchmarking the RNA-seq yield when performed as an adjunct to genome sequencing¹⁰.

In this study, we assessed the utility of RNA-seq for decreasing uncertainty and increasing the yield of GS-based diagnoses. To do this we performed RNA-seq on RNA isolated from peripheral blood samples donated by children and youth with suspected genetic disorders who enrolled in the Genome Clinic at the Hospital for Sick Children (Toronto, ON, Canada). This longitudinal study was designed to test the diagnostic and predictive use of GS^16,17. The participants presented with a spectrum of complex clinical phenotypes such as epilepsy, global developmental delay and multiple congenital anomalies and had previously been tested by standard of care genetic testing. After genome sequencing and subsequent reanalyses, the diagnostic rate was 41%^16,17,18. Given the clinical work up and intense focus on genome analysis in this cohort, we reasoned that performing RNA-seq on all individuals who provided blood samples would be an important and unique opportunity to examine the concordance between genome-based diagnoses and RNA-seq as well as ask whether new potential candidate disease genes related to the individual’s phenotype could be revealed by RNA-seq. In this study we focused on two clinically meaningful scenarios, where we asked if transcriptomic data could: (1) provide functional evidence for several classes of suspected/known diagnostic genetic variants identified by GS; and (2) identify plausible candidate genes for individuals for whom GS was initially non-diagnostic.

Results

The cohort for this study consisted of all individuals from the SickKids Genome Clinic for whom a primary blood sample was available (84/100 from Stavropoulos et al. 2016¹⁶ and 50/103 from Lionel et al.¹⁷). This provided us with a cohort of 134 participants (75 males and 59 females) between 0 and 18 years of age at the time of sample collection (median 6 years; Supplementary Data 1). These individuals displayed a wide array of symptoms described by 453 unique HPO terms across the cohort. The most observed phenotypes were epilepsy (29%), global developmental delay (25%), multiple congenital abnormalities (19%), connective tissue (5%), eye (6%), behavior/cognition - other (3%), neurological - other (2%), immune (2%), cardiovascular (1%) and metabolic including mitochondrial (9%); Table 1 and Supplementary Data 2).

Table 1 Transcriptional impact of disease genes detected in blood

Full size table

Based on prior published analyses, we split this cohort into two groups: Group 1 consisted of 61 individuals (35 male and 26 female) who received a diagnostic DNA variant(s) in the initial GS-based studies^16,17,18. These variants were classified as pathogenic/likely pathogenic as per ACMG guidelines¹⁹ and were considered causative of the patient’s clinical phenotype. Group 2 consisted of the remaining 73 individuals (40 male and 33 female), who had yet to receive a molecular genetic diagnosis. The logic behind partitioning this well studied cohort into two groups was that it would allow us to: (a) assess the relevance of RNA-seq alone in identifying/corroborating existing diagnoses (Group 1); and (b) suggest genes and possible pathogenic variants related to the primary indication for referral in the unsolved cases (Group 2).

Developing and applying a scalable and reproducible RNA-seq library preparation method to 134 whole blood samples

Compared to genome sequencing, considerable technical variability can occur when performing RNA-seq. Our long-term objective is to establish a clinically validated RNA-seq procedure that can run in our clinical lab so that RNA-seq samples prepared over extended periods of time will show minimal batch effects. We established a semi-automated RNA-seq library preparation protocol with a workflow suitable for clinical laboratory testing. The sample preparation protocol included synthetic spike-in controls that allow us to assess the quality of the library preparation (ERCC + SIRV controls; see Methods and Fig. 1A). We chose not to pursue globin depletion strategies with the rationale that this would increase the variability in the library preparation step. Instead, we implemented deeper sequencing of libraries to ensure sufficient gene detection in samples where globin mRNA is high.

Using RNA-seq to reveal the impact of variants detected in disease genes obtained through prior genomic testing

Despite blood samples having been archived for 3-5 years, we obtained RNA libraries suitable for subsequent sequencing for all samples. The resulting RNA-seq libraries were sequenced to a median of 116 million reads with a median of 78% of the reads mapped uniquely to the hg19 genome. We detected a median of 12,071 genes (TPM > 1) and 176,284 junctions (>=5 reads supporting a junction) per sample across our cohort. A full list of quality control metrics for each sample can be found in Supplementary Data 1.

**Fig. 1: Effects of pathogenic or likely pathogenic SNVs and indels on gene expression.**

We focused on the 61 individuals for whom genomic variants in the initial studies and subsequent re-analysis had been considered as receiving a diagnosis^16,17,18 (Group 1). These 61 cases were comprised of 52 cases with SNVs or indels, six cases with structural variants (SVs)/copy number variants (CNVs), and three cases with both candidate SNVs/indels and SV/CNVs. In 39 of the 61 cases, the disease-causing genes underlying the diagnosis were detectable in our blood RNA-seq data. RNA-seq revealed that in 20/39 cases the variants influenced transcription (altered expression, splicing, or allele-specific expression; Table 1, Fig. 1B) and the pathogenic classification of the DNA variants, as per ACMG guidelines, remains unchanged.

RNA expression changes in previously identified disease genes

We examined the effect of SNVs or indels on gene expression for genes that we detected by whole blood RNA-seq. This variation included missense variants, splice site variants, frameshift or nonsense variants (Fig. 1C, Table 1, Supplementary Data 2). We found decreased gene expression levels in genes from 9 cases, including 5 cases with nonsense or frameshift variants associated with autosomal dominant (AD) Infantile Epileptic Encephalopathy with Severe Global Developmental delay (PPP3CA), autosomal recessive (AR) Alazami syndrome (LARP7), AD Mental Retardation Autosomal Dominant 19 (CTNNB1) and AR Diarrhea 10, protein-losing enteropathy type (PLVAP), AD ZTTK syndrome (SON) (Fig. 1C–F, Table 1).

RNA-seq also gave us an opportunity to assess the gene expression consequences of missense variants. While it is typically assumed that missense variants would not overtly influence gene expression levels, genome sequencing alone cannot determine this. We found that 19/21 of the diagnosed cases with a missense variant showed no obvious effect on gene expression. The genes from the two participants with a diagnostic missense variant showed decreased gene expression (z-score < -3) were SLC25A19 (Case 11) and BBS1 (Case 38) (Fig. 1C, Table 1). The SLC25A19 result can be explained by the fact that in addition to a missense variant, the individual also possessed a heterozygous deletion that removes part of the last intron and the last exon of SLC25A19 (Table 1). Case 38 did not have an initial explanation for the observed reduced BBS1 expression, and the clinical data we used^16,17,18 only reported the heterozygous pathogenic missense variant (NM_024649.5) c.1169 T > G (p.Met390Arg). However, additional consultation revealed that a subsequent genome reanalysis study at SickKids independently determined that a retrotransposon insertion occurred in trans to the heterozygous pathogenic missense variant, resulting in a premature termination codon²⁰. Our RNA-seq analysis pipeline does not readily allow for the automated detection of such insertions and deletions. We did not initially flag these reads for further inspection due to the low expression of BBS1 in blood. Manual inspection of the RNA-seq reads identified reads mapping to the exon and the retrotransposon sequence (Supplementary Fig. 2).

Evidence for RNA splicing changes in previously identified disease genes

We observed splicing-related consequences in five individuals:

Case 46: aberrant splicing was present in an individual with a suspected mitochondrial disorder carrying a homozygous deletion occurring in the splice donor site of exon 5 of the DNAJC19 gene (NM_145261.3) c.280+1_280+5delGTAAG which is associated with 3-Methylglutaconic Aciduria Type V. Interestingly, this variant results in two novel junctions, one causing the skipping of exon 5 and the other causing the skipping of both exon 4 and 5 (Fig. 1E). Both events are predicted to result in a frameshift. In addition, reduced expression of DNAJC19 gene was also observed (Fig. 1D). While this variant was predicted to affect a canonical splice site, RNA-seq demonstrates a functional consequence of this deletion in vivo.

Case 59: aberrant splicing was confirmed in a case of Infantile Sialic Acid Storage disease where two rare variants in trans (confirmed by parental studies) were identified in the SLC17A5 gene (NM_012434.4) c.819+1 G > A; c.291 G > A p.(Thr97 = ). Both variants were predicted to influence splicing by in silico predictions. RNA-seq analyses also confirm that aberrant splicing occurs and showed skipping of exon 2 (c.291 G > A p.Thr97 = ) and exon 6 (c.819+1 G > A p.?) (Fig. 1F). Both events are predicted to result in a frameshift. Moreover, these aberrant splicing events were associated with reduced levels of the SLC17A5 transcripts suggesting the occurrence of nonsense mediated decay (NMD) (Table 1; Fig. 1D).

Case 1008: a rare heterozygous variant in the SMARCB1 gene (NM_003073.3) c.364del (p.Glu122Asnfs*21) was previously identified in an individual clinically diagnosed with Coffin-Siris syndrome. This variant was predicted to cause a frameshift in this haploinsufficient gene. Our RNA-seq analysis did not reveal a significantly reduced expression of this gene (z-score: -1.53, fold change: 0.89, p value: 0.12). However, RNA splicing analysis showed a decreased usage of exon 4 via exon skipping (Supplementary Fig. 3A). Importantly c.364del (p.Glu122Asnfs*21) occurs adjacent to the splice acceptor site, two base pairs from the start of exon 4. Skipping of exon 4 results in an in-frame truncation of the SMARCB1 protein, the functional consequences of which remain unknown.

Case 1016: a homozygous frameshift deletion was identified in exon 2 of the PANK2 gene (NM_153638.2) c.824_825del (p.Cys276Trpfs*15)) in an individual diagnosed with Neurodegeneration with Brain Iron Accumulation-1. We observed 88% (65/74 reads) of transcripts spliced to the short exon 2 of ENST00000336066, compared to only an average of 55% (152/272 reads) in ten randomly selected controls. Although short-read RNA-seq does not allow accurate measurement of isoform expression, estimation using RSEM suggested an increased usage of ENST00000336066 (65% vs. 52% cohort median). Long-read RNA-seq would likely help resolve the change in isoform usage²¹. While we only observed a small effect on the overall gene expression level (z-score: -1.97, fold change: 0.88, p value: 0.07; Supplementary Fig. 3B), the decreased usage of the transcript expressing the frameshift is likely due to NMD. The shift in usage to an isoform that skips the frameshift may be a compensatory mechanism for maintaining levels of PANK2 expression. The impact at the protein level remains to be seen.

Case 1049: a heterozygous canonical splice site variant was identified in intron 6 of the NSD1 gene (NM_022455.4) c.3922-1 G > C (p.?), which is associated with Sotos Syndrome. Our RNA-seq analysis detects two aberrant splicing events: The first involves skipping of exon 7 and increased usage of the resulting isoform (ENST00000375350), which is a frameshift splice isoform annotated as an ‘NMD transcript’ in the Ensembl database. However, our data shows that the NSD1 variant increased the fraction of this isoform. Overall, the levels of NSD1 expression were slightly higher in the proband relative to controls. Supporting this observation, a recent genome-wide screen for NMD escaping transcripts also identified ENST000000375350 as a NMD escaping frameshift variant²⁰. The second splicing event involves the use of an alternative splice acceptor site 5 bp into exon 7 and is predicted to result in a frameshift (Supplementary Fig. 3C). The detection of the two aberrant transcripts suggests that they may be stable enough to be translated into a truncated protein with little residual or altered function.

Evidence for allele-biased gene expression in previously assigned disease genes

We observed allele-biased expression in two individuals:

Case 86: A heterozygous missense variant in gene SLC35A2 (NM_005660.1: c.991 G > A (p.Val331Ile)), associated with X-linked dominant Congenital disorder of glycosylation type I, showing allelic imbalance favoring the reference allele (45/69 reads, 76%). This could be due to skewed X-inactivation in different tissues, as has been previously reported²².

Case 1103: The two variants in trans in the VPS53 gene (NM_018289.3) a c.1429 C > T (p.Arg477*) and a c.1716 T > G (p.Ser572Arg), are associated with Pontocerebellar hypoplasia, type 2E. We observed allelic bias towards the p.Arg477 reference residue 392/51 reads; 76%) and the p.Ser572Arg alternative allele (39/51 reads; 76%), respectively. This suggests that NMD of transcripts harboring the nonsense variant is occurring.

Detecting gene expression outliers at disease-associated CNVs

CNVs contribute significantly to rare diseases in human (e.g.^23,24,25,26 and the effect of a CNV on gene expression can be detected in the general population, cancer and in rare diseases^{27,28,29,30,31,32,33,34}. We examined the nine cases that carried diagnostic or partially diagnostic CNVs and asked if genes overlapping these CNVs show expression changes. In seven cases, these CNVs overlap one or more genes that are sufficiently detected in our cohort. Among these cases, six harbored heterozygous deletions that range from 4 kb to 1.3 Mb, overlapping 3 to 42 genes expressed in blood. In all six cases, we observed decreased expression of genes within the deleted regions as indicated by the decreased average z-score (compared to the rest of the cohort) of their expression (Fig. 2A). However, despite the consistent gene dosage change, the expression of genes within the same CNVs were not impacted uniformly (see Supplementary Data 3 for a list of outlier genes within CNVs for each sample). One individual with 22q11.2 deletion syndrome harbors the well-characterized heterozygous ~3MB deletion on chromosome 22 overlapping 137 genes. We focused on the expression of 34 protein-coding genes in this region and found that 24 of them showed decreased expression (z-score <= -3) with fold changes ranging from 0.37 to 0.69 (Fig. 2B, C, Supplementary Data 3).

**Fig. 2: Modulation of gene expression by CNVs.**

Finally, the last CNV case (Case 1090) is an individual that carried complex X chromosome rearrangements (arr Xp22.33q21.32(60,701-91,873,757)x3, Xq21.32q28(91,877,172-155,174,078)x1) shows congenital anomalies, global developmental delay, severe intellectual disability, seizures, blindness, and deafness (Table 1). Additional clinical testing (G-banding and FISH analyses) performed on this individual indicates the presence of one normal X chromosome and one abnormal X chromosome. The abnormal X chromosome is an isodicentric X, which contains two copies of Xp22.33 to Xq21.3 with deletion of Xq21.3 to Xq28. No X chromosome material was located on another chromosome in the genome. Previous studies have shown that despite X chromosome copy number variations, most differentially expressed genes in individuals with Turner Syndrome are on autosomes (reviewed in ref. ³⁵). To control for X chromosome copy number, we performed gene expression outlier analysis with OUTRIDER using female samples (n = 58) only. We first confirmed that this reanalysis did not change the observation of CNV-overlapping genes in the other cases (Supplementary Fig. 4A). Next, we re-examined gene expression changes in this individual. Consistent with previous reports, most genes with expression changes in case 1090 are not X-linked (7 X-linked genes /127 genes with absolute z-score > =3 or adjusted p value < 0.05, Supplementary Data 4). We found that X-linked genes within the duplicated region showed higher expression relative to the rest of the cohort, especially for p-arm genes distal to the centromere (Supplementary Fig. 4B, C). In contrast, the expression of genes within the deleted region is comparable to controls (Supplementary Fig. 4B, C), which could be explained by skewed X inactivation, which is known to occur with large structural changes on X chromosome³⁶.

Identifying splicing and expression outliers in a cohort with no previous molecular findings

There were 73 participants who despite traditional testing and reanalysis remained undiagnosed^16,17,18. The clinical findings in this cohort included global developmental delay, epilepsy, and multiple congenital anomalies, connective tissue abnormalities, cardiomyopathy, suspected metabolic disorder, and ophthalmological or immune system abnormalities (see Supplementary Data 2). Using the information obtained from our RNA-seq analyses and discussions with clinicians, we identified plausible candidate genes related to the phenotype in five cases (Table 2):

Table 2 Candidate genes and additional variants detected in 7 case

Full size table

Case 18: Decreased expression of the UGDH gene, which is associated with AR Developmental and Epileptic Encephalopathy 84, was identified as an extreme outlier (z-score –4.15, fold change 0.62, p value 7.50e−05) in a male with global developmental delay and epileptic encephalopathy (Fig. 3A). The UGDH protein is involved in the biosynthesis of extracellular matrix components and loss-of-function variants result in epileptic encephalopathy with variable degrees of developmental delay³⁷. No variant was detected in the GS data.

**Fig. 3: Aberrant splicing events and expression of candidate genes identified using RNA-seq.**

Case 91: Decreased expression of the PPP2CA gene (Protein Phosphatase 2, Catalytic Subunit, Alpha Isoform associated with AD Houge-Janssens syndrome 3) was identified as an outlier (z-score: −3.51, FC: 0.88, p value: 6.97e−4) in a female with global developmental delay, autism spectrum disorder, epilepsy, atrial septal defects (Fig. 3A). No variant was detected in the GS data.

Case 1035: Splicing outlier analysis identified increased usage of exon 9 of the NFIX gene (NM_002501) in the blood sample obtained from a male with constitutional overgrowth, autistic spectrum disorder, pectus excavatum, and advanced bone age (Fig. 3A,B). NFIX encodes a transcription factor and is associated with AD Marshall-Smith syndrome and Malan syndrome (Sotos syndrome 2), and is highly expressed in the brain cerebellum. No variant was detected in the GS data.

Case 1085: Splicing and expression analysis identified the PTK2B gene as a candidate gene due to an intron retention event in intron 27 in a male with features of autism, behavioral issues, obesity, epilepsy and growth hormone deficiency. This intron retention is predicted to result in a frameshift (NM_004103.4) (Fig. 3A, C). The PTK2B gene showed significantly increased expression (z-score 5.97, fold change 1.68, p value 1.56e−10). A re-examination of the GS data revealed a novel variant at +1 splice donor site of intron 27 (c.2523+1 G > A) that explains the intron retention. The function of the PTK2B gene is associated with signaling neuropeptide-activated receptors or neurotransmitters and may also provide a mechanism for a variety of short- and long-term calcium-dependent signaling events in the nervous system^38,39. PTK2B has been identified as a risk factor for Alzheimer’s disease by multiple studies, and the mechanism of some susceptibility alleles has been attributed to splicing alterations^40,41,42. One Alzheimer’s Disease-risk allele (rs28834970-C) present in this individual is associated with higher protein expression of PTK2B in monocytes⁴³. In the GTEx whole blood expression dataset, rs28834970-C is also associated with moderately increased expression of PTK2B⁴⁴. This raises the possibility that the individual’s phenotype is the result of two independent molecular events where the common disease-associated SNV (rs28834970-C) upregulates the expression of the aberrant frameshift containing transcript. Since it was not possible to phase these variants, the relevance of this proposed mechanism remains to be seen.

Case 1101: Decreased expression of the B4GALT2 gene (UDP-GAL: beta-GlcNac beta-1,4-galactosyltransferase, polypeptide 2) gene was found in blood isolated from a male with profound global developmental delay, spastic quadriplegia and acute rhabdomyolysis (z-score: −3.40, FC: 0.45, p value: 8.30e−4) (Fig. 3A). Although the B4GALT2 gene shares 55% amino acid identity with the B4GALT1 gene, which is associated with Congenital disorder of glycosylation, type IId, the role of B4GALT2 in disease is not known, and further analysis will be required to understand the possible role of the aberrant expression of the gene and the role it may play in relation to the phenotype of this individual. No variant was detected in the GS data. Overall, the application of RNA-seq to cases with no DNA findings resulted in candidate genes related to participant phenotypes in 5 out of the 73 cases.

We identified two cases where RNA-seq provided an alternative diagnosis or additional variants of interest:

Case 87: In a male with history of hypertrophic cardiomyopathy, short stature, developmental delay, and distinctive facies in keeping with Noonan syndrome, we detected increased usage of a 117 bp exon within intron 16 of the LZTR1 gene which is predicted to result in an in-frame insertion of 36 amino acids (NM_006767.4; Fig. 4A). Within this exon, we also detected an alternative splice acceptor site that is predicted to result in a premature stop site (Fig. A). We identified a homozygous rare deep intronic variant c.1943-351 G > C that is likely the cause of these two aberrant splicing events. All of the RNA-seq reads aligning to this exon originated from this variant. Although an EPG5 variant had been detected by GS (Supplementary Data 2), this LZTR1 finding was a stronger match to the individual’s clinical presentation and was considered diagnostic.

**Fig. 4: Aberrant splicing events and expression of candidate genes identified using RNA-seq.**

Case 1058: We detected a significant decrease in the expression of the SON gene (z-score −3.50, fold change 0.80, p value 6.00e−04) and no effect on splicing in blood obtained from a male who presented with global developmental delay, hypoplastic toes, syndactyly and macrocephaly (Fig. 4B). We also detected the SON (NM_138927.2) c.3476del (p.Pro1159Arg*9) variant, associated with AD ZTTK syndrome, which was independently identified during DNA re-analysis of the case¹⁸. This demonstrates the ability of an RNA-first approach to detect clinically relevant RNA events even if the DNA variant is not known at the time of analysis. We also identified another candidate gene CEP120, which is associated with Joubert syndrome⁴⁵, with a novel exon between exons 18 and 19 (Fig. 4B) and decreased expression (z-score −4.54, fold change 0.76, p value 4.15e−7). No DNA variants were identified in the GS analysis. However, it has been demonstrated that a 57 bp deletion overlapping a pair of Tigger4 transposons within intron 17 causes expression of this novel exon⁴⁶. We see evidence of this deletion in the RNA-seq data, but the zygosity is unclear (Supplementary Fig. 5). The SON protein regulates the splicing of centriolar protein genes, including CEP131⁴⁷. It is possible that SON also regulates the splicing of CEP120, together with the CEP120 intronic deletion, resulting in the observed reduction in gene expression and potentially contributing to the complexity of the individual’s phenotype.

Discussion

In this study, we benchmarked the diagnostic utility of RNA-seq in a well-studied pediatric rare disease cohort (SickKids Genome Clinic) that was established for assessing the clinical utility of genome sequencing. This unique cohort allowed us to perform a follow-up assessment of blood RNA-seq focussing on two clinical testing scenarios. In the first scenario, RNA-seq was used to provide functional data for variants that had been identified by GS and were considered diagnostic for the participant’s phenotype. Using RNA expression as a functional readout of genetic changes is a rationale and practical way to help confirm diagnoses and suggest pathogenic mechanisms. Assigning a molecular mechanism to a genetic disease diagnosis is a requisite for bespoke genetic therapy development⁴⁸. In the second scenario, we aimed to identify candidate genes using a transcriptome-first analysis. Our work showcases a variety of cases where RNA-seq effectively facilitates the diagnosis and will contribute to the establishment of a clinical guideline.

In the first scenario, when comparing the RNA-seq data to in silico prediction algorithms (SpliceAI for splicing impact, Supplementary Data 2), the results are concordant for the majority (29/34 cases) of the known pathogenic or likely pathogenic SNVs and indels analyzed. The five cases that appeared to be discordant benefited from a manual inspection of the RNA-seq data. First, we predicted that the homozygous c.2600_2601delTA p.(Leu867*) [NM_017733.3] variant in the PIGG (Case 64) gene would result in decreased expression. We only observed moderately decreased transcript levels (z-score −2.34, fold change 0.88, p value 0.028). While this decreased expression is suggestive of NMD, the gene expression reduction did not pass our cutoff filters. This illustrates the impact that gene expression cut-offs and filtering algorithms will have on potentially biologically complex, yet still meaningful observations. A combined approach of lenient cutoffs for well-studied genes and more stringent cutoffs for novel disease genes could help alleviate this issue.

Next, one missense variant, (Case 1029: PIK3R2 c.1117 G > A (p.Gly373Arg) [NM_005027.4]), 7 bp from the splice acceptor sites was computationally predicted to cause an acceptor loss 2 bp from the variant while RNA-seq showed no impact. Upon closer examination, the predicted loss does not occur at a splice site used in the population and is thus likely a false positive prediction. Similarly, SpliceAI failed to predict the splicing impact of the PANK2 (Case 1016) variant. However, another tool, Pangolin⁴⁹, predicted the alternative splice site usage (a donor loss (score = 0.28) and splice donor gain (score = 0.19)) observed with the RNA-seq data. In addition, the frameshift variant in the SMARCB1 (Case 1008) gene was not predicted to be splice-altering but we showed a splicing impact. Finally, in Case 46, SpliceAI only reported the most proximal splicing change (the skipping of exon 5) and did not predict the more complicated event (skipping of exon 4). While our analyses show that in silico approaches can fail to predict the specific transcriptional outcome, getting a consensus result from multiple tools will improve the performance of computational prediction. RNA-seq data provides individual-specific, empirical information on specific splicing mechanisms (exon skip, intron retention), which can be useful even when canonical splice site variants are encountered. This observation echoes the conclusions from other published work⁵⁰ and highlights the need for obtaining more pediatric RNA-seq data to improve in silico RNA splicing prediction algorithms.

Our study also highlights the value of using RNA-seq in cases with large CNVs identified by CMA. Consistent with existing studies, we showed that gene expression changes cannot be accurately predicted from gene copy number (identified by array or GS) alone^{27,28,30,32,33}. The examples that we present show that RNA-seq provides information on dosage compensation at the transcriptional level for individual genes within CNVs and complex chromosomal rearrangements. For example, decreased dosage of a combination of genes in individuals with 22q11 deletion syndrome is attributed to the complex and variable 22q11 deletion syndrome phenotypes^51,52. By measuring the expression change of every gene in an unbiased manner, RNA-seq could provide further insights into the pathogenicity of CNVs and pinpoint sets of genes contributing to inter-individual differences in phenotype. These data will be relevant for the more complex phenotypes and add a nuanced interpretation to genome sequencing findings. However, additional information on gene expression and epigenomic features in disease-relevant and developmental stage-specific samples will be needed to understand the pathogenicity of sets of genes affected by CNVs^27,33,53.

Using an RNA-centric approach on the second cohort of individuals who did not receive a genetic diagnosis by GS or CMA, we identified aberrant splicing or expression events related to the primary clinical indication in ~10% (7/73) of the cases, including a diagnostic variant in LZTR1 (case 87) before a DNA variant had been found, underscoring the ability of RNA-seq to improve genome diagnosis. However, it remained challenging to identify a causal DNA variant leading to these clinically relevant RNA phenotypes, especially for gene expression outliers. The yield seen in this RNA-centric approach may be the result of DNA variants that are poorly annotated, for example, regulatory, intronic or intragenic areas of the genome, or structural changes not identified by short-read sequencing.

Both the genomic and transcriptomic analyses focused on genes with known disease or HPO associations, which will limit our ability to identify molecular changes in novel genes. During this study our clinical lab utilized the hg19 as the reference genome, and we acknowledge the importance of reanalysis on newer genome assemblies. For our study anticipate a small impact of a genome assembly update, as none of the genes we identified in this manuscript were among those shown to be impacted by differences in genome build (hg19 vs hg38⁵⁴). Additionally, our splicing pipeline examines de novo identified junctions independent of gene annotations. However, it is still possible that an updated genome build, and updated disease gene annotations could reveal additional diagnoses. As a first step to rule out changes, we processed all RNA-seq samples using hg38 (Ensembl gene annotation v104). We re-examined all outlier gene expression events using OUTRIDER and reproduced Fig. 1C. Despite slight differences in z scores and p values, the overall findings remained the same (Supplementary Fig. 6).

Our study highlights some of the challenges facing the routine use RNA-seq in a clinical genetic setting identified in previous studies^{4,5,6,7,8,9,10,11,12,13,14,15}. Firstly, using blood as a proxy tissue for non-blood related clinical indications may fail to detect gene expression levels and cell-type specific splicing. Even in the cases where robust gene expression was detected in blood, different epigenetic processes in the disease-relevant cell type could change the outlier gene expression and splicing results, which in turn could change the clinical interpretation. More sensitive targeted technologies such as targeted PCR-based approaches (e.g., ref. ⁷), long-read RNA-seq approaches^21,55, as well as trans-differentiation of accessible cell types from blood or skin into more disease-relevant tissue types (e.g., ref. ⁵⁶) will help to overcome some of these challenges. While we have used our own cohort as a pediatric control population to minimize technical and biological differences, subtle variations may have escaped the identification of outliers under a fixed cutoff. Since the number of controls has a strong impact on gene expression outlier analysis¹¹, it is critical to collect age-, sex-, and tissue-matched control cohort that is processed using a uniform experimental and computational pipeline. Building and sharing databases of high-quality pediatric RNA expression data is a realistic solution to achieve this.

In conclusion, our data show the utility of peripheral blood RNA-seq in assessing the genetic underpinnings of a diverse range of disorders. By utilizing a clinical-grade, automated version of a standard RNA-seq library preparation methodology on whole blood samples, we show that RNA-seq is sufficiently sensitive and reproducible to provide functional evidence for variants identified by genome sequencing. In this context, our results support using RNA-seq diagnostics in parallel with genomic sequencing. Such benefits could include increasing confidence in, and reducing the time spent reporting GS findings. An immediate application for RNA-seq is to provide functional evidence for variants currently classified under ACMG guidelines as variants of uncertain significance (VUS), which may lead to re-classification to likely benign/benign or likely pathogenic/pathogenic. Furthermore, providing critical clarification of variant impact, a critical step for assessing and designing individualized genetic medicines⁴⁸. However, from a health economics perspective, it is unclear if a dual DNA/RNA testing regime is financially justified and how it compares to reserving RNA-seq for individuals for whom GS did not result in a conclusive finding. Nevertheless, expediting the production of a comprehensive and diverse reference set of pediatric blood gene expression profiles will enhance RNA-based genetic diagnoses and enable new algorithms for precision child health.

Methods

Cohort description

The cohort in this study was derived from the SickKids Genome Clinic, which consisted of two studies that collected samples over a 2-year period (April 2013 to June 2015). The participants presented with a spectrum of complex clinical indication,s including epilepsy, global developmental delay, and multiple congenital anomalies, which represent the most common indications for genetic testing. The SickKids Genomic Clinic participants first underwent genetic testing that met the standard of care provided at the time and were subsequently examined using genome sequencing. The first study cohort (which compared the diagnostic rate of genome sequencing to traditional chromosomal microarray) involved 100 children with suspected genetic disorders¹⁶. The second cohort (which compared genome sequencing to targeted gene panels) involved 103 children¹⁷. Variants were classified using ACMG guidelines¹⁹. In silico predictions for variants impact were generated using Ensembl VEP and SpliceAI, which was accessed through SpliceAI lookup (https://spliceailookup.broadinstitute.org/). SpliceAI prediction results are included in Supplementary Data 2.

Research ethics

This study was approved by the Research Ethics Board at The Hospital for Sick Children, and informed consent was obtained for all participants (REB number 1000037726). Case IDs are de-identified and are not known to anyone outside the research group. We have complied with all relevant ethical regulations, including the Declaration of Helsinki.

Individual’s phenotypic data used in this study, to prioritize splicing and differential expression events, were previously captured in PhenoTips (http://www.phenotips.com)⁵⁷. Phenotypic information is represented using the Human Phenotype Ontology (HPO)⁵⁸. Participant data also included data regarding molecular genetic testing, including single gene or gene panels, extracted from the electronic medical records. Chemistry tests (blood and urine), enzymatic studies, muscle biopsies, and medical imaging, when available, were also included as previously described by both Stavropoulos et al.¹⁶ and Lionel et al.^16,17.

RNA isolation, library preparation, and sequencing

RNA from the 134 probands was extracted from blood collected in PAXGene tubes using an automated QIAsymphony PAXGene blood RNA kit (Qiagen). RNA was quantified using the Qubit Fluorometer (Thermo Fisher Scientific) High Sensitivity Assay, and sample purity was checked using the Nanodrop (Thermo Fisher Scientific) OD 260/280 ratio. Following the manufacturer’s recommended protocol, 250 ng of total RNA spiked with the Spike-in RNA variants of the isoform (SIRV) Set 3 (Lexogen), such that spike-ins were ~1% of the mRNA, were used as input material for library preparation. SIRV set3 includes 92 ERCCs and 69 SIRV isoforms corresponding to seven SIRV genes, allowing the assessment of an RNA-seq platform’s ability to detect a range of RNA concentrations as well as isoform complexity. Libraries were prepared using an automated NEBNext Ultra II Directional Library Prep Kit for Illumina with polyA isolation at Genome Diagnostics (Department of Laboratory Medicine, The Hospital for Sick Children). Libraries were assessed using the Bioanalyzer DNA High Sensitivity chips (Agilent Technologies, Santa Clara, CA) and quantified by quantitative polymerase chain reaction using the Kapa Library Quantification Illumina/ABI Prism Kit protocol (KAPA Biosystems, Roche, Basel, Switzerland). Libraries were pooled in equimolar quantities and paired-end sequenced on an Illumina NovaSeq6000 system, following Illumina’s recommended protocol, to generate paired-end reads of 150 bases in length at The Center for Applied Genomics, The Hospital for Sick Children, Toronto, Canada. A median of 116 million paired-end reads per library was obtained.

RNA-seq data analysis and annotation

A customized RNA-seq processing pipeline was used for read alignment, quality control (QC), identification of expression and splicing aberrations, and variant calling. Raw sequencing reads were aligned using STAR⁵⁹ (v2.6.1c) to the combined genome of hg19 (1000 genomes reference genome, hs37d5) and the spike-in sequences (SIRVome, SIRV set3). Gene annotation was obtained from Ensembl GRCh37 assembly archive (v87) and was combined with SIRVome transcript annotation obtained from the manufacturer (https://www.lexogen.com/sirvs/download/). Additional gene-disease association annotations were obtained from various sources: OMIM gene accession IDs and disease descriptions (Ensembl v87); HPO gene-disease association (https://hpo.jax.org/data/annotations); numbers of ClinVar variants for each gene (https://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/gene_specific_summary.txt); and Orphanet (http://www.orphadata.org/) in November 2019. Fastqc⁶⁰ (v0.11.5), RNA-seQC⁶¹ (v2.3.5), Picard (https://broadinstitute.github.io/picard/) (v2.18.0), Markduplicates and CollectRnaSeqMetrics, supplemented with customized script,s were used to collect various QC metrics. Gene and transcript expression level quantification was performed using RSEM⁶² (v1.2.22).

Gene expression outlier analysis

Gene expression outliers were identified by comparing each sample to the rest of the samples in the cohort using R package OUTRIDER⁶³ (v 1.8.0). First, to minimize the effect of variable hemoglobin content, we removed nine hemoglobin genes (HBB, HBD, HBG1, HBG2, HBZ, HBM, HBA2, HBA1, HBQ1). Next, gene read counts estimated by RSEM were filtered for lowly expressed genes (only genes with a 95th percentile RPKM >1 were used). OUTRIDER results for samples 230373, 230577, and 239099 were not analyzed as they were identified as outliers after PCA analysis using OUTRIDER normalized counts (Supplementary Fig. 1). Subsequent examination showed these samples have suboptimal library quality. For each sample, genes with either an adjusted p value <0.05 or an absolute z-score ≥ 3 were reported as expression outliers and further examined manually.

Splicing outlier analysis

To identify aberrant splicing events, we adopted the approach described by Fresard et al.¹¹. Briefly, we used junction quantification by STAR (*SJ.out file). Only junctions with more than 5 uniquely mapped reads were considered in the analysis. Next, we calculated a junction coverage score, defined as the number of reads mapping to a junction of interest divided by the total number of reads mapping to other junctions that share a splicing donor or acceptor site with the junction of interest. We then calculated the z-score of the junction coverage score for each junction within the study cohort. Junctions with an absolute z-score ≥ 2 were further examined.

In addition, we also applied the aberrant splicing module from the Detection of RNA Outlier Pipeline (DROP)⁶⁴ with default settings as a complementary approach to identify aberrant splicing events. Specifically, the DROP pipeline uses the R package FRASER⁶⁵, which assesses the statistical significance of aberrant splicing events in addition to reporting z-scores, providing additional criteria for prioritization. Splicing events were visualized using ggsashimi⁶⁶. Gene models were visualized using the R package ggtranscript⁶⁷ based on Ensembl hg19 (v87) gene annotation. Introns were shortened using the function shorten_gaps().

Allele-specific expression

Allele-specific expression (ASE) events were detected using a method adapted from the “MAE” module of the DROP pipeline. Briefly, for each sample, heterozygous SNPs identified from GS were first filtered for quality as previously described^16,17 and further filtered to retain variants that are only identified in one sample in our internal batch. “ASEReadCount” from GATK⁶⁸ (v4.0.1.2) was used to count the number of RNA-seq reads mapping to reference or alternative alleles of filtered SNPs. DESeq2⁶⁹ (v1.44) was then used to test for significant bias towards reference or alternative alleles. ASE events with an alternative allele read ratio >0.8 and adjusted p values < 0.05 were further examined.

Interpretation workflow

Splicing junctions were prioritized first using technical filters (e.g., z-scores, number of junction reads, and junction coverage ratio) followed by the biological significance of the genes containing the candidate junctions. Specifically, genes with HPO terms corresponding to each participant’s disease presentation were prioritized, followed by genes with any disease associations, and genes with no known disease associations. All splicing junctions that were considered clinically significant were visually inspected using the Integrated Genomics Viewer (IGV)⁷⁰. Expression outliers and ASE genes were prioritized using z-scores and the adjusted p-value for each sample. Similarly, genes with relevant HPO terms or disease associations were prioritized.

Data availability

The data supporting the current study was consented for research but not for sharing in public databases. Based on the results of our study this dataset will serve as pediatric blood expression control samples for rare disease research. Please contact the corresponding authors if you wish access to this data.

Code availability

The RNA-seq processing pipeline and scripts for analyzing data and generating plots have been deposited on Zenodo (10.5281/zenodo.14889940).

References

Wojcik, M. H. et al. Genome sequencing for diagnosing rare diseases. N. Engl. J. Med. 390, 1985–1997 (2024).
Article CAS PubMed PubMed Central Google Scholar
Wigby, K. M. et al. Evidence review and considerations for use of first line genome sequencing to diagnose rare genetic disorders. NPJ Genom. Med. 9, 15 (2024).
Article PubMed PubMed Central Google Scholar
Sullivan, J. A., Schoch, K., Spillmann, R. C. & Shashi, V. Exome/genome sequencing in undiagnosed syndromes. Annu. Rev. Med. 74, 489–502 (2023).
Article CAS PubMed PubMed Central Google Scholar
Wai, H. A. et al. Blood RNA analysis can increase clinical diagnostic rate and resolve variants of uncertain significance. Genet. Med. 22, 1005–1014 (2020).
Article CAS PubMed PubMed Central Google Scholar
Murdock, D. R. et al. Transcriptome-directed analysis for Mendelian disease diagnosis overcomes limitations of conventional genomic testing. J. Clin. Investig. 131, e141500 (2021).
Lee, H. et al. Diagnostic utility of transcriptome sequencing for rare Mendelian diseases. Genet. Med. 22, 490–499 (2020).
Article CAS PubMed Google Scholar
Bournazos, A. M. et al. Standardized practices for RNA diagnostics using clinically accessible specimens reclassifies 75% of putative splicing variants. Genet. Med. 24, 130–145 (2022).
Article CAS PubMed Google Scholar
Gonorazky, H. et al. RNAseq analysis for the diagnosis of muscular dystrophy. Ann. Clin. Transl. Neurol. 3, 55–60 (2016).
Article CAS PubMed Google Scholar
Gonorazky, H. D. et al. Expanding the boundaries of RNA sequencing as a diagnostic tool for rare mendelian disease. Am. J. Hum. Genet. 104, 466–483 (2019).
Article CAS PubMed PubMed Central Google Scholar
Deshwar, A. R. et al. Trio RNA sequencing in a cohort of medically complex children. Am. J. Hum. Genet. 110, 895–900 (2023).
Article CAS PubMed PubMed Central Google Scholar
Frésard, L. et al. Identification of rare-disease genes using blood transcriptome sequencing and large control cohorts. Nat. Med. 25, 911–919 (2019).
Article PubMed PubMed Central Google Scholar
Cummings, B. B. et al. Improving genetic diagnosis in Mendelian disease with transcriptome sequencing. Sci. Transl. Med. 9, eaal5209 (2017).
Kremer, L. S. et al. Genetic diagnosis of Mendelian disorders via RNA sequencing. Nat. Commun. 8, 15824 (2017).
Article CAS PubMed PubMed Central Google Scholar
Yépez, V. A. et al. Clinical implementation of RNA sequencing for Mendelian disease diagnostics. Genome Med. 14, 38 (2022).
Article PubMed PubMed Central Google Scholar
Jaramillo Oquendo, C. et al. Identification of diagnostic candidates in Mendelian disorders using an RNA sequencing-centric approach. Genome Med. 16, 110 (2024).
Article CAS PubMed PubMed Central Google Scholar
Stavropoulos, D. J. et al. Whole genome sequencing expands diagnostic utility and improves clinical management in pediatric medicine. npj Genom. Med. 1, 15012 (2016).
Lionel, A. C. et al. Improved diagnostic yield compared with targeted gene sequencing panels suggests a role for whole-genome sequencing as a first-tier genetic test. Genet. Med. 20, 435–443 (2018).
Article CAS PubMed Google Scholar
Costain, G. et al. Periodic reanalysis of whole-genome sequencing data enhances the diagnostic advantage over standard clinical genetic testing. Eur. J. Hum. Genet. 26, 740–744 (2018).
Article CAS PubMed PubMed Central Google Scholar
Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–424 (2015).
Article PubMed PubMed Central Google Scholar
Torene, R. I. et al. Systematic analysis of variants escaping nonsense-mediated decay uncovers candidate Mendelian diseases. Am. J. Hum. Genet. 111, 70–81 (2024).
Article CAS PubMed Google Scholar
Del Gobbo, G. F. & Boycott, K. M. The additional diagnostic yield of long-read sequencing in undiagnosed rare diseases. Genome Res. 35, 559–571 (2025).
Kodera, H. et al. De novo mutations in SLC35A2 encoding a UDP-galactose transporter cause early-onset epileptic encephalopathy. Hum. Mutat. 34, 1708–1714 (2013).
Article CAS PubMed Google Scholar
Marshall, C. R. et al. Contribution of copy number variants to schizophrenia from a genome-wide study of 41,321 subjects. Nat. Genet. 49, 27–35 (2017).
Article CAS PubMed Google Scholar
Bonini, K. E. et al. Identification of copy number variants with genome sequencing: Clinical experiences from the NYCKidSeq program. Clin. Genet. 104, 210–225 (2023).
Article CAS PubMed PubMed Central Google Scholar
Li, Y. R. et al. Rare copy number variants in over 100,000 European ancestry subjects reveal multiple disease associations. Nat. Commun. 11, 255 (2020).
Article CAS PubMed PubMed Central Google Scholar
Zarrei, M. et al. A large data resource of genomic copy number variation across neurodevelopmental disorders. NPJ Genom. Med. 4, 26 (2019).
Article PubMed PubMed Central Google Scholar
Stranger, B. E. et al. Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science 315, 848–853 (2007).
Article CAS PubMed PubMed Central Google Scholar
Schlattl, A., Anders, S., Waszak, S. M., Huber, W. & Korbel, J. O. Relating CNVs to transcriptome data at fine resolution: assessment of the effect of variant size, type, and overlap with functional regions. Genome Res. 21, 2004–2013 (2011).
Article CAS PubMed PubMed Central Google Scholar
Handsaker, R. E. et al. Large multiallelic copy number variations in humans. Nat. Genet. 47, 296–303 (2015).
Article CAS PubMed PubMed Central Google Scholar
Gamazon, E. R. & Stranger, B. E. The impact of human copy number variation on gene expression. Brief. Funct. Genom. 14, 352–357 (2015).
Article CAS Google Scholar
Sudmant, P. H. et al. An integrated map of structural variation in 2,504 human genomes. Nature 526, 75–81 (2015).
Article CAS PubMed PubMed Central Google Scholar
Corbett, M. A. et al. Pathogenic copy number variants that affect gene expression contribute to genomic burden in cerebral palsy. NPJ Genom. Med. 3, 33 (2018).
Article CAS PubMed PubMed Central Google Scholar
Shao, X. et al. Copy number variation is highly correlated with differential gene expression: a pan-cancer study. BMC Med. Genet. 20, 175 (2019).
Article PubMed PubMed Central Google Scholar
Han, L. et al. Functional annotation of rare structural variation in the human brain. Nat. Commun. 11, 2990 (2020).
Article CAS PubMed PubMed Central Google Scholar
Álvarez-Nava, F. & Lanes, R. Epigenetics in Turner syndrome. Clin. Epigenet. 10, 45 (2018).
Article Google Scholar
Santos-Rebouças, C. B. et al. Skewed X-Chromosome inactivation and compensatory upregulation of escape genes precludes major clinical symptoms in a female with a large Xq deletion. Front. Genet. 11, 101 (2020).
Article PubMed PubMed Central Google Scholar
Hengel, H. et al. Loss-of-function mutations in UDP-Glucose 6-Dehydrogenase cause recessive developmental epileptic encephalopathy. Nat. Commun. 11, 595 (2020).
Article CAS PubMed PubMed Central Google Scholar
Lev, S. et al. Protein tyrosine kinase PYK2 involved in Ca(2 + )-induced regulation of ion channel and MAP kinase functions. Nature 376, 737–745 (1995).
Article CAS PubMed Google Scholar
Dikic, I., Tokiwa, G., Lev, S., Courtneidge, S. A. & Schlessinger, J. A role for Pyk2 and Src in linking G-protein-coupled receptors with MAP kinase activation. Nature 383, 547–550 (1996).
Article CAS PubMed Google Scholar
Lambert, J. C. et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat. Genet. 45, 1452–1458 (2013).
Article CAS PubMed PubMed Central Google Scholar
Raj, T. et al. Integrative transcriptome analyses of the aging brain implicate altered splicing in Alzheimer’s disease susceptibility. Nat. Genet. 50, 1584–1592 (2018).
Article CAS PubMed PubMed Central Google Scholar
Jansen, I. E. et al. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nat. Genet. 51, 404–413 (2019).
Article CAS PubMed PubMed Central Google Scholar
Chan, G. et al. CD33 modulates TREM2: convergence of Alzheimer loci. Nat. Neurosci. 18, 1556–1558 (2015).
Article CAS PubMed PubMed Central Google Scholar
GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
Article Google Scholar
Roosing, S. et al. Mutations in CEP120 cause Joubert syndrome as well as complex ciliopathy phenotypes. J. Med. Genet. 53, 608–615 (2016).
Article CAS PubMed Google Scholar
Marshall, A. E. et al. RNA sequencing reveals deep intronic CEP120 variant: A report of the diagnostic odyssey for two siblings with Joubert syndrome type 31. Am. J. Med. Genet. A 194, e63485 (2024).
Article CAS PubMed Google Scholar
Stemm-Wolf, A. J., O’Toole, E. T., Sheridan, R. M., Morgan, J. T. & Pearson, C. G. The SON RNA splicing factor is required for intracellular trafficking structures that promote centriole assembly and ciliogenesis. Mol. Biol. Cell 32, ar4 (2021).
Article CAS PubMed PubMed Central Google Scholar
Cheerie, D. et al. Consensus guidelines for eligibility assessment of pathogenic variants to antisense oligonucleotide treatments. medRxiv https://doi.org/10.1101/2024.09.27.24314122 (2024).
Zeng, T. & Li, Y. I. Predicting RNA splicing from DNA sequence using Pangolin. Genome Biol. 23, 103 (2022).
Article CAS PubMed PubMed Central Google Scholar
Oh, R. Y. et al. A systematic assessment of the impact of rare canonical splice site variants on splicing using functional and in silico methods. HGG Adv. 5, 100299 (2024).
CAS PubMed PubMed Central Google Scholar
Morrow, B. E., McDonald-McGinn, D. M., Emanuel, B. S., Vermeesch, J. R. & Scambler, P. J. Molecular genetics of 22q11.2 deletion syndrome. Am. J. Med. Genet. A 176, 2070–2081 (2018).
Article CAS PubMed PubMed Central Google Scholar
Motahari, Z., Moody, S. A., Maynard, T. M. & LaMantia, A.-S. In the line-up: deleted genes associated with DiGeorge/22q11.2 deletion syndrome: are they all suspects?. J. Neurodev. Disord. 11, 7 (2019).
Article PubMed PubMed Central Google Scholar
Komlósi, K., Gyenesei, A. & Bene, J. Editorial: copy number variation in rare disorders. Front. Genet. 13, 898059 (2022).
Article PubMed PubMed Central Google Scholar
Ungar, R. A. et al. Impact of genome build on RNA-seq interpretation and diagnostics. Am. J. Hum. Genet. 111, 1282–1300 (2024).
Article CAS PubMed PubMed Central Google Scholar
Farrow, E. et al. Case of CLPB deficiency solved by HiFi long read genome sequencing and RNAseq. Am. J. Med. Genet. A 191, 2908–2912 (2023).
Article PubMed Google Scholar
Nicolas-Martinez, E. C. et al. RNA variant assessment using transactivation and transdifferentiation. Am. J. Hum. Genet. 111, 1673–1699 (2024).
Article CAS PubMed PubMed Central Google Scholar
Girdea, M. et al. PhenoTips: patient phenotyping software for clinical and research use. Hum. Mutat. 34, 1057–1065 (2013).
Article PubMed Google Scholar
Gargano, M. A. et al. The Human Phenotype Ontology in 2024: phenotypes around the world. Nucleic Acids Res. 52, D1333–D1346 (2024).
Article CAS PubMed Google Scholar
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Article CAS PubMed Google Scholar
Babraham Bioinformatics - FastQC: A Quality Control tool for High Throughput Sequence Data. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
Graubert, A., Aguet, F., Ravi, A., Ardlie, K. G. & Getz, G. RNA-SeQC 2: efficient RNA-seq quality control and quantification for large cohorts. Bioinformatics 37, 3048–3050 (2021).
Article CAS PubMed PubMed Central Google Scholar
Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinforma. 12, 323 (2011).
Article CAS Google Scholar
Brechtmann, F. et al. OUTRIDER: A statistical method for detecting aberrantly expressed genes in RNA sequencing data. Am. J. Hum. Genet. 103, 907–917 (2018).
Article CAS PubMed PubMed Central Google Scholar
Yépez, V. A. et al. Detection of aberrant events in RNA sequencing data. https://doi.org/10.21203/rs.2.19080/v1 (2020).
Mertes, C. et al. Detection of aberrant splicing events in RNA-seq data using FRASER. Nat. Commun. 12, 529 (2021).
Article CAS PubMed PubMed Central Google Scholar
Garrido-Martín, D., Palumbo, E., Guigó, R. & Breschi, A. ggsashimi: Sashimi plot revised for browser- and annotation-independent splicing visualization. PLoS Comput. Biol. 14, e1006360 (2018).
Article PubMed PubMed Central Google Scholar
Gustavsson, E. K., Zhang, D., Reynolds, R. H., Garcia-Ruiz, S. & Ryten, M. ggtranscript: an R package for the visualization and interpretation of transcript isoforms using ggplot2. Bioinformatics 38, 3844–3846 (2022).
Article CAS PubMed PubMed Central Google Scholar
Van der Auwera, G. A. et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinforma. 11, 11.10.1–11.10.33 (2013).
Google Scholar
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Article PubMed PubMed Central Google Scholar
Thorvaldsdóttir, H., Robinson, J. T. & Mesirov, J. P. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief. Bioinforma. 14, 178–192 (2013).
Article Google Scholar
Walker, L. C. et al. Application of the ACMG/AMP framework to capture evidence relevant to predicted and observed impact on splicing: recommendations from the CLingen SVI Splicing Subgroup. medRxiv https://doi.org/10.1101/2023.02.24.23286431 (2023).

Download references

Acknowledgements

We are grateful to the individuals and their families who participated in this study. Special thanks to Sergio Pereira and Karen Ho at The Center for Applied Genomics, Lisa Knapp at Agilent, and Howard Cukier at New England BioLabs for their support in the development and optimization of the automated RNA-seq platform, and all past contributors to the SickKids Center for Genetic Medicine and Genome Clinic project. Funding was provided by Genome Canada (OGI-158 to M.B., A.S., J.J.D., M.D.W.). M.Brudno and D.J.S. have equity in PhenoTips. S.W.S. is supported by a Northbridge Chair in Pediatric Research, and J.J.D. is supported by the Mogford Campbell Family Chair of Pediatric Neurosciences.

Author information

These authors contributed equally: Huayun Hou, Kyoko E. Yuki.

Authors and Affiliations

Program in Genetics and Genome Biology, SickKids Research Institute, Toronto, ON, Canada
Huayun Hou, Kyoko E. Yuki, Gregory Costain, Anna Szuto, Ronald D. Cohn, Stephen W. Scherer, Michael Brudno, Adam Shlien, James J. Dowling & Michael D. Wilson
Division of Genome Diagnostics, The Hospital for Sick Children, Toronto, ON, Canada
Kyoko E. Yuki, Sierra Barnes, Michael Braga, Meagan Gloven-Brown, Dimitri J. Stavropoulos, Sarah Bowdin, Christian R. Marshall, Adam Shlien & Lianna Kyriakopoulou
Division of Clinical and Metabolic Genetics, The Hospital for Sick Children, Toronto, ON, Canada
Gregory Costain, Anna Szuto & Roberto Mendoza-Londono
Department of Paediatrics, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
Gregory Costain, Ronald D. Cohn & Roberto Mendoza-Londono
Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
Gregory Costain, Anna Szuto, Ronald D. Cohn, Stephen W. Scherer, James J. Dowling & Michael D. Wilson
Centre for Computational Medicine, The Hospital for Sick Children, Toronto, ON, Canada
Arun K. Ramani & Alper Celik
Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
Dimitri J. Stavropoulos, Christian R. Marshall, Adam Shlien & Lianna Kyriakopoulou
Department of Clinical Genetics, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
Sarah Bowdin
Institute of Medical Science, University of Toronto, Toronto, ON, Canada
Ronald D. Cohn
The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, Canada
Stephen W. Scherer
University Health Network, Toronto, ON, Canada
Michael Brudno
Department of Computer Science, University of Toronto, Toronto, ON, Canada
Michael Brudno
Vector Institute for Artificial Intelligence, Toronto, ON, Canada
Michael Brudno
Center for Human Genomics and Precision Medicine, University of Wisconsin, Madison, WI, USA
M. Stephen Meyn
Department of Pediatrics, University of Wisconsin, Madison, WI, USA
M. Stephen Meyn
Division of Neurology, The Hospital for Sick Children, Toronto, ON, Canada
James J. Dowling

Authors

Huayun Hou
View author publications
Search author on:PubMed Google Scholar
Kyoko E. Yuki
View author publications
Search author on:PubMed Google Scholar
Gregory Costain
View author publications
Search author on:PubMed Google Scholar
Anna Szuto
View author publications
Search author on:PubMed Google Scholar
Sierra Barnes
View author publications
Search author on:PubMed Google Scholar
Arun K. Ramani
View author publications
Search author on:PubMed Google Scholar
Alper Celik
View author publications
Search author on:PubMed Google Scholar
Michael Braga
View author publications
Search author on:PubMed Google Scholar
Meagan Gloven-Brown
View author publications
Search author on:PubMed Google Scholar
Dimitri J. Stavropoulos
View author publications
Search author on:PubMed Google Scholar
Sarah Bowdin
View author publications
Search author on:PubMed Google Scholar
Ronald D. Cohn
View author publications
Search author on:PubMed Google Scholar
Roberto Mendoza-Londono
View author publications
Search author on:PubMed Google Scholar
Stephen W. Scherer
View author publications
Search author on:PubMed Google Scholar
Michael Brudno
View author publications
Search author on:PubMed Google Scholar
Christian R. Marshall
View author publications
Search author on:PubMed Google Scholar
M. Stephen Meyn
View author publications
Search author on:PubMed Google Scholar
Adam Shlien
View author publications
Search author on:PubMed Google Scholar
James J. Dowling
View author publications
Search author on:PubMed Google Scholar
Michael D. Wilson
View author publications
Search author on:PubMed Google Scholar
Lianna Kyriakopoulou
View author publications
Search author on:PubMed Google Scholar

Contributions

A.S., J.J.D., L.K., M.D.W., M. Brudno, and M.S.M. obtained funding, conceived the study design, and supervised the work. H.H., A.K.R., A.C., M. Brudno, A.S., and M.D.W. were responsible for data analysis pipelines. L.K., K.E.Y., H.H., G.C., S.Barnes, A.S., D.J.S., M.S.M., and C.R.M. were involved in data interpretation and clinical assessments. H.H., A.K.R., and A.C. were responsible for RNA-seq quality control analyses. M.Braga and M.G.B. generated the RNA-seq libraries as well as testing and optimizing the automated RNA-seq library preparation method. S.W.S. coordinated and KEY led the implementation of the automated RNA-seq library preparation method at the Center for Applied Genomics. R.D.C., S. Bowdin, R.M.L., S.W.S., C.R.M., M.S.M., D.J.S. represent the SickKids Genome Clinic, who obtained, organized, and analysed the genome sequencing data. H.H., K.E.Y., L.K., and M.D.W. wrote the manuscript. All authors read and reviewed the manuscript.

Corresponding authors

Correspondence to Michael D. Wilson or Lianna Kyriakopoulou.

Ethics declarations

Competing interests

MBrudno and DJS have equity in PhenoTips. SWS, RC, CRM and AS are affiliated with this journal as Editors or Editorial Board Members. These co-authors are not part of a peer review process or decision making of the manuscript. The other authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Materials

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Hou, H., Yuki, K.E., Costain, G. et al. Assessing the diagnostic impact of blood transcriptome profiling in a pediatric cohort previously assessed by genome sequencing. npj Genom. Med. 10, 51 (2025). https://doi.org/10.1038/s41525-025-00505-4

Download citation

Received: 06 December 2024
Accepted: 21 May 2025
Published: 01 July 2025
DOI: https://doi.org/10.1038/s41525-025-00505-4