Abstract
Reference genomes serve as the baseline against which personal genomes are compared to identify clinically relevant variants. The widely used reference genome, GRCh38, contains gaps and unresolved bases, particularly in complex regions, that can obscure variant discovery. In contrast, the gapless telomere-to-telomere CHM13 (T2T-CHM13) reference genome can be used to assess these difficult regions. Optical genome mapping (OGM), an imaging technique for identifying structural variants, offers higher resolution than traditional cytogenetic methods. Our study showcases the utility of the T2T-CHM13 reference genome for enhanced structural variant (SV) detection in complex regions. We illustrate this through two clinical cases in which improved alignment to T2T-CHM13 yielded substantially higher confidence scores for critical SVs. These cases demonstrate improved clinical diagnostic outcomes with the T2T-CHM13 reference, and we advocate its adoption.
Introduction
Evaluation of whole genomes is rapidly emerging as a standard diagnostic test in rare diseases. While small genetic variants such as SNPs and InDels can be identified by targeted molecular tests and whole exome sequencing (WES), these technologies are limited in their ability to detect large variants [1]. Large structural variants can have profound consequences in Mendelian and complex diseases but are challenging to resolve [2, 3]. Whole genome sequencing with short or long reads and imaging techniques such as optical genome mapping (OGM), used separately or in combination, are increasingly applied for genome-wide assessment of structural variants. Although these technologies have improved rapidly in recent years, an incomplete reference genome can prevent them from reaching their full potential.
The GRCh38 reference genome, while widely used for variant identification, has many unresolved sequences and gaps, amounting to about 150 megabases of genome-wide ambiguity [4]. These include regions in and around centromeres and telomeres, the short (p) arms of acrocentric chromosomes, and collapsed or missing sequences. Consequently, GRCh38 can produce numerous spurious variant calls and can hinder the accurate identification of some variants [5]. In contrast, the T2T-CHM13 reference is a gapless, highly accurate single-haplotype assembly derived from the effectively haploid CHM13 cell line, and it offers substantial improvements in variant detection, particularly within these problematic regions [6].
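Assembly gaps in a reference FASTA such as GRCh38 are conventionally written as runs of the placeholder base N, so the explicit gaps can be listed with a simple scan. The sketch below is an illustration only, not a tool used in this study; the function names iter_fasta and find_gaps are hypothetical. It finds only N-run gaps: collapsed or missing sequence that appears in the assembly as ordinary bases cannot be detected this way.

```python
import re
from typing import Iterable, Iterator


def iter_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_gaps(seq: str, min_len: int = 1) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals of N/n runs."""
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", seq)
        if m.end() - m.start() >= min_len
    ]


if __name__ == "__main__":
    # Toy two-line record; a real run would read a reference FASTA file.
    toy = [">chrToy", "ACGTNNNNNACGT", "NNACGTACNN"]
    for name, seq in iter_fasta(toy):
        gaps = find_gaps(seq)
        print(name, gaps, sum(end - start for start, end in gaps))
        # prints: chrToy [(4, 9), (13, 15), (21, 23)] 9
```

Summing end - start over the returned intervals for a whole chromosome gives its total length of explicit gaps.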
OGM is a high-resolution imaging technique capable of detecting structural variants (SVs) larger than 500 base pairs. Dubbed “next-generation cytogenetics”, the technology is gaining traction in clinical laboratories for SV identification in genetic diseases and cancers [7, 8]. However, adoption of the T2T-CHM13 reference genome for OGM-based variant evaluation is still nascent in the clinical setting. To address this gap, our study evaluates the utility of the T2T-CHM13 reference for OGM-based SV detection in genetic diseases.
Clinical presentation and methodology
Case 1
A male child was diagnosed with severe hemophilia A due to factor VIII deficiency. A PCR-based assay for the F8 intron 22 inversion was negative, a result later shown to be a false negative, and whole exome sequencing (WES) was therefore performed. WES did not identify any pathogenic or likely pathogenic variants in the F8 gene. To further investigate the cause of hemophilia A in this patient, OGM was performed to look for pathogenic structural variants.
Case 2
A 4-year-old boy born to non-consanguineous parents presented with global developmental delay, facial dysmorphism, and features of autism spectrum disorder. Karyotyping indicated an apparently balanced translocation between the long arm of chromosome 7 at cytoband 7q11.2 and the long arm of chromosome 21 at cytoband 21q11.2. The parents had a history of three spontaneous first-trimester abortions. Trio OGM was performed to map the translocation breakpoints, to determine whether any gene at these loci was disrupted, deleted, or duplicated in a way that could explain the phenotype, and to screen for other pathogenic SVs.
Ultra-high molecular weight genomic DNA was isolated from peripheral blood using the Bionano Prep SP Frozen Human Blood DNA Isolation Kit (Bionano, USA), followed by direct enzymatic labeling with fluorophores and imaging on the Bionano Saphyr. The optical maps were assembled and aligned to two reference genomes, GRCh38 and T2T-CHM13, using the de novo assembly pipeline in Bionano Solve version 3.8.0. The resulting structural variants were assessed in Bionano Access version 1.8. Bionano provides a control SV dataset for both GRCh38 and T2T-CHM13, comprising structural variants from 394 individuals of diverse ethnicities; this dataset was used to filter out common variants. Comparisons of QC metrics and SV concordance between the two reference assemblies are provided in Tables S1 and S2, respectively.
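The filtering logic described here (removing calls present in the control dataset, then applying per-type confidence thresholds) can be sketched as follows. This is a minimal illustration under stated assumptions, not Bionano Access's implementation: the SV record, the rule for matching a call to a control variant (same type and chromosomes, both breakpoints within 10 kb), and the use of >= for the threshold comparison are all assumptions. The inversion (0.7) and translocation (0.02) thresholds are the values quoted in this article's Results; other SV types pass at any confidence in this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical SV record; intrachromosomal SVs have chrom1 == chrom2."""
    sv_type: str  # "deletion", "insertion", "inversion", "translocation", ...
    chrom1: str
    pos1: int     # first breakpoint (bp)
    chrom2: str
    pos2: int     # second breakpoint (bp)
    confidence: float


# Thresholds quoted in this article; unlisted types pass at any confidence
# in this sketch (an assumption, not Bionano's default filter set).
MIN_CONFIDENCE = {"inversion": 0.7, "translocation": 0.02}


def matches_control(sv: SV, control: SV, tolerance_bp: int = 10_000) -> bool:
    """Assumed rule: same type and chromosomes, breakpoints within tolerance."""
    return (
        sv.sv_type == control.sv_type
        and sv.chrom1 == control.chrom1
        and sv.chrom2 == control.chrom2
        and abs(sv.pos1 - control.pos1) <= tolerance_bp
        and abs(sv.pos2 - control.pos2) <= tolerance_bp
    )


def filter_calls(calls: list[SV], controls: list[SV]) -> tuple[list[SV], list[SV]]:
    """Drop calls seen in controls, then split the rest by confidence.

    Returns (reported, below_threshold); below-threshold calls are kept
    for manual review instead of being discarded.
    """
    reported: list[SV] = []
    below_threshold: list[SV] = []
    for sv in calls:
        if any(matches_control(sv, c) for c in controls):
            continue  # common variant present in the control dataset
        threshold = MIN_CONFIDENCE.get(sv.sv_type, float("-inf"))
        if sv.confidence >= threshold:
            reported.append(sv)
        else:
            below_threshold.append(sv)
    return reported, below_threshold


if __name__ == "__main__":
    # Int22Inv breakpoints (GRCh38) with the two confidence scores reported
    # in this article; reusing one set of coordinates is for illustration only.
    grch38_call = SV("inversion", "chrX", 154888832, "chrX", 155456141, 0.47)
    t2t_call = SV("inversion", "chrX", 154888832, "chrX", 155456141, 0.82)
    reported, review = filter_calls([grch38_call, t2t_call], controls=[])
    print(len(reported), len(review))  # prints: 1 1
```

Returning sub-threshold calls instead of dropping them mirrors the manual review of low-confidence calls that recovered both clinically relevant variants in this study.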
Results
Case 1
OGM of the proband using the GRCh38 reference genome identified 6181 structural variants via the de novo assembly pipeline. Of these, 45 SVs, comprising insertions and deletions, were unique to the proband relative to the control database. Initial variant review and visualization did not reveal any structural variant in the F8 gene on the X chromosome. However, manual inspection of consensus maps across the F8 region revealed a ~0.57 Mb inversion, the intron 22 inversion (Int22Inv) (ogm[GRCh38] inv(X)(q28)(154888832_155456141)), which splits the F8 gene into two parts (Fig. 1a). This inversion had a confidence score of 0.47 and was therefore filtered out, as it fell below the pipeline’s confidence threshold of 0.7. Because F8 lies in the subtelomeric region, which is challenging to resolve in GRCh38 [9], the consensus maps were also aligned to the T2T-CHM13 reference assembly for a more accurate assessment of the inversion. This raised the confidence score of the Int22Inv call to 0.82, above the threshold. The higher score is attributable to improved alignment of consensus maps to the left of the inversion breakpoint: the RawConfidenceLeft score with T2T-CHM13 was twice that with GRCh38 (Table 1). Because the confidence score is calculated from these raw confidence estimates, this suggests that mapping to T2T-CHM13 yields more reliable SV calls, particularly in complex regions. The inversion was further confirmed with an in-house long-distance PCR assay (Fig. S1, Supplementary Methods).
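The OGM descriptions used in this article encode the reference build, chromosome, cytoband, and breakpoint coordinates, so the size of an inversion follows directly from its two breakpoints. A minimal parser for the two forms quoted here is sketched below; the regular expressions cover only these two forms, not the full ISCN nomenclature, and parse_ogm is a hypothetical helper name.

```python
import re

# Simplified patterns for the two OGM descriptions quoted in this article.
INV_RE = re.compile(
    r"ogm\[(?P<build>[^\]]+)\]\s*inv\((?P<chrom>\w+)\)\((?P<band>[^)]+)\)"
    r"\((?P<start>\d+)_(?P<end>\d+)\)"
)
TRA_RE = re.compile(
    r"ogm\[(?P<build>[^\]]+)\]\s*t\((?P<c1>\w+);(?P<c2>\w+)\)"
    r"\((?P<b1>[^;]+);(?P<b2>[^)]+)\)\((?P<p1>\d+);(?P<p2>\d+)\)"
)


def parse_ogm(desc: str) -> dict:
    """Parse an inversion or translocation description into a dict."""
    text = desc.strip()
    m = INV_RE.fullmatch(text)
    if m:
        start, end = int(m["start"]), int(m["end"])
        return {
            "type": "inversion", "build": m["build"], "chrom": m["chrom"],
            "band": m["band"], "start": start, "end": end,
            "size_bp": end - start,  # distance between the two breakpoints
        }
    m = TRA_RE.fullmatch(text)
    if m:
        return {
            "type": "translocation", "build": m["build"],
            "breakpoints": [
                (m["c1"], m["b1"], int(m["p1"])),
                (m["c2"], m["b2"], int(m["p2"])),
            ],
        }
    raise ValueError(f"unsupported OGM description: {desc!r}")


if __name__ == "__main__":
    inv = parse_ogm("ogm[GRCh38] inv(X)(q28)(154888832_155456141)")
    print(inv["size_bp"])  # prints: 567309
```

For the Int22Inv description, size_bp is 155,456,141 - 154,888,832 = 567,309 bp, or about 0.57 Mb.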
Fig. 1 (a) Optical maps depicting the F8 intron 22 inversion when mapped to the GRCh38 (top) or T2T-CHM13 (bottom) reference. The blue box highlights the SV call reported by Bionano Solve with the T2T-CHM13 reference. (b) Optical maps showing the pathogenic chr7-chr21 translocation; the variant call is highlighted with a blue box.
Case 2
Consensus optical maps of the proband aligned to the GRCh38 reference genome did not reveal a translocation. A review of low-confidence translocation calls below the recommended threshold of 0.02 revealed a translocation between chromosomes 7 and 21, t(7;21)(q11.22;q11.2) (ogm[GRCh38] t(7;21)(q11.22;q11.2)(70555391;13117484)), with a confidence score of 0, disrupting the AUTS2 gene. Given the association of AUTS2 with neurological disorders including autism spectrum disorder, intellectual disability, and developmental delay, this finding was clinically significant [10, 11]. As the chromosome 21 breakpoint lay in the pericentromeric region, the consensus maps were realigned to the T2T-CHM13 reference genome. With T2T-CHM13, the confidence score of the translocation rose from 0 to 0.76, owing to improved alignment on both sides of the breakpoint. This was evidenced by a RawConfidenceRight score of 6628.61 with T2T-CHM13, compared with 15.94 with GRCh38 (Table 1); the RawConfidenceLeft score also increased threefold with T2T-CHM13. Trio OGM showed that the translocation arose de novo in the proband, and no other pathogenic SVs were identified. The translocation was independently confirmed by nanopore long-read whole genome sequencing, which showed disruption of the AUTS2 gene at chr7:70548861 (Fig. S2, Supplementary Methods). A closer look at the chromosome 21 breakpoint revealed that it overlaps the centric transition region in the pericentromeric q arm of this acrocentric chromosome [12]. This region contains long tracts of segmental duplications that can cause ambiguous alignment (Fig. 1b). We hypothesize that multi-mapping of molecules across highly similar segmental duplications at distinct genomic locations may lower confidence scores.
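To put the change in RawConfidenceRight in perspective, the ratio of the two reported values can be worked out directly. This is a plain ratio of the numbers in Table 1; because the scale of the RawConfidence scores is not described here, it is a descriptive comparison rather than a statement about likelihoods.

```latex
\[
\frac{\mathrm{RawConfidenceRight}_{\text{T2T-CHM13}}}{\mathrm{RawConfidenceRight}_{\text{GRCh38}}}
  = \frac{6628.61}{15.94} \approx 416
\]
```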
Discussion
OGM offers substantially higher resolution than karyotyping, fluorescence in situ hybridization, and chromosomal microarray, and can detect a wider range of SVs in a single assay. Its ability to identify cryptic inversions and translocations that these techniques cannot detect has important implications for assessing the recurrence risk of genetic anomalies in affected families. Using two illustrative clinical cases, this study evaluated the robustness of the T2T-CHM13 reference genome for identifying clinically relevant variants within challenging genomic loci. Compared with GRCh38, T2T-CHM13 performed better, particularly in repeat-rich subtelomeric and pericentromeric loci containing segmental duplications. To our knowledge, this is the first optical genome mapping study demonstrating the enhanced performance of T2T-CHM13 for variant identification in these complex regions.
The widely used GRCh38 reference genome has more than 100 million unresolved or missing nucleotides across pericentromeric and subtelomeric regions, segmental duplications, and ribosomal DNA (rDNA) arrays. In contrast, the T2T-CHM13 genome is a complete, gapless assembly that adds about 200 million nucleotides, representing the roughly 8% of the genome that was previously unresolved, and corrects many structural errors in complex regions. Multiple studies have reported limitations in detecting structural variants with the GRCh38 reference genome, even with advanced SV detection technologies. These limitations stem from reference biases, such as deletion bias due to assembly gaps, and from inaccurate mapping in complex regions. Compared with GRCh38, T2T-CHM13 has been shown to improve variant-calling accuracy in medically relevant genes [6]. Our results further suggest that the resolved repeat arrays and long stretches of segmental duplications in T2T-CHM13 improve alignment accuracy, which in turn can raise variant-calling confidence scores. However, limited control data and the scarcity of T2T-CHM13 annotation tracks in databases such as DECIPHER and the Database of Genomic Variants (DGV), which are important for the clinical interpretation of structural variants, hinder immediate adoption of the T2T-CHM13 reference genome in clinical laboratories. Despite these limitations, we recommend adopting the T2T-CHM13 reference genome for resolving variants, particularly within challenging genomic regions.
Data availability
Data are available upon request.
References
1. Alkan C, Coe BP, Eichler EE. Genome structural variation discovery and genotyping. Nat Rev Genet. 2011;12:363–76. https://doi.org/10.1038/nrg2958
2. Carvalho CM, Lupski JR. Mechanisms underlying structural variant formation in genomic disorders. Nat Rev Genet. 2016;17:224–38. https://doi.org/10.1038/nrg.2015.25
3. Mahmoud M, Gobet N, Cruz-Dávalos DI, Mounier N, Dessimoz C, Sedlazeck FJ. Structural variant calling: the long and the short of it. Genome Biol. 2019;20:246. https://doi.org/10.1186/s13059-019-1828-7
4. Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, Mikheenko A, et al. The complete sequence of a human genome. Science. 2022;376:44–53. https://doi.org/10.1126/science.abj69
5. Wagner J, Olson ND, Harris L, McDaniel J, Cheng H, Fungtammasan A, et al. Curated variation benchmarks for challenging medically relevant autosomal genes. Nat Biotechnol. 2022;40:672–80. https://doi.org/10.1038/s41587-021-01158-1
6. Aganezov S, Yan SM, Soto DC, Kirsche M, Zarate S, Avdeyev P, et al. A complete reference genome improves analysis of human genetic variation. Science. 2022;376:eabl3533. https://doi.org/10.1126/science.abl3533
7. Iqbal MA, Broeckel U, Levy B, Skinner S, Sahajpal NS, Rodriguez V, et al. Multisite assessment of optical genome mapping for analysis of structural variants in constitutional postnatal cases. J Mol Diagn. 2023;25:175–88. https://doi.org/10.1016/j.jmoldx.2022.12.005
8. Sahajpal NS, Mondal AK, Tvrdik T, Hauenstein J, Shi H, Deeb KK, et al. Clinical validation and diagnostic utility of optical genome mapping for enhanced cytogenomic analysis of hematological neoplasms. J Mol Diagn. 2022;24:1279–91. https://doi.org/10.1016/j.jmoldx.2022.09.009
9. Linardopoulou EV, Williams EM, Fan Y, Friedman C, Young JM, Trask BJ. Human subtelomeres are hot spots of interchromosomal recombination and segmental duplication. Nature. 2005;437:94–100. https://doi.org/10.1038/nature04029
10. Sultana R, Yu CE, Yu J, Munson J, Chen D, Hua W, et al. Identification of a novel gene on chromosome 7q11.2 interrupted by a translocation breakpoint in a pair of autistic twins. Genomics. 2002;80:129–34. https://doi.org/10.1006/geno.2002.6810
11. Huang XL, Zou YS, Maher TA, Newton S, Milunsky JM. A de novo balanced translocation breakpoint truncating the autism susceptibility candidate 2 (AUTS2) gene in a patient with autism. Am J Med Genet Part A. 2010;152:2112–4. https://doi.org/10.1002/ajmg.a.33497
12. Altemose N, Logsdon GA, Bzikadze AV, Sidhwani P, Langley SA, Caldas GV, et al. Complete genomic and epigenetic maps of human centromeres. Science. 2022;376:eabl4178. https://doi.org/10.1126/science.abl4178
Acknowledgements
The authors thank the patients and their families for participating in the study. We acknowledge Dr Ashwin Dalal, Centre for DNA Fingerprinting and Diagnostics, Hyderabad, India, for his guidance on the factor VIII intron 22 inversion assay.
Funding
This study was supported by CCMB core funds.
Author information
Contributions
Conceptualization: KBT, DTS; Formal analysis: SB, SMS; Investigation: SB, KMK, JKG, SMS; Resources: SJP, SRP; Supervision: KBT, DTS; Writing - original draft: SB; Writing - review and editing: KBT, DTS, SJP, SRP.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval
The study was approved by the Institute Human Ethics Committee, CCMB (IEC-109/2023).
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Banu, S., MK, K., George, J.K. et al. Enhanced resolution of optical genome mapping utilizing telomere-to-telomere reference in genetic disorders. Eur J Hum Genet (2024). https://doi.org/10.1038/s41431-024-01763-z
DOI: https://doi.org/10.1038/s41431-024-01763-z