Abstract
Retroelements have a critical role in shaping eukaryotic genomes. For instance, site-specific non-long terminal repeat (nLTR) retrotransposons have spread widely through preferential integration into repetitive genomic sequences, such as microsatellite regions and ribosomal DNA genes (refs. 1–6). Despite the widespread occurrence of these systems, their targeting constraints remain unclear. Here we use a computational pipeline to discover multiple new site-specific retrotransposon families, profile members both biochemically and in mammalian cells, find previously undescribed insertion preferences and chart potential evolutionary paths for retrotransposon retargeting. We identify R2Tg, an R2 retrotransposon from the zebra finch, Taeniopygia guttata, as an orthologue that can be retargeted by payload engineering for target cleavage, reverse transcription and scarless insertion of heterologous payloads at new genomic sites. We enhance this activity by fusing R2Tg to CRISPR–Cas9 nickases for efficient insertion at new genomic sites. Through further screening of R2 orthologues, we select an orthologue, R2Tocc, with natural reprogrammability and minimal insertion at its natural 28S site, to engineer SpCas9H840A–R2Tocc, a system we name site-specific target-primed insertion through targeted CRISPR homing of retroelements (STITCHR). STITCHR enables the scarless, efficient installation of edits, ranging from a single base to 12.7 kilobases, gene replacement and use of in vitro transcribed or synthetic RNA templates. Inspired by the prevalence of nLTR retrotransposons across eukaryotic genomes, we anticipate that STITCHR will serve as a platform for scarless programmable integration in dividing and non-dividing cells, with both research and therapeutic applications.
Data availability
High-throughput sequencing data have been deposited in the NCBI Sequencing Read Archive database under accession PRJNA1223444. Expression plasmids are available from Addgene under the UBMTA; support information and computational tools are available at https://www.abugootlab.org/. All other data are available from the corresponding authors upon reasonable request.
References
Goodier, J. L. & Kazazian, H. H. Jr Retrotransposons revisited: the restraint and rehabilitation of parasites. Cell 135, 23–35 (2008).
Kojima, K. K., Seto, Y. & Fujiwara, H. The wide distribution and change of target specificity of R2 non-LTR retrotransposons in animals. PLoS ONE 11, e0163496 (2016).
Eickbush, D. G., Burke, W. D. & Eickbush, T. H. Evolution of the R2 retrotransposon ribozyme and its self-cleavage site. PLoS ONE 8, e66441 (2013).
Kojima, K. K. & Fujiwara, H. Long-term inheritance of the 28S rDNA-specific retrotransposon R2. Mol. Biol. Evol. 22, 2157–2165 (2005).
Fujiwara, H. et al. Introns and their flanking sequences of Bombyx mori rDNA. Nucleic Acids Res. 12, 6861–6869 (1984).
Roiha, H., Miller, J. R., Woods, L. C. & Glover, D. M. Arrangements and rearrangements of sequences flanking the two types of rDNA insertion in D. melanogaster. Nature 290, 749–754 (1981).
Kojima, K. K. & Fujiwara, H. Evolution of target specificity in R1 clade non-LTR retrotransposons. Mol. Biol. Evol. 20, 351–361 (2003).
Burke, W. D., Malik, H. S., Lathe, W. C. III & Eickbush, T. H. Are retrotransposons long-term hitchhikers? Nature 392, 141–142 (1998).
Malik, H. S., Burke, W. D. & Eickbush, T. H. The age and evolution of non-LTR retrotransposable elements. Mol. Biol. Evol. 16, 793–805 (1999).
Eickbush, T. H. in Mobile DNA II (eds Craig, N. L. et al.) 813–835 (ASM, 2002).
Fujiwara, H. in Mobile DNA III (eds Chandler, M. et al.) 1147–1163 (ASM, 2015).
Eickbush, T. H. & Eickbush, D. G. Integration, regulation, and long-term stability of R2 retrotransposons. Microbiol. Spectr. https://doi.org/10.1128/microbiolspec.mdna3-0011-2014 (2015).
Christensen, S. M. & Eickbush, T. H. R2 target-primed reverse transcription: ordered cleavage and polymerization steps by protein subunits asymmetrically bound to the target DNA. Mol. Cell. Biol. 25, 6617–6628 (2005).
Han, J. S. Non-long terminal repeat (non-LTR) retrotransposons: mechanisms, recent developments, and unanswered questions. Mob. DNA 1, 15 (2010).
Zhang, X. et al. Harnessing eukaryotic retroelement proteins for transgene insertion into human safe-harbor loci. Nat. Biotechnol. 43, 42–51 (2025).
Kuroki-Kami, A. et al. Targeted gene knockin in zebrafish using the 28S rDNA-specific non-LTR-retrotransposon R2Ol. Mob. DNA 10, 23 (2019).
Su, Y., Nichuguti, N., Kuroki-Kami, A. & Fujiwara, H. Sequence-specific retrotransposition of 28S rDNA-specific LINE R2Ol in human cells. RNA 25, 1432–1438 (2019).
Chen, Y. et al. All-RNA-mediated targeted gene integration in mammalian cells with rationally engineered R2 retrotransposons. Cell 187, 4674–4689 (2024).
Wilkinson, M. E., Frangieh, C. J., Macrae, R. K. & Zhang, F. Structure of the R2 non-LTR retrotransposon initiating target-primed reverse transcription. Science 380, 301–308 (2023).
Luchetti, A. & Mantovani, B. Non-LTR R2 element evolutionary patterns: phylogenetic incongruences, rapid radiation and the maintenance of multiple lineages. PLoS ONE 8, e57076 (2013).
Kojima, K. K. Structural and sequence diversity of eukaryotic transposable elements. Genes Genet. Syst. 94, 233–252 (2020).
Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA 6, 11 (2015).
Yang, J., Malik, H. S. & Eickbush, T. H. Identification of the endonuclease domain encoded by R2 and other site-specific, non-long terminal repeat retrotransposable elements. Proc. Natl Acad. Sci. USA 96, 7847–7852 (1999).
Bibillo, A. & Eickbush, T. H. End-to-end template jumping by the reverse transcriptase encoded by the R2 retrotransposon. J. Biol. Chem. 279, 14945–14953 (2004).
Ruminski, D. J., Webb, C.-H. T., Riccitelli, N. J. & Lupták, A. Processing and translation initiation of non-long terminal repeat retrotransposons by hepatitis delta virus (HDV)-like self-cleaving ribozymes. J. Biol. Chem. 286, 41286–41295 (2011).
Kong, X. et al. Precise genome editing without exogenous donor DNA via retron editing system in human cells. Protein Cell 12, 899–902 (2021).
Zhao, B., Chen, S.-A. A., Lee, J. & Fraser, H. B. Bacterial retrons enable precise gene editing in human cells. CRISPR J. 5, 31–39 (2022).
Borel, F., Lacroix, F. B. & Margolis, R. L. Prolonged arrest of mammalian cells at the G1/S boundary results in permanent S phase stasis. J. Cell Sci. 115, 2829–2838 (2002).
Anzalone, A. V. et al. Programmable deletion, replacement, integration and inversion of large DNA sequences with twin prime editing. Nat. Biotechnol. 40, 731–740 (2022).
Zheng, C. et al. Template-jumping prime editing enables large insertion and exon rewriting in vivo. Nat. Commun. 14, 3369 (2023).
Wang, J. et al. Efficient targeted insertion of large DNA fragments without DNA donors. Nat. Methods 19, 331–340 (2022).
Yarnall, M. T. N. et al. Drag-and-drop genome insertion of large sequences without double-strand DNA cleavage using CRISPR-directed integrases. Nat. Biotechnol. 41, 500–512 (2023).
de Rocquigny, H. et al. The zinc fingers of HIV nucleocapsid protein NCp7 direct interactions with the viral regulatory protein Vpr. J. Biol. Chem. 272, 30753–30759 (1997).
Kojima, K. K. & Fujiwara, H. An extraordinary retrotransposon family encoding dual endonucleases. Genome Res. 15, 1106–1117 (2005).
Edgar, R. C. Muscle5: High-accuracy alignment ensembles enable unbiased assessments of sequence homology and phylogeny. Nat. Commun. 13, 6968 (2022).
Eddy, S. R. Accelerated profile HMM searches. PLoS Comput. Biol. 7, e1002195 (2011).
Woodcroft, B. J., Boyd, J. A. & Tyson, G. W. OrfM: a fast open reading frame predictor for metagenomic data. Bioinformatics 32, 2702–2703 (2016).
Mistry, J. et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 49, D412–D419 (2021).
Lu, S. et al. CDD/SPARCLE: the conserved ___domain database in 2020. Nucleic Acids Res. 48, D265–D268 (2020).
Sigrist, C. J. A. et al. New and continuing developments at PROSITE. Nucleic Acids Res. 41, D344–D347 (2013).
Kalvari, I. et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Res. 49, D192–D200 (2021).
Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935 (2013).
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).
Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
Steenwyk, J. L., Buida, T. J. III, Li, Y., Shen, X.-X. & Rokas, A. ClipKIT: A multiple sequence alignment trimming software for accurate phylogenomic inference. PLoS Biol. 18, e3001007 (2020).
Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490 (2010).
Yu, G. Using ggtree to visualize data on tree-like structures. Curr. Protoc. Bioinformatics 69, e96 (2020).
Valdar, W. S. J. Scoring residue conservation. Proteins 48, 227–241 (2002).
Killick, R. & Eckley, I. A. changepoint: an R package for changepoint analysis. J. Stat. Softw. 58, 1–19 (2014).
Hu, J. et al. Detecting DNA double-stranded breaks in mammalian genomes by linear amplification–mediated high-throughput genome-wide translocation sequencing. Nat. Protoc. 11, 853–871 (2016).
Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10–12 (2011).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Clement, K. et al. CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat. Biotechnol. 37, 224–226 (2019).
Acknowledgements
We thank P. Reginato, D. Weston and E. Boyden for support with MiSeq instrumentation; K. Holden for Synthego sgRNAs; S. Levine and the MIT BioMicro Center for Pacific Biosciences sequencing library preparation and sequencing; PhoenixBio for providing primary human hepatocytes (PXB cells); X. D. Chen for retrotransposon analysis; N. Willis and S. Khoramian Tusi for Southern blot advice and protocols; R. Desimone and J. Crittenden for support and discussions; and members of the Abudayyeh–Gootenberg lab for support and advice. C.W.F. is supported by a grant from the Simons Foundation International to the Simons Center for the Social Brain at MIT. L.V. is supported by a Swiss National Science Foundation postdoc mobility fellowship. H.N. is supported by JSPS KAKENHI grant 21H05281, the Takeda Medical Research Foundation and the Inamori Research Institute for Science. M.H. is supported by JSPS KAKENHI grant 23K14133, the Takeda Medical Research Foundation and JST, ACT-X grant JPMJAX232F. J.S.G. and O.O.A. are supported by NIH grants 1R21-AI149694, R01-EB031957, R01-AG074932 and R56-HG011857; the McGovern Institute Neurotechnology (MINT) program; the K. Lisa Yang and Hock E. Tan Center for Molecular Therapeutics in Neuroscience; the G. Harold & Leila Y. Mathers Charitable Foundation; the NHGRI Technology Development Coordinating Center Opportunity Fund; the MIT John W. Jarve (1978) Seed Fund for Science Innovation; Impetus Grants; a Cystic Fibrosis Foundation pioneer grant; Google Ventures; FastGrants; the Harvey Family Foundation; Winston Fu; and the McGovern Institute.
Author information
Contributions
O.O.A. and J.S.G. conceived the study and participated in the design, execution and analysis of experiments. L.V. and C.W.F. designed and performed the experiments and analysed the data. J.L. developed computational pipelines for retrotransposon discovery. M.H. purified the retrotransposon proteins. M.H., C.W.F., O.O.A. and J.S.G. did the biochemistry experiments. K.J. performed computational analysis of sequencing experiments. D.T., A.L., M.T.N.Y., R.N.K., C.S.-U., A.K., H.R. and S.M.Y. assisted with experiments. C.A.V. and N.R. provided synthetic RNA templates. H.N. participated in the analysis of biochemical experiments. L.V., C.W.F., H.N., O.O.A. and J.S.G. wrote the manuscript with help from all authors. C.W.F., L.V., J.L. and M.H. share co-first authorship. D.T., M.T.N.Y. and A.L. share co-second authorship. The order of authors C.W.F. and L.V. was decided by a coin toss.
Ethics declarations
Competing interests
MIT has filed for a patent application for this work (WO2024220409A1). J.S.G. and O.O.A. are co-founders of Terrain Biosciences, Doppler Bio and Transit Therapeutics. All other authors declare no competing interests.
Peer review
Peer review information
Nature thanks Todd Macfarlan and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Computational discovery and characterization of novel site-specific nLTR retrotransposon systems.
a) Schematic of computational pipeline used to discover and classify site-specific nLTR retrotransposon systems. b) Size distribution of the ORFs from the first methionine for each of the 5 families of RLE containing nLTR retrotransposons. c) Distribution of distances from candidate retrotransposons to detected Rfam annotation or tandem repeat targets for each of the 5 families of RLE containing nLTR retrotransposons. d) Distribution of the predicted 5′ and 3′ UTR sizes for all nLTR RLE-containing retrotransposons. UTR sizes are predicted based on the distance from the ORF and nearest predicted target site. Box plots are shown with the median, 25th percentile, 75th percentile, and whiskers that are 1.5x the interquartile range. All outliers are shown as individual points. n(5’UTR) = 10,033; n(3’UTR) = 7642. e) Distribution of the lengths of observed non-coding conservation regions flanking the 5′ and 3′ ends of the retrotransposon ORF. Box plots are shown with the median, 25th percentile, 75th percentile, and whiskers that are 1.5x the interquartile range. All outliers are shown as individual points. n(5’UTR) = 3307; n(3’ UTR) = 6472. f) Schematic of typical nLTR retrotransposon insertion sites with target sites consistent on both sides of the retrotransposon. g) Phylogenetic tree of all 5 families of RLE-containing nLTR systems showing majority of detected Rfam targets in the vicinity of the nLTR ORF.
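The box-plot convention described in panels d and e (median, 25th and 75th percentiles, whiskers at 1.5x the interquartile range, outliers as individual points) can be illustrated with a minimal sketch. The function name, the choice of the "inclusive" quantile method and the example lengths are assumptions made for illustration and are not taken from the study; plotting libraries may use slightly different quantile definitions.

```python
from statistics import quantiles


def tukey_box_summary(values):
    """Summarize values the way a Tukey-style box plot draws them."""
    data = sorted(values)
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points inside the fences;
    # anything beyond the fences is drawn as an individual outlier point.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical UTR lengths in nucleotides, for illustration only.
example_utr_lengths = [120, 150, 180, 200, 210, 230, 260, 300, 1500]
summary = tukey_box_summary(example_utr_lengths)
print(summary)
```

In this toy input the single very long value falls outside the upper fence and would be plotted as an individual point rather than stretching the whisker.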
Extended Data Fig. 2 Additional analysis of select nLTR retrotransposon systems.
a) DNA sequence alignments of nLTR families with divergent target preferences in the non-coding areas surrounding the nLTR ORFs. Identified Rfam annotations in the surrounding locus are highlighted. b) Multiple sequence alignment of different nLTR retrotransposons using MUSCLE, with Pfam domain schematic above as determined by HHpred. c) Analysis of sequence identity similarity of chosen nLTR retrotransposon family members using the MUSCLE protein alignment from Extended Data Fig. 2b.
Extended Data Fig. 3 Analysis of the integration activity of nLTR retrotransposon orthologs and biochemical analysis of R2Tg TPRT activity.
a) Analysis of the 5′ end of the nLTR1Mbr locus with the microsatellite repeat region and alignment to the human 28S rDNA region highlighted. b) Schematic of payload homology and target sites used to evaluate nLTR1Mbr insertion. c) Gluc payload insertion by nLTR1Mbr into a panel of luciferase reporters, as quantified by luciferase production, with R2Tg targeting the R2 28S sequence as control. Reporters with either similarity to the R2 28S region, or with similarity to the 28S homology region in the nLTR1Mbr locus are used for evaluation of alternative insertion sites. d) Phylogenetic tree of nLTR retrotransposons zoomed in on the R2Tocc system and surrounding orthologs. Tree branches corresponding to avian genomes are highlighted in blue and orthologs used in this study are labeled. e) Heatmap of 28S luciferase reporter assay, testing integration by R2Bm, R2Tg, R2Mes and R2TgRTmut (x axis) using RNA payloads containing UTRs from different retrotransposon ortholog systems (y axis). f) Validation of 28S NGS assessment of editing efficiencies. Synthetic eblocks containing editing and unedited DNA sequences were mixed at defined ratios (x-axis) and measured by NGS (y-axis). Agreement between the known editing percentage (x-axis) and measured editing percentage was calculated by linear regression and is shown as an inset. Schematic above shows the relationship of the three NGS primers to the inserted sequence where one forward primer is in the genomic sequence upstream and there is one reverse primer in the insert and one reverse primer in the downstream genomic sequence. g) Timecourse of biochemical TPRT by R2Tg into 28S DNA with or without RNA payloads with different incubation times, as indicated. h) NGS insertion quantification of TPRT shown in Extended Data Fig. 3g. i) Electrophoretic Mobility Shift Assay gel showing the shift of the 28S DNA target due to binding of the R2Tg protein alone or R2Tg-RNA complex. 
Schematics to the right of the gel show the identity of the different DNA complex products on the gel. The bottom strand is 5′ labeled. j) Biochemical insertion of Gluc sequence into the 28S target with a payload containing only homology arms to the 28S locus and no UTRs with +/–payload RNA, +/– 28S DNA, +/– R2 protein, +/– Mg2+ and +/– dNTPs, as indicated. Above, NGS quantitation of insertion efficiency and schematic of the used RNA payloads. Schematics to the right of the gels indicate the specific TPRT and cleavage products. k) Biochemical TPRT by R2Tg into 28S DNA using RNA payloads with 100 bp, 60 bp, 30 bp and 0 bp 28S homology and no RNA payload control. Insertion frequency is quantified by NGS. l) Biochemical TPRT by R2Tg into 28S DNA using RNA payloads with or without 5′ cap and/or 3′ poly-A tail modifications as well as no RNA payload control. m) NGS insertion quantification of TPRT shown in Extended Data Fig. 3l. n) Primer extension assay by WT R2Tg, RLE inactivated R2TgD1275A, and no protein, where 28S RNA payload and complementary primer were hybridized and extended by reverse transcription activity of the R2Tg protein. Error bars represent mean +/− (c, e) s.e.m. or (j) s.d. n = 3 (c, e, f, j) or n = 1 (h, k, m) where n represents biological replicates.
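The assay validation in panel f compares known mixing ratios with NGS-measured editing by linear regression. A minimal sketch of such a fit, using ordinary least squares and the coefficient of determination, is given here; the function name and the example percentages are hypothetical and do not reproduce the study's data or analysis code.

```python
def linear_fit(known, measured):
    """Ordinary least-squares fit of measured against known values.

    Returns (slope, intercept, r_squared).
    """
    n = len(known)
    mean_x = sum(known) / n
    mean_y = sum(measured) / n
    sxx = sum((x - mean_x) ** 2 for x in known)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(known, measured))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination: 1 - residual variation / total variation.
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(known, measured))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical standards: percentage of edited template mixed in (x)
# and the percentage reported by sequencing (y). Illustration only.
known_percent = [0, 10, 25, 50, 100]
measured_percent = [0.0, 9.0, 24.0, 52.0, 98.0]
slope, intercept, r2 = linear_fit(known_percent, measured_percent)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r2:.4f}")
```

A slope near 1, an intercept near 0 and an R² near 1 would indicate that the measured editing percentage tracks the known mixing ratio across the tested range.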
Extended Data Fig. 4 Mammalian editing with R2Tg and analysis of insertion junctions and associated indels.
a) Gluc payload insertion by R2Tg reverse transcriptase domain deletions, RLE inactivation mutants (D1275A), and reverse transcriptase mutations (R2TgF876A/A877L/D878A/D879A/L880A/V881A/L882A, RTmut), at the 28S locus luciferase reporter target, as quantified by luciferase activity. Luciferase activity was assayed in HEK293FT cells. b) Gluc payload insertion by R2Tg RT domain mutations, including R2TgF876A/A877L/D878A/D879A/L880A/V881A/L882A (RTmut), R2TgD878R/D879R, and R2TgD878H/D879H, and the RLE inactivation mutant (D1275A) at the 28S locus luciferase reporter, as quantified by luciferase. Luciferase activity was assayed in HEK293FT cells. c) Uncropped version of the gels shown in Fig. 2g; Above, RNA payload insertion into a 28S plasmid reporter by wild type R2Tg, RLE inactivated, RT inactivated, and complemented RT and RLE inactivated proteins +/– RNA payload, as indicated. RNA templates used were in vitro transcribed with a 5′ cap and a poly A tail. Expected band size = 294 bp. NT, non-targeting RNA templates that have homology to the NOLC1 target instead of the 28S locus. Below, R2Tg insertion into human 28S endogenous locus with payloads containing 100, 50, 30 or 0 bp of homology to the 28S target site. RNA templates used were in vitro transcribed with a 5′ cap and a poly A tail. Expected band size = 374 bp. d) Luciferase assay of Gluc insertion of an IVT RNA payload with variable 3′ tail length into a 28S reporter target by WT R2Tg and RLE-inactivated R2TgD1275A. Luciferase activity was assayed in Huh-7 cells. e) Sanger sequencing of 5′ and 3′ insertion junctions at the 28S target for additional selected payload designs after R2Tg integration. Payload numbers correspond to those in Fig. 2h. f) Example indels at the 5′ junction for R2Tg insertion at the 28S target for selected payloads. Non-templated Cs from reverse transcription in the bottom strand (G in the top strand) are highlighted with red boxes. 
g) Example indels at the WT 28S locus target for selected payloads. Non-templated Cs from reverse transcription in the bottom strand (G in the top strand) are highlighted with red boxes. Error bars represent mean +/− (a,b,d) s.e.m. n = 3 where n represents biological replicates.
Extended Data Fig. 5 Analysis of integration outcomes at the 28S target and NOLC1 locus with payload variations.
a) Gaussia luciferase exon 2 (Gluc) payload insertion by wild type and domain inactivated mutants of R2Tg into a 28S plasmid reporter, with editing outcomes profiled by NGS at the upstream (left) junction. Mutants tested are WT R2Tg and R2TgD1275A (RLE mutant) and outcomes are classified as perfect insertions, insertions with indels, or WT locus indels. b) Schematic of additional payload variant with internal homology arms against the 28S target. c) Gaussia luciferase exon 2 (Gluc) payload insertion by wild type R2Tg into a 28S plasmid reporter with payload variants shown in (b), with editing outcomes profiled by NGS at the upstream (left) junction. Outcomes are classified as perfect insertions, insertions with indels, or WT locus indels. d) Size analysis by gel electrophoresis of 5′ and 3′ insertion junctions at the 28S target reporter for payload designs from (b) and (c) after R2Tg integration. Payload numbers correspond to those in (b). e) Gluc exon 2 payload insertion by WT R2Tg, R2TgD1275A, or the RT domain deletion R2TgΔ(875-885) into a 28S plasmid reporter with payloads containing 28S or AAVS1 targeting homology arms, profiled by NGS. Statistics were calculated using unpaired t-test. f) Biochemical retrotransposition of different RNA payloads into the AAVS1 DNA target with the R2Tg protein, dNTPs, and MgCl2, as indicated. Either no payload was used or the following two payloads were used: 1) payload with a 5′ UTR targeting AAVS1 and containing a Gluc insert, or 2) a payload with 5′ and 3′ UTRs targeting NOLC1 and containing an EGFP insert. Labels on the gel indicate the specific TPRT product, DNA target band, and R2Tg produced nicked fragments. g) Validation of AAVS1 NGS method. Synthetic eblocks containing editing and unedited DNA sequences were mixed at defined ratios (x-axis) and measured by NGS (y-axis). 
Agreement between the known editing percentage (x-axis) and measured editing percentage was calculated by linear regression and is shown inset. h) Validation of the NOLC1 3-primer NGS assay using mixes of genomic DNA from unedited or heterozygously inserted cells at NOLC1, as measured by ddPCR. Shown is the known pre-mixed ratio of edited and unedited gDNA (x-axis) vs the measured editing rate by NGS (y-axis). Inset, coefficient of determination between values on x- and y-axes. i) Schematic of payload engineering for R2Tg reprogramming to the NOLC1 locus. j) EGFP payload insertion at human endogenous NOLC1 locus by naturally reprogrammed wild-type R2Tg as well as R2TgD1275A and R2TgRTmut. Insertion is quantified by ddPCR. Statistics calculated with unpaired t-test. k) Payload insertion by SpCas9H840A-R2TgΔ1-183 or SpCas9H840A-R2TgΔ1-183,D1275A into the endogenous NOLC1 locus, mediated by dual guides or non-targeting guides and quantified by ddPCR. Inset shows payload design and locus schematic with homology arms colored and top guide in red and bottom guide in blue. Statistics calculated with unpaired t-test. l) Secondary structure analysis of the 5′ UTR of R2Tg, including the full length, 15 nt truncated variant, and the 15 nt truncated variant with the 50 nt 28S homology sequence upstream. m) Validation of the 3-primer NGS assay for analysis of AAVS1 integration via the left insertion junction. Standards consist of edited and WT amplicons that are mixed in the listed ratios (x-axis) and the measured editing is determined by the 3-primer NGS assay (y-axis). n) Gluc integration at the endogenous AAVS1 locus via the SpCas9H840A-R2TgΔ1-183 fusion using payloads with the full length or 15-nt truncated 5′ UTR, an upstream 28S 50 nt sequence, and internal AAVS1 homology arms. Integration is quantified by next-generation sequencing (left) and ddPCR (right). Error bars represent mean +/− s.e.m. n = 3 where n represents three biological replicates.
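The outcome classes used in this legend (perfect insertions, insertions with indels, WT locus indels) can be made concrete with a toy classifier. This sketch uses exact string matching on very short made-up sequences; it is an assumption-laden illustration of the categories only and does not reproduce the alignment-based analysis used in the study. All sequences, names and the `insert_marker` heuristic are hypothetical.

```python
from collections import Counter


def classify_read(read, wt_amplicon, edited_amplicon, insert_marker):
    """Assign a junction read to one of four outcome classes.

    Toy logic: exact matches define the clean outcomes; any other read is
    split by whether it carries a stretch of the inserted payload.
    """
    if read == edited_amplicon:
        return "perfect insertion"
    if insert_marker in read:
        return "insertion with indels"
    if read == wt_amplicon:
        return "unedited"
    return "WT locus indel"


def outcome_percentages(reads, wt_amplicon, edited_amplicon, insert_marker):
    """Return the percentage of reads in each outcome class."""
    counts = Counter(
        classify_read(r, wt_amplicon, edited_amplicon, insert_marker)
        for r in reads
    )
    total = len(reads)
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical miniature amplicons, for illustration only.
wt = "AAACCCGGGTTT"
payload = "TATATA"
edited = "AAACCC" + payload + "GGGTTT"
reads = (
    [edited] * 2               # clean insertions
    + ["AAACCTATATAGGGTTT"]    # insertion with a 1-nt deletion at the junction
    + [wt] * 4                 # unmodified locus
    + ["AAACCGGGTTT"]          # deletion at the target without insertion
)
percentages = outcome_percentages(reads, wt, edited, payload)
print(percentages)
```

Real reads contain sequencing errors and variable junctions, so a production analysis would align reads to both reference amplicons rather than rely on exact matches.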
Extended Data Fig. 6 Analysis of R2Tg UTRs and AAVS1 integration.
a) Biochemical retrotransposition of an RNA payload into the NOLC1 DNA target with and without withdrawal of R2Tg protein, RNA, dNTPs, SpCas9/guides, or MgCl2, as indicated. Above, NGS quantification of insertion efficiency and a schematic of the RNA payload used. Gel is stained with SYBR gold for visualization of nucleic acid. b) Biochemical retrotransposition of an RNA payload into the NOLC1 DNA target with and without withdrawal of R2Tg protein, RNA, dNTPs, SpCas9/guides, or MgCl2, as indicated. The DNA top strand is Cy5 labeled (red) and bottom strand is FAM labeled (green), allowing for visualization by fluorescence. Labels on the gel indicate the specific TPRT product, DNA target band, and R2Tg produced nicked fragments. c) Reprogrammed biochemical retrotransposition by R2Tg into the NOLC1 DNA target, using a homologous IVT NOLC1 payload (N) with +/– 5′ cap and 3′ tail modifications compared to EMX1 (E)- or 28S-homologous (28S) payloads (i.e. non-homologous to NOLC1). Integration is quantified by NGS. d) Reprogrammed biochemical retrotransposition of an IVT RNA payload containing the optimized 5′ and 3′ UTR and homology regions into the AAVS1 DNA target by R2Tg +/– DNA target, +/– RNA, +/– Cas9-assisted nicking, and +/– R2Tg, as indicated. Black arrow on the gel indicates the specific TPRT product. The blue arrow denotes the cleaved DNA band generated by R2Tg protein alone reprogrammed by its payload RNA. e) NGS quantification of insertion data shown in Extended Data Fig. 6d. f) Integration efficiencies, quantified by NGS, of reprogrammed biochemical TPRT of an RNA payload by R2Tg into varying amounts of NOLC1 DNA target compared to no RNA controls. g) Integration efficiencies, quantified by NGS, of reprogrammed biochemical TPRT by R2Tg using NOLC1 RNA payloads incorporating either different single-base mismatches or insertions into the NOLC1 DNA, as indicated. Either in vitro transcribed mRNA or synthetic RNA templates were used as the payloads. 
h) Biochemical retrotransposition of an RNA payload into the AAVS1 DNA target with and without withdrawal of R2Tg protein, RNA, dNTPs, or MgCl2, as indicated. Above, NGS quantification of insertion efficiency and a schematic of the RNA payload used. Labels on the gel indicate the specific TPRT product, DNA target band, and R2Tg produced nicked fragments. i) Schematic of DNA cleavage end detection using ligation and NGS. Ligation adaptor primers (shown in black) are used in combination with anchored primers on either the left (red) or right end (blue) to read out the variable R2Tg cleavage sites. j) Cleavage end detection by next-generation sequencing of the R2Tg generated nicks on the AAVS1 target from Extended Data Fig. 6h in the condition without dNTPs. The color of the reads for 5′ or 3′ ends match the anchored primers shown in the schematic in Extended Data Fig. 6i. Below the plot is a schematic of the AAVS1 target (black) and the homology arms of the payload template (beige and gray). k) Cleavage end detection by next-generation sequencing of the R2Tg generated nicks on the AAVS1 target from Extended Data Fig. 6h in the condition without the RNA template. The color of the reads for 5′ or 3′ ends match the anchored primers shown in the schematic in Extended Data Fig. 6i. Below the plot is a schematic of the AAVS1 target (black) and the homology arms of the payload template (beige and gray). l) Biochemical retrotransposition of an RNA payload into the NOLC1 DNA target with and without withdrawal of R2Tg protein, RNA, dNTPs, or MgCl2, as indicated. Labels on the gel indicate the specific TPRT product, DNA target band, and R2Tg produced nicked fragments. m) Cleavage end detection by next-generation sequencing of the R2Tg generated nicks on the NOLC1 target from Extended Data Fig. 6l in the condition without dNTPs but with RNA template. The color of the reads for 5′ or 3′ ends match the anchored primers shown in the schematic inset. 
Below each plot is a schematic of the NOLC1 target (black) and the homology arms of the payload template (beige and gray). n) Cleavage end detection by next-generation sequencing of the R2Tg generated nicks on the NOLC1 target from Extended Data Fig. 6l in the condition without RNA template. The color of the reads for 5′ or 3′ ends match the anchored primers shown in the schematic inset. Below each plot is a schematic of the NOLC1 target (black) and the homology arms of the payload template (beige and gray). Error bars represent mean +/− s.e.m. n = 3 where n represents three biological replicates.
Extended Data Fig. 7 Additional evaluation of SpCas9H840A-R2TgΔ1-183 and SpCas9H840A-R2ToccΔ1-169 STITCHR systems at the endogenous AAVS1 and NOLC1 loci and reporter targets.
a) Schematic of SpCas9H840A fused to N- and C-terminal truncations of R2Tg at different amino acid positions. Not all tested constructs are shown. b) Gluc payload insertion by different SpCas9H840A-R2TgΔ1-183 fusions, according to the schematic in (a), into the endogenous AAVS1 locus quantified by NGS. N-term and C-term denote either N-terminal or C-terminal fusions of the full length R2Tg protein. Denoted residue positions indicate the starting amino acid position of N-terminal R2Tg truncations that are fused to the C-terminal of SpCas9H840A. c) Gluc integration at the endogenous AAVS1 target by SpCas9H840A-R2TgΔ1-183, SpCas9H840A-R2TgΔ1-183,F876A/A877L/D878A/D879A/L880A/V881A/L882A (RTmut), and SpCas9H840A-R2TgΔ1-183,Δ(875-885), and SpCas9H840A alone. Editing rates were quantified by NGS (left) and ddPCR (right). d) TPRT activity in HEK293FT cells with SpCas9H840A alone or fused to R2Tg, R2TgΔ1-183,F876A/A877L/D878A/D879A/L880A/V881A/L882A (RTmut), or R2TgΔ1-183,Δ875-885 into the NOLC1 genomic target with dual guides. EGFP payload contains the full 5′ and 3′ UTRs for R2Tg. e) Gluc payload insertion into a 28S plasmid reporter in HEK293FT cells by selected nLTR retrotransposons fused to SpCas9H840A, with either targeting or non-targeting guides, quantified by Gluc production normalized to a control Cluc. Data is shown as ratio of targeting signal to non-targeting signal. f) Gluc payload insertion into the endogenous AAVS1 locus in HEK293FT cells by selected nLTR retrotransposons fused to SpCas9H840A, with either targeting or non-targeting guides, profiled by next generation sequencing. Editing outcomes are quantified as perfect insertions, insertions with indels, and indels at the unmodified WT target site. 
g) Gluc payload insertion into the endogenous AAVS1 locus in HEK293FT cells by selected nLTR retrotransposons fused to SpCas9H840A and an AAVS1-targeting or non-targeting sgRNA control, quantified by ddPCR. h) Validation of the AAVS1 3-primer NGS assay using mixes of genomic DNA from unedited or heterozygously inserted cells at AAVS1, as measured by ddPCR. Shown is the known pre-mixed ratio of edited and unedited gDNA (x-axis) vs the measured editing rate by NGS (y-axis). Inset, coefficient of determination between values on x- and y-axes. i) R2Tocc retrotransposition of a synthetic RNA payload into top- and bottom-strand labeled 28S DNA. The top strand is FAM labeled (red); the bottom strand is Cy5 labeled (green). Schematics on the side of the gel indicate the expected size of each band, including the TPRT product and cleaved target fragments. j) Indels or substitutions found in the sequencing reads of AAVS1 in vitro TPRT shown in Fig. 5b, analyzed by CRISPResso. Above, a reference sequence consisting of correct insertion of the Gluc payload into AAVS1 DNA. Below, a schematic of the different insertion outcomes found by sequencing, the raw number of reads and % of total reads which these correspond to. Error bars represent mean +/− (b,g) s.d. or (d, e, f, h) s.d. n = 3 where n represents 3 biological replicates.
Extended Data Fig. 8 STITCHR is functional at multiple endogenous loci.
a) Expression of wild-type and mutant R2Tg orthologs (x-axis), quantified by luciferase signal. b) Schematic of STITCHR insertion using intron-containing templates in the following subpanels. An EGFP STITCHR payload containing an interrupting intron is expressed by a CAG promoter. After RNA splicing and TPRT, it is inserted into the genome as an uninterrupted EGFP ORF. Shown are the landing sites of the NGS primers used in the subsequent panels. Shown are the GFP cargo (green bar, approximately 500 bp), interrupting intron (USF1, 245 bp, Tetrahymena self-splicing intron, 399 bp), homology sequences (yellow bar, 50 bp each), poly-A tail, genomic sequence (grey bar), external F and R NGS primers (black) and internal reverse primer (blue). c) NGS evaluation of insertion at AAVS1 (left) and EMX1 (right) loci after delivering a plasmid template containing GFP with an interrupting self-splicing Tetrahymena intron. Shown is the % insertion of GFP lacking the interrupting intron (i.e. spliced insertion) by SpCas9H840A-R2ToccΔ1-169 or SpCas9H840A. Insertions are quantified as perfect insertions or insertions with indels. d) ddPCR evaluation of AAVS1 insertion after delivering a plasmid template containing an interrupting USF1 intron, which interrupts in two locations in the payload. The ddPCR assay used detects spliced insertion only. Shown is the % spliced insertion by SpCas9H840A-R2ToccΔ1-169, SpCas9H840A-R2TgΔ1-183,RTmut, SpCas9H840A-R2TgΔ1-183,RLEmut, SpCas9H840A-R2TgΔ1-183,RTmut and SpCas9H840A control. e) Gluc reconstitution by correction of a 20 bp deletion by delivering plasmid or synthetic RNA payloads, quantified by Gluc expression normalized to control Cluc. The synthetic RNA template is an extension of the Cas9 sgRNA. f-g) Gluc reconstitution by R2Tg mutants with synthetic RNA payloads extended off the guide RNA as quantified by NGS (f) and Gluc (g) expression normalized to control Cluc.
h) STITCHR 20 bp payload insertion on a luciferase reporter plasmid from a synthetic RNA lacking the 5′ UTR and containing a Cas9 guide scaffold (fused) or a synthetic RNA delivered in trans containing the 5′ UTR and the correction sequence (trans). Editing is with or without a Cas9 nicking guide that allows for initiation of TPRT for the trans template. Integration is quantified by NGS and is represented as perfect insertions or insertions with indels. WT indels are also shown which are defined as indels at the unintegrated Gluc locus. i) STITCHR 22 bp payload insertion on an EGFP reporter plasmid from a synthetic RNA lacking the 5′ UTR and containing a Cas9 guide scaffold (fused) or a synthetic RNA delivered in trans containing the 5′ UTR and the correction sequence (trans). Editing is with or without a Cas9 nicking guide that allows for initiation of TPRT for the trans template. Integration is quantified by NGS and is represented as perfect insertions or insertions with indels. WT indels are also shown which are defined as indels at the unintegrated plasmid reporter. j) STITCHR 20 bp payload insertion on a luciferase reporter plasmid from a synthetic RNA lacking the 5′ UTR and containing a Cas9 guide scaffold (fused), a synthetic RNA delivered in trans containing the 5′ UTR and the correction sequence (trans with UTR), and a synthetic RNA delivered in trans containing the correction sequence without a UTR (trans without UTR). SpCas9H840A-R2TgΔ1-183 and SpCas9H840A-R2TgΔ1-183,RTmut are compared to each other and editing is performed +/− a Cas9 nicking guide that allows for initiation of TPRT for the trans template. Integration is quantified by NGS. k) STITCHR 38 bp payload insertion at the endogenous LMNB1 locus from a synthetic RNA lacking the 5′ UTR and containing a Cas9 guide scaffold. Integration is quantified by NGS. 
l) STITCHR 700 bp EGFP payload insertion in Huh-7 cells at the endogenous NOLC1 locus from an in vitro transcribed mRNA. Insertion is quantified by ddPCR. m) Indels or substitutions found in the sequencing reads of NOLC1 insertion experiment shown in Extended Data Fig. 8i, analyzed by CRISPResso. Above, a reference sequence consisting of correct insertion of the Gluc payload into AAVS1 DNA. Below, a schematic of the different insertion outcomes found by sequencing, the raw number of reads and % of total reads which these correspond to. n) Insertion of a GFP payload delivered as an IVT mRNA without UTRs into the human endogenous NOLC1 locus by SpCas9H840A-R2ToccΔ1-169 in Huh-7 cells. Insertion is quantified by NGS and is represented as perfect insertions or insertions with indels. WT indels are also shown, which are defined as indels at the unintegrated NOLC1 locus. o) Insertion of a GFP payload delivered as an IVT mRNA with UTRs and other variable modifications, as indicated, into the human endogenous NOLC1 locus by SpCas9H840A-R2ToccΔ1-169 in HEK293FT cells. Insertion is quantified by NGS. p) EGFP payload insertion by STITCHR with SpCas9H840A-R2ToccΔ1-169 into the endogenous NOLC1 locus, with combinations of single and dual guides, compared to a non-targeting guide control and quantified by NGS. q) EGFP payload insertion by STITCHR with SpCas9H840A-R2ToccΔ1-169 into the endogenous LMNB1 locus, with combinations of single and dual guides, compared to a non-targeting guide control and SpCas9H840A alone. Editing was quantified by digital droplet PCR (ddPCR). r) EGFP payload insertion by STITCHR with SpCas9H840A-R2ToccΔ1-169 into the endogenous NOLC1 locus, with combinations of single and dual guides, compared to a non-targeting guide control and profiled by ddPCR. s) EGFP payload insertion by STITCHR with SpCas9H840A-R2ToccΔ1-169 into the endogenous EMX1 locus, with combinations of single and dual guides, compared to a non-targeting guide control and SpCas9H840A alone.
Editing quantified by digital droplet PCR (ddPCR). t) EGFP payload insertion by STITCHR with SpCas9H840A-R2ToccΔ1-169 into the endogenous AAVS1 locus, with combinations of single and dual guides, compared to SpCas9H840A alone, SpCas9H840A-R2ToccΔ1-169,RTmut, and SpCas9. Insertion is quantified by ddPCR. u) Comparison of ddPCR and NGS quantification of EGFP payload insertion by STITCHR with SpCas9H840A-R2ToccΔ1-169 into the endogenous AAVS1 locus, with combinations of single and dual guides. Error bars represent mean +/− (a, f, g, h, i, l, n, o, p, q, s, t, u) s.e.m. or (c, d, e, j, k) s.d. n = 3 where n represents 3 biological replicates.
Extended Data Fig. 9 Additional evaluation of STITCHR mutants, payload designs in different cell types, and further characterization of STITCHR inserts.
a) Gluc payload insertion by SpCas9H840A-R2ToccΔ1-169 (WT), SpCas9H840A-R2ToccΔ1-169,F811A,A812L,D813A,D814A,L815A,V816A,L817A (RTmut), SpCas9H840A-R2ToccΔ1-169,Δ(811-814), SpCas9H840A-R2ToccΔ1-169,Δ(810-820), and SpCas9H840A at AAVS1. Editing quantified by NGS. b) EGFP insertion by SpCas9H840A-R2ToccΔ1-169 (WT), SpCas9H840A-R2ToccΔ1-169,F811A,A812L,D813A,D814A,L815A,V816A,L817A (RTmut), SpCas9H840A-R2ToccΔ1-169,Δ(876-879), SpCas9H840A-R2ToccΔ1-169,Δ(875-885), and SpCas9H840A at NOLC1. Editing is quantified by ddPCR. c) GFP insertion by SpCas9H840A-R2ToccΔ1-169 (WT), SpCas9H840A-R2ToccΔ1-169,RLEmut, and SpCas9H840A at the endogenous NOLC1 target site. Editing quantified by ddPCR. d) EGFP payload insertion by SpCas9H840A-R2ToccΔ1-169 into the endogenous NOLC1 locus, using payloads with 50 nt homology arms targeting NOLC1 or AAVS1 targets, or without homology. Payloads are evaluated with single, dual, or non-targeting guides and are compared to SpCas9H840A. Editing quantified by ddPCR. N = NOLC1 target. A = AAVS1 target. e) EGFP insertion by STITCHR with SpCas9H840A-R2ToccΔ1-169 into the endogenous NOLC1 locus, with payloads with varying homology arm lengths. Payloads are evaluated with dual or non-targeting guides and are compared to SpCas9H840A. Editing quantified by ddPCR. f) GFP insertion by STITCHR with SpCas9H840A-R2ToccΔ1-169 into the endogenous NOLC1 locus in HepG2 cells, compared to SpCas9H840A. Editing quantified by ddPCR. g) STITCHR EGFP insertion at endogenous EMX1, NOLC1 and two AAVS1 loci in Huh-7 cells by SpCas9H840A-R2ToccΔ1-169 compared to SpCas9H840A-R2ToccΔ1-169,RTmut. Insertion quantified by ddPCR. h) STITCHR EGFP insertion at endogenous EMX1 and NOLC1 loci in HepG2 cells by SpCas9H840A-R2ToccΔ1-169 compared to SpCas9H840A-R2ToccΔ1-169,RTmut. Insertion quantified by ddPCR. i) EGFP insertion at endogenous NOLC1 by STITCHR, delivered by different adenovirus amounts to HEK293FT cells. 
Shown is a comparison of insertion efficiency when delivering STITCHR machinery with one vector and guides and template with the other, compared to delivery of guides and template only as a control. Editing quantified by NGS. j) EGFP insertion by SpCas9H840A-R2ToccΔ1-169 at NOLC1 in quiescent primary human hepatocyte cells compared to SpCas9H840A control. 1.4e11 viral copies were used in the dual vector condition; half of that for the single vector payload only condition. Editing quantified by NGS. k) EGFP payload insertion by STITCHR at the NOLC1 endogenous locus in HEK293FT cells, comparing editing efficiencies with and without PAM elimination. Editing quantified by ddPCR. l) STITCHR EGFP insertion at the endogenous NOLC1 locus by SpCas9H840A-R2ToccΔ1-169 and SpCas9D10A-R2ToccΔ1-169. Editing quantified by ddPCR. m) PacBio sequencing of a 700 bp EGFP insertion at the endogenous NOLC1 locus by SpCas9H840A-R2ToccΔ1-169. Reads are aligned to the expected reference sequences of scarless NOLC1 insertion. Gray bases indicate a match to the reference sequence; red or black indicate mismatched. n) PacBio sequencing of a 280 bp Gluc payload insertion at the endogenous AAVS1 locus by SpCas9H840A-R2ToccΔ1-169. Reads are aligned to the corresponding expected reference sequences of scarless AAVS1 insertion. Gray bases indicate a match to the reference sequence; red or black indicate mismatched. o) Additional analysis of the PacBio long read sequencing for complete, incomplete, and concatemeric insertions at the respective sites. p) Schematic of the cross-junction ddPCR assay. Primers amplify across the central junction of a hypothetical concatemeric GFP insertion. Above, a single GFP insert in which the primers are facing in opposite directions and will not amplify. Below, a hypothetical concatemeric insertion in which the primers are facing each other across the concatemer junction, producing amplification.
q) Cross-junction ddPCR readout of concatemers generated by STITCHR with SpCas9H840A-R2ToccΔ1-169 and a 700 bp GFP payload into the endogenous NOLC1 locus, benchmarked against synthetic standards. r) Cross-junction ddPCR of genomic DNA standards generated from mixtures of genomic DNA from a heterozygous clone containing a 2x concatemeric GFP insertion at NOLC1 with gDNA from WT cells. gDNA was mixed at ratios corresponding to editing efficiencies ranging from 0.01% to 1% editing. Shown is the percentage of gDNA mixing (editing percentage) versus editing detected by cross-junctional ddPCR (measured editing). s) Schematic of ddPCR assay used to assess copy number of insertions. Above, 4 possible insertion outcomes of GFP insertion at AAVS1 by STITCHR: a single insert, tail-to-head and two tail-to-tail concatemeric insertions. Other outcomes are possible that are not depicted (e.g. head-to-head, partial concatemers, >2x concatemers) but would still be detected by the ddPCR design. Shown are primers (black) and the probe (pink box) used in the assay, plus the site of the restriction enzyme Xho1 which separates any concatemers (below), detected as increasing positive droplet concentration. t) CNV ddPCR assay depicted in s), of 10 HEK293FT clones containing a monoallelic, scarless STITCHR insertion of GFP at AAVS1 (indicated with a dotted line), a HEK293FT clone (22n115) containing a tail-to-head 2x GFP insertion at NOLC1 and a negative control (22n22) containing no insertion. Each sample was assayed +/− Xho1 digestion. * = p < 0.05, statistics calculated with unpaired t-test. u) Design of two Southern blots detecting STITCHR inserts at AAVS1 and their expected outcomes. Shown are two designs: an internal probe (left) which hybridizes to the GFP insertion and an external probe (right) which hybridizes outside the insert. For both, shown are 3 possible editing outcomes and their expected sizes: a 2x monoallelic insertion, a 1x monoallelic insertion and no insertion.
Other outcomes are possible that are not depicted but will alter the expected band sizes (e.g. 3x insertion, insertion with unexpected insertions/deletions). Shown are the restriction enzyme cut sites, the site where the probe (pink box) hybridizes and, right, the expected banding pattern for each depicted editing outcome. v) Southern blots of 10 HEK293FT clones containing a scarless GFP insert by STITCHR at AAVS1 and a negative clone (WT), utilizing an internal (above) or external (below) probe. Expected band sizes are indicated with a red (inserted) or black (uninserted) asterisk. Error bars represent mean +/− s.d. (a, c, f, g, j, k, r, t) or s.e.m. (b, e, h, i, l, q, r). n = 3 for panels (a-i, k-l, q, r, t), n = 2 (j), n = 1 (o) where n represents biological replicates.
Extended Data Fig. 10 Additional characterization of STITCHR insertions, editing outcomes for payload homology truncations, therapeutic gene insertion, and multiplexing.
a-b) Circos plots depicting genome-wide insertion sites of payloads by SpCas9H840A-R2ToccΔ1-169 using sgRNAs and payload homologies to a) AAVS1 (chr19) and b) NOLC1 (chr10). Counts are defined as the number of mapped reads occurring within a 5 kb window. c) Schematic of STITCHR using SpCas9H840A-R2ToccΔ1-169 to insert EGFP as a scarless in-frame fusion at the N-terminus of the human NOLC1 gene. The EGFP template is transcribed in a reverse complement manner to minimize background expression in the absence of insertion with 50 nt homology arms. d) STITCHR-mediated EGFP tagging of NOLC1, visualized by confocal microscopy, and compared to immunofluorescence staining of NOLC1. White scale bar denotes 10 µm. e) Therapeutically relevant payload insertion by STITCHR with SpCas9H840A-R2ToccΔ1-169 into the endogenous AAVS1 locus, with sizes and identities of payload panel members shown and 100 nt homology arms. Integration is quantified by ddPCR and compared to SpCas9H840A. For the NGS, shown are the number of total left junction inserts, left junction inserts containing indels and the WT locus containing indels. f) Evaluation of different sized edits using STITCHR at the NOLC1 locus using either SpCas9H840A-R2ToccΔ1-169 or SpCas9H840A. Inset shows payload design and locus schematic with homology arms colored and top guide in red and bottom guide in blue. g) PacBio HiFi long-read sequencing of a 12.7 kb insert at the endogenous NOLC1 locus by SpCas9H840A-R2ToccΔ1-169 using primers that land externally on the 5’ side and within the insert on the 3’ side. Reads are aligned to the corresponding expected reference sequences of scarless insertion at the NOLC1 locus. Black bases indicate a match to the reference sequence; red indicate mismatched. h) Additional analysis of the PacBio long read sequencing for the 12.7 kb insert at NOLC1 from Extended Data Fig. 10g showing complete, incomplete, and concatemeric insertions at the respective sites. 
i) Short read sequencing of the right junction of the same sample containing a 12.7 kb insert at NOLC1 used in Extended Data Fig. 10g, showing complete insertion of the right junction. j) Installation of small edits and insertions using STITCHR at the NOLC1 locus, using a U6 promoter for payload expression. k) SpCas9-mediated HDR editing of the EMX1 gene in cells treated with varying concentrations of aphidicolin. Genome editing is quantified by NGS. l) EGFP payload insertion efficiencies at endogenous NOLC1 locus by homology-directed repair (HDR), using SpCas9, at different concentrations of the cell cycle inhibitor aphidicolin or DMSO control. m) EGFP payload insertion (50 nt homology arms) by STITCHR with SpCas9H840A-R2ToccΔ1-169 into the endogenous AAVS1 locus in cells treated with cell cycling inhibitor Mirin or double thymidine. Integration is quantified by NGS and compared to SpCas9H840A. n) SpCas9-mediated HDR editing of the EMX1 gene in cells treated with cell cycling inhibitor Mirin or double thymidine. Genome editing is quantified by NGS. o) Schematic of STITCHR-replace methodology involving replacement of a region of the genome while inserting the STITCHR payload. Top guide is shown in red and the bottom guide in blue. p) Evaluation of STITCHR-replace at the NOLC1 locus using a single guide and homology arms spaced 50–150 bp apart on the genome. R2ToccRTmut corresponds to the RT inactivation mutant: F811A/A812L/D813A/D814A/L815A/V816A/L817A. q) Example sequencing reads of the EGFP insertion site at NOLC1 for STITCHR-replace, showing the desired 50–150 bp deletions. r) ddPCR quantification of multiplexed gene integration by STITCHR with SpCas9H840A-R2ToccΔ1-169 at NOLC1 and AAVS1 sites. EGFP payload insertion at NOLC1 is quantified by ddPCR, and Gluc insertion at AAVS1 is quantified by NGS. Targeting conditions are compared to non-targeting guide controls. Error bars represent mean +/− s.e.m. (d, e, k, l, m, n) or s.d. (f, j, p, r).
n = 3 where n represents 3 biological replicates.
Supplementary information
Supplementary Information
This file contains Supplementary Discussion, Supplementary References, Supplementary Tables 1–13 and legends to Supplementary Figs. 1–6.
Supplementary Data 1
Computationally mined ORF list. A full list of the mined R2 ORFs from Fig. 1, showing NCBI accession numbers, the species, the ORF protein sequence, Rfam annotations and distances to preferred insertion sites.
Supplementary Figures
Supplementary Figs. 1–6.
About this article
Cite this article
Fell, C.W., Villiger, L., Lim, J. et al. Reprogramming site-specific retrotransposon activity to new DNA sites. Nature (2025). https://doi.org/10.1038/s41586-025-08877-4