Introduction

Base editors (BEs) were developed to overcome the challenges associated with Cas9-mediated point mutation induction, which relies on homology-directed repair (HDR) using donor DNA and generates double-strand breaks (DSBs) that can lead to unintended insertions or deletions (indels) at the target site1,2. Three principal types of programmable deaminases have been engineered for precise genome editing: Cytosine Base Editors (CBEs), which mediate C-to-T conversions; Adenine Base Editors (ABEs), which facilitate A-to-G conversions; and Thymine Base Editors (TBEs), which induce T-to-V (where V represents A, C, or G) conversions3,4,5,6,7,8. These BEs operate in a gRNA-dependent manner, allowing for targeted and specific nucleotide modifications. Alternative genome editing tools based on Transcription Activator-like Effectors (TALEs) and Zinc Finger Proteins (ZFPs) also enable precise genome editing and are utilized in mitochondrial genome editing9,10,11,12,13. In a recent study, the Gao group reported engineered deaminases identified using AI-guided, structure-based protein clustering, including Sdd7, which was described as the most active deaminase14.

Despite the high activity of Sdd7, we observed that it can induce unintended C-to-T mutations upstream of the protospacer, particularly within the TC context. To address this limitation, we engineered Sdd7 variants to reduce bystander editing while maintaining on-target activity. Our engineered Sdd7 variants demonstrate a narrowed editing window with decreased bystander mutations upstream of the protospacer, as well as significantly reduced gRNA-dependent and gRNA-independent off-target effects. These findings expand the base editing toolkit by providing high-precision editors with minimized bystander and off-target effects, supporting broader applications in genome editing that demand high specificity and precision.

Results

Comparison of Sdd7 and BE4max

First, we compared the base editing efficiency of Sdd7 with that of the widely used BE4max at 30 endogenous target sites in HEK293T cells. Plasmids encoding either Sdd7 or BE4max were co-transfected with gRNA constructs into HEK293T cells, and base editing efficiencies were measured using targeted deep sequencing (Fig. 1a). Both CBEs exhibited a wide range of mutation frequencies across the target sites. The average base editing frequency for Sdd7 was 60.12% ± 2.35%, with an indel frequency of 1.31% ± 0.22%; both values were slightly higher than those observed for BE4max (base editing: 56.69% ± 3.26%; indels: 0.94% ± 0.15%) (Fig. 1b and Supplementary Fig. 1). Although the base editing frequencies of BE4max and Sdd7 were correlated (R = 0.74) (Supplementary Fig. 2a), Sdd7 showed significantly greater editing efficiency at specific sites such as RELA, POMC, NOS2-2, and FANCF, while BE4max demonstrated superior editing efficiency at other sites, including PTGER1 (Fig. 1a).

Fig. 1: Comparative analysis of base editing efficiencies of BE4max and Sdd7 in HEK293T cells.

a Bar graphs showing the base editing frequencies of BE4max and Sdd7 at 30 genomic sites in HEK293T cells. Data are presented as mean values, with error bars indicating the standard error of the mean (SEM) from independent biological triplicates. b Representative box-and-whisker plots depicting the distribution of base editing frequencies and indel frequencies across the 30 target sites for both BE4max and Sdd7. c Comparison of the base editing windows of BE4max and Sdd7 across the 30 target sites, illustrating the editing efficiencies at each nucleotide position within the protospacer sequence (positions 1 to 20, 5′–3′ direction). d Base editing context analysis of BE4max and Sdd7 across the 30 endogenous target sites in HEK293T cells. e Representative box-and-whisker plots displaying the distribution of base editing frequencies across 10 target sites for BE4max and Sdd7 in K562 and SKOV3 cells. f Base editing frequencies of gRNA-independent off-target deamination activities of BE4max and Sdd7 using the dSaCas9-mediated orthogonal R-loop assay in HEK293T cells. Data represent mean values with SEM from independent biological triplicates. g Analysis of bystander cytosine editing frequencies in the upstream regions of the target sites (positions −1 to −43) for BE4max and Sdd7 across the 30 endogenous sites. h Schematic representation illustrating the proposed mechanism by which the Sdd7 base editor induces cytosine deamination both at the target site and in upstream regions. Schematic created in BioRender. Kim, D. (2025) https://BioRender.com/k20k736. For (be), box plots show the median (central line), interquartile range (box, 25th–75th percentiles), and full data range (whiskers, minima to maxima), with individual data points shown as dots. Statistical significance is indicated by * p ≤ 0.05, ** p ≤ 0.01, and *** p ≤ 0.001 (Student’s two-tailed t-test).

Next, we investigated the base editing windows and product purity of Sdd7 in comparison to BE4max. Sdd7 displayed a broader base editing window than BE4max, with higher editing efficiency at positions 2 to 11 within the protospacer sequence (positions numbered 1 to 23 in the 5′–3′ direction) (Fig. 1c). Notably, Sdd7 showed significantly greater efficiency than BE4max at positions 2 to 7, 12, and 14 (Fig. 1c). The product purity, defined as the proportion of desired edits (C-to-T conversions) relative to total edits (C-to-T, C-to-G, and C-to-A conversions), was similar between Sdd7 and BE4max (Supplementary Fig. 2b). Furthermore, consistent with previous reports, BE4max demonstrated lower editing efficiency in the GC context3,4,15,16 (Fig. 1d). In contrast, Sdd7 exhibited no apparent sequence context preference, maintaining robust editing efficiencies regardless of GC context (Fig. 1d). Subsequently, we evaluated the base editing frequencies of BE4max and Sdd7 in two additional cell lines, K562 and SKOV3, across 10 different target sites. In both cell lines, Sdd7 consistently demonstrated significantly higher average base editing frequencies than BE4max (Fig. 1e and Supplementary Fig. 2c), further underscoring the superior performance of Sdd7 in these cellular contexts.
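As an illustration of the product purity metric described above, the following minimal Python sketch computes the share of C-to-T conversions among all cytosine conversions at one position. The function name and the read counts are hypothetical and are not taken from this study; the sketch assumes that total edits comprise C-to-T, C-to-G, and C-to-A outcomes only.

```python
def product_purity(c_to_t: float, c_to_g: float, c_to_a: float) -> float:
    """Return C-to-T edits as a percentage of all cytosine conversions.

    Inputs may be read counts or frequencies, as long as all three
    use the same unit.
    """
    total = c_to_t + c_to_g + c_to_a
    if total == 0:
        raise ValueError("no edited reads at this position")
    return 100.0 * c_to_t / total


# Hypothetical read counts at a single target cytosine (illustrative only).
purity = product_purity(c_to_t=940, c_to_g=40, c_to_a=20)
print(f"Product purity: {purity:.1f}%")
```

Expressing purity as a ratio of conversions, rather than of all reads, keeps the metric independent of overall editing efficiency, which is why two editors with different activities can still be compared on it.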

We further compared the gRNA-independent deamination activities of BE4max and Sdd7 using an orthogonal R-loop assay with catalytically inactive Staphylococcus aureus Cas9 (dSaCas9) and an SaCas9 gRNA17. HEK293T cells were transfected with plasmid DNA encoding either Sdd7 or BE4max, along with the corresponding gRNA, dSaCas9, and an SaCas9 gRNA, to artificially generate single-stranded DNA regions (Fig. 1f). Targeted deep sequencing was employed to measure gRNA-independent deamination within dSaCas9-induced R-loops at five endogenous genomic loci. Compared to BE4max, Sdd7 induced increased gRNA-independent off-target deamination at loci 1 to 4 and 6, whereas its activity at locus 5 was reduced (Fig. 1f). These results indicate that Sdd7 has a higher propensity for gRNA-independent DNA deamination than BE4max.

To investigate whether Sdd7 and BE4max induce bystander base conversions outside the intended target sequence, we analyzed base editing events occurring both upstream and downstream of the target sites at 30 endogenous loci in HEK293T cells (Fig. 1g and Supplementary Figs. 2d and 3). Both Sdd7 and BE4max exhibited bystander editing activity, particularly in regions upstream of the target sites within a TC sequence context (Fig. 1g). Notably, we did not observe significant bystander editing downstream of the protospacer (Supplementary Fig. 2d). These results suggest that Sdd7 induces higher bystander editing efficiencies than BE4max in the upstream regions of the target sites (Fig. 1g, h).

Rational engineering of Sdd7

We observed that Sdd7 exhibited elevated cytosine bystander editing upstream of the target sites (Fig. 1g) and increased gRNA-independent off-target DNA editing. To mitigate these undesired effects, we aimed to enhance the specificity of Sdd7 through protein engineering (Fig. 2a). We first performed a comparative amino acid sequence analysis of Sdd7 with ten other deaminases, including seven Sdd variants and three Ddd variants reported by the Gao group (Fig. 2b)14, to identify conserved residues that could be modified to reduce bystander editing without compromising on-target efficiency. This analysis identified conserved residues at positions 132 and 144 across the deaminases (Fig. 2b). V132 and V144 are located in a hydrophobic region that is critical for maintaining the three-dimensional structure of Sdd7 (Fig. 2b). A comparison of the hydrophobic region amino acid sequences among Sdd isotypes reveals that residue 132 is conserved as either valine or leucine, and residue 144 as either valine or methionine (Fig. 2b). We hypothesized that this amino acid conservation is essential for the structural integrity and function of the Sdd protein, and that substituting these residues with the alternative conserved amino acids could modulate enzyme specificity and activity without drastically altering the overall structure. We therefore substituted the valine at positions 132 and 144 with leucine (V132L) and methionine (V144M), respectively. Next, we co-transfected HEK293T cells with plasmid DNA encoding either Sdd7 or its variants, along with gRNA constructs, and assessed base editing efficiencies at both the target site and upstream regions using targeted deep sequencing. The V144M variant exhibited a complete loss of deaminase activity, while the V132L variant showed significantly reduced bystander activity upstream of the target sites, without compromising on-target editing efficiency (Fig. 2c–e and Supplementary Fig. 4a–c).

Fig. 2: Engineering of Sdd7 base editor to reduce bystander editing in upstream regions.

a Schematic overview illustrating the engineering strategy of the Sdd7 base editor designed to inhibit bystander editing in the upstream region of the target site. Schematic created in BioRender. Kim, D. (2025) https://BioRender.com/k20k736. b Sequence alignment and structural representation of the Sdd7 protein demonstrating the homologous engineering of Sdd7 by substituting key residues with amino acids conserved in related deaminases. c Bar graphs showing the on-target editing frequencies of Sdd7, Sdd7 V132L, and Sdd7 V144M at the HEK3 and HEK4 sites. d Heatmaps depicting the editing frequencies of Sdd7, Sdd7 V132L, and Sdd7 V144M across protospacer positions and upstream regions at two genomic sites. e Bar graph representing the ratio of bystander base editing efficiencies in the upstream region to the on-target base editing efficiencies for Sdd7, Sdd7 V132L, and Sdd7 V144M. f Structural representation of the Sdd7 protein highlighting the computational analysis used to identify electrostatic surface regions predicted to interact with non-specific double-stranded DNA. Key arginine and lysine residues are indicated. g Bar graphs showing the on-target editing frequencies for six Sdd7 variants. h Heatmaps illustrating the editing frequencies for Sdd7 and the six Sdd7 variants across protospacer positions and upstream regions. i Bar graph representing the ratio of bystander base editing efficiencies in the upstream region to the on-target base editing efficiencies for the six Sdd7 variants. j Bar graphs depicting the on-target editing frequencies for seven Sdd7 combination variants. k Heatmaps showing the editing frequencies for the seven Sdd7 combination variants across the spacer region and upstream of the target sites. l Bar graph representing the ratio of bystander base editing efficiencies in the upstream region to the on-target base editing efficiencies for the seven Sdd7 combination variants. 
For (c, e, g, i, j, and l), data are presented as mean ± SEM from independent biological triplicates. For (e, i, and l), relative frequency (%) was calculated by (bystander base editing efficiencies/on-target base editing efficiencies) × 100. For (d, h, and k), color intensity indicates the level of editing at each position.
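The relative frequency defined in the legend above (bystander efficiency divided by on-target efficiency, multiplied by 100) can be sketched in Python as follows. The function name and the input values are hypothetical and serve only to illustrate the arithmetic; how the upstream bystander efficiency itself is summarized per site is not specified here and is left to the caller.

```python
def relative_bystander_frequency(bystander_pct: float, on_target_pct: float) -> float:
    """Upstream bystander editing expressed as a percentage of on-target editing."""
    if on_target_pct <= 0:
        # The ratio is undefined for an inactive editor (e.g., no on-target editing).
        raise ValueError("on-target editing efficiency must be positive")
    return bystander_pct / on_target_pct * 100.0


# Hypothetical efficiencies in percent (not values from this study).
parent = relative_bystander_frequency(bystander_pct=2.5, on_target_pct=50.0)
variant = relative_bystander_frequency(bystander_pct=0.5, on_target_pct=50.0)
print(f"parent: {parent:.1f}%  variant: {variant:.1f}%")
```

Normalizing to on-target activity distinguishes a variant that is genuinely more specific from one that simply edits less everywhere.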

We next hypothesized that the upstream bystander editing mediated by Sdd7 could result from non-specific interactions with double-stranded DNA. To test this, we performed computational analysis to identify electrostatic surface regions predicted to interact non-specifically with double-stranded DNA (Fig. 2f). This analysis revealed five arginine residues and one lysine residue forming positively charged surfaces thought to be critical for DNA binding. We substituted these residues with alanine, generating the Sdd7 variants R119A, R122A, K150A, R153A, R155A, and R157A. These variants were co-transfected into HEK293T cells along with gRNA, and base editing efficiencies were assessed at both the target site and upstream regions using targeted deep sequencing. Notably, the Sdd7 variants R119A and R153A demonstrated significantly reduced bystander editing in upstream regions without compromising on-target editing efficiency (Fig. 2g–i and Supplementary Fig. 4d–f).

Based on previous experiments, we identified three Sdd7 variants (V132L, R119A, and R153A) that significantly reduced bystander editing in upstream regions. To explore potential synergistic effects, we generated Sdd7 variants containing all possible double and triple combinations of these substitutions. These variants were co-transfected into HEK293T cells along with gRNA, and base editing efficiencies were evaluated at both the target site and upstream regions using targeted deep sequencing. The results demonstrated that one double-substituted variant (V132L and R153A) and the triple-substitution variant (V132L, R119A, and R153A) exhibited minimal bystander editing in upstream regions, nearing background levels (Fig. 2j–l and Supplementary Fig. 4g–i). Based on these findings, we selected the double-substituted variant (hereafter referred to as Sdd7e1) and the triple-substitution variant (hereafter referred to as Sdd7e2) for further analysis.
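The combinatorial design described above can be enumerated programmatically. The short Python sketch below lists every double and triple combination of the three substitutions; the helper name and the "/"-joined labels are illustrative conventions only. Three substitutions yield three double variants and one triple variant, which, together with the three single substitutions, would be consistent with the seven combination variants shown in Fig. 2j–l.

```python
from itertools import combinations

# The three single substitutions identified in the preceding experiments.
SINGLE_SUBSTITUTIONS = ("R119A", "V132L", "R153A")


def combination_variants(substitutions):
    """Return labels for all combinations of two or more substitutions."""
    variants = []
    for size in range(2, len(substitutions) + 1):
        for combo in combinations(substitutions, size):
            variants.append("/".join(combo))
    return variants


for label in combination_variants(SINGLE_SUBSTITUTIONS):
    print(label)
```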

Engineered Sdd7 reduces bystander cytosine base editing

To evaluate the base editing efficiencies of the engineered Sdd7 variants (Sdd7e1 and Sdd7e2) compared to the original Sdd7, we transfected HEK293T cells with plasmid DNA encoding either BE4max, Sdd7, Sdd7e1, or Sdd7e2, along with gRNA constructs targeting 30 endogenous sites, and measured base editing frequencies using targeted deep sequencing (Fig. 3a and Supplementary Fig. 5). The average base editing frequencies of Sdd7e1 (59.75 ± 0.53%) and Sdd7e2 (60.48 ± 0.63%) were comparable to that of Sdd7 (62.09 ± 0.61%), indicating that the engineered variants maintain similar on-target editing efficiencies (Fig. 3b). However, the indel frequencies at the target sites were slightly higher for Sdd7e1 (3.07 ± 0.67%) than for Sdd7 (1.93 ± 0.29%) and Sdd7e2 (2.08 ± 0.41%) (Fig. 3b and Supplementary Fig. 6a).

Fig. 3: Comparative analysis of base editing efficiencies and bystander editing of BE4max, Sdd7, Sdd7e1, and Sdd7e2.

a Bar graphs displaying the base editing frequencies of BE4max, Sdd7, Sdd7e1, and Sdd7e2 at 30 genomic sites. Data are presented as mean values, with error bars indicating SEM from independent biological triplicates. b Representative box-and-whisker plots illustrating the distribution of base editing frequencies (left) and indel frequencies (right) for BE4max, Sdd7, Sdd7e1, and Sdd7e2 across the 30 endogenous target sites. Statistical significance is indicated by * p ≤ 0.05 and ** p ≤ 0.01 (Student’s two-tailed t-test). c Representative box-and-whisker plots showing the base editing efficiencies at each protospacer position (positions 1–20, numbered from the 5′ to 3′ end) for BE4max, Sdd7, Sdd7e1, and Sdd7e2, aggregated from the 30 endogenous sites. d Bar graphs depicting the fractions of modified bases (number of mutated cytosines) within the protospacer regions at the RNF2 and HEK3 sites for BE4max, Sdd7, Sdd7e1, and Sdd7e2. Data are presented as mean values, with error bars indicating SEM from independent biological triplicates. e Representative graph depicting the distribution of base editing frequencies for BE4max, Sdd7, Sdd7e1, and Sdd7e2 across K562, SKOV3, iPSC, and HSC. f Bar graphs illustrating the base editing frequencies for BE4max, Sdd7, Sdd7e1, Sdd7e2, YE1-BE4max, YE2-BE4max, eA3Amax, and TadCBEd at 5 endogenous sites. g Analysis of bystander cytosine editing frequencies in the upstream regions of the target sites (positions −1 to −43) for BE4max, Sdd7, Sdd7e1, and Sdd7e2 across the 30 endogenous sites. Statistical significance is indicated by * p ≤ 0.05, ** p ≤ 0.01, and *** p ≤ 0.001 (Student’s two-tailed t-test). For (b, c, e, and g), box plots show the median (central line), interquartile range (box, 25th–75th percentiles), and full data range (whiskers, minima to maxima), with individual data points shown as dots. For (e and f), data are presented as mean ± SEM from three independent biological replicates.

We confirmed that Sdd7 efficiently edited cytosines at positions 2 to 11, demonstrating a significantly broader editing window compared to BE4max (Fig. 1c). To assess whether the engineered variants alter this window, we measured editing efficiencies at each position of the protospacer sequence (Fig. 3c). While Sdd7e1 and Sdd7e2 exhibited comparable editing efficiencies at positions 5 and 6, their editing efficiencies at positions 2–4 and 7–12 were significantly reduced compared to the original Sdd7 (Fig. 3c and Supplementary Figs. 5 and 6b). We normalized the editing efficiency at the highest-activity position for each target site to 100% and calculated the relative activities at other positions. At position 2, the relative editing frequencies of Sdd7 were 2.1-fold and 4.0-fold higher than those of Sdd7e1 and Sdd7e2, respectively, indicating a reduction in bystander editing in the engineered variants at protospacer regions (Fig. 3c).
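The per-site normalization described above can be sketched in Python as follows: the position with the highest editing frequency is set to 100%, the remaining positions are scaled accordingly, and the fold difference between two editors is then taken at a given position. The function name and all frequencies are hypothetical and are not data from this study.

```python
def normalize_to_peak(freq_by_position: dict) -> dict:
    """Scale editing frequencies so that the most active position equals 100%."""
    peak = max(freq_by_position.values())
    if peak <= 0:
        raise ValueError("no editing detected at any position")
    return {pos: 100.0 * freq / peak for pos, freq in freq_by_position.items()}


# Hypothetical editing frequencies (%) by protospacer position at one site.
parent_editor = {2: 30.0, 5: 60.0, 6: 55.0}
engineered_editor = {2: 14.0, 5: 56.0, 6: 50.0}

rel_parent = normalize_to_peak(parent_editor)
rel_engineered = normalize_to_peak(engineered_editor)

# Fold difference in relative activity at position 2 (parent vs. engineered).
fold = rel_parent[2] / rel_engineered[2]
print(f"Relative activity at position 2: {rel_parent[2]:.1f}% vs "
      f"{rel_engineered[2]:.1f}% ({fold:.1f}-fold)")
```

Normalizing within each site before comparing editors removes site-to-site differences in absolute efficiency, so the fold change reflects the shape of the editing window rather than overall activity.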

Given that CBEs can potentially mutate any cytosine within the editing window, achieving single base changes is crucial for the precise correction of pathogenic mutations. We therefore assessed the fraction of single-nucleotide mutations within the editing window and found that the engineered Sdd7 variants, particularly Sdd7e2, produced a higher proportion of single-point mutations at the target sites (Fig. 3d and Supplementary Fig. 6c). For instance, the fraction of single mutations induced by Sdd7e2 (36.90 ± 0.79%) was significantly higher than that induced by Sdd7 (3.73 ± 0.16%) at the RNF2 site (Fig. 3d). Subsequently, we evaluated the base editing frequencies of BE4max, Sdd7, Sdd7e1, and Sdd7e2 in additional cell types, including K562 cells, SKOV3 cells, induced pluripotent stem cells (iPSCs), and CD34⁺ hematopoietic stem cells (HSCs). Sdd7e1 exhibited lower activity than Sdd7 in K562 cells and comparable activity in SKOV3 cells, whereas Sdd7e2 showed activity similar to that of Sdd7 in both cell lines (Fig. 3e and Supplementary Fig. 6d). Additionally, although Sdd7e2 displayed reduced activity in HSCs, both Sdd7e1 and Sdd7e2 exhibited comparable activity in iPSCs (Fig. 3e). These results underscore that the performance of Sdd7 and its engineered variants varies according to the target site and cellular context.
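One way to quantify the fraction of single-nucleotide mutations is to count, for each sequencing read, how many reference cytosines in the window were converted to thymine, and then report the share of edited reads carrying exactly one conversion. The Python sketch below illustrates this under stated assumptions: reads are already aligned and trimmed to the window without indels, only C-to-T conversions are counted, and the denominator is edited reads rather than all reads. These choices, the function names, and the toy sequences are illustrative and may differ from the analysis used in this study.

```python
from collections import Counter


def converted_cytosines(reference: str, read: str) -> int:
    """Count positions where a reference C is read as T."""
    if len(reference) != len(read):
        raise ValueError("read must be aligned to the reference window")
    return sum(1 for ref, obs in zip(reference, read) if ref == "C" and obs == "T")


def single_edit_fraction(reference: str, reads: list) -> float:
    """Percentage of edited reads that carry exactly one C-to-T conversion."""
    counts = Counter(converted_cytosines(reference, read) for read in reads)
    edited = sum(n for conversions, n in counts.items() if conversions > 0)
    if edited == 0:
        return 0.0
    return 100.0 * counts[1] / edited


# Toy window and reads (hypothetical, not a sequence from this study).
window = "ACCTCA"
reads = ["ACCTCA", "ATCTCA", "ATTTCA", "ACTTCA", "ACCTTA"]
print(f"Single-edit fraction: {single_edit_fraction(window, reads):.1f}%")
```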

We next compared the editing frequency and window of Sdd7, Sdd7e1, and Sdd7e2 relative to established narrow-window CBEs, including YE1-BE4max, YE2-BE4max, eA3Amax, and the newer generation TadCBEd7,15,17,18,19. Although target site-dependent differences were observed, YE1-BE4max and YE2-BE4max consistently exhibited a narrower editing window compared to the other CBEs (Fig. 3f and Supplementary Fig. 7).

We also examined bystander editing upstream of the protospacer region. The engineered Sdd7 variants (Sdd7e1 and Sdd7e2) showed significantly reduced bystander editing in the upstream TC context from 2.83 ± 0.41% for Sdd7 to 0.86 ± 0.18% for Sdd7e1 and 0.73 ± 0.18% for Sdd7e2 (Fig. 3g and Supplementary Fig. 8), which is lower than that observed for BE4max (1.29 ± 0.43%) (Fig. 3g). These results suggest that the engineered Sdd7 variants offer improved specificity by minimizing undesired bystander mutations both within the protospacer and upstream regions.

Engineered Sdd7 variants reduce gRNA-dependent and gRNA-independent DNA off-target effects and transcriptome-wide RNA off-target effects

Next, we evaluated the off-target activities of the engineered Sdd7 variants, Sdd7e1 and Sdd7e2, in comparison to the original Sdd7. For gRNA-dependent off-target assessment, potential off-target sites with single or double mismatches were identified using Cas-OFFinder20, and base editing frequencies were measured by targeted deep sequencing (Fig. 4a). The engineered variants demonstrated reduced gRNA-dependent off-target base editing at most of these sites, achieving improved specificity ratios of up to 7.8 for Sdd7e1 and 9.6 for Sdd7e2 (Fig. 4a). To assess genome-wide off-target effects, we performed Digenome-seq21,22,23, which identified 12 off-target sites for FANCF and 64 off-target sites for HEK4 (Supplementary Data 1). Subsequent validation of all FANCF off-target sites and 14 of the 64 HEK4 off-target sites via targeted deep sequencing confirmed that the engineered variants maintained lower gRNA-dependent off-target editing, with specificity ratios reaching up to 16.8 for Sdd7e1 and 23.9 for Sdd7e2 (Fig. 4b).

Fig. 4: Evaluation of off-target editing activities of engineered Sdd7.

a Editing frequencies at potential off-target sites for BE4max, Sdd7, Sdd7e1, and Sdd7e2 were measured by targeted deep sequencing in HEK293T cells. PAM sequences are indicated in blue, and mismatched bases are shown in red lowercase letters. Specificity ratios were calculated by dividing the specificity of each Sdd7 variant (on-target frequency/off-target frequency) by that of the original Sdd7 (on-target frequency/off-target frequency). Data are presented as mean values, with error bars representing SEM from three independent biological replicates. b Editing frequencies at off-target sites captured by Digenome-seq for BE4max, Sdd7, Sdd7e1, and Sdd7e2 were quantified by targeted deep sequencing in HEK293T cells. Data are presented as mean ± SEM from three independent biological replicates. c OTI values for BE4max, Sdd7, Sdd7e1, and Sdd7e2 at the FANCF and HEK4 sites. The OTI is calculated as the ratio of the sum of base editing frequencies at all off-target sites to the on-target base editing frequency. Data are presented as mean values, with error bars representing SEM from three independent biological replicates. d Measurement of gRNA-independent off-target deamination activities of BE4max, Sdd7, Sdd7e1, and Sdd7e2 using the dSaCas9-mediated orthogonal R-loop assay in HEK293T cells. Data are presented as mean values, with error bars representing SEM from three independent biological replicates. e Representative heatmap depicting gRNA-independent off-target deamination frequencies measured by the dSaCas9-mediated orthogonal R-loop assay in HEK293T cells. f Cas9-independent RNA off-target deamination for BE4max, Sdd7, Sdd7e1, and Sdd7e2 in HEK293T cells was evaluated by transcriptome sequencing, which determined both the number of C-to-U edited nucleotides and the overall frequency of RNA C-to-U editing.

To quantitatively compare off-target effects, we calculated the Off-Target Effect Index (OTI), defined as the ratio of the sum of base editing frequencies at off-target sites to the on-target base editing frequency24. This calculation was performed for BE4max, Sdd7, Sdd7e1, and Sdd7e2 when targeting the FANCF and HEK4 loci (Fig. 4c). The OTI values for Sdd7e1 and Sdd7e2 were lower than those for Sdd7 at both target sites, indicating a reduction in gRNA-dependent off-target activity (Fig. 4c). Specifically, the OTI of Sdd7e1 and Sdd7e2 was lower than that of BE4max when targeting FANCF, and only slightly higher than BE4max when targeting HEK4 (Fig. 4c). These findings suggest that Sdd7e1 and Sdd7e2 possess enhanced specificity, effectively minimizing unintended gRNA-dependent off-target editing.
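The two specificity metrics used in this section can be written out as a short Python sketch. The specificity ratio follows the definition in the legend of Fig. 4 (the on-target/off-target ratio of a variant divided by that of the original Sdd7), and the OTI follows the definition given above (sum of off-target frequencies divided by the on-target frequency). Function names and all frequencies are hypothetical; the handling of undetectable off-target editing (a zero denominator) is an assumption of this sketch.

```python
def specificity(on_target: float, off_target: float) -> float:
    """On-target editing frequency divided by off-target editing frequency."""
    if off_target <= 0:
        # Assumption: sites with no detectable off-target editing need a
        # separate convention (e.g., a detection-limit floor).
        raise ValueError("off-target frequency must be positive")
    return on_target / off_target


def specificity_ratio(variant_on, variant_off, parent_on, parent_off) -> float:
    """Specificity of a variant relative to the parent editor at one site."""
    return specificity(variant_on, variant_off) / specificity(parent_on, parent_off)


def off_target_index(on_target: float, off_targets: list) -> float:
    """OTI: summed off-target editing frequencies divided by on-target frequency."""
    if on_target <= 0:
        raise ValueError("on-target frequency must be positive")
    return sum(off_targets) / on_target


# Hypothetical frequencies in percent (not values from this study).
ratio = specificity_ratio(variant_on=60.0, variant_off=3.0,
                          parent_on=60.0, parent_off=12.0)
oti = off_target_index(on_target=60.0, off_targets=[12.0, 3.0, 1.5])
print(f"Specificity ratio: {ratio:.1f}  OTI: {oti:.3f}")
```

A specificity ratio above 1 indicates that the variant is more specific than the parent at that site, whereas a lower OTI indicates less cumulative off-target editing per unit of on-target editing.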

To assess gRNA-independent off-target effects, we employed the orthogonal R-loop assay. Using this method, we found that both Sdd7e1 and Sdd7e2 exhibited reduced gRNA-independent off-target deamination compared to the original Sdd7 (Fig. 4d). Notably, Sdd7e1 demonstrated particularly low levels of gRNA-independent off-target activity (Fig. 4d, e and Supplementary Fig. 9). These observations indicate that the engineered Sdd7 variants effectively minimize both gRNA-dependent and gRNA-independent off-target editing while maintaining their on-target editing efficiency.

CBEs have been reported to induce RNA off-target effects25,26. To evaluate transcriptome-wide RNA off-target editing, we performed comprehensive RNA sequencing (RNA-seq) analysis. Plasmid DNAs encoding BE4max, Sdd7, Sdd7e1, and Sdd7e2 were transfected with a gRNA targeting the HEK2 locus, and RNA was isolated 48 h post-transfection. The number of RNA off-target sites for Sdd7 was comparable to, or slightly lower than, that for BE4max. Notably, the engineered variants Sdd7e1 and Sdd7e2 exhibited a markedly reduced number of RNA off-target sites compared to both BE4max and Sdd7, approaching levels observed in the untreated control group (Fig. 4f). These findings demonstrate that the engineered Sdd7 variants substantially mitigate transcriptome-wide RNA off-target editing, achieving near-baseline levels.

eVLP-mediated delivery enhances specificity of engineered Sdd7

To further enhance the specificity of Sdd7 variants and reduce bystander editing, we utilized engineered virus-like particles (eVLPs) to deliver Sdd7e1 and Sdd7e2 as ribonucleoproteins (RNPs). These eVLPs encapsulated the BE4max, Sdd7, Sdd7e1, or Sdd7e2 proteins along with their corresponding sgRNAs27. Notably, the base editing efficiencies achieved by eVLP-mediated delivery increased with the volume of eVLPs applied (Fig. 5a and Supplementary Fig. 10). Transduction of HEK293T cells with these eVLPs resulted in base editing efficiencies that were comparable to, or slightly lower than, those observed with plasmid DNA delivery, depending on the target site (Fig. 5b and Supplementary Fig. 11a). To assess the precision of the base editing, we evaluated the fraction of single-nucleotide mutations within the editing window and found that eVLP-mediated delivery induced a higher proportion of single-point mutations at the target sites than plasmid DNA delivery (Fig. 5c and Supplementary Fig. 11b). For instance, at the HEK4 site, the fraction of single mutations induced by Sdd7e2 delivered via eVLPs was 94.30 ± 0.51%, significantly higher than that induced by Sdd7 plasmid DNA (5.47 ± 0.18%) and Sdd7e2 plasmid DNA (52.93 ± 0.36%) (Fig. 5c).

Fig. 5: eVLP-mediated delivery enhances base editing specificity of engineered Sdd7.

a Base editing frequency of Sdd7e1 and Sdd7e2 delivered via eVLPs at the HEK4 and SSTR5 genomic sites in HEK293T cells. b Comparative analysis of base editing efficiencies of Sdd7e1 and Sdd7e2 delivered by plasmid DNA transfection versus eVLP-mediated delivery in HEK293T cells. c Bar graphs illustrating the fractions of modified bases (number of mutated cytosines) within the protospacer regions at the HEK4 and SSTR5 sites for Sdd7e1 and Sdd7e2 delivered via plasmid DNA and eVLPs in HEK293T cells. d Heatmaps showing the editing frequencies of Sdd7e1 and Sdd7e2 delivered via plasmid DNA and eVLPs across protospacer positions and upstream regions at the HEK4 and SSTR5 sites. Color intensity corresponds to the level of editing efficiency at each nucleotide position. e Bar graphs representing the ratio of bystander base editing efficiencies in the upstream region to the on-target base editing efficiencies for Sdd7e1 and Sdd7e2 delivered via plasmid DNA and eVLPs. Lower ratios indicate higher editing specificity and reduced unintended edits. Error bars represent SEM from independent biological triplicates. Relative frequency (%) was calculated by (bystander base editing efficiencies/on-target base editing efficiencies) × 100. f, g Editing frequencies at potential off-target sites for Sdd7e1 (f) and Sdd7e2 (g) delivered via plasmid DNA and eVLPs, measured by targeted deep sequencing in HEK293T cells. PAM sequences are indicated in blue, and mismatched bases are shown in red lowercase letters. Specificity ratios were calculated by dividing the specificity of eVLP-mediated delivery (on-target frequency/off-target frequency) by that of plasmid DNA-mediated delivery. h OTI values for Sdd7e1 and Sdd7e2 delivered via plasmid DNA and eVLPs at the HEK4 site. The OTI is calculated as the ratio of the sum of base editing frequencies at all off-target sites to the on-target base editing frequency. 
For (a–c, and e–h), data are presented as mean ± SEM from three independent biological replicates.

We compared bystander editing frequencies in the upstream regions of target sites following eVLP-mediated delivery and plasmid DNA delivery. Remarkably, the bystander editing frequencies in the upstream regions were dramatically reduced to levels approaching the sequencing detection limit with eVLP delivery (Fig. 5d, e and Supplementary Fig. 11c). Additionally, the base editing window at the target sites was narrower with eVLP-mediated delivery compared to plasmid DNA delivery (Fig. 5d and Supplementary Fig. 11c).

We next compared the gRNA-dependent off-target effects of plasmid DNA delivery and eVLP-mediated RNP delivery. The eVLP-mediated RNP delivery exhibited significantly higher specificity, with specificity ratios up to 121.0 for Sdd7e1 and up to 242.7 for Sdd7e2, compared to plasmid DNA delivery (Fig. 5f, g). The OTI values for eVLP-mediated Sdd7e1 and Sdd7e2 RNP delivery were dramatically decreased compared to plasmid DNA delivery (Fig. 5h).

These findings indicate that eVLP-mediated delivery of Sdd7e1 and Sdd7e2 RNPs enhances the specificity of base editing by increasing the fraction of precise single-point mutations while significantly reducing unintended bystander edits and off-target effects. Therefore, eVLP-mediated delivery represents a promising strategy for achieving efficient and potentially safer gene editing applications, improving the overall precision of genome editing.

Discussion

The precise and efficient correction of single-nucleotide pathogenic mutations is crucial for realizing the full potential of gene therapy. In this study, we conducted a comprehensive analysis of Sdd7 in comparison with the widely used CBE BE4max, aiming to enhance specificity and reduce off-target effects. Our results demonstrate that although Sdd7 exhibits slightly higher or comparable on-target editing efficiency in HEK293T cells (Fig. 1a, b) and significantly higher activity in K562 and SKOV3 cells (Fig. 1e), it also introduces elevated levels of bystander mutations, particularly in the upstream regions of target sites (Fig. 1g), as well as gRNA-dependent and gRNA-independent off-target deamination (Figs. 1f and 4a, b). Through rational protein engineering, we developed two Sdd7 variants (Sdd7e1 and Sdd7e2) that significantly mitigate these undesired effects while maintaining robust on-target activity.

Our initial comparisons revealed that Sdd7 has a higher average base editing efficiency (60.12 ± 0.35%) compared to BE4max (56.69 ± 0.26%), effectively editing cytosines at positions 2 to 14 within the protospacer sequence. However, Sdd7 induces bystander mutations upstream of the target sites, particularly within a TC context. Such off-target modifications pose significant risks for therapeutic applications due to the potential for unintended genomic alterations. These findings highlight that CBEs can induce bystander editing in upstream regions of target sites, emphasizing the necessity of monitoring not only the protospacer region but also adjacent regions for unintended base editing. Recent findings from the Kohli group have demonstrated that activation-induced deaminase (AID) base editors (AID-BE) can induce bystander editing upstream of the spacer sequence, resulting in both C-to-T and G-to-A conversions when dCas9 is employed28. In contrast, the use of Cas9 D10A nickase with AID-BE has been shown to suppress the occurrence of G-to-A conversions in this region. Since we used the Cas9 D10A nickase in conjunction with BE4max and Sdd7, our results predominantly revealed C-to-T conversions upstream of the spacer sequence, with minimal evidence of G-to-A editing, consistent with previous reports28.

To address this challenge, we employed rational engineering to refine Sdd7’s specificity. Liu and colleagues previously demonstrated that modifying hydrophobic residues within the active site of APOBEC1-derived editors (YE1-BE3 and YE2-BE3) effectively narrowed their editing window by modulating active site hydrophobicity18. Inspired by this approach, we analyzed conserved amino acid sequences among ten deaminase variants reported by the Gao group14 and identified key residues at positions 132 and 144 within a structurally essential hydrophobic region. We hypothesized that substituting valine at position 132 with leucine (V132L) could similarly change the editing window. Additionally, previous studies showed that introducing mutations at amino acids involved in non-specific DNA interactions could enhance the fidelity of Cas9 nucleases (e.g., eSpCas9 and SpCas9-HF)29,30. Following this rationale, we introduced mutations at arginine residues (R119A and R153A) predicted to mediate non-specific interactions between Sdd7 and target DNA. Consequently, the double mutant Sdd7e1 (V132L and R153A) and the triple mutant Sdd7e2 (V132L, R119A, and R153A) demonstrated substantially decreased off-target effects and a notably narrowed editing window compared to the original Sdd7.

Importantly, both Sdd7e1 and Sdd7e2 maintained on-target editing efficiencies comparable to the original Sdd7, with average base editing frequencies of 59.75 ± 0.53% and 60.48 ± 0.63%, respectively (Fig. 3a). The engineered variants displayed a narrowed editing window, primarily affecting positions 5 and 6 (Fig. 3c), which is advantageous for applications requiring precise nucleotide modifications. The reduction in editing at positions 2–4 and 7–12 significantly decreased the likelihood of unintended mutations. Moreover, Sdd7e2 induced a higher proportion of single-point mutations, highlighting its improved specificity for therapeutic targeting. However, we observed unexpectedly increased indel frequencies for Sdd7e1 and Sdd7e2 compared to the original Sdd7 (Fig. 3b). Although the precise mechanism underlying these elevated indel rates remains unclear, one plausible explanation is that these indels reflect a known characteristic of CBEs, which typically exhibit deletions spanning the region between the PAM-proximal HNH domain nick site and the PAM-distal cytosine deamination peak position15. Consistent with this mechanism, our analysis indicated that Sdd7, Sdd7e1, and Sdd7e2 similarly exhibit deletions concentrated within this interval (Supplementary Fig. 6a). It is important to note that our use of PCR amplicon-based NGS for indel frequency measurement may not adequately capture large deletions. Given the frequent upstream deamination events observed with Sdd7, a significant proportion of large deletions may be generated yet remain undetected, potentially leading to an underestimation of the true indel frequency.

Our assessment of off-target activities revealed that the engineered variants substantially reduce both gRNA-dependent and gRNA-independent off-target effects. Using targeted deep sequencing, we observed improved specificity ratios of up to 16.8 for Sdd7e1 and 23.9 for Sdd7e2 (Fig. 4a, b). The orthogonal R-loop assay further confirmed that Sdd7 variants, especially Sdd7e1, exhibited low frequencies of gRNA-independent DNA and RNA off-target mutations compared to both Sdd7 and BE4max (Fig. 4d–f). These enhancements are critical for the safe application of CBEs in clinical settings, where off-target effects can lead to unintended genomic alterations.

Delivery of Cas9 nucleases and CBEs as RNPs is known to reduce off-target effects and cytotoxicity compared to plasmid DNA delivery methods23,31. To enhance the specificity of the Sdd7 variants and reduce bystander editing, we utilized eVLPs to deliver Sdd7e1 and Sdd7e2 as RNPs27. The eVLP-mediated delivery resulted in base editing efficiencies comparable to plasmid DNA delivery but significantly reduced bystander editing frequencies in the upstream regions of target sites, reaching levels near the sequencing detection limit (Fig. 5d, e). The editing window was also narrowed with eVLP delivery, and a higher proportion of single-point mutations was observed (Fig. 5c, d). For example, at the HEK4 site, the fraction of single mutations induced by Sdd7e2 delivered via eVLPs was 94.30 ± 0.51%, significantly higher than that induced by Sdd7 plasmid DNA (5.47 ± 0.18%) and Sdd7e2 plasmid DNA (52.93 ± 0.36%) (Fig. 5c). Additionally, eVLP-mediated delivery exhibited significantly higher specificity ratios, up to 121.0 for Sdd7e1 and up to 242.7 for Sdd7e2, compared to plasmid DNA delivery (Fig. 5f, g), indicating a substantial reduction in gRNA-dependent off-target effects.

In conclusion, the engineered Sdd7 variants (Sdd7e1 and Sdd7e2) represent significant advancements in the development of high-fidelity CBEs. By balancing efficiency and specificity, and through eVLP-mediated delivery, these variants hold substantial promise for therapeutic genome editing and other applications requiring precise genetic modifications. Future studies should aim to validate these findings in clinically relevant models and expand the base editor toolkit with tailored properties for specific applications.

Methods

Ethics statement

The use of CD34⁺ HSC in this study was approved by the Seoul National University Hospital Institutional Review Board in accordance with all relevant ethical regulations. Cord blood samples were obtained from the Seoul National University Hospital AllCord Bank for research purposes.

Plasmid construction

The plasmids pCMV_BE4max (Addgene plasmid #112093), dSaCas9 (Addgene plasmid #138162), pCMV-MMLVgag-3 × NES-ABE8e (Addgene plasmid #181751), YE1-BE4max (Addgene plasmid #138155), YE2-BE4max (Addgene plasmid #138156), p2T-CMV-eA3Amax-BlastR (Addgene plasmid #152997), and SpCas9 TadCBEd (Addgene plasmid #193835) were gifts from David Liu. To construct the Sdd7-based base editors, a double-stranded Sdd7 DNA fragment was inserted into the pCMV_BE4max vector in place of the APOBEC domain using Gibson Assembly (New England Biolabs). Plasmids encoding pCMV-MMLVgag-3 × NES-Sdd7-2 × UGI were similarly constructed via Gibson Assembly. Sdd7 variants were generated using a site-directed mutagenesis kit (New England Biolabs). All plasmids used for transfection experiments were purified using the NucleoBond Xtra Midiprep Kit (Macherey-Nagel).

Cell culture and transfection

HEK293T cells (ATCC, CRL-11268) and Gesicle Producer 293T cells (Takara Bio, 632617) were cultured in Dulbecco’s Modified Eagle Medium (DMEM; Welgene) supplemented with 10% fetal bovine serum (FBS; Welgene) and 1% penicillin-streptomycin (Welgene) at 37 °C in a humidified atmosphere containing 5% CO₂. K562 (ATCC, CCL-243) and SKOV3 (ATCC, HTB-77) cells were maintained in RPMI and McCoy’s 5A medium, respectively, with 10% FBS and 1% penicillin-streptomycin at 37 °C in a humidified atmosphere containing 5% CO₂. iPSCs (WiCell, WC035i-SOD1-D90D) were maintained in mTeSR1 Plus medium (Stem Cell Technologies) on Matrigel Growth Factor Reduced Basement Membrane Matrix (Corning) in a 5% CO₂, 37 °C incubator. The use of iPSC lines in this study was approved by the SKKU Institutional Review Board.

For CD34⁺ HSC isolation, cord blood was transferred into a 500 mL bottle. The blood was diluted 1:1 with PBS and carefully layered over Ficoll-Paque (15 mL per 50 mL tube) using 30 mL of diluted blood to maintain a 1:1:1 ratio. After centrifugation at 300 × g for 20 min at room temperature, the mononuclear cell layer was collected using a sterile transfer pipette and pooled into new 50 mL tubes. Cells were washed with PBS and centrifuged at 300 × g for 10 min at room temperature. Red blood cells were lysed with 6 mL of ACK lysis buffer at 37 °C for 2–3 min with gentle pipetting, followed by further PBS washes and centrifugation. The cell pellet was resuspended in PBS and combined, and cell counts were determined using diluted aliquots. For magnetic separation, cells were resuspended in MACS buffer (2 mM EDTA, 0.5% BSA in PBS, 0.22 μm filtered) and incubated with FcR blocking reagent and CD34 microbeads (Miltenyi Biotec, 130-046-702) on ice for 30 min. Following a wash with 10 mL MACS buffer and centrifugation at 300 × g for 5 min, the cell suspension was applied to an LS column (Miltenyi Biotec, 130-042-401) mounted on a MACS separator. The column was washed three times with 1 mL MACS buffer, and the CD34⁺ cells were eluted with 5 mL MACS buffer.

For transfection experiments, HEK293T cells were seeded in 24-well plates at a density of 1 × 10⁵ cells per well. Cells were co-transfected with 1500 ng of base editor plasmid and 500 ng of guide RNA-encoding plasmid using Lipofectamine 2000 (Thermo Fisher Scientific) according to the manufacturer’s protocol. For K562 and SKOV3 cells, 2 × 10⁵ cells were electroporated with 250 ng of gRNA plasmids and 750 ng of CBE using the SF Cell Line Nucleofector X Kit (Lonza) via the 4D-Nucleofector system. For iPSCs and HSCs, 2 × 10⁵ cells were electroporated with 2 µg of gRNA plasmids and 6 µg of CBE using the P3 Cell Line Nucleofector X Kit (Lonza) via the 4D-Nucleofector system.

For orthogonal R-loop assays measuring gRNA-independent off-target editing, HEK293T cells were co-transfected with 600 ng of base editor plasmid, 400 ng of gRNA plasmid, 600 ng of dSaCas9 plasmid, and 400 ng of SaCas9 gRNA plasmid using 3 μL of Lipofectamine 2000. 72 h post-transfection, genomic DNA was extracted using the DNeasy Blood & Tissue Kit (Qiagen) following the manufacturer’s instructions.

Digenome-seq

Genomic DNA was extracted using the DNeasy Tissue Kit (Qiagen) according to the manufacturer’s instructions. In vitro deamination was initiated by pre-incubating 100 nM Sdd7 protein with 300 nM gRNA at room temperature for 10 min. The resulting complex was then added to 10 μg of genomic DNA in a reaction buffer (100 mM NaCl, 50 mM Tris–HCl, 10 mM MgCl₂, and 100 μg/mL BSA) to a final volume of 1000 μL, and the reaction was incubated at 37 °C for 8 h. Following deamination, the genomic DNA was purified using the DNeasy Tissue Kit (Qiagen), and RNase A (50 μg/mL) was added to remove residual gRNA. Next, 2 μg of the purified genomic DNA was incubated with 6 units of USER enzyme in a 100 μL reaction at 37 °C for 3 h, after which a second purification was performed using the DNeasy Blood & Tissue Kit (Qiagen). The digested genomic DNA was then subjected to whole-genome sequencing at 30–40× coverage using an Illumina NovaSeq 6000 sequencer (Macrogen). Sequence reads were aligned using the Isaac aligner, and DNA cleavage sites were identified using the Digenome toolkit (https://github.com/chizksh/digenome-toolkit2).

Transcriptome sequencing

Total RNA was extracted 48 h post-transfection using the RNeasy Mini Kit (Qiagen) according to the manufacturer’s instructions. RNA libraries were prepared with the TruSeq Stranded Total RNA Library Prep Gold kit (Illumina) and their quality was assessed using the Agilent 2200 TapeStation with the D1000 ScreenTape system. Sequencing was performed at Macrogen on a NovaSeq 6000 sequencer (Illumina) using paired-end reads (2 × 100 bp).

For RNA sequencing data analysis, we employed a previously validated RNA variant calling pipeline designed for off-target RNA base editing25. The NGS data were aligned to the hg38 (release v.105) human reference genome using STAR aligner (v.2.7.10a). Subsequent processing for RNA variant calling was performed using MarkDuplicates, BaseRecalibrator, ApplyBQSR, and HaplotypeCaller from the GATK package (v.4.2.4.1). RNA variant loci were filtered by comparing them to control samples; in experimental set replicate #1, untreated replicate #2 served as the control, and vice versa for replicate set #2. Variant loci with a minimum variant count of 2 and a read depth of at least 10 were retained, while those present in the control sample or with indeterminate calls due to low sequencing depth in the control were excluded. C-to-T editing was quantified as variant loci showing C-to-T changes on the positive strand or G-to-A changes on the negative strand, relative to total RNA editing. Similarly, A-to-G editing was quantified as loci exhibiting A-to-G editing on the positive strand or T-to-C editing on the negative strand.
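The locus-level filtering and strand-aware classification described above can be sketched as follows. This is a minimal illustration, not the pipeline used in this study: the `Variant` record, the function names, and the reuse of the depth threshold of 10 to decide whether a control call is indeterminate are all assumptions made for the example.

```python
from dataclasses import dataclass

MIN_ALT_COUNT = 2   # minimum variant count stated in the text
MIN_DEPTH = 10      # minimum read depth stated in the text


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one RNA variant locus."""
    chrom: str
    pos: int
    ref: str
    alt: str
    strand: str      # "+" or "-" (assumed to come from the annotation)
    alt_count: int
    depth: int


def filter_variants(treated, control_variant_loci, control_depth):
    """Keep treated loci that pass thresholds and are clean in the control.

    control_variant_loci: set of (chrom, pos) called as variant in the control.
    control_depth: dict mapping (chrom, pos) to control read depth.
    Assumption: a control depth below MIN_DEPTH counts as indeterminate.
    """
    kept = []
    for v in treated:
        locus = (v.chrom, v.pos)
        if v.alt_count < MIN_ALT_COUNT or v.depth < MIN_DEPTH:
            continue  # fails count or depth threshold in the treated sample
        if locus in control_variant_loci:
            continue  # variant also present in the control
        if control_depth.get(locus, 0) < MIN_DEPTH:
            continue  # control call indeterminate due to low depth
        kept.append(v)
    return kept


def edit_class(v):
    """Classify a locus as C-to-T, A-to-G, or other, accounting for strand."""
    change = (v.ref, v.alt)
    if (v.strand == "+" and change == ("C", "T")) or \
       (v.strand == "-" and change == ("G", "A")):
        return "C-to-T"
    if (v.strand == "+" and change == ("A", "G")) or \
       (v.strand == "-" and change == ("T", "C")):
        return "A-to-G"
    return "other"


def fraction(variants, label):
    """Fraction of retained loci belonging to one edit class."""
    if not variants:
        return 0.0
    return sum(edit_class(v) == label for v in variants) / len(variants)
```

Calling `fraction(kept, "C-to-T")` on the filtered list gives the share of C-to-T loci relative to all retained RNA variant loci, which is how the quantification above is read in this sketch.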

Electrostatic potential calculation

Electrostatic potential surfaces were computed using ChimeraX (version 1.9)32. Atomic partial charges and coordinates were employed to calculate the potential based on Coulomb’s law, and the resulting values are reported in kcal/(mol·e) at 298 K.
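For orientation, a Coulomb's-law potential at a single point can be sketched as below. This is an illustrative calculation under stated assumptions, not the ChimeraX implementation: the function name, the input layout, and the constant dielectric (with its default of 4.0) are hypothetical choices, and the dielectric settings actually used are not given in the text. The conversion factor 332.06 kcal·Å/(mol·e²) yields values in kcal/(mol·e) when charges are in units of e and distances in Å.

```python
import math

# Coulomb conversion factor so that q/d (e/Å) gives kcal/(mol·e).
COULOMB_CONSTANT = 332.06  # kcal·Å/(mol·e²)


def coulomb_potential(point, atoms, dielectric=4.0):
    """Electrostatic potential at `point` in kcal/(mol·e).

    point: (x, y, z) in Å.
    atoms: iterable of (x, y, z, partial_charge) with charge in units of e.
    dielectric: constant dielectric; the default is an assumption for this
    sketch, not a value reported by the authors.
    """
    total = 0.0
    for x, y, z, charge in atoms:
        distance = math.dist(point, (x, y, z))
        if distance == 0.0:
            raise ValueError("evaluation point coincides with an atom")
        total += charge / (dielectric * distance)
    return COULOMB_CONSTANT * total
```

Evaluating this function at points on the molecular surface and coloring by value would give a surface of the kind described, with positive values near arginine side chains such as R119 and R153.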

Production and purification of Sdd7-eVLPs

eVLPs encapsulating Sdd7 variants (Sdd7e1 and Sdd7e2) and gRNA complex were produced in Gesicle Producer 293T cells. Gesicle Producer 293T cells were seeded in 150-mm dishes at a density of 1 × 10⁷ cells per dish. After 24 h, cells were transfected with a plasmid mixture containing 400 ng of VSV-G, 3375 ng of MMLVgag–pro–pol, 1125 ng of either pCMV-MMLVgag–3 × NES–Sdd7e1-2 × UGI or pCMV-MMLVgag–3 × NES–Sdd7e2-2 × UGI, and 4400 ng of gRNA plasmid, using polyethyleneimine (PEI; Sigma-Aldrich). 72 h post-transfection, the culture supernatant was collected and centrifuged at 500 × g for 5 min to remove cellular debris. The supernatant was then filtered through a 0.45-μm polyvinylidene difluoride (PVDF) filter (Millipore).

For concentration, the filtered supernatant was incubated with 5× PEG-it Virus Precipitation Solution (System Biosciences) at 4 °C overnight. The eVLPs were pelleted by centrifugation at 1500 × g for 1 h at 4 °C and resuspended in 1× HIV-Safe Manager buffer (Lugen Sci). The eVLPs were stored at −80 °C and thawed on ice immediately before use.

Targeted deep sequencing

Genomic DNA containing the target sites was amplified using Q5 Hot Start High-Fidelity DNA Polymerase (New England Biolabs) with primers listed in Supplementary Data 2 and 3. The PCR products, incorporating Illumina TruSeq HT dual index adapter sequences, were purified and subjected to 150 bp paired-end sequencing on the Illumina iSeq 100 platform, achieving a sequencing depth of over 5000 reads. Base editing efficiencies were calculated as the percentage of reads harboring the desired mutation within the editing window (positions 1 to 20) relative to the total number of reads, using the MAUND bioinformatics tool available at https://github.com/ibs-cge/maund. Heatmaps representing editing frequencies at each position were calculated using MAUND and subsequently generated using GraphPad Prism.
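The efficiency definition above (reads carrying the desired conversion anywhere within positions 1 to 20, divided by total reads) can be illustrated with a simplified sketch. It is not the MAUND implementation: the function names are hypothetical, and the example assumes that each read has already been trimmed and aligned to the 20-nt protospacer with no indels.

```python
def base_editing_efficiency(reference, reads, ref_base="C", edited_base="T"):
    """Percent of reads with at least one ref_base-to-edited_base change.

    reference: protospacer sequence (positions 1..len(reference)).
    reads: sequences of the same length, assumed pre-aligned without indels.
    """
    if not reads:
        raise ValueError("no reads supplied")
    edited = 0
    for read in reads:
        if len(read) != len(reference):
            raise ValueError("read length differs from reference")
        if any(r == ref_base and b == edited_base
               for r, b in zip(reference, read)):
            edited += 1
    return 100.0 * edited / len(reads)


def per_position_frequency(reference, reads, ref_base="C", edited_base="T"):
    """Percent of reads edited at each ref_base position (1-based keys)."""
    if not reads:
        raise ValueError("no reads supplied")
    frequencies = {}
    for index, base in enumerate(reference):
        if base == ref_base:
            hits = sum(read[index] == edited_base for read in reads)
            frequencies[index + 1] = 100.0 * hits / len(reads)
    return frequencies
```

The per-position dictionary corresponds to one row of a position-wise heatmap, while the first function returns the single summary percentage per target site.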

Statistics and reproducibility

All results are expressed as mean ± SEM unless indicated otherwise. Statistical analyses were performed using GraphPad Prism version 9.1.1. p-values were calculated using Student’s two-tailed t-test, with p < 0.05 considered statistically significant. Sample size was not determined by a statistical method, and no data were excluded from analysis. Experiments were not randomized, and investigators were not blinded to allocation during experiments and outcome assessment.
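As a worked illustration of the summary statistics named above, the sketch below computes mean ± SEM and a Student's t statistic using only the Python standard library. It is not the GraphPad Prism procedure: the function names are hypothetical, and the unpaired, equal-variance form of the test is an assumption, since the text does not state whether comparisons were paired. The two-tailed p-value would then be read from a t distribution with the returned degrees of freedom.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of replicates."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates")
    # SEM is the sample standard deviation divided by sqrt(n).
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


def student_t(group_a, group_b):
    """Unpaired, equal-variance Student's t statistic and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("need at least two replicates per group")
    df = n_a + n_b - 2
    pooled_variance = ((n_a - 1) * statistics.variance(group_a)
                       + (n_b - 1) * statistics.variance(group_b)) / df
    if pooled_variance == 0:
        raise ValueError("zero variance in both groups")
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    difference = statistics.mean(group_a) - statistics.mean(group_b)
    return difference / standard_error, df
```

For three replicates per group, `student_t` returns 4 degrees of freedom, and the statistic grows with the difference in means relative to the pooled spread.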

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.