Fig. 2: BE-DICT: A machine learning model for predicting base editing outcomes.

a Design of the attention-based deep learning model for predicting base editing probabilities. Given a target sequence, the model returns a probability score for the conversion of each target base. The model has three main blocks: (1) an embedding block that maps each nucleotide and its corresponding position from a one-hot encoded representation to a dense fixed-length vector; (2) an encoder block that contains a self-attention layer (with multi-head support), layer normalization31 with residual connections, and a feed-forward network; (3) an output block that contains a position attention layer and a classifier layer. b–e Average AUC achieved across five runs (interpolated) for models trained on data from high-throughput base editing experiments. f–i Line plots of the per-position accuracy of the trained models across five individual runs for each base editor, compared with the accuracy of a majority-class baseline predictor. The standard deviation is shown as a band around each line.
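
The three-block architecture described in panel a can be summarized as a small transformer-style encoder. The following PyTorch sketch illustrates the embedding, encoder, and output blocks; all class names, dimensions, and hyperparameters (`BaseEditingPredictor`, `embed_dim`, `num_heads`, etc.) are illustrative assumptions and not the published implementation.

```python
# Minimal sketch of the architecture in panel a (assumed layer sizes, not the authors' code).
import torch
import torch.nn as nn

class BaseEditingPredictor(nn.Module):
    def __init__(self, seq_len=20, vocab_size=4, embed_dim=64, num_heads=4, ff_dim=128):
        super().__init__()
        # (1) Embedding block: nucleotide + position embeddings.
        self.nuc_embed = nn.Embedding(vocab_size, embed_dim)
        self.pos_embed = nn.Embedding(seq_len, embed_dim)
        # (2) Encoder block: multi-head self-attention, layer normalization,
        #     residual connections, and a feed-forward network.
        self.self_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(embed_dim)
        self.ffn = nn.Sequential(
            nn.Linear(embed_dim, ff_dim),
            nn.ReLU(),
            nn.Linear(ff_dim, embed_dim),
        )
        self.norm2 = nn.LayerNorm(embed_dim)
        # (3) Output block: position attention followed by a per-position classifier.
        self.pos_attn_scorer = nn.Linear(embed_dim, 1)
        self.classifier = nn.Linear(embed_dim, 1)

    def forward(self, seq_tokens):
        # seq_tokens: (batch, seq_len) integer-encoded nucleotides (A/C/G/T -> 0..3).
        positions = torch.arange(seq_tokens.size(1), device=seq_tokens.device)
        x = self.nuc_embed(seq_tokens) + self.pos_embed(positions)
        # Self-attention with residual connection and layer normalization.
        attn_out, _ = self.self_attn(x, x, x)
        x = self.norm1(x + attn_out)
        # Feed-forward network with residual connection and layer normalization.
        x = self.norm2(x + self.ffn(x))
        # Position attention: softmax-normalized weights over sequence positions.
        pos_weights = torch.softmax(self.pos_attn_scorer(x), dim=1)
        x = x * pos_weights
        # Per-position probability of target base conversion.
        return torch.sigmoid(self.classifier(x)).squeeze(-1)

model = BaseEditingPredictor()
probs = model(torch.randint(0, 4, (2, 20)))  # two protospacers -> per-position conversion scores
```

The sketch mirrors the caption's structure only; the actual model may differ in depth, head count, and how the position attention and classifier are parameterized.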