Fig. 2: Comparison of the chemical space of novel macrocycles generated by Macformer (trained with five-fold data augmentation) and by MacLS_extra on the ChEMBL test and ZINC datasets.

a Distribution of the average Tanimoto coefficient (Tc) between the generated novel macrocycles and the ground-truth target macrocycles. ChEMBL, Macformer, ×5, n = 23772; ChEMBL, MacLS_extra, n = 23765; ZINC, Macformer, ×5, n = 5514; ZINC, MacLS_extra, n = 5551. b UMAP plot of the 1024-bit Morgan fingerprints of the linkers in the ChEMBL training dataset (n = 9243 linkers) and of the novel linkers generated by Macformer on the ChEMBL test (n = 9039 linkers) and ZINC (n = 2082 linkers) datasets. c Retrospective macrocyclization of a Checkpoint Kinase 1 (CHK1) inhibitor (ref. 64) by Macformer. The Tc values between the generated novel compounds and the target compounds are labeled. Source data are provided as a Source Data file.
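
For readers unfamiliar with the similarity measure in panel a, the Tanimoto coefficient of two binary fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below is illustrative only: it represents a fingerprint as a set of on-bit indices, and it assumes that the "average" is the mean over the compounds generated for a single target, which the caption does not specify. The function names and the toy bit sets are hypothetical and are not taken from the Macformer code.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / union


def mean_tanimoto(target: Set[int], generated: List[Set[int]]) -> float:
    """Mean Tanimoto coefficient between one target and its generated compounds.

    Assumption: averaging per target; the caption does not state the scheme.
    """
    if not generated:
        return 0.0
    return sum(tanimoto(target, g) for g in generated) / len(generated)


# Toy on-bit indices standing in for 1024-bit Morgan fingerprints.
target_bits = {1, 5, 9, 12}
generated_bits = [{1, 5, 9, 40}, {1, 5, 9, 12}, {2, 5, 77}]

per_compound = [tanimoto(target_bits, g) for g in generated_bits]
average = mean_tanimoto(target_bits, generated_bits)
```

In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets; the set arithmetic is the same.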