A deep learning approach for rational ligand generation with toxicity control via reactive building blocks

Li, Pengyong; Zhang, Kaihao; Liu, Tianxiao; Lu, Ruiqiang; Chen, Yangyang; Yao, Xiaojun; Gao, Lin; Zeng, Xiangxiang

doi:10.1038/s43588-024-00718-0

Article
Published: 08 November 2024

Pengyong Li¹,
Kaihao Zhang ORCID: orcid.org/0009-0001-8405-3905²,
Tianxiao Liu¹,
Ruiqiang Lu³,
Yangyang Chen⁴,
Xiaojun Yao³,
Lin Gao¹ &
Xiangxiang Zeng ORCID: orcid.org/0000-0003-1081-7658⁵

Nature Computational Science volume 4, pages 851–864 (2024)

Abstract

Deep generative models are gaining attention in the field of de novo drug design. However, the rational design of ligand molecules for novel targets remains challenging, particularly in controlling the properties of the generated molecules. Here, inspired by the DNA-encoded compound library technique, we introduce DeepBlock, a deep learning approach for block-based ligand generation tailored to target protein sequences while enabling precise property control. DeepBlock divides the generation process into two steps: building-block generation, accomplished by a neural network, and molecule reconstruction, accomplished by a rule-based reconstruction algorithm that we propose. Furthermore, DeepBlock combines an optimization algorithm with deep learning to regulate the properties of the generated molecules. Experiments show that DeepBlock outperforms existing methods in the affinity, synthetic accessibility and drug-likeness of the generated ligands. Moreover, when integrated with simulated annealing or Bayesian optimization using toxicity as the optimization objective, DeepBlock successfully generates ligands with low toxicity while preserving affinity with the target.
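
To make the optimization step concrete, the following is a minimal sketch of simulated annealing with a Metropolis acceptance rule, written under loud assumptions: it is not the DeepBlock implementation. The state is a plain list of floats, `toy_toxicity` is a hypothetical stand-in for a toxicity predictor, and the cooling schedule, step size and all function names are illustrative choices that do not come from the paper. Affinity preservation is not modelled here.

```python
import math
import random


def toy_toxicity(z):
    """Hypothetical stand-in for a toxicity predictor (lower is better)."""
    return sum((x - 1.0) ** 2 for x in z)


def simulated_annealing(objective, z0, steps=2000, t_start=1.0, t_end=1e-3,
                        step_size=0.1, seed=0):
    """Minimise `objective` from `z0` using a geometric cooling schedule."""
    rng = random.Random(seed)
    current, current_score = list(z0), objective(z0)
    best, best_score = list(current), current_score
    for k in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        temperature = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        # Propose a neighbour by adding Gaussian noise to every coordinate.
        candidate = [x + rng.gauss(0.0, step_size) for x in current]
        candidate_score = objective(candidate)
        delta = candidate_score - current_score
        # Metropolis rule: always accept improvements; accept a worse
        # candidate with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score < best_score:
                best, best_score = list(current), current_score
    return best, best_score


start = [0.0] * 8
best_z, best_tox = simulated_annealing(toy_toxicity, start)
print(f"start: {toy_toxicity(start):.3f}  best: {best_tox:.3f}")
```

In a real pipeline, the objective would presumably score molecules decoded from the current state rather than the state itself; the sketch only shows the accept/reject loop and the role of the temperature.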

**Fig. 1: Overview of the DeepBlock framework.**

**Fig. 2: Comparison of the proposed method and baselines.**

**Fig. 3: Binding contribution of a residue.**

**Fig. 4: Case study of ligands generation for KIAA1363.**

**Fig. 5: Comparison of affinity before and after optimization.**

**Fig. 6: The optimization processes and results.**

Data availability

Source data for Figs. 2–6 are provided with this paper. All of the datasets used in this study are publicly available. The raw data of the CrossDocked 2020 dataset were obtained from https://github.com/gnina/models/tree/master/data/CrossDocked2020. The dataset for pretraining BGNet was obtained from the ChEMBL dataset (https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_31/). The processed datasets used to train the model are available via figshare⁵⁶ at https://figshare.com/articles/dataset/crossdocked_pocket10_with_protein_tar_gz/25878871. The small-mouse intraperitoneal LD₅₀ sub-dataset was obtained from TOXRIC (https://toxric.bioinforai.tech/home).

Code availability

The source code and weights of trained models are available on GitHub at https://github.com/BioChemAI/DeepBlock and deposited on Zenodo at https://doi.org/10.5281/zenodo.13852436 (ref. ⁵⁷).

References

Shoichet, B. K. Virtual screening of chemical libraries. Nature 432, 862–865 (2004).
Meyers, J., Fabian, B. & Brown, N. De novo molecular design and generative models. Drug Discov. Today 26, 2707–2715 (2021).
Wang, M. et al. Deep learning approaches for de novo drug design: an overview. Curr. Opin. Struct. Biol. 72, 135–144 (2022).
Segler, M. H., Kogej, T., Tyrchan, C. & Waller, M. P. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 4, 120–131 (2018).
Moret, M., Friedrich, L., Grisoni, F., Merk, D. & Schneider, G. Generative molecular design in low data regimes. Nat. Mach. Intell. 2, 171–180 (2020).
Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).
Jin, W., Barzilay, R. & Jaakkola, T. Junction tree variational autoencoder for molecular graph generation. In Proc. 35th International Conference on Machine Learning, Proc. Machine Learning Research Vol. 80 (eds Dy, J. & Krause, A.) 2323–2332 (PMLR, 2018).
Li, Y., Zhang, L. & Liu, Z. Multi-objective de novo drug design with conditional graph generative model. J. Cheminform. 10, 33 (2018).
Putin, E. et al. Reinforced adversarial neural computer for de novo molecular design. J. Chem. Inf. Model. 58, 1194–1204 (2018).
Zang, C. & Wang, F. MoFlow: an invertible flow model for generating molecular graphs. In Proc. 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 617–626 (ACM, 2020).
Kuznetsov, M. & Polykovskiy, D. MolGrow: a graph normalizing flow for hierarchical molecular generation. Proc. AAAI Conf. Artif. Intell. 35, 8226–8234 (2021).
Hoogeboom, E., Satorras, V. G., Vignac, C. & Welling, M. Equivariant diffusion for molecule generation in 3D. In Proc. 39th International Conference on Machine Learning, Proc. Machine Learning Research Vol. 162 (eds Chaudhuri, K. et al.) 8867–8887 (PMLR, 2022).
Schneuing, A. et al. Structure-based drug design with equivariant diffusion models. Preprint at https://arxiv.org/abs/2210.13695 (2022).
Li, J. et al. Mining for potent inhibitors through artificial intelligence and physics: a unified methodology for ligand based and structure based drug design. J. Chem. Inf. Model. https://doi.org/10.1021/acs.jcim.4c00634 (2024).
Ragoza, M., Masuda, T. & Koes, D. R. Generating 3D molecules conditional on receptor binding sites with deep generative models. Chem. Sci. 13, 2701–2713 (2022).
Luo, S., Guan, J., Ma, J. & Peng, J. A 3D generative model for structure-based drug design. Adv. Neural Inf. Process. Syst. 34, 6229–6239 (2021).
Gao, W. & Coley, C. W. The synthesizability of molecules proposed by generative models. J. Chem. Inf. Model. 60, 5714–5723 (2020).
Brenner, S. & Lerner, R. A. Encoded combinatorial chemistry. Proc. Natl Acad. Sci. USA 89, 5381–5383 (1992).
Liu, R., Li, X. & Lam, K. S. Combinatorial chemistry in drug discovery. Curr. Opin. Chem. Biol. 38, 117–126 (2017).
Bertsimas, D. & Tsitsiklis, J. Simulated annealing. Stat. Sci. 8, 10–15 (1993).
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
Degen, J., Wegscheid-Gerlach, C., Zaliani, A. & Rarey, M. On the art of compiling and using ‘drug-like’ chemical fragment spaces. ChemMedChem 3, 1503–1507 (2008).
Mendez, D. et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 47, D930–D940 (2019).
Wang, R., Fang, X., Lu, Y. & Wang, S. The PDBbind database: collection of binding affinities for protein–ligand complexes with known three-dimensional structures. J. Med. Chem. 47, 2977–2980 (2004).
Jessani, N., Liu, Y., Humphrey, M. & Cravatt, B. F. Enzyme activity profiles of the secreted and membrane proteome that depict cancer cell invasiveness. Proc. Natl Acad. Sci. USA 99, 10335–10340 (2002).
Chiang, K. P., Niessen, S., Saghatelian, A. & Cravatt, B. F. An enzyme that regulates ether lipid signaling pathways in cancer annotated by multidimensional profiling. Chem. Biol. 13, 1041–1050 (2006).
Chang, J. W., Nomura, D. K. & Cravatt, B. F. A potent and selective inhibitor of KIAA1363/AADACL1 that impairs prostate cancer pathogenesis. Chem. Biol. 18, 476–484 (2011).
Francoeur, P. G. et al. Three-dimensional convolutional neural networks and a cross-docked data set for structure-based drug design. J. Chem. Inf. Model. 60, 4200–4215 (2020).
Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).
Jänne, P. et al. KRYSTAL-1: activity and safety of adagrasib (MRTX849) in advanced/metastatic non-small cell lung cancer (NSCLC) harboring KRAS^G12C mutation. Eur. J. Cancer 138, S1–S2 (2020).
Landrum, G. RDKit: open-source cheminformatics. RDKit http://www.rdkit.org (2006).
Zhao, T., Zhao, R. & Eskenazi, M. Learning discourse-level diversity for neural dialog models using conditional variational autoencoders. In Proc. 55th Annual Meeting of the Association for Computational Linguistics Vol. 1 (eds Barzilay, R. & Kan, M.) 654–664 (ACL, 2017).
Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. Preprint at https://arxiv.org/abs/1312.6114 (2014).
Chung, J., Gulcehre, C., Cho, K. & Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. Preprint at https://arxiv.org/abs/1412.3555 (2014).
Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. North American Chapter of the Association for Computational Linguistics Vol. 1 (eds Burstein, J. et al.) 4171–4186 (ACL, 2019).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2015).
Bowman, S. R. et al. Generating sentences from a continuous space. In Proc. 20th SIGNLL Conference on Computational Natural Language Learning (eds Riezler, S. & Goldberg, Y.) 10–21 (ACL, 2016).
Riniker, S. & Landrum, G. A. Better informed distance geometry: using what we know to improve conformation generation. J. Chem. Inf. Model. 55, 2562–2574 (2015).
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. & Teller, E. Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 1087–1092 (1953).
Kirkpatrick, S., Gelatt, C. D. & Vecchi, M. P. Optimization by simulated annealing. Science 220, 671–680 (1983).
Bajusz, D., Rácz, A. & Héberger, K. Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J. Cheminform. 7, 20 (2015).
Jain, S. et al. Large-scale modeling of multispecies acute toxicity end points using consensus of multitask deep learning methods. J. Chem. Inf. Model. 61, 653–663 (2021).
Liwanag, P. M., Hudson, V. W. & Hazard, G. F. Jr. ChemIDplus: a web-based chemical search system. NLM https://www.nlm.nih.gov/pubs/techbull/ma00/ma00_chemid.html (2000).
Wu, L. et al. TOXRIC: a comprehensive database of toxicological data and benchmarks. Nucleic Acids Res. 51, D1432–D1445 (2023).
Durant, J. L., Leland, B. A., Henry, D. R. & Nourse, J. G. Reoptimization of MDL keys for use in drug discovery. J. Chem. Inf. Comput. Sci. 42, 1273–1280 (2002).
Le, T. T., Fu, W. & Moore, J. H. Scaling tree-based automated machine learning to biomedical big data with a feature set selector. Bioinformatics 36, 250–256 (2020).
Cao, Y., Goodin, D. & Mcree, D. Probing the strength and character of an Asp-His-x hydrogen bond by introducing buried charges. PDB https://doi.org/10.2210/pdb1a2g/pdb (1998).
Alhossary, A., Handoko, S. D., Mu, Y. & Kwoh, C.-K. Fast, accurate, and reliable molecular docking with QuickVina 2. Bioinformatics 31, 2214–2216 (2015).
Trott, O. & Olson, A. J. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 31, 455–461 (2010).
Eberhardt, J., Santos-Martins, D., Tillack, A. F. & Forli, S. AutoDock Vina 1.2.0: new docking methods, expanded force field, and Python bindings. J. Chem. Inf. Model. 61, 3891–3898 (2021).
Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S. & Hopkins, A. L. Quantifying the chemical beauty of drugs. Nat. Chem. 4, 90–98 (2012).
Ertl, P. & Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminform. 1, 8 (2009).
Wildman, S. A. & Crippen, G. M. Prediction of physicochemical parameters by atomic contributions. J. Chem. Inf. Comput. Sci. 39, 868–873 (1999).
Chen, B., Li, C., Dai, H. & Song, L. Retro*: learning retrosynthetic planning with neural guided A* search. In Proc. 37th International Conference on Machine Learning Vol. 119 (eds Daumé, H. III & Singh, A.) 1608–1616 (PMLR, 2020).
Bemis, G. W. & Murcko, M. A. The properties of known drugs. 1. Molecular frameworks. J. Med. Chem. 39, 2887–2893 (1996).
Zhang, K. & Li, P. crossdocked_pocket10_with_protein.tar.gz. figshare https://figshare.com/articles/dataset/crossdocked_pocket10_with_protein_tar_gz/25878871 (2024).
Li, P. & Zhang, K. BioChemAI/DeepBlock. Zenodo https://doi.org/10.5281/zenodo.13852436 (2024).

Acknowledgements

This work was supported by the National Science and Technology Major Project (2023ZD0120902 to X.Z.) and the National Natural Science Foundation of China (62202353 to P.L.; U22A2037, 62425204, 62122025, 62450002 and 62432011 to X.Z.).

Author information

Authors and Affiliations

School of Computer Science and Technology, Xidian University, Xi’an, China
Pengyong Li, Tianxiao Liu & Lin Gao
School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China
Kaihao Zhang
Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao, China
Ruiqiang Lu & Xiaojun Yao
University of Tsukuba, Tsukuba, Japan
Yangyang Chen
College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
Xiangxiang Zeng

Contributions

P.L. and X.Z. conceived the research project. P.L. and K.Z. designed and implemented the framework. P.L., X.Y., L.G. and X.Z. designed the experiments. P.L., K.Z., T.L., Y.C., X.Y., L.G. and X.Z. conducted the experiments and results analyses. R.L. conducted the molecular dynamics simulation. All the authors discussed the experimental results and commented on the paper.

Corresponding author

Correspondence to Xiangxiang Zeng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Computational Science thanks Kil To Chong and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Kaitlin McCardle, in collaboration with the Nature Computational Science team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–20, Tables 1 and 2, and Algorithms 1–3.

Reporting Summary

Source data

Source Data Figs. 2–6

Source data for Figs. 2, 3, 4, 5 and 6, respectively.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, P., Zhang, K., Liu, T. et al. A deep learning approach for rational ligand generation with toxicity control via reactive building blocks. Nat Comput Sci 4, 851–864 (2024). https://doi.org/10.1038/s43588-024-00718-0

Received: 09 May 2024
Accepted: 07 October 2024
Published: 08 November 2024
Issue Date: November 2024
DOI: https://doi.org/10.1038/s43588-024-00718-0
