Identification of nanomolar adenosine A2A receptor ligands using reinforcement learning and structure-based drug design

Thomas, Morgan; Matricon, Pierre G.; Gillespie, Robert J.; Napiórkowska, Maja; Neale, Hannah; Mason, Jonathan S.; Brown, Jason; Harwood, Kaan; Fieldhouse, Charlotte; Swain, Nigel A.; Geng, Tian; O’Boyle, Noel M.; Deflorian, Francesca; Bender, Andreas; de Graaf, Chris

doi:10.1038/s41467-025-60629-0

Article
Open access
Published: 01 July 2025

Nature Communications volume 16, Article number: 5485 (2025)

Abstract

Generative chemical language models (CLMs) have demonstrated success in learning language-based molecular representations for de novo drug design. Here, we integrate structure-based drug design (SBDD) principles with CLMs to go from protein structure to novel small-molecule ligands, without a priori knowledge of ligand chemistry. Using Augmented Hill-Climb, we successfully optimise multiple objectives within a practical timeframe, including protein-ligand complementarity. Resulting de novo molecules contain known or promising adenosine A_2A receptor ligand chemistry that is not available in commercial vendor libraries, accessing commercially novel areas of chemical space. Experimental validation demonstrates a binding hit rate of 88%, with 50% having confirmed functional activity, including three nanomolar ligands and two novel chemotypes. The two strongest binders are co-crystallised with the A_2A receptor, revealing their binding mechanisms that can be used to inform future iterations of structure-based de novo design, closing the AI SBDD loop.

Introduction

The discovery and design of small molecules that meet specific target endpoints in drug discovery is a formidable challenge. Traditionally, structure-based hit discovery involves computationally screening commercially available or proprietary libraries via docking into the binding pocket, with expected hit rates between 0.02% and 34.8%¹. However, the rapid expansion of virtual libraries² renders brute force search practically intractable without the use of artificial intelligence (AI) to narrow the search space^3,4. Generative AI, on the other hand, can learn larger chemical spaces⁵ (up to 1000-fold greater⁶), which are also faster to traverse due to their implicit nature, promising to significantly enhance drug discovery efficiency⁷.

Chemical language models (CLMs)^8,9,10 are a particular class of neural networks trained on datasets using a chemical language representation such as SMILES¹¹. These models can be combined with optimisation techniques such as reinforcement learning (RL)^12,13,14 to guide de novo molecule generation towards specific endpoints by a reward signal computed by molecule evaluation functions or scoring functions. Furthermore, these models have shown consistent state-of-the-art performance on several benchmarks^{14,15,16,17,18} and are the most used models for generative molecular design¹⁹. They are also among the most extensively experimentally validated^{20,21,22,23,24,25}, as recently reviewed²⁶. However, none of the experimental validations have used CLMs for protein structure-based drug design (SBDD).

The predominant benefit of structure-based over ligand-based drug design is the ability to explore novel chemical spaces that are complementary to protein structure, unrestricted by known ligand space. Retrospective analyses have shown additional benefits, including increased diversity of known ligand chemistry rediscovered²⁷ compared to the use of ligand-based approaches, which can exacerbate generative model failure modes²⁸. However, the practical implementation of SBDD often comes at the expense of increased computational run time^27,29. Therefore, the use of SBDD as scoring functions in RL is limited by the learning efficiency of the algorithm (i.e., how many samples are required), a topic which is becoming of increasing importance^{16,30,31,32,33}.

In this work, we use a CLM⁸ in combination with a sample-efficient RL algorithm, Augmented Hill-Climb (AHC)³⁰, to design putative adenosine A_2A receptor ligands. The overall workflow is demonstrated in Fig. 1. The increased efficiency and hence reduced run-time enable extensive exploration of the effects of protein co-crystal structure and scoring function protocol on de novo chemistry. We further synthesize nine proposed molecules and test bioactivity with respect to both receptor binding and functional activity. The two most potent nanomolar compounds are co-crystallised with the A_2A receptor to improve our understanding of receptor binding.

**Fig. 1: Overview of the workflow presented here.**

Results

Design of putative A_2A receptor ligands

To generate de novo molecules with optimal properties, we used a CLM trained using RL. First, a recurrent neural network was trained on next-token prediction using maximum likelihood estimation on 189,238 SMILES strings extracted from the ChEMBL database³⁴, constituting our CLM. Second, this CLM underwent further training using RL. In RL, the CLM equates to a policy that decides which action (next token) to take given the current state (previously observed tokens), so as to learn how to maximise a reward. We used the AHC³⁰ algorithm for its improved learning efficiency compared to baseline algorithms^12,35,36,37; it fine-tunes the CLM to maximise a reward bounded in the range [0, 1]. Note that a copy of the pre-trained CLM is kept during RL and is used as a prior policy to regularise learning and maintain the chemical principles initially learned.

AHC was then used to train the CLM to generate molecules optimal against each of the seven A_2A receptor crystal structures over the course of sampling 12,800 de novo molecules per structure. The reward maximised was formulated to reflect molecular desirability by combining the predicted protein complementarity according to the GlideSP docking score and four secondary objectives to encourage more favourable drug-like properties. These secondary objectives included synthesisability³⁸, predicted logP, hydrogen bond donor count, and the maximum number of consecutive rotatable bonds present, thus presenting a more realistic and challenging multi-objective optimisation problem.

Figure 2 demonstrates that AHC successfully generated molecules with favourable docking scores for the respective A_2A receptor structure while maintaining secondary objectives within desirable ranges, highlighting the powerful multi-objective optimisation ability of AHC within a restricted budget of just 12,800 molecules. To further understand how the optimisation of the structure-based objective influenced de novo chemistry, we compared the generated molecules to known A_2A receptor ligands (sourced from GPCRBench³⁹ and updated with newer data from ChEMBL29³⁴ and Reaxys), classified by chemotype. Note that A_2A receptor ligand chemotypes were defined manually by in-house project teams utilising X-ray and docked co-structures as per GPCRBench³⁹. Figure 3 shows that each crystal structure led to rediscovery of approximately 25 unique chemotypes known to be present in A_2A ligands, except for 6ZDR, which led to the rediscovery of approximately 18. This corresponded to a range of 2000 to 4000 molecules per experiment with known A_2A chemotypes out of the 12,800 (~15–30%), indicating alignment between the structure-based objective and known A_2A ligands. Interestingly, only one to two known active molecules were rediscovered per experiment meaning that almost all de novo molecules are novel, providing a significant enrichment around known A_2A ligand chemistry. The number and type of chemotypes rediscovered varied depending on the structure used; for example, only the experiment with the 5OLV structure rediscovers the 2c6 chemotype. All chemotypes rediscovered are detailed in Supplementary Fig. 2.

**Fig. 2: Optimisation of the structure-based objective repeated over seven different experiments each with a different A_2A receptor structure.**

**Fig. 3: Chemistry assessment of de novo molecules with respect to known A_2A ligands and their chemotypes.**

Another important consideration is novelty with respect to commercial vendor libraries. Figure 3e shows that approximately 10,000 molecules are novel with respect to a combination of MolPort, ChemSpace and Aldrich (a total of ~11.7 M purchasable compounds at the time of this study). Of these, about 2000 to 4000 contained known A_2A chemotypes, highlighting promising compounds that traditional virtual screening methods would not discover, thus delving into novel chemical space.

Overall, structure-based drug design objectives can be efficiently optimized with a CLM and AHC, resulting in the rediscovery of commercially novel but relevant ligand chemistry, dependent on the choice of structure used.

Despite clear enrichment of known A_2A receptor ligand chemistry, the majority of molecules do not possess known chemotypes. However, chemotypes that have multiple corresponding de novo molecules with high scores could represent promising novel chemotypes compared to known A_2A receptor ligands. To identify promising unknown chemotypes, molecules with a GlideSP score less than −10 were pooled across all experiments and unknown chemotypes extracted. Note that of the 10,363 de novo molecules with a GlideSP score less than −10, only 3101 possessed unknown chemotypes, indicating that at this threshold, known A_2A chemotypes are much more prevalent. The unknown chemotypes were clustered using the Tanimoto similarity (using ECFP4 fingerprints) of the Bemis-Murcko scaffolds with a similarity cut-off of 0.2, and the maximum common substructure of each cluster was identified. This resulted in 65 chemotypes, each derived from a cluster containing at least 3 example molecules (see Supplementary Fig. 3). Many of these chemotypes still display functional groups important for interacting with N253^6.55 (Ballesteros-Weinstein generic residue numbers are shown in superscript), and therefore are of interest to validate as potentially novel ligand series.

As previously established, the protein structure used influences the resulting de novo chemistry. We also tested further modifications, such as the effect of fine-tuning the CLM on an A_2A receptor ligand dataset prior to applying AHC. While this initially increased the number of unique chemotypes rediscovered, the number decreased over time due to the diversity filter penalising repeated areas of chemical space. By the end of the AHC-based optimisation the non-fine-tuned CLM generated a higher proportion of molecules with known A_2A chemotypes (see Supplementary Fig. 4), indicating that the effect of fine-tuning the CLM is short-lived as RL updates the model parameters.

We also investigated the effect of different docking constraints, such as enforcing a docked pose to interact with specific sub-pockets based on pharmacophoric features. The default constraints used for all seven experiments required either a hydrogen bond donor or acceptor interaction with N253^6.55, and the occupation by a lipophilic moiety of either of the sub-pockets II and III (see Fig. 4)^40,41. These constraints were made stricter on a subset of three crystal structures (4EIY, 5OLH and 6GT3) selected to balance rediscovery of A_2A chemotypes and novelty. Figure 4 shows that stricter docking constraints lead to a greater rediscovery of unique chemotypes. However, it is not predictable whether stricter constraints result in more de novo molecules with known A_2A chemotypes. Enforcing occupation of both lipophilic sub-pockets decreases the number of total molecules with A_2A chemotypes, while requiring both a hydrogen bond donor and acceptor interaction with N253^6.55 increases this number. Experimenting with different docking constraints resulted in the rediscovery of four additional chemotypes compared to the default constraints. Moreover, we observed an increase in the quality of predicted poses of the de novo molecules, although this is more challenging to measure quantitatively. The resultant de novo molecules from these additional experiments were pooled together with the previous seven experiments for further analysis, filtering, and selection.

**Fig. 4: The effect of additional docking constraints on a subset of three crystal structures 4EIY, 5OLH and 6GT3.**

Selection of de novo ligands for experimental validation

To select candidates for synthesis and testing, the de novo proposals were filtered down. First, the top 1000 unique molecules from each experiment, based on the multi-objective desirability score, were visually inspected. From this, 427 molecules of interest were inspected and selected based on the predicted binding pose. Selection criteria included: (i) H-bond interaction with N253^6.55; (ii) interaction with at least two of the following three lipophilic hotspots II, III, IV, displacing energetically unfavourable unhappy water molecules located in these binding site regions (Fig. 4). Analysis of these 427 compounds revealed that 6 were already known ligands (i.e. 98.6% were novel with respect to known ligands) and 8 were available for purchase in vendor libraries (i.e. 98.1% were novel with respect to commercial libraries). The majority (75.6%) contained known A_2A chemotypes, providing confidence that the chemistry was novel yet relevant. To estimate synthetic feasibility, AiZynthFinder⁴², which has a reported accuracy of up to 80%⁴³, was used to predict synthetic routes. 71% of the 427 molecules were predicted to be synthesisable via a proposed synthetic route.

To complement this, WaterFLAP⁴⁴ was used to calculate the pseudo-apo water network for each A_2A structure binding site as well as perturbation of the pseudo-apo water network upon ligand binding for the 427 molecules and predicted binding poses. This helped to de-prioritize molecules predicted to trap water molecules unfavourably in the lipophilic pockets without a sufficient supporting water network or hydrogen bond donors/acceptors. Based on these analyses, a panel of 41 predicted synthesisable de novo molecules was proposed for synthesis, of which 9 were synthesised for experimental validation based on synthetic feasibility and diversity. Synthetic chemistry routes of this triaged set of compounds were defined based on organic chemistry knowledge and established synthesis protocols. Chemical diversity-based selection was guided by ECFP4 fingerprint clustering. The final 9 molecules are shown in Fig. 5 with their corresponding predicted binding pose in Fig. 6. Four of these molecules contain known A_2A chemotypes, while the remainder contain potentially novel bioactive chemotypes. None of the compounds were available for purchase from a commercial library, and none had a Tanimoto similarity (using ECFP4 fingerprints) greater than 0.6 to any compound in the training dataset.
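The training-set novelty criterion above (no Tanimoto similarity greater than 0.6 to any training compound) can be illustrated with a minimal sketch. It assumes fingerprints are already available as sets of on-bit indices (for example, ECFP4 bits computed elsewhere); the function names and the convention for empty fingerprints are illustrative assumptions, not taken from the released workflow.

```python
from typing import Iterable, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # assumption: two empty fingerprints are treated as dissimilar
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_similarity(query: Set[int], reference: Iterable[Set[int]]) -> float:
    """Highest Tanimoto similarity between a query and any reference fingerprint."""
    return max((tanimoto(query, ref) for ref in reference), default=0.0)


def is_novel(query: Set[int], reference: Iterable[Set[int]], threshold: float = 0.6) -> bool:
    """True when no reference fingerprint exceeds the similarity threshold."""
    return max_similarity(query, reference) <= threshold
```

For instance, a query sharing two of four combined bits with its nearest training compound has a similarity of 0.5 and would pass the 0.6 cut-off.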

**Fig. 5: Molecules that underwent synthesis and validation.**

**Fig. 6: Predicted binding pose of molecules that underwent synthesis and validation.**

Experimental validation and characterisation of de novo ligands

To confirm the ability of the synthesized compounds to bind to the A_2A receptor, the compounds were tested for their ability to displace [³H]-ZM241385. ZM241385 was previously demonstrated to be an inverse agonist at the A_2A receptor⁴¹. All compounds competed with [³H]-ZM241385, with some compounds demonstrating a pKi >7.3 (Table 1, Supplementary Fig. 5).

Table 1 Summary table showing compound name, binding affinity, and functional activity

To characterise the agonist pharmacology of the synthesised compounds, cAMP accumulation in response to each compound was measured. Compound 5 increased cAMP accumulation in A_2A CHO-K1 cells (pEC₅₀ 7.3); however, maximum cAMP accumulation levels were lower than that of NECA (E_max 65%). This suggests that Compound 5 is an A_2A receptor partial agonist.

Overexpression of A_2A in CHO-K1 resulted in elevated cAMP accumulation in the absence of a ligand, suggesting the A_2A receptor has high constitutive activity. Compounds 7 and 9 caused inhibition of the constitutive activity (Table 1, Fig. 7). These compounds are consequently considered inverse agonists of the A_2A receptor.

**Fig. 7: Pharmacological characterisation of proposed A_2A ligands.**

To characterise the antagonist pharmacology of the synthesised compounds at both the A_2A and A_2B receptor, the ability to inhibit NECA-induced cAMP accumulation was measured. Pharmacological characterisation of the compounds at the A_2B receptor showed no inhibition of NECA-induced cAMP accumulation at the concentrations tested. The previously mentioned Compounds 7 and 9 caused inhibition of cAMP at the A_2A receptor in the presence of NECA, further suggesting inverse agonism. Compound 4 (isomers 1 and 2) showed inhibition of the EC₈₀ of NECA at the A_2A receptor, but did not inhibit the constitutive activity, suggesting that these compounds could be neutral antagonists.

Compounds 1, 2, 3, 6, and 8 did not cause any modulation of cAMP accumulation or inhibition of NECA-induced cAMP at the concentrations tested, at either the A_2A or A_2B receptor.

The two strongest binders, Compounds 7 and 9, were successfully co-crystallised with the A_2A receptor to reveal their binding modes, as shown in Fig. 8. GRID was used for physicochemical analysis of the binding sites of the two crystallographic structures, highlighting the same lipophilic sub-pockets I to IV (shown in Fig. 4e). Compound 9 binds in a very similar orientation to that predicted by molecular docking in the binding site of A_2A receptor structure 4EIY, with the amino pyrrolopyrimidine core anchored in sub-pocket I by the hydrogen bond interactions to N253^6.55 and E169^ECL2, and the furan ring located in the lipophilic hotspot II. The pyridine moiety of Compound 9 is located as predicted in the lipophilic sub-pocket III. In contrast, the bioactive orientation of Compound 7 is partially different from the predicted docking pose in A_2A receptor structure 6GT3. The aminotriazine core and furan ring of Compound 7 are located as predicted by docking in sub-pockets I and II of the binding site, with the pyridine moiety between sub-pockets III and IV (Supplementary Fig. 6). However, the imidazole moiety is oriented upwards in sub-pocket IV, and not downwards in sub-pocket III as predicted, leaving sub-pocket III filled with water molecules. In both structures, sub-pocket III contains a small water network of four crystallographic waters that are not displaced by the ligands and that form hydrogen bond interactions with each other, the receptor residues (A59^2.57, I80^3.28, V84^3.32, and H278^7.43), and the ligand heteroatoms facing sub-pocket III. We note that the second-best scored pose of Compound 7 in the 6GT3 A_2A receptor binding site showed the imidazole ring pointing upwards in sub-pocket IV, as in the co-crystallised structure.

**Fig. 8: Crystal structure visualisations of Compound 7 (9H2X, orange sticks) and Compound 9 (9H37, purple sticks).**

Discussion

The workflow presented here has led to the discovery of promising, potent antagonists for the A_2A receptor using CLMs, RL, and SBDD. Compared to virtual screening of commercial vendor libraries, this approach offers key benefits with regard to chemical novelty and target chemistry enrichment. Approximately 16-30% of de novo molecules per experiment are commercially novel and contain known target chemotypes, and are therefore of immediate interest from a practical drug discovery perspective. This enrichment in high-scoring, desirable compounds led to a hit rate of 88% (8/9) based on binding. This hit rate is higher than those of previously reported virtual screening efforts in the hunt for A_2A ligands, which were between 3% and 64%^{45,46,47,48,49,50}. This is a considerable improvement, especially given that identifying novel A_2A ligands is increasingly difficult due to exhaustive exploration of A_2A ligand chemical space in the scientific domain. This is highlighted by Lenselink et al., who found only 2 active molecules out of 71 selected when enforcing high degrees of novelty⁴⁹.

We note that although 3/4 molecules synthesised functionally inactivated the A_2A receptor, one activated the receptor as a partial agonist despite an attempt to bias ligand chemistry using inactive state A_2A receptor complexes. However, active and inactive state binding site conformations are generally quite similar, and ligands typically have some affinity for both conformations⁴⁵; therefore, it is not surprising to identify ligands with functional effects other than those expected (especially considering imperfect binding mode prediction with docking). Moreover, a previous study by Rodríguez et al. using activated state A_2A receptors in the hunt for agonists only identified antagonists⁴⁸, emphasising the challenge of functional activity prediction. Despite this, we cautiously expect that using active state A_2A structures in this workflow would bias ligands more towards agonists because the generative model probes novel chemical space, whereas a chemical bias towards antagonists within the commercial library was identified as partly responsible by Rodríguez et al. We also identify scope to integrate protein-ligand interaction fingerprints within the scoring function to predict functional activity⁵¹ in future iterations.

This work involves experimental validation for protein structure-based de novo design with CLMs using RL to optimise a structure-based reward signal, constituting structure-implicit goal-directed generative design⁵². Different structure-based generative models have been experimentally validated²⁶. The most extensively validated is the Chemistry42 commercial platform⁵³; however, it pools the outcome of many generative models, making it difficult to delineate the performance of individual algorithms. In comparison, we demonstrate clear performance of a single algorithm and workflow which is made open-source and free to use. Other recent examples of prospectively validated structure-based de novo design using alternative generative model architectures have demonstrated hit rates of 22%⁵⁴, 92%⁵⁵, 0.83%⁵⁶ and 100%⁵⁷. However, the lack of novelty of de novo designs is often criticised⁵⁸, hence here we show the nearest known A_2A ligand, training set compound, and vendor library compound in the supporting information. Moreover, we test and discover a diverse range of different chemotypes compared to some reported hit rates, which only explore one chemotype⁵⁵, provide full dose response curves, a selectivity analysis versus A_2B receptor, and co-crystallise the two most potent binders to inform future iterations of structure-based design. Finally, this workflow is practical to implement, with each experiment taking 5.94 hours on average using a single consumer-grade GPU (here an NVIDIA RTX 2080Ti) and parallelised over approximately 30 CPUs; it is therefore possible to complete an experiment overnight.

Despite the use of generative AI, this workflow still uses human expertise to filter and select molecules for testing, resulting in level 1 automated chemical design⁵⁹. Challenges remain in fully automating this workflow and removing human influence while maintaining its success, challenges that are not unique to our approach. We note that visual inspection and novelty analysis for the selection of compounds for testing do not differ greatly between virtual screening and de novo approaches; however, the level of enrichment of A_2A chemistry beyond vendor libraries we have shown means that more success is to be found within the top-ranked compounds. One key improvement would be more accurate scoring functions that go beyond providing enrichment to providing reliable, precise predictions for binding affinity and other important molecular properties at reasonable computational expense. As RL is a decision-making algorithm to optimize an arbitrary reward, the resulting decisions (i.e., molecular structures) are only as good as the arbitrary reward is descriptive of ground truth desirability.

Methods

Datasets

A dataset of SMILES was used to pre-train the generative model used in this work. This consisted of molecules extracted from ChEMBL28³⁴, which underwent further filtering and refinement. First, only molecules with a pChEMBL value greater than 6 from an assay with a confidence value of 8 or higher were considered, as a proxy for more developed chemistry resulting in more medicinally interesting chemistry. These molecules were further standardized, neutralized, and filtered to ensure molecules had a predicted logP less than or equal to 4.5, rotatable bond count less than or equal to 7, molecular weight in the range 150 to 650 Da and only contain atoms belonging to the following set A ∈ {C, S, O, N, H, F, Cl, Br}. Any molecules violating MOSES structural alerts were removed¹⁵. Finally, molecules were clustered based on scaffold similarity using ECFP4 fingerprints at a threshold of 0.8 Tanimoto similarity and only the centroids were carried forward into the resulting dataset of 189,238 unique molecules. This under-sampling approach was used to even out the chemical space. During training, 10-fold restricted randomisation of the SMILES representation was conducted to augment the training dataset⁶⁰.
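The property filters described above can be summarised as a single predicate. The sketch below is illustrative only: it assumes the per-molecule properties have already been computed by a cheminformatics toolkit, and the `MoleculeRecord` type and its field names are hypothetical. Standardisation, structural alerts, clustering, and SMILES augmentation are not covered, and the molecular weight bounds are assumed to be inclusive.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Allowed element set as listed in the text.
ALLOWED_ATOMS: FrozenSet[str] = frozenset({"C", "S", "O", "N", "H", "F", "Cl", "Br"})


@dataclass(frozen=True)
class MoleculeRecord:
    """Hypothetical container for precomputed properties of one molecule."""
    pchembl: float            # best pChEMBL value
    assay_confidence: int     # ChEMBL assay confidence score
    logp: float               # predicted logP
    rotatable_bonds: int      # rotatable bond count
    mol_weight: float         # molecular weight in Da
    elements: FrozenSet[str]  # element symbols present in the molecule


def passes_filters(m: MoleculeRecord) -> bool:
    """Apply the activity and property thresholds stated in the text."""
    return (
        m.pchembl > 6
        and m.assay_confidence >= 8
        and m.logp <= 4.5
        and m.rotatable_bonds <= 7
        and 150 <= m.mol_weight <= 650  # assumption: bounds are inclusive
        and m.elements <= ALLOWED_ATOMS
    )
```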

Seven A_2A receptor crystal structures were selected to represent different possible protein conformations and associated complementary chemistry, six of which are StaRs⁶¹ published in previous work. These included 3REY co-crystallised with XAC⁴¹, 4EIY co-crystallised with ZM241385⁶², 5OLO co-crystallised with Tozadenant⁶³, 5OLH co-crystallised with Vipadenant⁶³, 5OLV co-crystallised with LUAA47070⁶³, 6GT3 co-crystallised with HTL1071/AZD4635⁶⁴, and 6ZDR co-crystallised with Chromone⁶⁵. For each structure, missing side chain atoms were added, ionisable residues were set to their most probable protonation state at pH 7.4, and the protonation states of histidine residues in the binding site were assigned based on visual inspection. H250^6.52 was protonated in the Nε position and H278^7.43 was protonated in both Nδ and Nε positions. Based on interactions with E169^ECL2 resulting from the conformation of ECL3, H264^ECL3 was either doubly protonated (4EIY⁶², 5OLH⁶⁶, 5OLV⁶⁶, 6GT3⁶⁴, 6ZDR⁶⁵) or protonated in the Nδ position (3REY⁴¹, 5OLO⁶⁶). Compounds to dock were prepared using MoKa^67,68 followed by CORINA⁶⁹. Tautomeric and protomeric states with an abundance of at least 20% at pH 7.4 were enumerated with TauThor⁶⁸ and Blabber⁶⁷, respectively. Docking was performed using Glide SP⁷⁰. Hydrogen bond constraints were considered to orient the hydrogen bond donor/acceptor-containing scaffolds into sub-pocket I (Fig. 4e), according to hydrogen bonds formed between co-crystallised ligands and N253^6.55 in each structure. An additional occupancy constraint was used with the requirement of a hydrophobic substituent to occupy sub-pocket II (Fig. 4e). All constraints were required to be satisfied after docking, to finally save a top-scored pose for visual inspection.

Known A_2A chemotypes were taken as previously curated in GPCR-Bench³⁹, and updated to add more recent chemotypes derived from Reaxys data with a pKi threshold above or equal to 6.6 on human A_2A (ligands containing a ribose were discarded, due to likely agonistic functionality).

Generative model

In this work, a gated recurrent unit (GRU)-based neural network was used as a SMILES-based chemical language model (CLM). These models, which are detailed elsewhere^8,12,13, learn to predict the next token (${x}_{i+1}$) in a sequence of tokens ($X$) by training on a corpus of example sequences while using teacher forcing to ensure that the predicted token at each index ${y}_{i}$ is calculated with the ground truth previous token ${x}_{i}$ as input. The model is then trained to maximize the likelihood assigned to the correct token at each index, conditional upon all previous tokens observed, which is then summed over the whole sequence as formulated in Eq. 1. In this work, SMILES¹¹ was used as the language-based chemical grammar and the model consisted of an embedding layer of size 256 followed by 3 layers of GRUs with a hidden dimension size of 512. The model was trained on the pre-training dataset described earlier for 5 epochs using a batch size of 128 and the Adam optimizer with a learning rate of 0.001.

$${{\mathscr{L}}}\left(\theta \right)=-{\sum }_{i=0}^{N}\log P\left({x}_{i}\mid {x}_{i-1},\ldots,{x}_{0}\right)$$

(1)
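As a small numerical illustration of Eq. 1, the loss for one sequence is the negative sum of the log-probabilities that the model assigns to each correct token under teacher forcing. The sketch below assumes those per-token probabilities are already available as plain floats; the function name is illustrative and no neural network is involved.

```python
import math
from typing import Sequence


def sequence_nll(token_probs: Sequence[float]) -> float:
    """Negative log-likelihood of one sequence.

    token_probs[i] is the probability the model assigned to the correct
    token at index i, given all previous ground-truth tokens.
    """
    return -sum(math.log(p) for p in token_probs)
```

For example, a two-token sequence whose correct tokens receive probabilities 0.5 and 0.25 has a loss of ln 2 + ln 4 = ln 8, and a sequence predicted with certainty has a loss of zero.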

Given a trained GRU-based neural network, new SMILES strings can be sampled by inputting a start token (“GO”) and iteratively sampling from the predicted probability distribution over the next token and repeating this process by using the sampled token as the next input until a stop token is sampled (“EOS”).
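The sampling procedure just described can be sketched as a short loop. In this illustration the trained GRU is replaced by a hypothetical callable that returns a next-token probability distribution for a given prefix; `toy_model`, the `max_len` safeguard, and the choice to strip the start token from the output are assumptions made for the example.

```python
import random
from typing import Callable, Dict, List


def sample_sequence(
    next_token_probs: Callable[[List[str]], Dict[str, float]],
    rng: random.Random,
    max_len: int = 100,
) -> List[str]:
    """Sample tokens one at a time until the stop token is drawn."""
    tokens = ["GO"]  # start token
    while len(tokens) <= max_len:  # safeguard against sequences that never stop
        probs = next_token_probs(tokens)
        choices, weights = zip(*probs.items())
        token = rng.choices(choices, weights=weights, k=1)[0]
        if token == "EOS":  # stop token ends the sequence
            break
        tokens.append(token)  # the sampled token becomes the next input
    return tokens[1:]  # drop the start token


def toy_model(prefix: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for the GRU: emits two atoms, then stops."""
    if len(prefix) < 3:
        return {"C": 0.7, "O": 0.3}
    return {"EOS": 1.0}
```

Calling `sample_sequence(toy_model, random.Random(0))` returns a two-token list drawn from `C` and `O`; a real CLM would instead return probabilities over its full SMILES vocabulary.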

Reinforcement learning

The task of predicting the next token in a sequence can be framed as a partially observable Markov Decision Process, lending itself to the use of RL as an approach to learn which decisions to make in the process to maximize an arbitrary reward. Where each token ${x}_{i}$ is considered a state out of all possible states $s\in S$ at a given timestep in a trajectory $t\in \tau$, the transition between two states is determined by an action out of all possible actions $a\in A$. Combining this framework with our generative model, the pre-trained network that learns the probability $P({x}_{i}|{x}_{i-1},\ldots,{x}_{0})$ can equally be considered an initialized policy where the policy function parameterized by the network weights denotes the probability of taking an action given the state at a given timestep ${\pi }_{\theta }({a}_{t}|{s}_{t})$. Within this framework, many different RL algorithms exist to modify the policy so as to optimize an arbitrary reward $R$. REINFORCE⁷¹ is a policy-based algorithm that only relies on one reward value at the end of the episode (a.k.a. the return): a particularly favourable property considering a partially complete SMILES string may be invalid or not have a relevant or associated calculable reward. REINVENT¹² is an extension of this algorithm that utilises two networks and therefore two policy functions: a prior that is fixed based on the pre-trained neural network ${\pi }_{{prior}}$ and an agent network that is updated to maximize the reward ${\pi }_{{agent}}$. The loss function used in REINVENT is shown in Eq. 2; it can be considered a reward shaping of REINFORCE that couples the two policies, such that the agent policy does not drift too far from the initially learnt prior policy. The extent to which the prior policy is considered in relation to the reward for a given molecule is controlled by $\sigma$. This loss function is then used to update the agent such that the probability distribution of actions given a state is learned to maximize the final reward. Augmented Hill-Climb³⁰ (AHC) modifies REINVENT such that only the top-k ranked molecules in a batch are used to update the network weights using Eq. 2, which drastically improves the learning efficiency of the algorithm. Thus, in this work we used AHC to conduct RL with $\sigma=60$, $k=0.5$, ${batch\; size}=64$, which was conducted for 200 steps, resulting in 12,800 molecules sampled by the agent network.

$$\begin{array}{c}{\pi }_{{aug}}={\pi }_{{prior}}-\sigma \left(R\right)\\ {{\mathscr{L}}}\left(\theta \right)={\left[{\pi }_{{aug}}-{\pi }_{{agent}}\right]}^{2}\end{array}$$

(2)
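The top-k selection that distinguishes AHC from REINVENT can be sketched independently of the loss itself. The example below assumes the per-molecule loss from Eq. 2 has already been computed for each molecule in a batch and only illustrates which molecules contribute to the update; the function names, the tie-breaking by batch order, and averaging over the retained molecules are assumptions for illustration rather than details confirmed by the text.

```python
from typing import List, Sequence


def top_k_indices(rewards: Sequence[float], k: float = 0.5) -> List[int]:
    """Indices of the top fraction k of a batch, ranked by reward (highest first)."""
    n_keep = max(1, int(len(rewards) * k))
    ranked = sorted(range(len(rewards)), key=lambda i: rewards[i], reverse=True)
    return ranked[:n_keep]


def ahc_batch_loss(
    per_molecule_loss: Sequence[float],
    rewards: Sequence[float],
    k: float = 0.5,
) -> float:
    """Mean of the precomputed Eq. 2 losses over the top-k molecules only."""
    keep = top_k_indices(rewards, k)
    return sum(per_molecule_loss[i] for i in keep) / len(keep)


# Sampling budget implied by the stated settings: 64 molecules x 200 steps.
BATCH_SIZE, STEPS = 64, 200
TOTAL_SAMPLED = BATCH_SIZE * STEPS
```

With a batch size of 64 and k = 0.5, 32 molecules would contribute to each update, and 200 steps give the 12,800 sampled molecules quoted in the text.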

Scoring functions

Reinforcement learning requires a reward value $R$ for a given molecule, in this work between 0 and 1, reflecting the desirability of a particular compound. This was calculated as the arithmetic mean of 5 individual scores ${S}_{x}$, each between 0 and 1, describing the desirability of a given molecule with respect to each parameter. Where rescaling was required, max-min normalisation (Eq. 3) was used.

$${f}_{{maxmin}}\left(x;{x}_{\max },\,{x}_{\min }\right)=\,\frac{x-{x}_{\min }}{{x}_{\max }-{x}_{\min }}$$

(3)

1.
Docking score: Proposed de novo molecules were first prepared by enumerating protonation and tautomerisation states more than 20% abundant at pH 7.4 using MoKa⁶⁷, followed by enumeration of unspecified stereocentres up to a total of 16 using CORINA⁶⁹. These variants were then docked using Glide-SP⁷⁰. The lowest (i.e., best) docking score achieved by any molecule variant was returned as the final docking score. To obtain the final score ${S}_{{dock}}$, the docking score ${ds}$ was normalised using the minimum and maximum docking scores observed up to that point in training, with the bounds reversed so that a low docking score is given a high score.
$${S}_{{dock}}={f}_{{maxmin}}({ds};\min ({ds}),\max ({ds}))$$
(4)
2.
Synthesisability score: The synthesisability of proposed de novo molecules was predicted using RAscore³⁸, which returns a value between 0 and 1 representing the probability that a molecule is predicted to be synthesisable by AiZynthFinder. In this work, the provided XGB model trained on ChEMBL was utilised. As this value is already in the desired range, no transformation was necessary, and it was used directly as ${S}_{{synth}}$.
3.
logP: The logP of de novo molecules was predicted using Crippen logP available in RDKit^72,73. The desirable range specified was between 1 and 3, with a ‘hard limit’ specified as between 0 and 4. The logP was thus transformed between 0 and 1 according to Eq. 5.
$${S}_{\log P}=\left\{\begin{array}{ll}1, & \text{if } 1\le \log P\le 3\\ 0, & \text{if } \log P\le 0 \text{ or } \log P\ge 4\\ {f}_{{maxmin}}\left(\log P;1,0\right), & \text{if } 0 < \log P < 1\\ {f}_{{maxmin}}\left(\log P;3,4\right), & \text{if } 3 < \log P < 4\end{array}\right.$$
(5)
4.
Rotatable bonds: To restrict the flexibility of the molecule to a sensible range, the maximum number of consecutive rotatable bonds present in a de novo molecule was included as a scoring component. The desirable range specified was 3 or below, which resulted in a score ${S}_{{crot}}$ of 1; 4 or more consecutive rotatable bonds resulted in a score of 0.
5.
Hydrogen bond donors: The number of hydrogen bond donors was specified as the last scoring component ${S}_{{hbd}}$. The desirable range specified was 3 or below, which would result in a score ${S}_{{hbd}}$ of 1; 4 or more hydrogen bond donors resulted in a score of 0.

All scoring functions and transformations were implemented as available in MolScore⁷⁴.
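The five components and their aggregation can be sketched in plain Python as follows. This is only an illustration of Eqs. 3–5 and the step functions described above, assuming the raw values (docking scores of all variants, RAscore, Crippen logP, bond and donor counts) have already been computed elsewhere. All function and argument names are hypothetical and do not reflect the MolScore API, and how the running minimum and maximum docking scores are tracked during training is left to the caller.

```python
def f_maxmin(x, x_max, x_min):
    """Eq. 3: linear rescaling in which x_min maps to 0 and x_max maps to 1."""
    if x_max == x_min:
        raise ValueError("x_max and x_min must differ")
    return (x - x_min) / (x_max - x_min)


def dock_score(variant_scores, ds_min, ds_max):
    """Eq. 4: best (lowest) variant score, rescaled with reversed bounds."""
    best = min(variant_scores)
    # Passing the minimum as x_max means the lowest docking score maps to 1.
    return f_maxmin(best, ds_min, ds_max)


def logp_score(logp):
    """Eq. 5: 1 inside [1, 3], 0 at or beyond 0 and 4, linear ramps between."""
    if 1 <= logp <= 3:
        return 1.0
    if logp <= 0 or logp >= 4:
        return 0.0
    if logp < 1:
        return f_maxmin(logp, 1, 0)
    return f_maxmin(logp, 3, 4)


def step_score(count, max_allowed=3):
    """Step function used for rotatable bonds and hydrogen bond donors."""
    return 1.0 if count <= max_allowed else 0.0


def total_reward(variant_scores, ds_min, ds_max, ra_score, logp,
                 max_consecutive_rotatable, n_hbd):
    """Arithmetic mean of the five component scores."""
    scores = [
        dock_score(variant_scores, ds_min, ds_max),
        ra_score,  # already a probability between 0 and 1
        logp_score(logp),
        step_score(max_consecutive_rotatable),
        step_score(n_hbd),
    ]
    return sum(scores) / len(scores)
```

Because the mean is unweighted, a molecule that fails one step function (for example, four hydrogen bond donors) can score at most 0.8 overall, however well it docks.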

Radioligand binding assay

Cell membranes expressing the human A_2A receptor were incubated with [³H]-ZM241385 in assay buffer (50 mM HEPES, pH 7.4) in a total assay volume of 200 µl with a final DMSO concentration of 1%. After 120 minutes of incubation at room temperature, the reaction was terminated by rapid filtration through GF/C 96-well glass fibre plates with 5 × 0.25 ml washes with ddH₂O using a Tomtec cell harvester. Bound radioactivity was determined through liquid scintillation using Lablogic SafeScint and detected on a MicroBeta liquid scintillation counter. Non-specific binding was determined as that remaining in the presence of a 1 μM saturating concentration of the antagonist CGS15943. Competition binding was performed as above by incubating membranes with 0.5 nM [³H]-ZM241385 and a range of concentrations of the test compound (10-point concentration curves at half-log intervals). IC₅₀ values were derived from fitting to a four-parameter logistic equation in PRISM (GraphPad Software, San Diego, CA, USA). Apparent Ki values were derived using the equation of Cheng and Prusoff⁶³.
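For readers less familiar with the analysis, the sketch below writes out the standard four-parameter logistic model, the Cheng and Prusoff correction, and a 10-point half-log dilution series in plain Python. The fitting itself was done in PRISM, so this only illustrates the equations. The function names are hypothetical, and the radioligand dissociation constant is an input that is not stated in this section and would need to be supplied.

```python
def four_param_logistic(conc, bottom, top, ic50, hill):
    """Predicted binding at a competitor concentration (same units as ic50)."""
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)


def cheng_prusoff_ki(ic50, radioligand_conc, radioligand_kd):
    """Apparent Ki from IC50; all three arguments share the same units."""
    return ic50 / (1.0 + radioligand_conc / radioligand_kd)


def half_log_series(top_conc, n_points=10):
    """Concentrations decreasing from top_conc in half-log (10**0.5) steps."""
    return [top_conc / (10 ** (0.5 * i)) for i in range(n_points)]
```

With the 0.5 nM radioligand concentration used here, the correction factor is 1 + 0.5/Kd (Kd in nM), so the apparent Ki approaches the IC₅₀ as the radioligand Kd becomes large relative to 0.5 nM.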

Functional assay

Cell Line Generation

CHO-K1 cells were transfected using BacMam for 24 h with a vector containing human A_2A (5% v/v) or A_2B (5% v/v). CHO A_2A and CHO A_2B cells were maintained in Dulbecco’s modified Eagle’s medium/Ham’s F-12 medium (Sigma) supplemented with 10% FBS. Once transfected, cells were harvested and frozen at −150 °C until required for assay.

cAMP Assays

CHO A_2A and CHO A_2B cells were thawed and seeded at 500 and 1000 cells per well, respectively, in white 384-well plates in assay buffer (Hank’s balanced salt solution; Lonza, Basel, Switzerland; supplemented with 100 µM rolipram and 1 U/mL adenosine deaminase, pH 7.4) in the absence or presence of antagonists, for 1 hour at 37 °C. An EC₈₀ concentration of NECA (6 nM for A_2A, 19 nM for A_2B) was then added and plates were incubated for a further 30 minutes. cAMP was detected using the cAMP Gs Dynamic kit (Cisbio, Codolet, France) according to the kit instructions. Plates were read on a PHERAstar FS microplate reader (BMG LabTech, Offenburg, Germany) using standard homogeneous time-resolved fluorescence settings. Homogeneous time-resolved fluorescence ratios were determined by dividing emissions at 665 nm by emissions at 620 nm and multiplying by 10,000.

Data analysis

The percentage agonist response was normalised to 10 µM NECA (100%) and DMSO (0%). The percentage antagonist response was normalised to a literature A_2A or A_2B antagonist, AB928 or GS6201.

All data were analysed in GraphPad Prism v10 using a non-linear regression four-parameter fit.
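The ratio calculation and control normalisation described above amount to two short formulas, sketched here in plain Python. The function names are hypothetical, and the normalisation is written generically as a two-point rescaling between a 0% control and a 100% control, which is an assumption about how the percentages were computed rather than a description of the Prism workflow.

```python
def htrf_ratio(emission_665, emission_620):
    """HTRF ratio: emission at 665 nm over emission at 620 nm, times 10,000."""
    if emission_620 <= 0:
        raise ValueError("emission at 620 nm must be positive")
    return emission_665 / emission_620 * 10_000


def percent_response(signal, zero_control, full_control):
    """Rescale a signal so zero_control maps to 0% and full_control to 100%."""
    span = full_control - zero_control
    if span == 0:
        raise ValueError("controls must differ")
    return 100.0 * (signal - zero_control) / span
```

For the agonist readout, `zero_control` would be the DMSO wells and `full_control` the 10 µM NECA wells. The same function works whether the raw ratio rises or falls with increasing cAMP, because the sign of the span cancels.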

Protein expression and purification

The expression and purification of the A_2A-StaR2-bRIL562 construct was carried out following the protocol described previously^41,63. The receptor was expressed using the Bac-to-Bac Expression System (Invitrogen) in Trichoplusia ni Tni PRO cells using ESF 921 medium (Expression Systems) supplemented with 5% (v/v) foetal bovine serum (Sigma-Aldrich) and 1% (v/v) Penicillin/Streptomycin (PAA Laboratories). Cells were infected at a density of 2.6 × 10⁶ cells/ml with virus at an approximate multiplicity of infection of 1 and grown for 48 hours at 27 °C with constant shaking.

All protein purification steps were carried out at 4 °C unless otherwise stated. Cell pellets were resuspended in 40 mM Tris buffer at pH 7.6, 1 mM EDTA supplemented with Complete EDTA-free protease inhibitor cocktail tablets (Roche) and disrupted at ~15,000 psi using a microfluidizer (Processor M-110L Pneumatic, Microfluidics). Membranes were pelleted by ultra-centrifugation at 200,000 g for 50 minutes, and then subjected to a high salt wash in a buffer containing 40 mM Tris pH 7.6, 1 M NaCl and Complete EDTA-free protease inhibitor cocktail tablets. Washed membranes were resuspended in 50 mL 40 mM Tris pH 7.6 supplemented with Complete EDTA-free protease inhibitor cocktail tablets and 3 mM theophylline (Sigma Aldrich) and incubated for 2 hours at room temperature. Membranes were solubilised with 1.5% n-Decyl-β-D-maltopyranoside (DM, Anatrace) for 2 hours at 4 °C. Solubilised material was centrifuged at 145,000 g for 60 min and supernatant applied to a 5 ml Ni-NTA Superflow cartridge (Qiagen) pre-equilibrated in 40 mM Tris pH 7.4, 200 mM NaCl, 0.15% DM, 1 mM theophylline. The column was washed with 25 column volumes of buffer 40 mM Tris pH 7.4, 200 mM NaCl, 0.15% DM, 70 mM imidazole, 1 mM theophylline and then the protein was eluted with 40 mM Tris pH 7.4, 200 mM NaCl, 0.15% DM, 280 mM imidazole, 1 mM theophylline. Fractions containing A_2A-StaR2-bRIL562 were pooled and concentrated using an Amicon Ultra Ultracell 50 K ultrafiltration membrane and applied to a Superdex 200 Increase size exclusion column (Cytiva) pre-equilibrated with 40 mM Tris pH 7.4, 200 mM NaCl, 0.15% DM, 1 mM theophylline. Eluted fractions containing the protein were analysed by SDS PAGE, pooled and concentrated to ~35 mg/ml using an Amicon Ultra Ultracell 50 K ultrafiltration membrane and subjected to ultra-centrifugation at 436,000 g prior to crystallisation.

Crystallisation, in meso soaking and crystal harvesting

The A_2A-StaR2-bRIL562-theophylline co-crystallisation and in meso soaking were performed following a well-established protocol⁶³. The concentrated protein was mixed using the twin syringe method⁷⁵ with monoolein (Nu-Chek) supplemented with 10% (w/w) cholesterol (Sigma Aldrich) and 10 μM theophylline. The final protein-to-lipid ratio was 40:60 (w/w). Forty-nanolitre LCP boli were dispensed onto 96-well Laminex Glass Bases (Molecular Dimensions Ltd.) using a Mosquito LCP crystallisation robot (TTP Labtech) and overlaid with 800 nL precipitant solution. Glass bases were sealed using Laminex Film covers (Molecular Dimensions Ltd). Plate-shaped crystals grew at 20 °C over 2 weeks in 0.1 M tri-sodium citrate pH 5.3–5.4, 0.05 M sodium thiocyanate, 29–32% PEG400, 2% (v/v) 2,5-hexanediol and 0.5 mM theophylline.

For soaking experiments, incisions were made into the Laminex cover over base wells containing crystals. 10 µL of mother liquor containing 1 mM of either Compound 7 or 9 was added to the well, which was then re-sealed using Crystal Clear Sealing Tape (Hampton Research). Crystals were incubated for 1 hour with a final ligand concentration of 925 µM. Single crystals were mounted in LithoLoops (Molecular Dimensions Ltd) and flash-frozen in liquid nitrogen without the addition of further cryoprotectant.
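The stated final ligand concentration is consistent with a simple dilution estimate. As a rough check, assuming the 10 µL of ligand solution mixes only with the 800 nL precipitant overlay and the volume of the LCP bolus is neglected (both assumptions, with symbol names chosen here for illustration):

$${c}_{{final}}={c}_{{stock}}\,\frac{{V}_{{added}}}{{V}_{{added}}+{V}_{{well}}}=1\ \text{mM}\times \frac{10\ \mu \text{L}}{10\ \mu \text{L}+0.8\ \mu \text{L}}\approx 0.926\ \text{mM}$$

which is close to the 925 µM quoted above.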

Diffraction data collection and processing

X-ray diffraction data were collected at beamline I24 (Diamond Light Source) at a wavelength of 0.61992 Å on an Eiger 16 M detector. Crystals were exposed using 80% beam transmission for 0.05 seconds per 0.2° oscillation per frame using an attenuated beam to reduce radiation damage.
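For orientation, the wavelength can be converted to photon energy with the usual relation, taking $hc\approx 12.398$ keV Å:

$$E=\frac{hc}{\lambda }\approx \frac{12.398\ \text{keV}\ \text{\AA}}{0.61992\ \text{\AA}}\approx 20.0\ \text{keV}$$

Likewise, an exposure of 0.05 seconds per 0.2° frame corresponds to a rotation rate of 0.2°/0.05 s = 4° per second.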

Diffraction data from individual crystals were integrated using XDS⁷⁶. Data merging and scaling were carried out using AIMLESS (CCP4 suite)⁷⁷ and anisotropic correction using STARANISO⁷⁸. The final datasets for both Compound 7 and Compound 9 were each merged from 6 sweeps. Data collection statistics are reported in Table 2.

Table 2 Data collection and refinement statistics for X-ray crystal structures

Structure solution and refinement

The structures of the different A_2A-StaR2-bRIL562-ligand complexes were solved by molecular replacement (MR) with Phaser⁷⁹ (CCP4 suite)⁷⁷ using the A_2A-StaR2-bRIL562-theophylline complex structure as the search model (PDB code: 5MZJ). Iterative cycles of manual model building and refinement were performed using COOT⁸⁰ and Buster⁸¹, respectively. Two TLS groups, corresponding to the receptor plus ligand and to the bRIL562, were defined during refinement. The final refinement statistics are presented in Table 2. Structure figures were generated using PyMOL⁸².

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The crystal structure data of Compound 7 and Compound 9 have been deposited in the PDB under the accession codes 9H2X and 9H37, respectively. The prior training dataset and pre-trained neural network weights are available on GitHub [https://github.com/MorganCThomas/SMILES-RNN] or Zenodo [https://doi.org/10.5281/zenodo.11356193]. Source data are provided with this paper as a Source Data file.

Code availability

All of the code used to generate the results in this work is available open-source under MIT license. The SMILES-RNN⁸³ repository [https://github.com/MorganCThomas/SMILES-RNN] [https://doi.org/10.5281/zenodo.11356193] contains code to train the generative model and conduct reinforcement learning. The MolScore repository⁸⁴ [https://github.com/MorganCThomas/MolScore] [https://doi.org/10.5281/zenodo.14998609] was used to define the objective functions that scored the de novo molecules.

References

Doman, T. N. et al. Molecular docking and high-throughput screening for novel inhibitors of protein tyrosine phosphatase-1B. J. Med Chem. 45, 2213–2221 (2002).
Warr, W. A., Nicklaus, M. C., Nicolaou, C. A. & Rarey, M. Exploration of ultralarge compound collections for drug discovery. J. Chem. Inf. Model 62, 2021–2034 (2022).
Sivula, T. et al. Machine learning-boosted docking enables the efficient structure-based virtual screening of giga-scale enumerated chemical libraries. J. Chem. Inf. Model 63, 5773–5783 (2023).
Gentile, F. et al. Artificial intelligence–enabled virtual screening of ultra-large chemical libraries with deep docking. Nat. Protoc. 17, 672–697 (2022).
Stanley, M. & Segler, M. Fake it until you make it? Generative de novo design and virtual screening of synthesizable molecules. Curr. Opin. Struct. Biol. 82, 102658 (2023).
Arús-Pous, J. et al. Exploring the GDB-13 chemical space using deep generative models. J. Cheminform. 11, 20 (2019).
Zhavoronkov, A. et al. Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat. Biotechnol. 37, 1038–1040 (2019).
Segler, M. H. S., Kogej, T., Tyrchan, C. & Waller, M. P. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 4, 120–131 (2018).
Moret, M., Friedrich, L., Grisoni, F., Merk, D. & Schneider, G. Generative molecular design in low data regimes. Nat. Mach. Intell. 2, 171–180 (2020).
Bagal, V., Aggarwal, R., Vinod, P. K. & Priyakumar, U. D. MolGPT: Molecular generation using a transformer-decoder model. J. Chem. Inf. Model 62, 2064–2076 (2021).
Weininger, D. SMILES, a chemical language and information system: 1: Introduction to methodology and encoding rules. J. Chem. Inf. Comput Sci. 28, 31–36 (1988).
Olivecrona, M., Blaschke, T., Engkvist, O. & Chen, H. Molecular de-novo design through deep reinforcement learning. J. Cheminform. 9, 48 (2017).
Popova, M., Isayev, O. & Tropsha, A. Deep reinforcement learning for de novo drug design. Sci. Adv. 4, eaap7885 (2018).
Brown, N., Fiscato, M., Segler, M. H. S. & Vaucher, A. C. GuacaMol: Benchmarking models for de novo molecular design. J. Chem. Inf. Model 59, 1096–1108 (2019).
Polykovskiy, D. et al. Molecular Sets (MOSES): A benchmarking platform for molecular generation models. Front Pharm. 11, 1931 (2020).
Gao, W., Fu, T., Sun, J. & Coley, C. W. Sample Efficiency Matters: A Benchmark for Practical Molecular Optimization. ArXiv https://doi.org/10.48550/arxiv.2206.12411 (2022).
Huang, K. et al. Therapeutics data commons: machine learning datasets and tasks for drug discovery and development. in Advances in Neural Information Processing Systems https://doi.org/10.48550/arXiv.2102.09548 (2021).
Ciepliński, T., Danel, T., Podlewska, S. & Jastrzȩbski, S. Generative models should at least be able to design molecules that dock well: a new benchmark. J. Chem. Inf. Model 63, 3238–3247 (2023).
Martinelli, D. D. Generative machine learning for de novo drug discovery: A systematic review. Comput Biol. Med 145, 105403 (2021).
Grisoni, F. et al. Combining generative artificial intelligence and on-chip synthesis for de novo drug design. Sci. Adv. 7, eabg3338 (2021).
Li, X., Xu, Y., Yao, H. & Lin, K. Chemical space exploration based on recurrent neural networks: applications in discovering kinase inhibitors. J. Cheminform. 12, 42 (2020).
Merk, D., Grisoni, F., Friedrich, L. & Schneider, G. Tuning artificial intelligence on the de novo design of natural-product-inspired retinoid X receptor modulators. Commun. Chem. 1, 68 (2018).
Yang, Y. et al. Discovery of highly potent, selective, and orally efficacious p300/CBP Histone Acetyltransferases inhibitors. J. Med Chem. 63, 1337–1360 (2020).
Hua, Y. et al. Effective reaction-based De Novo strategy for kinase targets: a case study on MERTK inhibitors. J. Chem. Inf. Model 62, 1654–1668 (2022).
Moret, M. et al. Leveraging molecular structure and bioactivity with chemical language models for de novo drug design. Nat. Commun. 14, 114 (2023).
Du, Y. et al. Machine learning-aided generative molecular design. Nat. Mach. Intell. 6, 589–604 (2024).
Thomas, M., Smith, R. T., O’Boyle, N. M., de Graaf, C. & Bender, A. Comparison of structure- and ligand-based scoring functions for deep generative models: a GPCR case study. J. Cheminform. 13, 39 (2021).
Renz, P., Van Rompaey, D., Wegner, J. K., Hochreiter, S. & Klambauer, G. On failure modes in molecule generation and optimization. Drug Discov. Today Technol. 32–33, 55–63 (2020).
Guo, J. et al. DockStream: a docking wrapper to enhance de novo molecular design. J. Cheminform. 13, 89 (2021).
Thomas, M., O’Boyle, N. M., Bender, A. & de Graaf, C. Augmented Hill-Climb increases reinforcement learning efficiency for language-based de novo molecule generation. J. Cheminform. 14, 68 (2022).
Dodds, M. et al. Sample efficient reinforcement learning with active learning for molecular design. Chem. Sci. 15, 4146–4160 (2024).
Bjerrum, E. J., Margreitter, C., Blaschke, T. & de Castro, R. L.-R. Faster and more diverse de novo molecular optimization with double-loop reinforcement learning using augmented SMILES. ArXiv https://doi.org/10.48550/arxiv.2210.12458 (2022).
Guo, J. & Schwaller, P. Augmented memory: sample-efficient generative molecular design with reinforcement learning. JACS Au. 4, 2160–2172 (2024).
Mendez, D. et al. ChEMBL: Towards direct deposition of bioassay data. Nucleic Acids Res 47, D930–D940 (2019).
Thomas, M., O’Boyle, N. M., Bender, A. & de Graaf, C. Re-evaluating sample efficiency in de novo molecule generation. ArXiv https://doi.org/10.48550/arxiv.2212.01385 (2022).
Renz, P., Luukkonen, S. & Klambauer, G. Diverse hits in De Novo molecule design: diversity-based comparison of goal-directed generators. J. Chem. Inf. Model 64, 5756–5761 (2024).
Bou, A. et al. ACEGEN: Reinforcement learning of generative chemical agents for drug discovery. J. Chem. Inf. Model 64, 5900–5911 (2024).
Thakkar, A., Chadimová, V., Bjerrum, E. J., Engkvist, O. & Reymond, J. L. Retrosynthetic accessibility score (RAscore) – rapid machine learned synthesizability classification from AI driven retrosynthetic planning. Chem. Sci. 12, 3339–3349 (2021).
Weiss, D. R., Bortolato, A., Tehan, B. & Mason, J. S. GPCR-Bench: A benchmarking set and Practitioners’ Guide for G protein-coupled receptor docking. J. Chem. Inf. Model 56, 642–651 (2016).
Chen, D., Ranganathan, A., Ijzerman, A. P., Siegal, G. & Carlsson, J. Complementarity between in silico and biophysical screening approaches in fragment-based lead discovery against the A2A adenosine receptor. J. Chem. Inf. Model 53, 2701–2714 (2013).
Doré, A. S. et al. Structure of the Adenosine A2A receptor in complex with ZM241385 and the Xanthines XAC and Caffeine. Structure 19, 1283–1293 (2011).
Genheden, S. et al. AiZynthFinder: a fast, robust and flexible open-source software for retrosynthetic planning. J. Cheminform. 12, 70 (2020).
Thakkar, A., Kogej, T., Reymond, J.-L., Engkvist, O. & Bjerrum, E. J. Datasets and their influence on the development of computer assisted synthesis planning tools in the pharmaceutical domain. Chem. Sci. 11, 154–168 (2020).
Baroni, M., Cruciani, G., Sciabola, S., Perruccio, F. & Mason, J. S. A common reference framework for analyzing/comparing proteins and ligands. Fingerprints for Ligands and Proteins (FLAP): Theory and application. J. Chem. Inf. Model 47, 279–294 (2007).
Ballante, F., Kooistra, A. J., Kampen, S., de Graaf, C. & Carlsson, J. Structure-based virtual screening for ligands of G Protein–coupled receptors: what can molecular docking do for you? Pharm. Rev. 73, 1698–1736 (2021).
Langmead, C. J. et al. Identification of novel adenosine A2A receptor antagonists by virtual screening. J. Med. Chem. 55, 1904–1909 (2012).
Katritch, V. et al. Structure-based discovery of novel chemotypes for adenosine A2A receptor antagonists. J. Med. Chem. 53, 1799–1809 (2010).
Rodríguez, D., Gao, Z. G., Moss, S. M., Jacobson, K. A. & Carlsson, J. Molecular docking screening using agonist-bound GPCR structures: Probing the A2A adenosine receptor. J. Chem. Inf. Model. 55, 550–563 (2015).
Lenselink, E. B. et al. In search of novel ligands using a structure-based approach: a case study on the adenosine A2A receptor. J. Comput. Aided Mol. Des. 30, 863–874 (2016).
Carlsson, J. et al. Structure-based discovery of A2A adenosine receptor ligands. J. Med. Chem. 53, 3748–3755 (2010).
Kooistra, A. J., Leurs, R., De Esch, I. J. P. & De Graaf, C. Structure-based prediction of g-protein-coupled receptor ligand function: A β-adrenoceptor case study. J. Chem. Inf. Model 55, 1045–1061 (2015).
Thomas, M., Bender, A. & de Graaf, C. Integrating structure-based approaches in generative molecular design. Curr. Opin. Struct. Biol. 79, 102559 (2023).
Ivanenkov, Y. A. et al. Chemistry42: An AI-driven platform for molecular design and optimization. J. Chem. Inf. Model 63, 695–701 (2023).
Izmailyan, R. et al. Discovery of new antiviral agents through artificial intelligence: In vitro and in vivo results. Antivir. Res. 222, 105818 (2024).
Korablyov, M. et al. Generative Active Learning for the Search of Small-molecule Protein Binders. ArXiv https://doi.org/10.48550/arXiv.2405.01616 (2024).
Oliveira, S. de., Pedawi, A., Kenyon, V. & Bedem, H. van den. NGT: Generative AI with Synthesizability Guarantees Identifies Potent Inhibitors for a G-protein Associated Melanocortin Receptor in a Tera-scale vHTS Screen. chemRxiv https://doi.org/10.26434/CHEMRXIV-2024-FZ37H (2024).
Jiang, X. et al. Synthetically feasible De Novo molecular design of leads based on a reinforcement learning model: AI-assisted discovery of an Anti-IBD lead targeting CXCR4. J. Med. Chem. 09, 22 (2024).
Walters, P. W. & Murcko, M. Assessing the impact of generative AI on medicinal chemistry. Nat. Biotechnol. 38, 143–145 (2020).
Goldman, B., Kearnes, S., Kramer, T., Riley, P. & Walters, W. P. Defining levels of automated chemical design. J. Med Chem. 65, 7073–7087 (2022).
Arús-Pous, J. et al. Randomized SMILES strings improve the quality of molecular generative models. J. Cheminform. 11, 71 (2019).
Robertson, N. et al. The properties of thermostabilised G protein-coupled receptors (StaRs) and their use in drug discovery. Neuropharmacology 60, 36–44 (2011).
Liu, W. et al. Structural basis for allosteric regulation of GPCRs by sodium ions. Science 337, 232–236 (2012).
Rucktooa, P. et al. Towards high throughput GPCR crystallography: In Meso soaking of Adenosine A2A Receptor crystals. Sci. Rep. 8, 41 (2018).
Borodovsky, A. et al. Small molecule AZD4635 inhibitor of A2AR signaling rescues immune cell function including CD103⁺ dendritic cells enhancing anti-tumor immunity. J. Immunother. Cancer 8, e000417 (2020).
Jespers, W. et al. X-Ray Crystallography and free energy calculations reveal the binding mechanism of A2A Adenosine receptor antagonists. Angew. Chem. Int. Ed. 59, 16536–16543 (2020).
Rucktooa, P. et al. Towards high throughput GPCR crystallography: In Meso soaking of Adenosine A2A Receptor crystals. Sci. Rep. 8, 1–7 (2018).
Milletti, F., Storchi, L., Sforna, G. & Cruciani, G. New and original pKa prediction method using grid molecular interaction fields. J. Chem. Inf. Model 47, 2172–2181 (2007).
Milletti, F., Storchi, L., Sforna, G., Cross, S. & Cruciani, G. Tautomer enumeration and stability prediction for virtual screening on large chemical databases. J. Chem. Inf. Model 49, 68–75 (2009).
3D Structure Generator CORINA Classic, MN-AM, Molecular Networks Altamira, Nuremberg, Germany, www.mn-am.com.
Friesner, R. A. et al. Glide: A new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J. Med Chem. 47, 1739–1749 (2004).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction. (MIT Press, 2018).
Wildman, S. A. & Crippen, G. M. Prediction of physicochemical parameters by atomic contributions. J. Chem. Inf. Comput Sci. 39, 868–873 (1999).
RDKit. Open-source cheminformatics. http://www.rdkit.org.
Thomas, M., O’Boyle, N. M., Bender, A. & De Graaf, C. MolScore: a scoring, evaluation and benchmarking framework for generative models in de novo drug design. J. Cheminform. 16, 64 (2024).
Caffrey, M. & Cherezov, V. Crystallizing membrane proteins using lipidic mesophases. Nat. Protoc. 4, 706–731 (2009).
Kabsch, W. Integration, scaling, space-group assignment and post-refinement. Acta Crystallogr. D. Biol. Crystallogr. 66, 133–144 (2010).
Winn, M. D. et al. Overview of the CCP4 suite and current developments. Acta Crystallogr. D. Biol. Crystallogr. 67, 235–242 (2011).
Vonrhein, C. et al. Advances in automated data analysis and processing within autoPROC, combined with improved characterisation, mitigation and visualisation of the anisotropy of diffraction limits using STARANISO. Acta Crystallogr. A Found Adv. A74, a360–a360 (2018).
McCoy, A. J. et al. Phaser crystallographic software. J. Appl. Crystallogr. 40, 658–674 (2007).
Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of Coot. Acta Crystallogr. D. Biol. Crystallogr. 66, 486–501 (2010).
Bricogne, G. et al. BUSTER version X.Y.Z. Cambridge, United Kingdom: Global Phasing Ltd. (2017).
Schrödinger, L. & DeLano, W. PyMOL. Retrieved from http://www.pymol.org/pymol (2020).
Thomas, M., Lo, S. & O’Boyle, N. Augmented Hill-Climb increases reinforcement learning efficiency for language-based de novo molecule generation. https://github.com/MorganCThomas/SMILES-RNN https://doi.org/10.5281/zenodo.11356193 (2022).
Thomas, M., Araripe, D. & O’Boyle, N. MolScore: a scoring, evaluation and benchmarking framework for generative models in de novo drug design. https://github.com/MorganCThomas/MolScore https://doi.org/10.5281/zenodo.14998609 (2024).

Acknowledgements

The authors acknowledge open-source tools used in this work. MT acknowledges Nxera Pharma UK for their sponsorship of his PhD studies. The permission to publish this work was granted by Nxera Pharma UK.

Author information

Chris de Graaf
Present address: Structure Therapeutics, 601 Gateway Blvd, San Francisco, CA, USA

Authors and Affiliations

Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
Morgan Thomas & Andreas Bender
Nxera Pharma, Steinmetz Building, Granta Park, Great Abington, Cambridge, UK
Pierre G. Matricon, Robert J. Gillespie, Maja Napiórkowska, Hannah Neale, Jonathan S. Mason, Jason Brown, Kaan Harwood, Charlotte Fieldhouse, Nigel A. Swain, Tian Geng, Noel M. O’Boyle, Francesca Deflorian & Chris de Graaf
College of Medicine and Health Sciences, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
Andreas Bender
STAR-UBB Institute, Babeş-Bolyai University, Cluj-Napoca, Romania
Andreas Bender

Authors

Morgan Thomas
Pierre G. Matricon
Robert J. Gillespie
Maja Napiórkowska
Hannah Neale
Jonathan S. Mason
Jason Brown
Kaan Harwood
Charlotte Fieldhouse
Nigel A. Swain
Tian Geng
Noel M. O’Boyle
Francesca Deflorian
Andreas Bender
Chris de Graaf

Contributions

M.T. conducted the in silico generative molecular design and analysis, with supervision from P.G.M., F.D. and C.d.G. in the preparation and use of crystal structures for docking. M.T., P.G.M., F.D., R.J.G., J.S.M. and C.d.G. triaged the de novo molecules into those selected for synthesis. R.J.G., K.H. and C.F. were responsible for the synthesis and characterisation of the selected compounds. H.N. and J.B. conducted the binding affinity and pharmacological assays of the synthesised compounds. M.N. and T.G. conducted the crystallographic structure experiments and deposited the structures in the Protein Data Bank (PDB). The project was supervised and managed by F.D., N.A.S., N.O.B., A.B. and C.d.G. M.T. prepared the manuscript with contributions from P.G.M., H.N., R.J.G., F.D., A.B. and C.d.G. All authors read, edited, and approved the manuscript.

Corresponding authors

Correspondence to Francesca Deflorian, Andreas Bender or Chris de Graaf.

Ethics declarations

Competing interests

PGM, RJG, MN, HN, JSM, JB, KH, CF, NAS, TG, NOB, FD, and CdG are currently, or were, employees of Nxera Pharma UK while the work presented in this manuscript was conducted. Nxera Pharma UK is a drug discovery and development company working in the field of G-protein-coupled receptor structure-based drug design. All remaining authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Martin Lochner and the other anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Reporting Summary

Transparent Peer Review file

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Thomas, M., Matricon, P.G., Gillespie, R.J. et al. Identification of nanomolar adenosine A_2A receptor ligands using reinforcement learning and structure-based drug design. Nat Commun 16, 5485 (2025). https://doi.org/10.1038/s41467-025-60629-0

Received: 22 October 2024
Accepted: 25 May 2025
Published: 01 July 2025
DOI: https://doi.org/10.1038/s41467-025-60629-0