Abstract
Oxidative stress, characterized by the damaging accumulation of free radicals, is associated with various diseases, including cardiovascular, neurodegenerative, and metabolic disorders. The transcription factor Nrf2 is pivotal in cellular defense against oxidative stress by regulating genes that detoxify free radicals, thus maintaining redox homeostasis and preventing cellular aging. Keap1 plays a regulatory role through its interaction with Nrf2, ensuring Nrf2 degradation under homeostatic conditions and facilitating its stabilization and nuclear translocation during oxidative stress. In the initial stage of our study, we conducted in vitro assays on HaCaT cells, a human keratinocyte cell line, to measure the expression levels of Nrf2 to reveal the activity of promising medicinal plants, which were then selected for further evaluation. Subsequently, this study leverages in silico techniques, integrating machine learning with molecular docking and dynamics, to screen natural compounds that potentially activate Nrf2. Data from the ChEMBL database were categorized into active and inactive compounds and used for training different machine-learning models to predict potential Nrf2 activators. The best-performing model was used to select compounds for further evaluation via molecular docking and dynamics, assessing their interactions with Keap1/Nrf2. The LC-MS/MS-based chemical profiles also validated the presence of these chemical compounds. This approach underscores the synergy between in vitro bioassays and in silico approaches in identifying Nrf2 activators, offering a cost-effective strategy for drug development.
Introduction
Oxidative stress refers to a condition in which prolonged exposure of body tissues to harmful agents leads to the formation of free radicals that damage structural components in cells, thereby changing their physiological functions1. Chronic oxidative stress is associated with many human diseases, including cardiovascular diseases, neurodegenerative diseases, diabetes, cancer, and cellular aging2. Therefore, cells form a cytoprotective mechanism that removes oxidative free radicals for redox homeostasis, maintaining cell physiology1.
Nuclear factor erythroid-2 p45-related factor 2 (Nrf2) is an important transcription factor that regulates genes protecting cells against oxidative stress, maintaining homeostasis, and inhibiting cellular aging3. Possible target genes of Nrf2 include glutathione S-transferase, NAD(P)H quinone oxidoreductase 1, heme oxygenase-1, and glutamate-cysteine ligase4. When oxidative stress occurs, Nrf2 activates the transcription of genes encoding enzymes involved in free radical detoxification, maintaining the balance between oxidants and reducing agents in the body5. The Nrf2 signaling pathway also plays an indispensable role in other processes in the human body, such as inflammation, metabolism, immune response, and carcinogenesis6. Nrf2 activity is directly regulated by the Kelch-like ECH-associated protein 1 (Keap1) through its C-terminal Kelch domain, which interacts with the Asp-Leu-Gly and Glu-Thr-Gly-Glu motifs in the Neh2 domain of Nrf27. When bound to Keap1, Nrf2 is ubiquitinated by the E3 ligase formed from Cullin3 (Cul3) and RBX1 proteins and is then degraded by the 26S proteasome. Under normal conditions, where cells are homeostatic, Nrf2 is continuously captured, inhibited, and degraded by Keap1 with an average half-life of 13 min8. Under oxidative stress conditions, inducers modify the cysteine thiol groups of Keap1, disrupting the Keap1/Nrf2/Cul3 complex. As a result, Nrf2 is not ubiquitinated and is released intracellularly with a half-life of up to 100 min8. Free Nrf2 translocates into the nucleus, forms a transcriptional complex with sMaf, and activates the expression of antioxidant genes through the antioxidant response element9.
The target genes of Nrf2 are involved in regulating the glutathione and thioredoxin antioxidant systems, NADPH regeneration, reactive oxygen species and xenobiotic detoxification, and heme metabolism10. The roles of Nrf2 in preventing and treating inflammatory, neurodegenerative, cardiovascular, diabetic, and joint diseases have been documented. Many studies show that reducing oxidative stress through Nrf2 activation can reduce infarction and heart failure as well as prevent the progression of diabetes11. Research by Liu and colleagues reveals that Nrf2-deficient mice have a higher risk of severe diabetic nephropathy and progression of diabetes than normal mice12. Natural compounds that activate Nrf2 expression, such as curcumin, sulforaphane, catechins, and stilbenes, are of research interest in anti-aging and in treating diseases related to oxidative stress13. Sulforaphane-rich broccoli sprout extract was reported to prevent diabetes-induced cardiac dysfunction in rats through the regulation of Nrf2 expression14. Therefore, screening compounds in medicinal plants is a promising research direction for discovering natural compounds that activate Nrf2 expression.
However, one of the limitations in screening active compounds from medicinal plants is the large number of compounds with low concentrations and complex chemical structures; therefore, isolating pure active substances requires much time and cost. In recent years, in silico methods have emerged as a cost-effective solution in drug development. These methods involve finding the desired active compound from a data library and integrating machine learning techniques. By leveraging information from available databases, machine learning algorithms can process large amounts of data, screening compounds based on their chemical structure and ability to work against specific targets15. This approach significantly reduces the time and resources required to test experimental bioactivity in the drug discovery process, thereby alleviating the financial burden associated with drug development.
Integrating bioassays with machine learning represents an effective approach for screening active compounds16. Experimental research requires reliable testing methods, ensuring objectivity and practical relevance in results. Conversely, machine learning models trained on datasets from both experimental and virtual screenings can identify complex patterns, thus improving the prediction of potential compounds. This combined approach also establishes a quantitative relationship, comparing the activity of a large number of compounds with the target molecule, and reduces bias in research outcomes17. Utilizing prediction models alongside experimental bioassays has led to the discovery of many Nrf2 activators. Li et al. indicated that phenylethanoid glycosides are highly effective in inhibiting the Keap1/Nrf2 protein–protein interaction, as demonstrated through molecular docking analysis18. This finding was also validated in vitro by enhanced Nrf2 expression in both the cytoplasm and nucleus. In another study, Shimizu et al. combined ligand-based virtual screening with time-resolved fluorescence resonance energy transfer assays and identified hit compounds that inhibit the Keap1/Nrf2 interaction19.
Based on previous research that documented the antioxidant, anti-inflammatory, and Nrf2 activation properties of various medicinal plants and their distribution across Vietnam, this study aims to screen the bioactivity of five medicinal plants, namely leaves of Helicteres hirsuta (LHH), roots of Piper sarmentosum (RPS), rhizomes of Zingiber zerumbet (RZZ), stems of Oroxylum indicum (SOI), and stems of Phyllanthus amarus (SPA), in enhancing Nrf2 activity in HaCaT cells. Oxidative stress damage, such as skin aging and keratinization, often results from direct exposure to UV radiation and pollutants20. Our research, which focuses on Nrf2 expression in HaCaT keratinocytes, offers promising insights into potential therapeutic targets for skin-related oxidative stress. HaCaT cells were chosen for this study because their stable physiological activity and strong proliferation make them well suited to in vitro experiments. Simultaneously, the study evaluates potential natural compounds that inhibit the Keap1/Nrf2 interaction using multiple machine learning models, including Support Vector Machines (SVM), Random Forest (RF), and Multi-layer Perceptron (MLP) Classifier. The screened compounds were further evaluated using molecular docking and molecular dynamics to assess their interactions with the target Keap1/Nrf2 complex and their electrical and structural characteristics, respectively.
Results
In vitro screening
Figure 1 summarizes the relative activity of Nrf2 in HaCaT cells exposed to the methanolic crude extracts and sub-fractions. This study used 1% DMSO as a negative control and tBHQ as a positive control, with 10 µM tBHQ increasing Nrf2 activity by over 450% in HaCaT cells. Among 30 crude extracts and sub-fractions, six of the sub-fractions significantly enhanced Nrf2 activity beyond that of tBHQ at 10 µM. P. sarmentosum roots, Z. zerumbet rhizomes, and O. indicum stems showed a strong ability to activate Nrf2 in HaCaT cells. The water sub-fractions of P. sarmentosum roots and O. indicum stems exhibited an outstanding ability to enhance Nrf2 activity by over 1000%, two-fold higher than the positive control (P-value < 0.0001). Other sub-fractions from O. indicum stems, excluding the chloroform fraction, mildly to highly enhanced Nrf2 activity in HaCaT cells, in a range of 200–800%. The ethyl acetate and n-butanol sub-fractions from Z. zerumbet rhizomes increased Nrf2 activity up to 600% (P-value < 0.01), whereas the methanolic crude extract and the n-hexane and chloroform sub-fractions strongly decreased Nrf2 expression in the cells. In contrast, the methanolic crude extract of P. amarus activated Nrf2 by more than 900% (P-value < 0.0001), while its sub-fractions did not show superior Nrf2 activation.
In vitro screening of Nrf2 activity in HaCaT cells treated with methanolic crude extracts and sub-fractions from five medicinal plants: (A) Helicteres hirsuta leaves (LHH); (B) Piper sarmentosum roots (RPS); (C) Zingiber zerumbet rhizomes (RZZ); (D) Oroxylum indicum stems (SOI); and (E) Phyllanthus amarus stems (SPA). Negative control: DMSO; positive control: tBHQ.
Machine learning model performance
The optimal hyper-parameter values for each model, using different molecular descriptors, are presented in Table S1, while their corresponding performances are detailed in Table S2. Among these models, the SVM model using the Morgan3 fingerprint exhibited the highest performance. The metrics for this model were as follows: accuracy = 0.87, sensitivity = 0.88, specificity = 0.86, area under the curve (AUC) = 0.93, and F1 score = 0.89. The Tanimoto index between the 613 compounds used in building the machine learning model and the 99 compounds in the testing data set was less than 0.26 (Figure S6)19. This result indicates that the data set is diverse and suitable for ML prediction modeling, since such diversity enhances the model’s ability to generalize effectively to novel compounds. Consequently, this model was employed to classify a list of natural compounds (refer to Table S3). The model predicted 16 active compounds, as shown in Table 1.
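As an illustration of how the threshold-based metrics reported above relate to a confusion matrix, the following minimal sketch computes accuracy, sensitivity, specificity, and F1 score from raw counts. The function name and the counts are hypothetical and chosen only for illustration (they sum to 99, the size of the test set); they are not the confusion matrix of this study. AUC is omitted because it requires the continuous decision scores rather than counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # fraction of predicted actives that are active
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for a 99-compound test set (NOT the study's confusion matrix).
metrics = classification_metrics(tp=50, fp=6, tn=36, fn=7)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Reporting sensitivity and specificity separately matters here because the training set is moderately imbalanced (363 active versus 250 inactive compounds), so accuracy alone could mask a bias toward the majority class.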
Molecular docking
The molecular docking results of the 16 active compounds are summarized in Table 1, Fig. 2, and Table S4. The ligands bound to Keap1/Nrf2 with the best docking scores were geraniin (−12.2 kcal/mol), corilagin (−11.7 kcal/mol), and β-sitosterol-3-O-β-D-glucopyranoside (−10.8 kcal/mol). All compounds interacted with the amino acids of Keap1/Nrf2 through non-covalent interactions, including van der Waals, hydrogen bonds, carbon-hydrogen bonds, and π-alkyl bonds. The hydroxyl and carbonyl groups of the ligands tended to form hydrogen bonds with Arg326, Ala366, Val512, Val514, Thr560, Val561, Val606, and Val608, contributing to the strong binding between the ligand and Keap1. These binding sites are consistent with other studies that docked flavonoid compounds from Citrus21. These results emphasize the potential of the compounds predicted by the machine learning model to inhibit Keap1.
Molecular dynamics simulation
The three compounds with the best docking scores against Keap1/Nrf2 (geraniin, corilagin, and β-sitosterol-3-O-β-D-glucopyranoside) were selected, and the dynamic behavior of their complexes with Keap1 was evaluated through 50 ns MD simulations. The MD results for the ligand complexes were compared with those for the Keap1/Nrf2 complex (2FLU), as shown in Fig. 3A. The RMSD value of Keap1/Nrf2 was in the range of 0.7–1.5 Å in the first 30 ns and stabilized at 1.0–1.6 Å in the next 20 ns. The RMSD value of the Keap1/Nrf2-geraniin complex was higher than that of Keap1/Nrf2 and ranged from 1.5 to 2.25 Å. Meanwhile, the RMSD values of the Keap1/Nrf2-corilagin and Keap1/Nrf2-β-sitosterol-3-O-β-D-glucopyranoside complexes were in the range of 0.7–1.5 Å and lower than that of Keap1/Nrf2.
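For readers less familiar with the quantity, the RMSD of a frame relative to a reference structure can be sketched as below. This is a minimal illustration, not the analysis pipeline used in this study: it assumes the two coordinate sets are already superposed (MD analysis tools normally perform a least-squares fit first), and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) positions, in the input units."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical, pre-aligned backbone coordinates in angstroms (not simulation data).
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
frame = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0), (3.0, 1.0, 0.0)]
value = rmsd(reference, frame)
print(f"RMSD = {value:.2f} angstroms")
```

Evaluating this function for every frame against the starting structure yields a time series of the kind plotted in Fig. 3A.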
Figure 3B illustrates the Root Mean Square Fluctuation (RMSF) values for Keap1/Nrf2 and its complexes with three selected ligands. Generally, the RMSF value of the Keap1/Nrf2-geraniin complex is higher than that of Keap1/Nrf2 and the other complexes, particularly for residue numbers between 570 and 600. The RMSF values for all complexes fluctuate between 1 and 2 Å, except for specific fluctuations at atom numbers ranging from 382 to 385.
The radius of gyration (Rg) values of Keap1/Nrf2 and the Keap1/Nrf2-ligand complexes, shown in Fig. 3C, reveal only minor changes during the simulation period, indicating that their structures were minimally affected. Consistent with the RMSD results, the Rg of the Keap1/Nrf2-geraniin complex was the highest, at 1.86 ± 0.11 nm, followed by that of Keap1/Nrf2 at 1.81 ± 0.14 nm. The Rg values of the Keap1/Nrf2-corilagin and Keap1/Nrf2-β-sitosterol-3-O-β-D-glucopyranoside complexes were similar, at 1.78 ± 0.09 nm.
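The radius of gyration measures how compactly the atoms are distributed around the center of mass. A minimal sketch of the mass-weighted definition is given below; the function name, coordinates, and masses are hypothetical and serve only to illustrate the calculation.

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of a set of (x, y, z) positions."""
    total_mass = sum(masses)
    # Center of mass, one component at a time.
    com = [
        sum(m * c[k] for m, c in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    # Mass-weighted sum of squared distances from the center of mass.
    weighted = sum(
        m * sum((c[k] - com[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted / total_mass)


# Hypothetical four-atom system with coordinates in nm (not simulation data).
coords = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
masses = [12.0, 12.0, 12.0, 12.0]
rg = radius_of_gyration(coords, masses)
print(f"Rg = {rg:.2f} nm")
```

A roughly constant Rg over the trajectory, as reported above, indicates that the protein neither unfolds nor collapses on ligand binding.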
The number of hydrogen bonds formed between three compounds and the Keap1/Nrf2 complex’s binding site fluctuated between 1 and 12 during 50 ns of simulation, as shown in Fig. 3D. Notably, corilagin established a higher number of hydrogen bonds (3–12 bonds) with Keap1/Nrf2 compared to the bonds formed by geraniin (1–11) and β-sitosterol-3-O-β-D-glucopyranoside with Keap1 (1–7).
To gain insight into the flexibility of the residues, the trajectories of all four systems were analyzed by principal component analysis, which describes the overall motions of the protein backbone atoms through the eigenvectors of the covariance matrix (Figure S7). The first three principal components (PCs) in Keap1/Nrf2, Keap1/Nrf2-β-sitosterol-3-O-β-D-glucopyranoside, Keap1/Nrf2-geraniin, and Keap1/Nrf2-corilagin explained 50.1%, 37.6%, 39.5%, and 43% of the total variance observed in the simulation trajectories, respectively. The total variance explained by the first 20 PCs was highest for the Keap1/Nrf2 complex, at 70.7%. For the Keap1/Nrf2-β-sitosterol-3-O-β-D-glucopyranoside, Keap1/Nrf2-geraniin, and Keap1/Nrf2-corilagin complexes, the corresponding values were 64.6%, 67.5%, and 66.9%, respectively. These results indicate that the binding of the three selected compounds hindered the motion of the Keap1 backbone atoms, which supports the likelihood of successful binding of the ligands to the Keap1/Nrf2 complex.
Free energy landscape (FEL) of Keap1 and ligand complexes projected on PC1 and PC2. (A) and (B) 3D and 2D FEL graphs of Keap1/Nrf2 and ligand complexes. The energy distribution is shown by the color gradient: blue indicates minimum energy (stable state), while red indicates maximum energy (unstable state).
We also identified the low-energy minima during the MD simulation from a Free Energy Landscape (FEL) analysis, which was conducted using PC1 and PC2. The 2D and 3D graphs of the FEL are depicted in Fig. 4. In these plots, the dark blue spots represent the energy minima and energetically favored protein conformations, while the light blue spots indicate less favorable conformations. A shallow and narrow energy basin suggests low stability of the protein-ligand complex. For Keap1/Nrf2 and Keap1/Nrf2-β-sitosterol-3-O-β-D-glucopyranoside, one deep and one shallower minimum were observed, whereas for Keap1/Nrf2-geraniin and Keap1/Nrf2-corilagin, two deep and broader energy minima were evident. In these four systems, the initial Gibbs free energies ranged from 10.7 kJ/mol to 12.8 kJ/mol. Notably, Keap1/Nrf2-corilagin exhibited the lowest initial Gibbs free energy, at 10.7 kJ/mol, which indicates its superior stability compared to the other three systems.
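The percentages above are cumulative fractions of the covariance-matrix eigenvalues. A minimal sketch of that calculation follows; the function name and the eigenvalues are hypothetical and are not taken from the trajectories analyzed here.

```python
def explained_variance(eigenvalues, n_components):
    """Fraction of the total variance captured by the n largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive value")
    return sum(ordered[:n_components]) / total


# Hypothetical eigenvalues of a backbone covariance matrix (not simulation data).
eigenvalues = [4.0, 2.0, 1.0, 1.0, 1.0, 1.0]
first_three = explained_variance(eigenvalues, 3)
print(f"First three PCs explain {first_three:.1%} of the variance")
```

Applied to each system with n_components set to 3 and to 20, this yields the two sets of percentages compared in the text.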
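A free energy landscape of this kind is commonly obtained by Boltzmann inversion of the population of frames in each (PC1, PC2) bin, G = −RT ln(N/N_max). The sketch below illustrates that idea under stated assumptions: the bin width, the temperature of 300 K, the function name, and the projections are all hypothetical, and this is not necessarily the exact procedure used to generate Fig. 4.

```python
import math
from collections import Counter

R_KJ_PER_MOL_K = 0.0083144626  # gas constant in kJ/(mol*K)


def free_energy_landscape(pc1, pc2, bin_width=1.0, temperature=300.0):
    """Map each occupied (PC1, PC2) bin to G = -RT ln(N_bin / N_max) in kJ/mol."""
    counts = Counter(
        (math.floor(x / bin_width), math.floor(y / bin_width))
        for x, y in zip(pc1, pc2)
    )
    n_max = max(counts.values())
    return {
        bin_index: -R_KJ_PER_MOL_K * temperature * math.log(n / n_max)
        for bin_index, n in counts.items()
    }


# Hypothetical projections of six frames onto PC1 and PC2 (not simulation data).
pc1 = [0.1, 0.2, 0.3, 0.4, 1.5, 2.5]
pc2 = [0.1, 0.2, 0.3, 0.4, 0.5, 0.5]
fel = free_energy_landscape(pc1, pc2)
for bin_index, energy in sorted(fel.items()):
    print(bin_index, f"{energy:.2f} kJ/mol")
```

In this convention the most populated bin has zero free energy and corresponds to the dark blue minima, while sparsely visited bins take higher values.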
In silico ADME and toxicity
The in silico ADME (absorption, distribution, metabolism, and excretion) properties are presented in Table 2. The analysis of Lipinski’s Rule of Five indicated that β-sitosterol-3-O-β-D-glucopyranoside had only one violation, a molecular weight above 500 g/mol, while the other criteria, including H-bond donor and acceptor counts and log P value, were met, supporting drug-like behavior. Meanwhile, geraniin and corilagin were found to violate Lipinski’s Rule of Five, as evidenced by their molecular weights exceeding 500 g/mol and their H-bond donor and acceptor counts exceeding the rule’s limits (five donors and ten acceptors). Topological polar surface area (TPSA) influences drug transport properties, including blood-brain barrier permeability and intestinal absorption. Compounds with TPSA exceeding 140 Å² are predicted to exhibit limited oral bioavailability due to passive absorption barriers. The TPSA values of geraniin and corilagin were 450.25 and 310.66 Å², respectively, suggesting that these compounds have passive absorption limitations, potentially leading to poor oral bioavailability. The Log Kp value, representing the skin permeation coefficient, offers insight into the skin penetration of compounds. All three compounds had negative Log Kp values, indicating limited skin permeability. The ADME analysis also predicted that none of the compounds interacts with the cytochrome P450 enzyme system. This suggests that these compounds are unlikely to be metabolized via cytochrome P450 pathways, thereby reducing the risk of adverse reactions.
The oral toxicity predictions obtained from ProTox 3.0 indicated that β-sitosterol-3-O-β-D-glucopyranoside is non-toxic, with a predicted median lethal dose (LD50) of 8000 mg/kg and a prediction accuracy of 69.26%. Corilagin was classified as potentially harmful if ingested, with an LD50 of 2260 mg/kg and a prediction accuracy of 68.07%. Geraniin had a predicted LD50 of 300 mg/kg and a prediction accuracy of 67.38%. These results highlight the varying toxicity levels among the compounds, suggesting distinct safety considerations for further experiments.
LC-QTOF-MS/MS-based chemical profiles
Based on the results from the in silico approach, the predicted active compounds mainly belong to O. indicum, Z. zerumbet, and P. amarus. Thus, we established the chemical profiles of the methanolic crude extracts from these three medicinal plants using LC-QTOF-MS/MS analysis. The retention times, precursor ions, and fragments of the compounds found in the crude methanol extracts of O. indicum stems, Z. zerumbet rhizomes, and P. amarus stems are summarized in Table 3.
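The rule-based checks described above can be sketched as follows. This is a minimal illustration using the standard Lipinski limits (molecular weight ≤ 500 g/mol, ≤ 5 H-bond donors, ≤ 10 H-bond acceptors, log P ≤ 5) and the 140 Å² TPSA cutoff mentioned in the text. The function names and the descriptor values passed to the Lipinski check are hypothetical placeholders, not values from Table 2; only the TPSA value of 310.66 Å² is taken from the text.

```python
def lipinski_violations(mol_weight, h_donors, h_acceptors, log_p):
    """Return the list of Lipinski Rule of Five criteria that a compound violates."""
    violations = []
    if mol_weight > 500:
        violations.append("molecular weight > 500 g/mol")
    if h_donors > 5:
        violations.append("H-bond donors > 5")
    if h_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    if log_p > 5:
        violations.append("log P > 5")
    return violations


def tpsa_limits_absorption(tpsa, cutoff=140.0):
    """True when TPSA (in square angstroms) exceeds the passive-absorption cutoff."""
    return tpsa > cutoff


# Hypothetical descriptors for a large polyphenol (NOT taken from Table 2).
found = lipinski_violations(mol_weight=620.0, h_donors=11, h_acceptors=18, log_p=0.4)
print(found)
print(tpsa_limits_absorption(310.66))  # TPSA reported above for corilagin
```

Counting violations rather than returning a single pass/fail flag makes it easy to distinguish a compound with one violation from one that breaks several criteria at once.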
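To place these LD50 values on a common scale, they can be mapped onto acute oral toxicity classes. The sketch below assumes the six-class scheme based on the Globally Harmonized System that ProTox reports (class 1: LD50 ≤ 5 mg/kg; class 2: ≤ 50; class 3: ≤ 300; class 4: ≤ 2000; class 5: ≤ 5000; class 6: > 5000 mg/kg). The class assignments are derived here for illustration and are not stated in the source results; the function name and dictionary keys are hypothetical.

```python
def oral_toxicity_class(ld50_mg_per_kg):
    """Return the acute oral toxicity class (1 = most toxic, 6 = non-toxic)."""
    # Upper LD50 bound (mg/kg) of classes 1 to 5; anything above 5000 is class 6.
    upper_bounds = [(5, 1), (50, 2), (300, 3), (2000, 4), (5000, 5)]
    for bound, tox_class in upper_bounds:
        if ld50_mg_per_kg <= bound:
            return tox_class
    return 6


# Predicted LD50 values (mg/kg) as reported in the text above.
predicted_ld50 = {
    "beta-sitosterol-3-O-beta-D-glucopyranoside": 8000,
    "corilagin": 2260,
    "geraniin": 300,
}
classes = {name: oral_toxicity_class(value) for name, value in predicted_ld50.items()}
for name, tox_class in classes.items():
    print(f"{name}: class {tox_class}")
```

Under this assumed scheme, geraniin sits exactly on the upper boundary of class 3, so a small change in the predicted LD50 would move it into class 4.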
Discussion
In this study, the in vitro activity of five methanolic crude extracts and 25 sub-fractions from five Vietnamese medicinal plants (H. hirsuta leaves, P. sarmentosum roots, Z. zerumbet rhizomes, O. indicum stems, and P. amarus stems) in activating Nrf2 in HaCaT cells was evaluated, with all samples tested at a concentration of 100 µg/mL. The ethyl acetate and n-butanol sub-fractions from Z. zerumbet rhizomes; the water sub-fraction from P. sarmentosum roots; the crude extract, ethyl acetate, and water sub-fractions from O. indicum stems; and the crude extract from P. amarus stems exhibited significant ability to activate Nrf2 expression in HaCaT cells. The water sub-fractions from P. sarmentosum roots and O. indicum stems were the most prominent among the tested samples, increasing activity tenfold compared to normal conditions at a concentration of 100 µg/mL. Hamid et al. reported that the ethyl acetate extract from Z. zerumbet limited the toxicity of paracetamol by increasing the levels of glutathione and the antioxidant enzyme superoxide dismutase22. These antioxidants are also regulated by Nrf2, suggesting that the antioxidant activity of Z. zerumbet is closely associated with Nrf2 activity23. The main components of O. indicum are flavonoids, such as baicalin, baicalein, and chrysin24. These compounds have been investigated for their capacity to stimulate the nuclear translocation of Nrf2 and to enhance the expression of antioxidant genes25,26. The variation in Nrf2 regulatory activity between the crude methanol extract and the sub-fractions of P. amarus stem may be attributed to differences in their chemical composition and content. However, research on the specific chemical constituents of P. amarus stem that activate Nrf2 expression remains limited.
A machine learning model capable of predicting natural compounds that inhibit Keap1/Nrf2 was successfully established in this research using the support vector machine algorithm and the Morgan3 chemical structure encoding. The performance evaluation indexes of the model include an accuracy of 0.87, sensitivity of 0.88, specificity of 0.86, AUC of 0.93, and F1 score of 0.89. A total of 16 natural compounds were predicted to exhibit Keap1/Nrf2 inhibitory activity, suggesting their potential to indirectly enhance Nrf2 activity and thus act as Nrf2 activators. Of the 16 compounds predicted by the model, 14 are classified within the flavonoid, phenolic, and polyphenol categories. Molecular docking of these 16 natural compounds with the Keap1/Nrf2 complex (PDB ID: 2FLU) indicated that geraniin (−12.2 kcal/mol), corilagin (−11.7 kcal/mol), and β-sitosterol-3-O-β-D-glucopyranoside (−10.8 kcal/mol) exhibited the highest binding affinities (Table 1). All interactions of the 16 compounds with amino acids of Keap1 were non-covalent, including van der Waals, hydrogen bonds, carbon-hydrogen bonds, and π-alkyl bonds (Table 1). The hydroxyl groups of the ligands acted as hydrogen donors to amide groups on Ala366, Val512, Val514, Thr560, Val561, Val606, and Val608, while the ligand’s carbonyl group accepted hydrogen from the side chain of Arg326. These interactions between the ligand’s hydroxyl and carbonyl groups and specific amino acids in Keap1 contribute significantly to stabilizing the ligand-Keap1 binding. Additionally, compounds exhibiting low binding energies (below −10 kcal/mol) commonly bind at the Val418 site, consistent with findings from previous studies on docking of flavonoids from Citrus plants21.
The three compounds with the top docking scores against Keap1 (geraniin, corilagin, and β-sitosterol-3-O-β-D-glucopyranoside) were selected to investigate their dynamic interactions with Keap1 through 50 ns MD simulations. A comparative analysis was conducted between the MD simulations of these ligand complexes and that of the Keap1/Nrf2 complex (2FLU). The RMSD values of the Keap1/Nrf2 complex ranged from 0.7 to 1.5 Å over the initial 30 ns, stabilizing at 1.0 to 1.6 Å for the subsequent 20 ns. In contrast, the RMSD values for the Keap1/Nrf2-geraniin complex were higher, fluctuating between 1.5 and 2.25 Å. Meanwhile, the RMSD of the Keap1/Nrf2-corilagin and Keap1/Nrf2-β-sitosterol-3-O-β-D-glucopyranoside complexes was stable in the range of 0.7–1.5 Å and lower than that of Keap1/Nrf2. These findings suggest that both the Keap1/Nrf2-corilagin and Keap1/Nrf2-β-sitosterol-3-O-β-D-glucopyranoside complexes bind Keap1 more consistently than the Keap1/Nrf2 complex, without compromising structural integrity. Taken together, the RMSD, RMSF, and Rg analyses, along with the number of hydrogen bonds, consistently demonstrated stable binding between corilagin and Keap1 throughout the MD simulation. These findings underscore corilagin from P. amarus stem as a promising compound for inhibiting Keap1/Nrf2, thereby potentially activating Nrf2 expression indirectly.
Regarding the ADME/Tox analysis, Lipinski’s Rule of Five is one of the first qualitative frameworks for designing compounds with oral bioavailability and selecting drug candidates for further development based on physicochemical properties, including molecular weight, hydrogen bond acceptors and donors, and lipophilicity (log P)27. The results indicated that while β-sitosterol-3-O-β-D-glucopyranoside shows the greatest drug-likeness for oral bioavailability, corilagin and geraniin significantly violate Lipinski’s Rule of Five. Despite this, the ability of these compounds to activate Nrf2 remains noteworthy, since natural compounds are frequently regarded as exceptions to Lipinski’s rules because of their unique structural features28. They often demonstrate high biological activity despite exceeding the rule’s parameters, such as molecular weight or hydrogen bond donor/acceptor counts. This characteristic is attributed to their low lipophilicity, selective intramolecular hydrogen bonding, and resemblance to endogenous molecules, which enable active transport across biological membranes29. These distinct characteristics also allow the complex structures of natural molecules to maintain drug-like properties and bioavailability, even when traditional guidelines might suggest otherwise.
Liquid chromatography-tandem mass spectrometry revealed the presence of flavonoids and phenolic compounds in sub-fractions that exhibited enhanced Nrf2 expression activity in HaCaT cells. Notably, the LC-MS/MS spectrum of the P. amarus stem extract showed its highest-intensity peak at a retention time of 9.81 min, with the precursor at m/z 633.0721 and fragments at m/z 300.9959 and 275.0183 (Table 3), which corresponds to the mass spectrum of corilagin in the database30. Corilagin, identified as a compound with significant potential to inhibit the Keap1/Nrf2 complex, exhibited promising molecular docking results with a low binding energy (−11.7 kcal/mol). The experimental results further support its ability to activate Nrf2: corilagin gave the most intense peak in the crude methanol extract of P. amarus stems, which was the only P. amarus sample that markedly enhanced Nrf2 expression (Fig. 1).
Based on these outcomes, the next objective is to isolate and purify β-sitosterol-3-O-β-D-glucopyranoside and corilagin to assess their capacity for enhancing Nrf2 expression in vitro.
Conclusion
This study confirms the effectiveness of hybrid in vitro and in silico models in studying the Nrf2 activation of natural compounds from medicinal plant sources. The LC-MS/MS-based chemical profiles further validate this, aligning closely with the outcomes of the machine learning model, molecular docking, and molecular dynamics simulation, as well as the Nrf2 expression level in HaCaT cells. The study identifies corilagin as a compound with significant potential for further in vitro and in vivo experiments, with the aim of inducing Nrf2 activation through the inhibition of Keap1/Nrf2.
Materials and methods
Chemicals and instruments
Chemicals: Dulbecco’s Modified Eagle Medium (DMEM), fetal bovine serum, penicillin-streptomycin, and alamarBlue were purchased from Thermo Fisher Scientific, USA. Methanol, n-hexane, chloroform, ethyl acetate, n-butanol, toluene, ammonia solution, and formic acid were purchased from Xilong Scientific Co., Ltd., China. Luciferase assay substrate, luciferase assay buffer, and cell culture lysis reagent (5×) were purchased from Promega, USA. Dimethyl sulfoxide (DMSO) was acquired from Nacalai Tesque Inc., Japan.
Instruments: Biological safety cabinet (Sanxiong Technology, Taiwan), RE300 Vacuum Evaporator (Stuart, UK), analytical balance (Ohaus, USA), Synergy HT Multi-Mode Reader (Bio-Tek Instruments Inc, USA), NuAire CO₂ incubator (NuAire, USA), WFH-203B thin-layer chromatography lamp (China), and ExionLC™–X500R QTOF (Sciex, USA).
Sample collection, extraction, and preparation
The plant materials, consisting of fresh leaves of Helicteres hirsuta, roots of Piper sarmentosum, rhizomes of Zingiber zerumbet, stems of Oroxylum indicum, and stems of Phyllanthus amarus, were collected in December 2022 on Nui Cam Mountain, An Giang Province, Vietnam. All medicinal plants were kindly suggested by the Vietnamese traditional physician Nguyen Thien Chung, Head of the Oriental Traditional Medicine Association of Tinh Bien District, An Giang Province, Vietnam, drawing from traditional remedies and local folk practices. All medicinal plants grew wild and were collected by Nguyen Thien Chung and Le Nguyen Thien Han. The plants were authenticated by Associate Professor Nguyen Hoang Khue Tu, International University, Vietnam National University Ho Chi Minh City, and Le Nguyen Thien Han. The morphological characteristics of the medicinal plants are detailed in the supplementary information (Figure S1 to Figure S5).
The plant materials were washed, cut into small pieces, and dried at 50 °C. After drying, all samples were ground into powder and passed through a 1 mm sieve. The powder, with humidity less than 10%, was stored in a colored glass bottle under cool conditions. For each plant, 25 g of powder was macerated with 200 mL of methanol for three days and shaken thoroughly every 12 h. The extract was filtered through Newstar 102 filter paper and evaporated under vacuum at 50 °C and −90 kPa to obtain the methanolic crude extract. The crude extract was dispersed in distilled water and sequentially partitioned with n-hexane, chloroform, ethyl acetate, and n-butanol. The solution remaining after n-butanol extraction is referred to as the water sub-fraction. The solvent was removed using a vacuum evaporator at 50 °C and −80 kPa to obtain the sub-fraction extracts, which were then stored at 4 °C for subsequent experiments.
Cell culture
HaCaT cells carrying a luciferase reporter gene were kindly provided by Professor Chia-Hung Yen, Graduate Institute of Natural Products, College of Pharmacy, Kaohsiung Medical University31. The cells were cultured in DMEM with 10% heat-inactivated fetal bovine serum, 100 U/mL penicillin, and 100 µg/mL streptomycin. The cells were maintained in an incubator at 37 °C with 5% CO₂.
In vitro Nrf2 activity screening assay
Nrf2 regulatory activity was determined with a reporter gene assay as described by Wu et al., with minor revisions32. HaCaT cells were seeded into 96-well plates (10⁴ cells/well) and treated with extracts dissolved in DMSO (final extract concentration 100 µg/mL) for 18 h in the incubator at 37 °C and 5% CO2. After 18 h of exposure, the medium containing the extracts was removed, and cell viability was tested by adding 100 µL of fresh medium containing 10% alamarBlue reagent (1 mg/mL) and incubating for a further 4 h. The fluorescence of resazurin in reduced alamarBlue was measured at an excitation wavelength of 530–560 nm and an emission wavelength of 590 nm using a Synergy HT Multi-Mode Reader (BioTek, Winooski, VT, USA). The cells were then harvested for the luciferase activity assay following the manufacturer’s guidelines (Promega Corporation, Madison, WI, USA). Wells treated with tBHQ (10 µM) served as the positive control. Wells treated with 1% DMSO served as the negative control, which represents 100% cell viability and 100% relative Nrf2 activity. The relative Nrf2 activity was calculated as follows:
where F is the fluorescence of the reaction of luciferase protein with luciferin, and V is the number of surviving cells, determined from the fluorescence of resazurin in alamarBlue.
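The definitions of F and V, together with the 1% DMSO control set to 100%, suggest a viability-normalised ratio. The sketch below is one form consistent with those definitions, not necessarily the authors' exact expression: it assumes the activity is (F/V) of the sample divided by (F/V) of the DMSO control, times 100. The function name and the example readings are hypothetical.

```python
def relative_nrf2_activity(f_sample, v_sample, f_control, v_control):
    """Viability-normalised reporter signal as a percentage of the DMSO control.

    Assumed form (not confirmed by the source):
        (F_sample / V_sample) / (F_control / V_control) * 100
    """
    if min(v_sample, v_control, f_control) <= 0:
        raise ValueError("viability and control readings must be positive")
    return (f_sample / v_sample) / (f_control / v_control) * 100.0


# Hypothetical plate readings in arbitrary units, not data from the study.
control = relative_nrf2_activity(2000.0, 50000.0, 2000.0, 50000.0)
treated = relative_nrf2_activity(6000.0, 40000.0, 2000.0, 50000.0)
```

Under this assumed form the control evaluates to 100% by construction, and a well with a higher reporter signal but fewer surviving cells is credited for both effects.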
The cell viability and relative Nrf2 activity data were assessed and exported using Gen5 software (Version 2.04, BioTek®, USA). The data were analyzed with Microsoft Office Excel 2016, and all graphs were created using GraphPad Prism (Version 9.5.0.730, Dotmatics, USA). All experiments were performed three times, and results are expressed as mean ± standard deviation.
Data collection and preparation for machine learning
Activity data for compounds experimentally tested for Keap1/Nrf2 inhibition were retrieved from the ChEMBL public database under the IDs CHEMBL3038498 and CHEMBL2069156. The data were cleaned by deleting rows without a half-maximal inhibitory concentration (IC50) value and by dropping duplicates so that one entry was retained. The IC50 was used as the measure of compound potency. Molecular fingerprints were computed from the compound SMILES using 166-bit Molecular Access System (MACCS) keys, 1024-bit Morgan fingerprints with radii of 2 and 3 (Morgan2 and Morgan3, respectively), and path-based 1024-bit RDK5 fingerprints (RDK5)33. Compounds with IC50 ≤ 10 µM were defined as “active” samples, and those with higher values were defined as “inactive” samples. In total, 613 input vectors were used to build the screening model, with 363 compounds in the “active” group and 250 compounds in the “inactive” group. This dataset was used to optimize the machine learning (ML) models described in the following section. Details of the individual compounds are given in Supplementary Material 2.
The test set of natural compounds from the medicinal plants in this study was compiled from chemical composition data reported in the published literature. The chemical structures in SMILES format were screened with the ML models to predict their ability to inhibit Keap1/Nrf2. The collected compounds are listed in Table S.3.
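The cleaning and labelling rules above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' pipeline: duplicates are assumed to be identified by identical SMILES strings, the first occurrence is the one retained, and the field names and example rows are hypothetical.

```python
ACTIVE_THRESHOLD_UM = 10.0  # IC50 cut-off stated in the text (µM)


def clean_and_label(records):
    """Drop rows without IC50, keep the first entry per SMILES, add a label."""
    seen = set()
    cleaned = []
    for rec in records:
        ic50 = rec.get("ic50_um")
        smiles = rec.get("smiles")
        if ic50 is None or not smiles:
            continue  # null activity value or missing structure
        if smiles in seen:
            continue  # duplicate: only the first entry is retained (assumption)
        seen.add(smiles)
        label = "active" if ic50 <= ACTIVE_THRESHOLD_UM else "inactive"
        cleaned.append({"smiles": smiles, "ic50_um": ic50, "label": label})
    return cleaned


# Hypothetical rows, not ChEMBL data.
rows = [
    {"smiles": "CCO", "ic50_um": 2.5},
    {"smiles": "CCO", "ic50_um": 40.0},        # duplicate structure, dropped
    {"smiles": "c1ccccc1", "ic50_um": None},   # no IC50, dropped
    {"smiles": "CC(=O)O", "ic50_um": 10.0},    # boundary value counts as active
    {"smiles": "CCN", "ic50_um": 55.0},
]
labelled = clean_and_label(rows)
```

Because the threshold is inclusive (IC50 ≤ 10 µM), a compound at exactly 10 µM falls in the “active” class.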
ML model optimization and testing
To build a predictive model of the compounds’ Keap1 inhibitory activity, three types of models were used: support vector machine (SVM), random forest (RF), and multilayer perceptron (MLP). The models were built in Python on the Google Colaboratory platform. Hyperparameters were tuned by grid search with 5-fold cross-validation using the scikit-learn package. The data retrieved from the ChEMBL public database were used for this tuning process, with a training-to-test ratio of 4:1. The grid search retained the parameters that gave the best cross-validation accuracy. The model with the best accuracy was then applied to the dataset of natural compounds from medicinal plants to predict potential “active” compounds. Performance was reported as accuracy, precision, recall, F1 score, and AUC.
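For reference, four of the reported metrics follow directly from the confusion counts. The sketch below uses the standard definitions with “active” as the positive class; it is a plain-Python illustration rather than the scikit-learn code used in the study, and the label vectors are hypothetical. AUC is omitted because it requires predicted scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred, positive="active"):
    """Accuracy, precision, recall and F1 from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels, not results from the study.
y_true = ["active", "active", "active", "inactive", "inactive"]
y_pred = ["active", "active", "inactive", "active", "inactive"]
metrics = classification_metrics(y_true, y_pred)
```

With a class balance like the 363:250 split described earlier, accuracy alone can flatter a model that favours the majority class, which is one reason to read precision and recall alongside it.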
Molecular docking
After compounds with the potential to inhibit the Keap1/Nrf2 complex had been identified with the optimized ML model, molecular docking was conducted to analyze the chemical interactions between the Keap1/Nrf2 receptor binding site and the filtered ligands. The 3D structure of the Keap1/Nrf2 complex was obtained from the RCSB Protein Data Bank (PDB ID: 2FLU). The SMILES representations of the natural compounds were converted to 3D structures. All protein and compound structures were prepared by removing water molecules, adding missing hydrogen atoms, and assigning charges using the Dock Prep tool of UCSF Chimera version 1.17.3. The docking box measured 41 Å × 41 Å × 41 Å and was centered at x = −26, y = 0, and z = −20. Docking was performed with AutoDock Vina 1.1.2 to simulate the ligand–receptor binding poses and to calculate the docking score (binding energy) of each ligand–receptor complex34. Finally, Biovia Discovery Studio 22.1 was used to visualize the binding modes and explore the types of interaction between the receptor and the ligands.
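The search box described above maps onto the key–value configuration file that AutoDock Vina reads. The sketch below writes such a configuration as text using the box centre and size from this section; the file names are hypothetical placeholders, and the PDBQT conversion that Vina requires for its inputs is assumed to have been done separately.

```python
def vina_config(receptor, ligand, out,
                center=(-26.0, 0.0, -20.0), size=(41.0, 41.0, 41.0)):
    """Return the text of an AutoDock Vina configuration file.

    The default centre (Å) and box size (Å) are the values given in the text.
    """
    entries = [("receptor", receptor), ("ligand", ligand), ("out", out)]
    entries += [(f"center_{axis}", c) for axis, c in zip("xyz", center)]
    entries += [(f"size_{axis}", s) for axis, s in zip("xyz", size)]
    return "\n".join(f"{key} = {value}" for key, value in entries) + "\n"


# Hypothetical file names; the study does not list them.
cfg = vina_config("keap1_2flu.pdbqt", "ligand.pdbqt", "ligand_out.pdbqt")
```

The resulting text could be saved to a file and passed to Vina through its `--config` option.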
Molecular dynamics (MD) simulations
The three ligands with the best docking scores, each in complex with Keap1/Nrf2, together with the original Keap1/Nrf2 structure, were subjected to MD simulations using GROMACS 2024.1 (ref. 35). Topology files for Keap1/Nrf2 and its ligand complexes were prepared using the CHARMM36 force field and the GROMACS-recommended TIP3P water model35. The CGenFF web server was used to prepare the “.str” file for each ligand topology, after which the topology files of the complexes were updated with Python scripts to include the ligand topology36. Each protein–ligand complex was placed in a dodecahedron box with a minimum distance of 1 nm between the solute and the box wall. All systems were ionized, neutralized, and then solvated using the simple point charge-216 explicit water model (spc216.gro). Energy minimization was run for a maximum of 500 ps (picoseconds) using the steepest descent algorithm until the systems reached a stable conformation, and the systems were gradually heated from 0 to 300 K. Following energy minimization, each system was equilibrated under position restraints in two stages with a time step of 2 fs to reach a stable thermodynamic state. The system was first equilibrated in an NVT ensemble for 1 ns using the modified Berendsen thermostat at 300 K, followed by a 1 ns equilibration in an NPT ensemble using the Parrinello-Rahman barostat at 1 bar. Long-range electrostatic forces were treated with the particle mesh Ewald (PME) method. A 1 nm cutoff was applied to short-range electrostatic and van der Waals interactions. Bonds involving hydrogen atoms were constrained in both the equilibration and production runs using the LINCS algorithm. Finally, a 50 ns production run was conducted for each system, with a snapshot saved every 10 ps to record the conformational trajectory.
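The run lengths above translate into integer step counts in a GROMACS parameter file. The sketch below does that arithmetic for the equilibration and production stages. It assumes that the 2 fs time step stated for equilibration also applies to the production run and that the starting frame is written to the trajectory; `dt`, `nsteps`, and `nstxout-compressed` are GROMACS .mdp option names, while the function itself is hypothetical.

```python
def md_steps(duration_ns, dt_fs, save_every_ps):
    """Convert run length and output interval into GROMACS-style step counts."""
    nsteps = round(duration_ns * 1_000_000 / dt_fs)      # ns -> fs -> steps
    save_interval = round(save_every_ps * 1_000 / dt_fs)  # ps -> fs -> steps
    return {
        "dt": dt_fs / 1000.0,                  # .mdp expects picoseconds
        "nsteps": nsteps,
        "nstxout-compressed": save_interval,
        "frames": nsteps // save_interval + 1,  # +1 assumes frame 0 is saved
    }


equilibration = md_steps(duration_ns=1, dt_fs=2, save_every_ps=10)
production = md_steps(duration_ns=50, dt_fs=2, save_every_ps=10)
```

Under these assumptions, each 50 ns production run corresponds to 25,000,000 steps and about 5,000 saved snapshots per system.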
In silico ADME and toxicity analysis
Compounds with high-ranking binding energy scores were subjected to in silico ADME prediction using the SwissADME web tool (accessible at www.swissadme.ch)37. The SMILES representations of the compounds were used to evaluate ADME properties, including molecular weight, number of hydrogen bond acceptors and donors, topological polar surface area (TPSA), lipophilicity (log P), water solubility (log S), blood–brain barrier permeability, skin permeability (log Kp), pharmacokinetics, and drug- and lead-likeness. The in silico toxicity of the compounds was assessed with the ProTox 3.0 web tool (accessible at https://tox.charite.de/protox3/)38 and was evaluated by the predicted toxicity class and median lethal dose (LD50).
LC-QTOF-MS/MS conditions
LC-QTOF-MS/MS was employed to establish the chemical profiles of the Z. zerumbet, O. indicum, and P. amarus crude extracts. The analysis system was an Exion LC™ - X500R QTOF (Sciex, USA). Liquid chromatography was carried out on a Hypersil GOLD column (150 × 2.1 mm, 3 μm; Thermo Scientific, USA) maintained at 25 °C. The mobile phase consisted of 0.1% formic acid in water (A) and 0.1% formic acid in acetonitrile (B), with the following gradient: 2% B at 1 min, 98% B at 20 min, and 98% B at 25 min, at a flow rate of 0.4 mL/min. The injection volume was 2 µL.
Mass spectrometry was performed in negative polarity mode. The source parameters were as follows: the capillary voltage was −4500 V and the collision gas pressure was 7 psi. The TOF start and stop masses were 100 and 2000 Da for TOFMS and 50 and 2000 Da for TOFMS/MS. The declustering potential was −80 V, and the declustering potential spread was 0 V. The collision energy was −20 V for TOFMS and −35 V for TOFMS/MS, with a collision energy spread of 10 V. The mass spectral data of the compounds were compared against PubChem, Mass Bank of Europe, and the literature. The identified compounds were fragmented in ChemDraw (version 22.0.0), and the calculated mass spectrum of the parent ion was compared with the observed mass spectrum of the parent ion.
Data availability
Data is provided within the manuscript or supplementary information files.
Change history
18 March 2025
A Correction to this paper has been published: https://doi.org/10.1038/s41598-025-92487-7
References
Ngo, V. & Duennwald, M. L. Nrf2 and oxidative stress: a general overview of mechanisms and implications in human disease. Antioxidants 11, 2345 (2022). https://doi.org/10.3390/antiox11122345
Anik, M. I. et al. Role of reactive oxygen species in aging and age-related diseases: a review. ACS Appl. Biomater. 5, 4028–4054 (2022). https://doi.org/10.1021/acsabm.2c00411
Zinovkin, R. A., Kondratenko, N. D. & Zinovkina, L. A. Does Nrf2 play a role of a master regulator of mammalian aging? Biochemistry (Moscow) 87, 1465–1476 (2022). https://doi.org/10.1134/S0006297922120045
Glorieux, C., Enríquez, C., González, C., Aguirre-Martínez, G. & Buc Calderon, P. The multifaceted roles of NRF2 in cancer: friend or foe? Antioxidants 13, 70 (2024). https://doi.org/10.3390/antiox13010070
He, F., Ru, X. & Wen, T. NRF2, a transcription factor for stress response and beyond. Int. J. Mol. Sci. 21, 4777 (2020). https://doi.org/10.3390/ijms21134777
Tian, C., Gao, L. & Zucker, I. H. Regulation of Nrf2 signaling pathway in heart failure: Role of extracellular vesicles and non-coding RNAs. Free Radic. Biol. Med. 167, 218–231 (2021). https://doi.org/10.1016/j.freeradbiomed.2021.03.013
Canning, P., Sorrell, F. J. & Bullock, A. N. Structural basis of Keap1 interactions with Nrf2. Free Radic. Biol. Med. 88, 101–107 (2015). https://doi.org/10.1016/j.freeradbiomed.2015.05.034
Saha, S., Buttari, B., Panieri, E., Profumo, E. & Saso, L. An overview of Nrf2 signaling pathway and its role in inflammation. Molecules 25, 5474 (2020). https://doi.org/10.3390/molecules25225474
McMahon, M., Thomas, N., Itoh, K., Yamamoto, M. & Hayes, J. D. Dimerization of substrate adaptors can facilitate cullin-mediated ubiquitylation of proteins by a tethering mechanism. J. Biol. Chem. 281, 24756–24768 (2006). https://doi.org/10.1074/jbc.M601119200
Gorrini, C., Harris, I. S. & Mak, T. W. Modulation of oxidative stress as an anticancer strategy. Nat. Rev. Drug Discov. 12, 931–947 (2013). https://doi.org/10.1038/nrd4002
Wang, J. et al. NRF2-related epigenetic modifications in cardiac and vascular complications of diabetes mellitus. Front. Endocrinol. (Lausanne) 12, 598005 (2021). https://doi.org/10.3389/fendo.2021.598005
Liu, Y. et al. Nrf2 deficiency deteriorates diabetic kidney disease in Akita model mice. Redox Biol. 58, 102525 (2022). https://doi.org/10.1016/j.redox.2022.102525
Matzinger, M., Fischhuber, K. & Heiss, E. H. Activation of Nrf2 signaling by natural products-can it alleviate diabetes? Biotechnol. Adv. 36, 1738–1767 (2017). https://doi.org/10.1016/j.biotechadv.2017.12.015
Xu, Z. et al. Broccoli sprout extract prevents diabetic cardiomyopathy via Nrf2 activation in db/db T2DM mice. Sci. Rep. 6, 30252 (2016). https://doi.org/10.1038/srep30252
Parvatikar, P. P. et al. Artificial intelligence: machine learning approach for screening large database and drug discovery. Antivir. Res. 220, 105740 (2023). https://doi.org/10.1016/j.antiviral.2023.105740
Dai, W., Li, L. & Guo, D. Integrating bioassay data for improved prediction of drug-target interaction. Biophys. Chem. 266, 106455 (2020). https://doi.org/10.1016/j.bpc.2020.106455
Moshkov, N. et al. Predicting compound activity from phenotypic profiles and chemical structures. Nat. Commun. 14, 1967 (2023). https://doi.org/10.1038/s41467-023-37570-1
Li, M. et al. Discovery of Keap1-Nrf2 small-molecule inhibitors from phytochemicals based on molecular docking. Food Chem. Toxicol. 133, 110758 (2019). https://doi.org/10.1016/j.fct.2019.110758
Shimizu, Y. et al. Identification of novel inhibitors of Keap1/Nrf2 by a promising method combining protein–protein interaction-oriented library and machine learning. Sci. Rep. 11, 7420 (2021). https://doi.org/10.1038/s41598-021-86616-1
Gęgotek, A. & Skrzydlewska, E. The role of transcription factor Nrf2 in skin cells metabolism. Arch. Dermatol. Res. 307, 385–396 (2015). https://doi.org/10.1007/s00403-015-1554-2
Guan, T., Bian, C. & Ma, Z. In vitro and in silico perspectives on the activation of antioxidant responsive element by citrus-derived flavonoids. Front. Nutr. 10, 1257172 (2023). https://doi.org/10.3389/fnut.2023.1257172
Abdul Hamid, Z. et al. Nephroprotective effects of Zingiber zerumbet Smith ethyl acetate extract against Paracetamol-induced nephrotoxicity and oxidative stress in rats. J. Zhejiang Univ. Sci. B. 13, 176–185 (2012). https://doi.org/10.1631/jzus.B1100133
Hamid, A. et al. Zingiber zerumbet L. (Smith) extract alleviates the ethanol-induced brain damage via its antioxidant activity. BMC Complement Altern. Med. 18, 101 (2018). https://doi.org/10.1186/s12906-018-2161-5
Jagetia, G. C. A review on the medicinal and pharmacological properties of traditional ethnomedicinal plant Sonapatha, Oroxylum indicum. Sinusitis 5, 71–89 (2021). https://doi.org/10.3390/sinusitis5010009
Shi, L. et al. Baicalein and baicalin alleviate acetaminophen-induced liver injury by activating Nrf2 antioxidative pathway: the involvement of ERK1/2 and PKC. Biochem. Pharmacol. 150, 9–23 (2018). https://doi.org/10.1016/j.bcp.2018.01.026
Huang, C. S. et al. Protection by chrysin, apigenin, and luteolin against oxidative stress is mediated by the Nrf2-dependent up-regulation of heme oxygenase 1 and glutamate cysteine ligase in rat primary hepatocytes. Arch. Toxicol. 87, 167–178 (2013). https://doi.org/10.1007/s00204-012-0913-4
Roskoski, R. Properties of FDA-approved small molecule protein kinase inhibitors. Pharmacol. Res. 144, 19–50 (2019). https://doi.org/10.1016/j.phrs.2019.03.006
Newman, D. J. & Cragg, G. M. Natural products as sources of new drugs over the nearly four decades from 01/1981 to 09/2019. J. Nat. Prod. 83, 770–803 (2020). https://doi.org/10.1021/acs.jnatprod.9b01285
Ganesan, A. The impact of natural products upon modern drug discovery. Curr. Opin. Chem. Biol. 12, 306–317 (2008). https://doi.org/10.1016/j.cbpa.2008.03.016
National Center for Biotechnology Information. PubChem Compound Summary for CID 73568, Corilagin (2024). https://pubchem.ncbi.nlm.nih.gov/compound/Corilagin
Chen, Y. S. et al. Identification of Beilschmiedia tsangii root extract as a liver cancer cell-normal keratinocyte dual-selective NRF2 regulator. Antioxidants 10, 544 (2021). https://doi.org/10.3390/antiox10040544
Wu, H. C. et al. Chemical constituents with GNMT-promoter-enhancing and NRF2-reduction activities from Taiwan agarwood Excoecaria formosana. Molecules 25, 1746 (2020). https://doi.org/10.3390/molecules25071746
O’Boyle, N. M. & Sayle, R. A. Comparing structural fingerprints using a literature-based similarity benchmark. J. Cheminf. 8, 36 (2016). https://doi.org/10.1186/s13321-016-0148-0
Trott, O. & Olson, A. J. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 31, 455–461 (2010). https://doi.org/10.1002/jcc.21334
Best, R. B. et al. Optimization of the additive CHARMM all-atom protein force field targeting improved sampling of the backbone ϕ, ψ and side-chain χ1 and χ2 dihedral angles. J. Chem. Theory Comput. 8, 3257–3273 (2012). https://doi.org/10.1021/ct300400x
Soteras Gutiérrez, I. et al. Parametrization of halogen bonds in the CHARMM general force field: improved treatment of ligand–protein interactions. Bioorg. Med. Chem. 24, 4812–4825 (2016). https://doi.org/10.1016/j.bmc.2016.06.034
Daina, A., Michielin, O. & Zoete, V. SwissADME: a free web tool to evaluate pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small molecules. Sci. Rep. 7, 42717 (2017). https://doi.org/10.1038/srep42717
Banerjee, P., Kemmler, E., Dunkel, M. & Preissner, R. ProTox 3.0: a webserver for the prediction of toxicity of chemicals. Nucleic Acids Res. 52, W513–W520 (2024). https://doi.org/10.1093/nar/gkae303
Acknowledgements
This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCM) under grant number B2023-44-01.
Author information
Contributions
N.T.H.L., C.-H.Y., and T.Y.N.N. conducted the in vitro experiments. N.T.H.L. and B.Q.H.N. performed the in silico screening. H.K.T.N. identified the plant species. M.H.N. performed LC-MS/MS and data analysis. M.H.N. designed all experiments, supervised, and managed the project. M.H.N., N.T.H.L., and B.Q.H.N. prepared Figures 1–3. M.H.N., N.T.H.L., and B.Q.H.N. wrote the main text. All authors reviewed and agreed to the final version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this Article was revised: The original version of this Article contained an error in Affiliation 1, which was incorrectly given as ' Institute for Research and Executive Education, The University of Danang-VN-UK, 41 Le Duan Street, Hai Chau 1 ward, Hai Chau District, Danang city, 50000, Vietnam'. Full information regarding the corrections made can be found in the correction for this Article.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Nguyen, B.Q.H., Le, N.T.H., Nguyen, T.Y.N. et al. Hybridisation of in silico and in vitro bioassays for studying the activation of Nrf2 by natural compounds. Sci Rep 14, 31222 (2024). https://doi.org/10.1038/s41598-024-82559-5
DOI: https://doi.org/10.1038/s41598-024-82559-5