Abstract
Deciphering the relationships between genes and complex traits can enhance our understanding of phenotypic variations and disease mechanisms. However, determining the specific roles of individual genes and quantifying their direct and indirect causal effects on complex traits remains a significant challenge. Here we present a framework, Bayesian network genome-wide association studies (BN-GWAS), to decipher the total and direct causal effects of individual genes. BN-GWAS leverages imputed expression profiles from GWAS and raw expression data from a reference dataset to construct a directed gene–gene–phenotype causal network. It allows gene expression and disease traits to be evaluated in different samples, which improves the flexibility and applicability of the approach. The framework can be extended to decipher the joint causal network of two or more traits, and it exhibits high specificity and precision (positive predictive value), making it particularly useful for selecting genes for follow-up studies. We verified the feasibility and validity of BN-GWAS through extensive simulations and applications to 52 traits across 14 tissues in the UK Biobank, revealing insights into their genetic architectures, including the relative contributions of direct, indirect and mediating causal genes. The identified (direct) causal genes were significantly enriched for genes highlighted in the Open Targets database. Overall, BN-GWAS provides a flexible and powerful framework for elucidating the genetic basis of complex traits through a systems-level, causal inference approach.
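To make the distinction between total and direct causal effects concrete, the following is a minimal illustrative sketch and not the BN-GWAS implementation. It assumes a hypothetical three-node linear model in which gene A affects the trait both directly and through gene B, with invented effect sizes (0.8, 0.5 and 0.6), a known graph and no confounding. Under those assumptions, regressing the trait on A alone recovers the total effect, and additionally adjusting for the mediator B recovers the direct effect.

```python
import numpy as np

# Hypothetical structural coefficients (assumptions for illustration only).
A_TO_B = 0.8      # gene A -> gene B
A_TO_TRAIT = 0.5  # gene A -> trait (direct effect)
B_TO_TRAIT = 0.6  # gene B -> trait


def simulate(n=20000, seed=0):
    """Draw samples from the toy linear model A -> B -> trait, A -> trait."""
    rng = np.random.default_rng(seed)
    gene_a = rng.normal(size=n)
    gene_b = A_TO_B * gene_a + rng.normal(size=n)
    trait = A_TO_TRAIT * gene_a + B_TO_TRAIT * gene_b + rng.normal(size=n)
    return gene_a, gene_b, trait


def ols_slopes(y, *predictors):
    """Ordinary least squares; returns slopes for the predictors (no intercept)."""
    design = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]


gene_a, gene_b, trait = simulate()

# Total effect of A: A has no parents in this toy graph, so no adjustment is needed.
total_a = ols_slopes(trait, gene_a)[0]
# Direct effect of A: adjust for the mediator B.
direct_a = ols_slopes(trait, gene_a, gene_b)[0]
# Indirect (mediated) effect is the difference in this linear setting.
indirect_a = total_a - direct_a

print(f"total={total_a:.3f} direct={direct_a:.3f} indirect={indirect_a:.3f}")
```

In this toy model the true total effect is 0.5 + 0.8 × 0.6 = 0.98, of which 0.5 is direct and 0.48 is mediated by gene B. In practice the graph is unknown and must itself be learned from data, which is the harder problem the framework addresses.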
Data availability
UKBB data are available to any researcher who formally applies for the data. However, the data are not publicly available due to privacy concerns. GTEx RNA-seq data are publicly available via the GTEx portal at https://www.gtexportal.org/home/datasets. We used GTEx V7 RNA-seq data for our analysis. The Open Targets database is freely available via the Open Targets Platform at https://www.opentargets.org. Additional examples of the inferred causal graphs are available via GitHub at https://github.com/LiangyingYin/BayesianNetwork/tree/main/Causalgraphs.
Code availability
The source code and R package for reproducing the experiments in this work are available via GitHub at https://github.com/LiangyingYin/BN-GWAS-Simulation and https://github.com/LiangyingYin/BN-GWAS under the GPL-3 license and via Zenodo at https://doi.org/10.5281/zenodo.10065706 (ref. 67) and https://doi.org/10.5281/zenodo.10068075 (ref. 68).
References
Visscher, P. M. et al. 10 years of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet. 101, 5–22 (2017).
Ripke, S. et al. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
Kiezun, A. et al. Exome sequencing and the genetic basis of complex traits. Nat. Genet. 44, 623–630 (2012).
Zhu, X., Duren, Z. & Wong, W. H. Modeling regulatory network topology improves genome-wide analyses of complex human traits. Nat. Commun. 12, 2851 (2021).
Burgess, S., Daniel, R. M., Butterworth, A. S., Thompson, S. G. & EPIC-InterAct Consortium. Network Mendelian randomization: using genetic variants as instrumental variables to investigate mediation in causal pathways. Int. J. Epidemiol. 44, 484–495 (2015).
Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186 (2017).
Wray, N. R., Wijmenga, C., Sullivan, P. F., Yang, J. & Visscher, P. M. Common disease is more complex than implied by the core gene omnigenic model. Cell 173, 1573–1580 (2018).
Boyle, E. A., Li, Y. I. & Pritchard, J. K. The omnigenic model: response from the authors. J. Psychiatry Brain Sci. 2, S8 (2017).
Liu, X., Li, Y. I. & Pritchard, J. K. Trans effects on gene expression can drive omnigenic inheritance. Cell 177, 1022–1034.e6 (2019).
Liao, Z., Wang, Y., Qi, X. & Xiao, X. JAZF1, a relevant metabolic regulator in type 2 diabetes. Diabetes Metab. Res. Rev. 35, e3148 (2019).
Zhang, H. Lysosomal acid lipase and lipid metabolism: new mechanisms, new questions, and new therapies. Curr. Opin. Lipidol. 29, 218–223 (2018).
Evans, T. D. et al. Functional characterization of LIPA (lysosomal acid lipase) variants associated with coronary artery disease. Arterioscler. Thromb. Vasc. Biol. 39, 2480–2491 (2019).
Yi, X., Ming, B., Wang, C., Chen, H. & Ma, C. Variants in COX-2, PTGIS, and TBXAS1 are associated with carotid artery or intracranial arterial stenosis and neurologic deterioration in ischemic stroke patients. J. Stroke Cerebrovasc. Dis. 26, 1128–1135 (2017).
Davì, G. & Patrono, C. Platelet activation and atherothrombosis. N. Engl. J. Med. 357, 2482–2494 (2007).
Zou, H., Chen, H., Zhou, Z., Wan, Y. & Liu, Z. ATXN3 promotes breast cancer metastasis by deubiquitinating KLF4. Cancer Lett. 467, 19–28 (2019).
Sattar, N., McInnes, I. B. & McMurray, J. J. Obesity is a risk factor for severe COVID-19 infection: multiple potential mechanisms. Circulation 142, 4–6 (2020).
White, J. et al. Association of lipid fractions with risks for coronary artery disease and diabetes. JAMA Cardiol. 1, 692–699 (2016).
Riaz, H. et al. Association between obesity and cardiovascular outcomes: a systematic review and meta-analysis of Mendelian randomization studies. JAMA Netw. Open 1, e183788 (2018).
Yeung, S. L. A., Luo, S. & Schooling, C. M. The impact of glycated hemoglobin (HbA1c) on cardiovascular disease risk: a Mendelian randomization study using UK Biobank. Diabetes Care 41, 1991–1997 (2018).
Pai, J. K. et al. Hemoglobin A1c is associated with increased risk of incident coronary heart disease among apparently healthy, nondiabetic men and women. J. Am. Heart Assoc. 2, e000077 (2013).
Weverling-Rijnsburger, A. W., Jonkers, I. J., Van Exel, E., Gussekloo, J. & Westendorp, R. G. High-density vs low-density lipoprotein cholesterol as the risk factor for coronary artery disease and stroke in old age. Arch. Intern. Med. 163, 1549–1554 (2003).
Nikpay, M. & McPherson, R. Convergence of biomarkers and risk factor trait loci of coronary artery disease at 3p21.31 and HLA region. npj Genom. Med. 6, 12 (2021).
Tontonoz, P. & Mangelsdorf, D. J. Liver X receptor signaling pathways in cardiovascular disease. Mol. Endocrinol. 17, 985–993 (2003).
Lee, S. D. & Tontonoz, P. Liver X receptors at the intersection of lipid metabolism and atherogenesis. Atherosclerosis 242, 29–36 (2015).
Calkin, A. C. & Tontonoz, P. Liver X receptor signaling pathways and atherosclerosis. Arterioscler. Thromb. Vasc. Biol. 30, 1513–1518 (2010).
Cannon, M. V., van Gilst, W. H. & de Boer, R. A. Emerging role of liver X receptors in cardiac pathophysiology and heart failure. Basic Res. Cardiol. 111, 3 (2016).
Tian, J. et al. Dasatinib sensitises triple negative breast cancer cells to chemotherapy by targeting breast cancer stem cells. Br. J. Cancer 119, 1495–1507 (2018).
Xu, J., Shi, P., Li, H. & Zhou, J. Broad spectrum antiviral agent niclosamide and its therapeutic potential. ACS Infect. Dis. 6, 909–915 (2020).
Braga, L. et al. Drugs that inhibit TMEM16 proteins block SARS-CoV-2 spike-induced syncytia. Nature 594, 88–93 (2021).
Kunzelmann, K. Getting hands on a drug for Covid-19: inhaled and intranasal niclosamide. Lancet Reg. Health Eur. 4, 100094 (2021).
US National Library of Medicine. ClinicalTrials.gov, https://clinicaltrials.gov/ct2/show/NCT04399356
Zhu, Z. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 48, 481–487 (2016).
Gusev, A. et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 48, 245–252 (2016).
Lin, Z., Xue, H. & Pan, W. Combining Mendelian randomization and network deconvolution for inference of causal networks with GWAS summary data. PLoS Genet. 19, e1010762 (2023).
Sanderson, E., Davey Smith, G., Windmeijer, F. & Bowden, J. An examination of multivariable Mendelian randomization in the single-sample and two-sample summary data settings. Int. J. Epidemiol. 48, 713–727 (2019).
Altay, G. & Emmert-Streib, F. Inferring the conservative causal core of gene regulatory networks. BMC Syst. Biol. 4, 132 (2010).
Karlebach, G. & Shamir, R. Modelling and analysis of gene regulatory networks. Nat. Rev. Mol. Cell Biol. 9, 770–780 (2008).
Maathuis, M. H., Colombo, D., Kalisch, M. & Bühlmann, P. Predicting causal effects in large-scale systems from observational data. Nat. Methods 7, 247–248 (2010).
Võsa, U. et al. Unraveling the polygenic architecture of complex traits using blood eQTL metaanalysis. Preprint at bioRxiv https://doi.org/10.1101/447367 (2018).
Gamazon, E. R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 1091 (2015).
Bühlmann, P., Kalisch, M. & Maathuis, M. H. Variable selection in high-dimensional linear models: partially faithful distributions and the PC-simple algorithm. Biometrika 97, 261–278 (2010).
Spirtes, P. & Glymour, C. An algorithm for fast recovery of sparse causal graphs. Soc. Sci. Comput. Rev. 9, 62–72 (1991).
Kalisch, M. & Bühlmann, P. Estimating high-dimensional directed acyclic graphs with the PC-algorithm. J. Mach. Learn. Res. 8, 613–636 (2007).
Strobl, E. V. A constraint-based algorithm for causal discovery with cycles, latent variables and selection bias. Int. J. Data Sci. Anal. 8, 33–56 (2019).
Park, K., Waldorp, L. J. & Ryan, O. Discovering cyclic causal models in psychological research. advances.in/psychology 2, e72425 (2024).
Friedman, J., Hastie, T. & Tibshirani, R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432–441 (2008).
Witten, D. M., Friedman, J. H. & Simon, N. New insights and faster computations for the graphical lasso. J. Comput. Graphical Stat. 20, 892–900 (2011).
Meinshausen, N. & Bühlmann, P. High-dimensional graphs and variable selection with the lasso. Ann. Stat. 34, 1436–1462 (2006).
Pearl, J. Causality (Cambridge Univ. Press, 2009).
Meek, C. Causal inference and causal explanation with background knowledge. In Proc. Eleventh Conference on Uncertainty in Artificial Intelligence 403–410 (AUAI Press, 1995).
Peters, J., Mooij, J. M., Janzing, D. & Schölkopf, B. Causal discovery with continuous additive noise models. J. Mach. Learn. Res. 15, 2009–2053 (2014).
Kalisch, M., Mächler, M., Colombo, D., Maathuis, M. H. & Bühlmann, P. Causal inference using graphical models with the R package pcalg. J. Stat. Softw. 47, 1–26 (2012).
Maathuis, M. H., Kalisch, M. & Bühlmann, P. Estimating high-dimensional intervention effects from observational data. Ann. Stat. 37, 3133–3164 (2009).
Perković, E., Kalisch, M. & Maathuis, M. H. Interpreting and using CPDAGs with background knowledge. In Proc. 2017 Conference on Uncertainty in Artificial Intelligence 120 (AUAI Press, 2017).
Nandy, P., Maathuis, M. H. & Richardson, T. S. Estimating the effect of joint interventions from observational data in sparse high-dimensional settings. Ann. Stat. 45, 647–674 (2017).
Lu, J. et al. Causal network inference from gene transcriptional time-series response to glucocorticoids. PLoS Comput. Biol. 17, e1008223 (2021).
Sanchez-Castillo, M., Blanco, D., Tienda-Luna, I. M., Carrion, M. C. & Huang, Y. A Bayesian framework for the inference of gene regulatory networks from time and pseudo-time series data. Bioinformatics 34, 964–970 (2018).
Hastie, T. & Qian, J. Glmnet vignette. Retrieved June 9, 1–30 (2014).
Meinshausen, N. & Bühlmann, P. Stability selection. J. R. Stat. Soc. B 72, 417–473 (2010).
Carvalho-Silva, D. et al. Open Targets Platform: new developments and updates two years on. Nucleic Acids Res. 47, D1056–D1065 (2019).
Koscielny, G. et al. Open Targets: a platform for therapeutic target identification and validation. Nucleic Acids Res. 45, D985–D994 (2017).
Wang, T. et al. OncoVar: an integrated database and analysis platform for oncogenic driver variants in cancers. Nucleic Acids Res. 49, D1289–D1301 (2021).
Kamburov, A., Stelzl, U., Lehrach, H. & Herwig, R. The ConsensusPathDB interaction database: 2013 update. Nucleic Acids Res. 41, D793–D800 (2013).
Chen, E. Y. et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinf. 14, 128 (2013).
Kuleshov, M. V. et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 44, W90–W97 (2016).
Licata, L. et al. SIGNOR 2.0, the SIGnaling Network Open Resource 2.0: 2019 update. Nucleic Acids Res. 48, D504–D510 (2020).
Yin, L. et al. Estimation of causal effects of genes on complex traits using a Bayesian network-based framework based on GWAS data. Zenodo https://doi.org/10.5281/zenodo.10065706 (2023).
Yin, L. LiangyingYin/BN-GWAS-Simulation: v1.0.0 (v1.0.0). Zenodo https://doi.org/10.5281/zenodo.10068075 (2023).
Acknowledgements
This work was partially supported by a Theme-based Research Grant under grant no. T44-410/21-N from the Research Grants Council (H.-C.S.), a National Natural Science Foundation of China (NSFC) grant under grant no. 81971706 (H.-C.S.), an NSFC Young Scientist grant under grant no. 31900495 (H.-C.S.), the Lo Kwee Seong Biomedical Research Fund from The Chinese University of Hong Kong and the KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research of Common Diseases, Kunming Institute of Zoology and The Chinese University of Hong Kong, China (H.-C.S.). We also thank S. Tsui and C. Cao for useful discussions.
Author information
Contributions
L.Y. and Y.F. designed and implemented the investigations, contributed to the methodology development, analysed the data and wrote the paper. A.L. and J.Q. performed the data analyses. Y.S. implemented the R package BN-GWAS and provided relevant suggestions. P.-C.S. provided suggestions on the methodology and analyses, and interpretation of the results. H.-C.S. conceived and supervised the study, contributed to methodology development and interpretation of results, and wrote the paper. All authors reviewed, edited and approved the final paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks the anonymous reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–13, Tables 1, 4–6, 14–16, 22 and 23, captions for Supplementary Tables 2, 3, 7–13, 17–21 and 24 and text.
Supplementary Tables
Supplementary Tables 2, 3, 7–13, 17–21, 23 and 24.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yin, L., Feng, Y., Shi, Y. et al. Estimation of causal effects of genes on complex traits using a Bayesian-network-based framework applied to GWAS data. Nat Mach Intell 6, 1231–1244 (2024). https://doi.org/10.1038/s42256-024-00906-7
DOI: https://doi.org/10.1038/s42256-024-00906-7