  • Article

Estimation of causal effects of genes on complex traits using a Bayesian-network-based framework applied to GWAS data

A preprint version of the article is available at medRxiv.

Abstract

Deciphering the relationships between genes and complex traits can enhance our understanding of phenotypic variations and disease mechanisms. However, determining the specific roles of individual genes and quantifying their direct and indirect causal effects on complex traits remains a significant challenge. Here we present a framework (called Bayesian network genome-wide association studies (BN-GWAS)) to decipher the total and direct causal effects of individual genes. BN-GWAS leverages imputed expression profiles from GWAS and raw expression data from a reference dataset to construct a directed gene–gene–phenotype causal network. It allows gene expression and disease traits to be evaluated in different samples, significantly improving the flexibility and applicability of the approach. It can be extended to decipher the joint causal network of two or more traits, and exhibits high specificity and precision (positive predictive value), making it particularly useful for selecting genes for follow-up studies. We verified the feasibility and validity of BN-GWAS by extensive simulations and applications to 52 traits across 14 tissues in the UK Biobank, revealing insights into their genetic architectures, including the relative contributions of direct, indirect and mediating causal genes. The identified (direct) causal genes were significantly enriched for genes highlighted in the Open Targets database. Overall, BN-GWAS provides a flexible and powerful framework for elucidating the genetic basis of complex traits through a systems-level, causal inference approach.
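To make the distinction between total and direct causal effects concrete, the following Python sketch simulates a small linear gene–gene–trait system with a known DAG and recovers both effects by least-squares regression. It is a simplified illustration under stated assumptions, not the BN-GWAS implementation. The graph (hypothetical genes G0, G1 and G2 and a trait Y), the effect sizes and the helper names are all invented for this example. The total effect of G1 on Y is estimated by regressing Y on G1 and the parents of G1. This is the adjustment that underlies IDA (ref. 53), applied here to a single known DAG rather than to an estimated equivalence class. The direct effect is the coefficient of G1 when Y is regressed on all of its parents.

```python
import numpy as np


def ols_coef(outcome_values, X):
    """Least-squares slopes of the outcome on the columns of X (intercept added, then dropped)."""
    design = np.column_stack([np.ones(len(outcome_values)), X])
    beta, *_ = np.linalg.lstsq(design, outcome_values, rcond=None)
    return beta[1:]


def total_effect(data, cause, outcome, parents_of_cause):
    """Total effect: regress the outcome on the cause plus the parents of the cause."""
    cols = [cause] + list(parents_of_cause)
    return ols_coef(data[:, outcome], data[:, cols])[0]


def direct_effect(data, cause, outcome, other_parents_of_outcome):
    """Direct effect: regress the outcome on all of its parents (the cause included)."""
    cols = [cause] + list(other_parents_of_outcome)
    return ols_coef(data[:, outcome], data[:, cols])[0]


# Hypothetical linear SEM: G0 -> G1, G0 -> Y, G1 -> G2 -> Y, G1 -> Y.
rng = np.random.default_rng(0)
n = 50_000
g0 = rng.normal(size=n)
g1 = 0.6 * g0 + rng.normal(size=n)
g2 = 0.5 * g1 + rng.normal(size=n)
trait = 0.3 * g1 + 0.8 * g2 + 0.4 * g0 + rng.normal(size=n)

data = np.column_stack([g0, g1, g2, trait])
G0, G1, G2, Y = range(4)  # column indices

# Theory: 0.3 (direct) + 0.5 * 0.8 (mediated through G2) = 0.7
total = total_effect(data, cause=G1, outcome=Y, parents_of_cause=[G0])
# Theory: 0.3, the G1 -> Y edge weight alone
direct = direct_effect(data, cause=G1, outcome=Y, other_parents_of_outcome=[G0, G2])
# No adjustment for G0, so the estimate is confounded (theory: about 0.88)
naive = total_effect(data, cause=G1, outcome=Y, parents_of_cause=[])
print(f"total={total:.3f} direct={direct:.3f} naive={naive:.3f}")
```

Omitting the adjustment for G0 leaves the back-door path G1 ← G0 → Y open, which inflates the estimate. In real data the parent sets are unknown and must be learned from the data. That is the role of the causal network construction step.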

Fig. 1: Workflow of the proposed causal inference framework based on GWAS data.
Fig. 2: Simulation results for total and direct causal effect estimation under different settings.
Fig. 3: Number and proportion of identified causally relevant genes for various traits in different tissues under our exploratory analysis.
Fig. 4: Examples of estimated causal graphs for studied traits.

Data availability

UK Biobank (UKBB) data are available to any researcher who formally applies for access, but they are not publicly available owing to privacy concerns. GTEx RNA-seq data are publicly available via the GTEx portal at https://www.gtexportal.org/home/datasets; we used GTEx V7 RNA-seq data for our analysis. The Open Targets database is freely available via the Open Targets Platform at https://www.opentargets.org. Additional examples of the inferred causal graphs are available via GitHub at https://github.com/LiangyingYin/BayesianNetwork/tree/main/Causalgraphs.

Code availability

The source code and R package needed to reproduce the experiments in this work are available via GitHub at https://github.com/LiangyingYin/BN-GWAS-Simulation and https://github.com/LiangyingYin/BN-GWAS under the GPL-3 license, and via Zenodo at https://doi.org/10.5281/zenodo.10065706 (ref. 67) and https://doi.org/10.5281/zenodo.10068075 (ref. 68).

References

  1. Visscher, P. M. et al. 10 years of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet. 101, 5–22 (2017).

  2. Ripke, S. et al. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).

  3. Kiezun, A. et al. Exome sequencing and the genetic basis of complex traits. Nat. Genet. 44, 623–630 (2012).

  4. Zhu, X., Duren, Z. & Wong, W. H. Modeling regulatory network topology improves genome-wide analyses of complex human traits. Nat. Commun. 12, 2851 (2021).

  5. Burgess, S., Daniel, R. M., Butterworth, A. S., Thompson, S. G. & EPIC-InterAct Consortium. Network Mendelian randomization: using genetic variants as instrumental variables to investigate mediation in causal pathways. Int. J. Epidemiol. 44, 484–495 (2015).

  6. Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186 (2017).

  7. Wray, N. R., Wijmenga, C., Sullivan, P. F., Yang, J. & Visscher, P. M. Common disease is more complex than implied by the core gene omnigenic model. Cell 173, 1573–1580 (2018).

  8. Boyle, E. A., Li, Y. I. & Pritchard, J. K. The omnigenic model: response from the authors. J. Psychiatry Brain Sci. 2, S8 (2017).

  9. Liu, X., Li, Y. I. & Pritchard, J. K. Trans effects on gene expression can drive omnigenic inheritance. Cell 177, 1022–1034.e6 (2019).

  10. Liao, Z., Wang, Y., Qi, X. & Xiao, X. JAZF1, a relevant metabolic regulator in type 2 diabetes. Diabetes Metab. Res. Rev. 35, e3148 (2019).

  11. Zhang, H. Lysosomal acid lipase and lipid metabolism: new mechanisms, new questions, and new therapies. Curr. Opin. Lipidol. 29, 218–223 (2018).

  12. Evans, T. D. et al. Functional characterization of LIPA (lysosomal acid lipase) variants associated with coronary artery disease. Arterioscler. Thromb. Vasc. Biol. 39, 2480–2491 (2019).

  13. Yi, X., Ming, B., Wang, C., Chen, H. & Ma, C. Variants in COX-2, PTGIS, and TBXAS1 are associated with carotid artery or intracranial arterial stenosis and neurologic deterioration in ischemic stroke patients. J. Stroke Cerebrovasc. Dis. 26, 1128–1135 (2017).

  14. Davì, G. & Patrono, C. Platelet activation and atherothrombosis. N. Engl. J. Med. 357, 2482–2494 (2007).

  15. Zou, H., Chen, H., Zhou, Z., Wan, Y. & Liu, Z. ATXN3 promotes breast cancer metastasis by deubiquitinating KLF4. Cancer Lett. 467, 19–28 (2019).

  16. Sattar, N., McInnes, I. B. & McMurray, J. J. Obesity is a risk factor for severe COVID-19 infection: multiple potential mechanisms. Circulation 142, 4–6 (2020).

  17. White, J. et al. Association of lipid fractions with risks for coronary artery disease and diabetes. JAMA Cardiol. 1, 692–699 (2016).

  18. Riaz, H. et al. Association between obesity and cardiovascular outcomes: a systematic review and meta-analysis of Mendelian randomization studies. JAMA Netw. Open 1, e183788 (2018).

  19. Yeung, S. L. A., Luo, S. & Schooling, C. M. The impact of glycated hemoglobin (HbA1c) on cardiovascular disease risk: a Mendelian randomization study using UK Biobank. Diabetes Care 41, 1991–1997 (2018).

  20. Pai, J. K. et al. Hemoglobin A1c is associated with increased risk of incident coronary heart disease among apparently healthy, nondiabetic men and women. J. Am. Heart Assoc. 2, e000077 (2013).

  21. Weverling-Rijnsburger, A. W., Jonkers, I. J., Van Exel, E., Gussekloo, J. & Westendorp, R. G. High-density vs low-density lipoprotein cholesterol as the risk factor for coronary artery disease and stroke in old age. Arch. Intern. Med. 163, 1549–1554 (2003).

  22. Nikpay, M. & McPherson, R. Convergence of biomarkers and risk factor trait loci of coronary artery disease at 3p21.31 and HLA region. npj Genom. Med. 6, 12 (2021).

  23. Tontonoz, P. & Mangelsdorf, D. J. Liver X receptor signaling pathways in cardiovascular disease. Mol. Endocrinol. 17, 985–993 (2003).

  24. Lee, S. D. & Tontonoz, P. Liver X receptors at the intersection of lipid metabolism and atherogenesis. Atherosclerosis 242, 29–36 (2015).

  25. Calkin, A. C. & Tontonoz, P. Liver X receptor signaling pathways and atherosclerosis. Arterioscler. Thromb. Vasc. Biol. 30, 1513–1518 (2010).

  26. Cannon, M. V., van Gilst, W. H. & de Boer, R. A. Emerging role of liver X receptors in cardiac pathophysiology and heart failure. Basic Res. Cardiol. 111, 3 (2016).

  27. Tian, J. et al. Dasatinib sensitises triple negative breast cancer cells to chemotherapy by targeting breast cancer stem cells. Br. J. Cancer 119, 1495–1507 (2018).

  28. Xu, J., Shi, P., Li, H. & Zhou, J. Broad spectrum antiviral agent niclosamide and its therapeutic potential. ACS Infect. Dis. 6, 909–915 (2020).

  29. Braga, L. et al. Drugs that inhibit TMEM16 proteins block SARS-CoV-2 spike-induced syncytia. Nature 594, 88–93 (2021).

  30. Kunzelmann, K. Getting hands on a drug for Covid-19: inhaled and intranasal niclosamide. Lancet Reg. Health Eur. 4, 100094 (2021).

  31. US National Library of Medicine. ClinicalTrials.gov, https://clinicaltrials.gov/ct2/show/NCT04399356

  32. Zhu, Z. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 48, 481–487 (2016).

  33. Gusev, A. et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 48, 245–252 (2016).

  34. Lin, Z., Xue, H. & Pan, W. Combining Mendelian randomization and network deconvolution for inference of causal networks with GWAS summary data. PLoS Genet. 19, e1010762 (2023).

  35. Sanderson, E., Davey Smith, G., Windmeijer, F. & Bowden, J. An examination of multivariable Mendelian randomization in the single-sample and two-sample summary data settings. Int. J. Epidemiol. 48, 713–727 (2019).

  36. Altay, G. & Emmert-Streib, F. Inferring the conservative causal core of gene regulatory networks. BMC Syst. Biol. 4, 132 (2010).

  37. Karlebach, G. & Shamir, R. Modelling and analysis of gene regulatory networks. Nat. Rev. Mol. Cell Biol. 9, 770–780 (2008).

  38. Maathuis, M. H., Colombo, D., Kalisch, M. & Bühlmann, P. Predicting causal effects in large-scale systems from observational data. Nat. Methods 7, 247–248 (2010).

  39. Võsa, U. et al. Unraveling the polygenic architecture of complex traits using blood eQTL metaanalysis. Preprint at bioRxiv https://doi.org/10.1101/447367 (2018).

  40. Gamazon, E. R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 1091 (2015).

  41. Bühlmann, P., Kalisch, M. & Maathuis, M. H. Variable selection in high-dimensional linear models: partially faithful distributions and the PC-simple algorithm. Biometrika 97, 261–278 (2010).

  42. Spirtes, P. & Glymour, C. An algorithm for fast recovery of sparse causal graphs. Soc. Sci. Comput. Rev. 9, 62–72 (1991).

  43. Kalisch, M. & Bühlmann, P. Estimating high-dimensional directed acyclic graphs with the PC-algorithm. J. Mach. Learn. Res. 8, 613–636 (2007).

  44. Strobl, E. V. A constraint-based algorithm for causal discovery with cycles, latent variables and selection bias. Int. J. Data Sci. Anal. 8, 33–56 (2019).

  45. Park, K., Waldorp, L. J. & Ryan, O. Discovering cyclic causal models in psychological research. advances.in/psychology 2, e72425 (2024).

  46. Friedman, J., Hastie, T. & Tibshirani, R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432–441 (2008).

  47. Witten, D. M., Friedman, J. H. & Simon, N. New insights and faster computations for the graphical lasso. J. Comput. Graphical Stat. 20, 892–900 (2011).

  48. Meinshausen, N. & Bühlmann, P. High-dimensional graphs and variable selection with the lasso. Ann. Stat. 34, 1436–1462 (2006).

  49. Pearl, J. Causality (Cambridge Univ. Press, 2009).

  50. Meek, C. Causal inference and causal explanation with background knowledge. In Proc. Eleventh Conference on Uncertainty in Artificial Intelligence 403–410 (AUAI Press, 1995).

  51. Peters, J., Mooij, J. M., Janzing, D. & Schölkopf, B. Causal discovery with continuous additive noise models. J. Mach. Learn. Res. 15, 2009–2053 (2014).

  52. Kalisch, M., Mächler, M., Colombo, D., Maathuis, M. H. & Bühlmann, P. Causal inference using graphical models with the R package pcalg. J. Stat. Softw. 47, 1–26 (2012).

  53. Maathuis, M. H., Kalisch, M. & Bühlmann, P. Estimating high-dimensional intervention effects from observational data. Ann. Stat. 37, 3133–3164 (2009).

  54. Perković, E., Kalisch, M. & Maathuis, M. H. Interpreting and using CPDAGs with background knowledge. In Proc. 2017 Conference on Uncertainty in Artificial Intelligence 120 (AUAI Press, 2017).

  55. Nandy, P., Maathuis, M. H. & Richardson, T. S. Estimating the effect of joint interventions from observational data in sparse high-dimensional settings. Ann. Stat. 45, 647–674 (2017).

  56. Lu, J. et al. Causal network inference from gene transcriptional time-series response to glucocorticoids. PLoS Comput. Biol. 17, e1008223 (2021).

  57. Sanchez-Castillo, M., Blanco, D., Tienda-Luna, I. M., Carrion, M. C. & Huang, Y. A Bayesian framework for the inference of gene regulatory networks from time and pseudo-time series data. Bioinformatics 34, 964–970 (2018).

  58. Hastie, T. & Qian, J. Glmnet vignette. Retrieved June 9, 1–30 (2014).

  59. Meinshausen, N. & Bühlmann, P. Stability selection. J. R. Stat. Soc. B 72, 417–473 (2010).

  60. Carvalho-Silva, D. et al. Open Targets Platform: new developments and updates two years on. Nucleic Acids Res. 47, D1056–D1065 (2019).

  61. Koscielny, G. et al. Open Targets: a platform for therapeutic target identification and validation. Nucleic Acids Res. 45, D985–D994 (2017).

  62. Wang, T. et al. OncoVar: an integrated database and analysis platform for oncogenic driver variants in cancers. Nucleic Acids Res. 49, D1289–D1301 (2021).

  63. Kamburov, A., Stelzl, U., Lehrach, H. & Herwig, R. The ConsensusPathDB interaction database: 2013 update. Nucleic Acids Res. 41, D793–D800 (2013).

  64. Chen, E. Y. et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinf. 14, 128 (2013).

  65. Kuleshov, M. V. et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 44, W90–W97 (2016).

  66. Licata, L. et al. SIGNOR 2.0, the SIGnaling Network Open Resource 2.0: 2019 update. Nucleic Acids Res. 48, D504–D510 (2020).

  67. Yin, L. et al. Estimation of causal effects of genes on complex traits using a Bayesian network-based framework based on GWAS data. Zenodo https://doi.org/10.5281/zenodo.10065706 (2023).

  68. Yin, L. LiangyingYin/BN-GWAS-Simulation: v1.0.0 (v1.0.0). Zenodo https://doi.org/10.5281/zenodo.10068075 (2023).

Acknowledgements

This work was partially supported by a Theme-based Research Grant (no. T44-410/21-N) from the Research Grants Council (H.-C.S.), a National Natural Science Foundation of China grant (no. 81971706) (H.-C.S.), a National Natural Science Foundation of China (NSFC) Young Scientist grant (no. 31900495) (H.-C.S.), the Lo Kwee Seong Biomedical Research Fund from The Chinese University of Hong Kong and the KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research of Common Diseases, Kunming Institute of Zoology and The Chinese University of Hong Kong, China (H.-C.S.). We also thank S. Tsui and C. Cao for useful discussions.

Author information

Contributions

L.Y. and Y.F. designed and implemented the investigations, contributed to the methodology development, analysed the data and wrote the paper. A.L. and J.Q. performed the data analyses. Y.S. implemented the R package BN-GWAS and provided relevant suggestions. P.-C.S. provided suggestions on the methodology and analyses, and interpretation of the results. H.-C.S. conceived and supervised the study, contributed to methodology development and interpretation of results, and wrote the paper. All authors reviewed, edited and approved the final paper.

Corresponding author

Correspondence to Hon-Cheong So.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks the anonymous reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Table 1 Performance evaluation of causal gene detection in the presence of genetic pleiotropy
Extended Data Table 2 Comparison between PC-simple and the univariate test for direct causal gene detection with loops, bidirectional relationships and hidden variables
Extended Data Table 3 Results of peripheral (indirect causal) gene detection in different scenarios
Extended Data Table 4 Examples of genes with estimated causal effects on different traits
Extended Data Table 5 Results of split-half replication analysis
Extended Data Table 6 Results of stability selection for 4 traits in whole blood tissue
Extended Data Table 7 Open Targets enrichment analysis results based on the overall association score for our proposed method
Extended Data Table 8 Top-ranked novel genes identified by our proposed method compared with the univariate test under the same P-value threshold
Extended Data Table 9 Highlights of literature support for identified causal genes for CAD, diabetes and COVID-19
Extended Data Table 10 Causal effect estimates for the top 10 ‘genes’ with the largest absolute IDA in the multiple-trait analysis (BMI → CAD)

Supplementary information

Supplementary Information

Supplementary Figs. 1–13, Tables 1, 4–6, 14–16, 22 and 23, captions for Supplementary Tables 2, 3, 7–13, 17–21 and 24 and text.

Reporting Summary

Supplementary Tables

Supplementary Tables 2, 3, 7–13, 17–21, 23 and 24.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yin, L., Feng, Y., Shi, Y. et al. Estimation of causal effects of genes on complex traits using a Bayesian-network-based framework applied to GWAS data. Nat Mach Intell 6, 1231–1244 (2024). https://doi.org/10.1038/s42256-024-00906-7

  • DOI: https://doi.org/10.1038/s42256-024-00906-7