Fine-mapping across diverse ancestries drives the discovery of putative causal variants underlying human complex traits and diseases

Abstract

Genome-wide association studies (GWAS) of human complex traits or diseases often implicate genetic loci that span hundreds or thousands of genetic variants, many of which have similar statistical significance. While statistical fine-mapping in individuals of European ancestry has made important discoveries, cross-population fine-mapping has the potential to improve power and resolution by capitalizing on the genomic diversity across ancestries. Here we present SuSiEx, an accurate and computationally efficient method for cross-population fine-mapping. SuSiEx integrates data from an arbitrary number of ancestries, explicitly models population-specific allele frequencies and linkage disequilibrium patterns, accounts for multiple causal variants in a genomic region and can be applied to GWAS summary statistics. We comprehensively assessed the performance of SuSiEx using simulations. We further showed that SuSiEx improves the fine-mapping of a range of quantitative traits available in both the UK Biobank and Taiwan Biobank, and improves the fine-mapping of schizophrenia-associated loci by integrating GWAS across East Asian and European ancestries.
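The central idea in the abstract is to pool evidence across ancestries to decide which variant in a locus is causal. A deliberately simplified single-effect calculation can illustrate it. This is a minimal sketch, not the SuSiEx implementation. It assumes exactly one causal variant per locus, shared across populations, and population-specific effects with independent normal priors, so Wakefield approximate log Bayes factors add across populations. It also assumes a uniform prior over variants and a hypothetical prior standard deviation `prior_sd`. It does not model LD between variants or multiple causal variants. The function names are illustrative.

```python
import math
from typing import List, Sequence


def log_abf(z: float, se: float, prior_sd: float) -> float:
    """Wakefield approximate log Bayes factor (association vs. no association).

    z: marginal z-score; se: standard error of the effect estimate;
    prior_sd: standard deviation of the normal prior on the effect (assumed).
    """
    v = se ** 2
    w = prior_sd ** 2
    r = v / (v + w)  # shrinkage ratio V / (V + W)
    return 0.5 * math.log(r) + 0.5 * z * z * (1.0 - r)


def shared_single_effect_posterior(
    z_by_pop: Sequence[Sequence[float]],
    se_by_pop: Sequence[Sequence[float]],
    prior_sd: float = 0.05,
) -> List[float]:
    """Posterior probability that each variant is the single causal variant.

    Assumes one causal variant shared by all populations, population-specific
    effects with independent normal priors (so log Bayes factors add across
    populations) and a uniform prior over variants. LD is not modeled.
    """
    n_var = len(z_by_pop[0])
    log_bf = [0.0] * n_var
    for z_pop, se_pop in zip(z_by_pop, se_by_pop):
        for j in range(n_var):
            log_bf[j] += log_abf(z_pop[j], se_pop[j], prior_sd)
    # Normalize with the log-sum-exp trick to avoid overflow.
    top = max(log_bf)
    weights = [math.exp(b - top) for b in log_bf]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical locus: variants 0 and 1 are nearly indistinguishable in EUR,
# but only variant 0 remains strongly associated in EAS.
z_eur, z_eas = [6.0, 5.9, 1.0], [5.0, 2.0, 0.5]
se = [0.01, 0.01, 0.01]
post_eur = shared_single_effect_posterior([z_eur], [se])
post_both = shared_single_effect_posterior([z_eur, z_eas], [se, se])
print([round(p, 3) for p in post_eur], [round(p, 3) for p in post_both])
```

In this hypothetical locus, the EUR data alone split the posterior between variants 0 and 1. Adding the EAS z-scores concentrates it on variant 0. This mirrors how population-specific association patterns can break ties that a single ancestry leaves unresolved.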

Fig. 1: Overview of fine-mapping methods.
Fig. 2: The performance of SuSiEx in simulations.
Fig. 3: Comparison of SuSiEx, PAINTOR and MsCAVIAR in the simulations.
Fig. 4: Cross-population fine-mapping analysis in biobanks.
Fig. 5: SuSiEx identifies variants missed in single-population fine-mapping.
Fig. 6: Fine-mapping of schizophrenia risk loci across EUR and EAS populations.

Data availability

Publicly available data can be obtained from the following sites: 1000 Genomes Project Phase 3 reference panels: https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3.html; genetic maps for each subpopulation: https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130507_omni_recombination_rates; Pan-UKB summary statistics: https://pan.ukbb.broadinstitute.org/downloads. Individual-level genotypes for the UKB samples were obtained under application no. 32568. The TWB data used in this study contain protected health information and are therefore under controlled access; requests for data access can be made to the TWB (https://www.twbiobank.org.tw/). The PGC schizophrenia GWAS results can be found at https://pgc.unc.edu/for-researchers/download-results. The in-sample LD data for individuals of EUR and EAS ancestry used in the PGC schizophrenia analysis were obtained from the Schizophrenia Working Group of the PGC. The Ensembl Variant Effect Predictor is available at https://ftp.ensembl.org/pub/release-95/.

Code availability

The code used in this study is available from the following websites: SuSiEx (v.1.1.2): https://github.com/getian107/SuSiEx (https://doi.org/10.5281/zenodo.11211744; ref. 41); PAINTOR (v.3.0): https://github.com/gkichaev/PAINTOR_V3.0; MsCAVIAR (v.0.1): https://github.com/nlapier2/MsCAVIAR; HAPGEN2 (v.2.2.0): https://mathgen.stats.ox.ac.uk/genetics_software/hapgen/hapgen2.html; FlashPCA2 (v.2.0): https://github.com/gabraham/flashpca; KING (v.2.3.2): https://www.kingrelatedness.com/index.shtml; PLINK (v.1.90): https://www.cog-genomics.org/plink; LDmergeFM: https://github.com/Pintaius/LDmergeFM; and METASOFT (v.2.0.1): https://zarlab.cs.ucla.edu/software/.

References

  1. Huang, H. et al. Fine-mapping inflammatory bowel disease loci to single-variant resolution. Nature 547, 173–178 (2017).

  2. Maller, J. B. et al. Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat. Genet. 44, 1294–1301 (2012).

  3. Cortes, A. et al. Identification of multiple risk variants for ankylosing spondylitis through high-density genotyping of immune-related loci. Nat. Genet. 45, 730–738 (2013).

  4. Trubetskoy, V. et al. Mapping genomic loci implicates genes and synaptic biology in schizophrenia. Nature 604, 502–508 (2022).

  5. Mahajan, A. et al. Multi-ancestry genetic study of type 2 diabetes highlights the power of diverse populations for discovery and translation. Nat. Genet. 54, 560–572 (2022).

  6. Kanai, M. et al. Meta-analysis fine-mapping is often miscalibrated at single-variant resolution. Cell Genom. 2, 100210 (2022).

  7. Kanai, M. et al. Insights from complex trait fine-mapping across diverse populations. Preprint at medRxiv https://doi.org/10.1101/2021.09.03.21262975 (2021).

  8. LaPierre, N. et al. Identifying causal variants by fine mapping across multiple studies. PLoS Genet. 17, e1009733 (2021).

  9. Kichaev, G. & Pasaniuc, B. Leveraging functional-annotation data in trans-ethnic fine-mapping studies. Am. J. Hum. Genet. 97, 260–271 (2015).

  10. Wyss, A. B. et al. Multiethnic meta-analysis identifies ancestry-specific and cross-ancestry loci for pulmonary function. Nat. Commun. 9, 2976 (2018).

  11. Gharahkhani, P. et al. Genome-wide meta-analysis identifies 127 open-angle glaucoma loci with consistent effect across ancestries. Nat. Commun. 12, 1258 (2021).

  12. Robertson, C. C. et al. Fine-mapping, trans-ancestral and genomic analyses identify causal variants, cells, genes and drug targets for type 1 diabetes. Nat. Genet. 53, 962–971 (2021).

  13. Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. Series B Stat. Methodol. 82, 1273–1300 (2020).

  14. Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).

  15. Feng, Y.-C. A. et al. Taiwan Biobank: a rich biomedical research database of the Taiwanese population. Cell Genom. 2, 100197 (2022).

  16. Mitchell, T. J. & Beauchamp, J. J. Bayesian variable selection in linear regression. J. Am. Stat. Assoc. 83, 1023–1032 (1988).

  17. George, E. I. & McCulloch, R. E. Approaches for Bayesian variable selection. Stat. Sin. 7, 339–373 (1997).

  18. Su, Z., Marchini, J. & Donnelly, P. HAPGEN2: simulation of multiple disease SNPs. Bioinformatics 27, 2304–2305 (2011).

  19. Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).

  20. Kichaev, G. et al. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet. 10, e1004722 (2014).

  21. Ulirsch, J. C. et al. Interrogation of human hematopoiesis at single-cell and single-variant resolution. Nat. Genet. 51, 683–693 (2019).

  22. Weissbrod, O. et al. Functionally informed fine-mapping and polygenic localization of complex trait heritability. Nat. Genet. 52, 1355–1363 (2020).

  23. Ulirsch, J. C. Identification and Interpretation of Causal Genetic Variants Underlying Human Phenotypes. Doctoral thesis, Harvard Univ. Graduate School of Arts and Sciences (2022).

  24. Chen, C.-Y. et al. Analysis across Taiwan Biobank, Biobank Japan and UK Biobank identifies hundreds of novel loci for 36 quantitative traits. Cell Genom. 3, 100436 (2023).

  25. Li, H. Toward better understanding of artifacts in variant calling from high-coverage samples. Bioinformatics 30, 2843–2851 (2014).

  26. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).

  27. Benner, C. et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493–1501 (2016).

  28. Benner, C. et al. Prospects of fine-mapping trait-associated genomic regions by using summary statistics from genome-wide association studies. Am. J. Hum. Genet. 101, 539–551 (2017).

  29. Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021).

  30. Zhou, W. et al. Global Biobank Meta-analysis Initiative: powering genetic discovery across human disease. Cell Genom. 2, 100192 (2022).

  31. Pruim, R. J. et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 26, 2336–2337 (2010).

  32. Wang, Q. S. et al. Leveraging supervised learning for functionally informed fine-mapping of cis-eQTLs identifies an additional 20,913 putative causal eQTLs. Nat. Commun. 12, 3394 (2021).

  33. Hou, Y. et al. Schizophrenia-associated rs4702 G allele-specific downregulation of FURIN expression by miR-338-3p reduces BDNF production. Schizophr. Res. 199, 176–180 (2018).

  34. Schrode, N. et al. Synergistic effects of common schizophrenia risk variants. Nat. Genet. 51, 1475–1485 (2019).

  35. Manichaikul, A. et al. Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867–2873 (2010).

  36. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).

  37. Abraham, G., Qiu, Y. & Inouye, M. FlashPCA2: principal component analysis of Biobank-scale genotype datasets. Bioinformatics 33, 2776–2778 (2017).

  38. Han, B. & Eskin, E. Random-effects model aimed at discovering associations in meta-analysis of genome-wide association studies. Am. J. Hum. Genet. 88, 586–598 (2011).

  39. McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).

  40. Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).

  41. Ge, T. & Yuan, K. getian107/SuSiEx: SuSiEx-v1.1.2 (v1.1.2). Zenodo https://doi.org/10.5281/zenodo.11211744 (2024).


Acknowledgements

We thank M. Kanai for helpful discussions. The UKB EUR and AFR GWAS summary statistics were obtained from the Pan-UKB project. H.H. acknowledges support from the National Institute of Diabetes and Digestive and Kidney Diseases (nos K01DK114379 and R01DK129364), the National Institute of Mental Health (NIMH) (nos U01MH109539 and R01MH130675), a Brain and Behavior Research Foundation Young Investigator Grant (no. 28450), the Zhengxu and Ying He Foundation and the Stanley Center for Psychiatric Research. T.G. is supported by National Human Genome Research Institute grant no. R01HG012354. A.F.P. was supported by the Academy of Medical Sciences ‘Springboard’ award (no. SBF005\1083). M.C.O. acknowledges support from Medical Research Council grants to Cardiff University: center (no. MR/L010305/1), program (no. MR/P005748/1) and project (nos MR/L011794/1, MC_PC_17212). Y.-F.L. is supported by the National Health Research Institutes (nos NP-110, 111, 112-PP-09) and the National Science and Technology Council (Ministry of Science and Technology nos 109-2314-B-400-017 and 110-2314-B-400-028-MY3) of Taiwan. Y.-C.A.F. acknowledges support from the National Science and Technology Council (no. 112-2314-B-002-200-MY3) and the Ministry of Education (the Yushan Young Fellow Program, no. MOE-111-YSFAG-0003-001-P1; the Population Health Research Center from Featured Areas Research Center Program within the framework of the Higher Education Sprout Project, no. NTU-113L9004). B.M.N. acknowledges support from the NIMH (nos R37MH107649 and R01MH101244). The Schizophrenia Working Group of the Psychiatric Genomics Consortium acknowledges support from the NIMH (no. R01MH124873).

Author information

Contributions

T.G. and H.H. designed and supervised the project. T.G. developed the statistical methods. K.Y. and T.G. wrote the SuSiEx code. K.Y., R.J.L. and T.G. conducted the simulation studies. K.Y., R.J.L. and M.Y. performed the biobank fine-mapping analysis. T.-T.C., S.-C.L. and Y.-F.L. performed the analysis in the TWB. Y.-C.A.F., Y.-F.L. and C.-Y.C. supervised the work in the TWB. K.Y. and A.F.P. performed the analysis of the schizophrenia cohorts. M.J.D. and B.M.N. provided critical suggestions for the study design. M.Y., Y.C., M.L., R.L. and Y.X. tested the code. M.C.O. and Z.G. made substantial contributions to the generation and management of the schizophrenia data. W.S. and C.S. provided support for the computational infrastructure. K.Y., T.G. and H.H. wrote the manuscript. All authors reviewed and approved the final version of the manuscript.

Corresponding authors

Correspondence to Tian Ge or Hailiang Huang.

Ethics declarations

Competing interests

W.S. and C.S. are employees of Digital Health China Technologies. M.J.D. is a founder of Maze Therapeutics. B.M.N. is a member of the scientific advisory board at Deep Genomics and Neumora. C.-Y.C. is an employee of Biogen. R.J.L. is an employee of Ionis Pharmaceuticals. M.C.O. was supported by a collaborative research grant from Takeda Pharmaceuticals for a project unrelated to work presented in this article. Takeda played no part in the conception, design, implementation or interpretation of this study. H.H. received consultancy fees from Ono Pharmaceutical and an honorarium from Xian Janssen Pharmaceutical. The other authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Wenan Chen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Schematic illustration of the meta-analysis-based fine-mapping method, single-population combining method, and SuSiEx.

All panels were created in the LocusZoom style. Variant positions are shown on the x-axis. The gold diamond in each locus marks the lead (most associated) variant. Other variants are colored by their linkage disequilibrium (LD) with the lead variant, in descending order: red, orange, green and blue. The purple bars represent the posterior inclusion probabilities (PIPs) inferred by the fine-mapping methods, and the light gray boxes represent the credible sets estimated by fine-mapping. a, Example of a strong causal signal shared across populations. b, Example of a weak causal signal shared across populations. c, Example of a population-specific causal signal.
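In a SuSiE-style model, each of L single effects has a posterior vector over variants, and PIPs and credible sets can both be derived from those vectors. The sketch below assumes that formulation. The PIP combines effects as 1 minus the product of per-effect non-inclusion probabilities, and each effect gets a greedy 95% credible set. It illustrates the two quantities drawn in the figure and is not the code used to produce it. The function names and example posteriors are hypothetical.

```python
from typing import List, Sequence


def pips_from_effects(alphas: Sequence[Sequence[float]]) -> List[float]:
    """PIP_j = 1 - prod_l (1 - alpha_lj): probability that variant j is causal
    for at least one of the L single effects (SuSiE-style combination)."""
    n_var = len(alphas[0])
    pips = []
    for j in range(n_var):
        prob_not_causal = 1.0
        for alpha in alphas:
            prob_not_causal *= 1.0 - alpha[j]
        pips.append(1.0 - prob_not_causal)
    return pips


def credible_set(alpha: Sequence[float], coverage: float = 0.95) -> List[int]:
    """Greedy credible set for one single effect: add variants in decreasing
    posterior probability until the cumulative probability reaches `coverage`."""
    order = sorted(range(len(alpha)), key=lambda j: alpha[j], reverse=True)
    selected, cumulative = [], 0.0
    for j in order:
        selected.append(j)
        cumulative += alpha[j]
        if cumulative >= coverage:
            break
    return selected


# Hypothetical posteriors for two single effects over five variants.
alphas = [[0.80, 0.18, 0.02, 0.0, 0.0], [0.0, 0.0, 0.02, 0.70, 0.28]]
print(pips_from_effects(alphas))
print([credible_set(a) for a in alphas])
```

Each credible set belongs to one effect, while a PIP summarizes a variant across all effects. This is why a variant can sit in a credible set and still have a modest PIP.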

Extended Data Fig. 2 Comparison of SuSiEx, PAINTOR and MsCAVIAR in simulations.

a, Job completion summary for the three Bayesian fine-mapping methods across different parameters and input datasets. Red indicates jobs that took longer than 24 hours. Yellow indicates jobs that returned unreasonable results, defined as a sum of PIPs across variants in the genomic locus greater than 5 or less than 0.1 (a value of 1 is expected). Green indicates jobs that completed within 24 hours and returned reasonable results. The lower panel shows the different sample size combinations of the discovery GWAS. P-values are from linear regression with no multiple testing correction applied. b, Number of identified true causal SNPs with PIP > 0.5 (x-axis) versus the coverage of the credible sets (y-axis) for different input datasets and fine-mapping methods. Color represents the combination of discovery populations, symbol size represents the total discovery sample size, and symbol shape represents the method and parameters. Only simulation runs that completed within 24 hours and returned reasonable results are included.
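The completion categories in panel a and the two metrics in panel b can be written as small functions. The 24-hour limit and the PIP-sum bounds of 0.1 and 5 come from the caption, and the PIP > 0.5 threshold comes from panel b. Defining credible set coverage as the fraction of credible sets that contain at least one true causal variant is an assumption. The `FineMapRun` structure and all function names are also hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class FineMapRun:
    runtime_hours: float
    pips: List[float]
    credible_sets: List[List[int]] = field(default_factory=list)


def job_status(run: FineMapRun, max_hours: float = 24.0,
               min_pip_sum: float = 0.1, max_pip_sum: float = 5.0) -> str:
    """Red: exceeded the time limit; yellow: sum of PIPs outside [0.1, 5];
    green: otherwise."""
    if run.runtime_hours > max_hours:
        return "red"
    pip_sum = sum(run.pips)
    if pip_sum > max_pip_sum or pip_sum < min_pip_sum:
        return "yellow"
    return "green"


def n_causal_detected(pips: Sequence[float], causal: Sequence[int],
                      threshold: float = 0.5) -> int:
    """Number of true causal variants whose PIP exceeds the threshold."""
    return sum(1 for j in causal if pips[j] > threshold)


def credible_set_coverage(credible_sets: Sequence[Sequence[int]],
                          causal: Sequence[int]) -> float:
    """Assumed definition: fraction of credible sets containing at least one
    true causal variant (NaN when no credible set was reported)."""
    if not credible_sets:
        return float("nan")
    causal_set = set(causal)
    hits = sum(1 for cs in credible_sets if causal_set.intersection(cs))
    return hits / len(credible_sets)


run = FineMapRun(runtime_hours=3.5, pips=[0.9, 0.05, 0.6, 0.1],
                 credible_sets=[[0], [2, 3], [1]])
print(job_status(run),
      n_causal_detected(run.pips, causal=[0, 2]),
      credible_set_coverage(run.credible_sets, causal=[0, 2]))
```

Only runs classified as green would enter the panel b comparison.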

Extended Data Fig. 3 Examples of the improvement of SuSiEx over single-population fine-mapping in the biobank analysis.

Each of the three subfigures consists of eight vertically aligned panels, with the x-axis showing the genomic position. The top six panels show GWAS association statistics and single-population fine-mapping results for the European (Pan-UKBB European), African (Pan-UKBB African) and East Asian (Taiwan Biobank) populations. For association statistics, the left y-axis shows the −log10(P-value) of each SNP, and colors indicate LD with the lead SNP in descending order (red, orange, then blue). The right y-axis shows the recombination rate in centimorgans per megabase (cM/Mb); the solid line indicates the population-specific recombination map obtained from the 1000 Genomes Project. Different colors distinguish different credible sets in the fine-mapping results. The second-to-bottom panel shows the results from SuSiEx. ‘Null’ indicates that single-population fine-mapping did not obtain any reliable credible set. The bottom panel shows gene annotations, if any. P-values are from linear regression with no multiple testing correction applied. a, Association with albumin on chr8:9,170,000-9,190,000, an example of a strong causal signal shared across populations. b, Association with platelet count on chr12:104,900,000-105,050,000, an example of a weak causal signal shared across populations. c, Association with albumin on chr12:13,100,000-13,400,000, an example of population-specific causal signals.

Extended Data Fig. 4 Association with total bilirubin on chr11:5,100,000-5,700,000.

Panels are aligned vertically, with the x-axis showing the genomic position. The top six panels show GWAS association statistics and single-population fine-mapping results for the European (Pan-UKBB European), African (Pan-UKBB African) and East Asian (Taiwan Biobank) populations in the LocusZoom style. The second-to-bottom panel shows the fine-mapping results from SuSiEx, which integrated GWAS summary statistics from the three populations. The bottom panel shows gene annotations. For GWAS panels, the left y-axis shows the −log10(P-value) of each SNP, and the gray horizontal dashed line marks the genome-wide significance threshold (P = 5 × 10⁻⁸). The purple rectangle in each locus marks the lead (most associated) variant. Variants are colored by LD with the lead variant, in descending order: red, orange, green, light blue and dark blue. For fine-mapping panels, different colors distinguish different credible sets, and the diamond marks the variant with the maximum PIP in each credible set. The left y-axis shows the PIP from fine-mapping, and the right y-axis shows the recombination map obtained from the 1000 Genomes Project; for the SuSiEx panel, the average recombination rate across the three populations is used. P-values are from linear regression with no multiple testing correction applied.
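The plotting conventions in this caption reduce to a few per-variant computations. This sketch assumes the standard LocusZoom r² cut-offs of 0.8, 0.6, 0.4 and 0.2 for the five colors, since the caption names the colors but not the thresholds. It also assumes the population-specific recombination maps have already been interpolated onto a common set of positions before averaging. All names are illustrative.

```python
import math
from typing import List, Sequence

# Assumed LocusZoom-style r^2 bins as (lower bound, color); the caption names
# the five colors but not the cut-offs.
LD_BINS = [(0.8, "red"), (0.6, "orange"), (0.4, "green"),
           (0.2, "light blue"), (0.0, "dark blue")]
GENOME_WIDE_P = 5e-8  # genome-wide significance threshold from the caption


def neg_log10_p(p: float) -> float:
    """Value plotted on the left y-axis of the GWAS panels."""
    return -math.log10(p)


def is_genome_wide_significant(p: float) -> bool:
    return p < GENOME_WIDE_P


def ld_color(r2: float) -> str:
    """Color for a variant given its r^2 with the lead variant."""
    for lower, color in LD_BINS:
        if r2 >= lower:
            return color
    raise ValueError("r2 must be non-negative")


def mean_recombination_rate(rates_by_pop: Sequence[Sequence[float]]) -> List[float]:
    """Average recombination rate (cM/Mb) at each position across populations,
    assuming every map is already on the same positions."""
    return [sum(col) / len(col) for col in zip(*rates_by_pop)]


print(ld_color(0.95), ld_color(0.5), is_genome_wide_significant(1e-9))
print(mean_recombination_rate([[1.0, 2.0], [3.0, 4.0], [2.0, 0.0]]))
```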

Extended Data Fig. 5 Association with albumin on chr13:31,150,000-31,450,000.

Panels are aligned vertically, with the x-axis showing the genomic position. The top six panels show GWAS association statistics and single-population fine-mapping results for the European (Pan-UKBB European), African (Pan-UKBB African) and East Asian (Taiwan Biobank) populations in the LocusZoom style. The second-to-bottom panel shows the fine-mapping results from SuSiEx, which integrated GWAS summary statistics from the three populations. The bottom panel shows gene annotations. For GWAS panels, the left y-axis shows the −log10(P-value) of each SNP, and the gray horizontal dashed line marks the genome-wide significance threshold (P = 5 × 10⁻⁸). The purple rectangle in each locus marks the lead (most associated) variant. Variants are colored by LD with the lead variant, in descending order: red, orange, green, light blue and dark blue. For fine-mapping panels, different colors distinguish different credible sets, and the diamond marks the variant with the maximum PIP in each credible set. The left y-axis shows the PIP from fine-mapping, and the right y-axis shows the recombination map obtained from the 1000 Genomes Project; for the SuSiEx panel, the average recombination rate across the three populations is used. P-values are from linear regression with no multiple testing correction applied.

Extended Data Fig. 6 Proportion of variants showing quality issues binned by the drop in PIP between single- and multi-population fine-mapping.

Quality issues were defined as: (i) the best-PIP variant lies in a low-complexity region; (ii) the best-PIP variant shows allelic imbalance or violates Hardy–Weinberg equilibrium in gnomAD; or (iii) the best-PIP variant is multi-allelic or colocalizes with indels at the same genomic position, which might affect imputation quality.
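The analysis behind this figure can be sketched as binning variants by their PIP drop and computing the proportion flagged by any of criteria (i) to (iii). This sketch assumes the drop is the single-population PIP minus the multi-population PIP, and the bin edges in the example are hypothetical. The QC flags are taken as precomputed booleans, because computing them requires gnomAD and low-complexity-region annotations that are not reproduced here. The class and function names are illustrative.

```python
import bisect
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class VariantQC:
    """Precomputed quality flags for a best-PIP variant, criteria (i)-(iii)."""
    in_low_complexity_region: bool = False
    allelic_imbalance_or_hwe_violation: bool = False
    multiallelic_or_colocated_indel: bool = False

    def has_issue(self) -> bool:
        return (self.in_low_complexity_region
                or self.allelic_imbalance_or_hwe_violation
                or self.multiallelic_or_colocated_indel)


def flagged_proportion_by_pip_drop(
    records: Sequence[Tuple[float, float, VariantQC]],
    edges: Sequence[float],
) -> List[float]:
    """records: (single-population PIP, multi-population PIP, QC flags).
    edges: ascending bin edges for the PIP drop; bins are [lo, hi) except the
    last, which also includes its upper edge. Returns the flagged proportion
    per bin (NaN for empty bins); drops outside the edges are ignored."""
    n_bins = len(edges) - 1
    totals, flagged = [0] * n_bins, [0] * n_bins
    for single_pip, multi_pip, qc in records:
        drop = single_pip - multi_pip
        b = bisect.bisect_right(edges, drop) - 1
        if drop == edges[-1]:
            b = n_bins - 1
        if 0 <= b < n_bins:
            totals[b] += 1
            flagged[b] += qc.has_issue()
    return [f / t if t else math.nan for f, t in zip(flagged, totals)]


bad, ok = VariantQC(in_low_complexity_region=True), VariantQC()
records = [(0.9, 0.1, bad), (0.8, 0.2, ok), (0.3, 0.25, ok), (0.5, 0.5, bad)]
print(flagged_proportion_by_pip_drop(records, edges=[0.0, 0.25, 0.5, 1.0]))
```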

Extended Data Fig. 7 Proportion of variants with high/moderate functional impact in cross-population biobank fine-mapping analyses.

The functional impact of each variant was annotated using VEP, with the definition and classification of functional impact obtained from https://useast.ensembl.org/info/genome/variation/prediction/predicted_data.html. The high-impact category includes transcript ablation, splice acceptor variants and splice donor variants; the moderate-impact category includes missense variants and protein-altering variants; the low-impact category includes synonymous variants and splice region variants; and the modifier-impact category includes intron and intergenic variants, among others.
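The impact grouping can be sketched as a lookup from VEP consequence terms (Sequence Ontology names such as `missense_variant`) to the four categories. When a variant has several consequences, the sketch keeps the most severe impact. Only the terms named in the caption are included. VEP's own table covers many more, so unlisted terms return `None` here rather than being guessed. The dictionary and function names are illustrative.

```python
from typing import List, Optional, Sequence

# Only the consequence terms named in the caption; VEP defines many more.
IMPACT_BY_CONSEQUENCE = {
    "transcript_ablation": "HIGH",
    "splice_acceptor_variant": "HIGH",
    "splice_donor_variant": "HIGH",
    "missense_variant": "MODERATE",
    "protein_altering_variant": "MODERATE",
    "synonymous_variant": "LOW",
    "splice_region_variant": "LOW",
    "intron_variant": "MODIFIER",
    "intergenic_variant": "MODIFIER",
}
SEVERITY = {"HIGH": 3, "MODERATE": 2, "LOW": 1, "MODIFIER": 0}


def variant_impact(consequences: Sequence[str]) -> Optional[str]:
    """Most severe impact among a variant's consequence terms, or None if no
    term is in the lookup table."""
    impacts = [IMPACT_BY_CONSEQUENCE[c] for c in consequences
               if c in IMPACT_BY_CONSEQUENCE]
    if not impacts:
        return None
    return max(impacts, key=SEVERITY.__getitem__)


def proportion_high_or_moderate(per_variant: Sequence[Sequence[str]]) -> float:
    """Share of annotatable variants whose most severe impact is HIGH or MODERATE."""
    impacts: List[str] = [i for i in map(variant_impact, per_variant) if i is not None]
    if not impacts:
        return float("nan")
    return sum(i in ("HIGH", "MODERATE") for i in impacts) / len(impacts)


print(variant_impact(["intron_variant", "missense_variant"]))
print(proportion_high_or_moderate([["missense_variant"], ["intron_variant"],
                                   ["splice_donor_variant"], ["synonymous_variant"]]))
```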

Supplementary information

Supplementary Information

Supplementary Note and Figs. 1–44.

Reporting Summary

Peer Review File

Supplementary Tables

Supplementary Tables 1–27.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yuan, K., Longchamps, R.J., Pardiñas, A.F. et al. Fine-mapping across diverse ancestries drives the discovery of putative causal variants underlying human complex traits and diseases. Nat Genet 56, 1841–1850 (2024). https://doi.org/10.1038/s41588-024-01870-z

  • DOI: https://doi.org/10.1038/s41588-024-01870-z
