Abstract
Genome-wide association studies (GWAS) of human complex traits or diseases often implicate genetic loci that span hundreds or thousands of genetic variants, many of which have similar statistical significance. While statistical fine-mapping in individuals of European ancestry has made important discoveries, cross-population fine-mapping has the potential to improve power and resolution by capitalizing on the genomic diversity across ancestries. Here we present SuSiEx, an accurate and computationally efficient method for cross-population fine-mapping. SuSiEx integrates data from an arbitrary number of ancestries, explicitly models population-specific allele frequencies and linkage disequilibrium patterns, accounts for multiple causal variants in a genomic region and can be applied to GWAS summary statistics. We comprehensively assessed the performance of SuSiEx using simulations. We further showed that SuSiEx improves the fine-mapping of a range of quantitative traits available in both the UK Biobank and Taiwan Biobank, and improves the fine-mapping of schizophrenia-associated loci by integrating GWAS across East Asian and European ancestries.
Data availability
Publicly available data are available from the following sites: 1000 Genomes Project Phase 3 reference panels: https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3.html; genetic map for each subpopulation: https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130507_omni_recombination_rates; Pan-UKB summary statistics: https://pan.ukbb.broadinstitute.org/downloads. Individual-level genotypes for the UKB samples were obtained under application no. 32568; the TWB data used in this study contain protected health information and are thus under controlled access. Requests for data access can be made to the TWB (https://www.twbiobank.org.tw/). The PGC schizophrenia GWAS can be found at https://pgc.unc.edu/for-researchers/download-results. The in-sample LD of individuals with EUR and EAS ancestry for the PGC schizophrenia analysis was obtained from the Schizophrenia Working Group of the PGC. The Ensembl variant effect predictor can be found at https://ftp.ensembl.org/pub/release-95/.
Code availability
The code used in this study is available from the following websites: SuSiEx (v.1.1.2): https://github.com/getian107/SuSiEx (https://doi.org/10.5281/zenodo.11211744; ref. 41); PAINTOR (v.3.0): https://github.com/gkichaev/PAINTOR_V3.0; MsCAVIAR (v.0.1): https://github.com/nlapier2/MsCAVIAR; HAPGEN2 (v.2.2.0): https://mathgen.stats.ox.ac.uk/genetics_software/hapgen/hapgen2.html; FlashPCA2 (v.2.0): https://github.com/gabraham/flashpca; KING (v.2.3.2): https://www.kingrelatedness.com/index.shtml; PLINK 1.90 (v.1.9): https://www.cog-genomics.org/plink; LDmergeFM: https://github.com/Pintaius/LDmergeFM; and METASOFT (v.2.0.1): https://zarlab.cs.ucla.edu/software/.
References
Huang, H. et al. Fine-mapping inflammatory bowel disease loci to single-variant resolution. Nature 547, 173–178 (2017).
Maller, J. B. et al. Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat. Genet. 44, 1294–1301 (2012).
Cortes, A. et al. Identification of multiple risk variants for ankylosing spondylitis through high-density genotyping of immune-related loci. Nat. Genet. 45, 730–738 (2013).
Trubetskoy, V. et al. Mapping genomic loci implicates genes and synaptic biology in schizophrenia. Nature 604, 502–508 (2022).
Mahajan, A. et al. Multi-ancestry genetic study of type 2 diabetes highlights the power of diverse populations for discovery and translation. Nat. Genet. 54, 560–572 (2022).
Kanai, M. et al. Meta-analysis fine-mapping is often miscalibrated at single-variant resolution. Cell Genom. 2, 100210 (2022).
Kanai, M. et al. Insights from complex trait fine-mapping across diverse populations. Preprint at medRxiv https://doi.org/10.1101/2021.09.03.21262975 (2021).
LaPierre, N. et al. Identifying causal variants by fine mapping across multiple studies. PLoS Genet. 17, e1009733 (2021).
Kichaev, G. & Pasaniuc, B. Leveraging functional-annotation data in trans-ethnic fine-mapping studies. Am. J. Hum. Genet. 97, 260–271 (2015).
Wyss, A. B. et al. Multiethnic meta-analysis identifies ancestry-specific and cross-ancestry loci for pulmonary function. Nat. Commun. 9, 2976 (2018).
Gharahkhani, P. et al. Genome-wide meta-analysis identifies 127 open-angle glaucoma loci with consistent effect across ancestries. Nat. Commun. 12, 1258 (2021).
Robertson, C. C. et al. Fine-mapping, trans-ancestral and genomic analyses identify causal variants, cells, genes and drug targets for type 1 diabetes. Nat. Genet. 53, 962–971 (2021).
Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. Series B Stat. Methodol. 82, 1273–1300 (2020).
Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
Feng, Y.-C. A. et al. Taiwan Biobank: a rich biomedical research database of the Taiwanese population. Cell Genom. 2, 100197 (2022).
Mitchell, T. J. & Beauchamp, J. J. Bayesian variable selection in linear regression. J. Am. Stat. Assoc. 83, 1023–1032 (1988).
George, E. I. & McCulloch, R. E. Approaches for Bayesian variable selection. Stat. Sin. 7, 339–373 (1997).
Su, Z., Marchini, J. & Donnelly, P. HAPGEN2: simulation of multiple disease SNPs. Bioinformatics 27, 2304–2305 (2011).
Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Kichaev, G. et al. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet. 10, e1004722 (2014).
Ulirsch, J. C. et al. Interrogation of human hematopoiesis at single-cell and single-variant resolution. Nat. Genet. 51, 683–693 (2019).
Weissbrod, O. et al. Functionally informed fine-mapping and polygenic localization of complex trait heritability. Nat. Genet. 52, 1355–1363 (2020).
Ulirsch, J. C. Identification and Interpretation of Causal Genetic Variants Underlying Human Phenotypes. Doctoral thesis, Harvard Univ. Graduate School of Arts and Sciences (2022).
Chen, C.-Y. et al. Analysis across Taiwan Biobank, Biobank Japan and UK Biobank identifies hundreds of novel loci for 36 quantitative traits. Cell Genom. 3, 100436 (2023).
Li, H. Toward better understanding of artifacts in variant calling from high-coverage samples. Bioinformatics 30, 2843–2851 (2014).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Benner, C. et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493–1501 (2016).
Benner, C. et al. Prospects of fine-mapping trait-associated genomic regions by using summary statistics from genome-wide association studies. Am. J. Hum. Genet. 101, 539–551 (2017).
Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021).
Zhou, W. et al. Global Biobank Meta-analysis Initiative: powering genetic discovery across human disease. Cell Genom. 2, 100192 (2022).
Pruim, R. J. et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 26, 2336–2337 (2010).
Wang, Q. S. et al. Leveraging supervised learning for functionally informed fine-mapping of cis-eQTLs identifies an additional 20,913 putative causal eQTLs. Nat. Commun. 12, 3394 (2021).
Hou, Y. et al. Schizophrenia-associated rs4702 G allele-specific downregulation of FURIN expression by miR-338-3p reduces BDNF production. Schizophr. Res. 199, 176–180 (2018).
Schrode, N. et al. Synergistic effects of common schizophrenia risk variants. Nat. Genet. 51, 1475–1485 (2019).
Manichaikul, A. et al. Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867–2873 (2010).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
Abraham, G., Qiu, Y. & Inouye, M. FlashPCA2: principal component analysis of Biobank-scale genotype datasets. Bioinformatics 33, 2776–2778 (2017).
Han, B. & Eskin, E. Random-effects model aimed at discovering associations in meta-analysis of genome-wide association studies. Am. J. Hum. Genet. 88, 586–598 (2011).
McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
Ge, T. & Yuan, K. getian107/SuSiEx: SuSiEx-v1.1.2 (v1.1.2). Zenodo https://doi.org/10.5281/zenodo.11211744 (2024).
Acknowledgements
We thank M. Kanai for helpful discussions. The UKB EUR and AFR GWAS summary statistics were obtained from the Pan-UKB project. H.H. acknowledges support from National Institute of Diabetes and Digestive and Kidney Diseases (nos K01DK114379 and R01DK129364), National Institute of Mental Health (NIMH) (nos U01MH109539 and R01MH130675), Brain and Behavior Research Foundation Young Investigator Grant (no. 28450), the Zhengxu and Ying He Foundation, and the Stanley Center for Psychiatric Research. T.G. is supported by a National Human Genome Research Institute grant no. R01HG012354. A.F.P. was supported by the Academy of Medical Sciences ‘Springboard’ award (no. SBF005\1083). M.C.O. acknowledges support from Medical Research Council grants to Cardiff University: center (no. MR/L010305/1), program (no. MR/P005748/1) and project (nos MR/L011794/1, MC_PC_17212). Y.-F.L. is supported by the National Health Research Institutes (nos NP-110, 111, 112-PP-09) and the National Science and Technology Council (Ministry of Science and Technology nos 109-2314-B-400-017 and 110-2314-B-400-028-MY3) of Taiwan. Y.-C.A.F. acknowledges support from the National Science and Technology Council (no. 112-2314-B-002-200-MY3) and the Ministry of Education (the Yushan Young Fellow Program, no. MOE-111-YSFAG-0003-001-P1; the Population Health Research Center from Featured Areas Research Center Program within the framework of the Higher Education Sprout Project, no. NTU-113L9004). B.M.N. acknowledges support from the NIMH (nos R37MH107649 and R01MH101244). The Schizophrenia Workgroup of Psychiatric Genomics Consortium acknowledges support from the NIMH (no. R01MH124873).
Author information
Contributions
T.G. and H.H. designed and supervised the project. T.G. developed the statistical methods. K.Y. and T.G. programmed the code for SuSiEx. K.Y., R.J.L. and T.G. conducted the simulation studies. K.Y., R.J.L. and M.Y. performed the biobank fine-mapping analysis. T.-T.C., S.-C.L. and Y.-F.L. performed the analysis in the TWB. Y.-C.A.F., Y.-F.L. and C.-Y.C. supervised the work in the TWB. K.Y. and A.F.P. performed the analysis of the cohorts with schizophrenia. M.J.D. and B.M.N. provided critical suggestions for the study design. M.Y., Y.C., M.L., R.L. and Y.X. took part in the testing of the code. M.C.O. and Z.G. made substantial contributions to the generation and management of the schizophrenia data. W.S. and C.S. provided support to the computational infrastructure. K.Y., T.G. and H.H. wrote the manuscript. All authors reviewed and approved the final version of the manuscript.
Ethics declarations
Competing interests
W.S. and C.S. are employees of Digital Health China Technologies. M.J.D. is a founder of Maze Therapeutics. B.M.N. is a member of the scientific advisory board at Deep Genomics and Neumora. C.-Y.C. is an employee of Biogen. R.J.L. is an employee of Ionis Pharmaceuticals. M.C.O. was supported by a collaborative research grant from Takeda Pharmaceuticals for a project unrelated to work presented in this article. Takeda played no part in the conception, design, implementation or interpretation of this study. H.H. received consultancy fees from Ono Pharmaceutical and an honorarium from Xian Janssen Pharmaceutical. The other authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks Wenan Chen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Schematic illustration of the meta-analysis-based fine-mapping method, single-population combining method, and SuSiEx.
All panels were created following the LocusZoom style. Variant positions are shown on the x-axis. The gold diamond for each locus represents the lead (most associated) variant. The association strengths for other variants are colored by descending degrees of linkage disequilibrium (LD) with the lead variant (ordered red, orange, green, and blue dots). The purple bars represent the posterior inclusion probabilities (PIPs) inferred by fine-mapping methods. The light gray boxes represent the credible sets estimated by fine-mapping. a, Example of a strong causal signal shared across populations. b, Example of a weak causal signal shared across populations. c, Example of a population-specific causal signal.
Extended Data Fig. 2 Comparison of SuSiEx, PAINTOR and MsCAVIAR in simulations.
a, The job completion summary for the three Bayesian fine-mapping methods using different parameters and input datasets. Red represents jobs taking longer than 24 hours. Yellow represents jobs returning unreasonable results, defined as the sum of PIPs across variants in the genomic locus >5 or <0.1 (1 is expected). Green represents jobs that were completed within 24 hours and returned reasonable results. The lower panel represents different sample size combinations of the discovery GWAS. P-values are from linear regression with no multiple testing correction applied. b, Number of identified true causal SNPs with PIP > 0.5 (x-axis) versus the coverage of the credible sets (y-axis) for different input datasets and fine-mapping methods. Color represents the combination of discovery populations, the size of the symbols represents the total discovery sample size, and the shape of the symbols represents different methods and parameters. Only simulation runs that were completed within 24 hours and returned reasonable results were included.
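The three-colour job classification described in panel a can be sketched as follows. This is only an illustrative reading of the caption, not the authors' pipeline: the function name `classify_job`, its arguments and the returned labels are hypothetical, and the thresholds (24 hours; PIP sum above 5 or below 0.1) are taken from the caption text.

```python
from typing import Sequence

# Thresholds quoted in the caption; the constant names are hypothetical.
TIME_LIMIT_HOURS = 24.0
PIP_SUM_MIN = 0.1
PIP_SUM_MAX = 5.0


def classify_job(runtime_hours: float, pips: Sequence[float]) -> str:
    """Label one fine-mapping job using the colour scheme of panel a.

    Returns "red" for jobs that ran longer than 24 hours, "yellow" for jobs
    whose PIPs across the locus sum to more than 5 or less than 0.1, and
    "green" otherwise.
    """
    if runtime_hours > TIME_LIMIT_HOURS:
        return "red"
    pip_sum = sum(pips)
    if pip_sum > PIP_SUM_MAX or pip_sum < PIP_SUM_MIN:
        return "yellow"
    return "green"
```

Under this reading, only jobs labelled "green" would enter the comparison in panel b.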
Extended Data Fig. 3 Examples of the improvement of SuSiEx over single-population fine-mapping in the biobank analysis.
Each of the three sub-figures consists of eight panels, which are aligned vertically, with the x-axis showing the genomic position. The top six panels visualize GWAS association statistics and single-population fine-mapping results within the European (Pan-UKBB European), African (Pan-UKBB African) and East Asian (Taiwan Biobank) populations. For association statistics, the left y-axis shows the −log10(P-value) of each SNP. The color represents the descending degrees of LD with the lead SNP (from red, orange to blue). The right y-axis shows the recombination rate in centimorgans per megabase. The solid line indicates the population-specific recombination maps obtained from the 1000 Genomes Project. Different colors are used to distinguish different credible sets in the fine-mapping results. The second-to-bottom panel visualizes the results from SuSiEx. ‘Null’ indicates that single-population fine-mapping did not obtain any reliable credible set. The bottom panel shows gene annotations, if any. P-values are from linear regression with no multiple testing correction applied. a, Association with albumin on chr8:9,170,000-9,190,000, an example of a strong causal signal shared across populations. b, Association with platelet count on chr12:104,900,000-105,050,000, an example of a weak causal signal shared across populations. c, Association with albumin on chr12:13,100,000-13,400,000, an example of population-specific causal signals.
Extended Data Fig. 4 Association with total bilirubin on chr11: 5,100,000-5,700,000.
Panels are aligned vertically, with the x-axis showing the genomic position. The top six panels visualize GWAS association statistics and single-population fine-mapping results of the European (Pan-UKBB European), African (Pan-UKBB African) and East Asian (Taiwan Biobank) populations following the LocusZoom style. The second-to-bottom panel visualizes the fine-mapping results from SuSiEx, which integrated GWAS summary statistics from the three populations. The bottom panel shows gene annotations. For GWAS panels, the left y-axis shows the −log10(P-value) of each SNP. The gray horizontal dashed line represents the genome-wide significance threshold (P = 5 × 10⁻⁸). The purple rectangle for each locus represents the lead (most associated) variant. Variants are colored by descending LD with the lead variant (ordered red, orange, green, light blue, and dark blue dots). For fine-mapping panels, different colors are used to distinguish different credible sets. The diamond represents the variant with the maximum PIP in each credible set. The left y-axis shows the PIP from fine-mapping, and the right y-axis shows the recombination map obtained from the 1000 Genomes Project. For the SuSiEx panel, the average recombination rate across three populations is used. P-values are from linear regression with no multiple testing correction applied.
Extended Data Fig. 5 Association with albumin on chr13: 31,150,000-31,450,000.
Panels are aligned vertically, with the x-axis showing the genomic position. The top six panels visualize GWAS association statistics and single-population fine-mapping results of the European (Pan-UKBB European), African (Pan-UKBB African) and East Asian (Taiwan Biobank) populations following the LocusZoom style. The second-to-bottom panel visualizes the fine-mapping results from SuSiEx, which integrated GWAS summary statistics from the three populations. The bottom panel shows gene annotations. For GWAS panels, the left y-axis shows the −log10(P-value) of each SNP. The gray horizontal dashed line represents the genome-wide significance threshold (P = 5 × 10⁻⁸). The purple rectangle for each locus represents the lead (most associated) variant. Variants are colored by descending LD with the lead variant (ordered red, orange, green, light blue, and dark blue dots). For fine-mapping panels, different colors are used to distinguish different credible sets. The diamond represents the variant with the maximum PIP in each credible set. The left y-axis shows the PIP from fine-mapping, and the right y-axis shows the recombination map obtained from the 1000 Genomes Project. For the SuSiEx panel, the average recombination rate across three populations is used. P-values are from linear regression with no multiple testing correction applied.
Extended Data Fig. 6 Proportion of variants showing quality issues binned by the drop in PIP between single- and multi-population fine-mapping.
Quality issues were defined as (i) the best PIP variant is in a low-complexity region; (ii) the best PIP variant shows allelic imbalance or violates Hardy–Weinberg equilibrium in gnomAD; or (iii) the best PIP variant is multi-allelic or colocalizes with indels at the same genomic position, which might influence imputation quality.
Extended Data Fig. 7 Proportion of variants with high/moderate functional impact in cross-population biobank fine-mapping analyses.
The functional impact of each variant was annotated using VEP, with the definition and classification of functional impact obtained from https://useast.ensembl.org/info/genome/variation/prediction/predicted_data.html. The high impact category includes transcript ablation, splice acceptor variants and splice donor variants; the moderate impact category includes missense variants and protein-altering variants; the low impact category includes synonymous variants and splice region variants; the modifier impact category includes introns and intergenic variants among others.
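As a rough illustration of the classification in this legend, the sketch below maps the consequence types named above to their impact categories and computes the proportion of variants with high or moderate impact. The identifiers are the Sequence Ontology term names used by VEP; the lookup table covers only the terms listed in the legend (the full VEP table contains more), and the function name `high_or_moderate_proportion` is a hypothetical helper, not part of VEP or SuSiEx.

```python
from typing import Iterable

# Partial lookup: only the consequence types named in the legend.
IMPACT_BY_CONSEQUENCE = {
    "transcript_ablation": "HIGH",
    "splice_acceptor_variant": "HIGH",
    "splice_donor_variant": "HIGH",
    "missense_variant": "MODERATE",
    "protein_altering_variant": "MODERATE",
    "synonymous_variant": "LOW",
    "splice_region_variant": "LOW",
    "intron_variant": "MODIFIER",
    "intergenic_variant": "MODIFIER",
}


def high_or_moderate_proportion(consequences: Iterable[str]) -> float:
    """Return the fraction of variants whose impact is HIGH or MODERATE.

    Each element is one variant's consequence term. Terms missing from the
    partial table raise ValueError instead of being silently counted as
    low impact.
    """
    terms = list(consequences)
    if not terms:
        raise ValueError("no variants supplied")
    hits = 0
    for term in terms:
        if term not in IMPACT_BY_CONSEQUENCE:
            raise ValueError(f"consequence not in partial table: {term}")
        if IMPACT_BY_CONSEQUENCE[term] in ("HIGH", "MODERATE"):
            hits += 1
    return hits / len(terms)
```

For example, a set of four variants annotated as missense, intron, splice donor and synonymous would give a proportion of 0.5.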
Supplementary information
Supplementary Information
Supplementary Note and Figs. 1–44.
Supplementary Tables
Supplementary Tables 1–27.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yuan, K., Longchamps, R.J., Pardiñas, A.F. et al. Fine-mapping across diverse ancestries drives the discovery of putative causal variants underlying human complex traits and diseases. Nat Genet 56, 1841–1850 (2024). https://doi.org/10.1038/s41588-024-01870-z
DOI: https://doi.org/10.1038/s41588-024-01870-z