Abstract
The association of gut microbial features with type 2 diabetes (T2D) has been inconsistent, due in part to the complexity of this disease and variation in study design. Even where individual microbial species have been associated with T2D, the underlying mechanisms have not been attributed to specific microbial strains. We conducted a comprehensive study of the T2D microbiome, analyzing 8,117 shotgun metagenomes from 10 cohorts of individuals with T2D, prediabetes, and normoglycemic status in the United States, Europe, Israel and China. Dysbiosis in 19 phylogenetically diverse species was associated with T2D (false discovery rate < 0.10), for example, enriched Clostridium bolteae and depleted Butyrivibrio crossotus. These microorganisms also contributed to community-level functional changes potentially underlying T2D pathogenesis, for example, perturbations in glucose metabolism. Our study identifies within-species phylogenetic diversity for strains of 27 species that explain inter-individual differences in T2D risk, such as Eubacterium rectale. In some cases, these were explained by strain-specific gene carriage, including loci involved in various mechanisms of horizontal gene transfer and novel biological processes underlying metabolic risk, for example, quorum sensing. In summary, our study provides robust cross-cohort microbial signatures in a strain-resolved manner and offers new mechanistic insights into T2D.
Data availability
The individual-level raw shotgun sequencing data and metadata have been deposited in the European Nucleotide Archive with accession codes PRJEB37249, PRJEB38742, PRJEB41311 and PRJEB46098 for the Fromentin_2022 dataset (MetaCardis); the Sequence Read Archive (SRA) under accession code ERP002469 for the Karlsson_2013 dataset; the NCBI SRA under accession numbers SRA045646 and SRA050230 for the Qin_2012 dataset (Shenzhen cohort); the China NGDC Genome Sequence Archive (HRA000020) or EGA (EGAS00001004480) for the Wu_2020 dataset; and the China Nucleotide Sequence Archive (CNSA) with the dataset identifier CNP0000175 for the Zhong_2019 dataset (Suzhou cohort). The shotgun metagenomic sequencing data from the Nurses’ Health Study II (NHSII) and Health Professionals Follow-up Study (HPFS) are publicly available at the BIOM-Mass Data Portal (https://biom-mass.org/; project names: HPFS and MBS). In keeping with the informed consent obtained from the participants, all of the individual-level phenotype data from NHSII and HPFS are available via a request for external collaboration and upon approval of a letter of intent and a research proposal. Details on how to request external collaboration with NHSII and HPFS can be found at https://nurseshealthstudy.org/researchers (contact principal investigator: A. H. Eliassen, email: [email protected]) and https://sites.sph.harvard.edu/hpfs/for-collaborators/ (contact principal investigator: L. Mucci, email: [email protected]). The individual-level metadata in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL) are archived at the National Institutes of Health repositories dbGaP (study accession: phs000810.v2.p2) and BioLINCC (accession number: HLB01141423a). Shotgun metagenomic sequencing data from the HCHS/SOL samples described in this study are deposited in QIITA (study ID: 11666).
HCHS/SOL has established a process for the scientific community to apply for access to participant data and materials, with such requests reviewed by the project’s Steering Committee. These policies are described at https://sites.cscc.unc.edu/hchs/ (contact HCHS/SOL at [email protected]). The DIRECT-PLUS Study recruited participants in Israel and was designed as a clinical trial. This study used only baseline, pre-randomization data from the DIRECT-PLUS Study for an observational analysis. In keeping with the informed consent obtained from the participants, the individual-level de-identified metadata and metagenomic sequencing data in the DIRECT-PLUS Study will be available for general research purposes through a request to I. Shai (email: [email protected]) and D. D. Wang (email: [email protected]) after publication. All of the source data for creating figures and extended data figures are available as supplementary information. Source data are provided with this paper.
Code availability
This study mainly relies on open-source bioinformatic tools described in detail in Methods. The analysis-specific programs are publicly available through https://github.com/DW-Group/T2D_Microbiome_Meta-analysis.
References
IDF Diabetes Atlas https://diabetesatlas.org/atlas/tenth-edition/ (2021).
American Diabetes Association Professional Practice Committee 2. Classification and diagnosis of diabetes: standards of medical care in diabetes – 2022. Diabetes Care 45(Suppl. 1), S17–S38 (2022).
Canfora, E. E., Meex, R. C. R., Venema, K. & Blaak, E. E. Gut microbial metabolites in obesity, NAFLD and T2DM. Nat. Rev. Endocrinol. 15, 261–273 (2019).
Forslund, K. et al. Disentangling type 2 diabetes and metformin treatment signatures in the human gut microbiota. Nature 528, 262–266 (2015).
Karlsson, F. H. et al. Gut metagenome in European women with normal, impaired and diabetic glucose control. Nature 498, 99–103 (2013).
Qin, J. et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55–60 (2012).
Reitmeier, S. et al. Arrhythmic gut microbiome signatures predict risk of type 2 diabetes. Cell Host Microbe 28, 258–272 (2020).
Sankaranarayanan, K. et al. Gut microbiome diversity among Cheyenne and Arapaho individuals from Western Oklahoma. Curr. Biol. 25, 3161–3169 (2015).
Thingholm, L. B. et al. Obese individuals with and without type 2 diabetes show different gut microbial functional capacity and composition. Cell Host Microbe 26, 252–264 (2019).
Wu, H. et al. The gut microbiota in prediabetes and diabetes: a population-based cross-sectional study. Cell Metab. 32, 379–390 (2020).
Zhong, H. et al. Distinct gut metagenomics and metaproteomics signatures in prediabetics and treatment-naïve type 2 diabetics. EBioMedicine 47, 373–383 (2019).
Sonnenburg, J. L. & Bäckhed, F. Diet–microbiota interactions as moderators of human metabolism. Nature 535, 56–64 (2016).
Pedersen, H. K. et al. Human gut microbes impact host serum metabolome and insulin sensitivity. Nature 535, 376–381 (2016).
Dobrindt, U., Chowdary, M. G., Krumbholz, G. & Hacker, J. Genome dynamics and its impact on evolution of Escherichia coli. Med. Microbiol. Immunol. 199, 145–154 (2010).
Van Rossum, T., Ferretti, P., Maistrenko, O. M. & Bork, P. Diversity within species: interpreting strains in microbiomes. Nat. Rev. Microbiol. 18, 491–506 (2020).
Fromentin, S. et al. Microbiome and metabolome features of the cardiometabolic disease spectrum. Nat. Med. 28, 303–314 (2022).
Yaskolka Meir, A. et al. Effect of green-Mediterranean diet on intrahepatic fat: the DIRECT PLUS randomised controlled trial. Gut 70, 2085–2095 (2021).
Pirzada, A. et al. Evolving science on cardiovascular disease among Hispanic/Latino adults. J. Am. Coll. Cardiol. 81, 1505–1520 (2023).
Mehta, R. S. et al. Stability of the human faecal microbiome in a cohort of adult men. Nat. Microbiol. 3, 347–355 (2018).
Bao, Y. et al. Origin, methods, and evolution of the Three Nurses' Health Studies. Am. J. Public Health 106, 1573–1581 (2016).
Beghini, F. et al. Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3. Elife 10, e65088 (2021).
Ma, S. et al. Population structure discovery in meta-analyzed microbial communities and inflammatory bowel disease using MMUPHin. Genome Biol. 23, 208 (2022).
Mallick, H. et al. Multivariable association discovery in population-scale meta-omics studies. PLoS Comput. Biol. 17, e1009442 (2021).
Thomas, A. M. et al. Metagenomic analysis of colorectal cancer datasets identifies cross-cohort microbial diagnostic signatures and a link with choline degradation. Nat. Med. 25, 667–678 (2019).
Ruuskanen, M. O. et al. Gut microbiome composition is predictive of incident type 2 diabetes in a population cohort of 5,572 Finnish adults. Diabetes Care 45, 811–818 (2022).
Atarashi, K. et al. Ectopic colonization of oral bacteria in the intestine drives T(H)1 cell induction and inflammation. Science 358, 359–365 (2017).
Clooney, A. G. et al. Ranking microbiome variance in inflammatory bowel disease: a large longitudinal intercontinental study. Gut 70, 499–510 (2021).
Cohen-Poradosu, R., McLoughlin, R. M., Lee, J. C. & Kasper, D. L. Bacteroides fragilis-stimulated interleukin-10 contains expanding disease. J. Infect. Dis. 204, 363–371 (2011).
Garcia-Lopez, M. et al. Analysis of 1,000 type-strain genomes improves taxonomic classification of Bacteroidetes. Front. Microbiol. 10, 2083 (2019).
Petersen, C. et al. T cell-mediated regulation of the microbiota protects against obesity. Science 365, eaat9351 (2019).
Fung, T. C. et al. Intestinal serotonin and fluoxetine exposure modulate bacterial colonization in the gut. Nat. Microbiol. 4, 2064–2073 (2019).
Riester, M. et al. Risk prediction for late-stage ovarian cancer by meta-analysis of 1,525 patient samples. J. Natl Cancer Inst. 106, dju048 (2014).
Wu, H. et al. Metformin alters the gut microbiome of individuals with treatment-naïve type 2 diabetes, contributing to the therapeutic effects of the drug. Nat. Med. 23, 850–858 (2017).
Forslund, S. K. et al. Combinatorial, additive and dose-dependent drug–microbiome associations. Nature 600, 500–505 (2021).
Caspi, R. et al. The MetaCyc database of metabolic pathways and enzymes: a 2019 update. Nucleic Acids Res. 48, D445–D453 (2020).
Anastasi, A., Knight, C. G. & Barrett, A. J. Characterization of the bacterial metalloendopeptidase pitrilysin by use of a continuous fluorescence assay. Biochem. J. 290, 601–607 (1993).
Roden, M. & Shulman, G. I. The integrative biology of type 2 diabetes. Nature 576, 51–60 (2019).
Vatanen, T. et al. Variation in microbiome LPS immunogenicity contributes to autoimmunity in humans. Cell 165, 842–853 (2016).
Wang, D. D. et al. The gut microbiome modulates the protective association between a Mediterranean diet and cardiometabolic disease risk. Nat. Med. 27, 333–343 (2021).
Wang, D. D. et al. The gut microbiome modifies the association between a Mediterranean diet and diabetes in USA Hispanic/Latino population. J. Clin. Endocrinol. Metab. 107, e924–e934 (2022).
Tett, A. et al. The Prevotella copri complex comprises four distinct clades underrepresented in westernized populations. Cell Host Microbe 26, 666–679 (2019).
Truong, D. T., Tett, A., Pasolli, E., Huttenhower, C. & Segata, N. Microbial strain-level population structure and genetic diversity from metagenomes. Genome Res. 27, 626–638 (2017).
Vangay, P. et al. US immigration westernizes the human gut microbiome. Cell 175, 962–972 (2018).
Wang, T. J. et al. Metabolite profiles and the risk of developing diabetes. Nat. Med. 17, 448–453 (2011).
Karcher, N. et al. Analysis of 1,321 Eubacterium rectale genomes from metagenomes uncovers complex phylogeographic population structure and subspecies functional adaptations. Genome Biol. 21, 138 (2020).
Beghini, F. et al. Large-scale comparative metagenomics of Blastocystis, a common member of the human gut microbiome. ISME J. 11, 2848–2863 (2017).
Hildebrand, F. et al. Dispersal strategies shape persistence and evolution of human gut bacteria. Cell Host Microbe 29, 1167–1176 (2021).
Kaper, J. B., Nataro, J. P. & Mobley, H. L. Pathogenic Escherichia coli. Nat. Rev. Microbiol. 2, 123–140 (2004).
Borodovich, T., Shkoporov, A. N., Ross, R. P. & Hill, C. Phage-mediated horizontal gene transfer and its implications for the human gut microbiome. Gastroenterol. Rep. 10, goac012 (2022).
Bobay, L. M., Traverse, C. C. & Ochman, H. Impermanence of bacterial clones. Proc. Natl Acad. Sci. USA 112, 8893–8900 (2015).
Navarro-Garcia, F. & Elias, W. P. Autotransporters and virulence of enteroaggregative E. coli. Gut Microbes 2, 13–24 (2011).
Cani, P. D. et al. Metabolic endotoxemia initiates obesity and insulin resistance. Diabetes 56, 1761–1772 (2007).
Vazquez-Lopez, J. & Navarro-Garcia, F. In silico analyses of core proteins and putative effector and immunity proteins for T6SS in enterohemorrhagic E. coli. Front. Cell. Infect. Microbiol. 10, 195 (2020).
Ahmed, S. A. et al. Genomic comparison of Escherichia coli O104:H4 isolates from 2009 and 2011 reveals plasmid, and prophage heterogeneity, including Shiga toxin encoding phage stx2. PLoS One 7, e48228 (2012).
Sun, H. et al. Regulation of flagellar motility and biosynthesis in enterohemorrhagic Escherichia coli O157:H7. Gut Microbes 14, 2110822 (2022).
Chaban, B., Hughes, H. V. & Beeby, M. The flagellum in bacterial pathogens: for motility and a whole lot more. Semin. Cell Dev. Biol. 46, 91–103 (2015).
Lux, R. & Shi, W. Chemotaxis-guided movements in bacteria. Crit. Rev. Oral Biol. Med. 15, 207–220 (2004).
Ng, W. L. & Bassler, B. L. Bacterial quorum-sensing network architectures. Annu. Rev. Genet. 43, 197–222 (2009).
Everard, A. et al. Cross-talk between Akkermansia muciniphila and intestinal epithelium controls diet-induced obesity. Proc. Natl Acad. Sci. USA 110, 9066–9071 (2013).
Hillmann, B. et al. Evaluating the information content of shallow shotgun metagenomics. mSystems 3, e00069-18 (2018).
IHMS Consortium. IHMS_SOP 03 V1: Standard Operating Procedure for Fecal Samples self-collection, laboratory analysis handled within 4 to 24 hours (4 hours < x ≤ 24 hours), (International Human Microbiome Standards, 2015).
Salonen, A. et al. Comparative analysis of fecal DNA extraction methods with phylogenetic microarray: effective recovery of bacterial and archaeal DNA using mechanical cell lysis. J. Microbiol. Methods 81, 127–134 (2010).
Courtois, S. et al. Recombinant environmental libraries provide access to microbial diversity for drug discovery from natural products. Appl. Environ. Microbiol. 69, 49–55 (2003).
Deschasaux, M. et al. Depicting the composition of gut microbiota in a population with varied ethnic origins but shared geography. Nat. Med. 24, 1526–1531 (2018).
Fang, C. et al. Assessment of the cPAS-based BGISEQ-500 platform for metagenomic sequencing. Gigascience 7, 1–8 (2018).
Matthews, D. R. et al. Homeostasis model assessment: insulin resistance and beta-cell function from fasting plasma glucose and insulin concentrations in man. Diabetologia 28, 412–419 (1985).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Suzek, B. E., Huang, H., McGarvey, P., Mazumder, R. & Wu, C. H. UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 23, 1282–1288 (2007).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
Caspi, R. et al. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res. 44, D471–D480 (2016).
Ye, Y. & Doak, T. G. A parsimony approach to biological pathway reconstruction/inference for genomes and metagenomes. PLoS Comput. Biol. 5, e1000465 (2009).
Mirdita, M. et al. Uniclust databases of clustered and deeply annotated protein sequences and alignments. Nucleic Acids Res. 45, D170–D176 (2017).
Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007).
Pasolli, E., Truong, D. T., Malik, F., Waldron, L. & Segata, N. Machine learning meta-analysis of large metagenomic datasets: tools and biological insights. PLoS Comput. Biol. 12, e1004977 (2016).
Carpenter, B. et al. Stan: a probabilistic programming language. J. Stat. Softw. 76, 1 (2017).
Paradis, E., Claude, J. & Strimmer, K. APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20, 289–290 (2004).
Vehtari, A., Gelman, A. & Gabry, J. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat. Comput. 27, 1413–1432 (2017).
Zhou, X., Kao, M. C. & Wong, W. H. Transitive functional annotation by shortest-path analysis of gene expression data. Proc. Natl Acad. Sci. USA 99, 12783–12788 (2002).
Acknowledgements
The authors thank K. Dennis for coordinating the collection and transfer of the data, and F. Bäckhed, K. Kristiansen, J. Li, H. Zhong and J. Qin for sharing their data and helping with the data transfer. The authors are indebted to the participants in the Health Professionals Follow-up Study (HPFS) and Nurses’ Health Study II (NHSII) for their continuing outstanding level of cooperation, and to the staff of the HPFS and NHSII for their valuable contributions. The authors also thank the staff and participants of the Hispanic Community Health Study/Study of Latinos (HCHS/SOL) for their important contributions; the DIRECT-PLUS Study participants for their valuable contributions; and A. Yaskolka-Meir, G. Tsaban, A. Kaplan, H. Zelica, I. Youngster, K. Tuohy and O. Koren for their contribution to the DIRECT-PLUS Study. This work is funded by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK; R00 DK119412 and Boston Nutrition Obesity Research Center Pilot and Feasibility Program grant supported by P30 DK046200 to D.D.W.; R24 DK110499 to C.H.), National Institute of Nursing Research (R01 NR01999 to D.D.W.), National Institute on Aging (R01 AG077489 and RF1 AG083764 to D.D.W.) and National Cancer Institute (NCI; R35 CA253185 to A.T.C.). A.T.C. is an American Cancer Society Research Professor. F.W. is supported by the American Heart Association Postdoctoral Fellowship (Grant number: 897161 to F.W.). The HPFS is supported by research grants U01 CA167552 (to W.C.W.) and R01 HL035464 (to E.B.R.) from the National Institutes of Health (NIH). The Men’s Lifestyle Validation Study in HPFS was supported by U01 CA152904 (to M.J.S. and E.B.R.) from NCI. The fecal sample collection and metagenomic data sequencing in HPFS were supported by the STARR Cancer Consortium Award (I7-A714 to C.H.). NHSII was supported by U01 CA176726 from NIH and P01 CA055075 (to W.C.W.) from NCI. 
The fecal sample collection and metagenomic data sequencing in NHSII were supported by the R01 CA202704 (to A.T.C. and C.H.) from NCI. The HCHS/SOL is a collaborative study supported by contracts from the National Heart, Lung and Blood Institute (NHLBI) to the University of North Carolina (HHSN268201300001I/N01-HC-65233), University of Miami (HHSN268201300004I/N01-HC-65234), Albert Einstein College of Medicine (HHSN268201300002I/N01-HC-65235), University of Illinois at Chicago (HHSN268201300003I/N01-HC-65236 Northwestern University) and San Diego State University (HHSN268201300005I/N01-HC-65237). The following institutes, centers and/or offices have contributed to the HCHS/SOL through a transfer of funds to the NHLBI: National Institute on Minority Health and Health Disparities (NIMHD), National Institute on Deafness and Other Communication Disorders, National Institute of Dental and Craniofacial Research, NIDDK, National Institute of Neurological Disorders and Stroke, and NIH Institution-Office of Dietary Supplements. Additional funding for the ‘Gut Origins of Latino Diabetes’ ancillary study to HCHS/SOL was provided by R01 MD011389 (to R.C.K., R.D.B. and R.K.) from the NIMHD and the Life Course Methodology Core at Albert Einstein College of Medicine and the New York Regional Center for Diabetes Translation Research (P30 DK111022-8786 and P30 DK111022) through funds from NIDDK. Additional funding for this work was provided by R01 HL060712 (to F.B.H. and Q.Q.) from NHLBI. The DIRECT-PLUS Study was funded by grants from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) Collaborative Research Center SFB1052 ‘Obesity Mechanisms’ (SFB-1052/B11 to I.S.); Israel Ministry of Health grant 87472511 (to I.S.); Israel Ministry of Science and Technology grant 3-13604 (to I.S.); California Walnuts Commission (to I.S.) and the CABALA_DIET&HEALTH Project, which received funding from the European Union’s Horizon 2020 Programme. 
The funding source had no role in the design and conduct of the study; collection, management, analysis and interpretation of the data; preparation, review, or approval of the manuscript; and the decision to submit the manuscript for publication. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The computations in this paper were run in part on the FASRC Cannon cluster supported by the FAS Division of Science Research Computing Group at Harvard University.
Author information
Contributions
Z.M., F.W., C.H. and D.D.W. conceptualized the study. Z.M. and F.W. performed the data analysis. Z.M., F.W., C.H. and D.D.W. drafted the paper. C.H. and D.D.W. supervised the study. E.B.R., M.D., W.C.W., R.K., F.B.H., Q.Q., A.T.C., R.D.B., M.J.S., E.R., I.S., R.C.K., C.H. and D.D.W. collected the samples and data, and obtained funding. Z.M., F.W., A.B., D.D., R.M., A.G., Y.Z., Y.L., E.R., S.M., E.B.R., M.D., W.C.W., R.K., F.B.H., Q.Q., A.T.C., R.D.B., M.J.S., I.S., R.C.K., C.H. and D.D.W. discussed the results, critically reviewed the text and approved the final paper.
Ethics declarations
Competing interests
C.H. is a member of the scientific advisory board for Zoe Nutrition, Empress Therapeutics, and Seres Therapeutics. All other authors have no competing interests.
Peer review
Peer review information
Nature Medicine thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: Sonia Muliyil, in collaboration with the Nature Medicine team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Workflow.
We adjusted for the study effect by adopting a conservative meta-analysis approach in the downstream analyses. Our analyses examined the overall microbial community structure, specific microbial taxonomic and functional features, strain-specific biochemical pathways, and within-species phylogeny and gene families in a cross-cohort meta-analysis framework. This figure was created with BioRender.com.
Extended Data Fig. 2 Principal coordinate analysis of all samples using species-level Bray–Curtis dissimilarity colored by cohorts before and after correcting batch and study effects.
R² values are calculated from permutational multivariate analysis of variance (PERMANOVA, n = 999 permutations) and indicate the variance attributable to study and batch effects.
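The quantity reported in this caption can be illustrated with a minimal sketch. It assumes a plain one-factor PERMANOVA on Bray–Curtis dissimilarities, written in standard-library Python; the function names and the toy profiles used with it are hypothetical and are not taken from the study's code, which may handle covariates and batch structure differently.

```python
import random


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance profiles."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den > 0 else 0.0


def distance_matrix(profiles):
    """Pairwise Bray-Curtis dissimilarities for a list of profiles."""
    n = len(profiles)
    return [[bray_curtis(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]


def permanova_r2(dist, labels):
    """R^2 = 1 - SS_within / SS_total, computed from squared distances."""
    n = len(labels)
    ss_total = sum(dist[i][j] ** 2
                   for i in range(n) for j in range(i + 1, n)) / n
    if ss_total == 0:
        raise ValueError("all samples are identical; R^2 is undefined")
    ss_within = 0.0
    for group in set(labels):
        members = [i for i, lab in enumerate(labels) if lab == group]
        ss_within += sum(dist[i][j] ** 2
                         for a, i in enumerate(members)
                         for j in members[a + 1:]) / len(members)
    return 1.0 - ss_within / ss_total


def permanova(dist, labels, n_perm=999, seed=0):
    """Observed R^2 and a permutation p-value obtained by shuffling labels.

    With a fixed sample size and number of groups, pseudo-F is a monotone
    function of R^2, so comparing R^2 values gives the same p-value.
    """
    rng = random.Random(seed)
    observed = permanova_r2(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so ties are not lost to floating-point noise.
        if permanova_r2(dist, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Calling `permanova(distance_matrix(profiles), cohort_labels)` on species-level profiles with one cohort label per sample returns the share of dissimilarity variance attributable to cohort membership, which is the sense in which R² is used above.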
Extended Data Fig. 3 Comparisons in associations between microbial species and type 2 diabetes across different statistical models.
Meta-analyzed associations of individual microbial species with type 2 diabetes (T2D) phenotype from the ordinal (a) and binary (b) models. The ordinal model modeled the disease status as an ordinal variable (T2D, prediabetes, or controls) and used data from all the participants. The binary model modeled the disease status as a binary variable (T2D or controls) and used data from T2D patients and normoglycemic controls. The blue-to-red and purple-to-orange gradients represent the magnitude and direction of the associations as quantified by meta-analyzed beta coefficients from linear mixed models adjusted for age, sex, and body mass index (BMI) and further adjusted for metformin use in MaAsLin2. All the results were corrected for multiple hypothesis testing by controlling the false discovery rate (FDR) using the Benjamini–Hochberg method with a target rate of 0.10. All models included each participant’s identifier as random effects and simultaneously adjusted for covariables. (c) Comparisons in associations between microbial species and T2D between multivariate MaAsLin2 models with and without further adjustment for BMI and metformin use from the ordinal model. (d) Comparisons in associations between microbial species and T2D between multivariate MaAsLin2 models with and without further adjustment for BMI and metformin use from the binary model. Dots in the scatter plots in (c) and (d) represent meta-analyzed beta coefficients from linear mixed models adjusted for covariables in MaAsLin2. All the statistical tests were two-sided. A total of 8,117 metagenomes from 1,851 T2D patients, 2,770 individuals with prediabetes, and 2,277 normoglycemic controls were included in the analyses in (a), (b), (c), and (d). Abbreviations: BMI, body mass index; Con, control; metf, metformin use; insul, insulin use; T2D, type 2 diabetes.
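The Benjamini–Hochberg correction mentioned in this caption can be sketched as follows. This is a generic standard-library implementation of the step-up procedure with the 0.10 target rate stated above; the function name is hypothetical, and in the study the adjustment is applied within MaAsLin2 rather than by code like this.

```python
def benjamini_hochberg(p_values, target_fdr=0.10):
    """Return (q_values, rejected) using the Benjamini-Hochberg procedure.

    q_values are the BH-adjusted p-values in the input order; rejected[i]
    is True when q_values[i] is at or below the target FDR.
    """
    m = len(p_values)
    if m == 0:
        return [], []
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    rejected = [q <= target_fdr for q in q_values]
    return q_values, rejected
```

A feature is reported as significant when its adjusted value is at or below the target rate, so the cut-off depends on how many features were tested together.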
Extended Data Fig. 4 Metformin has a direct impact on the gut microbiome composition and confounds the associations between microbial species and type 2 diabetes.
(a) Distance-based redundancy analysis (dbRDA) based on species-level Bray–Curtis dissimilarity colored by type 2 diabetes (T2D) and metformin use. The centers of the boxplot show medians with boxes indicating their inter-quartile ranges (IQRs) and upper and lower whiskers indicating 1.5 times the IQR from above the upper quartile and below the lower quartile, respectively. (b) Meta-analyzed and cohort-specific associations of microbial species with metformin use among T2D patients. We defined microbial signatures of metformin as those significantly associated with metformin use in T2D cases only but not associated with T2D after further adjusting for metformin use in all participants. We also identified 4 species associated with both metformin use and T2D. The centers of the error bars represent the β coefficients of the associations, and the error bars represent their standard errors (SEs). (c) Our modeling approach effectively accounted for the potential confounding effect of metformin use, as evidenced by the high correlation between the beta coefficients of species–T2D associations obtained in the primary analysis and those calculated in a sensitivity analysis excluding T2D patients treated with metformin. The beta coefficients in (b) and (c) represent the associations quantified by linear mixed models, adjusting for age, sex, body mass index (BMI), and metformin use where appropriate, in MaAsLin2. All the results were corrected for multiple hypothesis testing by controlling the false discovery rate (FDR) using the Benjamini–Hochberg method with a target rate of 0.10. All the analyses in (a), (b), and (c) were based on 5,114 metagenomes from 1,851 T2D patients and 2,277 normoglycemic controls. The statistical tests in (a) and (b) were two-sided. Abbreviations: Con, control; metf, metformin use; T2D, type 2 diabetes.
Extended Data Fig. 5 Sensitivity analyses demonstrate that identified microbial features of type 2 diabetes are unlikely to reflect the duration or comorbidities of this disease.
(a) Comparisons in associations between microbial species and T2D in one analysis that includes all study participants and the other that excludes individuals with prevalent T2D in the Hispanic Community Health Study/Study of Latinos. (b) Comparisons in associations between microbial species and T2D in one analysis that includes all study participants and the other analysis that excludes insulin-treated T2D patients. The dots represent the associations quantified by linear mixed models, adjusting for age, sex, body mass index, and metformin use in MaAsLin2. Abbreviation: T2D, type 2 diabetes.
Extended Data Fig. 6 Associations of microbial features with circulating metabolic and inflammation biomarkers.
(a) Meta-analyzed associations of individual MetaCyc pathways with circulating biomarkers of metabolic risk. (b) Meta-analyzed associations of individual microbial enzymes with circulating biomarkers of metabolic risk. Only pathways and enzymes listed in Fig. 3 were analyzed and presented in this figure. The blue-to-red gradients represent the magnitude and direction of the associations as quantified by meta-analyzed beta coefficients from linear mixed models adjusted for age, sex, body mass index, and metformin use in MaAsLin2. All the results were corrected for multiple hypothesis testing by controlling the false discovery rate (FDR) using the Benjamini–Hochberg method with a target rate of 0.10. Abbreviations: BMI, body mass index; HbA1c, hemoglobin A1c; HDL-C, high-density lipoprotein cholesterol; hs-CRP, high-sensitivity C-reactive protein; HOMA-B, homeostasis model assessment of β-cell function; HOMA-IR, homeostasis model assessment of insulin resistance; LDL-C, low-density lipoprotein cholesterol; TG, triglyceride.
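As a rough illustration of what a meta-analyzed beta coefficient is, the sketch below pools per-cohort estimates by inverse-variance weighting. The pooling scheme is an assumption made for illustration only: the captions do not state which meta-analysis model was used, and fixed-effect weighting is shown here simply because it is the shortest to write down. The function name and inputs are hypothetical.

```python
import math


def pool_fixed_effect(betas, standard_errors):
    """Inverse-variance (fixed-effect) pooling of per-cohort estimates.

    Returns the pooled beta, its standard error and the z statistic.
    Illustrative assumption: the study's own pooling model may differ.
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need one standard error per beta, and at least one cohort")
    if any(se <= 0 for se in standard_errors):
        raise ValueError("standard errors must be positive")
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se
```

Cohorts with smaller standard errors contribute more to the pooled estimate, which is why large cohorts tend to dominate the color scale in heat maps of this kind.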
Extended Data Fig. 7 Prevotella copri’s differential carriage of branched-chain amino acid biosynthesis function is explained by its discrete subclade structure.
(a) Distribution of different P. copri subclades across geographic regions and studies. We applied MetaPhlAn taxonomic profiling based on P. copri subclade-specific marker genes to detect the presence of a subclade in metagenomes. (b) Comparisons in adjusted relative abundance of branched-chain amino acid (BCAA) biosynthesis pathways and enzymes encoded by P. copri subclades dominated by clade A versus other clades. The adjusted relative abundance of pathways and enzymes is estimated by anpan (ANalysis of microbial Phylogenies And geNes)’s pathway random effects models (Methods) with simultaneous adjustment for the abundance of P. copri subclades. The centers of the boxplot show medians of adjusted relative abundance with boxes indicating their inter-quartile ranges (IQRs) and upper and lower whiskers indicating 1.5 times the IQR from above the upper quartile and below the lower quartile, respectively. P-values were generated from two-sided t-tests based on the adjusted relative abundance. (c) Clade A-dominant P. copri strains in type 2 diabetes (T2D) patients were more likely to retain pathways and enzymes of branched-chain amino acid biosynthesis compared to clade A-dominant non-T2D controls. The blue and red lines, fitted by linear regression in participants with T2D and control participants separately, represent the associations between the log-transformed relative abundance of P. copri subclade and the log-transformed relative abundance of a given pathway or enzyme encoded by P. copri. The numeric values in the top left corner are posterior differences and 98% posterior intervals of differences in log-transformed pathway abundance between case–control status, as determined by mixed-effects models in anpan (Methods). This model allows us to identify microbial functions encoded by a P. copri subclade that are differentially abundant between T2D cases versus controls while controlling for its subclade-level abundance.
All the analyses in (a), (b), and (c) were based on 5,114 metagenomes from 1,851 T2D patients and 2,277 normoglycemic controls.
Extended Data Fig. 8 Phylogenetic trees of select species show divergent associations between subclades and type 2 diabetes within each species.
The annotation bars represent metformin use (metf), study, body mass index (BMI), sex, age, and type 2 diabetes (T2D) status, respectively. The boxplots in the bottom represent the posterior mean of the phylogenetic effect of each phylogenetic tree leaf (metagenome) estimated by the phylogenetic generalized linear mixed models (PGLMMs) in anpan (ANalysis of microbial Phylogenies And geNes, see Methods) with whiskers representing the 95% credible intervals of the posterior means. By applying PGLMMs, we compared two generalized linear mixed models with and without incorporating within-species phylogeny as a random effect (Methods). Both models were adjusted for age, sex, body mass index, metformin use, and study membership as fixed effects. We generated within-species phylogenetic trees by randomly splitting the edges based on the Euclidean similarity matrix derived from clustered sets of protein sequences (UniRef90 gene families) after dimension reduction by principal components analysis.
Extended Data Fig. 9 Gene set enrichment analysis of gene ontology terms for biological process.
The line plots show the running enrichment score for the gene ontology (GO) term as the analysis ‘walks down’ the ranked list. The vertical black lines on the X-axis show where members of the GO term appear in the ranked list of UniRef90 gene families.
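The running enrichment score described in this caption can be sketched as follows. The weighting follows the commonly used gene set enrichment analysis scheme (increments proportional to the absolute ranking score at each hit, equal decrements at each miss); that choice, the function name and the identifiers in the example call are assumptions for illustration and are not taken from the study's code.

```python
def running_enrichment_score(ranked_ids, scores, gene_set, weight=1.0):
    """Walk down a ranked list and return (running_scores, enrichment_score).

    ranked_ids: identifiers sorted by their ranking statistic.
    scores: ranking statistic for each identifier, in the same order.
    gene_set: identifiers belonging to the GO term of interest.
    """
    hits = [identifier in gene_set for identifier in ranked_ids]
    n_total = len(ranked_ids)
    n_hits = sum(hits)
    if n_hits == 0 or n_hits == n_total:
        raise ValueError("gene set must contain some, but not all, ranked ids")
    hit_total = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    if hit_total == 0:
        raise ValueError("members of the gene set all have zero score")
    miss_step = 1.0 / (n_total - n_hits)
    running = []
    current = 0.0
    for s, h in zip(scores, hits):
        # Step up at a member of the set, step down otherwise.
        current += abs(s) ** weight / hit_total if h else -miss_step
        running.append(current)
    # The enrichment score is the largest deviation from zero along the walk.
    enrichment = max(running, key=abs)
    return running, enrichment
```

For example, `running_enrichment_score(["UniRef90_A", "UniRef90_B", "UniRef90_C"], [2.0, 1.0, -1.0], {"UniRef90_A"})` uses hypothetical identifiers and returns the curve that such a line plot would draw, with each step up marking one of the vertical black lines on the X-axis.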
Supplementary information
Supplementary Information
Supplementary Text and Supplementary Figs. 1 and 2
Supplementary Tables 1–13
Supplementary Data
Source Data for Supplementary Figs. 1 and 2
Source data
Source Data Figs. 1–5
Statistical Source Data
Source Data Extended Data Figs. 2–9
Statistical Source Data
About this article
Cite this article
Mei, Z., Wang, F., Bhosle, A. et al. Strain-specific gut microbial signatures in type 2 diabetes identified in a cross-cohort analysis of 8,117 metagenomes. Nat Med 30, 2265–2276 (2024). https://doi.org/10.1038/s41591-024-03067-7
DOI: https://doi.org/10.1038/s41591-024-03067-7