Extended Data Fig. 7: Abundance and prevalence of bacterial species biomarkers and the performance of diagnostic models in cohorts from different ethnicities and regions.
From: Noninvasive, microbiome-based diagnosis of inflammatory bowel disease

a,b, Signatures of the bacterial species biomarkers for UC (a) and CD (b) diagnosis in patients and healthy individuals of the discovery cohort, the validation cohort and three downloaded public datasets. Species abundance was normalized to the log2 fold change (log2FC) relative to the mean of control samples. P values were calculated using the two-sided Wilcoxon rank-sum test, adjusted for multiple testing using the Benjamini–Hochberg correction and then converted to -log10(P). Prevalence indicates the proportion of samples in which each species was present in the UC, CD and healthy groups of each cohort. c, Probability of disease calculated by the random forest model for UC or CD patients and controls in the Hong Kong discovery cohort (205 UC, 174 CD, 118 controls), validation cohorts from Hong Kong (139 UC, 139 controls; 92 CD, 108 controls) and Australia (98 CD, 81 controls), and public datasets from the United States (53 UC, 68 CD, 34 controls), the Netherlands (23 UC, 20 CD, 22 controls) and mainland China (25 UC, 15 controls; 48 CD, 54 controls). Data are shown as boxplots with the median (centre line), 25th and 75th percentiles (box limits), and 5th and 95th percentiles (whiskers). P values were calculated using the two-sided Wilcoxon rank-sum test. d, Correlation among the ten UC bacterial species biomarkers. UC-depleted bacteria are labelled in green and UC-enriched bacteria in yellow. e, Correlation among the nine CD bacterial species biomarkers. CD-depleted bacteria are labelled in green and CD-enriched bacteria in orange. In d and e, red cells indicate positive correlation and blue cells indicate negative correlation. Correlation coefficients and two-sided P values were obtained by Spearman correlation. *P < 0.05, **P < 0.01, ***P < 0.001. CD: Crohn’s disease; UC: Ulcerative colitis.
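
The summary statistics described for panels a and b (log2FC relative to the control mean, prevalence, and Benjamini–Hochberg-adjusted P values shown as -log10) can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the `pseudocount` parameter and the "abundance greater than zero" definition of presence are assumptions made for illustration, and the raw P values are taken as given rather than computed with a Wilcoxon rank-sum test.

```python
import math


def log2_fold_change(abundances, control_abundances, pseudocount=1e-6):
    """log2 of each abundance relative to the mean of the control samples.

    The pseudocount (an assumption, not stated in the legend) avoids log2(0).
    """
    control_mean = sum(control_abundances) / len(control_abundances)
    return [
        math.log2((a + pseudocount) / (control_mean + pseudocount))
        for a in abundances
    ]


def prevalence(abundances):
    """Proportion of samples in which the species is present.

    Presence is assumed here to mean abundance > 0.
    """
    return sum(1 for a in abundances if a > 0) / len(abundances)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that the adjusted
    # values stay monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def neg_log10(p_values):
    """Convert (adjusted) P values to -log10(P) for plotting."""
    return [-math.log10(p) for p in p_values]


if __name__ == "__main__":
    raw_p = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-species P values
    adj_p = benjamini_hochberg(raw_p)
    print(adj_p)
    print(neg_log10(adj_p))
```

Adjusting before the -log10 transform, as in the legend, means the plotted values reflect the multiple-testing-corrected significance of each species rather than the raw test result.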