Fig. 2: Characterization of novel species, strains and gene families in SPMP genomes.

a Rarefaction analysis showing that the SPMP database covers a substantial fraction of the species-level diversity in its MAGs. Error bands represent 95% confidence intervals. b Pie chart showing the breakdown of species-level clusters in SPMP into those that have an isolate genome, those that only have MAGs (uncultivated), and those that are novel compared to genomes in public databases (UHGG, GTDB, SGB). c Stacked bar charts showing the number of SPMP strains that have an isolate genome, only have MAGs (uncultivated), or are novel compared to all UHGG genomes (>200,000 genomes, <99% ANI). The 20 species shown are those with the highest median relative abundance in SPMP (most abundant on the left). d Stacked bar charts showing the number of BGCs (top) and GCFs (bottom) in different product classes that are present or absent in existing annotations, comprising the antiSMASH and MIBiG databases as well as antiSMASH annotations from HRGM. Inset pie charts show the overall breakdown. e Synteny plots showing the conservation of gene order and orientation (colored arrows, relatedness shown by vertical lines) for a novel GCF (GCF 382) and related families. f Network diagrams depicting correlations between gut microbial species (nodes: species; edges: significant correlations) and overall microbiome structure in SPMP metagenomes when stratified by the presence or absence of GCF 382/271/37 (or the absence of the corresponding transporter gene) in a Blautia species (enlarged teal node, solid edges to correlated species, dashed edges between other nodes). Source data are provided as a source data file.