Main

Permafrost peatlands are critical carbon (C) reservoirs1, and as climate warming accelerates thaw, substantial carbon releases are expected. The magnitude of climate feedback depends on both the amount of C released through microbial decomposition and the composition of greenhouse gases (GHG) released (for example, CO2 versus CH4), influenced by microbial decomposition of soil organic matter (SOM)2,3. However, fine-scale interactions affecting ecosystem emissions remain poorly understood. Advances in metabolomics now enable precise characterization of environmental metabolites4, providing insights into metabolic mechanisms from microbial community–environment interactions5 that structure ecosystem response to environmental changes6. Metabolites, produced by microbial reactions in response to physiological or environmental changes7, provide direct insight into in situ microbial function8. However, the full suite of environmental metabolites9,10—the metabolome—results from biotic sources beyond microorganisms, including plant exudates and detritus11 as well as abiotic by-products12. This complexity can lead to distinct controls shaping metabolome and microbiome assembly, potentially resulting in divergent and asynchronous responses to environmental changes13,14. To understand these drivers, we applied community assembly theory jointly to microbiomes and metabolomes, treating metabolite ‘communities’ as ecologically analogous to microbial ones13. This approach helps disentangle metabolome and microbiome functions, enhancing our grasp of ecosystem biogeochemistry and improving climate change predictions13,15. We leveraged shotgun metagenomics and high-resolution metabolomics techniques respectively, guided by community assembly principles, to decipher the composition and structure of microbial communities and metabolomes across a thaw gradient. This is particularly relevant for transitioning ecosystems, such as thawing permafrost, where microbial communities and metabolomes reassemble as conditions change. Community assembly involves deterministic and stochastic factors16,17 (Table 1), which can be disentangled using phylogenetic null models18,19. Furthermore, we performed random matrix theory (RMT) network analysis to assess the interactions within these reassembling communities and metabolomes.

Table 1 Terms used in the Article and their definitions

Our focus is on the Stordalen Mire, a model permafrost ecosystem in northern Sweden spanning three thaw-stage habitats: dry permafrost-underlain palsas (with a seasonally thawed active layer), partially thawed and inundated ombrotrophic bogs, and fully thawed and inundated minerotrophic fens. This thaw progression substantially alters microbial communities3,20, vegetation21,22 and SOM composition23,24,25,26, resulting in increased GHG emissions27.

We hypothesized that the primary drivers shaping microbial community and metabolome assembly shifted in response to the unique environmental signatures of each habitat. In palsa environments, we expected selection pressures to dominate assembly, leading to homogenous microbial communities adapted to seasonal thawing events28,29. The high chemical lability of litter and presence of oxygen probably promoted rapid microbial degradation compared with the anoxic conditions of bogs and fens26. Steep environmental gradients suggested the dominance of deterministic factors, particularly variable selection, in shaping the palsa metabolome. Limited dispersal, encouraged by dry ombrotrophic conditions, probably contributed to microbial and metabolome homogeneity. Bog ecosystems presented a contrasting scenario. Although exposed to a strong selective filter from Sphagnum metabolites and acidity, the bog microbiota probably experienced some local dispersal owing to fluctuating water table levels, potentially mitigating homogenous selection. Sphagnum moss strongly influenced the bog metabolome through litter inputs and bioactive metabolites. We predicted homogenization of the metabolome due to accumulation of recalcitrant metabolites, altered enzymatic activity and anoxic conditions24. For fen environments, we hypothesized that minerotrophic conditions and full inundation promoted microbial dispersal, outweighing selective pressure28,30. Runoff from surrounding areas and water mixing might have facilitated metabolite dispersal30. We predicted a more variable metabolome shaped by the diverse and active microbial community present supported by the presence of microbially derived compounds with low aromaticity23,24,25.

We examined the congruence between microbial communities and metabolomes at community and feature scale (microbial lineages and metabolite formulas). We hypothesized that synchrony in ecological assembly processes at the community level signified a stable, coordinated ecosystem response while feature-scale alignment suggested specialized, finely tuned microbial activities and metabolite patterns.

Results

Divergent assembly of metabolites and microorganisms in permafrost

We used paired genome-resolved metagenomics and high-resolution Fourier transform ion cyclotron resonance (FTICR) mass spectrometry to characterize peatland microbial communities and metabolomes. Samples were collected from three habitats (palsa, bog and fen) at different depths (shallow, middle and deep) at the Stordalen Mire between June and August 2012 (Supplementary Data 1). Although we used a metagenome-assembled genome (MAG) database compiled over a longer time frame (2011–2017), containing 13,290 MAGs31,32, we analysed 1,402 MAGs from 2012 samples with paired metabolome data, detecting 14,432 metabolites (6,763 with assigned molecular formulas). By investigating ecological processes shaping microbiome and metabolome assembly at various scales, our analysis revealed a decoupling between drivers of microbiome and metabolome assembly. Microbial assembly shifted along the permafrost thaw gradient from environmental filtering and biotic interactions towards increased stochasticity resulting in low compositional turnover within habitats (Fig. 1a,b). Microbial assembly was driven by homogeneous selection in the palsa and homogeneous dispersal in the bog and fen (Fig. 1e). Variable selection contributed somewhat to the fen. In the palsa, limited spatial turnover probably resulted from consistent environmental filters such as dry, ombrotrophic conditions and strong seasonal freezing–thawing cycles. Increased water availability promotes dispersal in the bog and fen. Various ecological factors may shape bog communities, including Sphagnum presence, low pH, nutrient levels33 and water table fluctuations, creating distinct niches34. The diverse fen SOM composition22 provided various avenues for microbial communities to thrive3,24,25. In contrast to microbial assembly, metabolome compositions were primarily driven by variable selection and homogenizing dispersal (Fig. 1b–e). The bog had the highest variable selection, followed by the palsa and the fen. Homogenizing dispersal had the most impact on the fen, then the palsa and, lastly, the bog. Despite our hypotheses, Sphagnum-derived metabolites did not create a uniform bog metabolome (variable selection was dominant), suggesting that microorganisms can degrade complex polyphenolic compounds35. Abiotic reactions also facilitate Sphagnum leachate breakdown36. Divergent palsa and fen metabolomes may arise from factors systematically affecting production or transformation, akin to ecological selection13,37. Differing metabolite patterns probably stem from niche specialization, nutrient dynamics and habitat physicochemical properties23,38. As distinct processes govern microbial versus metabolite assembly, we examined their coordination using Mantel tests within each habitat, finding no significant correlations. However, Spearman correlations revealed strong microbial and metabolite beta nearest taxon index (βNTI) associations in the palsa at monthly and depth scales (Supplementary Data 2) indicating the need for finer-scale analysis to elucidate microorganism–metabolite interactions.

Fig. 1: Ecological assembly processes driving active-layer permafrost microbial communities and metabolites.
figure 1

a,b, Comparison of the average within-habitat βNTI, an index used to compare the ecological processes driving microbial (a) and metabolite (b) assemblages. Violin plots show the average within-habitat βNTI of each sample (microbial: palsa, n = 21; bog, n = 22; fen, n = 24; metabolome: palsa, n = 29; bog, n = 27; fen, n = 29) calculated from MAG-derived OTUs and TWCD metabolites. Boxes represent the upper and lower quartiles, the line in each box represents the median and the whiskers represent the maximum and minimum values, no further than 1.5 times the interquartile range; values beyond the whiskers represent outliers and are plotted individually. Significant differences between habitats were determined using a two-sided Mann–Whitney U test. Assembly processes can be delimited by the red dashed lines; variable selection (βNTI > 2), homogenous selection (βNTI < −2), stochastic assembly (|βNTI| < 2). P values were adjusted using the Bonferroni method and are indicated as ‘P =’. c,d, Density plots showing changes in the βNTI of each habitat with depth (c) and month (d). Assembly processes can be delimited by the red dashed lines: variable selection (βNTI > 2), homogenous selection (βNTI < −2), stochastic assembly (|βNTI| < 2). e, Bar plots showing the putative influence of different ecological processes within each habitat according to (from top to bottom) the microbial community and bulk metabolites. f, Correlation of microbial and metabolite βNTI values with environmental data using a one-sided Mantel test using Pearson’s correlation method. The heatmap shows Mantel correlations between peat samples’ βNTI, calculated within bulk metabolites and the pairwise differences of each environmental variable between samples. Mantel r values range from 1 to −1 showing positive and negative correlations. P values were adjusted using the Bonferroni method. Metabolite βNTI correlation with depth in palsa, P = 0.0115, and with depth, P = 0.0005; C:N ratio, P = 0.032; and precipitation, P = 0.0005, in the fen.

Source data

Next, we examined the environmental drivers underlying the observed divergent assembly patterns between microbiomes and metabolomes. Depth, C:N ratios and precipitation significantly contributed to shaping palsa and fen metabolomes (Fig. 1f). This highlights the influence of oxygen availability, microbial activity and peat stratification on metabolome assembly24,26,39. Notably, bog metabolite assemblages showed no significant correlations with environmental factors, suggesting the involvement of unexplored variables. Even though this habitat experienced a higher influence of variable selection (Fig. 1e), the strongest connection with environmental variables was observed in the other habitats suggesting that the bog is potentially organized at the feature scale and driven by the connections and actions of individual features. This highlights the importance of feature-scale microorganism–metabolite connections in the bog and the need for a multidimensional approach to understand the ecological processes shaping permafrost ecosystems.

Microbial and metabolite dynamics at the feature scale

Building on a previous work15, we used a feature-specific beta diversity null modelling approach (βNTIfeature) to identify individual microbial taxa and metabolites influencing ecological variation or similarity within communities15. This method pinpoints members with unique responses to ecological pressures15,40. Although most individual microorganisms minimally affected divergence or convergence (ecological variation or similarity, respectively) (Supplementary Data 3 and 4), a greater proportion of metabolites contributed to metabolome variation within each habitat compared with the microbiome (Supplementary Data 3 and 4, and Supplementary Note 1). This aligns with findings15 suggesting that the transient nature of metabolomes drives this difference.

Furthermore, our analysis of correlations between microbial genome abundances and metabolite-βNTIfeature-derived clusters unveiled groups influencing ecological chemical variation, similarity and stochasticity (Fig. 2c). More microbial–metabolite correlations were observed in the bog, supporting our findings that this habitat is organized at the feature-scale level (previous section). This suggests that strong microorganism–metabolite interactions are more important in the bog for organic matter diversification. In the bog, cluster 1 metabolites contributed to metabolome divergence with a higher relative abundance of S-containing compounds, while cluster 2 exhibited a prevalence of N-containing compounds contributing to divergence and stochasticity (Fig. 2c). Meanwhile, in the palsa and fen, we identified three metabolite clusters with different contributions to metabolome assembly and significantly different metabolite properties (Supplementary Note 2). Across habitats, clusters contributing towards metabolome convergence or having insignificant contributions had higher abundance of CHO compounds, primarily carbohydrates, aligning with previous observations15 and suggesting common processes driving chemical similarity across permafrost gradients (Fig. 2d).

Fig. 2: Metabolite-βNTIfeature-derived clusters correlate with specific microbial taxa and environmental factors.
figure 2

a, Spearman correlations between metabolite-βNTIfeature-derived clusters and bacterial genomes normalized abundances and environmental factors (T. soil is depth, precipitation, soil temperature). Correlation P values were estimated with rcorr() from the Hmisc R package (two-sided test). Only metabolite clusters showing significant correlations (P adjusted < 0.05) are shown. The false discovery rate method was used for multiple correction testing. The colour and thickness of the edges represent Spearman rho values; positive correlations are shown in red, negative correlations in blue. b, Differences of metabolite properties, including AI_mod, DBE_O and NOSC, between each cluster within each habitat (number of metabolite features in each habitat’s clusters: palsa: cluster 1 = 764, cluster 2 = 142, cluster 3 = 228; bog: cluster 1 = 29, cluster 2 = 284, cluster 3 = 696; fen cluster 1 = 95, cluster 2 = 539, cluster 3 = 259). Significant differences between metabolite clusters were determined using a two-sided Wilcoxon test. Boxes represent the upper and lower quartiles, the line in each box represents the median value and the whiskers represent the maximum and minimum values, no further than 1.5 times the interquartile range; values beyond the whiskers represent outliers and are plotted individually. P values were adjusted using the Bonferroni method. c, Percentage of metabolites that contribute to metabolome assembly as follows: |βNTIfeature| < 1, insignificant; βNTIfeature ≥ 2, significant high divergence (Sig. high); βNTIfeature ≥ 1, high divergence (High); βNTIfeature ≤ −2, significant high convergence (Sig. low); and βNTIfeature ≤ −1, high convergence (Low). d, Percentage of metabolite elemental compositions within each cluster.

Source data

Strong positive and negative correlations were evident between genome abundances and metabolite clusters in each habitat (Fig. 2a and Supplementary Data 5), probably arising from the peat spatial complexity, and dynamic C distribution creating microbial niches41 leading to specialization in the utilization of distinct metabolite pools. This was especially evident in bog cluster 1, in which S-containing metabolite clusters (Fig. 2d), with CHOS elemental composition, were linked to metabolome divergence. Cluster 1 metabolites exhibited characteristics of recalcitrance, potentially Sphagnum-moss-derived compounds. They had a high proportion of aromatic rings (modified aromaticity index (AI_mod) > 0.5 and AI_mod > 0.67); were highly unsaturated (double-bond equivalent minus oxygen (DBE_O) > 0); had a low nominal oxidation state of carbon (NOSC < 0), suggesting low bioavailability42; and were enriched in S-containing condensed hydrocarbon and lignin-like compounds (Fig. 2d and Extended Data Fig. 1c). Previous studies have proposed that Sphagnum litter is the primary source of organic S in peat, serving as the main reservoir for recycling43,44. These S-containing metabolites probably play a key role in SOM cycling within bogs. Comparing metabolites across plant species and peat soil from the three habitats showed that (1) Sphagnum had a lower NOSC than other plants, (2) S-containing compounds were highly consumed in bogs and (3) newly observed lignin-like compounds (compounds not observed in the plant extracts or the peat) were abundant in the bog and fen26. These findings suggest that bog microbial communities co-evolved to efficiently use these metabolites, consistent with previous substrate addition experiments45. Moreover, given low S and N levels in bogs, these metabolites may control SOM decomposition rates in the Stordalen Mire.

Organic S compounds, such as those in cluster 1, may serve as reservoirs for microbial sulfate mobilization46 via assimilatory or dissimilatory sulfate reduction. Supporting this mechanism in the bog, 55 MAGs across 9 phyla, mostly Acidobacteria, Verrucomicrobia and Proteobacteria, strongly correlated with cluster 1 metabolites and encoded and expressed genes for S metabolism pathways (Extended Data Fig. 2 and Supplementary Note 2).

Sulfur-containing compounds can also result from sulfide (H2S) incorporation into organic matter by sulfate-reducing microorganisms (SRM)47 during dissimilatory sulfate reduction. SOM degradation in peatlands depends on available terminal electron acceptors, forming a redox gradient48. As sulfate reduction is more thermodynamically favourable than fermentation and methanogenesis, SRM can outcompete methanogens for substrates, mitigating methane flux3,23. SRM have versatile substrates from H2, fatty acids, alcohols to aromatics49. Syntrophic SRM–methanogen interactions occur via H2, formate, acetate50, methyl sulfides51 or direct electron transfer52, further influencing GHG production.

Therefore, bacteria involved in organosulfur transformations and methanogen–methanotroph interactions may be key functional species in the bog. Our analysis identifies these potentially crucial bacteria and suggests their role in driving metabolome divergence. Correlations between bacterial genomes and S-containing metabolite clusters hint at their possible use as terminal electron acceptors, aligning with previous proposals23. Going beyond simply quantifying inorganic S species, our study identifies specific organosulfur metabolites and associated taxa and elucidates complex bog biogeochemical cycling.

Metabolite clusters further correlated with environmental factors (Fig. 2a and Supplementary Note 2). This analysis links individual microbial and metabolic features to assembly processes and environmental parameters, offering insights into the interplay and scale dependencies governing permafrost microbiome and metabolome dynamics53.

Microbial and metabolite assembly linked to greenhouse gas

To explore the connection between ecological assembly and biogeochemical function at the feature scale, we investigated whether metabolically important genomes (those significantly associated with metabolite clusters) were also associated with varying CO2 levels in the palsa peat or with dissolved CO2 and CH4 concentrations in the bog and fen porewater. Even though the measured GHG concentrations do not directly reflect in situ production rates or above ground fluxes, their trends can help us understand how microscale processes may influence systemic outputs54. Significant correlations between genome abundances and GHG levels were observed only in the bog with 45 MAGs correlating with CO2, and 29 MAGs correlating with CH4 (Supplementary Data 5). Here we focus the discussion only on five of those genomes that were also found to significantly contribute to community assembly (Fig. 3a,b). The lack of significant correlations in other habitats probably suggests that the relationship between ecological processes and biogeochemical functions may occur at different scales along the thaw gradient. In the bog, where there are strong ecological filters such as low pH and the presence of recalcitrant compounds24,33, specific features (that is, microorganisms) seem to contribute to GHG production (Fig. 3a,b). Meanwhile, in the palsa and fen, this function may depend on the action of many members of the microbial community, making their contributions less important at the feature level.

Fig. 3: Correlation of genomes that significantly correlated with BNTIfeature metabolite clusters with CO2 and CH4 in the bog porewater.
figure 3

a,b, Spearman correlation between genome abundances and CO2 (a) and CH4 (b) in the bog porewater (n = 19 bog samples). Correlation P values were estimated with rcorr() from the Hmisc R package (two-sided test). P values were adjusted using the FDR method. Contribution to microbial community assembly: βNTIfeature ≥ 2, significant high divergence (Sig. high), and βNTIfeature ≤ −2, significant high convergence (Sig. low). c, Functional potential of these taxa. The heatmap represents the functional potential of these five MAGs. The heatmap is divided into functional categories (right) with different functions within each category; the presence of a function is observed as a coloured rectangle.

Source data

Of the five bacterial genomes that correlated with GHG levels in the bog porewater, two genomes, Terracidiphilus (Acidobacteria) and Fen-455 (Actinobacteria), had positive correlations with dissolved CO2 and CH4 concentration. Holophaga (Acidobacteria) correlated solely with CO2, while RAAP-2 (Actinobacteria) correlated negatively with CO2 and CH4 (Supplementary Data 5).

Members of Acidobacteria have been described as the primary degraders of Sphagnum polysaccharides in peat bogs3. These Terracidiphilus genomes expressed a wide set of enzymes for the degradation and utilization of polysaccharides, highlighting their role in breaking down biopolymers55 and contributing to core biogeochemical processes. Genomic and gene expression analysis revealed that Terracidiphilus genomes encoded several genes related to central carbon metabolism and S cycling (Fig. 3, Supplementary Note 3 and Extended Data Fig. 3). The Terracidiphilus genome contained genes for assimilatory sulfate reduction.

Importantly, Acidobacteria genomes expressed genes for producing and/or utilization of common methanogenic substrates (that is, acetate, formate). Some expressed Wood–Ljungdahl pathway genes. The correlation of these genomes with porewater CH4 levels suggests they are key community members providing or competing for methanogen substrates. Our results highlight their versatile metabolism, linking the presence of these microorganisms to GHG production—insights relevant for environmental modelling and management strategies.

Microbial–metabolite networks in a permafrost thaw gradient

In our systematic exploration of microbial–metabolite interactions, we applied abundance-based co-occurrence networks using RMT-based network analysis56,57,58. This method helped us study how metabolites influencing metabolome convergence and divergence interacted with the microbial community within each habitat. The networks showed characteristics typical of complex networks, including scale-free node connectivity, small-world properties and high modularity (Supplementary Data 6). A total of 177 MAGs (palsa = 30, bog = 66, fen = 84) constituted the networks, termed networked microbial communities. Node numbers increased with thaw progression, with distinct interaction patterns within each habitat.

Notably, the bog exhibited the most microorganism–metabolite links, followed by the fen and the palsa. This corroborates our previous analysis that the bog has a strong feature-scale organization. Most microorganism–metabolite interactions were negative (Extended Data Fig. 4b), probably reflecting microbial transformation or utilization of metabolites23,25,59. Limited microorganism–microorganism connections in the palsa were attributed to its dry ombrotrophic conditions. Conversely, increased water content in the bog and fen was associated with greater microbial and metabolite dispersal, probably contributing to increased microorganism–microorganism, microorganism–metabolite and metabolite–metabolite interactions (Extended Data Fig. 4b). Across all three habitats, we identified nodes that played important roles in shaping network structure and stability, including module hubs, which are highly connected nodes within a network module, and connector nodes, which are extensively linked to multiple modules56. Interestingly, the bog exhibited a prevalence of connector nodes. These nodes in all three habitats were predominantly lignin-like compounds with CHO and CHNOP elemental composition, characterized by low NOSC values. CHNOP and CHOSP module hubs and connectors contributed to metabolome divergence (Supplementary Data 6). Lignin-like compounds probably serve as primary nutrient sources for microorganisms. Their degradation can produce polyphenols, which are considered recalcitrant and potentially inhibitory to microbial activity under anaerobic conditions48, despite evidence of anaerobic degradation35. Consequently, these compounds and their derived polyphenols may influence GHG emissions in peatlands.

Using the METABOLIC pipeline—community option60, we examined the functional profiles of the 177 networked MAGs. These communities, primarily Acidobacteria, Actinobacteria, Proteobacteria and Verrucomicrobia phyla, had only three shared MAGs among different habitats. In addition, the bog and fen networks included three archaeal phyla: Halobacteria, Methanobacteriota and Thermoproteota. Alluvial plots (Fig. 4d–f) elucidated microbial community contributions to metabolic and biogeochemical processes. Notably, in the palsa and bog, Acidobacteria played a substantial role in various steps of the C, N and S cycles, evident from numerous negative links with metabolite nodes (Extended Data Fig. 4d,e).

Fig. 4: Microbial–metabolite co-occurrence networks within each habitat across the thaw gradient and their functional potential.
figure 4

ac, Networks within the palsa (a), bog (b) and fen (c), in which metabolites are represented by circles and microbial phyla are represented by squares. Networks are coloured by metabolite elemental composition. Larger circles or squares represent microbial and metabolite node hubs and connectors. df, Metabolic alluvial plots representing the contribution of different MAGs to individual metabolic and biogeochemical processes within the carbon, nitrogen, sulfur and other cycles in the palsa (d), bog (e) and fen (f). Microbial genomes within each habitat are represented at the phylum level. The three columns from left to right represent taxonomic groups scaled by the number of genomes, the contribution to each metabolic function by microbial groups calculated based on genome coverage and the contribution to each functional category or biogeochemical cycle. Different colours represent different phyla.

Source data

Furthermore, metatranscriptomics analyses were conducted to investigate the potential microbial activity associated with the microbial–metabolite networks. The analyses revealed networked Acidobacteria are key degraders of palsa and bog SOM, highly expressing diverse carbohydrate-active enzymes (CAZymes) for plant-derived polysaccharides (for example, those in the glycoside hydrolase and polysaccharide lyase families) and polyphenolic compound degradation (for example, auxiliary activity family with ligninolytic activity61) (Extended Data Fig. 5), consistent with previous meta-omics data at this site3. The latter finding further supports the correlation of the networked bacteria with lignin-like compound nodes, especially in the bog (Extended Data Fig. 4e), highlighting the use of microbial–metabolite networks for surveying potential microbial activity.

Metatranscriptomics revealed genes expressed for fatty acid (that is, butanoate, propionate) and fermentation end products (that is, glycerol, ethanol, lactate) utilization. Transcripts for propionate fermentation were dominated by Acidobacteria in the palsa and bog. For butanoate, Acidobacteria led in the palsa while sharing with Actinobacteria in the bog. Glycerol and ethanol fermentation transcripts were mostly Acidobacteria in the palsa. In the bog, networked Acidobacteria highly expressed lactate and ethanol fermentation genes. Bog networked Acidobacteria, Verrucomicrobia and Halobacteria expressed acetogenesis genes compared with a broader range of genomes expressing them in the fen (Extended Data Fig. 6).

Metatranscriptomics also revealed the involvement of networked genomes in core redox reactions. Bog and fen Halobacteria and Methanobacteriota expressed hydrogenotrophic methanogenesis genes. Bog Methanobacteriota also expressed methylotrophic methanogenesis genes, while in the fen, only Methanobacteriota genomes were responsible for this process. We observed expression of dissimilatory sulfur oxidation and assimilatory and dissimilatory sulfur reduction in the bog and fen networked bacteria, but only assimilatory sulfur reduction in palsa. These expression patterns in C degradation, fermentation, methanogenesis and S cycling linked to network topology shifts emphasize the importance of metatranscriptomics to measure active processes across the different thaw environments. Unlike metagenomics, which does not ensure gene expression, integrating metatranscriptomics into the correlation networks offers a direct measure of active processes.

Discussion

We used an integrative metagenomic and metabolomic approach to study permafrost peatland ecosystems. Contrary to our initial hypothesis of congruence between microbial communities and metabolites, we observed divergent assembly patterns across the thaw gradient, highlighting the need for a nuanced framework capturing the complex microorganism–metabolite interplay. We integrated multi-omics data using a null modelling framework to quantify ecological processes governing microbial and metabolome assembly. This approach uses phylogenetic metrics to identify signatures of processes like selection and dispersal limitation, providing quantitative estimates of different ecological drivers’ influences on assembly patterns13,18,19,62,63.

While null modelling elucidated key assembly processes, correlation-based analyses complemented this by identifying potential microorganism–metabolite interactions across datasets. Notably, organosulfur compounds, potentially originating from Sphagnum mosses, were identified as important factors influencing both metabolome composition and microbial community assembly, thereby impacting GHG concentrations in porewater. This highlights the importance of S and C cycling dynamics in peatlands as S cycling exerts an important control on organic C degradation and GHG emissions47,49,64 and the need for further exploration of organosulfur compounds in Sphagnum-dominated peatlands.

Our integrated multi-omics approach sheds light on metabolic processes driving GHG emissions from thawing permafrost. Future work aims to elucidate causal mechanisms through complementary methods such as mapping multi-omics data into metabolic pathways using chromatographic techniques (for example, liquid chromatography tandem mass spectrometry (LC-MS/MS)) and microbial isolation studies. We acknowledge the limitations of relying solely on direct infusion FTICR mass spectrometry (DI-FTICR-MS) for metabolite identification, including the inability to differentiate isomers, or determine molecular structures, as well as issues with signal suppression or enhancement65. Moving forward, implementing this technique alongside chromatographic techniques could provide a better understanding of the complex processes in natural ecosystems.

Furthermore, incorporating longer timescales and expanded spatial representation will further refine our understanding of the most relevant microorganism–metabolite interactions. By integrating microbial lineages and metabolite formulas into future mechanistic models of GHG emissions such as those described here66, we can gain a focused picture of the key interactions and processes driving emissions across Arctic permafrost ecosystems, particularly given the importance of plant-derived organosulfur compounds and Sphagnum’s role in shaping microbial and metabolic dynamics.

Methods

Site description

The Stordalen Mire, a peat plateau characterized by discontinued permafrost, is located southeast of the Abisko Scientific Research Station in northern Sweden (68.35° N, 19.05° E; 351 m above sea level). Altered climate drivers, specifically permafrost thaw in response to rising temperatures (2.5 °C between 1913 and 2006)67, have caused topographic changes in the mire, which have affected hydrological patterns altogether with moisture levels and nutrient availability68. As a result, three habitats along the thaw gradient have been formed in the mire with different vegetation21, microbial communities3, SOM composition23,24,25 and rates of greenhouse gas production27,39. Palsa sites are elevated dry hummocks with intact permafrost (0.5–2.0 m above their surroundings)69, while bog and fen sites are wet depressions with partially thawed or thinning permafrost, and totally thawed permafrost, respectively68.

Palsa sites are underlain by a thick permafrost layer (10–20 m)70. These sites have no measurable water table, with thin peat and active layer depths (0.4–0.7 m)69. Palsa vegetation is dominated by dwarf shrubs, feather mosses and lichens68,69,71. Bog sites receive rainfall and runoff from nearby palsa sites69. They are ombrotrophic and have thicker peat (0.5 to >1 m) and active layer (>1 m depth) than the palsa. The bog’s vegetation is characterized by Sphagnum species and small sedges (for example, Sphagnum spp. and Eriophorum vaginatum, respectively)72. Fen sites are completely thawed ecosystems. They are minerotrophic and receive surface and groundwater inputs. The water table at the fen sites is at or near the peat surface, providing nutrients to support vegetation such as tall sedges (Carex spp. and Eriophorum. angustifolium) and Sphagnum mosses72.

Sample collection

Peat soil samples were collected between June and August 2012 at different depths (shallow, middle and deep) from the three habitats: palsa, Sphagnum-dominated bog and Eriophorum-dominated fen (Supplementary Data 1). Sets of triplicate cores were collected using an 11-cm-diameter homemade circular push corer. To avoid cross-contamination, 1 cm from each core’s edge was not included in the sampling and the corer was rinsed with distilled water between cores. After collection, the cores were divided directly in the field into sections as follows: surface (1–5 cm), middle (10–14 cm) and deep (20–24 cm) and then placed on dry ice until transport to the laboratory where they were stored at −20 °C until further analysis. Cores for microbial analysis were placed in cryotubes, mixed with a LifeGuard solution (MoBio Laboratories) and stored at −80 °C until processing.

Data collection of peat temperature, depth of each core, percentage of carbon and nitrogen content in the solid phase peat and dissolved CO2 and CH4 concentrations used in this study was previously described73. Geochemistry data generated in this study are available at the EMERGE Database at https://emerge-db.asc.ohio-state.edu/datasources/0001_Coring2012, https://emerge-db.asc.ohio-state.edu/datasources/0006_GeochemPorewater20102012 and https://emerge-db.asc.ohio-state.edu/datasources/0005_GeochemSolid20102012. Details about sampling date, time of sample collection, global positioning system (GPS) ___location, air and soil temperature, active layer depth and notes related to the sampling can be found at the EMERGE Database at https://emerge-db.asc.ohio-state.edu/datasources/12. For further clarity, we included this information in Supplementary Data 1 (sample metadata table). We believe that these data are in concordance with the requirements described in ref. 74. Meteorological data used in this study, including precipitation and temperature measurements, were retrieved from the Abisko Scientific Research Station, 2012, and the Sweden Meteorological and Hydrological Institute for Stordalen Station 188790 (https://www.smhi.se/data/meteorologi/ladda-ner-meteorologiska-observationer#param=airtemperatureInstant,stations=all,stationid=188790).

FTICR-MS sample preparation and data preprocessing

We implemented DI-FTICR-MS to gain a comprehensive overview of the soil metabolome across the thawing permafrost gradient. This technique leverages the high resolving power, ultrahigh mass accuracy and sensitivity of FTICR-MS75, along with rapid data acquisition from direct infusion into the mass spectrometer ion source. As SOM comprises a complex and dynamic mixture of metabolites, DI-FTICR-MS detects and resolves a wide range of individual molecules for characterization of molecular composition and transformation. While chromatography-based methods such as LC-MS/MS enable more accurate structural elucidation, they can be biased towards highly abundant compounds if appropriate care is not taken that increases the analysis time. As the objective here was to capture assembly processes across the soil metabolome, we favoured a high-throughput approach providing wider detection of low-abundance metabolites without focusing extensively on structural annotation beyond biochemical classification.

In this study, we used only water as an extractant to focus on the bioavailable compounds actively cycling with microorganisms in this dynamic thawing environment. Our goal was to mimic the natural thaw conditions as closely as possible. Ultrahigh-resolution characterization of the water-soluble metabolites that were extracted from peat was achieved using a 12 T Bruker FTICR mass spectrometer (Bruker, SolariX) located at the Pacific Northwest National Laboratory using a methodology previously described23. Briefly, 100 mg of frozen peat soil was combined with 1 ml of Milli-Q water and shaken for 2 h. Then, the water and peat soil mix was centrifuged and the resultant supernatant was mixed with high-performance liquid chromatography (HPLC)-grade methanol (1:2 water to methanol ratio). The resulting solution was injected directly into the 12 T Bruker FTICR mass spectrometer in which a Bruker electrospray ionization source was used to generate negatively charged molecular ions. The negative ionization mode was used for all metabolomics analyses in this study, as previous work has shown that organic matter, which makes up a large proportion of the sample matrix76, is predominantly composed of oxygen-containing compounds such as carboxylic acids that ionize best in negative mode. While positive ionization could provide additional coverage of some nitrogen-containing compounds, the focus of this analysis was presence and absence and relative comparisons between samples, rather than capturing all possible metabolites. As all samples were analysed under the same parameters, comparisons should still be valid. However, the use of only the negative ion mode may preclude detection of some metabolites that ionize exclusively in positive mode.

To ensure instrument stability, a Suwannee River fulvic acid standard obtained from the International Humic Substance Society was injected at the beginning of the run. To monitor potential carryover between samples, HPLC-grade methanol blanks were injected throughout the process. In addition, the instrument was flushed between samples with a solution of Milli-Q water and HPLC-grade methanol. Variations in carbon concentration from different samples were controlled by modulating the ion accumulation time that was adjusted for each sample and ranged between 0.1 s and 0.3 s. A total of 144 scans were collected for each sample. Scans were averaged and calibrated using an organic homologous series separated by 14 Da (CH2). The mass accuracy was <1 ppm for single changed ions across a 100–1,200 m/z range, the mass resolution was ~240,000 at 341 m/z and the transient was 0.8 s.

Raw spectra collected from each sample were converted to a list of m/z values using the BrukerDaltonik version 4.2 FT-MS peak picking module using a signal-to-noise ratio of 7 and an absolute intensity threshold of 100 (default). Out of the 85 samples analysed, 17 were technical replicates in which the mass spectrometry data were collected twice through independent sample injections. These replicate samples allowed assessment of technical variability in the metabolomics data pipeline. The remaining 68 samples corresponded to the biological experimental conditions of interest across different sites, depths and time points; these samples constitute 3 replicates per peat core sampled within each habitat and month, unless otherwise specified (Supplementary Data 1). Chemical formula assignment was performed with Formularity77 using parameters described in ref. 78. Compounds with m/z values outside of 200 to 900 m/z and isotopic peaks (13C-peaks) were filtered out for downstream analysis.

Log-transformed FTICR-MS intensities for the extracts of the solid peat used in this study were previously published26 and are available from the EMERGE Database at https://emerge-db.asc.ohio-state.edu/datasources/0141_Wilson-etal-2022-STOTEN_ICR-plants. We eliminated eight samples from this dataset because the number of m/z detected in them was very low compared with that of the rest of the samples, less than 200 m/z (and in some cases close to zero m/z) compared with the more than 2,000 m/z detected in most of the samples. These differences had the potential to introduce bias in the analysis. Furthermore, the number of m/z detected in the other replicates and technical replicates (when available) of these samples was like the rest of the dataset, suggesting that the eliminated samples might have suffered from issues during the m/z collection process. The eliminated samples were Aug_E_3_D_2012, Aug_P_3_D_2012, Aug_S_2_S_2012, Aug_S_3_D_2012, Aug_S_3_M_2012, July_P_3_D_2012, July_S_3_D_2012 and June_E_1_M_2012.

Standard indices inferred from FTICR molecular formulae (that is, Kendrick defect, double-bond equivalence (DBE), aromaticity index (AI), NOSC and the standard Gibb’s free energy (ΔG°C-ox)) were calculated using the ‘ftmsRanalysis’ R package version 1.1.0 (ref. 79).

Potential biochemical transformations occurring between metabolites identified with FTICR-MS were estimated as follows: pairwise differences between chemical masses (m/z values) were calculated and mapped to a database containing 1,255 known chemical transformations, retrieved from the Kyoto Encyclopedia of Genes and Genomes (KEGG) compound database80,81, and previously described in ref. 13. Mass differences within 1 ppm from the known transformations were considered in the analysis. This approach allows identifying potential relationships among metabolites, which are represented as transformation networks, in which chemical masses (m/z values) are represented as nodes and pairwise mass differences as edges.

Metabolite dendrogram construction

In this study, we applied community ecology metrics (for example, beta diversity) to understand metabolite assembly13, using three relational dendrograms built based on (1) metabolite molecular characteristics (molecular characteristic dendrogram (MCD)), (2) metabolite potential biochemical transformations (transformation dendrogram (TD)) and (3) metabolite transformation-weighted characteristics (transformation-weighted characteristic dendrogram (TWCD)). Moreover, metabolite dendrograms were built using binary presence and absence values instead of peak intensities to avoid biases in abundance estimates due to charge competition.

Briefly, the MCD was constructed using an unweighted pair-cluster method using arithmetic averages (UPGMA) hierarchical clustering analysis of the Euclidean distance matrix derived from between-metabolite similarities calculated based on their molecular properties (that is, elemental composition, double-bond equivalence, modified aromaticity index and Kendrick’s mass defect). The TD was constructed based on a transformation distance matrix derived from transformation networks that represent putative biochemical reactions occurring between different metabolites. Finally, the TWCD was created by combining previously mentioned matrices using a UPGMA hierarchical clustering analysis13.

The results shown in this study are derived from the TWCD, which is a combination of the molecular characteristic and transformation-based dendrograms. The full suite of metabolites represented 14,432 peaks. Of those, 6,763 were assigned a molecular formula (MCD), 13,177 peaks were part of potential biochemical transformations (TD) and 6,526 peaks were included in the combined dendrogram.

DNA extraction, metagenome sequencing, assembly and binning

DNA extraction and sequencing for these samples were performed using the PowerMax Total Nucleic Acid extraction kit (MoBio) and sequenced with a combination of HiSeq (2 × 100 bp) and NextSeq (2 × 150 bp) platforms, protocols described in ref. 3. For the analysis in this paper, we leveraged an existing MAG database31 built from MAGs derived from field samples collected from 2010 to 2017 at the Stordalen Mire, MAGs from a previously published study using the 2011–2012 samples3 and MAGs from a stable isotope probing experiment performed on field peat with labelled litter added (‘SIP study’). Full details about the database construction can be found in ref. 32 and are briefly summarized below.

Reads from both the field and SIP studies were cleaned via Trimmomatic82. In both studies, assembly and binning were performed on all samples independent of each other (no co-assembly). Field reads were assembled with SPAdes (v3.12, with -meta option) with default kmer sets83. For the field assemblies, initial bin sets for each sample were generated using the UniteM (v0.0.18) workflow84, with the following options: mb_sensitive, mb_verysensitive, mb_specific, mb_veryspecific, mb2, max40, max107, bs and gm2. Then, two rounds of ensemble binning were performed with DAS Tool (v1.1.1)85, MetaWRAP (v1.0.6)86 and UniteM (v0.0.18)84, with the output from the first round of ensemble binning being used as input for the second round. Following the second round of ensemble binning, completeness and contamination statistics of the resulting ensemble bins were assessed via CheckM (v1.0.12)87 lineage workflow. For each of the three ensemble bin sets, bins with 70% completion and less than 10% contamination were used to calculate a quality score: completeness − (5 × contamination). A candidate bin set for each sample was chosen based on the ensemble binning tool that had yielded the bin set with the highest quality score. The bins in each candidate bin set were then further refined with RefineM (v0.0.24)88 and manually examined in anvi’o (v.5.2)89. The SIP study reads were assembled with SPAdes (-meta option enabled) and MEGAHIT (v.1.1.3, default kmer set)90, and bins were generated with MetaBAT2 (v.2.12.1)91. Additional MAGs from all studies were generated independently via the Department of Energy Joint Genome Institute (DOE JGI) metagenome annotation pipeline92, and downloaded in December 2020.

Together, the database comprises 13,290 MAGs with at least 70% completeness and 10% contamination (determined via CheckM (ref. 87)). The ~13,000 MAGs were reduced to 1,806 genomes when dereplicated at the 95% similarity (species) level via galah (https://github.com/wwood/galah (ref. 93)). The relative abundance of each MAG was assessed by read mapping with CoverM (ref. 94; https://github.com/wwood/CoverM), using the same relative abundance calculation strategy as in ref. 3 (that is, using the CoverM genome option, with the following parameters: –min-read-percent-identity 0.95, –min-read-aligned-percent 0.75, -coupled and -m trimmed_mean (Supplementary Data 1)). For all statistical analyses, the relative abundance of each species in each sample was calculated by dividing its coverage by the total coverage of all species in the dereplicated set. MAGs were annotated using DRAM (1.4.0)95 and can be found at https://doi.org/10.5281/zenodo.7587534.

This study focused on 2012 as it provided matching metagenomic and metabolomic data, for 67 samples, and the desired temporal resolution. To maximize genome-resolved inferences from these data, we queried the 2012 metagenomes with the site-specific MAG database spanning samples from 2011 to 2017 of 13,290 MAGs (1,806 when dereplicated to species level; BioProject PRJNA386568 (ref. 32)). Read mapping with CoverM to the dereplicated dataset resulted in an abundance table of 1,402 species-dereplicated MAGs that were present in 2012 samples and were used for subsequent analyses.

To quantify the portion of the microbial community diversity present in the 2012 samples that was represented by the 1,402 MAGs, we applied the ‘appraise’ tool from SingleM (version 0.15.1)96 setting the sequence identity threshold to 0.86 (genus-level divergence). Our analysis revealed that the 1,402 MAGs collectively accounted for 71.8% of the bacterial community and 62.9% of the archaeal community at the genus level within the 2012 samples.

Metatranscriptomics

RNA extraction and sequencing of 2012 samples were described in ref. 3. Briefly, 240 ng of RNA extracted from peat material was further cleaned using DNAse I (Roche) to remove residual RNA, then library preparation was performed using ScriptSeq Complete (Bacterial) low-input library kits (Epicentre). Agilent 2100 Bioanalyzer and Agilent 2200 Tapestation (Agilent Technologies) were used to test the quality of the RNA and libraries. RNA quantity was measured using Qubit (ThermoFisher Scientific). Samples were sequenced on 1/8th of a NextSeq (Illumina) lane, with initial shallow runs conducted on 1/11th of a HiSeq (Illumina) and MiSeq (Illumina) lanes3.

Forward-stranded metatranscriptome reads were processed using TranscriptM (ref. 97) v0.3.1, including QC by KneadData (https://github.com/biobakery/biobakery/wiki/kneaddata), genomic-DNA decontamination and read counting to produce transcripts per million (TPM) per gene per genome (Supplementary Data 7). DRAM95 was used to perform gene calling and assign KEGG80,81 and CAZyme annotations98 (http://www.cazy.org/). GraftM (v.0.15.0)99 packages were generated for homologous genes forward and reverse dissimilatory sulfite reductase (dsrAB/rdsrAB) grouped under the same KEGG IDs (K11180, K11181), as follows. Seed sequences (with experimentally confirmed functioning where available) were searched against Uniref90 (ref. 100) r2022_01 using MMseqs2 (ref. 101) easy-search. A total of 300 sequences per seed were then combined with matching sequences from our MAGs to create hidden Markov models (HMMs) and phylogenetic trees using GraftM create. The trees were manually annotated in ARB (ref. 102) to label each clade with the function of the seed sequences. These trees were used to classify the matching sequences from our MAGs and encode their annotation as a specific homologue. Then, manually curated KEGG-based definitions (Supplementary Data 8) for carbon degradation, fermentation and fixation and methanogenesis and sulfur redox cycling pathways were used to transform TPM values for genes to a pathway-average count per pathway. For methanogenesis, curation was performed following the methods described in ref. 103. Briefly, TPM was averaged across reactions in a pathway and summed across alternative pathways.

Microbial tree construction

Bacteria and archaea phylogenetic trees were built from MAGs stored at the EMERGE Database31 using the de novo workflow of the GTDB toolkit version 1.5.1 (ref. 104) (reference data version r202). Bacterial and archaeal trees were initially built separately, as each group has different marker genes. The bacteria tree was rooted with Patescibacteria as the outgroup, and the archaeal tree was rooted with Altarchaeota as the outgroup. After the trees were built, they were joined at the archaeal–bacterial split proposed in ref. 105. Tree tips that were not present in any sample (for example, those from GTDB and those from the EMERGE MAG Database deriving from different studies) were dropped and not used for the analysis.

Microbial phylogenetic signal

The microbial community assembly from 2012 peat samples was investigated using ecological null modelling18,19, which assumes phylogenetic signal is correlated with environmental optima. The phylogenetic signal explains the tendency of closely related species to resemble each other ecologically more than they do the other randomly selected species in a tree106. To evaluate the phylogenetic signal, we estimated the environmental optima of each microbial operational taxonomic unit (OTU) for the following variables: peat temperature, average depth of the core, carbon and nitrogen ratio, and precipitation. Briefly, OTU environmental optima were calculated based on the abundance-weighted mean values for each environmental variable (function optima, ‘analogue’ package version 0.17.6)107. The Euclidean distance matrix was calculated from the optima estimates, which represents differences in ecological niches across OTUs. A matrix of between-OTU phylogenetic distance was calculated from the microbial phylogenetic tree (function cophenetic.phylo, ‘picante’ package version 1.8.2)108. Finally, the phylogenetic signal was evaluated by quantifying the relationship between these matrices via a Mantel correlogram as in ref. 19 (Supplementary Fig. 1).

β-diversity analysis and ecological null modelling

To estimate and compare the ecological processes driving microbial (n = 67, with paired metabolomics data) and metabolite (n = 85) assemblages, the βNTI and Raup–Crick Bray–Curtis (RCBC) index were calculated for the microbial phylogenetic tree and metabolite dendrogram following the methodology described in ref. 13. Briefly, the observed microbial and metabolite β-mean nearest taxon distance (βMNTD) was estimated using the comdistnt function from the package ‘picante’ version 1.8.2 (ref. 108) and was compared with the null model (obtained from 1,000 randomizations). Then, the βNTI was calculated by normalizing the observed βMNTD with the null expectation following ref. 18). Using this approach, the influence of ecological processes can be differentiated as follows: for |βNTI| > 2 deterministic processes are assumed to predominantly shape metabolite and microbial assemblages, suggesting that environmental abiotic factors and biotic interactions determine (or impose selection) changes in species diversity and composition (in the case of microbial communities)16,19 while biotic and abiotic transformations control fluctuations of metabolites within metabolite assemblages (for example, differences in production and degradation rates). If |βNTI| < 2, then it is assumed that stochasticity drives changes in species diversity, relative abundance and composition owing to random (unpredicted) disturbances. Similarly, for metabolites, dispersal can be explained by physical forces or vector movements that cause changes in the metabolome composition of the system13. Moreover, when βNTI > 2, variable selection explains how divergent environmental factors cause high compositional turnover between a pair of communities analysed, and when βNTI < −2, homogenous selection describes how steady selective pressures originated from persistent environmental conditions are the main cause of low compositional turnover between a pair of local communities18.

Significant differences between βNTI (microbial and metabolite) within each habitat were determined using a two-sided Mann–Whitney U test using the package ‘rstatix’ version 0.7.2 (ref. 109), and multiple testing was corrected with the Bonferroni method.

Stochastic ecological processes were further investigated using the RCBC turnover index. Briefly, the observed presence-and-absence-based Bray–Curtis values derived from pairwise comparisons were estimated and compared with the null expectation (generated after 1,000 randomizations). Then, deviations of the observed values from the null comparisons were normalized between +1 and −1 (RCBC metric). If RCBC > 0.95 and |βNTI| < 2, then higher-than-expected compositional differences between a pair of communities (or metabolomes) are primarily due to dispersal limitation enabling ecological drift. However, if RCBC < −0.95 and |βNTI| < 2, then lower-than-expected compositional differences between a pair of communities (or metabolomes) are primarily due to homogenizing dispersal. Finally, if |RCBC| < 0.95 and |βNTI| < 2, then the compositional turnover between a pair of local communities (or metabolomes) is not dominantly driven by selection, dispersal or ecological drift and this scenario is referred to as being undominated19.

Correlations with environmental variables

To further understand drivers that control deterministic or stochastic processes influencing metabolite assembly, calculated βNTI values were correlated with environmental data including peat temperature, average depth of the core, carbon and nitrogen ratio, and precipitation. The precipitation value used represents a 3 day accumulation before the sampling day (Supplementary Data 9). Mantel tests (Mantel function, ‘vegan’ package v2.5-7)110 using a Pearson correlation and 9,999 permutations were used to estimate correlations between peat and microbial samples’ βNTI (n = 67, where the sample number was filtered to include only those that have matching microbiome and metabolome data), and the pairwise differences of each environmental variable between samples. Mantel statistics were calculated within each habitat and multiple testing was corrected with the Bonferroni correction method.

Feature-specific βNTI estimation

We further used a recently developed approach15 that performs null modelling at the feature level (βNTIfeature); a feature within this Article is a microbial community member or metabolite (FTICR-MS molecular formula) that forms part of a phylogenetic tree or relational dendrogram, respectively. This approach allows us to investigate how ecological pressures differentially affect specific community members and metabolites. The βNTIfeature for microbial and metabolite data was estimated within the three habitats following ref. 15 (microbial: palsa, n = 21; bog, n = 22; fen, n = 24; metabolome: palsa, n = 29; bog, n = 27; fen, n = 29). Briefly, the MAG-derived phylogenetic tree and the relative abundance OTU matrix were used for the microbial analysis, while the TWCD dendrogram and the FTICR peak intensity matrix (transformed to presence and absence) were used for the metabolites. The βNTIfeature was estimated in a similar way to the community βNTI; the βMNTD index of an individual feature (for example, the OTU or molecular formula) is calculated using the formula below:

$${\upbeta \mathrm{MNTD}}_{\mathrm{feat}}=\,\frac{1}{n}\mathop{\sum }\limits_{j=1}^{n}{f}_{{a}_{i}}\left(({,d}_{{a}_{i}{b}_{i}},)\right)$$

In this formula, \({f}_{{a}_{i}}\) represents the relative abundance of the feature \(a\) in relation to the community \(i\), while \(n\) represents the number of samples and \(\min ({d}_{{a}_{i}{b}_{i}})\) is the average minimum relational distance of fixed feature \(a\) in relation to the fixed community \(i\) to any other feature \(b\) in the other communities \(j\). Conspecifics were not removed from both datasets. The term ‘fixed’ indicates that this calculation is performed in one feature at a time; in other words, one microbial member or FTICR molecular formula is compared with the rest of the microbial members and metabolites at a single time (see further details in ref. 15). Then, in a similar way as the calculation at the community level, βMNTDfeat was estimated using the comdistnt function from the package ‘picante’ version 1.8.2 (ref. 108) and was compared with the null model (obtained from 999 randomizations). Finally, βNTIfeature is calculated by the difference of the observed βMNTDfeat with the average of the null results (\(\scriptstyle{\underline{\upbeta \mathrm{MNTD}}}_\mathrm{feat}^\mathrm{null}\)), divided by the standard deviation of the null values, using the following formula:

$$\upbeta \mathrm{NT{I}_{feat}}=\,\frac{\upbeta \mathrm{MNTD}_\mathrm{feat}^\mathrm{obs}-{\underline{\upbeta \mathrm{MNTD}}}_\mathrm{feat}^\mathrm{null}}{\upbeta \mathrm{MNT{D}_{feat}^{sd}}}$$

To understand how a specific feature, either a specific taxon or metabolite, contributes towards the community variation at a specific scale, we used the following rules: if |βNTIfeature| < 1, the contribution is considered insignificant, and if 1 < |βNTIfeature| < 2, the contribution is considered moderate, whereas if |βNTIfeature| > 2, then the contribution of a specific taxon or metabolite is considered significant. Moreover, if the βNTIfeature was negative, then the feature was assumed to contribute to community convergence (ecological and functional similarities) whereas positive values represented contributions to divergence (ecological and functional differences)15.

Feature-specific βNTI-derived metabolite clusters

A hierarchical cluster analysis of metabolites (palsa, n = 29; bog, n = 27; fen, n = 29) was performed in R using the package ‘cluster’ version 2.1.3, using a modification of a previously described approach111 but using the βNTIfeature value estimated in a previous section instead of metabolite abundances. Only metabolite features that were in 50% of the samples were kept. A distance matrix of the βNTIfeature values within each habitat was calculated using Manhattan distance using the function daisy. The resulting dissimilarity matrix was used for clustering the metabolites using the function pam. The optimal number of clusters (k) was determined by calculating the silhouette value of every metabolite, which represents the ratio of the distances to members of its own cluster to distances to members of the nearest neighbour cluster. For each k, the average silhouette value (ASV), also called silhouette coefficient112, was calculated, to find which k maximizes the ASV.

A consensus βNTIfeature matrix was calculated as the median of the βNTIfeature value of the representative features of each cluster in each sample. Representative features (metabolites) were determined as those whose silhouette value was higher than the ASV of their respective cluster. This approach provided a better representation of the patterns of βNTIfeature values among the clusters111.

Differences between the biochemical indexes AI_mod, DBE_O and NOSC of the representative features were determined using a Wilcoxon test.

Correlations between the consensus βNTIfeature values and environmental factors and individual microbial abundances were calculated using a Spearman’s rank test with the Hmisc package version 5.0.1 (ref. 113). Correlation P values were adjusted using the false discovery rate (FDR) method. Correlation networks were visualized using the igraph package version 1.4.1 (ref. 114,115), ggraph package version 2.1.0 (ref. 116) and tidygraph version 1.2.3 (ref. 117). The abundance of genomes that significantly correlated with the consensus βNTIfeature metabolite clusters were correlated with CO2 and CH4 levels in palsa peat, and bog and fen porewater using Spearman’s rank test; the FDR method was used for multiple testing correction.

Microbial–metabolite co-occurrence networks

Co-occurrence networks of metabolite and microbial abundance data from each habitat (n = 67, analysis within each habitat included palsa, n = 21; bog, n = 22; and fen, n = 24) were constructed using the Molecular Ecological Network Analyses (MENAP) pipeline (http://ieg4.rccc.ou.edu/mena/main.cgi)56,57,58. Only metabolites identified to significantly contribute towards convergence or divergence were included. Metabolite and microbial abundance were transformed separately using the centred log ratio transformation, then combined into a single matrix and uploaded to the MENAP web server. Data were filtered to include only features that were present in at least 50% of the samples for the accuracy and reliability of our correlation calculations57,118. The similarity matrix was built using Spearman’s correlation, and RMT (ref. 57) was applied to objectively determine the association threshold, thus preventing the use of arbitrary cut-offs that can introduce uncertainties during the process of building the networks. Further advantages of using this approach can be reviewed in refs. 56,57,58,118. Networks were divided into modules using the greedy modularity optimization algorithm119. Nodes were assigned to one of four nodal topological roles based on their within-module connectivity (Zi) and intermodule connectivity (Pi)56. Finally, networks were visualized using igraph R package version 1.4.1 (ref. 114) using a Fruchterman–Reingold layout algorithm, and metabolite nodes were coloured based on the elemental composition. Topological roles of nodes were classified into peripherals, module hubs and connectors based on the within-module connectivity (Zi) and participation coefficient (Pi)56. Also, the MENAP pipeline calculates different network topological parameters including network size (n), number of links (L), power-law fitting of node degrees, average connectivity or degree, average cluster coefficient, average path distance (GD), geodesic efficiency (E), harmonic geodesic distance (HD), maximal degree, centralization of degree (CD), maximal betweenness, centralization of betweenness (CB), maximal stress centrality, centralization of stress centrality (CS), maximal eigenvector centrality, centralization of eigenvector centrality (CE), density (D), reciprocity, transitivity (Trans), connectedness (Con), efficiency, hierarchy, lubness, number module and modularity. Finally, to test the significance of the networks, a total of 100 random networks were constructed by rewiring the links (edges) among nodes while constraining n and L. The network properties of the randomized networks are calculated altogether with means and standard deviation, which are compared with the original or empirical network. This process was performed in the MENAP pipeline.

Functional annotation of networked microbial communities

The genomes of networked microbial communities from each habitat were annotated using the Metabolic and Biogeochemistry Analyses in Microbes (METABOLIC) pipeline in community mode (METABOLIC-C.pl)60. The METABOLIC pipeline was run with default parameters, using a total of 30 MAGs as an input for the palsa, 66 MAGs for the bog and 84 MAGs for the fen. Briefly, this pipeline uses Prodigal from gene calling and annotates them using three sets of HMM-based databases: KOfam (ref. 120), TIGRfam (ref. 121) and Pfam (ref. 122) as well as custom metabolic HMM profiles. In addition, CAZymes are annotated using dbCAN2 (ref. 123).

Alluvial plots generated by the METABOLIC pipeline were used to represent the contribution of different metagenomes (MAGs) to individual metabolic and biogeochemical processes within the carbon, nitrogen, sulfur and other cycles.

For MAGs that showed the strongest correlations with specific metabolites, we further analysed their corresponding metatranscriptomic data. This analysis validated the expression of genes involved in the biosynthesis pathways of these metabolites, reinforcing the functional connection between the identified microbial populations and the observed metabolite profiles.

Statistics

All statistical analyses and visualization were performed using the R statistical language versions 4.1.0 and 4.2.1 (ref. 124) and with the packages ggplot2 (v3.4.2)125, ftmsRanalysis (v1.1.0)79, picante (v1.8.2)108, rstatix (v0.7.2)109, vegan (v2.5-7)110, cluster (v 2.1.3)126, igraph (v1.4.1)114, ggraph (v 2.1.0)116, tidygraph (v1.3.1)117, Hmisc (v5.0.1)113, patchwork (v.1.1.2)127 and ggpubr (v.6.0)128.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.