Introduction

Soil-dissolved organic matter (DOM) represents an important portion of terrestrial organic matter, playing a role in mediating energy and nutrient transfer between land, water, and the atmosphere1,2. In terrestrial ecosystems, DOM is subject to mobilization, deposition, and microbial degradation and can be converted into CO2, microbial products, and partially degraded molecules3,4,5. The transformation and mobilization of terrestrial DOM play different roles in regulating nutrient availability and microbial activity, balancing ecosystem carbon, and altering the ambient environment6. Intrinsically, these ecosystem functions would vary greatly with ecosystem types and influence wider processes, because of the diverse and complex molecular properties of DOM.

The input of terrestrial DOM into fluvial systems alters water quality and aquatic ecosystems markedly, ultimately impacting drinking water production and human health. The intruded DOM directly increases the organic carbon content and changes the DOM quality in water bodies7,8. Moreover, such DOM in catchments supplying source water to drinking water treatment plants poses further challenges. First, it affects the treatment efficiency of the coagulation, adsorption, oxidation, and membrane filtration processes used in water treatment. Second, it raises health concerns. The prevalence of some endemic diseases of unknown etiology (e.g., kidney disease) has been related to the special characteristics of natural organic matter in local environments9,10, because these compounds may be toxic themselves or may interact with harmful chemicals and pollutants11,12, which can enter and remain in human tissues13. Additionally, during drinking water disinfection (e.g., chlorination), byproducts (e.g., trihalomethanes and haloacetic acids) are formed by the unavoidable reactions with DOM, and these byproducts may be toxic and carcinogenic14,15. Since humans can suffer from long-term exposure to DOM in drinking water16, there may be a link between source DOM and human health, yet this link is ill-defined and remains poorly investigated.

Terrestrial DOM is, in turn, susceptible to environmental conditions. The input of plant materials determines the abundance and quality of organic matter and also alters microbial environments in soils17,18. Edaphic properties, such as pH19, water content20, metal/metal oxide concentration21, and clay fraction22, can alter the soil microenvironment as well as molecular mobility and bioavailability. Moreover, climate (e.g., temperature and rainfall) affects the transport and chemical reactions of molecules and regulates the interactions among soil, plants, and microbes4,18. In view of this, the fate of DOM and its subsequent environmental impacts vary as a function of the combined effects of ecosystem properties. To date, however, the underlying control-response mechanisms linking ecosystems and soil DOM, as well as their environmental impacts, are poorly understood, especially under ongoing climate change.

Grasslands comprise ~40% of the world’s ice-free land surface and store over one-third of terrestrial carbon globally23. They represent vast soil organic carbon stocks, with dissolved organic carbon (DOC) concentrations comparable to those of forests and croplands24. In China, grasslands also account for approximately 40% of the total land area and vary in climate, plant biome, soil type, and topography. Consequently, the DOM in widespread grassland areas should have profound impacts on terrestrial and aquatic environments. Moreover, due to the connections between soil and drinking water sources, the DOM may be associated with human activities and health10,25,26. A few studies have previously considered the quantity or quality of grassland soil DOM in China, but mainly at a regional scale27,28. Systematic studies of the DOM geochemistry and its controlling factors at large spatial scales are scarce. Crucially, there are gaps in our understanding of the ecosystem roles and environmental impacts of the ubiquitous terrestrial DOM, especially its potential health effects.

In this study, we ask two questions: (i) What are the compositional features and potential environmental functions of soil DOM in large-scale grassland ecosystems? (ii) How do environmental factors, particularly climate variables, drive the spatial patterns of DOM at the continental scale? To address these questions, we conducted a nationwide survey which characterized DOM in 89 grassland soil samples across 30 provinces of China in the dry season (Supplementary Figs. S1, S2). DOM is analyzed for fluorescence components, molecular weight (MW) distribution, and molecular composition. Given the concerns regarding drinking water safety, the disinfection byproduct formation potential (DBP-FP) and chlorine reactivity of DOM are examined. We then assess the spatial patterns of DOM quantity and quality, and their relationships to ecosystem carbon exchange, under the control of environmental factors. Furthermore, the DOM is associated with local cancer incidence and mortality rates to explore the potential connection between terrestrial-derived DOM and human health.

Results and discussion

Considerable storage of DOM in grassland soils

Four fluorescent components (C1–C4) were identified from the soil DOM in grasslands (Fig. 1a, Supplementary Fig. S3, and Table S1). Details on the fluorescent components are described in Supplementary Text S1. The humic-like (C1, C2, and C3) and medium-to-high molecular weight (1.2–25 kDa) components comprised the major grassland soil DOM (Fig. 1b, c). The mean concentrations of dissolved organic carbon (DOC) and nitrogen (DON), trihalomethane precursors (THM-FP) and haloacetic acid precursors (HAA-FP) in grassland soils were 199.9 [15.3–804.5], 14.3 [0.4–68.4], 33.7 [2.6–123.8] and 4.3 [0.3–16.9] mg·kg−1, respectively (Fig. 1d and Supplementary Fig. S4). The DOC and DON concentrations are in good agreement with the previous regional records in China’s grasslands27,29. Moreover, the DOM quantity was largely related to the abundance of high molecular weight (HMW; 3.4–25 kDa) and humic-like (C1–C3) compounds (P < 0.05) (Fig. 1f), and nearly 40% of the DOC and DON variations can be explained by the fluorescent and size fractions (Fig. 1g).

Fig. 1: The storage of different dissolved organic matter (DOM) species in grassland soils.

a Spectral properties of four fluorophores (C1, C2, C3, and C4) identified by parallel factor analysis. b Intensity of the four fluorescent components in Raman unit (R.U.). c Cumulative intensity (absorbance unit) of the four fractions with different molecular weights. VHMW, 25–100 kDa; HMW, 3.4–25 kDa; MMW, 1.2–3.4 kDa; LMW, <1.2 kDa. d Concentrations (mg·kg−1-soil) of dissolved organic carbon (DOC), dissolved organic nitrogen (DON), trihalomethane precursors (THM-FP), and haloacetic acid precursors (HAA-FP). e UV absorbance at 254 nm (A254), biological index (BIX), fluorescence index (FI), and humification index (HIX) of DOM. f Spearman correlation between DOC, DON, disinfection byproduct precursors, and A254 vs different DOM compounds. g Variation partitioning analysis for DOC and DON. Numbers in each circle are the unique effects of size fraction and fluorescent component, respectively, and the number in the overlapping area of the two circles is the shared effects of the two factors. *P < 0.05, **P < 0.01, and ***P < 0.001.

The grassland soils display higher DOC abundance than many forest and cropland soils24, and have a similar molecular diversity of DOM as compared to the other ecosystems (Supplementary Fig. S5)30,31. The fluorescent compounds in grassland soils were identified in diverse soil and aquatic ecosystems32,33,34. Furthermore, lignin-, tannin-, and condensed aromatic-like compounds, mainly with intermediate O/C and H/C were universal in different grasslands (Supplementary Fig. S6). These compounds are prevalent in various ecosystems and are relatively persistent in the environments30,31,35,36. Consequently, grassland soils are enormous DOM reservoirs, which have high potential influences on wider ecosystems and processes. However, the impacts of soil DOM may have large spatial differences because of the wide range of DOM characteristics and chemistry of water extracts across grasslands (Fig. 1b–e and Supplementary Fig. S7).

Compositional signature of grassland soil DOM

The humic-like (C1 + C2 + C3) (accounted for over 70% of the fluorescent components on average) and high molecular weight (HMW, 3.4–25 kDa) compounds from plant sources dominate the DOM composition in grassland soils, compared with the microbial-derived proteinaceous compounds (C4) and low (LMW, <1.2 kDa) and very high molecular weight (VHMW, >25 kDa) compounds (Fig. 1b, c and Supplementary Fig. S8). Accordingly, CHO and CHNO compounds, such as lignin-, tannin-, and condensed aromatic-like compounds, also comprised the majority of the molecular composition (Supplementary Fig. S5). Compared with the DOM in some other soil ecosystems, such as forest, permafrost and arid/semi-arid cropland37, DOM in grassland soils displayed high average HIX and FI (>1.5) values concurrently (Fig. 1e), which was consistent with the observations of surface grassland soil DOM at regional scale38,39. This might be because both microbial activity and plant input are high in grassland soils, while compounds from the microbial source contribute more to the organic matter composition40.

The composition of grassland soil DOM ranged between two distinct patterns. One features higher humic-like and HMW (3.4–25 kDa) fractions, and the other is characterized by greater protein-like and LMW (<1.2 kDa) fractions (Fig. 2a–d and Supplementary Fig. S9). The relative abundance (%) of humic-like components (C1, C2, and C3) was positively correlated with the percentage of HMW compounds and HIX (P < 0.05) (Supplementary Fig. S10). Moreover, the humic-like and HMW fractions were mainly positively associated with lignin-, tannin- and condensed aromatic-like compounds with H/C < 1 and O/C > 0.5 (Fig. 2e, f and Supplementary Fig. S11), which were largely produced from the breakdown of plant litter in surface soils41,42. In contrast, the protein-like (C4) fraction was positively correlated with the relative abundance of VHMW and LMW compounds and BIX (P < 0.05) (Supplementary Fig. S10). The protein-like and LMW fractions were related to compounds with intermediate O/C and medium-to-high H/C, especially the small amino sugar- and lipid-like compounds, which were less aromatic and derived from biological processes (Supplementary Fig. S11)35.
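The figure captions name Spearman's rank correlation as the statistic behind associations of this kind. As a rough illustration of how such a coefficient is obtained, the following sketch ranks two variables (averaging tied ranks) and computes the Pearson correlation of the ranks. The function names and the per-site values are hypothetical and are not taken from the study; significance testing is omitted.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-site relative abundances (%), for illustration only.
humic_like_pct = [62.0, 70.5, 74.1, 81.3, 85.0, 68.2]
hmw_pct = [35.1, 41.0, 44.7, 52.9, 50.3, 38.6]
rho = spearman_rho(humic_like_pct, hmw_pct)
print(f"Spearman rho = {rho:.3f}")
```

Because the coefficient depends only on ranks, it captures monotonic rather than strictly linear relationships, which suits compositional percentages with skewed distributions.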

Fig. 2: Two distinct signatures of the composition of dissolved organic matter (DOM) in grassland soils.

a, b Fluorescence spectra of the DOM with a high humic-like and b high protein-like fractions. Region I, tyrosine protein-like; Region II, tryptophan protein-like; Region III, fulvic acid-like; Region IV, soluble microbial byproduct-like; and Region V, humic acid-like. c, d Apparent molecular weight distribution of the DOM with c high HMW and d high LMW fractions. e, f Significant Spearman’s rank correlations (P < 0.05) between the relative abundance of individual molecules vs e humic-like (C1 + C2 + C3) and f 3.4–25 kDa fractions. The green and purple colors of the circles indicate negative and positive correlation coefficients, respectively. The dashed lines in van Krevelen diagrams separate the molecules into different compound categories. HMW, 3.4–25 kDa; LMW, <1.2 kDa.

Spatial features of DOM geochemistry

The northern and southern regions of China varied in DOM geochemistry. The northern regions of China displayed higher 3.4–25 kDa and humic-like (C1 + C2 + C3) fractions, greater humification degree (HIX), and higher concentrations of DOC, DON, and DBP precursors (DBP-FP) (Fig. 3a, c, and Supplementary Figs. S12–S14). These observations indicate a greater DOM abundance in these regions, particularly higher proportions of allochthonous humified compounds with high aromaticity and MW, including lignin-, tannin-, and condensed aromatic-like molecules with H/C < 1 and O/C > 0.5 (Supplementary Fig. S15). By contrast, higher levels of LMW, MMW, C4, biological index (BIX) and fluorescence index (FI) were apparent in southern China (Fig. 3b, d and Supplementary Figs. S12, S13), which had less DOM but was more enriched in autochthonous <3.4 kDa and proteinaceous compounds, especially the less oxidized protein/amino sugar- and lipid-like compounds with H/C > 1.5 (Supplementary Fig. S15).

Fig. 3: Spatial features of soil-dissolved organic matter (DOM) in China’s grasslands.

a–f Spatial distribution of the relative abundance of a 3.4–25 kDa, b <1.2 kDa, c humic-like (C1 + C2 + C3), and d protein-like (C4) compounds, and the chlorine reactivity of e trihalomethane (STHM) and f haloacetic acid (SHAA) precursors. Values in maps are the average of sites in each province. The black lines indicate the boundaries of the nine river basins of China. The gray area denotes no data observed. g, h Explanation of DOM geochemistry by geographical ___location (South-North and West-East orientation) shown by g regression model variance and h standardized regression coefficients. i Spatial variations in DOM geochemistry shown by principal component analysis with permutational multivariate analysis of variance (PERMANOVA). The seven regions were displayed by seven colors, including the east coast (EC), middle of Yellow River (MYR), northeast (NE), northwest (NW), south coast (SC), South-North Water Transfer (STN), southwest (SW) regions. The polygons indicate the groups of different regions. j The significance (P value) of geochemical differences among the seven regions in PERMANOVA. THM-FP and HAA-FP are trihalomethane (including TCM, BDCM, and DBCM) and haloacetic acid (including DCAA, BDCA, CBDAA, DBAA, and TBAA) precursors, respectively, and details are listed in Supplementary Table S3. DOC, dissolved organic carbon; DON, dissolved organic nitrogen; VHMW, 25–100 kDa; HMW, 3.4–25 kDa; MMW, 1.2–3.4 kDa; LMW, <1.2 kDa; A254, UV absorbance at 254 nm; BIX, biological index; FI, fluorescence index; HIX, humification index. *P < 0.05; **P < 0.01; ***P < 0.001.

The distinct patterns of the fluorescent and molecular weight composition of soil DOM for northern and southern China imply divergent effects on local or downstream ecosystems and processes. For one thing, the DOM in different regions may play varied roles in ecosystem exchange. For another, it suggests distinct influences on aquatic environments after being released into waters during rainfall and erosion events7. The DOM compounds in northern China are more recalcitrant14,43, and may impose longer-lasting negative effects on local and downstream water chemistry and drinking water production because they persist longer in the environment. In contrast, the more labile DOM fractions in southern China are more readily mineralized and would increase carbon release. They have less persistent impacts on aquatic systems and water sanitation but can increase labile nutrients in waters and enhance CO2 outgassing from the waters44. In addition, DOM in the northwestern and Qinghai-Tibetan regions was more reactive in terms of trihalomethane (THM) and haloacetic acid (HAA) formation, as displayed by the notably higher chlorine reactivity (SDBP) (Fig. 3e, f). Hence, these regions, which host the water towers of Asia45, can be sources of highly reactive DBP precursors to downstream waters.
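As a hedged illustration of what a chlorine reactivity metric can look like, the sketch below normalizes a DBP formation potential by DOC. This normalization is an assumption made here for illustration; the exact definition of SDBP used in the study is not given in this section. The inputs are the mean THM-FP, HAA-FP, and DOC concentrations reported earlier in the text, and the ratio of means is not the same as the mean of per-site ratios.

```python
def specific_reactivity(dbp_fp_mg_per_kg, doc_mg_per_kg):
    """DBP formation potential per unit DOC, in ug DBP per mg C.

    Assumed definition (hypothetical): DBP-FP divided by DOC, with both
    expressed per kg of soil so that the soil mass cancels.
    """
    if doc_mg_per_kg <= 0:
        raise ValueError("DOC must be positive")
    return 1000.0 * dbp_fp_mg_per_kg / doc_mg_per_kg


# Mean values reported in the text (mg per kg soil).
mean_doc = 199.9
mean_thm_fp = 33.7
mean_haa_fp = 4.3

s_thm = specific_reactivity(mean_thm_fp, mean_doc)
s_haa = specific_reactivity(mean_haa_fp, mean_doc)
print(f"THM reactivity: {s_thm:.1f} ug per mg C")
print(f"HAA reactivity: {s_haa:.1f} ug per mg C")
```

A DOC-normalized metric of this kind separates how reactive the DOM is from how much DOM is present, which is why a region can have less DOM overall yet higher reactivity.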

Multiple linear regression further demonstrated significant variations of DOM geochemistry with its geographical ___location (Fig. 3g and Supplementary Fig. S16). Fluorophores, size fractions, HIX, BIX, and DOM quantity, displayed a divide in the South-North orientation (P < 0.05), while the dissolved nitrogen and SDBP varied in the West-East orientation (P < 0.05) (Fig. 3h and Supplementary Fig. S16). The Yangtze River basin serves as a watershed where the characteristics of DOM varied remarkably in both South-North and West-East orientation. Interestingly, DOM in provinces along the Middle Route of the South-North Water Transfer Project (Supplementary Fig. S17) varied gradually and displayed greater <1.2 kDa and protein-like fractions with lower aromaticity (HIX) than the surrounding provinces (Fig. 1b, d and Supplementary Fig. S14).
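To make the idea of standardized regression coefficients for the South-North and West-East orientations concrete, here is a minimal sketch for the two-predictor case, using the closed-form solution in terms of pairwise Pearson correlations. The site coordinates and response values are hypothetical, and the function names are introduced only for this illustration; the study's actual model specification may differ.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def standardized_betas(lat, lon, y):
    """Standardized OLS coefficients for y ~ lat + lon, plus R squared."""
    r_lat_y = pearson(lat, y)
    r_lon_y = pearson(lon, y)
    r_lat_lon = pearson(lat, lon)
    denom = 1.0 - r_lat_lon ** 2
    if denom <= 1e-12:
        raise ValueError("latitude and longitude are collinear")
    beta_lat = (r_lat_y - r_lat_lon * r_lon_y) / denom
    beta_lon = (r_lon_y - r_lat_lon * r_lat_y) / denom
    r_squared = beta_lat * r_lat_y + beta_lon * r_lon_y
    return beta_lat, beta_lon, r_squared


# Hypothetical sites: latitude (deg N), longitude (deg E), humic-like fraction (%).
lat = [25.0, 30.0, 35.0, 40.0, 45.0, 28.0]
lon = [100.0, 115.0, 90.0, 120.0, 105.0, 110.0]
humic_pct = [61.0, 66.5, 73.0, 78.2, 84.0, 63.9]

b_lat, b_lon, r2 = standardized_betas(lat, lon, humic_pct)
print(f"beta_lat={b_lat:.2f}, beta_lon={b_lon:.2f}, R2={r2:.2f}")
```

Standardizing puts both orientations on a common scale, so the magnitudes of the two coefficients can be compared directly even though latitude and longitude span different ranges.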

The DOM quantity, DBP precursors, HMW, and humic-like fractions, showed negative PC1 loadings in the principal component analyses, in contrast with the C4, LMW, MMW, VHMW, FI, and BIX (Fig. 3i). The SC and STN regions exhibited positive PC1 scores, indicating more microbial features of DOM in these regions. In contrast, the MYR and NE regions were mainly featured by the higher abundance of plant-derived DOM. Indeed, grassland soil DOM in the southern regions (STN and SC) differed significantly from that in the northern regions (NW, MYR, and NE) (Fig. 3j).

Roles in ecosystem carbon exchange

After incubation for 31 days, soil DOM with higher humic-like and HMW fractions (mainly from the northern regions) displayed 21–35% lower biodegradability (P < 0.01), compared to that with higher protein-like and LMW fractions (mainly from the southern regions) (Fig. 4a). Furthermore, the higher humic-like and HMW fractions led to over 20% higher stable pools for the DOM (P < 0.05) (Supplementary Fig. S18). In contrast to the protein-like fractions and compounds with O/C < 0.5 and H/C > 1, the humic-like and HMW fractions, and compounds with higher O/C and lower H/C, were negatively correlated with the biodegradation of DOM but positively correlated with the stable fractions of DOM (P < 0.05) (Supplementary Figs. S19, S20). As a result, the biological decomposition of soil DOM was significantly faster because of the greater protein-like and LMW composition (P < 0.05) (Fig. 4b).
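For readers who want to see how a biodegradation level (%) and an average rate (d−1) relate, the sketch below computes both from an initial and a final concentration. It assumes single-pool first-order decay, which is a simplification chosen here for illustration; the kinetic model actually fitted in the study is not described in this section. The concentrations are hypothetical.

```python
from math import exp, log


def biodegradation_percent(c0, ct):
    """Share of the initial pool lost during incubation, in percent."""
    return 100.0 * (c0 - ct) / c0


def first_order_rate(c0, ct, days):
    """Rate constant k (per day), assuming C(t) = C0 * exp(-k * t)."""
    return -log(ct / c0) / days


# Hypothetical DOC concentrations (mg C per L) at day 0 and day 31.
c_start, c_end, duration = 100.0, 65.0, 31

loss_pct = biodegradation_percent(c_start, c_end)
k = first_order_rate(c_start, c_end, duration)
print(f"biodegraded: {loss_pct:.1f}%, k = {k:.4f} per day")
```

Under this assumption, a DOM pool that loses a smaller percentage over the same 31 days necessarily has a smaller rate constant, consistent with the slower decomposition described for humic-like, HMW-rich DOM.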

Fig. 4: Association between dissolved organic matter (DOM) composition vs ecosystem exchange and human health.

a Biodegradation level (%) at day 31 and b average biodegradation rate (d−1) of DOM during incubation. DOC dissolved organic carbon, CDOM chromophoric DOM, FDOM fluorophoric DOM. The value of the biodegradation rate constant is shown by mean ± standard error in the bar plot. c Comparisons between Rs and NEP vs the percent relative abundance (%) of humic-like and 3.4–25 kDa components. Rs soil respiration, NEP net ecosystem productivity. d, e Spatial distribution and north-south difference of d nasopharynx and e pancreas cancer incidence (100,000−1). The north represents the sites with latitudes above 33°, and the south represents those with latitudes lower than 33°. The Wilcoxon test was applied to examine the difference between the north and south. f, g Relationships between nasopharynx cancer incidence and f <1.2 kDa and g protein-like fractions. h, i Relationships between pancreatic cancer incidence and h 3.4–25 kDa and i humic-like fractions. ASR_I, age-standardized rate of cancer incidence. The fitting line is based on regression analysis (including linear, logarithmic, and exponential regression), with a line indicating mean model fit ±95% confidence interval. *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001.

On this basis, the association between the grassland soil DOM observed in the dry season and mean annual ecosystem carbon fluxes was examined to reveal the roles of the DOM in ecosystem exchange at the continental scale. DOM quantity is positively associated with soil organic matter (P < 0.01) (Supplementary Fig. S21). In this respect, although DOM is a small portion of organic matter in soil24, it is still associated with soil carbon storage. The 3.4–25 kDa and humic-like fractions were negatively associated with soil respiration (Rs) and net ecosystem productivity (NEP) (P < 0.05) (Fig. 4c and Supplementary Fig. S22). In contrast, the relative abundance of <1.2 kDa and protein-like compounds was positively correlated with these ecosystem carbon fluxes (P < 0.01) (Supplementary Fig. S22). Consequently, given the spatial differences in the molecular composition (Fig. 3), the grassland soil DOM present in the dry season plays distinct roles in annual ecosystem carbon exchange in the northern and southern regions of China. The LMW and proteinaceous portions are associated with less aromatic and less persistent molecules that are more readily available to microbes (Supplementary Figs. S11, S20). Preferential mineralization of these labile components during microbial metabolism can therefore accelerate carbon emission (Fig. 4b), and balance the higher ecosystem productivity in the southern and eastern regions of China46. However, the HMW and humic-like components originating from plant sources are the products of atmospheric carbon and are more recalcitrant to biodegradation (Supplementary Figs. S11, S20). These compounds can suppress carbon loss (Fig. 4a), and thus conserve ecosystem productivity in northern and western China46.

Potential linkage with human health

The age-standardized rate of cancer incidence (ASR_I) and mortality (ASR_M) of the nasopharynx, pancreas, and kidney and urinary organs cancers varied with the latitude, and differed significantly from the southern to the northern regions of China (P < 0.05) (Fig. 4d, e and Supplementary Figs. S23, S24), in line with the spatial distribution of grassland soil DOM composition in the dry season (Fig. 3). Regression analysis further demonstrated that higher LMW and protein-like fractions were correlated with higher nasopharynx cancer incidence and mortality (P < 0.01), while HMW and humic-like components were positively related to the pancreas as well as kidney and urinary organs cancer cases (P < 0.05) (Fig. 4h, i and Supplementary Figs. S23, S24).
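The Fig. 4 caption describes a north-south split at 33° latitude compared with a Wilcoxon test. A minimal sketch of that kind of comparison follows, using the Wilcoxon rank-sum (Mann-Whitney U) statistic with a two-sided normal approximation. The tie correction is omitted and exact small-sample p-values would differ, so this is an approximation only; the site values are hypothetical and sites exactly at the threshold are excluded from both groups.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def split_by_latitude(sites, threshold=33.0):
    """Split (latitude, value) pairs into north (> threshold) and south (< threshold)."""
    north = [v for lat, v in sites if lat > threshold]
    south = [v for lat, v in sites if lat < threshold]
    return north, south


def rank_sum_test(group_a, group_b):
    """Mann-Whitney U of group_a and a two-sided normal-approximation p-value."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)  # no tie correction
    z = (u_a - mean_u) / sd_u
    return u_a, erfc(abs(z) / sqrt(2))


# Hypothetical (latitude, age-standardized incidence per 100,000) pairs.
sites = [(45.2, 1.1), (40.1, 0.9), (38.5, 1.4), (36.0, 1.0),
         (30.2, 6.3), (26.7, 8.8), (23.1, 12.5), (28.9, 5.2)]
north, south = split_by_latitude(sites)
u_stat, p_value = rank_sum_test(north, south)
print(f"U = {u_stat}, approximate two-sided p = {p_value:.3f}")
```

A rank-based test is a natural choice here because incidence rates are typically skewed and group sizes are modest, so it avoids assuming normally distributed rates.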

The associations could be partly ascribed to two reasons: (i) Some organic compounds, such as hydrophilic proteinaceous and large-sized humic compounds, might be harmful to human cells, as reported in many bioassays11,25,26,47; (ii) DOM that interacts with other toxic chemicals or pollutants, such as arsenic and fluoride, may cause adverse health outcomes12,48,49. Previous studies also related the unique DOM compounds (e.g., humic-like) in natural environments to endemic diseases (e.g., chronic kidney disease)9,10. Moreover, some particular organic components (e.g., tryptophan and tyrosine protein-like) were found in pathological tissues13,50,51,52,53. Therefore, since the causative factors of endemic nasopharynx, pancreas, and kidney and urinary organ cancers are largely unknown, the significant correlations suggest that the soil DOM could play an indicative role. Moreover, the endemic cancers of unknown etiology were related to the DOM in source waters9,12. Similarly, we found a significant correlation between grassland soil DOM and the organic carbon concentration of local tap water (P < 0.05) (Supplementary Fig. S25). Thus, exposure to organic carbon (0–7 mg-C·L−1) in drinking water may be partially responsible for the potential associations between soil DOM and cancer cases in China16.

Given the connections between soil, source water, and drinking water, the associations between soil DBP precursors and cancer cases were examined. Trihalomethane precursors (THM-FP) were positively correlated with liver cancer cases, while haloacetic acid precursors (HAA-FP) were positively correlated with additional kinds of cancers concerning pancreas, kidney and urinary organs and bladder (P < 0.05) (Supplementary Figs. S26, S27). Exposure to toxic DBPs (including THMs and HAAs) in drinking water was indeed associated with increased risks of liver, pancreatic, kidney, and bladder cancers previously16,54,55,56. Thus, the formation of harmful byproducts in drinking water may be another potential pathway linking the DOM to local human health.

Lastly, we must emphasize that our findings, based on statistical significance, only highlight the potential association between soil DOM and human health rather than proposing that DOM is a pathogenic factor. In this context, the grassland DOM geochemistry at the continental scale in the dry season could serve as a geographical indicator of endemic cancers. Moreover, the results provide new insights into the screening and indication of environmental risk factors for endemic diseases of unknown etiology, and further studies are required to reveal the underlying mechanisms.

Environmental controls over the DOM

Random forest (RF) analysis was first conducted to reveal the importance of environmental factors to predict DOM features. Edaphic properties, including soil organic carbon (SOC) and total nitrogen (TN), were key predictors for DOC, DON, THM-FP, and HAA-FP (P < 0.05) (Fig. 5a, and Supplementary Fig. S28), while DOM composition (LMW, HMW, C2, and C4) was primarily explained by climate parameters (Fig. 5b, c). The climate, vegetation, and soil factors largely drive the variations in soil DOM across China (Fig. 5d, e). Among them, both climate (temperature, MAT; precipitation, MAP; and solar radiation, Srad) and soil (soil moisture, pH, clay fraction, and Ca content) parameters had negative effects on DOM quantity (DOM) and quality (fluorescence composition, FC; and molecular weight fractions, MW) (Fig. 5d, e and Supplementary Fig. S29). Compared to the southern China region, the northern China region overall had relatively lower MAT, MAP, Srad, net primary production, and soil moisture and clay fraction but higher soil mineral ions57,58,59,60, which contribute to the accumulation of HMW and humic-like fractions (Supplementary Fig. S29). Consequently, the differences in these environmental factors were responsible for the greater humic-like and HMW composition for the northern regions and the greater protein-like and LMW fractions for the southern regions. Also, the distinct DOM properties along the South-North Water Transfer Project may partly result from the altered climate (evaporation and precipitation) at the regional scale61.
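Predictor importance in random forest regression is commonly reported as the increase in mean square error after a predictor's values are permuted. The sketch below shows that permutation idea in a model-agnostic form; it is not a random forest and does not reproduce the study's analysis. The stand-in model, the predictor columns, and the numbers are all hypothetical, and the absolute increase in MSE is returned rather than a percentage.

```python
import random


def mse(predict, rows, targets):
    """Mean square error of a prediction function over a dataset."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(predict, rows, targets, column, seed=0):
    """Increase in MSE after shuffling one predictor column."""
    rng = random.Random(seed)
    baseline = mse(predict, rows, targets)
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    permuted_rows = []
    for row, value in zip(rows, shuffled):
        new_row = list(row)
        new_row[column] = value  # break the link between this predictor and the target
        permuted_rows.append(new_row)
    return mse(predict, permuted_rows, targets) - baseline


# Hypothetical predictors per site: [SOC (g/kg), MAT (deg C)].
rows = [[12.0, 2.1], [25.0, 5.4], [8.0, 14.9], [31.0, 0.3],
        [17.0, 9.8], [22.0, 7.2], [5.0, 18.5], [28.0, 3.6]]


def stand_in_model(row):
    """Stand-in for a fitted model (hypothetical): DOC depends only on SOC."""
    return 10.0 * row[0]


targets = [stand_in_model(r) for r in rows]

soc_importance = permutation_importance(stand_in_model, rows, targets, column=0)
mat_importance = permutation_importance(stand_in_model, rows, targets, column=1)
print(f"SOC: {soc_importance:.1f}, MAT: {mat_importance:.1f}")
```

A predictor the model does not rely on yields no increase in error when shuffled, whereas shuffling an influential predictor degrades the predictions, which is the logic behind ranking environmental drivers this way.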

Fig. 5: Environmental drivers for the geochemistry and associated impacts of soil-dissolved organic matter (DOM).

a–c The importance of environmental predictors for a DOM quantity, b size fraction, and c fluorescence composition based on random forest regression analysis. An increase in the mean square error (MSE) denotes the higher importance of environmental variables. All models were significant at P < 0.01. DOC dissolved organic carbon, DON dissolved organic nitrogen, HMW 3.4–25 kDa, LMW <1.2 kDa, MAT mean annual temperature, MAP mean annual precipitation, Srad surface solar radiation, PET potential evapotranspiration, LAI leaf area index, NPP net primary production, GPP gross primary production, SM soil moisture, Clay soil clay fraction, pH soil pH, CEC cation exchange capacity of soil, Na, Mg, and Ca soil exchangeable sodium, potassium, and calcium, respectively, SOC soil organic carbon, TN soil total nitrogen, TP soil total phosphorus. d, e Partial least-squares path model (PLS-PM) showing the relationships between DOM vs d ecosystem carbon exchange and e local cancer incidence and mortality, under the influence of environmental variables. Rs soil respiration, NEP net ecosystem productivity. A detailed description of the latent variables in the model is available in Supplementary Table S2. The purple and red lines in PLS-PM denote significant positive and negative relationships between variables, respectively. PLS-PM are built after 999 bootstraps and only significant (P < 0.05) paths are shown. Numbers in the arrows are standard path coefficients, and the determination coefficients (R2) are the variance explained by the model. GoF goodness of fit. *P < 0.05; **P < 0.01; ***P < 0.001.

The DOM is regulated by environmental factors, and its variation, in turn, controls the roles in ecosystem exchange and is linked with local human health. On the one hand, the grassland soil DOM in the dry season and environmental variables explained 15 and 51% of the variance (R2) of annual soil respiration (Rs) and net ecosystem productivity (NEP), respectively (Fig. 5d). Under the control of environmental drivers, Fluorescence and MW fraction mainly had direct negative effects on Rs and NEP, respectively (Fig. 5d and Supplementary Fig. S30). Increasing temperature and extreme precipitation are projected because of climate change, which are associated with a decrease in soil DOM (Fig. 5d). However, the composition of DOM (Fluorescence and MW fraction) can be modulated to decrease soil respiration, increase DOM storage and balance net ecosystem productivity. Long-term grassland management (e.g., grassland conservation and restoration) helps to increase plant biomass and soil clay minerals, which can enhance the ratio of plant-derived components to microbial-derived components and thus restrain soil respiration and increase soil carbon sequestration62,63,64. On the other hand, the combined variables explained 55 and 90% of the variance of cancer incidence and mortality, respectively (Fig. 5e). Statistically, the DOM quantity (DOM) was significantly associated with cancer mortality, and the fluorescent composition was significantly related to both cancer incidence and mortality (Fig. 5e and Supplementary Fig. S30). This further strengthened the results regarding the potential linkages between the grassland soil DOM stored in the dry season and cancer incidence and mortality. Therefore, our results have important implications for indicating the spatial distribution of endemic cancers, even if DOM is variable under the influences of environmental factors.

Collectively, based on continental-scale sampling, we depict the spatial patterns, drivers, and environmental functions of DOM in grassland soils. The geochemistry of the DOM varied across geoclimatic zones and displayed a pronounced South-North divide. Under variable environmental scenarios, LMW proteinaceous fractions and HMW humic-like fractions of soil DOM in grasslands play distinct roles in local ecosystem carbon exchange and are linked to different human health conditions in China. Our results provide new insights into the ecosystem processes of grassland soil DOM, which are fundamental to understanding the environmental functions of terrestrial organic matter in wider processes (Fig. 6). It should be emphasized that the samples in this study were collected in the dry season, and soil DOM may vary seasonally. Our results primarily represent the link between grassland soil DOM in the dry season and the annual ecosystem exchanges and cancer cases, which should be taken into consideration when interpreting the results. Furthermore, microbes play a role in regulating the composition of DOM. Therefore, to fully establish the connections between soil DOM throughout the year and the environmental drivers and functions, observations in other seasons and biotic factors should be considered in further studies.

Fig. 6: Conceptual framework for the key compounds and processes concerning dissolved organic matter (DOM) from surface soil in grassland ecosystems.

DOM (mainly in the form of dissolved organic carbon flow) in surface soil (0–20 cm) of grassland is mainly derived from plant sources and undergoes transport and transformation77. First, as a portion of soil organic matter, DOM compounds are processed and mineralized by microbes. Plant-derived humic-like compounds with large molecular mass are more persistent; they are degraded much more slowly and contribute more to the carbon sink78. Proteinaceous compounds with low or very high molecular weight, mainly derived from microbial activities (e.g., processing of humic-like compounds), are readily accessible to microbes and can lead to greater carbon emissions. Second, in addition to migrating downwards, DOM can be delivered into aquatic ecosystems via hydrological processes (e.g., run-off or erosion events)77. The intruded DOM can alter water chemistry and increase CO2 outgassing7. Moreover, soil DOM in source water supplying drinking water can be ingested by humans directly or in the form of harmful byproducts (e.g., disinfection byproducts), and thus is potentially linked to human health.

From a sustainability viewpoint, enriching organic nutrients to foster plant growth is a natural and anticipated aspect of grassland ecosystems. Although some soil DOM compounds may be related to undesirable environmental impacts, it should be clarified that the complete geochemical cycling of DOM in soils can benefit soil health and environmental stability. On this basis, the environmental functions based on soil DOM geochemistry should be utilized for ecosystem and health benefits: (i) To increase the carbon sequestration potential of grassland soils, reducing overgrazing, restoring vegetation, and decreasing the land use intensity of natural grassland are required64. These management practices contribute to increasing the input of plant-derived compounds to soils, stabilizing the humic-like components in soils, and mitigating DOM loss caused by biodegradation and erosion63. This is particularly relevant because an increasing loss of organic carbon from grassland soil is projected in the context of climate change64,65. (ii) Given the connections between soil, source water, drinking water, and human health (Fig. 6), the potential adverse effects related to soil DOM in grasslands and broader ecosystems and processes should be minimized via ecological management or engineering measures. For example, in agricultural production, stabilizing organic fertilizers in soils is necessary to suppress the loss of mobile compounds and thus reduce their migration into aquatic environments. Further, targeted treatment processes should be applied to remove the associated harmful compounds and prevent the formation of undesirable derivatives (e.g., disinfection byproducts) during drinking water production16. (iii) In addition, the linkages between DOM and cancer cases could be a crucial roadmap for further studies to fill the knowledge gaps in the identification, prediction, and indication of these endemic cancers of unknown etiology.

Methods

Study sites and soil sampling

Soil samples were collected in May 2021 from 89 geographically dispersed sites of natural grasslands in 30 provinces of China (excluding Hainan, Hong Kong, Macao, and Taiwan) (Supplementary Fig. S1). Because we aimed to investigate the geochemical features and environmental functions of dissolved organic compounds present in grassland soils, the samples were collected prior to the wet season to avoid the substantial loss of mobile compounds from soils during rainfall and erosion events. The sample areas covered the major Chinese climate, soil, and landscape conditions with diverse grassland communities to provide a continental-scale inference for our results. Sample sites were determined based on the spatial distribution of China’s grassland. Soil samples were collected from grasslands undisturbed or less disturbed by direct human activities based on historical data, field surveys, and inquiries with local residents. At each sample site, five soil cores were collected and mixed homogeneously into a composite soil sample. For regions with a large grassland area, soil cores were collected at 15 m intervals along the diagonal of a 50 m × 50 m square plot in the grassland. For regions with a small grassland area, soil cores were collected from five dispersed grasslands at each sample site. DOM was extracted and analyzed from the composite soil sample, which represents the value for each sample site. We focused on the topsoil DOM in this study, as organic carbon in the surface soil accounts for more than 40% of that in the 0–1 m soil depth in grasslands and is more susceptible to mobilization and transformation66. Thus, soil samples at 0–20 cm depth were collected, followed by the removal of soil fauna, stones, and plant litter. All soil samples were air-dried, ground, and passed through a 2 mm sieve.

Regions of China and position of sites

The study area was divided into seven regions to compare spatial differences in soil DOM (Supplementary Fig. S2): the east coast (EC), middle of Yellow River (MYR), northeast (NE), northwest (NW), south coast (SC), South-North Water Transfer (STN), and southwest (SW) regions. The relative position of each sample site was expressed as a South-North and a West-East orientation, calculated from the latitude and longitude coordinates as follows:

$$\text{West-East orientation}=\frac{135^{\circ}05^{\prime}-X}{135^{\circ}05^{\prime}-73^{\circ}33^{\prime}} \tag{1}$$
$$\text{South-North orientation}=\frac{53^{\circ}33^{\prime}-Y}{53^{\circ}33^{\prime}-3^{\circ}51^{\prime}} \tag{2}$$

where X and Y are the longitude and latitude of the sample site, respectively; 53°33′ and 3°51′ are the latitudes of the northernmost and southernmost boundaries of China, respectively; and 135°05′ and 73°33′ are the longitudes of the easternmost and westernmost boundaries of China, respectively.
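As an illustration of Eqs. (1) and (2), the following minimal sketch converts the boundary coordinates quoted above to decimal degrees and computes both orientations. The function names are hypothetical and are not taken from the study's own workflow; site coordinates are assumed to be supplied in decimal degrees.

```python
def dm_to_decimal(degrees, minutes):
    """Convert degrees and arc-minutes to decimal degrees."""
    return degrees + minutes / 60.0


# Boundary coordinates of China as quoted in the text.
EAST = dm_to_decimal(135, 5)    # easternmost longitude, 135 deg 05'
WEST = dm_to_decimal(73, 33)    # westernmost longitude, 73 deg 33'
NORTH = dm_to_decimal(53, 33)   # northernmost latitude, 53 deg 33'
SOUTH = dm_to_decimal(3, 51)    # southernmost latitude, 3 deg 51'


def west_east_orientation(lon):
    """Eq. (1): 0 at the easternmost boundary, 1 at the westernmost."""
    return (EAST - lon) / (EAST - WEST)


def south_north_orientation(lat):
    """Eq. (2): 0 at the northernmost boundary, 1 at the southernmost."""
    return (NORTH - lat) / (NORTH - SOUTH)


if __name__ == "__main__":
    # Hypothetical site coordinates (decimal degrees), for illustration only.
    print(west_east_orientation(104.0), south_north_orientation(35.0))
```

By construction, both indices range from 0 to 1 within China's boundaries, so larger values indicate sites located further west or further south.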

Chemical analysis of water extracts

To avoid the effects of spatial and temporal differences in moisture between soils and to compare the DOM geochemistry in different grassland ecosystems67, DOM was extracted from air-dried soils, and thus, the water-extractable organic matter was used to represent soil DOM in this study. Specifically, soil DOM was extracted by shaking (220 rpm and 25 °C) 50 g soil in 500 ml Milli-Q water for 24 h. The soil suspension was thereafter centrifuged at 10,000×g and filtered through pre-combusted glass fiber filters (0.7 μm, GF/F, Whatman). The water samples were stored in a glass vial in a refrigerator (<4 °C) for subsequent analysis.

The pH, electrical conductivity (ED), and redox potential (Eh) of the water extracts were determined by pH, ED (METTLER TOLEDO, Switzerland), and Eh (501, REX, China) meters, respectively. To reduce electromagnetic interference in the laboratory, the redox potential of samples was measured in a fixed place, away from wires and other electrical appliances68. All the electrodes were washed with concentrated nitric acid and recalibrated before testing and after every ten measurements. Dissolved organic carbon (DOC) and total dissolved nitrogen (TDN) were analyzed by a TOC-L CSH/CSN (Shimadzu, Japan) analyzer. Inorganic nitrogen, including ammonium (NH4+) and nitrate and nitrite (NOx), was measured by a continuous flow analyzer (SAN++, Skalar, Holland), and dissolved organic nitrogen (DON) was calculated by subtracting inorganic nitrogen from TDN.

Samples were chlorinated to examine the disinfection byproduct formation potential (DBP-FP) and chlorine reactivity (SDBP) of DOM. Samples were diluted to DOC ~3 mg·L−1 and buffered to pH 8.0 with 3 mM NaH2PO4/Na2HPO4 solution, followed by the addition of NaOCl solution at a dose of [Cl2] = 3 × [DOC] + 7.6 × [NH3]69. The samples were then incubated at 25 °C in the dark with no headspace for the chlorination reaction70. The residual chlorine was quenched with 0.5 M ascorbic acid after 72 h. Disinfection byproducts (DBPs) were extracted and analyzed by gas chromatography with an electron capture detector (Clarus 590, PerkinElmer, USA) based on US EPA Methods 551.1 and 552.3 (details are described in Supplementary Text S2)71,72. After chlorination of the DOM solution, trihalomethanes (THMs) and haloacetic acids (HAAs) were the major detectable DBPs. The specific DBP compounds are listed in Supplementary Table S3. The DBP formation potential (DBP-FP; mg·kg-soil−1), representing the abundance of DBP precursors in soil, was calculated as the ratio of the DBP content formed by the water extract to the mass of soil, while the chlorine reactivity of DOM was denoted as the specific DBP formation potential (SDBP; mg·g-DOC−1), which was calculated by normalizing the DBP concentration to the DOC concentration.
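The dosing rule and the two normalizations described above can be sketched as follows. This is an illustrative calculation under stated assumptions rather than the study's actual processing script: concentrations are assumed to be in mg·L−1 for DOC, NH3, and Cl2 and in µg·L−1 for DBPs, the default extract volume and soil mass follow the 50 g soil in 500 ml extraction described earlier, and the dilution factor and all function names are hypothetical.

```python
def chlorine_dose(doc_mg_l, nh3_mg_l):
    """Chlorine dose from the rule [Cl2] = 3 x [DOC] + 7.6 x [NH3]."""
    return 3.0 * doc_mg_l + 7.6 * nh3_mg_l


def sdbp_mg_per_g_doc(dbp_ug_l, doc_mg_l):
    """Specific DBP formation potential: DBP normalized to DOC.

    ug DBP per mg DOC is numerically identical to mg DBP per g DOC.
    Both concentrations refer to the same (diluted) chlorinated sample.
    """
    return dbp_ug_l / doc_mg_l


def dbp_fp_mg_per_kg_soil(dbp_ug_l, dilution_factor,
                          extract_volume_l=0.5, soil_mass_kg=0.05):
    """DBP formation potential per unit soil mass.

    dilution_factor (assumed input): how many times the original water
    extract was diluted before chlorination.
    """
    dbp_mass_mg = dbp_ug_l * dilution_factor * extract_volume_l / 1000.0
    return dbp_mass_mg / soil_mass_kg


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(chlorine_dose(3.0, 0.5))
    print(sdbp_mg_per_g_doc(150.0, 3.0))
    print(dbp_fp_mg_per_kg_soil(150.0, dilution_factor=2.0))
```

Separating the two functions mirrors the distinction drawn in the text: DBP-FP scales with how much precursor the soil releases, whereas SDBP depends only on how reactive each unit of DOC is toward chlorine.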

DOM characterization

UV-visible absorbance was analyzed by a UV-visible spectrophotometer (UV-2600, Shimadzu, Japan). A three-dimensional excitation-emission matrix (3D-EEM) of DOM was obtained by a fluorescence spectrophotometer (F-4600, HITACHI, Japan). The UV absorbance at 254 nm (A254), humification index (HIX), biological index (BIX), and fluorescence index (FI) were calculated to show DOM characteristics. A254 and total fluorescence intensity were applied to represent the quantity of chromophoric (CDOM) and fluorophoric (FDOM) DOM, respectively. Greater HIX reflects DOM with higher molecular weight and aromaticity, while greater BIX indicates recently produced DOM with higher freshness73. FI reflects whether the DOM was derived from allochthonous sources (~1.2, plant and soil organic matter) or autochthonous sources (~1.8, bacteria and algae byproducts)73. To further quantify and characterize the composition of fluorescent DOM, parallel factor analysis (PARAFAC) was performed with the DOMFluor toolbox in MATLAB. A four-component model was constructed and validated in this study (Supplementary Table S1), which is comparable to the fluorophores in 63 independent studies in the OpenFluor database (Tucker Congruence Coefficient >0.95). Detailed information regarding the processing of EEM data, calculation of the optical indices, and PARAFAC modeling is provided in Supplementary Text S3.

High-performance size exclusion chromatography (HPSEC), employing an HPLC system (Waters, United States) and a UV detector (Phenomenex, UK), was used to determine the apparent molecular weight (MW) distribution of DOM. To characterize the MW composition of DOM, the apparent MW was divided into four MW fractions based on the peak position: very high (VHMW, 25–100 kDa), high (HMW, 3.4–25 kDa), medium (MMW, 1.2–3.4 kDa) and low (LMW, <1.2 kDa) molecular weight fractions. The relative abundance of each fraction was calculated as the ratio of the intensity integral of each MW interval to the total intensity of the four MW fractions.
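The fraction assignment described above can be sketched as a simple binning step. The sketch below assumes that the chromatogram has already been calibrated to apparent MW and sampled at uniform elution-time steps, so that summing intensities within an interval is proportional to its integral. The handling of values falling exactly on a boundary and the function names are assumptions, not details given in the text.

```python
def classify_mw(mw_kda):
    """Assign an apparent MW (kDa) to one of the four fractions.

    Returns None for values above 100 kDa, which lie outside the four
    intervals. Boundary handling (e.g. exactly 3.4 kDa) is an assumption.
    """
    if mw_kda < 1.2:
        return "LMW"
    if mw_kda < 3.4:
        return "MMW"
    if mw_kda < 25.0:
        return "HMW"
    if mw_kda <= 100.0:
        return "VHMW"
    return None


def mw_fraction_abundance(points):
    """Relative abundance of each fraction from (mw_kda, intensity) pairs."""
    totals = {"VHMW": 0.0, "HMW": 0.0, "MMW": 0.0, "LMW": 0.0}
    for mw_kda, intensity in points:
        label = classify_mw(mw_kda)
        if label is not None:
            totals[label] += intensity
    grand_total = sum(totals.values())
    if grand_total == 0.0:
        raise ValueError("no signal within the four MW intervals")
    return {label: value / grand_total for label, value in totals.items()}


if __name__ == "__main__":
    # Hypothetical chromatogram points, for illustration only.
    demo = [(50.0, 1.0), (10.0, 2.0), (2.0, 3.0), (0.5, 4.0)]
    print(mw_fraction_abundance(demo))
```

Because each abundance is normalized to the total of the four fractions, the values sum to one regardless of the absolute signal intensity of a sample.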

The molecular composition of fourteen selected samples was further analyzed using Fourier-transform ion cyclotron resonance mass spectrometry (FT-ICR MS). Prior to analysis, solid phase extraction (SPE) was conducted using Bond Elut PPL cartridges (Agilent; 200 mg, 3 mL). Then, a 15 T Bruker SolariX FT-ICR MS coupled to an electrospray ionization (ESI) interface was applied to analyze the samples. Peaks with a signal/noise (S/N) ratio greater than four were assigned molecular formulae following the elemental combination of C1-100H1-250O1-100N0-3S0-1. Detailed information on the SPE processes, FT-ICR MS analysis, and the results is given in Supplementary Text S4 and Table S4.

DOM incubation

To avoid the influence of other complex processes in soils on DOM biodegradation, DOM was extracted from the soils and then incubated5,74,75. Fourteen DOM samples (seven each from northern and southern China) were incubated at 20 °C in the dark after the addition of a microbial inoculum36. The inoculum was first prepared as follows: (1) 8 g of sieved (2 mm) fresh soil from each of the 14 soils and 500 mL Milli-Q water were mixed and incubated at 25 °C in the dark; (2) after 2 weeks, the upper layer suspension was collected and used as the microbial inoculum. Subsequently, the inoculum and DOM solution (at a ratio of 1:100) were added into a pre-combusted 3-L glass bottle filled with 1 kg of cleaned and pre-combusted quartz sand for microbial attachment. The C:N:P stoichiometry was amended to 106:16:1 with KNO3 and KH2PO4 to avoid possible nutrient limitation. All bottles were capped with parafilm, and the solutions were stirred twice daily for 2 min each time. After 31 days, DOM samples were analyzed and the biodegradation rate (% of initial concentration) was calculated to show the biodegradability of DOM. Furthermore, one-phase association and two-phase association models were fitted in GraphPad Prism 9 to quantify the biodegradation rate constant (d−1) and the labile and stable fractions (%) of DOM (including DOC, CDOM, and FDOM)36, respectively.
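For readers who wish to reproduce the general idea without GraphPad Prism, the sketch below computes the percentage loss and fits a one-phase association curve of the form Y = Y0 + (Plateau − Y0)(1 − e^(−Kt)) with Y0 fixed at zero. It is only an approximation of the approach: a coarse grid search over K with a closed-form plateau is swapped in for Prism's nonlinear regression, the sampling days in the usage example are hypothetical, and the two-phase model is not covered.

```python
import math


def biodegraded_percent(initial, final):
    """Loss over the incubation as a percentage of the initial concentration."""
    return 100.0 * (initial - final) / initial


def one_phase_association(t, plateau, k, y0=0.0):
    """One-phase association curve: y0 + (plateau - y0) * (1 - exp(-k t))."""
    return y0 + (plateau - y0) * (1.0 - math.exp(-k * t))


def fit_one_phase(times, losses, k_grid=None):
    """Least-squares fit of plateau and k (per day) with y0 fixed at 0.

    For each candidate k, the best plateau has a closed form because the
    model is linear in the plateau; k itself is found by grid search.
    This is a stand-in for a proper nonlinear regression routine.
    """
    if k_grid is None:
        k_grid = [i / 1000.0 for i in range(1, 1001)]  # 0.001 to 1.0 per day
    best = None
    for k in k_grid:
        basis = [1.0 - math.exp(-k * t) for t in times]
        denom = sum(b * b for b in basis)
        if denom == 0.0:
            continue
        plateau = sum(y * b for y, b in zip(losses, basis)) / denom
        sse = sum((y - plateau * b) ** 2 for y, b in zip(losses, basis))
        if best is None or sse < best[0]:
            best = (sse, plateau, k)
    if best is None:
        raise ValueError("at least one time point must be greater than zero")
    return best[1], best[2]


if __name__ == "__main__":
    # Hypothetical sampling days and synthetic losses, for illustration only.
    days = [0, 1, 3, 7, 14, 21, 31]
    synthetic = [one_phase_association(t, 40.0, 0.1) for t in days]
    print(fit_one_phase(days, synthetic))
```

In this formulation the fitted K plays the role of the biodegradation rate constant (d−1), and the plateau is the asymptotic percentage loss toward which the curve levels off.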

Data sources

Environmental factors, including mean annual temperature (MAT), mean annual precipitation (MAP), potential evapotranspiration (PET), surface solar radiation (Srad), leaf area index (LAI), gross primary production (GPP), net primary production (NPP), edaphic variables (including soil moisture, ‘SM’; soil clay fraction, ‘Clay’; soil pH; cation exchange capacity, ‘CEC’; soil exchangeable sodium, potassium and calcium, ‘Na’, ‘K’ and ‘Ca’; soil organic carbon, ‘SOC’; soil total nitrogen, ‘TN’; soil total phosphorus, ‘TP’), net ecosystem productivity (NEP), and soil respiration (Rs) were collected to examine their associations with the quantity and quality of grassland soil DOM. Details on the environmental factor datasets are provided in Supplementary Text S5 and Table S5. All the datasets were extracted to site-specific values using ArcGIS (v 10.6), and data spanning several years were averaged to mean annual values. The cancer incidence and mortality data at the county scale were obtained from the 2019 China Cancer Registry Annual Report76. To assess the association between grassland soil DOM and local drinking water quality, the total organic carbon (TOC) concentration of tap water for each province of China was collected from a previous study16.

Statistical analysis

The differences between DOM components and biodegradability were assessed by the Wilcoxon test. Spearman correlation was used to examine the univariate relationships between DOM geochemistry and environmental factors, and between optical properties, molecular composition, and biodegradability. Regression analysis was employed to test the relationships between DOM geochemistry and the position of sites, and between DOM and ecosystem carbon and cancer cases using SPSS 18.0 (IBM, USA). Variation partitioning analysis (VPA) (R package vegan) was conducted to quantify how much of the variation in DOM quantity was explained by the size fractions and the fluorescence composition. Principal component analysis (PCA) was conducted using the R package vegan, with permutational multivariate analysis of variance (PERMANOVA) applied to examine the significance of groups. Random forest regression (RF) analyses (R package randomForest) were conducted to explore the key predictors of DOM quantity and quality. Partial least-squares path model (PLS-PM) analysis (R package plspm) was applied to reveal the control-response mechanism between environmental factors, DOM, and its environmental functions. The final model was selected from all constructed models based on the goodness of fit (GoF) index, which indicates model performance.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.