Introduction

As a vital part of water resources, groundwater is indispensable for water supply in daily life, agricultural production and industrial activities worldwide1,2,3. This is especially true in semi-arid and arid plains with scarce surface water resources, large populations, and dense agricultural and industrial activities4,5,6. It is estimated that millions of people suffer from diseases related to the chemical quality of drinking water in water-scarce regions around the world7,8. In this context, it is of great significance to perform in-depth studies on the hydrogeochemical characteristics, spatial evolution and genesis of groundwater quality to ensure safe water supply and sustainable groundwater resource utilization in drought-prone regions globally.

Hydrogeochemical composition is the foundation that determines the suitability of groundwater quality for diverse uses9,10. Numerous processes and factors can influence and even control groundwater hydrogeochemical composition11,12. Generally, natural factors establish the initial framework of groundwater chemical composition13,14,15. Groundwater inherits its basic physicochemical characteristics from the recharge water16,17. Various hydrological processes and hydrogeochemical interactions then further shape the hydrogeochemical features of groundwater along its flow path18,19. Substances produced by geogenic processes enter and become enriched in groundwater during this process, making it beneficial or harmful to the health of water consumers20,21,22. Groundwater chemistry exhibits spatial differences due to differing climatic conditions, lithologies of the surrounding overburden or rock, and residence times in the aquifer23,24,25. This makes groundwater hydrogeochemical features variable and complex in nature.

In addition to natural factors, the rapid development of human society, including urbanization, industrialization and agriculture, has caused great external disturbances to groundwater chemistry in many countries26,20. These disturbances can be direct inputs of substances into aquifers, or indirect influences on groundwater hydrogeochemical composition through changes to the aqueous hydrogeochemical environment (e.g., changes in reduction-oxidation (redox) potential)28,29,30. Anthropogenic disturbances vary in both degree and space among aquifers, resulting in much more complex groundwater chemistry and genesis than under natural conditions31,32,33. Comprehensive knowledge of the spatial distribution of groundwater chemistry driven by natural and anthropogenic factors can provide important support for the rational utilization and scientific management of this invisible water resource beneath the ground surface34,35.

To reveal and distinguish the various factors governing the spatial distribution of groundwater chemistry, multivariate analysis approaches such as factor analysis, principal component analysis, discriminant analysis and cluster analysis have commonly been employed to interpret groundwater geochemistry data36,37,38. Although these conventional linear dimensionality reduction techniques can simplify complex measured data without losing too much information and project them onto a low-dimensional space39,40,41, they have often led to inaccurate or even erroneous interpretations of groundwater chemistry36. This is attributed to the nonlinearity and high dispersion of groundwater physicochemical data. Thus, more capable techniques should be considered for the interpretation of complex groundwater chemistry, especially when it is driven by natural and anthropogenic factors simultaneously. The self-organizing map, a typical artificial neural network, is a powerful non-linear technique for transforming a complex high-dimensional dataset and visualizing it in a low-dimensional space42,43,44. It has been widely applied in various fields of earth science, e.g., meteorology, geomorphology, geo-disasters, hydrology and groundwater science44,45,46,47,48. The self-organizing map is robust and well suited to handling the potential nonlinearity, noise and irregularity of groundwater hydrogeochemical data1,49,36.

A piedmont plain is one of the most important geomorphic units for human community and regarded as the ideal area for various human activities involving urbanization, industrialization, agricultural activity, as well as groundwater exploitation for field construction51,52,53,54. It is also one of the most interactive regions between nature and human society, thus, potentially with complex groundwater hydrogeochemical characteristics and mechanisms55,56. A piedmont plain of the North China Plain, a quintessential large sedimentary plain significantly affected by human activities, serves as an example for examining the spatial distribution and variations of groundwater chemistry in semi-arid and arid piedmonts with intense anthropogenic influence. The piedmont plain of interest in the present study is in semi-arid and arid regions and was of concern due to the variation of hydrogeochemical composition of groundwater.

The specific objectives of this study are to (1) delineate the spatial distribution of groundwater chemistry within a human-impacted piedmont region, (2) identify the hydrogeochemical variations and evolutionary trends of groundwater in space, and (3) elucidate the critical natural and anthropogenic processes that shape groundwater chemistry, using the self-organizing map methodology in conjunction with hydrogeochemical simulations and graphical analysis. This study can enhance the understanding of groundwater chemistry in water-scarce piedmont plains that have been significantly impacted by human activities. The findings could inform the sustainable development and conservation of these vulnerable water resources, not only within the focal region but also in analogous areas globally.

Study area

The study region, defined by latitudes ranging from 38°02’26"N to 38°40’23"N and longitudes from 114°15’26"E to 115°15’35"E, spans an area of approximately 4,095 km2. This area is within the piedmont plains of the North China Plain and is bounded by the Taihang Mountains in the west (Fig. 1). The terrain of the region gradually slopes downwards from the northwest to the southeast, with an average elevation of approximately 72 m. The area is traversed by four significant rivers, namely the Hutuo River, Ci River, Sha River and Tang River, of which the Hutuo River is the largest. The study region is characterized by a temperate continental arid climate and has an average annual temperature of 12.3 °C. Annual precipitation in the region varies from 400 mm to 600 mm, predominantly falling during the wet season from June to September.

Fig. 1

Location of (a) the North China Plain, (b) the study piedmont plain, and (c) the groundwater sampling sites. The map was created via ArcGIS 10.2 (https://www.esri.com/en-us/arcgis/products/arcgis-desktop/resources) based on the ASTER GDEMV2 data from Geospatial Data Cloud site, Computer Network Information Center, Chinese Academy of Sciences (http://www.gscloud.cn).

This region is underlain by Quaternary sediments, predominantly from the extensive alluvial fans of the Hutuo River, complemented by smaller alluvial fans from neighboring rivers. The Quaternary sediments grade laterally from gravel and pebbles to medium-coarse sands, and then to fine sands and clays along the southeastward river flow direction57,58. As a result, sediment permeability is higher in the upper piedmont area and along the Hutuo River (Fig. 2a)59. The predominant aquifer system in this area is formed by these Quaternary sediments and reaches a thickness of hundreds of meters60. The Quaternary groundwater flows from west to east, and evolves from phreatic to semi-confined along the flow direction. The depth to the groundwater table ranges from ~10 m to >50 m, resulting in negligible evaporation effects. The aquifer system is recharged by lateral inflow, local precipitation and river seepage, and discharges groundwater through lateral outflow in various environments57. As a typical part of the North China Plain, the study area has undergone significant anthropogenic disturbances to its groundwater recharge and discharge patterns61. A large amount of Quaternary groundwater has been extracted from the aquifers35. As a result, artificial extraction has become one of the most important groundwater discharge pathways. Meanwhile, the infiltration of irrigation water in the widespread agricultural lands (Fig. 2b) has become one of the major recharge processes of the Quaternary aquifer. External contaminants, particularly those originating from human activities including daily life and agricultural practices, may infiltrate the aquifers alongside irrigation water, thereby posing a potential threat to groundwater quality.

Fig. 2

(a) Infiltration coefficient map of phreatic aquifer, and (b) the land use within the study area along with the groundwater sampling sites in various SOM clusters. The map was created via ArcGIS 10.2 (https://www.esri.com/en-us/arcgis/products/arcgis-desktop/resources) based on (a) the data from field hydrogeological survey in present research and (b) the GlobeLand30 data from National Catalogue Service For Geographic Information (https://www.webmap.cn/commres.do? method=globeIndex).

Materials and methods

Sampling strategies and analytical methods

A total of 92 groundwater samples were collected from phreatic boreholes and wells, with depths ranging from 20 m to 100 m, across the piedmont plain from July to August 2022 (Fig. 1). Prior to sampling, stagnant water was pumped out of the boreholes and wells to ensure that the collected samples were representative. Groundwater pH and electrical conductivity (EC) were measured in situ with a portable multiparameter device (Multi 350i/SET, Munich, Germany), and sample collection was initiated once these two parameters had stabilized during continuous in-situ monitoring. Polyethylene sample bottles were pre-rinsed with a 10% nitric acid solution, followed by ultrapure water, and finally with the target groundwater itself, with each step repeated at least three times to ensure thorough cleanliness. Immediately after collection, water samples were stored at 4 °C in portable coolers and promptly transported to the laboratory for hydrogeochemical analysis. All water sample collection and storage were conducted in accordance with the Technical Regulations for Groundwater Sampling62 in China.

Physicochemical analyses of groundwater were conducted in the Key Laboratory of Groundwater Science and Engineering, Ministry of Natural Resources, People’s Republic of China. The major cations Ca2+, Mg2+, Na+ and K+ were determined by ICP-MS (Inductively Coupled Plasma Mass Spectrometry, Agilent 7500ce, Tokyo, Japan). Concentrations of NH4+, SO42−, Cl−, NO3− and NO2− were measured by ion chromatography (Shimadzu LC-10ADVP, Kyoto, Japan). Total dissolved solids (TDS) were determined using the gravimetric method, while HCO3− concentrations were measured by acid-base titration. The ionic charge balance error (ICBE) was calculated to assess the precision and accuracy of the laboratory measurements63. All ICBE values were within ±5%, indicating credible and reliable analytical results.
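The charge balance check can be illustrated with a minimal Python sketch. It assumes the commonly used definition, ICBE = (Σcations − Σanions)/(Σcations + Σanions) × 100 with all concentrations in meq/L; the exact expression of ref. 63 is not reproduced here. The equivalent weights are rounded textbook values, and the function name and sample values are hypothetical.

```python
# Equivalent weights (g/eq) = molar mass / |charge|; rounded textbook values.
EQ_WEIGHT = {
    "Ca": 20.04, "Mg": 12.15, "Na": 22.99, "K": 39.10, "NH4": 18.04,
    "HCO3": 61.02, "SO4": 48.03, "Cl": 35.45, "NO3": 62.00, "NO2": 46.01,
}
CATIONS = ("Ca", "Mg", "Na", "K", "NH4")
ANIONS = ("HCO3", "SO4", "Cl", "NO3", "NO2")


def icbe_percent(sample_mg_l):
    """Ionic charge balance error (%) for one sample given as {ion: mg/L}.

    Assumed formula: (sum cations - sum anions) / (sum cations + sum anions) * 100,
    with every term converted to meq/L. Ions missing from the dict count as 0.
    """
    meq = {ion: sample_mg_l.get(ion, 0.0) / EQ_WEIGHT[ion] for ion in EQ_WEIGHT}
    cations = sum(meq[ion] for ion in CATIONS)
    anions = sum(meq[ion] for ion in ANIONS)
    return (cations - anions) / (cations + anions) * 100.0


# Hypothetical sample (mg/L), not taken from the study dataset.
sample = {"Ca": 88.0, "Mg": 24.0, "Na": 21.0, "K": 2.0,
          "HCO3": 255.0, "SO4": 70.0, "Cl": 40.0, "NO3": 45.0}
error = icbe_percent(sample)
print(f"ICBE = {error:.2f}%  acceptable: {abs(error) <= 5.0}")
```

A sample would pass the screening used in this study when the absolute value returned is no greater than 5%.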

Self-organizing map (SOM)

The self-organizing map (SOM) is an unsupervised learning algorithm38. It can effectively visualize high-dimensional data by projecting them onto a two-dimensional space1,64,65. Consequently, SOM has been widely used to interpret complex datasets such as hydrogeochemical data40,45. The SOM algorithm has been described in detail by Clark, et al.44. Its procedure can be summarized in three steps: (1) input and pre-process the data; (2) create a SOM network grid and determine the grid size and shape; and (3) run the SOM training algorithm. In the present study, the MATLAB software (version R2021b) was used to implement the SOM algorithm. The number of output neurons of the SOM model was determined by the heuristic rule \(5\sqrt{N}\), in which N is the number of input samples38,66. The K-means algorithm was used to classify the samples, and the optimal number of clusters was determined according to the Davies-Bouldin index (DBI)8. The DBI can be obtained by the following equation:

$$DBI = \frac{1}{k}\sum\limits_{i=1}^{k} \mathop{\max}\limits_{j \ne i} \left[ \frac{C_{i} + C_{j}}{d\left( C_{i}, C_{j} \right)} \right]$$
(1)

Where k is the number of clusters; Ci + Cj is the sum of the average distances from the points in clusters i and j to their respective cluster centroids; and d(Ci, Cj) denotes the distance between the centroids of cluster i and cluster j.
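The grid-size heuristic and Eq. (1) can be sketched in a few lines of Python with NumPy. The heuristic is taken here as 5√N, which gives the 48 neurons reported for N = 92. This is an illustrative sketch only, not the MATLAB implementation used in the study; the function names and the toy data are hypothetical, and Euclidean distance is assumed.

```python
import math

import numpy as np


def som_neuron_count(n_samples):
    """Heuristic number of SOM output neurons, 5 * sqrt(N), rounded."""
    return round(5 * math.sqrt(n_samples))


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a labelled point set (Euclidean distance)."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    centroids = np.array([points[labels == c].mean(axis=0) for c in ids])
    # Scatter of each cluster: mean distance of its points to its centroid.
    scatter = np.array([
        np.linalg.norm(points[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(ids)
    ])
    k = len(ids)
    worst = []
    for i in range(k):
        # For cluster i, keep the largest similarity ratio over all j != i.
        ratios = [
            (scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(k) if j != i
        ]
        worst.append(max(ratios))
    return float(np.mean(worst))


print(som_neuron_count(92))  # 48

# Toy example: two compact, well-separated clusters give a low DBI.
toy_points = [[0, 0], [0, 2], [10, 0], [10, 2]]
toy_labels = [0, 0, 1, 1]
print(davies_bouldin(toy_points, toy_labels))
```

In practice the index would be evaluated for a range of candidate cluster numbers, and the number with the lowest value retained.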

Statistical analysis, hydrogeochemical diagrams and simulation

Statistical analyses, including descriptive statistics and Spearman correlation analysis67, were conducted using SPSS software (version 22.0) to explore the statistical characteristics of groundwater chemistry and the relationships between physicochemical parameters. Spearman correlation analysis was used to statistically corroborate the nonlinear inter-variable associations visualized in the SOM-derived component planes, thereby providing quantitative validation of the topological relationships identified through the SOM’s similarity-driven clustering algorithm68. Hydrogeochemical diagrams, including Piper, Gibbs and bivariate diagrams38,56,69, were generated using Origin software (version 2021) to elucidate the hydrochemical facies and hydrogeochemical processes of groundwater. Hydrogeochemical simulations were conducted using PHREEQC software (version 3.5.0) to determine the saturation indices (SI) of specific minerals in groundwater. The SI values indicate the mineral equilibrium status70: oversaturation if SI > 0, equilibrium if SI = 0, and undersaturation if SI < 0. In addition, ArcGIS software (version 10.2) was used to map the geospatial distribution of groundwater chemistry.
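For readers unfamiliar with the saturation index, the following minimal Python sketch shows the standard definition SI = log10(IAP/K), where IAP is the ion activity product and K the solubility constant of the mineral. PHREEQC computes the activities internally; the sketch assumes they are already known, and the function names, the optional tolerance and the numerical values are hypothetical.

```python
import math


def saturation_index(ion_activity_product, solubility_constant):
    """SI = log10(IAP / K) for one mineral."""
    return math.log10(ion_activity_product / solubility_constant)


def saturation_state(si, tolerance=0.0):
    """Classify a mineral from its SI.

    tolerance is an optional band around zero treated as equilibrium;
    the default of 0 reproduces the strict rule given in the text.
    """
    if si > tolerance:
        return "oversaturated"
    if si < -tolerance:
        return "undersaturated"
    return "equilibrium"


# Hypothetical values: an IAP far below K gives a strongly negative SI.
si = saturation_index(1.0e-7, 1.0e-1)
print(round(si, 2), saturation_state(si))
```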

Entropy-weighted water quality index

Water quality evaluation can be effectively performed using the entropy-weighted water quality index (EWQI)71. In contrast to traditional methods such as the one using the Water Quality Index (WQI), the EWQI approach incorporates information on entropy to mitigate the influence of subjective human factors on the weighting of hydrogeochemical parameters during the assessment55,55. The EWQI assessment procedure is as follows:

First, construct the eigenvalue matrix X as follows.

$$X=\left[ {\begin{array}{*{20}{c}} {{x_{11}}}&{{x_{12}}}& \cdots &{{x_{1n}}} \\ {{x_{21}}}&{{x_{22}}}& \cdots &{{x_{2n}}} \\ \vdots & \vdots & \ddots & \vdots \\ {{x_{m1}}}&{{x_{m2}}}& \cdots &{{x_{mn}}} \end{array}} \right]$$
(2)

Where m and n are the number of water samples and physicochemical parameters, respectively.

Second, the eigenvalue matrix X is transformed into a normalized eigenvalue matrix Y. The normalization process employs Eq. (4) presented below.

$$Y=\left[ {\begin{array}{*{20}{c}} {{y_{11}}}&{{y_{12}}}& \cdots &{{y_{1n}}} \\ {{y_{21}}}&{{y_{22}}}& \cdots &{{y_{2n}}} \\ \vdots & \vdots & \ddots & \vdots \\ {{y_{m1}}}&{{y_{m2}}}& \cdots &{{y_{mn}}} \end{array}} \right]$$
(3)
$${y_{ij}}=\frac{{{x_{ij}} - {{({x_{ij}})}_{min}}}}{{{{({x_{ij}})}_{max}} - {{({x_{ij}})}_{min}}}}$$
(4)

Where (xij)max and (xij)min are the maximum and minimum values of the jth physicochemical parameter (column j of the X matrix) across all samples, respectively.

Third, the information entropy ej is computed using Eqs. (5) and (6).

$${e_j}= - \frac{1}{{\ln m}}\sum\limits_{{i=1}}^{m} {({P_{ij}} \times \ln {P_{ij}})}$$
(5)
$${P_{ij}}=\frac{{{y_{ij}}}}{{\sum\nolimits_{{i=1}}^{m} {{y_{ij}}} }}$$
(6)

Where Pij is the proportion of the normalized value of hydrogeochemical parameter j that is contributed by groundwater sample i.

Fourth, calculate the EWQI value using Eqs. (7) to (9).

$${w_j}=\frac{{1 - {e_j}}}{{\sum\nolimits_{{j=1}}^{n} {(1 - {e_j})} }}$$
(7)
$${q_j}=\frac{{{C_j}}}{{{S_j}}} \times 100$$
(8)
$$EWQI=\sum\limits_{{j=1}}^{n} {({w_j} \times {q_j})}$$
(9)

Where wj is the entropy weight of parameter j, qj denotes the quality rating scale of the jth physicochemical index, Cj is the jth physicochemical index measured value, Sj is the drinking water standard value of the jth physicochemical index given by Chinese guideline73 or World Health Organization guideline74.

The EWQI assessment algorithm was implemented using the Excel package in Microsoft Office software (version 365), following the method described above. Groundwater overall quality can be assessed based on the calculated values of the EWQI. Groundwater quality is categorized into five distinct ranks based on EWQI values: Rank 1 for ‘excellent quality’ (EWQI ≤ 50), Rank 2 for ‘good quality’ (50 < EWQI ≤ 100), Rank 3 for ‘medium quality’ (100 < EWQI ≤ 150), Rank 4 for ‘poor quality’ (150 < EWQI ≤ 200), and Rank 5 for ‘extremely poor quality’ (EWQI > 200)59.
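The procedure of Eqs. (2) to (9) and the ranking rule can be sketched in Python with NumPy as follows. This is not the Excel implementation used in the study. The sketch takes the sum in Eq. (9) over the n parameters, assumes that every parameter varies across the samples (otherwise Eq. (4) divides by zero), and treats 0 × ln 0 as 0 in Eq. (5); the function names and the toy input are hypothetical.

```python
import numpy as np


def ewqi(X, standards):
    """Entropy-weighted water quality index for each sample.

    X: m samples x n parameters (measured values).
    standards: n drinking-water standard values S_j.
    Returns (EWQI per sample, entropy weights per parameter).
    """
    X = np.asarray(X, dtype=float)
    S = np.asarray(standards, dtype=float)
    m = X.shape[0]
    col_min, col_max = X.min(axis=0), X.max(axis=0)
    Y = (X - col_min) / (col_max - col_min)          # Eq. (4)
    P = Y / Y.sum(axis=0)                            # Eq. (6)
    # Assumption: terms with P = 0 contribute 0 to the entropy.
    plogp = np.where(P > 0, P * np.log(np.where(P > 0, P, 1.0)), 0.0)
    e = -plogp.sum(axis=0) / np.log(m)               # Eq. (5)
    w = (1.0 - e) / (1.0 - e).sum()                  # Eq. (7)
    q = X / S * 100.0                                # Eq. (8)
    return (q * w).sum(axis=1), w                    # Eq. (9)


def ewqi_rank(value):
    """Map an EWQI value to quality rank 1 (excellent) to 5 (extremely poor)."""
    for rank, upper in enumerate((50, 100, 150, 200), start=1):
        if value <= upper:
            return rank
    return 5


# Toy input: three samples, two parameters with identical relative patterns,
# so both parameters receive the same entropy weight.
values, weights = ewqi([[1, 10], [2, 20], [3, 30]], standards=[2, 20])
print(weights, values, [ewqi_rank(v) for v in values])
```

Because the weights are derived from the spread of the data themselves, parameters that vary more strongly between samples receive larger weights.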

Results and discussion

Physicochemical properties of groundwater

The physicochemical parameters of the groundwater samples from the study piedmont plain are presented statistically in Table 1. For comparative purposes, this research also introduced the desirable limits for drinking water quality as recommended by the Chinese guidelines73. If there was no recommendation from the Chinese guidelines73, the World Health Organization guideline74 was used in the comparison.

In the study piedmont plain, phreatic groundwater had a pH ranging from 6.98 to 8.68, with an average of 7.64, indicating nearly neutral to slightly alkaline conditions. The overwhelming majority of the groundwater samples (98.91%) had pH values within the acceptable range of 6.5 to 8.5 stipulated by the Chinese guidelines73. Only one sample (1.09%), with a pH of 8.68, slightly exceeded the upper limit of 8.5. The EC values of the collected groundwater samples ranged from 291.99 µS/cm to 1,872.00 µS/cm, suggesting significant variability in groundwater salinity. The majority of the groundwater samples (96.74%) exhibited EC values below 1,500 µS/cm, whereas only four samples surpassed this threshold. Generally, water salinity is categorized into three classes based on EC value, i.e., fresh water with EC values below 1,500 µS/cm, brackish water with EC values ranging from 1,500 µS/cm to 3,000 µS/cm, and saline water with EC values exceeding 3,000 µS/cm56. Thus, groundwater in the study piedmont plain was dominantly fresh water, with a few samples of the brackish type. All groundwater samples from the study piedmont plain had TDS values ranging from 189.90 mg/L to 844.80 mg/L, which is within the recommended drinking water limit of 1,000 mg/L73. This suggests that the groundwater was suitable for drinking in terms of salinity. Total hardness (TH) ranged from 119.10 mg/L to 623.00 mg/L, with an average of 318.86 mg/L. The majority of phreatic groundwater samples had acceptable hardness, with only about 9.78% exceeding the recommended drinking water limit of 450 mg/L specified by the Chinese guidelines73.

Based on their average concentrations, the major cations in the groundwater samples rank as follows: Ca2+ is the most prevalent, followed successively by Mg2+, Na+ and K+. The concentration of Ca2+ ranged from 29.38 mg/L to 189.80 mg/L, with an average of 87.95 mg/L. Mg2+ ranged from 6.22 mg/L to 50.96 mg/L and Na+ from 9.53 mg/L to 144.40 mg/L, with mean concentrations of 24.10 mg/L and 20.80 mg/L, respectively. K+ was found in relatively low concentrations compared with the other major cations, ranging from 0.21 mg/L to 22.42 mg/L with an average of 1.98 mg/L. The average concentrations of the major anions ranked in the following order: HCO3− was the most abundant, followed by SO42−, with Cl− being the least prevalent. The concentration of HCO3− ranged from 87.97 mg/L to 488.10 mg/L, with an average of 255.33 mg/L. The maximum values of Cl− and SO42− were 159.00 mg/L and 227.90 mg/L, respectively. Both Cl− and SO42− were within the desirable drinking water limits recommended by the Chinese guideline (Table 1). Overall, phreatic groundwater in the study piedmont plain was characterized by a dominance of Ca2+ among cations and HCO3− among anions.

Table 1 Statistical listing of physicochemical characteristics of groundwater samples with reference to drinking water standards.

Nitrogen is one of the main potential threats to groundwater quality in many regions worldwide, including the study area75. Notably, NO3− concentrations in the sampled groundwater ranged from 2.55 mg/L to 171.4 mg/L. Most samples had NO3− concentrations greater than the natural background level (8.7 mg/L) in the mountainous area76. Approximately 43.48% of the groundwater samples exceeded the WHO drinking water limit for NO3−, which is set at 50 mg/L74. The concentrations of NO2− and NH4+ ranged from 0.001 mg/L to 0.16 mg/L and from 0.02 mg/L to 0.28 mg/L, respectively. Approximately 14.13% of the collected groundwater samples exceeded the desirable limit for NO2−, which is set at 0.02 mg/L. In addition, 2.17% of the samples surpassed the recommended threshold of 0.2 mg/L for NH4+. Therefore, nitrogen contamination is a significant issue within the aquifers of the piedmont plain and occurs in various forms, including NO3−, NO2− and NH4+. This contamination warrants attention due to its potential adverse impacts on groundwater quality.
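The exceedance percentages quoted above follow directly from counting samples over each limit. With 92 samples, 43.48%, 14.13% and 2.17% correspond to 40, 13 and 2 samples, respectively. A minimal Python sketch of this bookkeeping is given below; the function name and the synthetic concentration list are hypothetical and do not reproduce the study dataset.

```python
def exceedance_percent(values, limit):
    """Percentage of values strictly above the limit, rounded to 2 decimals."""
    n_over = sum(1 for v in values if v > limit)
    return round(100.0 * n_over / len(values), 2)


# Synthetic NO3- list (mg/L): 40 of 92 values above the 50 mg/L limit.
synthetic_no3 = [60.0] * 40 + [10.0] * 52
print(exceedance_percent(synthetic_no3, 50.0))  # 43.48
```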

Self-organizing map (SOM) and clustering analysis

The self-organization map clustering

Twelve hydrogeochemical parameters, namely TDS, total hardness (TH), Ca2+, Mg2+, Na+, K+, NH4+, HCO3−, SO42−, Cl−, NO2− and NO3−, were used as the input variables for the SOM model in the present research. According to the heuristic rule, 48 hexagons (i.e., neurons) were obtained in the component planes of the SOM (N = 92 groundwater samples) (Fig. 3). The SOM matrices show the projected values of the hydrogeochemical variables on the two-dimensional plane, expressed as a colour gradient. Neurons in dark colours represent high values (black for the highest values), and those in light colours represent low values (yellow for the lowest values). The colour-graded SOM matrices are useful for illustrating the clustering characteristics of hydrogeochemical variables and water samples38. Each hydrogeochemical variable is illustrated by one SOM matrix map, in which the colour gradient visually reveals the relationships among the hydrogeochemical variables50. Similar colour gradients indicate positive correlations; the stronger the correlation, the more similar the colour gradients77.
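To make the notion of a component plane concrete, the following Python sketch trains a very small SOM with NumPy. It is a simplified illustration under stated assumptions, not the MATLAB model used in this study: it uses a rectangular 8 × 6 grid (48 neurons) instead of hexagonal cells, online training with a Gaussian neighbourhood, linearly decaying learning rate and radius, and inputs already scaled to [0, 1]. All names, parameter values and the random input data are hypothetical.

```python
import numpy as np


def train_som(data, rows=8, cols=6, n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a rectangular SOM; returns weights of shape (rows, cols, n_features)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every neuron, shape (rows, cols, 2).
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5      # shrinking neighbourhood radius
        x = data[rng.integers(len(data))]        # one randomly drawn sample
        # Best-matching unit: neuron whose weight vector is closest to x.
        dist = np.linalg.norm(weights - x, axis=2)
        bmu = np.unravel_index(np.argmin(dist), dist.shape)
        # Gaussian neighbourhood centred on the BMU, measured on the grid.
        grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=2)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        weights += lr * h[..., None] * (x - weights)
    return weights


# Random stand-in for 92 samples x 12 normalized hydrogeochemical variables.
demo = np.random.default_rng(1).random((92, 12))
som = train_som(demo)
component_plane = som[:, :, 0]   # map of the first variable over the grid
print(component_plane.shape)     # (8, 6)
```

Each slice `som[:, :, k]` plays the role of one SOM matrix in Fig. 3: variables whose slices show similar patterns over the grid tend to be positively correlated.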

Fig. 3

SOM matrices of physicochemical variables and groundwater samples.

As illustrated in Fig. 3, fairly similar colour gradients were observed among the hydrogeochemical variables TH, TDS, Ca2+, Mg2+, Cl−, SO42−, HCO3− and NO3−, implying positive correlations among them. Some differences in colour gradient were noted between some of these variables, suggesting a degree of discrepancy. TH, TDS, Ca2+, Mg2+ and HCO3− showed the highest similarity in colour gradient, indicating strong positive correlations. High similarity was also observed among the colour gradients of NO3−, Cl−, SO42− and TDS, suggesting relatively strong positive correlations among them. K+ and Na+ showed little similarity in colour gradient with the other hydrogeochemical variables, implying relatively weak correlations. NO2− and NO3− showed a similar colour gradient pattern, although the degree of similarity was not high. The same relationship was observed between the colour gradients of NO2− and NH4+. To some extent, these observations suggest a potential connection through chemical processes between NO2− and NO3−, and between NO2− and NH4+. A Spearman correlation test was performed for all the hydrogeochemical variables using SPSS software. The results (Fig. 4) were generally consistent with the SOM results described above, supporting the reliability of the SOM model established in the present research.
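The Spearman coefficient used for this cross-check is the Pearson correlation of the ranked data, which is why it captures monotonic but nonlinear relationships. A minimal NumPy sketch is given below; it is not the SPSS routine used in the study, and the function names and toy values are hypothetical. Tied values receive their average rank.

```python
import numpy as np


def average_ranks(values):
    """Ranks starting at 1, with tied values assigned their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


# A monotonic but nonlinear relationship still gives rho close to 1.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]
print(spearman_rho(x, y))
```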

Fig. 4

Spearman correlation coefficients among the groundwater hydrogeochemical parameters in the study area. (* and ** represent the significant correlation at the p < 0.05 and p < 0.01 levels, respectively)

The water samples were also clustered on the basis of the SOM map to gain further insight into the hydrogeochemical features and formation of groundwater. The lowest DBI value corresponds to the optimal number of clusters78. As illustrated in Fig. 5, four clusters were optimal for the present study. It can be seen in Fig. 3 that 1 groundwater sample (1.09%) was classified in Cluster I, 33 samples (35.87%) in Cluster II, 11 samples (11.96%) in Cluster III, and 47 samples (51.09%) in Cluster IV.

Fig. 5

Variation of the Davies-Bouldin index (DBI) with the number of SOM clusters.

Hydrogeochemical features and spatial distribution of SOM clusters

The mean values of the physicochemical parameters for each SOM cluster and for all samples are presented in Table 2. NO3− is an effective indicator of human-induced pollution of the groundwater environment18. Cl− is a conservative ion in aquifers and can be used to trace external inputs of human-derived substances into the groundwater environment63. As can be seen from Table 2, groundwater in Cluster I had the highest concentrations of NO3− and Cl− among the four SOM clusters, followed successively by Clusters III, II and IV.

Table 2 Mean values of physicochemical parameters in each cluster and the whole data.

To gain insight into the overall hydrogeochemical characteristics of groundwater and their evolution among the SOM clusters, the Piper diagram79 was used to illustrate the hydrogeochemical facies. As illustrated in Fig. 6a, the predominant hydrogeochemical facies of the sampled groundwater from the study piedmont plain was HCO3-Ca, accounting for 92.39% of the samples. Only a minority of the samples belonged to the Cl-Mg·Ca type (6.52%) and the Cl-Na type (1.09%). Considering each SOM cluster separately, the hydrogeochemical facies was HCO3-Ca for Cluster IV, Cl-Na for Cluster I, and a combination of HCO3-Ca and mixed Cl-Mg·Ca for Clusters II and III (Fig. 6a). The hydrogeochemical facies of groundwater thus evolved gradually along the cluster order IV, II, III and I, from the fresh HCO3-Ca facies to the relatively saline Cl-Mg·Ca and Cl-Na facies. This is consistent with the aforementioned increase in NO3− content along the same cluster order.

Fig. 6

(a) Piper Diagram and (b) spatial distribution of nitrate content in groundwater across various clusters identified through Self-organizing map analysis. The map was created via ArcGIS 10.2 (https://www.esri.com/en-us/arcgis/products/arcgis-desktop/resources).

The spatial distribution of the SOM clusters, along with groundwater NO3− content, is illustrated in Fig. 6b. Groundwater of Cluster I (P2), which had the highest NO3− content and the most saline hydrogeochemical facies, was located in the upstream area of the plain close to the mountains, where the permeability of the vadose zone and the aquifers is highest57,59. Cluster III groundwaters, characterized by the second-highest NO3− concentrations and the second most saline hydrogeochemical facies, exhibited a distinct spatial distribution pattern. These samples were predominantly located in the upstream part of the study area close to the mountains (P3, P22 and P23) and along the upstream section of the Hutuo River (P60, P92, P91, P87, P88), as well as at some scattered sites adjacent to the Tang River (P38), Ci River (P65) and Hutuo River (P82) in the middle area. Groundwaters of Cluster II, which had the third-highest NO3− content and the third most saline hydrogeochemical facies, were situated in the middle-upper area of the plain, especially far away from the rivers. Groundwaters of Cluster IV, which had the lowest NO3− content and the freshest hydrogeochemical facies, were distributed mainly in the lower-elevation area of the plain, with some along the Tang River and some scattered in the middle-upper area. In summary, the SOM clusters followed a spatial trend in the successive order of Clusters I, III, II and IV, extending from the upper areas adjacent to the mountains towards the downstream regions, with NO3− content gradually decreasing and hydrogeochemical facies becoming fresher.

Hydrogeochemical regimes of groundwater

Natural processes governing groundwater chemistry

The chemical composition of groundwater is fundamentally governed by a suite of natural factors and mechanisms. The natural factors include the lithology of aquifer matrices, the characteristics of recharge waters, the residence time of water within the aquifer, prevailing climate conditions, the extent of evaporation, etc59. Of the various natural mechanisms, the present research explored those relevant to understanding the spatial distribution of groundwater chemistry in the study piedmont plain.

In general, the natural mechanisms that govern the chemistry of groundwater can be categorized into three primary processes, namely precipitation, evaporation, and rock-water interaction14,16. These three mechanisms can be distinguished by analysing the relationship between TDS and the ratio Na+/(Na++Ca2+), and between TDS and the ratio Cl−/(Cl−+HCO3−)80. As depicted in Fig. 7, the data points for all SOM clusters were predominantly located within the rock dominance field of the Gibbs diagrams. This distribution suggests that the chemical composition of the sampled groundwater in the study piedmont plain was primarily influenced by interactions between the aquifer matrix and groundwater, rather than by precipitation and evaporation processes. Groundwater usually flows very slowly in porous aquifers compared with river water, even in piedmonts such as the study area81. As a result, the residence time of groundwater within the aquifer system is relatively long, allowing ample time for the water to undergo extensive chemical reactions with the surrounding geological media. Furthermore, the study area is situated in the piedmont region of the Taihang Mountains, where the groundwater table is relatively deep. Consequently, the impact of evaporation on groundwater in this region is considered to be negligible82.
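The two ratios plotted on the horizontal axes of the Gibbs diagrams are simple to compute. The sketch below assumes concentrations in mg/L, as is common for Gibbs diagrams; the function name is hypothetical. The Na+, Ca2+ and HCO3− inputs are the mean concentrations reported above, whereas the Cl− value is a hypothetical placeholder, so the output is illustrative only.

```python
def gibbs_ratios(na, ca, cl, hco3):
    """Return (Na/(Na+Ca), Cl/(Cl+HCO3)) from concentrations in mg/L."""
    cation_ratio = na / (na + ca)
    anion_ratio = cl / (cl + hco3)
    return cation_ratio, anion_ratio


# Mean Na+, Ca2+ and HCO3- from the text; Cl- = 40.0 mg/L is hypothetical.
cation_ratio, anion_ratio = gibbs_ratios(na=20.80, ca=87.95, cl=40.0, hco3=255.33)
print(round(cation_ratio, 3), round(anion_ratio, 3))
```

Low values of both ratios combined with moderate TDS are what place samples in the rock dominance field of the diagram.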

Fig. 7

Gibbs diagrams delineating the principal natural mechanisms governing the hydrogeochemical composition of groundwater within each cluster of the study area.

To further identify the specific minerals involved in the interactions between the aquifer matrix and groundwater, this study analysed the ratios of Ca2+/Na+ versus HCO3−/Na+ and Mg2+/Na+83. As depicted in Fig. 8, the groundwater samples from the study piedmont plain plotted along a transition from silicate dominance to carbonate dominance. This distribution suggests that the dissolution of carbonates and the weathering of silicates were the fundamental natural processes contributing solutes to groundwater.

Fig. 8

End-member diagrams of (a) Ca2+/Na+ ratio versus HCO3/Na+ ratio and (b) Ca2+/Na+ ratio versus Mg2+/Na+ ratio in groundwater samples across various SOM clusters.
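The Na-normalized ratios used in such end-member diagrams can be sketched as follows. The snippet assumes that inputs are in mg/L and that the ratios are molar, as is common for this type of diagram; the function name and the sample values are hypothetical.

```python
# Standard molar masses in g/mol.
MOLAR_MASS = {"Ca": 40.078, "Mg": 24.305, "Na": 22.990, "HCO3": 61.017}


def na_normalized_ratios(mg_per_l):
    """Convert mg/L to mmol/L and return the Na-normalized molar ratios."""
    mmol = {ion: mg_per_l[ion] / MOLAR_MASS[ion] for ion in MOLAR_MASS}
    return {
        "Ca/Na": mmol["Ca"] / mmol["Na"],
        "HCO3/Na": mmol["HCO3"] / mmol["Na"],
        "Mg/Na": mmol["Mg"] / mmol["Na"],
    }


# Hypothetical sample for illustration only.
sample = {"Ca": 85.0, "Mg": 22.0, "Na": 28.0, "HCO3": 250.0}
for name, value in na_normalized_ratios(sample).items():
    print(f"{name} = {value:.2f}")
```

Plotting Ca/Na against HCO3/Na and against Mg/Na on logarithmic axes gives the two panels of the diagram.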

The ratios of major ions originating from the dissolution of specific minerals are fixed, so the relationships between major ions can reflect the underlying hydrogeochemical processes84. The ratio of Cl/Na+ is usually close to 1:1 if these two ions originate naturally from halite dissolution85. As shown in Fig. 9a, the majority of groundwater samples from all clusters aligned closely with the 1:1 line for the Cl to Na+ ratio. Additionally, a strong positive correlation was quantitatively confirmed with a correlation coefficient (R2) of 0.67, which is significant at the p < 0.01 level (Fig. 4). Thus, halite dissolution is a potential source of Na+ and Cl for all clusters. Saturation indices (SI) were further calculated to examine the potential contribution of minerals. As shown for halite in Fig. 10a, SI values were below 0 for groundwaters in all clusters; these values suggest that halite, if present in the surrounding media, can contribute Na+ and Cl to groundwater. However, as demonstrated in Fig. 8, the predominant minerals engaged in interactions between the aquifer matrix and groundwater in the study region were carbonates and silicates, rather than evaporites. This is because the study area is situated in the piedmont region of the North China Plain, where the aquifer lithology is predominantly characterized by medium-coarse sand and sand gravel59,86. Evaporites are very limited in the aquifer media and could hardly provide a substantial halite source for groundwater. Consequently, the dissolution of halite was not the dominant natural process influencing the chemical composition of groundwater within the study area.

Fig. 9

Scatter plots of (a) Cl versus Na+, (b) HCO3+SO42− versus Ca2++Mg2+, (c) HCO3 versus Ca2++Mg2+, and (d) SO42− versus Ca2+ in groundwater samples across various SOM clusters.
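The comparison of these ion pairs against their 1:1 lines can be sketched as below. The snippet assumes that the scatter plots are drawn in meq/L and that input concentrations are in mg/L; the function names and the sample values are hypothetical.

```python
# (molar mass in g/mol, absolute charge) for each ion; standard values.
IONS = {
    "Na": (22.990, 1), "Ca": (40.078, 2), "Mg": (24.305, 2),
    "Cl": (35.453, 1), "HCO3": (61.017, 1), "SO4": (96.06, 2),
}


def to_meq(mg_per_l):
    """Convert concentrations from mg/L to meq/L."""
    return {ion: mg_per_l[ion] * charge / mass
            for ion, (mass, charge) in IONS.items()}


def ratios_vs_one_to_one(mg_per_l):
    """Return the equivalent ratios behind the four scatter plots.

    A value near 1 means the sample lies close to the 1:1 line; a value
    below 1 means the cation side is in excess.
    """
    m = to_meq(mg_per_l)
    ca_mg = m["Ca"] + m["Mg"]
    return {
        "Cl/Na": m["Cl"] / m["Na"],
        "(HCO3+SO4)/(Ca+Mg)": (m["HCO3"] + m["SO4"]) / ca_mg,
        "HCO3/(Ca+Mg)": m["HCO3"] / ca_mg,
        "SO4/Ca": m["SO4"] / m["Ca"],
    }


# Hypothetical sample: 1 meq/L Na and Cl, 2 meq/L each of Ca, Mg, HCO3, SO4.
sample = {"Na": 22.990, "Ca": 40.078, "Mg": 24.305,
          "Cl": 35.453, "HCO3": 122.034, "SO4": 96.06}
for name, value in ratios_vs_one_to_one(sample).items():
    print(f"{name} = {value:.2f}")
```

In this illustrative sample, HCO3/(Ca+Mg) is 0.5, which mimics the excess of Ca2++Mg2+ over HCO3 described for panel (c).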

The majority of sampled groundwaters across various SOM clusters were also observed to align along the 1:1 line of the ratio (HCO3+SO42−) to (Ca2++Mg2+) as shown in Fig. 9b. This alignment suggested that the dissolution of carbonate and sulfate minerals could be significant sources of major ions in the groundwater of the study piedmont plain16. For carbonates, strong positive correlations were observed among the ions Ca2+, Mg2+, and HCO3, as indicated by the SOM component planes (Fig. 3) and further supported by the Spearman correlation coefficients (Fig. 4). These findings suggest that the dissolution of carbonate minerals may be a predominant source of hydrogeochemical solutes in groundwater of the study area.

However, as shown in Fig. 9c, groundwaters other than a small portion in Cluster IV deviated from the 1:1 line of HCO3 to (Ca2++Mg2+) and presented excess Ca2++Mg2+ relative to HCO3. This implied that, besides carbonate dissolution, other processes contributed Ca2+ and Mg2+ to groundwater at most sites. The saturation state simulation of carbonate minerals showed that only small portions of sampled groundwaters in Cluster IV and Cluster II had SI values below 0 for aragonite, calcite and dolomite (Fig. 10b, c, d). Thus, carbonate dissolution was a dominant contributor to groundwater solutes only for some groundwater samples in Cluster IV and Cluster II, but not for the other samples, especially those in Cluster III and Cluster I.

For the sulfates, both the SOM component planes (Fig. 3) and the Spearman correlation test (Fig. 4) showed a strong positive correlation between Ca2+ and SO42−, implying a high possibility that sulfate dissolution contributed to the natural hydrogeochemical composition of groundwater. In addition, groundwaters in all SOM clusters had SI values of anhydrite and gypsum below 0 (Fig. 10e, f), supporting the possible contribution of sulfate dissolution. However, excess Ca2+ relative to SO42− was observed for almost all groundwaters, indicating that there were other Ca2+ sources or that sulfate dissolution was not the dominant contribution (Fig. 9d). As discussed before, evaporites (including sulfates) were not the dominant minerals in the aquifer media. Thus, although sulfate dissolution can contribute Ca2+ and SO42− to groundwater to some degree, it was not the main source. This was also supported by the end-member diagrams (Fig. 8).

Fig. 10

Saturation indices of (a) halite, (b) aragonite, (c) calcite, (d) dolomite, (e) anhydrite and (f) gypsum for groundwaters in various SOM clusters.

Groundwater can also obtain some solutes from surrounding media through ion exchange processes. The relationship between (Ca2++Mg2+-HCO3-SO42−) and (Na++K+-Cl) was utilized to elucidate potential ion exchange processes occurring in the piedmont plain. As shown in Fig. 11, the majority of the groundwater samples in the study area aligned along the line Y = -1.33X + 0.89 (R2 = 0.58), suggesting the existence of ion exchange. Furthermore, only a minority of groundwaters in Cluster IV were located in the lower right field, suggesting the occurrence of cation exchange in which aqueous Ca2+ or Mg2+ displaced Na+ on the aquifer media (Eq. 10). In contrast, most groundwaters in Cluster II, Cluster III, and Cluster IV were situated in the upper left field, suggesting the occurrence of reverse cation exchange (Eq. 11). The groundwater (P2) in Cluster I deviated significantly from the line Y = -1.33X + 0.89, demonstrating a limited contribution of ion exchange. This is because aquifers close to the mountains are usually composed of coarse-grained material and are deficient in the fine sediments86 needed for ion exchange reactions.

$$Ca^{{2 + }} \left( {Mg^{{2 + }} } \right) + 2NaX\left( {solids} \right) \to 2Na^{ + } + CaX_{2} \left( {MgX_{2} } \right)\left( {solids} \right)$$
(10)
$$2Na^{ + } + CaX_{2} \left( {MgX_{2} } \right)\left( {solids} \right) \to Ca^{{2 + }} \left( {Mg^{{2 + }} } \right) + 2NaX\left( {solids} \right)$$
(11)
Fig. 11

Scatter plots of (Ca2++Mg2+-HCO3-SO42−) versus (Na++K+-Cl) of groundwater samples across various SOM clusters.
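The two axes of this diagram and a fitted line of the kind quoted in the text can be sketched as follows. The snippet assumes concentrations already expressed in meq/L and uses an ordinary least-squares fit; the function names and the data points are hypothetical and do not reproduce the study samples.

```python
def exchange_axes(meq):
    """Return (X, Y) for one sample, with concentrations in meq/L."""
    x = meq["Na"] + meq["K"] - meq["Cl"]                # Na+ + K+ - Cl
    y = meq["Ca"] + meq["Mg"] - meq["HCO3"] - meq["SO4"]
    return x, y


def least_squares_line(xs, ys):
    """Fit Y = slope * X + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical points lying exactly on Y = -X, for illustration only.
xs = [-2.0, -1.0, 0.0, 1.0]
ys = [2.0, 1.0, 0.0, -1.0]
slope, intercept = least_squares_line(xs, ys)
print(f"Y = {slope:.2f}X + {intercept:.2f}")
```

Points with X > 0 and Y < 0 fall in the lower right field (cation exchange, Eq. 10), while points with X < 0 and Y > 0 fall in the upper left field (reverse cation exchange, Eq. 11). A fitted slope close to -1 is commonly read as a signature of ion exchange.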

Anthropogenic factors influencing hydrogeochemical composition

The aforementioned natural processes have shaped the fundamental framework of groundwater chemistry within the study piedmont plain. However, anthropogenic factors often introduce disturbances to, and can even significantly alter, the natural chemical composition of groundwater. This impact is particularly notable in regions with dense populations and high levels of human activity.

The anthropogenic input of chemical solutes is the most direct and severe way in which human communities disturb the chemical framework of groundwater. Among the externally introduced solutes, nitrate stands out as the most prevalent contaminant of concern affecting both the surface and subsurface hydrosphere. The natural background concentration of NO3 in groundwater is generally considered to be 10 mg/L56,87, and groundwater with a NO3 concentration exceeding this threshold is typically deemed to have been affected by external nitrate sources14. Thus, nitrate is widely used as an indicator of human-induced external solutes in both surface water and groundwater. As shown in Fig. 12a, the majority of groundwater samples in Cluster IV, along with all samples from Clusters I, II, and III, exhibited NO3 concentrations surpassing the natural groundwater threshold of 10 mg/L56. This suggested that the hydrogeochemical composition of groundwater at most sampling sites had been influenced by the introduction of anthropogenic nitrate contaminants. To identify the specific sources of nitrate, the relationship between the ratios Cl/Na+ and NO3/Na+ was analyzed88. As illustrated in Fig. 12b, with the exception of a portion of the groundwater in Cluster IV, the samples plotted in the upper right field of the diagram, close to the field associated with agricultural activities. This suggested that the anthropogenic nitrate contamination observed in these samples likely originated from agricultural practices.

Fig. 12

Scatter plots of (a) TDS versus NO3, and (b) Cl/Na+ versus NO3/Na+ of groundwater samples across various SOM clusters.
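The nitrate screening described above can be sketched as below. The snippet assumes that nitrate is reported as NO3 (not as NO3-N) in mg/L and that the source-identification ratios are molar; the function names and the sample values are hypothetical.

```python
NO3_BACKGROUND_MG_L = 10.0  # natural background threshold quoted in the text

# Standard molar masses in g/mol.
MOLAR_MASS = {"Cl": 35.453, "Na": 22.990, "NO3": 62.004}


def exceeds_background(no3_mg_l):
    """True if the NO3 concentration is above the natural background."""
    return no3_mg_l > NO3_BACKGROUND_MG_L


def nitrate_source_ratios(mg_per_l):
    """Return the molar Cl/Na and NO3/Na ratios for one sample."""
    mmol = {ion: mg_per_l[ion] / MOLAR_MASS[ion] for ion in MOLAR_MASS}
    return mmol["Cl"] / mmol["Na"], mmol["NO3"] / mmol["Na"]


# Hypothetical sample for illustration only.
sample = {"Cl": 60.0, "Na": 25.0, "NO3": 45.0}
cl_na, no3_na = nitrate_source_ratios(sample)
print(f"above background: {exceeds_background(sample['NO3'])}, "
      f"Cl/Na = {cl_na:.2f}, NO3/Na = {no3_na:.2f}")
```

The positions of the end-member fields in the Cl/Na+ versus NO3/Na+ diagram are not encoded here, since they are defined by the cited reference rather than by this section.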

As previously mentioned, the hydrogeochemical facies of the groundwater exhibited a progressive evolution from the fresh facies of HCO3-Ca to the relatively salty facies of Cl-Mg-Ca and Cl-Na, correlating with an increase in NO3 content (Fig. 6). Furthermore, positive correlations were identified between NO3 and TDS, Ca2+, Mg2+, HCO3, SO42−, Cl by the SOM component planes (Fig. 3) and the Spearman correlation test (Fig. 4). Consequently, the major inorganic salts were also brought into groundwater along with nitrate contaminants from agricultural activities.

Quantitatively, groundwater salinity, represented by TDS, increased in step with the nitrate content across the SOM clusters (in the cluster order of IV, II, III and I) (Table 2), indicating that more intensive agricultural practices have led to a greater influx of chemical substances into the aquifers, along with the nitrate contaminant. Chloride, a typical conservative ion in aqueous environments, exhibited a consistent upward trend with the increase in nitrate content, progressing from Cluster IV to II, then III, and finally I (Table 2). Generally, the chloride content in aqueous environments is controlled by the dissolution of chloride minerals (e.g., halite) and anthropogenic input, and is not significantly influenced by other hydrogeochemical processes (such as precipitation and ion exchange)70. As previously discussed, halite dissolution does not constitute the primary hydrogeochemical process contributing Cl to groundwater in the study area. Therefore, the significant spatial increase in Cl content was mainly caused by anthropogenic input. This further indicated that the groundwater salinity increase and hydrogeochemical facies evolution were mainly attributable to chemical substances emanating from agricultural activities.

It is noted that groundwaters with higher salinity were predominantly located in the upper stream area, and those with lower salinity were mainly situated in the lower stream area of the study piedmont plain. This pattern is the inverse of the natural spatial evolution of groundwater salinity. As discussed before, this is caused by agricultural chemicals. Spatially, the influence of agricultural practices was strongest in the upper stream area close to the mountains, followed by the areas along the upstream section of Hutuo River, weaker in the middle-upper stream area far away from rivers, and weakest in the lower stream area. Agricultural lands were widely and evenly distributed in the study area (Fig. 2). Thus, the external chemical loads from agricultural practices to the groundwater environment showed no significant spatial discrepancy. Instead, the difference in the input of external chemical substances was attributed to spatial variation in the sediment permeability of the vadose zone and aquifers89. The upper stream area close to the mountains (the sampling sites of Cluster I and most sites of Cluster III) has the most permeable sediments in both the vadose and saturated zones (Fig. 2a). As a result, external chemical substances, including major inorganic salts, nitrate and others, could enter the aquifers very easily. Hutuo River is a relatively large river in the study region; thus, the sediments along its channels in the upper stream area also have relatively good permeability and favor the downward infiltration of external substances. In contrast, the areas far from the rivers and those in the lower stream area have relatively fine sediments and poor permeability, which hinders the infiltration of external substances from the ground surface. All the above observations make the effects of agricultural practices on groundwater chemistry strong for the sampling sites of Cluster I and III, but weaker for the sites of Cluster II and weakest for those of Cluster IV.

Groundwater quality and its implication

Groundwater quality in various SOM clusters was assessed using the EWQI approach19,90 to comprehensively reveal the hydrogeochemical quality status of groundwater under the regulation of both natural and anthropogenic factors. All the physicochemical parameters listed in Table 1 were considered in the EWQI assessment, and the results are shown in Fig. 13a.

The results demonstrated that most groundwater samples had EWQI values below 100 and belonged to the Rank 1 and Rank 2 categories (Fig. 13a). Approximately 6.52% and 1.09% of groundwater samples had EWQI values from 100 to 150 (Rank 3) and between 150 and 200 (Rank 4), respectively. Generally, water with an EWQI value below 100 is considered suitable for direct potable use. Water with an EWQI value between 100 and 150 may be used for domestic purposes, but is not recommended for direct consumption. Water classified as poor (Rank 4) or extremely poor (Rank 5) should be avoided for any direct use. Thus, groundwater at most sampling sites can be exploited for domestic purposes (98.91%) and even for direct potable use (92.39%).

Fig. 13

(a) Scatter plot of TDS versus EWQI, and (b) geospatial distribution of groundwater quality based on EWQI. Panel (b) was created using ArcGIS 10.2 (https://www.esri.com/en-us/arcgis/products/arcgis-desktop/resources).

The quality of groundwater presented a gradual deterioration from Cluster IV to Cluster II, and then to Cluster III and Cluster I. Specifically, groundwaters in Cluster IV had the best water quality, with 89.36% of excellent quality (Rank 1) and 10.64% of good quality (Rank 2). Groundwaters in Cluster II also had relatively good water quality, although with higher EWQI values than those in Cluster IV. With the exception of one sample (P69), which had an EWQI value exceeding 100, all samples of Cluster II exhibited EWQI values below 100, rendering them suitable for direct consumption. Groundwaters in Cluster III had a relatively wide range of EWQI values (62.52-156.68), implying good to poor water quality. Among them, 36.36% (P22, P23, P65, and P38) and 9.09% (P82) belonged to the medium (Rank 3) and poor (Rank 4) quality categories, respectively. The groundwater (P2) of Cluster I had an EWQI value of 127.55 and was categorized as medium quality (Rank 3).
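The rank assignment used in this section can be sketched as a simple lookup. The thresholds (50, 100, 150, 200) and the rank labels follow the text; treating each upper bound as inclusive for Ranks 3 and 4 is an assumption, and the function name is hypothetical.

```python
def ewqi_rank(ewqi):
    """Map an EWQI value to its (rank, quality label)."""
    if ewqi <= 50:
        return 1, "excellent"
    if ewqi <= 100:
        return 2, "good"
    if ewqi <= 150:       # inclusive upper bound is an assumption
        return 3, "medium"
    if ewqi <= 200:       # inclusive upper bound is an assumption
        return 4, "poor"
    return 5, "extremely poor"


# EWQI values quoted in the text: P2 (Cluster I) and the Cluster III range.
for value in (127.55, 62.52, 156.68):
    rank, label = ewqi_rank(value)
    print(f"EWQI = {value}: Rank {rank} ({label})")
```

Applied to the quoted values, this returns Rank 3 for P2 and Ranks 2 and 4 for the two ends of the Cluster III range, in line with the classification reported above.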

Spatially, the majority of the study piedmont plain fell within zones of excellent (Rank 1) to good (Rank 2) groundwater quality based on the EWQI assessment (Fig. 13b), where groundwater can meet the requirements for safe drinking. Groundwater was also of better overall hydrogeochemical quality in the lower stream areas than in the upper ones. The medium and poor quality groundwaters were mainly located in the upper stream areas close to the mountains (P2, P22, and P23) and at sporadic sites (P38, P65, P69, and P82) in the middle-lower parts of the study area.

Overall, groundwater sampled in the study piedmont plain had relatively good quality across the majority of the plain and was considered safe for daily domestic water supply. However, particular attention should be paid to the upper stream regions and the sporadic sites within the middle-lower areas when groundwater there serves as a water supply source. Although groundwater quality in most parts of the piedmont plain was good and safe for human use, the deterioration of groundwater quality was significant in terms of salinity and nitrate content. Special attention should be paid to agricultural practices in this area to address their threats to groundwater quality. Otherwise, groundwater quality may change at an unexpected speed and to an unexpected degree, considering the vulnerability of the aquifer system in the piedmont plain to contaminants originating from anthropogenic activities. In addition, microbial parameters were not considered in the present research, although microorganisms usually accompany human-derived contaminants. Thus, microbial parameters are recommended for inclusion in the monitoring of the groundwater environment in regions like the present agricultural piedmont plain.

Conclusions

The present research employed a self-organizing map (SOM) coupled with hydrogeochemical simulation and diagrams, as well as the entropy-weighted water quality index approach, to reveal the spatial distribution of groundwater chemistry in a typical piedmont plain of northern China that has been influenced by anthropogenic and natural forces simultaneously. The major findings are as follows:

(1) Groundwater in the piedmont plain is predominantly fresh and slightly alkaline. The nitrogen contaminants NO3-, NO2-, and NH4+ exceeded the selected drinking water limits at some sites. Four hydrogeochemical clusters were identified by the SOM approach, with 1.09%, 35.87%, 11.96% and 51.09% of the sampled groundwaters in Clusters I, II, III and IV, respectively.

(2) Hydrogeochemical facies of groundwater presented a gradual evolution from the fresh facies of HCO3-Ca to the relatively salty facies of Cl-Mg·Ca and Cl-Na in the order of Cluster IV, II, III and I. The NO3- content of groundwater increased in the same cluster order. Groundwaters with relatively high salinity and NO3- (Clusters I and III) were dominantly distributed in the upper stream area close to the mountains and along the upstream section of Hutuo River, and those with relatively low salinity and NO3- were mainly located in the lower stream area or in the middle-upper stream areas far away from rivers.

(3) The chemical composition of groundwater in the study piedmont plain was predominantly governed by interactions between the aquifer matrix and groundwater, specifically the weathering of silicates and reverse cation exchange. Dissolution of carbonates and cation exchange also contributed significantly to the chemical composition of groundwater in the lower stream areas. Agricultural practices have extensively driven the input of external substances into the aquifers, which in turn has led to a widespread increase in both groundwater salinity and nitrate content. Because of the spatial discrepancy in sediment permeability, these influences were greater in the upper stream area than in the lower parts of the piedmont plain.

(4) Groundwater had excellent (EWQI ≤ 50) to good (50 < EWQI ≤ 100) quality in most of the study area (92.39% of the sampling sites), and can be used for direct drinking based on the EWQI assessment. The relatively poor-quality groundwaters were predominantly associated with Clusters III and I, and were mainly located in the upper stream areas adjacent to the mountains, as well as at sporadic sites within the middle to lower stream areas. Special attention should be paid to the significant deterioration in groundwater quality, particularly in terms of salinity and nitrate levels, which was attributed to agricultural contaminants in the study piedmont plain.