Introduction

A. bunge is one of the most commonly used major medicinal materials in China. It was first recorded in the "Shennong Bencao Jing" and once ranked near the top of the seven Chinese herbal materials with export values exceeding ten million dollars1,2,3. However, severe over-harvesting has badly damaged the habitat of wild A. bunge, so that wild resources are being consumed far faster than they can regenerate naturally, seriously threatening the sustainable development of China's traditional Chinese medicine resources industry4,5,6. Consequently, on September 2, 2013, A. bunge was listed as Vulnerable in the "China Biodiversity Red List—Higher Plants Volume", and it has also been listed as a national third-level protected plant. Meanwhile, the Inner Mongolia Autonomous Region has established a workstation to survey wild A. bunge resources, build an information database, and carry out dynamic monitoring. Besides genetic factors, environmental variables play a significant role in the growth and development of medicinal plants and in their accumulation of secondary metabolites (SMs). Throughout their growth, plants are affected by various environmental variables and stresses, including climatic variables (such as light, temperature, and moisture), topographic variables (such as latitude, longitude, and altitude), and soil variables (such as soil texture and physicochemical properties)7,8,9. Furthermore, intensive human activities affect the ecosystems of medicinal plants and thereby their growth and development. Overall, global climate change and human activities can alter or destroy the habitats of Chinese medicinal materials, affecting the market supply of high-quality medicinal materials10. 
Given this context, it is essential to study the relationships among Chinese medicinal resources, their growth, and environmental variables against the backdrop of global climate change and human activities, in order to identify suitable habitats for these resources11,12. At the same time, excessive harvesting and severe habitat destruction have increased the uncertainty of suitable habitats for medicinal plants. Moreover, few studies have linked spatial differences in the SM-based quality of medicinal plants to their ecological habitats, making it difficult to pinpoint high-quality suitable habitats for these plants. Understanding the habitat requirements of a species is therefore a prerequisite for studying its ecological suitability13. Exploring the ecologically suitable areas of medicinal plants and analyzing spatial differences in the quality of their SMs are of great importance for improving the yield and quality of Chinese medicinal materials14.

Numerous studies have reported differences in the SMs of A. bunge among production locations. For instance, Zheng et al.15 found that the total saponin content of A. bunge was relatively high in samples from Inner Mongolia, Shanxi, Jilin, and other regions, and lowest in samples from Gansu. Hu et al.16, exploring differences in chemical composition among locations, found that A. bunge from Inner Mongolia contained more calycosin (C.) and formononetin (F.) than A. bunge from Gansu. Jin et al.17 found significant variation in SM content among A. bunge from 22 locations, with the highest total saponin content in samples from Pinglu District, Shanxi, and the highest total flavonoid content in samples from Zhaosu County, Xinjiang.

Species distribution models (SDMs) are mathematical models that estimate a species' ecological niche requirements from species distribution data (the response variable) and environmental data (explanatory variables), and thereby predict potential species distributions and assess the contributions of environmental variables18,19,20,21. Biomod2, one of the most comprehensive integrated SDM platforms currently available, has been widely applied in biodiversity conservation, prediction of potential distributions (especially of endangered species), forest management and protection, biological invasions, climate change projections, ecological suitability assessment, crop yield forecasting, and disease prevention and control22,23,24. Biomod2 is an R-based ensemble modeling platform that provides ten common single models: artificial neural networks (ANN), classification tree analysis (CTA), flexible discriminant analysis (FDA), generalized additive models (GAM), generalized boosting models or boosted regression trees (GBM/BRT), generalized linear models (GLM), multivariate adaptive regression splines (MARS), the maximum entropy model (MAXENT), random forests (RF), and surface range envelope (SRE). By building ensemble models that combine models with different principles, assumptions, and algorithms, the platform reduces the instability of single-model simulations and improves predictive accuracy25,26. Shrestha et al.27 combined eight models in Biomod2 to determine the suitable distribution areas of six invasive species in Nepal under current and future climate change scenarios. Zhu et al.28 used Biomod2 to construct a composite model of the suitable distribution of Magnolia officinalis in southern China.

By identifying high-quality suitable habitats for A. bunge, this study provides a theoretical basis for in-depth research on the sustainable use of A. bunge resources. The study aims to: (1) identify potential areas of high ecological suitability for A. bunge; (2) determine the dominant environmental variables affecting the growth of A. bunge; and (3) construct a comprehensive geospatial quality model (CGQM) to determine high-quality comprehensive suitable habitats for A. bunge.

Materials and methods

Study area

The research area is concentrated mainly in central Inner Mongolia, Shanxi Province, northern Shaanxi Province, and southeastern Gansu Province, China, with scattered distributions in the Xinjiang valleys and northeastern China. It spans approximately 20° of latitude from north to south and about 60° of longitude from east to west, covering most of northern China. The climate is predominantly temperate continental, with distinct seasons and relatively low annual rainfall of 50 to 500 mm that decreases from east to west. Precipitation during the growing season accounts for 80% of annual rainfall, the average annual temperature ranges from 3 to 6 °C, and the annual temperature range lies between 33 and 45 °C. Sunshine exceeds 2700 h per year, annual evaporation is high, and the average annual relative humidity is about 37%. Shrub communities are widely distributed across the region, extending from the southern mountains of the Greater Khingan Range in the east to the river valleys of the Tianshan Mountains in Xinjiang in the west. Vegetation growth is governed by rainfall and declines from east to west29. The area is rich in wild medicinal resources with a wide distribution and large reserves, including more than 1000 species of medicinal plants, more than 120 species of medicinal animals, and more than 40 types of medicinal minerals30. A. bunge is one of the authentic medicinal materials of this region.

Occurrence collection

Presence data for A. bunge were obtained mainly from the Global Biodiversity Information Facility (GBIF, https://www.gbif.org/), the Chinese Virtual Herbarium (CVH, https://www.cvh.ac.cn/), and distribution records documented in previous literature. To ensure the timeliness of the data, this study mainly collected distribution data from 2010 onwards. For records with a precise ___location description but no latitude and longitude, geographical coordinates were determined using Google Earth. To ensure data reliability and reduce errors caused by spatial clustering, all sample point records were from the year 2000; duplicate records were removed, and the records were spatially filtered so that only one point remained in each 10 km × 10 km grid cell. A map of observation points was created using ArcGIS 10.7 software (Esri, Redlands, CA, USA). This yielded 191 presence records of A. bunge for model building (Fig. 1).
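The one-point-per-cell filter can be sketched as follows. This is a minimal illustration, not the authors' ArcGIS workflow: it assumes the records are already in a projected, metre-based coordinate system, and it keeps the first record encountered in each 10 km × 10 km cell (the text does not say which record is retained). The function name `thin_to_grid` is hypothetical.

```python
import numpy as np

def thin_to_grid(x_m, y_m, cell_size_m=10_000.0):
    """Keep one occurrence record per square grid cell (hypothetical helper).

    x_m, y_m: projected coordinates in metres (assumed, e.g. an equal-area projection).
    Returns indices of retained records in their original order; the first
    record falling in a cell is kept, which also removes exact duplicates.
    """
    x = np.asarray(x_m, dtype=float)
    y = np.asarray(y_m, dtype=float)
    # Integer column/row index of the grid cell that contains each point
    col = np.floor(x / cell_size_m).astype(np.int64)
    row = np.floor(y / cell_size_m).astype(np.int64)
    seen = set()
    keep = []
    for i, cell in enumerate(zip(row.tolist(), col.tolist())):
        if cell not in seen:  # first record in this cell wins
            seen.add(cell)
            keep.append(i)
    return np.array(keep, dtype=np.int64)
```

For example, points at x = 1000, 5000, 12000, and 25000 m on the same row fall into cells 0, 0, 1, and 2, so indices 0, 2, and 3 are retained.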

Fig. 1

Distribution records of A. bunge in the study area. This map was made in ArcGIS 10.7 software (Esri, Redlands, CA, USA). The original boundary was obtained from Natural Earth (http://www.naturalearthdata.com).

For the models in Biomod2 that require absence data for A. bunge, Biomod2 offers several methods to generate pseudo-absence points31. This study used the default Biomod2 strategy, ‘random’, which selects pseudo-absence points at random from the background grid cells of all layers. To reduce the effect of randomness and improve the accuracy of the results, three sets of pseudo-absence points, each equal in number to the presence points of A. bunge, were generated for model construction, and pseudo-absence points were given the same weights as presence points.
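The random pseudo-absence design (three sets, each as large as the presence set) can be illustrated with a short sketch. It is not Biomod2's internal code: background cells are represented as integer cell ids, presence cells are excluded from the draw (an assumption), and the helper name `random_pseudo_absences` is hypothetical. Equal weighting means every presence and pseudo-absence would simply receive weight 1.

```python
import numpy as np

def random_pseudo_absences(background_cells, presence_cells, n_sets=3, seed=0):
    """Draw n_sets pseudo-absence sets, each the size of the presence set.

    background_cells: ids of grid cells with valid data in all layers.
    presence_cells: ids of cells holding presence records (excluded here,
    an assumption of this sketch). Presences and pseudo-absences would both
    get weight 1, matching the equal weighting described in the text.
    """
    rng = np.random.default_rng(seed)
    presences = np.unique(np.asarray(presence_cells))
    candidates = np.setdiff1d(np.asarray(background_cells), presences)
    n = presences.size
    if n > candidates.size:
        raise ValueError("not enough background cells for a pseudo-absence set")
    # Sampling without replacement gives n distinct cells per set
    return [rng.choice(candidates, size=n, replace=False) for _ in range(n_sets)]
```

With 191 presence cells, each of the three returned arrays would contain 191 distinct background cells.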

SM samples of A. bunge were collected from sites in five regions: Shanxi, Shaanxi, Inner Mongolia, Ningxia, and Gansu. Production areas of A. bunge were then delineated and analyzed by combining ecological suitability with the medicinal value of A. bunge. Five SMs were measured: Astragaloside IV (A. IV), a saponin, and four flavonoids, namely Calycosin-7-O-beta-D-glucoside (C. glucoside), Ononin (O.), Calycosin (C.), and Formononetin (F.). These five SMs serve as indicators for assessing the quality of A. bunge.

Environmental variables screening and data processing

This study collected four types of environmental variables: bioclimatic, topographic, soil, and human activity variables. (1) Bioclimatic variables: a total of 24 bioclimatic variables were selected for modeling, obtained primarily from the WorldClim database (CMIP6; http://worldclim.org (accessed on 5 February 2024)). Temperature and precipitation data were standard annual means for 1970–2000, while solar radiation (Srad) and vapor pressure (Vapr) were standard monthly means for 1970–200032. The annual average Aridity Index (AI) and the annual average reference-crop Potential Evapotranspiration (PET) were obtained from the Space Information Alliance platform. (2) Topographic variables: elevation (Elev), slope (Slop), and aspect (Asp), all taken from the FAO Soil Portal (https://www.fao.org/soil-portal/data-hub/en (accessed on 5 February 2024)). (3) Soil variables: A. bunge is a deep-rooted plant33,34 whose main roots can reach 70 cm and whose roots are the medicinal part35; to better capture the relationship between A. bunge and soil conditions, this study therefore selected 31 soil characteristic variables covering both topsoil and subsoil. The data were obtained from the Nanjing Institute of Soil Science, Chinese Academy of Sciences (https://vdb3.soil.csdb.cn/), and comprise two categories: soil characteristic data and soil quality data. The soil quality data describe seven key soil qualities crucial to crop production, such as nutrient availability and nutrient retention capacity, and seven soil quality variables were chosen for modeling. (4) Human activity variables: the human footprint (Hf)36, obtained from the Socioeconomic Data and Applications Center (SEDAC, http://sedac.ciesin.columbia.edu (accessed on 5 February 2024)). 
Hf provides a global map of the cumulative human pressure on the environment.

All data in this study were used at a resolution of 30″ (~1 km). Multicollinearity among environmental variables increases the uncertainty of model results37,38. To avoid overfitting caused by strongly correlated variables, the 36 environmental variables were screened using Spearman's correlation in R39: variables with a correlation coefficient |r| ≤ 0.8 were retained, and for each pair with |r| ≥ 0.8 only one of the two variables was eliminated (Table 1).
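The correlation screen can be sketched in Python, although the study itself used R. Spearman's rho is computed as the Pearson correlation of average ranks, and variables are then kept greedily in a priority order. Which member of a correlated pair is dropped is not stated in the text, so the order of `names` (an assumption) decides it here; the helper names are hypothetical. Because the text gives both "≤ 0.8 retained" and "≥ 0.8 eliminated", this sketch drops a variable only when |rho| exceeds 0.8.

```python
import numpy as np

def average_ranks(x):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(x.size, dtype=float)
    i = 0
    while i < x.size:
        j = i
        while j + 1 < x.size and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # mean rank of the tie block
        i = j + 1
    return ranks

def spearman_matrix(data):
    """Spearman's rho between columns = Pearson correlation of their ranks."""
    data = np.asarray(data, dtype=float)
    ranked = np.column_stack([average_ranks(col) for col in data.T])
    return np.corrcoef(ranked, rowvar=False)

def filter_collinear(data, names, threshold=0.8):
    """Keep a variable only if |rho| <= threshold with every variable kept so far.

    Columns are visited in the order of `names`, so earlier names have priority.
    """
    rho = spearman_matrix(data)
    kept = []
    for j in range(len(names)):
        if all(abs(rho[j, k]) <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]
```

In a toy example where column "b" is exactly twice column "a", "b" is removed while a weakly correlated column is kept.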

Table 1 Environment variables used in Biomod2 model for A. bunge.

Biomod2 construction and evaluation

This study employed the Biomod2 platform to construct an ensemble model for A. bunge, assessing the accuracy of each single model and then establishing the optimal ensemble model. After the simulations, model performance was evaluated by K-fold cross-validation with K = 5 and 10 repetitions40. The AUC (area under the curve), the TSS (true skill statistic), and the KAPPA coefficient were used as accuracy indices. AUC is the area under the receiver operating characteristic (ROC) curve; it is independent of the classification threshold and of species prevalence, and ranges from 0 to 1. The larger the value, the better the model distinguishes positive from negative samples and the higher its accuracy41. The KAPPA coefficient accounts for agreement expected by chance as well as the sensitivity and specificity of the predictions; it generally ranges from 0 to 1, is relatively robust and conservative, and larger values indicate higher accuracy. TSS (sensitivity + specificity − 1) builds on KAPPA and likewise separates correct from incorrect predictions, but unlike KAPPA it is not affected by prevalence, thereby avoiding KAPPA's unimodal response to species prevalence42.
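The three accuracy indices can be computed from presence/absence labels and model scores as below. This is an illustrative sketch, not Biomod2's implementation: the binary predictions passed to `tss` and `kappa` come from a cutoff chosen by the caller (Biomod2 itself searches for an optimal cutoff), and AUC is computed as the probability that a random presence scores above a random absence, with ties counted as 0.5.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for boolean observed and predicted presences."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    return tp, tn, fp, fn

def tss(y_true, y_pred):
    """True skill statistic: sensitivity + specificity - 1."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) + tn / (tn + fp) - 1.0

def kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    n = tp + tn + fp + fn
    observed = (tp + tn) / n
    # Chance agreement from predicted and observed marginal totals
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return (observed - expected) / (1.0 - expected)

def auc(y_true, scores):
    """AUC via pairwise comparison of presence and absence scores."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    diff = scores[y_true][:, None] - scores[~y_true][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size
```

For labels [1, 1, 1, 0, 0, 0] and scores [0.9, 0.8, 0.4, 0.3, 0.2, 0.6], AUC is 8/9, and at a cutoff of 0.5 both TSS and KAPPA equal 1/3.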

Biomod2 allows for the customization of conditions within successfully run models to filter and select single models that meet specific criteria for creating composite models. This study chose to establish composite models using single models that exhibit excellence across three evaluation metrics: KAPPA, AUC, and TSS (KAPPA ≥ 0.80, AUC ≥ 0.90, TSS ≥ 0.85).

The ensemble model was calculated as the TSS-weighted mean of the selected single models, using the following formulas:

$$w_{i}=\frac{r_{i}}{\sum_{k=1}^{n} r_{k}}$$
(1)
$${ESI}_{j}=\sum_{i=1}^{n}{X}_{ij}{w}_{i}$$
(2)

In formulas (1) and (2), \({w}_{i}\) is the weight of the i-th model, \({r}_{i}\) is the TSS value of the i-th model, \(n\) is the number of selected models (those meeting KAPPA ≥ 0.80, AUC ≥ 0.90, and TSS ≥ 0.85), \({ESI}_{j}\) is the ensemble model's Ecological Suitability Index (ESI) for the j-th grid cell, and \({X}_{ij}\) is the ESI of the i-th model for the j-th grid cell. Biomod2 outputs the model results in ASCII format, which were converted to raster format in ArcGIS 10.7 to obtain the ESI of A. bunge in each grid cell. The ESI was then divided at equal intervals into five levels (I: 0–20%; II: 20–40%; III: 40–60%; IV: 60–80%; V: 80–100%), representing the unsuitable, low-suitability, medium-suitability, high-suitability, and best-suitability areas, respectively. The overall Biomod2 modeling framework is shown in Fig. 2.
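Formulas (1) and (2), together with the model-selection thresholds and the five-level zoning, can be sketched with NumPy. This is not Biomod2's ensemble code; it assumes each single-model prediction has already been rescaled to an ESI in [0, 1] and flattened to a vector of grid cells. The model scores in the usage example below are hypothetical values chosen for illustration.

```python
import numpy as np

# Selection thresholds stated in the text
CRITERIA = {"KAPPA": 0.80, "AUC": 0.90, "TSS": 0.85}

def select_models(scores):
    """Names of single models meeting every evaluation criterion."""
    return [name for name, s in scores.items()
            if all(s[metric] >= cut for metric, cut in CRITERIA.items())]

def ensemble_esi(predictions, scores):
    """TSS-weighted mean of the selected models' ESI vectors."""
    selected = select_models(scores)
    r = np.array([scores[m]["TSS"] for m in selected])
    w = r / r.sum()                                      # formula (1)
    stack = np.stack([predictions[m] for m in selected])  # (n_models, n_cells)
    return np.tensordot(w, stack, axes=1)                # formula (2)

def suitability_level(esi):
    """Levels I-V (returned as 1-5) on equal 0.2-wide intervals of ESI."""
    return np.digitize(esi, [0.2, 0.4, 0.6, 0.8]) + 1
```

Usage with hypothetical scores: a model failing any threshold (here "SRE") is ignored, and the remaining two are weighted by TSS.

```python
scores = {"GBM": {"KAPPA": 0.85, "AUC": 0.95, "TSS": 0.90},
          "RF": {"KAPPA": 0.82, "AUC": 0.93, "TSS": 0.86},
          "SRE": {"KAPPA": 0.70, "AUC": 0.88, "TSS": 0.80}}
preds = {"GBM": np.array([0.1, 0.5, 0.9]),
         "RF": np.array([0.3, 0.5, 0.7]),
         "SRE": np.array([1.0, 1.0, 1.0])}
esi = ensemble_esi(preds, scores)
print(suitability_level(esi))  # [1 3 5]
```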

Fig. 2

Research flowchart.

Constructing geospatial quality model and comprehensive geospatial quality model

Because the SM data did not meet the assumptions of normality and homogeneity of variance, differences in SM content among regions were assessed with a non-parametric approach43, namely the Kruskal–Wallis rank sum test, which is suitable for comparing multiple independent samples. In addition, correlations among the five SMs of A. bunge were analyzed with Spearman's correlation test in SPSS 25 software.
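The Kruskal–Wallis statistic that underlies this test can be written out directly, as in the sketch below (the analysis itself was done in SPSS). It pools all samples, assigns average ranks, computes H = 12/(N(N+1)) Σ R_g²/n_g − 3(N+1), and divides by the usual tie correction. The p-value, not computed here, comes from a chi-square distribution with k − 1 degrees of freedom, for example via `scipy.stats.chi2.sf`, or the whole test can be run with `scipy.stats.kruskal`.

```python
import numpy as np

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    samples = [np.asarray(g, dtype=float) for g in groups]
    pooled = np.concatenate(samples)
    n_total = pooled.size
    # Average ranks of the pooled data (tied values share the mean rank)
    order = np.argsort(pooled, kind="mergesort")
    sorted_vals = pooled[order]
    ranks = np.empty(n_total)
    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    # Sum of (rank sum)^2 / group size over the groups
    total = 0.0
    start = 0
    for s in samples:
        total += ranks[start:start + s.size].sum() ** 2 / s.size
        start += s.size
    h = 12.0 / (n_total * (n_total + 1)) * total - 3.0 * (n_total + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N)
    _, counts = np.unique(pooled, return_counts=True)
    correction = 1.0 - np.sum(counts**3 - counts) / (n_total**3 - n_total)
    return h / correction
```

For three fully separated groups [1, 2, 3], [4, 5, 6], and [7, 8, 9], H = 7.2.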

This study has constructed a linear regression model as a Geospatial Quality Model (GQM) to simulate the spatial distribution characteristics of five types of SMs, as shown in formula (3) below44:

$$g\left[ {E\left( Y \right)} \right] = LP = \upalpha + X\upbeta$$
(3)

In the linear regression model, the linear predictor \(\left(LP\right)\) is associated with the predictor variables \({X}_{p}\left(p=1, 2, \ldots, j\right)\), where \(\alpha\) is the intercept, \(\beta\) is the vector of regression coefficients, \(\mu =E\left(Y\right)\) is the conditional expectation of \(Y\) given \(X\), and \(\text{g}()\) is the link function; here the Gaussian form is chosen, so that \(\text{g}\) is the identity and the model reduces to ordinary linear regression. For the i-th observation in the sample, the model is:

$$\text{g}\left({\mu }_{i}\right)=\alpha +{\beta }_{1}{X}_{i1}+{\beta }_{2}{X}_{i2}+\cdots +{\beta }_{j}{X}_{ij}$$
(4)

This study used SPSS (IBM SPSS Statistics 26; https://www.ibm.com/cn-zh/analytics/spss-analytics-software) to fit linear regression models relating the five SMs of A. bunge to environmental variables, retaining only informative predictor variables45. The spatial quality models were evaluated using the F-statistic and R², calculated with formulas (5) and (6)46.

$$F=\frac{{R}^{2}/p}{\left(1-{R}^{2}\right)/\left(n-p-1\right)}$$
(5)
$${R}^{2}=1-\frac{{\sum }_{i=1}^{n}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}}{{\sum }_{i=1}^{n}{\left({y}_{i}-\overline{y}\right)}^{2}}$$
(6)

Here, \(n\) is the number of sample points, \(p\) is the number of predictor variables in the model, \(\left(n-p-1\right)\) is the residual degrees of freedom, and \({y}_{i}\), \({\widehat{y}}_{i}\), and \(\overline{y}\) are the observed, fitted, and mean values of the response.
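Formulas (5) and (6) can be checked with a plain least-squares fit. The sketch below fits an intercept plus all supplied predictors with `numpy.linalg.lstsq`; it does not reproduce SPSS's variable-selection procedure, so predictor screening is assumed to have been done beforehand. The helper name `fit_linear_model` is hypothetical.

```python
import numpy as np

def fit_linear_model(X, y):
    """OLS with an intercept; returns (alpha, beta, R^2, F) per formulas (5)-(6)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                        # single predictor -> one column
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])  # first column -> intercept alpha
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_hat = design @ coef
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                       # formula (6)
    f = (r2 / p) / ((1.0 - r2) / (n - p - 1))        # formula (5)
    return coef[0], coef[1:], r2, f
```

For x = [1, 2, 3, 4, 5] and y = [2, 4, 5, 4, 5], the fit is ŷ = 2.2 + 0.6x, with R² = 0.6 and F = 0.6 / (0.4 / 3) = 4.5.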

According to the Chinese Pharmacopoeia (2020 Edition)47, the content of A. IV in A. bunge must not be less than 0.08%, and that of C. glucoside must not be less than 0.02%. This study uses these values as the minimum thresholds for A. IV and C. glucoside, respectively; for the other three SMs, which are not specified, the minimum threshold is set at 0%. Based on the linear regression models obtained in Sect. 2.5 and the preset threshold for each SM, the ‘Spatial Analysis’ function in ArcGIS 10.7 was used to estimate SM content within the suitable planting areas for A. bunge nationwide, and the natural breaks method was used to group the estimates into four classes, yielding the final delineation of A. bunge production areas.
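The raster step (apply a fitted model cell by cell, keep only ecologically suitable cells, and discard cells below the Pharmacopoeia minimum) can be sketched with NumPy arrays standing in for raster layers. The coefficients are those of formula (8) for A. IV given later in the Results; the sketch assumes the regression output is in the same unit (%) as the 0.08% threshold, which the text does not state explicitly. Natural-breaks classification is not included, and `predict_content` is a hypothetical helper.

```python
import numpy as np

# Coefficients and intercept of formula (8) for A. IV
A_IV_COEF = {"T_ECE": 0.028, "Bio14": -0.019, "Wind": 0.039,
             "T_CLAY": 0.004, "T_SAND": 0.001}
A_IV_INTERCEPT = -0.113

def predict_content(layers, coef, intercept, esi, min_content, esi_cut=0.4):
    """Predict SM content per cell; NaN where ESI < esi_cut or content < min_content.

    layers: dict mapping variable name -> array of cell values (same shape as esi).
    Assumes predictions and min_content share the same unit (e.g. %).
    """
    esi = np.asarray(esi, dtype=float)
    y = np.full(esi.shape, intercept, dtype=float)
    for name, b in coef.items():
        y += b * np.asarray(layers[name], dtype=float)
    keep = (esi >= esi_cut) & (y >= min_content)
    return np.where(keep, y, np.nan)
```

With T_ECE = 1, Bio14 = 2, Wind = 3, T_CLAY = 20, and T_SAND = 40 in a cell with ESI = 0.5, the predicted A. IV content is 0.114, which passes the 0.08 threshold; the same values in a cell with ESI = 0.3 are masked out.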

Because the cultivation of A. bunge requires an appropriate environment and a stable habitat48, this study established the CGQM (formula (7)) based on ESI and GQM to evaluate the suitability of areas for the artificial cultivation of A. bunge. To ensure the reliability of the findings, areas classified as medium suitability or above were designated as suitable habitat for A. bunge, that is, regions with \(ESI\ge 0.4\). The comprehensive evaluation index ranges from 0 to 1, with higher values indicating that an area is more suitable for the artificial cultivation of A. bunge. The evaluation of suitability for artificial cultivation is limited to the suitable habitat of A. bunge.

$${CGQM}_{i}=0.5\times {ESI}_{i}+0.5\times {GQM}_{i},\quad {ESI}_{i}\ge 0.4$$
(7)

This study assigns equal weights to ESI and GQM, both being 0.5. Following this, a weighted summation is conducted to derive the Composite Evaluation Index. Here, \({CGQM}_{i}\) represents the comprehensive geospatial quality simulation result for A. bunge in the i-th evaluation unit, \({ESI}_{i}\) denotes the suitable habitat simulation result for A. bunge in the i-th evaluation unit, and \({GQM}_{i}\) indicates the spatial quality simulation result for A. bunge in the i-th evaluation unit.
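The equal-weight combination can be sketched as follows. For the index to stay between 0 and 1 as stated, GQM must be on a 0–1 scale; the text does not say how this was achieved, so this sketch assumes a min–max rescaling of GQM over the suitable cells. Cells outside the suitable habitat (ESI < 0.4) are set to NaN. The function name `cgqm` is hypothetical.

```python
import numpy as np

def cgqm(esi, gqm, esi_cut=0.4):
    """Equal-weight (0.5/0.5) sum of ESI and rescaled GQM inside suitable habitat.

    GQM is min-max rescaled to [0, 1] over suitable cells (an assumption of this
    sketch); it requires at least two distinct GQM values among those cells.
    """
    esi = np.asarray(esi, dtype=float)
    gqm = np.asarray(gqm, dtype=float)
    suitable = esi >= esi_cut
    g = gqm[suitable]
    g_scaled = np.full(esi.shape, np.nan)
    g_scaled[suitable] = (g - g.min()) / (g.max() - g.min())
    return np.where(suitable, 0.5 * esi + 0.5 * g_scaled, np.nan)
```

For ESI = [0.5, 0.9, 0.3] and raw GQM = [1, 3, 2], the third cell is excluded and the first two receive indices of 0.25 and 0.95.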

Results

Biomod2 simulation results

In Biomod2, nine of the ten species distribution models ran successfully; only MARS did not. For each of the nine models, the run with the highest accuracy evaluation was output. Applying the ecological suitability zoning criteria described in section "Biomod2 construction and evaluation", the spatial distribution of ecological suitability for A. bunge was obtained from each of the nine single models (Fig. 3), and the area of each suitability zone was calculated for each model (Fig. 4).
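Zone areas such as those in Fig. 4 can be tallied from a classified raster. On a geographic 30″ grid, cell area shrinks with latitude, so counting cells is not enough. The sketch below assumes a latitude/longitude grid and a spherical Earth (cell area = R² · Δλ · (sin φ_north − sin φ_south)); it is an illustration, not the ArcGIS procedure used in the study, and the helper names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation

def cell_area_km2(lat_south_deg, cell_deg):
    """Area of a lat/lon cell on a sphere: R^2 * dlon * (sin(lat_n) - sin(lat_s))."""
    lat_s = np.radians(lat_south_deg)
    lat_n = np.radians(lat_south_deg + cell_deg)
    return EARTH_RADIUS_KM**2 * np.radians(cell_deg) * (np.sin(lat_n) - np.sin(lat_s))

def area_by_level(levels, row_lat_south_deg, cell_deg=30 / 3600):
    """Total area (km^2) of each suitability level 1-5.

    levels: 2-D array (rows x cols) of levels, 0 for no data.
    row_lat_south_deg: southern-edge latitude of each raster row.
    """
    levels = np.asarray(levels)
    row_area = cell_area_km2(np.asarray(row_lat_south_deg, dtype=float), cell_deg)
    areas = np.broadcast_to(row_area[:, None], levels.shape)  # same area along a row
    return {k: float(areas[levels == k].sum()) for k in range(1, 6)}
```

As a sanity check, a 1° × 1° cell at the equator covers about 12,364 km².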

Fig. 3

Results of the single-model simulations of the spatial distribution of A. bunge. (a) Prediction of the spatial distribution of A. bunge in China based on the MAXENT model. (b) Prediction of the spatial distribution of A. bunge in China based on the GLM model. (c) Prediction of the spatial distribution of A. bunge in China based on the RF model. (d) Prediction of the spatial distribution of A. bunge in China based on the CTA model. (e) Prediction of the spatial distribution of A. bunge in China based on the GBM model. (f) Prediction of the spatial distribution of A. bunge in China based on the FDA model. (g) Prediction of the spatial distribution of A. bunge in China based on the GAM model. (h) Prediction of the spatial distribution of A. bunge in China based on the ANN model. (i) Prediction of the spatial distribution of A. bunge in China based on the SRE model. These maps were made in ArcGIS 10.7 software (Esri, Redlands, CA, USA) using the results produced by the Biomod2 model. The original boundary was obtained from Natural Earth (http://www.naturalearthdata.com).

Fig. 4

Results of the different models simulating the distribution area of A. bunge.

Based on the spatial distribution results, the suitable distribution areas for A. bunge identified by the nine models are broadly consistent, lying mainly in North, Northeast, and Northwest China, including central and eastern Inner Mongolia, most of Shanxi, eastern Gansu, Ningxia, Shaanxi, and Hebei. However, the models differ in detail. The SRE model outputs Boolean values (1 for suitable, 0 for unsuitable), unlike the continuous values produced by the other models. The CTA and GAM models give more extreme, overly optimistic predictions. The GBM and RF models perform best, with highly consistent results, smooth transitions between suitability levels, and rich spatial detail. In terms of area, the RF model identifies the largest total suitable area, 4.96 × 10⁶ km²; within the suitable areas, the CTA model identifies the largest most-suitable area, 2.18 × 10⁶ km², and the MAXENT model the smallest, only 1.08 × 10⁵ km².

Using the method described in Sect. 2.4, the composite model achieved a KAPPA value of 0.815, an AUC value of 0.975, and a TSS value of 0.854. By all three evaluation criteria, its accuracy is excellent and surpasses that of any single model, demonstrating superior performance.

Based on the criteria for suitable distribution area classification, the composite model results (Fig. 5) cover all levels of suitable distribution areas, with a natural and uniform transition. The model demonstrates good richness and continuity, effectively illustrating the suitable distribution of A. bunge across China. The suitable distribution areas for A. bunge are mainly located in Northeast, North, and Northwest China, with areas of higher suitability concentrated in most of Shanxi, southeastern Gansu, central and southern Inner Mongolia, northern Hebei, northern Shaanxi, and southern Ningxia. According to the ensemble model's predictions, the most suitable distribution area for A. bunge in China covers 5.48 × 10⁵ km², and the high-suitability area covers 5.88 × 10⁵ km². The areas of moderate and low suitability are more extensive, while the most suitable and high suitability areas are relatively smaller.

Fig. 5

Prediction of the spatial distribution of A. bunge in China based on the ensemble model. This map was made in ArcGIS 10.7 software (Esri, Redlands, CA, USA) using the result produced by Biomod2 Model. The original boundary was obtained from Natural Earth (http://www.naturalearthdata.com).

Analysis of the correlation between five SMs of A. bunge and environmental variables

Based on the test results (Table 2), the contents of all five SMs differ significantly among regions (asymptotic significance < 0.05). It is therefore necessary to further study the relationship between A. bunge SMs and environmental variables, as well as the spatial distribution of these SMs, to guide targeted artificial cultivation of A. bunge.

Table 2 Non-parametric test of the content of SMs in A. bunge.

According to the test results (Table 3), at the 0.05 significance level (two-tailed), C. glucoside is significantly and positively correlated with both A. IV and C. At the 0.01 significance level (two-tailed), A. IV and F. are significantly negatively correlated, indicating a negative association between them. O. is significantly and positively correlated with both C. glucoside and C., and C. is also significantly and positively correlated with F.

Table 3 Correlation of five SMs in A. bunge.

A Spearman correlation test between the five SMs of A. bunge and environmental variables was conducted to assess how environmental variables relate to the content of these SMs. Because the environmental variables that affect SM accumulation may differ from those that affect plant growth, this study analyzed all collected environmental variables rather than only those used in the Biomod2 model. According to Table 4, the content of A. IV is significantly correlated (P < 0.01) with Bio3, Bio4, Bio7, Bio8, Bio9, Bio11, Bio17, Bio19, Wind, Slope, Ele, Sq4 and T_CLAY. It is also significantly correlated (P < 0.05) with Bio6, Bio14, Bio15, Srad, S_GRAVEL and S_OC. Among these, the content of A. IV is positively correlated with Bio4, Bio7, Bio8, Wind, Sq4, T_CLAY, Bio15 and S_OC, and negatively correlated with the other variables.

Table 4 The correlation between A. IV and environmental variables.

According to Table 5, the content of C. glucoside is significantly correlated (P < 0.01) with Bio13, Bio16, Bio18, Srad, Ele and Sq4. It is also significantly correlated (P < 0.05) with Bio15, Bio17, Bio19, PET, AI, Slope, Sq3 and Sq7. Among these, it is positively correlated with Bio13, Bio16, Bio15, Sq4, Bio18 and AI, and negatively correlated with the other environmental variables. In particular, the relationship between AI and C. glucoside is consistent with previous findings49 that drought stress can promote its accumulation.

Table 5 The correlation between C. glucoside and environmental variables.

According to Table 6, the content of C. is significantly correlated with Srad and Wind (P < 0.01), and significantly correlated with PET, AI and Sq3 (P < 0.05). Among these, it shows a positive correlation with AI, and a negative correlation with other environmental variables.

Table 6 The correlation between C. and environmental variables.

According to Table 7, the content of F. is significantly correlated with Bio3, Bio15 and Wind (P < 0.01), and with Bio4, Bio6, Bio8, Bio9, Bio11, AI and Ele (P < 0.05). Among these, it is positively correlated with Bio3, Bio6, Bio9, Bio11, AI and Ele, and negatively correlated with the other environmental variables.

Table 7 The correlation between F. and environmental variables.

According to Table 8, the content of O. is significantly correlated only with Srad and Sq3, showing a negative correlation with both (P < 0.05).

Table 8 The correlation between O. and environmental variables.

Overall, environmental variables such as Srad, Bio15, Wind, AI, Ele and Sq3 have a significant impact on the accumulation of SMs in A. bunge. Among these, Srad shows a significant negative correlation with the content of four SMs. To a certain extent, Srad reflects light intensity, so this result is consistent with the findings of Wang et al.50 on the effect of light intensity on SM accumulation. Sq3 shows a significant negative correlation with the content of three SMs, whereas AI shows a significant positive correlation with the content of three SMs. Bio15, Wind, and Ele are each significantly correlated with the content of three SMs, but the direction of the relationship differs among SMs. Bio15 is positively correlated with the content of A. IV and C. glucoside, but negatively correlated with the content of F. Wind shows a significant negative correlation with the content of C. and F., but a significant positive correlation with the content of A. IV. Ele exhibits a similar pattern, being significantly negatively correlated with the content of A. IV and C. glucoside, yet significantly positively correlated with the content of F.

Relationship model between the main SMs of A. bunge and environmental variables

The linear regression models relating the five SMs of A. bunge to environmental variables are as follows:

The linear regression model for A. IV and environmental variables is as follows:

$$Y_{1} = 0.028x_{1} - 0.019x_{2} + 0.039x_{3} + 0.004x_{4} + 0.001x_{5} - 0.113$$
(8)

In formula (8): \({Y}_{1}\) represents the content of A. IV, \({x}_{1}\) is T_ECE, \({x}_{2}\) is Bio14, \({x}_{3}\) is Wind, \({x}_{4}\) is T_CLAY, and \({x}_{5}\) is T_SAND. The regression model's \({R}^{2}\) is 0.479, with an \(F\) value of 13.767. The \(P\) value of the F-test is < 0.001, indicating that the model is statistically significant and can be applied to study the content of A. IV.

The linear regression model between C. glucoside and environmental variables is:

$${Y}_{2}=-0.00002970{x}_{1}-0.00002573{x}_{2}+0.004{x}_{3}-0.307{x}_{4}-0.022{x}_{5}+0.014{x}_{6}+0.525$$
(9)

In formula (9): \({Y}_{2}\) represents the content of C. glucoside, \({x}_{1}\) is Srad, \({x}_{2}\) is Ele, \({x}_{3}\) is T_GRAVEL, \({x}_{4}\) is T_CASO4, \({x}_{5}\) is T_OC, and \({x}_{6}\) is Wind. The regression model's \({R}^{2}\) is 0.463, with an \(F\) value of 10.654. The \(P\) value of the F-test is < 0.001, indicating that the model is statistically significant and can be applied to study the content of C. glucoside.

The linear regression model of O. with environmental variables is:

$${Y}_{3}=-0.00001648{x}_{1}+0.012{x}_{2}-0.001{x}_{3}-0.014{x}_{4}+0.168$$
(10)

In formula (10): \({Y}_{3}\) represents the content of O., \({x}_{1}\) is Srad, \({x}_{2}\) is Bio12, \({x}_{3}\) is T_CLAY, and \({x}_{4}\) is Sq2. The regression model's \({R}^{2}\) is 0.263, with an \(F\) value of 6.772. The \(P\) value of the F-test is < 0.001, indicating that the model is statistically significant and can be applied to study the content of O. In addition, the \(P\) values of all regression coefficients are < 0.05, so every coefficient is significant.

The linear regression model between C. and environmental variables is:

$$Y_{4} = 0.015x_{1} - 0.00001346x_{2} - 0.012x_{3} + 0.001x_{4} + 0.121 \tag{11}$$

In formula (11), \({Y}_{4}\) represents the content of C., \({x}_{1}\) is Bio7, \({x}_{2}\) is Srad, \({x}_{3}\) is Bio2, and \({x}_{4}\) is T_BS. The model has an \({R}^{2}\) of 0.467 and an \(F\) value of 13.158. The F-test gives \(P\) = 0.000 (< 0.05), so the model is statistically significant and can be used to estimate C. content. The significance tests for the individual regression coefficients also give \(P\) < 0.05, so every coefficient is significant.

The linear regression model between F. and environmental variables is:

$$Y_{5} = -0.00001091x_{1} - 0.001x_{2} - 0.00009956x_{3} + 0.004x_{4} - 0.008x_{5} + 0.168 \tag{12}$$

In formula (12), \({Y}_{5}\) represents the content of F., \({x}_{1}\) is Srad, \({x}_{2}\) is T_CEC_SOIL, \({x}_{3}\) is Bio16, \({x}_{4}\) is Bio2, and \({x}_{5}\) is Sq2. The model has an \({R}^{2}\) of 0.338 and an \(F\) value of 7.660. The F-test gives \(P\) = 0.000 (< 0.05), so the model is statistically significant and can be used to estimate F. content.

Overall, all five regression models are statistically significant, with \({R}^{2}\) values between 0.263 and 0.479. They partly explain how environmental variables affect the accumulation of SMs and, combined with environmental variable data, can be used to predict the SM content of A. bunge.
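Applying formulas (8) to (12) at a location amounts to evaluating each linear equation with that location's environmental values. The sketch below transcribes the published coefficients into a lookup table keyed by the variable codes used in the text. The input and output units are assumed to match those used to fit the models, which this section does not restate. The function and table names are hypothetical.

```python
from typing import Dict

# Coefficients transcribed from formulas (8)-(12); "intercept" is the constant.
MODELS: Dict[str, Dict[str, float]] = {
    "A. IV": {"T_ECE": 0.028, "Bio14": -0.019, "Wind": 0.039,
              "T_CLAY": 0.004, "T_SAND": 0.001, "intercept": -0.113},
    "C. glucoside": {"Srad": -0.00002970, "Ele": -0.00002573,
                     "T_GRAVEL": 0.004, "T_CASO4": -0.307, "T_OC": -0.022,
                     "Wind": 0.014, "intercept": 0.525},
    "O.": {"Srad": -0.00001648, "Bio12": 0.012, "T_CLAY": -0.001,
           "Sq2": -0.014, "intercept": 0.168},
    "C.": {"Bio7": 0.015, "Srad": -0.00001346, "Bio2": -0.012,
           "T_BS": 0.001, "intercept": 0.121},
    "F.": {"Srad": -0.00001091, "T_CEC_SOIL": -0.001, "Bio16": -0.00009956,
           "Bio2": 0.004, "Sq2": -0.008, "intercept": 0.168},
}


def predict(sm: str, env: Dict[str, float]) -> float:
    """Evaluate one fitted linear model at a single location.

    A variable missing from `env` raises KeyError instead of silently
    being treated as zero.
    """
    coefs = MODELS[sm]
    total = coefs["intercept"]
    for var, beta in coefs.items():
        if var != "intercept":
            total += beta * env[var]
    return total


def predict_all(env: Dict[str, float]) -> Dict[str, float]:
    """Predicted content of all five SMs at one location."""
    return {sm: predict(sm, env) for sm in MODELS}
```

A linear model is not bounded, so a prediction can fall outside the physically possible range (for example, below zero) when inputs lie outside the training data. Clipping or masking such values is a separate modelling decision.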

A. bunge production area division results

Based on the results obtained in Sect. 2.6, the production areas of A. bunge were delineated (Fig. 6).

Fig. 6

A. bunge production area division results. (a) A. IV content in ecologically suitable areas for A. bunge; (b) C. glucoside content in ecologically suitable areas for A. bunge; (c) O. content in ecologically suitable areas for A. bunge; (d) C. content in ecologically suitable areas for A. bunge; (e) F. content in ecologically suitable areas for A. bunge. This map was made in ArcGIS 10.7 software (Esri, Redlands, CA, USA) using the results obtained in Sect. 2.6. The original boundary was obtained from Natural Earth (http://www.naturalearthdata.com).

The predicted contents of the five SMs vary spatially, and the distribution patterns of C. and O. are opposite: C. content decreases gradually from north to south, whereas O. content increases from north to south. A. IV content generally increases from south to north, but few areas have high values. C. glucoside content is high at the northern and southern ends and low in the middle. F. content shows the same pattern, but with fewer high-content areas at both ends.

The highest predicted A. IV content is 1.32%, but values between 0.58 and 1.32% are particularly scarce and occur only in a very few areas of Xinjiang. Most values range from 0.08 to 0.11% and are distributed across central-northern Shanxi, northern Hebei, central-southern Gansu, central Ningxia, eastern Qinghai, and eastern Inner Mongolia. Some values fall between 0.11 and 0.15%, mainly in a few areas of Inner Mongolia, Hebei, Shanxi, and Gansu. Fewer still range from 0.15 to 0.58%, primarily in central Inner Mongolia. Overall, A. IV content in Inner Mongolia is higher than in other regions, consistent with the findings of Yang et al.51. The A. IV content in most suitable planting areas of Shanxi, Inner Mongolia, and Hebei meets the requirements of the Chinese Pharmacopoeia, whereas some suitable planting areas in Gansu, Ningxia, and Shaanxi do not meet the standard.
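The exact rule CGQM uses to couple suitability with predicted content is described in Sect. 2.6 and is not repeated here. As an illustrative assumption, the sketch below masks grid cells whose suitability index falls below a hypothetical threshold of 0.5. It then assigns the remaining cells to the A. IV content classes reported above using `numpy.digitize`. The threshold, function name, and class coding are hypothetical.

```python
import numpy as np

# Class breaks (%) for predicted A. IV content, taken from the text above.
A_IV_BREAKS = np.array([0.08, 0.11, 0.15, 0.58])


def classify_content(content, suitability, threshold=0.5, breaks=A_IV_BREAKS):
    """Return a content class per cell, or -1 where the cell is unsuitable.

    Class 0 is below breaks[0], class i covers [breaks[i-1], breaks[i]),
    and class len(breaks) is >= breaks[-1]. `threshold` is a hypothetical
    suitability cut-off, not a value taken from the study.
    """
    content = np.asarray(content, dtype=float)
    suitability = np.asarray(suitability, dtype=float)
    classes = np.digitize(content, breaks)  # integers 0 .. len(breaks)
    return np.where(suitability >= threshold, classes, -1)


if __name__ == "__main__":
    content = np.array([0.05, 0.09, 0.12, 0.20, 1.00])
    suitability = np.array([0.9, 0.9, 0.9, 0.9, 0.2])
    print(classify_content(content, suitability))
```

The same function works on 2-D rasters, because both `digitize` and `where` operate element-wise.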

The highest predicted C. glucoside content is 0.26%. High-content areas (0.14 to 0.26%) are located mainly in southern Gansu, central Shaanxi, and eastern Inner Mongolia and cover only a small area. The distribution within Inner Mongolia differs from that reported by Yang et al.51. Values between 0.10 and 0.14% are also relatively scarce and mainly surround the high-content areas. Most predicted values range from 0.06 to 0.10%, chiefly in southern Gansu, northern Shaanxi, most of Shanxi, northern Hebei, and central-southern Inner Mongolia. Low-content areas (0.02 to 0.06%) are found mainly in central Gansu, central Ningxia, northern Shaanxi, and central-eastern Inner Mongolia. C. glucoside performs better than A. IV in this respect: its content meets the pharmacopoeia requirement in most suitable planting areas.

The highest predicted O. content is 13.38%, and high-content areas (7.32 to 13.38%) are distributed mainly in southern Gansu, central and southern Shaanxi, and the northwestern corner of Henan. Overall, predicted O. content decreases gradually from south to north. Within the Inner Mongolia Autonomous Region, however, it increases gradually from the central to the eastern area.

The predicted C. content peaks at 0.83%, and high-content areas (0.60 to 0.83%) are located mainly in northeastern Inner Mongolia. Predicted C. content decreases gradually from north to south. Central-southern and southeastern Inner Mongolia have C. contents between 0.53 and 0.60%, and parts of Shanxi, Shaanxi, Hebei, Ningxia, and Gansu range from 0.42 to 0.53%. Southern Gansu and central-southern Shaanxi have the lowest contents, from 0.00 to 0.42%.

The predicted F. content reaches at most 0.54‰, and high-content areas (0.22 to 0.54‰) are few, located mainly in eastern Inner Mongolia and southeastern Gansu. Areas with contents between 0.14 and 0.22‰ are also scarce and mostly surround the highest-content regions. F. content in A. bunge is extremely low compared with the other SMs. Most predicted values range from 0.00 to 0.07‰. In some suitable planting areas, such as most of southern Shanxi and northern Shaanxi, the predictions even fall below 0‰. Because a linear model is not bounded below, such negative values have no physical meaning and are best read as very low contents. A few areas with contents between 0.07 and 0.14‰ are distributed mainly in eastern Inner Mongolia and southeastern Gansu.

Discussion

Environmental factors affect plants throughout their growth and development. A. bunge is highly resistant to cold and drought and prefers cool climates, but it needs ample sunlight. To ensure root aeration and nutrient uptake, it grows best in deep, loose soils with good water and air permeability. Li et al.52 studied the potential suitable growing areas of A. bunge in the Inner Mongolia Autonomous Region. They found that the combined contribution of three variables reached 40%: the standard deviation of seasonal temperature change, mean October precipitation, and sunshine duration during the growing season. However, overharvesting and overgrazing have destroyed suitable habitat for A. bunge. Environmental degradation and human activities are also changing many environmental factors, which will indirectly affect both the distribution and the quality of A. bunge53,54. Therefore, more emphasis must be placed on the conservation and efficient utilization of this species.

A key challenge in simulating the geospatial pattern of active-component content in medicinal plants is limited data45. Niche modeling requires accurate and rich environmental variables and species occurrence points, yet many variables are difficult to collect55. Component data are especially hard to obtain because few studies have examined the spatial patterns of the five SM components. Moreover, survival traits differ among species, so building an accurate niche model that reflects the characteristics of a particular species is even more challenging56. In this study, we first collected a sufficient number of A. bunge distribution points. We assumed no external driving variables such as strong human interference, habitat fragmentation, natural disasters, excessive collection, or other human activities. On this basis, we simulated the spatial dynamics and niche of A. bunge from environmental parameters and occurrence data.

The model results show that suitable areas for A. bunge are mainly located in northeastern, northern, and northwestern China. Highly suitable areas are concentrated in most of Shanxi, southeastern Gansu, central-southern Inner Mongolia, northern Hebei, northern Shaanxi, and southern Ningxia, similar to the findings of Yang et al.13. Furthermore, we used a stepwise regression-based model (GQM) to predict the content of five SM components of A. bunge across China. All GQM regressions passed the F-test, with \({R}^{2}\) values between 0.263 and 0.479. By coupling the results of these two models, we obtained the final comprehensive geospatial distribution of the five SM components of A. bunge. These findings can support the sustainable use and appropriate conservation of A. bunge in China by indicating which areas can potentially host larger A. bunge populations with high-quality SM components.
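The stepwise procedure behind GQM is not specified in detail in this section. The entry and removal criteria, in particular, are not stated. The sketch below shows one common variant, forward selection by partial F-test, using only NumPy least squares. At each step it adds the candidate variable that most reduces the residual sum of squares, provided its partial F statistic exceeds a hypothetical entry threshold of 4.0. Unlike full bidirectional stepwise regression, it never removes a variable once entered.

```python
import numpy as np


def rss(design: np.ndarray, y: np.ndarray) -> float:
    """Residual sum of squares of an OLS fit of y on the columns of `design`."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def forward_stepwise(X: np.ndarray, y: np.ndarray, f_enter: float = 4.0) -> list:
    """Forward selection by partial F-test; returns column indices in entry order.

    `f_enter` is a hypothetical rule-of-thumb threshold, not the study's value.
    """
    n, p = X.shape
    selected: list = []
    design = np.ones((n, 1))  # start from an intercept-only model
    current_rss = rss(design, y)
    while len(selected) < p:
        df_resid = n - design.shape[1] - 1  # residual df after adding one column
        if df_resid <= 0:
            break
        candidates = [j for j in range(p) if j not in selected]
        trial = {j: rss(np.column_stack([design, X[:, j]]), y) for j in candidates}
        best = min(trial, key=trial.get)
        new_rss = trial[best]
        # Partial F for the single added column; infinite if the fit becomes exact.
        f_stat = np.inf if new_rss == 0.0 else (current_rss - new_rss) / (new_rss / df_resid)
        if f_stat < f_enter:
            break
        selected.append(best)
        design = np.column_stack([design, X[:, best]])
        current_rss = new_rss
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))  # synthetic stand-ins for environmental variables
    y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.5 * rng.normal(size=200)
    print(forward_stepwise(X, y))
```

With synthetic data of this kind, the two informative columns enter first. A noise column can still occasionally pass an entry threshold of this size by chance.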

Effects of environmental variables on ecological suitability of A. bunge

The ‘variables_importance’ function built into the Biomod2 platform calculates the contribution of each environmental variable to the ecological suitability prediction for A. bunge57. The top five contributing variables differ, at least partly, from one individual model to another. Across all models, Bio12 appears most frequently among the top five variables, underscoring the critical role of precipitation in the growth of A. bunge. Hf, Ele, and Bio3 also appear several times, indicating that they strongly affect its growth.
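Counting how often each variable appears in the individual models' top five is a simple tally once the importance scores have been exported from Biomod2. The sketch below assumes that export is already available as a nested dictionary. The per-model scores used in the usage line are made-up placeholders, not the study's values.

```python
from collections import Counter
from typing import Dict, List


def top_k(importance: Dict[str, float], k: int = 5) -> List[str]:
    """Names of the k most important variables (ties broken alphabetically)."""
    ranked = sorted(importance.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


def top_k_frequency(per_model: Dict[str, Dict[str, float]], k: int = 5) -> Counter:
    """Number of individual models that rank each variable in their top k."""
    counts: Counter = Counter()
    for importance in per_model.values():
        counts.update(top_k(importance, k))
    return counts


if __name__ == "__main__":
    # Hypothetical importance scores for two models, for illustration only.
    per_model = {
        "GBM": {"Bio12": 0.30, "Hf": 0.20, "Ele": 0.10},
        "RF": {"Bio12": 0.25, "Ele": 0.22, "Hf": 0.05},
    }
    print(top_k_frequency(per_model, k=2).most_common())
```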

The contribution of each environmental variable in the composite model is the weighted average of its contributions in the individual models used to build the composite model, with each model weighted by its TSS value. The five variables contributing most to the composite model are presented in Table 9. In terms of variable type, and in line with previous research, bioclimatic factors are the primary variables affecting habitat suitability. Annual precipitation (Bio12, 13.4%) ranks first, suggesting that it is the most important factor shaping the ecological niche of A. bunge. Precipitation supplies water that alleviates soil drought and adds nitrogen to the soil through the nitrogen compounds in snowmelt. It also moistens the roots, which helps this perennial sprout in the following spring. Human footprint (Hf, 6.6%) is another influential factor, because A. bunge grows mainly on grasslands and hill slopes. There, grazing, farming, and other human activities continuously alter its growth environment and thus its ecological suitability. Temperature also plays a crucial role in plant survival, especially for species in arid areas with harsh living conditions58,59. Accordingly, the mean temperature of the driest quarter (Bio9, 6.03%) and isothermality (Bio3, 5.79%) also make relatively large contributions to the model.
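The TSS-weighted averaging described above can be written in a few lines. The sketch below assumes that a variable missing from a member model contributes zero. It also leaves the averaged values unnormalized; rescaling them to sum to 100% would be an extra step that this section does not specify. Model names and numbers in the usage line are hypothetical.

```python
from typing import Dict


def weighted_importance(per_model: Dict[str, Dict[str, float]],
                        tss: Dict[str, float]) -> Dict[str, float]:
    """TSS-weighted mean contribution of each variable across member models."""
    total_weight = sum(tss[m] for m in per_model)
    variables = {v for imp in per_model.values() for v in imp}
    return {
        # Variables absent from a model are treated as contributing 0 there.
        v: sum(tss[m] * imp.get(v, 0.0) for m, imp in per_model.items()) / total_weight
        for v in variables
    }


if __name__ == "__main__":
    per_model = {"A": {"Bio12": 0.2, "Hf": 0.1}, "B": {"Bio12": 0.4, "Hf": 0.3}}
    tss = {"A": 0.8, "B": 0.6}  # hypothetical TSS scores
    print(weighted_importance(per_model, tss))
```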

Table 9 Contribution rate of the top five environmental variables in the composite model.

The key role of environmental variables in five SM components

Research has shown that changes in environmental factors directly affect plant growth and the accumulation of active SMs. Wang Yu et al.50 found that reduced light intensity enhanced the overall accumulation of secondary metabolites in A. bunge. The total content of A. IV, C. glucoside, and C. increased under low-light conditions, whereas high light favored the accumulation of O. Wang et al.60 found that O. content increases with annual average temperature. This is consistent with the conclusion of Yang et al.61 that the average July temperature has a positive effect on the content of C. glucoside and O.

The regression coefficients indicate several links. Precipitation, wind speed, solar radiation, and soil type are important for A. IV content. Solar radiation, altitude, and soil content significantly affect the accumulation of C. glucoside. Solar radiation, precipitation, soil texture, and nutrient retention capacity constrain O. content. Solar radiation, precipitation, and temperature are significantly related to C. content. Temperature, solar radiation, and soil nutrient retention capacity have a larger impact on the accumulation of F. The environmental factors affecting each secondary metabolite thus differ, and the same variable can act differently on different metabolites. For example, Bio2 (mean diurnal range) has a negative coefficient in the C. model (formula 11) but a positive one in the F. model (formula 12). Likewise, T_CLAY has a positive coefficient for A. IV (formula 8) but a negative one for O. (formula 10). A. bunge overcomes and adapts to environmental stress partly by producing these five SM components, which are important indicators of its quality62. Temperature, precipitation, wind speed, and soil availability all strongly influence physiological activity and SM accumulation in plants63. Further analysis of the independent variables in the five regression models shows that solar radiation influences the accumulation of flavonoid secondary metabolites in A. bunge. Solar radiation directly regulates photosynthesis and respiration and thereby affects plant production and reproduction64. As a perennial deep-rooted plant, A. bunge depends directly on soil conditions such as texture, nutrient retention capacity, and rooting conditions for its survival and growth, which explains why soil variables are significant in the models.
Moreover, temperature is another important factor in SM accumulation in A. bunge. It can trigger molecular mechanisms that cope with heat stress, enhancing the plant's heat tolerance and growth vigor and promoting the accumulation of the five SMs.
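Which variables enter several models with opposite signs can be read directly off formulas (8) to (12). The sketch below lists the coefficients of every variable that appears in more than one model and flags those whose sign differs between models. Only signs are compared, because the coefficients are unstandardized and their magnitudes depend on each variable's units. The names `SHARED` and `mixed_sign_variables` are illustrative.

```python
from typing import Dict, List

# Coefficients, transcribed from formulas (8)-(12), of variables that
# appear in more than one of the five models.
SHARED: Dict[str, Dict[str, float]] = {
    "Wind": {"A. IV": 0.039, "C. glucoside": 0.014},
    "T_CLAY": {"A. IV": 0.004, "O.": -0.001},
    "Srad": {"C. glucoside": -0.00002970, "O.": -0.00001648,
             "C.": -0.00001346, "F.": -0.00001091},
    "Bio2": {"C.": -0.012, "F.": 0.004},
    "Sq2": {"O.": -0.014, "F.": -0.008},
}


def mixed_sign_variables(shared: Dict[str, Dict[str, float]]) -> List[str]:
    """Variables whose coefficient is positive in some models and negative in others."""
    mixed = []
    for var, coefs in shared.items():
        signs = {c > 0 for c in coefs.values() if c != 0}
        if len(signs) == 2:  # both True (positive) and False (negative) present
            mixed.append(var)
    return sorted(mixed)


if __name__ == "__main__":
    print(mixed_sign_variables(SHARED))
```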

In summary, the contents of the five SMs of A. bunge differ markedly across regions, indicating that they are constrained by regional conditions. The environmental variables that most affect the contents of the five SM components are mainly climatic: solar radiation, seasonal variation in precipitation, wind speed, and the annual average dryness index. Altitude and soil rooting conditions also play a role.

Conservation and cultivation strategies for A. bunge

Global climate change and increasing human activities are causing ongoing habitat change and loss for Chinese herbal materials such as A. bunge. Their distributions are shifting from widespread and contiguous to fragmented and scattered, which severely limits the sustainable development of the Chinese herbal medicine industry. Our results show that Inner Mongolia, Shanxi, Gansu, Ningxia, and other regions are important suitable habitats for A. bunge. The climate, soil, and other conditions in these areas suit its growth and development. However, human activities are gradually changing local microclimates and land types, which may further affect the growth and SM accumulation of A. bunge in the future. These regions should therefore be treated as top priorities for restoration. Medicinal species are crucial to the health of many people worldwide and create immense economic value and social benefits. A. bunge, as the main source of the Chinese herb Astragalus, has extremely high medicinal and conservation value, and market demand for it is rising steadily. Identifying ecologically suitable areas that produce high-quality SMs is therefore of paramount importance for the cultivation of A. bunge. This study applies CGQM, based on ESI and GQM, to delineate production areas for A. bunge and proposes rational protection and management strategies for its cultivation.

Furthermore, the zoning results for the five SMs show clear differences in their spatial distribution, and almost no region has high levels of all five. Overall, A. IV content increases from south to north, so planting areas could be established in Inner Mongolia if a higher A. IV content is required. In most suitable planting areas of Shanxi, Inner Mongolia, and Hebei, A. IV content meets the requirements of the Chinese Pharmacopoeia. A. bunge produced there can therefore supply manufacturers without high demands for A. IV content. C. glucoside and F. contents are both high at the northern and southern ends and lower in the middle. If higher contents of these two SMs are required, planting areas should be selected in southern Gansu and eastern Inner Mongolia. O. content gradually increases from north to south, making southern Gansu, central and southern Shaanxi, and the northwestern corner of Henan relatively high-quality planting areas. Within Inner Mongolia, however, high-content planting areas should be chosen in the east. C. content gradually decreases from north to south, so manufacturers with high demands for C. content can establish planting areas in northeastern Inner Mongolia to maximize the benefit to the A. bunge industry.

In actual cultivation, the regional characteristics of A. bunge should be fully considered. Local labor conditions, cultivation techniques, survival potential, and other factors should also be taken into account, so that planting and protection are efficient and suited to local conditions. Furthermore, priority planting areas can be chosen according to the different SM demands of production companies to maximize the value of A. bunge.

Conclusion

This article uses the integrated model of the Biomod2 platform to simulate the suitable habitats of A. bunge in China and discusses how environmental variables affect the species. In the model evaluation, the GBM, RF, FDA, and GLM models performed best among the individual models. The integrated model, however, further improved prediction accuracy and simulated the potential suitable habitats of the species more accurately. Our analysis of the key environmental variables showed that Bio12 is the critical variable constraining the distribution of A. bunge.

This paper also examines the relationship between the five SMs and environmental variables and zones A. bunge production areas based on CGQM. According to the regression results, the environmental variables influencing the accumulation of each SM differ. Climatic variables such as solar radiation, precipitation, and wind speed have a significant impact, as do altitude and rooting conditions. The production-area zoning map shows that the spatial distributions of the five SMs differ and that almost no region has high contents of all five metabolites. Therefore, different planting areas can be selected for different purposes in the artificial cultivation of A. bunge. Moreover, SM accumulation in medicinal plants also depends on planting methods, years of cultivation, cultivation techniques, and other variables, which should be considered from multiple perspectives in actual cultivation.