Introduction

Traditional Chinese medicine (TCM) resources are both a national strategic resource and an essential source of raw materials for the growth of the Chinese medicine industry1. Medical, health, and livelihood programs are connected to the steady and sustainable development of Chinese medicinal ingredients2. The total area planted with Chinese medicinal plants (including wild plants) in the country is estimated at more than five million hm². Nearly 300 of the more than 600 regularly used Chinese medicinal plants are cultivated3. Since 2015, the cultivation of Mongolian Chinese medicinal plants has grown rapidly in Naiman Banner, Inner Mongolia. The total planting area of Chinese medicine in Mongolia has increased to 100 km². This supplies medicinal ingredients to 15 townships and involves more than 40 types of plants. There are five demonstration areas with more than 1000 acres of growing space. An ecosystem for cultivating Chinese medicinal materials has begun to take shape across the whole region4.

The formation and characteristics of TCM resources are closely related to spatial location. Species distribution models (SDMs) use specific algorithms to correlate species occurrence with environmental variables and thereby predict the potential distribution of species across geographic space and time. The resulting probabilistic response reflects habitat suitability and provides a basis for further research on the regional distribution of target species. Among the SDMs widely used in recent years, the Maximum Entropy (MaxEnt) model has been reported to perform better than other SDMs and has been successfully used to predict the suitable habitat of plants and animals5,6. Remote sensing (RS) technology is an important data source for the modernization of TCM resources and for the technical system of traditional Chinese medicinal materials. It can provide a large amount of temporal and spatial information on TCM resources and inform the production and management of traditional Chinese medicinal materials, and it is gradually being applied in the field of TCM7. Obtaining wide-ranging data efficiently allows quantitative expression of TCM resource reserves, habitat status and changes, resource adjustment, and planting structure. Such comprehensive information supports ecological environment monitoring, data management, data analysis, and the generation and visualization of results, and could compensate for the shortcomings of traditional investigation methods8,9.

Glycyrrhiza uralensis Fisch. is a leguminous perennial herb. It is used to clear heat and detoxify, tonify the spleen, relieve coughs, moisten the lungs, and harmonize other medications10. Most of its distribution lies in the provinces and regions of northeast, north, and northwest China. This study used Naiman Banner in Inner Mongolia as the study area. First, MaxEnt was used to predict the suitable growth areas for G. uralensis and to identify the ecological factors affecting its production. Second, RS imagery was combined with field sampling points located by the Global Positioning System (GPS) to select training samples, and feature variables were extracted to obtain the optimal feature combination. Finally, the distribution of G. uralensis in Naiman Banner (both artificially cultivated and wild) was extracted from the RS images using the random forest (RF) classification algorithm. This study provides basic data for planning the production layout of the G. uralensis industry at the county level, steering the direction of the cultivation industry, and establishing a high-quality G. uralensis cultivation technology system, as well as theoretical guidance for the development of the G. uralensis industry and the rural revitalization plan.

Materials

Overview of the study area

Naiman Banner lies on the southern boundary of the Horqin Sandy Land, southwest of Tongliao City in Inner Mongolia (120° 19′ 40″–121° 35′ 40″ E, 42° 14′ 40″–43° 32′ 20″ N). It is 68 km wide and 140 km long from north to south, with a total area of 8137.6 km². The southern part of Naiman Banner is the edge of the western Liaoning Mountains, dominated by shallow hills, with an elevation of 400–600 m. This area has experienced structural water erosion and is crossed by vertical and horizontal valleys. The middle section is dominated by wind-eroded sand deposits that form a desert region, with two sand-marsh regions running east and west. The central and northern areas are alluvial plains with flat and open terrain. The whole area slopes from southwest to northeast, with an average elevation of 450 m. Winter and spring are dry, summer and fall are humid, rain and heat coincide, and the average annual temperature is 6.0–6.5 °C. The area therefore has a continental monsoon desert climate11,12.

Ground data acquisition

To facilitate image analysis for crop-type mapping, training and validation samples for crop types were developed using ground reference data. In this study, ground measurement data from a monitoring test site were used for crop evaluation. A field survey was conducted in Naiman Banner using GPS: the geographic coordinates of different crop types were recorded, and photographs were taken to document the local landscapes. Sample points for classification were chosen at random in each plot, provided that the distance between any two points was at least 20 m. Sample points from the same sample region were used either for training or for testing, never both; therefore, no spatial relationship existed between the samples used for training and those used for verification13.
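The sampling rule described above (random points within each plot, at least 20 m apart, with each plot used for either training or testing) can be sketched as follows. This is a minimal illustration, not the survey team's procedure: the function names, the use of projected metre coordinates, the greedy selection strategy, and the 70/30 split fraction are all assumptions introduced here.

```python
import math
import random

def pick_spaced_points(candidates, n_points, min_dist_m=20.0, seed=0):
    """Randomly pick up to n_points (x, y) locations, in projected metres,
    such that no two picked points are closer than min_dist_m."""
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)
    picked = []
    for x, y in pool:
        # Accept a candidate only if it is far enough from every accepted point.
        if all(math.hypot(x - px, y - py) >= min_dist_m for px, py in picked):
            picked.append((x, y))
            if len(picked) == n_points:
                break
    return picked

def split_plots(plot_ids, train_fraction=0.7, seed=0):
    """Assign whole plots to training or testing so that no plot contributes
    points to both sets. train_fraction is an assumed value."""
    rng = random.Random(seed)
    plots = sorted(set(plot_ids))
    rng.shuffle(plots)
    n_train = round(len(plots) * train_fraction)
    return set(plots[:n_train]), set(plots[n_train:])

# Hypothetical candidate locations on a 5 m grid inside one 100 m x 100 m plot.
grid = [(x, y) for x in range(0, 100, 5) for y in range(0, 100, 5)]
points = pick_spaced_points(grid, n_points=8)

# Hypothetical plot identifiers.
train_plots, test_plots = split_plots([f"plot_{i}" for i in range(10)])
```

Splitting by plot rather than by point is what keeps training and verification samples spatially separate.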

For a preliminary investigation in September 2019, the expedition team planned a route that covered 13 towns in Naiman Banner. During the 5-day survey, 93 G. uralensis sample points were collected (Fig. 1). Only the latitude and longitude of wild and cultivated G. uralensis were recorded, and no plant samples were collected, in compliance with the relevant management regulations.

Fig. 1 Distribution of G. uralensis sample points. Maps created in ArcGIS 10.8 (https://www.esri.com/zh-cn/arcgis/products/arcgis-desktop/resources).

Obtaining climate data

All 19 climate variables (at a resolution of 30 arc-seconds) and the elevation data used in this investigation were obtained from the WorldClim global climate data website (http://www.worldclim.org/) (Table 1). To avoid overfitting of the model prediction caused by multicollinearity among environmental factors, the variables were screened by correlation analysis and contribution value. First, MaxEnt (version 3.4.1) was used to compute the contribution of the 19 climate variables and elevation. ArcGIS 10.8 was then used to extract the values of these variables at the G. uralensis locations, and SPSS was used for Pearson correlation analysis. Variables with a correlation coefficient < 0.70 were retained preferentially; when the correlation coefficient was greater than 0.70, the variable with the higher contribution value was selected for model construction. On this basis, five environmental variables were selected to predict the suitable growth range of G. uralensis.
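One way to read the screening rule above is as a greedy pass over the variables in order of decreasing contribution, keeping a variable only if it is weakly correlated with everything already kept. The sketch below illustrates that reading with toy numbers. The variable names, values, contributions, the greedy ordering, and the treatment of a coefficient of exactly 0.70 as "correlated" are assumptions made here; the study itself used SPSS and MaxEnt for these steps.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def screen_variables(values, contribution, threshold=0.70):
    """Keep variables in order of decreasing contribution, skipping any whose
    absolute correlation with an already kept variable reaches the threshold.

    values:       dict mapping variable name -> values at the presence points
    contribution: dict mapping variable name -> model contribution (percent)
    """
    kept = []
    for name in sorted(values, key=lambda k: contribution[k], reverse=True):
        if all(abs(pearson(values[name], values[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Toy data with hypothetical variable names (not the study's measurements).
values = {
    "var_a": [1, 2, 3, 4, 5],
    "var_b": [2, 4, 6, 8, 10.5],   # almost a linear copy of var_a
    "var_c": [5, 1, 4, 2, 3],
}
contribution = {"var_a": 50.0, "var_b": 30.0, "var_c": 20.0}
selected = screen_variables(values, contribution)
```

In this toy case `var_b` is dropped because it is nearly collinear with the higher-contribution `var_a`, while `var_c` is kept.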

Table 1 The 19 climate data and elevation data.

Satellite data acquisition and preprocessing

Gaofen-6 (GF-6) is a low-orbit optical RS satellite used for precise agricultural observations. It carries a 2 m panchromatic/8 m multi-spectral high-resolution camera and a 16 m multi-spectral medium-resolution wide-area camera, which together offer a large coverage area and high image quality14,15. This study obtained cloudless, preprocessed GF-6 images of the entire Naiman Banner from May to July 2019 from the China Centre for Resources Satellite Data and Application (http://www.cresda.com/CN/). Preprocessing included orthorectification, fusion, and registration, giving a resolution of 2 m.

MaxEnt model

To estimate the geographic distribution of G. uralensis, MaxEnt software (3.4.1) was used, and 10-fold cross-validation was employed for model validation. The maximum number of iterations was set to 105 and the convergence threshold to 0.0005, the calculation was repeated 10 times, and default values were used for the remaining settings when extracting and analyzing habitat suitability. The jackknife method was used to calculate the weight of each variable.
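MaxEnt performs the cross-validation internally, so nothing below is needed to reproduce the run. The sketch only illustrates what a 10-fold partition of the 93 presence points looks like: each point is held out exactly once and used for training in the other nine replicates. The function name and the shuffling seed are assumptions introduced for illustration.

```python
import random

def k_fold_indices(n_points, k=10, seed=0):
    """Return k (train_indices, test_indices) pairs in which every point
    appears in exactly one test fold."""
    rng = random.Random(seed)
    order = list(range(n_points))
    rng.shuffle(order)
    # Deal the shuffled indices into k folds, like dealing cards.
    folds = [order[i::k] for i in range(k)]
    all_indices = set(order)
    return [(sorted(all_indices - set(fold)), sorted(fold)) for fold in folds]

# 93 presence points, as collected in the field survey.
splits = k_fold_indices(93, k=10)
fold_sizes = [len(test) for _, test in splits]
```

With 93 points and 10 folds, each test fold holds 9 or 10 points.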

Principal component analysis (PCA) of RS images

PCA transforms a group of correlated variables into a group of uncorrelated variables through a linear transformation. The first few principal components retain most of the information in the original variables, which achieves dimensionality reduction and makes the new variables uncorrelated with each other. Owing to the relatively high correlation between spectral bands, information was duplicated across bands, which not only caused redundancy but also greatly increased the amount of computation. Therefore, a PCA of the multiband surface reflectance data for July was conducted.
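As a small illustration of the idea, the first principal component (PC1) of a set of band vectors can be obtained from the band covariance matrix. The sketch below uses power iteration in plain Python on a toy two-band example; it is not the image-processing routine used in the study, and the function name, the toy pixel values, and the iteration count are assumptions.

```python
import math

def first_principal_component(pixels, n_iter=200):
    """Return (unit direction of PC1, PC1 score of each pixel).

    pixels: list of per-pixel band vectors, e.g. [(b1, b2, b3, b4), ...].
    Power iteration assumes the start vector is not orthogonal to PC1,
    which holds for positively correlated bands.
    """
    n, n_bands = len(pixels), len(pixels[0])
    means = [sum(p[b] for p in pixels) / n for b in range(n_bands)]
    centered = [[p[b] - means[b] for b in range(n_bands)] for p in pixels]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1)
            for j in range(n_bands)] for i in range(n_bands)]
    vec = [1.0 + 0.1 * b for b in range(n_bands)]  # arbitrary start vector
    for _ in range(n_iter):
        nxt = [sum(cov[i][j] * vec[j] for j in range(n_bands))
               for i in range(n_bands)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    scores = [sum(r[b] * vec[b] for b in range(n_bands)) for r in centered]
    return vec, scores

# Toy two-band pixels that are strongly correlated, as spectral bands often are.
pixels = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
direction, scores = first_principal_component(pixels)

# Share of the total variance captured by PC1.
pc1_var = sum(s * s for s in scores) / (len(scores) - 1)
total_var = 2.5 + sum((p[1] - 3.04) ** 2 for p in pixels) / 4
explained = pc1_var / total_var
```

For strongly correlated bands, PC1 carries nearly all of the variance, which is why texture features can reasonably be computed on PC1 alone.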

RS image feature extraction and feature data set construction

The GF-6 images from June and July 2019 were used as the data source. First, the collected points for the different feature types were superimposed on the RS image to establish the spectral characteristics of typical features and to analyze the relationship between the features and the bands. Second, a feature set was constructed. The feature set comprised three parts: (1) the eight original bands of the multi-spectral data in June and July, (2) vegetation index (VI) features in June and July, and (3) texture features of the first principal component (PC1) of the July PCA, extracted with the grey-level co-occurrence matrix (GLCM) method16. The eight texture features were the mean, variance, homogeneity, contrast, dissimilarity, entropy, second moment, and correlation. In total, eighteen features were obtained to construct the initial feature data set; details are given in Table 2.
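To make the texture step concrete, the sketch below builds a symmetric, normalised GLCM for one pixel offset and derives the eight texture measures named above, using the common Haralick-style definitions. It is a plain-Python illustration, not the routine of the software used in the study: the function names, the single horizontal offset, the natural-log entropy, and the tiny 4-level test image are assumptions, and a real PC1 band would first be quantised to integer grey levels.

```python
import math

def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalised grey-level co-occurrence matrix for one offset.
    image: 2-D list of integer grey levels in range(levels)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1  # count the pair in both directions
                counts[j][i] += 1  # so that the matrix is symmetric
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def texture_features(p):
    """Eight texture measures from a symmetric, normalised GLCM p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mean = sum(i * v for i, _, v in cells)
    variance = sum((i - mean) ** 2 * v for i, _, v in cells)
    covariance = sum((i - mean) * (j - mean) * v for i, j, v in cells)
    return {
        "mean": mean,
        "variance": variance,
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "dissimilarity": sum(v * abs(i - j) for i, j, v in cells),
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
        "second_moment": sum(v * v for _, _, v in cells),
        "correlation": covariance / variance if variance > 0 else 0.0,
    }

# Small 4-level test image (hypothetical values).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(image, levels=4)
features = texture_features(p)
```

In practice the GLCM is computed in a moving window around each pixel, so that every texture measure becomes an image band of its own.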

Table 2 Initial feature data set.

RF classification of RS images

The RF algorithm is an ensemble algorithm based on the classification and regression tree (CART), proposed by Breiman17. The method is computationally efficient, robust to outliers and noise, resistant to overfitting, and suitable for various statistical analyses. It has therefore been applied in many RS image processing studies, such as RS image classification and regression analysis of agricultural and forestry models18,19.

Two parameters must be set for this algorithm: the number of variables in the random subset drawn at each node (Mtry) and the number of trees (Ntree)20. In this study, ENVI software was used to classify the entire RS image. After several experiments, and considering both classification accuracy and computational efficiency, Mtry was set to the square root of the number of features and Ntree was set to 100.
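The two parameters correspond to the two sources of randomness in an RF: each tree is trained on a bootstrap sample of the training data, and each node considers only Mtry randomly chosen features. The sketch below shows just those two draws, not a full classifier. The function names are hypothetical, the sample count is a placeholder, and rounding the square root down is an assumption, since the text does not say how a non-integer square root is handled.

```python
import math
import random

N_TREE = 100      # number of trees, as stated in the text
N_FEATURES = 18   # size of the initial feature set

def mtry_from_features(n_features):
    """Square root of the number of features, rounded down (assumed rounding)."""
    return max(1, math.isqrt(n_features))

def draw_tree_inputs(n_samples, n_features, rng):
    """Random ingredients for one tree: a bootstrap sample of row indices
    (drawn with replacement) and the feature subset offered to one node."""
    bootstrap = [rng.randrange(n_samples) for _ in range(n_samples)]
    node_features = rng.sample(range(n_features), mtry_from_features(n_features))
    return bootstrap, node_features

rng = random.Random(0)
mtry = mtry_from_features(N_FEATURES)
# 500 is a placeholder training-set size, not a figure from the study.
forest_inputs = [draw_tree_inputs(500, N_FEATURES, rng) for _ in range(N_TREE)]
```

With 18 features this rule gives four candidate features per node.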

Accuracy verification

The suitability prediction was evaluated with the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, a threshold-independent index that is commonly used to evaluate model predictions and is currently recognized as a diagnostic test evaluation index21.
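The AUC has a simple interpretation: it is the probability that a randomly chosen presence point receives a higher predicted suitability than a randomly chosen background point, with ties counted as one half. The sketch below computes it directly from that definition. MaxEnt reports the AUC itself, so this is illustrative only, and the scores are made-up values.

```python
def auc(presence_scores, background_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5  # ties count as half a win
    return wins / (len(presence_scores) * len(background_scores))

# Hypothetical predicted suitability values.
auc_value = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])
```

Because only the ranking of the scores matters, no suitability threshold has to be chosen, which is what makes the index threshold-independent.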

The accuracy of the RS classification was verified with the 2019 field survey sample data. A confusion matrix was generated, and overall accuracy (OA), the Kappa coefficient (Kappa), producer's accuracy (PA, also called mapping accuracy), and user's accuracy (UA) were used for quantitative evaluation.
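All four measures follow from the confusion matrix. The sketch below uses the standard definitions, assuming that rows hold the reference (ground) classes and columns the classified classes; that orientation, the function name, and the two-class toy matrix are assumptions for illustration.

```python
def accuracy_metrics(matrix):
    """OA, Kappa, and per-class PA and UA from a square confusion matrix.
    matrix[i][j] = number of reference-class-i samples classified as class j."""
    k = len(matrix)
    total = sum(map(sum, matrix))
    diagonal = [matrix[i][i] for i in range(k)]
    row_totals = [sum(matrix[i]) for i in range(k)]                       # reference
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # classified
    oa = sum(diagonal) / total
    # Agreement expected by chance, from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2
    kappa = (oa - expected) / (1 - expected)
    pa = [diagonal[i] / row_totals[i] for i in range(k)]  # producer's accuracy
    ua = [diagonal[i] / col_totals[i] for i in range(k)]  # user's accuracy
    return {"OA": oa, "Kappa": kappa, "PA": pa, "UA": ua}

# Toy two-class matrix (hypothetical counts, not Table 5).
metrics = accuracy_metrics([[40, 10],
                            [5, 45]])
```

PA measures how much of a reference class was found (omission error), whereas UA measures how much of a mapped class is correct (commission error), so a class can score high on one and lower on the other.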

Results

Evaluation of results based on MaxEnt model

The model performed well under the current climate scenario, with an AUC value of 0.857 (Fig. 2). The predicted ecologically suitable regions for G. uralensis were therefore considered reliable and accurate.

Fig. 2 AUC value of G. uralensis modeled by MaxEnt based on distribution data.

Main environmental factors affecting the distribution of G. uralensis

The contribution of each environmental variable to the model and to the distribution of G. uralensis was determined using the jackknife method. Annual precipitation (50.5%), the maximum temperature of the warmest month (32.4%), and elevation (15.6%) were the three leading environmental variables affecting the distribution of this species, with a cumulative contribution rate of 98.5% (Table 3).

Table 3 Contribution of environmental factors.

Evaluation of ecological suitability of G. uralensis in Naiman Banner

Based on the MaxEnt model, the climatically suitable areas for G. uralensis were calculated and divided into five levels according to distribution probability. In ArcGIS, overlay analysis was used to reclassify the data, and a layout map of suitable areas for G. uralensis planting was produced (Fig. 3). The probability values ranged between 0 and 1 and were grouped as follows: high suitability (> 0.7), good suitability (0.5–0.7), medium suitability (0.3–0.5), minimum suitability (0.1–0.3), and no suitability (< 0.1)22.
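The reclassification amounts to mapping each cell's probability to one of the five levels and summing cell areas per level. The sketch below shows that logic on a handful of values. The study performed this step in ArcGIS; the function names, the cell area, and the assignment of exact boundary values (0.1, 0.3, 0.5, 0.7) to a class are assumptions, since the text does not say which side of each boundary is inclusive.

```python
from collections import Counter

def suitability_class(probability):
    """Map a MaxEnt output probability in [0, 1] to one of five levels."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if probability > 0.7:
        return "high"
    if probability >= 0.5:
        return "good"
    if probability >= 0.3:
        return "medium"
    if probability >= 0.1:
        return "minimum"
    return "none"

def area_by_class(probabilities, cell_area_km2):
    """Total area per suitability level for equally sized raster cells."""
    areas = Counter()
    for probability in probabilities:
        areas[suitability_class(probability)] += cell_area_km2
    return dict(areas)

# Hypothetical cell values and cell size.
areas = area_by_class([0.80, 0.75, 0.60, 0.20, 0.05], cell_area_km2=1.0)
```

Summing the same per-cell classes within each township boundary would give per-town statistics of the kind reported later in the paper.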

Fig. 3 Climatic suitability areas of G. uralensis in Naiman Banner. Maps created in ArcGIS 10.8 (https://www.esri.com/zh-cn/arcgis/products/arcgis-desktop/resources).

Analysis of the average reflectance of the original band

To aid the analysis, Digital Number (DN) value curves of the original bands were drawn for typical features in June (A) and July (B) (Fig. 4). The trends of the typical features were broadly consistent. G. uralensis had a high value in band 4 in June and in bands 1, 2, and 3 in July. The DN values of typical features in bands 3 and 4 differed considerably in June and July, which made it possible to distinguish forestland from Zea mays L. In both months, the DN values of sandy land tended to be the highest in all bands, which could be used as a distinguishing feature. Water bodies had low values in all bands, most clearly in band 4 in June and July. In July, the DN values of buildings in bands 1, 2, and 3 were higher, which can be used as a distinguishing feature of buildings.

Fig. 4 DN value curve of the original bands of typical features in June (A) and July (B).

Classification results of different feature combinations

RF models built on seven combinations of the three feature types extracted from the June and July 2019 images were used for G. uralensis recognition: (A) the four original bands in June, (B) the four original bands in July, (C) the original bands in both June and July, (D) the original bands in June and July + VIs, (E) the original bands in June and July + texture features, (F) the VIs in June and July + texture features, and (G) the original bands + VIs + texture features in June and July. The results showed that G. uralensis in Naiman Banner has large-scale spatial distribution characteristics (Table 4).

Table 4 Identification results of G. uralensis by random forest method (km²).

The distribution of G. uralensis based on the RF classification method is shown in Table 4. As the number of features increased, the distribution area became smaller and more concentrated. This may be because, as more features are added, the characteristics of each land-cover type become more distinct. In the most accurate combination (G), G. uralensis was mainly distributed in Mingrensumu, Dongming, Guribanhua, Daqintala, and other areas. Smaller areas were also found in other towns, such as Liuhaonongchang, Baiyintala, Huanghuatala, and Qinglongshan. These areas were similar to the suitability distribution results.

Comparison of classification accuracy of different feature combinations

The recognition accuracies for the different feature combinations are listed in Table 5. For every combination of the three feature types (spectral, VI, and texture features), the overall classification accuracy of the ground objects exceeded 75% and the Kappa coefficient exceeded 0.7. The "spectrum + VI + texture" feature combination gave the highest classification accuracy, with an OA of 96.6487% and a Kappa coefficient of 0.9599. Classification using only the spectral characteristics of a single month was least accurate for June, with an OA of 75.6388% and a Kappa coefficient of 0.7161, whereas the values for July were higher at 91.4131% and 0.8972, respectively. Compared with the use of spectral features alone, adding VI features effectively improved the classification accuracy: the OA and Kappa coefficient increased by 6.4903% and 0.075, respectively. Adding texture features to the spectral features also effectively improved the classification accuracy: the OA and Kappa coefficient increased by 3.5733% and 0.0411, respectively. Adding both VI and texture features improved the classification accuracy of ground objects further, raising the OA and Kappa coefficient by 9.7474% and 0.1136, respectively.

Table 5 Recognition accuracies with RF based on different feature combinations.

In terms of the classification accuracy of a single ground-object class (Table 5), when only the spectral characteristics of a single month were used, the accuracy for G. uralensis in June and July was < 50%, indicating that spectral information from a single month could not effectively identify G. uralensis. Combining June and July effectively improved the classification accuracy of G. uralensis. Compared with using only spectral features, adding VI features also improved the classification accuracy of G. uralensis: its producer's accuracy and user's accuracy increased by 9.36% and 0.19%, respectively. Adding texture features to the spectral features also effectively improved the classification accuracy: the G. uralensis producer's accuracy and user's accuracy increased by 5.28% and 2.41%, respectively. The classification accuracy of the "VI + texture feature" combination in June and July was higher than that of the spectral features, indicating that VI and texture features play an important role in the identification of G. uralensis and the classification of ground objects. The "spectrum + VI + texture" feature combination gave the best results, with a G. uralensis producer's accuracy of 97.06% and a user's accuracy of 84.86%, which demonstrates the effectiveness of this feature combination for feature recognition in Naiman Banner.

Analysis and comparison of RS and ArcGIS processing results

The suitability regionalization results and the RS identification results for G. uralensis were superimposed (Fig. 5) to analyze whether the planning of G. uralensis planting is reasonable.

Fig. 5 The suitability regionalization results for G. uralensis superimposed with the results of RS identification. Maps created in ArcGIS 10.8 (https://www.esri.com/zh-cn/arcgis/products/arcgis-desktop/resources).

As shown in Fig. 5, the potentially suitable areas for G. uralensis predicted by the MaxEnt model were mainly concentrated in the towns of Mingrensumu, Dongming, Yilongyong, Baxiantong, and Daqintala. Other areas, including Liuhaonongchang, Huanghuatala, and Sharihaolai, also had a small number of suitable areas. The areas of low suitability and unsuitability were mainly distributed in Guribanhua, Qinglongshan, and Xinzhen. The RS distribution and regionalization results showed that G. uralensis was mainly distributed in the towns of Dongming, Mingrensumu, Daqintala, and Guribanhua.

To quantify the relationship between the suitability regionalization of G. uralensis and its actual distribution, the regionalization map was converted into vector data, and the suitable area and the calculated actual distribution area of each township in Naiman Banner were tallied (Tables 4 and 6).

The results in Table 6 show that the highly suitable area for planting G. uralensis was 609.90 km², the good-suitability area was 947.57 km², and the medium-suitability area was 1169.46 km². The total area of G. uralensis monitored by RS (Table 4) was 219.59 km². The actual distribution of G. uralensis was thus much smaller than the suitable area, so the planting area could be appropriately expanded. Based on the statistical analysis of the administrative regions, expansion of G. uralensis planting could be prioritized in high-suitability areas, including Dongming, Yilongyong, and Mingrensumu. Planting could also be appropriately expanded in other suitable areas, including Baxiantong and Daqintala.
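The gap between mapped and suitable area can be made explicit with a short calculation from the figures quoted above. The snippet assumes that the first figure (609.90 km²) refers to the high-suitability level, and the choice of ratios is ours rather than the study's. On these numbers, the RS-mapped area is roughly 8% of the combined high, good, and medium suitability area, and about 36% of the first level alone.

```python
# Areas quoted in the text, in km^2. The "high" label for the first figure
# is an assumed reading of the source wording.
suitable_km2 = {"high": 609.90, "good": 947.57, "medium": 1169.46}
rs_mapped_km2 = 219.59  # total G. uralensis area identified by RS (Table 4)

combined_suitable = sum(suitable_km2.values())
share_of_combined = rs_mapped_km2 / combined_suitable
share_of_high = rs_mapped_km2 / suitable_km2["high"]

print(f"combined suitable area: {combined_suitable:.2f} km^2")
print(f"RS-mapped share of combined area: {share_of_combined:.1%}")
print(f"RS-mapped share of the first level: {share_of_high:.1%}")
```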

Table 6 Statistics of suitable planting area of G. uralensis in the towns of Naiman Banner.

Discussion

Recent research on G. uralensis has mainly focused on its chemical composition and pharmacological effects, with little research on its suitability and cultivation. Zhao et al. (2022) elucidated the potential of isoprenylated phenolic compounds isolated from G. uralensis extracts as lead compounds for osteoarthritis23. Duan et al. (2022) found that G. uralensis and its active components alleviated acute neurotoxicity induced by Semen Strychni partly through the HMGB1-related pathway24. To address this gap, this study used the MaxEnt model and RS techniques to study the cultivated and potential distribution areas of G. uralensis in Naiman Banner. The suitable area for G. uralensis cultivation was much larger than the existing cultivation area. In the future, therefore, ecological planting should be carried out in potentially suitable planting areas, and nature reserves and G. uralensis cultivation technology systems should be established to promote the sustainable development and ecological construction of the G. uralensis industry.

G. uralensis not only has unique medicinal value but is also important for ecological construction. Relevant studies have shown that G. uralensis has a well-developed root system, which breaks the wind, fixes sand, and maintains ecological balance, and that it also enhances soil fertility, increases land cover, and improves the self-healing ability of the ecosystem25,26. In addition, G. uralensis is a perennial plant that promotes biodiversity by providing habitats for flora, fauna, and microbial communities27. However, excessive planting of G. uralensis may lead to competition among species and increase the risk of pests and diseases, thus damaging the local ecological environment28. Therefore, when G. uralensis is planted in suitable areas, rational planning and pest control should be carried out so that the benefits to local biodiversity and the environment are enhanced and the negative impacts are minimized.

In 1957, Jaynes et al. proposed the MaxEnt theory29. Based on this theory, the American scholars Phillips, Anderson, and Schapire developed the MaxEnt software in Java; it is easy to operate, fast, and requires only presence points, making it suitable for our needs30. In this study, the prediction results of the MaxEnt model were similar to the distribution extracted by RS, reflecting the potential of using geographic information to predict the distribution of G. uralensis. Since the 1970s, RS has developed rapidly and has been widely used in agricultural production monitoring, where it has achieved excellent results in crop identification, growth monitoring, disaster assessment, and crop yield estimation. Our research on the distribution of G. uralensis plantations demonstrated that RS can be used to study the yield, distribution, and biological characteristics of medicinal plants. RS data offer large coverage, timeliness, low cost, and repeatable acquisition, which compensates for the shortcomings of traditional investigation methods. However, because of limitations in RS image resolution, acquisition method, quantity, and other factors, we could only conduct accurate research in a small area. In the future, we will gradually expand the scope of our research and apply RS technology to the cultivation and monitoring of G. uralensis and other medicinal plants.

This study had certain limitations. Only climatic variables and altitude data were used in the MaxEnt model to predict the suitability distribution. Other related conditions, such as soil factors and vegetation types, can also affect the suitability of G. uralensis. In the future, we will incorporate soil, vegetation, and other related factors to improve the accuracy of the prediction results. In surveys of medicinal plant resources, the research and application of RS technology to Chinese medicinal resources has been relatively limited31. RS is mainly used for monitoring cultivated medicinal plants; its use for monitoring wild medicinal plants is limited, and further experimental verification is required. Admittedly, because of the strong linear correlation among individual feature groupings from the PCA, the input data used to train the RF were highly correlated, which led to slight overfitting, and there could be a slight error between the classification results from RS images and the actual situation, which is a common problem. More detailed field investigations and model designs are required in the future to reduce the error between calculations and reality. Furthermore, sample plots were randomly selected for onsite measurements and compared with the interpretation results to evaluate their accuracy.

Conclusion

In this study, we used MaxEnt to predict the suitable planting area of G. uralensis in Naiman Banner and used GF-6 images to monitor the planting situation of G. uralensis. The findings suggest that time-series analysis helps in understanding the phenological cycle and growth pattern of vegetation, which shows different traits at different times, and thereby aids identification. The RF classification algorithm based on multiple classification features achieved high accuracy in the RS recognition of G. uralensis. A comparative analysis of the MaxEnt and RS results showed that the G. uralensis planting area was much smaller than the suitable area, so the planting area could be enlarged. According to the statistical analysis of administrative regions, the extension of G. uralensis planting could be prioritized in high-suitability areas, including Dongming, Yilongyong, and Mingrensumu. This study provides a new framework in which RS and regionalization results are combined to verify each other. The predicted distribution was similar to the distribution of G. uralensis extracted from RS, which demonstrates the potential of using geographic information to predict the distribution of G. uralensis and of extracting that distribution from GF-6 images. The prospects for G. uralensis planting and development in Naiman Banner are considerable, especially for poor villages and towns, where natural resources can be used to develop the G. uralensis economy. This is conducive to improving the ecological environment and advancing rural revitalization plans.