Introduction

Species distribution models (SDMs), which are commonly used in theoretical and applied research in ecology and biogeography1,2, have advanced considerably in recent years as computing power and the availability of environmental and species distribution data have increased3. Their most common applications are predicting suitable habitat for species4 and conservation planning5, and both depend on the model having good predictive performance.

SDMs are widely used for marine species, with fish being the most frequently studied group6,7,8. Commercial fishing generates catch data6, which has enabled thorough assessment and management of commercial species9,10,11,12. In contrast, reliable data for rare or protected species are difficult to obtain in complex marine environments13,14,15, so SDMs are applied to these species less often, despite the critical importance of conserving and managing marine species16,17. Climate change and human pressures18,19 are accelerating shifts in the geographical distribution of species20. Identifying the spatial distribution of fish is crucial for population assessment and management21,22,23, and predicting habitat distribution is a particularly important part of this task24.

Coilia nasus, a coastal migratory fish, inhabits the coastal waters of the northwest Pacific off China, Japan, and Korea25. In China, it occurs mainly in the Bohai Sea, Yellow Sea, and East China Sea and in the middle and lower reaches of the rivers that flow into these seas25,26,27. Based on reproductive habits and morphological characteristics, C. nasus can be classified into three ecological types: anadromous, freshwater resident, and freshwater landlocked28,29. The anadromous type spawns in rivers and grows in the sea, whereas the freshwater resident and landlocked types spend their entire lives in freshwater30,31. Water pollution, intensive fishing, destruction of juvenile fish resources, and migration barriers created by hydroelectric projects have led to habitat degradation, significant population decline, and loss of ecological function for C. nasus32. C. nasus is an economically important fish in China, and in 2018 it was listed as "Endangered (EN)" on the International Union for Conservation of Nature (IUCN) Red List25. Understanding the habitat preferences and spatiotemporal distribution of C. nasus is crucial for its conservation and management, as well as for ecosystem restoration. Current research on C. nasus focuses mainly on feeding habits33, habitat history34, and growth characteristics26; direct studies of its habitat are lacking.

SDMs predict potential habitat by quantifying the relationships between a species' spatial distribution and a set of environmental variables35. The maximum entropy model (MaxEnt)36 is one of the most commonly used methods for studying species distributions6,37,38, and it can describe the complex relationships between species habitats and the environment better than traditional modeling approaches39,40. As a machine learning method, MaxEnt does not require prior assumptions about the form of the relationship between predictor variables and species distributions, which gives it flexibility in describing complex relationships among predictors23,41. MaxEnt is also considered effective when data are limited36, because it can model species distributions from partial data6.

Therefore, this study employed habitat suitability modeling based on the MaxEnt model to assess the potential impact of seasonal environmental variables on the spatiotemporal distribution of C. nasus. The goal of this work was to analyze and map the seasonal habitat preferences and distribution of C. nasus in southern Zhejiang. The results provide a reference for establishing protected areas and for the sustainable use, conservation, and management of C. nasus populations.

Materials and methods

Fishery data

The MaxEnt model36 used in this study requires only presence records of C. nasus. Species data were obtained from fishery resource surveys conducted in different seasons from 2016 to 2020. The survey areas included estuaries, bays, and coastal waters, and the survey vessels used bottom trawls. A total of 248 stations were set up within 26°30′N–29°30′N and 120°E–123°E, and 273 C. nasus specimens were caught (Fig. 1). Each record was assigned to one of four seasons (Table 1) based on its survey date. The fishery records included the operation date, net deployment and retrieval positions, trawling duration, mesh size, and the biological characteristics of the catch.
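The season assignment can be sketched in Python as follows. This is a minimal illustration that assumes the calendar-quarter season boundaries defined later in the Methods (winter January–March, spring April–June, summer July–September, autumn October–December); the function names and the example dates are hypothetical and are not the study's data.

```python
from collections import Counter
from datetime import date

# Calendar-quarter seasons as defined in the Methods:
# winter = Jan-Mar, spring = Apr-Jun, summer = Jul-Sep, autumn = Oct-Dec.
SEASONS = ("winter", "spring", "summer", "autumn")


def season_of(survey_date: date) -> str:
    """Return the season of a survey date from the quarter its month falls in."""
    return SEASONS[(survey_date.month - 1) // 3]


def count_by_season(survey_dates) -> dict:
    """Count records per season, keeping the study's season order."""
    counts = Counter(season_of(d) for d in survey_dates)
    return {season: counts.get(season, 0) for season in SEASONS}


if __name__ == "__main__":
    # Hypothetical survey dates, for illustration only.
    dates = [date(2017, 2, 14), date(2018, 5, 3), date(2019, 8, 21),
             date(2020, 11, 9), date(2016, 3, 31)]
    print(count_by_season(dates))
    # {'winter': 2, 'spring': 1, 'summer': 1, 'autumn': 1}
```

Using the month's quarter keeps the assignment independent of the survey year, so records from all five survey years pool into the same four seasonal groups.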

Figure 1

Distribution of C. nasus catch points (small black dots) during 2016–2020. Made with ArcMap 10.4.1 (https://www.esri.com/) using China map data from the standard map service system of the Ministry of Natural Resources of China (http://bzdt.ch.mnr.gov.cn/) and the survey data of this study. All subsequent maps use the same sources.

Table 1 Distribution of the number of stations and the catch of Coilia nasus by season.

Environmental data

When selecting environmental variables, we considered the biological characteristics of C. nasus32, its ecological type (warm-temperate, coastal migratory)25, and previous studies33. We chose sea-floor water temperature (BOT), chlorophyll-a concentration (CHL), dissolved oxygen (DO), ocean mixed layer depth (MLD), sea surface height (SSH), seawater salinity (S), and sea surface temperature (SST) as predictors in the habitat model used to predict the spatiotemporal distribution of C. nasus. All marine environmental data were downloaded from the Copernicus Marine Service website (http://marine.copernicus.eu/). Seasonal averages for 2016–2020 were matched with the species occurrence data. Details of each variable are given in Table 2.

Table 2 Environmental variables used in the study.

Modeling with MaxEnt

MaxEnt is one of the most widely used techniques for modeling species distributions, especially from partial data6,37,38. It was designed for presence-only data and therefore requires relatively little information36, and it maintains robust predictive accuracy even with small sample sizes42,43,44,45. These characteristics make it well suited to this study, in which the fishery survey data are relatively few but provide accurate species occurrence records and biological information.
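For readers unfamiliar with the method, the general form of the MaxEnt model can be summarized as below. This is a hedged summary of the standard formulation rather than anything specific to this study: the f_j are features (for example, linear, quadratic, product, and hinge transformations of the environmental variables), the λ_j are their coefficients, x ranges over background grid cells, and the β_j are feature-specific regularization parameters that are scaled by the regularization multiplier.

```latex
% Gibbs (exponential-family) distribution over background grid cells x
q_{\lambda}(x) = \frac{\exp\!\left(\sum_{j} \lambda_j f_j(x)\right)}
                      {\sum_{x'} \exp\!\left(\sum_{j} \lambda_j f_j(x')\right)}

% Coefficients maximize the L1-regularized mean log-likelihood
% of the m presence records x_1, \dots, x_m
\hat{\lambda} = \arg\max_{\lambda}
  \left[ \frac{1}{m} \sum_{i=1}^{m} \ln q_{\lambda}(x_i)
         - \sum_{j} \beta_j \lvert \lambda_j \rvert \right]
```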

Treatment of environmental variables and sampling bias correction

In this study, environmental variables were grouped temporally into winter (January–March), spring (April–June), summer (July–September), and autumn (October–December). The variance inflation factor (VIF) was used to screen for multicollinearity; all VIF values were below 10 (Table 3). Monthly average data for each environmental variable were processed in ArcMap (version 10.4) to calculate seasonal averages for model input. Because MaxEnt requires all environmental layers to share the same resolution, the spatial resolution of all environmental data was standardized to 0.05° × 0.05° with the ArcMap resampling function. The importance of each environmental factor was assessed from its percentage contribution to the model.
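The VIF screening can be illustrated with a short NumPy sketch. It computes each variable's VIF as 1/(1 − R²), where R² comes from an ordinary least-squares regression of that variable on all the other variables. The function name and the synthetic data are assumptions made for illustration; the study's own VIF values are those reported in Table 3.

```python
import numpy as np


def variance_inflation_factors(X) -> np.ndarray:
    """Return the VIF of each column of X (rows = samples, columns = variables).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination of
    an ordinary least-squares regression (with intercept) of column j on the
    remaining columns.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Design matrix: intercept column plus every other variable.
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        ss_res = np.sum((y - design @ coef) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for three predictors sampled at n grid cells.
    n = 500
    a, b = rng.normal(size=n), rng.normal(size=n)
    c = a + 0.05 * rng.normal(size=n)  # nearly collinear with a
    names = ["var_a", "var_b", "var_c"]
    for name, vif in zip(names, variance_inflation_factors(np.column_stack([a, b, c]))):
        flag = "review" if vif >= 10 else "ok"
        print(f"{name}: VIF = {vif:.1f} ({flag})")
```

With these synthetic data, the two nearly collinear variables receive very large VIFs and the independent one stays close to 1, which is the pattern a threshold of 10 is meant to catch.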

Table 3 Correlation analysis of environmental factors.

MaxEnt assumes that the study area has been sampled systematically or randomly without bias, which is often difficult to achieve in practice. MaxEnt is also sensitive to sampling bias46, so the species data require bias correction. In this study, we used spatial filtering, the most common correction method: ENMTools (version 1.3) was used to thin the occurrence records at the spatial scale of the environmental variables47. Restricting background sampling can also reduce spatial bias and model uncertainty48. We therefore used ArcGIS to create a bias file that restricted MaxEnt background sampling to within the 95% kernel density contour of all occurrence records, to improve model accuracy.
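Grid-based spatial thinning of this kind can be sketched as keeping at most one occurrence record per 0.05° grid cell. This is an illustrative NumPy reimplementation, not ENMTools itself. The function name is hypothetical, and anchoring the thinning grid at the south-west corner of the study area (120°E, 26°30′N) is an assumption. The kernel-density bias file is not reproduced here.

```python
import numpy as np


def thin_to_grid(lon, lat, cell=0.05, lon0=120.0, lat0=26.5) -> np.ndarray:
    """Return indices of records to keep: the first record in each grid cell.

    Cells are `cell` x `cell` degrees on a regular lon/lat grid whose lower-left
    corner is (lon0, lat0); records falling in the same cell count as duplicates.
    """
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    col = np.floor((lon - lon0) / cell).astype(int)
    row = np.floor((lat - lat0) / cell).astype(int)
    # return_index gives the position of the first occurrence of each unique cell
    _, first = np.unique(np.column_stack([row, col]), axis=0, return_index=True)
    return np.sort(first)


if __name__ == "__main__":
    # Hypothetical catch positions (decimal degrees); the first two share a cell.
    lon = [120.61, 120.62, 120.68, 121.03]
    lat = [27.91, 27.92, 27.91, 28.14]
    keep = thin_to_grid(lon, lat)
    print(keep)  # indices of the retained records
```

Matching the thinning cell to the 0.05° resolution of the environmental layers means that no two retained records share the same set of predictor values purely because they fall in the same cell.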

MaxEnt modeling solutions

Modeling was performed with MaxEnt 3.4.1 software (https://biodiversityinformatics.amnh.org/open_source/maxent/). In each run, 75% of the C. nasus occurrence records were randomly selected as the training set, and the remaining 25% were used as the test set. The feature classes used were linear, quadratic, product, and hinge features. The random seed option was enabled in the basic settings, the regularization multiplier was set to 1, and the subsample replicated run type was selected with 100 replicates for each season. The logistic output format was used.
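The replicated 75%/25% subsampling can be mimicked as below to make the resampling scheme concrete. This is a hedged NumPy sketch of random subsampling, not MaxEnt's internal code; the generator name is hypothetical, and rounding the test-set size to the nearest integer is an assumption.

```python
import numpy as np


def subsample_splits(n_records: int, n_reps: int = 100, test_frac: float = 0.25, seed=None):
    """Yield (train_idx, test_idx) for n_reps independent random 75/25 splits.

    Each replicate draws a fresh random permutation, so a record can fall in the
    test set of several replicates (random subsampling, not k-fold).
    """
    rng = np.random.default_rng(seed)
    n_test = int(round(n_records * test_frac))
    for _ in range(n_reps):
        perm = rng.permutation(n_records)
        yield np.sort(perm[n_test:]), np.sort(perm[:n_test])


if __name__ == "__main__":
    # e.g. a hypothetical season with 40 thinned occurrence records
    splits = list(subsample_splits(40, n_reps=100, seed=42))
    train, test = splits[0]
    print(len(splits), len(train), len(test))  # 100 30 10
```

Because the replicates are drawn independently, summary statistics such as AUC are typically reported as averages over all 100 replicates.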

Evaluation of model performance

We evaluated the predictive ability of MaxEnt using the area under the receiver operating characteristic curve (AUC)36, the most commonly used measure of SDM performance49. AUC quantifies the model's ability to distinguish actual species occurrence points from pseudo-absences; a value of 0.5 indicates discrimination no better than random, and values closer to 1 indicate higher predictive accuracy44. Because MaxEnt is based solely on presence data, specific thresholds can also be derived from its logistic output to assess model performance36. Choosing a threshold involves weighing the relative importance of omission and commission errors, and the true skill statistic (TSS)50 is a widely accepted metric that accounts for both types of error42. TSS ranges from −1 to +1; values of 0 or below indicate performance no better than random, and higher values indicate better model performance50. In this study, AUC and TSS values for both the training and test sets were used to evaluate the overall performance of the model.
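To make the two metrics concrete, the sketch below computes AUC as the probability that a randomly chosen presence record scores higher than a randomly chosen background (pseudo-absence) point, with ties counted as one half. It computes TSS as sensitivity + specificity − 1, maximized over candidate thresholds. This is a minimal NumPy illustration of these standard definitions; the function names are hypothetical, and the code does not reproduce MaxEnt's own threshold rules.

```python
import numpy as np


def auc(presence_scores, background_scores) -> float:
    """Rank-based AUC: P(presence score > background score), ties count 1/2."""
    p = np.asarray(presence_scores, dtype=float)[:, None]
    b = np.asarray(background_scores, dtype=float)[None, :]
    # Broadcasting compares every presence with every background point.
    return float(np.mean(p > b) + 0.5 * np.mean(p == b))


def tss(presence_scores, absence_scores, threshold: float) -> float:
    """True skill statistic = sensitivity + specificity - 1 at one threshold."""
    p = np.asarray(presence_scores, dtype=float)
    a = np.asarray(absence_scores, dtype=float)
    sensitivity = np.mean(p >= threshold)  # presences predicted present
    specificity = np.mean(a < threshold)   # (pseudo-)absences predicted absent
    return float(sensitivity + specificity - 1.0)


def max_tss(presence_scores, absence_scores):
    """Best TSS over all observed scores used as thresholds; returns (tss, threshold)."""
    candidates = np.unique(np.concatenate([np.ravel(presence_scores),
                                           np.ravel(absence_scores)]))
    best = max(candidates, key=lambda t: tss(presence_scores, absence_scores, t))
    return tss(presence_scores, absence_scores, best), float(best)


if __name__ == "__main__":
    # Hypothetical logistic outputs at test presences and background points.
    pres = [0.91, 0.84, 0.77, 0.42]
    back = [0.55, 0.30, 0.12, 0.08, 0.05]
    print(round(auc(pres, back), 3), max_tss(pres, back))
```

When background points stand in for true absences, as in presence-only modeling, both metrics measure discrimination against the background rather than against confirmed absences.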

Correlations of environmental variables

In this study, the percentage contribution reported by MaxEnt was used to measure the correlation of each environmental factor with the predicted distribution: the larger the percentage, the stronger the correlation. Response curves were plotted for the factors with the highest contributions, and the spatial distribution of the top-ranked factor in each season was mapped for comparison with the predicted habitat distribution.
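MaxEnt's documentation describes percent contribution as a training-time heuristic. Each step of the fitting algorithm changes one feature's coefficient, the resulting increase in gain is credited to the environmental variable that the feature depends on, and the totals are converted to percentages at the end. The toy Python version below assumes a hypothetical log of (variable, gain increase) steps and shows that bookkeeping and how the top-ranked factor for a season would be picked. It does not reproduce MaxEnt's internal handling of features built from several variables or of negative gain changes.

```python
from collections import defaultdict


def percent_contribution(steps) -> dict:
    """Credit each (variable, gain_increase) step to its variable, then normalize to %."""
    totals = defaultdict(float)
    for variable, gain_increase in steps:
        totals[variable] += gain_increase
    grand_total = sum(totals.values())
    return {var: 100.0 * g / grand_total for var, g in totals.items()}


def top_factor(contributions: dict) -> str:
    """Variable with the largest percent contribution."""
    return max(contributions, key=contributions.get)


if __name__ == "__main__":
    # Hypothetical training log for one seasonal model (not the study's values).
    log = [("S", 0.30), ("BOT", 0.05), ("S", 0.20), ("SST", 0.10),
           ("S", 0.15), ("CHL", 0.10), ("DO", 0.05), ("MLD", 0.03), ("SSH", 0.02)]
    pct = percent_contribution(log)
    print({k: round(v, 1) for k, v in pct.items()}, top_factor(pct))
```

Because the credit depends on the path the optimizer takes, percent contribution is best read as a ranking aid rather than as a statistical correlation coefficient.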

Mapping of habitat suitability distributions

MaxEnt's logistic output can be interpreted as the probability of presence of the target species51,52, and when it is used for qualitative exploratory analysis, it can be interpreted as a habitat suitability index (HSI)51. The HSI ranges from 0 to 1, with values closer to 1 indicating higher habitat suitability53. Seasonal HSI maps for C. nasus were produced from the habitat model outputs and compared to visualize and analyze how the distribution of its habitat changes between seasons.
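One simple way to quantify the seasonal expansion and contraction of habitat seen in HSI maps is to compute, for each season, the share of sea cells whose HSI reaches a chosen cut-off. The sketch below assumes HSI grids stored as NumPy arrays with NaN marking land, and a cut-off of 0.5 chosen purely for illustration; the study does not state a threshold for this comparison, and the function names are hypothetical.

```python
import numpy as np


def suitable_fraction(hsi: np.ndarray, threshold: float = 0.5) -> float:
    """Fraction of valid (non-NaN) cells with HSI >= threshold."""
    hsi = np.asarray(hsi, dtype=float)
    valid = ~np.isnan(hsi)  # NaN cells (land) are excluded from the denominator
    return float(np.mean(hsi[valid] >= threshold))


def compare_seasons(hsi_by_season: dict, threshold: float = 0.5) -> dict:
    """Suitable-habitat fraction for each seasonal HSI grid."""
    return {season: suitable_fraction(grid, threshold)
            for season, grid in hsi_by_season.items()}


if __name__ == "__main__":
    nan = np.nan
    # Tiny hypothetical 2 x 3 HSI grids; NaN marks land cells.
    grids = {
        "winter": np.array([[0.7, 0.6, nan], [0.8, 0.5, 0.2]]),
        "summer": np.array([[0.3, 0.6, nan], [0.2, 0.1, 0.4]]),
    }
    print(compare_seasons(grids))
```

On a 0.05° longitude–latitude grid, cells shrink slightly toward higher latitudes, so an area-weighted version would weight each cell by the cosine of its latitude. Across a 3° latitude band, however, a simple cell count already gives a close approximation.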

Results

Evaluation of the model’s predictive performance

The AUC values for both the training and test datasets were above 0.8 in every season (Table 4), suggesting that the selected environmental variables captured the main factors in habitat selection by C. nasus. The habitat suitability simulations showed high discriminatory ability, indicating that the model was adequate for studying the distribution of suitable habitat for C. nasus in southern Zhejiang. The TSS values for each season were also high, indicating good model performance (Table 4).

Table 4 AUC and TSS values of model results for each season.

Contributions of environmental variables to the model

The percentage contributions of the environmental factors in each seasonal MaxEnt model (Fig. 2) showed that, in every season, a single factor dominated. In winter, spring, and autumn, S made the highest contribution, exceeding 60% in each case, whereas in summer BOT had the highest correlation with the distribution of habitat suitability for C. nasus, also above 60%.

Figure 2

Contribution-based correlations of environmental factors in different seasonal MaxEnt models.

Spatial and temporal distribution for C. nasus

Regarding habitat suitability for C. nasus (Fig. 3), the distribution was widest in winter, after which the habitat range gradually contracted, reaching its narrowest extent in summer before expanding again in autumn. However, the most suitable habitat (Fig. 3, warmer colors) occupied roughly the same areas in all seasons, mainly brackish or marine waters such as Yueqing Bay, Dongtou, and the Yuhuan Outer Sea.

Figure 3

Seasonal distribution of habitat suitability for C. nasus.

Response curves and optimal ranges of environmental variables

In this study, response curves were plotted for the two environmental factors most strongly associated with the distribution of suitable habitat for C. nasus (Fig. 4). In addition, the actual distributions of the factors with the highest correlations with habitat suitability were mapped for each season (Fig. 5). The salinity response curves (Fig. 4A) showed clear peaks in winter and summer, with the highest habitat suitability index values at salinities of 22 and 32, respectively. In spring and autumn, habitat suitability decreased with increasing salinity. In autumn, it declined rapidly until salinity reached about 24 and then leveled off. In spring, it declined gently up to a salinity of about 30 and then fell sharply, approaching zero at a salinity of 33.

Figure 4

Response curves of major environmental variables to habitat suitability for C. nasus (A is the response curve for salinity; B is the response curve for seafloor temperature).

Figure 5

Example of the actual distributions of environmental variables with the highest correlations in each season.

For seafloor temperature (Fig. 4B), habitat suitability in winter increased gradually up to 13 °C, remained at its highest between 13 and 17 °C, and then decreased with increasing temperature. In spring and summer, the curves showed single peaks, with the highest suitability at 21 °C and 28 °C, respectively. In autumn, suitability continued to increase with temperature, reaching its highest value at 27 °C and then remaining stable.

In winter, spring, and autumn, salinity had the highest correlation with habitat suitability for C. nasus (Fig. 5), but the extent of its influence differed between seasons. Comparison with the seasonal distribution of habitat suitability (Fig. 3) showed that in winter, C. nasus was mainly distributed in waters with salinity < 30, and salinity in the highly suitable area ranged from 14 to 18. In spring, C. nasus was also mainly distributed in waters with salinity < 30, and salinity in the highly suitable area ranged from 24 to 28. In autumn, the salinity trend was most pronounced; the main distribution area of C. nasus was in sea areas with salinity < 30, and salinity in the highly suitable area ranged from 16 to 20. In summer, seafloor temperature was the most important factor: C. nasus was mainly distributed in sea areas with temperatures > 26 °C, and temperature in the highly suitable area ranged from 28 to 30 °C.

Discussion

The maximum entropy model and its applicability for rare species

All of the models in this study achieved AUC values above 0.8, indicating good overall predictive performance. The TSS values were also high (Table 4), indicating low omission and commission error rates. SDMs are reliable tools for predicting species distributions and thus for guiding conservation and management, and the IUCN has used SDMs to assess species' ranges in extinction risk assessments54. The reliability of SDMs depends on data availability42,43, appropriate selection of predictor variables55, and model accuracy45. In this study, although the seasonal catch data for C. nasus were highly credible survey data, they covered a limited range in time and space56. This limitation is particularly common for rare or scarce species, for which records often only indicate occurrence57. Such incomplete distribution information complicates evaluation and assessment58, and under these data-limited conditions, species distributions must be modeled from partial data6. In such cases, MaxEnt is considered a viable option36 and sometimes performs better than traditional methods41,59,60. MaxEnt, a machine learning method that uses presence-only data49,61, has been widely applied in habitat modeling for various marine taxa and remains the most common approach6,7. It performs particularly well in modeling the distributions of rare species57,62. Based on our evaluation criteria, the habitat distribution results generated by MaxEnt for C. nasus in the southern waters of Zhejiang are reliable and can support the delineation of conservation areas for this species.

Habitat utilization and environmental preferences of C. nasus

The coastal waters of southern Zhejiang lie in the central-southern core of the East China Sea. Influenced by the Zhejiang Coastal Current, Taiwan Warm Current, and Kuroshio Current, the region has favorable hydrological conditions, abundant nutrients, and plentiful food organisms63,64,65,66, creating an ideal habitat for C. nasus. The spatial and temporal distribution of C. nasus in this region is relatively consistent, concentrated mainly in bays and shallow areas near islands and the coast. This may be because the complex seabed topography and diverse geology between the islands, together with the rich hydrological environment, support a variety of fish species67. The habitat range of C. nasus gradually shrinks after winter, reaches its smallest extent in summer, and then expands markedly in autumn. The summer habitat range is narrower than in the other three seasons (Fig. 3), consistent with the findings of Liu et al.68 on the spatiotemporal niche of C. nasus in this region. This pattern may be due to reproductive migration. Coilia nasus in the Oujiang Estuary is a warm-temperate migratory species68 that prefers warmer waters32, and it shows homing behavior, using fixed river channels during its reproductive migration69. Several rivers flow into this region; the Oujiang River is the largest and serves as the primary reproductive migration base. Coilia nasus in the Oujiang Estuary begins migrating upriver into the Oujiang in March and April70 and returns to the sea to overwinter when water temperatures drop in October67. This seasonal pattern is consistent with the changes in the habitat range of C. nasus found in this study (Fig. 3).

Except in summer, the habitat distribution of C. nasus in this study was closely related to seawater salinity. This matches the ecological habits of migratory C. nasus30,34, which mainly inhabits areas of lower salinity, consistent with the findings of Chen et al.67. Juvenile C. nasus hatch and feed in low-salinity environments before moving to higher-salinity areas to overwinter. C. nasus in the Oujiang Estuary may be ecologically isolated from populations in the Yangtze and Yellow Rivers69,71, but its growth characteristics are similar to those of the Qiantang River population72. Like the Qiantang River population, C. nasus in our study area spends only a short life stage in freshwater, where it spawns and breeds72. Most of its life is spent in brackish or marine waters, and it can migrate flexibly between these environments71,73,74.

Limitations and prospects

This study uses fishery survey data and environmental data to model and analyze the habitat distribution and environmental preferences of C. nasus in the southern Zhejiang sea area. However, the limited sample size and data coverage, particularly across different seasons and regions, may affect the representativeness and generalizability of the results. Otolith microchemistry habitat histories reveal that the C. nasus population in the Oujiang River area includes freshwater-estuarine brackish water and freshwater-estuarine brackish water-marine types, each with different life stages69,71,75. This study, however, only depicts the habitat distribution of C. nasus, and it lacks detailed descriptions of migration paths and behavior patterns between different water bodies and growth stages, information that is crucial for developing conservation and management strategies. Therefore, future research should consider these factors. Additionally, there are differences in environmental preferences and feeding habits between adult and juvenile fish. Considering the distribution of C. nasus at different life stages may be more meaningful for resource conservation and management and provide a scientific basis for these efforts.