Background & Summary

Inorganic fertilizers are essential for replenishing the nutrients that are removed from soils during crop harvesting. The three main nutrients provided by fertilizers, nitrogen (N), phosphorus (P) and potassium (K), play a key role in plant functions. While N and P, which are basic components of nucleotides, proteins and membrane lipids, are essential in energy metabolism1,2, K is essential for the transportation of water, metabolites, and nutrients across plant tissues, for defense against oxidative stresses, and for the maintenance of osmotic homeostasis3,4. Although the first commercial inorganic fertilizers were developed in 1843, they were not the main anthropogenic inputs in the N, P, and K biochemical cycles until the second half of the 20th century5. Today, inorganic fertilizers dominate as the primary nutrient input in croplands, surpassing the second human input, manure, by over double5, and also serve as one of the main N input for grasslands6. This substantial surge during the 20th century not only facilitated the rapid growth in human population, but also had ecological and socioeconomic ramifications, such as water eutrophication, soil degradation, climate change, and mineral resource depletion7,8. In the remainder of this study, the term ‘fertilizer’ will refer to inorganic fertilizers, and all data and results regarding P and K will be presented in their oxidative forms (P2O5 and K2O, respectively), in accordance with common references in international standards and regulations.

Given their food security, socioeconomic and environmental implications, considerable research has been conducted to discern the temporal and regional trends in the use of N, P2O5, and K2O6,9,10,11,12. Nevertheless, limited availability of temporal global spatial information regarding their application across various crops have restricted these analyses to a few global and regional studies that primarily focused on N9,13,14. These studies initially estimated consumption at the country- and state-level using simple equations, based on a few crop-specific fertilization features and changes in crop surface area9,13,14, or using Bayesian Markov Chain Monte Carlo modeling15. A global, crop-specific fertilization dataset is crucial for understanding crop nutrient management practices worldwide, identifying past trends and current gaps in fertilization, guiding agricultural policies to improve crop yields while minimizing environmental impacts, and providing input data for modeling9. Therefore, we aim to address this knowledge gap by providing insights into the application rates of P2O5 and K2O while also seeking to improve estimates for N.

In order to accomplish this objective, we began by updating the panel datasets on cropland fertilization; enhancing the most comprehensive database developed by Ludemann et al.16 by incorporating global datasets covering data from the 1970s and 1980s17,18, country-specific data for European countries from 2001 to 201419,20,21,22. This compilation process led to a 35% expansion of the Ludemann et al.16 database. Second, the dataset was expanded with data of various potential socioeconomic, environmental, and agricultural drivers of cropland fertilization. Third, two ML regression models - XGB23 and HGB24, both capable of handling the prevalent missing values within the dataset24- were applied to predict N, P2O5, and K2O fertilizer application rates for the different crop classes over 60 years. Since these models are considered black-box models, feature importance was incorporated using SHAP25 values to identify the global socioeconomic, agricultural, and environmental drivers of cropland fertilization and to validate the ML models. Fourth, the predictions were validated on national databases. However, since the ML models were trained on global data, which show a discrepancy with the national data, the model predictions were first adjusted to match the total annual country-level N, P2O5, and K2O use in agricultural land, similar to previous studies9,10,11,12,26. Crucial in this adjustment was the fraction of total country-level fertilizer use allocated to grasslands and fodder crops, as an important portion of total fertilizer use in some countries is devoted to these areas, and little previous estimates existed5,26,27,28, especially for K2O5. Therefore, these fractions were estimated by reviewing scientific and technical information from 75 countries. The adjusted predictions were then validated using national databases of fertilizer application rates at the crop-level. Finally, the results were spatially allocated using crop maps of the year 2000, developed by Monfreda et al.29; the annual harvested area of each crop class in each country; and the spatial changes in cropland surface based on the Hyde v3.3. project30.

Methods

The following section outlines the comprehensive methodology that was adopted in this study. The methodology encompasses various stages, including the collection and aggregation of different datasets and the compilation into a unified dataset, as well as all preprocessing steps that were carried out. Additionally, we introduce the ML models used in this study, as well as the respective training and evaluation procedures. Furthermore, we discuss the measures that were undertaken to explain the predictions made by the ML models. Following this, we describe how we used the predictions to create detailed maps of global fertilizer application rates. Finally, we explain how we assessed the validity and plausibility of the dataset derived from our study.

Data collection and preprocessing

Data collection

Fertilizer application rate by crops. To compile a consistent and detailed dataset of fertilizer application rates for different crops, countries, and years, 14 global datasets16,17,18,19,20,21,22,31,32,33,34,35,36,37 were used. We discarded national databases, such as the USA38 and India39,40,41,42,43,44,45, to construct a homogeneous database. This approach avoids multiple year-nutrient-crop-country entries from both global and national databases, and allows us to retain external databases for validating the ML model predictions. To standardize all these datasets and minimize data loss, we classified all crop types into 13 crop groups (wheat, maize, rice, other cereals, soybean, palm fruit, other oilseeds, vegetables, fruits, roots and tubers, sugar crops, fiber crops and other crops) (Table 1), in alignment with the ICC Version 1.146.

Table 1 Crop Classification with FAOSTAT Item Codes.

During the 80 s, the IFDC published two reports17,18 regarding crop-specific data of FUBC (hereinafter referred to as FUBC-IFDC). After the crop grouping, these publications included data for 459 country-crop-years combinations (kg ha−1 of N, P2O5, and K2O) from 83 countries for 1973–1988. During the 90 s, the FAO, in collaboration with the fertilizer industry (IFDC and IFA), published five crop-specific datasets of fertilizer application rate (hereinafter referred to as FUBC-FAO). After grouping the data, these publications included data for 1693 fertilizer application rate specific to years and crops (kg ha−1 of N, P2O5, and K2O) from 108 countries for 1984–2002, although most of the data (98%) covered 1988–2002. The data were collected using questionnaires from governmental agencies, members of industry companies, agronomists, and economic experts. In both datasets (FUBC-IFDC and FUBC-FAO), the use of fertilizer for each combination of nutrient, crop, country, and year was provided two ways: (a) as the average application rate of a fertilizer over total cropland area, and (b) as the percentage of fertilized cropland area and the application rate in that area. We transformed all data to the average application rate by multiplying the percentage of fertilized area by the application rate in that area. The data were either from a year (e.g., 1996) or a season (e.g., 1996/97). For seasonal data, we considered the starting year of the season as the year of the data in the analyses. Fore data for nutrient, crop, country, and year that were in more than one report, the data was selected from the most recent report. Data for crop, country, and year that were divided into crop varieties or management practices (e.g., irrigated or rain-fed rice, or soft or durum wheat) were aggregated and weighted by the area of the crop class included in the report. Data for sweet maize, or corn, were excluded, assuming that it referred to Zea mays var. saccharata and the data for silage maize, because FAOSTAT reports only the harvested area for maize grain. Values for the crop groups were derived from individual crops when either more than 90% of the harvested area (based on FAOSTAT data47) was dedicated to the production of a single crop, or when a combination of crops was available in the data, their weighted average was assigned to the entire group.

Since the last FAO publication, IFA has released five reports detailing the total amount of N, P2O5, and K2O used for various crop classes, providing yearly or seasonal data spanning from 2006 to 201816,36,37,48 (hereinafter referred to as FUBC-IFA). Initially covering 11 crop types, these reports expanded to 14 types in the fourth report. They encompassed information for the European Union (EU) together as well as 27 other countries. In 2022, Ludemann published a more comprehensive dataset covering data for 66 countries, featuring EU data at the country scale, and information for 20 crop classes16. This report also included the FUBC-FAO data for the 1990s and prior data from IFA. However, small discrepancies between the FUBC-FAO original data and the one compiled by Ludemann et al.16 prompted us to retain the original FUBC-FAO information. To estimate the average application rate for each combination of crop, country, and year, we divided the total used amount of each fertilizer by the harvested area provided by FAOSTAT47. As previous research we assumed the harvested area as a proxy for the crop’s annual surface on each country9,10. It is worth noting that the average application rate for maize was slightly overestimated because FUBC-IFA included the amount discharged to silage maize. According to Maiz’Europ’49, the current area of forage maize crops is 17.3 million ha (approximately 1% of the total area of maize crops in 2020) with the European Union as the most important producer of silage maize, with 6 million ha. We utilized the available raw data from Ludemann et al.16, adopting FAO-IFDC datasets methods for grouping, and omitted certain countries where values were estimated based on the previous report and changes in crop surface. For the EU countries, Norway and the UK, four unpublished datasets from FE spanning 2001–2015 (referred to as FUBC-EFMA)19,20,21,22 were used. These datasets offered similar information to the FUBC-FAO publications for the EU countries, the UK and Norway and allowed us to exclude the fertilizer application to silage maize, which is important in the EU49. However, FUBC-EFMA datasets lacked individual crop classes for rice and soybeans, resulting in missing data at the country-level for these crops since 2000 in EU countries.

The resulting dataset included data for the average fertilizer application for 3712 combinations of 13 crop classes, 114 countries, and years from 1973 to 2018. For most of the combinations of countries and crops, data were available for only a few years (on average, a country-crop combination had data for 4.1 ± 2.9 years, and 64% of the combinations had five or fewer years with available data).

In order to later validate our estimations, we compiled a series of national databases. National data was quite limited, as only a few countries conduct surveys to study fertilizer management across different crops. The two countries with most available data were the USA38, and the UK50, which collected long time series on cropland fertilization for the three primary nutrients. The USA dataset38 contains fertilization information for four crops -cotton, maize, soybean, and wheat- dating back to 1964. To compare with our predictions, we converted all data to average kg ha−1. Additionally, based on the same surface threshold used for global datasets, we assumed that the application rate for cotton was equivalent to that of all fiber crop classes. The UK dataset50 provides data for four crop classes -roots and tubers, other oilseeds, sugar crops, and wheat- starting from 1998 for the three nutrients across all Great Britain. We also compiled existing information from several Asian countries, including India, the Philippines, and Pakistan39,40,41,42,43,44,45,51,52. The datasets from India39,40,41,42,43,44,45 and Pakistan51 did not require additional preprocessing, as they provided the data in average kg ha−1. However, the dataset from Pakistan presented the information for all three nutrients combined51. For the dataset from the Philippines, which covers rice and maize, we converted the raw data on the regional number of 50 kg bags per hectare of different fertilizers to N and P2O5 using the country-specific fertilizer nutrient information53. Finally, we also compiled existing data from Sweden54,55,56,57 and New Zealand58. The data for P2O5 and K2O in the Sweden dataset, initially present in their pure nutrient form, were transformed to their oxidized forms by multiplying by the molecular weights of these elements.

Fertilizer use in other agricultural lands. An important step in the methods involves adjusting ML model predictions to national-level fertilizer use. We used the FAOSTAT database regarding fertilizer annual use at the country level for making this adjustment59. This database includes data on all fertilizer use for agricultural lands, covering both croplands and grasslands59. However, the crops included in the ML models, as well as in the FAOSTAT harvested area data47 do not cover grasslands -whether permanent or temporary- nor fodder crops such as silage maize or fodder beet. Therefore, the primary goal of this section is to estimate the fraction of total fertilizer used for these types of agricultural lands.

Data regarding fertilizer application rate for grasslands and fodder crops is even more scarce than fertilization for other croplands. Additionally, FAOSTAT lacks information about the surface of the majority of the fodder crops47. Therefore, the methods used for estimation may not be as accurate as those used for other agricultural lands. Here, we reviewed technical information, such as the FUBC compiled reports16,17,18,19,20,21,22,31,32,33,34,35,37, and scientific information from countries where the fertilization of grasslands was considered to be higher than 1% of the total fertilizer consumption in previous research5,6,26,27,28. Previous research typically focused only on permanent grassland fertilization, as their goal was to distinguish agricultural fertilizer usage between arable -croplands and temporary grasslands- and non-arable land -permanent grasslands-5,27,28. However, we included in the estimation the proportion of fertilizer used for temporary grasslands and fodder crops for two main reasons: 1) our main goal was to distinguish agricultural fertilizer usage between all croplands included in the thirteen crop classes defined in the previous section and the rest of the agricultural land, 2) the majority of data available in the compiled global reports give information about all grasslands and fodder crops together16,17,18,19,20,21,22,31,32,33,34,35,36,37. The information estimated was the annual country proportion of N, P2O5, K2O fertilizers used for agriculture for grasslands and fodder crops. Depending on the available information, we have assessed at the country- or regional-level. In total, we reviewed scientific and technical reports for 75 countries. As in previous research5,26,27,28, the methods used for estimating the share of N, P2O5, and K2O usage for grasslands and fodder crops varied between countries and regions depending on the available information. Therefore, for every country, we argued the decisions taken based on the available data for providing at least as transparent as possible the estimations made. Moreover, we included a summary table (Table 2) with the sources used for estimating the range of values used for each country.

Table 2 Fraction of N, P2O5, and K2O allocated for grasslands and fodder crops.

Argentina: In the 1960s, fertilizer application rate in Argentina was primarily directed towards sugar cane and citrus60, with minimal application to grasslands, nearly zero in 196460. Throughout the 1970s and 1980s, the fertilizer application rate remained low, although there was a notable increase in P2O5 application to grasslands, reaching 28% country consumption in 197917. The substantial expansion in N and P2O5 fertilizer occurred during the 90 s, leading to a slight rise in the share of N used for grasslands, and to a significant decrease in P2O5 share for grasslands31,32,33,34,35. To fill data gaps, we adopted a methodology similar to Lassaletta et al.27, utilizing linear interpolation of national61,62,63,64,65,66,67,68,69 and global datasets for the years lacking data, with grasslands’ fertilizer share assumed as 0 in 196560. Despite potential limitations, setting the share to 0, as done in FAO nutrient budgets5, may underestimate fertilizer application rate, particularly for P2O5. K2O fertilizer application rate in Argentina remains minimal due to soil composition, with all reports except one considering it as 0 in the use for grasslands and fodder crops17,18,31,32,33,34,35,61,62,63,64,65,66,67,68,69.

Brazil: According to several sources, the use of fertilizer in Brazil’s grasslands has been very low70,71. The most recent values reported by IFA in 2014 and 2018 indicate that less than 1% of the fertilizer used in Brazil is used in grasslands16,37. However, Lassaleta et al.27 and FAO5 considered higher percentages for N and K2O based on regional averages27 or previous research5. For P2O5 and K2O, only FAO includes an estimation, considering 0 for P2O5, while they estimate the K2O consumption by calculating the average between N and P2O5 consumption5. We have decided to consider 0 as the share used for grasslands and fodder crops due to the latest reported values and considering that no information is reported in previous reports16,17,18,31,32,33,34,35,37.

Canada: Most of the compiled reports do not provide information about the use of fertilizers for fodder crops and grasslands17,18,31,32,33,34,35. The latest report, with 2018 data, indicated that 0.5% of N, 0.9% of P2O5, and 0.6% of K2O fertilizers were allocated to permanent grasslands, which increased to 12%, 14.5%, and 25% respectively when considering tame hay and silage maize as well16. Regarding N, FAO5 and the 2014 estimation by Lassaleta et al.27 are consistent with the 2018 estimation for all forages. However, the values for P2O5 and K2O for all forages in the latest report differ significantly from those used by FAO5 (0% for P2O5 and 5% for K2O). This discrepancy in P2O5 may be due to FAO’s reliance on Heffer et al.37 which does not consider nongrass perennial crops 0%72, and the discrepancy for K2O because FAO considered the average value between N and P2O55. We decided to utilize the percentage for all forages included in the last report16 for the entire period. We maintained the same values throughout the period due to insufficient data to estimate any trends. Additionally, in 1974, Beaton and Berger noted that a significant share of fertilizer used in Canada was for forages, estimating 45% of total use in 1970 was for hay and grazing grasslands73. They suggested that their estimation might be overestimated; however, it is unlikely that the fraction of fertilizers used for forages was 0 between 1960 and 1990.

Chile: Based on the estimations of the FAO and IFA reports, Lassaleta et al.27 and FAO5 considered a significant share of fertilizer used for grasslands. For N Lassaleta et al.27 suggested an increasing percentage from 0% in 1960 to 20% in 2005, while FAO maintained a constant percentage of 20%. For P2O5 and K2O, the values used by FAO were also high, at 30% and 25% respectively. However, for Chile, using a constant value for the period overestimated the early years as the share used for grasslands for N and P2O5 was only 1% at the beginning of the 1960s74. We therefore decided to make a reconstruction similar to the one demonstrated by Lassaleta et al.27, by considering 1% as the starting share for each nutrient, and incorporating the reported values for all grasslands16,17,18,31,32,33,34,35,37.

Dominican Republic: The values reported in global studies from the 1990 s indicate that during this decade, the percentage of fertilizer application rate on grasslands and fodder crops ranged between 2% and 4%31,33,34. Considering these findings, Lassaleta et al.27 allocated values ranging from 0% to 2% for N. We have chosen to utilize the average values from the three reports31,33,34 for the period 1990–2020. This decision was influenced by the lack of available data since 1997, and by the emergence of fertilizer application rate for pasture as a new and increasing practice during the 90s75.

Mexico: The use of fertilizer for grasslands and fodder crops appears to be nearly zero, as indicated by previous research5,27 and reported values16,31,35,37. The only relevant fertilizer used for grasslands and fodder crops in Mexico appears to be related with P2O5 related with alfalfa production31,35,76. Due to limited available information, and the longstanding presence of alfalfa production in Mexico since the Spanish colonization, we opted to consider the average percentage (2.5%) used in the two reports with data for alfalfa31,35.

United States of America: According to global and national estimates from previous research, the share of N used for grasslands during the period ranged from 0% to 20% of the total5,13,27. For P2O5 and K2O, the most recent estimation from FAO indicated a constant share of 0% for phosphorus and 10% for potassium5. To estimate the total fertilizer use for permanent and non-permanent grasslands from 1959 to 2014, we used all the available data31,33,37,73,77. In many sources, the information for grasslands is combined, encompassing both permanent and non-permanent grasslands. We used linear interpolation to estimate the share used for all grasslands together, replicating the method from the most recent estimation13. However, we included data from three additional years (1974, 1992, 1996)31,33,73, and also extended the estimation to cover P2O5, and K2O.

Uruguay: Grassland fertilization was actively promoted by the Uruguayan government during the 60 s78. As early as 1963, one-third of the fertilizer used in the country was applied to pastures, with a focus on P2O5 due to the low P content of the Uruguayan soils78. These trends are reflected in the first IFDC report, which allocated 45% of the P2O5 used in the country for grasslands and fodder in the year 198618. However, this share decreased to 22% by 2018. In contrast, the percentage of N used for grasslands has shown an increasing trend, from almost 0% in 198618 to 12% in recent years16,35. K2O is not used for these agricultural lands in the country16,18,31,35. Given the significant variation in percentages between decades and nutrients, we performed linear interpolation considering 33% for P2O5 in 1960, and 0% for N as starting points, and all the values included in the reports16,18,31,35.

Venezuela: Information regarding grassland and fodder crop fertilization in Venezuela is limited. Due to the scarcity of data and discrepancies between reported values33,35, FAO has considered a fertilization rate of 0% for grasslands during the specified period. Conversely, Lassaleta et al.27 proposed different rates between 0% and 9% from 1960 to 2009 for N. Given the challenge of determining the most appropriate criteria, we opted to adhere to the FAO considerations. This decision is influenced by low government optimal use recommendations for grasslands compared to croplands79, along with scientific evidence suggesting minimal fertilization for warm-climate grasslands79,80.

Australia: According to Lassaletta et al.27, the share of N used for grasslands never exceeded 8.5%27, which is similar to the 10% used by FAO in their nutrient budgets assessments5. Despite an intensification in the use of N in Australian grasslands over the past three decades81, it is noted that these grasslands were already being fertilized in the late 1950s, primarily with K2O82. For instance, in 1956, 15% of the K2O used in South Australia was directed towards pastures, a figure that rose to 42% by 196682. Therefore, we have opted to consider a constant share of 6.4% for N use since 1960 derived from the mean value of the reports16,18,31,32,33,34,35,37. Regarding P2O5 and K2O fertilizer, it appears that the FAO estimations5 may have underestimated their use, particularly for K2O. Thus, we decided to use the average value of all reports, because even with fluctuations, the variation in the reported values since 1985 is not too high, resulting in figures of 38.4 ± 4.1% for P2O5 and 41.6 ± 6.9% for K2O16,18,31,32,33,34,35,37.

New Zealand: Previous global research presented contradictory estimates of fertilizer application rate for grasslands in New Zealand5,27, with figures ranging widely from 0% to 90%. However, both global and national reports consistently support the notion that the majority of the fertilizer application rate in the country is directed towards grasslands and fodder crops16,33,35,37,58. Therefore, we have adopted a constant percentage throughout the entire period as grasslands have been the primary type of agricultural land developed in the country since the British colonization, their fertilization has been relevant since the early 20th century83, and the fraction used for grassland has remained constant at least in the last 30 years16,33,35,37. The percentages selected were derived from the average of global reports16,33,35,37: 91.1 ± 1.4% for N, 93.0 ± 3.3% for P2O5, and 88.8 ± 4.4% for K2O.

Europe: Between 1980 and 2000, Europe accounted for at least half of the N fertilizer used for grasslands and fodder crops, while consuming less than one-third of the total global fertilizer consumption6. Consequently, the available information was broader, and the methods applied could be more comprehensive. Einarsson et al.28 provided the most comprehensive estimation for N in most European countries28. They compiled and estimated the surfaces of croplands, including fodder crops, as well as temporary and permanent grasslands for the EU countries spanning from 1960 to 2019. Using their compiled data and the fertilizer application rate information from our study, we employed a similar methodology to estimate the fraction of N, P2O5, and K2O used in these areas.

However, we extended the analysis to include fodder crops and all types of grasslands together, while also estimating P2O5, and K2O. First, we used Eq. (1) to estimate the ratio (Rfa) between the fertilization intensity of grasslands and fodder combined, and the fertilization intensity of all agricultural land for the years with available data:

$$\frac{{Q}_{f}}{{Q}_{a}}=\frac{{R}_{f}\times {A}_{f}}{{R}_{a}\times {A}_{a}}\to \frac{{Q}_{f}\times {A}_{a}}{{Q}_{a}\times {A}_{f}}=\frac{{R}_{f}}{{R}_{a}}={R}_{f-a}$$
(1)

where Qf is the amount of fertilizer (N, P2O5, or K2O) used for grasslands and fodder crops, Qa denotes all the fertilizer of the same nutrient used in the country, Af represent the grassland and fodder surface, and Aa represents the total agricultural land, and Rfa the ratio of fertilizer application rate per area between fodder and grasslands (Rf), and all agricultural land (Ra). Therefore, Rfa represents the fertilizer application relationship between fodder and grassland in comparison to all agricultural lands.

After estimating the annual Rfa, we used two different procedures and equations depending on the years for which Rfa data was available. If scientific literature and the observed variation in Rfa indicated significant differences across the years, we performed a linear interpolation of the available values and then applied Eq. (2). Otherwise, we applied (3). To assess the variation in Rfa we estimated the MAE of the results derived from Eq. (3) compared with all the reported values. When the variation of Rfa occurred only in some decades within the period, we combined Eqs. (2) and (3). Detailed explanations were provided for each country individually. For non-EU countries, we applied similar procedures as those used for the other continents. In Eqs. (2) and (3) presented below, \(\overline{{R}_{f-a}}\) is the average Rfa of all reports with data, and i is the year.

$$\frac{{Q}_{{f}_{i}}}{{Q}_{{a}_{i}}}={R}_{f-{a}_{i}}\times \frac{{A}_{{f}_{i}}}{{A}_{{a}_{i}}}$$
(2)
$$\frac{{Q}_{{f}_{i}}}{{Q}_{{a}_{i}}}=\overline{{R}_{f-a}}\times \frac{{A}_{{f}_{i}}}{{A}_{{a}_{i}}}$$
(3)

Austria: The methodology used by Einarsson et al.28 results in an almost constant percentage of N used for permanent grasslands of ≈10% for the 1960–2019 period. This result led FAO to consider that 10% of fertilizer used in agricultural lands was used for permanent grasslands5. The intensification of grassland management began in the 1970s and 1980s84, and the share used for grasslands was higher in the late 1970s than in the 1990s17,31. For P2O5 and K2O, FAO considered a constant 10% allocation for permanent grasslands5, based on previous estimations for P2O526 and the average value between the fraction used for N for K2O. While historical data suggest fluctuations in the percentage of fertilizers used for grasslands and fodder crops over time17,18,31,32,33,34,35, the application of Eq. (3) using constant \(\overline{{R}_{f-a}}\) values of 0.33 for N, 0.46 for P2O5, and 0.32 for K2O, and surfaces changes28, provided an MAE of 2.33 ± 3.09%, 3.87 ± 3.47%, 3.31 ± 2.29% respectively. Only two errors higher than 10% occurred, both underestimations, namely −11.8% for N in 197717, and −10.2% for P2O5 for 199031, suggesting higher Rfa during the 1970–1990 period. Based on these results, we decided to utilize the mentioned \(\overline{{R}_{f-a}}\) values for the period from 1991 to 2020 as well as for the period from 1961 to 1969. For the years from 1970 to 1990, we calculated the average Rfa from 1977 and 1990 reports17,31 to minimize the errors during the period.

Belgium and Luxembourg: Belgium and Luxembourg often share statistics as a single entity in historical statistics. Consequently, we adopted the same estimation for both countries. According to Einarsson et al.28, the percentage of fertilizer application rate for permanent grasslands ranged from 53% in the 1980s to 40% in the last years. They deem the N fertilization of permanent grasslands significant throughout the period based on the little available information they found28. The same literature confirmed the use P2O5, K2O for grasslands as early as 1955, although with slightly lower applications28 as in the actual reports. The use of constant Rfa values of 1.03 for N, 0.91 for P2O5, and 0.81 for K2O based on the technical reports values16,17,19,20,21,22,31,32,33,34,35,36 resulted in MAE values of 2.18 ± 1.82% for N, 5.46 ± 4.04% for P2O5, 3.62 ± 2.51% for K2O. Only two instances of overestimations exceeding 10% were observed for P2O5 in the two last reports16,22. This may be linked with the enforcement of limits on P2O5 application in the Flanders region since 201185. Therefore, for P2O5 we decided to use the average Rfa for the 1960–2010 period, and use a linear interpolation of the Rfa values since 2011.

Czech Republic, Slovakia, and Czechoslovakia: Information regarding grasslands and fodder crops before the disintegration of the Czechoslovak Republic is very limited28. Following assumptions made by Einarsson et al.28, we extended the average \(\overline{{R}_{f-a}}\) reported for the Czech Republic and Slovakia since 199316,21,22,35 through the period 1960–1992, considering surfaces changes, and the agricultural land of each country28. Potential overestimations could occur for the early years, as the fertilization of these areas compared to other croplands might have been lower than in the 1990s, like in neighboring countries such as Hungary or Germany86,87. After 1993, there are four years with available data for both countries16,21,22,35. The Rfa values for all years are similar for each nutrient in each country, so we used Eq. (3) to estimate the 1993–2020 period. This approach resulted in low deviations from the reported values for the Czech Republic (MAE = 2.08 ± 1.58% for N, 2.57 ± 1.30% for P2O5, 1.69 ± 1.47% for K2O) and Slovakia (MAE = 1.49 ± 1.47% for N, 2.02 ± 2.87% for P2O5, 1.79 ± 2.13% for K2O).

Denmark: Danish grasslands and fodder crop fertilization have a long history with N, with average rates of 45 and 17 kg ha−1 for temporary and permanent grasslands respectively in the early 1960s88. The usage of Eq. (3) for the whole period for the three nutrients resulted in large deviations (MAE = 8.89 ± 4.40% for N, 5.36 ± 3.71% for P2O5, 8.42 ± 5.67% for K2O). Therefore, as the amount of available data was large in the compiled technical reports we used Eq. (2), and linear interpolation of all Rfa values for the period 1980–202016,17,19,20,21,22,31,32,33,34,35. For the 1960–1980 period, we utilized N data from 9 years within that timeframe88. Additionally, we considered the 1980–2020 relationship between N \(\overline{{R}_{f-a}}\) and P2O5 or K2O \(\overline{{R}_{f-a}}\), and the available N data for estimating the 1960–1980 timeframe regarding the P2O5 or K2O values. We regard this assumption as the only available information for the period spanning 1960–1980 for P2O5 and K2O73 suggests a similar relationship in the application rates for all forages between N and the other nutrients, at least in the reported values since 198016,17,19,20,21,22,31,32,33,34,35.

Finland: Einarsson et al.28 did not consider significant fertilization on permanent grasslands in Finland, as they mainly use arable land for forage production89. However, fodder crops and temporary grasslands are key parts of the agricultural production in the country89, and they are commonly fertilized16,17,19,20,21,22,31,32,33,34,35. Using Eq. (3) for the entire period across the three nutrients resulted in minimal deviations for N and P2O5 (MAE = 1.57 ± 2.99%, 2.10 ± 3.34% respectively), but substantial deviations for K2O (7.51 ± 7.38%). Given the substantial deviation for K2O, and the large bias for Rfa in 197917 for N and P2O5, the first year with available data, we opted to use Eq. (2), and the linear interpolation of the Rfa. However, potential deviations may arise for the 1960s, as fertilizers were predominantly utilized for high-value crops during the early part of the decade90, yet no data are available for that period.

France: Data regarding grasslands and fodder crop fertilization is less limited than in the majority of EU countries, although large differences exist between the available data. Two recent publications estimated the share of N and P2O5 used for permanent grasslands since 196028,91 based on country surveys at the region-level92,93,94,95. However, the results obtained by them differ from the FUBC-FAO and FUBC-FE reports16,19,20,21,22,31,32,33,34,35. For example, for 2006, Le Nöe et al.91 report a share of P2O5 used for permanent grasslands of 27% whereas the FE reports a value for all grasslands of 20%. Considering other years with comparable data, such as 1990 or 2017, Einarsson et al.28 estimate a share of 16% and 7% respectively for N used for permanent grasslands, while FAO only reports 6% for 1990, and the national survey reports 4.7% for 201795. Therefore, as it is difficult to discern the more accurate value between the two estimations, we opted to use the average between the Rfa linear interpolated data from the global datasets16,19,20,21,22,31,32,33,34,35, and from the national surveys92,93,94,95, considering for both as 0 the share in 195591 and the single estimate for the 70 s73.

Germany: The availability of data since the German reunification is substantial in global reports19,20,21,22,31,32,33,34,35. These reports suggest a decline since 1990 in fertilizer use for all forages compared to the rest of croplands, with the drop being particularly notable for N and P2O5. As a result, we decided to use Eq. (2), and interpolate the Rfa values, instead of \(\overline{{R}_{f-a}}\) for the 1990–2020 period. For the 1960–1989 period, data on grassland and fodder fertilization is scarce and primarily pertains to West Germany73. Most of the data available for the period are relative to N, except the 1982 IFDC-FUBC report. For the 1960–1989 period, We decided to use the linear interpolation assuming, similar to the case of France, zero fertilization of grasslands and fodder crops in 1955, as fertilization of these areas in Western Germany, where most of this agricultural land is located, was minimal before 196087, using the only report with available data for the three nutrients17. We extrapolate the data from Western Germany for the entire country due to data availability17,28,73, the prevalence of these agricultural areas in Germany28, and because grassland fertilization in East Germany was similar to that in West Germany, at least in the late 1970 s96. Using these approaches, we deviate by approximately 3.9% from the N estimated data for the year 197473. Additionally, we deviated by about 10% from the N value for permanent grasslands reported by Einarsson et al.28 for 1966 (based on real data)28. This deviation is reasonable, considering that the average difference between Qf/Qa only using information for permanent grasslands or all forages for N is 7.9%19,20,21,22,31,32,33,34,35.

Greece: Fertilization has not been considered for permanent grasslands in either previous research5,27,28 or technical reports16,19,20,21,22,31,32,33,34,35. However, since we are also considering fertilization for fodder crops, the technical reports have allocated fertilization for them, especially for alfalfa and sillage maize16,19,20,21,22,31,32,33,34,35, which constitute the two main actual fodder crops in the country28. Therefore, we used Eq. (2) and the linear interpolation of Rfa because the values of the 1990s are lower than the actual ones, and we have assumed a zero level of fodder fertilization in 1960, as it was only experimental in the country97.

Hungary: Einarsson et al.28 did not consider fertilization for permanent grasslands due to the scarcity of the data and because grassland fertilization is not a common practice nowadays28. Reported values suggest that a significant fraction, approximately 5% of the fertilizer used since 1990 in the country was allocated to grasslands and fodder crops16,20,21,22,31, with an even higher proportion during the 1980s17. Scientific information confirms that the change in the political regime in 1989 was a key driver of fertilization practices in the country, reducing the fertilizer use by five-fold in the country, and limiting fertilization of these areas to managed grasslands98. Furthermore, fertilization in the country commenced in the 1960 s and remained stagnant during the 1980 s86. Therefore, for the period 1960–1989, we applied Eq. (2), and the linear interpolation of Rfa from a 0 value in 1960, to the 1980 reported value17. For the 1990–2020 period, we used Eq. (3), and the average \(\overline{{R}_{f-a}}\), as there is no deviation larger than 10% from the reported values using this method.

Ireland: Ireland is likely one of the countries that use a larger proportion of fertilizers for grasslands and fodder crops5,27,28, and also has more available information. Since 1972, six national surveys have been conducted, providing data for 22 years99,100,101,102,103,104. Moreover, the global datasets also include information from ten different years since 198716,19,20,21,22,31,32,33,34,35. For the 1986–2020 period, we used the average of the linear interpolation of the Rfa values based on national surveys99,100,101,102,103,104, and surfaces data105,106,107,108,109, along with the Rfa values based on the global datasets16,19,20,21,22,31,32,33,34,35 and the Einarsson et al.28 surface compilation28. We excluded Rfa values based on the global datasets16,19,20,21,22,31,32,33,34,35 and the Einarsson et al.28 surface compilation28 for the 2006–2010 period due to a change in the criteria for temporary grassland surface, which resulted in overestimations (Qf/Qa > 1). For the 1960–1985 period, we only considered the linear interpolation of the available data, all from the national surveys Rfa99,100,101, and surfaces105,106. In cases where there was no available surface data105 in the national databases, like 1972, we used the closest year with available data (e.g., 1970). For 2008, which has two available national surveys103,104, we took the average of both. We considered the share of fertilizer used for grasslands and fodder crops as zero in 1955 because almost all fertilizer was used for tillage crops in that year110, with grassland fertilization increasing during the 1960s111.

Italy: Einarsson et al.28 used a constant \(\overline{{R}_{f-a}}\) for permanent grasslands for all years, as similar values are given in various reports and scientific information28. When considering grasslands and fodder crops together, the Rfa were also consistent for each nutrient over all years16,19,20,21,22,32,33,34,35,73, even including the 1974 data73. The MAE using Eq. (3) for the entire period across the three nutrients resulted in minimal deviations comparing with the reported values16,19,20,21,22,32,33,34,35,73 (MAE = 2.24 ± 1.55% for N, 2.00 ± 1.37% for P2O5, and 3.21 ± 1.21% for K2O). Therefore, we used the the \(\overline{{R}_{f-a}}\) for the three nutrients. However, there could be potential overestimations for the 1960s decade because nearby countries like France or Germany did not use fertilizers for these agricultural lands before 195591.

The Netherlands: Information regarding grassland fertilization in the country is abundant28,112. However, before the development of global datasets, information regarding P2O5 and K2O is very limited. For the period 1979–2019, we used Eq. (2) considering the linear interpolation of the eleven Rfa data derived from the global datasets16,17,19,20,21,22,33,34,35 and the agricultural surfaces changes28. We used the global datasets instead of the national data available because they provide information regarding the three nutrients. For the years 1960 to 1979, we used the available compilation of N application rates112, and the total N fertilizer consumption59 to estimate the Qf/Qa values for N. For P2O5 and K2O, we used the ratio between the Qf/Qa used for N and these two nutrients for the most recent year with available data, 197917, to extrapolate the results for the 1960–1979 period.

Poland: The available data in reports from the period 1988-201816,19,20,21,22,31,34,35 did not show a constant Rfa for any nutrient N, P2O5 and K2O. Data on fertilization before 1989, during the communist government, is sparse28,31. However, similar to other Eastern European countries like Hungary, it appears that fertilizer intensification in the country started during the 1960s113, with a significant drop following the regime change59. As a result, we adopted the same criteria used for other Eastern European countries, setting the 1960 value to zero, and applying two distinct linear interpolations of Rfa: one for the 1960–1989 period, and another for the 1990–2020 period. For the 1990–2020 period, there are seven years with available data, whereas for the 1960–1989 only 1989 has data. Despite this limited data for the earlier period, survey estimates113 combined with FAOSTAT totals59 suggest that the combined share of the three nutrients was between 14% and 15% in the late 1960s, which aligns with the individual nutrient shares calculated by the linear interpolation which are between 10% and 13%.

Portugal: Einarsson et al.28 did not consider fertilization of permanent grasslands, citing the relatively low surface area in the country28. However, recent technical reports suggest that Qf/Qa exceeds 20% for the three major nutrients16,19,20,21,22,33,34,35. We chose to apply Eq. (2) and to interpolate the 1977–2020 data16,17,19,20,21,22,31,32,34,35 because using Eq. (2) led to discrepancies greater than 10% in some years. For the years before 1977, we retained the Rfa 1977 values17 (which resulted in Qf/Qa < 2%) as there is no information for the earlier period.

Romania: As with other Eastern European countries, there is no available information regarding grassland and fodder crop fertilization before the political regime change in 1989. However, between 1990 and 2020, data from five years suggest that about 5% of fertilizer is used for grasslands and fodder crops20,21,22,31,33. For Romania, we applied Eq. (3), using the average \(\overline{{R}_{f-a}}\) value and the grassland and cropland surface data28. Potential overestimations occurred during the first decades, although the estimated Qf/Qa are less than 5% for the first decades.

Spain: Previous research has not considered the fertilization of permanent grassland because this practice in the country is very uncommon5,28. However, when considering temporary grasslands and fodder crops, this assumption changes, as forage crops occupy about 8% of the arable land in the country and consume nearly the same percentage of fertilizers114. To estimate the share of fertilizer use in these areas, we created a linear interpolation of the Rfa data from the ten years with available data, ranging from 1979 to 2014, and applied Eq. (2). Using Eq. (3) resulted in estimations that were twice the reported values for the earlier years. Given the fraction used for these areas in 1979 was minimal (Qf/Qa < of 2%), potential overestimations for the first years are also likely minimal.

Sweden: In the country, fertilization of forage production areas is closely linked to the transition from natural permanent grassland to temporary grassland production on arable land that occurred during the first part of the 20th century, especially during the 1940s and 1950s115. Moreover, based on the available data, fertilizer intensification of these areas compared to other croplands Rfa was lower during the 1970s than at the end of the century34,35,73. Therefore, we applied Eq. (2) and performed the linear interpolation of the Rfa of each nutrient of the 11 years with available data since 197416,18,19,20,21,22,31,32,34,35,73. A slight overestimation might occur for the earlier years, as the intensification of these areas was increasing before the first year with available data115, but no data for the period was found.

United Kingdom and Northern Ireland (UK): The UK has the world’s longest and most complete dataset on the fertilization of grasslands and croplands50. Annual time series data on fertilizer use for permanent and temporary grasslands are available for England and Wales since 1969 and for Great Britain since 198250. Northern Ireland is not included in these surveys. Additionally, there are surveys for the years 1957, 1962, and 1966 for England and Wales116. Two problems arise for the estimation of Qf/Qa from this data. The first one is that the surveys only include fertilization on permanent and temporary grassland, excluding rough grazing. The second challenge is that there is no information for Northern Ireland - which accounts for about 6% of the country’s fertilizer consumption50-, and from 1960 to 1982, there is also no data for Scotland, who are responsible for about 14% of the country’s fertilizer consumption50. For the period 1982–2019, we used the annual fertilizer application rates for Great Britain’s tillage crops50 and the corresponding cropland surface area117 (excluding temporary grasslands) to estimate the total fertilizer use for croplands. We considered grassland fertilization to be the complement of the value obtained, assuming the same application rates for Northern Ireland. To include these estimations in the fraction used for fodder crops, we add the average share used for them, which is less than the 3% for all nutrients16,19,20,21,22,31,33. For the period 1960–1981, we applied the same methodology but using the application rates50,116 and surfaces118 from England and Wales, adjusted by -2.5% for N, +2.8% for P2O5, and +0.9% for K2O. These adjustments are based on the observed differences between the application rates in Great Britain and those in England and Wales during the 1980s decade. Moreover, for the 1960s decade for which there are no data available for all years, we applied the linear interpolation of the years with data. We used the national databases instead of the global datasets because they provide annual information covering almost the entire period for the three nutrients, and the values between them were quite similar.

Iceland: Iceland’s agriculture sector is primarily focused on livestock production, with about 90% of its agricultural land being permanent grasslands119. Additionally, most of the arable land is used for forage crops119. While grassland fertilization is a common practice in Iceland120, there is limited information on application rates for different types of agricultural land, and no specific estimates on the proportion of fertilizer used for forage crops in the country. When we applied Eq. (3) using the average Rfa from other Nordic countries—Denmark, Sweden, and Finland, it resulted in a Qf/Qa ratio greater than 100%. To address this, we allocated a mid-value between 100% and the proportion of agricultural land occupied by grasslands and fodder crops, ensuring it does not exceed 100%.

Switzerland: Data on fodder crop and grassland fertilization in the country from the period 1979–1999 suggest that between 30 and 50% of the fertilizer used in the country is applied to these lands17,31,32,33,34,35. However, whereas the data of the first two years indicate that almost 50% of N is used for grasslands and fodder crops17,31, only about 30% was used in 199935. Since 2000, the areas of artificial grasslands and silage maize (the two main forages that receive fertilizers31) have remained almost constant121. As there is no information available regarding grassland fertilization before 1979 or after 2000, we used the 1979 data for the period 1960–1979 and the 2000 data for the period 2000–2020. For the period from 1979 to 2000, we applied linear interpolation to the six years with available data17,31,32,33,34,35.

Norway: Fodder crops and grasslands (both temporary and permanent) play a key role in the agricultural sector of the country122,123. Technical reports and scientific studies data indicate a nearly constant share of Qf/Qa for N, P2O5, and K2O16,19,20,21,22,32,33,34,35,73. Therefore, we used the average of all the available Qf/Qa data16,19,20,21,22,32,33,34,35,73, covering the period 1974–2018 for N, and from the period 1990–2018 for P2O5 and K2O. The resulting values, with a share of 64.02% ± 1.76% for N, 50.02% ± 2.25% for P2O5, and 65.59% ± 6.07% for K2O, were comparable to those estimated for other Scandinavian countries.

Yugoslav Socialist Federal Republic (Yugoslav SFR), and actual former countries: Fodder crops and grasslands played a significant role in the agricultural production of the Yugoslav SFR124. Pastures and meadows occupied 33% of the country’s land, while fodder crops took up 20% of the arable land124. However, to the best of our knowledge, no information is available regarding fertilization for different agricultural lands before the dissolution of the country. After the dissolution, information became available in global reports for Croatia and Slovenia, but not in the other countries16,19,20,21,22,32,35. To estimate the Qf/Qa values for Yugoslav SFR during the period 1961–1991, we used the weighted average by agricultural land surface28 of the earliest Rfa values from Croatia and Slovenia19,28,32, given that their Rfa values have changed significantly in recent years16,19,20,21,22,32,35. We also considered the cropland, grasslands, and fodder crop surfaces of Yugoslavia SFR from the 1990 s124 to estimate the Qf/Qa used for the 1961–1991 period. For the period 1990–2019, for actual EU former countries, we performed the linear interpolation of the Rfa values16,19,20,21,22,32,35 to estimate Qf/Qa considering the annual surfaces values28. In Serbia, the largest country, forage production is a crucial component of its agricultural sector, with about two-fifths of the agricultural land dedicated to this purpose125. However, as no specific information on fertilization rates has been found. We considered the average weighted Rfa ratio of Croatia and Slovenia along with the 2004–2008 surfaces of agricultural lands, grasslands, and fodder crops125. For smaller countries like Montenegro of North Macedonia, we assumed the average annual Qf/Qa values of Serbia and Croatia.

Union of Soviet Socialist Republics (USSR) and Former USSR Countries: Quantitative and qualitative information about fertilization of grassland and fodder crops before the collapse of the USSR is quite scarce31,126,127. Some publications suggest that the use of fertilizers in these areas was minimal before 1975126,127. However, data from 1990–1991, just before the collapse, from certain republics (Russia, Latvia, Estonia, or Belarus) indicate that a significant share of fertilizers was used for fodder crops and grasslands31 (e.g., 40% for N in the Russian Federation31). For the period 1960–1991, we estimated the Rfa for the entire USSR in 1990, weighing the value of each republic Rfa31,128 in 1990-1991 by the total fertilizer use of each republic128. The four republics with available data for this year (Russian Federation, Belarus, Latvia, and Estonia) account for 40% of the agricultural land of the country and 62% of its fertilizer consumption128. After estimating Rfa for each nutrient in 1990, we used linear interpolation to estimate the annual Rfa values, considering the value in 1975 as zero126,127. Finally, similar to the EU countries, we considered the annual cropland, grassland, and fodder crop surfaces128, along with the calculated Rfa, to estimate the annual Qf/Qa. For the period from 1992 to 2020, we considered individual country information where some data was available. However, for the following actual countries, there is no information in the global reports16,19,20,21,22,31,32,33,34,35,36: Armenia, Georgia, Kazakhstan, Kyrgyzstan, Tajikistan, and Turkmenistan. For all these countries, we considered a constant Qf/Qa ratio during the 1992–2020 period due to the limited information. For Armenia and Georgia, we assumed the Qf/Qa value in 1998 for Azerbaijan, the other Caucasian country31. For the Central Asian countries, we used the ratio for grasslands derived from Uzbekistan’s 2014 data37, which is significantly lower than the USSR’s share in 1990. This reduction seems reasonable given the significant decrease in fertilizer use, temporary grasslands, and fodder crop surfaces in the region since the USSR collapse129.

Estonia, Latvia, Lithuania: The Baltic countries are the three former USSR countries with the most available data in global datasets16,19,20,21,22,35. Fertilizer intensification in these areas has changed significantly over the last three decades due to the abandonment of intensively managed areas28. This trend is reflected in the changing Rfa values. Therefore, we used Eq. (2) and the linear interpolation with the six years with available data Rfa from the 1991–2018 period16,19,20,21,22,35 to estimate the Qf/Qa values since the collapse of the USSR.

Belarus, Moldova, and Ukraine: For these three countries, limited data is available regarding fodder crops and grasslands, but some information can be found in global reports16,33,37. Thus, for each country, we used the average of the Qf/Qa values from the 1992–2020 period. In the case of Belarus, where two sets of data were available for grasslands and one for fodder crops16,37, we took the average for grasslands from both reports and the ratio that accounts for the share of grasslands and the share including fodder crops.

Russian Federation: There are three years with available data between 1992 and 202032,33,37. In the first two years, the data showed that an average of approximately 25% of the country’s fertilizer was used on grasslands and fodder crops32,33. However, in the latest report from 2014, only about 4% was attributed to these areas (excluding fodder crops not used for hay or silage)37. Therefore, we decide to use the linear interpolation of the Qf/Qa values for the years with available data. For the late years, we likely underestimated the value because some fertilizer is used for fodder crops, like fodder beet, that are not intended for silage or hay. However, these fodder crops only accounted for about the 8% of the total fertilizer used for fodder crops and grasslands in 199031.

China: Fertilization of China grasslands remains low at present37. Among the compiled reports, only the latest one considers a proportion of the total fertilizer application rate in China, allocating 2% for N, 4% for P2O5, and 3% for K2O. Other information on grassland fertilization in China is scarce, with the few authors that provided some information describing it as sparse130. FAO5, considers this proportion as 0% for all three nutrients throughout the entire period, which differs from Lassaleta et al.27, who, based on regional averages, estimated a percentage ranging between 0 and 4.7% from 1960 to 2014. However, any global report or national more detailed information considers any fertilization. We have decided to adopt the same criteria as FAO5, albeit potentially underestimating values for the last decades.

Iran: Fertilization of Iran’s grasslands and fodder crops appears to be minimal, with few reports providing data, and only since 1990, indicating values between 2% and 6% for all three nutrients31,32,37. Other information is scarce and focused on experimental trials rather than broader country-wide applications. Considering that the first fertilization trials were developed during the 70s, and the first report with data is for 199031, which reported 2% of N and 6% for P2O5, we considered as 0% the share for the period 1960–1990, and the average of the reports for the period 1990–2020.

Japan: Since the first report with data, in 1979, almost all reports have underscored the importance of grassland and fodder fertilization in Japan. FAO attributed a constant share of 20% for N, 0% for P2O5, and 10% for K2O for the 1960–2020 period5. Conversely, Lassaleta et al.27 suggested a growing percentage of 20% for N, starting from 0% in 1960, and increasing to 20% in 2009. Although data before 1979 is unavailable, the reported data for N use in 1979 was 15.7%, higher than the 5.2% estimated by Lassaleta et al.27. Additionally, due to the lack of data, it is challenging to determine the inception of grassland fertilization in Japan, though it appears to coincide with the transition from semi-natural grasslands to more intensive pasture during the 60s131. Therefore, we opted to adhere to FAO’s criteria, maintaining the same percentage throughout the period, despite the potential overestimated values for the initial years. We considered the average of all available reports with data16,17,31,32,33,34,35,37, because FAO criteria appears to underestimate the P2O5, and K2O used for grasslands, resulting in percentages of 17.3% for N, 16.9% for P2O5, 15.6% for K2O.

Korea Republic: Grassland fertilization appears to be a common practice in the country nowadays132. However, there is no available data on the fertilization of these areas in global reports17,31,33,35, nor scientific publications. We used the same assumption as Lassaleta et al.27, which is to consider the same proportion as in Japan, the geographically and socioeconomically closest country27. This assumption also aligns with the observation that the sum of this percentage, and the fertilizer used for the main crops17,31,33,35 is less than the total for the country59.

Turkey: Information about fertilization of grasslands and fodder crops in Turkey is scarce, suggesting that it is not a common practice. Lassaleta et al.27 considered percentages as high as 4.8% for N in 2009, whereas FAO considered 0% for all nutrients. All the available data since 1990 except for 2014 considered some amount of fertilizer used for grasslands, and forages16,17,31,32,33,34,35,37. Therefore, we used the average percentage of all reports for the period 1990–202016,17,31,32,33,34,35,37.

Other Asian Countries: Cambodia, Indonesia, Malaysia, The Philippines, Thailand, Vietnam, India, and Pakistan: In Asian Southeast countries, only Lassaleta et al.27 considered that some fertilizer is used on grasslands, based on regional averages used for grasslands and other crops (including fruits, tea, vegetables, and forage and grasslands)27. However, no global report16,17,18,31,32,33,34,35,37 or country-level sources133 mentioned fertilizer application to grasslands as significant in these countries. Therefore, we have chosen to align with FAO’s criteria, which assumes no fertilizer application rate for grasslands in this region5. We applied the same criteria for India and Pakistan, despite previous research considering a certain percentage used for grasslands6,27. The data reports16,17,18,31,32,33,34,35,37, the scientific literature134,135, and FAO5 support the idea of non-fertilization of grassland in these two countries.

Egypt: Data regarding grassland and fodder crop fertilization in Egypt are scarce18,34. As is common for many African countries, there is no fertilization of grasslands136. However, the few available data about the fertilization of Egyptian clover18,34, the main fodder crop in the country76, suggests that a significant portion of N and P2O5 is utilized for fodder production, aligning with country recommendations137. Previous research, focused solely on grasslands, has either considered 0% allocation for the three nutrients5 or a range between 0% and 4% for N27. Here, we opted to consider the average of the two reports (1986, 1997) with data18,34 for the entire period as Egyptian clover production has been significant since the beginning of the period138, and the available data is not sufficient to discern any trend.

Morocco: Previous research has indicated various fractions of N fertilizer used for grasslands in the country, ranging from 0% to 11%5,27,136. With the available information, it is impossible to discern if any application for permanent grasslands occurred in the country, but not for forages such as alfalfa, Egyptian clover, or vetch139,140. Additionally, due to the scarce available data in the reports, discerning any trend is challenging31,34,140, although the presence of improved pastures, usually linked to fertilizer application rate, doubled during the 80s decade139. Here, we have opted to use the same percentage, the average of all reports, to estimate the percentage of N, P2O5, and K2O, despite the potential overestimations in the first decades.

South Africa: Fertilization of grasslands and fodder crops such as alfalfa appeared to be significant throughout the study period in South Africa. Both previous scientific research5,27 and various technical reports16,31,32,33,34,35,37 indicated percentages ranging 0% and 22.3% for N. For all three nutrients, the share used for grasslands and fodder crops during the 90s was higher than in the last decades16,31,32,33,34,35,37. This percentage appears to be higher due to larger fertilizer application rates to croplands compared to grasslands and fodder crops16,31, and not due to the relationship between cropland and grassland surface141. While information regarding grassland fertilization prior to 1990 is limited, several factors support the hypothesis of early fertilizer application rate for grassland and fodder production. These include the fraction used for grasslands and fodder in 199031, substantial research conducted on improved grasslands since 1920s142, and the early introduction of alfalfa in 185876 which is a significant consumer of P2O5 and K2O in the country. Given the challenge of identifying any discernible trend and the likelihood of significant consumption at the beginning of the period, we have chosen to adopt the same percentage for the entire duration, aligning with FAO assumptions5, despite potential slight over- and underestimations throughout the period. The average of all reports16,31,32,33,34,35,37, resulted in percentages of 12.4% for N, 13.3% for P2O5, and 9.2% for K2O.

Potential drivers. To develop our ML models, we compiled a series of datasets that contain information on features that were identified in previous research as drivers or correlates of cropland fertilization. In this section and the next two, we clarify our rationale for the variable selection, the data sources and the methods used for estimating some of these variables. The list of all considered features can be found in Table 3 and further details about their estimation are provided below.

Table 3 Environmental, agrological and socioeconomic features used in the prediction of the fertilizer application rates, accompanied by their description, unit and data source.

Environmental data. Environmental variables related to climate and soil characteristics have been identified as factors that influence fertilization management in farm-level studies143 and regional panel data144,145. Therefore, we selected several potential factors, some of which have previously been shown to correlate with fertilization, such as MAP144, or SOC145, as well as newer potential factors such as the aridity index. Data for these factors were sourced from two main databases: the CRU v.4. databases146, for climatic factors, and the SoilGrids v.2. database147, for soil characteristics. Obtaining values at the country-level while considering variations in climatic and soil conditions within a country can be imprecise. However, our fundamental unit of analysis is the country-level, as the FUBC values are measured on this scale. To mitigate this limitation, we used spatial information for climatic and soil characteristics along with information about the ___location of crops29. All environmental variables were estimated using Eq. (4), but preprocessing differed across variables.

$$En{v}_{jic}=\frac{\sum _{g\in G}(En{v}_{ig}\times HArea\_M{2000}_{gcj})}{HArea\_M{2000}_{cj}}$$
(4)

Here, Envjic represents the mean value of the environmental variable for country j, in year i, where crop c is located in the country; Envig is the value of the environmental variable in year i, for grid cell g; \(HArea\_M2000gc\) denotes the area of grid cell g for crop c in country j; \(HArea\_M{2000}_{cj}\) is the total surface of crop c in country j based on Monfreda et al.29 crop maps; and G denotes the set of cells where the crop is located based on Monfreda et al.29 crop maps. For the MAP, the Envig values of Eq. (4) are calculated by summing the precipitation from all months in the CRU v.4. dataset146 for each grid cell g, and year i. For the MAT, the Envig values are calculated as the average of the monthly temperatures from the CRU v.4. dataset146, weighed by the number of days of each month. The PET values are derived by multiplying the daily month average from CRU v.4146. by the number of days in each month and summing the results. For the aridity index, we used the United Nations (UN) definition148 of the ratio between MAP and total PET for each grid cell. As soil variables do not have temporal resolution, we simplified Eq. (4) by removing the temporal factor. Additionally, for some soil variables like the soil CEC, we aggregated the information by calculating the average for the first three depth layers from SoilGrid v.2. (0–5, 1–15 and 15–30 cm)147.

Agrological data. We selected agrological features that were previously identified as factors potentially related to or driving fertilizer intensification, such as holding size149, crop area145, or irrigation implementation144, as well as features that should be connected to crop fertilization at the country-level, such as country fertilizer use per cropland area59. Most of the agrological variables used are taken directly from the sources indicated in Table 3. However, some required preprocessing. For holding size, we applied the methodology used by Zou et al.26, which involves standardizing the information based on the average holding size according to the total agricultural area. We used holding size data from the FAOSTAT agricultural censuses150 and previous research151. To estimate the annual nutrient removal for each crop class based on annual production, we used the recent compilation by FAO5 on nutrient removal in kilograms per tonne of crop produced, along with the annual country production data from FAOSTAT47. Additionally, we used this compilation alone as a proxy for fertilizer recommendations, since these recommendations are generally based on the nutrient requirements of each crop152.

Socioeconomic data. Economic factors, particularly those related to the profitability of fertilizer use, have been widely studied to understand fertilizer adoption at the farming-level153,154. Both input prices (fertilizers) and output prices (crops) determine profitability and can be key factors influencing fertilization decisions. However, assessing inputs at the country-level is challenging, primarily due to a lack of standardized data. The only available dataset, FAOSTAT155, does not cover all periods and lacks standardization. To address this, we used two variables as proxies of fertilizer prices: a) global real prices for Urea, phosphate rock, and muriate of potash, as compiled by the World Bank156; and b) the distance from the production sites or mines, following the methodology proposed by McArthur et al.157. This methodology uses gravity models of trade, based on the premise that fertilizers are produced in a few specific countries157. The underlying hypothesis is that countries closer to fertilizer plants or mines are more sensitive to price variations because transport costs are a significant factor for farmers157. We applied a similar approach, estimating the minimum cost-adjusted distance by using the costDist function from terra package158, global friction maps159, the locations and operational years of potash160 and phosphate mines, the locations of ammonia plants157,161 and the centroid of the cropland area on the country based on the Monfreda et al.29 crop maps29. Assessing the output prices for crops faces a similar problem: there is no standardized dataset with national-level data for the entire period. To resolve this, we used two proxies for crop prices: a) global real prices for specific commodities like wheat, maize, rice, palm oil, soybeans, sugar, and cotton, compiled by the World Bank156, and b) standardized data from two FAOSTAT datasets162,163 that provide prices paid to producers at the country-level. The first dataset162 contains information from 1990 onwards in USD, and LCU, while the second dataset163 covers 1966 to 1990, only in LCU. To standardize both datasets, we converted the older dataset into USD using annual currency exchange rates164. We then removed outliers independently for each crop by considering only values within 1.5 times the interquartile range. Before applying this method to the 1966–1990 dataset, we tested it on the LCU data for maize, wheat, and rice from the 1990 onwards dataset. We compared the original USD values with those obtained after converting the LCU data using exchange rates. The outlier detection method retained more than 99% of equivalent values (defined by a ratio between the original and calculated USD values of 0.99 to 1.01), while removing over 90% of non-equivalent values. Finally, the data was converted to real prices by applying the Consumer Price Index165.

Other socioeconomic factors, that are not directly related to the profitability of fertilizer use, have also been linked to country-level fertilizer use. These factors include the income level, reflected in the GDP per capita166; the population pressure, defined as the country’s population divided by its agricultural land area167; and the farmers’ knowledge about fertilizer use, as well as general education levels153, which we measured by the percentage of total GDP spent on education. We used the sources listed in Table 3 to obtain data for these variables.

Data preprocessing

Several preprocessing steps were performed to prepare the raw data for the ML models. First, drawing from both expert ___domain knowledge and exploratory data analysis, the features relevant to N, P2O5 and K2O fertilizer application rate were selected (Table 3). Since not every feature was relevant for each of the three targets, we narrowed down the dataset to data points where the average fertilizer application rate is known for all three fertilizers. This restriction ensured that the dataset comprised only labeled data points, which is crucial for supervised ML techniques. Subsequently, anomalies in the data where the fertilizer application rate was unrealistically large, i.e., greater than 5000 kg ha−1, were removed. Finally, categorical features were OHE.

Machine learning

Previous studies within this ___domain typically propose linear equations to estimate the fertilizer application rate, and only consider a limited set of agricultural factors9,10. However, it is well-established that natural phenomena frequently exhibit nonlinear relationships168, rendering them unsuitable for modeling with linear methodologies. Similar studies have also employed Bayesian methods15, with certain modeling assumptions that are not present in our study. ML has the potential to overcome these limitations. The field of ML has seen major increases in research169 and industry170, and, more specifically, ML has shown promising results in the field of ecology171,172, including agricultural research173,174, fertilizer consumption175,176 and fertilizer management177. For this reason, ML was used in this study to estimate the annual fertilizer application rate at the crop- and country-level. The benefit of using ML is threefold. First, ML allows us to include a larger range of variables, for example also including socioeconomic factors. Second, nonlinear ML techniques enable us to model nonlinear relationships between the variables in our dataset. Lastly, the model output can provide insights into the drivers associated with crop fertilization on a global scale, through the use of SHAP values25 outlining the feature importance. The employed ML methods to estimate fertilizer application rate for crops differ from previous research, which typically relied solely on changes in crop area, overall fertilizer consumption, and limited data regarding fertilizer application rate at the individual crop-level9,10. An advantage of our method is that it enables us to estimate values for countries where specific data is lacking by relying on other related variables. For example, the projected data for the USSR aligns closely with national totals, even in the absence of country-specific data and without adjustments based on total consumption, as conventionally done9,10.

Models

In this study, two ML models based on gradient boosted regression trees were selected to predict the average annual fertilizer application at the crop- and country-level. In gradient boosting178, an ensemble of weak learners (in our case regression trees) is trained sequentially. First, a weak learner is fitted to the original data. In the next iteration, another weak learner is fitted to the residuals, i.e., the differences between the ground truth target values and the current predictions made by the ensemble. When fitting a new weak learner to the residuals, gradient boosting adjusts its parameters in the negative gradient direction, aiming to reduce the residual error of the ensemble. This sequential learning process enables gradient boosting models to create a strong learner by combining multiple weak learners. The specific gradient boosting models employed in this study are XGB23 and HGB24,179. XGB has been shown to be a powerful tool for predictive modeling in a wide range of applications in both industry and research, including agricultural research174 and fertilizer research175. It offers an optimized and scalable implementation of gradient boosting, and includes regularization techniques to prevent overfitting23. The HGB model is primarily based on LightGBM179, which addresses one of the major bottlenecks in gradient boosting model training, namely the requirement to sort all samples at each node24. Indeed, in a traditional gradient boosting model, samples must be sorted at each node to determine the best split. This sorting process can become computationally expensive, especially when dealing with large datasets or deep trees. In HGB, the samples are first collected into a histogram, which removes the need for sorting as samples in a histogram are implicitly ordered. This optimization results in a model that is much faster to train than traditional gradient boosting models, while still achieving similar or better performance24. The choice for these two methods over other conventional ML approaches, such as neural networks, was primarily driven by the fact that both methods natively handle missing values. This constitutes a significant advantage, given that global fertilizer application rate data, along with the socioeconomic and agricultural variables used to predict the annual fertilizer application, are often incomplete. This also demonstrates another advantage of applying ML to this problem over the conventional approach using linear equations. Indeed, the absence of just one variable in the equation renders it impossible to compute.

Model training and evaluation

The selection of the optimal set of model hyperparameters is usually done using CV, after which the CV error is reported as the performance of a model180. However, based on Stone (1974)181, model assessment and model performance require different CV approaches. For this reason, we used nested CV, as it allowed us to find the optimal set of hyperparameters for a model and provide an unbiased estimate of the model’s performance182. In nested CV, two levels of CV loops are used: an outer loop and an inner loop. In the outer loop, the dataset is split into training and testing sets, typically using k-fold CV. Each fold of the outer loop trains the model on the training set and evaluates the model on the testing set. Within each fold of the outer loop, the training data is provided to an inner CV loop, in which the training data is further split into training and validation sets, also typically using k-fold CV. The inner loop is responsible for selecting the set of hyperparameters that performs best on the validation set. Finally, the performance of the selected set of hyperparameters is evaluated on the corresponding test set in the outer loop. In our study, we used a 2 × 5 nested CV, i.e., we had two outer loops and five inner loops. We employed a grid search that iteratively went over all possible combinations of hyperparameters, based on the explored hyperparameters as shown in Table 4 for both the HGB and XGB models. The performance of the models was evaluated by averaging the performances of the two models in the outer CV loop. The considered performance metrics included the determination coefficient (R2), MAE, MSE, and RMSE, all computed between the predicted and reported data points.

Table 4 Overview of the explored hyperparameters for the Histogram-based Gradient Boosting (HGB) and eXtreme Gradient Boosting (XGB) regression models.

Model interpretability through SHAP value analysis

Unfortunately, gradient boosting methods are so-called black-box models, i.e., it is not immediately clear how certain predictions are made. However, assessing the impact of the features on the predicted fertilizer application rate in the learned models could provide us with valuable insights into the drivers of fertilizer application rate. Therefore, we resorted to xAI methods to understand the predictions made by our models. More specifically, we used SHAP values25 as they are model-agnostic, can account for interactions between features and have an intuitive interpretation. Indeed, summing the SHAP values for all features in one sample results in the prediction of the model. Additionally, like XGB and HGB, SHAP values are robust with respect to missing data by design25. Special attention was given to categorical values, as retrieving one SHAP value for a categorical feature that is divided into OHE features is non-trivial. However, as the SHAP values are calculated using the preprocessed input data (i.e., containing the OHE categorical features), the SHAP values for one categorical variable were obtained by adding together all SHAP values for its respective OHE features.

Adjustment to country totals

Previous research has always started with the same premise of allocating total fertilizer consumption at the country-level for estimating crop-level use9,10. However, here we adopt a different strategy, initiating the estimation of the fertilizer consumption at the crop-level directly. Despite this shift in strategy, we still consider country-level data to be more reliable than datasets compiled from various FUBC sources. To reconcile our approach with the more dependable country-level data, we adjusted the ML predictions to align with FAOSTAT’s total fertilizer consumption at the country-level59. As shown in Eq. (5), we distributed the differences between the predicted total fertilizer consumption and the FAOSTAT totals equally among crops, after excluding the fraction used for grasslands and fodder crops from FAOSTAT totals.

$$Fert{\rm{\_}}Pre{d}_{icj}=FertML{\rm{\_}}Pre{d}_{icj}\times \frac{FAOSTAT{\rm{\_}}FERTn{g}_{ij}}{\sum _{d\in C}(FertML{\rm{\_}}Pre{d}_{idj}\times HArea{\rm{\_}}FA{O}_{idj})}$$
(5)

Where, \(Fert\_Pre{d}_{icj}\) represents the fertilizer application rate predictions after the adjustment for year i, crop c, and country j. \(FertML\_Pre{d}_{icj}\) denotes the ML model predictions, C is the set of all crops classes included in the models, \(HArea\_FA{O}_{idj}\) the FAOSTAT harvested area47 of each crop class d, and \(FAOSTAT\_FERTn{g}_{ij}\) is the total FAOSTAT fertilizer consumption for the country, after removing the fraction used for grasslands and fodder crops.

Validation

To validate the model predictions, containing information about the average use per hectare for different fertilizers and crops. This validation is quantified using the MAE and MAPE as well as with comparative plots if enough data was obtainable from the various national databases. The MAE gives an idea about the actual deviation, whilst the MAPE makes the comparison between prediction errors easier. The evaluated national databases include data obtained from for the USA38, UK50, India39,40,41,42,43,44,45, Sweden54,55,56,57, Philippines52, and New Zealand58. For Pakistan51, only data for the sum of fertilizer application rate is available, hence the sum of N, P2O5, and K2O was used, expressed as NPK. This approach is restricted by available data in national databases for average fertilizer application rate across various crops and nutrients.

Gridded crop-specific application rate per fertilizer

Following the generated comprehensive dataset of global fertilizer application rate, we constructed detailed 5-arcmin resolution gridded maps for each fertilizer (N, P2O5, and K2O), crop class and year from 1961 to 2019. The final gridded map dataset was compiled in a three-step process, as highlighted in Fig. 1. First, data of the gridded harvested area spanning from 1961 to 2019 for the 13 distinct crop classes (see Table 1) were acquired by combining data from the EARTHSTAT project of the year 2000 (\(HArea\_M2000\))29, supplemented with historical arable land and permanent crop areas per year (\(CArea\_Hyde\)) from the History Database of the Global Environment (HYDE version 3.3)30. The EARTHSTAT maps were created by combining national-, state-, and country-level census statistics with an up-to-date global dataset of croplands, organized on a 5-arcminute by 5-arcminute latitude-longitude grid. These datasets, reflecting land use around the year 2000, detail both the area harvested and the yield of 175 diverse crops worldwide29. Innovative maps outlining major crop groups were generated by consolidating these individual crop maps. The HYDE 3.3 project provides long time series estimates and maps for land use, including the cropland areas, based on an allocation algorithm with time-dependent weighting30. The elaborate information from the crop specific EARTHSTAT maps for the year 2000, in combination with the yearly changes in gridded cropland from HYDE 3.3, allowed us to make detailed gridded 5-arcmin resolution crop specific harvested areas for each of the evaluated years and crops using Eqs. (6) to (9):

Fig. 1
figure 1

Outline of the process for generating the gridded crop-specific fertilizer dataset.

For \(CArea\_Hyd{e}_{gi}\, > \,0\) and crop is rice:

$$HArea\_{M}_{gic}=CArea\_Hyde\_{R}_{gi}$$
(6)

For \(CArea\_Hyd{e}_{gi}\, > \,0\) and crop is not rice:

$$HArea\_{M}_{gic}=CArea\_Hyde\_N{R}_{gi}\times \frac{HArea\_M{2000}_{gc}}{CArea\_Hyde\_N{R}_{g2000}}$$
(7)

For \(CArea\_Hyd{e}_{gi}\,\mathrm{ > }\,0\,\cup \,{\sum }_{c\in C}HArea\_M{2000}_{c}\,=\,0\) and crop is rice:

$$HArea\_{M}_{gic}=CArea\_Hyde\_{R}_{gi}$$
(8)

For \(CArea\_Hyd{e}_{gi}\,\mathrm{ > }\,0\,\cup \,{\sum }_{c\in C}HArea\_M{2000}_{c}\,=\,0\) and crop is not rice:

$$HArea\_{M}_{gic}=CArea\_Hyde\_N{R}_{gi}\times \frac{\sum _{k\in K}HArea\_M{2000}_{gc}/K}{CArea\_Hyde\_N{R}_{g2000}}$$
(9)

Here, the indices denote the grid cell (g), the year (i), the crop (c). The harvested area (\(HArea\_{M}_{gic}\)) was generated through a series of conditional operations. These conditions stipulate that if the value of the HYDE3.3 cropland area map (\(CArea\_Hyd{e}_{gi}\)) for that year i and grid cell g is larger than 0, and the crop is not rice, then the value of that grid cell for that specific crop and year is given by the HYDE3.3 cropland area (\(CArea\_Hyde\_N{R}_{gi}\)) for that grid cell/year combination. The value of the grid cell is then further adjusted by the ratio of the HYDE3.3 map of the year 2000 to the EARTHSTAT map of the year 2000 for the corresponding grid cell and crop (\(\frac{HArea\_M{2000}_{gc}}{CArea\_Hyde\_N{R}_{g2000}}\)). In the case of rice, the specific HYDE3.3 map for cropland area of rice was selected and not altered as this is readily available. Additionally, in instances where \(CArea\_Hyd{e}_{gi}\) was larger than 0 and the sum of all crops across the EARTHSTAT maps of the year 2000 is equal to 0 (e.g., when new lands are cultivated), a progressively expanding area K was evaluated to find an appropriate ratio based on the average of the k neighboring cells. The evaluated values for k were 5, 10, 25, 50, 100, 150, 200 and 250, up until a value different from zero for the ratio is found. If no value different from zero was found, the ratio value was set equal to 1. This last step made the assumption that the crop distribution in neighboring cells adequately represents the distribution in the newly cultivated area, allowing for the calculation of adjusted harvested areas. Furthermore, as the \(HArea\_M{2000}_{gc}\) is consistently used, we assumed that the changes in crop distribution over time remain constant. To ensure the accuracy of the generated maps, we capped the harvested area at the maximum feasible value in each cell.

To ensure consistency with FAOSTAT data used in the model predictions, the gridded harvested area (\(HArea\_M\mathrm{1961\_2019}\)) was aligned with the country-specific harvested area reported by FAOSTAT (\(HArea\_FAO2000\)). Additionally, due to this alignment, some cells may initially have harvested area values that exceed the maximum possible for that cell. To correct this, we cap the harvested area at the maximum feasible value per cell and then redistribute any excess proportionally across other cells with harvested area values, ensuring overall consistency with FAOSTAT data. These adjustments, applied through Eq. (10), provided a corrected gridded harvested area for the 13 crop classes over the 60-year period (\(HArea\mathrm{\_1961\_2019}\)):

$$HAre{a}_{gic}=HArea\_{M}_{gic}\times \frac{\sum _{j\in J}HArea\_FA{O}_{icj}}{\sum _{j\in J}HArea\_{M}_{icj}}$$
(10)

In this equation, \(HArea\_FA{O}_{icj}\) represents the harvested area for year i, crop c, and country j as reported by FAOSTAT47,summed over all countries (J) in grid cell g (to accommodate grid cells with multiple countries). Similarly, \(HArea\_{M}_{icj}\) represents the estimated harvested area for the same combinations, also summed over all countries in grid cell g. The ratio of these sums adjusts the model gridded harvested area (\(HArea\_{M}_{gic}\)) to match FAOSTAT data, ensuring the resulting gridded harvested area on a country level is consistent with official statistics across the 60-year period.

Finally, the gridded harvested area (\(HAre{a}_{\mathrm{1961\_2019}}\)) was augmented with the average application rate of each predicted fertilizer (N, P2O5, K2O) as per Eq. (11):

$$FertCro{p}_{gic}=HAre{a}_{gic}\times \sum _{j\in J}(Fert\_Pre{d}_{icj}\times PercCountr{y}_{g})$$
(11)

where \(Fert\_Pre{d}_{icj}\) is the country-level prediction resulting from the HGB model after applying the adjustment, and \(PercCountr{y}_{g}\) refers to the percentage of grid cell g that is occupied by the country j. This process was then applied to each fertilizer separately to obtain gridded maps for each fertilizer, year, and crop combination.

Data Records

The gridded fertilizer application data for N, P2O5, and K2O by crops from 1961 to 2019 are available in a Figshare repository183. The dataset spans from 180°E to 180°W and 90°S to 90°N at a resolution of 5 arc-min in WGS84 (EPSG: 4326). It is provided in .tiff format, which can be read by many tools, such as R and Python. The gridded application data by crops and fertilizers are stored in several files named “Crop_NameFertilizerYear.tiff”. Here, “Crop_Name” represents each crop class listed in Table 1, “Fertilizer” refers to N, P2O5, or K2O, and “Year” indicates any year from 1961 to 2019.

Crop-specific N application

On a global scale, the N application has grown for all crops (Fig. 2). For example, the average N use of the three main cereals has risen from 17.1 ± 6.1 kg ha−1, 26.6 ± 7.2 kg ha−1, 12.1 ± 3.9 kg ha−1 for wheat, maize and rice, respectively, in the 1960s, to 97.8 ± 4.2 kg ha−1, 118.8 ± 4.2 kg ha−1, 113.8 ± 1.9 kg ha−1 in the 2010s decade. Moreover, the largest increases in N application occurred in vegetable crops, with a global growth of more than 120 kg ha−1 between these two decades (Fig. 2). Conversely, the lowest increases occurred in soybean, where N application rates grew by less than 20 kg ha−1. At the regional scale, the intensification of N fertilizer use has shifted from higher use at the beginning of the period in the USA and Europe to being currently dominated by Asian countries such as China and India (Fig. 2). This trend is particularly evident for some crops like vegetables and fruits, where China now has the areas with the highest N use worldwide, whereas in the 1960s, these areas were primarily in Southern Europe and California.

Fig. 2
figure 2

Spatial pattern of crop-specific fertilizer (N) kg ha−1 consumed by each 5-arcmin grid cell for the following: (a) average for the 1960s decade across all 13 crop classes, (b) average for the 1960s decade for wheat, (c) average for the 1960s decade for rice, (d) average for the 1960s decade for maize, (e) average for the 1960s decade for other cereals, (f) average for the 1960s decade for all oil crops, (g) average for the 1960s decade for vegetables and fruits, (h) average for the 1960s decade for roots and tubers, sugar crops, fiber crops, and other crop classes, (i) average for the 2010s decade across all 13 crop classes, (j) average for the 2010s decade for wheat, (k) average for the 2010s decade for rice, (l) average for the 2010s decade for maize, (m) average for the 2010s decade for other cereals, (n) average for the 2010s decade for all oil crops, (o) average for the 2010s decade for vegetables and fruits, (p) average for the 2010s decade for roots and tubers, sugar crops, fiber crops, and other crop classes. The 1960s decade includes the years 1961–1969, and the 2010s decade includes the years 2010–2019.

Crop-specific P2O5 application

The application of P2O5 also experienced global increases across all crops (Fig. 3), but to a lesser extent than N. The average P2O5 used for the three main cereals and soybean rose from 13.8 ± 3.3 kg ha−1, 13.1 ± 2.4 kg ha−1, 6.3 ± 1.9 kg ha−1, and 12.6 ± 2.4 kg ha−1 for wheat, maize, rice and soybean, respectively, in the 1960s to 35.5 ± 4.9 kg ha−1, 43.0 ± 5.7 kg ha−1, 39.9 ± 5.0 kg ha−1, and 39.1 ± 6.6 kg ha−1 in the 2010s. Similar to N, the largest increases occurred in vegetable crops, where P2O5 application rates increased by more than 50 kg ha−1. Conversely, the smallest increases were observed in the other cereal crop class, where the average P2O5 application rate increased by only about 2.5 kg ha−1 between the two decades. Regionally, a similar pattern occurred with P2O5 use, following the trend previously seen with N, where the hotspot shifted from Europe to Asia. This shift is particularly notable for wheat, where the hotspot of P2O5 intensification moved from Western Europe to northern India and northeastern China (Fig. 3).

Fig. 3
figure 3

Spatial pattern of crop-specific fertilizer (P2O5) kg ha −1 consumed by each  5-arcmin grid cell for the following: (a) average for the 1960s decade across all 13 crop classes, (b) average for the 1960s decade for wheat, (c) average for the 1960s decade for rice, (d) average for the 1960s decade for maize, (e) average for the 1960s decade for other cereals, (f) average for the 1960s decade for all oil crops, (g) average for the 1960s decade for vegetables and fruits, (h) average for the 1960s decade for roots and tubers, sugar crops, fiber crops, and other crop classes, (i) average for the 2010s decade across all 13 crop classes, (j) average for the 2010s decade for wheat, (k) average for the 2010s decade for rice, (l) average for the 2010s decade for maize, (m) average for the 2010s decade for other cereals, (n) average for the 2010s decade for all oil crops, (o) average for the 2010s decade for vegetables and fruits, (p) average for the 2010s decade for roots and tubers, sugar crops, fiber crops, and other crop classes. The 1960s decade includes the years 1961–1969, and the 2010s decade includes the years 2010–2019.

Crop-specific K2O application

Globally, the use of K2O has also increased across almost all crop classes (Fig. 4). For wheat, maize, rice, and soybean, the average K2O application rates have risen from 7.2 ± 1.6, 9.8 ± 2.0, 3.4 ± 0.5, and 11.6 ± 2.6 kg ha−1, respectively, to 15.4 ± 4.1, 33.1 ± 4.8, 27.3 ± 3.9, and 9.8 ± 3.2 kg ha−1. Unlike N and P2O5, the largest difference in average K2O application occurred for the oil palm crop, which increased from 3.7 ± 1.4 kg ha−1 during the 1960s to 87.6 ± 8.3 during the 2010s. Similar to P2O5, the other cereal class experienced the smallest change in K2O use. In this case, the average K2O application rate decreased from 11.7 ± 1.9 kg ha−1 during the 1960s to 9.8 ± 3.2 kg ha−1 during the 2010s. Regionally, a similar pattern emerged with K2O, following the trend observed with N and P2O5, with the hotspot of K2O fertilization shifting from Europe and the USA to Asia. However, this change was more pronounced in different crops, such as oil crops, where the use of K2O has increased significantly in countries like Malaysia and Indonesia (Fig. 4).

Fig. 4
figure 4

Spatial pattern of crop-specific fertilizer (K2O) kg ha -1 consumed by each  5-arcmin grid cell for the following: (a) average for the 1960s decade across all 13 crop classes, (b) average for the 1960s decade for wheat, (c) average for the 1960s decade for rice, (d) average for the 1960s decade for maize, (e) average for the 1960s decade for other cereals, (f) average for the 1960s decade for all oil crops, (g) average for the 1960s decade for vegetables and fruits, (h) average for the 1960s decade for roots and tubers, sugar crops, fiber crops, and other crop classes, (i) average for the 2010s decade across all 13 crop classes, (j) average for the 2010s decade for wheat, (k) average for the 2010s decade for rice, (l) average for the 2010s decade for maize, (m) average for the 2010s decade for other cereals, (n) average for the 2010s decade for all oil crops, (o) average for the 2010s decade for vegetables and fruits, (p) average for the 2010s decade for roots and tubers, sugar crops, fiber crops, and other crop classes. The 1960s decade includes the years 1961–1969, and the 2010s decade includes the years 2010–2019.

Technical Validation

This section provides a detailed discussion of the validation efforts made to confirm the validity, consistency, and plausibility of our compiled dataset and predictions. First, the performance of the ML models is evaluated. Subsequently, we use SHAP values to confirm that our models used sensible features to make their predictions, based on literature. Finally, the predictions are validated by comparing them with reported data in both national and global databases.

ML Model performance

The performance of the ML models predicting the fertilizer application rates for the three fertilizers is shown in Table 5. Both XGB and HGB significantly outperformed the naive prediction, which uses the mean fertilizer application as its prediction. HGB consistently outperformed (or matched) XGB for all three fertilizers and performance metrics. For this reason, we will use the HGB model in the remainder of this technical validation, as well as any subsequent analyses.

Table 5 Performances of the eXtreme Gradient Boosting (XGB) and HistGradientBoosting (HGB) models on the test sets in a 2 × 5-fold nested cross validation grid search.

SHAP value analysis

To examine the impact of the features on the prediction of the N, P2O5 and K2O application rates, the SHAP values of the ten most important features for the three corresponding HGB models are illustrated in Fig. 5. Agrological drivers dominated the predictions, comprising six, seven, and eight of the ten highest ranked features, respectively. The impact of the features remained consistent across all fertilizers, albeit with varying magnitudes (Fig. 5d,e,f). In particular, the predicted fertilizer application rates were consistently positively impacted by the country fertilizer per ha and the crop nutrient removal per ha (as red dots, i.e. high values of country fertilizer per ha and high nutrient removal per ha, corresponded with positive SHAP values), while it was negatively impacted by the crop nutrient content (red dots corresponding with negative SHAP values; Fig. 5d,e,f). These relationships align with the expected influence of these features on fertilization at the crop-level184. Across the different fertilizers, the most important socioeconomic features varied. For instance, the GDP per capita was the most important socioeconomic feature in the prediction of the P2O5 and K2O application rates, while in the N prediction, the global crop price was more important. Fertilization at the country-level is usually associated with the economic development of the country, measured by GDP166,185. However, at the crop-level, this relationship only held true for the most expensive fertilizers, P2O5 and K2O. For N, the most affordable nutrient156, factors such as global crop price and N cost from production appeared to be more significant (Fig. 5). Few environmental features seemed to be relevant for the predictions (Fig. 5); only the soil pH, soil OCS, and aridity index appeared in the top ten for some nutrients. Although the influence of these variables appeared to be low, the direction of these relationships confirmed the findings of other authors at the farm- or regional-level for soil organic carbon content and soil pH.143,144,145.

Fig. 5
figure 5

SHapley Additive eXplanation (SHAP) values of the top 10 most important features in the prediction of, respectively, the crop N (a,d), P2O5 (b,e) and K2O (c,f) application rates using Histogram-based Gradient Boosted regression. (ac) The top plots present the average feature importance, determined by the mean absolute SHAP value of each feature. (df) The bottom plots depict a SHAP value for each prediction and show the local feature importance and the feature effect. The color of a dot represents the value of the feature in that instance - red indicating relatively high, blue indicating relatively low values. A dot with a high SHAP value for a feature suggests a positive contribution to the prediction, whereas a negative SHAP value leads to a lower prediction. The features are ranked in order of descending average importance and the blue, green and orange squares indicate whether the feature is an environmental, agrological or socioeconomic characteristic.

Validation

To evaluate the validity of our results, we compare the compiled dataset based on the predictions against several national databases38,39,40,41,42,43,44,45,50,51,52,54,55,56,57,58 based on the MAE and MAPE errors between both, averaged over the available years as illustrated in Table 6. For most country/crop combinations, the differences are within reasonable ranges, with MAE values between 5–40 kg ha−1 and MAPE values between 10%-50%. However, for some countries, the deviations are larger, suggesting that our models may not capture all the underlying intricacies in the data for each country or crop. This can be seen for Sweden where most results deviate between 20%-100%, or New Zealand where similar results can be found. However, it should be noted that these larger differences between our compiled dataset and the national databases cover only limited years as data was not always available for certain countries, as was the case for Sweden and New Zealand. Still, these discrepancies are slightly better than in earlier research9. Additionally, more detailed plots to evaluate the results per year for the USA and UK, based on the USDA and DEFRA respectively, are included in Figs. 6 and 7. For the USDA and DEFRA crop nutrient data, the MAPE values are less than 50% and usually less than 25%, except for USDA soybean N (Fig. 6). Figures 6 and 7 show that our predictions follow the real observed trend for the samples and thus form a reliable end source with only minimal differences. These discrepancies between the national databases and our compiled dataset can be attributed to occasional disparities between the application rates in the training data (data provided by the global dataset compilation) and the data in the national databases, e.g., the USA data for soybean N in 1998 differed by 400% between the two samples. These differences should be taken into account when comparing our results to the national databases, as our predictions are based on the global compiled dataset. As can be seen in Table 7, where the global databases data and the national databases are compared based on MAE and MAPE, most country/crop combination indicate an MAPE values between 10%-50%, which is similar to our resulting error in Table 6. Also, the lack of training samples for some country/crop combinations resulted in larger errors for these occurrences.

Table 6 Validation of our model predictions of the average application rate per ha against national database information for certain countries and crops per fertilizer.
Fig. 6
figure 6

Comparison of the application rates per ha per year for various crops between our predicted model output and the data reported by the United States Department of Agriculture (USDA) for the USA.

Fig. 7
figure 7

Comparison of the application rates per ha per year for various crops between our predicted model output and the data reported by the Department for Environment, Food & Rural Affairs (DEFRA) for the UK.

Table 7 Comparison of the data reported by global datasets16,17,18,19,20,21,22,32,33,34,35,36,37 of the average application rate per fertilizer per ha against national database information for certain countries and crops per fertilizer.

To conclude, the model performances and logical feature importances, derived from the SHAP values, in conjunction with the relatively minor differences between this study and regional statistics, as well as earlier literature9, indicate that our crop-specific fertilizer application rate dataset is comparatively reasonable across regions and years.

Usage Notes

In this study, we provide detailed estimates on global N, P2O5, and K2O fertilizer application rate based on the HGB model output and compile a comprehensive dataset of these estimates by major crop groups between 1961–2019. Tabular data of the country- and crop-level predictions are made available as well as the 5-arcmin resolution gridded maps from our application, rendering an easy to use complete dataset. Subsequent analysis can be done both on the tabular data and the outputted maps, such as a trend analysis of fertilizer application rate or causal discovery to identify drivers of fertilizer application rate. Furthermore, our dataset can be leveraged as a source in other models where for example yield, ecological impact or fertilizer pricing can be seen as the output rather than use.

Our results represent an improvement and advance in efforts to evaluate historical fertilizer consumption for different crop groups and fertilizers. However, as demonstrated during the validation process, this approach still has limitations that data source users should be made aware of. The limited amount of available data for some crops, nutrients, and regions can lead to biases, particularly in regions such as Africa, during certain years, especially in the 60 s, and for certain nutrients, mainly K2O. Hence, the ML approach can be sensitive to outlying data points or noise and the limited data can make it prone to overfitting, which was mitigated as much as possible in the CV setup. In addition, our model is trained on data provided by global datasets16,17,18,19,20,21,22,31,32,33,34,35,36,37, which means that while our predictions may align closely with them, it is essential to acknowledge that they might diverge from national data mainly due to the difference between the two data sources as highlighted by the validation. This discrepancy between global and national databases such as the USDA38 or DEFRA50 databases highlights the complexity of accurately capturing historical fertilizer consumption trends across different regions and crop types. Moreover, the gridded cropland data provided by the HYDE 3.3 project30, is inconsistent with the one from satellite-derived land use (e.g., China and India186,187) or data derived from a national census at regional scale (e.g., USA188), as stated by Adalibieke et al.9. Furthermore, utilizing neighboring cells to allocate harvested areas across different crops, as well as leveraging the EARTHSTAT map29, implies some assumptions (see Eqs. (6) to (9)). The main assumption is the suggestion that the distribution pattern of a specific cell mirrors that of its neighboring cells, which constrains potential changes in cropland over cells. The consistent use of the EARTHSTAT map29 of the year 2000 also assumed that the crop group distribution of harvested area remains constant over time between 1961–2019. Finally, it is important to recognize that there are additional uncertainties stemming from the utilization of various data sources and methodological decisions within each data source, but these lie beyond the scope of our study.

Nevertheless, our study extends the current literature by providing a more detailed historical geospatial distribution of fertilizer application rate by crop and using ML to obtain detailed predictions with high precision. The detailed description and open-source code, in combination with the limited data sources used and ability to forecast, also make the method reproducible and easy to extend to forecast fertilizer application rate. In addition, our approach does not entail any assumptions, making it more flexible and robust than precious studies. Future research can build upon our study by expanding on more detailed specific fertilizer application rate. Considering the frequency of fertilizer application as well as the timing can be valuable for future research on the evaluation of fertilizer effectiveness and use. In addition, our study focuses on broad fertilizer applications, however, more detailed maps can be made for different types of specific fertilizers considered in our study (e.g., N fertilizer types). Furthermore, the time granularity of our maps can be improved. In addition, satellite data can be used to obtain even more fine-grained predictions, both in regions and more detailed time periods. Finally, a deeper investigation into the drivers of fertilizer application rate could enrich our understanding. While our focus has primarily been on the explainability of our model, exploring methodologies such as causal discovery or causal ML within a temporal setting could unveil the drivers of fertilizer application rate over time, potentially providing valuable insights and facilitating more detailed predictions.

Table 8 Overview of the used open source packages and respective programming language in the code for model training, SHAP value computation, validation and map building.