Background & Summary

Synthesizing the wealth of scientific evidence on the sustainability of agricultural practices is critical for informed decisions by policy makers, public administrators, and private investors. Producing food entails multiple, interlinked challenges: ensuring food security, regenerating ecosystems, preserving resources, and mitigating greenhouse gas emissions1. Decision-makers need robust knowledge on farming practices (hereafter FPs, including agronomic practices, cropping and livestock systems, land management options and impact-mitigation techniques) to drive agricultural systems towards higher climate, environment, and productivity performance.

A general search for terms related to sustainable agriculture, such as (agric* OR farm*) AND (sustainab* OR environm*), in databases like Web of Science and Scopus retrieves over 260,000 peer-reviewed primary studies. While some of these studies produce convergent results, others provide contradictory findings. These disparities can arise from different locations, varied environmental conditions, and other factors, often making it challenging to achieve the statistical robustness necessary for decisive conclusions.

Systematic reviews and meta-analyses (MAs) can help to detect general trends and deal with disparities, as reflected by the exponential growth in the number of MAs and systematic reviews published in agri-environmental science since the year 2000. For example, extending the search string above to (agric* OR farm*) AND (sustainab* OR environm*) AND (meta-analy* OR “systematic review*”) yields 4,591 and 6,260 peer-reviewed MAs in Web of Science and Scopus, respectively (search performed in June 2023).

A systematic review entails the systematic compilation, evaluation, and description of all pertinent studies retrieved using a specific query. Employing a comprehensive protocol safeguards against bias and ensures a transparent and reproducible process2. MAs, in turn, combine systematic review with statistical analysis, enabling the quantification of specific outcomes across primary studies3. This approach has gained traction within agri-environmental sciences, offering distinct advantages over primary studies4. These advantages include refining estimates of effects, upscaling results of local experiments, scrutinizing the diversity in FP performance under varying bio-geographical, environmental, and farm management contexts, and reconciling seemingly opposite research findings. MAs cover a large range of FPs, such as cover crops, intercropping, agroforestry, organic farming, and conservation agriculture, and their impacts on, for example, crop production, water and soil quality, biodiversity, pest and disease control, and greenhouse gas emissions5,6,7,8.

More recently, at an even higher synthesis level, systematic review methods have been used to synthesize the evidence from multiple published MAs to generate robust and reliable conclusions regarding the effects of different FPs4,9,10,11,12,13,14,15,16,17,18,19. However, these syntheses of MAs are still scarce, focus on relatively limited groups of FPs and/or impacts, and fail to provide a broad picture of the impacts of a large range of FPs in the context of sustainable agriculture. The iMAP-FP dataset20 presented here can help fill these gaps.

Overview of the protocol used to build the iMAP-FP dataset

This paper describes one of the most comprehensive datasets to date on the impacts of 34 FP categories on climate, environment, and agricultural productivity. The dataset was developed within the iMAP project (Integrated Modelling platform for Agro-economic and resource Policy analysis), which aimed to provide robust scientific evidence to support the implementation, monitoring and evaluation of the Common Agricultural Policy, in particular for the environmental and climate change objectives. The project was commissioned by the European Commission's Directorate General of Agriculture and Rural Development to the Joint Research Centre, with a long-term perspective going beyond the specific duration of the project.

Figure 1 provides a schematic overview of the protocol used to screen and select published MAs. Following a rigorous methodological framework21, described in the next section, we systematically retrieved 13,935 records of peer-reviewed MAs published since the year 2000, from which, based on a series of criteria (see Section 3 of the dataset20), we selected 570 MAs and categorized them according to the type of FPs considered, the impacts assessed, the geographical regions covered, and their quality level. We extracted the statistical significance of the effects associated with sustainable FPs (intervention) compared to a control (typically conventional agricultural practices) across 223 independent intervention-control pairs. We classified outcome metrics into 34 impact categories, including, but not limited to, climate mitigation, soil health, environmental pollution, water use, nutrient cycling, biodiversity, and agricultural productivity. Finally, we also mined the main knowledge gaps reported in the MAs. In addition to the iMAP-FP dataset20, we developed an open-access online library, which can be used to generate reports summarizing the available scientific evidence on specific FPs, accessible in the European Commission domain (iMAP-FP-EvidenceLibrary).

Fig. 1 Schematic overview of the workflow underpinning the iMAP-FP dataset20.

This dataset and evidence library are easily accessible to policymakers, scientists, stakeholders and the general public. It is currently used by European Commission officers to support decisions and evaluations of key policies such as the European Union Common Agricultural Policy (CAP), the Farm to Fork strategy, the Biodiversity strategy, the Climate law and the recent Nature Restoration Law. Likewise, we believe it could be a useful tool in other policy-making contexts worldwide. We also propose it to the scientific community as a “living” library, helpful in coordinating efforts and progressively filling current knowledge gaps in the field of agriculture sustainability.

Methods

The dataset was developed using a systematic review approach, implemented to retrieve and synthetize published MAs, following the steps described below.

Identification and definition of farming practices categories

We identified 34 categories of FPs relevant to the European Union Common Agricultural Policy (CAP). The European Commission’s Joint Research Centre team, together with external experts and officers from the Directorate-General for Agriculture and Rural Development, selected these FP categories based on their relevance for the CAP Strategic Plan (programming period 2023-27). They cover instruments under direct payments (GAEC and Eco-Schemes) as well as Rural Development interventions. The list of FPs includes cropping systems (such as agroforestry, organic farming, conservation agriculture), wide-ranging groups of land management options (such as landscape features or grassland management), specific agronomic or livestock-management practices (such as organic fertilisation, enhanced-efficiency fertilisers, livestock feeding techniques, etc.) and environmental mitigation techniques (such as manure storage techniques). These categories of FPs not only include group-level information (e.g. landscape features), but also specific sub-practices (e.g. field margins). Definitions of FPs derive from various official sources, such as Eurostat, the European Food Safety Authority (EFSA), the Organisation for Economic Co-operation and Development (OECD) or the Food and Agriculture Organization of the United Nations (FAO), as well as ad hoc descriptions found in the scientific literature reviewed.

Systematic identification of potentially relevant meta-analyses

For each FP category, a specific search equation was defined. The first part of the search equations included a combination of keywords targeted to the FP derived from its definition, the scientific literature and the relevant EU policy frameworks (Table 1). This FP-specific combination of keywords was reviewed by a “core” team (seven scientists) and iteratively improved after discussions with a wider team of scientists and policy officers. The second and the third parts of the equation were common to all FPs. The second part was optionally used to restrict the search to articles that address topics specifically related to agricultural practices. This second part was not always necessary, but it was useful when the keywords listed in the first part of the search equation selected a large number of irrelevant articles (not related to agricultural activities). The third part of the search equation included specific keywords to focus the search on meta-analyses or systematic reviews only, excluding primary literature and non-systematic narrative reviews. We did not include keywords related to one or several specific environmental/climate/productivity outcomes, because we wanted this review to be as comprehensive as possible, without limiting the potential outcomes.
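The three-part structure described above can be sketched in a few lines of Python. This is only an illustration, not the authors' tooling: the function name, the FP keywords, and the exact wording of the second and third parts are assumptions borrowed from the general search string quoted in the Background; the actual equations are those listed in Section 2 of the dataset20.

```python
def build_search_equation(fp_keywords, restrict_to_agriculture=False):
    """Combine the three parts of a search equation into one boolean query."""
    # Part 1: FP-specific keywords, joined with OR (hypothetical keywords).
    part1 = "(" + " OR ".join(fp_keywords) + ")"
    # Part 2 (optional): restrict the search to agricultural topics.
    # Assumed wording, taken from the general example in the Background.
    part2 = "(agric* OR farm*)"
    # Part 3: keep only meta-analyses and systematic reviews (assumed wording).
    part3 = '(meta-analy* OR "systematic review*")'

    parts = [part1]
    if restrict_to_agriculture:
        parts.append(part2)
    parts.append(part3)
    return " AND ".join(parts)


# Illustrative call with made-up keywords for one FP category.
query = build_search_equation(['"cover crop*"', '"catch crop*"'],
                              restrict_to_agriculture=True)
print(query)
```

Keeping the second part behind a flag mirrors the text: it is only switched on when the FP keywords alone retrieve too many records unrelated to agriculture.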

Table 1 Structure of the search equations (and examples) used for the systematic review of meta-analyses, for each farming practice category. Search equations for all farming practice categories are available in Section 2 of the dataset20.

Searches in the two major electronic databases (Web of Science and Scopus) began for the first set of FPs in July 2020 and ended with the latest FPs in June 2023.

Selection of meta-analyses

The systematic review methodology used in this study was aligned with best practices in the field and was guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement and the Cochrane Handbook for Systematic Reviews of Interventions22. After merging the results obtained from the two databases and deleting the duplicates, the selection process included multiple steps to identify relevant MAs. The screening was performed by two researchers for each FP category and then reviewed by a group of seven other reviewers based on explicit exclusion/inclusion criteria (see below). Although our framework was primarily intended to synthetize MAs, we also considered systematic reviews if they included some quantitative results even if formal statistical analysis was lacking. Purely qualitative systematic reviews and non-systematic reviews were excluded. In short, when we use the term “MA”, we also include systematic reviews.
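The merge-and-deduplicate step mentioned above could look like the following minimal sketch. The matching rule (same DOI, or same title after normalisation) and the record fields `title` and `doi` are assumptions made for illustration; the paper does not specify how duplicates were detected.

```python
def normalise_title(title):
    """Lower-case a title and drop everything except letters and digits."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


def merge_records(wos_records, scopus_records):
    """Merge two record lists, keeping the first copy of each duplicate.

    Two records count as duplicates when they share a DOI or a
    normalised title (an assumed rule, not the authors' documented one).
    """
    seen = set()
    merged = []
    for record in wos_records + scopus_records:
        keys = {normalise_title(record["title"])}
        doi = (record.get("doi") or "").strip().lower()
        if doi:
            keys.add("doi:" + doi)
        if keys & seen:
            continue  # already present from the other database
        seen |= keys
        merged.append(record)
    return merged


# Placeholder records (hypothetical titles and DOIs).
wos = [{"title": "Practice A and outcome X: a meta-analysis", "doi": "10.1000/ABC"}]
scopus = [
    {"title": "Practice A and Outcome X - A Meta-Analysis", "doi": "10.1000/abc"},
    {"title": "Practice B and outcome Y", "doi": None},
]
print(len(merge_records(wos, scopus)))
```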

For all FPs, eligible studies were those that: (a) were peer-reviewed MAs of primary literature retrieved through systematic protocols; (b) reported results derived from pairwise comparisons of one or several sustainable FPs with control practices, on either agronomic (production yield, efficiency, etc.), environmental or climatic impacts; and (c) were published in English. Then, specific additional inclusion and exclusion criteria were established for each FP (see Section 3 of the dataset20). Using these criteria, the first screening was based on the title and abstract. Articles retained were then assessed for eligibility, identifying the Population, Intervention, Comparator, Outcome (PICO) after reading the full text. Any disagreement among the reviewers was resolved by consensus.

The “PICO” structure is a typical semantic structure used in meta-analysis and evidence-based science4. The Intervention-Comparator pair is the basis of experiments used to assess the effect of an intervention (in our case a farming practice), with respect to a benchmark (in our case a comparator practice), by measuring the outcome (in our case an environmental, climatic or productivity impact) using one or more metrics. The “population” describes the object of the intervention and the experimental conditions, using many different variables. In our case, the population is the type of agro-ecosystem to which the intervention (farming practice) is applied, and it can be described by several variables, such as pedo-climatic conditions, type of soil, ecosystem and landscape structure, type of crops, etc.

Data extraction

Data were extracted from selected MAs, for each FP category, using a standardized data extraction spreadsheet (see Supplementary Table S1). For each FP category, at least two reviewers (who were in charge of the identification and selection of MAs) conducted the data extraction. The data extracted were reviewed by the other researchers of the team.

Description of the selected MAs

For each selected MA, we extracted information regarding the scientific background and main objectives, the methods used for the systematic review (i.e., literature search strategy and selection criteria), statistical analysis, the number of original primary studies and experimental trials, the target population (that is, the specific pedo-climatic conditions, geographical areas, the types of crops or land use covered by the primary studies included in the MA), the comparison pairs (the FP used as intervention and the FP used as comparator), the main results, the influential factors, the conclusions drawn by the authors and the main knowledge gaps identified. The information was recorded in spreadsheets (Supplementary Table S1) using text strings faithful to the original wording used by the authors of each MA.

Report of effect size types and results of statistical tests

In primary studies, outcomes are measured separately for intervention and comparator using specific metrics (e.g. grain yield, soil organic carbon content, farmland bird species abundance, nitrous oxide emission per hectare, etc.). MAs combine results to estimate the mean effect size of the intervention versus the comparator for the outcome considered. Here, we reported the types of effect sizes used in each MA, such as the ratio of the metric in the intervention and the comparator (I/C), or their relative difference ((I-C)/C), the logarithm of the ratio, the standardized difference, etc.
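To make the effect size types named above concrete, the following sketch computes three of them for a single intervention value I and comparator value C. The numbers are invented for illustration only; the standardized difference is left out because it also requires variances and sample sizes.

```python
import math


def ratio(i, c):
    """Ratio of the metric in the intervention to the comparator (I/C)."""
    return i / c


def relative_difference(i, c):
    """Relative difference between intervention and comparator ((I-C)/C)."""
    return (i - c) / c


def log_ratio(i, c):
    """Natural logarithm of the ratio, ln(I/C)."""
    return math.log(i / c)


# Invented values: a metric of 120 under the intervention, 100 under the comparator.
i_value, c_value = 120.0, 100.0
print(ratio(i_value, c_value))
print(relative_difference(i_value, c_value))
print(log_ratio(i_value, c_value))
```

The three quantities carry the same information: the relative difference equals the ratio minus 1, and the ratio equals the exponential of the log ratio. The "no effect" value is therefore 1 for the ratio and 0 for the other two.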

Each PICO combination was extracted as a new entry row in the spreadsheet (Supplementary Table S1). We classified each mean effect size reported into four possible levels: (a) significantly positive effect, (b) significantly negative effect, (c) non-significant effect, (d) results lacking formal statistical analysis. This classification was based on confidence intervals or p-values (or other statistical test outcomes) reported by the authors of the MAs.
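The four-level classification can be illustrated with a confidence interval check. This is a simplified sketch under stated assumptions: it uses only the bounds of a reported confidence interval (the authors also relied on p-values and other test outcomes), it reads "positive" and "negative" as the sign of the effect size relative to a no-effect value, and the function and label names are hypothetical.

```python
def classify_effect(ci_low=None, ci_high=None, null_value=0.0):
    """Classify a mean effect size from its confidence interval bounds.

    null_value is the "no effect" value: 0 for log ratios and differences,
    1 for plain ratios.
    """
    if ci_low is None or ci_high is None:
        # No interval reported: no formal test to rely on.
        return "lacking formal statistical analysis"
    if ci_low > null_value:
        return "significantly positive"
    if ci_high < null_value:
        return "significantly negative"
    # The interval includes the no-effect value.
    return "non-significant"


# Invented intervals for illustration.
print(classify_effect(0.05, 0.30))
print(classify_effect(0.9, 1.3, null_value=1.0))
```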

Quality assessment of the meta-analyses

The quality of the selected MAs was assessed based on 16 independent quality criteria (Table 2), following a standardized quality assessment method11. The criteria cover the quality of the systematic review process, the quality of the statistical analysis and the risk of bias, in all the different steps followed when conducting an MA. Several of these criteria can also be found in previous publications10,11,23,24.

Table 2 Quality criteria used to assess the quality of the selected meta-analyses.

Classification and harmonization of intervention-comparator pairs and outcome metrics

After the PICO extraction, we classified intervention-comparator pairs (such as agroforestry vs conventional farming) and outcome metrics (such as soil carbon stock, soil sediment loss, birds’ abundance, etc.) into homogenized categories. We classified the intervention-comparator pairs by associating the original FP definitions of each MA with generic definitions. We classified the metrics into two different types, depending on the experiments: empirical experiments (e.g. pot-, laboratory-, greenhouse-, field- or farm-scale) or numerical experiments based on model simulations (including life-cycle-assessment models). Additionally, we further classified these metrics into different impact categories depending on whether they addressed environmental, climate or productivity targets. Metrics were attributed to impact categories when a direct link could be established, while avoiding proxy-based or indirect cause-consequence associations. For instance, the metrics “nitrogen leaching” and “sediment run-off” were associated with the impact categories “Nutrient leaching and run-off” and “Soil erosion”, respectively, but not with “Water quality” or “Soil nutrients”. To each metric, we assigned the direction of change (i.e. either decrease or increase) that corresponds to a positive effect on the impact. For instance, “increase soil carbon content” was associated with “increase carbon sequestration” and “increase soil methane uptake” was associated with “decrease greenhouse gas emission”.
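The direction-of-change rule can be pictured as a lookup table from metric to impact category and beneficial direction. The sketch below is an illustration only: the first two entries restate the examples in the text, the "decrease" direction for nitrogen leaching is an assumption, and the names `METRIC_RULES` and `effect_on_impact` are hypothetical.

```python
# metric -> (impact category, direction of change counted as a positive effect)
METRIC_RULES = {
    "soil carbon content": ("carbon sequestration", "increase"),
    "soil methane uptake": ("greenhouse gas emission", "increase"),
    # Assumed direction: less leaching is taken as the beneficial change.
    "nitrogen leaching": ("nutrient leaching and run-off", "decrease"),
}


def effect_on_impact(metric, observed_change):
    """Translate an observed change in a metric into an effect on its impact.

    observed_change is "increase", "decrease" or "none" (no significant change).
    Returns the impact category and "positive", "negative" or "non-significant".
    """
    impact, beneficial = METRIC_RULES[metric]
    if observed_change == "none":
        return impact, "non-significant"
    verdict = "positive" if observed_change == beneficial else "negative"
    return impact, verdict


print(effect_on_impact("soil methane uptake", "increase"))
print(effect_on_impact("nitrogen leaching", "increase"))
```

Storing the beneficial direction next to each metric keeps the sign convention in one place, so a significant increase in a "lower is better" metric is not misread as a benefit.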

Restrictions

We deliberately limited MA selection and data extraction to outcomes related to environmental, climate and productivity impacts. Therefore, in the current version of the dataset (2023 version)20, outcomes related to economics, food quality, safety, nutritional values, labour and social dimensions were not considered.

To speed up the analysis, when the number of MAs exceeded 10 for a specific combination of FP and impact, we limited the selection process to the 10 most recent MAs, either excluding older MAs or prioritizing MAs carried out on a global scale. The details of such criteria are reported among selection criteria for each FP category.
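A rough sketch of this cap is shown below. It is not the authors' procedure: the exact rule varied by FP category, so the ordering used here (most recent first, optionally placing global-scale MAs ahead of the others) and the field names `year` and `scale` are assumptions.

```python
def select_mas(mas, limit=10, prefer_global=False):
    """Keep at most `limit` MAs for one FP-impact combination.

    By default the most recent MAs are kept. With prefer_global=True,
    global-scale MAs are ranked ahead of the others before applying the cap.
    """
    if len(mas) <= limit:
        return list(mas)

    def sort_key(ma):
        global_rank = 0 if (prefer_global and ma["scale"] == "global") else 1
        return (global_rank, -ma["year"])  # newest first within each rank

    return sorted(mas, key=sort_key)[:limit]


# Twelve placeholder MAs published from 2005 to 2016.
candidates = [{"id": f"MA_{y}", "year": y, "scale": "regional"}
              for y in range(2005, 2017)]
kept = select_mas(candidates)
print([ma["year"] for ma in kept])
```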

In the present version of the iMAP-FP dataset (2023 version)20, the quantitative numerical estimates of the effect sizes reported by the selected MAs were not extracted.

Data Records

The full iMAP-FP dataset (2023 version) is available at the JRC data catalogue platform20. The sections of the dataset20 are briefly described below. Table 3 lists, for each section, the data resulting from each analytical step and the names of the corresponding files, each stored in a separate spreadsheet (CSV format).

Table 3 Structure of the iMAP-FP dataset, organized into sections.

Section 1 - Farming practices glossary

Section 1 corresponds to the first step of the process, defining the FPs. It includes a glossary of the definitions of the FPs, accompanied by literature references, as well as more detailed explanations regarding the boundaries (what is included or excluded) of these definitions.

Sections 2, 3 and 4 - Systematic screening of MAs

Section 2 reports all search equations run on Web of Science® and Scopus® databases, together with their dates. The number of literature records resulting from all the search equations run in the Web of Science and Scopus databases (including the dates of search) is reported in Fig. 2, sorted by FP category and publication year and after duplicate removal.

Fig. 2 Records retrieved from Web of Science and Scopus databases. Yearly distribution of unique records retrieved by using specific search equations for 34 farming practice categories. Farming practice categories are ordered chronologically according to the search dates, provided after the name. Purple squares are used to visualize the year of search and screening for each farming practice category.

Section 3 reports the detailed list of exclusion/inclusion criteria used for the screening and selection of the relevant MAs for each of the FP categories. Details of the screening process (including references of each record, exclusion/inclusion during abstract or full text reading) are included in Section 4 of the dataset20. Figure 3 provides an overview of the systematic selection process, which lasted around 36 months, from July 2020 to June 2023. After elimination of duplicates found in both databases, we identified a total of 13,935 records potentially meeting the selection criteria of each FP category (reported in detail in Section 3 of the dataset20). After screening the titles and abstracts, 4,580 full-text articles were assessed for eligibility, and finally 759 MAs were retained for data extraction. Some of these MAs were selected simultaneously in more than one FP category selection process; therefore, the final number of unique MAs selected across all FPs is 570.
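The difference between the 759 retained selections and the 570 unique MAs comes from counting the same MA once per FP category versus once overall. A toy sketch of that bookkeeping, with placeholder identifiers and an abbreviated set of categories (the variable names are hypothetical):

```python
# FP category -> identifiers of the MAs retained for it (placeholder IDs).
selected_by_fp = {
    "Cover and catch crops, mulching": {"MA_001", "MA_002"},
    "No tillage and reduced tillage": {"MA_002", "MA_003"},
}

# Counting per category: an MA retained for two categories is counted twice.
total_selections = sum(len(ids) for ids in selected_by_fp.values())

# Counting unique MAs: the union of all per-category sets.
unique_mas = set().union(*selected_by_fp.values())

print(total_selections, len(unique_mas))
```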

Fig. 3 Graphical representation of the selection process leading to the systematic mapping applied to meta-analyses (MAs) published in the field of agriculture sustainability. The number of MAs retrieved from Web of Science and Scopus (after duplicate removal) is reported in the first bar-plot, for each of the 34 farming practice (FP) categories. The number of MAs retained and discarded after title-abstract reading and after full text reading is reported in the following two bar-plots. As some MAs were reporting results on more than one FP category, the final number of unique MAs selected for further data extraction is also reported.

Further details regarding the selection process of each FP category (i.e. derived from each single search equation) are reported in the Supplementary Information, using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement diagrams25.

Section 5 – Selected MAs

Section 5 includes the reference list of the 570 unique MAs selected during the screening process. Of these, nearly 100 reported results on more than one FP category. On average, 10 MAs were relevant per FP (Fig. 3), with some FPs covered by up to 73 MAs (Leguminous crops), 51 (No tillage and reduced tillage) or 41 (Cover and catch crops, mulching), while other FP categories were covered by few MAs (e.g. fallowing, with only 4 MAs).

The distribution of the selected MAs per publication year is reported in Fig. 4a. The median number of primary studies in each MA was 53, and ranged from 4 to 678 (Fig. 4b). The selected MAs were often conducted (397 MAs out of 570) at a global scale (Fig. 4c), without specific geographical limitations. Fewer MAs focused on specific continents (e.g. 65 on China, 27 on Europe, 18 on North America, 21 on Africa/Sub-Saharan-Africa/west-Africa and 13 on Asia/south-Asia), or on countries (e.g. 72 on China, 18 on the USA, 6 on Brazil, 3 on India, 2 on New Zealand and Canada) or on biogeographical zones (e.g. 6 on Mediterranean, 5 on temperate, 3 on tropical, 3 on continental, 2 on arctic and 2 on arid).

Fig. 4 Distribution of the selected meta-analyses per publication year (a), per number of primary literature articles synthetized (b) and per geographical coverage (c). Dotted lines indicate the starting date of the literature screening process (a) and the median of primary studies synthetized per meta-analysis (b).

Section 6 – Quality assessment of MAs

Section 6 of the dataset20 includes the assessment of the quality, transparency and robustness of the selected MAs (16 criteria), as shown in Table 2. Note that slightly different quality assessments may have been recorded for different FP categories or impact categories reported by the same MA. For instance, in some cases, the statistical analysis was of different quality (e.g. weighted mean, confidence intervals, etc.) for metrics regarding carbon sequestration and others regarding biodiversity, reported by the same MA.

Section 7 – Qualitative synthesis and reporting

Section 7 of the dataset20 summarizes the selected MAs, using the wording originally reported by the authors of the MAs. For each combination of FP and impact categories, the following items are reported: the main characteristics of each MA, including their scientific background, main objectives, main findings, the relevant factors influencing the results, the authors’ conclusions and main knowledge gaps. This structure follows the standard for reporting systematic reviews and meta-analyses set out in the Cochrane Handbook for Systematic Reviews of Interventions26.

Sections 8 and 9 – Classification of intervention-comparator pairs, outcome metrics and impact categories

Section 8 of the dataset20 reports the intervention and comparator pairs (e.g. “no tillage” and “conventional tillage”, “cover crop” and “bare soil”, etc.), outcome metrics (e.g. soil organic carbon stock, N2O emission, etc.) and type of effect sizes (e.g. logarithm of the ratio, Hedges’ g, etc.). All items are reported using the definitions used by authors in the original text of the MAs.

Section 9 reports a categorized list of 3,811 results, their significance based on statistical tests (i.e. significantly positive or negative, non-significant, lacking statistical tests) for each of the main PICO combinations reported by each MA. In total, we classified 223 different combinations of intervention-comparator pairs (specific FP vs a control FP). The average number of unique intervention-comparator pairs per FP was 6, but ranged from 1 to 21 (Fig. 5). The results reported by the selected MAs covered 553 unique outcome metrics. We grouped these metrics into 34 impact categories (Fig. 6). The large majority of these metrics (i.e. 542 metrics, grouped into 25 impact categories) were estimated from experimental (or observational) studies, while a few of them (i.e. 17 metrics and 9 impact categories) were based on modelling studies (typically life-cycle cradle-to-farm-gate models) (Fig. 6).

Fig. 5 Data by farming practice category. Number of results of statistical tests on the effects (i.e. results of statistical models used to test the significance of the effect of a farming practice on the outcome, relative to a comparator practice), unique intervention-comparator pairs and outcome metrics (i.e. metrics used to quantify the outcome of farming practices), identified for each farming practice category.

Fig. 6 Data by impact categories. Number of results of statistical tests on the effects (i.e. results of statistical models used to test the significance of the effect of a farming practice on the outcome, relative to a comparator practice), unique intervention-comparator pairs and outcome metrics (i.e. metrics used to quantify the outcome of farming practices) identified for each category of environmental/climate/productivity impact. Impact categories are also divided according to the type (either empirical or modelling) of primary studies.

Technical Validation

Dataset revision and data cleaning

At the end of the entire process (June 2023), all single spreadsheets (Supplementary Table S1), used separately for data extraction for each FP category, were merged together using dedicated scripts (written in the R programming language). The whole dataset went through a revision process, aimed at checking the homogenization of the classifications, correcting errors and checking missing data in all sections. In particular, a group of four reviewers revised all attributions of metrics to impact categories for all FP categories. Discrepancies were resolved through consensus within the team.

The majority of corrections are related to Sections 8 and 9 and were especially necessary to homogenize the attribution of metrics to impact categories, in all FP categories. Some other corrections were made to other sections of the dataset, to correct inconsistencies and missing data. Figure 7 summarizes the number of corrections performed for each section of the dataset.

Fig. 7 Technical validation of the dataset, performed between June and October 2023 by 5 reviewers. Number of corrections (e.g. metric attributions to impact categories, wrong direction of the result, missing entries, etc.) in each section of the dataset and per farming practice category.

The main limitation of the present dataset is the difference in the MA search date between the FPs (from July 2020 to June 2023). For example, the search equations for “agroforestry systems” and “efficiency-enhanced fertilizers” were run in late 2020 and early 2021. Therefore, new MAs published after those dates, during the years 2021-2023, were not included in the current version of the iMAP-FP dataset20. An update is foreseen for the most relevant FPs as identified by policy makers and/or for fast-evolving domains (e.g. feed additives, animal welfare).

Usage Notes

The iMAP-FP dataset20, described in this article, can be loaded into any software that supports .csv files. To process the dataset, one can use programming languages such as Python, Java, MATLAB, or R. This dataset holds significant value for scientists and analysts in the agriculture-environmental science-to-policy domain, as it can be used to analyse the effects of farming practices on agriculture sustainability and productivity with robust knowledge, systematically extracted from peer-reviewed scientific literature.
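As a starting point, the CSV files can be read with nothing more than the Python standard library. The sketch below is illustrative: the column names and rows are placeholders held in memory, not the dataset's real schema, and the file name in the usage note is hypothetical. The actual file names are those listed in Table 3.

```python
import csv
import io
from collections import Counter


def read_rows(handle):
    """Read an open CSV file object into a list of dictionaries (one per row)."""
    return list(csv.DictReader(handle))


def count_by(rows, column):
    """Count how many rows fall under each value of the given column."""
    return Counter(row[column] for row in rows)


# In-memory stand-in for one CSV file (placeholder columns and values).
sample = io.StringIO(
    "farming_practice,impact_category,significance\n"
    "FP_A,impact_X,significantly positive\n"
    "FP_A,impact_Y,non-significant\n"
    "FP_B,impact_X,significantly positive\n"
)
rows = read_rows(sample)
print(count_by(rows, "impact_category"))
```

With a downloaded file, the same functions apply after opening it, for example `with open("results.csv", newline="", encoding="utf-8") as handle: rows = read_rows(handle)`, where `results.csv` stands for whichever section file is of interest.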

As our dataset provides a detailed description and quality assessment of hundreds of MAs covering a wide diversity of FPs, it can be used to produce scientific syntheses on a wide range of topics related to agricultural sustainability in different regions of the world. More specifically, our dataset can be used to create evidence maps, as recently illustrated by Chen et al.14, where a small part of the iMAP-FP dataset20 was used to assess the benefits of enhanced-efficiency fertilisers for the environment, nutrient use efficiency, soil fertility, and crop production.

The iMAP-FP dataset20 provides all the elements needed to quickly identify relevant MAs for assessing the effect of a given FP on a particular environmental impact. It thus greatly facilitates the synthesis work of scientists or evaluators wishing to provide robust evidence concerning the sustainability of agricultural practices. Likewise, scientists can quickly identify which MAs can provide relevant primary datasets, in view of data re-analysis and towards incremental evidence integration, in accordance with the principles of the so-called “virtuous cycle of data ecosystems”27.

Our dataset can also be used as a basis for selecting relevant studies for vote counting of MAs or second-order meta-analysis, as illustrated in recent publications9,13,14. Instead of analysing the primary data as in a standard (first-order) MA, these fast-track methods either count the number of significant mean effect sizes provided by first-order MAs (vote counting of MAs) or compute a weighted average of the mean effect sizes produced by the first-order MAs (second-order MAs). These methods are much quicker to apply than standard meta-analyses, and thus enable a swift response to requests from policy-makers on the environmental impact of FPs, as shown in recent years by the increasing use of second-order MAs28.
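The two fast-track methods can be sketched as follows. This is a bare-bones illustration with invented inputs: real second-order meta-analyses choose their weights with care (for instance from the precision of each first-order estimate, which is an assumption here) and must handle primary studies shared between MAs, which this sketch ignores. Note also that the 2023 version of the dataset does not include numerical effect sizes, so the weighted average would need values extracted separately from the MAs.

```python
from collections import Counter


def vote_count(classifications):
    """Vote counting: tally the significance classes reported by first-order MAs."""
    return Counter(classifications)


def second_order_mean(effect_sizes, weights):
    """Second-order MA: weighted average of first-order mean effect sizes."""
    if not effect_sizes or len(effect_sizes) != len(weights):
        raise ValueError("effect_sizes and weights must be non-empty and of equal length")
    total_weight = sum(weights)
    return sum(w * e for e, w in zip(effect_sizes, weights)) / total_weight


# Invented inputs for one FP-impact combination.
votes = vote_count([
    "significantly positive",
    "significantly positive",
    "non-significant",
    "significantly negative",
])
print(votes.most_common())

pooled = second_order_mean([0.10, 0.30], weights=[1.0, 3.0])
print(round(pooled, 3))
```

Vote counting needs only the significance classes stored in the dataset, which is why it is the quicker of the two to apply.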