Introduction

Restoration of tropical forests is pivotal for halting biodiversity loss, safeguarding ecosystem services, and mitigating climate change on both regional and global scales1,2,3. Among the range of restoration strategies, fostering natural forest regeneration is the most cost-effective approach and has the greatest potential for upscaling4,5. Natural forest regeneration is the spontaneous re-establishment of woodland cover on abandoned fields or degraded lands that were previously covered by forest, resulting in the formation of secondary forests. The level of local and landscape degradation, however, may lead to divergent successional pathways, creating substantial uncertainty about restoration outcomes, ecological benefits, and recovery timelines1,2. In the Brazilian Amazon, secondary forests cover 17 million hectares, of which 41% are regenerating in areas degraded by repeated burning and therefore might have a reduced capacity to restore carbon stocks3,4 and biodiversity5. The large variation in environmental conditions, land-use histories and landscape conservation status makes it difficult to define ecological indicators and reference values for assessing regeneration success. To circumvent these uncertainties and offer a means of tracking the progress of restoration interventions and public policies, we compiled the largest dataset on Amazonian secondary forests and used it to develop a simple and comprehensive approach for identifying and monitoring forest regeneration success.

We define regeneration success as the development of a successional forest with high ecological integrity. Ecological integrity is a concept historically applied to old-growth forests that has recently been reframed to incorporate successional dynamics6. It is the ability of an ecological system to support and maintain a community of organisms whose species composition, diversity, and ecosystem functioning are comparable to those of natural habitats within the region and of the same age class6,7,8. Operationalizing ecological integrity therefore requires a reference for what a natural habitat would be6. Reference systems can be old-growth forests or a successional trajectory that has suffered no or minimal degradation8. Because secondary succession is a dynamic process in which different forest attributes recover at different rates and full recovery may take several decades9, it makes little sense to use old-growth forests as the reference system8,10. A young secondary forest can function perfectly well despite being (still) very different from an old-growth forest. A more adequate reference for regenerating forests is therefore the successional trajectory that develops under minimal degradation in the region, i.e. subject to the fewest limitations to succession10,11,12.

Limitations to the successional process emerge from anthropogenic impacts that cause degradation1,13,14. Deforestation frequency3,12, landscape fragmentation15, and intensive and extensive land use before and during forest regeneration12 reduce the recovery rates of multiple ecological attributes and modify the floristic and functional composition of regenerating forests12,16,17. After intensive land use, i.e. in areas degraded by fire, agriculture or pasture, the soil propagule bank is impoverished, soil chemical and physical quality is reduced18, species colonization is hindered and tree growth slows down13. In fragmented landscapes with low forest cover, seed dispersal limitation is enhanced, air and soil temperatures increase and air humidity decreases13. Together, these factors hinder the colonization and growth of a large set of species, reducing the ecological integrity of successional forests. In such situations, natural regeneration alone will fail to effectively restore ecosystem functioning8. In contrast, in contexts of low degradation, where limitations to succession are minimal, forest regeneration may follow an optimal successional trajectory1,9,13,14 that attains the highest possible values of vegetation attributes under the local environmental conditions. Such an optimal successional trajectory represents the recovery potential of a given region and can therefore be used as a reference for deriving indicator values against which regeneration success is assessed at different points in time6.

Evaluating the ecological integrity of secondary forests requires integrating multiple ecological indicators with known responses to time, environmental conditions and anthropogenic impacts6. Ecological indicators are ecosystem attributes used to depict ecological conditions19; they must therefore be sensitive to degradation and respond predictably to disturbance and successional change20. To be useful in practice, good indicators should ideally be easy to measure and applicable across a range of environmental conditions20. In forest ecosystems, assessing ecological integrity requires indicators that represent the key components of forest structure, diversity, and function6,7. Indicators can be derived from forest attributes such as basal area, stem density, biomass and species richness6,20. These attributes follow different recovery trajectories over time: basal area, for example, recovers faster than biomass, and stem density peaks at intermediate successional stages9. Some indicators can also serve as proxies for ecosystem service provision. For instance, the diversity of native species is associated with the conservation value of secondary forests, and biomass with carbon sequestration and storage. Understanding how individual forest attributes change over time and how they are affected by environmental conditions and degradation is therefore crucial for identifying good indicators of ecological integrity and, ultimately, for measuring regeneration success.

In this study, we introduce a framework for assessing and monitoring the ecological integrity of naturally regenerating forests. We present ecological indicators and reference values that allow assessing regeneration success and estimating the potential ecological benefits of forest regeneration in the Amazon. We compiled the largest dataset on Amazonian secondary forests, comprising 448 vegetation plots distributed across the Brazilian Amazon (Fig. 1), to (i) investigate the effects of environmental and anthropogenic factors on the regeneration of multiple forest attributes, (ii) model and map optimal successional trajectories across the region, and (iii) derive reference values for key ecological indicators. These reference values, estimated for multiple forest age classes, serve as benchmarks for quantifying regeneration success and ensuring the effectiveness of restoration efforts in the biome. Our research provides a robust approach for assessing the ecological condition of regenerating forests and practical decision-making tools for forest restoration and conservation initiatives.

Fig. 1: Location of sampled secondary-forest plots.

White circles indicate the location of the 448 secondary forest plots, distributed across 24 sites, used in our analyses. The background map shows land use and land cover classes within the limits of the Brazilian Amazon, as mapped by MapBiomas for 2019 (legend at upper right). The histogram at lower right shows the distribution of landscape forest cover within 3 km buffers around each plot.

Results

Effects of environmental and anthropogenic factors on forest regeneration

We used a model selection approach based on generalized linear mixed models (GLMMs) to evaluate how successional age, anthropogenic impacts (represented by previous land-use history and landscape forest cover), and environmental conditions affect forest attributes related to structure (stem density, maximum diameter, basal area, and structural heterogeneity), diversity (species richness per 100 individuals and species diversity of native species) and functioning (aboveground biomass, AGB). Site was included as a random factor. The selected best models explained between 60 and 72% of the variation in forest attributes (conditional R²), with fixed effects explaining between 20 and 42% (marginal R²) (Fig. 2, Supplementary Table 1). The species richness model had the highest explanatory power, with similar proportions of variation explained by fixed and random factors (Supplementary Table 2).
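The marginal and conditional R² values above partition the variance of a mixed model using the standard definitions usually attributed to Nakagawa and Schielzeth. The sketch below assumes a Gaussian model with a single random intercept for site; the function `r2_nakagawa` and its inputs are illustrative names, and for non-Gaussian GLMMs the residual term would have to be replaced by the distribution-specific variance.

```python
from statistics import variance


def r2_nakagawa(fixed_predictions, var_random, var_residual):
    """Marginal and conditional R² for a random-intercept Gaussian mixed model.

    fixed_predictions: fitted values from the fixed effects only (X @ beta)
    var_random: estimated variance of the random intercept (here: site)
    var_residual: estimated residual variance
    """
    # Variance explained by the fixed effects (sample variance, like R's var()).
    var_fixed = variance(fixed_predictions)
    total = var_fixed + var_random + var_residual
    r2_marginal = var_fixed / total  # fixed effects only
    r2_conditional = (var_fixed + var_random) / total  # fixed + random effects
    return r2_marginal, r2_conditional


# Illustrative numbers only, not values from this study.
r2m, r2c = r2_nakagawa([0.0, 2.0], var_random=1.0, var_residual=1.0)
print(r2m, r2c)
```

The gap between the conditional and marginal values is the share of variance attributable to differences among sites.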

Fig. 2: Standardized effect size of environmental and anthropogenic factors on forest structure, diversity and function.

Standardized effect sizes from the best models for forest structure (stem density, maximum DBH, basal area and structural heterogeneity, SH), diversity (species richness per 100 individuals, Hill1 diversity index) and functioning (aboveground biomass, AGB). Predictors represent forest age, previous land-use history and soil physical conditions. Standardized coefficients are shown only for significant relationships. Blue circles represent values higher than 0 (positive effects) and orange circles values lower than 0 (negative effects). Asterisks indicate significance levels: *p < 0.05, **p < 0.01, ***p < 0.0001. Marginal R² (R²m) is the proportion of variance explained by the fixed factors alone, and conditional R² (R²c) the proportion explained by both the fixed and random factors of the GLMM. See Supplementary Tables 1 and 2 for details.

Forest age, previous land-use history and soil physical conditions were the most important drivers of forest attributes. The relative importance of each environmental and anthropogenic driver varied among forest attributes (Fig. 2, Supplementary Tables 1 and 2), but forest age always had the strongest effect. As expected, all vegetation attributes increased strongly with forest age (average standardized effect size ± standard deviation: 0.57 ± 0.23; Fig. 2).

Previous land-use history negatively affected all forest attributes. The higher the deforestation frequency before forest regeneration, the lower the values of all diversity and structure attributes (−0.12 ± 0.01), with the strongest effects on species richness, followed by basal area and structural heterogeneity (Fig. 2). The longer the duration of land use before forest regeneration, the lower the values of forest structure and function, with slower recovery of stem density, basal area and AGB (−0.09 ± 0.01) (Fig. 2). Landscape forest cover had no significant effect, probably because it was strongly correlated with land-use duration (r = −0.74, p < 0.001, n = 434, Supplementary Fig. 1E).

Soil physical conditions (texture and bulk density) negatively affected all forest attributes except stem density. Clay content strongly affected forest diversity and slightly affected AGB: higher clay content led to lower species richness and diversity of native species (−0.10 ± 0.03) and lower AGB (−0.07 ± 0.01). Soil bulk density had stronger effects than clay content and negatively affected AGB (−0.13 ± 0.04) and all forest structure attributes except stem density (−0.19 ± 0.04), but had no effect on diversity indicators. Surprisingly, none of the climatic factors evaluated significantly affected any forest attribute.

Based on the models' effect sizes, we ranked the attributes' suitability as indicators. The forest attributes most sensitive to anthropogenic impacts, and therefore better suited as indicators, were, in descending order: basal area, species richness, structural heterogeneity, stem density, AGB and maximum DBH (Fig. 2). The forest attributes most sensitive to environmental conditions, and therefore less generalizable across regions and less suited as indicators, were, in descending order: maximum DBH, basal area, structural heterogeneity, AGB, species diversity and species richness.
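The text does not state how effect sizes were aggregated into a ranking. One simple possibility, assumed here purely for illustration, is to sum the absolute standardized coefficients of each predictor group per attribute and sort. The predictor names and coefficient values below are hypothetical and are not the values shown in Fig. 2.

```python
# Predictor groups; names are illustrative labels, not variable names from the study.
ANTHROPOGENIC = ("deforestation_frequency", "land_use_duration")
ENVIRONMENTAL = ("clay_content", "bulk_density")


def rank_by_sensitivity(effects, predictors):
    """Sort attributes by the summed |standardized effect| of the given predictors.

    effects maps attribute -> {predictor: standardized coefficient}; predictors
    absent from a model (not selected or not significant) count as zero.
    """
    scores = {
        attribute: sum(abs(coefs.get(p, 0.0)) for p in predictors)
        for attribute, coefs in effects.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical coefficients for illustration only (NOT the values shown in Fig. 2).
example_effects = {
    "basal_area": {"deforestation_frequency": -0.15, "land_use_duration": -0.10,
                   "bulk_density": -0.20},
    "species_richness": {"deforestation_frequency": -0.18, "clay_content": -0.12},
    "agb": {"land_use_duration": -0.09, "clay_content": -0.07, "bulk_density": -0.11},
}
print(rank_by_sensitivity(example_effects, ANTHROPOGENIC))  # most suitable first
print(rank_by_sensitivity(example_effects, ENVIRONMENTAL))  # least generalizable first
```

A good indicator ranks high on the first list and low on the second.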

The optimal successional trajectory and reference values for regeneration success

Based on the selected models for each forest attribute, we predicted optimal successional trajectories across the non-flooded forests of the Brazilian Amazon over 40 years of regeneration (Fig. 3). To do so, we applied the equations fitted by the GLMMs (Equation 1) for each forest attribute, using the actual environmental conditions and fixing the variables representing anthropogenic impacts at low values (see details in Methods). These low values represent contexts of minimal constraints on forest succession in the study region (details in Supplementary Methods): one previous deforestation event and 8 years of land use before forest regeneration. Predicted values for each year of stand development (1–40 years) were then averaged across the entire region (Fig. 3).
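The prediction step can be sketched as follows. Equation 1 is not reproduced in this section, so the sketch assumes a hypothetical log-link model with a log(age) term, one clay and one bulk-density term, and made-up coefficients; the random site effect is set to zero to obtain population-level predictions. Only the fixed low-impact values (one deforestation event, 8 years of land use) come from the text.

```python
import math
from statistics import mean, pstdev

# Hypothetical coefficients on the log-link scale; the real ones come from the fitted GLMMs.
COEFS = {"intercept": 1.0, "log_age": 0.8, "deforestation_frequency": -0.10,
         "land_use_duration": -0.02, "clay": -0.005, "bulk_density": -0.5}

# Anthropogenic predictors fixed at the low-impact values given in the text.
LOW_IMPACT = {"deforestation_frequency": 1, "land_use_duration": 8}


def predict(age, pixel, coefs=COEFS, fixed=LOW_IMPACT):
    """Population-level prediction (random site effect = 0) for one pixel and stand age."""
    eta = (coefs["intercept"]
           + coefs["log_age"] * math.log(age)
           + coefs["deforestation_frequency"] * fixed["deforestation_frequency"]
           + coefs["land_use_duration"] * fixed["land_use_duration"]
           + coefs["clay"] * pixel["clay"]
           + coefs["bulk_density"] * pixel["bulk_density"])
    return math.exp(eta)  # back-transform from the log link


def optimal_trajectory(pixels, years=range(1, 41)):
    """Regional mean and SD of pixel-level predictions for each stand age."""
    trajectory = {}
    for age in years:
        values = [predict(age, px) for px in pixels]
        trajectory[age] = (mean(values), pstdev(values))
    return trajectory


# Two hypothetical 1 km pixels with different soil conditions.
pixels = [{"clay": 30.0, "bulk_density": 1.2}, {"clay": 50.0, "bulk_density": 1.4}]
traj = optimal_trajectory(pixels)
print(traj[20])
```

The per-year mean and SD correspond to the dashed line and the ribbon of Fig. 3; in practice the loop would run over every non-flooded forest pixel.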

Fig. 3: Modelled optimal successional trajectories in the Brazilian Amazon over 40 years of forest regeneration.

A maximum diameter; B basal area; C structural heterogeneity (SH); D species richness per 100 individuals; E species diversity (Hill1 index); F aboveground biomass. For each forest attribute, the green colour represents the range of values of the optimal trajectory. Optimal successional trajectories represent scenarios of low anthropogenic impact and, hence, minimal successional constraints. They were constructed by applying Equation 1 to all pixels across the Brazilian Amazon (non-flooded forest ecosystem areas only), using the actual values of environmental factors at 1 km resolution and fixed values of anthropogenic impacts: a single deforestation cycle and 8 years of land-use duration. Dashed lines represent the mean values across the Brazilian Amazon and the green ribbon the associated standard deviation.

Over the first 40 years of succession, optimal successional trajectories in the Amazon showed the following mean annual rates of recovery (Fig. 3): 129.3 stems ha−1 yr−1 for stem density (Supplementary Fig. 2A), 0.66 cm yr−1 for maximum DBH (Fig. 3A), 0.72 m² ha−1 yr−1 for basal area (Fig. 3B), and 0.005 yr−1 for structural heterogeneity (Fig. 3C), which is unitless and varies between 0 and 1. Species richness increased by 1.01 species per year (Fig. 3D) and species diversity (Hill1) by 1.09 species per year (Fig. 3E). AGB increased at an average rate of 4.53 Mg ha−1 yr−1 (Fig. 3F). Our results show that, after 20 years of succession, secondary forests in the Brazilian Amazon could attain on average a maximum DBH of 26.2 cm, a basal area of 20.8 m² ha−1, a structural heterogeneity of 0.27, 36 native species per 100 stems, a species diversity (Hill1) of 27 native species, and 134.30 Mg ha−1 of AGB (Fig. 4).

Fig. 4: Predicted values of forest attributes attainable by optimal successional trajectories at 20 years of succession across the Brazilian Amazon.

A maximum diameter; B basal area; C structural heterogeneity (SH); D species richness per 100 individuals; E species diversity (Hill1 index); F aboveground biomass. Values were estimated with the GLMMs (Fig. 2) fitted to data from secondary-forest plots in the Brazilian Amazon (Fig. 1). Uncertainty maps with estimated error values are available in Supplementary Fig. 10.

To derive reference values representative of the Brazilian Amazon, we took the lower bound of the variation around the mean of the optimal successional trajectories across the region (i.e., the mean minus one standard deviation; Fig. 3) at successional ages of 5, 10, 15, and 20 years. We estimated that after 20 years of regrowth, a regenerating forest with high ecological integrity should have at least a maximum DBH of 19.7 cm, a basal area of 14 m² ha⁻¹, a structural heterogeneity index of 0.27, 34 native tree species per 100 individuals, a species diversity (Hill1) of 25 native species, and 123 Mg ha⁻¹ of aboveground biomass (Table 1). These values serve as references for evaluating regeneration success: secondary forests with attribute values below the reference values (Table 1) are developing below the ecological potential of the region. The reference values may also indicate the potential provision of ecosystem services by forest regeneration, in terms of species conservation (species richness) and carbon sequestration (aboveground biomass).
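The mean-minus-one-SD rule can be written in a few lines. This is a sketch that assumes the pixel-level optimal-trajectory predictions for each age are already available; whether the population or sample standard deviation was used is not stated, and the difference is negligible over thousands of 1 km pixels. The toy numbers are illustrative only.

```python
from statistics import mean, pstdev

REFERENCE_AGES = (5, 10, 15, 20)


def reference_values(predictions_by_age, ages=REFERENCE_AGES):
    """Reference value per age: regional mean minus one SD of the pixel-level
    optimal-trajectory predictions for that age."""
    refs = {}
    for age in ages:
        values = predictions_by_age[age]
        refs[age] = mean(values) - pstdev(values)
    return refs


def is_below_reference(observed, age, refs):
    """True if an observed plot value falls below the reference for its age class."""
    return observed < refs[age]


# Toy pixel predictions for one attribute (e.g. AGB in Mg/ha); purely illustrative.
toy = {5: [30.0, 50.0], 10: [60.0, 80.0], 15: [90.0, 110.0], 20: [120.0, 140.0]}
refs = reference_values(toy)
print(refs)
print(is_below_reference(110.0, 20, refs))
```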

Table 1 Reference values for multiple ecological indicators

Given the strong effects of soil bulk density on forest structure indicators and of clay content on biodiversity indicators, we also estimated average reference values for regions with similar soil physical conditions21,22,23,24 (Supplementary Table 3).

Ecological indicators of regeneration success

We identified four key forest attributes that can serve as good indicators of regeneration success because they are highly sensitive to anthropogenic impacts (i.e. are significantly and strongly affected by anthropogenic factors) and have high potential for generalization across the region (i.e., are minimally affected by environmental factors) (Fig. 2, Supplementary Table 4). Ecological indicators for each ecosystem component were: basal area and structural heterogeneity (representing forest structure), species richness (for forest diversity), and aboveground biomass (for forest functioning). We recommend using a combination of at least one indicator from each ecosystem component to assess the ecological integrity of successional forests and evaluate regeneration success.
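A per-component check against the 20-year reference values can be sketched as below. The reference numbers are those quoted in the Results for 20 years of regrowth; the rule that a component is flagged when any of its measured indicators falls below its reference is an assumption made for illustration, and the plot values are hypothetical.

```python
# 20-year reference values quoted in the Results (Table 1); other ages need their own values.
REFERENCE_20_YR = {"basal_area": 14.0, "structural_heterogeneity": 0.27,
                   "species_richness": 34.0, "agb": 123.0}

COMPONENTS = {"structure": ("basal_area", "structural_heterogeneity"),
              "diversity": ("species_richness",),
              "function": ("agb",)}


def assess_plot(plot, reference=REFERENCE_20_YR, components=COMPONENTS):
    """Return, per ecosystem component, the indicators that fall below the reference.

    Requires at least one measured indicator for every component.
    """
    report = {}
    for component, indicators in components.items():
        measured = [name for name in indicators if name in plot]
        if not measured:
            raise ValueError(f"no indicator measured for component '{component}'")
        report[component] = [name for name in measured if plot[name] < reference[name]]
    return report


# Hypothetical 20-year-old plot: good structure and biomass, low native richness.
plot = {"basal_area": 16.2, "structural_heterogeneity": 0.30,
        "species_richness": 21, "agb": 130.0}
print(assess_plot(plot))
```

Reporting each component separately, rather than a single combined score, keeps visible which part of the ecosystem is lagging.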

Discussion

Based on the largest dataset of forest regeneration inventories in the Amazon (Supplementary Table 5), we modelled optimal successional trajectories and provided reference values and ecological indicators for assessing and monitoring forest regeneration success over time. These are powerful tools for governments and practitioners to track regeneration outcomes and to estimate the potential ecosystem services provided by regenerating forests in the Amazon.

Anthropogenic factors, namely previous land-use frequency and duration, negatively affected all forest attributes, reducing the ecological integrity of regenerating forests. In line with other studies, we found that the recovery of tree biomass, basal area, and species richness slows down in areas that were repeatedly deforested (usually by cutting and burning) and in areas with a long history of agricultural or pasture use25,26,27. Such negative effects are mediated by reduced propagule dispersal and local barriers to species colonization and establishment13. Continuous land use and repeated cycles of cutting and burning gradually deplete soil quality and the soil seed and resprouting banks12,27, selecting a narrow set of species that can survive and grow13. Frequent deforestation also reduces soil water content and atmospheric humidity, thereby intensifying water stress and exacerbating tree mortality28,29,30. Additionally, these conditions increase the likelihood of biological invasion, potentially constraining the establishment and growth of old-growth-forest species that ensure species turnover over time31,32. The low forest cover in landscapes of old agricultural frontiers33 restricts seed dispersal, limiting forest regeneration capacity34,35. Given the strong negative correlation between land-use duration and landscape forest cover (Supplementary Fig. 1E), the effects of land-use duration on forest attributes are likely a joint result of direct local impacts and reduced forest cover in the landscape. Our findings corroborate previous studies showing that intensive land-use history strongly reduces the recovery rates and ecological integrity of regenerating forests, ultimately reducing regeneration success36,37,38,39.

The only environmental variables that significantly affected forest regeneration were soil physical properties: vegetation regrowth was negatively affected by soil bulk density, and diversity recovery decreased with increasing soil clay content. Soils with high bulk density tend to have low water infiltration capacity40,41 and low water content42, which together may reduce root growth and respiration rates, negatively affecting the recovery of forest structure attributes and aboveground biomass. High-density soils may also be more susceptible to structural degradation during land use43,44,45, as a small increase in bulk density may lead to loss of soil stability and organic matter, especially in fine-textured soils such as those in the Amazon46. The negative effects of soil clay content on native species richness and diversity are probably mediated by soil texture–water relations. Clayey soils under wet climates easily become waterlogged47, reducing oxygen availability for plant roots and potentially limiting tree colonization to species with mechanisms to avoid soil anoxia48,49. Our results apparently contradict a previous study of old-growth forests in the region, which found higher species diversity of herbs and trees on clayey soils (known as platô in Portuguese) than on the low-fertility sandy soils of riverbanks (baixios in Portuguese)42,50. However, our study did not include plots on sandy riverbanks and therefore covers a narrower gradient of soil texture. Soil texture may also affect forest regrowth differently from old-growth forests, owing to the susceptibility of clayey soils to degradation51 or to differences in adaptations between successional and old-growth forest species. Together, these results suggest that high-density and clayey soils might be more susceptible to degradation and therefore have a lower capacity for natural forest regeneration.

Based on the relationships between forest attributes and their drivers, we modelled optimal successional trajectories and estimated the potential values of diversity, structure and functioning that secondary forests could attain across the Brazilian Amazon (Fig. 4). This is the first estimate of potential recovery of biodiversity and vegetation structure for the entire region and throughout succession. Our estimates corroborate previous ones, for instance of potential biomass recovery2, and are consistent with findings from 26 published studies on secondary forests (Supplementary Fig. 3, Supplementary Table 6). The reference values serve as a metric for assessing whether successional forests are deviating from their potential optimal values. The recommended indicators and reference values can be used to establish a standardized protocol for monitoring forest regeneration in any forest age class across non-flooded areas of the Brazilian Amazon. Consequently, these reference values allow for assessing compliance with Brazilian environmental laws and for determining regeneration success. In addition, biomass indicator values provide a way to assess the potential of regenerating forests to help Brazil achieve its Nationally Determined Contribution (NDC). Finally, indicators such as species richness and structural heterogeneity can help the country monitor biodiversity recovery and define targets for its restoration initiatives under the UN Decade on Ecosystem Restoration.

We showed that the best indicators of the ecological condition of regenerating forests are basal area and structural heterogeneity for forest structure, native tree species richness for biodiversity, and biomass for forest functioning. These are good indicators because they are strongly and negatively affected by anthropogenic factors, are little affected by the major environmental gradients, show clear increasing trends with forest age, and are correlated with successional processes (Supplementary Fig. 4). Structural heterogeneity (SH index) summarizes the DBH distribution in the community, which results from the presence of trees at different ontogenetic stages9 and of species with different growth rates (which may explain its correlation with species richness; Supplementary Fig. 5). Basal area summarizes tree growth and is strongly associated with forest age52 and forest disturbance25,26. Species richness reflects the capacity of new species to arrive and establish in the community, depending on the soil seed bank left after land use13,17 and on dispersal from surrounding forests1. Biomass is related to carbon sequestration, tree growth, tree mortality, transpiration rates and water-use efficiency53,54,55. The reduction in biomass recovery rates and stocks under anthropogenic impacts indicates reductions in the rates of ecosystem processes and functioning. To apply the biomass reference values provided here (Table 1), it is important to use the same allometric equation56,57. The joint assessment of these four indicators characterizes the ecological condition of regenerating forests, allowing the determination of restoration success and of management needs for boosting succession.

We recommend using at least one indicator for each ecosystem component (structure, diversity and function) to adequately assess regeneration success. By providing reference values for each indicator instead of a combined index, we allow users to identify which ecosystem component is recovering below the site's potential and to determine specific management needs. For example, a regenerating forest with high values of vegetation structure (basal area and structural heterogeneity) and ecosystem functioning (biomass) but low species diversity (native species richness)5,11 may have good growth conditions but limited arrival of new species, potentially due to a lack of surrounding seed sources11. To boost ecosystem recovery in such a case, management could focus on enrichment planting and other landscape restoration measures58. Adopting standardized protocols based on our reference values offers the dual benefit of facilitating monitoring and enabling targeted interventions for ecosystem restoration. However, our findings are most accurate for regenerating forests in the central and eastern Amazon. Increased sampling in historically undersampled areas is important to avoid biases and broaden the applicability of the approach59.

The approach we propose for assessing forest regeneration success in the Amazon is timely, given the crucial role of Amazonian forest regeneration in helping Brazil and the world remove CO2 from the atmosphere and achieve the targets set in the Paris Agreement60,61 and the UN Decade on Ecosystem Restoration62. Considering the low cost of natural forest regeneration compared to tree planting, most Amazonian landholders are likely to choose the former to comply with Brazil's legal restoration requirements63. Using a standardized protocol with the indicators and reference values proposed here will facilitate restoration monitoring and law enforcement, allowing for efficient and effective large-scale restoration initiatives in the Amazon64. Furthermore, assessing the potential of regenerating forests is crucial for informing conservation policies in human-modified landscapes. Our approach helps avoid ambiguity in the interpretation of restoration outcomes and uncertainty in the application of public policies on forest restoration and conservation in the Amazon.

Methods

First, we evaluated the effects of environmental and anthropogenic variables on the recovery of forest attributes related to function, structure, and biodiversity in secondary forests that have naturally regrown after pasture or agricultural use. Second, we used the selected models to simulate and map optimal successional trajectories across the Brazilian Amazon, and extracted average reference values for the entire region for each ecological indicator and successional stage. Third, we evaluated the suitability of forest attributes as indicators of ecological integrity and regeneration success.

Data collection

We used a dataset of secondary forests from 24 sites across the Brazilian Amazon covering a wide range of latitudes (−10.1013889 to −0.5869589) and longitudes (−67.63084 to −45.54657) (Supplementary Table 5). All plots were established in non-flooded forests below 500 m altitude. Annual rainfall varied twofold across sites (from 1500 to 3000 mm yr−1), whereas mean annual temperature varied by only about 2 °C (from 24.5 to 26.7 °C) (Supplementary Fig. 6). Soil cation exchange capacity (CEC) varied about threefold across sites (from 6.06 to 19.38 cmol(+) kg−1). Old-growth forest cover in the landscape surrounding the plots (within a 3 km radius) ranged from 5 to 100% (Fig. 1). Before land abandonment, the original forest in these areas had been clear-cut and burned, and then cultivated for agriculture or used as pasture. Secondary forest patches had experienced 1–10 clear-cut cycles and 1–29 years of land use before regrowth. Secondary forest age since abandonment ranged from 0.5 to 70 years (median 15.5 years), with 88% of the plots being less than 30 years old. The age of each secondary forest plot was determined from interviews with landowners.

Across these 24 sites, we compiled a dataset of 448 secondary forest plots of different successional ages (Supplementary Table 6), containing a total of 150,751 tree stems, with an average of about 19 plots per site (range 5–38). Forest inventories were carried out between 2005 and 2017. Plot size varied from 0.025 to 1.5 ha (mean 0.25 ha, median 0.1 ha), with 84% of the plots being ≤0.5 ha. In each plot, all trees, shrubs, and palms with a stem diameter at breast height (DBH, 1.3 m above the ground) ≥ 5 cm had their diameter measured and were identified to species level, except at one site (21 plots) for which only data for trees ≥10 cm DBH were available.

Forest attributes

We computed vegetation metrics to assess forest structure, diversity, and function. For forest structure, we considered for each plot: total basal area (m² ha−1), total stem density (stems ha−1), maximum tree size (maxDBH, cm), and structural heterogeneity (SH). SH was defined as the Gini coefficient of stem diameter, which ranges from 0, when all stem diameters are equal, to 1, when stem diameters vary maximally. The Gini coefficient was calculated as the sum of the absolute differences in stem diameter over all pairwise combinations of the N trees in the plot, divided by 2N² times the mean stem diameter of all trees: G = Σi Σj |xi − xj| / (2N² x̄). For diversity, we calculated native species richness rarefied to 100 stems and native species diversity as the effective number of species given by the Hill number of order 1 (q = 1). We included only native species, excluding all non-native species based on the REFLORA (2024) checklist (10 species, accounting for less than 0.05% of individuals). For the function component, we calculated aboveground biomass (AGB) using the allometric equation proposed by Chave et al.65, which is based on a comprehensive dataset that includes secondary-forest species. Further details on the vegetation metrics are given in the Supplementary Methods.
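The structure and diversity metrics above can be sketched as follows. The Gini coefficient follows the formula in the text, computed from sorted diameters to avoid an explicit double loop. The rarefaction method is not named in the text; the sketch assumes Hurlbert's classic expected-richness formula. Abundance lists are assumed to contain native species only, and all function names are illustrative.

```python
import math


def gini(diameters):
    """Structural heterogeneity: G = sum_i sum_j |x_i - x_j| / (2 * N**2 * mean(x))."""
    x = sorted(diameters)
    n = len(x)
    mu = sum(x) / n
    # For sorted data, the sum over unordered pairs i < j of (x_j - x_i)
    # equals sum_i x_i * (2i - n + 1) with 0-based i; ordered pairs double it.
    unordered = sum(xi * (2 * i - n + 1) for i, xi in enumerate(x))
    return 2 * unordered / (2 * n * n * mu)


def rarefied_richness(abundances, n=100):
    """Expected number of species in a random subsample of n stems (Hurlbert rarefaction)."""
    total = sum(abundances)
    if n > total:
        raise ValueError("plot has fewer stems than the rarefaction size")
    denom = math.comb(total, n)
    # math.comb returns 0 when n exceeds the remaining stems, i.e. the species is always drawn.
    return sum(1 - math.comb(total - a, n) / denom for a in abundances if a > 0)


def hill1(abundances):
    """Hill number of order 1: exp(Shannon entropy), the effective number of species."""
    total = sum(abundances)
    shannon = -sum((a / total) * math.log(a / total) for a in abundances if a > 0)
    return math.exp(shannon)


# Toy plot: stem diameters (cm) and stem counts per native species.
print(gini([5.0, 7.5, 12.0, 30.0]))
print(rarefied_richness([60, 30, 20, 10, 5], n=100))
print(hill1([60, 30, 20, 10, 5]))
```

With a finite number of stems the Gini coefficient can reach at most (N − 1)/N, so values close to 1 are only approached in large plots.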

Environmental conditions and anthropogenic impacts

To assess the drivers of forest regeneration, we selected six climate variables, nine soil variables, six anthropogenic variables, and the successional age of the secondary forest patch. To avoid collinearity in the statistical models, we computed Spearman correlations between all pairs of variables66 and retained only variables with pairwise correlation coefficients below 0.5; for pairs with correlations above 0.5, we kept the variable of greater ecological relevance (Supplementary Fig. 6). These explanatory variables were chosen because they influence forest attributes in the Amazon61,67 and are available for the entire region. The selected predictors are described below; more details are given in the Supplementary Methods.

All climate variables except one were obtained from the CHELSA climatologies at 1 km resolution: accumulated annual precipitation (mm year−1), mean annual temperature (°C), temperature seasonality (coefficient of variation), precipitation seasonality (coefficient of variation), mean temperature of the driest quarter (°C), and precipitation of the warmest quarter (mm quarter−1). Seasonality in water availability, expressed as the climatological water deficit (CWD), was the only climate variable extracted from another source65. CWD accumulates the water deficit over the months in which evapotranspiration exceeds rainfall and serves as a proxy for the amount of water lost by the environment during the dry months.
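The study took CWD as a ready-made layer65. To make the quantity concrete, the sketch below assumes the common definition CWD = Σ min(0, P − ET) over the 12 months of a climatological year, with P monthly rainfall and ET monthly evapotranspiration, both in mm; the function name and the example values are hypothetical. Months with a water surplus contribute nothing, so CWD is zero or negative, and more negative values indicate stronger seasonal drought.

```python
def climatological_water_deficit(monthly_rain_mm, monthly_et_mm):
    """CWD (mm yr-1): sum of monthly (rain - ET) over the months where ET exceeds rain."""
    if len(monthly_rain_mm) != 12 or len(monthly_et_mm) != 12:
        raise ValueError("expected 12 monthly values for rainfall and evapotranspiration")
    return sum(min(0.0, p - et) for p, et in zip(monthly_rain_mm, monthly_et_mm))


# Hypothetical site: six wet months (300 mm) and six dry months (50 mm), ET of 100 mm per month.
rain = [300.0] * 6 + [50.0] * 6
et = [100.0] * 12
print(climatological_water_deficit(rain, et))  # -300.0
```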

For soil conditions, we used Soil Cation Concentration (SCC; log10 cmol(+) kg−1, 450 m resolution) from Zuquim et al.9 as an indicator of soil nutrient conditions for plants in natural vegetation68. The other soil variables were extracted from the SoilGrids database69 at 250 m resolution and averaged over the first 30 cm of soil: soil pH, bulk density (g cm−3; soil dry mass per unit soil volume), volumetric soil carbon concentration (C), and the proportions of sand (>0.05 mm), silt (≥0.002 mm and ≤0.05 mm) and clay (<0.002 mm). To represent access to the water table and topography, we used, respectively, the Height Above the Nearest Drainage (HAND) derived from Nobre et al.70 at 90 m resolution66, and altitude (m) derived from Shuttle Radar Topography Mission (SRTM) data67 at 30 m resolution. More details are given in the Supplementary Methods.
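Averaging SoilGrids layers over the first 30 cm can be done as a thickness-weighted mean of the 0–5, 5–15 and 15–30 cm standard depth intervals. The weighting scheme is our assumption (the text only states that values were averaged), the helper name is hypothetical, and conversion from the units in which SoilGrids stores each property is not shown.

```python
# SoilGrids standard depth intervals (cm) covering the top 30 cm of soil.
TOP_30CM_LAYERS = [(0, 5), (5, 15), (15, 30)]


def mean_top_30cm(layer_values):
    """Thickness-weighted mean of a soil property over the 0-30 cm layers."""
    if len(layer_values) != len(TOP_30CM_LAYERS):
        raise ValueError("expected one value per depth interval")
    thicknesses = [bottom - top for top, bottom in TOP_30CM_LAYERS]
    weighted = sum(v * t for v, t in zip(layer_values, thicknesses))
    return weighted / sum(thicknesses)


print(round(mean_top_30cm([10.0, 20.0, 30.0]), 3))  # 23.333
```

An unweighted mean of the three layers would give 20.0 here, overweighting the thin surface layer.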

The anthropogenic impacts at each plot were described by landscape forest cover, fire frequency, deforestation frequency, previous land-use duration, and number of land-use changes. Some descriptors were obtained from field data and others were derived from the MapBiomas annual land use and land cover maps71, as described below.

We used the MapBiomas land use and land cover maps for 1985–2019 (MapBiomas Collection 7), which are based on the classification of Landsat images at 30 m spatial resolution. For each plot, we extracted descriptors of previous land-use history at the plot level (as the maximum value of each land-use descriptor within 300 m of the plot centroid) and at the landscape level (summarizing information within a 3 km radius of the plot centroid), to represent processes that affect regrowth at local and landscape scales. At both scales, we evaluated the land-use history of each secondary forest plot from 1985 until the first year (±4 years) in which regeneration started. We chose the 3000 m buffer for the landscape level because it produced the least skewed data distributions and the strongest effects on the response variables (more details in the Supplementary Methods and Supplementary Table S7).
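Plot-level extraction (the maximum of a descriptor within 300 m of the plot centroid) can be sketched on a projected raster grid. We assume square 30 m pixels in a metric projection and locate the centroid by row and column index; the helper names and the synthetic grid are hypothetical. The same mask with `radius_m=3000.0` gives the landscape buffer.

```python
import numpy as np


def circular_mask(shape, center_rc, radius_m, pixel_size_m=30.0):
    """Boolean mask of pixels whose centres lie within radius_m of the centroid pixel."""
    rows, cols = np.indices(shape)
    dist_m = np.hypot(rows - center_rc[0], cols - center_rc[1]) * pixel_size_m
    return dist_m <= radius_m


def plot_level_value(descriptor, center_rc, radius_m=300.0, pixel_size_m=30.0):
    """Maximum of a land-use descriptor raster within the plot buffer."""
    mask = circular_mask(descriptor.shape, center_rc, radius_m, pixel_size_m)
    return descriptor[mask].max()


grid = np.zeros((41, 41))
grid[20, 29] = 5.0  # 9 pixels (270 m) from the centroid: inside the 300 m buffer
grid[20, 40] = 9.0  # 20 pixels (600 m) away: outside the buffer
print(plot_level_value(grid, (20, 20)))  # 5.0
```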

Landscape forest cover was defined as the proportion of forest cover (old-growth and secondary forests) within a 3000 m radius around each plot, averaged over the first four years of forest regeneration.
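Landscape forest cover, as defined above, can be sketched as the forest fraction inside the 3 km buffer, averaged over the first four years of regeneration. The input is assumed to be a hypothetical boolean stack indexed (year, row, column), in which True marks old-growth or secondary forest pixels; the function name is also hypothetical.

```python
import numpy as np


def landscape_forest_cover(forest_stack, center_rc, first_regen_year_idx,
                           radius_m=3000.0, pixel_size_m=30.0, n_years=4):
    """Mean proportion of forest pixels within the landscape buffer over n_years of regeneration."""
    _, n_rows, n_cols = forest_stack.shape
    rows, cols = np.indices((n_rows, n_cols))
    mask = np.hypot(rows - center_rc[0], cols - center_rc[1]) * pixel_size_m <= radius_m
    window = forest_stack[first_regen_year_idx:first_regen_year_idx + n_years]
    yearly_cover = [year[mask].mean() for year in window]
    return float(np.mean(yearly_cover))


# Synthetic 5x5 landscape: fully forested for two years, then fully cleared for two years.
stack = np.zeros((4, 5, 5), dtype=bool)
stack[:2] = True
print(landscape_forest_cover(stack, (2, 2), 0, radius_m=1000.0))  # 0.5
```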

Fire frequency prior to forest regeneration was extracted at the plot level as the number of years in which the centroid pixel was classified as burned (maximum of 34 years), and at the landscape level as the number of times an area had been burned, weighted by the extent of the burned area in each year.
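The fire-frequency descriptors can be sketched as below. The plot-level count follows the text directly. For the landscape level we adopt one hypothetical reading of "number of times burned, weighted by the extent burned each year": the burned area (km²) inside the buffer, summed over years. Function names and the 30 m pixel area (0.0009 km²) are illustrative assumptions.

```python
import numpy as np

PIXEL_AREA_KM2 = 0.03 * 0.03  # one 30 m x 30 m Landsat pixel


def plot_fire_frequency(burned_by_year):
    """Number of years in which the centroid pixel was mapped as burned before regeneration."""
    return int(np.sum(burned_by_year))


def landscape_fire_frequency(burned_stack, buffer_mask, pixel_area_km2=PIXEL_AREA_KM2):
    """Hypothetical reading: burned area (km2) inside the buffer, summed over years."""
    burned_pixels_per_year = burned_stack[:, buffer_mask].sum(axis=1)
    return float(np.sum(burned_pixels_per_year) * pixel_area_km2)
```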

Deforestation frequency before regeneration was assessed at the plot level through interviews with landowners (available for 70% of the plots) or, when field information was unavailable, through remote sensing. For plots without interview information, deforestation frequency was calculated from the MapBiomas time series, at the plot level as the total number of changes from a forest to a non-forest class before the first year of forest regeneration (300 m radius), and at the landscape level as the sum of the number of deforestation events weighted by the total deforested area in km².
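Plot-level deforestation frequency combines two sources: the interview count when available and, otherwise, the number of forest-to-non-forest transitions in the MapBiomas series before regeneration. The sketch assumes a hypothetical per-pixel boolean series (True = forest) already truncated at the first year of regeneration; in the study, the per-pixel value would then be summarized over the 300 m buffer. Function names are hypothetical.

```python
def count_deforestation_events(is_forest_by_year):
    """Number of forest -> non-forest transitions in an annual series."""
    return sum(
        1 for prev, curr in zip(is_forest_by_year, is_forest_by_year[1:])
        if prev and not curr
    )


def plot_deforestation_frequency(is_forest_by_year, interview_count=None):
    """Prefer the landowner interview; fall back on the remote-sensing count."""
    if interview_count is not None:
        return interview_count
    return count_deforestation_events(is_forest_by_year)


series = [True, False, True, True, False, False]
print(plot_deforestation_frequency(series))                     # 2
print(plot_deforestation_frequency(series, interview_count=1))  # 1
```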

Previous land-use duration was derived from MapBiomas products and was characterized at the plot level as the number of years in which the pixel was classified as a non-forest class, and at the landscape level as the sum of years in which pixels were classified as non-forest, weighted by the area (km²) covered by non-forest classes.

The number of land-use changes was extracted from the MapBiomas product at the plot level as the number of times the land cover class changed to another class along the time series before the first year of regeneration, and at the landscape level as the sum of the pixel-level change frequencies weighted by the total area (km²) covered by the altered category.
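Plot-level land-use duration and number of land-use changes are simple counts over a pixel's annual class series before regeneration, as sketched below. The class labels and the set of forest classes are hypothetical placeholders (MapBiomas encodes classes as integer codes), as are the function names.

```python
def land_use_duration(classes_by_year, forest_classes):
    """Number of years the pixel was mapped as a non-forest class."""
    return sum(1 for c in classes_by_year if c not in forest_classes)


def number_of_land_use_changes(classes_by_year):
    """Number of year-to-year transitions between different classes."""
    return sum(1 for prev, curr in zip(classes_by_year, classes_by_year[1:]) if prev != curr)


history = ["forest", "pasture", "pasture", "agriculture", "pasture"]
print(land_use_duration(history, forest_classes={"forest"}))  # 4
print(number_of_land_use_changes(history))                    # 3
```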

Statistical analysis

Drivers of forest regeneration

We fitted generalized linear mixed models (GLMMs) to our data on the 480 secondary forests and compared multiple candidate models predicting the vegetation metrics described above. Fixed effects were successional age, environmental variables and anthropogenic impacts, as described previously. Site and plot size were included as random effects. We assessed the significance of the random effects by comparing models with and without each random term using likelihood-ratio tests72, sequentially examining random effects on the slope, on the intercept, and on both. To account for the spatial design73, we kept site as a random effect on the intercept. In all models except that for stem density, plot size had a significant effect and was kept as a random effect. The response variables were measures of forest structure: basal area (m² ha−1), total stem density (indv. ha−1), maximum tree size (maxDBH), and structural heterogeneity (SH); diversity: rarefied species richness and Hill number 1; and function: aboveground biomass (AGB).
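The likelihood-ratio comparison of models with and without a random term can be sketched from the two fitted log-likelihoods. The statistic 2(ℓ_full − ℓ_reduced) is compared here with a χ² distribution with one degree of freedom, whose upper tail equals erfc(√(x/2)). Because a variance component tested at zero lies on the boundary of its parameter space, this p-value is conservative, and halving it is a common correction. The log-likelihood values and the function name are hypothetical; in practice they come from the fitted mixed models.

```python
import math


def lrt_one_df(loglik_reduced, loglik_full, boundary_correction=False):
    """Likelihood-ratio test for one extra parameter (e.g. one random-effect variance)."""
    statistic = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    p_value = math.erfc(math.sqrt(statistic / 2.0))  # upper tail of chi-square with 1 df
    if boundary_correction:
        p_value /= 2.0  # 50:50 mixture of chi-square(0) and chi-square(1)
    return statistic, p_value


stat, p = lrt_one_df(loglik_reduced=-512.4, loglik_full=-508.1)
print(round(stat, 2), round(p, 4))
```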

We also fitted the models with the Amazon sub-regions proposed by Viola et al.3 and Steege et al.74 as a fixed effect. However, the vegetation metrics were not significantly affected by these sub-regions (details in Supplementary Notes, Supplementary Table 8, and Supplementary Figs. 7 and 8). Consequently, we present average reference values for the entire Brazilian Amazon instead of grouping them by sub-region.

We performed model selection using the dredge function from the MuMIn package in R to explore all possible combinations of predictor variables in the global model. We ranked each GLMM returned by dredge according to its second-order Akaike Information Criterion (AICc) and Akaike weight. Models within 2 units of the lowest AICc (ΔAICc < 2), i.e. the top-ranked models, are the most plausible to explain a substantial proportion of the variance in the data. We selected the model that was top-ranked most often across 10,000 bootstrap resamples (πi)73 and then, using the step function from the lmerTest package75, retained only significant (p < 0.05) effects. For each selected model, we computed the marginal and conditional R², which represent the proportion of variance explained by the fixed effects alone and by the fixed and random effects together76, using the function r.squaredGLMM from the MuMIn package77. All analyses were carried out in R 4.278, and all GLMMs were fitted with the lme4 package79.
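The ranking quantities used above can be sketched as follows: AICc from a model's log-likelihood ℓ, number of estimated parameters k and sample size n; Akaike weights from AICc differences; the ΔAICc < 2 set; and marginal and conditional R² in the Nakagawa and Schielzeth form for a Gaussian mixed model, on which MuMIn's `r.squaredGLMM` is based. This illustrates the formulas only and is not a replacement for MuMIn; all inputs and function names are hypothetical.

```python
import math


def aicc(loglik, k, n):
    """Second-order AIC: AIC + 2k(k + 1) / (n - k - 1)."""
    return (2 * k - 2 * loglik) + (2 * k * (k + 1)) / (n - k - 1)


def akaike_weights(aicc_values):
    """Relative likelihood of each model, normalised to sum to 1."""
    best = min(aicc_values)
    rel = [math.exp(-(a - best) / 2) for a in aicc_values]
    total = sum(rel)
    return [r / total for r in rel]


def top_ranked(aicc_values, delta=2.0):
    """Indices of models within `delta` AICc units of the best model."""
    best = min(aicc_values)
    return [i for i, a in enumerate(aicc_values) if a - best < delta]


def r2_marginal_conditional(var_fixed, var_random, var_residual):
    """Variance explained by fixed effects alone, and by fixed plus random effects."""
    total = var_fixed + var_random + var_residual
    return var_fixed / total, (var_fixed + var_random) / total
```

The gap between conditional and marginal R² is the share of variance attributable only to the random effects (site and plot size).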

Modelling the optimal successional trajectory

Based on the best GLMMs selected in the previous step (Supplementary Table 9, Supplementary Fig. 9), we modelled the optimal successional trajectory in the Amazon for each selected ecological indicator across 40 years of succession (Fig. 3) and extracted reference values (Table 1). The optimal successional trajectory develops under conditions of low anthropogenic impact, representing the maximum ecological integrity that can be attained at each successional stage in a given region.

We modelled the optimal successional trajectory with Equation 1 by setting the anthropogenic-impact variables retained in each model to low values, keeping all other fixed variables at their actual values, and setting forest age to specific values over the 40 years of succession (Equation 1; Supplementary Table 10). Low values of anthropogenic impact were defined as the lower quartile of local deforestation frequency, corresponding to a single deforestation cycle, and of local land-use duration, corresponding to 8 years of land use. Land-use duration was negatively correlated with landscape forest cover (R² = 0.72, p < 0.001, Supplementary Fig. 1): areas with a shorter land-use history also have higher forest cover in the surrounding landscape. Such situations therefore combine lower local land-use impacts with greater availability of seed sources and seed dispersal agents in the landscape, and hence a higher regrowth potential. For the robustness of the predictive models and the validation of predicted values, see the Supplementary Methods (Supplementary Table 10, Supplementary Figs. 10 and 11).

$$\mathrm{EI} \sim \beta_{0} + \log(\mathrm{Age}) + \beta_{1} \times \mathrm{Climate}_{n} + \beta_{2} \times \mathrm{Soil}_{n} + \beta_{3} \times \left(\mathrm{Anthropogenic\;Impacts}\right) + \varepsilon_{n}$$
(1)

where the ecological indicators (EI) are a function of age (from 1 to 40 years) and of climate and soil at the location (pixel n), and β₁, β₂, and β₃ are the standardized parameters from the top-ranked GLMMs. Age and drivers are fixed effects, and the random components of the model are the intercept for each site, plot size, and the overall residuals (ε).
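The prediction step of Equation 1 can be sketched for a single pixel as below. It is a hypothetical illustration: coefficients and covariates are assumed to be on the standardized scale used in model fitting, random effects are set to zero (a population-level prediction), and the anthropogenic variable is held at the lower quartile of its observed values. Equation 1 as printed shows log(Age) without a coefficient; the sketch exposes a hypothetical `beta_age` parameter, which can be set to 1 to match the printed form. All numeric inputs are invented.

```python
import numpy as np


def optimal_trajectory(beta0, beta_age, beta_climate, beta_soil, beta_anthro,
                       climate, soil, anthro_observed, ages=range(1, 41)):
    """Population-level prediction of an ecological indicator over successional ages,
    with the anthropogenic-impact predictor held at its lower quartile."""
    anthro_low = np.percentile(anthro_observed, 25)
    ages = np.asarray(list(ages), dtype=float)
    return (beta0 + beta_age * np.log(ages)
            + beta_climate * climate + beta_soil * soil + beta_anthro * anthro_low)


trajectory = optimal_trajectory(beta0=1.0, beta_age=2.0, beta_climate=0.5, beta_soil=-0.5,
                                beta_anthro=-1.0, climate=0.2, soil=0.4,
                                anthro_observed=[1, 2, 3, 4, 5])
print(trajectory.shape)  # (40,)
```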

First, we modelled the optimal successional trajectory for the environmental conditions of our study sites, fitting one curve per study site and an average curve for all study sites (Supplementary Fig. 9). In this first step, we aimed to assess and validate the values predicted for the optimal successional trajectory (Supplementary Fig. 2, Supplementary Table 11) by comparing them with other studies (Supplementary Fig. 3; Supplementary Table 6; details in the Supplementary Methods). We then applied the same equation to estimate the forest attribute values for the optimal successional trajectory in every pixel across the non-flooded Brazilian Amazon over 40 years of regeneration, to map it out and extract average reference values.

Second, to map out and extract average values representative of an optimal successional trajectory in the Amazon region, we applied Equation 1 to all pixels within the boundaries of the Brazilian Amazon, excluding water bodies, urban areas and non-forest ecosystems (e.g. wetlands, savannas and bare soil). We built one predictive map for each ecological indicator (except stem density, which was not influenced by environmental variables) at each successional age from 1 to 40 years. We used this age range because it covers most of the data used to build the models, so predicted values are more accurate within it. This resulted in 240 output maps at 250 m resolution, one per ecological indicator and forest age (six indicators × 40 ages). Here we present only the six maps for age 20 years (Fig. 4). We then calculated the mean and standard deviation across all pixels within the study area and built the average optimal successional trajectory for each forest attribute shown in Fig. 3. To calculate the annual recovery rate of each forest attribute, we divided the difference between consecutive values of the attribute by the difference between the corresponding successional ages, and averaged these rates over the 40 years.
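The summary across pixels and the recovery rate can be sketched with NumPy, assuming a hypothetical array of predictions with one row per pixel and one column per successional age. Whether the standard deviation used a sample (ddof = 1) or population denominator is not stated; with very many pixels the difference is negligible. Note that averaging consecutive rates over equal one-year steps telescopes to (value at 40 years − value at 1 year) / 39. Function names are hypothetical.

```python
import numpy as np


def summarize_trajectories(predictions):
    """Mean and standard deviation across pixels for each successional age."""
    predictions = np.asarray(predictions, dtype=float)
    return predictions.mean(axis=0), predictions.std(axis=0, ddof=1)


def annual_recovery_rate(values, ages):
    """Average of (change in attribute) / (change in age) between consecutive ages."""
    values = np.asarray(values, dtype=float)
    ages = np.asarray(ages, dtype=float)
    return float(np.mean(np.diff(values) / np.diff(ages)))


mean_traj, sd_traj = summarize_trajectories([[1.0, 2.0], [3.0, 4.0]])
print(mean_traj)  # [2. 3.]
print(annual_recovery_rate([0.0, 2.0, 3.0, 7.0], [1, 2, 3, 4]))
```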

To estimate the error associated with mapping the optimal successional trajectory, we employed a linear regression analysis between the observed and predicted values generated by the model. We quantified the relative errors by subtracting the estimated values from the observed values and then dividing by the observed values (More details in the Supplementary Methods and Supplementary Fig. 12).
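The error assessment can be sketched as an ordinary least-squares fit of observed on predicted values plus per-observation relative errors, (observed − predicted) / observed. Regressing observed on predicted, rather than the reverse, is our assumption; observed values must be non-zero for the relative error to be defined. Inputs and the function name are hypothetical.

```python
import numpy as np


def prediction_errors(observed, predicted):
    """Slope and intercept of observed ~ predicted, and relative errors per observation."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    slope, intercept = np.polyfit(predicted, observed, deg=1)
    relative_error = (observed - predicted) / observed
    return slope, intercept, relative_error


slope, intercept, rel_err = prediction_errors([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
print(rel_err)  # [0.5 0.5 0.5]
```

A slope close to 1 and an intercept close to 0 would indicate predictions without systematic bias; here the predictions underestimate by half.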

Finally, we derived reference values representing the minimum a secondary forest must attain at a given age to be considered within an optimal successional trajectory. For each ecological attribute, we took the lower bound of the range of optimal successional trajectories modelled for the Amazon (Fig. 3), i.e. the mean minus one standard deviation. We did this for ages of 5, 10, 15 and 20 years (Table 1) because most secondary forests in the Brazilian Amazon fall within that age range80; reference values for other ages can be derived from Fig. 3.
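Reference values as defined above (mean minus one standard deviation of the modelled trajectories) can be read off at the target ages as in this sketch; the synthetic trajectory and the helper name are hypothetical.

```python
import numpy as np


def reference_values(ages, mean_traj, sd_traj, target_ages=(5, 10, 15, 20)):
    """Minimum value expected on the optimal trajectory: mean - 1 SD at each target age."""
    lower = {int(a): float(m - s) for a, m, s in zip(ages, mean_traj, sd_traj)}
    return {age: lower[age] for age in target_ages}


ages = np.arange(1, 41)
print(reference_values(ages, mean_traj=10.0 * ages, sd_traj=1.0 * ages))
# {5: 45.0, 10: 90.0, 15: 135.0, 20: 180.0}
```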

Importance of ecological indicators

To assess whether forest attributes can serve as ecological indicators, we used information from the fitted GLMMs and evaluated each indicator on three criteria. (i) Sensitivity to anthropogenic impact (A): good indicators sensitively reflect changes in community or ecosystem attributes resulting from anthropogenic impacts, i.e. they are forest attributes with larger effect sizes for the anthropogenic variables. (ii) Robustness against other confounding factors (E): good indicators should be concise and generalizable across regions, and indicators that rely heavily on specific edaphic and climatic drivers cannot be widely applied. Indicators with the smallest climate effect sizes were therefore considered effective, as they can be generalized across the Amazon biome. (iii) Independence of other site conditions (S): we used the variation explained exclusively by the random effects as a proxy for the site dependency of the indicator, and indicators with lower dependency on random factors (site) were deemed good indicators. The criteria were based on the model results and are available in the Supplementary Methods (Supplementary Table 5).
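The three criteria can be tabulated from fitted-model summaries as in this sketch. How effect sizes were aggregated is not stated here, so we assume, hypothetically, that criterion A is the largest absolute standardized coefficient among anthropogenic predictors, criterion E the largest among climate predictors, and criterion S the share of variance explained only by random effects, conditional minus marginal R². Indicator names, coefficients and function names are invented for illustration.

```python
def indicator_criteria(model_summaries):
    """Compute criteria A (higher is better), E and S (lower is better) per indicator."""
    criteria = {}
    for name, m in model_summaries.items():
        criteria[name] = {
            "A": max((abs(b) for b in m["anthro_coefs"]), default=0.0),
            "E": max((abs(b) for b in m["climate_coefs"]), default=0.0),
            "S": m["r2_conditional"] - m["r2_marginal"],
        }
    return criteria


def rank_indicators(criteria, key, higher_is_better):
    """Indicator names ordered from best to worst on one criterion."""
    return sorted(criteria, key=lambda n: criteria[n][key], reverse=higher_is_better)


summaries = {
    "indicator_1": {"anthro_coefs": [-0.4, 0.1], "climate_coefs": [0.05],
                    "r2_marginal": 0.30, "r2_conditional": 0.50},
    "indicator_2": {"anthro_coefs": [-0.1], "climate_coefs": [0.30],
                    "r2_marginal": 0.20, "r2_conditional": 0.70},
}
crit = indicator_criteria(summaries)
print(rank_indicators(crit, "A", higher_is_better=True))   # ['indicator_1', 'indicator_2']
print(rank_indicators(crit, "S", higher_is_better=False))  # ['indicator_1', 'indicator_2']
```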

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.