A global urban tree leaf area index dataset for urban climate modeling

Dong, Wenzong; Yuan, Hua; Lin, Wanyi; Liu, Zhuo; Xiang, Jiayi; Wei, Zhongwang; Li, Lu; Li, Qingliang; Dai, Yongjiu

doi:10.1038/s41597-025-04729-y

Data Descriptor
Open access
Published: 12 March 2025

Scientific Data volume 12, Article number: 426 (2025)

Abstract

Urban trees are recognized for mitigating urban thermal stress, so incorporating their effects is crucial for urban climate research. However, owing to the limitations of remote sensing, the leaf area index (LAI) in urban areas is generally masked (e.g., in MODIS products), which in turn limits its application in Urban Canopy Models (UCMs). To address this gap, we developed a high-resolution (500 m), long-time-series (2000–2022) urban tree LAI dataset using a Random Forest model trained on MODIS LAI data, with meteorological variables and tree height as inputs. The results show that our dataset has high accuracy when validated against site reference maps, with an R of 0.85 and an RMSE of 1.03 m²/m². Compared to reprocessed MODIS LAI, our modeled LAI exhibits an RMSE ranging from 0.36 to 0.64 m²/m² and an R ranging from 0.89 to 0.97 globally. This dataset provides a reasonable representation of urban tree LAI in terms of magnitude and seasonal changes, thereby potentially enhancing its applications in UCMs and urban climate studies.

Background & Summary

Urban trees have been recognized for their ability to mitigate urban thermal stress, as numerous field experiments and remote sensing studies have evaluated and demonstrated^1,2,3,4. They cool urban surfaces effectively through shading and evapotranspiration^5,6 and influence canyon wind and turbulent transport^7,8. Additionally, they affect anthropogenic energy use^9,10 and can delay stormwater peak flow by intercepting precipitation^11,12,13. Given the importance of urban vegetation, Urban Canopy Models (UCMs), as essential tools for urban climate research, have made substantial progress in modeling urban vegetation, particularly urban trees, which have a more pronounced impact on local climate than shorter vegetation. Several models have successfully integrated trees into UCMs to explore their influence on mesoscale land–atmosphere exchange processes^14,15,16. In UCMs, the leaf area index (LAI) of the tree canopy is a critical structural attribute.

In recent years, advancements in remote sensing technology have significantly enhanced the monitoring of LAI in natural vegetation¹⁷. Since the 1980s, numerous satellite-based LAI datasets have been produced, including Global Inventory Modeling and Mapping Studies (GIMMS) LAI^18,19, reprocessed Moderate Resolution Imaging Spectroradiometer (MODIS) LAI^20,21, Global LAnd Surface Satellite (GLASS)^22,23, GLOBMAP²⁴, and GEO^25,26. These datasets, derived from AVHRR (Advanced Very High Resolution Radiometer) or MODIS, provide a wealth of valuable information and have been extensively used in General Circulation Models (GCMs) and Land Surface Models (LSMs).

However, the limitations of passive optical detectors currently make it difficult to estimate LAI accurately in urban pixels. Building occlusion and the mixing of artificial surfaces and vegetation within a pixel affect the reception and inversion of the relevant remote sensing band signals. As a result, LAI values in urban areas are often masked or inadequately captured, as in MODIS LAI. Furthermore, the coarse resolution of AVHRR data (about 10 km) makes it unsuitable for capturing the intricate characteristics of urban vegetation dynamics and greening trends, because urban areas constitute only a small fraction of the land area. GLASS v6 employs a machine learning method to generate global LAI data, including urban areas²². However, it is important to note that this LAI represents “grid LAI”, i.e. the green leaf area per unit ground area (Fig. 1a), which cannot be directly applied in UCMs. According to its definition and the characteristics of remote sensing observations, this LAI represents the leaf area averaged evenly over the whole grid. In contrast, UCMs generally require the “tree LAI”²⁷, i.e. the green leaf area per unit tree-covered area (Fig. 1b), because they need it to calculate the shading effects of trees on buildings or the ground. Consequently, because of this difference in the definition of LAI, the existing satellite-based datasets may not fully meet the requirements of UCMs for modeling urban trees.

While lidar technology is effective in obtaining accurate tree LAI data in urban areas^28,29, it has obvious limitations in acquiring long-term and large-scale datasets. Currently, the ability to model the effects of urban trees on a global or regional scale is still constrained by the insufficient availability of urban tree LAI data, and urban tree LAI for UCMs is commonly derived from preset values, knowledge-based assumptions, or empirical models^30,31,32,33,34. There is a growing demand for more precise and detailed data that can represent the phenological characteristics of urban vegetation, so that UCMs can realistically resolve the influence of urban trees on urban radiation exchange and transpiration. Therefore, to bridge the existing gap in urban tree LAI data and enhance the support for urban modelling, it is valuable to develop an LAI dataset tailored specifically for urban tree modelling.

In this context, we aim to generate high-resolution urban tree LAI data to facilitate tree modeling in UCMs at various scales. We developed a machine learning model that uses monthly minimum air temperature, vapor pressure, monthly incident shortwave radiation, monthly maximum air temperature, precipitation³⁵, tree height³⁶ and vapor pressure deficit as input features, with reprocessed MODIS LAI data²⁰ as the target for model training and validation. First, we employ the Random Forest (RF) model with the aforementioned datasets to model the LAI of natural trees. Subsequently, we validate the modelled LAI against 14 ground LAI reference maps and reprocessed MODIS LAI data. The modelled LAI demonstrated high accuracy at both site and global scales. Compared to the reference maps, the R and RMSE of the modelled LAI are 0.85 and 1.03 m²/m², respectively. When reprocessed MODIS LAI is used as the benchmark, RMSE ranges from 0.36 to 0.64 m²/m² and R ranges from 0.89 to 0.97 at the global scale. Based on these results, we conclude that the model can accurately simulate LAI. Finally, the trained RF model is used to predict urban tree LAI, and the result shows a reasonable representation of urban tree LAI in terms of both magnitude and seasonal variation. This urban tree LAI dataset would enable a more accurate and detailed representation of tree characteristics within urban areas, contributing to improved precision and reliability in urban tree modelling.

Methods

In this study, we use an Automated Machine Learning library called FLAML³⁷ to build our machine learning models. FLAML is a lightweight Python library designed for efficient automation of machine learning and AI operations. It automates workflows based on machine learning models and optimizes their performance. We selected two machine learning models, RF³⁸ and Light Gradient Boosting Machine (LightGBM)³⁹, and chose the better one as the final model. Both RF and LightGBM are machine learning methods grounded in the ensemble learning concept, and both integrate multiple independent weak learners to enhance overall fitting capability. RF employs the bagging strategy, randomly selecting training samples and features during the construction of decision trees. For each regression decision tree, it iteratively identifies the optimal split feature and split point from the selected data. The final prediction is the average of the predictions from all regression decision trees. Zhang et al.⁴⁰ successfully employed the RF model to predict LAI for four main plant functional types in Northern America. LightGBM is based on the boosting algorithm and emphasizes efficiency and speed through innovative algorithms and optimizations. In model building, the parameters optimized for RF include n_estimators, max_features, and max_leaves, while the parameters optimized for LightGBM include n_estimators, num_leaves, min_child_samples, learning_rate, log_max_bin, colsample_bytree, reg_alpha, and reg_lambda. We did not specify parameter ranges, but instead set a maximum training time of 1800 seconds for the model to find the best hyperparameters.
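The search described above could be set up roughly as in the following sketch. It is an illustration only, not the authors' script: the function name `fit_monthly_model` and the dictionary `AUTOML_SETTINGS` are hypothetical, the optimization metric is left at FLAML's default because the text does not state it, and only the two learners and the 1800 s budget come from the text.

```python
# Search settings taken from the text: two candidate learners and an 1800 s budget.
# Everything else is left at FLAML's defaults (an assumption of this sketch).
AUTOML_SETTINGS = {
    "task": "regression",
    "estimator_list": ["rf", "lgbm"],  # FLAML's names for Random Forest and LightGBM
    "time_budget": 1800,               # maximum search time in seconds
}


def fit_monthly_model(X_train, y_train, settings=AUTOML_SETTINGS):
    """Run a FLAML hyperparameter search for one monthly model.

    X_train holds the explanatory variables and y_train the target LAI.
    Returns the fitted AutoML object.
    """
    # Imported lazily so the settings can be inspected without FLAML installed.
    from flaml import AutoML

    automl = AutoML()
    automl.fit(X_train=X_train, y_train=y_train, **settings)
    return automl
```

After fitting, `automl.best_estimator` and `automl.best_config` report which learner and hyperparameters were selected. To compare RF and LightGBM separately, one would pass a single-element `estimator_list` for each run.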

Table 1 lists the explanatory variables used with FLAML. Most variables were sourced from WorldClim v2.1, a dataset that provides spatially interpolated monthly climate data for global land areas at a high spatial resolution of 1 km. This includes monthly temperature (minimum, maximum, and average), precipitation, solar radiation, vapor pressure, and wind speed³⁵. Furthermore, under identical environmental conditions, urban trees may exhibit different growth patterns from those in natural environments (e.g., urban trees are often shorter). To capture these differences, and considering the relationship between LAI and tree height documented in previous studies^17,29,41, tree height was included as an explanatory variable. At 1 km spatial resolution, WorldClim v2.1 provides only historical climatology data. The model was therefore trained for each month across different years using the 1 km climatology data, and the primary difference between the models stems from the varying target LAI, which is used to represent annual variation. Although CRU-TS-4.06 provides downscaled data from 1969 to 2021, its resolution is limited to 2.5 minutes, which is comparatively coarse. Consequently, the model constructed with CRU-TS-4.06 is inferior to that constructed with WorldClim v2.1 climatology data. In addition, because CRU-TS-4.06 provides only precipitation and maximum and minimum temperatures, and lacks radiation and vapor pressure, we use WorldClim as the training data to maintain consistency. Moreover, the results indicate that WorldClim can reflect the temporal variation of LAI. Since WorldClim v2.1 does not provide vapor pressure deficit (VPD), we calculated it following Zhang’s method⁴⁰ as follows:

$$VPD=\frac{1}{2}\cdot 0.6108\cdot \left(e^{\frac{17.269\cdot T_{\max}}{237.3+T_{\max}}}+e^{\frac{17.269\cdot T_{\min}}{237.3+T_{\min}}}\right)-VP \tag{1}$$

where $T_{\max}$ and $T_{\min}$ are the monthly maximum and minimum air temperature (°C), respectively, and $VP$ is vapor pressure (kPa). The tree height data were obtained from the global canopy height map developed by Lang et al.^36,42. These data were derived from the Global Ecosystem Dynamics Investigation space-borne LiDAR mission, complemented by dense optical satellite images from Sentinel-2, achieving a resolution of 10 m. Additionally, MODIS Land Cover Type (MCD12Q1 V6.1)⁴³ was employed to account for land cover changes. It is worth noting that the coverage of MCD12Q1 V6.1 extends from 2001 to 2022. Therefore, for the year 2000, we used the data from 2001. All details are shown in Table 1.
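As a minimal sketch, Eq. (1) can be evaluated as follows. The function names are hypothetical and chosen for illustration. Temperatures are in °C and pressures in kPa, as defined above.

```python
import math


def saturation_vapor_pressure(t_celsius):
    """Saturation vapor pressure (kPa) at air temperature t_celsius (°C)."""
    return 0.6108 * math.exp(17.269 * t_celsius / (237.3 + t_celsius))


def vapor_pressure_deficit(t_max, t_min, vp):
    """VPD (kPa) following Eq. (1).

    The mean of the saturation vapor pressures at the monthly maximum and
    minimum temperatures, minus the actual vapor pressure vp (kPa).
    """
    es_mean = 0.5 * (saturation_vapor_pressure(t_max)
                     + saturation_vapor_pressure(t_min))
    return es_mean - vp


# Example with made-up monthly values: T_max = 30 °C, T_min = 20 °C, VP = 2.0 kPa.
vpd_example = vapor_pressure_deficit(30.0, 20.0, 2.0)
```

Equation (1) itself imposes no lower bound, so the sketch does not clip negative results.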

Table 1 Details of the explanatory variables for training the machine learning model.

Given the scarcity of urban in-situ observational data for LAI, we chose satellite-based LAI as the target variable. In this study, the reprocessed MODIS LAI V6.1²⁰ was used as the target variable for the RF and LightGBM models. The reprocessed MODIS LAI V6.1 covers the period from 2000 to 2022, with a temporal resolution of 8 days and a spatial resolution of 500 m. It was derived from MCD15A2H V6.1⁴⁴ using the modified temporal spatial filter method to fill gaps and process low-quality data. Compared to the original MODIS LAI, the reprocessed data exhibit better continuity in both the temporal and spatial domains. They have also been validated with in-situ LAI observations and show good performance against various LAI reference maps.

To derive urban tree LAI, we carefully selected the input features and target data. First, we partitioned the global data into 5° × 5° regions. After excluding regions without land area, the global data were divided into 1447 regions. Subsequently, to better represent LAI in urban areas, we restricted our training data to natural trees surrounding urban regions, thereby excluding 5° × 5° regions without urban areas. This refinement ensured that only regions containing urban areas were retained, ultimately resulting in approximately 711 regions being selected as training areas. Finally, we simulated the LAI of natural trees in the 711 regions. Meteorological variables and tree height were employed as explanatory variables, with reprocessed MODIS LAI as the target. All samples were randomly divided into training (10%) and test (90%) groups. It is important to note that the samples were selected based on IGBP classifications 1–5, which represent evergreen needleleaf tree (NET), evergreen broadleaf tree (BET), deciduous needleleaf tree (NDT), deciduous broadleaf tree (BDT) and mixed forest (MF). Since our objective is to obtain the tree LAI in urban areas, the model was trained exclusively on tree LAI data. In addition, because the model has training error and was trained at a monthly scale, a simple moving average (SMA) method was used to buffer the fluctuations and prevent discontinuity of LAI values between adjacent months. The SMA equation is as follows:

$$LAI_{n}^{\prime}=\mathrm{SMA}\left(LAI_{n},N\right) \tag{2}$$

where $LAI_{n}$ is the LAI of month $n$ predicted by the RF model and $N$ is the moving window, which we set to 3. The final LAI of month $n$ ($LAI_{n}^{\prime}$) is therefore the average of $LAI_{n-1}$, $LAI_{n}$ and $LAI_{n+1}$. The model’s performance was evaluated at site (Table 2) and global scales using five error metrics: R², R, RMSE, MBE and MAE. These evaluations aimed to determine whether the model could accurately extrapolate urban tree LAI. If the model demonstrated adequate performance, it was then used to predict urban tree LAI. LAI reference maps were collected from GBOV⁴⁵, VALERI⁴⁶, and Boston University^47,48, all with IGBP land cover types 1–5, because we only train and predict LAI on grids of these types. It should be noted that all sites provide 8-day LAI and may include missing values within a month. Since the predicted LAI is monthly, we used all available data to calculate the monthly LAI of the reference map for comparison with the predicted LAI. After the urban tree LAI prediction is completed, it is used to calculate the urban tree Stem Area Index (SAI) using Zeng’s method⁴⁹. The iterative equation for each month of one year is as follows:

$$SAI^{n+1}=\max \left(SAI^{n}\cdot r_{\mathrm{tn}}+0.5\,LAI_{\mathrm{diff}},\ SAI^{n}\right) \tag{3}$$

where $r_{\mathrm{tn}}$ is the residual SAI retention (0.5), $LAI_{\mathrm{diff}}$ represents the LAI difference between two adjacent months, the superscript $n$ denotes the iteration step of SAI, and the initial value ($SAI^{0}$) is:

$$SAI^{0}=SAI^{\prime}\,\frac{LAI_{\max}}{LAI_{\max}^{\prime}} \tag{4}$$

where $LAI_{\max}$ is the maximum LAI value in a year, and $SAI^{\prime}$ and $LAI_{\max}^{\prime}$ are constants with values of 1 and 5.5 m²/m², respectively. The iteration stops when the sum of the differences in SAI values between two successive iterations across the 12 months is less than 1.0E-6. The methodology, including its key steps, is shown in Fig. 2.
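Two of the post-processing steps above are simple enough to sketch directly: the three-month moving average of Eq. (2) and the initial SAI of Eq. (4). The function names are hypothetical. The handling of the first and last months of the series is also an assumption, since the text does not state it; here those months average only the neighbours that exist. The iteration of Eq. (3) is not reproduced.

```python
def smooth_monthly_lai(lai, window=3):
    """Centered simple moving average over a monthly LAI series (Eq. 2, N = 3).

    Assumption of this sketch: at the two ends of the series, only the
    neighbouring months that exist are averaged. `window` should be odd.
    """
    half = window // 2
    smoothed = []
    for n in range(len(lai)):
        neighbours = lai[max(0, n - half): n + half + 1]
        smoothed.append(sum(neighbours) / len(neighbours))
    return smoothed


def initial_sai(lai_year, sai_ref=1.0, lai_max_ref=5.5):
    """Initial SAI of Eq. (4): SAI' * LAI_max / LAI'_max.

    lai_year is the list of monthly LAI values of one year; sai_ref and
    lai_max_ref are the constants SAI' = 1 and LAI'_max = 5.5 m²/m².
    """
    return sai_ref * max(lai_year) / lai_max_ref


# Made-up four-month series with a spike in the third month.
smoothed_example = smooth_monthly_lai([1.0, 2.0, 6.0, 2.0])
```

In the made-up series, the spike of 6.0 is reduced to the mean of 2.0, 6.0 and 2.0, which is the buffering effect described for adjacent months.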

Table 2 Characteristics of the 14 validation sites (Land Cover Type is derived from MCD12Q1).

Data Records

The dataset is available at Zenodo (https://doi.org/10.5281/zenodo.14709655)⁵⁰. The global LAI data are stored in NetCDF files. Each file contains the monthly LAI (and SAI) for all months of one year, with the data dimensions defined as “mon × lat × lon”. Data at different resolutions are provided in separate directories named global_XX, in which all filenames follow the convention “Global_UrbanTree_LAI_XX_YYYY.nc”, where XX represents the resolution of the data and YYYY indicates the year.
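The naming convention above can be turned into a small path helper, sketched below. The function name `lai_file_path` is hypothetical. The exact strings used for the resolution part (XX) are not given in the text, so the resolution tag is passed in by the caller rather than guessed.

```python
from pathlib import PurePosixPath


def lai_file_path(resolution_tag, year, root="."):
    """Build the path of one yearly file from the stated naming convention.

    resolution_tag stands for the "XX" part of the convention; its actual
    values must be read from the Zenodo record (not assumed here).
    """
    if not 2000 <= year <= 2022:
        raise ValueError("the dataset covers 2000-2022")
    name = f"Global_UrbanTree_LAI_{resolution_tag}_{year:04d}.nc"
    return PurePosixPath(root) / f"global_{resolution_tag}" / name


# Placeholder tag "XX" used literally for illustration.
example_path = lai_file_path("XX", 2020)
```

The resulting file can then be opened with any NetCDF reader. Variable names inside the files should be checked in the file metadata.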

Technical Validation

First, we evaluated the performance of the two machine learning models to determine the more accurate algorithm for predicting LAI. Figure 3a shows the performance of LightGBM and RF on the training and test datasets for each month. LightGBM exhibits a higher R² than RF on the training data, although the R² in training data is close to that of RF. However, LightGBM shows significant discrepancies between the test and training sets in certain months. In contrast, for RF, the values for the test and training sets remain relatively consistent throughout. For RMSE, the results are similar. We also compared the predicted global grid tree LAI for 2020. Overall, both models have a similar R (0.97) and MBE (around 0.1 m²/m²), but LightGBM has a lower RMSE (0.45 m²/m²) than RF (0.51 m²/m²). However, LightGBM may have an overfitting problem: in addition to the obvious differences between the training set and the test set, as shown in Fig. 3b, LightGBM occasionally predicts negative LAI values or values outside the training set range (0–7 m²/m²) for some 500 m grids. Therefore, the RF model was deemed more stable and was applied for LAI prediction.

Next, we evaluated the RF predicted LAI against MODIS V6.1 and all site observation LAI (Table 2). Figure 4a shows the validation result for the RF predicted and MODIS V6.1 LAI, indicating that RF predicted LAI exhibits high accuracy, with an R of 0.95, an RMSE of 0.64 m²/m², and an MAE of 0.49 m²/m². Moreover, when assessed against field LAI measurements, the LAI produced by the RF models also demonstrates high accuracy: the R reached 0.85 and the RMSE was approximately 1.03 m²/m². This result closely mirrors the validation results between MODIS V6.1 LAI and the reference map LAI (Fig. 4c), suggesting that the errors between the model and the site observations mainly come from the training data.
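For reference, the error metrics quoted in this section can be computed from paired samples as in the sketch below. Two conventions are assumptions, because the text does not define them: R² is taken as the square of the Pearson correlation R, and MBE is computed as predicted minus reference. The function name `error_metrics` is hypothetical.

```python
import math


def error_metrics(predicted, reference):
    """Return R, R², RMSE, MBE and MAE for paired LAI samples.

    Assumptions of this sketch: R² is the squared Pearson correlation,
    and the bias (MBE) is the mean of predicted minus reference.
    """
    n = len(predicted)
    if n != len(reference) or n < 2:
        raise ValueError("need two equally long series with at least 2 samples")

    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    diffs = [p - r for p, r in zip(predicted, reference)]

    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    r_value = cov / math.sqrt(var_p * var_r)

    return {
        "R": r_value,
        "R2": r_value ** 2,
        "RMSE": math.sqrt(sum(d * d for d in diffs) / n),
        "MBE": sum(diffs) / n,
        "MAE": sum(abs(d) for d in diffs) / n,
    }


# Made-up example: predictions uniformly 1 m²/m² below the reference.
metrics_example = error_metrics([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

The made-up example shows why R alone is insufficient: a constant offset leaves R at 1 while RMSE, MBE and MAE all reveal the bias.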

Figure 5 shows the validation of the temporal variations of the RF predicted LAI, MODIS V6.1 LAI and site observation LAI during 2000–2022. The time series show that the RF predicted LAI is consistent with MODIS V6.1 LAI and accurately captures both the seasonal cycles and annual changes. For the BDT cover, RF predicted LAI is quite similar to the site observation LAI and is smoother. For example, at SCBI (Fig. 5b) and HARV (Fig. 5e), RF predicted LAI usually fluctuates less or not at all. For the NET, BET and MF covers, although the reprocessing algorithm mitigated abrupt changes in the reprocessed MODIS V6.1 LAI, LAI may still fluctuate greatly between adjacent months, whereas the RF LAI shows a smoother pattern, especially for BET. The LAI reference map coincides well with both the RF predicted and MODIS V6.1 LAI values at the BDT cover sites, but at the MF cover sites, the RF predicted LAI and MODIS V6.1 LAI are generally larger than the observed values in summer. Moreover, Fig. 5l illustrates that it is feasible to train the model with climate explanatory variables and different target LAI, thereby enabling it to capture interannual changes in LAI: between 2019 and 2020, MODIS LAI at TUMB experienced a sudden decrease, and RF LAI exhibited a similar change.

The scatter plot of the 2020 monthly grid tree LAI from RF predictions and MODIS V6.1 is shown in Fig. 6a. RF predicted LAI also performs well at the global scale. The result indicates good agreement between RF predicted LAI and MODIS LAI, with an R of 0.97, an RMSE of 0.51 m²/m², and an MAE of 0.11 m²/m². We also evaluated the predicted LAI for each land cover type, as shown in Fig. 6b. The LAI of each land cover type was slightly overestimated at low values. However, the distribution of predicted LAI is consistent with that of MODIS. It should be noted that we trained a single model to predict LAI for all land cover types, which could result in biased predictions for specific types, such as NDT. However, our focus is on urban tree LAI, and the results show that the difference between the grid tree LAI predicted by RF and that from MODIS is within an acceptable range.

The heatmaps of R and RMSE between the monthly RF predicted and MODIS grid LAI from 2000 to 2022 are shown in Fig. 7. The R between these two LAI datasets ranges from 0.89 to 0.97 and the RMSE ranges from 0.36 to 0.64 m²/m². Although the model was trained using all the selected regions, the significant regional variations in the Northern Hemisphere during summer may have hindered the model from fully capturing all relevant features. Consequently, this season shows the highest RMSE and the lowest R values. Nevertheless, the result demonstrates that the RF model captures spatial and temporal variations in LAI well. It also shows that it is feasible to use target LAI values to represent annual variation even with limited training data. Based on the above results, we concluded that the RF model can accurately reproduce the spatiotemporal variation in LAI in a masked urban grid, thereby fulfilling the requirements for model simulation.

Owing to the lack of urban tree LAI observations, we assess the plausibility of the RF LAI data at urban sites sourced from Urban-Plumber2⁵¹. As shown in Table 3, our dataset demonstrates a reasonable magnitude and seasonal variation at these urban sites. At mid-to-high latitude urban sites dominated by deciduous broad-leaved and evergreen needle-leaved trees, the LAI generally peaks in summer, reaching up to 4.9 m²/m² at US-Minneapolis. For evergreen broad-leaved trees, e.g., at AU-Preston, no significant seasonal changes were found. In contrast, GLASS LAI generally has lower values and no significant seasonal changes at these sites, mainly because the high proportion of impervious surface and the low vegetation cover affect the reception and inversion of the relevant remote sensing band signals. For example, US-Minneapolis, with 0.38 tree cover and 0.21 impervious cover, exhibits a maximum GLASS LAI of approximately 3 m²/m². At the other locations, the maximum LAI does not exceed 1 m²/m², which is inconsistent with photographs of the sites. In addition, while urban tree LAI exhibits growth patterns similar to those of natural tree LAI, differences in magnitude exist, with the LAI of natural trees typically being slightly higher than that of urban trees.

Table 3 Temporal comparison of the monthly RF and GLASS LAI at 9 urban sites during 2020, accompanied by corresponding images of each site. RF and GLASS LAI are the LAI of the pixel (500 m grid) where the flux tower is located, and natural LAI is the weighted average of all natural tree LAI within the 0.5° grid where the flux tower is located.

Figure 8 shows the predicted tree LAI and the GLASS grid LAI, together with the distributions of their values. As shown in Fig. 8a,b, RF predicted LAI generally exhibits higher values in urban areas because it is calculated specifically for tree-covered areas. In contrast, satellite retrievals in mixed pixels such as urban areas may be affected by building occlusion and yield lower values. As demonstrated by Fig. 8c, GLASS LAI typically ranges from 1 to 2 m²/m² on a global scale, whereas RF predicted LAI shows a distinct seasonal variation, with values typically ranging from 3 to 5 m²/m² in summer and 1 to 3 m²/m² in winter. Moreover, deriving LAI for vegetation-covered areas by dividing grid LAI by the vegetation cover percentage (tree and grass cover) proves challenging (Fig. 8d): there was no significant change when the combined percentage of grass and trees was used, and an unrealistic LAI (>10 m²/m²) was obtained when only the tree cover was used.

Usage Notes

A correct lower boundary condition is crucial for weather and climate models. As LSMs have enhanced their spatial resolution to 9 km globally (e.g., ERA5-Land) or 1–2 km for specific regions (e.g., WRF), it is necessary to employ an appropriate urban parameterization scheme that differentiates between urban and rural areas and better represents the urban environment. However, such schemes require accurate global ancillary data, including building morphology and vegetation data in urban areas. Considerable effort has been devoted to producing global-scale building morphology data^52,53,54. As demonstrated by Zhao et al.⁵⁵, substantial urban afforestation has occurred across numerous cities, playing an important role in mitigating urban heat. In addition, previous studies have shown that integrating urban vegetation in UCMs is important for improving model performance^56,57,58,59. Therefore, it is necessary to represent the influence of trees in UCMs.

In this study, we used machine learning and remote sensing LAI data to generate an urban tree LAI dataset at 500 m resolution from 2000 to 2022 for UCMs. Over the last decade, the modeling community has made a great effort to improve and develop urban tree radiative models^7,14,16,34,60. However, LAI data suitable for meso- and large-scale simulations in UCMs remain scarce, and some studies can only use preset values^27,32. The 500 m urban tree LAI dataset generated in this study mainly aims to provide an essential dataset for urban land surface models, enabling them to simulate the influence of trees in urban environments across various temporal and spatial scales, particularly the shading effect. Additionally, this dataset can be used to evaluate urban greening and its impact on urban climate⁵⁵.

This dataset is highly recommended for users who wish to model trees in UCMs, especially at regional to global scales over long time periods. When using this dataset, users need to be clear that the LAI is defined only for tree-covered areas within urban areas. In other words, the urban tree LAI value represents the green leaf area per unit tree-covered surface area within a 500 m grid, excluding buildings and other impervious surfaces. However, users can also calculate the 500 m grid LAI using tree cover percentage data, depending on their specific requirements.
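The conversion from tree LAI to grid LAI mentioned above follows directly from the two definitions (leaf area per unit tree-covered area versus leaf area per unit ground area). A minimal sketch, assuming the tree cover is supplied as a fraction between 0 and 1 from a separate tree cover dataset, and with a hypothetical function name:

```python
def grid_lai_from_tree_lai(tree_lai, tree_cover_fraction):
    """Convert tree LAI to the tree contribution to grid LAI.

    tree_lai: leaf area per unit tree-covered area (m²/m²).
    tree_cover_fraction: tree-covered share of the grid cell, 0 to 1
    (divide a percentage by 100 first).
    """
    if not 0.0 <= tree_cover_fraction <= 1.0:
        raise ValueError("tree_cover_fraction must lie between 0 and 1")
    return tree_lai * tree_cover_fraction


# Made-up example: tree LAI of 4.0 m²/m² with 25% tree cover.
grid_lai_example = grid_lai_from_tree_lai(4.0, 0.25)
```

The result covers only the leaf area of trees. Leaf area from grass or other vegetation in the grid cell is not included in this product and would have to come from other data.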

In addition, users should be aware of the limitations of this dataset. First, it was generated using machine learning methods, which cannot always accurately model the variation in LAI, particularly for extreme values or so-called “tipping points”⁶¹. Second, we use the LAI of natural trees surrounding urban areas to extrapolate urban LAI. Although we incorporated tree height to account for differences between urban and natural trees and restricted the training area to resemble urban environments as closely as possible, data limitations, such as the availability of tree height data for only a single year, may introduce uncertainty into our model. Finally, the reprocessed MODIS LAI that serves as the target variable also introduces additional uncertainty into the machine learning process. Nevertheless, this dataset is produced for urban climate modeling at medium to large scales, and based on the accuracy assessment, it can be considered as input data for urban climate modeling.

All monthly 500 m, 0.05° and 0.5° resolution LAI data are stored in NetCDF files and can be accessed at https://doi.org/10.5281/zenodo.14709655⁵⁰. Additionally, users can request coarse-resolution data at their desired spatial resolution and temporal coverage directly from the corresponding author. It is worth noting that the dataset will be updated as new or improved input data become available.

Code availability

The codes used in this study are available at https://github.com/tungwz/RF_LAI.

References

Gillner, S., Vogt, J., Tharang, A., Dettmann, S. & Roloff, A. Role of street trees in mitigating effects of heat and drought at highly sealed urban sites. Landsc. Urban Plan. 143, 33–42, https://doi.org/10.1016/j.landurbplan.2015.06.005 (2015).
Hamada, S. & Ohta, T. Seasonal variations in the cooling effect of urban green areas on surrounding urban areas. Urban For. Urban Green. 9, 15–24, https://doi.org/10.1016/j.ufug.2009.10.002 (2010).
Wang, S., Hu, D., Chen, S. & Yu, C. A Partition Modeling for Anthropogenic Heat Flux Mapping in China. Remote Sens. 11, 1132, https://doi.org/10.3390/rs11091132 (2019).
Zhou, W., Wang, J. & Cadenasso, M. L. Effects of the spatial configuration of trees on urban heat mitigation: A comparative study. Remote Sens. Environ. 195, 1–12, https://doi.org/10.1016/j.rse.2017.03.043 (2017).
Bowler, D. E., Buyung-Ali, L., Knight, T. M. & Pullin, A. S. Urban greening to cool towns and cities: A systematic review of the empirical evidence. Landsc. Urban Plan. 97, 147–155, https://doi.org/10.1016/j.landurbplan.2010.05.006 (2010).
Wong, N. H., Tan, C. L., Kolokotsa, D. D. & Takebayashi, H. Greenery as a mitigation and adaptation strategy to urban heat. Nat. Rev. Earth Environ. 2, 166–181, https://doi.org/10.1038/s43017-020-00129-5 (2021).
Redon, E., Lemonsu, A. & Masson, V. An urban trees parameterization for modeling microclimatic variables and thermal comfort conditions at street level with the Town Energy Balance model (TEB-SURFEX v8.0). Geosci. Model Dev. 13, 385–399, https://doi.org/10.5194/gmd-13-385-2020 (2020).
Giometto, M. G. et al. Effects of trees on mean wind, turbulence and momentum exchange within and above a real urban environment. Adv. Water Resour. 106, 154–168, https://doi.org/10.1016/j.advwatres.2017.06.018 (2017).
Parker, J. H. Landscaping to Reduce the Energy Used in Cooling Buildings. J. For. 81, 82–105, https://doi.org/10.1093/jof/81.2.82 (1983).
Tsoka, S., Leduc, T. & Rodler, A. Assessing the effects of urban street trees on building cooling energy needs: The role of foliage density and planting pattern. Sustain. Cities Soc. 65, 102633, https://doi.org/10.1016/j.scs.2020.102633 (2021).
Rahman, M. A. et al. A comparative analysis of urban forests for storm-water management. Sci. Rep. 13, 1451, https://doi.org/10.1038/s41598-023-28629-6 (2023).
Xiao, Q. & McPherson, E. G. Rainfall interception by Santa Monica’s municipal urban forest. Urban Ecosyst. 6, 291–302, https://doi.org/10.1023/B:UECO.0000004828.05143.67 (2002).
Xiao, Q. & McPherson, E. G. Rainfall interception of three trees in Oakland, California. Urban Ecosyst. 14, 755–769, https://doi.org/10.1007/s11252-011-0192-5 (2011).
Krayenhoff, E. S. et al. A multi-layer urban canopy meteorological model with trees (BEP-Tree): Street tree impacts on pedestrian-level climate. Urban Clim. 32, 100590, https://doi.org/10.1016/j.uclim.2020.100590 (2020).
Lee, S.-H. & Park, S.-U. A Vegetated Urban Canopy Model for Meteorological and Environmental Modelling. Bound.-Layer Meteorol. 126, 73–102, https://doi.org/10.1007/s10546-007-9221-6 (2007).
Wang, C., Wang, Z.-H. & Ryu, Y.-H. A single-layer urban canopy model with transmissive radiation exchange between trees and street canyons. Build. Environ. 191, 107593, https://doi.org/10.1016/j.buildenv.2021.107593 (2021).
Fang, H., Baret, F., Plummer, S. & Schaepman-Strub, G. An Overview of Global Leaf Area Index (LAI): Methods, Products, Validation, and Applications. Rev. Geophys. 57, 739–799, https://doi.org/10.1029/2018RG000608 (2019).
Zhu, Z. et al. Global Data Sets of Vegetation Leaf Area Index (LAI)3g and Fraction of Photosynthetically Active Radiation (FPAR)3g Derived from Global Inventory Modeling and Mapping Studies (GIMMS) Normalized Difference Vegetation Index (NDVI3g) for the Period 1981 to 2011. Remote Sens. 5, 927–948, https://doi.org/10.3390/rs5020927 (2013).
Cao, S. et al. Spatiotemporally consistent global dataset of the GIMMS leaf area index (GIMMS LAI4g) from 1982 to 2020. Earth Syst. Sci. Data 15, 4877–4899, https://doi.org/10.5194/essd-15-4877-2023 (2023).
Lin, W. et al. Reprocessed MODIS Version 6.1 Leaf Area Index Dataset and Its Evaluation for Land Surface and Climate Modeling. Remote Sens. 15, 1780, https://doi.org/10.3390/rs15071780 (2023).
Yuan, H., Dai, Y., Xiao, Z., Ji, D. & Shangguan, W. Reprocessing the MODIS Leaf Area Index products for land surface and climate modelling. Remote Sens. Environ. 115, 1171–1187, https://doi.org/10.1016/j.rse.2011.01.001 (2011).
Ma, H. & Liang, S. Development of the GLASS 250-m leaf area index product (version 6) from MODIS data using the bidirectional LSTM deep learning model. Remote Sens. Environ. 273, 112985, https://doi.org/10.1016/j.rse.2022.112985 (2022).
Xiao, Z. et al. Use of General Regression Neural Networks for Generating the GLASS Leaf Area Index Product From Time-Series MODIS Surface Reflectance. IEEE Trans. Geosci. Remote Sens. 52, 209–223, https://doi.org/10.1109/TGRS.2013.2237780 (2014).
Liu, Y., Liu, R. & Chen, J. M. Retrospective retrieval of long-term consistent global leaf area index (1981–2011) from combined AVHRR and MODIS data. J. Geophys. Res. Biogeosciences 117, https://doi.org/10.1029/2012JG002084 (2012).
Baret, F. et al. GEOV1: LAI and FAPAR essential climate variables and FCOVER global time series capitalizing over existing products. Part1: Principles of development and production. Remote Sens. Environ. 137, 299–309, https://doi.org/10.1016/j.rse.2012.12.027 (2013).
Verger, A., Baret, F. & Weiss, M. Near Real-Time Vegetation Monitoring at Global Scale. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 7, 3473–3481, https://doi.org/10.1109/JSTARS.2014.2328632 (2014).
Lee, S.-H. et al. Impacts of in-canyon vegetation and canyon aspect ratio on the thermal environment of street canyons: numerical investigation using a coupled WRF-VUCM model. Q. J. R. Meteorol. Soc. 142, 2562–2578, https://doi.org/10.1002/qj.2847 (2016).
Alonzo, M., Bookhagen, B., McFadden, J. P., Sun, A. & Roberts, D. A. Mapping urban forest leaf area index with airborne lidar using penetration metrics and allometry. Remote Sens. Environ. 162, 141–153, https://doi.org/10.1016/j.rse.2015.02.025 (2015).
Klingberg, J., Konarska, J., Lindberg, F., Johansson, L. & Thorsson, S. Mapping leaf area of urban greenery using aerial LiDAR and ground-based measurements in Gothenburg, Sweden. Urban For. Urban Green. 26, 31–40, https://doi.org/10.1016/j.ufug.2017.05.011 (2017).
Järvi, L., Grimmond, C. S. B. & Christen, A. The Surface Urban Energy and Water Balance Scheme (SUEWS): Evaluation in Los Angeles and Vancouver. J. Hydrol. 411, 219–237, https://doi.org/10.1016/j.jhydrol.2011.10.001 (2011).
Järvi, L. et al. Development of the Surface Urban Energy and Water Balance Scheme (SUEWS) for cold climate cities. Geosci. Model Dev. 7, 1691–1711, https://doi.org/10.5194/gmd-7-1691-2014 (2014).
Li, X.-X. & Liu, X. Effect of tree evapotranspiration and hydrological processes on urban microclimate in a tropical city: A WRF/SLUCM study. Urban Clim. 40, 101009, https://doi.org/10.1016/j.uclim.2021.101009 (2021).
Loughner, C. P. et al. Roles of Urban Tree Canopy and Buildings in Urban Heat Island Effects: Parameterization and Preliminary Results. J. Appl. Meteorol. Climatol. 51, 1775–1793, https://doi.org/10.1175/JAMC-D-11-0228.1 (2012).
Mussetti, G. et al. COSMO-BEP-Tree v1.0: a coupled urban climate model with explicit representation of street trees. Geosci. Model Dev. 13, 1685–1710, https://doi.org/10.5194/gmd-13-1685-2020 (2020).
Fick, S. E. & Hijmans, R. J. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. Int. J. Climatol. 37, 4302–4315, https://doi.org/10.1002/joc.5086 (2017).
Lang, N., Jetz, W., Schindler, K. & Wegner, J. D. A high-resolution canopy height model of the Earth. Nat. Ecol. Evol. 7, 1778–1789, https://doi.org/10.1038/s41559-023-02206-6 (2023).
Wang, C., Wu, Q., Weimer, M. & Zhu, E. FLAML: a fast and lightweight AutoML library. in Fourth Conference on Machine Learning and Systems (MLSys 2021), https://www.microsoft.com/en-us/research/publication/flaml-a-fast-and-lightweight-automl-library (2021).
Breiman, L. Random Forests. Mach. Learn. 45, 5–32, https://doi.org/10.1023/A:1010933404324 (2001).
Ke, G. et al. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. in Advances in Neural Information Processing Systems vol. 30 (Curran Associates, Inc., 2017).
Zhang, Z., Xin, Q. & Li, W. Machine Learning-Based Modeling of Vegetation Leaf Area Index and Gross Primary Productivity Across North America and Comparison With a Process-Based Model. J. Adv. Model. Earth Syst. 13, e2021MS002802, https://doi.org/10.1029/2021MS002802 (2021).
Yuan, Y., Wang, X., Yin, F. & Zhan, J. Examination of the Quantitative Relationship between Vegetation Canopy Height and LAI. Adv. Meteorol. 2013, 1–6, https://doi.org/10.1155/2013/964323 (2013).
Lang, N., Schindler, K. & Wegner, J. D. ETH_GlobalCanopyHeight_10m_2020_version1: Global canopy height map for the year 2020 derived from Sentinel-2 and GEDI. ETH Zurich https://doi.org/10.3929/ethz-b-000609802 (2022).
Friedl, M. & Sulla-Menashe, D. MODIS/Terra+Aqua Land Cover Type Yearly L3 Global 500m SIN Grid V061. NASA EOSDIS Land Processes Distributed Active Archive Center https://doi.org/10.5067/MODIS/MCD12Q1.061 (2022).
Myneni, R., Knyazikhin, Y. & Park, T. MODIS/Terra+Aqua Leaf Area Index/FPAR 8-Day L4 Global 500m SIN Grid V061. NASA EOSDIS Land Processes Distributed Active Archive Center https://doi.org/10.5067/MODIS/MCD15A2H.061 (2021).
Bai, G. et al. GBOV (Ground-Based Observation for Validation): A Copernicus Service for Validation of Vegetation Land Products. in IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium 4592–4594, https://doi.org/10.1109/IGARSS.2019.8898634 (2019).
Baret, F. et al. VALERI: a network of sites and a methodology for the validation of medium spatial resolution land satellite products. Remote Sens. Environ. 76, 36–39 (2005).
Yang, W. et al. MODIS leaf area index products: From validation to algorithm improvement. IEEE Trans. Geosci. Remote Sens. 44, 1885–1898 (2006).
Wang, Y. et al. Evaluation of the MODIS LAI algorithm at a coniferous forest site in Finland. Remote Sens. Environ. 91, 114–127, https://doi.org/10.1016/j.rse.2004.02.007 (2004).
Zeng, X., Shaikh, M., Dai, Y., Dickinson, R. E. & Myneni, R. Coupling of the Common Land Model to the NCAR Community Climate Model. J. Clim. 15, 1832–1854, https://doi.org/10.1175/1520-0442(2002)015<1832:COTCLM>2.0.CO;2 (2002).
Dong, W. et al. Global urban tree LAI/SAI dataset for urban climate modeling. Zenodo https://doi.org/10.5281/zenodo.14709655 (2024).
Lipson, M. et al. Harmonized gap-filled datasets from 20 urban flux tower sites. Earth Syst. Sci. Data 14, 5157–5178, https://doi.org/10.5194/essd-14-5157-2022 (2022).
Demuzere, M. et al. A global map of local climate zones to support earth system modelling and urban-scale environmental science. Earth Syst. Sci. Data 14, 3835–3873, https://doi.org/10.5194/essd-14-3835-2022 (2022).
European Commission. Joint Research Centre. GHSL Data Package 2023. (Publications Office, LU, 2023).
Li, M., Wang, Y., Rosier, J. F., Verburg, P. H. & van Vliet, J. Global maps of 3D built-up patterns for urban morphological analysis. Int. J. Appl. Earth Obs. Geoinformation 114, 103048, https://doi.org/10.1016/j.jag.2022.103048 (2022).
Zhao, J., Zhao, X., Wu, D., Meili, N. & Fatichi, S. Satellite-based evidence highlights a considerable increase of urban tree cooling benefits from 2000 to 2015. Glob. Change Biol. 29, 3085–3097, https://doi.org/10.1111/gcb.16667 (2023).
Grimmond, C. S. B. et al. The International Urban Energy Balance Models Comparison Project: First Results from Phase 1. J. Appl. Meteorol. Climatol. 49, 1268–1292, https://doi.org/10.1175/2010JAMC2354.1 (2010).
Grimmond, C. S. B. et al. Initial results from Phase 2 of the international urban energy balance model comparison. Int. J. Climatol. 31, 244–272, https://doi.org/10.1002/joc.2227 (2011).
Jongen, H. J. et al. The Water Balance Representation in Urban-PLUMBER Land Surface Models. J. Adv. Model. Earth Syst. 16, e2024MS004231, https://doi.org/10.1029/2024MS004231 (2024).
Lipson, M. J. et al. Evaluation of 30 urban land surface models in the Urban-PLUMBER project: Phase 1 results. Q. J. R. Meteorol. Soc. 150, 126–169, https://doi.org/10.1002/qj.4589 (2024).
Krayenhoff, E. S., Christen, A., Martilli, A. & Oke, T. R. A Multi-layer Radiation Model for Urban Neighbourhoods with Trees. Bound.-Layer Meteorol. 151, 139–178, https://doi.org/10.1007/s10546-013-9883-1 (2014).
Li, Q. et al. A 1km daily soil moisture dataset over China using in situ measurement and machine learning. Earth Syst. Sci. Data 14, 5267–5286, https://doi.org/10.5194/essd-14-5267-2022 (2022).

Acknowledgements

This research was funded by the Guangdong Major Project of Basic and Applied Basic Research (2021B0301030007), the Natural Science Foundation of China (under Grants U2342227, 42075160, and 42088101), the Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai) (No. SML2023SP216), and the specific research fund of the Innovation Platform for Academicians of Hainan Province (YSPTZX202143). We thank the VALERI project and the Boston University dataset for providing LAI reference maps on their websites. We also thank C. Meier, J. Dash, L.A. Brown and H. Morris, and the NEON and GBOV networks, for the field measurements used to generate GBOV products.

Author information

Authors and Affiliations

Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Guangdong Province Key Laboratory for Climate Change and Natural Disaster Studies, School of Atmospheric Sciences, Sun Yat-sen University, Zhuhai, China
Wenzong Dong, Hua Yuan, Wanyi Lin, Zhuo Liu, Jiayi Xiang, Zhongwang Wei, Lu Li & Yongjiu Dai
College of Computer Science and Technology, Changchun Normal University, Changchun, China
Qingliang Li

Contributions

Conceptualization, H.Y.; methodology, W.D. and H.Y.; validation, W.D. and H.Y.; formal analysis, W.D. and H.Y.; writing—original draft preparation, W.D.; writing—review and editing, H.Y., W.D., W.L., Z.L., J.X. and Y.D.; visualization, W.D.; funding acquisition, Y.D. and H.Y. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Hua Yuan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Dong, W., Yuan, H., Lin, W. et al. A global urban tree leaf area index dataset for urban climate modeling. Sci Data 12, 426 (2025). https://doi.org/10.1038/s41597-025-04729-y

Received: 29 July 2024
Accepted: 28 February 2025
Published: 12 March 2025
DOI: https://doi.org/10.1038/s41597-025-04729-y
