Abstract
Urban flooding threatens urban resilience and challenges SDGs 11 and 13. This study assesses urban building flood risk in Guangzhou by integrating flood susceptibility with building function vulnerability. Using a Random Forest (RF) model, it predicts flood susceptibility based on flood records, hydrological, topographical, and anthropogenic features. The Categorical Boosting (CatBoost) model identifies building functions using POI and AOI data. Results reveal significant spatial variations: central districts exhibit higher flood susceptibility, while peripheral areas remain less affected. Over half of the buildings are moderately vulnerable, with only a small fraction highly vulnerable. Based on flood susceptibility and functional vulnerability, Guangzhou is classified into three district types: central urban (Type I), intermediate urban (Type II), and suburban/rural (Type III). The study underscores the need for tailored flood risk management strategies to address these disparities and mitigate climate change-induced water hazards.
Similar content being viewed by others
Introduction
Urban flooding—which occurs when rainfall overwhelms drainage capacity—is an increasingly pressing challenge for global cities, especially with the compounding effects of climate change1,2. The primary causes of flooding include rapid urbanization, which increases impervious surfaces and reduces natural absorption, as well as inadequate drainage infrastructure that struggles to handle intense precipitation3. These conditions are further exacerbated by climate change, which drives more frequent rainfall events and alters storm patterns4.
The impacts of urban flooding are extensive, causing personal injuries, significant property damage, and disruptions to essential urban infrastructure such as transportation, healthcare, and water services5. These consequences pose substantial barriers to achieving the Sustainable Development Goals (SDGs), notably SDG 11.5 and 13.1, for resilient actions to combat climate change6,7,8. This study spares time on building risk caused by urban flooding and explores potential coping measures.
Previous studies focus on the direct impacts of flooding on building structures and the resulting economic losses9,10,11. However, research on the disruption of building functions (or functional disruptions for short) caused by flooding and its socio-economic implications remains relatively limited5,12. These oversights may underestimate the long-term socio-economic impacts of flooding6, misallocate resources, exacerbate societal burdens in the aftermath of floods11, and hinder the rapid recovery of critical facilities, such as hospitals and shelters, during floods13,14.
In the field of flood risk management and disaster reduction, physically-based models remain a predominant approach for simulating the physical processes contributing to flooding, such as rainfall, runoff, and infiltration. These models are grounded in the fundamental principles of hydrology and physics, offering detailed simulations that enhance our understanding of flood dynamics. Common examples include the MIKE Flood Model, PCSWMM 2D, and HEC-RAS. While these models are highly effective for understanding flood events, they require extensive calibration data and substantial computational resources, which can limit their applicability in regions with inadequate data infrastructure, particularly in developing countries15,16. Furthermore, these models are not well-suited for flood susceptibility modeling in large-scale regions17,18.
Machine learning (ML) methods, on the other hand, are emerging as a promising alternative in flood risk management and disaster reduction19,20. Unlike physically-based models, ML approaches leverage algorithms to automatically learn patterns from historical data, such as rainfall, land use, and flood records21. Popular Algorithms like Random Forest (RF)22,23, Support Vector Machines (SVM)24,25, and CatBoost26,27 are widely applied to predict flood susceptibility. Specifically, previous studies have shown that the RF performs better than methods like LightGBM, Artificial Neural Networks (ANN), and SVM in flood susceptibility mapping16,28.
Rivals of these ML models are the Deep Learning (DL) methods. For example, Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are commonly used in flood susceptibility analysis. The drawbacks of these DL methods include significant computational power, massive historical data, and the complexity of multiple hyperparameters16. These limitations make DL techniques less practical in scenarios where data and computational resources are constrained.
Considering the resource-intensive nature of DL methods and the practical constraints in the analyzed case, this study relies on ML techniques. They offer a balanced solution with limited data availability, combining efficiency and reliability without the need for extensive datasets or complex architectures.
In China, the use of machine learning in flood risk and disaster management research has grown significantly29,30,31. One research area is to testify the performance of single and hybrid ML models in the flood susceptibility analysis. For instance, Yao et al. (2022) compared six ML models (linear regression, K-nearest neighbors, support vector machine, random forest, stacking ensemble models, and blending ensemble models), and used them to evaluate the flash flood susceptibility in Jiangxi32. Another prominent direction involves using new data sources to improve flood susceptibility mapping, addressing the limitations of sparse and insufficient flood inventory data from historical records and field surveys. Li et al. (2023) addressed data scarcity issues in Chengdu by leveraging Sina Weibo data, which allowed for urban flood susceptibility assessment and monitoring33. Fang et al. (2022) incorporated the simulation results of the hydrodynamic model (HEC-HMS/RAS) and rapid flood model (Height Above the Nearest Drainage, HAND) into the RF model to enhance the training and validation procedures for the flood susceptibility assessment of Xinluo watershed18.
Despite these attempts in ML techniques, studies exploring the broader applications of flood susceptibility mapping remain limited in China. Some global studies have utilized susceptibility maps to assess flood risk for critical infrastructures. For instance, Yousefi et al. (2020) and Pourghasemi et al., (2021) evaluated the flood risk to schools and other infrastructures in Iran34,35. Nkeki et al. (2022) examined the urban infrastructural susceptibility of bridges and culverts, road networks, and buildings in Nigeria36. Similarly, Ruiz and Rajib (2022) developed a flood susceptibility index for critical infrastructure in US cities37. In this context, our research seeks to bridge this gap by exploring building flood risk based on susceptibility maps, offering a Chinese case study.
Guangzhou is a deltaic megacity, long been severely affected by flooding, which not only poses a threat to the safety of its residents but also creates ongoing challenges for infrastructure and economic development4,38,39. Although various policy innovations have been implemented since 2000, including the extensive promotion of green infrastructure and the enhancement of urban drainage systems40,41, the importance of building functionality has consistently been overlooked. This phenomenon is common across most cities in China. The findings of this research enrich the knowledge on how different urban elements contribute to city resilience, and how adaptation strategies are designed42,43,44.
Results
Multicollinearity analysis and sensitivity analysis
Figure 1 shows the pairwise correlations between flood conditioning features. The Pearson coefficients for all tested features are under the threshold of 0.8. It indicates that all flood conditioning features are independent, thereby further utilizing the RF model for the FSM.
For sensitivity analysis, we used feature importance from the RF model to evaluate the influence of various factors on flooding (Fig. 2). Land use and land cover, NDVI, and distance from roads emerge as the most influential factors. In contrast, features such as elevation, building density (NDBI), and distance from culverts and interchanges exhibit moderate influence. Other features, including distance to rivers, slope, plan curvature, and annual average daily rainfall, show comparatively low feature importance.
Figure 3 illustrates the cumulative variance explained by the first 15 principal components (PCs) derived from the PCA process. It demonstrates that the first 15 principal components cumulatively explain over 80% of the variance in the original dataset, ensuring that these components effectively capture the most significant information.
The sensitivity analysis of the CatBoost model was conducted using the feature importance derived from the principal components (PCs). Figure 4 displays the feature importance of the 15 principal components (PCs) as used by the CatBoost model for classifying building functions. The x-axis represents the feature importance score, while the y-axis lists the 15 PCs. PC1, PC2, and PC3 are the most important components, with feature importance scores exceeding 8. These PCs capture the dominant patterns in the data and are crucial for accurately identifying building functions. PC4, PC5, and PC6 show moderate importance, with scores between 5 and 7. These components contribute secondary but still significant information for the model’s predictions. PC10 through PC15 have relatively lower importance, with scores below 4. These PCs account for minor patterns in the data that have a limited impact on the model’s performance.
Urban Flood Susceptibility
Figure 5 presents the distinct spatial distribution of flood susceptibility in Guangzhou across the urban fabrics (resolution: 100 m × 100 m). The “Very High” susceptibility zones are predominantly situated in Guangzhou’s central urban districts, such as Liwan, Yuexiu, Tianhe, Haizhu, and Baiyun districts. The “High” and “Moderate” susceptibility areas are commonly aligned with major road networks. In contrast, the “Low” and “Very low” susceptibility regions are mainly found in Guangzhou’s peripheral northern and eastern districts, such as the Conghua and Zengcheng districts.
Table 1 shows the area and percentage of each urban flood susceptibility level. A significant 61.45% of the area falls within the “Very Low” susceptibility category, demonstrating areas that are least likely to experience urban flooding under normal circumstances. The “Low” and “Moderate” categories encompass 12.39% and 9.13% of the area, respectively, indicating regions where flood probabilities are present but less extensive. Notably, the “High” and “Very High” susceptibility levels, while comprising just over 8.51% and 8.52%, respectively.
“Very High” susceptibility districts are characterized by their dense urban development and substantial population clusters, which can exacerbate the risk and consequences of flooding events. This linear distribution of “Moderate” susceptibility areas along thoroughfares could be indicative of the areas where water tends to accumulate during heavy rainfall, due to the impervious surfaces and possible inadequacies in drainage systems. The “Low” and “Very low” areas are characterized by less intensive development and lower population densities and might benefit from natural drainage basins and more permeable land surfaces, which would reduce flood susceptibility.
Building function vulnerability
Figure 6 presents the map of building functions in Guangzhou. Table 2 provides statistics on the number and proportion of buildings for each function and each function vulnerability. The “Low Vulnerability” category includes 637,966 commercial buildings, accounting for 35.64% of the total, 55,491 industrial buildings, comprising 3.10%, and 1211 cultural buildings, which represent 0.07%. The combined sum for this category is 694,668 buildings, making up 38.81% of the total. Residential buildings fall under the “Moderate Vulnerability” category, with a total count of 1,077,961, representing the largest proportion at 60.22%. The “High Vulnerability” category includes 552 public service buildings (0.03%), 3057 healthcare buildings (0.17%), and 13,903 educational buildings (0.85%). The total count for “High Vulnerability” buildings is 17,512, which constitutes 0.98% of the total.
Figure 7 shows the result of building function vulnerability. Figure 8 shows the proportion of buildings of each level of function vulnerability in each district. The spatial distribution of building function vulnerability reveals significant variations across Guangzhou’s districts. Low-vulnerability buildings account for the majority in districts like Yuexiu (64.75%), Liwan (64.67%), and Baiyun (54.66%). Moderate vulnerability buildings (primarily residential buildings) are most prevalent in suburban and less urbanized areas, such as Conghua (83.66%), Nansha (70.62%), and Zengcheng (70.26%). High-vulnerability buildings, including hospitals, schools, and other essential public infrastructure, are rare across all districts, with slightly higher concentrations in Tianhe (4.32%) and Yuexiu (4.10%). These districts may host a greater number of critical public facilities, increasing their vulnerability to flood impacts.
Building flood risk
Figure 9 shows the result of building flood risk. The flood risk levels of buildings are determined using the natural breaks method. This method seeks to minimize the variance within classes and maximize the variance between classes. Table 3 describes the statistical results of building flood risk. Buildings falling under the “Very Low” and “Low” risk categories account for 13.66% and 24.16%, respectively. This suggests that most buildings are situated in low susceptibility zones or designed with less vulnerable functions. Conversely, the “Very High” risk categories encompass 15.99% of the total buildings. These buildings are likely in flood-prone locations or perform functions critical enough that even minor flooding could result in substantial consequences.
Figure 10 shows the percentage of buildings in each flood risk class at the district level. Districts such as Conghua, Nansha, and Zengcheng have a significant proportion of buildings in the very low and low-risk categories (Fig. 10a and b). For instance, Conghua has 34.38% of its buildings in the very low-risk category and 26.44% in the low-risk category, making a combined total of over 60%. Similarly, Zengcheng has 25.80% of buildings at very low risk and 27.22% at low risk, indicating a lower overall flood risk in these districts.
Radar charts show the proportion of buildings under five flood risk levels across 11 administrative districts in Guangzhou: very low risk (a), low risk (b), moderate risk (c), high risk (d), and very high risk (e). Values on each axis represent the percentage of buildings at that specific risk level within the total buildings in each district. Line colors correspond to the five risk levels, ranging from light yellow (very low) to dark red (very high).
The proportion of moderate-risk buildings is highest in Yuexiu and Liwan (Fig. 10c), accounting for 56.83% and 55.07%, respectively. Districts like Haizhu, Tianhe, and Panyu exhibit higher proportions of buildings in the high and very high-risk categories (Fig. 10e and f). Haizhu stands out with 36.28% of its buildings categorized as very high risk and an additional 10.45% at high risk. Tianhe also shows a considerable proportion of high-risk buildings, with 15.13% at high risk and 30.87% at very high risk. Panyu has 19.67% of its buildings at high risk and 30.61% at very high risk.
This study categorizes Guangzhou’s districts into three types based on the flood susceptibility and functional vulnerability of buildings at very high and high risk. These types are central urban districts (Type I), intermediate urban districts (Type II), and suburban/rural districts (Type III) (Fig. 11). The primary distinction among these types lies in the flood susceptibility range where moderate and high function vulnerability are concentrated. In central urban districts (Tianhe, Liwan, Yuexiu, Haizhu), buildings exhibit the highest flood susceptibility, with values concentrated in the 0.9–1.0 range. Intermediate urban districts (Baiyun, Huangpu, Panyu) show a broader flood susceptibility range, from 0.7 to 0.9, where buildings with moderate to high functional vulnerability face similar levels of risk. In suburban and rural districts (Nansha, Conghua, Huadu, Zengcheng), flood susceptibility is moderate, with values primarily concentrated in the 0.6–0.8 range.
Validation
The RF model for urban flood susceptibility mapping demonstrates high accuracy, with a score of 0.9132, further validation through the ROC curve, and an AUC value of 0.9494, as depicted in Fig. 12a. The training accuracy is 0.9627. The 5-fold cross-validation scores for the model are as follows: 0.9179, 0.9254, 0.9254, 0.8731, 0.9328, with a mean cross-validation accuracy of 0.9149. Given these metrics, we assess the model’s predictive performance as adequate for the intended application.
The CatBoost model for building function identification shows relatively high accuracy, with a score of 0.8561. The ROC curves and AUC values for each class are shown in Fig. 12b. The accuracy is as follows: commercial class 0.8333, residential class 0.8595, industrial class 0.8451, and educational class 0.8455. The training accuracy is 0.9320. The 5-fold cross-validation scores for the model are as follows: 0.9235 0.9239, 0.9242, 0.9248, 0.9230, with a mean cross-validation accuracy of 0.9239. This suggests that the model is well-generalized and does not exhibit significant overfitting. We randomly sampled the identified building functions and compared them with satellite imagery and Baidu Street View observations. The analysis revealed a strong alignment between the predicted functions and the actual building uses (Supplementary Table 3).
Discussion
We propose some measures for high-risk and very high-risk buildings, incorporating four strategies: avoidance, resistance, resilience, and reparability12,45,46 (Table 4). Avoidance focuses on reducing exposure by situating buildings in low-risk zones, elevating structures, or employing physical barriers. Resistance emphasizes measures to prevent floodwater from entering buildings, such as installing flood doors and barriers or using waterproof materials. Resilience aims to mitigate long-term damages by incorporating features that facilitate quick drying, cleaning, and re-occupancy while maintaining structural stability. Reparability, as a subset of resilience, highlights the importance of designing elements that are easy to repair or replace, such as modular fixtures or sacrificial finishes.
This study makes significant contributions to the existing literature on urban flood risk assessment and aligns directly with the goals of Sustainable Development Goals (SDGs) 11.5 and 13.1, which emphasize reducing the adverse impacts of natural disasters and enhancing adaptive capacity to climate-related hazards. By adopting Random Forest (RF) and CatBoost models for urban flood analysis, the study broadens the application of machine learning (ML) techniques in building flood risk modeling. It is considered as a field where applications remain relatively limited21,47,48.
A notable innovation in this study is the development of a novel approach to defining non-flooded points for the RF model. Specifically, non-flooded points are identified as those beyond the three-quarter mark of the shortest distances between all pairs of flooded points. This criterion significantly enhances model performance (Table 5). This methodological improvement addresses key challenges in flood susceptibility modeling and sets a new standard for similar analyses.
In terms of building function identification, previous studies have often relied on urban land-use data or street-view imagery to infer functions49,50. While Point of Interest (POI) and Area of Interest (AOI) data have been explored, their full potential remains untapped. Existing research has primarily integrated POI, AOI, and building geometry with morphology to infer building functions51,52. This study expands the utilization of POI and AOI data by employing the CatBoost model to infer building functions, particularly in areas where POI and AOI data are sparse.
Additionally, this study advances the understanding of building vulnerability grading in China by drawing inspiration from the United Kingdom. Traditionally, building vulnerability classifications in China have primarily focused on structural and material performance during earthquakes53. To address the gap in flood-related vulnerability assessments, this research adapts and integrates the UK’s National Planning Policy Framework54, using its highest standards as a reference to assign vulnerability values.
The proposed methodology is adaptable to other cities. For the flood susceptibility model, the method for defining non-flooded points can be directly applied in cities with flood records. For building function identification, the integration of POI and AOI data provides a practical framework, as many cities have access to similar datasets or can leverage publicly available resources.
Two points should be cautious in the broader applicability of this methodology. First, flood records may introduce systematic bias, as they are typically obtained from official monitoring stations that are often unevenly distributed. These stations are usually concentrated in urban centers, with limited coverage in suburban or rural areas. Future studies could address this limitation by incorporating additional data sources, such as news reports or social media. Second, feature selection requires local knowledge, including specific building usage patterns, which may vary significantly across different cities.
One limitation of this study lies in the methodology. It includes technical validation and field-based verification. Field validation involved randomly sampling building functions and comparing them with satellite imagery and Baidu Street View. However, the availability of street view data limited the scope of validation, which may leave some uncertainty.
Future research could explore several directions to further advance this study. One promising avenue is to investigate the performance of alternative models, such as deep learning techniques, in flood susceptibility mapping and building function identification, to evaluate their potential advantages and limitations compared to traditional machine learning methods. Additionally, incorporating socioeconomic indicators, such as income levels, housing quality, accessibility to services, and coping capacity metrics, could enhance the comprehensiveness of building function vulnerability models. Another critical area for exploration is the integration of future climate scenarios by including projections of land use and rainfall patterns, enabling more dynamic and time-sensitive flood risk evaluations. Finally, expanding validation efforts by conducting method validation across multiple cities with diverse urban and environmental contexts, supplemented with field-based validation, could improve the robustness and generalizability of the proposed approach.
To conclude, this paper examines building flood risk through the lens of machine learning models and explores how these assessments can inform effective risk management strategies. It provides a comprehensive evaluation of flood risk in Guangzhou by integrating urban floodsusceptibility with building function vulnerability. Leveraging machine learning modelsincluding Random Forest (RF) and CatBoost, the study combines flood records,hydrological and topographical features, human activities, building footprints, and AOl andPOl data to deliver a nuanced analysis of building flood risk.
Key findings from this research offer several important insights for urban flood risk management: (1) The analysis shows significant spatial differences in flood susceptibility across Guangzhou. Central districts, with their high population density and intensive development, exhibit “Very High” flood susceptibility, while peripheral districts generally show “Very Low” susceptibility. (2) The study also reveals that most buildings fall into the moderate function vulnerability category, with less than 2% classified as highly vulnerable. (3) Based on the flood susceptibility and function vulnerability of very high and high-risk buildings, Guangzhou districts are categorized into three types: Type I (central urban districts) with the highest flood susceptibility (0.9–1.0), Type II (intermediate urban districts) with a broader range (0.7–0.9), and Type III (suburban/rural districts) with moderate susceptibility (0.6–0.8). The distinction among these types lies in the susceptibility range where moderate to high function vulnerability is concentrated.
These findings underscore the importance of tailored flood risk management strategies that address the distinct susceptibility and function vulnerability of buildings. By providing urban planners and policymakers with a deeper understanding of the building flood risk, this research equips them with the tools needed to effectively mitigate water-related hazards, particularly in the face of challenges intensified by climate change.
Methods
Study area
Guangzhou is the capital of south China’s Guangdong Province, P. R. China (Fig. 13), with a total terrestrial area of about 7434.40 km2 and over 15 million population. Guangzhou consists of 11 districts: Tianhe, Yuexiu, Haizhu, Baiyun, Huangpu, Liwan, Panyu, Huadu, Nansha, Conghua, and Zengcheng districts. Situated in the subtropical monsoon climate zone, Guangzhou experiences an average annual precipitation ranging from 1673.0 mm to 2004.6 mm. The city’s rainfall is unevenly distributed throughout the year, with about 80% primarily occurring from April to September. Furthermore, Guangzhou averages around 150 precipitation days annually.
Rapid urbanization in Guangzhou has significantly increased flood risk due to changes in land use, strained drainage systems, and high population density55,56. The expansion of impervious surfaces, such as roads and buildings, has reduced natural water absorption, leading to greater surface runoff during heavy rainfall57. This, combined with the reduction of green spaces and wetlands, intensifies the city’s vulnerability to flooding58. The rapid growth has also outpaced the capacity of existing drainage infrastructure, which is often outdated and insufficient for managing increased runoff59. Additionally, the city’s dense population heightens exposure and complicates emergency response efforts, as more people are concentrated in flood-prone areas60. Vulnerable communities, particularly those in informal or substandard housing, face amplified risks due to inadequate infrastructure and limited resources61.
Guangzhou is confronting a significant risk of urban flooding62,63. On 8 September 2023, Guangzhou experienced an extraordinary rainstorm, remarkable for its intense downpour and broad impact. The intense rainfall in Guangzhou led to severe waterlogging across various areas, resulting in the temporary closure of Shiqiao Station on Metro Line 3 in Panyu District due to deep water accumulation. In response to the rainstorm warning, several districts in Guangzhou suspended school classes. According to the city’s flood, drought, and typhoon control headquarters, a total of 2516 people had been evacuated from hazardous zones in Guangzhou64.
Research design
Building flood risk is calculated by a function of urban flood susceptibility and building function vulnerability65,66,67,68. This study consists of three steps: (1) mapping urban flood susceptibility, (2) assessing building function vulnerability, and (3) measuring building flood risk (Fig. 14).
Urban flood susceptibility mapping
-
(1)
Ubran Flood Inventory Map Generation: Flood inventory plays a critical role in understanding flood susceptibility by providing a comprehensive database of past flood events, including their locations and causes. Recent studies emphasize that accurate flood inventories are essential for developing reliable flood susceptibility models, as they serve as training datasets for machine learning algorithms21,69. The urban flood inventory list of Guangzhou is provided by the Guangzhou Emergency Management Bureau. The original list contained 488 records, but some were duplicates or no longer relevant due to infrastructure development, thus leaving 479 records of waterlogging after the cleanup. Duplicates were removed by geocoding the flooded points to obtain their geographic coordinates. Points with identical locations were identified and duplicates were deleted. The original format of the list is an Excel spreadsheet, which includes fields such as the name of the waterlogged (inundation) sites, type, and risk level. The ___location of the flooded records is performed through geographic coding based on the names of these waterlogged sites.
Additionally, an equal number of non-flooded sites is selected. We establish buffer zones around flood points by calculating the nearest neighbor distances between them and selecting the 3/4 quartile for the buffer’s extent. Non-flooded points are required to not only fall outside these buffers but also be located in areas of higher elevation. Figure 15 shows the flood-recorded and non-flooded points in the study area. Of these sites, 70% are randomly chosen for model training, while the remaining 30% are allocated for testing the model70.
-
(2)
Feature selection: This study incorporates 15 features from hydrological, topographical, and anthropogenic perspectives71,72,73. For example, higher annual average daily precipitation increases water input, raising flood risk, while areas closer to rivers are more prone to flooding due to overflow. Steeper slopes reduce flood susceptibility by promoting runoff, whereas lower elevations are more vulnerable to water pooling. Urban land use increases susceptibility through impervious surfaces, while vegetative land use mitigates it by absorbing water. Proximity to roads further contributes to flooding by reducing infiltration and altering water flow. Pearson correlation coefficients (PCC) are calculated to avoid multicollinearity in the selected features74. The PCC between two sets of samples \({A}_{i}\,\)(i = 1, 2, 3,…, n) and \({B}_{i}\)(i = 1, 2, 3,…, n) can be represented as Eq. (1):
$$PCC=\frac{{\sum }_{i=1}^{n}({a}_{i}-\overline{a}){\sum }_{i=1}^{n}({b}_{i}-\overline{b})}{\sqrt{{\sum }_{i=1}^{n}{({a}_{i}-\overline{b})}^{2}{\sum }_{i=1}^{n}{({b}_{i}-\overline{b})}^{2}}}$$(1)where \({a}_{i}\) and \({b}_{i}\) are variable values for \({A}_{i}\) and \({B}_{i}\), \(\mathop{a}\limits^{\bar{} }\) and \(\mathop{b}\limits^{\bar{} }\) are the average of \({A}_{i}\) and \({B}_{i}\), respectively. A Pearson coefficient approaching 1 signifies a strong positive correlation between two features, while a coefficient near 0 suggests their independence. Previous studies indicated that the absolute value of a Pearson coefficient of more than 0.8 may contribute to collinearity75,76,77.
-
(3)
Mapping the Urban Flood Susceptibility based on the Random Forest (RF): The RF model is an ensemble machine learning method that combines multiple decision trees to improve prediction accuracy78. It uses a process called bagging (bootstrap aggregation), where random subsets of data are used to build each tree. This reduces overfitting and enhances model robustness. The RF model is also capable of handling complex, non-linear relationships and identifying key factors influencing flood susceptibility, such as topography, land use, and rainfall intensity.
The RF has been widely applied in flood susceptibility mapping due to its high accuracy and ability to process large-scale, heterogeneous data72,79,80,81. It is particularly effective for large-scale analyses, such as national or regional assessments23,29,82. Therefore, this study adopts the RF model to map urban flood susceptibility.
The training data, comprising flood conditioning features, is input into the Random Forest (RF) model, and its performance is evaluated using the test data. To optimize the model and mitigate overfitting, a GridSearchCV approach with 5-fold stratified cross-validation is employed. The hyperparameters of the Random Forest model were optimized using grid search to achieve the best performance. The final configuration included 75 trees (n_estimators), a maximum tree depth of 10 (max_depth), a minimum of 5 samples required to split a node (min_samples_split), and at least 3 samples per leaf node (min_samples_leaf). The max_features parameter was set to sqrt, ensuring a balance between computational efficiency and model accuracy. Once the model’s performance meets the required standards, the study area of Guangzhou is divided into a 100 m × 100 m grid, with center points extracted alongside their corresponding flood conditioning features. These data points are then input into the trained RF model to generate probability outputs, which are subsequently interpolated using the Inverse Distance Weighting (IDW) method to produce a flood susceptibility map for the entire study area.
-
(4)
Validation of the Urban Flood Susceptibility Model: The performance of the RF model is evaluated through the accuracy, the receiver operating characteristic (ROC) curve, and the area under the ROC curve (AUC). Accuracy is the ratio of correctly predicted samples to all samples. True positive (TP) and false positive (FP) denote the count of actual flood-recorded and non-flooded points, respectively, that are predicted as flooded points. True negative (TN) and false negative (FN) denote the count of actual flood-recorded and non-flooded points predicted as non-flooded points. Notably, the TP and the TN represent correctly predicted samples. We employed the accuracy (as defined in Eq. (2) to evaluate the predictive performance of the RF model.
The ROC curve and the AUC are regarded as some of the best indicators for model assessment83. The ROC curve plots the model’s ratio of correctly predicted pixels (True Positive Rate (TPR) known as sensitivity, Eq. (3)) against the ratio of incorrectly predicted pixels (False Positive Rate (FPR) known as 1-sensitivity, Eq. (4)).
The range of the AUC is between 0.5 (poor predictive ability) and 1 (the highest accuracy and reliability). Values greater than 0.8 typically indicate very good model performance84.
Building Function Vulnerability Assessment
Building function vulnerability assessment includes identifying building functions and classifying their vulnerability. This study proposes a CatBoost model for building function identification using building footprints, Points of Interest (POI) data, and Areas of Interest (AOI) data. Catboost uses ordered boosting and permutation-driven techniques to mitigate prediction shift and overfitting, making it particularly effective for handling categorical features85. Using CatBoost in building function identification is an exploratory attempt. The function vulnerability is classified according to the UK’s National Planning Policy Framework (Supplementary Table 1)54, which has been recognized as an effective standard by other researchers11. The detailed processes are listed as follows:
-
(1)
Data Preparation: Building footprints are categorized based on spatial relationships with AOIs and POIs into labeled and unlabeled datasets. AOI and POI data are partly validated against satellite imagery and Baidu Street View (Supplementary Table 2). We reclass the AOIs and POIs data into seven categories, including residential AOIs and POIs, industrial AOIs and POIs, educational AOIs and POIs, commercial POIs, cultural POIs, and public services POIs. It is observed that commercial POIs constituted the largest number and proportion, but most are small businesses, making it difficult to determine the primary function of the buildings. Therefore, POIs are classified into two categories: those with low importance but high density (commercial and industrial), and those with high importance but low density (residential, healthcare, public facilities, education, and cultural)52. If a building contains only the former type of POIs, its function is determined by these POIs. Otherwise, the function is determined by the latter type of POI. Labeled data are presented in Table 6. Due to weak labels inferred from AOI and POI, unsupervised noise reduction is applied49. Labeled data is split into training and test datasets in a 7:3 ratio.
-
(2)
Feature Processing: We select twenty-two features across three categories: urban function structure, building geometry, and building surroundings (Table 7). The urban function structure is described using POI kernel density estimation86,87. The building geometry is referenced by Lin et al. (2021). To describe building surroundings, we use the nearest distance to each category of POI. Missing values are handled using the K-nearest neighbors (KNN) method. After standardization, polynomial processing (degree = 2) is first used to enhance the feature set by capturing nonlinear interactions and higher-order relationships. Then PCA is applied to reduce the feature space to 15 components, preserving over 80% of the variance while ensuring computational efficiency and reducing overfitting88.
-
(3)
Resampling: To address the imbalanced sample distribution (Table 6), Adaptive Synthetic Sampling Method (ADASYN) resampling is used to balance the training set. ADASYN, proposed by He et al. (2008), generates synthetic data points for minority classes, improving the classifier’s performance on imbalanced datasets by focusing more on difficult-to-classify examples89,90.
-
(4)
Model Training and Evaluation: Due to the low number of buildings with public, healthcare, and cultural functions, and based on the assumption that AOIs and POIs can initially identify these, only residential, commercial, educational, and industrial functions are considered during model training. The Categorical Boosting (CatBoost) model, a gradient boosting algorithm, employs ordered boosting to minimize overfitting and reduce prediction bias, ensuring accurate and reliable results85. Additionally, CatBoost is computationally efficient and well-suited for datasets with mixed numerical and categorical variables. Using CatBoost model in building function identification is an exploratory attempt.
A balanced training dataset is utilized to train the CatBoost model. To improve robustness and prevent overfitting, key hyperparameters such as iterations (1500), learning rate (0.01), L2 regularization (18), and depth (12) were adjusted. A 5-fold stratified cross-validation is also used to validate model performance. Model performance is validated using the test set with the accuracy of each function.
-
(5)
Function Classification and vulnerability assessment: The trained Catboost model is used to predict the functions of unlabeled building footprint data. We referenced the UK’s National Planning Policy Framework (NPPF) (Supplementary Table 1), which classifies flood vulnerability based on use and facility. This classification is based partly on the need for some functions to be maintained during flooding such as hospitals and police stations. Then we assigned vulnerability values of 3 to “highly vulnerable” functions, 2 to “more vulnerable” functions, and 1 to “less vulnerable” functions (Table 8). The building function vulnerability results are aggregated by districts for urban management, urban planning, and policy-making.
Building Flood Risk Measuring
In this study, we draw upon the risk matrix to reinterpret risk as the likelihood of a disaster occurring and the potential impact it may have91. The risk matrix is a tool extensively used in natural disaster risk assessment and project risk management92. The Eq. (5) shows the function of the risk matrix.
This research attempts to incorporate the urban flood susceptibility model and building function vulnerability into the risk matrix, thereby extending its practical applications. During the assessment process, “urban flood susceptibility” denotes the likelihood of the flood hazard29,72,93,94, while “function vulnerability” of buildings signifies the impact of flood11,54.
Building flood risk is calculated using Eq. (6). We extract the urban flood susceptibility of each building and multiply it by the function vulnerability. This methodological approach enables a nuanced assessment of risk by integrating both the ___location-specific susceptibility to flooding and the inherent vulnerability determined by the building functions. The building flood risk results are summarized by districts to support urban management, urban planning, and policy-making.
Urban flood conditioning features
This study selected 15 features from hydrological, topographical, and anthropogenic perspectives to input into the RF model71,72,73. The hydrological features include the stream power index (SPI), annual average daily precipitation (AP), distance from rivers (DRI), and hydrological soil groups (HSGs). Topographical factors include elevation (EV), slope (SL), topographic wetness index (TWI), convergence index (CI), plan curvature (PLC), and profile curvature (PRC). The anthropogenic factors included in the analysis are distance from roads (DRO), distance from culverts and interchanges (DCI), land use and land cover (LULC), the normalized difference vegetation index (NDVI), and the normalized difference built-up index (NDBI). A detailed description of urban flood conditioning features is shown in Table 9 below and Supplementary Figs. 1, 2, and 3.
Building function identification features
The building footprint dataset is adopted from Shi et al. (2024)., which is identified based on Google Earth imagery, global urban boundaries, and local climate zones, using a Boundary-enhanced network95. The average recall and precision rates were 84.52% and 81.06%, respectively. Samples of buildings in Guangzhou are shown in Fig. 16. The total number of buildings in Guangzhou is 1,790,141 after data cleaning.
The building function identification features are listed in Table 10. The urban function structure features contain the kernel density estimation (KDE) of residential POI, cultural POI, educational POI, industrial POI, healthcare POI, commercial POI, and public service POI. The building geometry contains area, perimeter, regularity, height, compactness, number of nodes, aspect ratio orientation, and radius shape index of buildings. The building surroundings include the distance from buildings to the nearest residential POI, cultural POI, educational POI, industrial POI, healthcare POI, commercial POI, and public service POI. AOI and POI data are obtained from the Baidu Map and are validated by comparing satellite imagery and Baidu Street View (Supplementary Table 2). All procedures are operated on ArcGIS 10.8 or Python programming.
Data availability
This paper includes the digital elevation model (DEM) data, hydrological soil groups data, annual average daily precipitation data, land use and land cover, building footprint data, building height data, POI data, and AOI data. The DEM data are downloaded from the Geospatial Data Cloud. (http://www.gscloud.cn). The hydrological soil groups data are collected from the Global Hydrologic Soil Groups (https://daac.ornl.gov/SOILS/guides/Global_Hydrologic_Soil_Group.html). The annual average daily precipitation data are derived from the Chinese 30-meter resolution annual mean precipitation dataset (1991-2020) of the Fine Resolution Mountainous Environmental Mapping Program (FRMM). The land use and land cover data are collected from Sentinel-2 10 m Land Use/Land Cover Time Series. The building footprint data are downloaded at https://doi.org/10.5281/zenodo.8174931. The building height data are downloaded at https://zenodo.org/records/7827315. The POI and AOI data are collected from Baidu Maps (https://map.baidu.com/).
Code availability
The code that supports the findings of this research is available from the corresponding author upon request.
References
Hosseini, F. S. et al. Flash-flood hazard assessment using ensembles and Bayesian-based machine learning models: Application of the simulated annealing feature selection method. Sci. Total Environ. 711, 135161 (2020).
McDermott, T. K. J. Global exposure to flood risk and poverty. Nat. Commun. 13, 3529 (2022).
Wang, Y. et al. Spatial characteristics and driving factors of urban flooding in Chinese megacities. J. Hydrol. 613, 128464 (2022).
Chan, F. K. S. et al. Build in prevention and preparedness to improve climate resilience in coastal cities: Lessons from China’s GBA. One Earth 4, 1356–1360 (2021).
Qin, X.-L., Wang, S.-F. & Meng, M. Flood cascading on critical infrastructure with climate change: A spatial analysis of the extreme weather event in Xinxiang, China. Adv. Clim. Change Res. 14, 458–468 (2023).
Allaire, M. Socio-economic impacts of flooding: A review of the empirical literature. Water Secur 3, 18–26 (2018).
Güneralp, B., Güneralp, İ & Liu, Y. Changing global patterns of urban exposure to flood and drought hazards. Glob. Environ. Change 31, 217–225 (2015).
Xing, Z. et al. Flood vulnerability assessment of urban buildings based on integrating high-resolution remote sensing and street view images. Sustain. Cities Soc. 92, 104467 (2023).
Balasbaneh, A. T., Abidin, A. R. Z., Ramli, M. Z., Khaleghi, S. J. & Marsono, A. K. Vulnerability assessment of building material against river flood water: case study in Malaysia. IOP Conf. Ser. Earth Environ. Sci. 476, 012004 (2020).
Englhardt, J. et al. Enhancement of large-scale flood risk assessments using building-material-based vulnerability curves for an object-based approach in urban and rural areas. Nat. Hazards Earth Syst. Sci. 19, 1703–1722 (2019).
Park, K. & Won, J. Analysis on distribution characteristics of building use with risk zone classification based on urban flood risk assessment. Int. J. Disaster Risk Reduct. 38, 101192 (2019).
Tagg, A. et al. A new standard for flood resistance and resilience of buildings: new build and retrofit. E3S Web Conf. 7, 13004 (2016).
Arvin, M., Beiki, P., Hejazi, S. J., Sharifi, A. & Atashafrooz, N. Assessment of infrastructure resilience in multi-hazard regions: A case study of Khuzestan Province. Int. J. Disaster Risk Reduct. 88, 103601 (2023).
Escarameia, M., Walliman, N., Zevenbergen, C. & De Graaf, R. Methods of assessing flood resilience of critical buildings. Proc. Inst. Civ. Eng. - Water Manag. 169, 57–64 (2016).
Abdrabo, K. I. et al. Integrated methodology for urban flood risk mapping at the microscale in ungauged regions: A case study of hurghada, egypt. Remote Sens 12, 3548 (2020).
Saber, M. et al. Enhancing flood risk assessment through integration of ensemble learning approaches and physical-based hydrological modeling. Geomat. Nat. Hazards Risk 14, 2203798 (2023).
Afshari, S. et al. Comparison of new generation low-complexity flood inundation mapping tools with a hydrodynamic model. J. Hydrol. 556, 539–556 (2018).
Fang, L., Huang, J., Cai, J. & Nitivattananon, V. Hybrid approach for flood susceptibility assessment in a flood-prone mountainous catchment in China. J. Hydrol. 612, 128091 (2022).
Ayyad, M., Hajj, M. R. & Marsooli, R. Climate change impact on hurricane storm surge hazards in New York/New Jersey Coastlines using machine-learning. Npj Clim. Atmos. Sci. 6, 88 (2023).
Sharma, A. et al. Comprehensive evaluation of machine learning algorithms for flood susceptibility mapping in wardha river sub-basin, India. Acta Geophys. https://doi.org/10.1007/s11600-024-01471-8 (2024).
Seydi, S. T. et al. Comparison of machine learning algorithms for flood susceptibility mapping. Remote Sens. 15, 192 (2023).
Demissie, Z., Rimal, P., Seyoum, W. M., Dutta, A. & Rimmington, G. Flood susceptibility mapping: Integrating machine learning and GIS for enhanced risk assessment. Appl. Comput. Geosci. 23, 100183 (2024).
Kaiser, M., Günnemann, S. & Disse, M. Regional-scale prediction of pluvial and flash flood susceptible areas using tree-based classifiers. J. Hydrol. 612, 128088 (2022).
Liu, J. et al. Assessment of flood susceptibility mapping using support vector machine, logistic regression and their ensemble techniques in the Belt and road region. Geocarto Int. 37, 9817–9846 (2022).
Tehrany, M. S., Pradhan, B., Mansor, S. & Ahmad, N. Flood susceptibility assessment using GIS-based support vector machine model with different kernel types. CATENA 125, 91–101 (2015).
Ajin, R. S., Segoni, S. & Fanti, R. Optimization of SVR and CatBoost models using metaheuristic algorithms to assess landslide susceptibility. Sci. Rep. 14, 24851 (2024).
Chowdhury, M. E., Islam, A. K. M. S., Zaman, R. U. & Khadem, S. A machine learning-based approach for flash flood susceptibility mapping considering rainfall extremes in the northeast region of Bangladesh. Adv. Space Res. https://doi.org/10.1016/j.asr.2024.10.047 (2024).
Ren, H. et al. Flood susceptibility assessment with random sampling strategy in ensemble learning (RF and XGBoost). Remote Sens 16, 320 (2024).
Zhao, G., Pang, B., Xu, Z., Yue, J. & Tu, T. Mapping flood susceptibility in mountainous areas on a national scale in China. Sci. Total Environ. 615, 1133–1142 (2018).
Tang, X. et al. A novel machine learning-based framework to extract the urban flood susceptible regions. Int. J. Appl. Earth Obs. Geoinf. 132, 104050 (2024).
Wei, Q. et al. City scale urban flooding risk assessment using multi-source data and machine learning approach. J. Hydrol. 651, 132626 (2025).
Yao, J., Zhang, X., Luo, W., Liu, C. & Ren, L. Applications of stacking/blending ensemble learning approaches for evaluating flash flood susceptibility. Int. J. Appl. Earth Obs. Geoinf. 112, 102932 (2022).
Li, Y., Osei, F. B., Hu, T. & Stein, A. Urban flood susceptibility mapping based on social media data in Chengdu City, China. Sustain. Cities Soc. 88, 104307 (2023).
Yousefi, S. et al. Assessing the susceptibility of schools to flood events in Iran. Sci. Rep. 10, 18114 (2020).
Pourghasemi, H. R. et al. Assessment of urban infrastructures exposed to flood using susceptibility map and Google Earth Engine. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 14, 1923–1937 (2021).
Nkeki, F. N., Bello, E. I. & Agbaje, I. G. Flood risk mapping and urban infrastructural susceptibility assessment using a GIS and analytic hierarchical raster fusion approach in the Ona River Basin, Nigeria. Int. J. Disaster Risk Reduct. 77, 103097 (2022).
Ruiz, A. & Rajib, A. Developing a spatially explicit flood susceptibility index for critical infrastructure in US cities. World Environ. Water Resour. Congr. 2022 https://doi.org/10.1061/9780784484258.115 (2022).
Meng, M., Dąbrowski, M., Tai, Y., Stead, D. & Chan, F. Collaborative spatial planning in the face of flood risk in delta cities: A policy framing perspective. Environ. Sci. Policy 96, 95–104 (2019).
Chan, F. K. S. et al. Urban flood risks and emerging challenges in a Chinese delta: The case of the Pearl River Delta. Environ. Sci. Policy 122, 101–115 (2021).
Meng, M., Dąbrowski, M., Xiong, L. & Stead, D. Spatial planning in the face of flood risk: Between inertia and transition. Cities 126, 103702 (2022).
Meng, M., Dąbrowski, M. & Stead, D. Governing resilience planning: Organizational structures, institutional rules, and fiscal incentives in Guangzhou. Land 12, 417 (2023).
Sharifi, A. Urban form resilience: A meso-scale analysis. Cities 93, 238–252 (2019).
Sharifi, A. & Yamagata, Y. Resilient Urban Form: A Conceptual Framework. in Resilience-Oriented Urban Planning (eds. Yamagata, Y. & Sharifi, A.) vol. 65 167–179 (Springer International Publishing, Cham, 2018).
Tayyab, M. et al. GIS-based urban flood resilience assessment using urban flood resilience model: A case study of Peshawar City, Khyber Pakhtunkhwa, Pakistan. Remote Sens 13, 1864 (2021).
Bowker, P., Escarameia, M. & Tagg, A. Improving the Flood Performance of New Buildings: Flood Resilient Construction. (RIBA, London, 2007).
Meng, M., Dabrowski, M. & Stead, D. Enhancing Flood Resilience and Climate Adaptation: The State of the Art and New Directions for Spatial Planning. Sustainability 12, 7864 (2020).
Saber, M. et al. Examining LightGBM and CatBoost models for wadi flash flood susceptibility prediction. Geocarto Int. 37, 7462–7487 (2022).
Aydin, H. E. & Iban, M. C. Predicting and analyzing flood susceptibility using boosting-based ensemble machine learning algorithms with SHapley additive exPlanations. Nat. Hazards 116, 2957–2991 (2023).
Li, Z., He, W., Li, J. & Zhang, H. Identifying Every Building’s Function in Large-Scale Urban areas with Multi-Modality Remote-Sensing Data. in IGARSS 2024 - 2024 IEEE International Geoscience and Remote Sensing Symposium 310–314 (2024). https://doi.org/10.1109/IGARSS53475.2024.10641437.
Zhang, J., Fukuda, T. & Yabuki, N. Development of a city-scale approach for façade color measurement with building functional classification using deep learning and street view images. ISPRS Int. J. Geo-Inf. 10, 551 (2021).
Deng, Y. et al. Identify urban building functions with multisource data: a case study in Guangzhou, China. Int. J. Geogr. Inf. Sci. 36, 2060–2085 (2022).
Lin, A. et al. Identifying Urban Building Function by Integrating Remote Sensing Imagery and POI Data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 14, 8864–8875 (2021).
China Academy of Building Research (CABR). Seismic Design Code for Buildings (2016).
Ministry of Housing, Communities & Local Government. National Planning Policy Framework. https://www.gov.uk/government/publications/national-planning-policy-framework--2 (2023).
Xu, T., Liu, F., Wan, Z., Zhang, C. & Zhao, Y. Spatio-temporal evolution characteristics and driving mechanisms of waterlogging in urban agglomeration from multi-scale perspective: A case study of the Guangdong-Hong Kong-Macao Greater Bay Area, China. J. Environ. Manag. 368, 122109 (2024).
Zhang, Q., Wu, Z., Zhang, H., Dalla Fontana, G. & Tarolli, P. Identifying dominant factors of waterlogging events in metropolitan coastal cities: The case study of Guangzhou, China. J. Environ. Manag. 271, 110951 (2020).
Yu, H. et al. A high-efficiency global model of optimization design of impervious surfaces for alleviating urban waterlogging in urban renewal. Trans. GIS 25, 1716–1740 (2021).
Qin, X., Wang, S. & Meng, M. SEA for better climate adaptation in the face of the flood risk: Multi-scenario, strategic forecasting, nature-based solutions. Environ. Impact Assess. Rev. 106, 107495 (2024).
Wang, M. et al. Optimization of integrating life cycle cost and systematic resilience for grey-green stormwater infrastructure. Sustain. Cities Soc. 90, 104379 (2023).
Lyu, H.-M., Wang, G.-F., Shen, J., Lu, L.-H. & Wang, G.-Q. Analysis and GIS mapping of flooding hazards on 10 may 2016, guangzhou, China. Water 8, 447 (2016).
Quan, Z., Shanshan, X. & Chengdong, Z. A community resilience evaluation and optimization strategy based on stormwater management. J. Resour. Ecol. 13, 360–370 (2022).
Lu, S., Huang, J. & Wu, J. Multi-dimensional urban flooding impact assessment leveraging social media data: a case study of the 2020 Guangzhou Rainstorm. Water 15, 4296 (2023).
Yu, H., Zhao, Y., Fu, Y. & Li, L. Spatiotemporal Variance Assessment of Urban Rainstorm Waterlogging Affected by Impervious Surface Expansion: A Case Study of Guangzhou, China. Sustainability 10, 3761 (2018).
Xinhua. Guangdong raises emergency response for heavy rain-triggered flood to Level II--China Economic Net. http://en.ce.cn/main/latest/202309/09/t20230909_38708254.shtml (2023).
Alexander, D. Natural disasters: a framework for research and teaching. Disasters 15, 209–226 (1991).
Baky, M. A. A., Islam, M. & Paul, S. Flood hazard, vulnerability and risk assessment for different land use classes using a flow model. Earth Syst. Environ. 4, 225–244 (2020).
Nasiri, H., Mohd Yusof, M. J. & Mohammad Ali, T. A. An overview to flood vulnerability assessment methods. Sustain. Water Resour. Manag. 2, 331–336 (2016).
Skakun, S., Kussul, N., Shelestov, A. & Kussul, O. Flood hazard and flood risk assessment using a time series of satellite images: a case study in Namibia. Risk Anal. 34, 1521–1537 (2014).
Saravanan, S. et al. Flood susceptibility mapping using machine learning boosting algorithms techniques in Idukki district of Kerala India. Urban Clim. 49, 101503 (2023).
Hussain, M. et al. Development of a new integrated flood resilience model using machine learning with GIS-based multi-criteria decision analysis. Urban Clim. 50, 101589 (2023).
Li, W. et al. A Positive-unlabeled learning algorithm for urban flood susceptibility modeling. Land 11, 1971 (2022).
Lyu, H.-M. & Yin, Z.-Y. Flood susceptibility prediction using tree-based machine learning models in the GBA. Sustain. Cities Soc. 97, 104744 (2023).
Zhu, X., Guo, H. & Huang, J. J. Urban flood susceptibility mapping using remote sensing, social sensing and an ensemble machine learning model. Sustain. Cities Soc. 108, 105508 (2024).
Ullah, K., Wang, Y., Fang, Z., Wang, L. & Rahman, M. Multi-hazard susceptibility mapping based on convolutional neural networks. Geosci. Front. 13, 101425 (2022).
Barton, B. & Peat, J. Medical Statistics: A Guide to SPSS, Data Analysis and Critical Appraisal, 2nd Edition. (BMJ Books, Blackwell Publishing Ltd, 2014).
Chatterjee, S. & Simonoff, J. S. Analysis of Variance. in Handbook of Regression Analysis 111–137 (John Wiley & Sons, Ltd, 2012). https://doi.org/10.1002/9781118532843.ch6.
Shrestha, N. Detecting multicollinearity in regression analysis. Am. J. Appl. Math. Stat. 8, 39–42 (2020).
Breiman, L. Random Forests. Mach. Learn. 45, 5–32 (2001).
Casali, Y., Aydin, N. Y. & Comes, T. Machine learning for spatial analyses in urban areas: a scoping review. Sustain. Cities Soc. 85, 104050 (2022).
Gudiyangada Nachappa, T. et al. Flood susceptibility mapping with machine learning, multi-criteria decision analysis and ensemble using Dempster-Shafer Theory. J. Hydrol. 590, 125275 (2020).
Lee, S., Kim, J.-C., Jung, H.-S., Lee, M. J. & Lee, S. Spatial prediction of flood susceptibility using random-forest and boosted-tree models in Seoul metropolitan city, Korea. Geomat. Nat. Hazards Risk 8, 1185–1203 (2017).
Roy, P. et al. Threats of climate and land use change on future flood susceptibility. J. Clean. Prod. 272, 122757 (2020).
Band, S. S. et al. Flash flood susceptibility modeling using new approaches of hybrid and ensemble tree-based machine learning algorithms. Remote Sens. 12, 3568 (2020).
Chen, W. et al. Landslide susceptibility modelling using GIS-based machine learning techniques for Chongren County, Jiangxi Province. China Sci. Total Environ. 626, 1121–1135 (2018).
Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V. & Gulin, A. CatBoost: unbiased boosting with categorical features. in Advances in Neural Information Processing Systems vol. 31 (Curran Associates, Inc., 2018).
Liu, K., Yin, L., Lu, F. & Mou, N. Visualizing and exploring POI configurations of urban regions on POI-type semantic space. Cities 99, 102610 (2020).
Luo, G., Ye, J., Wang, J. & Wei, Y. Urban functional zone classification based on POI data and machine learning. Sustainability 15, 4631 (2023).
Abdi, H. & Williams, L. J. Principal component analysis. WIREs Comput. Stat. 2, 433–459 (2010).
He, H., Bai, Y., Garcia, E. A. & Li, S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. in 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence) 1322–1328 (IEEE, Hong Kong, China, 2008). https://doi.org/10.1109/IJCNN.2008.4633969.
Haddadi, S. J., Farshidvard, A., Silva, F. D. S., Dos Reis, J. C. & Da Silva Reis, M. Customer churn prediction in imbalanced datasets with resampling methods: A comparative study. Expert Syst. Appl. 246, 123086 (2024).
Sutherland, H., Recchia, G., Dryhurst, S. & Freeman, A. L. J. How people understand risk matrices, and how matrix design can improve their use: findings from randomized controlled studies. Risk Anal. 42, 1023–1041 (2022).
Proto, R., Recchia, G., Dryhurst, S. & Freeman, A. L. J. Do colored cells in risk matrices affect decision-making and risk perception? Insights from randomized controlled studies. Risk Anal. 43, 2114–2128 (2023).
Anees, M. T. et al. Flood Vulnerability, Risk, and Susceptibility Assessment: Flood Risk Management. In Advances in Environmental Engineering and Green Technologies (ed. Karmaoui, A.) 1–27 (IGI Global, 2020). https://doi.org/10.4018/978-1-5225-9771-1.ch001.
Collier, C. G. & Fox, N. I. Assessing the flooding susceptibility of river catchments to extreme rainfall in the United Kingdom. Int. J. River Basin Manag. 1, 225–235 (2003).
Shi, Q. et al. The Last Puzzle of Global Building Footprints—Mapping 280 Million Buildings in East Asia Based on VHR Images. J. Remote Sens. 4, 0138 (2024).
Sevgen, E., Kocaman, S., Nefeslioglu, H. A. & Gokceoglu, C. A novel performance assessment approach using photogrammetric techniques for landslide susceptibility mapping with logistic regression. ANN Random For. Sens. 19, 3940 (2019).
Das, S. & Gupta, A. Multi-criteria decision based geospatial mapping of flood susceptibility and temporal hydro-geomorphic changes in the Subarnarekha basin, India. Geosci. Front. 12, 101206 (2021).
Costache, R. & Tien Bui, D. Spatial prediction of flood potential using new ensembles of bivariate statistics and artificial intelligence: A case study at the Putna river catchment of Romania. Sci. Total Environ. 691, 1098–1118 (2019).
Ross, C. W. et al. HYSOGs250m, global gridded hydrologic soil groups for curve-number-based runoff modeling. Sci. Data 5, 180091 (2018).
Li, X. et al. Flood Risk assessment of global watersheds based on multiple machine learning models. Water 11, 1654 (2019).
Shafizadeh-Moghadam, H., Valavi, R., Shahabi, H., Chapi, K. & Shirzadi, A. Novel forecasting approaches using combination of machine learning and statistical models for flood susceptibility mapping. J. Environ. Manag. 217, 1–11 (2018).
Beven, K. J. & Kirkby, M. J. A physically based, variable contributing area model of basin hydrology / Un modèle à base physique de zone d’appel variable de l’hydrologie du bassin versant. Hydrol. Sci. Bull. 24, 43–69 (1979).
Costache, R. Flash-flood Potential Index mapping using weights of evidence, decision Trees models and their novel hybrid integration. Stoch. Environ. Res. Risk Assess. 33, 1375–1402 (2019).
Ohlmacher, G. C. Plan curvature and landslide probability in regions dominated by earth flows and earth slides. Eng. Geol. 91, 117–134 (2007).
Brath, A., Montanari, A. & Moretti, G. Assessing the effect on flood frequency of land use change via hydrological simulation (with uncertainty). J. Hydrol. 324, 141–153 (2006).
Hounkpè, J., Diekkrüger, B., Afouda, A. A. & Sintondji, L. O. C. Land use change increases flood hazard: a multi-modelling approach to assess change in flood characteristics driven by socio-economic land use change scenarios. Nat. Hazards 98, 1021–1050 (2019).
Shan, X., Yin, J. & Wang, J. Risk assessment of shanghai extreme flooding under the land use change scenario. Nat. Hazards 110, 1039–1060 (2022).
Rahmati, O., Pourghasemi, H. R. & Zeinivand, H. Flood susceptibility mapping using frequency ratio and weights-of-evidence models in the Golastan Province, Iran. Geocarto Int 31, 42–70 (2016).
Wang, Y., Fang, Z., Hong, H., Costache, R. & Tang, X. Flood susceptibility mapping by integrating frequency ratio and index of entropy with multilayer perceptron and classification and regression tree. J. Environ. Manag. 289, 112449 (2021).
Yang, J. et al. Divergent shifts in peak photosynthesis timing of temperate and alpine grasslands in China. Remote Sens. Environ. 233, 111395 (2019).
Kebede, T. A., Hailu, B. T. & Suryabhagavan, K. V. Evaluation of spectral built-up indices for impervious surface extraction using Sentinel-2A MSI imageries: A case of Addis Ababa city, Ethiopia. Environ. Chall. 8, 100568 (2022).
Acknowledgements
We sincerely thank the three anonymous reviewers for their thoughtful comments and helpful suggestions, which have contributed to improving this paper. We would also like to express our heartfelt thanks to our colleagues Bangdong Chen and Lianghao Zhang for their generous technical guidance and support throughout the research process. This research is co-funded by the National Youth Science Fund Project of the National Natural Science Foundation of China (52108050), the Guangdong Basic and Applied Basic Research Foundation (2023A1515011653), the Guangdong Basic and Applied Basic Research Foundation (2024A1515011547), the National Social Science Foundation (22VHQ009), and the China Scholarship Council (202306150061).
Author information
Authors and Affiliations
Contributions
X.Q.: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Visualization, Writing – Original Draft, Resources, Writing – Review & Editing. S.W.: Writing – Review & Editing, Resources, Supervision, Project administration, Funding acquisition. M.M.: Conceptualization, Investigation, Visualization, Writing – Original Draft, Methodology, Resources, Writing – Review & Editing, Supervision, Project administration, Funding acquisition. H.L.: Visualization, Investigation. H.Z.: Software, Visualization, Investigation. H.S.: Methodology, Writing – Review & Editing.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Qin, X., Wang, S., Meng, M. et al. Enhancing urban resilience through machine learning-supported flood risk assessment: integrating flood susceptibility with building function vulnerability. npj Urban Sustain 5, 19 (2025). https://doi.org/10.1038/s42949-025-00208-w
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s42949-025-00208-w