Abstract
This paper presents a ground motion prediction (GMP) model using an artificial neural network (ANN) for shallow earthquakes, aimed at improving earthquake hazard safety evaluation. The proposed model uses moment magnitude, fault type, epicentral distance, and soil type as input variables, with peak ground acceleration (PGA) at 5% damping as the output variable. To develop this model, 885 data pairs were obtained from the Pacific Earthquake Engineering Research Center (PEER) for training and validation. The ANN architecture comprises 4 nodes in the input layer, two hidden layers each containing 25 nodes, and a single-node output layer, resulting in 750 unknown connection weights, plus bias values, that the model must optimize. Following the model assessment, a genetic algorithm (GA) was integrated with the ANN model to enhance its predictive capabilities. This integration aimed to forecast 20 potential earthquake scenarios, a crucial step in validating the model’s effectiveness. The ANN-GA successfully predicted earthquake occurrences in 15 out of 20 scenarios. These findings underscore the model’s potential for forecasting seismic events, thereby contributing to the development of more resilient infrastructure and better-informed urban planning strategies.
Introduction
Ground motion prediction equations (GMPEs) calculate the intensity measures of ground movements, which are essential for structural design and seismic risk evaluation. Several techniques and parameters, such as response spectrum, frequency content measures, earthquake magnitude, epicentral distance, fault type, soil type, and site classification parameters, have been adopted for developing GMPEs. The parameter most commonly predicted by GMPEs is peak ground acceleration (PGA). In the study of PGA components, horizontal movements are deemed more critical for engineering than vertical ones, prompting most research to focus on horizontal aspects [1,2,3,4]. Typically, three approaches are used: first, the larger of the two horizontal components is selected; second, both horizontal components are considered; and third, the geometric mean of the estimates from both components is computed.
Mechanistic and empirical models, such as stochastic models and attenuation relations, have been developed, resulting in numerous GMPEs. From 1964 to 2021, 485 GMPEs for PGA and 316 for spectral ordinates were developed [5]. The accuracy of predictions relies on both the quality and quantity of data used to estimate the parameters. Beyond the data itself, however, a key factor in inferring a GMPE is choosing the most suitable functional form. Initially, Esteva and Rosenblueth proposed a GMPE to predict PGA using a simple exponential function [6]. Several models based on synthetic data were then proposed, and later models became more complex as they incorporated the effects of amplification, attenuation, fault mechanisms, and other factors. Boore and Joyner presented strong ground motion prediction equations developed by regression from earthquakes in California and Italy, using peak acceleration as the most common measure. The data were extracted from the updated NGA-West 2 database [7]. Bommer et al. introduced an empirical approach to deriving GMPEs for estimating short-period response spectral ordinates across Europe and the Middle East, covering a magnitude range of 3.0 to 7.6 [8]. Boore and Atkinson formulated a GMPE whose dependent variables include PGA, PGV, and 5%-damped PSA. Notably, the horizontal components derived are not simple geometric means but are calculated using the 50th percentile values from all possible orientations [3]. Campbell and Bozorgnia developed an empirical ground motion model that estimates PGA, PGV, PGD, and 5% damped linear elastic response spectra for periods between 0.01 and 10 s. It explicitly incorporates effects such as magnitude saturation, magnitude-dependent attenuation, style of faulting, rupture depth, hanging-wall geometry, and both linear and nonlinear site response.
The results of the Campbell and Bozorgnia model suggest that ground motion predictions were fairly consistent across different distances, with significantly increased readings during reverse-faulting scenarios where ruptures do not reach the surface. Higher ground motions were also observed in some strike-slip ruptures with limited surface expression, though other strike-slip cases showed divergent results. The model’s reliability may decrease near the extremes of its magnitude range [9]. Ansari et al. developed advanced models for predicting tunnel damage in seismically active and landslide-prone regions, using artificial neural networking (ANN) and deep learning techniques. The multi-hazard damage prediction model and the seismic tunnel damage prediction model both employ feed-forward neural networks trained with critical input parameters, such as PGA, source-to-site distance, overburden depth, tunnel depth, lining thickness, and geological strength index. Validated against significant seismic events such as the Kobe (1995), Chi-Chi (1999), Mid-Niigata (2004), and Wenchuan (2008) earthquakes, these models demonstrated high predictive accuracy and reliability, providing seismic design recommendations to enhance post-disaster infrastructure serviceability. Their limitations include the uneven distribution of seismic data and constraints on model generalizability across varied geological conditions, so further validation is needed for broader application. The research outputs also feature damage indexing, predictive graphs, and globally applicable design guidelines [10,11]. Over the years, the structure of GMPEs was adjusted several times to account for factors such as amplification, attenuation, faulting mechanisms, damping, and the uncertainties observed in ground motions. These changes resulted in highly intricate functional models. A significant limitation of this type of parametric model is the need for a predefined functional structure.
The need for reliable earthquake prediction has long been a subject of debate, challenging even the most esteemed scientists. Precisely predicting the exact time, location, and magnitude of an earthquake is exceedingly complex due to the multitude of seismic precursors and other indicators of potential quakes. However, advancements in artificial intelligence (AI) and the aggregation of data from these indicators offer a promising avenue for improving earthquake prediction accuracy [12,13]. Advanced models such as machine learning algorithms, ANNs, and fuzzy logic have become increasingly prevalent in earthquake engineering to interpret trends and yield accurate outcomes. Furthermore, AI enhances computational efficiency and reduces error rates, offering significant advantages in tasks such as phase picking, early warning systems, ground-motion prediction, tomography, and geodesy. This paper introduces a ground motion prediction model that uses an ANN specifically for shallow earthquakes, with the goal of enhancing earthquake hazard safety evaluation. The model incorporates key input variables, including moment magnitude, fault type, epicentral distance, and soil type, and predicts PGA at 5% damping. After the model was assessed, a genetic algorithm (GA) was integrated with the ANN model to predict potential earthquake scenarios.
Modeling peak ground acceleration
Before explaining the modeling of PGA in detail, an overview of the methodologies adopted in this study is shown in Fig. 1. The entire study consists of two main stages: data processing and modeling for prediction. In the following sections, more details will be provided for each stage.
Data acquisition and preprocessing of PGA
The data were sourced from the Pacific Earthquake Engineering Research Center (PEER). This database encompasses a comprehensive collection of ground motions recorded during shallow earthquakes in active tectonic areas around the world. For this study, data from 100 unscaled original earthquake response spectra were gathered, specifically focusing on no-pulse records that do not consider the effects of velocity pulses due to rupture directivity. The input variables for the study include moment magnitude, fault type, epicentral distance, and soil type, with the output variable being PGA. Following PEER, fault types are categorized by rake angle, as summarized in Table 1. Soil type is grouped according to the National Earthquake Hazards Reduction Program (NEHRP) standards using Vs30, the time-averaged shear-wave velocity to a depth of 30 m, as summarized in Table 2.
In this study, ground motion data were collected from recordings featuring earthquake magnitudes between 5.0 and 8.0 and epicentral distances not exceeding 200 km. Any records lacking complete information were omitted from the dataset. After these adjustments, the study comprised a total of 883 ground motions from 89 events, as detailed in Table 3.
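The selection criteria above can be written as a simple record filter. The following is a minimal sketch, not the authors' preprocessing code; the dictionary keys (`magnitude`, `distance_km`, and so on) are hypothetical names chosen for illustration.

```python
from typing import Dict, Optional

# Field names are hypothetical; they stand for the four inputs and the output.
REQUIRED_FIELDS = ("magnitude", "fault_type", "distance_km", "soil_type", "pga_g")


def keep_record(record: Dict[str, Optional[float]]) -> bool:
    """Return True if a record is complete and inside the study ranges."""
    # Omit records lacking complete information.
    if any(record.get(name) is None for name in REQUIRED_FIELDS):
        return False
    # Magnitude between 5.0 and 8.0, epicentral distance not exceeding 200 km.
    return 5.0 <= record["magnitude"] <= 8.0 and record["distance_km"] <= 200.0


if __name__ == "__main__":
    example = {"magnitude": 6.7, "fault_type": 2, "distance_km": 35.0,
               "soil_type": 3, "pga_g": 0.21}
    print(keep_record(example))
```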
The statistical properties of the experimental data are summarized in Table 4. Each input parameter was coded as X1, X2, X3, and X4, corresponding to moment magnitude, fault type, epicentral distance, and soil type, respectively. The fault mechanism was assigned a number from 0 to 4, and the soil type was given a value from 1 to 4, representing soil types B to E, as shown in Table 2. No data were recorded for soil type A.
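The coding of the two categorical inputs might be sketched as follows. The fault codes 1 to 4 follow the values stated later in the text (normal, reverse, reverse-oblique, normal-oblique); treating 0 as strike-slip is an inference. The Vs30 boundaries are the commonly quoted NEHRP limits and are an assumption here, since Table 2 is not reproduced in this text. The function names are hypothetical.

```python
# Fault mechanism codes (X2). Code 0 = strike-slip is inferred, not stated.
FAULT_CODES = {
    "strike-slip": 0,
    "normal": 1,
    "reverse": 2,
    "reverse-oblique": 3,
    "normal-oblique": 4,
}

# Soil type codes (X4): classes B to E map to 1 to 4; class A has no data.
SOIL_CODES = {"B": 1, "C": 2, "D": 3, "E": 4}


def nehrp_site_class(vs30_m_s: float) -> str:
    """Map Vs30 (m/s) to a NEHRP class using commonly quoted boundaries."""
    if vs30_m_s > 1500.0:
        return "A"
    if vs30_m_s > 760.0:
        return "B"
    if vs30_m_s > 360.0:
        return "C"
    if vs30_m_s >= 180.0:
        return "D"
    return "E"


def soil_code(vs30_m_s: float) -> int:
    """Return the X4 code for a site, rejecting class A (not in the dataset)."""
    site_class = nehrp_site_class(vs30_m_s)
    if site_class not in SOIL_CODES:
        raise ValueError(f"no code defined for NEHRP class {site_class}")
    return SOIL_CODES[site_class]
```

For example, a site with a Vs30 of 250 m/s falls in class D under these boundaries and would be coded as 3.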
The pairwise associations among the variables X1, X2, X3, and X4 were examined through a Pearson correlation matrix. The results, shown in Fig. 2a, reveal no substantial linear correlation among these variables. All correlation coefficients are either close to zero or well below the 90% threshold, indicating a lack of strong linear relationships. These findings highlight the relative independence of the variables, suggesting that linear modeling techniques may not be suitable for capturing their interdependencies. Figure 2b compares the distribution of PGA values with a normal distribution curve. The data appear skewed, with a large concentration of values at lower PGA levels, and deviate from the symmetry of the normal distribution, suggesting that the PGA data are not normally distributed.
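A Pearson correlation matrix of the kind described above can be computed as in the following sketch. It uses only the standard library; the function names are hypothetical, and the toy columns in the usage note are not values from the dataset.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long columns."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and standard deviations share the same 1/n factor,
    # so it cancels and is omitted.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


def correlation_matrix(columns: Sequence[Sequence[float]]) -> List[List[float]]:
    """Symmetric matrix of pairwise Pearson coefficients."""
    return [[pearson(ci, cj) for cj in columns] for ci in columns]
```

Calling `correlation_matrix([x1, x2, x3, x4])` on the four input columns would return a 4 × 4 matrix with ones on the diagonal; off-diagonal entries near zero correspond to the weak linear association reported in Fig. 2a.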
To mitigate the effect of differing feature scales, the datasets were normalized to the range between \(a = 0.1\) and \(b = 0.9\) prior to training the model, as given in Eq. (1) [14].
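Eq. (1) is not reproduced in this text. A common form of range normalization consistent with the description is \(X_n = a + (b - a)(X - X_{min})/(X_{max} - X_{min})\); the sketch below assumes that form, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def normalize(values: Sequence[float], a: float = 0.1, b: float = 0.9
              ) -> Tuple[List[float], float, float]:
    """Scale a column linearly so that its minimum maps to a and its maximum to b.

    Assumed min-max form; returns the scaled values plus the original
    minimum and maximum, which are needed to undo the scaling.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant column")
    scaled = [a + (b - a) * (v - lo) / (hi - lo) for v in values]
    return scaled, lo, hi


def denormalize(scaled: Sequence[float], lo: float, hi: float,
                a: float = 0.1, b: float = 0.9) -> List[float]:
    """Invert normalize(), mapping values in [a, b] back to [lo, hi]."""
    return [lo + (s - a) * (hi - lo) / (b - a) for s in scaled]
```

Under this assumed form, magnitudes of 5.0, 6.5, and 8.0 would map to 0.1, 0.5, and 0.9. Keeping the scaled values away from 0 and 1 is a common choice when the network uses saturating activation functions.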
Modeling and prediction of PGA
An ANN is a computational framework inspired by the neural architecture of the human brain. It consists of interconnected units known as neurons, which are organized into layers: an input layer, one or more hidden layers, and an output layer. Each neuron in these layers is linked through weights that influence the signals transmitted between neurons. ANNs are employed in machine learning for their ability to adaptively learn from data, enabling them to perform tasks such as pattern recognition, classification, and regression [15]. For an ANN with two hidden layers, the final output \(y\) can be expressed as:
where \(x\) is the vector of input features, \(w\) denotes the weight matrices, \(b\) the bias values, and \(f\) the activation function. \(m_1\), \(m_2\), and \(m_3\) are the numbers of neurons in the input layer, first hidden layer, and second hidden layer, respectively [16]. In this study, an ANN model with two hidden layers, using the settings summarized in Table 5, was developed in MATLAB to map four input features to a single output.
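The forward pass of a network with two hidden layers can be sketched as follows. This is an illustration in Python, not the authors' MATLAB implementation. The hyperbolic tangent in the hidden layers and the linear output are assumptions, since Table 5 is not reproduced here, and the parameter names `W1`, `b1`, and so on are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Sequence


def dense(inputs: Sequence[float], weights: Sequence[Sequence[float]],
          biases: Sequence[float], activation: Callable[[float], float]) -> List[float]:
    """One fully connected layer: activation(W . inputs + b), one row per neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + bias)
            for row, bias in zip(weights, biases)]


def forward(x: Sequence[float], params: Dict[str, list]) -> float:
    """Map the four normalized inputs to one output through two hidden layers."""
    h1 = dense(x, params["W1"], params["b1"], math.tanh)   # assumed activation
    h2 = dense(h1, params["W2"], params["b2"], math.tanh)  # assumed activation
    (y,) = dense(h2, params["W3"], params["b3"], lambda z: z)  # assumed linear output
    return y


def random_params(sizes: Sequence[int] = (4, 25, 25, 1), seed: int = 0) -> Dict[str, list]:
    """Random untrained parameters for a network with the given layer sizes."""
    rng = random.Random(seed)
    params: Dict[str, list] = {}
    for k, (n_in, n_out) in enumerate(zip(sizes, sizes[1:]), start=1):
        params[f"W{k}"] = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
                           for _ in range(n_out)]
        params[f"b{k}"] = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
    return params


if __name__ == "__main__":
    # Untrained weights, so the printed number has no physical meaning.
    print(forward([0.5, 0.1, 0.3, 0.9], random_params()))
```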
During the training and testing steps, the performance of ANN models with different numbers of nodes in the hidden layers was evaluated using the total goodness function (TGF), as utilized by several authors [17,18,19] and given in Eq. (3). This evaluation is crucial because it allows for the selection of the best weight and bias values for the ANN model.
In which
where \(a_i\), \(O_i\), \(\bar{a}\), \(n\), and \(N\) are the target, the output, the mean value of the targets, the number of data in the testing or training step of the network, and the total number of data, respectively. Following the modeling, a GA, originally developed by John Holland [20], was used in conjunction with the ANN to predict PGA. To simulate earthquake scenarios, a cost function based on Euclidean distance was determined as follows:
where \(\vec{x}\) is a vector that stores the input parameters \(X_1\), \(X_2\), \(X_3\), and \(X_4\). The hyperparameter settings of the GA in MATLAB are summarized in Table 6. The population consists of 25 individuals with a double vector representation, and the algorithm runs for a maximum of 25 generations. A uniform creation function initializes the population, with rank scaling and roulette selection enhancing diversity and selection pressure. One elite individual is carried over to the next generation. The mutation function uses adaptive feasible strategies, applied to 50% of the population, while a two-point crossover method is employed with a crossover fraction of 50%. Migration occurs forward at 10% intervals, with 90% of the population migrating. A penalty-based approach addresses nonlinear constraints, and the cost limit is set to zero.
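A GA loop with the stated population size, generation limit, elite count, and crossover fraction might look like the sketch below. It is not the MATLAB configuration of Table 6: linear rank weights stand in for MATLAB's rank scaling, clipped Gaussian mutation replaces the adaptive feasible mutation, and migration and constraint penalties are omitted. The cost function of the paper is not reproduced in this text, so `euclidean_cost` (distance between a predicted and a target output) is an assumed form, and `toy_pga` is a hypothetical stand-in for the trained ANN.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]
BOUNDS: Bounds = [(0.1, 0.9)] * 4  # normalized ranges of X1..X4, all treated as continuous


def toy_pga(x: Sequence[float]) -> float:
    """Hypothetical smooth surrogate; a trained ANN would be used instead."""
    return x[0] * (1.0 - x[2]) * (0.5 + 0.5 * x[3])


def euclidean_cost(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Euclidean distance between predicted and target outputs (assumed form)."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)))


def rank_roulette(ranked: List[List[float]], rng: random.Random) -> List[float]:
    """Roulette selection on linear rank weights (best individual weighs most)."""
    n = len(ranked)
    return rng.choices(ranked, weights=[n - i for i in range(n)], k=1)[0]


def two_point_crossover(a: List[float], b: List[float], rng: random.Random) -> List[float]:
    """Copy parent a, replacing the segment between two cut points with parent b."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:]


def mutate(x: List[float], bounds: Bounds, rng: random.Random, scale: float = 0.1) -> List[float]:
    """Gaussian perturbation clipped to the bounds (simplified mutation)."""
    return [min(hi, max(lo, v + rng.gauss(0.0, scale * (hi - lo))))
            for v, (lo, hi) in zip(x, bounds)]


def run_ga(cost: Callable[[Sequence[float]], float], bounds: Bounds,
           pop_size: int = 25, generations: int = 25, elite: int = 1,
           crossover_fraction: float = 0.5, seed: int = 0) -> Tuple[List[float], float]:
    """Minimize cost over the box defined by bounds; returns (best, best cost)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=cost)                   # best first
        nxt = [list(ind) for ind in ranked[:elite]]      # elitism
        while len(nxt) < pop_size:
            if rng.random() < crossover_fraction:
                child = two_point_crossover(rank_roulette(ranked, rng),
                                            rank_roulette(ranked, rng), rng)
            else:
                child = mutate(rank_roulette(ranked, rng), bounds, rng)
            nxt.append(child)
        pop = nxt
    best = min(pop, key=cost)
    return best, cost(best)


if __name__ == "__main__":
    target = [0.5]  # hypothetical target output
    print(run_ga(lambda x: euclidean_cost([toy_pga(x)], target), BOUNDS))
```

Because one elite individual is always copied unchanged, the best cost in this sketch never increases from one generation to the next. Fault type and soil type are categorical in the study, so a closer reproduction would round or constrain those two coordinates.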
Results and discussion
Figure 3a and b display the performance of the ANN-based models during the training and testing phases. The developed networks perform well in both stages: the training phase achieved an R² value of 0.9976, and the testing phase reached 0.9982. These high R² values indicate strong predictive capability of the ANN-based models on this dataset.
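The R² values reported above can be computed with the coefficient of determination in its usual form, using the targets, outputs, and target mean defined earlier. The sketch below also combines the training and testing scores with weights proportional to the number of data in each step. That weighted combination is only one plausible reading of the TGF, since Eq. (3) is not reproduced in this text; `total_goodness` is a hypothetical name.

```python
from typing import Sequence


def r_squared(targets: Sequence[float], outputs: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_target = sum(targets) / len(targets)
    ss_res = sum((a - o) ** 2 for a, o in zip(targets, outputs))
    ss_tot = sum((a - mean_target) ** 2 for a in targets)
    return 1.0 - ss_res / ss_tot


def total_goodness(train_targets: Sequence[float], train_outputs: Sequence[float],
                   test_targets: Sequence[float], test_outputs: Sequence[float]) -> float:
    """Size-weighted mean of training and testing R² (assumed form of the TGF)."""
    n_train, n_test = len(train_targets), len(test_targets)
    total = n_train + n_test
    return (n_train / total) * r_squared(train_targets, train_outputs) \
        + (n_test / total) * r_squared(test_targets, test_outputs)
```

Under this reading, a candidate architecture that fits the training data well but the test data poorly receives a lower score than one that does well on both, which is the behavior needed to choose between hidden-layer sizes.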
The ANN architecture described in Fig. 3c features two hidden layers, each with 25 neurons, resulting in a configuration of 4-25-25-1. This specific setup has been identified as optimal for achieving the highest total goodness value. Figure 3d illustrates the relative importance of each input variable, revealing that soil type and epicentral distance contribute the most (41.17%) and the least (2.67%), respectively, to the output of the developed model.
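The number of trainable parameters implied by a 4-25-25-1 layout can be checked with a short calculation. The sketch assumes a fully connected network with one bias per non-input neuron, which is the usual arrangement but is not confirmed by the text; `count_parameters` is a hypothetical helper.

```python
from typing import Sequence, Tuple


def count_parameters(layer_sizes: Sequence[int]) -> Tuple[int, int]:
    """Return (weights, biases) for a fully connected feed-forward network."""
    # One weight per connection between consecutive layers.
    weights = sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    # One bias per neuron in every layer except the input layer.
    biases = sum(layer_sizes[1:])
    return weights, biases


if __name__ == "__main__":
    print(count_parameters((4, 25, 25, 1)))  # (750, 51)
```

The 750 weights come from 4 × 25 + 25 × 25 + 25 × 1 connections, which matches the figure quoted in the abstract; under the bias assumption there would be 51 further values to optimize.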
To solve the problem, at the start of the optimization process, the GA assesses each individual by their cost values, targeting the individual with the minimum cost relative to the set input variables. The efficacy of the GA during the optimization is presented in Fig. 3e. It is important to highlight that the best result from the optimization is the average of 15 simulation runs, reflecting the GA’s inherent randomness. This approach ensures the robustness and reliability of the optimization results, taking into account the stochastic nature of the GA process.
Overall, the combined use of ANN and GA in this study has proven to be highly effective in predicting ground motions, providing valuable insights for seismic hazard assessments and earthquake engineering. The integration of these advanced computational techniques offers a powerful tool for improving the safety and resilience of structures in earthquake-prone areas.
Figure 4 illustrates the relationships between PGA and four key seismic factors: moment magnitude (X1), fault type (X2), epicentral distance (X3), and soil type (X4). Each plot reveals the distinct and complex influence of these factors on PGA. All plots are obtained for a scenario with a moment magnitude of 5, fault type of 0, epicentral distance of 0.44, and soil type of 1. Starting with moment magnitude, we observe a strong non-linear relationship in which larger magnitudes correspond to increased PGA values, but the relationship also contains peaks and troughs, indicating that other factors may modulate the effect. Fault type, while showing variation in PGA, has a relatively lower and more limited impact, suggesting that, for this case, fault type does not significantly affect ground acceleration compared to other parameters. Epicentral distance shows a clear inverse relationship with PGA: closer distances to the epicenter result in higher PGA, a physically intuitive outcome since seismic waves lose energy as they travel farther. Soil type, however, exhibits a non-linear effect in which certain soil types significantly amplify the ground acceleration, reflecting the well-known influence of soil characteristics on seismic wave behavior.
For this specific case, among these factors, moment magnitude and epicentral distance stand out as the most influential in determining PGA. Moment magnitude drives the energy release of an earthquake, directly affecting ground acceleration, while epicentral distance dictates how much energy reaches a specific location. Soil type plays a critical, albeit secondary, role, especially in areas prone to amplification due to soft soils. For the case under consideration, moment magnitude emerges as the most important factor, as it fundamentally governs the energy produced during an earthquake, which in turn influences all subsequent seismic behavior, including PGA.
Table 7 presents the outcomes for a set of 20 distinct fault combinations achieved by the ANN-GA system. It is well established that reverse faults can yield elevated PGAs, especially in cases characterized by substantial fault slip and proximity to the Earth’s surface. In contrast, PGAs associated with normal-oblique faults are typically lower than those from reverse and strike-slip faults, which is consistent with the results of [21,22,23]. Existing literature suggests that, given the same earthquake magnitude, distance to the site, and site condition, the ground motions from normal-faulting earthquakes tend to be smaller than those from strike-slip earthquakes by about 20%. A notable trend is apparent in the data: for reverse and reverse-oblique faults (with X2 values of 2 and 3), even when the earthquake magnitude is substantial, the optimized PGA values are observed at greater epicentral distances. This observation suggests that these fault types are capable of generating heightened PGAs even at more substantial distances from the epicenter. In the case of normal faults (with an X2 value of 1), our analysis indicates that the maximum PGA is most prominent at shorter epicentral distances, with the impact diminishing as distance increases. However, for normal-oblique faults (with an X2 value of 4), the obtained results appear inconsistent with the underlying physical principles of the problem. According to existing knowledge, the PGA values of normal-oblique faults at larger epicentral distances are typically negligible, so the optimized results, which suggest higher PGAs at greater distances, raise questions regarding the validity of the optimization procedure. Further investigation and validation may be warranted to reconcile this discrepancy between the model results and the known physics of normal-oblique fault behavior.
It should be noted that uncertainty in fault parameters, including fault location, slip rate, rupture mechanism, and fault geometry, is crucial in such predictions.
The calibration of ANN-based models for region-specific seismic data is a critical step in ensuring accurate and reliable predictions of PGA. The selected data are of high quality and capture localized features such as Vs30, fault mechanisms, and attenuation characteristics for region-specific seismic prediction. Furthermore, model robustness was enhanced through hyperparameter tuning, cross-validation, and the application of regularization techniques to mitigate overfitting. These measures help the ANN generalize from the training data to unseen events, contributing to the broader generalizability of PGA predictions. Incorporating diverse and representative datasets further strengthens the model’s adaptability to varying seismic conditions, making it suitable for practical seismic hazard evaluations.
Moreover, the analysis of the seismic data highlights a distinct pattern in PGA related to both epicentral distance and moment magnitude. The findings indicate a pronounced reduction in PGA as the epicentral distance increases, supporting the inverse square law. This law suggests that ground shaking intensity decreases as the distance from the seismic source increases, corroborating earlier research [3,4,24]. The study also uncovers a slight positive correlation between PGA and moment magnitude. Furthermore, softer soils register higher PGA than harder soils. These insights highlight the importance of considering both epicentral distance and moment magnitude for a thorough assessment of seismic impacts on ground motion and structural integrity. This aids in the development of more effective seismic hazard assessments and infrastructure resilience strategies.
Conclusion
This study investigates the application of Artificial Neural Networks (ANN) for developing Ground Motion Prediction Equations (GMPE) using PEER NGA data. The findings demonstrate that ANNs provide a robust and effective alternative to traditional empirical methods for predicting ground motion, highlighting their potential in seismic hazard assessments and earthquake engineering. Key conclusions include:
- The ANN model exhibits high predictive accuracy, proving to be a valuable tool for estimating ground motion, which is critical for designing safer structures in earthquake-prone regions.
- Machine learning techniques are crucial for analyzing complex, nonlinear patterns in seismic data, thereby improving the reliability of seismic hazard assessments.
- Optimization insights from the study point to areas with potential seismic risks, helping to prioritize safety measures.
- There is substantial potential for further development and application of ANN models in earthquake prediction, which could enhance preparedness and mitigation strategies against earthquake impacts.
- The study encourages ongoing research using machine learning in seismology, emphasizing the significance of advanced data-driven methods for a deeper understanding of ground motion behavior.
- Although the ANN model includes key seismic parameters, future iterations could incorporate additional factors, such as directivity effects, dip angles, and hypocentral distances, to better capture complex seismic phenomena.
- The model’s calibration for specific seismic regions may reduce its broader applicability. Expanding the dataset to include high-resolution, multi-regional seismic data would enhance the model’s robustness and generalizability, making it adaptable to various tectonic settings.
This research sets the stage for more sophisticated, accurate, and practical approaches to understanding and managing seismic risks.
Data availability
The data that support the findings of this study are available on request from the corresponding authors.
References
1. Zhang, B., Yu, Y., Li, X. & Wang, Y. Ground motion prediction equation for the average horizontal component of PGA, PGV, and 5% damped acceleration response spectra at periods ranging from 0.033 to 8.0 s in southwest China. Soil Dyn. Earthq. Eng. 159, 107297 (2022).
2. Chiou, B. & Youngs, R. NGA model for average horizontal component of peak ground motion and response spectra. PEER Report 2008/09, Pacific Earthquake Engineering Research Center, College of Engineering (Univ. Calif. Berkeley, 2008).
3. Boore, D. M. & Atkinson, G. M. Ground-motion prediction equations for the average horizontal component of PGA, PGV, and 5%-damped PSA at spectral periods between 0.01 s and 10.0 s. Earthq. Spectra 24(1), 99–138 (2008).
4. Campbell, K. W. & Bozorgnia, Y. NGA-West2 ground motion model for the average horizontal components of PGA, PGV, and 5% damped linear acceleration response spectra. Earthq. Spectra 30(3), 1087–1115 (2014).
5. Douglas, J. Ground motion prediction equations 1964–2021. (Department of Civil & Environmental Engineering, Imperial College London, 2021).
6. Esteva, L. & Rosenblueth, E. Espectros de temblores a distancias moderadas y grandes. Bol. Sociedad Mexicana de Ingeniería Sísmica 2(1), 1–18 (1964).
7. Boore, D. M. & Joyner, W. B. The empirical prediction of ground motion. Bull. Seismol. Soc. Am. 72(6B), S43–S60 (1982).
8. Bommer, J. J., Stafford, P. J., Alarcón, J. E. & Akkar, S. The influence of magnitude range on empirical ground-motion prediction. Bull. Seismol. Soc. Am. 97(6), 2152–2170 (2007).
9. Campbell, K. W. & Bozorgnia, Y. NGA ground motion model for the geometric mean horizontal component of PGA, PGV, PGD and 5% damped linear elastic response spectra for periods ranging from 0.01 to 10 s. Earthq. Spectra 24(1), 139–171 (2008).
10. Ansari, A., Rao, K., Jain, A. & Ansari, A. Deep learning model for predicting tunnel damages and track serviceability under seismic environment. Model. Earth Syst. Environ. 9(1), 1349–1368 (2023).
11. Ansari, A., Rao, K., Jain, A. & Ansari, A. Formulation of multi-hazard damage prediction (MhDP) model for tunnelling projects in earthquake and landslide-prone regions: a novel approach with artificial neural networking (ANN). J. Earth Syst. Sci. 132(4), 164 (2023).
12. Kubo, H., Kunugi, T., Suzuki, W., Suzuki, S. & Aoi, S. Hybrid predictor for ground-motion intensity with machine learning and conventional ground motion prediction equation. Sci. Rep. 10(1), 11871 (2020).
13. Okazaki, T., Morikawa, N., Fujiwara, H. & Ueda, N. Monotonic neural network for ground-motion predictions to avoid overfitting to recorded sites. Seismol. Res. Lett. 92(6), 3552–3564 (2021).
14. Shahmoradi Ghaheh, F. et al. Modeling and optimization of dyeing process of polyamide 6 and woolen fabrics with plum-tree leaves using artificial intelligence. Sci. Rep. 14(1), 15067 (2024).
15. Haghdoost, F., Razbin, M., Bahrami, H., Barzin, J. & Ghaee, A. Modeling and optimization of the core-shell nanofibrous composite mat as a scaffold via hybrid models. J. Ind. Text. 52, 15280837221112406 (2022).
16. Grassi, G. & Vecchio, P. Wind energy prediction using a two-hidden layer neural network. Commun. Nonlinear Sci. Numer. Simul. 15(9), 2262–2266 (2010).
17. Sohrabi, M., Razbin, M., Pourtavvaf, M., Bagherzadeh, R. & Mehdipour Mirmahale, M. Exercising hybrid model to design an optimized electrospun polyamide-6 nanofibrous mat for air filtration applications. J. Text. Inst. 114(11), 1667–1681 (2023).
18. Kouchehbaghi, N. H. et al. Soft computing procedure to optimize the electrospinning parameters of polyacrylonitrile nanofibrous air filter. J. Text. Inst. 1–13 (2023).
19. Hatami, O., Sayadi, D., Razbin, M. & Adibi, H. Optimization of grinding parameters of tool steel by the soft computing technique. Comput. Intell. Neurosci. 2022 (2022).
20. Holland, J. H. Genetic algorithms. Sci. Am. 267(1), 66–73 (1992).
21. Somerville, P. & Abrahamson, N. Ground motion prediction for thrust earthquakes. In Proc., SMIP95 Seminar on Seismological and Engineering Implications of Recent Strong-Motion Data. Vol. 16, no. 1995, 11–23 (1995).
22. Spudich, P. et al. Earthquake ground motions in extensional tectonic regimes, US Geological Survey, 2331–1258 (1996).
23. Bolt, B. Estimation of strong seismic ground motions. Int. Handb. Earthq. Eng. Seismology, 983–1001 (2002).
24. Bindi, D. et al. Ground motion prediction equations derived from the Italian strong motion database. Bull. Earthq. Eng. 9, 1899–1920 (2011).
Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Author information
Contributions
Fatima Khalid conceived the presented idea. Milad Razbin developed the networks and performed the optimization. Fatima Khalid verified the results. Fatima Khalid and Milad Razbin wrote the manuscript. All authors provided critical feedback and helped shape the research, analysis, and manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Khalid, F., Razbin, M. Modeling peak ground acceleration for earthquake hazard safety evaluation. Sci Rep 14, 31032 (2024). https://doi.org/10.1038/s41598-024-82171-7