Abstract
Currently, the alarm functions of existing levee seepage monitoring systems are limited to single-parameter monitoring and lack rate-of-change alarms and correlation alarms. This can lead to false alarms, missed alarms, equipment failures, or unnecessary downtime. To enhance the intelligence of levee safety monitoring and seepage alarms, a levee seepage intelligent alarm system based on a Bidirectional Long Short-Term Memory (BILSTM) network model was designed and implemented. Firstly, data cleaning and preprocessing are carried out on the engineering safety monitoring operation data to reduce the influence of dirty data such as outliers and repetitive values on the accuracy of alarms. Secondly, for the correlation between the piezometric tube water levels of the levee and the Yangtze River water levels, a correlation analysis based on Mutual Information (MI) theory was conducted to minimize the effect of piezometric tube water level change delays on correlation. Finally, the BILSTM model was used to predict trends in these potentially abnormal data intervals. Based on engineering application requirements, alarm thresholds were established, and a multi-level alarm module was developed. Field operation test results show that the proposed method can accurately predict the piezometric tube water levels of levees, achieving intelligent alarms within the engineering safety monitoring system.
Introduction
Intelligent monitoring and perception of project operation safety is an important component of the “intelligent water conservancy” system platform1. Embankments are water retaining structures built along rivers, canals, lakes, coasts, or at the edge of flood zones, which play an irreplaceable role in maintaining the safe operation of water resources projects2. However, embankment projects often have a long construction history and are prone to hazards such as seepage, leakage, sub-surface erosion, and piping during flood seasons, posing significant risks to the safety of the nation and the lives and property of the people3,4. Therefore, effectively monitoring seepage in river embankments and issuing timely intelligent alarms are crucial for maintaining the safe operation of water conservancy projects and promptly identifying potential risks.
At present, the key technologies in the operation of water conservancy project safety monitoring and intelligent alarm system include state monitoring technology, data analysis technology, and intelligent alarm technology. Specifically regarding the safety monitoring of river embankments, the main contents of operational safety monitoring include deformation monitoring, seepage monitoring, and environmental monitoring5. Deformation monitoring primarily focuses on the deformation and settlement of the main body of river embankments, including vertical displacement, expansion joints, and riverbed deformation monitoring. The mainstream method for seepage monitoring involves using pressure cells installed on the bottom slab of monitoring weirs to reflect changes in water levels in pressure pipes. And environmental monitoring primarily involves parameters such as water level and flow rate. Operational state monitoring enables real-time warning or alarm when anomalies or faults occur during the operation of river embankment projects. In addition, the state monitoring technology, as one of the core technologies of the intelligent alarm system, evaluates the operating status of the project by detecting the parameters of key equipment in real-time and determines whether it is in normal working condition6,7, which can greatly avoid ineffective daily maintenance while reducing the false alarm rate.
There are three main methods for state monitoring: model-based methods, signal-based methods, and data-driven methods8,9. Model-based approaches establish accurate physical models based on the operating mechanisms of engineering systems, generally ensuring high precision10. However, due to the complexity of engineering systems, it is challenging to build precise physical models for the entire system, limiting their use cases11. For instance, Nast et al.12 proposed a model supporting Digital Twin (DT) engineering that can detect faults or conduct predictive maintenance by identifying deviations in component behavior. However, the implementation and adaptation of related technologies are still under exploration. Wang et al.13 proposed a complete set of physical circuit modeling solutions for the secondary system of intelligent substations and carried out applied research on secondary physical models, which provides great convenience for applications such as engineering commissioning, equipment maintenance, and substation expansion. Although this solution has been successfully piloted in a 220kV intelligent substation in Fujian Province, China, it is still limited in terms of its usage scenarios. Signal-based methods detect potential faults by processing and analyzing collected signal data14. These methods are sensitive to signal quality and significantly affected by noise and interference, often requiring specialized sensors and hardware, which increases costs and installation complexity15. For instance, Geng et al.16 proposed a capacitor state monitoring strategy based on switching signals to identify abnormal capacitors and improve the performance of modular multilevel converters. This method can maintain the balance of capacitor voltage with simple operations, and the monitoring error is only a few thousandths. 
Xu et al.17 proposed a transformer operation status monitoring system based on sound signals, using adaptive noise reduction technology to process sound signals and eliminate false alarms. In contrast, data-driven methods monitor operating status using the intrinsic characteristics of the data without relying on physical models. These methods have strong generalization capabilities, can adapt to various systems and equipment, and are relatively simple to transfer, reducing the costs of model development and maintenance18. As a result, data-driven methods have become the most popular approach in current engineering safety monitoring, with extensive research on their application methods. Currently, the most commonly used methods are machine learning-based methods19 and statistical analysis-based methods20. Machine learning-based methods require establishing complex network structures and determining numerous hyperparameters, making it difficult to interpret and understand the model’s internal decision-making process. These methods are also prone to overfitting when the data volume is insufficient, making generalization capabilities hard to ensure, and they have certain requirements for computational hardware. For example, Raia et al.21 proposed a method using Long Short-Term Memory (LSTM) models to monitor state conditions and identify potential faults in engineering processes. This method can predict faults with 96.75\(\%\) accuracy but still faces the aforementioned issues. Besides, statistical analysis-based methods rely on the distribution characteristics of the data to perform statistical analysis. By establishing relationships between variables, these methods monitor the operating status of equipment based on calculated statistical quantities. For example, Wang et al.22 proposed a novel statistical time-frequency analysis method to investigate newly observed data and make decisions regarding abnormal situations. 
This method has been applied in three different engineering contexts, demonstrating significant improvements and good potential in engineering applications. However, it involves large computational quantities and low computational efficiency, which still needs improvement.
In terms of data analysis technology, data gaps caused by sensor failures, network issues, or other reasons, data instability and anomalies due to sensor noise, and inconsistencies in data standards across different sensors and devices all present significant challenges for data analysis and processing. Therefore, it is necessary to employ data cleaning models to process the raw monitoring data collected through various means, thereby improving data quality23. For instance, Li et al.24 addressed the distribution and characteristics of anomalous data, achieving effective cleaning of various types of anomalies using the fuzzy C-means clustering method. Wang et al.25 proposed an outlier detection strategy based on Density Peaks Clustering (DPC) to address the occurrence of abnormal data or noise in industrial production processes. Experimental results show that the DPC algorithm performs well in identifying and handling outliers and noise in the data, thereby improving data quality. Anomaly detection is an important task in IoT big data analysis and can be applied in many fields. Lin et al.26 proposed an anomaly detection method based on autoencoders and random forests, AERFAD, which uses random forests to classify data as anomalous or normal. In addition, engineering safety operations involving multiple pieces of equipment and data types often require dimensionality reduction techniques and multivariate statistical analysis methods. Commonly used data analysis methods include Principal Component Analysis (PCA), Multi-Task Learning (MTL), and neural networks.
For example, Zhu et al.27 used PCA to study the clustering of data points, and Achille et al.28 proposed a vector representation method for classification tasks based on multi-task learning, which can infer the nature and relationships of these tasks. Considering the importance of accurate detection of abnormal conditions for the safe and stable operation of engineering processes, Lu et al.29 proposed a novel Cascade Bagging Principal Component Analysis and Convolutional Neural Network classification network (CBPCA-CNN) to achieve supervised anomaly detection and validated its feasibility. In the field of hydraulic engineering, data cleaning and analysis techniques still largely rely on traditional approaches and are in their early stages of development. With the trend towards digital twin technology in hydraulic engineering and the advancement of monitoring technologies, engineering safety monitoring data analysis techniques should become more intelligent and efficient. By integrating specific operational mechanisms of different application domains, adapting to various data types and complex environments, and introducing big data processing techniques and intelligent methods, the efficiency and accuracy of pump station data cleaning can be significantly improved.
In the design of intelligent alarm systems, the mechanisms of most current systems rely on threshold detection methods30, lacking the capability for fault prediction and flexibility in responding to real-time changes in complex working conditions. Therefore, an increasing number of researchers are introducing artificial intelligence techniques to model and analyze historical data, thereby predicting trends in the data and establishing intelligent alarm mechanisms31. For instance, Su et al.32 utilized the latest advancements in computer vision, artificial intelligence, machine learning, and convolutional neural network (CNN) technologies to design and implement an intelligent safety warning system. This system enables safe, controlled, precise, and convenient intelligent alarms and warnings.
In summary, although data cleaning, data analysis, and state monitoring technologies are advancing rapidly, their application in pump stations still has room for development. Currently, some pump station data cleaning techniques still rely on traditional methods, highlighting the need to adopt advanced algorithms or intelligent methods to improve the efficiency and accuracy of data cleaning in pump stations. With the development of artificial intelligence technologies, pump station data analysis will increasingly move towards intelligent decision-making systems, aiming to achieve automated operation and optimized performance of pump station systems. Additionally, current state monitoring technologies mostly rely on detecting faults before triggering alarms. However, during new operational periods, pump stations may face different operating conditions, environmental changes, or equipment statuses. Existing state monitoring technologies may result in maintenance personnel being unable to make timely decisions regarding these issues. Therefore, it is essential to forecast trends in the water levels of piezometric tubes along the levees to enhance the safety of the pump station.
Therefore, to improve operational efficiency, reduce maintenance costs, and enhance system safety, this paper designs and implements an intelligent river embankment seepage alarm system based on a Bidirectional Long Short-Term Memory (BILSTM) network model, driven by the operational alarm requirements of a pumping station hub of the South-to-North Water Diversion Project located along the Yangtze River. The work is based on safety monitoring data from the embankment on the Yangtze River side of the pumping station. Because the dataset used for analysis contains outliers, a box plot is used to identify them, and each outlier is replaced with the average of the data values immediately before and after it. Compared with deleting outliers outright, this avoids discarding samples from the dataset and improves its quality. After data cleaning, correlation analysis based on mutual information theory is used to measure the similarity between series and to screen the data intervals in which abnormal operating conditions may occur. Compared with traditional correlation analysis, this expresses the correlation between the piezometric tube water level data of the embankment and the Yangtze River water level data more accurately, and it accounts for the time lag of piezometric tube water level changes relative to the Yangtze River water level. In addition, the BILSTM model is used to predict trends in the data intervals that may reflect abnormal operation; intelligent alarm intervals are then established, and a multi-level alarm module is developed and integrated into the engineering safety monitoring system. Field operation tests show that the proposed method can accurately predict the piezometric tube water levels of the river embankment and realize intelligent alarms in the engineering safety monitoring system.
Methods
The river embankment seepage intelligent alarm system serves as a crucial component of the engineering safety monitoring platform. It acquires key data related to seepage in the engineering safety monitoring system, analyzes this data, and establishes trend prediction and intelligent alarm models. This enables the system to predict trends in seepage and provide intelligent alarms, enhancing the monitoring and safety management of river embankments. The overall architecture is shown in Fig. 1, which is divided into three parts: safety monitoring, data analysis and processing, and intelligent alarm application interface. The function of the safety monitoring part is to obtain the state monitoring parameters related to river embankment seepage from the engineering safety monitoring system, such as the basic data of river embankment hydraulic construction, piezometric tube water level data, and the Yangtze River water level data. The data analysis processing section includes two main functions: data cleaning and trend prediction modeling. Data cleaning involves preprocessing monitoring data to address missing values, abnormal values, and duplicate values, thereby improving data quality and enhancing its analytical value. The prediction model, built on the foundation of correlation analysis, identifies potential anomaly situations at monitoring points and operational intervals, followed by forecasting trends to establish multi-level alarm thresholds for intelligent alerts. The developed intelligent alarm application interface module is integrated into the existing engineering safety monitoring system, enabling visual representation of the data analysis processing and implementation of intelligent alarm functions.
Data cleaning and preprocessing
Engineering safety monitoring systems typically include multiple subsystems corresponding to various buildings, equipment, and sensor systems. Data retrieved from these systems may encounter issues such as field name conflicts, attribute value discrepancies, and structural conflicts. Additionally, due to environmental influences, sensor errors, data transmission errors, and human factors, there are occurrences of data gaps, anomalies, and errors. Therefore, it is necessary to perform preprocessing and data cleaning in the early stages of data analysis, followed by evaluating the data quality.
Data cleaning
Data cleaning refers to the technical method of identifying and correcting errors and inconsistencies present in data. It aims to convert data from multiple sources into a uniform format, complete missing data records, and remove erroneous and duplicate items, so as to improve the quality of the data to suit the needs of analysis. In this paper, data preprocessing and data cleaning broadly consist of the following three steps:
Data validation: The data type and value range of each field of the acquired raw data are verified to see if they conform to the normal logic and value range. For abnormal data types and value ranges, type conversion or deletion is possible. The commonly used data validation scheme is to sort the data according to the time of collection and carry out descriptive statistics to determine whether the data meets the requirements by mean value, variance, extreme value, and so on.
Missing value processing: This step handles data rows or columns that contain missing data. Rows or columns in which empty values constitute more than 50%, or columns that have only one non-missing value, are typically deleted. Columns in which missing values make up less than 50% are usually filled using the forward fill method.
Outlier handling: Data points that fall outside the plausible range of values for a field are referred to as outliers. Outlier handling typically involves box plot analysis, where the quartiles and the interquartile range (IQR) are used to calculate upper and lower bound estimates that serve as criteria for identifying outliers. Sorting all data in ascending order and dividing it into four equal parts yields the quartiles. As shown in Fig. 2, the box plot has five parameters: the lower bound indicates the estimated minimum value; the upper bound indicates the estimated maximum value; and the lower quartile (Q1), median (Q2), and upper quartile (Q3) are the 25th, 50th, and 75th percentiles, respectively, of the sample values arranged in ascending order.
The lower bound (LB) and upper bound (UB) are calculated using formula 1, where IQR is the interquartile range, defined as the difference between the third quartile (Q3) and the first quartile (Q1):
\[ LB = Q_1 - 1.5 \times IQR, \qquad UB = Q_3 + 1.5 \times IQR, \qquad IQR = Q_3 - Q_1 \tag{1} \]
Typically, values that fall below the lower bound or exceed the upper bound are considered outliers.
Although box plots can identify outliers in a dataset in an intuitive and clear way, they have limitations, such as: not being able to accurately measure the degree of skewness and the degree of tail-weighting of the data distribution. In addition, it reflects more ambiguous information for data with relatively large batch sizes. Moreover, there are limitations in using the median to represent the overall evaluation level. However, these limitations have no impact on the present study and hence the use of box plots to screen for outliers is very appropriate in this paper.
After identifying the outliers, the treatment of outliers can be done by direct deletion or by using the average of the data in the vicinity before and after the outliers to make alternative corrections.
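The box plot screening and neighbour-average correction described above can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the function names, the sample values, and the choice of the "inclusive" quartile method are assumptions, and the factor 1.5 is the conventional box plot setting.

```python
from statistics import quantiles

def boxplot_bounds(values, k=1.5):
    """Return (LB, UB) from the quartiles; k=1.5 is the conventional factor."""
    q1, _q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def replace_outliers(values, k=1.5):
    """Replace each outlier with the mean of its nearest non-outlier neighbours."""
    lb, ub = boxplot_bounds(values, k)
    is_outlier = [v < lb or v > ub for v in values]
    cleaned = list(values)
    for i, flagged in enumerate(is_outlier):
        if not flagged:
            continue
        # Nearest valid sample before and after position i (skips adjacent outliers).
        before = next((values[j] for j in range(i - 1, -1, -1) if not is_outlier[j]), None)
        after = next((values[j] for j in range(i + 1, len(values)) if not is_outlier[j]), None)
        neighbours = [v for v in (before, after) if v is not None]
        if neighbours:
            cleaned[i] = sum(neighbours) / len(neighbours)
    return cleaned, [i for i, flagged in enumerate(is_outlier) if flagged]

# Hypothetical water level readings (m) with one spike at index 3.
levels = [3.05, 3.06, 3.04, 3.90, 3.07, 3.08, 3.06, 3.05, 3.07, 3.06]
cleaned, outlier_idx = replace_outliers(levels)
```

Replacing rather than deleting keeps the series at its original length and sampling interval, which matters for the lag estimation and sequence models applied later.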
Evaluation of data quality
After preprocessing and cleaning, data needs to undergo data quality assessment to ensure its suitability for analysis and modeling. Data quality refers to the degree to which data meets criteria such as completeness, correctness, consistency, accuracy, and uniqueness. Metrics such as precision, compression rate, and throughput can be used to measure data quality33:
Precision refers to the ratio of the intersection of the cleaned data \(D_c\) and the true data \(D_r\) to the true data \(D_r\) itself, as shown in formula 2. This metric provides an intuitive understanding of the effectiveness of the cleaning process and can track error rates during cleaning. However, in cases where manual intervention is required during data cleaning, precision can be affected.
Data compression rate refers to the proportion of the difference between the original data \(D_{raw}\) and the cleaned data \(D_c\) relative to the true data \(D_r\). This is calculated using formula 3. Effective data compression techniques can improve data management efficiency and speed up data processing. However, high compression rates may lead to data loss or distortion.
Throughput refers to the ratio of the processed data volume \(|D_{raw}|\) to the processing time T, as calculated by formula 4. This metric helps identify performance bottlenecks in the cleaning system and supports performance optimization to enhance data cleaning efficiency. However, at high throughput levels, the stability and reliability of the cleaning system may be compromised, necessitating additional performance tuning and resource allocation.
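The three metrics can be computed directly from record counts. The sketch below follows the verbal definitions given above; the record layout (hashable `(timestamp, value)` tuples), the function names, and the sample data are assumptions made for illustration.

```python
def precision(cleaned, true):
    """|D_c intersect D_r| / |D_r| for collections of hashable records."""
    d_c, d_r = set(cleaned), set(true)
    return len(d_c & d_r) / len(d_r)

def compression_rate(n_raw, n_cleaned, n_true):
    """(|D_raw| - |D_c|) / |D_r|, following the definition in the text."""
    return (n_raw - n_cleaned) / n_true

def throughput(n_raw, seconds):
    """Records processed per second of cleaning time."""
    return n_raw / seconds

# Hypothetical records: (sample index, water level in m).
true_data = [(0, 3.05), (1, 3.06), (2, 3.04), (3, 3.07)]
raw_data = true_data[:2] + [(1, 3.06)] + [(2, 3.04), (3, 3.70)]  # one duplicate, one bad value
cleaned_data = [(0, 3.05), (1, 3.06), (2, 3.04), (3, 3.70)]      # duplicate removed, bad value kept

p = precision(cleaned_data, true_data)
c = compression_rate(len(raw_data), len(cleaned_data), len(true_data))
t = throughput(len(raw_data), seconds=0.5)
```

In this toy case the cleaner removed the duplicate but missed the erroneous value, so precision is 0.75 while the compression rate is 0.25; the two metrics capture different aspects of cleaning quality.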
Correlation analysis based on MI
When seepage occurs in river embankments, the typical condition is that the water level in the piezometric tube within the embankment changes with the water level of the Yangtze River. To determine whether seepage is occurring, it’s necessary to analyze the correlation between the time series of the piezometric tube water level and the Yangtze River water level. When embankment seepage conditions occur, changes in the piezometric tube water level exhibit a lag. To minimize the impact of this lag on the correlation analysis, mutual information theory is used to determine the lag between changes in the piezometric tube water level and the Yangtze River water level. The piezometric tube water level series is then aligned with the Yangtze River water level series, allowing for correlation analysis.
Mutual Information (MI) is a statistical measure of the association between two random variables, and it can be used to analyze the correlation between two time series. Time delay refers to the lag or lead of one signal relative to another signal in time. By using mutual information as a supplement, it is possible to capture nonlinear relationships and higher-order correlations.
Suppose the time series of the Yangtze River water level is \(X = \{ x(t)\} (t = 1,2,...,n)\), which serves as the original data series, and the time series of the piezometric tube water level is \(Y = \{ y(t + \tau )\} (t = 1,2,...,n)\), which serves as the delayed series. The mutual information between these two series can be calculated using formula 5:
\[ I(X,Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} \tag{5} \]
In formula 5, p(x, y) is the joint probability distribution function of X and Y, while p(x) and p(y) are the marginal probability distribution functions of X and Y, respectively. A larger mutual information value I(X, Y) indicates a stronger association between X and Y. The mutual information increases when a certain value of X can provide accurate information about a certain value of Y. Mutual information is zero if and only if X and Y are independent. In the calculation of mutual information, different time delay values can be chosen. The mutual information curve may show peaks at certain time delay values. The positions of these peaks can be used as estimates of the time delay.
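The delay scan described above can be sketched with a simple histogram estimate of mutual information. This is an illustrative sketch only: the equal-width binning, the number of bins, the function names, and the synthetic test signal are all assumptions, and the paper does not state which MI estimator was used.

```python
import math
import random
from collections import Counter

def _bin(values, bins):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(xs, ys, bins=8):
    """Histogram estimate of I(X, Y) in nats for two equal-length series."""
    n = len(xs)
    bx, by = _bin(xs, bins), _bin(ys, bins)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # p(x,y) / (p(x) p(y)) simplifies to count * n / (count_x * count_y).
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())

def estimate_lag(x, y, max_lag, bins=8):
    """Pair x(t) with y(t + lag) for each candidate lag and pick the MI peak."""
    scores = [mutual_information(x[:len(x) - lag], y[lag:], bins)
              for lag in range(max_lag + 1)]
    best = max(range(max_lag + 1), key=scores.__getitem__)
    return best, scores

# Synthetic check: y is x delayed by 5 samples (not real monitoring data).
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(600)]
y = [0.0] * 5 + x[:-5]
lag_hat, scores = estimate_lag(x, y, max_lag=20)
```

Once the lag is known, the aligned pair `x[:len(x) - lag_hat]` and `y[lag_hat:]` can be passed to the correlation coefficients. On real water level data the MI curve is usually smoother than in this synthetic case, so the peak may be broad.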
After estimating the time delay \(\tau\) using mutual information and aligning the signals, the degree of association between the variables can be measured using the Pearson correlation coefficient, Kendall correlation coefficient, and Spearman correlation coefficient. These coefficients range from -1 to 1, where a positive value indicates a positive correlation, a negative value indicates a negative correlation, and 0 indicates no linear correlation. Based on the analysis results, a threshold can be manually set to determine whether the two curves are sufficiently similar, which can also serve as an alarm threshold for the system.
The calculation formulas for the Pearson, Kendall, and Spearman correlation coefficients are as follows:
(1) Pearson correlation coefficient:
\[ \rho = \frac{\mathrm{cov}(X,Y)}{\sigma_X \sigma_Y} \tag{6} \]
where \(\rho\) is the Pearson correlation coefficient, cov is the covariance, and \({\sigma _X}\) and \({\sigma _Y}\) are the standard deviations of variables X and Y, respectively.
(2) Kendall correlation coefficient:
\[ \tau = \frac{n_c - n_d}{n} \tag{7} \]
where \(\tau\) is the Kendall correlation coefficient, \(n_c\) is the number of concordant pairs, \(n_d\) is the number of discordant pairs, and n is the total number of pairs.
(3) Spearman correlation coefficient:
\[ \rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} \tag{8} \]
where \(\rho\) is the Spearman correlation coefficient, \(d_i\) is the difference in ranks between variables X and Y, and n is the sample size.
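For reference, the three coefficients can be computed in a few lines of plain Python. This is a sketch under stated assumptions: the function names and sample data are illustrative, ties in the Kendall count are treated as neither concordant nor discordant, and the Spearman coefficient is computed as the Pearson coefficient of the ranks, which coincides with the rank-difference form when there are no ties.

```python
import math

def pearson(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def kendall(x, y):
    """(concordant - discordant) / total number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Small hypothetical sample with two swapped neighbours.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r_p, r_k, r_s = pearson(x, y), kendall(x, y), spearman(x, y)
```

For this sample, 8 of the 10 pairs are concordant and 2 are discordant, so the Kendall coefficient is 0.6, while the Pearson and Spearman coefficients are both 0.8.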
Trend forecasting based on BILSTM model
After identifying the data intervals that may show abnormal conditions through correlation analysis, it is necessary to perform trend forecasting on the data to achieve intelligent alarming, allowing problems to be detected and addressed early. In this paper, the trend forecasting is conducted using the BILSTM model. This model is developed from the Recurrent Neural Network (RNN), in which neurons are connected both between layers and within the same layer. Information is updated cyclically in both directions: the output of the previous neuron and the external input jointly determine the input of the next neuron, so the network cyclically extracts relevant information and retains it in Long Short-Term Memory (LSTM) units.
BILSTM network is a variant of LSTM that utilizes two LSTM layers with opposite directions, allowing it to consider both historical and future information of the sequence simultaneously. It uses hidden layer neurons to store input information from both directions. The network structure is shown in Fig. 3.
The BILSTM is divided into two layers: the forward layer and the backward layer, both of which are connected to the output layer. At each time step t, the output from the forward LSTM layer \({\vec {H}_t}\) and the output from the backward LSTM layer \({\mathop {H}\limits ^{\leftarrow }}_{t}\) are calculated separately. These outputs are then combined to obtain the BILSTM’s output \({Y_t}\). The final calculation method for the BILSTM is given by formula 9:
\[ {Y_t} = {\vec {W}_{hy}}{\vec {H}_t} + {\mathop {W}\limits ^{\leftarrow }}_{hy}{\mathop {H}\limits ^{\leftarrow }}_{t} + {\mathrm{{b}}_y} \tag{9} \]
where \({\vec {W}_{hy}}\) represents the connection weights from the forward LSTM layer to the output layer; \({\mathop {W}\limits ^{\leftarrow }}_{hy}\) represents the connection weights from the backward LSTM layer to the output layer; \({\mathrm{{b}}_y}\) is the bias of the output layer.
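To make the bidirectional structure concrete, the sketch below runs a scalar LSTM cell over a sequence in both directions and combines the two hidden states in a linear output layer. It is a toy illustration, not the trained model: the hidden size of 1, the parameter names, and the untrained weight values are all assumptions chosen to keep the example short.

```python
import math

GATE_KEYS = ("wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg")

def make_params(value):
    """Hypothetical untrained parameters: every weight and bias set to `value`."""
    return {key: value for key in GATE_KEYS}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_scan(xs, p):
    """Run a scalar LSTM cell over xs and return the hidden state at every step."""
    h = c = 0.0
    hidden = []
    for x in xs:
        i = sigmoid(p["wi"] * x + p["ui"] * h + p["bi"])    # input gate
        f = sigmoid(p["wf"] * x + p["uf"] * h + p["bf"])    # forget gate
        o = sigmoid(p["wo"] * x + p["uo"] * h + p["bo"])    # output gate
        g = math.tanh(p["wg"] * x + p["ug"] * h + p["bg"])  # candidate cell state
        c = f * c + i * g
        h = o * math.tanh(c)
        hidden.append(h)
    return hidden

def bilstm_output(xs, fwd, bwd, w_fwd, w_bwd, b_y):
    """Combine forward and backward hidden states at each time step."""
    h_fwd = lstm_scan(xs, fwd)
    h_bwd = lstm_scan(xs[::-1], bwd)[::-1]  # reverse again so index t lines up
    return [w_fwd * hf + w_bwd * hb + b_y for hf, hb in zip(h_fwd, h_bwd)]

fwd, bwd = make_params(0.5), make_params(0.3)
xs = [0.1, 0.2, 0.3, 0.4]
xs_changed = [0.1, 0.2, 0.3, 0.9]  # only the last input differs

y = bilstm_output(xs, fwd, bwd, w_fwd=1.0, w_bwd=1.0, b_y=0.0)
y_changed = bilstm_output(xs_changed, fwd, bwd, w_fwd=1.0, w_bwd=1.0, b_y=0.0)
y_fwd_only = bilstm_output(xs, fwd, bwd, w_fwd=1.0, w_bwd=0.0, b_y=0.0)
y_fwd_only_changed = bilstm_output(xs_changed, fwd, bwd, w_fwd=1.0, w_bwd=0.0, b_y=0.0)
```

Changing only the last input alters the output at the first time step when the backward layer is active, but not when its output weight is zero. This is the sense in which the bidirectional model uses information from both ends of the sequence.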
The model’s prediction accuracy can be evaluated using parameters such as Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Bias Error (MBE), and the coefficient of determination (\(R^2\)). RMSE indicates the degree of error dispersion, and MAE represents the mean of the absolute deviations between the actual and predicted values. MBE indicates the average bias between the actual and predicted values, and \(R^2\) represents the proportion of the variance explained by the model. Lower RMSE and MAE values, an MBE closer to zero, and a higher \(R^2\) indicate higher prediction accuracy. The specific calculation formulas are shown in Eqs. (10)–(13):
where \(Y_t\) is the actual value of the dataset at time t, \(\hat{Y_t}\) is the predicted value at time t, N is the length of the data sequence.
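The four metrics can be computed as in the sketch below. The function name and sample values are illustrative, and the sign convention for MBE (predicted minus actual, so a positive value means over-prediction) is an assumption, since conventions differ between authors.

```python
import math

def forecast_metrics(actual, predicted):
    """Return RMSE, MAE, MBE and R^2 for two equal-length sequences."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]  # assumed sign: predicted - actual
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MBE": sum(errors) / n,
        "R2": 1.0 - ss_res / ss_tot,
    }

# Hypothetical water levels (m): one over-prediction and one under-prediction.
actual = [3.0, 3.1, 3.2, 3.3]
predicted = [3.1, 3.1, 3.1, 3.3]
metrics = forecast_metrics(actual, predicted)
```

In this example the positive and negative errors cancel, so MBE is essentially zero even though MAE is 0.05. MBE therefore measures systematic bias and is best read together with RMSE or MAE.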
Testing and results
This paper verifies the effectiveness of the proposed intelligent alarm model and system using the actual operational data from the safety monitoring of a river embankment seepage project at a pumping station along the Yangtze River in the South-to-North Water Diversion Project.
Data preparation
To monitor whether seepage occurs in the embankment of the pumping station along the Yangtze River, the engineering safety monitoring system has installed collectors at three sections of the embankment. Each collector gathers water level data from piezometric tubes at three positions, from the top to the bottom of the embankment, resulting in a total of nine measurement points, as shown in Fig. 4. The section names and numbers are 0-830, 1-130, and 1-430, with the measurement points numbered from top to bottom as Left 1, Left 2, and Left 3. The type of measurement is uplift pressure (YYL), so the field names for storing data from the nine measurement points are YYL_0830_Z1, YYL_0830_Z2, YYL_0830_Z3, YYL_1130_Z1, YYL_1130_Z2, YYL_1130_Z3, YYL_1430_Z1, YYL_1430_Z2, and YYL_1430_Z3. The table of measurement points for the embankment is shown in Table 1.
Data cleaning
The data used for testing in this paper consist of 1,188 samples collected from 10 August 2023 at 14:50:20 to 14 August 2023 at 17:40:00, with a sampling interval of 5 minutes. Prior to data processing, the data must be cleaned to remove outliers. Taking the analysis results of point YYL_0830_Z1 as an example, box plot analysis of the sequence yields a mean of 3.07, a standard deviation of 0.049, a maximum value of 3.27, a minimum value of 2.97, a first quartile Q1 of 3.03, a second quartile Q2 of 3.06, a third quartile Q3 of 3.11, an upper bound estimate UB of 3.23, and a lower bound estimate LB of 2.91. The YYL_0830_Z1 sequence contains three outliers at the 20th, 185th, and 757th samples, corresponding to values at 2023-08-10 16:34:16, 2023-08-11 06:19:16, and 2023-08-13 05:59:14, respectively. To ensure data quality, identified outlier values are replaced with the average of nearby values. For instance, if the 20th sample is an outlier, the average of the 19th and 21st samples is used as the new value for the 20th sample. This approach prevents outliers from negatively impacting subsequent model training and performance, thereby enhancing overall data quality. In addition, the data compression rate calculated using formula 3 was 0.25%, which is very low. This indicates that the data cleaning process did not distort the data and that the data quality is high.
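As a quick arithmetic check, the reported bounds for YYL_0830_Z1 follow from the reported quartiles if the conventional box plot factor of 1.5 is assumed:

\[ IQR = Q_3 - Q_1 = 3.11 - 3.03 = 0.08 \]
\[ UB = Q_3 + 1.5 \times IQR = 3.11 + 0.12 = 3.23, \qquad LB = Q_1 - 1.5 \times IQR = 3.03 - 0.12 = 2.91 \]

Since the maximum value (3.27) exceeds UB while the minimum value (2.97) stays above LB, all three outliers in this sequence lie on the high side. The reported compression rate is also consistent with three corrected samples out of 1,188, because \(3 / 1188 \approx 0.0025\), although this reading of the figure is an inference rather than something stated in the text.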
Correlation analysis
After data preprocessing and data cleaning, correlation analyses between the piezometric tube water level and the Yangtze River water level at the 9 measurement points are required to further screen out the data intervals where abnormal operational conditions may exist. When no seepage occurs at the riverbank, the correlation between the piezometric tube water level and the Yangtze River water level trend is low; a high correlation indicates that an abnormal working condition may have occurred at the location of the measurement point. To exclude the influence of the lag of the piezometric tube water level change on the correlation analysis, the time delay of the piezometric tube water level change is first inferred based on mutual information (MI) theory. The piezometric tube water level series is then aligned with the Yangtze River water level series using the obtained time delay, and finally the correlation analysis is carried out.
Taking the data of riverbank measurement point YYL_1430_Z1 and the water level of the Yangtze River as an example, the time delay curve obtained from MI theory and the curves after alignment are shown in Fig. 5. For YYL_1430_Z1, the MI algorithm yields a time delay of 24 sampling points. With a sampling interval of 5 minutes, the piezometric tube water level changes therefore lag the Yangtze River water level changes by 120 minutes.
The lag times of the piezometric tube water level changes at the 9 measurement points relative to the Yangtze River water level changes, together with the correlations after correction, are shown in Table 2. The results show that most measurement points exhibit very low correlations between piezometric tube water level changes and Yangtze River water level changes (with Pearson, Kendall, and Spearman correlation coefficients all less than 0.3). However, the piezometric tube water levels at measurement points YYL_1430_Z1, YYL_1430_Z2, and YYL_1430_Z3 show higher correlations with the Yangtze River water level changes. This suggests that these three measurement points may be experiencing abnormal working conditions, and their trends should be analysed further to determine whether seepage has occurred and to issue alerts if necessary. Additionally, the average of the three correlation coefficients can be used as a criterion for identifying potential abnormal working conditions that require special attention. In this study, a measurement point is flagged when this average exceeds 0.4.
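The screening criterion can be illustrated with a short, dependency-free sketch that computes the three coefficients on the lag-aligned series and flags a point when their average exceeds 0.4. The function names are hypothetical, and the choice of Kendall's tau-b (which handles ties) is an assumption, since the paper does not state which Kendall variant was used.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall(x, y):
    """Kendall tau-b via pairwise comparison (O(n^2), fine for ~1e3 samples)."""
    conc = disc = only_x_tied = only_y_tied = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                conc += 1
            else:
                disc += 1
    denom = math.sqrt((conc + disc + only_x_tied) * (conc + disc + only_y_tied))
    return (conc - disc) / denom


def flag_point(river, tube, threshold=0.4):
    """Flag a point when the mean of the three coefficients exceeds the threshold."""
    coeffs = (pearson(river, tube), spearman(river, tube), kendall(river, tube))
    mean_coeff = sum(coeffs) / 3
    return mean_coeff > threshold, mean_coeff
```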
Trend forecasting
After correlation analysis has identified the measurement points and data intervals that may indicate abnormal conditions, the trend of the piezometric tube water level at those points must be predicted in order to determine the threshold ranges for the various levels of intelligent alarm. In this paper, the BILSTM model is trained on field operation data and used for trend analysis. The dataset is divided in chronological order, with the first 70 % used as the training set and the remaining 30 % as the test set, to ensure that the model is evaluated on unseen data. Pre-processing operations such as standardisation and normalisation are performed on the training set to ensure the sensitivity and stability of the model to the data. ReduceLROnPlateau is used to reduce the learning rate when model performance no longer improves. With the maximum number of training epochs set to 100, Fig. 6 shows the BILSTM model’s prediction results for the water level change trend in the piezometric tube at measurement point YYL_1430_Z1.
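The data preparation described here might look like the following sketch. It covers only the chronological split, scaling, and windowing, not the BILSTM itself. The min-max scaler, the window length of 12 samples, and the function names are assumptions; fitting the scaler on the training portion only and then applying it to both portions is one common way to avoid leaking test-set statistics.

```python
def chronological_split(series, train_fraction=0.7):
    """Split a time-ordered series without shuffling: first part trains, rest tests."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def fit_minmax(train):
    """Return a scaling function whose range is learned from the training set only."""
    lo, hi = min(train), max(train)
    span = (hi - lo) or 1.0  # guard against a constant training series
    return lambda values: [(v - lo) / span for v in values]


def make_windows(values, window=12):
    """Build (input window, next value) pairs for one-step-ahead prediction."""
    count = len(values) - window
    inputs = [values[i:i + window] for i in range(count)]
    targets = [values[i + window] for i in range(count)]
    return inputs, targets
```

For a 1,188-sample series, `chronological_split` yields 831 training samples and 357 test samples; the windows built from the scaled training samples would then be fed to the sequence model.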
The model’s prediction accuracy is evaluated using the root mean square error (RMSE), mean absolute error (MAE), mean bias error (MBE), and coefficient of determination (\(R^2\)), as shown in Table 3. For the three measurement points, the values of RMSE, MAE, and MBE are approximately 0 and \(R^2\) is approximately 1, which shows that the prediction accuracy of the model is good. In addition, the prediction accuracy of the model can be used as a basis for determining the intelligent alarm thresholds.
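For reference, the four metrics can be computed as in the sketch below, using their standard definitions. The function name is hypothetical, and the sign convention for MBE (predicted minus actual) is an assumption, as the paper does not state it.

```python
import math


def regression_metrics(actual, predicted):
    """Return RMSE, MAE, MBE and R^2 for paired actual/predicted values."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mbe": sum(errors) / n,                  # predicted minus actual (assumed)
        "r2": 1 - ss_res / ss_tot,
    }
```

RMSE and MAE near 0 indicate small errors overall, an MBE near 0 indicates no systematic over- or under-prediction, and \(R^2\) near 1 indicates that the predictions explain almost all of the variance in the measured water level.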
Determination of alarm threshold
Using the trends in piezometric tube water level predictions, thresholds for various levels of alerts in the intelligent seepage alarm system for river dikes can be established. This paper establishes multiple alert levels based on different piezometric tube water level changes, including yellow alert, orange alert, and red alert levels. A yellow alert may correspond to water levels exceeding the normal range but still within the safe range, while an orange alert indicates water levels exceeding the safe range but not reaching the dangerous level. A red alert corresponds to water levels that have reached or exceeded the dangerous level. Based on prediction results and historical data, the water level ranges corresponding to each alert level and appropriate response measures are determined. The alarm water level thresholds are determined by the historical average water level \(\mu\) within the interval and the standard deviation \(\sigma\) of the predicted water level changes:
Yellow Alert: Triggered when the water level deviates from the historical average water level by more than one standard deviation, i.e. falls outside \(\mu \pm \sigma\).
Orange Alert: Triggered when the water level deviates from the historical average water level by more than two standard deviations, i.e. falls outside \(\mu \pm 2\sigma\).
Red Alert: Triggered when the water level deviates from the historical average water level by more than three standard deviations, i.e. falls outside \(\mu \pm 3\sigma\).
Taking point YYL_0830_Z1 as an example, based on statistical analysis with an average of 3.06 meters and a standard deviation of 0.065 meters, the intelligent alarm water level ranges are as follows:
Yellow Alert Range: Lower limit of 2.995 meters, upper limit of 3.125 meters.
Orange Alert Range: Lower limit of 2.93 meters, upper limit of 3.19 meters.
Red Alert Range: Lower limit of 2.865 meters, upper limit of 3.255 meters.
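The ranges above follow directly from \(\mu = 3.06\) and \(\sigma = 0.065\), and can be reproduced with a short sketch. The function names and the "normal" label are hypothetical, and treating a reading as alarming only when it lies strictly outside a band is an assumption about the boundary behaviour.

```python
def alarm_bands(mu, sigma):
    """Lower and upper limits of the yellow, orange and red bands (mu +/- k*sigma)."""
    return {
        level: (mu - k * sigma, mu + k * sigma)
        for level, k in (("yellow", 1), ("orange", 2), ("red", 3))
    }


def classify(level_m, mu, sigma):
    """Return the most severe alert whose band the water level falls outside."""
    deviation = abs(level_m - mu)
    if deviation > 3 * sigma:
        return "red"
    if deviation > 2 * sigma:
        return "orange"
    if deviation > sigma:
        return "yellow"
    return "normal"
```

For example, `alarm_bands(3.06, 0.065)` gives approximately (2.995, 3.125), (2.93, 3.19), and (2.865, 3.255) metres for the yellow, orange, and red bands, matching the limits listed above.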
For each alert level, corresponding response measures and emergency plans should be developed. For instance, during a yellow alert, monitoring frequency can be increased and emergency supplies prepared. During an orange alert, patrols and warning broadcasts can be initiated, and evacuation plans prepared. During a red alert, immediate emergency response should be initiated, including personnel evacuation and rescue operations. This effectively improves the ability to identify and respond to abnormal changes in the water level of the river embankment piezometric tube, thereby guaranteeing the safety and stability of the river embankment to the greatest extent possible.
Implementation of the intelligent alarm system
To achieve real-time monitoring and coordinated command of the operation status of river dike projects, and to ensure that alarm information is received and processed in time, an intelligent alarm system was designed and developed on supOS, an industrial Internet operating system. The system includes interfaces for real-time operational information, equipment integrity, project safety status, correlation analysis, and alerts for the operational parameters of units. Through these features, operations and maintenance personnel can quickly grasp the overall condition of hydraulic structures and identify alarm information that requires immediate attention or prioritized handling.
The correlation coefficient statistical analysis interface is shown in Fig. 7. The correlation coefficient of the Yangtze River water level and the water level of the riverbank piezometric tube is displayed by selecting the type of correlation coefficient. Fig. 8 shows the MI-based correlation analysis interface for the YYL_1430_Z1 measurement point, which can obtain the correlation coefficients of individual measurement points after processing and give a conclusion on whether the measurement point is anomalous or not.
The operational status interface of the pumping station unit is shown in Fig. 9, where the monitoring status of the 2# unit is displayed. A green background under the label “2#” indicates that the unit is currently operating normally without alarms. If there are abnormalities in the unit, they will be indicated in colors ranging from yellow to orange to red, depending on the severity of the alarm.
Conclusion and discussion
To make safety monitoring and seepage alarms for river embankment projects more intelligent, a river embankment seepage intelligent alarm system based on the bidirectional long short-term memory (BILSTM) network model was designed and implemented. Safety monitoring data from a pump station along the Yangtze River in the South-to-North Water Diversion Project were used as an example for verification and analysis. The work consists of the following three main parts:
In the data cleaning phase, anomalies in the dataset were detected through box plot analysis and then corrected, demonstrating the effectiveness of data cleaning. By improving data accuracy, cleaning reduced the false alarm rate of the intelligent alarm system.
In the correlation analysis based on Mutual Information (MI) theory, correlation coefficients and the MI algorithm are used to analyse the relationship between the piezometric tube water level and the Yangtze River water level and to identify measurement points that may be abnormal, which addresses the difficulty of detecting riverbank seepage in time. The MI-based analysis reduces the impact of time delay on the correlation results, helps to judge abnormal measurement points accurately and to determine the alarm threshold range, and lays the foundation for future riverbank safety warnings.
Furthermore, in addition to the mutual information algorithm, this paper also investigated the Dynamic Time Warping (DTW) algorithm34 as a correlation analysis method. The MI algorithm is suitable for two sequences of equal length, whereas the DTW algorithm is suitable for sequences that are warped or delayed along the time axis. In practical engineering, if the piezometric tube water level and the Yangtze River water level are recorded at different time intervals, DTW is the appropriate choice for correlation analysis. However, the data selected for this study comprise 1,188 samples collected from 14:50:20 on 10 August 2023 to 17:40:00 on 14 August 2023, all at a sampling interval of 5 minutes. Because both series share the same sampling interval and length, DTW is not needed in this study, whereas the mutual information algorithm is well suited to the tests in this paper.
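To make the contrast concrete, the following is a minimal sketch of the classic DTW distance using dynamic programming with an absolute-difference local cost. It is a textbook formulation, not necessarily the variant evaluated by the authors, and the function name is hypothetical. Unlike a sample-by-sample comparison, it accepts sequences of different lengths.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    # prev[j] holds the best cumulative cost of aligning the processed
    # prefix of `a` with the first j elements of `b`.
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf]
        for j, y in enumerate(b, start=1):
            cost = abs(x - y)
            # Extend the cheapest of: insertion, match, deletion.
            curr.append(cost + min(prev[j], prev[j - 1], curr[j - 1]))
        prev = curr
    return prev[-1]
```

For instance, `dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3])` is 0 because the second sequence is only a time-stretched copy of the first, which is the kind of mismatch in sampling that DTW tolerates.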
The proposed BILSTM model is capable of accurately predicting the water level of riverbank piezometric tubes in engineering safety monitoring on the basis of data cleaning and correlation analysis. It can determine multi-level alarm thresholds based on historical data and trend prediction results, thereby establishing an intelligent alarm system with strong practical value.
In addition, the literature shows that the Gated Recurrent Unit (GRU) is often used for trend prediction. For example, Cui et al.35 designed a GRU network prediction model to study the state change trend of the pump source system of an aircraft. Therefore, to identify the better trend prediction algorithm, this study also predicted the piezometric tube water level trend with a GRU model and compared the two models on the field data.
The Gated Recurrent Unit (GRU) is a gated recurrent architecture that mitigates the vanishing gradient problem of traditional recurrent neural networks (RNNs) by introducing an update gate and a reset gate. The update gate controls how much previous information is retained or forgotten, while the reset gate allows the model to selectively ignore previous information based on the current input. This mechanism allows the GRU to better capture long-term dependencies in time series.
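For reference, one common formulation of the GRU update at time step \(t\) is given below. The paper does not state which variant or library implementation was used, so the notation is assumed here: \(x_t\) is the input, \(h_{t-1}\) the previous hidden state, \(z_t\) the update gate, \(r_t\) the reset gate, \(W\), \(U\), and \(b\) are learned weights and biases, \(\odot\) is element-wise multiplication, and \(\sigma_g\) is the logistic sigmoid (not the standard deviation \(\sigma\) used for the alarm thresholds).

\[
\begin{aligned}
z_t &= \sigma_g\left(W_z x_t + U_z h_{t-1} + b_z\right) \\
r_t &= \sigma_g\left(W_r x_t + U_r h_{t-1} + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
\]

When \(z_t\) is close to 0 the previous state is carried forward almost unchanged, which is what lets gradients propagate over long spans; some references swap the roles of \(z_t\) and \(1 - z_t\) in the last line.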
Consistent with the trend prediction experiments based on the BILSTM model, the maximum number of training epochs was set to 100, and the dataset was divided in chronological order, with the first 70 % used as the training set and the remaining 30 % as the test set. The GRU model’s prediction of the water level change trend in the piezometric tube at measurement point YYL_1430_Z1 is shown in Fig. 10. RMSE, MAE, MBE, and \(R^2\) are again used as the evaluation metrics, and the comparison between the GRU model and the BILSTM model is shown in Table 4.
Table 4 shows that, for the three abnormal measurement points YYL_1430_Z1, YYL_1430_Z2, and YYL_1430_Z3, the RMSE, MAE, and MBE of the BILSTM model are all closer to 0 and its \(R^2\) values are all closer to 1. Combined with each model’s test set curves, this shows that on this dataset the BILSTM model is more accurate than the GRU model. Therefore, the BILSTM model is preferred for predicting the water level trend in the piezometric tube.
In conclusion, although the system is capable of correlation analysis and trend prediction, some problems were encountered during field tests and several functions need further improvement. For example, where no seepage occurs in the river embankment, the correlation between changes in the piezometric tube water level and changes in the Yangtze River water level should be very low. During operation, however, the water level at one measuring point on the Yangtze River side was found to change significantly with the Yangtze River water level. Investigation showed that the sensor measuring the uplift pressure was not at fault; rather, the piezometric tube piping had been in service for so long that it had deteriorated and leaked, which caused false alarms. It is therefore necessary to calibrate the sensors at the monitoring points and to maintain the piping regularly, so that the measurement data accurately reflect the seepage situation in subsequent studies and field tests. Moreover, the measurement data are susceptible to disturbances and fluctuations during sensor measurement, on-site PLC acquisition, and transmission to the server in the data centre. Subsequent research should therefore introduce filtering or anti-interference methods to reduce the false alarms and missed alarms caused by unstable data at the source. Additionally, the data cleaning process can be standardised and aggregated: the raw data sets, the cleaning algorithms applied to them, and a record of the operations performed can be collected to form a complete big data research and decision-making platform.
Finally, it is possible to combine a wider range of monitoring data to gain a more comprehensive understanding of the operating status of more complex equipment, and to explore the possibility of multimodal data fusion to further enhance the performance of the monitoring system, thereby expanding its application scenarios.
Data availability
The data supporting the findings of this study are available within the paper and its supplementary information files.
References
Li, S. et al. Research on the overall framework of intelligent water conservancy construction. In 2021 IEEE 11th International Conference on Electronics Information and Emergency Communication (ICEIEC). 150–155. https://doi.org/10.1109/ICEIEC51955.2021.9463829 (2021).
Li, D. et al. Spatial and temporal characteristics analysis for land subsidence in Shanghai coastal reclamation area using PS-INSAR method. Front. Mar. Sci. 9, 1000523 (2022).
Xiao, R. et al. Deformation monitoring of reservoir dams using GNSS: An application to south-to-north water diversion project, China. IEEE Access 7, 54981–54992. https://doi.org/10.1109/ACCESS.2019.2912143 (2019).
Qian, J. & Diao, J. Numerical analysis of river regime evolution in the lower reaches of Yangtze River under the basin flood. In 2021 7th International Conference on Hydraulic and Civil Engineering & Smart Water Conservancy and Intelligent Disaster Reduction Forum (ICHCE & SWIDR). 1591–1594. https://doi.org/10.1109/ICHCESWIDR54323.2021.9656313 (2021).
Markovic, M., Radivojevic, N., Andrejevic Stosovic, M., Markovic Brankovic, J. & Zivkovic, S. High embankment dam stability analysis using artificial neural networks. Tehnički vjesnik 29, 1733–1740 (2022).
Kresta, J. V., Macgregor, J. F. & Marlin, T. E. Multivariate statistical monitoring of process operating performance. Can. J. Chem. Eng. 69, 35–47 (1991).
Zhong, K., Han, M. & Han, B. Data-driven based fault prognosis for industrial systems: A concise overview. IEEE/CAA J. Autom. Sin. 7, 330–345 (2019).
Zhao, Y., Li, T., Zhang, X. & Zhang, C. Artificial intelligence-based fault detection and diagnosis methods for building energy systems: Advantages, challenges and the future. Renew. Sustain. Energy Rev. 109, 85–101 (2019).
Demidova, G., Chen, H., Li, Y., Lukichev, D. & Anuchin, A. Reviewing fault diagnosis methods in electric drives: Power subsystem and electrical machine. In 2023 IEEE 24th International Conference of Young Professionals in Electron Devices and Materials (EDM). 1680–1688 (IEEE, 2023).
Yu, K., Lin, T. R., Ma, H., Li, X. & Li, X. A multi-stage semi-supervised learning approach for intelligent fault diagnosis of rolling bearing using data augmentation and metric learning. Mech. Syst. Signal Process. 146, 107043 (2021).
Cabrera, D. et al. Adversarial fault detector guided by one-class learning for a multistage centrifugal pump. IEEE/ASME Trans. Mech. 28, 1395–1403 (2022).
Nast, B., Reiz, A., Ivanovic, N. & Sandkuhl, K. A modeling approach supporting digital twin engineering: Optimizing the energy consumption of air conditioning facilities. In 2023 ACM/IEEE International Conference on Model Driven Engineering Languages and Systems Companion (MODELS-C). 479–483 (IEEE, 2023).
Wang, L., Huang, J. & Zhang, C. Research on modeling method and application of physical circuit of secondary system for the smart substation. In IEEE 6th Advanced Information Technology Electronic and Automation Control Conference (IAEAC). 591–596. https://doi.org/10.1109/IAEAC54830.2022.9929948 (2022).
Badihi, H., Zhang, Y., Jiang, B., Pillay, P. & Rakheja, S. A comprehensive review on signal-based and model-based condition monitoring of wind turbines: Fault diagnosis and lifetime prognosis. Proc. IEEE 110, 754–806 (2022).
Gao, Z., Cecati, C. & Ding, S. X. A survey of fault diagnosis and fault-tolerant techniques–part I: Fault diagnosis with model-based and signal-based approaches. IEEE Trans. Indus. Electron. 62, 3757–3767 (2015).
Geng, Z., Han, M. & Zhou, G. Switching signals based condition monitoring for submodule capacitors in modular multilevel converters. IEEE Trans. Circuits Syst. II Exp. Briefs 68, 2017–2021. https://doi.org/10.1109/TCSII.2020.3040428 (2021).
Xu, H., Cai, J., Xu, L. & Xu, F. Sound signal-based transformer operation status monitoring system. In 2023 4th International Conference on Computer Engineering and Application (ICCEA). 375–380 (IEEE, 2023).
Jiang, W., Li, Z., Zhang, S., Wang, T. & Zhang, S. Hydraulic pump fault diagnosis method based on EWT decomposition denoising and deep learning on cloud platform. Shock Vib. 2021, 6674351 (2021).
Alsaihati, A., Elkatatny, S., Mahmoud, A. A. & Abdulraheem, A. Use of machine learning and data analytics to detect downhole abnormalities while drilling horizontal wells, with real case study. J. Energy Resour. Technol. 143, 043201 (2021).
Duan, J. et al. Numerical prediction of wax deposition in oil-gas stratified pipe flow. Int. J. Heat Mass Transf. 105, 279–289 (2017).
Raia, M. R., Ailincai, A., Baicoianu, A., Husar, C. & Irimia, C. Condition monitoring of industrial elevators based on machine learning models. In 2023 IEEE 28th International Conference on Emerging Technologies and Factory Automation (ETFA). 1–5 (IEEE, 2023).
Wang, T., Lu, G. & Yan, P. A novel statistical time-frequency analysis for rotating machine condition monitoring. IEEE Trans. Indus. Electron. 67, 531–541 (2019).
Zou, F. Research on data cleaning in big data environment. In 2022 International Conference on Cloud Computing, Big Data and Internet of Things (3CBIT). 145–148. https://doi.org/10.1109/3CBIT57391.2022.00037 (2022).
Li, Y. & Li, D. Photovoltaic abnormal data cleaning based on fuzzy clustering-quartile algorithm. In 2023 IEEE 6th International Conference on Industrial Cyber-Physical Systems (ICPS). 1–5 (IEEE, 2023).
Wang, Z., Zhang, Q. & Du, T. Industrial data outlier detection based on density peaks clustering (DPC). In 2023 2nd International Conference on Machine Learning, Control, and Robotics (MLCR). 104–108. https://doi.org/10.1109/MLCR61158.2023.00028 (2023).
Lin, T.-H. & Jiang, J.-R. Anomaly detection with autoencoder and random forest. In 2020 International Computer Symposium (ICS). 96–99. https://doi.org/10.1109/ICS51289.2020.00028 (2020).
Zhu, X. et al. Spatio-temporal distribution and source identification of heavy metals in particle size fractions of road dust from a typical industrial district. Sci. Total Environ. 780, 146357 (2021).
Achille, A. et al. Task2vec: Task embedding for meta-learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 6430–6439 (2019).
Lu, S., Dong, H. & Yu, H. Abnormal condition detection method of industrial processes based on the cascaded bagging-PCA and CNN classification network. IEEE Trans. Indus. Inform. 19, 10956–10966. https://doi.org/10.1109/TII.2023.3242811 (2023).
Zhang, H. et al. Research of general threshold model for pumped storage power station equipment based on normal distribution. In 2023 Panda Forum on Power and Energy (PandaFPE). 1105–1109 (IEEE, 2023).
Wu, H. et al. Research on intelligent water valve fault classification method based on machine learning. In 2023 4th International Conference on Big Data & Artificial Intelligence & Software Engineering (ICBASE). 250–255. https://doi.org/10.1109/ICBASE59196.2023.10303227 (2023).
Su, H. On the design and development of an intelligent safety warning system for slow-moving vehicles. In 2022 International Conference on Computational Science and Computational Intelligence (CSCI). 1535–1538. https://doi.org/10.1109/CSCI58124.2022.00272 (2022).
Du, S., Shi, W., Li, S. & Zhao, G. Research on data cleaning technology based on RD-CFD method. In 7th International Symposium on Advances in Electrical, Electronics, and Computer Engineering. Vol. 12294. 971–980 (SPIE, 2022).
Liu, Y. et al. Research on correlation analysis method of time series features based on dynamic time warping algorithm. IEEE Geosci. Remote Sens. Lett. 20, 1–5 (2023).
Cui, J. et al. State change trend prediction of aircraft pump source system based on GRU network. In 2020 39th Chinese Control Conference (CCC). 2457–2460 (IEEE, 2020).
Acknowledgements
This research was funded by the National Natural Science Foundation of China under grant 62103358 and supported by the Intelligent Measurement and Control Laboratory of Yangzhou University. It was also supported by the Open Project Program of the Engineering Research Center of High-efficiency and Energy-saving Large Axial Flow Pumping Station, Jiangsu Province, Yangzhou University, Yangzhou, Jiangsu 225009, China (grant number ECHEAP017).
Author information
Contributions
Conceptualization, Zhiyu Shao and Hongru Tang; methodology, Zhiyu Shao; software, Xin Mei and Jingwei Li; validation, Zhiyu Shao and Xin Mei; formal analysis, Zhiyu Shao and Jingwei Li; investigation, Tianyuan Liu and Hongru Tang; resources, Zhiyu Shao and Xin Mei; data curation, Jingwei Li; writing-original draft preparation, Zhiyu Shao and Meiling Xue; writing-review and editing, Hongru Tang and Jingwei Li; visualization, Meiling Xue; supervision, Zhiyu Shao; project administration, Zhiyu Shao; funding acquisition, Zhiyu Shao. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Shao, Z., Mei, X., Xue, M. et al. Intelligent alarm system for river embankment seepage based on BILSTM. Sci Rep 14, 23822 (2024). https://doi.org/10.1038/s41598-024-75125-6