Introduction

As a prevalent chronic neurological disorder, epilepsy impacts over 50 million people globally1,2. Defined as a group of neurological brain disorders caused by excessive and abnormal brain activity, epilepsy is the second most common neurological disorder after stroke3, according to the International League Against Epilepsy (ILAE)4,5. Individuals with epilepsy experience sudden neuronal discharges, resulting in loss of consciousness, awareness, emotional or cognitive dysfunction, and an increased risk of premature death6,7. Traditional management methods, including medication and surgery, often face certain limitations, highlighting the importance of accurate diagnosis and seizure prediction.

Currently, non-invasive electroencephalogram (EEG) recordings of brain voltage fluctuations are considered one of the most effective tools for diagnosing epilepsy, as significant changes in brain electrical activity occur during and around seizures8,9. This technique entails the placement of multiple electrodes across different brain locations to capture ongoing brain activity10,11. The pathogenesis of epilepsy is rooted in abnormal neuronal discharges in the cerebral cortex, which can propagate to other neurons and muscles through nerve fibers, resulting in clinical symptoms such as muscle twitching12,13. Additionally, EEGs have high temporal resolution, typically ranging from hundreds to thousands of Hz, enabling the capture of brain voltage fluctuations and ensuring timely detection of seizures14,15. However, the conventional manual examination and analysis of EEG signals by neurologists are deemed inefficient, subjective, and prone to errors. Consequently, there is a pressing need for the development of automated techniques to recognize seizures from EEG signals.

In recent years, many researchers have explored EEG-based methods for seizure detection and prediction. Although the two tasks have slightly different applications, they are closely related. Seizure detection focuses on distinguishing between ictal and inter-ictal EEG signals, whereas prediction aims to differentiate between pre-ictal and inter-ictal phases16,17. EEG signals are highly complex, characterized by low signal-to-noise ratio, high dimensionality, non-stationarity, nonlinearity, variability, and artifacts, making seizure detection and prediction tasks challenging18,19. Consequently, various techniques for analyzing non-stationary signals have been proposed and applied to seizure detection, such as Short-Time Fourier Transform (STFT), Wavelet Transform (WT), and Welch’s Power Spectral Density Estimation (Welch’s Method). These techniques represent and extract time–frequency domain information from EEG signals, each offering unique advantages and applications. For instance, STFT has been applied to generate EEG spectrograms and used in conjunction with a dual attention residual network model for seizure prediction20. WT has been utilized to extract effective features such as relative energy, relative amplitude, and fluctuation coefficients for seizure detection21. Welch’s Method has been employed to calculate the power spectral density of EEG signals22,23. Among these time–frequency analysis methods, the Discrete Wavelet Transform (DWT) is particularly notable for its superior performance in multi-resolution analysis. DWT allows signals to be decomposed into different frequency components with varying resolutions. Such multi-resolution analysis is particularly beneficial in EEG signal processing tasks as it provides detailed views of the signal at different scales, thereby extracting features useful for seizure detection and prediction24. Accordingly, many studies use DWT to decompose EEG signals25,26,27,28.

Extracting features from the decomposed EEG signals is a key step in many studies, as it captures the information needed to identify epilepsy accurately29,30. Initially, researchers focused on extracting features from the time, frequency, and time–frequency domains. These features include standard deviation, root mean square value, power spectral density, and signal energy derived from the wavelet transform. For example, Liu et al.23 used Welch’s Method to compute the power spectral density of periodic and aperiodic components in EEG signals. Deivasigamani et al.31 employed the Dual-Tree Complex Wavelet Transform to decompose EEG signals and extracted the mean and standard deviation from the generated coefficients. Further study of EEG features showed that nonlinear features have a significant advantage in distinguishing epileptic seizures from the normal state32. These nonlinear features include phase space reconstruction, fractal dimension, and entropy. In Luckett et al.’s work33, the basic dynamics of EEG signals were extracted using time delay embedding and phase space reconstruction. These features were used to train a Convolutional Neural Network (CNN) with regression outputs. Naser et al.34 employed STFT to convert EEG segments into images, subsequently using the box-counting method to estimate the fractal dimension of the EEG data. Among the numerous nonlinear features, entropy in particular has received widespread attention as a crucial tool for measuring signal complexity and uncertainty35,36. Aung et al.37 proposed an enhanced distribution entropy method that combines fuzzy entropy with distribution entropy to achieve improved performance. Raghu et al.38 performed a five-level wavelet packet decomposition of the raw EEG and then utilized logarithmic entropy as a feature for epileptic EEG.

However, a single EEG feature cannot comprehensively represent the EEG signals, leading to insufficient accuracy and reliability in the detection of epileptic seizures. Some studies have therefore begun to integrate multiple features to further enhance detection performance. For example, Liu et al.39 integrated four features, namely fuzzy entropy, approximate entropy, sample entropy, and Higuchi fractal dimension, and used the Max-Relevance and Min-Redundancy (mRMR) method to select the optimal feature set for classification. Kavya et al.40 integrated and optimized time-domain features (including mean, skewness, variance, kurtosis, standard deviation, approximate entropy, and zero-crossing), frequency-domain features (including power spectrum, signal energy, and total signal area), average DWT coefficients, signal relationship features, and brain imaging features to further improve classification accuracy. However, simply combining various features can significantly increase runtime and introduce irrelevant features that may negatively impact the final classification results.

Additionally, traditional machine learning and deep learning methods have been applied to seizure detection and prediction tasks41,42. Tiwari et al.43 used the filter bank common spatial pattern algorithm to extract frequency features and employed extreme gradient boosting (XGBoost) for signal classification. Although this algorithm performs well with low-dimensional data, it has limitations when dealing with high-dimensional data. Zhong et al.44 proposed a new method for predicting epileptic seizures by constructing an optimal spatiotemporal feature set from multiple dimensions, including time-domain, nonlinear, and brain network features, and then using SVM for classification. Hossain et al.45 used STFT to extract time-domain and frequency-domain information and employed CNN for feature extraction and classification. Zhang et al.46 extracted the relative energy of EEG signals in several specific frequency bands and input this into a Bidirectional Gated Recurrent Unit (Bi-GRU) network, achieving an average sensitivity of 93.89% on the CHB-MIT dataset. CNN-based methods exhibit strong capabilities in extracting local features; however, they encounter challenges in capturing long-term temporal dependencies in time-series data47. The Gated Recurrent Unit (GRU) is a variant of the Recurrent Neural Network (RNN) designed to address the vanishing and exploding gradients inherent in traditional RNNs as network depth and the number of iterations grow. While GRU models can capture long-term temporal dependencies, they typically overlook local spatial information48.

To address the aforementioned issues, a method that combines feature fusion with the CNN-GRU-AM model is proposed for the detection and prediction of epileptic seizures. Initially, the raw EEG signals are preprocessed by applying noise reduction techniques. Subsequently, the signals are decomposed using DWT to generate six subbands. Next, time–frequency domain and nonlinear features are extracted from each subband and integrated. By combining time–frequency domain and nonlinear features, multidimensional information of the signals is comprehensively captured. This approach facilitates the understanding of both local variations and global structures in EEG signals, thereby revealing their complex dynamic characteristics. Finally, the fused features are fed into the CNN-GRU-AM model for further feature extraction and classification. The CNN component effectively extracts spatial features from the input data, while the GRU specializes in capturing the inherent temporal features. The addition of the attention mechanism allows the model to focus on critical features, enhancing its recognition capability. The proposed method was validated on the CHB-MIT dataset, and the promising results demonstrate its effectiveness. The main contributions and innovations of this paper can be summarized as follows.

Firstly, time–frequency domain features and nonlinear features were extracted from EEG signals and then integrated to represent the subtle changes and implicit information within the EEG signals more comprehensively.

Secondly, an attention mechanism was incorporated into the hybrid deep learning model. It selectively weights and emphasizes important information while ignoring or down-weighting less important information, which in turn addresses the decrease in classification accuracy caused by information redundancy.

Thirdly, the application of feature fusion combined with a hybrid deep learning model for the detection and prediction of epileptic seizures is explored in this paper, which provides researchers and clinicians with a novel method for classifying EEG signals.

The rest of this paper is organized as follows. The “Materials and methods” section describes the EEG signal dataset used to evaluate the proposed method, presents the proposed method for detecting and predicting epileptic seizures, and defines the evaluation metrics. The “Results and discussions” section presents the results of seizure detection and prediction for all cases in the dataset, a comparison of the experimental results of different models, and a comparison with other existing methods. The “Conclusion” section summarizes our work.

Materials and methods

Dataset

In this study, the CHB-MIT dataset is utilized for detecting and predicting seizures. Developed by Massachusetts Institute of Technology (MIT) and Children’s Hospital in Boston (CHB), this publicly accessible dataset comprises scalp EEG recordings from various patients and is extensively employed in epilepsy studies. It adopts the International 10–20 System’s bipolar montage method, capturing EEG signals from 22 electrodes at a 256 Hz sampling rate with 16-bit precision. Table 1 details the CHB-MIT dataset, which typically includes 23 EEG signal channels, with a few cases having 18 channels. Data on CHB01 and CHB21 were collected from the same patient, 1.5 years apart. There are approximately 9 to 42 consecutive EEG files for each case, with most files containing one hour of EEG recordings. The EEG files in this dataset include 198 episodes, and the beginning and end of each episode are labeled49.

Table 1 Summary of CHB-MIT epilepsy EEG dataset.

Experimental methods

The schematic representation of our proposed methodology for the detection and prediction of epileptic seizures is depicted in Fig. 1. After filtering, the EEG signal is decomposed through DWT to generate six subbands (D1–D5, A5). Features are extracted from these subbands, including standard deviation (STD), power spectral density (PSD), band energy, and fuzzy entropy (FuzzyEn). Finally, the CNN-GRU-AM model further extracts features and performs classification. Each component of the methodology is elucidated in detail in the subsequent sections.

Figure 1 Flowchart of seizure detection and prediction.

Pre-processing

The majority of studies focusing on epilepsy diagnosis operate under the assumption that epileptic EEG signals exhibit four distinct continuous states of brain activity. These states include the pre-ictal phase (preceding the seizure), the ictal phase (during the seizure occurrence), the post-ictal phase (following the seizure), and the inter-ictal phase (representing the non-seizure intervals).

In the seizure detection task, the primary goal is to identify the timing of seizure events, distinguishing between ictal and inter-ictal states. Conversely, in the seizure prediction task, the goal is to issue a warning before the onset of a seizure. Here, the focus lies in distinguishing between inter-ictal and pre-ictal states. The CHB-MIT dataset has a sampling frequency of 256 Hz. A random sample of 76,800 data points was taken during the ictal, pre-ictal, and inter-ictal phases, corresponding to a time length of 300 s. Notably, a significant proportion of EEG recordings in the CHB-MIT dataset exhibit contamination from 60 Hz power line noise. This interference could be efficiently mitigated by eliminating components within the 57–63 Hz and 117–123 Hz frequency ranges. Following this noise reduction step, the filtered EEG signals undergo DWT. The calculation formula for DWT is shown in Eq. (1).

$$DWT(j,k) = \int\limits_{ - \infty }^{\infty } {f(t)\frac{1}{{\sqrt {2^{j} } }}\psi^ * \left( {\frac{{t - k2^{j} }}{{2^{j} }}} \right)\;} dt,$$
(1)

where \(\psi^ *\) is the complex conjugate of the mother wavelet function \(\psi\) with fluctuating characteristics, \(2^{j}\) and \(k2^{j}\) denote the scale factor and translation factor, respectively, \(j\) represents the decomposition level, and \(k\) is an integer.
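
The power-line removal step described before Eq. (1) can be sketched as follows. The text does not state which filter was used, so this example substitutes a simple FFT-masking approach as an assumption: spectral bins inside 57–63 Hz and 117–123 Hz are set to zero before the inverse transform. The function name `remove_powerline` and the synthetic test signal are illustrative only and are not the authors' implementation.

```python
import numpy as np

FS = 256                       # CHB-MIT sampling rate in Hz (from the text)
N_SAMPLES = 76_800             # one segment: 76,800 / 256 = 300 s
STOP_BANDS = [(57.0, 63.0), (117.0, 123.0)]  # bands named in the text


def remove_powerline(signal, fs=FS, stop_bands=STOP_BANDS):
    """Zero the FFT bins inside each stop band (assumed filtering method)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    for low, high in stop_bands:
        spectrum[(freqs >= low) & (freqs <= high)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))


# Synthetic check: a 10 Hz rhythm contaminated by 60 Hz and 120 Hz interference.
t = np.arange(N_SAMPLES) / FS
clean = np.sin(2 * np.pi * 10 * t)
noisy = clean + 0.5 * np.sin(2 * np.pi * 60 * t) + 0.2 * np.sin(2 * np.pi * 120 * t)
filtered = remove_powerline(noisy)
```

Hard masking in the frequency domain is easy to read but can introduce ringing on real recordings; a conventional notch or band-stop filter would be a reasonable alternative.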

As a time–frequency analysis method, the DWT dissects the original signal into sub-signals of varying frequencies through a sequence of filters and down-sampling operations during a multi-level decomposition. These sub-signals encompass both approximation coefficients, indicative of the low-frequency part, and detail coefficients, capturing the high-frequency component. For the j-th level of decomposition, the sampling frequency is \({f}_{s}/{2}^{j}\), where \({f}_{s}\) is the original signal’s sampling frequency. The frequency band of the detail coefficients can be represented as \(\left[{f}_{s}/{2}^{j+1},{f}_{s}/{2}^{j}\right]\). In this study, we employed the Daubechies-4 wavelet function as the basis function for wavelet decomposition, configuring it as a five-level decomposition. For an EEG signal sampled at 256 Hz, the detail coefficients across the five scales respectively represent signal components within the 64–128 Hz, 32–64 Hz, 16–32 Hz, 8–16 Hz, and 4–8 Hz frequency bands50.
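
The band arithmetic above can be checked with a few lines of code. This is a minimal sketch that only evaluates the nominal band edges \(\left[{f}_{s}/{2}^{j+1},{f}_{s}/{2}^{j}\right]\); the helper name `dwt_subbands` is hypothetical, and the approximation band is taken as everything below the lowest detail band.

```python
def dwt_subbands(fs, levels):
    """Nominal frequency ranges (Hz) of detail bands D1..Dn and approximation An."""
    bands = {}
    for j in range(1, levels + 1):
        bands[f"D{j}"] = (fs / 2 ** (j + 1), fs / 2 ** j)
    # The approximation band covers what remains below the last detail band.
    bands[f"A{levels}"] = (0.0, fs / 2 ** (levels + 1))
    return bands


bands = dwt_subbands(fs=256, levels=5)
for name, (low, high) in bands.items():
    print(f"{name}: {low:g}-{high:g} Hz")
```

For the decomposition itself, a library such as PyWavelets could be used, for example `pywt.wavedec(x, "db4", level=5)`, which returns the coefficient arrays in the order A5, D5, ..., D1. Mapping the paper's "Daubechies-4" to the `"db4"` identifier is an assumption.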

Feature extraction

Feature extraction holds paramount importance in the analysis of epileptic EEG signals. Given the intricacy of the brain’s electrical activity, EEG signals frequently encompass multiple frequency components and time-domain features. Consequently, the analysis and processing of EEG signals necessitate the comprehensive utilization of various signal processing techniques and methods. This section elucidates two categories of features: time–frequency domain features and nonlinear features, both extracted within each subband of the EEG signal. Time–frequency domain analysis is capable of capturing the changes in signals in both time and frequency simultaneously, which is particularly important for processing non-stationary signals such as EEG signals. Epileptic seizures are often manifested as nonlinear and complex dynamic changes, and these changes can be better reflected through nonlinear characteristics.

The standard deviation (STD) effectively describes the amplitude variation of the EEG signal. It is a simple statistic that can be computed quickly from the EEG signal. The calculation of STD is shown in Eq. (2).

$$STD = \sqrt {\frac{{\sum\nolimits_{i = 1}^{N} {(x_{i} - \mu )^{2} } }}{N}} ,$$
(2)

where \(x_{i}\) represents the \(i^{\text{th}}\) EEG data sample in a signal segment, \(\mu\) represents the mean of the segment, and \(N\) represents the length of the segment.
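
As a small worked example of Eq. (2), the sketch below computes the STD of a short, made-up segment. Because Eq. (2) divides by \(N\) rather than \(N-1\), it matches NumPy's default `np.std` (population standard deviation). The function name `segment_std` and the sample values are illustrative assumptions.

```python
import numpy as np


def segment_std(x):
    """Standard deviation as in Eq. (2): divide by N, not N - 1."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    return np.sqrt(np.sum((x - mu) ** 2) / x.size)


segment = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up samples
print(segment_std(segment))  # 2.0 (mean 5, squared deviations sum to 32, 32 / 8 = 4)
```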

The power spectral density (PSD) describes the energy distribution of EEG signals across frequencies, thereby revealing alterations in frequency components during seizures. Band energy, in turn, captures the energy within specific frequency bands, from which frequency features are derived. Combining PSD and band energy features is a common practice that provides a more holistic view of EEG signal characteristics in the frequency domain. This combined analysis yields information that is important for the diagnosis and prediction of epilepsy.
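
The two frequency-domain features can be illustrated with a short sketch. The text does not specify the PSD estimator or the exact band-energy definition, so this example assumes a plain one-sided periodogram and defines band energy as the sum of squared subband coefficients. Both function names are hypothetical.

```python
import numpy as np


def periodogram_psd(x, fs):
    """One-sided periodogram PSD estimate (assumed estimator)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    psd = np.abs(np.fft.rfft(x)) ** 2 / (fs * n)
    # Double every bin except DC (and Nyquist when n is even) to fold in
    # the power of the negative frequencies.
    if n % 2 == 0:
        psd[1:-1] *= 2.0
    else:
        psd[1:] *= 2.0
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, psd


def band_energy(coeffs):
    """Energy of one subband: sum of squared coefficients (assumed definition)."""
    coeffs = np.asarray(coeffs, dtype=float)
    return float(np.sum(coeffs ** 2))


rng = np.random.default_rng(0)
x = rng.standard_normal(1024)            # stand-in for one subband signal
freqs, psd = periodogram_psd(x, fs=256)
total_power = np.sum(psd) * (freqs[1] - freqs[0])
```

By Parseval's theorem, integrating this PSD over frequency recovers the mean squared amplitude of the signal, which gives a quick sanity check on the scaling.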

EEG signals exhibit a high degree of complexity and nonlinearity, and the incorporation of nonlinear features proves beneficial in capturing intricate nonlinear relationships within the data. These features aid in uncovering the deep nonlinear structures inherent in EEG signals. Notably, among the nonlinear features, entropy serves as a metric for quantifying uncertainty and the information content within the data. Fuzzy entropy, a variant of traditional entropy, stands out for its enhanced capacity to handle data uncertainty. It is particularly advantageous in the context of complex epilepsy datasets, where fuzzy entropy excels in precisely capturing relationships between data features and the associated information content. The capability contributes to a more nuanced understanding of the intricate characteristics embedded within the data51.
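
To make the fuzzy entropy feature concrete, the following is a minimal sketch of one common formulation (an exponential membership function applied to the Chebyshev distance between mean-removed templates). The text does not give the parameters, so the embedding dimension `m=2`, tolerance `r = 0.2 * std`, and fuzzy power `n=2` are assumptions, as is the function name.

```python
import numpy as np


def fuzzy_entropy(x, m=2, r_factor=0.2, n=2):
    """Fuzzy entropy of a 1-D signal (assumed parameters; requires std(x) > 0)."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)
    count = len(x) - m  # same number of templates for dimensions m and m + 1

    def phi(dim):
        templates = np.array([x[i:i + dim] for i in range(count)])
        # Remove each template's own mean so only local shape is compared.
        templates = templates - templates.mean(axis=1, keepdims=True)
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        similarity = np.exp(-(dist ** n) / r)
        np.fill_diagonal(similarity, 0.0)  # exclude self-matches
        return similarity.sum() / (count * (count - 1))

    return np.log(phi(m)) - np.log(phi(m + 1))


rng = np.random.default_rng(0)
t = np.arange(300) / 256
fe_sine = fuzzy_entropy(np.sin(2 * np.pi * 5 * t))    # regular signal
fe_noise = fuzzy_entropy(rng.standard_normal(300))    # irregular signal
```

A regular signal such as the sine wave yields a lower value than white noise, which is the property that makes the measure useful as a complexity feature. The pairwise distance matrix grows quadratically with segment length, so long segments would need a chunked implementation.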

Classification

By leveraging diverse neural network structures, the features inherent in epileptic signals can be thoroughly explored and harnessed to enhance the accuracy and robustness of classification. In this study, we devised a CNN-GRU-AM model tailored for epilepsy detection and prediction, combining the respective strengths of the CNN, GRU, and AM. Specifically, the CNN component extracts spatial features from the input data through two convolutional layers (each followed by ReLU activation) and a pooling layer. The first convolutional layer uses 32 filters, while the second convolutional layer uses 64 filters. The output of the pooling layer is weighted through an attention mechanism comprising two fully connected layers and a sigmoid activation layer. The attention-weighted result is then fed into a GRU layer with 10 hidden units, designed to capture the inherent temporal features in the input data and output the result of the last time step of the sequence. Subsequently, the output from the GRU layer is passed to a fully connected layer to generate a probability distribution of the categories, which is ultimately used for classification prediction via a softmax layer. The proposed model makes comprehensive use of time–frequency domain and nonlinear features, enabling the capture of intricate patterns within the signals. The structured integration of CNN, GRU, and AM facilitates the accurate recognition of epileptic seizures. The architecture of the CNN-GRU-AM model is visually represented in Fig. 2, and the details of the network architecture and the hyper-parameter configuration are given in Tables 2 and 3, respectively.

Figure 2 CNN-GRU-AM architectural model.

Table 2 Model structure of CNN-GRU-AM.

Table 3 Hyper-parameter configuration.

Convolutional neural network (CNN)

The CNN is a deep learning model that gained widespread application in the early stages of image processing and computer vision. Renowned for its efficacy in extracting and classifying image features, CNN has also demonstrated utility in classifying the states of EEG signals. Unlike traditional machine learning algorithms, CNN eliminates the need for manually designing features. The fundamental components of a CNN include a convolutional layer, a pooling layer, and a fully connected layer. The convolutional layer primarily serves to extract features from input data, generating a feature map. The feature map is subsequently downsampled by the pooling layer to reduce feature dimensionality and computational complexity. Stacking multiple convolutional and pooling layers in a specific order facilitates the extraction of increasingly sophisticated features. The fully connected layer then integrates the learned features from the pooling layer into the sample labeling space through weighted fusion.

The architecture of the CNN is visually represented in Fig. 3. In the first convolutional layer, a 3 × 1 convolutional kernel is employed to generate 32 convolutional feature maps. The subsequent pooling layer uses a pooling window of size 3 × 1 with a step size of 1. In the second convolutional layer, a 3 × 1 convolutional kernel is likewise used to produce 64 convolutional feature maps. The pooling configuration remains the same, with a window size of 3 × 1 and a step size of 1. Each convolutional layer in our model is followed by a batch normalization layer and a Rectified Linear Unit (ReLU) activation function. This combination enhances the model’s generalization capacity, mitigates overfitting, and accelerates training.
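
The shape bookkeeping of this convolution and pooling stack can be traced with a small NumPy sketch. Several details are assumptions because the text does not state them: "valid" padding, max pooling, random weights, and a hypothetical input of 24 fused feature values with one channel. Batch normalization is omitted for brevity, so this is not the full layer described above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv1d_relu(x, weights, bias):
    """1-D convolution ('valid' padding, stride 1) followed by ReLU.

    x: (length, in_channels); weights: (kernel, in_channels, out_channels).
    """
    kernel = weights.shape[0]
    windows = sliding_window_view(x, kernel, axis=0)  # (length-kernel+1, in_ch, kernel)
    out = np.einsum("lck,kco->lo", windows, weights) + bias
    return np.maximum(out, 0.0)


def max_pool1d(x, window=3, stride=1):
    """Max pooling along the length axis (window 3, step 1 as in the text)."""
    windows = sliding_window_view(x, window, axis=0)[::stride]
    return windows.max(axis=-1)


rng = np.random.default_rng(0)
x = rng.standard_normal((24, 1))                       # hypothetical input
w1, b1 = rng.standard_normal((3, 1, 32)) * 0.1, np.zeros(32)
w2, b2 = rng.standard_normal((3, 32, 64)) * 0.1, np.zeros(64)

h = max_pool1d(conv1d_relu(x, w1, b1))                 # (22, 32) -> (20, 32)
h = max_pool1d(conv1d_relu(h, w2, b2))                 # (18, 64) -> (16, 64)
print(h.shape)
```

With a pooling step of 1, each pooling layer shortens the sequence by only two positions, so most of the dimensionality reduction in this configuration would have to come from later layers.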

Figure 3 CNN model structure diagram.

Gated recurrent unit (GRU) network

The GRU is an improved variant of the recurrent neural network (RNN), specifically designed to address the vanishing and exploding gradient problems inherent in traditional RNNs as the number of network layers and iterations grows. As a modification of the Long Short-Term Memory (LSTM) architecture, the GRU streamlines the structure by reducing the number of gates. It employs an update gate and a reset gate to decide whether to retain or discard hidden state information from the previous time step. A sigmoid function, which outputs values between 0 and 1, determines the extent of information retention. This selective updating and forgetting of information enables the GRU to capture long-term dependencies within the data efficiently. The update gate controls how much of the previous information is carried forward. In Eq. (3), the update gate selectively retains the previous information \({h}_{t-1}\), and \({W}^{\left(z\right)}\) and \({U}^{\left(z\right)}\) are the weight matrices of the update gate.

$$z_{t} = \sigma (W^{(z)} x_{t} + U^{(z)} h_{t - 1} ).$$
(3)

The reset gate selectively forgets the previous information \({h}_{t-1}\). Equation (4) is calculated in the same way as Eq. (3), with \({W}^{\left(r\right)}\) and \({U}^{\left(r\right)}\) being the weight matrices of the reset gate.

$$r_{t} = \sigma (W^{(r)} x_{t} + U^{(r)} h_{t - 1} ).$$
(4)

The reset gate output \({r}_{t}\) is multiplied element-wise with \(U{h}_{t-1}\) to form the candidate hidden state, which the update gate then blends with the previous hidden state, as shown in Eqs. (5) and (6).

$$h_{t}^{\prime} = \tanh (Wx_{t} + r_{t} \cdot Uh_{t - 1} ),$$
(5)
$$h_{t} = z_{t} \cdot h_{t - 1} + (1 - z_{t} ) \cdot h_{t}^{\prime} .$$
(6)

The internal structure of the GRU is shown in Fig. 4, where \({x}_{t}\) and \({h}_{t}\) are the input vector and the hidden state at time t, respectively, and \({h}_{t}^{\prime}\) is the candidate hidden state. The update gate \({z}_{t}\) determines how to update the hidden state using the current EEG information, and the reset gate \({r}_{t}\) determines how much historical information needs to be forgotten. \(\sigma \left(\cdot \right)\) and \(\text{tanh}(\cdot )\) are the sigmoid function and hyperbolic tangent function, respectively.
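
Equations (3)–(6) translate almost line by line into code. The sketch below implements a single GRU step in NumPy without bias terms, exactly as the equations are written. The parameter dictionary, the random initialization, the input size of 64, and the sequence length of 16 are illustrative assumptions; only the hidden size of 10 comes from the text.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(x_t, h_prev, p):
    """One GRU time step following Eqs. (3)-(6)."""
    z_t = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev)            # Eq. (3), update gate
    r_t = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev)            # Eq. (4), reset gate
    h_cand = np.tanh(p["W"] @ x_t + r_t * (p["U"] @ h_prev))   # Eq. (5), candidate state
    return z_t * h_prev + (1.0 - z_t) * h_cand                 # Eq. (6), new hidden state


def init_params(input_size, hidden_size, rng):
    """Small random weights; W* act on the input, U* on the hidden state."""
    cols = {"Wz": input_size, "Wr": input_size, "W": input_size,
            "Uz": hidden_size, "Ur": hidden_size, "U": hidden_size}
    return {name: rng.standard_normal((hidden_size, c)) * 0.1 for name, c in cols.items()}


rng = np.random.default_rng(0)
input_size, hidden_size, steps = 64, 10, 16   # 10 hidden units per the text; rest hypothetical
params = init_params(input_size, hidden_size, rng)

h = np.zeros(hidden_size)
for x_t in rng.standard_normal((steps, input_size)):
    h = gru_step(x_t, h, params)
# After the loop, h is the output of the last time step of the sequence.
```

Because the new state is a gate-weighted blend of the previous state and a tanh output, every component of `h` stays strictly inside (-1, 1) when the initial state is zero.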

Figure 4 Internal structure of GRU cell.

Attention mechanism (AM)

The AM emulates the human attention allocation process and serves as a valuable tool in assisting deep learning models when handling extensive datasets. Designed to enhance efficiency in focusing on crucial information, the AM allows models to prioritize significant details during data processing. This prioritization is achieved through assigning varying weights to different segments of the input data. By incorporating the AM, the model gains flexibility in processing input data of diverse lengths and structures. Moreover, it enhances the model’s ability to discern correlations within the input data. The introduction of the AM into the epilepsy prediction model aims to consider the diverse impacts of different input features on prediction outcomes, ultimately contributing to improved prediction accuracy.

The schematic diagram of the AM is illustrated in Fig. 5. In the AM process, the model first preserves the outputs from the preceding network layer and then correlates them with the values of the output sequence. This approach enables the model to learn which input features require focused attention. Consequently, higher weights are assigned to input features exhibiting strong correlations. The weights are calculated as shown in Eqs. (7) and (8).

$$u_{t} = \tanh (w_{i} h_{t} + b),$$
(7)
$$a_{t} = \mathrm{softmax}(u_{t}^{T} ,u_{w} ),$$
(8)

where \({w}_{i}\) is the weight matrix, \(b\) is the bias term, \({h}_{t}\) is the output vector of the hidden layer of the GRU, \({u}_{t}\) is the activation vector, and \({a}_{t}\) is the weight value. The final vector \({A}_{n}\) is obtained from Eq. (9).

$$A_{n} = \sum\limits_{t = 1}^{n} {a_{t} u_{t} } .$$
(9)

Figure 5 Internal schematic of AM.
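
The attention computation in Eqs. (7)–(9) can be sketched in NumPy as follows. Two points are assumptions rather than statements from the text: the softmax argument in Eq. (8) is read as the inner product of \({u}_{t}\) with \({u}_{w}\), and \({u}_{w}\), which the text does not define, is treated as a learnable context vector. The function name, array sizes, and random values are illustrative.

```python
import numpy as np


def attention_pool(H, W, b, u_w):
    """Attention over hidden states following Eqs. (7)-(9).

    H: (n, d) hidden states h_t; W: (k, d) weight matrix; b: (k,) bias;
    u_w: (k,) context vector (assumed interpretation of u_w).
    """
    U = np.tanh(H @ W.T + b)              # Eq. (7): one u_t per row, shape (n, k)
    scores = U @ u_w                      # inner product u_t^T u_w, shape (n,)
    scores = scores - scores.max()        # subtract the max for numerical stability
    a = np.exp(scores) / np.exp(scores).sum()   # Eq. (8): softmax weights a_t
    A = a @ U                             # Eq. (9): weighted sum of the u_t
    return A, a


rng = np.random.default_rng(0)
H = rng.standard_normal((16, 10))         # 16 time steps, 10 hidden units
W = rng.standard_normal((8, 10)) * 0.1
b = np.zeros(8)
u_w = rng.standard_normal(8)

A, a = attention_pool(H, W, b, u_w)
```

The weights `a` are positive and sum to one, so `A` is a convex combination of the activation vectors. With a zero context vector all steps receive equal weight and `A` reduces to a plain average.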

Tenfold cross validation (CV)

Tenfold CV is a robust performance evaluation technique in which every sample is used for both training and testing. The principle of this method is depicted in Fig. 6. Compared with the holdout method, tenfold CV makes fuller use of the data and mitigates errors arising from uneven data distribution.

Figure 6 Schematic diagram for tenfold CV.

In the tenfold CV process, the data is randomly partitioned into 10 equal-sized subsets. The candidate model undergoes training using nine of these subsets and is then tested on the remaining subset. Predictions from the test subset are recorded in vectors, and this procedure is repeated ten times, with a different subset serving as the test data in each iteration. Following these repetitions, the model’s predictions for the entire dataset are consolidated in vectors. These vectors are combined, and specific metrics, chosen based on the problem’s nature, are employed to evaluate the performance of the candidate model. This segmentation method contributes to more reliable results, particularly in the context of classification networks.
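
The partitioning scheme described above can be sketched as follows. The generator name `tenfold_splits`, the sample count of 200, and the fixed seed are assumptions for illustration. The model fitting step is left as a comment because it depends on the classifier.

```python
import numpy as np


def tenfold_splits(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]


n_samples = 200                           # hypothetical number of feature vectors
tested_in_fold = np.full(n_samples, -1)   # records which fold tested each sample
for fold_id, (train_idx, test_idx) in enumerate(tenfold_splits(n_samples)):
    # A real pipeline would fit the model on train_idx and predict on test_idx here.
    tested_in_fold[test_idx] = fold_id
```

Collecting the per-fold predictions into one array indexed by sample, as `tested_in_fold` does for fold numbers here, is what allows the metrics to be computed once over the whole dataset.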

Evaluation indicators

To assess the performance of the proposed seizure detection and prediction method, commonly used evaluation metrics in classification are utilized to measure the effectiveness of the approach from various perspectives. These criteria include sensitivity, specificity, accuracy, Receiver Operating Characteristic (ROC) curve, and Area Under the Curve (AUC). The expressions for sensitivity, specificity, and accuracy are provided in Eqs. (10)–(12). The AUC represents the area under the ROC curve.

$$Sensitivity = \frac{TP}{{TP + FN}},$$
(10)
$$Specificity = \frac{TN}{{TN + FP}},$$
(11)
$$Accuracy = \frac{TP + TN}{{TP + FN + FP + TN}},$$
(12)

where TP and TN are the numbers of samples correctly predicted as positive and negative, respectively, FP is the number of negative samples incorrectly predicted as positive, and FN is the number of positive samples incorrectly predicted as negative. The AUC value ranges between 0 and 1. An AUC of 1 signifies perfect separation between positive and negative samples by the classifier. Conversely, an AUC of 0.5 suggests that the classifier’s performance is equivalent to random guessing.
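
Equations (10)–(12) and the AUC can be computed with a few lines of plain Python. In this sketch the AUC is obtained from its rank interpretation, namely the fraction of positive and negative pairs in which the positive sample receives the higher score, with ties counted as one half. The function names and the example counts and scores are made up for illustration.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy as in Eqs. (10)-(12)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


def auc_from_scores(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


metrics = confusion_metrics(tp=95, fn=5, tn=90, fp=10)
# sensitivity 0.95, specificity 0.90, accuracy 185 / 200 = 0.925
auc = auc_from_scores([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
# three of the four positive-negative pairs are ranked correctly, so auc == 0.75
```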

Results and discussions

Experimental results

This study conducts a comprehensive evaluation of the proposed CNN-GRU-AM model, with the tenfold CV results for all cases in the CHB-MIT dataset shown in Table 4. Focusing on the seizure detection task, the sensitivity metric results are notable, with 22 out of 24 cases exhibiting sensitivity values exceeding 98%, and 9 cases achieving a perfect sensitivity score of 100%. Specificity and accuracy also showcase commendable performance in the majority of cases, with only a few instances demonstrating slightly lower metric values. Across the 24 cases, the CNN-GRU-AM model attains an average sensitivity of 99.24%, an average specificity of 99.51%, an average accuracy of 99.35%, and a mean AUC of 99.34%. Six specific cases (Chb02, Chb06, Chb07, Chb16, Chb22, and Chb23) achieve 100% sensitivity, 100% specificity, 100% accuracy, and 100% AUC.

Table 4 Results of indicators for seizure detection and prediction.

Turning to the seizure prediction task, the results, as displayed in Table 4, reveal that only 2 out of 24 cases exhibit sensitivity metrics below 90%, with more than half of the cases surpassing 95% sensitivity. Across all 24 cases, the proposed method achieves an average sensitivity of 95.47%, a specificity of 94.93%, an accuracy of 95.16%, and an AUC of 95.15% in the seizure prediction task.

Comparison with other models

Analysis of sensitivity results

In addition to evaluating the performance of the proposed CNN-GRU-AM model, sensitivity tests were conducted on all cases in the CHB-MIT dataset using three distinct models: CNN-GRU, CNN, and GRU. The sensitivity results for both seizure detection and prediction are illustrated in Figs. 7 and 8.

Figure 7 Sensitivity results of 24 cases under four models for seizure detection.

Figure 8 Sensitivity results of 24 cases under four models for seizure prediction.

Examining the sensitivity results for seizure detection in Fig. 7, it is observed that the CNN-GRU-AM model generally outperforms the CNN-GRU model, with exceptions noted for Chb04 and Chb17. Moving to the sensitivity results for seizure prediction in Fig. 8, the CNN-GRU-AM model also demonstrates superior sensitivity compared to the CNN-GRU model, except for Chb17, Chb21, and Chb22. Overall, the CNN-GRU-AM model exhibits a significant performance advantage over the CNN-GRU model in both seizure detection and prediction tasks. This enhanced performance is attributed to the attention mechanism’s capacity to capture crucial features in the input data more effectively, thereby elevating the overall model performance. Furthermore, the CNN-GRU model displays higher sensitivity overall in the seizure detection and prediction tasks than either the CNN model or the GRU model alone. This is ascribed to the CNN’s efficiency in extracting spatial features from the input data and the GRU’s aptitude in capturing long-term dependencies in time series. The synergy of these capabilities enhances the overall performance of the CNN-GRU model in seizure detection and prediction.

Analysis of AUC results

The AUC stands as a comprehensive parameter for evaluating classification tasks, offering insights into how well a model distinguishes between positive and negative cases. Moreover, AUC serves as a valuable metric for comparing the performance of different classification models. In this study, we leverage AUC to compare the ROC curves of four models in the seizure detection and prediction tasks, employing three cases (Chb01, Chb12, and Chb24) as examples. The ROC curves for seizure detection and prediction for the three cases are illustrated in Figs. 9 and 10, respectively. Figure 9 reveals that the CNN-GRU-AM models for all three cases consistently demonstrate higher AUC values. Notably, for each case, the CNN-GRU-AM model outperforms the other three models, showcasing its superior discriminatory power. In Fig. 10, the ROC curves for the prediction task exhibit slightly lower AUCs compared to those for the detection task. This discrepancy is attributed to the greater complexity and challenge of the seizure prediction task. It is noteworthy that the CNN-GRU-AM model consistently achieves higher AUC values for both seizure detection and prediction tasks, signifying its robust performance in classification tasks compared to the other three models.

Figure 9

ROC curves for three cases under four models for seizure detection.

Figure 10

ROC curves for three cases under four models for seizure prediction.

Comparison of average results of each index

To evaluate the effectiveness of our models comprehensively, the average metrics for seizure detection and prediction are presented in Figs. 11 and 12 for the four models: CNN-GRU-AM, CNN-GRU, CNN, and GRU. Figs. 11 and 12 show a substantial improvement in all metrics for the CNN-GRU model compared to the individual CNN and GRU models. Combining the CNN and the GRU captures both spatial and temporal features, which improves overall model performance. Comparing the CNN-GRU-AM model with the CNN-GRU model, the attention module raises sensitivity in the seizure detection task by 0.43%, specificity by 0.37%, accuracy by 0.41%, and AUC by 0.39%. Similar improvements are observed in the seizure prediction task, with sensitivity increasing by 1.27%, specificity by 1.04%, accuracy by 1.13%, and AUC by 0.60%. In summary, the CNN-GRU model outperforms the individual CNN and GRU models, and the CNN-GRU-AM model in turn outperforms the CNN-GRU model.
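The averaged indices compared above follow the standard confusion-matrix definitions. A minimal sketch is given below, assuming binary labels (1 for the positive class, 0 for the negative class) and a simple unweighted mean over cases; the function names are illustrative and the sketch is not the evaluation code used in this study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fn, tn, fp) for binary labels, with 1 as the positive class."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fp += 1
    return tp, fn, tn, fp


def summary_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for one case (patient)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),  # fraction of true positives recovered
        "specificity": tn / (tn + fp),  # fraction of true negatives recovered
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


def mean_over_cases(per_case):
    """Unweighted mean of each metric across a list of per-case metric dicts."""
    keys = per_case[0].keys()
    return {k: sum(m[k] for m in per_case) / len(per_case) for k in keys}
```

An unweighted mean gives every case the same influence regardless of how many segments it contributes; a segment-weighted mean would be an equally defensible choice and can yield slightly different averages.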

Figure 11

Mean metric results under the four models for seizure detection.

Figure 12

Mean metric results under the four models for seizure prediction.

Discussion

This study introduces the CNN-GRU-AM model, a novel approach based on multi-feature fusion for seizure detection and prediction, and tests it on the CHB-MIT EEG dataset. The proposed method uses DWT to decompose multichannel EEG signals. Because DWT decomposition and reconstruction are reversible, the information in the signal is preserved. After decomposition, features are extracted from the resulting subbands: STD, PSD, band energy, and fuzzy entropy. These features are chosen for their demonstrated discriminative ability, robustness, information richness, and interpretability. The experimental results show that the CNN-GRU-AM model outperforms the CNN-GRU, CNN, and GRU models overall. Tables 5 and 6 compare the performance of our proposed method with other approaches for seizure detection and prediction, respectively. All methods in the comparison are evaluated on the CHB-MIT scalp EEG dataset.
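The reversibility of the DWT and the per-subband feature extraction can be illustrated with a minimal sketch. The Haar wavelet is assumed here purely for brevity, since this section does not state which mother wavelet is used. The function names are illustrative, and only two of the four features (STD and band energy) are shown; PSD and fuzzy entropy are omitted.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert one Haar level, interleaving the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def haar_wavedec(signal, levels):
    """Multi-level decomposition; returns [cA_L, cD_L, ..., cD_1]."""
    coeffs = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        coeffs.insert(0, detail)
    coeffs.insert(0, approx)
    return coeffs


def haar_waverec(coeffs):
    """Rebuild the signal from the output of haar_wavedec."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        approx = haar_idwt(approx, detail)
    return approx


def subband_features(band):
    """Population standard deviation and energy (sum of squares) of a subband."""
    mean = sum(band) / len(band)
    std = math.sqrt(sum((x - mean) ** 2 for x in band) / len(band))
    energy = sum(x * x for x in band)
    return {"std": std, "energy": energy}
```

Applying `haar_waverec` to the output of `haar_wavedec` recovers the input up to floating-point error, and because the transform is orthonormal the subband energies sum to the energy of the original signal. The signal length must be divisible by two raised to the number of levels.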

Table 5 CHB-MIT detection comparison.
Table 6 CHB-MIT prediction comparison.

In Table 5, Harpale et al.52 employed temporal and frequency features of EEG signals for epileptic seizure state detection, using a fuzzy classifier on seven selected cases. However, the small sample size raises concerns that sample bias may have affected their results. Li et al.53 achieved high sensitivity in 21 cases by employing a channel-embedded spectral-temporal squeezed excitation network and an SVM. Jiang et al.54 used the synchroextracting chirplet transform and an SVM, achieving 98.71% sensitivity across 22 cases. Alharthi et al.55 employed wavelet transform, a compatibility framework, and a Convolutional Neural Network-Bidirectional Long Short-Term Memory-Attention Mechanism (CNN-BiLSTM-AM) model, demonstrating significant sensitivity in 23 cases. Three studies evaluated their methods on all 24 cases of the CHB-MIT dataset, which makes their findings more convincing: Zhang et al.46 combined frequency-specific energy with a bidirectional gated recurrent unit network; Prasanna et al.56 adopted a combination framework involving customized convolutional neural networks, exhaustive RF, and RNN-BiLSTM models; and Tang et al.57 used path signatures combined with a BiLSTM network and an AM. All three achieved good results. Compared with these approaches, the method proposed in this study shows higher sensitivity, specificity, and accuracy in the seizure detection task.

In Table 6, our proposed method, which integrates time–frequency domain and nonlinear fusion features with the CNN-GRU-AM model, achieved a sensitivity of 95.47%, a specificity of 94.93%, and an accuracy of 95.16% across 24 cases. Compared with prior studies, our method performed better in seizure prediction. It exceeded the 89.25% sensitivity achieved by Yang et al.20 using STFT spectral images and RDANet in 13 patients. It also improved, on a larger sample, on the accuracy reported by Hellar et al.58 in 16 patients using EmDMD and spectral features. With a comparable sample size, our method surpassed the 93.80% sensitivity, 91.20% specificity, and 94.00% accuracy achieved by Aslam et al.59 in 22 patients. It also showed higher sensitivity and accuracy than the results of Ma et al.60 in 24 patients, obtained with the multichannel feature fusion model CNN-BiLSTM. Lee et al.61 used STFT, pre-training, and ResNet-LSTM to achieve a sensitivity of 89.60%, whereas in this work the integration of time–frequency domain and nonlinear fusion features with CNN-GRU-AM yielded a higher sensitivity of 95.47%.

In conclusion, through a comparative analysis of recent related studies, our proposed method consistently exhibited higher sensitivity, specificity, and accuracy. These findings substantiate the efficacy of our method in the challenging tasks of seizure detection and prediction.

Conclusion

Detecting and predicting seizures is crucial for patients whose seizures cannot be controlled by medication or surgery. This study proposes a method that integrates feature fusion with the CNN-GRU-AM model for seizure detection and prediction. The method first extracts multiple features from EEG signals and fuses them; the fused features are then input into the CNN-GRU-AM network model. The proposed method is validated on the CHB-MIT dataset. For seizure detection, it achieves an average cross-validation sensitivity of 99.24%; for seizure prediction, it attains an average cross-validation sensitivity of 95.47%. Compared to the other models, the CNN-GRU-AM model demonstrates superior performance. These results validate the effectiveness of the method in identifying the state of EEG signals. The high sensitivity indicates that the system identifies most true seizure states, thereby minimizing false negatives, which is crucial for the timely diagnosis and treatment of epilepsy patients.

The method's strong performance relative to existing methods provides a solid foundation for further research on epileptic seizure detection and prediction. However, since the CHB-MIT dataset consists primarily of pediatric patients, the proposed method must also be evaluated in other patient groups, which may exhibit more diverse characteristics. Future work will explore the fusion of additional informative features and investigate combinations of other deep learning models and feature fusion methods to further enhance seizure detection and prediction capabilities.