Artificial intelligence-driven cybersecurity system for internet of things using self-attention deep learning and metaheuristic algorithms

Alblehai, Fahad

doi:10.1038/s41598-025-98056-2

Article
Open access
Published: 16 April 2025

Fahad Alblehai¹

Scientific Reports volume 15, Article number: 13215 (2025)

Abstract

Recently, Internet of Things (IoT) usage has increased rapidly, and cybersecurity concerns have grown with it. The IoT has unique limitations and characteristics, so it faces distinctive cybersecurity attacks. Because many attacks and threats are launched against the IoT daily, it is important to recognize these kinds of attacks and find solutions that mitigate their risks. The modern approach to cybersecurity applies artificial intelligence (AI) to develop complex models for protecting systems and networks, specifically in IoT environments. Cyber attackers have also adapted by leveraging AI technologies, using adversarial AI to execute advanced cybersecurity threats. This constant evolution of AI-driven threats and defences necessitates developing more robust, adaptive, and real-time cybersecurity models to stay ahead of increasingly advanced attacks. This paper presents an Intelligent Cybersecurity System Using Self-Attention-based Deep Learning and Metaheuristic Optimization Algorithm (ICSSADL-MHOA). The proposed ICSSADL-MHOA model aims to provide a robust cybersecurity system for IoT networks. First, the data normalization stage employs min–max normalization to ensure consistency, accuracy, and efficiency by organizing data into a standardized format. Next, the improved tuna swarm optimization (ITSO) model is implemented for feature selection to identify the most relevant features in the data. The proposed ICSSADL-MHOA model then utilizes the bidirectional long short-term memory with self-attention (BiLSTM-SA) model to detect and classify cybersecurity attacks. Finally, the parameters of the BiLSTM-SA technique are selected using the hunger games search (HGS) technique. Comprehensive studies on the ToN-IoT and Edge-IIoT datasets validate the efficiency of the ICSSADL-MHOA method. The experimental validation of the ICSSADL-MHOA method showed a superior accuracy of 99.37% over existing techniques.

Introduction

Rising demand and the growth of innovative network systems have driven the expansion of the IoT, but its concepts have become more complex day by day. The IoT is difficult to define because it has evolved considerably since it was first developed¹. The most widely accepted definition describes it as a connected digital network in which devices with unique identifiers (UIDs) can exchange data autonomously without human intervention². It is often regarded as involving a user interface for a centralized or system location application, usually a smartphone application, that sends instructions or data to one or more edge IoT gadgets. IoT gadgets are susceptible to Internet threats due to several attack vectors³. Hackers might exploit cybersecurity vulnerabilities in IoT devices, which depend upon the specific part of the network they target, leading to different threats. IoT-related cybersecurity research is currently very active, and AI might assist cybersecurity significantly⁴. Cybersecurity involves safeguarding software, data, and electronics, together with the processes by which they are accessed⁵. Generally, security goals include protecting information from being improperly disclosed to unauthorized gadgets or people, destroyed, or modified. Consequently, owing to the vast number of IoT-connected gadgets, society is also becoming increasingly susceptible to cyberthreats such as denial-of-service (DoS) attacks by insiders and hackers⁶. Technology is increasingly important in everyday life, which means that cybersecurity tools and cybercrime advance simultaneously across the whole manufacturing sector, which in turn requires investment in cybersecurity countermeasures. In response, innovative technologies have been developed for IoT cybersecurity management. Additionally, smart grids, as primary structural elements, are particularly susceptible to cyber-threats, which are costly and severely affect the safety of governments and citizens⁷. There is increasing interest in cybersecurity, yet effective countermeasures, for example, cybersecurity experts, remain scarce. Figure 1 shows the common architecture of cybersecurity in IoT devices.

Because of their strong performance in a range of prediction-based domains, machine learning (ML) and deep learning (DL) models have recently attracted researchers' attention. AI models such as DL and ML can make effective use of data to identify and predict possible cybersecurity attacks. DL approaches, which are increasingly popular, recognize cyber threats more quickly than earlier models, allowing more effective mitigation⁸. DL is a subfield of AI that learns complicated, non-linear patterns from data and then employs those patterns to make predictions. In the cybersecurity world, DL techniques have become increasingly popular tools and are rapidly becoming vital to effective defence against harmful attacks. As IoT gadgets have become more connected, the possibility of hacks has increased. The rapid expansion of IoT devices has significantly increased the complexity of cybersecurity challenges, creating new vulnerabilities that cybercriminals exploit⁹. As IoT systems become more integrated into everyday life, ensuring their security is essential to prevent unauthorized access and malicious attacks. The interconnected nature of these devices makes them a prime target for cyber threats, necessitating advanced security measures. Conventional methods are often insufficient, highlighting the need for more innovative solutions that address these growing risks. Leveraging artificial intelligence and advanced algorithms is becoming crucial to improve the protection and resilience of IoT networks¹⁰.

This paper presents an Intelligent Cybersecurity System Using Self-Attention-based Deep Learning and Metaheuristic Optimization Algorithm (ICSSADL-MHOA). The proposed ICSSADL-MHOA model aims to provide a robust cybersecurity system for IoT networks. First, the data normalization stage employs min–max normalization to ensure consistency, accuracy, and efficiency by organizing data into a standardized format. Next, the improved tuna swarm optimization (ITSO) model is implemented for feature selection to identify the most relevant features in the data. The proposed ICSSADL-MHOA model then utilizes the bidirectional long short-term memory with self-attention (BiLSTM-SA) model to detect and classify cybersecurity attacks. Finally, the parameters of the BiLSTM-SA technique are selected using the hunger games search (HGS) technique. Comprehensive studies on the ToN-IoT and Edge-IIoT datasets validate the efficiency of the ICSSADL-MHOA method. The key contributions of the ICSSADL-MHOA method are listed below.

The ICSSADL-MHOA model utilizes min–max normalization to scale features within a consistent range, improving input uniformity. This approach enhances the stability of the model and ensures improved performance by preventing the dominance of larger values. By normalizing the data, the model effectively handles varying feature scales, resulting in more accurate results.
The ICSSADL-MHOA method employs the ITSO approach for feature selection, detecting the most important features for the task. This methodology improves classification accuracy by mitigating irrelevant features, allowing the model to concentrate on the most impactful data. ITSO assists in achieving optimal feature subsets, improving the overall efficiency of the model.
The ICSSADL-MHOA approach utilizes the BiLSTM model incorporated with SA to capture past and future context in data sequences. Concentrating on relevant patterns in the data significantly enhances the model’s capability to detect and classify cybersecurity threats. The model improves its predictive accuracy and threat detection robustness by integrating temporal and contextual data.
The ICSSADL-MHOA methodology employs the HGS approach to optimize the selection of model parameters, improving its capability to converge efficiently. By fine-tuning parameters, the technique enhances the overall performance of the model. This approach confirms that the model reaches optimal solutions, improving accuracy and computational efficiency.
The ICSSADL-MHOA method integrates ITSO for feature selection, BiLSTM-SA for detection and classification, and HGS for parameter optimization, giving a comprehensive and efficient solution. This multi-algorithmic approach improves the accuracy and robustness of the model in cybersecurity tasks. The novelty is in the seamless integration of optimization, DL, and feature selection techniques, creating a highly effective framework tailored for IoT environments. This integration confirms superior threat detection and resource optimization in complex cybersecurity scenarios.

The article is structured as follows: Sect. “Literature Survey” presents the literature review, Sect. “Materials and Methods” outlines the proposed method, Sect. “Experimental Validation and Discussion” details the results evaluation, and Sect. “Conclusion” concludes the study.

Literature survey

Imtiaz et al.¹¹ developed XIoT, an innovative threat recognition method for the IoT. Exploiting sophisticated DL approaches, particularly a Convolutional Neural Network (CNN), XIoT examines spectrogram images derived from IoT traffic data to identify subtle and complex threat patterns. Unlike conventional methods, XIoT emphasizes interpretability by combining CNNs with Explainable AI (XAI) methods, allowing cybersecurity analysts to understand and trust its predictions. Additionally, the method exploits the low latency and high speed of optical networks. In¹², a DL-based framework with multiple optimizations is developed for automatically detecting and classifying cyber threats. These optimizations include hyper-parameter tuning, feature engineering, and dimensionality reduction. Sattarpour et al.¹³ developed an innovative anomaly-based IDS exploiting DL models, mainly based on the Bidirectional Encoder Representations from Transformers (BERT) model. BERT's structure allows it to perform evaluation and recognition at lower cost than other advanced models, making it appropriate for resource-constrained IoT settings. The developed framework, EBIDS, harnesses BERT to improve intrusion detection (ID) in network and IoT systems. Morshedi et al.¹⁴ developed an innovative IoT network ID (NID) method, exploiting DL models and pristine data. The aim is to provide a more efficient model than its predecessors. The developed DL technique integrates an LSTM structure with dense transition layers, intending to capture spatial and temporal dependencies in the data. Ragab et al.¹⁵ propose a Next-Generation Cybersecurity Attack Detection employing an ensemble DL model (NGCAD-EDLM) methodology for IIoT settings. An ensemble of two DL models, a deep belief network (DBN) and a CNN, is applied for classification. Furthermore, the DL models' hyper-parameters are selected using the lotus effect optimizer algorithm (LEOA). Alsoufi et al.¹⁶ design a new anomaly-based ID system (AIDS) for IoT systems. First, an SAE is utilized to reduce the high dimensionality and acquire a meaningful data representation by determining the reconstruction error. Afterwards, a CNN model is used to build a binary classification method. Al-Neami et al.¹⁷ developed an innovative method that uses a Field-Programmable Gate Array (FPGA) to build a higher-performance IDS. The presented method incorporates advanced models including Extreme Gradient Boosting (XGBoost), a Hybrid DL (HDL) model, and Meta Ensemble Learning (MEL), which combine LSTM methods for temporal analysis with a CNN for feature extraction. This synergistic method substantially decreases detection latency and increases threat recognition precision. Wang, Dai, and Yang¹⁸ developed a NID model based on DL, in which a Conditional Tabular Generative Adversarial Network (CTGAN) generates synthetic data for the minority class.

Saravana Ram and Gopi Saminathan¹⁹ present an ID System (IDS) for wireless sensor networks (WSN) using an Optimized Self-Attention-Based Progressive Generative Adversarial Network (SAPGAN) with Namib Beetle Optimization Algorithm (NBOA). It integrates data pre-processing with APPDRC, feature selection via WSOA, and intrusion classification for diverse attacks. NBOA optimizes SAPGAN’s parameters for improved attack classification accuracy. Tewari and Gupta²⁰ analyze and address the security challenges in IoT across its three layers, namely perception, transportation, and application, exploring cross-layer integration issues and comparing them with conventional network security problems. Aboalela et al.²¹ introduce the Harnessing Feature Pruning with Optimal DL DDoS Cyberattack Detection (HFPODL-DDoSCD) approach for effectual DDoS attack detection in IoT environments. It uses Z-score normalization, Siberian Tiger Optimization (STO) for feature selection, and an SA-BiTCN-BiGRU model for attack detection. Parameter tuning is performed using the Artificial Protozoa Optimizer (APO) to optimize performance. Adat and Gupta²² analyze the security threats in IoT, provide a taxonomy of security issues and defence mechanisms, and discuss future research directions to address existing gaps and improve IoT security. Santhanamari et al.²³ propose a robust security framework using the Cosine CNN (CCNN) technique for attack detection, improving feature extraction with cosine similarity. The Exponential Distribution Optimizer (EDO) optimizes CCNN, balancing exploration and exploitation for optimal performance. Zhao, Li, and Li²⁴ propose a secure authentication scheme incorporating semantic LSTM and blockchain (BC) to improve authentication, access control, and security in IoT applications while reducing computational overhead. Wang et al.²⁵ propose a deep residual SConv1D-Attention model.
The method utilizes binary Particle Swarm Optimization (bPSO) for feature selection, a novel SConv1D-Attention module for effectual information integration, and a robust loss function for addressing data imbalance by accentuating minority classes. Reka et al.²⁶ present a Centrality Coati Optimization Algorithm (COA)-based Cluster Gradient for multi-attack intrusion identification in MANETs. It utilizes Dual Network Centrality for cluster head selection and the COA for compact clustering. The Multi-head Self-Attention based Gated Graph Convolutional Network (MSA-GCNN) detects various attacks. Mohamed et al.²⁷ introduce a probabilistic composite model for zero-day exploit detection. It features Adaptive WavePCA-Autoencoder (AWPA) for denoising and dimensionality reduction (DR), Meta-Attention Transformer Autoencoder (MATA) for improved feature extraction, Genetic Mongoose-Chameleon Optimization (GMCO) for efficient feature selection, and Adaptive Hybrid Exploit Detection Network (AHEDNet) for dynamic ensemble adaptation, achieving high accuracy and low false positives.

Ashwini and Nagasundara²⁸ propose the Enhanced Dual Vision Transformer (EDVT) integrated with the Mantis Search Split Attention Network (MSSAN) models for ransomware detection and classification. It utilizes the log-sinh with Adaptive Box-Cox Transformation (log-sinhABT) for data pre-processing and the Hybrid Termite Alate City Council’s Evolution Optimization (HTCEO) for efficient feature selection. Zareh Farkhady et al.²⁹ present a three-dimensional DL (3DLBS) approach for attack detection, transforming 1D data into 3D using shape, fill, and permute techniques. The model also utilizes CNN and LSTM branches for detection and uses binary chimp optimization (BCHO) for feature selection, improving accuracy and speed. Perumal et al.³⁰ propose the Enhanced Metaheuristics with DL Model for BC Assisted Cybersecurity Solution (EMDLM-BCCS) technique. It uses data pre-processing, extreme learning machine (ELM) for attack detection, and elite-oppositional grasshopper optimization (EGOA) to enhance ELM performance. Orman³¹ proposes an IDS framework integrating Multi-layer Perceptron (MLP), ML, DL, Random Forest (RF), and hybrid models. Kocherla et al.³² introduce the DLAD model, a bio-inspired metaheuristic for anomaly detection in IIoT. The technique also utilizes the Improved Crow Search Algorithm (ICSA) method for feature selection, Stacked Recurrent Neural Networks (SRNN) and Harris Hawks Optimizer (HHO) techniques for classification and parameter tuning. Alqahtany, Shaikh, and Alqazzaz³³ introduce an IDS using Enhanced Grey Wolf Optimization (EGWO) methodology for feature selection to improve reliability and computational efficiency in IoT networks. Babitha³⁴ develops a quantum-inspired BC-assisted cybersecurity model for IoT, utilizing the Fitness-based Jellyfish Chameleon Swarm Algorithm (FJCSA) technique for key optimization and Adaptive Attention-based LSTM with Adaboost (AALSTM-Ab) model for ID. Anu Velavan and Sureshkumar³⁵ propose a Double Fuzzy Clustering-Driven Context Neural Network for ID in Cloud Computing (DFCCNN-BWOA-IDC) model. The method also employs Sequential pre-processing for data cleaning, Recursive Feature Elimination (RFE) for feature selection, and the Beluga Whale Optimization (BWO) approach to optimize DFCCNN parameters for accurate attack detection. Lakicevic et al.³⁶ propose a phishing email detection methodology by employing an artificial neural networks (ANN) model with soft attention and BERT encoders optimized by a modified crayfish optimization algorithm (COA) method to improve classification accuracy. Sayeed, Ahmed, and Swamy³⁷ present a multimodal biometric system utilizing palm and knuckle vein recognition. The technique also employs contrast enhancement for pre-processing, GLCM and DWT for feature extraction, Chimp Optimization Algorithm (ChOA) technique for feature selection, and a Deep Neural Network (DNN) model for classification. Althobaiti and Escorcia-Gutierrez³⁸ introduce the weighted salp swarm algorithm with DL-based cyber-threat detection and classification (WSSADL-CTDC) technique for cyber-threat detection, incorporating a weighted salp swarm algorithm, DL, and min–max normalization. The method utilizes the shuffled frog leap algorithm (SFLA) for feature selection and a hybrid convolutional autoencoder (CAE) model with WSSA-based hyperparameter tuning for improved performance. Table 1 summarizes the existing studies on AI-based cybersecurity systems.

Table 1 Summary of AI-driven cybersecurity systems for IoT using DL and metaheuristic algorithms.

Despite the significant improvements in IoT security solutions, various limitations remain. Many existing methods depend heavily on specific optimization algorithms or models that may not generalize well across diverse IoT environments, such as XIoT, EBIDS, SAPGAN, etc. The reliance on high computational resources in specific approaches limits their scalability for resource-constrained IoT devices. Additionally, various methodologies suffer from challenges in feature selection and the handling of imbalanced data, such as HFPODL-DDoSCD and SConv1D-Attention, which affect their accuracy in detecting minority class threats. Furthermore, most current models do not sufficiently address cross-layer security issues in IoT, leaving gaps in comprehensive protection strategies. Lastly, there is a requirement for more efficient and low-latency techniques to address real-time ID in IoT systems, as highlighted by the proposed methods in several studies like IDS-SAPGAN and EDVT. A significant research gap is the requirement for more dynamic and context-aware security mechanisms that adapt to the evolving nature of IoT environments and growing threats.

Materials and methods

This paper presents a novel ICSSADL-MHOA technique. The proposed ICSSADL-MHOA model aims to enhance a robust cybersecurity system in IoT networks. It involves various processes, such as data normalization, DR, classification, and parameter tuning. Figure 2 signifies the complete work procedure of the ICSSADL-MHOA model.

Data normalization: min–max normalization

At first, the data normalization stage employs min–max normalization to ensure consistency, accuracy, and efficiency by organizing data into a standardized format³⁹. This technique is chosen because it effectively scales the data to a consistent range, usually between 0 and 1, which improves the convergence speed and performance of ML models. It is specifically advantageous when dealing with datasets with varying magnitudes across features, as it ensures that no single feature dominates due to its scale. Unlike other techniques, such as Z-score normalization, which assumes data is normally distributed, min–max normalization works well even for non-linear data distributions. It is also simple to implement and computationally efficient. Furthermore, this technique preserves the relationships between data points, making it appropriate for optimization-based models like the one in this framework. Maintaining consistency in feature scaling makes the model less prone to bias from outliers, resulting in more accurate results in classification and prediction tasks.

Normalization is important because ML and DL techniques are sensitive to differences in the magnitude of their input features. To address this, min–max normalization is used to rescale the features to the interval $[0,1]$. It is mathematically expressed below:

$$z=\frac{B-\text{min}(X)}{\text{max}\left(X\right)-\text{min}(X)}$$

(1)

Here, $B$ denotes a value of the original data; $\text{min}(X)$ and $\text{max}(X)$ represent the minimum and maximum values of the feature, respectively; and $z$ is the normalized value. Normalization prevents certain features from dominating others owing to their scale.
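As a minimal sketch of Eq. (1), the following Python function rescales one feature column to $[0,1]$. The function name, the sample packet sizes, and the handling of a constant column (which Eq. (1) leaves undefined) are assumptions made for illustration, not part of the original method description.

```python
def min_max_normalize(column):
    """Scale a list of numbers to [0, 1] following Eq. (1)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant feature: Eq. (1) would divide by zero, so map every
        # value to 0.0 (an assumption, not stated in the paper).
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


# Hypothetical feature column, e.g. packet sizes in bytes.
packet_sizes = [60, 1500, 780, 60]
scaled = min_max_normalize(packet_sizes)
print(scaled)
```

In practice each feature column would be normalized separately, with the minimum and maximum taken from the training split only, so that test data does not leak into the scaling.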

DR: ITSO model

Next, the ITSO method is implemented for the feature selection process to detect the most relevant features in the data⁴⁰. This model is chosen because it can effectively explore and exploit the search space, detecting the most pertinent features while avoiding redundant or irrelevant ones. Unlike conventional feature selection methods, ITSO replicates tuna’s foraging behaviour, allowing it to navigate complex, high-dimensional data spaces effectively. This swarm-based algorithm balances global exploration and local exploitation, making it specifically effective for massive datasets. The adaptive nature of the ITSO model ensures that it converges to optimal or near-optimal solutions without getting trapped in local minima, a common issue with other methods like greedy algorithms or filter-based techniques. Moreover, ITSO doesn’t require prior knowledge of feature correlations, making it more versatile across diverse datasets. Its integration with ML techniques significantly improves accuracy by mitigating dimensionality and focusing on the most impactful features. Figure 3 illustrates the steps involved in the ITSO model.

The TSO model is a bio-inspired meta-heuristic that draws its inspiration from the foraging behaviour of tuna. The foraging model consists of two strategies. The first is spiral foraging, in which the tuna swim in a spiral formation during the search. This allows them to herd their prey into shallower water, where it is easier to catch. By adopting this spiral approach, tuna successfully encircle their prey and improve their chances of a successful hunt. The second, parabolic foraging, has each tuna follow the one ahead, forming a parabolic shape that surrounds the prey. By imitating these strategies, the TSO model carries out its optimization procedure. The mathematical model of these behaviours is described below:

Initialization

Like other bio-inspired meta-heuristic models, TSO begins the optimization procedure by randomly generating an initial population uniformly distributed across the search space using Eq. (2).

$${X}_{i}^{int}=rand. \left(ub-lb\right)+lb, \quad i=1,2,\dots ,NP$$

(2)

where ${X}_{i}^{int}$ denotes the ${i}^{th}$ individual; $ub$ and $lb$ represent the upper and lower limits, respectively; $NP$ is the size of the tuna population; and $rand$ refers to a uniformly distributed random vector with values ranging between 0 and 1.
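The initialization in Eq. (2) can be sketched in a few lines of Python. This is an illustration only: the function name, the three-dimensional bounds, and the seeded generator are hypothetical choices, and a fresh random number is drawn for every dimension.

```python
import random


def init_population(num_tuna, lower, upper, rng=random):
    """Draw NP positions uniformly inside [lb, ub] per dimension, as in Eq. (2)."""
    return [
        [rng.random() * (ub - lb) + lb for lb, ub in zip(lower, upper)]
        for _ in range(num_tuna)
    ]


# Hypothetical search space with three dimensions and NP = 5 tuna.
lower = [0.0, -1.0, 10.0]
upper = [1.0, 1.0, 20.0]
population = init_population(5, lower, upper, rng=random.Random(0))
```

For feature selection, each dimension would typically correspond to one candidate feature, with a threshold turning the continuous position into a keep or drop decision; that mapping is not specified in this section.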

Spiral foraging

When they encounter predators, smaller breeding fish like herring and sardines exhibit dynamic behaviour, constantly adjusting their swimming direction to evade threats. In contrast, tuna schools use a tightly looped spiral formation to pursue their prey. While most fish may lack robust orientation skills, they collaborate with more adept swimmers, forming a cohesive, unified hunting force. Furthermore, tuna schools share information, with all members following the lead fish, enabling effective communication and coordinated movements.

The foraging behaviour also comprises a concept where if a group member fails to find food, the rest do not blindly follow. Instead, a random reference point within the search space is introduced to guide the spiral search. This encourages broader exploration and improves the group’s global search capability.

The updated model for the group members’ movement is as follows:

$${X}_{i}^{t+1}=\left\{\begin{array}{ll}{\alpha }_{1}.\left({X}_{best}^{t}+\beta .\left|{X}_{best}^{t}-{X}_{i}^{t}\right|\right)+{\alpha }_{2}.{X}_{i}^{t}, & i=1\\ {\alpha }_{1}.\left({X}_{best}^{t}+\beta .\left|{X}_{best}^{t}-{X}_{i}^{t}\right|\right)+{\alpha }_{2}.{X}_{i-1}^{t}, & i=2,3,\dots ,NP\end{array}\right.$$

(3)

This model facilitates global exploration, enhancing the overall search efficiency by diversifying the strategy of the group. Its coefficients are defined as follows:

$${\alpha }_{1}=a+\left(1-a\right).\frac{t}{{t}_{\text{max}}}$$

(4)

$${\alpha }_{2}=\left(1-a\right)-\left(1-a\right).\frac{t}{{t}_{\text{max}}}$$

(5)

$$\beta ={e}^{bl}.cos\left(2\pi b\right)$$

(6)

$$l={e}^{3}.cos\left(\left(\left({t}_{\text{max}}+\frac{1}{t}\right)-1\right)\pi \right)$$

(7)

The equations describe an optimization process where the position of each individual, ${X}_{i}^{t+1}$, is updated based on the best-known solution, ${X}_{best}^{t}$, and the previous position of an individual. The constants ${\alpha }_{1}$ and ${\alpha }_{2}$ control the extent to which individuals depend on the optimal solution and their prior positions during the search. The parameter $\beta$, computed utilizing a random factor $b$, introduces the agent’s movement variability. $l$, defined as a dynamic spiral factor, improves global exploration. The variables $t$ and ${t}_{\text{max}}$ represent the current and maximum iteration counts. The random number $b$ is uniformly distributed between 0 and 1, guiding the exploration–exploitation balance in the optimization process. These dynamics ensure that the group maintains an efficient and diverse search strategy.
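The spiral update in Eqs. (3) to (7) can be sketched as follows. This is an illustrative reading, not the author's implementation: the constant `a = 0.7` is an assumed value, Eq. (7) is coded exactly as printed above, a new random `b` is drawn for each tuna, and the random-reference-point variant mentioned in the text is omitted because no equation is given for it.

```python
import math
import random


def spiral_update(population, best, t, t_max, a=0.7, rng=random):
    """One spiral-foraging step for the whole school (requires 1 <= t <= t_max)."""
    alpha1 = a + (1 - a) * t / t_max                   # Eq. (4)
    alpha2 = (1 - a) - (1 - a) * t / t_max             # Eq. (5)
    # Eq. (7), coded as printed in the text.
    l = math.exp(3) * math.cos(((t_max + 1 / t) - 1) * math.pi)
    new_population = []
    for i, tuna in enumerate(population):
        b = rng.random()                               # uniform in [0, 1)
        beta = math.exp(b * l) * math.cos(2 * math.pi * b)   # Eq. (6)
        # The first tuna reuses its own position; the others follow the
        # previous individual's position from iteration t (Eq. (3)).
        followed = tuna if i == 0 else population[i - 1]
        new_population.append([
            alpha1 * (xb + beta * abs(xb - xi)) + alpha2 * xf
            for xb, xi, xf in zip(best, tuna, followed)
        ])
    return new_population


# Hypothetical two-dimensional school of three tuna.
school = [[0.2, 0.8], [0.5, 0.1], [0.9, 0.4]]
moved = spiral_update(school, best=[0.5, 0.5], t=1, t_max=10, rng=random.Random(0))
```

Because $\alpha_1 + \alpha_2 = 1$ at every iteration, a school that has already collapsed onto the best position stays there, which is a convenient sanity check for an implementation.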

Parabolic foraging

In addition to spiral foraging, tunas forage cooperatively by adopting a parabolic pattern. In this pattern, they use a reference point, which is usually the position of their food. In addition, tunas actively look for food in their immediate surroundings. These two foraging approaches are applied together, each with an assumed probability of 50%. This behaviour is expressed mathematically as follows:

$${X}_{i}^{t+1}=\left\{\begin{array}{ll}{X}_{best}^{t}+rand.\left({X}_{best}^{t}-{X}_{i}^{t}\right)+TF.{p}^{2}.\left({X}_{best}^{t}-{X}_{i}^{t}\right), & \text{if } r<0.5\\ TF.{p}^{2}.{X}_{i}^{t}, & \text{if } r\ge 0.5\end{array}\right.$$

(8)

$$p=(1-\frac{t}{{t}_{\text{max}}}{)}^{\frac{t}{{t}_{\text{max}}}}$$

(9)

Meanwhile, $TF$ refers to a randomly generated number with a value of 1 or −1.
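A minimal sketch of the parabolic step in Eqs. (8) and (9) is given below. It assumes that $r$ is a uniform random number in $[0,1)$, that $rand$ is drawn separately for each dimension, and that $TF$ is picked once per tuna; these details and the function names are illustrative assumptions.

```python
import random


def parabolic_weight(t, t_max):
    """Iteration-based weight p from Eq. (9)."""
    return (1 - t / t_max) ** (t / t_max)


def parabolic_update(tuna, best, p, rng=random):
    """Parabolic-foraging step for one tuna, following Eq. (8)."""
    tf = rng.choice([1, -1])          # TF: random sign
    if rng.random() < 0.5:            # r < 0.5: move relative to the best position
        return [
            xb + rng.random() * (xb - xi) + tf * p ** 2 * (xb - xi)
            for xb, xi in zip(best, tuna)
        ]
    # r >= 0.5: search around the tuna's own position
    return [tf * p ** 2 * xi for xi in tuna]


p = parabolic_weight(t=5, t_max=10)
new_position = parabolic_update([0.2, 0.8], best=[0.5, 0.5], p=p, rng=random.Random(1))
```

Under Eq. (9) the weight starts at 1 and falls to 0 at the final iteration, so the step size depends only on the iteration counter and not on how good the current solutions are.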

The modification presented in this study enhances the TSO model's performance to obtain better solutions with quicker convergence. It mainly introduces a different weight calculation in the parabolic foraging stage. In the original stage, the weight $p$ is computed using Eq. (9). This way of controlling $p$ relies only on the iteration counter and the maximum iteration count, and it yields progressively smaller weights. Reliance on the maximum iteration count is the main reason for the slower convergence of TSO in some instances (particularly in higher‐dimensional problems). To address this problem, an adaptive fitness-weighted model replaces the conventional one in Eq. (9) to speed up the model's convergence to the best result. Adaptive fitness weighting speeds up convergence by dynamically adjusting the importance of solutions according to their fitness. This lets TSO concentrate on promising regions, avoid premature convergence, and adapt to varying fitness landscapes, improving exploitation, exploration, and the allocation of resources, which ultimately increases the convergence speed across different optimization problems and yields better global solutions. The proposed weight, $p$, is given in Eq. (10) below.

$$p=\frac{fit\left({X}_{i}^{t}\right)}{{\Sigma }_{i=1}^{t}fit\left({X}_{i}^{t}\right)}$$

(10)

where $fit({X}_{i}^{t})$ refers to the fitness of the ${i}^{th}$ tuna individual in the ${t}^{th}$ iteration. This adaptive weight differs from the original one because it relies mainly on the fitness of the individuals in the population to guide convergence. Note that the weight is recomputed at every iteration using Eq. (10). The presented ITSO model implementation process is provided as shown:

In the ITSO approach, the fitness function (FF) is designed to balance the number of selected features in each solution (to be minimized) against the classification accuracy (to be maximized) obtained with those features. Eq. (11) gives the FF used to evaluate a solution.

$$Fitness=\alpha {\gamma }_{R}\left(D\right)+\beta \frac{\left|R\right|}{\left|C\right|}$$

(11)

Here, ${\gamma }_{R}(D)$ signifies the classification error rate, $\left|R\right|$ refers to the cardinality of the chosen subset, and $|C|$ signifies the total number of features in the data. $\alpha$ and $\beta$ are two parameters that weight the importance of classification quality and subset length, respectively, with $\alpha \in [0, 1]$ and $\beta =1-\alpha .$
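The fitness function in Eq. (11) and the adaptive weight in Eq. (10) can be sketched together as below. The value `alpha = 0.9`, the candidate subsets, and the function names are hypothetical. The sketch also assumes that the sum in Eq. (10) runs over all individuals in the population at the current iteration, and it uses equal weights when the total fitness is zero, a case the text does not cover.

```python
def selection_fitness(error_rate, num_selected, num_features, alpha=0.9):
    """Eq. (11): weighted sum of error rate and fraction of features kept."""
    beta = 1.0 - alpha
    return alpha * error_rate + beta * (num_selected / num_features)


def adaptive_weights(fitness_values):
    """Eq. (10): each individual's fitness divided by the population total."""
    total = sum(fitness_values)
    if total == 0:
        # Degenerate case not covered by Eq. (10); equal weights are an assumption.
        return [1.0 / len(fitness_values)] * len(fitness_values)
    return [f / total for f in fitness_values]


# Hypothetical candidates: (error rate, number of selected features) out of 40.
candidates = [(0.05, 10), (0.05, 20), (0.10, 10)]
fitness = [selection_fitness(err, k, 40) for err, k in candidates]
weights = adaptive_weights(fitness)
```

With equal error rates, the candidate that keeps fewer features receives the lower (better) fitness value, which is the trade-off Eq. (11) is meant to encode.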

Classification process: BiLSTM-SA

Besides, the proposed ICSSADL-MHOA model utilizes the BiLSTM-SA technique to classify cybersecurity attacks⁴¹. This model is chosen because it can capture both past and future dependencies in sequential data. The bidirectional nature of BiLSTM allows the model to access context from both directions, enhancing its capability to comprehend temporal relationships and patterns that may exist in the data. When integrated with SA, the model can concentrate on the most relevant parts of the input sequence, allowing it to emphasize crucial features while disregarding irrelevant ones. This makes it highly effective for complex, dynamic datasets like those encountered in cybersecurity. Unlike conventional models that may struggle with long-range dependencies, BiLSTM-SA excels at learning from sequences of varying lengths. Furthermore, integrating LSTM and attention mechanisms improves its robustness, enabling it to perform better in detection and classification tasks than simpler models like conventional feedforward neural networks or shallow LSTMs. Figure 4 depicts the architecture of BiLSTM-SA.

The networks of the LSTM technique control the information flow over gating mechanisms, allowing them to read, retain, and remove information in particular. These networks are effectual in taking long‐term dependences and, partially, easing the tasks of gradient explosion and vanishing that recurrent neural networks (RNNs) might face when handling long successive data. It has numerous memory cells, and each one has ${an f}_{t}$ forget gate, ${i}_{t}$ s input gate, and ${0}_{t}$ output gate:

$${f}_{t}=\sigma \left({W}_{f}\cdot \left[{h}_{t-1},{x}_{t}\right]+{b}_{f}\right)$$

(12)

$${i}_{t}=\sigma \left({W}_{i}\cdot \left[{h}_{t-1},{x}_{t}\right]+{b}_{i}\right)$$

(13)

$${\text{o}}_{t}=\sigma \left({W}_{\text{o}}\cdot \left[{h}_{t-1},{x}_{t}\right]+{b}_{\text{o}}\right)$$

(14)

In Eqs. (12)–(14), $\sigma$ is the sigmoid activation function; $W$ and $b$ denote the weight matrix and bias of the corresponding gate, respectively; ${h}_{t-1}$ is the hidden layer (HL) state at the previous time step; and ${x}_{t}$ is the current input.

The input gate determines how much of the current input should be stored in the memory cell. The forget gate determines how much of the previous memory should be discarded. Based on the decisions of the input and forget gates, the memory cell determines which information is removed and which is retained for the next time step. The output gate controls how much information is passed from the memory cell to the HL, which then serves as the output passed to the next layer. The final output value is computed by multiplying the output gate activation with the output of the memory cell. However, an LSTM can only process data in one direction, generally from the start to the end of the sequence. To overcome this limitation, the BiLSTM technique employs two separate LSTM layers at every time step: one processes the sequence in the forward direction (from start to finish), and the other processes it backwards (from finish to start).

The main aim of BiLSTM is to capture bidirectional dependencies by combining the outputs from both directions. This bidirectional design lets the model incorporate context from the whole sequence and capture intricate dependencies within it.

SA is commonly employed in computer vision (CV) and natural language processing (NLP) to capture relationships within sequences. Its main goal is to let the model process the input at every step by considering local neighbourhood information while also attending to other parts of the same input sequence. This flexibility allows the model to capture global dependencies among elements. Furthermore, SA has the benefit of processing all positions simultaneously rather than sequentially. The method can adjust the attention weights across time steps, giving more weight to the significant ones.
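The gate equations and the bidirectional pass can be illustrated with a small NumPy sketch. Only Eqs. (12)–(14) come from the text; the candidate-memory, cell-state, and hidden-state updates follow the standard LSTM formulation and are an assumption here, as are all function names, the random initialization, and the toy dimensions. This is not the trained model from the study.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step; W and b hold one entry per gate ('f', 'i', 'o', 'c')."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # Eq. (12): forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # Eq. (13): input gate
    o_t = sigmoid(W["o"] @ z + b["o"])       # Eq. (14): output gate
    c_hat = np.tanh(W["c"] @ z + b["c"])     # candidate memory (standard LSTM, assumed)
    c_t = f_t * c_prev + i_t * c_hat         # cell-state update (assumed)
    h_t = o_t * np.tanh(c_t)                 # hidden state (assumed)
    return h_t, c_t


def run_lstm(xs, W, b, hidden):
    """Run one LSTM layer over a (T, n_in) sequence and return (T, hidden) states."""
    h, c = np.zeros(hidden), np.zeros(hidden)
    states = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, b)
        states.append(h)
    return np.stack(states)


def init_params(rng, hidden, n_in):
    """Hypothetical random initialization, for illustration only."""
    W = {k: rng.normal(0.0, 0.1, (hidden, hidden + n_in)) for k in "fioc"}
    b = {k: np.zeros(hidden) for k in "fioc"}
    return W, b


def bilstm(xs, fwd, bwd, hidden):
    """Concatenate a forward pass with a time-reversed backward pass."""
    h_fwd = run_lstm(xs, *fwd, hidden)
    h_bwd = run_lstm(xs[::-1], *bwd, hidden)[::-1]   # re-align to original time order
    return np.concatenate([h_fwd, h_bwd], axis=1)


rng = np.random.default_rng(0)
T, n_in, hidden = 6, 4, 3
xs = rng.normal(size=(T, n_in))
H = bilstm(xs, init_params(rng, hidden, n_in), init_params(rng, hidden, n_in), hidden)
print(H.shape)  # (6, 6): T time steps, 2 * hidden features per step
```

Each row of `H` holds the forward state followed by the backward state for the same time step, so every position sees context from both ends of the sequence.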

Initially, the inputs are transformed into three vectors: query $(Q)$, key $(K)$, and value $(V)$. The similarity is then computed using the dot product between $Q$ and $K$, and the similarity scores are normalized to obtain the attention weights. These weights are then applied to the value vectors, and the resulting output is their weighted sum:

$$Q=X{W}_{q},K=X{W}_{k}, V=X{W}_{v}$$

(15)

Here, ${W}_{q},{W}_{k}$, and ${W}_{v}$ denote the weight matrices. The dot product of $Q$ and $K$ is used to calculate their similarity:

$$Attention\left(Q,K, V\right)=softmax\left(\frac{Q{K}^{T}}{\sqrt{{d}_{k}}}\right)V$$

(16)

In the equation above, ${d}_{k}$ denotes the key dimension, which is used to scale the dot product and thereby avoid exploding or vanishing gradients. $softmax$ normalizes the similarity scores into attention weights, which are then used to form a weighted sum of the vectors in $V$ to obtain the final output:

$$Output =Attention(Q,K, V)$$

(17)
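Eqs. (15)–(17) can be sketched in a few lines of NumPy. The function names, the random projection matrices, and the toy dimensions below are illustrative assumptions; in the model these matrices would be learned and the input $X$ would be the BiLSTM output.

```python
import numpy as np


def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a (T, d_model) sequence."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v            # Eq. (15)
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))      # Eq. (16): one row of weights per query
    return weights @ V, weights                    # Eq. (17): weighted sum of values


rng = np.random.default_rng(1)
T, d_model, d_k = 5, 6, 4
X = rng.normal(size=(T, d_model))                  # hypothetical sequence features
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, attn = self_attention(X, W_q, W_k, W_v)
print(out.shape, attn.shape)  # (5, 4) (5, 5)
```

Row $t$ of `attn` sums to 1 and states how strongly time step $t$ attends to every other step, which is what lets the classifier weight the most informative steps more heavily.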

Parameter optimizer: HGS model

Finally, the parameter selection of the BiLSTM-SA is executed using the HGS method⁴². This method is chosen because it can optimize complex, high-dimensional search spaces. Unlike conventional optimization techniques, such as grid or random search, HGS is inspired by natural selection processes, which enables it to balance exploration and exploitation more effectively. This method adapts to dynamic environments, making it ideal for optimizing the parameters of DL methods such as BiLSTM-SA. HGS is capable of averting local minima, a common challenge in parameter tuning, by integrating diverse strategies that improve global exploration. Additionally, HGS can handle large search spaces with a reduced computational cost compared to exhaustive search methods. Its robustness and capability to fine-tune hyperparameters make it specifically appropriate for enhancing the performance of complex models like BiLSTM-SA, ultimately improving the classification accuracy and efficiency of the cybersecurity system. Figure 5 specifies the steps involved in the HGS technique.

HGS is a recent population-based metaheuristic method that imitates the natural instinct of animals to search for food. Hunger is the primary inspiration for designing a competitive and computationally efficient model. The model mimics the selection, competition, and adaptation processes that occur in nature, along with fictional games (the Hunger Games). In such games, individuals (agents) compete for survival or resources in challenging surroundings. In the optimization context, these individuals are treated as candidate solutions to the problem, and the surroundings correspond to the search space explored for solutions. The model iterates through selection, adaptation, and competition, which help it arrive at the optimal solution(s) to the problem. This section describes the mathematical formulation of the HGS method. The model depends on two key elements, the hunger rule and the approach rule, to mimic adaptive decision-making strategies and natural hunger-driven behaviours.

Approaching food

In nature, animals often cooperate when searching for food, whereas in other cases they choose to hunt independently. These foraging patterns motivate the mathematical model in Eq. (18), which describes three different ways in which animals move as they approach a food source. These rules are central to the HGS approach, which imitates both individual foraging and cooperative behaviour between animals.

$$\overrightarrow{X}\left(t+1\right)=\left\{\begin{array}{ll}\overrightarrow{X}\left(t\right)\cdot \left(1+randn\left(1\right)\right),& if\ {r}_{1}<l\\ {\overrightarrow{W}}_{1}\cdot {\overrightarrow{X}}_{b}+\overrightarrow{R}\cdot {\overrightarrow{W}}_{2}\cdot \left|{\overrightarrow{X}}_{b}-\overrightarrow{X}\left(t\right)\right|,& if\ {r}_{1}>l,\ {r}_{2}>E\\ {\overrightarrow{W}}_{1}\cdot {\overrightarrow{X}}_{b}-\overrightarrow{R}\cdot {\overrightarrow{W}}_{2}\cdot \left|{\overrightarrow{X}}_{b}-\overrightarrow{X}\left(t\right)\right|,& if\ {r}_{1}>l,\ {r}_{2}\le E\end{array}\right.$$

(18)

where $\overrightarrow{R}$ varies within $[-a, a]$, and ${r}_{1}$ and ${r}_{2}$ are random values within the range $[0,1]$. The term $randn(1)$ represents a normally distributed random number. The variable $t$ denotes the current iteration, with $\overrightarrow{X}\left(t\right)$ and $\overrightarrow{X}\left(t+1\right)$ representing each individual’s current and next locations. ${\overrightarrow{W}}_{1}$ and ${\overrightarrow{W}}_{2}$ denote hunger-related weights, which are calculated from hunger-driven signals. ${\overrightarrow{X}}_{b}$ signifies the location of the best individual in the current iteration. Finally, $l$ is an important control parameter of HGS that may affect its overall performance.

In these update rules, $\overrightarrow{X}(t)\cdot (1+randn(1))$ describes how an agent searches for food, randomly and driven by hunger, around its current position. $\left|{\overrightarrow{X}}_{b}-\overrightarrow{X}\left(t\right)\right|$ represents the individual’s current range of activity, which ${\overrightarrow{W}}_{2}$ further adjusts to reflect the influence of hunger. Because hunting ends once the individual is no longer hungry, $\overrightarrow{R}$ serves as a controller that gradually shrinks the range of activity to 0. Adding or subtracting the range of activity to or from ${\overrightarrow{W}}_{1}\cdot {\overrightarrow{X}}_{b}$ imitates how an individual, guided by its peers towards a food source, resumes searching around that position once food is located. Here, ${\overrightarrow{W}}_{1}$ characterizes the error in precisely locating the actual position.

As given in Eq. (19), $E$ acts as a variation control for all positions.

$$E=\text{sech}(\left|F\left(i\right)-BF\right|)$$

(19)

For each individual $i \in \{1, \dots , n\}$, where $n$ is the population size (the number of search agents), $F(i)$ denotes the fitness value, and $BF$ indicates the best fitness attained so far in the current iteration. The hyperbolic secant function $sech$ is defined as follows:

$$sech\left(x\right)=\frac{2}{{e}^{x}+{e}^{-x}}$$

(20)

The expression for $\overrightarrow{R}$ is delineated as:

$$\overrightarrow{R}=2\times shrink \times rand-shrink$$

(21)

$$shrink=2\times \left(1-\frac{t}{T}\right)$$

(22)

Here, $rand$ is a random value within the range $[0,1]$, and $T$ signifies the total number of iterations. The shrinking parameter, computed from the current iteration $t$ relative to the total iterations $T$, ranges within $[0, 2]$: it decreases over time from its maximum of 2 at the beginning of the search to $0$ as $t$ approaches $T$. Consequently, $\overrightarrow{R}$, which is derived from $rand$ and $shrink$ and adjusts the search agents’ range of activity, varies within $[-shrink, shrink]$ and thus at most from $-2$ to $2$. This dynamic range permits a controlled exploration of the search region that narrows as the run progresses, so that the search concentrates on exploiting the best solutions found.
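The approach rule of Eqs. (18)–(22) can be sketched for a single agent as follows. The hunger weights `W1` and `W2` are passed in as given values, since they are defined by the hunger rule. The function names, the per-dimension sampling of $\overrightarrow{R}$, the value `l = 0.08`, and all numbers in the example are assumptions for illustration, not settings reported in the study.

```python
import numpy as np


def sech(x):
    """Hyperbolic secant, Eq. (20)."""
    return 2.0 / (np.exp(x) + np.exp(-x))


def approach_food(X, X_best, fit_i, best_fit, W1, W2, t, T, l, rng):
    """Position update of Eq. (18) for one agent with position vector X."""
    E = sech(abs(fit_i - best_fit))                     # Eq. (19)
    shrink = 2.0 * (1.0 - t / T)                        # Eq. (22)
    R = 2.0 * shrink * rng.random(X.shape) - shrink     # Eq. (21): in [-shrink, shrink]
    r1, r2 = rng.random(), rng.random()
    if r1 < l:
        # Case 1: random self-driven search around the current position.
        return X * (1.0 + rng.standard_normal())
    step = R * W2 * np.abs(X_best - X)                  # hunger-scaled range of activity
    if r2 > E:
        return W1 * X_best + step                       # Case 2
    return W1 * X_best - step                           # Case 3


rng = np.random.default_rng(0)
X = np.array([0.2, 0.8, 0.5])          # hypothetical agent position
X_best = np.array([0.4, 0.6, 0.5])     # hypothetical best position so far
X_new = approach_food(X, X_best, fit_i=0.12, best_fit=0.05,
                      W1=1.0, W2=0.7, t=3, T=50, l=0.08, rng=rng)
print(X_new)
```

At `t = T` the shrink factor is 0, so the step vanishes and, outside the random-search case, the agent lands on `W1 * X_best`; this is the late-stage exploitation behaviour described above.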

Hunger rule

This part presents the mathematical model that mimics the starvation characteristics of individuals, which forms the core concept of the HGS method.

$${\overrightarrow{W}}_{1}\left(i\right)=\left\{\begin{array}{ll}hungry\left(i\right)\cdot \frac{N}{{SH}_{hungry}}\cdot {r}_{4},& if {r}_{3}<l\\ 1,& if {r}_{3}>l\end{array}\right.$$

(23)

$${\overrightarrow{W}}_{2}\left(i\right)=\left(1-\text{exp}\left(-\left|hungry\left(i\right)-{SH}_{hungry}\right|\right)\right)\cdot {r}_{5}\cdot 2$$

(24)

Here, $hungry(i)$ denotes the hunger level of individual $i$, $N$ represents the total number of individuals, and ${SH}_{hungry}$ is the aggregated hunger of all individuals, i.e., $\sum (hungry)$. ${r}_{3}$, ${r}_{4}$, and ${r}_{5}$ are random numbers within the range $[0,1]$. The hunger level of an individual, $hungry(i)$, is computed as in Eq. (25).
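A vectorized sketch of the hunger weights in Eqs. (23) and (24) is given below. The function name, the guard for a zero total hunger, the value `l = 0.08`, and the hunger levels in the example are assumptions made for illustration.

```python
import numpy as np


def hunger_weights(hungry, l, rng):
    """Return (W1, W2) for all agents from their hunger levels."""
    hungry = np.asarray(hungry, dtype=float)
    n = hungry.size                      # N: number of individuals
    sh = hungry.sum()                    # SH_hungry: total hunger of the population
    r3, r4, r5 = rng.random(n), rng.random(n), rng.random(n)
    if sh > 0:
        scaled = hungry * n / sh * r4
    else:
        # Assumption: if nobody is hungry, fall back to the neutral weight 1.
        scaled = np.ones(n)
    w1 = np.where(r3 < l, scaled, 1.0)                          # Eq. (23)
    w2 = (1.0 - np.exp(-np.abs(hungry - sh))) * r5 * 2.0        # Eq. (24)
    return w1, w2


rng = np.random.default_rng(0)
W1, W2 = hunger_weights([0.0, 120.0, 150.0, 310.0], l=0.08, rng=rng)
print(W1, W2)
```

By construction `W2` always falls in $[0, 2)$, while `W1` equals 1 for most agents when $l$ is small and is replaced by a hunger-proportional value only occasionally.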

$$hungry\left(i\right)=\left\{\begin{array}{c}0, if AllFitness\left(i\right)=BF\\ hungry\left(i\right)+H, if AllFitness\left(i\right)\ne BF\end{array}\right.$$

(25)

Here, $AllFitness(i)$ stores the fitness value of each individual in the current iteration. At every iteration, the hunger level of the best-performing individual is reset to $0$, while the hunger of every other individual is increased by an additional amount $H$. The value of $H$ is computed using Eq. (26), with $TH$ given by Eq. (27).

$$H=\left\{\begin{array}{c}LH\cdot \left(1+r\right), if TH<LH\\ TH, if TH\ge LH\end{array}\right.$$

(26)

$$TH=\frac{F(i)-BF}{WF-BF}\cdot {r}_{6}\cdot 2\cdot \left(UB-LB\right)$$

(27)

where ${r}_{6}$ represents a random number within the interval [0,1]; $F(i)$ signifies the fitness value of individual $i$; $BF$ is the best fitness attained in the current iteration; $WF$ is the worst fitness obtained in the current iteration; and $LB$ and $UB$ stand for the lower and upper limits of the feature space, respectively. As defined, the hunger sensation $H$ is bounded below by $LH$. To improve the model’s efficacy, hunger is bounded from below and above, with the value of $LH$ explored during parameter tuning. Hunger can affect the range of activity either negatively or positively, and ${W}_{1}$ and ${W}_{2}$ are designed to reflect this. In Eq. (27), the difference between $UB$ and $LB$ represents the maximal hunger level across changing environments; $F(i)-BF$ is the amount of food the individual still needs in order to satisfy its hunger, so the hunger level of an individual changes at every iteration; $WF-BF$ gives the individual’s total foraging capacity in the current situation; $\frac{F\left(i\right)-BF}{WF-BF}$ is the hunger ratio; and ${r}_{6}\times 2$ models the environmental influence, both positive and negative, on hunger.

The HGS approach derives an FF to achieve improved classification performance. The FF assigns a positive value to each candidate solution; because the minimization of the classification error rate is taken as the FF, lower values indicate better candidates. Its formulation is expressed below:

$$fitness\left({x}_{i}\right)=classifierErrorRate\left({x}_{i}\right)=\frac{\text{no. of misclassified samples}}{\text{total no. of samples}}\times 100$$
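The hunger update of Eqs. (25)–(27) can be sketched as below. Minimization is assumed, so the best fitness $BF$ is the smallest value and the worst fitness $WF$ the largest; the undefined $r$ in Eq. (26) is assumed to be a random number in [0, 1]; the function name, the bound `lh = 100`, and the example values are likewise illustrative assumptions.

```python
import random


def update_hunger(hungry, fitness, lb, ub, lh, rng=random):
    """Return the new hunger level of every agent after one iteration."""
    bf = min(fitness)    # BF: best fitness (minimization assumed)
    wf = max(fitness)    # WF: worst fitness
    updated = []
    for h, f in zip(hungry, fitness):
        if f == bf:
            updated.append(0.0)          # Eq. (25): the best agent is no longer hungry
            continue
        # Eq. (27): hunger ratio scaled by a random factor and the search range.
        th = (f - bf) / (wf - bf) * rng.random() * 2.0 * (ub - lb)
        # Eq. (26): the hunger increment H is bounded below by LH.
        H = lh * (1.0 + rng.random()) if th < lh else th
        updated.append(h + H)            # Eq. (25): accumulate hunger
    return updated


rng = random.Random(0)
new_hunger = update_hunger(hungry=[5.0, 5.0, 5.0],
                           fitness=[0.10, 0.05, 0.20],
                           lb=0.0, ub=1.0, lh=100.0, rng=rng)
print(new_hunger)
```

Note that the division by `wf - bf` is only reached by agents whose fitness differs from `bf`, so it cannot be zero.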

(28)
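Eq. (28) amounts to the percentage of misclassified samples, as in the short sketch below. The function name and the toy labels are hypothetical; in the actual pipeline the predictions would come from a BiLSTM-SA model trained with the hyperparameters encoded by candidate $x_i$.

```python
def classifier_error_rate(y_true, y_pred):
    """Eq. (28): misclassified samples / total samples * 100."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    misclassified = sum(t != p for t, p in zip(y_true, y_pred))
    return misclassified / len(y_true) * 100.0


# Toy example with hypothetical class labels: 2 of 8 predictions are wrong.
y_true = ["normal", "ddos", "ddos", "mitm", "normal", "scan", "scan", "normal"]
y_pred = ["normal", "ddos", "scan", "normal", "normal", "scan", "scan", "normal"]
error = classifier_error_rate(y_true, y_pred)
print(error)  # 25.0
```

HGS would then keep the hyperparameter candidates with the lowest returned value.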

Experimental validation and discussion

The simulation analysis of the ICSSADL-MHOA technique is carried out on two datasets, ToN-IoT⁴³ and Edge-IIoT⁴⁴. The ToN-IoT dataset contains 119,957 samples across nine class labels. It has 42 features in total, of which 27 are selected. The complete details of this dataset are given in Table 2. The suggested technique is simulated using Python 3.6.5 on a PC with an i5-8600k processor, a 250 GB SSD, a GeForce 1050 Ti 4 GB GPU, 16 GB RAM, and a 1 TB HDD. The parameter settings are as follows: learning rate: 0.01, activation: ReLU, epoch count: 50, dropout: 0.5, and batch size: 5.

Table 2 Details of the dataset.


Figure 6 displays the classifier results of the ICSSADL-MHOA model on the ToN-IoT dataset. Figures 6a and 6b display the confusion matrices, which show accurate identification and classification of all classes under 70%TRPH and 30%TSPH. Figure 6c presents the PR analysis, which indicates high performance across all classes. Finally, Fig. 6d presents the ROC analysis, which shows strong results with high ROC values for the different class labels.

Table 3 and Fig. 7 show the performance of the ICSSADL-MHOA approach in detecting cyberattacks on the ToN-IoT dataset under two data partitions: 70% TRPH and 30% TSPH. For 70% TRPH, the method attains an average $acc{u}_{y}$ of 99.42%, indicating strong performance in identifying normal and attack class labels. The $pre{c}_{n}$ of 92.51% suggests an effective reduction of false positives, while the $sen{s}_{y}$ of 87.79% shows that the model detects a high percentage of true attack instances. However, this sensitivity also indicates that some minority-class attacks, such as MiTM, may go undetected, leaving room for improvement. The $spe{c}_{y}$ of 99.53% demonstrates that the technique recognizes negative instances with few false positives, and the ${F1}_{score}$ of 89.13% reflects the balance between $pre{c}_{n}$ and recall. For 30% TSPH, the results are nearly identical, with an $acc{u}_{y}$ of 99.44%, $pre{c}_{n}$ of 92.17%, $sen{s}_{y}$ of 87.76%, $spe{c}_{y}$ of 99.56%, and an ${F1}_{score}$ of 88.99%. The lower $sen{s}_{y}$ relative to the other metrics in both configurations points to challenges in detecting rare attacks, specifically under imbalanced data conditions. This gap could be addressed by techniques such as resampling or weighted loss functions to enhance the detection of low-frequency attacks.
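The gap between a very high average accuracy and a lower average sensitivity is typical of macro-averaged one-vs-rest scores on imbalanced data. The sketch below assumes that the averages are computed this way, which this section does not state explicitly; the function names and the 3-class confusion matrix are hypothetical and are not taken from Table 3.

```python
import numpy as np


def _safe_div(num, den):
    """Element-wise division that returns 0 where the denominator is 0."""
    return np.divide(num, den, out=np.zeros_like(num), where=den != 0)


def macro_metrics(cm):
    """Macro-averaged one-vs-rest metrics; rows of cm are true classes, columns predictions."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm).copy()
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    tn = total - tp - fp - fn
    prec = _safe_div(tp, tp + fp)
    sens = _safe_div(tp, tp + fn)
    per_class = {
        "accuracy": (tp + tn) / total,
        "precision": prec,
        "sensitivity": sens,
        "specificity": _safe_div(tn, tn + fp),
        "f1": _safe_div(2.0 * prec * sens, prec + sens),
    }
    return {name: float(values.mean()) for name, values in per_class.items()}


# Hypothetical imbalanced problem: the third class has only 20 samples.
cm = [[980, 15, 5],
      [10, 480, 10],
      [6, 4, 10]]
m = macro_metrics(cm)
print({name: round(value, 4) for name, value in m.items()})
```

In this toy matrix the rare class is detected only half of the time, which pulls the macro sensitivity down, whereas the one-vs-rest accuracy of every class stays high because true negatives dominate.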

Table 3 Cyberattack detection of ICSSADL-MHOA method on ToN-IoT dataset.


Figure 8 illustrates the training (TRA) and validation (VAL) $acc{u}_{y}$ of the ICSSADL-MHOA technique on the ToN-IoT dataset. The $acc{u}_{y}$ values are computed over 0–25 epochs. The figure shows that the TRA and VAL $acc{u}_{y}$ values follow an increasing trend, indicating that the ICSSADL-MHOA method improves over successive iterations. In addition, the TRA and VAL $acc{u}_{y}$ values remain close throughout the epochs, indicating little overfitting and suggesting consistent prediction on unseen samples.

Figure 9 shows the TRA loss (TRALOS) and VAL loss (VALLOS) of the ICSSADL-MHOA approach on the ToN-IoT dataset. The loss values are computed over 0–25 epochs. The TRALOS and VALLOS values follow a decreasing trend, which indicates the proficiency of the ICSSADL-MHOA approach in balancing the tradeoff between generalization and data fitting. The steady decrease in the loss values reflects the improving performance of the ICSSADL-MHOA technique and the gradual refinement of its predictions.

Table 4 and Figs. 10–11 show the comparative study of the ICSSADL-MHOA approach on the ToN-IoT dataset against existing methodologies under different metrics. The results indicate that the proposed ICSSADL-MHOA approach achieves the best outcome, with $acc{u}_{y}$ of 99.44%, $pre{c}_{n}$ of 92.17%, $sen{s}_{y}$ of 87.76%, and $spe{c}_{y}$ of 99.56%, whereas the existing DT, RF, KNN, SVM, XGBoost, MLP, and NB techniques attain lower performance.

Table 4 Comparative analysis of ICSSADL-MHOA method on the ToN-IoT dataset.


Also, the proposed ICSSADL-MHOA method is examined under the Edge-IIoT dataset. This dataset has 56,000 records under 12 classes, as represented in Table 5. The total number of features is 62, but only 45 have been selected.

Table 5 Details of Edge-IIoT dataset.


Figure 12 displays the classifier results of the ICSSADL-MHOA model on the Edge-IIoT dataset. Figures 12a and 12b depict the confusion matrices, which show accurate identification and classification of all classes under 70%TRPH and 30%TSPH. Figure 12c shows the PR analysis, which indicates high performance across all class labels. Finally, Fig. 12d presents the ROC analysis, which shows strong results with high ROC values for the different class labels.

Table 6 and Fig. 13 present the cyberattack detection results of the ICSSADL-MHOA approach on the Edge-IIoT dataset under 70%TRPH and 30%TSPH. The results show that the ICSSADL-MHOA model efficiently detects all class labels. With 70%TRPH, the ICSSADL-MHOA approach attains an average $acc{u}_{y}$ of 99.33%, $pre{c}_{n}$ of 95.99%, $sen{s}_{y}$ of 95.79%, $spe{c}_{y}$ of 99.63%, and ${F1}_{score}$ of 95.88%. With 30%TSPH, the ICSSADL-MHOA approach attains an average $acc{u}_{y}$ of 99.37%, $pre{c}_{n}$ of 96.12%, $sen{s}_{y}$ of 96.02%, $spe{c}_{y}$ of 99.65%, and ${F1}_{score}$ of 96.07%.

Table 6 Cyberattack detection of ICSSADL-MHOA method on Edge-IIoT dataset.


Figure 14 depicts the TRA and VAL $acc{u}_{y}$ of the ICSSADL-MHOA technique on the Edge-IIoT dataset. The $acc{u}_{y}$ values are computed over 0–25 epochs. The figure shows that the TRA and VAL $acc{u}_{y}$ values follow an increasing trend, indicating that the ICSSADL-MHOA approach improves over successive iterations. Moreover, the TRA and VAL $acc{u}_{y}$ values remain close across the epochs, indicating little overfitting and suggesting reliable prediction on unseen samples.

Figure 15 shows the TRALOS and VALLOS graph of the ICSSADL-MHOA methodology on the Edge-IIoT dataset. The loss values are computed over 0–25 epochs. The TRALOS and VALLOS values follow a declining trend, which indicates the proficiency of the ICSSADL-MHOA approach in balancing the tradeoff between data fitting and generalization. The steady decrease in the loss values reflects the improving performance of the ICSSADL-MHOA approach and the gradual refinement of its predictions.

Table 7 and Figs. 16–17 show the comparative study of the ICSSADL-MHOA approach on the Edge-IIoT dataset against existing methodologies under different metrics⁴⁵,⁴⁶,⁴⁷,⁴⁸. The results indicate that the proposed ICSSADL-MHOA technique achieves the best outcome, with $acc{u}_{y}$ of 99.37%, $pre{c}_{n}$ of 96.12%, $sen{s}_{y}$ of 96.02%, and $spe{c}_{y}$ of 99.65%, whereas the existing Shallow ANN, Isolated LSTM, CNN, RF, SVM, DNN, and Inception Time techniques attain lower performance.

Table 7 Comparative analysis of ICSSADL-MHOA method on Edge-IIoT dataset⁴⁵,⁴⁶,⁴⁷,⁴⁸.


Conclusion

In this article, a new ICSSADL-MHOA technique is presented. The main aim of the ICSSADL-MHOA technique is to provide a robust cybersecurity system for IoT networks. First, the data normalization stage employs min–max normalization to ensure consistency, accuracy, and efficiency by organizing data into a standardized format. Next, the ITSO model is implemented for the feature selection (FS) process to identify the most relevant features in the data. The proposed ICSSADL-MHOA model then employs the BiLSTM-SA technique for the detection and classification of cyberattacks. Finally, the parameter selection of the BiLSTM-SA is performed using the HGS method. Comprehensive studies on the ToN-IoT and Edge-IIoT datasets validate the efficiency of the ICSSADL-MHOA method, which attains accuracy values of 99.44% and 99.37%, respectively, exceeding those of the existing techniques compared. The ICSSADL-MHOA method’s limitations include reliance on a limited set of data sources, which may not fully represent the diverse behaviour of IoT environments and threats. Additionally, the computational complexity of specific approaches could limit their applicability on resource-constrained devices, affecting scalability and real-time performance. The proposed model may also struggle to handle data imbalances and noisy environments, which affects the detection accuracy for minority-class threats. Furthermore, many existing solutions fail to address cross-layer security challenges comprehensively, which is important for robust IoT defence. Future work should improve the model’s adaptability across diverse IoT systems, optimize computational efficiency, and develop hybrid techniques incorporating diverse security layers. Enhancing real-time detection capabilities and exploring lightweight solutions for edge devices would also be vital for the practical deployment of these methods. Additionally, further research could focus on integrating adversarial ML techniques to improve the system’s robustness against sophisticated cyberattacks.

Data availability

The data supporting this study’s findings are openly available in the Kaggle repository at https://www.kaggle.com/datasets/dhoogla/cictoniot and https://www.kaggle.com/datasets/mohamedamineferrag/edgeiiotset-cyber-security-dataset-of-iot-iiot (refs. ⁴³,⁴⁴).

References

Al-Haija, Q. A. & Zein-Sabatto, S. An efficient deep-learning-based detection and classification system for cyber-attacks in iot communication networks. Electronics 9, 2152 (2020).
Article Google Scholar
Sarı, T. & GülesYigitol, H. K. B. Awareness and readiness of industry 4.0: The case of turkish manufacturing industry. Adv. Prod. Eng. Manag. 15, 57–68 (2020).
Google Scholar
Gupta, S., Sabitha, A. S. & Punhani, R. Cyber security threat intelligence using data mining techniques and artificial intelligence. Int. J. Recent Technol. Eng. 8, 6133–6140 (2019).
Google Scholar
Laskurain-Iturbe, I., Arana-Landín, G., Landeta-Manzano, B. & Uriarte-Gallastegi, N. Exploring the influence of industry 4.0 technologies on the circular economy. J. Clean. Prod. 321, 128944 (2021).
Article Google Scholar
Ameen, A. H., Mohammed, M. A. & Rashid, A. N. Enhancing security in IoMT: A blockchain-based cybersecurity framework for machine learning-driven ECG signal classification. Fusion Pract. Appl. https://doi.org/10.54216/FPA.140117 (2024).
Article Google Scholar
Smith, K. J., Dhillon, G. & Carter, L. User values and the development of a cybersecurity public policy for the IoT. Int. J. Inf. Manag. 56, 102123 (2021).
Article Google Scholar
Zhang, T., Zhao, Y., Jia, W. & Chen, M. Y. Collaborative algorithms that combine AI with IoT towards monitoring and control system. Futur. Gener. Comput. Syst. 125, 677–686 (2021).
Article Google Scholar
Al-Omari, M., Rawashdeh, M., Qutaishat, F. & AlshiraAbabneh, H. M. N. An intelligent tree-based intrusion detection model for cyber security. J. Netw. Syst. Manag. 29, 20 (2021).
Article Google Scholar
Islam, U. et al. Detection of distributed denial of service (DDoS) attacks in IOT based monitoring system of banking sector using machine learning models. Sustainability 14, 8374 (2022).
Article Google Scholar
HanEL-HasnonyCai, Y. I. M. W. Dragonfly algorithm with gated recurrent unit for cybersecurity in social networking. J. Cybersec. Inform. Manag. 2, 75–85 (2019).
Google Scholar
Imtiaz, N. et al. A deep learning-based approach for the detection of various internet of things intrusion attacks through optical networks. Photonics. 12(1), 35 (2025).
Article Google Scholar
Deshmukh, A. & Ravulakollu, K. An efficient CNN-based intrusion detection system for IoT: Use case towards cybersecurity. Technologies 12(10), 203 (2024).
Article Google Scholar
Sattarpour, S., Barati, A. & Barati, H. EBIDS: Efficient BERT-based intrusion detection system in the network and application layers of IoT. Clust. Comput. 28(2), 1–21 (2025).
Article Google Scholar
Morshedi, R., Matinkhah, S. M. & Sadeghi, M. T. Intrusion detection for IoT network security with deep learning. J. AI Data Mining 12(1), 37–55 (2024).
Google Scholar
Ragab, M. et al. Artificial intelligence driven cyberattack detection system using integration of deep belief network with convolution neural network on industrial IoT. Alex. Eng. J. 110, 438–450 (2025).
Article Google Scholar
Alsoufi, M. A. et al. Anomaly-based intrusion detection model using deep learning for IoT networks. Comput. Model. Eng. Sci. 141(1), 823–845 (2024).
Google Scholar
Al-Neami, I. A., Hameed, Z. S. & Al-zubaydi, Z. A. Adaptive FPGA-based intrusion detection system for real-time internet of things security. J. Intel. Syst. Net. Things https://doi.org/10.54216/JISIoT.140122 (2025).
Article Google Scholar
Wang, X., Dai, L. & Yang, G. A network intrusion detection system based on deep learning in the IoT. J. Supercomput. 80(16), 24520–24558 (2024).
Article Google Scholar
Saravana Ram, R. & Gopi Saminathan, A. An intrusion detection system in wsn using an optimized self-attention-based progressive generative adversarial network. IETE J. Res. https://doi.org/10.1080/03772063.2025.2452339 (2025).
Article Google Scholar
Tewari, A. & Gupta, B. B. Security, privacy and trust of different layers in Internet-of-Things (IoTs) framework. Futur. Gener. Comput. Syst. 108, 909–920 (2020).
Article Google Scholar
Aboalela, R. et al. Harnessing feature pruning with optimal deep learning-based distributed denial of service cyberattack detection on IoT environment. Alex. Eng. J. 120, 584–597 (2025).
Article Google Scholar
Adat, V. & Gupta, B. B. Security in internet of things: Issues, challenges, taxonomy, and architecture. Telecommun. Syst. 67, 423–441 (2018).
Article Google Scholar
Santhanamari, P., Kathirgamam, V., Subramanian, L., Panneerselvam, T. & Chirakkal Radhakrishnan, R. Security enhancement in 5G networks by identifying attacks using optimized cosine convolutional neural network. Net. Technol. Lett. 8(2), e70003 (2025).
Google Scholar
Zhao, G., Li, X. & Li, H. A trusted authentication scheme using semantic LSTM and blockchain in IoT access control system. Int. J. Semant. Web Inform. Syst. (IJSWIS) 20(1), 1–27 (2024).
MathSciNet Google Scholar
Mohamed, A. A., Al-Saleh, A., Sharma, S. K. & Tejani, G. G. Zero-day exploits detection with adaptive WavePCA-Autoencoder (AWPA) adaptive hybrid exploit detection network (AHEDNet). Sci. Rep. 15(1), 4036 (2025).
Article PubMed PubMed Central CAS Google Scholar
Reka, R., Karthick, R., Ram, R. S. & Singh, G. Multi head self-attention gated graph convolutional network based multi-attack intrusion detection in MANET. Comput. Secur. 136, 103526 (2024).
Article Google Scholar
Wang, Z. et al. A deep residual SConv1D-attention intrusion detection model for industrial internet of things. Clust. Comput. 28(2), 116 (2025).
Article Google Scholar
Ashwini, K. & Nagasundara, K. B. An intelligent ransomware attack detection and classification using dual vision transformer with mantis search split attention network. Comput. Electr. Eng. 119, 109509 (2024).
Article Google Scholar
Zareh Farkhady, R., Majidzadeh, K., Masdari, M. & Ghaffari, A. 3DLBS-BCHO: A three-dimensional deep learning approach based on branch splitter and binary chimp optimization for intrusion detection in IoT. Clust. Comput. 28(2), 83 (2025).
Article Google Scholar
Perumal, E., Arulanthu, P., Ramachandran, R. and Singh, R. February. Enhanced Metaheuristics with Deep Learning Model for Blockchain Assisted Cyber Security Solution in Internet of Things Environment. In 2024 Second International Conference on Emerging Trends in Information Technology and Engineering (ICETITE) 1–7 IEEE (2024).
Orman, A. Cyberattack detection systems in industrial internet of things (IIoT) networks in big data environments. Appl. Sci. 15(6), 3121 (2025).
Article CAS Google Scholar
Kocherla, R., Molugu, S., Nandalal, V., Dhal, P.K., Saranya, N., Basu Mallik, B. and Girimurugan, R. February. Deep Learning Based Stacked Recurrent Neural Networks for Intrusion Detection in Industrial Control Systems Using Bio Inspired Meta Heuristics. In International Conference on Emerging Trends in Mathematical Sciences & Computing 207–221 Cham: Springer Nature Switzerland (2024).
Alqahtany, S. S., Shaikh, A. & Alqazzaz, A. Enhanced grey wolf optimization (EGWO) and random forest based mechanism for intrusion detection in IoT networks. Sci. Rep. 15(1), 1916 (2025).
Article PubMed PubMed Central CAS Google Scholar
Babitha, S. Efficient quantum inspired blockchain-based cyber security framework in IoT using deep learning and huristic algorithms. Intell. Decis. Technol. 18(2), 1203–1232 (2024).
Google Scholar
Anu Velavan, S. & Sureshkumar, C. Double fuzzy clustering-driven context neural network for intrusion detection in cloud computing. Wireless Netw. https://doi.org/10.1007/s11276-024-03890-3 (2025).
Article Google Scholar
Lakicevic, B., Spalevic, Z., Volas, I., Jovanovic, L., Zivkovic, M., Zivkovic, T. and Bacanin, N. 2024 December. Artificial Neural Networks with Soft Attention: Natural Language Processing for Phishing Email Detection Optimized with Modified Metaheuristics. In International Conference on Advanced Network Technologies and Intelligent Computing 421–438 Cham: Springer Nature Switzerland (2024).
Sayeed, F., Ahmed, K. R. & Swamy, S. M. Development of a multimodal biometric recognition system with feature optimization and deep learning. Multimed. Tools Appl. https://doi.org/10.1007/s11042-025-20709-1 (2025).
Article Google Scholar
Althobaiti, M. M. & Escorcia-Gutierrez, J. Weighted salp swarm algorithm with deep learning-powered cyber-threat detection for robust network security. AIMS Math. 9(7), 17676–17695 (2024).
Article Google Scholar
Ullah, A., Khan, I. U., Younas, M. Z., Ahmad, M. & Kryvinska, N. Robust resampling and stacked learning models for electricity theft detection in smart grid. Energy Rep. 13, 770–779 (2025).
Article Google Scholar
Yussif, A. F. S., Seini, T. & Adu, C. Improved tuna swarm optimization (ITSO) algorithm based on adaptive fitness-weight for global optimization. Int. J. Electr. Eng. Appl. Sci. (IJEEAS) https://doi.org/10.54554/ijeeas.2024.7.02.011 (2024).
Wu, L., Chen, C., Li, Z., Chen, Z. & Li, H. The joint estimation of SOC-SOH for Lithium-Ion batteries based on BiLSTM-SA. Electronics 14(1), 97 (2024).
Hussain, M., Thaher, T., Almourad, M. B. & Mafarja, M. Optimizing VGG16 deep learning model with enhanced hunger games search for logo classification. Sci. Rep. 14(1), 31759 (2024).
https://www.kaggle.com/datasets/dhoogla/cictoniot
https://www.kaggle.com/datasets/mohamedamineferrag/edgeiiotset-cyber-security-dataset-of-iot-iiot
Gad, A. R., Nashat, A. A. & Barkat, T. M. Intrusion detection system using machine learning for vehicular ad hoc networks based on ToN-IoT dataset. IEEE Access 9, 142206–142217 (2021).
Li, J., Chen, H., Shahizan, M. O. & Yusuf, L. M. Enhancing IoT Security: A comparative study of feature reduction techniques for intrusion detection system. Intell. Syst. Appls. https://doi.org/10.1016/j.iswa.2024.200407 (2024).
Bukhari, S. M. S. et al. Enhancing cybersecurity in Edge IIoT networks: An asynchronous federated learning approach with a deep hybrid detection model. Internet Things https://doi.org/10.1016/j.iot.2024.101252 (2024).
Tareq, I., Elbagoury, B. M., El-Regaily, S. & El-Horbaty, E. S. M. Analysis of ton-iot, unw-nb15, and edge-iiot datasets using dl in cybersecurity for iot. Appl. Sci. 12(19), 9572 (2022).

Acknowledgements

This work was supported by the Researchers Supporting Project number (RSPD2025R564), King Saud University, Riyadh, Saudi Arabia.

Funding

King Saud University, Riyadh, Saudi Arabia, RSPD2025R564.

Author information

Authors and Affiliations

Computer Science Department, Community College, King Saud University, 11437, Riyadh, Saudi Arabia
Fahad Alblehai

Authors

Fahad Alblehai

Contributions

Conceptualization, Data curation and Formal analysis, Investigation and Methodology, Funding Support, Project administration, Resources, Supervision, Validation, Visualization and Writing—original draft: Fahad Alblehai.

Corresponding author

Correspondence to Fahad Alblehai.

Ethics declarations

Competing interests

The author declares no competing interests.

Ethics approval

This article contains no studies with human participants performed by the author.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Alblehai, F. Artificial intelligence-driven cybersecurity system for internet of things using self-attention deep learning and metaheuristic algorithms. Sci Rep 15, 13215 (2025). https://doi.org/10.1038/s41598-025-98056-2

Received: 31 January 2025
Accepted: 09 April 2025
Published: 16 April 2025
DOI: https://doi.org/10.1038/s41598-025-98056-2
