A recurrent sigma pi sigma neural network

Deng, Fei; Liang, Shibin; Qian, Kaiguo; Yu, Jing; Li, Xuanxuan

doi:10.1038/s41598-024-84299-y

Download PDF

Article
Open access
Published: 02 January 2025

A recurrent sigma pi sigma neural network

Fei Deng^1,2,
Shibin Liang³,
Kaiguo Qian^1,2,
Jing Yu^1,2 &
…
Xuanxuan Li^1,2

Scientific Reports volume 15, Article number: 588 (2025) Cite this article

1366 Accesses
Metrics details

Subjects

Abstract

In this paper, a novel recurrent sigma‒sigma neural network (RSPSNN) that contains the same advantages as the higher-order and recurrent neural networks is proposed. The batch gradient algorithm is used to train the RSPSNN to search for the optimal weights based on the minimal mean squared error (MSE). To substantiate the unique equilibrium state of the RSPSNN, the characteristic of stability convergence is proven, which is one of the most significant indices for reflecting the effectiveness and overcoming the instability problem in the training of this network. Finally, to establish a more precise evaluation of its validity, five empirical experiments are used. The RSPSNN is successfully applied to the function approximation problem, prediction problem, parity problem, classification problem, and image simulation, which verifies its effectiveness and practicability.

Training all-mechanical neural networks for task learning through in situ backpropagation

Article Open access 09 December 2024

Universal structural patterns in sparse recurrent neural networks

Article Open access 08 September 2023

A solution to the learning dilemma for recurrent networks of spiking neurons

Article Open access 17 July 2020

Introduction

Neural networks have been widely used in many fields over the past few decades owing to their many benefits. Numerous applications and studies of neural networks have shown the numerous advantages of artificial intelligence, in contrast to the classical methods. Why are neural networks widely used and so attractive? The answer is that governments, companies, and people have found them extremely beneficial and convenient to use. However, once the number of elements in their input layers, and the number of training examples, are extremely large, the training speed decreases. To avoid this problem, higher-order neural networks (HONNs)¹, which have not only a traditional $\sum$ layer but also a novel $\prod$ layer, are used to reduce the network complexity. HONNs include the summing units and product units for multiplying the inputs. Compared with traditional neural networks with only summation units, the product units of HONNs have the capacity to process nonlinear information.

HONNs are widely used in this paper’s areas of interest^2,3,4,5. The sigma-pi neural network (SPNN)⁶, sigma-pi-sigma neural network (SPSNN)⁷, and pi-sigma neural network (PSNN)⁸ are HONNs. The SPNN was applied to research aircraft speed/altitude control by Sun Kim⁹. Fan Q.W. et al.¹⁰ proved the convergence of the SPSNN, and applied the SPSNN to function approximation and classification problems. Qian Kang et al.¹¹ analysed the convergence of the SPSNN with smoothing Lasso regularization and adaptive momentum, and applied it to the same field as that in¹⁰. Nayak S.C. et al.¹² used a PSNN to build a model of chaotic crude oil prices and predicted the time series. A new PSNN with sparse constraints was established by Zhang Y.Q. to overcome the phenomenon of oscillation in the learning process, and improve the learning efficiency of the network¹³. Pan Wei et al.¹⁴ used a PSNN in the field of magnetic shape memory alloy actuators. In a recent book¹⁵, the PSNN was combined with a swarm’s metaheuristic algorithm, and demonstrated the effectiveness of data classification. In the future, a new HONN network is expected to be built; it will likely be used in various fields to avoid the extensive memory requirement problem⁷.

For any neural network structure, the characteristics of stability and convergence are among the most significant indices of effectiveness. Therefore, many studies have focused on the stability and convergence of neural networks. Qinwei Fan et al.¹⁰ proved the convergence result of the sigma‒pi‒sigma neural network, and then concluded that the error function decreases monotonically and tends to zero during the network’s training process. In¹⁶, the authors studied the convergence of a gradient neural network and generalized the inverse of a gradient neural network-based dynamic system calculation. To avoid time-consuming numerical solutions, the trajectory of the state variables in the considered dynamic system can be generated, which reflects the value of the convergence of the neural network successfully. Xiao Lin et al.¹⁷ extended the zeroing neural network method to solve the inverse problem of a dynamic quaternion numerical matrix. They also accelerated the convergence speed of the network with a novel nonlinear activation function and achieved scheduled time convergence. Based on the coyote optimization algorithm, Liu Wei et al.¹⁸ proposed a new shallow neural network evolution method and demonstrated the effectiveness of optimizing and updating the weights and thresholds of the BP neural network, which theoretically proved that the network model could quickly converge to attain the optimal solution. In¹⁹, the authors designed an inversion method for rock mechanic parameters based on a BP neural network, established a nonlinear mapping relationship between measured stress values and rock mechanic parameters, and conducted network training, and the improved neural network algorithm easily converged. With the classic BP neural network, Chen Bin²⁰ theoretically analysed the convergence of the network model in 2004. Wang Jian²¹ analysed and studied the convergence of the BP neural network in 2017 under a sparse response regulation scenario. Based on different scenarios, the convergence of the networks (BP neural networks, as an example^18,20,21) can be continuously studied for almost 20 years.

Therefore, this paper proposes a novel neural network, named the RSPSNN, which can implement static mapping and a radial basis function network, and likely produce a multi-layer neural network. The RSPSNN includes several characteristics similar to those of the dynamic ridge polynomial neural network (DRPNN)²², which includes a recurrent unit of ridge polynomials. While DRPNN applies polynomial terms, the RSPSNN uses self-generated suitable function terms that can feed forwards the information between the previous time and the recent time. Owing to its flexibility, it is expected to have more powerful modelling capabilities. Therefore, in this work, the new network, the RSPSNN, is constructed. The main contributions of this paper can be summarized as follows:

(1)
A novel network model of the RSPSNN is devised, which contains recurrent and higher-order characteristics.
(2)
The stability convergence of the RSPSNN is shown, which is one of the most significant properties of recurrent networks.
(3)
The RSPSNN is used in various fields to examine the effectiveness and capacity of the new structure.

The remainder of this paper is organized as follows. In "New neural network structure methods", the novel structure of the RSPSNN is proposed. "Convergence of the stability of the new structure" presents the learning rules and procedure for training this network. "Evaluation of the new structure for different applications" presents the results of the stability convergence, in which detailed proofs are listed clearly. In "Function approximation problem", numerical experiments demonstrate the effectiveness of the RSPSNN. Finally, brief conclusions are drawn in "Prediction problem". The introduction does not include a heading, and expands on the background of the topic, usually including in-text citations¹.

New neural network structure methods

In this section, a novel recurrent and higher-order neural network is proposed. The network is named the recurrent sigma-pi-sigma neural network (RSPSNN), which is an extension of the feedforward function for the ordinary sigma-pi-sigma neural network (SPSNN)⁷, as shown Fig. 1. In Fig. 1, the black part is the SPSNN. With an extension of the red line, the entire structure of the RSPSNN can be obtained. The novel proposed RSPSNN contains recurrent and higher-order characteristics. For the recurrent characteristic, the output at the current time is used as a part of the input at the subsequent time. For the higher-order characteristic, the novel structure contains the neuron Σ and the neuron ∏, whereas the traditional neural networks include only the neuron Σ.

The whole network contains six parts: the first part is the input layer, the second part is the $\sum_{1}$ layer, the third part is the $\prod$ layer, the fourth part is the $\sum_{2}$ layer, the fifth part is the output layer, and the last part is the feedforward chain. The former five parts are the SPSNN, and, based on SPSNN, the RSPSNN is combined with the last part. The structure of the SPSNN⁷ can improve the extensive memory requirement and reduce the learning difficulty in the training progress of the generally available types of neural networks. The network is very attractive because it requires a small amount of memory. Therefore, as introduced in the former content, the RSPSNN also has small memory requirements, which not only inherits the existing advantages of the SPSNN but also introduces a new recurrent advantage. For complexity, under the condition of the same number of iterations, only one new element is added to the input layer, and the others are the same. Therefore, the effect on complexity can be negligible.

The RSPSNN is made of various high-order neurons. Figure 1 shows the topology of the RSPSNN, which is composed of an input layer, an $\sum_{1}$ layer, an $\prod$ layer, an $\sum_{2}$ layer, an output layer and a recurrent chain. We suppose that $t$ represents the time step; $P + 1$, $N$, and $Q$ are the numbers of input layers, $\sum_{1}$ layers, and $\prod$ layers, respectively; and $f_{qnp} ()$ is the activation function for training the network. $y(t - 1)$ represents the output value of this network at the previous time step. The detailed notations used for the variables are shown in Table 1.

Table 1 Notations.

Full size table

Therefore, the overall input $I(t)$ is referred to as $I(t) = [I_{1} (t),I_{2} (t), \cdots ,I_{p} (t), \cdots ,I_{P + 1} (t)]^{T}$, where

$$I_{p} (t) = \left\{ {\begin{array}{*{20}c} {x_{p} (t),1 \le p \le P} \\ {y(t - 1),p = P + 1} \\ \end{array} } \right.$$

(1)

Most notably, when $t = 1$, the initial value of $y(0) = 0$. That is, in the 1st step of the iteration, the input elements include only $x_{p} (t = 1)$, without a recurrent part $y(0)$. From the 2nd iteration, the recurrent element $y(t - 1)$ just joins the input $I_{p} (t)$.

We denote $W_{0} = [w_{01} ,w_{02} , \cdots ,w_{0Q} ]$ as the weight vector between the $\sum_{2}$ layer and the $\prod$ layer, where $Q$ is defined as the number of nodes of the $\prod$ layer, and where $W_{n} = [w_{n1} ,w_{n2} , \cdots ,w_{nP} ,w_{n(P + 1)} ]$ is the $n - th$ weight vector of the $\sum_{1}$ layer. The linking weights between the $\sum_{1}$ layer and the $\prod$ layer are fixed as 1.

For the $\sum_{1}$ layer, $\varepsilon (t) = (\varepsilon_{1} (t), \cdots ,\varepsilon_{N} (t))$ is the variable of the $\sum_{1}$ layer, and $N$ is the number of neurons in the $\sum_{1}$ layer. Therefore, the n-th output of the $\sum_{1}$ layer can be represented by

$$\varepsilon_{n} (t) = W_{n} I_{n} (t) = \sum\limits_{p = 1}^{P + 1} {w_{np} I_{p} (t)}$$

(2)

For the $\sum_{1} \sim \prod$ layer, the state of connectivity can dynamically change. The various neurons can be fully connected, whereas other neurons can be linked sparsely. It is clear that the number of product neurons with full connectivity is $2^{N}$, whereas the number with sparse connectivity is less than $2^{N}$.

Let $A_{q} (1 \le q \le Q)$ be the set of neurons about the $\sum_{1}$ layer linked with the $q - th$ neuron of the $\prod$ layer, and let $B_{n} (1 \le n \le N)$ be the set of neurons about the $\prod$ layer linked with the $n - th$ neuron of the $\sum_{1}$ layer. For an arbitrary value of $a$, we denote $\phi (a)$ as the number of elements in $a$ and obtain

$$\sum\limits_{q = 1}^{Q} {\phi (A_{q} )} = \sum\limits_{n = 1}^{N} {\phi (B_{n} )}$$

(3)

For the $\prod$ layer, $\delta (t) = [\delta_{1} (t), \cdots ,\delta_{Q} (t)]^{T}$ is defined as the output result, so

$$\delta_{q} (t) = \prod\limits_{{i \in A_{q} }} {\varepsilon_{i} } = \prod\limits_{{i \in A_{q} }} {W_{i} } I_{i} (t) = \prod\limits_{{i \in A_{q} }} {\sum\limits_{p = 1}^{P + 1} {w_{np} I_{p} (t)} }$$

(4)

For the output layer, the final and actual outputs of the RSPSNN are as follows:

$$y(t) = f(W_{0} \delta (t)) = f(\sum\limits_{q = 1}^{Q} {w_{0q} \delta_{q} (t)} )$$

(5)

where $f( \cdot )$ is the activation function.

Network learning.

We denote $\{ I^{j} (t),O^{j} (t)\}_{j = 1}^{J}$ as the training sample set, where $I^{j} (t)$ is the set of the corresponding input, and where $O^{j} (t)$ is the set of the corresponding output. The mean squared error (MSE) function can be obtained as follows:

$$E(w) = \frac{1}{2}\sum\limits_{j = 1}^{J} {(y^{j} (t) - O^{j} (t))^{2} } = \frac{1}{2}\sum\limits_{j = 1}^{J} {(f(W_{0} \delta^{j} (t)) - O^{j} (t))^{2} }$$

(6)

In addition, for ${\raise0.7ex\hbox{${\partial E(w)}$} \!\mathord{\left/ {\vphantom {{\partial E(w)} {\partial W_{0} }}}\right.\kern-0pt} \!\lower0.7ex\hbox{${\partial W_{0} }$}}$ and ${\raise0.7ex\hbox{${\partial E(w)}$} \!\mathord{\left/ {\vphantom {{\partial E(w)} {\partial W_{n} }}}\right.\kern-0pt} \!\lower0.7ex\hbox{${\partial W_{n} }$}}$, the two partial derivatives of $E(w)$ can be obtained as follows:

$$\frac{\partial E(w)}{{\partial W_{0} }} = \sum\limits_{j = 1}^{J} {(f(W_{0} \delta^{j} (t)) - O^{j} (t))} \cdot f_{j}{\prime} (W_{0} \delta^{j} (t)) \cdot \delta^{j} (t)$$

(7)

$$\frac{\partial E(w)}{{\partial W_{n} }} = \sum\limits_{j = 1}^{J} {(f(W_{0} \delta^{j} (t)) - O^{j} (t))} \cdot f_{j}{\prime} (W_{0} \delta^{j} (t)) \cdot \frac{{\partial \delta^{j} (t)}}{{\partial W_{n} }}$$

(8)

where

$$\frac{{\partial \delta^{j} (t)}}{{\partial W_{n} }}\left\{ {\begin{array}{*{20}c} {\prod\limits_{{i \in A_{q} \backslash n}} {\varepsilon_{i} \frac{{\partial \varepsilon_{i} }}{{\partial W_{n} }}} = \prod\limits_{{i \in A_{q} \backslash n}} {\varepsilon_{i} I_{i}^{j} (t)} ,q \ne 1\& n \in A_{q} } \\ {0,q \equiv 1orn \notin A_{q} } \\ \end{array} } \right.$$

(9)

For the sequence $\{ W^{k} \}$ of weights, the iteration rules of updating can be obtained as

$$W^{k + 1} = W^{k} + \Delta W^{k}$$

(10)

with

$$\Delta W_{0}^{k} = - \eta \frac{\partial E(w)}{{\partial W_{0} }}$$

(11)

$$\Delta W_{n}^{k} = - \eta \frac{\partial E(w)}{{\partial W_{n} }}$$

(12)

where $W^{0}$ is assigned an initial weight value, and where $\eta$ is the learning rate during the training process of the network. In this paper, $\eta$ is defined as a fixed value.

Convergence of the stability of the new structure

The recurrent neural network can model arbitrary dynamical systems^22,23, which is one of its most beneficial properties. Hence, the recurrent chain in the RSPSNN is expected to play an advantageous role. The properties of the RSPSNN, which includes recurrent connections, complexity, and difficulty in training the network, also exist. Compared with the ordinary SPSNN, training the RSPSNN is more difficult. The error between the output value and target value may not decrease monotonically, so the gradient algorithm and the state of stability convergence may be more complicated. Second, the two partial derivatives ${\raise0.7ex\hbox{${\partial E(w)}$} \!\mathord{\left/ {\vphantom {{\partial E(w)} {\partial W_{0} }}}\right.\kern-0pt} \!\lower0.7ex\hbox{${\partial W_{0} }$}}$ and ${\raise0.7ex\hbox{${\partial E(w)}$} \!\mathord{\left/ {\vphantom {{\partial E(w)} {\partial W_{n} }}}\right.\kern-0pt} \!\lower0.7ex\hbox{${\partial W_{n} }$}}$ of $E(w)$ can be related to the output and the gradient. Therefore, the computation of the gradient and weights is more difficult.

To overcome the convergence problems in the proposed new network, the stability of the RSPSNN is derived to illustrate that the network has stable convergence. The stability of the RSPSNN in terms of convergence is described in detail below.

$$\max (\sum\limits_{q = 1}^{Q} {\sum\limits_{{i \in A_{q} }} {\left| {w_{i(P + 1)} } \right|} } \cdot \prod\limits_{\begin{subarray}{l} s \in A_{q} \\ s \ne i \end{subarray} } {(\left| {\sum\limits_{p = 1}^{P} {w_{sp} } } \right| + \left| {w_{s(P + 1)} } \right|)) < \frac{1}{{\max \left| {f^{\prime}} \right|}}}$$

The detailed process of above proof is seen from Appendix in ‘Supplementary Material’ file. For the RSPSNN, the final purpose is to search the optimal weights via the iteration rules of weights. It is expected to reach the unique state of equilibrium for the network such that the actual output values are as close as possible to the ideal output values.

Evaluation of the new structure for different applications

To illustrate the validity of the RSPSNN, we use MATLAB 2018a software to execute the numerical experiments. First, we build a novel network structure for the RSPSNN, where the numbers of input neurons, $\sum_{1}$ layer neurons, $\prod$ layer neurons, and $\sum_{2}$ layer neurons are 50, 12, 3, and 1, respectively. For the $\sum_{1} \sim \prod$ layer, 12 nodes in the $\sum_{1}$ layer are divided into three groups (3,4,5), the three nodes in the first group are fully connected with the first node in the $\sum_{2}$ layer, the four nodes in the second group are fully connected with the second node in the $\sum_{2}$ layer, and the five nodes in the third group are fully connected with the third node in the $\sum_{2}$ layer. We choose the tanh function ($g(x) = \frac{{e^{x} - e^{ - x} }}{{e^{x} + e^{ - x} }}$) as the activation function, and the learning rate is $\eta = 0.01$. The initial weights are chosen from [-0.05,0.05] for $W_{0}$ and $W_{n}$, and 800 input points are randomly chosen from [-4.0,4.0]. The training termination is determined by whether the maximum number of iterations reaches 50,000, or if $MSE$ is less than 0.01.

Function approximation problem

To implement function approximation, we first choose the following sine function to verify the approximation ability of the proposed RSPSNN.

$$y = \sin (x)$$

Figure 2 shows the curve of the MSE in the process of function approximation. When the number of iterations reaches 33924, the MSE is equal to 0.0078, which meets the criteria for stopping the iteration.

Figure 3 shows that the approximation function of the RSPSNN is the line with ‘*’, and the actual function is the dotted line. For the approximation results, the two lines are very similar: the vast majority of lines overlap, and the accuracy rate is 86.7%. The RSPSNN demonstrates a good approximation.

Prediction problem

For the prediction problem, we choose the Mackey-Glass (MG) time series^7,24 to illustrate the effectiveness, and to guarantee the ability of the RSPSNN. Therefore, in this paper, we also use it to assess the validity of the RSPSNN. The MG equation is as follows.

$$y(k + 1) = (1 - b)y(k) + a\frac{y(k - \tau )}{{1 + y^{10} (k - \tau )}}$$

(13)

where $a = 0.2$, $b = 0.1$, and $\tau = 17$.

The MG equation is complicated because the time delay $\tau$ is a variable. For ease of presentation, the initial value of $y(0)$ is denoted as 1.8. Figure 4 shows the state space distribution of the MG time series. The objective is to model the time series and predict the value of a time series at some future time¹⁷.

In this experiment, one-step prediction is implemented first, in which the real value at time $k + 1$ can be predicted by the values before and at time $k$. We produce 500 training data points and 500 testing data points for the experiment. In Fig. 5, the target output (1000 data points) is shown as a red line, whereas the predicted output result from the RSPSNN is shown as a blue line. The prediction result is very accurate because the red line and the blue line are very close. The red line nearly overlaps. For the 500 test data points (from 501 to 1000), the prediction result is also very accurate. During training and testing, the error variation is shown in Fig. 6. When the number of iterations reaches 3033, the MSE is equal to 0.0096.

Another long-term prediction is also examined. For this prediction, the output value is fed back as the input value for computing the future values of the RSPSNN. Like the description in the previous content, 1000 points are divided into 800 training data points and 200 testing points (Fig. 7). Figure 7 shows the condition of error, in which when the number of iterations reaches 567, the MSE is equal to 0.0085. Figure 8 shows the validity of the long-term prediction. The red line represents the target output (points 1 to 800 represent the target output of the training data, points 801 to 1000 represent the target output of the testing data), whereas the blue line represents the predicted output (the former 800 points represent the training data, and the following 200 points represent the test data). For points 801 to 1000, when the prediction result is compared with the target result, there are 190 points for meeting the condition, and the error is less than or equal to 0.01. Therefore, the prediction accuracy rate reaches 95%. Notably, the RSPSNN has the ability to make good predictions.

Furthermore, compare RSPSNN against other cutting-edge architectures like Long Short-Term Memory (LSTM) networks for comprehensive evaluation. Like the description in the previous content, the whole data (1000 points) are also divided into 80 percentage and 20 percentage. Figure 9 shows the prediction results of the LSTM. For points 1 to 800, the red line represents the target output of the training data, while the blue line represent the predicted output of the training data. And points 801 to 1000 in red line represent the target output of the testing data, the corrending 200 points in blue line mean the predicted output of the testing data. And Figure 10 shows the error curve of MSE. It is illustrated that the MSE is less to 0.01 when the number of iterations reaches 115. Compared with the results of RSPSNN, the LSTM is provided with the better performance for the series problems. The same error precision, the less number of iterations.

Parity problem

For the parity problem, we use the classification benchmark as a sample, which is in n-dimensional space with $2^{n}$ elements. In this experiment, the 4-dimensional parity problem is used as an example for evaluating the model of the RSPSNN. The 4-dimensional parity function is composed of $2^{4} = 16$ disparate vectors, and Table 2 presents the sixteen sets of data for the inputs and target outputs.

Table 2 Example of a 4-bit parity problem.

Full size table

For this problem, the outputs are always 1 or 0, and the inputs are always 1 or -1. In the process of training the RSPSNN, we require more error precision small values to show the details of the MSE. Therefore, in this experiment, the error precision is increased to 0.0001 from 0.01.

The evolution properties of performance during the learning process are shown in Figure 11 and Figure 12. Figure 11 shows the curve of the MSE. When the number of iterations reaches 2062, the MSE is equal to 9.9998e-05. In Figure 12, the target outputs are the 16 red points, whereas the training outputs are the 16 blue points, which overlap closely. Notably, the RSPSNN is able to address the parity problem.

Classification problem

For the classification problem, the RSPSNN with multiple layers is suitable for solving this problem. In this experiment, we use four classifiers with two 5-variable functions. The two functions are

$$f_{1} (x_{1} ,x_{2} ,x_{3} ,x_{4} ,x_{5} ) = x_{2} x_{3} \exp [(x_{3} x_{4} - x_{5} )^{2} ] - 2(x_{1} x_{4} - 1)^{2} x_{5} + x_{3} - x_{4} - 0.4$$

(14)

$$f_{2} (x_{1} ,x_{2} ,x_{3} ,x_{4} ,x_{5} ) = x_{2} x_{3} (2x_{4} x_{5} - 1) - \sin (1.5\pi x_{1} ) - 1.0$$

(15)

where $0 \le x_{1} ,x_{2} ,x_{3} ,x_{4} ,x_{5} \le 2.0$.

With these two functions, we define the four classifiers, where $f_{1} (x_{1} ,x_{2} ,x_{3} ,x_{4} ,x_{5} )$ and $f_{2} (x_{1} ,x_{2} ,x_{3} ,x_{4} ,x_{5} )$ are used as the boundaries:

Classifier I: $f_{2} \ge 0$ and $f_{1} \ge 0$.

Classifier II: $f_{2} < 0$ and $f_{1} \ge 0$.

Classifier III: $f_{2} \ge 0$ and $f_{1} < 0$.

Classifier IV: $f_{2} < 0$ and $f_{1} < 0$.

For ease of display, we denote the three variables as 1.0. Figure 13 shows the distributions of the four classifiers on the $x_{1} \sim x_{2}$ variables when $x_{3}$, $x_{4}$, and $x_{5}$ are fixed at 1.0.

We randomly select 1000 values in [0,2.0] for $x_{1}$ and $x_{2}$, respectively, and compose them as 1000 points [$x_{1}$, $x_{2}$] in sequence. The RSPSNN is subsequently used for training the classification problem. Figure 14 shows the curve of the MSE, and Table 3 shows the classification accuracy rate.

Table 3 Classification accuracy rates.

Full size table

In Figure 14, when the number of iterations reaches 1964, the MSE is equal to 0.0100. In Table 2, the initial 1000 points are divided into four classifiers with Eq. (14) and Eq. (15), in which the data sizes of I, II, III, and IV are 78, 31, 111, and 780, respectively. Furthermore, those classification data are trained by the RSPSNN, and the corresponding training results are 21, 12, 109, and 770 correctly classified; the total number of correct classifications is 912. Therefore, the final accuracy rate is 91.2%, which demonstrates the classification capabilities of the RSPSNN.

Image simulation

For the image simulation process, we use the RSPSNN to simulate the following original picture in Figure 15. The effect of the RSPSNN on the simulation of the image is shown in Figure 16 below. Figure 16 shows the image simulation results after 50 iterations, for which the simulation accuracy rate is 62.5%. The cat is clearly shown under the condition of a few iterations. With more iterations, the effectiveness becomes clearer.

Finally, to illustrate the efficiency and accuracy ability of the proposed methodology, the comparative results between the RSPSNN and SPSNN are shown in Table 4.

Table 4 The accuracy rate results of the RSPSNN and SPSNN.

Full size table

The above table shows that the overall accuracy rates of the RSPSNN are superior to those of the SPSNN. For the parity problem and image simulation, the accuracy rates are similar. For the function approximation and classification problem, slight advantages can be seen between the RSPSNN and SPSNN. Notably, the result of the RSPSNN is 22.8% higher than the result of the SPSNN for the prediction problem. That is, the effectiveness and capacity of the new structure are demonstrated by its recursive nature. Its most beneficial property is the memory function, which excels in the prediction problem.

Conclusion

In this work, a novel neural network is proposed. First, we successfully constructed the RSPSNN using the SPSNN. We then proved the convergence stability for the new network structure, which is one of the most important profitable characteristics. In addition, the RSPSNN was applied to four successful experiments: the function approximation problem, prediction problem, parity problem, classification problem, and for image simulation. Finally, the comparison results between the RSPSNN and SPSNN are shown, which verifies the efficiency and accuracy ability of the proposed network. For the construction of RSPSNN, only one new element is added in input layer, and others are same, so the effect on complexity can be negligible But, because there is one more element in the input layer in RSPSNN than the SPSNN’s, the computation needs more scalability for the potential limitations. One iteration needs one more size of scalability, N inerations need more N*1 size of scalability.

It is anticipated that the RSPSNN will be more widely used in other fields for theoretical analysis and improvement. The superior application of the RSPSNN can be extended to many practical problems worldwide. In future work, we plan to research the novel characteristics of the monotonicity of the proposed network and perform a more comprehensive comparison with other existing neural network models to highlight the advantages of the RSPSNN more clearly.

Data availability

The data are available from the corresponding author on reasonable request.

References

Giles, C. L. & Maxwell, T. Learning, invariance, and generalization in a high-order neural network. Appl. Opt. 26(23), 4972–4978 (1987).
Article ADS CAS PubMed MATH Google Scholar
E. Artyomov and O. Y. Pecht, Modified high-order neural network for pattern recognition, Pattern Recognition Letters(2004).
L. Mirea and T. Marcu, System identification using Functional-Link Neural Networks with dynamic structure, In 15th triennial world congress, Barcelona, Spain(2002).
Z.Y. Dong, X. Yang and X..Zhang, New Criteria for Global Exponential Stability of Discrete-Time High-Order Neural Networks with Time-Varying Delays, The 40th China Control Conference(2021).
Patra, J. C. & Bos, A. V. D. Modeling of an intelligent pressure sensor using functional link artificial neural networks. ISA Trans. 39, 15–27 (2000).
Article CAS PubMed MATH Google Scholar
D.E. Rumelhart, J.L. McClell and the PDP Research Group, Parallel distributed processing-explorations in the microstructure of cognition, Cambridge: MIT Press(1986).
C.K. Li, A Sigma-Pi-Sigma neural network (SPSNN), Neural Processing Letters, 1–19(2003).
Shin, Y. & Ghosh, J. The Pi-Sigma network: an efficient higher-order neural network for pattern classification and function approximation. International Joint Conference on Neural Networks 2(1), 13–18 (1991).
MATH Google Scholar
Sun, K. & Horspool, K. R. Aircraft Speed/Altitude Control Using a Sigma-Pi Neural Network, AIAA Scitech 2020 Forum (2020).
Fan, Q. W. et al. Convergence analysis for sigma-pi-sigma neural network based on some relaxed conditions. Inf. Sci. 585, 70–88 (2022).
Article MATH Google Scholar
Kang, Q., Fan, Q. W. & Zurada, J. M. Deterministic convergence analysis via smoothing group Lasso regularization and adaptive momentum for Sigma-Pi-Sigma neural network. Inf. Sci. 553, 66–82 (2021).
Article MathSciNet MATH Google Scholar
S.K. Nayak, A fireworks algorithm based Pi-Sigma neural network (FWA-PSNN) for modelling and forecasting chaotic crude oil price time series, EAI Endorsed Transactions on Energy Web(2020).
Zhang, Y. Q., Fan, Q. W. & He, X. S. Research on Pi-Sigma Neural Network Gradient Learning Algorithm with Regular Terms. China Computer & Communication 32(1), 38–41 (2020).
ADS MATH Google Scholar
Pan, W. et al. Hysteresis Modeling for Magnetic Shape Memory Alloy Actuator via Pi-Sigma Neural Network with Backlash-Like Operator. Acta Physica Polonica Series a 137(5), 634–636 (2020).
Article ADS CAS MATH Google Scholar
N. Panda and S. Majhi, Effectiveness of Swarm-Based Metaheuristic Algorithm in Data Classification Using Pi-Sigma Higher Order Neural Network, Progress in Advanced Computing and Intelligent Engineering, 77–88(2021).
P.S. Stanimirović, M.D. Petković and D. Mosić, Exact solutions and convergence of gradient based dynamical systems for computing outer inverses, Applied Mathematics and Computation, 412(C)(2022).
Xiao, L. et al. Zeroing Neural Networks for Dynamic Quaternion-Valued Matrix Inversion. IEEE Trans. Industr. Inf. 18(3), 1562–1571 (2022).
Article MathSciNet MATH Google Scholar
Liu, W. et al. Research on Shallow Neural Network Evolution Method Based on Improved Coyote Optimization Algorithm. Chinese Journal of Computers 44(06), 1200–1213 (2021).
MATH Google Scholar
Shang, F. H., Wang, W. Q. & Cao, M. J. Shale in-situ stress prediction model based on improved BP neural network. Computer Technology and Development 31(07), 164–170 (2021).
MATH Google Scholar
B. Chen, The convergence of BP neural network and its application in physical property modeling, Computer Application and Software(2004).
Wang, J. et al. Convergence analysis of BP neural networks via sparse response regularization. Appl. Soft Comput. 61, 354–363 (2017).
Article MATH Google Scholar
Rozaida, G., Hussain, A. J. & Panos, L. Dynamic Ridge Polynomial Neural Network: Forecasting the univariate non-stationary and stationary trading signals. Expert Syst. Appl. 38, 3765–3776 (2011).
Article MATH Google Scholar
Hussain, A. J. & Liatsis, P. Recurrent pi-sigma networks for DPCM image coding. Neurocomputing 55, 363–382 (2002).
Article MATH Google Scholar
Mackey, M. & Glass, L. “Oscillation and chaos in physiological control systems. Science 197, 287 (1977).
Article ADS CAS PubMed MATH Google Scholar

Download references

Acknowledgements

This work was supported in part by the Special Basic Cooperative Research Programs of Yunnan Provincial Undergraduate Universities Association under Grant No. 202101BA070001-149/202101BA070001-150, in part by the Project Information: Specialized Project on Teacher Education of Yunnan Provincial Education Science Planning under Grant No. GJZ2412, and in part by the Yunnan Students’ Innovation and Entrepreneurship Training Program under Grant No. S202311393002. And we thank Yujiao Wang for improving language in the manuscript, who is the professor at the Kunming University.

Author information

Authors and Affiliations

College of Information Engineering, Kunming University, Kunming, 650214, China
Fei Deng, Kaiguo Qian, Jing Yu & Xuanxuan Li
Key Laboratory of Data Governance and Intelligent Decision in Universities of Yunnan, Kunming, 650214, China
Fei Deng, Kaiguo Qian, Jing Yu & Xuanxuan Li
Yunnan Electric Power Test and Research Institute Group Co., Ltd, Kunming, 650214, China
Shibin Liang

Authors

Fei Deng
View author publications
Search author on:PubMed Google Scholar
Shibin Liang
View author publications
Search author on:PubMed Google Scholar
Kaiguo Qian
View author publications
Search author on:PubMed Google Scholar
Jing Yu
View author publications
Search author on:PubMed Google Scholar
Xuanxuan Li
View author publications
Search author on:PubMed Google Scholar

Contributions

F.D. wrote the main manuscript text and conceived the experiment(s), S.L., K.Q., J.Y. and X.L. conducted the experiment(s), and F.D. performed statistical analysis and figure generation. All authors reviewed the manuscript.

Corresponding author

Correspondence to Fei Deng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethics declarations

The authors confirm that all experiments were performed in accordance with relevant named guidelines and regulations.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1.

Supplementary Information 2.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Deng, F., Liang, S., Qian, K. et al. A recurrent sigma pi sigma neural network. Sci Rep 15, 588 (2025). https://doi.org/10.1038/s41598-024-84299-y

Download citation

Received: 26 July 2024
Accepted: 23 December 2024
Published: 02 January 2025
DOI: https://doi.org/10.1038/s41598-024-84299-y