Introduction

Autoregressive exogenous (ARX) models are used to describe the dynamic relationship between inputs and outputs in discrete systems1,2,3. These models can effectively capture the characteristics of time-series data and predict their future values. Consequently, ARX models are widely used in various fields, such as precision machining4, heat conduction5, composite materials6, artificial intelligence7, the petrochemical industry8, and weather prediction9.

Parameter identification is a crucial technique for estimating the unknown parameters of a model from observed input and output data. Models constructed using identification technology achieve high accuracy, and the topic has thus garnered widespread attention10,11. Various researchers have explored parameter identification methods for ARX models. For example, Dong et al.12 developed a weighted hierarchical stochastic gradient (SG) descent algorithm to improve the parameter convergence accuracy. Li et al.13 studied a decoupled identification scheme based on a neural fuzzy network and an ARX model. Tu et al.14 proposed a conjugate gradient descent method that accelerated convergence. Jing et al.15 established an ARX model using a variable step-size SG descent algorithm. A parameter learning scheme using multiple signals was proposed16, which improves the model identification accuracy. Liang et al.17 extended the Nesterov accelerated gradient descent algorithm into a multi-innovation form and used the multi-innovation matrix to accurately identify ARX model parameters. Li et al.18 derived a multi-innovation extended SG descent method to improve the parameter estimation accuracy. Ding et al.19 applied the multi-innovation stochastic gradient (MISG) descent algorithm to the identification of nonlinear ARX systems. Chen et al.20 used Kalman filtering to estimate the ARX model output and combined it with the expectation maximisation algorithm to estimate parameters. Additionally, a Shannon principle-based forgetting factor gradient descent algorithm was proposed21, which improves the parameter convergence speed. Li et al.22 used a correlation analysis method and an adaptive Kalman filter to estimate the parameters of systems with measurement noise. Chen et al.23 developed an improved multi-step gradient iteration algorithm based on the Kalman filter to identify ARX models with missing output data, resulting in enhanced parameter identification accuracy. A separation identification algorithm combined with filtering technology was proposed24 to identify the parameters of systems with colored noise. Li et al.25 combined correlation analysis theory and data filtering technology to estimate the parameters of multi-input multi-output systems. Stojanovic et al.26 estimated model parameters using the Masreliez–Martin filter. Other researchers27 decomposed the ARX model into two subsystems and established a two-stage iteration algorithm. Wang et al.28 developed a three-stage subsystem generalised least squares identification algorithm. The SG algorithm was improved by incorporating a convergence index29, resulting in enhanced parameter estimation accuracy. Additionally, Chen et al.30 studied a particle-filter-based SG identification algorithm. Li et al.31 developed a long short-term memory network identification algorithm in which, by combining the advantages of the SG descent and root mean square propagation algorithms, an adaptive momentum estimation technique is created to optimize the network parameters.

Notably, among the abovementioned methods for ARX model parameter identification, the gradient descent algorithm has been widely used owing to its broad applicability and ease of implementation in engineering scenarios. However, traditional integer-order SG descent algorithms rely solely on the direction and magnitude of the integer-order gradient, resulting in low parameter identification speed and accuracy. Compared with the integer-order gradient, the order of a fractional-order gradient can be selected freely, which removes the restriction of integer-order differentiation. The fractional-order gradient therefore offers greater flexibility, which can improve the convergence speed and convergence accuracy of the algorithm32,33. In recent years, many scholars have focused on the fractional-order gradient. For example, Chen et al.34 ensured that the fractional gradient could converge to the minimum point by changing the initial integration point. Wei et al.35 derived three forms of fractional-order gradients and proved their convergence. Other researchers36 established a fractional-order stochastic gradient (FOSG) descent algorithm and an adaptive FOSG descent algorithm. Additionally, the hierarchical principle was used to design a fractional hierarchical gradient algorithm37.

The FOSG algorithm directly extends the integer-order gradient to a fractional order. However, a single fractional-order gradient depends strongly on the choice of fractional order, and an improper order selection might reduce the identification accuracy and speed of the algorithm. Moreover, the fractional gradient typically uses only the information at the current moment to identify the model parameters at the next moment, leaving the available identification information underutilised. To address these limitations, this paper proposes a multi-innovation additional fractional gradient descent identification algorithm. The main contributions of this study can be summarized as follows:

  • The proposed algorithm uses an additional fractional-order gradient and the integer-order gradient synchronously to identify model parameters, thereby accelerating the convergence speed of the algorithm.

  • The multi-innovation principle is introduced to expand the integer-order gradient and additional fractional-order gradient into multi-innovation matrices.

  • The proposed method is compared with SG, FOSG and MISG to verify its superiority in convergence speed and convergence accuracy.

  • The proposed algorithm can avoid the inaccurate parameters estimation result caused by improper fractional order selection.

The remainder of this paper is organised as follows: The mathematical model of an ARX system is presented in Section “Autoregressive exogenous models”. Section “Additional fractional gradient descent identification algorithm” outlines the additional fractional gradient descent identification algorithm. Section “Multi-innovation additional fractional gradient descent identification algorithm and convergence analysis” describes the multi-innovation additional fractional gradient identification algorithm and analysis of its convergence. Section “Simulation and experiment” presents details of the simulation and experiment for evaluating the algorithm performance. Section “Conclusion” presents the concluding remarks.

Autoregressive exogenous models

The following ARX model is considered

$$ A(z^{ - 1} )y(t) = B(z^{ - 1} )u(t) + v(t) $$
(1)

where \(u\left( t \right)\) and \(y\left( t \right)\) are the system input and output, respectively. \(v\left( t \right)\) denotes white noise with finite variance \(\sigma_{v}^{2}\). The polynomials \(A(z^{ - 1} )\) and \(B(z^{ - 1} )\) can be expanded as

$$ A(z^{ - 1} ) = 1 + a_{1} z^{ - 1} + a_{2} z^{ - 2} + \cdots + a_{n} z^{ - n} $$
(2)
$$ B(z^{ - 1} ) = b_{1} + b_{2} z^{ - 1} + \cdots + b_{n} z^{ - n + 1} $$
(3)

where n is a known model order, and \(z^{ - 1}\) denotes the backward shift (delay) operator.

The information vector \(\varphi (t)\) is defined using Eqs. (2) and (3). Combined with \(z^{ - n} y(t) = y(t - n)\) and \(z^{ - n} u(t) = u(t - n)\), \(\varphi (t)\) can be expressed as

$$ \varphi (t) = [ - y\left( {t - 1} \right), - y\left( {t - 2} \right), \cdots , - y\left( {t - n} \right),u\left( t \right),u\left( {t - 1} \right), \cdots ,u\left( {t - n + 1} \right)]^{{\text{T}}} $$
(4)

Next, the parameter vector \(\theta\) is defined, and parameters \(a_{1} ,a_{2} , \cdots ,a_{n} ,b_{1} ,b_{2} , \cdots ,b_{n}\) in Eqs. (2) and (3) are presented in the vector form

$$ \theta = \left[ {a_{1} ,a_{2} , \cdots ,a_{n} ,b_{1} ,b_{2} , \cdots ,b_{n} } \right]^{{\text{T}}} $$
(5)

Using Eqs. (2)–(5), Eq. (1) can be rewritten as

$$ y(t) = \varphi^{{\text{T}}} (t)\theta + v(t) $$
(6)
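For concreteness, the mapping from recorded input–output data to the information vector of Eq. (4) and the regression form of Eq. (6) can be sketched as follows (a minimal sketch assuming NumPy; the helper name `phi` is ours, not the paper's):

```python
import numpy as np

def phi(y, u, t, n):
    """Information vector of Eq. (4): n negated past outputs followed by
    the current and n-1 past inputs; y and u are 1-D arrays, t >= n."""
    past_y = [-y[t - i] for i in range(1, n + 1)]   # -y(t-1), ..., -y(t-n)
    past_u = [u[t - i] for i in range(0, n)]        # u(t), ..., u(t-n+1)
    return np.array(past_y + past_u)

# Regression form of Eq. (6): y[t] = phi(y, u, t, n) @ theta + v[t],
# with theta = [a_1, ..., a_n, b_1, ..., b_n].
```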

Additional fractional gradient descent identification algorithm

The additional fractional gradient descent identification algorithm adds a fractional-order gradient to the integer-order gradient and exploits the flexibility of the fractional order to improve the convergence speed of the algorithm. The gradient descent algorithm is an iterative method: the unknown model parameters are identified by minimising a criterion function step by step along the direction of gradient descent. The directions of the integer-order and fractional-order gradients correspond to the partial and fractional derivatives of the function, respectively.

The fractional derivative is an extension of the integer-order derivative, and its order can be any real or complex number. Unlike the integer-order derivative, the fractional derivative has three common definitions (Grünwald–Letnikov, Riemann–Liouville, and Caputo). In this study, the Caputo definition is used, expressed as38

$$ {}_{{t_{0} }}^{C} D_{{t_{f} }}^{\alpha } f(t) = \frac{1}{\Gamma (m - \alpha )}\int_{{t_{0} }}^{{t_{f} }} {\frac{{f^{(m)} (\tau )}}{{(t_{f} - \tau )^{\alpha - m + 1} }}} \,d\tau $$
(7)

where \(_{{t_{0} }}^{C} D_{{t_{f} }}^{\alpha }\) is the fractional calculus operator; \(\alpha\) is the order of the operator, with \(m - 1 < \alpha < m,m \in {\mathbb{Z}}_{ + }\); \(t_{0}\) and \(t_{f}\) are the lower and upper integration limits, respectively; and \(\Gamma (\cdot)\) denotes the gamma function.

The Taylor series expansion of the Caputo derivative is

$$ {}_{{t_{0} }}^{C} D_{{t_{f} }}^{\alpha } f(t) = \sum\limits_{i = m}^{ + \infty } {\binom{\alpha - m}{i - m}} \frac{{f^{(i)} (t_{0} )}}{{\Gamma (i + 1 - \alpha )}}(t_{f} - t_{0} )^{i - \alpha } $$
(8)

where \(\binom{p}{q} = \frac{\Gamma (p + 1)}{\Gamma (q + 1)\Gamma (p - q + 1)}\), \(p \in {\mathbb{R}}\), \(q \in {\mathbb{N}}\).

The fractional-order gradient \(\nabla^{\alpha } f(x)\) can be expressed as

$$ \nabla^{\alpha } f(x) = \mu \sum\limits_{i = 1}^{ + \infty } {\binom{\alpha - 1}{i - 1}} \frac{{f^{(i)} (x)}}{{\Gamma (i + 1 - \alpha )}}(t_{f} - t_{0} )^{i - \alpha } $$
(9)

where \(0 < \alpha < 1\), and \(\mu\) is the step-size factor. To avoid complex numbers or a zero denominator, Eq. (9) can be rewritten as

$$ \nabla^{\alpha } f(x) = \mu \frac{{f^{(1)} (x)}}{\Gamma (2 - \alpha )}(\left| {t_{f} - t_{0} } \right| + \varepsilon )^{1 - \alpha } $$
(10)

where \(\varepsilon\) is a small non-negative number. Equation (10) can be extended to the case of \(1 < \alpha < 2\)39.
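To make Eq. (10) concrete, the following minimal sketch (our own helper, with \(\mu = 1\) and \(f(x) = x^{2}\)) evaluates the practical fractional gradient; at \(\alpha = 1\) it reduces to the ordinary gradient, since \(\Gamma (1) = 1\) and the power factor becomes 1:

```python
import math

def frac_grad_1d(x, x_prev, alpha, eps=1e-8):
    """Eq. (10) for f(x) = x^2, i.e. f'(x) = 2x, with t_0 = x_prev, t_f = x."""
    return 2.0 * x * (abs(x - x_prev) + eps) ** (1.0 - alpha) / math.gamma(2.0 - alpha)

for alpha in (0.5, 1.0, 1.5):
    print(alpha, frac_grad_1d(1.0, 0.8, alpha))  # alpha = 1.0 returns exactly 2x = 2.0
```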

Then, the criterion function can be expressed as

$$ J(\theta ) = \frac{1}{2}\left[ {y(t) - \varphi^{{\text{T}}} (t)\theta } \right]^{2} $$
(11)

The additional fractional gradient descent algorithm, which incorporates both the fractional order and integer order gradients to identify \(\theta\), can be expressed as

$$ \hat{\theta }(t) = \hat{\theta }(t - 1) - \gamma \nabla J(\hat{\theta }(t - 1)) - \nabla^{\alpha } J(\hat{\theta }(t - 1)) $$
(12)

where \(\hat{\theta }\) is the estimate of parameter vector \(\theta\), and \(\gamma\) denotes the step-size factor of the integer-order gradient. Substituting Eq. (11) into Eq. (12) yields

$$ \begin{aligned} \hat{\theta }(t) &= \hat{\theta }(t - 1) - \gamma \nabla J(\hat{\theta }(t - 1)) - \nabla^{\alpha } J(\hat{\theta }(t - 1)) \\ &= \hat{\theta }(t - 1) + \gamma \varphi (t)[y(t) - \varphi^{{\text{T}}} (t)\hat{\theta }(t - 1)] - \mu \frac{{\nabla J(\hat{\theta }(t - 1))}}{{\Gamma (2 - \alpha )}}\left( {\left| {\hat{\theta }(t - 1) - \hat{\theta }(t - 2)} \right| + \varepsilon } \right)^{1 - \alpha } \\ &= \hat{\theta }(t - 1) + \gamma \varphi (t)[y(t) - \varphi^{{\text{T}}} (t)\hat{\theta }(t - 1)] + \mu \frac{{\varphi (t)[y(t) - \varphi^{{\text{T}}} (t)\hat{\theta }(t - 1)]}}{{\Gamma (2 - \alpha )}}\left( {\left| {\hat{\theta }(t - 1) - \hat{\theta }(t - 2)} \right| + \varepsilon } \right)^{1 - \alpha } \\ &= \hat{\theta }(t - 1) + \gamma \varphi (t)[y(t) - \varphi^{{\text{T}}} (t)\hat{\theta }(t - 1)] + \mu \frac{{\Xi (\hat{\theta },\alpha ,t)\varphi (t)[y(t) - \varphi^{{\text{T}}} (t)\hat{\theta }(t - 1)]}}{{\Gamma (2 - \alpha )}} \end{aligned} $$
(13)

where \(\mu \frac{{\Xi (\hat{\theta },\alpha ,t)\varphi (t)[y(t) - \varphi^{{\text{T}}} (t)\hat{\theta }(t - 1)]}}{{\Gamma (2 - \alpha )}}\) is the additional fractional gradient, \(\Xi (\hat{\theta },\alpha ,t) = {\text{diag}}\left\{ {\left[ {\left| {\hat{\theta }_{1} (t - 1) - \hat{\theta }_{1} (t - 2)} \right| + \varepsilon } \right]^{1 - \alpha } ,\left[ {\left| {\hat{\theta }_{2} (t - 1) - \hat{\theta }_{2} (t - 2)} \right| + \varepsilon } \right]^{1 - \alpha } , \cdots ,\left[ {\left| {\hat{\theta }_{l} (t - 1) - \hat{\theta }_{l} (t - 2)} \right| + \varepsilon } \right]^{1 - \alpha } } \right\}\), and l is the number of parameters to be identified. The step-size factors \(\gamma\) and \(\mu\) are chosen as

$$ \gamma = 1/\overline{r}(t), \qquad \overline{r}(t) = \overline{r}(t - 1) + \left\| {\varphi (t)} \right\|^{2} $$
(14)
$$ \mu = 1/r(t), \qquad r(t) = \overline{r}(t - 1) + \left\| {\Xi (\hat{\theta },\alpha ,t)\varphi (t)} \right\|^{2} , \qquad \overline{r}(0) = 1 $$
(15)

The output error in Eq. (13) is defined as

$$ e(t) = y(t) - \varphi^{{\text{T}}} (t)\hat{\theta }(t - 1) $$
(16)

Combining Eqs. (13)-(16), we can obtain the iterative formula of the additional fractional gradient descent algorithm, as follows

$$ \hat{\theta }(t) = \hat{\theta }(t - 1) + \frac{{\varphi (t)e(t)}}{{\overline{r}(t)}} + \frac{{\Xi (\hat{\theta },\alpha ,t)\varphi (t)e(t)}}{{r(t)\Gamma (2 - \alpha )}} $$
(17)

Equation (17) shows that the two gradients identify the model parameters jointly, which prevents the algorithm from failing to converge owing to an improper order selection in a single fractional gradient. Compared with a single integer-order gradient, the additional fractional gradient can drive the parameters to the true values faster, even when the fractional order is small.
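As an illustration, one iteration of Eqs. (14)–(17) might look as follows (a sketch under our own naming, assuming NumPy; `theta1` and `theta2` denote \(\hat{\theta }(t - 1)\) and \(\hat{\theta }(t - 2)\)):

```python
import math
import numpy as np

def afg_step(theta1, theta2, phi_t, y_t, r_bar, alpha, eps=1e-8):
    """One update of the additional fractional gradient descent, Eq. (17)."""
    e = y_t - phi_t @ theta1                           # output error, Eq. (16)
    Xi = np.diag((np.abs(theta1 - theta2) + eps) ** (1.0 - alpha))
    r_bar_new = r_bar + phi_t @ phi_t                  # Eq. (14)
    r_new = r_bar + np.linalg.norm(Xi @ phi_t) ** 2    # Eq. (15), uses r_bar(t-1)
    theta_new = (theta1
                 + phi_t * e / r_bar_new                                # integer-order term
                 + Xi @ phi_t * e / (r_new * math.gamma(2.0 - alpha)))  # additional fractional term
    return theta_new, r_bar_new
```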

Multi-innovation additional fractional gradient descent identification algorithm and convergence analysis

Multi-innovation additional fractional gradient descent identification algorithm

The additional fractional gradient descent identification algorithm leverages the flexibility of fractional calculus to identify the unknown parameters through the integer-order and fractional-order gradients simultaneously, which improves the convergence accuracy and speed. However, both gradients use only the data at the current moment, leaving the available information underutilised and limiting the improvement in the parameter estimates at each iteration. Therefore, to further enhance the speed and accuracy of parameter identification, the multi-innovation principle is introduced, and a multi-innovation additional fractional gradient descent algorithm is established.

The multi-innovation principle combines current and past data into a multi-innovation matrix, which is then used to estimate the current parameters40,41. In this context, \(y(t)\), \(\varphi^{{\text{T}}} (t)\), \(e(t)\), and \(v(t)\) are termed single innovations and are extended to the multi-innovation matrices \(Y(p,t)\), \(\Phi (p,t)\), \(E(p,t)\), and \(V(p,t)\), respectively.

$$ Y(p,t) = \left[ {y(t),y(t - 1), \cdots ,y(t - p + 1)} \right]^{{\text{T}}} $$
(18)
$$ \Phi (p,t) = \left[ {\varphi (t),\varphi (t - 1), \cdots ,\varphi (t - p + 1)} \right] $$
(19)
$$ V(p,t) = \left[ {v(t),v(t - 1), \cdots ,v(t - p + 1)} \right]^{{\text{T}}} $$
(20)
$$ E(p,t) = \left[ {e(t),e(t - 1), \cdots ,e(t - p + 1)} \right]^{{\text{T}}} $$
(21)

where p denotes the multi-innovation length. Combining Eqs. (16), (18), and (19), Eq. (21) can be rewritten as

$$ \begin{aligned} E(p,t) &= \left[ {\begin{array}{*{20}c} {y(t) - \varphi^{{\text{T}}} (t)\hat{\theta }(t - 1)} \\ {y(t - 1) - \varphi^{{\text{T}}} (t - 1)\hat{\theta }(t - 1)} \\ \vdots \\ {y(t - p + 1) - \varphi^{{\text{T}}} (t - p + 1)\hat{\theta }(t - 1)} \\ \end{array} } \right] \\ &= Y(p,t) - \Phi^{{\text{T}}} (p,t)\hat{\theta }(t - 1) \end{aligned} $$
(22)
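In code, stacking the p most recent samples (reusing the hypothetical `phi` helper from the sketch after Eq. (6)) might look like:

```python
import numpy as np

def innovation_stack(y, u, t, n, p, theta_hat):
    """Y(p,t), Phi(p,t) of Eqs. (18)-(19) and E(p,t) of Eq. (22).
    Phi holds one information vector per column, so it is l x p."""
    Y = np.array([y[t - j] for j in range(p)])                       # Eq. (18)
    Phi = np.column_stack([phi(y, u, t - j, n) for j in range(p)])   # Eq. (19)
    E = Y - Phi.T @ theta_hat                                        # Eq. (22)
    return Y, Phi, E
```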

The multi-innovation additional fractional gradient descent algorithm criterion function is defined as

$$ J_{1} (\theta ) = \frac{1}{2}\left\| {Y(p,t) - \Phi^{{\text{T}}} (p,t)\theta } \right\|^{2} $$
(23)

Replacing the single innovations in Eq. (17) with the multi-innovation matrices of Eqs. (18), (19), and (21), the multi-innovation additional fractional gradient descent algorithm can be formulated as

$$ \hat{\theta }(t) = \hat{\theta }(t - 1) + \frac{{\Phi (p,t)E(p,t)}}{{\overline{r}(t)}} + \frac{{\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)E(p,t)}}{{r(t)\Gamma (2 - \alpha )}} $$
(24)
$$ \overline{r}(t) = \overline{r}(t - 1) + \left\| {\Phi (p,t)} \right\|^{2} $$
(25)
$$ r(t) = \overline{r}(t - 1) + \left\| {\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)} \right\|^{2} $$
(26)

Compared with Eq. (17), Eq. (24) uses more observation data in the parameter identification process and has a higher data utilization rate.
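A single iteration of Eqs. (24)–(26) then generalises the earlier `afg_step` sketch by swapping the single innovations for the stacked matrices (again our own naming, not the authors' code):

```python
import math
import numpy as np

def miafg_step(theta1, theta2, Phi, Y, r_bar, alpha, eps=1e-8):
    """One update of Eqs. (24)-(26); Phi is l x p, Y is a p-vector."""
    E = Y - Phi.T @ theta1                             # innovation vector, Eq. (22)
    Xi = np.diag((np.abs(theta1 - theta2) + eps) ** (1.0 - alpha))
    r_bar_new = r_bar + np.linalg.norm(Phi) ** 2       # Eq. (25), Frobenius norm
    r_new = r_bar + np.linalg.norm(Xi @ Phi) ** 2      # Eq. (26)
    theta_new = (theta1
                 + Phi @ E / r_bar_new
                 + Xi @ Phi @ E / (r_new * math.gamma(2.0 - alpha)))
    return theta_new, r_bar_new
```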

Convergence analysis

Convergence analysis is crucial for a gradient descent algorithm because it determines the reliability and stability of the algorithm and is a prerequisite for both simulation and practical application. For clarity, the following lemmas are introduced.

Lemma 1

Reference42 For the ARX model in Eq. (6) and the multi-innovation additional fractional gradient descent algorithm presented in Eqs. (24)–(26), suppose there exist constants \(\overline{\alpha }\) and \(\beta\) with \(0 < \overline{\alpha } \le \beta < \infty\) and an integer \(N \ge n\) such that the following strong persistent excitation condition holds.

$$ \overline{\alpha }I_{n} \le \frac{1}{N}\sum\limits_{i = 0}^{N - 1} {\varphi (t + i)} \varphi^{{\text{T}}} (t + i) \le \beta I_{n} $$
(27)

Then \(\overline{r}(t)\) in Eq. (25) satisfies the inequality

$$ n\overline{\alpha }(t - N + 1) \le \overline{r}(t) \le n\beta (t + N - 1) + 1 $$
(28)
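Numerically, the excitation condition of Eq. (27) can be checked by inspecting the eigenvalues of the averaged outer-product matrix over a window of information vectors; a small sketch (the helper name is ours):

```python
import numpy as np

def pe_bounds(phis):
    """Smallest/largest eigenvalue of (1/N) * sum_i phi_i phi_i^T,
    i.e. empirical estimates of alpha_bar and beta in Eq. (27)."""
    N = len(phis)
    M = sum(np.outer(v, v) for v in phis) / N
    eig = np.linalg.eigvalsh(M)      # ascending order
    return eig[0], eig[-1]
```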

Lemma 2

Reference43 The following inequality holds

$$ 2xy \le ax^{2} + y^{2} /a $$
(29)

where a is a positive real number; in the proof below, a is taken in the interval (0, 1].

Lemma 3

Reference43 Suppose the non-negative sequences \(\left\{ {\kappa (t)} \right\}\), \(\left\{ {\psi_{t} } \right\}\), and \(\left\{ {\beta_{t} } \right\}\) satisfy \(\kappa (t + 1) \le (1 - \psi_{t} )\kappa (t) + \beta_{t}\), with \(\psi_{t} \in [0,1)\), \(\sum\limits_{t = 1}^{\infty } {\psi_{t} } = \infty\), and \(\kappa (0) < \infty\). Then

$$ \mathop {\lim \sup }\limits_{t \to \infty } \kappa (t) \le \mathop {\lim }\limits_{t \to \infty } \frac{{\beta_{t} }}{{\psi_{t} }} $$
(30)

Proof

Let the parameter estimation error be defined as

$$ \tilde{\theta }(t) = \hat{\theta }(t) - \theta $$
(31)

According to Eqs. (6) and (18)–(20), we obtain

$$ Y(p,t) = \Phi^{{\text{T}}} (p,t)\theta + V(p,t) $$
(32)

Subtracting \(\theta\) from both sides of Eq. (24) and combining Eqs. (22) and (32), we obtain the iterative equation for the parameter estimation error \(\tilde{\theta }(t)\).

$$ \begin{aligned} \tilde{\theta }(t) &= \tilde{\theta }(t - 1) + \frac{{\Phi (p,t)E(p,t)}}{{\overline{r}(t)}} + \frac{{\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)E(p,t)}}{{r(t)\Gamma (2 - \alpha )}} \\ &= \tilde{\theta }(t - 1) + \frac{{\Phi (p,t)}}{{\overline{r}(t)}}[Y(p,t) - \Phi^{{\text{T}}} (p,t)\hat{\theta }(t - 1)] + \frac{{\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)}}{{r(t)\Gamma (2 - \alpha )}}[Y(p,t) - \Phi^{{\text{T}}} (p,t)\hat{\theta }(t - 1)] \\ &= \tilde{\theta }(t - 1) + \frac{{\Phi (p,t)}}{{\overline{r}(t)}}[ - \Phi^{{\text{T}}} (p,t)\tilde{\theta }(t - 1) + V(p,t)] + \frac{{\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)}}{{r(t)\Gamma (2 - \alpha )}}[ - \Phi^{{\text{T}}} (p,t)\tilde{\theta }(t - 1) + V(p,t)] \\ &= \left[ {I_{n} - \frac{{\Phi (p,t)\Phi^{{\text{T}}} (p,t)}}{{\overline{r}(t)}} - \frac{{\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)\Phi^{{\text{T}}} (p,t)}}{{r(t)\Gamma (2 - \alpha )}}} \right]\tilde{\theta }(t - 1) + \frac{{\Phi (p,t)V(p,t)}}{{\overline{r}(t)}} + \frac{{\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)V(p,t)}}{{r(t)\Gamma (2 - \alpha )}} \end{aligned} $$
(33)

Taking the squared norm of both sides of Eq. (33), the following expression is obtained.

$$ \begin{aligned} \left\| {\tilde{\theta }(t)} \right\|^{2} &= \left\| {\left[ {I_{n} - \frac{{\Phi (p,t)\Phi^{{\text{T}}} (p,t)}}{{\overline{r}(t)}} - \frac{{\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)\Phi^{{\text{T}}} (p,t)}}{{r(t)\Gamma (2 - \alpha )}}} \right]\tilde{\theta }(t - 1)} \right\|^{2} \\ &\quad + 2\tilde{\theta }^{{\text{T}}} (t - 1)\left[ {I_{n} - \frac{{\Phi (p,t)\Phi^{{\text{T}}} (p,t)}}{{\overline{r}(t)}} - \frac{{\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)\Phi^{{\text{T}}} (p,t)}}{{r(t)\Gamma (2 - \alpha )}}} \right]\frac{{\Phi (p,t)V(p,t)}}{{\overline{r}(t)}} \\ &\quad + 2\tilde{\theta }^{{\text{T}}} (t - 1)\left[ {I_{n} - \frac{{\Phi (p,t)\Phi^{{\text{T}}} (p,t)}}{{\overline{r}(t)}} - \frac{{\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)\Phi^{{\text{T}}} (p,t)}}{{r(t)\Gamma (2 - \alpha )}}} \right]\frac{{\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)V(p,t)}}{{r(t)\Gamma (2 - \alpha )}} \\ &\quad + 2\,\frac{{[\Phi (p,t)V(p,t)]^{{\text{T}}} }}{{\overline{r}(t)}}\frac{{\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)V(p,t)}}{{r(t)\Gamma (2 - \alpha )}} + \left\| {\frac{{\Phi (p,t)V(p,t)}}{{\overline{r}(t)}}} \right\|^{2} + \left\| {\frac{{\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)V(p,t)}}{{r(t)\Gamma (2 - \alpha )}}} \right\|^{2} \\ &\le \lambda_{\max } \left[ {I_{n} - \frac{{\Phi (p,t)\Phi^{{\text{T}}} (p,t)}}{{\overline{r}(t)}} - \frac{{\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)\Phi^{{\text{T}}} (p,t)}}{{r(t)\Gamma (2 - \alpha )}}} \right]\left\| {\tilde{\theta }(t - 1)} \right\|^{2} \\ &\quad + 2\tilde{\theta }^{{\text{T}}} (t - 1)\left[ {I_{n} - \frac{{\Phi (p,t)\Phi^{{\text{T}}} (p,t)}}{{\overline{r}(t)}} - \frac{{\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)\Phi^{{\text{T}}} (p,t)}}{{r(t)\Gamma (2 - \alpha )}}} \right]\frac{{\Phi (p,t)V(p,t)}}{{\overline{r}(t)}} \\ &\quad + 2\tilde{\theta }^{{\text{T}}} (t - 1)\left[ {I_{n} - \frac{{\Phi (p,t)\Phi^{{\text{T}}} (p,t)}}{{\overline{r}(t)}} - \frac{{\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)\Phi^{{\text{T}}} (p,t)}}{{r(t)\Gamma (2 - \alpha )}}} \right]\frac{{\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)V(p,t)}}{{r(t)\Gamma (2 - \alpha )}} \\ &\quad + 2\,\frac{{[\Phi (p,t)V(p,t)]^{{\text{T}}} }}{{\overline{r}(t)}}\frac{{\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)V(p,t)}}{{r(t)\Gamma (2 - \alpha )}} + \left\| {\frac{{\Phi (p,t)V(p,t)}}{{\overline{r}(t)}}} \right\|^{2} + \left\| {\frac{{\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)V(p,t)}}{{r(t)\Gamma (2 - \alpha )}}} \right\|^{2} \end{aligned} $$
(34)

According to the discussion in Ref.44 and the definition of \(\Xi (\hat{\theta },\alpha ,t)\) in Eq. (13), \(0 < \left| {\hat{\theta }_{i} (t - 1) - \hat{\theta }_{i} (t - 2)} \right| < 1\), \(i = 1,2, \cdots ,l\) holds. Then

$$ \left\{ \begin{array}{ll} \varepsilon^{{\frac{1 - \alpha }{2}}} < \Xi^{\frac{1}{2}} (\hat{\theta },\alpha ,t) \le (1 + \varepsilon )^{{\frac{1 - \alpha }{2}}} , & 0 < \alpha \le 1 \\ (1 + \varepsilon )^{{\frac{1 - \alpha }{2}}} < \Xi^{\frac{1}{2}} (\hat{\theta },\alpha ,t) \le \varepsilon^{{\frac{1 - \alpha }{2}}} , & 1 < \alpha \le 2 \end{array} \right. $$
(35)

Moreover, \(\left\| {\Xi (\hat{\theta },\alpha ,t)} \right\| \le \max \left\{ {\varepsilon^{1 - \alpha } ,\left( {1 + \varepsilon } \right)^{1 - \alpha } } \right\}\) for \(0 < \alpha < 2\). According to Lemma 1,

$$ \begin{aligned} \frac{{\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)\Phi^{{\text{T}}} (p,t)}}{{r(t)\Gamma (2 - \alpha )}} &= \frac{{\Xi (\hat{\theta },\alpha ,t)}}{{r(t)\Gamma (2 - \alpha )}}\sum\limits_{i = 0}^{N - 1} {\varphi (t - i)\varphi^{{\text{T}}} (t - i)} \\ &\ge \frac{{\varepsilon^{1 - \alpha } N\overline{\alpha }}}{{r(t)\Gamma (2 - \alpha )}}I_{n} \end{aligned} $$
(36)

where \(\overline{\alpha } > 0\), \(N > 0\), and \(\Gamma (2 - \alpha ) > 0\) within the range \(0 < \alpha < 2\). Given that \(r(t) = \overline{r}(t - 1) + \left\| {\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)} \right\|^{2}\), it can be found that \(0 \le \overline{r}(m) - \overline{r}(0) \le r(m + 1)\) and \(0 \le \left\| {\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)} \right\|^{2} \le r(m + 1)\) for \(m = 1,2, \cdots ,t - 1\); hence \(r(t) > 0\). Thus, the matrix in Eq. (36) is positive definite, and the following inequality used in Eq. (34) is obtained.

$$ I_{n} - \frac{{\Phi (p,t)\Phi^{{\text{T}}} (p,t)}}{{\overline{r}(t)}} - \frac{{\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)\Phi^{{\text{T}}} (p,t)}}{{r(t)\Gamma (2 - \alpha )}} \le I_{n} - \frac{{\Phi (p,t)\Phi^{{\text{T}}} (p,t)}}{{\overline{r}(t)}} $$
(37)

According to Lemma 1, with p = N, we obtain

$$ \begin{aligned} I_{n} - \frac{{\Phi (p,t)\Phi^{{\text{T}}} (p,t)}}{{\overline{r}(t)}} &= I_{n} - \frac{1}{{\overline{r}(t)}}\sum\limits_{i = 0}^{N - 1} {\varphi (t - i)\varphi^{{\text{T}}} (t - i)} \\ &\le \left[ {1 - \frac{{N\overline{\alpha }}}{{\overline{r}(t)}}} \right]I_{n} \\ &\le \left[ {1 - \frac{{N\overline{\alpha }}}{{n\beta (t - N + 1) + 1}}} \right]I_{n} \end{aligned} $$
(38)

Combining Eqs. (36)–(38) and Lemma 1 and taking the expectation of Eq. (34), we obtain

$$ \begin{aligned} {\text{E}}\left[ {\left\| {\tilde{\theta }(t)} \right\|^{2} } \right] &\le \left[ {1 - \frac{{N\overline{\alpha }}}{{n\beta (t - N + 1) + 1}}} \right]{\text{E}}\left[ {\left\| {\tilde{\theta }(t - 1)} \right\|^{2} } \right] \\ &\quad + 2{\text{E}}\left\{ {\tilde{\theta }^{{\text{T}}} (t - 1)\left[ {I_{n} - \frac{{\Phi (p,t)\Phi^{{\text{T}}} (p,t)}}{{\overline{r}(t)}}} \right]\frac{{\Phi (p,t)V(p,t)}}{{\overline{r}(t)}}} \right\} \\ &\quad + 2{\text{E}}\left\{ {\tilde{\theta }^{{\text{T}}} (t - 1)\left[ {I_{n} - \frac{{\Phi (p,t)\Phi^{{\text{T}}} (p,t)}}{{\overline{r}(t)}}} \right]\frac{{\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)V(p,t)}}{{r(t)\Gamma (2 - \alpha )}}} \right\} \\ &\quad + 2{\text{E}}\left\{ {\frac{{[\Phi (p,t)V(p,t)]^{{\text{T}}} }}{{\overline{r}(t)}}\frac{{\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)V(p,t)}}{{r(t)\Gamma (2 - \alpha )}}} \right\} + \frac{{{\text{E}}\left[ {\left\| {\Phi (p,t)V(p,t)} \right\|^{2} } \right]}}{{\overline{r}^{2} (t)}} + \frac{{{\text{E}}\left[ {\left\| {\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)V(p,t)} \right\|^{2} } \right]}}{{[r(t)\Gamma (2 - \alpha )]^{2} }} \\ &= Q_{1} + Q_{2} + Q_{3} + Q_{4} + Q_{5} \end{aligned} $$
(39)

where

$$ Q_{1} = \left[ {1 - \frac{{N\overline{\alpha }}}{{n\beta (t - N + 1) + 1}}} \right]{\text{E}}\left[ {\left\| {\tilde{\theta }(t - 1)} \right\|^{2} } \right] + \frac{{{\text{E}}\left[ {\left\| {\Phi (p,t)V(p,t)} \right\|^{2} } \right]}}{{\overline{r}^{2} (t)}} $$
$$ Q_{2} = 2{\text{E}}\left\{ {\tilde{\theta }^{{\text{T}}} (t - 1)\left[ {I_{n} - \frac{{\Phi (p,t)\Phi^{{\text{T}}} (p,t)}}{{\overline{r}(t)}}} \right]\frac{{\Phi (p,t)V(p,t)}}{{\overline{r}(t)}}} \right\} $$
$$ Q_{3} = 2{\text{E}}\left\{ {\tilde{\theta }^{{\text{T}}} (t - 1)\left[ {I_{n} - \frac{{\Phi (p,t)\Phi^{{\text{T}}} (p,t)}}{{\overline{r}(t)}}} \right]\frac{{\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)V(p,t)}}{{r(t)\Gamma (2 - \alpha )}}} \right\} $$

\(Q_{4} = 2{\text{E}}\left\{ {\frac{{[\Phi (p,t)V(p,t)]^{{\text{T}}} }}{{\overline{r}(t)}}\frac{{\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)V(p,t)}}{{r(t)\Gamma (2 - \alpha )}}} \right\}\), and \(Q_{5} = \frac{{{\text{E}}\left[ {\left\| {\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)V(p,t)} \right\|^{2} } \right]}}{{[r(t)\Gamma (2 - \alpha )]^{2} }}\).

According to Lemma 1 and Eq. (35), the following inequalities are obtained

$$ \begin{aligned} {\text{E}}\left[ {\left\| {\Phi (p,t)V(p,t)} \right\|^{2} } \right] &\le {\text{E}}\left\{ {\lambda_{\max } \left[ {\Phi (p,t)\Phi^{{\text{T}}} (p,t)} \right]\left\| {V(p,t)} \right\|^{2} } \right\} \\ &\le p\beta \,{\text{E}}\left[ {\left\| {V(p,t)} \right\|^{2} } \right] = p^{2} \beta \sigma_{v}^{2} = N^{2} \beta \sigma_{v}^{2} \end{aligned} $$
(40)
$$ \begin{aligned} {\text{E}}\left[ {\left\| {\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)V(p,t)} \right\|^{2} } \right] &\le {\text{E}}\left\{ {\lambda_{\max } \left[ {\Phi (p,t)\Phi^{{\text{T}}} (p,t)} \right]\left\| {\Xi (\hat{\theta },\alpha ,t)} \right\|^{2} \left\| {V(p,t)} \right\|^{2} } \right\} \\ &\le p\beta \,{\text{E}}\left[ {\left\| {\Xi (\hat{\theta },\alpha ,t)} \right\|^{2} \left\| {V(p,t)} \right\|^{2} } \right] = p^{2} \beta (1 + \varepsilon )^{2(1 - \alpha )} \sigma_{v}^{2} = N^{2} \beta (1 + \varepsilon )^{2(1 - \alpha )} \sigma_{v}^{2} \end{aligned} $$
(41)

Substituting Eqs. (40) and (28) into \(Q_{1}\) yields

$$ Q_{1} = \left[ {1 - \frac{{N\overline{\alpha }}}{n\beta (t - N + 1) + 1}} \right]{\text{E}}\left[ {\left\| {\tilde{\theta }(t - 1)} \right\|^{2} } \right] + \frac{{N^{2} \beta \sigma_{v}^{2} }}{{[n\overline{\alpha }(t - N + 1) + 1]^{2} }} $$
(42)

We take the limit of \(Q_{1}\) according to Lemma 3

$$ \begin{aligned} \mathop {\lim }\limits_{t \to \infty } Q_{1} &\le \mathop {\lim }\limits_{t \to \infty } \frac{{N^{2} \beta \sigma_{v}^{2} }}{{[n\overline{\alpha }(t - N + 1) + 1]^{2} }}\,\frac{{n\beta (t - N + 1) + 1}}{{N\overline{\alpha }}} \\ &\le \mathop {\lim }\limits_{t \to \infty } \frac{{N\beta \sigma_{v}^{2} [n\beta (t - N + 1) + 1]}}{{\overline{\alpha }[n\overline{\alpha }(t - N + 1) + 1]^{2} }} \sim \frac{{N\beta^{2} \sigma_{v}^{2} }}{{n\overline{\alpha }^{3} }}\frac{1}{t} \end{aligned} $$
(43)

When \(t \to \infty\), \(Q_{1} \to 0\). Now, we use Lemma 2 and Eqs. (38) and (40) to prove the convergence of \(Q_{2}\).

$$ \begin{aligned} Q_{2} &= 2{\text{E}}\left\{ {\tilde{\theta }^{{\text{T}}} (t - 1)\left[ {I_{n} - \frac{{\Phi (p,t)\Phi^{{\text{T}}} (p,t)}}{{\overline{r}(t)}}} \right]\frac{{\Phi (p,t)V(p,t)}}{{\overline{r}(t)}}} \right\} \\ &\le a\left[ {1 - \frac{{N\overline{\alpha }}}{{n\beta (t - N + 1) + 1}}} \right]{\text{E}}\left[ {\left\| {\tilde{\theta }(t - 1)} \right\|^{2} } \right] + \frac{1}{a}{\text{E}}\left[ {\left\| {\frac{{\Phi (p,t)V(p,t)}}{{\overline{r}(t)}}} \right\|^{2} } \right] \\ &\le \left[ {1 - \frac{{N\overline{\alpha }}}{{n\beta (t - N + 1) + 1}}} \right]{\text{E}}\left[ {\left\| {\tilde{\theta }(t - 1)} \right\|^{2} } \right] + \frac{1}{{a^{2} }}{\text{E}}\left[ {\left\| {\frac{{\Phi (p,t)V(p,t)}}{{\overline{r}(t)}}} \right\|^{2} } \right] \\ &\le \left[ {1 - \frac{{N\overline{\alpha }}}{{n\beta (t - N + 1) + 1}}} \right]{\text{E}}\left[ {\left\| {\tilde{\theta }(t - 1)} \right\|^{2} } \right] + \frac{{N^{2} \beta \sigma_{v}^{2} }}{{a^{2} [n\overline{\alpha }(t - N + 1) + 1]^{2} }} \end{aligned} $$
(44)

Similarly, according to Lemma 3, Eq. (44) can be rewritten as

$$ \begin{aligned} \mathop {\lim }\limits_{t \to \infty } Q_{2} &\le \mathop {\lim }\limits_{t \to \infty } \frac{{N^{2} \beta \sigma_{v}^{2} }}{{a^{2} [n\overline{\alpha }(t - N + 1) + 1]^{2} }}\,\frac{{n\beta (t - N + 1) + 1}}{{N\overline{\alpha }}} \\ &\le \mathop {\lim }\limits_{t \to \infty } \frac{{N\beta \sigma_{v}^{2} [n\beta (t - N + 1) + 1]}}{{\overline{\alpha }a^{2} [n\overline{\alpha }(t - N + 1) + 1]^{2} }} \sim \frac{{N\beta^{2} \sigma_{v}^{2} }}{{na^{2} \overline{\alpha }^{3} }}\frac{1}{t} \end{aligned} $$
(45)

When \(t \to \infty\), \(Q_{2} \to 0\). Because \(r(t) = \overline{r}(t - 1) + \left\| {\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)} \right\|^{2}\), it follows that \(r(t) > \overline{r}(t - 1)\). We combine Lemma 2 and Eqs. (40) and (41) to prove the convergence of \(Q_{3}\).

$$ \begin{aligned} Q_{3} &= 2{\text{E}}\left\{ {\tilde{\theta }^{{\text{T}}} (t - 1)\left[ {I_{n} - \frac{{\Phi (p,t)\Phi^{{\text{T}}} (p,t)}}{{\overline{r}(t)}}} \right]\frac{{\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)V(p,t)}}{{r(t)\Gamma (2 - \alpha )}}} \right\} \\ &\le \left[ {1 - \frac{{N\overline{\alpha }}}{{n\beta (t - N + 1) + 1}}} \right]{\text{E}}\left[ {\left\| {\tilde{\theta }(t - 1)} \right\|^{2} } \right] + \frac{{(1 + \varepsilon )^{2(1 - \alpha )} N^{2} \beta \sigma_{v}^{2} }}{{a^{2} [n\overline{\alpha }(t - N) + 1]^{2} \Gamma^{2} (2 - \alpha )}} \end{aligned} $$
(46)

According to Lemma 3, Eq. (46) can be rewritten as

$$ \begin{aligned} \mathop {\lim }\limits_{t \to \infty } Q_{3} &\le \mathop {\lim }\limits_{t \to \infty } \frac{{(1 + \varepsilon )^{2(1 - \alpha )} N^{2} \beta \sigma_{v}^{2} }}{{a^{2} [n\overline{\alpha }(t - N) + 1]^{2} \Gamma^{2} (2 - \alpha )}}\,\frac{{n\beta (t - N + 1) + 1}}{{N\overline{\alpha }}} \\ &\le \mathop {\lim }\limits_{t \to \infty } \frac{{(1 + \varepsilon )^{2(1 - \alpha )} N\beta \sigma_{v}^{2} [n\beta (t - N + 1) + 1]}}{{\overline{\alpha }a^{2} [n\overline{\alpha }(t - N) + 1]^{2} \Gamma^{2} (2 - \alpha )}} \sim \frac{{(1 + \varepsilon )^{2(1 - \alpha )} N\beta^{2} \sigma_{v}^{2} }}{{n\overline{\alpha }^{3} a^{2} \Gamma^{2} (2 - \alpha )}}\frac{1}{t} \end{aligned} $$
(47)

When \(t \to \infty\), \(Q_{3} \to 0\). Similarly, we combine Lemmas 1 and 3 as well as Eqs. (40)-(41) to prove the convergence of Q4.

$$ \begin{aligned} Q_{4} &= 2{\text{E}}\left\{ {\frac{{[\Phi (p,t)V(p,t)]^{{\text{T}}} }}{{\overline{r}(t)}}\frac{{\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)V(p,t)}}{{r(t)\Gamma (2 - \alpha )}}} \right\} \\ &\le \frac{{2{\text{E}}\left[ {V^{{\text{T}}} (p,t)\Phi^{{\text{T}}} (p,t)\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)V(p,t)} \right]}}{{\overline{r}(t)\overline{r}(t - 1)\Gamma (2 - \alpha )}} \\ &\le \frac{{2(1 + \varepsilon )^{2(1 - \alpha )} N\beta \sigma_{v}^{2} }}{{[n\overline{\alpha }(t - N + 1) + 1][n\overline{\alpha }(t - N) + 1]\Gamma (2 - \alpha )}} \sim \frac{{2(1 + \varepsilon )^{2(1 - \alpha )} N\beta \sigma_{v}^{2} }}{{n^{2} \overline{\alpha }^{2} \Gamma (2 - \alpha )}}\frac{1}{{t^{2} }} \end{aligned} $$
(48)

When \(t \to \infty\), \(Q_{4} \to 0\). Lastly, we combine Lemmas 1 and 3 with Eq. (40) to prove the convergence of \(Q_{5}\).

$$ \begin{aligned} \mathop {\lim }\limits_{t \to \infty } Q_{5} & = \frac{{{\text{E}}\left[ {\left\| {\Xi (\hat{\theta },\alpha ,t)\Phi (p,t)V(p,t)} \right\|^{2} } \right]}}{{[r(t)\Gamma (2 - \alpha )]^{2} }} \\ & \le \frac{{(1 + \varepsilon )^{2(1 - \alpha )} N^{2} \beta \sigma_{v}^{2} }}{{[n\beta (t - N) + 1]^{2} \Gamma^{2} (2 - \alpha )}} \\ & \sim \frac{{(1 + \varepsilon )^{2(1 - \alpha )} \sigma_{v}^{2} }}{{n^{2} \beta \Gamma^{2} (2 - \alpha )}}\frac{1}{{t^{2} }} \end{aligned} $$
(49)

Based on this analysis, together with Eqs. (39), (43), (45), and (47)–(49), we conclude that the expectation of the parameter identification error \({\text{E}}\left[ {\left\| {\tilde{\theta }} \right\|^{2} } \right]\) converges to 0, thereby proving the convergence of the algorithm.

The steps of the multi-innovation additional fractional gradient descent identification algorithm are summarised below.

Algorithm

Multi-innovation additional fractional gradient descent identification algorithm

Step 1

Obtain the model input data u(t) and output data y(t). Specify the unknown ARX model parameter vector \(\theta\) to be identified

Step 2

Determine the multi-innovation length p. Construct the multi-innovation matrices \(Y(p,t)\) and \(\Phi (p,t)\) according to Eqs. (18) and (19)

Step 3

Determine the number of iterations and the fractional gradient order \(\alpha\)

Step 4

Use the integer-order and fractional-order gradients in Eq. (24) to estimate the model parameters \(\hat{\theta }\) synchronously

Step 5

Use the identification results to update the parameter estimate \(\hat{\theta }(t)\) in Eq. (24)

Step 6

Repeat Steps 4–5. When the set number of iterations is reached, output the final identification result (see the sketch below)
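To make these steps concrete, the following Python sketch applies them to the simulation model (51) of the next section. It is a minimal sketch under stated assumptions, not the authors' implementation: the regressor layout, the diagonal \(\Xi (\hat{\theta },\alpha ,t)\) with entries \(|\hat{\theta }_{i} |^{1 - \alpha }\), and the recursive normalisation r(t) are inferred from the convergence analysis above, since Eq. (24) itself appears earlier in the paper.

```python
import math
import numpy as np

def miafg(u, y, p=3, alpha=1.2, eps=1e-8):
    """Sketch of the multi-innovation additional fractional gradient
    descent update (Steps 1-6) for the model structure of Eq. (51).
    The forms of Xi and r(t) are assumptions, not the paper's Eq. (24)."""
    n = 4                                    # theta = [a1, a2, b1, b2]
    theta = np.full(n, 1e-3)                 # small nonzero initial estimate
    r = 1.0                                  # normalisation term r(t)
    for t in range(p + 1, len(y)):
        # Step 2: stack the last p regressors and outputs into
        # Phi(p,t) (n x p) and Y(p,t) (p x 1)
        Phi = np.column_stack(
            [np.array([y[k - 1], y[k - 2], u[k], u[k - 1]])
             for k in range(t, t - p, -1)])
        Y = np.array([y[k] for k in range(t, t - p, -1)])
        r += float(Phi[:, 0] @ Phi[:, 0])    # accumulate ||phi(t)||^2
        E = Y - Phi.T @ theta                # multi-innovation error vector
        g = Phi @ E / r                      # integer-order (MISG-type) step
        # Step 4: additional fractional-order step, assuming a
        # Caputo-type diagonal Xi with entries |theta_i|^(1 - alpha)
        xi = (np.abs(theta) + eps) ** (1.0 - alpha)
        theta = theta + g + xi * g / math.gamma(2.0 - alpha)
    return theta
```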

Simulation and experiment

To verify the effectiveness of the algorithm, we use a simulation example and conduct a three degree-of-freedom (3-DOF) gyroscope model identification experiment. The proposed algorithm is compared with the SG, FOSG36, and MISG19 algorithms, and the contributions of the additional fractional order and the multi-innovation length to the algorithm performance are examined. The model identification accuracy is evaluated using the following index

$$ \delta = \frac{{\left\| {\hat{\theta }(t) - \theta } \right\|}}{\left\| \theta \right\|} $$
(50)
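This index is straightforward to compute; a minimal sketch (the function name is ours):

```python
import numpy as np

def delta(theta_hat, theta_true):
    # relative parameter estimation error of Eq. (50)
    return np.linalg.norm(theta_hat - theta_true) / np.linalg.norm(theta_true)
```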

Numerical simulation

Consider the following ARX model

$$ y(t) = a_{1} y(t - 1) + a_{2} y(t - 2) + b_{1} u(t) + b_{2} u(t - 1) + v(t) $$
(51)

The parameters to be identified are \(\theta = \left[ {a_{1} ,a_{2} ,b_{1} ,b_{2} } \right]^{{\text{T}}} = \left[ {1.5, - 0.65,1.25,0.85} \right]^{{\text{T}}}\), \(u(t)\) is a persistent excitation signal, and \(v(t)\) is white noise with variance \(\sigma_{v}^{2} = 0.6^{2}\). The proposed algorithm is compared with the SG, FOSG, and MISG methods. In the proposed algorithm and the FOSG method, the fractional order is set as \(\alpha = 1.2\); in the MISG method and the proposed algorithm, the multi-innovation length is set as p = 3. The identification results are shown in Figs. 1 and 2 and summarised in Table 1.
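A synthetic data set matching this setup can be generated as follows. The uniform excitation signal and the sample count are assumptions (the paper states only that \(u(t)\) is persistently exciting); the last two lines reuse the miafg and delta sketches given above.

```python
import numpy as np

rng = np.random.default_rng(0)               # fixed seed for reproducibility
T = 3000                                     # sample count (assumed)
a1, a2, b1, b2 = 1.5, -0.65, 1.25, 0.85      # true parameters of Eq. (51)
u = rng.uniform(-1.0, 1.0, T)                # assumed persistently exciting input
v = rng.normal(0.0, 0.6, T)                  # white noise, sigma_v = 0.6
y = np.zeros(T)
for t in range(2, T):
    y[t] = a1 * y[t - 1] + a2 * y[t - 2] + b1 * u[t] + b2 * u[t - 1] + v[t]

theta_hat = miafg(u, y, p=3, alpha=1.2)
print(delta(theta_hat, np.array([a1, a2, b1, b2])))
```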

Figure 1

Comparison of different identification methods. (a) iterative convergence error, (b) absolute error of each parameter identification.

Figure 2

Convergence speed and accuracy for each parameter.

Table 1 Comparison of identification results with different algorithms.

As indicated in Fig. 1 and Table 1, the proposed algorithm introduces an additional fractional gradient term that the existing approaches lack. Because the integer-order and fractional-order gradients identify the model parameters simultaneously, the algorithm achieves higher convergence speed and accuracy and a smaller parameter identification error. Figure 2 shows that the multi-innovation additional fractional-order gradient descent algorithm can promptly and accurately identify four different unknown parameters, highlighting its effectiveness.

Subsequently, we examine the influence of the multi-innovation length p on the algorithm performance. To this end, p is set as 1, 3, 5, and 7, with the fractional order fixed at \(\alpha = 1.2\). The identification results are shown in Figs. 3 and 4 and summarised in Table 2.

Figure 3

Comparison of estimation performance under different multi-innovation lengths. (a) iterative convergence error, (b) absolute error of each parameter identification.

Figure 4

Convergence speed and accuracy for each parameter.

Table 2 Comparison of identification results with different multi-innovation lengths.

Figures 3 and 4 and Table 2 show that as the multi-innovation length increases, the proposed algorithm exhibits faster convergence and higher parameter identification accuracy. This improvement is attributable to the additional fractional gradient descent algorithm synchronously extending the single innovations of the fractional and integer gradients into a multi-innovation matrix, thereby enhancing data utilisation. When the single innovation (p = 1) is extended to a multi-innovation matrix (p = 7), the evaluation index \(\delta\) decreases from 0.1550 to 0.0196 after 2000 iterations.

Next, we examine the influence of different fractional orders on the identification speed and accuracy of the proposed algorithm. To this end, we set p = 3 and the fractional order as \(\alpha = 0.5,0.8,1.2,1.5\). The identification results are shown in Figs. 5 and 6 and summarised in Table 3.

Figure 5

Comparison of estimation performance under different fractional orders. (a) iterative convergence error, (b) absolute error of each parameter identification.

Figure 6

Convergence speed and accuracy for each parameter.

Table 3 Comparison of identification results with different fractional orders.

It can be seen from Figs. 5 and 6 and Table 3 that as the fractional order increases, the parameter identification error gradually decreases. Moreover, owing to the presence of both the integer-order and fractional-order gradients, the proposed algorithm maintains superior convergence speed and accuracy compared with the integer-order gradient descent algorithm, even when the fractional order is small. For instance, when \(\alpha = 0.5\), the identification error (\(\delta = 0.0286\)) is still smaller than that of the SG algorithm (\(\delta = 0.2241\) in Table 1). This outcome further verifies the effectiveness of the proposed algorithm.

Experiment

The experimental analysis is conducted using a 3-DOF gyroscope system, which is shown in Fig. 7. The system consists of a gyroscope, a motor, a data acquisition card, and a power amplifier. The computer controls the input motor torque and transmits the signal to the data acquisition card through the Quarc 2020 real-time control software. The power amplifier amplifies the signal and applies it to the motor to precisely control the attitude angle of the gyroscope.

Figure 7

The 3-DOF gyroscope experimental device.

The 3-DOF gyroscope model can be expressed as follows

$$ y_{e} (t) = a_{1} y(t - 1) + a_{2} y(t - 2) + b_{1} u(t - 1) $$
(52)

The model error is evaluated using the following index

$$ {\text{Error}} = \sum\limits_{t = 0}^{{T_{f} }} {\left| {y_{{\text{e}}} (t) - y_{id} (t)} \right|} $$
(53)

where \(y_{id} (t)\) is the identified system output and \(T_{f}\) is the termination time of the system operation.
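Assuming the measured and identified outputs are sampled at the same instants, Eq. (53) reduces to a cumulative absolute deviation; a minimal sketch (function name ours):

```python
import numpy as np

def model_error(y_exp, y_id):
    # cumulative absolute deviation between the measured output y_e(t)
    # and the identified output y_id(t), as in Eq. (53)
    return float(np.sum(np.abs(np.asarray(y_exp) - np.asarray(y_id))))
```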

To determine the system input and output, the Quarc database module in MATLAB/Simulink is used to build the system input and output measurement units. The data collection step length is 0.004 s, and the collection period is 50 s. The proposed algorithm is compared with the SG, FOSG36, and MISG19 methods, with p = 5 and the fractional order set as \(\alpha = 1.2\). The experimental data and identification results are shown in Figs. 8, 9 and 10 and summarised in Tables 4 and 5.

Figure 8

The experimental data and identified system. (a) System input and output data, (b) system identified by the proposed algorithm.

Figure 9

Identification errors with different methods.

Figure 10

Identification results with different fractional orders.

Table 4 Identification results with different methods.
Table 5 Identification results with different fractional orders.

Figure 8 shows that the proposed algorithm can accurately identify the parameters of the 3-DOF gyroscope system. It can be seen from Fig. 9 and Table 4 that, compared with the SG, FOSG, and MISG algorithms, the proposed algorithm has the smallest identification error and can more accurately describe the dynamic characteristics of the gyroscope system. Figure 10 and Table 5 show that the identification results remain essentially consistent across different values of \(\alpha\), which further verifies that the proposed algorithm offers high flexibility in engineering applications.

Conclusion

This paper introduces an additional fractional gradient descent identification algorithm based on the multi-innovation principle for ARX systems. The proposed algorithm incorporates a fractional-order gradient along with the integer-order gradient and leverages the flexibility of fractional-order calculus to improve the convergence speed. Additionally, it extends single innovations to multi-innovation matrices, further enhancing data utilisation and parameter identification accuracy. The convergence of the algorithm is analysed, and its effectiveness is verified through a simulation and an experiment. The results show that the proposed approach outperforms the SG, FOSG, and MISG algorithms in terms of convergence speed and accuracy, and that the parameter identification accuracy and convergence speed improve with increasing multi-innovation length and fractional order. A limitation is that the fractional order \(\alpha\) of the current algorithm is fixed. Future work will consider adaptive fractional-order selection and extend the adaptive multi-innovation additional fractional gradient descent identification algorithm to nonlinear and time-delay systems.