Introduction

The design of fractional adaptive algorithms based on fractional calculus has become an important and emerging research field during the last decade for signal processing, control systems and chaotic systems1,2,3,4,5,6. Fractional calculus is a significant branch of mathematics that extends derivatives and integrals to fractional orders, and it has been applied in many different fields of science, technology and engineering7,8,9,10. Researchers have focused on fractional least mean square (FLMS) algorithms and exploited them to solve diverse parameter estimation problems by applying the theories of fractional calculus, for instance in signal processing11, image denoising12, circuit analysis13, neural networks14, chaotic systems15, fractional system identification16,17, power signals18,19, robotics20, epidemic models21 and recommender systems22,23. Several new fractional adaptive algorithms have been proposed to improve the accuracy and convergence rate and to reduce the complexity of fractional system identification. Chaudhary et al. provided normalized fractional adaptive algorithms for the parameter estimation of control autoregressive autoregressive systems and nonlinear control autoregressive systems24,25. Yin et al. proposed a novel orthogonalized fractional order filtered-x normalized least mean squares algorithm that uses an orthogonal transform and a fractional order switch mechanism to obtain precise parameters of the feedforward controller and to enhance the convergence speed26. Chaudhary et al. designed a class of novel sign fractional adaptive algorithms for nonlinear Hammerstein systems by applying the sign function to the input data corresponding to the first-order term, the fractional-order term, or both27. Chaudhary et al. also presented an innovative fractional order least mean square algorithm for effective parameter estimation of power signals by introducing high values of the fractional order28. Liu et al. proposed a quasi fractional order gradient descent method with adaptive stepsize to solve high-dimensional function optimization problems29. Wang et al. proposed a fast fractional order gradient descent method with variable initial value that ensures convergence to the actual extremum30.

The research community has shown great interest in BIP systems and GBIP systems over the last few decades and has proposed various approaches for parameter estimation of these complicated nonlinear systems. The special peculiarity of BIP systems is that the system output is a linear function of any one of the parameter vectors when the other parameter vectors are fixed. BIP systems can characterize many block-oriented cascaded nonlinear systems: Hammerstein systems, Wiener systems and Hammerstein–Wiener systems can be transformed into the form of generalized bilinear-in-parameter systems31,32. For example, Hammerstein impulse response systems can be conveniently described as bilinear-in-parameter systems when the nonlinearity is expressed as a linear combination of known basis functions33. BIP systems can describe the nonlinearity with relatively low computational complexity, so they are widely applied in many areas of scientific research and engineering practice34,35, and several new identification methods have been reported in this field. Ding et al. derived a filtering based recursive least squares algorithm for BIP systems by using the hierarchical identification principle36. Wang and Ding presented a filtering based multi-innovation stochastic gradient algorithm for interactively identifying each linear regressive sub-model of the bilinear-in-parameter system with ARMA noise37. Wang et al. discussed a hierarchical least squares algorithm based on the model decomposition principle for estimating the parameters of BIP systems, and analyzed the convergence of the proposed algorithm using the stochastic martingale theory38. The output nonlinear system can be formulated as a GBIP system36,37,38, and Hammerstein systems, Wiener systems and Hammerstein–Wiener systems can also be transformed into the GBIP form39,40,41.

Recently, the hierarchical identification principle and the multi-innovation theory have been applied to the effective fractional parameter identification of nonlinear systems. Zhu et al. formulated auxiliary model and hierarchical normalized fractional adaptive algorithms for parameter estimation of BIP systems by using the auxiliary model idea and the hierarchical identification principle42. Chaudhary et al. proposed a fractional hierarchical gradient descent algorithm, generalizing the standard hierarchical gradient descent algorithm to fractional order, for effectively estimating the parameters of nonlinear control autoregressive systems under different fractional orders and noise conditions43. Chaudhary et al. also presented a multi-innovation fractional LMS algorithm for parameter estimation of input nonlinear control autoregressive systems by increasing the length of the innovation vector44. For the GBIP system, the hierarchical fractional adaptive method cannot completely separate the two parameter vectors \(a\) and \(b\); by using the key term separation principle, the two parameter vectors not only can be completely separated, but the system output can also be represented as a linear combination of the unknown parameter vectors of the system. The fractional hierarchical gradient descent algorithm only uses the fractional order derivative, whereas the key term separation based hierarchical fractional least mean square (K-HFLMS) algorithm also uses the integer order derivative, so it can give more accurate estimation results. Multi-innovation hierarchical fractional adaptive strategies can make full use of the data in the dynamical data window to refresh the parameter estimates of the system, so they are more effective than the traditional hierarchical fractional adaptive approaches for parameter identification of nonlinear control systems.

Based on the work in42,43,44, this study proposes the K-HFLMS algorithm and the key term separation based multi-innovation hierarchical fractional least mean square (K-MHFLMS) algorithm for parameter estimation of the GBIP system by using the hierarchical identification principle and the multi-innovation theory. The performance of the K-MHFLMS strategy is analyzed in comparison with the K-HFLMS strategy for different noise variances, fractional orders and innovation lengths. The main contributions of this study are given below.

  1.

    A K-HFLMS algorithm is designed by generalizing the standard hierarchical stochastic gradient algorithm to a fractional-order method based on the key term separation principle and the hierarchical identification principle for more accurate parameter estimates of the GBIP system.

  2.

    A K-MHFLMS algorithm is developed based on the multi-innovation theory in order to accelerate the convergence speed of the proposed hierarchical fractional adaptive algorithm.

  3.

    The effectiveness of the two proposed fractional algorithms is validated through simulation experiments with different values of the noise variance, the fractional order and the length of the innovation vector.

  4.

    The superior performance of the proposed multi-innovation hierarchical fractional adaptive strategy over the hierarchical fractional adaptive approach is demonstrated through accurate parameter estimation based on the Fitness metric and the average predicted output error.

The rest of this study is outlined as follows: an identification model based on the key term separation principle is briefly described for the GBIP system in Section "Generalized bilinear-in-parameter system model". Section "Hierarchical stochastic gradient identification algorithm" presents a hierarchical stochastic gradient identification algorithm for the GBIP system using the hierarchical identification principle. Section "Hierarchical fractional least mean square identification algorithm" discusses the K-HFLMS algorithm for parameter estimation of the identification model. Section "Multi-innovation hierarchical fractional least mean square identification algorithm" provides the K-MHFLMS algorithm based on the multi-innovation theory for the GBIP system. Section "Simulation experimentation and results discussion" presents the results of simulation experiments with the two algorithms along with a detailed comparative discussion based on different performance metrics. Concluding remarks and some potential research directions in the relevant fields of engineering, science and technology are given in Section "Conclusions".

Generalized bilinear-in-parameter system model

Consider the GBIP system with ARMA noise whose block diagram is shown in Fig. 1; the mathematical expression for the GBIP system is given as below36,37,38,39:

$$y(t) = {\varvec{a}}^{T} {\varvec{F}}(t){\varvec{b}} + {{\varvec{\Phi}}}^{T} (t){{\varvec{\rho}}} + \frac{D(z)}{{C(z)}}v(t),$$
(1)

where \(y(t) \in R\) is the system output, \({\varvec{F}}(t) \in R^{{n_{a} \times n_{b} }}\) is the known information matrix consisting of the input measurement data, \({{\varvec{\Phi}}}(t) \in R^{{n_{\rho } }}\) is the known information vector, \(v(t) \in R\) is the unknown white noise with zero mean and variance \(\sigma^{2}\), and the polynomials \(C(z)\) and \(D(z)\) are defined in terms of the unit backward shift operator \(z^{ - 1}\) \([z^{ - 1} y(t) = y(t - 1)]\) as \(C(z) = 1 + \sum\limits_{i = 1}^{{n_{c} }} {c_{i} z^{ - i} }\), \(D(z) = 1 + \sum\limits_{i = 1}^{{n_{d} }} {d_{i} z^{ - i} }\). The vectors \(\varvec{a}= [a_{1} ,a_{2} , \cdots ,a_{{n_{a} }} ]^{T} \in R^{{n_{a} }}\), \(\varvec{b}= [b_{1} ,b_{2} , \cdots ,b_{{n_{b} }} ]^{T} \in R^{{n_{b} }}\), \(\varvec{\rho}= [\rho_{1} ,\rho_{2} , \cdots ,\rho_{{n_{\rho } }} ]^{T} \in R^{{n_{\rho } }}\), \(\varvec{c}= [c_{1} ,c_{2} , \cdots ,c_{{n_{c} }} ]^{T} \in R^{{n_{c} }}\) and \(\varvec{d}= [d_{1} ,d_{2} , \cdots ,d_{{n_{d} }} ]^{T} \in R^{{n_{d} }}\) are the unknown parameter vectors to be identified. Assume that the orders \(n_{a} ,n_{b} ,n_{\rho } ,n_{c} ,n_{d}\) are known.

Fig. 1. Block diagram of the GBIP model.

Define the information matrix:

$${\varvec{F}}(t) = \left[{\varvec{f}}(u(t - 1)),{\varvec{f}}(u(t - 2)), \cdots ,{\varvec{f}}(u(t - n_{b} )) \right] \in R^{{n_{a} \times n_{b} }} ,$$
(2)

where \({\varvec{f}}(u(t - i)) = \left[f_{1} (u(t - i)),f_{2} (u(t - i)), \cdots ,f_{{n_{a} }} (u(t - i)) \right]^{T} \in R^{{n_{a} }} ,\;i = 1,2, \cdots ,n_{b}\).

Using the auxiliary model identification idea45,46, the intermediate variable can be defined as:

$$w(t) = \frac{D(z)}{{C(z)}}v(t),$$
(3)

where the unknown colored disturbance \(w(t) \in R\) is an autoregressive moving average (ARMA) process. Thus Eq. (3) can be formulated as:

$$w(t) = [1 - C(z)]w(t) + D(z)v(t)$$
(4)
$$= - c_{1} w(t - 1) - c_{2} w(t - 2) - \cdots - c_{{n_{c} }} w(t - n_{c} ) + d_{1} v(t - 1) + d_{2} v(t - 2) + \cdots + d_{{n_{d} }} v(t - n_{d} ) + v(t)$$
$$= {{\mathbf{\Omega}}}^{T} (t){{\varvec{\theta}}} + v(t),$$
(5)

where \({{\mathbf{\Omega}}}(t) = [ - w(t - 1), - w(t - 2), \cdots , - w(t - n_{c} ),v(t - 1),v(t - 2), \cdots ,v(t - n_{d} )]^{T} \in R^{{n_{c} + n_{d} }}\) is the unknown noise information vector and \(\varvec{\theta}\) \(= [c_{1} ,c_{2} , \cdots ,c_{{n_{c} }} ,d_{1} ,d_{2} , \cdots ,d_{{n_{d} }} ]^{T} \in R^{{n_{c} + n_{d} }}\) is the unknown noise parameter vector; Eq. (5) is the noise identification model. From Fig. 1, defining the polynomial \(G(z) = \sum\limits_{i = 1}^{{n_{b} }} {b_{i} z^{ - i} }\) and the unknown inner variable \(\overline{u}(t) = \sum\limits_{i = 1}^{{n_{a} }} {a_{i} f_{i} (u(t))} = {\varvec{f}}^{T} (u(t))\,{\varvec{a}}\), Eq. (1) can be rewritten as:

$$y(t) = {\varvec{a}}^{{\varvec{T}}} {\varvec{F}}(t){\varvec{b}} + {{\varvec{\Phi}}}^{T} (t){{\varvec{\rho}}} + w(t)$$
$$= {\varvec{a}}^{{\varvec{T}}} {\varvec{F}}(t){\varvec{b}} + {{\varvec{\Phi}}}^{T} (t){{\varvec{\rho}}} + {{\varvec{\Omega}}}^{T} (t){{\varvec{\theta}}} + v(t)$$
(6)
$$= G(z)\overline{u}(t) + {{\varvec{\Phi}}}^{T} (t){{\varvec{\rho}}} + {{\varvec{\Omega}}}^{T} (t){{\varvec{\theta}}} + v(t)$$
(7)
$$= \sum\limits_{i = 1}^{{n_{b} }} {b_{i} } \overline{u}(t - i) + {{\varvec{\Phi}}}^{T} (t){{\varvec{\rho}}} + {{\varvec{\Omega}}}^{T} (t){{\varvec{\theta}}} + v(t).$$
(8)

Using the key term separation principle47,48, \(\overline{u}(t - 1)\) is chosen as the key term. Letting \(b_{1} = 1\), we have:

$$y(t) = \sum\limits_{i = 1}^{{n_{a} }} {a_{i} f_{i} (u(t - 1))} + \sum\limits_{i = 2}^{{n_{b} }} {b_{i} \overline{u}(t - i)} + {{\varvec{\Phi}}}^{T} (t){{\varvec{\rho}}} + {{\varvec{\Omega}}}^{T} (t){{\varvec{\theta}}} + v(t)$$
(9)
$$= {\varvec{f}}^{T} (u(t - 1)){\varvec{a}} + {{\varvec{\Psi}}}^{T} (t){{\varvec{\lambda}}} + {{\varvec{\Phi}}}^{T} (t){{\varvec{\rho}}} + {{\varvec{\Omega}}}^{T} (t){{\varvec{\theta}}} + v(t),$$
(10)

where the unknown information vector and the unknown parameter vector are defined as \({{\varvec{\Psi}}}(t) = [\overline{u}(t - 2),\overline{u}(t - 3), \cdots ,\overline{u}(t - n_{b} )]^{T} \in R^{{n_{\lambda } }}\) and \({{\varvec{\lambda}}} = [b_{2} , \cdots ,b_{{n_{b} }} ]^{T} = [\lambda_{1} , \cdots ,\lambda_{{n_{\lambda } }} ]^{T} \in R^{{n_{\lambda } }}\), respectively, with \(n_{\lambda } = n_{b} - 1\).

Equation (10) is the identification model of the GBIP system given in Fig. 1. The objective of this paper is to propose new fractional gradient identification algorithms based on the key term separation principle for estimating parameter vectors \({\varvec{a}},{{\varvec{\lambda}}},{{\varvec{\rho}}}\) and \({{\varvec{\theta}}}\) using the measured data.
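To make the identification model (10) concrete, the following minimal Python sketch generates data from it using the structure of the simulation example presented later (basis functions \(f_{1}(u) = u\), \(f_{2}(u) = u^{2}\) and \({{\varvec{\Phi}}}(t) = [1,u(t - 1)]^{T}\)); the variable names and noise level are illustrative assumptions, not the authors' reference code.

```python
import numpy as np

rng = np.random.default_rng(0)

# True parameters of the hypothetical example (b1 = 1 by key term separation)
a = np.array([0.80, 1.40])            # basis-function coefficients
lam = np.array([0.40, 1.90])          # lambda = [b2, b3]
rho = np.array([3.00, -2.00])
c1, d1 = 0.65, -1.20                  # C(z) = 1 + c1*z^-1, D(z) = 1 + d1*z^-1

N = 1000
u = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), N)   # zero-mean, unit-variance input
v = rng.normal(0.0, 0.5, N)                       # white noise v(t)

f = lambda x: np.array([x, x**2])                 # basis functions f1, f2
ubar = np.array([f(x) @ a for x in u])            # inner variable ubar(t) = f(u(t))^T a

w = np.zeros(N)
y = np.zeros(N)
for t in range(3, N):
    w[t] = -c1 * w[t - 1] + d1 * v[t - 1] + v[t]          # ARMA noise, Eq. (5)
    y[t] = (f(u[t - 1]) @ a                                # key term of Eq. (10)
            + lam[0] * ubar[t - 2] + lam[1] * ubar[t - 3]  # remaining b-terms
            + rho[0] + rho[1] * u[t - 1]                   # Phi(t)^T rho
            + w[t])
```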

Hierarchical stochastic gradient identification algorithm

The hierarchical stochastic gradient algorithm decomposes the system into several sub-systems with smaller dimensions and fewer variables, thus reducing the complexity of the system and making the hierarchical stochastic gradient algorithm more efficient than its standard counterparts. Using the hierarchical identification principle49,50, four intermediate variables are defined as:

$$y_{a} (t) = y(t) - {{\varvec{\Psi}}}^{T} (t){{\varvec{\lambda}}} - {{\varvec{\Phi}}}^{T} (t){{\varvec{\rho}}} - {{\varvec{\Omega}}}^{T} (t){{\varvec{\theta}}},$$
(11)
$$y_{\lambda } (t) = y(t) - {\varvec{f}}^{T} (u(t - 1)){\varvec{a}} - {{\varvec{\Phi}}}^{T} (t){{\varvec{\rho}}} - {{\varvec{\Omega}}}^{T} (t){{\varvec{\theta}}},$$
(12)
$$y_{\rho } (t) = y(t) - {\varvec{f}}^{T} (u(t - 1)){\varvec{a}} - {{\varvec{\Psi}}}^{T} (t){{\varvec{\lambda}}} - {{\varvec{\Omega}}}^{T} (t){{\varvec{\theta}}},$$
(13)
$$y_{\theta } (t) = y(t) - {\varvec{f}}^{T} (u(t - 1)){\varvec{a}} - {{\varvec{\Psi}}}^{T} (t){{\varvec{\lambda}}} - {{\varvec{\Phi}}}^{T} (t){{\varvec{\rho}}}.$$
(14)

Then Eq. (10) can be decomposed into four subsystems as below:

$$y_{a} (t) = {\varvec{f}}^{T} (u(t - 1)){\varvec{a}} + v(t),$$
(15)
$$y_{\lambda } (t) = {{\varvec{\Psi}}}^{T} (t){{\varvec{\lambda}}} + v(t),$$
(16)
$$y_{\rho } (t) = {{\varvec{\Phi}}}^{T} (t){{\varvec{\rho}}} + v(t),$$
(17)
$$y_{\theta } (t) = {{\varvec{\Omega}}}^{T} (t){{\varvec{\theta}}} + v(t).$$
(18)

Define the quadratic cost functions according to Eqs. (15)–(18):

$$J_{a} ({\varvec{a}}):\; = \left\| {y_{a} (t) - {\varvec{f}}^{T} (u(t - 1)){\varvec{a}}} \right\|^{2} ,$$
(19)
$$J_{\lambda } ({{\varvec{\lambda}}}):\; = \left\| {y_{\lambda } (t) - {{\varvec{\Psi}}}^{T} (t){{\varvec{\lambda}}}} \right\|^{2} ,$$
(20)
$$J_{\rho } ({{\varvec{\rho}}}):\; = \left\| {y_{\rho } (t) - {{\varvec{\Phi}}}^{T} (t){{\varvec{\rho}}}} \right\|^{2} ,$$
(21)
$$J_{\theta } ({{\varvec{\theta}}}):\; = \left\| {y_{\theta } (t) - {{\varvec{\Omega}}}^{T} (t){{\varvec{\theta}}}} \right\|^{2} .$$
(22)

Using the negative gradient search and minimizing \(J_{a} ({\varvec{a}}),J_{\lambda } ({{\varvec{\lambda}}}),J_{\rho } ({{\varvec{\rho}}})\) and \(J_{\theta } ({{\varvec{\theta}}})\), we obtain the following recursive algorithm for estimating \({\varvec{a}},{{\varvec{\lambda}}},{{\varvec{\rho}}}\) and \({{\varvec{\theta}}}\).

$$\begin{aligned} {\hat{\varvec{a}}}(t) = & {\hat{\varvec{a}}}(t - 1) - \frac{{\mu_{a} (t)}}{2}grad[J_{a} ({\hat{\varvec{a}}}(t - 1))] \\ = & {\hat{\varvec{a}}}(t - 1) + \mu_{a} (t){\varvec{f}}(u(t - 1)) \left[y_{a} (t) - {\varvec{f}}^{T} (u(t - 1)){\hat{\varvec{a}}}(t - 1) \right] \\ \end{aligned}$$
$$= {\hat{\varvec{a}}}(t - 1) + \mu_{a} (t){\varvec{f}}(u(t - 1)) \left[y(t) - {{\varvec{\Psi}}}^{T} (t){\hat{\varvec{\lambda }}}(t - 1) - {{\varvec{\Phi}}}^{T} (t){\hat{\varvec{\rho }}}(t - 1) - {{\varvec{\Omega}}}^{T} (t){\hat{\varvec{\theta }}}(t - 1) - {\varvec{f}}^{T} (u(t - 1)){\hat{\varvec{a}}}(t - 1) \right]$$
$$= {\hat{\varvec{a}}}(t - 1) + \mu_{a} (t){\varvec{f}}(u(t - 1))e(t),$$
(23)
$$e(t) = y(t) - {{\varvec{\Psi}}}^{T} (t){\hat{\varvec{\lambda }}}(t - 1) - {{\varvec{\Phi}}}^{T} (t){\hat{\varvec{\rho }}}(t - 1) - {{\varvec{\Omega}}}^{T} (t){\hat{\varvec{\theta }}}(t - 1) - {\varvec{f}}^{T} (u(t - 1)){\hat{\varvec{a}}}(t - 1),$$
(24)
$$\mu_{a} (t) = \frac{1}{{r_{a}^{\varepsilon } (t)}},\frac{1}{2} < \varepsilon \le 1,$$
(25)
$$r_{a} (t) = r_{a} (t - 1) + \left\| {{\varvec{f}}(u(t - 1))} \right\|^{2} ,r_{a} (0) = 1,$$
(26)
$${\hat{\varvec{\lambda }}}(t) = {\hat{\varvec{\lambda }}}(t - 1) - \frac{{\mu_{\lambda } (t)}}{2}grad \left[J_{\lambda } ({\hat{\varvec{\lambda }}}(t - 1)) \right]$$
$$= {\hat{\varvec{\lambda }}}(t - 1) + \mu_{\lambda } (t){{\varvec{\Psi}}}(t) \left[y_{\lambda } (t) - {{\varvec{\Psi}}}^{T} (t){\hat{\varvec{\lambda }}}(t - 1) \right]$$
$$= {\hat{\varvec{\lambda }}}(t - 1) + \mu_{\lambda } (t){{\varvec{\Psi}}}(t) \left[y(t) - {\varvec{f}}^{T} (u(t - 1)){\hat{\varvec{a}}}(t - 1) - {{\varvec{\Phi}}}^{T} (t){\hat{\varvec{\rho }}}(t - 1) - {{\varvec{\Omega}}}^{T} (t){\hat{\varvec{\theta }}}(t - 1) - {{\varvec{\Psi}}}^{T} (t){\hat{\varvec{\lambda }}}(t - 1) \right]$$
$$= {\hat{\varvec{\lambda }}}(t - 1) + \mu_{\lambda } (t){{\varvec{\Psi}}}(t)e(t),$$
(27)
$$\mu_{\lambda } (t) = \frac{1}{{r_{\lambda }^{\varepsilon } (t)}},\frac{1}{2} < \varepsilon \le 1,$$
(28)
$$r_{\lambda } (t) = r_{\lambda } (t - 1) + \left\| {{{\varvec{\Psi}}}(t)} \right\|^{2} ,r_{\lambda } (0) = 1,$$
(29)
$${\hat{\varvec{\rho }}}(t) = {\hat{\varvec{\rho }}}(t - 1) - \frac{{\mu_{\rho } (t)}}{2}grad \left[J_{\rho } ({\hat{\varvec{\rho }}}(t - 1)) \right]$$
$$= {\hat{\varvec{\rho }}}(t - 1) + \mu_{\rho } (t){{\varvec{\Phi}}}(t) \left[y_{\rho } (t) - {{\varvec{\Phi}}}^{T} (t){\hat{\varvec{\rho }}}(t - 1) \right]$$
$$= {\hat{\varvec{\rho }}}(t - 1) + \mu_{\rho } (t){{\varvec{\Phi}}}(t) \left[y(t) - {\varvec{f}}^{T} (u(t - 1)){\hat{\varvec{a}}}(t - 1) - {{\varvec{\Psi}}}^{T} (t){\hat{\varvec{\lambda }}}(t - 1) - {{\varvec{\Omega}}}^{T} (t){\hat{\varvec{\theta }}}(t - 1) - {{\varvec{\Phi}}}^{T} (t){\hat{\varvec{\rho }}}(t - 1) \right]$$
$$= {\hat{\varvec{\rho }}}(t - 1) + \mu_{\rho } (t){{\varvec{\Phi}}}(t)e(t),$$
(30)
$$\mu_{\rho } (t) = \frac{1}{{r_{\rho }^{\varepsilon } (t)}},\frac{1}{2} < \varepsilon \le 1,$$
(31)
$$r_{\rho } (t) = r_{\rho } (t - 1) + \left\| {{{\varvec{\Phi}}}(t)} \right\|^{2} ,r_{\rho } (0) = 1,$$
(32)
$${\hat{\varvec{\theta }}}(t) = {\hat{\varvec{\theta }}}(t - 1) - \frac{{\mu_{\theta } (t)}}{2}grad \left[J_{\theta } ({\hat{\varvec{\theta }}}(t - 1)) \right]$$
$$= {\hat{\varvec{\theta }}}(t - 1) + \mu_{\theta } (t){{\varvec{\Omega}}}(t) \left[y_{\theta } (t) - {{\varvec{\Omega}}}^{T} (t){\hat{\varvec{\theta }}}(t - 1) \right]$$
$$= {\hat{\varvec{\theta }}}(t - 1) + \mu_{\theta } (t){{\varvec{\Omega}}}(t) \left[y(t) - {\varvec{f}}^{T} (u(t - 1)){\hat{\varvec{a}}}(t - 1) - {{\varvec{\Psi}}}^{T} (t){\hat{\varvec{\lambda }}}(t - 1) - {{\varvec{\Phi}}}^{T} (t){\hat{\varvec{\rho }}}(t - 1) - {{\varvec{\Omega}}}^{T} (t){\hat{\varvec{\theta }}}(t - 1) \right]$$
$$= {\hat{\varvec{\theta }}}(t - 1) + \mu_{\theta } (t){{\varvec{\Omega}}}(t)e(t),$$
(33)
$$\mu_{\theta } (t) = \frac{1}{{r_{\theta }^{\varepsilon } (t)}},\frac{1}{2} < \varepsilon \le 1,$$
(34)
$$r_{\theta } (t) = r_{\theta } (t - 1) + \left\| {{{\varvec{\Omega}}}(t)} \right\|^{2} ,r_{\theta } (0) = 1,$$
(35)

where \(\mu_{a} (t),\,\mu_{\lambda } (t),\,\mu_{\rho } (t)\) and \(\mu_{\theta } (t)\) are the first-order derivative learning rates, \(\varepsilon\) is the convergence index.

However, Eqs. (23), (24), (27), (29), (30), (33) and (35) contain the unmeasurable inner variables \(\overline{u}(t - i)\) in \({{\varvec{\Psi}}}(t)\) and \(w(t - i),v(t - i)\) in \({{\varvec{\Omega}}}(t)\), so the recursive algorithm in (23)–(35) cannot be implemented directly. These unknown variables are replaced with their corresponding estimates through the following relations:

$${\hat{\mathbf{\Psi }}}(t) = \left[\hat{\overline{u}}(t - 2),\hat{\overline{u}}(t - 3), \cdots ,\hat{\overline{u}}(t - n_{b} ) \right]^{T} ,$$
(36)
$$\hat{\overline{u}}(t - i) = \sum\limits_{j = 1}^{{n_{a} }} {\hat{a}_{j} (t - i)f_{j} (u(t - i))} = {\varvec{f}}^{T} (u(t - i)){\hat{\varvec{a}}}(t - i),$$
(37)
$${\hat{\mathbf{\Omega }}}(t) = \left[ - \hat{w}(t - 1), - \hat{w}(t - 2), \cdots , - \hat{w}(t - n_{c} ),\hat{v}(t - 1),\hat{v}(t - 2), \cdots ,\hat{v}(t - n_{d} ) \right]^{T} ,$$
(38)
$$\hat{w}(t) = y(t) - {\varvec{f}}^{T} (u(t - 1)){\hat{\varvec{a}}}(t) - {\hat{\mathbf{\Psi }}}^{T} (t){\hat{\varvec{\lambda }}}(t) - {{\varvec{\Phi}}}^{T} (t){\hat{\varvec{\rho }}}(t),$$
(39)
$$\hat{v}(t) = \hat{w}(t) - {{\varvec{\Omega}}}^{T} (t){\hat{\varvec{\theta }}}(t).$$
(40)

Thus, we obtain the hierarchical stochastic gradient (HSG) identification algorithm for parameter estimation of the GBIP system using the key term separation principle as follows:

$${\hat{\varvec{a}}}(t) = {\hat{\varvec{a}}}(t - 1){ + }\mu_{a} (t){\varvec{f}}(u(t - 1))e(t),$$
(41)
$$e(t) = y(t) - {\hat{\mathbf{\Psi }}}^{T} (t){\hat{\varvec{\lambda }}}(t - 1) - {{\varvec{\Phi}}}^{T} (t){\hat{\varvec{\rho }}}(t - 1) - {\hat{\mathbf{\Omega }}}^{T} (t){\hat{\varvec{\theta }}}(t - 1) - {\varvec{f}}^{T} (u(t - 1)){\hat{\varvec{a}}}(t - 1),$$
(42)
$$\mu_{a} (t) = \frac{1}{{r_{a}^{\varepsilon } (t)}},\frac{1}{2} < \varepsilon \le 1,$$
(43)
$$r_{a} (t) = r_{a} (t - 1) + \left\| {{\varvec{f}}(u(t - 1))} \right\|^{2} ,r_{a} (0) = 1,$$
(44)
$${\hat{\varvec{\lambda }}}(t) = {\hat{\varvec{\lambda }}}(t - 1) + \mu_{\lambda } (t){\hat{\mathbf{\Psi }}}(t)e(t),$$
(45)
$$\mu_{\lambda } (t) = \frac{1}{{r_{\lambda }^{\varepsilon } (t)}},\frac{1}{2} < \varepsilon \le 1,$$
(46)
$$r_{\lambda } (t) = r_{\lambda } (t - 1) + \left\| {{\hat{\mathbf{\Psi }}}(t)} \right\|^{2} ,r_{\lambda } (0) = 1,$$
(47)
$${\hat{\varvec{\rho }}}(t) = {\hat{\varvec{\rho }}}(t - 1) + \mu_{\rho } (t){{\varvec{\Phi}}}(t)e(t),$$
(48)
$$\mu_{\rho } (t) = \frac{1}{{r_{\rho }^{\varepsilon } (t)}},\frac{1}{2} < \varepsilon \le 1,$$
(49)
$$r_{\rho } (t) = r_{\rho } (t - 1) + \left\| {{{\varvec{\Phi}}}(t)} \right\|^{2} ,r_{\rho } (0) = 1,$$
(50)
$${\hat{\varvec{\theta }}}(t) = {\hat{\varvec{\theta }}}(t - 1) + \mu_{\theta } (t){\hat{\mathbf{\Omega }}}(t)e(t),$$
(51)
$$\mu_{\theta } (t) = \frac{1}{{r_{\theta }^{\varepsilon } (t)}},\frac{1}{2} < \varepsilon \le 1,$$
(52)
$$r_{\theta } (t) = r_{\theta } (t - 1) + \left\| {{\hat{\mathbf{\Omega }}}(t)} \right\|^{2} ,r_{\theta } (0) = 1,$$
(53)
$${\hat{\mathbf{\Psi }}}(t) = \left[\hat{\overline{u}}(t - 2),\hat{\overline{u}}(t - 3), \cdots ,\hat{\overline{u}}(t - n_{b} ) \right]^{T} ,$$
(54)
$$\hat{\overline{u}}(t - i) = \sum\limits_{j = 1}^{{n_{a} }} {\hat{a}_{j} (t - i)f_{j} (u(t - i))} = {\varvec{f}}^{T} (u(t - i)){\hat{\varvec{a}}}(t - i),$$
(55)
$${\hat{\mathbf{\Omega }}}(t) = \left[ - \hat{w}(t - 1), - \hat{w}(t - 2), \cdots , - \hat{w}(t - n_{c} ),\hat{v}(t - 1),\hat{v}(t - 2), \cdots ,\hat{v}(t - n_{d} ) \right]^{T} ,$$
(56)
$$\hat{w}(t) = y(t) - {\varvec{f}}^{T} (u(t - 1)){\hat{\varvec{a}}}(t) - {\hat{\mathbf{\Psi }}}^{T} (t){\hat{\varvec{\lambda }}}(t) - {{\varvec{\Phi}}}^{T} (t){\hat{\varvec{\rho }}}(t),$$
(57)
$$\hat{v}(t) = \hat{w}(t) - {{\varvec{\Omega}}}^{T} (t){\hat{\varvec{\theta }}}(t).$$
(58)
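A compact Python sketch of the HSG recursion (41)–(58), specialized to the example structure used later in the paper (\(n_{a} = 2\), \(n_{b} = 3\), \(n_{\rho } = 2\), \(n_{c} = 1\), \(n_{d} = 1\), with \(f_{1}(u) = u\), \(f_{2}(u) = u^{2}\) and \({{\varvec{\Phi}}}(t) = [1,u(t - 1)]^{T}\)), might look as follows; the function and variable names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def hsg_identify(y, u, eps=0.65):
    """HSG recursion (41)-(58) for the example structure; illustrative sketch."""
    N = len(y)
    f = lambda x: np.array([x, x**2])
    a_h = 1e-6 * np.ones(2); lam_h = 1e-6 * np.ones(2)
    rho_h = 1e-6 * np.ones(2); th_h = 1e-6 * np.ones(2)
    r = {"a": 1.0, "lam": 1.0, "rho": 1.0, "th": 1.0}      # r_*(0) = 1
    ubar_h = np.zeros(N); w_h = np.zeros(N); v_h = np.zeros(N)
    for t in range(3, N):
        fa = f(u[t - 1])
        Psi = np.array([ubar_h[t - 2], ubar_h[t - 3]])     # Eq. (54)
        Phi = np.array([1.0, u[t - 1]])
        Om = np.array([-w_h[t - 1], v_h[t - 1]])           # Eq. (56)
        e = y[t] - Psi @ lam_h - Phi @ rho_h - Om @ th_h - fa @ a_h   # Eq. (42)
        for key, vec, reg in (("a", a_h, fa), ("lam", lam_h, Psi),
                              ("rho", rho_h, Phi), ("th", th_h, Om)):
            r[key] += reg @ reg                            # Eqs. (44), (47), (50), (53)
            vec += (1.0 / r[key] ** eps) * reg * e         # Eqs. (41), (45), (48), (51)
        ubar_h[t] = f(u[t]) @ a_h                          # Eq. (55)
        w_h[t] = y[t] - fa @ a_h - Psi @ lam_h - Phi @ rho_h   # Eq. (57)
        v_h[t] = w_h[t] - Om @ th_h                        # Eq. (58)
    return a_h, lam_h, rho_h, th_h
```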

Hierarchical fractional least mean square identification algorithm

The better performance of hierarchical fractional gradient approaches over their conventional fractional counterparts motivates the design of a fractional variant that generalizes the hierarchical stochastic gradient algorithm to the fractional-order case. In this section, we present hierarchical fractional adaptive algorithms for the GBIP system by minimizing the quadratic cost functions (19)–(22) through the first-order derivative and the fractional derivative. Before presenting the fractional gradient algorithms, the basic concepts of the fractional derivative are introduced briefly. One of the most widely used definitions of the fractional derivative is the Riemann–Liouville (RL) definition; the RL derivative of fractional order \(\delta\) with lower terminal \(c\) of a function \(f(t)\) is given as51,52:

$$\left\{ \begin{gathered} {}_{c}^{RL} D_{t}^{\delta } f(t) = \frac{1}{\Gamma (n - \delta )}\frac{{d^{n} }}{{dt^{n} }}\int_{c}^{t} {(t - x)^{n - \delta - 1} f(x)dx} \hfill \\ t > c,n - 1 < \delta \le n \hfill \\ \end{gathered} \right..$$
(59)

The fractional order derivative of order \(\delta\) for the polynomial function \(f(t) = t^{n}\) is defined as53:

$${}_{c}D_{t}^{\delta } t^{n} = \frac{\Gamma (n + 1)}{{\Gamma (n - \delta + 1)}}t^{n - \delta } ,$$
(60)

where the known constant \(n\) is a positive integer and the symbol \(\Gamma\) denotes the gamma function:

$$\Gamma (t) = \int_{0}^{\infty } {e^{ - x} x^{t - 1} dx} .$$
(61)
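As a quick numerical check of Eqs. (60) and (61), the following snippet (a minimal illustration using Python's standard math.gamma) evaluates the \(\delta\)-order derivative of \(f(t) = t^{2}\); at \(\delta = 1\) it recovers the ordinary derivative.

```python
import math

def frac_deriv_power(n, delta, t):
    # Eq. (60): D^delta t^n = Gamma(n + 1) / Gamma(n - delta + 1) * t^(n - delta)
    return math.gamma(n + 1) / math.gamma(n - delta + 1) * t ** (n - delta)

print(frac_deriv_power(2, 1.0, 1.0))  # 2.0, the ordinary derivative of t^2 at t = 1
print(frac_deriv_power(2, 0.5, 1.0))  # ~1.505, the half-order derivative at t = 1
```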

The quadratic cost function for the GBIP model is written as24:

$$J(n) = E \left[\left| {e(n)} \right|^{2} \right].$$
(62)

The estimation error \(e(n)\) is the difference between actual response \(d(n)\) and the estimated response \(\hat{d}(n)\), and it is given as:

$$e(n) = d(n) - \hat{d}(n) = d(n) - {\hat{\mathbf{w}}}^{T} (n){\varvec{x}}(n),$$
(63)

where \({\hat{\mathbf{w}}}(n)\) represents the estimate of the weight vector \({\mathbf{w}}(n)\), \({\mathbf{w}}(n) = \left[w_{0} (n),w_{1} (n), \cdots ,w_{M - 1} (n) \right]^{T}\), and \({\varvec{x}}(n)\) represents the input vector, \({\varvec{x}}(n) = \left[x(n),x(n - 1), \cdots ,x(n - M + 1) \right]^{T}\).

Minimizing the quadratic cost function (62) by taking the first-order derivative and the fractional derivative with respect to \(\hat{w}_{k}\), the recursive update relation of the conventional fractional least mean square (FLMS) algorithm is written as54:

$$\hat{w}_{k} (n) = \hat{w}_{k} (n - 1) - \frac{{\mu_{1} }}{2}\frac{\partial J(n)}{{\partial \hat{w}_{k} }} - \frac{{\mu_{\delta } }}{2}\frac{{\partial^{\delta } J(n)}}{{\partial \hat{w}_{k}^{\delta } }},$$
(64)

where \(\mu_{1}\) and \(\mu_{\delta }\) are the learning rates for the first-order derivative term and the fractional derivative term, respectively, and \(k = 0,1, \cdots ,M - 1\).

Using (60) and (61), the fractional derivative of the cost function is given as:

$$\frac{{\partial^{\delta } J(n)}}{{\partial \hat{w}_{k}^{\delta } }} = - 2e(n)x(n - k)\frac{{\partial^{\delta } \hat{w}_{k} (n)}}{{\partial \hat{w}_{k}^{\delta } }}$$
(65)
$$= - 2e(n)x(n - k)\frac{1}{\Gamma (2 - \delta )}\hat{w}_{k}^{1 - \delta } (n).$$
(66)

Substituting (66) into (64) and assuming \(\hat{w}_{k}^{1 - \delta } (n) \cong \hat{w}_{k}^{1 - \delta } (n - 1)\), the adaptive weight updating equation of the FLMS algorithm is rewritten as below:

$$\hat{w}_{k} (n) = \hat{w}_{k} (n - 1) + \mu_{1} e(n)x(n - k) + \mu_{\delta } e(n)x(n - k)\frac{1}{\Gamma (2 - \delta )}\hat{w}_{k}^{1 - \delta } (n - 1).$$
(67)

The vector form of the FLMS algorithm for the GBIP system is given as:

$${\hat{\mathbf{w}}}(n) = {\hat{\mathbf{w}}}(n - 1) + \mu_{1} e(n){\varvec{x}}(n) + \frac{{\mu_{\delta } }}{\Gamma (2 - \delta )}e(n){\varvec{x}}(n) \circ \left| {{\hat{\mathbf{w}}}} \right|^{1 - \delta } (n - 1),$$
(68)

where the absolute value of the weight estimate \({\hat{\mathbf{w}}}(n)\) is taken to avoid complex entries, and the symbol \(\circ\) denotes element-by-element multiplication of the input vector \({\varvec{x}}(n)\) and \({\hat{\mathbf{w}}}(n)\).
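A minimal sketch of one step of the vector FLMS update (68), assuming a generic length-\(M\) filter with illustrative names; \(\Gamma (2 - \delta )\) is evaluated with Python's math.gamma.

```python
import numpy as np
from math import gamma

def flms_step(w_hat, x, d, mu1, mu_delta, delta):
    """One FLMS iteration, Eq. (68); w_hat and x are length-M vectors."""
    e = d - w_hat @ x                                     # estimation error, Eq. (63)
    frac = (mu_delta / gamma(2.0 - delta)) * e * x * np.abs(w_hat) ** (1.0 - delta)
    return w_hat + mu1 * e * x + frac, e
```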

Based on Eq. (68), we have:

$$\begin{aligned} {\hat{\varvec{a}}}(t) &= {\hat{\varvec{a}}}(t - 1) - \frac{{\mu_{a} (t)}}{2}\frac{{\partial J_{a} ({\hat{\varvec{a}}}(t - 1))}}{{\partial {\hat{\varvec{a}}}}} - \frac{{\mu_{\delta } }}{2}\frac{{\partial^{\delta } \left[J_{a} ({\hat{\varvec{a}}}(t - 1)) \right]}}{{\partial {\hat{\varvec{a}}}^{\delta } }} \\ &= {\hat{\varvec{a}}}(t - 1) + \mu_{a} (t)e(t){\varvec{f}}(u(t - 1)) + \frac{{\mu_{\delta } }}{\Gamma (2 - \delta )}e(t){\varvec{f}}(u(t - 1)) \circ \left| {{\hat{\varvec{a}}}} \right|^{1 - \delta } (t - 1).\\ \end{aligned}$$
(69)

To enhance the convergence of the proposed K-HFLMS algorithm, we set \(\mu_{a} (t) = \mu_{\delta }\); then Eq. (69) becomes:

$${\hat{\varvec{a}}}(t) = {\hat{\varvec{a}}}(t - 1) + \mu_{\delta } e(t){\varvec{f}}(u(t - 1)){ + }\frac{{\mu_{\delta } }}{\Gamma (2 - \delta )}e(t){\varvec{f}}(u(t - 1)) \circ \left| {{\hat{\varvec{a}}}} \right|^{1 - \delta } (t - 1).$$
(70)

Minimizing the quadratic cost functions (19)–(22) analogously, the K-HFLMS algorithm for the GBIP system is given as below:

$${\hat{\varvec{a}}}(t) = {\hat{\varvec{a}}}(t - 1) + \mu_{\delta } e(t){\varvec{f}}(u(t - 1)) \circ \left[1 + \frac{1}{\Gamma (2 - \delta )}\left| {{\hat{\varvec{a}}}} \right|^{1 - \delta } (t - 1) \right],$$
(71)
$$e(t) = y(t) - {\hat{\mathbf{\Psi }}}^{T} (t){\hat{\varvec{\lambda }}}(t - 1) - {{\varvec{\Phi}}}^{T} (t){\hat{\varvec{\rho }}}(t - 1) - {\hat{\mathbf{\Omega }}}^{T} (t){\hat{\varvec{\theta }}}(t - 1) - {\varvec{f}}^{T} (u(t - 1)){\hat{\varvec{a}}}(t - 1),$$
(72)
$${\hat{\varvec{\lambda }}}(t) = {\hat{\varvec{\lambda }}}(t - 1) + \mu_{\lambda } (t)e(t){\hat{\mathbf{\Psi }}}(t) + \frac{{\mu_{\delta } }}{\Gamma (2 - \delta )}e(t){\hat{\mathbf{\Psi }}}(t) \circ \left| {{\hat{\varvec{\lambda }}}} \right|^{1 - \delta } (t - 1),$$
(73)
$$\mu_{\lambda } (t) = \frac{1}{{r_{\lambda }^{\varepsilon } (t)}},\;\frac{1}{2} < \varepsilon \le 1,$$
(74)
$$r_{\lambda } (t) = r_{\lambda } (t - 1) + \left\| {{\hat{\mathbf{\Psi }}}(t)} \right\|^{2} ,r_{\lambda } (0) = 1,$$
(75)
$${\hat{\varvec{\rho }}}(t) = {\hat{\varvec{\rho }}}(t - 1) + \mu_{\rho } (t)e(t){{\varvec{\Phi}}}(t) + \frac{{\mu_{\delta } }}{\Gamma (2 - \delta )}e(t){{\varvec{\Phi}}}(t) \circ \left| {{\hat{\varvec{\rho }}}} \right|^{1 - \delta } (t - 1),$$
(76)
$$\mu_{\rho } (t) = \frac{1}{{r_{\rho }^{\varepsilon } (t)}},\;\frac{1}{2} < \varepsilon \le 1,$$
(77)
$$r_{\rho } (t) = r_{\rho } (t - 1) + \left\| {{{\varvec{\Phi}}}(t)} \right\|^{2} ,r_{\rho } (0) = 1,$$
(78)
$${\hat{\varvec{\theta }}}(t) = {\hat{\varvec{\theta }}}(t - 1) + \mu_{\theta } (t)e(t){\hat{\mathbf{\Omega }}}(t) + \frac{{\mu_{\delta } }}{\Gamma (2 - \delta )}e(t){\hat{\mathbf{\Omega }}}(t) \circ \left| {{\hat{\varvec{\theta }}}} \right|^{1 - \delta } (t - 1),$$
(79)
$$\mu_{\theta } (t) = \frac{1}{{r_{\theta }^{\varepsilon } (t)}},\;\frac{1}{2} < \varepsilon \le 1,$$
(80)
$$r_{\theta } (t) = r_{\theta } (t - 1) + \left\| {{\hat{\mathbf{\Omega }}}(t)} \right\|^{2} ,r_{\theta } (0) = 1,$$
(81)
$${\hat{\mathbf{\Psi }}}(t) = \left[\hat{\overline{u}}(t - 2),\hat{\overline{u}}(t - 3), \cdots ,\hat{\overline{u}}(t - n_{b} ) \right]^{T} ,$$
(82)
$$\hat{\overline{u}}(t - i) = \sum\limits_{j = 1}^{{n_{a} }} {\hat{a}_{j} (t - i)f_{j} (u(t - i))} = {\varvec{f}}^{T} (u(t - i)){\hat{\varvec{a}}}(t - i),$$
(83)
$${\hat{\mathbf{\Omega }}}(t) = \left[ - \hat{w}(t - 1), - \hat{w}(t - 2), \cdots , - \hat{w}(t - n_{c} ),\hat{v}(t - 1),\hat{v}(t - 2), \cdots ,\hat{v}(t - n_{d} ) \right]^{T} ,$$
(84)
$$\hat{w}(t) = y(t) - {\varvec{f}}^{T} (u(t - 1)){\hat{\varvec{a}}}(t) - {\hat{\mathbf{\Psi }}}^{T} (t){\hat{\varvec{\lambda }}}(t) - {{\varvec{\Phi}}}^{T} (t){\hat{\varvec{\rho }}}(t),$$
(85)
$$\hat{v}(t) = \hat{w}(t) - {\hat{\mathbf{\Omega }}}^{T} (t){\hat{\varvec{\theta }}}(t).$$
(86)

When \(\mu_{\delta } = 0\), the K-HFLMS algorithm (69) and (72)–(86) reduces to the hierarchical stochastic gradient (HSG) algorithm, and when \(\mu_{a} (t) = \mu_{\lambda } (t) = \mu_{\rho } (t) = \mu_{\theta } (t) = 0\), it reduces to the hierarchical fractional stochastic gradient algorithm based on the key term separation principle (K-HFSG)43.
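Relative to the HSG sketch given earlier, the K-HFLMS updates only add the fractional correction terms. One iteration over the four sub-estimates might look as follows; this is an illustrative sketch of Eqs. (71)–(81), and the dictionary keys and names are assumptions.

```python
import numpy as np
from math import gamma

def k_hflms_step(est, regs, e, r, eps, mu_delta, delta):
    """est and regs map 'a', 'lam', 'rho', 'th' to the current estimates and
    their regressors; r holds the r_*(t) accumulators."""
    g = 1.0 / gamma(2.0 - delta)
    for key, vec in est.items():
        reg = regs[key]
        frac = mu_delta * g * e * reg * np.abs(vec) ** (1.0 - delta)
        if key == "a":
            vec += mu_delta * e * reg + frac               # Eq. (71)
        else:
            r[key] += reg @ reg                            # Eqs. (75), (78), (81)
            vec += (1.0 / r[key] ** eps) * e * reg + frac  # Eqs. (73), (76), (79)
    return est, r
```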

Multi-innovation hierarchical fractional least mean square identification algorithm

In order to improve the identification accuracy and convergence speed of the K-HFLMS algorithm, the multi-innovation theory is introduced for fractional system identification55,56. The main idea consists in expanding the information vectors \({\varvec{f}}(u(t - 1)) \in R^{{n_{a} }}\) and \({\hat{\mathbf{\Psi }}}(t) \in R^{{n_{\lambda } }}\) to the information matrices \({{\varvec{\Gamma}}}(p,t) \in R^{{n_{a} \times p}}\) and \({\hat{\mathbf{\Pi }}}(p,t) \in R^{{n_{\lambda } \times p}}\), and the scalar innovation \(e(t) \in R\) to the innovation vector \({\varvec{E}}(p,t) \in R^{p}\), where \(p\) is the innovation length. Compared with the K-HFLMS algorithm, the multi-innovation algorithm in this section makes full use of not only the current data but also the previous data. The innovation vector is defined as:

$${\varvec{E}}(p,t) = \left[ \begin{gathered} e(t) \hfill \\ e(t - 1) \hfill \\ \quad \vdots \hfill \\ e(t - p + 1) \hfill \\ \end{gathered} \right],$$
(87)
$$= \left[ \begin{gathered} y(t) - {\hat{\mathbf{\Psi }}}^{T} (t){\hat{\varvec{\lambda }}}(t - 1) - {{\varvec{\Phi}}}^{T} (t){\hat{\varvec{\rho }}}(t - 1) - {\hat{\mathbf{\Omega }}}^{T} (t){\hat{\varvec{\theta }}}(t - 1) - {\varvec{f}}^{T} (u(t - 1)){\hat{\varvec{a}}}(t - 1) \hfill \\ y(t - 1) - {\hat{\mathbf{\Psi }}}^{T} (t - 1){\hat{\varvec{\lambda }}}(t - 1) - {{\varvec{\Phi}}}^{T} (t - 1){\hat{\varvec{\rho }}}(t - 1) - {\hat{\mathbf{\Omega }}}^{T} (t - 1){\hat{\varvec{\theta }}}(t - 1) - {\varvec{f}}^{T} (u(t - 2)){\hat{\varvec{a}}}(t - 1) \hfill \\ \quad \vdots \hfill \\ y(t - p + 1) - {\hat{\mathbf{\Psi }}}^{T} (t - p + 1){\hat{\varvec{\lambda }}}(t - 1) - {{\varvec{\Phi}}}^{T} (t - p + 1){\hat{\varvec{\rho }}}(t - 1) - {\hat{\mathbf{\Omega }}}^{T} (t - p + 1){\hat{\varvec{\theta }}}(t - 1) - {\varvec{f}}^{T} (u(t - p)){\hat{\varvec{a}}}(t - 1) \hfill \\ \end{gathered} \right].$$
(88)

Accordingly, the stacked output vector and the stacked information matrices are written as:

$${\varvec{Y}}(p,t) = \left[y(t),y(t - 1), \cdots ,y(t - p + 1) \right]^{T} \in R^{p} ,$$
(89)
$${{\varvec{\Gamma}}}(p,t) = \left[{\varvec{f}}(u(t - 1)),{\varvec{f}}(u(t - 2)), \cdots ,{\varvec{f}}(u(t - p)) \right] \in R^{{n_{a} \times p}} ,$$
(90)
$${\hat{\mathbf{\Pi }}}(p,t) = \left[{\hat{\mathbf{\Psi }}}(t),{\hat{\mathbf{\Psi }}}(t - 1), \cdots ,{\hat{\mathbf{\Psi }}}(t - p + 1) \right] \in R^{{n_{\lambda } \times p}} ,$$
(91)
$${{\varvec{\Theta}}}(p,t) = \left[{{\varvec{\Phi}}}(t),{{\varvec{\Phi}}}(t - 1), \cdots ,{{\varvec{\Phi}}}(t - p + 1) \right] \in R^{{n_{\rho } \times p}} ,$$
(92)
$${\hat{\mathbf{\Xi }}}(p,t) = \left[{\hat{\mathbf{\Omega }}}(t),{\hat{\mathbf{\Omega }}}(t - 1), \cdots ,{\hat{\mathbf{\Omega }}}(t - p + 1) \right] \in R^{{n_{\theta } \times p}} .$$
(93)

Then (88) can be rewritten as:

$${\mathbf{E}}(p,t) = {\mathbf{Y}}(p,t) - {\hat{\mathbf{\Pi }}}^{\text{T}} (p,t){\hat{\varvec{\lambda }}}(t - 1) - {{\varvec{\Theta}}}^{\text{T}} (p,t){\hat{\varvec{\rho }}}(t - 1) - {\hat{\mathbf{\Xi }}}^{\text{T}} (p,t){\hat{\varvec{\theta }}}(t - 1) - {{\varvec{\Gamma}}}^{\text{T}} (p,t){\hat{\varvec{a}}}(t - 1).$$
(94)

Then the K-MHFLMS algorithm for the GBIP system is given as:

$${\hat{\varvec{a}}}(t) = {\hat{\varvec{a}}}(t - 1) + \mu_{\delta } {{\varvec{\Gamma}}}(p,t){\mathbf{E}}(p,t) \circ \left[1 + \frac{1}{\Gamma (2 - \delta )}\left| {{\hat{\varvec{a}}}} \right|^{1 - \delta } (t - 1) \right],$$
(95)
$${\hat{\varvec{\lambda }}}(t) = {\hat{\varvec{\lambda }}}(t - 1) + \mu_{\delta } {\hat{\mathbf{\Pi }}}(p,t){\mathbf{E}}(p,t) \circ \left[1 + \frac{1}{\Gamma (2 - \delta )}\left| {{\hat{\varvec{\lambda }}}} \right|^{1 - \delta } (t - 1) \right],$$
(96)
$${\hat{\varvec{\rho }}}(t) = {\hat{\varvec{\rho }}}(t - 1) + \mu_{\delta } {{\varvec{\Theta}}}(p,t){\mathbf{E}}(p,t) \circ \left[1 + \frac{1}{\Gamma (2 - \delta )}\left| {{\hat{\varvec{\rho }}}} \right|^{1 - \delta } (t - 1) \right],$$
(97)
$${\hat{\varvec{\theta }}}(t) = {\hat{\varvec{\theta }}}(t - 1) + \mu_{\delta } {\hat{\mathbf{\Xi }}}(p,t){\mathbf{E}}(p,t) \circ \left[1 + \frac{1}{\Gamma (2 - \delta )}\left| {{\hat{\varvec{\theta }}}} \right|^{1 - \delta } (t - 1) \right],$$
(98)
$${\hat{\mathbf{\Psi }}}(t) = \left[\hat{\overline{u}}(t - 2),\hat{\overline{u}}(t - 3), \cdots ,\hat{\overline{u}}(t - n_{b} ) \right]^{T} ,$$
(99)
$$\hat{\overline{u}}(t - i) = \sum\limits_{j = 1}^{{n_{a} }} {\hat{a}_{j} (t - i)f_{j} (u(t - i))} = {\varvec{f}}^{T} (u(t - i)){\hat{\varvec{a}}}(t - i),$$
(100)
$${\hat{\mathbf{\Omega }}}(t) = \left[ - \hat{w}(t - 1), - \hat{w}(t - 2), \cdots , - \hat{w}(t - n_{c} ),\hat{v}(t - 1),\hat{v}(t - 2), \cdots ,\hat{v}(t - n_{d} ) \right]^{T} ,$$
(101)
$$\hat{w}(t) = y(t) - {\varvec{f}}^{T} (u(t - 1)){\hat{\varvec{a}}}(t) - {\hat{\mathbf{\Psi }}}^{T} (t){\hat{\varvec{\lambda }}}(t) - {{\varvec{\Phi}}}^{T} (t){\hat{\varvec{\rho }}}(t),$$
(102)
$$\hat{v}(t) = \hat{w}(t) - {\hat{\mathbf{\Omega }}}^{T} (t){\hat{\varvec{\theta }}}(t).$$
(103)

Obviously, the K-MHFLMS algorithm (89)–(103) can improve the parameter estimation performance by making full use of all the collected data. When \(p = 1\), the K-MHFLMS algorithm reduces to the K-HFLMS algorithm.
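An illustrative sketch of one K-MHFLMS iteration, Eqs. (95)–(98), given the stacked information matrices (90)–(93) and the innovation vector (94); all names are assumptions, not the authors' code.

```python
import numpy as np
from math import gamma

def k_mhflms_step(est, Gam, Pi, Th, Xi, E, mu_delta, delta):
    """est maps 'a', 'lam', 'rho', 'th' to the current estimates; Gam, Pi, Th, Xi
    are the n_*-by-p stacked information matrices and E is the p-innovation vector."""
    g = 1.0 / gamma(2.0 - delta)
    for vec, M in ((est["a"], Gam), (est["lam"], Pi),
                   (est["rho"], Th), (est["th"], Xi)):
        vec += mu_delta * (M @ E) * (1.0 + g * np.abs(vec) ** (1.0 - delta))  # Eqs. (95)-(98)
    return est
```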

Simulation experimentation and results discussion

In this section, simulation results are given for an example of the GBIP system using the proposed K-MHFLMS and K-HFLMS algorithms under different noise levels, fractional orders and innovation lengths. The graphical abstract of the proposed strategy is given in Fig. 2.

Fig. 2. Graphical abstract of the proposed strategy for the GBIP system.

The example

In order to verify the performance of the proposed fractional algorithms, the following parameters are considered in the GBIP system with \(n_{a} = 2\), \(n_{b} = 3\), \(n_{\lambda } = 2\), \(n_{\rho } = 2\), \(n_{c} = 1\) and \(n_{d} = 1\), where

$$\begin{aligned} {\varvec{F}}(t) &= \left[{\varvec{f}}(u(t - 1)),{\varvec{f}}(u(t - 2)),{\varvec{f}}(u(t - 3)) \right]\\ &= \left[ {\begin{array}{*{20}c} \begin{gathered} f_{1} (u(t - 1)) \hfill \\ f_{2} (u(t - 1)) \hfill \\ \end{gathered} & \begin{gathered} f_{1} (u(t - 2)) \hfill \\ f_{2} (u(t - 2)) \hfill \\ \end{gathered} & \begin{gathered} f_{1} (u(t - 3)) \hfill \\ f_{2} (u(t - 3)) \hfill \\ \end{gathered} \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} \begin{gathered} u(t - 1) \hfill \\ u^{2} (t - 1) \hfill \\ \end{gathered} & \begin{gathered} u(t - 2) \hfill \\ u^{2} (t - 2) \hfill \\ \end{gathered} & \begin{gathered} u(t - 3) \hfill \\ u^{2} (t - 3) \hfill \\ \end{gathered} \\ \end{array} } \right] \in R^{2 \times 3} \end{aligned},$$
$${\varvec{a}} = \left[ {a_{1} ,a_{2} } \right]^{T} = \left[ {0.80,1.40} \right]^{T} \in R^{2} ,\,{\varvec{b}} = \left[ {b_{1} ,b_{2} ,b_{3} } \right]^{T} = \left[ {1,0.40,1.90} \right]^{T} \in R^{3} ,\,{{\varvec{\Phi}}}(t) = [1,u(t - 1)]^{T} \in R^{2} ,$$
$${{\varvec{\rho}}} = [\rho_{1} ,\rho_{2} ]^{T} = \left[ {3.00, - 2.00} \right]^{T} \in R^{2} ,\,C(z) = 1 + c_{1} z^{ - 1} = 1 + 0.65z^{ - 1} ,\,D(z) = 1 + d_{1} z^{ - 1} = 1 - 1.20z^{ - 1} .$$

Then the parameter vector of GBIP system is written as:

$${{\varvec{\lambda}}} = [\lambda_{1} ,\lambda_{2} ]^{T} = [b_{2} ,b_{3} ]^{T} = [0.40,1.90]^{T} ,\,{{\varvec{\theta}}} = [c_{1} ,d_{1} ]^{T} = [0.65, - 1.20]^{T},$$
$${{\varvec{\Sigma}}}: = [{\varvec{a}}^{T} ,{{\varvec{\lambda}}}^{T} ,{{\varvec{\rho}}}^{T} ,{{\varvec{\theta}}}^{T} ]^{T} = [0.80,1.40,0.40,1.90,3.00, - 2.00,0.65, - 1.20]^{T} \in R^{8} .$$

In order to evaluate the performance of the proposed K-MHFLMS and K-HFLMS algorithms for the GBIP system, three performance metrics based on the estimation error are used, i.e., the fitness function (Fitness), the mean square error (MSE) and the Nash–Sutcliffe efficiency (NSE). They are respectively defined as follows57:

$${\text{Fitness}} :\; = \frac{{\left\| {{\hat{\mathbf{\Sigma }}} - {{\varvec{\Sigma}}}} \right\|}}{{\left\| {{\varvec{\Sigma}}} \right\|}},$$
$${\text{MSE}} :\; = {\text{mean}} ({\hat{\mathbf{\Sigma }}} - {{\varvec{\Sigma}}})^{2} ,$$
$${\text{NSE}} :\; = 1 - \left( {\frac{{{\text{mean}} ({\hat{\mathbf{\Sigma }}} - {{\varvec{\Sigma}}})^{2} }}{{{\text{mean}} [{{\varvec{\Sigma}}} - {\text{mean}} ({{\varvec{\Sigma}}})]^{2} }}} \right),$$

where \({\hat{\mathbf{\Sigma }}}(t)\) is the estimated parameter vector at the \(t\)th iteration of the proposed hierarchical fractional identification algorithms and \({{\varvec{\Sigma}}}\) is the actual parameter vector. For a perfect model, the optimal values of MSE and NSE are zero and one, respectively.
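The three metrics reduce to a few NumPy lines; a minimal helper (illustrative, with hypothetical names) is:

```python
import numpy as np

def metrics(sigma_hat, sigma):
    """Fitness, MSE and NSE of a parameter estimate against the true vector."""
    fitness = np.linalg.norm(sigma_hat - sigma) / np.linalg.norm(sigma)
    mse = np.mean((sigma_hat - sigma) ** 2)
    nse = 1.0 - mse / np.mean((sigma - np.mean(sigma)) ** 2)
    return fitness, mse, nse
```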

In the simulation experiments, a persistent excitation signal sequence \(\left\{ {u(t)} \right\}\) with zero mean and unit variance, following the uniform distribution, is taken as the input; the uncorrelated white noise disturbance sequence \(\left\{ {v(t)} \right\}\) is drawn from the normal distribution with zero mean and variance \(\sigma^{2}\); the autoregressive moving average noise \(w(t)\) is generated based on Eq. (5); and the system output \(y(t)\) is obtained according to Eq. (10).

The data length is \(10800\). Using the first \(L = 10000\) data and applying the proposed K-HFLMS and K-MHFLMS algorithms, we estimate the parameter vector \({{\varvec{\Sigma}}}\) of the GBIP model, and then use the remaining \(L_{r} = 800\) data from \(t = L + 1 = 10001\) to \(t = L + L_{r} = 10800\) to validate the effectiveness of the proposed multi-innovation hierarchical fractional algorithm. The fractional learning rate \(\mu_{\delta }\) is empirically selected as \(0.0001\). The convergence index is taken as \(\varepsilon = 0.65\). The robustness of the fractional adaptive algorithms is evaluated for the noise variances \(\sigma^{2} = 0.2^{2} ,0.5^{2} ,0.8^{2}\). The fractional order is examined for five different values, i.e., \(\delta = 0.4,0.7,1.0,1.2,1.4\). The effectiveness of the K-MHFLMS algorithm is verified for distinct innovation lengths, namely \(p = 1,2,3,4\). The proposed fractional algorithms are evaluated based on the Fitness and MSE criteria, and the results are shown in the tables and figures.

The effect of noise variances on the hierarchical fractional algorithms

The robustness of the K-HFLMS and K-MHFLMS strategies for the GBIP model is tested for three different noise variances \(\sigma^{2} = 0.2^{2} ,0.5^{2} ,0.8^{2}\), with the fractional order taken as \(\delta = 1.2\) and the innovation lengths of the K-MHFLMS algorithm taken as \(p = 2,3,4\); the results are presented in Fig. 3 and Tables 1 and 2 in terms of the MSE values and the Fitness values, respectively. From Fig. 3 and Tables 1 and 2, it can be seen that the proposed fractional adaptive algorithms are convergent and accurate for all the noise variances, while the accuracy of both the K-HFLMS and K-MHFLMS strategies increases with the noise variance. All the strategies provide better results for the higher noise variance, i.e., \(\sigma^{2} = 0.8^{2}\), so it is justifiable to take \(\sigma^{2} = 0.8^{2}\) in the later investigations.

Fig. 3. The estimation error curves versus \(t\) for different noise variances in the example.

Table 1 Comparative results of the K-HFLMS estimates and errors for the example with \(\sigma = 0.2,\,0.5,\,0.8\) and \(\delta = 1.2\).
Table 2 Comparative results of the K-MHFLMS \((p = 4)\) estimates and errors for the example with \(\sigma = 0.2,\,0.5,\,0.8\) and \(\delta = 1.2\).

The effect of fractional orders on the hierarchical fractional algorithms

The selection of the fractional order is significant but somewhat complicated for the parameter identification of fractional adaptive strategies. To choose an appropriate fractional order, we evaluate the proposed algorithms for five different values of the fractional order \(\delta = 0.4,0.7,1.0,1.2,1.4\), with the noise variance and the innovation lengths of the K-MHFLMS algorithm selected as \(\sigma^{2} = 0.8^{2}\) and \(p = 2,3,4\), respectively; the results are shown in Fig. 4 and Tables 3 and 4 in terms of the MSE values and the Fitness values as before. From Fig. 4 and Tables 3 and 4, we can conclude that the proposed fractional adaptive strategies are effective for all the fractional orders, and the accuracy of both the K-HFLMS and K-MHFLMS strategies increases as the fractional order increases from \(\delta = 0.4\) to \(\delta = 1.2\); however, from Fig. 4a,b, the behavior for \(\delta = 1.4\) is obviously different from those of the other fractional orders. It is therefore reasonable to take \(\delta = 1.2\) in the later investigations.

Fig. 4. The estimation error curves versus \(t\) for different fractional orders in the example.

Table 3 Comparative results of the K-HFLMS estimates and errors for the example with \(\delta = 0.4, 0.7, 1.0, 1.2, 1.4\) and \(\sigma = 0.8\).
Table 4 Comparative results of the K-MHFLMS \((p = 4)\) estimates and errors for the example with \(\delta = 0.4, 0.7, 1.0, 1.2, 1.4\) and \(\sigma = 0.8\).

The effect of innovation lengths on the multi-innovation hierarchical fractional algorithm

To further verify the effect of the innovation length on the proposed multi-innovation hierarchical fractional algorithm, we examine the proposed K-MHFLMS algorithm in detail for four different innovation lengths \(p = 1,2,3,4\). Four combinations of noise variance and fractional order are selected, namely \(\delta = 0.4\) (\(\sigma^{2} = 0.2^{2}\)) and \(\delta = 1.2\) (\(\sigma^{2} = 0.2^{2}\), \(\sigma^{2} = 0.5^{2}\), \(\sigma^{2} = 0.8^{2}\)). The results are presented in Fig. 5 and Tables 5 and 6 in terms of the Fitness values. It is clearly noticed that the error of K-MHFLMS (\(p = 4\)) is the smallest among the four multi-innovation hierarchical algorithms, and it provides better convergence and accuracy than K-HFLMS and K-MHFLMS (\(p = 2,3\)). This also verifies that the proposed K-MHFLMS algorithms with \(p = 2,3,4\) are more effective and more accurate than the K-HFLMS algorithm, which corresponds to \(p = 1\).

Fig. 5. The estimation error curves versus \(t\) for different innovation lengths in the example.

Table 5 Comparative results of the K-MHFLMS estimates and errors for the example with \(p = 1,2,3,4\), \(\sigma = 0.2\) and \(\delta = 0.4\).
Table 6 Comparative results of the K-MHFLMS estimates and errors for the example with \(p = 1,2,3,4\), \(\sigma = 0.8\) and \(\delta = 1.2\).

The comparative analysis of the proposed hierarchical fractional algorithms

We compare the proposed K-HFLMS algorithm with the K-HFSG algorithm based on the MSE for the example with \(\sigma^{2} = 0.2^{2} ,0.8^{2}\) and \(\delta = 0.4,1.2\); the comparative results are shown in Fig. 6. From Fig. 6, it is clearly observed that the performance of the K-HFLMS algorithm is better than that of the K-HFSG algorithm, and the K-HFLMS algorithm has a faster convergence rate than the K-HFSG algorithm. It is therefore concluded that the K-HFLMS adaptive method gives higher estimation precision than the K-HFSG adaptive method.

Fig. 6. MSE estimation error curves of the K-HFSG and K-HFLMS algorithms versus \(t\) for the example.

Model validation

Based on the parameters estimated at \(t = 10000\), the remaining \(800\) data from \(t = 10001\) to \(t = 10800\) are used to further validate the effectiveness of the multi-innovation hierarchical fractional algorithm.

The average predicted output error of the multi-innovation hierarchical fractional identification model is written as55:

$$E{\text{rror}}(L_{r} ) = \left[\frac{1}{{L_{r} }}\sum\limits_{{t = t_{0} + 1}}^{{t_{0} + L_{r} }} {\left[\hat{y}(t) - y(t) \right]^{2} } \right ]^{\frac{1}{2}} ,$$
(104)

where \(\hat{y}(t)\) is the output of the estimated model and \(y(t)\) is the actual model output. According to expression (104), the average prediction errors of the four multi-innovation algorithms with \(p = 1,2,3,4\) are calculated for \(\delta = 0.4\) (\(\sigma^{2} = 0.2^{2}\)) and \(\delta = 1.2\) (\(\sigma^{2} = 0.2^{2}\), \(\sigma^{2} = 0.5^{2}\), \(\sigma^{2} = 0.8^{2}\)), and the results are shown in Table 7. Table 7 further indicates that the average predicted output errors of the K-MHFLMS strategies decrease as the innovation length increases from \(p = 1\) to \(p = 4\), that the proposed multi-innovation fractional algorithm is effective, and that K-MHFLMS achieves more accurate estimation than K-HFLMS based on the average prediction errors. This also validates the previous conclusion, namely that the performance of the K-MHFLMS algorithms improves as the noise variance increases.
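Expression (104) is simply the root-mean-square prediction error over the \(L_{r}\) validation samples; a one-line illustrative helper is:

```python
import numpy as np

def avg_predicted_output_error(y_hat, y):
    """Eq. (104): RMS prediction error over the validation data."""
    return np.sqrt(np.mean((np.asarray(y_hat) - np.asarray(y)) ** 2))
```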

Table 7 Average predicted output error results of the K-MHFLMS for the example with \(p = 1,2,3,4\).

Conclusions

The K-HFLMS algorithm and the K-MHFLMS algorithm are proposed by combining fractional derivative concepts with the key term separation principle and the hierarchical identification principle for effective parameter estimation of the GBIP system.

The K-MHFLMS algorithm is generalized from the K-HFLMS strategy by exploiting the multi-innovation theory.

The proposed hierarchical fractional algorithms are convergent and accurate for the three noise variances, and better results are achieved for higher noise variance.

The designed hierarchical fractional strategies are effective and robust across the fractional orders, and relatively better results are found for the higher fractional orders.

Relatively better results are achieved for higher innovation lengths of the multi-innovation hierarchical algorithms, based on both the Fitness error and the average predicted output error.

The results of the simulation analyses demonstrate that the K-MHFLMS algorithms have more accurate and robust performance than the K-HFLMS counterparts.

In the future, more challenging parameter identification problems remain to be investigated; in particular, fractional adaptive strategies based on the filtering technique and the hierarchical identification principle would be important directions. The proposed multi-innovation hierarchical adaptive algorithm can be coupled with the filtering technique, the multi-stage estimation theory and the key term separation technique to solve more complicated parameter estimation problems of GBIP systems, CARMA systems, CARARMA systems, Hammerstein OEMA systems, finite impulse response MA systems, multi-input multi-output nonlinear systems and kernel systems58,59,60,61,62,63,64,65.