Abstract
The limited length of short series poses difficulties for classical prediction models, as each series may contain only partial information about the underlying pattern. A straightforward remedy is to concatenate these short series into longer ones to enhance model performance. However, this approach can introduce significant deviations from the original series, owing to substantial value differences at the junctions: direct concatenation of the short series disrupts the periodicity and regularity of the nearby data. This paper proposes a multi-series prediction model based on connecting series (hereafter referred to as connectors) and an Empirical Mode Decomposition (EMD) process. It first normalizes the dataset, and then introduces two types of connectors between the short series for concatenation: the linear interpolation (LIP) connector and the linear interpolation with random vibration (LRV) connector. The connected long series is then decomposed into sub-sequences using EMD. Experimental results show that, with the connectors, EMD yields sub-sequences that are better aligned with the characteristics of the original short series. Specifically, the LRV connector suits multi-series with periodic characteristics, while the LIP connector is more appropriate for series lacking such periodicity.
Introduction
Time series prediction has been applied in widespread domains, such as stock price prediction1, weather forecasting2, trajectory prediction of hurricanes or typhoons3, traffic flow prediction4, etc. Time series vary in sampling length and frequency, which challenges the training of prediction models. In time series prediction, a model acquires the knowledge (features) of the various regular characteristics inherent in the entire dataset, so the prediction model has to be trained with different approaches for different datasets due to the variation among the collected series. In datasets with long series, the model can be trained either independently on each series or collectively. However, for datasets with short series (within hundreds of samples), the common way is to train the model on multiple series in one session. This paper focuses on this kind of ‘multiple-series’ training of prediction models.
Many time series comprise a combination of an overall or long-term trend, distinct long- and short-period regularities, and irregular fluctuations5 arising from nonlinear and non-stationary behavior. This significantly elevates the feature complexity compared to stationary series, and consequently limits the predictive capacity. Several effective methods exist to separate the different periodic regularities and overall trends in a time series, such as Seasonal and Trend decomposition using LOESS (STL), the wavelet transform, Empirical Mode Decomposition (EMD), etc.5. There are also improved models such as RobustSTL6, Fast RobustSTL7, and the Seasonal-Trend-Dispersion decomposition (STD)5.
However, the aforementioned methods for decomposing periodic regularities and general trends of time series, including the EMD that we employ, are constrained to decompose a single long series at a time. Such methods have at least three limitations.
- The features learned from a single short series might not cover all the features of the dataset.
- The features learned from multiple short series might be inconsistent with each other.
- It is uneconomical to feed a model short series when it can accept longer ones.
Consequently, it is necessary to concatenate each individual series prior to decomposition, in order to apply these decomposition methods (such as EMD8) to multi-series data. During this concatenation process, two challenges should be taken into account:
- The value ranges differ among series, and when this difference is substantial, the inherent characteristics of change are significantly diminished after concatenation, consequently reducing the decomposed periodicity. This issue can be resolved by normalizing each series separately9.
- There is a discontinuity at the junction between concatenated series. Preliminary results show that the decomposition model generates an anomalous component at the junctions that deviates from regular patterns and disrupts nearby periodic regularities, whenever the variation at the junction between the concatenated series exceeds the variation between adjacent observations within each series.
To address the second challenge, this paper introduces a connector insertion method to generate sub-sequences that align with the original periodic regularities through the decomposition. It incorporates a connector between each pair of series to be concatenated, ensuring the continuity of the multi-series at the junctions and enhancing the consistency of sub-sequences with their original periodic regularity through decomposition. The contributions of the paper can be summarized as follows:
- It proposes two types of connectors for multi-series concatenation: the linear interpolation connector, abbreviated as LIP, and the linear interpolation superimposed with random vibrations, abbreviated as LRV.
- It introduces a framework consisting of four modules: data pre-processing, EMD, prediction via a long short-term memory (LSTM)10 mechanism, and output. The data pre-processing includes separate normalization and series concatenation. For prediction, it employs three methods: LSTM for all sub-sequences; LSTM with a temporal attention mechanism (LSTM-TA) for all sub-sequences; and LSTM-TA for some sub-sequences combined with LSTM for the others.
- It presents experiments on multiple multi-series datasets comparing both types of connectors with direct concatenation, and identifies the appropriate scope of each connector. The LRV connector suits series with obvious periodic characteristics, while the simple LIP connector is more appropriate where such characteristics are absent.
The rest of the paper is organized as follows: section "Preliminaries" presents the preliminary definitions of EMD and the backbone of the method; section "Methodology" describes the method with the general framework, the settings of the two types of connectors, and the insertion details; section "Experiments" shows the experimental results of prediction with connectors; section "Related works" surveys time series prediction methods and series decomposition methods; and section "Conclusion" concludes the paper.
Preliminaries
Empirical mode decomposition
Empirical Mode Decomposition (EMD) is a method for processing nonlinear and non-stationary sequences, separating the different periodic features of a sequence. EMD decomposes a sequence into a series of Intrinsic Mode Function (IMF) sub-sequences and a residue. A candidate sub-sequence must satisfy the following two conditions.
1. For the entire dataset, the number of extrema and the number of zero crossings must either be equal or differ by at most one;
2. At any point, the mean value of the envelopes defined by the local maxima and the local minima must be zero.
A classical EMD algorithm consists of 5 steps8:
- Step 1: Given a sequence x(t), set the initial residue and index: \(r(t) = x(t)\), \(k = 1\).
- Step 2: Join all the maxima with cubic spline lines as the upper envelope \(e_{\max }(t)\), and likewise all the minima as the lower envelope \(e_{\min }(t)\).
- Step 3: Compute the mean of the envelopes, m(t), and the candidate, c(t):

$$\begin{aligned} m(t)=\frac{1}{2}(e_{min}(t)+e_{max}(t)) \end{aligned}$$(1)

$$\begin{aligned} c(t)=r(t)-m(t) \end{aligned}$$(2)

- Step 4: If the candidate c(t) meets the two conditions above, set \(\textrm{IMF}_k(t) = c(t)\) as an IMF sub-sequence, recalculate the residue \(r(t)=x(t)-c(t)\), and proceed to Step 5 with \(k=k+1\) and \(x(t)=r(t)\). Otherwise, take the current candidate c(t) as the input, i.e. \(x(t) = c(t)\), and repeat Steps 2\(\sim\)4.
- Step 5: Terminate if the residue r(t) is a constant or has at most one minimum point and one maximum point each. The input is then decomposed as follows:

$$\begin{aligned} x(t)=\sum _{i=1}^{k}{{\textrm{IMF}}_i(t)}+r(t) \end{aligned}$$(3)

Otherwise, repeat Steps 2\(\sim\)5.
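To make the procedure concrete, the following is a minimal Python sketch of the sifting loop under simplifying assumptions: a fixed number of sifting iterations stands in for the strict IMF tests, and boundary effects of the spline envelopes are ignored. It illustrates Steps 1–5 rather than reproducing the implementation used in this paper.

```python
# Minimal EMD sifting sketch (illustrative only). Assumes numpy and scipy.
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(x):
    """One sifting pass: subtract the mean of the cubic-spline envelopes (Eqs. 1-2)."""
    t = np.arange(len(x))
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    if len(maxima) < 2 or len(minima) < 2:      # too few extrema to build envelopes
        return None
    e_max = CubicSpline(maxima, x[maxima])(t)   # upper envelope (Step 2)
    e_min = CubicSpline(minima, x[minima])(t)   # lower envelope (Step 2)
    return x - 0.5 * (e_min + e_max)            # candidate c(t) = r(t) - m(t)

def emd(x, max_imfs=8, n_sift=10):
    """Decompose x into IMFs plus a residue, x = sum(IMFs) + r (Eq. 3)."""
    imfs, r = [], np.asarray(x, dtype=float).copy()
    for _ in range(max_imfs):
        if sift_once(r) is None:                # residue nearly monotone: stop (Step 5)
            break
        c = r.copy()
        for _ in range(n_sift):                 # repeated sifting (Steps 2-4)
            c_new = sift_once(c)
            if c_new is None:
                break
            c = c_new
        imfs.append(c)
        r = r - c                               # update residue and continue
    return imfs, r
```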
EEMD11 and CEEMDAN12 are two variants of EMD. CEEMDAN improves on EMD with adaptive noise and multiple iterations: it first performs EMD on the original sequence, and then further decomposes it by adding adaptive noise over multiple iterations to eliminate mode mixing.
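In practice, a CEEMDAN decomposition such as the one used later in this paper is available in off-the-shelf libraries. The usage sketch below assumes the third-party PyEMD package (published on PyPI as EMD-signal); the exact API may differ across versions.

```python
# CEEMDAN usage sketch, assuming the PyEMD (EMD-signal) package is installed.
import numpy as np
from PyEMD import CEEMDAN

t = np.linspace(0, 8 * np.pi, 1200)
signal = np.sin(t) + 0.3 * np.sin(7 * t) + 0.05 * np.random.randn(t.size)

ceemdan = CEEMDAN()                      # ensemble decomposition with adaptive noise
imfs = ceemdan(signal)                   # array of shape (n_imfs, len(signal))
residue = signal - imfs.sum(axis=0)      # what remains after removing all IMFs
```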
EMD with temporal attention
In the classical long short-term memory (LSTM) model, a unit is composed of three types of gates: input, forget and output. These gates implement selective memory and long-term dependence on the input information.
LSTM aims to address the long-term dependence problem of RNNs; however, ‘long short-term memory’ is distinct from long-term memory. When dealing with relatively long sequences or sequences of uncertain length, the information storage capacity of the hidden and cell states output by the LSTM remains limited. The temporal attention mechanism, also known as the global attention mechanism, is one method to better capture the patterns within long sequences. Temporal attention is described by the following five steps:
- Step 1. It feeds the hidden state h and the cell state c into the LSTM unit for one-step prediction, obtaining the updated hidden state \(h_t\) and cell state \(c_t\).
- Step 2. It concatenates \(h_t\) to the tail of X, the output tensor of the LSTM encoder, along the time dimension, forming the prediction sequence \(H_0\).
- Step 3. Let Q be the last time step of \(H_0\), and let K and V be the \((T+1)\)th to the 2nd time steps from the bottom of \(H_0\), where T is the time dimension of the input to the LSTM encoder. The attention value is defined as:

$$\begin{aligned} Atten=\textrm{diag}\left( \textrm{softmax}\left( \frac{KQ^\textrm{T}}{\sqrt{T}}\right) ^\textrm{T}\right) V \end{aligned}$$(4)

where ‘diag(·)’ converts a column vector to a diagonal matrix. In this case, it is equivalent to multiplying each dimension of the column vector output by the softmax function with the corresponding row vector in V.
- Step 4. It sums the attention values of Formula (4) across the time dimension to obtain \(h^*\).
- Step 5. It concatenates Q to the tail of \(h^*\), which becomes the prediction result \(h_{atten}\) to output.
Temporal attention can be utilized for multi-step prediction by iteratively executing the aforementioned procedure in the decoder. After each step, the hidden state h and cell state c are updated to \(h_t\) and \(c_t\), respectively, and the prediction results of each step are concatenated along the time dimension in sequence.
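The numpy sketch below illustrates one decoding step of Eq. (4) and Steps 3–5, under the assumption that \(H_0\) is arranged as a \((T+1) \times d\) matrix whose first T rows are the encoder outputs (serving as both K and V) and whose last row is the updated hidden state \(h_t\) (serving as Q); tensor layouts in an actual implementation may differ.

```python
# Temporal-attention sketch for a single decoding step (illustrative shapes).
import numpy as np

def softmax(z):
    z = z - z.max()                       # numerical stability
    e = np.exp(z)
    return e / e.sum()

def temporal_attention(H0):
    T = H0.shape[0] - 1
    Q = H0[-1]                            # last time step of H0            (d,)
    K = V = H0[:-1]                       # preceding T time steps          (T, d)
    weights = softmax(K @ Q / np.sqrt(T)) # softmax(K Q^T / sqrt(T))        (T,)
    attended = weights[:, None] * V       # diag(softmax(...)) V, Eq. (4)   (T, d)
    h_star = attended.sum(axis=0)         # Step 4: sum over time           (d,)
    return np.concatenate([h_star, Q])    # Step 5: h_atten = [h*, Q]       (2d,)

# Example: T = 6 encoder time steps, hidden size d = 4.
h_atten = temporal_attention(np.random.randn(7, 4))
```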
Methodology
Framework
This paper proposes the framework in Fig. 1 for multi-series prediction based on connectors and EMD. It employs a pipeline of normalization and decomposition. First, it normalizes each series in the original dataset separately instead of using global normalization. Then it inserts one of the two kinds of connectors between adjacent short series and concatenates them along the time dimension into a long series. Finally, the long series is decomposed using CEEMDAN.
Each individual series in the original dataset undergoes separate normalization, followed by concatenation with a specific connector. The resulting concatenated series is then decomposed into multiple sub-sequences, comprising a set of Intrinsic Mode Function (IMF) series (\(\textrm{IMF}_1\), \(\textrm{IMF}_2, \ldots , \textrm{IMF}_n\)) and a residual series (RES). Each sub-sequence serves as input for training the prediction model, and the integration of the output series serves as the final prediction result.
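A high-level sketch of this pipeline is given below. The helper names `insert_connector`, `decompose` and `train_and_predict` are illustrative placeholders for the connector insertion of section "Connector insertion", the CEEMDAN decomposition, and the per-sub-sequence LSTM/LSTM-TA models, respectively; they are not the authors' code.

```python
# Pipeline sketch: separate normalization -> connector concatenation ->
# decomposition -> per-sub-sequence prediction -> summed forecast.
# insert_connector, decompose and train_and_predict are assumed helpers.
import numpy as np

def predict_multi_series(short_series, connector="LRV", s=80):
    normed, scales = [], []
    for x in short_series:                           # 1. separate min-max normalization
        lo, hi = x.min(), x.max()
        normed.append((x - lo) / (hi - lo))
        scales.append((lo, hi))
    long_series = normed[0]
    for x in normed[1:]:                             # 2. connector between adjacent series
        bridge = insert_connector(long_series[-1], x[0], s, kind=connector)
        long_series = np.concatenate([long_series, bridge, x])
    sub_sequences = decompose(long_series)           # 3. IMF_1..IMF_n and the residue
    predictions = [train_and_predict(sub) for sub in sub_sequences]
    return np.sum(predictions, axis=0), scales       # 4. integrate the outputs
```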
Detailed steps
The separate normalization approach effectively eliminates value range discrepancies between series, resulting in a more uniform distribution of the decomposed series. However, there is almost no continuity between adjacent series to be concatenated, i.e. there can be a significant variation at the junction, as shown in Fig. 2 at time spots 400, 800 and 1200. If these discontinuities are ignored during concatenation and subsequent decomposition, the result is sub-sequences with distinct features at the junction points that can affect subsequent analysis.
The images of series \(x_1(t)\) (left), \(x_2(t)\) (middle) and \(x_3(t)\) (right) as well as their first several sub-sequences after decomposition. In each group of images of this and the following figures, the first sub-image represents the series before decomposition, and then each sub-image represents \(\textrm{IMF}_1\), \(\textrm{IMF}_2\), etc.
For instance, consider a multi-series dataset with five series \(s_1(t) \sim s_5(t)\), where k is assigned 400 as an empirical value without loss of generality:
The five series \(s_1(t)\sim s_5(t)\) need not be normalized since they all share the range [-1, 1]. Based on these series, we build three synthetic datasets below. We sequentially concatenate \(s_1(t)\sim s_5(t)\) to get \(x_1(t)\):
To simulate the mixing of multiple modes, a short-period, low-amplitude sine wave component is superimposed onto \(x_1(t)\) to get \(x_2(t)\):
Furthermore, to better simulate the situation of the datasets selected in this paper, a random shock within ±0.1 is superimposed onto \(x_1(t)\) to get \(x_3(t)\):
where \(\alpha\) is initialized to 10 as an empirical value; \(\gamma\) is the amplitude of the vibration, set to 0.1, i.e. ten percent of the variation of the series; and \(\textrm{uniform}(a, b)\) is a function that outputs a random floating-point number in the range (a, b). By adding the random vibration function uniform, the uncertainty of the variation between adjacent observations is simulated, so \(x_2(t)\) and \(x_3(t)\) better reflect the numerical variation characteristics of the datasets selected in this paper than \(x_1(t)\).
CEEMDAN is used to decompose the series \(x_1(t)\), \(x_2(t)\) and \(x_3(t)\), and the images of their first several sub-sequences are shown in Fig. 2. For \(x_1(t)\), its \(\textrm{IMF}_1\) and \(\textrm{IMF}_2\) only generate waveforms at and near the first three junctions. The amplitude of the sub-sequence is directly proportional to the magnitude of change in the original series, and the value of the other parts is always zero. At the fourth junction, no waveform appears in the sub-sequence since the values are continuous there. \(\textrm{IMF}_3\) shows characteristics similar to those of \(\textrm{IMF}_1\) and \(\textrm{IMF}_2\), but with a significant deformation in the second segment. Although \(\textrm{IMF}_4\) approximately captures the overall trend of the original data, there is still a significant disparity in trend near the junction points compared to the original series; moreover, this deviation becomes more pronounced as the magnitude of change in the original data increases. For \(x_2(t)\), besides the characteristics reflected in \(x_1(t)\), the short-term regularity of the original series appears in \(\textrm{IMF}_1\) and \(\textrm{IMF}_2\). However, the series junctions exhibit a longer period and significantly higher amplitude compared to other regions, deviating substantially from the original periodic regularity. Similarly, the sub-sequences of \(x_3(t)\) demonstrate characteristics akin to those of \(x_1(t)\) and \(x_2(t)\), with a pronounced increase in amplitude at the junctions of the original series as well as distinct periodic features that differ from other sections.
Real-world datasets have more complex periodic regularities and variations than the synthetic datasets above (Sec. 4.1 details these datasets). Taking 5 randomly selected areas from the Monthly ReTail Sales of the USA (MRTS) dataset and another 5 random stocks from Stock-D as examples, after direct concatenation and decomposition, the images of their first several sub-sequences are shown in the left column of Fig. 3. As in the previous experiments on synthetic datasets, in the MRTS sample, the amplitudes of \(\textrm{IMF}_1\sim \textrm{IMF}_4\) are significantly higher than those of the other parts at and near the four junctions where t = 336, 672, 974 and 1310. In the Stock-D example, at the three junctions where t = 124, 248 and 372, there is no significant difference between the changes of observations there and nearby, so the phenomenon from the previous experiments is not obviously reflected. However, at the junction where t = 496, since the observations still show a large upward jump, a higher amplitude appears at the same position of \(\textrm{IMF}_1\) and \(\textrm{IMF}_2\).
The images of the sample of MRTS (the top row) and Stock-D datasets (the bottom row) as well as their first several sub-sequences after decomposition. In each row, the left figure is the series with direct concatenation, the middle figure is the series with linear connector, and the right figure is the series with linear and randomly vibrating connector.
From the aforementioned series decomposition experiments conducted on both synthetic and real-world datasets, it is observed that even after separate normalization of the datasets, direct concatenation of these series prior to decomposition results in significant amplitude spikes at the junctions compared to other regions. Consequently, this leads to deviations from periodic regularities in the decomposed sub-sequences near these junctions. Additionally, an increase in jump magnitude within the original series corresponds to amplified waveform deviations from the original periodic regularity at these junctions.
Connector insertion
To address the issue of waveform deviation in the decomposed series, caused by significant jumps during direct concatenation that disrupt the original periodic regularity and overall trend, we propose an alternative approach, termed indirect concatenation of multi-series. This method introduces a connector between each pair of adjacent series. We consider the following two types of connector.
Linear Interpolation (LIP) Given the end value of the former series, \(y_{\textrm{end}1}\), and the start value of the latter series, \(y_{\textrm{start}2}\), the LIP connector interpolates a linear series of length s, where s is the number of interpolation values added between \(y_{\textrm{end}1}\) and \(y_{\textrm{start}2}\). The ith (i = 1, 2, ..., s) value of the LIP connector from \(y_{\textrm{end}1}\) is computed as in Formula (5).
The LIP connector relieves the ‘jump’ between \(y_{\textrm{end}1}\) and \(y_{\textrm{start}2}\), reducing the spurious IMF components decomposed to fit such jumps. The length of the connector is also worth mentioning: empirical tests show that, given the length L of the short series, the connector length s works well within a certain range, as discussed in Sec. 4.3.
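A minimal sketch of the LIP connector consistent with the description above: s evenly spaced values strictly between \(y_{\textrm{end}1}\) and \(y_{\textrm{start}2}\). The precise expression is Formula (5), so this should be read as one standard realization rather than the authors' exact code.

```python
# LIP connector sketch: s linearly interpolated values between the two endpoints.
import numpy as np

def lip_connector(y_end1, y_start2, s):
    i = np.arange(1, s + 1)                               # i = 1, ..., s
    return y_end1 + i * (y_start2 - y_end1) / (s + 1)     # evenly spaced bridge values

# Example: bridging a jump from 0.95 down to 0.10 with s = 5 connector points.
print(lip_connector(0.95, 0.10, 5))
```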
Linear Interpolation with Random Vibration (LRV) Straightforward linear interpolation is still not optimal when the ‘jumping’ distance \(\delta = |y_{\textrm{end}1} - y_{\textrm{start}2}|\) is large, because the linearly sampled values lead to multiple IMFs with ‘firm’ frequencies. Based on Formula (5), a vibration function \(\textrm{uniform}(-d,d)\) is introduced into the LRV connector, which makes each interpolation point vary by a random value, as shown in Formula (6).
The LRV connector adds vibrations uniformly distributed within \(\pm d\) on top of the LIP connector. It introduces the variation smoothly, rather than being too rigid (as simple linear interpolation is) to reflect the uncertainty of the original series, and thus keeps the short-term regularities of the whole series as consistent as possible.
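A corresponding sketch of the LRV connector: the LIP backbone with a uniform(-d, d) perturbation added to every interpolated point, as described for Formula (6). The default d = 0.1 mirrors the vibration amplitude used for the synthetic series and is an assumption, not a prescribed value.

```python
# LRV connector sketch: LIP backbone plus uniform random vibration in [-d, d].
import numpy as np

def lrv_connector(y_end1, y_start2, s, d=0.1):
    i = np.arange(1, s + 1)
    base = y_end1 + i * (y_start2 - y_end1) / (s + 1)     # LIP backbone (cf. Formula 5)
    return base + np.random.uniform(-d, d, size=s)        # random vibration (cf. Formula 6)
```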
Apart from the differences at the junctions introduced by the connectors, there are no disparities in the periodic characteristics of each sub-sequence within the regions representing the original data.
A comprehensive evaluation of the advantages and disadvantages of both connectors will be verified via further experiments on diverse datasets in Sec. 4.
Experiments
This section first describes the datasets used in the experiments, then validates the usage of the two types of connectors, and finally discusses the connector length in relation to the connection effect and the role of normalization in the concatenation.
Experiments are performed on 5 real-world datasets: Monthly ReTail Sales of the USA (MRTS), three selected and organized stock price datasets Stock-D/Stock-W/Stock-M, and the Socioeconomic Status Score (SES). The performance of the connector-based approach is evaluated using four metrics: the Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Mean Absolute Percentage Error (MAPE), and R-square (\(R^2\)). Each dataset is divided into training and test subsets according to a certain proportion, with the series as the unit.
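The four metrics follow their standard definitions; a brief sketch (assuming non-zero targets for MAPE) is given below.

```python
# Standard evaluation metrics used in the experiments (sketch).
import numpy as np

def evaluate(y_true, y_pred):
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    mape = np.mean(np.abs(err / y_true)) * 100.0                        # assumes y_true != 0
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}
```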
After normalization, each dataset is processed by three methods: direct concatenation, concatenation with the LIP connector, and concatenation with the LRV connector. The three concatenated series for each dataset are separately decomposed and trained. For each dataset, the connector length, the number of sub-sequences, and the model selection for each sub-sequence in ‘Temporal Attention Integration’ are given in the following subsection.
Dataset description
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Monthly ReTail Sales of the USA (MRTS): a database of sales from https://www.kaggle.com/datasets/landlord/usa-monthly-retail-trade (March 24, 2023). It contains monthly sales data for various fields of the US retail industry from January 1992 to May 2020. We selected the original statistical data (excluding revised data) stored in Excel format. After eliminating aggregate (total) items and considering the experimental settings, we select data from 28 fields.
Stock-D/Stock-W/Stock-M: a database of stock prices from https://www.kaggle.com/competitions/jpx-tokyo-stock-exchange-prediction (June 26, 2023). Three stock price datasets are selected and organized from the competition dataset of the Tokyo Stock Exchange stock price prediction. Stocks are screened based on average daily volume. Stock-D is the daily closing price of 50 companies’ stocks from July 1 to December 30, 2019. Stock-W is the weekly price of 100 companies’ stocks in 51 weeks of 2019 (the stock market was closed for the whole first week of May). Stock-M is the monthly price of 120 companies’ stocks during 2017\(\sim\)2019.
Socioeconomic Status Score (SES): a database of macroeconomic indicators from https://www.kaggle.com/datasets/sdorius/globses (November 11, 2022). It contains socioeconomic status percentage scores for 149 countries every 10 years between 1880 and 2010.
The relevant characteristics of the above datasets are shown in Table 1. The prediction length is 1 and the number of neural network units is 32 for all datasets.
Connector validity
The average results of multiple tests (at least 5 runs) are presented in Table 2. The best results are highlighted in bold, and the second-best results are underlined. For each prediction model on the MRTS dataset, incorporating the LRV connector yields superior prediction results compared to direct concatenation, whereas adding only the LIP connector does not perform as well as direct concatenation. On the other four datasets, under identical conditions, incorporating the LIP connector demonstrates better prediction performance than both direct concatenation and the LRV connector.
The efficacy of various concatenation techniques for multiple time series differs significantly across datasets. This variability stems from the nature of the MRTS dataset, which captures monthly sales figures. The sales volumes of certain items, particularly seasonal products such as agricultural goods, sideline products, and clothing, are subject to annual fluctuations. Furthermore, specific events, including store anniversaries and annual shopping events like ‘year-end specials,’ can also lead to variations in yearly sales patterns.
When comparing the use of an LRV connector with the use of a LIP connector alone, the latter tends to diverge more from the periodic changes observed in the data, which can significantly affect the identification of the corresponding periodic patterns. In contrast, the other four multi-series datasets, which include stock price variations, do not exhibit clear periodic regularities.
Additionally, datasets with sparse time series, such as SES, demonstrate a wide range of variation characteristics. Even when these series are concatenated, they do not reveal any discernible periodic regularity. Therefore, for these types of datasets, the LIP connector, without the random vibrations, can still provide a rough alignment with the variation characteristics of each individual series.
As a stage conclusion, the choice of concatenation method for multi-series datasets should be tailored to their inherent characteristics. It is advisable to insert LRV connectors for datasets that exhibit clear periodic patterns; this approach effectively captures and preserves the periodic nature of the data. Conversely, the simpler LIP connector, without the introduction of random vibrations, is more appropriate for datasets that do not display pronounced periodic traits. This alternative suits datasets where the preservation of periodicity is not a primary concern, allowing a more straightforward analysis of the underlying data trends.
Connector length concerns
The length of the connector also matters. If the connector is too short, a large variation remains between adjacent points, which produces high amplitude at the corresponding position of the sub-sequences. If the connector is too long, it obviously increases the overhead of series decomposition. Accordingly, this experiment focuses on the length of the connector. The following experiments are conducted only on the MRTS sample.
Set s = 10, 30, 50, 80 and 100 to concatenate the sample dataset with the two proposed concatenation methods in Subsection 3.3, then decompose the series. The images of their first several sub-sequences are shown in Figs. 4 and 5.
In Fig. 4, considering only the region of the original data, there is no obvious difference in \(\textrm{IMF}_1\) and \(\textrm{IMF}_2\) across the groups of images. There is also no obvious difference in \(\textrm{IMF}_3\) across the groups, except that when \(s=10\), a large amplitude appears near the junctions where t = 336, 984 and 1340 compared with the other groups. From \(\textrm{IMF}_4\) in each group, it can be seen that as the connector length increases, the amplitude difference between the four junctions and other areas (except the second half of the third segment, whose regularity is decomposed into \(\textrm{IMF}_4\) instead of \(\textrm{IMF}_5\)) gradually decreases, and when \(s = 80\), this difference is basically eliminated.
Figure 5 shows some of the same features as Fig. 4. In \(\textrm{IMF}_4\), as the connector length increases, the amplitude difference between the four junctions and other areas also gradually decreases, and when \(s = 50\), this difference is basically eliminated.
The correlation between the connector length, the number of sub-sequences, and the model selection in ‘Temporal Attention Integration’ is shown in Table 3.
Normalization effects
The experiments show that it is necessary to normalize the short series. Two types of normalization are commonly employed: the min-max scaler and Z-normalization, which scales based on the mean and standard deviation. The former, the min-max scaler, is employed in our paper given its prevalence in existing research.
There are two ways of normalization, shown below: global and separate normalization.
Global normalization Each sequence segment is concatenated along the time dimension and normalized as a whole. For example, there are two sequences:
After the global min-max scaler, the two sequences become:
However, in multiple sequences, there may possibly be large diversity in the range of values covered by the individual sequence segments (e.g., the price of one stock is tens to hundreds of dollars, while the price of another is only a few dollars). If they are normalized globally, this diversity will not be eliminated, which will still increase the difficulty and reduce the efficiency of neural network training, thus affecting the accuracy of prediction. To eliminate this diversity, there is the following method, named separate normalization.
Separate normalization Each sequence segment is normalized separately and then concatenated along the time dimension for further research. For the sequences \(\{a_n\}\) and \(\{b_n\}\) above, after the separate min-max scaler, they will be as follows:
The approach described effectively reduces the variability within each sequence segment. However, to ensure precise denormalization prior to output, it is essential to record the length of each segment when they are concatenated directly, as well as the individual maximum and minimum values of each segment, to facilitate the subsequent denormalization. In this paper, when the series are joined with connectors, it is necessary to record not only the maximum and minimum values but also the starting and ending positions of each segment to ensure accurate reconstruction and analysis.
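A minimal sketch of this bookkeeping is shown below, assuming equal-length connectors between consecutive segments; the metadata layout is illustrative.

```python
# Separate min-max normalization with the metadata needed for denormalization:
# per-segment min/max plus each segment's start/end position in the long series
# (connector samples between segments are excluded from the recorded spans).
import numpy as np

def separate_normalize(segments, connector_len):
    normed, meta, pos = [], [], 0
    for k, seg in enumerate(segments):
        lo, hi = seg.min(), seg.max()
        normed.append((seg - lo) / (hi - lo))
        meta.append({"min": lo, "max": hi, "start": pos, "end": pos + len(seg)})
        pos += len(seg) + (connector_len if k < len(segments) - 1 else 0)
    return normed, meta

def denormalize(value, segment_meta):
    """Map a predicted value back to the original scale of its segment."""
    return value * (segment_meta["max"] - segment_meta["min"]) + segment_meta["min"]
```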
For the multi-sequence datasets MRTS and SES, the two ways of normalization, ‘global’ and ‘separate’, are tested before EMD. Firstly, before decomposition, the sequences after the two types of normalization are input to LSTM, respectively, to train and predict. The results are collected in Table 4. The best results are highlighted in bold.
In Table 4, all the results are better with separately normalized MRTS than with global normalization; the same holds for the SES data, except on MAPE. The MAPE of separately normalized SES is higher than that of the globally normalized one because the prediction error is larger for some items with smaller values.
Furthermore, for MRTS, the partially decomposed sequences after the two types of normalization are shown in Fig. 6. The specific selected sub-sequences are labeled on the left of the image. The comparison shows that, since lower observations account for a large part of the original data, the peaks in the short- and medium-period sub-sequences obtained after global normalization and decomposition are obviously biased toward the higher part of the original data. For the separately normalized sequence, however, the peak distribution after decomposition is more even.
The two sets of sub-sequences obtained above are input to LSTM, respectively, and the \(R^2\) comparison between them is shown in Table 5. After the decomposition of the globally normalized sequences, serious overfitting appears on the two short-term regularity sequences, \(\textrm{IMF}_1\) and \(\textrm{IMF}_2\), which makes their predictions deviate significantly from the actual results. In contrast, the network fits much better for the separately normalized sequences.
Related works
Time series prediction
There exists systematic research on time series mining. In these studies, the experimental datasets vary in both the size of the datasets and the length of each series. However, handling series of diverse lengths in a uniform manner is seldom emphasized, which highlights the significance of series concatenation.
Deep neural networks have been widely used in time series prediction13,14,15,16 due to their capacity for complex nonlinear characteristics. Ren et al. proposed an anomaly detection algorithm based on the spectral residual and a Convolutional Neural Network (CNN)13, proving its universality and effectiveness. Chen et al. proposed Time-Aware Multi-Scale Recurrent Neural Networks (TAMS-RNNs)14, which adaptively capture the multi-scale information of each time series at each time step. Cirstea et al. proposed a Distinct Filter Generation Network (DFGN)15 to capture the distinct temporal dynamics of different entities, and a Dynamic Adjacency Matrix Generation Network (DAMGN) to generate dynamic graphs. Jin et al. proposed the Domain Adaptation Forecaster (DAF)16, which applies ___domain adaptation techniques via attention sharing to address the data scarcity issue.
Research on time series prediction also explores feature capture15,17 and model optimization18,19,20. Ding et al. introduced an extreme loss17 to detect possible extreme events. Crabbé et al. proposed dynamic masks21 to select features parsimoniously and legibly from a large number of inputs. Zaffran et al. applied Adaptive Conformal Inference (ACI) to general time series18, proposing an adaptive method, AgACI, which reduces parameter dependencies through online expert aggregation. Hasson et al. discussed stacked generalization in ensemble learning and applied it to time series prediction19. Woo et al. proposed a time-index model20 that automatically learns a function form from the time series.
Series decomposition
Studies have explored the decomposition of periodic regularities and general trends from nonlinear and non-stationary series, using methods such as STL (Seasonal and Trend decomposition using Loess)22, the discrete wavelet transform23, EMD8, VMD (Variational Mode Decomposition)24, SSA (Singular Spectrum Analysis)25 and STR (Seasonal-Trend decomposition based on Regression)26. There are two notable improvements to EMD: EEMD (Ensemble Empirical Mode Decomposition)11 and CEEMDAN (Complete Ensemble Empirical Mode Decomposition with Adaptive Noise)27.
Although the first model dates back to the 1990s10, long short-term memory has recently become popular in time series decomposition and prediction. Wang et al. proposed a multilevel wavelet decomposition network28 to build frequency-aware deep learning models for time series analysis, along with a multi-frequency long short-term memory (mLSTM) for time series prediction. Tran et al. employed a seasonal-adjustment method that decomposes each time series into seasonal, trend and irregular components, and built prediction models for each component individually29. Wen et al. proposed a seasonal-trend decomposition method6, RobustSTL, which extracts the trend component via the least absolute deviations (LAD) loss with sparse regularization, and the seasonality component via non-local seasonal filtering. On this basis, they proposed a generalized ADMM (alternating direction method of multipliers) to speed up the computation30. Yang et al. proposed a model hybridizing EMD, stacked auto-encoders and extreme learning machines31. Dudek proposed a seasonal-trend-dispersion decomposition (STD) to extract the trend, the seasonal component and a component related to the dispersion of the time series5.
In the aforementioned studies, series are usually decomposed one at a time. It is hard to train the model on multiple decomposed series simultaneously because (1) the number of decomposed series varies, (2) multiple series exhibit distinct periodic regularities or even lack obvious periodic patterns altogether, and (3) the decomposition components may possess dissimilar periodic characteristics across different series. There is hardly any research exploring the concatenation of multiple series followed by mode decomposition.
Conclusion
This paper investigates pre-processing methods for multi-series prediction based on series decomposition, aiming to obtain sub-sequences that align better with the periodic characteristics and overall trend of the original series. Applying series decomposition to multi-series data, the paper examines the trend deviation caused by high sub-sequence amplitude differences under direct concatenation of multi-series. To address this problem, it proposes a pre-processing method that incorporates a connector between each pair of concatenated series; these connectors include linear (LIP) and linear-with-random-vibration (LRV) types. The decomposed sub-sequences are then leveraged for training and prediction. A comparative analysis of the decomposition and prediction outcomes between multi-series processed through direct concatenation and those employing the two connector methods reveals that the connectors effectively mitigate the issues of direct concatenation. Furthermore, the LRV connector is well-suited to datasets with periodic features, whereas the simple linear connector is more fitting for datasets that do not exhibit clear periodic patterns.
This methodology significantly improves prediction accuracy. Moreover, the paper evaluates the model on representative datasets from a variety of fields, demonstrating its applicability to a broad spectrum of common multi-series datasets of varying scales. The analysis of the experimental results suggests that, while inserting additional connectors mitigates the discontinuity at the junctions, the approach is not entirely comprehensive or optimal. There is no theoretical substantiation for the existence of an ideal connector. Although it may be feasible to develop a connector that better corresponds to the periodic characteristics identified in the training set, there is a lack of systematic criteria for evaluating the discovery of such characteristics. These challenges highlight the need for further research in this area.
References
Zhang, L., Aggarwal, C., & Qi, G.-J. Stock price prediction via discovering multi-frequency trading patterns. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’17, 2141–2149 (ACM, Halifax NS, Canada, 2017). https://doi.org/10.1145/3097983.3098117.
Nguyen, T., Brandstetter, J., Kapoor, A., Gupta, J. K., Grover, A. Climax: A foundation model for weather and climate. In Proceedings of the 40th International Conference on Machine Learning, Vol. 202, 25904–25938 (PMLR, Honolulu, Hawaii, USA, 2023). https://proceedings.mlr.press/v202/nguyen23a.html.
Moradi Kordmahalleh, M., Gorji Sefidmazgi, M., & Homaifar, A. A sparse recurrent neural network for trajectory prediction of atlantic hurricanes. In Proceedings of the Genetic and Evolutionary Computation Conference 2016, 957–964 (2016). https://doi.org/10.1145/2908812.2908834.
Lan, S., Ma, Y., Huang, W., Wang, W., Yang, H., & Li, P. Dstagnn: Dynamic spatial-temporal aware graph neural network for traffic flow forecasting. In Proceedings of the 39th International Conference on Machine Learning, Vol. 162, 11906–11917 (PMLR, 2022). https://proceedings.mlr.press/v162/lan22a.html.
Dudek, G. Std: A seasonal-trend-dispersion decomposition of time series. IEEE Trans. Knowl. Data Eng. 35(10), 10339–10350. https://doi.org/10.1109/tkde.2023.3268125 (2023).
Wen, Q., Gao, J., Song, X., Sun, L., Xu, H., & Zhu, S. Robuststl: A robust seasonal-trend decomposition algorithm for long time series. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 5409–5416 (2019). https://doi.org/10.1609/aaai.v33i01.33015409.
Wen, Q., Zhang, Z., Li, Y., & Sun, L. Fast robuststl: Efficient and robust seasonal-trend decomposition for time series with complex patterns. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. KDD ’20, 2203–2213 (Association for Computing Machinery, New York, NY, USA, 2020). https://doi.org/10.1145/3394486.3403271.
Huang, N. E., Shen, Z. & Long, S. R. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. Lond. Ser. A: Math. Phys. Eng. Sci. 454, 903–995 (1998).
Wang, X., Dong, S. & Zhang, R. An integrated time series prediction model based on empirical mode decomposition and two attention mechanisms. Information 14, 610 (2023).
Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997).
Wu, Z. & Huang, N. E. Ensemble empirical mode decomposition: A noise-assisted data analysis method. Adv. Adapt. Data Anal. 1(1), 1–41 (2009).
Yeh, J.-R., Shieh, J.-S. & Huang, N. E. Complementary ensemble empirical mode decomposition: A novel noise enhanced data analysis method. Adv. Adapt. Data Anal. 02(02), 135–156. https://doi.org/10.1142/S1793536910000422 (2010).
Ren, H., Xu, B., Wang, Y., Yi, C., Huang, C., Kou, X., Xing, T., Yang, M., Tong, J., & Zhang, Q. Time-series anomaly detection service at microsoft. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 3009–3017 (2019). https://doi.org/10.1145/3292500.3330680.
Chen, Z., Ma, Q., & Lin, Z. Time-aware multi-scale rnns for time series modeling. In Proceedings of the 30th International Joint Conference on Artificial Intelligence, IJCAI-21, 2285–2291. (International Joint Conferences on Artificial Intelligence Organization, 2021). https://doi.org/10.24963/ijcai.2021/315.
Cirstea, R.-G., Kieu, T., Guo, C., Yang, B., Pan, S. J. Enhancenet: Plugin neural networks for enhancing correlated time series forecasting. In 2021 IEEE 37th International Conference on Data Engineering (ICDE), 1739–1750 (IEEE, Chania, Greece, 2021). https://doi.org/10.1109/icde51399.2021.00153.
Jin, X., Park, Y., Maddix, D., Wang, H., Wang, Y. Domain adaptation for time series forecasting via attention sharing. In Proceedings of the 39th International Conference on Machine Learning, Vol. 162, 10280–10297 (PMLR, Baltimore, Maryland, USA, 2022). https://proceedings.mlr.press/v162/jin22d.html.
Ding, D., Zhang, M., Pan, X., Liu, R., Sun, W. Modeling extreme events in time series prediction. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD ’19), 1114–1122 (ACM, Anchorage, Alaska, USA, 2019).
Zaffran, M., Feron, O., Goude, Y., Josse, J., & Dieuleveut, A. Adaptive conformal predictions for time series. In Proceedings of the 39th International Conference on Machine Learning, Vol. 162, 25834–25866 (PMLR, Baltimore, Maryland, USA, 2022). https://proceedings.mlr.press/v162/zaffran22a.html.
Hasson, H., Maddix, D.C., Wang, B., Gupta, G., & Park, Y. Theoretical guarantees of learning ensembling strategies with applications to time series forecasting. In Proceedings of the 40th International Conference on Machine Learning, Vol. 202, 12616–12632 (PMLR, Honolulu, Hawaii, USA, 2023). https://proceedings.mlr.press/v202/hasson23a.html.
Woo, G., Liu, C., Sahoo, D., Kumar, A., & Hoi, S. Learning deep time-index models for time series forecasting. In Proceedings of the 40th International Conference on Machine Learning, Vol. 202, 37217–37237 (PMLR, Honolulu, Hawaii, USA, 2023). https://proceedings.mlr.press/v202/woo23b.html.
Crabbé, J., & Schaar, M. V. D. Explaining time series predictions with dynamic masks. In Proceedings of the 38th International Conference on Machine Learning, Vol. 139, 2166–2177 (PMLR, Virtual Only, 2021). https://proceedings.mlr.press/v139/crabbe21a.html.
Cleveland, R. B., Cleveland, W. S., McRae, J. E. & Terpenning, I. J. Stl: A seasonal-trend decomposition procedure based on loess. J. Off. Stat. 6(1), 3–33 (1990).
Mallat, S. G. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell. 11(7), 674–693 (1989).
Dragomiretskiy, K. & Zosso, D. Variational mode decomposition. IEEE Trans. Signal Process. 62(3), 531–544 (2014).
Golyandina, N., Nekrutkin, V. & Zhigljavsky, A. A. Analysis of Time Series Structure: SSA and Related Techniques (Chapman and Hall/CRC, Boca Raton, 2001).
Dokumentov, A. & Hyndman, R. J. Str: Seasonal-trend decomposition using regression. Informs J. Data Sci. 1(1), 50–62 (2021).
Torres, M. E., Colominas, M. A., Schlotthauer, G., & Flandrin, P. A complete ensemble empirical mode decomposition with adaptive noise. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4144–4147 (IEEE, Prague, Czech Republic, 2011).
Wang, J., Wang, Z., Li, J., & Wu, J. Multilevel wavelet decomposition network for interpretable time series analysis. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2437–2446 (Association for Computing Machinery, London, UK, 2018). https://doi.org/10.1145/3219819.3220060.
Tran, T. Q., & Sakuma, J. Seasonal-adjustment based feature selection method for predicting epidemic with large-scale search engine logs. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2857–2866 (2019). https://doi.org/10.1145/3292500.3330766.
Wen, Q., Zhang, Z., Li, Y., & Sun, L. Fast robuststl: Efficient and robust seasonal-trend decomposition for time series with complex patterns. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2203–2213. (Association for Computing Machinery, Virtual Only, 2020). https://doi.org/10.1145/3394486.3403271.
Yang, H.-F. & Chen, Y.-P.P. Representation learning with extreme learning machines and empirical mode decomposition for wind speed forecasting methods. Artif. Intell. 277, 1. https://doi.org/10.1016/j.artint.2019.103176 (2019).