Convolutional Neural Networks for Modeling and Forecasting Nonlinear Nonstationary Processes

The object of research: The object of research is modeling and forecasting nonlinear nonstationary processes presented in the form of time-series data. Investigated problem: There are several popular approaches to constructing adequate models of nonlinear nonstationary processes and forecasting them, such as autoregressive models and recurrent neural networks. However, each of them has its advantages and drawbacks. Autoregressive models cannot deal with the nonlinear or combined influence of previous states or external factors. Recurrent neural networks are computationally expensive and cannot work with very long or high-frequency sequences. The main scientific result: A model for forecasting nonlinear nonstationary processes presented in the form of time-series data was built using convolutional neural networks. The study presents results in which convolutional networks are superior to recurrent ones in terms of both accuracy and complexity: it was possible to build a more accurate model with far fewer parameters. This indicates that one-dimensional convolutional neural networks can be a reasonable choice for solving time-series forecasting problems. The area of practical use of the research results: Forecasting the dynamics of processes in economics, finance, ecology, healthcare, technical systems and other areas that exhibit nonlinear nonstationary processes. Innovative technological product: A methodology for using convolutional neural networks for modeling and forecasting nonlinear nonstationary processes presented in the form of time-series data. Scope of the innovative technological product: Nonlinear nonstationary processes presented in the form of time-series data.


Introduction
1. The object of research
The object of research is modeling and forecasting nonlinear nonstationary processes (NNP) presented in the form of time series data, which can describe the dynamics of processes in economy, finances, ecology, healthcare, technical systems and other areas exhibiting the types of processes mentioned above.

2. Problem description
Forecasting based on models built from experimental (statistical) data is one of the most popular approaches to forecasting the dynamics of such processes and has numerous applications in energy, network systems, trade and investment activities. It can be used for evaluating alternative economic strategies, forming enterprise budgets, forecasting and managing risks of an arbitrary nature, and solving other problems [1].
The problem of forecasting processes in technical systems has been analyzed in depth using classical autoregressive approaches, which are quite simple to implement. This is a popular family of mathematical models based on linear self-dependence within a time series (autocorrelation), which is able to explain future fluctuations [2,3].
However, this approach is limited by the difficulty of taking into account a large number of external factors, owing to the problem of multicollinearity; in addition, it cannot model nonlinear interactions [4].
Therefore, it is proposed to consider neural networks, as they can take into account nonlinear interactions or the combined influence of external factors. The natural first choice for sequence analysis with neural networks is a recurrent neural network. Such networks were created specifically for sequences and are able to maintain a hidden state and learn time dependencies [5]. However, recent research has shown that these benefits are rarely realized in practice. Applying this approach is computationally expensive, so it cannot be applied to very long sequences, which is an obstacle when solving modern problems that involve large amounts of data [4].

3. Suggested solution to the problem
There is a need to develop a new approach that allows computationally efficient modeling of long sequences while taking into account the nonlinear or combined effects of external factors. To solve this problem, it is proposed to consider convolutional neural networks (CNN) [6,7]. CNNs are well suited to computer vision because they can capture the finest details (local patterns) in images or even in 3D volumetric data. In addition, a large number of modern convolutional architectures already exist, such as ResNet or DenseNet [8]. It is therefore possible to apply them to simpler 1D data by replacing 2D convolutions with 1D ones. Such networks are efficient, fast and amenable to optimization, and they work well for both classification and regression analysis.
The aim of the research is to develop a mathematical model based on convolutional neural networks for forecasting nonlinear nonstationary processes.

Materials and methods
1. Neural networks of the LSTM type
One of the effective approaches for modeling and forecasting is the technology of artificial neural networks. For working with sequences (time series, signals, etc.), it is common to use recurrent neural networks.
Long short-term memory networks, usually simply called "LSTMs", are a special kind of RNN capable of learning long-term dependencies. They were proposed in [11] and were refined and popularized in later work [12]. They give high-quality results on a wide variety of problems and are currently widely used to simulate nonlinear processes.
LSTMs have a chain structure, like the classic RNN, but the repeating module has a different structure. Instead of a single neural network layer, there are four, and they interact in a special way (Fig. 1) [13].
An LSTM can remove information from or add information to the cell state, and this ability is carefully regulated by structures called gates. The first step in the LSTM is to decide what information to discard from the cell state. This decision is made by a sigmoid layer (named after the activation function it uses), also called the "forget gate layer", which can be written as:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f),$$
where $h_{t-1}$ is the previous hidden state; $x_t$ is the new input data; $W_f$, $b_f$ are the weight matrix and bias for this layer, respectively.
The next step is to decide what new information to store in the cell state. This step consists of two parts. First, another sigmoid layer, called the "input gate layer", decides which values to update. Next, a hyperbolic tangent layer creates a vector of candidate values $\tilde{C}_t$ that can be added to the state. These two parts are then combined to create an update for the state. This step can be written as:
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i), \qquad \tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c),$$
where $h_{t-1}$ is the previous hidden state; $x_t$ is the new input data; $W_{i,c}$, $b_{i,c}$ are the weight matrices and biases for these layers, respectively.
Next, the old cell state $C_{t-1}$ is updated to the new cell state $C_t$:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t.$$
Finally, it is necessary to decide what to output. The output is based on the output gate result $o_t$ and the cell state $C_t$, but it is a filtered version: the hyperbolic tangent activation function scales the cell state values to between -1 and 1:
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o), \qquad h_t = o_t \odot \tanh(C_t),$$
where $W_o$, $b_o$ are the weight matrix and bias of the output gate.
However, the construction of such networks involves great computational difficulties. In addition, they face numerous problems [14] that prevent them from working with very long sequences (for example, when processing a stream sampled at a high frequency, such as 500-100 Hz).
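The gate computations described above can be sketched in a few lines of plain Python. This is a minimal illustration and not the implementation used in the experiments: it assumes the standard LSTM formulation, including an output gate whose parameters (keyed `"o"` here) are not spelled out in the text. The names `lstm_step` and `affine` and the all-zero weights are hypothetical choices made only so that the result can be checked by hand.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Return W @ v + b, where W is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; params maps a gate name to (W, b) acting on [h_prev, x_t]."""
    z = h_prev + x_t  # list concatenation [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(*params["f"], z)]        # forget gate
    i = [sigmoid(a) for a in affine(*params["i"], z)]        # input gate
    c_hat = [math.tanh(a) for a in affine(*params["c"], z)]  # candidate values
    o = [sigmoid(a) for a in affine(*params["o"], z)]        # output gate
    # New cell state: keep part of the old state, add part of the candidates.
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, c_hat)]
    # New hidden state: filtered, tanh-scaled view of the cell state.
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t


# Hypothetical toy setup: hidden size 2, one input feature, all weights zero.
hidden, n_in = 2, 1
zero = ([[0.0] * (hidden + n_in) for _ in range(hidden)], [0.0] * hidden)
params = {"f": zero, "i": zero, "c": zero, "o": zero}
h_t, c_t = lstm_step([0.3], [0.0, 0.0], [1.0, -2.0], params)
```

With all weights set to zero, every gate outputs 0.5 and the candidate vector is zero, so the new cell state is simply half of the old one.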

2. Convolutional neural networks (CNN)
Convolutional neural networks (CNN) were introduced by LeCun in [6]. The architecture takes its name from the convolution operation, in which each fragment of the image is multiplied element by element by the convolution kernel matrix, and the results are summed and written to the corresponding position of the output image. The architecture incorporates a priori knowledge from the computer vision domain: an image pixel is strongly related to its neighbors (local correlation), and an object can be located in any part of the image.
The basic idea of CNNs is that an image is fed directly to the input of the neural network, and the network automatically learns the hierarchy of necessary features. This makes it possible to obtain a network that is more accurate than one built on traditional approaches, without difficult feature engineering [15]. Convolutional networks solve two problems at once. First, they learn dependencies that are not tied to a particular location: they can find relationships and patterns wherever they occur in the input. Second, they achieve a large reduction in the number of parameters. CNNs require relatively little pre-processing and additional feature engineering, which makes them very efficient.
This idea was used to recognize symbols and digits in [16]. However, after this single successful application, convolutional neural networks did not gain popularity. Interest in them was revived only by [17], which demonstrated impressively high image classification accuracy in the ImageNet Large Scale Visual Recognition Challenge. In this competition, neural networks were applied to a data set of more than a million images from the Internet covering 1,000 different classes. This success revolutionized computer vision and led to the application of deep networks in many different computer vision tasks [18].
Let us examine why CNNs are so successful on image data and then consider how this can be transferred to time series data.
The input data is defined as a tensor of rank 3 or rank 2, depending on the number of channels present in the data. The channels are the variables that describe the color of a pixel; with RGB encoding, for example, there are 3 channels. However, to reduce the amount of data, the input is quite often converted to a single grayscale channel. The resulting value of the convolution operation for each pixel of the filtered image is calculated from the square area around it using the convolution kernel. As Fig. 2 and the formula above show, the convolution operation performs elementwise matrix multiplication while sliding the kernel matrix across the input matrix. The kernel matrix contains the model parameters that have to be learned. It is initialized at random and then updated during training by an optimization procedure such as stochastic gradient descent.
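The sliding-kernel computation just described can be illustrated with a short plain-Python sketch. It is a simplified example under stated assumptions: a single channel, no padding, stride 1, and the cross-correlation form (no kernel flip) that deep learning libraries commonly call convolution. The function name `conv2d_valid` and the image and kernel values are made up for illustration; in a real network the kernel values would be learned.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    At every position the kernel is multiplied elementwise with the
    image patch under it and the products are summed.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + m][c + n] * kernel[m][n]
                for m in range(kh)
                for n in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 3x3 single-channel image and 2x2 kernel.
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
kernel = [
    [1, 0],
    [0, -1],
]
feature_map = conv2d_valid(image, kernel)  # 2x2 feature map
```

Each output value here is the top-left pixel of a patch minus its bottom-right pixel, so this particular kernel responds to diagonal intensity changes.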
It is also worth noting that a convolutional layer usually applies several filters at once, so its output consists of several new images, commonly referred to as feature maps. In total, the features can change as shown in Fig. 3.

Fig. 3. Stacked convolutional layers
It is possible to write this as:
Thus, convolutional neural networks can be applied to computer vision problems and perform particularly well there, because they operate convolutionally, extracting features from local input patches and allowing for representation modularity and data efficiency. The same properties that make CNNs excel at computer vision also make them highly relevant to sequence processing.
CNNs can be applied to time series forecasting by replacing 2D convolutions with 1D ones. This makes it possible to exploit all of the advantages described above and to model nonlinear and multivariable interactions with good accuracy at reasonable speed and complexity. Time can be treated as a spatial dimension, like the height or width of a 2D image. Such 1D convnets can be competitive with RNNs on certain sequence-processing problems, usually at a considerably lower computational cost. Small 1D CNNs can offer a fast alternative to RNNs for tasks such as time series forecasting.
The convolution layers introduced previously were 2D convolutions, extracting 2D patches from image tensors and applying an identical transformation to every patch. In the same way, 1D convolutions extract local 1D patches (subsequences) from sequences (Fig. 4). Such 1D convolution layers can recognize local patterns in a sequence. Because the same input transformation is performed on every patch, a pattern learned at a certain position in a sequence can later be recognized at a different position, making 1D CNNs translation invariant (for temporal translations). For instance, a 1D CNN processing time series using convolution windows of size 5 should be able to learn pattern fragments of length 5 or less, and it should be able to recognize them in any other context in an input series.
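The claim that a 1D convolution recognizes the same local pattern at any position can be checked with a small sketch. The example below is illustrative only: the function name `conv1d_valid`, the kernel values and the two toy series are assumptions chosen so that the arithmetic is easy to follow, and no padding is used.

```python
def conv1d_valid(series, kernel, stride=1):
    """Apply the same kernel to every window of the series (no padding)."""
    k = len(kernel)
    return [
        sum(series[start + j] * kernel[j] for j in range(k))
        for start in range(0, len(series) - k + 1, stride)
    ]


# Hypothetical kernel that responds most strongly to the fragment [1, 3].
kernel = [1, 3]

# The same fragment placed early and late in two otherwise empty series.
series_a = [0, 1, 3, 0, 0, 0, 0]
series_b = [0, 0, 0, 0, 1, 3, 0]

out_a = conv1d_valid(series_a, kernel)
out_b = conv1d_valid(series_b, kernel)

# The peak response has the same value in both outputs; only its position moves.
peak_shift = out_b.index(max(out_b)) - out_a.index(max(out_a))
```

Shifting the fragment by three time steps shifts the peak response by three positions without changing its value, which is the property that lets one set of kernel weights serve the whole sequence.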

3. Experimental procedures
For the practical examples, a sample of real historical sales data for one of the 45 Walmart stores located in different regions is used [9]. The problem is to forecast weekly sales in various stores and departments of a retail chain. The sample size is 138 weeks. This paper uses the last 28 weeks to test and evaluate the quality of the forecasts.
The data loaders for neural network training were created using a sliding window approach with a step size of 1 week, a forecast horizon of 1 week and a lookback period of 4 weeks (Fig. 5). This approach is a very popular way to prepare data for neural network training and is described in many works [19]. The data was normalized before training.
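The sliding window preparation with a 4-week lookback, a 1-week horizon and a 1-week step can be sketched as follows. This is a minimal illustration rather than the data loader used in the study: the function name `sliding_windows` and the sales figures are made up, and normalization and the train/test split are left out.

```python
def sliding_windows(series, lookback=4, horizon=1, step=1):
    """Return (input window, target) pairs cut from a 1D series.

    Each input holds `lookback` consecutive values; the target is the
    value `horizon` steps after the end of that window.
    """
    samples = []
    last_start = len(series) - lookback - horizon
    for start in range(0, last_start + 1, step):
        window = series[start:start + lookback]
        target = series[start + lookback + horizon - 1]
        samples.append((window, target))
    return samples


# Hypothetical weekly sales values, used only to show the shape of the output.
weekly_sales = [10, 12, 11, 13, 15, 14, 16]
samples = sliding_windows(weekly_sales)
for window, target in samples:
    print(window, "->", target)
```

Under these settings a series of 138 weekly observations yields 134 input-target pairs in total.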

Fig. 5. Example of using the sliding window approach for building the data loaders for neural network training
The methodology proposed in [10] is used as a general approach. A naive forecast that repeats the previous observation was used as the benchmark baseline. For comparison, RNN and CNN models were trained. The RNN model uses 1 RNN layer with a hidden size of 32 and a linear layer to compute the output prediction (Fig. 6).

Fig. 6. Recurrent neural network architecture
The CNN model uses a 1D convolution with 1 channel, kernel size 2 and stride 1. A linear layer is used to compute the output prediction (Fig. 7).
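The parameter counts reported for these two architectures in the Conclusions (1153 for the RNN and 7 for the CNN) can be reproduced by simple arithmetic. The sketch below rests on assumptions that are not stated in the text: a single input feature per time step, an RNN cell with two bias vectors (the convention used by PyTorch's `nn.RNN`), a convolution with one input and one output channel and no padding, and a linear layer applied to the flattened convolution output. All function names are hypothetical.

```python
def rnn_params(input_size, hidden_size):
    # Input-to-hidden weights, hidden-to-hidden weights and two bias vectors
    # (assumption: the two-bias convention used by PyTorch's nn.RNN).
    return hidden_size * input_size + hidden_size * hidden_size + 2 * hidden_size


def linear_params(in_features, out_features):
    # Weight matrix plus one bias per output.
    return in_features * out_features + out_features


def conv1d_params(in_channels, out_channels, kernel_size):
    # One kernel per (input channel, output channel) pair plus one bias per output channel.
    return in_channels * out_channels * kernel_size + out_channels


def conv1d_output_length(length, kernel_size, stride=1):
    # Number of kernel positions when no padding is used.
    return (length - kernel_size) // stride + 1


lookback = 4  # weeks in each input window

# RNN: one recurrent layer (hidden size 32) followed by a linear output layer.
rnn_total = rnn_params(1, 32) + linear_params(32, 1)

# CNN: one 1D convolution (kernel 2, stride 1) followed by a linear output layer.
conv_len = conv1d_output_length(lookback, kernel_size=2, stride=1)
cnn_total = conv1d_params(1, 1, 2) + linear_params(conv_len, 1)

print(rnn_total, cnn_total)
```

Under these assumptions the recurrent weight matrix alone contributes 1024 parameters, which explains most of the gap between the two models.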

Results
To evaluate the results, the mean absolute error (MAE) is used as the metric. A comparison of the models in terms of accuracy and complexity (number of parameters) is given in Table 1. These results show, first, that both models outperform the naive baseline, so the results are at least meaningful. Second, the convolutional network is superior to the recurrent one in terms of both accuracy and complexity: a more accurate model was built with far fewer parameters. This indicates that 1D CNNs can be a reasonable choice for solving time series forecasting problems.
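For reference, the MAE metric and the naive baseline (repeating the previous observation) can be computed as in the sketch below. The function names `mae` and `naive_forecast` and the four sample values are hypothetical and are not taken from the Walmart data.

```python
def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def naive_forecast(series):
    """Forecast every value from the second one onward with its predecessor."""
    return series[:-1]


# Hypothetical weekly values, chosen only to make the arithmetic easy to follow.
series = [100.0, 110.0, 105.0, 120.0]
actual = series[1:]  # the first value has no predecessor to forecast from
baseline_mae = mae(actual, naive_forecast(series))
print(baseline_mae)
```

A model is only useful if its MAE on the test period is lower than the value this baseline achieves on the same period.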

Discussion
The task of forecasting processes in technical systems has been analyzed in depth in the literature using classical regression approaches, which are quite simple from both a theoretical and a computational point of view [2,3,20,21]. However, this approach has drawbacks: it cannot take into account a large number of external factors because of the problem of multicollinearity, especially if they interact nonlinearly [4].
Another popular approach is to use neural networks of the LSTM type [5], which also solve the problem of modeling sequences while taking into account the nonlinear or combined effects of external factors. This is the most common and accepted approach to modeling and forecasting nonlinear nonstationary processes using time series data [7]. Many modern studies use this approach to solve the problem mentioned above [22,23], but they focus only on recurrent neural network architectures such as LSTM.
However, this approach requires large computational costs and cannot be applied to very long sequences, which creates a problem for modern studies that use big data [4].
In this study, a mathematical model for forecasting nonlinear nonstationary processes was developed based on an architecture that is novel for this kind of data, convolutional neural networks. This makes it possible to take into account the nonlinear influence of previous observations or external factors without creating a very complicated model (in terms of the number of parameters and computational architecture). This particular dataset was studied in [24,25]. In both works, the authors used classical machine learning approaches. Their results indicate that Random Forest was the best algorithm, achieving the lowest MAE of 1979.4. In our work, the CNN approach achieves an MAE of 1618.9, which is a considerably better result.
A limitation of the current approach is that the kernel size is limited, so a single kernel cannot cover both the very start and the very end of the sequence.
In further research, however, it is possible to explore deeper or more complicated convolutional architectures such as ResNet and to translate them to the time series domain. It is also possible to combine CNNs with LSTMs or with classical regression approaches in order to include more diverse information in the model.

Conclusions
In this study a comparative analysis of recurrent and convolutional neural networks for modeling and prediction of nonlinear nonstationary processes was performed both in theory and on practical examples based on real world data.
A model for forecasting nonlinear nonstationary processes presented in the form of time series data was built using convolutional neural networks. The study presents results in which convolutional networks are superior to recurrent ones in terms of both accuracy and complexity. A more accurate model was built, with an MAE of 1618.9 compared with 1769.4 for the RNN approach. The CNN model also has far fewer parameters: only 7, compared with 1153 for the RNN, which is computationally more demanding and more prone to overfitting. This indicates that 1D CNNs can be a reasonable choice for solving time series forecasting problems.
Because RNNs are extremely expensive for processing very long sequences while 1D CNNs are cheap, it can be a good idea to use a 1D CNN as a preprocessing step before an RNN, shortening the sequence and extracting useful representations for the RNN to process further. For future research, combining convolutional and recurrent networks in one model, or combining neural networks with classical approaches for better detection of time dependencies, also looks promising.
In conclusion, one-dimensional convolutional networks or a combination of convolutional networks with recurrent networks or regression models can solve the problem of modeling nonlinear nonstationary processes presented in the form of long sequences in which there is a nonlinear or combined influence of external factors. Thus, this approach can be a powerful tool for creating adequate models and acceptable quality forecasts of selected processes.