Resource Economy

Forecasting Gas Consumption Based on a Residual Auto-Regression Model and Kalman Filtering Algorithm

  • ZHU Meifeng*,
  • WU Qinglong,
  • WANG Yongqin
  • School of Economics and Management, North University of China, Taiyuan 030051, China
* ZHU Meifeng, E-mail:

Received date: 2018-09-19

  Accepted date: 2019-06-05

  Online published: 2019-10-11

Supported by

Foundation: Soft Science Research Project in Shanxi Province of China(2017041030-5)

Science Fund Projects in North University of China (XJJ2016037).

Copyright

Copyright reserved © 2019

Abstract

Consumption of clean energy has been increasing in China. Forecasting gas consumption is important for adjusting the energy consumption structure in the future. Based on historical data of gas consumption from 1980 to 2017, this paper presents a weighting method based on the inverse deviation of fitted values, and uses a combined forecast based on a residual auto-regression model and the Kalman filtering algorithm to forecast gas consumption. Our results show that: (1) The combined forecast has higher precision: the relative errors of the residual auto-regressive model, the Kalman filtering algorithm and the combined model are within the ranges (-0.08, 0.09), (-0.09, 0.32) and (-0.03, 0.11), respectively. (2) The combined forecast has greater stability: the variances of the relative error of the residual auto-regressive model, the Kalman filtering algorithm and the combined model are 0.002, 0.007 and 0.001, respectively. (3) Provided that other conditions are invariant, the predicted value of gas consumption in 2018 is $241.81\times 10^{9}\ \mathrm{m}^{3}$. Compared to other time-series forecasting methods, this combined model is less restrictive, performs well, and gives a more credible result.

Cite this article

ZHU Meifeng, WU Qinglong, WANG Yongqin. Forecasting Gas Consumption Based on a Residual Auto-Regression Model and Kalman Filtering Algorithm[J]. Journal of Resources and Ecology, 2019, 10(5): 546-552. DOI: 10.5814/j.issn.1674-764X.2019.05.011

1 Introduction

At present, China’s energy consumption is still dominated by energy from fossil fuels, and with the development of the economy, energy consumption increases year by year. In 2016, 62% of energy consumption was raw coal, 18.30% was oil, and 19.7% was clean energy, of which 6.40% was natural gas. In the context of environmental constraints, clean energy consumption has become an important aspect of China’s energy consumption. For example, the proportion of natural gas in energy consumption rose from 2.20% in 2000 to 6.40% in 2016. The forecasting of energy consumption has become a hot topic for scholars in various countries. Forecasting China’s clean energy consumption in the future is important to the adjustment and planning of China’s energy consumption structure.
Domestic and foreign scholars in various fields have forecasted energy consumption. Current research methods focus mainly on time series predictions, non-linear predictions and combined forecasts (Qin et al., 2017; Wang et al., 2017; Xiao et al., 2017; Zhou et al., 2017). The time series method includes mainly grey theory and ARMA. Grey prediction was used to predict Brazil’s carbon dioxide emissions, energy consumption and economic growth rate (Pao et al., 2011). The grey prediction model was improved by a nonlinear regression technique and was used to predict China’s coal consumption (Wang et al., 2017).
Nonlinear prediction methods include: 1) The support vector regression (SVR) model, which introduces a “kernel function” to map the problem from a low-dimensional space to a high-dimensional space, constructs a linear decision function there, and replaces the linear term in the linear equation with the kernel function to carry out the regression (Kavaklioglu, 2011). 2) Genetic programming prediction, which mimics biological evolution to search automatically over function expressions of the variables and thereby forms a prediction model. 3) The artificial neural network method (Szoplik, 2015; Pino et al., 2017).
Combined forecasting, which integrates multiple forecasting methods, has improved the accuracy and stability of forecasting (Kumar et al., 2010; Xu et al., 2016).
Other forecasting methods include the scenario analysis model (Liu et al., 2016), the application of the long-term energy alternative planning system (LEAP) (Chen et al., 2017), the elastic coefficient method, and others.
The residual auto-regressive method is effective in predicting unstable time series with trend characteristics. The Kalman filtering theory imposes few restrictive constraints on the time series and is effective in making short-term predictions (Zhu et al., 2015). A combined forecast model based on the residual auto-regressive model and Kalman filtering theory is proposed in this paper to forecast natural gas consumption. The empirical results show that combined forecasting is superior in accuracy and stability to the single forecasting models.

2 Related theories

2.1 Residual auto-regressive model

The residual auto-regressive model decomposes the time series into two parts: a deterministic trend term and a residual term. The specific form is expressed in formula (1):
$X_{t}=T_{t}+\varepsilon_{t}$ (1)
where $X_{t}$ is the original time series, $T_{t}$ is the trend term obtained by the decomposition of deterministic factors, and $\varepsilon_{t}$ is the residual term. $T_{t}$ is expressed in formula (2):
$T_{t}=c+\alpha \times f(t),\ t=1,2,\cdots ,n$ (2)
where $c$ is the constant term, $\alpha$ is the parameter, and $f(t)$ is a function of $t$.
If most of the information in the original time series is captured by the trend term $T_{t}$, the residual term $\varepsilon_{t}$ generally shows no autocorrelation. Conversely, if the residual term $\varepsilon_{t}$ is autocorrelated, it still contains information about the original sequence, and the trend term $T_{t}$ alone is not enough to explain the original time series. Therefore, in-depth analysis of the residual term is necessary. In this paper, it is assumed that the residual term is mainly affected by its own lagged values, so an AR(p) model is constructed:
$\varepsilon_{t}=\alpha_{0}+\alpha_{1}\varepsilon_{t-1}+\alpha_{2}\varepsilon_{t-2}+\cdots +\alpha_{p}\varepsilon_{t-p}+\nu_{t}$ (3)
where $\alpha_{0}$ is the constant term of AR(p), $\alpha_{i}\ (i=1,2,\cdots ,p)$ is the coefficient of the $i$-th lagged residual term, and $\nu_{t}$ is a white noise sequence that satisfies $E(\nu_{t})=0$, $\operatorname{var}(\nu_{t})=\sigma^{2}$, $\operatorname{cov}(\nu_{i},\nu_{j})=0,\ i\ne j$.
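The two-stage structure described above (fit a trend, then model the residuals with an auto-regression) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the trend is taken to be linear, $f(t)=t$; the residual model is fixed at order $p=2$ with the constant $\alpha_{0}$ omitted; and the function names `fit_linear_trend`, `fit_ar2` and `forecast_next` are hypothetical, not part of any library or of the original study.

```python
def fit_linear_trend(x):
    """OLS fit of x_t = c + alpha * t + e_t for t = 1..n. Returns (c, alpha)."""
    n = len(x)
    t = list(range(1, n + 1))
    t_mean = sum(t) / n
    x_mean = sum(x) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (xi - x_mean) for ti, xi in zip(t, x))
    alpha = sxy / sxx
    c = x_mean - alpha * t_mean
    return c, alpha


def residuals(x, c, alpha):
    """Residual term e_t = x_t - (c + alpha * t), t = 1..n."""
    return [xi - (c + alpha * (i + 1)) for i, xi in enumerate(x)]


def fit_ar2(e):
    """Least-squares fit of e_t = a1 * e_{t-1} + a2 * e_{t-2} (assumed: no constant)."""
    y, lag1, lag2 = e[2:], e[1:-1], e[:-2]
    s11 = sum(u * u for u in lag1)
    s22 = sum(v * v for v in lag2)
    s12 = sum(u * v for u, v in zip(lag1, lag2))
    s1y = sum(u * w for u, w in zip(lag1, y))
    s2y = sum(v * w for v, w in zip(lag2, y))
    det = s11 * s22 - s12 * s12  # determinant of the 2x2 normal equations
    a1 = (s1y * s22 - s2y * s12) / det
    a2 = (s2y * s11 - s1y * s12) / det
    return a1, a2


def forecast_next(c, alpha, a1, a2, resid, n):
    """One-step forecast for t = n + 1: trend plus AR(2) prediction of the residual."""
    return c + alpha * (n + 1) + a1 * resid[-1] + a2 * resid[-2]
```

In use, one would call `fit_linear_trend` on the observed series, pass the result to `residuals`, fit `fit_ar2` on those residuals, and combine everything with `forecast_next`. A higher order $p$ would need a general linear solver instead of the closed-form 2×2 solution used here.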

2.2 Kalman filtering theory

Kalman filtering is essentially a recursive algorithm that, as each new real-time observation arrives, produces a linear, unbiased, minimum-error-variance estimate of the system state. Kalman filtering theory can be represented by the following state equation, observation equation and recurrence equations.
Supposing the state space model of the system is:
$x_{k+1}=\phi_{k}x_{k}+B_{k}U_{k}+\omega_{k}$ (4)
$z_{k}=H_{k}x_{k}+V_{k}$ (5)
where equation (4) is the state equation and equation (5) is the observation equation. In equation (4), $x_{k}$ is the state variable at time $k$, $x_{k+1}$ is the state variable at time $k+1$, $\phi_{k}$ is the state transition matrix of the system, $U_{k}$ is the control variable, $B_{k}$ is the coefficient matrix, and $\omega_{k}$ is the disturbance term at time $k$. In equation (5), $z_{k}$ is the observation value at time $k$, $H_{k}$ is the observation matrix, and $V_{k}$ is the measurement noise at time $k$. Here, it is assumed that both $\omega_{k}$ and $V_{k}$ are Gaussian white noise sequences with zero mean, whose covariances are $Q(k)$ and $R(k)$, as shown in the following formula, where $Q(k)$ and $R(k)$ are positive definite matrices.
$Q(k)=E(\omega_{k}\omega'_{j})=Q_{k}\delta_{kj},\quad R(k)=E(V_{k}V'_{j})=R_{k}\delta_{kj},\quad \delta_{kj}=\begin{cases} 1, & k=j \\ 0, & k\ne j \end{cases}$
The mathematical expression of the general form of state vector filtering estimation is:
$\hat{x}_{k}=D_{k-1}\hat{x}_{k-1}+K_{k}z_{k}+E_{k-1}U_{k-1}$ (6)
where $\hat{x}_{k}$ is the filtering estimate at time $k$, and $D_{k-1}$, $K_{k}$ and $E_{k-1}$ are coefficient matrices.
Under the condition that the estimate is unbiased, that is, the estimation error has zero mean, the linear unbiased Kalman filtering estimation equation can be deduced from equations (4) to (6):
$\hat{x}_{k}=\hat{x}_{k/k-1}+K_{k}[z_{k}-H_{k}\hat{x}_{k/k-1}]$ (7)
The innovation (new information) is defined as:
$\tilde{z}_{k}=z_{k}-H_{k}\hat{x}_{k/k-1}$ (8)
Under the condition of unbiased estimation, the first-order predictive estimation equation from state $x_{k}$ to state $x_{k+1}$ can be deduced as follows:
$\hat{x}_{k+1/k}=\phi_{k}\hat{x}_{k}+B_{k}U_{k}$ (9)
The optimal weight matrix of the Kalman filtering estimate, the Kalman gain, is:
$K_{k}=P(k/k-1)H'_{k}[H_{k}P(k/k-1)H'_{k}+R(k)]^{-1}$ (10)
where $P(k/k-1)=\phi_{k-1}P_{e}(k-1)\phi'_{k-1}+Q(k-1)$ is the covariance of the one-step prediction error and $P_{e}(k-1)$ is the covariance of the filtering estimation error at time $k-1$.
The above equations constitute the optimal linear recursive filtering process, that is, the Kalman filtering process.
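To make the recursion concrete, the following Python sketch runs the filter in the scalar case. It is a minimal sketch under several assumptions that the text does not fix: there is no control input ($B_{k}U_{k}=0$), and $\phi$, $H$, $Q$ and $R$ are constant scalars supplied by the caller. The covariance update $P_{e}(k)=(1-K_{k}H)P(k/k-1)$ used in the last step is the standard Kalman update, which is not written out above. The function name `kalman_filter_1d` is hypothetical.

```python
def kalman_filter_1d(z, x0, p0, q, r, phi=1.0, h=1.0):
    """Scalar Kalman filter without control input.

    z      : list of observations z_k
    x0, p0 : initial state estimate and its error covariance
    q, r   : process and measurement noise variances (assumed constant)
    phi, h : state transition and observation coefficients (assumed constant)
    Returns (filtered estimates, filtering error covariances).
    """
    x_est, p_est = x0, p0
    estimates, covariances = [], []
    for zk in z:
        # Prediction step: propagate the state and its error covariance.
        x_pred = phi * x_est
        p_pred = phi * p_est * phi + q
        # Gain: optimal weight given to the innovation.
        gain = p_pred * h / (h * p_pred * h + r)
        # Update step: correct the prediction with the innovation.
        innovation = zk - h * x_pred
        x_est = x_pred + gain * innovation
        p_est = (1.0 - gain * h) * p_pred
        estimates.append(x_est)
        covariances.append(p_est)
    return estimates, covariances
```

With `phi = h = 1` the filter reduces to a recursively weighted average of the previous estimate and the new observation, which is why the choice of `q` and `r` largely determines how closely the estimate tracks the data.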

3 The empirical analysis

In this paper, a combined forecast model based on residual auto-regression and Kalman filtering theory is constructed to predict the time series of natural gas consumption. The combined forecast results are compared with the results of the residual auto-regressive model and the Kalman filtering prediction model. The data for natural gas consumption during the years 1980-2016 was taken from the BP Statistical Review of World Energy, and the data for 2017 is from the website of China’s National Development and Reform Commission. The data trend is shown in Fig. 1.

Fig. 1 Natural gas consumption in China from 1980 to 2017

3.1 Forecast analysis based on residual auto-regressive model

Fig. 1 shows that China’s natural gas consumption presents a significant time trend over the sample period, so the trend term of the original sequence is extracted. After several candidate models were fitted and screened, a statistical model of $X_{t}$ on time $t$ is constructed:
$X_{t}=c+\alpha t+\varepsilon_{t}$ (11)
The result of empirical analysis shows that the F statistic of this statistical model is 78.899, the t statistic of time t is 8.883, and the constant term is -33.627; that is, both the variable and the whole model have passed the significance test. The concrete form of equation (11) is
${{Y}_{t}}=-33.627+4.621t,(t=1,2,\cdots ,37)$
The Durbin-Watson (DW) value of the above model is 0.045, far below 2, indicating strong positive autocorrelation in the residual term of the model.
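For readers unfamiliar with the DW statistic, the short Python sketch below shows how it is computed from a residual series: the sum of squared successive differences divided by the sum of squared residuals. Values near 2 suggest no first-order autocorrelation, and values near 0 suggest strong positive autocorrelation. The function name `durbin_watson` and the toy inputs are illustrative assumptions; the residuals of the fitted model are not reproduced here.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum (e_t - e_{t-1})^2 / sum e_t^2."""
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    den = sum(e * e for e in residuals)
    return num / den


# A perfectly persistent residual series gives 0; an alternating one gives a value above 2.
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
```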
Stationarity test of the residual term. According to the characteristics of the residual series, the ADF test with an intercept term is selected. The test results show that the t-statistic is -4.969, less than the critical value of -3.640 at the 1% significance level, so the null hypothesis of a unit root in the residual term $\varepsilon$ is rejected at the 99% confidence level; that is, the residual sequence is stationary.
Correlation analysis of the residual term. The results of autocorrelation and partial autocorrelation analysis show first-order serial correlation in the residual series, so either an AR(1) or an AR(2) model can be established. According to the empirical results, AR(2) performs better. The adjusted goodness of fit is 0.984, the F statistic is 1030.801, the t statistic of the first-order lagged term is 18.069, and the t statistic of the second-order lagged term is -8.088; that is, the model parameters and the whole model have passed the significance test. The AR(2) model of the residual term is established:
${{\varepsilon }_{t}}=1.838{{\varepsilon }_{t-1}}-0.850{{\varepsilon }_{t-2}}$
The Lagrange multiplier test is used to test whether the residual auto-regressive model misses important information. Of the two statistics usually provided by this test, the F statistic is 8.496 and the $T\times R^{2}$ statistic is 12.656. The results show that the null hypothesis cannot be rejected, which means that the residual term of the residual auto-regressive model has no serial correlation at the 5% significance level. The specific form of the residual auto-regressive model can then be expressed by the following formula:
$Y_{t}=-33.627+4.621t+1.838\varepsilon_{t-1}-0.850\varepsilon_{t-2},\quad (t=3,4,\cdots ,37)$
For the residual auto-regressive model, the relative fitting error falls within the range (-0.1, 0.1), and the variance of the relative fitting error is 0.002. The fit of the model is shown in Fig. 2. Natural gas consumption in 2018 is forecast to be $244.218\times 10^{9}\ \mathrm{m}^{3}$.

Fig. 2 Observed value and estimated value by residual auto-regression

3.2 Forecast analysis based on Kalman filtering

The Kalman filtering theory is used to predict natural gas demand. First, the initial value of the state vector and the covariance of the initial value are assigned to the state equation. Considering that the consumption of natural gas in 1980 was $14.74\times 10^{9}\ \mathrm{m}^{3}$, the initial value and expected value of the state vector are set to 14.74 and the initial value of the covariance is set to 0.2. Since the filtering estimate eventually converges, the initial value setting does not affect the result of the Kalman estimation. Considering that the observed value of natural gas consumption is likely to have a one-to-one correspondence with the actual state value, the observation coefficient H is set to 1, and the coefficient matrix reduces to a scalar coefficient.
Given the basic coefficient settings stated above, the empirical results of Kalman filtering show that the estimate is accurate and stable. The fit between the actual natural gas consumption values and the values estimated by Kalman filtering is shown in Fig. 3 below.

Fig. 3 Observed value and value estimated by Kalman filtering

The performance of the Kalman filtering estimation is represented by the covariance of the filtering estimation error, and its fluctuation interval falls within the range (9.434e-5, 0.239). This indicates that the fitting deviation between the Kalman filtering estimate of natural gas consumption and the observed value of natural gas consumption is small and the fitting process is relatively stable, as shown in Fig. 4. As time goes by, the relative fitting error of the filtering estimation falls within the range (-0.2, 0.1), and the variance of the relative fitting error is 0.002; that is, the relative error of the Kalman filtering estimation is controllable and relatively stable, as shown in Fig. 5. Natural gas consumption in 2018 is forecast to be $239.49\times 10^{9}\ \mathrm{m}^{3}$.

Fig. 4 Covariance of estimation error of Kalman filtering

Fig. 5 Relative error of Kalman filtering estimation

3.3 Combined forecast analysis based on residual auto-regression and Kalman filtering

The way the weights of the individual forecast models are set is crucial to the fit of the combined forecast model and the accuracy of the forecast. After trying multiple weight settings, this paper designs an inverse fitted-value deviation method. The basic principle is to calculate the degree of deviation of each prediction made by a single forecast model and to give a fitted value with a large deviation a small weight. Here the deviation is measured against the observed value.
Denote the prediction result of the residual auto-regressive model as the sequence $A_{i}\ (i=1982,1983,\cdots ,2016)$, where $i$ is the year. Let
$a_{i}=\frac{1}{(A_{i}-X_{i})^{2}},\ i=1982,1983,\cdots ,2016$, where $X_{i}$ is the observed value for year $i$.
Denote the prediction result of the Kalman model as the sequence $B_{i}\ (i=1982,1983,\cdots ,2016)$, where $i$ is the year. Let
$b_{i}=\frac{1}{(B_{i}-X_{i})^{2}},\ i=1982,1983,\cdots ,2016$, where $X_{i}$ is the observed value for year $i$.
The weight of the estimated value of residual auto-regression is $\lambda_{i}=\frac{a_{i}}{a_{i}+b_{i}},\ i=1982,1983,\cdots ,2016$, and the weight of the estimated value of Kalman filtering is $\beta_{i}=\frac{b_{i}}{a_{i}+b_{i}},\ i=1982,1983,\cdots ,2016$, where $\lambda_{i}+\beta_{i}=1$. The combined forecast result is $C_{i}=\lambda_{i}A_{i}+\beta_{i}B_{i}\ (i=1982,1983,\cdots ,2016)$.
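The weighting scheme can be checked numerically with a small Python sketch. The function names below are hypothetical, and the handling of an exact match (a model whose fitted value equals the observation receives all the weight) is an assumption added to avoid division by zero; the text does not say how that case is treated. The text also does not state how weights are chosen for an out-of-sample year, where no observation is available, so the sketch covers in-sample years only.

```python
def inverse_deviation_weights(a_pred, b_pred, observed):
    """Weights (lambda_i, beta_i) proportional to 1 / squared deviation."""
    dev_a = (a_pred - observed) ** 2
    dev_b = (b_pred - observed) ** 2
    # Assumed edge cases: an exact fit takes all the weight.
    if dev_a == 0.0 and dev_b == 0.0:
        return 0.5, 0.5
    if dev_a == 0.0:
        return 1.0, 0.0
    if dev_b == 0.0:
        return 0.0, 1.0
    a = 1.0 / dev_a
    b = 1.0 / dev_b
    return a / (a + b), b / (a + b)


def combined_forecast(a_pred, b_pred, observed):
    """Combined value C_i = lambda_i * A_i + beta_i * B_i."""
    lam, beta = inverse_deviation_weights(a_pred, b_pred, observed)
    return lam * a_pred + beta * b_pred


# 1982 row of Table 1: residual auto-regression 12.06, Kalman 13.48, observed 12.33.
lam, beta = inverse_deviation_weights(12.06, 13.48, 12.33)
print(round(lam, 3), round(beta, 3), round(combined_forecast(12.06, 13.48, 12.33), 2))
```

For the 1982 values the residual auto-regressive estimate receives a weight of roughly 0.95, and the combined value comes out at about 12.13, which agrees with the combined estimate tabulated for that year.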
The fitting results of the combined prediction are shown in Fig. 6 below. On the whole, the fitting effect of the time series for natural gas consumption using the combined forecast model constructed with the methods above is better.

Fig. 6 Observed value and estimated value by the combined predictive value

4 Comparative analyses of empirical results

It can be seen from the above analysis that both the residual auto-regressive prediction method and the Kalman filtering prediction method provide a reasonable fit for the time series of natural gas consumption, and the fitting error can be kept within an acceptable range. However, the prediction performance of the combined forecast model based on the inverse predicted value deviation method is better. This is reflected mainly in the higher accuracy of the combined forecast compared to that provided by either of the single forecast models and the smaller variance of the prediction error. The specific data is shown in Table 1 below.
Table 1 Observed values and the fitted values of each prediction method (unit: $10^{9}\ \mathrm{m}^{3}$)

| Year | Observed value | Estimated value of residual auto-regression | Estimated value of Kalman | Estimated value of combined forecast |
|---|---|---|---|---|
| 1982 | 12.33 | 12.06 | 13.48 | 12.13 |
| 1983 | 12.61 | 11.92 | 12.42 | 12.39 |
| 1984 | 12.84 | 13.22 | 12.58 | 12.79 |
| 1985 | 13.36 | 13.45 | 12.79 | 13.43 |
| 1986 | 14.22 | 14.26 | 13.36 | 14.26 |
| 1987 | 14.35 | 15.45 | 14.11 | 14.17 |
| 1988 | 14.84 | 15.03 | 14.29 | 14.95 |
| 1989 | 15.53 | 15.86 | 14.84 | 15.67 |
| 1990 | 15.76 | 16.77 | 14.85 | 15.70 |
| 1991 | 16.42 | 16.66 | 14.88 | 16.62 |
| 1992 | 16.41 | 17.74 | 16.39 | 16.39 |
| 1993 | 17.32 | 17.21 | 16.40 | 17.20 |
| 1994 | 17.92 | 18.95 | 16.55 | 18.08 |
| 1995 | 18.33 | 19.33 | 17.76 | 18.14 |
| 1996 | 19.10 | 19.64 | 18.32 | 19.22 |
| 1997 | 20.19 | 20.76 | 18.45 | 20.54 |
| 1998 | 20.93 | 22.17 | 20.17 | 20.72 |
| 1999 | 22.21 | 22.65 | 20.72 | 22.49 |
| 2000 | 25.35 | 24.43 | 22.13 | 24.25 |
| 2001 | 28.37 | 29.17 | 22.39 | 29.05 |
| 2002 | 30.19 | 32.12 | 28.36 | 30.14 |
| 2003 | 35.08 | 32.93 | 28.58 | 32.51 |
| 2004 | 41.04 | 40.43 | 34.06 | 40.38 |
| 2005 | 48.21 | 47.29 | 34.62 | 47.23 |
| 2006 | 59.31 | 55.47 | 47.76 | 54.70 |
| 2007 | 72.95 | 69.81 | 53.10 | 69.40 |
| 2008 | 84.09 | 85.52 | 68.39 | 85.38 |
| 2009 | 92.60 | 94.45 | 79.96 | 94.15 |
| 2010 | 111.18 | 100.68 | 92.35 | 98.70 |
| 2011 | 137.08 | 127.64 | 92.63 | 126.13 |
| 2012 | 150.94 | 159.51 | 129.35 | 155.41 |
| 2013 | 171.92 | 163.04 | 139.83 | 161.39 |
| 2014 | 188.40 | 189.86 | 171.53 | 189.72 |
| 2015 | 194.76 | 202.38 | 175.21 | 200.80 |
| 2016 | 210.34 | 200.11 | 194.72 | 211.50 |
| 2017 | 237.30 | 235.40 | 220.27 | 235.21 |

As the data in Table 1 shows, the combined forecast model fits the observations better than the single forecast models. Fig. 7 gives an intuitive view of how well each forecast model fits the data.

Fig. 7 Fitting figure of observed value and the estimated value

As can be seen from Fig. 8, the degree of relative error control is better for the combined forecast model, with the relative error falling within the range (-0.03, 0.11). The relative error of Kalman filtering estimation falls within the range (-0.09, 0.32). The relative error of the residual auto-regressive estimation falls within the range (-0.08, 0.09). With respect to the degree of relative error fluctuation, the variance of relative error of the combined forecast model is 0.001, the variance of relative error of Kalman filtering estimation is 0.007, and the variance of relative error of residual auto-regressive estimation is 0.002.
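The error statistics quoted above can be reproduced from Table 1 with a short Python sketch. Two points are assumptions rather than statements from the text: the relative error is taken as (observed − estimated) / observed, a sign convention that is consistent with the reported ranges, and the variance is computed as a population variance via `statistics.pvariance`. The function names are hypothetical.

```python
from statistics import pvariance


def relative_errors(observed, estimated):
    """Relative error per year, assumed to be (observed - estimated) / observed."""
    return [(x - y) / x for x, y in zip(observed, estimated)]


def error_summary(observed, estimated):
    """Return (min, max, variance) of the relative errors."""
    errs = relative_errors(observed, estimated)
    return min(errs), max(errs), pvariance(errs)


# Combined-forecast rows for 2010 and 2015 from Table 1.
observed = [111.18, 194.76]
combined = [98.70, 200.80]
print([round(e, 3) for e in relative_errors(observed, combined)])
```

These two rows give relative errors of roughly 0.11 and -0.03, which agree with the endpoints of the range reported for the combined forecast model. Passing the full observed and estimated columns of Table 1 to `error_summary` would give the range and variance for each model.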

Fig. 8 Comparison of relative error

According to the above analysis, the relative error of the combined forecast model is smaller than that of either of the single forecast models, and its fluctuation is also smaller. This is useful for predicting the short-term trend of natural gas consumption. According to the residual auto-regressive forecast model and Kalman filtering theory, natural gas consumption in 2018 is forecast to be $234.219\times 10^{9}\ \mathrm{m}^{3}$ and $236.49\times 10^{9}\ \mathrm{m}^{3}$ respectively, and the forecast result of the combined forecast model is $241.808\times 10^{9}\ \mathrm{m}^{3}$.

5 Conclusions

With the ecological environment imposing constraints, China’s energy consumption pattern dominated by fossil fuel energy is bound to gradually improve; the consumption scale of clean energy has begun to increase and will continue to do so rapidly. Forecasting and analysis of total energy consumption can play a positive role in China’s energy production layout.
Based on the residual auto-regressive model, Kalman filtering theory, and weights set by the inverse fitted-value deviation, this paper constructs a combined forecast model for natural gas consumption. The empirical results show that the relative error of the prediction falls within the range (-0.03, 0.11) with a variance of 0.001, so the combined forecast model performs well. The combined forecast model can be effectively applied to predict energy consumption.

References

[1] Chen R, Rao Z, Liu J, et al. 2017. Prediction of energy demand and policy analysis of Changsha based on LEAP Model. Resources Science, 39(3):482-489. (in Chinese)
[2] Kavaklioglu K. 2011. Modeling and prediction of Turkey’s electricity consumption using Support Vector Regression. Applied Energy, 88(1):368-375.
[3] Kovačič M, Šarler B. 2014. Genetic programming prediction of the natural gas consumption in a steel plant. Energy, 66(2):273-284.
[4] Kumar U, Jain V K. 2010. Time series models (Grey-Markov, Grey Model with rolling mechanism and singular spectrum analysis) to forecast energy consumption in India. Energy, 35(4):1709-1716.
[5] Liu W D, Zhong W Z, Shi Q. 2016. Forecast of China’s total energy consumption in 2020 based on method of fixed base energy consumption elasticity coefficient. Resources Science, 38(4):658-664. (in Chinese)
[6] Pao H T, Tsai C M. 2011. Modeling and forecasting the CO2 emissions, energy consumption, and economic growth in Brazil. Energy, 36(5):2450-2458.
[7] Pino Mejías R, Pérez-Fargallo A, Rubio-Bellido C, et al. 2017. Comparison of linear regression and artificial neural networks models to predict heating and cooling energy demand, energy consumption and CO2 emissions. Energy, 118(1):24-36.
[8] Qin L, Huang W B, Ma G W, et al. 2017. Study of energy consumption based on Verhulst model and information entropy. China Population, Resources and Environment, 27(11):45-49. (in Chinese)
[9] Szoplik J. 2015. Forecasting of natural gas consumption with artificial neural networks. Energy, 85(1):208-220.
[10] Wang X, Fan Z Q. 2017. Prediction of coal consumption based on optimized GM (1, 1) model. Coal Technology, 36(9):321-322. (in Chinese)
[11] Xiao J, Sun H Y, Liu D H, et al. 2017. GMDH based hybrid model for China's energy consumption prediction. Chinese Journal of Management Science, 25(12):158-166. (in Chinese)
[12] Xu N, Dang Y, Gong Y. 2016. Novel grey prediction model with nonlinear optimized time response method for forecasting of electricity consumption in China. Energy, 118(1):473-480.
[13] Zhou W J, Zhang H R, Dang Y G, et al. 2017. New information priority accumulated grey discrete model and its application. Chinese Journal of Management Science, 25(8):140-148. (in Chinese)
[14] Zhu M F, Zhao G H. 2015. Forecasting the coke price based on the Kalman filtering algorithm. Journal of Resources and Ecology, 6(1):60-64.
