Yanfei Kang | Rob J Hyndman | Feng Li |
---|---|---|
Beihang University | Monash University | Central University of Finance and Economics |
Ultra-long time series are increasingly being accumulated in many applications.
Forecasting these time series is challenging.
Several attempts to address this have been made in the literature.
P1: Ultra-long time series are becoming increasingly common. Examples include hourly electricity demand spanning several years, daily maximum temperatures recorded for hundreds of years, and streaming data.
P2: It is challenging to deal with such long time series. We identify three significant challenges, including:
P3: Forecasters have made some attempts to address these limitations.
The Global Energy Forecasting Competition 2017 (GEFCom2017)
Ranging from March 1, 2003 to April 30, 2017 (124,171 time points)
10 hourly electricity load series
Training period: March 1, 2003 - December 31, 2016
Test period: January 1, 2017 - April 30, 2017 ($h = 2{,}879$)
An example series from GEFCom2017 dataset.
In this work, we want to find a better way to resolve all challenges associated with forecasting ultra-long time series. (...) Inspired by this, we aim to extend the DLSA method to solve the time series modeling problem.
Consider an ultra-long time series $\{y_t;\ t=1,2,\ldots,T\}$, and define $S=\{1,2,\cdots,T\}$ to be the timestamp sequence.
The parameter estimation problem can be formulated as $f(\theta,\Sigma \mid y_t,\ t\in S)$.
Suppose the whole time series is split into $K$ subseries with contiguous time intervals, that is, $S=\cup_{k=1}^{K}S_k$.
The parameter estimation problem is transformed into $K$ sub-problems and one combination problem as follows:
$$f(\theta,\Sigma \mid y_t,\ t\in S)=g\big(f_1(\theta_1,\Sigma_1 \mid y_t,\ t\in S_1),\ldots,f_K(\theta_K,\Sigma_K \mid y_t,\ t\in S_K)\big).$$
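As a rough illustration of the split $S=\cup_{k=1}^{K}S_k$, the sketch below cuts the timestamps into $K$ contiguous blocks. The helper name `split_contiguous` and the toy sizes are assumptions made for this example; they are not taken from the DARIMA code.

```python
import numpy as np

def split_contiguous(T, K):
    """Split timestamps 1..T into K contiguous, non-overlapping blocks S_1..S_K.

    Hypothetical helper: block sizes differ by at most one observation.
    """
    timestamps = np.arange(1, T + 1)
    return np.array_split(timestamps, K)

# Toy example (T and K chosen only for illustration).
blocks = split_contiguous(T=10, K=3)
for k, S_k in enumerate(blocks, start=1):
    print(f"S_{k} = {S_k.tolist()}")
```

Each block would then be handed to a separate worker, which only ever sees its own subseries.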
We formulate the parameter estimation problem as shown in this formula.
The proposed framework for time series forecasting on a distributed system.
ARIMA models
are the most widely used statistical time series forecasting approaches.
frequently serve as benchmark methods for model combination.
handle non-stationary series via differencing, and handle seasonal patterns.
can be selected automatically: automatic ARIMA modeling was developed to simplify the order selection process (Hyndman & Khandakar, 2008).
can be converted into AR representations (linear form).
Automatic ARIMA modeling mainly consists of 3 steps... The order selection and model refitting steps are time-consuming for ultra-long time series: the time spent forecasting one electricity demand series ranges from 20 minutes to 2 hours. It is therefore necessary to develop a new approach that makes ARIMA models work well for ultra-long series.
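A seasonal ARIMA model is generally defined as
$$\left(1-\sum_{i=1}^{p}\phi_i B^{i}\right)\left(1-\sum_{i=1}^{P}\Phi_i B^{im}\right)(1-B)^{d}(1-B^{m})^{D}\left(y_t-\mu_0-\mu_1 t\right)=\left(1+\sum_{i=1}^{q}\theta_i B^{i}\right)\left(1+\sum_{i=1}^{Q}\Theta_i B^{im}\right)\varepsilon_t.$$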
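The linear representation of the original seasonal ARIMA model can be given by
$$y_t=\beta_0+\beta_1 t+\sum_{i=1}^{\infty}\pi_i y_{t-i}+\varepsilon_t,$$
where
$$\beta_0=\mu_0\left(1-\sum_{i=1}^{\infty}\pi_i\right)+\mu_1\sum_{i=1}^{\infty}i\,\pi_i \quad\text{and}\quad \beta_1=\mu_1\left(1-\sum_{i=1}^{\infty}\pi_i\right).$$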
For a general seasonal ARIMA model, by using multiplication and long division of polynomials, we can obtain the final converted linear representation in this form. In this way, all ARIMA models fitted for subseries can be converted into this linear form.
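The polynomial multiplication and long division mentioned here can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `expand_poly` and `ar_representation` are hypothetical, polynomials are stored as coefficient arrays in increasing powers of $B$, and the infinite expansion is truncated at `n_terms`.

```python
import numpy as np

def expand_poly(*factors):
    """Multiply polynomials in B (coefficients in increasing powers of B)."""
    out = np.array([1.0])
    for f in factors:
        out = np.convolve(out, np.asarray(f, dtype=float))
    return out

def ar_representation(ar_poly, ma_poly, n_terms):
    """Return pi_1..pi_n with ar_poly(B) / ma_poly(B) = 1 - sum_i pi_i B^i.

    Long division: the quotient q satisfies sum_i ma_poly[i] * q[j - i] = ar_poly[j].
    """
    ar_poly = np.asarray(ar_poly, dtype=float)
    ma_poly = np.asarray(ma_poly, dtype=float)
    q = np.zeros(n_terms + 1)
    for j in range(n_terms + 1):
        a_j = ar_poly[j] if j < len(ar_poly) else 0.0
        upper = min(j, len(ma_poly) - 1)
        acc = sum(ma_poly[i] * q[j - i] for i in range(1, upper + 1))
        q[j] = (a_j - acc) / ma_poly[0]
    return -q[1:]

# ARMA(1,1): (1 - 0.5B) y_t = (1 + 0.4B) eps_t
pi_arma = ar_representation([1.0, -0.5], [1.0, 0.4], n_terms=3)

# ARIMA(1,1,0): multiply the AR factor by the differencing factor (1 - B) first.
full_ar = expand_poly([1.0, -0.5], [1.0, -1.0])
pi_arima = ar_representation(full_ar, [1.0], n_terms=3)
```

A seasonal factor such as $(1-\Phi_1 B^{m})$ would be passed to `expand_poly` as an array with `1.0` at position 0 and `-Phi_1` at position `m`.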
Some excellent statistical properties of the global estimator obtained by DLSA (Distributed Least Squares Approximation) have been proved (Zhu, Li & Wang 2021, JCGS).
We extend the DLSA method to solve the time series modeling problem.
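Let $\mathcal{L}(\theta;y_t)$ be a twice-differentiable loss function. Then
$$\begin{aligned}
\mathcal{L}(\theta) &= \frac{1}{T}\sum_{k=1}^{K}\sum_{t\in S_k}\mathcal{L}(\theta;y_t)
= \frac{1}{T}\sum_{k=1}^{K}\sum_{t\in S_k}\left\{\mathcal{L}(\theta;y_t)-\mathcal{L}(\hat\theta_k;y_t)\right\}+c_1 \\
&\approx \frac{1}{T}\sum_{k=1}^{K}\sum_{t\in S_k}(\theta-\hat\theta_k)^{\top}\ddot{\mathcal{L}}(\hat\theta_k;y_t)(\theta-\hat\theta_k)+c_2 \\
&\approx \sum_{k=1}^{K}(\theta-\hat\theta_k)^{\top}\left(\frac{T_k}{T}\hat\Sigma_k^{-1}\right)(\theta-\hat\theta_k)+c_2,
\end{aligned}$$
where $\hat\theta_k$ is the local estimator on subseries $k$, $T_k$ is the length of that subseries, and $c_1$, $c_2$ are constants.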
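The global estimator calculated by minimizing the weighted least squares takes the following form:
$$\tilde\theta=\left(\sum_{k=1}^{K}\frac{T_k}{T}\hat\Sigma_k^{-1}\right)^{-1}\left(\sum_{k=1}^{K}\frac{T_k}{T}\hat\Sigma_k^{-1}\hat\theta_k\right),\qquad
\tilde\Sigma=\left(\sum_{k=1}^{K}\frac{T_k}{T}\hat\Sigma_k^{-1}\right)^{-1}.$$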
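To make the combination step concrete, here is a minimal NumPy sketch of the weighted least squares formula for the global estimator. The function name `combine_local_estimators` and the toy inputs are assumptions for illustration only.

```python
import numpy as np

def combine_local_estimators(thetas, sigmas, lengths):
    """Combine local estimators theta_k with covariances Sigma_k and lengths T_k.

    Hypothetical helper implementing
        theta_tilde = (sum_k T_k/T Sigma_k^{-1})^{-1} (sum_k T_k/T Sigma_k^{-1} theta_k).
    """
    T = float(sum(lengths))
    p = len(thetas[0])
    precision_sum = np.zeros((p, p))
    weighted_sum = np.zeros(p)
    for theta_k, sigma_k, T_k in zip(thetas, sigmas, lengths):
        weight = (T_k / T) * np.linalg.inv(np.asarray(sigma_k, dtype=float))
        precision_sum += weight
        weighted_sum += weight @ np.asarray(theta_k, dtype=float)
    sigma_tilde = np.linalg.inv(precision_sum)
    theta_tilde = sigma_tilde @ weighted_sum
    return theta_tilde, sigma_tilde

# Toy example: two subseries with identity covariances and lengths 100 and 300.
theta_tilde, sigma_tilde = combine_local_estimators(
    thetas=[[1.0, 2.0], [3.0, 4.0]],
    sigmas=[np.eye(2), np.eye(2)],
    lengths=[100, 300],
)
```

With equal covariances the result is simply the length-weighted average of the local estimators, so the longer subseries dominates.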
$\hat\Sigma_k$ is not known and has to be estimated.
We approximate a GLS estimator by an OLS estimator (e.g., Hyndman et al., 2011) while still obtaining consistency.
We consider approximating $\hat\Sigma_k$ by $\hat\sigma_k^2 I$ for each subseries.
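A short worked step shows what this approximation does to the global estimator. Substituting $\hat\sigma_k^2 I$ for $\hat\Sigma_k$ in the weighted least squares solution (the weights $w_k$ are notation introduced here for illustration) gives:

```latex
\tilde\theta
  = \left(\sum_{k=1}^{K}\frac{T_k}{T\hat\sigma_k^{2}}\right)^{-1}
    \sum_{k=1}^{K}\frac{T_k}{T\hat\sigma_k^{2}}\,\hat\theta_k
  = \sum_{k=1}^{K} w_k\,\hat\theta_k,
\qquad
w_k = \frac{T_k/\hat\sigma_k^{2}}{\sum_{j=1}^{K} T_j/\hat\sigma_j^{2}},
\qquad
\tilde\Sigma = \left(\sum_{k=1}^{K}\frac{T_k}{T\hat\sigma_k^{2}}\right)^{-1} I .
```

Under this approximation the combination reduces to a scalar-weighted average: longer subseries and subseries with smaller residual variance receive more weight, and no matrix inversion is needed.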
The global estimator and its covariance matrix can then be obtained. The covariance matrix of each subseries is not known, so we approximate it by $\hat\sigma_k^2 I$.
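The $h$-step-ahead point forecast can be calculated as
$$\hat y_{T+h|T}=\tilde\beta_0+\tilde\beta_1(T+h)+
\begin{cases}
\sum_{i=1}^{p}\tilde\pi_i\,y_{T+1-i}, & h=1,\\
\sum_{i=1}^{h-1}\tilde\pi_i\,\hat y_{T+h-i|T}+\sum_{i=h}^{p}\tilde\pi_i\,y_{T+h-i}, & 1<h<p,\\
\sum_{i=1}^{p}\tilde\pi_i\,\hat y_{T+h-i|T}, & h\ge p.
\end{cases}$$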
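The three cases of the point forecast amount to one recursion: use an observation whenever the lagged time index is at most $T$, and an earlier forecast otherwise. A minimal sketch, assuming a hypothetical function `ar_point_forecasts` and toy coefficients:

```python
import numpy as np

def ar_point_forecasts(y, beta0, beta1, pi, h):
    """Recursive forecasts from y_t = beta0 + beta1 * t + sum_i pi_i * y_{t-i} + eps_t.

    Hypothetical helper: y holds y_1..y_T, pi holds pi_1..pi_p.
    """
    y = np.asarray(y, dtype=float)
    pi = np.asarray(pi, dtype=float)
    T, p = len(y), len(pi)
    if T < p:
        raise ValueError("need at least p observations")
    path = list(y)  # path[t - 1] holds y_t, later extended with forecasts
    for step in range(1, h + 1):
        # Lagged values y_{T+step-i}: observed if T+step-i <= T, else forecast.
        lags = np.array([path[T + step - 1 - i] for i in range(1, p + 1)])
        path.append(beta0 + beta1 * (T + step) + float(pi @ lags))
    return np.array(path[T:])

# Toy example with no intercept or trend and an AR(2) structure.
forecasts = ar_point_forecasts(y=[1.0, 2.0, 3.0], beta0=0.0, beta1=0.0,
                               pi=[0.5, 0.25], h=2)
```

Here the first forecast uses only observations, while the second mixes the first forecast with the last observation, matching the first two cases of the formula.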
The central $(1-\alpha)\times 100\%$ prediction interval of the $h$-step-ahead forecast can be defined as $\hat y_{T+h|T}\pm\Phi^{-1}(1-\alpha/2)\,\tilde\sigma_h$.
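The standard error $\tilde\sigma_h$ of the $h$-step-ahead forecast is formally expressed through its square as
$$\tilde\sigma_h^{2}=
\begin{cases}
\tilde\sigma^{2}, & h=1,\\
\tilde\sigma^{2}\left(1+\sum_{i=1}^{h-1}\tilde\theta_i^{2}\right), & h>1,
\end{cases}$$
where $\tilde\sigma^{2}=\operatorname{tr}(\tilde\Sigma)/p$.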
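The interval computation can be sketched as follows. One assumption is made loudly here: the slides do not define $\tilde\theta_i$, so this sketch takes them to be the MA($\infty$) weights implied by the combined AR coefficients $\tilde\pi_i$. All function names are hypothetical.

```python
from statistics import NormalDist

import numpy as np

def ma_weights(pi, n):
    """First n MA(infinity) weights of an AR(p) model with coefficients pi.

    ASSUMPTION: these play the role of theta_i in the variance formula.
    """
    pi = np.asarray(pi, dtype=float)
    psi = np.zeros(n + 1)
    psi[0] = 1.0
    for j in range(1, n + 1):
        psi[j] = sum(pi[i - 1] * psi[j - i] for i in range(1, min(j, len(pi)) + 1))
    return psi[1:]

def forecast_std(sigma2, pi, h):
    """Standard errors sigma_1..sigma_h from sigma_h^2 = sigma^2 (1 + sum_{i<h} theta_i^2)."""
    theta = ma_weights(pi, h - 1)
    cumulative = np.concatenate(([0.0], np.cumsum(theta ** 2)))
    return np.sqrt(sigma2 * (1.0 + cumulative))

def prediction_interval(point, std, alpha=0.05):
    """Central (1 - alpha) * 100% interval: point +/- Phi^{-1}(1 - alpha/2) * std."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    point = np.asarray(point, dtype=float)
    return point - z * std, point + z * std

# Toy example: sigma^2 = tr(Sigma_tilde) / p with a made-up 2 x 2 covariance.
sigma_tilde = 4.0 * np.eye(2)
sigma2 = np.trace(sigma_tilde) / sigma_tilde.shape[0]
std = forecast_std(sigma2, pi=[0.5], h=3)
lower, upper = prediction_interval(point=[10.0, 10.0, 10.0], std=std)
```

The intervals widen with the horizon because each extra step adds one more squared weight to the variance.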
Then, the point forecasts and prediction intervals can be obtained.
Number of subseries: 150
AR order: 2000
The experiments are carried out on a Spark-on-YARN cluster.
Benchmarking the performance of DARIMA against ARIMA model and its AR representation.
The rationality of setting the AR order to 2000.
DARIMA always outperforms the benchmark method, for both point forecasts and prediction intervals.
If long-term observations are considered, DARIMA is preferable, especially for interval forecasting.
The performance improvements achieved by DARIMA become more pronounced as the forecast horizon increases.
Benchmarking the performance of DARIMA against ARIMA for various forecast horizons.
Our approach has captured the decreasing yearly seasonal trend.
Both DARIMA and ARIMA have captured the hourly seasonality, while DARIMA results in forecasts closer to the true future values than ARIMA.
A distributed time series forecasting framework using the industry-standard MapReduce framework.
The local estimators trained on each subseries are combined using weighted least squares to minimize a global loss function.
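The map and reduce roles can be sketched on a single machine with Python's built-in `map` and `functools.reduce`. Several simplifications are assumed here and none of this is the Spark implementation: each mapper fits a plain AR($p$) model with an intercept by OLS as a stand-in for an ARIMA fit converted to linear form, the time trend is omitted, and $\hat\Sigma_k$ is approximated by $\hat\sigma_k^2 I$. All names are hypothetical.

```python
from functools import reduce

import numpy as np

def fit_local_ar(sub, p):
    """Mapper stand-in: OLS fit of an AR(p) with intercept on one subseries."""
    sub = np.asarray(sub, dtype=float)
    n = len(sub)
    lag_columns = [sub[p - i:n - i] for i in range(1, p + 1)]
    X = np.column_stack([np.ones(n - p)] + lag_columns)
    target = sub[p:]
    theta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ theta
    sigma2 = float(resid @ resid) / (len(target) - X.shape[1])
    return theta, sigma2, n

def mapper(sub, p=1):
    """Emit (weight, weight * theta_k) with weight = T_k / sigma_k^2."""
    theta, sigma2, T_k = fit_local_ar(sub, p)
    weight = T_k / sigma2
    return weight, weight * theta

def reducer(a, b):
    """Sum the weights and the weighted estimators."""
    return a[0] + b[0], a[1] + b[1]

# Simulated AR(1) series (made-up parameters), split into 3 contiguous subseries.
rng = np.random.default_rng(0)
series = np.zeros(3000)
for t in range(1, len(series)):
    series[t] = 1.0 + 0.6 * series[t - 1] + rng.normal()
subseries = np.array_split(series, 3)

total_weight, weighted_theta = reduce(reducer, map(mapper, subseries))
theta_global = weighted_theta / total_weight  # [intercept, AR(1) coefficient]
```

Because the reducer only adds pairs, the partial sums can be computed in any order and on any number of workers before the final division.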
Our framework
works better than competing methods for long-term forecasting.
achieves improved computational efficiency in optimizing the model parameters.
allows the data generating process (DGP) of each subseries to vary.
can be viewed as a model combination approach.
Thanks!
Spark implementation: @xqnwang/darima
Website: https://xqnwang.rbind.io
Twitter: @Xia0qianWang