class: title-slide

<br><br><br>

# .center[The uncertainty estimation of feature-based forecast combinations]

<br>

## .center[Xiaoqian Wang]

### .center[Beihang University]

### .center[41st International Symposium on Forecasting]

### .center[June 17, 2021]

---
# Joint work with

|<img src="figs/YanfeiKang.png" height=250 width=250>|<img src="figs/FotiosPetropoulos.png" height=250 width=250>|<img src="figs/FengLi.png" height=250 width=250>|
| :---: | :---: | :---: |
|Yanfei Kang|Fotios Petropoulos|Feng Li|
|.small[Beihang University]|.small[University of Bath]|.small[Central University of <br>Finance and Economics]|

---
# Outline

.large[
- Introduction
- Feature-based interval forecasting framework
- Weight determination
- Application to the M4 competition data
- Conclusions
]

---
class: inverse, center, middle

# Introduction

---
# Motivation

.pull-left-0[                   .small[Forecasting]     Time series     <img src="figs/rightarrow.png" height=30 width=150> ]

.pull-right-0[
- Point forecasts
- Probabilistic forecasts
]

.center[
<img src="figs/example.png">
]

---
# Motivation

.pull-left-0[                   .small[Forecasting]     Time series     <img src="figs/rightarrow.png" height=30 width=150> ]

.pull-right-0[
- Point forecasts
- Probabilistic forecasts
]

.pull-right[
.pull-left-4[         <img src="figs/downarrow.png" height=100 width=30> ]
.pull-right-4[ <br> .small[Forecasting method] ]]

.pull-right-0[     .bold[Individual models]
.pull-right-5[
.small[
- Naïve
- Snaïve
- ARIMA
- ETS...
]]]

---
# Motivation

.pull-left-0[                   .small[Forecasting]     Time series     <img src="figs/rightarrow.png" height=30 width=150> ]

.pull-right-0[
- Point forecasts
- Probabilistic forecasts
]

.pull-left[
.pull-left-3[ <br> .small[Description] ]
.pull-right-3[ <img src="figs/downarrow.png" height=100 width=30> ]]

.pull-right[
.pull-left-4[         <img src="figs/downarrow.png" height=100 width=30> ]
.pull-right-4[ <br> .small[Forecasting method] ]]

.content-box-white[
.pull-left-0[     .bold[Features]
.pull-right-5[
.small[
- Trend
- Linearity
- Nonlinearity
- Seasonality...
]]]
.pull-right-0[     .bold[Individual models]
.pull-right-5[
.small[
- Naïve
- Snaïve
- ARIMA
- ETS...
]]]
]

---
# Motivation

.pull-left-0[                   .small[Forecasting]     Time series     <img src="figs/rightarrow.png" height=30 width=150> ]

.pull-right-0[
- Point forecasts
- Probabilistic forecasts
]

.pull-left[
.pull-left-3[ <br> .small[Description] ]
.pull-right-3[ <img src="figs/downarrow.png" height=100 width=30> ]]

.pull-right[
.pull-left-4[         <img src="figs/downarrow.png" height=100 width=30> ]
.pull-right-4[ <br> .small[Forecasting method] ]]

.content-box-gray[
.pull-left-0[     .bold[Features]
.pull-right-5[
.small[
- Trend     <img src="figs/rightarrow.png" height=30 width=150>
- Linearity    .darkorange[Feature-based]
- Nonlinearity    .darkorange[forecasting]
- Seasonality...
]]]
.pull-right-0[     .bold[Individual models]
.pull-right-5[
.small[
- Naïve
- Snaïve
- ARIMA
- ETS...
]]]
]

---
# Introduction

- .bolder[Point forecasting] mainly forecasts the mean or the median of the distributions for future observations.
- .bolder[Probabilistic forecasting] can provide a comprehensive outlook of the expected future value and the future uncertainty.
- .bolder[Time series features] provide valuable information for decision makers.
- The superiority of .bolder[forecast combinations] over a single model.
- *No-free-lunch* theorem (Wolpert & Macready, 1997).
- *Horses for courses* (Petropoulos et al., 2014).
- Merely tackling model uncertainty is sufficient to help (Petropoulos et al., 2018).

---
# Challenges

- Previous literature mainly focuses on
  - point forecasting + forecast combinations.
- How do features .bolder[affect] the uncertainty estimation of forecasts?
- How to .bolder[guarantee the effectiveness] of the relationship in forecasting a newly given dataset?
- How to .bolder[translate] the relationship into an attempt to improve the forecasting performance?

<br>

.center[.bolder[
Feature-based probabilistic forecast combinations.
]]

---
class: inverse, center, middle

# Feature-based interval forecasting framework

---
# General framework

<img src="figs/flowchart.png">

---
# GRATIS .small[.black[(Kang et al., 2020)]]

.center[
<img src="figs/gratis1.png" height=265>
<img src="figs/gratis2.png" height=265>
]

---
# Dataset

.center[
<img src="figs/table1.png" height=200>
]

.pull-left[
.center[Reference (GRATIS)]
<img src="figs/reference.png" height=270>
]

.pull-right[
.center[Test (M4)]
<img src="figs/test.png" height=270>
]

---
# Other components

- `\(42\)` time series features (R package `tsfeatures`)
- Individual model pool

.center[
<img src="figs/method.png" >
]

- Interval forecast evaluation

`\begin{align} \mathrm{MSIS} = \frac{1}{h}\frac{\sum_{t=n+1}^{n+h}\left[(U_t-L_t)+\frac{2}{\alpha}(L_t-Y_t)\mathbb{1}\left\{ Y_t < L_t\right\} + \frac{2}{\alpha}(Y_t - U_t)\mathbb{1}\left\{Y_t>U_t\right\}\right]}{\frac{1}{n-m}\sum_{t=m+1}^{n} \vert Y_t-Y_{t-m} \vert} \end{align}`

---
# Linking features with performance

## Why GAM?
.pull-left-3[
- Interpretability
- Regularization
- Flexibility
]

.pull-right-3[
.center[
<img src="figs/gam.png">
]
]

<br>

## GAM model for each individual model

- `\(\log (\mathrm{MSIS}_{N}) \Longleftrightarrow F_{N \times P}\)`

---
# Partial effect analysis

| Feature | Description | Range |
| :---------------- | :------------------------ | :------------ |
| seasonal_strength | Strength of seasonality | `\([0, 1)\)` |
| nonlinearity | Nonlinearity coefficient | `\([0, \infty)\)` |
| x_acf1 | The first autocorrelation coefficient | `\((-1, 1)\)` |

<br>

.center[
<img src="figs/effect.png">
]

---
# Partial effect analysis

.small[
- The partial effect of each feature on the interval forecasting performance differs from that of the other features.
- Each feature affects the interval forecasting performance of the individual models in its own way.
- Some features favor up-weighting certain forecasting models over others.
]

.center[
<img src="figs/effect.png">
]

---
class: inverse, center, middle

# Weight determination

---
# Weight assignment

.bold[Adjusted softmax function]

`\begin{align} P_{ij} = \frac{\exp\left\{ \frac{\mu_i-\widehat{\log(\mathrm{MSIS}_{ij})}}{\sigma_i} \right\}}{\sum_{k=1}^{M}\exp\left\{\frac{\mu_i-\widehat{\log(\mathrm{MSIS}_{ik})}}{\sigma_i}\right\}}, \quad i=1,\ldots,N;\quad j=1,\ldots,M \end{align}`

- Negative values can be down-weighted to near-zero.
- `\(\log(\mathrm{MSIS})\boldsymbol{\uparrow} \quad \Longrightarrow \text{Accuracy}\boldsymbol{\downarrow} \quad \Longrightarrow P\boldsymbol{\downarrow}\)`

.bold[Optimal threshold ratio search]

For the `\(i\)`th time series,

- calculate the weight ratio `\(R_{k}=P_{ik} / \max_{j} \left(P_{ij}\right)\)` of each model `\(k\)`.
- select the individual models that satisfy `\(R_{k}>Tr\)` `\((0 < Tr \leq 1)\)`.
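???

The weighting and selection steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the `fuma` implementation. It assumes that `\(\mu_i\)` and `\(\sigma_i\)` are the mean and the (population) standard deviation of the fitted `\(\log(\mathrm{MSIS})\)` values across the `\(M\)` models for series `\(i\)`, which the slide does not state. The function names `adjusted_softmax` and `select_models` are hypothetical, and the selection uses `>=` rather than `>` so that `\(Tr = 1\)` still keeps the top-weighted model.

```python
import math
from statistics import mean, pstdev


def adjusted_softmax(fitted_log_msis):
    """Weights for one series from the fitted log(MSIS) of each model.

    Assumption: mu and sigma are the mean and population standard
    deviation of the fitted values across the model pool.
    """
    mu = mean(fitted_log_msis)
    sigma = pstdev(fitted_log_msis)
    if sigma == 0:
        # All models look equally good: fall back to equal weights.
        return [1.0 / len(fitted_log_msis)] * len(fitted_log_msis)
    # A lower fitted log(MSIS) gives a higher standardized score.
    scores = [(mu - v) / sigma for v in fitted_log_msis]
    top = max(scores)
    # Subtracting the maximum score keeps exp() numerically stable
    # and leaves the normalized weights unchanged.
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_models(weights, tr):
    """Indices of the models whose weight ratio reaches the threshold tr.

    Assumption: `>=` is used so that tr = 1 keeps the best model.
    """
    best = max(weights)
    return [k for k, p in enumerate(weights) if p / best >= tr]


weights = adjusted_softmax([1.0, 2.0, 3.0])
selected = select_models(weights, tr=0.2)
```

With these illustrative inputs, the model with the largest fitted `\(\log(\mathrm{MSIS})\)` falls below the threshold ratio and is dropped, so only the first two models enter the combination.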
---
# Combined forecasts

### Combined prediction intervals

`\begin{align} f_{wi}^l &= \frac{1}{\sum_{k=1}^{S}P_{ik}}\sum_{k=1}^{S}P_{ik}f_{ik}^l \\ f_{wi}^u &= \frac{1}{\sum_{k=1}^{S}P_{ik}}\sum_{k=1}^{S}P_{ik}f_{ik}^u \end{align}`

### Combined point forecasts

`\begin{align} f_{wi} = \frac{1}{2}(f_{wi}^l + f_{wi}^u) \end{align}`

???

We assume the intervals are symmetric around the point forecast.

---
# Optimal threshold ratio search

.center[
<img src="figs/threshold.png">
]

- Model combination `\(\longrightarrow\)` Model selection.
- `\(Tr = 0\)` indicates that .bolder[all] the methods from the pool are selected.
- `\(Tr = 1\)` indicates that only the method with the .bolder[minimal] fitted `\(\log(\mathrm{MSIS})\)` is selected.

???

A larger threshold value means that fewer methods are selected for model combining, while a smaller threshold value means that many more methods are used for model combining. This indicates that controlling the number of methods using the threshold searching algorithm is beneficial for improving the forecasting performance.

---
class: inverse, center, middle

# Application to the M4 competition data

---
# Selection rates of each model

.center[
<img src="figs/select.png">
]

---
# Performance for different confidence levels

.center[
<img src="figs/performance.png">
]

---
# Forecasting results

.center[
<img src="figs/results.png">
]

---
class: inverse, center, middle

# Conclusions

---
# Conclusions

- .bolder[Features] are taken into account to estimate the .bolder[uncertainty] of forecasts (.bolder[cross-learning]).
- We propose an optimal threshold ratio searching algorithm to select an appropriate .bolder[subset] of models per time series for model combination.
- Our approach .bolder[outperforms] a variety of individual models with distinctions for both point forecasts and prediction intervals.

---
<br><br><br>

# .center[Thanks for your attention!]
.content-box-gray[
- Paper: [Wang et al. (2021, JORS)](https://www.tandfonline.com/doi/full/10.1080/01605682.2021.1880297)
- R package: https://github.com/xqnwang/fuma
- Slides: https://xqnwang.rbind.io/talk/fuma/Slides.html
- Web: https://xqnwang.rbind.io
]