- Department of Computer Science and Artificial Intelligence, University of Granada, Spain.
- Department of Econometrics and Business Statistics, Monash University, Australia.

*International Journal of Forecasting*. To appear.

**Abstract**:

Exponential smoothing is one of the most popular forecasting methods. We present a method for bootstrap aggregation (bagging) of exponential smoothing methods. The bagging uses a Box-Cox transformation followed by an STL decomposition to separate the time series into trend, seasonal component, and remainder. The remainder is then bootstrapped using a moving block bootstrap, and a new series is assembled using this bootstrapped remainder. On the bootstrapped series, an ensemble of exponential smoothing models is estimated. The resulting point forecasts are combined by taking their mean. We evaluate this new method on the M3 data set, showing that it consistently outperforms the original exponential smoothing models. On the monthly data, we achieve better results than any of the original M3 participants. We also perform statistical testing to explore the significance of the results. Using the MASE, our method is significantly better than all the M3 participants on the monthly data.
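The resampling step can be sketched in plain Python as follows, assuming the Box-Cox parameter and the decomposition are already available. STL itself is not implemented here: the trend, seasonal, and remainder lists are taken as given, and the helper names (`box_cox`, `moving_block_bootstrap`, `bootstrap_series`) are hypothetical. This simple moving block bootstrap draws blocks until it has enough values and then truncates; the paper's exact scheme (block size, how the series is trimmed) may differ.

```python
import math
import random


def box_cox(y, lam):
    """Box-Cox transform of a strictly positive series."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]


def inv_box_cox(z, lam):
    """Inverse Box-Cox transform (requires lam * v + 1 > 0 when lam != 0)."""
    if lam == 0:
        return [math.exp(v) for v in z]
    return [(lam * v + 1) ** (1 / lam) for v in z]


def moving_block_bootstrap(remainder, block_size, rng):
    """Concatenate randomly chosen blocks of consecutive remainder values
    (blocks may overlap) until the original length is reached, then truncate."""
    n = len(remainder)
    if not 1 <= block_size <= n:
        raise ValueError("block_size must be between 1 and len(remainder)")
    out = []
    while len(out) < n:
        start = rng.randrange(n - block_size + 1)
        out.extend(remainder[start:start + block_size])
    return out[:n]


def bootstrap_series(trend, seasonal, remainder, lam, block_size, rng):
    """Rebuild one series: trend + seasonal + bootstrapped remainder on the
    Box-Cox scale, then back-transform to the original scale."""
    boot = moving_block_bootstrap(remainder, block_size, rng)
    z = [t + s + r for t, s, r in zip(trend, seasonal, boot)]
    return inv_box_cox(z, lam)


if __name__ == "__main__":
    rng = random.Random(1)
    y = [10, 12, 14, 11, 13, 15, 12, 14, 16, 13, 15, 17]
    z = box_cox(y, 0.0)  # lambda = 0 is a log transform
    # Crude stand-in for STL (an assumption): constant level, no seasonality.
    level = sum(z) / len(z)
    trend, seasonal = [level] * len(z), [0.0] * len(z)
    remainder = [v - level for v in z]
    series = [bootstrap_series(trend, seasonal, remainder, 0.0, 3, rng)
              for _ in range(5)]
    print(len(series), [round(v, 2) for v in series[0]])
```

In the method described, each bootstrapped series would then be passed to an exponential smoothing fitting routine (not shown), and the point forecasts from all fits averaged.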

Banff, Canada.

It is becoming increasingly common for organizations to collect very large amounts of data over time. Data visualization is essential for exploring and understanding structures and patterns, and for identifying unusual observations. However, the sheer quantity of data available challenges current time series visualization methods.

For example, Yahoo has banks of mail servers that are monitored over time. Many measurements on server performance are collected every hour for each of thousands of servers. We wish to identify servers that are behaving unusually.

Alternatively, we may have thousands of time series we wish to forecast, and we want to be able to identify the types of time series that are easy to forecast and those that are inherently challenging.

I will demonstrate a functional data approach to this problem using a vector of features on each time series, measuring characteristics of the series. For example, the features may include lag correlation, strength of seasonality, spectral entropy, etc. Then we use a principal component decomposition on the features, and plot the first few principal components. This enables us to explore a lower-dimensional space and discover interesting structure and unusual observations.
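Two of the features listed can be sketched in plain Python so the example runs without extra packages. The periodogram-based entropy here is a simplified definition chosen for illustration (normalized so that 0 means all power at one frequency and 1 means a flat spectrum); the feature definitions used in the talk may differ. Strength of seasonality is omitted because it requires a decomposition. Function names are hypothetical.

```python
import cmath
import math
import random


def acf1(x):
    """Lag-1 autocorrelation of a non-constant series."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    denom = sum(v * v for v in d)
    return sum(d[t] * d[t + 1] for t in range(n - 1)) / denom


def spectral_entropy(x):
    """Shannon entropy of the normalised raw periodogram, divided by its
    maximum so the result lies in [0, 1]: near 0 when the power sits at a
    single frequency, near 1 for a flat (white-noise-like) spectrum.
    Needs a non-constant series of length >= 4."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    power = []
    for k in range(1, n // 2 + 1):  # Fourier frequencies k / n, naive O(n^2) DFT
        s = sum(d[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(s) ** 2)
    total = sum(power)
    probs = [p / total for p in power if p > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(power))


def feature_vector(x):
    """Feature vector for one series (only two features in this sketch)."""
    return [acf1(x), spectral_entropy(x)]


if __name__ == "__main__":
    rng = random.Random(0)
    sine = [math.sin(2 * math.pi * t / 8) for t in range(64)]
    noise = [rng.gauss(0, 1) for _ in range(64)]
    print(feature_vector(sine), feature_vector(noise))
```

Stacking `feature_vector` over thousands of series gives the feature matrix on which the principal component decomposition is run.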

Sunnyvale, California

Friday 26 June 2015, 3:00-4:00 pm

Location: Yahoo Sunnyvale Campus and LIVE at labs.yahoo.com

Why is it that we can accurately forecast a solar eclipse in 1000 years’ time, but we have no idea whether Yahoo’s stock price will rise or fall tomorrow? Or why can we forecast electricity consumption next week with remarkable precision, but we cannot forecast exchange rate fluctuations in the next hour?

In this talk, I will discuss the conditions we need for predictability, how to measure the uncertainty of predictions, and the consequences of thinking we can predict something more accurately than we can.

I will draw on my experiences in forecasting Australia’s health budget for the next few years, in developing forecasting models for peak electricity demand in 20 years’ time, and in identifying unpredictable activity on Yahoo’s mail servers.

Mountain View, California.

Many applications require a large number of time series to be forecast completely automatically. For example, manufacturing companies often require weekly forecasts of demand for thousands of products at dozens of locations in order to plan distribution and maintain suitable inventory stocks. In these circumstances, it is not feasible for time series models to be developed for each series by an experienced analyst. Instead, an automatic forecasting algorithm is required.
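As a toy illustration of what "completely automatic" means, here is simple exponential smoothing with the smoothing parameter chosen by grid search on the in-sample one-step sum of squared errors, so no analyst input is needed. This is not one of the algorithms the talk describes: practical automatic algorithms typically choose among many candidate model forms using an information criterion. The function names and the grid are assumptions.

```python
def ses_sse(y, alpha):
    """One-step-ahead SSE of simple exponential smoothing with the level
    initialised at the first observation; also returns the final level."""
    level = y[0]
    sse = 0.0
    for obs in y[1:]:
        err = obs - level
        sse += err * err
        level += alpha * err
    return sse, level


def auto_ses(y, h, grid=None):
    """Pick alpha by minimising in-sample SSE, then return flat h-step forecasts."""
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]
    best_alpha = min(grid, key=lambda a: ses_sse(y, a)[0])
    _, level = ses_sse(y, best_alpha)
    return best_alpha, [level] * h


if __name__ == "__main__":
    demand = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118]  # made-up data
    alpha, fc = auto_ses(demand, h=3)
    print(alpha, [round(f, 1) for f in fc])
```

Looping `auto_ses` over every product-location series is the kind of hands-off workflow the paragraph motivates, although a single fixed model form is far too restrictive in practice.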

In addition to providing automatic forecasts when required, these algorithms also provide high-quality benchmarks that can be used when developing more specific and specialized forecasting models.

I will describe some algorithms for automatically forecasting univariate time series that have been developed over the last 20 years. The role of forecasting competitions in comparing the forecast accuracy of these algorithms will also be discussed.
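Forecasting competitions compare algorithms with scale-free accuracy measures. One such measure is the mean absolute scaled error (MASE), which divides the out-of-sample mean absolute error by the in-sample mean absolute error of a naive (or seasonal naive, period `m`) forecast. A minimal sketch; the function name and argument order are illustrative.

```python
def mase(train, actual, forecast, m=1):
    """Mean absolute scaled error: out-of-sample MAE divided by the in-sample
    MAE of the seasonal naive forecast with period m (m=1 gives naive)."""
    n = len(train)
    if n <= m:
        raise ValueError("need more than m training observations")
    scale = sum(abs(train[t] - train[t - m]) for t in range(m, n)) / (n - m)
    if scale == 0:
        raise ValueError("in-sample naive errors are all zero; MASE undefined")
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


if __name__ == "__main__":
    print(mase([1, 2, 3, 4, 5], actual=[6, 7], forecast=[5, 5]))
```

A value below 1 means the method beat the in-sample naive benchmark on average, which makes results comparable across series with very different scales.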

Riverside, California

I will describe and demonstrate a new open-source R package that implements the Monash Electricity Forecasting Model, a semi-parametric probabilistic approach to forecasting long-term electricity demand. The underlying model proposed in Hyndman and Fan (2010) is now widely used in practice, particularly in Australia. The model has undergone many improvements and developments since it was first proposed, and these have been incorporated in this R implementation.

The package allows for ensemble forecasting of demand based on simulations of future sample paths of temperatures and other predictor variables. It requires the following data as inputs: half-hourly/hourly electricity demands; half-hourly/hourly temperatures at one or two locations; seasonal (e.g., quarterly) demographic and economic data; and public holiday data.
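The idea of ensemble forecasting from simulated temperature paths can be sketched as follows: resample future temperature paths from history, push each path through a demand model, and record each path's peak. Everything here is an assumption for illustration. The demand model is a made-up threshold relationship rather than the semi-parametric model, the plain block bootstrap is only one possible way to simulate temperatures, and none of these names belong to the R package.

```python
import random


def simulate_temperature_path(history, length, block_size, rng):
    """Future temperature path built from random blocks of consecutive
    historical observations (a plain block bootstrap)."""
    path = []
    while len(path) < length:
        start = rng.randrange(len(history) - block_size + 1)
        path.extend(history[start:start + block_size])
    return path[:length]


def demand_model(temp, base=1000.0, cooling=40.0, threshold=25.0):
    """Hypothetical demand response: base load plus extra cooling load for
    every degree above a threshold. All parameter values are made up."""
    return base + cooling * max(temp - threshold, 0.0)


def simulate_peaks(history, length, block_size, n_sims, seed=0):
    """Peak demand of each simulated future path."""
    rng = random.Random(seed)
    peaks = []
    for _ in range(n_sims):
        temps = simulate_temperature_path(history, length, block_size, rng)
        peaks.append(max(demand_model(t) for t in temps))
    return peaks


if __name__ == "__main__":
    rng = random.Random(42)
    history = [22 + 6 * rng.random() for _ in range(24 * 30)]  # synthetic hourly temps
    peaks = simulate_peaks(history, 24 * 7, 24, 200)
    print(min(peaks), max(peaks))
```

The list of simulated peaks is a sample from the forecast distribution of peak demand. A realistic model would also add demographic and economic drivers, calendar effects, and residual noise.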

Peak electricity demand forecasting is important in medium- and long-term planning of electricity supply. Extreme demand often leads to supply failure with consequential business and social disruption. Forecasting extreme demand events is therefore an important problem in energy management, and this package provides a useful tool for energy companies and regulators in future planning.

Rosemead, California

Electricity demand forecasting plays an important role in short-term load allocation and long-term planning for future generation facilities and transmission augmentation. It is a challenging problem because of several sources of uncertainty, including underlying population growth, changing technology, economic conditions, prevailing weather conditions (and the timing of those conditions), as well as the general randomness inherent in individual usage. It is also subject to some known calendar effects due to the time of day, day of week, time of year, and public holidays. But the most challenging part is that we often want to forecast the peak demand rather than the average demand. Consequently, it is necessary to adopt a probabilistic view of potential peak demand levels in order to evaluate and hedge the financial risk arising from demand variability and forecasting uncertainty.
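Taking a probabilistic view usually means summarising a sample of simulated peak demands by its distribution. In Australian practice these summaries are often quoted as probability of exceedance (POE) levels; for example, the 10% POE demand is the level exceeded with probability 0.1. Here is a minimal sketch using empirical quantiles with linear interpolation; the function names and the sample data are hypothetical.

```python
def exceedance_probability(peaks, level):
    """Fraction of simulated peaks strictly above a demand level."""
    return sum(p > level for p in peaks) / len(peaks)


def poe_level(peaks, poe):
    """Demand level exceeded with probability poe, i.e. the empirical
    (1 - poe) quantile, interpolating linearly between order statistics."""
    if not 0 <= poe <= 1:
        raise ValueError("poe must lie in [0, 1]")
    xs = sorted(peaks)
    pos = (1 - poe) * (len(xs) - 1)
    lo = int(pos)  # floor, since pos >= 0
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


if __name__ == "__main__":
    peaks = [1180, 1215, 1240, 1190, 1302, 1256, 1228, 1199, 1275, 1210]  # made-up MW
    print(poe_level(peaks, 0.1), poe_level(peaks, 0.5),
          exceedance_probability(peaks, 1250))
```

With only a handful of simulations the upper POE levels are very noisy, which is one reason simulation-based approaches generate large numbers of sample paths.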

I will describe some Australian experiences in addressing these problems via the Monash Electricity Forecasting Model, a semiparametric additive model designed to take all the available information into account, and to provide forecast distributions from a few hours ahead to a few decades ahead. The approach is being used by energy market operators and supply companies to forecast the probability distribution of electricity demand in various regions of Australia.

I will briefly demonstrate an open-source R package to implement the model. The package allows for ensemble forecasting of demand based on simulations of future sample paths of temperatures and other predictor variables.

Finally, I will discuss some recent developments in evaluating peak demand forecasts, and research competitions that have generated innovative new methods for tackling energy forecasting problems.

*Journal of Allergy and Clinical Immunology*.

**Abstract**

We propose new generic methods for decomposing seasonal data: STR (a Seasonal-Trend decomposition procedure based on Regression) and Robust STR. In some ways, STR is similar to Ridge Regression and Robust STR can be related to LASSO. Our new methods are much more general than any alternative time series decomposition method. They allow for multiple seasonal and cyclic components, and multiple linear regressors with constant, flexible, seasonal and cyclic influence. Seasonal patterns (for both seasonal components and seasonal regressors) can be fractional and flexible over time; moreover, they can be either strictly periodic or have a more complex topology. We also provide confidence intervals for the estimated components, and discuss how STR can be used for forecasting.
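To show what "similar to Ridge Regression" can mean in the simplest possible case, here is a trend-only penalised least squares fit: minimise the squared residuals plus `lam` times the squared second differences of the trend. This is essentially the Hodrick-Prescott filter, not STR. STR also estimates seasonal components and regressors and provides confidence intervals, none of which appear here. The function names are hypothetical.

```python
def second_difference_matrix(n):
    """(n - 2) x n matrix whose rows compute second differences."""
    d = []
    for i in range(n - 2):
        row = [0.0] * n
        row[i], row[i + 1], row[i + 2] = 1.0, -2.0, 1.0
        d.append(row)
    return d


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def penalised_trend(y, lam):
    """Trend minimising sum((y - tau)^2) + lam * sum((second diff of tau)^2),
    i.e. the solution of the ridge-type system (I + lam * D^T D) tau = y."""
    n = len(y)
    d = second_difference_matrix(n)
    a = [[(1.0 if i == j else 0.0)
          + lam * sum(d[k][i] * d[k][j] for k in range(n - 2))
          for j in range(n)] for i in range(n)]
    return solve(a, [float(v) for v in y])


if __name__ == "__main__":
    y = [3.1, 2.8, 3.9, 4.2, 4.0, 5.3, 5.1, 6.0, 6.4, 6.2]  # made-up data
    print([round(v, 2) for v in penalised_trend(y, 10.0)])
```

As with ridge regression, a larger penalty shrinks the estimate towards a simpler form (here, a straight line). The penalty weight controls the trade-off between smoothness and fit.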

- CEMSE Division, King Abdullah University of Science and Technology
- Monash Business School, Monash University, Clayton, Victoria, Australia.

**Abstract**

Much of the forecasting literature has so far focused on forecasting the conditional mean of future observations. However, there is an increasing need for generating the entire conditional distribution of future observations in order to effectively quantify the uncertainty in time series data. We present two different methods for probabilistic time series forecasting that allow the inclusion of a possibly large set of exogenous variables. One method is based on forecasting both the conditional mean and variance of the future distribution using a traditional regression approach. The other directly computes multiple quantiles of the future distribution using quantile regression. We propose an implementation for the two methods based on boosted additive models, which enjoy many useful properties including accuracy, flexibility, interpretability and automatic variable selection. We conduct extensive experiments using electricity smart meter data, on both aggregated and disaggregated scales, to compare the two forecasting methods for the challenging problem of forecasting the distribution of future electricity consumption. The empirical results demonstrate that the mean and variance forecasting provides better forecasts for aggregated demand, while the flexibility of the quantile regression approach is more suitable for disaggregated demand. These results are particularly useful since more energy data will become available at the disaggregated level in the future.
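The two routes to a forecast distribution can be compared on a common footing by turning the mean-and-variance forecast into quantiles and scoring all quantile forecasts with the pinball (quantile) loss, which is also the loss that quantile regression minimises. The sketch below assumes a Gaussian predictive distribution for the first method; that assumption, the function names, and the numbers are illustrative only.

```python
from statistics import NormalDist


def quantiles_from_mean_var(mean, variance, probs):
    """Quantile forecasts implied by a mean and variance forecast, under an
    assumed Gaussian predictive distribution."""
    dist = NormalDist(mu=mean, sigma=variance ** 0.5)
    return [dist.inv_cdf(p) for p in probs]


def pinball_loss(y, q_pred, p):
    """Pinball (quantile) loss of a quantile-p forecast q_pred for outcome y."""
    diff = y - q_pred
    return max(p * diff, (p - 1) * diff)


def mean_pinball(y, q_preds, probs):
    """Average pinball loss over several quantile levels for one outcome."""
    return sum(pinball_loss(y, q, p) for q, p in zip(q_preds, probs)) / len(probs)


if __name__ == "__main__":
    probs = [0.1, 0.5, 0.9]
    qs = quantiles_from_mean_var(mean=2.0, variance=0.25, probs=probs)  # made-up kWh forecast
    print([round(q, 3) for q in qs], round(mean_pinball(2.6, qs, probs), 4))
```

A symmetric Gaussian shape is a strong assumption for individual households, whose consumption is often skewed. That is one plausible reason why directly estimated quantiles can suit disaggregated data better.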

- Monash Business School, Monash University, Clayton, Victoria, Australia.
- Yahoo Labs, Sunnyvale, California, USA

**Abstract**

It is becoming increasingly common for organizations to collect very large amounts of data over time, and to need to detect unusual or anomalous time series. For example, Yahoo has banks of mail servers that are monitored over time. Many measurements on server performance are collected every hour for each of thousands of servers. We wish to identify servers that are behaving unusually.

We compute a vector of features on each time series, measuring characteristics of the series. The features may include lag correlation, strength of seasonality, spectral entropy, etc. Then we use a principal component decomposition on the features, and use various bivariate outlier detection methods applied to the first two principal components. This enables the most unusual series, based on their feature vectors, to be identified. The bivariate outlier detection methods used are based on highest density regions and α-hulls.
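Here is a rough plain-Python sketch of this pipeline. It standardises the feature matrix, takes the first two principal components (computed by power iteration, a choice made only to avoid dependencies), and ranks series by a leave-one-out Gaussian kernel density estimate in that plane. The lowest-density points are the ones falling outside the highest density regions. The bandwidth `h` is an arbitrary assumption, α-hulls are not implemented, at least two features are assumed, and the function names are hypothetical.

```python
import math
import random


def standardise(rows):
    """Scale each feature (column) to mean 0 and standard deviation 1."""
    cols = []
    for c in zip(*rows):
        m = sum(c) / len(c)
        sd = math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1)) or 1.0
        cols.append([(v - m) / sd for v in c])
    return [list(r) for r in zip(*cols)]


def top_components(rows, k=2, iters=500, seed=0):
    """First k principal directions of centred data via power iteration
    on the covariance matrix, deflating after each component."""
    n, p = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
           for i in range(p)]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.uniform(-1, 1) for _ in range(p)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        eig = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
        comps.append(v)
        cov = [[cov[i][j] - eig * v[i] * v[j] for j in range(p)] for i in range(p)]
    return comps


def loo_density(points, h):
    """Leave-one-out Gaussian kernel density (unnormalised) at each point."""
    return [sum(math.exp(-((x - u) ** 2 + (y - w) ** 2) / (2 * h * h))
                for j, (u, w) in enumerate(points) if j != i)
            for i, (x, y) in enumerate(points)]


def most_unusual(features, k=1, h=0.5):
    """Indices of the k series with the lowest density in PC1-PC2 space."""
    z = standardise(features)
    comps = top_components(z, 2)
    pcs = [[sum(a * b for a, b in zip(r, c)) for c in comps] for r in z]
    dens = loo_density(pcs, h)
    return sorted(range(len(features)), key=lambda i: dens[i])[:k]


if __name__ == "__main__":
    feats = [[0.1 * i, 0.2 * (i % 3), 0.05 * i] for i in range(20)] + [[4.0, -3.0, 5.0]]
    print(most_unusual(feats, k=3))
```

The kernel's normalising constant is dropped because only the ranking of densities matters for flagging the most unusual series.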
