**Abstract**

It is becoming increasingly common for organizations to collect very large amounts of data over time. Data visualization is essential for exploring and understanding structures and patterns, and for identifying unusual observations. However, the sheer quantity of data available challenges current time series visualization methods.

For example, Yahoo has banks of mail servers that are monitored over time. Many measurements on server performance are collected every hour for each of thousands of servers. We wish to identify servers that are behaving unusually.

Alternatively, we may have thousands of time series we wish to forecast, and we want to be able to identify the types of time series that are easy to forecast and those that are inherently challenging.

I will demonstrate an approach to this problem using a vector of features on each time series, measuring characteristics of the series. For example, the features may include lag correlation, strength of seasonality, spectral entropy, etc. Then we use a principal component decomposition on the features, and plot the first few principal components. This enables us to explore a lower dimensional space and discover interesting structure and unusual observations.
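As a minimal sketch of this pipeline (Python/NumPy; the three features here are simplified stand-ins for the richer feature set named above, and the toy series are simulated):

```python
import numpy as np

def ts_features(x):
    """Illustrative feature vector for one series:
    lag-1 autocorrelation, variance, and spectral entropy."""
    acf1 = np.corrcoef(x[:-1], x[1:])[0, 1]
    z = (x - x.mean()) / x.std()
    # Spectral entropy: Shannon entropy of the normalized periodogram
    p = np.abs(np.fft.rfft(z))**2
    p = p / p.sum()
    p = p[p > 0]
    entropy = -(p * np.log(p)).sum() / np.log(len(p))
    return np.array([acf1, np.var(x), entropy])

rng = np.random.default_rng(1)
# 100 toy series: half white noise, half random walks
series = [rng.normal(size=200) for _ in range(50)]
series += [np.cumsum(rng.normal(size=200)) for _ in range(50)]

# One row of features per series, standardized column-wise
F = np.array([ts_features(s) for s in series])
F = (F - F.mean(axis=0)) / F.std(axis=0)

# Principal components via SVD of the standardized feature matrix
U, d, Vt = np.linalg.svd(F, full_matrices=False)
pcs = U * d               # PC scores: one point per series
pcs2d = pcs[:, :2]        # first two components, ready to plot
```

Plotting `pcs2d` gives one point per series in a two-dimensional space, where outlying points flag unusual series.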

**Slides**

Electricity demand forecasting plays an important role in long-term planning for future generation facilities and transmission augmentation. It is a challenging problem because of the different uncertainties including underlying population growth, changing technology, economic conditions, prevailing weather conditions (and the timing of those conditions), as well as the general randomness inherent in individual usage. It is also subject to some known calendar effects due to the time of day, day of week, time of year, and public holidays. But the most challenging part is that we often want to forecast the peak demand rather than the average demand. Consequently, it is necessary to adopt a probabilistic view of potential peak demand levels in order to evaluate and hedge the financial risk accrued by demand variability and forecasting uncertainty.

I will describe my experiences in addressing these problems via the Monash Electricity Forecasting Model, a semiparametric additive model designed to take all the available information into account, and to provide forecast distributions up to a few decades ahead. The approach is being used by energy market operators and supply companies to forecast the probability distribution of electricity demand in various regions of Australia.

I will also briefly demonstrate an open-source R package to implement the model. The package allows for ensemble forecasting of demand based on simulations of future sample paths of temperatures and other predictor variables.

One of the most widely used standard procedures for model evaluation in classification and regression is K-fold cross-validation (CV). However, when it comes to time series forecasting, because of the inherent serial correlation and potential non-stationarity of the data, its application is not straightforward and is often omitted by practitioners in favor of an out-of-sample (OOS) evaluation. In this paper, we show that the particular setup in which time series forecasting is usually performed using Machine Learning methods renders the use of standard K-fold CV possible. We present theoretical insights supporting our arguments. Furthermore, we present a simulation study where we show empirically that K-fold CV performs favorably compared to both OOS evaluation and other time-series-specific techniques such as non-dependent cross-validation.
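A minimal sketch of the setup in question, assuming a purely autoregressive formulation (Python/NumPy; the AR(2) simulation, lag order, and linear model are illustrative choices, not the paper's experiments): the series is embedded into lagged-predictor rows, after which standard K-fold CV applies as in any regression problem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a stationary AR(2) series with unit innovation variance
n = 500
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.5 * y[t-1] - 0.3 * y[t-2] + rng.normal()

def embed(y, p):
    """Recast forecasting as regression: each row holds the last p lags."""
    X = np.array([y[t-p:t] for t in range(p, len(y))])
    return X, y[p:]

X, target = embed(y, p=2)

# Standard K-fold CV on the embedded rows: the lagged regression
# absorbs the serial dependence, so shuffling rows is permissible
K = 5
idx = rng.permutation(len(target))
folds = np.array_split(idx, K)
errors = []
for k in range(K):
    test = folds[k]
    train = np.concatenate([folds[j] for j in range(K) if j != k])
    beta, *_ = np.linalg.lstsq(X[train], target[train], rcond=None)
    errors.append(np.mean((target[test] - X[test] @ beta) ** 2))
cv_mse = np.mean(errors)   # typically close to 1, the innovation variance
```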


**Time**: Monday, 12 January 2015, 11:00

**Venue**: 2F Lounge, Institute of Statistical Science, Academia Sinica

**Note**: Tea reception at 10:40 am in the 2F lounge of the Institute of Statistical Science

Time series can often be naturally disaggregated in a hierarchical or grouped structure. For example, a manufacturing company can disaggregate total demand for their products by country of sale, retail outlet, product type, package size, and so on. As a result, there can be millions of individual time series to forecast at the most disaggregated level, plus additional series to forecast at higher levels of aggregation.

The first problem with handling such large numbers of time series is how to produce useful graphics to uncover structures and relationships between series. I will demonstrate some data visualization tools that help in exploring big time series data.

The second problem is that the disaggregated forecasts need to add up to the forecasts of the aggregated data. This is known as reconciliation. I will show that the optimal reconciliation method involves fitting an ill-conditioned linear regression model where the design matrix has one column for each of the series at the most disaggregated level. For problems involving huge numbers of series, the model is impossible to estimate using standard regression algorithms. I will discuss some fast algorithms that make this model practicable in business contexts.
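As a toy illustration of reconciliation (Python/NumPy; the two-series hierarchy and the plain least-squares projection are deliberate simplifications of the optimal method described above):

```python
import numpy as np

# Hierarchy: Total = A + B. The summing matrix S maps the two
# bottom-level series to the full set of three series.
S = np.array([[1, 1],    # Total
              [1, 0],    # A
              [0, 1]])   # B

# Incoherent base forecasts (Total, A, B): 100 != 40 + 55
yhat = np.array([100.0, 40.0, 55.0])

# Least-squares reconciliation: project the base forecasts onto the
# coherent subspace spanned by the columns of S
beta = np.linalg.solve(S.T @ S, S.T @ yhat)
ytilde = S @ beta   # reconciled forecasts: Total now equals A + B
```

Here the design matrix is tiny and well behaved; the talk concerns the case where S has one column per disaggregated series, of which there may be millions.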

**Abstract:**

In this article we explore some bivariate smoothing methods with partial differential regularizations designed to handle smooth bivariate surfaces with occasional ridges. We apply our technique to smoothing mortality rates.

Mortality rates are typically smoothed over two dimensions: age and time. Occasional ridges occur due to period effects (e.g., deaths due to wars and epidemics) and cohort effects (e.g., the effects of wars and epidemics on the survivors).

We propose three new practical methods of smoothing mortality rates over age and time. The first method uses bivariate thin plate splines. The second uses a similar procedure but with lasso-type regularization. The third method also uses bivariate lasso-type regularization, but allows for both period and cohort effects. In these smoothing methods, the logarithms of mortality rates are modelled as the sum of four components: a smooth bivariate function of age and time, smooth one-dimensional cohort effects, smooth one-dimensional period effects and random errors. Cross validation is used to compare these new smoothing methods with existing approaches.
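For concreteness, the four-component decomposition described above can be written as follows (the notation here is ours, chosen for illustration):

```latex
\log m_{x,t} = f(x,t) + g_c(t - x) + g_p(t) + \varepsilon_{x,t},
```

where $m_{x,t}$ is the mortality rate at age $x$ in year $t$, $f$ is the smooth bivariate surface, $g_c$ captures cohort effects indexed by year of birth $t-x$, $g_p$ captures period effects, and $\varepsilon_{x,t}$ are random errors.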

Although our methods are designed to smooth logarithms of mortality rates, they are generic enough to be applied to any bivariate data with occasional ridges.

**Keywords:** Bivariate data, nonparametric smoothing, mortality rates, graduation, cohort effects, period effects.


The package requires the following data as input: half-hourly or hourly electricity demands; half-hourly or hourly temperatures at one or two locations; seasonal demographic and economic data; and public holiday data. The formats of the required data are described in the help files.

Some documentation of the underlying model is provided at http://robjhyndman.com/working-papers/mefm/.

The package itself is hosted on github and can be installed as follows:

```r
install.packages("devtools")
library(devtools)
install_github("robjhyndman/MEFM-package")
```

*Foresight*, Fall 2014, pp. 42-48.

This is an introduction to our approach to forecast reconciliation without using any matrices. The original research is available here:

The software is available in the hts package for R with some notes on usage in the vignette. There is also a gentle introduction in my forecasting textbook.

**Venue**: The University Club, University of Western Australia, Nedlands WA.

**Requirements:** a laptop with R installed, along with the fpp package and its dependencies. We will also use the hts and vars packages on the third day.

Hyndman and Athanasopoulos (2014)

*Forecasting: principles and practice*,

OTexts: Melbourne, Australia.

- Introduction to forecasting [Slides, R code, Lab solutions]
- Forecasting tools [Slides, R code, Lab solutions]
- Exponential smoothing I [Slides, R code, Lab solutions]
- Exponential smoothing II [Slides, R code, Lab solutions]
- Time series decomposition and cross-validation [Slides, R code, Lab solutions]
- Transformations, stationarity and differencing [Slides, R code, Lab solutions]
- Non-seasonal ARIMA models [Slides, R code, Lab solutions]
- Seasonal ARIMA models [Slides, R code, Lab solutions]
- State space models [Slides, R code, Lab solutions]
- Dynamic regression [Slides, R code, Lab solutions]
- Hierarchical forecasting [Slides, R code, Lab solutions]
- Advanced methods [Slides, R code, Lab solutions]