Nominations are invited from any reader of the IJF. Each person may nominate up to three papers, but you cannot nominate a paper that you have coauthored yourself. Papers coauthored by Tao Hong or one of the award committee are not eligible for the prize. All nominations are to be accompanied by a short statement (up to 200 words) from the nominator, explaining why the paper deserves an award.

You can see the relevant papers published in the period 2013-2014 on Google Scholar. Of course, a good paper does not always get noticed, so don’t let the citation count sway you too much in nominating what you consider to be the best IJF paper from this period.

Nominations should be sent to me by email by **8 February 2017**.

Dear xxxxx

I am writing concerning manuscript #INTFOR_16xxxxx entitled “xxxxxxxxxxxxxxxx” which you submitted to the International Journal of Forecasting.

Thank you for this submission, but as it consists entirely of the IJF author guidelines, it is not suitable for publication in the IJF. We publish original research, not author guidelines. Perhaps the Journal for Guidelines would be an appropriate outlet.

In future, when you are asked to check the pdf of your paper, you might find it useful to actually do so, rather than just claim to have done so. That way, you might avoid this kind of mistake.

In the light of the comments above, I have chosen not to publish your manuscript in the International Journal of Forecasting. I know this will be disappointing to you, but we receive a large number of submissions and can only publish a small percentage of them.

Thank you for considering the International Journal of Forecasting for the publication of your research. I hope the outcome of this specific submission will not discourage you from the submission of future manuscripts.

Sincerely,

Prof. Rob J Hyndman

Editor-in-Chief, International Journal of Forecasting

The first is regular k-fold cross-validation for autoregressive models. Although cross-validation is sometimes invalid for time series models, it does work for autoregressions, a class that includes many machine learning approaches to time series. The theoretical background is provided in Bergmeir, Hyndman and Koo (2015). So cross-validation can be applied to any model in which the predictors are lagged values of the response variable.

This is implemented for NNAR models (neural network autoregressions) in R as follows:

```r
modelcv <- CVar(lynx, k=5, lambda=0.15)
print(modelcv)
```

The output is a summary of the accuracy across folds:

```
5-fold cross-validation
                  Mean          SD
ME        -32.88142801  98.0725227
RMSE      931.90966858 352.8705338
MAE       608.99488205 272.1244879
MPE       -17.84710226  15.2700638
MAPE       53.99760978  12.7264054
ACF1        0.04842174   0.1480883
Theil's U   0.82984737   0.1487229
```

The `CVar` function is rather limited at this stage, and will only handle cross-validation for models computed using `nnetar`. If there is enough interest, I might add more functionality at a later stage.

In this procedure, there is a series of test sets, each consisting of a single observation. The corresponding training set consists only of observations that occurred *prior* to the observation that forms the test set. Thus, no future observations can be used in constructing the forecast. The following diagram illustrates the series of training and test sets, where the blue observations form the training sets, and the red observations form the test sets.

The forecast accuracy is computed by averaging over the test sets. This procedure is sometimes known as “evaluation on a rolling forecasting origin” because the “origin” at which the forecast is based rolls forward in time.
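As a concrete sketch, the procedure can be coded directly in R. The `rolling_errors` helper below is my own illustration, not part of the forecast package, and the minimum training size `k = 20` is an arbitrary choice:

```r
library(forecast)

# Rolling-origin evaluation: each training set contains observations
# 1..t, and the test set is the single observation at time t+1.
# The helper name and the choice of k are illustrative assumptions.
rolling_errors <- function(y, k = 20) {
  n <- length(y)
  e <- rep(NA, n)
  for (t in k:(n - 1)) {
    train <- subset(y, end = t)   # forecast::subset.ts keeps obs 1..t
    fc <- rwf(train, h = 1)       # one-step naive forecast from origin t
    e[t + 1] <- y[t + 1] - fc$mean[1]
  }
  e
}

e <- rolling_errors(lynx)
sqrt(mean(e^2, na.rm = TRUE))     # accuracy averaged over the test sets
```

The forecast origin rolls forward with `t`, so every forecast uses only observations prior to its test set.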

With time series forecasting, one-step forecasts may not be as relevant as multi-step forecasts. In this case, the cross-validation procedure based on a rolling forecasting origin can be modified to allow multi-step errors to be used. Suppose that we are interested in models that produce good 4-step-ahead forecasts. Then the corresponding diagram is shown below.

Time series cross-validation is implemented with the `tsCV` function. In the following example, we compare the residual RMSE with the RMSE obtained via time series cross-validation.

```r
library(fpp)
e <- tsCV(dj, rwf, drift=TRUE, h=1)
sqrt(mean(e^2, na.rm=TRUE))
## [1] 22.68249
sqrt(mean(residuals(rwf(dj, drift=TRUE))^2, na.rm=TRUE))
## [1] 22.49681
```

Here I apply a random walk with drift to the Dow-Jones index time series `dj`. The first calculation implements a one-step time series cross-validation, where the drift parameter is re-estimated at every forecast origin. The second calculation estimates the drift parameter once for the whole data set, and then computes the RMSE from the one-step forecasts. As expected, the RMSE from the residuals is smaller, as the corresponding “forecasts” are based on a model fitted to the entire data set, rather than being true forecasts.
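The multi-step variant discussed earlier uses the same function with a larger `h`. As a sketch, assuming `tsCV` returns a matrix of errors with one column per forecast horizon when `h > 1`:

```r
library(fpp)

# 4-step-ahead cross-validated errors for the random walk with drift.
# Assumes tsCV returns a matrix with one column per horizon when h > 1.
e4 <- tsCV(dj, rwf, drift=TRUE, h=4)
sqrt(mean(e4[, 4]^2, na.rm=TRUE))   # RMSE of the 4-step-ahead forecasts
```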

The `tsCV` function is very general, and will work for any forecasting function that returns an object of class `forecast`. You don’t even have to specify the minimum sample size for model fitting, as it will silently fit models beginning with a single observation, and return a missing value whenever the model cannot be estimated.
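For example, a user-defined wrapper works just as well as a built-in forecast function. The `fets` helper below is a hypothetical example of my own, not part of the package:

```r
library(fpp)

# Any function returning a "forecast" object can be passed to tsCV.
# fets() is a hypothetical wrapper combining ets() with forecast().
fets <- function(y, h) {
  forecast(ets(y), h = h)
}
e <- tsCV(dj, fets, h = 1)
sqrt(mean(e^2, na.rm = TRUE))   # cross-validated one-step RMSE for ETS
```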

These functions are only in the GitHub version of the forecast package for now, but will migrate to CRAN the next time I release a new stable version.

An invited session consists of 3 or 4 talks around a specific forecasting theme. You are allowed to be one of the speakers in a session you organize (although it is not necessary). So if you know what you plan to speak about, all you need to do is find 2 or 3 other speakers who will speak on something related, and invite them to join you. Each invited talk will be about 20 minutes long.

Invited sessions will be marked as such on the program and carry a slightly higher status than a contributed session. Unfortunately, we can’t offer any financial support for these invited speakers or session organizers.

If you are interested in organizing an invited session, please contact us with your topic. The deadline for proposals is 28 February 2017. We don’t need to know yet who will speak in it; you have a few months to find willing participants after you agree to organize a session.

The ISF is a little different from most academic conferences in that about 1/3 of the attendees are practitioners, and 2/3 are academics. Consequently, we are not only interested in traditional academic sessions, but also in talks from company-based forecasters describing the forecasting challenges they face, and hopefully some of the solutions.

See forecasters.org/isf/ for more information about the conference, and the location. Cairns is one of the most beautiful places in Australia, and very close to the Great Barrier Reef. June is also the best time to visit the area, as it is during the dry season with moderate temperatures and lots of sunshine. We are hoping that people attending the conference will choose to have a holiday in the region as well.

The position is advertised on the Amstat jobs site and on the Monash careers site.

Enquiries to Professor Heather Anderson.

Data mining is not just for tech companies; in fact, it can be especially useful for industries that are not typically thought of as ‘innovative’, such as agriculture. What are some of the main industries that you think benefit from predictive analytics?

Any industry that collects data can use data mining and statistical modelling.

Agriculture is becoming a heavy user of data science methods, with data being collected on every aspect of crop or livestock health and development. This data is then used to optimize crop rotation, or breeding programs.

Energy companies are now using very sophisticated predictive analytics methods for forecasting energy usage from one day to 20 years ahead. With the widespread installation of smart meters, we now have electricity demand data available for each household throughout the day. This allows better network planning, finding anomalies due to equipment malfunction, and so on.

Many companies are collecting data, but not using it effectively. I’ve worked with manufacturing companies that have extremely detailed data on demand for their products right down to store level, but they do almost nothing with the data. This is a waste of resources and useful information. Good predictive modelling will help optimize the supply chain, reduce warehousing costs, make promotions more effective, and so on.

What can predictive analytics tell us that we would be surprised by?

I am frequently surprised by how good Amazon is at predicting what books I might be interested in. These suggestions come from a “recommendation algorithm” based on what books I have bought, or browsed, in the past. I buy a lot of books, so I’ve provided plenty of data for Amazon to predict my interests. Netflix uses a similar algorithm, although it does not work so well in my case because I haven’t watched enough movies and TV for it to learn what I like.

In Pennsylvania, statistical models are being used to predict if someone convicted of a crime is likely to re-offend. This information is being used to determine the length of sentences in some cases. So the time in prison is based not just on what crime the person has committed, but also on what crimes they are predicted to commit in the future.

A related project is to predict at the time someone is born, how likely it is that they will commit a crime before they turn 18, based on the history of the child’s parents, their socio-economic status, where they live, and so on. The big question, of course, is how will those predictions be used?

Are there limits to how much data we can mine and what it can tell us?

The data sets being mined are getting larger every year. Google handles data sets involving every search query from every user in the world over many years. The data set is so large that it has to be distributed across thousands of separate computers just to store it. Once a data set is too large for an individual computer to handle, we need a different set of tools for data analysis. There has been a lot of work in recent years on making the analysis of distributed data sets easier. It is hard to see there being any limit to the amount of data that could be stored.

Where we can record data frequently, we can usually do good predictive modelling. But there are situations where it is simply not possible to get enough data. For example, we are currently very bad at predicting economies. The problem is that economies change so slowly that we don’t have a very long history of data to work with. There is not much point using data more frequently than quarterly or monthly, because economic variables such as the GDP and unemployment rates do not change that quickly. Even if we combine data from similar countries, we do not have enough data to develop a good predictive model. We supplement our data with economic theory, but still the results are not very accurate. This situation is unlikely to improve in the future, because we are never going to get enough data.

There are also some things that are not amenable to data analysis, at least using currently available tools. For example, how to compose a symphony, or write beautiful poetry. Most likely the analytic tools will eventually develop to the point where computers can write great music and beautiful poetry, but we are a long way off that right now. Not so long ago it was thought impossible for a computer to beat a person at the game of Go, or to drive a car in traffic, and both are now possible. Maybe, one day, you will be driven around by your car, listening to music composed by your phone.

How important is it that I know and understand the mathematical framework underlying forecasting methods? I understand conceptually how most of them work, but I feel as if I may benefit from truly understanding the math.

The main benefit of understanding the mathematics behind different forecasting models is to be able to adapt the models when they don’t work well for your data. I’ve tried to develop tools that are very easy to use even if you don’t know much about time series models, but you need a deeper understanding if you want to modify the models, or develop new models. If forecasting is your main job task, it would be a good idea to get a deep level of training in how forecasting models work and how to adapt them.

Do you offer paid training over Skype?

No.

What are your thoughts on gradient boosted trees? It appears Kaggle winners tend to use them on time series problems by throwing many different models/features into the algorithm and letting it decide which to use and how to weight them.

If you have thousands of similar time series, then building a model across all series can be very powerful. Whether that mega-model uses deep learning or gradient boosting or something else depends on the number of time series you have, the number of features you want to include, and the complexity of the relationships.

Why don’t you create your own forecasting software like autobox/forecastX? It seems like a lucrative endeavor with your knowledge!

I *have* created my own forecasting software. It is the forecast package for R.

Presumably you mean, why don’t I create my own *commercial* software? Because my life objective is not income maximization. I am well-paid by my university, and I would rather spend my time making the world a better place by developing free resources.

So many people ask me why I don’t do this or that, with the underlying unstated assumption being that making more money is a good thing. I disagree. I need enough money to live comfortably, but not more.

I know neural networks performed worse than other methods for the M3 competition. Have there been any worthwhile advances in that space that I should research (such as LSTM)?

In general, individual time series are too short for effective modelling using neural networks. And even if you have very long time series, most likely the data at the start of the series are not so useful for forecasting now, because the underlying patterns and relationships have changed. So I do not think that neural networks will ever be particularly effective for univariate time series forecasting. You would need a very long series (at least thousands of observations) from a very stable system.

The benefit of neural networks for time series comes when you have a large number of related time series, and you can build a model for all series together. I am aware of some organizations using deep learning for millions of time series and it seems to work extremely well.

I don’t know anything about LSTM — yet another development that I will eventually have to read up on.

I recently interviewed for a tech company. During two of the interviews the interviewer asked me “what is your forecasting accuracy?” I attempted to explain that my answer would give them no valuable information, since all time series have different noise-to-signal ratios, patterns, etc. They wanted none of it and made me give them an answer. Then I found out that job performance is partly based on forecast accuracy. In my mind, these are big red flags, but what are your thoughts? Have you come across this before?

Yes, I’ve seen these problems before, unfortunately.

Forecast accuracy can only ever be relative — how well can you forecast compared to other people/models on the same problem?

Job performance should not be based on forecast accuracy. Accuracy computed on a small test set is prone to so much uncertainty that the relative accuracy is largely random. Even over the 3003 time series in the M3 competition, there is no significant difference between the top 2–3 methods (Koning et al, IJF 2005).

This is the same problem that Deming observed — rewarding employees based on measures of performance is usually a bad idea, because those measures are subject to randomness, and also because it discourages teamwork. See this great interview with him discussing the problem.

Thanks to Peter Ellis for putting the package together. He has also produced a nice blog post about it.

The previous GEFComs have led to some major advances in forecasting methodology, available via IJF papers by the winning teams. I expect similar developments to arise from this competition. Winners will present their work at ISEA2017 in Cairns, Australia.

Actually, the Uni of Melbourne currently has two professorships on offer — the other being the Peter Hall Chair in Mathematical Statistics. (Not sure that anyone would actually feel qualified to have a job with that title!)

So any professors of statistics out there looking for a new challenge, please consider coming to Melbourne. We’ll even invite you to visit us from time to time at Monash.
