Today, algorithms such as the gradient boosting machine and the random forest are among the most competitive tools in prediction contests. We review how these algorithms came about. The basic underlying idea is to aggregate predictions from a diverse collection of models. We also explore several diverse directions in which the basic idea has evolved, and clarify some common misconceptions that have grown as the idea steadily gained popularity.
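
The idea of aggregating predictions from a diverse collection of models can be sketched with a toy bagging example, a precursor of the random forest. Everything here is invented for illustration: the data are simulated, and each "model" is deliberately trivial (the mean of a bootstrap resample).

```python
import random

random.seed(0)

# Toy data: noisy observations of a constant signal (true value 5.0).
data = [5.0 + random.gauss(0, 1) for _ in range(200)]

def bootstrap_model(data, rng):
    """'Train' a trivially simple model: the mean of a bootstrap resample."""
    sample = [rng.choice(data) for _ in range(len(data))]
    return sum(sample) / len(sample)

# Bagging: fit B models on bootstrap resamples, then average their predictions.
rng = random.Random(42)
B = 50
predictions = [bootstrap_model(data, rng) for _ in range(B)]
ensemble_prediction = sum(predictions) / B
print(round(ensemble_prediction, 2))
```

Diversity comes from the bootstrap resampling; aggregation averages away much of the individual models' variance, which is the mechanism the reviewed algorithms build on.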

For further resources related to this article, please visit the WIREs website.

Classical statistics relies largely on parametric models. Typically, assumptions are made on the structural and the stochastic parts of the model, and optimal procedures are derived under these assumptions. Standard examples are least squares estimators in linear models and their extensions, maximum-likelihood estimators and the corresponding likelihood-based tests, and generalized method of moments (GMM) techniques in econometrics. Robust statistics deals with deviations from the stochastic assumptions and their dangers for classical estimators and tests, and develops statistical procedures that remain reliable and reasonably efficient in the presence of such deviations. It can be viewed as a statistical theory for approximate parametric models, providing a reasonable compromise between the rigidity of a strict parametric approach and the potential difficulties of interpreting a fully nonparametric analysis. Many classical procedures are well known for not being robust: they are optimal when the assumed model holds exactly, but biased and/or inefficient when small deviations from the model are present. The statistical results obtained from standard classical procedures on real data can therefore be misleading. In this paper we give a brief introduction to robust statistics by reviewing some basic general concepts and tools and by showing how they can be used in data analysis to provide an alternative, complementary analysis with additional useful information. We focus on robust statistical procedures based on M-estimators and tests because they provide a unified statistical framework that complements the classical theory. Robust procedures are discussed for standard models, including linear models, generalized linear models, and multivariate analysis. Some recent developments in high-dimensional statistics are also outlined.
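
As an illustration of the M-estimation idea, the sketch below computes a Huber M-estimate of location by iteratively reweighted averaging. It assumes a known unit scale and uses the conventional tuning constant c = 1.345; the data are hypothetical, with one gross outlier.

```python
def huber_location(xs, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted averaging.
    Assumes unit scale; c = 1.345 is the usual efficiency/robustness trade-off."""
    mu = sorted(xs)[len(xs) // 2]  # start from (an upper) median
    for _ in range(max_iter):
        # Huber weights: full weight near mu, downweighted beyond c.
        w = [1.0 if abs(x - mu) <= c else c / abs(x - mu) for x in xs]
        mu_new = sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

data = [2.1, 1.9, 2.0, 2.2, 1.8, 50.0]  # one gross outlier
print(round(huber_location(data), 2))
```

The sample mean of these data is 10.0, dragged far from the bulk by the single outlier; the Huber estimate stays near 2, illustrating why classical procedures can mislead under small deviations from the model.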

We were motivated by three novel technologies, which exemplify a new design paradigm in high-throughput genomics: nanostring^{TM}; DNA-mediated Annealing, Selection, extension, and Ligation (DASL^{TM}); and multiplex real-time *quantitative polymerase chain reaction* (QPCR). All three are solution hybridization based, and all three employ 10–1000 DNA sequence probes in a small volume, each probe specific for a particular sequence in a different human gene. nanostring^{TM} uses 50-mer probes; DASL and multiplex QPCR use ∼20-mer probes. Assuming a 1-nM probe concentration in a 1-μL volume, there are 10^{− 9} × 10^{− 6} × 6.02 × 10^{23}, or roughly 6 × 10^{8}, molecules of each probe present in the reaction, compared to 10–1000 target molecules. Excess probe drives the sensitivity of the reaction. We are interested in the limits of multiplexing, i.e., the probability that in such a design a particular probe would bind to another, sequence-related probe rather than to its intended, specific target. If this were to happen with appreciable frequency, it would result in much-reduced sensitivity and potential failure of this design. We established upper and lower bounds for the probability that in a multiplex assay at least one probe would bind to another sequence-related probe rather than to its cognate target. These bounds are reassuring: for reasonable degrees of multiplexing (10^{3} probes), the probability of such an event is practically negligible. As the degree of multiplexing increases to ∼10^{6} probes, our theoretical bounds gain practical importance and establish a principal upper limit for the use of highly multiplexed solution-based assays vis-à-vis solid-support-anchored designs.
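
Probe counts of this kind follow directly from concentration × volume × Avogadro's number (≈6.02 × 10^{23} mol^{−1}); a minimal sanity check, noting that 1 nM = 10^{−9} mol/L and 1 μL = 10^{−6} L:

```python
AVOGADRO = 6.02e23  # molecules per mole

def molecules(concentration_molar, volume_liters):
    """Number of molecules at a given molar concentration and volume."""
    return concentration_molar * volume_liters * AVOGADRO

# 1 nM probe concentration in a 1 uL reaction volume:
n_probe = molecules(1e-9, 1e-6)
print(f"{n_probe:.2e}")  # on the order of 6e8 probe molecules vs. 10-1000 targets
```

The vast probe excess over target is what drives the sensitivity of the reaction, and it is also why probe–probe binding, however improbable per pair, matters as multiplexing grows.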

In spatial analysis, typically we specify a region of interest and consider a spatial surface over the region. It is often of interest to ascertain where the surface is changing rapidly. Identifying locations or curves where there is rapid change is referred to as wombling. The surface may arise continuously over the region or discretely, in which case values are provided for a collection of areal units. In either setting, algorithmic strategies are available to attempt to identify so-called wombling boundaries. In this study, the surfaces of interest are all assumed to be random: realizations of a Gaussian process in the continuous case, and of a Markov random field in the discrete case. With specifications given as stochastic models, we discuss Bayesian approaches to implement the desired boundary analysis. We refer to this as Bayesian wombling and show how fully model-based inference can be carried out, including assessment of uncertainty. The approach for the continuous case is more theoretically demanding (as expected with an uncountable set of locations) but yields elegant distribution theory. The discrete case is more straightforward. Each case is illustrated with a brief example. *WIREs Comput Stat* 2015, 7:307–315. doi: 10.1002/wics.1360
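
At its simplest, algorithmic wombling flags locations where the surface gradient is unusually large; a minimal one-dimensional sketch, with a toy surface and a hypothetical cutoff rule (the Bayesian treatment in the article replaces this ad hoc thresholding with model-based inference):

```python
# A toy 1-D "surface" with a jump between indices 3 and 4.
surface = [0.0, 0.1, 0.2, 0.3, 2.0, 2.1, 2.2]

# Finite-difference gradient magnitudes between adjacent locations.
grads = [abs(b - a) for a, b in zip(surface, surface[1:])]

# Hypothetical cutoff: flag gradients well above the average as boundaries.
threshold = 3 * (sum(grads) / len(grads))
boundaries = [i for i, g in enumerate(grads) if g > threshold]
print(boundaries)  # the index where the rapid change occurs
```

The Bayesian version treats the surface as random (a Gaussian process or Markov random field), so a statement like "this is a wombling boundary" comes with posterior uncertainty rather than a hard threshold.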

One type of sports competition between two teams consists of a sequence of plays in which the winner is the team with the most points scored. A win probability is an in-game calculation that expresses the likelihood of the home team winning the contest. Win probabilities are helpful in understanding the ebb and flow of the game and in assessing which plays had the largest impact on the outcome of the contest. This article reviews the definition, construction, and application of win probabilities in a number of sports, including the use of this concept in measuring the contribution of individual players. *WIREs Comput Stat* 2015, 7:316–325. doi: 10.1002/wics.1358
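
As a purely hypothetical illustration of what such a calculation can look like, the sketch below maps the current lead and time remaining to a home-team win probability via a logistic curve. The functional form and all parameters are invented for illustration, not taken from the article; real win probability models are fit to historical game data.

```python
import math

def win_probability(lead, seconds_remaining, scale=8.0):
    """Toy home-team win probability: logistic in the current lead, with
    uncertainty shrinking as time runs out (hypothetical model; 2880 s game)."""
    # More time remaining -> more uncertainty -> probability closer to 0.5.
    spread = scale * math.sqrt(max(seconds_remaining, 1) / 2880.0)
    return 1.0 / (1.0 + math.exp(-lead / spread))

print(round(win_probability(lead=5, seconds_remaining=2880), 2))  # early: 0.65
print(round(win_probability(lead=5, seconds_remaining=60), 2))    # late:  0.99
```

The same 5-point lead is far more decisive late in the game, which is exactly the "ebb and flow" that tracking win probability play by play makes visible.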

Text analytics continues to proliferate as massive volumes of unstructured but highly useful data are generated at ever-increasing rates. Vector space models for text data—in which documents are represented by rows and words by columns—provide a translation of this unstructured data into a format that may be analyzed with statistical and machine learning techniques. This approach gives excellent results in revealing common themes, clustering documents, clustering words, and translating unstructured text fields (such as an open-ended survey response) into usable input variables for predictive modeling. After discussing the collection and processing of text, we explore properties and transformations of the document-term matrix (DTM). We show how the singular value decomposition may be used to drastically reduce the size of the document space while also setting the stage for automatic topic extraction, courtesy of the varimax rotation. This latent semantic analysis (LSA) approach produces factors that are compatible with graphical exploration and advanced analytics. We also explore latent Dirichlet allocation (LDA) for topic analysis. We reference published R packages that implement the methods and conclude with a summary of other popular open-source and commercial software packages. *WIREs Comput Stat* 2015, 7:326–340. doi: 10.1002/wics.1361
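
The vector space representation is easy to sketch: build the DTM from raw text, then compare documents in that space. The toy corpus below is invented, and cosine similarity on raw counts stands in for the analyses (clustering, SVD, LSA) that would follow in practice.

```python
from collections import Counter
import math

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "stock prices rose sharply",
]

# Document-term matrix: one row per document, one column per vocabulary word.
vocab = sorted({w for d in docs for w in d.split()})
dtm = [[Counter(d.split())[w] for w in vocab] for d in docs]

def cosine(u, v):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# The two pet sentences share words; the stock sentence shares none.
print(round(cosine(dtm[0], dtm[1]), 2))  # 0.75
print(round(cosine(dtm[0], dtm[2]), 2))  # 0.0
```

Once documents live in this matrix, the SVD can compress the column space and the resulting factors (after rotation) become interpretable topics, which is the LSA pipeline the article describes.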

Classification is an important topic in statistical learning. The goal of classification is to build, from the training dataset, a predictive model for the class label of an observation. It is commonly assumed that the class labels are unordered. However, in many real applications there exists an intrinsic ordinal relation between the class labels. Examples include cancer patients grouped into early, intermediate, and terminal stages; customers grouped into low, middle, and high credit levels; and experimental subjects treated with different amounts of bacteria. In this article, we focus on the classification problem for ordinal data and introduce the theoretical setup of the problem. We review both traditional and modern methods for learning from ordinal data. In particular, we emphasize the trade-off between model flexibility and interpretability. Lastly, we discuss some issues regarding ordinal data learning, including an appropriate loss function for this problem. *WIREs Comput Stat* 2015, 7:341–346. doi: 10.1002/wics.1357
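
One classical way to respect the ordering is the cumulative-logit (proportional odds) model, where a single linear predictor and a set of ordered cutpoints yield class probabilities; a minimal sketch with invented parameter values:

```python
import math

# Ordinal labels: 0 = early, 1 = intermediate, 2 = terminal.
def cum_logit_probs(x, beta, thetas):
    """Class probabilities under a proportional-odds model: P(Y <= k) is
    logistic in (theta_k - beta * x); class probabilities are the
    successive differences of these cumulative probabilities."""
    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))
    cum = [sigmoid(t - beta * x) for t in thetas] + [1.0]  # P(Y <= k)
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

# Hypothetical fit: beta and the ordered cutpoints are invented here.
probs = cum_logit_probs(x=1.0, beta=2.0, thetas=[-1.0, 1.5])
print([round(p, 3) for p in probs])
```

The single slope `beta` is shared by all thresholds, which is what makes the model highly interpretable at the cost of flexibility, the trade-off the article emphasizes.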

Co-clustering means simultaneously identifying natural clusters in different kinds of objects. Examples include simultaneously clustering customers and products in a recommender application, proteins and molecules in microbiology, or documents and words in a text mining application. Important insights into a problem can be gained by understanding the interactions between clusters of the different kinds of objects. This paper considers Bayesian models for co-clustering. The Bayesian approach begins by developing a model for the data-generating process, then inverts that model through Bayesian inference to infer cluster membership, learn characteristics of the clusters, and fill in missing observations. We consider a basic Bayesian clustering model and several extensions to it. Experimental evaluations and comparisons among the clustering methods are presented. *WIREs Comput Stat* 2015, 7:347–356. doi: 10.1002/wics.1359
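
The data-generating process of a basic co-clustering (latent block) model is simple to sketch: each (row cluster, column cluster) pair gets its own interaction rate. All cluster assignments and rates below are invented; Bayesian inference would run this generative story in reverse to recover them from an observed matrix.

```python
import random

rng = random.Random(0)

# Hypothetical latent structure: 4 customers in 2 row clusters,
# 3 products in 2 column clusters.
row_clusters = [0, 0, 1, 1]
col_clusters = [0, 1, 1]

# Each (row cluster, column cluster) block has its own Bernoulli rate,
# e.g., the probability that a customer in that cluster buys that product.
block_rate = [[0.9, 0.1],
              [0.2, 0.8]]

# Generate the observed 0/1 interaction matrix from the block structure.
matrix = [[1 if rng.random() < block_rate[r][c] else 0
           for c in col_clusters]
          for r in row_clusters]
for row in matrix:
    print(row)
```

Inverting this process via Bayesian inference yields posterior cluster memberships and block rates, and the same machinery naturally fills in missing cells of the matrix.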
