We review recent advances in modal regression studies using kernel density estimation. Modal regression is an alternative approach for investigating the relationship between a response variable and its covariates. Specifically, modal regression summarizes the interactions between the response variable and covariates using the conditional mode or local modes. We first describe the underlying model of modal regression and its estimators based on kernel density estimation. We then review the asymptotic properties of the estimators and strategies for choosing the smoothing bandwidth. We also discuss useful algorithms and similar alternative approaches for modal regression, and propose future directions in this field.

This article is categorized under:

- Statistical and Graphical Methods of Data Analysis > Bayesian Methods and Theory
- Statistical and Graphical Methods of Data Analysis > Nonparametric Methods
- Statistical and Graphical Methods of Data Analysis > Density Estimation

90% prediction regions constructed from unimodal regression (pink area in the left panel) and multimodal regression (light blue area in the right panel). The prediction region is much smaller for multimodal regression than for unimodal regression because multimodal regression detects all components, whereas unimodal regression discovers only the main component.
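
The conditional-mode estimators described above are typically computed with a partial mean-shift on the kernel density estimate. Below is a minimal sketch of that idea; the toy data, bandwidths, and function names are invented for illustration and do not come from the article itself:

```python
import math
import random

random.seed(0)
# Toy two-component data: y ~ x or y ~ x + 4, so p(y | x) is bimodal
# and the conditional mean would fall between the two branches.
X = [random.uniform(0, 1) for _ in range(200)]
Y = [x + (4 if random.random() < 0.5 else 0) + random.gauss(0, 0.2)
     for x in X]

def conditional_mode(x, y0, hx=0.1, hy=0.3, steps=50):
    """Partial mean-shift: hold x fixed and ascend y to a local mode
    of the kernel estimate of the conditional density p(y | x)."""
    y = y0
    for _ in range(steps):
        w = [math.exp(-0.5 * ((x - xi) / hx) ** 2
                      - 0.5 * ((y - yi) / hy) ** 2)
             for xi, yi in zip(X, Y)]
        y = sum(wi * yi for wi, yi in zip(w, Y)) / sum(w)
    return y

# Different starting points recover the two local modes at x = 0.5.
mode_low = conditional_mode(0.5, y0=0.0)
mode_high = conditional_mode(0.5, y0=5.0)
```

Running the ascent from many starting values and collecting the distinct limits is what produces the multimodal regression sets (and the smaller prediction regions) shown in the figure.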

Permutation statistical methods possess a number of advantages over conventional statistical methods, making them the preferred approach for many research situations. Permutation methods are data-dependent, do not rely on distributional assumptions such as normality, provide either exact or highly accurate approximate probability values, do not require knowledge of theoretical standard errors, and are ideal for small data sets, where theoretical mathematical functions are often poor fits to discrete sampling distributions. On the other hand, permutation methods are computationally intensive. Computational efficiencies for permutation methods are described, and the methods are illustrated with a variety of common statistical tests and measures.

This article is categorized under:

- Statistical and Graphical Methods of Data Analysis > Bootstrap and Resampling
- Statistical and Graphical Methods of Data Analysis > Multivariate Analysis
- Statistical and Graphical Methods of Data Analysis > Nonparametric Methods
- Statistical and Graphical Methods of Data Analysis > Monte Carlo Methods

A discrete permutation probability distribution.
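
The exact-probability idea behind a discrete permutation distribution can be shown with a minimal two-sample test; the data and group sizes below are invented for illustration:

```python
from itertools import combinations
from statistics import mean

a = [12.1, 10.8, 13.5, 11.9]          # treatment group
b = [9.7, 10.2, 9.1, 10.5, 9.9]       # control group
pooled = a + b
observed = mean(a) - mean(b)

# Enumerate every relabeling of the pooled data: C(9, 4) = 126
# equally likely arrangements under the null of exchangeability.
count, total = 0, 0
for idx in combinations(range(len(pooled)), len(a)):
    grp_a = [pooled[i] for i in idx]
    grp_b = [pooled[i] for i in range(len(pooled)) if i not in idx]
    total += 1
    if mean(grp_a) - mean(grp_b) >= observed:
        count += 1

p_value = count / total   # exact one-sided probability
```

No distributional assumption or theoretical standard error enters anywhere; the probability value is computed directly from the discrete permutation distribution, which is exactly why larger samples demand the computational efficiencies the article describes.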

Inverse problems deal with the quest for unknown causes of observed consequences, based on predictive models, known as the forward models, that associate the former quantities to the latter in the causal order. Forward models are usually well-posed, as causes determine consequences in a unique and stable way. Inverse problems, on the other hand, are usually ill-posed: the data may be insufficient to identify the cause unambiguously, an exact solution may not exist, and, as in a mystery story, the cause recovered without extra information tends to be highly sensitive to measurement noise and modeling errors. The Bayesian methodology provides a versatile and natural way of incorporating extra information to supplement the noisy data by modeling the unknown as a random variable to express the uncertainty about its value. Presenting the solution in the form of a posterior distribution provides a wide range of possibilities to compute useful estimates. Inverse problems are traditionally approached from the point of view of regularization, a process whereby the ill-posed problem is replaced by a nearby well-posed one. While many of the regularization techniques can be reinterpreted in the Bayesian framework through prior design, the Bayesian formalism provides new techniques to enrich the paradigm of traditional inverse problems. In particular, inaccuracies and inadequacies of the forward model are naturally handled in the statistical framework. Similarly, qualitative information about the solution may be reformulated in the form of priors with unknown parameters that can be successfully handled in the hierarchical Bayesian context.

This article is categorized under:

- Statistical and Graphical Methods of Data Analysis > Bayesian Methods and Theory
- Algorithms and Computational Methods > Numerical Methods
- Applications of Computational Statistics > Computational Mathematics

In inverse problems, the ill-posedness manifests itself in the form of a likelihood density whose support is wide in some directions, or more generally, along some manifolds, where no clear preference among parameter values is expressed. An informative prior efficiently restricts the support of the posterior density where the likelihood is non-informative.
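
For a linear forward model with Gaussian noise and a Gaussian prior, the posterior mean coincides with the Tikhonov-regularized solution, which makes the prior-as-regularizer reinterpretation concrete. A small sketch with an invented, nearly rank-deficient 2x2 forward model:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])              # nearly rank-deficient forward model
x_true = np.array([1.0, 2.0])
y = A @ x_true + np.array([1e-4, -1e-4])   # data with tiny measurement noise

sigma2 = 1e-4   # noise variance in the Gaussian likelihood
tau2 = 1.0      # prior variance: x ~ N(0, tau2 * I)

# Posterior mean = (A^T A / sigma2 + I / tau2)^{-1} A^T y / sigma2,
# which is exactly the Tikhonov-regularized least-squares solution.
H = A.T @ A / sigma2 + np.eye(2) / tau2
x_map = np.linalg.solve(H, A.T @ y / sigma2)

x_naive = np.linalg.solve(A, y)            # unregularized inverse, unstable
err_naive = np.linalg.norm(x_naive - x_true)
err_map = np.linalg.norm(x_map - x_true)
```

The naive inverse amplifies the tiny noise along the ill-determined direction, while the prior pins that direction down; this is the restriction of the posterior support that the figure describes.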

The use of covariance is limited by the need for a finite second moment. This restriction excludes heavy-tailed distributions and data. Methods that require only a finite first moment of the variables allow for heavy-tailed scenarios; such methods are valid in the heavy-tailed setting, conceptually meaningful in a population model, and effective in practical applications. The Gini autocovariance function (Gini ACV) is defined under merely a finite first moment assumption. Conceptually, it plays a role similar to that of the usual Pearson autocovariance function, and thus represents a new fundamental tool in time series modeling. The latest applications of the Gini ACV and the Gini autocorrelation function (Gini ACF) are reviewed, providing a big picture of recent research in the area. The formulation of the Gini ACV and Gini ACF is briefly reviewed, followed by applications to linear time series modeling, unit root and reversibility testing in time series, and the nonlinear autoregressive Pareto process.

This article is categorized under:

- Statistical Learning and Exploratory Methods of the Data Sciences > Manifold Learning
- Statistical and Graphical Methods of Data Analysis > Nonparametric Methods
- Statistical Models > Time Series Models

Sample paths of an AR(2) process of time length 500, for φ1 = 0.8, φ2 = −0.5 and Pareto(1.5) innovations.
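One asymmetric formulation of the Gini autocovariance, gamma(k) = 4 Cov(X_t, F(X_{t-k})), can be estimated by replacing F with empirical ranks. The sketch below uses an AR(1)-type series with Pareto(1.5) innovations (infinite variance, finite mean); it is a simplified illustration, not the review's exact estimator:

```python
import random

random.seed(1)
n = 5000
# AR(1)-type series with Pareto(1.5) innovations: infinite variance
# but finite mean, so the Pearson autocovariance is unusable while
# the Gini autocovariance remains well defined.
x = [0.0]
for _ in range(n - 1):
    x.append(0.5 * x[-1] + random.paretovariate(1.5))

def empirical_cdf_values(v):
    """Rank-based estimate of F evaluated at each observation."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    for rank, i in enumerate(order):
        r[i] = (rank + 1) / len(v)
    return r

def gini_acv(series, k):
    """Empirical version of 4 * Cov(X_t, F(X_{t-k}))."""
    a = series[k:]                          # X_t
    f = empirical_cdf_values(series[:-k])   # F(X_{t-k}) via ranks
    ma = sum(a) / len(a)
    mf = sum(f) / len(f)
    return 4 * sum((ai - ma) * (fi - mf)
                   for ai, fi in zip(a, f)) / len(a)

g1 = gini_acv(x, 1)   # strong lag-1 dependence
g5 = gini_acv(x, 5)   # weaker dependence at lag 5
```

Because F is bounded, only the first moment of X enters the covariance, which is precisely what makes the Gini ACV usable for the heavy-tailed processes shown in the figure.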

Volumes of data are generated at every moment as we go through the paces of our daily lives. Many of these data flows are routinely captured through administrative records, social media, and surveys. Historically, agencies at different levels of government have been responsible for curating and reporting statistics about our social, economic, and health conditions associated with these data flows. Recently, the U.S. government has proposed the use of data derived from administrative records at the federal level to support social policy and program evaluation. Why not consider parallel activities at state and local levels? Harnessing local data sources and integrating them with state and federal sources will provide timelier and more geographically specific analyses to support local insights and policy development. Leveraging community-based participatory research, researchers and civic leaders can work together to identify the questions and execute rigorous, yet flexible, processes for building local sustainable community learning cultures based on data-driven discovery. In the process of conducting research with local civic leaders, we have observed that issues can be classified into three categories: locating and describing a population within a community; estimating a statistic and a measure of variability; and evaluating a program, policy, or procedure. Through a series of case studies, this paper demonstrates the unexpected value in liberating and repurposing local data.

This article is categorized under:

- Applications of Computational Statistics > Organizations and Publications

Locations of fire stations in Arlington, Virginia and locations of 911 Fire/EMS calls during 2010–2015. Black points are the 10 fire stations. The colored points are the locations of the calls, where points of the same color are calls dispatched from the same fire station.

The minimum covariance determinant (MCD) method is a highly robust estimator of multivariate location and scatter, for which a fast algorithm is available. Since estimating the covariance matrix is the cornerstone of many multivariate statistical methods, the MCD is an important building block when developing robust multivariate techniques. It also serves as a convenient and efficient tool for outlier detection. The MCD estimator is reviewed, along with its main properties such as affine equivariance, breakdown value, and influence function. We discuss its computation, and list applications and extensions of the MCD in applied and methodological multivariate statistics. Two recent extensions of the MCD are described. The first one is a fast deterministic algorithm which inherits the robustness of the MCD while being almost affine equivariant. The second is tailored to high-dimensional data, possibly with more dimensions than cases, and incorporates regularization to prevent singular matrices.

This article is categorized under:

- Statistical and Graphical Methods of Data Analysis > Multivariate Analysis
- Statistical and Graphical Methods of Data Analysis > Robust Methods
- Statistical Learning and Exploratory Methods of the Data Sciences > Knowledge Discovery

Level curves of scatter matrices fit to data with outliers: classical (red) and robust (blue).
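
The core of the MCD computation is the "C-step": fit location and scatter to a subset of size h, then keep the h points with smallest Mahalanobis distance and refit, which never increases the covariance determinant. A simplified sketch with random restarts (the FAST-MCD algorithm the article discusses adds refinements not shown here):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, h = 100, 2, 75                 # keep h > n/2 points in each fit
X = rng.normal(size=(n, p))
X[:10] += 8.0                        # 10% gross outliers

def c_steps(idx, n_iter=10):
    """Repeat: fit mean/covariance to the current subset, then keep
    the h points with smallest Mahalanobis distance and refit."""
    for _ in range(n_iter):
        mu = X[idx].mean(axis=0)
        S = np.cov(X[idx].T)
        d = np.einsum('ij,jk,ik->i', X - mu, np.linalg.inv(S), X - mu)
        idx = np.argsort(d)[:h]
    return idx, X[idx].mean(axis=0), np.cov(X[idx].T)

best_det, best = np.inf, None
for _ in range(20):                  # random restarts
    start = rng.choice(n, size=h, replace=False)
    _, mu, S = c_steps(start)
    det = np.linalg.det(S)
    if det < best_det:
        best_det, best = det, (mu, S)

mu_mcd, S_mcd = best                 # robust location and scatter
```

Because the outliers are excluded from the determinant-minimizing subset, the fitted ellipse hugs the clean data, exactly the contrast between the red and blue level curves in the figure.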

The testing of multiple hypotheses is an important consideration in many statistical analyses. A theme for multiple comparisons problems under a frequentist paradigm is the need for an adjustment to control the overall error probability for the false detection of null effects. Our review will focus on Bayesian approaches to multiple comparisons problems. Under a Bayesian paradigm, multiplicity adjustments arise from a concern that many of the effects to be tested are null. We will discuss how Bayesian models provide a multiplicity adjustment through a prior placing increased probability on null effects, or through hierarchical modeling. We will also show how the Bayesian information criterion for model selection fits naturally into the study of multiple comparisons problems. *WIREs Comput Stat* 2018, 10:e1420. doi: 10.1002/wics.1420

This article is categorized under:

- Statistical and Graphical Methods of Data Analysis > Bayesian Methods and Theory
- Statistical and Graphical Methods of Data Analysis > Modeling Methods and Algorithms
- Data: Types and Structure > Traditional Statistical Data

Estimated mean differences in the log-transformed creatine kinase levels. (1) corresponds to empirical group means; (2) corresponds to estimated group means under Bayesian model averaging; (3) corresponds to estimated group means under the highest posterior probability model.
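
The multiplicity adjustment induced by prior mass on null effects can be seen in a simple two-groups (spike-and-slab) calculation; the z-scores, prior null probability, and slab variance below are illustrative, not from the article:

```python
import math

def posterior_null_prob(z, pi0=0.9, tau2=4.0):
    """P(null | z) when z ~ N(0, 1) under the null and, marginally,
    z ~ N(0, 1 + tau2) under the slab alternative."""
    f0 = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    v1 = 1.0 + tau2
    f1 = math.exp(-0.5 * z * z / v1) / math.sqrt(2 * math.pi * v1)
    return pi0 * f0 / (pi0 * f0 + (1 - pi0) * f1)

# Because 90% of effects are believed null a priori, a modest z-score
# is still judged probably null, while a large one overcomes the prior.
p_modest = posterior_null_prob(1.5)
p_large = posterior_null_prob(4.0)
```

A frequentist analysis would call z = 1.5 marginal and adjust it further for the number of tests; here the adjustment arises automatically from the prior concentration on null effects.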

Standard generalized linear models (GLMs) consist of three components: a random component referring to a distribution of the response variable that belongs to the exponential family; a systematic component referring to the linear predictor; and a known link function specifying the relationship between the linear predictor and the mean of the distribution function. A flexible extension of the standard GLMs allows an unknown link function. The classical parametric likelihood approach is not applicable because the resulting parameter space is too large. To address this issue, sieve maximum likelihood estimation has been developed in the literature, in which the estimator of the unknown link function is assumed to lie in a sieve space. Various sieve methods, including B-spline and P-spline based methods, are introduced. The numerical implementation and theoretical properties of these methods are also discussed. *WIREs Comput Stat* 2018, 10:e1425. doi: 10.1002/wics.1425

This article is categorized under:

- Applications of Computational Statistics > Signal and Image Processing and Coding
- Statistical and Graphical Methods of Data Analysis > Nonparametric Methods
- Statistical Models > Generalized Linear Models
- Algorithms and Computational Methods > Maximum Likelihood Methods

GLMs with an unknown link function provide a flexible and robust way to characterize the relationship between the predictors and the outcome of interest. While the classical parametric likelihood approach is not applicable, the sieve maximum likelihood approach facilitates estimation of and inference on the unknown parameters.
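
With a Gaussian response, sieve maximum likelihood over a spline space reduces to least squares on a spline basis, which makes the idea easy to sketch. The true link, basis, and knots below are invented, and the index is taken as known for simplicity (a real sieve MLE would alternate between the index coefficients and the link):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
eta = rng.uniform(-2, 2, size=n)            # known linear predictor
y = np.tanh(eta) + rng.normal(0, 0.1, n)    # true (unknown) link: tanh

# Sieve space: truncated-power cubic splines with 5 interior knots.
knots = np.linspace(-1.5, 1.5, 5)
B = np.column_stack([eta**d for d in range(4)] +
                    [np.clip(eta - k, 0, None)**3 for k in knots])

# Gaussian likelihood => sieve MLE for the link = least squares on B.
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
g_hat = B @ coef                             # estimated link values

rmse = float(np.sqrt(np.mean((g_hat - np.tanh(eta))**2)))
```

Letting the number of knots grow slowly with n is what turns this finite-dimensional fit into a sieve estimator with the asymptotic guarantees the article reviews.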

A discrete event simulator, CLOURAM (CLOUD Risk Assessor and Manager), algorithmically estimates risk indices in modern-day CLOUD computing scenarios with tangible risk management targets, comparing favorably to intractably tedious theoretical Markov solutions or hand calculations that are overly limited in scope. The goal is to improve the operational quality of the CLOUD by optimizing the number of servers added for capacity and the final repair crew count. We also optimize the server unit repair rates and the consumer load cycle by curbing demand using Linear Programming (LP)-based optimization with appropriate objective functions and constraints. Small and large CLOUD systems are simulated with cost and benefit comparisons. The 2-state (UP and DN) or 3-state (UP, DN, and DER) units statistically fail and recover with Negative Exponential or Weibull densities. *WIREs Comput Stat* 2018, 10:e1424. doi: 10.1002/wics.1424

This article is categorized under:

- Statistical and Graphical Methods of Data Analysis > Reliability, Survivability, and Quality Control
- Algorithms and Computational Methods > Networks and Security
- Statistical Models > Simulation Models
- Algorithms and Computational Methods > Linear Programming

An illustration of a commercial CLOUD computing network enterprise (2010), viewed from a user perspective.
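
The flavor of such a simulation can be conveyed with a toy model: M identical 2-state (UP/DN) units with exponential failure and repair and a limited repair crew. All rates, the crew count, and the capacity target below are invented; CLOURAM itself handles far richer settings (Weibull densities, derated states, costs):

```python
import random

random.seed(0)
M, crews = 10, 2          # servers and repair crews
lam, mu = 0.01, 0.5       # per-server failure and repair rates
need = 8                  # capacity target: at least 8 servers UP
horizon = 100000.0

down, t, short_time = 0, 0.0, 0.0
while t < horizon:
    fail_rate = (M - down) * lam
    repair_rate = min(down, crews) * mu      # crews limit repairs
    total = fail_rate + repair_rate
    dt = random.expovariate(total)           # time to the next event
    if M - down < need:                      # state during this interval
        short_time += dt
    t += dt
    down += 1 if random.random() < fail_rate / total else -1

loss_of_capacity_prob = short_time / t       # estimated risk index
```

Rerunning this while varying M or the crew count (or replacing the exponential clocks with Weibull draws) gives exactly the kind of cost-benefit comparisons the abstract describes, without writing down a Markov chain by hand.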

Measures of association have been widely used for describing statistical relationships between two sets of variables. Traditionally, such association measures focus on specialized settings. Based on an in-depth summary of existing common measures, we present a general framework for association measures that unifies existing methods and novel extensions based on kernels, including practical solutions to computational challenges. Specifically, we introduce association screening and variable selection via maximizing kernel-based association measures. We also develop a backward dropping procedure for feature selection when there are a large number of candidate variables. The proposed framework was evaluated by independence tests and feature selection using kernel association measures on a diversified set of simulated association patterns with different dimensions and variable types. The results show the superiority of the generalized association measures over existing ones. We also apply our framework to a real-world problem of gender prediction from handwritten texts. We demonstrate, through this application, the data-driven adaptation of kernels, and how kernel-based association measures can naturally be applied to data structures including functional input spaces. This suggests that the proposed framework can guide derivation of appropriate association measures in a wide range of real-world problems and work well in practice. *WIREs Comput Stat* 2018, 10:e1422. doi: 10.1002/wics.1422

This article is categorized under:

- Statistical Learning and Exploratory Methods of the Data Sciences > Pattern Recognition
- Statistical Learning and Exploratory Methods of the Data Sciences > Knowledge Discovery
- Statistical and Graphical Methods of Data Analysis > Multivariate Analysis

A general framework for association measures that unifies existing methods and guides derivation of novel measures for complex data types.
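
One widely used kernel-based association measure is the Hilbert-Schmidt independence criterion (HSIC), whose permutation null distribution yields an independence test. A sketch with invented data and bandwidths (the review's framework is more general than this single measure):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.uniform(-1, 1, n)
y = x**2 + rng.normal(0, 0.1, n)   # nonlinear, near-zero correlation

def rbf_gram(v, h=0.5):
    d = v[:, None] - v[None, :]
    return np.exp(-d**2 / (2 * h**2))

def hsic(a, b):
    """Biased HSIC estimate trace(KHLH)/n^2 with RBF kernels."""
    K, L = rbf_gram(a), rbf_gram(b)
    H = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    return float(np.trace(K @ H @ L @ H)) / n**2

observed = hsic(x, y)
# Permutation null: shuffling y breaks any dependence on x.
perm = [hsic(x, rng.permutation(y)) for _ in range(200)]
p_value = float(np.mean([v >= observed for v in perm]))
```

A quadratic relationship like this is invisible to Pearson correlation, but the kernel measure detects it; swapping the Gram-matrix construction is also how such measures extend to structured or functional inputs like handwriting.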

Many important stochastic counting models can be written as general birth-death processes (BDPs). BDPs are continuous-time Markov chains on the non-negative integers in which only jumps to adjacent states are allowed. BDPs can be used to easily parameterize a rich variety of probability distributions on the non-negative integers, and straightforward conditions guarantee that these distributions are proper. BDPs also provide a mechanistic interpretation—birth and death of actual particles or organisms—that has proven useful in evolution, ecology, physics, and chemistry. Although the theoretical properties of general BDPs are well understood, statistical work on BDPs has traditionally been limited to the simple linear (Kendall) process. Aside from a few simple cases, it remains impossible to find analytic expressions for the likelihood of a discretely-observed BDP, and computational difficulties have hindered development of tools for statistical inference. But the gap between BDP theory and practical methods for estimation has narrowed in recent years. There are now robust methods for evaluating likelihoods for realizations of BDPs: finite-time transition, first passage, equilibrium probabilities, and distributions of summary statistics that arise commonly in applications. Recent work has also exploited the connection between continuously- and discretely-observed BDPs to derive EM algorithms for maximum likelihood estimation. Likelihood-based inference for previously intractable BDPs is now far easier than once thought, and regression approaches analogous to Poisson regression are straightforward to derive. In this review, we outline the basic mathematical theory for BDPs and demonstrate new tools for statistical inference using data from BDPs. *WIREs Comput Stat* 2018, 10:e1423. doi: 10.1002/wics.1423

This article is categorized under:

- Statistical and Graphical Methods of Data Analysis > Bayesian Methods and Theory
- Statistical and Graphical Methods of Data Analysis > Modeling Methods and Algorithms
- Applications of Computational Statistics > Computational Chemistry

Realization of a birth-death process *X*(*t*).
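
A realization like the one in the figure can be generated with the Gillespie algorithm, using the defining property that only jumps to adjacent states occur. A sketch for the linear (Kendall) process with invented rates:

```python
import random

random.seed(2)

def simulate_bdp(x0, birth, death, t_max):
    """Linear BDP: from state x > 0, jump to x+1 at rate birth*x and
    to x-1 at rate death*x; state 0 is absorbing."""
    t, x = 0.0, x0
    path = [(0.0, x0)]
    while x > 0:
        t += random.expovariate((birth + death) * x)  # waiting time
        if t > t_max:
            break
        x += 1 if random.random() < birth / (birth + death) else -1
        path.append((t, x))
    return path

path = simulate_bdp(x0=5, birth=0.9, death=1.0, t_max=10.0)
states = [s for _, s in path]
```

Simulated paths like this are the raw material for the likelihood evaluations and EM-based estimation schemes the review describes; replacing `birth*x` and `death*x` with arbitrary state-dependent rates gives a general BDP.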