Stat Pipe

Pipes Output
http://pipes.yahoo.com/pipes/pipe.info?_id=0ff16d6cdec5ca5a7b447f497b8ccd73
Thu, 01 Oct 2015 19:36:31 +0000
http://pipes.yahoo.com/pipes/

Zeros and ones: a case for suppressing zeros in sensitive count data with an application to stroke mortality
http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1002%2Fsta4.92
In the current era of global internet connectivity, privacy concerns are of the utmost importance. When official statistical agencies collect spatially referenced, confidential data that they intend to release as public-use files, the suppression of small counts is a common measure that agencies take to protect the confidentiality of the data-subjects from ill-intentioned users. The goal of this paper is to demonstrate that an interval suppression criterion that does not suppress zeros can fail to protect regions with a single occurrence. We illustrate the difference in disclosure risk between an interval suppression criterion and a one-sided suppression criterion by considering a US county-level dataset composed of the number of deaths due to stroke in White men. Here, we illustrate that an interval suppression criterion leads to a twofold increase in the disclosure risk when compared with a one-sided suppression criterion for regions with a single incidence among a population of less than 600. We conclude with an extension of these findings beyond stroke mortality and by offering general guidelines for data suppression. Copyright © 2015 John Wiley & Sons, Ltd.
Mon, 21 Sep 2015 21:52:47 +0000

Longitudinal functional data analysis
http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1002%2Fsta4.89
We consider dependent functional data that are correlated because of a longitudinal-based design: each subject is observed at repeated times and at each time, a functional observation (curve) is recorded.
We propose a novel parsimonious modelling framework for repeatedly observed functional observations that allows low-dimensional features to be extracted. The proposed methodology accounts for the longitudinal design, is designed to study the dynamic behaviour of the underlying process, allows prediction of the full future trajectory and is computationally fast. Theoretical properties of this framework are studied, and numerical investigations confirm excellent behaviour in finite samples. The proposed method is motivated by and applied to a diffusion tensor imaging study of multiple sclerosis. Copyright © 2015 John Wiley & Sons, Ltd.
Mon, 24 Aug 2015 21:53:57 +0000

Figures of merit for simultaneous inference and comparisons in simulation experiments
http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1002%2Fsta4.88
This article considers the traditional figures of merit, namely, bias and mean squared (prediction) error, which are typically used to evaluate simulation experiments. We propose functions of them that account for different variables' units; these alternative figures of merit are closely tied to simultaneous multivariate inference on an unknown parameter vector or unknown state vector. Their usefulness is illustrated in a simulation experiment, where the goal is to determine the statistical properties associated with prediction of a multivariate state. Copyright © 2015 John Wiley & Sons, Ltd.
Thu, 06 Aug 2015 22:39:50 +0000

Accelerated non-parametrics for cascades of Poisson processes
http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1002%2Fsta4.87
Cascades of Poisson processes are probabilistic models for spatio-temporal phenomena in which (i) previous events may trigger subsequent events and (ii) both the background and triggering processes are conditionally Poisson.
Such phenomena are typically “data rich but knowledge poor,” in the sense that large datasets are available, yet a mechanistic understanding of the background and triggering processes that generate the data is unavailable. In these settings, non-parametric estimation plays a central role. However, existing non-parametric estimators have computational and storage complexity O(N²), precluding their application on large datasets. Here, by assuming the triggering process acts only locally, we derive non-parametric estimators with computational complexity O(N log N) and storage complexity O(N). Our approach automatically learns the domain of the triggering process from data and is essentially free from hyperparameters. The methodology is applied to a large seismic dataset where estimation under existing algorithms would be infeasible. Copyright © 2015 John Wiley & Sons, Ltd.
Thu, 06 Aug 2015 12:32:10 +0000

Covariance models on the surface of a sphere: when does it matter?
http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1002%2Fsta4.84
There is a growing interest in developing covariance functions for processes on the surface of a sphere because of the wide availability of data on the globe. Utilizing the one-to-one mapping between the Euclidean distance and the great circle distance, isotropic and positive definite functions in a Euclidean space can be used as covariance functions on the surface of a sphere. This approach, however, may result in physically unrealistic distortion on the sphere especially for large distances. We consider several classes of parametric covariance functions on the surface of a sphere, defined with either the great circle distance or the Euclidean distance, and investigate their impact upon spatial prediction. We fit several isotropic covariance models to simulated data as well as real data from the National Center for Environmental Prediction (NCEP)/National Center for Atmospheric Research (NCAR) reanalysis on the sphere.
We demonstrate that covariance functions originally defined with the Euclidean distance may not be adequate for some global data. Copyright © 2015 John Wiley & Sons, Ltd.
Wed, 10 Jun 2015 20:22:26 +0000

Preconditioning for classical relationships: a note relating ridge regression and OLS p-values to preconditioned sparse penalized regression
http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1002%2Fsta4.86
When the design matrix has orthonormal columns, “soft thresholding” the ordinary least squares solution produces the Lasso solution. If one uses the Puffer preconditioned Lasso, then this result generalizes from orthonormal designs to full rank designs (Theorem 1). Theorem 2 refines the Puffer preconditioner to make the Lasso select the same model as removing the elements of the ordinary least squares solution with the largest p-values. Using a generalized Puffer preconditioner, Theorem 3 relates ridge regression to the preconditioned Lasso; this result is for the high-dimensional setting, p > n. Where the standard Lasso is akin to forward selection, Theorems 1, 2, and 3 suggest that the preconditioned Lasso is more akin to backward elimination. These results hold for sparse penalties beyond ℓ1; for a broad class of sparse and non-convex techniques (e.g. SCAD and MC+), the results hold for all local minima. Copyright © 2015 John Wiley & Sons, Ltd.
Tue, 09 Jun 2015 19:30:52 +0000

Modelling space–time varying ENSO teleconnections to droughts in North America
http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1002%2Fsta4.85
Teleconnection in atmospheric science refers to a significant correlation between climate anomalies in widely separated regions (typically thousands of kilometres), and it is often considered to be responsible for extreme weather conditions occurring simultaneously over large distances.
In this paper, we study the influence of El Niño-Southern Oscillation teleconnection on meteorological droughts represented by the Palmer drought severity index across North America from 1870 to 1990. We develop a flexible statistical framework based on spatial random effects to model the covariance (teleconnection) between winter (October–March) sea surface temperature in the tropical Pacific and summer (June–August) droughts in North America. Our model allows us to analyse the dynamic pattern of teleconnection over space and time, and results indicate that the influence of El Niño-Southern Oscillation teleconnections on droughts varies spatially and temporally across North America. We further provide the time-varying teleconnection estimates with their uncertainties for 12 subregions in North America. Copyright © 2015 John Wiley & Sons, Ltd.
Tue, 09 Jun 2015 19:30:07 +0000

Random effects model for bias estimation: higher-order asymptotic inference
http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1002%2Fsta4.82
A common issue in physical, chemical and biometrical applications is to validate a laboratory's method. For that purpose, a lab performs measurements on a certified reference material with a given coverage interval. These reference materials are a major tool for assuring quality and reliability of results obtained by a lab in analysis and testing. Assuming that the measurand is random with a normal distribution whose parameters are obtained from the reference material certificate, new remarkably accurate confidence intervals for the bias are derived. These procedures are based on modern higher-order asymptotic statistical methods. Published 2015. This article is a U.S. Government work and is in the public domain in the USA.
Sun, 31 May 2015 21:02:51 +0000

A family of likelihood functions to make inferences about the reliability parameter for many stress-strength distributions
http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1002%2Fsta4.83
Many research papers in statistical literature address the estimation of the reliability parameter in stress-strength models, considering different types of distributions for stress and for strength. We have found that for many of these distributions, their corresponding profile likelihood functions of the reliability parameter can be grouped in a family of likelihood functions, with a simple algebraic structure that facilitates making inferences about this parameter. The novel family of likelihood functions proposed here, together with maximum likelihood estimation procedures and suitable reparameterizations, was used to obtain a simple closed-form expression for the likelihood confidence interval of the reliability parameter. This new approach is particularly useful when small and/or unequal sample sizes are involved. Simulation studies for some distributions were carried out to illustrate the performance of the likelihood confidence intervals for the reliability parameter, and adequate coverage frequencies were obtained. The simplicity of our unifying proposal is shown here using three stress-strength distributions that have been analysed individually in statistical literature. However, there are many distributions for which inferences about the reliability parameter could be easily obtained using the proposed family. Copyright © 2015 John Wiley & Sons, Ltd.
Wed, 27 May 2015 02:16:01 +0000

Multivariate spatial hierarchical Bayesian empirical likelihood methods for small area estimation
http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1002%2Fsta4.81
Recent advances in small area estimation incorporating both explicit spatial autocorrelation and empirical likelihood techniques have produced estimates with greater precision.
Furthermore, the multivariate Fay–Herriot models take advantage of within-location correlation between multiple outcomes for a set of small areas. We extend the Fay–Herriot model by utilizing empirical likelihood techniques to the spatially explicit multivariate setting. We then model the five-year period estimates from the American Community Survey (2006–10) of percent of unemployed individuals and percent of families in poverty for the counties of Missouri. We demonstrate bivariate reduction in leave-one-out median absolute deviation over an approximately equivalently specified parametric model. Copyright © 2015 John Wiley & Sons, Ltd.
Mon, 04 May 2015 22:50:51 +0000

A new weighted likelihood approach
http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1002%2Fsta4.80
In this paper, we propose a new weighted likelihood procedure. Here, the weights are suitably calibrated functions of appropriately described residuals at each data point. The residuals describe the match (or mismatch) between the empirical distribution function and the model distribution function. If the match is high, the observation is considered to be a regular observation. But for large (in magnitude) residuals, there is a mismatch, and the corresponding likelihood score function may require downweighting in order to obtain a robust solution. As there is little or no downweighting for observations where there is no evidence of mismatch, asymptotically, we expect that there will be no downweighting under the pure model leading to highly efficient estimators. On the other hand, properly calibrated weight functions that penalize the observations with large residuals will lead to highly robust solutions under model misspecification and the presence of outliers. Copyright © 2015 John Wiley & Sons, Ltd.
Tue, 21 Apr 2015 02:13:36 +0000

Visuanimation in statistics
http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1002%2Fsta4.77
This paper explores the use of visualization through animations, coined visuanimation, in the field of statistics. In particular, it illustrates the embedding of animations in the paper itself and the storage of larger movies in the online supplemental material. We present results from statistics research projects using a variety of visuanimations, ranging from exploratory data analysis of image data sets to spatio-temporal extreme event modelling; these include a multiscale analysis of classification methods, the study of the effects of a simulated explosive volcanic eruption and an emulation of climate model output. This paper serves as an illustration of visuanimation for future publications in Stat. Copyright © 2015 John Wiley & Sons, Ltd.
Tue, 14 Apr 2015 02:35:06 +0000

Optimal sample planning for system state analysis with partial data collection
http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1002%2Fsta4.79
We develop optimal and computationally practical procedures to minimize uncertainty concerning the presence of dangerous levels of a contaminant within a building when neither replication nor complete data collection is feasible. More generally, we address inference about the state of a finite system when the state is related to information collected over components of the system when only partial data collection is feasible. When there is no correlation between sample locations, a simple random sample or maximum a priori trait presence would provide optimal sampling choices. When complicated probability models describe trait manifestation, the need to collect only partial data precludes a full fitting of complicated models, and one must rely heavily on prior information naturally leading to a Bayesian approach.
Herein, we introduce a computationally efficient heuristic algorithm to simultaneously find optimal sample locations and decision rule parameterizations and then show that it drastically outperforms both random selection and maximum a priori methods. Copyright © 2015 John Wiley & Sons, Ltd.
Fri, 27 Mar 2015 04:04:11 +0000

On sparse representation for optimal individualized treatment selection with penalized outcome weighted learning
http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1002%2Fsta4.78
As a new strategy for treatment, which takes individual heterogeneity into consideration, personalized medicine is of growing interest. Discovering individualized treatment rules for patients who have heterogeneous responses to treatment is one of the important areas in developing personalized medicine. As more and more information per individual is being collected in clinical studies and not all of the information is relevant for treatment discovery, variable selection becomes increasingly important in discovering individualized treatment rules. In this article, we develop a variable selection method based on penalized outcome weighted learning through which an optimal treatment rule is considered as a classification problem where each subject is weighted proportional to his or her clinical outcome. We show that the resulting estimator of the treatment rule is consistent and establish variable selection consistency and the asymptotic distribution of the estimators. The performance of the proposed approach is demonstrated via simulation studies and an analysis of chronic depression data. Copyright © 2015 John Wiley & Sons, Ltd.
Fri, 06 Mar 2015 01:43:09 +0000

Non-parametric Bayes to infer playing strategies adopted in a population of mobile gamers
http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1002%2Fsta4.75
Analysis of trace logging data collections of interactions of a heterogeneous and diverse population of consumers of digital software with mobile devices provides unprecedented possibilities for understanding how software is actually used and for finding recurring patterns of software usage over the population that are exhibited to a greater or lesser degree in each individual software user. In this work, we consider an elementary mobile game played by a population of mobile gamers and collect pieces of game sessions over an extended period, resulting in a collection of users' trace logs for multiple sessions. We develop a simple, yet flexible, non-parametric Bayes approach to infer playing strategies adopted in the population from the logged traces of game interactions. We demonstrate that our approach finds interpretable strategies and provides good predictive performance compared with alternative modelling assumptions using a non-parametric Bayes framework. Copyright © 2015 John Wiley & Sons, Ltd.
Wed, 04 Mar 2015 03:59:11 +0000

Unbiased regression estimation under correlated linkage errors
http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1002%2Fsta4.76
Linkage errors can occur when probability-based methods are used to link records from two or more distinct data sets corresponding to the same target population. Recent research on allowing for these errors when carrying out regression analysis based on linked data assumes that the linkage errors are independent when more than two data sets are used to generate these data. In this paper, we extend these results to accommodate the more realistic scenario of dependent linkage errors.
Our simulation results show that an incorrect assumption of independent linkage errors can lead to insufficient linkage error bias correction, while an approach that allows for correlated linkage errors appears to overcome this problem. Copyright © 2015 John Wiley & Sons, Ltd.
Mon, 02 Mar 2015 06:49:03 +0000

Spanifold: spanning tree flattening onto lower dimension
http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1002%2Fsta4.74
Dimensionality reduction and manifold learning techniques attempt to recover a lower-dimensional submanifold from the data as encoded in high dimensions. Many techniques, linear or non-linear, have been introduced in the literature. Standard methods, such as Isomap and local linear embedding, map the high-dimensional data points into a low dimension so as to globally minimize a so-called energy function, which measures the mismatch between the precise geometry in high dimensions and the approximate geometry in low dimensions. However, the local effects of such minimizations are often unpredictable, because the energy minimization algorithms are global in nature. In contrast to these methods, the Spanifold algorithm of this paper constructs a tree on the manifold and flattens the manifold in such a way as to approximately preserve pairwise distance relationships within the tree. The vertices of this tree are the data points, and the edges of the tree form a subset of the edges of the nearest-neighbour graph on the data. In addition, the pairwise distances between data points close to the root of the tree undergo minimal distortion as the data are flattened. This allows the user to design the flattening algorithm so as to approximately preserve neighbour relationships in any chosen local region of the data. Copyright © 2015 John Wiley & Sons, Ltd.
Mon, 23 Feb 2015 04:25:55 +0000

Issue Information
http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1002%2Fsta4.63
No abstract is available for this article.
Mon, 16 Feb 2015 02:36:03 +0000

Correcting for non-ignorable missingness in smoking trends
http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1002%2Fsta4.73
Data missing not at random (MNAR) are a major challenge in survey sampling. We propose an approach based on registry data to deal with non-ignorable missingness in health examination surveys. The approach relies on follow-up data available from administrative registers several years after the survey. For illustration, we use data on smoking prevalence in the Finnish National FINRISK study conducted in 1972–97. The data consist of measured survey information including missingness indicators, register-based background information and register-based time-to-disease survival data. The parameters of the missingness mechanism are estimable with these data although the original survey data are MNAR. The underlying data generation process is modelled by a Bayesian model. The results indicate that the estimated smoking prevalence rates in Finland may be significantly affected by missing data. Copyright © 2015 John Wiley & Sons, Ltd.
Thu, 29 Jan 2015 22:52:50 +0000

Wiley-Blackwell Announces Launch of Stat – The ISI's Journal for the Rapid Dissemination of Statistics Research
http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1002%2Fsta4.1
Tue, 17 Apr 2012 04:34:14 +0000