The post Bayesian methods at Bletchley Park appeared first on All About Statistics.

From Nick Patterson’s interview on Talking Machines:

At GCHQ in the ’70s, we thought of ourselves as completely Bayesian statisticians. All our data analysis was completely Bayesian, and that was a direct inheritance from Alan Turing. I’m not sure this has ever really been published, but Turing, almost as a sideline during his cryptanalytic work, reinvented Bayesian statistics for himself. The work against Enigma and other German ciphers was fully Bayesian. …

Bayesian statistics was an extreme minority discipline in the ’70s. In academia, I only really know of two people who were working majorly in the field, Jimmy Savage … in the States and Dennis Lindley in Britain. And they were regarded as fringe figures in the statistics community. It’s extremely different now. The reason is that Bayesian statistics works. So eventually truth will out. There are many, many problems where Bayesian methods are obviously the right thing to do. But in the ’70s we understood that already in Britain in the classified environment.

**Please comment on the article here:** **Statistics – John D. Cook**


The post Multilevel modeling: What it can and cannot do appeared first on Statistical Modeling, Causal Inference, and Social Science.

The post Multilevel modeling: What it can and cannot do appeared first on All About Statistics.

Today’s post reminded me of this article from 2005:

We illustrate the strengths and limitations of multilevel modeling through an example of the prediction of home radon levels in U.S. counties. . . .

Compared with the two classical estimates (no pooling and complete pooling), the inferences from the multilevel models are more reasonable. . . . Although the specific assumptions of model (1) could be questioned or improved, it would be difficult to argue against the use of multilevel modeling for the purpose of estimating radon levels within counties. . . . Perhaps the clearest advantage of multilevel models comes in prediction. In our example we can predict the radon levels for new houses in an existing county or a new county. . . . We can use cross-validation to formally demonstrate the benefits of multilevel modeling. . . . The multilevel model gives more accurate predictions than the no-pooling and complete-pooling regressions, especially when predicting group averages.
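To make the pooling comparison concrete, here is a toy base-R sketch (simulated data, not the radon study; for simplicity the between- and within-group standard deviations are treated as known, and the weight formula is the standard precision-weighted shrinkage):

```r
set.seed(42)
## J groups with true means theta_j and unequal sample sizes
J <- 8; n_j <- sample(2:30, J, replace = TRUE)
tau <- 1      # between-group sd (assumed known here)
sigma <- 2    # within-group sd (assumed known here)
theta <- rnorm(J, 0, tau)
ybar  <- theta + rnorm(J, 0, sigma / sqrt(n_j))   # observed group means

no_pool   <- ybar                                 # each group on its own
comp_pool <- rep(weighted.mean(ybar, n_j), J)     # one estimate for everyone
w <- (n_j / sigma^2) / (n_j / sigma^2 + 1 / tau^2)
multilevel <- w * ybar + (1 - w) * comp_pool      # partial pooling: a compromise

round(cbind(n_j, w, no_pool, multilevel, comp_pool), 2)
```

Groups with few observations get small weights `w` and are shrunk hard toward the overall mean; that shrinkage is where the improved predictions of group averages come from.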

The most interesting part comes near the end of the three-page article:

We now consider our model as an observational study of the effect of basements on home radon levels. The study includes houses with and without basements throughout Minnesota. The proportion of homes with basements varies by county (see Fig. 1), but a regression model should address that lack of balance by estimating county and basement effects separately. . . . The new group-level coefficient γ2 is estimated at −.39 (with standard error .20), implying that, all other things being equal, counties with more basements tend to have lower baseline radon levels. For the radon problem, the county-level basement proportion is difficult to interpret directly as a predictor, and we consider it a proxy for underlying variables (e.g., the type of soil prevalent in the county).

This should serve as a warning:

In other settings, especially in social science, individual averages used as group-level predictors are often interpreted as “contextual effects.” For example, the presence of more basements in a county would somehow have a radon-lowering effect. This makes no sense here, but it serves as a warning that, with identical data of a social nature (e.g., consider substituting “income” for “radon level” and “ethnic minority” for “basement” in our study), it would be easy to leap to a misleading conclusion and find contextual effects where none necessarily exist. . . .

This is related to the problem in meta-analysis that between-study variation is typically observational even if individual studies are randomized experiments . . .

In summary:

One intriguing feature of multilevel models is their ability to separately estimate the predictive effects of an individual predictor and its group-level mean, which are sometimes interpreted as “direct” and “contextual” effects of the predictor. As we have illustrated in this article, these effects cannot necessarily be interpreted causally for observational data, even if these data are a random sample from the population of interest. Our analysis arose in a real research problem (Price et al. 1996) and is not a “trick” example. The houses in the study were sampled at random from Minnesota counties, and there were no problems of selection bias.

Read the whole thing.


**Please comment on the article here:** **Statistical Modeling, Causal Inference, and Social Science**


The post Adding a predictor can increase the residual variance! appeared first on Statistical Modeling, Causal Inference, and Social Science.

The post Adding a predictor can increase the residual variance! appeared first on All About Statistics.

Chao Zhang writes:

When I want to know the contribution of a predictor in a multilevel model, I often calculate how much of the total variance is reduced in the random effects by the added predictor. For example, if the between-group variance is 0.7 and the residual variance is 0.9 in the null model, and adding the predictor reduces the residual variance to 0.7, then VPC = (0.7 + 0.9 – 0.7 – 0.7) / (0.7 + 0.9) = 0.125. Then I assume that the new predictor explained 12.5% more of the total variance than the null model. I guess this is sometimes done by some researchers when they need a measure of sort of an effect size.

However, now I have a case in which adding a new predictor (X) greatly increased the between-group variance. After some inspection, I realized that this was because although X correlates with Y positively overall, it correlates with Y negatively within each group, and X and Y vary in the same direction regarding the grouping variable. In this situation, the VPC as computed above becomes negative! I am puzzled by this: how could the total variance increase? And this seems to invalidate the above method, at least in some situations.

My reply: this phenomenon is discussed in Section 21.7 of my book with Jennifer Hill. The section is entitled, “Adding a predictor can increase the residual variance!”

It’s great when I can answer a question so easily!
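Both the variance-accounting arithmetic and the puzzle itself are easy to reproduce in base R. The data below are simulated and the helper name `vpc_gain` is ours, not from the letter or the book:

```r
## 1) The variance-reduction arithmetic from the letter
vpc_gain <- function(tau2_0, sig2_0, tau2_1, sig2_1)
  ((tau2_0 + sig2_0) - (tau2_1 + sig2_1)) / (tau2_0 + sig2_0)
vpc_gain(0.7, 0.9, 0.7, 0.7)   # 0.125, as in the example

## 2) The phenomenon: x correlates with y positively overall,
##    but negatively within each group
set.seed(1)
G <- 10; n <- 50
g <- rep(1:G, each = n); z <- rnorm(G * n)
x <- g + z                                   # x tracks the group index...
y <- g - 0.5 * z + rnorm(G * n, sd = 0.5)    # ...but relates to y negatively within groups

cor(x, y)                                    # positive overall
b_w <- coef(lm(y ~ x + factor(g)))["x"]      # negative within groups

## group-level variance before and after adjusting y for x at the within slope
ybar <- tapply(y, g, mean); xbar <- tapply(x, g, mean)
var(ybar)                    # between-group variance, null model
var(ybar - b_w * xbar)       # larger: adding x inflates the group-level variance
```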


**Please comment on the article here:** **Statistical Modeling, Causal Inference, and Social Science**


The post Recently in the sister blog appeared first on Statistical Modeling, Causal Inference, and Social Science.

The post Recently in the sister blog appeared first on All About Statistics.

This research, “How ‘you’ makes meaning,” is 60 years in the making:

“You” is one of the most common words in the English language. Although it typically refers to the person addressed (“How are you?”), “you” is also used to make timeless statements about people in general (“You win some, you lose some.”). Here, we demonstrate that this ubiquitous but understudied linguistic device, known as “generic-you,” has important implications for how people derive meaning from experience. Across six experiments, we found that generic-you is used to express norms in both ordinary and emotional contexts and that producing generic-you when reflecting on negative experiences allows people to “normalize” their experience by extending it beyond the self. In this way, a simple linguistic device serves a powerful meaning-making function.


**Please comment on the article here:** **Statistical Modeling, Causal Inference, and Social Science**


The post Another simple Excel chart needs help appeared first on All About Statistics.

Twitter friend Jimmy A. asked if I could help Elon Musk make this chart "more readable".

Let's start with a couple of things he did right. Placing SpaceX, his firm's data, at the bottom of the chart is perfect, as the bottom part of a stacked column chart is the only part that is immediately readable. Combining all of Europe into one category and Other U.S. into another reduces the number of necessary colors.

Why is this chart unreadable? Here is a line-up of the culprits:

- Red Russia is stealing the thunder
- SpaceX is sharing the blues with Japan/China/Other U.S.
- The legend is sorted in the opposite order from the column segments (courtesy of Excel defaults)
- Axis labels are given to two decimal places for a market share split only a small number of ways
- It's unclear what "market share" means: is it share of the number of launches or the revenues generated by those launches? Is the "base" of the market share changing over time?
- The last two columns are speculative and these are the two years in which SpaceX has a noticeable advantage (unless they are talking about contracts already concluded)

According to the underlying data, there are some very big changes afoot. The following small-multiples chart shows what is going on:
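A small-multiples layout of this kind is easy to mock up in base R. The share numbers below are purely hypothetical placeholders, not the data behind the chart:

```r
## hypothetical market-share data (illustrative only)
years  <- 2012:2017
shares <- rbind(
  SpaceX = c(0.05, 0.10, 0.15, 0.20, 0.35, 0.45),
  Europe = c(0.50, 0.45, 0.40, 0.35, 0.30, 0.25),
  Russia = c(0.30, 0.30, 0.28, 0.25, 0.20, 0.15),
  Other  = c(0.15, 0.15, 0.17, 0.20, 0.15, 0.15))
stopifnot(all(abs(colSums(shares) - 1) < 1e-9))  # shares sum to 100% each year

## one panel per launcher group, shared scale, no stacking to decode
op <- par(mfrow = c(2, 2), mar = c(3, 3, 2, 1))
for (r in rownames(shares))
  plot(years, shares[r, ], type = "l", ylim = c(0, 0.6),
       main = r, xlab = "", ylab = "share")
par(op)
```

Each panel can be read off a common baseline, which avoids the stacked-column problem of only the bottom series being directly readable.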

**Please comment on the article here:** **Junk Charts**


The post Stippling and TSP art in R: emulating StippleGen appeared first on All About Statistics.

Stippling is the creation of a pattern simulating varying degrees of solidity or shading by using small dots (Wikipedia). StippleGen is a piece of software that renders images using stipple patterns, which I discovered on Xi’an’s blog a couple of days ago.

StippleGen uses an algorithm by Adrian Secord (described here) that turns out to be related to a problem in spatial statistics, specifically how to mess with high-order statistics of point processes while controlling density. The algorithm is a variant of k-means and is extremely easy to implement in R.
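Before the full imager-based implementation below, the k-means idea can be sketched in a few lines of base R: a toy weighted Lloyd iteration on a synthetic grayscale image, not Secord's exact algorithm (the name `stipple_sketch` is ours):

```r
## A toy weighted-Lloyd stippler. `img` is a grayscale matrix in [0, 1];
## dark pixels (values near 0) attract dots.
stipple_sketch <- function(img, nPoints = 200, nSteps = 20) {
  h <- nrow(img); w <- ncol(img)
  px  <- expand.grid(y = 1:h, x = 1:w)       # matches column-major as.vector(img)
  wgt <- (1 - as.vector(img)) + 1e-6         # darkness as weight
  idx <- sample(nrow(px), nPoints, replace = TRUE, prob = wgt)
  pts <- cbind(x = px$x[idx], y = px$y[idx])
  for (s in 1:nSteps) {
    ## assign every pixel to its nearest dot, then move each dot to the
    ## darkness-weighted centroid of its cell
    d2 <- outer(px$x, pts[, "x"], "-")^2 + outer(px$y, pts[, "y"], "-")^2
    cl <- max.col(-d2)
    sw <- tapply(wgt, cl, sum)
    j  <- as.integer(names(sw))
    pts[j, "x"] <- tapply(wgt * px$x, cl, sum) / sw
    pts[j, "y"] <- tapply(wgt * px$y, cl, sum) / sw
  }
  pts
}

## dark disc on a light background
set.seed(1)
img <- outer(1:40, 1:40, function(y, x) pmin(1, sqrt((x - 20)^2 + (y - 20)^2) / 18))
out <- stipple_sketch(img, nPoints = 150, nSteps = 10)
plot(out, pch = 19, cex = 0.5, ylim = c(40, 1), asp = 1)
```

The dots end up concentrated where the image is dark, which is the essence of the density-controlled point process the post describes.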

```
library(imager)
library(dplyr)
library(purrr)

stipple <- function(im, nPoints=1e3, gamma=2, nSteps=10) {
    dens <- (1-im)^gamma
    xy <- sample(nPix(im), nPoints, replace=TRUE, prob=dens) %>%
        coord.index(im, .) %>%
        select(x, y)
    for (ind in 1:nSteps) {
        xy <- cvt(xy, dens)
        plot(im)
        points(xy, col="red")
    }
    xy
}

plot.stipple <- function(im, out, cex=.25) {
    g <- imgradient(im, "xy") %>% map(~ interp(., out))
    plot(out, ylim=c(height(im), 1), cex=cex, pch=19,
         axes=FALSE, xlab="", ylab="")
}

## Compute Voronoi diagram of point set xy,
## and return center of mass of each cell (with density given by image im)
cvt <- function(xy, im) {
    voronoi(xy, width(im), height(im)) %>%
        as.data.frame %>%
        mutate(vim=c(im)) %>%
        group_by(value) %>%
        dplyr::summarise(x=weighted.mean(x, w=vim),
                         y=weighted.mean(y, w=vim)) %>%
        select(x, y) %>%
        filter(x %inr% c(1, width(im)), y %inr% c(1, height(im)))
}

## Compute Voronoi diagram for points xy over image of size (w,h)
## Uses a distance transform followed by watershed
voronoi <- function(xy, w, h) {
    v <- imfill(w, h)
    ind <- round(xy) %>% index.coord(v, .)
    v[ind] <- seq_along(ind)
    d <- distance_transform(v > 0, 1)
    watershed(v, -d, fill_lines=FALSE)
}

# image from original paper
im <- load.image("http://dahtah.github.io/imager/images/stippling_leaves.png")
out <- stipple(im, 1e4, nSteps=5, gamma=1.5)
plot.stipple(im, out)
```

TSP art is a variant where you solve a TSP problem to connect all the dots.

```
library(TSP)

## im is the original image (used only for its dimensions)
## out is the output of the stipple function (dot positions)
draw.tsp <- function(im, out) {
    tour <- out %>% ETSP %>% solve_TSP
    plot(out[tour, ], type="l", ylim=c(height(im), 1),
         axes=FALSE, xlab="", ylab="")
}

## Be careful, this is memory-heavy (also, slow)
out <- stipple(im, 4e3, gamma=1.5)
draw.tsp(im, out)
```

I’ve written a more detailed explanation on the imager website, with other variants like stippling with line segments, and a mosaic filter.

**Please comment on the article here:** **dahtah**


The post Difference operators as matrices appeared first on The DO Loop.

The post Difference operators as matrices appeared first on All About Statistics.

For a time series { y_{1}, y_{2}, ..., y_{N} }, the difference operator computes the difference between two observations. The *k*th-order difference is the series { y_{k+1} - y_{1}, ..., y_{N} - y_{N-k} }. In SAS, the DIF function in the SAS/IML language takes a column vector of values and returns a vector of differences.

For example, the following SAS/IML statements define a column vector that has five observations and calls the DIF function to compute the first-order differences between adjacent observations. By convention, the DIF function returns a vector that is the same size as the input vector and inserts a missing value in the first element.

```
proc iml;
x = {0, 0.1, 0.3, 0.7, 1};
dif = dif(x);       /* by default DIF(x, 1) ==> first-order differences */
print x dif;
```

The difference operator is a linear operator that can be represented by a matrix. The first nonmissing value of the difference is x[2]-x[1], followed by x[3]-x[2], and so forth. Thus the linear operator can be represented by the matrix that has -1 on the main diagonal and +1 on the super-diagonal (above the diagonal). An efficient way to construct the difference operator is to start with the zero matrix and insert ±1 on the diagonal and super-diagonal elements. You can use the DO function to construct the indices for the diagonal and super-diagonal elements in a matrix:

```
start DifOp(dim);
   D = j(dim-1, dim, 0);          /* allocate zero matrix */
   n = nrow(D);  m = ncol(D);
   diagIdx  = do(1, n*m, m+1);    /* index diagonal elements */
   superIdx = do(2, n*m, m+1);    /* index superdiagonal elements */
   *subIdx  = do(m+1, n*m, m+1);  /* index subdiagonal elements (optional) */
   D[diagIdx]  = -1;              /* assign -1 to diagonal elements */
   D[superIdx] =  1;              /* assign +1 to super-diagonal elements */
   return D;
finish;

B = DifOp(nrow(x));
d = B*x;
print B, d[L="Difference"];
```

You can see that the DifOp function constructs an (n-1) x n matrix, which is the correct dimension for transforming an n-dimensional vector into an (n-1)-dimensional vector. Notice that the matrix multiplication omits the element that previously held a missing value.

You probably would not use a matrix multiplication in place of the DIF function if you needed the first-order difference for a single time series. However, the matrix formulation makes it possible to use one matrix multiplication to find the difference for many time series.

The following matrix contains three time-series, one in each column. The B matrix computes the first-order difference for all columns by using a single matrix-matrix multiplication. The same SAS/IML code is valid whether the X matrix has three columns or three million columns.

```
/* The matrix can operate on a matrix where each column is a time series */
x = {0    0    0,
     0.1  0.2  0.3,
     0.3  0.8  0.5,
     0.7  0.9  0.8,
     1    1    1 };
B = DifOp(nrow(x));
d = B*x;       /* apply the difference operator */
print d[L="Difference of Columns"];
```

Other operators in time series analysis can also be represented by matrices. For example, the first-order lag operator is represented by a matrix that has +1 on the super-diagonal. Moving average operators also have matrix representations.

The matrix formulation is efficient for short time series but is not efficient for a time series that contains thousands of elements. If the time series contains n elements, then the dense-matrix representation of the difference operator contains about n² elements, which consumes a lot of RAM when n is large. However, as we have seen, the matrix representation of an operator is advantageous when you want to operate on a large number of short time series, as might arise in a simulation.
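The same construction is easy to sketch outside SAS. Here is a base-R version (the helper name `dif_op` is ours), checked against the built-in `diff()`:

```r
## Build the (n-1) x n first-difference operator as a dense matrix:
## -1 on the main diagonal, +1 on the super-diagonal
dif_op <- function(n) {
  D <- matrix(0, n - 1, n)
  D[cbind(1:(n - 1), 1:(n - 1))] <- -1
  D[cbind(1:(n - 1), 2:n)]       <-  1
  D
}

x <- c(0, 0.1, 0.3, 0.7, 1)
B <- dif_op(length(x))
all.equal(drop(B %*% x), diff(x))   # TRUE: matches base R's diff()

## one multiplication differences every column of a multi-series matrix
X <- cbind(x, x^2)
all.equal(B %*% X, apply(X, 2, diff), check.attributes = FALSE)
```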


**Please comment on the article here:** **The DO Loop**


The post On the Origin of "Frequentist" Statistics appeared first on All About Statistics.

Efron and Hastie note that the "frequentist" term "seems to have been suggested by Neyman as a statistical analogue of Richard von Mises' frequentist theory of probability, the connection being made explicit in his 1977 paper, 'Frequentist Probability and Frequentist Statistics'". It strikes me that I may have always subconsciously assumed that the term originated with one or another Bayesian, in an attempt to steer toward something more neutral than "classical", which could be interpreted as "canonical" or "foundational" or "the first and best". Quite fascinating that the ultimate "classical" statistician, Neyman, seems to have initiated the switch to "frequentist".

**Please comment on the article here:** **No Hesitations**


The post Applying human factors research to statistical graphics appeared first on Statistical Modeling, Causal Inference, and Social Science.

The post Applying human factors research to statistical graphics appeared first on All About Statistics.

John Rauser writes:

I’ve been a reader of yours (books, papers and the blog) for a long time, and it occurred to me today that I might be able to give something back to you.

I recently wrote a talk (https://www.youtube.com/watch?v=fSgEeI2Xpdc) about human factors research applied to making statistical graphics. I mainly cover material from Cleveland, but also fold in ideas from Gestalt psychology. My main contribution is to apply these ideas to commonly seen real world graphics and draw out observations and recommendations.

I thought the talk might be decent fodder for your course on statistical graphics and communication, or perhaps for the blog. Either way, thanks so much for everything you’ve contributed to the community.

Trips to Cleveland, huh? I don’t have the patience to watch videos but I’ll post here and we can see what people think.


**Please comment on the article here:** **Statistical Modeling, Causal Inference, and Social Science**


The post Tutorial: Using seplyr to Program Over dplyr appeared first on All About Statistics.

`seplyr` is an `R` package that makes it easy to program over `dplyr` `0.7.*`.

To illustrate this we will work an example.

Suppose you had worked out a `dplyr` pipeline that performed an analysis you were interested in. For an example we could take something similar to one of the examples from the `dplyr` `0.7.0` announcement.

```
suppressPackageStartupMessages(library("dplyr"))
packageVersion("dplyr")
```

`## [1] '0.7.2'`

`cat(colnames(starwars), sep='\n')`

```
## name
## height
## mass
## hair_color
## skin_color
## eye_color
## birth_year
## gender
## homeworld
## species
## films
## vehicles
## starships
```

```
starwars %>%
  group_by(homeworld) %>%
  summarise(mean_height = mean(height, na.rm = TRUE),
            mean_mass   = mean(mass, na.rm = TRUE),
            count       = n())
```

```
## # A tibble: 49 x 4
## homeworld mean_height mean_mass count
## <chr> <dbl> <dbl> <int>
## 1 Alderaan 176.3333 64.0 3
## 2 Aleen Minor 79.0000 15.0 1
## 3 Bespin 175.0000 79.0 1
## 4 Bestine IV 180.0000 110.0 1
## 5 Cato Neimoidia 191.0000 90.0 1
## 6 Cerea 198.0000 82.0 1
## 7 Champala 196.0000 NaN 1
## 8 Chandrila 150.0000 NaN 1
## 9 Concord Dawn 183.0000 79.0 1
## 10 Corellia 175.0000 78.5 2
## # ... with 39 more rows
```

The above is colloquially called "an interactive script." The name comes from the fact that we use names of variables (such as "`homeworld`") that would only be known from looking at the data directly in the analysis code. Only somebody interacting with the data could write such a script (hence the name).

It has long been considered a point of discomfort to convert such an interactive `dplyr` pipeline into a re-usable script or function, that is, a script or function that specifies column names in some parametric or re-usable fashion. Roughly, it means the names of the data columns are not yet known when we are writing the code (and this is what makes the code re-usable).

This inessential (or conquerable) difficulty is largely due to the preference for non-standard evaluation interfaces (that is, interfaces that capture and inspect un-evaluated expressions from their calling interface) in the design of `dplyr`.

`seplyr` is a `dplyr` adapter layer that prefers "slightly clunkier" standard interfaces (or referentially transparent interfaces), which are actually very powerful and can be used to some advantage.

The above description and comparisons can come off as needlessly broad and painfully abstract. Things are much clearer if we move away from theory and return to our practical example.

Let’s translate the above example into a re-usable function in small (easy) stages. First, translate the interactive script from `dplyr` notation into `seplyr` notation. This step is a pure re-factoring: we are changing the code without changing its observable external behavior.

The translation is mechanical in that it is mostly using `seplyr` documentation as a lookup table. What you have to do is:

- Change `dplyr` verbs to their matching `seplyr` "`*_se()`" adapters.
- Add quote marks around names and expressions.
- Convert sequences of expressions (such as in the `summarize()`) to explicit vectors by adding the "`c()`" notation.
- Replace "`=`" in expressions with "`:=`".

Our converted code looks like the following.

```
library("seplyr")

starwars %>%
  group_by_se("homeworld") %>%
  summarize_se(c("mean_height" := "mean(height, na.rm = TRUE)",
                 "mean_mass"   := "mean(mass, na.rm = TRUE)",
                 "count"       := "n()"))
```

```
## # A tibble: 49 x 4
## homeworld mean_height mean_mass count
## <chr> <dbl> <dbl> <int>
## 1 Alderaan 176.3333 64.0 3
## 2 Aleen Minor 79.0000 15.0 1
## 3 Bespin 175.0000 79.0 1
## 4 Bestine IV 180.0000 110.0 1
## 5 Cato Neimoidia 191.0000 90.0 1
## 6 Cerea 198.0000 82.0 1
## 7 Champala 196.0000 NaN 1
## 8 Chandrila 150.0000 NaN 1
## 9 Concord Dawn 183.0000 79.0 1
## 10 Corellia 175.0000 78.5 2
## # ... with 39 more rows
```

This code works the same as the original `dplyr` code. Obviously, at this point all we have done is make the code a bit less pleasant looking. We have yet to see any benefit from this conversion (though we can turn this on its head and say all the original `dplyr` notation saves us is having to write a few quote marks).

The benefit is: this new code can *very easily* be parameterized and wrapped in a re-usable function. In fact it is now simpler to do than to describe.

For example: suppose (as in the original example) we want to create a function that lets us choose the grouping variable. This is now easy: we copy the code into a function and replace the explicit value `"homeworld"` with a variable:

```
starwars_mean <- function(my_var) {
  starwars %>%
    group_by_se(my_var) %>%
    summarize_se(c("mean_height" := "mean(height, na.rm = TRUE)",
                   "mean_mass"   := "mean(mass, na.rm = TRUE)",
                   "count"       := "n()"))
}

starwars_mean("hair_color")
```

```
## # A tibble: 13 x 4
## hair_color mean_height mean_mass count
## <chr> <dbl> <dbl> <int>
## 1 auburn 150.0000 NaN 1
## 2 auburn, grey 180.0000 NaN 1
## 3 auburn, white 182.0000 77.00000 1
## 4 black 174.3333 73.05714 13
## 5 blond 176.6667 80.50000 3
## 6 blonde 168.0000 55.00000 1
## 7 brown 175.2667 79.27273 18
## 8 brown, grey 178.0000 120.00000 1
## 9 grey 170.0000 75.00000 1
## 10 none 180.8889 78.51852 37
## 11 unknown NaN NaN 1
## 12 white 156.0000 59.66667 4
## 13 <NA> 141.6000 314.20000 5
```

In `seplyr`, programming is easy (just replace values with variables). For example, we can make a completely generic, re-usable "grouped mean" function using `R`’s `paste()` function to build up expressions.

```
grouped_mean <- function(data,
                         grouping_variables,
                         value_variables) {
  result_names <- paste0("mean_", value_variables)
  expressions <- paste0("mean(", value_variables, ", na.rm = TRUE)")
  calculation <- result_names := expressions
  print(as.list(calculation)) # print for demo
  data %>%
    group_by_se(grouping_variables) %>%
    summarize_se(c(calculation,
                   "count" := "n()"))
}

starwars %>%
  grouped_mean(grouping_variables = "eye_color",
               value_variables = c("mass", "birth_year"))
```

```
## $mean_mass
## [1] "mean(mass, na.rm = TRUE)"
##
## $mean_birth_year
## [1] "mean(birth_year, na.rm = TRUE)"
## # A tibble: 15 x 4
## eye_color mean_mass mean_birth_year count
## <chr> <dbl> <dbl> <int>
## 1 black 76.28571 33.00000 10
## 2 blue 86.51667 67.06923 19
## 3 blue-gray 77.00000 57.00000 1
## 4 brown 66.09231 108.96429 21
## 5 dark NaN NaN 1
## 6 gold NaN NaN 1
## 7 green, yellow 159.00000 NaN 1
## 8 hazel 66.00000 34.50000 3
## 9 orange 282.33333 231.00000 8
## 10 pink NaN NaN 1
## 11 red 81.40000 33.66667 5
## 12 red, blue NaN NaN 1
## 13 unknown 31.50000 NaN 3
## 14 white 48.00000 NaN 1
## 15 yellow 81.11111 76.38000 11
```

The only part that requires more study and practice is messing around with the expressions using `paste()` (for more details on the string manipulation please try "`help(paste)`"). Notice also we used the "`:=`" operator to bind the list of desired result names to the matching calculations (please see "`help(named_map_builder)`" for more details).

The point is: we did not have to bring in (or study) any deep-theory or heavy-weight tools such as `rlang`/`tidyeval` or `lazyeval` to complete our programming task. Once you are in `seplyr` notation, changes are very easy. You can separate translating into `seplyr` notation from the work of designing your wrapper function (breaking your programming work into smaller, easier-to-understand steps).

The `seplyr` method is simple, easy to teach, and powerful. The package contains a number of worked examples in both its `help()` and `vignette(package='seplyr')` documentation.

**Please comment on the article here:** **Statistics – Win-Vector Blog**

