(This article was first published on **R – R-BAR**, and kindly contributed to R-bloggers)

In this post, I will show you how some very basic R code can be used to estimate the quality control constants needed to construct X-Individuals, X-Bar, and R-Bar charts. The value of this approach is that it gives you a mechanical sense of where these constants come from, and some reinforcement on their application.

If you work in a production or quality control environment, chances are you've made or seen a control chart. If you're new to control charting or need a refresher, check out *Understanding Statistical Process Control* by Wheeler et al. If you want to dive in and start making control charts with R, check out the available R packages.

If you're familiar with control charts, you've likely encountered cryptic alphanumeric constants like d2, A2, E2, d3, D3, and asked,

“What are they and where do they come from?”

**Short (not so satisfying) answer**: They are constants you plug into formulas to determine your control limits. Their values depend on how many samples you take at a time and the type of chart you are making. For example, if you measure the size of **5** widgets per lot of 50, then your subgroup size, n, is 5 and you should be using the set of control chart constants for n = 5.

So where do they come from and how are they calculated?

Read on.

Often, control charts represent variability in terms of the mean range, R, observed over several subgroups rather than the mean standard deviation. The table below should make the ideas of subgroup range and mean range clearer.
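Since the original table doesn't render here, a small sketch (hypothetical measurements, not the author's data) shows what subgroup range and mean range look like:

```r
# Five subgroups of n = 2 hypothetical widget measurements.
# Each subgroup range is max - min; R-bar is the mean of those ranges.
set.seed(1)
subgroups <- matrix(rnorm(10, mean = 10, sd = 1), ncol = 2, byrow = TRUE)
ranges <- apply(subgroups, 1, function(g) max(g) - min(g))
data.frame(subgroup = 1:5, range = round(ranges, 3))
mean(ranges) # R-bar, the mean subgroup range
```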

Why range? My guess is that, historically, employees at all levels would have understood the concept of range. Range requires no special computation, just (max − min). Speculation aside, we begin our quest to understand where control constants come from with the relationship shown in **Eq. 1**: the mean subgroup range is proportional to the standard deviation of the individual values.

The proportionality constant between R(X_{Sub_Grp_Indv}) and S(X_{indv}) is d_{2}, the first constant we'll be estimating. The relationship is expressed in **Eq. 2**: R(X_{Sub_Grp_Indv}) = d_{2} · S(X_{indv}).

To estimate d_{2} for n = 2 (i.e., a subgroup size of 2), we start by drawing two samples from a normal distribution with mean = 0 and sd = 1. Why, you ask? Because it makes the math really simple. Consider **Eq. 2**: if S(X_{indv}) = 1, then R(X_{Sub_Grp_Indv}) = d_{2}.

So all we need to do to determine d_{2} when n = 2 is:

- Draw 2 individuals from the normal distribution.
- Determine the range of the 2 samples.
- Repeat many, many times.
- Take the average of all the ranges you've calculated, R.
- R = d_{2} (when the mean = 0 and sd = 1).

The R code for the process is shown below.

```r
require(magrittr) # Bring in the pipe function
reps <- 1E6
set.seed(5555)
replicate(reps,
  rnorm(n = 2, mean = 0, sd = 1) %>% # Draw two from the normal distribution
    range() %>%                      # Determine the range vector = (min, max)
    diff() %>%                       # Determine the difference of the range vector
    abs()                            # Take the absolute value so the result is positive
) %>%                                # Replicate the above procedure 1,000,000 times
  mean() -> R_BAR -> d2              # Take the mean of the 1,000,000 ranges
d2
```

```
## [1] 1.12804
```

The pipes make the above code easy to read but slow things down quite a bit. The following code does the same thing about 12.5 times faster.

```r
reps <- 1E6
set.seed(5555)
d2 <- R_BAR <- mean(replicate(reps, abs(diff(range(rnorm(2))))))
d2
```

```
## [1] 1.12804
```

Once you have d_{2}, calculating E2 (3σ for the individuals) and A2 (3σ for the subgroup means) is straightforward, as shown in **Eq. 3 – Eq. 6**: E2 = 3/d_{2} and A2 = 3/(d_{2}·√n). A2 and E2 are the coefficients to the left of R.

The code below gives the expected results for all the control constants needed to construct X-Bar and X-Individuals charts.

```r
c(N = 2, d2 = d2, E2 = 3/d2, A2 = 3/(d2*sqrt(2)))
```

```
##        N       d2       E2       A2 
## 2.000000 1.128040 2.659480 1.880536
```

The constants for R charts are d_{3} (1σ around R), D3 (lower 3σ limit of R), and D4 (upper 3σ limit of R). To get these constants, we start with the assumption that the standard deviation of R is proportional to the standard deviation of the individual X's. The proportionality constant is d_{3}, shown in **Eq. 7**: S(R_{Sub_Grp_Indv}) = d_{3} · S(X_{indv}). Notice that Eq. 7 has the same form as Eq. 2.

```r
reps <- 1E6
set.seed(5555)
d3 <- sd(replicate(reps, abs(diff(range(rnorm(2))))))
d3
```

```
## [1] 0.8529419
```

Notice in the R code above that the only difference between the calculation of d_{3} and d_{2} is that we take the **standard deviation** rather than the **mean** of the R_{Sub_Grp_Indv}. Now we have d_{3}, but we need to do a little simple algebra to express S(R_{Sub_Grp_Indv}) in terms of R. Remember, historically the employee doesn't need to worry about standard deviation – just ranges. We can define the above expression in terms of R by combining **Eq. 2** and **Eq. 7**, yielding **Eq. 8**: S(R_{Sub_Grp_Indv}) = (d_{3}/d_{2}) · R.

OK, almost to D3 and D4. Using Eq. 8, the lower 3σ limit of R can be expressed as R − 3·(d_{3}/d_{2})·R.

Factoring out the R terms on the right-hand side of the expression yields R·(1 − 3·d_{3}/d_{2}).

The expression inside the parentheses is D3. D4, the upper limit of R, is evaluated analogously; the only difference is a "+" sign in the final expression. The final expressions are D3 = 1 − 3·d_{3}/d_{2} and D4 = 1 + 3·d_{3}/d_{2}.

All done! Here is the R code summarizing the constants for R using n=2.

```r
c(N = 2,
  d3 = d3,
  D3 = ifelse(1 - 3*d3/d2 < 0, 0, 1 - 3*d3/d2),
  D4 = 1 + 3*d3/d2
)
```

```
##         N        d3        D3        D4 
## 2.0000000 0.8529419 0.0000000 3.2683817
```

Notice that the value for D3 is 0; this is because the calculated value was negative. Such values are rounded up to zero per the R code above.
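To see how these constants get used in practice, here is a hedged sketch (hypothetical numbers, not from the post) of plugging the n = 2 values into the usual control-limit formulas:

```r
# X-bar chart limits: X-double-bar +/- A2 * R-bar
# R chart limits:     D3 * R-bar  and  D4 * R-bar
X_bar_bar <- 10.0  # hypothetical grand mean of the subgroup means
R_bar     <- 1.5   # hypothetical mean subgroup range
A2 <- 1.880; D3 <- 0; D4 <- 3.268  # constants for n = 2, from above
c(xbar_LCL = X_bar_bar - A2 * R_bar,
  xbar_UCL = X_bar_bar + A2 * R_bar,
  R_LCL    = D3 * R_bar,
  R_UCL    = D4 * R_bar)
```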

In this post, we used R to estimate the control chart constants needed to produce X-Individuals, X-Bar, and R-Bar charts. All the constants together are shown below. In addition, the constants for n = 7 are also presented.

```r
reps <- 1E6
set.seed(5555)

FUN_d2 <- function(x) {mean(replicate(reps, abs(diff(range(rnorm(x))))))}
FUN_d3 <- function(x) {sd(replicate(reps, abs(diff(range(rnorm(x))))))}

Ns <- c(2, 7)
d2 <- sapply(Ns, FUN_d2)
d3 <- sapply(Ns, FUN_d3)

round(data.frame(
  N = Ns,
  d2 = d2,
  E2 = 3/d2,
  A2 = 3/(d2*sqrt(Ns)),
  d3 = d3,
  D3 = ifelse(1 - 3*d3/d2 < 0, 0, 1 - 3*d3/d2),
  D4 = 1 + 3*d3/d2
), digits = 3)
```

```
##   N    d2    E2    A2    d3    D3    D4
## 1 2 1.128 2.659 1.881 0.854 0.000 3.272
## 2 7 2.704 1.109 0.419 0.833 0.076 1.924
```

To **leave a comment** for the author, please follow the link and comment on their blog: **R – R-BAR**.


(This article was first published on **Shirin's playgRound**, and kindly contributed to R-bloggers)

During my stay in London for the m3 conference, I also gave a talk at the R-Ladies London Meetup on Tuesday, October 16th, about one of my favorite topics: Interpretable Deep Learning with R, Keras and LIME.

Keras is a high-level open-source deep learning framework that by default works on top of TensorFlow. Keras is minimalistic, efficient and highly flexible because it works with a modular layer system to define, compile and fit neural networks. It has been written in Python but can also be used from within R. Because the underlying backend can be changed from TensorFlow to Theano and CNTK (with more options being developed right now) it is designed to be framework-independent. Models can be trained on CPU or GPU, locally or in the cloud. Shirin will show an example of how to build an image classifier with Keras. We’ll be using a convolutional neural net to classify fruits in images. But that’s not all! We not only want to judge our black-box model based on accuracy and loss measures – we want to get a better understanding of how the model works. We will use an algorithm called LIME (local interpretable model-agnostic explanations) to find out what part of the different test images contributed most strongly to the classification that was made by our model. Shirin will introduce LIME and explain how it works. And finally, she will show how to apply LIME to the image classifier we built before, as well as to a pre-trained Imagenet model.

To **leave a comment** for the author, please follow the link and comment on their blog: **Shirin's playgRound**.


(This article was first published on **R – Amy Whitehead's Research**, and kindly contributed to R-bloggers)

It’s been pretty quiet over here for a long time. But I have been busy beavering away at many interesting projects at NIWA, including a project where we developed a new method for identifying when your regression model is starting … Continue reading

To **leave a comment** for the author, please follow the link and comment on their blog: **R – Amy Whitehead's Research**.


(This article was first published on **R – Insights of a PhD**, and kindly contributed to R-bloggers)

I came across this post, which gives a method to estimate Pi using a circle, its circumscribed square, and (lots of) random points within said square. Booth used Stata to estimate Pi, but here's some R code to do the same thing…

```r
x <- 0.5  # center x
y <- 0.5  # center y
n <- 1000 # nr of pts
r <- 0.5  # radius

pts <- seq(0, 2 * pi, length.out = n)
plot(sin(pts), cos(pts), type = 'l', asp = 1) # test

require(sp)
xy <- cbind(x + r * sin(pts), y + r * cos(pts))
sl <- SpatialPolygons(list(Polygons(list(Polygon(xy)), "polygon")))
plot(sl, add = FALSE, col = 'red', axes = TRUE)

# the square
xy <- cbind(c(0, 1, 1, 0), c(0, 0, 1, 1))
sq <- SpatialPolygons(list(Polygons(list(Polygon(xy)), "polygon")))
plot(sq, add = TRUE)

N <- 1e6
x <- runif(N, 0, 1)
y <- runif(N, 0, 1)
sp <- SpatialPoints(cbind(x, y))
plot(sp, add = TRUE, col = "green")

require(rgeos)
(sim_pi <- (sum(gIntersects(sp, sl, byid = TRUE)) / N) * 4)
sim_pi - pi
```

Note the use of sp and rgeos packages to calculate the intersections.
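If you don't need the spatial machinery, the same estimate can be sketched in base R alone by testing each point's distance from the circle's centre (this is my simplification, not the method from the original post):

```r
# A point (x, y) lies inside the circle of radius 0.5 centred at (0.5, 0.5)
# when (x - 0.5)^2 + (y - 0.5)^2 <= 0.5^2.
set.seed(42)
N <- 1e6
x <- runif(N)
y <- runif(N)
inside <- (x - 0.5)^2 + (y - 0.5)^2 <= 0.25
sim_pi <- 4 * mean(inside) # the circle/square area ratio is pi/4
sim_pi - pi
```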

To **leave a comment** for the author, please follow the link and comment on their blog: **R – Insights of a PhD**.


(This article was first published on **R – Win-Vector Blog**, and kindly contributed to R-bloggers)

In August of 2003 Thomas Lumley added `bquote()` to `R` 1.8.1. This gave `R` and `R` users an explicit Lisp-style quasiquotation capability. `bquote()` and quasiquotation are actually quite powerful. Professor Thomas Lumley should get, and should continue to receive, a lot of credit and thanks for introducing the concept into `R`.
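As a minimal illustration (my example, not from the original post) of what `bquote()` does: `.(x)` marks a value for substitution into an otherwise-quoted expression, which can then be evaluated with `eval()`:

```r
col <- as.name("speed")      # the column we want to average
expr <- bquote(mean(.(col))) # builds the unevaluated call mean(speed)
print(expr)
eval(expr, envir = cars)     # evaluates against the built-in cars data
```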

In fact `bquote()` is already powerful enough to build a version of `dplyr 0.5.0` with quasiquotation semantics quite close (from a user perspective) to what is now claimed in `tidyeval`/`rlang`.

Let’s take a look at that.

For our example problem: suppose we wanted to average a ratio of columns by groups, but want to write the code so that which columns are to be used is specified elsewhere (perhaps coming in as function arguments). You would write code such as the following in `dplyr 0.7.*`:

```
suppressPackageStartupMessages(library("dplyr"))
# define our parameters
# pretend these come from far away
# or as function arguments.
group_nm <- as.name("am")
num_nm <- as.name("hp")
den_nm <- as.name("cyl")
derived_nm <- as.name(paste0(num_nm, "_per_", den_nm))
mean_nm <- as.name(paste0("mean_", derived_nm))
count_nm <- as.name("group_count")
```

And then you would execute something like the following. We are not running this block (so there is no result), as we have `dplyr 0.5.0` installed, which does not yet accept the quasiquotation notation.

```
# apply a parameterized pipeline using rlang::!!
mtcars %>%
  group_by(!!group_nm) %>%
  mutate(!!derived_nm := !!num_nm/!!den_nm) %>%
  summarize(
    !!mean_nm := mean(!!derived_nm),
    !!count_nm := n()
  ) %>%
  ungroup() %>%
  arrange(!!group_nm)
```

Writing the above code requires two ideas:

- Quasiquotation (marking some symbols for substitution, in this case using `!!` as the marker). As we said, quasiquotation has been available in `R` since 2003.
- Using `:=` as a substitute for `=` to allow substitution marks on the left-hand sides of assignment-like expressions without triggering syntax errors. `:=` is an unused left-over assignment operator that has low precedence (like assignments do), but not all the syntactic restrictions of `=`. This is an idea used in `data.table` for quite some time, and it is actually first visible in `dplyr` when used as part of `dplyr`'s `data.table` adapter in May 2013.

`dplyr 0.5.0` (released June 2016) did not yet incorporate quasiquotation, so we can not run the above code in `dplyr 0.5.0` unless we take the steps of:

- Replacing the `!!x` substitution markings with `bquote()`'s `.(x)` notation.
- Changing the `:=` back to `=`.
- Placing the whole pipeline in an `eval(bquote())` block.
- Not directly substituting symbols on the left-hand sides of expressions.

This looks like the following.

```
# apply a partially parameterized pipeline using bquote
eval(bquote(
  mtcars %>%
    group_by(.(group_nm)) %>%
    mutate(TMP1 = .(num_nm)/.(den_nm)) %>%
    rename_(.dots = setNames("TMP1", as.character(derived_nm))) %>%
    summarize(
      TMP2 = mean(.(derived_nm)),
      TMP3 = n()
    ) %>%
    ungroup() %>%
    rename_(.dots = setNames("TMP2", as.character(mean_nm))) %>%
    rename_(.dots = setNames("TMP3", as.character(count_nm))) %>%
    arrange(.(group_nm))
))
```

```
## # A tibble: 2 x 3
##      am mean_hp_per_cyl group_count
##   <dbl>           <dbl>       <int>
## 1     0            22.7          19
## 2     1            23.4          13
```

That nearly solved the problem (notice we were not able to control the left-hand sides of assignment-like expressions). In December of 2016 we publicly announced and released the `let()` adapter that addressed almost all of these concerns at the user level (deliberately not modifying `dplyr` code). The substitutions are specified by name (instead of by position), and the code must still be executed in a `let()` block to get the desired control.

This looks like the following.

```
alias <-
  c(group_nm = as.character(group_nm),
    num_nm = as.character(num_nm),
    den_nm = as.character(den_nm),
    derived_nm = as.character(derived_nm),
    mean_nm = as.character(mean_nm),
    count_nm = as.character(count_nm))

wrapr::let(alias,
  mtcars %>%
    group_by(group_nm) %>%
    mutate(derived_nm = num_nm/den_nm) %>%
    summarize(
      mean_nm = mean(derived_nm),
      count_nm = n()
    ) %>%
    ungroup() %>%
    arrange(group_nm)
)
```

```
## # A tibble: 2 x 3
##      am mean_hp_per_cyl group_count
##   <dbl>           <dbl>       <int>
## 1     0            22.7          19
## 2     1            23.4          13
```

We felt this was the least intrusive way to give `dplyr` users useful macro substitution ability over `dplyr` pipelines. So by December of 2016 users had public access to a complete solution. More on `wrapr::let()` can be found here.

In May of 2017 `rlang` was publicly announced and released. `dplyr` adapted the new `rlang`-based quasiquotation ability directly into many of its functions/methods. This means the user does not need to draw a substitution block around their code, as the package authors have written the block. At first glance this is something only the `dplyr` package authors could do (as they control the code), but it turns out one can also easily adapt (or wrap) existing code in `R`.

A very small effort (about 25 lines of code, now found in `bquote_call()`) is needed to specify argument substitution semantics using `bquote()`. The bulk of that code is adding the top-level `:=` to `=` substitution. Then it takes only about 10 lines of code to write a function that re-maps existing `dplyr` functions to new functions using the `bquote()` adapter `bquote_function()`. And a package author could choose to directly integrate `bquote()` transforms at even smaller cost.

Let's convert `dplyr 0.5.0` to `bquote()`-based quasiquotation. We can do that by running our adapter over the names of 29 common `dplyr` methods, which places `bquote()`-enhanced versions in our running environment. The code to do this is below.

```
# wrap some dplyr methods
nms <- c("all_equal", "anti_join", "arrange",
         "as.tbl", "bind_cols", "bind_rows",
         "collect", "compute", "copy_to", "count",
         "distinct", "filter", "full_join",
         "group_by", "group_indices",
         "inner_join", "is.tbl", "left_join",
         "mutate", "rename", "right_join",
         "select", "semi_join", "summarise",
         "summarize", "tally", "tbl", "transmute",
         "ungroup")
env <- environment()
for(fname in nms) {
  fn <- getFromNamespace(fname, ns = "dplyr")
  assign(fname,
         # requires wrapr 1.6.4 or newer
         wrapr::bquote_function(fn),
         envir = env)
}
```

With these adaptations in place we can write partially substituted or parametric code as follows.

```
packageVersion("dplyr")
```

```
## [1] '0.5.0'
```

```
# apply a parameterized pipeline using bquote
mtcars %>%
  group_by(.(group_nm)) %>%
  mutate(.(derived_nm) := .(num_nm)/.(den_nm)) %>%
  summarize(
    .(mean_nm) := mean(.(derived_nm)),
    .(count_nm) := n()
  ) %>%
  ungroup() %>%
  arrange(.(group_nm))
```

```
## # A tibble: 2 x 3
##      am mean_hp_per_cyl group_count
##   <dbl>           <dbl>       <int>
## 1     0            22.7          19
## 2     1            23.4          13
```

This pretty much demonstrates that `bquote()` was up to the job the whole time. All one had to do was decide to incorporate it into the source code (something only the package author could do) or adapt the functions (a bit invasive, but something any user can do).

The user-visible difference is that `bquote()` uses `.(x)` to mark substitution of `x`. `rlang` currently uses `!!x` to mark substitution of `x` (syntactically reminiscent of Lisp's back-tick). Earlier versions of `rlang` used `UQ(x)`, `uq(x)`, and even `((x))` (signs of this can be found here). Other famous substitution notations include `%x` (from `C`/`printf`), `{}` (from Python PEP 498 – Literal String Interpolation, and used in `glue`), `$` and `${}` (from bash), and many more.

There are some minor technical differences between the `bquote()` and `rlang` solutions, but one can write corner-case code that fails either method at will. We are not discussing the `rlang` `!!!` "splice" syntax, as the *need* for it is avoidable by avoiding over-use of "`...`" tricks. For actual user work both systems are going to look very similar.

Overall we are discussing avoiding interfaces that are convenient for interactive work, yet difficult to parameterize or program over. Quasiquotation is a tool to overcome package design limitations or design flaws. You don't see quasiquotation very much in `R`, as many `R` developers work hard on their end to make sure the user does not require it.

For `R` package and function developers my advice is:

- If you are using "quoting interfaces" (interfaces that take names by quoting un-evaluated `R` code), then design your package to also have sufficiently powerful standard evaluation interfaces so that users can move to these if they run into trouble. This was the advice in the first edition of "Advanced R":

  As a developer, you should always provide an escape hatch: an alternative version of the function that uses standard evaluation.

- If you wish to introduce quasiquotation into your functions or packages, *please* consider using `bquote()` to do it. It has been part of base-`R` for 15 years, works very well, and is quite stable. `bquote()` is small enough to be comprehensible. Users who learn `bquote()` notation on one package will be able to re-use their experience on other `bquote()`-enabled packages. To help: we have some example code called `bquote_call_args()` where we are prototyping a canonical way to incorporate `bquote()` and `:=` re-writing in your own functions and packages. It isn't in the released version of `wrapr` yet, but it is testing well in the development version.
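The "escape hatch" advice above can be sketched with a hypothetical pair of functions (my example, not from any package): a quoting interface built as a thin shim over a standard-evaluation one.

```r
# Standard-evaluation workhorse: takes the column name as a string.
col_mean_se <- function(d, col_name) mean(d[[col_name]])

# Quoting convenience wrapper: captures the unevaluated column name.
col_mean <- function(d, col) col_mean_se(d, deparse(substitute(col)))

col_mean(cars, speed)      # convenient for interactive use
col_mean_se(cars, "speed") # easy to program over
```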

A lot of the technical details (including defining standard and non-standard evaluation) are covered in our long article on macro capabilities in `R`.

To **leave a comment** for the author, please follow the link and comment on their blog: **R – Win-Vector Blog**.


(This article was first published on **Variance Explained**, and kindly contributed to R-bloggers)

I recently came up with the idea for a series of screencasts:

I've thought about recording a screencast of an example data analysis in #rstats. I'd do it on a dataset I'm unfamiliar with so that I can show and narrate my live thought process.

Any suggestions for interesting datasets to use?

— David Robinson (@drob) October 6, 2018

Hadley Wickham had the great suggestion of analyzing a Tidy Tuesday dataset. Tidy Tuesday is a fantastic project run by the R for Data Science online learning community (especially Thomas Mock) that releases an interesting dataset each week.

I've now released my first such screencast, exploring this week's Tidy Tuesday dataset (the data behind The Economic Guide to Picking a College Major). You can also find the R Markdown I produced here.

I produced a handful of figures that I found pretty interesting. I took a look at the distribution of income from graduates within each category of major.

I spent some time on looking at the differences in gender distribution across majors, which was also included in the data.

And I ended by setting up an interactive scatterplot with the plotly package that compared the share of women in a field to the median salary.

Some notes and observations:

- **This isn't an R tutorial**: If I were teaching R, I'd have prepared in advance and moved more slowly through the material. This is a case study in how I'd dive into a dataset and learn from it, including steps where I think aloud and decide what route to take. If anything, it's closer to a "speedrun".
- **I enjoyed showing the order I work in**: I write blog posts somewhat "inside-out": I start with a few figures, then figure out what preprocessing I should have started with, and I'm always moving uninteresting figures out of the post or to appendices. It was nice to show how an analysis took shape and ended up looking like an organized final product.
- **I ran into fewer bugs than I expected to**: Part of the excitement of a live screencast is that "anything can go wrong" (part of the reason I recorded this first one in advance rather than live was to practice with less pressure!). I was pretty well versed in the tools I used in this session (dplyr and ggplot2), so I got stuck on only a handful of bugs (though I did go down a few unproductive routes).

I had enough fun that I think I’ll do it again (though probably not every week). With that in mind, I’ve learned some lessons that might improve my future screencasts:

- **I speak too fast**: this is a recurring issue for me. When I'm speaking in front of an audience I can watch people's faces and pace myself a bit better, but when I'm "by myself" while recording it's difficult. I've learned this is especially difficult for non-native listeners, and I'll try to be more conscious and speak more slowly!
- **I need to keep a better eye on time**: The screencast is about 80 minutes long (I'd originally planned on an hour, and I'll probably aim for that in the future). I'd be interested in feedback about the length, and whether people find the whole session interesting.

I look forward to hearing your feedback, and to recording the next one!

To **leave a comment** for the author, please follow the link and comment on their blog: **Variance Explained**.


(This article was first published on **R – Irregularly Scheduled Programming**, and kindly contributed to R-bloggers)

This is the last update to this strange saga… I hope.

Easily two of the most popular posts on my blog are this one and this one, describing a couple of ways in which I managed to hack together a method for using an image as a category label in a ggplot.

There are likely many people who believe one should *never* do such a thing, but given the popularity, it seems a lot of people aren’t listening to that. Good on you.

One of these posts was recently shared again by the amazing #rstats amplifier Mara Averick (if you're not following her on Twitter, you're missing out) and @baptiste_auguie (the saviour of the previous implementation) mentioned that he had written a 'hack' to get chemical symbols as a categorical axis label using `tikzDevice`. That package leverages LaTeX (with which I am *very* familiar, having written my PhD thesis entirely in LaTeX many moons ago) to treat all of the text in an image as potential LaTeX commands and produce a working LaTeX source code which generates the required plot.

The example code is straightforward enough

```r
options(tikzLatexPackages = c(getOption('tikzLatexPackages'),
                              "\\usepackage{acide-amine}\n"))
d = data.frame(x = 1:10, y = 1:10,
               f = factor(sample(letters[1:2], 10, repl = TRUE)))
p <- qplot(x, y, data = d) + theme_bw() +
  opts(plot.margin = unit(c(1, 1, 5, 1), "lines"),
       axis.text.x = theme_text(size = 12 * 0.8, lineheight = 0.9, vjust = 10)) +
  scale_x_continuous(breaks = c(2, 8), labels = c("\\phe{15}", "\\leu{15}"))
tikz("annotation.tex", standAlone = TRUE, width = 4, height = 4)
print(p)
dev.off()
```

and produces this

This got me curious, though – if it can process *arbitrary* LaTeX, could it process an `\includegraphics` call?

Efficient! If it's arbitrary LaTeX, could the labels just be \includegraphics calls?

— Jonathan Carroll (@carroll_jono) October 11, 2018

Yes, as it turns out.

A quick test showed that it was indeed possible, which only leaves re-implementing the previous posts’ images using this method.

I’ve done so, and the code isn’t particularly shorter than the other method.

Producing nearly the same end result.

There are a few differences compared to the previous version(s):

– I had a request for rotating the additional text, which I actually also updated recently, and it seemed to fit better, so I rotated the labels within the command.

– Since all of the text has been rendered via LaTeX, the fonts are a bit different.

– The rankings have since changed, so I’ve added an 11th to keep Australia in the list.

The LaTeX component of this also meant that a few changes were necessary in the other labels, such as the dollar sign in the y-axis label, and the underscores throughout (these are considered special characters in LaTeX). Lastly, the result of running the `tikz` command is that a `.tex` (LaTeX source code) file is produced. This isn't quite the plot image file we want. It *does* however have the commands to generate one. The last steps in the above gist are to process this `.tex` file with LaTeX. Here I used the `tools::texi2dvi` function, but one *could* also use a `system` command to call their LaTeX installation.

That still only produced a PDF. The last step was to use the `magick` package to convert this into an image.

Overall, this is a nice proof of concept, but I don't think it's a particularly tidy way of achieving the goal of image axis labels. It does however lay the groundwork for anyone else who decides this might be a useful route to take. Plus I learned a bit more about how `tikzDevice` works and got to revisit my LaTeX knowledge.

To **leave a comment** for the author, please follow the link and comment on their blog: **R – Irregularly Scheduled Programming**.


(This article was first published on **Revolutions**, and kindly contributed to R-bloggers)

In R, the logical `||` (OR) and `&&` (AND) operators are unique in that they are designed only to work with scalar arguments. Typically used in statements like

`while(iter < 1000 && eps < 0.0001) continue_optimization()`

the assumption is that the objects on either side (in the example above, `iter` and `eps`) are single values (that is, vectors of length 1) — nothing else makes sense for control flow branching like this. If either `iter` or `eps` above happened to be vectors with more than one value, R would silently consider only the first elements when making the AND comparison.
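To make the pitfall concrete (my example; note that the newest R releases now raise an error for a length-greater-than-one argument to `&&`), explicit indexing shows which element older R silently used:

```r
iter <- c(10, 2000)   # a length-2 vector where a scalar was expected
eps  <- 0.00005
# iter < 1000 is c(TRUE, FALSE); older R silently used only the first
# element in the && comparison, equivalent to:
(iter < 1000)[1] && (eps < 0.0001)
```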

In an ideal world that should never happen, of course. But R programmers, like all programmers, do make errors, and using non-scalar values with `||` or `&&` is a good sign that something, somewhere, has gone wrong. And with an experimental feature in the next version of R, you will be able to set an environment variable, `_R_CHECK_LENGTH_1_LOGIC2_`, to signal a warning or error in this case. But why make it experimental, and not the default behavior? Surely making such a change is a no-brainer, right?

It turns out things are more difficult than they seem. The issue is the large ecosystem of R packages, which may rely (wittingly or unwittingly) on the existing behavior. Suddenly throwing an error would throw those packages out of CRAN, and all of their dependencies as well.

We can see the scale of the impact in a recent R Foundation blog post by Tomas Kalibera, where he details the effort it took to introduce an even simpler change: making `if(cond) { ... }` throw an error if `cond` is a logical vector of length greater than one. Again, it seems like a no-brainer, but when this change was introduced experimentally in March 2017, 154 packages on CRAN started failing because of it … and that *rose* to 179 packages by November 2017. The next step was to implement a new CRAN check to alert those package authors that, in the future, their package would fail. It wasn't until October 2018 that the number of affected packages had dropped to a level suitable for the error to actually be implemented in R as well.

So the lesson is this: with an ecosystem as large as CRAN, even "trivial" changes take a tremendous amount of effort and time. They involve testing the impact on CRAN packages, coding up tests and warnings for maintainers, and allowing enough time for packages to be updated. For an in-depth perspective, read Tomas Kalibera's essay at the link below.

R Developer blog: Tomas Kalibera: Conditions of Length Greater Than One

To **leave a comment** for the author, please follow the link and comment on their blog: **Revolutions**.
