(This article was first published on ** CillianMacAodh**, and kindly contributed to R-bloggers)

It has been quite a while since I posted, but I haven’t been idle: I completed my PhD since the last post, and I’m due to graduate next Thursday. I am also delighted to have recently been added to R-bloggers.com, so I’m keen to get back into it.

I have already written two posts about writing functions, and I will try to diversify my content. That said, I won’t refrain from sharing something that has been helpful to me. The function(s) I describe in this post are an artefact left over from before I started using R Markdown. They are a product of their time but may still be of use to people who haven’t switched to R Markdown yet. It is a lazy (and quite imperfect) solution to a tedious task.

At the time I wrote this function I was using R for my statistics and Libreoffice for writing. I would run a test in R and then write it up in Libreoffice. Each value that needed reporting had to be transferred from my R output to Libreoffice – and for each test there are a number of values that need reporting. Writing up these tests is pretty formulaic. There’s a set structure to the sentence, for example writing up a t-test with a significant result nearly always looks something like this:

An independent samples t-test revealed a significant difference in X between the Y sample, (*M* = [ ], *SD* = [ ]), and the Z sample, (*M* = [ ], *SD* = [ ]), *t*([df]) = [ ], *p* = [ ].

And the write up of a non-significant result looks something like this:

An independent samples t-test revealed no significant difference in X between the Y sample, (*M* = [ ], *SD* = [ ]), and the Z sample, (*M* = [ ], *SD* = [ ]), *t*([df]) = [ ], *p* = [ ].

Seven values (the square [ ] brackets) need to be reported for this single test. Whether you copy and paste or type each value, the reporting of such tests can be very tedious, and leave you prone to errors in reporting.

In order to make reporting values easier (and more accurate) I wrote the `t_paragraph()` function (and the related `t_paired_paragraph()` function). This provided an output that I could copy and paste into a Word (Libreoffice) document. This function is part of the `desnum`^{1} package (McHugh, 2017).

**The `t_paragraph()` Function**

The `t_paragraph()` function runs a t-test and generates an output that can be copied and pasted into a Word document. The code for the function is as follows:

```
# Create the function t_paragraph with arguments x, y, and measure
# x is the dependent variable
# y is the independent (grouping) variable
# measure is the name of the dependent variable, inputted as a string
t_paragraph <- function(x, y, measure){
  # Run a t-test and store it as an object t
  t <- t.test(x ~ y)
  # If your grouping variable has labelled levels, the next line will
  # store them for reporting at a later stage
  labels <- levels(y)
  # Create an object for each value to be reported
  tsl <- as.vector(t$statistic)
  ts <- round(tsl, digits = 3)
  tpl <- as.vector(t$p.value)
  tp <- round(tpl, digits = 3)
  d_fl <- as.vector(t$parameter)
  d_f <- round(d_fl, digits = 2)
  ml <- as.vector(tapply(x, y, mean))
  m <- round(ml, digits = 2)
  sdl <- as.vector(tapply(x, y, sd))
  sd <- round(sdl, digits = 2)
  # Use print(paste0()) to combine the objects above and create two
  # potential outputs; which one is generated depends on the test result
  # wording if a significant difference is observed
  if (tp < 0.05)
    print(paste0("An independent samples t-test revealed a significant difference in ",
                 measure, " between the ", labels[1], " sample, (M = ",
                 m[1], ", SD = ", sd[1], "), and the ", labels[2],
                 " sample, (M =", m[2], ", SD =", sd[2], "), t(",
                 d_f, ") = ", ts, ", p = ", tp, "."),
          quote = FALSE, digits = 2)
  # wording if no significant difference is observed
  # (>= rather than > so that tp == 0.05 still produces an output)
  if (tp >= 0.05)
    print(paste0("An independent samples t-test revealed no difference in ",
                 measure, " between the ", labels[1], " sample, (M = ",
                 m[1], ", SD = ", sd[1], "), and the ", labels[2],
                 " sample, (M = ", m[2], ", SD =", sd[2], "), t(",
                 d_f, ") = ", ts, ", p = ", tp, "."),
          quote = FALSE, digits = 2)
}
```

When using `t_paragraph()`, `x` is your DV, `y` is your grouping variable, and `measure` is a string giving the name of the dependent variable. To illustrate the function I’ll use the `mtcars` dataset.

**The `mtcars` Dataset**

The `mtcars` dataset comes with R. For information on it simply type `help(mtcars)`. The variables of interest here are `am` (transmission; 0 = automatic, 1 = manual), `mpg` (miles per gallon), and `qsec` (1/4 mile time). The two questions I’m going to look at are:

- Is there a difference in miles per gallon depending on transmission?
- Is there a difference in 1/4 mile time depending on transmission?

Before running the tests it is a good idea to look at the data^{2}. Because we’re going to look at differences between groups, we want to run descriptives for each group separately. To do this I’m going to combine the `descriptives()` function, which I previously covered here (also part of the `desnum` package), and the `tapply()` function.

The `tapply()` function allows you to run a function on subsets of a dataset using a grouping variable (or index). The arguments are as follows: `tapply(vector, index, function)`. `vector` is the variable you want to pass through `function`, and `index` is the grouping variable. The examples below will make this clearer.

We want to run descriptives on `mtcars$mpg` and on `mtcars$qsec`, and for each we want to group by transmission (`mtcars$am`). This can be done using `tapply()` and `descriptives()` together as follows:

`tapply(mtcars$mpg, mtcars$am, descriptives)`

```
## $`0`
## mean sd min max len
## 1 17.14737 3.833966 10.4 24.4 19
##
## $`1`
## mean sd min max len
## 1 24.39231 6.166504 15 33.9 13
```

Recall that 0 = automatic, and 1 = manual. Replace `mpg` with `qsec` and run again:

`tapply(mtcars$qsec, mtcars$am, descriptives)`

```
## $`0`
## mean sd min max len
## 1 18.18316 1.751308 15.41 22.9 19
##
## $`1`
## mean sd min max len
## 1 17.36 1.792359 14.5 19.9 13
```

**`t_paragraph()`**

Now that we know the values for automatic vs manual cars, we can run our t-tests using `t_paragraph()`. Our first question:

Is there a difference in miles per gallon depending on transmission?

`t_paragraph(mtcars$mpg, mtcars$am, "miles per gallon")`

`## [1] An independent samples t-test revealed a significant difference in miles per gallon between the sample, (M = 17.15, SD = 3.83), and the sample, (M =24.39, SD =6.17), t(18.33) = -3.767, p = 0.001.`

There is a difference, and the output above can be copied and pasted into a Word document with minimal changes required.

Our second question was:

Is there a difference in 1/4 mile time depending on transmission?

`t_paragraph(mtcars$qsec, mtcars$am, "quarter-mile time")`

`## [1] An independent samples t-test revealed no difference in quarter-mile time between the sample, (M = 18.18, SD = 1.75), and the sample, (M = 17.36, SD =1.79), t(25.53) = 1.288, p = 0.209.`

This time there was no significant difference, and again the output can be copied and pasted into Word with minimal changes.

The function described was written a long time ago and could be updated; however, I no longer copy and paste into Word (having switched to R Markdown instead). The reporting of the p value is not always to APA standards: if p is below .001, then *p* < .001 is what should be reported, rather than a rounded value. The code for `t_paragraph()` could be updated to include the `p_report` function (described here), which would address this. Another limitation is that the formatting of the text isn’t perfect: the letters (*N*, *M*, *SD*, *t*, *p*) should all be italicised, but having to manually fix this formatting is still easier than manually transferring individual values.

Despite these limitations, the functions `t_paragraph()` and `t_paired_paragraph()`^{3} have made my life easier, and I still use them occasionally. I hope they can be of use to anyone who is using R but has not switched to R Markdown yet.

McHugh, C. (2017). *Desnum: Creates some useful functions*.

1. To install `desnum`, just run `devtools::install_github("cillianmiltown/R_desnum")`.
2. In this case this is particularly useful because there are no value labels for `mtcars$am`, so it won’t be clear from the output which values refer to the automatic group and which refer to the manual group. Running descriptives will help with this.
3. If you want to see the code for `t_paired_paragraph()`, just load `desnum` and run `t_paired_paragraph` (without parentheses).

To **leave a comment** for the author, please follow the link and comment on their blog: ** CillianMacAodh**.


(This article was first published on ** R-posts.com**, and kindly contributed to R-bloggers)

In 2016, 2.1 million Americans were found to have an opioid use disorder (according to SAMHSA), with drug overdose now the leading cause of injury and death in the United States. But some of the country’s top minds are working to fight this epidemic, and statisticians are helping to lead the charge.

In *This is Statistics*’ second annual fall data challenge, high school and undergraduate students will use statistics to analyze data and develop recommendations to help address this important public health crisis.

The contest invites teams of two to five students to put their statistical and data visualization skills to work using the Centers for Disease Control and Prevention (CDC)’s Multiple Cause of Death (Detailed Mortality) data set, and contribute to creating healthier communities. Given the size and complexity of the CDC dataset, programming languages such as R can be used to manipulate and conduct analysis effectively.

Each submission will consist of a short essay and presentation of recommendations. Winners will be awarded for best overall analysis, best visualization and best use of external data. Submissions are due November 12, 2018.

If you or a student you know is interested in participating, get full contest details here.

Teachers, get resources about how to engage your students in the contest here.


(This article was first published on ** R-english – Freakonometrics**, and kindly contributed to R-bloggers)

Some pre-Halloween post today. It started while I was in Barcelona: the kids wanted to go back to some store we’d seen on the first day, in the gothic quarter, and I could not remember where it was. I said to myself that it would take quite a long time to walk all the streets of the neighbourhood. And I discovered that this is actually an old problem. In 1962, Meigu Guan was interested in a postman delivering mail to a number of streets such that the total distance walked by the postman was as short as possible. How could the postman ensure that the distance walked was a minimum?

A very close notion is the concept of a **traversable graph**, which is one that can be drawn without taking the pen from the paper and without retracing the same edge. In such a case the graph is said to have an **Eulerian trail** (yes, from Euler’s bridges problem). An Eulerian trail uses all the edges of a graph. For a graph to be Eulerian, **all the vertices must be of even order**.
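As a quick illustration, the even-degree condition can be checked directly in base R (the toy edge list below is made up for illustration):

```r
# Toy graph as an edge list: a triangle 1-2, 2-3, 3-1
edges <- rbind(c(1, 2), c(2, 3), c(3, 1))
# Degree of each vertex = number of times it appears in the edge list
deg <- table(c(edges[, 1], edges[, 2]))
all(deg %% 2 == 0)  # TRUE: every vertex is even, so the triangle is traversable
```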

An algorithm for finding an optimal Chinese postman route is:

- List all odd vertices.
- List all possible pairings of odd vertices.
- For each pairing find the edges that connect the vertices with the minimum weight.
- Find the pairings such that the sum of the weights is minimised.
- On the original graph add the edges that have been found in Step 4.
- The length of an optimal Chinese postman route is the sum of all the edges of the original graph added to the total found in Step 4.
- A route corresponding to this minimum weight can then be easily found.
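Steps 1 and 2 can be sketched in base R as follows (the graph and its edge list here are made up for illustration):

```r
# Toy graph with two odd vertices: edges 1-2, 1-3, 2-3, 3-4
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4))
deg <- table(c(edges[, 1], edges[, 2]))
# Step 1: list all odd vertices
odd <- as.integer(names(deg)[deg %% 2 == 1])
odd       # vertices 3 and 4
# Step 2: all possible pairings of the odd vertices
pairings <- combn(odd, 2)
pairings  # with only two odd vertices there is a single pairing: (3, 4)
```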

For the first steps, we can use the codes from Hurley & Oldford’s Eulerian tour algorithms for data visualization and the PairViz package. First, we have to load some R packages

```
require(igraph)
require(graph)
require(eulerian)
require(GA)
```

Then use the following function from stackoverflow,

```
make_eulerian = function(graph){
  info = c("broken" = FALSE, "Added" = 0, "Successfull" = TRUE)
  is.even = function(x){ x %% 2 == 0 }
  search.for.even.neighbor = !is.even(sum(!is.even(degree(graph))))
  for(i in V(graph)){
    set.j = NULL
    uneven.neighbors = !is.even(degree(graph, neighbors(graph, i)))
    if(!is.even(degree(graph, i))){
      if(sum(uneven.neighbors) == 0){
        if(sum(!is.even(degree(graph))) > 0){
          # note: the original indexed info["Broken"]; the element is named "broken"
          info["broken"] = TRUE
          uneven.candidates <- !is.even(degree(graph, V(graph)))
          if(sum(uneven.candidates) != 0){
            set.j <- V(graph)[uneven.candidates][[1]]
          }else{
            info["Successfull"] <- FALSE
          }
        }
      }else{
        set.j <- neighbors(graph, i)[uneven.neighbors][[1]]
      }
    }else if(search.for.even.neighbor == TRUE & is.null(set.j)){
      info["Added"] <- info["Added"] + 1
      set.j <- neighbors(graph, i)[!uneven.neighbors][[1]]
      if(!is.null(set.j)){ search.for.even.neighbor <- FALSE }
    }
    if(!is.null(set.j)){
      if(i != set.j){
        graph <- add_edges(graph, edges = c(i, set.j))
        info["Added"] <- info["Added"] + 1
      }
    }
  }
  list("graph" = graph, "info" = info)
}
```

Then, consider some network, with 12 nodes

```
g1 = graph(c(1,2, 1,3, 2,4, 2,5, 1,5, 3,5, 4,7, 5,7, 5,8, 3,6, 6,8, 6,9,
             9,11, 8,11, 8,10, 8,12, 7,10, 10,12, 11,12), directed = FALSE)
```

To plot that network, use

```
V(g1)$name = LETTERS[1:12]
V(g1)$color = rgb(0,0,1,.4)
ly = layout.kamada.kawai(g1)
plot(g1, vertex.color = V(g1)$color, layout = ly)
```

Then we convert it to a traversable graph by adding 5 edges

```
eulerian = make_eulerian(g1)
eulerian$info
     broken       Added Successfull 
          0           5           1 
g = eulerian$graph
```

as shown below

```
ly = layout.kamada.kawai(g)
plot(g, vertex.color = V(g)$color, layout = ly)
```

We cut each of those 5 added edges in two parts, and therefore we add 5 artificial nodes

```
A = as.matrix(as_adj(g))
A1 = as.matrix(as_adj(g1))
newA = lower.tri(A, diag = FALSE) * A1 + upper.tri(A, diag = FALSE) * A
for(i in 1:sum(newA == 2)) newA = cbind(newA, 0)
for(i in 1:sum(newA == 2)) newA = rbind(newA, 0)
s = nrow(A)
for(i in 1:nrow(A)){
  Aj = which(newA[i, ] == 2)
  if(length(Aj) > 0){
    for(j in Aj){
      newA[i, s+1] = newA[s+1, i] = 1
      newA[j, s+1] = newA[s+1, j] = 1
      newA[i, j] = 1
      s = s + 1
    }
  }
}
```

We get the following graph, where all nodes now have an even degree!

```
newg = graph_from_adjacency_matrix(newA)
newg = as.undirected(newg)
V(newg)$name = LETTERS[1:17]
V(newg)$color = c(rep(rgb(0,0,1,.4), 12), rep(rgb(1,0,0,.4), 5))
ly2 = ly
transl = cbind(c(0,0,0,.2,0), c(.2,-.2,-.2,0,-.2))
for(i in 13:17){
  j = which(newA[i, ] > 0)
  lc = ly[j, ]
  ly2 = rbind(ly2, apply(lc, 2, mean) + transl[i-12, ])
}
plot(newg, layout = ly2)
```

Our network is now the following (the new nodes are drawn small because they don’t really matter; they are only there for computational reasons)

```
plot(newg, vertex.color = V(newg)$color, layout = ly2,
     vertex.size = c(rep(20,12), rep(0,5)),
     vertex.label.cex = c(rep(1,12), rep(.1,5)))
```

Now we can get the optimal path

```
n <- LETTERS[1:nrow(newA)]
g_2 <- new("graphNEL", nodes = n)
for(i in 1:nrow(newA)){
  for(j in which(newA[i, ] > 0)){
    g_2 <- addEdge(n[i], n[j], g_2, 1)
  }
}
etour(g_2, weighted = FALSE)
 [1] "A" "B" "D" "G" "E" "A" "C" "E" "H" "F" "I" "K" "H" "J" "G" "P" "J" "L" "K" "Q" "L" "H" "O" "F" "C"
[26] "N" "E" "B" "M" "A"
```

or

```
edg = attr(E(newg), "vnames")
ET = etour(g_2, weighted = FALSE)
parcours = trajet = rep(NA, length(ET) - 1)
for(i in 1:length(parcours)){
  u = c(ET[i], ET[i+1])
  ou = order(u)
  parcours[i] = paste(u[ou[1]], u[ou[2]], sep = "|")
  trajet[i] = which(edg == parcours[i])
}
parcours
 [1] "A|B" "B|D" "D|G" "E|G" "A|E" "A|C" "C|E" "E|H" "F|H" "F|I" "I|K" "H|K" "H|J" "G|J" "G|P" "J|P"
[17] "J|L" "K|L" "K|Q" "L|Q" "H|L" "H|O" "F|O" "C|F" "C|N" "E|N" "B|E" "B|M" "A|M"
trajet
 [1]  1  3  8  9  4  2  6 10 11 12 16 15 14 13 26 27 18 19 28 29 17 25 24  7 22 23  5 21 20
```

Let us now try it on a real network of streets, like Missoula, Montana.

I will not try to get the shapefile of the city; I will just try to replicate the photograph above.

If you look carefully, you will see a problem: nodes 10 and 93 have an odd degree (3 here), so one strategy is to connect them (which explains the grey line).

But actually, to be more realistic, we start at 93 and end at 10. Here is the optimal (shortest) path that goes through all edges.

Now, we are ready for Halloween, to go through all streets in the neighborhood !


(This article was first published on ** R – Fantasy Football Analytics**, and kindly contributed to R-bloggers)

Week 7 Gold Mining and Fantasy Football Projection Roundup now available. Go check out our cheat sheet for this week.

The post Gold-Mining Week 7 (2018) appeared first on Fantasy Football Analytics.


(This article was first published on ** DataCamp Community - r programming**, and kindly contributed to R-bloggers)

Here is the course link.

Data visualization is an integral part of the data analysis process. This course introduces rbokeh: a visualization library for interactive web-based plots. You will learn how to use rbokeh layers and options to create effective visualizations that carry your message and emphasize your ideas. We will focus on the two main pieces of data visualization: wrangling data into the appropriate format and employing the appropriate visualization tools, charts and options from rbokeh.

In this chapter you will be introduced to rbokeh layers. You will learn how to specify data and arguments to create the desired plot, and how to combine multiple layers in one figure.

In this chapter you will learn how to customize your rbokeh figures using aesthetic attributes and figure options. You will see how aesthetic attributes such as color, transparency and shape can serve a purpose and add more information to your visualizations. In addition, you will learn how to activate the tooltip and specify the hover info in your figures.

In this chapter, you will learn how to put your data in the right format to fit the desired figure, and how to transform between the wide and long formats. You will also see how to combine normal layers with regression lines. In addition, you will learn how to customize the interaction tools that appear with each figure.

In this chapter you will learn how to combine multiple plots in one layout using grid plots. In addition, you will learn how to create interactive maps.


(This article was first published on ** DataCamp Community - r programming**, and kindly contributed to R-bloggers)

Here is the course link.

This course will help you take your data visualization skills beyond the basics and hone them into powerful members of your data science toolkit. Over the lessons we will use two interesting open datasets to cover different types of data (proportions, point data, single distributions, and multiple distributions) and discuss the pros and cons of the most common visualizations. In addition, we will cover some less common alternative visualizations for these data types, and how to tweak default ggplot settings to most efficiently and effectively get your message across.

In this chapter, we focus on visualizing proportions of a whole; we see that pie charts really aren’t so bad, along with discussing the waffle chart and stacked bars for comparing multiple proportions.

We shift our focus now to single-observation or point data and go over when bar charts are appropriate and when they are not, what to use when they are not, and general perception-based enhancements for your charts.

We now move on to visualizing distributional data: we expose the fragility of histograms, discuss when it is better to shift to a kernel density plot, and show how to make both plots work best for your data.

Finishing off, we take a look at comparing multiple distributions to each other. We see why traditional box plots can be dangerous and how to easily improve them, along with investigating when you should use more advanced alternatives like the beeswarm plot and violin plots.


(This article was first published on ** Marcelo S. Perlin**, and kindly contributed to R-bloggers)

An Introduction to Loops in R

First, if you are new to programming, you should know that loops are a way to tell the computer that you want to repeat some operation a number of times. This is a very common task that can be found in many programming languages. For example, let’s say you invited five friends for dinner at your home and the whole cost of four pizzas will be split evenly. Assume now that you **must** give instructions to a computer on calculating how much each one will pay at the end of dinner. For that, you need to sum up the individual tabs and divide by the number of people. Your instructions to the computer could be: *start with a value of x=zero, take each individual pizza cost and sum it to x until all costs are processed, dividing the result by the number of friends at the end*.

The great thing about *loops* is that their length is dynamically set. Using the previous example, if we had 500 friends (and a large dinner table!), we could use the same instructions for calculating the individual tabs. That means we can encapsulate a generic procedure for processing any given number of friends at dinner. With it, you have at your reach a tool for the execution of any sequential process. In other words, you are the boss of your computer and, as long as you can write it down clearly, you can set it to do any kind of repeated task for you.

Now, about the code, we could write the solution to the *pizza problem* in R as:

```
pizza.costs <- c(50, 80, 30, 60) # each cost of pizza
n.friends <- 5 # number of friends
x <- 0 # set first cost to zero
for (i.cost in pizza.costs) {
x <- x + i.cost # sum it up
}
x <- x/n.friends # divide for average per friend
print(x)
## [1] 44
```

Don’t worry if you didn’t understand the code. We’ll get to the structure of a loop soon.

Back to our case, each friend would pay 44 for the meal. We can check the result against function `sum`:

```
x == sum(pizza.costs)/n.friends
## [1] TRUE
```

The output `TRUE` shows that the results are equal.

Knowing how to use loops can be a powerful ally in complex data-related problems. Let’s talk more about how *loops* are defined in R. The structure of a *loop* in R follows:

```
for (i in i.vec){
...
}
```

In the previous code, command `for` indicates the beginning of a *loop*. Object `i` in `(i in i.vec)` is the iterator of the *loop*. This iterator will change its value in each iteration, taking each individual value contained in `i.vec`. Note the *loop* is encapsulated by curly braces (`{}`). These are important, as they define where the *loop* starts and where it ends. The indentation (use of bigger margins) is also important for visual cues, but not necessary. Consider the following practical example:

```
# set seq
my.seq <- seq(-5,5)
# do loop
for (i in my.seq){
cat(paste('\nThe value of i is',i))
}
##
## The value of i is -5
## The value of i is -4
## The value of i is -3
## The value of i is -2
## The value of i is -1
## The value of i is 0
## The value of i is 1
## The value of i is 2
## The value of i is 3
## The value of i is 4
## The value of i is 5
```

In the code, we created a sequence from -5 to 5 and presented a text for each element with the `cat` function. Notice how we also broke the prompt line with `'\n'`. The *loop* starts with `i=-5`, executes command `cat(paste('\nThe value of i is', -5))`, proceeds to the next iteration by setting `i=-4`, reruns the `cat` command, and so on. At its final iteration, the value of `i` is `5`.

The iterated sequence in the *loop* is not exclusive to numerical vectors. Any type of vector or list may be used. See next:

```
# set char vec
my.char.vec <- letters[1:5]
# loop it!
for (i.char in my.char.vec){
cat(paste('\nThe value of i.char is', i.char))
}
##
## The value of i.char is a
## The value of i.char is b
## The value of i.char is c
## The value of i.char is d
## The value of i.char is e
```

The same goes for `lists`:

```
# set list
my.l <- list(x = 1:5,
y = c('abc','dfg'),
z = factor('A','B','C','D'))
# loop list
for (i.l in my.l){
cat(paste0('\nThe class of i.l is ', class(i.l), '. '))
cat(paste0('The number of elements is ', length(i.l), '.'))
}
##
## The class of i.l is integer. The number of elements is 5.
## The class of i.l is character. The number of elements is 2.
## The class of i.l is factor. The number of elements is 1.
```

In the definition of *loops*, the iterator does not have to be the only object incremented in each iteration. We can create other objects and increment them using a simple sum operation. See next:

```
# set vec and iterators
my.vec <- seq(1:5)
my.x <- 5
my.z <- 10
for (i in my.vec){
# iterate "manually"
my.x <- my.x + 1
my.z <- my.z + 2
cat('\nValue of i = ', i,
' | Value of my.x = ', my.x,
' | Value of my.z = ', my.z)
}
##
## Value of i = 1 | Value of my.x = 6 | Value of my.z = 12
## Value of i = 2 | Value of my.x = 7 | Value of my.z = 14
## Value of i = 3 | Value of my.x = 8 | Value of my.z = 16
## Value of i = 4 | Value of my.x = 9 | Value of my.z = 18
## Value of i = 5 | Value of my.x = 10 | Value of my.z = 20
```

Using nested *loops*, that is, a *loop* inside of another *loop*, is also possible. See the following example, where we present all the elements of a matrix:

```
# set matrix
my.mat <- matrix(1:9, nrow = 3)
# loop all values of matrix
for (i in seq(1,nrow(my.mat))){
for (j in seq(1,ncol(my.mat))){
cat(paste0('\nElement [', i, ', ', j, '] = ', my.mat[i,j]))
}
}
##
## Element [1, 1] = 1
## Element [1, 2] = 4
## Element [1, 3] = 7
## Element [2, 1] = 2
## Element [2, 2] = 5
## Element [2, 3] = 8
## Element [3, 1] = 3
## Element [3, 2] = 6
## Element [3, 3] = 9
```

Now, the computational needs of the real world are far more complex than dividing a dinner expense. A practical example of using *loops* is processing data according to groups. Using an example from finance: if we have a return dataset for several stocks and we want to calculate the average return of each stock, we can use a *loop* for that. In this example, we will use *Yahoo Finance* data for three stocks: FB, GE and AA. The first step is downloading it with package `BatchGetSymbols`.

```
library(BatchGetSymbols)
## Loading required package: rvest
## Loading required package: xml2
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
##
my.tickers <- c('FB', 'GE', 'AA')
df.stocks <- BatchGetSymbols(tickers = my.tickers,
first.date = '2012-01-01',
freq.data = 'yearly')[[2]]
##
## Running BatchGetSymbols for:
## tickers = FB, GE, AA
## Downloading data for benchmark ticker | Found cache file
## FB | yahoo (1|3) | Found cache file - Good job!
## GE | yahoo (2|3) | Found cache file - Nice!
## AA | yahoo (3|3) | Found cache file - You got it!
```

It worked fine. Let’s check the contents of the dataframe:

```
dplyr::glimpse(df.stocks)
## Observations: 21
## Variables: 10
## $ ticker              <chr> "AA", "AA", "AA", "AA", "AA", "AA", "AA", ...
## $ ref.date            <date> 2012-01-03, 2013-01-02, 2014-01-02, 2015-...
## $ volume              <dbl> 2217410500, 2149575500, 2146821400, 268355...
## $ price.open          <dbl> 21.48282, 21.33864, 25.30359, 38.13561, 22...
## $ price.high          <dbl> 25.85628, 25.68807, 42.29280, 41.01921, 32...
## $ price.low           <dbl> 19.27206, 18.50310, 24.27030, 18.79146, 16...
## $ price.close         <dbl> 22.17969, 21.60297, 25.30359, 38.15964, 23...
## $ price.adjusted      <dbl> 20.89342, 20.62187, 24.48568, 37.24207, 23...
## $ ret.adjusted.prices <dbl> NA, -0.01299715, 0.18736494, 0.52097326, -...
## $ ret.closing.prices  <dbl> NA, -0.02600212, 0.17130149, 0.50807215, -...
```

All the financial data is there. Notice that the return series is available in column `ret.adjusted.prices`.

Now we will use a loop to build a table with the mean return of each stock:

```
# find unique tickers in column ticker
unique.tickers <- unique(df.stocks$ticker)
# create empty df
tab.out <- data.frame()
# loop tickers
for (i.ticker in unique.tickers){
# create temp df with ticker i.ticker
temp <- df.stocks[df.stocks$ticker==i.ticker, ]
# row bind i.ticker and mean.ret
tab.out <- rbind(tab.out,
data.frame(ticker = i.ticker,
mean.ret = mean(temp$ret.adjusted.prices, na.rm = TRUE)))
}
# print result
print(tab.out)
## ticker mean.ret
## 1 AA 0.24663684
## 2 FB 0.35315566
## 3 GE 0.06784693
```

In the code, we used function `unique` to find out the names of all the tickers in the dataset. Soon after, we create an empty *dataframe* to save the results and a loop to filter the data of each stock sequentially and average its returns. At the end of each iteration, we use function `rbind` to paste the results of each stock onto the main table. As you can see, we can use a *loop* to perform group calculations.

By now, I must be forward in saying that the previous loop is by no means the best way of performing this data operation. What we just did with loops is called a *split-apply-combine* procedure. There are base functions in R such as `tapply`, `split` and `lapply`/`sapply` that can do the same job, but with a more intuitive and functional approach. Going further, functions from the `tidyverse` can do the same procedure with an even more intuitive approach. In a future post I shall discuss these possibilities further.
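For instance, the same group averages can be computed in one line with base R’s `tapply()`; the toy data frame below stands in for `df.stocks` (its values are made up for illustration):

```r
# Toy stand-in for df.stocks (made-up returns for three tickers)
df <- data.frame(ticker = c('AA', 'AA', 'FB', 'FB', 'GE', 'GE'),
                 ret.adjusted.prices = c(0.1, 0.3, 0.2, 0.4, NA, 0.1))
# Split by ticker, apply mean, combine into a named vector
mean.ret <- tapply(df$ret.adjusted.prices, df$ticker, mean, na.rm = TRUE)
mean.ret
##  AA  FB  GE 
## 0.2 0.3 0.1 
```

The whole explicit loop, the empty data frame and the repeated `rbind` calls collapse into a single `tapply()` call.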

I hope you guys liked the post. Got a question? Just drop it in the comment section.


(This article was first published on ** R on Gianluca Baio**, and kindly contributed to R-bloggers)

I have just submitted a revised version of `survHE` to CRAN; it should be up very shortly. This will be version 1.0.64, and its main feature is a major restructuring of the way the rstan/HMC stuff works.

Basically, this is due to a change in the default `C++` compiler. I don’t think much will change in terms of how `survHE` works when running full Bayesian models using HMC, but `R` now compiles it without problems. Following the advice of Ben Goodrich, I have also modified the package so that it compiles the Stan programs serially as opposed to gluing them all together, which should optimise the use of memory.
