© 2015, David E. Giles

(This article was first published on ** Econometrics Beat: Dave Giles' Blog**, and kindly contributed to R-bloggers)

I was delighted by yesterday’s announcement that Angus Deaton has been awarded the Nobel Prize in Economic Sciences this year. His contributions have been many, fundamental, and varied, and I certainly won’t attempt to summarize them here. Suffice to say that the official citation says that the award is “*for his contributions to consumption, poverty, and welfare*”.

In this earlier post I made brief mention of Deaton’s path-breaking work, with John Muellbauer, that gave us the so-called “Almost Ideal Demand System”.

The AIDS model took empirical consumer demand analysis to a new level. It facilitated more sophisticated, and less restrictive, econometric analysis of consumer demand behaviour than had been possible with earlier models. The latter included the fundamentally important Linear Expenditure System (Stone, 1954), and the Rotterdam Model (Barten, 1964; Theil, 1965).

I thought that readers may be interested in an empirical exercise with the AIDS model. Let’s take a look at it.

First of all, the theoretical model needs to be explained.

Let total expenditure on the n goods in the system be

M = Σ_{i} (p_{i} q_{i}) , (1)

and let

w_{i} = (p_{i} q_{i}) / M ; i = 1, …, n (2)

denote the “budget share” for the i^{th} good.
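As a quick numerical illustration of equations (1) and (2) (with entirely made-up prices and quantities, not the U.K. data used below), the budget shares can be computed in a couple of lines of base R:

```r
# hypothetical prices and quantities for n = 3 goods
p <- c(2.0, 5.0, 8.0)   # prices
q <- c(10, 2, 1)        # quantities

M <- sum(p * q)         # total expenditure, equation (1): 20 + 10 + 8 = 38
w <- (p * q) / M        # budget shares, equation (2)

round(w, 3)
sum(w)                  # the shares sum to 1 by construction
```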

The system of demand equations itself is:

w_{i} = α_{i} + β_{i} [log(M) – log(P)] + Σ_{j} γ_{ij} log(p_{j}) + ε_{i} ; i = 1, …, n (3)

The overall price index, P, is defined by the following translog specification:

log(P) = α_{0} + Σ_{j} α_{j} log(p_{j}) + 0.5 Σ_{i} Σ_{j} γ_{ij} log(p_{i}) log(p_{j}) . (4)

(All of the summations in equations (1) to (4) run from i = 1 to n; or from j = 1 to n.)

Notice that once (4) is substituted into (3), each of the n equations in the latter system is highly non-linear in the parameters of the model.

In practice, a value for α_{0} is usually pre-assigned, and there are various ways of choosing an “optimal” value (see Michalek and Keyzer, 1992).

One of the things that I like about empirical exercises such as the one that follows is that they illustrate how the underlying microeconomic theory can be incorporated explicitly into the formulation of the econometric model, and the subsequent estimation and testing.

(*This stands in contrast with a lot of other empirical work that we encounter* – see this post.)

Specifically, for the AIDS model there are various restrictions on the parameters that we have to consider:

Engel aggregation requires that

Σ_{k} α_{k} = 1 ; Σ_{k} β_{k} = 0 ; Σ_{k} γ_{kj} = 0 ; for all j = 1, …, n (5)

These restrictions will be satisfied *automatically, as long as the individual expenditures add up to total expenditure in the sample.*

Homogeneity requires that

Σ_{k} γ_{ik} = 0 ; for all i = 1, …, n ; (6)

and Slutsky symmetry requires that

γ_{ij} = γ_{ji} ; for all i, j = 1, …, n . (7)

(In equations (5) and (6), the summations run from k = 1 to n.)

The homogeneity and symmetry restrictions are testable, and can be imposed, as appropriate.

We’re going to estimate an AIDS model for beer, wine, and spirits (numbered in that order), using annual time-series data for the U.K. over the period 1955 to 1985 inclusive. The data are available on the data page for this blog, and they come from Selvanathan (1995, p. 124). I’m going to use the ‘micEconAids’ package for R (Henningsen, 2015) for estimation and hypothesis testing, and my R code is on the code page for this blog.

In the following application, we’ll assume weak separability of the underlying utility function, so “Total Expenditure” (M) will be the total expenditure on the three types of alcoholic beverages. The sample means of the budget shares are 91.2%, 6.6%, and 2.2% for beer, wine, and spirits, respectively. The spirits share is fairly constant over the sample period, while the beer share declines from 94.1% to 88.3% and the wine share rises from 3.9% to 9.6%.

Here is my R code to obtain a value for α_{0}, and to estimate the 3-equation system with both the homogeneity and symmetry restrictions imposed on the parameters.
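In case the code page moves, here is a sketch of what that estimation looks like with ‘micEconAids’. The data frame and column names below (`alcData`, `pBeer`, `wBeer`, `totExp`, and so on) are assumptions on my part and need to match however the columns are named in the data set from the data page; see the package documentation for the exact interface.

```r
library(micEconAids)

# assumed column names for the U.K. alcohol data (adjust to your own file)
priceNames <- c("pBeer", "pWine", "pSpirits")
shareNames <- c("wBeer", "wWine", "wSpirits")

# search for a "best" value of alpha_0, then estimate the full AIDS
# system by iterated linear least squares ("IL"), with the homogeneity
# and symmetry restrictions imposed
bestA0 <- aidsBestA0(priceNames, shareNames, "totExp",
                     data = alcData, method = "IL")

fitAids <- aidsEst(priceNames, shareNames, "totExp",
                   data = alcData, method = "IL",
                   alpha0 = bestA0$alpha0,
                   hom = TRUE, sym = TRUE)

# income and price elasticities, evaluated at the sample means
elas <- aidsElas(coef(fitAids), shares = fitAids$wMeans,
                 prices = fitAids$pMeans)
print(elas)
```

This sketch requires the CRAN package and the blog's data set, so treat it as a template rather than a drop-in script.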

Here are the basic results, which are self-explanatory:

We see that the total expenditure (“income”) elasticities suggest that beer is a necessity, while spirits and (especially) wine are both luxury goods. It would be a good idea to recall that the data are for the U.K. for the period 1955 to 1985! The own-price elasticities of demand are negative for each good, as expected. Beer is own-price inelastic, while wine and spirits are own-price elastic. The compensated price elasticities suggest, among other things, that the three beverages are substitutes for each other.

The “summary” command in my code yields some additional results:

Recalling that these results are for an AIDS model in which both the homogeneity and symmetry restrictions have been imposed, we had better test to see if these restrictions are supported by the data. Here is some R code to facilitate this:
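As a sketch of what that testing code does: fit the system with and without each set of restrictions, then compare the fits with likelihood-ratio tests. The data frame and column names are the same assumptions as in the estimation sketch above, and I'm assuming the `lrtest()` method that ‘micEconAids’ provides for its fitted models.

```r
library(micEconAids)

priceNames <- c("pBeer", "pWine", "pSpirits")
shareNames <- c("wBeer", "wWine", "wSpirits")

fitNone <- aidsEst(priceNames, shareNames, "totExp", data = alcData,
                   method = "IL", hom = FALSE, sym = FALSE)
fitHom  <- aidsEst(priceNames, shareNames, "totExp", data = alcData,
                   method = "IL", hom = TRUE,  sym = FALSE)
fitBoth <- aidsEst(priceNames, shareNames, "totExp", data = alcData,
                   method = "IL", hom = TRUE,  sym = TRUE)

lrtest(fitBoth, fitHom)   # symmetry, given homogeneity
lrtest(fitHom,  fitNone)  # homogeneity alone
lrtest(fitBoth, fitNone)  # homogeneity and symmetry, jointly
```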

The results are:

So, these test results suggest that:

- We should reject the symmetry restrictions when they are added to the homogeneity restrictions (p = 0.0003).
- We should reject the homogeneity restrictions against the alternative of no restrictions, *other than Engel aggregation* (p = 4×10^{-6}).
- We should reject the (joint) symmetry and homogeneity restrictions in favour of no restrictions, *other than Engel aggregation* (p = 3×10^{-8}).

In short, we should remove the homogeneity and symmetry restrictions. Here are the estimation results when we do this:

Once again, the total expenditure (“income”) elasticities suggest that beer is a necessity, while spirits and wine are both luxury goods. The own-price elasticities of demand are again negative for each good, as expected. Now both beer and wine are own-price inelastic, while spirits are own-price elastic. The compensated price elasticities still suggest that the three beverages are substitutes for each other.

The rest of the results are:

Let’s explore these results for the unrestricted model a little further.

Good news!

There are other things that can be done with the ‘micEconAids’ package in R. It’s a great resource, and is just one of the packages available from the ‘micEcon’ project.

So, congratulations to Angus Deaton on his Nobel Prize, and let’s not forget the many seminal contributions that he made to consumer demand theory, beyond the AIDS model.

References

Barten, A. P., 1964. Consumer demand functions under conditions of almost additive preferences. *Econometrica*, 32, 1-38.

Deaton, A. and J. Muellbauer, 1980. An almost ideal demand system. *American Economic Review*, 70, 312-326.

Henningsen, A., 2015, Demand analysis with the almost ideal demand system: Package ‘micEconAids’, CRAN Repository.

Michalek, J. and M. A. Keyzer, 1992. Estimation of a two-stage LES-AIDS consumer demand system for eight EC countries. *European Review of Agricultural Economics*, 19, 137-163.

Selvanathan, E. A., 1995. Data-analytic techniques for consumer economics. In E. A. Selvanathan and K. W. Clements (eds.), *Recent Developments in Applied Demand Analysis: Alcohol, Advertising and Global Consumption*. Springer, Berlin.

Stone, R., 1954. Linear expenditure systems and demand analysis: An application to the pattern of British demand. *Economic Journal*, 64, 511-527.

Theil, H., 1965. The information approach to demand analysis. *Econometrica*, 33, 67-87.


(This article was first published on ** Revolutions**, and kindly contributed to R-bloggers)

by Michele Usuelli

Microsoft Data Scientist

Azure Machine Learning Studio is a drag-and-drop tool to deploy data-driven solutions. It contains pre-built items including data preparation tools and Machine Learning algorithms. In addition, it allows you to include custom R and Python scripts.

In order to build powerful R tools, you might want to use some packages from the CRAN repository. Azure ML already contains only a few packages, so you might need to include some others. CRAN hosts 7,000+ packages, of which you will need just a few. For this purpose, you can use the miniCRAN package, which creates a local repository containing a selection of packages and their dependencies.

You can get a free Azure ML subscription following this:

https://azure.microsoft.com/en-us/trial/get-started-machine-learning

After having subscribed to Azure ML, the first step is creating a miniCRAN local repository. You can find some instructions at this link:

http://blog.revolutionanalytics.com/2014/10/introducing-minicran.html

Azure ML is based on Windows, so in the function *makeRepo* you need to include the argument *type = "win.binary"*. In this demo, you will use the ggplot2 package, so it should be in the list.
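For reference, building such a repository for ggplot2 and its dependencies might look roughly like this. The folder name follows the post; the CRAN mirror URL is an assumption, and the R version implied by the *3.1* folder above is whatever your Azure ML workspace runs.

```r
library(miniCRAN)

cranMirror <- c(CRAN = "https://cran.r-project.org")

# ggplot2 plus everything it depends on
pkgList <- pkgDep("ggplot2", repos = cranMirror, type = "win.binary")

# build a local repository of Windows binaries, since Azure ML runs Windows
dir.create("repoCRANwin", showWarnings = FALSE)
makeRepo(pkgList, path = "repoCRANwin",
         repos = cranMirror, type = "win.binary")
```

Note that `pkgDep()` and `makeRepo()` need network access to the CRAN mirror when you run them.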

After you create your own repository (called repoCRANwin, for instance), the package binary files are stored in the folder *repoCRANwin/bin/windows/contrib/3.1*.

Now, you need to zip the main folder *repoCRANwin* and upload it to Azure ML. For this purpose, from the Azure ML menu, you need to select:

*New -> Dataset -> From local file*

After having clicked on *New* (on the bottom-left), you should see this

Now you need to create a new Azure ML experiment, open the *Saved Datasets -> My Dataset* tab, and drag and drop repoCRANwin.zip into the experiment.

Then, you include a custom R script from *R Language Modules -> Execute R script.*

In order to connect *repoCRANwin.zip* to the R script, you need to drag its output to the right-hand side input of *Execute R script*.

Opening the Execute R script, you can edit its R code. Your targets are

- Setting up the miniCRAN repository
- Extracting the list of available packages
- Testing a package, e.g. ggplot2

This is the R script to include:

# setting-up the repository

uri_repo <- "file:///C:/src/repoCRANwin/"

options(repos = uri_repo)

# extracting the list of available packages

table_packages <- data.frame(package = rownames(available.packages()))

# installing the ggplot2 package

install.packages("ggplot2")

library("ggplot2")

# building a sample ggplot2 chart

p <- qplot(iris$Species)

print(p)

# outputting the list of packages

maml.mapOutputPort("table_packages")

Execute R script has two outputs:

- the list of packages (on the bottom left-hand side)
- a sample ggplot2 chart (on the bottom right-hand side)

If you click on the left-hand side output and select "Visualize", you'll see this:

The "package" column contains the packages that can be installed and loaded.

If you click on the right-hand side output, you'll see a sample ggplot2 chart. If this works, ggplot2 has been loaded and used properly, so we expect that most of the other packages will work.

Loading miniCRAN into an Azure ML R script allows you to access any package that you included. If you have a list of packages that you will use, you can just create a local miniCRAN archive and upload it. Then, you'll just need to input miniCRAN to the related R scripts and include a few lines of R code to configure it into each script. A next step could be defining a miniCRAN repository for each topic. For instance, there might be one for data preparation, one for Machine Learning, and another for data visualization.


(This article was first published on ** eoda, R und Datenanalyse » eoda english R news**, and kindly contributed to R-bloggers)

The eoda R-Academy course “R in Live Systems” teaches key aspects of using R in a productive business environment with many practical examples from 16^{th} to 17^{th} November 2015 in Kassel, Germany.

The professional use of R imposes special requirements in terms of reproducibility, compatibility, teamwork, load distribution, and rights management. R scripts should produce reproducible results at any time, even after updates to packages, R itself, or the surrounding environment.

By means of versioning software, R scripts can be managed by multiple employees at the same time. In addition, automated testing ensures that changes in one script do not have unintended consequences for other scripts.

Finally, the client-server architecture is an essential part of productive R environments, because it allows access authorisation and computing power to be controlled more effectively.

R is in the process of becoming the multi-platform lingua franca of data analysis. In companies, the powerful programming language is used more and more frequently. “R in Live Systems” is suitable for everyone who has already used R in a business environment or wants to start working with R in the future.

Please sign up if you would like to participate: http://www.eoda.de/de/R_im_produktiven_Unternehmensumfeld.html

Course topics at a glance:

- Updates of packages and R
- Working in a closed environment
- Testing
- Versioning and collaboration
- Documentation and package creation
- R in client-server architecture
- …

Date: 16^{th} and 17^{th} November 2015

Place: Kassel, Germany

Language: German


(This article was first published on ** r4stats.com » R**, and kindly contributed to R-bloggers)

Rexer Analytics has released preliminary results showing the usage of various data science tools. I’ve added the results to my continuously-updated article, The Popularity of Data Analysis Software. For your convenience, the new section is repeated below.

**Surveys of Use**

One way to estimate the relative popularity of data analysis software is through a survey. Rexer Analytics conducts such a survey every other year, asking a wide range of questions regarding data science (previously referred to as data mining by the survey itself). Figure 6a shows the tools that respondents reported using in 2015.

We see that R has a more than 2-to-1 lead over the next most popular packages, SPSS Statistics and SAS. Microsoft’s Excel Data Mining software is slightly less popular, but note that it is rarely used as the primary tool. Tableau comes next, also rarely used as the primary tool. That’s to be expected as Tableau is principally a visualization tool with minimal capabilities for advanced analytics.

The next batch of software appears at first to be all in the 15% to 20% range, but KNIME and RapidMiner are listed both in their free versions and, much further down, in their commercial versions. These data come from a “check all that apply” type of question, so if we add the two amounts, we may be over-counting. However, the survey also asked, “What *one* (my emphasis) data mining / analytic software package did you use most frequently in the past year?” Using these data, I combined the free and commercial versions and plotted the top 10 packages again in figure 6b. Since other software combinations are likely (e.g., SAS and Enterprise Miner, or SPSS Statistics and SPSS Modeler), I combined a few others as well.

In this view we see R even more dominant, with over a 3-to-1 advantage compared to the software from IBM SPSS and SAS Institute. However, the overall ranking of the top three didn’t change. KNIME however rises from 9th place to 4th. RapidMiner rises as well, from 10th place to 6th. KNIME has roughly a 2-to-1 lead over RapidMiner, even though these two packages have similar capabilities and both use a workflow user interface. This may be due to RapidMiner’s move to a more commercially oriented licensing approach. For free, you can still get an older version of RapidMiner or a version of the latest release that is quite limited in the types of data files it can read. Even the academic license for RapidMiner is constrained by the fact that the company views “funded activity” (e.g. research done on government grants) the same as commercial work. The KNIME license is much more generous as the company makes its money from add-ons that increase productivity, collaboration and performance, rather than limiting analytic features or access to popular data formats.

If you found this interesting, you can read about the results of other surveys and several other ways to measure software popularity here.

Is your organization still learning R? I’d be happy to stop by and help. I also have a workshop, *R for SAS, SPSS and Stata Users,* on DataCamp.com. If you found this post useful, I invite you to follow me on Twitter.


(This article was first published on ** A HopStat and Jump Away » Rbloggers**, and kindly contributed to R-bloggers)

This blog post is a little late; I wanted to get it out sooner.

As new students have flooded the halls for the new terms at JHU Biostat, I figured I would give some recommendations to our new students, and biostatistics students in general. Some of these things may be specific to our department, but others are general, so the title should be fitting. Let's dive in!

Some books are good for a reference, many are not. I say this because much of the information is available on Google or the internet and you will check that 98% of the time compared to going to a book. That being said, many students have these good reference books and will be willing to let you borrow them. Also, the library in your department or school will likely have them.

The full recommendation for books is this:

- Borrow books you need for class, especially from current (not new) students. Sharing books with current students is good except if you both need it during crucial times (like exams/comprehensive exams). Everyone has Chung or Billingsley.
- Of those you can't borrow, go to class for a week or two and see if you **actually** need it. Some professors go straight off their lecture notes. Your school bookstore doesn't just go and send back all their copies when school starts, so you can still get it. Also, I heard this new website Amazon has books.
- If you think a book is a really good reference, buy a copy. Better yet, buy a digital copy so you can digitally search and annotate it.

You will be spending the majority of your time on your laptop, so it better work and be fast. Most new programs will have some money for books and a laptop. If you read above, you saved some money on books, so use it to buy a new laptop. If your laptop is less than 2 years old, you can save that money (if PhD) or buy other electronics such as an iPad for notetaking (if Master's).

Have the tools to make your work easy because nothing is worse than you not getting work done due to other factors than yourself.

Get a Unix-like machine (aka Mac). Others say you can do stuff in Windows, but it's easier for some software in Unix. Cluster computing (see below) will be easier as well.

Side note: if you buy a new computer, do not open it until Friday afternoon/Saturday as you will likely spend a whole day playing with your new gear.

I find many students know who the faculty are and what research they do, but have no idea about who the staff is. These people know almost **everything** you need to know for non-research help. They schedule meetings with the chair, organize events, schedule rooms, and, very importantly, know how to get you paid/your stipend. These people are the glue that makes everything run and are a great resource.

Go into the office and introduce yourself and ask what you should go to person X for. They will know you then when you email.

If you want to learn what research is all about, get involved early. Even if you don't feel like you know anything, waiting to get involved on a research project will not help. It can hinder you. I'm not saying work 10 hours per week on a project; you have classes.

Attending research meetings of a few working groups can help you 1) get information on the group and how it's run, 2) meet the group members, 3) choose what you may want to focus on, and 4) get you a small-scale project to start on.

This small project is not set in stone. It is not your thesis. The project contact doesn't have to be your thesis advisor. You will likely be working on this “for free” (unless it's under a training grant mentor, technically). Therefore, you don't “owe” anyone anything if you decide in a month you hate that field or project. Don't take it lightly to abandon a project, but do use it as a feeler in that area.

Let me reiterate (at least in our department): Your academic advisor doesn't need to be your research advisor.

Learn how to program as soon as possible. Some good resources are Code School or Codecademy. If using R, I recommend first Try R from Code School. I would then move on to Swirl. It will never be a waste of time getting up to speed or learning how to do something new with programming. If you already feel great with R, you can try Python or move deeper into R.
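Getting started with Swirl, for example, takes only a few lines at the R console:

```r
# install and launch swirl's interactive R lessons
install.packages("swirl")
library(swirl)
swirl()
```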

This may be necessary later in your program, but try to do it before it's “necessary”.

Invariably, you are going to work on some project that 1) uses simulations or 2) requires intense computation. A computing cluster is made specifically for these scenarios. Learn how to use it. If you're not going to use it now for research, at least get your login and try it briefly for a class project.

Condense your note-taking into one app. I like using Evernote as it syncs with my phone and Mac.

Use Dropbox or Google Drive to have a “hands-free” syncing service. Also, think about investing in an external hard drive, maybe as your new gear, to doubly back up your system/data. Laptops can be (and have been) stolen. Although Google Drive/Dropbox are likely to be around for some time, you always want something in your control (the external HDD) in case something goes wrong on a server. GitHub is great for version control, and some people use it as a backup of sorts, but it's not really meant for that, and it's not a “hands-free” rsync-based solution.

Learn a Markdown language. Yihui has a good description of Markdown vs. LaTeX. You will need to know both. Think about learning some basics of HTML as well.

With your newfound HTML skills from above, build a webpage for yourself. Some use GitHub or WordPress. Many options exist, depending on your level of expertise, blogging needs, and desired level of control.

Why do I need a webpage? You work on a computer (after classes) like 98% of the day. You should have a web presence.

What about my LinkedIn profile? That's good for a resume-like page, but not great for your opinions, picture uploads, general ideas. Also, your webpage allows you to control completely what you put out there. Remember, if you don't make the content, Google will pick what people see.

Check out student websites and ask the student whose you like best how they did it.

One of my rules is to never be scared to ask a stupid question. I ask questions all the time. Some of them are stupid. I know that I won't get an answer if I don't ask though.

We have offices. Students are in those offices. Ask them questions. It's that simple.

Many students say “well I don't want to bother them”. I learned how to code by bothering people. I bothered them very much so. I thought I was annoying, but I didn't care because I didn't know what the hell I was doing.

Does that mean I want questions all day by new students? No. Read that again. No. But I do try to pay forward information to new students just like others paid towards me. If a student is curt or makes you feel stupid about asking a question, stop talking to them. They forgot what it was like when they were lost and confused and are likely now severely delusional.

Your questions are usually not new. We've asked them likely ourselves. We either have the answer or know who does. Ask.

No one in my office knows anything!? Who do I ask now? Well, there are student-led meetings. These have a lot of information and… other students! Go there, ask questions. If the topic is not what you need to know, wait until the end of the meeting, when the structure breaks down, and ask someone then.

Student-led meetings have a lot less pressure to ask the “stupid questions” in a safer environment and will likely lead to answers that you understand, because they come from other students.

Get chummy with your cohort. You don't have to be best friends forever, but you will talk with them, have class with them, and likely work with them. Stop doing things on your own; that's not leveraging other people's brains as well as your own.

These are other smart people (they were smarter than me). Why not work with them and grab some of that braininess floating around? You will feel dumb for a while, but you'll figure it out. If you don't work with a group in the beginning, it may be too late later when people have grouped up.

They are not your competition, though many departments make it seem like that. The next stage of your career will mix team projects with the rare project where you are alone (aka your thesis). Learn how to play with, and more importantly listen to, others.

“I came to grad school to get a 4.0,” said no one ever. Grades matter only for somewhat narrow things: if the comprehensive exams go badly, as “an assessment” of your learning, or if you apply to a job with a Master's and they ask for your transcript (and for some reason care).

But good grades are not the goal of grad school. It's learning. Learn and understand the material. Learn how to learn new material. That's the goal. Grades matter in the sense that they will let you know quite glaringly when you **really** don't know something. Remember that learning is improving yourself, and that should make it easier to do a project than just doing it “because someone told you to”.

You need rest. Take it. A day off can clarify things later. Sometimes it's only when you stop hitting your head against the wall that you realize what you're doing doesn't work. That's not to say you won't still work 60 hours a week for a while, but make sure you have some protected time for your head-banging.

One of the best pieces of advice I've ever gotten for grad school was: “find a place you want to spend the next 5 years of your life”, in reference to your department **AND** city. Whatever city you're in has fun things to do. Find them. Explore your city and area. People tend to hate the places they lived during grad school if they don't associate anything with them other than working in a hole. Which leads me to…

Find a place where you are productive and like to go. I like the office; others don't. Find a coffee shop near you for days without class or for when classes are done. Use the reading room or other areas as your go-to. Again, working somewhere you don't like is one more hurdle to getting things done. Get rid of such hurdles; you will have enough of your own.

To **leave a comment** for the author, please follow the link and comment on their blog: **A HopStat and Jump Away » Rbloggers**.


(This article was first published on **analytics for fun**, and kindly contributed to R-bloggers)

Recently my friend Andrew Geisler released a new version of the GAR package. Like other similar packages, the GAR package is designed to help you retrieve data from Google Analytics using R, but with some new features.

I have been playing a bit with the package, and the feature I enjoy the most is the ability to **query multiple Google Analytics View IDs** in the same query. To do that, you simply need to pass a vector of the View IDs in the corresponding *gaRequest()* command, and you get back a data frame with each view/profile clearly identified and all the corresponding metrics/dimensions you included in the query. Pretty simple, no?

I think this is a very useful feature which makes the GAR package stand out from other similar packages out there (as far as I know there are currently 4 Google Analytics packages available: RGoogleAnalytics, RGA, ganalytics and GAR of course).

You could also build a loop in R to query multiple View IDs at once, and this is actually what I did previously using the RGoogleAnalytics package. But having this feature built into the package just makes your life easier!
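For illustration, the loop approach can be sketched like this. `fetch_view()` below is a hypothetical stand-in for a single-ID query function (one RGoogleAnalytics call, say); it is stubbed out here, since only the looping-and-stacking pattern is the point:

```r
# Hypothetical stand-in for a single-view query function
fetch_view <- function(view_id) {
  data.frame(profileId = view_id,
             sessions  = NA_real_,   # placeholder metric column
             stringsAsFactors = FALSE)
}

view_ids <- c("ga:123456789", "ga:987654321")

# Query each view in turn and stack the results into one data frame
df_all <- do.call(rbind, lapply(view_ids, fetch_view))
df_all
```

With multi-ID support in *gaRequest()*, this whole loop collapses into a single call.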

The GAR package is available on CRAN (v1.1 was released on 17 Sep 2015), and you can install and load it with the following commands:

install.packages('GAR', type = 'source')
library(GAR)

Getting data from Google Analytics is easy and similar to other packages.

First of all you need to:

- Create a new project in the Google Developers API Console, if you have not done it before.
- Authenticate using your project credentials.

You can find a detailed explanation for these two steps on the GAR github tutorial here.

So, assuming you got the authentication right and obtained a token, you now need to make sure your token is refreshed (GA access tokens expire) every time you need to retrieve data, and finally execute your query from R.

To refresh the token you use the *tokenRefresh()* function. The resulting access token will be stored as an environmental variable accessible by the GAR Package.

tokenRefresh(GAR_CLIENT_ID, GAR_CLIENT_SECRET, GAR_REFRESH_TOKEN)

To get the data, you will use the *gaRequest()* function.

df <- gaRequest(
  id         = c('ga:123456789', 'ga:987654321'),
  dimensions = 'ga:date,ga:month',
  metrics    = 'ga:sessions,ga:users,ga:pageviews',
  start      = 'YYYY-MM-DD',
  end        = 'YYYY-MM-DD',
  sort       = '-ga:sessions,ga:users'
)

The arguments of this function are based on the structure of the typical API call to Google Analytics. So, it’s here that you will specify all the parameters of your query (metrics, dimensions, period,etc.). And it is here in particular that you **specify the Google Analytics View IDs** you would like to get the data from.

Of course the *gaRequest()* function will authenticate using the access token previously stored as an environmental variable.

Let’s run an example. In the query below I am asking Google Analytics API to retrieve data about sessions and pageviews between 10 Oct 2015 to 11 Oct 2015, from five distinct View IDs.

df <- gaRequest(
  id         = c('ga:83424646', 'ga:77989457', 'ga:82857332', 'ga:65743580', 'ga:65743194'),
  dimensions = 'ga:date,ga:month',
  metrics    = 'ga:sessions,ga:pageviews',
  start      = '2015-10-10',
  end        = '2015-10-11',
  sort       = '-ga:sessions,ga:pageviews'
)

As expected, the resulting dataset has a total of 10 rows (5 View IDs x 2 days).

As you can see in the screenshot, in addition to the metrics and dimensions you requested, the resulting data frame also contains details about your request, such as:

- profile ID (or View ID)
- accountId
- webPropertyId
- internalWebPropertyId
- profileName (or View name)
- tableId
- start-date
- end-date

Now that you have your output data frame, you might want to categorize the different websites or Views according to specific criteria and apply aggregate functions (sum, average). It's up to you and your internal business reporting needs. The key thing is that **all the data you requested are included in a single table**, ready to be analysed with R.
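As a sketch of that last step, here is a hypothetical example (the data frame below is made up, not actual Google Analytics output) of totalling sessions per view with base R's `aggregate()`:

```r
# Hypothetical gaRequest()-style output: one row per view per day
df <- data.frame(
  profileId = rep(c("ga:123456789", "ga:987654321"), each = 2),
  date      = rep(c("2015-10-10", "2015-10-11"), times = 2),
  sessions  = c(10, 20, 5, 15),
  stringsAsFactors = FALSE
)

# Total sessions per view across the whole period
aggregate(sessions ~ profileId, data = df, FUN = sum)
```

The same pattern works for means, or for any grouping column you add yourself (e.g. a website category).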

Happy analysis!

To **leave a comment** for the author, please follow the link and comment on their blog: **analytics for fun**.


(This article was first published on **Econometrics by Simulation**, and kindly contributed to R-bloggers)

The other day one of my friends on Facebook posted this video about the mystical nature of the number 9. Being skeptical of all things hokey, I decided to experiment with numbers to see how special 9 really is.
### 1. Magic 9 is embedded in the circle

### 2. Sides of the polygon

### 3. All of the digits less than 9 add up to 9 (1+2+3+4+5+6+7+8=36…3+6=9)

### 4. Nine plus any digit returns that digit (9+7=16…1+6=7)

### 5. The Magical Marvelous Mystery of Base 10

There are four claims made in the video about the magic of nine.

1. Partition a circle in half as many times as you like and the digits of each result add up to nine

2. Add the sides of a regular polygon together and their digits sum to 9

3. Add all of the digits up to 9 and they sum to 9

4. Add 9 to any other digit and it returns that digit.

In this post I will address all of these patterns and demonstrate conclusively that 9 is not special, but only a feature of using a base 10 system (that is, we count to 9 before starting over again: for example 9, 10, 11 or 19, 20, 21, etc.).

This may seem like a silly issue to address. However, a lot of people have watched this video (6 million plus, either on YouTube or the original Facebook post). In this post I will support my arguments with some custom R functions built for the task. You need not have R or run the functions to understand the results.

At the beginning of the video, the author suggests that there is something special about 9 because when you divide the degrees of a circle in half, all of the digits add up to 9. Not only that, but when you divide each of those halves, the digits again add up to nine.

360 … 3+6+0=9

180 … 1+8+0=9

90 … 9+0=9

45 … 4+5=9

22.5 … 2+2+5=9

11.25…1+1+2+5=9

Once you reach double digits you add those together, and so the pattern continues.

5.625 … 5+6+2+5=18 … 1+8=9

At 150 splits (ignoring 0s and decimals) you get a number that looks like this:

1261168617892335712057451360671424133527367330073300513283903262277297315100935657007585133880800115163136

And when you add them all together they equal: 396 … 3+9+6=18 … 1+8=9

So, as far as I am willing to explore, the pattern seems to hold.

At first look I said, “okay, but is this pattern just for 9s?”

After some exploration I came to the conclusion “yes” (at least within a base 10 system). There seem to be some other patterns, but nothing as straightforward as the 9s.
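There is a short arithmetic reason for this: in base 10, a number is congruent to the sum of its digits modulo 9, so the repeated digit sum (the "digit root") of any multiple of 9 is always 9. Halving 360 while dropping the decimal point just multiplies by 5, which preserves divisibility by 9. A quick sketch of my own in R (not part of the original code):

```r
# Digit root in base 10: n and its digit sum are congruent mod 9
digit_root <- function(n) if (n == 0) 0 else 1 + (n - 1) %% 9

# 360, its halves, and the decimals with the point dropped (22.5 -> 225, ...)
angles <- c(360, 180, 90, 45, 225, 1125)
sapply(angles, digit_root)  # all 9
```

No other starting digit is a multiple of 9, which is why 1 through 8 never show this behaviour.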

In order to explore this efficiently, I programmed a “splitter” function in R. It will repeatedly split any number in half, take the digits, and add them together. See below:

library(magrittr)
library(gmp)

splitter <- function(X, it, nest, noisy = TRUE) {
  Ys <- matrix(NA, nrow = it, ncol = nest)
  options(digits = 22, scipen = 999)

  # Drop the decimal point (if any), then split a number into its digits
  esum <- function(x)
    x %>% toString %>% sub(".", "", ., fixed = TRUE) %>%
      strsplit("") %>% unlist %>% as.numeric

  for (i in 0:(it - 1)) {
    x <- as.bigz(X)
    x <- x * 10^i / 2^i  # the i-th halving, rescaled to an integer
    Y <- x %>% esum
    if (noisy) print(sprintf("%s: %s -> sum(%s)=%s",
                             i, x, paste(Y, collapse = " "), sum(Y)))
    Ys[i + 1, 1] <- sum(Y)
    # Nested digit sums: sum of digits, sum of that sum's digits, etc.
    for (j in 2:nest) Ys[i + 1, j] <- Ys[i + 1, j - 1] %>% esum %>% sum
  }
  Ys
}

# So let's first examine 9
splitter(X = 9, it = 150, 3)

# The first column is the sum of the digits.
# The second column is the sum of the sum of the digits.
# The third column is the sum of the sum of the sum of the digits.

# Yes, the first 4 halves produce a situation in which the digits all add up
# to 9. For the next 5 through 30 splits, the sums of the digit sums must be
# added together to also produce the designated 9. As we get deeper there is
# no reason to suspect that this will not carry to the next level.

splitter(8, 50, 3, noisy = FALSE)
splitter(7, 50, 3, noisy = FALSE)
splitter(6, 50, 3, noisy = FALSE)
splitter(5, 50, 3, noisy = FALSE)
splitter(4, 50, 3, noisy = FALSE)
splitter(3, 50, 3, noisy = FALSE)
splitter(2, 50, 3, noisy = FALSE)
splitter(1, 50, 3, noisy = FALSE)

# Looking at 1-8 we do not ever get the same number out as with 9.
# Does this make 9 unique, special, or even magical?


So, does this mean there is something to 9s? Well, maybe, but maybe it is a pattern that naturally emerges because we are using a base 10 system. What would happen if we switched to base 9? Or base 8?

In order to test this idea, I first programmed a function to switch numbers from base 10 to any other base.

base10to <- function(x, newbase = 10, sep = '') {
  if (length(dim(x)) == 0) xout <- rep("", length(x))
  if (length(dim(x)) == 2) xout <- matrix("", dim(x)[1], dim(x)[2])
  for (j in 1:length(x)) {
    x2 <- x[j]
    digits <- ((1 + x2) %>% as.bigz %>% log(newbase) %>% floor)
    d <- rep(NA, digits + 1)
    for (i in 0:digits) {
      d[i + 1] <- (x2 / newbase^(digits - i)) %>% as.numeric %>% floor
      x2 <- x2 - d[i + 1] * newbase^(digits - i)
    }
    xout[j] <- paste(d, collapse = sep)
  }
  xout
}

x <- matrix(1:100, 10, 10)
base10to(x)
base10to(x, 5)
base10to(x, 9)
base10to(x, 2)


It seems to be working. Note that it does not work with decimals.

Then I integrated it with my splitter:

# Now let's redefine our splitter allowing for non-base 10
splitter2 <- function(X, it, nest, noisy = TRUE, base = 10) {
  Ys <- matrix(NA, nrow = it, ncol = nest)
  esum <- function(x, base)
    x %>% base10to(base) %>% strsplit("") %>% unlist %>% as.numeric
  for (i in 0:(it - 1)) {
    x <- as.bigz(X)
    x <- x * 10^i / 2^i
    Y <- x %>% esum(base)
    if (noisy)
      print(sprintf("%s: %s -> sum(%s)=%s base %s",
                    i, x,
                    paste(Y, collapse = " "),
                    base10to(sum(Y), base),
                    base))
    Ys[i + 1, 1] <- sum(Y) %>% base10to(base)
    for (j in 2:nest) Ys[i + 1, j] <- Ys[i + 1, j - 1] %>% as.numeric %>%
      esum(10) %>% sum %>% base10to(base)
  }
  Ys
}


splitter2(9, 15, 3, noisy = TRUE)

Output:

      [,1] [,2]
 [1,] "9"  "9"
 [2,] "9"  "9"
 [3,] "9"  "9"
 [4,] "9"  "9"
 [5,] "18" "9"
 [6,] "18" "9"
 [7,] "18" "9"
 [8,] "18" "9"
 [9,] "27" "9"
[10,] "36" "9"
[11,] "45" "9"
[12,] "36" "9"
[13,] "45" "9"
[14,] "45" "9"
[15,] "45" "9"

splitter2(8, 15, 2, noisy = TRUE, base = 9)

Output:

      [,1] [,2]
 [1,] "8"  "8"
 [2,] "8"  "8"
 [3,] "8"  "8"
 [4,] "8"  "8"
 [5,] "26" "8"
 [6,] "26" "8"
 [7,] "17" "8"
 [8,] "17" "8"
 [9,] "35" "8"
[10,] "26" "8"
[11,] "26" "8"
[12,] "35" "8"
[13,] "35" "8"
[14,] "53" "8"
[15,] "35" "8"

splitter2(7, 15, 2, noisy = TRUE, base = 8)

Output:

      [,1] [,2]
 [1,] "07" "07"
 [2,] "07" "07"
 [3,] "16" "07"
 [4,] "16" "07"
 [5,] "16" "07"
 [6,] "25" "07"
 [7,] "34" "07"
 [8,] "25" "07"
 [9,] "34" "07"
[10,] "34" "07"
[11,] "34" "07"
[12,] "43" "07"
[13,] "52" "07"
[14,] "52" "07"
[15,] "70" "07"

splitter2(6, 15, 2, noisy = TRUE, base = 7)

Output:

      [,1] [,2]
 [1,] "6"  "6"
 [2,] "6"  "6"
 [3,] "6"  "6"
 [4,] "6"  "6"
 [5,] "24" "6"
 [6,] "24" "6"
 [7,] "24" "6"
 [8,] "33" "6"
 [9,] "33" "6"
[10,] "24" "6"
[11,] "33" "6"
[12,] "33" "6"
[13,] "51" "6"
[14,] "60" "6"
[15,] "51" "6"

splitter2(1, 15, 4, noisy = TRUE, base = 2)

Output:

      [,1]    [,2]  [,3] [,4]
 [1,] "01"    "01"  "01" "01"
 [2,] "10"    "01"  "01" "01"
 [3,] "11"    "10"  "01" "01"
 [4,] "110"   "10"  "01" "01"
 [5,] "101"   "10"  "01" "01"
 [6,] "110"   "10"  "01" "01"
 [7,] "0111"  "11"  "10" "01"
 [8,] "1000"  "01"  "01" "01"
 [9,] "1100"  "10"  "01" "01"
[10,] "1101"  "11"  "10" "01"
[11,] "1011"  "11"  "10" "01"
[12,] "01111" "100" "01" "01"
[13,] "1101"  "11"  "10" "01"
[14,] "1110"  "11"  "10" "01"
[15,] "10001" "10"  "01" "01"

Now we can see that by changing the base, the same pattern emerges. With base 9, the “magic” number is 8; with base 8 it is 7; and so on, all the way down to base 2, where the “magic” number is 1.
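This is exactly what the arithmetic predicts: in base b, a number is congruent to its base-b digit sum modulo b − 1, so any multiple of b − 1 collapses to a digit root of b − 1. A small sketch of my own (this function is mine, not part of splitter2):

```r
# Digit root in base b: n is congruent to its base-b digit sum mod (b - 1)
digit_root_base <- function(n, b) if (n == 0) 0 else 1 + (n - 1) %% (b - 1)

digit_root_base(9 * 40, 10)  # 9, base 10's "magic" number
digit_root_base(8 * 40, 9)   # 8
digit_root_base(7 * 40, 8)   # 7
digit_root_base(1 * 40, 2)   # 1
```

So the “magic” number in base b is simply b − 1, with nothing else special about 9.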

What about the other claims put forward in the video? For instance, that the angles of any regular polygon sum to a number whose digits add up to 9. That is, for a regular polygon:

triangle 60+60+60=180…1+8+0=9

square 90+90+90+90=360…3+6=9

pentagon 108+108+108+108+108=540…5+4=9

etc.

We can think of any regular polygon as being formed from n identical triangles, where n is the number of sides. The tips of those triangles meet at the center with angle 360/n. For the other angles, note that the two remaining angles of each triangle are equal, and since the angles of a triangle add up to 180, each one is (180 − 360/n)/2. However, each such triangle contributes only half of the polygon's interior angle at a vertex, so we need to double that, giving:

$$\theta(n) = 180 - \frac{360}{n} = 180\left(1 - \frac{2}{n}\right)$$ with n being the number of sides.

We can see that this could be written as 9 × 20 × (1 − 2/n), so a factor of 9 is built into the formula.
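As a numerical check of my own (not from the video): the interior angles of a regular n-gon sum to 180(n − 2), which is always a multiple of 9 × 20, so in base 10 the digit root comes out as 9 for every n:

```r
# Digit root in base 10
digit_root <- function(n) if (n == 0) 0 else 1 + (n - 1) %% 9

n <- 3:12                        # triangle, square, pentagon, ...
angle_sums <- 180 * (n - 2)      # 180, 360, 540, ...
sapply(angle_sums, digit_root)   # all 9
```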

So the question is: using an alternative base system, will we get the same pattern?

Let’s try base 9 instead of 10. Define the angles of the polygon now as 8 × 20 × (1 − 2/n) in base 10, making a full circle 320 degrees in base 10, or 385 in base 9. Now let’s look at the sums of the angles.

triangle 53⅓+53⅓+53⅓=160 base 10 or 187 base 9 … 1+8+7 = 16 base 10 or 17 base 9 … 1+7=8

square 80+80+80+80=320 base 10 or 385 base 9 … 3+8+5 = 16 … 8

pentagon 96+96+96+96+96=480 base 10 or 583 base 9 … 5+8+3 = 16 … 8

Hail the magical 8!

Do I need to do this again with a different base?

Base 10 magic 9: 1+2+3+4+5+6+7+8=36 base 10 … 3+6=9 YES!

Base 9 magic 8: 1+2+3+4+5+6+7=28 base 10 or 31 base 9… 3+1=4 nope!

Base 8 magic 7: 1+2+3+4+5+6=21 base 10 or 25 base 8 … 2+5=7 YES!

Base 7 magic 6: 1+2+3+4+5=15 base 10 or 21 base 7 … 2+1=3 nope!

Base 6 magic 5: 1+2+3+4=10 base 10 or 14 base 6 … 1+4=5 YES!

etc.

I think the pattern is pretty obvious here as well. Finally, what about claim 4, that adding the magic number to any digit returns that digit?

Base 10 magic 9: 9+5=14…1+4=5

Base 9 magic 8: 8+5=13 base 10 or 14 base 9…1+4=5

Base 8 magic 7: 7+5=12 base 10 or 14 base 8…1+4=5

Base 7 magic 6: 6+5=11 base 10 or 14 base 7…1+4=5

Base 6 magic 5: 5+5=10 base 10 or 14 base 6…1+4=5

etc.

Need I say more?

Clearly, we can see that there is nothing special about 9. If there is any mystery, it is in base 10, since as soon as we change the system from base 10 to base 9 the “magic” moves to another number.

We must therefore ask ourselves, “why are we using base 10?”

This is an excellent question! Other systems have developed, such as binary and the corresponding hexadecimal system, which are arguably more internally consistent than the base 10 system. With hexadecimal, every digit can be written as a series of four binary digits, reducing all communication to 0 or 1 signals. If you think about it, this is a much more natural communication system than a 10-digit one.

As for why we use the 10-digit system: my best guess is that most people have 10 fingers, and it is therefore easier to teach someone to count in a 10-digit system.

So, yes, 9 is special, but only because 10 is special, and 10 is only special because our hands developed in such a way that we typically have 5 fingers on each, summing to 10.

So YAY, MAGIC US!

To **leave a comment** for the author, please follow the link and comment on their blog: **Econometrics by Simulation**.


(This article was first published on **Revolutions**, and kindly contributed to R-bloggers)

PowerPoint is a powerful application for creating presentations, and allows you to include all sorts of text, pictures, animations and interactivity to create a compelling story. Most of the time you'll use the PowerPoint application to create slides, but if you want to include data and/or charts in your slides, in the interests of **reproducibility** you may want to automate the slide creation process. By using the R language with the PowerPoint API, you can recreate your slides in an instant whenever your data changes.

Asif Salam has created a nice tutorial showing how to use the RDCOMClient package to do exactly that. In the tutorial, Asif goes through the steps of creating the interactive visualization of Clint Eastwood's box office earnings shown below (which you can also download as a PPT file):

The tutorial is in three parts:

- The basics: using the PowerPoint API to create slides and add content (with some use of VBA for animations).
- Getting data: scraping data from IMDB on Clint Eastwood films and earnings. This part isn't specific to Powerpoint, and of course you can use any data you like in a presentation.
- Creating a slide: adding elements to a slide using R objects and functions, and creating the animation.

The complete R code and data behind this animated slide is available on GitHub, and will serve as a useful starting point for your own automated slide generation.
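For a flavour of what driving PowerPoint from R looks like, here is a minimal, untested sketch of the RDCOMClient pattern the tutorial builds on. It is Windows-only (COM), and the method names and the layout constant 12 (ppLayoutBlank) come from the Office object model rather than from Asif's code, so treat it as an illustrative assumption, not his implementation:

```r
library(RDCOMClient)  # Windows only

# Start PowerPoint and make it visible
ppt <- COMCreate("PowerPoint.Application")
ppt[["Visible"]] <- TRUE

# Add a presentation and a blank slide (12 = ppLayoutBlank in the Office API)
pres  <- ppt[["Presentations"]]$Add()
slide <- pres[["Slides"]]$Add(1L, 12L)

# Add a text box (1 = msoTextOrientationHorizontal) and set its text
box <- slide[["Shapes"]]$AddTextbox(1L, 100, 100, 400, 60)
box[["TextFrame"]][["TextRange"]][["Text"]] <- "Generated from R"
```

Once this skeleton runs, shapes, charts and animation effects can be added the same way; Asif's tutorial covers those details.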

By the way, if your needs are simpler and you just want to create static PowerPoint slides from R (with text, data, and graphics), take a look at Slidify. With Slidify, you can generate PowerPoint slides from R using just a Markdown document.

Asif Salam: Create amazing PowerPoint slides using R – The basics

To **leave a comment** for the author, please follow the link and comment on their blog: **Revolutions**.
