
The post PUT it there! Six tips for using PUT and %PUT statements in SAS appeared first on The DO Loop.

The post PUT it there! Six tips for using PUT and %PUT statements in SAS appeared first on All About Statistics.

For SAS programmers, the PUT statement in the DATA step and the %PUT macro statement are useful statements that enable you to display the values of variables and macro variables, respectively. By default, the output appears in the SAS log. This article shares a few tips that help you to use these statements more effectively.

The PUT statement supports a "named output" syntax that enables you to easily display a variable name and value. The trick is to put an equal sign immediately after the name of a variable: `PUT varname=;`
For example, the following statement displays the text "z=" followed by the value of z:

```sas
data _null_;
x = 9.1; y = 6;
z = sqrt(x**2 + y**2);
put z=;   /* display variable and value */
run;
```

```
z=10.9
```

You can extend the previous tip to arrays and to sets of variables. The PUT statement enables you to display elements of an array (or multiple variables) by specifying the array name in parentheses, followed by an equal sign in parentheses, as follows:

```sas
data _null_;
array x[5];
do k = 1 to dim(x);
   x[k] = k**2;
end;
put (x[*]) (=);     /* display each element of the array */
put (x1 x3 x5) (=); /* display the listed variables */
run;
```

```
x1=1 x2=4 x3=9 x4=16 x5=25
x1=1 x3=9 x5=25
```

This syntax is not supported for _TEMPORARY_ arrays. However, as a workaround, you can use the CATQ function to concatenate array values into a character variable, as follows:

```sas
temp = catq('d', ',', of x[*]);  /* x can be a _TEMPORARY_ array */
put temp=;
```

Incidentally, if you ever want to apply a format to the values, the format name goes inside the second set of parentheses, after the equal sign: `put (x1 x3 x5) (=6.2);`

The previous tip displayed all values on a single line. Sometimes it is useful to display each value on its own line. To do that, put a slash after the equal sign, as follows:

```sas
...
put (x[*]) (=/);  /* put each element on a separate line */
...
```

```
x1=1
x2=4
x3=9
x4=16
x5=25
```

You can display all values of all variables by using the _ALL_ keyword, as follows:

```sas
data _null_;
x = 9.1; y = 6;
z = sqrt(x**2 + y**2);
A = "SAS"; B = "Statistics";
put _ALL_;   /* display all variables and values */
run;
```

```
x=9.1 y=6 z=10.9 A=SAS B=Statistics _ERROR_=0 _N_=1
```

Notice that in addition to the user-defined variables, the _ALL_ keyword also prints the values of two automatic variables named _ERROR_ and _N_.

Just as the PUT statement displays the value of an ordinary variable, you can use the %PUT statement to display the value of a macro variable. If you use the special "&=" syntax, SAS will display the name and value of a macro variable. For example, to display your SAS version, you can display the value of the SYSVLONG automatic system macro variable, as follows:

```sas
%put &=SYSVLONG;
```

```
SYSVLONG=9.04.01M4P110916
```

The results above are for my system, which is running SAS 9.4M4. Your SAS version might be different.

You can display the name and value of all user-defined macros by using the _USER_ keyword. You can display the values of all SAS automatic system macros by using the _AUTOMATIC_ keyword.

```sas
%let N = 50;
%let NumSamples = 1e4;
%put _USER_;
```

```
GLOBAL N 50
GLOBAL NUMSAMPLES 1e4
```

There you have it: six tips to make it easier to display the value of SAS variables and macro variables.
Thanks to Jiangtang Hu who pointed out the `%PUT &=var` syntax in his blog in 2012.
For additional features of the PUT and %PUT statements, see:

**Please comment on the article here:** **The DO Loop**

The post New Series: ISOTYPE Books appeared first on All About Statistics.

Presenting facts through data is not a recent idea. Otto and Marie Neurath created ISOTYPE in the 1920s and then ran their ISOTYPE Institute for more than two decades. During that time, they created charts for a wide variety of publications. In this series, I will show a number of these charts that I have found, and discuss the context they appeared in.

Over the past year, I have bought a number of books containing ISOTYPE charts. Some contain many of them, some just a single one. Some use them throughout, some just have them as the inside cover. These are all from the 1940s through the 1960s.

What I’m trying to understand and illustrate at the same time is how prevalent ISOTYPE was for a while. I’ll be looking for hints in the books to see if authors felt the need to explain them, or just assumed that people smart enough to read the books would also get the charts (the latter seems to be the case, from what I’ve seen). Steve Haroz, Steven Franconeri, and I have also shown that the technique is effective as a means of showing data so people remember what they've seen.

While ISOTYPE, Neurath, and Gerd Arntz are known in information design circles, visualization people seem to be largely unaware. Yet it seems to be a forgotten success story for data-based communication, and perhaps a useful bit of history – it’s not like our history is so rich that we can’t use some of that.

In addition to the charts, these books are also fascinating time documents. Some were published just after World War II. I will try to capture some of that as well.

The plan is to write one ISOTYPE posting a month. With my current collection, that should give me enough material through this year at least (depending on how I group them, since some of them are series). In addition, I also want to talk about Neurath’s own writing, in particular *Modern Man in the Making* and his autobiography, *From Hieroglyphics to ISOTYPE*. Marie Neurath’s books after the war will also make an appearance at some point (they’re less numbers-based, but a similar style). I’m also starting to discover some of the other players in this space, like Rudolf Modley, who used ISOTYPE-style charts for various purposes.

There is clearly a lot to explore that visualization folks don’t seem to be aware of. I hope to shed some light on this under-appreciated part of our history.

**Please comment on the article here:** **eagereyes**


The post The Class-size debate – it matters to teachers appeared first on All About Statistics.

Class size is a perennial question in education. What is the ideal size for a school class? Teachers would like smaller classes, to improve learning. There is evidence of a small positive effect size due to reducing class size from meta-analysis published in John Hattie’s Visible Learning. But it makes sense, teachers argue – fewer children in the class means more opportunities for one-to-one interactions with the teacher. It makes for easier crowd control, less noise and less stress for teachers and pupils. And in these days of National Standards, it makes the assessment load more realistic.

I’d just like to point out that educational research is difficult. One of my favourite readings on educational research is an opinion piece by David Berliner, Educational Research: The hardest science of all, where he explains the challenge of educational research. It was written in response to a call by the US Government for evidence-based practices in education. Berliner reminds us of how many different factors contribute to learning. And measuring learning is itself an inexact science. At one point he asks: “It may be stretching a little, but imagine that Newton’s third law worked well in both the northern and southern hemispheres—except of course in Italy or New Zealand—and that the explanatory basis for that law was different in the two hemispheres. Such complexity would drive a physicist crazy, but it is a part of the day-to-day world of the educational researcher.”

So with this in mind, I decided to ask the experts. I asked NZ primary school teachers who are just gearing up for the 2017 school year. These teachers were invited via a Facebook group to participate in a very short poll using a Google Form. There were just eight questions – the year level they teach, the minimum, maximum and ideal size for a class at that level, how many children they are expecting in their class this year and how long they have been teaching. The actual wording for the question about ideal class size was: “In your opinion what is the ideal class size that will lead to good learning outcomes for the year level given above?” There were also two open-ended questions about how they had chosen their numbers, and what factors they think contribute to the decision on class-size.

Every time I do something like this, I underestimate how long the analysis will take. There were only eight questions, thought I. How hard can that be…. sigh. But in the interests of reporting back to the teachers as quickly as possible, I will summarise the numeric data, and deal with all the words later.

There were about 200 useable responses. There was a wide range of experience among the teachers. A third of the teachers had been teaching for five years or less, and 20% had been teaching for more than twenty years. There was no correlation between the perceived ideal class size and the experience of the teacher.

The graph below displays the results, comparing the ideal class-size for the different year levels. Each dot represents the response of one teacher. It is clear that the teachers believe that younger children require smaller classes. The median value for the ideal class size for a New Entrant, Year 1 and/or Year 2 class is 16. The median value for the ideal class size for Year 3/4 is 20, for Year 5/6 is 22 and for Year 7/8 is 24. The ideal class size increases as the year level goes up. It is interesting that even numbers are more popular than odd numbers. In the comments, teachers point out that 24 is a very good number for splitting children into equal-sized groups.

It is interesting to compare the maximum class size the teachers felt would lead to good learning outcomes. I also asked what class size they will be teaching this year. The table below gives the median response for the ideal class size, maximum acceptable, and current class size. It is notable that the current class sizes are all at least two students more than the maximum acceptable values, and between six and eight students more than the ideal value.

Median responses by year level:

| Year Level | Number of respondents | Ideal class size | Maximum acceptable | Current |
|---|---|---|---|---|
| New Entrant, Year 1/2 | 56 | 16 | 20 | 22 |
| Year 3/4 | 40 | 20 | 24.5 | 27.5 |
| Year 5/6 | 53 | 22 | 25 | 30 |
| Year 7/8 | 46 | 24 | 27 | 30 |

It appears that most teachers will be teaching classes that are considerably larger than desired. This looks like a problem. But it is also important to get the financial context. I asked myself how much money it would take to reduce all primary school classes by four pupils (moving below the maximum, but more than the ideal). Using figures from the Ministry of Education website, and assuming the current figures from the survey are indicative of class sizes throughout New Zealand, we would need about 3500 more classes. That is 3500 more rooms that would need to be provided, and 3500 more teachers to employ. It is an 18% increase in the number of classes. The increase in salaries alone would be over one hundred million dollars per year. This is not a trivial amount of money. It would certainly help with unemployment, but taxes would need to increase, or money would need to come from elsewhere.
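The back-of-envelope arithmetic above can be made explicit. The class counts and salary below are my own illustrative assumptions (chosen to be roughly consistent with the article's figures), not Ministry of Education numbers.

```python
# Rough cost of shrinking every primary class by four pupils.
# All inputs are illustrative assumptions, not official figures.
current_classes = 19_250   # assumed number of NZ primary classes
avg_class_size = 26        # assumed current average class size
reduction = 4              # pupils removed from each class

pupils = current_classes * avg_class_size
new_classes_needed = pupils / (avg_class_size - reduction) - current_classes
pct_increase = 100 * new_classes_needed / current_classes

avg_salary = 55_000        # assumed average teacher salary, NZ$
extra_salary_cost = new_classes_needed * avg_salary

print(f"{new_classes_needed:.0f} extra classes ({pct_increase:.0f}% more)")
print(f"extra salaries: NZ${extra_salary_cost / 1e6:.0f} million per year")
```

With these assumed inputs the script lands near the article's figures: roughly 3500 extra classes, an 18% increase, and a salary bill well over one hundred million dollars per year.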

Is this the best way to use the money? Should all classes be reduced or just some? How would we decide? How would it be implemented? If you decrease class sizes suddenly you create a shortage of teachers, and have to fill positions with untrained teachers, which has been shown to decrease the quality of education. Is the improvement worth the money?

My sympathies really are with classroom teachers. (If I were in charge, National Standards would be gone by lunchtime.) I know what a difference a few students in a class makes to all sorts of things. At the same time, this is not a simple problem, and the solution is far from simple. Discussion is good, and informed discussion is even better. Please feel free to comment below. (I will summarise the open-ended responses from the survey in a later post.)

**Please comment on the article here:** **Learn and Teach Statistics and Operations Research**


The post To know the past, one must first know the future: The relevance of decision-based thinking to statistical analysis appeared first on Statistical Modeling, Causal Inference, and Social Science.

The post To know the past, one must first know the future: The relevance of decision-based thinking to statistical analysis appeared first on All About Statistics.

We can break up any statistical problem into three steps:

1. Design and data collection.

2. Data analysis.

3. Decision making.

It’s well known that step 1 typically requires some thought of steps 2 and 3: It is only when you have a sense of what you will do with your data, that you can make decisions about where, when, and how accurately to take your measurements. In a survey, the plans for future data analysis influence which background variables to measure in the sample, whether to stratify or cluster; in an experiment, what pre-treatment measurements to take, whether to use blocking or multilevel treatment assignment; and so on.

The relevance of step 3 to step 2 is perhaps not so well understood. It came up in a recent thread following a comment by Nick Menzies. In many statistics textbooks (including my own), the steps of data analysis and decision making are kept separate: we first discuss how to analyze the data, with the general goal being the production of some (probabilistic) inferences that can be piped into any decision analysis.

But your decision plans may very well influence your analysis. Here are two ways this can happen:

– Precision. If you know ahead of time you only need to estimate a parameter to within an uncertainty of 0.1 (on some scale), say, and you have a simple analysis method that will give you this precision, you can just go simple and stop. This sort of thing occurs all the time.
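As a toy numerical illustration of the precision point (the sigma value here is invented): the standard error of a sample mean is sigma/sqrt(n), so the simple estimator already meets the requirement once n reaches (sigma/0.1) squared.

```python
import math

# Toy precision calculation: smallest n at which a plain sample mean
# achieves a target standard error, assuming a known sigma (invented here).
sigma = 2.0        # assumed population standard deviation
target_se = 0.1    # precision required by the decision problem

n_required = math.ceil((sigma / target_se) ** 2)  # from SE = sigma / sqrt(n)
print(n_required)  # -> 400: past this n, the simple analysis is good enough
```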

– Relevance. If you know that a particular variable is relevant to your decision making, you should not sweep it aside, even if it is not statistically significant (or, to put it Bayesianly, even if you cannot express much certainty in the sign of its coefficient). For example, the problem that motivated our meta-analysis of effects of survey incentives was a decision of whether to give incentives to respondents in a survey we were conducting, the dollar value of any such incentive, and whether to give the incentive before or after the survey interview. It was important to keep all these variables in the model, even if their coefficients were not statistically significant, because the whole purpose of our study was to estimate these parameters. This is not to say that one should use simple least squares: another impact of the anticipated decision analysis is to suggest parts of the analysis where regularization and prior information will be particularly crucial.
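A minimal simulated sketch of the relevance point (all numbers invented, and plain least squares used purely for illustration): the incentive coefficient is estimated and reported with its uncertainty, and it stays in the model even when it would fail the usual significance cutoff.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
incentive = rng.integers(0, 2, size=n).astype(float)  # incentive offered: 0/1
x = rng.normal(size=n)                                # background covariate
# True incentive effect is small (0.05), so it will often be "insignificant".
y = 1.0 + 0.05 * incentive + 0.3 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), incentive, x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X).diagonal())

# The decision needs this estimate, so report it with its uncertainty
# rather than dropping the variable for failing a significance test.
print(f"incentive effect: {beta[1]:.3f} (se {se[1]:.3f})")
```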

Conversely, a variable that is not relevant to decisions could be excluded from the analysis (possibly for reasons of cost, convenience, or stability), in which case you’d interpret inferences as implicitly averaging over some distribution of that variable.

**Please comment on the article here:** **Statistical Modeling, Causal Inference, and Social Science**


The post Frank Harrell statistics blog! appeared first on Statistical Modeling, Causal Inference, and Social Science.

The post Frank Harrell statistics blog! appeared first on All About Statistics.

Frank Harrell, author of an influential book on regression modeling and currently both a biostatistics professor and a statistician at the Food and Drug Administration, has started a blog. He sums up “some of his personal philosophy of statistics” here:

- Statistics needs to be fully integrated into research; experimental design is all important
- Don’t be afraid of using modern methods
- Preserve all the information in the data; avoid categorizing continuous variables and predicted values at all costs
- Don’t assume that anything operates linearly
- Account for model uncertainty and avoid it when possible by using subject matter knowledge
- Use the bootstrap routinely
- Make the sample size a random variable when possible
- Use Bayesian methods whenever possible
- Use excellent graphics, liberally
- To be trustworthy research must be reproducible
- All data manipulation and statistical analysis must be reproducible (one ramification being that I advise against the use of point and click software in most cases)

Harrell continues:

Statistics has multiple challenges today, which I [Harrell] break down into three major sources:

1. Statistics has been and continues to be taught in a traditional way, leading to statisticians believing that our historical approach to estimation, prediction, and inference was good enough.

2. Statisticians do not receive sufficient training in computer science and computational methods, too often leaving those areas to others who get so good at dealing with vast quantities of data that they assume they can be self-sufficient in statistical analysis and not seek involvement of statisticians. Many persons who analyze data do not have sufficient training in statistics.

3. Subject matter experts (e.g., clinical researchers and epidemiologists) try to avoid statistical complexity by “dumbing down” the problem using dichotomization, and statisticians, always trying to be helpful, fail to argue the case that dichotomization of continuous or ordinal variables is almost never an appropriate way to view or analyze data. Statisticians in general do not sufficiently involve themselves in measurement issues.

**Please comment on the article here:** **Statistical Modeling, Causal Inference, and Social Science**

The post Beginners Exercise: Bayesian Computation with Stan and Farmer Jöns appeared first on All About Statistics.

Over the last two years I’ve occasionally been giving a very basic tutorial to Bayesian statistics using R and Stan. At the end of the tutorial I hand out an exercise for those that want to flex their newly acquired skills. I call this exercise *Bayesian computation with Stan and Farmer Jöns* and it’s pretty cool! Now, it’s not cool because of *me*, but because the expressiveness of Stan allowed me to write a small number of data analytic questions that quickly take you from running a simple binomial model up to running a linear regression. Throughout the exercise you work with the same model code and each question just requires you to make a *minimal* change to this code, yet you will cover most models taught in a basic statistics course! Well, briefly at least… :) If you want to try out this exercise yourself, or use it for some other purpose, you can find it here:

Beginners Exercise: Bayesian computation with Stan and Farmer Jöns (R-markdown source)

Solutions to Bayesian computation with Stan and Farmer Jöns (R-markdown source)

My friend and colleague Christophe Carvenius also helped me translate this exercise into Python:

Python Beginners Exercise: Bayesian computation with Stan and Farmer Jöns

Python Solutions to Bayesian computation with Stan and Farmer Jöns

Now, this exercise would surely have been better if I’d used real data, but unfortunately I couldn’t find enough datasets related to cows… Finally, here is a depiction of farmer Jöns and his two lazy siblings by the great master Hokusai.

**Please comment on the article here:** **Publishable Stuff**


The post RStudio Conference 2017 Recap appeared first on All About Statistics.

The first ever RStudio conference was held January 11-14, 2017 in Orlando, FL. For anyone else like me who spends hours each working day staring into an RStudio session, the conference was truly *excellent*. The speaker lineup was diverse and covered lots of areas related to development in R, including the tidyverse, the RStudio IDE, Shiny, htmlwidgets, and authoring with RMarkdown.

This is *not* a complete list by any means — with split sessions I could only go to half the talks at most. Here are some noncomprehensive notes and links to slides and resources for some of the awesome things people are doing with R and RStudio that I learned about at the RStudio Conference.

One example: `lm`, a function that both computes values and does things, like printing those values to the screen, making it difficult to capture results (see broom). I asked Hadley after his talk about strategies to reduce issues getting Bioconductor data structures to play nicely with tidyverse tools. Within minutes **David Robinson** released a new feature in the **fuzzyjoin package** that leverages IRanges within this tidyverse-friendly package for efficiently doing things like joining on genomic intervals.

> Another #rstudioconf-inspired addition to fuzzyjoin: genome_join, for overlapping intervals on the same chromosome @genetics_blog #rstats pic.twitter.com/oUctyNYc09 — David Robinson (@drob) January 13, 2017
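The idea behind a genomic-interval join can be sketched in a few lines. This naive quadratic version is my own illustration of the concept, not fuzzyjoin's actual IRanges-backed algorithm; records are hypothetical (chromosome, start, end, name) tuples with inclusive coordinates.

```python
# Naive sketch of a genomic-interval join: pair up records that sit on
# the same chromosome and whose [start, end] intervals overlap.
def genome_join(left, right):
    """Yield every pair of records sharing a chromosome and overlapping."""
    for a in left:
        for b in right:
            same_chrom = a[0] == b[0]
            overlaps = a[1] <= b[2] and b[1] <= a[2]  # inclusive overlap test
            if same_chrom and overlaps:
                yield a, b

genes = [("chr1", 100, 200, "geneA"), ("chr2", 50, 80, "geneB")]
peaks = [("chr1", 150, 160, "peak1"), ("chr1", 300, 400, "peak2")]
matches = list(genome_join(genes, peaks))
print(matches)  # geneA pairs with peak1; nothing else overlaps
```

A real implementation would use an interval tree or sorted sweep to avoid the O(n·m) scan, which is roughly what leaning on IRanges buys you.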

purrr’s `map_*` functions are a consistent alternative to the base `?apply` and plyr `??ply` functions that I still use here and there. The `map_*` functions are integral to working with nested list-columns in dplyr, and I think I’m finally starting to grok how to work with these.

- Novelty *(potential danger: triviality)*
- Outlier *(spurious result; see also, p-hacking)*
- Archetype *(oversimplification)*
- Trend *(variance)*
- Debunking *(confirmation bias)*
- Forecast *(overfitting)*

- bookdown.org/yihui/bookdown — the bookdown book, written in RMarkdown with bookdown *(whoa, meta)*
- r4ds.had.co.nz — Garrett Grolemund and Hadley Wickham’s *R for Data Science* book
- tidytextmining.com — Julia and David’s book on text mining
- moderndive.com — an open-source introductory statistics class textbook

Finally, a few gems from other talks that I jotted down:

- **Chester Ismay** gave a great talk on teaching introductory statistics using R, with the open-source course textbook written in RMarkdown using bookdown.
- **Bob Rudis** talked about using pipes (`%>%`), and pipes within pipes, and best piping practices. See his slides here.
- **Hilary Parker** talked about the idea of *analysis development* (and analysis developers), drawing similarities to software development/developers. Hilary discussed this once before on the excellent podcast that she and Roger Peng host, and you can probably find it in their *Conversations On Data Science* ebook that summarizes and transcribes these conversations.
- **Simon Jackson** introduced the **corrr package** for exploring and manipulating correlations and correlation matrices in a tidy way.
- **Gordon Shotwell** introduced the **easymake** package that generates Makefiles from a data frame using R.
- **Karthik Ram** quickly introduced several of the (many) **rOpenSci packages** related to data publication, data access, scientific literature access, scalable & reproducible computing, databases, visualization, taxonomy, geospatial analysis, and many utility tools for data analysis and manipulation.

With split sessions I missed more than half the talks. Lots of people here are active on Twitter, and you can catch many more notes and tidbits on the #rstudioconf hashtag. The meeting was superbly organized, I learned a ton, and I enjoyed meeting in person many of the folks I follow on Twitter and elsewhere online. A few days of 80-degree weather in mid-January didn’t hurt either. I’ll definitely be coming again next year. Kudos to the rstudio::conf organizers and speakers!

All the talks were recorded and will supposedly find their way to rstudio.com at some point soon. I’ll update this post with a link when that happens.

**Please comment on the article here:** **Getting Genetics Done**

The post RStudio Conference 2017 Recap appeared first on All About Statistics.


The post No evidence of incumbency disadvantage? appeared first on Statistical Modeling, Causal Inference, and Social Science.

The post No evidence of incumbency disadvantage? appeared first on All About Statistics.

Several years ago I learned that the incumbency advantage in India was negative! There, the politicians are so unpopular that when they run for reelection they’re actually at a disadvantage, on average, compared to fresh candidates.

At least, that’s what I heard.

But Andy Hall and Anthony Fowler just wrote a paper claiming that, no, there’s no evidence for negative incumbency advantages anywhere. Hall writes,

We suspect the existing evidence is the result of journals’ preference for “surprising” results. Since positive incumbency effects have been known for a long time, you can’t publish “just another incumbency advantage” paper anymore, but finding a counterintuitive disadvantage seems more exciting.

And here’s how their paper begins:

Scholars have long studied incumbency advantages in the United States and other advanced democracies, but a recent spate of empirical studies claims to have identified incumbency disadvantages in other, sometimes less developed, democracies including Brazil, Colombia, India, Japan, Mexico, and Romania. . . . we reassess the existing evidence and conclude that there is little compelling evidence of incumbency disadvantage in any context so far studied. Some of the incumbency disadvantage results in the literature arise from unusual specifications and are not statistically robust. Others identify interesting phenomena that are conceptually distinct from what most scholars would think of as incumbency advantage/disadvantage. For example, some incumbency disadvantage results come from settings where incumbents are not allowed to run for reelection. . . .

Interesting. I’ve not looked at their paper in detail but one thing I noticed is that a lot of these cited papers seem to have been estimating the incumbent *party* advantage, which doesn’t seem to me to be the same as the incumbency advantage as it’s usually understood. This discontinuity thing seems like a classic example of looking for the keys under the lamppost. I discussed the problems with that approach several years ago in this 2005 post, which I never bothered to write up as a formal article. Given that these estimates are still floating around, I kinda wish I had.

**Please comment on the article here:** **Statistical Modeling, Causal Inference, and Social Science**

The post Vintage Years in Econometrics – The 1970′s appeared first on All About Statistics.

Continuing on from my earlier posts about vintage years for econometrics in the 1930's, 1940's, 1950's, 1960's, here's my tasting guide for the 1970's.

Once again, let me note that "in econometrics, what constitutes quality and importance is partly a matter of taste - just like wine! So, not all of you will agree with the choices I've made in the following compilation."

**1970:**


- Durbin, J., Testing for serial correlation in least-squares regression when some of the regressors are lagged dependent variables. *Econometrica*, 38, 410-421.
- Swamy, P. A. V. B., Efficient inference in a random coefficients model. *Econometrica*, 38, 311-323.
- Fair, R. C., The estimation of simultaneous equations models with lagged endogenous variables and first order serially correlated errors. *Econometrica*, 38, 507-516.
- Zellner, A. and M. S. Geisel, Analysis of distributed lags with applications to consumption function estimation. *Econometrica*, 38, 865-888.

**1971:**

- Kadane, J. B., Comparison of k-class estimators when the disturbances are small. *Econometrica*, 39, 723-737.
- Maddala, G. S., Generalized least squares with an estimated variance covariance matrix. *Econometrica*, 39, 23-33.
- Rothenberg, T. J., Identification in parametric models. *Econometrica*, 39, 577-591.
- Sargan, J. D. and W. M. Mikhail, A general approximation to the distribution of instrumental variables estimates. *Econometrica*, 39, 131-169.
- Zellner, A., *An Introduction to Bayesian Inference in Econometrics*. Wiley, New York.

**1972:**

- Dreze, J., Econometrics and decision theory. *Econometrica*, 40, 1-18.
- Goldberger, A. S., Structural equation methods in the social sciences. *Econometrica*, 40, 979-1001.
- Nerlove, M., Lags in economic behavior. *Econometrica*, 40, 221-251.
- Sawa, T., Finite-sample properties of the k-class estimators. *Econometrica*, 40, 653-680.
- Wallis, K. F., Testing for fourth order autocorrelation in quarterly regression equations. *Econometrica*, 40, 617-636.
- Wymer, C. R., Econometric estimation of stochastic differential equation systems. *Econometrica*, 40, 565-577.

**1973:**

- 1st volume of the *Journal of Econometrics*.
- Anderson, T. W. and T. Sawa, Distributions of estimates of coefficients of a single equation in a simultaneous system and their asymptotic expansions. *Econometrica*, 41, 683-714.
- Grether, D. M. and G. S. Maddala, Errors in variables and serially correlated disturbances in distributed lag models. *Econometrica*, 41, 255-262.
- Wu, D-M., Alternative tests of independence between stochastic regressors and disturbances. *Econometrica*, 41, 733-750.

**1974:**

- Amemiya, T., Multivariate regression and simultaneous equation models when the dependent variables are truncated normal. *Econometrica*, 42, 999-1012.
- Basmann, R. L., D. H. Richardson, and R. J. Rohr, An experimental study of structural estimators and test statistics associated with dynamical econometric models. *Econometrica*, 42, 717-730.
- Deaton, A. S., The analysis of consumer demand in the United Kingdom, 1900-1970. *Econometrica*, 42, 341-367.
- Fair, R. C., *A Model of Macroeconomic Activity. Volume I: The Theoretical Model*, Ballinger.
- Granger, C. W. J. and P. Newbold, Spurious regressions in econometrics. *Journal of Econometrics*, 2, 111-120.
- Griliches, Z., Errors in variables and other unobservables. *Econometrica*, 42, 971-998.
- Hendry, D. F., Stochastic specification in an aggregate demand model of the United Kingdom. *Econometrica*, 42, 559-578.
- Phillips, P. C. B., The estimation of some continuous time models. *Econometrica*, 42, 803-823.

- Goldfeld, S. M. and R. E. Quandt, Estimation in a disequilibrium model and the value of information. *Journal of Econometrics*, 5, 325-348.
- Hausman, J. A., An instrumental approach to full information estimators for linear and certain nonlinear econometric models. *Econometrica*, 43, 727-738.
- Sargan, J. D., Gram-Charlier approximations applied to t ratios of k-class estimators. *Econometrica*, 43, 327-346.

- Hendry, D. F., The structure of simultaneous equations estimators. *Journal of Econometrics*, 4, 51-88.
- McCallum, B. T., Rational expectations and the natural rate hypothesis: Some consistent estimates. *Econometrica*, 44, 43-52.
- Sargan, J. D., Econometric estimators and the Edgeworth approximation. *Econometrica*, 44, 421-448.

- Aigner, D. J. and G. G. Judge, Application of pre-test and Stein estimators to economic data. *Econometrica*, 45, 1279-1288.
- Aigner, D., C. A. K. Lovell, and P. Schmidt, Formulation and estimation of stochastic frontier production function models. *Journal of Econometrics*, 6, 21-37.
- Amemiya, T., The maximum likelihood and the nonlinear three-stage least squares estimator in the general nonlinear simultaneous equation model. *Econometrica*, 45, 955-968.
- Barten, A. P., The systems of consumer demand functions approach: A review. *Econometrica*, 45, 23-50.
- Berndt, E. R. and N. E. Savin, Conflict among criteria for testing hypotheses in the multivariate linear regression model. *Econometrica*, 45, 1263-1277.
- Byron, R. P., Efficient estimation and inference in large econometric systems. *Econometrica*, 45, 1499-1515.
- Manski, C. F. and S. R. Lerman, The estimation of choice probabilities from choice based samples. *Econometrica*, 45, 1977-1988.
- Mariano, R. S., Finite sample properties of instrumental variable estimators of structural coefficients. *Econometrica*, 45, 487-496.
- Phillips, P. C. B., Approximations to some finite sample distributions associated with a first-order stochastic difference equation. *Econometrica*, 45, 463-485.
- Phillips, P. C. B., A general theorem in the theory of asymptotic expansions as approximations to the finite sample distributions of econometric estimators. *Econometrica*, 45, 1517-1534.

- Beach, C. M. and J. G. MacKinnon, A maximum likelihood procedure for regression with autocorrelated errors. *Econometrica*, 46, 51-58.
- Geweke, J., Temporal aggregation in the linear regression model. *Econometrica*, 46, 643-651.
- Godfrey, L. G., Testing against general autoregressive and moving average error models when the regressors include lagged dependent variables. *Econometrica*, 46, 1293-1301.
- Hausman, J. A., Specification tests in econometrics. *Econometrica*, 46, 1251-1271.
- Heckman, J. J., Dummy endogenous variables in a simultaneous equation system. *Econometrica*, 46, 931-959.
- Koenker, R. and G. Bassett Jr., Regression quantiles. *Econometrica*, 46, 33-50.
- Leamer, E. E., *Specification Searches: Ad Hoc Inference with Nonexperimental Data*. Wiley, New York.
- Mundlak, Y., On the pooling of time series and cross section data. *Econometrica*, 46, 69-85.

- Dickey, D. A. and W. A. Fuller, Distribution of the estimators for autoregressive time series with a unit root. *Journal of the American Statistical Association*, 74, 427-431.
- Heckman, J. J., Sample selection bias as a specification error. *Econometrica*, 47, 153-161.

© 2017, David E. Giles

**Please comment on the article here:** **Econometrics Beat: Dave Giles' Blog**

The post Vintage Years in Econometrics – The 1970′s appeared first on All About Statistics.

The post Stan JSS paper out: “Stan: A probabilistic programming language” appeared first on Statistical Modeling, Causal Inference, and Social Science.

The post Stan JSS paper out: “Stan: A probabilistic programming language” appeared first on All About Statistics.

As a surprise welcome to 2017, our paper on how the Stan language works, along with an overview of how the MCMC and optimization algorithms work, hit the stands this week.

- Bob Carpenter, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. Stan: A Probabilistic Programming Language. *Journal of Statistical Software*, 76(1).

The authors are the developers at the time the first revision was submitted; we now have quite a few more developers. Because of that, we'd still prefer that people cite the manual, which is authored by the development team collectively, rather than this paper, which names only some of our current developers.

The original motivation for writing a paper was that Wikipedia rejected our attempts at posting a Stan Wikipedia page without a proper citation.

I’d like to thank Achim Zeileis at *JSS* for his patience and help during the final wrap-up.

**Abstract**

Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. As of version 2.14.0, Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm. Stan is also a platform for computing log densities and their gradients and Hessians, which can be used in alternative algorithms such as variational Bayes, expectation propagation, and marginal inference using approximate integration. To this end, Stan is set up so that the densities, gradients, and Hessians, along with intermediate quantities of the algorithm such as acceptance probabilities, are easily accessible. Stan can be called from the command line using the cmdstan package, through R using the rstan package, and through Python using the pystan package. All three interfaces support sampling and optimization-based inference with diagnostics and posterior analysis. rstan and pystan also provide access to log probabilities, gradients, Hessians, parameter transforms, and specialized plotting.
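To make the abstract's opening concrete, here is a minimal Stan program (an illustrative sketch written for this post, not an excerpt from the paper): the `data` block declares the observed quantities, the `parameters` block declares the unknowns, and the `model` block builds up the log probability function over the parameters conditioned on the data.

```stan
// Illustrative model: normally distributed data y with
// unknown location mu and unknown positive scale sigma.
data {
  int<lower=0> N;       // number of observations
  vector[N] y;          // observed data
}
parameters {
  real mu;
  real<lower=0> sigma;  // constraint enforced by a transform
}
model {
  // The sampling statement increments the target log density,
  // equivalent to: target += normal_lpdf(y | mu, sigma);
  y ~ normal(mu, sigma);
}
```

From R, fitting a model like this is a single call with the rstan package, e.g. `stan(file = "normal.stan", data = list(N = length(y), y = y))`, which compiles the program and runs NUTS with default settings.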

**BibTeX**

```bibtex
@article{stan:2017,
  author  = {Bob Carpenter and Andrew Gelman and Matthew Hoffman
             and Daniel Lee and Ben Goodrich and Michael Betancourt
             and Marcus Brubaker and Jiqiang Guo and Peter Li
             and Allen Riddell},
  title   = {Stan: {A} Probabilistic Programming Language},
  journal = {Journal of Statistical Software},
  volume  = {76},
  number  = {1},
  year    = {2017}
}
```

**Further reading**

Check out the Papers about Stan section of the Stan Citations web page. There's more info on our autodiff and on how variational inference works, and a link to the original NUTS paper. And of course, if you want to understand HMC and NUTS, don't miss Michael's latest, A conceptual introduction to HMC.
