<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">
 
 <title>Mark M. Fredrickson</title>
 
 <link href="http://www.markmfredrickson.com/" />
 <updated>2012-10-25T11:26:26-05:00</updated>
 <id>http://www.markmfredrickson.com/</id>
 <author>
   <name>Mark Fredrickson</name>
   <email>mark.m.fredrickson@gmail.com</email>
 </author>

 
 <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/MarkMFredrickson" /><feedburner:info uri="markmfredrickson" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry>
   <title>SLAMM 2012</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/V8bTOlDYnWk/2012-04-29-slamm-2012.html" />
   <updated>2012-04-29T00:00:00-05:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/slamm-2012</id>
   <content type="html">&lt;p&gt;At the &lt;a href='http://polisci.wustl.edu/node/434'&gt;St. Louis Area Methods Meeting&lt;/a&gt;, another participant pointed out that &lt;a href='http://www.markmfredrickson.com/thoughts/2011-04-20-st-louis-area-methods-meeting-2011.html'&gt;my previous SLAMM posting&lt;/a&gt; promised a follow up that never came. Oops. In lieu of actually finishing that report, I offer notes from the 2012 event.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='P1100970.jpg by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/6980750870/'&gt;&lt;img width='375' alt='P1100970.jpg' src='http://farm8.staticflickr.com/7246/6980750870_15ea749185.jpg' height='500' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;&lt;a href='http://imai.princeton.edu/'&gt;Kosuke Imai&lt;/a&gt; opened the conference by discussing the relationship between regression with fixed effects and a series of alternative techniques for estimating average causal effects. Imai did an excellent job of motivating the comparisons and showing how, with appropriate weighting, many approaches are special cases of fixed effect regression. I think this will be an important paper in obviating the &amp;#8220;regression vs. matching&amp;#8221; debate. As a contributor to &lt;a href='http://github.com/markmfredrickson/optmatch'&gt;Optmatch&lt;/a&gt;, an R package for matching, I am interested to see how to translate specific matching strategies into regression weights. Imai showed several simple matching techniques (such as matching within covariate strata), but it was not immediately clear how to generalize to arbitrary matching schemes.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Jake Bowers by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/7126862675/'&gt;&lt;img width='500' alt='Jake Bowers' src='http://farm8.staticflickr.com/7037/7126862675_15f9732e87.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;After lunch, &lt;a href='http://www.jakebowers.org'&gt;Jake Bowers&lt;/a&gt; and I presented work on testing complex and substantively interesting models on experimental data. Jake began by setting the scene and considering the problem of evaluating models in the presence of interference. I picked up the story by showing off the software we used and demonstrating with a different example.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='P1100983.jpg by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/6980764550/'&gt;&lt;img width='500' alt='P1100983.jpg' src='http://farm8.staticflickr.com/7266/6980764550_9b88bdaa35.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;My presentation had a fair amount of code demonstration. The SLAMM audience is usually a more technical one, so we were willing to risk showing some R code in the hopes that it would provide a concrete basis for understanding. It is not a technique that I would recommend for most audiences, but my sense was that it worked for the sharp folks in attendance.&lt;/p&gt;

&lt;p&gt;During my presentation, &lt;a href='http://www.rochester.edu/college/gradstudents/jolmsted/'&gt;Jonathan Olmsted&lt;/a&gt; from the University of Rochester was kind enough to grab a few photos on my camera.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Jacob Montgomery and Josh Cutler by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/6980755070/'&gt;&lt;img width='500' alt='Jacob Montgomery and Josh Cutler' src='http://farm8.staticflickr.com/7122/6980755070_63d08b78ae.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;The final presentation was from &lt;a href='http://montgomery.wustl.edu/'&gt;Jacob Montgomery&lt;/a&gt; (Washington University) and &lt;a href='https://github.com/joshcutler/'&gt;Josh Cutler&lt;/a&gt; (Duke University) on Computer Adaptive Testing in political surveys. This was one of those &amp;#8220;why didn&amp;#8217;t I think of that?&amp;#8221; papers that immediately appeals to you. The basic idea is that we can improve on knowledge batteries and other components of surveys by selecting subsequent questions based on earlier answers. Using the same techniques as computerized tests such as the GRE, we do not need to ask a long battery to get a good estimate of a respondent&amp;#8217;s traits.&lt;/p&gt;

&lt;p&gt;This was an interesting paper because it also stimulated me to think along alternative lines, but with a similar goal. Much of CAT research, and this paper in particular, is very concerned with building a model of a latent trait, i.e. something we think we can measure only indirectly through other means. While I certainly spend a lot of time thinking about model building (my presentation was largely devoted to that), here I would focus more on prediction. Given a large training set of respondents who have completed a very large battery, can I predict overall scores using only a subset of the questions? In other words, if a person has seen questions A and B, which other single question should I pick to best estimate his or her total score on all the questions? This is a very similar question to the one posed by Montgomery and Cutler, but it shifts the emphasis to prediction rather than model construction. Different strokes for different folks.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Mark Fredrickson by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/6980758652/'&gt;&lt;img width='375' alt='Mark Fredrickson' src='http://farm9.staticflickr.com/8006/6980758652_6b455afd97.jpg' height='500' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;The organizers and sponsors of SLAMM should be congratulated for putting on another successful conference. This is my 4th (of 5 possible) SLAMMs. They have all been extremely interesting. You can see &lt;a href='http://www.flickr.com/photos/devinf/sets/72157629929655143/'&gt;all the pictures&lt;/a&gt; on Flickr.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/V8bTOlDYnWk" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2012-04-29-slamm-2012.html</feedburner:origLink></entry>
 
 <entry>
   <title>Returning to the Cradle of Democracy: a working paper</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/B8ZZ0XFUhcg/2011-08-20-returning-to-the-cradle-of-democracy-a-working-paper.html" />
   <updated>2011-08-20T00:00:00-05:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/returning-to-the-cradle-of-democracy-a-working-paper</id>
   <content type="html">&lt;p&gt;&lt;em&gt;Update: &lt;a href='http://equalitybylot.wordpress.com/2011/08/21/mark-fredrickson-citizen-responses-under-election-and-sortition/'&gt;Equality by Lot&lt;/a&gt; is discussing this paper.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I have recently posted a working paper entitled &lt;a href='https://github.com/downloads/markmfredrickson/election-sortition-corruption-survey-experiment/election-sortition-corruption-survey-experiment-v0.1.0.pdf'&gt;Returning to the Cradle of Democracy: Citizen responses under election and sortition&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The hallmark of modern democracies is the competitive election. This institution is seen as the primary connection between leaders and the population. This has not always been the case. Sortition, the random selection of leaders from the population, served as the primary institution of democracy in ancient Athens. How would citizens in a modern democracy react to the use of sortition to select leaders? This study employs a survey experiment in which subjects read about a local development grant, overseen by either an elected or randomly selected committee. I find that sortition encourages more citizens to seek leadership positions, though other forms of participation remain unchanged. I also find that despite a stated preference for election, subjects see the two committees as equally capable and responsible, even when confronted with corrupt acts and closed door meetings.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The &lt;a href='https://github.com/markmfredrickson/election-sortition-corruption-survey-experiment'&gt;complete repository&lt;/a&gt;, including the code and data necessary to complete the paper, is available on GitHub.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/B8ZZ0XFUhcg" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2011-08-20-returning-to-the-cradle-of-democracy-a-working-paper.html</feedburner:origLink></entry>
 
 <entry>
   <title>Polmeth 2011</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/PxkXNBQRnq8/2011-08-02-polmeth-2011.html" />
   <updated>2011-08-02T00:00:00-05:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/polmeth-2011</id>
   <content type="html">&lt;p&gt;Princeton University hosted &lt;a href='http://polmeth.princeton.edu'&gt;the 2011 Polmeth conference&lt;/a&gt;, the annual meeting of political science methodology subsection of the APSA. This was my third Polmeth, and I found the talks to be the most approachable thus far (I think this says more about the presentations than my abilities). I only wish I could have attended the other talks during the split sessions, but one can only be in one room at a time (similarly, I could not visit many graduate student poster presentations while presenting my own).&lt;/p&gt;

&lt;p&gt;&lt;a href='https://github.com/downloads/markmfredrickson/natural-experiments-in-genetics/fredrickson-polmeth-2011.pdf'&gt;My poster&lt;/a&gt; was titled &amp;#8220;ACE in the Hole: A constructive critique of classical twins studies.&amp;#8221; While there has been a steady stream of research linking genetic predispositions to political behavior, this research has not been well integrated into the broader political behavior literature. In part, I suspect this is because of the model most frequently employed in classical twin studies. The use of the so-called &amp;#8220;ACE&amp;#8221; model requires strong assumptions and does not directly engage the models and outcomes of other political science research. This poster attempts to lay out a path whereby the logic of the natural experiment embodied in twin studies can be used to simplify the analysis in a way that more directly engages traditional political science studies.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;ve put my &lt;a href='https://github.com/downloads/markmfredrickson/natural-experiments-in-genetics/fredrickson-polmeth-2011.pdf'&gt;poster&lt;/a&gt; and the &lt;a href='https://github.com/markmfredrickson/natural-experiments-in-genetics/tree/polmeth-poster'&gt;supporting materials&lt;/a&gt; online. I thank the &lt;a href='http://www.unl.edu/polphyslab/data'&gt;UNL PolPhy Lab&lt;/a&gt; and the original data collectors for publishing the data I use in the poster.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/PxkXNBQRnq8" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2011-08-02-polmeth-2011.html</feedburner:origLink></entry>
 
 <entry>
   <title>test_that -- A brief review</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/turtCFGG-JE/2011-07-16-testthat-a-brief-review.html" />
   <updated>2011-07-16T00:00:00-05:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/testthat-a-brief-review</id>
   <content type="html">&lt;p&gt;For the last month or so, I have been using the &lt;a href='https://github.com/hadley/test_that'&gt;test_that&lt;/a&gt; unit testing package for R (a quick note on names: both &lt;code&gt;testthat&lt;/code&gt; and &lt;code&gt;test_that&lt;/code&gt; are used in the documentation. The library, as available from CRAN has no underscore, so use &lt;code&gt;install.packages(&amp;#39;testthat&amp;#39;)&lt;/code&gt; to get a copy). My free-time programming is always written a loosely &lt;a href='http://en.wikipedia.org/wiki/Test-driven_development'&gt;TDD&lt;/a&gt; style, and I have rolled my own unit testing functions for R in the past, but they are not as polished as &lt;code&gt;test_that&lt;/code&gt;. For examples of my test cases using &lt;code&gt;test_that&lt;/code&gt;, see the &lt;a href='https://github.com/markmfredrickson/RItools/tree/randomization-distribution/tests'&gt;RItools&lt;/a&gt; and &lt;a href='https://github.com/markmfredrickson/optmatch/tree/s4/inst/tests'&gt;optmatch&lt;/a&gt; repositories.&lt;/p&gt;

&lt;p&gt;What attracted me to &lt;code&gt;test_that&lt;/code&gt; was the autotesting functionality. As code is updated, tests are automatically re-run and failures are reported. If tests are updated, only the test files are re-run, saving a little time. I find &lt;code&gt;R CMD check&lt;/code&gt; to be too slow for active development, and ad-hoc tests in the interactive session make me cringe. I can say that the autotest functionality in &lt;code&gt;test_that&lt;/code&gt; is as good as any I have used for Ruby or Clojure (well, I&amp;#8217;d still like Growl notifications, but it&amp;#8217;s not a deal breaker). To get the full advantage, I suggest creating a &lt;code&gt;Makefile&lt;/code&gt; in your project directory to handle starting up the autotest. Here is the Makefile from &lt;code&gt;optmatch&lt;/code&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;local-install:
	rm -rf .local
	mkdir .local
	R CMD INSTALL --library=.local .

autotest: local-install

	R -q -e &amp;quot;library(optmatch, lib.loc = &amp;#39;.local&amp;#39;)&amp;quot; \
          -e &amp;quot;library(testthat)&amp;quot; \
          -e &amp;quot;auto_test(&amp;#39;./R&amp;#39;, &amp;#39;./inst/tests&amp;#39;, &amp;#39;summary&amp;#39;)&amp;quot;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then, to get the tests up and running, just type from the command line:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ make autotest&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;While the package comes with functions for expressing many common expectations (e.g. &lt;code&gt;expect_equal(a, b)&lt;/code&gt;), I was hoping to start writing my own expectation functions, but have not had the time to dig into the internals to see how these are implemented. I usually end up using &lt;code&gt;expect_true&lt;/code&gt; to evaluate a logical result, which works in most cases. There are two ways to write expectations: &lt;code&gt;expect_equal(a, b)&lt;/code&gt; or &lt;code&gt;expect_that(a, equals(b))&lt;/code&gt;. I tend to stick with the first as the second seems more verbose.&lt;/p&gt;

&lt;p&gt;One last note: I had a little trouble integrating the &lt;code&gt;test_that&lt;/code&gt; style tests into &lt;code&gt;R CMD check&lt;/code&gt;. I found the &lt;a href='https://github.com/hadley/devtools/wiki/Testing'&gt;devtools wiki&lt;/a&gt; to be helpful in this regard.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/turtCFGG-JE" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2011-07-16-testthat-a-brief-review.html</feedburner:origLink></entry>
 
 <entry>
   <title>Optmatch and RItools -- New homes and techniques</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/HiUfqG7MM88/2011-06-03-optmatch-and-ritools-new-homes-and-techniques.html" />
   <updated>2011-06-03T00:00:00-05:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/optmatch-and-ritools-new-homes-and-techniques</id>
   <content type="html">&lt;p&gt;Co-developers &lt;a href='http://www.jakebowers.org'&gt;Jake Bowers&lt;/a&gt;, &lt;a href='http://www.stat.lsa.umich.edu/~bbh/'&gt;Ben Hansen&lt;/a&gt; and I are happy to announce that our &lt;code&gt;R&lt;/code&gt; packages &lt;a href='http://github.com/markmfredrickson/optmatch'&gt;optmatch&lt;/a&gt; and &lt;a href='http://github.com/markmfredrickson/RItools'&gt;RItools&lt;/a&gt; have new homes on &lt;a href='http://github.com'&gt;GitHub&lt;/a&gt;. We had previously been managing development on private subversion repositories and managed the projects through an ad-hoc collection of emails and text files. We are excited to make development more public and open the door to bug reports, suggestions, and patches from the wider &lt;code&gt;R&lt;/code&gt; community.&lt;/p&gt;

&lt;p&gt;On a related note, my article on using &lt;a href='http://github.com/markmfredrickson/optmatch'&gt;optmatch&lt;/a&gt; and &lt;a href='http://github.com/markmfredrickson/RItools'&gt;RItools&lt;/a&gt; to block, balance test, and analyze randomized experiments has appeared in the &lt;a href='http://scholar.harvard.edu/dtingley/files/may2011.pdf'&gt;Newsletter of the APSA Experimental Section 2.1&lt;/a&gt;. I published a &lt;a href='/thoughts/2011-04-26-draft-designing-and-analyzing-studies-with-optmatch-and-ritools-part-1.html'&gt;draft of the first part of the article&lt;/a&gt; previously. The complete article addresses a few problems I found in the draft and walks the reader through the analysis portion of the research. I really appreciate that the editor &lt;a href='http://scholar.harvard.edu/dtingley'&gt;Dustin Tingley&lt;/a&gt; allowed me to include raw &lt;code&gt;R&lt;/code&gt; code in the document. The newsletter also contains articles on biopolitics, using protocols to design studies, and even more on blocking. Well worth a read.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/HiUfqG7MM88" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2011-06-03-optmatch-and-ritools-new-homes-and-techniques.html</feedburner:origLink></entry>
 
 <entry>
   <title>Atlantic Causal Inference Conference</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/xnIuhmFRRr8/2011-05-22-atlantic-casual-inference-conference.html" />
   <updated>2011-05-22T00:00:00-05:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/atlantic-casual-inference-conference</id>
   <content type="html">&lt;p&gt;I recently returned from the Atlantic Causal Inference Conference. The conference is attended by an interesting blend of social science, biostatistics, epidemiology, and pure statistics scholars. My compliments to the organizers. There were panels and presentations on interference in randomized trials and dependence between units, matching as a tool against selection and heterogeneity, instrumental variables, and big think on making more credible causal inferences from observational data.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Atlantic Causal Inference Conference 2011 by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5746877369/'&gt;&lt;img width='500' alt='Atlantic Causal Inference Conference 2011' src='http://farm6.static.flickr.com/5228/5746877369_4bfe689d88.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;I presented some early, but exciting, work on testing models of spillover effects in randomized experiments. This work speaks to two audiences. First, it addresses experimenters who fear that interference between experimental units violates the assumptions of their statistical tools (specifically, the Stable Unit Treatment Value Assumption, or SUTVA for short). Second, it offers new tools to researchers studying the effects of networks themselves. &lt;a href='/academics/papers/acic-2011-poster.pdf'&gt;My ACIC poster&lt;/a&gt; (also available as a &lt;a href='/academics/papers/acic-2011-poster.tar.gz'&gt;source Sweave document&lt;/a&gt;) introduces the basics of the work and demonstrates the technique on some simulated data, for which we know the truth. It also demonstrates new software for flexible randomization inference.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Atlantic Causal Inference Conference 2011 by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5747442364/'&gt;&lt;img width='500' alt='Atlantic Causal Inference Conference 2011' src='http://farm3.static.flickr.com/2365/5747442364_049741cc55.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;The informal theme of the conference was that we need to consider more than a single study or test to build a causal story. Multiple results that point to a common conclusion are stronger than a single indicator. Similarly, researchers should consider and evaluate alternative explanations directly (exemplified in the continual use and calls for sensitivity analyses across papers and panels). From the perspective of critiquing causal research, these points require specific, testable criticisms. In his keynote address, Sander Greenland called on the audience to formalize these themes and make them accessible to the wider population of statistical consumers.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Atlantic Causal Inference Conference 2011 by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5747450292/'&gt;&lt;img width='500' alt='Atlantic Causal Inference Conference 2011' src='http://farm3.static.flickr.com/2422/5747450292_df8952546d.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;&lt;a href='http://www.flickr.com/photos/devinf/sets/72157626779839394/'&gt;More pictures.&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/xnIuhmFRRr8" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2011-05-22-atlantic-casual-inference-conference.html</feedburner:origLink></entry>
 
 <entry>
   <title>More and less theory, please</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/CY2Md7x_Tp8/2011-04-29-more-and-less-theory-please.html" />
   <updated>2011-04-29T00:00:00-05:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/more-and-less-theory-please</id>
   <content type="html">&lt;p&gt;&lt;a href='http://chrisblattman.com/2011/04/28/more-theory-please/'&gt;Chris Blattman links to an upcoming JEP article&lt;/a&gt; on the need for more theory in field research (&lt;a href='http://elsa.berkeley.edu/~sdellavi/wp/FieldExperimentJEPFeb11Tris.pdf'&gt;article here&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;As a general call for more theory, I am skeptical. Let us consider two phases of experimental research: design and analysis. In the design phase, I agree that more theory can help motivate more nuanced experimental designs. I agree with the authors that writing down theories can help us see ways to play competing theories against each other or add valuable manipulations to the design. If it helps you to write down structural equations during the design phase, all the better. Prior to randomization, we can motivate our designs as we see best.&lt;/p&gt;

&lt;p&gt;I part ways with the authors, however, when it comes to using theory in the analysis. Specifically, using theory in the analysis phase is a form of &lt;a href='http://ann.sagepub.com/content/628/1/200.abstract'&gt;mediation analysis&lt;/a&gt; &amp;#8212; attributing causal interpretations in an experiment where randomization does not justify them. Let us say we have three quantities we can measure A, B, and C. Theory tells us that if A is on, then B turns on, and then C does so too. If A is off, then B is off, and then C is off. In the experiment we were able to manipulate A. We observe that B and C behave as we expected. What then can we say about our theory and the relationship between B and C?&lt;/p&gt;

&lt;p&gt;Randomization of A gives us a strong position to argue that turning on A leads to turning on B. It also gives us a strong case for A turning on C. But the design (randomizing A) says nothing about the relationship of B and C. It may be that A is linked to some unobserved variable D that influences both B and C, but they have no direct link to each other. It may be that C causes B, but not the other way around. In this scenario, theory could have been helpful in the design phase. For example, manipulating both A and B would tell us something about the relationship of B with C. But in the analysis phase, talking about the relationship between B and C is no longer causal. We are back in the realm of observational work.&lt;/p&gt;
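
&lt;p&gt;A small simulation can make this scenario concrete. What follows is only a sketch under assumed values: the sample size, the probabilities, and the way A switches on the unobserved D are all hypothetical choices made for illustration. B and C each depend on D alone, so B has no effect on C by construction.&lt;/p&gt;

```r
set.seed(1)
n = 2000

A = rbinom(n, 1, 0.5)                       # randomized by the experimenter
D = rbinom(n, 1, ifelse(A == 1, 0.8, 0.2))  # unobserved; made more likely by A
B = rbinom(n, 1, ifelse(D == 1, 0.9, 0.1))  # depends on D only
C = rbinom(n, 1, ifelse(D == 1, 0.9, 0.1))  # depends on D only, never on B

# Differences in means across the randomized arms: justified by the design
effect.on.B = mean(B[A == 1]) - mean(B[A == 0])
effect.on.C = mean(C[A == 1]) - mean(C[A == 0])

# Association between B and C inside the treated arm: observational only
cor.BC.treated = cor(B[A == 1], C[A == 1])
```

&lt;p&gt;Both differences in means come out clearly positive, and B and C are strongly correlated even within the treated arm, although nothing in the simulation lets B influence C. The data alone cannot distinguish this world from one where B causes C.&lt;/p&gt;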

&lt;p&gt;This is not meant to disparage observational work; let us simply be clear about when we have causal interpretations of relationships and when we do not. Basing the analysis of experiments on theory jeopardizes the causal claims we wish to make.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/CY2Md7x_Tp8" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2011-04-29-more-and-less-theory-please.html</feedburner:origLink></entry>
 
 <entry>
   <title>Designing and Analyzing Studies with Optmatch and RItools (Part 1)</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/vxc8gVstSiI/2011-04-26-draft-designing-and-analyzing-studies-with-optmatch-and-ritools-part-1.html" />
   <updated>2011-04-26T00:00:00-05:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/draft-designing-and-analyzing-studies-with-optmatch-and-ritools-part-1</id>
   <content type="html">&lt;p&gt;I am currently writing a brief &amp;#8220;how-to&amp;#8221; for the &lt;a href='http://ps-experiments.ucr.edu/newsletter'&gt;APSA Section on Experimental Research newsletter&lt;/a&gt; on using &lt;a href='http://cran.r-project.org/web/packages/optmatch/index.html'&gt;Optmatch&lt;/a&gt; and &lt;a href='http://cran.r-project.org/web/packages/RItools/index.html'&gt;RItools&lt;/a&gt;. The complete paper (a work in progress) can be found on &lt;a href='https://github.com/markmfredrickson/blocking-with-optmatch-and-ritools'&gt;my github page&lt;/a&gt;. I have the basics of the paper sketched in, but I would love to get feedback from the online R community, so I will be releasing the paper in several installments. Part 1: Creating fake data and blocking.&lt;/p&gt;

&lt;h2 id='data'&gt;Data&lt;/h2&gt;

&lt;p&gt;Let us begin by creating some data in the style of the potential outcomes framework. Let U be all meaningful covariates related to the outcomes Y_c and Y_t. U is partitioned into X, which we observe, and W, which we do not. The covariates are a mix of discrete and continuous random variables.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;n &amp;lt;- 100
x1 &amp;lt;- rbinom(n, 10, 0.25)
x2 &amp;lt;- rbinom(n, 1, 0.6)
x3 &amp;lt;- rnorm(n, 50, 10)
x4 &amp;lt;- rnorm(n, 0, 1)
x5 &amp;lt;- runif(n, 0, 100)
w1 &amp;lt;- rnorm(n, 0, 1)
w2 &amp;lt;- rbinom(n, 1, 0.1)
w3 &amp;lt;- runif(n, 0, 1)
X &amp;lt;- data.frame(x1, x2, x3, x4, x5)
W &amp;lt;- data.frame(w1, w2, w3)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The outcome Y is a continuous measure that is a function of the covariates and the treatment indicator. We first create Y_c from the covariates, and Y_t is simply Y_c + tau, where tau is the treatment effect.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;tau &amp;lt;- 10
yc &amp;lt;- 0.25 * x1 + 4 * x2 + exp(x4) + x5 + 10 * w1 * w2 - w3 * 
    x3
yt &amp;lt;- yc + tau&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id='blocking_and_randomization'&gt;Blocking and Randomization&lt;/h2&gt;

&lt;p&gt;To implement blocking, we use the matching procedures in the Optmatch package for R. Optmatch implements a procedure known as &amp;#8220;optimal full matching&amp;#8221; that minimizes the average distance between matched sets. Optmatch was designed with observational studies in mind, where the researcher has discovered &amp;#8220;treatment&amp;#8221; and &amp;#8220;control&amp;#8221; groups. Optmatch will then find matches between similar treated and control units. This strategy is known as &amp;#8220;bipartite matching.&amp;#8221; For more on matching (and using Optmatch in an observational study) see Rosenbaum (2010).&lt;/p&gt;

&lt;p&gt;In our situation, we do not have an existing randomization vector for our data, but we still wish to create similar subsets of our data. Therefore we need to create the two partitions of the data that Optmatch will use. The most straightforward way to create the splitting vector is to do so randomly.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;s &amp;lt;- vector(&amp;quot;logical&amp;quot;, n)
s[sample.int(n, n/2)] &amp;lt;- TRUE&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To create the blocks, we use the pairmatch function. pairmatch will create matches with one observation from each random set. Optmatch allows tuning the number of observations allowed from each random set. See the documentation for fullmatch for more details.&lt;/p&gt;

&lt;p&gt;We need to specify a distance matrix between observations, and we can use the convenience function mdist to create a distance matrix based on the Mahalanobis distance between observations.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;blocks.all &amp;lt;- pairmatch(mdist(s ~ x1 + x2 + x3 + x4 + x5, 
  data = cbind(s, X)))&lt;/code&gt;&lt;/pre&gt;
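
&lt;p&gt;For intuition about what such a distance matrix contains, here is a minimal base R sketch on hypothetical toy data (the names toy, in.first, and d are made up for this example). It uses the mahalanobis function from the stats package with the covariance of all rows; mdist may pool or scale the covariance differently, so treat this as an illustration of the idea and not as a description of Optmatch internals.&lt;/p&gt;

```r
# Hypothetical toy covariates; not the data generated above
set.seed(2)
toy = data.frame(a = rnorm(6), b = runif(6))
in.first = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)  # arbitrary split

S = cov(toy)                          # covariance of the covariates
treated = as.matrix(toy[in.first, ])
control = as.matrix(toy[!in.first, ])

# mahalanobis() returns squared distances, so take the square root.
# Result: one row per unit in the first set, one column per unit in the second.
d = t(apply(treated, 1, function(u) sqrt(mahalanobis(control, u, S))))
```

&lt;p&gt;Each entry of d is the distance between one unit from each side of the split. A pair matching routine then looks for the one-to-one pairing with the smallest total distance.&lt;/p&gt;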

&lt;p&gt;For reasons of convenience or theoretical importance, we may wish to privilege certain variables and force the matching within levels of those variables. For example, if units are clustered within a geographic unit &amp;#8212; cities within a state &amp;#8212; we can limit matches to within the state. This is also a useful technique when matching large numbers of subjects (see &lt;a href='/thoughts/2010-11-12-speeding-up-optmatch-while-improving-match-quality.html'&gt;my website for more details on speeding up the matching process&lt;/a&gt;). To limit matches within blocks, we specify a factor indicating unit membership. In our case, let us just match within the binary variable x2. Prior to doing so, we will create a new split that places 50% of each level in the partitions.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;count.x2.1 &amp;lt;- sum(x2)
X.ordered &amp;lt;- X[order(x2), ]
s.x2.0 &amp;lt;- sample.int((n - count.x2.1), (n - count.x2.1)/2)
s.x2.1 &amp;lt;- sample.int(count.x2.1, count.x2.1/2)
s.x2 &amp;lt;- vector(&amp;quot;logical&amp;quot;, n)
s.x2[c(s.x2.0, s.x2.1 + (n - count.x2.1))] &amp;lt;- TRUE
blocks.x2 &amp;lt;- pairmatch(mdist(s ~ x1 + x3 + x4 + x5 | x2, data = cbind(s = s.x2, 
    X.ordered)))&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For simplicity, we will continue with the single stratum blocking, but splitting up matching problems into smaller blocks is a very useful technique to have at your disposal. Once we have blocks, we can then randomize within the blocks. As we used a pair-matching strategy, we will randomize to two treatment levels, call them &amp;#8220;treatment&amp;#8221; and &amp;#8220;control.&amp;#8221; Since each observation is matched to one other we have n/2 = 50 blocks. For each block, we can flip a coin and assign either the first or second unit to the treatment condition.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# One fair coin flip per pair
tmp &amp;lt;- rbinom(n/2, 1, prob = 0.5)
z &amp;lt;- vector(&amp;quot;logical&amp;quot;, n)
# Rows 2i - 1 and 2i are treated as the two units of pair i
for (i in 1:(n/2)) {
    if (tmp[i] == 1) {
        z[i * 2 - 1] &amp;lt;- TRUE
    }
    else {
        z[i * 2] &amp;lt;- TRUE
    }
}&lt;/code&gt;&lt;/pre&gt;
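
&lt;p&gt;The loop above treats rows 2i - 1 and 2i as the i-th pair, which lines up with the matched blocks only when paired units occupy adjacent rows. As a minimal sketch (not part of the original analysis), the same coin flip can be driven by the block factor itself. The toy &lt;code&gt;blocks&lt;/code&gt; factor, the seed, and the name &lt;code&gt;z.blocked&lt;/code&gt; are assumptions for illustration; with real data one would presumably substitute the factor returned by &lt;code&gt;pairmatch&lt;/code&gt;, assuming every unit was matched.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;set.seed(20110426)  # hypothetical seed, only for reproducibility
n &amp;lt;- 10
# Toy stand-in for a pair-match result: 5 blocks of 2 units, in shuffled row order
blocks &amp;lt;- factor(sample(rep(seq_len(n / 2), each = 2)))
z.blocked &amp;lt;- logical(n)
for (b in levels(blocks)) {
    # Rows belonging to this block, wherever they sit in the data
    members &amp;lt;- which(blocks == b)
    # Pick exactly one member of the block for treatment
    z.blocked[members[sample.int(length(members), 1)]] &amp;lt;- TRUE
}
# Cross-tabulate to inspect the assignment within each block
table(blocks, z.blocked)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because the assignment is indexed by block membership rather than row position, it does not depend on how the data happen to be sorted.&lt;/p&gt;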

&lt;p&gt;As our last manipulation to the data, we create a variable that is the observed outcome Y_c if z = 0 and Y_t if z = 1. For illustration purposes later in the document, I also create a randomization and outcome that ignore the blocking structure.&lt;/p&gt;

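&lt;pre&gt;&lt;code&gt;# Combine covariates, blocked treatment assignment, and block membership
all.data &amp;lt;- cbind(X, z, b = blocks.all)
# Observed outcome: yt for treated units, yc for controls
all.data$y &amp;lt;- ifelse(z, yt, yc)
# Complete randomization of n/2 units to treatment, ignoring the blocks
tmp &amp;lt;- vector(&amp;quot;logical&amp;quot;, n)
tmp[sample.int(n, n/2)] &amp;lt;- TRUE
all.data$z.unblocked &amp;lt;- tmp
all.data$y.unblocked &amp;lt;- ifelse(tmp, yt, yc)&lt;/code&gt;&lt;/pre&gt;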

&lt;h2 id='preview'&gt;Preview&lt;/h2&gt;

&lt;p&gt;In the next post we will look at testing balance (similar distribution of covariates in the treatment and control groups) and analyzing blocked experiments from a randomization inference perspective.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/vxc8gVstSiI" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2011-04-26-draft-designing-and-analyzing-studies-with-optmatch-and-ritools-part-1.html</feedburner:origLink></entry>
 
 <entry>
   <title>St. Louis Area Methods Meeting 2011 (Friday)</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/si7gsGlEP7c/2011-04-20-st-louis-area-methods-meeting-2011.html" />
   <updated>2011-04-20T00:00:00-05:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/st-louis-area-methods-meeting-2011</id>
   <content type="html">&lt;p&gt;This previous weekend found me and 60 other people attending the &lt;a href='http://stats.wustl.edu/node/258'&gt;St. Louis Area Methods Meeting (SLAMM)&lt;/a&gt;. This is the 3rd SLAMM I have attended, and the quality of the conference is consistently high. This year we had 4 speakers on Friday, who are the subject of this post. The conference also included graduate student speaker presentations on Saturday morning, which will be the subject of a second article.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Jeff Gill by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5627594565/'&gt;&lt;img width='500' alt='Jeff Gill' src='http://farm6.static.flickr.com/5061/5627594565_92eb54c9f5.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;The first speaker on Friday was &lt;a href='http://adm.wustl.edu/'&gt;Andrew Martin&lt;/a&gt; of Washington University reprising his talk of the previous year on &lt;a href='http://scdb.wustl.edu/index.php'&gt;The Supreme Court Database&lt;/a&gt;. The database is an ever-growing collection of coded United States Supreme Court decisions. At the moment, it comprises two phases of the court&amp;#8217;s history (from inception to the mid-1800&amp;#8217;s and from the 1950&amp;#8217;s to present).&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Andrew Martin by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5628181774/'&gt;&lt;img width='500' alt='Andrew Martin' src='http://farm6.static.flickr.com/5229/5628181774_525e6614e2.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This year, Martin gave us a peek behind the curtain at some of the more technical aspects of the implementation, specifically some details on their use of &lt;a href='http://rapache.net/'&gt;rApache&lt;/a&gt;, an Apache module that ties &lt;a href='http://www.r-project.org'&gt;R&lt;/a&gt; to HTTP requests. While I am eager to have social scientists place their data on the web and integrate it with search and analysis tools, the talk highlighted to me the importance of security when writing web front ends. From a subsequent email conversation with Andrew Martin, I am glad to report that the SCDB is working hard on security. Other projects should copy this aspect of the SCDB as well as its excellent feature set.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Jong Hee Park by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5627612521/'&gt;&lt;img width='500' alt='Jong Hee Park' src='http://farm6.static.flickr.com/5144/5627612521_6ec95fd972.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;The second speaker of the day, &lt;a href='http://home.uchicago.edu/~jhp/'&gt;Jong Hee Park&lt;/a&gt; of the University of Chicago, presented new techniques for intervention analysis. Given a time series data set and an intervention at a specific point, how can we compare the post-intervention data with a world in which the intervention did not occur? Additionally, while we may know the exact date of an intervention, the actual impact may come either earlier or later (earlier if agents anticipate the intervention, later if there is a lag between intervention and results). To this end, Park employs a Bayesian change-point model. In the simplest case, this modeling becomes a robustness check: the method detects the change when we think it will occur. In more interesting cases, the change point analysis points to an earlier or later location for the regime change.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Q&amp;amp;A For Park by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5627628151/'&gt;&lt;img width='375' alt='Q&amp;amp;A For Park' src='http://farm6.static.flickr.com/5063/5627628151_de9daa0abc.jpg' height='500' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;After we detect/confirm the change point, we can then turn to predicting the results in the absence of the actual intervention. Park presented a method that was similar to the &amp;#8220;synthetic control&amp;#8221; technique of &lt;a href='http://pubs.amstat.org/doi/abs/10.1198/jasa.2009.ap08746'&gt;Abadie, Diamond and Hainmueller&lt;/a&gt;. In essence, we use units that did not receive the treatment to create a projection of our treatment unit without the intervention.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Jake Bowers by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5628203160/'&gt;&lt;img width='500' alt='Jake Bowers' src='http://farm6.static.flickr.com/5103/5628203160_40f32a26ed.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;The discussion of the paper was handled in a typically lighthearted fashion by &lt;a href='http://www.jakebowers.org'&gt;Jake Bowers&lt;/a&gt;, getting several laughs from the audience. Bowers prompted us to think about the plenitude of ways in which we might summarize the effect of an intervention (i.e., not just a mean change to a stationary process) or might fit our prediction curves.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Minjung Kyung by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5628216166/'&gt;&lt;img width='500' alt='Minjung Kyung' src='http://farm6.static.flickr.com/5025/5628216166_0836acfd56.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;The third Friday talk was given by &lt;a href='http://www.artsci.wustl.edu/~mkyung/'&gt;Minjung Kyung&lt;/a&gt; of Washington University. Kyung is in the mathematics department at WUSTL, and the talk suffered from the unfamiliarity of political scientists with mathematicians and vice-versa. The subject of the talk was splines, using a Bayesian technique that is less computationally intensive than previous techniques. While the talk was motivated by the problem of analyzing &amp;#8220;synthetic data,&amp;#8221; I was never entirely clear why a spline approach was required for data that has passed through an obfuscation procedure.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='John Jackson by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5628234606/'&gt;&lt;img width='500' alt='John Jackson' src='http://farm6.static.flickr.com/5226/5628234606_2d659d36b8.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;The final Friday talk came from &lt;a href='http://www.lsa.umich.edu/polisci/people/faculty/ci.jacksonjohn_ci.detail'&gt;John Jackson&lt;/a&gt; of the University of Michigan. Jackson is trying to extend individual models of vote choice to aggregate models and measurements of partisanship. In other words, can we develop individual models of learning and attachment that translate to the larger trends we observe in aggregate data? While I think this approach is interesting, I am concerned that it is premised on an assumption of individuals as atoms, neither influencing nor learning from each other. Politics is an inherently social behavior. Neighbors, family members, and door-to-door canvassers all interact to reinforce and challenge our political positions. While we would observe these effects in the aggregate, they are not captured by an individual-based model of partisanship at the lower level.&lt;/p&gt;

&lt;p&gt;&lt;a href='http://www.flickr.com/photos/devinf/sets/72157626517418654/with/5627657231/'&gt;More pictures of the event&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/si7gsGlEP7c" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2011-04-20-st-louis-area-methods-meeting-2011.html</feedburner:origLink></entry>
 
 <entry>
   <title>Getting involved in field research in development</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/Nu7uev1lLCE/2011-04-16-getting-involved-in-field-research-in-development.html" />
   <updated>2011-04-16T00:00:00-05:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/getting-involved-in-field-research-in-development</id>
   <content type="html">&lt;p&gt;For my dissertation, I would like to make a randomized field experiment the cornerstone of the project. While I am still working on framing my project to a political science audience, I thought I would write up a brief description of my idea. It is my understanding that many development projects require a local planning and oversight committee (e.g., the &lt;a href='http://www.worldbank.org/id/kdp'&gt;Indonesia Kecamatan Development Program&lt;/a&gt;). The usual method of selecting leaders for these positions has been an election, but elections are not the only democratic method of selecting representatives. Sortition, or the random selection of leaders from the population, is not frequently used outside of jury selection in modern democracies, but it was a pillar of the Athenian system of democracy. If villages were randomly assigned to either elect or randomly select leaders, we could investigate the relative advantages and disadvantages of each mechanism in an important real world scenario. What types of projects do the two mechanisms encourage? Do citizens see decisions as more legitimate in one system? Do they participate in the decision process more? Is corruption controlled by the method of selecting leaders? Is aid money spent more efficiently?&lt;/p&gt;

&lt;p&gt;These are but a selection of possible outcomes to consider from a field experiment of modest scope. For an NGO already working with local bodies, a simple change from elections to randomly selecting leaders could have large benefits for policy outcomes at relatively little additional cost (or the experiment could show that elections are the right tool for the job). A pilot study in which a subset of villages is randomly assigned to either election or sortition could yield a very large return on a very small investment.&lt;/p&gt;

&lt;p&gt;For my part, finding a suitable partner is my next step. Luckily, the &lt;a href='http://blogs.worldbank.org/impactevaluations'&gt;Development Impact blog&lt;/a&gt; at the World Bank has published &lt;a href='http://blogs.worldbank.org/impactevaluations/node/540'&gt;some suggestions&lt;/a&gt; on how young researchers can get their feet in the door on field research, especially randomized experiments.&lt;/p&gt;

&lt;p&gt;Both authors in the World Bank post suggest making connections on the ground through field work. I had not previously considered this avenue, in part because my design does not require a particular location. Having read papers on local committees in Africa, Asia, and South America, I could imagine my experiment being run in almost any location. My flexibility is both a blessing and a curse. On one hand, I need not be discriminating in finding opportunities to engage in on-the-ground work. On the other hand, my lack of focus may not appeal to groups targeting specific populations.&lt;/p&gt;

&lt;p&gt;In either case, the trick is to find these groups and reach out. I am going to start with the specific suggestions from the blog post, with a long-term goal of making on-the-ground connections. Anyone want to collaborate on a field experiment?&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/Nu7uev1lLCE" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2011-04-16-getting-involved-in-field-research-in-development.html</feedburner:origLink></entry>
 
 <entry>
   <title>Polborn and Krasa at the Comparative Politics Seminar</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/8Eu2Tbb8BK8/2011-04-14-polborn-and-krasa-at-the-comparative-politics-seminar.html" />
   <updated>2011-04-14T00:00:00-05:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/polborn-and-krasa-at-the-comparative-politics-seminar</id>
   <content type="html">&lt;p&gt;The Comparative Politics Seminar at UIUC played host to local professors Mattias Polborn and Stefan Krasa as they presented &lt;a href='http://works.bepress.com/cgi/viewcontent.cgi?article=1024&amp;amp;context=polborn'&gt;work on the salience of cultural and economic issues in U.S. elections&lt;/a&gt;.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Polborn and Krasa at the Comparative Politics Seminar by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5619547067/'&gt;&lt;img width='500' alt='Polborn and Krasa at the Comparative Politics Seminar' src='http://farm6.static.flickr.com/5303/5619547067_3deb1a4566.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;The authors start the work with a formal model of party position. Parties have &amp;#8220;natural abilities&amp;#8221; that allow them to take particular positions more credibly than others (e.g., Democrats are more credible at delivering public goods, while Republicans are more credible at minimizing the size of government). From these constraints, parties can pick positions in a two-dimensional space comprised of cultural and economic dimensions. In this space, one can imagine a line between the two parties separating voters by preference. If you are on the Democratic side of the line, you should prefer Democratic candidates, and likewise if you are on the Republican side of the line.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Polborn and Krasa at the Comparative Politics Seminar by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5620140256/'&gt;&lt;img width='500' alt='Polborn and Krasa at the Comparative Politics Seminar' src='http://farm6.static.flickr.com/5267/5620140256_b81d0908dc.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;This model has some intuitive implications. If one party shifts position on one dimension, the position of the line shifts. If voters remain stationary on their preferences, this shift in position can result in greater electoral gains. Additionally, if both parties move an equal distance away from the splitting line along a perpendicular path, the line remains stationary. These results have an intuitive appeal and speak to an empirical puzzle: party polarization. Elites within parties appear to be more ideological now than in the past, yet survey results place American voters at roughly the same location. Polarization is costless from a party&amp;#8217;s perspective, as the party can capture the same voters even if it takes more extreme positions.&lt;/p&gt;
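&lt;p&gt;To make the geometry concrete, here is a small sketch under an assumption the summary above does not state: each voter supports whichever party is closer in Euclidean distance, so the dividing line is the perpendicular bisector of the segment joining the two parties. The party coordinates, the simulated voters, and the helper &lt;code&gt;prefers.d&lt;/code&gt; are all hypothetical and are not taken from the paper.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# TRUE when a voter is strictly closer to party d than to party r
prefers.d &amp;lt;- function(voters, d, r) {
    dist.d &amp;lt;- sqrt(rowSums(sweep(voters, 2, d)^2))
    dist.r &amp;lt;- sqrt(rowSums(sweep(voters, 2, r)^2))
    dist.d &amp;lt; dist.r
}
set.seed(1)  # hypothetical seed, only for reproducibility
# Simulated voters on the cultural and economic dimensions
voters &amp;lt;- cbind(cultural = runif(200, -3, 3), economic = runif(200, -3, 3))
# Baseline positions: the dividing line is cultural + economic = 0
base &amp;lt;- prefers.d(voters, d = c(-1, -1), r = c(1, 1))
# Both parties move the same distance away from the line, perpendicular to it
polarized &amp;lt;- prefers.d(voters, d = c(-2, -2), r = c(2, 2))
# Only party r moves away, which shifts the line toward r
one.moves &amp;lt;- prefers.d(voters, d = c(-1, -1), r = c(2, 2))
# Compare vote shares for party d under the three scenarios
c(base = mean(base), polarized = mean(polarized), one.moves = mean(one.moves))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Under these assumptions the symmetric move leaves every voter&amp;#8217;s choice unchanged, while the one-sided move can only transfer voters to the party that stayed put.&lt;/p&gt;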
&lt;p class='floatimg'&gt;
&lt;a title='Polborn and Krasa at the Comparative Politics Seminar by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5619548841/'&gt;&lt;img width='500' alt='Polborn and Krasa at the Comparative Politics Seminar' src='http://farm6.static.flickr.com/5301/5619548841_3212cf75b1.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;The authors attempt to test this model using empirical data from the &lt;a href='http://www.electionstudies.org'&gt;American National Election Study&lt;/a&gt;. They rate respondents on cultural and economic issues and then infer the party positions from respondent vote choices. In my reading and during the discussion, I found it difficult to tell if their data collection supported their model for empirical reasons or by design. The party positions are not empirically verifiable. The fact that they exist and display some of the hypothesized properties may have more to do with modeling than with empirical fact.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Polborn and Krasa at the Comparative Politics Seminar by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5619553409/'&gt;&lt;img width='375' alt='Polborn and Krasa at the Comparative Politics Seminar' src='http://farm6.static.flickr.com/5145/5619553409_e89cb3bc72.jpg' height='500' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Despite these misgivings, I found this an interesting paper. Were I trying to infer party positions, I might apply a clustering approach or a tool like an &lt;a href='http://en.wikipedia.org/wiki/Support_vector_machine'&gt;SVM&lt;/a&gt;, which I suspect is more or less identical to the technique the authors use. Perhaps the biggest difference is that a clustering tool or an SVM would not imply a probability distribution over the results, whereas the probit model employed by the authors does assume a Normal distribution over their latent variable. I would be interested to see how a different choice of clustering algorithm would change the slope of the line, and by implication the parameters of the authors&amp;#8217; formal model.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/8Eu2Tbb8BK8" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2011-04-14-polborn-and-krasa-at-the-comparative-politics-seminar.html</feedburner:origLink></entry>
 
 <entry>
   <title>Collaboration for Social Scientists in TPM</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/cmd_2MQKu9Q/2011-04-03-collaboration-for-social-scientists-in-tpm.html" />
   <updated>2011-04-03T00:00:00-05:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/collaboration-for-social-scientists-in-tpm</id>
   <content type="html">&lt;p&gt;The Spring 2011 issue of &lt;a href='http://polmeth.wustl.edu/methodologist/tpm_v18_n2.pdf'&gt;The Political Methodologist&lt;/a&gt; is online and includes an article on collaboration tools for social scientists (cowritten with Paul F. Testa and &lt;a href='http://nils.weidmann.ws/'&gt;Nils B. Weidmann&lt;/a&gt;). The issue also has articles by &lt;a href='http://jakebowers.org'&gt;Jake Bowers&lt;/a&gt;, &lt;a href='http://www.kieranhealy.org/'&gt;Kieran Healy&lt;/a&gt;, and &lt;a href='http://www.tc.umn.edu/~satreier/'&gt;Shawn Treier&lt;/a&gt;. Well worth a read if you are looking to sharpen your research skills (and we all should be).&lt;/p&gt;

&lt;p&gt;Notably absent from the article was a list of &amp;#8220;thank yous.&amp;#8221; I should have thanked the editorial staff of the TPM for the call for research related papers. I had at least three versions of this paper sketched out, but never finished. This issue should have included an article on getting work finished without a deadline (a must read for me). I also wish to thank Jake Bowers specifically for comments and fixes during the writing process. Finally, I wish to thank all the more experienced programmers I&amp;#8217;ve worked with. Pretty much everything in that article I learned on the job. On the job &amp;#8220;citations&amp;#8221; don&amp;#8217;t fit well into a &lt;code&gt;.bib&lt;/code&gt; file, but are no less important to the success of the article.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/cmd_2MQKu9Q" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2011-04-03-collaboration-for-social-scientists-in-tpm.html</feedburner:origLink></entry>
 
 <entry>
   <title>Melissa Schwartzberg on Democracy, Judgment, and Juries</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/-ZdXYPiHIlk/2011-03-29-melissa-schwartzberg-on-democracy-judgment-and-juries.html" />
   <updated>2011-03-29T00:00:00-05:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/melissa-schwartzberg-on-democracy-judgment-and-juries</id>
   <content type="html">&lt;p&gt;&lt;a href='http://www.columbia.edu/~ms3125/index.html'&gt;Melissa Schwartzberg&lt;/a&gt; of Columbia recently visited Illinois to present political theory work on juries (&lt;a href='https://netfiles.uiuc.edu/msvolik/www/workshop/Spring2011/schwartzberg.pdf'&gt;working paper&lt;/a&gt;). Schwartzberg frames her work using the question &amp;#8220;Are juries democratic?&amp;#8221; While the paper never truly addresses this question, it does highlight several dimensions by which we might categorize specific institutions, all of which we would label as juries. This is clearly a work in progress, and my comments are directed towards improving this work.&lt;/p&gt;

&lt;p&gt;To start with, it was inevitable I would be (slightly) disappointed by the paper. To me, the most interesting aspect of the American jury system is the use of random selection as a means of constituting the body. The &amp;#8220;cradle of democracy&amp;#8221;, Athens, used random selection to fill many magisterial posts, as well as large juries, but election dominates as the method for selecting representatives in modern democracies. I was hoping to see a theoretical consideration of why random selection may be justified for juries but not for elections (or a critique of such thinking), but Schwartzberg explicitly sets aside the random selection of juries, focusing on other elements.&lt;/p&gt;

&lt;p&gt;While I do not have a problem with focusing this research (there are only so many hours in the day), I find the dismissal symptomatic of a larger difficulty in the paper. Schwartzberg defines three dimensions upon which one could classify a particular institution: situated versus non-situated (i.e., do jurors bring private information (situated) to the court room or are they all blank slates with regard to the case at the start); isolated versus discursive (i.e., do jurors cast their votes in isolation or discuss among themselves the merits of the case); and decision rule, which Schwartzberg limits to majority versus super-majority. Schwartzberg provides interesting historical facts and theoretical implications of the different ends of the three scales, but ultimately I am left wondering why focus on these three? Moreover, if these aspects are allowed to vary, what key elements of a jury remain fixed? What is the core essence of a jury that makes it a jury? Juries vary widely in many more dimensions than the three Schwartzberg lays out (&lt;a href='http://www.annualreviews.org/doi/abs/10.1146/annurev.lawsocsci.4.110707.172319'&gt;Hans (2008)&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Were I approaching this problem, I might first divide jury characteristics into two domains: &lt;em&gt;who&lt;/em&gt; constitutes a jury and &lt;em&gt;how&lt;/em&gt; the jury reaches a verdict (defining a jury as a group of decision makers deciding the outcome of a courtroom trial). Schwartzberg&amp;#8217;s three dimensions span these two categories. As we can&amp;#8217;t reasonably divorce potential jurors from their private information, nor prohibit them from using it, situatedness becomes a question of &lt;em&gt;who&lt;/em&gt;. Whether jurors cast votes in isolation and how those votes are tabulated is a question of &lt;em&gt;how&lt;/em&gt;. It might make sense to take either &lt;em&gt;who&lt;/em&gt; or &lt;em&gt;how&lt;/em&gt; as given, but I would like Schwartzberg to better justify selecting these particular three dimensions that span what, in my opinion, is a higher level of categorizing juries.&lt;/p&gt;

&lt;p&gt;A more careful consideration of these two domains may also help with my final critique. While Schwartzberg asks if juries are democratic, I cannot find in the paper a definition of what it means for an institution to be democratic. Schwartzberg wishes to engage with proponents of &amp;#8220;epistemic democracy,&amp;#8221; which I understand to mean the belief that democracy is good because it makes the right decisions more often than other forms of government. But I do not know that we can invert that belief into a definition: is democracy the system of government that makes correct decisions more often than the alternatives? From a theoretical perspective, one could imagine a philosopher king who, through his wisdom, always makes the &amp;#8220;correct&amp;#8221; decision, but I do not think we would consider this government a democracy. From my perspective, democracy is defined by &lt;em&gt;who&lt;/em&gt; rules, the people perhaps channeled through representatives. While probably insufficient from a theoretical perspective, this definition highlights the importance of considering &lt;em&gt;who&lt;/em&gt; constitutes juries as a central characteristic.&lt;/p&gt;

&lt;p&gt;As I mentioned earlier, this is a work in progress and is already quite well developed. I think addressing my concerns would make for a stronger paper, one which I am interested to read in the near future.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/-ZdXYPiHIlk" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2011-03-29-melissa-schwartzberg-on-democracy-judgment-and-juries.html</feedburner:origLink></entry>
 
 <entry>
   <title>Jose Cheibub on Civil Wars in Africa</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/fdwkLeRY7Z4/2011-03-18-jose-cheibub-on-civil-wars-in-africa.html" />
   <updated>2011-03-18T00:00:00-05:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/jose-cheibub-on-civil-wars-in-africa</id>
   <content type="html">&lt;p&gt;As part of the &lt;a href='https://netfiles.uiuc.edu/msvolik/www/workshop/cpw.html'&gt;Comparative Politics Workshop&lt;/a&gt; at UIUC, &lt;a href='https://netfiles.uiuc.edu/cheibub/www/cheibub.html'&gt;Jose Cheibub&lt;/a&gt; shared some preliminary work on the relationship of civil wars and the holding of elections. As his point of departure, Cheibub takes on the prevailing wisdom that elections destabilize regimes and precipitate civil wars. This is interesting research that takes on a prevailing position with real world ramifications. The popular press and policy makers can often be found supporting the idea that countries should delay elections, in case the election serves to destabilize a fragile civil society, but is this position justified by observable data? Working on data from Africa, Cheibub (with co-author &lt;a href='https://netfiles.uiuc.edu/jchays/www/page.html'&gt;Jude C. Hays&lt;/a&gt;) does not find evidence of this relationship, questioning the wisdom of delaying elections in developing countries.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Jose Chiebub on civil wars by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5487388727/'&gt;&lt;img width='500' alt='Jose Chiebub on civil wars' src='http://farm6.static.flickr.com/5136/5487388727_48bf812f66.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Cheibub wants to focus on the interplay between dictators and opposition. In Cheibub&amp;#8217;s view there are &amp;#8220;weak&amp;#8221; and &amp;#8220;strong&amp;#8221; dictators. Strong dictators will always defeat opposition, either in elections via fraud or in a military battle. Therefore, we always observe peace when strong dictators are present. Weak dictators, however, may select elections as a way to defeat a challenge. Even if weak dictators win the election, there is the possibility of a subsequent civil war, so we may observe either peace or strife when dealing with weak dictators. This makes it difficult to infer whether a dictator was strong or weak, given peace (though war always indicates weak).&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Jose Chiebub on civil wars by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5487386281/'&gt;&lt;img width='500' alt='Jose Chiebub on civil wars' src='http://farm6.static.flickr.com/5255/5487386281_736e5e42bc.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Cheibub is interested in pursuing a matching design, comparing country-years (in Africa) that are equally likely to hold an election. I think this is a reasonable approach, but I think providing a believable set of covariates will be difficult. As I&amp;#8217;ve already noted, Cheibub wants to compare weak dictators or regimes to other equally weak regimes, but measuring regime strength is not going to be an easy task. This analysis is made more difficult by the fact that country-years are certainly not independent, both within and across countries.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Jose Chiebub on civil wars by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5487395477/'&gt;&lt;img width='500' alt='Jose Chiebub on civil wars' src='http://farm6.static.flickr.com/5220/5487395477_7558c056e0.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;During the question and answer period, I tried to suggest, though I think I was not being clear, that it may be easier to change the level of analysis. Cheibub is looking at the level of the country-year and asks if an election or a civil war occurred in those years. Perhaps a better level of analysis may be the regime level. If we can match regimes (perhaps on length of term, type of regime, method of acquiring power, etc.), we might have an easier time defining covariates. We may also eliminate some of the dependence between units. This design also focuses the analysis more carefully on the key question Cheibub wishes to address: how do weak dictators respond to challenges?&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Jose Chiebub on civil wars by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5487405933/'&gt;&lt;img width='500' alt='Jose Chiebub on civil wars' src='http://farm6.static.flickr.com/5220/5487405933_0b5888945e.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;&lt;a href='http://www.flickr.com/photos/devinf/sets/72157626046063785/'&gt;More pictures on Flickr.&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/fdwkLeRY7Z4" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2011-03-18-jose-cheibub-on-civil-wars-in-africa.html</feedburner:origLink></entry>
 
 <entry>
   <title>Wendy Tam Cho on Voter Migration and Partisanship</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/SK63r-J2JkI/2011-02-27-wendy-tam-cho-on-voter-migration-and-partisanship.html" />
   <updated>2011-02-27T00:00:00-06:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/wendy-tam-cho-on-voter-migration-and-partisanship</id>
   <content type="html">&lt;p&gt;This week UIUC professor &lt;a href='http://cho.pol.uiuc.edu/wendy/'&gt;Wendy Tam Cho&lt;/a&gt; shared ongoing research on patterns of voter migration and partisanship (joint work with &lt;a href='http://www.bsos.umd.edu/gvpt/gimpel/research.html'&gt;James Gimpel&lt;/a&gt; and &lt;a href='http://www.ocf.berkeley.edu/~irishui/'&gt;Iris Hui&lt;/a&gt;). Cho and co-authors have compiled state level voter files from a selection of states and use the data to track voters as they move within and across state boundaries.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Wendy Tam Cho on Mobility by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5476882059/'&gt;&lt;img width='375' alt='Wendy Tam Cho on Mobility' src='http://farm6.static.flickr.com/5172/5476882059_dda19692d1.jpg' height='500' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Cho frames this work as an empirical evaluation of Bill Bishop&amp;#8217;s book, &lt;a href='http://www.thebigsort.com/home.php'&gt;The Big Sort&lt;/a&gt; (&lt;a href='http://www.amazon.com/gp/product/0618689354?ie=UTF8&amp;amp;tag=markmfredcom-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0618689354'&gt;at Amazon&lt;/a&gt;). Bishop&amp;#8217;s position is that people are self-sorting into homogeneous neighborhoods and social networks. Whereas people formerly found themselves in diverse neighborhoods or social groups, modern housing patterns and a lack of cross-cutting networks diminish the opportunities for citizens to encounter different viewpoints and political positions. As a result, he argues, the nation is polarizing.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Wendy Tam Cho on Mobility by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5477485610/'&gt;&lt;img width='500' alt='Wendy Tam Cho on Mobility' src='http://farm6.static.flickr.com/5298/5477485610_3c3a49806e.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;By matching voters across several years of voter files, Cho and co-authors can track voters as they move within a state or across a limited number of state lines (provided they re-register at the new address). Voter files do not usually contain very much information (often not much more than name, address, and age: only what is necessary to establish residence and minimum age for voting), but some states provide an opportunity to register with a party, either during registration or when voting in a primary (rules vary as to when and how one registers, a point to which I will return later). In addition to tracking changes in address, Cho and co-authors track changes in party registration as voters move. They also aggregate party registration to get a sense of the districts in which voters live.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Wendy Tam Cho on Mobility by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5477484270/'&gt;&lt;img width='500' alt='Wendy Tam Cho on Mobility' src='http://farm6.static.flickr.com/5178/5477484270_659381307d.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Combining this data, they find that much of voter migration is driven by improving life circumstances. Measurements such as income in the new district, miles of parks, and other measures of quality of life are linked with most voter migration. They also find that &lt;em&gt;almost everyone is moving to more Republican districts&lt;/em&gt;, both voters registered as Republicans and voters registered as Democrats. These two findings may, in a sense, be the same finding. If movements indicate increasing wealth, we may well expect movements into more Republican strongholds by all movers. Perhaps more interesting are the people who switch party registration after moving. The authors find that voters switching from Republican to Democratic registration are more likely to move to a &lt;em&gt;more Democratic&lt;/em&gt; area. On balance, this is mixed evidence for Bishop&amp;#8217;s position. While some people are moving to more homogeneous areas, others are not.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Wendy Tam Cho on Mobility by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5477488602/'&gt;&lt;img width='500' alt='Wendy Tam Cho on Mobility' src='http://farm6.static.flickr.com/5258/5477488602_1c46574b23.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;While I think this is interesting work, I have some concerns about the use of party registration as a measure for political ideology, party attachment, or vote choice. In fact, I am not exactly sure what party registration indicates. Party registration has different implications in different states. For example, California allows voters to register with a party, but independents may vote in any party&amp;#8217;s primary election (semi-closed primaries). Neighboring Nevada, however, has fully closed primaries that allow only party members to vote in party contests (independents may still vote in non-partisan contests). On its face, the nature of party registration in these two states is very different. In Nevada, if an independent wishes to participate in a major party&amp;#8217;s primary, he or she must re-register in advance as a party member. In California, the same voter would only have to show up on election day and select the appropriate ballot. Other states allow party re-registration on the day of a primary, which could potentially lead to a large amount of switching, depending on the hot races in a given election year. Still other states have completely open primaries where voters can pick any party ballot without re-registering, minimizing the need to switch party registrations at any time (some of these states do not even have party registration at the state level, which I would argue is more than a mere measurement problem for this research). These different institutional rules change the importance and meaning of party registration from state to state.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Wendy Tam Cho on Mobility by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5477487212/'&gt;&lt;img width='500' alt='Wendy Tam Cho on Mobility' src='http://farm6.static.flickr.com/5137/5477487212_20f9f062f3.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Could these variations systematically bias the data Cho and co-authors share? Quite possibly. Participation in the 2008 Democratic presidential primary exceeded previous contests by a wide margin. In some cases, the participants were new to primary politics, but at least some of them re-registered. Rush Limbaugh encouraged his listeners to vote in the Democratic primary (and throw their support to the second place candidate) to prolong the uncertainty over who the Democratic candidate would be in November. Depending on the state, a Limbaugh fan or a newly interested independent could have either re-registered or just picked up a ballot on election day. In one case, the voter shows up as a switcher and in the other case does not. Generally, if a voter simultaneously moved and registered Democratic in 2008, what does that really tell us about this person&amp;#8217;s partisanship? I&amp;#8217;m not sure. Such bias may work &lt;em&gt;against&lt;/em&gt; Cho and co-authors, making their results more surprising, but I consider it a problem in either case.&lt;/p&gt;

&lt;p&gt;I do not think party registration is an informationless measure, but I challenge Cho and co-authors to provide more information on exactly what party registration indicates. Does it correlate with other measures of ideology and partisanship? Does it predict vote choice? Are there other variables that moderate these relationships? For example, how do differences in registration laws moderate the link between party registration and vote choice? These questions are largely empirical, and I think would provide useful information to evaluate these interesting findings more carefully.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/SK63r-J2JkI" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2011-02-27-wendy-tam-cho-on-voter-migration-and-partisanship.html</feedburner:origLink></entry>
 
 <entry>
   <title>Kathy Cramer Walsh on Rural Perspectives of Political Inequality</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/4V5f0lp2lXc/2011-02-20-kathy-cramer-walsh-on-rural-perspectives-of-political-inequality.html" />
   <updated>2011-02-20T00:00:00-06:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/kathy-cramer-walsh-on-rural-perspectives-of-political-inequality</id>
   <content type="html">&lt;p&gt;Politicians are not the only Wisconsinites crossing into Illinois. This week &lt;a href='http://users.polisci.wisc.edu/kwalsh/'&gt;Kathy Cramer Walsh&lt;/a&gt; of UW Madison visited UIUC to discuss &lt;a href='http://www.jakebowers.org/MethodsPapers/Walsh-18-02-2011.pdf'&gt;her on-going participant observation research&lt;/a&gt;.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Kathy Cramer Walsh at UIUC by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5461579429/'&gt;&lt;img width='500' alt='Kathy Cramer Walsh at UIUC' src='http://farm6.static.flickr.com/5136/5461579429_2a9941056f.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Cramer Walsh&amp;#8217;s work is very different from most modern political opinion research, eschewing telephone or internet surveys. Instead, she visits coffee shops and informal meeting locations across Wisconsin. In many ways, her style of information gathering is far more similar to the foundational studies of Robert Lane, though with a focus on group interaction rather than individual interviews. Cramer Walsh traces her interest in participant observation to working on the longitudinal political socialization surveys of Jennings and Stoker. Like UIUC faculty member &lt;a href='http://www.jakebowers.org'&gt;Jake Bowers&lt;/a&gt;, Cramer Walsh conducted in-person interviews as a graduate student. She found the most interesting responses were those offered to open-ended questions or between questions. This interest led her to conduct participant observation studies in Michigan and now in Wisconsin.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Kathy Cramer Walsh at UIUC by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5461573815/'&gt;&lt;img width='500' alt='Kathy Cramer Walsh at UIUC' src='http://farm6.static.flickr.com/5173/5461573815_fa09ca6d69.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;In &lt;a href='http://www.jakebowers.org/MethodsPapers/Walsh-18-02-2011.pdf'&gt;her most recent paper&lt;/a&gt;, Cramer Walsh focuses on differences in perceptions and understanding of political issues in rural and urban communities:&lt;/p&gt;
&lt;blockquote&gt;
We should expect place to matter for political understanding because representation is allocated by geography in the United States. Government resources are allocated largely according to these districts. Therefore, individuals’ perceptions of which places get which resources, and which places have power are likely integral parts of the way they think about the political world. I call these perceptions individuals’ geographies of power.
&lt;/blockquote&gt;&lt;p class='floatimg'&gt;
&lt;a title='Kathy Cramer Walsh at UIUC by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5462165756/'&gt;&lt;img width='500' alt='Kathy Cramer Walsh at UIUC' src='http://farm6.static.flickr.com/5254/5462165756_32029bcabe.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;After interviewing and interacting with small groups across rural Wisconsin, she finds that rural participants see political power concentrated in the urban areas. She argues, moreover, that these perceptions are not driven by anti-government sentiments per se, which might be more commonly held in rural environments, but by a framework based on geographic representation.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Kathy Cramer Walsh at UIUC by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5462177574/'&gt;&lt;img width='500' alt='Kathy Cramer Walsh at UIUC' src='http://farm6.static.flickr.com/5253/5462177574_6f217938e2.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;While her technique differs from those I would use in my own research, I think Cramer Walsh&amp;#8217;s work provides fertile ground for theory building. Specifically, I wonder to what degree current institutions exacerbate rural/urban schisms and what alternative institutions could provide. I am reminded of Andrew Rehfeld&amp;#8217;s recent book, &lt;a href='http://www.amazon.com/gp/product/0521057329?ie=UTF8&amp;tag=markmfredcom-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0521057329'&gt;The Concept of Constituency: Political Representation, Democratic Legitimacy, and Institutional Design&lt;/a&gt;. In this book, Rehfeld considers randomly assigning citizens to representative constituencies instead of electing representatives from geographically defined constituencies. While such a plan might dilute the importance of geography in politics, the lack of specifically rural representatives might marginalize rural citizens even more and make the collective action problem facing rural interests an even greater burden.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Kathy Cramer Walsh at UIUC by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5462178928/'&gt;&lt;img width='500' alt='Kathy Cramer Walsh at UIUC' src='http://farm6.static.flickr.com/5094/5462178928_faf667718b.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Less radical solutions are also possible. When asked about the role of institutions in her findings, she suggested holding legislative sessions throughout a state on a rotating basis, rather than exclusively at the state capital, usually an urban location. As my fellow graduate student Matthew Hayes pointed out, we have privileged face-to-face legislator interaction over face-to-face interaction between legislators and constituents. Perhaps it is worth considering technological or institutional methods to reverse this imbalance.&lt;/p&gt;

&lt;p&gt;&lt;a href='http://www.flickr.com/photos/devinf/sets/72157626095448118/with/5462168950/'&gt;More photos of the event.&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/4V5f0lp2lXc" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2011-02-20-kathy-cramer-walsh-on-rural-perspectives-of-political-inequality.html</feedburner:origLink></entry>
 
 <entry>
   <title>Christopher Dawes on Psychological Traits as Intermediaries for Genes</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/jskLqxU-23s/2011-02-19-christopher-dawes-on-psychological-traits-as-intermediaries-for-genes.html" />
   <updated>2011-02-19T00:00:00-06:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/christopher-dawes-on-psychological-traits-as-intermediaries-for-genes</id>
   <content type="html">&lt;p&gt;&lt;a href='http://dss.ucsd.edu/~cdawes/Christopher%20Dawes.html'&gt;Christopher Dawes&lt;/a&gt; visited UIUC this week as part of an ongoing colloquium on the links between genetics and politics. Dawes&amp;#8217; argument is that psychological traits, which he largely defines as personality, are the mechanism linking genetic predispositions and observable political outcomes such as ideology or voting behavior. (My apologies to Dawes&amp;#8217; coauthors. I cannot find a working paper online, and I neglected to record their names.)&lt;/p&gt;

&lt;p&gt;Dawes builds his argument by drawing together several strains of research on the micro-foundations of political behavior. First, there is reasonable support that genetic factors play an important, though not determinate, role in political opinion and behavior. Similarly, research in personality has established consistent correlations between some personality traits, but by no means all, and political behaviors. Moreover, personality appears very heritable, at rates similar to height. Dawes attempts to test whether personality acts as an intermediary between genetic predispositions and political behavior.&lt;/p&gt;

&lt;p&gt;Before proceeding to his results, I have several concerns about the methods Dawes applies. Dawes uses a dataset composed of monozygotic (identical) and dizygotic (fraternal) twins. As a natural experiment, I find twin studies convincing, and the data clearly indicate that MZ twins are consistently more similar than DZ twins. If we accept that twin pair zygosity is truly random, then the difference is causal: being an MZ twin pair &lt;em&gt;causes&lt;/em&gt; greater similarity in behavior. But this conclusion is different from concluding that genetics, and not environmental factors, are responsible. Being an MZ twin may place individuals on a very different path than that of a DZ twin. For example, we know that MZ twins are more frequently dressed alike as children. This emphasis on similarity may influence later behavior. Seeing her identical twin vote may engage the &amp;#8220;similarity&amp;#8221; response and lead to participation by the second twin. From a strictly experimental view, the treatment had an effect, but parceling out that effect into a genetic component and an environmental component, each of them causal, is beyond the scope of the design. Most twin studies, Dawes&amp;#8217; work included, simply assume that these events cannot occur. The &amp;#8220;Equal Environments Assumption&amp;#8221; posits that MZ and DZ twins do not experience systematically different environmental stimuli. In a &lt;a href='/academics/papers/ace-in-the-hole.pdf'&gt;working paper&lt;/a&gt; on the topic, I show that if the EEA fails, our findings will be biased towards a larger genetic component. I am concerned that the pattern of a large genetic component coupled with a small shared environmental component (which Dawes finds in his research) is evidence that the EEA frequently does not hold in practice.&lt;/p&gt;

&lt;p&gt;Even if we assume that MZ and DZ twins experience similar environments, Dawes is attempting to trace causal pathways after the treatment, a practice known as mediation analysis. In a perfect experiment, both genetics and personality would be randomly assigned independently. We could then condition on either genetics or personality and draw causal conclusions about the results we saw. If we take the twin study as a natural experiment, only the percentage of shared genetics is randomly assigned. Attempting to explain the causal effect of non-randomly assigned quantities, i.e. personality, falls outside of the scope of the design. At the risk of being trite, as the well known expression says, &amp;#8220;Correlation is not causation.&amp;#8221; Observed relationships between personality and political behavior, even in a twin study, are correlations, not necessarily causal. (For a more detailed review of the difficulties of mediation analysis, see &lt;a href='http://ann.sagepub.com/content/628/1/200.abstract'&gt;Green, Ha, and Bullock (2010)&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;Returning to the article at hand, Dawes&amp;#8217; results are consistent with previous findings that both personality traits and political behavior are related to genetics. He also concludes that personality traits correlate with political behavior and are therefore mediators, causal agents that are consequents of both genetics and environmental factors. To his credit, Dawes conditions these results on the Equal Environments Assumption and an assumption that gene-environment interactions do not occur. But if these assumptions do not hold, and we do not know that they do, Dawes&amp;#8217; findings are ambiguous. Concerning the dangers of mediation analysis, Dawes is not unaware of the issue, but I am concerned that this research extends twin studies beyond their defensible conclusions.&lt;/p&gt;

&lt;p&gt;So where does this leave us on this research and twin studies in general? My intuition is that Dawes is basically correct: genetic predispositions manifest as personality traits, which in turn define default modes of behavior, including political choices. I remain skeptical that the methods employed demonstrate this causal chain. For my own part, I am working on extending randomization inference to twin studies to address these methodological issues. While the basic limitations of twin studies still apply, randomization inference provides a principled method to engage with hypotheses concerning deviations from the assumptions of the standard twin study analysis. Instead of simply assuming an additive model where environment does not interact with genetics, I propose testing hypotheses on the amount of possible interactive effects under different models, in effect a sensitivity analysis of the classical models. Rather than condition our results on assumptions, we can then provide a range of plausible levels of gene-environment interaction that are consistent with our data, a form of analysis I would much prefer to see in this research and other genetics research.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/jskLqxU-23s" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2011-02-19-christopher-dawes-on-psychological-traits-as-intermediaries-for-genes.html</feedburner:origLink></entry>
 
 <entry>
   <title>Peeking inside R functions</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/8K5aHDlsD38/2011-02-06-peeking-inside-r-functions.html" />
   <updated>2011-02-06T00:00:00-06:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/peeking-inside-r-functions</id>
   <content type="html">&lt;p&gt;&lt;a href='http://www.r-project.org'&gt;R&lt;/a&gt;, like all good programming languages, treats functions as first class objects. Users can create functions, pass them as arguments, and have them returned as the result of other computations. You may be familiar with passing functions as arguments if you have used the &lt;tt&gt;apply&lt;/tt&gt; family of functions (e.g. &lt;tt&gt;apply, sapply, lapply, mapply&lt;/tt&gt;). For example, to get the median of the columns of a data frame:&lt;/p&gt;
&lt;code&gt;&lt;pre&gt;
&amp;gt; data(airquality)
&amp;gt; apply(airquality, 2, median)
  Ozone Solar.R    Wind    Temp   Month     Day 
     NA      NA     9.7    79.0     7.0    16.0 
&lt;/pre&gt;&lt;/code&gt;
&lt;p&gt;In this example, since some of the columns have &lt;tt&gt;NA&lt;/tt&gt; values, the reported medians are also &lt;tt&gt;NA&lt;/tt&gt;. We can amend the above example to drop missing values and demonstrate creating our own function to pass to &lt;tt&gt;apply&lt;/tt&gt;:&lt;/p&gt;
&lt;code&gt;&lt;pre&gt;
&amp;gt; apply(airquality, 2, function(column) {
+     median(column, na.rm = TRUE)
+ })
  Ozone Solar.R    Wind    Temp   Month     Day 
   31.5   205.0     9.7    79.0     7.0    16.0 
&lt;/pre&gt;&lt;/code&gt;
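&lt;p&gt;As a small aside, and only as a sketch: &lt;tt&gt;apply&lt;/tt&gt; and its relatives pass any extra arguments through to the function they call, so the wrapper function is not strictly needed here. In the call below, &lt;tt&gt;sapply&lt;/tt&gt; walks over the columns of the data frame and hands &lt;tt&gt;na.rm = TRUE&lt;/tt&gt; on to &lt;tt&gt;median&lt;/tt&gt;; the result should agree with the medians just shown.&lt;/p&gt;
&lt;code&gt;&lt;pre&gt;
&amp;gt; data(airquality)
&amp;gt; # arguments after the function name are forwarded to median()
&amp;gt; sapply(airquality, median, na.rm = TRUE)
&lt;/pre&gt;&lt;/code&gt;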
&lt;p&gt;First class functions are useful in many scenarios. We can use them like objects to hold information. Here is a contrived example that creates functions that increment by a set amount. Observe that each function gets its own value of &lt;tt&gt;n&lt;/tt&gt;, which it uses when called:&lt;/p&gt;
&lt;code&gt;&lt;pre&gt;
&amp;gt; adder &amp;lt;- function(n) {
+     function(i) {
+         n + i
+     }
+ }
&amp;gt; f1 &amp;lt;- adder(7)
&amp;gt; f2 &amp;lt;- adder(3)
&amp;gt; f1(10)
[1] 17
&amp;gt; f2(10)
[1] 13
&lt;/pre&gt;&lt;/code&gt;
&lt;p&gt;Another feature of R is that functions carry their source code around with them. If we ever want to know what &lt;tt&gt;f1&lt;/tt&gt; does, we can just ask R to print out the source:&lt;/p&gt;
&lt;code&gt;&lt;pre&gt;
&amp;gt; f1
function (i) 
{
    n + i
}
&amp;lt;environment: 0xcdc600&amp;gt;
&lt;/pre&gt;&lt;/code&gt;
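&lt;p&gt;The parts of that printout can also be pulled out one at a time. As a rough sketch using only base R accessors, &lt;tt&gt;formals&lt;/tt&gt; returns the argument list of a function and &lt;tt&gt;body&lt;/tt&gt; returns its unevaluated code. The definitions of &lt;tt&gt;adder&lt;/tt&gt; and &lt;tt&gt;f1&lt;/tt&gt; are repeated so the snippet stands on its own.&lt;/p&gt;
&lt;code&gt;&lt;pre&gt;
&amp;gt; adder &amp;lt;- function(n) {
+     function(i) {
+         n + i
+     }
+ }
&amp;gt; f1 &amp;lt;- adder(7)
&amp;gt; names(formals(f1))   # argument names of the inner function
&amp;gt; body(f1)             # the code of the inner function, as a language object
&lt;/pre&gt;&lt;/code&gt;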
&lt;p&gt;While the source will show us that a variable named &lt;tt&gt;n&lt;/tt&gt; is used, it does not tell us anything about the value of &lt;tt&gt;n&lt;/tt&gt;. We know that the values of &lt;tt&gt;n&lt;/tt&gt; in the two functions are 7 and 3, respectively, but if functions are created programmatically, say, as part of a loop, we might not know what these values are. Luckily, functions also expose their &lt;em&gt;environments&lt;/em&gt;, the set of variable names and values from the surrounding scope (the &lt;tt&gt;adder&lt;/tt&gt; function in the above example). While R does not print out these environments by default, we can use a simple helper function to peek inside the function scope:&lt;/p&gt;
&lt;code&gt;&lt;pre&gt;
&amp;gt; fnpeek &amp;lt;- function(f, name = NULL) {
+     env &amp;lt;- environment(f)
+     if (is.null(name)) {
+         return(ls(envir = env))
+     }
+     if (name %in% ls(envir = env)) {
+         return(get(name, env))
+     }
+     return(NULL)
+ }
&amp;gt; fnpeek(f1)
[1] "n"
&amp;gt; fnpeek(f1, "n")
[1] 7
&lt;/pre&gt;&lt;/code&gt;
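&lt;p&gt;For a one-off look at everything a function has captured, an alternative (a sketch, not a replacement for &lt;tt&gt;fnpeek&lt;/tt&gt;) is to convert the environment to a named list with base R&amp;#8217;s &lt;tt&gt;as.list&lt;/tt&gt;. This is only sensible for functions created inside other functions; for a function defined at the top level it would list the entire global environment. The name &lt;tt&gt;captured&lt;/tt&gt; is made up for this example.&lt;/p&gt;
&lt;code&gt;&lt;pre&gt;
&amp;gt; adder &amp;lt;- function(n) {
+     function(i) {
+         n + i
+     }
+ }
&amp;gt; f1 &amp;lt;- adder(7)
&amp;gt; # every variable in the enclosing environment, as a named list
&amp;gt; captured &amp;lt;- as.list(environment(f1))
&amp;gt; names(captured)
[1] "n"
&amp;gt; captured$n
[1] 7
&lt;/pre&gt;&lt;/code&gt;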
&lt;p&gt;If you do not have one already, go make a &lt;tt&gt;~/.Rprofile&lt;/tt&gt; file and stick this function in there. You will use it. I promise. I recently used it to diagnose &lt;a href='/thoughts/2010-06-23-scoping-bugs.html'&gt;a problem that had been bugging me for some time&lt;/a&gt;. The problem concerned creating a series of functions. Using the &lt;tt&gt;adder&lt;/tt&gt; example above:&lt;/p&gt;
&lt;code&gt;&lt;pre&gt;
&amp;gt; adders &amp;lt;- lapply(1:5, adder)
&amp;gt; sapply(adders, function(f) {
+     f(10)
+ })
[1] 15 15 15 15 15
&lt;/pre&gt;&lt;/code&gt;
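&lt;p&gt;The output should be &lt;tt&gt;11 12 13 14 15&lt;/tt&gt;, but instead it is constantly 15. This is because the adder functions created in the loop all end up looking at the same loop variable for their &lt;tt&gt;n&lt;/tt&gt;, and that variable is overwritten during the loop. (Since R 3.2.0, &lt;tt&gt;lapply&lt;/tt&gt; and the other apply functions force the arguments they pass along, so newer versions print the expected values for this call; the explicit loop below still reproduces the problem.) In the version of R used here, the &lt;tt&gt;lapply&lt;/tt&gt; call is equivalent to:&lt;/p&gt;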
&lt;code&gt;&lt;pre&gt;
&amp;gt; adders &amp;lt;- vector(mode = "list", length = 5)
&amp;gt; for (i in 1:5) {
+     adders[[i]] &amp;lt;- adder(i)
+ }
&amp;gt; sapply(adders, function(f) {
+     f(10)
+ })
[1] 15 15 15 15 15
&lt;/pre&gt;&lt;/code&gt;
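&lt;p&gt;Peeking inside the functions makes the problem visible. The sketch below rebuilds the loop version and then looks up &lt;tt&gt;n&lt;/tt&gt; in each function&amp;#8217;s environment, which is what &lt;tt&gt;fnpeek(f, "n")&lt;/tt&gt; does; the lookup is written inline only to keep the snippet self-contained. Every function reports the final value of the loop variable:&lt;/p&gt;
&lt;code&gt;&lt;pre&gt;
&amp;gt; adder &amp;lt;- function(n) {
+     function(i) {
+         n + i
+     }
+ }
&amp;gt; adders &amp;lt;- vector(mode = "list", length = 5)
&amp;gt; for (i in 1:5) {
+     adders[[i]] &amp;lt;- adder(i)
+ }
&amp;gt; # look up n in the enclosing environment of each function
&amp;gt; sapply(adders, function(f) {
+     get("n", envir = environment(f))
+ })
[1] 5 5 5 5 5
&lt;/pre&gt;&lt;/code&gt;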
&lt;p&gt;In each iteration, the &lt;tt&gt;i&lt;/tt&gt; variable is overwritten with a new value. The functions do not receive a copy of that value when they are created. R evaluates function arguments lazily: &lt;tt&gt;adder&lt;/tt&gt; is handed a promise to evaluate &lt;tt&gt;i&lt;/tt&gt;, and the promise is not forced until &lt;tt&gt;n&lt;/tt&gt; is first used, which happens only when one of the returned functions is finally called. By then the loop has finished and &lt;tt&gt;i&lt;/tt&gt; is 5, so every function sees the same value of &lt;tt&gt;n&lt;/tt&gt;. Usually lazy evaluation is not a problem, but when creating functions in a loop, evaluating the argument immediately would have been the desired behavior. Luckily, the workaround is relatively simple: force the evaluation of &lt;tt&gt;n&lt;/tt&gt; by assigning it to itself in the local environment of the outer function.&lt;/p&gt;
&lt;code&gt;&lt;pre&gt;
&amp;gt; safe.adder &amp;lt;- function(n) {
+     n &amp;lt;- n  # evaluate n now and keep the value locally
+     function(i) {
+         n + i
+     }
+ }
&amp;gt; safe.adders &amp;lt;- lapply(1:5, safe.adder)
&amp;gt; sapply(safe.adders, function(f) {
+     f(10)
+ })
[1] 11 12 13 14 15
&lt;/pre&gt;&lt;/code&gt;
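&lt;p&gt;A variation on the same workaround, offered as a sketch rather than the canonical fix: base R provides &lt;tt&gt;force&lt;/tt&gt;, which evaluates its argument and does nothing else, so it can stand in for the self-assignment. The names &lt;tt&gt;forced.adder&lt;/tt&gt; and &lt;tt&gt;forced.adders&lt;/tt&gt; are made up for this example. Building the functions in an explicit loop shows that each one keeps its own &lt;tt&gt;n&lt;/tt&gt;:&lt;/p&gt;
&lt;code&gt;&lt;pre&gt;
&amp;gt; forced.adder &amp;lt;- function(n) {
+     force(n)  # evaluate n now, while the caller's variable still has this value
+     function(i) {
+         n + i
+     }
+ }
&amp;gt; forced.adders &amp;lt;- vector(mode = "list", length = 5)
&amp;gt; for (k in 1:5) {
+     forced.adders[[k]] &amp;lt;- forced.adder(k)
+ }
&amp;gt; sapply(forced.adders, function(f) {
+     f(10)
+ })
[1] 11 12 13 14 15
&lt;/pre&gt;&lt;/code&gt;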
&lt;p&gt;While not ideal, at least this workaround is relatively simple (&lt;a href='/thoughts/2010-06-23-scoping-bugs.html'&gt;especially compared to my last solution&lt;/a&gt;) and gets us all the benefits we would expect of first class functions.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The version of R used in this post was 2.11.1 (2010-05-31)&lt;/em&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/8K5aHDlsD38" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2011-02-06-peeking-inside-r-functions.html</feedburner:origLink></entry>
 
 <entry>
   <title>Jon Hurwitz UIUC Seminar</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/ZASaz4cI-Xc/2011-01-30-jon-hurwitz-uiuc-seminar.html" />
   <updated>2011-01-30T00:00:00-06:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/jon-hurwitz-uiuc-seminar</id>
   <content type="html">&lt;p&gt;&lt;a href='http://www.pitt.edu/~politics/faculty/hurwitz.html'&gt;Jon Hurwitz&lt;/a&gt; (University of Pittsburgh) visited the University of Illinois to present findings from his latest book, &lt;a href='http://www.amazon.com/gp/product/0521134757?ie=UTF8&amp;tag=markmfredcom-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0521134757'&gt;Justice in America: The Separate Realities of Blacks and Whites&lt;/a&gt;, with &lt;a href='http://www.uky.edu/AS/PoliSci/Peffley/'&gt;Mark Peffley&lt;/a&gt; (University of Kentucky). The Canadian Journal of Political Science published an &lt;a href='http://dx.doi.org/10.1017/S0008423910000120'&gt;article based on the book&lt;/a&gt;, and Hurwitz largely followed the flow of the article in his talk.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Jon Hurwitz at UIUC by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5396945262/'&gt;&lt;img width='500' alt='Jon Hurwitz at UIUC' src='http://farm5.static.flickr.com/4144/5396945262_941ba0e13a.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Hurwitz motivated the research with the vastly different reactions to court decisions, split along racial lines (the most salient being the Rodney King and O.J. Simpson decisions). He and his co-author commissioned a national U.S. survey that combined demographic questions, attitudes towards the justice system, and a series of embedded framing/priming experiments. Unsurprisingly, the authors found whites to be more likely to consider the justice system unbiased, while blacks considered the system to be biased on many different levels.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Jon Hurwitz at UIUC by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5396944188/'&gt;&lt;img width='500' alt='Jon Hurwitz at UIUC' src='http://farm6.static.flickr.com/5053/5396944188_4c1404e936.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;As an explanation, the authors argue that individual experiences frame perceptions of fairness of the entire system. They argue (and their data support the claim) that individuals who report having had unfair interactions with police or the justice system are more likely to see the system as biased.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Jeff Mondak asks a question by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5396356969/'&gt;&lt;img width='500' alt='Jeff Mondak asks a question' src='http://farm5.static.flickr.com/4115/5396356969_46a0aecd9e.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Hurwitz granted the causal linkage could be flowing in either direction (the authors did not have data to disentangle their claim); nevertheless, the claim appears plausible given what we know about individuals using their experiences to evaluate larger phenomena.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Jim Kuklinski shares a story by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5396355655/'&gt;&lt;img width='500' alt='Jim Kuklinski shares a story' src='http://farm5.static.flickr.com/4142/5396355655_795fb35f73.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;The authors invoke this logic to explain perhaps the most interesting finding of the paper. The authors embedded a framing experiment asking respondents to express support or objection to capital punishment. From the CJPS paper:&lt;/p&gt;
&lt;blockquote&gt;In our death penalty experiment, we compare the efficacy of two very different arguments against capital punishment, one that contains a racial frame and one that does not, looking particularly at how the impact of the two messages differs across the race of the audience. In the &lt;em&gt;baseline condition&lt;/em&gt; (to which one-third of our respondents have been randomly assigned), individuals simply respond to the question: "Here is a question about the death penalty. Do you strongly oppose, somewhat oppose, somewhat favour, or strongly favour the death penalty for persons convicted of murder"? In the racial argument condition, individuals are asked the same question, only preceded by the statement "Here is a question about the death penalty. Some people say that the death penalty is unfair because most of the people who are executed are African Americans."; and in a non-racial argument condition, the baseline question is preceded by "Some people say that the death penalty is unfair because too many innocent people are being executed."
&lt;/blockquote&gt;
&lt;p&gt;The results of the experiment (Table 2 of the paper) are reproduced below (highlighting mine). Compared to baseline, whites show very little difference in the innocence argument condition, but become an amazing 12 points more supportive of capital punishment when it is framed as a racial issue. Blacks, on the other hand, are equally less supportive of capital punishment in either condition. Hurwitz and Peffley were expecting whites to be more supportive in general but were astounded by the positive shift in the racial condition. They argue that whites tend to view racial disparities as the result of &amp;#8220;dispositional&amp;#8221; causes, that is, whites blame blacks at an individual level, but see the system as inherently fair, so that the racial cue reminds them that the system is doing its job.&lt;/p&gt;
&lt;p class='floatimg'&gt;
  &lt;img src='/thoughts/2011-01-30-table2.png' /&gt;
&lt;/p&gt;
&lt;p&gt;While the shift in white support is surprising, I actually find the shift in black support more intriguing. Hurwitz and Peffley focus on the similarity between the racial and innocence arguments and argue that blacks already view capital punishment in racial terms, so both frames had the same effect. But if this is the case, &lt;em&gt;why do we see any change from the baseline?&lt;/em&gt; If black respondents already view capital punishment in racial terms, why is any frame necessary? The likely explanation is that there is some moderating factor other than race driving these results. While this attribute may be more common in African American respondents, it probably spans the racial divide as well. Hurwitz and Peffley offer attributions of racial disparities in the justice system (dispositional versus structural), but I do not think that fully explains the results they find. I guess I&amp;#8217;ll have to wait for the next book.&lt;/p&gt;

&lt;p&gt;&lt;a href='http://www.flickr.com/photos/devinf/sets/72157625802063643/'&gt;More images of the event.&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/ZASaz4cI-Xc" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2011-01-30-jon-hurwitz-uiuc-seminar.html</feedburner:origLink></entry>
 
 <entry>
   <title>A quick review of Destination Dissertation</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/zIRSDP79bbY/2011-01-26-a-quick-review-of-destination-dissertation.html" />
   <updated>2011-01-26T00:00:00-06:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/a-quick-review-of-destination-dissertation</id>
   <content type="html">&lt;p&gt;I was recently lent a copy of &lt;a href='http://www.amazon.com/gp/product/0742554406?ie=UTF8&amp;tag=markmfredcom-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0742554406'&gt;Destination Dissertation: A Traveler's Guide to a Done Dissertation&lt;/a&gt;&lt;img style='border:none !important; margin:0px !important;' width='1' alt='' src='http://www.assoc-amazon.com/e/ir?t=markmfredcom-20&amp;l=as2&amp;o=1&amp;a=0742554406' height='1' border='0' /&gt;. This review is based upon a brief skim of the introduction and chapters that most directly appealed to me, given my current progress on my dissertation. I would classify my position has having a rough idea (separating the effects of elections from the effects of democracy), but still working on the finer points of my research question. The value of this book may be less to graduate students further along the process.&lt;/p&gt;

&lt;p&gt;The authors of &lt;em&gt;Destination Dissertation&lt;/em&gt; provide an interesting metaphor for writing a dissertation: instead of a task to be completed, a dissertation should be viewed as a trip or a journey, something pleasurable. Thankfully, the authors do not stress this metaphor too heavily, but simply use it to bind together the 12 chapters with a few lines returning to the theme at the beginning of each chapter. More valuable than the trip metaphor is the authors&amp;#8217; ability to turn vague goals into concrete steps, specifically 29 of them. While 29, at first blush, may appear to be a high number of steps, I find it an amazing feat that they can define any dissertation in a finite number of concrete steps. Also note that these steps are not vague brush strokes, but specific actions to be taken, including checklists and techniques to be used in the writing and researching phases. Even more amazing, at least to this reviewer, is that the authors estimate a typical dissertation could be completed in 6.5 months, assuming a 40 hour work week devoted to the study, an encouraging, if ambitious, goal.&lt;/p&gt;

&lt;p&gt;My quick reading of the book focused on three key areas: the literature review step, techniques for writing, and the chapter devoted to common pitfalls. All three of these sections are opinionated. This is both a blessing and a curse. The opinionated writing manifests as concrete, if inflexible, prescriptions. In the literature review section the authors provide a specific way of gathering notes. In the writing section, the authors demand frequent writing and encourage the use of fast-writing techniques, in which little attention is paid to details such as typos and exact citations (substituting comments such as &amp;#8220;CITE THIS LATER&amp;#8221; or &amp;#8220;LOOK UP EXACT NUMBER&amp;#8221;). The authors even go so far as to recommend turning off the computer monitor when writing to eliminate one&amp;#8217;s ability to get tied up in typos or formatting. This review was written using their recommended fast writing/slow review process, and I found it both enjoyable and effective in putting my ideas to paper.&lt;/p&gt;

&lt;p&gt;The downside of the concrete recommendations is inflexibility. In the literature review section the authors provide a detailed and specific method of culling resources and composing the initial review materials. The method looks interesting, and I intend to employ it at least provisionally, but it may not work for all readers. If the solution does not work, then what? The authors have little to say if you vary from their positions. Similarly, they define the common pitfalls of dissertation writing using a series of case studies. If a reader finds a story that describes his or her behavior, the recommended solutions may help, but if the authors&amp;#8217; list does not include the reader&amp;#8217;s situation, there are few other ways of applying the recommendations.&lt;/p&gt;

&lt;p&gt;My previous experience with dissertation guides is limited to a single book on conducting a literature review. This book suffered from either overly vague prescriptions or already outdated technology recommendations (&amp;#8220;Be sure to use Gopher.&amp;#8221;). While I am wary of the inflexibility of the &lt;em&gt;Destination Dissertation&lt;/em&gt; approach, I certainly prefer the authors&amp;#8217; clear recommendations to the problems of my previous reading. Even if you do not think the authors&amp;#8217; approach will serve you, at least you know what they are recommending (compared to vague or technologically irrelevant advice). I would certainly recommend &lt;a href='http://www.amazon.com/gp/product/0742554406?ie=UTF8&amp;tag=markmfredcom-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0742554406'&gt;Destination Dissertation: A Traveler's Guide to a Done Dissertation&lt;/a&gt;&lt;img style='border:none !important; margin:0px !important;' width='1' alt='' src='http://www.assoc-amazon.com/e/ir?t=markmfredcom-20&amp;l=as2&amp;o=1&amp;a=0742554406' height='1' border='0' /&gt; on this basis alone.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/zIRSDP79bbY" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2011-01-26-a-quick-review-of-destination-dissertation.html</feedburner:origLink></entry>
 
 <entry>
   <title>Kevin Clarke visits UIUC</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/vIBDDNVQ6AA/2011-01-22-kevin-clarke-visits-uiuc.html" />
   <updated>2011-01-22T00:00:00-06:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/kevin-clarke-visits-uiuc</id>
   <content type="html">&lt;p&gt;The methodology seminar series kicked off the semester this week with a visit from &lt;a href='http://www.rochester.edu/college/psc/clarke/'&gt;Kevin Clarke&lt;/a&gt; of the University of Rochester. Clarke, with coauthor David Primo, is finishing on a book on the use of models in political science research.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Kevin Clarke at UIUC by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5376685371/'&gt;&lt;img width='500' alt='Kevin Clarke at UIUC' src='http://farm6.static.flickr.com/5248/5376685371_7ca8f5502c.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;The basic premise of Clarke and Primo is that models should be treated like maps. Like maps, models are not inherently true or false, but rather are &lt;em&gt;useful&lt;/em&gt; for some purpose. Likewise, theories, as collections of models, are also outside the true-false dichotomy.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Kevin Clarke at UIUC by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5377288164/'&gt;&lt;img width='500' alt='Kevin Clarke at UIUC' src='http://farm6.static.flickr.com/5208/5377288164_018fc20079.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;On the whole, I would say the idea was well received by the political science department at Illinois. Much of the debate concerned how to define &amp;#8220;usefulness.&amp;#8221; In the case of a map, there is usually a natural, physical use that provides a clear cut metric of usefulness. In political science, justifying the criteria upon which models should be judged is a more difficult task. A related criticism that was raised in the debate is that substituting &amp;#8220;useful&amp;#8221; for &amp;#8220;true&amp;#8221; is simply a semantic exchange. For the purposes of theory testing, a main focus of modern political science research, &amp;#8220;useful&amp;#8221; would be defined as the ability of the model to provide a clear test of an observable deduction, which we might in other discussions simply call falsification of the model.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Kevin Clarke at UIUC by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5377293492/'&gt;&lt;img width='500' alt='Kevin Clarke at UIUC' src='http://farm6.static.flickr.com/5123/5377293492_13fcc923ff.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Clarke&amp;#8217;s response, and a theme of the book, is that actually testing a deduction of a theory itself requires a model. Political scientists are certainly familiar with the phrase &amp;#8220;data do not speak for themselves,&amp;#8221; but still consider a model to be a test based purely on the data. Clarke argues that the test itself is therefore based on a model of the data (though there was some debate whether pure experiments did allow the data to directly test a theory; I ultimately disagreed, since even simple experiments require a model, but that is a topic for another post). In many ways, I found this part of his argument to be the most convincing: how do we test one model (scientific) with another model (statistical)?&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Kevin Clarke at UIUC by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5377301298/'&gt;&lt;img width='500' alt='Kevin Clarke at UIUC' src='http://farm6.static.flickr.com/5090/5377301298_b25f7f8607.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;The book is an ongoing work and is notably missing a final concluding section. Clarke indicated that this chapter would consider counter-arguments to the models as maps position. I look forward to reading his responses, as well as seeing more concrete applications of models as maps in political science. Clarke himself pointed out that change in the discipline will be hard (perhaps accomplished through replacement rather than persuasion), so I do not expect to see his argument cited in all the articles in the next APSR.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Kevin Clarke at UIUC by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5376696467/'&gt;&lt;img width='500' alt='Kevin Clarke at UIUC' src='http://farm6.static.flickr.com/5164/5376696467_02c162c014.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Nevertheless, I do hope to see more research think carefully about the usefulness of the employed models for purposes other than theory testing. In fact, selecting the best model for the research is the core of the argument in a &lt;a href='/academics/papers/political-information-simulation.pdf'&gt;working paper&lt;/a&gt; of mine, with John Ostrowski. In this paper, we argue that simulation studies of knowledge (in which researchers construct a counterfactual world of fully informed citizens) should employ models that are the most accurate at predicting responses in observed data. Previous studies used only linear models, while we consider a variety of linear and non-linear machine learning techniques. While the best linear model is as accurate as the best non-linear model, the non-linear models predict relatively little change under a fully informed population, indicating that political knowledge may not be as important to attitude formation as we previously thought.&lt;/p&gt;
&lt;p class='floatimg'&gt;
&lt;a title='Kevin Clarke at UIUC by devinf, on Flickr' href='http://www.flickr.com/photos/devinf/5377300614/'&gt;&lt;img width='500' alt='Kevin Clarke at UIUC' src='http://farm6.static.flickr.com/5008/5377300614_410fbe73cd.jpg' height='375' /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;In the immediate future, I see simulations (such as our paper or agent based approaches) as the most likely standard bearers for the Clarke and Primo position. Whether the models as maps position takes hold in a larger audience remains to be seen.&lt;/p&gt;

&lt;p&gt;&lt;a href='http://www.flickr.com/photos/devinf/sets/72157625752926769/'&gt;More photos of the event&lt;/a&gt;.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/vIBDDNVQ6AA" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2011-01-22-kevin-clarke-visits-uiuc.html</feedburner:origLink></entry>
 
 <entry>
   <title>Working papers published</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/3cYCXK2Livk/2010-12-06-working-papers-published.html" />
   <updated>2010-12-06T00:00:00-06:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/working-papers-published</id>
   <content type="html">&lt;p&gt;I just published &lt;a href='/academics'&gt;some working papers&lt;/a&gt; that had been collecting dust on my hard drive. In various states of comleteness, these papers range from fairly complete to simply statements of research designs I&amp;#8217;d like to implement. &lt;a href='mailto:mark.m.fredrickson@gmail.com'&gt;Questions, comments, etc.&lt;/a&gt; are always welcome.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/3cYCXK2Livk" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2010-12-06-working-papers-published.html</feedburner:origLink></entry>
 
 <entry>
   <title>Speeding up Optmatch while improving match quality</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/f_NmzDJOqfQ/2010-11-12-speeding-up-optmatch-while-improving-match-quality.html" />
   <updated>2010-11-12T00:00:00-06:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/speeding-up-optmatch-while-improving-match-quality</id>
   <content type="html">&lt;p&gt;&amp;#8220;Fast, cheap, correct: Pick two.&amp;#8221; Does this phrase apply to statistical matching algorithms? In the case of &lt;a href='http://cran.r-project.org/web/packages/optmatch/index.html'&gt;Optmatch&lt;/a&gt;, you can have all three. &amp;#8220;Cheap&amp;#8221; is easy: it is open source. You can download it for free. Today I&amp;#8217;m going to explain how to make the matching process both faster and more substantively relevant using a technique we call &amp;#8220;pre-stratification&amp;#8221;: splitting your data into smaller matching problems.&lt;/p&gt;

&lt;h2 id='the_problem'&gt;The Problem&lt;/h2&gt;

&lt;p&gt;&lt;a href='http://www.stat.lsa.umich.edu/%7Ebbh/'&gt;Ben Hansen&lt;/a&gt; and I often receive messages from Optmatch users of the form: &amp;#8220;I have a very large matching problem, and Optmatch is taking a very, very long time to complete. Is there anything I can do?&amp;#8221; For example, say that you have some data of the form:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;gt; fake.data &amp;lt;- data.frame(z = rep(c(1,0), 3000), 
                          x = runif(6000))&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here &lt;code&gt;z&lt;/code&gt; is an indicator of whether the unit received &amp;#8220;treatment&amp;#8221; and &lt;code&gt;x&lt;/code&gt; is a covariate you wish to match on (or it could be a summary of covariates, such as a propensity score). If one were to invoke Optmatch on this data directly, it could take a long time. I don&amp;#8217;t suggest you try it, but it might look something like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;gt; library(optmatch)
&amp;gt; my.matching &amp;lt;- pairmatch(mdist(z ~ x, data = fake.data))&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In this code, &lt;code&gt;mdist&lt;/code&gt; prepares a treatment by control matrix where each entry is the &lt;a href='http://en.wikipedia.org/wiki/Mahalanobis_distance'&gt;Mahalanobis distance&lt;/a&gt; between each pair. &lt;code&gt;pairmatch&lt;/code&gt; finds the best set of treatment-control pairs, minimizing the average distance within pairs. We&amp;#8217;ll see another example of &lt;code&gt;mdist&lt;/code&gt; below and more examples of both functions are contained in the online documentation (e.g. &lt;code&gt;&amp;gt; ?mdist&lt;/code&gt;).&lt;/p&gt;

&lt;h2 id='stratification'&gt;Stratification&lt;/h2&gt;

&lt;p&gt;Of course, this example will be &lt;em&gt;slow&lt;/em&gt;. You are telling optmatch to compare 3000 treated units with 3000 control units, which is a very, very large search space. We would recommend limiting the comparisons of treated and control units based on another covariate, stratifying the data into smaller subgroups prior to matching. Usually the best way is to use a categorical variable of substantive purpose.&lt;/p&gt;

&lt;p&gt;For example, say you have a continuous covariate (x), a treatment indicator (z), and a factor indicating male or female (gender):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;gt; fake.data &amp;lt;- data.frame(z = rep(c(1,0), 3000), x = runif(6000),
                          gender = c(rep(0, 1500), rep(1, 1500)))&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Perhaps previous studies indicate gender to be an important determinant of whether subjects self-select into treatment (this can be verified by &lt;a href='/thoughts/2010-07-30-using-optmatch-and-ritools-for-observational-studies.html'&gt;checking the balance&lt;/a&gt; of male and female treatment and control subjects). Limiting matches to same gender pairs will likely &lt;em&gt;improve&lt;/em&gt; the quality of your matches (as compared to ignoring gender) and will also speed up the matching process.&lt;/p&gt;

&lt;p&gt;Using &lt;code&gt;mdist&lt;/code&gt; again, we can create a set of distances for male subjects (treated and control) and separately female subjects by indicating that &lt;code&gt;gender&lt;/code&gt; should be a stratifying variable and the distances can be fed to optmatch:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;gt; distances &amp;lt;- mdist(z ~ x | gender, data = fake.data)
&amp;gt; my.matches &amp;lt;- pairmatch(distances)
&amp;gt; summary(my.matches)
 Structure of matched sets:
  1:1 
 3000 
 Effective Sample Size:  3000 
 (equivalent number of matched pairs).
 
 sum(matched.distances)=4.33
 (within 5.01 of optimum).
 Percentiles of matched distances:
       0%      50%      95%     100% 
 5.83e-10 8.88e-04 4.75e-03 8.96e-03 &lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Watching the R process as this ran, I saw it took about 200mb of RAM to compute the distances and the pair match, and it only took a few seconds. By comparison, the 3000 by 3000 matching task took all my available memory (forcing other apps to be pushed to swap) and did not complete in the 10 minutes I allowed it to run.&lt;/p&gt;
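
&lt;p&gt;As a rough illustration of why stratifying shrinks the problem, the following base R sketch counts the candidate treated-control pairs with and without the gender restriction. It does not call Optmatch; the names &lt;code&gt;strata.counts&lt;/code&gt;, &lt;code&gt;all.pairs&lt;/code&gt;, and &lt;code&gt;strat.pairs&lt;/code&gt; are made up for this sketch, and the pair count is only a proxy for the work the matching routine actually does:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# same layout as the fake data above (x is not needed for counting)
fake.data &amp;lt;- data.frame(z = rep(c(1, 0), 3000),
                        gender = rep(c(rep(0, 1500), rep(1, 1500)), 2))

# treated (z = 1) and control (z = 0) counts within each gender
strata.counts &amp;lt;- table(gender = fake.data$gender, z = fake.data$z)

# candidate pairs when every treated unit may match every control
all.pairs &amp;lt;- sum(fake.data$z == 1) * sum(fake.data$z == 0)

# candidate pairs when matches are restricted to the same gender
strat.pairs &amp;lt;- sum(strata.counts[, "1"] * strata.counts[, "0"])

# share of the original candidate pairs that remain
strat.pairs / all.pairs&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With two equally sized strata, half of the candidate pairs disappear, and each of the two remaining problems is a quarter the size of the original.&lt;/p&gt;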

&lt;h2 id='propensity_scores'&gt;Propensity Scores&lt;/h2&gt;

&lt;p&gt;Clearly, stratification improves the execution time of matches. I&amp;#8217;ve also found that stratified matches do very well compared to propensity score models that would also include the stratification variable. In other words, if I were to build a propensity score model of treatment and include gender, I could do so in this fashion:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;gt; match.model &amp;lt;- glm(z ~ x + gender, data = fake.data, 
                     family = binomial)
&amp;gt; my.matches &amp;lt;- pairmatch(mdist(match.model))&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But like the original method, this would require a 3000 by 3000 entry matrix to search. Again, the faster way is to both include the stratifying variable from the propensity model and use it directly in the match:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;gt; match.model &amp;lt;- glm(z ~ x + gender, data = fake.data, 
                     family = binomial)
&amp;gt; my.matches &amp;lt;- pairmatch(mdist(match.model, 
                                structure.fmla = ~ gender))&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Like the stratification above using Mahalanobis distance, the propensity score example completes much more quickly when &lt;code&gt;gender&lt;/code&gt; is used to stratify the data before the match.&lt;/p&gt;

&lt;h2 id='conclusion'&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;While I&amp;#8217;ve been arguing from a speed perspective so far, I also think that stratification improves the substantive quality of matches. Variables with strong theoretical importance, or that have been shown to be strong predictors of treatment selection in previous studies, make good choices for stratifying variables as they improve the rationale for the matching strategy. Even readers unfamiliar with the matching literature understand that stratifying limits comparisons to comparable units. Ultimately, convincing others that a particular matching strategy allows for valid causal inference is a matter of rhetoric. Stratification can be another tool in creating believable matching scenarios.&lt;/p&gt;

&lt;p&gt;The rhetorical aspect of matching can also be improved by quantitative analysis of the match quality, specifically balance testing. &lt;a href='/thoughts/2010-07-30-using-optmatch-and-ritools-for-observational-studies.html'&gt;I&amp;#8217;ve written on testing balance on this website before&lt;/a&gt;. The &lt;a href='http://cran.r-project.org/web/packages/optmatch/index.html'&gt;Optmatch&lt;/a&gt; and &lt;a href='http://cran.r-project.org/web/packages/optmatch/index.html'&gt;RItools&lt;/a&gt; documentation provides more examples of matching and balance testing strategies. When you have sped up the matching process using stratification, it is easy to compare balance on many different matching strategies to find the one that best fits your data. This again is an example of faster matching providing higher quality results. &amp;#8220;Fast, cheap, correct&amp;#8221; &amp;#8211; Optmatch has them all.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/f_NmzDJOqfQ" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2010-11-12-speeding-up-optmatch-while-improving-match-quality.html</feedburner:origLink></entry>
 
 <entry>
   <title>Combinadics in R</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/1MVfJ--tJnU/2010-08-06-combinadics-in-r.html" />
   <updated>2010-08-06T00:00:00-05:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/combinadics-in-r</id>
   <content type="html">&lt;p&gt;A question on &lt;a href='http://stats.stackexchange.com/questions/1286/how-can-i-obtain-some-of-all-possible-combinations-in-r'&gt;stats.stackexchange.com&lt;/a&gt; reminded me of some code I wrote earlier this summer. The code provides a correspondence between the natural numbers 1 to (N choose K) and all the unique K sized combinations one could draw from N items. This relationship is know as the &lt;a href='http://msdn.microsoft.com/en-us/library/aa289166%28VS.71%29.aspx'&gt;combinadic&lt;/a&gt; of an integer (and my code is pased on the reference implementation). Generating combinations is useful for permutation tests, in which one applies a test statistic on all possible allocations of treatment to an experimental pool.&lt;/p&gt;

&lt;p&gt;Since the number of possible combinations grows extremely rapidly, realizing all possible combinations at once can be extremely memory intensive. Using combinadics, one can trade increased execution time for lower memory usage. Since they are indexed by integers, keeping track of which combination is currently used is trivial. But since the number of combinations grows very quickly, we still need to handle extremely large integers, perhaps larger than the default integer type in R accepts. Luckily, the GMP package provides for &amp;#8220;big ints,&amp;#8221; with which we can write a &amp;#8220;N Choose K&amp;#8221; algorithm for arbitrarily large numbers:&lt;/p&gt;
&lt;code&gt;&lt;pre&gt;
library(gmp)
bigchoose &amp;lt;- function(n, k) {
    if (n &amp;lt; 1 || k &amp;lt; 1 || k &amp;gt; n) {
        return(as.bigz(0))
    }
    if (n == k) {
        return(as.bigz(1))
    }
    if (k &amp;gt; (n/2)) {
        k &amp;lt;- n - k
    }
    numer &amp;lt;- as.bigz(1)
    for (i in n:(n - k + 1)) {
        numer &amp;lt;- numer * i
    }
    denom &amp;lt;- as.bigz(1)
    for (i in 1:k) {
        denom &amp;lt;- denom * i
    }
    return(numer/denom)
}
&lt;/pre&gt;&lt;/code&gt;
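&lt;p&gt;To see why the default numeric types are not enough, here is a quick check in base R (no GMP required). The object name &lt;code&gt;n.combos&lt;/code&gt; is just for illustration:&lt;/p&gt;
&lt;code&gt;&lt;pre&gt;
# number of ways to choose 50 items from 100, returned as a double
n.combos &amp;lt;- choose(100, 50)
# larger than the biggest value R's integer type can hold
n.combos &amp;gt; .Machine$integer.max
# also past 2^53, beyond which doubles cannot represent every integer
n.combos &amp;gt; 2^53
&lt;/pre&gt;&lt;/code&gt;
&lt;p&gt;Both comparisons are true, so indexing combinations of this size exactly requires arbitrary precision integers.&lt;/p&gt;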
&lt;p&gt;Here are two functions to turn an integer into a vector representing a combination (a process I call decoding) and to turn a combination into an integer (encoding). As I expect that these operations will be frequent for a given N and K, &lt;em&gt;these functions produce functions that take integers and combinations&lt;/em&gt;, respectively.&lt;/p&gt;
&lt;code&gt;&lt;pre&gt;
combinadic.decoder.factory &amp;lt;- function(n, k) {
  # n, k are fixed at the start 
  # i is the combinadic index (1 &amp;lt;= i &amp;lt;= choose(n, k))

  # precompute a few sequences we'll need
  ks &amp;lt;- k:1 # the bottom of the choose tests
  max.combinadic &amp;lt;- bigchoose(n, k)
  function(i) {
    i &amp;lt;- as.bigz(i)
    if (i &amp;lt; 1 || `&amp;gt;.bigz`(i, max.combinadic)) {
      stop(paste("Combinadic out of range for", n, "choose", k))
    }
    
    # part of the frequent translation to R's 1... sequences
    i &amp;lt;- i - 1
    # initialize a vector to hold the values
    
    remaining &amp;lt;- i
    previous.candidate &amp;lt;- n
    combination &amp;lt;- numeric(k)

    for(j in ks) {
      value &amp;lt;- remaining + 1
      while (`&amp;gt;.bigz`(value, remaining)) {
        current.candidate &amp;lt;- previous.candidate - 1
        value &amp;lt;- bigchoose(current.candidate, j)
        previous.candidate &amp;lt;- current.candidate
      }
      remaining &amp;lt;- sub.bigz(remaining, value)
      combination[j] &amp;lt;- current.candidate
    }

    return(combination + 1) # translate to 1... counting
  }  
}

combinadic.encoder.factory &amp;lt;- function(n, k) {
  ks &amp;lt;- 1:k    
  function(encoded) {
    stopifnot(length(encoded) == k)
    encoded &amp;lt;- encoded - 1 # translate from 1... counting
    expanded &amp;lt;- as.bigz(0)
    for (i in ks) {
      expanded &amp;lt;- `+.bigz`(expanded, bigchoose(encoded[i], i))
    }
    return(expanded + 1)
  }  
}
&lt;/pre&gt;&lt;/code&gt;
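&lt;p&gt;For small problems where base R&amp;#8217;s &lt;code&gt;choose&lt;/code&gt; is exact, the decoding step can be sketched without big integers. This is a simplified illustration of the same greedy search, not a replacement for the factory above; &lt;code&gt;decode.small&lt;/code&gt; is a hypothetical name used only here:&lt;/p&gt;
&lt;code&gt;&lt;pre&gt;
decode.small &amp;lt;- function(i, n, k) {
  remaining &amp;lt;- i - 1           # translate from 1... counting
  candidate &amp;lt;- n
  combination &amp;lt;- numeric(k)
  for (j in k:1) {
    # walk down until choose(candidate, j) fits in what remains
    repeat {
      candidate &amp;lt;- candidate - 1
      if (choose(candidate, j) &amp;lt;= remaining) break
    }
    remaining &amp;lt;- remaining - choose(candidate, j)
    combination[j] &amp;lt;- candidate
  }
  combination + 1                 # translate to 1... counting
}

decode.small(1, 8, 4)   # first combination
decode.small(70, 8, 4)  # last of the choose(8, 4) = 70 combinations
&lt;/pre&gt;&lt;/code&gt;
&lt;p&gt;The first index decodes to (1, 2, 3, 4) and the last to (5, 6, 7, 8); every index in between yields a distinct combination.&lt;/p&gt;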
&lt;p&gt;Finally, as an illustration, the classic &lt;a href='http://en.wikipedia.org/wiki/Lady_tasting_tea'&gt;Lady Tasting Tea problem&lt;/a&gt;. There are 8 cups, 4 of which have the milk added first (for concreteness, say these are the first 4 cups). What is the distribution of correctly labeling cups as having milk added first? To answer the question, we need a test statistic to indicate how many cups were correctly labeled.&lt;/p&gt;
&lt;code&gt;&lt;pre&gt;
&amp;gt; test.statistic &amp;lt;- function(cups) {
+     sum(cups %in% c(1, 2, 3, 4))
+ }
&lt;/pre&gt;&lt;/code&gt;
&lt;p&gt;Apply this test statistic to each possible allocation of cups, which corresponds to a combination of size 4 taken from 8 possible units.&lt;/p&gt;
&lt;code&gt;&lt;pre&gt;
&amp;gt; maxn &amp;lt;- bigchoose(8, 4)
&amp;gt; decoder &amp;lt;- combinadic.decoder.factory(8, 4)
&amp;gt; counts &amp;lt;- numeric(5)
&amp;gt; names(counts) &amp;lt;- 0:4
&amp;gt; i &amp;lt;- 1
&amp;gt; while (i &amp;lt; (maxn + 1)) {
+     tmp &amp;lt;- test.statistic(decoder(i))
+     counts[tmp + 1] = counts[tmp + 1] + 1
+     i &amp;lt;- i + 1
+ }
&amp;gt; barplot(counts/as.numeric(maxn))
&lt;/pre&gt;&lt;/code&gt;
&lt;p&gt;(NB: I&amp;#8217;ve encountered strange results when creating lists/vectors of &amp;#8220;big integers,&amp;#8221; the result of encoding a combination. Use &lt;code&gt;for&lt;/code&gt; and &lt;code&gt;while&lt;/code&gt; loops instead.)&lt;/p&gt;

&lt;p&gt;&lt;img alt='Lady Tasting Tea distribution' src='/thoughts/2010-08-06-lady-tasting-tea.png' /&gt;&lt;/p&gt;

&lt;p&gt;The plot shows that while a guess by chance of one, two, or three cups is not unlikely, a random guess that results in zero or four correct cups would be extremely rare. While astute readers will notice that this result can be found analytically, it still serves as a simple demonstration of a permutation test. The test statistic employed in this example leads to an analytical result. Other test statistics may not be as simple to solve. Permutation tests using all possible combinations always generate an exact distribution for any test statistic.&lt;/p&gt;
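&lt;p&gt;For this particular statistic, the analytical answer is the hypergeometric distribution, which base R exposes as &lt;code&gt;dhyper&lt;/code&gt;. A minimal check, with &lt;code&gt;analytic&lt;/code&gt; as an illustrative name:&lt;/p&gt;
&lt;code&gt;&lt;pre&gt;
# probability of 0 to 4 correct when labeling 4 of 8 cups, 4 of which are milk-first
analytic &amp;lt;- dhyper(0:4, m = 4, n = 4, k = 4)
names(analytic) &amp;lt;- 0:4
# expressed as counts out of the choose(8, 4) = 70 allocations: 1 16 36 16 1
round(analytic * choose(8, 4))
&lt;/pre&gt;&lt;/code&gt;
&lt;p&gt;These counts should match the enumerated ones from the loop over all 70 combinations.&lt;/p&gt;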

&lt;p&gt;This code was originally written for inclusion in &lt;a href='/code'&gt;RItools&lt;/a&gt;. Subsequently, I decided that only in rare cases does one need to generate the entire null distribution of the test statistic, and in most cases sampling from possible combinations is sufficient. Future posts will address approximate approaches to permutation tests.&lt;/p&gt;
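&lt;p&gt;As a rough sketch of the sampling idea (an illustration in base R, not code from RItools), one can draw random allocations instead of enumerating them. The seed, the number of draws, and the object names are arbitrary choices:&lt;/p&gt;
&lt;code&gt;&lt;pre&gt;
set.seed(20100806)
n.draws &amp;lt;- 10000
# each draw labels 4 of the 8 cups at random and counts the correct ones
sampled &amp;lt;- replicate(n.draws, sum(sample(8, 4) %in% 1:4))
# approximate null distribution of the test statistic
table(sampled) / n.draws
&lt;/pre&gt;&lt;/code&gt;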

&lt;p&gt;&lt;em&gt;&lt;a href='/thoughts/2010-08-06-combinadics-in-R.R'&gt;Code for this post.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/1MVfJ--tJnU" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2010-08-06-combinadics-in-r.html</feedburner:origLink></entry>
 
 <entry>
   <title>Using xBalance with MatchIt</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/_0-6VbBAulM/2010-08-02-using-xbalance-with-matchit.html" />
   <updated>2010-08-02T00:00:00-05:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/using-xbalance-with-matchit</id>
   <content type="html">&lt;p&gt;In a &lt;a href='/thoughts/2010-07-30-using-optmatch-and-ritools-for-observational-studies.html'&gt;previous post&lt;/a&gt;, I demonstrated how to create a propensity score matching, test balance, and analyze the outcome variable using the &lt;code&gt;optmatch&lt;/code&gt; and &lt;code&gt;RItools&lt;/code&gt; packages. The same strategy can be used with other matching algorithms, for example the various methods included in the &lt;a href='http://gking.harvard.edu/matchit/'&gt;MatchIt package&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;ll use the same basic question and data from &lt;a href='/thoughts/2010-07-30-using-optmatch-and-ritools-for-observational-studies.html'&gt;my previous article&lt;/a&gt;. The &lt;code&gt;MatchIt&lt;/code&gt; package wraps &lt;code&gt;optmatch&lt;/code&gt; to provide its &amp;#8220;full&amp;#8221; and &amp;#8220;optimal&amp;#8221; matching methods, so I will use the &amp;#8220;full&amp;#8221; option to maintain consistency with my previous article. The first step is loading the packages and the data:&lt;/p&gt;
&lt;code&gt;&lt;pre&gt;
&amp;gt; library(MatchIt)
&amp;gt; library(optmatch)
&amp;gt; library(RItools)
&amp;gt; data(nuclearplants)
&lt;/pre&gt;&lt;/code&gt;
&lt;p&gt;The interface for &lt;code&gt;MatchIt&lt;/code&gt; is similar to &lt;code&gt;optmatch&lt;/code&gt; for propensity score matches, except that the &lt;code&gt;matchit()&lt;/code&gt; function compresses the process into a single step of specifying the propensity formula and producing the match, while &lt;code&gt;fullmatch()&lt;/code&gt; allows a user to specify any number of distance matrices. In the end, the interface is fairly similar. As with the previous article, I match on a subset of the covariates.&lt;/p&gt;
&lt;code&gt;&lt;pre&gt;
&amp;gt; example.formula &amp;lt;- formula(pr ~ t1 + t2 + cap)
&amp;gt; match.opt &amp;lt;- fullmatch(
                      mdist(glm(example.formula, 
                                data = nuclearplants, 
                                family = binomial())))

&amp;gt; all.mit &amp;lt;- matchit(example.formula, 
                          data = nuclearplants, 
                          method = "full")
&lt;/pre&gt;&lt;/code&gt;
&lt;p&gt;The &lt;code&gt;all.mit&lt;/code&gt; object contains (among other items) a vector indicating each object&amp;#8217;s matched set. For compatibility, save it as a factor:&lt;/p&gt;
&lt;code&gt;&lt;pre&gt;
&amp;gt; match.mit &amp;lt;- as.factor(all.mit$subclass)
&lt;/pre&gt;&lt;/code&gt;
&lt;p&gt;Unsurprisingly, as &lt;code&gt;MatchIt&lt;/code&gt; uses &lt;code&gt;optmatch&lt;/code&gt;, the two matches are identical up to the labels of the matched sets.&lt;/p&gt;
&lt;code&gt;&lt;pre&gt;
&amp;gt; lapply(split(nuclearplants, match.opt), rownames)


$m.1
[1] "N" "Z" "a"

$m.10
[1] "I" "G"

$m.2
[1] "A" "B" "D" "V" "F" "b"

$m.5
[1] "U" "c"

$m.6
 [1] "H" "K" "L" "M" "C" "P" "R" "Y" "e" "f"

$m.8
[1] "J" "O" "Q" "S" "T" "E" "W" "X" "d"


&amp;gt; lapply(split(nuclearplants, match.mit), rownames)


$`1`
[1] "N" "Z" "a"

$`2`
[1] "I" "G"

$`3`
[1] "A" "B" "D" "V" "F" "b"

$`4`
[1] "U" "c"

$`5`
 [1] "H" "K" "L" "M" "C" "P" "R" "Y" "e" "f"

$`6`
[1] "J" "O" "Q" "S" "T" "E" "W" "X" "d"
&lt;/pre&gt;&lt;/code&gt;
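&lt;p&gt;Comparing the two printouts by eye works for 32 plants but not for much more. As a sketch, a small helper can make the comparison mechanical. The name &lt;code&gt;same.partition&lt;/code&gt; is hypothetical and belongs to neither package, and the toy labels below are invented to mimic the two naming schemes:&lt;/p&gt;
&lt;code&gt;&lt;pre&gt;
# Hypothetical helper: TRUE when two groupings put the same units together,
# regardless of how the groups are labelled.
same.partition &amp;lt;- function(a, b) {
  # Units left unmatched (NA) must be the same in both groupings.
  if (any(is.na(a) != is.na(b))) return(FALSE)
  # factor() drops unused levels; table() ignores the NA entries.
  tab &amp;lt;- table(factor(a), factor(b))
  # Each group in one grouping must overlap exactly one group in the other.
  all(rowSums(tab &amp;gt; 0) == 1) &amp;amp;&amp;amp; all(colSums(tab &amp;gt; 0) == 1)
}

# Toy labels mimicking the two naming schemes above.
opt.style &amp;lt;- factor(c("m.1", "m.1", "m.10", "m.10", "m.2"))
mit.style &amp;lt;- factor(c("1", "1", "2", "2", "3"))
same.partition(opt.style, mit.style)
&lt;/pre&gt;&lt;/code&gt;
&lt;p&gt;With the objects from this post, the equivalent call would presumably be &lt;code&gt;same.partition(match.opt, match.mit)&lt;/code&gt;, assuming both behave like ordinary factors.&lt;/p&gt;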
&lt;p&gt;Now that I have a factor listing the groups, I can run &lt;code&gt;xBalance&lt;/code&gt; to assess the balance properties of the match:&lt;/p&gt;
&lt;code&gt;&lt;pre&gt;
&amp;gt; xBalance(pr ~ . - (cost + pr), 
              data = nuclearplants, 
              strata = match.mit, 
              report = "chisquare.test")


---Overall Test---
      chisquare df p.value
strat       5.1  9    0.82
---
Signif. codes:  0 ‘***’ 0.001 ‘** ’ 0.01 ‘*  ’ 0.05 ‘.  ’ 0.1 ‘   ’ 1 
&lt;/pre&gt;&lt;/code&gt;
&lt;p&gt;With a reported p-value of 0.82, there is little evidence against the null of balance, so we would fail to reject it.&lt;/p&gt;

&lt;p&gt;This walkthrough used the &amp;#8220;full&amp;#8221; method for &lt;code&gt;matchit()&lt;/code&gt;, but the same techniques will work with other &lt;code&gt;matchit()&lt;/code&gt; methods, such as coarsened exact matching or nearest neighbor. If you are reasonably confident that you wish to use optimal matching, you should consider using the &lt;code&gt;optmatch&lt;/code&gt; package directly, instead of using it through &lt;code&gt;MatchIt&lt;/code&gt;. In future posts I will be demonstrating &lt;a href='/thoughts/2010-11-12-speeding-up-optmatch-while-improving-match-quality.html'&gt;important techniques to speed up the matching process&lt;/a&gt; (which can be a great benefit to large datasets) and how you can create matches that incorporate more subject matter information than can be included in a simple logit model.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/_0-6VbBAulM" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2010-08-02-using-xbalance-with-matchit.html</feedburner:origLink></entry>
 
 <entry>
   <title>Using Optmatch and RItools for Observational Studies</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/MdiKEts46gU/2010-07-30-using-optmatch-and-ritools-for-observational-studies.html" />
   <updated>2010-07-30T00:00:00-05:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/using-optmatch-and-ritools-for-observational-studies</id>
   <content type="html">&lt;p&gt;I am a contributor to the &lt;a href='http://cran.r-project.org/web/packages/optmatch/index.html'&gt;optmatch&lt;/a&gt; and the &lt;a href='http://cran.r-project.org/web/packages/RItools/index.html'&gt;RItools&lt;/a&gt; packages for &lt;a href='http://www.r-project.org/'&gt;R&lt;/a&gt;. These two packages are separate, but complementary. Both packages provide tools for adjusting observational data to exhibit &amp;#8220;balance&amp;#8221; on observed covariates. In a randomized control trial, treatment and control groups should have identical distributions over all covariates, observed and unobserved. Matching provides a method to create smaller groups in an observational study that are similarly balanced. Balance can be quantified so that alternative matches can be compared. When an acceptable match has been found, analysis can then proceed as if nature provided a blocked, randomized study.&lt;/p&gt;

&lt;h2 id='data'&gt;Data&lt;/h2&gt;

&lt;p&gt;Both &lt;code&gt;optmatch&lt;/code&gt; and &lt;code&gt;RItools&lt;/code&gt; use a canonical dataset consisting of nuclear plants. From &lt;code&gt;help(nuclearplants)&lt;/code&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The data relate to the construction of 32 light water reactor (LWR) plants constructed in the U.S.A in the late 1960&amp;#8217;s and early 1970&amp;#8217;s. The data was collected with the aim of predicting the cost of construction of further LWR plants. 6 of the power plants had partial turnkey guarantees and it is possible that, for these plants, some manufacturers&amp;#8217; subsidies may be hidden in the quoted capital costs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;With these data, we may wish to know if certain variables lead to higher or lower construction costs. One particular variable is &lt;code&gt;pr&lt;/code&gt;, an indicator if a previous lightwater reactor at the same location was present. Such an installation might significantly increase or decrease costs. The rest of this document uses matching and balance testing to provide an answer to just that question.&lt;/p&gt;

&lt;p&gt;I start by loading the packages and the data:&lt;/p&gt;
&lt;code&gt;
&lt;pre&gt;
&amp;gt; library(optmatch)
&amp;gt; library(RItools)
&amp;gt; data(nuclearplants)
&lt;/pre&gt;
&lt;/code&gt;
&lt;p&gt;Before getting into the matching process, let&amp;#8217;s take a quick look at the balance of the data on all variables, except &lt;code&gt;cost&lt;/code&gt; and &lt;code&gt;pr&lt;/code&gt;, the LWR indicator. &lt;code&gt;xBalance&lt;/code&gt;, among other tests, provides an omnibus balance test across any number of variables. This test compares the null hypothesis of &amp;#8220;the data are balanced&amp;#8221; against the alternative hypothesis of a lack of balance, where balance is what we would expect in a randomized trial with the same sample size. The test follows a chi-squared distribution, which &lt;code&gt;xBalance&lt;/code&gt; will happily report:&lt;/p&gt;
&lt;code&gt;
&lt;pre&gt;
&amp;gt; xBalance(pr ~ . - (cost + pr), 
  data = nuclearplants, 
  report = c("chisquare.test"))

---Overall Test---
        chisquare df p.value
unstrat      12.4  9   0.192
---
Signif. codes:  0 ‘***’ 0.001 ‘** ’ 0.01 ‘*  ’ 0.05 ‘.  ’ 0.1 ‘   ’ 1 
&lt;/pre&gt;
&lt;/code&gt;
&lt;p&gt;With a reported p-value of 0.19, the balance of this sample is not terrible (by conventional levels of hypothesis testing), but we might prefer something closer to 1. While there is no &lt;em&gt;a priori&lt;/em&gt; p-value we should prefer, experience indicates that p-values in the neighborhood of .5 are achievable and mimic true randomized designs (though &lt;a href='http://www.polisci.uiowa.edu/polmeth/papers/polmeth2010Nielsenupdated.pdf'&gt;optimal balance levels are a subject of ongoing research&lt;/a&gt;).&lt;/p&gt;
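&lt;p&gt;The reported p-value is just the upper tail of a chi-squared distribution, so it can be recovered by hand from the statistic and its degrees of freedom. A minimal check, using the rounded values printed above (so expect agreement only up to rounding):&lt;/p&gt;
&lt;code&gt;
&lt;pre&gt;
# Values copied from the xBalance output above (rounded as printed).
chisquare &amp;lt;- 12.4
dof &amp;lt;- 9
# Upper-tail probability under the null hypothesis of balance.
p.value &amp;lt;- pchisq(chisquare, df = dof, lower.tail = FALSE)
round(p.value, 2)
&lt;/pre&gt;
&lt;/code&gt;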

&lt;h2 id='matching'&gt;Matching&lt;/h2&gt;

&lt;p&gt;A full discussion of matching procedures is beyond the scope of this document (see &lt;a href='http://www.springer.com/statistics/statistical+theory+and+methods/book/978-1-4419-1212-1'&gt;Rosenbaum (2010)&lt;/a&gt; for a more comprehensive discussion). In brief, matching attempts to group units with similar covariates, as if they had been blocked in a randomized experiment. The optimal match would be two units identical on every variable, observed and unobserved. In most datasets, no two units will be identical on all observed covariates. Instead, we can use a measure that summarizes all covariates and match based on the summary. The propensity score, the probability of receiving treatment given the observed covariates, has been a popular summary measure (for more on the theory, see &lt;a href='http://www.jstor.org/pss/2335942'&gt;Rosenbaum and Rubin (1983)&lt;/a&gt;). I&amp;#8217;ll use a logistic regression to estimate the propensity scores of my observations, using a subset of the available variables:&lt;/p&gt;
&lt;code&gt;
&lt;pre&gt;
&amp;gt; model &amp;lt;- glm(pr ~ t1 + t2 + cap, 
  family = binomial(), data = nuclearplants)
&lt;/pre&gt;
&lt;/code&gt;
&lt;p&gt;With a propensity model, &lt;code&gt;optmatch&lt;/code&gt; provides several functions for computing matched sets of observations. The &lt;code&gt;fullmatch&lt;/code&gt; function takes a treatment by control matrix containing distances between observations and returns a factor indicating the set membership, if any, of all observations. Computing the distance matrix is simple using the &lt;code&gt;mdist&lt;/code&gt; function. This function takes a linear model, a function, or a formula to produce distances based on propensity models, arbitrary user functions, or Mahalanobis distances between observations. We&amp;#8217;ll use the propensity model. See the help page for &lt;code&gt;mdist&lt;/code&gt; for the other alternatives. &lt;pre&gt;
m1 &amp;lt;- fullmatch(mdist(model))
&lt;/pre&gt;&lt;/p&gt;
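&lt;p&gt;To make the phrase &amp;#8220;treatment by control matrix&amp;#8221; concrete, here is a rough sketch that builds such a matrix by hand on made-up data. This is not how &lt;code&gt;mdist&lt;/code&gt; is implemented (the package may rescale the scores, among other things). The data frame &lt;code&gt;toy&lt;/code&gt; and every name in it are invented for the illustration.&lt;/p&gt;
&lt;code&gt;
&lt;pre&gt;
# Made-up data: 20 units, one covariate, a fixed treatment indicator.
toy &amp;lt;- data.frame(
  x = 1:20,
  z = c(0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1))
# Propensity model, as in the text but fit to the toy data.
toy.model &amp;lt;- glm(z ~ x, family = binomial(), data = toy)
# Estimated propensity scores on the logit (linear predictor) scale.
score &amp;lt;- predict(toy.model)
# Rows are treated units, columns are controls;
# entries are absolute differences in score.
toy.dist &amp;lt;- abs(outer(score[toy$z == 1], score[toy$z == 0], "-"))
dim(toy.dist)
&lt;/pre&gt;
&lt;/code&gt;
&lt;p&gt;A matching routine then only needs this matrix: small entries mark pairs that are good candidates for sharing a matched set.&lt;/p&gt;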

&lt;p&gt;We can compare the first match with a second, in which a caliper is placed on the date variable. This will constrain the matching algorithm, disallowing matches on observations with widely differing date values, even if the overall propensity scores are similar. Calipers can lead to poorer matches on observed variables but provide a method by which researchers can include subject matter information in the matching process. For example, the cost of construction may have decreased over time due to increased efficiency in construction practices, in which case plants built years apart make poor comparisons.&lt;/p&gt;
&lt;code&gt;
&lt;pre&gt;
&amp;gt; m2 &amp;lt;- fullmatch(mdist(model) + 
  caliper(0.25, pr ~ date, data = nuclearplants))
&lt;/pre&gt;
&lt;/code&gt;
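&lt;p&gt;Conceptually, a caliper forbids pairs that are too far apart on some variable by making their distance infinite. The sketch below does this by hand on invented construction dates. The &lt;code&gt;caliper()&lt;/code&gt; function in &lt;code&gt;optmatch&lt;/code&gt; handles the details itself (and may measure the width on a standardized scale), so treat the names and the width of 2 here purely as illustrations.&lt;/p&gt;
&lt;code&gt;
&lt;pre&gt;
# Invented dates for three treated and four control units.
date.treated &amp;lt;- c(t1 = 68.0, t2 = 69.5, t3 = 71.0)
date.control &amp;lt;- c(c1 = 67.5, c2 = 68.5, c3 = 70.0, c4 = 73.0)
# Absolute date differences, treated units in rows, controls in columns.
gap &amp;lt;- abs(outer(date.treated, date.control, "-"))
# Caliper of width 2: allowed pairs cost nothing extra, forbidden pairs cost Inf.
penalty &amp;lt;- ifelse(gap &amp;lt;= 2, 0, Inf)
# Adding the penalty to any distance matrix rules out the forbidden pairs.
some.dist &amp;lt;- matrix(1, nrow = 3, ncol = 4, dimnames = dimnames(gap))
constrained &amp;lt;- some.dist + penalty
constrained
&lt;/pre&gt;
&lt;/code&gt;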
&lt;h2 id='balance_testing'&gt;Balance Testing&lt;/h2&gt;

&lt;p&gt;With two possible matches, does either produce adequate balance? As noted previously, the &lt;code&gt;RItools&lt;/code&gt; package provides a method of quantifying balance in a matched set. The method (discussed in detail in &lt;a href='http://projecteuclid.org/DPubS?service=UI&amp;amp;version=1.0&amp;amp;verb=Display&amp;amp;handle=euclid.ss/1219339114'&gt;Hansen and Bowers (2008)&lt;/a&gt;) compares treatment and control units within each block on a difference of means for each variable. The combination of these differences follows a chi-squared distribution. We can compare all the matches at the same time, along with the raw data (see the &lt;code&gt;strata&lt;/code&gt; argument).&lt;/p&gt;
&lt;code&gt;
&lt;pre&gt;
&amp;gt; (allbalance &amp;lt;- xBalance(pr ~ . - (cost + pr), 
    data = nuclearplants, 
    report = c("chisquare.test", "std.diffs"), 
    strata = data.frame(original = factor("none"), m1, m2)))

      strata original                m1                m2         
      stat   std.diff          std.diff          std.diff         
vars                                                              
date         -0.11468          -0.23368          0.06902          
t1           0.10630           -0.01666          0.30232          
t2           1.03269  *        0.27487           0.65635          
cap          0.34012           -0.03631          0.24860          
ne           -0.16312          -0.47647          -0.13433         
ct           -0.30797          -0.65565          -0.78858 *       
bw           0.04511           0.29570           -0.20169         
cum.n        -0.09760          -0.00887          -0.16724         
pt           0.41382           0.60274           0.00000          
---Overall Test---
         chisquare df p.value
original     12.39  9   0.192
m1            5.15  9   0.821
m2           10.07  9   0.345
---
Signif. codes:  0 ‘***’ 0.001 ‘** ’ 0.01 ‘*  ’ 0.05 ‘.  ’ 0.1 ‘   ’ 1 
&lt;/pre&gt;
&lt;/code&gt;
&lt;p&gt;Both matches provide good balance. With a p-value of 0.821 we might be tempted to prefer the unconstrained match; however, with a p-value of 0.345, the match with a caliper also provides reasonable assurances of balance. As either provides plausible balance, researchers might choose to concentrate on substantively important covariates. When &lt;code&gt;xBalance&lt;/code&gt; reports &amp;#8220;std.diffs&amp;#8221; (as above), we can plot the result to get a visual picture of balance on each covariate.&lt;/p&gt;
&lt;code&gt;
&lt;pre&gt;
&amp;gt; plot(allbalance)
&lt;/pre&gt;
&lt;/code&gt;
&lt;p&gt;&lt;img alt='Covariate balance' src='/thoughts/per-variable-balance.png' /&gt;&lt;/p&gt;

&lt;h2 id='analysis'&gt;Analysis&lt;/h2&gt;

&lt;p&gt;Since we now have data that approximate a randomized experiment, we can analyze them with the same techniques we would use for any blocked randomized experiment. For example, we can run a one-way ANOVA using &lt;code&gt;pr&lt;/code&gt; as the treatment factor and &lt;code&gt;m1&lt;/code&gt; as the blocking factor.&lt;/p&gt;
&lt;code&gt;
&lt;pre&gt;
&amp;gt; anova(lm(nuclearplants$cost ~ nuclearplants$pr + m1))
Analysis of Variance Table

Response: nuclearplants$cost
                 Df Sum Sq Mean Sq F value Pr(&amp;gt;F)
nuclearplants$pr  1   9037    9037  0.3394 0.5654
m1                5 222410   44482  1.6704 0.1785
Residuals        25 665726   26629               
&lt;/pre&gt;
&lt;/code&gt;
&lt;p&gt;Under conventional levels, we do not observe either the treatment or the blocking factor reach statistical significance. So we can conclude that existing lightwater reactors do not have an effect on construction costs that we can differentiate from chance.&lt;/p&gt;

&lt;h2 id='conclusion'&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;In the analysis, I chose one of two plausible matches. It so happened that I selected the match with the larger p-value. Does this indicate that we should select the match with the highest p-value, as it most closely approximates a randomly allocated treatment? I would caution against that conclusion. Within the set of matches that are plausibly balanced, it is difficult to argue that one match is truly better than another. While in expectation, randomized treatments are perfectly balanced, in practice, small deviations should be expected (with fewer deviations in larger experimental populations).&lt;/p&gt;

&lt;p&gt;In short, don&amp;#8217;t sweat the small stuff. Find a reasonable match and go with it. In fact, you may find that matches with lower p-values provide interesting substantive results. Here is an analysis of the second match, which included a caliper on the date of construction:&lt;/p&gt;
&lt;code&gt;
&lt;pre&gt;
&amp;gt; anova(lm(nuclearplants$cost ~ nuclearplants$pr + m2))
Analysis of Variance Table

Response: nuclearplants$cost
                 Df Sum Sq Mean Sq F value  Pr(&amp;gt;F)  
nuclearplants$pr  1   4185    4185  0.2035 0.65654  
m2                8 396199   49525  2.4080 0.05098 .
Residuals        21 431905   20567                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
&lt;/pre&gt;
&lt;/code&gt;
&lt;p&gt;This matching indicates a marginally significant blocking effect (p = 0.051), which suggests that limiting matches by date may have something to do with the resulting costs. If we had blindly pursued higher p-value matches, we might not have observed this interesting result.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/MdiKEts46gU" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2010-07-30-using-optmatch-and-ritools-for-observational-studies.html</feedburner:origLink></entry>
 
 <entry>
   <title>Unicoding</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/De3KEFxzPVQ/2010-07-07-unicoding.html" />
   <updated>2010-07-07T00:00:00-05:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/unicoding</id>
   <content type="html">&lt;p&gt;A recent &lt;a href='http://groups.google.com/group/clojure/browse_thread/thread/e826fc303e440b7c/'&gt;thread on the Clojure discussion group&lt;/a&gt;, which linked back to an &lt;a href='http://groups.google.com/group/clojure/browse_thread/thread/66ff0b89229be894/'&gt;earlier thread&lt;/a&gt;, reminded me of my favorite snippet of code I&amp;#8217;ve ever written:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;(let-&amp;gt; ★ 2 (+ ★  3) (- 10 ★ ) (map #(* ★  %) [2 3 4])) &lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This code &amp;#8220;threads&amp;#8221; the star value (&lt;code&gt;★&lt;/code&gt;) through the code, first binding it to 2, adding 3 and rebinding the result, subtracting it from 10 and rebinding the result, and finally multiplying it across the vector &lt;code&gt;[2 3 4]&lt;/code&gt;. While the computation is trivial, I love how the non-alphanumeric character jumps out of the code, making its presence known. While I wouldn&amp;#8217;t suggest littering every line of code with characters not on your keyboard, perhaps judicious use of unicode glyphs would make programs more readable.&lt;/p&gt;

&lt;p&gt;Non-ASCII programming has never really caught fire. &lt;a href='http://en.wikipedia.org/wiki/APL_%28programming_language%29'&gt;APL&lt;/a&gt; dates from the earliest IBM mainframes, and has several successor languages available for modern hardware, but has never seen a great deal of popularity. Note, however, that APL &lt;em&gt;required&lt;/em&gt; the use of a specialized character set. Optional use might prove more appealing. &lt;a href='http://en.wikipedia.org/wiki/Color_Forth'&gt;colorForth&lt;/a&gt; used different colored text to provide additional semantic information. I could see the value in making the source code more readable, but neither colorForth nor Forth has seen wide interest. For both APL and colorForth, the reasons for a lack of success probably lie not in their use of characters or color, but in the fact that neither is in the ALGOL/C/Java family of block languages, which dominate modern programming practices.&lt;/p&gt;

&lt;p&gt;Many languages today allow for Unicode characters, including Java. Perhaps the biggest barrier to using unusual characters in source code is the difficulty of typing them out. To create the snippet above, I had to drag the star glyph from a palette into my text editor. For programmers that avoid the mouse (e.g. me), this could be a show stopper. At the same time, it should be possible to find or define keyboard equivalents for some of the more useful glyphs. I&amp;#8217;m going to experiment with their use for variables that I might otherwise give uninformative names, such as &lt;code&gt;tmp&lt;/code&gt; or &lt;code&gt;x&lt;/code&gt;, or in places that require a little visual pizazz to make the code clearer. It should be a fun experiment.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/De3KEFxzPVQ" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2010-07-07-unicoding.html</feedburner:origLink></entry>
 
 <entry>
   <title>Scoping Bugs</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/S7xSx6AaSZ8/2010-06-23-scoping-bugs.html" />
   <updated>2010-06-23T00:00:00-05:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/scoping-bugs</id>
   <content type="html">&lt;p&gt;I ran across a strange bug in &lt;a href='http://r-project.org'&gt;R&lt;/a&gt; recently. Like &lt;a href='http://common-lisp.net/'&gt;all&lt;/a&gt; &lt;a href='http://schemers.org'&gt;the&lt;/a&gt; &lt;a href='http://clojure.org'&gt;best&lt;/a&gt; &lt;a href='http://www.ruby-lang.org'&gt;programming&lt;/a&gt; &lt;a href='http://racket-lang.org/'&gt;languages&lt;/a&gt;, R treats functions as first class objects. That is to say that functions can be passed as arguments and return values from functions, named as variables, and, while not part of the strict definition of first class functions, maintain copies of the creating environment. This last point is known as &lt;a href='http://en.wikipedia.org/wiki/Scope_(programming%29'&gt;lexical (or static) scope&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id='on_lexical_scoping'&gt;On lexical scoping&lt;/h2&gt;

&lt;p&gt;Lexical scoping was a major innovation in making programs simpler to understand. With lexical scoping, variable names are defined &amp;#8220;locally&amp;#8221; &amp;#8212; that is, if a function is working with a variable &lt;code&gt;foo&lt;/code&gt;, that variable cannot be written over by a caller having its own variable &lt;code&gt;foo&lt;/code&gt;. Here is an example that illustrates the property, using R. What does the last line return?&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;my.variable &amp;lt;- 42
f &amp;lt;- function() {
  return(my.variable)
}

g &amp;lt;- function(my.variable) {
  return(f())  
}

g(100)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you answered 42, you&amp;#8217;d be correct. The two uses of the name &lt;code&gt;my.variable&lt;/code&gt; would cause a dynamically scoped program to return 100. Under lexical scoping, these are distinct variables, defined by the different scopes of the &lt;code&gt;f&lt;/code&gt; and &lt;code&gt;g&lt;/code&gt; functions. Under dynamic scoping, applying &lt;code&gt;g&lt;/code&gt; to 100 would lead &lt;code&gt;f&lt;/code&gt; to look up &lt;code&gt;my.variable&lt;/code&gt; and find the value of 100. While this example is contrived, in programs of any size, lexical scoping (at least as the default) prevents different outer scopes from changing the behavior of inner functions. (See &lt;a href='http://richhickey.github.com/clojure/clojure.core-api.html#clojure.core/binding'&gt;Clojure&amp;#8217;s &lt;code&gt;binding&lt;/code&gt; form&lt;/a&gt; for an example of useful dynamic scope on demand.)&lt;/p&gt;
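&lt;p&gt;To see what the dynamically scoped answer would look like without leaving R, the sketch below fakes dynamic lookup by starting the search for the variable in the caller&amp;#8217;s frame. The names &lt;code&gt;f.dynamic&lt;/code&gt; and &lt;code&gt;g.dynamic&lt;/code&gt; are invented here, and this is an emulation for illustration only, not a feature to lean on.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;my.variable &amp;lt;- 42

# Lexical lookup: f sees the my.variable visible where f was defined.
f &amp;lt;- function() my.variable

# Emulated dynamic lookup: start the search in the caller&amp;#39;s frame instead.
f.dynamic &amp;lt;- function() {
  caller &amp;lt;- parent.frame()
  get(&amp;quot;my.variable&amp;quot;, envir = caller)
}

g &amp;lt;- function(my.variable) f()
g.dynamic &amp;lt;- function(my.variable) f.dynamic()

g(100)          # lexical scoping: 42
g.dynamic(100)  # emulated dynamic scoping: 100&lt;/code&gt;&lt;/pre&gt;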

&lt;p&gt;So let&amp;#8217;s take this a step further, and create some functions that save their lexical environment. To show the correct behavior, here is a small Scheme program that creates 5 functions, each of which returns its index when called (&lt;code&gt;lambda&lt;/code&gt; means &amp;#8220;create a new function&amp;#8221;):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;(define fns (map (lambda (x) (lambda () x)) &amp;#39;(1 2 3 4 5)))
(map (lambda (f) (f)) fns)
; returns (1 2 3 4 5)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now here is the same thing in R:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;fns &amp;lt;- lapply(1:5, function(i) { function() { i } })
lapply(fns, function(f) { f() })
# returns a list of (5, 5, 5, 5, 5)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Whoa! What is going on?&lt;/p&gt;

&lt;h2 id='rs_scoping_bug'&gt;R&amp;#8217;s scoping bug&lt;/h2&gt;

&lt;p&gt;Clearly, something is amiss with R&amp;#8217;s scoping rules. To be honest, I&amp;#8217;m not entirely sure what (though I will unveil a workaround). I had originally written this code in an imperative &lt;code&gt;for&lt;/code&gt; loop, and my immediate thought was that &lt;code&gt;R&lt;/code&gt; was bitten by a classic JavaScript bug. JavaScript has a strange quirk whereby loop indices are not considered local to the scope, and are rewritten during each iteration. A simple workaround is to nest the loop code in a function and immediately call it:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/* Bad version */
fns = new Array(5)
for (i in [0,1,2,3,4]) {
  fns[i] = function() { return(i); };  
}
vals = new Array(5)
for (j in [0,1,2,3,4]) {
  vals[j] = fns[j]()
}
/* vals = [4,4,4,4,4] */

/* Good version */
fns = new Array(5)
for (i in [0,1,2,3,4]) {
  (function(i) { fns[i] = function() { return(i) }})(i);
}
vals = new Array(5)
for (j in [0,1,2,3,4]) {
  vals[j] = fns[j]()
}
/* vals = [0,1,2,3,4] */&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;R does not exactly suffer from this issue, as the JavaScript work-around does not, well, work-around the bug:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;fns &amp;lt;- vector(&amp;quot;list&amp;quot;, 5)
for (i in 1:5) {
  fns[[i]] &amp;lt;- (function(i) { return(function() { i })})(i)
}
lapply(fns, function(f) { f() })
# returns c(5,5,5,5,5)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;After poking and prodding, I found a (bizarre) solution in the same vein:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;fmaker &amp;lt;- function(i) { function(j) { i }}

fns &amp;lt;- vector(&amp;quot;list&amp;quot;, 5)
for (i in 1:5) {
  fns[[i]] &amp;lt;- fmaker(i)
  fns[[i]](0)
}
lapply(fns, function(f) { f() })
# returns c(1,2,3,4,5)  &lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The new version is significantly more verbose. The critical aspects are defining a maker function (you can&amp;#8217;t just in-line that code) and applying the function to some dummy argument. Apparently, these are the necessary genuflections to &lt;code&gt;R&lt;/code&gt; to make the calling environment sticky.&lt;/p&gt;
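&lt;p&gt;One likely explanation, offered as a sketch rather than a diagnosis: R evaluates function arguments lazily, so &lt;code&gt;i&lt;/code&gt; inside the maker function is not looked up until the returned function first runs, by which time the loop variable has moved on to its final value. Under that reading, the dummy call matters only because it forces the argument, and base R&amp;#8217;s &lt;code&gt;force()&lt;/code&gt; does the same thing directly. The name &lt;code&gt;make.fn&lt;/code&gt; is invented for this example.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Evaluate the argument immediately, then return a closure over it.
make.fn &amp;lt;- function(i) {
  force(i)
  function() i
}

fns &amp;lt;- vector(&amp;quot;list&amp;quot;, 5)
for (i in 1:5) {
  fns[[i]] &amp;lt;- make.fn(i)
}
sapply(fns, function(f) f())
# returns 1 2 3 4 5&lt;/code&gt;&lt;/pre&gt;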

&lt;h2 id='imperative_languages'&gt;Imperative languages&lt;/h2&gt;

&lt;p&gt;There are several reasons why both &lt;code&gt;R&lt;/code&gt; and JavaScript could be getting these scoping rules wrong. First, while both allow first class functions, they are not as frequently used as in some other languages. I may very well be the first user to test R on its ability to properly scope functions created in loops.&lt;/p&gt;

&lt;p&gt;A second possibility may be more fundamental: R and JavaScript are imperative, C-style block languages. One writes a program as a series of imperative statements: first do this, next do this, now do this. Languages that treat programs as transformations of data (and here I&amp;#8217;m referring to Lisps specifically, as I have the most exposure to this family) do a very good job with scoping rules. In fact, writing your own Lisp is a fairly simple process, covered in &lt;a href='http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-25.html#%_chap_4'&gt;Chapter 4 of SICP&lt;/a&gt;, and getting environments right does not seem an especially difficult task.&lt;/p&gt;

&lt;p&gt;The apparent difficulties in getting scope correct are even greater impetus for doing work in and on &lt;a href='http://incanter.org/'&gt;Incanter&lt;/a&gt;. Combining R&amp;#8217;s wealth of statistical tools with Clojure&amp;#8217;s proper scoping rules would be an ideal combination. Perhaps the best work around for R bugs is to write the program in Clojure?&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The R version used in this post was 2.11.0 (2010-04-22).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Update: &lt;a href='/thoughts/2011-02-06-peeking-inside-r-functions.html'&gt;I found another, more elegant, workaround.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/S7xSx6AaSZ8" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2010-06-23-scoping-bugs.html</feedburner:origLink></entry>
 
 <entry>
   <title>Drinking the Homebrew</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/XopV7r2kXb8/2010-06-17-drinking-the-homebrew.html" />
   <updated>2010-06-17T00:00:00-05:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/drinking-the-homebrew</id>
   <content type="html">&lt;p&gt;After several years of using &lt;a href="http://www.macports.org"&gt;MacPorts&lt;/a&gt;, I&amp;#8217;m giving it the heave-ho. So long MacPorts. I&amp;#8217;ve enjoyed our time together, but I&amp;#8217;m moving on. From now on, I&amp;#8217;m drinking the &lt;a href="http://mxcl.github.com/homebrew/"&gt;Homebrew&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Don&amp;#8217;t get me wrong: MacPorts has been a valuable and reliable tool. When Intel Macs were new, MacPorts saved me uncountable hours of manually building packages. When I was building different open source packages on a weekly basis, MacPorts performed like a champ. But times have changed, and MacPorts has gone from a time saver to a time waster.&lt;/p&gt;
&lt;p&gt;I&amp;#8217;m no longer updating all my builds every few days. I use fewer source builds and use more built packages (all of which have supported Intel Macs for years now). Now, I use MacPorts to quickly build a package, instead of hunting for a binary. But MacPorts is not geared towards quick builds. Every time I need a new package, MacPorts insists on rebuilding &lt;tt&gt;gcc&lt;/tt&gt;, X11 or both. This is ridiculous. These packages are solid, battle tested software. When I worked as an intern in the Compiler Quality Assurance team at Apple, meetings were held debating whether or not to fix bugs on the merits that fixing bugs would cause work-arounds in the field to break. Rebuilding these packages every minor version number is inexcusable.&lt;/p&gt;
&lt;p&gt;Moreover, these packages are part of the standard OS X build. Why duplicate binaries I already have? Enter &lt;tt&gt;brew&lt;/tt&gt;, a package manager that works with OS X, not against it. Like MacPorts, Homebrew builds from source. Unlike MacPorts, Homebrew relies on software distributed as part of the standard OS X (with Xcode) menu, including &lt;tt&gt;gcc&lt;/tt&gt;, X11, Python, and Ruby. What a time saver!&lt;/p&gt;
&lt;p&gt;At the moment, Homebrew&amp;#8217;s menu of packages is not as large as MacPorts&amp;#8217;, but it is growing every day, and the infrastructure is based on &lt;tt&gt;git&lt;/tt&gt;, allowing anyone to fork the code base for changes and improvements. If the canonical branch is not working for you, someone else will surely provide a better, faster, stronger fork. Creating a &amp;#8220;Formula&amp;#8221; for a new package is simpler than the equivalent MacPorts process, and I expect the difference in the number of packages to shrink quickly.&lt;/p&gt;
&lt;p&gt;Give Homebrew a shot. It is easy to install and uninstall. You have nothing to lose; at least not the time it takes to build &lt;tt&gt;gcc&lt;/tt&gt;.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/XopV7r2kXb8" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2010-06-17-drinking-the-homebrew.html</feedburner:origLink></entry>
 
 <entry>
   <title>State Politics and Policy Conference 2010</title>
   <link href="http://feedproxy.google.com/~r/MarkMFredrickson/~3/tEGDh7f6VaQ/2010-06-07-sppc-2010.html" />
   <updated>2010-06-07T00:00:00-05:00</updated>
   <id>http://www.markmfredrickson.com/thoughts/sppc-2010</id>
   <content type="html">&lt;p&gt;I&amp;#8217;ve returned from the 10th annual &lt;a href="http://www.sppc2010.org/"&gt;State Politics and Policy Conference&lt;/a&gt;, the yearly conference of APSA&amp;#8217;s &amp;#8220;State Politics&amp;#8221; subsection. This year&amp;#8217;s conference was located a mere 1.5 hours away at the Springfield campus of the University of Illinois. My attendance this year was a crime of opportunity. In addition to the convenient locale, the conference had no attendance fee, and I was able to save on accommodations by camping at &lt;a href="http://dnr.state.il.us/lands/landmgt/parks/r4/sangch.htm"&gt;Sangchris State Park&lt;/a&gt;. I nearly had the campground to myself, which is a nice feature after a busy conference.&lt;/p&gt;
&lt;p&gt;I spent most of the conference shopping for data sources for a few projects that are looking to exploit variation in state-level institutions. I&amp;#8217;d rate my results as &amp;#8220;B+&amp;#8221;. I did not find a one-stop shop for state-level data, but I did find several useful discrete data sources (for example, &lt;a href="https://webshare.uchicago.edu/users/bshor/Public/boris%20shor%202009%20-%20common%20space%20ideal%20points.pdf?uniq=soy829"&gt;common space ideal points for state legislators&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.sas.upenn.edu/~marcmere/"&gt;Marc Meredith&lt;/a&gt; contributed my favorite paper of the conference: &lt;a href="http://www.sas.upenn.edu/~marcmere/workingpapers/FriendsAndNeighbors.pdf"&gt;Exploiting Friends and Neighbors: An Instrumental Variable Approach to Estimating Coattail Effects&lt;/a&gt;. Meredith used home counties of governors as an instrumental variable for measuring coattail effects to down ballot races. The logic is fairly straightforward. We would expect candidates for governor tend to do better in their home county, but down ballot candidates of the same party should get no gain from where the governor lives, expect for coattail effects. Its a clever design, and Meredith finds that coattails effects have traditionally been over-estimated. I&amp;#8217;m looking forward to seeing this paper in published form, when Meredith is able to smooth out the remaining wrinkles (of which he is largely aware &lt;del&gt;-&lt;/del&gt; his conclusion slide was almost an exact duplicate of my notes and questions during the talk).&lt;/p&gt;
&lt;p&gt;As a final note, the conference organizers did a fantastic job with this conference. Next year&amp;#8217;s conference at Dartmouth will have large shoes to fill. See you next year in Hanover, NH.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/MarkMFredrickson/~4/tEGDh7f6VaQ" height="1" width="1"/&gt;</content>
 <feedburner:origLink>http://www.markmfredrickson.com/thoughts/2010-06-07-sppc-2010.html</feedburner:origLink></entry>
 
 
</feed>
