<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:blogger="http://schemas.google.com/blogger/2008" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr="http://purl.org/syndication/thread/1.0" version="2.0"><channel><atom:id>tag:blogger.com,1999:blog-232777626311457607</atom:id><lastBuildDate>Thu, 23 May 2013 01:23:19 +0000</lastBuildDate><category>observed</category><category>pin pen</category><category>media</category><category>animals</category><category>science journalism</category><category>Etc.</category><category>vocal fry</category><category>phonology</category><category>display</category><category>south</category><category>knitr</category><category>AAVE</category><category>monophthongization</category><category>phonetics</category><category>open data</category><category>southwest</category><category>Chris Matthews</category><category>ggplot2</category><category>ay</category><category>spoonerism</category><category>n-word</category><category>tonogenesis</category><category>jags</category><category>perception</category><category>lost tv media</category><category>l-vocalization</category><category>Louisiana</category><category>first post</category><category>oy</category><category>intelligence</category><category>irene</category><category>mystery</category><category>video</category><category>relaunch</category><category>natural misunderstanding</category><category>Canadian Raising</category><category>Hauser</category><category>rstudio</category><category>probability</category><category>cognition</category><category>linguists</category><category>overheard</category><category>science</category><category>humor</category><category>animal 
cognition</category><category>individuals</category><category>visualization</category><category>l</category><category>TV</category><category>sound change</category><category>radio</category><category>linguistics</category><category>fill-feel</category><category>observations</category><category>tool</category><category>peeving</category><category>morphology</category><category>graphics</category><category>raising</category><category>language</category><category>Chomsky</category><category>usage</category><category>nature-nurture</category><category>question</category><category>language change</category><category>dialect</category><category>portmanteau</category><category>praat</category><category>murders</category><category>misc.</category><category>positive anymore</category><category>philadelphia</category><category>vowels</category><category>plotting</category><category>design</category><category>language attitudes</category><category>race</category><category>nyc</category><category>data</category><category>short-a</category><category>linguistics media</category><category>R</category><category>merger</category><title>Val Systems</title><description /><link>http://val-systems.blogspot.com/</link><managingEditor>noreply@blogger.com (Josef Fruehwald)</managingEditor><generator>Blogger</generator><openSearch:totalResults>98</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/ValSystems" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="valsystems" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-7926829592362623959</guid><pubDate>Wed, 22 May 2013 21:58:00 
+0000</pubDate><atom:updated>2013-05-22T18:48:40.612-04:00</atom:updated><title>Socioeconomic Status and College Enrollment</title><description>There was &lt;a href="http://thesocietypages.org/socimages/2013/05/16/social-class-and-the-college-choices-of-high-school-valedictorians/"&gt;an interesting post over at Sociological Images&lt;/a&gt;&amp;nbsp;about the relationship between socioeconomic status and high school students' likelihood of even applying to highly selective colleges.&lt;br /&gt;
The research cited there by&amp;nbsp;&lt;a href="http://www.rti.org/newsroom/experts.cfm?obj=1E2C92E5-5056-B100-313ABD95257676D7"&gt;Alexandria Radford&lt;/a&gt;&amp;nbsp;&lt;a href="http://www.insidehighered.com/sites/default/server_files/files/Final_Radford_AERA_Proposal(1).pdf"&gt;focused on high school valedictorians&lt;/a&gt;, which I think is a pretty cool way to isolate the effect of socioeconomic class. Sociological Images posted this figure from the paper (abstract? proposal?), which shows what proportion of valedictorians fall into each of the following categories with regard to highly selective colleges:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;They applied to at least one highly selective school.&lt;/li&gt;
&lt;li&gt;They were accepted to at least one highly selective school.&lt;/li&gt;
&lt;li&gt;They enrolled in a highly selective school.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-uYPXx5qfGOE/UZ03aUD0T8I/AAAAAAAABOg/twDAlaFkK_0/s1600/21.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="433" src="http://2.bp.blogspot.com/-uYPXx5qfGOE/UZ03aUD0T8I/AAAAAAAABOg/twDAlaFkK_0/s640/21.jpg" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Radford attributes the large gap between High SES valedictorians and Middle and Low SES valedictorians to reduced familiarity with highly selective schools, both as institutions and in terms of their admissions processes. That seems to ring true to me, but I was curious how SES affects each step in this process. The graph above doesn't really tell you what proportion of Low SES students &lt;i&gt;who actually applied&lt;/i&gt;&amp;nbsp;to highly selective schools were admitted, or at least I don't believe it does. It says 42% of Low SES valedictorians were admitted to a highly selective school, but you have to apply to be admitted, and only 50% of Low SES valedictorians applied in the first place. That means that 42%/50% = 84% of Low SES students who applied were admitted. The same goes for enrollment. You have to be admitted to enroll, meaning that 32%/42% = 76% of Low SES students who were admitted to a highly selective school also enrolled. Here's a rejiggered plot I made based on the data in the plot above.&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-wOE25EYxLGM/UZ08up9tivI/AAAAAAAABOw/ElMQlW_iuTQ/s1600/ses_college.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="336" src="http://3.bp.blogspot.com/-wOE25EYxLGM/UZ08up9tivI/AAAAAAAABOw/ElMQlW_iuTQ/s640/ses_college.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div&gt;
So, to highly selective colleges' credit, it looks like the proportion of valedictorians who applied and were then admitted is just about flat across the socioeconomic scale: 85% of High SES valedictorians who applied were admitted, and 84% of Low SES valedictorians who applied were admitted. Things look quite different for enrollment, though. High and Low SES students have very similar enrollment yields (79% and 76%, respectively), but Middle SES students have a much lower enrollment yield (59.6%).&lt;/div&gt;
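&lt;div&gt;
For anyone who wants to redo the arithmetic, here is a minimal Python sketch. It is only an illustration; the R code in the gist is what actually produced the plot. It divides each step's proportion by the previous step's, which works because every valedictorian who enrolled was admitted and every one who was admitted applied. Only the Low SES numbers quoted in the text are filled in, and the function name is hypothetical.&lt;/div&gt;

```python
# Sketch: turn cumulative proportions (share of ALL valedictorians in a group
# who reached each step) into conditional rates for each step.
# Only the Low SES figures quoted in the post are used here.


def conditional_rates(applied, admitted, enrolled):
    """Return (admission rate among applicants, yield among admitted).

    Each step requires the previous one, so dividing a step's proportion
    by the previous step's proportion gives the conditional rate.
    """
    admit_given_applied = admitted / applied
    enroll_given_admitted = enrolled / admitted
    return admit_given_applied, enroll_given_admitted


low_ses = conditional_rates(applied=0.50, admitted=0.42, enrolled=0.32)
print("Low SES: admitted given applied = {:.0%}".format(low_ses[0]))   # 84%
print("Low SES: enrolled given admitted = {:.0%}".format(low_ses[1]))  # 76%
```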
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
I can only speculate about why Middle SES students' enrollment yield is so much lower than that of Low and High SES students. One possibility is that their families' incomes are just a little too high to qualify for need-based scholarships. Another might be that the perceived cost-benefit ratio is higher for Middle SES students than for Low SES students.&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
However, I think this plot lends credence to Radford's hypothesis that reduced information and familiarity lead to lower application rates among Low SES valedictorians. Their application rate is strikingly low given how likely they are to be accepted to highly selective schools, and how likely they are to actually attend those schools if accepted.&lt;br /&gt;
&lt;br /&gt;
&lt;hr&gt;
Update: I meant to include the R code for the data and plot.&lt;/div&gt;
&lt;script src="https://gist.github.com/JoFrhwld/5631064.js"&gt;&lt;/script&gt;
&lt;br&gt;
&lt;hr&gt;
Update 2:&lt;br&gt;
Of course, admission is not even half the battle, or even a quarter of it. I don't know much about the High School Valedictorian Project mentioned in the study, but I hope they keep tabs on the Low SES students admitted to highly selective schools. I know the culture shock I experienced going from a good diocesan high school to an Ivy League university was pretty huge, but I don't know whether it was disproportionately larger than it would have been going from any old high school to any old college, or what effect such a shock might have on academic performance.</description><link>http://val-systems.blogspot.com/2013/05/socioeconomic-status-and-college.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/-uYPXx5qfGOE/UZ03aUD0T8I/AAAAAAAABOg/twDAlaFkK_0/s72-c/21.jpg" height="72" width="72" /><thr:total>3</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-1917053567598250188</guid><pubDate>Tue, 21 May 2013 23:50:00 +0000</pubDate><atom:updated>2013-05-22T14:32:19.459-04:00</atom:updated><title>I Support Inclusive Scouting</title><description>&lt;a href="http://1.bp.blogspot.com/-kW41T5lLhVI/UZwIBefrcdI/AAAAAAAABOQ/4lGPHV-Y_SQ/s1600/tumblr_m8wqmcihSM1rbklpzo1_1280.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="240" src="http://1.bp.blogspot.com/-kW41T5lLhVI/UZwIBefrcdI/AAAAAAAABOQ/4lGPHV-Y_SQ/s320/tumblr_m8wqmcihSM1rbklpzo1_1280.jpg" width="320" /&gt;&lt;/a&gt;This week, the Boy Scouts of America's National Council will be meeting to vote on whether the Boy Scouts should include the following statement in its membership policy for youths:
&lt;br /&gt;
&lt;blockquote&gt;
"No youth may be denied membership in the Boy Scouts of America on the basis of sexual orientation or preference alone."&lt;/blockquote&gt;
I thought I would take a brief moment to publicly state my support for a more inclusive scouting, one that will hopefully, eventually, also welcome gay and lesbian adult leaders, as well as non-religious youths and leaders.
&lt;br /&gt;
&lt;br /&gt;
I am bothering to speak out about inclusive scouting because I believe that scouting is an overall positive experience that has great value for the youths involved in it. I outlined some of the benefits I felt scouting had brought to my life in &lt;a href="http://eaglebadges.tumblr.com/post/29628858030/robert-mazzuca-chief-scout-executive-bsa"&gt;the letter I enclosed with my Eagle Scout award&lt;/a&gt; when I returned it in protest of BSA's exclusionary policies. For example, it was in the scouts that I had my first experiences organizing with peers to reach a goal. The skills I learned there, like how to clearly express my vision of what should be done, how to set aside that vision in favor of ideas offered by others, and how to delegate responsibilities, have been invaluable to me in my professional life. I also learned the value of living a life of principle, which is why I'm writing this blog post. Scouting also let me bond with my father, who is also an Eagle Scout, and with my brother. I don't believe there is any good reason to deny any youth or parent these experiences on the basis of their sexual orientation, gender identity, or metaphysical beliefs.
&lt;br /&gt;
&lt;br /&gt;
On this specific issue, I would like to preempt most objections to allowing gay youths into the scouts by coming back to the Supreme Court ruling that BSA won. The most common concern I've heard voiced about allowing gay youths into the scouts revolves around who will sleep in which tent on a camping trip. The second and third most common are the discomfort other youths and sponsoring organizations may have with gay youths being in their troop.&lt;br /&gt;
&lt;br /&gt;
But the majority opinion in Boy Scouts of America v. Dale made no mention of the logistics of sleeping arrangements, nor of the discomfort of individuals or sponsoring organizations, when the Court decided that BSA has a constitutional right to exclude gay members. Rather, it was argued that the Boy Scouts of America had an expressive policy condemning homosexuality, that the Boy Scouts "teach that homosexual conduct is not morally straight." The constitutional right to free association that BSA won was not won on the basis of something as trivial as how tricky sleeping arrangements will be, but by arguing that BSA has a core expressive policy that &lt;i&gt;being gay is wrong&lt;/i&gt;. When voting on whether to adopt the new membership policy, I hope that National Council members keep in mind that they are voting on the principle of whether or not &lt;i&gt;being gay is wrong&lt;/i&gt;, and that other logistical concerns are secondary.
&lt;br /&gt;
&lt;br /&gt;
Regarding those logistical concerns, I would just briefly say that my concern about openly gay youths in shared tents is the real threat of violence toward them. That is the reality we are living in. Scoutmasters will have to emphasize to their troops that violence toward one another is not okay, and that we have to treat all people with human decency, a value I should hope is well within the expressive mission of BSA. As for the discomfort of youths and sponsoring organizations at having gay youths in their troops, I would just say that one of the most valuable experiences I had in the scouts was being in a troop with some people I would rather not have been in a troop with. Not any class of people, mind you, just individuals who at first brush I probably wouldn't have volunteered to be friends with. We don't get to choose who lives in the world with us, but we have to live and work with them every day, and hopefully we can also learn to see the value in them. I would say it is doubly important for youths brought up with religious beliefs that strongly condemn gays and lesbians to be friends with and work with gays and lesbians their own age. They are growing up in a world where gays and lesbians (fortunately) feel more comfortable being out about their orientation, and they will be forced to reconcile their religious beliefs with the fact that they &lt;i&gt;will&lt;/i&gt; have gay and lesbian peers as adults, and they will have to do so with grace and aplomb.
&lt;br /&gt;
&lt;br /&gt;
I know that I haven't addressed all of the concerns people have raised about allowing gays into the Boy Scouts. For example, some people have had &lt;a href="http://mediamatters.org/print/research/2013/02/12/pedophilia-myth-tainted-cable-news-coverage-of/192627"&gt;fairly vile things to say about gays and pedophiles&lt;/a&gt;, which I won't bother myself with here. Nor will I bother myself with any other arguments, since most of them seem to have the property that once you provide a counterargument, a new argument can be freely generated. I think this is because these concerns about camping in tents and so on are transparently secondary to the core contention about whether &lt;i&gt;being gay is wrong&lt;/i&gt;, a proposition that it has become impolite to state outright. I hope the National Council adopts the new membership policy.</description><link>http://val-systems.blogspot.com/2013/05/i-support-inclusive-scouting.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/-kW41T5lLhVI/UZwIBefrcdI/AAAAAAAABOQ/4lGPHV-Y_SQ/s72-c/tumblr_m8wqmcihSM1rbklpzo1_1280.jpg" height="72" width="72" /><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-5026972926556202088</guid><pubDate>Sun, 17 Feb 2013 23:08:00 +0000</pubDate><atom:updated>2013-02-17T20:24:26.721-05:00</atom:updated><title>A difference between men and women.</title><description>This post was originally going to be a lot more mathy, with a bit of explanation about the source-filter model of speech production, an aside about dead dog heads mounted on compressed air tanks, and a whole description of my methods, but I felt like I was sort of burying the lede. Instead, I'm focusing on how people are interested in magnifying the difference between men and women.&lt;br /&gt;
&lt;br /&gt;
It started off with me estimating the vocal tract lengths of the speakers in &lt;a href="http://fave.ling.upenn.edu/pnc.html"&gt;the Philadelphia Neighborhood Corpus&lt;/a&gt;. Given sufficient acoustic data from a speaker, some simplifying assumptions, and the acoustic theory of speech production, you can roughly estimate how long a person's vocal tract (the distance from the vocal cords to the lips) is. I went ahead and did this for the speakers in the PNC, and plotted the results against age.&lt;br /&gt;
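One textbook way to do this treats the vocal tract as a uniform tube, closed at the glottis and open at the lips. This is an assumption here, not necessarily the method behind the PNC estimates. Such a tube resonates at F_n = (2n - 1) * c / (4 * L), so each measured formant implies a length L = (2n - 1) * c / (4 * F_n). The minimal Python sketch that follows averages those implied lengths. The speed of sound (35,000 cm/s) and the function name are illustrative assumptions.&lt;br /&gt;

```python
# Hypothetical sketch: estimate vocal tract length (VTL) from formants using
# a uniform quarter-wave tube model. Not necessarily the method used for
# the PNC estimates.

SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed value for warm, humid air


def vtl_from_formants(formants_hz):
    """Average the tube length (cm) implied by each formant.

    For a tube closed at one end, resonance n (1-based) is
    F_n = (2n - 1) * c / (4 * L), so L = (2n - 1) * c / (4 * F_n).
    """
    estimates = []
    for n, f in enumerate(formants_hz, start=1):
        estimates.append((2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * f))
    return sum(estimates) / len(estimates)


# A neutral, schwa-like 17.5 cm tube resonates near 500, 1500 and 2500 Hz.
print(vtl_from_formants([500.0, 1500.0, 2500.0]))  # 17.5
```

Real vowels are not all schwa-like. In practice, formants would have to be pooled over many vowel tokens, which is one reason such estimates are only rough.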
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-Z7GyqF_fB3Y/USFUYLaDH3I/AAAAAAAABK4/Agj761egbcI/s1600/vtl.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="400" src="http://4.bp.blogspot.com/-Z7GyqF_fB3Y/USFUYLaDH3I/AAAAAAAABK4/Agj761egbcI/s640/vtl.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
Pretty cool, right? There's nothing especially earth-shattering here. It's known that men, on average, have longer vocal tracts than women. I was a little bit surprised by how late in life the bend in vocal tract growth comes.&lt;br /&gt;
&lt;br /&gt;
Here's the density distribution of vocal tract lengths for everyone over 25 in the corpus.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-r9Ui-jPjvn8/USFYRSguwhI/AAAAAAAABLA/iok1HN8Pzko/s1600/vtk_dens.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="400" src="http://4.bp.blogspot.com/-r9Ui-jPjvn8/USFYRSguwhI/AAAAAAAABLA/iok1HN8Pzko/s640/vtk_dens.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;
That's a pretty big effect size. &lt;a href="http://languagelog.ldc.upenn.edu/nll/?p=4461"&gt;Mark Liberman has recently posted&lt;/a&gt; about the importance of reporting effect sizes. He focused on how, even though people are really obsessed with cognitive differences between men and women, the distributions for men and women almost always overlap heavily.&lt;br /&gt;
&lt;br /&gt;
Following Mark on this, I went ahead and calculated Cohen's d for these VTL estimates. 
&lt;script src="https://gist.github.com/JoFrhwld/4973557.js?file=cohen_d_vtl.R"&gt;&lt;/script&gt;&lt;br /&gt;
So, 1.71 is a fairly large Cohen's d effect size. I had heard that the difference in vocal tract length between men and women was disproportionately large given body size differences alone. I managed to find some data on American male/female height differences, but the height effect size is not much smaller than the VTL effect size (1.64, about 96% of the VTL effect size).&lt;br /&gt;
&lt;br /&gt;
&lt;script src="https://gist.github.com/JoFrhwld/4973557.js?file=cohen_d_height.R"&gt;&lt;/script&gt;&lt;br /&gt;
&lt;br /&gt;
Compared to the effect that Mark was looking at (science test scores), these effect sizes are enormous. The effect size for height between men and women is about 23 times larger than the science test score difference that warranted a writeup in the New York Times.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
Yet, still not big enough.&lt;/h3&gt;
As I was thinking about how height is perhaps one of the largest statistical differences between men and women, it also struck me how often even that difference is not big enough for social purposes. &lt;a href="http://thesocietypages.org/socimages/2011/04/29/charles-and-diana-the-manufacture-of-mens-height/"&gt;Sociological Images has a good blog post&lt;/a&gt; about how, even though Prince Charles was about the same height as Princess Diana, if not shorter, in posed pictures he was positioned to look &lt;i&gt;much&lt;/i&gt; taller than her. Here's an example of them on a postage stamp:&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://i.imgur.com/u6gL7f3.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://i.imgur.com/u6gL7f3.jpg" width="227" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
And &lt;a href="http://thesocietypages.org/socimages/2012/08/25/height-norms-and-framing-the-shot/"&gt;in another post&lt;/a&gt;, they provide this picture of a reporter being comically boosted to appear taller than the woman he's interviewing.&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://i.imgur.com/rPeVKOD.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="400" src="http://i.imgur.com/rPeVKOD.jpg" width="265" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br&gt;
&lt;br&gt;
My takeaway point is that, when it comes to socially constructing large and inherent differences between men and women, even the largest statistical difference out there is still not good enough for people, and needs to be augmented and reinforced. Then take into account that most other psychological and cognitive differences have drastically smaller effect sizes, and it really brings into focus how the emphasis on gender differences must draw almost all of its energy from &lt;i&gt;social&lt;/i&gt; motivations, rather than from evidence or data or facts.</description><link>http://val-systems.blogspot.com/2013/02/a-difference-between-men-and-women.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-Z7GyqF_fB3Y/USFUYLaDH3I/AAAAAAAABK4/Agj761egbcI/s72-c/vtl.png" height="72" width="72" /><thr:total>4</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-7655602197354520055</guid><pubDate>Thu, 07 Feb 2013 19:50:00 +0000</pubDate><atom:updated>2013-02-11T18:00:14.442-05:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">linguistics media</category><title>I recommend Lexicon Valley</title><description>Perhaps the most frustrating thing about being a linguist is the enormous gap, among educated people, between how little they actually know about language and how confident they are that they know a lot about it. 
If you keep up with this blog, you know &lt;a href="http://val-systems.blogspot.com/2013/02/does-language-cool.html"&gt;I spend&lt;/a&gt; &lt;a href="http://val-systems.blogspot.com/2012/11/to-take-zombie-nouns-seriously-you.html"&gt;a lot&lt;/a&gt; &lt;a href="http://val-systems.blogspot.com/2012/07/teens-and-texting-and-grammar.html"&gt;of time&lt;/a&gt; &lt;a href="http://val-systems.blogspot.com/2012/07/dont-worry-im-physicist.html"&gt;venting&lt;/a&gt; &lt;a href="http://val-systems.blogspot.com/2012/04/come-and-see.html"&gt;this frustration&lt;/a&gt;&amp;nbsp;&lt;a href="http://val-systems.blogspot.com/2011/12/on-vocal-fry.html"&gt;here&lt;/a&gt;&amp;nbsp;(&lt;a href="http://val-systems.blogspot.com/2011/08/miraculous-thought-transference.html"&gt;etc.&lt;/a&gt;&amp;nbsp;&lt;a href="http://val-systems.blogspot.com/2011/08/language-communication-and-iphone.html"&gt;etc.&lt;/a&gt;&amp;nbsp;&lt;a href="http://val-systems.blogspot.com/2011/01/grammar-phobia-or-judging-book-by-its.html"&gt;etc.&lt;/a&gt;).&lt;br /&gt;
&lt;br /&gt;
But I didn't start blogging in order to complain about how other people are getting it wrong. I started blogging to have an informal outlet for my passion for linguistics! I've been a little concerned about the negative tone of a few of my recent posts, so here's a more positive one. &lt;br /&gt;
&lt;br /&gt;
But... it does start off with a complaint. At the LSA this year, David Pesetsky's plenary focused on the failure of linguistics (and more specifically, generative linguistics) to penetrate the popular science press. Instead, stories about physicists discovering that the most common English word is "the," and psychologists arguing that the structure of language is really just words like beads on a string, get a lot more play. At the Q&amp;A, Ray Jackendoff made the point that there is a folk linguistics, intricately tied up in social politics, that acts as a major roadblock to the popular advancement of real linguistic research. &lt;a href="http://val-systems.blogspot.com/2011/12/i-dont-think-its-linguists-fault.html"&gt;I've said similar things before&lt;/a&gt;. &lt;br /&gt;
&lt;br /&gt;
What is to be done about this state of affairs is the topic of another blog post. Right now, I'd like to bring attention to a promising bright spot for the popularization of linguistics. &lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Lexicon Valley&lt;/h3&gt;&lt;br /&gt;
Lexicon Valley is a podcast hosted by Slate. I've been listening to it off and on since it started, and I have to say I've always enjoyed it. The hosts play two roles in a dialectic. Mike Vuolo is the patient intellectual, and I've always been impressed by the background research he's done. Bob Garfield is the voice of the untutored establishment, and, well, I think that description adequately sums up my opinion of what he brings to the show. It's actually an important role he plays, because without a vocal foil, Vuolo's research would fall rather flat. It's also important for the cause of linguists to have people hear brash, knee-jerk reactions rebutted by careful research. &lt;br /&gt;
&lt;br /&gt;
They have covered a few topics I know a little bit about, and I've always started listening to each show bracing myself for frustration and disappointment. It's a reaction I've learned from every other discussion of language in popular media. But Lexicon Valley usually comes through for me. They've done great shows on &lt;a href="http://mobile.slate.com/articles/podcasts/lexicon_valley/2012/02/lexicon_valley_is_black_english_a_dialect_or_a_language_.html"&gt;African American English&lt;/a&gt;, &lt;a href="http://mobile.slate.com/articles/podcasts/lexicon_valley/2012/04/lexicon_valley_what_it_means_for_a_language_to_have_grammatical_gender_.html"&gt;grammatical gender&lt;/a&gt;, and &lt;a href="http://mobile.slate.com/articles/podcasts/lexicon_valley/2012/05/lexicon_valley_seeking_a_gender_neutral_alternative_to_he_and_she_.html"&gt;the English epicene pronoun&lt;/a&gt;, speaking to actual linguists in each case, and most recently they did a really good portrayal of Labov's department store study (&lt;a href="http://www.slate.com/articles/podcasts/lexicon_valley/2013/01/lexicon_valley_on_r_dropping_in_new_york_and_the_phenomenon_of_prestige.html"&gt;Part 1&lt;/a&gt;, &lt;a href="http://www.slate.com/articles/podcasts/lexicon_valley/2013/01/lexicon_valley_on_r_dropping_or_nonrhoticity_in_new_york_department_stores.html"&gt;Part 2&lt;/a&gt;). &lt;br /&gt;
&lt;br /&gt;
They did catch a lot of flak recently for &lt;a href="http://www.slate.com/articles/podcasts/lexicon_valley/2013/01/lexicon_valley_on_creaky_voice_or_vocal_fry_in_young_american_women.html"&gt;their show on creaky voice&lt;/a&gt;. I was so nervous when I started listening to it, because the recent coverage creaky voice has gotten &lt;a href="http://val-systems.blogspot.com/2011/12/on-vocal-fry.html"&gt;has been worse than terrible&lt;/a&gt;. Per usual, though, Vuolo's research and discussion were excellent. Garfield, on the other hand, spouted some really negative attitudes, and I think he deserves every criticism of sexism that he got. Even within the dialectic of the show, Garfield made a net negative contribution that time around. On the subsequent show, though, Vuolo read out some pretty harsh commentary about Garfield. Garfield offered a nonpology (something about how he can't be sexist, he has daughters), but it was good to have some of the criticism read out loud.&lt;br /&gt;
&lt;br /&gt;
On average, modulo Garfield's frustrating attitudes, I would highly recommend the podcast, and would recommend recommending the podcast. &lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Could it be better?&lt;/h3&gt;&lt;br /&gt;
While I think Lexicon Valley has done some great work so far, I don't think it has yet provided coverage of linguistics in quite the way Pesetsky dreams of. So far, they've mostly covered topics that are &lt;i&gt;reactive&lt;/i&gt; to popular gripes or misconceptions about language. In some respects, it'd be hard for them to do otherwise, because the popular understanding of language science is far below that of almost any natural science, or so it seems from this angle. &lt;br /&gt;
&lt;br /&gt;
I hope, though, that they might find a way to approach linguistic topics which are not just reactive. Just addressing the idea that there are functional elements which have no phonological realization would be enormous. Garfield could play the skeptic, believing that what you see is what you get. &lt;br /&gt;
&lt;br /&gt;
So linguists, listen in, get a feel for the show, and if you have a topic that could be nicely formatted into a 20-minute conversation, send it in to them!</description><link>http://val-systems.blogspot.com/2013/02/i-recommend-lexicon-valley.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><thr:total>1</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-8042044348773388437</guid><pubDate>Sun, 03 Feb 2013 23:36:00 +0000</pubDate><atom:updated>2013-02-03T18:36:34.335-05:00</atom:updated><title>Does language "cool"?</title><description>A few months ago, &lt;a href="http://val-systems.blogspot.com/2012/07/dont-worry-im-physicist.html"&gt;I posted about how I was relatively unimpressed&lt;/a&gt; by a &lt;a href="http://rsif.royalsocietypublishing.org/content/early/2012/07/23/rsif.2012.0491.full"&gt;paper arguing that the observed Zipfian distribution of words in a corpus is due to "preferential attachment"&lt;/a&gt;, a.k.a. the Matthew Effect, a.k.a. the rich get richer. The author of that paper is apparently also a co-author of a paper called&amp;nbsp;"&lt;a href="http://www.nature.com/srep/2012/121210/srep00943/full/srep00943.html"&gt;Languages cool as they expand: Allometric scaling and the decreasing need for new words&lt;/a&gt;." &lt;a href="http://www.insidescience.org/content/physicists-explore-rise-and-fall-words/892"&gt;The writeup in Inside Science&lt;/a&gt; summarizes it like this:
&lt;br /&gt;
&lt;blockquote&gt;
[A] recent analysis has found that as a language grows over time, it becomes more set in its ways. New words are always being added, according to this study, but few become widely used and part of the standard vocabulary.&lt;/blockquote&gt;
My linguist hackles immediately rose at this statement, and that's because there is a large and fundamental difference between what a linguist understands the term "language" to refer to, and what the authors of the column and paper understand it to refer to. What the physicists and the reporter mean by "language" is roughly "a set of words," and in the context of the paper, they almost seem to mean "the set of words which have been published."
&lt;br&gt;&lt;br&gt;
This "language is words" axiom is part of most people's &lt;i&gt;folk&lt;/i&gt; linguistics that we have to train people out of when they take Intro to Linguistics. That's why it's a little hard to take the work of these physicists seriously at first glance. It is as if they were trying to write a serious paper on biological evolution with the assumption that traits acquired by an organism during its life were inheritable.
&lt;br&gt;&lt;br&gt;
But there is an aspect of linguistic knowledge relating to the set of words and morphemes a speaker knows, which linguists call the "lexicon." So I'll just go ahead and reread the paper, mentally replacing each instance of "language" with "lexicon" in order to get through it.
&lt;br&gt;&lt;br&gt;
&lt;h3&gt;Overall Thoughts&lt;/h3&gt;
This paper seems to be a relatively competent (&lt;a href="http://languagelog.ldc.upenn.edu/nll/?p=4456"&gt;modulo Mark Liberman's concerns about OCR errors&lt;/a&gt;) description of the statistical properties of large corpora. But that's really as far as I think any of the claims can go. I am totally unconvinced that their results shed any light on language change, development, evolution, etc. I'm not even sure that the simplest statement that "the lexicon of languages has grown over the past 200 years" can be supported by the results reported.
&lt;br&gt;&lt;br&gt;
The key problem I see with the paper is the conflation of "new to the corpus" and "new to the lexicon." Here's how the problem of sampling language was described to me; I believe it goes back to Good (1953) and is key to &lt;a href="http://en.wikipedia.org/wiki/Good%E2%80%93Turing_frequency_estimation"&gt;Good-Turing Smoothing&lt;/a&gt;. Say you are an entomologist working in a rain forest, trying to survey its insect life. You put out your net for a night to collect a sample, then count up all the species in your net. Some bug species are going to be a lot more frequent than others. You'll have some species that show up many times in the net, but even more species will show up in the net with only one member. Now, let's say that you come back to the same rain forest two years later and repeat the sample. You are nearly guaranteed to observe new species in your net this time around, but the key question is whether they are just new to the net or new to the rain forest. If they're new to the rain forest, did they migrate in, are they hybrids of two other species, or has a species you saw previously evolved so rapidly that you're seeing it as different now?
&lt;br&gt;&lt;br&gt;
These are really interesting and important questions for our entomologist to answer, but you cannot arrive at a definitive answer based simply on the fact that this new species has now shown up in your net. In fact, depending on a few factors, the answer with the highest probability is that the new species is simply new to your net. The Good-Turing estimate of the probability that the very next bug you catch will be a new species is roughly the proportion of bugs you've already caught that belong to a species you've seen only once.
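&lt;br&gt;&lt;br&gt;
To make the Good-Turing point concrete, here is a minimal Python sketch. It assumes the catch is just a list of species labels (the bug names below are made up for illustration) and returns N1/N: the number of species seen exactly once divided by the total number of bugs caught. This is only the simplest form of the estimate, without the smoothing Good-Turing applies to the other frequency counts.

```python
from collections import Counter


def prob_next_is_new(catch):
    """Good-Turing estimate that the next item is an unseen type: N1 / N.

    N1 is the number of types observed exactly once (singletons);
    N is the total number of items observed.
    """
    counts = Counter(catch)
    n_total = len(catch)
    n_singletons = sum(1 for c in counts.values() if c == 1)
    return n_singletons / n_total


# Hypothetical night's catch: two common species plus three singletons.
catch = ["beetle_a"] * 5 + ["moth_b"] * 2 + ["ant_c", "wasp_d", "fly_e"]
print(prob_next_is_new(catch))  # prints 0.3 (3 singletons out of 10 bugs)
```

The same function works on a list of word tokens: swap bugs for words and the net for a corpus.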
&lt;br&gt;&lt;br&gt;
The situation gets even more confusing if you come back to the same rain forest two years later with a net twice the size.
&lt;br&gt;&lt;br&gt;
The paper has a figure plotting the increase in lexicon size over time. My first thought when I saw it was that the overall size of the corpus at each time point must also be going up. Coming back to the entomologist in the rain forest, the number of species in his net is merely a &lt;i&gt;sample&lt;/i&gt; of how many species there are in the forest. In exactly the same way, the number of words in a lexicon can only be estimated from the words that people happened to write down. As you increase the size of the net, you're going to find more species that were already in the forest, but not in your net. As you increase the size of your corpus, you're going to find more words that were already in the lexicon, but not in the corpus.
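&lt;br&gt;&lt;br&gt;
The net-size effect is easy to reproduce in a simulation. The Python sketch below assumes a fixed, unchanging "lexicon" of 50,000 word types with Zipf-like frequencies (probability proportional to 1/rank, an assumption chosen for illustration, not something taken from the paper). It draws one large corpus from that lexicon and counts how many distinct types appear in progressively larger slices of it. Nothing about the lexicon changes, yet the number of observed types keeps rising with corpus size.

```python
import random


def zipf_lexicon(n_types):
    """Return hypothetical word types and Zipf-like weights (rank r gets 1/r)."""
    types = [f"w{r}" for r in range(1, n_types + 1)]
    weights = [1.0 / r for r in range(1, n_types + 1)]
    return types, weights


def observed_types(corpus, sizes):
    """Count the distinct types in the first n tokens of the corpus, for each n."""
    return {n: len(set(corpus[:n])) for n in sizes}


random.seed(1)
types, weights = zipf_lexicon(50_000)
# One big sample from a lexicon that never changes.
corpus = random.choices(types, weights=weights, k=200_000)
for n, v in observed_types(corpus, [10_000, 50_000, 100_000, 200_000]).items():
    print(f"{n:>7} tokens -> {v} distinct types")
```

Nested slices are used so that each larger "net" contains the smaller one, which guarantees the counts can only go up. That mirrors the point that a bigger net finds species that were in the forest all along.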
&lt;br&gt;&lt;br&gt;
Now, add to this the fact that at any given point in time, the true maximum number of &lt;i&gt;possible&lt;/i&gt; words you could observe in any given language is ∞. Yes, in fact, the whole reason language is interesting to study is that, given a finite set of mental objects and a finite set of operations to combine them, you can come up with an infinite set of strings, and that goes for words too, not just sentences. In 1951, "iPod" was a possible word of English; it just wasn't used, or at least not for the same purpose it is now.
&lt;br&gt;&lt;br&gt;
As for whether the "active" (as I'll call it) lexicons of languages have grown over the past 200 years, the overall number of printed words has certainly grown over the same period. Almost all of their results seem to have more to do with the technological development of &lt;i&gt;publishing&lt;/i&gt; than they do with any other linguistic or cultural development. It is as if the entomologist said that over the past decade, the biodiversity in his rain forest has exploded, when really his nets have just been getting progressively larger.
&lt;br&gt;&lt;br&gt;
Now, it might be the case that the active lexicon has grown more than would be expected given the increase in the size of the corpus year over year, but as far as I can tell, the authors did not try to estimate whether this was the case.
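&lt;br&gt;&lt;br&gt;
One crude way to run that check, sketched here under assumptions of my own rather than anything the paper reports: fit Heaps' law, V = K N^β, relating vocabulary size V to corpus size N, by least squares on the log-log scale, then see whether later years sit systematically above the fitted curve. The function name and the illustrative counts below are hypothetical; with real data you would plug in per-year token and type counts.

```python
import math


def fit_heaps(tokens, types):
    """Least-squares fit of log V = log K + beta * log N (Heaps' law).

    Returns K, beta, and the log-scale residual for each point. A run of
    positive residuals in later years would hint at more vocabulary growth
    than corpus growth alone predicts.
    """
    xs = [math.log(n) for n in tokens]
    ys = [math.log(v) for v in types]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx
    log_k = y_bar - beta * x_bar
    residuals = [y - (log_k + beta * x) for x, y in zip(xs, ys)]
    return math.exp(log_k), beta, residuals


# Made-up counts generated exactly from V = 30 * N**0.5, so the fit should
# recover K = 30 and beta = 0.5 with residuals of essentially zero.
tokens = [1e6, 4e6, 1.6e7, 6.4e7]
types = [30 * n ** 0.5 for n in tokens]
k, beta, resid = fit_heaps(tokens, types)
print(round(k, 3), round(beta, 3))  # prints 30.0 0.5
```

A more careful version would fit the curve on earlier years only and ask whether later years exceed its predictions.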
&lt;br&gt;&lt;br&gt;
&lt;h3&gt;What about this cooling down?&lt;/h3&gt;
The "cooling" effect referred to by the paper is the suggestion that as a language "grows" (which, as I just said, is dubious), the frequency with which particular words are used becomes more stable. Some words are more frequent than others, but words become less likely to move up and down in frequency over time, or as the lexicon grows. Back to entomology, the suggestion is that as more species cram into a rain forest, each species is less likely to become more or less populous.
&lt;br&gt;&lt;br&gt;
Again, though, the frequency, even the relative frequency, of a word in a corpus is merely an estimate of its true frequency. As the size of the corpus increases, so should the reliability of its frequency estimates, and we would predict decreasing volatility of those frequency estimates. The authors check for this, and find exactly this relationship between corpus size and frequency volatility, but I can't tell whether there was excess "cooling" left over. I wish they had said, "there was x proportion of cooling left unaccounted for by simply accounting for the size of the corpus," but I think this is perhaps another symptom of the assumption that the corpus=the lexicon=the language that I complained about before.
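&lt;br&gt;&lt;br&gt;
The expected shrinkage in volatility follows from basic sampling. If a word's true relative frequency is p and a corpus has N tokens, the observed relative frequency has a standard error of about sqrt(p(1 - p)/N), so quadrupling the corpus roughly halves the jitter even if p never changes. The Python sketch below computes that standard error and checks it against a quick simulation. The frequency and corpus sizes are illustrative assumptions, not the paper's numbers.

```python
import math
import random
import statistics


def freq_standard_error(p, n_tokens):
    """Standard error of a word's observed relative frequency under binomial sampling."""
    return math.sqrt(p * (1 - p) / n_tokens)


def simulated_spread(p, n_tokens, n_samples=200, seed=0):
    """Standard deviation of observed relative frequencies across repeated corpora.

    The true frequency p is held fixed, so any spread is pure sampling noise.
    """
    rng = random.Random(seed)
    freqs = []
    for _ in range(n_samples):
        hits = sum(1 for _ in range(n_tokens) if p > rng.random())
        freqs.append(hits / n_tokens)
    return statistics.stdev(freqs)


p = 0.01  # hypothetical true relative frequency of one word
for n in (1_000, 4_000):
    print(n, round(freq_standard_error(p, n), 5), round(simulated_spread(p, n), 5))
```

The "cooling" question is then whether observed volatility falls faster than this sampling curve predicts; only the excess would say anything about the lexicon.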
&lt;br&gt;&lt;br&gt;
&lt;h3&gt;The Allure of Big Data&lt;/h3&gt;
The reporter who wrote the Inside Science article did what it appears the editors of Scientific Reports did not: asked a linguist to comment on the paper. Bill Kretzschmar was "underwhelmed," saying that most of these results are not new to linguists. I would take this as a word of warning about the allure of big data. The results discussed in this paper are, by and large, not new; the analyses have just never been run on data of this scale. But unfortunately, a fact which is already known does not get more interesting when it is reestablished with data 100 or 1000 times larger than before.</description><link>http://val-systems.blogspot.com/2013/02/does-language-cool.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><thr:total>2</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-541076126968347580</guid><pubDate>Tue, 27 Nov 2012 19:58:00 +0000</pubDate><atom:updated>2012-11-27T14:58:09.083-05:00</atom:updated><title>To take "Zombie Nouns" seriously, you must've had your brains eaten.</title><description>At first, I didn't feel like blogging about the NYT Column on "&lt;a href="http://opinionator.blogs.nytimes.com/2012/07/23/zombie-nouns/" target="_blank"&gt;Zombie Nouns&lt;/a&gt;" because I feel like I've been spending too much time being critical here, arguing against usage advice like this is futile, and I knew &lt;a href="http://languagelog.ldc.upenn.edu/nll/?p=4095" target="_blank"&gt;Mark Liberman would cover it&lt;/a&gt;. In fact, I drafted this post all the way back during the summer, and just let it sit. But now, I've seen the column, nearly verbatim, &lt;a href="http://ed.ted.com/lessons/beware-of-nominalizations-aka-zombie-nouns-helen-sword"&gt;pop up on TED-Ed as a fully animated "lesson"&lt;/a&gt;, which presumably means some educators are actually assigning it to classrooms of fertile and impressionable minds! It really can't pass without comment now.&lt;br /&gt;
&lt;br /&gt;
Helen Sword says that you should avoid using nominalizations, which she calls "zombie nouns." They're nouns that have been made out of other parts of speech. To take one of her examples, &lt;i&gt;calibrate&amp;nbsp;&lt;/i&gt;+ &lt;i&gt;ion&lt;/i&gt; = &lt;i&gt;calibration.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
What is so wrong about nominalizations? Not exactly clear. She seems to take aim at&amp;nbsp;unnecessarily&amp;nbsp;jargonistic writing, which frequently contains novel coinings of words of all types, including nominalizations. So sure, being jargonistic to obscure your other intellectual shortcomings is not so good. But is it really, actually, the mere use of nominalizations that's doing the damage there?&lt;br /&gt;
&lt;br /&gt;
She also seems to take a page out of the anti-passive voice book, saying, "it fails to tell us &lt;i&gt;who&lt;/i&gt; is doing &lt;i&gt;what&lt;/i&gt;," which, just as with the passive, is simply not true. For example, in the sentence&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;My &lt;b&gt;criticism&lt;/b&gt; of her column is a day late and a dollar short.&lt;/li&gt;
&lt;/ul&gt;
It's very clear &lt;i&gt;who&lt;/i&gt;&amp;nbsp;is doing &lt;i&gt;what&lt;/i&gt;, even though I used a nominalization (in bold).
&lt;br /&gt;
&lt;br /&gt;
But on top of the half-baked usage advice, there are some more reprehensible social attitudes being expressed. For example, she lists &lt;i&gt;epistemology&lt;/i&gt; as a useful nominalization for expressing a complex idea, but &lt;i&gt;heteronormativity&lt;/i&gt; as one that only out-of-touch academics enchanted by jargon would use. First off, I would not want to use &lt;i&gt;epistemology&lt;/i&gt; as an example when explaining what nominalizations are. What's it derived from? &lt;i&gt;Episteme&lt;/i&gt;? &lt;i&gt;Episteme&lt;/i&gt; has a Wikipedia page, so I guess it's that. Which brings me to the next issue here. It's embarrassing for me to admit, but whenever someone says or writes &lt;i&gt;epistemology&lt;/i&gt;, I have to go look it up on Wikipedia. How does using &lt;i&gt;epistemology&lt;/i&gt; not count as being out of touch with how ordinary people speak? &lt;i&gt;Heteronormativity&lt;/i&gt;, on the other hand, is pretty easy to wrap your mind around. From Wikipedia:
&lt;br /&gt;
&lt;blockquote&gt;
Heteronormativity is a term to describe any of a set of lifestyle norms that hold that people fall into distinct and complementary genders (man and woman) with natural roles in life. It also holds that heterosexuality is the normal sexual orientation, and states that sexual and marital relations are most (or only) fitting between a man and a woman. Consequently, a "heteronormative" view is one that involves alignment of biological sex, sexuality, gender identity, and gender roles.
&lt;/blockquote&gt;
That's a pretty complex idea. But you know what? It's pretty easy to decode most of that meaning from the word itself, at least, if you're vaguely familiar with the politics of the time. &lt;i&gt;Hetero(sexual)&lt;/i&gt;&amp;nbsp;+ &lt;i&gt;normative + ity&lt;/i&gt;. It seems to me that she's saying more about her position on sex and gender politics here than she is about usage advice.&lt;br /&gt;
&lt;br /&gt;
But who is this person, and why is she writing an opinion column in the New York Times and getting the full TED treatment? Just like everyone, she's selling something: the icing on the cake, and my reason for blogging about this at all. She has a book out called &lt;i&gt;&lt;a href="http://www.writersdiet.com/WT.php?home" target="_blank"&gt;The Writer's Diet&lt;/a&gt;&lt;/i&gt;, which has an accompanying online &lt;i&gt;&lt;a href="http://www.writersdiet.com/WT.php" target="_blank"&gt;Writer's Diet Test&lt;/a&gt;&lt;/i&gt;. No, it's not &lt;i&gt;diet&lt;/i&gt; as in "food for thought and inspiration," like a &lt;i&gt;Chicken Soup for the Writer's Soul&lt;/i&gt;. It's &lt;i&gt;diet&lt;/i&gt; as in &lt;i&gt;dieting&lt;/i&gt;, as in "drop 20 lbs and get the six-pack abs you always wanted." Just paste a paragraph of your writing into the test, and it'll rate you on a five-point scale labeled:&lt;br /&gt;
&lt;br /&gt;
&lt;table cellpadding="5" rules="cols"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;lean&lt;/td&gt;
&lt;td&gt;fit &amp;amp; trim&lt;/td&gt;
&lt;td&gt;needs toning&lt;/td&gt;
&lt;td&gt;flabby&lt;/td&gt;
&lt;td&gt;heart attack territory&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;br /&gt;
Ain't nothing like exploiting the collective dysmorphia of a nation to push your quarter-baked usage decrees. But in doing so, Sword actually clarifies the role that books like hers play. The analogy to the diet and weight loss industry is entirely apt. The dieting industry makes its money by sowing seeds of personal insecurity, then reaps its harvest with offers of unfounded, unscientific, and ultimately futile dieting pills, products, methods, 10-step plans, meals, regimes, books, magazines, etc.&lt;br /&gt;
&lt;br /&gt;
I won't mince words. The NYT column and the TED-Ed video have the equivalent intellectual content of the magazines in the supermarket aisle promising you 5 super easy steps to trim your belly fat to get a sexy beach bod in time for the summer. And they serve the same purpose: to undermine the confidence of everyday folk, so that they may be taken advantage of by self-appointed gurus.</description><link>http://val-systems.blogspot.com/2012/11/to-take-zombie-nouns-seriously-you.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><thr:total>4</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-5007037223930040558</guid><pubDate>Thu, 15 Nov 2012 19:14:00 +0000</pubDate><atom:updated>2012-11-15T14:14:51.057-05:00</atom:updated><title>Creative Work</title><description>Whenever I hear "creative" people describe their creative process, or more precisely their creative woes, I am always struck by the strong similarities to my own experiences trying to do science. I &lt;i&gt;do&lt;/i&gt; consider myself as trying to do science. &lt;br&gt;
&lt;br&gt;
Take, for example, this excellent statement on self-disappointment at the early stages of your career from Ira Glass.&lt;br&gt;
&lt;iframe allowfullscreen="allowfullscreen" frameborder="0" height="281" mozallowfullscreen="mozallowfullscreen" src="http://player.vimeo.com/video/24715531?badge=0" webkitallowfullscreen="webkitallowfullscreen" width="500"&gt;&lt;/iframe&gt;&lt;br /&gt;
&lt;a href="http://vimeo.com/24715531"&gt;Ira Glass on Storytelling&lt;/a&gt; from &lt;a href="http://vimeo.com/thedak"&gt;David Shiyang Liu&lt;/a&gt; on &lt;a href="http://vimeo.com/"&gt;Vimeo&lt;/a&gt;.&lt;br&gt;
&lt;br&gt;
This almost perfectly sums up how I felt about almost all of the early work I did in graduate school. I can't say that I've actually gotten to the point where the work I produce meets my own personal standards, but it has been on an upward trend, and I'd say Ira Glass' advice is spot on. If you want to write good papers, just write a lot of papers, and if you want to be good at giving talks, give a lot of talks, preferably in a context where you feel comfortable being bad or mediocre.&lt;br&gt;
&lt;br&gt;
That last bit, being comfortable with being bad, is really reminiscent of things Brother Ali says in this interview.&lt;br&gt;
&lt;iframe src="http://player.vimeo.com/video/49456269?badge=0" width="500" height="281" frameborder="0" webkitAllowFullScreen mozallowfullscreen allowFullScreen&gt;&lt;/iframe&gt; &lt;p&gt;&lt;a href="http://vimeo.com/49456269"&gt;Ill Doctrine: Brother Ali Meets the Little Hater&lt;/a&gt; from &lt;a href="http://vimeo.com/animalnewyork"&gt;ANIMALNewYork.com&lt;/a&gt; on &lt;a href="http://vimeo.com"&gt;Vimeo&lt;/a&gt;.&lt;/p&gt;
There are a few things Brother Ali says that really resonate with me.
&lt;blockquote&gt;
There was a moment where I was so stressed out. And I'm like, "Man, everything that I ever did that people liked, I just got lucky. I'm a fraud."
&lt;/blockquote&gt;
&lt;blockquote&gt;
...
&lt;/blockquote&gt;
&lt;blockquote&gt;
It's a weird weird thing to have what you create also be your livelihood. What we create is also our sense of self. What we create is also the way the world views us.
&lt;/blockquote&gt;
&lt;blockquote&gt;
...
&lt;/blockquote&gt;
&lt;blockquote&gt;
And so I start thinking about it. Ok, it's not that I'm blocked. It's not that I don't have anything to say. It's that I don't know how to say what I need to say. Or it's that I don't think that it's going to be received well. Or it's that the people that love me and have supported me and have, you know, gave me the little bit of freedom in my life that I have, I don't want to let them down and I don't want to hurt their feelings by saying what needs to be said.
&lt;/blockquote&gt;
I think almost all academics of any variety feel this way from time to time.&lt;br&gt;
&lt;br&gt;
But I wonder whether some people might be surprised that I feel so similar to creative artists in the pursuit of my science, or might even take it as evidence that what I do is not science. Linguistics' status as a science is certainly doubted occasionally. But I think these people (probably strawmen) are mistaken in thinking that science is not a creative process. This was recognized by Max Weber in his 1918 essay "Science as a Vocation" (&lt;a href="http://val-systems.blogspot.com/2011/08/max-weber-on-why-there-is-no-decision.html"&gt;which I've blogged about before&lt;/a&gt;).
&lt;blockquote&gt;
[I]nspiration plays no less a role in science than it does in the realm of art. It is a childish notion to think that a mathematician attains any scientifically valuable results by sitting at his desk with a ruler, calculating machines or other mechanical means. The mathematical imagination of a Weierstrass is naturally quite differently oriented in meaning and result than is the imagination of an artist, and differs basically in quality. But the psychological processes do not differ. Both are frenzy (in the sense of Plato's 'mania') and 'inspiration.'
&lt;/blockquote&gt;
He also suggests that the best science and the best art are produced by individuals devoted to the science or art for its own sake, rather than by those driven by the express goal of producing something new for the sake of novelty.&lt;br&gt;
&lt;br&gt;
The distinction that Weber draws between art and science is that science is necessarily committed to the abandonment of old science. That is, art from the Renaissance is still, and always will be, art, but science from the same period is no longer science. It has been superseded by more recent developments.&lt;br&gt;
&lt;br&gt;
Anyway, here's the song Brother Ali was talking about, which I'm sure almost all academics can identify with (except, hopefully, for the suicidal ideation).

&lt;iframe width="640" height="360" src="http://www.youtube.com/embed/06qWAClqh9I?rel=0" frameborder="0" allowfullscreen&gt;&lt;/iframe&gt;</description><link>http://val-systems.blogspot.com/2012/11/creative-work.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://img.youtube.com/vi/06qWAClqh9I/default.jpg" height="72" width="72" /><thr:total>1</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-84192700206882472</guid><pubDate>Wed, 07 Nov 2012 23:32:00 +0000</pubDate><atom:updated>2012-11-07T19:11:36.779-05:00</atom:updated><title>Nate Silver vs.the baseline</title><description>The 2012 election has been declared a victory for Nate Silver. As Rick Reilly said:

&lt;br /&gt;
&lt;blockquote class="twitter-tweet tw-align-center"&gt;
If Nate Silver told me it's going to rain marshmallows tomorrow, I'd stand outside with cups of hot chocolate.&lt;br /&gt;
— Rick Reilly (@ReillyRick) &lt;a data-datetime="2012-11-07T06:22:49+00:00" href="https://twitter.com/ReillyRick/status/266063137936142336"&gt;November 7, 2012&lt;/a&gt;&lt;/blockquote&gt;
&lt;script charset="utf-8" src="//platform.twitter.com/widgets.js"&gt;&lt;/script&gt;

For me, as a data geek, this is nothing but good news. There's been a lot of talk about how Silver's high profile during the election could have broader effects on how everyday people think about data and prediction. There's also talk about how Silver's performance challenges established punditry, as summed up in this XKCD comic.

&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://imgs.xkcd.com/comics/math.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="158" src="http://imgs.xkcd.com/comics/math.png" width="450" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
Coming at this from the other side, though, I'm curious as a data person about how much secret sauce Silver's got. Sure, in broad qualitative strokes, he got the map right. But quantitatively, Silver's model also produced more detailed estimates about voting shares by state. How accurate were those?&lt;br /&gt;
&lt;br /&gt;
Well, to start out, there is no absolute sense of accuracy. When it comes to predicting which states would go to which candidates, it's easy to say Silver's predictions were maximally accurate. What's trickier is figuring out how many he could have gotten &lt;i&gt;wrong&lt;/i&gt; and still have us call his prediction accurate. For example, Ohio was a really close race. If Ohio had actually gone to Romney, but all of Silver's other predictions were right, could we call that a pretty accurate prediction? Maybe. But now let's say that he got all of the conventional battleground states right, but out of nowhere, California went for Romney. It's the same situation of getting one state wrong, but in this case it's a big state, and an anomalous outcome that Silver's model would have missed. Would his prediction be inaccurate in that case? What if it was Rhode Island instead? That would be equally anomalous, but would have a smaller impact on the final election result. Now let's imagine a different United States where all of the races in all of the states had razor-thin margins, and Silver correctly predicted 30 out of 50. In that case, we might say it was an accurate prediction.&lt;br /&gt;
&lt;br /&gt;
All of this is to say that the notion of "accuracy" is really dependent upon what you're comparing the prediction to, and what the goal of the prediction is.&lt;br /&gt;
&lt;br /&gt;
So what I want to know is how much Silver's model improves his prediction over what's just immediately obvious from the available data. That is, I want to see how much closer Silver's prediction of the vote share in different states was than some other baseline prediction. For the baseline, I'll take the average of the most recent polls from that state, as handily provided by &lt;a href="http://fivethirtyeight.blogs.nytimes.com/"&gt;Nate Silver on the 538 site&lt;/a&gt;. I also need to compare both the averaging method and the 538 method to the actual outcomes, which I've copy-pasted from &lt;a href="http://elections2012.npr.org/bigboard/president.html"&gt;the NPR big board&lt;/a&gt;. (Note: I think they might still be updating the results there, so I might have to update this post at some future date with the final tally.)&lt;br /&gt;
&lt;br /&gt;
First I'll look at the Root Mean Square Error (RMSE) for the simple average-of-polls prediction and the 538 prediction, taking Obama and Romney separately. The "Silver Advantage" row is just the RMSE of the poll-averaging prediction divided by the RMSE of the 538 prediction.&lt;br /&gt;
&lt;br /&gt;
&lt;center&gt;
&lt;table rules="rows"&gt;
&lt;tbody&gt;
&lt;tr&gt;
 &lt;th&gt;&lt;/th&gt;&lt;th align="right"&gt;Obama&lt;/th&gt;&lt;th align="right"&gt;Romney&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
 &lt;td&gt;Averaging Polls&lt;/td&gt;&lt;td align="right"&gt;3.3&lt;/td&gt;&lt;td align="right"&gt;4.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
 &lt;td&gt;538&lt;/td&gt;&lt;td align="right"&gt;1.8&lt;/td&gt;&lt;td align="right"&gt;1.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
 &lt;td&gt;Silver Advantage&lt;/td&gt;&lt;td align="right"&gt;1.8&lt;/td&gt;&lt;td align="right"&gt;2.4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/center&gt;
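&lt;br /&gt;
For anyone who wants to see the computation spelled out, here is a minimal Python sketch (the original analysis was done in R; see the repository linked at the end of this post). The state-level numbers below are placeholders, not the real polls or results. RMSE is the square root of the mean squared difference between predicted and actual vote share, and the "Silver Advantage" is the poll-average RMSE divided by the 538 RMSE.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between paired predictions and outcomes."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Placeholder vote shares (percent) for three hypothetical states.
actual = [52.0, 47.5, 55.0]
poll_avg = [49.0, 45.0, 51.0]
forecast_538 = [51.0, 47.0, 53.5]

rmse_polls = rmse(poll_avg, actual)
rmse_538 = rmse(forecast_538, actual)
advantage = rmse_polls / rmse_538  # how many times smaller the 538 error is
print(round(rmse_polls, 2), round(rmse_538, 2), round(advantage, 2))
```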
&lt;br /&gt;
So it looks like Silver has definitely got some secret sauce, effectively halving the RMSE of the stupid poll-averaging prediction. I also tried out a version of the RMSE weighted by the electoral votes of each state, for a more results-oriented view of the accuracy. I just replaced the mean of the squared error with a weighted average of the squared error, weighted by the electoral votes of the state. The results come out basically the same.&lt;br /&gt;
&lt;br /&gt;
&lt;center&gt;
&lt;table rules="rows"&gt;
&lt;tbody&gt;
&lt;tr&gt;
 &lt;th&gt;&lt;/th&gt;&lt;th align="right"&gt;Obama&lt;/th&gt;&lt;th align="right"&gt;Romney&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
 &lt;td&gt;Averaging Polls&lt;/td&gt;&lt;td align="right"&gt;3.2&lt;/td&gt;&lt;td align="right"&gt;3.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
 &lt;td&gt;538&lt;/td&gt;&lt;td align="right"&gt;1.5&lt;/td&gt;&lt;td align="right"&gt;1.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
 &lt;td&gt;Silver Advantage&lt;/td&gt;&lt;td align="right"&gt;2.2&lt;/td&gt;&lt;td align="right"&gt;2.0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/center&gt;
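&lt;br /&gt;
The electoral-vote weighting only changes the averaging step: each state's squared error is multiplied by its electoral votes, and the total is divided by the sum of the weights before taking the square root. A Python sketch, again with placeholder numbers rather than the real data:

```python
import math


def weighted_rmse(predicted, actual, weights):
    """RMSE where each squared error counts in proportion to its weight."""
    total = sum(w * (p - a) ** 2 for p, a, w in zip(predicted, actual, weights))
    return math.sqrt(total / sum(weights))


# Placeholder shares for three hypothetical states and their electoral votes.
actual = [52.0, 47.5, 55.0]
forecast = [51.0, 47.0, 53.5]
electoral_votes = [18, 3, 55]
print(round(weighted_rmse(forecast, actual, electoral_votes), 3))
```

With equal weights this reduces to the ordinary RMSE; with electoral-vote weights, a miss in a big state pulls the figure up more than the same miss in a small one.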
&lt;br /&gt;
So what was it about the 538 forecast that made it so much better than simply averaging polls? I think these plots might help answer that. They both plot the error in the 538 forecast against the error in poll averaging.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-EkOQHm-W4SE/UJrpuI6vG0I/AAAAAAAABJE/PDxBAaDEWcg/s1600/obama.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/-EkOQHm-W4SE/UJrpuI6vG0I/AAAAAAAABJE/PDxBAaDEWcg/s1600/obama.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-1Tp0sqgs2os/UJrpwwmX-AI/AAAAAAAABJM/8W7VpadTwcs/s1600/romney.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/-1Tp0sqgs2os/UJrpwwmX-AI/AAAAAAAABJM/8W7VpadTwcs/s1600/romney.png" /&gt;&lt;/a&gt;&lt;/div&gt;
It looks like for both Obama and Romney, the 538 forecast did more to boost the prediction up in places where they outperformed their polls than to tamp it down where they underperformed. The effect is especially striking for Romney.&lt;br /&gt;
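&lt;br /&gt;
The asymmetry in the plots can also be summarized with a number. The Python sketch below, with a hypothetical function name and placeholder numbers, splits states by whether a candidate beat the poll average and compares how far the 538 forecast moved away from the poll average in each group. In the pattern described above, the average upward shift in the "outperformed" group would be larger in magnitude than the downward shift in the "underperformed" group.

```python
from statistics import mean


def adjustment_by_outcome(poll_avg, forecast, actual):
    """Mean (forecast - poll average) shift, split by whether the candidate
    beat the poll average. Ties count as underperformed."""
    beat, missed = [], []
    for poll, fc, result in zip(poll_avg, forecast, actual):
        shift = fc - poll
        (beat if result > poll else missed).append(shift)
    return {
        "outperformed": mean(beat) if beat else None,
        "underperformed": mean(missed) if missed else None,
    }


# Placeholder numbers for four hypothetical states.
poll_avg = [48.0, 50.0, 45.0, 53.0]
forecast = [50.0, 51.5, 44.5, 52.8]
actual = [51.0, 52.0, 44.0, 52.0]
print(adjustment_by_outcome(poll_avg, forecast, actual))
```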
&lt;br /&gt;
So, Silver's model definitely outperforms simple poll watching &amp;amp; averaging. Which is good for him, because it means he's actually doing something to earn his keep.&lt;br /&gt;
&lt;br /&gt;
You can grab the data and R code I was working with at &lt;a href="https://github.com/JoFrhwld/election_2012"&gt;this GitHub repository&lt;/a&gt;. There's also &lt;a href="http://rpubs.com/JoFrhwld/election_2012_538"&gt;this version of the R code on RPubs&lt;/a&gt;.</description><link>http://val-systems.blogspot.com/2012/11/nate-silver-vsthe-baseline.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/-EkOQHm-W4SE/UJrpuI6vG0I/AAAAAAAABJE/PDxBAaDEWcg/s72-c/obama.png" height="72" width="72" /><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-9218875112955034800</guid><pubDate>Fri, 27 Jul 2012 18:34:00 +0000</pubDate><atom:updated>2012-07-27T17:40:39.366-04:00</atom:updated><title>Teens and Texting and Grammar</title><description>I'm just one man, one linguist, impotently shouting into the vast mediascape, "PLEASE POPULAR MEDIA! PLEASE DON'T RUN WITH THE TEEN TEXTING GRAMMAR STORY!"&lt;br /&gt;
&lt;br /&gt;
There is a paper out in New Media and Society called &lt;i&gt;&lt;a href="http://nms.sagepub.com/content/early/2012/05/10/1461444812442927" target="_blank"&gt;Texting, techspeak, and tweens: The relationship between text messaging and English grammar skills&lt;/a&gt;. &lt;/i&gt;If you are a linguist, and you winced at the title, I have to warn you, you're not done wincing yet.&lt;br /&gt;
&lt;br /&gt;
Is the key problem that the authors collected data on text messaging behaviors from self-reports? No.&lt;br /&gt;
&lt;br /&gt;
Is the key problem that the authors did not directly assess whether or not the teens in the study used "techspeak"? No. (Let's set aside the fact that &lt;a href="http://languagelog.ldc.upenn.edu/nll/?p=4083" target="_blank"&gt;high volumes of txtspeak are increasingly associated with out-of-touch adults&lt;/a&gt;.)&lt;br /&gt;
&lt;br /&gt;
Is the key problem that the authors didn't include any figures plotting the relationship between any of their measures? No.&lt;br /&gt;
&lt;br /&gt;
Is the key problem that the authors included no control group of teens who don't text, or adults who adopted texting late in life? No.&lt;br /&gt;
&lt;br /&gt;
The key problem is that the authors appear to &lt;b&gt;have no idea what grammar or language are&lt;/b&gt;. I quote:&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
Similar to synchronous online communications such as instant messaging, the speed, ease, and brevity of text messaging have created a perfect platform for adapting the English language to better suit attributes of the technology. This has led to an evolution in grammar, the basis of which we shall call ‘techspeak.’ This language differs from English in that it takes normal English words and modifies them  [...]&lt;/blockquote&gt;
The depth of misunderstanding and naiveté present in this quote about the relationship between actual language and grammar and the way we write is equivalent to thinking that the sun revolves around the Earth, and that stars are bright dots on a large dome in the sky. Mind you, the Earth-centric, skydome model of the universe is a perfectly reasonable one until you are exposed to the most basic, rudimentary scientific understanding of how the world works.&lt;br /&gt;
&lt;br /&gt;
The authors of this paper appear not to have been exposed to the most basic, rudimentary scientific understanding about how language and grammar work.&lt;br /&gt;
&lt;br /&gt;
From Appendix A of the paper, I present to you the 20-point "grammar" assessment used in the study.&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;There (is, are) two ways to make enemies.&lt;/li&gt;
&lt;li&gt;One of the men forgot to bring (his, their) tools.&lt;/li&gt;
&lt;li&gt;Gail and Sue (make, makes) friends easily.&lt;/li&gt;
&lt;li&gt;The coach thought he had (tore, teared, torn) a ligament.&lt;/li&gt;
&lt;li&gt;During the flood, we (dranked, drank, drunk, drunked) bottled water.&lt;/li&gt;
&lt;li&gt;The boy called for help, and I (swum, have swam, swam) out to him.&lt;/li&gt;
&lt;li&gt;Fortunately, Jim’s name was (accepted, excepted) from the roster of those who would have to clean bathrooms because he was supposed to go downtown to (accept, except) a reward for the German Club.&lt;/li&gt;
&lt;li&gt;I don’t know how I could (lose, loose) such a big dress. It is so large that it is (lose, loose) on me when I wear it!&lt;/li&gt;
&lt;li&gt;The man around the corner from the sandlots (come, comes) to our meetings.&lt;/li&gt;
&lt;li&gt;The man and his little girls (was, were) not injured in the accident.&lt;/li&gt;
&lt;li&gt;The pictures in this new magazine (shows, show) the rugged beauty of the West.&lt;/li&gt;
&lt;li&gt;The orders from that company (is, are) on your desk there.&lt;/li&gt;
&lt;li&gt;The (boys, boys’, boy’s, boys’s) hats were lost in the water because they were careless in not tying them to the side of the boat.&lt;/li&gt;
&lt;li&gt;(Its, It’s, Its’) an honor to accept the awards certificates and medals presented to the club.&lt;/li&gt;
&lt;li&gt;Worried, and frayed, the old man paced the floor waiting for his daughter. (Correct/Incorrect)&lt;/li&gt;
&lt;li&gt;The boy yelled, ‘Please help me’! (Correct/Incorrect)&lt;/li&gt;
&lt;li&gt;She got out of the car, waved hello, and walked into the house. (Correct/Incorrect)&lt;/li&gt;
&lt;li&gt;When Suzie arrived at the dance, no one else was there. (Correct/Incorrect)&lt;/li&gt;
&lt;li&gt;Dad and I enjoyed our trip to new york city. (Correct/Incorrect)&lt;/li&gt;
&lt;li&gt;The boy’s mother picked him up from school. (Correct/Incorrect)&lt;/li&gt;
&lt;/ol&gt;
Here is how the authors describe what they were trying to assess:&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
The first portion of the assessment consisted of 16 questions designed to test the student’s grasp of verb/noun agreement, use of correct tense, homophones, possessives, and apostrophes. [...]  The second portion of the assessment asked participants to indicate whether or not a sentence was correct, such as ‘The boy yelled, “Please help me”!’ (Correct/ Incorrect). This portion tested the student’s understanding of comma usage, punctuation, and capitalization.&lt;/blockquote&gt;
Virtually none of these points (homophones, apostrophes, comma usage, punctuation, and capitalization) fall under the purview of what is scientifically understood to be "grammar". Arnold Zwicky has suggested the term "&lt;a href="http://arnoldzwicky.wordpress.com/2012/02/22/its-all-grammar/" target="_blank"&gt;garmmra&lt;/a&gt;" for such things. Punctuation, comma rules, spelling conventions, etc. are all arbitrary decisions settled upon long ago, and they have nothing, &lt;i&gt;nothing&lt;/i&gt;, to do with human language. You could, by fiat, swap periods and commas (as many cultures do when writing numbers), insist that sentence-initial adverbs be followed by a semicolon, and revert to the symbols &amp;lt;þ&amp;gt; and &amp;lt;ð&amp;gt; to spell the two sounds we currently both spell with &amp;lt;th&amp;gt;. And do you know how many things that would change about English grammar? Zero things.&lt;br /&gt;
&lt;br /&gt;
The remaining points of assessment (tense and subject/verb agreement) could be considered to be well within the domain of grammar, except that the authors chose really poor, highly variable items for the evaluation. The very first item involves verbal agreement with an expletive subject, and most of the rest involve coordination or agreement attraction! These are items which really lie on the outside edges of linguistic &lt;i&gt;processing&lt;/i&gt; abilities, and there is no way that they could serve as reliable measures of fluency and grammatical competence. Search the work of any good writer, and I'm sure you'll find examples of &lt;i&gt;both&lt;/i&gt; kinds of usage.&lt;br /&gt;
&lt;br /&gt;
And then there's the second item: "One of the men forgot to bring (his, their) tools." Both possibilities are acceptable English, &lt;a href="http://itre.cis.upenn.edu/~myl/languagelog/archives/002748.html"&gt;and have been for a long time&lt;/a&gt;.
&lt;br /&gt;
&lt;br /&gt;
The most depressing thing about this grammar assessment is where the researchers say they got it.&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
This assessment was adapted from a ninth-grade grammar review test.&lt;/blockquote&gt;
I'm reminded of a piece I read called &lt;a href="http://linguisticanthropology.org/blog/2011/09/22/ebonics-the-new-milennium/"&gt;&lt;i&gt;For Ebonics, the New Milennium Is Pretty Much Like the Old One&lt;/i&gt;&lt;/a&gt;, which said: "This suggests to me a catastrophic failure of the public school 'language arts' curriculum: people spend years in various language arts classes and leave with the same 19th-century folk notions that they started with."
&lt;br /&gt;
&lt;hr /&gt;
&lt;br /&gt;
So what have these authors actually found? Well, maybe it's the case that the more people write in a broader range of contexts for a broader range of purposes, the more the arbitrary, conventionalized aspects of the English writing system will undergo natural drift. What effect will this have on English grammar, as it is represented in the minds of everyday English users? Probably just as much as the current writing system does: a minimal one.&lt;br /&gt;
&lt;br /&gt;
&lt;hr /&gt;
&lt;br /&gt;
And what about my plea to the popular media? Even if someone of note finds this post and reads it, I already know that it won't matter at all. Per &lt;a href="http://val-systems.blogspot.com/2011/12/i-dont-think-its-linguists-fault.html"&gt;my commentary on the coverage of vocal fry&lt;/a&gt;, no one is going to report on this piece because they care about science or facts. This research fits snugly into pre-existing biases about young people and the general decline of society, and frankly, these biases seem to have more to do with why these researchers did the study in the first place than science or facts do. And there is no way that something so trivial as a bunch of experts on language and grammar is about to derail this train of garbage and nonsense.
&lt;br&gt;
&lt;hr /&gt;
&lt;b&gt;UPDATE!&lt;/b&gt; There is, in fact, an actual paper on the topic of instant messaging and grammar by Sali Tagliamonte and Derek Denis from 2008 called "&lt;a href="http://americanspeech.dukejournals.org/content/83/1/3.refs"&gt;Linguistic Ruin? LOL! Instant Messaging and Teen Language&lt;/a&gt;." Remember hearing about that in the news? Here are selections from their conclusions.
&lt;blockquote&gt;
In a million and a half words of IM discourse among 71 teenagers, the use of short forms, abbreviations, and emotional language is infinitesimally small, less than 3% of the data.
&lt;/blockquote&gt;
&lt;blockquote&gt;
Our foray into the IM environment through quantitative sociolinguistic analysis, encompassing four areas of grammar and over 20,000 individual examples, reveals that IM is firmly rooted in the model of the extant language, reflecting the same structured heterogeneity (variation) and the same dynamic, ongoing processes of linguistic change that are currently under way in the speech community in which the teenagers live.
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;b&gt;UPDATE!&lt;/b&gt; See also &lt;i&gt;&lt;a href="http://journals.cambridge.org/action/displayAbstract?fromPage=online&amp;aid=7873122"&gt;Enregistering internet language&lt;/a&gt;&lt;/i&gt; by Lauren Squires (2010)</description><link>http://val-systems.blogspot.com/2012/07/teens-and-texting-and-grammar.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><thr:total>5</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-4971638384418447818</guid><pubDate>Wed, 25 Jul 2012 19:11:00 +0000</pubDate><atom:updated>2012-07-25T15:14:40.482-04:00</atom:updated><title>Don't worry, I'm a physicist.</title><description>Today, I came across a science news item from ABC (the Australian Broadcasting Corporation) with the title "&lt;a href="http://www.abc.net.au/science/articles/2012/07/25/3553196.htm" style="font-weight: bold;" target="_blank"&gt;Study opens book on English evolution&lt;/a&gt;." Oh goodness. Here are the opening paragraphs:
&lt;br /&gt;
&lt;blockquote&gt;
A study of 500 years of the English language has confirmed that 'the', 'of' and 'and' are the most frequently printed words in the modern era.&lt;br /&gt;
&lt;br /&gt;
The study, by Slovenian physicist Matjaz Perc, also found the top dozen phrases most-printed in books include "at the end of the", "as a result of the" or "on the part of the".
&lt;/blockquote&gt;
That sound you hear is the stunned silence of linguists everywhere over the fact that you can get into the science news with the primary result that "'the' is the most common English word."&lt;br /&gt;
&lt;br /&gt;
But to be fair, what the author was trying to argue is that the Zipfian distribution of word frequencies is a result of "preferential attachment," where frequent words get more frequent. He tried to demonstrate this by showing that the frequency of a word in a given year is predictive of its frequency in the future; specifically, that relatively high-frequency words will be even more frequent in the future. The key result is shown in Figure 4 of the paper, &lt;a href="http://rsif.royalsocietypublishing.org/content/early/2012/07/23/rsif.2012.0491.full" target="_blank"&gt;available here&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
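For anyone who hasn't met the idea, here is a minimal sketch of what a rich-get-richer (preferential attachment) process over words can look like. It is a Simon-style model written purely for illustration, not Perc's code or his exact model, and the function name `simon_process` and its parameters are hypothetical. At each step, with probability `p_new` a brand-new word type is coined; otherwise a token is copied uniformly at random from everything generated so far, so an existing word gets reused in proportion to how often it has already occurred.

```python
import random
from collections import Counter


def simon_process(n_tokens, p_new=0.1, seed=0):
    """Generate n_tokens word tokens (n_tokens >= 1) with a Simon-style
    rich-get-richer process (hypothetical illustration).

    Word types are plain integers. After the first token, each step either
    coins a new type (with probability p_new) or copies an earlier token
    chosen uniformly at random, which picks each existing type with
    probability proportional to its current count.
    """
    rng = random.Random(seed)
    tokens = [0]  # the first token is word type 0
    next_type = 1
    for _ in range(n_tokens - 1):
        if p_new > rng.random():
            tokens.append(next_type)  # a brand-new word enters the text
            next_type += 1
        else:
            tokens.append(rng.choice(tokens))  # reuse an old word: rich get richer
    return tokens


if __name__ == "__main__":
    ranked = Counter(simon_process(50_000)).most_common()
    for rank in (1, 10, 100, 1000):
        print(rank, ranked[rank - 1][1])  # rank and frequency
```

Printing the frequencies at a few ranks should show a steep, roughly power-law drop-off, which is the sense in which processes like this are said to produce Zipfian distributions. Whether real word frequencies arise this way is exactly what's in question below.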
&lt;h3&gt;Say what?&lt;/h3&gt;
While that quantitative result may stand, the fact that Perc is a physicist probably contributed to some really bananas statements about language. In the first paragraph, he all but equates human language with &lt;i&gt;written&lt;/i&gt; language, and erases the validity and richness of cultures with unwritten languages.
&lt;br /&gt;
&lt;blockquote&gt;
Were it not for books, periodicals and other publications, we would hardly be able to continuously elaborate over what is handed over by previous generations, and, consequently, the diversity and efficiency of our products would be much lower than it is today. Indeed, it seems like the importance of the written word for where we stand today as a species cannot be overstated.&lt;/blockquote&gt;
He also presents some results showing English "coming of age" and reaching "greater maturity" around 1800 AD (Figure 3). Finally! It only took us, like, what, a thousand years or so?&lt;br /&gt;
&lt;br /&gt;
The discussion section kicks off with the statement
&lt;br /&gt;
&lt;blockquote&gt;
The question ‘Which are the most common words and phrases of the English language?’ alone has a certain appeal [...]&lt;/blockquote&gt;
That may be true for physicists, but for people who are dedicated to studying language (what are they called again?), not so much. Fortunately, his ignorance of linguistics is actually a positive quality of this research!
&lt;br /&gt;
&lt;blockquote&gt;
On the other hand, writing about the evolution of a language without considering grammar or syntax, or even without being sure that all the considered words and phrases actually have a meaning, may appear prohibitive to many outside the physics community. Yet, it is precisely this detachment from detail and the sheer scale of the analysis that enables the observation of universal laws that govern the large-scale organization of the written word. 
&lt;/blockquote&gt;
See, linguists are just too caught up in the details to see the big picture! Fire a linguist and your productivity goes up, amirite?&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;For real, though?&lt;/h3&gt;
But back to the substantive claim of the paper. Is the Zipfian distribution of words due to the rich getting richer? That is, are words like snowballs rolling down a hill: the larger they are, the more snow they pick up, and the larger still they get? Maybe, but maybe not.&lt;br /&gt;
&lt;br /&gt;
Here's a little experiment that I was told about by Charles Yang, who read about it in a paper by Chomsky that I don't have the reference for. Right now, we're defining "words" as being all the characters between white spaces. But what if we redefined "words" as being all the characters between some other kind of delimiter? The example Charles used was "e". If we treat the character "e" as the delimiter between words, and we apply this to a large corpus, we'll get back "words" like " " and " th" and, less frequently, "d and was not paralyz". What kind of distribution do these kinds of "words" have?&lt;br /&gt;
&lt;br /&gt;
Well, I coded up this experiment (available here: &lt;a href="https://github.com/JoFrhwld/zipf_by_vowels"&gt;https://github.com/JoFrhwld/zipf_by_vowels&lt;/a&gt;), comparing the ordinary whitespace segmentation of the Brown corpus into words to segmentations that use "a", "e", "i", "o", and "u" as delimiters. Here's the resulting log-log plot of the frequencies and ranks of the segmentations.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-elktGw8gWC8/UBA_McHx3CI/AAAAAAAABII/57CDIy9JDE4/s1600/zipf_by_seg.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="363" src="http://2.bp.blogspot.com/-elktGw8gWC8/UBA_McHx3CI/AAAAAAAABII/57CDIy9JDE4/s400/zipf_by_seg.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
It all looks quite Zipfian. So are not only the characters between spaces, but also the characters between any arbitrary delimiter, subject to a rich-get-richer process? Keep in mind that while the definition of "word" as characters between spaces is relatable to representations in human cognition, the definition of "word" as characters between arbitrary delimiters is not, especially given English's occasionally idiosyncratic orthography.&lt;br /&gt;
&lt;br /&gt;
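The actual code for the experiment is in the repository linked above; what follows is only a minimal, hypothetical Python sketch of the same idea, not that code. `segment` treats either whitespace or an arbitrary character such as "e" as the word boundary (lowercasing, collapsing whitespace, and dropping empty segments are assumptions of this sketch), `rank_frequency` sorts the resulting counts, and `loglog_slope` fits a least-squares line to log frequency against log rank. A slope close to -1 is the classic Zipfian signature, so running these over a large corpus such as Brown, once with `delimiter=None` and once with each vowel, is one way to put numbers on the plot.

```python
import math
from collections import Counter


def segment(text, delimiter=None):
    """Split text into "words".

    With delimiter=None, words are runs of non-whitespace characters (the
    ordinary definition). Otherwise the delimiter string (e.g. "e") marks
    word boundaries and spaces count as ordinary characters. Lowercasing,
    collapsing whitespace runs to single spaces, and dropping empty segments
    are assumptions of this sketch.
    """
    normalized = " ".join(text.lower().split())
    if delimiter is None:
        return normalized.split()
    return [seg for seg in normalized.split(delimiter) if seg]


def rank_frequency(segments):
    """Frequencies of the distinct segments, from most to least frequent."""
    return [freq for _, freq in Counter(segments).most_common()]


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 is the classic Zipfian signature.
    """
    if len(freqs) in (0, 1):
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


if __name__ == "__main__":
    sample = "The end of the day was not the end of the story."
    print(segment(sample))       # ordinary whitespace words
    print(segment(sample, "e"))  # "words" delimited by the letter e
```

On a one-sentence sample the slope is meaningless; the point of the sketch is only to show how little changes between the two definitions of "word" as far as the counting goes.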
Maybe it's possible for the results of my little experiment to be parasitic on a larger rich-get-richer process operating over normal words, but for now I'm dubious.</description><link>http://val-systems.blogspot.com/2012/07/dont-worry-im-physicist.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/-elktGw8gWC8/UBA_McHx3CI/AAAAAAAABII/57CDIy9JDE4/s72-c/zipf_by_seg.png" height="72" width="72" /><thr:total>2</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-2231812523086984026</guid><pubDate>Tue, 10 Jul 2012 16:51:00 +0000</pubDate><atom:updated>2012-07-10T12:51:34.540-04:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">visualization</category><category domain="http://www.blogger.com/atom/ns#">jags</category><category domain="http://www.blogger.com/atom/ns#">R</category><title>Visualizing Graphical Models</title><description>I'm anticipating presenting research of mine based on Bayesian graphical models to an audience that might not be familiar with them. When presenting ordinary regression results, there are already the sorts of statistical sniper questions along the lines of "What if the effect is actually being driven by this other correlate?" and "That effect might result from assumptions a, b, and c of the test," etc. Sometimes these questions are useful, but sometimes they seem to detract from the substantive issues at hand. And frequently, I see talks get way too bogged down in anticipating questions like this, cramming too much statistical detail into the talk and leaving too little time to do justice to the theoretical importance of the results.&lt;br /&gt;
&lt;br /&gt;
Add to this the customizability of graphical models, the number of possible distributions and parameter settings, and the notion that "Bayesian" = &amp;nbsp;"subjective", and I'm really feeling stressed out by the presentational task ahead of me.&lt;br /&gt;
&lt;br /&gt;
So, I'm trying to figure out a good way to present the model I've built that makes it fully available and accessible to someone who can't read JAGS code, has a little bit of presentational pizzazz, and also allows me to focus in on the parameters of specific interest. I started off trying to use &lt;a href="http://www.graphviz.org/" target="_blank"&gt;Graphviz&lt;/a&gt; to produce directed graphs, and wound up with this (an actual level in the model I'm hoping to present).&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-QxGh5r_7ma4/T_xZNx2Bx_I/AAAAAAAABHw/W_xQam_YGos/s1600/MyGraph10.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="302" src="http://3.bp.blogspot.com/-QxGh5r_7ma4/T_xZNx2Bx_I/AAAAAAAABHw/W_xQam_YGos/s400/MyGraph10.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
It's all a ton of spaghetti, it's difficult to highlight the particular parameters of interest, and it doesn't represent some important distinctions (like the one between stochastic and deterministic nodes).&lt;br /&gt;
&lt;br /&gt;
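One way to get the stochastic/deterministic distinction into a Graphviz diagram is to encode the node type in the node style. Below is a minimal Python sketch that writes DOT text for a hypothetical model fragment. The node names (`beta`, `sigma`, `x`, `mu`, `y`) and the styling convention (solid ellipses for stochastic nodes, dashed ellipses for deterministic ones, shaded ellipses for observed data) are assumptions for illustration, not the model from this post and not a Graphviz standard.

```python
# Hypothetical styling convention (not a Graphviz standard).
NODE_STYLES = {
    "stochastic": "shape=ellipse, style=solid",
    "deterministic": "shape=ellipse, style=dashed",
    "observed": "shape=ellipse, style=filled, fillcolor=lightgray",
}


def to_dot(nodes, edges, name="model"):
    """Render a model's structure as Graphviz DOT text.

    nodes maps each node name to "stochastic", "deterministic", or
    "observed"; edges is a list of (parent, child) pairs.
    """
    lines = [f"digraph {name} {{"]
    for node, kind in nodes.items():
        lines.append(f'  "{node}" [{NODE_STYLES[kind]}];')
    for parent, child in edges:
        lines.append(f'  "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical regression-style fragment: mu = beta * x, y ~ Normal(mu, sigma)
    nodes = {
        "beta": "stochastic",
        "sigma": "stochastic",
        "x": "observed",
        "mu": "deterministic",
        "y": "observed",
    }
    edges = [("beta", "mu"), ("x", "mu"), ("mu", "y"), ("sigma", "y")]
    print(to_dot(nodes, edges))
```

Saving the printed output as `model.dot` and running `dot -Tpng model.dot -o model.png` should render it. This does nothing about the spaghetti, but it does make the node types visible at a glance.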
I've moved on from Graphviz to trying to build an interactive tree diagram using the &lt;a href="http://thejit.org/" target="_blank"&gt;JavaScript InfoVis Toolkit&lt;/a&gt;. It's been kind of slow going, since I don't know any JavaScript, and am still trying to sort out which functions are basic and which ones are defined by the toolkit. Click on the image below to visit the visualization.&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://jofrhwld.github.com/misc/models_viz/model.html" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="400" src="http://4.bp.blogspot.com/-7UzV4e0nHrE/T_xaLPqAlGI/AAAAAAAABH4/gANe88Is-w0/s400/Screen+Shot+2012-07-10+at+12.08.27+PM.png" width="398" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
It's getting there, but I'm not convinced yet that it'll do the job of making the whole model digestible. For one, I'm modeling effects at a few different levels. The token level is represented in this visualization, but I'm also looking at speaker-level effects, treating the linguistic context as a within-speaker variable, and at word-level effects. The way I'm setting things up now, that's going to call for two more trees like this one.&lt;br /&gt;
&lt;br /&gt;
Maybe the lesson here is that I should just fit and present a simpler model, but remember those sniper questions? I'm worried that if I leave out someone's favorite correlate, 1) I'll have to deal with it in the questions and 2) they'll leave unconvinced, or rather, they'll leave convinced that it was their favorite correlate doing the work all along. But these are really research anxieties that no visualization toolkit on earth could assuage.</description><link>http://val-systems.blogspot.com/2012/07/visualizing-graphical-models.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-QxGh5r_7ma4/T_xZNx2Bx_I/AAAAAAAABHw/W_xQam_YGos/s72-c/MyGraph10.png" height="72" width="72" /><thr:total>7</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-368740487353992735</guid><pubDate>Sun, 08 Jul 2012 16:25:00 +0000</pubDate><atom:updated>2012-07-08T15:44:45.241-04:00</atom:updated><title>On "Welcome to the Internet"</title><description>I interrupt the regularly scheduled (linguistics/data/stats) programming to bring you a special message about a topic that has been really bothering me. This blog is my primary venue for writing publicly about anything, so even though &lt;a href="http://www.kickstarter.com/projects/566429325/tropes-vs-women-in-video-games" target="_blank"&gt;Anita Sarkeesian's project on Tropes vs. Women in Video Games&lt;/a&gt; doesn't fit into any of my usual topics, I'm going to write about it here.&lt;br /&gt;
&lt;hr /&gt;
I think most people will have heard about what's going on here. Anita Sarkeesian puts together an excellent video series called &lt;a href="http://www.feministfrequency.com/" target="_blank"&gt;Feminist Frequency&lt;/a&gt;, which offers accessible feminist critiques of movies, TV shows, etc. She set up a Kickstarter project to help fund research and production of a new video series called &lt;a href="http://www.kickstarter.com/projects/566429325/tropes-vs-women-in-video-games" target="_blank"&gt;Tropes vs. Women in Video Games&lt;/a&gt;. The project was a great success, raising over 26 times its original goal, but the backlash from people on the internet has been really vile. You can look over a summary of links about the issue &lt;a href="http://www.feministfrequency.com/2012/06/kickstarter-project-funded-with-6967-backers/" target="_blank"&gt;here&lt;/a&gt;.&lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
I'm not writing about how vile I think the backlash is. Instead I'm writing about how much some people's reactions to the backlash have bothered me. I've read some of these online, and had them come up in conversation. They fall into a few categories.&lt;/div&gt;
&lt;br /&gt;
&lt;h2&gt;"We can disagree without being disagreeable"&lt;/h2&gt;
I have not heard one person in a respectable forum defend the backlash against Sarkeesian. However, I have heard a lot of "you might disagree with what she says, but you can do so in a civil manner." But at this moment, nobody can disagree with what Sarkeesian says, because she has not, in fact, said it yet. The whole backlash is not against what she said about misogyny in video games, but rather against her stated intention to say &lt;i&gt;anything&lt;/i&gt; about misogyny in video games. What we are looking at is simply unvarnished hatred, and its exponents cannot make pretensions to having intellectual differences of opinion. That would require careful consideration of Sarkeesian's points, which, again, is impossible, because she hasn't even had the opportunity to put them forward yet.
&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;"Welcome to the internet"&lt;/h2&gt;
I've heard more than one person say "welcome to the internet" about the harassment Sarkeesian is experiencing, as if what is happening to her just happens to everybody. A porn bot following you on Twitter is a "welcome to the internet" moment. A spam comment on your blog including links to purportedly cheap Viagra is a "welcome to the internet" moment. What we're observing with this backlash is not a "welcome to the internet" moment.
&lt;br /&gt;
&lt;br /&gt;
Even if we limit the discussion to the trolling comments on her blog and YouTube pages, the magnitude and intensity of the comments are already far beyond the average person's experience. And as &lt;a href="http://www.illdoctrine.com/2012/06/why_you_should_feed_the_trolls.html"&gt;Jay Smooth pointed out&lt;/a&gt;, it's also the case that members of marginalized groups tend to have a much worse experience with trolling like this. So this isn't just your plain-vanilla internet; it's one that is especially bad for people who are already marginalized IRL.&lt;br /&gt;
&lt;br /&gt;
But we can't really limit the discussion to high-volume trollish comments. We also have to bring in &lt;a href="http://www.feministfrequency.com/2012/06/harassment-and-misogyny-via-wikipedia/" target="_blank"&gt;the vandalism of her Wikipedia page&lt;/a&gt;, which included adding a lot of porn. &lt;a href="http://www.feministfrequency.com/2012/07/image-based-harassment-and-visual-misogyny/" target="_blank"&gt;We also have to bring in the meme-ification of her image with the goal of attacking her in specifically sexual ways&lt;/a&gt;. We need to bring in the fact that people are sending her explicit threats of rape and violence. And we also need to bring in the creation of &lt;a href="http://www.escapistmagazine.com/news/view/118310-Flash-Game-Makes-Players-Beat-Up-Tropes-vs-Women-Creator" target="_blank"&gt;a Flash game that invited the player to beat Sarkeesian's face in&lt;/a&gt;. This last one is especially disturbing to me, because I've been reading a lot of guys talking about how much they want to hit her. To quote YouTuber MundaneMatt (linked &lt;a href="http://www.youtube.com/watch?v=5-H2fsz8vgE&amp;feature=player_detailpage#t=41s" rel="nofollow" target="_blank"&gt;here&lt;/a&gt; just to provide substantiating evidence; I wouldn't advise visiting it):&lt;br /&gt;
&lt;blockquote&gt;
She's got those eyes that make you just want to punch her in the face.
&lt;/blockquote&gt;
And to quote a user's review on Destructoid of the flash game (I'm not even linking to it this time):
&lt;blockquote&gt;
The voice acting isn’t the best at riling up the player, especially as her videos do this quickly anyway. 
&lt;/blockquote&gt;
We are far, far outside the realm of "welcome to the internet" and deep into the very dark, very real topic of silencing women with rape and violence.&lt;br /&gt;
&lt;br /&gt;
And of course, there's the internet vigilantism. Her site has been DDoS-ed, there have been attempted hacks of her e-mail and various social network accounts, and she's been dox-ed (her personal address and telephone number posted online). This is the kind of treatment reserved for people dubbed villains by the internet. It is more than atypical; it is reserved specifically for the worst of the worst. By no means is it "welcome to the internet." And what did she do to deserve being treated like such a villain?&lt;br /&gt;
&lt;br /&gt;
I think it is justified, given the evidence, to say that what is happening to Anita Sarkeesian is uniquely bad, and it is happening to her because she is a woman.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;The Mos Eisley Gambit&lt;/h2&gt;
Closely related to "welcome to the internet" is the Mos Eisley Gambit, which simply states that on the internet at large (and in YouTube comments specifically) "you will never find a more wretched hive of scum and villainy." This becomes more and more believable the more you read about the Sarkeesian backlash.&lt;br /&gt;
&lt;br /&gt;
But, I'm sorry, don't a lot of the same people who deploy the Mos Eisley Gambit also have a lot to say about how the internet is the future of free and open discourse? Wasn't there a whole collective kumbaya moment just a few months ago where "the internet defeated SOPA"? Wasn't the whole SOPA thing a backlash against the possibility of government censorship? Isn't the goal of the backlash against Sarkeesian to censor her? You can't have it both ways. You can't go around hailing the internet as a revolutionary space for free communication (a human right, even) that must be protected at all costs, &lt;i&gt;and&lt;/i&gt; be so flip about what's happening to Sarkeesian.&lt;br /&gt;
&lt;br /&gt;
And what's more, the residents of this hive of scum and villainy don't &lt;i&gt;actually&lt;/i&gt; live in the internet. The trolls, vandals and harassers are not internet pixies; they are real, actual people. The images of Sarkeesian's likeness being raped by video game characters didn't just pop into existence of their own accord. A person (someone's next-door neighbor, son, brother) sat down, spent time drawing the damn thing, and e-mailed it to her. The hive of scum and villainy is actually the real world we're all living in, and it's just reflected in the internet. Trolls are people too, and that's exactly the problem. You don't get away from the racist YouTube commenters by going outside; you ride the bus with them. Which is why, I think, hateful trolling is a worthwhile thing to worry about. It's not just about silly things that happen on the internet. It's about the attitudes and actions of real people who we all interact with every day.</description><link>http://val-systems.blogspot.com/2012/07/on-welcome-to-internet_08.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><thr:total>1</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-8766877106131637183</guid><pubDate>Thu, 05 Jul 2012 00:02:00 +0000</pubDate><atom:updated>2012-07-12T23:11:57.447-04:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">linguistics</category><category domain="http://www.blogger.com/atom/ns#">question</category><title>Question: Work on -ly-less adverbs</title><description>I think I'm going to ask general information-gathering questions that I have about linguistics research here on my blog, rather than as Facebook or Twitter posts. Then, I can add the answers I get back to the post.
&lt;br /&gt;
&lt;hr /&gt;
What research is there on -ly-less adverbs? I think the most common one that comes up is "personal," as in
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Don't take it personal.&lt;/li&gt;
&lt;/ul&gt;
Here are two more real life examples (the second one I heard just today, hence the question):
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;I go to South Jersey occasional.&lt;/li&gt;
&lt;li&gt;I need a cigarette desperate.&lt;/li&gt;
&lt;/ul&gt;
I have some vague intuitions about restrictions on the -ly-less forms. Specifically, I think they're only possible post-verbally, so
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;*I personal took it.&lt;/li&gt;
&lt;/ul&gt;
And I doubt we'd ever see it with a sentential adverb, like
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;*Hopeful, we'll find an answer.&lt;/li&gt;
&lt;li&gt;*We'll find an answer, hopeful.&lt;/li&gt;
&lt;/ul&gt;
But then, I don't really trust my intuitions, because I would have also rejected the "occasional" and "desperate" sentences above, which I heard come out of real people's mouths.
&lt;br /&gt;
&lt;br /&gt;
So, anyone know of any research on the topic?

&lt;br /&gt;
&lt;hr /&gt;
&lt;b&gt;Update&lt;/b&gt;&lt;br /&gt;
People came through for me! First, Mercedes Durham pointed me in the right direction on Twitter.
&lt;br /&gt;
&lt;blockquote class="twitter-tweet" data-in-reply-to="220670204047405059"&gt;
@&lt;a href="https://twitter.com/JoFrhwld"&gt;JoFrhwld&lt;/a&gt; Hoping I'm not the 10th to say this: Tagliamonte and Ito in JSoc 2002 looked at it in York UK + Opdahl's 2 books r mentioned there&lt;br /&gt;
— Mercedes D (@drswissmiss) &lt;a data-datetime="2012-07-05T06:35:56+00:00" href="https://twitter.com/drswissmiss/status/220767954671894528"&gt;July 5, 2012&lt;/a&gt;&lt;/blockquote&gt;
&lt;script charset="utf-8" src="//platform.twitter.com/widgets.js"&gt;
&lt;/script&gt;
The Tagliamonte and Ito paper provides a great introduction to the topic of &lt;i&gt;-ly~ø&lt;/i&gt; variation in adverbs. First, in the long view of history, the &lt;i&gt;-ly&lt;/i&gt; adverbs are the innovation creeping in, not the zero forms. Here's how I understand it worked. There used to be a morpheme &lt;i&gt;-lic&lt;/i&gt; that was used to create adjectives from nouns.&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;i&gt;friend + lic&lt;/i&gt;&lt;/li&gt;
&lt;li&gt;&lt;i&gt;man + lic&lt;/i&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt;
And there was a separate morpheme &lt;i&gt;-e&lt;/i&gt;&amp;nbsp;that created adverbs from adjectives.&lt;/div&gt;
&lt;div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;i&gt;direct + e&lt;/i&gt;&lt;/li&gt;
&lt;li&gt;&lt;i&gt;open + e&lt;/i&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt;
Sometimes you'd get them stacking on top of each other&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;i&gt;friend + lic + e&lt;/i&gt;&lt;/li&gt;
&lt;li&gt;&lt;i&gt;man + lic + e&lt;/i&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt;
And sometimes you'd wind up with the -&lt;i&gt;lic+e&lt;/i&gt;&amp;nbsp;morphemes coming together and behaving like one morpheme that turns adjectives into adverbs.&lt;/div&gt;
&lt;div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;i&gt;sweet + lice&lt;/i&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt;
This part sounds similar to a more modern situation. We have a morpheme &lt;i&gt;-ate&lt;/i&gt;&amp;nbsp;that turns nouns into verbs.&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;i&gt;assassin + ate&lt;/i&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt;
And a morpheme &lt;i&gt;-ion&lt;/i&gt;&amp;nbsp;that turns verbs into nouns, which sometimes stacks on top of &lt;i&gt;-ate.&lt;/i&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;i&gt;delete + ion&lt;/i&gt;&lt;/li&gt;
&lt;li&gt;&lt;i&gt;assassin + ate + ion&lt;/i&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt;
But sometimes, we get &lt;i&gt;-ation&lt;/i&gt;&amp;nbsp;coming together and acting like one morpheme that turns verbs into nouns.&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;i&gt;cause + ation &lt;/i&gt;(&lt;i&gt;*causate&lt;/i&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt;
Anyway, back to Old English. At some point the little &lt;i&gt;-e&lt;/i&gt;&amp;nbsp;morpheme that turned adjectives into adverbs got lost (probably as part of a larger language change that dropped a lot of word final unstressed &lt;i&gt;e&lt;/i&gt;'s). At that point, adjectives and derived adverbs just all sounded the same. That is, derived adverbs were all zero forms. But then, the fused form &lt;i&gt;-lice&lt;/i&gt;&amp;nbsp;started being used to make adverbs in more places than it used to be, and it eventually changed in pronunciation to modern day &lt;i&gt;-ly.&lt;/i&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
On these historical issues, a lot of ink has been spilled including&lt;a href="http://www.amazon.com/zero-suffix-variation-dual-form-present-day/dp/0820443999" target="_blank"&gt; a whole two volume series on just this case of variation in adverb formation&lt;/a&gt;, and a few &lt;a href="http://books.google.com/books?hl=en&amp;amp;lr=&amp;amp;id=YEF85sWeeEwC&amp;amp;oi=fnd&amp;amp;pg=PA145&amp;amp;dq=english+adverb+formation+variation&amp;amp;ots=2fSYPvYIPK&amp;amp;sig=aTS1WLZ7M4h-5sVcwg7cRLlMe6E#v=onepage&amp;amp;q=english%20adverb%20formation%20variation&amp;amp;f=false" target="_blank"&gt;book&lt;/a&gt; &lt;a href="http://books.google.com/books?hl=en&amp;amp;lr=&amp;amp;id=25yCVi0AA4cC&amp;amp;oi=fnd&amp;amp;pg=PA243&amp;amp;dq=english+adverb+formation+variation&amp;amp;ots=vxC2OEVW-F&amp;amp;sig=zPGTLXh85WtqklRWO8cT8BcTo-c#v=onepage&amp;amp;q=english%20adverb%20formation%20variation&amp;amp;f=false" target="_blank"&gt;chapters&lt;/a&gt;.&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Tagliamonte &amp;amp; Ito also provide a lot of cool examples from other studies, like these ones from&amp;nbsp;Appalachian and Ozark English (Christian, Wolfram &amp;amp; Dube, 1988).&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;I come from Virginia original.&lt;/li&gt;
&lt;li&gt;It certain was some reason.&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt;
Their own study was on a large corpus of speech from York, England. After treating &lt;i&gt;really&lt;/i&gt;&amp;nbsp;separately (they argued the patterns in &lt;i&gt;really&lt;/i&gt;&amp;nbsp;had more to do with its use as a special intensifier and less to do with adverb formation), they found basically no age effects, but working-class men strongly favored the zero form compared to everyone else.&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
As for language-internal effects, they completely excluded preverbal adverbs as being invariantly &lt;i&gt;-ly&lt;/i&gt;&amp;nbsp;forms (per my intuition, but not per that one example above from Appalachian English). After that, they found that the concreteness of the verb had the strongest effect, with concrete verbs favoring the zero form a lot more than abstract verbs.&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
I noticed that both of the examples above that I found interesting enough to make a mental note of involve abstract verbs with zero-form adverbs. Maybe the fact that abstract verbs disfavor zero forms is what made them jump out at me.&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;a href="https://files.nyu.edu/abs472/public/" target="_blank"&gt;Allison Shapp&lt;/a&gt; pointed me to &lt;a href="https://files.nyu.edu/abs472/public/2011_ShappBlake_NWAV40.pdf" target="_blank"&gt;work she's doing on&amp;nbsp;&lt;i&gt;-ly~ø &lt;/i&gt;variation in American English&lt;/a&gt;, and specifically (if I understood the poster right) African American English. They've found a big effect of education, with more highly educated speakers favoring the &lt;i&gt;-ly&lt;/i&gt;&amp;nbsp;form, and they've found that African American speakers, who are likely to be speakers of African American English, favor the zero form.&lt;br /&gt;
&lt;br /&gt;
So! That was a fruitful information gathering adventure! This is a really cool variable!&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2012/07/question-work-on-ly-less-adjectives.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><thr:total>1</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-2496418349076493536</guid><pubDate>Wed, 20 Jun 2012 22:12:00 +0000</pubDate><atom:updated>2012-06-22T13:21:05.568-04:00</atom:updated><title>Have you been in a Wawa's?</title><description>I would like to pre-empt all discussion here by saying that this blog post is strictly motivated by linguistics, and has no relevance to the US presidential election.&lt;br /&gt;
&lt;br /&gt;
This video of Mitt Romney speaking about his experience in a Wawa has been floating around my newsfeed this week.&lt;br /&gt;
&lt;br /&gt;
&lt;iframe allowfullscreen="" frameborder="0" height="360" src="http://www.youtube.com/embed/Lq5W3gxWLxw?rel=0" width="480"&gt;&lt;/iframe&gt;
&lt;br /&gt;
&lt;i&gt;Full Disclosure: Wawa is my convenience market of choice.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
What strikes me most about this video is the fact that Romney repeatedly said "Wawa's" even though the name of the store is just "Wawa" with no "s." &amp;nbsp;First, there's the linguistic issue of why this was such a natural mistake for Romney to make. Second, there's the sociolinguistic issue about why this particular mistake seems so egregious.&lt;br /&gt;
&lt;br /&gt;
On the first point, there is clearly a strong tendency for store names to be formed in the&amp;nbsp;possessive, indicating their&amp;nbsp;ownership (or at least that's the origin). For example, "Macy's" was founded by  Rowland Hussey Macy, "Wanamaker's" was founded by John Wanamaker, etc. However, not all stores which have names clearly formed in the genitive follow the ordinary orthographic rules for&amp;nbsp;possessives. &amp;nbsp;For example, "Starbucks" is named after the Moby Dick character Starbuck, but the official name doesn't have an apostrophe. Similarly, JCPenney, which today isn't formed in the genitive, used to go by "Penneys" according to this logo from Wikipedia, also lacking the apostrophe.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://upload.wikimedia.org/wikipedia/commons/a/a9/J._C._Penney_Old_Logo.svg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://upload.wikimedia.org/wikipedia/commons/a/a9/J._C._Penney_Old_Logo.svg" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
Perhaps this is some kind of specialized "commercial genitive," I don't know.&lt;br /&gt;
&lt;br /&gt;
At the same time, there are a lot of store names which are &lt;i&gt;not&lt;/i&gt;&amp;nbsp;formed in the genitive for some reason or another. One example someone brought up to me is "Nordstrom," which has no "s" even though it was founded by John Nordstrom. It's a little mysterious to me why this might be, except its original name was "Wallin &amp;amp; Nordstrom" (as in Carl Wallin and John Nordstrom), and coordination structures wreak havoc on everything. A similar story could be told for "Barnes &amp;amp; Noble." &amp;nbsp;In fact, law firms, the one kind of business I know of that is regularly named after coordinated personal names (like "Dewey, Cheatum &amp;amp; Howe"), seem never to be formed in the genitive. There's also &lt;a href="http://www.linguism.co.uk/language/where-do-you-shop"&gt;this blog post from Linguism&lt;/a&gt;&amp;nbsp;which discusses the question of which store names get formed in the genitive and which don't, and he concludes that store names which were originally acronyms, like "Asda" and "Tesco," and foreign imports, like "Aldi," are less likely to be in the genitive.&lt;br /&gt;
&lt;br /&gt;
At the same time, there is also a lot of&amp;nbsp;asymmetric&amp;nbsp;variation. People seem to be likely to form an officially non-genitive store name in the genitive, but not vice versa. How many of you would blink if someone said "I went shopping at Aldi's."? But no one would say "I went into a Starbuck."&lt;br /&gt;
&lt;br /&gt;
&lt;div style="border:1px solid #eee5dd; padding:5px;"&gt;
Update: Ben Zimmer informed me on Twitter that the "Friendly Ice Cream" company officially changed their name to "Friendly's" perhaps because that's what all their customers called them anyway.
&lt;blockquote class="twitter-tweet tw-align-center" data-in-reply-to="215998005201993729"&gt;&lt;p&gt;@&lt;a href="https://twitter.com/GPHemsley"&gt;GPHemsley&lt;/a&gt; @&lt;a href="https://twitter.com/JoFrhwld"&gt;JoFrhwld&lt;/a&gt; @&lt;a href="https://twitter.com/LiteralMinded"&gt;LiteralMinded&lt;/a&gt; Friendly changed to Friendly's and Church Chicken to Church's to match common customer (mis)perceptions.&lt;/p&gt;&amp;mdash; Ben Zimmer (@bgzimmer) &lt;a href="https://twitter.com/bgzimmer/status/216000153604194304" data-datetime="2012-06-22T02:50:24+00:00"&gt;June 22, 2012&lt;/a&gt;&lt;/blockquote&gt;
&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;
&lt;blockquote class="twitter-tweet tw-align-center" data-in-reply-to="216004476564283393"&gt;&lt;p&gt;@&lt;a href="https://twitter.com/JoFrhwld"&gt;JoFrhwld&lt;/a&gt; @&lt;a href="https://twitter.com/gphemsley"&gt;gphemsley&lt;/a&gt; @&lt;a href="https://twitter.com/literalminded"&gt;literalminded&lt;/a&gt; Hmm, I may be wrong about Church's Chicken: &lt;a href="http://t.co/PyLzY6Kj" title="http://bit.ly/aPks92"&gt;bit.ly/aPks92&lt;/a&gt; But Friendly -&amp;gt; Friendly's is legit.&lt;/p&gt;&amp;mdash; Ben Zimmer (@bgzimmer) &lt;a href="https://twitter.com/bgzimmer/status/216006921482797059" data-datetime="2012-06-22T03:17:17+00:00"&gt;June 22, 2012&lt;/a&gt;&lt;/blockquote&gt;
&lt;script src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;
According to &lt;a href="http://www.boston.com/business/gallery/newfriendly/"&gt;this slideshow from the Boston Globe&lt;/a&gt;, the name change happened in 1989.

&lt;/div&gt;&lt;br /&gt;

The point of all this is that Romney was wading into very muddy linguistic waters when he started talking about Wawa, and it's not surprising he screwed it up.&lt;br /&gt;
&lt;br /&gt;
Which brings us to the second point: Why was saying "Wawa's" such a big deal? I just said that I wouldn't blink an eye if someone said "Aldi's" and that's basically the same kind of error. But, and I'm trying to speak here as a Philadelphian and Wawa devotee, not as a partisan hack, when I heard him say "Wawa's" my reaction was "Oh, he doesn't know how it works."&lt;br /&gt;
&lt;br /&gt;
In some ways, my reaction was similar to how I feel when someone screws up the correct use of determiners in proper names. For example, if someone said to me "I looked it up on the Wikipedia," I'd immediately know they were&amp;nbsp;uninitiated&amp;nbsp;to the internet. Similarly, if someone said "&lt;a href="http://www.youtube.com/watch?v=r00IjBdp-ZA"&gt;they were uninitiated to Internet&lt;/a&gt;," I'd immediately know they were hopelessly ignorant.&lt;br /&gt;
&lt;br /&gt;
I think what it comes down to is that where there is variation, there is complexity, and where there is complexity, the ability to successfully navigate complexity the right way is an important social signal that you are the right kind of person. Consider, for example, the needlessly complex language surrounding Twitter, and the communal paroxysm of self satisfaction when a politician says "I sent out a twitter to my followers," or refers to the service as "Tweeter."&lt;br /&gt;
&lt;br /&gt;
I don't think that reaction, or the reaction to Romney saying "Wawa's," is fundamentally different from the dirty word in linguistics: prescriptivism. A lot of prescriptivism is specific&amp;nbsp;discrimination&amp;nbsp;against politically, economically and socially marginalized people, but a lot of it also comes out of nowhere, and just turns into a really complex game that people play for the sake of showing they can play it. So be cautious, fellow linguists, because today's "Wawa's" and "Tweeters" are tomorrow's split&amp;nbsp;infinitives and passive voice.</description><link>http://val-systems.blogspot.com/2012/06/have-you-been-in-wawas.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://img.youtube.com/vi/Lq5W3gxWLxw/default.jpg" height="72" width="72" /><thr:total>6</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-1224619271219417790</guid><pubDate>Mon, 18 Jun 2012 21:21:00 +0000</pubDate><atom:updated>2012-06-19T10:40:03.040-04:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">ggplot2</category><category domain="http://www.blogger.com/atom/ns#">R</category><title>Overplotting solution for black-and-white graphics</title><description>I'm working on producing some black and white graphics of data which has a lot of overplotting. There are three basic groups, which if I made the plot in ordinary full color ggplot2 would look like this (the code for the reverse-log x-axis is available in &lt;a href="https://gist.github.com/2266961"&gt;this gist&lt;/a&gt;, and the code for stat_ellipse() is available in &lt;a href="https://github.com/JoFrhwld/FAAV/blob/master/r/stat-ellipse.R"&gt;this github repository&lt;/a&gt;).&lt;br /&gt;
&lt;script src="https://gist.github.com/2950692.js?file=color.R"&gt;
&lt;/script&gt;

&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-C1wH5zqefoo/T9-Yl45-bdI/AAAAAAAABFo/oLOknGxZuKw/s1600/color.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="298" src="http://3.bp.blogspot.com/-C1wH5zqefoo/T9-Yl45-bdI/AAAAAAAABFo/oLOknGxZuKw/s400/color.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
For a black and white image, however, it's trickier. I don't usually find grey color scales to be sufficiently different for a plot like this, so I'd go for different point shapes. Unfortunately, the default shape scale in ggplot2 isn't very distinct in this case.
&lt;script src="https://gist.github.com/2950692.js?file=bw_default.R"&gt;
&lt;/script&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-AojYaT2k3zw/T9-ZgfvmbzI/AAAAAAAABF0/9L_Z2DffJlA/s1600/default.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="312" src="http://1.bp.blogspot.com/-AojYaT2k3zw/T9-ZgfvmbzI/AAAAAAAABF0/9L_Z2DffJlA/s400/default.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
My first strategy to improve things was to add a custom shape scale, with alternating empty vs solid point shapes.
&lt;script src="https://gist.github.com/2950692.js?file=bw_shape1.R"&gt;
&lt;/script&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-wqqYFyMyg8o/T9-Z8K--ZLI/AAAAAAAABGA/f_iETh2mEKc/s1600/not.so.good.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="313" src="http://4.bp.blogspot.com/-wqqYFyMyg8o/T9-Z8K--ZLI/AAAAAAAABGA/f_iETh2mEKc/s400/not.so.good.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
Better, but not great. All the overplotting of the empty point shapes creates this awful indiscriminate mash in the middle of the clusters.&lt;br /&gt;
&lt;br /&gt;
My solution to this problem was to use filled points. While point shapes 1 and 5 in R correspond to an empty circle and an empty diamond, respectively, point shapes 21 and 23 correspond to a filled circle and a filled diamond, respectively, where the fill color and the border color can be different. So, I used shapes 21 and 23 instead of 1 and 5, and set the fill color to be white.
&lt;script src="https://gist.github.com/2950692.js?file=bw_shape2.R"&gt;
&lt;/script&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-OjcyfRR3D-8/T9-bPume6eI/AAAAAAAABGM/hBlWkwn9BOo/s1600/fig8.ellipse.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="313" src="http://2.bp.blogspot.com/-OjcyfRR3D-8/T9-bPume6eI/AAAAAAAABGM/hBlWkwn9BOo/s400/fig8.ellipse.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
I think it's a big improvement. Here's one more iteration, filling the points with a light grey shade instead of white, just for some aesthetic appeal.
&lt;script src="https://gist.github.com/2950692.js?file=bw_shape3.R"&gt;
&lt;/script&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-NmtClfT5j5M/T9-bnm1-L4I/AAAAAAAABGY/wF0dTJNj1CE/s1600/grey.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="298" src="http://2.bp.blogspot.com/-NmtClfT5j5M/T9-bnm1-L4I/AAAAAAAABGY/wF0dTJNj1CE/s400/grey.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2012/06/overplotting-solution-for-black-and.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-C1wH5zqefoo/T9-Yl45-bdI/AAAAAAAABFo/oLOknGxZuKw/s72-c/color.png" height="72" width="72" /><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-90723469880243078</guid><pubDate>Thu, 17 May 2012 16:44:00 +0000</pubDate><atom:updated>2012-05-17T13:11:27.507-04:00</atom:updated><title>On calculating exponents</title><description>In my post on the decline effect in linguistics, the question came up of how I've calculated the exponents for the Exponential Model in my papers. I think this is a point worth clarifying, but it's not likely to be interesting to a broad audience. You have been forewarned.&lt;br /&gt;
&lt;br /&gt;
To recap as briefly as possible, in English, when a word ends in a consonant cluster which itself ends in /t/ or /d/, that /t/ or /d/ is sometimes deleted. This deletion can affect a whole host of different words, but the ones which have been of most interest to the field are the regular&amp;nbsp;&lt;b&gt;past&lt;/b&gt;&amp;nbsp;tense (e.g., &lt;i&gt;packed&lt;/i&gt;), the &lt;b&gt;semiweak&lt;/b&gt;&amp;nbsp;past tense (e.g., &lt;i&gt;kept&lt;/i&gt;) and morphologically simplex words (e.g., &lt;i&gt;pact&lt;/i&gt;), which I'll call &lt;b&gt;mono&lt;/b&gt;. Other morphological cases which can be affected, and which I believe have occasionally and erroneously been categorized with the semiweak, are no-change past tense (e.g., &lt;i&gt;cost&lt;/i&gt;), "devoicing" (or something) past tense (e.g., &lt;i&gt;built&lt;/i&gt;), stem-changing past tense (e.g., &lt;i&gt;found&lt;/i&gt;), etc. For the sake of this post, I'm only looking at the main three cases: &lt;b&gt;past&lt;/b&gt;, &lt;b&gt;semiweak&lt;/b&gt;, and &lt;b&gt;mono&lt;/b&gt;.&lt;br /&gt;
&lt;br /&gt;
Now, Guy (1991) came up with a specific proposal where if you described the proportion of pronounced /t d/ for &lt;b&gt;past&lt;/b&gt;&amp;nbsp;as &lt;i&gt;p&lt;/i&gt;, for &lt;b&gt;semiweak&lt;/b&gt;&amp;nbsp;as &lt;i&gt;p&lt;sup&gt;j&lt;/sup&gt;&lt;/i&gt;&lt;b style="font-style: italic;"&gt;&amp;nbsp;&lt;/b&gt;and for &lt;b&gt;mono&lt;/b&gt;&amp;nbsp;as &lt;i&gt;p&lt;sup&gt;k&lt;/sup&gt;&lt;/i&gt;, then &lt;i&gt;j&lt;/i&gt;= 2, and &lt;i&gt;k&lt;/i&gt;&amp;nbsp;= 3. It is specifically whether or not&amp;nbsp;&amp;nbsp;&lt;i&gt;j&lt;/i&gt;= 2 and&amp;nbsp;&lt;i&gt;k&lt;/i&gt;&amp;nbsp;= 3 that I'm interested in here. If you've calculated the proportions of pronounced /t d/ for each grammatical class, you can calculate &lt;i&gt;j&lt;/i&gt;&amp;nbsp;by &lt;sup&gt;log(semiweak)&lt;/sup&gt;⁄&lt;sub&gt;log(past)&lt;/sub&gt; and &lt;i&gt;k&lt;/i&gt;&amp;nbsp;by &lt;sup&gt;log(mono)&lt;/sup&gt;⁄&lt;sub&gt;log(past)&lt;/sub&gt;. The trick is in how you decide to calculate those proportions.&lt;br /&gt;
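As a quick sanity check on that arithmetic, here is a minimal Python sketch. The post's own code is in R, so this is only an illustration: the function name guy_exponents is hypothetical, and the proportions in the example are made-up values chosen so that semiweak = past^2 and mono = past^3.

```python
import math


def guy_exponents(past, semiweak, mono):
    """Return (j, k) under Guy's (1991) exponential model.

    Each argument is the proportion of word-final /t d/ that is pronounced
    in that grammatical class. If semiweak == past**j and mono == past**k,
    then j = log(semiweak) / log(past) and k = log(mono) / log(past).
    """
    for p in (past, semiweak, mono):
        # log(0) is undefined, and past == 1 would make log(past) == 0
        if not (1 > p > 0):
            raise ValueError("proportions must lie strictly between 0 and 1")
    log_past = math.log(past)
    return math.log(semiweak) / log_past, math.log(mono) / log_past


# Hypothetical proportions: 0.64 == 0.8**2 and 0.512 == 0.8**3
j, k = guy_exponents(0.8, 0.64, 0.512)
print(round(j, 3), round(k, 3))  # 2.0 3.0
```

All of the difficulty lives in where the three input proportions come from; the exponents themselves are just ratios of logs.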
&lt;br /&gt;
For this post, you can play along at home. Here's code to get set up. It'll load the Buckeye data I've been using, and do some data prep.&lt;br /&gt;
&lt;br /&gt;
&lt;script src="https://gist.github.com/2719770.js?file=load_buckeye.R"&gt;
&lt;/script&gt;
&lt;br /&gt;
So, how do you calculate the rate at which /t d/ are pronounced at the end of the word when you have a big data set from many different speakers? Traditional practice within sociolinguistics has been to just pool all of the observations from each grammatical class across all speakers.&lt;br /&gt;
&lt;script src="https://gist.github.com/2719770.js?file=pool_all_mean.R"&gt;
&lt;/script&gt;&lt;br /&gt;
So you come out with&lt;i&gt;&amp;nbsp;j&lt;/i&gt;&amp;nbsp;= 1.91, &lt;i&gt;k&lt;/i&gt;&amp;nbsp;=&amp;nbsp;3.1, which is a &amp;nbsp;pretty good fit to the proposal of Guy (1991).&lt;br /&gt;
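The pooled calculation amounts to counting every token in a grammatical class together, regardless of who said it or which word it was. A minimal Python sketch of that idea, assuming each observation is a (speaker, word, class, pronounced) tuple; the function name is hypothetical and the rows are invented for illustration, not taken from the Buckeye data.

```python
import math
from collections import defaultdict


def pooled_proportions(rows):
    """Proportion of pronounced /t d/ per grammatical class, pooling all tokens.

    rows: iterable of (speaker, word, gram, pronounced), pronounced in {0, 1}.
    Every token gets one "vote", so frequent words and talkative speakers
    weigh more heavily.
    """
    kept = defaultdict(int)
    total = defaultdict(int)
    for _speaker, _word, gram, pronounced in rows:
        kept[gram] += pronounced
        total[gram] += 1
    return {gram: kept[gram] / total[gram] for gram in total}


# Invented toy observations, for illustration only
rows = [
    ("s1", "packed", "past", 1), ("s1", "packed", "past", 1),
    ("s1", "packed", "past", 1), ("s1", "missed", "past", 0),
    ("s2", "missed", "past", 1),
    ("s1", "kept", "semiweak", 1), ("s1", "kept", "semiweak", 0),
    ("s2", "kept", "semiweak", 0), ("s2", "left", "semiweak", 1),
    ("s1", "just", "mono", 0), ("s1", "just", "mono", 0),
    ("s1", "just", "mono", 0), ("s2", "pact", "mono", 1),
]

props = pooled_proportions(rows)
j = math.log(props["semiweak"]) / math.log(props["past"])
k = math.log(props["mono"]) / math.log(props["past"])
print(props)  # {'past': 0.8, 'semiweak': 0.5, 'mono': 0.25}
```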
&lt;br /&gt;
The problem is that this isn't really the best way to calculate proportions like this. There are some words which are super frequent, and they therefore get more "votes" in the proportion of their grammatical class. And, some speakers talk more than others, and they get more "votes" towards making the overall proportions look more similar to their own. One approach to ameliorate this is to first calculate the proportion for each word within a grammatical class within a speaker, then average those word proportions to get each speaker's proportion for each grammatical class, and finally average across speakers to get the proportion for each grammatical class. Here's the code for this nested proportion approach.&lt;br /&gt;
&lt;script src="https://gist.github.com/2719770.js?file=nested_mean.R"&gt;
&lt;/script&gt;&lt;br /&gt;
All of a sudden, we're down to &lt;i&gt;j&lt;/i&gt;&amp;nbsp;= 1.34 and &lt;i&gt;k&lt;/i&gt;&amp;nbsp;= 2.05, and I haven't even dipped into mixed-effects models black magic yet.&lt;br /&gt;
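A minimal Python sketch of the three-step nested averaging described above. The actual implementation is the R gist; the function name and the toy rows here are hypothetical, and observations are assumed to be (speaker, word, class, pronounced) tuples.

```python
from collections import defaultdict
from statistics import fmean


def nested_proportions(rows):
    """Proportion of pronounced /t d/ per grammatical class, averaged in stages.

    rows: iterable of (speaker, word, gram, pronounced), pronounced in {0, 1}.
    """
    # Step 1: proportion for each word, within a class, within a speaker
    tokens = defaultdict(list)
    for speaker, word, gram, pronounced in rows:
        tokens[(speaker, gram, word)].append(pronounced)

    # Step 2: average the word proportions for each speaker and class
    by_speaker = defaultdict(list)
    for (speaker, gram, _word), values in tokens.items():
        by_speaker[(speaker, gram)].append(fmean(values))

    # Step 3: average the speaker proportions within each class
    by_gram = defaultdict(list)
    for (_speaker, gram), word_props in by_speaker.items():
        by_gram[gram].append(fmean(word_props))

    return {gram: fmean(speaker_props) for gram, speaker_props in by_gram.items()}


# Invented toy observations: speaker s1 says the frequent word "just" a lot
rows = [
    ("s1", "packed", "past", 1), ("s1", "missed", "past", 0),
    ("s2", "packed", "past", 1),
    ("s1", "kept", "semiweak", 1), ("s2", "left", "semiweak", 0),
    ("s1", "just", "mono", 0), ("s1", "just", "mono", 0),
    ("s1", "just", "mono", 0), ("s1", "just", "mono", 0),
    ("s1", "pact", "mono", 1), ("s2", "pact", "mono", 1),
]

print(nested_proportions(rows))  # {'past': 0.75, 'semiweak': 0.5, 'mono': 0.75}
```

Pooling these same rows would put mono at 2/6, about 0.33, because the four tokens of "just" outvote the two tokens of "pact". The nested average gives each word one vote per speaker and each speaker one vote per class.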
&lt;br /&gt;
But when it comes to modeling the proposal of Guy (1991), calculating the proportions is really just a means to an end. I &lt;a href="http://stats.stackexchange.com/questions/1261/fitting-a-fixed-exponential-relationship-between-categories-with-categorical-da"&gt;asked Cross Validated&lt;/a&gt;&amp;nbsp;how to directly model &lt;i&gt;j&lt;/i&gt;&amp;nbsp;and &lt;i&gt;k&lt;/i&gt;, and apparently you can do so using a complementary log-log link. So here is the mixed-effects model that estimates &lt;i&gt;j&lt;/i&gt;&amp;nbsp;and &lt;i&gt;k&lt;/i&gt;&amp;nbsp;directly.&lt;br /&gt;
&lt;script src="https://gist.github.com/2719770.js?file=exponents_model1.R"&gt;
&lt;/script&gt;
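Why a complementary log-log link recovers the exponents directly, as a short derivation. It assumes the response is coded as deletion (1 = /t d/ deleted) and that past is the reference level of the grammatical-class factor; both are assumptions about the model setup, not details stated in the post. Write p_g for the retention rate in class g and d_g = 1 - p_g for the deletion rate:

```latex
\operatorname{cloglog}(d_g) = \log\bigl(-\log(1 - d_g)\bigr) = \log(-\log p_g)

p_{\text{semiweak}} = p_{\text{past}}^{\,j}
\quad\Longrightarrow\quad
\operatorname{cloglog}(d_{\text{semiweak}}) = \log j + \log(-\log p_{\text{past}})

\eta_g = \beta_0 + \beta_{\text{semiweak}}\,\mathbf{1}[g = \text{semiweak}] + \beta_{\text{mono}}\,\mathbf{1}[g = \text{mono}]

\beta_0 = \log(-\log p_{\text{past}}), \qquad j = e^{\beta_{\text{semiweak}}}, \qquad k = e^{\beta_{\text{mono}}}
```

So the exponents are just the exponentiated class coefficients on the cloglog scale, and the same step with k in place of j gives the mono coefficient. If retention rather than deletion were coded as the response, the same logic would call for a log-log link, log(-log p), instead.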
&lt;br /&gt;
The model estimates look very similar to the nested proportions approach, &lt;i&gt;j&amp;nbsp;&lt;/i&gt;= 1.38, &lt;i&gt;k&amp;nbsp;&lt;/i&gt;=&amp;nbsp;2.11.&lt;br /&gt;
&lt;br /&gt;
What if we fit the model without the by-word random intercepts?&lt;br /&gt;
&lt;script src="https://gist.github.com/2719770.js?file=exponents_model2.R"&gt;
&lt;/script&gt;
&lt;br /&gt;
Now we're a bit closer back to the original pooled proportions estimates, &lt;i&gt;j&lt;/i&gt;&amp;nbsp;= 1.57, &lt;i&gt;k&amp;nbsp;&lt;/i&gt;= 3.19.&lt;br /&gt;
&lt;br /&gt;
My personal conclusion from all this is that the apparent &lt;i&gt;j&lt;/i&gt;&amp;nbsp;= 2, &lt;i&gt;k&lt;/i&gt;&amp;nbsp;= 3 pattern is driven mostly by the lexical effects of highly frequent words. This table recaps all of the results, plus the estimates of two more models: one with just a by-speaker random intercept, and a flat model with no random effects, whose estimates match the fully pooled approach exactly, because the pooled proportions are its maximum likelihood estimates.
&lt;center&gt;
&lt;table&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;&lt;th&gt;j&lt;/th&gt;&lt;th&gt;k&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pooled&lt;/td&gt;&lt;td&gt;1.91&lt;/td&gt;&lt;td&gt;3.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Nested&lt;/td&gt;&lt;td&gt;1.34&lt;/td&gt;&lt;td&gt;2.05&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;~Gram+(Gram|Speaker)+(1|Word)&lt;/td&gt;&lt;td&gt;1.38&lt;/td&gt;&lt;td&gt;2.11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;~Gram+(Gram|Speaker)&lt;/td&gt;&lt;td&gt;1.57&lt;/td&gt;&lt;td&gt;3.19&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;~Gram+(1|Speaker)&lt;/td&gt;&lt;td&gt;1.84&lt;/td&gt;&lt;td&gt;3.14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;~Gram&lt;/td&gt;&lt;td&gt;1.91&lt;/td&gt;&lt;td&gt;3.1&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;/center&gt;
&lt;br /&gt;
The lesson is that it can matter a lot how you calculate your proportions.</description><link>http://val-systems.blogspot.com/2012/05/on-calculating-exponents.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><thr:total>2</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-1873334258499392994</guid><pubDate>Wed, 16 May 2012 15:03:00 +0000</pubDate><atom:updated>2012-05-16T15:45:36.625-04:00</atom:updated><title>Decline Effect in Linguistics?</title><description>It seems to me that in the past few years, the empirical foundations of the social sciences, especially Psychology, have been coming under increased scrutiny and criticism. For example, there was the New Yorker piece from 2010 called "&lt;a href="http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer"&gt;The Truth Wears Off&lt;/a&gt;" about the "decline effect," or how the effect size of a phenomenon appears to decrease over time. More recently, the Chronicle of Higher Education had a blog post called "&lt;a href="http://chronicle.com/blogs/percolator/is-psychology-about-to-come-undone/29045"&gt;Is Psychology About to Come Undone?&lt;/a&gt;" about the failure to replicate some psychological results.&lt;br /&gt;
&lt;br /&gt;
These kinds of stories are concerning at two levels. At the personal level, researchers want to build a career and reputation around establishing new and reliable facts and principles. We definitely don't want the result that was such a nice feather in our cap to turn out to be wrong! At a more principled level, as scientists, our goal is for our models to approximate reality as closely as possible, and we don't want the course of human knowledge to be diverted down a dead end.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Small effects&lt;/h3&gt;
But, I'm a linguist. Do the problems facing psychology face me? To really answer that, I first have to decide which explanation for the decline effect I think is most likely, and I think &lt;a href="http://andrewgelman.com/2010/12/the_truth_wears/"&gt;Andrew Gelman's proposal&lt;/a&gt; is a good candidate:&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
The short story is that if you screen for statistical significance when estimating small effects, you will necessarily overestimate the magnitudes of effects, sometimes by a huge amount.&lt;/blockquote&gt;
&lt;br /&gt;
I've put together &lt;a href="https://gist.github.com/2710902#file_pvalues.r"&gt;some R code&lt;/a&gt; to demonstrate this point. Let's say I'm looking at two populations, and unknown to me as a researcher, there &lt;i&gt;is &lt;/i&gt;a small difference between the two, even though they're highly overlapping. Next, let's say I randomly sample 10 people from each population, do a t-test for the measurement I care about, and write down whether the p-value is &amp;lt; 0.05 and the estimated size of the difference between the two populations. Then I do this 1000 more times. Some proportion (approximately equal to the power of the test) of the t-tests will have&amp;nbsp;successfully identified a difference. But did those tests which found a significant difference also accurately estimate the &lt;i&gt;size&lt;/i&gt;&amp;nbsp;of the effect?&lt;br /&gt;
&lt;br /&gt;
For the purpose of the simulation, I randomly generated samples from two normal distributions, both with standard deviation 1, with means 1 and 1.1. I did this for a few different sample sizes, 1000 times each. This figure shows how many times larger the estimated effect size was than the true effect, for the tests which found a significant difference. The size of each point shows the probability of finding a&amp;nbsp;significant&amp;nbsp;difference for a sample of that size.&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-w-sgO7AaUIk/T7LOFYCS3UI/AAAAAAAABD4/YudxQk8UnUE/s1600/pvalue1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="315" src="http://2.bp.blogspot.com/-w-sgO7AaUIk/T7LOFYCS3UI/AAAAAAAABD4/YudxQk8UnUE/s400/pvalue1.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
So, we can see that for small sample sizes, the test has low power. That is, you are not very likely to find a significant difference, even though there &lt;i&gt;is&lt;/i&gt;&amp;nbsp;a true difference (i.e., you have a high rate of Type II error). Even worse, though, is that when the test has "worked," and found a significant difference when there is a true difference, you have both Type M (magnitude) and Type S (sign) errors. For small sample sizes (between 10 and 50 samples each from the two&amp;nbsp;populations), the estimated effect size is between 5 and 10 times greater than the real effect size, and the sign is sometimes flipped!&lt;br /&gt;
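A minimal Python sketch of this kind of simulation. It is not the R code linked above: to stay within the standard library it swaps the t-test for a two-sample z-test that treats the standard deviation (1) as known, and the function name simulate, the seed, and the sample sizes in the loop are illustrative choices. The means (1 and 1.1), the 1000 replications, and the comparison of significant estimates against the true difference follow the description above.

```python
import random
from statistics import NormalDist, fmean


def simulate(n, true_diff=0.1, sd=1.0, alpha=0.05, reps=1000, seed=1):
    """Repeat a two-sample z-test (sd treated as known) `reps` times.

    Each replication draws n values from Normal(1, sd) and n values from
    Normal(1 + true_diff, sd). Returns (power, exaggeration, sign_error_rate),
    where the last two are computed over the significant results only.
    """
    rng = random.Random(seed)
    se = sd * (2.0 / n) ** 0.5                   # standard error of the difference
    crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical z value
    significant = []
    for _ in range(reps):
        a = [rng.gauss(1.0, sd) for _ in range(n)]
        b = [rng.gauss(1.0 + true_diff, sd) for _ in range(n)]
        estimate = fmean(b) - fmean(a)
        if abs(estimate / se) > crit:
            significant.append(estimate)
    if not significant:
        return 0.0, float("nan"), float("nan")
    power = len(significant) / reps
    # Type M: how many times larger than the true difference, on average
    exaggeration = fmean(abs(e) / true_diff for e in significant)
    # Type S: share of significant estimates with the wrong (negative) sign
    sign_error_rate = sum(1 for e in significant if 0 > e) / len(significant)
    return power, exaggeration, sign_error_rate


for n in (10, 50, 500):
    power, ratio, sign_err = simulate(n)
    print(f"n = {n}: power {power:.2f}, {ratio:.1f}x too large, {sign_err:.0%} wrong sign")
```

Because significance requires the estimate to exceed roughly 1.96 × sqrt(2/n) in absolute value, every significant result at n = 10 is at least about 8.8 times the true 0.1 difference, whatever the seed. Passing alpha=0.01 lowers the power further but raises that floor rather than lowering it.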
&lt;br /&gt;
Just choosing a smaller p-value threshold doesn't fix this. It makes you less likely to conclude that you've found a significant difference when there is a true difference (i.e., you ramp up your Type II error rate by reducing the power of your test), and it doesn't do anything to ameliorate the size of the Type M errors when you &lt;i&gt;do&lt;/i&gt;&amp;nbsp;find a significant difference. This figure facets by different p-value&amp;nbsp;thresholds.&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-Mc3zv9PHqUM/T7LXowdE-MI/AAAAAAAABEE/wzF_qlCxo_U/s1600/pvalue2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="289" src="http://4.bp.blogspot.com/-Mc3zv9PHqUM/T7LXowdE-MI/AAAAAAAABEE/wzF_qlCxo_U/s640/pvalue2.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;h3&gt;So do I have to worry?&lt;/h3&gt;
So, I think how much I ought to worry about the decline effect in my research, and linguistic research in general, is inversely proportional to the size of the effects we're trying to chase down. If the true effects we're investigating are large, then our tests are more likely to be well powered, and we are less likely to experience Type M errors.&lt;br /&gt;
&lt;br /&gt;
And in general, I don't think the field has exhausted all of our sledgehammer effects. For example, Sprouse and Almeida (2012) [&lt;a href="http://www.socsci.uci.edu/~jsprouse/papers/sprouse.almeida.core.syntax.pdf"&gt;pdf&lt;/a&gt;]&amp;nbsp;successfully&amp;nbsp;replicated somewhere around 98% of the syntactic judgments from the syntax textbook &lt;i&gt;Core Syntax&lt;/i&gt;&amp;nbsp;(Adger 2003) using experimental methods (a pretty good replication rate if you ask me), and in general, &lt;a href="http://ling.auf.net/lingBuzz/001362"&gt;the estimated effect sizes were very large&lt;/a&gt;. So one thing seems clear. Sentence 1 is ungrammatical, and sentences 2 and 3 are grammatical.&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;*What did you see the man who bought?&lt;/li&gt;
&lt;li&gt;Who did you see who bought a cow?&lt;/li&gt;
&lt;li&gt;Who saw the man who bought a cow?&lt;/li&gt;
&lt;/ol&gt;
And the difference in acceptability between these sentences is &lt;i&gt;not&lt;/i&gt; getting smaller over time due to the decline effect. The explanatory theories for &lt;i&gt;why&lt;/i&gt;&amp;nbsp;sentence 1 isn't grammatical may change, and who knows, maybe the field will decide at some point that its ungrammaticality is no longer a fact that needs to be explained, but the fact that it is ungrammatical is not a moving target.
&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Maybe I do need to worry&lt;/h3&gt;
However, there is one phenomenon that I've looked at that I think has been following a decline effect pattern: the exponential pattern in /t d/ deletion. For reasons that I won't go into here, Guy (1991) proposed that if the rate at which a word-final /t/ or /d/ is pronounced in regular past tense forms like &lt;i&gt;packed&lt;/i&gt; is given as p, the rate at which it is pronounced in semi-irregular past tense forms like &lt;i&gt;kept&lt;/i&gt; is given as p&lt;sup&gt;&lt;i&gt;j&lt;/i&gt;&lt;/sup&gt;, and the rate at which it is pronounced in monomorphemic words like &lt;i&gt;pact&lt;/i&gt; is given as p&lt;sup&gt;&lt;i&gt;k&lt;/i&gt;&lt;/sup&gt;, then &lt;i&gt;j&lt;/i&gt; = 2 and &lt;i&gt;k&lt;/i&gt; = 3.
&lt;br /&gt;
&lt;br /&gt;
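To make the arithmetic concrete, here is a small R sketch of the model as described above. The 80% retention rate is a made-up number, and the function names are hypothetical helpers, not taken from any of the studies. Since p&lt;sup&gt;&lt;i&gt;j&lt;/i&gt;&lt;/sup&gt; = r implies &lt;i&gt;j&lt;/i&gt; = log(r)/log(p), the exponents can be recovered from observed retention rates.&lt;br /&gt;
&lt;pre&gt;
```r
# Guy's (1991) exponential model, as described above:
# retention in regular pasts (packed) = p, semi-weak pasts (kept) = p^j,
# monomorphemes (pact) = p^k, with predicted j = 2 and k = 3.
predicted_retention = function(p, j = 2, k = 3) {
  c(packed = p, kept = p^j, pact = p^k)
}

# Since p^j = r implies j = log(r) / log(p), the exponents can be
# recovered from observed retention rates.
estimate_exponents = function(r_packed, r_kept, r_pact) {
  c(j = log(r_kept) / log(r_packed),
    k = log(r_pact) / log(r_packed))
}

# Hypothetical example: 80% retention in regular past tense forms
rates = predicted_retention(0.8)
print(rates)  # kept = 0.64, pact = 0.512
est = estimate_exponents(rates[["packed"]], rates[["kept"]], rates[["pact"]])
print(est)    # j = 2, k = 3 (up to floating point error)
```
&lt;/pre&gt;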
Here's a table of studies, their estimates of &lt;i&gt;j&lt;/i&gt; and &lt;i&gt;k&lt;/i&gt;, and confidence intervals. In each cell, the bold number is the point estimate, with the upper and lower bounds of the confidence interval above and below it in small type. See &lt;a href="https://gist.github.com/2710902#file_td_deletion_estimates.r"&gt;this code&lt;/a&gt; for how I calculated the confidence intervals.
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;center&gt;
&lt;table rules="rows"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th&gt;Study&lt;/th&gt;&lt;th&gt;Year&lt;/th&gt;&lt;th&gt;Dialect&lt;/th&gt;&lt;th&gt;&lt;i&gt;j&lt;/i&gt;&lt;/th&gt;&lt;th&gt;&lt;i&gt;k&lt;/i&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Guy&lt;/td&gt;&lt;td&gt;1991&lt;/td&gt;&lt;td&gt;White Philadelphia&lt;/td&gt;&lt;td&gt;&lt;table&gt;
                                   &lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;4.74&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;b&gt;2.37&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;1.17&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;&lt;td&gt;&lt;table&gt;
                                   &lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;4.26&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;b&gt;2.75&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;1.86&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Santa Ana&lt;/td&gt;&lt;td&gt;1992&lt;/td&gt;&lt;td&gt;Chicano Los Angeles&lt;/td&gt;&lt;td&gt;&lt;table&gt;
                                   &lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;2.29&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;b&gt;1.76&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;1.35&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;&lt;td&gt;&lt;table&gt;
                                   &lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;3.39&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;b&gt;2.91&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;2.51&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bayley&lt;/td&gt;&lt;td&gt;1994&lt;/td&gt;&lt;td&gt;Tejano San Antonio&lt;/td&gt;&lt;td&gt;&lt;table&gt;
                                   &lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;2.08&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;b&gt;1.51&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;1.11&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;&lt;td&gt;&lt;table&gt;
                                   &lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;3.59&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;b&gt;2.99&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;2.52&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tagliamonte &amp;amp; Temple&lt;/td&gt;&lt;td&gt;2005&lt;/td&gt;&lt;td&gt;York, Northern England&lt;/td&gt;&lt;td&gt;&lt;table&gt;
                                   &lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;1.85&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;b&gt;1.12&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;0.66&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;&lt;td&gt;&lt;table&gt;
                                   &lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;1.96&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;b&gt;1.43&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;1.04&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Smith, Durham &amp;amp; Fortune&lt;/td&gt;&lt;td&gt;2009&lt;/td&gt;&lt;td&gt;Buckie, Scotland&lt;/td&gt;&lt;td&gt;&lt;table&gt;
                                   &lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;1.36&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;b&gt;0.64&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;0.24&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;&lt;td&gt;&lt;table&gt;
                                   &lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;3.59&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;b&gt;2.33&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;1.53&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fruehwald&lt;/td&gt;&lt;td&gt;2012&lt;/td&gt;&lt;td&gt;Columbus, OH&lt;/td&gt;&lt;td&gt;&lt;table&gt;
                                   &lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;2.48&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;b&gt;1.38&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;0.76&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;&lt;td&gt;&lt;table&gt;
                                   &lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;2.35&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;b&gt;1.93&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;1.59&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/center&gt;&lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
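For a sense of how intervals like these can be computed, here is one possible approach: a parametric (binomial) bootstrap of &lt;i&gt;j&lt;/i&gt; and &lt;i&gt;k&lt;/i&gt;. This is a sketch under assumptions, not necessarily what the linked code does, and the token counts are invented for illustration.&lt;br /&gt;
&lt;pre&gt;
```r
# Parametric bootstrap for j and k (one possible approach; may differ
# from the linked code). Token counts below are invented for illustration.
counts = data.frame(class = c("packed", "kept", "pact"),
                    retained = c(400, 150, 300),
                    total = c(500, 250, 600))
observed = counts$retained / counts$total

exponents = function(r) {
  # r = retention rates in the order packed, kept, pact
  c(j = log(r[2]) / log(r[1]), k = log(r[3]) / log(r[1]))
}
point = exponents(observed)

set.seed(2)
boot = replicate(5000, {
  # resample each class's retained count from a binomial distribution
  r = rbinom(3, size = counts$total, prob = observed) / counts$total
  exponents(r)
})

# 95% percentile intervals for j and k
ci = apply(boot, 1, quantile, probs = c(0.025, 0.975))
print(round(rbind(estimate = point, ci), 2))
```
&lt;/pre&gt;
Swapping in real retained/total counts for each class would give a rough interval for any one study.&lt;br /&gt;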
I should say right off the bat that none of these studies is a perfect replication of Guy's original study. They have different sample sizes, coding schemes, and statistical approaches. Mine, in the last row, is probably the most divergent, as I directly modeled &lt;i&gt;j&lt;/i&gt; and &lt;i&gt;k&lt;/i&gt; and estimated their reliability using a mixed-effects model, while the others calculated p&lt;sup&gt;&lt;i&gt;j&lt;/i&gt;&lt;/sup&gt; and p&lt;sup&gt;&lt;i&gt;k&lt;/i&gt;&lt;/sup&gt; and compared them to the maximum likelihood estimates for words like &lt;i&gt;kept&lt;/i&gt; and &lt;i&gt;pact&lt;/i&gt;.&lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
But needless to say, estimates of &lt;i&gt;j&lt;/i&gt;&amp;nbsp;and &lt;i&gt;k &lt;/i&gt;have not hovered nicely around 2 and 3.&amp;nbsp;&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2012/05/decline-effect-in-linguisics.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/-w-sgO7AaUIk/T7LOFYCS3UI/AAAAAAAABD4/YudxQk8UnUE/s72-c/pvalue1.png" height="72" width="72" /><thr:total>26</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-4469695046477443887</guid><pubDate>Thu, 19 Apr 2012 17:10:00 +0000</pubDate><atom:updated>2012-04-19T13:11:31.712-04:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">usage</category><category domain="http://www.blogger.com/atom/ns#">linguistics</category><category domain="http://www.blogger.com/atom/ns#">peeving</category><title>Come and see</title><description>Yesterday, as a pre-amble to an ordinary newsletter sent out via listserv to most PhD students at UPenn, we were offered this piece of advice:&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
Tip of the day: You should all know this by now: It is incorrect to say “come and see” or “come out and help”, or any other “come…and…” phrase.  It is an infinitive phrase: “Come to see”, “Come out to help”, “Come to have fun”.  Don’t aggravate anyone’s pet peeves; just write and say it correctly.  You’re welcome.&lt;/blockquote&gt;
Well, many of us linguistics graduate students felt this merited some kind of response. I don't know about other linguists out there, but if someone said this to me in a personal e-mail, or in conversation, I couldn't &lt;i&gt;not&lt;/i&gt;&amp;nbsp;respond.&lt;br /&gt;
&lt;br /&gt;
And then, an amazing thing happened. We started drafting a letter in a Google document with 16 contributors. It was a little chaotic, but we marshaled our intuitions, data, and arguments, and had drafted this message in about an hour.&lt;br /&gt;
&lt;blockquote&gt;
To whom it may concern:&lt;br /&gt;
&lt;br /&gt;
We were recently sent a grammar “tip” via the [&lt;i&gt;redacted&lt;/i&gt;] listserv which read:&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
Tip of the day: You should all know this by now: It is incorrect to say “come and see” or “come out and help”, or any other “come…and…” phrase.  It is an infinitive phrase: “Come to see”, “Come out to help”, “Come to have fun”.  Don’t aggravate anyone’s pet peeves; just write and say it correctly.  You’re welcome.&lt;/blockquote&gt;
The linguistics graduate students felt that this required a response, as in fact, the cited examples “come and see” and “come out and help” are both grammatical and widely used constructions in American English.&lt;br /&gt;
&lt;br /&gt;
The two constructions differ slightly in meaning. If one says,&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Mary came and saw Tupac’s hologram perform.&lt;/li&gt;
&lt;/ul&gt;
it must be the case that the performance actually occurred; it cannot be the case that there were technical difficulties and the performance was cancelled. However,&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Mary came to see Tupac’s hologram perform.&lt;/li&gt;
&lt;/ul&gt;
admits the possibility that the performance was cancelled due to technical difficulties. Therefore, asserting that the infinitive phrase is a uniformly appropriate replacement for the conjoined phrase is not an appropriate representation of the linguistic facts.&lt;br /&gt;
&lt;br /&gt;
Phrases like “come and see” are not restricted to the spoken idiom, but are also used in the written language. They even occur in texts considered by some to be canonical, as the following examples show:&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
He saith unto them, “Come and see”. (John 1:39, King James Bible)&amp;nbsp;&lt;/blockquote&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;blockquote class="tr_bq"&gt;
“Then you may come and see the picture”. (Merry Wives of Windsor II:II, William Shakespeare)&amp;nbsp;&lt;/blockquote&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;blockquote class="tr_bq"&gt;
“Will you come and see me?” (Pride &amp;amp; Prejudice, chap. 26, Jane Austen)&lt;/blockquote&gt;
Generally, grammatical prescriptivism contributes little to useful discourse, and may even cause intelligent language users to be unfairly stigmatized. Thus, while we appreciate [&lt;i&gt;redacted&lt;/i&gt;]'s light-hearted "tips-of-the-day," we would encourage authors to keep an open mind about the breadth of possible language use, especially in public forums.&lt;br /&gt;
&lt;br /&gt;
Sincerely,&lt;br /&gt;
&lt;br /&gt;
Jana Beck*&lt;br /&gt;
Claire Crawford*&lt;br /&gt;
[&lt;i&gt;redacted&lt;/i&gt;]*&lt;br /&gt;
Sabriya Fisher*&lt;br /&gt;
Aaron Freeman*&lt;br /&gt;
Lauren Friedman*&lt;br /&gt;
Josef Fruehwald*&lt;br /&gt;
Kyle Gorman*&lt;br /&gt;
Marielle Lerner*&lt;br /&gt;
Caitlin Light*&lt;br /&gt;
Laurel MacKenzie*&lt;br /&gt;
Brittany McLaughlin*&lt;br /&gt;
Hilary Prichard*&lt;br /&gt;
Kobey Shwayder*&lt;br /&gt;
Jon Stevens*&lt;br /&gt;
[&lt;i&gt;redacted&lt;/i&gt;]*&lt;br /&gt;
&lt;br /&gt;
*Department of Linguistics&lt;/blockquote&gt;
Thinking about it some more, I think at least the past tense "came to see" even has the implicature that either the seeing was unsuccessful, or there is some other more relevant event than the seeing which the speaker is about to tell us about.&lt;br /&gt;
&lt;br /&gt;
Anyway, I think we did a bang-up job, and produced a really excellent message, especially considering there were &lt;i&gt;16 authors&lt;/i&gt;!</description><link>http://val-systems.blogspot.com/2012/04/come-and-see.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-4742357495800098547</guid><pubDate>Sat, 14 Apr 2012 21:55:00 +0000</pubDate><atom:updated>2012-04-14T17:56:33.142-04:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">rstudio</category><category domain="http://www.blogger.com/atom/ns#">knitr</category><category domain="http://www.blogger.com/atom/ns#">R</category><title>Linguistic Notation Inside of R Plots!</title><description>So, I've been playing around with learning &lt;a href="http://yihui.name/knitr/"&gt;knitr&lt;/a&gt;, which is a Sweave-like R package for combining LaTeX and R code into one document. There's almost no learning curve if you already use Sweave, and I find much of knitr's design and usage nicer.&lt;br /&gt;
&lt;br /&gt;
I wasn't going to make a blog post or tutorial about knitr, because the documentation is already pretty good, and &lt;a href="http://yihui.name/knitr/demos"&gt;contains a lot of tutorials&lt;/a&gt;. &amp;nbsp;However, I've just had a major victory in incorporating linguistic notations into plots using knitr, and I just had to share. I'll show you the payoff first, and then include the details.&lt;br /&gt;
&lt;br /&gt;
First, I managed to successfully use IPA characters as plot symbols and legend keys.&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-5FBHWzS3sQY/T4nY-w7DZBI/AAAAAAAABCc/NC77OaiFR-A/s1600/unnamed-chunk-4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="250" src="http://3.bp.blogspot.com/-5FBHWzS3sQY/T4nY-w7DZBI/AAAAAAAABCc/NC77OaiFR-A/s400/unnamed-chunk-4.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
The actual data in the plot is on car fuel economy, but that's not the point. Look at that IPA!
&lt;br /&gt;
&lt;br /&gt;
Then, I tried to expand on the principles that got me the IPA, and look what I produced.&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-WTHze0rAtbs/T4nZmv8bwmI/AAAAAAAABCk/ipfdvF6jpRo/s1600/unnamed-chunk-5.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="250" src="http://3.bp.blogspot.com/-WTHze0rAtbs/T4nZmv8bwmI/AAAAAAAABCk/ipfdvF6jpRo/s400/unnamed-chunk-5.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
Yes, that &lt;em&gt;is&lt;/em&gt; a syntax tree overlaid on top of the plot. But why stop there when you could go completely crazy?
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-LsYMqRfgkVw/T4nbWslt-FI/AAAAAAAABCs/2u8kqVo4StM/s1600/unnamed-chunk-7.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="250" src="http://2.bp.blogspot.com/-LsYMqRfgkVw/T4nbWslt-FI/AAAAAAAABCs/2u8kqVo4StM/s400/unnamed-chunk-7.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;h3&gt;How to do it.&lt;/h3&gt;
The important thing about making these plots is that they were easy given my pre-existing knowledge of R, LaTeX, and what I've learned about knitr. The crucial element here is that knitr supports tikz graphics. I didn't know anything about tikz graphics before this, and I still don't, which means that even if you don't know anything about tikz graphics, you can still make plots like these.&lt;br /&gt;
&lt;br /&gt;
Like most linguists who use LaTeX, I already know how to include IPA characters and draw syntactic trees in a LaTeX document. It's as simple as&lt;br /&gt;
&lt;pre&gt;...
\usepackage{tipa}
\usepackage{qtree}
...
\textipa{D C P}
\Tree [.S NP VP ]
...
&lt;/pre&gt;
&lt;br /&gt;
What is so cool about the tikz device is that it lets you define these notations in LaTeX syntax, and then incorporates them into R graphs. Here are the important code chunks to include in your knitr document to make it all work.
&lt;br /&gt;
&lt;br /&gt;
&lt;h4&gt;

1 — Load the right R packages&lt;/h4&gt;
Early on, load the ggplot2 and tikzDevice R packages.
&lt;br /&gt;
&lt;div style="overflow: auto;"&gt;
&lt;div class="geshifilter"&gt;
&lt;pre class="r geshifilter-R" style="font-family: monospace;"&gt;&amp;lt;&amp;lt;&amp;gt;&amp;gt;=
    &lt;a href="http://inside-r.org/r-doc/base/library"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;library&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;a href="http://inside-r.org/packages/cran/ggplot2"&gt;ggplot2&lt;/a&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;
    &lt;a href="http://inside-r.org/r-doc/base/library"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;library&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;a href="http://inside-r.org/packages/cran/tikzDevice"&gt;tikzDevice&lt;/a&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;
@&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h4&gt;

2 — Define your LaTeX libraries&lt;/h4&gt;
Then, you need to tell the tikz device which LaTeX packages you want to use.
&lt;br /&gt;
&lt;div style="overflow: auto;"&gt;
&lt;div class="geshifilter"&gt;
&lt;pre class="r geshifilter-R" style="font-family: monospace;"&gt;&amp;lt;&amp;lt;&amp;gt;&amp;gt;=
    &lt;a href="http://inside-r.org/r-doc/base/options"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;options&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;tikzLatexPackages = &lt;a href="http://inside-r.org/r-doc/base/c"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;c&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;a href="http://inside-r.org/r-doc/base/getOption"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;getOption&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;span style="color: blue;"&gt;"tikzLatexPackages"&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt;
                                  &lt;span style="color: blue;"&gt;"&lt;span style="color: #000099; font-weight: bold;"&gt;\\&lt;/span&gt;usepackage{tipa}"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt;
                                  &lt;span style="color: blue;"&gt;"&lt;span style="color: #000099; font-weight: bold;"&gt;\\&lt;/span&gt;usepackage{qtree}"&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;
@&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h4&gt;

3 — Define the plotting elements in LaTeX&lt;/h4&gt;
We're done with the hard part. Now, it's as simple as faking up some data...

&lt;br /&gt;
&lt;div style="overflow: auto;"&gt;
&lt;div class="geshifilter"&gt;
&lt;pre class="r geshifilter-R" style="font-family: monospace;"&gt;&amp;lt;&amp;lt;&amp;gt;&amp;gt;=
    &lt;a href="http://inside-r.org/r-doc/base/levels"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;levels&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;mpg$drv&lt;span style="color: #009900;"&gt;)&lt;/span&gt; &amp;lt;- &lt;a href="http://inside-r.org/r-doc/base/c"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;c&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;span style="color: blue;"&gt;"&lt;span style="color: #000099; font-weight: bold;"&gt;\\&lt;/span&gt;textipa{D}"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt;
                         &lt;span style="color: blue;"&gt;"&lt;span style="color: #000099; font-weight: bold;"&gt;\\&lt;/span&gt;textipa{C}"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt;
                         &lt;span style="color: blue;"&gt;"&lt;span style="color: #000099; font-weight: bold;"&gt;\\&lt;/span&gt;textipa{P}"&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;
&amp;nbsp;
    mpg$tree &amp;lt;- &lt;span style="color: blue;"&gt;"{&lt;span style="color: #000099; font-weight: bold;"&gt;\\&lt;/span&gt;footnotesize &lt;span style="color: #000099; font-weight: bold;"&gt;\\&lt;/span&gt;Tree [.S NP VP ]}"&lt;/span&gt;
@&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h4&gt;

4 — Plot the data using the tikz device&lt;/h4&gt;
...and plotting it, using the tikz device.

&lt;br /&gt;
&lt;div style="overflow: auto;"&gt;
&lt;div class="geshifilter"&gt;
&lt;pre class="r geshifilter-R" style="font-family: monospace;"&gt;&amp;lt;&amp;lt;dev=&lt;span style="color: blue;"&gt;"tikz"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; fig.width=&lt;span style="color: #cc66cc;"&gt;8&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; fig.height=&lt;span style="color: #cc66cc;"&gt;5&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; out.width=&lt;span style="color: blue;"&gt;"0.9&lt;span style="color: #000099; font-weight: bold;"&gt;\\&lt;/span&gt;textwidth"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; fig.align=&lt;span style="color: blue;"&gt;"center"&lt;/span&gt;&amp;gt;&amp;gt;=
    &lt;a href="http://inside-r.org/packages/cran/ggplot"&gt;ggplot&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;mpg&lt;span style="color: #339933;"&gt;,&lt;/span&gt; aes&lt;span style="color: #009900;"&gt;(&lt;/span&gt;displ&lt;span style="color: #339933;"&gt;,&lt;/span&gt; hwy&lt;span style="color: #339933;"&gt;,&lt;/span&gt; label = drv&lt;span style="color: #339933;"&gt;,&lt;/span&gt; color = drv&lt;span style="color: #009900;"&gt;)&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt; + 
            geom_text&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt; + 
            stat_smooth&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;+
            xlab&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;span style="color: blue;"&gt;"&lt;span style="color: #000099; font-weight: bold;"&gt;\\&lt;/span&gt;textipa{IPA!}"&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;    
@&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
Or, in the case of the syntactic trees,



&lt;br /&gt;
&lt;div style="overflow: auto;"&gt;
&lt;div class="geshifilter"&gt;
&lt;pre class="r geshifilter-R" style="font-family: monospace;"&gt;&amp;lt;&amp;lt;dev=&lt;span style="color: blue;"&gt;"tikz"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; fig.width=&lt;span style="color: #cc66cc;"&gt;8&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; fig.height=&lt;span style="color: #cc66cc;"&gt;5&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; out.width=&lt;span style="color: blue;"&gt;"0.7&lt;span style="color: #000099; font-weight: bold;"&gt;\\&lt;/span&gt;textwidth"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; fig.align=&lt;span style="color: blue;"&gt;"center"&lt;/span&gt;&amp;gt;&amp;gt;=
    &lt;a href="http://inside-r.org/packages/cran/ggplot"&gt;ggplot&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;mpg&lt;span style="color: #339933;"&gt;,&lt;/span&gt; aes&lt;span style="color: #009900;"&gt;(&lt;/span&gt;displ&lt;span style="color: #339933;"&gt;,&lt;/span&gt; hwy&lt;span style="color: #339933;"&gt;,&lt;/span&gt; label = &lt;a href="http://inside-r.org/packages/cran/tree"&gt;tree&lt;/a&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;+
            geom_text&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt; + 
            stat_smooth&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;+
            xlab&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;span style="color: blue;"&gt;"TREES"&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;
@&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h4&gt;

5 — Compile the .Rnw to a .tex document&lt;/h4&gt;
Here's some source code to embed these plots in a beamer presentation.
&lt;script src="https://gist.github.com/2387823.js"&gt;
 
&lt;/script&gt;
To compile a .tex document from the .Rnw source, you can run
&lt;br /&gt;
&lt;div style="overflow: auto;"&gt;
&lt;div class="geshifilter"&gt;
&lt;pre class="r geshifilter-R" style="font-family: monospace;"&gt;&lt;a href="http://inside-r.org/r-doc/base/library"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;library&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;knitr&lt;span style="color: #009900;"&gt;)&lt;/span&gt;
knit&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;span style="color: blue;"&gt;"./ling-plot.Rnw"&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
Then, just compile the .tex document however your little heart desires.

&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;How to do it with one click&lt;/h3&gt;
As if this weren't awesome and easy enough already, you can compile the whole document in one click using RStudio, &lt;a href="http://yihui.name/knitr/demo/rstudio/"&gt;as outlined on this knitr page&lt;/a&gt;. You'll need to download &lt;a href="http://www.rstudio.org/download/preview"&gt;the development (i.e. not guaranteed to be stable) RStudio release&lt;/a&gt;, then set the compilation option to use knitr, and you're done!
&lt;br /&gt;
&lt;br /&gt;
I have to say that from a practical standpoint, I've found writing Sweave documents in RStudio to be a much better experience than what I was doing before, because I can run and debug the R code from within the .Rnw source document. No need to go flipping back and forth between a TeX editor and R.&lt;br /&gt;
&lt;br /&gt;
P.S. I highlighted the code above at &lt;a href="http://www.inside-r.org/pretty-r"&gt;http://www.inside-r.org/pretty-r&lt;/a&gt;</description><link>http://val-systems.blogspot.com/2012/04/linguistic-notation-inside-of-r-plots.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-5FBHWzS3sQY/T4nY-w7DZBI/AAAAAAAABCc/NC77OaiFR-A/s72-c/unnamed-chunk-4.png" height="72" width="72" /><thr:total>3</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-9161099933222740939</guid><pubDate>Sat, 31 Mar 2012 18:42:00 +0000</pubDate><atom:updated>2012-04-03T12:26:17.824-04:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">data</category><category domain="http://www.blogger.com/atom/ns#">R</category><category domain="http://www.blogger.com/atom/ns#">murders</category><title>More on Philadelphia Homicide</title><description>I've been doing more analysis of &lt;a href="https://www.google.com/fusiontables/DataSource?snapid=S4035208e94"&gt;the Philadelphia Homicide data&lt;/a&gt; that the Philadelphia Inquirer has published, and presented some of it at the Philadelphia UseR group yesterday. &lt;a href="http://jofrhwld.github.com/talks/UseR/murders/murders.pdf"&gt;My slides&lt;/a&gt; [pdf] and &lt;a href="https://raw.github.com/JoFrhwld/jofrhwld.github.com/master/talks/UseR/murders/murders.Rnw"&gt;source&lt;/a&gt; [knitr .Rnw] are on &lt;a href="https://github.com/JoFrhwld/jofrhwld.github.com/tree/master/talks/UseR/murders"&gt;github&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
I should be clear that I am &lt;i&gt;not&lt;/i&gt; an expert on crime and murder. In fact, I'm not even fairly knowledgeable. If anyone out there with more expertise has strong criticism of my "analysis" (really, it's just a rough exploration of the data), I'll eat it, and I'll look forward to your own analysis of the data (again, it's right &lt;a href="https://www.google.com/fusiontables/DataSource?snapid=S4035208e94"&gt;here&lt;/a&gt;). Here are some of the most striking patterns that I found.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Results&lt;/h2&gt;
First, here is the total number of murders that occurred over the past 23 years, broken down by the day of the week. The weekends are worse than the weekdays.&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-us-aOFwWbyo/T3S2lmGn6EI/AAAAAAAABBs/BI8-d_VNJ8I/s1600/weekday.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="250" src="http://4.bp.blogspot.com/-us-aOFwWbyo/T3S2lmGn6EI/AAAAAAAABBs/BI8-d_VNJ8I/s400/weekday.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;/div&gt;
&lt;br /&gt;
Next, here is the total number of murders by hour of the day. The hour of the day was not included in the data until 2006, so this only represents murders between 2006 and 2011. The plot is centered around midnight, so the afternoon of Day 1 is on the left, and the morning of Day 2 is on the right.&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-tcxuCyp2uEw/T3S3fAVWxEI/AAAAAAAABB0/5J1wa5EfVvM/s1600/hour.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="250" src="http://2.bp.blogspot.com/-tcxuCyp2uEw/T3S3fAVWxEI/AAAAAAAABB0/5J1wa5EfVvM/s400/hour.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
It looks like there's something weird going on around 11PM and midnight, which I have to chalk up to the reporting patterns of the Philadelphia Police Department (PPD). For some reason, murders which occurred in the midnight hour seem to be more likely to be logged as occurring at 11PM.&lt;br /&gt;
&lt;br /&gt;
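For anyone wanting to center an hour-of-day plot on midnight like this, one simple approach (a sketch with made-up hour values, not necessarily how the plot above was built) is to reorder the factor levels so that they run from noon through 11AM the next day.&lt;br /&gt;
&lt;pre&gt;
```r
# Made-up hour-of-day values (0 = midnight hour, 23 = 11PM hour)
hours = c(23, 0, 1, 14, 22, 23, 2, 12)

# Order the levels from noon (12) through 11AM the next day (11), so that
# midnight falls in the middle of the x-axis
centered_levels = c(12:23, 0:11)
hour_factor = factor(hours, levels = centered_levels)
print(table(hour_factor))
```
&lt;/pre&gt;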
Here is the most striking plot that I produced this time around. It plots, by month, the average frequency of murders. The y-axis represents 1 murder every X days.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-Je9F8U26UUM/T3YK-V7AWiI/AAAAAAAABB8/mzqwpH4BTGg/s1600/big-result.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="250" src="http://4.bp.blogspot.com/-Je9F8U26UUM/T3YK-V7AWiI/AAAAAAAABB8/mzqwpH4BTGg/s400/big-result.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
Since 1988, the African American community has been living in a Philadelphia with approximately a murder every day, or every other day. The White community, on the other hand, has been living in a Philadelphia with a murder once a week.&lt;br /&gt;
&lt;br /&gt;
I also did some meager statistical analysis, specifically Poisson regression with terms for the month (that is, January, February, etc., to look for a seasonal pattern), the race of the victim, and the weapon used. There was a significant month effect, but the coefficients didn't show much of a pattern. I &lt;i&gt;did&lt;/i&gt; use the number of days in the month as an offset in the regression, so it's not that. More importantly, there was an unsurprising main effect of race, but also a big interaction between race and weapon: African American victims were far more likely to be killed with a gun.&lt;br /&gt;
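The offset is what turns a Poisson model of counts into a model of rates: with a log link, log(expected murders) = log(days) + (linear predictor), so an exponentiated coefficient is a ratio of murders per day. In the simplest case, a single categorical predictor, the maximum-likelihood rate for each level is just total murders divided by total days, which makes the role of the offset visible without fitting a full GLM. A Python sketch with made-up counts (the model described above, fit in R, had several more terms; the helper name here is hypothetical):&lt;br /&gt;

```python
import math


def poisson_rates_with_offset(rows):
    """Maximum-likelihood daily rate per group for a Poisson model with a
    log link, one categorical predictor, and log(days) as an offset.

    rows: iterable of (group, n_murders, n_days). For this model the fitted
    rate for each group reduces to sum(n_murders) / sum(n_days).
    """
    totals = {}
    for group, count, days in rows:
        n, d = totals.get(group, (0, 0))
        totals[group] = (n + count, d + days)
    return {group: n / d for group, (n, d) in totals.items()}


# Made-up monthly counts (two 30-day months per group), not the Philadelphia data.
rows = [("A", 30, 30), ("A", 30, 30), ("B", 9, 30), ("B", 6, 30)]
rates = poisson_rates_with_offset(rows)
print(rates)  # {'A': 1.0, 'B': 0.25}

# With group A as the baseline, B's coefficient is the log of the rate ratio,
# so exponentiating it recovers "B has 0.25x A's daily rate".
beta_b = math.log(rates["B"] / rates["A"])
print(round(math.exp(beta_b), 6))  # 0.25
```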
&lt;br /&gt;
Guns and knives are the two most common weapons used in murders in the data. &lt;strike&gt;White murder victims are 2.54x more likely to have been shot than stabbed, while an African American murder victim is 7.19x more likely to have been shot than stabbed, meaning that &lt;b&gt;African American murder victims are 2.83x more likely to have been shot than a White murder victim was&lt;/b&gt;.&lt;/strike&gt;&lt;br /&gt;
&lt;b&gt;Update:&lt;/b&gt; There was a pretty serious flaw in my regression: if there was a month where, say, no African Americans were murdered with a knife (and there were plenty), that month's data was missing, rather than 0. After filling in the data to reflect months with 0 murders for a particular race x weapon combination, the estimates are pretty different. White murder victims are 5.71x more likely to have been murdered with a gun than with a knife, while African American murder victims are 8.62x more likely to have been murdered with a gun than with a knife, meaning that &lt;b&gt;African American murder victims are 1.51x more likely to have been shot than a White murder victim was&lt;/b&gt;. So, that's a pretty serious revision, roughly halving the multiplier (from 2.83x to 1.51x). I've already updated the linked code and slides.
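The fix described in the update amounts to completing the month x race x weapon grid before fitting. The Python sketch below uses made-up numbers and hypothetical helper names (the original code is R). It shows how dropping zero-count months inflates the apparent monthly rate of a sparse cell like knife murders, which in turn shrinks that group's gun-to-knife ratio:&lt;br /&gt;

```python
from itertools import product


def fill_zero_counts(counts, months, races, weapons):
    """Return a complete month x race x weapon table.

    `counts` maps (month, race, weapon) -> number of murders, but only for
    combinations that actually occurred. Every absent combination is added
    as an explicit 0 instead of being left out.
    """
    return {key: counts.get(key, 0) for key in product(months, races, weapons)}


def mean_monthly(counts, race, weapon):
    """Average monthly count for one race x weapon cell, over the rows present."""
    values = [n for (month, r, w), n in counts.items() if r == race and w == weapon]
    return sum(values) / len(values)


# Toy data (not the Philadelphia data): 4 gun murders every month,
# knife murders in only 2 of the 4 months.
months = ["Jan", "Feb", "Mar", "Apr"]
sparse = {(m, "A", "gun"): 4 for m in months}
sparse[("Jan", "A", "knife")] = 1
sparse[("Mar", "A", "knife")] = 1

full = fill_zero_counts(sparse, months, ["A"], ["gun", "knife"])
print(mean_monthly(sparse, "A", "knife"))  # 1.0 (zero months silently dropped)
print(mean_monthly(full, "A", "knife"))    # 0.5 (zero months counted)

# The ratio-of-ratios arithmetic behind the multipliers in the update.
print(round(8.62 / 5.71, 2))  # 1.51 (corrected)
print(round(7.19 / 2.54, 2))  # 2.83 (original, struck out)
```

In the toy data, the gun-to-knife ratio goes from 4 to 8 once the empty months are filled in, the same direction of change as in the corrected estimates.&lt;br /&gt;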
&lt;br /&gt;
So, gun deaths are an especially acute problem in the African American community. In fact, if you exclude gun deaths from the data, it actually looks like the racial disparity in murder rates has been &lt;i&gt;narrowing.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-q8ZEEHkWhTM/T3dIXKHjzjI/AAAAAAAABCE/mTSPwnpH76Y/s1600/noguns.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="250" src="http://4.bp.blogspot.com/-q8ZEEHkWhTM/T3dIXKHjzjI/AAAAAAAABCE/mTSPwnpH76Y/s400/noguns.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
It is purely coincidental that I'm posting this on the same day that &lt;a href="http://philadelphia.cbslocal.com/2012/03/30/another-philadelphia-gun-buyback-is-saturday-on-south-broad-street/"&gt;the Philadelphia Police Department is holding a gun buyback&lt;/a&gt;. You can bring in a gun and receive a $100 ShopRite voucher, no questions asked. It seems like a good initiative.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Analysis Discussion&lt;/h2&gt;
I spent a bit of time figuring out the most meaningful way to represent the murder rate. First, I calculated the murder frequency by counting how many murders (&lt;i&gt;n murders&lt;/i&gt;) there were in a month, then dividing that by the number of days in the month: (&lt;i&gt;n murders&lt;/i&gt;/&lt;i&gt;n days&lt;/i&gt;)=&lt;i&gt;murders per day&lt;/i&gt;. But the resulting measure had values like 0.14 &lt;i&gt;murders per day&lt;/i&gt;, which isn't very informative. What people want to know about murders, or at least what I want to know, is how often they happen, not how many happened in a given time window. So, instead, I calculated (&lt;i&gt;n days&lt;/i&gt;/&lt;i&gt;n murders&lt;/i&gt;)=&lt;i&gt;days per murder&lt;/i&gt;.&lt;br /&gt;
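Both measures are simple divisions once the number of days in each month is known. A minimal Python sketch (the function name and counts are hypothetical; calendar.monthrange from the standard library supplies month lengths, including leap-year Februaries):&lt;br /&gt;

```python
import calendar


def days_per_murder(year, month, n_murders):
    """Average number of days between murders for one month.

    Returns None for a month with no murders, where days per murder
    has no finite value.
    """
    n_days = calendar.monthrange(year, month)[1]  # number of days in that month
    if n_murders == 0:
        return None
    return n_days / n_murders


# Hypothetical counts for illustration.
print(days_per_murder(2011, 2, 4))   # 7.0: about one murder a week
print(days_per_murder(2011, 1, 31))  # 1.0: about one a day

# The same February expressed the other way round, as murders per day.
print(round(4 / calendar.monthrange(2011, 2)[1], 2))  # 0.14
```

The two numbers carry the same information; 0.14 murders per day and one murder every 7 days describe the same month.&lt;br /&gt;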
&lt;br /&gt;
The y-axis for the murder rate figures is also on a logarithmic scale, which is reasonable both given the distribution of the data and given how timescales are perceived. From a human perspective, the difference between 1 day and 2 days feels larger than the difference between 3 weeks and 4 weeks. The y-axis is also flipped, to indicate that smaller numbers mean "more often". I managed the reversed log transformation by writing my own coordinate transformation using the new &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;scales&lt;/span&gt; package. Here's the R code.&lt;br /&gt;
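Under the hood, a reversed log scale is just a transform paired with its inverse: positions are computed as -log(x), and tick labels are mapped back with the inverse. As a language-neutral illustration of that pair, here is a Python sketch (not the author's code; base 10 and the names revlog and revlog_inverse are illustrative choices), ahead of the R version:&lt;br /&gt;

```python
import math


def revlog(x):
    """Forward transform: -log10(x).

    Larger values (more days between murders) end up lower on the axis,
    so "more often" is toward the top.
    """
    return -math.log10(x)


def revlog_inverse(y):
    """Inverse transform, mapping axis positions back to days for tick labels."""
    return 10 ** (-y)


days = [1, 2, 7, 30]
positions = [revlog(d) for d in days]
print(positions == sorted(positions, reverse=True))     # True: the order is flipped
print([round(revlog_inverse(p), 9) for p in positions])  # [1.0, 2.0, 7.0, 30.0]

# On a log scale, 1 vs 2 days is a bigger gap than 3 vs 4 weeks.
print(revlog(1) - revlog(2) > revlog(21) - revlog(28))   # True
```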
&lt;script src="https://gist.github.com/2266961.js?file=revlog.R"&gt;
&lt;/script&gt;</description><link>http://val-systems.blogspot.com/2012/03/more-on-philadelphia-homicide.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-us-aOFwWbyo/T3S2lmGn6EI/AAAAAAAABBs/BI8-d_VNJ8I/s72-c/weekday.png" height="72" width="72" /><thr:total>7</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-8114860418705172894</guid><pubDate>Wed, 07 Mar 2012 20:35:00 +0000</pubDate><atom:updated>2012-03-09T13:41:37.630-05:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">R</category><title>Philadelphia Schools</title><description>I'm on spring break, and yesterday I took some time to check off some items on my to-do list, namely:&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Start getting acquainted with &lt;a href="http://cloud.github.com/downloads/hadley/ggplot2/guide-col.pdf"&gt;all the new features of ggplot2 [PDF]&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Get a handle on dealing with geographic data in R.&lt;/li&gt;
&lt;/ol&gt;
&lt;div&gt;
I've done &lt;a href="http://www.ling.upenn.edu/~joseff/papers/Fruehwald_IGERT2010_poster.pdf"&gt;some furtive geographic analysis using R [pdf]&lt;/a&gt;, but &lt;a href="https://github.com/JoFrhwld/Mapping-Scripts"&gt;the code behind it&lt;/a&gt; was very hacky. There is a whole field of geospatial data analysis out there that I was, and still am, really ignorant of, but I've made a little bit of progress.&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
I mostly followed &lt;a href="https://github.com/hadley/ggplot2/wiki/plotting-polygon-shapefiles"&gt;the tutorial laid out here&lt;/a&gt; for making maps in ggplot2. The most difficult part was getting the &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;rgdal&lt;/span&gt; package installed. It's one of those packages that relies on other, non-R libraries being installed. I managed to get &lt;a href="http://www.gdal.org/"&gt;GDAL&lt;/a&gt; and &lt;a href="http://trac.osgeo.org/proj/"&gt;Proj.4&lt;/a&gt; installed (even though I honestly don't know what they do), and got &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;rgdal&lt;/span&gt; installed (I had to work around an apparently non-standard installation location for Proj.4).&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Now, it's all about getting some good data, and fortunately, I stumbled across &lt;a href="http://opendataphilly.org/"&gt;opendataphilly.org&lt;/a&gt; yesterday as well! I found &lt;a href="http://www.opendataphilly.org/opendata/resource/38/schools/"&gt;a shapefile of all schools in Philadelphia&lt;/a&gt;, and a separate data set about &lt;a href="http://www.opendataphilly.org/opendata/resource/79/high-school-graduates-2004-2010/"&gt;how many public and charter high school graduates in 2010 went on to postsecondary education of various sorts&lt;/a&gt;. Unfortunately, there weren't any shared IDs of any sort between the two data sets, so I had to join them mostly by hand.&lt;/div&gt;
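Without shared IDs, one common first pass is to normalize the school names into a rough join key and leave whatever fails to match for checking by hand. This is a hypothetical Python sketch; the filler-word list and the example names are illustrative assumptions, not taken from either data set:&lt;br /&gt;

```python
import re

# Hypothetical filler words that might differ between the two data sets' names.
FILLER = {"high", "school", "hs", "the"}


def name_key(name):
    """Normalize a school name into a rough join key (lowercase, no filler words)."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(w for w in words if w not in FILLER)


def match_by_name(left, right):
    """Pair names from two lists whose keys agree.

    Returns (matches, unmatched_left); the unmatched ones still need a human.
    """
    right_by_key = {name_key(r): r for r in right}
    matches, unmatched = [], []
    for name in left:
        key = name_key(name)
        if key in right_by_key:
            matches.append((name, right_by_key[key]))
        else:
            unmatched.append(name)
    return matches, unmatched


shapefile_names = ["Northeast High School", "Frankford High School", "Masterman High School"]
graduate_names = ["NORTHEAST HS", "Frankford High", "J.R. Masterman"]
matches, leftover = match_by_name(shapefile_names, graduate_names)
print(matches)   # [('Northeast High School', 'NORTHEAST HS'), ('Frankford High School', 'Frankford High')]
print(leftover)  # ['Masterman High School']
```

The leftover list is where the hand-matching comes in: no simple rule reliably decides that "J.R. Masterman" and "Masterman High School" are the same school.&lt;br /&gt;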
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
So, here is the result.&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-hSn75S_7tyg/T1e4Nq1nrfI/AAAAAAAABAY/zlw3D-vPPAc/s1600/postsecondary.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="321" src="http://3.bp.blogspot.com/-hSn75S_7tyg/T1e4Nq1nrfI/AAAAAAAABAY/zlw3D-vPPAc/s400/postsecondary.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
I'm not sure what I expected to see, which certainly weakens any conclusions I'd like to draw, but I am surprised at how little geographic patterning there is. I'm also almost certain that there are some data reporting problems. For example, that huge dark blue dot in the Northeast is Northeast High School, which reports that of its 652 graduates, 0 went on to any postsecondary education. I just don't think that can be true, and not because I'm an idealist. Northeast is right down the street from where I grew up, and while it's not a fancy prep school by any means, it has both a magnet program and an International Baccalaureate program.&lt;br /&gt;
&lt;br /&gt;
There's no way that zero students from Northeast went on to postsecondary education, a category which includes non-degree-granting programs and specialized training programs. It's a lot more likely that they either didn't report the numbers or the Pennsylvania Department of Education lost them, and then &lt;a href="http://www.ling.upenn.edu/~joseff/rstudy/summer2010_general_data_structure.html#collection_principles"&gt;didn't distinguish between missing data and 0&lt;/a&gt;. Unfortunately, that calls every school reporting 0% postsecondary education into question, even though some schools probably &lt;i&gt;did&lt;/i&gt; have 0 students go on to further education.&lt;br /&gt;
&lt;br /&gt;
The distribution of the proportion of graduates going on to postsecondary education is strongly bimodal (at least for the public schools).&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-1BL3IteZDGg/T1fAKmAnwmI/AAAAAAAABAg/WIvndzFZtuI/s1600/dist1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="178" src="http://2.bp.blogspot.com/-1BL3IteZDGg/T1fAKmAnwmI/AAAAAAAABAg/WIvndzFZtuI/s400/dist1.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
Even after excluding the schools which reported 0 students going on to postsecondary education, there are still 3 schools with basically 0 students getting further education out of high school: Frankford (1/341), West Philly (1/208) and University City (2/205).&lt;br /&gt;
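Those shares, and the 1% cutoff used for the replotted map, are easy to check. A minimal Python sketch using the counts quoted in this post (the helper name is hypothetical; representing a missing report as None rather than 0 is an assumption about how one could keep the two apart, which the source data doesn't do):&lt;br /&gt;

```python
def postsecondary_share(n_going_on, n_graduates):
    """Proportion of graduates going on to postsecondary education.

    A missing report (None) stays None instead of being treated as 0.
    """
    if n_going_on is None or not n_graduates:
        return None
    return n_going_on / n_graduates


# (going on, graduates) for the schools mentioned above. Northeast's 0 is
# kept as reported, even though it may really be missing data.
reports = {
    "Northeast": (0, 652),
    "Frankford": (1, 341),
    "West Philly": (1, 208),
    "University City": (2, 205),
}
shares = {name: postsecondary_share(*r) for name, r in reports.items()}
print({name: round(100 * p, 2) for name, p in shares.items()})
# {'Northeast': 0.0, 'Frankford': 0.29, 'West Philly': 0.48, 'University City': 0.98}

kept = {name: p for name, p in shares.items() if p is not None and p >= 0.01}
print(kept)  # {}: all four fall below the 1% cutoff
```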
&lt;br /&gt;
Excluding the schools that reported less than 1% of students going on to further education (assuming either that they have faulty data or that they have acute problems of other sorts), I replotted the map (note that the colors now run from 50% to 100%).&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-Q5DEhLTLONQ/T1fFLyfgJSI/AAAAAAAABAo/6jsXtYSfXCk/s1600/postsecondary2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="322" src="http://4.bp.blogspot.com/-Q5DEhLTLONQ/T1fFLyfgJSI/AAAAAAAABAo/6jsXtYSfXCk/s400/postsecondary2.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
Still no huge geographic patterns.&lt;br /&gt;
&lt;br /&gt;
Here's the R code that I used (including links to the data).&lt;br /&gt;
&lt;script src="https://gist.github.com/1995932.js"&gt; &lt;/script&gt;</description><link>http://val-systems.blogspot.com/2012/03/philadelphia-schools.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-hSn75S_7tyg/T1e4Nq1nrfI/AAAAAAAABAY/zlw3D-vPPAc/s72-c/postsecondary.png" height="72" width="72" /><thr:total>2</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-8445020867449796729</guid><pubDate>Sun, 04 Mar 2012 06:20:00 +0000</pubDate><atom:updated>2012-03-09T13:57:48.115-05:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">R</category><title>My Pocket Change</title><description>I'm playing around with some personal data collection, and using some cloud computing to visualize it. Following &lt;a href="http://www.r-bloggers.com/embedding-a-live-plot-in-your-website/"&gt;the directions in this blog post&lt;/a&gt;, I've written an R function which visualizes data it draws from a Google Docs spreadsheet, and uploaded it to OpenCPU's servers. The plots you're seeing in this post were actually generated by OpenCPU when you loaded this page, meaning they're &lt;i&gt;live!&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;&lt;br /&gt;
So, I've been logging my pocket change daily. The first plot shows the cumulative growth of my change jar by 3 different measures: the raw number of each kind of coin, the total value contributed by each kind of coin, and the total mass contributed by each kind of coin (based on official data on how much each kind of coin &lt;i&gt;should&lt;/i&gt; weigh).&lt;br /&gt;
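Each curve in that first plot is a running total under one measure. A minimal Python sketch, assuming the daily log is stored as per-day coin counts (that layout, the helper name and the three days of data are hypothetical; the masses are the US Mint specifications, with 2.5 g being the post-1982 cent):&lt;br /&gt;

```python
from itertools import accumulate

# Face value (cents) and US Mint specified mass (grams) per coin.
COINS = {
    "penny": {"value": 1, "mass": 2.500},
    "nickel": {"value": 5, "mass": 5.000},
    "dime": {"value": 10, "mass": 2.268},
    "quarter": {"value": 25, "mass": 5.670},
}


def cumulative(daily_counts, coin, measure):
    """Running total for one coin under one measure.

    daily_counts: list of dicts, one per day, mapping coin name -> coins added.
    measure: "count", "value" (cents), or "mass" (grams).
    """
    per_coin = 1 if measure == "count" else COINS[coin][measure]
    return list(accumulate(day.get(coin, 0) * per_coin for day in daily_counts))


# Hypothetical three days of logging.
daily_log = [{"penny": 3, "quarter": 1}, {"dime": 2}, {"penny": 1, "quarter": 2}]
print(cumulative(daily_log, "penny", "count"))    # [3, 3, 4]
print(cumulative(daily_log, "quarter", "value"))  # [25, 25, 75]
```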
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://beta1.opencpu.org/R/call/store:tmp/4615487596a359629ea7f869bc0c8d19/png?&amp;amp;!width=1050&amp;amp;!height=375" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="227" src="http://beta1.opencpu.org/R/call/store:tmp/4615487596a359629ea7f869bc0c8d19/png?&amp;amp;!width=1050&amp;amp;!height=375" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
This plot shows the proportional contribution each coin makes to each measure. The first panel shows what percent of all my coins belong to each type, the second shows how much each kind of coin contributes proportionally to the overall value, and the third shows how much each kind of coin contributes to the overall mass.&lt;br /&gt;
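Each panel divides one coin type's total by the total across all coin types under that measure. The same specifications also give the weight carried per cent of value. A Python sketch with hypothetical jar contents (the function name and numbers are illustrative; the masses are US Mint specifications):&lt;br /&gt;

```python
# Face value (cents) and US Mint specified mass (grams); 2.5 g is the post-1982 cent.
COINS = {
    "penny": {"value": 1, "mass": 2.500},
    "nickel": {"value": 5, "mass": 5.000},
    "dime": {"value": 10, "mass": 2.268},
    "quarter": {"value": 25, "mass": 5.670},
}


def proportions(jar, measure):
    """Share of the jar each coin type accounts for under one measure.

    jar: coin name -> number of coins; measure: "count", "value", or "mass".
    """
    amounts = {
        coin: n * (1 if measure == "count" else COINS[coin][measure])
        for coin, n in jar.items()
    }
    total = sum(amounts.values())
    return {coin: amount / total for coin, amount in amounts.items()}


# Hypothetical jar contents, not the live data.
jar = {"penny": 40, "nickel": 10, "dime": 20, "quarter": 30}
for measure in ("count", "value", "mass"):
    print(measure, {coin: round(p, 3) for coin, p in proportions(jar, measure).items()})

# Grams carried per cent of value: a penny weighs about 11x as much per cent
# as a dime or a quarter.
grams_per_cent = {coin: spec["mass"] / spec["value"] for coin, spec in COINS.items()}
print(grams_per_cent)
```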
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://beta1.opencpu.org/R/call/store:tmp/4615487596a359629ea7f869bc0c8d19/png?y=%22p%22&amp;amp;!width=1050&amp;amp;!height=375" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="228" src="http://beta1.opencpu.org/R/call/store:tmp/4615487596a359629ea7f869bc0c8d19/png?y=%22p%22&amp;amp;!width=1050&amp;amp;!height=375" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
So, depending on how long I keep this habit up, if you keep checking in on this post, you'll see new plots every day.&lt;br /&gt;
&lt;br /&gt;
I have two primary motivations for logging my coins. First, the last time I cashed in all my change, someone asked me how long it took me to save it up, and I had no idea! Second, I'm curious to see how much effort I'm putting into carrying around relatively heavy coins, like pennies, for their small contribution to the overall value of my coin jar.</description><link>http://val-systems.blogspot.com/2012/03/my-pocket-change.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-8969700434770834612</guid><pubDate>Fri, 02 Mar 2012 06:54:00 +0000</pubDate><atom:updated>2012-03-09T13:58:06.514-05:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">R</category><title>A terrible 2000 words</title><description>I've only just started looking at the &lt;a href="https://www.google.com/fusiontables/DataSource?snapid=S4035208e94"&gt;homicide data&lt;/a&gt; &lt;a href="http://val-systems.blogspot.com/2012/02/update-on-inquirer-data.html"&gt;made available by the Philadelphia Inquirer&lt;/a&gt; in my free time (which is hard to come by lately). I've been thinking about what sorts of statistics I could do, or what kinds of additional data sets I could merge in, but I think these simple plots already tell a terrible story about what is happening to whom.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-vk6d1nfbxqw/T1Br6Uhh_AI/AAAAAAAAA_w/YTsajL4AVL4/s1600/murder_month.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="344" src="http://3.bp.blogspot.com/-vk6d1nfbxqw/T1Br6Uhh_AI/AAAAAAAAA_w/YTsajL4AVL4/s640/murder_month.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-ydu2PfHPiTQ/T1Br8hRam2I/AAAAAAAAA_4/ihoD_XoVq2Q/s1600/murder_age.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="344" src="http://2.bp.blogspot.com/-ydu2PfHPiTQ/T1Br8hRam2I/AAAAAAAAA_4/ihoD_XoVq2Q/s640/murder_age.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
I should point out that the plot with month on the x-axis is also missing a whole year's worth of data, because apparently in 1991 the day of a reported homicide wasn't recorded.</description><link>http://val-systems.blogspot.com/2012/03/terrible-2000-words.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-vk6d1nfbxqw/T1Br6Uhh_AI/AAAAAAAAA_w/YTsajL4AVL4/s72-c/murder_month.png" height="72" width="72" /><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-3334681183992184695</guid><pubDate>Wed, 29 Feb 2012 14:10:00 +0000</pubDate><atom:updated>2012-03-02T01:35:04.739-05:00</atom:updated><title>Update on Inquirer Data</title><description>Well, I just got word that the Inquirer has decided to make their dataset on homicides in Philadelphia publicly available. Apparently they haven't settled on a general data policy, but this one is now accessible. You can find data on every reported homicide in Philadelphia between January 1, 1988 and December 31, 2011 here: &lt;a href="https://www.google.com/fusiontables/DataSource?snapid=S4035208e94"&gt;https://www.google.com/fusiontables/DataSource?snapid=S4035208e94&lt;/a&gt;</description><link>http://val-systems.blogspot.com/2012/02/update-on-inquirer-data.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-3269900292410282247</guid><pubDate>Mon, 20 Feb 2012 16:40:00 +0000</pubDate><atom:updated>2012-02-20T11:41:12.358-05:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">data</category><category domain="http://www.blogger.com/atom/ns#">open data</category><title>Inquirer, Inquirer, let down your data!</title><description>&lt;div class="tr_bq"&gt;
So, I discovered last night that &lt;a href="http://www.philly.com/philly/news/special_packages/inquirer/136746563.html"&gt;the Philadelphia Inquirer&lt;/a&gt; has put together a Google Fusion table containing a record for &lt;i&gt;&lt;a href="https://www.google.com/fusiontables/DataSource?snapid=S363604JIcJ"&gt;every homicide in Philadelphia County since 1988&lt;/a&gt;&lt;/i&gt;. I've used homicide data compiled by the Inquirer before to estimate &lt;a href="http://val-systems.blogspot.com/2010/11/unsafe-for-who.html"&gt;the risk of homicide that ordinary Philadelphia residents face compared to UPenn affiliates&lt;/a&gt;. With 23 years of data, the possibilities for finding all sorts of patterns are enormous. The homicide rate could be compared to economic indices, public policies, or even climate, and with a time depth like this we could get some reliable results!&lt;/div&gt;
&lt;br /&gt;
&lt;b&gt;But&lt;/b&gt;, the ability to export the data was turned off by the owner of the fusion table, by accident, I assumed. I wrote to them about it, and apparently it is the Inquirer's policy not to let anyone access the data! They're concerned that someone might alter the data and attribute it back to the Inquirer. Here's the message I sent them when I heard about this.&lt;br /&gt;
&lt;blockquote&gt;
I am a student at Penn, and that's why I'm interested in data generally. But I have no specific interest in the data related to my academic pursuits. I'm merely a concerned and interested Philadelphian who also has some quantitative know-how.
&lt;br /&gt;
&lt;br /&gt;
I appreciate the sensitivity of the subject. In my own research, we spend a lot of time anonymizing interviews, and of course, it was a big issue with some of the Wikileaks data distributed by the NYT that it wasn't anonymized enough. However, is there precedent for altered data being hung around the neck of the original compiler? If there were an example case or two, your unease would make more sense to me. As it is though, since you are already maintaining the original data in a (relatively) publicly accessible way, it would be trivial for you, or anyone else, to demonstrate alteration or falsification of data attributed to the Inquirer.&lt;br /&gt;
&lt;br /&gt;
The fact that you're already only distributing something which is publicly available from the PPD makes allowing public access to your compiled version even less risky. There are then two sources to turn to to verify the accuracy of data that someone attributes to the Inquirer.&lt;br /&gt;
&lt;br /&gt;
My interest in this data stems mostly from the fact that I'm a concerned Philadelphian with the necessary skills to analyze a data set like this. It looks like the Inquirer has done a great public service by compiling this data into a useful format from the various PPD reports. But it has only gone halfway so far, because the data is of no use when we can only look at the tables with our eyes. I'm also strongly influenced by the open data movement within the research world. The best way to assert your confidence in your own research and analyses is to make the data openly available for anyone to recreate your results. Researchers who keep their data private are increasingly looked upon with suspicion, and rightly so. The same goes for data journalism.&lt;br /&gt;
&lt;br /&gt;
Moreover, there is a huge opportunity here for the Inquirer too. I am not the only person in Philadelphia who cares about data like this and knows how to analyze it. You have a forum to curate and display analyses and mashups contributed by your readers. The Guardian does something like this with their Data Blog &lt;a href="http://www.guardian.co.uk/news/datablog"&gt;http://www.guardian.co.uk/news/datablog&lt;/a&gt;, but frankly, the data sets they distribute are thin and uninteresting compared to what you could make available.&lt;br /&gt;
&lt;br /&gt;
I hope you reconsider your data policy.&lt;/blockquote&gt;
I'm frankly not too hopeful of a change of heart regarding making the data available. There are sure to be many more cases like this, of news organizations jumping onto the data journalism train without really getting how it's supposed to work.</description><link>http://val-systems.blogspot.com/2012/02/inquirer-inquirer-let-down-your-data.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><thr:total>0</thr:total></item></channel></rss>
