<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr="http://purl.org/syndication/thread/1.0" version="2.0"><channel><atom:id>tag:blogger.com,1999:blog-232777626311457607</atom:id><lastBuildDate>Mon, 21 May 2012 06:14:45 +0000</lastBuildDate><category>observed</category><category>pin pen</category><category>media</category><category>animals</category><category>science journalism</category><category>Etc.</category><category>vocal fry</category><category>phonology</category><category>display</category><category>south</category><category>knitr</category><category>AAVE</category><category>monophthongization</category><category>phonetics</category><category>open data</category><category>southwest</category><category>Chris Matthews</category><category>ay</category><category>spoonerism</category><category>n-word</category><category>tonogenesis</category><category>perception</category><category>lost tv media</category><category>l-vocalization</category><category>Louisiana</category><category>first post</category><category>oy</category><category>intelligence</category><category>irene</category><category>mystery</category><category>video</category><category>relaunch</category><category>natural misunderstanding</category><category>Canadian Raising</category><category>Hauser</category><category>rstudio</category><category>probability</category><category>cognition</category><category>overheard</category><category>linguists</category><category>science</category><category>humor</category><category>animal cognition</category><category>individuals</category><category>visualization</category><category>l</category><category>TV</category><category>sound 
change</category><category>radio</category><category>linguistics</category><category>fill-feel</category><category>observations</category><category>tool</category><category>peeving</category><category>morphology</category><category>graphics</category><category>raising</category><category>language</category><category>Chomsky</category><category>usage</category><category>nature-nurture</category><category>dialect</category><category>language change</category><category>portmanteau</category><category>praat</category><category>murders</category><category>misc.</category><category>positive anymore</category><category>philadelphia</category><category>vowels</category><category>plotting</category><category>design</category><category>language attitudes</category><category>race</category><category>nyc</category><category>data</category><category>short-a</category><category>R</category><category>merger</category><title>Val Systems</title><description /><link>http://val-systems.blogspot.com/</link><managingEditor>noreply@blogger.com (Josef Fruehwald)</managingEditor><generator>Blogger</generator><openSearch:totalResults>83</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/ValSystems" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="valsystems" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-90723469880243078</guid><pubDate>Thu, 17 May 2012 16:44:00 +0000</pubDate><atom:updated>2012-05-17T13:11:27.507-04:00</atom:updated><title>On calculating exponents</title><description>In my post on the decline effect in linguistics, the question came up of how I've calculated the exponents for the Exponential 
Model in my papers. I think this is a point worth clarifying, but it's not likely to be interesting to a broad audience. You have been forewarned.&lt;br /&gt;
&lt;br /&gt;
To recap as briefly as possible: in English, when a word ends in a consonant cluster whose final consonant is /t/ or /d/, that /t/ or /d/ is sometimes deleted. This deletion can affect a whole host of different words, but the ones which have been of most interest to the field are the regular&amp;nbsp;&lt;b&gt;past&lt;/b&gt;&amp;nbsp;tense (e.g., &lt;i&gt;packed&lt;/i&gt;), the &lt;b&gt;semiweak&lt;/b&gt;&amp;nbsp;past tense (e.g., &lt;i&gt;kept&lt;/i&gt;) and morphologically simplex words (e.g., &lt;i&gt;pact&lt;/i&gt;), which I'll call &lt;b&gt;mono&lt;/b&gt;. Other morphological cases which can be affected, and which I believe have occasionally and erroneously been categorized with the semiweak, are the no-change past tense (e.g., &lt;i&gt;cost&lt;/i&gt;), the "devoicing" (or something) past tense (e.g., &lt;i&gt;built&lt;/i&gt;), the stem-changing past tense (e.g., &lt;i&gt;found&lt;/i&gt;), etc. For the sake of this post, I'm only looking at the three main cases: &lt;b&gt;past&lt;/b&gt;, &lt;b&gt;semiweak&lt;/b&gt;, and &lt;b&gt;mono&lt;/b&gt;.&lt;br /&gt;
&lt;br /&gt;
Now, Guy (1991) came up with a specific proposal: if you describe the proportion of pronounced /t d/ for &lt;b&gt;past&lt;/b&gt;&amp;nbsp;as &lt;i&gt;p&lt;/i&gt;, for &lt;b&gt;semiweak&lt;/b&gt;&amp;nbsp;as &lt;i&gt;p&lt;sup&gt;j&lt;/sup&gt;&lt;/i&gt;, and for &lt;b&gt;mono&lt;/b&gt;&amp;nbsp;as &lt;i&gt;p&lt;sup&gt;k&lt;/sup&gt;&lt;/i&gt;, then &lt;i&gt;j&lt;/i&gt;&amp;nbsp;= 2 and &lt;i&gt;k&lt;/i&gt;&amp;nbsp;= 3. It is specifically whether or not&amp;nbsp;&lt;i&gt;j&lt;/i&gt;&amp;nbsp;= 2 and&amp;nbsp;&lt;i&gt;k&lt;/i&gt;&amp;nbsp;= 3 that I'm interested in here. If you've calculated the proportions of pronounced /t d/ for each grammatical class, you can calculate &lt;i&gt;j&lt;/i&gt;&amp;nbsp;by &lt;sup&gt;log(semiweak)&lt;/sup&gt;⁄&lt;sub&gt;log(past)&lt;/sub&gt; and &lt;i&gt;k&lt;/i&gt;&amp;nbsp;by &lt;sup&gt;log(mono)&lt;/sup&gt;⁄&lt;sub&gt;log(past)&lt;/sub&gt;, where each class name stands for that class's proportion. The trick is in how you decide to calculate those proportions.&lt;br /&gt;
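&lt;br /&gt;
As a quick sanity check on that arithmetic, here is a minimal Python sketch (it is not the R code used in this post). The proportions are hypothetical, chosen so that the proposed pattern holds exactly, and the helper name &lt;code&gt;exponent&lt;/code&gt; is made up for illustration.&lt;br /&gt;
&lt;pre&gt;
```python
import math


def exponent(p_class, p_past):
    """Return x such that p_past ** x equals p_class: log(p_class) / log(p_past)."""
    return math.log(p_class) / math.log(p_past)


# Hypothetical proportions of pronounced /t d/, not taken from any corpus.
p_past = 0.9
p_semiweak = p_past ** 2  # what the proposal predicts if j = 2
p_mono = p_past ** 3      # what the proposal predicts if k = 3

j = exponent(p_semiweak, p_past)
k = exponent(p_mono, p_past)
print(round(j, 2), round(k, 2))  # prints: 2.0 3.0
```
&lt;/pre&gt;
With real data the proportions are themselves estimates, so the recovered exponents can only be as good as the proportions that go in.&lt;br /&gt;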
&lt;br /&gt;
For this post, you can play along at home. Here's code to get set up. It'll load the Buckeye data I've been using, and do some data prep.&lt;br /&gt;
&lt;br /&gt;
&lt;script src="https://gist.github.com/2719770.js?file=load_buckeye.R"&gt;
&lt;/script&gt;
&lt;br /&gt;
So, how do you calculate the rate at which /t d/ are pronounced at the end of the word when you have a big data set from many different speakers? Traditional practice within sociolinguistics has been to just pool all of the observations from each grammatical class across all speakers.&lt;br /&gt;
&lt;script src="https://gist.github.com/2719770.js?file=pool_all_mean.R"&gt;
&lt;/script&gt;&lt;br /&gt;
So you come out with &lt;i&gt;j&lt;/i&gt;&amp;nbsp;= 1.91, &lt;i&gt;k&lt;/i&gt;&amp;nbsp;=&amp;nbsp;3.1, which is a pretty good fit to the proposal of Guy (1991).&lt;br /&gt;
&lt;br /&gt;
The problem is that this isn't really the best way to calculate proportions like this. Some words are super frequent, so they get more "votes" in the proportion for their grammatical class. And some speakers talk more than others, so they get more "votes" towards making the overall proportions look like their own. One approach to ameliorate this is to first calculate the proportion for each word within a grammatical class within a speaker, then average those to get a proportion for each grammatical class within a speaker, and then average across speakers to get a proportion for each grammatical class. Here's the code for this nested proportion approach.&lt;br /&gt;
&lt;script src="https://gist.github.com/2719770.js?file=nested_mean.R"&gt;
&lt;/script&gt;&lt;br /&gt;
All of a sudden, we're down to &lt;i&gt;j&lt;/i&gt;&amp;nbsp;= 1.34 and &lt;i&gt;k&lt;/i&gt;&amp;nbsp;= 2.05, and I haven't even dipped into mixed-effects models black magic yet.&lt;br /&gt;
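&lt;br /&gt;
To see why the two calculations can come apart, here is a small Python sketch on made-up tokens (it is not the gist code and not the Buckeye data). It assumes the nested approach means "average the per-word proportions within each speaker, then average the speakers"; the function names &lt;code&gt;pooled&lt;/code&gt; and &lt;code&gt;nested&lt;/code&gt; are hypothetical.&lt;br /&gt;
&lt;pre&gt;
```python
from collections import defaultdict
from statistics import mean

# Hypothetical tokens: (speaker, word, grammatical class, 1 if /t d/ pronounced else 0).
tokens = (
    [("s1", "just", "mono", 0)] * 8    # one very frequent word, always deleted
    + [("s1", "pact", "mono", 1)] * 2
    + [("s2", "pact", "mono", 1)] * 2
)


def pooled(tokens):
    """Proportion over all tokens of a class, ignoring speaker and word."""
    by_class = defaultdict(list)
    for speaker, word, gram, kept in tokens:
        by_class[gram].append(kept)
    return {gram: mean(vals) for gram, vals in by_class.items()}


def nested(tokens):
    """Mean over speakers of each speaker's mean per-word proportion."""
    by_word = defaultdict(list)
    for speaker, word, gram, kept in tokens:
        by_word[(gram, speaker, word)].append(kept)
    by_speaker = defaultdict(list)
    for (gram, speaker, word), vals in by_word.items():
        by_speaker[(gram, speaker)].append(mean(vals))
    by_class = defaultdict(list)
    for (gram, speaker), vals in by_speaker.items():
        by_class[gram].append(mean(vals))
    return {gram: mean(vals) for gram, vals in by_class.items()}


print(pooled(tokens)["mono"], nested(tokens)["mono"])
```
&lt;/pre&gt;
In this toy example the pooled proportion is 4 out of 12 tokens, about 0.33, because the frequent word dominates, while the nested proportion is 0.75, because each word and each speaker counts once.&lt;br /&gt;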
&lt;br /&gt;
But when it comes to modeling the proposal of Guy (1991), calculating the proportions is really just a means to an end. I &lt;a href="http://stats.stackexchange.com/questions/1261/fitting-a-fixed-exponential-relationship-between-categories-with-categorical-da"&gt;asked Cross Validated&lt;/a&gt;&amp;nbsp;how to directly model &lt;i&gt;j&lt;/i&gt;&amp;nbsp;and &lt;i&gt;k&lt;/i&gt;, and apparently you can do so using a complementary log-log link. So here is the mixed effects model for &lt;i&gt;j&lt;/i&gt;&amp;nbsp;and &lt;i&gt;k&lt;/i&gt;&amp;nbsp;directly.&lt;br /&gt;
&lt;script src="https://gist.github.com/2719770.js?file=exponents_model1.R"&gt;
&lt;/script&gt;
&lt;br /&gt;
The model estimates look very similar to the nested proportions approach, &lt;i&gt;j&amp;nbsp;&lt;/i&gt;= 1.38, &lt;i&gt;k&amp;nbsp;&lt;/i&gt;=&amp;nbsp;2.11.&lt;br /&gt;
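&lt;br /&gt;
The reason a complementary log-log link can estimate the exponents is a bit of algebra: if the deletion probability for a class is 1 - &lt;i&gt;p&lt;sup&gt;j&lt;/sup&gt;&lt;/i&gt;, then its cloglog is log(-log(&lt;i&gt;p&lt;sup&gt;j&lt;/sup&gt;&lt;/i&gt;)) = log(&lt;i&gt;j&lt;/i&gt;) + log(-log(&lt;i&gt;p&lt;/i&gt;)), so the exponent shows up as an additive shift of log(&lt;i&gt;j&lt;/i&gt;) relative to the past tense baseline. The Python sketch below just checks that identity numerically. It assumes the response being modeled is deletion rather than retention (the post doesn't say which), and &lt;code&gt;implied_exponent&lt;/code&gt; is a hypothetical helper, not part of any modeling library.&lt;br /&gt;
&lt;pre&gt;
```python
import math


def cloglog(prob):
    """Complementary log-log transform: log(-log(1 - prob))."""
    return math.log(-math.log(1 - prob))


def implied_exponent(p, exponent):
    """Recover the exponent from the cloglog shift between two deletion rates.

    p is the retention rate for the baseline (past) class; the other class
    is assumed to retain /t d/ at rate p ** exponent.
    """
    deletion_baseline = 1 - p
    deletion_other = 1 - p ** exponent
    shift = cloglog(deletion_other) - cloglog(deletion_baseline)
    return math.exp(shift)  # the shift equals log(exponent)


for p in (0.5, 0.7, 0.9):
    print(p, round(implied_exponent(p, 2), 6), round(implied_exponent(p, 3), 6))
```
&lt;/pre&gt;
Whatever baseline retention rate you pick, exponentiating the shift gives back the exponent, which is why the class coefficients of a cloglog model can be read as log(&lt;i&gt;j&lt;/i&gt;) and log(&lt;i&gt;k&lt;/i&gt;) under this assumption.&lt;br /&gt;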
&lt;br /&gt;
What if we fit the model without the by-word random intercepts?&lt;br /&gt;
&lt;script src="https://gist.github.com/2719770.js?file=exponents_model2.R"&gt;
&lt;/script&gt;
&lt;br /&gt;
Now we're a bit closer back to the original pooled proportions estimates, &lt;i&gt;j&lt;/i&gt;&amp;nbsp;= 1.57, &lt;i&gt;k&amp;nbsp;&lt;/i&gt;= 3.19.&lt;br /&gt;
&lt;br /&gt;
My personal conclusion from all this is that the apparent &lt;i&gt;j&lt;/i&gt;&amp;nbsp;= 2, &lt;i&gt;k&lt;/i&gt;&amp;nbsp;= 3 pattern is driven mostly by the lexical effects of highly frequent words. This table recaps all of the results, plus the estimates of two more models: one with just a by-speaker random intercept, and a flat model, which looks just like the maximum likelihood estimate of the fully pooled approach, because it is.
&lt;center&gt;
&lt;table&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;&lt;th&gt;&lt;i&gt;j&lt;/i&gt;&lt;/th&gt;&lt;th&gt;&lt;i&gt;k&lt;/i&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pooled&lt;/td&gt;&lt;td&gt;1.91&lt;/td&gt;&lt;td&gt;3.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Nested&lt;/td&gt;&lt;td&gt;1.34&lt;/td&gt;&lt;td&gt;2.05&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;~Gram+(Gram|Speaker)+(1|Word)&lt;/td&gt;&lt;td&gt;1.38&lt;/td&gt;&lt;td&gt;2.11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;~Gram+(Gram|Speaker)&lt;/td&gt;&lt;td&gt;1.57&lt;/td&gt;&lt;td&gt;3.19&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;~Gram+(1|Speaker)&lt;/td&gt;&lt;td&gt;1.84&lt;/td&gt;&lt;td&gt;3.14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;~Gram&lt;/td&gt;&lt;td&gt;1.91&lt;/td&gt;&lt;td&gt;3.1&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;/center&gt;
&lt;br /&gt;
The lesson is that it can matter a lot how you calculate your proportions.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/232777626311457607-90723469880243078?l=val-systems.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2012/05/on-calculating-exponents.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><thr:total>2</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-1873334258499392994</guid><pubDate>Wed, 16 May 2012 15:03:00 +0000</pubDate><atom:updated>2012-05-16T15:45:36.625-04:00</atom:updated><title>Decline Effect in Linguistics?</title><description>It seems to me that in the past few years, the empirical foundations of the social sciences, especially Psychology, have been coming under increased scrutiny and criticism. For example, there was the New Yorker piece from 2010 called "&lt;a href="http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer"&gt;The Truth Wears Off&lt;/a&gt;" about the "decline effect," or how the effect size of a phenomenon appears to decrease over time. More recently, the Chronicle of Higher Education had a blog post called "&lt;a href="http://chronicle.com/blogs/percolator/is-psychology-about-to-come-undone/29045"&gt;Is Psychology About to Come Undone?&lt;/a&gt;" about the failure to replicate some psychological results.&lt;br /&gt;
&lt;br /&gt;
These kinds of stories are concerning at two levels. At the personal level, researchers want to build a career and reputation around establishing new and reliable facts and principles. We definitely don't want the result that was such a nice feather in our cap to turn out to be wrong! At a more principled level, as scientists, our goal is for our models to approximate reality as closely as possible, and we don't want the course of human knowledge to be diverted down a dead end.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Small effects&lt;/h3&gt;
But, I'm a linguist. Do the problems facing psychology face me? To really answer that, I first have to decide which explanation for the decline effect I think is most likely, and I think &lt;a href="http://andrewgelman.com/2010/12/the_truth_wears/"&gt;Andrew Gelman's proposal&lt;/a&gt; is a good candidate:&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
The short story is that if you screen for statistical significance when estimating small effects, you will necessarily overestimate the magnitudes of effects, sometimes by a huge amount.&lt;/blockquote&gt;
&lt;br /&gt;
I've put together &lt;a href="https://gist.github.com/2710902#file_pvalues.r"&gt;some R code&lt;/a&gt; to demonstrate this point. Let's say I'm looking at two populations, and unknown to me as a researcher, there &lt;i&gt;is &lt;/i&gt;a small difference between the two, even though they're highly overlapping. Next, let's say I randomly sample 10 people from each population, do a t-test for the measurement I care about, and write down whether or not the p-value &amp;lt; 0.05, along with the estimated size of the difference between the two populations. Then I do this 1000 more times. Some proportion (approximately equal to the power of the test) of the t-tests will have&amp;nbsp;successfully identified a difference. But did those tests which found a significant difference also accurately estimate the &lt;i&gt;size&lt;/i&gt;&amp;nbsp;of the effect?&lt;br /&gt;
&lt;br /&gt;
For the purpose of the simulation, I randomly generated samples from two normal distributions with standard deviation 1, and means 1 and 1.1. I did this for a few different sample sizes, 1000 times each. This figure shows how many times larger the estimated effect size was than the true effect for tests which found a significant difference. The size of each point shows the probability of finding a&amp;nbsp;significant&amp;nbsp;difference for a sample of that size.&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-w-sgO7AaUIk/T7LOFYCS3UI/AAAAAAAABD4/YudxQk8UnUE/s1600/pvalue1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="315" src="http://2.bp.blogspot.com/-w-sgO7AaUIk/T7LOFYCS3UI/AAAAAAAABD4/YudxQk8UnUE/s400/pvalue1.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
So, we can see that for small sample sizes, the test has low power. That is, you are not very likely to find a significant difference, even though there &lt;i&gt;is&lt;/i&gt;&amp;nbsp;a true difference (i.e., you have a high rate of Type II error). Even worse, though, is that when the test has "worked," and found a significant difference when there is a true difference, you have both Type M (magnitude) and Type S (sign) errors. For small sample sizes (between 10 and 50 samples each from the two&amp;nbsp;populations), the estimated effect size is between 5 and 10 times greater than the real effect size, and the sign is sometimes flipped!&lt;br /&gt;
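&lt;br /&gt;
For anyone who wants to poke at this without R, here is a rough Python sketch of the same kind of simulation for the smallest sample size only. It is not the linked gist: it assumes an equal-variance two-sample t-test (R's default is Welch's), hard-codes the two-sided 5% critical value for 18 degrees of freedom, and the names &lt;code&gt;one_experiment&lt;/code&gt;, &lt;code&gt;N_PER_GROUP&lt;/code&gt; and &lt;code&gt;T_CRIT_DF18&lt;/code&gt; are made up for illustration.&lt;br /&gt;
&lt;pre&gt;
```python
import math
import random
from statistics import mean, variance

N_PER_GROUP = 10
TRUE_EFFECT = 0.1    # population means 1.0 and 1.1, standard deviation 1
T_CRIT_DF18 = 2.101  # two-sided 5% critical value of Student's t, 18 df


def one_experiment(rng):
    """Draw both samples; return (estimated difference, significant?)."""
    a = [rng.gauss(1.0, 1.0) for _ in range(N_PER_GROUP)]
    b = [rng.gauss(1.0 + TRUE_EFFECT, 1.0) for _ in range(N_PER_GROUP)]
    diff = mean(b) - mean(a)
    pooled_var = (variance(a) + variance(b)) / 2  # equal group sizes
    t = diff / math.sqrt(pooled_var * 2 / N_PER_GROUP)
    return diff, abs(t) > T_CRIT_DF18


rng = random.Random(1)  # fixed seed so the run is repeatable
results = [one_experiment(rng) for _ in range(1000)]
significant = [diff for diff, sig in results if sig]

power = len(significant) / len(results)
exaggeration = mean([abs(d) for d in significant]) / TRUE_EFFECT
wrong_sign = sum(1 for d in significant if 0 > d) / len(significant)

print("power:", power)
print("mean exaggeration (Type M):", round(exaggeration, 1))
print("share with flipped sign (Type S):", round(wrong_sign, 2))
```
&lt;/pre&gt;
The exaggeration is built into the setup: with 10 observations per group, a difference has to be several times larger than 0.1 before it can clear the significance threshold at all, so every "successful" test overestimates the effect.&lt;br /&gt;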
&lt;br /&gt;
Just choosing a smaller p-value threshold doesn't solve the problem. It makes you less likely to conclude that you've found a significant difference when there is a true difference (i.e., you ramp up your Type II error rate by reducing the power of your test), but it doesn't do anything to ameliorate the size of the Type M errors when you &lt;i&gt;do&lt;/i&gt;&amp;nbsp;find a significant difference. This figure facets by different p-value&amp;nbsp;thresholds.&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-Mc3zv9PHqUM/T7LXowdE-MI/AAAAAAAABEE/wzF_qlCxo_U/s1600/pvalue2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="289" src="http://4.bp.blogspot.com/-Mc3zv9PHqUM/T7LXowdE-MI/AAAAAAAABEE/wzF_qlCxo_U/s640/pvalue2.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;h3&gt;So do I have to worry?&lt;/h3&gt;
So, I think how much I ought to worry about the decline effect in my research, and linguistic research in general, is inversely proportional to the size of the effects we're trying to chase down. If the true size of the effects we're investigating is large, then our tests are more likely to be well powered, and we are less likely to experience Type M errors.&lt;br /&gt;
&lt;br /&gt;
And in general, I don't think the field has exhausted all of our sledgehammer effects. For example, Sprouse and Almeida (2012) [&lt;a href="http://www.socsci.uci.edu/~jsprouse/papers/sprouse.almeida.core.syntax.pdf"&gt;pdf&lt;/a&gt;]&amp;nbsp;successfully&amp;nbsp;replicated somewhere around 98% of the syntactic judgments from the syntax textbook &lt;i&gt;Core Syntax&lt;/i&gt;&amp;nbsp;(Adger 2003) using experimental methods (a pretty good replication rate if you ask me), and in general, &lt;a href="http://ling.auf.net/lingBuzz/001362"&gt;the estimated effect sizes were very large&lt;/a&gt;. So one thing seems clear. Sentence 1 is ungrammatical, and sentences 2 and 3 are grammatical.&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;*What did you see the man who bought?&lt;/li&gt;
&lt;li&gt;Who did you see who bought a cow?&lt;/li&gt;
&lt;li&gt;Who saw the man who bought a cow?&lt;/li&gt;
&lt;/ol&gt;
And the difference in acceptability between these sentences is &lt;i&gt;not&lt;/i&gt; getting smaller over time due to the decline effect. The explanatory theories for &lt;i&gt;why&lt;/i&gt;&amp;nbsp;sentence 1 isn't grammatical may change, and who knows, maybe the field will decide at some point that its ungrammaticality is no longer a fact that needs to be explained, but the fact that it is ungrammatical is not a moving target.
&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Maybe I do need to worry&lt;/h3&gt;
However, there is one phenomenon that I've looked at that I think has been following a decline effect pattern: the exponential pattern in /t d/ deletion. For reasons that I won't go into here, Guy (1991) proposed that if the rate at which a word final /t/ or /d/ is pronounced in regular past tense forms like &lt;i&gt;packed&lt;/i&gt; is given as p, the rate at which it is pronounced in semiweak past tense forms like &lt;i&gt;kept&lt;/i&gt; is given as p&lt;sup&gt;&lt;i&gt;j&lt;/i&gt;&lt;/sup&gt;, and the rate at which it is pronounced in monomorphemic words like &lt;i&gt;pact&lt;/i&gt; is given as p&lt;sup&gt;&lt;i&gt;k&lt;/i&gt;&lt;/sup&gt;, then &lt;i&gt;j&lt;/i&gt; = 2, &lt;i&gt;k&lt;/i&gt; = 3.
&lt;br /&gt;
&lt;br /&gt;
Here's a table of studies, and their estimates of &lt;i&gt;j&lt;/i&gt; and &lt;i&gt;k&lt;/i&gt;, plus some confidence intervals. Each cell shows the estimate in bold, with the upper and lower bounds of its confidence interval above and below it. See &lt;a href="https://gist.github.com/2710902#file_td_deletion_estimates.r"&gt;this code&lt;/a&gt; for how I calculated the confidence intervals.
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;center&gt;
&lt;table rules="rows"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th&gt;Study&lt;/th&gt;&lt;th&gt;Year&lt;/th&gt;&lt;th&gt;Dialect&lt;/th&gt;&lt;th&gt;&lt;i&gt;j&lt;/i&gt;&lt;/th&gt;&lt;th&gt;&lt;i&gt;k&lt;/i&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Guy&lt;/td&gt;&lt;td&gt;1991&lt;/td&gt;&lt;td&gt;White Philadelphia&lt;/td&gt;&lt;td&gt;&lt;table&gt;
                                   &lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;4.74&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;b&gt;2.37&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;1.17&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;&lt;td&gt;&lt;table&gt;
                                   &lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;4.26&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;b&gt;2.75&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;1.86&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Santa Ana&lt;/td&gt;&lt;td&gt;1992&lt;/td&gt;&lt;td&gt;Chicano Los Angeles&lt;/td&gt;&lt;td&gt;&lt;table&gt;
                                   &lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;2.29&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;b&gt;1.76&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;1.35&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;&lt;td&gt;&lt;table&gt;
                                   &lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;3.39&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;b&gt;2.91&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;2.51&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bayley&lt;/td&gt;&lt;td&gt;1994&lt;/td&gt;&lt;td&gt;Tejano San Antonio&lt;/td&gt;&lt;td&gt;&lt;table&gt;
                                   &lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;2.08&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;b&gt;1.51&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;1.11&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;&lt;td&gt;&lt;table&gt;
                                   &lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;3.59&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;b&gt;2.99&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;2.52&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tagliamonte &amp;amp; Temple&lt;/td&gt;&lt;td&gt;2005&lt;/td&gt;&lt;td&gt;York, Northern England&lt;/td&gt;&lt;td&gt;&lt;table&gt;
                                   &lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;1.85&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;b&gt;1.12&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;0.66&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;&lt;td&gt;&lt;table&gt;
                                   &lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;1.96&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;b&gt;1.43&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;1.04&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Smith, Durham &amp;amp; Fortune&lt;/td&gt;&lt;td&gt;2009&lt;/td&gt;&lt;td&gt;Buckie, Scotland&lt;/td&gt;&lt;td&gt;&lt;table&gt;
                                   &lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;1.36&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;b&gt;0.64&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;0.24&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;&lt;td&gt;&lt;table&gt;
                                   &lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;3.59&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;b&gt;2.33&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;1.53&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fruehwald&lt;/td&gt;&lt;td&gt;2012&lt;/td&gt;&lt;td&gt;Columbus, OH&lt;/td&gt;&lt;td&gt;&lt;table&gt;
                                   &lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;2.48&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;b&gt;1.38&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;0.76&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;&lt;td&gt;&lt;table&gt;
                                   &lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;2.35&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;b&gt;1.93&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;small&gt;1.59&lt;/small&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/center&gt;&lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
I should say right off the bat that none of these studies is a perfect replication of Guy's original study. They have different sample sizes, coding schemes, and statistical approaches. Mine, in the last row, is probably the most divergent, as I directly modeled and estimated the reliability of &lt;i&gt;j&lt;/i&gt;&amp;nbsp;and &lt;i&gt;k&lt;/i&gt;&amp;nbsp;using a mixed effects model, while the others calculated p&lt;sup&gt;&lt;i&gt;j&lt;/i&gt;&lt;/sup&gt;&amp;nbsp;and&amp;nbsp;p&lt;sup&gt;&lt;i&gt;k&lt;/i&gt;&lt;/sup&gt;&amp;nbsp;and compared them to the maximum likelihood estimates for words like &lt;i&gt;kept&lt;/i&gt;&amp;nbsp;and &lt;i&gt;pact&lt;/i&gt;.&lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
But needless to say, estimates of &lt;i&gt;j&lt;/i&gt;&amp;nbsp;and &lt;i&gt;k &lt;/i&gt;have not hovered nicely around 2 and 3.&amp;nbsp;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/232777626311457607-1873334258499392994?l=val-systems.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2012/05/decline-effect-in-linguisics.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/-w-sgO7AaUIk/T7LOFYCS3UI/AAAAAAAABD4/YudxQk8UnUE/s72-c/pvalue1.png" height="72" width="72" /><thr:total>22</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-4469695046477443887</guid><pubDate>Thu, 19 Apr 2012 17:10:00 +0000</pubDate><atom:updated>2012-04-19T13:11:31.712-04:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">usage</category><category domain="http://www.blogger.com/atom/ns#">linguistics</category><category domain="http://www.blogger.com/atom/ns#">peeving</category><title>Come and see</title><description>Yesterday, as a pre-amble to an ordinary newsletter sent out via listserv to most PhD students at UPenn, we were offered this piece of advice:&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
Tip of the day: You should all know this by now: It is incorrect to say “come and see” or “come out and help”, or any other “come…and…” phrase.  It is an infinitive phrase: “Come to see”, “Come out to help”, “Come to have fun”.  Don’t aggravate anyone’s pet peeves; just write and say it correctly.  You’re welcome.&lt;/blockquote&gt;
Well, many of us linguistics graduate students felt this merited some kind of response. I don't know about other linguists out there, but if someone said this to me in a personal e-mail, or in conversation, I couldn't &lt;i&gt;not&lt;/i&gt;&amp;nbsp;respond.&lt;br /&gt;
&lt;br /&gt;
And then, an amazing thing happened. We started drafting a letter in a Google document with 16 contributors. It was a little chaotic, but we marshaled together intuitions, data, and argumentation, and had drafted this message in about an hour's time.&lt;br /&gt;
&lt;blockquote&gt;
To whom it may concern:&lt;br /&gt;
&lt;br /&gt;
We were recently sent a grammar “tip” via the [&lt;i&gt;redacted&lt;/i&gt;]&lt;i&gt;&amp;nbsp;&lt;/i&gt;listserv which read:&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
Tip of the day: You should all know this by now: It is incorrect to say “come and see” or “come out and help”, or any other “come…and…” phrase.  It is an infinitive phrase: “Come to see”, “Come out to help”, “Come to have fun”.  Don’t aggravate anyone’s pet peeves; just write and say it correctly.  You’re welcome.&lt;/blockquote&gt;
The linguistics graduate students felt that this required a response, as in fact, the cited examples “come and see” and “come out and help” are both grammatical and widely used constructions in American English.&lt;br /&gt;
&lt;br /&gt;
The two constructions differ slightly in meaning. If one says,&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Mary came and saw Tupac’s hologram perform.&lt;/li&gt;
&lt;/ul&gt;
it must be the case that the performance actually occurred; it cannot be the case that there were technical difficulties and the performance was cancelled. However,&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Mary came to see Tupac’s hologram perform.&lt;/li&gt;
&lt;/ul&gt;
admits the possibility that the performance was cancelled due to technical difficulties. Therefore, asserting that the infinitive phrase is a uniformly appropriate replacement for the conjoined phrase is not an appropriate representation of the linguistic facts.&lt;br /&gt;
&lt;br /&gt;
Phrases like “come and see” are not restricted to the spoken idiom, but are also used in the written language. They even occur in texts considered by some to be canonical, as the following examples show:&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
He saith unto them, “Come and see”. (John 1:39, King James Bible)&amp;nbsp;&lt;/blockquote&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;blockquote class="tr_bq"&gt;
“Then you may come and see the picture”. (Merry Wives of Windsor II:II, William Shakespeare)&amp;nbsp;&lt;/blockquote&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;blockquote class="tr_bq"&gt;
“Will you come and see me?” (Pride &amp;amp; Prejudice, chap. 26, Jane Austen)&lt;/blockquote&gt;
Generally, grammatical prescriptivism contributes little to useful discourse, and may even cause intelligent language users to be unfairly stigmatized. Thus, while we appreciate [&lt;i&gt;redacted&lt;/i&gt;]'s light-hearted "tips-of-the-day," we would encourage authors to keep an open mind about the breadth of possible language use, especially in public forums.&lt;br /&gt;
&lt;br /&gt;
Sincerely,&lt;br /&gt;
&lt;br /&gt;
Jana Beck*&lt;br /&gt;
Claire Crawford*&lt;br /&gt;
[&lt;i&gt;redacted&lt;/i&gt;]*&lt;br /&gt;
Sabriya Fisher*&lt;br /&gt;
Aaron Freeman*&lt;br /&gt;
Lauren Friedman*&lt;br /&gt;
Josef Fruehwald*&lt;br /&gt;
Kyle Gorman*&lt;br /&gt;
Marielle Lerner*&lt;br /&gt;
Caitlin Light*&lt;br /&gt;
Laurel MacKenzie*&lt;br /&gt;
Brittany McLaughlin*&lt;br /&gt;
Hilary Prichard*&lt;br /&gt;
Kobey Shwayder*&lt;br /&gt;
Jon Stevens*&lt;br /&gt;
[&lt;i&gt;redacted&lt;/i&gt;]*&lt;br /&gt;
&lt;br /&gt;
*Department of Linguistics&lt;/blockquote&gt;
Thinking about it some more, I think at least the past tense "came to see" even has the implicature that either the seeing was unsuccessful, or there is some other more relevant event than the seeing which the speaker is about to tell us about.&lt;br /&gt;
&lt;br /&gt;
Anyway, I think we did a bang-up job, and produced a really excellent message, especially considering there were &lt;i&gt;16 authors&lt;/i&gt;!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/232777626311457607-4469695046477443887?l=val-systems.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2012/04/come-and-see.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-4742357495800098547</guid><pubDate>Sat, 14 Apr 2012 21:55:00 +0000</pubDate><atom:updated>2012-04-14T17:56:33.142-04:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">rstudio</category><category domain="http://www.blogger.com/atom/ns#">knitr</category><category domain="http://www.blogger.com/atom/ns#">R</category><title>Linguistic Notation Inside of R Plots!</title><description>So, I've been playing around with learning &lt;a href="http://yihui.name/knitr/"&gt;knitr&lt;/a&gt;, which is a Sweave-like R package for combining LaTeX and R code into one document. There's almost no learning curve if you already use Sweave, and I find much of knitr's design and usage a lot nicer.&lt;br /&gt;
&lt;br /&gt;
I wasn't going to make a blog post or tutorial about knitr, because the documentation is already pretty good, and &lt;a href="http://yihui.name/knitr/demos"&gt;contains a lot of tutorials&lt;/a&gt;. &amp;nbsp;However, I've just had a major victory in incorporating linguistic notations into plots using knitr, and I just had to share. I'll show you the payoff first, and then include the details.&lt;br /&gt;
&lt;br /&gt;
First, I managed to successfully use IPA characters as plot symbols and legend keys.&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-5FBHWzS3sQY/T4nY-w7DZBI/AAAAAAAABCc/NC77OaiFR-A/s1600/unnamed-chunk-4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="250" src="http://3.bp.blogspot.com/-5FBHWzS3sQY/T4nY-w7DZBI/AAAAAAAABCc/NC77OaiFR-A/s400/unnamed-chunk-4.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
The actual data in the plot is on car fuel economy, but that's not the point. Look at that IPA!
&lt;br /&gt;
&lt;br /&gt;
Then, I tried to expand on the principles that got me the IPA, and look what I produced.&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-WTHze0rAtbs/T4nZmv8bwmI/AAAAAAAABCk/ipfdvF6jpRo/s1600/unnamed-chunk-5.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="250" src="http://3.bp.blogspot.com/-WTHze0rAtbs/T4nZmv8bwmI/AAAAAAAABCk/ipfdvF6jpRo/s400/unnamed-chunk-5.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
Yes, that &lt;em&gt;is&lt;/em&gt; a syntax tree overlaid on top of the plot. But why stop there when you could go completely crazy?
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-LsYMqRfgkVw/T4nbWslt-FI/AAAAAAAABCs/2u8kqVo4StM/s1600/unnamed-chunk-7.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="250" src="http://2.bp.blogspot.com/-LsYMqRfgkVw/T4nbWslt-FI/AAAAAAAABCs/2u8kqVo4StM/s400/unnamed-chunk-7.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;h3&gt;
How to do it&lt;/h3&gt;
The important thing about making these plots is that they were easy given my pre-existing knowledge of R and LaTeX, plus what I've learned about knitr. The crucial element here is that knitr supports tikz graphics. I didn't know anything about tikz graphics before, and I still don't, which means that if you don't know anything about tikz graphics, you can still make plots like these.&lt;br /&gt;
&lt;br /&gt;
Like most linguists who use LaTeX, I already know how to include IPA characters and draw syntactic trees in a LaTeX document. It's as simple as&lt;br /&gt;
&lt;pre&gt;...
\usepackage{tipa}
\usepackage{qtree}
...
\textipa{D C P}
\Tree [.S NP VP ]
...
&lt;/pre&gt;
&lt;br /&gt;
What is so cool about the tikz device is that it lets you define these notations in LaTeX syntax, and then incorporates them into R graphs. Here are the important code chunks to include in your knitr document to make it all work.
&lt;br /&gt;
&lt;br /&gt;
&lt;h4&gt;

1 — Load the right R packages&lt;/h4&gt;
Early on, load the ggplot2 and tikzDevice R packages.
&lt;br /&gt;
&lt;div style="overflow: auto;"&gt;
&lt;div class="geshifilter"&gt;
&lt;pre class="r geshifilter-R" style="font-family: monospace;"&gt;&amp;lt;&amp;lt;&amp;gt;&amp;gt;=
    &lt;a href="http://inside-r.org/r-doc/base/library"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;library&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;a href="http://inside-r.org/packages/cran/ggplot2"&gt;ggplot2&lt;/a&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;
    &lt;a href="http://inside-r.org/r-doc/base/library"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;library&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;a href="http://inside-r.org/packages/cran/tikzDevice"&gt;tikzDevice&lt;/a&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;
@&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h4&gt;

2 — Define your LaTeX libraries&lt;/h4&gt;
Then, you need to tell the tikz device which LaTeX packages you want to use.
&lt;br /&gt;
&lt;div style="overflow: auto;"&gt;
&lt;div class="geshifilter"&gt;
&lt;pre class="r geshifilter-R" style="font-family: monospace;"&gt;&amp;lt;&amp;lt;&amp;gt;&amp;gt;=
    &lt;a href="http://inside-r.org/r-doc/base/options"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;options&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;tikzLatexPackages = &lt;a href="http://inside-r.org/r-doc/base/c"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;c&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;a href="http://inside-r.org/r-doc/base/getOption"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;getOption&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;span style="color: blue;"&gt;"tikzLatexPackages"&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt;
                                  &lt;span style="color: blue;"&gt;"&lt;span style="color: #000099; font-weight: bold;"&gt;\\&lt;/span&gt;usepackage{tipa}"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt;
                                  &lt;span style="color: blue;"&gt;"&lt;span style="color: #000099; font-weight: bold;"&gt;\\&lt;/span&gt;usepackage{qtree}"&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;
@&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h4&gt;

3 — Define the plotting elements in LaTeX&lt;/h4&gt;
We're done with the hard part. Now, it's as simple as faking up some data...

&lt;br /&gt;
&lt;div style="overflow: auto;"&gt;
&lt;div class="geshifilter"&gt;
&lt;pre class="r geshifilter-R" style="font-family: monospace;"&gt;&amp;lt;&amp;lt;&amp;gt;&amp;gt;=
    &lt;a href="http://inside-r.org/r-doc/base/levels"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;levels&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;mpg$drv&lt;span style="color: #009900;"&gt;)&lt;/span&gt; &amp;lt;- &lt;a href="http://inside-r.org/r-doc/base/c"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;c&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;span style="color: blue;"&gt;"&lt;span style="color: #000099; font-weight: bold;"&gt;\\&lt;/span&gt;textipa{D}"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt;
                         &lt;span style="color: blue;"&gt;"&lt;span style="color: #000099; font-weight: bold;"&gt;\\&lt;/span&gt;textipa{C}"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt;
                         &lt;span style="color: blue;"&gt;"&lt;span style="color: #000099; font-weight: bold;"&gt;\\&lt;/span&gt;textipa{P}"&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;
&amp;nbsp;
    mpg$tree &amp;lt;- &lt;span style="color: blue;"&gt;"{&lt;span style="color: #000099; font-weight: bold;"&gt;\\&lt;/span&gt;footnotesize &lt;span style="color: #000099; font-weight: bold;"&gt;\\&lt;/span&gt;Tree [.S NP VP ]}"&lt;/span&gt;
@&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h4&gt;

4 — Plot the data using the tikz device&lt;/h4&gt;
...and plotting it, using the tikz device.

&lt;br /&gt;
&lt;div style="overflow: auto;"&gt;
&lt;div class="geshifilter"&gt;
&lt;pre class="r geshifilter-R" style="font-family: monospace;"&gt;&amp;lt;&amp;lt;dev=&lt;span style="color: blue;"&gt;"tikz"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; fig.width=&lt;span style="color: #cc66cc;"&gt;8&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; fig.height=&lt;span style="color: #cc66cc;"&gt;5&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; out.width=&lt;span style="color: blue;"&gt;"0.9&lt;span style="color: #000099; font-weight: bold;"&gt;\\&lt;/span&gt;textwidth"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; fig.align=&lt;span style="color: blue;"&gt;"center"&lt;/span&gt;&amp;gt;&amp;gt;=
    &lt;a href="http://inside-r.org/packages/cran/ggplot"&gt;ggplot&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;mpg&lt;span style="color: #339933;"&gt;,&lt;/span&gt; aes&lt;span style="color: #009900;"&gt;(&lt;/span&gt;displ&lt;span style="color: #339933;"&gt;,&lt;/span&gt; hwy&lt;span style="color: #339933;"&gt;,&lt;/span&gt; label = drv&lt;span style="color: #339933;"&gt;,&lt;/span&gt; color = drv&lt;span style="color: #009900;"&gt;)&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt; + 
            geom_text&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt; + 
            stat_smooth&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;+
            xlab&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;span style="color: blue;"&gt;"&lt;span style="color: #000099; font-weight: bold;"&gt;\\&lt;/span&gt;textipa{IPA!}"&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;    
@&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
Or, in the case of the syntactic trees,



&lt;br /&gt;
&lt;div style="overflow: auto;"&gt;
&lt;div class="geshifilter"&gt;
&lt;pre class="r geshifilter-R" style="font-family: monospace;"&gt;&amp;lt;&amp;lt;dev=&lt;span style="color: blue;"&gt;"tikz"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; fig.width=&lt;span style="color: #cc66cc;"&gt;8&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; fig.height=&lt;span style="color: #cc66cc;"&gt;5&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; out.width=&lt;span style="color: blue;"&gt;"0.7&lt;span style="color: #000099; font-weight: bold;"&gt;\\&lt;/span&gt;textwidth"&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; fig.align=&lt;span style="color: blue;"&gt;"center"&lt;/span&gt;&amp;gt;&amp;gt;=
    &lt;a href="http://inside-r.org/packages/cran/ggplot"&gt;ggplot&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;mpg&lt;span style="color: #339933;"&gt;,&lt;/span&gt; aes&lt;span style="color: #009900;"&gt;(&lt;/span&gt;displ&lt;span style="color: #339933;"&gt;,&lt;/span&gt; hwy&lt;span style="color: #339933;"&gt;,&lt;/span&gt; label = &lt;a href="http://inside-r.org/packages/cran/tree"&gt;tree&lt;/a&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;+
            geom_text&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt; + 
            stat_smooth&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;+
            xlab&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;span style="color: blue;"&gt;"TREES"&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;
@&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h4&gt;

5 — Compile the .Rnw to a .tex document&lt;/h4&gt;
Here's some source code to embed these plots in a beamer presentation.
&lt;script src="https://gist.github.com/2387823.js"&gt;
 
&lt;/script&gt;
To compile a .tex document from the .Rnw source, you can run
&lt;br /&gt;
&lt;div style="overflow: auto;"&gt;
&lt;div class="geshifilter"&gt;
&lt;pre class="r geshifilter-R" style="font-family: monospace;"&gt;&lt;a href="http://inside-r.org/r-doc/base/library"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;library&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;knitr&lt;span style="color: #009900;"&gt;)&lt;/span&gt;
knit&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;span style="color: blue;"&gt;"./ling-plot.Rnw"&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
Then, just compile the .tex document however your little heart desires.

&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;

How to do it with one click&lt;/h3&gt;
As if this weren't awesome and easy enough already, it's possible to compile the whole document in one click using RStudio, &lt;a href="http://yihui.name/knitr/demo/rstudio/"&gt;as outlined on this knitr page&lt;/a&gt;. You'll need to download &lt;a href="http://www.rstudio.org/download/preview"&gt;the development (i.e. not guaranteed to be stable) RStudio release&lt;/a&gt;, then set the compilation option to use knitr, and you're done!
&lt;br /&gt;
&lt;br /&gt;
I have to say that from a practical standpoint, I've found writing Sweave documents in RStudio to be a much better experience than what I was doing before, because I can run and debug the R code from within the .Rnw source document. No need to go flipping back and forth between a TeX editor and R.&lt;br /&gt;
&lt;br /&gt;
P.S. I highlighted the code above at &lt;a href="http://www.inside-r.org/pretty-r"&gt;http://www.inside-r.org/pretty-r&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/232777626311457607-4742357495800098547?l=val-systems.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2012/04/linguistic-notation-inside-of-r-plots.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-5FBHWzS3sQY/T4nY-w7DZBI/AAAAAAAABCc/NC77OaiFR-A/s72-c/unnamed-chunk-4.png" height="72" width="72" /><thr:total>3</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-9161099933222740939</guid><pubDate>Sat, 31 Mar 2012 18:42:00 +0000</pubDate><atom:updated>2012-04-03T12:26:17.824-04:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">data</category><category domain="http://www.blogger.com/atom/ns#">R</category><category domain="http://www.blogger.com/atom/ns#">murders</category><title>More on Philadelphia Homicide</title><description>I've been doing more analysis of &lt;a href="https://www.google.com/fusiontables/DataSource?snapid=S4035208e94"&gt;the Philadelphia Homicide data&lt;/a&gt; that the Philadelphia Inquirer has published, and presented some of it at the Philadelphia UseR group yesterday. &lt;a href="http://jofrhwld.github.com/talks/UseR/murders/murders.pdf"&gt;My slides&lt;/a&gt; [pdf] and &lt;a href="https://raw.github.com/JoFrhwld/jofrhwld.github.com/master/talks/UseR/murders/murders.Rnw"&gt;source&lt;/a&gt; [knitr .Rnw] are on &lt;a href="https://github.com/JoFrhwld/jofrhwld.github.com/tree/master/talks/UseR/murders"&gt;github&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
I should be clear that I am &lt;i&gt;not&lt;/i&gt; an expert on crime and murder. In fact, I'm not even fairly knowledgeable. If anyone out there with more expertise has strong criticism of my "analysis" (really, it's just a rough exploration of the data), I'll eat it, and I'll look forward to your own analysis of the data (again, it's right &lt;a href="https://www.google.com/fusiontables/DataSource?snapid=S4035208e94"&gt;here&lt;/a&gt;). Here are some of the most striking patterns that I found.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;
Results&lt;/h2&gt;
First, here is the total number of murders that occurred over the past 23 years, broken down by the day of the week. The weekends are worse than the weekdays.&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-us-aOFwWbyo/T3S2lmGn6EI/AAAAAAAABBs/BI8-d_VNJ8I/s1600/weekday.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="250" src="http://4.bp.blogspot.com/-us-aOFwWbyo/T3S2lmGn6EI/AAAAAAAABBs/BI8-d_VNJ8I/s400/weekday.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;/div&gt;
&lt;br /&gt;
Next, here is the total number of murders by hour of the day. The hour of the day was not included in the data until 2006, so this only represents murders between 2006 and 2011. The plot is centered around midnight, so the afternoon of Day 1 is on the left, and the morning of Day 2 is on the right.&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-tcxuCyp2uEw/T3S3fAVWxEI/AAAAAAAABB0/5J1wa5EfVvM/s1600/hour.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="250" src="http://2.bp.blogspot.com/-tcxuCyp2uEw/T3S3fAVWxEI/AAAAAAAABB0/5J1wa5EfVvM/s400/hour.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
It looks like there's something weird going on around 11pm and midnight, which I have to chalk up to the reporting patterns of the PPD. For some reason, it seems like murders which occurred in the midnight hour are more likely to be logged as occurring at 11pm.&lt;br /&gt;
&lt;br /&gt;
Here is the most striking plot that I produced this time around. It plots, by month, the average frequency of murders. The y-axis represents 1 murder every X days.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-Je9F8U26UUM/T3YK-V7AWiI/AAAAAAAABB8/mzqwpH4BTGg/s1600/big-result.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="250" src="http://4.bp.blogspot.com/-Je9F8U26UUM/T3YK-V7AWiI/AAAAAAAABB8/mzqwpH4BTGg/s400/big-result.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
Since 1988, the African American community has been living in a Philadelphia with approximately a murder every day, or every other day. The White community, on the other hand, has been living in a Philadelphia with a murder once a week.&lt;br /&gt;
&lt;br /&gt;
I also did some meager statistical analysis, specifically Poisson regression with terms for the month (that is, January, February, etc., to look for a seasonal pattern), race of the victim, and weapon used. There was a significant month effect, but the coefficients didn't have much of a pattern to them. I &lt;i&gt;did&lt;/i&gt; use the number of days in the month as an offset in the regression, so the month effect isn't just a matter of some months being longer than others. More importantly, there was an unsurprising main effect of race, but also a big interaction between race and weapon. Specifically, African American victims were way more likely to be killed by a gun.&lt;br /&gt;
&lt;br /&gt;
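A minimal sketch of a regression along those lines, assuming one row per month x race x weapon cell. The counts are invented for illustration (they are not the Inquirer data), and the column names and the group labels "A" and "B" are placeholders. The offset goes in as the log of the number of days, because Poisson regression models the log of the expected count.&lt;br /&gt;
&lt;pre&gt;
```r
# Invented counts for illustration only; "A" and "B" are placeholder groups.
toy = expand.grid(
  month  = c("Jan", "Feb", "Mar"),
  race   = c("A", "B"),
  weapon = c("gun", "knife"),
  stringsAsFactors = TRUE
)
toy$days = c(31, 28, 31)[as.integer(toy$month)]      # days in each month
toy$n    = c(20, 18, 22, 6, 5, 7, 3, 2, 4, 2, 1, 2)  # murders per cell

# Poisson regression with a month term and a race x weapon interaction.
# offset(log(days)) turns the model into one for the daily rate.
fit = glm(n ~ month + race * weapon + offset(log(days)),
          family = poisson, data = toy)

# Exponentiated coefficients are multiplicative effects on the expected count.
ratios = exp(coef(fit))
print(round(ratios, 2))
```
&lt;/pre&gt;
Exponentiating the interaction coefficient gives the factor by which the knife-to-gun ratio differs between the two groups.&lt;br /&gt;
&lt;br /&gt;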
Guns and knives are the two most common weapons used in murders in the data. &lt;strike&gt;White murder victims are 2.54x more likely to have been shot than stabbed, while an African American murder victim is 7.19x more likely to have been shot than stabbed, meaning that &lt;b&gt;African American murder victims are 2.83x more likely to have been shot than a White murder victim was&lt;/b&gt;.&lt;/strike&gt;&lt;br /&gt;
&lt;b&gt;Update:&lt;/b&gt; There was a pretty serious flaw in my regression: if there was a month where, say, no African Americans were murdered with a knife (and there were plenty of such months), that month's data was missing, rather than 0. After filling in the data appropriately to reflect months with 0 murders for a particular race x weapon combination, the estimates are pretty different. White murder victims were 5.71x more likely to be murdered with a gun than a knife, while African American murder victims were 8.62x more likely to be murdered with a gun than a knife, meaning &lt;b&gt;the gun-to-knife ratio for African American murder victims is 1.51x the ratio for White murder victims&lt;/b&gt;. So, that's a pretty serious revision, approximately halving the multiplier. I've already updated the linked code and slides.
&lt;br /&gt;
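The fix described in the update can be sketched roughly as follows, assuming a table of counts in which combinations with no murders simply have no row. The data and column names here are invented for illustration, and only month and weapon are crossed to keep it short; race would be added to the grid the same way.&lt;br /&gt;
&lt;pre&gt;
```r
# Invented counts with one missing cell: there is no row for (Feb, knife).
obs = data.frame(
  month  = c("Jan", "Jan", "Feb"),
  weapon = c("gun", "knife", "gun"),
  n      = c(4, 1, 3)
)

# Every month x weapon combination that should exist.
grid = expand.grid(
  month  = unique(obs$month),
  weapon = unique(obs$weapon),
  stringsAsFactors = FALSE
)

# Left-join the observed counts onto the full grid; absent cells come back as NA.
filled = merge(grid, obs, by = c("month", "weapon"), all.x = TRUE)

# Those NAs are really counts of zero, so recode them before modeling.
filled$n[is.na(filled$n)] = 0
print(filled)
```
&lt;/pre&gt;
Without the last step, a regression would silently drop the empty cells instead of treating them as zero counts.&lt;br /&gt;
&lt;br /&gt;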
So, gun deaths are an especially acute problem in the African American community. In fact, if you exclude gun deaths from the data, it actually looks like the racial disparity in murder rates has been &lt;i&gt;narrowing.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-q8ZEEHkWhTM/T3dIXKHjzjI/AAAAAAAABCE/mTSPwnpH76Y/s1600/noguns.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="250" src="http://4.bp.blogspot.com/-q8ZEEHkWhTM/T3dIXKHjzjI/AAAAAAAABCE/mTSPwnpH76Y/s400/noguns.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
It is purely coincidental that I'm posting this on the same day that &lt;a href="http://philadelphia.cbslocal.com/2012/03/30/another-philadelphia-gun-buyback-is-saturday-on-south-broad-street/"&gt;the Philadelphia Police Department are doing a gun buyback&lt;/a&gt;. You can bring in a gun and receive a $100 Shoprite voucher, no questions asked. Seems like a good initiative.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;
Analysis Discussion&lt;/h2&gt;
I spent a bit of time trying to figure out what I thought the most meaningful way to represent the murder rate was. First, I calculated the murder frequency by counting the number of murders in each month (&lt;i&gt;n murders&lt;/i&gt;), then dividing that by the number of days in the month (&lt;i&gt;n days&lt;/i&gt;): (&lt;i&gt;n murders&lt;/i&gt;/&lt;i&gt;n days&lt;/i&gt;)=&lt;i&gt;murders per day&lt;/i&gt;. But the resulting measure had values like 0.14 &lt;i&gt;murders per day&lt;/i&gt;, which isn't too informative. What people want to know about murders, or at least what I want to know, is how often murders happen, not how many happened in a given time window. So, instead, I calculated (&lt;i&gt;n days&lt;/i&gt;/&lt;i&gt;n murders&lt;/i&gt;)=&lt;i&gt;days per murder&lt;/i&gt;.&lt;br /&gt;
&lt;br /&gt;
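The two measures are just reciprocals of each other, which a small sketch makes concrete. The monthly counts and column names below are invented for illustration.&lt;br /&gt;
&lt;pre&gt;
```r
# Invented monthly counts, for illustration only.
monthly = data.frame(
  month     = c("Jan", "Feb", "Mar"),
  n_days    = c(31, 28, 31),
  n_murders = c(31, 14, 4)
)

# How many murders happened per day in each month.
monthly$murders_per_day = monthly$n_murders / monthly$n_days

# How many days passed, on average, per murder.
# A month with zero murders would come out as Inf here.
monthly$days_per_murder = monthly$n_days / monthly$n_murders

print(monthly)
```
&lt;/pre&gt;
By the same arithmetic, a value like 0.14 murders per day corresponds to roughly one murder every 7 days.&lt;br /&gt;
&lt;br /&gt;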
The y-axis for the murder rate figures is also on a logarithmic scale, which is reasonable given both the distribution of the data and how people perceive timescales. From a human perspective, the difference between 1 day and 2 days feels larger than the difference between 3 weeks and 4 weeks. The y-axis is also flipped, to indicate that smaller numbers mean "more often". I managed the reversed log transformation by writing my own coordinate transformation using the new &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;scales&lt;/span&gt; package. Here's the R code.&lt;br /&gt;
&lt;script src="https://gist.github.com/2266961.js?file=revlog.R"&gt;
&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/232777626311457607-9161099933222740939?l=val-systems.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2012/03/more-on-philadelphia-homicide.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-us-aOFwWbyo/T3S2lmGn6EI/AAAAAAAABBs/BI8-d_VNJ8I/s72-c/weekday.png" height="72" width="72" /><thr:total>7</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-8114860418705172894</guid><pubDate>Wed, 07 Mar 2012 20:35:00 +0000</pubDate><atom:updated>2012-03-09T13:41:37.630-05:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">R</category><title>Philadelphia Schools</title><description>I'm on spring break, and yesterday I took some time to check off some items on my to-do list, namely:&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Start getting acquainted with &lt;a href="http://cloud.github.com/downloads/hadley/ggplot2/guide-col.pdf"&gt;all the new features of ggplot2 [PDF]&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Get a handle on dealing with geographic data in R.&lt;/li&gt;
&lt;/ol&gt;
&lt;div&gt;
I've done &lt;a href="http://www.ling.upenn.edu/~joseff/papers/Fruehwald_IGERT2010_poster.pdf"&gt;some furtive geographic analysis using R [pdf]&lt;/a&gt;, but &lt;a href="https://github.com/JoFrhwld/Mapping-Scripts"&gt;the code behind it&lt;/a&gt; was very hacky. There is a whole field of geospatial data analysis out there that I was really ignorant of, and still am, but I've made a little bit of progress.&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
I mostly followed &lt;a href="https://github.com/hadley/ggplot2/wiki/plotting-polygon-shapefiles"&gt;the tutorial laid out here&lt;/a&gt; for making maps in ggplot2. The most difficult part was getting the &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;rgdal&lt;/span&gt; package installed. It's one of those packages that relies on other, non-R libraries being installed. I managed to get &lt;a href="http://www.gdal.org/"&gt;GDAL&lt;/a&gt; and &lt;a href="http://trac.osgeo.org/proj/"&gt;Proj.4&lt;/a&gt; installed (even though I honestly don't know what they do), and got &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;rgdal&lt;/span&gt; installed (I had to work around an apparently non-standard installation location for Proj.4).&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Now, it's all about getting some good data, and fortunately, I stumbled across &lt;a href="http://opendataphilly.org/"&gt;opendataphilly.org&lt;/a&gt; yesterday as well! I found &lt;a href="http://www.opendataphilly.org/opendata/resource/38/schools/"&gt;a shapefile of all schools in Philadelphia&lt;/a&gt;, and a separate data set about &lt;a href="http://www.opendataphilly.org/opendata/resource/79/high-school-graduates-2004-2010/"&gt;how many public and charter high school graduates in 2010 went on to postsecondary education of various sorts&lt;/a&gt;. Unfortunately, there weren't any shared IDs of any sort between the two data sets, so to join them I had to hack it by hand, mostly.&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
So, here is the result.&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-hSn75S_7tyg/T1e4Nq1nrfI/AAAAAAAABAY/zlw3D-vPPAc/s1600/postsecondary.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="321" src="http://3.bp.blogspot.com/-hSn75S_7tyg/T1e4Nq1nrfI/AAAAAAAABAY/zlw3D-vPPAc/s400/postsecondary.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
I'm not sure what I expected to see, which certainly weakens any conclusions I'd like to draw, but I am surprised at how little geographic patterning there is. I'm also almost certain that there are some data reporting problems. For example, that huge dark blue dot in the Northeast is Northeast High School, which reports that of their 652 graduates, 0 went on to any postsecondary education. I just don't think that can be true, and not because I'm an idealist. Northeast is right down the street from where I grew up, and while it's not a fancy prep school by any means, it has both a Magnet program and an International Baccalaureate program.&lt;br /&gt;
&lt;br /&gt;
There's no way that zero students from Northeast went on to postsecondary education, a category which includes non-degree granting programs and specialized training programs. It's a lot more likely that they either didn't report the numbers, or the Pennsylvania Department of Education lost them, and then &lt;a href="http://www.ling.upenn.edu/~joseff/rstudy/summer2010_general_data_structure.html#collection_principles"&gt;didn't distinguish between missing data and 0&lt;/a&gt;. Unfortunately, that calls all schools with reports of 0% postsecondary education into question, even though some schools probably &lt;i&gt;did&lt;/i&gt;&amp;nbsp;have 0 students go on to further education.&lt;br /&gt;
&lt;br /&gt;
Looking at the distribution of the proportion of graduates going on to postsecondary education, the numbers are hugely bimodal (at least for the public schools).&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-1BL3IteZDGg/T1fAKmAnwmI/AAAAAAAABAg/WIvndzFZtuI/s1600/dist1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="178" src="http://2.bp.blogspot.com/-1BL3IteZDGg/T1fAKmAnwmI/AAAAAAAABAg/WIvndzFZtuI/s400/dist1.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
Even after excluding the schools which reported 0 students going on to postsecondary education, there are still 3 schools with basically 0 students getting further education out of high school: Frankford (1/341), West Philly (1/208) and University City (2/205).&lt;br /&gt;
&lt;br /&gt;
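As a quick check on the figures quoted so far, here is a small sketch that works out the proportion for Northeast and the three schools just named, then applies a 1% cutoff. The counts are the ones given in this post; the data frame and its column names are placeholders, not the structure of the original data set.&lt;br /&gt;
&lt;pre&gt;
```r
# Counts quoted in the post: graduates going on to postsecondary
# education, and total graduates, for four schools.
schools = data.frame(
  school  = c("Northeast", "Frankford", "West Philly", "University City"),
  postsec = c(0, 1, 1, 2),
  grads   = c(652, 341, 208, 205)
)

# Proportion of graduates going on to postsecondary education.
schools$prop = schools$postsec / schools$grads
print(schools)

# Keep only the schools reporting at least 1%.
kept = schools[schools$prop >= 0.01, ]
print(nrow(kept))
```
&lt;/pre&gt;
All four of these schools fall under the 1% line, so a cutoff there removes both the suspicious zeros and the near-zeros.&lt;br /&gt;
&lt;br /&gt;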
Excluding the schools which reported less than 1% of students going on to further education (assuming either that they have faulty data, or have acute problems of other sorts), I replotted the map (note that the colors now run from 50% to 100%).&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-Q5DEhLTLONQ/T1fFLyfgJSI/AAAAAAAABAo/6jsXtYSfXCk/s1600/postsecondary2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="322" src="http://4.bp.blogspot.com/-Q5DEhLTLONQ/T1fFLyfgJSI/AAAAAAAABAo/6jsXtYSfXCk/s400/postsecondary2.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
Still no huge geographic patterns.&lt;br /&gt;
&lt;br /&gt;
Here's the R code that I used (including links to the data).&lt;br /&gt;
&lt;script src="https://gist.github.com/1995932.js"&gt; &lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/232777626311457607-8114860418705172894?l=val-systems.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2012/03/philadelphia-schools.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-hSn75S_7tyg/T1e4Nq1nrfI/AAAAAAAABAY/zlw3D-vPPAc/s72-c/postsecondary.png" height="72" width="72" /><thr:total>2</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-8445020867449796729</guid><pubDate>Sun, 04 Mar 2012 06:20:00 +0000</pubDate><atom:updated>2012-03-09T13:57:48.115-05:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">R</category><title>My Pocket Change</title><description>I'm playing around with some personal data collection, and using some cloud computing to visualize it. Following &lt;a href="http://www.r-bloggers.com/embedding-a-live-plot-in-your-website/"&gt;the directions in this blog post&lt;/a&gt;, I've written an R function which visualizes data it draws from a Google Docs spreadsheet, and uploaded it to OpenCPU's servers. The plots you're seeing in this post were actually generated by OpenCPU when you loaded this page, meaning they're &lt;i&gt;live!&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
So, I've been logging, daily, my pocket change. The first plot shows the cumulative growth of the change in my change jar by 3 different measures: the raw number of each kind of coin, the total value contributed by each kind of coin, and the total mass contributed by each kind of coin (based on official data on how much each kind of coin &lt;i&gt;should&lt;/i&gt; weigh).&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://beta1.opencpu.org/R/call/store:tmp/4615487596a359629ea7f869bc0c8d19/png?&amp;amp;!width=1050&amp;amp;!height=375" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="227" src="http://beta1.opencpu.org/R/call/store:tmp/4615487596a359629ea7f869bc0c8d19/png?&amp;amp;!width=1050&amp;amp;!height=375" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
The second plot shows the proportional contribution each kind of coin makes to each measure. The first panel shows what percent of all my coins belong to each type, the second panel shows how much each kind of coin contributes proportionally to the overall value, and the third shows how much each kind of coin contributes to the overall mass.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://beta1.opencpu.org/R/call/store:tmp/4615487596a359629ea7f869bc0c8d19/png?y=%22p%22&amp;amp;!width=1050&amp;amp;!height=375" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="228" src="http://beta1.opencpu.org/R/call/store:tmp/4615487596a359629ea7f869bc0c8d19/png?y=%22p%22&amp;amp;!width=1050&amp;amp;!height=375" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
So, depending on how long I keep this habit up, if you keep checking in on this post, you'll see new plots every day.&lt;br /&gt;
&lt;br /&gt;
I have two primary motivations for logging my coins. First, last time I cashed in all my change, someone asked me how long it took me to save it up, and I had no idea! Second, I'm curious to see how much effort I'm putting into carrying around relatively heavy coins, like pennies, for their small contribution to the over-all value of my coin jar.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/232777626311457607-8445020867449796729?l=val-systems.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2012/03/my-pocket-change.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-8969700434770834612</guid><pubDate>Fri, 02 Mar 2012 06:54:00 +0000</pubDate><atom:updated>2012-03-09T13:58:06.514-05:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">R</category><title>A terrible 2000 words</title><description>I've only just started looking at the &lt;a href="https://www.google.com/fusiontables/DataSource?snapid=S4035208e94"&gt;homicide data&lt;/a&gt; &lt;a href="http://val-systems.blogspot.com/2012/02/update-on-inquirer-data.html"&gt;made available by the Philadelphia Inquirer&lt;/a&gt; in my free time (which is hard to come by lately). I've been thinking about what sorts of statistics I could do, or what kinds of additional data sets I could merge in, but I think these simple plots already tell a terrible story about what is happening to who.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-vk6d1nfbxqw/T1Br6Uhh_AI/AAAAAAAAA_w/YTsajL4AVL4/s1600/murder_month.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="344" src="http://3.bp.blogspot.com/-vk6d1nfbxqw/T1Br6Uhh_AI/AAAAAAAAA_w/YTsajL4AVL4/s640/murder_month.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-ydu2PfHPiTQ/T1Br8hRam2I/AAAAAAAAA_4/ihoD_XoVq2Q/s1600/murder_age.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="344" src="http://2.bp.blogspot.com/-ydu2PfHPiTQ/T1Br8hRam2I/AAAAAAAAA_4/ihoD_XoVq2Q/s640/murder_age.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
I should point out that the plot with month on the x-axis is also missing a whole year's worth of data, because apparently in 1991 the day of a reported homicide wasn't recorded.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/232777626311457607-8969700434770834612?l=val-systems.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2012/03/terrible-2000-words.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-vk6d1nfbxqw/T1Br6Uhh_AI/AAAAAAAAA_w/YTsajL4AVL4/s72-c/murder_month.png" height="72" width="72" /><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-3334681183992184695</guid><pubDate>Wed, 29 Feb 2012 14:10:00 +0000</pubDate><atom:updated>2012-03-02T01:35:04.739-05:00</atom:updated><title>Update on Inquirer Data</title><description>Well, I just got word that the Inquirer has decided to make their dataset on homicides in Philadelphia&amp;nbsp;publicly&amp;nbsp;available. Apparently they haven't settled on a general data policy, but this one is now accessible. 
You can find data on every reported homicide in Philadelphia between Jan 1, 1988 and December 31, 2011 here:&amp;nbsp;&lt;a href="https://www.google.com/fusiontables/DataSource?snapid=S4035208e94"&gt;https://www.google.com/fusiontables/DataSource?snapid=S4035208e94&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/232777626311457607-3334681183992184695?l=val-systems.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2012/02/update-on-inquirer-data.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-3269900292410282247</guid><pubDate>Mon, 20 Feb 2012 16:40:00 +0000</pubDate><atom:updated>2012-02-20T11:41:12.358-05:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">data</category><category domain="http://www.blogger.com/atom/ns#">open data</category><title>Inquirer, Inquirer, let down your data!</title><description>&lt;div class="tr_bq"&gt;
So, I discovered last night that &lt;a href="http://www.philly.com/philly/news/special_packages/inquirer/136746563.html"&gt;the Philadelphia Inquirer&lt;/a&gt; has put together a Google Fusion table containing a record for &lt;i&gt;&lt;a href="https://www.google.com/fusiontables/DataSource?snapid=S363604JIcJ"&gt;every homicide in Philadelphia county since 1988&lt;/a&gt;&lt;/i&gt;. I've used homicide data compiled by the Inquirer before to estimate&lt;a href="http://val-systems.blogspot.com/2010/11/unsafe-for-who.html"&gt; the risk of homicide that normal Philadelphia residents have compared to UPenn affiliates&lt;/a&gt;. With 23 years of data, the possibilities to find all sorts of patterns are enormous. Homicide rate could be compared to economic indices, public policies, or climate even, and we could get some reliable results with a time depth like this!&lt;/div&gt;
&lt;br /&gt;
&lt;b&gt;But&lt;/b&gt;, the ability to export the data was turned off by the owner of the fusion table, by accident I assumed. I wrote to them about it, and apparently it is the Inquirer's policy to not let anyone access the data! They're concerned that someone might alter the data, and attribute it back to the Inquirer. Here's the message I sent them when I heard about this.&lt;br /&gt;
&lt;blockquote&gt;
I am a student at Penn, and that's why I'm interested in data generally. But I have no specific interest in the data related to my academic pursuits. I'm merely a concerned and interested Philadelphian who also has some quantitative know how.
&lt;br /&gt;
&lt;br /&gt;
I appreciate the sensitivity of the subject. In my own research, we spend a lot of time anonymizing interviews, and of course, it was a big issue with some of the Wikileaks data distributed by the NYT that it wasn't anonymized enough. However, is there precedent for altered data being hung around the neck of the original compiler? If there were an example case or two, your unease would make more sense to me. As it is though, since you are already maintaining the original data in a (relatively) publicly accessible way, it would be trivial for you, or anyone else, to demonstrate alteration or falsification of data attributed to the Inquirer.&lt;br /&gt;
&lt;br /&gt;
The fact that you're already only distributing something which is publicly available from the PPD makes allowing public access to your compiled version even less risky. There are then two sources to turn to to verify the accuracy of data that someone attributes to the Inquirer.&lt;br /&gt;
&lt;br /&gt;
My interest in this data spawns mostly from the fact that I'm a concerned Philadelphian with the necessary skills to analyze a data set like this. It looks like the Inquirer has done a great public service by compiling this data into a useful format from the various PPD reports. But it has only done so by a half measure so far, because the data is of no use when we can only look at the tables with our eyes. I'm also strongly influenced by the open data movement from within the research world. The best way to assert your confidence in your own research and analyses is to make the data openly available for anyone to recreate your results. Researchers who keep their data private are more and more looked upon with suspicion, and rightly so. The same goes for data journalism.&lt;br /&gt;
&lt;br /&gt;
Moreover, there is a huge opportunity here for the Inquirer too. I am not the only person in Philadelphia who cares about data like this and knows how to analyze it. You have a forum to curate and display analyses and mashups contributed by your readers. The Guardian does something like this with their Data Blog &lt;a href="http://www.guardian.co.uk/news/datablog"&gt;http://www.guardian.co.uk/news/datablog&lt;/a&gt;, but frankly, the data sets they distribute are thin and uninteresting compared to what you could make available.&lt;br /&gt;
&lt;br /&gt;
I hope you reconsider your data policy.&lt;/blockquote&gt;
I'm frankly not too hopeful of a change of heart regarding making the data available. There's sure to be a lot more cases like this, of news organizations jumping onto the data journalism train, without really getting how it's supposed to work.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/232777626311457607-3269900292410282247?l=val-systems.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2012/02/inquirer-inquirer-let-down-your-data.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-7988062161342709908</guid><pubDate>Fri, 27 Jan 2012 19:36:00 +0000</pubDate><atom:updated>2012-03-09T14:00:10.227-05:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">R</category><title>Distressing Numbers for Women</title><description>Sometimes I play with non-linguistic data sets recreationally. It's a totally valid hobby! I tend to gravitate towards data on the disparities between men and women, because gender equality is something that matters to me.&lt;br /&gt;
&lt;br /&gt;
I've had this one data set for a while, which I got from the &lt;a href="http://www.guardian.co.uk/news/datablog/2009/mar/10/gender-educationsgendergap"&gt;Guardian Data Blog&lt;/a&gt;. It's 2006 data compiled by UNESCO on men and women across a number of indicators. The ones of particular interest to me were student enrollment and estimated earned income. The student enrollment data is the percentage of potential students who are currently enrolled as students.&lt;br /&gt;
&lt;br /&gt;
So, for each country for these two indicators, I calculated the ratio of Female/Male, to have one comparable measure. And then I took the log of the ratio, cause that's a good thing to do.&lt;br /&gt;
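&lt;br /&gt;
Here is a minimal sketch, in Python, of one common reason for taking the log of a ratio. The numbers are made up for illustration and do not come from the data set, and the helper name &lt;code&gt;log_ratio&lt;/code&gt; is hypothetical. A raw ratio of 2 and a raw ratio of 1/2 sit at unequal distances from parity (1), while their logs are mirror images around 0, so "twice as much" and "half as much" are treated as equally large imbalances.&lt;br /&gt;
&lt;pre&gt;
```python
import math


def log_ratio(female, male):
    """Natural log of the Female/Male ratio; 0.0 means parity."""
    return math.log(female / male)


# Hypothetical figures, for illustration only (not from the data set).
twice = log_ratio(200.0, 100.0)  # women have double what men have
half = log_ratio(100.0, 200.0)   # women have half what men have

# The raw ratios are 2.0 and 0.5; the logs are equal in size, opposite in sign.
print(twice, half)
```
&lt;/pre&gt;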
&lt;br /&gt;
Before you look at the graph, make a guess. In countries with more gender equality in student enrollment, what do you think happens to gender equality in income?&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-QjdavZ4p1iU/TyL50SwQrmI/AAAAAAAAA_M/-9BvoAsvhF8/s1600/edu_income.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="353" src="http://4.bp.blogspot.com/-QjdavZ4p1iU/TyL50SwQrmI/AAAAAAAAA_M/-9BvoAsvhF8/s400/edu_income.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
The answer is &lt;i&gt;nothing&lt;/i&gt;. And these are not all high income, high education countries either. These are global estimates, not just OECD countries.

&lt;br /&gt;
&lt;br /&gt;
On this graph, the red lines indicate total equality, a 1:1 ratio. What's especially striking about this graph is how many countries are clustered on the right of the red line. There are &lt;i&gt;a lot&lt;/i&gt; of countries where more women are enrolled as students than men. But those countries have no better income equality on average than those countries with extreme education inequality!&lt;br /&gt;
&lt;br /&gt;
This figure plots the density function (an estimate of how many countries are located at each point along the education dimension) and the cumulative distribution function (what percent of countries have that much equality or less).&lt;br /&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-fixkYHJ3bWI/TyL7Xpsi2UI/AAAAAAAAA_U/pSCpCZfR-gc/s1600/densities.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="353" src="http://2.bp.blogspot.com/-fixkYHJ3bWI/TyL7Xpsi2UI/AAAAAAAAA_U/pSCpCZfR-gc/s400/densities.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
In about 60% of the countries in the world, more women are students than men! The US is one of these. Maybe you've heard about it. They're calling it the "crisis of boys". Quite a crisis for boys, that on average they have about 90% of the education, but&amp;nbsp;156% of the money.&lt;br /&gt;
&lt;br /&gt;
I wonder what this means for the education panacea for world problems.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/232777626311457607-7988062161342709908?l=val-systems.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2012/01/distressing-numbers-for-women.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-QjdavZ4p1iU/TyL50SwQrmI/AAAAAAAAA_M/-9BvoAsvhF8/s72-c/edu_income.png" height="72" width="72" /><thr:total>1</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-7795150418056826230</guid><pubDate>Mon, 19 Dec 2011 19:18:00 +0000</pubDate><atom:updated>2011-12-20T22:53:16.347-05:00</atom:updated><title>I don't think it's linguists' fault.</title><description>Whenever media coverage of a linguistic phenomenon goes &lt;a href="http://val-systems.blogspot.com/2011/12/on-vocal-fry.html"&gt;as far off the rails as the recent vocal fry fiasco&lt;/a&gt;, linguists blame themselves. To quote some commentary that &lt;a href="http://dsbigham.net/"&gt;Doug Bigham&lt;/a&gt; posted to Facebook:&lt;br /&gt;
&lt;blockquote&gt;
It's not the journalists' fault; it's ours. We've failed miserably at public outreach because the "leaders of our field" don't believe the public will ever understand what we do and don't care to try and explain it at a level people will understand. [...] The culture of irrelevance we've created for ourselves can't be dismissed with a hand wave...&lt;/blockquote&gt;
That's some tough love, but I'm inclined to disagree. Perhaps linguists, like all academics, have some isolationist tendencies. Doug himself had a lot of trouble drumming up contributions to &lt;a href="http://popularlinguisticsonline.org/"&gt;the Popular Linguistics Magazine&lt;/a&gt;. But I think a more severe problem is that linguists' point of view is actively unwanted.&lt;br /&gt;
&lt;br /&gt;
To flesh out what I mean, I think it's worth speculating about why this particular piece of research on vocal fry captured the collective media imagination. The research itself was very modest in its scope, and there is a vast universe of research out there that media outlets could have chosen to report on. Putting aside the academic press, you could fill hours of television with just the postings to Science Now, where the vocal fry piece first got some play. So why did this particular piece of research get reported on TV, and all over the internet?&lt;br /&gt;
&lt;br /&gt;
The answer lies, I think, in the supposed culprits: young women. This is a very simple case of language shaming. The Today Show clip described vocal fry as "animal-like," and buffered the piece with iconic images of female frivolity: shopping, gossiping, talking about boys, and watching &lt;i&gt;Sex and the City&lt;/i&gt;. The original MSNBC blog post was updated with the "best comment so far" from Facebook, which said&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
"These girls sound like a bunch of neurotic dolphins who do not make sense."&lt;/blockquote&gt;
"Brilliant," says the MSNBC blogger, "can you top that?" Vocal fry has thus been successfully framed as a negative behavior. &lt;br /&gt;
&lt;br /&gt;
Why is vocal fry framed so negatively? Well, it's almost a tautology to say that young women do something, and it is undesirable.&amp;nbsp;Vocal fry is an especially striking case. Before all of this media coverage, no one, except people who work on speech, even knew what it was, or commented on it. Once it was defined and explained, and associated with young women, suddenly it fit snugly into a classic declinism frame, and a linguistic inferiority of women frame.&lt;br /&gt;
&lt;br /&gt;
The supposed motives of young women for doing vocal fry are also a key element in the media coverage. They want to 1) emulate pop artists and 2) fit in with their friends. That is, they are shallow, frivolous, and thoughtless.&amp;nbsp;Really, the tone of the story is only a slightly refined version of &lt;a href="http://www.youtube.com/watch?v=jbhnRuJBHLs"&gt;this&lt;/a&gt;&amp;nbsp;or &lt;a href="http://www.youtube.com/watch?v=v5KmIXZM-V8"&gt;this&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
Perhaps the coverage of vocal fry could be understood as being part of a larger trend of policing the behavior of women. In a lot of ways (dietarily, sexually, physically, professionally, etc.), there is a razor thin range of acceptability for young women, which now apparently includes their pitch contours. If you end your utterances with a final pitch rise, you're doing uptalk (a.k.a. &lt;a href="http://www.youtube.com/watch?v=SCNIBV87wV4"&gt;ending all your sentences with question marks&lt;/a&gt;), and if you end them with falling pitches, you're doing vocal fry.
&lt;br /&gt;
&lt;br /&gt;
&lt;hr /&gt;
&lt;br /&gt;
So where does the work of a linguist fit in here? Could we have provided higher quality research and better facts, in an equally&amp;nbsp;digestible&amp;nbsp;manner? Probably, but I submit that media interest in vocal fry has nothing to do with facts, or the quality of the research. The commentary of a linguist would not add grist to the mill of female inferiority, and would therefore just be ignored. In fact, that's exactly what happened with Janet Pierrehumbert's contribution to the Today Show story. What she said was completely lucid, and contained no technical mumbo jumbo, but the point of the coverage was not to educate, but to shame.&lt;br /&gt;
&lt;br /&gt;
The problem is that most people want to be able to use language as a device to separate the inferior from the superior. This kind of desire surfaces in almost every conversation I have about language with a non-expert. It becomes amplified in the media, and it operates at all levels of the social&amp;nbsp;hierarchy. There is &lt;a href="http://val-systems.blogspot.com/2011/01/grammar-phobia-or-judging-book-by-its.html"&gt;the denigration of people who speak non-standard Englishes&lt;/a&gt;.&amp;nbsp;Then, there is the denigration of women's and youth's speech. At the higher levels of the cultural elite, self-worth can be determined by your choice of &lt;i&gt;&lt;a href="http://val-systems.blogspot.com/2011/12/ignorant-slobs.html"&gt;octopuses, octopi, octopodes&lt;/a&gt;&lt;/i&gt;, or by whether you agree that by saying "&lt;a href="http://val-systems.blogspot.com/2010/08/on-bagel-lady.html"&gt;A whole wheat bagel, please&lt;/a&gt;," you should not have to be asked to specify that you don't want cream cheese.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;This&lt;/i&gt;&amp;nbsp;is the kind of social work that people want to use language for, and it is a frustrating cultural juggernaut to be at cross purposes with. And that is exactly why, in my opinion, most linguistic research does not gain traction in popular discourse. Before we can get to the interesting stuff, we first have to turn &lt;i&gt;everyone's&lt;/i&gt;&amp;nbsp;moral universe upside down.&lt;br /&gt;
&lt;br /&gt;
And that kind of task requires something more than just scientists being open to popularizing their research. We really have to be more aggressive in a way that other sciences don't have to be. Really, it's necessary to be politicized, and I can fully understand that step being a difficult one to take for a researcher.&lt;br /&gt;
&lt;br /&gt;
I see this tension being the biggest roadblock to developing larger social relevance for linguistics. Are we scientists, or are we politicians? Can we be both, effectively?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/232777626311457607-7795150418056826230?l=val-systems.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2011/12/i-dont-think-its-linguists-fault.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><thr:total>4</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-1359992710569971014</guid><pubDate>Sat, 17 Dec 2011 06:40:00 +0000</pubDate><atom:updated>2011-12-20T17:44:02.959-05:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">vocal fry</category><title>On Vocal Fry</title><description>"Vocal fry" has been a trending topic for about a week now. It began with &lt;a href="http://news.sciencemag.org/sciencenow/2011/12/vocal-fry-creeping-into-us-speec.html"&gt;a Science Now post&lt;/a&gt; that starts&amp;nbsp;out ominously&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
A curious vocal pattern has crept into the speech of young adult women who speak American English...&lt;/blockquote&gt;
And then, it exploded. I've seen it posted all over the web, and have largely tried to ignore it. For me, when it comes to reading pieces like these, ignorance is bliss.&lt;br /&gt;
&lt;br /&gt;
But then, &lt;a href="https://twitter.com/#!/dialect"&gt;Lauren Hall-Lew&lt;/a&gt;&amp;nbsp;shared&amp;nbsp;an MSNBC blog post on the topic, entitled "&lt;a href="http://bodyodd.msnbc.msn.com/_news/2011/12/12/9393348-more-college-women-speak-in-creaks-thanks-to-pop-stars"&gt;More college women speak in creaks, thanks to pop stars&lt;/a&gt;." If I were religious, this would call for the serenity prayer. The post comes along with video from the Today Show, with Matt Lauer discussing the phenomenon.
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;center&gt;
&lt;object classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=10,0,0,0" height="245" id="msnbc7d46fb" width="420"&gt;&lt;param name="movie" value="http://www.msnbc.msn.com/id/32545640" /&gt;

&lt;param name="FlashVars" value="launch=45681253&amp;amp;width=420&amp;amp;height=245" /&gt;

&lt;param name="allowScriptAccess" value="always" /&gt;

&lt;param name="allowFullScreen" value="true" /&gt;

&lt;param name="wmode" value="transparent" /&gt;

&lt;embed name="msnbc7d46fb" src="http://www.msnbc.msn.com/id/32545640" width="420" height="245" FlashVars="launch=45681253&amp;amp;width=420&amp;amp;height=245" allowscriptaccess="always" allowFullScreen="true" wmode="transparent" type="application/x-shockwave-flash" pluginspage="http://www.adobe.com/shockwave/download/download.cgi?P1_Prod_Version=ShockwaveFlash"&gt;&lt;/embed&gt;&lt;/object&gt;
&lt;/center&gt;
&lt;br /&gt;
&lt;br /&gt;
What is wrong with this video is everything. There is a brief snippet where they interview a real linguist (Janet Pierrehumbert) who says (I paraphrase) "This isn't a new phenomenon, and it's not caused by pop-stars" (see also, &lt;a href="http://languagelog.ldc.upenn.edu/nll/?p=3626"&gt;the related Language Log post&lt;/a&gt;). But see how much air time that gets! The whole premise of the piece is wrong, and she says so, and they power right along like it's irrelevant. If you were to, say, introduce a political figure on air with the incorrect party or state affiliation, you'd have to apologize on air moments later. If you report that the jury found a defendant guilty when they were actually acquitted, you'd be ripped to shreds.&amp;nbsp;You state a bunch of garbage about language, and an expert tells you you've got it all wrong, oh, whatever, it's more fun this way. On this topic, and most others about language, the media coverage is of the same journalistic quality as "&lt;a href="http://en.wikipedia.org/wiki/Dewey_Defeats_Truman"&gt;Dewey Defeats Truman.&lt;/a&gt;"&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;

What do I know about vocal fry?&lt;/h2&gt;
Frankly, I'm not much of an expert on voice quality or register. I'm especially not too familiar with sociolinguistic work on voice quality, and that kind of knowledge seems to be necessary to evaluate the claims of this story.&lt;br /&gt;
&lt;br /&gt;
However, I have had quite a bit of experience dealing with vocal fry. Vowels and their acoustics are my thing, if you didn't know, and a vowel pronounced with vocal fry can be difficult to measure. I've looked at a lot of vowels, which means I've seen a lot of vocal fry, and have my own impressions about where it occurs. Basically, it happens most often when a speaker's pitch drops, like at a phrase boundary, or sometimes when a voiceless consonant follows the vowel.&lt;br /&gt;
&lt;br /&gt;
I'd agree that there is something more than simple mechanics of articulation going on with the use of vocal fry. There is definitely a stylistic component. I'd also agree, impressionistically, that women tend to do a bit more vocal fry than men, or at least it's more noticeable when they do.&lt;br /&gt;
&lt;br /&gt;
But vocal fry is by no means an exclusively female quality. Arguing from anecdotes is poor form, but here is an example of a relatively high profile male doing a lot of vocal fry.&lt;br /&gt;
&lt;iframe allowfullscreen="" frameborder="0" height="315" src="http://www.youtube.com/embed/loxJ3FtCJJA?rel=0" width="560"&gt;&lt;/iframe&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;

I read the paper.&lt;/h2&gt;
When watching science reporting like this, there's always the possibility that the researchers' work is being misconstrued, either by the media outlet, or by their institution's press office. So, I made good use of my institutional access to academic journals, and read the original paper by Wolk, Abdelli-Beruh &amp;amp; Slavin (2011), which was published in the Journal of Voice (I even &lt;a href="https://twitter.com/#!/search/realtime/jofrhwld%20%23vocalfry"&gt;livetweeted&lt;/a&gt; the process). Here are the claims that rubbed me so wrong about the Today Show clip.&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Use of vocal fry is a new phenomenon.&lt;/li&gt;
&lt;li&gt;Vocal fry is exclusively a female phenomenon.&lt;/li&gt;
&lt;li&gt;Vocal fry is created and spread by figures in popular media (e.g. Ke$ha, Kim Kardashian).&lt;/li&gt;
&lt;/ul&gt;
I read the original paper with the aim of determining whether
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;there is evidence in the paper supporting these claims,&lt;/li&gt;
&lt;li&gt;the researchers themselves made these claims.&lt;/li&gt;
&lt;/ul&gt;
Wolk et al. recorded 34 women between the ages of 18 and 25, both producing a sustained vowel sound, and reading a short passage. Then, three carefully selected sentences from the reading passage were evaluated by trained speech pathologists for whether the speaker was using vocal fry. About 2/3 of the speakers were judged to use vocal fry. They also did some acoustic analysis of the vocal fry.&lt;br /&gt;
&lt;br /&gt;
That is all the evidence that Wolk et al. collected, analyzed, and presented. Needless to say, it provides no support for any of the three points. On the first, they only analyzed one age group, so there is no way to tell if young people do it more or less than older people. Their discussion of background literature actually cites a number of papers from the mid 60s which argue that vocal fry is part of normal speech. So much for it being a new phenomenon. In the discussion, the authors don't outright claim that vocal fry is a new phenomenon, but they do frame the interesting research question as figuring out how much college students do it. They deserve a pass on this point, I think, but they should perhaps consider reframing their research questions as pertaining to a larger cultural pattern.&lt;br /&gt;
&lt;br /&gt;
On vocal fry as an exclusively female phenomenon, I think the structure of this study presupposes that outcome, rather than investigating it. Why study only female college students if you didn't already think that only women did vocal fry? Part of the answer to that seems to be that male subjects are hard to come by for speech pathologists. Wolk et al. cite a previous study of vocal fry that looked at first year speech pathology graduate students. The sample turned out to be 94% female. Abdelli-Beruh, the second author, told the Today Show reporter that 99% of her students are female. Regardless, without a male sample, it's really impossible to draw any hard conclusions about the gender difference. At any rate, Wolk et al. don't outright say that "men don't do it," so I'll give them a pass there.&lt;br /&gt;
&lt;br /&gt;
Now, for the worst part: the all-important influence of popular media figures. There is less than zero evidence presented by Wolk et al. for causal influence of any variety. In fact, they cannot even claim that the patterns they found are primarily social rather than being primarily anatomical, or automatic. However, on page 4, they say
&lt;br /&gt;
&lt;blockquote&gt;
It is possible that these college students have either practiced or observed this vocal register and modeled it to match popular figures.&lt;/blockquote&gt;
They said it. On the basis of zero evidence, they went ahead and said it. This is not a case of the big bad media twisting an earnest researcher's words. These researchers went ahead and speculated in an unsubstantiated and, I think, irresponsible manner.&amp;nbsp;Claims require evidence, and on this point, they have none.
&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;

Vocal Hygiene
&lt;/h2&gt;
This paper also introduced me to a new range of concepts: "vocal abuse," "vocal misuse," and "vocal hygiene." I have to admit, this was all news to me. They sound vaguely familiar as something a professional singer or actor worries about. &lt;br /&gt;
&lt;br /&gt;
But in this paper, there was some speculation that the common use of vocal fry might be detrimental to these speakers' vocal health. This aspect was picked up on in &lt;a href="http://www.npr.org/2011/12/17/143865090/limericks"&gt;the Limericks section of NPR's Wait Wait Don't Tell Me&lt;/a&gt;
&lt;br /&gt;
&lt;blockquote&gt;
That low crack when I sing is my choice, &lt;br /&gt;
but my E.N.T. (Ear, Nose &amp;amp; Throat Specialist) doesn't rejoice.&lt;br /&gt;
I end phrases real low, &lt;br /&gt;
where my cords shouldn't go.&lt;br /&gt;
I'm so cool that I'm hurting my voice.
&lt;/blockquote&gt;
I'm not a speech pathologist, but I'd be surprised if even speakers who use vocal fry at a high rate could do so to an extent that injures them. Wolk et al. actually don't report how &lt;i&gt;often&lt;/i&gt; their speakers used vocal fry, just how many used vocal fry at all (one time out of three sentences). But let's go extreme and say some speakers do it once per sentence with a falling final pitch. This would exclude questions, for instance, or sentences produced with a final rise for some other reason, like uptalk (women just can't win, can they?). That's still not a lot.&lt;br /&gt;
&lt;br /&gt;
I mean, there are languages out there with contrastive creaky voice. That means that in order to say the word you intend to, you &lt;i&gt;have&lt;/i&gt; to use vocal fry.
&lt;br /&gt;
&lt;hr /&gt;
Stay tuned for next time, where I will talk more about the media's coverage, and why &lt;a href="http://val-systems.blogspot.com/2011/12/i-dont-think-its-linguists-fault.html"&gt;I don't think train wrecks like this one are linguists' fault&lt;/a&gt;, which I think is a controversial position among linguists.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/232777626311457607-1359992710569971014?l=val-systems.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2011/12/on-vocal-fry.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://img.youtube.com/vi/loxJ3FtCJJA/default.jpg" height="72" width="72" /><thr:total>13</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-6994663922045645454</guid><pubDate>Wed, 07 Dec 2011 18:50:00 +0000</pubDate><atom:updated>2011-12-07T18:17:34.826-05:00</atom:updated><title>Ignorant Slobs!</title><description>Following up on &lt;a href="http://val-systems.blogspot.com/2011/12/adventures-in-plurality.html"&gt;my plurality post&lt;/a&gt;, Jon Stevens showed me &lt;a href="http://www.youtube.com/watch?v=wFyY2mK8pxk"&gt;this video&lt;/a&gt; done by &lt;a href="http://twitter.com/korystamper"&gt;Kory Stamper&lt;/a&gt;, an associate editor at Merriam-Webster. Based on the comments, it looks like it's gone a little bit viral.
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;center&gt;
&lt;iframe allowfullscreen="" frameborder="0" height="315" src="http://www.youtube.com/embed/wFyY2mK8pxk?rel=0" width="420"&gt;&lt;/iframe&gt;
&lt;/center&gt;&lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
What is so striking to me is the fictional dialogue she presents at the beginning.

&lt;br /&gt;
&lt;blockquote&gt;
So let's say you're swimming in the ocean, and you see some eight legged cephalopods. You say to your friend, "Hey! I saw a group of octopuses." And your friend says, "Hey! &lt;b&gt;You're an ignorant slob!&lt;/b&gt;  You saw a group of octopi."
&lt;/blockquote&gt;
I'm sure that Kory Stamper herself doesn't believe that &lt;a href="http://val-systems.blogspot.com/2010/08/language-use-and-morality.html"&gt;a person's moral fiber is assayable from &lt;i&gt;how&lt;/i&gt; they speak&lt;/a&gt;. Instead, I think she is simply, and accurately, representing the attitude of a great many people who some of us have to deal with quite regularly.&lt;br /&gt;
&lt;br /&gt;
And, I think that the trigger of the "ignorant slob" judgment here is very telling. We're not talking about a non-standard dialect which may, for instance, employ negative concord (a.k.a. double negatives), or feature different verb agreement patterns. &lt;i&gt;Those&lt;/i&gt; people are too far gone to even begin engaging with. We're not even talking about misguided prescriptive proclamations, like "don't end a sentence with a preposition," or "don't use the passive voice." That's high school English class material, unworthy of debate.&lt;br /&gt;
&lt;br /&gt;
No, we are talking about the plural form of &lt;i&gt;octopus&lt;/i&gt;. &lt;br /&gt;
&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-YTFtWviouh4/Tt_I0GzOVKI/AAAAAAAAA-0/nvYuZ6fCJio/s1600/Octopus2.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="243" src="http://4.bp.blogspot.com/-YTFtWviouh4/Tt_I0GzOVKI/AAAAAAAAA-0/nvYuZ6fCJio/s320/Octopus2.jpg" width="320" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;You are unworthy.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
Only performance on a task as esoteric and irrelevant to everyday life as forming the plural of &lt;i&gt;octopus&lt;/i&gt; is adequate to separate the elect from the damned. Woe unto you who accepts the heresy of &lt;i&gt;octopi&lt;/i&gt;. You must accept the Truth of &lt;i&gt;octopodes&lt;/i&gt; into your heart if you don't want to sound like a fucking idiot.&lt;br /&gt;
&lt;br /&gt;
&lt;hr /&gt;
&lt;br /&gt;
On a related note, no matter what their origins were, I suspect prescriptive proclamations like "don't end a sentence in a preposition" and "don't use the passive voice" only continue to be considered virtuous because they are nearly impossible to adhere to. (Hey! A twofer!)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/232777626311457607-6994663922045645454?l=val-systems.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2011/12/ignorant-slobs.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://img.youtube.com/vi/wFyY2mK8pxk/default.jpg" height="72" width="72" /><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-3042245426054476640</guid><pubDate>Tue, 06 Dec 2011 02:57:00 +0000</pubDate><atom:updated>2011-12-08T10:49:16.085-05:00</atom:updated><title>Adventures in Plurality</title><description>Update: December 8, 2011&lt;br /&gt;
I'm going to use this post as a running list of examples of over-latinate plurals.&lt;br /&gt;
&lt;br /&gt;
&lt;hr /&gt;
&lt;br /&gt;
Almost everyone is familiar with the uncertainty surrounding the plurals of words like &lt;i&gt;&lt;a href="http://en.wikipedia.org/wiki/Platypus#Taxonomy_and_etymology"&gt;platypus&lt;/a&gt;&lt;/i&gt;, &lt;i&gt;&lt;a href="http://en.wikipedia.org/wiki/Octopus#Etymology_and_pluralization"&gt;octopus&lt;/a&gt;, &lt;/i&gt;and &lt;i&gt;&lt;a href="http://languagelog.ldc.upenn.edu/nll/?p=2684"&gt;syllabus&lt;/a&gt;&lt;/i&gt;. They look kind of Latin, and a lot of high-profile words with this kind of shape form their plural by changing the last syllable to "&lt;i&gt;i&lt;/i&gt;" (&lt;i&gt;alumni, foci, fungi&lt;/i&gt;). But in these uncertain cases, prescriptivists tell us we are hypercorrecting, and engaging in pseudo-Latin.&lt;br /&gt;
&lt;br /&gt;
But I'm not so sure this is simply a case where people are well educated enough to know the &lt;i&gt;-us&amp;nbsp;&lt;/i&gt;→&amp;nbsp;&lt;i&gt;-i&lt;/i&gt; rule, but not enough to know a Greek word when they see it. For instance, I've seen it overapplied to words which aren't even spelled &lt;i&gt;-us&lt;/i&gt;. At 1:10 in this video, Jon Stewart says&lt;br /&gt;
&lt;blockquote&gt;
"We cannot allow ourselves, to get complacent, for the face of&amp;nbsp;tyranny&amp;nbsp;has many... orifi."&lt;/blockquote&gt;
&lt;div style="background-color: black; width: 520px;"&gt;
&lt;div style="padding: 4px;"&gt;
&lt;embed allowfullscreen="true" allowscriptaccess="always" base="." flashvars="" height="288" src="http://media.mtvnservices.com/mgid:cms:video:thedailyshow.com:250297" type="application/x-shockwave-flash" width="512"&gt;&lt;/embed&gt;&lt;br /&gt;
&lt;div style="background-color: white; font-family: Arial, Helvetica, sans-serif; font-size: 12px; margin-bottom: 0px; margin-top: 4px; padding: 4px; text-align: left;"&gt;
&lt;b&gt;&lt;a href="http://www.thedailyshow.com/watch/mon-september-28-2009/america--target-america"&gt;The Daily Show with Jon Stewart&lt;/a&gt;&lt;/b&gt;&lt;br /&gt;
Get More: &lt;a href="http://www.thedailyshow.com/full-episodes/"&gt;Daily Show Full Episodes&lt;/a&gt;,&lt;a href="http://www.indecisionforever.com/"&gt;Political Humor &amp;amp; Satire Blog&lt;/a&gt;,&lt;a href="http://www.facebook.com/thedailyshow"&gt;The Daily Show on Facebook&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;br /&gt;
Ok, clearly this was done for comedic effect, but I think it's only funny because we recognize &lt;i&gt;"orifi"&lt;/i&gt;&amp;nbsp;as well formed, but prescriptively incorrect.&lt;br /&gt;
&lt;br /&gt;
Even stranger, I recently had an experience where I wasn't quite sure how to form the plural of &lt;i&gt;danish&lt;/i&gt;&amp;nbsp;(as in pastry). I was telling a dinner party that I wasn't very hungry because I'd eaten a few at a coffee shop earlier. I said "I had a few..." and paused, because the first thing that came to my mind was "&lt;i&gt;dani&lt;/i&gt;."&amp;nbsp;Stranger still, my sister, who had seen me eat the offending pastries, offered "Dani?" And we are not alone! Check out this &lt;a href="http://answers.yahoo.com/question/index?qid=20080628202250AAGFVIR"&gt;Yahoo! Question&lt;/a&gt;.
&lt;br /&gt;
&lt;blockquote&gt;
Whats the plural for danish?
Like if you have two danish(es?) is it dani?
Or just danishes?&lt;/blockquote&gt;
So for some people, the semi-productive latinate plural rule doesn't care if it's dealing with &lt;i&gt;s&lt;/i&gt; or &lt;i&gt;sh&lt;/i&gt;.&lt;br /&gt;
&lt;br /&gt;
In some ways, it makes total sense. I'd argue that the sequence [ɨsɨs] isn't the greatest one in the world. Once you've got a rule which would let you avoid it, why not use that all the time?&lt;br /&gt;
&lt;br /&gt;
&lt;hr /&gt;
&lt;br /&gt;
In a note related to irregular plurals, I was once asked in a question period about what kind of "&lt;i&gt;metrices"&lt;/i&gt;&amp;nbsp;I use. This is way more interesting than it initially seems. "Oh, that's just analogy from &lt;i&gt;matrix&lt;/i&gt;," you say, but it isn't quite. The singular form is just &lt;i&gt;metric. &lt;/i&gt;The word doesn't have the appropriate shape to undergo the irregular pluralization until &lt;u&gt;after you've already added the regular plural suffix&lt;/u&gt;! So you wind up with &lt;i&gt;metric&lt;/i&gt; → &lt;i&gt;metrics&lt;/i&gt; → &lt;i&gt;metrices.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
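Here's a minimal sketch of that ordering in Python. It's only a toy: the function names are made up for illustration, and it keys on spelling (&lt;i&gt;-ix&lt;/i&gt; or &lt;i&gt;-ics&lt;/i&gt;) as a stand-in for what is presumably a fact about how the word sounds.&lt;br /&gt;
&lt;pre&gt;
```python
# Toy illustration only: both function names are hypothetical, and the
# spelling-based check is an assumption standing in for the real sound pattern.

def regular_plural(noun):
    """Add the regular English plural suffix (simplified to a bare -s)."""
    return noun + "s"


def latinate_plural(word):
    """Swap a final -ix or -ics for -ices; leave anything else alone."""
    for ending in ("ix", "ics"):
        if word.endswith(ending):
            return word[: -len(ending)] + "ices"
    return word


# "matrix" already has the right shape for the irregular rule.
print(latinate_plural("matrix"))
# Bare "metric" does not, so the rule leaves it untouched.
print(latinate_plural("metric"))
# Only after the regular plural applies does the irregular rule get a foothold.
print(latinate_plural(regular_plural("metric")))
```
&lt;/pre&gt;
The irregular rule has nothing to grab onto in bare &lt;i&gt;metric&lt;/i&gt;; it only applies once the regular plural has already built &lt;i&gt;metrics&lt;/i&gt;.&lt;br /&gt;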
&lt;hr /&gt;
UPDATE: December 7, 2011
&lt;br /&gt;
&lt;a href="http://www.ling.upenn.edu/~hilaryp/"&gt;Hilary Prichard&lt;/a&gt; has pointed me to this (rather depressing) example from Donald Trump, &lt;a href="http://www.washingtonpost.com/opinions/republicans-color-the-abortion-debate/2011/12/06/gIQAbNvpaO_story.html"&gt;discussing his plans on creating a version of the Apprentice for children&lt;/a&gt;
&lt;br /&gt;
&lt;blockquote&gt;
“We’re going to be picking 10 young wonderful children, and we’re going to make them apprenti,” Trump said. “We’re going to have a little fun with it.”&lt;/blockquote&gt;
&lt;br /&gt;
&lt;hr /&gt;
UPDATE: December 8, 2011&lt;br /&gt;
Jon Stevens pointed me to &lt;a href="http://ac360.blogs.cnn.com/2011/12/07/video-stephen-colbert-on-the-ridiculist/"&gt;this segment of Anderson Cooper's show called the RidicuList&lt;/a&gt; (originally broadcast December 7, 2011). At 2:35, Cooper says
&lt;blockquote&gt;
I did this story three different times six months ago on the RidicuList, and some of the video from the Colbert Report that-- Some of the video they used, came from the Third Eagle's video responses to my &lt;b&gt;RidicuLists&lt;/b&gt;. I like to call them &lt;b&gt;Ridiculi&lt;/b&gt;, but you get the point.
&lt;/blockquote&gt;

&lt;object width="416" height="374" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" id="ep"&gt;&lt;param name="allowfullscreen" value="true" /&gt;&lt;param name="allowscriptaccess" value="always" /&gt;&lt;param name="wmode" value="transparent" /&gt;&lt;param name="movie" value="http://i.cdn.turner.com/cnn/.element/apps/cvp/3.0/swf/cnn_416x234_embed.swf?context=embed&amp;videoId=bestoftv/2011/12/07/exp-ac-stephen-colbert-ridiculist.cnn" /&gt;&lt;param name="bgcolor" value="#000000" /&gt;&lt;embed src="http://i.cdn.turner.com/cnn/.element/apps/cvp/3.0/swf/cnn_416x234_embed.swf?context=embed&amp;videoId=bestoftv/2011/12/07/exp-ac-stephen-colbert-ridiculist.cnn" type="application/x-shockwave-flash" bgcolor="#000000" allowfullscreen="true" allowscriptaccess="always" width="416" wmode="transparent" height="374"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/232777626311457607-3042245426054476640?l=val-systems.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2011/12/adventures-in-plurality.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><thr:total>4</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-6732246615986915413</guid><pubDate>Wed, 23 Nov 2011 21:44:00 +0000</pubDate><atom:updated>2011-11-23T17:24:26.839-05:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">linguistics</category><category domain="http://www.blogger.com/atom/ns#">phonology</category><category domain="http://www.blogger.com/atom/ns#">phonetics</category><title>Siri's strange phonotactics</title><description>&lt;a href="http://cogsci.jhu.edu/people/wilson.html"&gt;Colin Wilson&lt;/a&gt; recently gave a talk here at Penn about why speakers don't&amp;nbsp;necessarily&amp;nbsp;say words in a foreign language the way foreign language speakers do. 
For example, the capital of Georgia (the country) is Tbilisi, which has an initial [tb] onset cluster. Here, listen to the pronunciation on Wikipedia: &lt;a href="http://upload.wikimedia.org/wikipedia/commons/6/64/Tbilisi.ogg"&gt;Tbilisi&lt;/a&gt;, then say it back out loud. That's basically the experiment Colin was talking about.&lt;br /&gt;
&lt;br /&gt;
So, I'm guessing that if you didn't manage to say Tbilisi exactly like the recording did, you probably said something like [tɨbilisi], adding in an extra vowel between the [t] and [b]. There are a few different explanations for why you might have added in that extra sound. &lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;You hallucinated, and thought you heard&amp;nbsp;[tɨbilisi].&lt;/li&gt;
&lt;li&gt;You accurately heard&amp;nbsp;[tbilisi], but then when you tried to say it, it came out&amp;nbsp;[tɨbilisi].&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt;
Colin is pursuing another kind of analysis, where the way a Georgian speaker says /tbilisi/ sounds more like the way you would say /tɨbilisi/ in English, than the way you would say /tbilisi/ in English (if you were ever to say such a thing).&lt;br /&gt;
&lt;br /&gt;
It's pretty cool stuff, and strangely reminded me of a similar repetition experiment I inadvertently performed with my iPhone. Here's a video re-enactment:&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;object class="BLOGGER-youtube-video" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0" data-thumbnail-src="http://2.gvt0.com/vi/qsgbAFsbAio/0.jpg" height="266" width="320"&gt;&lt;param name="movie" value="http://www.youtube.com/v/qsgbAFsbAio&amp;fs=1&amp;source=uds" /&gt;


&lt;param name="bgcolor" value="#FFFFFF" /&gt;


&lt;embed width="320" height="266"  src="http://www.youtube.com/v/qsgbAFsbAio&amp;fs=1&amp;source=uds" type="application/x-shockwave-flash"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;/div&gt;
&lt;br /&gt;&lt;/div&gt;
How weird is that! Siri heard me say [ʃəvan], but for some reason repeated it back [sajobən]!&lt;br /&gt;
&lt;br /&gt;
Ok, I guess I &lt;i&gt;really&lt;/i&gt;&amp;nbsp;know what's going on here, and it's not phonotactics, but it's fun to pretend. Clearly, the transcription with the highest probability given my speech was the Irish spelling "Siobhan": P(transcription | audio). &amp;nbsp;But, given the text, the text to speech (P(audio | transcription)) produces&amp;nbsp;[sajobən].&lt;br /&gt;
&lt;br /&gt;
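To make that round trip concrete, here's a toy sketch in Python. Everything in it is an assumption for illustration: the candidate spellings, the scores standing in for P(transcription | audio), and the letter-to-sound table are all made up, and none of it reflects how Siri is actually built.&lt;br /&gt;
&lt;pre&gt;
```python
# Toy illustration only: every word, score, and function name here is made up.

# Hypothetical recognizer scores, standing in for P(transcription | audio).
RECOGNIZER_SCORES = {
    "Siobhan": 0.90,
    "Shivon": 0.07,
    "Chevon": 0.03,
}

# Hypothetical letter-to-sound rules for a purely procedural synthesizer.
LETTER_TO_SOUND = {
    "s": "s", "i": "aj", "o": "o", "b": "b", "h": "", "a": "ə", "n": "n",
}


def recognize(scores):
    """Return the transcription with the highest score."""
    return max(scores, key=scores.get)


def synthesize(text):
    """Sound the word out letter by letter, with no dictionary of names."""
    return "".join(LETTER_TO_SOUND[letter] for letter in text.lower())


spelling = recognize(RECOGNIZER_SCORES)
print(spelling, synthesize(spelling))
```
&lt;/pre&gt;
The recognizer side gets to lean on knowing that "Siobhan" is a likely name, while the synthesizer side in this sketch only ever sees letters, which is how you'd end up with a mismatch between what went in and what came back out.&lt;br /&gt;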
It still strikes me as weird that Siri has some kind of dictionary lookup to give me "Siobhan" for&amp;nbsp;[ʃəvan], but then does a procedural text-to-speech.&lt;br /&gt;
&lt;br /&gt;
P.S. I think that I have an intrusive /l/ after "how" the second time I say "How do you spell Siobhan?".&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/232777626311457607-6732246615986915413?l=val-systems.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2011/11/siris-strange-phonotactics.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-8664450658777621438</guid><pubDate>Fri, 21 Oct 2011 14:26:00 +0000</pubDate><atom:updated>2011-10-21T10:29:55.061-04:00</atom:updated><title>Academia and Innovation?</title><description>Quick post today (the academic year is here, hence my recent silence).&lt;br /&gt;
&lt;br /&gt;
Robert A. Muenchen is maintaining a report &lt;a href="https://sites.google.com/site/r4statistics/popularity"&gt;here&lt;/a&gt; on the popularity of R, a programming environment for statistics.&lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
He's got a bunch of measures, but these really caught my eye. A firm called Rexer Analytics did a survey in 2010 asking respondents which pieces of software they used in 2009. These were the results:&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-tlIzFUq4WzU/TqF-rIqNjEI/AAAAAAAAA-Q/fbjp34Fs12M/s1600/RexerSurvey.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="400" src="http://3.bp.blogspot.com/-tlIzFUq4WzU/TqF-rIqNjEI/AAAAAAAAA-Q/fbjp34Fs12M/s400/RexerSurvey.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
So, R is at the top of the list. KDnuggets did a similar poll, and got very similar results.&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-MAGdgVFYdxI/TqF-qvs-2aI/AAAAAAAAA-A/AwilOBo4FmQ/s1600/Fig_6_KDnuggetsPollLanguages.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="229" src="http://3.bp.blogspot.com/-MAGdgVFYdxI/TqF-qvs-2aI/AAAAAAAAA-A/AwilOBo4FmQ/s320/Fig_6_KDnuggetsPollLanguages.PNG" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;
The takeaway message so far is that a lot of people who do data analysis use R. The plurality, even. That is the zeitgeist.

&lt;br /&gt;
&lt;br /&gt;
Now we come to the results that worry me.&amp;nbsp;Muenchen also did an analysis of Google Scholar citations of software packages, and produced this graph.&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/--jQkFLOAKuc/TqF-qgKhd_I/AAAAAAAAA-I/K8_TQpds-bs/s1600/Fig_7_ScholarlyImpact.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="400" src="http://1.bp.blogspot.com/--jQkFLOAKuc/TqF-qgKhd_I/AAAAAAAAA-I/K8_TQpds-bs/s400/Fig_7_ScholarlyImpact.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
Clearly R has a pretty sharply rising slope, but it still comes in fourth after a bunch of software that, frankly, only academics can use because they get institutional licenses.&lt;br /&gt;
&lt;br /&gt;
I'm not worried because I think academics should be using R (even though I do). It has more to do with the fact that people in academia like to think of themselves as the forward thinkers, and the innovators of new ideas. But in this regard they are clearly &lt;i&gt;following behind&lt;/i&gt;&amp;nbsp;the trend that everyone else is setting. Maybe it's fitting that the SPSS curve looks not unlike what I'd imagine an ivory tower to be.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/232777626311457607-8664450658777621438?l=val-systems.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2011/10/academia-and-innovation.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-tlIzFUq4WzU/TqF-rIqNjEI/AAAAAAAAA-Q/fbjp34Fs12M/s72-c/RexerSurvey.png" height="72" width="72" /><thr:total>4</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-4816656999269236968</guid><pubDate>Thu, 01 Sep 2011 15:15:00 +0000</pubDate><atom:updated>2011-09-01T11:15:41.180-04:00</atom:updated><title>Battlestar Galactica: InvoVis</title><description>I recently re-watched &lt;a href="http://en.wikipedia.org/wiki/Battlestar_Galactica"&gt;Battlestar Galactica&lt;/a&gt; (the re-imagined series). I had never watched the end after the season 4 mid-season break. Over all, I liked the series a lot, but wasn't a big fan of the decidedly anti-modernity finale. Do you know what is great? Medicine, and good odds of not dying in your 40s. You know what's even better? Space ships and faster-than-light travel.&lt;br /&gt;
&lt;br /&gt;
Anyway, I don't want to give away spoilers (even though that wouldn't ruin it for you, &lt;a href="http://ucsdnews.ucsd.edu/newsrel/soc/2011_08spoilers.asp"&gt;so says science&lt;/a&gt;). My point in posting is this cool medical display from season 4 (and maybe earlier, I just noticed it in season 4).&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-X8D90R34l4k/Tl-YbEvMtMI/AAAAAAAAA80/-pIv3pfzkwo/s1600/Screen+Shot+2011-08-24+at+3.20.08+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="246" src="http://2.bp.blogspot.com/-X8D90R34l4k/Tl-YbEvMtMI/AAAAAAAAA80/-pIv3pfzkwo/s400/Screen+Shot+2011-08-24+at+3.20.08+PM.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
I like this display a lot, because it fits in with the general BSG style of keeping things close to current reality, ish. Sure, they have humanoid robots, but they also still use nukes, not photon torpedoes.&lt;br /&gt;
&lt;br /&gt;
I could almost imagine seeing this display today, maybe in a tech company's speculative design video. It appears to incorporate some contemporary data display ideas, like &lt;a href="http://en.wikipedia.org/wiki/Sparkline"&gt;sparklines&lt;/a&gt;. My feeling is that in a lot of sci-fi, data displays like this are a lot more cryptic, and hardly seem practical from the view of an analyst. This display, while definitely looking futuristic, also looks like it's all about practicality.&lt;br /&gt;
&lt;br /&gt;
The element that gets the most screen space is the EKG, which animates and bleeps just like in any medical drama.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-O1QAy1rM3f0/Tl-cgQ4i0MI/AAAAAAAAA84/nGRF3DPTK8U/s1600/ekg.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="85" src="http://4.bp.blogspot.com/-O1QAy1rM3f0/Tl-cgQ4i0MI/AAAAAAAAA84/nGRF3DPTK8U/s400/ekg.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
Then, there are these little widgets.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-90FQR8Q9GVQ/Tl-cx8496KI/AAAAAAAAA88/nawKu-YJCps/s1600/heart_rate.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/-90FQR8Q9GVQ/Tl-cx8496KI/AAAAAAAAA88/nawKu-YJCps/s1600/heart_rate.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
I believe the larger number is the current heart rate. It updates fairly regularly, going up or down a few beats per minute (I missed a whole bunch of dialogue staring at this in the background). The little blue light above the heart rate blinks with every heartbeat, or at least every time the display beeps. I don't know what the smaller number represents. I didn't see it update, so it might not represent dynamic data.&lt;br /&gt;
&lt;br /&gt;
Then, there are these three panels, probably small multiples of some kind.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-TlfHUxibHI4/Tl-dzyM_9zI/AAAAAAAAA9A/CIqBu405yqI/s1600/frequency.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="45" src="http://2.bp.blogspot.com/-TlfHUxibHI4/Tl-dzyM_9zI/AAAAAAAAA9A/CIqBu405yqI/s400/frequency.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
They're largely static, except they redraw themselves every few seconds. So maybe they could be density distributions over a time interval, or maybe frequency analyses.&lt;br /&gt;
&lt;br /&gt;
Then there are these bars.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-jey5TibAV7M/Tl-eck_cP5I/AAAAAAAAA9E/CMO0WiDCbxk/s1600/bars.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="61" src="http://3.bp.blogspot.com/-jey5TibAV7M/Tl-eck_cP5I/AAAAAAAAA9E/CMO0WiDCbxk/s400/bars.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
This is maybe the most vexing element on the display for me. At first I thought that they might display blood sugar or oxygen relative to some baseline, but you'll notice that at some points, there are bars that go both above and below the baseline. So, they have to be two kinds of measures that are usually in a complementary distribution, but not always. Either way, it seems to clearly be a time series at a relatively large granularity, since it never redraws itself during a scene.&lt;br /&gt;
&lt;br /&gt;
Lastly, there's this strip at the bottom.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-WWc-at2YXBc/Tl-fXtKO7SI/AAAAAAAAA9I/OwwBOLu8haU/s1600/spectral.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="16" src="http://1.bp.blogspot.com/-WWc-at2YXBc/Tl-fXtKO7SI/AAAAAAAAA9I/OwwBOLu8haU/s400/spectral.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
It's relatively understated compared to everything else in the display, meaning it can't be any sort of really vital statistic. It looks like maybe a spectral analysis of some kind, or maybe another time series (sleeping and waking time?). This also remains static during scenes.&lt;br /&gt;
&lt;br /&gt;
There are also a lot of elements of the user interface which are very contemporary. Take these boxes for instance.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-aQgevwOp8SQ/Tl-gFs8CRyI/AAAAAAAAA9M/-FnUQIqHGms/s1600/ui.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="18" src="http://3.bp.blogspot.com/-aQgevwOp8SQ/Tl-gFs8CRyI/AAAAAAAAA9M/-FnUQIqHGms/s400/ui.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
I think we all know that if you were to press on the screen on one of those triangles, these little boxes would expand to show more information, or contract and hide the information they're currently displaying. This is definitely something that wouldn't have been incorporated into speculative UI designs 20 years ago.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/232777626311457607-4816656999269236968?l=val-systems.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2011/09/battlestar-galactica-invovis.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/-X8D90R34l4k/Tl-YbEvMtMI/AAAAAAAAA80/-pIv3pfzkwo/s72-c/Screen+Shot+2011-08-24+at+3.20.08+PM.png" height="72" width="72" /><thr:total>1</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-367451739889802429</guid><pubDate>Sun, 28 Aug 2011 22:05:00 +0000</pubDate><atom:updated>2011-08-28T18:08:14.683-04:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">irene</category><title>Irene!</title><description>Well, here in Philadelphia, we've just braved Hurricane Irene. From what I've heard, damage here was relatively minimal, and we haven't lost power. My friends further north in NYC are in my thoughts, cause it looks like they got really hammered. &lt;br /&gt;
&lt;br /&gt;
The silver lining here for me is that I was able to go collect data from &lt;a href="http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=KPAPHILA21"&gt;the Weather Underground station&lt;/a&gt; about six blocks away from where I live. Here are the numbers.&lt;br /&gt;
&lt;br /&gt;
We got 5.68 inches of rain, which fell most steadily between 6PM and midnight last night.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-m-gEG7_dmGI/Tlq590b4zDI/AAAAAAAAA8k/OlE9YQmfb14/s1600/rain.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="250" src="http://1.bp.blogspot.com/-m-gEG7_dmGI/Tlq590b4zDI/AAAAAAAAA8k/OlE9YQmfb14/s400/rain.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
Barometric pressure, on the other hand, hit the floor at 6AM today.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-lDN3DJ7QWFk/Tlq5_uzHIsI/AAAAAAAAA8o/w6cr7geQ6QI/s1600/pressure.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="250" src="http://3.bp.blogspot.com/-lDN3DJ7QWFk/Tlq5_uzHIsI/AAAAAAAAA8o/w6cr7geQ6QI/s400/pressure.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
As for wind speeds, there are two measures from the weather station. Speed is, I believe, average wind speed over the reporting time bin (which varies between 1 and 7 minutes...), and Gust is, I believe, the maximum speed during that time bin. Either way, our max wind speeds were around 11PM last night, and they've stayed pretty high into this afternoon.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-NMQRDJUcR44/Tlq6AItegqI/AAAAAAAAA8s/znJHAYwjG7M/s1600/wind.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="250" src="http://2.bp.blogspot.com/-NMQRDJUcR44/Tlq6AItegqI/AAAAAAAAA8s/znJHAYwjG7M/s400/wind.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
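If my reading of the two wind measures is right, the difference between them can be sketched like this in Python. The function name, the readings, and the miles-per-hour unit are all made up for illustration; I don't know how the station actually computes its numbers.&lt;br /&gt;
&lt;pre&gt;
```python
from statistics import mean


def summarize_bin(samples_mph):
    """Collapse one reporting bin of wind readings into Speed and Gust.

    Assumes Speed is the average over the bin and Gust is the maximum,
    which is only a guess about what the station reports.
    """
    return {"speed": mean(samples_mph), "gust": max(samples_mph)}


# Made-up readings for a single time bin, in miles per hour.
example_bin = [12.0, 18.0, 15.0, 27.0, 13.0]
print(summarize_bin(example_bin))
```
&lt;/pre&gt;
On this reading, Gust can never be lower than Speed within the same bin, and a single strong burst moves Gust a lot more than it moves Speed.&lt;br /&gt;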
&lt;br /&gt;
&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/232777626311457607-367451739889802429?l=val-systems.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2011/08/cumulative-rain-function.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/-m-gEG7_dmGI/Tlq590b4zDI/AAAAAAAAA8k/OlE9YQmfb14/s72-c/rain.png" height="72" width="72" /><thr:total>1</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-2566426274273001635</guid><pubDate>Tue, 23 Aug 2011 19:48:00 +0000</pubDate><atom:updated>2011-08-23T17:02:12.007-04:00</atom:updated><title>Earthquake: Do your part for data collection!</title><description>An earthquake just happened on the East Coast, my first! It turns out the US Geological survey has an online survey for earthquakes called "Did you feel it?" and the data is freely available! So&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://earthquake.usgs.gov/earthquakes/dyfi/events/se/082311a/us/form.en.disabled.html"&gt;Go take the survey!&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
As of now, it looks like survey response has really petered out.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://earthquake.usgs.gov/earthquakes/dyfi/events/se/082311a/us/form.en.disabled.html" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="329" src="http://earthquake.usgs.gov/earthquakes/dyfi/events/us/c0005ild/us/usc0005ild_plot_numresp.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
But you can download the data and some graphs here, &lt;a href="http://earthquake.usgs.gov/earthquakes/dyfi/events/se/082311a/us/index.html"&gt;in the downloads tab&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
I whipped up this quick visualization of the responses.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://earthquake.usgs.gov/earthquakes/dyfi/events/se/082311a/us/form.en.disabled.html" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="640" src="http://2.bp.blogspot.com/-REcaY7SOPJU/TlQDPVGqLjI/AAAAAAAAA8Y/IP8DJMtgN3o/s640/quake.png" width="533" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
Look at that big depressing gap in the response data, right where the epicenter was! And all across Pennsylvania.&lt;br /&gt;
&lt;br /&gt;
If you're from those areas, you really ought to &lt;a href="http://earthquake.usgs.gov/earthquakes/dyfi/events/se/082311a/us/form.en.disabled.html"&gt;go take the survey!&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Update&lt;/h2&gt;Well, I feel a little stupid. It looks like there are two locations on the USGS site for this earthquake, and the one I was looking at is not up-to-date... Maybe I don't feel so stupid, it's not the best kind of design.&lt;br /&gt;
&lt;br /&gt;
The real data to download is &lt;a href="http://earthquake.usgs.gov/earthquakes/dyfi/events/se/082311a/us/index.html"&gt;here&lt;/a&gt;. I've already updated the links above.&lt;br /&gt;
&lt;br /&gt;
And here are the real visualizations. Here's the raw data:&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-zL7ryVNaVc0/TlQU47oGufI/AAAAAAAAA8c/NgU3fujBuLU/s1600/quake2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="640" src="http://2.bp.blogspot.com/-zL7ryVNaVc0/TlQU47oGufI/AAAAAAAAA8c/NgU3fujBuLU/s640/quake2.png" width="564" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
And here are mean values across a grid.&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-u_6Hd23M7dE/TlQU5NPnSbI/AAAAAAAAA8g/9QshFFlFeCA/s1600/quake3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="640" src="http://1.bp.blogspot.com/-u_6Hd23M7dE/TlQU5NPnSbI/AAAAAAAAA8g/9QshFFlFeCA/s640/quake3.png" width="565" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/232777626311457607-2566426274273001635?l=val-systems.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2011/08/earthquake-do-your-part-for-data.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/-REcaY7SOPJU/TlQDPVGqLjI/AAAAAAAAA8Y/IP8DJMtgN3o/s72-c/quake.png" height="72" width="72" /><thr:total>1</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-1313818700794973058</guid><pubDate>Wed, 17 Aug 2011 18:58:00 +0000</pubDate><atom:updated>2011-08-31T22:31:22.120-04:00</atom:updated><title>Does blogging do me any good? A quantitative analysis.</title><description>I've been wondering if blogging does me any good. I don't mean for the heart and soul. I enjoy blogging and am going to keep it up (except for those end-of-semester hiatuses). But I've been wondering if blogging does me any good professionally, or whatever. Obviously, "a professional or whatever good" is hard to define, so I'll define it according to the data that I have.&lt;br /&gt;
&lt;br /&gt;
I maintain, along with this blog, &lt;a href="http://www.ling.upenn.edu/~joseff/"&gt;an academic website&amp;nbsp;&lt;/a&gt;where I have all of my more serious research stuff. I've got Google Analytics set up on both my blog and my academic site, keeping track of page views. So, if I can detect that page views of my blog drive some page views to my academic website, then I'll conclude that blogging is doing me some professional good. This makes a certain kind of sense, since what matters to me at this particular stage of my professional life is getting my ideas out there, and my ideas are catalogued on my academic site.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;
The raw data&lt;/h2&gt;
Here is one year's worth of traffic to Val Systems. Those two huge spikes are thanks to Mark Liberman, who reblogged &lt;a href="http://val-systems.blogspot.com/2010/08/britney-spears-tongue.html"&gt;my post about Britney Spears' tongue&lt;/a&gt;, and to&amp;nbsp;&lt;strike&gt;the Car Talk Guys, who linked to &lt;a href="http://val-systems.blogspot.com/2010/09/pretentious-hole.html"&gt;my post about their short-a system&lt;/a&gt; on the Car Talk site for a bit&lt;/strike&gt; Sociological Images, where &lt;a href="http://val-systems.blogspot.com/2011/01/grammar-phobia-or-judging-book-by-its.html"&gt;I guest posted about a "grammar" book&lt;/a&gt;.&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-NK_fmn6YQ7U/Tkvt5lU7o0I/AAAAAAAAA7c/h4AZxqCTVZU/s1600/blog.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="205" src="http://3.bp.blogspot.com/-NK_fmn6YQ7U/Tkvt5lU7o0I/AAAAAAAAA7c/h4AZxqCTVZU/s400/blog.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
Now here is the traffic from my academic site, and my research page on that site from the same time period.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-_BP550GSMAY/Tkvt6PrscDI/AAAAAAAAA7k/2d6zmN3yDN0/s1600/site.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="206" src="http://3.bp.blogspot.com/-_BP550GSMAY/Tkvt6PrscDI/AAAAAAAAA7k/2d6zmN3yDN0/s400/site.png" width="400" /&gt;&lt;/a&gt;&lt;a href="http://2.bp.blogspot.com/-C4kdems0OV4/Tkvt51xYseI/AAAAAAAAA7g/PneDuLU1rKQ/s1600/research.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="206" src="http://2.bp.blogspot.com/-C4kdems0OV4/Tkvt51xYseI/AAAAAAAAA7g/PneDuLU1rKQ/s400/research.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
As you can see, my academic site gets a &lt;i&gt;lot&lt;/i&gt;&amp;nbsp;fewer page views than my blog. Prospects are not very bright.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;
Autocorrelation&lt;/h2&gt;
My first step of analysis was to figure out how correlated page views of each site were within each site. That is, how correlated are page views on my blog with page views from one day later on my blog, or two days later, etc. To calculate this, I used the &lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;acf()&lt;/span&gt; function in R. Here's the autocorrelation function from my blog. The x-axis represents how many days into the future you're comparing page views, and the y-axis represents the correlation between page views separated by that many days.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-kIcEycQglfE/Tkv3SJDVivI/AAAAAAAAA7o/zDj-_Kj6XYA/s1600/blog.acf.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="308" src="http://3.bp.blogspot.com/-kIcEycQglfE/Tkv3SJDVivI/AAAAAAAAA7o/zDj-_Kj6XYA/s400/blog.acf.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
It looks like page views on my blog are pretty well correlated with the page views from one day before (0.45). After that, the correlation drops off, which I'll interpret as new-post decay. It seems like the influence that a single new post has on my blog traffic is fairly minimal after five days.&lt;br /&gt;
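&lt;br /&gt;
For anyone who wants to see what a number like that is made of, here is a minimal sketch of the calculation in plain Python rather than R. The function name and the page view numbers are made up for illustration, and I'm assuming the usual sample estimator (deviations from the overall mean, normalized by the lag-0 sum of squares), which is, as far as I know, what acf() reports by default.&lt;br /&gt;
&lt;pre&gt;
```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation of a series at lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    # Lag-0 sum of squares: the normaliser that makes lag 0 equal to 1.
    denom = sum(c * c for c in centered)
    result = []
    for lag in range(max_lag + 1):
        # Pair each day with the day `lag` days later.
        num = sum(centered[t] * centered[t + lag] for t in range(n - lag))
        result.append(num / denom)
    return result


# Hypothetical daily page views: a jump on posting day, then a decay.
views = [10, 50, 30, 20, 12, 10, 11, 60, 35, 22, 13, 10, 9, 10]
for lag, r in enumerate(autocorrelation(views, 5)):
    print(lag, round(r, 2))
```
&lt;/pre&gt;
The value at lag 0 is always 1, since every day is perfectly correlated with itself. The interesting part is how quickly the later lags fall away.&lt;br /&gt;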
&lt;br /&gt;
Here's the autocorrelation function for my academic site.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-qEZWX-7Se20/TkwC6i_3DeI/AAAAAAAAA7s/Mp-PpoWyuvI/s1600/site.acf.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="308" src="http://2.bp.blogspot.com/-qEZWX-7Se20/TkwC6i_3DeI/AAAAAAAAA7s/Mp-PpoWyuvI/s400/site.acf.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
As you can see, the overall size of the correlations is much smaller than for the blog. This is most likely because each new post is a new event on my blog, whose effect can last for a few days, whereas nothing &lt;i&gt;happens&lt;/i&gt; on my academic site in the same way. However, there is an apparently cyclic pattern, where page views are most positively correlated at 7-day intervals, and most negatively correlated at 3- to 4-day intervals.&lt;br /&gt;
&lt;br /&gt;
Duh! Who does work on the weekends?&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-pnPW62PXIDc/TkwF_u71D2I/AAAAAAAAA7w/09-NDE5pSW0/s1600/cycle.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="262" src="http://1.bp.blogspot.com/-pnPW62PXIDc/TkwF_u71D2I/AAAAAAAAA7w/09-NDE5pSW0/s400/cycle.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
To factor out this cyclic pattern, I fit a linear regression of page views for my academic site and research page with weekday as a categorical predictor. I'll use the residuals from these regressions for doing the cross-correlation.&lt;br /&gt;
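&lt;br /&gt;
With weekday as the only predictor, the fitted value for each day is just the mean for that weekday, so the residuals are page views minus the weekday mean. Here is a rough sketch of that step in plain Python. The function name and the numbers are hypothetical, and this is a stand-in for the regression, not the code I actually ran.&lt;br /&gt;
&lt;pre&gt;
```python
from collections import defaultdict


def weekday_residuals(views, weekdays):
    """Subtract each weekday's mean from the observations on that weekday."""
    groups = defaultdict(list)
    for value, day in zip(views, weekdays):
        groups[day].append(value)
    # Fitted values of a regression on one categorical predictor are group means.
    means = {day: sum(vals) / len(vals) for day, vals in groups.items()}
    return [value - means[day] for value, day in zip(views, weekdays)]


# Hypothetical two weeks of page views, with quiet weekends.
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"] * 2
views = [20, 22, 21, 23, 19, 8, 7, 24, 20, 25, 21, 17, 6, 9]
residuals = weekday_residuals(views, days)
print([round(r, 1) for r in residuals])
```
&lt;/pre&gt;
What is left over is how unusual each day was for its weekday, which is the quantity worth correlating across sites.&lt;br /&gt;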
&lt;br /&gt;
&lt;h2&gt;
Cross-correlation&lt;/h2&gt;
Next, I checked the cross-correlation of (residualized) page views. This checks to see how correlated page views are between any two of the sites at different time lags. First, here's the cross correlation of my main academic site and my research page. I knew these would have to be highly correlated, since my research page is the most clicked link on my main page. &lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-RKzJO0PRIQk/TkwLNtUe7WI/AAAAAAAAA74/wksX2SYY0CM/s1600/s.r.acf.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="308" src="http://4.bp.blogspot.com/-RKzJO0PRIQk/TkwLNtUe7WI/AAAAAAAAA74/wksX2SYY0CM/s400/s.r.acf.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;
Correlations at negative lags indicate that visits to my research page were correlated with visits to my main academic site a few days later. Positive lags indicate that visits to my academic site were correlated with visits to my research page a few days later. The correlation at lag 0 indicates how correlated visits to my academic site and my research page were on the same day.&lt;br /&gt;
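&lt;br /&gt;
The sign of the lag is easy to get backwards, so here is a small sketch in plain Python that spells one convention out. The function name and data are invented for illustration. I define lag k as the correlation between x[t + k] and y[t], which I believe matches how R's ccf(x, y) labels its lags, but check the documentation before relying on that.&lt;br /&gt;
&lt;pre&gt;
```python
def cross_correlation(x, y, lag):
    """Correlation between x[t + lag] and y[t] for two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cx = [v - mean_x for v in x]
    cy = [v - mean_y for v in y]
    denom = (sum(v * v for v in cx) * sum(v * v for v in cy)) ** 0.5
    # Keep only the t for which both t and t + lag are valid indices.
    start = max(0, -lag)
    stop = min(n, n - lag)
    num = sum(cx[t + lag] * cy[t] for t in range(start, stop))
    return num / denom


# Hypothetical data: the research page echoes the main site two days later.
site = [5, 9, 4, 12, 6, 3, 10, 7, 5, 11, 4, 8]
research = [6, 6] + site[:10]
best_lag = max(range(-3, 4), key=lambda k: cross_correlation(research, site, k))
print(best_lag)
```
&lt;/pre&gt;
Because the research series was built as a two-day echo of the site series, the strongest correlation shows up at a positive lag of two days. A real effect of one site on another would look like a bump off to one side of zero in the same way.&lt;br /&gt;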
&lt;br /&gt;
Unsurprisingly, the only strong correlation between visits to my main academic site and my research page is on the same day. That spike around 10 days makes no sense, so it's probably just noise.&lt;br /&gt;
&lt;br /&gt;
So, drum-roll please, how correlated are visits to my blog and my main academic site?&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-Ic9ujFrgiQQ/TkwL3Z5163I/AAAAAAAAA78/DFY_6G2UwNs/s1600/b.s.acf.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="308" src="http://3.bp.blogspot.com/-Ic9ujFrgiQQ/TkwL3Z5163I/AAAAAAAAA78/DFY_6G2UwNs/s400/b.s.acf.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
I would analyze this as bupkis. Likewise for my research page.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-Cl_Q8yeMdm4/TkwMk6s75YI/AAAAAAAAA8A/2bEZayTPLBE/s1600/b.r.acf.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="308" src="http://1.bp.blogspot.com/-Cl_Q8yeMdm4/TkwMk6s75YI/AAAAAAAAA8A/2bEZayTPLBE/s400/b.r.acf.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;h2&gt;
To sum up&lt;/h2&gt;
It looks like blogging is just a fun diversion for me right now. Even though it would have been a lot of fun to come to my advisor or department chair with strong results that blogging is professionally fruitful, I'm fine with the way things turned out.&lt;br /&gt;
&lt;br /&gt;
However, I shouldn't have been surprised. If I &lt;i&gt;was&lt;/i&gt;&amp;nbsp;trying to use blogging as a platform for promoting my professional work, I wasn't doing it very well. If you're looking at my blog now (vs an RSS subscription), you may notice that I've added some links to the right, which lead to my academic site, and to my github site.&amp;nbsp;Why not try to make blogging work for me a little bit?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/232777626311457607-1313818700794973058?l=val-systems.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2011/08/does-blogging-do-me-any-good.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-NK_fmn6YQ7U/Tkvt5lU7o0I/AAAAAAAAA7c/h4AZxqCTVZU/s72-c/blog.png" height="72" width="72" /><thr:total>0</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-7473814351441658969</guid><pubDate>Mon, 15 Aug 2011 03:05:00 +0000</pubDate><atom:updated>2011-08-14T23:05:44.087-04:00</atom:updated><title>Max Weber on why there is no decision process for research</title><description>In the process of moving, I've come across a bunch of books from my undergrad Sociology minor days, including a book of collected works by Max Weber. You may know him best for the notion of &lt;a href="http://en.wikipedia.org/wiki/Protestant_work_ethic"&gt;the Protestant work ethic&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
At any rate, the volume includes text from a lecture called &lt;i&gt;Science as a Vocation &lt;/i&gt;(&lt;a href="http://www.ne.jp/asahi/moriyuki/abukuma/weber/lecture/science_frame.html"&gt;available free online here&lt;/a&gt;), which I've decided to read through because of its personal relevancy, and I've come across this wonderful paragraph.&lt;br /&gt;
&lt;blockquote&gt;"Nowadays in circles of youth there is a widespread notion that science has become a problem in calculation, fabricated in laboratories or statistical filing systems just as 'in a factory,' a calculation involving only the cool intellect and not one's 'heart and soul.' First of all, one must say that such comments lack all clarity about what goes on in a factory or in a laboratory. In both, some idea has to occur to someone's mind, and it has to be a correct idea, if one is to accomplish anything worthwhile. And such intuition cannot be forced. It has nothing to do with any cold calculation. Certainly calculation is also an indispensable prerequisite. No sociologist, for instance, should think himself too good, even in his old age, to make tens of thousands of quite trivial computations in his head and perhaps for months at a time. One cannot with impunity try to transfer this task entirely to mechanical assistants if one wishes to figure something, even though the final result is often small indeed. But if no 'idea' occurs to his mind about the direction of his computations and, during his computations, about the bearing of the emergent single results, then even this small result will not be yielded."&lt;/blockquote&gt;&lt;br /&gt;
This seems to me to be a nice enough refutation, 90 years prescient, of &lt;a href="http://www.wired.com/science/discoveries/magazine/16-07/pb_theory"&gt;that strange Wired article&lt;/a&gt; from a few years ago which claimed that big-data is going to kill the scientific method.&lt;br /&gt;
&lt;br /&gt;
It also resonates with an issue near and dear to my heart: promoting statistical literacy within linguistics. And that takes a two-pronged approach. The first is developing statistical competency to be able to run and analyze your own statistics, without relying on semi-automated techniques, like stepwise regression, or put slightly differently, transferring the task entirely to mechanical assistants. The second is to be sure to treat statistical methods as tools for investigation, not to reify them as the objects of inquiry themselves, nor their results as god's truth, spoken by its R-acle.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/232777626311457607-7473814351441658969?l=val-systems.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2011/08/max-weber-on-why-there-is-no-decision.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><thr:total>1</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-631612988211521443</guid><pubDate>Tue, 09 Aug 2011 17:09:00 +0000</pubDate><atom:updated>2011-08-09T13:09:51.672-04:00</atom:updated><category domain="http://www.blogger.com/atom/ns#">language</category><title>Miraculous Thought Transference</title><description>&lt;a href="http://www.linuxkungfu.org/images/fun/geek/project.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" src="http://rookery9.aviary.com.s3.amazonaws.com/9323500/9323921_fb3c.png" /&gt;&lt;/a&gt;I've &lt;a href="http://val-systems.blogspot.com/2011/08/language-communication-and-iphone.html"&gt;already blogged&lt;/a&gt; about what I didn't like about &lt;a href="http://www.ted.com/talks/mark_pagel_how_language_transformed_humanity.html"&gt;Mark Pagel's TED talk&lt;/a&gt;. I'm not going to beat up on it more, specifically. 
Rather, I'd like to problematize the meme that he kicked it off with.&lt;br /&gt;
&lt;blockquote&gt;"Each of you possesses the most powerful, dangerous and subversive trait that natural selection has ever devised. It's a piece of neural audio technology for rewiring other people's minds. I'm talking about your &lt;b&gt;language&lt;/b&gt;, of course, because it &lt;b&gt;allows you to implant a thought from your mind directly into someone else's mind&lt;/b&gt;, and they can attempt to do the same to you, without either of you having to perform surgery." [emphasis added]&lt;/blockquote&gt;Hopefully by now, you've caught on to my own subversive juxtaposition. Briefly, I think this meme is cuter than it is true.&lt;br /&gt;
&lt;br /&gt;
I call it a meme, because I seem to recall it showing up in Steven Pinker's &lt;i&gt;The Language Instinct&lt;/i&gt;, and I'm sure it's popped up other places too. Obviously, this meme brushes right up against other issues regarding language and thought. For instance, is language the structure of thought, and does language somehow constrain our thoughts? I'm not well versed enough in these issues to comment, and I only mention them here in order to say that I won't be saying anything about them, except for what I have already said.&lt;br /&gt;
&lt;br /&gt;
Did that make sense? If so, I have succeeded in externalized telepathy. If not, that's sort of my point. Unsuccessful thought implants are a pervasive fact. Just ask the customer and the project leader, or the teacher and the student. If it were so easy to implant thoughts in others' minds, would schooling really take so long?&amp;nbsp;Perhaps thought implant rejection can be blamed on external factors, like inattention on the hearer's part, or the complexity of the thought being transmitted, but I'd be surprised if that was all there was to it.&lt;br /&gt;
&lt;br /&gt;
I'd guess, and this is where I enter into purest speculation, that successful communication between a speaker and hearer has a lot more to do with the fact that people are willing to attribute minds and intentional stances to just about anything, including other people, than with the design specifications of language.&lt;br /&gt;
&lt;br /&gt;
In fact, implanting (false) beliefs in someone else's mind is most definitely possible outside the domain of language. Just ask Marcel Marceau.&lt;br /&gt;
&lt;br /&gt;
&lt;iframe allowfullscreen="" frameborder="0" height="349" src="http://www.youtube.com/embed/i99k7nCnVwM?rel=0" width="425"&gt;&lt;/iframe&gt;&lt;br /&gt;
&lt;br /&gt;
Or, puzzle over this interesting item.&lt;br /&gt;
&lt;br /&gt;
&lt;img border="0" src="http://i.imgur.com/ARV2K.jpg" /&gt;&lt;br /&gt;
&lt;br /&gt;
Perhaps language is better than other natural forms of communication at transmitting propositional content, but it's certainly not ideal for it either. If it were, then there wouldn't have been any need to develop &lt;a href="http://en.wikipedia.org/wiki/Logic"&gt;formal logic&lt;/a&gt;, or &lt;a href="http://en.wikipedia.org/wiki/Propositional_calculus"&gt;propositional calculus&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
So there is the problem that I want to create for this meme. Language does not really "implant a thought from your mind directly into someone else's mind," and insofar as it does, it doesn't do so uniquely above all other forms of communication. It's a pretty meme though, sort of like a poem about linguistics, and it's attention grabbing. But if it matters whether it's true and accurate, I don't think it stands up.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/232777626311457607-631612988211521443?l=val-systems.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2011/08/miraculous-thought-transference.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://img.youtube.com/vi/i99k7nCnVwM/default.jpg" height="72" width="72" /><thr:total>1</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-9118162415430306856</guid><pubDate>Wed, 03 Aug 2011 17:36:00 +0000</pubDate><atom:updated>2011-08-03T18:37:12.905-04:00</atom:updated><title>Language, Communication, and iPhone</title><description>I'm a bit of a caffeine junky. Every day, regardless of where I am, I need to get my fix. I've also been very lucky to do some international traveling, which has put me in the situation where I need a coffee, but I don't speak the local language. And you know what? I've &lt;i&gt;always&lt;/i&gt; successfully ordered and paid for my coffee, and even gotten what I intended to order.&lt;br /&gt;
&lt;br /&gt;
Ok, enough speaking in parables. My point is that communication is not the same thing as language, and even complex economic transactions can be successfully carried out with only communication and no language.&lt;br /&gt;
&lt;br /&gt;
And that's why I'm not a big fan of this TED Talk by Mark Pagel, called &lt;i&gt;&lt;a href="http://www.ted.com/talks/mark_pagel_how_language_transformed_humanity.html"&gt;How language transformed humanity&lt;/a&gt;&lt;/i&gt;.&lt;br /&gt;
&lt;!--copy and paste--&gt;&lt;object width="526" height="374"&gt; &lt;param name="movie" value="http://video.ted.com/assets/player/swf/EmbedPlayer.swf"&gt;&lt;/param&gt;&lt;param name="allowFullScreen" value="true" /&gt;&lt;param name="allowScriptAccess" value="always"/&gt;&lt;param name="wmode" value="transparent"&gt;&lt;/param&gt;&lt;param name="bgColor" value="#ffffff"&gt;&lt;/param&gt;&lt;param name="flashvars" value="vu=http://video.ted.com/talk/stream/2011G/Blank/MarkPagel_2011G-320k.mp4&amp;su=http://images.ted.com/images/ted/tedindex/embed-posters/MarkPagel_2011G-embed.jpg&amp;vw=512&amp;vh=288&amp;ap=0&amp;ti=1203&amp;lang=eng&amp;introDuration=15330&amp;adDuration=4000&amp;postAdDuration=830&amp;adKeys=talk=mark_pagel_how_language_transformed_humanity;year=2011;theme=new_on_ted_com;theme=words_about_words;theme=a_taste_of_tedglobal_2011;theme=evolution_s_genius;event=TEDGlobal+2011;tag=Culture;tag=Science;tag=biology;tag=communication;tag=evolution;tag=language;&amp;preAdTag=tconf.ted/embed;tile=1;sz=512x288;" /&gt;&lt;embed src="http://video.ted.com/assets/player/swf/EmbedPlayer.swf" pluginspace="http://www.macromedia.com/go/getflashplayer" type="application/x-shockwave-flash" wmode="transparent" bgColor="#ffffff" width="526" height="374" allowFullScreen="true" allowScriptAccess="always" flashvars="vu=http://video.ted.com/talk/stream/2011G/Blank/MarkPagel_2011G-320k.mp4&amp;su=http://images.ted.com/images/ted/tedindex/embed-posters/MarkPagel_2011G-embed.jpg&amp;vw=512&amp;vh=288&amp;ap=0&amp;ti=1203&amp;lang=eng&amp;introDuration=15330&amp;adDuration=4000&amp;postAdDuration=830&amp;adKeys=talk=mark_pagel_how_language_transformed_humanity;year=2011;theme=new_on_ted_com;theme=words_about_words;theme=a_taste_of_tedglobal_2011;theme=evolution_s_genius;event=TEDGlobal+2011;tag=Culture;tag=Science;tag=biology;tag=communication;tag=evolution;tag=language;&amp;preAdTag=tconf.ted/embed;tile=1;sz=512x288;"&gt;&lt;/embed&gt; &lt;/object&gt; &lt;br /&gt;
&lt;br /&gt;
I think his introduction is far too simplistic, especially with regard to his passing comments about language acquisition. He says &lt;blockquote&gt;"Just imagine the sense of wonder in a baby when it first discovers that merely by uttering a sound, it can get objects to move across a room, as if by magic, and maybe into its mouth."&lt;/blockquote&gt;It is obvious that there must be more to the secret sauce of language acquisition than that. Even Nim Chimpsky was able to work out that by merely waving his hands around, he could get things into his mouth. Just read his quotations: &lt;a href="http://en.wikipedia.org/wiki/Nim_Chimpsky#Quotations"&gt;Wikipedia/Nim Chimpsky/Quotations&lt;/a&gt;. But Nim never acquired language.&lt;br /&gt;
&lt;br /&gt;
There's also something strangely self-defeating about his entire evolutionary argument. He seems to say that humans evolved language as a means to the end of creating large, modern societies. I'm sure he doesn't &lt;i&gt;really&lt;/i&gt; think it worked like that. Evolution isn't goal-oriented, and he's a biologist. Anyway, the last part of his talk is devoted to the "problem" of language diversity, and how we use it to build barriers between populations. The whole talk, laid out in one sentence, becomes: &lt;blockquote&gt;Humans evolved language in order to encourage cooperation and to build large societies, but then, we actually used it to build divisions between population groups, and that's a problem because of globalization.&lt;/blockquote&gt;How on earth could language be failing at the very goal for which it was apparently evolved?&lt;br /&gt;
&lt;br /&gt;
Now, I'm not saying the world would be exactly the same if there was no language. We probably wouldn't have an iPhone, as Pagel playfully illustrated in his talk. But how much language do we really need to achieve the goal of a large society, and arrive at iPhone? Does language really need to be recursive? If we couldn't say &lt;br /&gt;
&lt;ul&gt;&lt;li&gt;I know [that you hate me].&lt;/li&gt;
&lt;/ul&gt;could we still have arrived at iPhone? Who really needs embedded clauses anyway? On the flip side, what if language were more "permissive," and we &lt;i&gt;could&lt;/i&gt; say&lt;br /&gt;
&lt;ul&gt;&lt;li&gt;What&lt;sub&gt;i&lt;/sub&gt; did you see the man who bought &lt;i&gt;t&lt;sub&gt;i&lt;/sub&gt;&lt;/i&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;br /&gt;
These are technical properties of language I'm talking about. They may seem like little details, but they're actually fundamental to the very nature of language. And it's almost impossible to connect them directly to the evolutionary story Mark Pagel is telling. All that story needs is &lt;i&gt;some&lt;/i&gt; means of communication, but it says nothing about why we have the specific system of language that we do, out of all the possible systems that could have existed.&lt;br /&gt;
&lt;br /&gt;
Needless to say, linguists never concern themselves with questions like "is the evolutionary consequence of high applicatives an iPhone?" and good thing too.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;* * *&lt;/h2&gt;One thing that I did like was that he said "Tower of B[ei]bel." That's the way I say it.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Update&lt;/h2&gt;Apparently Pagel has a habit of saying strange things in public places: &lt;a href="http://languagelog.ldc.upenn.edu/nll/?p=1186"&gt;LanguageLog/Scrabble tips for time travelers?&lt;/a&gt;. Hat tip to Charles Yang.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/232777626311457607-9118162415430306856?l=val-systems.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2011/08/language-communication-and-iphone.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><thr:total>4</thr:total></item><item><guid isPermaLink="false">tag:blogger.com,1999:blog-232777626311457607.post-8086741355897772856</guid><pubDate>Sun, 31 Jul 2011 18:30:00 +0000</pubDate><atom:updated>2011-07-31T14:30:32.193-04:00</atom:updated><title>A Review of Project Nim</title><description>&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.the-numbers.com/video/Project-Nim/Project-Nim-poster.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" height="179" src="http://www.the-numbers.com/video/Project-Nim/Project-Nim-poster.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;i&gt;Project Nim&lt;/i&gt; is a new documentary out about the life of Nim Chimpsky, the chimpanzee that a group of researchers at Columbia tried to teach sign language. Here's a brief synopsis.&lt;br /&gt;
&lt;blockquote&gt;"Let's take a chimpanzee, put it in a house on the Upper West Side with a psychoanalyst who doesn't know anything about chimpanzees, language, language acquisition, or sign language. Also, she has 7 other children in that house. &lt;i&gt;What could go wrong?"&lt;/i&gt;&lt;/blockquote&gt;&lt;br /&gt;
To put Project Nim in some perspective, Nim Chimpsky was born in 1973, which is two years after &lt;a href="http://en.wikipedia.org/wiki/Stanford_prison_experiment"&gt;the Stanford Prison Experiment&lt;/a&gt;, and one year before the first legislation requiring Institutional Review Boards for institutions carrying out human subjects research. This is not to say that most social science research was so by-the-seat-of-their-pants back then, but it &lt;i&gt;was&lt;/i&gt; a different time.&lt;br /&gt;
&lt;br /&gt;
I came away from this film with a few different lessons.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Don't sleep with your advis(or/ee).&lt;/h2&gt;Just don't do it. Twice in the film, two different interviewees said about two different sexual entanglements, "I don't think it affected the science." But, as I heard Christopher Hitchens once say about interview subjects, a guilty mind wants to confess. &lt;br /&gt;
&lt;br /&gt;
The movie starts out with Nim being placed in the home of Stephanie LaFarge to be raised as a human child. Stephanie had 3 children of her own, and her husband had 4, bringing the total residency of her Manhattan brownstone to 7 human children, 2 adults, and 1 baby chimp. This frankly sounds a lot more like a reality TV show than a scientific experiment. Add to that the fact that they gave baby Nim alcohol and pot and that Stephanie breastfed him, and I'm not sure MTV could even air it.&lt;br /&gt;
&lt;br /&gt;
Why on earth was Stephanie LaFarge recruited to be Nim's mother? As far as I can tell, her only qualification was her sexual history with Project Nim PI, Herb Terrace. Her graduate degree was in psychoanalysis. She had no experience with chimpanzee research, or language research of any kind, and in fact, she was hostile to the scientific goals. She wouldn't keep logs, didn't have a project plan, and eventually tried to restrict the other researchers' access to Nim.&lt;br /&gt;
&lt;br /&gt;
The second affair which came up was, again, between the PI, Herb Terrace, and the head teacher on the project, who was only an undergrad at the time. The fallout of this brief relationship led to the head teacher leaving the project.&lt;br /&gt;
&lt;br /&gt;
First of all, I just don't think it's possible to pursue a relationship between a professor and an advisee (especially an undergraduate) in an ethical way. Given the power dynamic, some form of coercion is nearly impossible to avoid. I feel a little uneasy saying so in a public forum, which I think goes to show that this is not a problem that academia left behind in the 1970s.&lt;br /&gt;
&lt;br /&gt;
Secondly, all sorts of strange and bad things happened to the science because of the sex aspect. Nim would never have had such a strange early childhood, and would have had greater constancy with the project, if the PI had not pursued inappropriate relationships.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Beware those with media savvy.&lt;/h2&gt;One frequently hears that scientists in general, and linguists in particular, don't do enough to popularize their research. Occasionally, we are scolded for holing up in our ivory towers, since we are too arrogant to try to share our love of science broadly.&lt;br /&gt;
&lt;br /&gt;
However, I think &lt;i&gt;Project Nim&lt;/i&gt; has a lot to say about the perils of researchers who are a little too keen to popularize their research. One of the ASL teachers on the project described Herb Terrace as an "absentee landlord," who only showed up for photo-ops and media interviews. All in all, the project appears to have been planned far better from a media perspective than from a research perspective. &lt;br /&gt;
&lt;br /&gt;
In case you were unaware, research, even really cool and good research, doesn't just show up on TV out of nowhere. It takes deliberate attempts on the part of the researcher or the university to drum up attention. And everything about this project seems perfectly constructed to be media fodder.&lt;br /&gt;
&lt;br /&gt;
In the meantime, there were &lt;i&gt;serious&lt;/i&gt; problems with the project, mostly having to do with Nim mauling research assistants, which Herb Terrace didn't really address, and which he had a hard time recollecting in the documentary interviews. After the most serious incident, in which Nim nearly bit through an interpreter's face, Terrace's reported reaction was that he was worried she would sue him, or that "it would get out."&lt;br /&gt;
&lt;br /&gt;
It was a little hard for me not to think of Marc Hauser during the movie, another high-profile non-human primate researcher who has recently fallen on hard times due to questionable ethics. The connection between Terrace and Hauser is tenuous, but they run together in my mind, I guess, because they both worked hard to popularize their research.&lt;br /&gt;
&lt;br /&gt;
And this is why I, at least, am frequently wary of active researchers who are also active popularizers of their own research. It seems almost synonymous with sloppy research and compromised ethics in my mind.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Humans are not socialized chimpanzees&lt;/h2&gt;This certainly isn't a new lesson for me, because I've never really thought that humans are just socialized chimpanzees. However, I really like how this point was hammered home in a real way.&lt;br /&gt;
&lt;br /&gt;
In discussions about "human nature," the notion that our "true" nature is somehow more brutish and violent seems to come up a lot. In this conception, society is merely a veneer over our inner chimp.&lt;br /&gt;
&lt;br /&gt;
Well, society didn't do too much to cover over Nim's external chimp. Our "true" human nature is manifest in the activity of all humans, meaning it must be very broad, and non-uniform, but non-arbitrary at the same time.&lt;br /&gt;
&lt;br /&gt;
Interestingly, I've also heard of research trying to figure out if dogs are just socialized wolves. A bunch of researchers tried to raise wolf pups as if they were dogs, a much more achievable task, I think, than raising a chimp as a human. The results were much the same as for Nim. After infancy, the wolves went nuts and tore the place apart, and the experiment had to be abandoned.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;I really liked the movie, and would suggest it to anyone who appreciates a good documentary.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/232777626311457607-8086741355897772856?l=val-systems.blogspot.com' alt='' /&gt;&lt;/div&gt;</description><link>http://val-systems.blogspot.com/2011/07/review-of-project-nim.html</link><author>noreply@blogger.com (Josef Fruehwald)</author><thr:total>0</thr:total></item></channel></rss>

