<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>Heureusement, ici, c'est le Blog!</title>
	
	<link>http://tomflesher.com</link>
	<description>Random thoughts about Canada, economics, baseball, and ... well, that's about it.</description>
	<pubDate>Tue, 23 Feb 2010 21:12:19 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
	<language>en</language>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/tomflesher" /><feedburner:info uri="tomflesher" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><feedburner:browserFriendly></feedburner:browserFriendly><item>
		<title>Quickie: MLB Playoffs by Pitching Statistics</title>
		<link>http://tomflesher.com/2010/02/quickie-mlb-playoffs-by-pitching-statistics/</link>
		<comments>http://tomflesher.com/2010/02/quickie-mlb-playoffs-by-pitching-statistics/#comments</comments>
		<pubDate>Tue, 23 Feb 2010 21:12:19 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
		
		<category><![CDATA[Baseball]]></category>

		<category><![CDATA[Economics]]></category>

		<category><![CDATA[OLS]]></category>

		<category><![CDATA[playoffs]]></category>

		<category><![CDATA[probit]]></category>

		<category><![CDATA[regression]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=73</guid>
		<description><![CDATA[It&#8217;s cold out today. Last night, Buffalo was covered in a thin layer of freezing rain. I&#8217;m trying to stay warm by turning up my hot stove the way only an economist can - crunching the numbers on playoffs.
I&#8217;m re-using the dataset from my Cy Young Predictor a few entries ago in the interest of [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s cold out today. Last night, Buffalo was covered in a thin layer of freezing rain. I&#8217;m trying to stay warm by turning up my hot stove the way only an economist can - crunching the numbers on playoffs.</p>
<p>I&#8217;m re-using the dataset from my Cy Young Predictor a few entries ago in the interest of parsimony. It contains dummy variables <em>teamdivwin</em> and <em>teamwildcard</em> which take value 1 if the pitcher&#8217;s team won the division or the wildcard respectively. I then created a variable <em>playoffs</em> which took the value of the sum of <em>teamdivwin</em> and <em>teamwildcard</em> - just a playoff dummy variable.</p>
<p>Using a Probit model and a standard OLS regression model, I estimated the effects of individual pitching stats on <em>playoffs</em>. Neither model has very strong predictive value (linear has R-squared of about .05), which is unsurprising since it doesn&#8217;t take the team&#8217;s batting into account at all. None of the coefficient values are shocking - in the American League (designated as <em>lg </em>= 1), teams have a higher probability of making the playoffs because there are fewer teams, and although complete games appear to have a negative effect, the positive shutout effect more than makes up for that in both models. I&#8217;m interested in whether complete game wins and complete game losses have differential effects - that will probably be my next snowy-day project.</p>
<p>Results are behind the cut.</p>
<p><span id="more-73"></span></p>
<p>Results:</p>
<p>Call:<br />
glm(formula = playoffs ~ W + SHO + CG + weightedsaves + SV +<br />
Lg + R, family = binomial(link = &#8220;probit&#8221;))</p>
<p>Deviance Residuals:<br />
Min       1Q   Median       3Q      Max<br />
-1.8444  -0.7356  -0.6261  -0.3803   2.4768</p>
<p>Coefficients:<br />
Estimate Std. Error z value Pr(&gt;|z|)<br />
(Intercept)   -0.756627   0.046176 -16.386  &lt; 2e-16 ***<br />
W              0.123523   0.011183  11.046  &lt; 2e-16 ***<br />
SHO            0.187091   0.107494   1.740 0.081774 .<br />
CG            -0.140882   0.060472  -2.330 0.019822 *<br />
weightedsaves -0.076265   0.020332  -3.751 0.000176 ***<br />
SV             0.097770   0.025446   3.842 0.000122 ***<br />
Lg             0.190521   0.050481   3.774 0.000161 ***<br />
R             -0.015532   0.001556  -9.985  &lt; 2e-16 ***<br />
&#8212;<br />
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1</p>
<p>(Dispersion parameter for binomial family taken to be 1)</p>
<p>Null deviance: 3423.4  on 3221  degrees of freedom<br />
Residual deviance: 3251.9  on 3214  degrees of freedom<br />
AIC: 3267.9</p>
<p>Number of Fisher Scoring iterations: 4</p>
<p>Call:<br />
lm(formula = playoffs ~ W + SHO + CG + weightedsaves + SV + Lg +<br />
R)</p>
<p>Residuals:<br />
Min       1Q   Median       3Q      Max<br />
-0.72890 -0.24345 -0.18725 -0.04165  1.01024</p>
<p>Coefficients:<br />
Estimate Std. Error t value Pr(&gt;|t|)<br />
(Intercept)    0.225013   0.013119  17.151  &lt; 2e-16 ***<br />
W              0.035344   0.003105  11.382  &lt; 2e-16 ***<br />
SHO            0.058328   0.030826   1.892 0.058560 .<br />
CG            -0.040513   0.017029  -2.379 0.017417 *<br />
weightedsaves -0.022451   0.005671  -3.959 7.70e-05 ***<br />
SV             0.029226   0.007193   4.063 4.96e-05 ***<br />
Lg             0.055360   0.014435   3.835 0.000128 ***<br />
R             -0.004171   0.000401 -10.401  &lt; 2e-16 ***<br />
&#8212;<br />
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1</p>
<p>Residual standard error: 0.406 on 3214 degrees of freedom<br />
Multiple R-squared: 0.05262,    Adjusted R-squared: 0.05056<br />
F-statistic:  25.5 on 7 and 3214 DF,  p-value: &lt; 2.2e-16</p>
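<p>To see how the two sets of estimates turn into playoff probabilities, here is a minimal Python sketch (the fits above were run in R). It plugs the printed coefficients into the probit link, Φ(xβ), and into the linear model. The stat line is hypothetical, and reading <em>R</em> as runs allowed and <em>weightedsaves</em> as sqrt(SV*GF) are assumptions this post does not spell out.</p>

```python
import math

# Coefficients copied from the probit and OLS summaries above.
PROBIT = {"(Intercept)": -0.756627, "W": 0.123523, "SHO": 0.187091,
          "CG": -0.140882, "weightedsaves": -0.076265, "SV": 0.097770,
          "Lg": 0.190521, "R": -0.015532}
OLS = {"(Intercept)": 0.225013, "W": 0.035344, "SHO": 0.058328,
       "CG": -0.040513, "weightedsaves": -0.022451, "SV": 0.029226,
       "Lg": 0.055360, "R": -0.004171}


def linear_index(coefs, pitcher):
    """Intercept plus the sum of coefficient * stat for every stat supplied."""
    return coefs["(Intercept)"] + sum(coefs[name] * value
                                      for name, value in pitcher.items())


def std_normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(pitcher):
    """Probit prediction: the normal CDF applied to the linear index."""
    return std_normal_cdf(linear_index(PROBIT, pitcher))


def ols_probability(pitcher):
    """Linear probability model: the index itself is the prediction."""
    return linear_index(OLS, pitcher)


# Hypothetical AL starter (Lg = 1); these numbers are illustrative only.
starter = {"W": 15, "SHO": 1, "CG": 2, "weightedsaves": 0.0,
           "SV": 0, "Lg": 1, "R": 80}

print(round(probit_probability(starter), 3), round(ols_probability(starter), 3))
```

<p>The linear model can return values below 0 or above 1 for extreme stat lines, which is one reason to fit the probit alongside it.</p>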
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/02/quickie-mlb-playoffs-by-pitching-statistics/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Research</title>
		<link>http://tomflesher.com/2010/01/research/</link>
		<comments>http://tomflesher.com/2010/01/research/#comments</comments>
		<pubDate>Sun, 17 Jan 2010 17:55:29 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
		
		<category><![CDATA[Academia]]></category>

		<category><![CDATA[US Politics]]></category>

		<category><![CDATA[Amherst]]></category>

		<category><![CDATA[corruption]]></category>

		<category><![CDATA[Harry Williams]]></category>

		<category><![CDATA[housing prices]]></category>

		<category><![CDATA[property tax]]></category>

		<category><![CDATA[reassessments]]></category>

		<category><![CDATA[Research]]></category>

		<category><![CDATA[Satish Mohan]]></category>

		<category><![CDATA[working papers]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=72</guid>
		<description><![CDATA[This semester&#8217;s research note involved data collection and analysis regarding housing prices in Amherst, New York. The paper, which I&#8217;ve posted in PDF format here, contains a detailed description of my methodology and results. Data and SAS code are available by request.
With the usual caveats about sample size (as discussed in the paper), it seems [...]]]></description>
			<content:encoded><![CDATA[<p>This semester&#8217;s research note involved data collection and analysis regarding housing prices in Amherst, New York. The paper, which I&#8217;ve posted in PDF format <a href="http://tomflesher.com/docs/AmherstHousing.pdf">here</a>, contains a detailed description of my methodology and results. Data and SAS code are available by request.</p>
<p>With the usual caveats about sample size (as discussed in the paper), it seems that officials were systematically under-assessed and so carried a lower-than-expected property tax burden.</p>
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/01/research/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Cy Young gives me a headache.</title>
		<link>http://tomflesher.com/2010/01/cy-young-gives-me-a-headache/</link>
		<comments>http://tomflesher.com/2010/01/cy-young-gives-me-a-headache/#comments</comments>
		<pubDate>Fri, 15 Jan 2010 17:01:29 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
		
		<category><![CDATA[Baseball]]></category>

		<category><![CDATA[Economics]]></category>

		<category><![CDATA[Bill James]]></category>

		<category><![CDATA[Cy Young predictor]]></category>

		<category><![CDATA[economics]]></category>

		<category><![CDATA[Eric Gagne]]></category>

		<category><![CDATA[linear regression]]></category>

		<category><![CDATA[R]]></category>

		<category><![CDATA[Rob Neyer]]></category>

		<category><![CDATA[sabermetrics]]></category>

		<category><![CDATA[Tim Lincecum]]></category>

		<category><![CDATA[Weighted saves]]></category>

		<category><![CDATA[Weighted shutouts]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=71</guid>
		<description><![CDATA[As usual, I&#8217;ve started my yearly struggle against a Cy Young predictor. Bill James and Rob Neyer&#8217;s predictor (which I&#8217;ve preserved for posterity here) did a pretty poor job this year, having predicted the wrong winner in both leagues and even getting the order very wrong compared to the actual results. Inside, I&#8217;d like to [...]]]></description>
			<content:encoded><![CDATA[<p>As usual, I&#8217;ve started my yearly struggle against a Cy Young predictor. Bill James and Rob Neyer&#8217;s <a title="ESPN.com" href="http://espn.go.com/mlb/features/cyyoung">predictor</a> (which I&#8217;ve preserved for posterity <a href="http://tomflesher.com/docs/CyPredictor.pdf">here</a>) did a pretty poor job this year, having predicted the wrong winner in both leagues and even getting the order very wrong compared to the <a href="http://www.baseball-reference.com/awards/awards_2009.shtml#ALcya">actual results</a>. Inside, I&#8217;d like to share some of my pain, since I can&#8217;t seem to do much better.</p>
<p><span id="more-71"></span></p>
<p>I&#8217;m using a <a href="http://tomflesher.com/docs/pitchers0509.txt">dataset</a> I culled from baseball-reference.com&#8217;s <a href="http://www.baseball-reference.com/play-index/">Play Index</a> to which I added Cy Young points for each year, as well as a number of binary variables for team division wins, team wildcard appearances, and so on. It includes every player who pitched from the 2005 through 2009 seasons, all told about 3000 observations. Using <a href="http://cran.r-project.org/">R</a>, I tried a number of linear regression models to test their veracity.</p>
<p>First, I tried a variation of the James/Neyer formula, CYP = ((5*IP/9)-ER) + (SO/12) + (SV*2.5) + Shutouts + ((W*6)-(L*2)) + VB. I included IP, ER, SO, SV, SHO, W, L, and VB and got this result:</p>
<p><em>Call:<br />
lm(formula = model &lt;- cypoints ~ IP + ER + SO + SV + SHO + W +<br />
L + VB)</em></p>
<p><em>Residuals:<br />
Min       1Q   Median       3Q      Max<br />
-31.2641  -1.4715   0.1084   0.9949 144.4079</em></p>
<p><em>Coefficients:<br />
Estimate Std. Error t value Pr(&gt;|t|)<br />
(Intercept) -0.1057887  0.2341857  -0.452    0.651<br />
IP           0.0080245  0.0136774   0.587    0.557<br />
ER          -0.0960892  0.0184517  -5.208 2.03e-07 ***<br />
SO           0.0483835  0.0090107   5.370 8.45e-08 ***<br />
SV           0.0001499  0.0218261   0.007    0.995<br />
SHO          5.5749651  0.4340868  12.843  &lt; 2e-16 ***<br />
W            0.5653568  0.0899062   6.288 3.64e-10 ***<br />
L           -0.3987691  0.0901410  -4.424 1.00e-05 ***<br />
VB          -0.0191531  0.3781868  -0.051    0.960<br />
&#8212;<br />
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1</em></p>
<p><em>Residual standard error: 7.977 on 3213 degrees of freedom<br />
Multiple R-squared: 0.1952,     Adjusted R-squared: 0.1932<br />
F-statistic: 97.43 on 8 and 3213 DF,  p-value: &lt; 2.2e-16</em></p>
<p>This isn&#8217;t promising. Over the past five years, these factors aren&#8217;t very predictive at all - the model explains only about 19% of the variation in voting; innings pitched, saves, and the victory bonus aren&#8217;t statistically significant, and the victory bonus has a negative effect. The caveat, of course, is that James and Neyer aren&#8217;t predicting <em>actual</em> Cy Young voting points but rather a statistical construct that shows the relative likelihood that a given pitcher will receive the Cy. I&#8217;m predicting actual Cy Young points. Still, the effects should be similar.</p>
<p>In fact, the model grossly overestimates the proclivity of Cy Young voters for choosing relievers. A pitcher with Saves as his primary statistic hasn&#8217;t been given the Cy since Eric Gagne in 2003. This is a double-edged sword - on the one hand, saves have apparently been historically significant for the Cy, but on the other hand, the voting appears to be trending away from them. The five-year time set I used is a compromise to get enough data without compromising the trend.</p>
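<p>For reference, the James/Neyer formula quoted earlier can be written as a small Python function. This is only a sketch of the formula as it appears in this post; the two stat lines are hypothetical, and VB is passed in as a number because the post does not say how it is awarded.</p>

```python
def cy_young_points(ip, er, so, sv, sho, w, l, vb=0.0):
    """CYP = ((5*IP/9)-ER) + (SO/12) + (SV*2.5) + Shutouts + ((W*6)-(L*2)) + VB."""
    return ((5 * ip / 9) - er) + (so / 12) + (sv * 2.5) + sho + ((w * 6) - (l * 2)) + vb


# Hypothetical stat lines, for illustration only.
ace = cy_young_points(ip=225, er=60, so=240, sv=0, sho=2, w=15, l=7)
closer = cy_young_points(ip=70, er=15, so=80, sv=40, sho=0, w=3, l=3)

print(ace, round(closer, 1))
```

<p>In the hypothetical closer&#8217;s total, the 40 saves alone contribute 100 points, which shows how heavily the formula weights relief work.</p>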
<p>After playing with R for a little while, I ended up creating a few extra measures that seem to capture the voting a little bit better (but not much). First, to approximate the relief effect, I created a &#8220;weighted saves&#8221; statistic that multiplies SV*GF and then takes the square root. To maximize the stat for a given number of games finished, all of those games would be saves. (Every save is a game finished, by definition.) Thus, it helps show that the pitcher was relied on as a clutch player. I did the same thing for Complete Games and Shutouts - weighted shutouts is the square root of CG*SHO. Again, to maximize this, every complete game should be a shutout. It ends up being far more predictive than CG or SHO alone. Finally, to capture the added value of each marginal win and marginal strikeout and the added penalty for each marginal home run and marginal walk, I included the squares of those terms. I also tried a dummy variable for previous year winner, since Lincecum&#8217;s so-so predicted points must have been bumped up by something.</p>
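<p>The derived measures described above can be sketched in Python as follows. The column names (<em>weightedsv</em>, <em>weightedsho</em>, <em>Wsq</em> and so on) follow the model output in this post; the dictionary layout and the sample row are assumptions made for illustration.</p>

```python
import math


def add_derived_stats(row):
    """Return a copy of a pitcher-season row with the derived regressors added."""
    out = dict(row)
    # Weighted saves: sqrt(SV * GF); largest when every game finished is a save.
    out["weightedsv"] = math.sqrt(row["SV"] * row["GF"])
    # Weighted shutouts: sqrt(CG * SHO); largest when every complete game is a shutout.
    out["weightedsho"] = math.sqrt(row["CG"] * row["SHO"])
    # Squared terms let each marginal win, strikeout, home run and walk count differently.
    for stat in ("W", "HR", "K", "BB"):
        out[stat + "sq"] = row[stat] ** 2
    return out


# Hypothetical closer season, for illustration only.
closer = {"SV": 36, "GF": 49, "CG": 0, "SHO": 0, "W": 4, "HR": 5, "K": 70, "BB": 20}
features = add_derived_stats(closer)

print(features["weightedsv"], features["weightedsho"], features["Wsq"])
```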
<p>After playing with the stats with parsimony in mind, I came up with a number of models, the best of which is:</p>
<p><em>Call:<br />
lm(formula = model &lt;- cypoints ~ W + Wsq + HR + HRsq + K + Ksq +<br />
BB + BBsq + weightedsv + weightedsho)</em></p>
<p><em>Residuals:<br />
Min       1Q   Median       3Q      Max<br />
-40.7374  -1.0710  -0.1198   1.1044 122.7243</em></p>
<p><em>Coefficients:<br />
Estimate Std. Error t value Pr(&gt;|t|)<br />
(Intercept)  1.995e-03  2.795e-01   0.007   0.9943<br />
W           -1.295e+00  1.315e-01  -9.844  &lt; 2e-16 ***<br />
Wsq          1.260e-01  7.371e-03  17.091  &lt; 2e-16 ***<br />
HR           1.807e-01  7.286e-02   2.480   0.0132 *<br />
HRsq        -1.499e-02  2.143e-03  -6.996 3.19e-12 ***<br />
K           -8.473e-02  1.642e-02  -5.161 2.61e-07 ***<br />
Ksq          5.972e-04  6.734e-05   8.869  &lt; 2e-16 ***<br />
BB           2.292e-01  3.143e-02   7.292 3.82e-13 ***<br />
BBsq        -2.826e-03  3.041e-04  -9.295  &lt; 2e-16 ***<br />
weightedsv   7.411e-02  1.652e-02   4.487 7.49e-06 ***<br />
weightedsho  2.443e+00  3.252e-01   7.513 7.43e-14 ***<br />
&#8212;<br />
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1</em></p>
<p><em>Residual standard error: 7.245 on 3211 degrees of freedom<br />
Multiple R-squared: 0.3367,     Adjusted R-squared: 0.3346<br />
F-statistic:   163 on 10 and 3211 DF,  p-value: &lt; 2.2e-16</em></p>
<p>It&#8217;s not a great predictor, explaining only about 33% of the variation in points. However, all of the regressors are statistically significant at the 95% level or better, and all but HR are significant at the 99% level. Some of the other models I tried are <a href="http://tomflesher.com/docs/cymodels2009.txt">here</a>, so you can get an idea of how significant or insignificant other stats might have been at predicting the Cy Young winner.</p>
<p>The long and the short of it is, there appears to be very little predictive value for the Cy Young voting with respect to common statistical measures.</p>
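<p>One way to read the negative coefficient on W next to the positive coefficient on Wsq is to compute the implied marginal value of a win. The Python sketch below uses the two printed estimates only; it holds every other regressor fixed, which is a simplification, since wins move together with other stats in practice.</p>

```python
# Estimates for W and Wsq copied from the model output above.
B_W = -1.295
B_WSQ = 0.1260


def win_contribution(w):
    """Points attributed to wins alone: B_W * W + B_WSQ * W^2."""
    return B_W * w + B_WSQ * w * w


def marginal_win(w):
    """Derivative of the win contribution with respect to W."""
    return B_W + 2 * B_WSQ * w


# Win total at which an extra win starts to add predicted points.
turning_point = -B_W / (2 * B_WSQ)

print(round(turning_point, 2), round(win_contribution(20), 2), round(marginal_win(20), 3))
```

<p>Under these estimates an extra win only starts adding predicted points a little past five wins, consistent with voters rewarding high win totals disproportionately.</p>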
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2010/01/cy-young-gives-me-a-headache/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Three Catchers, Four Starters, and Other Playoff Thoughts</title>
		<link>http://tomflesher.com/2009/10/three-catchers-four-starters-and-other-playoff-thoughts/</link>
		<comments>http://tomflesher.com/2009/10/three-catchers-four-starters-and-other-playoff-thoughts/#comments</comments>
		<pubDate>Mon, 26 Oct 2009 15:53:54 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
		
		<category><![CDATA[Baseball]]></category>

		<category><![CDATA[2009 ALCS]]></category>

		<category><![CDATA[Angels]]></category>

		<category><![CDATA[Phillies]]></category>

		<category><![CDATA[pinch hitters]]></category>

		<category><![CDATA[pinch runners]]></category>

		<category><![CDATA[rosters]]></category>

		<category><![CDATA[world series]]></category>

		<category><![CDATA[Yankees]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=69</guid>
		<description><![CDATA[Last night, the LA Angels lost Game 6 of the 2009 ALCS to the New York Yankees. Mike Scioscia started left-handed pitcher Joe Saunders; he carries, as is becoming the norm, three catchers including light-hitting third catcher Bobby Wilson. Joe Girardi also carries three catchers, although his array includes defensive specialist Jose Molina, sometime-DH Jorge [...]]]></description>
			<content:encoded><![CDATA[<p>Last night, the LA Angels lost <a href="http://scores.espn.go.com/mlb/boxscore?gameId=291025110&amp;teams=los-angeles-angels-vs-new-york-yankees">Game 6 of the 2009 ALCS</a> to the New York Yankees. Mike Scioscia started left-handed pitcher Joe Saunders; he carries, as is becoming the norm, three catchers including light-hitting third catcher Bobby Wilson. Joe Girardi also carries three catchers, although his array includes defensive specialist Jose Molina, sometime-DH Jorge Posada, and Francisco Cervelli, who hit .298 in 94 at-bats this season. Though Mike Napoli was hot during the postseason, Scioscia&#8217;s group of catchers wasn&#8217;t as specialized as it was in 2005, when he carried big-hitter Bengie Molina, Jose Molina for his glove, and Josh Paul <a href="http://baseballanalysts.com/archives/2005/10/what_was_josh_p_1.php">for emergencies</a>. Here, he appeared to be carrying three catchers solely <em>because</em> none of them are big hitters. In retrospect, although Napoli and Mathis are both a big part of the Angels clubhouse, Scioscia should have made a move during the regular season to replace one of them with a catcher who was more of the Bengie Molina or Jorge Posada mold - someone whose glove or arm is slightly defective, but who can hit the ball when necessary. Instead, Scioscia was forced to burn two pinch hitters and a second catcher in his attempt to win the game last night, whereas Girardi has in previous games been able to use the traditional approach of starting Molina and using Posada to pinch hit, or starting Posada and using Molina as a defensive replacement late in the game. In a perfect world, Scioscia could have traded Kendry Morales away and acquired Victor Martinez to use mainly at first base and as an emergency third catcher, replacing Wilson&#8217;s more or less dead weight with a big bat but not forgoing any real utility.</p>
<p>In addition, Scioscia started Joe Saunders. This isn&#8217;t a crime in and of itself. However, in the ALCS, he started John Lackey, Saunders, Jered Weaver, and Scott Kazmir. Girardi, meanwhile, is using Joe Torre&#8217;s time-honored trick of carrying only three starters (CC Sabathia, Andy Pettitte, and AJ Burnett) and using traditional long-relief men like David Robertson in addition to standard situational relief like Joba Chamberlain, Damaso Marte, and Mariano Rivera. In Game 6, Saunders went only 3.1 innings. Weaver performed well in relief and, frankly, should have been left there for the duration of the series. Instead, Scioscia spread his men too thin and was left making an all-hands-on-deck call in the late games where he used both Weaver and Kazmir in relief. Saunders pitched brilliantly in Game 2, and Scioscia should have been prepared to maximize his usage of Lackey, Saunders, and Kazmir while leaving Weaver in the bullpen. Granted, Saunders pitched like crap last night, but all pitchers have their off nights.</p>
<p>Finally, Girardi will probably do quite well in the World Series, as he&#8217;s experienced in managing under National League rules. Hideki Matsui, with his legs in bad shape, will be almost entirely useless in the Phillies&#8217; park. In a perfect world, Girardi would be able to dump fifth-outfielder Freddy Guzman and use Matsui in the field. However, that seems unlikely, so Matsui will remain an overpaid pinch-hitter. With Jerry Hairston, Jr., on the bench, Guzman&#8217;s utility as a pinch runner is moderate at best. It would be a gutsy move, but I think Girardi would do best to dump Guzman and bring Shelley Duncan in as a pinch hitter and emergency outfielder.</p>
<p>Still, Girardi gets paid the big bucks to do his job, so I&#8217;m sure every move he makes is well-considered.</p>
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2009/10/three-catchers-four-starters-and-other-playoff-thoughts/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Probability of a perfect game</title>
		<link>http://tomflesher.com/2009/07/probability-of-a-perfect-game/</link>
		<comments>http://tomflesher.com/2009/07/probability-of-a-perfect-game/#comments</comments>
		<pubDate>Sat, 25 Jul 2009 02:00:09 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
		
		<category><![CDATA[Baseball]]></category>

		<category><![CDATA[Baseball Analysts]]></category>

		<category><![CDATA[Buerhle's perfect game]]></category>

		<category><![CDATA[links]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=67</guid>
		<description><![CDATA[Sky over at Baseball Analysts ran some probabilities using on-base percentages to calculate particular pitchers&#8217; probabilities of pitching a perfect game once and over a career. The method&#8217;s simple enough that it&#8217;s easy to calculate for any pitcher.
]]></description>
			<content:encoded><![CDATA[<p>Sky over at <a href="http://baseballanalysts.com/archives/2009/07/perfect_games_a.php">Baseball Analysts</a> ran some probabilities using on-base percentages to calculate particular pitchers&#8217; probabilities of pitching a perfect game once and over a career. The method&#8217;s simple enough that it&#8217;s easy to calculate for any pitcher.</p>
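<p>A minimal Python sketch of that kind of calculation, under my own simplifying assumptions rather than a reproduction of the linked article: every batter reaches base independently with probability equal to the pitcher&#8217;s on-base percentage against, a perfect game means retiring 27 straight, and reaching on an error (which OBP does not count) is ignored. The .300 OBP and 400 starts are hypothetical inputs.</p>

```python
def perfect_game_probability(obp_against, batters=27):
    """Chance of retiring `batters` hitters in a row, each independently."""
    return (1.0 - obp_against) ** batters


def career_probability(obp_against, starts):
    """Chance of at least one perfect game across `starts` independent starts."""
    per_game = perfect_game_probability(obp_against)
    return 1.0 - (1.0 - per_game) ** starts


# Hypothetical pitcher: .300 OBP against over 400 career starts.
single_game = perfect_game_probability(0.300)
career = career_probability(0.300, 400)

print(single_game, career)
```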
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2009/07/probability-of-a-perfect-game/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Manny bidding Manny</title>
		<link>http://tomflesher.com/2009/07/manny-bidding-manny/</link>
		<comments>http://tomflesher.com/2009/07/manny-bidding-manny/#comments</comments>
		<pubDate>Thu, 16 Jul 2009 15:45:56 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
		
		<category><![CDATA[Academia]]></category>

		<category><![CDATA[Baseball]]></category>

		<category><![CDATA[Economics]]></category>

		<category><![CDATA[Albuquerque Isotopes]]></category>

		<category><![CDATA[auctions]]></category>

		<category><![CDATA[Dodgers]]></category>

		<category><![CDATA[Economics haiku]]></category>

		<category><![CDATA[externalities]]></category>

		<category><![CDATA[Manny Ramirez]]></category>

		<category><![CDATA[Pigouvian tax]]></category>

		<category><![CDATA[steroids in baseball]]></category>

		<category><![CDATA[suspension]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=66</guid>
		<description><![CDATA[There&#8217;s been some debate as to whether Manny Ramirez should have been allowed to make his rehab starts in AAA Albuquerque before returning to his Major League club, the Los Angeles Dodgers, after a 50-game suspension for drug use. Behind the cut, I&#8217;d like to think about some of the reasons behind the punishment and [...]]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s been <a href="http://mlb.fanhouse.com/2009/06/25/from-the-windup-manny-ramirez-rehab-assignment-a-farce/">some</a> <a href="http://ballhype.com/story/manny_ramirez_deserves_his_rehab_assignment/">debate</a> as to whether Manny Ramirez should have been allowed to make his rehab starts in AAA Albuquerque before returning to his Major League club, the Los Angeles Dodgers, after a 50-game suspension for drug use. Behind the cut, I&#8217;d like to think about some of the reasons behind the punishment and propose a solution.</p>
<p><span id="more-66"></span></p>
<p>Why was Ramirez suspended? Because he was using a banned substance, yes, but let&#8217;s unpack that. The purpose of the suspension is, presumably, to attempt to align the incentives such that a player who is tempted to use performance-enhancing drugs will find that the expected value of the marginal productivity of the drug use is lower than the expected value of the penalty. To break that down, there&#8217;s a probability π that a player who chooses to use banned substances will go undetected, and a complementary probability (1-π) that he will be detected. As I discussed in an earlier post, we can run a regression and figure out what the various statistics are worth. If the player is rational, he&#8217;ll be considering that using steroids will adjust his stats by some positive amount (i.e., that there will be a marginal product of drug use, <em>MPdu</em>) and that will increase his salary when he next negotiates his contract. It&#8217;s also likely that the increased chance to be voted into the Hall of Fame or win a batting title, for example, will provide non-cash utility to the player, which we could also factor into <em>MPdu</em>. With the complementary probability, the player will be caught and will lose 50 games&#8217; worth of salary (that is, the disincentive is 50*Salary/162, or 25/81 of his salary). Additionally, there will be disutility generated by the fans&#8217; unwillingness to vote for him in the All-Star Game, for example, and the diminished likelihood of making the Hall of Fame.</p>
<p>If π*MPdu &gt; (1-π)*25*Salary/81, then the player will rationally choose to use drugs.</p>
<p>If π*MPdu &lt; (1-π)*25*Salary/81, then the player will rationally choose not to use drugs.</p>
<p>If π*MPdu = (1-π)*25*Salary/81, then the player will be indifferent between using drugs and not using drugs, so either choice makes sense based on the player&#8217;s tastes.</p>
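<p>The three cases above can be sketched as a small Python decision rule. It takes π exactly as it enters the inequalities, that is, as the weight on <em>MPdu</em> (the chance the player keeps the benefit), with (1-π) weighting the lost salary. The salary and <em>MPdu</em> figures are hypothetical and chosen only so that the arithmetic is easy to follow.</p>

```python
import math


def expected_gain(pi, mp_du):
    """Left-hand side of the inequalities: pi * MPdu."""
    return pi * mp_du


def expected_penalty(pi, salary, suspension_games=50, season_games=162):
    """Right-hand side: (1 - pi) * 50 * Salary / 162, i.e. (1 - pi) * 25 * Salary / 81."""
    return (1.0 - pi) * suspension_games * salary / season_games


def rational_choice(pi, mp_du, salary):
    """Return 'use', 'abstain' or 'indifferent' according to the three cases."""
    gain = expected_gain(pi, mp_du)
    penalty = expected_penalty(pi, salary)
    if math.isclose(gain, penalty):
        return "indifferent"
    return "use" if gain > penalty else "abstain"


# Hypothetical player: $16.2M salary, drug use worth $5M to him.
for pi in (0.9, 0.5, 0.1):
    print(pi, rational_choice(pi, mp_du=5_000_000, salary=16_200_000))
```

<p>More testing lowers π, and a longer suspension raises the penalty term; either change pushes the comparison toward abstaining.</p>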
<p>There are two main ways to decrease the proportion of players who use drugs - increase the probability of detection through more testing, or increase the disincentive to be caught using drugs by adding a lump-sum fine or increasing the length of the suspension (the 25 in our model, which is 50/162 reduced to 25/81) or both. I&#8217;m going to presume that 50 games was chosen as the length of the suspension for no good reason other than that it&#8217;s a nice big round number, and thus that the multiplier is essentially arbitrary. I&#8217;m also going to presume that Manny playing for the Isotopes imposes some positive externality on them and on the Dodgers - that the parent club will get better gate receipts from his appearances and that the players will benefit (probably by learning) from playing with a Major League-caliber player. The players in the Dodgers system will presumably be considered for MLB appearances at some point, and so the Dodgers benefit from Ramirez&#8217;s coaching function in his appearances at the AAA level.</p>
<p>(As a side note, a lump-sum fine would be a fine example of a <a href="http://en.wikipedia.org/wiki/Pigouvian_tax">Pigouvian Tax</a>.)</p>
<p>It hardly seems fair that the Dodgers should benefit from Ramirez&#8217;s drug use. How do we solve this problem?</p>
<p><strong>Auction Manny.</strong></p>
<p>When a Major League player is suspended for drug use, allow him to make his 10 rehab starts. However, don&#8217;t automatically grant the right to assign him to the AAA club to the team he plays for. Instead, allow the clubs to bid on the right to have him do his rehab starts for their AAA team. Thus, Manny generates an externality on, say, the Buffalo Bisons (and therefore the Mets); however, the Mets have to pay an amount back to MLB that is, by the definition of an auction, more than anyone else was willing to pay.</p>
<p>A rational team will bid almost as much as they expect the player to generate in ticket sales and general utility, so the system is self-correcting with respect to the fame and ability of the player. However, MLB seems to benefit here. We can&#8217;t really allow that under a fair system, so I propose that the winning team&#8217;s bid be allocated to some combination of baseball development and drug education. Thus, every detected player loses $SuspensionMultiplier*Salary/162 in salary, some arbitrary club is granted the opportunity to profit from a shrewd bid but is unlikely to do so, and some combination of kids and drug education programs benefit about the amount that the arbitrary club feels Manny is worth to them as a AAA player.</p>
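<p>The proposal can be sketched in Python as a first-price sealed-bid auction, which is my reading of &#8220;more than anyone else was willing to pay&#8221;. The club names, bid amounts and the even split between baseball development and drug education are all hypothetical; the post leaves the split open.</p>

```python
def run_rehab_auction(bids, development_share=0.5):
    """Highest bidder wins the rehab assignment and pays its own bid.

    `bids` maps club name to bid; `development_share` is the (assumed)
    fraction of the payment that goes to baseball development.
    """
    if not bids:
        raise ValueError("at least one bid is required")
    winner = max(bids, key=bids.get)  # ties go to the first club listed
    payment = bids[winner]
    to_development = payment * development_share
    return {
        "winner": winner,
        "payment": payment,
        "baseball_development": to_development,
        "drug_education": payment - to_development,
    }


def lost_salary(salary, suspension_games=50, season_games=162):
    """Salary forfeited by the player: SuspensionMultiplier * Salary / 162."""
    return suspension_games * salary / season_games


# Hypothetical sealed bids, for illustration only.
result = run_rehab_auction({"Mets": 250_000.0, "Dodgers": 200_000.0, "Yankees": 240_000.0})
print(result, lost_salary(16_200_000))
```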
<p><em>Minor-league rehab<br />
should not benefit users;</em><br />
<em>auction Manny off.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2009/07/manny-bidding-manny/feed/</wfw:commentRss>
		</item>
		<item>
		<title>K-Rod, Castillo, and Externalities</title>
		<link>http://tomflesher.com/2009/06/k-rod-castillo-and-externalities/</link>
		<comments>http://tomflesher.com/2009/06/k-rod-castillo-and-externalities/#comments</comments>
		<pubDate>Wed, 17 Jun 2009 20:17:53 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
		
		<category><![CDATA[Baseball]]></category>

		<category><![CDATA[Economics]]></category>

		<category><![CDATA[Economics haiku]]></category>

		<category><![CDATA[errors]]></category>

		<category><![CDATA[externalities]]></category>

		<category><![CDATA[K-Rod]]></category>

		<category><![CDATA[Luis Castillo]]></category>

		<category><![CDATA[Mets]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=65</guid>
		<description><![CDATA[On Friday, Luis Castillo committed an error in the bottom of the 9th inning with a one-run lead, two men on base, and two men out. The error was such that had Castillo made the play cleanly, the game would have ended with Francisco Rodriguez notching a save; however, Castillo&#8217;s error was directly responsible for [...]]]></description>
			<content:encoded><![CDATA[<p>On Friday, Luis Castillo <a href="http://news.yahoo.com/s/ap/20090613/ap_on_sp_ba_ne/bbo_mets_yankees_rdp">committed an error in the bottom of the 9th inning with a one-run lead, two men on base, and two men out. </a>The error was such that had Castillo made the play cleanly, the game would have ended with Francisco Rodriguez notching a save; however, Castillo&#8217;s error was directly responsible for two unearned runs scoring, giving Frankie a loss instead of a save.</p>
<p>The question: How much money does Castillo owe Rodriguez? I have a pretty good estimate.</p>
<p><span id="more-65"></span></p>
<p>Let&#8217;s assume, as usual, that clubs base their contract offers on the results the players create, that teams negotiate by placing roughly the same emphasis on the same statistics, and that the market for baseball players is competitive. It is then the case that when a player is a free agent, his stats will dictate the amount of money he&#8217;s offered in his next contract, and therefore we can model (using <a href="http://en.wikipedia.org/wiki/Linear_regression">linear regression</a>) the weight placed on each statistic. In this case, a linear regression model could tell us exactly how much money one save is worth come contract time.</p>
<p>Obviously, this is a complicated procedure, and it would take a lot of work to account for all possible variables, so to simplify matters, I did the following:</p>
<ul type="disc">
<li>Using <a href="http://www.baseball-reference.com/">Baseball-Reference.com</a>&#8217;s Play Index, I found all pitchers seven or more years into their careers (to avoid the inefficiencies of arbitration and rookies being locked into negotiating with their own teams and receiving the league minimum salary) who were free agents at the end of the 2007 season (the last free agency season available on Baseball Reference) and who pitched in relief in at least 80% of their appearances. There were 92.</li>
<li>I threw out pitchers who did not play in 2008, to avoid distorting the salary output.</li>
<li>I found the pitchers&#8217; 2008 salaries and created a <a href="../../../../../docs/07pitchersactive08.txt">data file</a>.</li>
<li>Using that data, I ran a linear regression in R with <a href="../../../../../docs/PitcherSalaryRegression.txt">these results</a>.</li>
</ul>
<p>First, the objections: there are obviously a number of vagaries in the data. We can&#8217;t, for example, easily account for popularity or clubhouse leadership, which are important factors in salary negotiations. This model has an R-squared statistic of .4174, meaning it explains about 42% of the variation in salaries - not a huge number. The sample of pitchers is small, and it doesn&#8217;t account for people who might have had high reserve prices and refused to sign with any team despite having stats that would have helped solidify the model.</p>
<p>However, the SV (saves) statistic is highly statistically significant, and UR (unearned runs) is significant at the 90% level. Thus, we can estimate that if Frankie were to negotiate his contract tomorrow, the offer he received would reflect one fewer save and two more unearned runs than it would have had Castillo not committed his error.</p>
<p>Since a save is worth $111,727, and an unearned run conceded is worth -$311,517, that would mean that Rodriguez would lose $[111727+2*(311517)] = $734,761 if he were to negotiate his contract tomorrow. Of course, he isn&#8217;t - he won&#8217;t negotiate until the end of the 2011 season. Thus, we have to discount twice.</p>
<p>Assuming a 20% annual discount rate, 734761*.8*.8 = 470,247.04, or about $470,000. (We&#8217;ll assume that teams discount the prior years&#8217; performance and focus mainly on the immediately preceding year. 20% is a fairly high level of discounting, meaning that Frankie would be able to make up for lost stats by performing well next year and the year after.) Thus, Luis Castillo may have imposed a $470,000 externality on Francisco Rodriguez&#8217;s next contract negotiation.</p>
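<p>For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The two dollar figures are the regression coefficients quoted above; the function names are hypothetical, and the discounting follows the multiply-by-.8-per-year convention used in this post rather than a formal present-value formula.</p>

```python
# Coefficients quoted in the post (dollars per unit of each statistic).
SAVE_VALUE = 111_727            # value of one save
UNEARNED_RUN_VALUE = -311_517   # value of one unearned run conceded


def contract_loss(saves_lost, unearned_runs_added):
    """Undiscounted salary lost from fewer saves and more unearned runs."""
    return saves_lost * SAVE_VALUE - unearned_runs_added * UNEARNED_RUN_VALUE


def discounted(amount, annual_discount, years):
    """Shrink an amount by a fixed fraction once per year (the post's convention)."""
    return amount * (1 - annual_discount) ** years


# One lost save and two added unearned runs, discounted twice at 20%.
loss = contract_loss(1, 2)
present = discounted(loss, 0.20, 2)

print(loss)               # 734761
print(round(present, 2))  # 470247.04
```

<p>A lower discount rate, or fewer seasons before the next negotiation, would raise the estimate, so the $470,000 figure is only as good as those two assumptions.</p>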
<p>If I were Luis, I&#8217;d offer him a steak dinner instead.</p>
<p><em>Luis Castillo<br />
commits minor league error;<br />
externality.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2009/06/k-rod-castillo-and-externalities/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Quickie: Kiss the Sheff</title>
		<link>http://tomflesher.com/2009/04/quickie-kiss-the-sheff/</link>
		<comments>http://tomflesher.com/2009/04/quickie-kiss-the-sheff/#comments</comments>
		<pubDate>Wed, 22 Apr 2009 14:41:33 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
		
		<category><![CDATA[Baseball]]></category>

		<category><![CDATA[Barry Bonds]]></category>

		<category><![CDATA[collusion]]></category>

		<category><![CDATA[Gary Sheffield]]></category>

		<category><![CDATA[Mets]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=63</guid>
		<description><![CDATA[Why is Gary Sheffield employed for the league minimum when Barry Bonds can&#8217;t get a job?
Sheffield had a Batting Average, On-Base Percentage, Slugging Percentage and On-Base Plus Slugging of .225/.326/.400/.725 in 2008; Bonds was last active in 2007 and hit .276/.480/.565/1.045 (with the .480 OBP leading the National League). Clearly, something&#8217;s wrong. Collusion?
What&#8217;s wrong, in [...]]]></description>
			<content:encoded><![CDATA[<p>Why is Gary Sheffield employed for the league minimum when Barry Bonds can&#8217;t get a job?</p>
<p>Sheffield had a Batting Average, On-Base Percentage, Slugging Percentage and On-Base Plus Slugging of .225/.326/.400/.725 in 2008; Bonds was last active in 2007 and hit .276/.480/.565/1.045 (with the .480 OBP leading the National League). Clearly, something&#8217;s wrong. Collusion?</p>
<p>What&#8217;s wrong, in my estimation, is still that Bonds represents a negative externality on his team&#8217;s production, reputation, and revenue; Sheffield, meanwhile, is less of a threat to ticket sales. Despite being unpopular and saying bizarre things, Sheffield has not yet to my knowledge irritated fans to the extent that Bonds has, nor is he quite the clubhouse menace Bonds is said to be.</p>
<p>Of course, time will tell whether Sheffield produces $400,000 worth of runs for the ailing Mets.</p>
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2009/04/quickie-kiss-the-sheff/feed/</wfw:commentRss>
		</item>
		<item>
		<title>So why doesn’t Nick Swisher pitch every night?</title>
		<link>http://tomflesher.com/2009/04/so-why-doesnt-nick-swisher-pitch-every-night/</link>
		<comments>http://tomflesher.com/2009/04/so-why-doesnt-nick-swisher-pitch-every-night/#comments</comments>
		<pubDate>Wed, 15 Apr 2009 13:51:06 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
		
		<category><![CDATA[Baseball]]></category>

		<category><![CDATA[Economics]]></category>

		<category><![CDATA[Cardinals]]></category>

		<category><![CDATA[Cody Ransom]]></category>

		<category><![CDATA[comparative advantage]]></category>

		<category><![CDATA[Economics haiku]]></category>

		<category><![CDATA[emergency relievers]]></category>

		<category><![CDATA[Gabe Kapler]]></category>

		<category><![CDATA[Joe Girardi]]></category>

		<category><![CDATA[market for pitchers]]></category>

		<category><![CDATA[Moneyball alumni]]></category>

		<category><![CDATA[Nick Swisher]]></category>

		<category><![CDATA[position players pitching]]></category>

		<category><![CDATA[Rays]]></category>

		<category><![CDATA[Scott Spiezio]]></category>

		<category><![CDATA[Wade Boggs]]></category>

		<category><![CDATA[Yankees]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=62</guid>
		<description><![CDATA[Nick Swisher pitched for the first time in the major leagues on Monday night during the Yankees&#8217; 15-5 loss to the Tampa Bay Rays. As you can see from the box score, Swish pitched pretty well. In fact, in 22 pitches, he gave up only one hit and one walk, threw 12 strikes, and struck [...]]]></description>
			<content:encoded><![CDATA[<p>Nick Swisher pitched for the first time in the major leagues on <a href="http://sports.espn.go.com/mlb/boxscore?gameId=290413130">Monday night</a> during the Yankees&#8217; 15-5 loss to the Tampa Bay Rays. As you can see from the box score, Swish pitched pretty well. In fact, in 22 pitches, he gave up only one hit and one walk, threw 12 strikes, and struck out a major-league batter (left-fielder Gabe Kapler). So, will Yankees manager Joe Girardi tap him in relief again soon?</p>
<p>No, of course not. Find out why behind the cut.</p>
<p><span id="more-62"></span></p>
<p>It&#8217;s a tempting story - that a secret, untapped pitching ability lurks inside players known more for their bats, and the idea that someone playing in the outfield could be the world&#8217;s greatest reliever if only they&#8217;d give him the chance. Scott Spiezio <a href="http://www.baseball-reference.com/boxes/OAK/OAK200706150.shtml">pitched once</a> for the St. Louis Cardinals and has a <a href="http://www.facebook.com/groups.php#/group.php?gid=2391859170">Facebook group</a> dedicated to his pitching prowess.</p>
<p>The problem is that a position player pitching has two advantages, one much stronger than the other. The weak advantage is that there&#8217;s no chance to scout a position player before he pitches, with the possible exception of a known pitching threat like Wade Boggs. Even then, it&#8217;s difficult to know what the player has been holding back. The strong advantage is that, well, position players aren&#8217;t very good pitchers.</p>
<p>How does that work? Intuitively, a major-league batter is used to a pitcher performing at a high level. Once he&#8217;s warmed up, he has a set of skills maximized for hitting a 90-plus-mile-per-hour ball thrown at him. Timing has become second nature. This is why changeups are so effective - a player isn&#8217;t expecting a ball being hurled slowly at him, and so he swings as if a fastball were coming. Being thrown nonstop changeups (which is effectively what a position player will do, given that he doesn&#8217;t regularly practice pitching) is jarring and will throw off the batter&#8217;s concentration. To a lesser extent, this is seen when a left-handed pitcher relieves a right-handed pitcher.</p>
<p>Does that make sense? Let&#8217;s make the assumption that a player at the major league level will be used where his manager assumes he will make the strongest contribution to the team, as constrained by the rest of the talent available. Thus, while Swish would make a perfectly cromulent designated hitter on some teams, and plays enough first base to be a starter for some clubs, his best fit for the Yankees is playing the corners in the outfield. It would be economically inefficient and thus irrational for Joe Girardi to start him at, say, shortstop, because he has a better shortstop (Cody Ransom).</p>
<p>So, almost entirely because Swisher is an outfielder, we can assume that he cannot pitch at the major league level. Unpacking this, he lacks some quality - consistency, endurance, speed, control, something like that - and therefore cannot be a consistently good pitcher. However, the penalty for using a player who can&#8217;t pitch consistently shrinks in emergency relief situations, since the cost of exhausting a real reliever outweighs the expected cost of using a non-pitcher to pitch (in most cases, giving up a few runs). Moreover, because he is an outfielder, we know he has the arm strength to throw the ball. (This also explains why catchers, who have to have strong throwing arms and throwing reflexes, are often used as emergency relievers.) So, given that it makes economic sense to use Swisher instead of using, say, Mariano Rivera simply to fulfill the idea that only a relief pitcher should be used as a relief pitcher, it also makes sense that Swisher will perform somewhat well. He lacks only some of the qualities of a good pitcher, not all of them. Once you factor in the lack of preparation that the Rays had to face a jarring series of changeups, and the difficulty of making that mental adjustment, it is perfectly sensible to expect Swisher to have a good outing.</p>
<p>So why doesn&#8217;t Swish pitch every night? For the simple reason that if batters expected to face a slow-hurling outfielder every night, they would dedicate practice time to hitting 75-mile-per-hour fastballs. It would then become inefficient to use Swisher, when a harder-throwing real reliever could get outs with greater predictability.</p>
<p>Sorry, Swish. Great outing, but we won&#8217;t be using you again for a while.</p>
<p><em>Nick Swisher can pitch<br />
Struck out Kapler with a change<br />
Now stay in the field.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2009/04/so-why-doesnt-nick-swisher-pitch-every-night/feed/</wfw:commentRss>
		</item>
		<item>
		<title>The Misery Index</title>
		<link>http://tomflesher.com/2009/04/the-misery-index/</link>
		<comments>http://tomflesher.com/2009/04/the-misery-index/#comments</comments>
		<pubDate>Thu, 02 Apr 2009 16:27:02 +0000</pubDate>
		<dc:creator>tomflesher</dc:creator>
		
		<category><![CDATA[Academia]]></category>

		<category><![CDATA[Economics]]></category>

		<category><![CDATA[US Politics]]></category>

		<category><![CDATA[economics]]></category>

		<category><![CDATA[Economics haiku]]></category>

		<category><![CDATA[macroeconomics]]></category>

		<category><![CDATA[Misery Index]]></category>

		<category><![CDATA[research project ideas]]></category>

		<guid isPermaLink="false">http://tomflesher.com/?p=61</guid>
		<description><![CDATA[The Misery Index is a measure of national economic health derived by adding the unemployment rate to the rate of inflation. It was famously used by Jimmy Carter to declare that Gerald Ford, under whom the rate had risen to 12.5%, had no right to run the country, and then by Ronald Reagan to declare [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://en.wikipedia.org/wiki/Misery_index_(economics)">Misery Index</a> is a measure of national economic health derived by adding the unemployment rate to the rate of inflation. It was famously used by Jimmy Carter to declare that Gerald Ford, under whom the index had risen to 12.5%, had no right to run the country, and then by Ronald Reagan to declare that Carter was unfit for the presidency after it rose to over 20%. (It&#8217;s available in real time at <a href="http://www.miseryindex.us/">MiseryIndex.us</a>.)<span id="more-61"></span></p>
<p>I haven&#8217;t had time to run the numbers, but I&#8217;m a bit dissatisfied with the Misery Index in this case. The most obvious issue is that while inflation is a bad thing, so is deflation; however, under the Index, a high deflation rate is seen to <em>mitigate</em> high unemployment. The second is that steady, targeted inflation is a sign that the economy is growing smoothly and under control.</p>
<p>Again, without crunching the numbers, I can&#8217;t say anything specific, but it seems to me that a formula with nicer properties might measure either the absolute rate of change from one period to the next (capturing volatility fairly cleanly) or, for the less mathematically inclined, the absolute value of the change. The problem of measuring a rate of change is that you&#8217;d need to correct for unemployment as well; measuring rates of change also leaves you sensitive to different lengths of time being measured, whereas the misery index as it stands can be seen as a snapshot.</p>
<p>So, a compromise: set a benchmark - perhaps 3% for inflation and 5% for unemployment, since those are numbers that are bandied about as &#8220;targets.&#8221; Take the snapshot by computing, for each rate, the absolute value of the rate minus its benchmark figure.</p>
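<p>A rough sketch of that compromise in Python, next to the standard index for comparison. The function names are hypothetical, the input figures are invented purely for illustration, and summing the two absolute deviations is my assumption about how the pieces would be combined (it mirrors the way the original index adds its two rates).</p>

```python
def misery_index(inflation, unemployment):
    """Standard Misery Index: inflation rate plus unemployment rate (percent)."""
    return inflation + unemployment


def benchmarked_misery(inflation, unemployment,
                       inflation_target=3.0, unemployment_target=5.0):
    """Sum of absolute deviations from the benchmark rates (percent).

    Assumption: the two deviations are simply added together.
    """
    return (abs(inflation - inflation_target)
            + abs(unemployment - unemployment_target))


# Invented example: 2% deflation with 9% unemployment.
print(misery_index(-2.0, 9.0))        # 7.0: deflation offsets unemployment
print(benchmarked_misery(-2.0, 9.0))  # 9.0: deflation now counts against the economy
print(benchmarked_misery(3.0, 5.0))   # 0.0: both rates exactly on target
```

<p>Under this version, deflation no longer mitigates high unemployment, and an economy sitting exactly on both targets scores zero.</p>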
<p><em>Misery Index<br />
Accurate, but hamfisted<br />
Plausible? Who knows?</em></p>
]]></content:encoded>
			<wfw:commentRss>http://tomflesher.com/2009/04/the-misery-index/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
