So, many exciting things will come of this. First, at the hitter level, we can see how each hitter attacks each part of the zone, and by count. We can of course do even more exciting things than just the plate count: we can look at the sequence that produced the count. He’s in a 2-1 count: did he get there from 2-0 or 1-1? Given how he got there, what is his swing zone? If he got to the 2-1 count from a 2-0 count, did he get there on a swinging strike or a called strike? And based on that, what is his swing zone? We can include whether the pitch was a fastball or a curve. We can split on the previous pitch being a fastball and the current one being a fastball. Really, there’s no end to the combinations. Or, you can simply look at the most high-level view. The user can decide how micro or macro to look at the splits.

Then, think of it from the pitcher’s perspective: compare the swing zones of Verlander and Felix. And repeat all the above variables.

This is the first step. Lots more to come.

The overall numbers: a .561 win% for the home team, which is above the .540 regular season average, but just one standard deviation above it. Breaking it up, though, based on who started the series:

- .605 home win% if they started the series at home
- .514 home win% if they didn’t start the series at home

The first is over 2 standard deviations away. Naturally, look hard enough, and you’ll find something that is 2 standard deviations away. That by itself doesn’t mean anything. But I think we would have been more shocked if those numbers had been flipped. That is, given that the overall was .540, we probably had a prior that the “true” split might have been something like .550/.530, based on whether they started game 1 at home.

Of course, we don’t need to rely on wins, but rather on runs, or just plain ole wOBA.

But we also need to know the underlying talent in those games, since the Game 1 pitchers would likely be disproportionately better than those in games 2 and 3.

Anyway, fascinating premise, and we just need more data at this point.


If you do runs at the seasonal level, then why not runs at the game level? Or at the inning level. That in fact is what RE24 is: it makes sure all the runs are accounted for at the inning level (indeed at the play level). It is actually the closest bridge we have between sabermetrics and the mainstream.

But what about wins? Matt Cain gives up no runs in 9 innings, while in the same game Cliff Lee gives up no runs in TEN innings, a game that the Phillies lose. If the intent is to use wins and losses as a natural end point to make sure things add up, a checksum so to speak, then we want things to add up at the game level. We shouldn’t come up with something that says that the Phillies had 0.45 wins and 0.55 losses in a game they lost, and similarly, the Giants shouldn’t add up to 0.55 wins and 0.45 losses in a game they won.

Well, maybe YOU do. Maybe you actually don’t care about who actually won and lost. You just care about what the players did, and a margin of victory of 1 run and 10 runs should lead to different answers at the team level. Suddenly, to YOU, it’s not just about wins and losses but also about margin of victory. And if you lose two games 1-0 and you win one game 10-0, you don’t have a won-loss record of 1-2, but a won-loss record of 1.9-1.1.

So, this preamble is to set up the article that Bill James wrote here; you can see his responses in the comments area, as well as mine. For those who aren’t members of his site, I’ll copy/paste all of my comments below, plus a tiny snippet from Bill that directly relates to them:

A reader wrote:

The “Luck or Timing” element, as I understand it, is completely ignored in WAR calculations.

Actually…. WAR is a FRAMEWORK. Baseball Reference has its own IMPLEMENTATION, as does Fangraphs. Much like houses: not all look alike, but they all follow the same standards. You supply all the building materials, the SAME building materials, and some will build a house one way, and some will build it another way. Some won’t use all the materials.

So, you definitely could include luck/timing in YOUR implementation of WAR. The WAR framework is there. It’s solid. If you don’t like the houses that BR and Fangraphs built, then no problem! Build your own. And you might be surprised how similar it will look to the others.

***

I like the way Bill James characterizes these things: these are all estimates, from various different viewpoints. There’s a great deal of commonality or overlap. That Win Shares and the various WARs out there will generally agree on, say, the top 10 nonpitchers and top 10 pitchers is a point in favor of having different approaches. It shows that there are multiple ways to estimate the same thing.

If you prefer a different analogy: the “inflation” rate is not something that is just handed down, and it’s not something where there’s exactly one way to calculate it. It’s an estimate. We are trying to model reality with the limited data we have, which itself is subject to potentially a great deal of bias.

That’s all we’re doing here, trying to come up with the best truth we can.

***

As for the consideration of wins: the way Win Shares does it, it allocates whatever it can’t account for in some proportional sense. So, if it’s short 10 wins, then it’ll assign those 10 wins in some manner. Is that necessarily a good thing? Could it be a bad thing? Would we be better off having a bucket that says “timing”? I don’t know.

But you can do this with WAR right now. Just build your own WAR version based off of the Baseball Reference version. And whatever gap remains, just distribute it, say by spreading those +10 wins across the players based on playing time. Or anything, really, that you want.

That’s EXACTLY how I do it here:

http://tangotiger.net/wonloss/index3.php?teamid=SEA&yearid=2001

There you go, a WAR-based system where the wins and losses add up to the team win totals.

***

The question is how to bridge that gap. And does bridging that gap automatically make it better? The method Bill applies, and the method I apply, is simply to distribute that gap with no real meaning. Bill says “let’s make it proportionate to claim points”, and I say “let’s make it proportionate to playing time”. But for all I know, both those choices are worse than simply creating a “I dunno” bucket, because the reality is, we don’t know!

But people don’t want that. Since they know the players played, they want that gap filled… by the players. So, Bill gave it to them, and I give it to them. I don’t know that it’s a good choice. I don’t know it’s a bad choice. But it is the most palatable choice.

So, I challenge the assertion that a system that adds everything up to wins is necessarily good. It’s only good if you can somehow show that the adding-up is done in a way that reflects the wins.

As much as people don’t like WPA, WPA is likely the best way to bridge that gap. But, people come out guns blazing on WPA. So, instead of doing it that way, we simply fill the gap the very simple way.

***

Bill responded, of which I’ll copy a tiny snippet:

... To adjust for it, as Tom says, is speculative—but NOT to adjust for it is equally speculative, and certain to be wrong.

...

I don’t adjust for that over-achievement because the public prefers it; in fact, my perception is that the public would prefer that I NOT adjust for it. I adjust for it because I think you HAVE to adjust for it. It’s too large to ignore, and it creates large-scale inaccuracies if you ignore—just as much as if you ignore fielding or ignore base running. It’s part of the game; you can’t ignore it.

...

Is the won-lost record “real”, or are the individual stats the end point of the line or real accomplishments.

What I was trying to get at in this article is, in part, that if you treat the individual stats as the end point of the line, then you’ve wiped out the games. The games no longer exist; only the individual accomplishments. I don’t think that’s a viable position. I think that we HAVE to treat the outcomes of the games as real events demanding acknowledgement in the analysis.

***

Bill, would you agree, therefore, to be consistent with your goal, that the accounting should work at the game level? For example, regardless of what other Red Sox starters do, we should account for Porcello’s performance not only in his starts, but start by start. And make sure that in games the Sox won, it adds up. And the same with the losses.

As an extreme example, Matt Cain pitched 9 innings of no runs, that went into extra innings. Cliff Lee pitched TEN innings of no runs… and the Phillies eventually lost.

http://www.baseball-reference.com/boxes/SFN/SFN201204180.shtml

If we truly want to encapsulate this game to account for the 1 win for the Giants and 1 loss for the Phillies by assigning all the accounting to the players, we’re going to get into some fairly tough situations.

If you want to take a higher level view, and then just treat this one game as part of a 162 game season, and so, do everything at the seasonal level, then I think Bill is going away from his point that the game is what we have to account for. We’re adding a layer of abstraction by taking advantage of luck (mostly) evening out… and when it doesn’t, we’ll just have less difficult decisions to make than doing it game by game.

So, I don’t know that it’s more wrong or less wrong to just create a “timing” bucket, and dump everything that we don’t know in there.

RJ had asked me for an interview on this, but I told him at the time that I haven’t kept up with DRA enough to be able to offer any good comments. Nonetheless, Jonathan does great work, so DRA is in good hands.

http://www.tangotiger.net/hall/

Then go see the results here. I update it a few times a day.

I’ll guarantee you will say to yourself “ok, just one more, then I’ll stop”. And you won’t stop.

For any ballot, you take the top 8 scores. No bonus points for the 9th and 10th players, and no penalties for leaving spots blank. The max score is 100.

What you need is the top 6 pitchers in whatever system you are looking at. I chose 1998-2016, AL and NL. That gives me 38 sets of 6 pitchers. We figure out the average WAR (or whatever point system you are using) for each ranked player.

The #1 player has an average of 8.01 WAR. The #2 player has 6.94 WAR. All the way down to the #6 player at 5.14 WAR. It looks like this:

- 8.01
- 6.94
- 6.36
- 5.92
- 5.49
- 5.14

We know that with only 5 spots on the ballot, the 6th (and lower) placed players can only be worth 0 points. The 6th place player is essentially a “replacement” level player, the baseline to which everyone is compared. So, from each of them, we remove 5.14 wins (or whatever the point system gives). We have this:

- 2.88
- 1.81
- 1.22
- 0.79
- 0.36
- 0.00

We could simply stop here, but we’d like to have integers, and we’d like the 5th place guy to be worth 1 point. So, we just divide all the numbers by 0.36. That keeps the proportions the same. We get:

- 8.0
- 5.0
- 3.4
- 2.2
- 1.0
- 0.0

So, this approach would suggest a point system of 8-5-3-2-1, rather than 7-4-3-2-1. However, you will note the 3.4 and 2.2 in there. If we take the above scale and multiply it by 84%, we get:

- 6.8
- 4.2
- 2.9
- 1.8
- 0.8
- 0.0

As you can see, everything now rounds off by 0.1 or 0.2 in one direction or the other, rather than two of the spots doing all the rounding. And rounded to the nearest integer, we get…. 7… 4… 3… 2… 1.

So, yeah, 7-4-3-2-1 may SEEM arbitrary, but it’s really a choice between that and 8-5-3-2-1. In either case, Porcello would take it over Verlander.
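The arithmetic above can be sketched in a few lines. The wins-above-6th values and the 84% scale factor are the ones from the text; only the variable names are mine.

```python
# Wins above the 6th-place ("replacement") finisher, per rank, from the text.
above_repl = [2.88, 1.81, 1.22, 0.79, 0.36, 0.00]

# Normalize so the 5th-place spot is worth exactly 1 point.
normalized = [x / above_repl[4] for x in above_repl]

# Scale by 84% so every spot rounds off by a similar small amount,
# then round to integers to get the final point values.
points = [round(0.84 * x) for x in normalized]
print(points)  # [7, 4, 3, 2, 1, 0]
```

So the 7-4-3-2-1 scale falls out of nothing more than the average WAR by ballot rank.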


- 90 Verlander
- 86 Price

But, my system has a little check and balance in there: if you have more wins and better ERA than the guy above you, you bubble above him. So, even though Price overall was 4 points behind, my system would call it a Cy Young for Price. But, it’s so very close. How close? In 2012, Price got 14 1st place votes, and 13 2nd place. Verlander got 13 1st and 13 2nd. That’s how close.

And make no mistake about it, a lot of what my model captures is luck. After all, switch a few voters around, and maybe Verlander gets the Cy in 2012. But it just goes to show how much ERA and Wins drive the award.

***

It’s 2016, and Porcello led the league in wins. He had six more than Verlander, which is six points. However, Verlander nearly led the league in ERA, and was slightly ahead of Porcello, by 2 1/2 earned runs. So, Porcello was +6 points on Wins and -2.5 points on ER. Overall, Porcello was +3.5 points in the big two.

Price, you may remember, was +5 points in the big two, and he led in each. Porcello did not. The system has to draw hard lines, and so it insists that we go to the secondary stats, the IP and the K. And with Verlander’s massive lead in K, and tiny lead in IP, Verlander had a 7.5 point gain on Porcello. Overall, it looks like this:

- 78 Verlander
- 74 Porcello

The exact same 4 point lead that this same Verlander had over Price. And while Price had a tiny lead in both W and ERA over Verlander, enough for the system to seal the win, the system didn’t think as much of Porcello’s huge lead in W and slight deficit in ERA. The system says: “eh, I need the secondary stats… I guess”.

And the Cy voters were just as conflicted. Porcello had 8 1st and 18 2nd, to Verlander’s 14 1st and 2 2nd. The two closest Cy Young races both involved the same guy by the same lead over the other guy.

In trying to model the behaviour of the voters, the system was trying to find the middle ground here: some voters look at W+ERA, and if there’s no clear leader, also look at IP and K. That’s where Verlander got all his 1st place votes. The other voters look at W+ERA, and if there’s a decent enough leader, won’t bother looking at IP and K. That’s where Porcello got all his 1st place votes.

The real head-scratcher is where all the 2nd place votes for Verlander went. Kluber is basically the spoiler here, playing the middle ground, with his ERA and W and K between the two. Voters leaning more on the W had more reason to put Kluber ahead of Verlander.

11 voters had both Kluber+Porcello ahead of Verlander. Only 2 voters had Kluber+Verlander ahead of Porcello. So, you have one camp of voters that looked at W+ERA as a total and not only pushed Porcello ahead of Verlander, but Kluber ahead of Verlander. So, this camp had Porcello 1st and had reason to push Verlander below #2. The other camp of voters went by the point system, the one that had Verlander 4 points ahead of Porcello (and 5 ahead of Kluber). Those guys had Porcello 2nd.

Porcello could only get 1st and 2nd place votes, while Verlander could get 1st through 3rd, and even lower once you introduce Britton, and Porcello-lite (Happ).

You basically had a very conflicted voter group, with not much to separate everyone. And whereas in 2012, it was really a two-man race, in 2016 we had other guys to squeeze between the two. But, as I noted earlier, if you went with instant runoff, and the voter was forced to choose between Porcello and Verlander, it went 16-14 for Porcello.

As you can see, when we adjust Verlander’s WAR, we shouldn’t adjust it based on the fact that the Tigers cost him some ten hits, but rather adjust him so that the Tigers fielders actually HELPED him to two hits.

...with a healthy level of comments from MGL among others:

It looks like your assumption is that pitchers with low BABIP necessarily had very good defense behind them regardless of how good the team’s defense was for the year and vice versa if a pitcher had a high BABIP. So it’s a Bayesian type adjustment, which is indeed correct. Also if B-R is using team DRS to adjust pitcher WAR without regressing that DRS they are making a mistake. Same with using UZR.

This is Poz’s article:

For one thing, I think it’s quite likely that Detroit played EXCELLENT defense behind Verlander, even if they were shaky behind everyone else. I’m not sure how you can expect a defense to allow less than a .256 batting average on balls in play (the second-lowest of Verlander’s career and second lowest in the American League in 2016) or allow just three runners to reach on error all year (the lowest total of Verlander’s career).

This is Bill James:

The logic of the Baseball Reference WAR analysis is that, given the same defense behind them, the same park, Justin Verlander WOULD HAVE allowed significantly fewer runs than Rick Porcello. The question this pushes us to is, Is this actually a reasonable thing to believe? No, it isn’t. Maybe it is a reasonable adjustment in theory, I don’t know. Maybe if we compared 100 different pitchers, this would be a useful and instructive adjustment in the other 98 cases; I don’t know. But we’re talking about this case.

***

In Bill’s article he also had a preamble about pitcher Wins and how they are used in Cy Young voting. It’s Classic Bill, which means he was able to restructure a complex topic into something easy to grasp. The whole thing is a great read. His conclusion:

The Won-Lost record was no longer the king of the library. From 1992 to 2005 other statistics were basically AS important in the Cy Young voting as the won-lost record, and since 2006 the other stats have been MORE important than the won-lost record.

Fast-forward to 2016, and… Verlander… Porcello… Scherzer… all got plenty of Cy Young 1st place votes. In talking with Bill James and Joe Posnanski, the question was how Verlander’s WAR at Baseball Reference could be so much higher than Porcello’s. And it came down to the fact that the Red Sox DRS was near the league high, and the Tigers’ was near the league low, a gap of 104 runs. Since Verlander and Porcello each get about 15% of the balls in play, that would presume a 15-run gap in fielding support between the two… even though their BABIPs were both quite low. Porcello’s is easy enough to understand. But Verlander’s?

And the likely answer is the same as back in 2012: Verlander, once again, was not hurt by his fielders. I have a simple technique to adjust BABIP to account for the team fielders; that is, until I get through the Statcast data to give me a more definitive answer. This is how the Tigers’ BABIP would adjust, if we use the DRS data and my simple technique. As you can see, when we adjust Verlander’s WAR, we shouldn’t adjust it based on the idea that the Tigers cost him some ten hits, but rather adjust him so that the Tigers fielders actually HELPED him to two hits. This is my working theory anyway, and we’ll see how things shake out in the coming weeks.

| BIP | BABIP | NewBABIP | Name |
| --- | ----- | -------- | ---- |
| 155 | 0.252 | 0.258 | Kyle Ryan |
| 550 | 0.256 | 0.261 | Justin Verlander |
| 444 | 0.270 | 0.268 | Michael Fulmer |
| 169 | 0.272 | 0.269 | Francisco Rodriguez |
| 221 | 0.285 | 0.275 | Alex Wilson |
| 280 | 0.286 | 0.275 | Matt Boyd |
| 341 | 0.305 | 0.285 | Jordan Zimmermann |
| 141 | 0.319 | 0.292 | Mark Lowe |
| 441 | 0.320 | 0.292 | Anibal Sanchez |
| 199 | 0.327 | 0.296 | Daniel Norris |
| 412 | 0.330 | 0.297 | REST of Tigers |
| 166 | 0.331 | 0.298 | Shane Greene |
| 161 | 0.342 | 0.303 | Justin Wilson |
| 416 | 0.349 | 0.307 | Mike Pelfrey |

For the above-noted voter, it basically becomes a balance of ERA, Win%, and FIP:

IP/2 - ER

+ W*5

- L*2

+(IP/3 + 2*SO - 3*BB - 13*HR)/7

That last one has the three core elements in FIP, and then the IP to balance it out. Once I do that, I get:

- 144 Porcello
- 114 Kluber
- 110 Happ
- 103 Sanchez
- 102 Sale
- 102 Verlander

Once you can do this for all 59, we simply merge all of them, and we’ll try to distill it into something clean, and, voila, a Cy Young predictor. A perfect job for an aspiring saberist during the holidays.
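As a sketch for that aspiring saberist, here is the voter formula above in code. The function is a direct transcription of the weights in the text; the stat line in the example (223 IP, 78 ER, 22-4, 189 SO, 32 BB, 23 HR) is my own recollection of Porcello’s 2016 season, not taken from the article, though it does land on the article’s 144.

```python
def cy_points(ip, er, w, l, so, bb, hr):
    """One voter's balance of ERA, Win% and FIP, per the weights above:
    IP/2 - ER + W*5 - L*2 + (IP/3 + 2*SO - 3*BB - 13*HR)/7."""
    return ip / 2 - er + w * 5 - l * 2 + (ip / 3 + 2 * so - 3 * bb - 13 * hr) / 7

# Porcello's 2016 line, as I recall it (an assumption, not from the article).
print(round(cy_points(223, 78, 22, 4, 189, 32, 23)))  # 144
```

Run that over every voter’s inferred weights and merge, and you have the raw material for the Cy Young predictor.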

Round 1:

- 14 Verlander
- 8 Porcello
- 5 Britton
- 3 Kluber

We knock out Kluber, and now we look at those 3 specific ballots to see who those three voters put at #2. Those choices move to #1 with Kluber knocked out. As it turns out, all 3 voters had Porcello #2.

Round 2:

- 14 Verlander
- 11 Porcello
- 5 Britton

We knock out Britton, and now we look at those 5 specific ballots to see who those five voters put at #2. Those choices move to #1 with Britton knocked out. As it turns out, all 5 voters had Porcello #2.

Final:

- 16 Porcello
- 14 Verlander
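The runoff above can be sketched as code. The ballots here are hypothetical, constructed only to reproduce the round-by-round totals in the text (every Kluber and Britton voter listing Porcello second); the actual ballots had more names on them.

```python
from collections import Counter

def instant_runoff(ballots, finalists=2):
    """Each round, knock out the candidate with the fewest top votes,
    promoting each of his voters' next surviving choice."""
    alive = {c for b in ballots for c in b}

    def tally():
        # Count each ballot's highest-ranked surviving candidate.
        return Counter(next((c for c in b if c in alive), None) for b in ballots)

    while len(alive) > finalists:
        t = tally()
        alive.discard(min(alive, key=lambda c: t[c]))
    return tally()

# Hypothetical ballots matching the round-by-round totals above.
ballots = ([["Verlander"]] * 14 + [["Porcello"]] * 8
           + [["Britton", "Porcello"]] * 5 + [["Kluber", "Porcello"]] * 3)
final = instant_runoff(ballots)
print(final["Porcello"], final["Verlander"])  # 16 14
```

Kluber’s 3 voters and Britton’s 5 voters all flow to Porcello, giving the 16-14 final.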

Now, how about something EVEN SIMPLER? WAR already encapsulates a player’s season. And we have multiple years for a player. Why not come up with something that ONLY uses a player’s WAR? None of his components, none of his playing time, nothing except his WAR. And, I have to make it simple and transparent. And, not only forecast the upcoming year, but future years. The WAR Marcels… WARcels?

As usual, when it comes to creating a forecasting system, you go down that rabbit hole. You go down far enough and you are tempted to look at every little variable, improving it on the periphery, maybe making inroads for 1% of the players. But there’s a reason that The Marcels has staying power: simple, transparent. That’s the goal, that’s the constraint.

Forecasting Year T+1:

**Step 1**: Take 60% of year T, 30% of year T-1, 10% of year T-2. Let’s look at Edwin Encarnacion. For this example, I’m going to use the Baseball Reference version of WAR (rWAR). Later this week, I will do this with the Fangraphs version (fWAR) to confirm that the methodology holds, and to see how the results differ, if at all. His rWAR over the last three years, most recent first, is: 3.7, 4.7, 3.6. That gives us a weighted average of 4.0.

Explanation: now you may think that the weights are too aggressive for the current time period, given that Marcel follows a 5/4/3 for hitters and 4/3/2 for pitchers model. However, that weighting scheme is for rate stats. For playing time, it uses a more aggressive 5/1/0 scheme. And since WAR is a combination of rate and playing time, we need a weighting scheme somewhere between the two. And a 6/3/1 fits the bill.

**Step 2**: Regression. Simply take 80% of the weighted WAR. Encarnacion is now at 3.2.

Explanation: now you may think we need playing time. And you’d be right, sort of. But given the constraints here of simply focusing on WAR, and given that WAR itself purports to represent itself as an overall metric, using playing time would undermine WAR. Indeed, what you’d want instead is WAR/PA and WAR/IP, which if you do that, you may as well do WAA/PA and WAA/IP. Which if you do that, you may as well rely on wRC+ and ERA+. Which if you do that, you may as well use The Marcels. (And eventually I will create something more granular, more based on components, more based on Statcast.) The idea for this metric is to NOT use The Marcels, but come up with something simpler than the most simple system. You have WAR in hand, let’s just use that.

**Step 3**: Compare the player’s age in year T to the age of 30, where age is simply year T minus birth year. For every year away from age 30, add or subtract 0.1 wins. Obviously, add if he’s under 30 and subtract if he’s over 30. EE was born in 1983, which makes his calculated age 33 for the 2016 season, or 3 years past the peak of 30: we subtract 0.3 wins.

Explanation: A player who has a weighted WAR at age 28 of 4.0 and another player who had a weighted WAR at age 38 of 4.0 have historically shown to be around 3.2 the following year if 28 years old and 2.2 if 38 years old. Age makes a big difference.

So, for the 2017 season, Encarnacion gets a forecasted WAR of 2.9.

You may be thinking “darn, that is LOW! We started at 4.0 and we’re down to 2.9?” There were 59 nonpitchers born since 1931 with a weighted WAR of between 3.5 and 4.5. In the following year, they averaged 2.8. This ranges from a near-high of his teammate Bautista, who at age 34 got a 6.1 WAR, down to Nick Swisher at negative 1.2 WAR. Don’t like Swisher as a comp? That’s ok; others with negative WAR at age 34: George Foster, Willie McCovey, Bobby Bonds.

Forecasting Year T+2 through T+5:

Year T+2: Start with your forecast of year T+1, and then subtract 0.4 wins. Then apply a further adjustment based on age: compare his year T age to 30 and add or subtract 0.08 wins for each year away. EE gives us 2.89 minus 0.4 minus 0.24, or 2.25.

Year T+3: Take Year T+2, subtract 0.4. Compare his age to 30, and add or subtract 0.03 wins for each year away. EE gives us 2.25 - 0.4 - 0.09, or 1.76.

Year T+4: Take Year T+3, subtract 0.4. Same 0.03 wins per year away from 30. EE gives us 1.76 - 0.4 - 0.09, or 1.27.

Year T+5: Take Year T+4, subtract 0.4. Same 0.03 wins per year away from 30. EE gives us 1.27 - 0.4 - 0.09, or 0.78.
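Putting the steps together, here is my transcription of the rules above into code, with Encarnacion’s rWAR line as the example. The function names and signature are mine; the weights, regression, and age adjustments are the ones from the text.

```python
def warcels(war_t, war_t1, war_t2, age, years_ahead=1):
    """WAR-only forecast per the steps above: 6/3/1 weights, 80%
    regression, peak age 30 at 0.1 wins per year, then -0.4 wins per
    later year with a shrinking age adjustment (0.08, then 0.03)."""
    f = 0.8 * (0.6 * war_t + 0.3 * war_t1 + 0.1 * war_t2)  # steps 1-2
    f += 0.1 * (30 - age)                                   # step 3
    # Years T+2 through T+5: flat decay plus the smaller age adjustment.
    rates = [0.08, 0.03, 0.03, 0.03]
    for rate in rates[:years_ahead - 1]:
        f += -0.4 + rate * (30 - age)
    return round(f, 2)

# Encarnacion: rWAR of 3.7, 4.7, 3.6 (most recent first); age 33 in year T.
print([warcels(3.7, 4.7, 3.6, 33, n) for n in range(1, 6)])
# [2.89, 2.25, 1.76, 1.27, 0.78]
```

Those five numbers match the worked example, and summing them gives the 9 wins over five years used in the comps below.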

Encarnacion Comps:

So, over the next five years, his WAR forecast totals 9 wins. How does that compare to his comp group of 55 players (it was 59, but we lost guys who are too recent to have five-year outcomes)? Their 5-year actual WAR was 10 wins, on average. His best cases among recent players include David Ortiz, Manny Ramirez, and Chipper Jones; the top 25th percentile averaged 18 wins. His worst-case scenarios include Bobby Bonds, Jim Rice, and Albert Belle; the bottom 25th percentile averaged 2 wins. As you can see, forecasting is very difficult, since anything can happen.

So, there you have it… The WAR Marcels.

***

With this forecasting model as a framework, look for a deeper dive as they relate to this year’s free agent class on MLB.com in the coming days and weeks.

Of course, it could have gone into the ground. It didn’t. It launched at a 24.1-degree angle. Further limiting the above to launch angles between 23.1 and 25.1 degrees, we have an .880 batting average and a 1.570 wOBA. A batter has a decent amount of control over the launch angle.

As you can tell, it’s not a 1.000 batting average. Outs are still possible even though this ball meets the definition of a Barrel.

A third characteristic is the directional spray angle. This ball was hit almost straight away. A hitter has very little control over the spray angle. They have pull tendencies, yes. But the range in spray is fairly wide. In addition, the spray angle is unlike speed and vertical angle. For speed, the batter wants to hit the ball hard. For vertical angle, the batter wants to hit it between 4 and 36 degrees. For spray, the batter wants to hit it down the line, or in the infield hole at a low launch angle, or in the OF gap at a high launch angle. Basically, it’s a bunch of pockets of spray angle, to the point that we don’t even know whether hitting a ball at +30 degrees of spray is good or bad; it depends on who the batter is (think Ortiz).

For this reason, we don’t use spray angle as part of the definition of a barrel. We’re basically saying “if you had a typical distribution of spray angles”, then a ball hit at 102.5 mph, 24.1 degrees launch would have an .880 batting average, and 1.570 wOBA.

Now, if you are interested in Baez-Kershaw specifically, balls hit straight away have a .682 batting average. But that quickly goes up to an .800 batting average as you start going toward the gap. And once you get to the gap and toward the line, the batting average is .950+. And this is because these balls travel around 400 feet. Baez-Kershaw, however, didn’t go quite that far. Which brings us to yet another characteristic: the spin axis and RPM. Which will also bring in environmental conditions, such as wind, temperature, humidity, and elevation.

You end up going down the rabbit hole, but first you need to establish what it is that you are after, since all types of questions can be asked of Baez-Kershaw, all just as legitimate. Barrels is basically a “crack of the bat” kind of metric: the focus is on the speed and launch angle, because that ties the metric more closely to the batter’s talent level. The further you move away from explaining the batter, and toward explaining the event, the more you introduce variables that are less linked to the batter and more to that specific point in time and space: how a set of conditions all came together, based on timing, to produce what we saw, a ball that sounded like it could do damage but ultimately was an easy out.
