Of course, it could have gone into the ground. It didn’t. It launched at a 24.1-degree angle. If we further limit the above to launch angles of 23.1 to 25.1 degrees, we have an .880 batting average and a 1.570 wOBA. A batter has a decent amount of control over the launch angle.

As you can tell, it’s not a 1.000 batting average. Outs are still possible even though this ball meets the definition of a Barrel.

A third characteristic is the directional spray angle. This ball was hit almost straightaway. A hitter has very little control over the spray angle. Hitters have pull tendencies, yes. But the range in spray is fairly wide. In addition, spray angle doesn’t behave like speed and vertical angle. For speed, the batter wants to hit it hard. For vertical angle, the batter wants to hit it between 4 and 36 degrees. For spray, the batter wants to hit it down the line, in the infield hole at a low launch angle, or in the OF gap at a high launch angle. Basically, it’s a bunch of pockets of spray angle, to the point that we don’t know whether hitting a ball at +30 degrees spray is good or bad, since that depends on who the batter is (think Ortiz).

For this reason, we don’t use spray angle as part of the definition of a barrel. We’re basically saying “if you had a typical distribution of spray angles”, then a ball hit at 102.5 mph and a 24.1-degree launch angle would have an .880 batting average and a 1.570 wOBA.
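Here is a minimal sketch of that kind of lookup: take every batted ball within a small window of exit speed and launch angle, ignore spray angle entirely, and average the outcomes. The ±1 degree window matches the 23.1 to 25.1 range above. The ±1 mph speed window and the wOBA weights are assumptions for illustration, not the values behind the .880 and 1.570 figures, and the sample data is made up.

```python
from dataclasses import dataclass

# Illustrative wOBA weights per batted-ball outcome (assumption: rough values;
# real weights vary by season and are not the ones behind the figures above).
WOBA_WEIGHTS = {"out": 0.0, "1B": 0.9, "2B": 1.25, "3B": 1.6, "HR": 2.0}


@dataclass
class BattedBall:
    exit_speed_mph: float
    launch_angle_deg: float
    outcome: str  # one of the WOBA_WEIGHTS keys; spray angle deliberately not stored


def similar_ball_rates(balls, speed, angle, speed_tol=1.0, angle_tol=1.0):
    """Return (batting average, wOBA) for balls inside the speed/angle window.

    speed_tol is an assumption; angle_tol=1.0 mirrors the 23.1-25.1 range.
    """
    similar = [
        b for b in balls
        if abs(b.exit_speed_mph - speed) <= speed_tol
        and abs(b.launch_angle_deg - angle) <= angle_tol
    ]
    if not similar:
        raise ValueError("no batted balls in the window")
    hits = sum(1 for b in similar if b.outcome != "out")
    woba = sum(WOBA_WEIGHTS[b.outcome] for b in similar) / len(similar)
    return hits / len(similar), woba


# Hypothetical sample: the last ball falls outside the speed window.
balls = [
    BattedBall(102.5, 24.1, "HR"),
    BattedBall(102.0, 23.5, "HR"),
    BattedBall(103.0, 25.0, "out"),
    BattedBall(102.8, 24.9, "2B"),
    BattedBall(95.0, 24.0, "1B"),
]
ba, woba = similar_ball_rates(balls, 102.5, 24.1)
print(f"BA {ba:.3f}, wOBA {woba:.3f}")
```

Because spray angle never enters the filter, whatever spray distribution happens to be in the sample is baked into the average, which is exactly the "typical distribution of spray angles" framing.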

Now, if you are interested in Baez-Kershaw specifically, balls hit straightaway have a .682 batting average. But that quickly goes up to .800 as you start going toward the gap. And once you get to the gap and toward the line, the batting average is .950+. That’s because these balls travel around 400 feet. Baez-Kershaw, however, didn’t go quite that far. Which brings us to yet another characteristic: the spin axis and RPM. And that in turn brings in environmental conditions, such as wind, temperature, humidity, and elevation.

You end up going down the rabbit hole, but first you need to establish what it is that you are after, since all kinds of questions can be asked of Baez-Kershaw, all just as legitimate. Barrels is basically a “crack of the bat” kind of metric: the focus is on speed and launch angle, because that ties it more closely to the batter’s talent level. The more you move away from explaining the batter, and the more you move toward explaining the event, the more you introduce variables that are less linked to the batter and more to that specific point in time and space, to how a varied set of conditions all came together, based on timing, to produce what we saw: a ball that sounded like it could do damage but ultimately was an easy out.


First, I agree with the 2D representation using W/L. I do it for baseball, I’ve done it for hockey, and I’ve dabbled in it for basketball. It’s clear, it’s concise, and it keeps the system “in check” because of a verifiable point: the sum of the individuals should add up to the whole, the team W/L record.

***

I think it’s fair to say that if the Jays scored 7 (!) runs per start when Drew Hutchison was on the mound in 2015, while they scored 4 or 4.5 runs per start with his mate RA Dickey on the mound, we shouldn’t assume that both should have received the Jays average of 5.6 (!) that year. Maybe the conditions Hutch pitched in were more like 5.8, and Dickey’s were more like 5.4 or something.

http://www.baseball-reference.com/teams/TOR/2015-pitching.shtml#players_starter_pitching::none

I’m totally on board with that possibility. But if we totally ignore run support, that is akin to totally ignoring the W/L record. Information is information. And Hutch got 2.5 to 3 more runs of support per start than Dickey. Maybe that should be 2 to 2.5 because Hutch played in tougher conditions. That should still knock some .200 to .250 of win% off Hutch’s record.

Hutch had a 13-5 record with a 5.57 ERA, while Dickey was 11-11, 3.91. If the net effect is to suggest that Dickey’s 11-11, 3.91 can be represented as 11-11, and that Hutch’s 13-5, 5.57 can be represented as 9-9, then I think that is still too much deference paid to W/L records, and still not enough to run support.
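To see how that trims Hutch’s record, here is a rough sketch. It assumes the rate implied by the numbers above, about .100 win% per run of support per start (2 to 2.5 runs mapping to .200 to .250), and applies it over his 18 decisions. The function name is just for illustration.

```python
def adjusted_wins(wins, losses, extra_support_runs, win_pct_per_run=0.10):
    """Remove a run-support edge from a W-L record.

    win_pct_per_run is an assumption: roughly .100 win% per run of support
    per start, the rate implied by the 2-2.5 runs -> .200-.250 figures.
    Returns the adjusted (fractional) win total over the same decisions.
    """
    decisions = wins + losses
    adjusted_pct = wins / decisions - extra_support_runs * win_pct_per_run
    return adjusted_pct * decisions


# Hutch: 13-5, with 2 to 2.5 extra runs of support per start.
for gap in (2.0, 2.5):
    w = adjusted_wins(13, 5, gap)
    print(f"{gap} runs: {w:.1f}-{18 - w:.1f}")
```

The two gaps bracket Hutch at roughly 9-9, the adjustment being discussed.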

Therefore, I would like to see what kind of impact the use of W/L records is having. I can accept “some” and “small”, but I’d like to see its impact specifically on the Jays pitchers in 2015.

WOWY is “with or without you”, something I first introduced some 15 years ago with catchers, later expanded on in the Hardball Times Annual, and also applied to SS, with a special look at Jeter. Go read that. It’s fun. Well, it’s fun if you don’t mind getting twisted up in knots.

Anyway, the idea is that, “all other things equal”, we isolate the one thing we care about. So, let’s talk about Brandon Belt and Cubs RF. Why Cubs RF? Because as I was working on the Statcast fielding metric, and I was showing the results of the top RF, with Heyward, Betts, and Eaton at the top of the list, my colleague Mike Petriello said “hey, how come Heyward has so many fewer plays in his opportunity space than the other two?”. Which played right into the idea of positioning and WOWY.

So, we have Belt and we have the Cubs, so we would just do Belt v Cubs and Belt v non-Cubs, right? I wish it were so easy. As part of the earlier-mentioned positioning work, I noticed that positioning at Coors was very different from the other parks. So now we need to bring parks into the mix, just to make sure they aren’t having an impact. Now we’ve got Belt v Cubs at AT&T versus Belt v non-Cubs at AT&T. That’s one combination. So, what do we see? With only 8 plate appearances in our tracking for this, the Cubs RF was positioned at +26.6 degrees (where 0 is straightaway CF and +45 is the RF foul line), 294 feet from home plate. The non-Cubs RF are positioned a bit farther from the line (+24.4 degrees) and a bit deeper (304 feet). That’s a total of 15.2 feet of difference in positioning.

But we’re not done. Maybe Cubs RF play like this against all LHH. Let’s go see. For all other LHH at AT&T Park (53 plate appearances), Cubs RF are positioned at +25.8 degrees, 292 feet from home. That’s only 4.6 feet away from where they’ve played Belt.

Is this an AT&T thing? How do the Cubs RF play in Wrigley, with and without Belt as the LHH? Glad you asked. With 12 plate appearances, Cubs RF are at +27.3 degrees, 305 feet away against Belt. But against non-Belt LHH, they are at +28 degrees, 297 feet away. That’s an 8.8-foot difference.
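The foot distances above come from converting each (spray angle, distance) pair into x,y coordinates and measuring the straight line between them. A minimal sketch, assuming home plate at 0,0, straightaway CF along the y axis (the direction of the 2B bag), and +45 degrees toward the RF line; the helper names are illustrative:

```python
import math


def to_xy(spray_deg, distance_ft):
    """Convert a fielder's (spray angle, distance) to (x, y) in feet.

    Assumes home plate at (0, 0), straightaway CF along +y, and positive
    angles toward the RF foul line (+x).
    """
    rad = math.radians(spray_deg)
    return distance_ft * math.sin(rad), distance_ft * math.cos(rad)


def position_gap(a, b):
    """Straight-line distance in feet between two (spray_deg, distance_ft) spots."""
    return math.dist(to_xy(*a), to_xy(*b))


comparisons = {
    "AT&T: Cubs vs non-Cubs RF, Belt batting": ((26.6, 294), (24.4, 304)),
    "AT&T: Cubs RF, Belt vs other LHH": ((26.6, 294), (25.8, 292)),
    "Wrigley: Cubs RF, Belt vs other LHH": ((27.3, 305), (28.0, 297)),
}
for label, (a, b) in comparisons.items():
    print(f"{label}: {position_gap(a, b):.1f} ft")
```

It should reproduce the 15.2, 4.6, and 8.8 foot gaps to within rounding. Note how the AT&T comparison against other LHH is almost purely lateral, while the Wrigley one is almost purely depth.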

And note this is only for RF, and I didn’t separate Giants from non-Giants. I didn’t separate based on bases empty, or score, or inning, or outs. Or… well… you get the picture. We already have many ways to look at this data with just the most limited filtering conditions. Adding those other layers of variables is easy enough to do, but the results are going to be far harder to explain.

Here’s the data. The first line is the “baseline” that the last column is compared against. The x and y points are based on home plate being at 0,0 and the 2B bag being at 0,127 feet.

Let’s check the win expectancy. Before Pillar, the Jays had a .437 chance of winning. If this was a fly out, the runner would have to go back to 2B, for a win expectancy of .384. Had he tried to tag up, it would have put Saunders at third, with a .390 chance of winning. Right there, you can see why even thinking about a tag-up is a bad idea. With only a .006 win gain, the runner has to be far enough from 2B, yet close enough to get back on a fly out.

We’ll keep going. With the batter reaching 2B, the runner should have scored, meaning a win expectancy of .593. Instead we have runners on second and third, for a win expectancy of .542. That baserunning play cost .051 wins, or more accurately a potential .051 wins. Carrera bailed him out.

Mathematically, there were three choices:

- play for the tag up, meaning .390 win% on an out, and .543 on a hit
- play to score, meaning .384 win% on an out, and .593 on a hit
- worst of both worlds, meaning .384 win% on an out, .543 on a hit

If there was less than a 10% chance of a hit, you choose option 1 and play for the tag-up. If there was at least a 10% chance of a hit, you choose option 2, meaning the runner gets pretty close to third base. Option 3 is not an option.
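The 10% threshold falls out of setting the expected win% of options 1 and 2 equal and solving for the hit probability. A short sketch using the win expectancies from the list above (the function and constant names are illustrative):

```python
def expected_wp(p_hit, wp_out, wp_hit):
    """Win expectancy averaged over the two outcomes: fly out or hit."""
    return (1 - p_hit) * wp_out + p_hit * wp_hit


# (win% on an out, win% on a hit) for each choice, from the list above.
TAG_UP = (0.390, 0.543)
PLAY_TO_SCORE = (0.384, 0.593)
WORST_OF_BOTH = (0.384, 0.543)


def breakeven_hit_prob(a, b):
    """Hit probability at which choices a and b have equal win expectancy.

    Solves (1-p)*a_out + p*a_hit = (1-p)*b_out + p*b_hit for p.
    """
    (a_out, a_hit), (b_out, b_hit) = a, b
    return (b_out - a_out) / ((a_hit - a_out) - (b_hit - b_out))


p_star = breakeven_hit_prob(TAG_UP, PLAY_TO_SCORE)
print(f"break-even hit probability: {p_star:.3f}")
```

The break-even comes out to about .107, the "10%" above. Option 3 takes the worse value on each outcome, so it never beats option 1 at any hit probability.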

Bill James has the New Season Score, which gives Scherzer 310 points to Lester’s 302. I have Max with 88 points to Lester’s 85. But as I’ve said, BEFORE we look at the points, we focus only on ERA and W (or W and some portion of Losses). And I think Lester’s overwhelming lead in ERA, against at best a tiny disadvantage in W or W/L, is enough that we don’t need to rely on the points system to break the tie.

Hence, Lester for the W.

And the same thing repeats itself in the AL, but flipped the other way. Porcello is 4 wins ahead of Kluber (with 5 fewer losses), while being behind him in ERA by 0.2 earned runs. That’s a rounding error, which therefore gives the edge to Porcello on these two stats. Similarly, Porcello is 6 wins ahead of Verlander (with 5 fewer losses), while behind Verlander in ERA by 2.5 earned runs. Is that enough of an advantage for Verlander that we want to go to the secondary stats (where Verlander has the advantage because of the strikeouts)? It seems to me that it doesn’t need to get there: Porcello has built up enough of an advantage in W and L, and his ERA gap falls just on the right side of the line, so it doesn’t make a difference.
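Expressing an ERA gap in earned runs is just a matter of scaling by innings. A sketch with hypothetical inputs, since the innings totals aren’t listed here: a 0.01 ERA gap over 180 innings works out to 0.2 earned runs.

```python
def era_gap_in_earned_runs(era_a, era_b, innings):
    """Total earned runs separating two pitchers whose ERAs (runs per 9 IP)
    differ by era_a - era_b, over `innings` innings pitched."""
    return (era_a - era_b) * innings / 9


# Hypothetical inputs: ERAs of 3.15 and 3.14 over 180 innings.
print(round(era_gap_in_earned_runs(3.15, 3.14, 180), 2))
```

Thinking in total runs rather than ERA points is what makes a W gap and an ERA gap comparable at all.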

In terms of the New Season Score, Porcello is way ahead of not only Verlander and Kluber, but also Happ, who is ahead of those two. But Porcello is behind Britton, 327 points to 309. In terms of Cy Points, if we needed to go there, it’s Verlander at 78, Porcello at 74, and Kluber at 73. Given that Porcello will get either 1st or 2nd place votes, while Britton can place anywhere on the ballot, or not appear on a ballot at all, Britton could get more 1st place votes than Porcello, but that won’t matter overall.

Hence, Porcello for the W.

The tough part is the down-ballot picks, and I’ll come up with my official prediction tonight.

Cueto’s story is that he’s at 18 wins, putting him in 3rd place behind the big 2. Fernandez (RIP) and Bumgarner are the only other ones with impressive K totals. Syndergaard has impressive K rates and the better ERA. And Kershaw has his own abbreviated story.

Guys at the bottom of the pack just won’t get any love: Roark, Arrieta, Carlos Martinez. For example, whatever Arrieta’s got going for him, Cueto is a bit above him. Roark is supplanted by Thor for the same reason, and Martinez brings up the rear with every other pitcher.

So, we can see 2 of the slots going to some combination of Bumgarner, Fernandez (RIP), Syndergaard, and Kershaw, based on individual preferences. Cueto? Well, I can see him placing third among this group of 5 on each ballot. But with only 2 spots on the ballot, I can see Cueto getting almost no love at all.

This is how the readers see it as well:

https://twitter.com/tangotiger/status/782596425402224640


In 2016, it’s not just two pitchers, but six, that are in contention. On the one side, you have Porcello and Happ, where their argument is mostly win-based. Porcello is 22-4, 3.15, while Happ is 20-4, 3.18. Porcello is 28 IP ahead, and 26 K ahead. Because Porcello is ahead across the board, Happ will get relegated to lower-ballot status. Porcello is the one that will get 1st place and 2nd place ballots, and maybe some third, while Happ will get his ballots 3rd through 5th, and maybe off the ballot for several voters.

On the other side we have the more traditional leaders: Verlander, Kluber, Sale. Similar to Verlander in 2012, their ERAs are close to Porcello’s (3.10 to 3.21). I think the closeness of the ERAs, not to mention their closeness to the league leaders, Tanaka at 3.07 and Sanchez at 3.06 (albeit at under 200 IP for both), will simply mean that ERA is going to be a wash. They are all effectively tied. That leaves the 1st place voting based just on W, which will go to Porcello.

But for 2nd place, with Porcello out of the mix, things get more interesting. Will the same thought process take place, and just hand it to Happ? Or, having already given 1st to Porcello, voters will look for more balance on the ballot. That’s where the Cy points system comes into play. And that calls for Verlander and Kluber to get 2nd and 3rd (though we’ll see tonight whether Verlander has a bad game).

Then 4th and 5th go back to Happ v Sale, and that looks like 2012 Price v Verlander. Happ is 3 wins ahead of Sale, and his ERA is 3.18 to 3.21. Both are squeakers of a win, just like Price/Verlander. It doesn’t stop there. Sale has a 27 IP lead, similar to Verlander/Price, and a 64 K lead. Not to mention the CG. It’ll be close between the two.

Then we have Britton, who will get votes from 1st through 5th, and then off ballot. If we try to come up with how the voting will go by 1st through 5th, here’s one that is off the top of my head:

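| Pitcher | Pts | 1st | 2nd | 3rd | 4th | 5th | Off |
|---|---|---|---|---|---|---|---|
| Porcello | 161 | 14 | 15 | 1 | 0 | 0 | 0 |
| Verlander | 103 | 7 | 4 | 8 | 5 | 4 | 2 |
| Kluber | 89 | 5 | 4 | 6 | 7 | 6 | 2 |
| Britton | 65 | 4 | 4 | 4 | 3 | 3 | 12 |
| Sale | 42 | 0 | 3 | 4 | 6 | 6 | 11 |
| Happ | 38 | 0 | 2 | 5 | 5 | 5 | 13 |
| Tanaka | 10 | 0 | 0 | 1 | 2 | 3 | 24 |
| Sanchez | 10 | 0 | 0 | 1 | 2 | 3 | 24 |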
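The Pts column follows the 7-4-3-2-1 scale the BBWAA has used for Cy Young ballots since 2010. A small sketch that recomputes it from the vote counts (the counts are just the off-the-top-of-my-head ballot above; names are illustrative):

```python
# Points per ballot place, 1st through 5th (BBWAA Cy Young scale since 2010).
POINTS = (7, 4, 3, 2, 1)

# Votes by place: 1st, 2nd, 3rd, 4th, 5th, off-ballot (from the table above).
BALLOTS = {
    "Porcello": (14, 15, 1, 0, 0, 0),
    "Verlander": (7, 4, 8, 5, 4, 2),
    "Kluber": (5, 4, 6, 7, 6, 2),
    "Britton": (4, 4, 4, 3, 3, 12),
    "Sale": (0, 3, 4, 6, 6, 11),
    "Happ": (0, 2, 5, 5, 5, 13),
    "Tanaka": (0, 0, 1, 2, 3, 24),
    "Sanchez": (0, 0, 1, 2, 3, 24),
}


def cy_points(votes):
    """Weighted ballot points; off-ballot votes (the 6th entry) score zero."""
    return sum(p * v for p, v in zip(POINTS, votes[:5]))


for name, votes in BALLOTS.items():
    print(f"{name}: {cy_points(votes)} pts from {sum(votes)} voters")
```

Every row also accounts for all 30 voters, which is a handy sanity check when building a ballot forecast by hand.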

Is this how the voters are going to think? Take Porcello out of the mix, and it might be Happ who would otherwise win it. But since Porcello is getting all the W-L votes, Happ won’t get similar consideration. Britton is going to get sprinkled in all over, but as you can see, with so many viable candidates and only 5 spots, there are going to be plenty of pitchers left off ballots. And I haven’t even brought up Hamels or Quintana.

And remember, not every potential voter gets a ballot. The pool is made up of hundreds of potential voters, of which 30 are selected for the AL Cy. So, it’s very easy to see how you can randomly choose 30, and from those 30, it’s Britton that ends up in 2nd place, or Britton that ends up in 7th. With not much to distinguish them, it becomes a tough job to predict.

Anyway, after tonight I’ll come up with my final forecasts.

The big problem with calculating the replacement level is that the pool of players is known (but not tabulated) PRIOR to the season, while the performance of those players is known only AFTER the season. Think of starting pitchers. Every team starts with a rotation of 5 starting pitchers. Everyone else is a “replacement”. But you get into issues. There are players starting the season on the disabled list; those were in fact already replaced before the season began. So, we’ve already got a problem. Then we have young players in the minors, who are not what we consider “replacement” players. A replacement player is a low-cost, easily available player.

Some 15 years ago, I was advocating for the term “RAT”, or readily-available talent. The acronym didn’t stick… fortunately. We also bandied about the term “FAT”, for freely-available talent. Unfortunately, because we chose the word “replacement”, many deny us the freedom to define the word, simply because the word itself is so commonplace.

Anyway, so, we’ve got huge hurdles when it comes to selection sampling and bias. But, let’s presuppose we know what the replacement level is, historically: win% of around .300. How can we reverse-engineer that number?

Well, if you take all the players with fewer than 60 innings and fewer than 300 plate appearances, you’ll end up with about 25% of all playing time. And those players had a win% of .295. Indeed, that win% is almost exactly the baseline that Fangraphs and Baseball Reference use.

Those players are of course not ALL readily-available players. Some are, some are starters who lost their spot, and some are late-season hotshots. And missing from that group are the readily-available players who ended up playing a larger role. All to say that if we select the bottom 25% in playing time, we end up with a win% that conforms to the replacement-level win%.

Knowing that, we can now apply that method year by year, and see how the 25% of playing time belonging to the least-used players performs. And we can therefore come up with a year-by-year, or at least era-by-era, replacement level.
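A minimal sketch of that selection, run once per season. It assumes playing time for hitters and pitchers has already been put on one common scale and that each player comes with a win% from some rating system; both are assumptions, since how to do that isn’t spelled out here. The `Player` records are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    playing_time: float  # assumption: hitters and pitchers on one common scale
    win_pct: float       # assumption: each player's win% from some rating system


def low_playing_time_win_pct(players, share=0.25):
    """Playing-time-weighted win% of the least-used players.

    Adds players from least to most playing time, stopping before the
    group would exceed `share` of all playing time.
    Returns (win_pct, share_actually_covered).
    """
    total = sum(p.playing_time for p in players)
    target = share * total
    used = 0.0
    weighted = 0.0
    for p in sorted(players, key=lambda p: p.playing_time):
        if used + p.playing_time > target:
            break  # everyone after this has even more playing time
        used += p.playing_time
        weighted += p.playing_time * p.win_pct
    if used == 0:
        raise ValueError("even the least-used player exceeds the target share")
    return weighted / used, used / total


# Hypothetical season.
season = [
    Player("a", 100, 0.30),
    Player("b", 100, 0.29),
    Player("c", 400, 0.50),
    Player("d", 400, 0.55),
]
print(low_playing_time_win_pct(season))
```

Stopping just short of the target keeps the group strictly at the low end of playing time; with real data the covered share will land close to 25%, so it is worth printing alongside the win%.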

More to come…

If you want an exaggerated illustration: if we had 100-man rosters, the replacement level player would not be the 3001st best player in baseball.

Bill James has introduced Variable Replacement Level. I’m glad he’s given the idea the exposure it needs.

- 0.348 1B
- 0.348 3B
- 0.345 2B
- 0.342 DH
- 0.340 RF
- 0.336 CF
- 0.332 LF
- 0.325 SS
- 0.316 C

Soooo… most offensive talent is found at 1B, with 3B a rounding error away, and 2B hot on their heels. There’s more offense in RF than CF, but not by much. And there’s more offense in CF than LF, which is totally out of character with the fielding spectrum. Indeed, LF are barely outperforming SS.

If you want that in runs, catchers are 12 runs below average on offense, which is in keeping with their mirror image on the fielding side.

SS are 6 runs worse on offense, which roughly mirrors their place on the fielding side.

CF is league average on offense, again roughly mirroring the fielding side.

And that’s it. Everything else is not a match. LF is 3 offensive runs worse than league average, when the fielding spectrum would suggest it should be 7 runs better than league average. That’s a 10-run swing, suggesting that managers are unable to find good hitters there. They certainly aren’t putting good fielders there. Indeed, they can’t find many regular LF to begin with. It’s become the de facto platoon spot on the field.

RF is only 2 runs better than average on offense, or about 5 runs away from where it should be. But 2B and 3B both out-hit RF, at 5 to 6 runs better than average? That’s an 8-run swing the other way. And it’s not like they are putting crappy fielders there either. 2B and 3B are overflowing with talent.

1B is 6 runs better than average on offense, which is not that much. So, this is quite a shakeup in thinking by MLB managers.
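To go from the list to runs, a wRAA-style conversion works: (wOBA minus league wOBA) divided by the wOBA scale, times PA. This sketch assumes the list values are wOBA, takes league average as the simple mean of the nine positions, and uses a wOBA scale of 1.2 and 650 PA for a full-time player. All four are assumptions, so the output only lands in the neighborhood of the figures above (roughly -11 for C, -6 for SS, 0 for CF, +6 for 1B).

```python
# Position averages from the list above (assumption: these are wOBA values).
POSITION_WOBA = {
    "1B": 0.348, "3B": 0.348, "2B": 0.345, "DH": 0.342, "RF": 0.340,
    "CF": 0.336, "LF": 0.332, "SS": 0.325, "C": 0.316,
}

WOBA_SCALE = 1.2  # assumption: a typical value; it changes by season
PA = 650          # assumption: plate appearances for a full-time player


def runs_vs_average(woba, league_woba, woba_scale=WOBA_SCALE, pa=PA):
    """wRAA-style conversion: (wOBA - league wOBA) / wOBA scale * PA."""
    return (woba - league_woba) / woba_scale * pa


# Assumption: league average is the unweighted mean of the nine positions.
league = sum(POSITION_WOBA.values()) / len(POSITION_WOBA)
for pos, woba in POSITION_WOBA.items():
    print(f"{pos}: {runs_vs_average(woba, league):+.1f} runs")
```

The ordering is what matters here; the exact run values move with the scale, PA, and baseline you pick.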

Think about this: you hit a ball that the SS catches on the fly, and that’s a liner. But if he is standing 10 feet farther back and one-hops it, that’s a ground ball. Same exit speed, same launch angle, same spray angle, same spin rate, same spin axis, same bat channel. But where the SS is positioned and how he plays that ball determines whether it’s a grounder or a liner. Analytically speaking, this doesn’t help me at all. It helps the VIEWER, because the viewer is outcome-driven. Does it help me as an analyst? Not at all.

How about infield flies and popups? What ARE those? Well, from a viewer standpoint, you can define it however it is that you need to get the message across. A 300 foot flyball that took 8 seconds to reach the LF who didn’t move one foot? Sure, call that a “popped-out”. A SS that is standing 150 feet from home plate who doesn’t move, but is otherwise standing behind the infield dirt: do you want to call that an “infield” fly or not? I don’t know. Whatever it is that helps the VIEWER.

But analytically? SS and 2B normally position themselves 130 to 160 feet from home plate. SS and 2B are infielders. So, an air ball that is in the air for 5 seconds and travels 150 feet would need to be an infield air ball of some sort, whether you call that an infield fly or infield popup or whathaveyou.

A lot of what we do is ORGANIZING data. That is what Barrels was all about: how do we organize well-struck balls? Everything we do is to try to organize data into some set of manageable categories. Without that, we are left with 100,000 unique batted balls (which they are, like snowflakes). But that doesn’t help anyone.

So, first figure out whether your audience is the VIEWER or the ANALYST. Then categorize appropriately. And if you have two masters, then you need two definitions. That’s just the way it works. As best you can, try to get them to overlap as much as possible. But sometimes you can’t. And that’s when you need to create dual sets of non-dueling metrics.

For whatever it’s worth, I mark any infielder that is playing more than 220 feet from home plate as “outfield”. And any outfielder that is playing less than 200 feet from home plate as “infield”. Again, it doesn’t matter what exactly is an infield and outfield. They are just useful terms. I just know that analytically, drawing the lines like that allows me to organize the data in the way that I need it.
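Those cutoffs translate directly into a tiny classifier. A sketch, assuming positions come in as 1B/2B/3B/SS and LF/CF/RF; pitchers and catchers aren’t covered by the rule, so this version simply rejects them (that choice is mine, not part of the rule).

```python
INFIELDERS = {"1B", "2B", "3B", "SS"}
OUTFIELDERS = {"LF", "CF", "RF"}


def effective_zone(position, distance_ft):
    """Classify a fielder as 'infield' or 'outfield' by depth.

    Infielders deeper than 220 ft count as outfield; outfielders
    shallower than 200 ft count as infield; everyone else keeps
    their nominal zone.
    """
    if position in INFIELDERS:
        return "outfield" if distance_ft > 220 else "infield"
    if position in OUTFIELDERS:
        return "infield" if distance_ft < 200 else "outfield"
    raise ValueError(f"unsupported position: {position}")


print(effective_zone("2B", 230), effective_zone("CF", 190))
```

The 200 to 220 foot band is deliberately sticky: a fielder standing there keeps his nominal label, so only clearly out-of-place positioning flips a category.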


**Part 1**

The currency that you should strive for is wins. So, let’s start with W and L. Let’s assume that the replacement level win% is .385. So, in order to get a win-level metric using W and L, you would simply do:

W - .385*(W+L)

Which is .615 W - .385 L

**Part 2**

Next up is IP and R. Let’s assume that replacement level is 6 R / 9 IP. So, to convert IP and R into runs above replacement, we’d do IP * 6/9 - R. To convert to wins, divide all that by 10.

Which is: IP * .067 - R * .10

He also uses ER, and replacement level for that is 5.50 ER / 9 IP. Repeating the steps above:

Which is: IP * .061 - ER * .10

The average of the two is: IP * .064 - R * .05 - ER * .05

And instead of IP, let’s use IPouts. Since there are 3 outs per inning, the IP coefficient gets divided by 3 (.064 / 3 ≈ .021). So, we finally get:

IPouts * .021 - R * .05 - ER * .05

**Part 3**

Now we have SO and BB. This one is a lot easier, since SO = BB is replacement level. Hence, we do SO - BB. To convert to runs you multiply by 0.3, and to convert to wins you divide by 10.

Which is: (SO - BB) * .03

**Part 4**

Now we want to know how to weight all of that. We can just add them all up, so that they each get an equal weight. But let’s say we want to weight them as 50%, 33.3%, 16.7%, with IP/R having the most weight and SO/BB having the least. Applying these weights to the above, we get:

IPouts * .0105 - R * .025 - ER * .025

+ .205 W - .128 L

+ (SO - BB) * .005

**Part 5**

Let’s rescale so we don’t have all those decimals. So, we’ll multiply everything by 40. It doesn’t matter what you multiply by, as long as everything gets multiplied by the same thing.

IPouts * 0.42 - R - ER

+ W * 8 - L * 5

+ (SO - BB) * .20
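Parts 1 through 5 can be re-run numerically as a check. This sketch carries the assumptions above as parameters (.385 replacement win%, 6.0 R and 5.50 ER per 9 IP, 10 runs per win, 0.3 runs per SO minus BB, 50/33.3/16.7 weights, a scale of 40); the function name is just for illustration.

```python
def season_score_coefficients(repl_win_pct=0.385, repl_r_per9=6.0,
                              repl_er_per9=5.5, runs_per_win=10.0,
                              so_bb_run_value=0.3,
                              weights=(0.5, 1 / 3, 1 / 6), scale=40):
    """Re-derive the per-stat coefficients from Parts 1-5 without rounding."""
    w_ipr, w_wl, w_sobb = weights
    # Part 1: W - p*(W+L) = (1-p)*W - p*L, already in wins.
    w_coef = 1 - repl_win_pct
    l_coef = repl_win_pct
    # Part 2: average the R and ER versions, convert runs to wins, IP to outs.
    ip_coef = (repl_r_per9 / 9 + repl_er_per9 / 9) / 2 / runs_per_win
    ipouts_coef = ip_coef / 3
    r_coef = er_coef = 0.5 / runs_per_win  # half of 1/10 each
    # Part 3: SO - BB in runs, then wins.
    so_bb_coef = so_bb_run_value / runs_per_win
    # Parts 4 and 5: apply the component weights, then rescale.
    return {
        "IPouts": ipouts_coef * w_ipr * scale,
        "R": r_coef * w_ipr * scale,
        "ER": er_coef * w_ipr * scale,
        "W": w_coef * w_wl * scale,
        "L": l_coef * w_wl * scale,
        "SO-BB": so_bb_coef * w_sobb * scale,
    }


for stat, coef in season_score_coefficients().items():
    print(f"{stat}: {coef:.3f}")
```

Without intermediate rounding the coefficients come out to about .426, 1, 1, 8.2, 5.13, and .20. These round to the Part 5 values, and the IPouts term sits right next to .425.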

**Bill James says…**

Bill sent me his formula. For this post, I’ll remove the part that deals with Saves and Games Finished. Here’s the rest of his metric:

(Thirds of an inning pitched * .425 - Earned Runs Allowed - Runs Allowed)

Plus 8 times Wins

Minus 5 times Losses

Plus (Strikeouts Minus Walks) divided by 5

And that’s how a metric is born. His metric can be perfectly explained if you think in terms of Wins Above Replacement. If you like the weightings of the three components, then you’ll love his metric. Indeed, his metric seems pretty consistent with my Cy Young predictor, in terms of focusing on IP, ERA, W and SO. He considers a bit more, so it has a bit more practicality. But it would seem that it should do quite well as a Cy Young predictor.

So, when you see a pitcher with a Season Score of 320, just divide that by 40. That pitcher would be an 8 WAR pitcher.
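Bill’s formula, minus the Saves and Games Finished terms, and the divide-by-40 rule can be sketched like this. The pitcher line in the example is hypothetical, and the function names are illustrative.

```python
def season_score(ipouts, r, er, w, l, so, bb):
    """Season Score for a starter, as quoted above, without the
    Saves and Games Finished terms."""
    return ipouts * 0.425 - er - r + 8 * w - 5 * l + (so - bb) / 5


def score_to_war(score, scale=40):
    """Undo the rescaling by 40 to read the score as wins above replacement."""
    return score / scale


# Hypothetical starter: 200 IP (600 outs), 70 R, 65 ER, 15-8, 200 SO, 50 BB.
score = season_score(600, 70, 65, 15, 8, 200, 50)
print(f"Season Score {score:.0f}, about {score_to_war(score):.2f} WAR")
```

A Season Score of 320 gives `score_to_war(320) == 8.0`, the 8 WAR pitcher described above.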

I’ll look at relief pitchers in part 2, just as soon as I can figure out how he got there.

Position by position, the best fielders:

- C: Posey, Perez
- 1B: Belt, Goldy, Rizzo
- 2B: Baez, Pedroia
- SS: Crawford, Simmons, Russell, Lindor
- 3B: Arenado, Machado, Beltre
- LF: Marte, Gordon
- CF: Kiermaier, Bradley, Hamilton, Inciarte, Marisnick
- RF: Heyward, Betts, Puig, Eaton

Among players who spent most of their time at DH, but still played the field, the best was Pujols. But, he would rank right around average among 1B.

Thanks to everyone for their participation! Polling will close some time early in the playoffs. Special thanks to Deadspin and its very energetic readers, who added nearly 600 ballots in under 24 hours to bring us to over 1300 ballots. I received 3400 links from Deadspin, so converting nearly 20% of those into voters is astounding. The next-highest source was McCoveyChronicles, with 360 links, followed by Fangraphs at 180.

The success of this project is entirely due to word of mouth, so I thank everyone who helped spread the word or otherwise submitted a ballot. Thank you!


Read all about it from Mike. Note that while the zone is generally a .500+ BA, 1.500+ SLG, its genesis is based on wOBA. Because the story could be explained without referencing wOBA, while still generally agreeing with wOBA, we can have our cake and eat it too.
