Why small sample sizes matter

Mar 6, 11:36 AM

Opportunity isn't a given. When a baseball player signs a contract, there is no language that ensures the player will be given ample opportunity to succeed and work through struggles. It's why the option system exists, so teams are free to churn players at the end of the roster, looking to catch lightning in a bottle.

Mathematically, this makes little sense, of course. Chasing small samples with more small samples does not give you a larger, more reliable sample. Roster churn for the sake of "finding something that works" is an exercise in randomness, albeit one that occasionally pays off.

Despite the proliferation of sabermetric analysis in baseball, teams and players still mostly operate in inefficient ways. Decision making will never be perfectly rational in baseball, owing to tradition, moving statistical targets and, perhaps most importantly, psychology.

Dan Johnson knows what we're talking about. Casper Wells definitely knows what we're talking about.

Casper Wells, Victim of Impatience

There may be no player who has ever understood the importance of small samples as well as Wells, who essentially renamed the MLB waiver wire "The Wells Wire" in 2013. Have a look:

  • April 10 - Claimed from Seattle by Toronto
  • April 22 - Purchased from Toronto by Oakland
  • April 29 - Purchased from Oakland by Chicago (AL)
  • August 8 - Claimed from Chicago (AL) by Philadelphia
  • October 18 - Granted free agency

Wells was scooped off of waivers on four occasions in 2013, never receiving more than 71 plate appearances in a single stop. He also didn't play a game in the minors. He spent the year drifting from bit role to bit role, never receiving a sustained opportunity to impress.

There's his career track record, of course - he's capable of playing all three outfield positions and has a career wRC+ of 93, a shade below league average but appreciable in a reserve role thanks to his 115 mark against left-handed pitchers - but a bench boss is unlikely to care what Wells did back in 2010.

And so Wells was given small sample after small sample with which to impress. It didn't turn out.

Team            PA    OPS
Seattle          0    n/a
Toronto          0    n/a
Oakland          5    0.000
Chicago (AL)    71    0.407
Philadelphia    26    0.199

If you're Wells, you know you're better than a .380 OPS on the season. If you're one of Wells' many managers, however, there are a hundred more Casper Wells, and if he doesn't hit the ground running you owe him nothing.

And so if you're Wells and you're sitting in a 3-0 count, there's no way in hell you're swinging and risking upsetting your manager. You don't have the cachet to take a close pitch at 0-2 to try to improve your position in the count. Fall in line and add value or pack your bags (again).

Joey Votto, All-Star Economist

There are superstars who don't face the torture of living and dying by small-sample performance, however. Though they face the same decisions as players like Wells, they are afforded the luxury of analyzing them through a friendlier lens.

When Cincinnati Reds first baseman Joey Votto steps to the plate with a runner on second and one out, no matter how knowledgeable the fan or manager, everyone wants a hit. It matters little that the run expectancy from last season would suggest a walk is a very valuable outcome, even with a man in scoring position.

(Run expectancy is the average number of runs a team would "expect" to score given the situation on the bases and the number of outs. For example, in the first row below, a team is "expected" to score .637 runs in an inning in which they have a man on second base with one out.)

Situation                    Run Expectancy
Man on 2nd, 1 Out            0.637
Man on 2nd, 2 Outs           0.305
Man on 1st and 2nd, 1 Out    0.882

Even though a free pass for Votto adds roughly a quarter of a run to the team's expected production that inning, many would complain that Votto didn't drive in the run, which he's "paid to" do. Never mind the fact that in plate appearances in which Votto doesn't walk, he makes an out 70 percent of the time.

That's still very good, but even if we assume any hit would score the runner, a walk is a preferable outcome (here we assume Votto has the "choice" to walk or swing, for the purposes of illustration):

Decision    Outcome     Probability    Run Expectancy    Net
Walk        Walk        100%           0.882             0.245
Hit         Single      20.3%          1.493             0.856
Hit         Double      5.1%           1.637             1.000
Hit         Triple      0.5%           1.894             1.257
Hit         Home Run    0.4%           2.249             1.612
Hit         Out         70.1%          0.305             -0.332
Hit         NET         100%           0.619             0.005

Votto's choice to "go for it" rather than take a walk, assuming such a decision is that easy, costs the team nearly a quarter of a run, on average.
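For readers who want to check the arithmetic, here is a minimal Python sketch of the comparison. It uses only the numbers in the two tables above; the function and variable names are made up for illustration. One caveat worth flagging: the listed swing probabilities add up to 96.4 percent rather than 100, so the results should be read as approximate.

```python
# Run expectancy values and outcome probabilities copied from the tables above.
RE_BEFORE = 0.637      # man on 2nd, 1 out
RE_AFTER_WALK = 0.882  # men on 1st and 2nd, 1 out

# outcome -> (probability, run expectancy after the outcome, including any run scored)
SWING_OUTCOMES = {
    "single":   (0.203, 1.493),
    "double":   (0.051, 1.637),
    "triple":   (0.005, 1.894),
    "home run": (0.004, 2.249),
    "out":      (0.701, 0.305),
}


def walk_gain():
    """Net change in run expectancy from taking the walk."""
    return RE_AFTER_WALK - RE_BEFORE


def swing_gain():
    """Probability-weighted net change in run expectancy from swinging away."""
    return sum(p * (re - RE_BEFORE) for p, re in SWING_OUTCOMES.values())


def total_probability():
    """Sum of the listed swing probabilities (a sanity check on the table)."""
    return sum(p for p, _ in SWING_OUTCOMES.values())


if __name__ == "__main__":
    print(f"Walk:  {walk_gain():+.3f} runs")
    print(f"Swing: {swing_gain():+.3f} runs")
    print(f"Cost of swinging: {walk_gain() - swing_gain():.3f} runs")
    print(f"Listed probabilities sum to {total_probability():.1%}")
```

The gap between the two options works out to about 0.24 runs, which is the "nearly a quarter of a run" cited above.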

But again, a walk doesn't score the runner, and the fact that a walk is seen as, if not a failure, then less of a success is a matter of psychology.

The Difference Between One and 184,872

Run expectancies are calculated based on an entire season's worth of data. They take all 184,872 MLB plate appearances from 2013 into account, and it's unnatural for any individual - actor or observer - to take such a long view of a specific situation.

If you're the manager, you may only have two or three opportunities in this game with men on base. If you're Votto - the worst example because he is so wont to take the statistical approach - you may only get four turns up this game, and just 700 plate appearances on the season, fewer than 200 with runners in scoring position. You can trust that the aggregate numbers will manifest themselves over the course of the season, but you can't, at all, trust that making the "optimal" decision will yield anything immediate.
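To put a rough number on how little a handful of plate appearances tells you, here is a small Python sketch of the standard error of an observed rate at the sample sizes mentioned above. It rests on two assumptions that are not from the research: each plate appearance is treated as an independent trial, and the "true" on-base rate of .320 is a hypothetical value chosen only for illustration.

```python
import math


def standard_error(p, n):
    """Standard error of an observed rate over n independent trials with true rate p."""
    return math.sqrt(p * (1 - p) / n)


ASSUMED_OBP = 0.320  # hypothetical true on-base rate, not a figure from the article

if __name__ == "__main__":
    # One game, a season with runners in scoring position, a full season, all of MLB.
    for n in (4, 200, 700, 184_872):
        print(f"{n:>7} PA: +/- {standard_error(ASSUMED_OBP, n):.3f}")
```

Under those assumptions the noise over four plate appearances is around 23 points of on-base percentage per hundred (about 0.23), and it is still above 0.03 over 200. Only at season scale does it fall below 0.02.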

It's not surprising, then, that managers and players tend to act "irrationally" in the face of statistics. Humans are not completely rational thinkers, as much as Econ 101 textbooks would like you to believe that, and there are ample biases in the way we process information and make decisions. 

Humans tend to engage in loss aversion, for example, where a loss hurts more than a gain of equal amount. That might seem contrary to the Votto walk example - a loss-averse batter would certainly prefer not to make an out - but it can explain behavior that appears sub-optimal in a particular at bat.

Bad-Count Behavior

At the MIT Sloan Sports Analytics Conference in Boston this past weekend, two researchers from Stanford University presented a paper that suggested batters and pitchers both act irrationally in 0-2 and 3-0 counts, respectively.

The paper's research is actually quite interesting, in that it found three very clear biases in the calls all umpires make. In short, umpires are far more likely to call a ball a strike in a 3-0 count and far more likely to call a strike a ball in an 0-2 count, owing perhaps to their desire not to directly influence the final outcome of the at bat. They are also averse to calling back-to-back strikes.

From the paper's abstract:

We find that the strike zone contracts in 2-strike counts and expands in 3-ball counts, and that umpires are reluctant to call two strikes in a row. Effect sizes can be dramatic: in 2-strike counts the probability of a called strike drops by as much as 19 percentage points in the corners of the strike zone. We structurally estimate each umpire's aversions to miscalling balls and his aversions to miscalling strikes in different game states. If an umpire is unbiased, he would only need to be 50% sure that a pitch is a strike in order to call a strike half the time. In fact, the average umpire needs to be 64% sure of a strike in order to call strike three half the time. Moreover, the least biased umpire still needs to be 55% sure of a strike in order to call strike three half the time.

Previous research from John Walsh at The Hardball Times backs this up:

Nevertheless, we can still see the large difference in the two strike zones. Here are the numbers:

Count    Strike Zone (sq. ft.)
All      3.09
3-0      3.52
0-2      2.42

Wow, the 3-0 zone is nearly 50 percent larger than the 0-2 zone.
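The size comparison in Walsh's numbers is easy to check. The short Python sketch below uses only the three areas from his table; the helper name is made up for illustration.

```python
# Called strike zone areas in square feet, copied from the table above.
ZONE_SQ_FT = {"all": 3.09, "3-0": 3.52, "0-2": 2.42}


def percent_larger(a, b):
    """How much larger a is than b, expressed as a percentage of b."""
    return (a / b - 1) * 100


if __name__ == "__main__":
    diff = percent_larger(ZONE_SQ_FT["3-0"], ZONE_SQ_FT["0-2"])
    print(f"3-0 zone vs. 0-2 zone: {diff:.1f}% larger")
```

By that arithmetic the 3-0 zone comes out roughly 45 percent larger than the 0-2 zone.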

The excellent book Scorecasting also found that calls "on the corner" swing dramatically with the count:

Count        % "Corner" Pitches Called Strikes
All          49.9%
2-strikes    38.2%
3-balls      60%
0-2          31.5%
3-0          67.6%

The message from all of this research is clear: umpires are biased in counts when the plate appearance could end on their call, and knowing this could provide an advantage for hitters and pitchers.

Inefficient Decision Making or Psychological Self-Preservation?

Surprisingly, hitters and pitchers do not adjust to this knowledge. The Sloan research found that with two strikes, hitters should only protect the sides of the plate and be more willing to take a pitch high or low, but that's not the reality. Likewise, pitchers should attack the zone more closely at 0-2 and feel free to play around the edges more at 3-0, but they don't.

Let's look at Matt Carpenter's decision in an 0-2 count. He had the best OPS in 0-2 counts of any player in baseball last season, so he seems likely to be the most "extreme" actor in these situations.

Situation          AVG      OBP      SLG      OPS
0-2 Count          0.280    0.275    0.540    0.815
After 1-2 Count    0.306    0.356    0.447    0.802
0-3 Count          0        0        0        0

Carpenter's decision is easy, because his outcome doesn't really change much by moving an 0-2 count to 1-2. He should do what he's been doing.

But again, he's the best hitter in the league at 0-2. Let's consider the decision at the league level.

Situation         PA       AVG      OBP      SLG      OPS
MLB, 0-2          16267    0.152    0.161    0.215    0.376
MLB, 1-2          27133    0.166    0.173    0.239    0.412
MLB, After 1-2    51774    0.179    0.228    0.271    0.499
MLB, 0-3          7974     0        0        0        0

The last row is the rub: No matter how much an additional ball may help your situation, taking strike three means you are out. You're done. You can't run 10,000 simulations of the game from that point forward to eventually see the benefit of taking that pitch. You are out.

Loosening your protection of the strike zone may help over the long haul, but the research did not show "umpires call strikes as balls 100 percent of the time in 0-2 counts." The small positive gain in the expected outcome seems to be heavily outweighed by the increased chance of taking a called strike three.

That hardly seems irrational.

From the pitcher's perspective, it makes sense as well. While the research shows they should attack more closely with an 0-2 count, they already know that batters are more likely to swing and protect with such a count. Grooving a pitch down the middle to avoid moving to a 1-2 count, when pitchers still hold an enormous advantage, seems a mistake.

At the same time, hitters rarely swing with a 3-0 count, so is it really worth playing around the edges for a four-pitch walk?

Count    Swing %    Swing % In Zone
0-2      49.5%      85.3%
1-2      57.8%      88%
3-0      6.9%       9.7%

How Diverting from the Norm Factors In

In Thinking, Fast and Slow, psychologist Daniel Kahneman discusses how the process of acting can affect people's decision making. For example, making a decision to introduce a chance of harm to yourself is valued as far more significant than not acting to save yourself from a chance of harm, even if the likelihoods and costs associated would otherwise be the same.

In other words, when you actively do something, the loss-aversion calculus changes. You made the choice to act, and that introduces much more dissonance than doing nothing.

This cognitive bias manifests itself in these 0-2 and 3-0 hitting and pitching situations. Consider the following two examples:

Example A - Down 0-2, Matt Carpenter watches a borderline pitch high, which gets called for strike three.
Example B - Behind 3-0, Adam Wainwright grooves one down the plate, and the batter unexpectedly swings and hits it for a single.

In each of these cases, the actor acted in a way consistent with the research. Over the course of a season, these choices may prove positive, but in these specific examples, each actor deviated from the accepted norm and it resulted in an undesirable outcome. That's a much tougher pill to swallow, psychologically, than being beaten while acting the way you're "supposed" to act.

It's also probably tough to justify to a manager or the media after the game - we see plenty of instances where coaches and managers in sport act sub-optimally out of self-preservation, and this could be more of the same.

Will Things Change?

Your conclusion from the research paper at this point may be, "so what was the point?" Well, for one, the information is good to have. Even if it's not used on a pitch-by-pitch basis, knowing that umpires are biased in particular ways is still useful, especially if such biases are even larger in particular umpires.

Some players, too, may be more willing to risk short-term losses for a gain over the course of the season. It's nice to be Joey Votto because he's very good at baseball, but it's also nice to be Joey Votto because he knows if he surprises and swings 3-0 and subsequently gets out, he has a 10-year, $225 million contract backing him up. He's in a position with a great deal of stability, where he can take the long view on such choices.

If you're Abraham Almonte and you're fighting for a reserve outfielder job, you can't really rest assured that narrowing your strike zone when down 0-2 is going to work out. It might, but if it doesn't, you probably don't get 700 plate appearances for the macro-trend to present itself. If you're Kyle Drabek and you're fighting for a rotation spot, grooving a 3-0 pitch that goes for a home run seems a lot worse than walking a batter on four pitches.

We hear often about small sample sizes and how they can distort reality. This is true. A player who goes 7-for-10 isn't going to keep hitting .700, and it's far too narrow a window to suggest there's an underlying skill improvement.
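A quick binomial calculation shows how easily a short stretch can mislead. The Python sketch below assumes a hypothetical hitter whose true average is .250 and treats each at-bat as an independent trial; both are simplifying assumptions for illustration, not findings from any of the research cited here.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): at least k hits in n at-bats."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


ASSUMED_TRUE_AVG = 0.250  # hypothetical true talent level

if __name__ == "__main__":
    for hits in (4, 7):
        chance = prob_at_least(hits, 10, ASSUMED_TRUE_AVG)
        print(f"At least {hits}-for-10: {chance:.2%}")
```

Under those assumptions, a .250 hitter goes at least 4-for-10 about 22 percent of the time. A 7-for-10 stretch is much rarer, well under 1 percent for any given 10 at-bats, but a season offers a lot of 10 at-bat windows.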

But for a large class of baseball players, small samples mean the world, especially at this time of year. They may bias our understanding of player performance, but for many it's the only performance they'll have. If you're called into the manager's office and handed a ticket to Triple-A, your cries of "small sample size" will fall on deaf ears.

We can't realistically expect individuals to account for high-level, league-wide analysis, because only the very best players have the security to wait for micro-events to bleed into the macro.