State of Analytics Series, Part I: The Past, Present and Future of Advanced Metrics in Baseball

By: Taylor Bechtold

Some insist the analytics movement that spread across baseball in the early 2000s, inspiring a best-selling book and a hit movie, didn’t just change the game but ruined it.

Others stand by a belief that the influx of data and technology has only made Major League Baseball front offices, coaches, players and fans smarter.

The fact is that the practice of looking at statistics differently isn’t a 21st-century development. It’s believed to date, at least in part, to the 1950s, when some innovative people began to recognize that the traditional ways of evaluating player performance often did not tell the whole story.

Revolutionary executive Branch Rickey, who, of course, signed Jackie Robinson and created the minor league farm system, is also considered a pioneer in the use of statistical analysis for a Life magazine article he wrote about an early version of on-base percentage.

Starting in the ’80s, fan and aspiring writer Bill James attempted to expand the thought process beyond the numbers on the back of a baseball card and into what he called the “ever-expanding line of numerical analysis.” James eventually worked with STATS, Inc. – now Stats Perform – to publish books about his revolutionary statistics.

Bill James in 2008

Because his work introduced statistical innovations such as runs created, range factor, win shares, Pythagorean winning percentage, game score, similarity scores and secondary average, he would become known as the Godfather of Sabermetrics – a term derived from the acronym of the Society for American Baseball Research (SABR).

James inspired others to follow with their own ideas, statistics, formulas, articles and books, like John Thorn, Pete Palmer and David Reuther’s ‘The Hidden Game of Baseball: A Revolutionary Approach to Baseball and Its Statistics.’ The flood of new information continued to evolve in the ’90s and accelerated from there. It became apparent that statistical analysis wasn’t just something writers did to sell books or fans explored in an effort to win their fantasy leagues.

Beyond the now popular OPS (on-base plus slugging percentage) and WHIP (walks and hits per inning pitched) stats, one of the first big advancements was WAR (wins above replacement), which attempts to take the disparate things a player does on the field and put them into one number. But the analytical community quickly realized that one-number stats can’t portray the entire picture, and analysts moved away from WAR and toward asking smaller questions with smaller answers, using different stats that each tell you something different.
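For readers who want to see how the two rate stats named above are put together, here is a minimal Python sketch using their standard public definitions. The function names and the sample stat line are made up for illustration and do not describe any real player.

```python
def on_base_pct(hits: int, walks: int, hbp: int, at_bats: int, sac_flies: int) -> float:
    """OBP: times on base divided by the plate appearances that count toward OBP."""
    return (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)


def slugging_pct(singles: int, doubles: int, triples: int, homers: int, at_bats: int) -> float:
    """SLG: total bases per at-bat."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    return total_bases / at_bats


def ops(obp: float, slg: float) -> float:
    """OPS is simply on-base percentage plus slugging percentage."""
    return obp + slg


def whip(walks: int, hits: int, innings: float) -> float:
    """WHIP: walks plus hits allowed per inning pitched.

    `innings` must be a true decimal (200 1/3 innings is 200.333..., not 200.1).
    """
    return (walks + hits) / innings


# Hypothetical hitter: 100 singles, 30 doubles, 5 triples, 15 homers in 500 at-bats,
# plus 50 walks, 5 hit-by-pitches and 5 sacrifice flies.
obp = on_base_pct(hits=150, walks=50, hbp=5, at_bats=500, sac_flies=5)
slg = slugging_pct(singles=100, doubles=30, triples=5, homers=15, at_bats=500)
print(f"OBP {obp:.3f}, SLG {slg:.3f}, OPS {ops(obp, slg):.3f}")

# Hypothetical pitcher: 50 walks and 150 hits allowed over 200 innings.
print(f"WHIP {whip(walks=50, hits=150, innings=200.0):.2f}")
```

Both stats are simple sums and ratios, which is part of why they caught on: anyone with a box score can reproduce them.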

There was deeper value in the information, and eventually, a few desperate teams sought to mine success out of that data.

Author Michael Lewis propelled advanced analytics into the mainstream with the 2003 book ‘Moneyball: The Art of Winning an Unfair Game,’ chronicling how the Oakland Athletics used metrics to assemble a contending team despite being limited by one of MLB’s smallest budgets. Tom Tango and Mitchel Lichtman’s ‘The Book: Playing the Percentages in Baseball,’ originally published in ’06, aimed to take the work of James and The Hidden Game to the next level. The film adaptation of Moneyball, however, gave baseball analytics its greatest exposure yet in ’11 and was nominated for six Academy Awards, including a Best Actor nomination for Brad Pitt, who played A’s General Manager Billy Beane.

Billy Beane talks with the media at the annual GM Meeting in 2013.

Oakland averaged 94.9 wins and reached the playoffs five times from 2000-06 while ranking higher than 21st in Opening Day payroll only once. Beane turned down a lucrative offer from the Boston Red Sox, who believed he could help them get more out of a payroll that ranked in the top seven in each of those years. Beane’s rebuff wasn’t going to stop the Red Sox and other determined franchises from trying to get close to Oakland’s win-per-dollar ratio.

The Red Sox would hire a young Yale graduate with a law degree named Theo Epstein, as well as James as a senior advisor. In the years that followed, every organization would build an analytics department filled with full-time data scientists with advanced degrees in computer science, physics, mathematics or similar fields. Coverage of the game followed the A’s lead, with statistics-focused websites like Baseball Prospectus and FanGraphs becoming mainstream.

“I think (the data) just gives you a sense of context and a sense of where this belongs in history and I think that adds a story to tell,” veteran baseball analytics writer Eno Sarris, formerly of FanGraphs, told Stats Perform. “It doesn’t detract. It’s part of telling the whole story. I think numbers are part of telling the whole story.”

Perhaps what’s most astonishing about the analytics revolution is how much the data has influenced the actual way the game is played on the field. With every franchise analyzing the metrics, baseball has evolved over the past 20 years in completely unexpected ways. And aspects of the game that used to be commonplace are now enduring a slow death.

There was a time not long ago when a team’s ability to “manufacture runs” was considered to be closely aligned with its success. A club that could get someone on base, get that player over to second with a bunt or stolen base and then find a way to drive him in from scoring position could win some games. Now, a team that has to resort to manufacturing runs is likely to be considered a punchless club with more than its share of offensive troubles.

The data reveals that bunting – particularly sacrifice bunting – was the primary method of moving runners along from 1903-30. The total number of sacrifices eclipsed 2,000 in each of those seasons and maxed out at a whopping 4,441 at the peak of the dead-ball era in 1915 before Babe Ruth, the game’s first slugger, changed everything four years later by belting a then league-record 29 home runs.

The emergence of the power hitter led to a decrease not only in sacrifice bunts but also in stolen bases, which dipped under 2,000 in 1920 and did not eclipse that mark again until the ’70s, when Lou Brock, the success of the speedy A’s and the arrival of AstroTurf in large ballparks made the game faster and sparked a steals rebirth. That continued into the ’80s, when Rickey Henderson swiped a record 130 bags in ’82 and Vince Coleman stole 110, 107 and 109 bases from 1985-87.

But as the analytics movement began to take hold in the ’90s, MLB experienced a slow drop in stolen base attempts. After topping 3,000 in all but one season of the ’90s, total steals reached that mark only once from 2000-09. Stolen bases have also decreased in each of the past four years, ending up at 2,280 in ’19 – the lowest in a non-shortened season since 1973.

Sacrifice bunts, meanwhile, have experienced an even more dramatic fall, decreasing in each of the past eight years from 1,667 in 2011 to an all-time low of 776 in ’19. Additionally, teams have seemingly begun to phase out players who need to bunt frequently in order to get on base.

So with stealing and bunting at historically low levels, why is manufacturing runs slowly becoming a thing of the past? Well, the data has helped organizations come to the realization that it just doesn’t work.

James explained why in a 2011 NPR interview, even as stolen bases were making a slight comeback at the end of the steroid era and reached 3,200 in ’10 and ’11: “Stealing bases adds some runs, but very few, and you lose most of the runs that you gain by having runners caught stealing. And hitting in (the) clutch is unpredictable and unreliable. The way you really score more runs is by getting more people on base.

“Bunting is usually a waste of time. If you think about it, a bunt is the only play in baseball that both sides applaud,” he added. “So what does that tell you? Nobody’s really winning here.”

Or, as Pitt said while portraying Beane in the movie version of Moneyball: “If someone bunts on us, just pick it up and throw it to first. They’re giving you an out – just giving it to you. Take it. Say thank you.”

He also says: “No more stealing. I pay you to get on first, not get thrown out at second.”

The reason is that stealing is valuable only if a baserunner succeeds a certain percentage of the time. At the end of the 2019 season, the break-even stolen base percentage was 68.7. A baserunner who steals at a higher rate than 68.7% provides positive value, and one who steals at a lower rate provides negative value.

Led by Christian Yelich, who was 30 for 32 on stolen base attempts, only 29 qualified players finished the ’19 season with a stolen base percentage above the break-even point. There were 58 such players in 1987, when MLB set a modern-era record with 3,585 steals.
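The break-even idea above can be sketched in a few lines of Python. The 68.7% threshold and Yelich’s 30-for-32 line come from the figures quoted here. The `break_even_rate` helper, its `run_gain` and `run_cost` parameters and the second runner are assumptions added for illustration, since the run values behind the 68.7% figure are not given.

```python
# Break-even success rate quoted in the article for the end of the 2019 season.
BREAK_EVEN_2019 = 0.687


def success_rate(steals: int, attempts: int) -> float:
    """Share of stolen base attempts that succeeded."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return steals / attempts


def break_even_rate(run_gain: float, run_cost: float) -> float:
    """Success rate p that solves p * run_gain == (1 - p) * run_cost.

    run_gain: runs added by a successful steal (assumed input).
    run_cost: runs lost when the runner is caught (assumed input).
    """
    return run_cost / (run_gain + run_cost)


def adds_value(steals: int, attempts: int,
               break_even: float = BREAK_EVEN_2019) -> bool:
    """True when a runner's success rate clears the break-even point."""
    return success_rate(steals, attempts) > break_even


# Christian Yelich's 2019 line from the article: 30 steals in 32 attempts.
print(f"Yelich: {success_rate(30, 32):.1%}, adds value: {adds_value(30, 32)}")

# Hypothetical runner (not from the article): 20 steals in 30 attempts.
print(f"Hypothetical: {success_rate(20, 30):.1%}, adds value: {adds_value(20, 30)}")
```

The formula in `break_even_rate` shows why the bar sits well above 50%: when getting caught costs more runs than a successful steal gains, a runner has to succeed far more often than he fails just to break even.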

The inflow of advanced analytics also led to a major revolution in the field. Pitch framing – how proficient a catcher is at turning a ball into a strike by how he presents the pitch to an umpire – has become a major part of backstop evaluation. Stats Perform has developed its own way of measuring that skill with its Framing Runs and Called Strike Above Average metrics.

There’s also data that can reveal where opposing batters are likely to hit the ball, so teams began to tinker with placing extra defensive players in those areas, even if it meant taking them out of their traditional positions. As a result, the defensive shift – considered an oddity when clubs moved three infielders to one side of second base against Ted Williams 70 years ago – became common.

After the strategy started to spread, the number of plate appearances that ended with an infield shift rose from 8,505 in ’12 to an all-time high 39,484 in ’19. Over this span, the percentage of plate appearances that featured shifts skyrocketed from 4.62 in ’12 to 21.17 in ’19.
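As a quick arithmetic check on those shift figures, the short Python sketch below backs out the total plate appearances each season implies. The function names are invented for illustration, and the implied totals are only approximate because the published percentages are rounded.

```python
def shift_rate(shifted_pa: int, total_pa: float) -> float:
    """Percentage of plate appearances that ended with an infield shift on."""
    return 100 * shifted_pa / total_pa


def implied_total_pa(shifted_pa: int, rate_pct: float) -> float:
    """Total plate appearances implied by a shift count and its rounded rate."""
    return shifted_pa / (rate_pct / 100)


# Figures quoted in the article.
pa_2012 = implied_total_pa(8_505, 4.62)
pa_2019 = implied_total_pa(39_484, 21.17)

print(f"Implied PA, 2012: {pa_2012:,.0f}")
print(f"Implied PA, 2019: {pa_2019:,.0f}")
print(f"Growth in shifted PA: {39_484 / 8_505:.2f}x")
```

The implied totals for the two seasons come out within a couple of percent of each other, so nearly all of the jump in the rate reflects more shifting rather than a change in how many plate appearances were played.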

In this game, every action has a reaction. For every strategic innovation, there’s a countermove by the opposition. In this case, hitters sought to avoid the defensive shift while also taking better advantage of their strength, smaller ballparks and an arguably juiced – or at the very least, lively – baseball. The infatuation with launch angle was born.

In 2017, the Washington Post reported that many hitters cited the shift as the primary reason they chose to focus on hitting the ball in the air after years of hitting coaches telling them to hit the ball down. High launch angles spread across the league like wildfire, and the average launch angle – the angle at which the ball flies after being hit – rose from 10.1 degrees in ’15 to 10.8 in ’16, 11.7 in ’18 and 12.2 in ’19, according to baseballsavant.com.

We’ve come a long way since the practice of chopping down on the ball (known as the Baltimore Chop) was developed in the 1890s.

Josh Donaldson

“No grounders,” third baseman Josh Donaldson, almost echoing the way Pitt’s character spoke about bunting, told the Washington Post. “Ground balls are outs. If you see me hit a ground ball, even if it’s a hit, I can tell you: It was an accident.

“If you look at a baseball field and look on the infield, there’s a lot of players there. You look in the outfield, there (are) fewer players and more grass. So if you hit it in the air, even if it’s not that hard, you have a chance.”

Players are indeed subscribing to that philosophy as the percentage of ground balls hit in 2019 was just 43.5 – the lowest since Stats Perform began recording the data in 1987. As one might expect, singles per plate appearance also dipped to an all-time low at 13.9%, as did triples – long considered one of the game’s most exciting plays – at just 0.4%.

On the other hand, home runs were hit in a remarkable 3.6% of all plate appearances and a single-season record 6,776 were slugged overall. Doubles per balls in play (in many cases, just near-homers) also reached an all-time high at 6.8%.

Coupled with the premium teams have placed on high-velocity pitchers during the 21st century, the launch-angle approach also led to an all-time record 23.0% of all plate appearances resulting in a strikeout. That rate has now risen for 14 consecutive seasons, a somewhat disturbing trend.

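| Season | HR | HR % of PA | 2B | 2B % of BIP | 1B | 1B % of BIP | K | K % of PA |
|---|---|---|---|---|---|---|---|---|
| 1915 | 635 | 0.5 | 4,532 | 4.2 | 23,524 | 21.9 | 14,115 | 10.2 |
| 1925 | 1,169 | 1.2 | 4,337 | 5.5 | 18,237 | 23.2 | 6,687 | 6.9 |
| 1935 | 1,325 | 1.4 | 4,265 | 5.4 | 17,497 | 22.3 | 8,016 | 8.3 |
| 1945 | 1,007 | 1.1 | 3,497 | 4.6 | 16,745 | 21.9 | 8,051 | 8.5 |
| 1955 | 2,224 | 2.3 | 3,251 | 4.4 | 15,435 | 21.0 | 10,825 | 11.4 |
| 1965 | 2,688 | 2.2 | 4,199 | 4.6 | 19,278 | 21.1 | 19,283 | 15.7 |
| 1975 | 2,698 | 1.8 | 5,444 | 4.8 | 24,834 | 21.9 | 19,280 | 13.0 |
| 1985 | 3,602 | 2.2 | 6,423 | 5.3 | 25,788 | 21.2 | 22,451 | 14.0 |
| 1995 | 4,081 | 2.6 | 6,958 | 6.1 | 25,112 | 22.0 | 25,425 | 16.2 |
| 2005 | 5,017 | 2.7 | 8,863 | 6.5 | 29,224 | 21.3 | 30,644 | 16.4 |
| 2015 | 4,909 | 2.7 | 8,242 | 6.4 | 28,016 | 21.7 | 37,446 | 20.4 |
| 2019 | 6,776 | 3.6 | 8,531 | 6.8 | 25,948 | 20.8 | 42,823 | 23.0 |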

The Tampa Bay Rays, whose analytical ways were chronicled in Jonah Keri’s 2011 book ‘The Extra 2%: How Wall Street Strategies Took a Major League Baseball Team From Worst to First,’ began thinking about ways to counter launch angle as soon as it arrived. The Rays were considered to be at the forefront of the movement that believed that throwing high fastballs at high velocity could suppress the hitting approach.

In 2019, Tampa Bay seemed to finally have the pitching staff – one that ranked sixth in the majors with an average fastball velocity of 93.97 mph – to thrive using that strategy, throwing a league-high 45.7% of all fastballs either up in the strike zone or up and out of the zone. It proved effective: the Rays finished with the third-most strikeouts (1,621) and fourth-fewest walks (453) and allowed just 181 home runs – the fewest in the majors.

Overall, however, the latest development in the ongoing evolution of the game has created an unwanted side effect, one that Commissioner Rob Manfred hopes to remedy. More strikeouts have brought more pitches and more pitches have brought longer games. MLB set a record for the fourth straight year with 3.92 pitches per plate appearance. And fewer balls are in play because of the record number of Ks, the historic home run rate and less foul territory in smaller parks.

Baseball commissioner Rob Manfred

In 2018, Manfred implemented several changes after the average nine-inning game lasted three hours and eight minutes in 2017 – up from 2:46 in ’05. The league imposed limits on mound visits, shortened commercial breaks and eliminated pitches on intentional walks. Enforcing a pitch clock and/or limiting a team’s ability to make pitching changes could be next.

As for what’s next in the ever-changing world of analytics, some executives feel the next step in defending the historic fly ball rate is, much like the defensive shift, to implement a four-man outfield more often. Use of the alignment has increased dramatically over the past two seasons after it was deployed only once in 2017, and the Rays even utilized it against Oakland’s Matt Olson in the ’19 American League Wild Card Game.

Manfred, though, has floated the idea of banning shifts and the debate on the subject could be revisited at some point.

While people like Beane have traditionally been behind baseball’s analytics movement, that has changed more recently with data and technology companies contributing to the next phase. Stemming from its statistical roots and early work with James, Stats Perform has continued to innovate in terms of both analytics and AI- and data-powered products and solutions. Its advanced analytics team, global research group, vast historical database and detailed live data have helped enhance baseball writers’ analysis and played a major role in networks’ ability to take the broadcast into the modern era.

Stats Perform’s Pitch Intent data is revolutionizing the way players are being evaluated. Using its TVL data – which tracks pitch type (T), velocity (V) and location (L) – to establish zones where the catcher is setting up and measure the inches between that marker and where the pitch is actually thrown, Stats Perform is able to formulate its Command+ metric, which is a pitcher’s average miss adjusted for pitch type, along with the innovative Discipline-, Whiff+ and BIP- metrics.

“Things like Command+ have come from an internal curiosity,” Stats Perform AI Data Analyst Kyle Cunningham-Rhoads said. “We knew we had the data, so we asked the question, ‘Can we quantify and measure something people haven’t been able to figure out for a long time?’ We’re essentially answering the scouting question about which pitchers have the best command.”
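To make the idea of an “average miss adjusted for pitch type” concrete, here is a rough Python sketch. It is not Stats Perform’s formula, which is not published here. The data layout, the pitch-type codes, the league-average miss values and the choice to scale the result so 100 means league average are all assumptions made for illustration.

```python
from math import hypot
from statistics import mean

# Each pitch is (pitch_type, target_xy, actual_xy), with coordinates in inches.
Pitch = tuple[str, tuple[float, float], tuple[float, float]]


def miss_inches(target: tuple[float, float], actual: tuple[float, float]) -> float:
    """Straight-line distance between the catcher's target and the pitch location."""
    return hypot(actual[0] - target[0], actual[1] - target[1])


def command_index(pitches: list[Pitch], league_avg_miss: dict[str, float]) -> float:
    """Hypothetical index: 100 is league average, higher means smaller misses.

    The expected miss is the league-average miss for each pitch type the
    pitcher actually threw, so a pitcher is not penalized for throwing pitch
    types that everyone struggles to locate.
    """
    if not pitches:
        raise ValueError("need at least one pitch")
    actual = mean(miss_inches(target, loc) for _, target, loc in pitches)
    if actual == 0:
        raise ValueError("average miss of zero cannot be indexed")
    expected = mean(league_avg_miss[pitch_type] for pitch_type, _, _ in pitches)
    return 100 * expected / actual


# Made-up sample: one four-seam fastball missing by 5 inches, one slider by 10.
sample = [
    ("FF", (0.0, 0.0), (3.0, 4.0)),
    ("SL", (0.0, 0.0), (6.0, 8.0)),
]
# Made-up league-average misses by pitch type, in inches.
league = {"FF": 6.0, "SL": 9.0}

print(f"Command index: {command_index(sample, league):.1f}")
```

Comparing each pitcher with the league-average miss for his own pitch mix is one simple way to make that adjustment. A real metric would likely need many more pitches and additional context before the number meant much.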

Pitch Intent data also helps power Stats Perform’s Advanced Heat Maps, which provide a unique visual display of batter and pitcher performance designed for player management and fan engagement, and the deeper BIP+, Contact+, Discipline+ and Raw Value hitting statistics.

Stats Perform tackles the traditionally challenging practice of measuring defensive ability beyond fielding percentage with its Clean Fielding Percentage. While Defensive Runs Saved and Range Factor have been criticized for their flaws, Clean Fielding Percentage accounts not only for plays on which an error is charged but also for plays, or parts of plays, that aren’t ‘clean’ even though they aren’t scored as errors.

There has been another evolution in terms of tracking data in baseball, beginning with the implementation of PITCHf/x in the 2006 postseason. It has been revealed that umpire accuracy actually improved in the years after the debut of the system, which tracks the speed and trajectories of pitches.

Major League Baseball Advanced Media took a step into the world of player tracking in 2015 when it installed the Statcast system – a combination of radar technology and tracking cameras – into every big-league ballpark. Statcast is able to capture things like a pitcher’s velocity and spin rate, a hitter’s exit velocity and launch angle, the max speed and route efficiency of outfielders and a batted ball’s catch probability.

Teams are supplementing player evaluations with the data, writers are using it to enhance their descriptions of players and events, and broadcasters are taking advantage by overlaying Statcast graphics on replays and highlights.

Sarris believes the next big thing has to do with the game moving away from radar and toward optical technology. Optical technology has the ability to analyze players’ body and limb movements in a way radar never could. The hope is that the technology can not only tell when a pitcher is fatigued because of a change in release point or when he needs to make an on-the-fly adjustment in his mechanics, but also act as a way to prevent and/or predict injuries.

“I’m trying to learn about biomechanics because you’re going to learn about optimal uses of the body,” he explained. “Then, we’re going to be able to say more definitively things about where an arm should be, where the bat should be at a certain moment of the swing, and we’re going to have more data that relates to that sort of thing in the public and private sphere. We’re going to be talking more about how bodies move in space.”

Baseball analytics writer Eno Sarris with pitcher Jeff Samardzija

Sports Illustrated recently supported Sarris’ theory by describing the race among teams to develop hitters through data and biomechanics – an area where the technology is behind what pitchers have been utilizing.

The Chicago Cubs hired Justin Stone to be their director of hitting after one year as the team’s biokinematic hitting consultant, and more tech-based coaches are being brought in as hitting directors and strategists than ever before. Stone has a training facility that uses electromyography (EMG) – which measures how well muscles fire during a swing – reengineered batting tees, ground force plates and 3D kinematic sensors.

While teams continue to innovate with staff and methodology, service providers are pushing forward with technology.

At the 2019 MIT Sloan Sports Analytics Conference, Stats Perform Director of Computer Vision Sujoy Ganguly discussed how Stats Perform is expanding the availability of tracking data and deepening the quality of data by providing human pose estimation – finding the players’ skeletons in the frame. This potential game-changer has the capability to show, for example, the variability of how a pitcher’s leg motion or arm angle might change depending on the pitch he’s throwing or time in the game.

“The future is to generate tracking, pose and event data directly from the broadcast video,” Ganguly said. “Over the last five, six years, there has been a revolution in computer vision technology and deep learning has really unlocked the level of detail that we can extract from the pixels, from images, specifically human pose estimation.”

It’s obvious that technology is spreading across baseball fast – and with it, the next wave of analytics.

Advanced analytics and data analysis provided by Stats Perform’s Lucas Haupt.