Updated: Apr 22
Each year, I begin my fall semester class with a lecture called “Introduction to Medical Informatics: Putting it in Perspective.” This ‘sets the stage’ for the semester and addresses some of the content scheduled over the next 14 weeks.
In this lecture, I touch on the concept of informatics, medical records from the 17th century, visions of future healthcare, industry ‘fun facts’, costs, pandemics, a peek into data structures, big data, metadata, analytics, Florence Nightingale (arguably the mother of informatics), and one other seemingly unrelated concept relating to Major League Baseball.
In 2002, the Oakland Athletics (the A’s) were in the midst of an incredible run. Towards the end of that season, the team had won 19 straight games, with the most recent wins coming from walk-off heroics in the ninth inning on back-to-back days. They needed one more win to eclipse the American League record for consecutive wins set by the 1906 Chicago White Sox and the 1947 New York Yankees.
The Kansas City Royals were now in town aiming to deny number 20.
Let’s add context: The prior year, Oakland had another great season, with 102 wins that put them in the American League Division Series against the New York Yankees. The Yankees defeated the A’s in 5 games. But at the end of the season, Oakland lost more than that series. They also lost three strong players to free agency: Jason Giambi, Johnny Damon, and Jason Isringhausen. These events set the stage for Michael Lewis' bestselling book “Moneyball: The Art of Winning an Unfair Game”, and the later Hollywood film “Moneyball”, produced by Columbia Pictures in 2011 and starring Brad Pitt as ‘Billy Beane’.
[Like all Book-to-Movie creations, the film differs from the text in many ways to meet the audio and visual experience for the movie viewer. But the concept is true to form.]
Professional sports are big business. Teams operate under the framework of a league, such as the NFL, NHL or MLB, and invest in players, coaches, and facilities to win. Historically, team budgets have varied widely with the local market, the owner’s aspirations, and the way the league is split into conferences and divisions.
In the early 2000s, the Oakland A’s were at the lightweight end of the wealth scale. Billy Beane, the general manager, faced the challenging task of rebuilding the team with one of the skinniest budgets in MLB.
Between the 2001 and 2002 seasons, Billy and his scouting advisors held regular strategy meetings to discuss things such as player trades. In the movie, frustrated with the legacy approaches and cheap talk, Beane clearly states the problem: “The problem we're trying to solve is that there are rich teams and there are poor teams, then there's fifty feet of crap, and then there's us. It's an unfair game. And now we've been gutted. We're like organ donors for the rich. Boston's taken our kidneys and the Yankees have taken our heart… We've got to think differently. We are the last dog at the bowl. You see what happens to the runt of the litter? He dies.”
In the movie, to the consternation of his scouts, Beane brings in a young Yale economics graduate, Peter Brand, to apply a statistical approach termed ‘sabermetrics’, which uses statistics to analyze baseball scientifically in an attempt to determine why teams win and lose [The character Peter Brand was film-fabricated]. Peter believed baseball's conventional wisdom is all wrong. “…Baseball thinking is medieval. They are asking all the wrong questions...”
The premise behind the moneyball theory is a formula known as Bill James’ “Pythagorean Theorem” of baseball. Bill James was an aspiring writer and obsessive fan with a knack for statistics. James began writing baseball articles after leaving the United States Army in his mid-twenties. Many of his first baseball writings came while he was doing night shifts as a security guard at the Stokely-Van Camp's pork and beans cannery. Unlike most writers, his pieces did not recount games in epic terms or offer insights gleaned from interviews with players. A typical James piece posed a question (e.g., “Which pitchers and catchers allow runners to steal the most bases?”), and then presented data and analysis that offered an answer. James named this kind of analysis sabermetrics.
Sabermetrics is never mentioned by name in the movie. While the film depicts Beane bringing in ‘Peter Brand’ before the 2002 season, the actual person was Paul DePodesta, who is named in the book. Beane hired DePodesta as his assistant to incorporate sabermetrics starting in 1998. Paul was an economics grad (but from Harvard, not Yale). Paul and Billy had been evaluating sabermetrics and finally put it into practice for the 2002 season.
Paul was approached years later by Columbia Pictures about production of the film, but he did not feel comfortable in the spotlight after the book's release, nor did he care for the secrets revealed about his scouting methods, so he preferred to remain anonymous in the big-screen rendition. In 2004, DePodesta moved on and continued his career in sports management and coaching in both MLB and the NFL.
In the movie, Peter Brand summarizes: “There is an epidemic failure within the game to understand what is really happening. And this leads people who run Major League Baseball teams to misjudge their players and mismanage their teams. People are overlooked for a variety of biased reasons and perceived flaws: Age, appearance, personality. Bill James and mathematics cuts straight through that. Billy, of the twenty thousand notable players for us to consider, I believe that there’s a championship team of twenty-five people that we can afford, because everyone else in baseball undervalues them – like an island of misfit toys.”
He goes on to show Beane the “code he wrote”: a sports analytics formula that estimates the percentage of games a baseball team “should” have won based on the number of runs they scored and allowed.
The A’s proceed with recruiting and purchasing undervalued, bargain-bin players whom the scouts have labeled as flawed but who have game-winning potential. Through a series of over-the-phone trades, Beane assembles a team of no-names who, on paper, can get on base and score runs. And win more games.
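The movie never shows Brand's code in detail, so the following is only a minimal sketch of the formula as Bill James originally published it: runs scored squared, divided by the sum of runs scored squared and runs allowed squared. The function names, the 162-game default, and the run totals in the usage example are assumptions chosen for illustration, not figures from the book or film.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Estimate the share of games a team "should" have won.

    Uses Bill James's original exponent of 2 by default.
    """
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals cannot be negative")
    if runs_scored == 0 and runs_allowed == 0:
        raise ValueError("at least one run total must be positive")
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


def expected_wins(runs_scored: float, runs_allowed: float,
                  games: int = 162, exponent: float = 2.0) -> float:
    """Scale the expected winning percentage to a season of `games` games."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)


if __name__ == "__main__":
    # Illustrative (hypothetical) season totals, not actual team data.
    pct = pythagorean_win_pct(800, 650)
    print(f"Expected winning percentage: {pct:.3f}")
    print(f"Expected wins over 162 games: {expected_wins(800, 650):.1f}")
```

A team that scores exactly as many runs as it allows comes out at .500, which is a quick sanity check on the formula.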
The theory held that a team built this way could win games and compete with clubs that had significantly higher budgets, such as the Red Sox and Yankees. With approximately $41 million in salary, the Oakland A’s ultimately competed with larger-market teams such as the Yankees, who spent over $125 million in payroll during the 2002 baseball season.
Under the constant scrutiny of staff scouts and friction from field manager Art Howe, Brand performed data mining on hundreds of individual players, ultimately identifying statistics that were highly predictive of how many times they would get on base and score runs. These statistics weren’t those baseball scouts traditionally valued (or understood). Instead of competing for high-priced home-run hitters with high batting averages, he sought lower-cost players with high on-base percentages. The theory was that players with a higher on-base percentage would be more valuable than those with a lower on-base percentage, even when the latter hit more home runs or were younger, faster, and stronger.
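To make the distinction between batting average and on-base percentage concrete, here is a small sketch using the standard definitions: batting average is hits divided by at-bats, while on-base percentage also credits walks and hit-by-pitches. The `BattingLine` class and both stat lines are hypothetical and invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """One player's season totals (hypothetical data for illustration)."""
    name: str
    at_bats: int
    hits: int
    walks: int
    hit_by_pitch: int
    sacrifice_flies: int

    def batting_average(self) -> float:
        """Hits per at-bat; ignores walks entirely."""
        if self.at_bats == 0:
            return 0.0
        return self.hits / self.at_bats

    def on_base_percentage(self) -> float:
        """(H + BB + HBP) / (AB + BB + HBP + SF)."""
        reached = self.hits + self.walks + self.hit_by_pitch
        chances = (self.at_bats + self.walks
                   + self.hit_by_pitch + self.sacrifice_flies)
        if chances == 0:
            return 0.0
        return reached / chances


if __name__ == "__main__":
    # Two made-up players: a free swinger and a patient hitter.
    players = [
        BattingLine("Free swinger", at_bats=550, hits=160, walks=30,
                    hit_by_pitch=2, sacrifice_flies=5),
        BattingLine("Patient hitter", at_bats=480, hits=130, walks=90,
                    hit_by_pitch=5, sacrifice_flies=4),
    ]
    for p in players:
        print(f"{p.name}: AVG {p.batting_average():.3f}, "
              f"OBP {p.on_base_percentage():.3f}")
```

In this made-up comparison the patient hitter has the lower batting average but reaches base noticeably more often, which is the kind of gap the traditional scouting numbers would hide.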
Back to that night against the Royals: the A’s had squandered an 11-0 lead, and the score was now tied at 11-11 in the bottom of the ninth inning. Art Howe opted to pinch-hit for outfielder Eric Byrnes with first baseman Scott Hatteberg. Hatteberg took a second-pitch breaking ball from Royals reliever Jason Grimsley and drove it deep into the night and well into the right-field seats to send the Coliseum into pandemonium. Teammates mobbed Hatteberg at home plate as a giant banner dropped down from the centerfield rafters that simply read: 20. It was history for the Oakland Athletics. It was history for Major League Baseball.
Arguably, the 2002 Oakland Athletics were one of the most exciting teams in baseball history, winning 103 games and breaking the American League record with 20 wins in a row. It should also be noted that, notwithstanding the focus on the analytics, some pundits point out that the 2002 A’s, despite losing three key players, still had solid talent.
Oakland's streak came to an end with a 6-0 loss to the Minnesota Twins on September 6. The A's continued to play well down the stretch, as evidenced by their final record of 103-59. They eventually lost [again] to the Minnesota Twins, this time in the 2002 AL Division Series.
Over the years, the moneyball theory has had a lasting legacy in baseball, allowing teams with significantly lower budgets to choose players that would allow them to successfully compete with big-market teams. It turns out this premise turned baseball's hidebound conventions about analyzing player capability on their head.
According to a 2013 article on MLB.com, “Moneyball has played a role in 15 of 30 teams getting into at least one postseason series—not a Wild Card Game, but a postseason series—the last three years. Moneyball may also be why nine franchises have won the World Series the last 13 seasons.”
Back to Paul DePodesta… A dynamic leader in his own right, Paul joined the NFL as the Chief Strategy Officer for the Cleveland Browns in 2016. DePodesta also became a part-time assistant professor of bioinformatics at the Scripps Translational Science Institute. He works with the institute’s analytics team, led by Ali Torkamani, an associate professor and director of genome informatics. Examples of the team’s previous work include The Molecular Autopsy Study, which attempts to identify genes responsible for sudden and unexplained deaths through analysis of the DNA of those whose deaths cannot currently be medically explained, and The GIRAFFE Study, which attempts to identify genetic mutations associated with atrial fibrillation. “Paul brings a valuable outsider’s perspective to medicine that will help make the field more precise and more predictive through the analysis of the vast amount of individualized data now being collected through genetic testing, wireless sensors, and other technologies,” said Eric Topol, the director of the Scripps Translational Science Institute. “We are excited to have him work with our informatics data scientists to jumpstart the ‘Moneyball’ of medicine.”
“In disciplines as disparate as baseball, financial services, trucking and retail, people are realizing the power of data to help make better decisions,” DePodesta said. “Medicine is just beginning to explore this opportunity, but it faces many of the same barriers that existed in those other sectors—deeply held traditions, monolithic organizational and operational structures, and a psychological resistance to change.”
The story of the 2002 Oakland Athletics should resonate with data scientists. While not exactly in the realm of ‘big data’, it speaks to the advantage of making data science part of an organization’s DNA, but just as importantly, it highlights how a big idea about any type of data can translate to serious business gains. It is a pure application of descriptive and predictive analytics.
According to some health executives, that approach can work in healthcare, too. By incorporating factors like risk adjustment and utilization rates, organizations can essentially create their own population health box scores to improve clinical care and lower costs. Interoperability standards such as HL7's FHIR can help share the underlying data across a region to enhance public health.
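As a rough illustration of what such a box score might look like, the sketch below compares a raw utilization rate (admissions per 1,000 patients) with a simple risk-adjusted measure, the ratio of observed to expected admissions. Everything here is an assumption made for illustration: the `PanelStats` fields, the clinic names, and the numbers are invented, and the expected counts are taken as given from some risk-adjustment model rather than computed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PanelStats:
    """Hypothetical yearly totals for one provider's patient panel."""
    provider: str
    patients: int
    observed_admissions: int
    expected_admissions: float  # assumed output of a risk-adjustment model


def admissions_per_1000(stats: PanelStats) -> float:
    """Raw utilization rate, with no adjustment for how sick the panel is."""
    return 1000 * stats.observed_admissions / stats.patients


def observed_to_expected(stats: PanelStats) -> float:
    """Risk-adjusted ratio; below 1.0 means fewer admissions than expected."""
    return stats.observed_admissions / stats.expected_admissions


def box_score(panels: List[PanelStats]) -> List[Tuple[str, float, float]]:
    """Rank providers by observed-to-expected ratio, best (lowest) first."""
    rows = [(p.provider, admissions_per_1000(p), observed_to_expected(p))
            for p in panels]
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    # Made-up panels: Clinic A treats a sicker population than Clinic B.
    panels = [
        PanelStats("Clinic A", patients=2000, observed_admissions=180,
                   expected_admissions=200.0),
        PanelStats("Clinic B", patients=1500, observed_admissions=120,
                   expected_admissions=100.0),
    ]
    for name, rate, ratio in box_score(panels):
        print(f"{name}: {rate:.1f} admissions per 1,000, O/E {ratio:.2f}")
```

In this invented example the clinic with the higher raw admission rate comes out ahead once its sicker panel is accounted for, much as on-base percentage can reorder players that batting average ranks the other way.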
Sabermetrics is used to empirically analyze all the relevant metrics of player performance to answer objective questions. This sounds a lot like what a hospital administrator wants in terms of measuring provider/hospital performance. But instead of wins on the field, a win in healthcare is recognized when patient outcomes are optimized at the lowest cost, which depends on a very complex compendium of clinical services and expertise. Now let's play ball!