Originally designed for rating chess players, ELO is a technique that is now widely used to rate sports teams, board game competitors, and eSports players. A team’s ELO rating is a single number that expresses its strength relative to other teams. When two teams play, the winner gains ELO points from the loser. The system is designed so that a favorite that wins gains only a few points from the loser, while a lesser-ranked team that pulls an upset gains more (the bigger the upset, the more points). Over time, the system identifies the best teams by rewarding teams that win and penalizing those that lose. The more recent a result, the more effect it has on a team’s rating; older results eventually fade out of the rating calculation, so the rating reflects the current team strength.
The system is comparative: the difference in ELO ratings determines the relative strength of the teams. Teams with the same rating are equal in quality, and each should have a 50% chance of winning a game between them. A nice feature of ELO is that when two teams have different ratings, the difference determines the victory probability for a game between them. In a game between Team A and Team B, Team A’s win probability is equal to:
Pr(A) = 1 / (10^(-ELODIFF/400) + 1)
where ELODIFF is Team A’s ELO rating minus Team B’s ELO rating. For example, a team with an ELO rating 50 points above another should win against them about 57% of the time, while a team with a 150 point advantage should win about 70% of those matchups.
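For readers who want to try the numbers, here is a minimal Python sketch of the win probability formula, plus the classic zero-sum Elo update in which the winner takes points from the loser. The function names are made up for illustration, and the K-factor of 20 is an assumed value; the update shown is the textbook version, not necessarily the exact variant used for the NFL ratings discussed here.

```python
def win_probability(elo_diff: float) -> float:
    """Probability that Team A wins, where elo_diff = rating_A - rating_B."""
    return 1.0 / (10 ** (-elo_diff / 400.0) + 1.0)


def update_ratings(rating_a: float, rating_b: float, a_won: bool, k: float = 20.0):
    """Textbook zero-sum Elo update.

    k=20 is an assumption for illustration; it controls how many points
    change hands per game.
    """
    expected_a = win_probability(rating_a - rating_b)
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - expected_a)  # positive when A does better than expected
    return rating_a + delta, rating_b - delta


# The two examples from the text: 50-point and 150-point advantages.
print(round(win_probability(50), 3))
print(round(win_probability(150), 3))

# A 150-point favorite gains little for winning and loses more in an upset.
print(update_ratings(1650, 1500, a_won=True))
print(update_ratings(1650, 1500, a_won=False))
```

Because whatever one team gains the other loses, the sum of the two ratings is unchanged by a game.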
fivethirtyeight.com has built an ELO system for the National Football League. They went back to the start of the league in 1920 and calculated ELO ratings for all teams using the results from every NFL game ever played. The system factors in home-field advantage and size of victory. I grabbed that data and started playing with it, and here’s what I found. [Note that I used the game data starting with 1950, which I consider the start of the first “modern era” of pro football]
My first question was, how accurate is the ELO system for predicting results? I’ll compare what the ELO formula says versus actual results. For each of the 14787 games in the database, I identified the probability of the favorite winning, based on the rating difference versus their opponent. The following graph shows the expected win probability (the red line), based on the ELO formula, versus the actual results from the 14787 games (blue line).
If ELO were perfect, the blue line would exactly follow the red line. The deviations you see are just random fluctuations, because we don’t have an infinite dataset. If you smooth out the data by making a trend line, the blue line exactly matches the red one. So I think it’s safe to say a couple of things: 1) ELO reflects the relative strength of the teams it rates, and 2) the win probability formula is accurate.
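The kind of comparison described above can be sketched in a few lines of Python: group games by the favorite's predicted win probability, then compare the average prediction in each group with how often the favorite actually won. The input layout (pairs of probability and outcome) and the function name are hypothetical, and the games below are simulated stand-ins, not the NFL data.

```python
import random
from collections import defaultdict


def calibration_table(games, bin_width=0.05):
    """Compare predicted and actual favorite win rates, bin by bin.

    games: iterable of (favorite_win_prob, favorite_won) pairs (assumed layout).
    Returns {bin_start: (mean_predicted, actual_win_rate, n_games)}.
    """
    bins = defaultdict(list)
    for prob, won in games:
        bin_start = round(int(prob / bin_width) * bin_width, 2)
        bins[bin_start].append((prob, won))

    table = {}
    for bin_start, rows in sorted(bins.items()):
        n = len(rows)
        mean_predicted = sum(p for p, _ in rows) / n
        actual_rate = sum(1 for _, w in rows if w) / n
        table[bin_start] = (mean_predicted, actual_rate, n)
    return table


# Simulated games: each outcome is drawn with the predicted probability,
# so the predictions are correct by construction.
rng = random.Random(0)
synthetic_games = []
for _ in range(20000):
    p = rng.uniform(0.5, 0.95)
    synthetic_games.append((p, rng.random() < p))

for bin_start, (predicted, actual, n) in calibration_table(synthetic_games).items():
    print(f"{bin_start:.2f}  predicted={predicted:.3f}  actual={actual:.3f}  games={n}")
```

Even with predictions that are correct by construction, the actual column wobbles around the predicted one, which is the same finite-sample noise visible in the graph.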
Now that we trust the ratings and the win probabilities, let’s look at upsets. Heading into each game, we know the ELO ratings of the two teams. From that, we can calculate the win percentage of the favorite. If the underdog wins, it’s an upset. But there’s a big difference between a 52% favorite losing and a 90% favorite losing. Just ask Homer Simpson. What I’m looking for are the biggest upsets in history – the games where the favorite had the highest expected win percentage, but they lost. I processed all 14787 games, and without further ado, here are the biggest upsets…
Well, unfortunately there must be a little more ado. It turns out that the list contains a few late-season games that were upsets in name only: a top-ranked team that has already clinched its division rests its starters and loses to a lowly opponent. We’re not going to count those. So, given that proviso, here’s the list:
Season | Favorite (ELO Rating) | Underdog (ELO Rating) | Favorite expected win% | Point Spread | Result |
2008 | Patriots (1732) | Dolphins (1335) | 93.5% | 12.5 | Dolphins, 38-13 |
1995 | Cowboys (1728) | Redskins (1357) | 92.5% | 17.0 | Skins, 24-17 |
2002 | Steelers (1604) | Texans (1242) | 92.1% | 14.0 | Texans, 24-6 |
2020 | Rams (1625) | Jets (1266) | 92.0% | 17.5 | Jets, 23-20 |
2019 | Patriots (1681) | Dolphins (1337) | 91.3% | 17.5 | Dolphins, 27-24 |
There’s an asterisk associated with the first one on that list: 2008 is the season Patriots QB Tom Brady missed after being injured in the opening game. This early-season matchup between the 2007 AFC champions and the Dolphins (who were 1-15 the season before) was judged as extremely one-sided, but ELO didn’t know that Brady was not playing. The oddsmakers obviously did, because the spread was smaller than you’d expect.
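The ranking step itself can be sketched in Python: compute the favorite's win probability for each game, keep the games the underdog won, and sort. Two assumptions are labeled in the code: a home-field bonus of 65 ELO points (not stated in this article, though with it the sketch lands on the 93.5% and 92.5% shown in the table) and that the favorite was the home team in both sample games. The field names and the third game are hypothetical.

```python
HOME_FIELD_BONUS = 65  # assumed ELO points added for the home team


def win_probability(elo_diff):
    """Win probability for the team that is ahead by elo_diff points."""
    return 1.0 / (10 ** (-elo_diff / 400.0) + 1.0)


def biggest_upsets(games, home_bonus=HOME_FIELD_BONUS, top_n=5):
    """Return (favorite_win_prob, game) pairs for upsets, biggest first."""
    upsets = []
    for g in games:
        p_home = win_probability(g["home_elo"] + home_bonus - g["away_elo"])
        if p_home >= 0.5:
            favorite, fav_prob = g["home"], p_home
        else:
            favorite, fav_prob = g["away"], 1.0 - p_home
        if g["winner"] != favorite:  # the underdog won
            upsets.append((fav_prob, g))
    upsets.sort(key=lambda pair: pair[0], reverse=True)
    return upsets[:top_n]


# Ratings for the first two games come from the table above; the home team
# is an assumption. The third game is invented and is not an upset.
sample_games = [
    {"home": "Cowboys", "home_elo": 1728, "away": "Redskins", "away_elo": 1357,
     "winner": "Redskins"},
    {"home": "Patriots", "home_elo": 1732, "away": "Dolphins", "away_elo": 1335,
     "winner": "Dolphins"},
    {"home": "Team X", "home_elo": 1600, "away": "Team Y", "away_elo": 1500,
     "winner": "Team X"},
]

for prob, game in biggest_upsets(sample_games):
    print(f"{game['winner']} over {game['home']}: favorite at {prob:.1%}")
```

A fuller version would also need a way to drop the rest-the-starters games mentioned earlier, which this sketch does not attempt.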
Where are the games from earlier decades? If I extend the list to 20 or 30, they start showing up. But they are less frequent, for a couple of reasons. First, there were fewer games in the old days. Each decade has more teams and longer schedules, resulting in more games. More games means the chance of a big mismatch increases. Second, the 1950s were especially competitive; the difference between the best and worst teams was pretty narrow.
What about the games that really matter: the playoffs? Here are the biggest playoff upsets.
Season | Playoff level | Favorite (ELO Rating) | Underdog (ELO Rating) | Favorite expected win% | Point Spread | Result |
2011 | Division | Packers (1762) | Giants (1566) | 81.8% | 11 | Giants, 37-20 |
2007 | Super Bowl | Patriots (1849) | Giants (1602) | 81.6% | 12.5 | Giants, 17-14 |
1987 | Division | 49ers (1740) | Vikings (1570) | 79.4% | 11 | Vikings, 36-24 |
1996 | Division | Broncos (1634) | Jaguars (1469) | 79.0% | 12.5 | Jags, 30-27 |
1983 | Wild Card | Cowboys (1649) | Rams (1491) | 78.3% | 8 | Rams, 24-17 |
There’s one Super Bowl on this list: the Giants knocking off the 18-0 Patriots in 2007. The next highest Super Bowl is the Patriots beating the “Greatest Show on Turf” Rams in 2001. And #3 is Joe Namath’s ‘guaranteed’ win over the Colts in 1968 (the Colts were favored by 18, which is a bit much).
I know there are a few Chargers fans out there, so let’s look at their biggest regular season upsets, both for and against. On the plus side, there’s November 9, 1986, when the 1-8 Chargers (ELO 1434) travelled to Denver (ELO 1698) and somehow won 9-3. Of course you all remember that great quarterback matchup, John Elway versus... Tom Flick? Dan Fouts was injured, so it fell to the immortal Flick to pick up the only win of his career. ELO says the Chargers had a 13.1% chance of winning that game; I somehow think that overstated it.
On the other side was a game from 1979. The 10-3 Chargers were cruising toward a Western Division championship when they hosted the lowly Atlanta Falcons (4-9). San Diego had an 88.6% chance of winning, but the Falcons prevailed 26-24. The only thing I remember about this game was that my boss told me he tore off his Chargers t-shirt and threw it into the fire. So it must have been a frustrating loss….
I remember both games. – Mark