This article appeared in the DailyO on Thursday, 14th June, 2018.
With the 2018 World Cup set to kick off in Russia this summer, everyone and their mother is scrambling to bet on the national team that will take home the title. While it’s easy to make predictions that are more accurate than an octopus (remember Paul the Octopus?), even the best predictions are still hurt by the very noisy process that is a football match. Consider that for the 2014 World Cup, the gurus at FiveThirtyEight (one of the best data analytics teams in the world), who went to a lot more trouble than someone pontificating at a party, could really only narrow their choices down to four teams. Even then, they put their money (so to speak) on Brazil, who didn’t even make it to the final.
Rather than come up with our own predictions, as bigwigs like SAP and IBM have traditionally done, we at Infinite Analytics thought it might be more interesting to contemplate why football (or soccer, in the USA) is so difficult to predict, even in the era of fancy algorithms; or, put another way, why expecting an upset is the best prediction you can make.
At its core, a football competition is an experiment designed to rank teams from best to worst, as measured by the number of goals each team scores against its opponents. This is relatively easy to do over a large number of games: better teams should win more often, and worse teams should lose more often. But because the game is so low-scoring, and the difference in quality between teams is relatively small (especially among the best ones, which are the teams that decide a World Cup), predicting the outcome of a single match between two well-matched opponents is very hard.
As anyone who’s watched a football match knows, most games have fairly low scores. In the latest English Premier League season, for example, teams scored on average only 1.34 goals in a given match.
It’s also very hard to score a goal. Over the latest season, the best team, Manchester City, made 664 attempts on goal but scored only 106 goals, a success rate of just 16%. Moreover, the difference between the best and the average team isn’t that large: over the same season, the average success rate for a goal attempt was about 11%. (Source: http://www.footstats.co.uk/index.cfm?task=league_shots)
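As a quick check on the arithmetic, a success rate is simply goals divided by attempts. Using the figures quoted above (the 11% league average is taken as given rather than recomputed):

```latex
p_{\text{City}} = \frac{\text{goals}}{\text{attempts}} = \frac{106}{664} \approx 0.160,
\qquad p_{\text{avg}} \approx 0.11,
\qquad \frac{p_{\text{City}}}{p_{\text{avg}}} \approx \frac{0.160}{0.11} \approx 1.45
```

In relative terms City converts about 45% more of its attempts than an average side, but in absolute terms that is only about five extra goals per hundred attempts.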
This might seem like a big difference, but an intuitive argument shows why it’s actually very hard to observe, given the structure of a football match.
Instead of a match between two teams, suppose we’re playing a game where you win $10 if you can guess which of two biased coins is more likely to land heads up, just as you’re effectively trying to predict which football team will score more goals over the course of a match. How many times would you want to see each coin flipped before picking one? Likely more times than a typical football match provides.
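To put a rough number on the coin question, here is a minimal Python sketch that assumes every flip is independent and uses the 16% and 11% rates quoted above. The function name `share_better_coin_ahead`, the seed and the trial count are illustrative choices, not part of the original analysis. It compares 12 flips (roughly one match's worth of attempts) with 456 flips (a hypothetical 38-match season at 12 attempts per game).

```python
import random


def share_better_coin_ahead(p_better, p_worse, flips, trials=2000, seed=2018):
    """Estimate how often the better coin shows strictly more heads.

    Each trial flips both coins `flips` times. A tie counts as a failure,
    since it does not tell you which coin is better.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    ahead = 0
    for _ in range(trials):
        heads_better = sum(rng.random() < p_better for _ in range(flips))
        heads_worse = sum(rng.random() < p_worse for _ in range(flips))
        if heads_better > heads_worse:
            ahead += 1
    return ahead / trials


if __name__ == "__main__":
    # 12 flips ~ one match; 456 = 38 matches x 12 attempts ~ one league season
    for flips in (12, 120, 456):
        share = share_better_coin_ahead(0.16, 0.11, flips)
        print(f"{flips:4d} flips: better coin ahead in {share:.0%} of trials")
```

Under these assumptions the better coin leads in only about half of the 12-flip trials, while the season-length run identifies it nearly every time.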
Let us represent a match between two teams as a game in which you flip two coins, one per team. Every time a team’s coin lands heads up, that team scores a goal.
Let’s turn to some statistics from the latest season of the very competitive English Premier League. Over that season, teams made only about 12 goal attempts per game on average.
So suppose you have a coin ready for your favorite team, and you get to flip it 12 times.
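Treating each of the n = 12 attempts as an independent coin flip with success rate p makes a team’s goal count G binomial, so its average and typical spread follow directly. This is a back-of-the-envelope check under the article’s simplifying assumptions; the 16% case is a hypothetical side that converts like City but takes only 12 attempts.

```latex
\begin{aligned}
\mathbb{E}[G] &= np, \qquad \operatorname{SD}[G] = \sqrt{np(1-p)} \\
\text{average side } (p = 0.11) &: \quad \mathbb{E}[G] = 12 \times 0.11 = 1.32, \quad \operatorname{SD}[G] = \sqrt{12 \times 0.11 \times 0.89} \approx 1.08 \\
\text{City-like side } (p = 0.16) &: \quad \mathbb{E}[G] = 12 \times 0.16 = 1.92, \quad \operatorname{SD}[G] = \sqrt{12 \times 0.16 \times 0.84} \approx 1.27
\end{aligned}
```

The 1.32 expected goals for an average side sits close to the 1.34 league average quoted earlier, a reassuring sanity check. More telling is that the 0.6-goal gap between the two sides is smaller than the match-to-match spread of either one.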
What would happen if the best team in the latest season, Manchester City, went up against a hypothetical team with an average scoring success rate?
We ran a few simulations using the scoring success rate of Manchester City (about 16%) and that of a hypothetical average team (11%). Dark circles represent a goal attempt that scored; light circles represent an attempt that was missed or blocked. Each group of 12 represents a match.
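The dark-and-light circle picture can be approximated in plain text. This is a hypothetical sketch, not the code behind the original figure: it assumes independent attempts, exactly 12 per match and an arbitrary random seed, and `simulate_attempts` and `render` are illustrative names.

```python
import random

SCORED, MISSED = "●", "○"  # dark circle = goal, light circle = missed or blocked


def simulate_attempts(success_rate, attempts, rng):
    """Return one match as a list of booleans, True where an attempt scored."""
    return [rng.random() < success_rate for _ in range(attempts)]


def render(match):
    """Turn a list of attempts into a row of dark and light circles."""
    return "".join(SCORED if scored else MISSED for scored in match)


if __name__ == "__main__":
    rng = random.Random(14)  # arbitrary seed so the picture is reproducible
    # Hypothetical labels; the rates are the ones quoted in the article.
    teams = {"16% side": 0.16, "11% side": 0.11}
    for match_no in range(1, 6):
        rows = [f"{name}: {render(simulate_attempts(rate, 12, rng))}"
                for name, rate in teams.items()]
        print(f"Match {match_no}   " + "   ".join(rows))
```

Rerunning with different seeds is a quick way to see how often the two rows look interchangeable.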
One of these teams has an average scoring record, and the other one has the scoring record of the best-ranked team in this year’s Premier League season. Can you tell which is which? How much less confident would you be if you could only look at one pair?
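How much a single pair tells you can also be worked out exactly rather than by eye. Under the same coin model, each side’s goals follow a binomial distribution, so the chance of a win, draw or loss is a sum over all pairs of scores. This is a minimal sketch under those assumptions (independent attempts, exactly 12 each); `binomial_pmf` and `match_outcome` are hypothetical helper names.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k goals from n attempts at success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def match_outcome(p_a, p_b, attempts=12):
    """Exact (win, draw, loss) probabilities for team A under the coin model."""
    goals_a = [binomial_pmf(k, attempts, p_a) for k in range(attempts + 1)]
    goals_b = [binomial_pmf(k, attempts, p_b) for k in range(attempts + 1)]
    win = draw = loss = 0.0
    for i, prob_i in enumerate(goals_a):
        for j, prob_j in enumerate(goals_b):
            if i > j:
                win += prob_i * prob_j
            elif i == j:
                draw += prob_i * prob_j
            else:
                loss += prob_i * prob_j
    return win, draw, loss


if __name__ == "__main__":
    win, draw, loss = match_outcome(0.16, 0.11)
    print(f"16% side vs 11% side: win {win:.1%}, draw {draw:.1%}, loss {loss:.1%}")
```

Under these assumptions the 16% side wins only a little over half of its matches against an average side and loses about a quarter of them.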
This analogy makes a huge number of simplifying assumptions, to be sure. Teams don’t always make exactly 12 goal attempts. A team’s goal-scoring success depends on the quality of the opposing team, on external factors like player fatigue or injury, and on random events like a star player losing his temper and getting a yellow card.
Fortunately, these complications don’t obscure the core argument. On average, their effects scale with the difference in team quality, so they amplify outcomes we could already predict confidently rather than overturn them. That is, we would expect a very uneven match to make a bad team worse (it allows more goal attempts and lets more goals through) and a good team better (it gets more goal attempts, and more of them get through). With two evenly matched teams, such as those in the knockout stages of the World Cup, these effects should largely balance out.
So at the World Cup, it’s relatively tricky to predict the actual champion, but relatively easy to predict which teams will be among the best. It’s rare for a bad or mediocre team to make it past the group stage and then survive the initial knockout rounds. But it’s not at all uncommon for “upsets” to happen in the knockout rounds among the handful of excellent teams that make it that far. Brazil were considered solid favorites in 2014, but were roundly defeated by Germany, 7-1, in the semi-final.
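The same coin model gives a rough feel for why favorites stumble in the knockout rounds. In this hypothetical sketch, the best side (16%) meets an opponent converting 13% of its attempts in each of the four knockout rounds (round of 16, quarter-final, semi-final and final); extra time is ignored and a penalty shootout is treated as a fair coin toss. All of these are illustrative assumptions, and `goals_pmf` and `advance_probability` are hypothetical names.

```python
from math import comb


def goals_pmf(rate, attempts):
    """Binomial distribution of goals from `attempts` shots at `rate`."""
    return [comb(attempts, k) * rate**k * (1 - rate) ** (attempts - k)
            for k in range(attempts + 1)]


def advance_probability(p_team, p_opponent, attempts=12, shootout_win=0.5):
    """Chance of winning a knockout tie: win outright, or draw and win a shootout.

    Assumption: extra time is ignored and the shootout is a fair coin toss.
    """
    team = goals_pmf(p_team, attempts)
    opp = goals_pmf(p_opponent, attempts)
    # Outright win: team scores i goals, opponent scores fewer (j < i).
    win = sum(team[i] * opp[j] for i in range(attempts + 1) for j in range(i))
    draw = sum(team[k] * opp[k] for k in range(attempts + 1))
    return win + draw * shootout_win


if __name__ == "__main__":
    per_round = advance_probability(0.16, 0.13)
    title = per_round ** 4  # four independent knockout ties in a row
    print(f"wins a single tie: {per_round:.1%}; wins all four: {title:.1%}")
```

Under these assumptions the stronger side wins a single tie a bit under 60% of the time, and wins all four only about one time in nine.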
Going back to our simplified coin-flip analogy, let’s consider the two best teams in the latest English Premier League season. The second-best team, Manchester United, had a success rate closer to 13%, versus Manchester City’s 16%.
When the two rival Manchester clubs most recently played one another this season, it was United, not City, that came out on top, by a single goal. Once again, we ran a quick simulation assuming 12 goal attempts per game; once again, dark circles represent successful goal attempts, and light circles represent missed or blocked attempts.
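A minimal Monte Carlo version of this kind of simulation, assuming independent attempts, exactly 12 per match for both sides, and the 16% and 13% rates above, tallies outcomes over many simulated derbies instead of drawing circles. The seed, match count and function names (`goals`, `tally_derbies`) are illustrative choices, not the original code.

```python
import random
from collections import Counter


def goals(success_rate, attempts, rng):
    """Count goals from `attempts` independent shots at `success_rate`."""
    return sum(rng.random() < success_rate for _ in range(attempts))


def tally_derbies(p_city, p_united, matches=20_000, attempts=12, seed=7):
    """Simulate many City-United matches and count each outcome."""
    rng = random.Random(seed)  # fixed seed so the tally is reproducible
    results = Counter()
    for _ in range(matches):
        city = goals(p_city, attempts, rng)
        united = goals(p_united, attempts, rng)
        if city > united:
            results["City win"] += 1
        elif city < united:
            results["United win"] += 1
        else:
            results["Draw"] += 1
    return results


if __name__ == "__main__":
    counts = tally_derbies(0.16, 0.13)
    total = sum(counts.values())
    for outcome in ("City win", "Draw", "United win"):
        print(f"{outcome:>10}: {counts[outcome] / total:.1%}")
```

Under these assumptions United comes out ahead in roughly three simulated derbies out of ten, so a single United win is well within what the model expects.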