
The Folly of Football Prediction

This article appeared in the DailyO, on Thursday, 14th June, 2018

With the 2018 World Cup set to kick off in Russia this summer, everyone and their mother is scrambling to bet on which national team will take home the title. While it’s easy to make predictions that are more accurate than an octopus (remember Paul the Octopus?), even the best predictions are still hurt by the very noisy process that is a football match. Consider that for the 2014 World Cup the gurus at FiveThirtyEight (one of the best data analytics companies in the world), who went to a lot more trouble than someone pontificating at a party, could really only narrow their choices down to four teams, and even then, they put their money (so to speak) on Brazil, who didn’t even make it to the final.

Rather than come up with our own predictions, as bigwigs like SAP and IBM have traditionally done, we at Infinite Analytics thought it might be more interesting to contemplate why football (or soccer, in the USA) is so difficult to predict, even in the era of fancy algorithms. Or, why expecting an upset is the best prediction you can make.

At its core, a football competition is an experiment designed to rank given groups of players from best to worst, as measured by the number of goals a given team scores against its opponents. This is relatively easy to do over a large number of games: better teams should win more often, and worse teams should lose more often. But because the game is so low-scoring and the difference in quality between teams is relatively small — especially among the best ones for deciding a World Cup champion — predicting the outcome of a single match between two well-matched opponents is very hard.

As anyone who’s watched a football match knows, most games have fairly low scores. In the latest English Premier League season, for example, teams scored on average only 1.34 goals in a given match.

It’s also very hard to score a goal. Through the latest season, the best team, Manchester City, made 664 attempts at goal, but scored only 106 goals. That’s only a 16% success rate. Moreover, the difference between the best team and an average one isn’t that large. Through the last season, the average success rate from a goal attempt was about 11%. (Source:

This might seem like a big difference, but through an intuitive argument we can see why it’s actually very hard to observe given the structure of a football match.

Instead of a match between two teams, suppose we’re playing a game where you win $10 if you can guess which of two biased coins is more likely to land heads up, in the same way you’re effectively trying to predict which football team will score more goals in the course of a match. How many times would you want to see each coin flipped before picking one? Likely more times than a typical football match allows.

Let us represent a match between two teams with a game where you flip two coins, one per team. A goal is represented by a coin landing heads up: every time the coin that represents a certain team lands “Heads”, that team scores a goal.

Let’s turn to some statistics from the latest season of the very competitive English Premier League. In the latest batch of matches, a given team only made 12 goal attempts per game, on average.

So assume that you are ready with the coin for your favorite team and get to flip it 12 times.

What would happen if the best team in the latest season, Manchester City, went up against a hypothetical team with an average scoring success rate?

We ran a few simulations with the scoring success rate of Manchester City (about 16%) and a hypothetical average team (11% success). Dark circles represent a goal attempted and scored; light circles represent a goal attempted but missed or blocked. Each group of 12 represents a match.

One of these teams has an average scoring record, and the other one has the scoring record of the best-ranked team in this year’s Premier League season. Can you tell which is which? How much less confident would you be if you could only look at one pair?
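The coin-flip experiment above can be sketched in a few lines of Python. This is only a minimal sketch of the analogy, not the code behind the original simulations: it assumes every attempt is an independent coin flip, that both teams get exactly 12 attempts, and the function names and the random seed are our own hypothetical choices.

```python
import random

def simulate_match(p_a, p_b, attempts=12, rng=random):
    """Flip each team's coin `attempts` times; heads counts as a goal."""
    goals_a = sum(rng.random() < p_a for _ in range(attempts))
    goals_b = sum(rng.random() < p_b for _ in range(attempts))
    return goals_a, goals_b

def simulate_many(p_a, p_b, matches=10_000, attempts=12, seed=2018):
    """Return the fraction of matches team A wins, draws and loses."""
    rng = random.Random(seed)  # fixed seed (arbitrary) so runs are repeatable
    wins = draws = losses = 0
    for _ in range(matches):
        goals_a, goals_b = simulate_match(p_a, p_b, attempts, rng)
        if goals_a > goals_b:
            wins += 1
        elif goals_a == goals_b:
            draws += 1
        else:
            losses += 1
    return wins / matches, draws / matches, losses / matches

# Manchester City's rate (about 16%) against the league-average rate (11%)
win, draw, loss = simulate_many(0.16, 0.11)
print(f"better team wins {win:.1%}, draws {draw:.1%}, loses {loss:.1%}")
```

Running it over thousands of simulated matches shows the better coin coming out ahead more often than not, yet a large share of individual matches still end level or go the other way, which is why a single group of 12 flips tells you so little.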

This analogy makes a huge number of simplifying assumptions, to be sure. Teams don’t always make exactly 12 goal attempts. One team’s goal scoring success depends on the quality of the opposing team, external factors like player fatigue or injury, or random events like a star player losing his temper and getting a yellow card.

Fortunately, this doesn’t obscure the core argument. On average, these effects should grow with the difference in team quality, and our confidence in a prediction should grow with them. That is, we would expect a very uneven match to make a bad team worse (they allow more goal attempts and let more goals through), and a good team better (they can make more goal attempts, and more goals are let through). With two evenly-matched teams, such as in the knockout stages of the World Cup, these effects should balance out.

So at the World Cup, it’s relatively tricky to predict the actual champion, but relatively easy to make good predictions about the best teams. It’s rare that a bad or mediocre team makes it past the group stage, and then survives past the initial knockout rounds. It’s not at all uncommon for “upsets” to happen in the knockout rounds between the handful of excellent teams that make it that far. Brazil were considered the solid favorites in 2014, but were roundly defeated by Germany in the semi-final.

Going back to our simplified coin flip analogy, let’s consider the two best teams in the latest English Premier League season. The second-best team, Manchester United, had a success rate closer to 13%, versus 16% for Manchester City from earlier.

When the two rival Manchester clubs played one another this season, it was United, not City, that came out on top by a margin of one goal. Once again, we ran a quick simulation, assuming 12 goal attempts per game. And once again, dark circles represent successful goal attempts, and light circles represent missed or blocked attempts.

This time it’s a little trickier, and closer to a World Cup knockout match. One team has Manchester City’s latest season scoring record, and the other has Manchester United’s. Which is which?
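For a closer matchup like this one, the odds can also be worked out exactly rather than simulated. The sketch below treats each team's goal count as a binomial variable (12 independent attempts at the 16% and 13% success rates quoted above) and sums the probabilities of every possible scoreline. The function names are hypothetical, and the independence assumption is the same simplification the coin analogy already makes.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k goals from n independent attempts."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def match_outcome_probabilities(p_a, p_b, attempts=12):
    """Exact (win, draw, loss) probabilities for team A under the coin model."""
    pmf_a = [binomial_pmf(k, attempts, p_a) for k in range(attempts + 1)]
    pmf_b = [binomial_pmf(k, attempts, p_b) for k in range(attempts + 1)]
    # A wins whenever its goal count i is strictly above B's goal count j
    win = sum(pmf_a[i] * pmf_b[j]
              for i in range(attempts + 1) for j in range(i))
    draw = sum(pmf_a[k] * pmf_b[k] for k in range(attempts + 1))
    loss = 1.0 - win - draw
    return win, draw, loss

# Manchester City (16%) against Manchester United (13%)
win, draw, loss = match_outcome_probabilities(0.16, 0.13)
print(f"City win {win:.1%}, draw {draw:.1%}, United win {loss:.1%}")
```

Under these assumptions the stronger side is still the favorite, but a draw or a win for the weaker side remains a very live outcome in any single match.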

It’s certainly possible to do a better job predicting match outcomes than simply looking at a team’s win-loss record, or the fraction of goal attempts they make that are successful. There’s even research applying graph analysis techniques to the problem that produces good results. And indeed, these techniques probably would do a great job predicting situations where teams get to play lots of games.

When it comes to it, though, the World Cup is decided by a series of winner-takes-all matches in the knockout rounds. This noisy goal-scoring process we explored earlier still dictates outcomes. By considering player fatigue or the strength of a given team’s opposition, we may be able to state more confidently the Brazilian team is better than the English team on the merits, or in some broader sense. But that doesn’t change the fact both teams are extraordinarily good, and the mechanism for final arbitration — their potential matchup in the knockout stages — produces a very noisy, hard to predict signal.
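To see how quickly winner-takes-all rounds erode even a strong favorite's chances, consider a rough back-of-the-envelope sketch. It assumes a team must win four knockout matches in a row (round of 16, quarter-final, semi-final and final), that it has the same chance of getting through every match, and that the matches are independent. The per-match probabilities below are purely hypothetical inputs, not estimates for any real team.

```python
def title_probability(p_match, rounds=4):
    """Chance of winning `rounds` independent knockout matches in a row."""
    return p_match ** rounds

# Hypothetical per-match chances of advancing, from coin flip to heavy favorite
for p_match in (0.5, 0.6, 0.7, 0.8):
    print(f"{p_match:.0%} per match -> "
          f"{title_probability(p_match):.1%} to win all four")
```

Even a team that would advance from four out of five knockout matches ends up more likely than not to be eliminated somewhere along the way.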

It’s a relatively safe bet that the French will edge out the Peruvians in the group stage. But counting out the perennially disappointed English team is probably riskier than the pessimistic commentariat might suggest.

For the best odds, we predict an upset.

(For anyone playing along at home, in the first set of simulations, the red team represents Manchester City, and the blue team is a hypothetical team with the Premier League’s average scoring rate. In the second set, the purple team represents Manchester City, and the green team is Manchester United.)

