The competition is about predicting the probabilities of more than 150,000 match outcomes using each team’s recent sequence of 10 matches. Your goal is to create the best prediction model. Predicting the result of a game remains an open challenge. We also added the performance of the bookmaker as a benchmark by turning their odds into probabilities. This way, the competitors are competing directly with the bookies.
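To give an idea of what turning odds into probabilities can look like, here is a minimal sketch. It is not necessarily the conversion used for the competition benchmark: it assumes decimal odds and removes the bookmaker margin by simple proportional normalisation, and the function name and the example odds are made up for illustration.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds for one match into probabilities that sum to 1.

    Assumption: odds are in decimal format (e.g. 2.10), and the bookmaker
    margin is removed by proportional normalisation. Other methods exist.
    """
    if any(o <= 1.0 for o in decimal_odds):
        raise ValueError("decimal odds must be greater than 1")
    # The raw inverse odds usually sum to more than 1 because of the margin.
    raw = [1.0 / o for o in decimal_odds]
    total = sum(raw)
    # Rescale so the outcomes form a proper probability distribution.
    return [r / total for r in raw]


# Hypothetical odds for home win, draw, away win.
home, draw, away = implied_probabilities([2.10, 3.40, 3.60])
print(f"home={home:.3f} draw={draw:.3f} away={away:.3f}")
```

The lowest odds always map to the highest probability, so the favourite according to the bookmaker stays the favourite after the conversion.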
Data science has been around in football for over a decade now. Today’s algorithms focus on event detection, player style, team analysis, and more to predict the results of a match. Even so, predicting matches is a challenge. To be fair, that is what makes football predictions, and football itself (or sports in general), fun!
Predicting the outcome of a match depends mostly, though definitely not only, on the current form of each team. Besides the form of the teams, you will have to consider home advantage, head-to-head records, injuries and suspensions, tactics, playing style, the importance of the match for each side, odds, and many other things. A prediction model can help make predicting a match easier.
Here is a little background information on the competition we launched on Kaggle.
How did the competition begin?
Octosports and Sportmonks wanted to give their customers the opportunity to play with the data and build a model in a competitive environment. This way, members of the community could battle each other. Over 382 teams participated and competed against each other and the bookies.
Which skills were needed?
The teams needed to be able to code in Python. Besides that, dealing with heterogeneous and missing data was a big part of the work. Another roadblock was the imbalanced and noisy classes. Finally, the teams needed to use advanced machine learning techniques like sequence learning, neural networks, and more. The data set contained more than 150,000 historical world football matches from 2019 to 2021, with more than 860 leagues and 9,500 teams.
What did the winner use?
The winners used LSTM models built with TensorFlow to try to beat the bookies and win the competition.
How were points scored?
The points were based on the quality of the predicted probabilities. The evaluation metric for this competition was multinomial log loss, where a lower score is better. This way, every participant was measured fairly and in the same way. The leaderboards can still be found on Kaggle.
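For readers who have not worked with multinomial log loss before, here is a minimal sketch of how it can be computed in plain Python. The class order (0 = home win, 1 = draw, 2 = away win) and the clipping value are assumptions for this example, and the exact implementation used on Kaggle may differ in details.

```python
import math


def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Average negative log of the probability given to the true outcome.

    y_true: list of class indices (assumed here: 0=home, 1=draw, 2=away).
    y_prob: list of probability lists, one per match.
    eps:    clipping value so that log(0) never occurs (an assumption).
    """
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        # Keep the probability of the true class inside (0, 1).
        p = min(max(probs[label], eps), 1.0 - eps)
        total += -math.log(p)
    return total / len(y_true)


# Two hypothetical matches: a home win and a draw.
outcomes = [0, 1]
predictions = [[0.6, 0.25, 0.15], [0.3, 0.4, 0.3]]
print(multiclass_log_loss(outcomes, predictions))
```

A model that always predicts one third for each of the three outcomes scores ln(3), roughly 1.0986, which is a handy reference point: any useful model should score below that.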
Kaggle award
With over 382 teams participating (2,529 entries) and many different ideas and models, the competition was a big success. Kaggle loved the competition and awarded us a prize: we won the ‘March 2022 best competition award’.
Up for the challenge?
Lastly, the competition is still accessible, and anyone can try to build a model with the data at hand. We dare you.