Walmart forecasting competition - post game analysis (with Rafael de Rezende)

00:08 Introduction
00:24 Rafael, perhaps you could start by telling us a little more about yourself and your background?
01:14 Today we are talking about a recent Kaggle Forecasting competition which focused on estimating the uncertainty distribution of Walmart sales. What was the key challenge here?
04:29 The nice idea behind Kaggle is that it encourages collaborative work and competition with other data scientists and machine learning engineers - almost like a sports event. Who were you up against? What were the key challenges that you faced?
07:01 Joannes, what are your thoughts on Kaggle from a company perspective?
10:30 Who was part of Lokad’s team?
11:43 What type of approach did you take? How did it vary from the methods of your competitors?
14:24 Joannes, what are your thoughts on the approach that Rafael’s team took?
20:23 Rafael, what are your thoughts now that the competition is over?
21:25 Is there something you noticed that the team did and that you think will be particularly useful for the future?
22:53 What’s next? More competitions on the horizon?


A forecasting competition based on Walmart store data ended in June 2020. Lokad ranked 6th out of 909 teams. In this episode, we take a closer look at this unusual sporting event and at what it takes to win such a competition. We are joined by Rafael de Rezende, who led the Lokad team in this competition.

Rafael is an industrial engineer with a background in supply chain. At Lokad, he is our Head of Product Development and takes on the “geeky” topics, such as time series forecasting, solutions for multi-echelon businesses and Lokad’s MOQ solver.

Kaggle is a fairly large organization - recently acquired by Google - that organises machine learning competitions involving forecasting or predictions. Setting up a competition requires two things: a large dataset and large prizes.

It’s a very competitive environment with an extremely specific discipline, and it could almost be described as a high-level sport, filled with hundreds of people who are pros at winning these kinds of competitions. The winning teams are rarely made up of researchers; more often they are people who are very good at figuring out what is state of the art among the latest algorithms. For this competition, Lokad’s team were definitely Kaggle newbies, having never been in an environment of such magnitude and going up against over 900 other teams.

However, despite the competitiveness, Kaggle also has a very collaborative environment, with participants working hard to help one another improve. But this collaboration can be a double-edged sword: an interesting observation is that it can hinder creativity. For example, when one team publishes its approach, other teams abandon what they are doing to follow it, creating a “flock effect”.

To wrap things up, we discuss how this was the first competition of this nature to use quantile forecasts to evaluate the results. We also go into more detail on how Lokad’s team managed to take 6th place out of 909 teams in such a brutal competition, what they did to stand out from the pack, and the real-world applicability of the forecasting solutions found here.