00:00:05 Seasonal sales forecasting.
00:00:35 Core seasonality patterns explanation.
00:02:23 Struggles with seasonality forecasting, product lifespan.
00:04:33 Further issues with seasonality explored.
00:06:53 Overcoming issues with missing historical data.
00:07:16 Solution: consider product collections for seasonality.
00:08:49 Dynamic time warping.
00:09:06 Transition to deep learning for forecasting.
00:10:02 Dynamic time warping in demand forecasting.
00:12:09 Handling historical data spikes and promotions.
00:13:01 Need for accurate data and machine learning.
00:16:01 Deep learning’s popularity in machine learning.
00:16:30 Limitations of past data for future predictions.
00:17:40 Seasonality in human affairs for modeling.
00:19:42 Improving forecasting with machine learning.
00:21:39 Benefits of machine learning over time.

Summary

In an interview, Joannes Vermorel discusses the challenges of incorporating seasonality into sales forecasting. Seasonality, defined as cyclical sales patterns influenced by the time of year, week, and day, contributes to demand fluctuations. Secondary cyclicities, such as the “paycheck effect” and “quasi-seasonality”, also hold significance. Understanding the baseline of demand, trends, and statistical noise in the data aids accurate forecasting. Challenges include the short lifespan of many products and the year-to-year variability of events like Christmas. Vermorel suggests adopting a collective perspective on similar products to predict new product performance, and using machine learning setups to maintain seasonal buckets automatically.

Extended Summary

Kieran Chandler interviews Joannes Vermorel about the concept of seasonality in sales forecasting and its associated challenges.

Seasonality is one of the central patterns used to enhance the accuracy of sales forecasts, but it’s frequently misunderstood and improperly applied. Vermorel identifies seasonality as a major cyclicity in sales patterns. He breaks down these cyclicities into three key types: the time of the year, the day of the week, and the time of the day, which all contribute to variations in consumer demand.

Vermorel also discusses secondary cyclicities such as the “paycheck effect”, the monthly cycle of people receiving their paychecks and its resulting impact on their purchasing behavior. Additionally, he introduces minor cyclicities, or “quasi-seasonality”: events like Easter, Ramadan, and Chinese New Year, which recur every year but don’t always fall on the same date according to the Gregorian calendar.

Understanding the baseline of demand, trends, and the level of statistical noise in the data is another crucial aspect that Vermorel highlights. Recognizing these patterns aids in differentiating between random noise and authentic statistical patterns, which is essential for effective forecasting.

Despite the seeming simplicity of seasonality, Vermorel talks about why people often grapple with it. The first challenge is the short lifespan of most products on the market. For instance, the average lifetime of a Fast-Moving Consumer Goods (FMCG) product is between three to five years. This implies that a large part of a product catalog might not even exist for a full year, complicating the application of seasonality.

Vermorel introduces the concept of “time distortion” in seasonality, referring to shifts in seasonal patterns due to various factors. Changes in weather patterns can encourage consumers to start their winter shopping earlier or later than usual. The political climate and consumer sentiment can also influence the timing of purchases, such as Christmas shopping. These shifts can warp the typical seasonal patterns, making prediction challenging.

Next, he talks about the interference of additional patterns with seasonality. Vermorel emphasizes the importance of distinguishing between these external influences and genuine seasonal trends. He delves into how to address the challenges caused by the lack of historical data for a significant part of a catalog. He proposes a collective perspective, identifying common seasonal patterns across a range of similar products. This shared pattern can be applied to new products, enabling seasonality forecasting from day one, even for products without sales history.

Responding to a question about forecasting in uncertain weather conditions, Vermorel presents the concept of dynamic time warping. However, he admits that implementing dynamic time warping is highly complex, which led Lokad, like the voice recognition community before it, to switch to deep learning for its forecasting engine.

To differentiate between seasonal demand and promotional spikes, Vermorel proposes a parametric decomposition approach. However, he recognizes this method’s limitations as historical data is usually intertwined, making it difficult to learn patterns independently.

To address this, Vermorel proposes two strategies. First, record all events that distort the observation of demand, including promotions, stockouts, and other factors affecting sales. Second, use a machine learning setup to learn all these patterns jointly, rather than trying to learn them separately. He suggests that this is why deep learning has become popular in the machine learning community.

Vermorel acknowledges the challenge of having full confidence in the results due to data distortions. However, he maintains that most markets have a lot of inertia, and the recent past is a reasonable approximation of the near future.

He also underscores the importance of seasonality in demand forecasting, acknowledging its stability while emphasizing that it is not fixed. Seasonality can evolve from one year to the next, and a good statistical model should be able to predict this average rate of evolution.

As the conversation concludes, Vermorel recommends that companies looking to better account for seasonality in their forecasts transition to machine learning setups, moving away from old-school methods that require manually crafting seasonality profiles. Machine learning enables automatic maintenance of seasonal buckets, improving accuracy over time.

Full Transcript

Kieran Chandler: Joannes, we often talk about seasonality as being one of the core patterns that are applied to improve the accuracy of forecasts. But what do we actually mean by this? What are the core patterns?

Joannes Vermorel: Seasonality is one of the main cyclical patterns. There are three major cycles, which are the time of the year, the day of the week, and the time of the day. These are the three main ones. Then you have a couple of secondary cycles such as the paycheck effect, which is basically a monthly cyclical pattern where you have effects with people getting paid or things happening at the beginning or end of the month. Then, you have what I would refer to as a quasi-seasonality, things such as Easter, Ramadan, Chinese New Year. These events happen every year but not exactly at the same time of the year, at least according to the Gregorian calendar. We have all these cyclical patterns, and seasonality is a very important one of them. Besides cycles, we have the baseline for demand, the trend of how much things are growing or shrinking over time, and finally, the amount of statistical noise, which is also a very big pattern. You need to really understand the variability and determine if what you’re seeing is noise or an actual statistical pattern.
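
As an illustration of estimating one of these cycles, not Lokad’s actual engine, a day-of-week profile can be sketched by averaging sales per weekday (the function name is hypothetical):

```python
import numpy as np

def day_of_week_profile(daily_sales, start_weekday=0):
    """Estimate day-of-week cyclical factors from a daily sales series.
    `daily_sales`: 1-D array of sales; `start_weekday`: weekday of the
    first observation (0 = Monday). Returns 7 factors averaging to 1."""
    sales = np.asarray(daily_sales, dtype=float)
    weekdays = (np.arange(len(sales)) + start_weekday) % 7
    # Average sales for each weekday, then normalize so a factor of
    # 1.5 reads as "50% above a typical day".
    factors = np.array([sales[weekdays == d].mean() for d in range(7)])
    return factors / factors.mean()
```

The same grouping logic applies to any of the cycles Vermorel lists: replace the weekday index with month-of-year, hour-of-day, or day-of-month for the paycheck effect.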

Kieran Chandler: Let’s talk about seasonality itself then. Why is it something that people struggle with? It seems fairly obvious to know that come Christmas, sales are going to spike and around holiday seasons, you’re going to see different sales for different things. So, why is it something people struggle with?

Joannes Vermorel: Seasonality is difficult to get right. Let’s review the challenges. The first challenge is that most products are short-lived on the market. The average lifetime of an FMCG product is something between three and five years. If you’re selling products that on average last three years on the market, it means that half of your products have less than one and a half years of existence, and one third of your products have less than a year. So if you want to have a naive seasonal model, a statistical model where you say the sales that I’m going to do for next Christmas are going to be similar to what I did for the previous Christmas, it turns out that for one third of your products, there wasn’t a last Christmas. You don’t have any reference. So, the first challenge is that because you have product novelty in the market, you end up with a significant portion of your catalog that doesn’t have a year of existence. Take car parts, for example, which tend to have a longer lifespan of about six years: even then, one sixth of your products don’t have a year of sales. So, you will not be able to apply this nice seasonality to a sizable portion of your catalog. That’s one of the problems we face with seasonality.

Kieran Chandler: What are some of the other problems that we face with seasonality?

Joannes Vermorel: Another problem is that seasonality is not necessarily exactly the same from one year to another. Yes, Christmas is on the 25th of December every year, but if you go from one year to the next, the Christmas season, from a merchant’s perspective, doesn’t unfold in exactly the same way.

Kieran Chandler: One year, it started to get very cold as early as October, leading people to begin their winter shopping earlier in the season. Conversely, sometimes temperatures remained mild for longer, so people started buying later. Sometimes, even the political climate affects shopping habits. For example, if people are very worried about the future, they might decide to postpone Christmas purchases until the last minute. These are just some of the factors that can cause what we typically refer to as time distortion. You still have the Christmas peak, for instance, but the start of the Christmas season might vary by a few weeks from one year to the next. This kind of distortion can occur with all seasonal patterns. Would you agree?

Joannes Vermorel: Absolutely, and that’s probably the second class of problems. The third class is that your seasonality is unfortunately affected by other patterns. What if, for example, last year in September you had a massive promotion that you’re not repeating this year? You shouldn’t confuse the impact of that promotion with seasonality. You need to disentangle seasonality from other patterns. And all these changes combined make seasonality much more elusive than if you were just dealing with a naive situation where you have pure seasonality for a product that is long-lived, stable, and everything works smoothly.

Kieran Chandler: Let’s talk about overcoming some of these problems then. If you don’t have historical data for, say, a third of your catalog, can you actually use seasonality in forecasting for those products?

Joannes Vermorel: Yes, but you need to start looking at your products as a collection instead of examining products one by one. What I mean is, let’s say you have winter boots. There will likely be a shared seasonal pattern for all those products. You can reasonably assume that shared seasonality exists. So if you can identify this shared seasonality, it doesn’t matter if you’re selling different winter boots this year compared to last year. You could say, “I know that for winter boots in aggregate, I have this pattern.” Thus, when I introduce a new product to the market, for which I have zero sales history, I can apply a seasonal pattern from day one. I can recycle something that was based on an aggregate for all the winter boots. The key insight is to look at the breadth of products instead of the depth of history.
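
A minimal sketch of this collective approach, assuming monthly buckets and multiplicative seasonality (the function names are illustrative, not Lokad’s API):

```python
import numpy as np

def collection_profile(histories, period=12):
    """Estimate a shared seasonal profile from a collection of products.
    `histories`: list of 1-D arrays of monthly sales (lengths may differ,
    e.g. some products have only a few months of history).
    Each product is normalized by its own mean so that volume differences
    do not dominate, then the profiles are averaged per month."""
    sums = np.zeros(period)
    counts = np.zeros(period)
    for h in histories:
        h = np.asarray(h, dtype=float)
        scale = h.mean()
        if scale == 0:
            continue  # skip products with no sales at all
        for i, v in enumerate(h):
            sums[i % period] += v / scale
            counts[i % period] += 1
    profile = sums / np.maximum(counts, 1)
    return profile / profile.mean()   # factors average to 1

def forecast_new_product(expected_monthly_mean, profile):
    """Day-one forecast for a product with zero sales history: scale the
    shared profile by a level estimate (e.g. from comparable launches)."""
    return expected_monthly_mean * profile
```

This is the “breadth instead of depth” insight in code: the profile is learned across the winter-boots collection, then recycled for a brand-new boot.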

Kieran Chandler: If you’re forecasting months in advance, how would you know whether summer is going to be elongated by an extra month? How can you forecast for that?

Joannes Vermorel: This is where we go into the next stage, which is how we deal with all the distortions. It can be done analytically, using a technique known as dynamic time warping. If people are interested, they can look it up on Wikipedia. Implementing dynamic time warping is exceedingly complicated software-wise. About 10 years ago, the voice recognition community, who were doing machine learning for voice recognition, had to deal with dynamic time warping. They found it too complicated, gave up, and moved to deep learning. Interestingly, Lokad did the exact same thing for seasonality. We implemented dynamic time warping in our forecasting engine, but eventually replaced the whole thing with deep learning in the latest generation of our forecasting engines.
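
For reference, the textbook dynamic time warping distance fits in a few lines; the complexity Vermorel mentions comes from using it inside a production forecasting engine, not from the core recursion:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance between
    two sequences, allowing one to be locally stretched or compressed
    relative to the other (e.g. a season starting a few weeks early)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Best of: repeat a point of `a`, repeat a point of `b`,
            # or advance both (the diagonal).
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

A peak that arrives one step early is far apart pointwise, but at zero DTW distance, because the warping path absorbs the shift.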

Kieran Chandler: It sounds very sci-fi, time warping. Could you give us a quick overview?

Joannes Vermorel: The overview is that you know that your season is probably going to end soon, but you don’t know exactly when. However, you can account for the variability.

Kieran Chandler: We have reached the end of the season, so I know that the demand will be ongoing, but much lower. However, what does it look like - a seasonality analysis without this kind of dynamic time warping? What happens if the season ends early?

Joannes Vermorel: When the season ends early, you have your new baseline or level, and according to your seasonality profile, the demand for the first week of September is supposed to be about half of that of the last week of August. That’s your static, rigid profile. But the problem is, if the summer ended early and by the last week of August you are already out of the summer season, you do not want to divide your demand again by a factor of 2 in the first week of September. This is because from the last week of August to the first week of September, demand was dropping, and you are in a situation where demand has already dropped.

Dynamic time warping is a technique that helps to avoid compounding errors where the season starts late or early and you apply your seasonality profile twice. You end up having a first drop of demand and then you reapply your seasonal pattern that seems to indicate that there would be a further drop, or the opposite - demand has already spiked to a new plateau because the season started early, and then you reapply this factor on top. Dynamic time-warping does not predict any better transition between seasons, but it lets you avoid these compounding errors.
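
The compounding error can be shown with illustrative numbers (this sketch takes “the drop was already observed” as a given flag; detecting it is exactly the hard part that dynamic time warping or deep learning addresses):

```python
def september_forecast(latest_level, calendar_ratio, drop_already_observed):
    """Illustrative only: a warp-aware rule skips the calendar ratio when
    the seasonal drop has already shown up in the data, while a rigid
    profile applies it unconditionally and double-counts the drop."""
    if drop_already_observed:
        return latest_level          # the drop already happened: keep level
    return latest_level * calendar_ratio

# Profile: first week of September = half of last week of August.
ratio = 0.5
# Season on schedule: August is still at its peak of 100.
on_time = september_forecast(100.0, ratio, drop_already_observed=False)  # 50.0
# Season ended early: August already dropped to 50.
rigid = 50.0 * ratio                                                     # 25.0
warped = september_forecast(50.0, ratio, drop_already_observed=True)     # 50.0
```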

Kieran Chandler: So, the idea is that you’re going to have the same sort of demand profile, but it’s either compressed or elongated depending on the season?

Joannes Vermorel: Exactly.

Kieran Chandler: And then, the last thing we spoke about was historical spikes in data, and not transferring those spikes for seasonality for this year. How does that actually work? How do you not take those into account? How do you know what is seasonal demand and what is just a spike due to a promotion or something like that?

Joannes Vermorel: The classic approach for time series forecasting is a parametric decomposition. You have your history of demand and you would say out of this demand, this amount is basically the baseline, this amount is the seasonality factor, this amount can be explained with the trend, etc. This approach is weak in the sense that you want to be able to learn all your patterns independently, but the reality is that in your historical data, everything is completely mixed together.
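
A sketch of such a parametric decomposition on a monthly series, assuming a multiplicative model, which is one common choice (the function name is illustrative):

```python
import numpy as np

def decompose(sales, period=12):
    """Naive multiplicative parametric decomposition of a demand series
    into trend (baseline growth), seasonal factors, and residual noise.
    `sales`: 1-D array of per-period demand; `period`: cycle length."""
    sales = np.asarray(sales, dtype=float)
    t = np.arange(len(sales))
    # Trend/baseline: an ordinary least-squares line through the series.
    slope, intercept = np.polyfit(t, sales, 1)
    trend = slope * t + intercept
    detrended = sales / trend
    # Seasonal factor: average detrended value per position in the cycle.
    seasonal = np.array([detrended[i::period].mean() for i in range(period)])
    seasonal /= seasonal.mean()           # normalize factors to average 1
    fitted = trend * seasonal[t % period]
    noise = sales - fitted                # what the model cannot explain
    return trend, seasonal, noise
```

The weakness Vermorel points out is visible in the structure: each component is estimated sequentially, so a promotion spike left in `sales` leaks into both the trend and the seasonal factors.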

There are at least two angles to this problem. First, you need to properly record in your historical data your promotions, your stockouts, and all other events that were impacting not the demand, but your observation of the demand, which are the sales. The sales do not equate to demand. For example, when you want to forecast, you typically want to say, “I want to forecast the demand for the regular price, I don’t want to forecast the demand for a super low promotional price”. But what you have in your history is your sales, and the sales are distorted because of the promotions and possibly stockouts and other things.

So the first thing is that you need to record all the factors that were distorting your perception of the demand, which is more tricky than it sounds. Very few companies have a very precise recording of all the events that influenced their sales. That can be things such as if you’re an e-commerce company, remembering if a product was part of the homepage or was prominently featured in a section, if a product was part of a newsletter, if there were price movements for the products, and even if you have competitive intelligence, remember the price of your competitors’ products.
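
As a sketch of what recording these distorting factors can look like in practice, one enriched row per product per day; all field names here are hypothetical, not Lokad’s actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DailyObservation:
    """One row of enriched sales history: not just the sales figure, but
    every recorded factor that distorted the observation of demand."""
    date: str                # ISO date of the observation
    product_id: str
    units_sold: int          # observed sales, NOT demand
    price: float             # price actually charged that day
    regular_price: float     # reference price the forecast targets
    on_promotion: bool
    stocked_out: bool        # sales capped by availability
    on_homepage: bool        # prominently featured (e-commerce)
    in_newsletter: bool
    competitor_price: Optional[float] = None  # if competitive intel exists
```

A row like this lets a model later explain a spike by `on_promotion` or a dip by `stocked_out`, instead of misreading either as seasonality.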

Kieran Chandler: You mentioned competitors might have caused a drop in demand, not because you had a stock out, but because they were doing a massive promotion and you did not align your price with theirs. That would explain a drop in demand.

Joannes Vermorel: Yes, that’s one part of the recipe. The second part is the need for a machine learning setup where you can jointly learn all these patterns. Modern machine learning doesn’t attempt to learn statistical patterns in isolation. You don’t learn seasonality first, then trend, then promotional effect. Instead, you have a model that tries to capture all these patterns at once. This means the model needs the capacity to learn a wide variety of patterns. It requires a very expressive model. That’s why many people in the machine learning community have embraced deep learning. It’s an approach that can generate a model capable of capturing a wide variety of patterns.
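
A toy illustration of joint learning: a single least-squares fit on log sales that estimates month effects and a promotion uplift together rather than sequentially. This linear model is a stand-in for the far more expressive deep models Vermorel refers to:

```python
import numpy as np

def fit_joint(sales, month_idx, promo_flag, period=12):
    """Jointly learn monthly factors and a promotion uplift with one
    least-squares fit on log sales. Note the month coefficients absorb
    the baseline level, so only their ratios are meaningful.
    Returns (monthly_factors, promo_multiplier)."""
    y = np.log(np.asarray(sales, dtype=float))
    n = len(y)
    X = np.zeros((n, period + 1))
    X[np.arange(n), month_idx] = 1.0   # one dummy column per month
    X[:, period] = promo_flag          # shared promotion effect
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.exp(coef[:period]), np.exp(coef[period])
```

Because both effects are fitted at once, a promotion that always ran in September no longer inflates the September seasonal factor, as long as some Septembers in the history had no promotion.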

Kieran Chandler: You keep mentioning distortions. It seems like there are so many possible ways for data to be distorted, making seasonality very difficult to implement. Can we really have full confidence in the results given the potential for distortions in the data?

Joannes Vermorel: Absolutely, and the question is, how much does the past let you predict the future? That’s the core assumption behind statistical demand forecasting. Unfortunately, the future can’t always be predicted by the past. However, this isn’t completely true when we talk about forecasts that are a few months ahead. Most markets have a lot of inertia. The recent past is still a reasonable approximation of the near future. That’s what we leverage.

Seasonal patterns are relatively strong. They’re seen in all human affairs. Everything follows this yearly cycle, and it’s probably been that way for thousands of years. Humans are creatures of habit, and these strong habits are reflected in nearly every time series that represents human affairs.

For example, the number of airplane passengers will follow a yearly curve. The amount of milk bought on any given day of the year will have a seasonality curve. The same goes for the purchase of video games, electricity consumption, and so on.

These patterns have been very stable and can be leveraged. But there is always some irreducible uncertainty about the future.

Kieran Chandler: We’ve been discussing the probabilistic approach in supply chain optimization. From what I understand, this approach can handle seasonality as well. Can you elaborate on how it deals with changes in seasonal patterns from one year to another?

Joannes Vermorel: Certainly. Our statistical model is designed to adapt to evolving seasonality. Although we may not be able to predict exact changes, we can forecast the average rate of evolution. This allows us to introduce the right amount of uncertainty into our forecasts. It’s important to remember that these forecasts are probability distributions that incorporate the seasonality pattern. However, this isn’t a perfect representation. There is some degree of fuzziness in terms of amplitude. We consider how much uplift there will be during peak seasons, as well as the timing of these peaks. It’s really a combination of these two factors.

Kieran Chandler: That makes a lot of sense. As we wrap things up, could you tell us what steps companies can take to improve their approach to forecasting, especially with regards to seasonality?

Joannes Vermorel: Well, the first thing that comes to mind is for them to get in touch with us at Lokad! Jokes aside, I think the most significant step would be transitioning to machine learning setups. The traditional approach to managing seasonality involves manually creating seasonality profiles. Essentially, you group products into buckets based on shared seasonality. This method, however, relies heavily on human input and is difficult to maintain over time.

Kieran Chandler: Could you expand on the problems with this traditional approach?

Joannes Vermorel: Of course. The main problem isn’t necessarily that the expert’s assumptions are incorrect. A supply chain expert might indeed accurately assess that a group of products share the same seasonality. The issue is that over time, maintaining this system becomes a nightmare. Every time you launch a new product, you need to ensure it lands in the right bucket. While the initial clustering might be good, it tends to degenerate and become inefficient over time.

Kieran Chandler: So, how does transitioning to machine learning help with this?

Joannes Vermorel: With a machine learning setup, you can maintain your seasonal buckets automatically. This will significantly improve your accuracy because your buckets, even if not perfect, won’t degenerate over time. They are regenerated every time you need them.
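
A sketch of regenerating buckets automatically: k-means-style clustering on normalized seasonal profiles. The deterministic farthest-point initialization and the cluster count are illustrative choices, not a description of Lokad’s method:

```python
import numpy as np

def seasonal_buckets(profiles, k=2, iters=20):
    """Automatically (re)generate seasonal buckets by clustering products
    on the *shape* of their monthly profile, ignoring sales volume.
    Returns (labels, centers); rerunning on fresh data rebuilds the
    buckets, so they cannot degenerate as the catalog evolves."""
    P = np.asarray(profiles, dtype=float)
    P = P / P.mean(axis=1, keepdims=True)        # drop volume, keep shape
    # Deterministic farthest-point initialization.
    centers = [P[0]]
    for _ in range(1, k):
        dists = np.min([((P - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(P[dists.argmax()])
    centers = np.array(centers)
    labels = np.zeros(len(P), dtype=int)
    for _ in range(iters):
        d = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)                # nearest-center assignment
        for j in range(k):
            if (labels == j).any():
                centers[j] = P[labels == j].mean(axis=0)
    return labels, centers
```

A newly launched product is assigned to its bucket by the same nearest-center rule, so no human has to place it.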

Kieran Chandler: That’s very insightful. Unfortunately, we have to wrap things up for today. Thank you for your time, Joannes.

Joannes Vermorel: You’re welcome, it was my pleasure.

Kieran Chandler: That’s all for this week. We’ll be back again next week with another episode. Until then, thank you all for watching.