00:00:04 The challenge of forecasting new product launches.
00:00:35 Problems with traditional forecasting approaches.
00:02:00 Why time series forecasting fails for new products.
00:03:45 Alternative approaches to new product forecasting.
00:07:22 Deep learning and attribute analysis in forecasting.
00:09:06 How product attributes like color and size influence demand.
00:11:23 Deep learning for handling diverse product attributes.
00:11:58 Uncertainties in new product launches.
00:13:38 Benefits of probabilistic forecasts in handling risk.
00:14:44 How new product launches affect existing products.
00:16:01 Product cannibalization and fashion industry tactics.
00:18:02 Price sensitivity forecasting and optimal pricing complications.
00:21:20 Overfitting in statistical modeling and its impact on price prediction.
00:21:58 Advances in product forecasting technology.
00:24:33 Cannibalization in product launches and customer loyalty research.

Summary

In an interview, Joannes Vermorel, Lokad’s founder, discusses the challenges of forecasting demand for new products. Traditional time series forecasting fails for new products due to the lack of historical data, and Vermorel criticizes conventional demand planning software for its heavy reliance on past data. For genuinely novel products, he suggests conducting market surveys. When new products are variations of existing ones, he proposes using product attributes to anticipate demand. He highlights uncertainty and cannibalization as significant challenges in new product forecasting, advocating a probabilistic approach. Vermorel indicates that Lokad’s future direction includes leveraging the “social network” of customers and transitioning from deep learning to differentiable programming.

Extended Summary

In this interview, Kieran Chandler and Joannes Vermorel, the founder of Lokad, explore the potential and the obstacles of forecasting demand for newly launched products. Vermorel maintains that forecasting for new products is feasible, though inherently difficult: it demands significant effort and the proper mathematical tools.

The interview kicks off by recognizing the importance of forecasting product launches, which allows companies to take advantage of the usual spike in demand following the release of a new product. However, Vermorel points out that the standard forecasting method, time series forecasting, falls short when applied to new products.

Time series forecasting extends historical trends into the future, an approach similar to predicting weather patterns. For instance, a basic forecasting model might average last week’s sales to predict next week’s sales. However, with a new product, there’s no previous sales data to base predictions on, making this methodology insufficient.
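To make the mechanism concrete, here is a minimal Python sketch of such a moving-average forecast; the sales figures are invented for illustration:

```python
# Minimal illustration: next week's demand is predicted as the
# average of the most recent weeks of sales.
def moving_average_forecast(weekly_sales, window=4):
    """Forecast next week's sales from the last `window` weeks."""
    if len(weekly_sales) < window:
        raise ValueError("Not enough history to forecast.")
    return sum(weekly_sales[-window:]) / window

history = [120, 95, 130, 110, 105]        # an existing product
print(moving_average_forecast(history))   # -> 110.0

# A brand-new product has no history at all: there is nothing to
# average, so the method simply cannot be applied.
# moving_average_forecast([])              # -> ValueError
```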

Vermorel continues to critique conventional demand planning software that mostly depends on “glorified forms of moving averages,” with models like exponential smoothing, linear regression, and ARIMA acting as sophisticated moving averages. While these models might consider seasonal coefficients and varying averaging windows, they are still heavily reliant on past data, rendering them unfit for new products.

Addressing the issue of forecasting for completely new products, like the first iPhone, Vermorel suggests that statistical forecasts need a relevant set of past observations. Without similar previous products, generating a statistically founded forecast is almost impossible.

One strategy for such unique scenarios might involve surveying the market and measuring consumer opinions. Despite the high cost and time involved, Vermorel believes that for major launches like the iPhone, which likely cost hundreds of millions in research and development, the expense of detailed market research could be justified.

Vermorel then outlines the challenges of launching multiple new products at once. He argues that most new products aren’t completely novel but are variations of existing products, which simplifies estimating their future sales. He uses the fashion industry as an example, noting that although it introduces new collections each season, those collections usually consist of familiar items, such as shirts or shoes, with different features.

For forecasting new product sales, Vermorel proposes comparing the new product to existing ones using shared attributes, a method he argues is significantly different from conventional forecasting software. These attributes, like size and color, can offer valuable insights. For example, extreme sizes or specific colors might not sell as well as more common ones.

The typical method of forecasting new product sales asks a supply chain manager to establish a link between a new product and an old one. Vermorel criticizes this process as tedious, especially when launching thousands of new products, since it requires manually inspecting a vast archive of past launches. Any incorrect mapping, based purely on intuition, can make the forecast entirely off-target.

Vermorel proposes a more intelligent approach, using product attributes to predict demand. Depending on the industry, these attributes could vary from size, color, price point, and patterns for fashion items to compatible cartridges and other features for consumer electronics. This variety calls for a statistical algorithm capable of managing a wide range of features, where modern machine learning technologies like deep learning might be beneficial.
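As a rough illustration of this attribute-driven approach (a sketch with invented data and off-the-shelf tools, not Lokad’s actual engine), past launches can be encoded by their attributes and a regressor can learn how those attributes map to launch demand:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.ensemble import GradientBoostingRegressor

# Past launches described by their attributes, with observed demand.
past_launches = [
    {"category": "shirt", "color": "white",  "size": "M",  "price": 35.0},
    {"category": "shirt", "color": "yellow", "size": "XS", "price": 35.0},
    {"category": "shoes", "color": "black",  "size": "42", "price": 80.0},
    # ...thousands more launches in a realistic setting
]
units_sold_first_month = [420, 15, 310]

vec = DictVectorizer()                 # one-hot encodes the attributes
X = vec.fit_transform(past_launches)
model = GradientBoostingRegressor().fit(X, units_sold_first_month)

# Forecast a product that has never been sold, purely from attributes.
new_product = {"category": "shirt", "color": "blue", "size": "M", "price": 39.0}
forecast = model.predict(vec.transform([new_product]))[0]
print(f"Expected first-month demand: {forecast:.0f} units")
```

The point is structural: the model never needs sales history for the new product itself, only a mapping from attributes to demand learned across past launches.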

The conversation then transitions to the inherent uncertainty associated with new product forecasts. Vermorel acknowledges that the level of uncertainty during a product launch is usually quite high, and suggests a probabilistic forecast might be more helpful in such a case. While such a forecast might not be very precise, the advantage is in recognizing and accounting for the associated risks as the forecast provides a range of potential futures.

Joannes draws attention to the problem of cannibalization, particularly when launching new products, since they often take market share from existing ones. This issue is prevalent in the fashion industry, where new collections can cannibalize sales from older ones. To counter this, the industry usually pushes out old collections through sales before introducing new ones.

The discussion carries on with a hypothetical situation. If Apple were to launch an iPhone in different colors simultaneously, offering a variety of choices might boost sales, but it could also lead to internal competition between the products.

In relation to price sensitivity, Kieran asks Joannes if it’s possible to predict demand based on price changes. Joannes confirms this, but explains that it adds further complexity. The discussion leads into the realm of reinforcement learning and the need for a careful balance to avoid overfitting, where a model performs well on known data but poorly on unknown data.

When changing price points, for instance, a brand might be entering unknown territory where past data becomes less relevant. Brands typically adjust their prices slowly, using this process as a learning opportunity. Joannes clarifies that overfitting can become an issue if brands start using their forecasting models to set their prices. To avoid this, it’s crucial not to over-depend on your forecasting model’s output when deciding the price.
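The feedback loop described here can be simulated in a few lines. The sketch below is hypothetical: an overly flexible demand model is fitted on noisy price observations, then queried across many nearby price points, and the “optimal” price it returns ends up chasing the model’s own noise:

```python
import numpy as np

rng = np.random.default_rng(42)

def true_demand(price):
    # The real (unknown) demand curve: demand falls as price rises.
    return 1000 - 8 * price

# History only covers the price points practiced so far (40 to 60).
prices = rng.uniform(40, 60, size=30)
demand = true_demand(prices) + rng.normal(0, 40, size=30)  # noisy sales

# An overly flexible model (degree-9 polynomial) invites overfitting.
model = np.poly1d(np.polyfit(prices, demand, deg=9))

# Naive what-if optimization: scan many prices, including ones far
# outside the observed range, and pick the revenue-maximizing one.
candidates = np.linspace(30, 80, 501)
revenue = candidates * model(candidates)
print(f"'Optimal' price: {candidates[np.argmax(revenue)]:.2f}")

# Re-running with another noise seed shifts this 'optimum' wildly:
# the argmax amplifies the model's fluctuations, i.e., overfitting.
```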

Changing direction, Kieran inquires about the near future of product forecasting. Joannes discloses that Lokad has recently introduced a new forecasting engine based on deep learning, which, according to their benchmark, represents a significant upgrade and has reduced errors in new product forecasting by over 20%.

Joannes also emphasizes the significance of understanding the concept of product cannibalization when launching new products. While retailers might be tempted to think that launching many new products will significantly increase sales, Joannes reminds us that these new products will be competing for the same customer base.

The conversation ends with Joannes sharing their ongoing research to leverage the “social network” of customers, which refers to their client base’s consumption patterns. This effort is driven by the understanding that new products tend to first gain traction within the existing customer base. They are transitioning from deep learning to differentiable programming, considered the successor of deep learning, to handle this complex task effectively.

Full Transcript

Kieran Chandler: Today we’re going to be discussing whether forecasting new products is actually possible and also understanding how much confidence we can have in these results. Joannes, unfortunately, we don’t have a crystal ball here at Lokad. So, can we actually forecast for new products?

Joannes Vermorel: The short answer is yes, the longer answer is it’s difficult, it requires effort, and the proper math. The bottom line is that when people think about forecasting, they often envision a specific type of forecast, something like time series forecasting. They think about it like temperature forecasting for the weather. Basically, you have a curve, what you’ve observed in the past, and you want to stretch this curve into the future to get your forecast. The simplest forecast you can make is just a moving average. For example, what will my sales be next week? If I average my sales from last week, it gives me a rough ballpark. This approach is naive but kind of works. However, for new products, it completely falls apart.

Kieran Chandler: Why does this time series approach not work then? Why does it fall apart?

Joannes Vermorel: It’s because you don’t have anything to average anymore. You want to go back in the past and forecast the future by averaging what you had. But if you want to forecast the sales of a product that hasn’t been sold yet, you don’t have any data. If you just forecast zero because you sold zero last week, it doesn’t make any sense. You launch a product, hopefully, you will sell a few units. The traditional moving average algorithm just does not work. It’s interesting because most of the early demand planning software out there relied on glorified forms of moving averages. There are many statistical models that come with fancy names but are nothing but glorified moving averages. Exponential smoothing is a moving average of some kind, and linear regression is barely better than a moving average.
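To see why these models count as glorified moving averages, consider simple exponential smoothing written out as code: it is just a weighted average in which recent weeks weigh geometrically more, and like the moving average it has nothing to work with for a product that has never sold (illustrative sketch, invented figures):

```python
def exponential_smoothing(sales, alpha=0.3):
    """Simple exponential smoothing: the forecast is a weighted
    average of past sales, with weights decaying geometrically."""
    level = sales[0]
    for x in sales[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

print(exponential_smoothing([120, 95, 130, 110, 105]))  # -> ~112.3

# For a brand-new product `sales` is empty: exactly like the moving
# average, there is nothing to smooth and no forecast comes out.
```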

Kieran Chandler: So how can we actually work out a forecast for something that’s completely new? If we take, like the example of the iPhone before it was released, nothing like it existed before. Can we actually forecast for that?

Joannes Vermorel: If you want to make a forecast, or at least a statistical forecast, you need a relevant set of observations from the past. You’re still trying to project the future by looking at the rearview mirror, but you need to have something to look at. If you have a product that is completely unique, then from a statistical perspective, it’s game over. You can’t work statistically. What you can do at best is survey the market and take opinions on whether people would buy it or not. Obviously, this is a very expensive process. Apple could do it for the iPhone because they had probably invested hundreds of millions of dollars in research and development to bring the iPhone to market. So, they could still afford to spend a few hundred thousand dollars on smart surveying.

Kieran Chandler: So that gives a ballpark estimate of how much you’re going to sell. But obviously, if you’re launching many new products, you cannot afford such a tedious process. The good news is, if you’re launching a lot of stuff, what are the odds that everything you’re launching is completely new? In practice, almost zero, because you can’t launch hundreds of products that are completely unique. If you’re launching hundreds of products each year, chances are they’re all variations of the same sort of theme or topic or style.

Joannes Vermorel: Exactly, you can relate an existing product to some feature of the new product. If you’re in fashion, for instance, with every single collection, you’ll have new shirts and new shoes. But, they’re still shirts or shoes, and these products have characteristics, like sizes. Even if you don’t know how many units you’re going to sell, you understand that extreme sizes aren’t going to sell as much as the predominant size. If you want to have a statistical model, you just need to leverage this insight. If you’re forecasting how much future demand there’ll be for products you’re about to launch, you need to look at all the previous launches that you’ve done in the past and relate the product being launched to the old ones through their attributes.

Kieran Chandler: That’s interesting, especially the attribute approach which seems to be quite different from what’s typically done in most forecasting software. Now, with advancements in deep learning technology, is that the primary method used to look at these attributes in more detail?

Joannes Vermorel: Yes, let’s compare it to the classical perspective on forecasting new products. Initially, you have moving average models, that’s all. To forecast a new product, you’d ask the supply chain manager to create a link between a new product and an old one. This traditional approach required a human, a supply chain manager, to answer the question, “Which product is most similar to the one that you are about to launch?” so that we can pretend that this product has already been selling. Then you can go back to your moving average approach because suddenly, you have a time series; you have past sales for the product.

However, if you’re launching a thousand new products and you have to take this mapping decision for every new product you’re about to launch, and in your history, you’ve probably launched tens of thousands of products, then the process is incredibly tedious. You’d have to investigate an entire archive of past launches manually to do this mapping. If you do the mapping incorrectly, relying only on your intuition, then your forecast is completely useless.
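As a sketch of what automating this mapping could look like (invented data, not Lokad’s implementation), a nearest-neighbor search over one-hot-encoded attributes retrieves the closest historical analogs instead of relying on a planner’s intuition:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.neighbors import NearestNeighbors

# An archive of past launches, described by their attributes.
archive = [
    {"category": "shirt", "color": "white",  "size": "M"},
    {"category": "shirt", "color": "yellow", "size": "XS"},
    {"category": "shoes", "color": "black",  "size": "42"},
    # ...tens of thousands of past launches in practice
]

vec = DictVectorizer()
X = vec.fit_transform(archive)
nn = NearestNeighbors(n_neighbors=2).fit(X)

# Retrieve the most similar past products for the upcoming launch,
# replacing the tedious manual inspection of the whole archive.
new_product = {"category": "shirt", "color": "blue", "size": "M"}
_, idx = nn.kneighbors(vec.transform([new_product]))
print([archive[i] for i in idx[0]])  # closest historical analogs
```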

A smarter approach is to leverage the attributes and consider what attributes govern the amount of demand you can expect. Going back to fashion, sizes are a very clear indicator, but color is also a significant factor. For example, if you have children’s clothes, parents are unlikely to buy exceedingly white clothes because children are going to get them dirty. Therefore, pristine white for children tends to be not such a great color. However, for business shirts, the dominant colors are likely to be white, light blue, and light pink.

Kieran Chandler: If you want to have, say, bright yellow for business shirts, it’s likely to be a vanishingly small percentage of your sales. These types of observations can be made intuitively, but the relationships can be very subtle. Attributes can be incredibly diverse – size, color, price points, patterns on the clothes. Or let’s take consumer electronics, if you want to forecast the demand for the next printer, you have a wide set of characteristics to consider – compatible cartridges, other features – it’s super diverse.

Joannes Vermorel: This is where we face a situation where you need an algorithm that can cope with this overwhelming diversity. That’s where deep learning, the modern flavor of machine learning, comes into play. Deep learning algorithms are especially good at dealing with an incredibly diverse set of features, which can even include plain text descriptions of the products.
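Purely as an illustrative sketch of such an architecture (all layer sizes and feature choices below are assumptions, not Lokad’s design), a small PyTorch network can embed categorical attributes, take numeric features directly, and project a bag-of-words of the plain-text description:

```python
import torch
import torch.nn as nn

class NewProductDemandNet(nn.Module):
    """Toy network ingesting heterogeneous product features."""
    def __init__(self, n_colors=50, n_sizes=20, vocab_size=5000):
        super().__init__()
        self.color_emb = nn.Embedding(n_colors, 8)   # categorical
        self.size_emb = nn.Embedding(n_sizes, 4)     # categorical
        self.text_proj = nn.Linear(vocab_size, 32)   # bag-of-words
        self.head = nn.Sequential(
            nn.Linear(8 + 4 + 1 + 32, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, color_id, size_id, price, text_bow):
        feats = torch.cat([
            self.color_emb(color_id),
            self.size_emb(size_id),
            price.unsqueeze(-1),                     # numeric feature
            self.text_proj(text_bow),
        ], dim=-1)
        return self.head(feats).squeeze(-1)          # expected demand

net = NewProductDemandNet()
demand = net(
    torch.tensor([3]),      # color id
    torch.tensor([7]),      # size id
    torch.tensor([39.0]),   # price
    torch.zeros(1, 5000),   # bag-of-words of the description
)
print(demand)  # untrained output; training would use past launches
```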

Kieran Chandler: There’s a huge amount of variability, a lot to consider. Can we have confidence in the results of forecasts for new products?

Joannes Vermorel: That’s precisely the point that probabilistic forecasting tries to address. The amount of uncertainty when launching a new product is typically very high. If it were easy to forecast a new product, it probably wouldn’t be new; it would be a simple replacement, a near-perfect substitute for one of your existing products. In that case, the manual mapping I described earlier is probably a good fit. But if you’re launching something even slightly new, something that doesn’t align completely with what you were selling before, there’s an irreducible uncertainty. But that’s okay. Your competitors face the same challenge. To out-compete them, all you need to do is forecast better than they do. You don’t have a crystal ball, but chances are, they don’t either.

Kieran Chandler: With this irreducible uncertainty, what kind of expectations can we have?

Joannes Vermorel: That’s where the benefit of probabilistic forecasting comes in. Yes, your forecast will be inaccurate, but if you’re making a probabilistic forecast, you’re fully aware of that inaccuracy. What you’ll see in practice is a distribution spread over many possible futures, so that your decisions factor in the risk of having demand exceedingly above or below your forecast. It’s all about considering all possibilities.
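A minimal sketch of what such a spread over possible futures looks like in practice, using an invented demand distribution: instead of a single number, the forecast exposes quantiles that downstream decisions can trade off against overstock and stock-out risk:

```python
import numpy as np

rng = np.random.default_rng(0)

# A probabilistic forecast is a distribution over demand, not a point.
# Here a negative binomial stands in for the launch-demand model.
samples = rng.negative_binomial(n=5, p=0.05, size=10_000)

print("mean forecast:", samples.mean())
for q in (0.05, 0.50, 0.95):
    print(f"P{int(q * 100):02d} demand:", np.quantile(samples, q))

# A purchasing decision can then weigh each future: e.g., stock to
# the 80th percentile if lost sales hurt more than overstock, or
# lower if markdowns on unsold units are the costlier risk.
print("stock at 80% service target:", np.quantile(samples, 0.80))
```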

Kieran Chandler: If we now look at things from a Lokad perspective, we’re looking at a range of probabilities for the entire business, the whole catalog. If one individual item could sell a huge amount or very little, wouldn’t that change the results for all our forecasts?

Joannes Vermorel: Absolutely, it does, and that’s what makes the problem more challenging.

Kieran Chandler: It sounds quite complicated. As I understand it, if you want to produce a statistical forecast for new products, you need to examine past launches and match their attributes to find relevant products. This process is like the manual concept of ‘matching,’ but entirely automated. However, launching a new product also displaces demand from your existing products. Am I correct?

Joannes Vermorel: Indeed, every product that you launch is likely to cannibalize your existing sales. Take, for instance, a fashion retailer who introduces a new type of shirt. They were probably already selling shirts, so when a new, trendy design comes in, you’re not just gaining market share against your competitors. Instead, your customers might choose this new product over another one that you were already selling. This situation leads to cannibalization, which is very challenging to manage.

Kieran Chandler: It’s so challenging to manage that it’s one of the main reasons why fashion brands have collections, right?

Joannes Vermorel: Exactly. Instead of trying to solve this complex cannibalization problem, it’s easier to do a sale, flush out all the previous collections, and then start a new collection. This way, you avoid cannibalization between the new collection and the old one. You’ve liquidated the stock so that you don’t have two collections competing with each other at a given point in time.

Kieran Chandler: So if you want to refine your new product forecast, it can’t be done in isolation, right? If you launch multiple products, they’ll cannibalize what you have and also cannibalize themselves.

Joannes Vermorel: Correct. For instance, if Apple decides to launch a new iPhone, they wouldn’t have the same sales if they only released one color—black, for example—versus if they allowed customers to choose from five different colors on launch day. While having more options might slightly increase the sales, there would also be a lot of cannibalization.

Kieran Chandler: You’ve mentioned sales, which is like adjusting the product’s price in response to market trends. Is there a way to forecast the sensitivity of price? Is it possible to predict the demand for a product if I lower its price?

Joannes Vermorel: Yes, but that makes the problem even more complicated. It requires moving beyond classical supervised learning to the realm of reinforcement learning or other advanced situations. Why? Because you control what you’re going to observe once you start considering the price.

For example, a fashion brand has only observed the sales pattern for the price points that they’ve practiced in the past. So, if you decide to move upmarket towards more expensive price points, you’re venturing into unknown territories where your past data isn’t very relevant. Many brands would transition gradually so that they still have a chance to learn and see.

From a statistical perspective, the challenge is that if you build a forecasting model that takes a price point as input, you can adjust the forecast with the price point. However, the danger lies in what happens if you start using the output of your forecasting model.

Kieran Chandler: So you’re talking about a model that treats pricing as an input. As if one could build a forecasting model, tweak the price, and perform what-if scenarios, forecasting the launch of the same product at different prices. This approach aims to pick the optimal price, the one that benefits us the most. However, if this is done naively, wouldn’t it result in a particular overfitting problem?

Joannes Vermorel: Yes, indeed. If you repeat this exercise many times with minimal price variations, the price you pick could just be a fluctuation of your forecasting model itself. In essence, you’ll amplify any overfitting problem you might have in your learning process. Overfitting is when a statistical model performs well on the data you already have, but not so well on the data you don’t have. Ironically, when making a statistical forecast, you want your model to perform well on the data that you don’t have.

Kieran Chandler: That indeed raises an interesting question about how you measure the accuracy of such a model. But we can cover that another day. Back to this pricing question though, it seems like leveraging the price in a model like this could get exceedingly complicated. And, of course, you wouldn’t want to generate a massive overfitting error through exploration of the pricing variable, right?

Joannes Vermorel: Exactly, the exploration of the pricing variable can lead to massive overfitting, which is a very tricky aspect to manage.

Kieran Chandler: It sounds like that’s a fairly complex problem to solve. As a final question, what does the near future look like in terms of forecasting new products? What advancements are there in technology that we can look forward to?

Joannes Vermorel: Well, just last December, we rolled out our new forecasting engine that is based on deep learning. According to our own benchmark, it was probably one of our most significant upgrades in terms of incremental gains in accuracy. The gain in accuracy for forecasting new products was above 20% in shrinking the error, which is quite significant. One thing we have learned with this model is the ability to leverage plain text descriptions, which can be very useful. For example, if you’re a retailer and want to forecast how many units you’re going to sell for Lego boxes. That’s a tricky problem because, for instance, Lego releases a new medieval castle every year.

Kieran Chandler: Medieval castles should not be confused with elvish castles. One is going to be oriented toward boys, while the other is going to be oriented toward girls. However, this is subtle, and you don’t actually have all the fine-print attributes to reflect that if you’re selling thousands of toys in your store.

Joannes Vermorel: Indeed, it’s all based on the attributes, but often you don’t have that much data from your supplier. And you don’t necessarily have the time to manually add or adjust it yourself. Therefore, sometimes you need a forecasting engine capable of processing the plain text description. One area we are currently working on is to leverage these details to achieve more specific, more accurate forecasts.

For product launches, it’s critical to embrace the idea of cannibalization. If you are launching more products, it doesn’t mean that your sales are going to skyrocket. All the new products you’re launching compete for the same customers you already have. So one research area we’re focusing on is leveraging our loyalty database.

Typically, if you’re a retailer, you know which client is buying what. It’s completely different from a time-series perspective where you just think of how many units have been sold per day or per week for a given product. Here, you want to consider the social network of clients that have consumed the products in the past. The idea is that when you’re launching a new product, this new product will primarily start by gaining traction within your existing customer base.

If you want to have mathematical models capable of processing your clients’ social network, you typically have to transition from deep learning to differentiable programming, which is the descendant of deep learning. That’s the point where we are now.

Kieran Chandler: That’s fascinating. We’re going to have to leave it there, but forecasting social media and forecasting loyalty are really interesting concepts. Thanks for taking the time today.

Joannes Vermorel: Thank you.

Kieran Chandler: So that’s everything for this week. If you’re having problems forecasting for new products, we’d love to hear from you. Drop us an email or leave a comment below. We’re interested in hearing about the challenges you’re facing. That’s everything for this week, but we’ll see you again next time. Goodbye for now.