Data Science For Supply Chain Forecast (with Nicolas Vandeput)

November 28, 2018

guest speakers

00:00:09 Nicolas Vandeput’s introduction and background.
00:01:21 Nicolas’ book: Data Science for Supply Chain Forecast.
00:03:29 Progress in open source statistical toolkits.
00:05:36 Ease of modern forecasting tool experimentation.
00:06:17 Open source software’s broad industry impact.
00:08:03 Using Python and R for data science.
00:10:35 Impact of open-source tools on vendors.
00:13:22 Role of vendors in forecasting solutions.
00:14:26 Applying theory in production environments.
00:15:38 Moving to production: data challenges.
00:16:29 Handling supply chain data issues.
00:18:05 Data consolidation for better decisions.
00:19:56 Need for data culture in supply chain.
00:22:14 Educating for better supply chain management.
00:24:01 Optimization rule and importance of measuring.
00:24:15 Drawbacks of open source in supply chain.
00:25:37 Future role of Lokad and specialists’ value.
00:28:34 Python and open source evolution challenges.
00:31:37 Encouraging simple beginnings in data science.

Summary

Nicolas Vandeput and Joannes Vermorel discuss the transformative role of data science in supply chain forecasting. Vermorel emphasizes Lokad’s focus on providing an analytical overlay for supply chain decision-making, while addressing the challenges of rapidly evolving open-source toolkits. Vandeput reassures readers that his book demystifies data science and machine learning for supply chain forecasting, encouraging them to start with basic models and progressively add complexity. Both agree on the importance of continuous learning and adaptation in this fast-paced field. Vandeput hopes his book will empower readers to confidently apply data science in their supply chain operations, heralding a new era of quantitative, data-driven decision-making.

Extended Summary

In the interview, Kieran Chandler, the host, introduces Nicolas Vandeput, a supply chain scientist with a keen interest in education, and Joannes Vermorel, the founder of Lokad. The discussion primarily revolves around Vandeput’s newly released book, “Data Science for Supply Chain Forecast.”

Vandeput begins by highlighting his passion for learning and teaching, which led him to write the book. He describes his fascination with the burgeoning field of data science and its potential applications to supply chain management, which he sees as a distinct and complex area of business that can benefit greatly from modern data-driven techniques. His book brings these insights together, focusing on how data science can be used to improve forecasting in supply chains.

Vandeput explains that, traditionally, supply chain forecasting was primarily based on what he calls “old school” statistical methods. However, with the advent and progression of data science, new ways of handling and interpreting data have emerged. His book seeks to prepare professionals for this new era, empowering them to leverage data science to enhance their supply chain forecasts.

Vermorel adds his perspective, emphasizing that he found Vandeput’s book to be highly valuable, particularly for supply chain managers who are yet to tap into data science. He suggests that managers should familiarize themselves with the book and encourage their teams to do the same.

Vermorel expands on the discussion by noting the significant advancements in open-source statistical toolkits over the past decade. He recalls that, around ten years ago, such toolkits, though readily available, were primarily designed for researchers and were not production-grade due to their complexity and lack of user-friendly documentation.

However, he notes that with the rise of data science over the last five to ten years, a transformation has occurred. Many university professors have begun to pay attention to the usability of these statistical packages, ensuring they are accessible to their students. This has led to an improvement in the quality of documentation, consistency in terminology across different packages, and a focus on mainstream, widely applicable techniques.

Vermorel concludes his point by acknowledging the efforts of the open-source community, with the aid of academia, in producing a range of high-quality, accessible, and user-friendly statistical packages that are now available to professionals in various industries, including supply chain management. This underlines the growing importance of data science in contemporary business practices.

The book, Vandeput noted, makes it possible for anyone with a laptop and free software to start generating sophisticated forecasts. The ready availability of open-source tools allows users to experiment and fine-tune their forecasting models for their specific needs, something which would have been far more challenging in the past.

The conversation then shifted to the broader impact of open-source software. Vermorel highlighted that almost all modern software is influenced by open-source programming, with cloud computing providers heavily relying on open-source platforms like Linux. Lokad itself, he revealed, utilizes 90% open-source software in its operations. The key game-changing aspect of modern open-source software is not just its accessibility, but the quality of its packaging and documentation.

Vandeput emphasized that it’s now easier to program in Python or R than in older frameworks like VBA or Excel macros. This easy access and usability, he said, are central messages of his book. Asked if this shift toward open-source software made him nervous, Vermorel responded that while it poses a challenge for vendors to demonstrate their value, it also acts as an enabler.

Vermorel noted that if a vendor’s sole value is implementing a semi-complicated forecasting model, they would struggle to compete with the open-source ecosystem. The real added value, he suggested, lies in providing full-scale, production-grade solutions for supply chain challenges, which involve more than just accurate forecasts.

Chandler then asks Vermorel about moving to a production basis and the challenges that might arise. Vermorel begins by emphasizing the importance of quality data: without it, any data-driven process risks becoming ‘garbage in, garbage out.’ This concern extends to areas often neglected in supply chains, such as data on promotions and stockouts. He argues that understanding the models that consume this data can make supply chain professionals more aware of these neglected areas.

However, Vermorel also acknowledges that the primary goal of a supply chain is not to compile an accurate historical database; that is a secondary objective. Lokad assists clients in accelerating the transition to quality data consolidation, potentially across multiple sites and systems. Understanding how this data can be exploited is important. Furthermore, Lokad aids in transforming these forecasts into supply chain decisions, extending the utility of forecasts into action.

The conversation then turns to the lack of data culture and machine learning understanding in the supply chain sector. Vandeput suggests that this is a significant obstacle for many companies. Often, supply chain projects are delayed due to insufficient data or misunderstanding of what machine learning can accomplish. This leads to a discussion about the need to change this culture and increase the importance placed on having correct data.

Vermorel suggests that education is a path forward, pointing to his and Vandeput’s efforts to evangelize the market through their respective books and other published materials. He hopes to see a new wave of supply chain practitioners approaching the field with a quantitative, engineering mindset.

Joannes Vermorel describes Lokad’s vision of providing an analytical overlay for supply chain decision-making, rather than acting as an ERP or a repository of inventory movements. He also discusses the risks of the fast-evolving Python ecosystem, including the challenge of keeping toolkits state of the art. Nicolas Vandeput, in turn, reassures readers that data science and machine learning for supply chain forecasting are not overly complex. His book aims to guide readers to start with simple models and gradually add complexity. Both speakers emphasize the need for continuous learning and adaptation, with Vandeput expressing hope that readers will gain confidence in their ability to apply data science in supply chain contexts.

Full Transcript

Kieran Chandler: Hi, this week on Lokad TV, we’re joined by Nicolas Vandeput, a supply chain scientist who specializes in demand forecasting. As well as having a strong technical background from managing a multinational supply chain, Nicolas also has a keen interest in education. He spent time lecturing at the University of Brussels and has just released his book entitled “Data Science for Supply Chain Forecast.” So, Nicolas, thanks very much for coming in today. As always, it’s really nice to get to know a little bit about our guests. Perhaps you could kick things off by explaining a bit about yourself and telling us how you got involved in the world of supply chain.

Nicolas Vandeput: Yes, as you said, I’m someone very interested in education and learning. I like to spend my time reading books, articles, and blogs online. I had the opportunity to learn a lot, and now I’m extremely happy to be able to apply that knowledge. At some point, I felt the need to share what I had learned, so I wrote a book to summarize this new field of data science. I think this is something really new, and applying it to supply chain, which is a specific topic, is also quite unique. People use data science for online marketing, but supply chain is a different subject, so I took the time to bring the two together.

Kieran Chandler: OK, so the book is called “Data Science for Supply Chain Forecast.” It’s a bit of a mouthful, but what is it about?

Nicolas Vandeput: It’s exactly as the title says: it’s about supply chain and how to apply data science to get forecasts within supply chain. In the past, people used what I like to call “old school statistics,” which came with many different questions. Now that we’ve moved into a new world of data science, some questions remain, but new ones arise as well. We need to find new ways to deal with data, and this book is about preparing people for this new age.

Kieran Chandler: As always, we’re joined by Joannes Vermorel, the founder of Lokad. Joannes, you had a sneak peek at the book. What’s your perspective on it?

Joannes Vermorel: Yes, I had a sneak peek, and I had the chance to review the manuscript. It’s very good. I recommend that supply chain managers who do not have anyone with data science expertise on their team get a copy and read at least the first few chapters. Then, have other people on your team read the remaining chapters and maybe act on them. What is very interesting is the fantastic progress of open source statistical toolkits. Up to 10 years ago, they were mostly used by researchers to demonstrate things to other researchers. The code was there, but it was messy, research-grade rather than production-grade, and the documentation was often nonexistent.

What really changed over the last five to ten years is that with the emergence of data science, many university professors started to pay attention to the quality of the statistical packages. They aimed to make them accessible to their students, ensuring impeccable documentation, consistent terminology across packages, and focusing on mainstream methods that work in a large variety of situations. The open source community, with the help of academics, produced a series of open source packages that were driven by user needs.

Kieran Chandler: Joannes, can you tell us more about Nicolas’s new book, “Data Science for Supply Chain Forecast”?

Joannes Vermorel: In this book, Nicolas takes the good parts of Python and the most relevant packages to demonstrate how you can achieve close to state-of-the-art forecasts with a minimal amount of effort, which is very impressive.

Kieran Chandler: Nicolas, what has changed in the open-source community that has improved the quality of what was out there?

Nicolas Vandeput: I think it’s a question of how easy it is to create a forecast. Ten years ago, it would have been a mess, and this book wouldn’t have been possible. It would have been aimed at extremely motivated professionals. But today, any professional with a bit of curiosity and passion can do it. It’s rather easy; you just need your own laptop with free software, and you can start with a really easy language. Some years ago, that wouldn’t have been possible. It would have been more complex. So now, you have the ability to easily test something on your own for free, and from there, you have the ability to experiment. As it is easy to experiment, you can do more experiments and then fine-tune the forecast, the code, and the data science just for your specific case. In the past, that wouldn’t have been possible.

Kieran Chandler: Joannes, it’s not just the forecasting world that’s benefiting from open-source software. Do you have examples of other industries that have really benefited from having open-source toolkits available?

Joannes Vermorel: The open-source movement is incredibly vast, so virtually all of the software world nowadays is affected by open source. All the major cloud computing providers run their clouds on Linux, and even at Lokad, 90% of the software we use is open source. Even Microsoft, whose platform we happen to use, runs a lot of Linux on Azure. The .NET framework itself is open source, and the deep learning toolkit we use, CNTK (the Microsoft Cognitive Toolkit), is also a Microsoft open-source product. At Lokad, we also release quite a few bits as open source. This movement has been going strong for multiple decades.

What’s interesting, and relevant to forecasting and supply chain, is not just the fact that software is open-source, but that you have well-packaged and well-documented open-source components. This is completely game-changing. It means the difference between getting started with something simple, like a linear regression, in 20 straightforward lines of code, versus needing 200 lines of code and a month of getting all your pieces of software together just to have something that would even compile. You used to have incompatible code that would crash when combined, and you’d need a month of plumbing just to get something published by somebody else to work. Now, setting up your entire Python environment for data science takes just a couple of pages in the book. You simply install Anaconda, and you’re done.

Kieran Chandler: Joannes, could you tell us about the Windows Subsystem for Linux and how it affects access to these tools?

Joannes Vermorel: These tools work on pretty much any flavor of Linux and, thanks to the Windows Subsystem for Linux, even on Windows machines. The book demonstrates how much easier it has become to access these tools, thanks to open source and production-grade packages.

Nicolas Vandeput: I’d like to add that in the book, I discuss professionals who used to rely on VBA and macros in Excel. It always seemed complex and bug-ridden to me. When you suggest using Python or another language, people often think it’s too complex. However, my message in the book is that it’s actually much easier to use open frameworks like Python or R than VBA or Excel macros.

Kieran Chandler: Joannes, with all these open-source forecasting tools that are easy to use, does it make you nervous as an enterprise software vendor like Lokad?

Joannes Vermorel: As an enterprise software vendor, we see it as part of the ecosystem, and also as an enabler. We use these open-source tools too, so we don’t have to rebuild everything. The challenge is to find our added value. If your added value is just implementing semi-complicated forecasting models, then you don’t have any real added value compared to the ecosystem. The book highlights that vendors selling a toolkit with a few forecasting models don’t provide much value compared to the popular Python libraries. However, there is still potential in providing production-grade solutions and handling supply chain challenges at scale, which is something Lokad aims to address.

Nicolas Vandeput: I agree with Joannes. Many professionals and students in the supply chain world still view machine learning as a buzzword or something that won’t last. In reality, it’s here to stay. If you read my book and take the time to learn about it, you’ll see how useful and accessible it can be for supply chain optimization.

Kieran Chandler: So, after reading the book, you’re much better prepared to adopt a solution like Lokad that can go one step further. As you said, it can get the whole solution working end to end. Of course, to run a forecast in a supply chain environment, you also need a full review process with people and so on. So, producing the numbers of the forecast is just one step in the whole process. Can you talk about how the ideas in the book can be implemented?

Nicolas Vandeput: The objective of the book is just to discuss that specific step. It’s not because you just read the book that you have to do it by yourself. You can also go to other vendors like Lokad to understand, “I’ve read this in the book, how does it work for you? How can we implement that? I have this idea, would it work? Can we test it?” Then, you start to understand what software vendors like Lokad are really doing.

Kieran Chandler: A lot of these ideas have been developed from a theoretical perspective. Can they be applied in a production sort of way?

Nicolas Vandeput: Yes, totally. These new forecasting methods have actually been around for a long time. I mean, neural network theory was first released in the 60s, and the first method in the book also comes from the 60s. We have only started using them in production environments recently because they have become easier to run. Ten years ago, that maybe wasn’t the case, and people are now more aware of them. So yes, you can definitely use them. What’s really interesting with data science, and that’s one of the big purposes of the book, is that it’s really science. You can test it, run experiments, and test it over and over again using data. That’s why it’s called data science. You can prove to yourself whether it works or not. It might not work, but then you can think, “I’m going to design a new experiment, take new things into account, or remove things I don’t need, to see if it works better.” So, it’s really a science where you can prove that your forecast is going to be better, and only start using it after that.
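Nicolas’s point that data science means testing forecasts against data can be sketched as a simple backtest: replay the history, forecast each period using only the periods before it, and compare models on the same error metric. The sketch below is hypothetical (made-up demand, two deliberately basic models, mean absolute error as the metric) and is not taken from the book.

```python
def naive_forecast(history):
    # "Tomorrow looks like today": repeat the last observation.
    return history[-1]


def moving_average_forecast(history, window=3):
    # Average of the last `window` observations.
    recent = history[-window:]
    return sum(recent) / len(recent)


def backtest_mae(series, model, start):
    """One-step-ahead backtest.

    For each period t >= start, forecast series[t] from series[:t] only,
    then return the mean absolute error over all those forecasts.
    """
    errors = [abs(series[t] - model(series[:t])) for t in range(start, len(series))]
    return sum(errors) / len(errors)


# Hypothetical, noisy monthly demand.
demand = [20, 24, 18, 23, 21, 25, 19, 22, 24, 20]
for name, model in [("naive", naive_forecast), ("moving average", moving_average_forecast)]:
    print(name, round(backtest_mae(demand, model, start=3), 2))
# prints: naive 3.71, then: moving average 2.19
```

On this particular series the moving average beats the naive forecast; on another series the ranking could flip, which is why the experiment has to be rerun on your own data before a model goes into use.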

Kieran Chandler: So, it can be used as a kind of proof of concept before you approach someone like Lokad. Joannes, when we talk about moving on to a production basis and using it daily, what are the challenges that people might come up against? And what can Lokad really help with in terms of that full process?

Joannes Vermorel: First, you need to get good data. If your data is not properly qualified, you end up with garbage in, garbage out. The interesting thing about this book is that it gives supply chain people a taste of what the models are doing and what kind of things they can leverage, which helps them understand why they need to start paying attention now to a whole series of grey areas in supply chains, such as promotions. Typically, the data on promotions is a complete mess. The same goes for stockouts. Frequently, there is no proper historical record of stockouts, so you don’t know whether you had zero sales because there was zero demand or because the item was out of stock.

So, being more familiar with the kind of model that can exploit the data makes you better able to judge whether your existing process is adequate. The primary goal of a supply chain is to keep the flow of goods moving.
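The stockout problem Joannes describes can be illustrated with a toy example. Assuming you do have a per-day in-stock flag (often the very data that is missing), a crude correction is to estimate average demand only from in-stock days, instead of treating stockout zeros as zero demand. This is a simple heuristic, not a proper censored-demand model, and the data is hypothetical.

```python
def mean_demand(sales, in_stock):
    """Average daily demand estimated from in-stock days only.

    On a stockout day, recorded sales are capped by availability, so a zero
    there says nothing about demand; those days are left out of the estimate.
    """
    observed = [s for s, ok in zip(sales, in_stock) if ok]
    if not observed:
        raise ValueError("no in-stock days to estimate demand from")
    return sum(observed) / len(observed)


# Hypothetical history: the item was out of stock on days 4 and 5.
sales = [5, 6, 4, 0, 0, 5, 6]
in_stock = [True, True, True, False, False, True, True]

print(sum(sales) / len(sales))       # naive average, about 3.71, biased low
print(mean_demand(sales, in_stock))  # 5.2
```

Without the `in_stock` flag, the two zero days are indistinguishable from genuine zero demand, which is exactly the ambiguity described above.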

Kieran Chandler: So the primary goal of a supply chain is not to compile an accurate historical database. The first target is to serve everyone, keep production going, and ensure client satisfaction; the data is a secondary target, and its execution is often not as high in quality. To accelerate the transition toward data consolidation at scale, across multiple sites, people need to expand their horizons. Joannes, could you elaborate on this?

Joannes Vermorel: When we start working with clients, we often encounter the challenge of data consolidation and execution. Many clients are not aware that advanced numerical recipes can be applied to forecasts to generate better decisions. Reading the book on data science can expand your horizons and provide insights on how to execute the supply chain more efficiently. It helps you see the bigger picture before and after the forecast, from production to exploiting the forecasts for smarter decisions.

Nicolas Vandeput: Supply chain leaders are missing two crucial elements to extract actionable insights from data science and machine learning. The first is a culture of data, which is lacking in about 99% of companies worldwide. Many companies are still firefighting with messy spreadsheets and don’t realize the importance of proper data. This book aims to show that with proper data, you can achieve proper science, accurate forecasts, and effective experiments. Without data, many projects get delayed, and journalists can confirm that. The second element is the lack of talent and understanding of machine learning and data science in supply chain management. We need to change this culture and raise awareness about the value of data.

Kieran Chandler: Absolutely, having the correct data is crucial. It should be like a commandment, “Clean your data.” But how do we bring about this change in culture? How can we evangelize the market and emphasize the importance of having the right data?

Joannes Vermorel: Well, jokingly, we could start a cult where better data is mandatory, like a sacred commandment. But on a serious note, I believe we need to foster understanding and awareness. We can educate the market and place more importance on the value of clean and accurate data. By doing so, we can create a culture shift and ensure that having the correct data becomes a top priority for supply chain management.

Kieran Chandler: When you have a lack of awareness of what can be done with deep learning and have no experience with it, it’s very hard to even see the point, isn’t it?

Joannes Vermorel: Indeed, it’s challenging. I think a good starting point is getting your hands dirty with data crunching and not being completely buried under the pure technicality of the task. It shouldn’t take an IT team just to set up the environments. It’s very important to understand what makes these methods tick, what makes them work. It’s not magic.

Demystifying this is important. At Lokad, we’ve tried to evangelize the market. Education is a path forward. Nicolas is publishing a book, which is excellent. We’ve also published a book and quite a few things, including an extensive knowledge base on the Lokad website. But yes, the bottom line is education.

What I see is a new wave of supply chain practitioners coming to the field with more of an engineering mindset, which is more quantitative. You want to have numbers and things that can be repeated. Leadership is essential in supply chain, as it involves many people, countries, and sites. But if you have leadership without any kind of engineering mindset or quantitative mindset, it’s hard to optimize anything. As soon as you start using the word “optimize,” the cardinal rule of optimization is that you cannot optimize something that you do not measure. This leads to the question: how do you measure? And then you need data.

Kieran Chandler: Nicolas, we’ve spoken a lot about the benefits of using these open-source toolkits. How about some of the drawbacks? Where do you see some negatives with using these tools?

Nicolas Vandeput: As we discussed, and I think this is really important, the world of supply chain is about a lot of interaction between products, people, different teams, and so on. The process of forecasting is a very long process with many different stakeholders involved. My book is saying that for the last decades, we have been using the same techniques. If you look back at the 80s and 90s, you’ll find the same forecast engine as today. So nothing changed.

I’m suggesting that we can change this very specific piece, but of course, that’s not enough. The whole process needs to live and evolve. Just using Python won’t solve a process that does not work. It will just improve the numbers out of the forecast, but you still need to look at the full process.

Kieran Chandler: Joannes, you seem very confident that there’s always going to be a place for Lokad. Looking forward, where do you see that place in the marketplace actually being?

Joannes Vermorel: First, I believe that even when talking of pure forecasts, we still have cards to play to get better forecasts. But “better” is trickier than ever. It’s one thing to say you have a better forecast in terms of mean absolute error, but as soon as you enter the probabilistic world, it’s another game.

If you start to say, “I want to forecast demand not for tomorrow, but over a probabilistic horizon, namely until my container arrives,” then you are forecasting a demand that starts at an uncertain point in the future and ends at another uncertain point. Things can get very complicated with more dimensions. Then, if you start to include factors like the probability that my competitor drops their price, which would have a very specific impact on the shape of the demand, you see that forecast accuracy is not a one-dimensional thing. It has a whole complexity to address.
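Joannes’s container example, where the forecast horizon is itself uncertain, can be sketched with a small Monte Carlo simulation: draw a lead time, draw daily demand over that lead time, and look at the resulting distribution rather than a single number. Everything here is an illustrative assumption (Poisson daily demand, a three-point lead-time distribution, the chosen parameters); it is not Lokad’s method.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def lead_time_demand_samples(daily_rate, lead_time_days, lead_time_probs, n, seed=0):
    """Monte Carlo samples of total demand until the container arrives.

    Assumptions: the lead time is drawn from a discrete distribution, and
    daily demand is Poisson with a constant rate.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        days = rng.choices(lead_time_days, weights=lead_time_probs)[0]
        samples.append(sum(poisson(rng, daily_rate) for _ in range(days)))
    return samples


def quantile(samples, q):
    """Empirical quantile using the nearest-rank rule."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, max(0, math.ceil(q * len(ordered)) - 1))
    return ordered[index]


samples = lead_time_demand_samples(
    daily_rate=3.0, lead_time_days=[10, 14, 21], lead_time_probs=[0.5, 0.3, 0.2], n=5000
)
print("mean:", sum(samples) / len(samples))
print("median:", quantile(samples, 0.5), "95th percentile:", quantile(samples, 0.95))
```

The gap between the median and the 95th percentile is the kind of information a single point forecast, scored only by mean absolute error, does not convey.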

Kieran Chandler: Suddenly, your models become complex, even if you have very nice open-source toolkits. When it comes to plumbing all those things together in a way that is bug-free, production-grade, and scalable, there are still a lot of other things to do.

Joannes Vermorel: Yes, that is one approach if we take the pure forecasting angle. But at Lokad, our vision is really to provide a deeper, end-to-end analytical overlay that generates smart decisions for a given supply chain. We don’t manage supply chain systems; we’re not an ERP. We don’t want to be the repository of all the inventory movements. We do have a copy of this data, but we are the smart analytical overlay.

That’s the vision. And again, even if Lokad were entirely built out of open-source tools, there would still be value in bringing all of that together. For example, the cloud computing platforms available nowadays are a gigantic mashup of open-source components, but people like to go with Amazon even though they could do it themselves. You could run your own private cloud, but it’s such a massive effort to bring all those pieces together that at some point there is value in having specialists do it for you.

One drawback I see about this ecosystem of Python and open-source very specifically is that it’s evolving so fast. If you do it yourself, there is one danger: you pick a flavor of Python and a toolkit, and then two years down the road there is something dramatically improved. Suddenly, you were state-of-the-art and now you’re not, just because the world kept moving and a lab just produced a new toolkit.

For example, scikit-learn was leading in pretty much everything up to a couple of years ago, but now PyTorch is seriously challenging it by bringing deep learning and differentiable programming into the picture. So it raises a question: who is responsible for revisiting what you implemented two years ago and refreshing it with the flavor of the day? A good vendor would make sure that your data science solution is routinely revisited, and probably routinely rewritten, to stay state of the art.

I don’t know how long this golden age of progress in data science will last, but I would not be surprised if, for something like the next decade, every two years we see some fantastic progress where the new flavor is so much better than the previous one. There could be categories of problems that seem super hard to address but become accessible.

Nicolas Vandeput: If I may add something there, I think it’s really interesting that Joannes mentioned the complexity that Lokad is able to deal with. I’ve worked personally with Lokad’s team, so I know what they’re capable of. One of the messages of my book, and this is really important to me, is telling people that they can do it.

Some people might have heard what Joannes is saying and think, “Wow, data science and machine learning seem really complex, maybe that’s not for me.” I really want to reassure people by saying, “No, it’s not that complex. You can start with a simple model.” And actually, as the book is showing, you can fairly easily start with a very simple model that’s actually extremely strong as it is. Then, from there, you can add layers, maybe tweak the system a bit, the data, maybe bring another model in, and so on.

So, you can really start simple with something that will work already really well, and from there, then go to more complex matters. That’s really the message I want to put forward in the book: you can start simple and then progress to more.
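As an example of the kind of simple yet solid starting point Nicolas describes, here is simple exponential smoothing, a classic forecasting model, in a few lines of Python. We are not claiming it is the exact model the book opens with; the demand series and the smoothing parameter `alpha` are hypothetical.

```python
def simple_exponential_smoothing(history, alpha):
    """Return the one-step-ahead forecast after processing `history`.

    level_t = alpha * demand_t + (1 - alpha) * level_{t-1}, initialised with
    the first observation. alpha in (0, 1] sets how fast the level reacts.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = history[0]
    for demand in history[1:]:
        level = alpha * demand + (1 - alpha) * level
    return level


demand = [20, 24, 18, 23, 21, 25]
print(round(simple_exponential_smoothing(demand, alpha=0.4), 2))  # 22.71
```

The extra layers Nicolas mentions (a trend term, seasonality, another model) would be added on top of a baseline like this, and each addition is worth keeping only if it measurably improves the forecast on historical data.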

Kieran Chandler: Can you help us understand how we can determine if a model is working? Can we test it? Can we replicate it? These principles seem to apply for simple models, but once you grasp them for a basic model, can you also understand them for much more complex models?

Nicolas Vandeput: Absolutely, I strongly advise people to start simple. Once you have a handle on the basics, you can then progress to something much more complex.

Kieran Chandler: That brings me quite neatly onto my last question. What are your hopes for the readers of your book and for the use of open source toolkits going forward?

Nicolas Vandeput: My vision, and indeed my hope, is that people will read this book and gain confidence in their abilities. At the end of the book, I want them to be able to say to themselves, “Yes, I really can do this. It doesn’t look so complex.” Maybe two weeks ago they couldn’t code, but by the end, I want it to look very simple to them. I want these people to start experimenting by themselves, to really live the data science, and try new models. I hope that they will become the new leaders in supply chain management. This is also one of the messages of Lokad. We are already in a quantitative world of supply chain where you can experiment and implement solutions that work consistently. To do that, you need data science. So, I really hope that the book will empower people to do that for themselves.

Kieran Chandler: We’re going to have to wrap it up, but thank you for your time today, Nicolas.

Nicolas Vandeput: Thank you.

Kieran Chandler: The book, “Data Science for Supply Chain Forecast”, is out now. Make sure you check it out on Amazon. Here at Lokad TV, we’ll be back again soon with another episode. But until then, thanks for watching.

Joannes Vermorel: Thank you.
