Properly preparing data is a prerequisite for the success of any data-driven initiative. In supply chain, data preparation is especially difficult because it involves complex enterprise systems that were not designed with data science in mind.
Data preparation is one of the fundamental principles of data science. Put simply, it’s about transforming raw data into an easy-to-understand format so that it can be used successfully. This is no mean feat when you consider that data often comes from multiple sources and can be incomplete, inconsistent and likely to contain errors. Poorly prepared data leads to a “garbage in, garbage out” dilemma, which is to be avoided at all costs.
A recent Trifacta survey estimates that organisations are currently spending over $450 billion on data preparation alone. In this episode of LokadTV, we try to understand why, despite this huge investment, data is still so badly managed, and we introduce the concept of data preparation in more detail.
To conclude, we discuss the impact of these complications in more detail, why companies currently manage their data so badly, and what good data preparation actually looks like.
00:46 Could you clarify why we are talking about data preparation today?
02:02 How long should it take to prepare a little bit of data?
03:15 Six months sounds like a long period of time. Is there any way we can speed up this process?
06:06 What do you expect to see in this documentation?
07:11 Is all that documentation needed?
10:07 Do you have a good example of how one of your clients did it in the past?
14:58 Obviously, here at Lokad, in order to build our probabilistic forecasts, we are fairly reliant on having clean historical data. Is there anyone other than us who really cares?
18:38 Should the IT department be preparing the data, making sure it is clean?
24:37 To sum things up, what does good data preparation look like?