With the recent arrival of GDPR, data is now at the forefront of many business discussions. Making use of data within companies can deliver huge results when done correctly. However, without proper data preparation, it can be very difficult to extract any valuable insight from this data or draw relevant conclusions.
A recent Trifacta survey estimates that organisations are currently spending over $450 billion on data preparation alone. In this episode of LokadTV, we try to understand why, despite this huge investment, data is still so badly managed, and we introduce the concept of data preparation in more detail.
Data preparation is one of the fundamental principles of data science. Put simply, it’s about transforming raw data into an easy-to-understand format so that it can be used successfully. This is no mean feat when you consider that data often comes from multiple sources, can be incomplete or inconsistent, and is likely to contain errors. Left unaddressed, these flaws lead to a “garbage in, garbage out” dilemma, which is to be avoided at all costs.
To conclude, we discuss in further detail the impacts of these complications, why data is currently so badly managed by companies and what good data preparation actually looks like.
00:46 Could you clarify why we are talking about data preparation today?
02:02 How long should it take to prepare a little bit of data?
03:15 Six months sounds like a long period of time. Is there any way we can speed up this process?
06:06 What do you expect to see in this documentation?
07:11 Is all that documentation needed?
10:07 Do you have a good example of how one of your clients did it in the past?
14:58 Obviously, here at Lokad, in order to build our probabilistic forecasts, we are fairly reliant on having clean historical data. Is there anyone other than us who really cares?
18:38 Should the IT department be preparing the data, making sure it is clean?
24:37 To sum things up, what does good data preparation look like?