With the GDPR recently coming into force, data is now at the forefront of many business discussions. Used correctly, data can deliver huge results for a company. However, without proper data preparation it is very difficult to extract valuable insight from this data or to draw relevant conclusions.
A recent Trifacta survey estimates that organisations are currently spending over $450 billion on data preparation alone. In this episode of LokadTV, we try to understand why, despite this huge investment, data is still so badly managed, and we introduce the concept of data preparation in more detail.
Data preparation is one of the fundamental principles of data science. Put simply, it’s about transforming raw data into an easy-to-understand format so that it can be used successfully. This is no mean feat when you consider that data often comes from multiple sources and can be incomplete, inconsistent and riddled with errors. Feeding such data into any analysis leads to a “garbage in, garbage out” dilemma, which is to be avoided at all costs.
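To make this more concrete, here is a minimal sketch in Python of what a preparation step might look like. It is only an illustration under stated assumptions: the two sources, the field names (`sku`, `qty`, `date`), the sample values and the `prepare` helper are all invented for this example and do not come from the episode. The sketch normalises inconsistent formats, removes duplicates that appear in both sources, and sets aside incomplete or erroneous rows rather than letting them flow downstream.

```python
from datetime import datetime

# Hypothetical raw records from two sources (all names and values are invented).
crm_rows = [
    {"sku": " a-100 ", "qty": "12", "date": "2018-05-25"},
    {"sku": "B-200", "qty": "", "date": "25/05/2018"},       # missing quantity
]
erp_rows = [
    {"sku": "A-100", "qty": "12", "date": "2018-05-25"},     # same record as the first CRM row
    {"sku": "C-300", "qty": "seven", "date": "2018-05-26"},  # quantity is not a number
]

# Date formats we assume the sources may use.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text):
    """Return a date for the first matching format, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def parse_quantity(text):
    """Return an int, or None when the value is missing or not a number."""
    try:
        return int(text.strip())
    except ValueError:
        return None


def prepare(rows):
    """Split rows into cleaned records and rejected raw rows."""
    clean, rejected, seen = [], [], set()
    for row in rows:
        sku = row["sku"].strip().upper()   # fix inconsistent spacing and case
        day = parse_date(row["date"])      # unify the date formats
        qty = parse_quantity(row["qty"])
        if not sku or day is None or qty is None:
            rejected.append(row)           # keep bad rows aside for review
            continue
        key = (sku, day)
        if key in seen:                    # drop duplicates across sources
            continue
        seen.add(key)
        clean.append({"sku": sku, "qty": qty, "date": day})
    return clean, rejected


clean, rejected = prepare(crm_rows + erp_rows)
print(len(clean), "clean row(s),", len(rejected), "rejected row(s)")
```

Keeping the rejected rows, instead of silently dropping them, is a deliberate choice in this sketch: it makes the volume of “garbage” visible so that it can be fixed at the source.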
To conclude, we discuss in further detail the impacts of these complications, why data is currently so badly managed by companies and what good data preparation actually looks like.