Data Aggregation and Forecasting

00:00 Introduction
00:40 What are the different types of granularities to choose from in demand forecasting?
01:50 What kind of forecast are we using when we’re choosing the granularities to use?
03:02 What is the timeline in these forecasts, are they all equispaced?
03:54 In a monthly forecast, will the different numbers of weekends in a month be the cause of data disturbance? For example, if there are spikes in sales on Fridays?
05:25 Can we choose the most disaggregated level, say SKU/day, and then reconstruct any needed level of aggregation we want from that?
08:30 Can you give us an example of the edge cases when it comes to the level of aggregation?
10:43 Can you in theory just sum up your time series from the most granular level?
11:37 Is the accuracy dropping the higher the level of aggregation?
14:12 Is the forecasting method chosen another limiting factor?
15:34 As time series are always ‘more of the same’, making the assumption that the future is symmetrical to the past, is that causing a problem?
16:13 If you go more granular, will you lose out on patterns of seasonality?
18:42 Many RFPs are asking vendors to forecast over numerous levels of aggregation at once. Why?
21:34 Can you give us an example of a relevant horizon?
25:01 So the level of granularity should always be on the level of the decisions that you take?


When it comes to demand forecasting, there is an incredible diversity of methods and levels of data aggregation in use. Some companies forecast on a daily basis, whilst others forecast on a weekly, monthly, quarterly, or yearly basis. Some forecast at the SKU level, whilst others forecast at the category level. How does one choose the right level of data aggregation?

The level of aggregation is directly linked to time-series forecasting, which adds its own set of limitations, such as having to adhere to a specified, equispaced timeline. As most data is found at the most granular level, it should in theory be possible to reconstruct any level of aggregation by summing the most granular data available up to the desired level. Perhaps counterintuitively, doing so proves to give more unstable and inaccurate results.
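To make the "summing up" idea concrete, here is a minimal sketch of rolling SKU/day records up to coarser levels. The records, the SKU and category names, and the `aggregate` and `iso_week` helpers are all hypothetical and not taken from the episode. The sketch only shows the mechanics of aggregating historical data; it says nothing about how accurate a forecast built on the result would be.

```python
from collections import defaultdict
from datetime import date

# Hypothetical SKU/day sales records: (sku, category, day, units sold).
daily_sales = [
    ("sku-a", "shirts", date(2024, 1, 1), 3),
    ("sku-a", "shirts", date(2024, 1, 2), 0),
    ("sku-b", "shirts", date(2024, 1, 3), 5),
    ("sku-c", "shoes", date(2024, 1, 3), 2),
    ("sku-a", "shirts", date(2024, 1, 8), 4),
    ("sku-c", "shoes", date(2024, 1, 9), 1),
]


def aggregate(records, key):
    """Sum units sold over every record that maps to the same key."""
    totals = defaultdict(int)
    for record in records:
        totals[key(record)] += record[3]
    return dict(totals)


def iso_week(day):
    """Return (ISO year, ISO week number) for a date."""
    iso = day.isocalendar()
    return (iso[0], iso[1])


# The same granular data, rolled up to three coarser levels.
sku_week = aggregate(daily_sales, lambda r: (r[0], iso_week(r[2])))
category_week = aggregate(daily_sales, lambda r: (r[1], iso_week(r[2])))
category_month = aggregate(daily_sales, lambda r: (r[1], (r[2].year, r[2].month)))

for level_key, units in sorted(category_week.items()):
    print(level_key, units)
```

Every level here is derived from the same daily records, so the totals always agree across levels. Note that the weekly buckets are equispaced while the monthly ones are not, since months differ in length.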

What becomes evident is that forecasts should always be as granular as the decisions taken in the supply chain. Because data becomes steadier as the level of aggregation rises, it is tempting to aggregate further in order to make forecasts look nicer and easier to work with. However, this perceived increase in accuracy is only an accuracy in percentages, not dollars, and does not represent how the supply chain is actually performing.
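The gap between percentages and dollars can be illustrated with a small sketch. All numbers are invented, and the dollar measure used here (absolute unit error multiplied by an assumed per-unit cost of error) is only one simple way to put a price on a forecast error, not a metric stated in the episode.

```python
# Hypothetical forecasts and actual sales for one week, in units, per SKU.
forecast = {"sku-a": 20, "sku-b": 20}
actual = {"sku-a": 10, "sku-b": 30}

# Assumed cost in dollars of each unit of error, whether over- or under-forecast.
cost_per_unit_error = {"sku-a": 5.0, "sku-b": 8.0}


def percent_error(forecast_units, actual_units):
    """Absolute error as a percentage of actual demand."""
    return 100.0 * abs(forecast_units - actual_units) / actual_units


# Percentage error at the SKU level, where stocking decisions are made.
sku_percent = {sku: percent_error(forecast[sku], actual[sku]) for sku in forecast}

# Percentage error after summing both SKUs into one category-level series.
category_percent = percent_error(sum(forecast.values()), sum(actual.values()))

# Dollar error, computed at the SKU level.
dollar_error = sum(
    abs(forecast[sku] - actual[sku]) * cost_per_unit_error[sku] for sku in forecast
)

print("SKU-level % error:", sku_percent)
print("Category-level % error:", category_percent)
print("Dollar error at SKU level:", dollar_error)
```

In this toy case the over-forecast on one SKU cancels the under-forecast on the other, so the category-level percentage error is zero even though every SKU-level decision was off and the assumed dollar cost is unchanged.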