3.5 MOVING AVERAGE
3.5.3 Setting the parameter
To use the moving average method, we must set the parameter k, that is, the number of demand observations we want to use to generate the forecast. To select this parameter, we face a tradeoff between:
• The ability of the model to filter noise, that is, to avoid overreactions to demand observations that are significantly above or below the average;
• The ability of the model to promptly react to changes in demand, such as a sudden increase or decrease in expected demand.
If a large value of k is chosen, the moving average method shows a strong inertia. On the one hand, a single observation significantly above (or below) the average has little consequence. On the other hand, it takes time for the model to adapt to any significant change in average demand. So, in this case the moving average filters noise very effectively, but it adapts to changes in demand slowly.
On the contrary, if a small value of k is chosen, a single demand observation has a great deal of influence on the future forecast (to an extreme, if k = 1 the forecast just equals the last demand observation). Thus a small k makes the moving average very reactive but at the same time very sensitive to noise. In other words, demand observations significantly above or below the average lead to bumps in the demand forecast, which turn into a larger forecasting error.14
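As a sketch of the method itself, the moving average forecast takes only a few lines of code (the function name and the toy demand history below are ours, for illustration only):

```python
def moving_average_forecast(y, k):
    """Forecast the next period as the mean of the last k observations."""
    if len(y) < k:
        raise ValueError("need at least k observations")
    return sum(y[-k:]) / k

history = [100, 102, 98, 150, 101]  # period 4 holds an outlier

# k = 1 simply repeats the last observation, so it ignores the outlier here
# but would overreact had the outlier come last; a larger k damps it.
print(moving_average_forecast(history, 1))  # 101.0
print(moving_average_forecast(history, 5))  # 110.2
```

With k = 5 the single outlier still pulls the forecast up, but only by one fifth of its deviation from the mean, which is the filtering effect discussed above.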
Figures 3.4-3.7 show the behavior of the moving average with various values of k. The examples consider the moving average with k = 2 and k = 6, and a forecasting horizon of one period (h = 1).
When we analyze the performance of the moving average with a statistically stationary demand (figures 3.4 and 3.5), we can see that:
• The moving average with time window 6 (k = 6) requires a longer initialization;
• The moving average with k = 6 is more stable than with k = 2. This leads to more accurate forecasts if the expected demand is stable (and the random part of the demand is not auto-correlated, that is, the variables εt in equation (3.9) are independent; see definition A.11). In these cases, stable forecasts are more effective simply because fluctuations in forecasts add to the fluctuations in demand and tend to increase the gap between the two variables, that is, the forecast error.15
In the case of the demand patterns displayed in figures 3.4 and 3.5, k = 6 guarantees more accuracy than k = 2 (RMSE is 7.67 and 10.11, respectively, while MAD is 6.96 and 8.26, respectively).16
Figures 3.6 and 3.7 show how the moving average reacts to an odd demand observation that significantly differs from the mean. The figures show that the reaction to the anomaly is definitely larger in the case of k = 2 than in the case of k = 6. However, the effect of the odd observation lasts longer in the case of k = 6. Indeed, in the case of k = 2 the anomaly in period 15 quickly exits the sample we consider to generate the new forecast. This means that if k = 2 the effects of the outlier are not larger but simply more concentrated over a shorter period of time. While the differences in MAD are negligible (MAD is 38.7 and 37.2 for k = 6 and k = 2, respectively), the differences in RMSE are sizable (RMSE is 82.6 and 71.8, respectively), since RMSE penalizes larger errors (see section 3.3).
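The different verdicts of MAD and RMSE can be seen on purely hypothetical error series: two series with the same total absolute error, one spread out and one concentrated in a single large error, share the same MAD but not the same RMSE.

```python
from math import sqrt

def mad(errors):
    """Mean absolute deviation of a list of forecast errors."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean square error of a list of forecast errors."""
    return sqrt(sum(e * e for e in errors) / len(errors))

spread = [10, -10, 10, -10]   # many small errors
concentrated = [40, 0, 0, 0]  # one large error, same total magnitude

print(mad(spread), mad(concentrated))    # 10.0 10.0  (identical)
print(rmse(spread), rmse(concentrated))  # 10.0 20.0  (RMSE penalizes the spike)
```

This is why, in the pulse example above, k = 2 and k = 6 have similar MADs but clearly different RMSEs: k = 2 concentrates the error in fewer, larger deviations.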
14We basically add the fluctuations of demand to the fluctuations of the forecast in a scenario where expected demand is stable.
15If the process is truly stationary, there is no reason whatsoever to consider only the last k demand observations. If demand is truly stationary, we should simply take the average of all demand observations we have. However, in real-life contexts, this situation is hardly the rule. So we only consider the last k demand observations, as we believe them to be the only relevant ones to estimate future demand. Adding an extra observation from the past adds more information on the one hand, and thus should increase accuracy; but on the other hand it reduces the quality of our inputs, as the older the data, the less significant they are to predict future demand.
16Notice that we only use periods 7 to 30 to measure accuracy, so that the performance of both alternatives is measured on the same sample.
Fig. 3.4 Behavior of moving average: case of k = 2, stationary demand.
Fig. 3.5 Behavior of moving average: case of k = 6, stationary demand.
Fig. 3.6 Behavior of moving average: k = 2; demand featuring a pulse.
Fig. 3.7 Behavior of moving average: k = 6; demand featuring a pulse.
Also, figures 3.6 and 3.7 show that when one uses time-series models, outliers cause forecasting errors both when they occur (as they are unpredictable for time-series models) and in successive periods, as they bias forecasts.
The previous examples show that, for "large" values of k, the moving average "filters noise" very well, that is, it effectively tells the average behavior of demand from random short-term fluctuations. The example of figures 3.8 and 3.9 shows that large values of k entail a poor reactivity of the model, that is, they limit the ability to adapt to changes in expected demand. In the case of k = 2 the moving average completely "forgets" the previous behavior of demand, while in the case of the moving average with step 6 (k = 6) the transient state is much longer and thus the accuracy is worse (MAD is 29.0 and 56.2, while RMSE is 76.7 and 107.1, respectively).

Fig. 3.8 Behavior of moving average: k = 2; demand is a step function.

Fig. 3.9 Behavior of moving average: k = 6; demand is a step function.
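The length of the transient after a step in demand can be reproduced with a short simulation; this is a sketch under the assumption of a single upward step (the function name and the demand values are ours):

```python
def moving_average_path(y, k):
    """One-step-ahead moving-average forecasts for periods k+1 .. len(y)+1."""
    return [sum(y[t - k:t]) / k for t in range(k, len(y) + 1)]

demand = [100] * 10 + [300] * 8  # demand steps up in period 11

f2 = moving_average_path(demand, 2)
f6 = moving_average_path(demand, 6)

# With k = 2, the forecast fully reflects the new level two periods after
# the step; with k = 6, the transient lasts six periods, because the
# pre-step observations linger in the averaging window that much longer.
```

In general the transient lasts exactly k periods: the moving average reaches the new level only once every observation in its window postdates the step.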
3.5.4 Drawbacks and limitations
The moving average is a rather simple forecasting method that is widely used.
However, it has drawbacks and limitations. This method gives an equal weight 1/k to the last k demand observations, while it totally neglects previous ones.
Table 3.11 Demand data for vanilla ice cream
period 1 2 3 4 5 6 7 8
yt 116.36 96.30 109.64 99.92 110.31 99.88 89.07 107.38
period 9 10 11 12 13 14 15 16
yt 121.21 100.99 89.63 88.43 83.83 95.87 102.17 103.43
period 17 18 19 20 21 22 23 24
yt 104.55 88.19 98.53 103.58 87.95 110.83 103.87 115.57
One could think that it might be more reasonable:
• To give more recent observations a greater weight than more remote ones; for example, one might want to give more weight to observation t than to observation t - 1;
• To give even more remote demand observations a nonzero weight.
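These two ideas can be sketched as a weighted moving average; the decreasing weights below are a hypothetical illustration of the bullet points, not a scheme developed in this chapter:

```python
def weighted_moving_average(y, weights):
    """Forecast as a weighted mean of the most recent observations.

    weights[0] applies to the latest observation; the weights are assumed
    to sum to 1. Illustrative only: the weight values are arbitrary.
    """
    recent = y[::-1][:len(weights)]  # latest observation first
    return sum(w * obs for w, obs in zip(weights, recent))

history = [100, 104, 96, 108]

# Decreasing weights: the latest observation counts most, but older ones
# still receive a nonzero weight.
print(weighted_moving_average(history, [0.4, 0.3, 0.2, 0.1]))  # ≈ 102.8
```

Simple exponential smoothing, introduced in section 3.6, formalizes exactly this intuition with geometrically decreasing weights.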
Example 3.14 Let us consider a store that sells ice cream on a beach. The demand for vanilla ice cream over the last 24 days is shown in table 3.11. Demand is rather stationary, with some minor variations.
The lead time is two days and deliveries are daily. This means that the time bucket is the single day and the forecasting horizon is two days (h = 2). The manager of the store is trying to predict future demand with the moving average algorithm. He wonders whether he shall use the moving average with k = 2 or k = 5.
To choose between the two options, we can measure which one would have performed better in the past, assuming that the option that would have worked better in the past is going to be the better performer in the future as well. The moving average with step 5 (k = 5) can generate the first forecast only in period 5. Our horizon consists of two periods; hence, in order to get a fair comparison, we are going to compare the accuracy of the two parameters in periods 7 to 24.
Let us take you through the forecast generated in period 5 for period 7, i.e., F5,7 = F7:
• If k = 2, the forecast generated in period 5 is the average of demand in period 4 and in period 5. So, F5,7 = F7 = (99.92 + 110.31)/2 = 105.12. Given the demand in period 7, Y7 = 89.07, the error is e7 = 89.07 - 105.12 = -16.05.
• If k = 5, the forecast generated in period 5 is the average of the demand in the first 5 periods. So F5,7 = F7 = (116.36 + 96.30 + 109.64 + 99.92 + 110.31)/5 = 106.51. So the error in period 7 is e7 = 89.07 - 106.51 = -17.44.
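The two computations above are easy to check in code; the variable names are ours, and the demand values come from table 3.11:

```python
y5 = [116.36, 96.30, 109.64, 99.92, 110.31]  # demand, periods 1-5 (table 3.11)
y7 = 89.07                                   # demand in period 7

f7_k2 = sum(y5[-2:]) / 2  # forecast made in period 5 for period 7, k = 2
f7_k5 = sum(y5) / 5       # same forecast with k = 5

# These match the text's 105.12 / -16.05 and 106.51 / -17.44 up to rounding.
print(f7_k2, y7 - f7_k2)
print(f7_k5, y7 - f7_k5)
```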
Table 3.12 Forecast with step 2 (k = 2)
period 7 8 9 10 11 12 13 14 15
Ft 105.12 105.10 94.48 98.23 114.30 111.10 95.31 89.03 86.13
period 16 17 18 19 20 21 22 23 24
Ft 89.85 99.02 102.80 103.99 96.37 93.36 101.06 95.77 99.39
Table 3.13 Error with step 2 (k = 2)
period 7 8 9 10 11 12 13 14 15
et -16.045 2.285 26.735 2.765 -24.665 -22.67 -11.48 6.84 16.04
period 16 17 18 19 20 21 22 23 24
et 13.58 5.53 -14.61 -5.46 7.21 -5.41 9.775 8.105 16.18
Table 3.14 Forecast with step 5 (k = 5)
period 7 8 9 10 11 12 13 14 15
Ft 106.51 103.21 101.76 101.31 105.57 103.71 101.66 101.53 96.82
period 16 17 18 19 20 21 22 23 24
Ft 91.75 91.99 94.75 97.97 98.84 99.37 99.66 96.56 97.82
Table 3.15 Error with step 5 (k = 5)
period 7 8 9 10 11 12 13 14 15
et -17.44 4.17 19.45 -0.32 -15.94 -15.28 -17.83 -5.66 5.35
period 16 17 18 19 20 21 22 23 24
et 11.68 12.56 -6.56 0.56 4.74 -11.42 11.17 7.31 17.75
We can repeat this process for t = 8, ..., 24 and obtain tables 3.12 and 3.13, which show the forecasts and errors, respectively, in the case of k = 2, and tables 3.14 and 3.15, which show the forecasts and errors, respectively, in the case of k = 5.
Finally, with the error data we can compute accuracy metrics. For example, the RMSE is 13.95 for k = 2 and 11.90 for k = 5. Thus we draw the conclusion that we would rather select k = 5.
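The whole backtest can be reproduced from table 3.11 alone; the function below (ours, for illustration) generates the forecasts of tables 3.12 and 3.14 and summarizes their errors with the RMSE:

```python
from math import sqrt

# Daily demand for vanilla ice cream, periods 1-24 (table 3.11)
y = [116.36, 96.30, 109.64, 99.92, 110.31, 99.88, 89.07, 107.38,
     121.21, 100.99, 89.63, 88.43, 83.83, 95.87, 102.17, 103.43,
     104.55, 88.19, 98.53, 103.58, 87.95, 110.83, 103.87, 115.57]

def backtest_rmse(y, k, h=2, start=7):
    """RMSE of the k-period moving average over target periods start..len(y).

    The forecast for period t is made in period t - h, averaging the k
    observations up to and including period t - h (periods are 1-based).
    """
    errors = []
    for t in range(start, len(y) + 1):
        window = y[t - h - k:t - h]       # last k observations known at t - h
        forecast = sum(window) / k
        errors.append(y[t - 1] - forecast)
    return sqrt(sum(e * e for e in errors) / len(errors))

# The text reports RMSE 13.95 for k = 2 and 11.90 for k = 5.
print(backtest_rmse(y, 2))
print(backtest_rmse(y, 5))
```

The smaller RMSE for k = 5 confirms the conclusion above: for this rather stationary demand, the longer window filters noise better than it hurts reactivity.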
□
3.6 SIMPLE EXPONENTIAL SMOOTHING