
3.3.2 Mean Absolute Deviation


A first metric for accuracy is MAD (Mean Absolute Deviation), which basically uses the absolute error to make sure negative and positive errors add up:

$$\mathrm{MAD} = \frac{1}{n}\sum_{t=1}^{n} |e_t| \qquad (3.2)$$

where $e_t = Y_t - F_t$ is the forecast error in period $t$.


Table 3.3  Mean Error as a metric for bias

Period       1    2    3    4    5    6    ME
Demand       90   110  110  90   110  90
Forecast 1   100  100  100  100  100  100  0
Forecast 2   90   110  110  90   110  90   0

Table 3.4  Comparison between ME and MAD

Period       1   2   3   4   5   6   ME  MAD
Demand       7   13  9   12  8   11
Forecast 1   10  10  10  10  10  10  0   2
Forecast 2   6   12  8   11  7   10  1   1

The example in table 3.4 shows the difference between ME and MAD. The first forecast is not biased, as the mean demand equals the mean forecast. On the contrary, the second series of forecasts is biased, as it is always conservative: The forecast is always one unit below the demand. ME actually tells us that the first series of forecasts is unbiased while the second one under-forecasts.

However, the second forecast captures and follows demand fluctuations more accurately than the first one. Thus, in each single time bucket the second forecast tends to be closer to demand than the first one. MAD captures this difference, as it tells us that the second forecast is more accurate than the first one.
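As a concrete illustration, here is a minimal Python sketch of the two metrics, assuming the error is defined as demand minus forecast (the sign convention the tables imply); it reproduces the values of table 3.4:

```python
# ME and MAD for the two forecasts of table 3.4.
demand     = [7, 13, 9, 12, 8, 11]
forecast_1 = [10, 10, 10, 10, 10, 10]
forecast_2 = [6, 12, 8, 11, 7, 10]

def mean_error(demand, forecast):
    """ME: average signed error (demand minus forecast), a measure of bias."""
    return sum(d - f for d, f in zip(demand, forecast)) / len(demand)

def mean_absolute_deviation(demand, forecast):
    """MAD: average absolute error, a measure of accuracy."""
    return sum(abs(d - f) for d, f in zip(demand, forecast)) / len(demand)

print(mean_error(demand, forecast_1), mean_absolute_deviation(demand, forecast_1))  # 0.0 2.0
print(mean_error(demand, forecast_2), mean_absolute_deviation(demand, forecast_2))  # 1.0 1.0
```

Forecast 1 is unbiased but inaccurate (ME = 0, MAD = 2); forecast 2 is biased but accurate (ME = 1, MAD = 1).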

Finally, which forecast is the best option? Should we care more about accuracy or bias?

Actually, we cannot tell whether one forecast is better than the other. One is better for bias, the other for accuracy. In some contexts bias might matter more than accuracy and vice versa. However, we may see that correcting for bias is relatively easier than correcting for inaccuracy. If a forecasting process is consistently conservative, but it follows demand fluctuations very closely (see the example in table 3.4), we can improve the forecast by adding the average bias to the forecast. For example, if a forecaster is conservative and consistently underestimates demand by 10 units, when he/she generates a demand forecast of 110 units for next period, we might expect demand to be around 120 units (110 units + 10 units). In the example above, if the second forecaster predicts a demand of 12 units for the next period, we might add one extra unit to it since in the past we have noticed that he/she tends to under-forecast by one unit. Thus we might expect demand for 13 units. Such an adjustment to the forecast improves both bias and accuracy, thus reducing both the ME and the MAD. On the contrary, there is no obvious solution to inaccuracy. Say you want to improve the quality of the first forecast in table 3.4. What would you do? Actually there is no easy fix with regard to inaccuracy.
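A sketch of such a bias correction, under the same sign convention (a positive past ME means past under-forecasting, so it is added back to the new forecast):

```python
def debias(new_forecast, past_demand, past_forecast):
    """Adjust a new forecast by the average past bias (the ME)."""
    n = len(past_demand)
    past_me = sum(d - f for d, f in zip(past_demand, past_forecast)) / n
    return new_forecast + past_me

# The second forecaster of table 3.4 under-forecasts by 1 unit on average,
# so his/her new forecast of 12 units is adjusted to 13 units.
print(debias(12, [7, 13, 9, 12, 8, 11], [6, 12, 8, 11, 7, 10]))  # 13.0
```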

Concept 3.3 A good forecast is both accurate and unbiased. Both are very relevant performance metrics, but while there is a fairly easy fix for a consistently biased but accurate forecast, there is no such easy fix for an unbiased and inaccurate one.

3.3.3 Root Mean Square Error

A second metric for accuracy is Root Mean Square Error (RMSE). This metric squares the errors so that positive and negative ones do not cancel out when summed:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n} e_t^2} \qquad (3.3)$$

RMSE is a very commonly used metric, as in statistics squared errors are often used instead of absolute ones (they result in a differentiable function, whereas the absolute value function has a kink at zero). Moreover, a quadratic error provides estimates that are more directly linked to the variance and standard deviation (see appendix A) of the demand distribution. Often we use the forecast that an algorithm generates as an estimate for the expected level of demand, while we use RMSE as an estimate of the standard deviation.
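For instance, this use of RMSE can be sketched as follows; the point forecast and the RMSE value are hypothetical numbers, and the normal demand model is an assumption for illustration, not something the text prescribes here:

```python
forecast = 100.0  # point forecast, taken as an estimate of expected demand (hypothetical)
rmse     = 12.0   # RMSE of past errors, taken as an estimate of the demand std dev (hypothetical)

# Assuming normally distributed demand, an approximate 95% prediction interval:
low, high = forecast - 1.96 * rmse, forecast + 1.96 * rmse
print(f"expected demand ~ {forecast}, 95% interval ~ [{low:.1f}, {high:.1f}]")
```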

Table 3.5 shows the differences among ME, MAD, and RMSE. Forecast 2 differs from Forecast 3, as its errors are more frequent but they tend to be smaller. This is why RMSE considers Forecast 2 to be more accurate than Forecast 3. This finding can be generalized by saying that RMSE is a quadratic metric for error and thus it tends to overweight large errors. So RMSE "prefers" forecasting algorithms that generate constant errors, rather than algorithms that are very accurate in some periods but can generate significant errors in others. MAD is a linear metric for error and thus gives the same weight to all errors, small or large.
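A short sketch contrasting MAD and RMSE on the data of table 3.5; squaring makes the two errors of 3 units in Forecast 3 weigh more than the six errors of 1 unit in Forecast 2:

```python
import math

demand     = [7, 13, 9, 12, 8, 11]
forecast_2 = [6, 12, 8, 11, 7, 10]  # six small errors of 1 unit each
forecast_3 = [7, 10, 9, 9, 8, 11]   # two large errors of 3 units each

def mad(demand, forecast):
    return sum(abs(d - f) for d, f in zip(demand, forecast)) / len(demand)

def rmse(demand, forecast):
    return math.sqrt(sum((d - f) ** 2 for d, f in zip(demand, forecast)) / len(demand))

print(mad(demand, forecast_2), round(rmse(demand, forecast_2), 2))  # 1.0 1.0
print(mad(demand, forecast_3), round(rmse(demand, forecast_3), 2))  # 1.0 1.73
```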

ME, RMSE, and MAD measure the forecast error using the same units of measurement as demand. For example, if demand is measured in units or kg, then ME, RMSE, and MAD are measured in units or kg as well. This can be a drawback: When reading the performance of any forecast, we should carefully consider the scale that is adopted. If one decides to use kg rather than hg to measure demand for cheese, ME, MAD, and RMSE drop by a factor of 10.

Moreover, these metrics make the comparison of performances across products very hard. As table 3.6 shows, the metrics presented so far might lead us to believe that the forecast for item A is more accurate than the forecast for item B.


Table 3.5  Comparison between accuracy metrics: MAD and RMSE

Period       1   2   3   4   5   6   BIAS  MAD  RMSE
Demand       7   13  9   12  8   11
Forecast 2   6   12  8   11  7   10  1     1    1
Forecast 3   7   10  9   9   8   11  1     1    1.73
Error 2      +1  +1  +1  +1  +1  +1  1     1    1
Error 3      0   +3  0   +3  0   0   1     1    1.73

Table 3.6  Comparison between accuracy metrics: MAD and RMSE

Period       1    2    3    4    5    6    ME  MAD  RMSE
Demand A     7    13   9    12   8    11
Forecast A   8    12   10   11   7    12   0   1    1
Error A      -1   +1   -1   +1   +1   -1   0   1    1
Demand B     70   130  90   120  80   110
Forecast B   75   125  95   115  75   115  0   5    5
Error B      -5   +5   -5   +5   +5   -5   0   5    5

However, an error of one unit out of an average demand of 10 units is "worse" than an error of 5 units out of a demand of 100 units. Thus, one often wants to look at percentage error metrics.

3.3.4 Mean Percentage Error and Mean Absolute Percentage Error

The drawbacks of metrics such as ME, MAD, and RMSE lead us to introduce percentage errors that basically try to compare the forecasting error with demand. The most classic metrics in this vein are Mean Percentage Error (MPE) and Mean Absolute Percentage Error (MAPE), which measure percentage bias and percentage accuracy, respectively. Notice that, as the following equations show, these metrics compare the error in period t with the demand in the same period:

$$\mathrm{MPE} = \frac{1}{n}\sum_{t=1}^{n} \frac{e_t}{Y_t}, \qquad \mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n} \frac{|e_t|}{Y_t}$$

These metrics are pure numbers and thus do not depend on the scale one uses to measure demand. Hence, one can easily compare the accuracy and bias across various products or markets.7
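A sketch of both metrics in Python, applied to item A of table 3.6; the results match table 3.8 below:

```python
def mpe(demand, forecast):
    """MPE: signed error divided by same-period demand, then averaged."""
    return sum((d - f) / d for d, f in zip(demand, forecast)) / len(demand)

def mape(demand, forecast):
    """MAPE: absolute error divided by same-period demand, then averaged."""
    return sum(abs(d - f) / d for d, f in zip(demand, forecast)) / len(demand)

demand_a   = [7, 13, 9, 12, 8, 11]
forecast_a = [8, 12, 10, 11, 7, 12]
print(f"MPE = {mpe(demand_a, forecast_a):+.1%}, MAPE = {mape(demand_a, forecast_a):.1%}")
# MPE = -1.0%, MAPE = 10.5%
```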

Example 3.9 Some European Fortune 500 companies have adopted different percentage error metrics. They basically divide the error by the forecast rather than by the demand; hence, they use the metrics below, which are modified versions of MPE and MAPE:

1 "

n Ft M P E M = -

5 ,

t=l

This might be a tempting solution but is actually an awful one. Indeed, this definition of percentage error provides the forecasters (whose reward may depend on these metrics) with two means to improve their performance:

• First, they can reduce the numerator, that is, reduce the forecasting error.

• Second, they can increase the denominator, that is, increase the forecast.

This gives the forecasters an incentive to overstate their forecast. Not surprisingly, the companies noticed that the predicted demand was on average above the actual one.

These metrics are particularly dangerous in the case of low or highly variable demand. Let us consider the case of a demand that in 1/3 of the cases is zero, in 1/3 of the cases is one, and in 1/3 of the cases is two. Let us assume that the forecaster is judged and rewarded on the basis of MAPE_M. Also, let us assume that he/she has no specific idea about what is going to happen in the next period, so he/she basically faces the long term demand distribution. He/she has two options. The more reasonable one is to forecast one unit for all future periods. In this case, in 2/3 of the cases the absolute error is 1 and in 1/3 of the cases it is zero. Given the forecast of one, the MAPE_M is going to be 0.66. The other, apparently less reasonable, option is to forecast two units for all future periods. In 1/3 of the cases, demand is going to be zero and the error is going to be 2. In 1/3 of the cases demand is going to be one and the error is going to be one, and finally in 1/3 of the cases the forecast is going to be correct. This means that the MAPE_M is just 0.5 = (33.33% · 2 + 33.33% · 1 + 33.33% · 0)/2.
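The incentive problem is easy to verify numerically; a sketch, assuming the three demand outcomes are equally likely:

```python
def mape_m(demands, forecast):
    """MAPE_M: absolute error divided by the forecast (the flawed variant)."""
    return sum(abs(d - forecast) / forecast for d in demands) / len(demands)

outcomes = [0, 1, 2]  # each demand level occurs in 1/3 of the periods

print(mape_m(outcomes, 1))  # 0.666...: the honest forecast of one unit
print(mape_m(outcomes, 2))  # 0.5: the inflated forecast of two units scores better
```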

7 Note that, in general, we expect products/markets with higher demand to have less variability. Thus, in general, we also expect that the higher the demand, the lower the percentage error, as the forecasting problem is simpler.


Table 3.7  Percentage error metrics: MPE and MAPE

Period       1       2      3       4      5       6
Demand A     7       13     9       12     8       11
Forecast A   8       12     10      11     7       12
Error A      -14.3%  +7.7%  -11.1%  +8.3%  +12.5%  -9.1%
Demand B     70      130    90      120    80      110
Forecast B   75      125    95      115    75      115
Error B      -7.1%   +3.8%  -5.6%   +4.2%  +6.3%   -4.5%

Table 3.8  Comparison between absolute and percentage error metrics

             ME  MAD  MPE    MAPE
Forecast A   0   1    -1%    10.5%
Forecast B   0   5    -0.5%  5.3%

As this example clearly shows, these metrics, which are apparently very similar to MPE and MAPE and are commonly used, provide very odd incentives to overstate the forecast.

□

We can reconsider the data in table 3.6 and calculate the percentage errors displayed in table 3.8. The data show that the forecast for demand B is actually more accurate than the forecast for demand A.

The use of MPE and MAPE as performance evaluation measures is suggested in the literature (see, e.g., [13]), but these metrics have several drawbacks and weaknesses:

• They cannot be adopted when demand during a time bucket can be zero. Indeed, when demand is zero we cannot compute the percentage error. In real applications, such a case is relatively frequent. For example, in the case of retail chains, replenishments are so quick and frequent that one needs to forecast demand down to the single day or single week. Also, assortments tend to be very wide and thus many products have relatively low demand rates. These trends make the likelihood of a zero demand for a single product, in a single store, in a given day quite sizeable. Understandably, the extent of this problem depends on the definition of the demand one wants to forecast: The longer the time bucket, the larger the market (nation vs. single store), and the broader the set of product variants (single SKU or product family), the higher the expected demand and thus the lower the probability of a zero demand.

Table 3.9  Percentage error metrics in case of variable demand

Period       1    2    3    4    5    6    7    8    9    10
Demand       10   10   10   10   1    10   10   10   10   10
Forecast 1   10   10   10   10   10   10   10   10   10   10
Error 1      0    0    0    0    -9   0    0    0    0    0
Forecast 2   12   12   12   12   1    12   12   12   12   12
Error 2      -2   -2   -2   -2   0    -2   -2   -2   -2   -2

             ME    MAD  MPE   MAPE
Forecast 1   -0.9  0.9  -90%  90%
Forecast 2   -1.8  1.8  -18%  18%


• Even in cases of nonzero demand, these indexes can give really odd results when demand shows wide variations. Indeed, as the example in table 3.9 shows, MPE and MAPE tend to overweight errors in low demand periods. In the example, the error of the first forecasting method in period five is so large (in percentage terms) that it more than counterbalances the greater accuracy that this method achieves in all other periods.

Thus, these metrics cannot possibly be computed when demand is zero, and when demand varies substantially they might provide misleading insights. For example, in table 3.9 the first forecast seems to be more accurate and less biased than the second one, while MPE and MAPE suggest just the opposite. Thus these metrics might lead us to erroneous conclusions.
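A sketch reproducing the reversal on the data of table 3.9: MAD ranks forecast 1 first, while MAPE ranks forecast 2 first:

```python
demand     = [10, 10, 10, 10, 1, 10, 10, 10, 10, 10]
forecast_1 = [10] * 10
forecast_2 = [12, 12, 12, 12, 1, 12, 12, 12, 12, 12]

for name, forecast in [("Forecast 1", forecast_1), ("Forecast 2", forecast_2)]:
    errors = [d - f for d, f in zip(demand, forecast)]
    mad = sum(abs(e) for e in errors) / len(errors)
    mape = sum(abs(e) / d for e, d in zip(errors, demand)) / len(errors)
    print(f"{name}: MAD = {mad:.1f}, MAPE = {mape:.0%}")

# Forecast 1: MAD = 0.9, MAPE = 90%
# Forecast 2: MAD = 1.8, MAPE = 18%
```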

Indeed, in most circumstances the cost due to a forecast error of 2 units in a low demand period is quite similar to the cost of a 2 units error in a high demand one. Finally, these metrics actually build strange incentive schemes for the forecasters. If a forecaster is to allocate his/her efforts among different products or over time, he/she might end up focusing on items in periods of low demand, since a unit of error is more heavily penalized by the error metric.8

8 In this case, we clearly overlook the fact that the effort required to cut the error by one unit might be different for different products/periods.

3.3.5 ME%, MAD%, RMSE%

The problems discussed in the previous section lead us to design new performance metrics that


• consider errors in low and high demand periods equally damaging, and

• allow us to compare the performance across products and markets with different mean demand.

Such metrics are ME%, MAD%, and RMSE%. These performance measures compare the ME, MAD, and RMSE to the mean demand for the product/market combination:

$$\mathrm{ME}\% = \frac{\mathrm{ME}}{\bar{Y}}, \qquad \mathrm{MAD}\% = \frac{\mathrm{MAD}}{\bar{Y}}, \qquad \mathrm{RMSE}\% = \frac{\mathrm{RMSE}}{\bar{Y}},$$

where

$$\bar{Y} = \frac{1}{n}\sum_{t=1}^{n} Y_t.$$

These metrics still retain the good features of MPE and MAPE. Indeed, if we apply them to the data in table 3.6, they suggest that forecast B is more accurate than forecast A: MAD% and RMSE% are 5% (5/100) for B, while they are 10% (1/10) for A; ME% is zero in both cases.

Moreover, they avoid some of the drawbacks of MPE and MAPE, as they can properly judge the quality of the forecasts in table 3.9: MAD% for forecast 1 is 9.9% (0.9/9.1), while it is 19.8% (1.8/9.1) in the case of forecast 2.
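A sketch of MAD% on the table 3.9 data, scaling the MAD by the sample mean demand:

```python
def mad_pct(demand, forecast):
    """MAD%: MAD divided by the mean demand of the sample."""
    n = len(demand)
    mad = sum(abs(d - f) for d, f in zip(demand, forecast)) / n
    return mad / (sum(demand) / n)

demand = [10, 10, 10, 10, 1, 10, 10, 10, 10, 10]  # mean demand = 9.1
print(f"{mad_pct(demand, [10] * 10):.1%}")                                 # 9.9%  (forecast 1)
print(f"{mad_pct(demand, [12, 12, 12, 12, 1, 12, 12, 12, 12, 12]):.1%}")   # 19.8% (forecast 2)
```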

These metrics can measure the quality of a forecast and compare it with the average demand.9 However, predicting an extremely variable demand can be more complex than predicting a very stable one. In other words, a given forecasting error might be very good in the case of an extremely variable demand, whereas it might be very poor in the case of a flat one. Thus we might not want to look at the forecasting error per se, but we might want to put it in the right perspective and analyze the complexity of the forecasting task.

9 Note that in this case the denominator depends on the sample we choose. Thus, if we consider the accuracy of the forecast for May 2006 and look at the demand over the first five months of 2006 or over the last 12 months, we are going to get two different figures. Therefore, to make sure metrics for accuracy and bias do not change over time, we shall define sampling policies. For example, a company that generates forecasts at the day level might want to record accuracy and bias at the month level to properly define the sample and thus the average demand in each sample.

Table 3.10  The impact of demand variability on forecasting performance

Period       1   2   3   4   5   6   ME%  MAD%   RMSE%
Demand A     10  9   10  11  10  10
Forecast A   9   10  11  10  9   11  0
Error A      +1  -1  -1  +1  +1  -1  0    10%    10%
Demand B     15  8   5   12  13  7
Forecast B   14  9   7   10  12  8   0
Error B      +1  -1  -2  +2  +1  -1  0    13.3%  14.1%
