CHAPTER 2 Statistics
2.6 Comparing multiple outcomes simultaneously: analysis of variance
DRAFT - Version 3 - Correlation
(EQ 39)    r^2 = \frac{SST - SSE}{SST}

where SST is the total sum of squared deviations of Y from its mean and SSE is the sum of squared errors of the regression.
That is, r² is the degree to which a regression is able to reduce the sum of squared errors, which we interpret as the degree to which the independent variable explains variations in the dependent variable. When there is a perfect linear dependence between Y and X, the correlation coefficient is 1 in absolute value, and the regression line is perfectly aligned with the data, so that it has zero error.
In computing a correlation coefficient, it is important to remember that it captures only linear dependence. A coefficient of zero does not mean that the variables are independent: they could well be non-linearly dependent. For example, if Y² = 1 - X², then for every value of X there are two equal and opposite values of Y, so the best-fit regression line is the X axis, which leads to a correlation coefficient of 0. But, of course, Y is not independent of X! It is therefore important to be cautious in drawing conclusions about independence from the correlation coefficient. For such conclusions, it is better to use the chi-square goodness-of-fit test described earlier.
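This pitfall is easy to demonstrate numerically. The sketch below (plain Python, with a hand-rolled correlation function rather than any particular statistics library) computes the sample correlation coefficient for points satisfying Y² = 1 - X², pairing each X value with both the positive and the negative root:

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Points on the unit circle: Y is fully determined by X, yet r = 0,
# because the two roots cancel in the covariance term.
xs, ys = [], []
for i in range(1, 100):
    x = -1 + 2 * i / 100            # x in (-1, 1)
    y = math.sqrt(1 - x * x)
    xs += [x, x]
    ys += [y, -y]                   # both roots of Y^2 = 1 - X^2

print(pearson_r(xs, ys))            # essentially zero
```

Despite the perfect functional dependence, the computed coefficient is zero to within floating-point noise.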
Like any statistic, the correlation coefficient r can have an error due to random fluctuations in a sample. It can be shown that if X and Y are jointly normally distributed with population correlation ρ, then the variable z = (1/2) ln((1+r)/(1-r)) is approximately normally distributed, with a mean of (1/2) ln((1+ρ)/(1-ρ)) and a variance of 1/(n-3). This can be used to find the confidence interval around r in which we can expect to find ρ.
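As a sketch of this procedure (the function name and the 95% critical value 1.96 are our choices; the transform itself is the standard one just described), a confidence interval for ρ can be computed by transforming r, adding the normal-based margin, and transforming back:

```python
import math

def correlation_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for the population correlation,
    using the transform z = (1/2) ln((1+r)/(1-r)), which has variance 1/(n-3)."""
    z = 0.5 * math.log((1 + r) / (1 - r))
    se = 1 / math.sqrt(n - 3)
    # invert the transform (its inverse is tanh) at the interval endpoints
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

lo, hi = correlation_ci(0.8, 30)    # e.g., r = 0.8 from a sample of 30 points
```

Because the endpoints are mapped back through tanh, the interval is not symmetric about r, correctly reflecting that a correlation coefficient cannot exceed 1 in absolute value.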
A specific form of correlation that is relevant in the analysis of time series is autocorrelation. Consider a series of values of a random variable indexed by discrete time, i.e., X_1, X_2, ..., X_n. Then the autocorrelation of this series at lag l is the correlation coefficient between the random variables X_i and X_{i-l}. If this coefficient is large (close to 1) for a certain value of l, we can infer that the series has variation on a time scale of l. This is often much easier to compute than a full-scale harmonic analysis by means of a Fourier transform.
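A minimal sketch of this computation (a hand-rolled estimator, not any particular library's definition of autocorrelation):

```python
def autocorrelation(xs, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i - lag] - mean) for i in range(lag, n))
    return cov / var

# A series that repeats every 4 steps shows a strong peak at lag 4
# and a strong trough at lag 2 (half a period out of phase).
series = [0.0, 1.0, 0.0, -1.0] * 25
print(autocorrelation(series, 4))   # close to 1
print(autocorrelation(series, 2))   # close to -1
```

The large coefficient at lag 4 reveals the period of the series without computing a Fourier transform.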
Finally, it is important to recognize that correlation is not the same as causality. A correlation coefficient close to 1 or -1 must not, by itself, be taken to imply causation. For example, it may be the case that packet losses on a wireless network are positively correlated with mean frame size. One cannot infer that larger frames are more likely to be dropped. It could be the case, for example, that the network is heavily loaded when it is subjected to video traffic, which uses large frames. The increase in the loss rate could be due to the load, rather than the frame size. Yet, the correlation between these two quantities would be strong.
To go from correlation to causation, it is necessary to identify the physical mechanism by which one quantity affects the other. Otherwise, the unwary researcher may be led to unsupportable and erroneous conclusions.
of making at least one Type I error can be greater than 5% (or 1%)! To see this, think of flipping a fair coin ten times and looking for ten heads: the probability of this outcome is about 1/1024, or roughly 0.1%. But if we were to repeat the ten-flip experiment 1024 times, chances are good that at least one repetition would produce ten heads. Arguing along similar lines, it is easy to see that as the number of comparisons increases, the overall probability of a Type I error increases. What is needed, therefore, is a way to perform a single test that avoids numerous pairwise comparisons. This is achieved by the technique of ‘Analysis of Variance’, or ANOVA.
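The growth of the family-wise error rate is easy to quantify under the idealized assumption (ours, for illustration; the pairwise tests in practice are rarely fully independent) that the comparisons are independent: if each test has false-positive probability α, the chance of at least one Type I error in k tests is 1 - (1 - α)^k:

```python
def familywise_error_rate(alpha, k):
    """P(at least one Type I error) across k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k

print(familywise_error_rate(0.05, 10))        # about 0.40 for ten 5%-level tests
print(familywise_error_rate(1 / 1024, 1024))  # the coin-flip analogy: about 0.63
```

Even ten comparisons at the 5% level push the overall error probability to roughly 40%, which is why a single combined test is needed.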
ANOVA is a complex topic with considerable depth. We will only discuss the simplest case of the ‘one-way layout’ with fixed effects. Multi-way testing and random effects are discussed in greater depth in advanced statistical texts.
2.6.1 One-way layout
In the analysis of a one-way layout, we group observations according to their corresponding treatment. For instance, we group repeated measurements of the packet loss rate by buffer size, say, all measurements taken with 5 buffers. The key idea in ANOVA is that if none of the treatments (such as the buffer size) affects the observed variable (such as the loss rate), then all the observations are drawn from the same population. Therefore, the sample mean computed from the observations corresponding to each treatment should not be too far from the sample mean computed across all the observations. Moreover, the estimate of the population variance computed from each group separately should not differ too much from the variance estimated from the entire sample. If we do find a significant difference between statistics computed from each group separately and from the sample as a whole, we reject the null hypothesis: that is, we conclude that, with high probability, the treatments affect the observed outcomes. By itself, that is all that basic ANOVA can tell us. Further testing is necessary to determine which treatments affect the outcome and which do not.
We now make this more precise. Suppose we can divide the observations into I groups of J samples each (we assume that all groups have the same number of samples; this is usually not a problem, because the treatments are under the control of the experimenter). We denote the jth observation of the ith treatment by the random variable Y_ij. We model this observation as the sum of an underlying population mean μ, the true effect α_i of the ith treatment, and a random fluctuation ε_ij:

(EQ 40)    Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}

The errors ε_ij are assumed to be independent and normally distributed, with zero mean and a variance of σ². For convenience, we normalize the α_i so that \sum_{i=1}^{I} \alpha_i = 0. Note that the expected outcome for the ith treatment is E(Y_ij) = μ + α_i.

The null hypothesis is that the treatments have no effect on the outcome. If the null hypothesis holds, then the expected value of each group of observations is μ, so that α_i = 0 for all i, and the population variance is σ².

Let the mean of the ith group of observations be denoted \bar{Y}_{i.} and the mean of all the observations be denoted \bar{Y}_{..}. We denote the sum of squared deviations from the mean within each group by

(EQ 41)    SSW = \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{i.})^2

SSW/(I(J-1)) is an unbiased estimator of the population variance σ², because it is the average of I unbiased estimators, each given by \frac{1}{J-1} \sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{i.})^2. Similarly, we denote the sum of squared deviations from the mean between groups by

(EQ 42)    SSB = J \sum_{i=1}^{I} (\bar{Y}_{i.} - \bar{Y}_{..})^2
Under the null hypothesis, SSB/(I-1) is also an unbiased estimator of the population variance σ², because \frac{1}{I-1} \sum_{i=1}^{I} (\bar{Y}_{i.} - \bar{Y}_{..})^2 is an unbiased estimator of σ²/J. So, the ratio (SSB/(I-1)) / (SSW/(I(J-1))) should be close to 1 if the null hypothesis holds.

It can be shown that SSB/σ² is a χ² variable with I-1 degrees of freedom and that SSW/σ² is a χ² variable with I(J-1) degrees of freedom. The ratio of two independent χ² variables, each divided by its number of degrees of freedom m and n respectively, follows a distribution called the F distribution with (m, n) degrees of freedom. Therefore, the variable (SSB/(I-1)) / (SSW/(I(J-1))) follows the F distribution with (I-1, I(J-1)) degrees of freedom, and it has an expected value of approximately 1 if the null hypothesis is true.

To test the null hypothesis, we compute the value of (SSB/(I-1)) / (SSW/(I(J-1))) and compare it with the critical value of an F variable with (I-1, I(J-1)) degrees of freedom. If the computed value exceeds the critical value, the null hypothesis is rejected. Intuitively, this happens when SSB is ‘too large’, that is, when there is significant variation in the group means across treatments, which is what we expect when the treatment does have an effect on the observed outcome.
EXAMPLE 18: SINGLE-FACTOR ANOVA
Continuing with Example 9, assume that we have additional data for larger buffer sizes, as shown below. Can we still claim that the buffer size plays a role in determining the loss rate?
Here, I = 5 and J = 10. We compute \bar{Y}_{5.} = 1.26%, \bar{Y}_{100.} = 0.81%, \bar{Y}_{200.} = 0.44%, \bar{Y}_{500.} = 0.07%, and \bar{Y}_{1000.} = 0.01%. Working with loss rates expressed as fractions, this allows us to compute SSW = 3.66*10^-4 and SSB = 1.11*10^-3. The F statistic is therefore (1.11*10^-3/4)/(3.66*10^-4/45) ≈ 34. Looking up the F table, we find that even with only (4, 40) degrees of freedom, the critical F value at the 1% significance level is 3.83. The computed statistic far exceeds this value. Therefore, the null hypothesis is rejected.
The F test is somewhat anticlimactic: it indicates only that a treatment has an effect on the outcome; it neither quantifies the degree of the effect nor identifies whether any one treatment is responsible for the failure of the test. These questions can be resolved by post-hoc analysis. For example, to quantify the degree of effect, we can compute the regression of the observed outcome as a function of the treatment. To identify a treatment that is responsible for the failure of the test, we can re-run the F test eliminating one treatment at a time. If the F test no longer rejects the null hypothesis when a particular treatment is removed, we can hypothesize that this treatment has a significant effect on the outcome, and test this hypothesis with a two-variable test.
If these approaches do not work, two more advanced techniques to perform multi-way comparisons are Tukey’s method and the Bonferroni method (see Section 2.10 on page 73 for texts where these methods are discussed).
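The leave-one-out idea described above can be sketched as follows, on hypothetical data (ours, for illustration) in which one treatment group, here ‘C’, clearly drives the effect; the F statistic collapses only when that group is removed:

```python
def one_way_F(groups):
    """One-way ANOVA F statistic for equal-sized groups."""
    I, J = len(groups), len(groups[0])
    grand = sum(sum(g) for g in groups) / (I * J)
    means = [sum(g) / J for g in groups]
    ssb = J * sum((m - grand) ** 2 for m in means)
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ssb / (I - 1)) / (ssw / (I * (J - 1)))

data = {"A": [1.0, 2.0, 1.0, 2.0],      # hypothetical treatment groups
        "B": [1.0, 2.0, 2.0, 1.0],
        "C": [9.0, 10.0, 9.0, 10.0]}    # the group that differs

for name in data:
    rest = [g for k, g in data.items() if k != name]
    print(name, one_way_F(rest))        # F is near zero only when C is dropped
```

With A or B removed, the remaining groups still differ sharply and F stays large; with C removed, the group means coincide and F drops to zero, singling out C as the influential treatment.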
Buffer size    Loss rate (ten repeated measurements)
5              1.20%  1.30%  0.90%  1.40%  1.00%  1.80%  1.10%  1.20%  1.50%  1.20%
100            0.10%  0.60%  1.10%  0.80%  1.20%  0.30%  0.70%  1.90%  0.20%  1.20%
200            0.50%  0.45%  0.35%  0.60%  0.75%  0.25%  0.55%  0.15%  0.35%  0.40%
500            0.10%  0.05%  0.03%  0.08%  0.07%  0.02%  0.10%  0.05%  0.13%  0.04%
1000           0.01%  0.02%  0.01%  0.00%  0.01%  0.01%  0.00%  0.02%  0.01%  0.00%
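As a cross-check of Example 18 (a plain-Python sketch; the function is hand-rolled, not taken from a statistics package), the F statistic can be computed directly from the tabulated loss rates:

```python
def one_way_anova_F(groups):
    """One-way ANOVA F statistic: (SSB/(I-1)) / (SSW/(I(J-1)))."""
    I, J = len(groups), len(groups[0])
    grand = sum(sum(g) for g in groups) / (I * J)
    means = [sum(g) / J for g in groups]
    ssb = J * sum((m - grand) ** 2 for m in means)
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ssb / (I - 1)) / (ssw / (I * (J - 1)))

loss = [  # loss rates in percent, one row per buffer size (5, 100, 200, 500, 1000)
    [1.20, 1.30, 0.90, 1.40, 1.00, 1.80, 1.10, 1.20, 1.50, 1.20],
    [0.10, 0.60, 1.10, 0.80, 1.20, 0.30, 0.70, 1.90, 0.20, 1.20],
    [0.50, 0.45, 0.35, 0.60, 0.75, 0.25, 0.55, 0.15, 0.35, 0.40],
    [0.10, 0.05, 0.03, 0.08, 0.07, 0.02, 0.10, 0.05, 0.13, 0.04],
    [0.01, 0.02, 0.01, 0.00, 0.01, 0.01, 0.00, 0.02, 0.01, 0.00],
]
F = one_way_anova_F(loss)
print(F)   # far above the 1% critical F value for (4, 45) degrees of freedom
```

Note that the F ratio is unit-invariant, so it does not matter whether the loss rates are entered as percentages or as fractions.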
2.6.2 Multi-way layouts
It is relatively straightforward to extend the one-way layout to two or more treatments that are simultaneously applied. For instance, we may want to study the joint effect of buffer size and cross traffic workload on the loss rate. The details of this so-called two-way layout are beyond the scope of this text. We will merely point out that in the context of such designs, we not only have to determine the effect of a treatment on the outcome, but also deal with the possibility that only certain combinations of treatment levels affect the outcome (for instance, the combination of small buffer size and heavy cross traffic). Such interaction effects greatly complicate the analysis of multi-way layouts.