
8 Experiments

8.1 A Simple Experiment: The Effect of a Treatment

An experiment tries to evaluate the effects of one or more treatments. Imagine you wish to evaluate the effect of some treatment. One natural strategy is: choose groups of subjects; apply this treatment at different levels to different groups; then see if the groups are different after treatment. For example, you might wish to investigate whether consuming a painkiller affected headaches. Different levels of treatment would correspond to different numbers of pills (say, none, one or two).

Different subjects would be different people. We would then divide the subjects into groups, apply different levels of treatment (i.e. give them different numbers of pills), and record outcomes. Now we need to tell whether the groups are different.

Telling whether the groups are different requires care, because the subjects will differ for a variety of irrelevant reasons (bodyweight, susceptibility to medication, and so on), so there will be differences between groups—we need to be able to tell whether these differences are due to the treatment, or to the irrelevant reasons. One powerful strategy is to allocate subjects to groups before treatment at random, so that each group looks similar (i.e. each group has the same variation in bodyweight, and so on). We can then use the significance machinery of the previous chapter to tell whether the differences between groups are due to the effects of treatment.

The procedures we use can be extended to deal with multiple treatments. We deal with the case of two treatments, because it captures important phenomena. Rather than do experiments for each treatment separately, it is a good idea to look at the results for different levels of both treatments together. Doing so allows us to identify possible interactions, and to treat fewer subjects in total. The machinery doesn’t change (though there is more of it) when we pass to more than two treatments.

8.1.1 Randomized Balanced Experiments

Assume you wish to use $L$ levels of treatment (so there are $L$ groups; not treating a subject is one level of treatment). We will evaluate the results of the experiment by comparing the outcome for each group of subjects. This means it is helpful if there are the same number of subjects in each group (we say the experiment is balanced), so that the error resulting from chance effects in each group is the same. We will have $G$ subjects in each group, and so there is a total of $LG$ subjects. We must now choose which treatment each subject gets. We allocate subjects to groups at random, ensuring that each group gets $G$ subjects. You could do this, for example, by permuting the subjects randomly, then allocating the first $G$ to group 1, etc.
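As a concrete illustration of this allocation step, here is a minimal sketch in Python (assuming NumPy is available); the function name and the particular numbers are illustrative only, not part of the text.

```python
import numpy as np

def allocate_balanced(n_subjects, L, G, seed=0):
    """Randomly split L*G subject indices into L groups of G subjects each.

    Illustrative helper: permute the subjects at random, then give the
    first G to group 1, the next G to group 2, and so on.
    """
    assert n_subjects == L * G, "a balanced design needs exactly L*G subjects"
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_subjects)   # random ordering of subject indices
    return order.reshape(L, G)            # row i lists the subjects in group i

groups = allocate_balanced(n_subjects=12, L=3, G=4)
print(groups)
```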

We now perform the experiment, by treating each group of subjects at the prescribed level, and recording the results. For each subject, we will observe a measurement. Write $x_{ij}$ for the observed value for the $j$'th subject in the $i$'th group. We want to know if the groups are different. We assume that differences in observed values in each group are purely due to noise. This means that we could model

$$x_{ij} = \mu_i + \epsilon_{ij}$$

where $\epsilon_{ij}$ is “noise”: unmodelled effects that have zero mean, and do not depend on the treatment. If the treatment has no effect, then each of the $\mu_i$ will be the same. We can model the treatment at level $i$ as having an effect $t_i$, so that

$$\mu_i = \mu + t_i.$$

We would like $\mu$ to be the average of the $\mu_i$, so we must have

$$\sum_{u} t_u = 0.$$

All this yields a model

$$x_{ij} = \mu + t_i + \epsilon_{ij}.$$

We assume that the noise is independent of the treatment level. This assumption is satisfied by randomizing the allocation of subjects to treatment groups. We will assume that the noise is normally distributed, with variance $\sigma^2$. This makes it straightforward to estimate $\mu$ and $\mu_i$ with least squares. We write $\hat{\mu}$ for our estimate of $\mu$, and so on. We then choose $\hat{\mu}$ to be the value of $\mu$ that minimizes the overall sum of squared differences

$$\hat{\mu} = \operatorname{argmin}_{\mu} \sum_{ij} (x_{ij} - \mu)^2.$$

We choose each $\hat{\mu}_i$ to be the value of $\mu_i$ that minimizes the sum of squared differences within the $i$'th group

$$\hat{\mu}_i = \operatorname{argmin}_{\mu_i} \sum_{j} (x_{ij} - \mu_i)^2.$$

All this yields

$$\hat{\mu} = \frac{\sum_{ij} x_{ij}}{GL} \quad \text{and} \quad \hat{\mu}_i = \frac{\sum_{j} x_{ij}}{G}.$$
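To make these formulas concrete, here is a small hypothetical sketch: the observations are arranged as an $L \times G$ array `x` (row $i$ holds the $i$'th group), and the estimates are just the grand mean and the row means. The numbers are made up purely for illustration.

```python
import numpy as np

# Hypothetical data: x[i, j] is the observation for the j'th subject
# in the i'th group, so x has shape (L, G).
x = np.array([[4.1, 3.8, 4.4, 4.0],
              [3.1, 2.9, 3.4, 3.2],
              [2.2, 2.5, 2.0, 2.3]])

mu_hat = x.mean()            # overall estimate: sum_ij x_ij / (G L)
mu_i_hat = x.mean(axis=1)    # per-group estimates: sum_j x_ij / G
print(mu_hat, mu_i_hat)
```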

8.1.2 Decomposing Error in Predictions

The results are in $L$ groups, one for each level of treatment. The overall sum of squared differences from the estimated mean $\hat{\mu}$ is

$$\text{SST} = \sum_{ij} (x_{ij} - \hat{\mu})^2.$$

This can be decomposed into two terms, as the exercises show. We have

$$\sum_{ij} (x_{ij} - \hat{\mu})^2 = \left[ \sum_{ij} (x_{ij} - \hat{\mu}_i)^2 \right] + G \left[ \sum_{i} (\hat{\mu} - \hat{\mu}_i)^2 \right].$$


This expression breaks the total sum of squared errors SST into two components. The first,

$$\text{SSW} = \sum_{ij} (x_{ij} - \hat{\mu}_i)^2,$$

is due to within-group variation; and the second,

$$\text{SSB} = G \left[ \sum_{i} (\hat{\mu} - \hat{\mu}_i)^2 \right]$$

is due to between-group variation. The relative size of these two terms should tell us whether the treatment has any effect or not. For example, assume there are large effects. Then SSB should be “big”, because the $\hat{\mu}_i$ should be different from one another, and SSW should be “small”, because measurements should be rather closer to their group means than to the overall mean (and because SST = SSW + SSB, so when one goes up the other must go down). Now assume there is no effect. Then the $\hat{\mu}_i$ should be quite similar to $\hat{\mu}$, and so to each other. This means SSB should be “small”, and so on. Using this line of reasoning requires some quantitative notion of what is “big” and what is “small”.
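The decomposition is easy to check numerically. The sketch below reuses the same made-up $L \times G$ array from above and verifies that SST = SSW + SSB to within floating-point error.

```python
import numpy as np

# Same hypothetical L x G array as in the earlier sketch.
x = np.array([[4.1, 3.8, 4.4, 4.0],
              [3.1, 2.9, 3.4, 3.2],
              [2.2, 2.5, 2.0, 2.3]])
L, G = x.shape
mu_hat = x.mean()
mu_i_hat = x.mean(axis=1)

SST = ((x - mu_hat) ** 2).sum()              # total sum of squared differences
SSW = ((x - mu_i_hat[:, None]) ** 2).sum()   # within-group term
SSB = G * ((mu_i_hat - mu_hat) ** 2).sum()   # between-group term
assert np.isclose(SST, SSW + SSB)            # the decomposition holds
```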

8.1.3 Estimating the Noise Variance

The trick to knowing whether SSB is “big” or not runs as follows. If there is no effect, then SSB and SSW can each be used to produce estimates of the noise variance. We produce these estimates, and then determine whether the difference between estimates can be explained by sampling variation. If it can't, then the treatment has some effect.

Using the $i$'th treatment group, we can estimate $\sigma^2$ by

$$\hat{\sigma}^2 = \frac{\sum_{j} (x_{ij} - \hat{\mu}_i)^2}{G - 1}.$$

An even better estimate would be to average these estimates across groups. This yields

$$\hat{\sigma}^2 = \frac{1}{L} \sum_{i} \frac{\sum_{j} (x_{ij} - \hat{\mu}_i)^2}{G - 1}.$$

This estimate is sometimes known as the within-group variation or residual variation. We write

$$\text{MSW} = \frac{1}{L} \sum_{i} \left[ \frac{\sum_{j} (x_{ij} - \hat{\mu}_i)^2}{G - 1} \right] = \frac{1}{L(G-1)} \text{SSW}. \tag{8.1}$$

This estimate has $L(G-1)$ degrees of freedom, because we have $LG$ data items, but we lose one degree of freedom for each of the $L$ means we have estimated.
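A small sketch of this estimate, again on the made-up array from above: averaging the per-group sample variances (each computed with $G-1$ in the denominator) gives the same number as $\text{SSW}/(L(G-1))$.

```python
import numpy as np

# Same hypothetical L x G array as above.
x = np.array([[4.1, 3.8, 4.4, 4.0],
              [3.1, 2.9, 3.4, 3.2],
              [2.2, 2.5, 2.0, 2.3]])
L, G = x.shape

# Average of the per-group variance estimates (each divides by G - 1).
MSW_from_groups = x.var(axis=1, ddof=1).mean()

# The same quantity via SSW / (L (G - 1)), as in Eq. (8.1).
SSW = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum()
MSW_from_SSW = SSW / (L * (G - 1))

assert np.isclose(MSW_from_groups, MSW_from_SSW)
```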

Assume the treatment has no effect. Then we expect that all the $t_i = \mu_i - \mu$ are zero. In turn, each of the $\hat{\mu}_i$ is an estimate of $\mu$. These estimates are the values of a random variable whose expected value is $\mu$, and whose variance is $\sigma^2 / G$. This means that

$$\frac{\sum_{i} (\hat{\mu}_i - \hat{\mu})^2}{L - 1} \approx \frac{\sigma^2}{G}.$$

We can use this to produce another estimate of $\sigma^2$, where

$$\hat{\sigma}^2 = G \, \frac{\sum_{j} (\hat{\mu}_j - \hat{\mu})^2}{L - 1}.$$

This estimate is sometimes known as the between-group variation or treatment variation. We write

$$\text{MSB} = \frac{G \sum_{j} (\hat{\mu}_j - \hat{\mu})^2}{L - 1} = \frac{1}{L - 1} \text{SSB}.$$

This estimate has $L - 1$ degrees of freedom, because we have used $L$ data items (the $\hat{\mu}_j$), but estimated one mean (the $\hat{\mu}$). If the treatment has no effect, then any difference between these estimates would have to be the result of sampling effects.

We can apply an F-test of significance (Sect. 7.3.1) to the ratio of the estimates. Now imagine that at least one of the treatment levels has an effect. In turn, this would mean that one of the $\hat{\mu}_j$ was more different from $\hat{\mu}$ than would occur if the treatment had no effect. This means that MSB would be larger than it would be if the treatment had no effect. So, if there is no effect of treatment, the statistic

$$F = \frac{\text{MSB}}{\text{MSW}}$$

would be “about” one, with a known distribution (the F-distribution; Sect. 7.3.1). If there is a treatment effect, then we expect $F$ to be larger than one. Finally, we need the degrees of freedom. The estimate MSW has $L(G-1)$ degrees of freedom. The estimate MSB has $L - 1$ degrees of freedom. We can now use the F-test to obtain a p-value for $F$ with $L - 1$, $L(G-1)$ degrees of freedom. In other words, we use tables or a software environment to evaluate

$$\int_{F}^{\infty} p_f(u; L-1, L(G-1)) \, du$$

(where $p_f$ is the probability density for the F-statistic, Sect. 7.3.1).
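Putting the pieces together, here is an end-to-end sketch under the same made-up data as above (and assuming SciPy is available): it computes MSW, MSB, the F-statistic, and the p-value via the F-distribution's survival function, then cross-checks the result against scipy.stats.f_oneway on the same groups.

```python
import numpy as np
from scipy import stats

# Same hypothetical L x G array as in the earlier sketches.
x = np.array([[4.1, 3.8, 4.4, 4.0],
              [3.1, 2.9, 3.4, 3.2],
              [2.2, 2.5, 2.0, 2.3]])
L, G = x.shape
mu_hat = x.mean()
mu_i_hat = x.mean(axis=1)

SSW = ((x - mu_i_hat[:, None]) ** 2).sum()
SSB = G * ((mu_i_hat - mu_hat) ** 2).sum()
MSW = SSW / (L * (G - 1))    # within-group mean square, L(G-1) dof
MSB = SSB / (L - 1)          # between-group mean square, L-1 dof

F = MSB / MSW
# p-value: area under the F density to the right of F,
# with L-1 and L(G-1) degrees of freedom.
p = stats.f.sf(F, L - 1, L * (G - 1))

# Cross-check against SciPy's one-way ANOVA on the same groups.
F_check, p_check = stats.f_oneway(*x)
assert np.isclose(F, F_check) and np.isclose(p, p_check)
print(F, p)
```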

8.1.4 The ANOVA Table

We now have a powerful and useful procedure that I can name and put in a box. Analyzing data by comparing variances in this way is sometimes known as analysis of variance or ANOVA. It is usual to lay out the information in a table, sometimes called an ANOVA table. We have only one kind of treatment we are varying in the experiment, so the experiment is known as a one factor experiment. The form of the table appears in Procedure 8.1.

Procedure 8.1 (Evaluating Whether a Treatment Has Significant Effects with a One-Way ANOVA for Balanced