
EFFECT SIZE ESTIMATION IN ONE-WAY DESIGNS

In document EBUPT180036.pdf (Pages 164-200)

It's all very well in practice, but it will never work in theory.

—Anonymous management maxim (Larman, 2002, p. 127)

This chapter discusses estimation of effect size magnitude in one-way designs with at least three conditions and continuous outcome variables.

Greater emphasis is placed on focused comparisons (contrasts) with a single degree of freedom (df) than on the omnibus comparison of all the means, where df ≥ 2, because the latter is often uninformative (see chap. 2, this volume). A large omnibus effect can also be misleading if it is a result of a single discrepant mean that is not of substantive interest. The next two sections address contrast specification and effect size estimation with standardized mean differences for fixed factors. Later sections review measures of association for fixed or random factors and special issues for effect size estimation in covariate analyses. A supplemental reading on this book's Web site reviews effect size estimation in multivariate one-way designs with independent samples. Exercises with answers for this chapter are also available on the Web site.

CONTRAST SPECIFICATION AND TESTS

This section describes the basic rationale of contrast specification. It can be skipped by readers who are already familiar with this topic; otherwise it needs close study. A design with a single fixed factor is assumed. Because the levels of a random factor are selected by a chance-based method to enhance generalizability, analysis of the omnibus effect is usually more appropriate than contrast analyses that compare just two levels at a time.

Specification

A contrast is a directional effect that corresponds to a particular facet of the omnibus effect. Contrasts are often represented in the statistical literature with the symbols ψ or ψ̂. The former is a parameter that represents a weighted sum of population means:

ψ = Σᵢ₌₁ᵃ cᵢ μᵢ  (6.1)

where (c₁, c₂, . . . , c_a) is the set of contrast weights or coefficients that specifies the comparison. Application of the same weights to sample means estimates ψ:

ψ̂ = Σᵢ₌₁ᵃ cᵢ Mᵢ  (6.2)

The weights that specify a contrast must respect certain rules: They must sum to zero (Σᵢ cᵢ = 0), and the weights for at least two different means should not equal zero. Means assigned a weight of zero are excluded from the contrast, and means with positive weights are compared with means given negative weights. Suppose factor A has a = 3 levels. The weights (1, 0, −1) meet all the requirements stated earlier and specify the contrast

ψ̂₁ = (1) M₁ + (0) M₂ + (−1) M₃ = M₁ − M₃

which is the pairwise comparison of M₁ with M₃ excluding M₂. The weights (−1, 0, 1) just change the sign of ψ̂₁. In general, the sign of a contrast is arbitrary. By the same logic, the sets of weights

(½, 0, −½), (5, 0, −5), and (1.7, 0, −1.7)

among innumerable others with the same pattern of coefficients all specify the same pairwise comparison as the set (1, 0, −1). The scale of ψ̂₁ will change depending on which of these sets of weights is applied to the means.

This does not affect statistical tests or measures of association for contrasts because their equations correct for the scale of the weights.

However, the scaling of contrast weights is critical if a comparison should be interpreted as the difference between the averages of two subsets of means. If so, the weights should make up what Keppel (1991) called a standard set and satisfy what Bird (2002) called mean difference scaling. The sum of the absolute values of the coefficients in a standard set is two (Σᵢ |cᵢ| = 2.0). This implies for a pairwise comparison that one weight must be +1, another must be −1, and the rest are all zero. For example, the coefficients (1, 0, −1) are a standard set for the comparison of M₁ with M₃, but the set (½, 0, −½) is not.

At least three means contribute to a complex comparison. An example of a complex comparison is when a control condition is compared with the average of two treatment conditions. A complex comparison is still a single-df effect because only two means are compared, at least one of which is averaged over two or more conditions. A standard set of weights for a complex comparison is specified as follows: The coefficients for one subset of conditions to be averaged together each equal +1 divided by the number of conditions in that subset; the coefficients for the other subset of conditions to be averaged over all equal −1 divided by the number of conditions in that subset; and the weights for any excluded condition are zero. For example, the set of coefficients (½, −1, ½) is a standard set for the contrast

ψ̂₂ = (½) M₁ + (−1) M₂ + (½) M₃ = (M₁ + M₃)/2 − M₂

which compares M₂ with the average of M₁ and M₃. However, the weights (1, −2, 1) are not a standard set for the same comparison because the sum of their absolute values is 4.0, not 2.0.
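The weight rules just described lend themselves to a short computational check. The sketch below, a minimal illustration rather than anything from the text's software, computes ψ̂ from a set of weights (Equation 6.2) and tests whether a set of weights is a standard set; the means 13, 11, and 12 anticipate the chapter's later worked example.

```python
def contrast_estimate(weights, means):
    """Weighted sum of sample means (Equation 6.2)."""
    return sum(c * m for c, m in zip(weights, means))

def is_standard_set(weights, tol=1e-9):
    """Standard set: weights sum to zero and their absolute values sum to 2.0."""
    return abs(sum(weights)) < tol and abs(sum(abs(c) for c in weights) - 2.0) < tol

means = [13.0, 11.0, 12.0]        # M1, M2, M3
print(contrast_estimate([1, 0, -1], means))      # 1.0 -> M1 - M3
print(contrast_estimate([0.5, -1, 0.5], means))  # 1.5 -> (M1 + M3)/2 - M2
print(is_standard_set([1, -2, 1]))               # False: absolute values sum to 4.0
```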

Two contrasts are orthogonal if they each reflect an independent aspect of the omnibus effect; that is, the result in one comparison says nothing about what may be found in the other. In a balanced one-way design, two contrasts are orthogonal if the sum of the products of their corresponding weights is zero, or

Σᵢ₌₁ᵃ cᵢ c′ᵢ = 0  (6.3)

where cᵢ and c′ᵢ are the weights for the ith condition in each contrast. In an unbalanced design, two contrasts are orthogonal if

Σᵢ₌₁ᵃ (cᵢ c′ᵢ / nᵢ) = 0  (6.4)

where nᵢ is the number of cases in the ith condition. A pair of contrasts is nonorthogonal (correlated) if their weights do not satisfy Equation 6.3 in a

TABLE 6.1
Examples of Orthogonal and Correlated Mean Difference Contrasts

                        Weights
Contrast            c₁      c₂      c₃      Sum
Orthogonal pair
  ψ̂₁                1       0       −1      0
  ψ̂₂                ½       −1      ½       0
  Cross-product     .5      0       −.5     0
Correlated pair
  ψ̂₂                ½       −1      ½       0
  ψ̂₃                1       −1      0       0
  Cross-product     .5      1       0       1.5

balanced design or Equation 6.4 in an unbalanced design. Correlated contrasts describe overlapping facets of the omnibus effect.

Table 6.1 presents the standard sets of weights for two different pairs of contrasts in a balanced one-way design. The first pair is orthogonal because the sum of the cross-products of their weights is zero. Intuitively, these contrasts are unrelated because the two means compared in ψ̂₁, M₁ and M₃, are combined in ψ̂₂ and contrasted against a third mean, M₂. The second pair of contrasts in Table 6.1 is not orthogonal because the sum of the cross-products of their weights is 1.5, not 0. Contrasts ψ̂₂ and ψ̂₃ are correlated because M₂ is one of the two means compared in both.
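The orthogonality checks in Equations 6.3 and 6.4 can be sketched directly with the Table 6.1 weights; the helper function here is an illustration, and the unbalanced call simply reuses equal group sizes.

```python
def cross_product_sum(w1, w2, ns=None):
    """Equation 6.3 (balanced, ns=None) or Equation 6.4 (unbalanced).

    A sum of zero means the two contrasts are orthogonal."""
    if ns is None:
        return sum(a * b for a, b in zip(w1, w2))
    return sum(a * b / n for a, b, n in zip(w1, w2, ns))

psi1 = [1, 0, -1]      # weights from Table 6.1
psi2 = [0.5, -1, 0.5]
psi3 = [1, -1, 0]

print(cross_product_sum(psi1, psi2))  # 0.0 -> orthogonal
print(cross_product_sum(psi2, psi3))  # 1.5 -> correlated
```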

If every pair in a set of contrasts is orthogonal, the entire set is orthogonal. The maximum number of orthogonal comparisons is limited by the degrees of freedom of the omnibus effect, df_A = a − 1. The overall A effect can theoretically be broken down into a − 1 independent directional effects.

Expressed in terms of sums of squares, this is

SS_A = Σᵢ₌₁ᵃ⁻¹ SS_ψ̂ᵢ  (6.5)

where SS_A and SS_ψ̂ᵢ are, respectively, the sum of squares for the omnibus effect and the ith contrast in a set of a − 1 orthogonal comparisons. The same idea can also be expressed in terms of variance-accounted-for effect sizes:

η̂²_A = Σᵢ₌₁ᵃ⁻¹ η̂²_ψ̂ᵢ  (6.6)

where η̂²_A and η̂²_ψ̂ᵢ are, respectively, the observed correlation ratios for the omnibus effect and the ith contrast in a set of all possible orthogonal contrasts. All the terms mentioned are defined later.

There is generally more than one set of orthogonal comparisons that could be specified for the same one-way design. For example, contrasts ψ̂₁ and ψ̂₂ in Table 6.1, defined by the weights (1, 0, −1) and (½, −1, ½), respectively, make up an orthogonal set for a factor with three levels. A different pair of orthogonal contrasts is specified by the weights (1, −1, 0) and (½, ½, −1). In general, any set of a − 1 orthogonal contrasts satisfies Equations 6.5 and 6.6. Statisticians like orthogonal contrasts because of their statistical independence. However, it is better to specify a set of nonorthogonal comparisons that addresses substantive questions than a set of orthogonal comparisons that does not. See B. Thompson (1994) for more information about orthogonal versus nonorthogonal contrasts.

If the levels of factor A are equally spaced along a quantitative scale, such as the drug dosages 3, 6, and 9 mg · kg⁻¹, special contrasts called trends or polynomials can be specified. A trend describes a particular shape of the relation between a continuous factor and outcome. There are as many possible trend components as there are degrees of freedom for the omnibus effect. For example, if a continuous factor has three equally spaced levels, there are only two possible trends, linear and quadratic. If there are four levels, an additional polynomial, a cubic trend, may be present. However, it is rare in behavioral research to observe nonlinear trends beyond a quadratic effect.

There are standard weights for polynomials. Some statistical software programs that analyze trends automatically generate these weights. Tables of standard polynomial weights are also available in several statistics textbooks (e.g., Winer, Brown, & Michels, 1991, p. 982). For example, the set of standard weights for a pure linear trend for a factor with three equally spaced levels is (−1, 0, 1). The weight of zero in this set can be understood as the prediction that M₂ is halfway between M₁ and M₃. The set of standard weights for a pure quadratic trend is (1, −2, 1), and it predicts that M₁ = M₃ and that M₂ differs from both. More than one trend component may be present in the data. For example, learning curves often have both linear and quadratic components. Trends in balanced designs are always orthogonal and thus are called orthogonal polynomials. Please note that except for linear trends in one-way designs with two or three samples, the sum of the absolute values of the weights that define trends is not 2.0; that is, they are not standard sets. This is not a problem because trends are not generally mean difference contrasts in the sense discussed earlier. Also, the effect size magnitudes of trends are usually estimated with measures of association, not standardized mean differences.
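These properties of the trend weights for three equally spaced levels can be verified in a few lines; this is only a check of the arithmetic stated above.

```python
# Standard polynomial weights for a factor with three equally spaced levels,
# as given in the text: linear (-1, 0, 1) and quadratic (1, -2, 1).
linear = [-1, 0, 1]
quadratic = [1, -2, 1]

# The trends are orthogonal: their cross-products sum to zero.
print(sum(a * b for a, b in zip(linear, quadratic)))  # 0

# Only the linear weights happen to form a standard set (|weights| sum to 2).
print(sum(abs(c) for c in linear), sum(abs(c) for c in quadratic))  # 2 4
```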

Statistical Tests

The test statistic for a contrast is t_ψ̂ or F_ψ̂. When a nil hypothesis is tested for the same contrast, t²_ψ̂ = F_ψ̂, but t_ψ̂ preserves the sign of the contrast and can test non-nil hypotheses, too. When the samples are independent, one can compute a standardized mean difference or a correlation effect size from either t_ψ̂ or F_ψ̂ for a nil hypothesis. This makes these test statistics useful even if a nil hypothesis is probably false.

The form of the t statistic for a contrast between independent means in either a balanced or unbalanced one-way design is t_ψ̂(df_W) = (ψ̂ − ψ₀)/s_ψ̂, where df_W is the total within-conditions degrees of freedom, N − a; ψ₀ is the value of the contrast specified in the null hypothesis; and s_ψ̂ is the standard error of the contrast. The latter is

s_ψ̂ = √[MS_W Σᵢ₌₁ᵃ (cᵢ²/nᵢ)]  (6.7)

where MS_W is the pooled within-conditions variance (see also Equation 2.30).

The form of the F statistic for a contrast between independent means is F_ψ̂(1, df_W) = SS_ψ̂/MS_W, where the numerator equals

SS_ψ̂ = ψ̂² / Σᵢ₌₁ᵃ (cᵢ²/nᵢ)  (6.8)

The statistical assumptions of the t and F tests for contrasts between independent means are described in chapter 2.
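Equations 6.7 and 6.8 can be sketched together. The means (13, 11, 12) and n = 5 follow the chapter's example, but the MS_W value used here is an assumed illustration, not a figure from the text.

```python
import math

def contrast_test(weights, means, ns, ms_w):
    """t and F for a contrast between independent means, nil hypothesis."""
    psi_hat = sum(c * m for c, m in zip(weights, means))
    scale = sum(c**2 / n for c, n in zip(weights, ns))
    se = math.sqrt(ms_w * scale)   # standard error, Equation 6.7
    ss = psi_hat**2 / scale        # sum of squares, Equation 6.8
    return psi_hat / se, ss / ms_w

# MS_W = 5.50 is an assumed value for illustration.
t, f = contrast_test([1, 0, -1], [13.0, 11.0, 12.0], [5, 5, 5], ms_w=5.50)
print(abs(t**2 - f) < 1e-9)  # True: t squared equals F under a nil hypothesis
```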

The situation is more complicated when the samples are dependent. The omnibus error term, which may be designated MS_A×S for a nonadditive model or MS_res for an additive model, assumes sphericity, which may be unlikely in many behavioral studies (see chap. 2, this volume). Error terms of t_ψ̂ or F_ψ̂ printed by statistical software programs that analyze data from correlated designs may be based on scores from just the two conditions involved in the contrast (e.g., Keppel, 1991, pp. 356-361). These tests are actually forms of the dependent samples t test, which does not require sphericity because only two conditions are compared. The form of this t statistic for a contrast in a correlated design is t_ψ̂(n − 1) = (ψ̂ − ψ₀)/s_Dψ̂, where the standard error in the denominator is

s_Dψ̂ = √(s²_Dψ̂ / n)  (6.9)

In Equation 6.9, s²_Dψ̂ is the variance of the contrast difference scores and n is the group size. A difference score D_ψ̂ can be computed for each contrast and for every case in a repeated-measures design or set of matched cases in a matched-groups design. Each difference score is a linear combination of the original scores across the levels of the factor weighted by the contrast coefficients. Suppose in a one-way design with three levels of a repeated-measures factor A that the coefficients for ψ̂₂ are (½, −1, ½). The scores across the levels of A for a particular case are 17, 20, and 25. The difference score for this case on this contrast equals

D_ψ̂₂ = (½) 17 − (1) 20 + (½) 25 = 1.0

Difference scores for the other cases on the same contrast are computed the same way, and their average for the whole sample equals ψ̂₂.

The variance s²_Dψ̂ takes account of the cross-conditions correlations for that contrast (e.g., Equation 2.15). In general, this variance decreases as the correlations increase. If the cross-conditions correlations are reasonably positive, the contrast standard error may be smaller than that found for the same contrast in a between-subjects design with the same factor and outcome variable.
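The dependent-samples computation can be sketched as follows. The first case repeats the worked example (scores 17, 20, 25 with weights ½, −1, ½); the remaining four cases are hypothetical scores invented for illustration.

```python
import math
import statistics

def t_dependent_contrast(weights, cases):
    """t for a contrast in a correlated design (Equation 6.9), nil hypothesis."""
    d = [sum(c * x for c, x in zip(weights, case)) for case in cases]
    n = len(d)
    psi_hat = statistics.mean(d)                   # mean contrast difference score
    se = math.sqrt(statistics.variance(d) / n)     # standard error, Equation 6.9
    return psi_hat / se, n - 1                     # t and its df

weights = (0.5, -1, 0.5)
# First case is the text's worked example; the others are hypothetical.
cases = [(17, 20, 25), (14, 16, 20), (20, 23, 26), (16, 18, 21), (18, 19, 24)]
first_d = sum(c * x for c, x in zip(weights, cases[0]))
print(first_d)  # 1.0, matching the worked difference score
```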

Control of Type I Error

With the caveats given next, it is useful to know something about methods to control experimentwise Type I error across multiple comparisons.

Methods for planned comparisons assume a relatively small number of a priori contrasts, but methods for unplanned comparisons anticipate a larger number of post hoc tests, such as all possible pairwise contrasts. A partial list of methods is presented in Table 6.2 in ascending order by degree of protection against experimentwise Type I error and in descending order by power.

These methods generally use t_ψ̂ or F_ψ̂ as test statistics but compare them against critical values higher than those from standard tables (i.e., it is more difficult to reject H₀). For example, the adjusted level of statistical significance for an individual comparison (α*) in the Dunn-Bonferroni method equals α_EW/k, where the numerator is the experimentwise Type I error rate (see chap. 2, this volume) and k is the number of contrasts.

If α_EW = .05 and k = 10, the level of statistical significance for each individual contrast is α* = .05/10 = .005 in the Dunn-Bonferroni method.

In unprotected tests of the same 10 contrasts, the level of α would be .05 for each contrast. The methods listed in the table are also associated with the construction of simultaneous confidence intervals for ψ (defined later).
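The Dunn-Bonferroni arithmetic above amounts to one division; a minimal sketch:

```python
def bonferroni_alpha(alpha_ew, k):
    """Dunn-Bonferroni adjusted per-contrast significance level."""
    return alpha_ew / k

print(round(bonferroni_alpha(0.05, 10), 6))  # 0.005, as in the text's example
```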

TABLE 6.2
Partial List of Methods That Control Experimentwise Type I Error for Multiple Comparisons

Method                 Nature of protection against experimentwise Type I error
Planned comparisons
  Unprotected          None; uses standard critical values for t_ψ̂ and F_ψ̂
  Dunnett              Across pairwise comparisons of a single control group with each of a − 1 treatment groups
  Bechhofer-Dunnett    Across a maximum of a − 1 orthogonal a priori contrasts
  Dunn-Bonferroni      Across a priori contrasts that are either orthogonal or nonorthogonal
Unplanned comparisons
  Newman-Keulsᵃ,       Across all pairwise comparisons
  Tukey HSDᵇ
  Scheffé              Across all pairwise and complex comparisons

ᵃAlso called Student-Newman-Keuls.
ᵇHSD = honestly significant difference; also called Tukey A.

See Winer et al. (1991, pp. 140-197) for more information about methods to control for Type I error across multiple comparisons in single-factor designs.

It was mentioned earlier that there is not a clear consensus that control of experimentwise Type I error is desirable if the power of the unprotected tests is already low (see chap. 3, this volume). There is also little need to worry about experimentwise Type I error if a relatively small number of contrasts is specified. Wilkinson and the Task Force on Statistical Inference (1999) noted that the tactic of testing all pairwise comparisons following rejection of H₀ in the test of the omnibus effect is typically wrong. Not only does this approach make the individual comparisons unnecessarily conservative when a classical post hoc method such as the Newman-Keuls procedure is used, but it is rare that all such contrasts are interesting. The cost for reducing Type I error across all comparisons is reduced power for the specific tests the researcher really cares about.

Confidence Intervals for ψ

Distributions of sample contrasts are simple, and correct confidence intervals for ψ can be constructed with central test statistics. The general form of an individual 100(1 − α)% confidence interval for ψ is

ψ̂ ± s_ψ̂ [t₂₋tail, α (df_error)]  (6.10)

The standard error of the contrast, s_ψ̂, is defined by Equation 6.7 when the samples are independent and by Equation 6.9 when they are dependent. The term in brackets is the positive two-tailed critical value of t at the α level of statistical significance with degrees of freedom equal to those of the corresponding analysis of variance (ANOVA) error term for the same contrast. It is also possible to construct simultaneous confidence intervals, also known as joint confidence intervals, based on more than one observed contrast. The width of a simultaneous confidence interval for ψ is typically wider than that of the individual confidence interval for ψ based on the same contrast. This is because simultaneous confidence intervals take account of multiple comparisons; that is, they control for experimentwise Type I error. For example, the level of α* for a simultaneous confidence interval in the Dunn-Bonferroni method equals α_EW/k, where k is the number of contrasts. If α_EW = .05 and k = 10, the level of statistical significance for the critical t statistic in Equation 6.10 is α* = .05/10 = .005. The resulting simultaneous 99.5% confidence interval based on t₂₋tail, .005 will be wider than the corresponding 95% individual confidence interval based on t₂₋tail, .05 for the same comparison. See Bird (2002) for more information about simultaneous confidence intervals for ψ.
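Equation 6.10 can be sketched as a small helper that takes the critical t as an argument (from a table or software). The ψ̂ and standard error values here are illustrative; t = 2.179 is the two-tailed .05 critical value for df_W = 12, which would apply in a balanced design with a = 3 and n = 5.

```python
def contrast_ci(psi_hat, se, t_crit):
    """Individual confidence interval for psi (Equation 6.10)."""
    half = se * t_crit
    return psi_hat - half, psi_hat + half

# Illustrative values: psi-hat = 1.0, standard error = 1.483.
lo, hi = contrast_ci(1.0, 1.483, 2.179)
print(round(lo, 3), round(hi, 3))
```

A simultaneous Dunn-Bonferroni interval would use the same function with a larger critical t corresponding to α* = α_EW/k.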

STANDARDIZED CONTRASTS

Contrast weights that are standard sets are assumed next. A standardized mean difference for a contrast is a standardized contrast. The parameter estimated by a sample standardized contrast is δ_ψ = ψ/σ*, where the numerator is the unstandardized population contrast and the denominator is a population standard deviation. Recall that there is more than one population standard deviation for a comparative study. For example, σ* could be the standard deviation in just one of the populations or, under the homogeneity of variance assumption, it could be the common population standard deviation across all conditions. The form of a generic standardized contrast is d_ψ̂ = ψ̂/σ̂*, where the numerator is the observed contrast and the denominator, the standardizer, is an estimator of σ* that is not the same in all kinds of d_ψ̂ statistics.

Independent Samples

Three basic ways to estimate σ* in one-way designs with independent samples are listed next. Olejnik and Algina (2000) described another method for individual difference factors (see chap. 4, this volume), but it is not used as often. The first two options are based on methods for two-group designs, so they may be better for pairwise comparisons than complex comparisons. The third method is good for either kind of comparison:

1. Select the standard deviation of only one group, usually the control group when treatment is expected to affect both central tendency and variability. This makes d_ψ̂ analogous to Glass's Δ. This method may also be preferred if the within-groups variances are heterogeneous, but it may be necessary to report more than one value of d_ψ̂ for the same contrast.

2. Estimate σ* as the square root of the pooled within-groups variance for only the two groups being compared (i.e., s_P; Equation 2.9). This makes d_ψ̂ analogous to Hedges's g. However, this method ignores the variances in all the other groups. Another possible drawback is that the standardizer of d_ψ̂ could be different for each contrast.

3. Standardize ψ̂ against the square root of the pooled within-groups variance for all a groups in the design, MS_W. Now d_ψ̂ for different contrasts will all be based on the same estimator of σ*, which in this case is the common population standard deviation σ. An implication of this choice is that the standardizer √MS_W assumes homogeneity of variance across all conditions.

Given reasonably similar within-groups variances, the third method may be the best choice. It generates a standardized contrast that is an extension of Hedges's g for designs with three or more groups. For this reason it is referred to as g_ψ̂.

The value of g_ψ̂ = ψ̂/√MS_W can also be computed from t_ψ̂ for a nil hypothesis (Equation 6.7), the contrast weights, and the group sizes as follows:

g_ψ̂ = t_ψ̂ √[Σᵢ₌₁ᵃ (cᵢ²/nᵢ)]  (6.11)

If the design is balanced, Equation 6.11 reduces to g_ψ̂ = t_ψ̂ √(2/n) for a pairwise contrast. Please note that g_ψ̂ takes the sign of t_ψ̂. If the square root of F_ψ̂ is substituted for t_ψ̂ in this equation, g_ψ̂ will be positive regardless of the direction of the mean contrast.
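Equation 6.11 can be sketched directly; the t value below is illustrative, not taken from the text.

```python
import math

def g_from_t(t, weights, ns):
    """Equation 6.11: standardized contrast g from the contrast's t statistic."""
    return t * math.sqrt(sum(c**2 / n for c, n in zip(weights, ns)))

t, n = 0.674, 5  # illustrative t for a pairwise contrast in a balanced design
g = g_from_t(t, [1, 0, -1], [n, n, n])
print(abs(g - t * math.sqrt(2 / n)) < 1e-12)  # True: reduces to t * sqrt(2/n)
```

Note that g_from_t preserves the sign of t, matching the text's point that g_ψ̂ takes the sign of t_ψ̂.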

Table 6.3 presents a small data set for a balanced design where n = 5. Table 6.4 reports the results of an independent samples ANOVA of the omnibus effect and two orthogonal contrasts defined by the weights (1, 0, −1) and (½, −1, ½), where

ψ̂₁ = M₁ − M₃ = 13.00 − 12.00 = 1.00

ψ̂₂ = (M₁ + M₃)/2 − M₂ = (13.00 + 12.00)/2 − 11.00 = 1.50
