Particularly useful for all readers is Chapter 3, which addresses common misinterpretations of the results of statistical tests. Readers already convinced of the limitations of statistical tests should find in this book useful arguments to strengthen their point of view. Its emphasis on effect sizes in primary studies rather than results of statistical tests avoids some of the limitations of the latter.
After the widespread adoption of the Intro Stats method, there was a dramatic increase in the reporting of statistical tests in journal articles in psychology and related fields.
[Figure: Percentage of articles reporting results of statistical tests in 12 journals of the American Psychological Association from 1911 to 1998.]
The following summarizes some of the main arguments against the continued widespread use of statistical tests in the behavioral sciences; they are discussed in more detail in later chapters.
One way to compensate for some of the limitations of statistical tests is to report additional information, such as a measure of effect size.
FUNDAMENTAL CONCEPTS
A 100(1 − α)% confidence interval for a parameter is a pair of statistics that gives an interval that, over repeated samples, includes the parameter 100(1 − α)% of the time. It is important to understand that the standard error metric of the t test is affected by sample size, as demonstrated next.
[Table: Independent samples t test results for the data in Table 2.1 at three different group sizes.]
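The repeated-sampling definition of a confidence interval can be checked directly by simulation. The sketch below (with arbitrary population values chosen for illustration) draws many samples and counts how often a nominal 95% interval for the mean captures the true mean:

```python
# Coverage simulation: a nominal 95% CI for the mean should capture the
# true parameter in about 95% of repeated samples. The population values
# (mu, sigma) and sample size n are arbitrary choices for illustration.
import random
import statistics
import math

random.seed(1)
mu, sigma, n = 100.0, 15.0, 30
z = 1.96  # two-tailed critical value for alpha = .05 (normal approximation)

hits = 0
trials = 2000
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    if m - z * se <= mu <= m + z * se:
        hits += 1

print(f"Empirical coverage: {hits / trials:.3f}")  # close to .95
```

Because the normal critical value is used in place of the t critical value, coverage runs slightly below the nominal level at small n.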
[Table: Results of the independent samples t test and the dependent samples t test for the data in Table 2.3.]
[Table: Results of the independent samples t test at three different group sizes for the data in Table 2.5.]
This is another serious shortcoming of the use of statistical tests in the behavioral sciences.
[Table: Independent samples F test and dependent samples F test results for the data in Table 2.7.]
[Table: Chi-square test of association results for the same proportions at different group sizes.]
Results of the χ² test have been reported for the same proportions but a larger group size, n = 80.
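The dependence of test results on group size can be made concrete. In this sketch the summary statistics are invented for illustration (they are not the values from the tables above), and the p value uses a normal approximation; the point is that a fixed raw effect becomes "significant" simply by increasing n:

```python
# The same raw mean difference and pooled SD yield a larger test statistic
# (and a smaller p) as group size grows. Summary statistics are invented
# for illustration; p uses a normal approximation to the t distribution.
import math
from statistics import NormalDist

mean_diff, sd_pooled = 2.0, 4.0  # fixed standardized effect: d = 0.50

for n in (15, 30, 60):  # cases per group
    se = sd_pooled * math.sqrt(2.0 / n)
    t = mean_diff / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    print(f"n = {n:2d}: t = {t:.2f}, approx p = {p:.4f}")
```

With d fixed at .50, the result is nonsignificant at n = 15 per group but significant at n = 60, even though the effect size never changed.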
WHAT'S WRONG WITH STATISTICAL TESTS—
AND WHERE WE GO FROM HERE
This probability is the conditional probability of the statistic assuming that H0 is true (see Chapter 2, this volume). The latter is the posterior probability of the null hypothesis in light of the data, and this is probably what researchers would really like to know. In Equation 3.1, p(H0) is the prior probability that the null hypothesis is true before the data are collected, and p(D) is the prior probability of the data regardless of the null hypothesis being true.
That is, given the p value from a statistical test together with estimates of p(H0) and p(D), we can use this equation to derive p(H0 | D), the posterior probability of the null hypothesis. Rejection of the null hypothesis confirms the alternative hypothesis and the research hypothesis behind it. The widespread use of NHST in the social sciences and the resulting cognitive misdirection may be part of the problem.
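Equation 3.1 is Bayes' theorem. A minimal sketch, with all numeric inputs hypothetical, shows why the conditional probability of the data under H0 is not the same thing as the posterior probability of H0:

```python
# A sketch of Equation 3.1 (Bayes' theorem). All numeric inputs below are
# hypothetical, chosen only to illustrate the calculation.
def posterior_h0(p_d_given_h0, p_h0, p_d):
    """p(H0 | D) = p(D | H0) * p(H0) / p(D)."""
    return p_d_given_h0 * p_h0 / p_d

# The data are unlikely under H0 (.02), yet a strong prior on H0 (.80)
# keeps the posterior well above .02:
print(round(posterior_h0(p_d_given_h0=0.02, p_h0=0.80, p_d=0.10), 3))  # 0.16
```

A small p(D | H0), which is roughly what a p value estimates, can still leave a nontrivial posterior probability on the null hypothesis.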
One of the advantages of NHST is that it automates much of the decision-making process. Even if the numbers are random, some results are expected to be statistically significant. However, confidence intervals are subject to some of the same kinds of inference errors as NHST.
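The claim that random numbers produce "significant" results at roughly the rate α can be verified with a short simulation (the test here is a simple one-sample z test on pure noise, with a normal approximation for p):

```python
# Even with pure noise, about alpha of all tests come out "significant."
# Draws many independent null samples and counts rejections at alpha = .05.
import random
import math
from statistics import NormalDist, mean, stdev

random.seed(7)
alpha, n, tests = 0.05, 25, 1000
rejections = 0
for _ in range(tests):
    x = [random.gauss(0, 1) for _ in range(n)]  # true mean is exactly 0
    z = mean(x) / (stdev(x) / math.sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    if p < alpha:
        rejections += 1

print(f"False positive rate: {rejections / tests:.3f}")  # near .05
```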
These methods can avoid some of the problems of traditional statistical tests and can be very useful in the right situation. It is expected that reasonable people will disagree with some of the specifics listed. It is also the researcher's responsibility to demonstrate the substantive (theoretical, clinical, or practical) significance of the results.
If it is feasible to test only null hypotheses, but the null hypothesis is implausible, the interpretation of the statistical test results should be modified accordingly. It is easily derived from the t test and is just a special case of Pearson's correlation r.
PARAMETRIC EFFECT SIZE INDEXES
These estimates are often expressed as population versions of effect size indices that were introduced later. Effect size refers to the extent of the influence of the independent variable on the dependent variable. The lower part of the table lists the results of two other hypothetical studies with the same unstandardized mean difference of 75.00.
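The point about the two hypothetical studies can be sketched numerically: the same unstandardized mean difference of 75.00 implies very different standardized effect sizes when within-groups variability differs (the standardizers below are made-up values, not those in the table):

```python
# Same raw mean difference, different standardized effect sizes.
# The two pooled SDs are hypothetical values for illustration only.
def standardized_mean_difference(mean_diff, sd_pooled):
    """Cohen's d: the mean difference in pooled standard deviation units."""
    return mean_diff / sd_pooled

print(standardized_mean_difference(75.0, 100.0))  # 0.75 (a large effect)
print(standardized_mean_difference(75.0, 500.0))  # 0.15 (a small effect)
```

This is why an unstandardized difference is interpretable only relative to the scale and variability of the outcome measure.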
[Table: Results of the t test and effect size indices at three different group sizes for the data in Table 4.2.]
The width of a traditional confidence interval for δ is the product of the appropriate two-tailed critical value of the central test statistic z (i.e., the normal deviate) and an asymptotic standard error. The upper bound, ncp_U, is the noncentrality parameter of the noncentral F distribution in which the observed F falls at the 2.5th percentile.
Specifically, the lower bound of the 95% confidence interval for η² is equal to the ratio ncp_L/(ncp_L + N), and the upper bound to the ratio ncp_U/(ncp_U + N). This presentation shows the degree of overlap between the two groups at the case level. The right tail ratio (RTR) is the relative proportion of scores from two different groups that fall in the upper extreme of the combined frequency distribution.
Likewise, a left tail ratio (LTR) is the relative proportion of scores that fall in the lower extreme of the combined distribution. For example, the proportion of scores from the group with the higher mean that fall in the upper extreme may be .60, versus a lower proportion for the other group. The magnitude of the difference between two criterion groups is measured by a standardized mean difference.
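Under the assumption of two normal distributions with equal standard deviations, a tail ratio can be computed directly from the group parameters. The means, SD, and cutoff below are hypothetical, chosen to show how a modest mean difference produces a large imbalance in the extreme tail:

```python
# Right tail ratio under normality with equal SDs. All parameters
# (group means, SD, cutoff) are hypothetical illustration values.
from statistics import NormalDist

def right_tail_ratio(mean1, mean2, sd, cutoff):
    """Proportion of group 1 scores above the cutoff relative to group 2."""
    p1 = 1 - NormalDist(mean1, sd).cdf(cutoff)
    p2 = 1 - NormalDist(mean2, sd).cdf(cutoff)
    return p1 / p2

# Groups differ by half a standard deviation (d = 0.50); the cutoff sits
# two SDs above the lower group's mean:
print(round(right_tail_ratio(100.5, 100.0, 1.0, cutoff=102.0), 2))  # 2.94
```

A group difference of only half a standard deviation nearly triples one group's representation in this extreme region, which is why tail ratios matter for selection decisions.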
The observed effect size for treatment is half the size of the criterion contrast effect size. Also, remember that effect size is partly a function of the design of the study. These are all crucial aspects in the planning of the study and should not be overlooked.
The practical significance of the group-level differences just described is considered in the test's manual (Lachar, Wingenfeld, Kline, & Gruber, 2000).
NONPARAMETRIC EFFECT SIZE INDEXES
A population risk ratio, also called a rate ratio, is the ratio of the proportions of an adverse outcome, in this case recurrence. The population odds ratio is denoted ω below, but note that the symbol ω² refers to another parameter for a continuous outcome (see Sect. The parameter ω is the ratio of the population odds for an adverse outcome. The size of the treatment group is n_T = C + D, where C and D represent the numbers of treated cases in which the disease recurred and did not recur, respectively.
This is defined in Table 5.2 as the ratio of the odds of relapse in the control group, o_C = p_C/(1 − p_C), over the odds in the treatment group. The estimator of the population Pearson correlation between two dichotomies, φ̂, is also equal to the square root of χ²(1)/N, the ratio of the chi-square statistic with a single degree of freedom to the overall sample size.
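These indexes can all be computed from the four cell frequencies of a fourfold table. The cell counts below are hypothetical (they are not the frequencies from Table 5.2), with A and B the control cases that relapsed and did not, and C and D the treated cases that relapsed and did not:

```python
# Comparative risk and correlation indexes from a hypothetical 2 x 2 table.
# A, B = control group (relapsed, not relapsed);
# C, D = treatment group (relapsed, not relapsed).
import math

A, B = 10, 40
C, D = 5, 45
N = A + B + C + D

p_c = A / (A + B)               # risk of relapse in the control group
p_t = C / (C + D)               # risk of relapse in the treatment group
rr = p_c / p_t                  # risk ratio
odds_ratio = (A * D) / (B * C)  # odds ratio: (p_c/(1-p_c)) / (p_t/(1-p_t))

# Pearson correlation between the two dichotomies (phi coefficient):
phi = (A * D - B * C) / math.sqrt((A + B) * (C + D) * (A + C) * (B + D))
print(rr, odds_ratio, round(phi, 3))  # 2.0 2.25 0.14
```

Note that the risk ratio and the odds ratio summarize the same table differently: relapse is twice as likely in the control group, but the control odds are 2.25 times the treatment odds.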
The estimated standard error of the log-transformed odds ratio, calculated using the fifth equation in Table 5.3, is the same. The denominator under the radical is the product of the sample size and the smallest table dimension minus one. The lower part of Table 5.4 defines sensitivity, specificity, predictive value, and base rate, all calculated using the cell frequencies presented in the upper part of the table.
Predictive value is also influenced by another very important factor, the base rate (BR), the proportion of all individuals with the disorder, or BR = (A + C)/N in Table 5.4. The effect of base rate on predictive value is shown in Table 5.5. The correlation between smoking status and heart disease status is φ = .16, so the former explains about 2.5% of the variance in the latter. For the same reason, the values of the risk ratio, RR, and the odds ratio, OR, are less than 1.00.
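The dependence of predictive value on base rate follows directly from the definitions. In this sketch the sensitivity, specificity, and base rates are hypothetical values, not figures from Table 5.5:

```python
# Positive predictive value as a function of base rate, holding the
# test's sensitivity and specificity fixed. All numbers are hypothetical.
def positive_predictive_value(sensitivity, specificity, base_rate):
    true_pos = sensitivity * base_rate            # hits among the disordered
    false_pos = (1 - specificity) * (1 - base_rate)  # false alarms
    return true_pos / (true_pos + false_pos)

for br in (0.50, 0.10, 0.01):
    ppv = positive_predictive_value(0.90, 0.90, br)
    print(f"base rate {br:.2f}: PPV = {ppv:.2f}")
```

Even an accurate test (90% sensitivity and specificity) yields a positive predictive value below .10 when only 1% of the population has the disorder, because false positives from the large unaffected group swamp the true positives.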
The comparative risk index with the best overall psychometric properties is the odds ratio, the ratio of the odds within groups for a particular outcome. This approach explicitly takes into account the effect of population base rates on the accuracy of decisions based on a screening test for the presence or absence of a disorder or condition.
EFFECT SIZE ESTIMATION IN ONE-WAY DESIGNS
The second pair of contrasts in Table 6.1 is not orthogonal because the sum of the cross products of their weights is 1.5 rather than 0. The maximum number of orthogonal comparisons is limited by the degrees of freedom of the omnibus effect, df_A = a − 1. In Equation 6.9, s_D is the standard deviation of the contrast difference scores, and n is the group size.
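The orthogonality criterion is easy to check mechanically. The weight sets below are illustrative examples for a three-group balanced design, not the contrasts of Table 6.1:

```python
# Two contrasts are orthogonal in a balanced design when the sum of the
# cross products of their weights is zero. Weight sets are illustrative.
def cross_product_sum(w1, w2):
    return sum(a * b for a, b in zip(w1, w2))

c1 = (1.0, -0.5, -0.5)  # group 1 vs. the average of groups 2 and 3
c2 = (0.0, 1.0, -1.0)   # group 2 vs. group 3
c3 = (0.5, 0.5, -1.0)   # overlaps with c1

print(cross_product_sum(c1, c2))  # 0.0  -> orthogonal
print(cross_product_sum(c1, c3))  # 0.75 -> not orthogonal
```

With a = 3 groups, df_A = 2, so at most two contrasts in any orthogonal set.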
The width of a simultaneous confidence interval for ψ is typically greater than the width of the individual confidence interval for ψ based on the same contrast. Standardize ψ̂ against the square root of the pooled within-groups variance for all groups in the design, MS_W. Standardize the dependent mean change relative to the standard deviation of the contrast difference scores, s_D.
Refer back to Table 6.4 and review the results of the independent samples analysis of the data in Table 6.3. Now look at the results of the dependent samples analysis in Table 6.5 for the same data.
When the samples are dependent, ESCI standardizes the contrast based on s_D, the standard deviation of the contrast difference scores. However, the values of the two estimators for the same effect converge as the sample size increases. The form of the estimated eta for a contrast is η̂_ψ = (SS_ψ/SS_T)^1/2, which is the absolute value of the bivariate correlation between the contrast and the outcome.
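The correlation effect size for a contrast follows directly from this ratio of sums of squares. The values below are made up for illustration and are not taken from the tables in this chapter:

```python
# Estimated eta for a contrast: the square root of the ratio of the
# contrast sum of squares to the total sum of squares. The sums of
# squares are hypothetical values for illustration.
import math

ss_psi = 30.0     # sum of squares for the contrast
ss_total = 120.0  # total sum of squares

eta_psi = math.sqrt(ss_psi / ss_total)
print(eta_psi)  # 0.5
```

Here the contrast accounts for 25% of the total variance, so the correlation between the contrast and the outcome is .50.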
In designs with independent samples, the partial η̂_ψ is the absolute value of the correlation between the contrast and the outcome, controlling for all noncontrast effects. Refer to Table 6.5 to scan the results of a dependent samples analysis of the same data.