Third, the book includes many concrete hypothetical data sets, as well as encouragement to use computer software (e.g., the SPSS statistical package and the Excel spreadsheet program) to confirm and further explore statistical dynamics. But more often, several different types of characterization are needed (e.g., "the data ranged from 83.0 to 116.5, and the most common score was 99.0").
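As a quick illustration of confirming such a characterization with software, here is a minimal Python sketch; the scores are hypothetical, chosen only so that the minimum, maximum, and mode match the quoted description.

```python
from statistics import multimode

# Hypothetical scores consistent with the quoted characterization
scores = [83.0, 99.0, 99.0, 104.5, 99.0, 91.0, 116.5]

print("range:", min(scores), "to", max(scores))   # 83.0 to 116.5
print("most common score:", multimode(scores))    # [99.0]
```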
Thompson (2000a) emphasized that “univariate and multivariate analyses of the same data [emphasis added] can produce results that are as different as night and day [emphasis added].” If we only have data from a subset of the population, our data set is called a sample.
For example, we can now make comparative statements such as "Patty has less authority than Steve." For example, we could argue that this person likes cabernet three times as much as chardonnay.
Internal design validity addresses concerns about whether we can be sure that the intervention caused the observed effects. History threats to internal design validity occur when unplanned events that are not part of the design take place during the intervention.
Although "statistics" is always about sample data,. descriptive statistics" are about either sample or population data. If two of the characterizations performed the same function, there would be fewer than four categories.
But it seems reasonable to require that the one number we use for a location descriptive statistic for a given data set must somehow be in the middle of the scores. The influential recommendations of the Task Force on Statistical Inference of the American Psychological Association (APA) emphasized that there are many “ways to include data or distributions in graphics.”
The median is the "center" of the scores in the sense that half of the scores are above the median and half are below the median. If (and only if) the data are interval-scaled, the distances of the scores from the median become meaningful.
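A short Python sketch (with made-up, interval-scaled scores) can make the "half above, half below" property concrete:

```python
from statistics import median

scores = [2, 5, 7, 9, 12, 15, 20]        # hypothetical interval-scaled data
md = median(scores)                      # 9

above = sum(1 for x in scores if x > md)   # 3 scores above the median
below = sum(1 for x in scores if x < md)   # 3 scores below the median
deviations = [x - md for x in scores]      # meaningful only for interval data
print(md, above, below, deviations)
```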
Or consulting other data for the student (e.g., a GRE verbal score of 780) may indicate that the student simply did not try. Unbiasedness refers to the tendency of a statistic, across repeated samples, to produce estimates whose average equals the corresponding parameter.
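One way to see unbiasedness in action is a small simulation, sketched below in Python (the population, sample size, and number of replications are arbitrary choices, not values from the book). Across many random samples, the sample mean and the n − 1 variance estimate average out to the population values, whereas dividing the sum of squared deviations by n systematically underestimates the population variance.

```python
import random
from statistics import mean, pvariance

random.seed(1)
population = [random.gauss(50, 10) for _ in range(10_000)]
mu, sigma2 = mean(population), pvariance(population)   # population parameters

n, reps = 5, 20_000
means, var_n1, var_n = [], [], []
for _ in range(reps):
    s = random.choices(population, k=n)     # random sample of n scores
    m = mean(s)
    ss = sum((x - m) ** 2 for x in s)
    means.append(m)
    var_n1.append(ss / (n - 1))             # unbiased variance estimator
    var_n.append(ss / n)                    # biased: too small on average

print(round(mu, 2), round(mean(means), 2))        # sample means average to mu
print(round(sigma2, 1), round(mean(var_n1), 1),   # n - 1 version averages near sigma2
      round(mean(var_n), 1))                      # n version is noticeably smaller
```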
Because these data are on an interval scale, the differences in scores from both the mean and median are meaningful. Then calculate (a) the difference, (b) the absolute difference, and (c) the squared difference of each score from the mean.
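A brief Python sketch of that exercise, using an arbitrary illustrative data set rather than the book's table:

```python
from statistics import mean

scores = [1, 2, 3, 4, 10]          # hypothetical data
m = mean(scores)                   # 4.0

diffs     = [x - m        for x in scores]   # (a) differences from the mean
abs_diffs = [abs(x - m)   for x in scores]   # (b) absolute differences
sq_diffs  = [(x - m) ** 2 for x in scores]   # (c) squared differences
print(diffs, abs_diffs, sq_diffs, sep="\n")
```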
The bad news is that the two post-test means suggest that the interventions were equally (in)effective. The results for these data (no mean change, but increased variability in achievement) are obvious given that the data set in the example is ridiculously small.
One candidate for descriptive dispersion statistics is the sum of individual deviation scores from the mean, where each xi = Xi – MX. The sum of deviation scores also has the desirable property of accounting for each score.
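However, as a quick check with any data set shows, the deviations from the mean always sum to exactly zero, which is why the simple sum cannot by itself serve as a dispersion statistic. A Python sketch with arbitrary data:

```python
from statistics import mean

scores = [3, 7, 7, 10, 13]                 # arbitrary illustrative data
m = mean(scores)                           # 8.0
deviations = [x - m for x in scores]       # x_i = X_i - M_X

print(sum(deviations))                     # 0.0 -- always, for any data set
```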
But why do sample data tend to underrepresent the score spread in the population, when we draw strictly random samples from the population? However, most of us feel more comfortable working in the unsquared metric of the score world.
Consequently, the variance is sometimes called the "second moment about the mean." Other descriptive statistics also use the mean as a benchmark for quantifying deviations, but use higher-order exponential powers of the deviations (i.e., 3 and 4), defining the third and fourth moments about the mean. If we modify the previous example to the case where n = 7, the mean under the condition of maximum score spread will be the same.
Her score pulls the average up from the original value of “2.00 recipes” to the new value of “3.40 recipes.” However, her score has an even more dramatic effect on the SD, which goes from the original value of “0.82 recipes” to the new value of “3.44 recipes”. A popular trimmed dispersion metric trims 25% of the scores at both distribution ends and then calculates the trimmed range.
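A minimal Python sketch of that trimming idea, using arbitrary data rather than the recipes example (with 25% trimmed from each tail, the trimmed range is essentially the range of the middle half of the scores):

```python
def trimmed_range(scores, prop=0.25):
    """Range of the scores remaining after trimming `prop` from each tail."""
    s = sorted(scores)
    k = int(len(s) * prop)              # number of scores dropped at each end
    kept = s[k:len(s) - k]
    return max(kept) - min(kept)

data = [1, 2, 2, 2, 3, 3, 4, 25]        # hypothetical data with one extreme score
print(max(data) - min(data))            # ordinary range: 24, dominated by the outlier
print(trimmed_range(data))              # trimmed range: 1, unaffected by the outlier
```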
These scores have no measurement metric other than standard deviation units, because the original metric has been removed from the scores via division. Dispersion statistics describe how similar or different the scores are to each other, or to a reference point (e.g., the mean).
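For example, converting raw scores to z scores in Python (illustrative data; dividing each deviation by the SD is what strips away the original metric):

```python
from statistics import mean, stdev

scores = [10, 12, 14, 18, 26]                 # hypothetical data in raw-score units
m, sd = mean(scores), stdev(scores)           # mean and (n - 1) standard deviation

z = [(x - m) / sd for x in scores]            # metric-free standard scores
print([round(v, 2) for v in z])
print(round(mean(z), 2), round(stdev(z), 2))  # z scores have mean 0 and SD 1
```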
Note that the skewness coefficient can never be calculated unless n is at least 3, due to the use of (n – 2) within a divisor. Note that the kurtosis coefficient can never be calculated unless n is at least 4, due to the use of (n – 3) in a divisor.
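One common form of these coefficients (the form used by packages such as SPSS and by Excel's SKEW and KURT functions; whether it matches the book's exact equations is an assumption) can be sketched in Python, and the (n − 2) and (n − 3) divisors are visible directly in the code:

```python
from statistics import mean, stdev

def skewness(x):
    # requires n >= 3; otherwise the (n - 2) factor makes the divisor zero
    n, m, sd = len(x), mean(x), stdev(x)
    return (n / ((n - 1) * (n - 2))) * sum(((v - m) / sd) ** 3 for v in x)

def kurtosis(x):
    # requires n >= 4; otherwise the (n - 3) factor makes the divisor zero
    n, m, sd = len(x), mean(x), stdev(x)
    term = (n * (n + 1)) / ((n - 1) * (n - 2) * (n - 3))
    return (term * sum(((v - m) / sd) ** 4 for v in x)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

data = [2, 4, 5, 5, 6, 9, 20]          # hypothetical, right-skewed data
print(round(skewness(data), 3), round(kurtosis(data), 3))
```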
The asymptotic properties of the normal distribution can be concretely understood if we determine the proportion (or, if multiplied by 100, the percentage) of data points below a given point, Xi, in a given normal distribution. So, for the six one-SD-wide regions between three standard deviations below and three standard deviations above the mean, which together contain 99.73% of the points in each of the infinitely many normal distributions, we have percentages of approximately 2.14%, 13.59%, 34.13%, 34.13%, 13.59%, and 2.14%.
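These proportions can be confirmed with a few lines of Python; any mean and SD would give the same answers, because every normal distribution has the same shape in z-score units:

```python
from statistics import NormalDist

z = NormalDist()                      # standard normal: mean 0, SD 1
cuts = [-3, -2, -1, 0, 1, 2, 3]

regions = [z.cdf(b) - z.cdf(a) for a, b in zip(cuts, cuts[1:])]
print([round(100 * p, 2) for p in regions])   # [2.14, 13.59, 34.13, 34.13, 13.59, 2.14]
print(round(100 * sum(regions), 2))           # 99.73
```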
Thus, some computer programs create the boxplot using Q1 and Q3 (i.e., the 25th and 75th percentiles, using Equation 2.1) rather than the hinges. The shapes of data distributions that are at least interval scaled can be characterized by statistics that quantify (a) symmetry (i.e., the coefficient of skewness) and (b) height relative to width (i.e., the coefficient of kurtosis).
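A small Python sketch of the Q1/Q3 variant, using linear interpolation between order statistics on made-up data; note that the book's Equation 2.1 and a given program's "hinges" may yield slightly different values for small n:

```python
from statistics import quantiles

data = [2, 4, 4, 5, 6, 7, 8, 9, 12, 30]                  # hypothetical data
q1, q2, q3 = quantiles(data, n=4, method="inclusive")    # 25th, 50th, 75th percentiles

print(q1, q2, q3)
print("IQR:", q3 - q1)        # box length in a boxplot built from Q1 and Q3
```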
Given that the covariance is a description of a bivariate relationship, which also requires interval scaling of both variables, why is COVXY itself not used as a description of bivariate association? This is not intended to suggest that the covariance is unimportant for other purposes that do not involve description.
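The difficulty, which a quick sketch makes visible, is that the covariance is expressed in the product of the two variables' metrics, so merely rescaling one variable changes COVXY even though the strength of the relationship is unchanged; dividing by both SDs (i.e., computing Pearson's r) removes the metric. A Python sketch with arbitrary data:

```python
from statistics import mean, stdev

def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def pearson_r(x, y):
    return cov(x, y) / (stdev(x) * stdev(y))

x = [1, 2, 3, 4, 5]                           # hypothetical scores
y = [2, 1, 4, 3, 5]

print(cov(x, y), pearson_r(x, y))             # covariance and r in the original metric
print(cov([10 * v for v in x], y),            # covariance inflates tenfold...
      pearson_r([10 * v for v in x], y))      # ...but r is unchanged
```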
An important implication of Equation 5.2 is that the point defined by the two means is always on the line of best fit. Note that the line of best fit does not capture any of the asterisks in the Figure 5.1 scattergram.
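Equation 5.2 is not reproduced here, but the standard least-squares slope and intercept make the "passes through both means" property easy to verify in Python (illustrative data, not the book's):

```python
from statistics import mean, stdev

x = [1, 2, 3, 4, 5]                    # hypothetical predictor scores
y = [2, 1, 4, 3, 5]                    # hypothetical outcome scores

mx, my = mean(x), mean(y)
r = (sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)) / (stdev(x) * stdev(y))
b = r * stdev(y) / stdev(x)            # least-squares slope
a = my - b * mx                        # intercept chosen so the line passes through (MX, MY)

print(a + b * mx, my)                  # predicted Y at X = MX equals MY exactly
```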
To make this discussion concrete, consider the quasi-hypothetical data (i.e., the data are approximate but true for a recent point in time) presented in Table 5.4. The example reflects what statisticians call the third variable problem (i.e., the problem that a third, fourth, etc. variable can spuriously inflate the correlation coefficient or, conversely, can spuriously lower it).
All of the asterisks in the upper right quadrant represent cases with Y and X scores above their respective means. All of the asterisks in the lower left quadrant represent cases with Y and X scores below their respective means.
How well do the two variables arrange the cases in exactly the same (or exactly the opposite) order? Spearman's ρ can also be calculated when one or both of the two variables are only ordinally scaled.
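Because Spearman's ρ is simply Pearson's r computed on the ranks, a short Python sketch (arbitrary scores with no ties) illustrates the idea:

```python
from statistics import mean, stdev

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

def ranks(values):                      # rank 1 = smallest; assumes no tied scores
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

x = [10, 40, 20, 50, 30]                # hypothetical scores
y = [1.2, 3.9, 2.0, 3.5, 2.8]

rho = pearson_r(ranks(x), ranks(y))     # Spearman's rho = Pearson's r on the ranks
print(rho)
```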
The algebraically equivalent form of the Pearson r for this combination of scale levels is the point-biserial correlation, rpb. For the correlation of the scores on item #1 with the total scores on the other test items, with the exception of item #1, we have rpb.
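A Python sketch of that item-total correlation, using hypothetical right/wrong item responses; rpb is computed here simply as Pearson's r between the 0/1 scores on item #1 and the totals on the remaining items:

```python
from statistics import mean, stdev

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Hypothetical right (1) / wrong (0) scores of five examinees on four items
items = [
    [1, 1, 0, 1],   # examinee 1
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

item1 = [row[0] for row in items]                 # scores on item #1
rest  = [sum(row[1:]) for row in items]           # totals on the other items

print(round(pearson_r(item1, rest), 3))           # r_pb for item #1
```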
First, if (a) r = 0 and (b) SDX = SDY, tracing the circumference of the bell on the floor will create a circle. Third, if r = –1 or +1, tracing the circumference of the bell on the floor will produce a straight line.
A heuristic example will clarify the differences between (a) the population distribution of N scores, (b) the sample distribution of n scores, and (c) the sampling distribution of statistics. The number of samples drawn for the sampling distribution from a given population is a function of the population size and the sample size.
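A tiny exhaustive example in Python (a made-up population of N = 5 scores, with samples of n = 2 drawn without replacement) shows how the number of possible samples depends on N and n, and also shows that the mean of the statistics in the sampling distribution equals the population parameter:

```python
from itertools import combinations
from statistics import mean
from math import comb

population = [2, 4, 6, 8, 10]                   # hypothetical population, N = 5
n = 2

samples = list(combinations(population, n))     # every possible sample of n scores
sample_means = [mean(s) for s in samples]       # the sampling distribution of the mean

print(len(samples), comb(len(population), n))   # 10 possible samples = C(5, 2)
print(mean(population), mean(sample_means))     # both equal 6.0
```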
For example, we might want (and expect) to reject the null hypothesis: "The average life expectancy of AIDS patients randomly assigned to take a new drug is equal to the average life expectancy of AIDS patients randomly assigned a placebo drug." But in other cases, we may not want to reject the null hypothesis: "The proportion of side effects that occur with a new AIDS drug is equal to the proportion of side effects that occur in AIDS patients taking placebo drugs." The decision to reject the null can be communicated by saying, "The results were statistically significant." All this term means is that you rejected the null hypothesis because the sample statistic would be very unlikely if the null hypothesis were true.
Note also that the average of the statistics in all three sampling distributions in Table 6.5 exactly corresponds to the population parameter. The mean of the statistics in a sampling distribution for unbiased estimators will correspond to the population parameter being estimated.
This reflects a dynamic stated in what is called the central limit theorem, which states that as n gets larger, the sampling distribution of the mean will approach normality even if the shape of the population is non-normal. And we have no way of knowing whether our particular estimate, 24.0 months, came from the middle, or the low or high end, of the sampling distribution.
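A brief simulation sketch in Python (an arbitrary, strongly skewed population and arbitrary sample sizes, chosen only for illustration) shows the central limit theorem at work: the sampling distribution of the mean becomes less skewed, and more nearly normal, as n grows:

```python
import random
from statistics import mean

random.seed(2)
population = [random.expovariate(1.0) for _ in range(50_000)]   # strongly right-skewed

def skew(values):
    m = mean(values)
    sd = (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

for n in (2, 10, 50):
    sample_means = [mean(random.choices(population, k=n)) for _ in range(5_000)]
    print(n, round(skew(sample_means), 2))   # skew of the sampling distribution shrinks toward 0
```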
For the null test that the two variances are equal (but not necessarily for other applications involving different hypotheses), the test statistic has its own formula. Note the logically expected use of Pearson's r for the paired scores in the formula for the SE.
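For the dependent-samples mean comparison, the SE of the mean difference does indeed incorporate Pearson's r between the paired scores. A minimal Python sketch under the usual textbook formula SE = sqrt((SD1² + SD2² − 2·r·SD1·SD2) / n), with made-up pre/post scores (this standard formula is an assumption here, not necessarily the book's exact equation):

```python
from statistics import mean, stdev
from math import sqrt

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

pre  = [10, 12, 9, 14, 11, 13]       # hypothetical paired scores
post = [12, 15, 10, 17, 12, 16]
n = len(pre)

r = pearson_r(pre, post)
se = sqrt((stdev(pre) ** 2 + stdev(post) ** 2
           - 2 * r * stdev(pre) * stdev(post)) / n)

# Equivalent check: the SE of the mean of the difference scores
diffs = [b - a for a, b in zip(pre, post)]
print(round(se, 4), round(stdev(diffs) / sqrt(n), 4))   # identical values
```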
To understand power (and thus β), we need to understand the relationships of four characteristics of a given study: (a) n, (b) α, (c) β, and (d) effect size. First, for a fixed effect size (both r and r2 are among the dozens of effect size statistics), the sample sizes at which the fixed result goes from statistically significant to statistically non-significant (or vice versa) can be determined.
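For example, a short Python sketch can locate the smallest sample size at which a fixed r first becomes statistically significant at α = .05 (two-tailed). The significance test used here is the Fisher r-to-z approximation, z = atanh(r)·sqrt(n − 3), which is a standard approximation but an assumption relative to whatever exact test the book uses:

```python
from math import atanh, sqrt
from statistics import NormalDist

def smallest_significant_n(r, alpha=0.05):
    """Smallest n at which a fixed sample r is statistically significant,
    using the Fisher r-to-z approximation (two-tailed)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # 1.96 for alpha = .05
    n = 4
    while atanh(r) * sqrt(n - 3) < z_crit:
        n += 1
    return n

for r in (0.10, 0.30, 0.50):
    print(r, smallest_significant_n(r))   # the same fixed r needs a larger n when r is small
```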
The pCALCULATED estimates the probability of the sample statistic(s) (and of sample results even more extreme in their deviation from the null hypothesis than our sample results), assuming (a) the sample comes from a population exactly described by the null hypothesis, and (b) given the sample size. Which of the following correctly lists these studies in the order of largest to smallest pCALCULATED?