160 Chapter 3 Association: Contingency, Correlation, and Regression
This example shows that the correlation and the regression line are nonresistant: They are prone to distortion by outliers. Investigate any regression outlier. Was the observation recorded incorrectly, or is it merely different from the rest of the data in some way? It is often a good idea to refit the regression line without the outlier to see whether it has a large effect, as we did in this last example.
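The refitting check described above can be sketched in a few lines. This is a minimal illustration, not the analysis from the example: the data values are hypothetical, and `fit_line` is a helper name introduced here. It fits the least-squares line and the correlation twice, once with every point and once with the suspected outlier dropped.

```python
import numpy as np

def fit_line(x, y):
    """Return (slope, intercept, r) for the least-squares line of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r

# Hypothetical data: eight points close to the line y = x,
# followed by one regression outlier (the last point).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 20.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.8, 8.1, 2.0])

# Boolean mask that keeps every observation except the last one.
keep = np.arange(len(x)) != len(x) - 1

with_outlier = fit_line(x, y)
without_outlier = fit_line(x[keep], y[keep])

print("with outlier:    slope=%.2f, r=%.2f" % (with_outlier[0], with_outlier[2]))
print("without outlier: slope=%.2f, r=%.2f" % (without_outlier[0], without_outlier[2]))
```

In this made-up data set a single observation is enough to pull both the slope and the correlation close to zero, which is the behavior the term nonresistant describes.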
Section 3.4 Cautions in Analyzing Associations 161
Figure 3.21 Scatterplot of Crime Rate and Percentage With at Least a High School Education. The horizontal axis is education (% completing high school) and the vertical axis is crime rate. There is a moderate positive association (r = 0.47). Question Does more education cause more crime, or does more crime cause more education, or possibly neither?
Think It Through
The strong correlation of 0.79 between urbanization and education tells us that highly urban counties tend to have higher education levels. The moderately strong correlation of 0.68 between urbanization and crime rate tells us that highly urban counties also tend to have higher crime. So, perhaps the reason for the positive correlation between education and crime rate is that education tends to be greater in more highly urbanized counties, but crime rates also tend to be higher in such counties. In summary, a correlation could occur without any causal connection.
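One way to put a number on this reasoning is a first-order partial correlation. The text itself does not use this technique; it is brought in here only as an illustration, and it assumes the relationships are roughly linear. The function name `partial_correlation` is introduced for this sketch. The inputs are the two correlations quoted above, together with the education and crime rate correlation from the Figure 3.21 caption, read here as r = 0.47.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

r_edu_crime = 0.47    # education vs. crime rate (Figure 3.21 caption)
r_urban_edu = 0.79    # urbanization vs. education
r_urban_crime = 0.68  # urbanization vs. crime rate

adjusted = partial_correlation(r_edu_crime, r_urban_edu, r_urban_crime)
print(round(adjusted, 2))  # -0.15
```

Under these assumptions the adjusted value is slightly negative, which fits the idea that the positive correlation between education and crime rate may largely reflect urbanization.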
Insight
For counties with similar levels of urbanization, the association between crime rate and education may look quite different. You may then see a negative correlation. Figure 3.22 portrays how this could happen. It shows a negative trend between crime rate and education for counties having urbanization = 0 (none of the residents living in a metropolitan area), a separate negative trend for counties having urbanization = 50, and a separate negative trend for counties having urbanization = 100. If we ignore the urbanization values and look at all the points, however, we see a positive trend, with higher crime rates tending to occur with higher education levels, as reflected by the overall positive correlation.
Figure 3.22 Hypothetical Scatter Diagram Relating Crime Rate and Education. The horizontal axis is education and the vertical axis is crime rate. The points are also labeled by whether urbanization = 0, 50, or 100. Question Sketch lines that represent (a) the overall positive relationship between crime rate and education, and (b) the negative relationship between crime rate and education for counties having urbanization = 0.
Try Exercises 3.53 and 3.57
Whenever two variables are associated, other variables may have influenced that association. In Example 14, urbanization influenced the association between crime rate and education. This illustrates an important point: Correlation does not imply causation.
In Example 14, crime rate and education were positively correlated, but that does not mean that having a high level of education causes a county’s crime rate to be high. Whenever we observe a correlation between variables x and y, there may be a third variable correlated with both x and y that is responsible for their association. Let’s look at another example to illustrate this point.
Example 15
Ice Cream and Drowning
Picture the Scenario
The Gold Coast of Australia, south of Brisbane, is famous for its beaches. Because of strong rip tides, however, each year many people drown. Data collected monthly show a positive correlation between y = number of people who drowned in that month and x = number of gallons of ice cream sold in refreshment stands along the beach in that month.
Question to Explore
Clearly, the high sales of ice cream don’t cause more people to drown. Identify another variable that could be responsible for this association.
Think It Through
In the summer in Australia (especially January and February), the weather is hot. People tend to buy more ice cream in those months. They also tend to go to the beach and swim more in those months, and more people drown. In the winter, it is cooler. People buy less ice cream, fewer people go to the beach, and fewer people drown. So, the mean temperature in the month is a variable that could be responsible for the correlation. As mean temperature goes up, so do ice cream sales and so does the number of people who drown.
Insight
If we looked only at months having similar mean temperatures, we would probably not observe any association between ice cream sales and the number of people who drown.
Try Exercise 3.54
Lurking variable
A third variable that is not measured in a study (or perhaps is not even known to the researchers) but that influences the association between the response variable and the explanatory variable is referred to as a lurking variable.
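A small simulation can make the idea of a lurking variable concrete. The sketch below is hypothetical throughout: the temperature range, the coefficients, and the number of simulated months are all invented for illustration and are not taken from any Gold Coast data. Temperature drives both ice cream sales and drownings, and neither variable affects the other.

```python
import numpy as np

rng = np.random.default_rng(0)
n_months = 5000  # hypothetical: many simulated months so the estimates are stable

# Lurking variable: monthly mean temperature (degrees Celsius, made-up range).
temperature = rng.uniform(10, 35, n_months)

# Both variables depend on temperature only, never on each other.
ice_cream = 50 * temperature + rng.normal(0, 100, n_months)  # gallons sold
drownings = rng.poisson(0.4 * temperature)                   # count per month

# Correlation when temperature is ignored.
overall_r = np.corrcoef(ice_cream, drownings)[0, 1]

# Correlation among months with similar mean temperature (24 to 26 degrees).
similar = (temperature >= 24) & (temperature < 26)
within_r = np.corrcoef(ice_cream[similar], drownings[similar])[0, 1]

print("all months:             r = %.2f" % overall_r)
print("similar-temperature months: r = %.2f" % within_r)
```

In this setup the correlation across all months is clearly positive, while among months with similar temperatures it is close to zero, even though the simulation contains no causal link between ice cream and drowning.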
In interpreting the positive correlation between crime rate and education for Florida counties, we’d be remiss if we failed to recognize that the correlation could be due to a lurking variable. This could happen if we observed those two variables but not urbanization, which would then be a lurking variable. Likewise, if we got excited by the positive correlation between ice cream sales and the number of people who drowned in a month, we’d fail to recognize that the monthly mean temperature is a lurking variable. The difficulty in practice is that often we have no clue what the lurking variables may be.
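The crime rate and education pattern can be imitated with simulated counties. The sketch below does not use the Florida data. Every number in it is an assumption chosen so that crime rate falls with education inside each urbanization group, while both variables rise with urbanization.

```python
import numpy as np

rng = np.random.default_rng(1)
levels = np.array([0, 50, 100])  # urbanization values, as in the hypothetical figure
n_per_level = 200                # hypothetical number of counties per level

urban = np.repeat(levels, n_per_level)

# Education rises with urbanization (coefficients are made up).
education = 55 + 0.25 * urban + rng.normal(0, 3, urban.size)

# Crime rises with urbanization but falls with education at a fixed urbanization.
crime = 75 + 1.15 * urban - 1.0 * education + rng.normal(0, 5, urban.size)

# Correlation when urbanization is ignored (it acts as a lurking variable).
overall_r = np.corrcoef(education, crime)[0, 1]

# Correlation within each urbanization level.
within_r = {
    int(level): np.corrcoef(education[urban == level], crime[urban == level])[0, 1]
    for level in levels
}

print("ignoring urbanization: r = %.2f" % overall_r)
for level, r in within_r.items():
    print("urbanization = %3d:    r = %.2f" % (level, r))
```

With urbanization unobserved, only the positive overall correlation would be visible. Separating the counties by urbanization level reveals the negative association within each group.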