Chapter 10: Correlation and Regression
Chapter 13: Nonparametric Statistics
Objectives:
• Learn how to draw a scatter plot for a set of ordered pairs.
• Learn how to compute the correlation coefficient.
• Learn how to compute the equation of the regression line.
• Learn how to compute the Spearman rank correlation coefficient.
Overview of Chapters 10 and 13
Sec. #    Title                                        Page(s)
10 – 1    Scatter Plots and Correlation                369 – 385
13 – 6    The Spearman Rank Correlation Coefficient    459 – 461
10 – 2    Regression                                   386 – 393
Remember?
The independent variable influences the dependent variable.
At a Glance!
• Are two or more variables linearly related?
(Scatter plot and/or correlation coefficient)
• If so, what is the strength of the relationship?
(Scatter plot and/or correlation coefficient)
• What type of relationship exists?
(Scatter plot, correlation coefficient and/or regression)
• What kind of predictions can be made from the relationship?
(Regression)
10 – 1: Scatter Plots and Correlation
• A scatter plot is a graph of the ordered pairs of numbers (x, y) consisting of the independent variable x and the dependent variable y.
10 – 1: Scatter Plots and Correlation (cont.)
• It is a visual way to describe the nature of the relationship between x and y. It may show:
  – a positive linear relationship,
  – a negative linear relationship,
  – a curvilinear relationship,
  – or no relationship.
• Example 10 – 1, page 372; Example 10 – 2, pages 372 – 373; Example 10 – 3, page 373.
Examples of scatter plot patterns
Correlation
• Pearson's linear correlation coefficient, denoted by $r$, measures the strength and the direction of a linear relationship between two quantitative variables.
Calculating r
• The linear correlation coefficient is given by
$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
where $n$ is the number of data pairs.
• This coefficient is also known as the Pearson product moment correlation coefficient (PPMC).
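As a rough illustration of the formula, the following Python sketch computes r directly from the five sums. The function name `pearson_r` and the small data set are hypothetical, chosen only so that the result is easy to check by hand.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation via the sum-based (computational) formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    # The denominator is zero if either variable is constant;
    # r is undefined in that case and this sketch does not handle it.
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical points lying exactly on the line y = 2x.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
print(pearson_r(xs, ys))
```

Because the hypothetical points fall exactly on an increasing line, the printed value should be 1; reversing the order of `ys` should give -1.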
Properties of r
• The correlation coefficient ranges from −1 to +1.
• If the value of $r$ is close to +1, then there is a strong positive linear relationship between the variables.
• If the value of $r$ is close to −1, then there is a strong negative linear relationship between the variables.
• If the value of $r$ is close to 0, then there is either a weak linear relationship or no linear relationship between the variables.
Example 10 – 4: Car Rental Companies
# of Cars (x) Revenue (y)
63 7
29 3.9
20.8 2.1
19.1 2.8
13.4 1.4
8.5 1.5
• From the table, we obtain:
$\sum x = 153.8$,
$\sum y = 18.7$,
$\sum xy = 682.77$,
$\sum x^2 = 5859.26$,
$\sum y^2 = 80.67$.
Example 10 – 4 (cont.)
$$r = \frac{6(682.77) - (153.8)(18.7)}{\sqrt{\left[6(5859.26) - (153.8)^2\right]\left[6(80.67) - (18.7)^2\right]}} = 0.982$$
• Hence, there is a strong positive linear relationship between the number of rented cars and revenue.
• Example 10 – 5, page 377 (negative correlation); Example 10 – 6, page 378 (weak positive correlation).
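The arithmetic in Example 10 – 4 can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the data are the six (x, y) pairs from the table in the example.

```python
from math import sqrt

# (number of cars, revenue) pairs from the Example 10-4 table
cars = [63, 29, 20.8, 19.1, 13.4, 8.5]
revenue = [7, 3.9, 2.1, 2.8, 1.4, 1.5]

n = len(cars)
sum_x = sum(cars)
sum_y = sum(revenue)
sum_xy = sum(x * y for x, y in zip(cars, revenue))
sum_x2 = sum(x * x for x in cars)
sum_y2 = sum(y * y for y in revenue)

# Same sum-based formula for r as used in the example
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(f"sum x = {sum_x:.1f}, sum y = {sum_y:.1f}")
print(f"sum xy = {sum_xy:.2f}, sum x^2 = {sum_x2:.2f}, sum y^2 = {sum_y2:.2f}")
print(f"r = {r:.3f}")
```

Printing the intermediate sums makes it easy to compare each one against the values listed in the example before looking at r itself.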
13 – 6: The Spearman Rank Correlation Coefficient
• If $n$ is the sample size and $d$ is the difference in ranks, then the Spearman rank correlation coefficient is calculated as
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
Example 13 – 7: Bank Branches and Deposits (page 459)
# of branches (X)    Deposits (Y)    Rank (X)    Rank (Y)
209                  23              4           4
353                  31              2           1
19                   7               8           6
201                  12              5           5
344                  26              3           2
132                  5               6           7
401                  24              1           3
126                  4               7           8
Example 13 – 7 (cont.)
Rank (X)    Rank (Y)    $d$     $d^2$
4           4           0       0
2           1           1       1
8           6           2       4
5           5           0       0
3           2           1       1
6           7           -1      1
1           3           -2      4
7           8           -1      1
$\sum d^2 = 12$
Example 13 – 7 (cont.)
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \cdot 12}{8(8^2 - 1)} = 1 - \frac{72}{504} = 0.857$$
• This value indicates a strong positive correlation.
• Spearman's correlation can also be calculated when the data are ordinal-level qualitative.
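The ranking and the formula in Example 13 – 7 can be sketched in Python as follows. The function names are hypothetical, rank 1 is given to the largest value as in the example, and the sketch assumes there are no ties. The last deposit is taken as 4, the value consistent with the distinct ranks 7 and 8 in the table.

```python
def descending_ranks(values):
    """Rank 1 goes to the largest value; assumes all values are distinct."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(xs, ys):
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(xs)
    rank_x = descending_ranks(xs)
    rank_y = descending_ranks(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Data from Example 13-7
branches = [209, 353, 19, 201, 344, 132, 401, 126]
deposits = [23, 31, 7, 12, 26, 5, 24, 4]

print(descending_ranks(branches))
print(descending_ranks(deposits))
print(round(spearman_rs(branches, deposits), 3))
```

With tied values the simple formula no longer applies exactly, so tied data would need average ranks and a different computation.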
10 – 2: Regression
• If the value of the correlation coefficient is significant, the next step is to determine the equation of the regression line, which is the data's line of best fit.
• Best fit means that the sum of the squares of the vertical distances from each point to the line is at a minimum.
Line of best fit
Determination of the Regression Line Equation
• The equation of the regression line is:
$$y' = a + bx$$
• Here, $a$ is the intercept (the regression constant), $b$ is the slope (the regression coefficient), and $x$ is the observed independent variable; together they are used to calculate $y'$, the predicted value of the dependent variable.
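Determination of the Regression Line Equation (cont.)
$$a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\sum x^2 - \left(\sum x\right)^2}$$
$$b = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}$$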
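A minimal Python sketch of these two formulas is shown below. The function name `regression_line` and the four data points are hypothetical; the points lie exactly on y = 1 + 2x so that the expected intercept and slope are known in advance.

```python
def regression_line(xs, ys):
    """Return (a, b) for y' = a + b*x using the sum-based formulas."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Shared denominator; it is zero when every x is identical.
    denominator = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b


# Hypothetical points lying exactly on y = 1 + 2x.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
a, b = regression_line(xs, ys)
print(f"y' = {a} + {b}x")
```

Both coefficients share the same denominator, so it is computed once and reused.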
Example 10 – 9 (page 388)
• The number of rented cars is the independent variable $x$, while the revenue is the dependent variable $y$. The regression line is found to be:
$$y' = 0.396 + 0.106x$$
• This means that as the number of rented cars increases by 1 unit, the revenue increases by 0.106 on average.
Example 10 – 10 (page 389)
• The number of absences is the independent variable $x$, while the final grade is the dependent variable $y$. The regression line is found to be:
$$y' = 102.493 - 3.622x$$
• This means that as the number of absences increases by 1, the final grade decreases by 3.622 on average.
Example 10 – 11 (page 391)
• Predict the income of a car rental agency ($y$) that has 200,000 automobiles ($x$).
• Note that in Example 10 – 1, the number of rented automobiles is measured in ten thousands. Therefore, 200,000 automobiles corresponds to 20 ten-thousands, i.e. $x = 20$. Hence,
$$y' = 0.396 + 0.106(20) = 2.516$$
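The unit conversion is the easy step to get wrong here, so the prediction can be sketched in Python with the conversion made explicit. The constant and function names are hypothetical; the coefficients 0.396 and 0.106 are the rounded values used in the prediction above.

```python
INTERCEPT = 0.396       # a, rounded regression constant
SLOPE = 0.106           # b, rounded regression coefficient
CARS_PER_UNIT = 10_000  # x is measured in ten thousands of automobiles


def predict_revenue(automobiles):
    """Predict y' = a + b*x after converting a raw car count to x."""
    x = automobiles / CARS_PER_UNIT
    return INTERCEPT + SLOPE * x


print(round(predict_revenue(200_000), 3))
```

Passing the raw count 200,000 directly as x, without dividing by 10,000, would give a badly inflated prediction, which is why the conversion is kept inside the function.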
Important Rule!
• Q. Is there any relationship between Pearson's correlation coefficient and the regression coefficient $b$?
• A. The sign of the correlation coefficient and the sign of the slope of the regression line will always be the same.
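One way to see why the signs agree, sketched from the two sum-based formulas in this chapter: $r$ and $b$ have the same numerator, and both denominators are positive whenever $x$ and $y$ each take more than one value.

$$b = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}, \qquad r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

Dividing one by the other gives

$$b = r\,\sqrt{\frac{n\sum y^2 - \left(\sum y\right)^2}{n\sum x^2 - \left(\sum x\right)^2}}$$

Since the square root is positive, $b$ and $r$ always have the same sign.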
Application Summary
Measure    Excel only    Excel + MegaStat
Scatter plot    ✓
Pearson's linear correlation coefficient    ✓
Spearman's correlation coefficient    ✓
Regression equation    ✓