© FAROUQ MOHAMMAD A. ALAM
Chapter 9
SIMPLE LINEAR REGRESSION AND
CORRELATION
Learning Outcomes
► After studying this chapter, the student will:
1. be able to obtain a simple linear regression model and use it to make predictions.
2. be able to calculate the coefficient of determination and to interpret tests of regression coefficients.
3. be able to calculate correlations among variables.
4. understand how regression and correlation differ and when the use of each is appropriate.
9.1 INTRODUCTION
9.2 THE REGRESSION MODEL
Regression vs. Correlation
► Regression is an inferential method that is usually employed to predict or estimate the value of one variable corresponding to a given value of another variable.
► Correlation is an inferential method concerned with measuring the strength of the relationship between variables.
► The fundamentals of regression analysis are based on the theory of conditional probability.
Types of Variables in Regression Analysis
► An independent variable or predictor variable (X) is the variable being controlled by the investigator.
► A dependent variable or response variable (Y) is the variable being influenced by the independent variable.
Independent variable vs. dependent variable
Independent variable → influences → Dependent variable
9.3 THE SAMPLE REGRESSION EQUATION
9.4. EVALUATING THE REGRESSION EQUATION
9.5. USING THE REGRESSION EQUATION
The Scatter Diagram
► A scatter diagram consists of points that are plotted by assigning values of the independent variable X to the horizontal axis and values of the dependent variable Y to the vertical axis.
The Least-Squares Line
► The method of least squares is used to obtain the least-squares regression line, which has the following form:
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
► Here, $\hat{y}$ is the predicted value of the dependent variable, $x$ is the corresponding value of the independent variable on which the prediction is based, $\hat{\beta}_0$ is the point where the line crosses the vertical axis (i.e., the intercept), and $\hat{\beta}_1$ shows the amount by which $y$ changes for each unit change in $x$ (i.e., the slope).
The Least-Squares Line (cont.)
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
► The above equation means that as the independent variable increases by 1 unit, the dependent variable increases by $\hat{\beta}_1$ on average.
$\hat{y} = \hat{\beta}_0 - \hat{\beta}_1 x$
► The above equation means that as the independent variable increases by 1 unit, the dependent variable decreases by $\hat{\beta}_1$ on average.
The Least-Squares Line (cont.)
$\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
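The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `least_squares_fit` and the five data points are made up for the example.

```python
from statistics import mean

def least_squares_fit(x, y):
    """Return (intercept, slope) of the least-squares line for paired data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    # Numerator of the slope: sum of cross-products of deviations from the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator of the slope: sum of squared deviations of x from its mean.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up illustrative data (NOT the data of Table 9.3.1).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares_fit(x, y)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")
```

For these made-up points the deviations give a slope of 6/10 = 0.6 and an intercept of 4 - 0.6(3) = 2.2.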
Types of Relations from Scatter Diagram
Types of Relations from Scatter Diagram (cont.)
► A positive $\hat{\beta}_1$ indicates that values of Y tend to increase as values of X increase, and we say that there is a direct (positive) linear relationship between X and Y.
► A negative $\hat{\beta}_1$ indicates that values of Y tend to decrease as values of X increase, and we say that there is an inverse (negative) linear relationship between X and Y.
► When there is no linear relationship between X and Y, $\hat{\beta}_1 = 0$.
The Least-Squares Line (cont.)
► The least-squares regression line is called the “best fit” line for describing the relationship between our two variables, since the sum of the squared vertical deviations of the observed data points ($y_i$) from the least-squares regression line is smaller than the sum of the squared vertical deviations of the data points from any other line.
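The "best fit" property can be checked numerically. The sketch below uses made-up data whose least-squares line is y_hat = 2.2 + 0.6 x, and compares its sum of squared vertical deviations with that of a few arbitrarily chosen alternative lines. The helper name `sse` and the candidate lines are assumptions for illustration only.

```python
def sse(x, y, intercept, slope):
    """Sum of squared vertical deviations of the points from a given line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Made-up illustrative data; its least-squares line is y_hat = 2.2 + 0.6 x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

best = sse(x, y, 2.2, 0.6)
# Three other candidate lines, chosen arbitrarily for comparison.
others = [sse(x, y, 2.0, 0.6), sse(x, y, 2.2, 0.7), sse(x, y, 1.0, 1.0)]

print("least-squares line:", round(best, 2))
print("other lines:", [round(v, 2) for v in others])
```

Each alternative line yields a larger sum than the least-squares line, which is what the "best fit" statement asserts for every other line, not only these three.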
Using the Fitted Equation
► The fitted equation can be used to obtain a prediction for the value of Y given a value of X.
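As a small sketch of this step, prediction amounts to substituting a value of X into the fitted line. The coefficients below are hypothetical and chosen only for illustration; they are not the estimates from any example in this chapter.

```python
def predict(intercept, slope, x_value):
    """Predicted Y from the fitted line y_hat = intercept + slope * x."""
    return intercept + slope * x_value

# Hypothetical coefficients, for illustration only.
b0, b1 = 2.2, 0.6
y_hat = predict(b0, b1, 4)
print(round(y_hat, 2))
```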
MegaStat Application
Example 9.3.1
► Table 9.3.1 shows measurements taken on each subject: deep abdominal adipose tissue (AT), obtained by CT, and waist circumference (in cm). Construct a scatter plot and perform a regression analysis, given that deep abdominal AT is the dependent variable and waist circumference is the independent variable.
Example 9.4.2
► We wish to know if we can conclude that the slope of the population regression line describing the relationship between X and Y is zero. Also, predict Y and estimate the mean of Y for a waist circumference of 1 m.
► Important Note: Recall that X is measured in cm. Before predicting or estimating the mean, check the measurement units. Here, convert 1 m to cm by multiplying 1 by 100, then predict Y or estimate its mean based on X = 100.
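The two tasks in this example can be sketched as follows. The slides do this in MegaStat, so the code is only an illustration under stated assumptions: the data are made up (they are NOT the values of Table 9.3.1), the function name `fit_with_slope_se` is hypothetical, and the test shown is the usual t test of a zero slope with n - 2 degrees of freedom, whose statistic would then be compared with a tabulated critical value.

```python
from math import sqrt
from statistics import mean

def fit_with_slope_se(x, y):
    """Return (intercept, slope, standard error of the slope)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Standard error of the slope: sqrt(MSE / S_xx), with MSE = SSE / (n - 2).
    se_slope = sqrt(sse / (n - 2) / s_xx)
    return intercept, slope, se_slope

# Made-up data: waist circumference in cm and deep abdominal AT.
waist_cm = [70, 80, 90, 100, 110]
deep_at = [30, 70, 90, 140, 170]

b0, b1, se_b1 = fit_with_slope_se(waist_cm, deep_at)
t_stat = b1 / se_b1          # test statistic for H0: slope = 0
df = len(waist_cm) - 2       # degrees of freedom

# Units check: the model was fitted with X in cm, so 1 m becomes 100 cm.
x_new_cm = 1.0 * 100
# The point prediction of Y and the point estimate of the mean of Y coincide.
y_hat = b0 + b1 * x_new_cm

print(f"slope = {b1:.3f}, t = {t_stat:.2f}, df = {df}")
print(f"predicted Y at X = {x_new_cm:.0f} cm: {y_hat:.1f}")
```

Predicting at X = 1 instead of X = 100 would extrapolate far outside the observed waist measurements, which is why the units check comes first.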
Regression Analysis (Scatter Diagram) (MegaStat Application)
1. In Data Ribbon, click on MegaStat icon, then select Correlation / Regression.
2. Select Scatterplot.
3. Input the range of X.
4. Input the range of Y.
5. Uncheck the option if you do not want to include the regression line in the scatterplot.
► The scatter diagram indicates a direct linear relationship between X and Y.
► The equation of the fitted line is $\hat{y} = \hat{\beta}_0 + 3.489\,x$, with a negative intercept $\hat{\beta}_0$.
► The regression equation means that as the waist measurement increases by 1 unit, the deep abdominal AT increases by 3.489 units on average.
Regression Analysis (MegaStat Application)
1. In Data Ribbon, click on MegaStat icon, then select Correlation / Regression.
2. Select Regression.
3. Input the range of X.
4. Input the range of Y.
5. Change the option in the drop-down box to “Type in predictor values”, add the values of X, and then press OK.
► The coefficient of determination is equal to 0.670.
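For reference, the coefficient of determination can be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses made-up data and its least-squares coefficients (2.2 and 0.6); the function name `r_squared` is an assumption for illustration, and the result is unrelated to the 0.670 reported by MegaStat.

```python
from statistics import mean

def r_squared(x, y, intercept, slope):
    """Coefficient of determination: 1 - SSE / SST."""
    y_bar = mean(y)
    # Unexplained variation: squared deviations from the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Total variation: squared deviations from the mean of y.
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst

# Made-up illustrative data and its least-squares coefficients.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2 = r_squared(x, y, 2.2, 0.6)
print(round(r2, 3))
```

A value of 0.670 would be read the same way: about 67% of the variation in Y is explained by the linear relationship with X.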
► The intercept and the slope of the regression equation.
► The predicted value of Y given that X = 100 is $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1(100)$, as reported in the prediction output.
9.6. THE CORRELATION MODEL
9.7 THE CORRELATION COEFFICIENT
Regression Analysis vs. Correlation Analysis
► Regression analysis describes the relationship between the dependent (Y) and independent (X) variables for the purposes of prediction.
► Correlation analysis is used to determine the strength and direction of the relationship between Y and X.
► The population correlation coefficient $\rho$ measures the direction and strength of the linear relationship between X and Y. It is known as Pearson's correlation coefficient.
Properties of ANY Correlation Coefficient
► The range of the correlation coefficient is from -1 to +1.
► If the value of the correlation coefficient is close to +1, then there is a strong direct linear relationship between the variables.
► If the value of the correlation coefficient is equal to +1, then there is a perfect direct linear relationship between the variables.
Properties of ANY Correlation Coefficient (cont.)
► If the value of the correlation coefficient is close to -1, then there is a strong inverse linear relationship between the variables.
► If the value of the correlation coefficient is equal to -1, then there is a perfect inverse linear relationship between the variables.
► If the value of the correlation coefficient is close to 0 (from the positive side), then there is a weak direct linear relationship between the variables.
Properties of ANY Correlation Coefficient (cont.)
► If the value of the correlation coefficient is close to 0 (from the negative side), then there is a weak inverse linear relationship between the variables.
► If the value of the correlation coefficient is equal to 0, then there is no linear relationship between the variables.
► The sign of the correlation coefficient is the same as the sign of the slope of the regression equation.
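One way to see why the signs agree is sketched below for the sample correlation coefficient $r$ and the least-squares slope $\hat{\beta}_1$. The shorthand $S_{xy}$, $S_{xx}$, $S_{yy}$ is introduced here for convenience and does not appear in the slides.

```latex
\[
S_{xy} = \sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}), \quad
S_{xx} = \sum_{i=1}^{n}(x_i-\bar{x})^2, \quad
S_{yy} = \sum_{i=1}^{n}(y_i-\bar{y})^2
\]
\[
r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}, \qquad
\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = r\,\sqrt{\frac{S_{yy}}{S_{xx}}}
\]
```

Because the square root is positive whenever both variables vary, $\hat{\beta}_1$ and $r$ are both positive, both negative, or both zero.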
The Sample Correlation Coefficient
► The sample correlation coefficient $r$ describes the linear relationship between the sample observations of the two variables in the same way as the population correlation coefficient $\rho$.
► The sample correlation coefficient is calculated using the following formula:
$r = \dfrac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$
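The computational formula above translates almost term by term into code. This is a minimal sketch with made-up data; the function name `sample_correlation` is an assumption for illustration and the data are not those of Table 9.7.1.

```python
from math import sqrt

def sample_correlation(x, y):
    """Pearson's r from the computational (sums-based) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Made-up illustrative data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = sample_correlation(x, y)
print(round(r, 3))
```

For these points the numerator is 5(66) - (15)(20) = 30 and the denominator is the square root of (50)(30) = 1500, so r is positive, in agreement with the positive slope of the least-squares line for the same data.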
MegaStat Application
Example 9.7.1 and Example 9.7.2
► Table 9.7.1 shows a subject's height (cm) and the peak spinal latency (Cv) of the SEP (a type of electrical activity of the brain). Investigate the relationship between a subject's height and the Cv of the SEP. Use the sample correlation coefficient to check whether it is of sufficient magnitude to indicate that, in the population, height and Cv SEP levels are correlated.
Correlation Analysis (MegaStat Application)
1. In Data Ribbon, click on MegaStat icon, then select Correlation / Regression.
2. Select Correlation Matrix.
3. Input the range of information.
6. Choose “greater than”.
► MegaStat indicates that the correlation coefficient is equal to 0.848 (strong positive linear relationship).
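Whether a sample correlation is "of sufficient magnitude" is usually judged with a t statistic on n - 2 degrees of freedom. The sketch below shows that calculation under loud assumptions: the function name `correlation_t_statistic` is hypothetical, and the sample size n = 20 is NOT taken from Table 9.7.1; it is a placeholder so that the code runs. Only r = 0.848 comes from the output above.

```python
from math import sqrt

def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * sqrt((n - 2) / (1 - r ** 2))

r = 0.848   # sample correlation reported by MegaStat
n = 20      # HYPOTHETICAL sample size, for illustration only
t_stat = correlation_t_statistic(r, n)
print(f"t = {t_stat:.3f} with {n - 2} degrees of freedom")
```

The statistic would be compared with the upper-tail critical value of the t distribution, matching the “greater than” alternative chosen in MegaStat. With the actual sample size the numerical value will differ.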