Chapter 14 - Introduction to Multiple Regression

Academic year: 2019

(1)

Statistics for Managers

Using Microsoft® Excel

5th Edition

Chapter 14

(2)

Learning Objectives

In this chapter, you learn:

How to develop a multiple regression model

How to interpret the regression coefficients

How to determine which independent variables to include in the regression model

How to determine which independent variables are most important in predicting a dependent variable

How to use categorical variables in a regression model

(3)

The Multiple Regression

Model

Idea: Examine the linear relationship between 1 dependent variable (Y) and 2 or more independent variables (Xi).

Multiple Regression Model with k Independent Variables:

Yi = β0 + β1X1i + β2X2i + … + βkXki + εi

(4)

Multiple Regression Equation

The coefficients of the multiple regression model are estimated using sample data.

Multiple regression equation with k independent variables:

Ŷi = b0 + b1X1i + b2X2i + … + bkXki

where Ŷi is the estimated (or predicted) value of Y, b0 is the estimated intercept, and b1, …, bk are the estimated slope coefficients.

(5)

Multiple Regression Equation

Example with two independent variables:

Ŷi = b0 + b1X1i + b2X2i

[3-D plot: the fitted regression plane over the (X1, X2) plane, showing the slope for variable X1]

(6)

Multiple Regression Equation

2 Variable Example

A distributor of frozen dessert pies wants to evaluate factors thought to influence demand.

Dependent variable: Pie sales (units per week)
Independent variables: Price (in $) and Advertising ($100s)

(7)

Multiple Regression Equation

2 Variable Example

Sales = b0 + b1(Price) + b2(Advertising)

Sales = b0 + b1X1 + b2X2

where X1 = Price
      X2 = Advertising

Week   Pie Sales   Price ($)   Advertising ($100s)
1      350         5.50        3.3

(8)

Multiple Regression Equation

2 Variable Example, Excel

Regression Statistics
Multiple R            0.72213
R Square              0.52148
Adjusted R Square     0.44172
Standard Error        47.46341
Observations          15

ANOVA         df   SS          MS          F         Significance F
Regression     2   29460.027   14730.013   6.53861   0.01201
Residual      12   27033.306    2252.776
Total         14   56493.333

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     306.52619      114.25389         2.68285   0.01993    57.58835   555.46404
Price         -24.97509       10.83213        -2.30565   0.03979   -48.57626    -1.37392
Advertising    74.13096       25.96732         2.85478   0.01449    17.55303   130.70888

(9)

Multiple Regression Equation

2 Variable Example

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

b1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising

b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price

where
Sales is in number of pies per week
Price is in $
Advertising is in $100s

(10)

Multiple Regression Equation

2 Variable Example

Predict sales for a week in which the selling price is $5.50 and advertising is $350:

Sales = 306.526 - 24.975(5.50) + 74.131(3.5) = 428.62

Predicted sales is 428.62 pies

Note that Advertising is in $100s, so $350 means X2 = 3.5.
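The prediction above can be sketched in a few lines of Python (not part of the original slides), using the coefficients reported in the Excel output:

```python
# Applying the fitted pie-sales equation from the Excel regression output.
b0, b1, b2 = 306.52619, -24.97509, 74.13096

def predict_sales(price, advertising_100s):
    """Predicted weekly pie sales for a given price ($) and advertising ($100s)."""
    return b0 + b1 * price + b2 * advertising_100s

# $5.50 price, $350 advertising -> X2 = 3.5, since advertising is measured in $100s
sales = predict_sales(5.50, 3.5)
print(round(sales, 2))  # 428.62
```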

(11)

Coefficient of

Multiple Determination

Reports the proportion of total variation in Y explained by all X variables taken together:

r² = SSR / SST = regression sum of squares / total sum of squares

(12)

Coefficient of

Multiple Determination (Excel)

(Same Excel output as before.)

r² = SSR / SST = 29460.0 / 56493.3 = .52148

(13)

Adjusted r²

r² never decreases when a new X variable is added to the model

This can be a disadvantage when comparing models

What is the net effect of adding a new variable?

We lose a degree of freedom when a new X variable is added

Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?

(14)

Adjusted r²

Shows the proportion of variation in Y explained by all X variables, adjusted for the number of X variables used:

r²adj = 1 - [(1 - r²)(n - 1) / (n - k - 1)]

(where n = sample size, k = number of independent variables)

Penalizes excessive use of unimportant independent variables

Smaller than r²

Useful in comparing models
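Both measures can be checked directly from the ANOVA sums of squares on the earlier Excel output. A quick sketch (not part of the original slides):

```python
# r-squared and adjusted r-squared from the ANOVA sums of squares (n = 15, k = 2).
SSR, SST = 29460.027, 56493.333
n, k = 15, 2

r2 = SSR / SST
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(round(r2, 5), round(r2_adj, 5))  # 0.52148 0.44172
```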

(15)

Adjusted r²

(Same Excel output as before.)

r²adj = .44172

44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables.

(16)

F-Test for Overall Significance

F-Test for Overall Significance of the Model

Shows if there is a linear relationship between all of the X variables considered together and Y

Use F test statistic

Hypotheses:
H0: β1 = β2 = … = βk = 0 (no linear relationship)
H1: at least one βi ≠ 0 (at least one independent variable affects Y)

(17)

F-Test for Overall Significance

Test statistic:

F = (SSR / k) / (SSE / (n - k - 1)) = MSR / MSE

where F has df1 (numerator) = k and df2 (denominator) = (n - k - 1) degrees of freedom

(18)

F-Test for Overall Significance

(Same Excel output as before.)

F = 14730.013 / 2252.776 = 6.53861

With 2 and 12 degrees of freedom; Significance F (the p-value) is 0.01201

(19)

F-Test for Overall Significance

H0: β1 = β2 = 0
H1: β1 and β2 not both zero

α = .05
df1 = 2   df2 = 12

Test Statistic: F = 6.5386

Critical Value: F = 3.885

Decision: Since the F test statistic is in the rejection region (p-value < .05), reject H0

Conclusion: There is evidence that at least one independent variable affects Y
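The overall F statistic can be reproduced from the ANOVA table. A minimal sketch (not part of the original slides):

```python
# Overall F statistic from the ANOVA table (df1 = k = 2, df2 = n - k - 1 = 12).
SSR, SSE = 29460.027, 27033.306
k, n = 2, 15

MSR = SSR / k            # regression mean square
MSE = SSE / (n - k - 1)  # error mean square
F = MSR / MSE

print(round(F, 4))  # 6.5386
# 6.5386 > F_crit = 3.885 at alpha = .05, so reject H0
```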

(20)

Residuals in Multiple Regression

Two variable model

The best fitting linear regression equation, Ŷ, is found by minimizing the sum of squared errors, Σe²

Residual for a sample observation: ei = (Yi – Ŷi)

[Plot: the fitted plane over (X1, X2) with a sample observation; the residual is the vertical distance from the observation to the plane]

(21)

Multiple Regression Assumptions

Assumptions:

The errors are independent
The errors are normally distributed
Errors have an equal variance

Errors (residuals) from the regression model: ei = (Yi – Ŷi)

(22)

Multiple Regression Assumptions

These residual plots are used in multiple regression:

Residuals vs. Ŷi
Residuals vs. X1i
Residuals vs. X2i
Residuals vs. time (if time series data)

Use the residual plots to check for violations of regression assumptions

(23)

Individual Variables

Tests of Hypothesis

Use t-tests of individual variable slopes

Shows if there is a linear relationship between the variable Xi and Y

Hypotheses:
H0: βi = 0 (no linear relationship)
H1: βi ≠ 0 (linear relationship does exist between Xi and Y)

(24)

Individual Variables

Tests of Hypothesis

H0: βj = 0 (no linear relationship)
H1: βj ≠ 0 (linear relationship does exist between Xj and Y)

Test Statistic:

t = bj / Sbj     (df = n – k – 1)

(25)

Individual Variables

Tests of Hypothesis

(Same Excel output as before.)

t-value for Price is t = -2.306, with p-value .0398

t-value for Advertising is t = 2.855, with p-value .0145

(26)

Individual Variables

Tests of Hypothesis

H0: βj = 0
H1: βj ≠ 0

d.f. = 15 - 2 - 1 = 12
α = .05
tα/2 = 2.1788

The test statistic for each variable falls in the rejection region (p-values < .05)

Decision: Reject H0 for each variable

Conclusion: There is evidence that both Price and Advertising affect pie sales at α = .05

              Coefficients   Standard Error   t Stat     P-value
Price         -24.97509       10.83213        -2.30565   0.03979
Advertising    74.13096       25.96732         2.85478   0.01449
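Each slope's t statistic is just the coefficient divided by its standard error. A sketch (not part of the original slides), using the values in the table above:

```python
# t statistics for the individual slopes (df = n - k - 1 = 12).
coefs = {"Price": (-24.97509, 10.83213), "Advertising": (74.13096, 25.96732)}

t_stats = {name: b / s for name, (b, s) in coefs.items()}
print({name: round(t, 5) for name, t in t_stats.items()})
# Price: -2.30565, Advertising: 2.85478; both exceed |t_crit| = 2.1788, so reject H0
```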

(27)

Confidence Interval Estimate

for the Slope

Confidence interval for the population slope βj:

bj ± t(n–k–1) Sbj     where t has (n – k – 1) d.f.

Here, t has (15 – 2 – 1) = 12 d.f.

              Coefficients   Standard Error
Intercept     306.52619      114.25389
Price         -24.97509       10.83213
Advertising    74.13096       25.96732

Example: Form a 95% confidence interval for the effect of changes in price (X1) on pie sales, holding constant the effects of advertising:

-24.975 ± (2.1788)(10.832): So the interval is (-48.576, -1.374)

(28)

Confidence Interval Estimate

for the Slope

Confidence interval for the population slope βj

Example: Excel output also reports these interval endpoints:

              Coefficients   Standard Error   Lower 95%   Upper 95%
Price         -24.97509       10.83213        -48.57626   -1.37392

Weekly sales are estimated to be reduced by between 1.37 and 48.58 pies for each increase of $1 in the selling price, holding constant the effects of advertising.
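The interval is easy to reproduce by hand. A sketch (not part of the original slides), using t_crit = 2.1788 with 12 d.f.:

```python
# 95% confidence interval for the Price slope: b1 +/- t_crit * S_b1.
b1, s_b1 = -24.97509, 10.83213
t_crit = 2.1788  # t with 12 d.f., alpha = .05, two-tailed

lower = b1 - t_crit * s_b1
upper = b1 + t_crit * s_b1
print(round(lower, 3), round(upper, 3))  # approx -48.576 -1.374
```

The small difference from Excel's (-48.57626, -1.37392) comes from rounding t_crit to four decimals.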

(29)

Testing Portions of the Multiple

Regression Model

Contribution of a Single Independent Variable Xj

SSR(Xj | all variables except Xj)
= SSR(all variables) – SSR(all variables except Xj)

Measures the contribution of Xj in explaining the total variation in Y

(30)

Testing Portions of the Multiple

Regression Model

Contribution of a Single Independent Variable Xj, assuming all other variables are already included (consider here a 3-variable model):

SSR(X1 | X2 and X3)
= SSR(all variables) – SSR(X2 and X3)

From the ANOVA section of the regression for Ŷ = b0 + b1X1 + b2X2 + b3X3, and from the ANOVA section of the regression for Ŷ = b0 + b2X2 + b3X3

Measures the contribution of X1 in explaining SST

(31)

The Partial F-Test Statistic

Consider the hypothesis test:

H0: variable Xj does not significantly improve the model after all other variables are included
H1: variable Xj significantly improves the model after all other variables are included

Test using the F-test statistic (with 1 and n-k-1 d.f.):

F = SSR(Xj | all variables except j) / MSE

(32)

Testing Portions of Model:

Example

Test at the α = .05 level to determine whether the price variable significantly improves the model, given that advertising is included.

(33)

Testing Portions of Model:

Example

H0: X1 (price) does not improve the model with X2 (advertising) included
H1: X1 does improve the model

α = .05, df = 1 and 12
F critical value = 4.75

(For X1 and X2)
ANOVA         df   SS            MS
Regression     2   29460.02687   14730.01343
Residual      12   27033.30647    2252.775539
Total         14   56493.33333

(For X2 only)
ANOVA         df   SS
Regression     1   17484.22249

(34)

Testing Portions of Model:

Example

F = SSR(X1 | X2) / MSE(all) = (29460.027 – 17484.222) / 2252.776 = 5.316

Conclusion: Since F = 5.316 > 4.75, reject H0; adding X1 does improve the model

(For X1 and X2)
ANOVA         df   SS            MS
Regression     2   29460.02687   14730.01343
Residual      12   27033.30647    2252.775539
Total         14   56493.33333

(For X2 only)
ANOVA         df   SS
Regression     1   17484.22249
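The partial F computation can be sketched from the two ANOVA tables above (not part of the original slides):

```python
# Partial F test for Price (X1) given Advertising (X2), df = 1 and 12.
SSR_all = 29460.02687     # regression SS with X1 and X2
SSR_x2 = 17484.22249      # regression SS with X2 only
MSE_all = 2252.775539     # MSE of the full model

SSR_x1_given_x2 = SSR_all - SSR_x2   # contribution of X1
F = SSR_x1_given_x2 / MSE_all

print(round(F, 3))  # 5.316, which exceeds F_crit = 4.75
# Note: F equals the square of the Price t statistic, (-2.30565)^2
```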

(35)

Relationship Between Test

Statistics

The partial F test statistic developed in this section and the t test statistic are both used to determine the contribution of an independent variable to a multiple regression model.

The hypothesis tests associated with these two statistics always result in the same decision (that is, the p-values are identical):

t²a = F1,a

(where a = the degrees of freedom)

(36)

Coefficient of Partial Determination

for k Variable Model

Measures the proportion of variation in the dependent variable that is explained by Xj while controlling for (holding constant) the other independent variables.

For the k variable model:

r²Yj.(all variables except j)
= SSR(Xj | all variables except j) / [SST – SSR(all variables) + SSR(Xj | all variables except j)]
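Applying this formula to the pie-sales numbers from the earlier slides gives the partial determination for Price, a computation sketched here (not shown on the original slides):

```python
# Coefficient of partial determination for Price (X1), controlling for X2.
SST = 56493.33333
SSR_all = 29460.02687
SSR_x1_given_x2 = 29460.02687 - 17484.22249  # = SSR(X1 | X2)

r2_y1_2 = SSR_x1_given_x2 / (SST - SSR_all + SSR_x1_given_x2)
print(round(r2_y1_2, 3))  # approx 0.307
# Holding advertising constant, price explains about 30.7% of the remaining variation.
```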

(37)

Using Dummy Variables

A dummy variable is a categorical independent variable with two levels:
yes or no, on or off, male or female
coded as 0 or 1

Assumes equal slopes for other variables

If more than two levels, the number of dummy variables needed is one less than the number of levels

(38)

Dummy Variable Example

Let:
Y = pie sales
X1 = price
X2 = holiday
(X2 = 1 if a holiday occurred during the week)
(X2 = 0 if there was no holiday that week)

Ŷ = b0 + b1X1 + b2X2

(39)

Dummy Variable Example

[Plot: Y (sales) vs. X1 (Price), two parallel lines with the same slope; the Holiday line has intercept b0 + b2, the No Holiday line has intercept b0]

(40)

Dummy Variable Example

Sales: number of pies sold per week
Price: pie price in $
Holiday: 1 if a holiday occurred during the week
         0 if no holiday occurred

Sales = 300 - 30(Price) + 15(Holiday)

b2 = 15: on average, sales were 15 pies greater in weeks with a holiday than in weeks without a holiday, given the same price
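A sketch of the fitted dummy-variable model (not part of the original slides), showing that the holiday effect is a constant shift of b2 = 15 pies at any price:

```python
# Fitted dummy-variable model: Sales = 300 - 30(Price) + 15(Holiday).
def predict_sales(price, holiday):
    """holiday = 1 if a holiday occurred during the week, else 0."""
    return 300 - 30 * price + 15 * holiday

# At the same price, a holiday week is predicted to sell exactly b2 = 15 more pies:
diff = predict_sales(5.0, 1) - predict_sales(5.0, 0)
print(diff)  # 15
```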

(41)

Dummy Variable Model

More Than Two Levels

The number of dummy variables is one less than the number of levels

Example:
Y = house price; X1 = square feet

If style of the house is also thought to matter:
Style = ranch, split level, colonial

(42)

Dummy Variable Model

More Than Two Levels

Example: Let “colonial” be the default category, and let X2 and X3 be used for the other two categories:

Y = house price
X1 = square feet
X2 = 1 if ranch, 0 otherwise
X3 = 1 if split level, 0 otherwise

The multiple regression equation is:

Ŷ = b0 + b1X1 + b2X2 + b3X3

(43)

Dummy Variable Model

More Than Two Levels

Consider the regression equation:

Ŷ = b0 + b1X1 + 23.53X2 + 18.84X3

With the same square feet, a ranch will have an estimated average price of 23.53 thousand dollars more than a colonial.

With the same square feet, a split-level will have an estimated average price of 18.84 thousand dollars more than a colonial.

(44)

Interaction Between

Independent Variables

Hypothesizes interaction between pairs of X variables

Response to one X variable may vary at different levels of another X variable

Contains a two-way cross product term

(45)

Effect of Interaction

Given: Y = β0 + β1X1 + β2X2 + β3X1X2 + ε

Without interaction term, effect of X1 on Y is measured by β1

With interaction term, effect of X1 on Y is measured by β1 + β3X2

Effect changes as X2 changes

(46)

Interaction Example

Y = 1 + 2X1 + 3X2 + 4X1X2

X2 = 1: Y = 1 + 2X1 + 3(1) + 4X1(1) = 4 + 6X1
X2 = 0: Y = 1 + 2X1 + 3(0) + 4X1(0) = 1 + 2X1

Slopes are different if the effect of X1 on Y depends on X2 value

[Plot: Y vs. X1 for 0 ≤ X1 ≤ 1.5, showing the two lines 4 + 6X1 and 1 + 2X1 diverging]
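The two conditional slopes can be verified directly from the interaction model. A sketch (not part of the original slides):

```python
# Interaction model from the slide: Y = 1 + 2*X1 + 3*X2 + 4*X1*X2.
def y(x1, x2):
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

# Slope in X1 at each level of X2 (rise over a unit run in X1):
slope_at_x2_0 = y(1, 0) - y(0, 0)  # X2 = 0: Y = 1 + 2*X1
slope_at_x2_1 = y(1, 1) - y(0, 1)  # X2 = 1: Y = 4 + 6*X1
print(slope_at_x2_0, slope_at_x2_1)  # 2 6
```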

(47)

Significance of Interaction

Term

Can perform a partial F-test for the contribution of a variable to see if the addition of an interaction term improves the model

Multiple interaction terms can be included

Use a partial F-test for the simultaneous contribution of multiple variables to the model

(48)

Simultaneous Contribution of

Independent Variables

Use partial F-test for the simultaneous contribution of multiple variables to the model

Let m variables be an additional set of variables added simultaneously

To test the hypothesis that the set of m variables improves the model:

F = [SSR(all) – SSR(all except new set of m variables)] / m / MSE(all)
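A sketch of this simultaneous partial F (not part of the original slides). Since the deck gives no multi-variable example, it is illustrated with m = 1 (adding Price to the Advertising-only model), which reproduces the single-variable partial F from the earlier slides:

```python
# Simultaneous partial F for a set of m variables added to the model:
#   F = [SSR(all) - SSR(all except the new set)] / m / MSE(all)
def partial_f(ssr_all, ssr_reduced, m, mse_all):
    return (ssr_all - ssr_reduced) / m / mse_all

# m = 1 example from the pie-sales data: adding Price to the X2-only model.
F = partial_f(29460.02687, 17484.22249, 1, 2252.775539)
print(round(F, 3))  # 5.316
```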

(49)

Chapter Summary

Developed the multiple regression model

Tested the significance of the multiple regression model

Discussed adjusted r²

Discussed using residual plots to check model assumptions

(50)

Chapter Summary

Tested individual regression coefficients

Tested portions of the regression model

Used dummy variables

Evaluated interaction effects
