Some of the course materials can be downloaded from ariefyulianto.wordpress.com

(1)

(2)

• Software can be downloaded from uap.unnes.ac.id

(3)

References

1. Damodar N. Gujarati. Basic Econometrics. Fourth edition.

2. Damodar N. Gujarati. 2006. Dasar-Dasar Ekonometrika. Jakarta: Penerbit Erlangga.

3. Rainer Winkelmann. 2008. Econometric Analysis of Count Data. Fifth edition. Berlin Heidelberg: Springer-Verlag.

4. Sarwoko. 2008. Dasar-Dasar Ekonometrika. Yogyakarta: Penerbit Andi.

(4)

Course Contract (1)

Teaching Method

To achieve optimal learning outcomes, this course combines lectures and in-class discussion with independent observation outside the classroom (in the field).

Assessment System

Student performance in following and understanding the course material is assessed on the basis of coursework during the lectures and examination scores, with the following composition: a. individual/group assignment scores and attendance score, weight 1

(5)

Course Contract (2)

Assignments

Assignments in this course may be individual or group assignments, and are given by the lecturer during class. Late submission of assignments is not tolerated unless there is a justifiable reason.

Course Requirements

Students must comply with the rules for attending lectures set by UNNES, and must have read and brought at least the main reference book to every lecture.

Other Matters

Lateness of up to 30 minutes from the schedule is tolerated for both lecturer and students, and the lecturer is the last person to enter the classroom.

(6)

(7)

• Econometrics means “economic measurement.”

• “. . . econometrics may be defined as the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference.”

• Econometrics is concerned with the empirical determination of economic

(8)

WHY A SEPARATE DISCIPLINE?

• Econometrics is an amalgam of:

– economic theory, which makes statements or hypotheses that are mostly qualitative in nature;

– mathematical economics, which expresses economic theory in mathematical form (equations) without regard to measurability or empirical verification of the theory;

– economic statistics, which collects, processes, and presents economic data in the form of charts and tables; and

(9)

METHODOLOGY OF ECONOMETRICS

1. Statement of theory or hypothesis.

2. Specification of the mathematical model of the theory

3. Specification of the statistical, or econometric, model

4. Obtaining the data

5. Estimation of the parameters of the econometric model

6. Hypothesis testing

7. Forecasting or prediction

(10)

To illustrate the preceding steps

1. Statement of Theory or Hypothesis

“The fundamental psychological law . . . is that men [women] are disposed, as a rule and on average, to increase their consumption as their income increases, but not as much as the increase in their income”

(11)

2. Specification of the Mathematical Model of Consumption

• Y = β1 + β2X 0 < β2 < 1 (I.3.1)

where Y = consumption expenditure and X = income, and where β1 and β2, known as the

parameters of the model, are, respectively, the

(12)

(13)

3. Specification of the Econometric Model of Consumption

• The mathematical model assumes an exact, or deterministic, relationship between consumption and income. But relationships between economic variables are generally inexact.

• Y = β1 + β2X + u (I.3.2)

(14)

(15)

4. Obtaining Data

(16)

(17)

(18)

5. Estimation of the Econometric Model

• For now, note that the statistical technique of regression analysis is the main tool

(19)

• $\hat{Y}_i = -184.08 + 0.7064\,X_i$

(20)

6. Hypothesis Testing

• Statistical inference (hypothesis testing)

(21)

7. Forecasting or Prediction

• To illustrate, suppose we want to predict the mean consumption expenditure for 1997. The GDP value for 1997 was 7269.8 billion dollars.

• $\hat{Y}_{1997} = -184.0779 + 0.7064\,(7269.8) =$
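The forecast on this slide can be reproduced numerically. The sketch below simply plugs the 1997 GDP value into the estimated consumption function using the coefficients printed above; the function name `predict_consumption` is an illustrative choice, not part of the course material.

```python
# Coefficients of the estimated consumption function as printed on the slide.
B1 = -184.0779   # intercept
B2 = 0.7064      # slope (estimated marginal propensity to consume)


def predict_consumption(gdp: float) -> float:
    """Return the predicted mean consumption expenditure for a given GDP."""
    return B1 + B2 * gdp


forecast_1997 = predict_consumption(7269.8)
print(f"Forecast for 1997: {forecast_1997:.2f} billion dollars")
```

With the rounded coefficients shown here the result is roughly 4951.3 billion dollars; a figure computed from unrounded coefficients may differ in the decimals.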

(22)

(23)

The Eight Components of

Integrated Service Management

1. Product Elements

2. Place, Cyberspace, and Time

3. Process

4. Productivity and Quality

5. People

6. Promotion and Education

7. Physical Evidence

8. Price and Other User Outlays

(24)

Marketing Management (Philip Kotler, twelfth edition)

• Product is the first and most important

element of the marketing mix. Product

strategy calls for making coordinated

(25)

Initial public offering

• Emiten

• Underwriter

• Auditor

(26)

2. THE NATURE OF

(27)

(28)

THE MODERN INTERPRETATION OF REGRESSION

Regression analysis is concerned with the study of the dependence of one variable, the dependent variable, on one or more other variables, the explanatory variables, with a view to estimating and/or predicting the (population) mean or average value of the former in terms of the known or fixed (in repeated sampling) values of the latter.

Example: how the average height of sons changes, given the fathers’ height; Distribution in a

(29)

(30)

Measurement Scales of Variables

• Ratio scale: for a variable X taking two values, X1 and X2, the ratio X1/X2 and the distance (X2 − X1) are meaningful quantities.

• Interval scale: the distance between two time periods, say (2000 − 1995), is meaningful, but not the ratio of two time periods (2000/1995).

• Ordinal scale: examples are grading systems (A, B, C grades) or income class (upper, middle, lower).

(31)

TWO-VARIABLE REGRESSION ANALYSIS: SOME BASIC IDEAS

The simplest possible regression analysis, namely, the bivariate, or two-variable, regression in which the dependent variable (the regressand) is related to a single

(32)

A HYPOTHETICAL EXAMPLE

in the table refer to a total population of 60 families in a

(33)

E(Y | Xi) = β1 + β2Xi (2.2.1)

where β1 and β2 are unknown but fixed parameters known as the regression coefficients; β1 and β2 are also known as intercept and slope coefficients, respectively. Equation (2.2.1) itself is known as the linear population regression function. Some alternative expressions used in the literature are linear population regression model or simply

(34)

THE MEANING OF THE TERM

LINEAR

• Linearity in the variables: a regression function such as $E(Y \mid X_i) = \beta_1 + \beta_2 X_i^2$ is not a linear function, because the variable X appears with a power or index of 2.

• Linearity in the parameters: $E(Y \mid X_i) = \beta_1 + \beta_2 X_i^2$ is a linear (in the

parameter) regression model ; E(Y | Xi) = β1 + 3β2 x2 , which is

(35)

(36)

STOCHASTIC SPECIFICATION OF THE POPULATION REGRESSION FUNCTION (PRF)

As family income increases, family consumption expenditure on the average increases too. But what is the relationship between an individual family’s consumption expenditure and a given level of income?

The deviation of an individual Yi around its expected value can be written as ui = Yi − E(Y | Xi), or Yi = E(Y | Xi) + ui,

where the deviation ui is an unobservable random variable taking positive or negative values. Technically, ui is known as the

(37)

THE SIGNIFICANCE OF THE STOCHASTIC DISTURBANCE TERM (1)

1. Vagueness of theory (the theory, if any, determining the behavior of Y may be, and often is, incomplete)

2. Unavailability of data (for example, family wealth as an explanatory variable in addition to income to explain family consumption expenditure; unfortunately, information on family wealth is generally not available)

3. Core variables versus peripheral variables (assume in our consumption-income example that besides income X1, the number of children per family X2, sex X3, religion X4, education X5, and geographical region X6 also affect consumption expenditure)

4. Intrinsic randomness in human behavior

(38)

THE SIGNIFICANCE OF THE STOCHASTIC DISTURBANCE TERM (2)

1. Principle of parsimony (we would like to keep our regression model as simple as possible)

2. Wrong functional form (we do not know the form of the

(39)

(40)

(41)

(42)

3. TWO-VARIABLE REGRESSION

MODEL: THE PROBLEM OF

(43)

TWO-VARIABLE REGRESSION MODEL: THE PROBLEM OF ESTIMATION (ordinary least squares)

(44)

(45)

(46)

(47)

(48)

(49)

THE COEFFICIENT OF DETERMINATION r²: A MEASURE OF “GOODNESS OF FIT”

(50)

“The fundamental psychological law . . . is that men [women] are disposed, as a rule and on average, to increase their consumption as their income increases, but not by as much as the increase in their income,” that is, the marginal propensity to consume (MPC) is greater

(51)

(52)

(53)

Variables Entered/Removed (b)

a. All requested variables entered.

b. Dependent Variable: Konsumsi (consumption)

Model Summary

Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change
1 | .981 (a) | .962 | .957 | 6.4930 | .962 | 202.868 | 1 | 8 | .000

The columns from R Square Change to Sig. F Change are the Change Statistics.

(54)

ANOVA (b)

Model 1 | Sum of Squares | df | Mean Square | F | Sig.
Regression | 8552.727 | 1 | 8552.727 | 202.868 | .000 (a)
Residual | 337.273 | 8 | 42.159 | |
Total | 8890.000 | 9 | | |

a. Predictors: (Constant), Pendapatan (income)

b. Dependent Variable: Konsumsi (consumption)
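The summary statistics in this output are tied together by simple arithmetic on the ANOVA sums of squares. As a rough check, the sketch below recomputes R Square, Adjusted R Square, the F statistic, and the standard error of the estimate from the figures printed in the ANOVA table; the sample size n = 10 is inferred from the total degrees of freedom (9), and all variable names are illustrative.

```python
import math

# Figures taken from the ANOVA output above.
ss_regression = 8552.727
ss_residual = 337.273
n = 10   # inferred: total df (9) + 1
k = 1    # number of regressors (Pendapatan)

ss_total = ss_regression + ss_residual
ms_regression = ss_regression / k
ms_residual = ss_residual / (n - k - 1)

r_squared = ss_regression / ss_total
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
f_statistic = ms_regression / ms_residual
std_error_of_estimate = math.sqrt(ms_residual)

print(f"R Square          = {r_squared:.3f}")
print(f"Adjusted R Square = {adjusted_r_squared:.3f}")
print(f"F                 = {f_statistic:.3f}")
print(f"Std. Error of Est = {std_error_of_estimate:.4f}")
```

The recomputed values should agree with the Model Summary figures up to rounding.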

Coefficients (a)

Model 1 | Unstandardized B | Std. Error | Standardized Beta | t | Sig.
(Constant) | 24.455 | 6.414 | | 3.813 | .005
Pendapatan | .509 | .036 | .981 | 14.243 | .000

(55)

THE RELATIONSHIP BETWEEN EARNINGS AND EDUCATION

(56)

Notes

• Adjusted R² is used because R² is biased: it increases with every independent variable added to the model, whether or not that variable has a significant effect.

• Standardized beta is used because it eliminates differences in units of measurement among the independent variables (for example, pieces versus head of livestock); however, it does not reveal multicollinearity (correlation among the independent variables), and the beta values cannot

(57)

TWO-VARIABLE REGRESSION MODEL: THE PROBLEM OF ESTIMATION

Recall the two-variable PRF

where Ŷi is the estimated (conditional mean) value of Yi.

(58)

CLASSICAL NORMAL LINEAR REGRESSION MODEL (CNLRM)

• Using the method of OLS we were able to estimate the parameters β1, β2, and σ². Under the assumptions of the classical

(59)

(60)

Classical Assumptions

• The linear regression model is correctly specified and has an additive error term

• The expected (mean) value of the disturbance term is zero

• The covariance between the disturbance and X is zero

• The variance of the disturbance is constant (homoscedasticity)

• There is no autocorrelation among the disturbances

• There is no perfect correlation among the independent variables

(61)

HYPOTHESIS TESTING: GENERAL COMMENTS

• Is a given observation or finding compatible with some stated hypothesis or not?

• In the language of statistics, the stated hypothesis is known as the null hypothesis and is denoted by the symbol H0. The null hypothesis is usually tested against an alternative hypothesis (also known as the maintained hypothesis), denoted by H1.

• The task is to develop rules for deciding whether to reject or not reject the null hypothesis.

• There are two mutually complementary approaches for devising such rules,

(62)

Types of error

Hypothesis H0 | Accept H0 | Reject H0
If H0 is true | Correct decision | Type I error

(63)

HYPOTHESIS TESTING:

THE CONFIDENCE-INTERVAL APPROACH

• Two-Sided or Two-Tail Test. To illustrate the confidence-interval approach, once again we revert to the consumption-income example. As we know, the estimated marginal propensity to consume (MPC), β̂2, is 0.5091. Suppose we postulate that H0: β2 = 0.3; H1: β2 ≠ 0.3
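The confidence-interval approach for this two-sided test can be sketched numerically. The example below assumes the standard error .036 reported for Pendapatan in the coefficients output, 8 degrees of freedom, and the tabulated two-tailed 5% critical value of t for 8 df (2.306); the variable names are illustrative.

```python
# Estimate and standard error from the consumption-income regression output.
beta2_hat = 0.5091
se_beta2 = 0.036          # assumption: rounded value from the coefficients table

# Two-tailed 5% critical value of Student's t with 8 degrees of freedom,
# taken from a standard t table rather than computed.
T_CRITICAL = 2.306

lower = beta2_hat - T_CRITICAL * se_beta2
upper = beta2_hat + T_CRITICAL * se_beta2

hypothesized_beta2 = 0.3
reject_h0 = not (lower <= hypothesized_beta2 <= upper)

print(f"95% confidence interval for beta2: ({lower:.4f}, {upper:.4f})")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Because the hypothesized value 0.3 lies outside the interval, H0 is rejected under these assumptions.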

(64)

(65)

HYPOTHESIS TESTING:

THE CONFIDENCE-INTERVAL APPROACH

• One-Sided or One-Tail Test. Sometimes we have a strong a priori or theoretical expectation (or expectations based on some previous empirical work) that the alternative hypothesis is one-sided or unidirectional rather than two-sided, as just discussed. Thus, for our consumption-income example, one could postulate that H0: β2 ≤ 0.3 and H1: β2 > 0.3

(66)

• HYPOTHESIS TESTING: THE TEST-OF-SIGNIFICANCE APPROACH

• Testing the Significance of Regression Coefficients: The t Test

• which gives the interval in which β̂2 will fall with 1 − α probability, given β2 = β2*. In the language of hypothesis testing, the 100(1 − α)% confidence interval established in (5.7.2) is known as the region of acceptance (of the null hypothesis) and the region(s) outside the confidence interval is (are) called the region(s) of rejection (of H0) or the critical region(s). As noted previously, the
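The test-of-significance approach reaches the same decision by computing a t statistic and comparing it with the critical value. A minimal sketch for the consumption-income example follows, assuming H0: β2 = 0.3, the rounded standard error .036 from the coefficients output, and the tabulated two-tailed 5% critical value for 8 degrees of freedom; names are illustrative.

```python
beta2_hat = 0.5091     # estimated MPC
beta2_null = 0.3       # value of beta2 under H0
se_beta2 = 0.036       # assumption: rounded standard error from the output
T_CRITICAL = 2.306     # two-tailed 5% critical value, 8 df (from a t table)

# t statistic: distance of the estimate from the hypothesized value,
# measured in standard-error units.
t_statistic = (beta2_hat - beta2_null) / se_beta2
in_critical_region = abs(t_statistic) > T_CRITICAL

print(f"t = {t_statistic:.3f}")
print("Reject H0" if in_critical_region else "Do not reject H0")
```

A t value far beyond the critical value falls in the region of rejection, which matches the conclusion of the confidence-interval approach.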

(67)

(68)

(69)

(70)

(71)

(72)

What is the nature of multicollinearity

• In a good regression model, the independent variables should not be correlated with one another.

• If they are correlated, the variables are not orthogonal (correlation among the

(73)

(74)

Signs of Multicollinearity (Ghozali, 2005)

• The R square from the estimated regression model is high, but individually many of the independent variables do not significantly affect the dependent variable

• The correlation between independent variables is greater than 0.9

(75)

THE NATURE OF

MULTICOLLINEARITY

• it meant the existence of a “perfect,” or exact, linear relationship among some or all

(76)

(77)

Multicollinearity may be due to the following factors

• The data collection method employed, for example, sampling over a limited range of the values taken by the regressors in the population

• Constraints on the model or in the population being sampled

• Model specification

• An overdetermined model. This happens when

(78)

How to remedy multicollinearity

1. Combine cross-section and time-series data

2. Drop one or more independent variables that are highly correlated (for example, 0.94)

3. Transform the variables

4. Use the model for prediction rather than interpretation

5. Use centered data for the analysis (raw data minus the mean)

(79)

AUTOCORRELATION:

WHAT HAPPENS IF

(80)

three types of data

(1) cross section

(2) time series

(81)

• correlation between members of series of

observations ordered in time [as in time series data] or space [as in cross-sectional data]

• autocorrelation as “lag correlation of a given series with itself, lagged by a number of time units,” whereas he reserves the term serial correlation for “lag correlation between two different series.”

(82)

(83)

(84)

indicates that both linear and

(85)

(86)

(87)

DETECTING

AUTOCORRELATION

(88)

• In linear regression, autocorrelation means that the error components are correlated in time order (for time-series data) or in spatial order (for cross-sectional data).

• An example of time-series data (ordered in time) is the effect of advertising cost on sales from January to December. Cross-sectional data, by contrast, have no time ordering, for example the effect of the concentration of substance X on the reaction rate of a chemical compound.

• The presence or absence of autocorrelation can be detected with the Durbin-Watson test statistic. If the D-W value is close to 2, the regression model is safe from

(89)

Remedying autocorrelation

• Commonly used statistical tests are the Durbin-Watson test and the run test; with more than 100 observations, the Lagrange Multiplier test is preferable. Autocorrelation can be remedied by transforming the data, or by rewriting the regression model as a generalized difference equation. It can also be remedied by including a lag of the dependent variable as one of the

(90)

(91)

Correlation

• The correlation between x(t) and y(t) is called cross-correlation, defined as

(92)

Autocorrelation

• The correlation of x(t) with itself is called autocorrelation:

$C_{xx}(\tau) = \int_{-\infty}^{\infty} x(t)\, x(t - \tau)\, dt$

(93)

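Correlation

• Example

(94)

Correlation

1. For 1.5+p > 1, i.e., p > -0.5: C_xh(p) = 0

[Figure: h(t) and the shifted pulse x(t-p), spanning 1.5+p to 2.5+p, plotted against t]

(95)

Correlation

2. For 1.5+p < 1 and 1.5+p > 0, i.e., -1.5 < p < -0.5

(96)

Correlation

3. For 1.5+p < 0 and 2.5+p > 0, i.e., -2.5 < p < -1.5

(97)

Correlation

4. For 2.5+p < 0, i.e., p < -2.5: C_xh(p) = 0

[Figure: x(t-p), spanning 1.5+p to 2.5+p, and h(t) plotted against t; the resulting y(p) is nonzero between p = -2.5 and p = -0.5]

(98)

Correlation

1. For 1+p < 1.5, i.e., p < 0.5

(99)

Correlation

2. For 1+p > 1.5 and 1+p < 2.5, i.e., 0.5 < p < 1.5

(100)

Correlation

3. For p < 2.5 and 1+p > 2.5, i.e., 1.5 < p < 2.5

(101)

Correlation

4. For p > 2.5: C_xh(p) = 0

[Figure: x(t) and h(t-p), spanning p to 1+p, plotted against t; the resulting y(p) is nonzero between p = 0.5 and p = 2.5]

(102)

Autocorrelation

1. For 0 < p < 1, then

(103)

Autocorrelation

2. For -1 < p < 0: because p is negative, shift to the left

(104)

Autocorrelation

3. For p > 1 or p < -1: the autocorrelation is 0

[Figure: y(p) plotted against p from -1 to +1, with branches 1+p and 1-p]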

(105)
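The case-by-case results on the correlation slides can be checked with a small overlap calculation. The sketch below assumes unit-height rectangular pulses, with h(t) on [0, 1] and x(t) on [1.5, 2.5]; these shapes are inferred from the case boundaries on the slides (the figures themselves did not survive), and the function names are illustrative. For such pulses the correlation integral equals the length of the overlap between the two intervals.

```python
def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of intervals [a_start, a_end] and [b_start, b_end]."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


# Assumed pulse supports (unit height), inferred from the slide boundaries.
H_START, H_END = 0.0, 1.0
X_START, X_END = 1.5, 2.5


def corr_shift_x(p: float) -> float:
    """Integral of x(t - p) * h(t) dt: x is shifted by p, h stays fixed."""
    return overlap(X_START + p, X_END + p, H_START, H_END)


def corr_shift_h(p: float) -> float:
    """Integral of x(t) * h(t - p) dt: h is shifted by p, x stays fixed."""
    return overlap(X_START, X_END, H_START + p, H_END + p)


def autocorr_unit_pulse(p: float) -> float:
    """Autocorrelation of a unit pulse on [0, 1]; equals 1 - |p| for |p| < 1."""
    return overlap(0.0, 1.0, p, 1.0 + p)


for p in (-3.0, -2.0, -1.5, -1.0, -0.5):
    print(f"p = {p:5.2f}  corr_shift_x = {corr_shift_x(p):.2f}")
```

Under these assumptions the first correlation is a triangle between p = -2.5 and p = -0.5, the second is a triangle between p = 0.5 and p = 2.5, and the autocorrelation is a triangle between p = -1 and p = +1.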

(106)

REGRESSION ANALYSIS ILLUSTRATION

Do the entrance test score and high-school class rank affect first-year students' grade point average?

Dependent variable:

NMR (Y), the grade point average (Nilai Mutu Rata-rata)

Independent variables:

Test score (X1)

(107)

(108)

STEPS

• Enter the data in the SPSS Data Editor

• Choose Analyze > Regression > Linear

1. Select the Dependent variable

2. Select the Independent variables

3. Under Statistics, enable Collinearity Diagnostics and Durbin-Watson. Click Continue

4. Under Plots, enable Normal Probability Plot. Click Continue

5. Under Save:

~ Predicted Values: enable Unstandardized

~ Residuals: enable Studentized

Click Continue

(109)

ANALYSIS RESULTS

• Regression

Model Summary (b)

Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | Durbin-Watson
1 | .691 (a) | .478 | .417 | .4915 | 2.254

a. Predictors: (Constant), PERINGKA, SKORTES

b. Dependent Variable: NMR

ANOVA (b)

Model 1 | Sum of Squares | df | Mean Square | F | Sig.
Regression | 3.762 | 2 | 1.881 | 7.786 | .004 (a)
Residual | 4.107 | 17 | .242 | |
Total | 7.869 | 19 | | |

b. Dependent Variable: NMR

Coefficients (a)

Model 1 | B | Std. Error | Beta | t | Sig. | Tolerance | VIF
(Constant) | 1.269 | .978 | | 1.298 | .212 | |
SKORTES | 2.769E-03 | .002 | .275 | 1.568 | .135 | .998 | 1.002
PERINGKA | -.184 | .050 | -.648 | -3.692 | .002 | .998 | 1.002

Tolerance and VIF are the Collinearity Statistics.

(110)

ASSUMPTION CHECKS

1. NORMALITY OF ERRORS

The P-P plot shows a straight-line pattern close to a 45° angle, so the assumption of normally distributed residuals is satisfied.

[Figure: normal P-P plot of the residuals; dependent variable: NMR]

(111)

ASSUMPTION CHECKS

2. AUTOCORRELATION

The Durbin-Watson statistic is d = 2.254.

Durbin-Watson decision rule: conclude that there is no autocorrelation if du < d < 4 − du. The value of du is read from the Durbin-Watson table.

With n = 20 and k (number of independent variables) = 2, du = 1.54 and 4 − du = 4 − 1.54 = 2.46.

Because du = 1.54 < d = 2.254 < 4 − du = 2.46, the assumption of no autocorrelation is accepted as satisfied.
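The decision rule above is easy to express in code. The following sketch shows the standard Durbin-Watson formula, d = Σ(e_t − e_{t−1})² / Σ e_t², together with the du < d < 4 − du check used on this slide. The residual list is hypothetical and only illustrates the calculation; the values d = 2.254 and du = 1.54 are the ones reported above.

```python
def durbin_watson(residuals: list[float]) -> float:
    """Durbin-Watson d: sum of squared successive differences over sum of squares."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def no_autocorrelation(d: float, du: float) -> bool:
    """Decision rule from the slide: no autocorrelation if du < d < 4 - du."""
    return du < d < 4 - du


# Check with the values reported in the SPSS output and the D-W table.
print(no_autocorrelation(d=2.254, du=1.54))

# Hypothetical residuals, only to illustrate the formula.
example_residuals = [0.5, -0.3, 0.2, -0.4, 0.1, -0.1]
print(round(durbin_watson(example_residuals), 3))
```

In practice the residuals would come from the fitted regression (for example, the residuals saved from the SPSS run) rather than being typed in by hand.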


(112)

ASSUMPTION CHECKS

3. MULTICOLLINEARITY

Collinearity Diagnostics (a)

Dimension | Eigenvalue | Condition Index | Variance Proportion: (Constant) | SKORTES | PERINGKA
1 | 2.725 | 1.000 | .00 | .00 | .04
2 | .269 | 3.185 | .01 | .01 | .96
3 | 6.397E-03 | 20.639 | .99 | .99 | .00

a. Dependent Variable: NMR

Condition Index = 20.639 < 30

VIF for test score (SKORTES) = 1.002 < 10

VIF for rank (PERINGKA) = 1.002 < 10

Therefore there is no multicollinearity.
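The two diagnostics used here follow from simple formulas: the condition index of a dimension is the square root of the largest eigenvalue divided by that dimension's eigenvalue, and VIF is the reciprocal of tolerance. The sketch below recomputes both from the figures in the output (eigenvalues 2.725, .269, 6.397E-03 and tolerance .998), using the thresholds 30 and 10 quoted on the slide; the variable names are illustrative.

```python
import math

# Eigenvalues from the collinearity diagnostics output.
eigenvalues = [2.725, 0.269, 6.397e-03]

# Condition index: sqrt(largest eigenvalue / eigenvalue of each dimension).
largest = max(eigenvalues)
condition_indices = [math.sqrt(largest / ev) for ev in eigenvalues]

# VIF is the reciprocal of tolerance (tolerance = 1 - R^2 of a regressor
# regressed on the other regressors).
tolerance = 0.998
vif = 1 / tolerance

no_multicollinearity = max(condition_indices) < 30 and vif < 10

print([round(ci, 3) for ci in condition_indices])
print(round(vif, 3), no_multicollinearity)
```

Small differences from the printed condition indices are expected because the eigenvalues in the output are rounded.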


(113)

ASSUMPTION CHECKS

4. HETEROSCEDASTICITY

Plot the studentized residuals against the predicted values.

a. Choose Graphs > Scatter > Simple.

b. Choose Define

Select Studentized Residual as the Y axis

Select Unstandardized Predicted Value as the X axis

Click OK

The plot of studentized residuals against predicted values shows a random pattern, so the homoscedasticity assumption is satisfied.

[Figure: scatterplot of studentized residuals against Unstandardized Predicted Value]

(114)

INTERPRETATION

MODEL VALIDATION

Coefficient of determination (R²) = 0.478

This means that test score and rank together account for 47.8% of the variation in grade point average (NMR). The remainder is influenced by other variables not yet in the model.

If we predict NMR from test score and rank, the accuracy of the prediction, as measured by R², is 47.8%.

The F test from the regression ANOVA gives p = 0.004, so the regression coefficients are jointly (simultaneously) significant.

(115)

INTERPRETATION

Regression model

NMR = 1.269 + 0.002769 Test score − 0.184 Rank

1. Explaining the phenomenon

The only variable with a significant effect is rank, with a regression coefficient of −0.184

(116)

INTERPRETATION

2. Prediction

Suppose a student has a test score of 550 and a rank of 4. What is the predicted NMR?

NMR = 1.269 + 0.002769 (550) − 0.184 (4) ≈ 2.06

The predicted NMR is about 2.06
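The prediction is a direct substitution into the estimated equation, which can be sketched as below. The coefficients are those of the regression model on the previous slide; the function name `predict_nmr` is an illustrative choice.

```python
def predict_nmr(test_score: float, rank: float) -> float:
    """Predicted NMR from the estimated model: 1.269 + 0.002769*score - 0.184*rank."""
    return 1.269 + 0.002769 * test_score - 0.184 * rank


predicted = predict_nmr(test_score=550, rank=4)
print(f"Predicted NMR: {predicted:.3f}")
```

Evaluating the expression gives a value slightly above 2.05, so the figure on the slide is this result with the third decimal dropped.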

The accuracy of this prediction is 47.8% (relatively low), but the result is generalizable (because the p-value of the F test

(117)

INTERPRETATION

3. Determinant factors

Z_NMR = 0.275 Z_test score − 0.648 Z_rank

The variable with the strongest effect on NMR is rank, followed by test score. (The largest standardized Beta coefficient in absolute value indicates the strongest effect, provided all variables are significant.) In this example only rank is significant, so only rank has a meaningful effect on NMR.


(118)

HETEROSCEDASTICITY

WHAT HAPPENS IF THE

ERROR VARIANCE IS

(119)

(120)

(121)

THE CLASSICAL LINEAR REGRESSION MODEL

PRF: Yi = β1 + β2Xi + ui. It shows that Yi depends on both Xi and ui. Therefore, unless we are specific about how Xi and ui are created or generated, there is no way we can make any statistical inference about the Yi and also, as we shall see, about β1 and β2. Thus, the assumptions made about the Xi variable(s) and the error term are

(122)

There are several reasons why the variances of ui may be variable, some of which are as follows

• Following the error-learning models

• As incomes grow, people have more discretionary income and hence more scope for choice about the disposition of their income. Hence, σi² is likely to increase with income

• As data collecting techniques improve, σi² is likely to decrease

• Heteroscedasticity can also arise as a result of the presence of outliers

• Heteroscedasticity can arise when the assumption that the regression model is correctly specified is violated (for example, in a demand function for a commodity, if we do not include the prices of commodities complementary to or competing with the commodity in question: the omitted variable bias)

(123)

There are several reasons why the variances of ui

may be variable, some of which are as follows

• Another source of heteroscedasticity is skewness in the distribution of one or more regressors included in the

model. Examples are economic variables such as

income, wealth, and education. It is well known that the distribution of income and wealth in most societies is uneven, with the bulk of the income and wealth being owned by a few at the top.

• Heteroscedasticity can also arise because of (1)

(124)

(125)

what happens to the regression results if the observations for Chile are dropped from the

(126)

• the problem of heteroscedasticity is likely

to be more common in cross-sectional

than in time series data. In cross-sectional

data, one usually deals with members of a

population at a given point in time, such as

individual consumers or their families,

firms, industries, or geographical

(127)

(128)

(129)

DETECTION OF

HETEROSCEDASTICITY

• as in the case of multicollinearity, there are

no hard-and-fast rules for detecting

heteroscedasticity, only a few rules of

thumb (need most economic

investigations. In this respect the

(130)

(131)

(132)

(133)

DUMMY VARIABLE

(134)

model is based on several simplifying

assumptions, which are as follows

• The regression model is linear in the parameters

• The values of the regressors, the X’s, are fixed in repeated sampling.

• For given X’s, the mean value of the disturbance ui is zero

• For given X’s, there is no autocorrelation in the disturbances

• If the X’s are stochastic, the disturbance term and the (stochastic) X’s are independent or at least uncorrelated

• The number of observations must be greater than the number of regressors

• There must be sufficient variability in the values taken by the regressors.

• The regression model is correctly specified

• There is no exact linear relationship (i.e., multicollinearity) in the regressors.

(135)

four types of variables

• ratio scale, interval scale, ordinal scale, and nominal scale

• known as indicator variables, categorical variables, qualitative

(136)

THE NATURE OF DUMMY

VARIABLES

• In regression analysis the dependent variable, or regressand, is frequently influenced not only by ratio scale variables (e.g., income, output, prices, costs, height, temperature) but also by variables that are essentially qualitative, or nominal scale, in nature, such as sex, race, color, religion, nationality, geographical region, political upheavals, and party affiliation

• As a matter of fact, a regression model may contain

(137)

Dummy Variables

• Dummy variables refers to the technique of

using a dichotomous variable (coded 0 or 1) to represent the separate categories of a nominal level measure.

(138)

Coding of dummy Variables

• Take for instance the race of the respondent in a study of voter preferences

– Race coded white(0) or black(1)

• There are a whole set of factors that are possibly

different, or even likely to be different, between voters of different races

– Income, socialization, experience of racial discrimination, attitudes toward a variety of social issues, feelings of

political efficacy, etc

(139)

Multiple categories

• Now picture race coded white(0), black(1), Hispanic(2), Asian(3) and Native American(4)

• If we put the variable race into a regression equation, the results will be nonsense since the coding implicitly required in regression assumes at least ordinal level data, with approximately equal differences between ordinal categories.

• Regression using a 3 (or more) category

(140)

Creating Dummy variables

• The simple case of race is already coded correctly

– Race: coded 0 for white and 1 for black

• Note the coding can be reversed and leads only to changes in sign and direction of interpretation.

• The complex nominal version turns into 5 variables:

– White; coded 1 for whites and 0 for non-whites

– Black; coded 1 for blacks and 0 for non-blacks

– Hispanic; coded 1 for Hispanics and 0 for non-Hispanics

– Asian; coded 1 for Asians and 0 for non-Asians
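The coding scheme above can be sketched in a few lines. The helper below turns a nominal variable into one 0/1 indicator per category, with an optional argument to leave one category out; the function name `make_dummies` and the sample respondents are hypothetical, while the category labels come from the slides.

```python
CATEGORIES = ["White", "Black", "Hispanic", "Asian", "Native American"]


def make_dummies(values, categories, reference=None):
    """Return one dict of 0/1 indicators per observation.

    If `reference` is given, that category gets no indicator column, so an
    observation in the reference category has all indicators equal to 0.
    """
    kept = [c for c in categories if c != reference]
    return [{c: int(v == c) for c in kept} for v in values]


# Hypothetical respondents, only to illustrate the coding.
respondents = ["Black", "White", "Asian"]
for row in make_dummies(respondents, CATEGORIES, reference="White"):
    print(row)
```

Leaving one category out avoids entering a full set of indicators that always sum to 1 alongside an intercept.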

(141)

Regression with Dummy Variables

• The dummy variable is then added to the regression model

• Interpretation of the dummy variable is usually quite straightforward.

– The intercept term represents the intercept for the omitted category

– The slope coefficient for the dummy variable represents the change in the intercept for the category coded 1 (blacks)

$Y_i = a + B_1 X_i + B_2\,\mathrm{Race}_i + e_i$

(142)

Regression with only a dummy

• When we regress a variable on only the dummy variable, we obtain the estimates for the means of the dependent variable.

• a is the mean of Y for Whites and a + B1 is the mean of Y for Blacks

$Y_i = a + B_1\,\mathrm{Race}_i + e_i$
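The claim that a dummy-only regression reproduces the group means can be verified with a small example. The sketch below fits a simple OLS line with the usual closed-form formulas and compares the intercept and intercept-plus-slope with the two group means; the data are hypothetical and the function name `ols_simple` is illustrative.

```python
def ols_simple(x, y):
    """Closed-form OLS for y = a + b*x: returns (intercept a, slope b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data: race coded 0 (White) or 1 (Black), and an outcome Y.
race = [0, 0, 0, 1, 1, 1]
y = [50, 54, 52, 40, 44, 42]

a, b1 = ols_simple(race, y)
mean_group_0 = sum(yi for r, yi in zip(race, y) if r == 0) / race.count(0)
mean_group_1 = sum(yi for r, yi in zip(race, y) if r == 1) / race.count(1)

print(a, mean_group_0)        # intercept vs. mean of the group coded 0
print(a + b1, mean_group_1)   # intercept + slope vs. mean of the group coded 1
```

The slope b1 is therefore the difference between the two group means, which is why its t test compares the category coded 1 with the reference category.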

(143)

Omitting a category

• When we have a single dummy variable, we have information for both categories in the model

• Also note that

White = 1 – Black

• Thus having both a dummy for White and one for Blacks is redundant.

• As a result of this, we always omit one category, whose intercept is the model’s intercept.

• This omitted category is called the reference category

– In the dichotomous case, the reference category is simply the category coded 0

(144)

Suggestions for selecting the

reference category

• Make it a well defined group – other is usually a poor choice.

• If there is some underlying ordinality in the

categories, select the highest or lowest category as the reference. (e.g. blue-collar, white-collar, professional)

(145)

Multiple dummy Variables

• The model for the full dummy variable scheme for race is:

• Note that the dummy for White has been

(146)

Tests of Significance

• With dummy variables, the t tests test whether the coefficient is different from the reference category, not whether it is different from 0.

• Thus if a = 50, and B1 = -45, the coefficient

(147)

Interaction terms

• When the research hypothesizes that different categories may have different responses on other independent variables, we need to use interaction terms

• For example, race and income interact with each other so that the relationship between income

(148)

Creating Interaction terms

• To create an interaction term is easy

– Multiply the category * the independent variable

– The full model is thus:

$Y_i = a + B_1\,\mathrm{Race}_i + B_2\,\mathrm{Income}_i + B_3\,(\mathrm{Race}_i \times \mathrm{Income}_i) + e_i$

• a is the intercept for Whites;

• (a + B1) is the intercept for Blacks;

• B2 is the slope for Whites; and

• (B2 + B3) is the slope for Blacks

• The t-tests for B1 and B3 test whether the intercept and slope for Blacks differ from those for Whites (a and B2)

(149)

Non-Linear Models

• Tractable non-linearity

– Equation may be transformed to a linear model.

• Intractable non-linearity

(150)

Tractable Non-Linear Models

• Several general Types

– Polynomial

– Power Functions

(151)

Polynomial Models

• Linear

• Parabolic

• Cubic & higher order polynomials

• All may be estimated with OLS – simply square, cube, etc. the independent variable.
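To make the last point concrete, the sketch below fits a parabolic model by ordinary least squares after adding the squared regressor as an extra column. It is a minimal pure-Python illustration that solves the normal equations (X'X)b = X'y by Gaussian elimination; the data are hypothetical and noise-free, and the function names are illustrative rather than taken from any library.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_polynomial(x, y, degree):
    """OLS fit of y on 1, x, x^2, ..., x^degree; returns coefficients in that order."""
    design = [[xi ** p for p in range(degree + 1)] for xi in x]
    k = degree + 1
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical noise-free data generated from y = 2 + 3x + 0.5x^2.
x_values = [0, 1, 2, 3, 4, 5]
y_values = [2 + 3 * xi + 0.5 * xi ** 2 for xi in x_values]

coefficients = fit_polynomial(x_values, y_values, degree=2)
print([round(c, 6) for c in coefficients])
```

The model is nonlinear in the variable X but linear in the parameters, which is why OLS still applies once the powers of X are treated as separate regressors.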