
An Introduction to Stata


Academic year: 2018


AN INTRODUCTION TO STATA

Ekki Syamsulhakim

[email protected]

Yangki Imade Swara

(2)

Session 2

Importing Data from Excel

Creating graphs / scatterplot

Dropping variable(s)

Keeping variable(s)

(3)

Session 3

Merging Dataset

Some important notes on doing applied economic research

Introduction to IFLS Data

The Books

The Codebooks

The Data

(15)

Exercise: merging

Make a do-file!

clear

cd D:\stata-training\data

use nlsw88_indchar

sort idcode

des

(16)

Exercise: merging

Continue the do file

clear

use nlsw88_empl

sort idcode

des

save nlsw88_empl, replace

use nlsw88_indchar, clear

merge 1:1 idcode using nlsw88_empl

drop _merge
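Before dropping _merge it is usually worth checking how the match went. The lines below are a minimal sketch, not the training's own do-file: they assume the built-in nlsw88 dataset (sysuse nlsw88) as a stand-in for nlsw88_indchar and nlsw88_empl, and the split into two files is made up for illustration.

```stata
* Sketch only: sysuse nlsw88 stands in for the two training files
sysuse nlsw88, clear

* Build a hypothetical "employment" file in a temporary file
preserve
    keep idcode industry occupation union
    tempfile empl
    save `empl'
restore

* Keep hypothetical "individual characteristics" in memory
keep idcode age race married grade wage

* One-to-one merge on the person identifier
merge 1:1 idcode using `empl'

* _merge == 3 means the observation was found in both files
tabulate _merge

* Stop the do-file if any observation failed to match
assert _merge == 3

drop _merge
```

If assert fails, tabulate _merge shows whether the unmatched observations come from the master file (1) or the using file (2).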

(17)

Checking for duplicates

Continue the do file
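The commands for this slide are not shown in the text. A minimal sketch of one common way to check for duplicates follows, assuming the built-in nlsw88 dataset and idcode as the identifier; the duplicate is created on purpose so there is something to find.

```stata
* Sketch only: sysuse nlsw88 stands in for the training data
sysuse nlsw88, clear

* Report how many observations share the same idcode
duplicates report idcode

* isid exits with an error if idcode does not uniquely identify observations
isid idcode

* Deliberately duplicate the first observation for illustration
expand 2 in 1

* The report now shows one surplus observation
duplicates report idcode

* Flag the duplicated observations and look at them
duplicates tag idcode, generate(dup)
list idcode age dup if dup > 0

* Keep one observation per idcode (force is required when a varlist is given)
duplicates drop idcode, force

* idcode is unique again
isid idcode
```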

(18)

if qualifier

Adding if at the end of a command restricts the command to the observations that satisfy the condition.

if is allowed with most Stata commands.

tab occupation never_married if age>40
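A few more uses of if, as a minimal sketch on the built-in nlsw88 dataset (an assumption; the training file may differ). The variable names are those shipped with nlsw88.

```stata
* Sketch only: sysuse nlsw88 stands in for the training data
sysuse nlsw88, clear

* Count observations that satisfy a condition
count if age > 40

* Combine conditions with & (and) or | (or)
summarize wage if age > 40 & married == 1

* Cross-tabulate for a subsample, leaving out missing occupations
tabulate occupation never_married if age > 40 & !missing(occupation)

* Numeric missing values are treated as larger than any number,
* so guard comparisons of the form x > value with !missing()
count if wage > 20 & !missing(wage)
```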

(19)

IFLS

(22)

Getting familiar with the data

Rule #3: Know the context! (Kennedy, 2000, page 5)

“It is crucial that one become intimately familiar with the

phenomenon being investigated — its history, institutions,

(23)

Getting familiar with the data

Rule #3: Know the context! (Kennedy, 2000, page 5)

“Exactly how were the data gathered? Did government agencies

impute the data using unknown formulas? How were the interviewees selected? What instructions were given to the

(24)

Getting familiar with the data

Rule #4: Inspect the Data! (Kennedy, 2000, page 5-6)

“Inspecting the data involves summary statistics, graphs, and data

cleaning, to both check and ‘get a feel for’ the data. Summary

statistics can be very simple, such as calculating means, standard errors, maximums, minimums, and correlation matrices,…”

“The advantage of graphing is that graphics broadcast whereas

statistics narrowcast, or, as Tukey (1977, p.vi) notes: ‘The greatest value of a picture is when it forces us to notice what we never

(25)

Getting familiar with the data

Rule #4: Inspect the Data! (Kennedy, 2000, page 5-6)

“Data cleaning looks for inconsistencies in the data — are any

observations impossible, unrealistic, or suspicious? According to Rao (1997, p. 152), ‘Every number is guilty unless proved

innocent’…”

“…Do you know how missing data were coded? Are dummies all
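Rule #4 can be put into practice with a handful of commands. The lines below are a minimal sketch on the built-in nlsw88 dataset (an assumption, used only so the example runs anywhere); the plausible age range in the assert line is an arbitrary illustration.

```stata
* Sketch only: sysuse nlsw88 stands in for the data being inspected
sysuse nlsw88, clear

* Variable names, storage types and labels
describe

* Means, standard deviations, minimums and maximums
summarize age wage hours tenure

* Correlation matrix
correlate wage hours tenure

* How many values are missing, variable by variable
misstable summarize

* Check that a dummy takes only the expected values, showing missing too
tabulate union, missing

* Stop if any non-missing age is outside an (arbitrary) plausible range
assert inrange(age, 0, 120) if !missing(age)

* A quick picture often reveals outliers that the statistics hide
scatter wage tenure
```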

(26)

Getting familiar with the data

#3. Thou shalt know the context.

Corollary: Thou shalt not perform ignorant statistical analyses.

#4. Thou shalt inspect the data.

Corollary: Thou shalt place data cleanliness ahead of econometric

(27)

Getting familiar with the data

Read the user’s guide(s) of IFLS 3 Data

Get to know about

The story / history of IFLS (including IFLS 1, 2…)

The sampling mechanism

The questionnaire(s)

The type of software that can be used to analyse the data

(28)

Know the context

(29)

Know the context

It may also be useful to consider using “expenditure” instead of income

Expenditure is often used as a better proxy for “permanent income” (Cornwell, 2009, among others)

(30)

Our Case

We want to know the relationship between Subjective Well-Being and Income

Even though what we are doing is probably a ‘stylised fact’, we still need an economic theory (perhaps mixed with psychological theory?) to explain the relationship

(31)

Model and Variables

We will use a Linear Probability Model and a Probit model to estimate the equation
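The equation itself is not shown in the text. As a minimal sketch of the two estimators, the lines below use the built-in nlsw88 dataset with union (a 0/1 variable) as a stand-in for the happiness dummy; the regressors are arbitrary illustrations, not the model of this training.

```stata
* Sketch only: union stands in for a binary outcome such as happiness
sysuse nlsw88, clear

* Linear Probability Model: OLS on a 0/1 outcome.
* Robust standard errors, because LPM errors are heteroskedastic.
regress union wage age collgrad, vce(robust)

* Probit model on the same outcome and regressors
probit union wage age collgrad

* Probit coefficients are not marginal effects;
* average marginal effects are comparable to the LPM coefficients
margins, dydx(*)
```

Comparing the regress coefficients with the margins output is a quick check on whether the two models tell the same story.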

(32)

Variables

Variable (D or C) | Question in Questionnaire | Instrument for the variable | Book
Happiness (D) | Taken all things together how would you say things are these days | SW12 | 3A
 | Highest level of schooling ever completed by HHM | AR16 | K

Marital

(33)

Variables

Variable (D or C) | Question in Questionnaire | Instrument for the variable | Book
HH Income (C) | Proxied by total HH expenditure per adult (lots of questions!) | Downloaded from RAND website (made by Firman Witoelar) | 3A, 3B
Optimism (D) | Knowing about how prices change in recent year, do you think you can keep the standard of living you have today in the next 5 years? | SW03A | 3A

Working

(34)

Variables

Variable | Question in Questionnaire | Code | Book
Ethnicity | Ethnicity | AR15D | K
Has children (D) | Relationship with HH Head | AR02 | K, Ptrack
Urban (D) | Urban / rural residence | SC_0597 | Htrack
Location (D) | Provincial codes | SC_01xx | Htrack

(37)

Downloading IFLS Data

Create a folder for original IFLS Data

For this training, create D:\stata-training\IFLS4m

Download from

https://sites.google.com/a/fe.unpad.ac.id/ekki/stata-training

Save bk_ar1.dta in D:\stata-training\IFLS4m

(38)

From instrument to variable

Create a do-file and save it as “IFLS-training-step1.do”

* STEP 1 *

clear

* CREATING A MACRO TO DEFINE FOLDERS *

global dir00 "D:\stata-training\log\"

global dir01 "D:\stata-training\data\"

global dir02 "D:\stata-training\output\"

* Directories being used to get original data *
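Once a folder is stored in a global macro, it can be reused in every command that needs a path. The lines below are a minimal sketch: the macro name demo_dir and the file name auto_copy are hypothetical, and Stata's temporary directory is used so the example runs without the training folders.

```stata
* Sketch only: demo_dir and auto_copy are made-up names for illustration
* c(tmpdir) is the temporary directory Stata is using on this machine
global demo_dir "`c(tmpdir)'"

* Show what the macro contains
display "$demo_dir"

* Save a built-in dataset into that folder and load it back
sysuse auto, clear
save "${demo_dir}/auto_copy", replace
use "${demo_dir}/auto_copy", clear
describe, short
```

Writing ${demo_dir} with braces makes it clear where the macro name ends. Forward slashes also work in paths on Windows and avoid a common pitfall: a backslash placed directly before $ stops the macro from being expanded.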

(39)

* STEP 2 *

* THIS DO-FILE CONTAINS STEPS

* IN CLEANING DATA FOR HAPPINESS MODEL USING IFLS 4 *

* By Ekki Syamsulhakim, CEDS UNPAD

clear

* set mem is only needed in older versions; Stata 12 and later ignore it

set mem 200m

* Loading Original Data

use $dir1\bk_ar1, clear

des

sort pidlink

* Keeping important variables

keep ar01a ar02 ar02b ar07 ar07x ar09 ar10 ar11 ar13 ///
     ar15c ar15d ar16 ar17 ar18h hhid07 pid07 pidlink

(40)

Next step

Renaming variables

Which one is better: SEX or MALE or FEMALE?

Which one is better: AR15c or Activity_Past_Week or ActvtPastWk?

Should we change PIDLINK, PID07 or HHID07?

Generating variables to be used in regression
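On the SEX versus MALE question, one common convention is to name a dummy after the category coded 1, so that a regression coefficient can be read directly. The lines below are a minimal sketch on made-up data; the coding 1 = male, 3 = female is an assumption for illustration and should be checked against the codebook.

```stata
* Sketch only: toy data, with an assumed coding 1 = male, 3 = female
clear
input byte sex
1
3
1
3
end

* The name "male" says what a value of 1 means; "sex" does not
generate byte male = (sex == 1) if !missing(sex)
label variable male "1 if male, 0 if female"

* Check the new variable against the original coding
tabulate sex male
```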

(41)

Renaming variables

* renaming variables

rename ar01a hhm_lives_inhh

rename ar02 rel_to_hhhead

* rename ar07x male (not run: "!" is not allowed in a variable name, and the name male is given to ar07 below)

rename ar07 male

rename ar09 age

rename ar02b rel_to_hh

rename ar10 id_num_father

rename ar11 id_num_mother

rename ar13 marital_status

rename ar15d ethnic

rename ar15c activt_pstwk

rename ar16 educ_lvl

rename ar17 educ_grade

(42)

Generating Married Dummy Variables

First we want to create a dummy variable “married”: 1 if married, 0 otherwise

In IFLS, marital status is coded as:

1. Not married

2. Married

3. Separated

4. Divorced

5. Widow/er

(43)

Generating Married Dummy Variables

* dummy variable married

gen married=1 if marital_status==2

* all other non-missing codes are set to 0

replace married=0 if marital_status!=2 & !missing(marital_status)
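The same dummy can be written in one line. The lines below are a minimal sketch on made-up data that follows the coding listed for marital status (2 = married); the missing value is included on purpose to show that it stays missing instead of becoming 0.

```stata
* Sketch only: toy data using the marital status codes 1 to 5, plus a missing value
clear
input byte marital_status
1
2
3
4
5
.
end

* (marital_status == 2) evaluates to 1 when true and 0 when false;
* the if condition leaves married missing when marital_status is missing
generate byte married = (marital_status == 2) if !missing(marital_status)

* Check the result against the original codes, showing missing values too
tabulate marital_status married, missing
```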
