Session 2
• Importing Data from Excel
• Creating graphs / scatterplot
• Dropping variable(s)
• Keeping variable(s)
Session 3
• Merging Dataset
• Some important notes on doing applied economic research
• Introduction to IFLS Data
• The Books
• The Codebooks
• The Data
Exercise: merging
Make a do-file!
• clear
• cd D:\stata-training\data
• use nlsw88_indchar
• sort idcode
• des
Exercise: merging
Continue the do file
• clear
• use nlsw88_empl
• sort idcode
• des
• save nlsw88_empl, replace
• use nlsw88_indchar, clear
• merge 1:1 idcode using nlsw88_empl
• drop _merge
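Before dropping _merge it is usually worth looking at it, because it records whether each observation came from the master file, the using file, or both. The sketch below is only an illustration. It uses the nlsw88 data shipped with Stata (sysuse nlsw88) instead of the training files, and the split into two pieces is made up for the example. Run it as a whole do-file, since tempfile names are local macros.

```stata
* Illustration only: split Stata's bundled nlsw88 data in two, then merge it back
sysuse nlsw88, clear

* Piece 1 (a hypothetical "employment" file), saved as a temporary dataset
preserve
keep idcode union wage hours
tempfile empl
save `empl'
restore

* Piece 2 (a hypothetical "individual characteristics" file) stays in memory
keep idcode age race married

* One-to-one merge on the person identifier
merge 1:1 idcode using `empl'

* _merge: 1 = master only, 2 = using only, 3 = matched in both
tab _merge
assert _merge == 3    // stops the do-file if any observation is unmatched
drop _merge
```

If the assert fails, list the observations with _merge equal to 1 or 2 before deciding whether to keep them.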
Checking for duplicates
Continue the do file
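A 1:1 merge requires the key variable to identify each observation uniquely, so it helps to check this before merging. A minimal sketch, again using Stata's bundled nlsw88 data as a stand-in for the training files:

```stata
* Illustration only: check that idcode uniquely identifies observations
sysuse nlsw88, clear

* isid exits with an error if idcode does not uniquely identify the observations
isid idcode

* Tabulates how many observations share the same idcode
duplicates report idcode

* Lists any observations whose idcode appears more than once
duplicates list idcode
```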
if qualifier
• if at the end of a command restricts the command to the observations that satisfy the condition. if is allowed with most Stata commands.
• tab occupation never_married if age>40,
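A few more uses of if, sketched on Stata's bundled nlsw88 data (an assumption made for illustration; the training file may contain different variables):

```stata
* Illustration only: the if qualifier with several commands
sysuse nlsw88, clear

* Summary statistics for a subsample
summarize wage if age > 40

* Two-way table for a subsample
tab occupation never_married if age > 40

* Conditions can be combined with & (and) and | (or)
count if married == 1 & collgrad == 1

* Missing numeric values are treated as larger than any number,
* so exclude them explicitly when using > or >=
count if tenure > 20 & !missing(tenure)
```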
IFLS
Getting familiar with the data
• Rule #3: Know the context! (Kennedy, 2000, page 5)
• “It is crucial that one become intimately familiar with the
phenomenon being investigated — its history, institutions,
Getting familiar with the data
• Rule #3: Know the context! (Kennedy, 2000, page 5)
• “Exactly how were the data gathered? Did government agencies
impute the data using unknown formulas? How were the interviewees selected? What instructions were given to the
Getting familiar with the data
• Rule #4: Inspect the Data! (Kennedy, 2000, page 5-6)
• “Inspecting the data involves summary statistics, graphs, and data
cleaning, to both check and ‘get a feel for’ the data. Summary
statistics can be very simple, such as calculating means, standard errors, maximums, minimums, and correlation matrices,…”
• “The advantage of graphing is that graphics broadcast whereas
statistics narrowcast, or, as Tukey (1977, p.vi) notes: ‘The greatest value of a picture is when it forces us to notice what we never
Getting familiar with the data
• Rule #4: Inspect the Data! (Kennedy, 2000, page 5-6)
• “Data cleaning looks for inconsistencies in the data — are any
observations impossible, unrealistic, or suspicious? According to Rao (1997, p. 152), ‘Every number is guilty unless proved
innocent’…”
• “…Do you know how missing data were coded? Are dummies all
Getting familiar with the data
#3. Thou shalt know the context.
• Corollary: Thou shalt not perform ignorant statistical analyses.
#4. Thou shalt inspect the data.
• Corollary: Thou shalt place data cleanliness ahead of econometric
Getting familiar with the data
• Read the user’s guide(s) of IFLS 3 Data
• Get to know about
• The story / history of IFLS (including IFLS 1, 2…)
• The sampling mechanism
• The questionnaire(s)
• The type of software compatible to analyse the data
Know the context
• It may also be useful to consider using “expenditure” instead of income
• Often used as a better proxy of “permanent income” (Cornwell, 2009, among others)
Our Case
• We want to know the relationship between Subjective Well-Being and Income
• Even though what we are doing is probably a ‘stylised fact’, we need to have an economic theory (or one mixed with psychological theory?) to explain the relationship
Model and Variables
We will use a Linear Probability Model and a Probit model to estimate the equation
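The two estimators can be sketched as follows. This is only an illustration under stated assumptions: the IFLS variables have not been built yet at this point, so the example uses Stata's bundled nlsw88 data, with union (0/1) as a stand-in for the binary outcome and an arbitrary set of regressors.

```stata
* Illustration only: LPM and probit on a 0/1 outcome
sysuse nlsw88, clear

* Linear Probability Model: OLS on the binary outcome.
* Robust standard errors are used because LPM errors are heteroskedastic.
regress union wage age married collgrad, vce(robust)

* Probit model for the same outcome and regressors
probit union wage age married collgrad

* Average marginal effects, which are comparable to the LPM coefficients
margins, dydx(*)
```

Probit coefficients are not marginal effects, so the margins output is the natural quantity to set beside the LPM estimates.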
Variables
Variable (D or C) | Question in Questionnaire | Instrument for the variable | Book
Happiness (D) | Taken all things together, how would you say things are these days | SW12 | 3A
 | Highest level of schooling ever completed by HHM | AR16 | K
Marital
Variables
Variable (D or C) | Question in Questionnaire | Instrument for the variable | Book
HH Income (C) | Proxied by total HH expenditure per adult (lots of questions!) | Downloaded from RAND website (made by Firman Witoelar) | 3A, 3B
Optimism (D) | Knowing about how prices change in recent year, do you think you can keep the standard of living you have today in the next 5 years? | SW03A | 3A
Working
Variables
Variable | Question in Questionnaire | Code | Book
Ethnicity | Ethnicity | AR15D | K
Has children (D) | Relationship with HH Head | AR02 | K, Ptrack
Urban (D) | Urban / rural residence | SC_0597 | Htrack
Location (D) | Provincial codes | SC_01xx | Htrack
Downloading IFLS Data
• Create a folder for original IFLS Data
• For this training, create D:\stata-training\IFLS4m
• Download from https://sites.google.com/a/fe.unpad.ac.id/ekki/stata-training
• Save bk_ar1.dta in D:\stata-training\IFLS4m
From instrument to variable
• Create a do file, save it as “IFLS-training-step1.do”
* STEP 1 *
clear
* CREATING A MACRO TO DEFINE FOLDERS *
global dir00 "D:\stata-training\log\"
global dir01 "D:\stata-training\data\"
global dir02 "D:\stata-training\output\"
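* Directories being used to get original data *
* Assumption: dir1 points to the IFLS4m folder created for the original IFLS data
global dir1 "D:\stata-training\IFLS4m\"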
* STEP 2 *
* THIS DO-FILE CONTAINS STEPS
* IN CLEANING DATA FOR HAPPINESS MODEL USING IFLS 4
* By Ekki Syamsulhakim, CEDS UNPAD
clear
set mem 200m
* Loading Original Data
use $dir1\bk_ar1, clear
des
sort pidlink
* Keeping important variables
keep ar01a ar02 ar02b ar07 ar07x ar09 ar10 ar11 ar13 ///
     ar15c ar15d ar16 ar17 ar18h hhid07 pid07 pidlink
Next step
• Renaming variables
• Which one is better: SEX or MALE or FEMALE?
• Which one is better: AR15c or Activity_Past_Week or ActvtPastWk?
• Should we change PIDLINK, PID07 or HHID07?
• Generating variables to be used in regression
Renaming variables
* renaming variables
rename ar01a hhm_lives_inhh
rename ar02 rel_to_hhhead
* ar07x is not renamed: "!" is not valid in a variable name, and the name male goes to ar07
rename ar07 male
rename ar09 age
rename ar02b rel_to_hh
rename ar10 id_num_father
rename ar11 id_num_mother
rename ar13 marital_status
rename ar15d ethnic
rename ar15c activt_pstwk
rename ar16 educ_lvl
rename ar17 educ_grade
Generating Married Dummy Variables
• First we want to create a dummy variable “married”
• 1 if married, 0 otherwise
• In IFLS, marital status is coded as:
1. Not married
2. Married
3. Separated
4. Divorced
5. Widow/er
Generating Married Dummy Variables
* dummy variable married
gen married=1 if marital_status==2
replace married=0 if marital_status!=2
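A compact alternative builds the dummy in one line and leaves it missing when marital status is missing, instead of coding those observations as 0. The sketch below is an illustration on a tiny made-up dataset (the four values typed in are hypothetical, not IFLS data), so it can be run on its own:

```stata
* Illustration only: toy data with one missing marital status
clear
input marital_status
1
2
3
.
end

* The logical expression is 1 when true and 0 when false;
* the if qualifier keeps married missing where marital_status is missing
gen married = (marital_status == 2) if !missing(marital_status)

* Cross-check the new dummy against the original codes, showing missing values
tab marital_status married, missing
```

The cross-tabulation is a quick way to confirm that only code 2 maps to married = 1.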