Small Area Estimation Applications in the US Census Bureau Annual Survey of Employment and Payroll Evaluation

(1)

Small Area Estimation Applications in the US Census Bureau

Annual Survey of Employment and Payroll Evaluation

Bac Tran

Program Research Branch, Chief Governments Division

U.S. Census Bureau

(2)

Outline

 Target Population

 Population Parameters

 Sampling Frame

 Sample Design

 Small Area Challenges

 Estimators

 Evaluation

(3)

Target Population

 Individual governments

A government is an organized entity which, in addition to having governmental character, has sufficient discretion in the management of its own affairs to distinguish it as

separate from the administrative structure of any other governmental unit

 Types

o Counties

o Municipalities o Townships

o Special Districts

o School Districts

(4)

Parameters of Interest

Annual Survey of Employment and Payroll (ASPEP)

Full-time Employees Full-time Pay

Part-time Employees Part-time Pay

Part-time Hours

(5)

Parameters of Interest (Cont’d) ASPEP Publication

Statistics on the number of federal, state, and local government employees and their gross payrolls

(6)

Parameters of Interest Statistical Aggregation

 Totals

by (state, function)

 Level of government totals

o Local, state, state and local

o Nation

(7)

Parameters of Interest (Cont’d) Some Function Codes of ASPEP

001, Airport

002, Space Research & Technology (Federal) 005, Correction

006, National Defense and International Relations (Federal)

012, Elementary and Secondary - Instruction 112, Elementary and Secondary - Other Total 014, Postal Service (Federal)

016, Higher Education - Other 018, Higher Education - Instructional 021, Other Education (State)

022, Social Insurance Administration (State) 023, Financial Administration

024, Firefighters 124, Fire - Other 025, Judicial & Legal

029, Other Government Administration 032, Health

040, Hospitals

044, Streets & Highways

050, Housing & Community Development (Local) 052, Local Libraries

059, Natural Resources 061, Parks & Recreation

062, Police Protection - Officers 162, Police-Other

079, Welfare 080, Sewerage

081, Solid Waste Management 087, Water Transport & Terminals 089, Other & Unallocable

090, Liquor Stores (State) 091, Water Supply

092, Electric Power 093, Gas Supply 094, Transit

(8)

Sampling Frame

 Governments Integrated Directory (GID)

Created in 2007

 Unit ID: 14 digits

State (2) Type (1) County (3) Unit (3) SUP (3) SUB (2)

(9)

Sampling Frame (Cont’d)

Example of an unit ID

 33 2 031 001 000 00 = New York City

33 2 031 001 301 00 = New York City public school system (dependent on the city government)

33 2 031 001 302 00 = Fashion Institute (dependent post- secondary education agency)

33 2 031 001 303 00 = CUNY, City University of New York (dependent on the city government)

33 2 031 001 303 01 = Manhattan Community College (one campus of CUNY)

(10)

Sample Design

Multistage sample design

 PPS sample

o Stratified PPS (state

x

type) based on Total Pay

 Cut-off sampling method in sizable (state, type) strata

o Construct a cut-off point to determine small and large size units (two strata)

 Modified cut-off sampling (a stratified PPS

sample method)

(11)

Sample

Sampling Frame



 Sample

πps

Certainties Births

yˆgf

(12)

Small Area Challenges

 Designed at (state, type) level, estimated at state by function level

 Estimate total employees and total payroll at state by function level

,

gf

gf gfi

i U

Y Y where g state and f function



   

(13)

Other Challenges

Skew data- Not Transform

(14)

Other Challenges (Cont’d)

Skew data- Log Transform

(15)

Estimators- ASPEP

 Direct

Horvitz-Thompson:

 Composite

 Battese, Harter, Fuller (BHF) Model

 Our Proposed Model

ˆ gf HT gfi gfi

y   w y

(16)

Composite Estimator

where g= state, f= function code

ˆ ˆ

ˆ gf composite g gf ˆ HT (1 g ) ˆ gf synthetic y   y    y

ˆ gf synthetic ˆ ˆ gf g

y  K Y

(17)

Estimators- ASPEP

Composite Weight (Cont’d)

 Purcell & Kish (1979)

 Issue:  Negative in some

i= (state, function code)

 Fixable (Lahiri & Pramanik, 2010)

,

2 ,

( ˆ )

1 ( ˆ ˆ )

D gf g G f F

gf

S D

i i

g G f F

v Y

w Y Y

 

  



(18)

Composite Estimators (Cont’d)

Y ˆ Y ˆ Y ˆ

Direct (HT):

Synthetic : =

Composite:

2009 ASPEP regress on

ˆ ˆ

gf g

K Y

ˆ

^HT_gf

y

syn

y

gf

composite

y

gf

ˆ

1

Y Y ˆ

^g

Y ˆ

51

ˆ

^gf

gf

gf f

K x

 x



(19)

Estimators (Cont’d)

Battese, Harter, Fuller (BHF) Model

: the number of full-time employees for the j^th governmental unit within the i^th small area

: number of full-time employees for the i^th small area obtained from the previous census

: unknown intercept and slope, respectively; are small area specific random effects

: errors in individual observations

0 1

ij i i ij

y     x   v 

y

ij

x i

0

and

1

  v

ⁱ

 ij

(20)

Estimators (Cont’d) Our Proposed Model

where

0 1

log( y ij )     log( ) x i   v i  ij

2 2

~ (0, ) and ~ (0, )

iid iid

i ij

v N   N 

(21)

Data for Evaluation

Government units that overlap between the 2002 and 2007 Census of

Governments reporting strictly positive

numbers of full-time employees.

(22)

Evaluation

 Performance of log transform EB

o Results

o Residuals Diagnostic

o EB performance in small area o Benchmark Ratio (BR)

• EB  HT when n becomes larger

 Smoothening the EB

o One-way raking state totals to the direct (HT)

o Two-way raking state by function totals to the

(23)

Evaluation- Results

 Out of 1,225 (CA, function code) cells

o 671 cases (clear winner) our model o 324 cases  HT

o 230 cases  Composite

 No significant difference

o 160 cases between log-transformed model and the HT

o 145 cases between the composite and the HT

 HT won in cells where more than 70% of the units were large certainties

 Testing for significance, our model can be

used in 831 out of 1,225 cells (≈68%)

(24)

Evaluation- Results

Table 1: Percent Relative Error for Differences Estimates of Full Time Employees to the Truth (California)

(25)

Evaluation (Cont’d)

Results- Diagnostic Analysis

 QQ Plot for BHF Model

(26)

Evaluation (Cont’d)

Results- Diagnostic Analysis

 QQ Plot for Our Model

(27)

Evaluation- Results

(For Gas Supply, All States, Average n= 4)

Figure 4:

(28)

Benchmark Ratio (BR)

o BR= |∑(estimate-HT)/HT|

o Indicating how close the estimate is to the HT

when considering large areas

(29)

Evaluation (Cont’d) Results

Comparison of Benchmark Ratios (Nation)

Size BR for the EB BR for the BHF

< 50 1.5 1.6

≥ 50 1.1 1.5

(30)

Visualization of Table 1

-30%

-20%

-10%

0%

10%

20%

30%

40%

50%

Distance to the Truth (Relative Errors)

(Function, Sample size⁾

Figure3: Distance of the Estimators to the Truth

HT

Ours

BHF

(31)

Raking: Log-transformed to HT Base (CA)

-5.00%

-4.00%

-3.00%

-2.00%

-1.00%

0.00%

1.00%

2.00%

005 079 087 016 018 092 001 032 059 025 040 094 052 081 124 050 162 062 044 029 023 080 089 024 061 112 012

Distance to True

Function Code

Figure 5: Effect of Benchmarking the Log Transformation

Log

Log_Benchmark ed

(32)

Evaluation (Cont’d) Effect of Raking

Benchmarking improved

(33)

Comparison: EB, Raking EB and HT

-15.00%

-10.00%

-5.00%

0.00%

5.00%

10.00%

15.00%

20.00%

25.00%

30.00%

35.00%

40.00%

005 079 087 016 018 092 001 032 059 025 040 094 052 081 124 050 162 062 044 029 023 080 089 024 061 112 012

Distance to True

Funtion Code

Figure 7: EB, EB Benchmarked, and HT

Log

Log_Benchmarked HT

(34)

Domain Analysis (Gas Supply, AVG n=4)

EB= log(full-time employees), Benchmarked-EB= EB benchmarked to HT (one-way raking to nation total)

(35)

Overall- Relative Errors

35

Table 2: Comparison of Overall Relative Errors (CA)

Overall - Absolute Relative Errors

Σ|(HT-True)/True| Σ|(EB-True)/True| Σ|(EB_benchmarked -True)/True|

Σ|(BHF-True)/True|

5.26% 1.67% 1.44% 14.35%

Overall - Relative Errors

Σ(HT-True)/True Σ(EB-True)/True Σ(EB_benchmarked- True)/True

Σ(BHF-True)/True

3.05% -1.5% -1% -14.35%

(36)

Evaluation (Cont’d)

Two-way Raking: (States, Functions)

 Two-way raking:

o All states to National total

o All functions to National functions

 255 underestimated cases goes down to

210 cases.

(37)

Acknowledgements

 Thankfully for strong support to this research

o Carma Hogue (Assistant Division Chief) o Lisa Blumerman (Division Chief)

 Technical advice/review

o Dr. Partha Lahiri

(38)

Contact Information

Bac Tran

[email protected]

Program Research Branch, Chief Governments Division

U.S. Census Bureau

(39)

Thank you for your time!

Questions?