
Estimating the density of unemployment duration based on contaminated samples or small samples☆

Hang K. Ryu (a), Daniel J. Slottje (b),*

(a) Department of Economics, Chung Ang University, Seoul, South Korea
(b) Department of Economics, Southern Methodist University, Dallas, TX 75275, USA

Received 1 August 1997; received in revised form 1 November 1998; accepted 1 April 1999

Abstract

In estimating a density function for the duration of unemployment, we consider two departures from what would be ideal conditions. If the so-called digit preference effect produces local distortion in observed samples, we can apply a maximum entropy density estimation method. To establish the functional form of the density, we maximize entropy subject to moment restrictions. The global shape of the density is determined by the lower-order sample moments, which are not affected much by the digit preference effect. As a by-product of this method, we can establish the local transition structure of the digit preference effect. As a second case of departure from an ideal condition, we consider coarse sample observations where unemployment duration was observed only for 4, 10, 14, 26, and 52 weeks. Once the unemployment duration density is derived, quintile behavior over time, the Lorenz curve, and the Gini coefficient for the distribution of unemployment duration can be obtained. Finally, we discuss the ramifications of focusing only on the headcount ratio of unemployment when other information is available. © 2000 Elsevier Science S.A. All rights reserved.

Keywords: Unemployment duration; Digit preference effect; Coarse sample observations; Global approximation; Gini coefficient

* Corresponding author. Tel.: +1-214-7683555; fax: +1-214-7681821. E-mail address: [email protected] (D.J. Slottje)

☆ We thank the Associate Editor and two anonymous referees for many useful suggestions.

1. Introduction

Economists have understood for some time that the duration of unemployment is as important as the traditional unemployment rate (or head count ratio) in analyzing the problem of unemployment. In modeling unemployment duration, little attention has been paid to the problematic nature of data that come from retrospective surveys. An exception is the very recent work of Torelli and Trivellato (1993). They note in their study that individuals tend to forget that an event occurred (memory effect) or to remember the timing of an event incorrectly (telescoping effects). These two effects can also be considered reasons for another problem, the so-called 'digit preference effect'. Individuals tend not to think that they have been unemployed for precise units of time. They tend to clump weeks of unemployment together. This 'digit preference' makes it very difficult to use traditional hypothetical statistical distributions as descriptions of the distribution of unemployment duration.

One purpose of this paper is to determine the underlying probability density function (pdf) when the observed samples are contaminated by the digit preference effect. The justification for using a maximum entropy estimation method to remove the digit preference effect is the following. Zellner and Highfield (1988) and Ryu (1993) derived an exponential polynomial series for an unknown density function when they maximized entropy subject to a given set of moments. We call this maximum entropy estimated density a 'pseudo'-true density function because it approaches the true density function under certain conditions, which will be stated later. If digit preference produces only a local departure, its effect on the estimated lower-order sample moments will be insignificant.¹ We therefore expect the digit preference distortion component to have negligible influence on the lower-order sample moments, and our pseudo-true density function to be a good approximation of the underlying true density function. Based on this pseudo-true density function, we can decompose each observation into two parts, a pseudo-'true' value part and a component due to the digit preference distortion.

As an alternative way to remove the digit preference effect, local correction approaches can be applied. Pickering (1992) introduced a Markov transition model of misclassification in which odd observations may be misreported as adjacent even observations, while even numbers are assumed to be correctly reported. Ridout and Morgan (1991) take the Beta-geometric density function as the true underlying density function and make a local correction such that the probability of misreporting is symmetric around the local maximum and decreases as the distance from the local maximum increases. However, these local correction models are based on a fixed-form approach, so their global properties may or may not be satisfactory.

The second purpose of this paper is to determine the unemployment duration density for coarse sample observations. In particular, the Handbook of Labor Statistics (1989) reports the number of persons unemployed for 4, 10, 14, 26, and 52 weeks for the period 1958–1987. Additional observations for 1988–1994 were taken from Bureau of Labor Statistics monthly reports on the Employment Situation. The data we used are censored: a spell is reported as 52 weeks when it could in fact have lasted 200 weeks, so we know only that such individuals have been unemployed for at least 52 weeks. Given five sample points for each year, the global shape of an estimated density cannot be very close to the true density at all points, but if the duration density is a smooth function, particularly for longer spells of unemployment, an approximation with an exponential series can do well over this region. Once the density function is derived, we can easily find quintile values and the attendant Lorenz curves. Once the Lorenz curve is uncovered, we can calculate inequality in unemployment duration over time. Finally, we can find the Gini coefficient of the duration density and discuss its relationship with the total accumulated duration of unemployment in weeks.

There are several previous works on inference from grouped data. Heitjan (1989) defines grouped data to be the result of observing continuous variables only up to the nearest interval. Heitjan and Rubin (1990, 1991) proposed a grouped-data likelihood function and estimated its parameters for coarse data, where all the data fall in one of a countable number of subsets of the sample space. They extended their model to the more general case of data coarsened at random. In comparison, our approach suggests a curve-fitting method with an exponential polynomial for the distribution function where only several points of the distribution are known due to data grouping.

The paper proceeds as follows. Section 2 lays out the methodology we will use to derive the maximum entropy density function and compare the observed sample moments with the estimated true sample moments for dealing with inaccurate samples. Section 3 deals with the problem of a coarse sample, Section 4 presents the empirical results and Section 5 concludes the study.

2. Density estimation for contaminated observations

Our objective is to determine the underlying probability density function when the observations are measured with error. Suppose

$$u_i = x_i + e_i, \tag{2.1}$$

where $x_i$ is the unknown true unemployment duration of the $i$th person, while $u_i$ is the reported (observed) value of the $i$th person's unemployment duration. For example, some people have a preference for even numbers, so that they report the nearest even number instead of the true $x_i$. The difference is $e_i$.
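The error structure in (2.1) can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the "true" durations are drawn from a hypothetical exponential distribution with a mean of 15 weeks, and every respondent is assumed to report the nearest even number. Neither assumption comes from the data used in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)                    # fixed seed, illustration only
x = rng.exponential(scale=15.0, size=100_000)     # hypothetical true durations x_i (weeks)
u = 2.0 * np.round(x / 2.0)                       # reported u_i: nearest even number (assumption)
e = u - x                                         # reporting error e_i, as in (2.1)

def sample_moment(v, k):
    """k-th raw sample moment (1/n) * sum(v_i ** k)."""
    return np.mean(v ** k)

# Relative change in the lower-order moments caused by the rounding rule.
rel_change = {k: abs(sample_moment(u, k) / sample_moment(x, k) - 1.0)
              for k in (1, 2, 3, 4)}
```

Under this rounding rule each error is bounded by one week in absolute value, so the distortion is local; `rel_change` shows how little the lower-order sample moments move in this artificial setting.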

In general, we may consider several global smoothing approaches. We could apply a nonparametric kernel estimation method, for which there are many good references. Nadaraya (1965), Silverman (1986), Devroye (1987), and Härdle (1990) all provide good explanations of this method. Collomb (1984), Prakasa Rao (1983), Devroye and Györfi (1985), and Györfi et al. (1989) discuss the statistical properties of kernel estimation explicitly. However, the data we are using are biased towards 4, 12, and 26 weeks, and the location of each kernel will be biased accordingly. The kernel estimation method preserves local information, so each observed peak tends to remain a peak even after smoothing. Another problem for our particular case is that many observations are located near the boundary (1–4 weeks of unemployment).

As an alternative way to estimate the density function, we can apply a locally weighted regression method (LOWESS) introduced by Cleveland (1979), or an orthonormal basis (ONB) method (Ryu, 1990; Prakasa Rao, 1983). To smooth out the local digit preference effect, it is necessary to include a sufficient number of samples to average out the local effect in LOWESS, and to keep the size of the series expansion relatively small in the ONB method. The justification for using such a semiparametric or nonparametric approach is that the digit preference effect is a local effect, so its contribution can be averaged out when the global density function is estimated. In comparison, in this paper we determine the global density using a maximum entropy method subject to the lower-order moments, and we evaluate the degree of influence caused by the digit preference effect by neglecting the higher-order moments.

2.1. Review of maximum entropy density estimation

Ryu (1993) shows that we can maximize entropy ($W$) subject to restrictions given in the following general form:

$$\max_f W = -\int f(x)\log f(x)\,dx \quad\text{satisfying}\quad \int \phi_m(x) f(x)\,dx = m_m, \quad m = 0, 1, \ldots, J, \tag{2.2}$$

with the $m_m$ having known values. The Lagrangian method produces the following maximum entropy distribution as a solution:

$$f(x) = \exp\left[\sum_{n=0}^{N} c_n \phi_n(x)\right].$$

The unknown constants $c_n$ can be computed from the known values of $m_m$. If we choose a polynomial sequence, $\phi_m(x) = x^m$, the density is called the exponential series distribution.

Suppose a density function is given as an exponential power series. Zellner and Highfield (1988) have shown how to determine the parameters numerically from the $J+1$ moment restriction conditions:

$$f(x) = \exp\left[\sum_{j=0}^{J} c_j x^j\right] \quad\text{satisfying}\quad \int x^m f(x)\,dx = \mu_m, \quad m = 0, 1, \ldots, J. \tag{2.3}$$

As an alternative way to determine the parameters, Ryu (1993) has shown an explicit parameter determination rule assuming knowledge of $2J+1$ moment restriction conditions:

$$f(x) = \exp\left[\sum_{j=0}^{J} c_j x^j\right] \quad\text{satisfying}\quad \int x^m f(x)\,dx = \mu_m, \quad m = 0, 1, \ldots, 2J. \tag{2.4}$$

See Theorem A.1 stated in the appendix for details. The major difference is that only $J+1$ conditions are necessary to solve (2.3) numerically, but $2J+1$ conditions are needed to solve (2.4) explicitly.

Until now, we have assumed that the moments have known values. If we consider the effect of sampling error on parameter estimation, which arises because the known moment $\mu_m$ is replaced by its sample counterpart $\hat\mu_m = (1/n)\sum_{i=1}^{n} x_i^m$, two types of departure are expected. The true density function may not coincide with the above exponential polynomial series, and the sample moments will differ from the true moments (even if we assume knowledge of the true density function).

Barron and Sheu (1991), Zellner and Highfield (1988), Aroian (1948) and others have shown that the maximum likelihood estimates of the parameters are mathematically the same as those given by our maximum entropy procedure with sample moments replacing the given moments in (2.3). With these estimated parameters, the model estimated moments will converge to the sample moments. Similarly, Ryu (1993) shows how to estimate parameters when sample moments replace the given moments in (2.4). See Theorem 2 of Ryu (1993) stated in the appendix. In practice, the functional form of the true pdf can be well approximated with an exponential series if a sufficiently large number of terms is included in the series, and the true moments can be well approximated with sample moments if the sample size is large. In Section 4, we will apply both parameter estimation methods and compare their performance.
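A numerical solution of the moment conditions in (2.3) can be sketched as follows. This is not the algorithm of Zellner and Highfield (1988), nor the explicit rule of Ryu (1993); it is a generic damped Newton iteration on the convex dual of the maximum entropy problem, evaluated on a uniform grid. The function name `fit_maxent`, the rescaling of the support to $[0, 1]$, and the test density are assumptions made for illustration only. The constant $c_0$ is handled implicitly through normalization.

```python
import numpy as np

def fit_maxent(target_moments, grid, max_iter=200, tol=1e-9):
    """Fit f(x) proportional to exp(sum_{j=1..J} theta_j x^j) on a uniform grid
    so that its power moments of order 1..J match target_moments."""
    mu = np.asarray(target_moments, dtype=float)
    J = mu.size
    dx = grid[1] - grid[0]                          # uniform spacing assumed
    T = np.column_stack([grid ** j for j in range(1, J + 1)])
    theta = np.zeros(J)                             # start from the uniform density

    def density(th):
        a = T @ th
        w = np.exp(a - a.max())                     # subtract max to avoid overflow
        return w / (w.sum() * dx)

    def dual(th):                                   # log-normalizer minus theta . mu
        a = T @ th
        return a.max() + np.log(np.exp(a - a.max()).sum() * dx) - th @ mu

    for _ in range(max_iter):
        p = density(theta)
        m = (T * p[:, None]).sum(axis=0) * dx       # model moments
        grad = m - mu
        if np.max(np.abs(grad)) < tol:
            break
        Tc = T - m
        hess = (Tc * p[:, None]).T @ Tc * dx        # covariance of (x, ..., x^J)
        step = np.linalg.solve(hess, grad)
        t, f0 = 1.0, dual(theta)
        while dual(theta - t * step) > f0 - 1e-4 * t * (grad @ step) and t > 1e-8:
            t *= 0.5                                # backtracking line search
        theta = theta - t * step
    return theta, density(theta)

# Usage: recover the moments of a hypothetical exponential-polynomial density.
grid = np.linspace(0.0, 1.0, 2001)
dx = grid[1] - grid[0]
true_pdf = np.exp(2.0 * grid - 6.0 * grid ** 2 + 1.5 * grid ** 3)
true_pdf /= true_pdf.sum() * dx
target = np.array([(grid ** j * true_pdf).sum() * dx for j in (1, 2, 3)])
theta_hat, pdf_hat = fit_maxent(target, grid)
fitted = np.array([(grid ** j * pdf_hat).sum() * dx for j in (1, 2, 3)])
```

Replacing `target` with sample moments gives the sample-moment version of (2.3) discussed above; for larger $J$ the Hessian becomes poorly conditioned, which is one practical reason to keep the series short.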

Barron and Sheu (1991) also approximated density functions by sequences of exponential polynomials, but they considered the relative entropy (Kullback–Leibler distance) between the true density and the estimated density. The distinction between maximum entropy and Kullback's relative entropy should be noted; see, for example, Barron and Sheu (1991) and Kapur and Kesavan (1992). The maximum entropy principle is founded on the maximization of Shannon's entropy, which is essentially a measure of uncertainty. Thus, we maximize the uncertainty about the information not given to us, subject to the use of all the information given to us. On the other hand, with relative entropy (Kullback–Leibler distance), the underlying concept is that of the probabilistic distance of one distribution from another. Based on the KLIC (Kullback–Leibler information criterion) $\int f_1(x)\log[f_1(x)/f_2(x)]\,dx$, the object is to find a distribution $f_1$ which, out of all the distributions satisfying the given constraints, is closest to $f_2$. Here $f_1$ is our estimated density and $f_2$ is a uniform density or measure. Thus, by minimizing this KLIC with respect to $f_1$ subject to side conditions, we get as close as possible to $f_2$, the uniform measure. The counterpart $\int f_2(x)\log[f_2(x)/f_1(x)]\,dx$ measures the closeness of $f_2$ to $f_1$ as the base measure. Note that the first integral above is not equal to the second. An invariant distance measure, as indicated by Jeffreys (1961), is the sum of the above two integrals. With the sum, the distance from $f_1$ to $f_2$ is the same as that from $f_2$ to $f_1$.
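The asymmetry of the two integrals and the symmetry of their sum can be checked on a small discrete example. The sketch below uses hypothetical three-point distributions; the function names `kl` and `jeffreys` are illustrative. It also shows that, when $f_2$ is uniform, the KLIC equals the log of the number of support points minus Shannon's entropy of $f_1$, so minimizing the KLIC against a uniform measure is equivalent to maximizing entropy.

```python
import numpy as np

def kl(p, q):
    """Discrete Kullback-Leibler criterion: sum p * log(p / q)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0                                   # terms with p = 0 contribute zero
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def jeffreys(p, q):
    """Sum of the two directed criteria, symmetric in its arguments."""
    return kl(p, q) + kl(q, p)

f1 = np.array([0.5, 0.3, 0.2])                     # hypothetical estimated distribution
f2 = np.full(3, 1.0 / 3.0)                         # uniform base measure

entropy_f1 = -float(np.sum(f1 * np.log(f1)))       # Shannon entropy of f1
forward, backward = kl(f1, f2), kl(f2, f1)         # the two directed criteria differ
```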

2.2. Justification for using an ME method to remove the digit preference effect

We provide two justifications. First, to understand the relationship between the global shape and the local fine details of a density function, we can expand the logarithm of the density function in a polynomial series, $\log f(x) = \sum_{j=0}^{J} c_j x^j$. The lower-order terms determine the global shape with fewer roots, and when we increase the expansion size we superimpose the fine details of the higher-order terms (which have many roots) on the previously derived global form of the density function. Therefore, if we want only a globally smooth function, we can keep the size of the exponential series small, such as $J = 4$ (or $J = 6$). To estimate the parameters of a short series, we require only the lower-order sample moments.

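As the second justification for using observed sample moments that are distorted by the digit preference effect, we separate the observed value $u_i$ into a true observation $x_i$ and an error term $e_i$, as stated in (2.1). Now let us distinguish between the true sample moments ($\hat\mu_k$), the observed sample moments ($\hat\zeta_k$), and the sample error moments ($\hat\nu_l$):

True sample moments: $\hat\mu_k = \frac{1}{n}\sum_{i=1}^{n} x_i^k$.

Observed sample moments: $\hat\zeta_k = \frac{1}{n}\sum_{i=1}^{n} u_i^k$.

Sample error moments: $\hat\nu_l = \frac{1}{n}\sum_{i=1}^{n} e_i^l$.

Next, decompose the true sample moments into the observed sample moments and the sample error moments,

$$\hat\mu_k = \hat\zeta_k - k\,\hat\zeta_{k-1}\hat\nu_1 + \frac{k(k-1)}{2}\,\hat\zeta_{k-2}\hat\nu_2 + \cdots. \tag{2.5}$$

In (2.5), higher-order terms can be neglected if $\hat\zeta_k \gg k\,\hat\zeta_{k-1}\hat\nu_1 \gg \frac{k(k-1)}{2}\,\hat\zeta_{k-2}\hat\nu_2 \gg \binom{k}{3}\hat\zeta_{k-3}\hat\nu_3 \gg \cdots$.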
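The size of the correction terms in (2.5) is easy to evaluate once the observed and error moments are available. The sketch below computes the first two correction terms for $k = 2$, using the first- and second-order values reported in Table 2 of this paper; the function name `correction_terms` and the list layout are illustrative assumptions. The alternating sign follows from expanding $x_i^k = (u_i - e_i)^k$.

```python
from math import comb

def correction_terms(zeta, nu, k, n_terms=2):
    """First n_terms correction terms of (2.5) for the k-th moment.

    zeta[j] is the observed sample moment of order j (zeta[0] = 1);
    nu[l] is the sample error moment of order l (nu[0] is unused).
    """
    return [(-1) ** l * comb(k, l) * zeta[k - l] * nu[l]
            for l in range(1, min(n_terms, k) + 1)]

zeta = [1.0, 14.89, 358.8]        # observed moments of order 0, 1, 2 (Table 2)
nu = [1.0, 0.0938, 0.7503]        # error moments of order 1, 2 (Table 2)

first, second = correction_terms(zeta, nu, k=2)
ratio_first = first / zeta[2]     # relative size of the first correction
```

With these inputs the first correction is under one per cent of the observed second moment, which is the sense in which the higher-order terms of (2.5) are treated as negligible.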

To evaluate the effect of digit preference, we can decompose the error term into two parts:

$$e_i = u_i - x_i = (u_i - \tilde x_i) + (\tilde x_i - x_i), \tag{2.6}$$

where $\tilde x_i$ is the theoretical sample generated by the maximum entropy pdf using the observed sample moments $\hat\zeta_k$,

$$\hat f(x) = \exp\left[\sum_{j=0}^{J} \hat c_j x^j\right], \tag{2.7}$$

where the parameter estimation methods have already been discussed. We refer to the above maximum entropy density as a pseudo-true density function because it converges to the true density under certain conditions.² In (2.6), we have decomposed the error term into two parts: the first part is the difference between the observed value ($u_i$) and the model estimated value ($\tilde x_i$), and the second part is the difference between the true value ($x_i$) and the model estimated value ($\tilde x_i$). We assume that the true population pdf and the maximum entropy estimated pdf are very smooth, so that all the digit preference effects will be confined to the first part of the RHS of (2.6).³

Now let us approximate the sample error moments with the first part of (2.6),

$$\hat\nu_l = \frac{1}{n}\sum_{i=1}^{n} e_i^l \approx \frac{1}{n}\sum_{i=1}^{n} (u_i - \tilde x_i)^l. \tag{2.8}$$

Once we have approximated the values of $\hat\nu_l$, we can evaluate the role of $\hat\nu_l$ in (2.5). The digit preference effect is negligible if $\hat\zeta_k \gg k\,\hat\zeta_{k-1}\hat\nu_1 \gg [k(k-1)/2]\,\hat\zeta_{k-2}\hat\nu_2$.

² Suppose the true underlying density function is a smooth function. If the observed sample moments approach the true moments, and the size of the exponential series goes to infinity, then we can show that the maximum entropy density function converges globally to the true density function.

To summarize, we have assumed that the model estimated pdf (2.7) is smooth, so that this function cannot pick up the digit preference effect. The second term of (2.6) is negligible relative to the first term of (2.6), so our observed sample moments can approximate the unknown true sample moments well even though each observation may be contaminated with a digit preference effect. Our suggested model will be successful if the true density function is smooth, so that it can be well approximated with an exponential series of lower $j$, while the digit preference effect is a phenomenon that affects the higher $j$ terms. It should be noted that other smoothing methods that remove local fluctuations can also be effective in removing the second term of (2.6). There is an analogy between our smoothing method and the time-series method of spectral analysis. One can remove high-frequency fluctuations by expanding the time series in a Fourier series of lower order. Similarly, local digit preference effects are removed in an exponential power series of lower order.

3. Density estimation for coarse observations

A major finding of the previous section is that the functional form of a density can be written as an exponential series. Furthermore, Ryu and Slottje (1996, 1998) showed that an income distribution, which has a fat right-hand tail, can be approximated well by an exponential polynomial series.

We interpret an unemployment duration share as a probability density function (pdf) because the duration share $s(z_i)$ is the probability associated with the possibility that each week of total measured unemployment will end up with the $i$th person.⁴ The population index $z$ is defined such that the person at $z = 0$ is employed full time and the person at $z = 1$ is unemployed for 52 weeks.

⁴ Suppose we have unemployment sample observations. Let this sample be rearranged in order from lowest to greatest values and let the ordered values be $(u_0, \ldots, u_I)$, where $I$ is the sample size. Dividing each person's measured unemployed weeks by the total measured unemployed weeks, the share function is defined as $s_i = u_i / \sum_{j=0}^{I} u_j$. We interpret the share function as a probability density function because the share $s_i$ is the probability that each week of the total measured unemployed weeks will end up with the $i$th person. Since each individual has different attributes and a different locational position, each unemployment week will end up in the hands of the $i$th individual with probability $s_i$, so that this share function can be considered a pdf.

Once we have considered the share function as a density function, we can apply the maximum entropy method to determine the functional form of the share function. See Ryu (1993) for examples of this method. Let us approximate the logarithm of $s_i$ with a polynomial series:

$$\log s(z) = \sum_{n=0}^{N} a_n z^n + \zeta,$$

where $s$ …

We introduce a very simple model,

$$\log s(z) = a'_0 + a_1 z + a_2 z^2 + a_3 z^3 + a_4 z^4 + e. \tag{3.1}$$

Since the share of a person is proportional to his measured unemployed weeks, we write

$$\log D(z) = a_0 + a_1 z + a_2 z^2 + a_3 z^3 + a_4 z^4 + e, \tag{3.2}$$

where $D(z)$ refers to the duration of unemployment in weeks. To estimate the parameters of (3.2), we applied the least-squares method. To evaluate the usefulness of the above simple model, we can compare the performance of our model based on five sample points with that of a full data model (or a pseudo-'true' model).

Once the share function $s(z)$ is derived, quintile values are

$$Q_j \equiv \int_{R_j} s(z)\,dz, \tag{3.3}$$

where $R_j$ is the domain of $Q_j$. Similarly, a Lorenz curve can be derived,

$$L(z) = \int_0^z s(z')\,dz'. \tag{3.4}$$

Each year, different numbers are observed for 4, 10, 14, 26, and 52 weeks of unemployment. The duration transition mechanism over time can be followed through the changes in the Gini coefficient, in the Lorenz curve, and in the estimated quintiles themselves.

4. An application

4.1. The duration density for observations contaminated with the digit preference effect

This section serves two purposes. The first is to evaluate the usefulness of the maximum entropy method as an approach to determining the unknown unemployment duration density function. We study how much distortion is produced by the digit preference effect relative to the sample moments. Once the pseudo-'true' density is derived, the distribution produced by the digit preference effect is plotted using a boxplot for each observation group.

The second purpose of this section is to determine the density function given the coarseness of the sample observations. Finally, we also look at the level of inequality in the density of unemployment duration. We uncover the Gini coefficient for the duration density and its relationship with the total

Table 1

Maximum entropy estimated unemployment duration probability density

Weeks  Sample obs (thousands)  Observed %  Model PDF  Zellner PDF    Weeks  Sample obs (thousands)  Observed %  Model PDF  Zellner PDF

1 218 2.3 2.09 3.22 27 57 0.6 1.69 1.37

2 411 4.4 3.71 4.03 28 88 0.9 1.54 1.30

3 467 5.0 5.24 4.67 29 17 0.2 1.38 1.24

4 1079 11.6 6.21 5.09 30 116 1.2 1.22 1.17

5 216 2.3 6.46 5.26 31 26 0.3 1.07 1.10

6 474 5.1 6.14 5.22 32 220 2.4 0.93 1.03

7 206 2.2 5.51 5.03 33 15 0.2 0.81 0.95

8 614 6.6 4.78 4.73 34 37 0.4 0.70 0.88

9 181 1.9 4.09 4.39 35 50 0.5 0.62 0.81

10 402 4.3 3.50 4.03 36 96 1.0 0.55 0.75

11 58 0.6 3.03 3.68 37 34 0.4 0.50 0.69

12 915 9.8 2.68 3.36 38 26 0.3 0.47 0.63

13 293 3.1 2.42 3.07 39 62 0.7 0.47 0.58

14 139 1.5 2.24 2.81 40 161 1.7 0.44 0.53

15 97 1.0 2.13 2.59 41 8 0.1 0.44 0.49

16 307 3.3 2.06 2.40 42 47 0.5 0.45 0.46

17 208 2.2 2.03 2.24 43 18 0.2 0.48 0.43

18 107 1.1 2.02 2.10 44 87 0.9 0.50 0.40

19 60 0.6 2.04 1.98 45 11 0.1 0.53 0.39

20 359 3.8 2.06 1.88 46 44 0.5 0.54 0.38

21 50 0.5 2.07 1.79 47 12 0.1 0.53 0.37

22 295 3.2 2.07 1.71 48 53 0.6 0.47 0.38

23 24 0.3 2.05 1.64 49 31 0.3 0.38 0.39

24 104 1.1 2.00 1.57 50 28 0.3 0.26 0.41

25 69 0.7 1.93 1.50 51 25 0.3 0.06 0.44

26 603 6.5 1.82 1.44

Observed samples are such that 218,000 persons reported one week of unemployment. Similarly, 411,000 persons reported 2 weeks of unemployment. The third column is the histogram for the observed samples of the second column. The fourth column is the maximum entropy density function estimated using Theorems A.1 and A.2 of the appendix, and the fifth column is estimated using (2.3), replacing the given moments with the sample moments. The size of the expansion series ($J$) is 6.

the inequality measure of unemployment in each year to the headcount ratio (the traditionally reported unemployment rate) and to the mean duration of unemployment.

If the Gini coefficient of unemployment duration exhibits different behavior over time vis-à-vis the headcount ratio or in comparison to the mean duration measure, then this result is of considerable significance. Every unemployment measure can be considered to be an indicator of economic well-being. If different measures give conflicting signals of well-being over time, then researchers and

Fig. 1. ME estimated PDF from raw frequency.

In Table 1, weeks of unemployment are listed in the first column, the observed numbers for the corresponding weeks are reported in the second column, a histogram (the proportion of persons belonging to that group) is listed in the third column, and the maximum entropy density function estimated by Theorems A.1 and A.2 (reported in the appendix) is listed in the fourth column. The maximum entropy density function estimated by the maximum likelihood method is listed in the fifth column.

In Fig. 1, we plot the raw frequencies (third column of Table 1) and the densities estimated by MOM and MLE (fourth and fifth columns of Table 1). We normalized the density function so that the area below the curve is one.

To evaluate the degree of distortion produced by the digit preference effect, we generate theoretical sample observations, which we call the 'pseudo'-true samples, from the maximum entropy estimated pdf by inverting the distribution function:

$$F(\tilde x_i) = i/n \;\Rightarrow\; \tilde x_i = F^{-1}(i/n) \quad\text{for } i = 1, 2, \ldots, n. \tag{4.1}$$

Therefore, the observed sample $u_i$ can be decomposed into two parts, the theoretical sample observation and the departure of the observed value from it:

$$u_i = \tilde x_i + \zeta_i \quad\text{for } i = 1, 2, \ldots, n. \tag{4.2}$$
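The inversion in (4.1) and the decomposition in (4.2) can be sketched numerically as follows. The sketch assumes the estimated density is available on a grid, builds the distribution function with the trapezoid rule, and inverts it by linear interpolation; pairing the $i$th theoretical quantile with the $i$th smallest observation is also an assumption of this sketch. The function names are illustrative.

```python
import numpy as np

def pseudo_true_sample(grid, pdf, n):
    """Return x_tilde_i = F^{-1}(i/n), i = 1..n, for a density given on a grid."""
    widths = np.diff(grid)
    cdf = np.concatenate(([0.0], np.cumsum(0.5 * (pdf[1:] + pdf[:-1]) * widths)))
    cdf /= cdf[-1]                                  # enforce F(max) = 1
    return np.interp(np.arange(1, n + 1) / n, cdf, grid)

def departures(u, grid, pdf):
    """Departures u_i - x_tilde_i of the ordered observations, as in (4.2)."""
    u_sorted = np.sort(np.asarray(u, dtype=float))
    return u_sorted - pseudo_true_sample(grid, pdf, u_sorted.size)

# Check on a uniform density over [0, 1], where F^{-1}(p) = p.
grid = np.linspace(0.0, 1.0, 101)
pdf = np.ones_like(grid)
x_tilde = pseudo_true_sample(grid, pdf, 4)
dep = departures([0.75, 0.25, 1.0, 0.5], grid, pdf)
```

The moments of the values returned by `departures` then play the role of the approximated sample error moments.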

Approximating the sample error moments of (2.8) as $\hat\nu_l = (1/n)\sum_{i=1}^{n} \zeta_i^l$, the degree

Table 2

Comparison of sample moments with first and second moment correction terms

Moments  $\hat\zeta$     $\hat\nu$   First         First/$\hat\zeta$   Second        Second/$\hat\zeta$
1        14.89       0.0938   -0.0938      -0.0063    0.0000       0.0000
2        358.8       0.7503   -2.792       -0.0078    0.7504       0.0021
3        11090       0.1425   -101.0       -0.0091    -33.51       -0.0030
4        393300      1.308    -4160        -0.0106    -1615        -0.0041
5        0.1519E+8   0.1856   -0.1844E+6   -0.0121    -0.8321E+5   -0.0055
6        0.6202E+9   3.090    -0.8545E+7   -0.0138    -0.4427E+7   -0.0071
7        0.2634E+11  0.0311   -0.4072E+9   -0.0154    -0.2393E+9   -0.0091
8        0.1151E+13  8.677    -0.1977E+11  -0.0172    -0.1303E+11  -0.0113
9        0.5142E+14  -1.378   -0.9719E+12  -0.0189    -0.7117E+12  -0.0138
10       0.2334E+16  27.27    -0.4822E+14  -0.0207    0.3888E+14   -0.0167
11       0.1074E+18  -9.105   -0.2408E+16  -0.0224    -0.2122E+16  -0.0198
12       0.4992E+19  92.64    -0.1208E+18  -0.0242    -0.1156E+18  -0.0232

The observed duration of unemployment in weeks $u_i$ is divided into the true unobserved duration $x_i$ and the error $e_i$. If we define the true sample moments as $\hat\mu_k = (1/n)\sum_{i=1}^{n} x_i^k$, the observed sample moments as $\hat\zeta_k = (1/n)\sum_{i=1}^{n} u_i^k$, and the sample error moments as $\hat\nu_l = (1/n)\sum_{i=1}^{n} e_i^l$, then we can expand the true moment in a series of observed sample moments and sample error moments as shown in (2.5):

$$\hat\mu_k = \hat\zeta_k - k\,\hat\zeta_{k-1}\hat\nu_1 + \frac{k(k-1)}{2}\,\hat\zeta_{k-2}\hat\nu_2 + \cdots \tag{2.5a}$$

'First' in the fourth column means the first right-hand-side correction term of (2.5), $-k\,\hat\zeta_{k-1}\hat\nu_1$. A similar interpretation holds for 'Second'.

H.K. Ryu, D.J. Slottje / Journal of Econometrics 95 (2000) 131–

observed sample moments $\hat\zeta_k$, sample error moments $\hat\nu_l$, the first correction part and the second correction part of (2.5). We also list the ratio of the first (as well as the second) correction part to the observed sample moments to see their influence on the sample moment estimation. In all cases, the correction components appear to have little influence. There is a significant digit preference effect, in that reporting 4 weeks is more likely than reporting 3 or 5 weeks. Ultimately this is not taken into account in the estimation, but the expansions of Section 2.2 are used to argue that the effect is small, provided sufficient smoothing is done and there are no genuine high frequencies in the data.

Now let us compare the performance of Ryu's MOM with that of MLE. For our digit preference problem, we could not make the size of the exponential series very large because the digit preference effect is assumed to be an effect on the higher-order terms in the series. Furthermore, only the lower-order sample moments are assumed to be free of the digit preference effect, and thus the sampling error from using the sample moments rather than the true moments may not be negligible for higher-order sample moments. In these respects, MLE looks to be a better choice for this digit preference problem. However, Ryu's MOM produced better goodness of fit than MLE. The use of higher-order sample moments (the $(J+1)$th, …, $2J$th moments), though not necessarily accurate, seemed to be helpful in removing the digit preference effect. The sample error moments $\hat\nu_k = (1/n)\sum_{i=1}^{n} e_i^k$ stated in (2.8) are

Sample error moments comparison for two parameter estimation methods

Order  MOM     MLE      Order  MOM     MLE
1      0.0938  0.1601   7      0.0311  4.459
2      0.7503  0.7475   8      8.677   14.52
3      0.1425  0.3588   9      -1.378  20.06
4      1.308   1.432    10     27.27   59.75
5      0.1856  1.133    11     -9.105  98.03
6      3.090   4.076    12     92.64   271.8

Note that the sample error moments produced by MLE are larger in magnitude than those produced by MOM at every order except the second. Therefore, the digit preference effect stated in (2.5) is less negligible for the ML estimation method.

In Fig. 2, we present a boxplot of the digit preference distortions $\zeta_i$. If everyone has a certain preference for an odd number or an even number, then the difference between the observed sample and the theoretical sample values should be confined between −1 and +1 with mean zero. However, in Fig. 2, we

Fig. 2. Boxplots of the departures due to digit preference.

duration level. Similar effects can be observed around other key duration weeks, i.e., 26, 32, and 40 weeks. Therefore we can conclude that there is more than a simple odd/even digit preference effect at work here.

4.2. Duration density from coarse sample observations

Our objective is to derive the whole distribution based upon five sample observation points. We use (3.2), and the parameters are estimated by the least-squares method. The performance of the estimated distribution can be compared to the cumulative distribution based on the full sample data. In Table 1, we have a list of the full data, but let us use only the following five points.

Weeks  Cumulative %
4      z_1 = 23.3
10     z_2 = 45.8
14     z_3 = 60.8
26     z_4 = 85.3
52     z_5 = 100.0

We approximated the duration weeks $D(z)$ with

$$\log D(z) = a_0 + a_1 z + a_2 z^2 + a_3 z^3 + a_4 z^4 + e \tag{3.2}$$

and the parameters were estimated by the least-squares method using five points at $D(z_1) = 4$, $D(z_2) = 10$, ….

Table 3

Approximation of unemployment duration with an exponential polynomial series

Weeks  Freq (thousands)  Cumulative dist (%)  Model dist (%)    Weeks  Freq (thousands)  Cumulative dist (%)  Model dist (%)

1 218 2.3 6.9 27 57 85.9 87

2 411 6.7 14 28 88 86.9 87

3 467 11.8 19 29 17 87.1 88

4 1079 23.3 23 30 116 88.3 89

5 216 25.6 27 31 26 88.6 90

6 474 30.7 31 32 220 90.9 91

7 206 32.9 35 33 15 91.1 91

8 614 39.5 38 34 37 91.5 92

9 181 41.5 42 35 50 92.0 93

10 402 45.8 46 36 96 93.1 93

11 58 46.4 50 37 34 93.4 94

12 915 56.2 53 38 26 93.7 94

13 293 59.3 58 39 62 94.4 95

14 139 60.8 61 40 161 96.1 95

15 97 61.9 64 41 8 96.2 96

16 307 65.2 67 42 47 96.7 96

17 208 67.4 70 43 18 96.9 97

18 107 68.5 73 44 87 97.8 97

19 60 69.2 75 45 11 97.9 98

20 359 73.0 77 46 44 98.4 98

21 50 73.6 79 47 12 98.5 98

22 295 76.7 81 48 53 99.1 99

23 24 77.0 82 49 31 99.4 99

24 104 78.1 83 50 28 99.7 99

25 69 78.9 85 51 25 100.0 100

26 603 85.3 86

Observed samples are such that 218,000 persons reported one week of unemployment. The third column is the cumulative distribution for the observed samples of the second column. The fourth column is the distribution estimated using only five sample observations at 4, 10, 14, 26, and 52 weeks. We approximated the duration weeks $D(z)$ with

$$\log D(z) = a_0 + a_1 z + a_2 z^2 + a_3 z^3 + a_4 z^4 + e \tag{3.2b}$$

and the parameters were estimated by the least-squares method using five points at $D(z_1) = 4$, $D(z_2) = 10$, …. The fourth column is estimated with $z = D^{-1}(\text{weeks})$.

Once we have estimated the whole distribution function, let us compare the quintile shares to show how the total accumulated unemployed weeks are distributed among the first 20 per cent of persons who reported unemployment, the second 20 per cent, …

Observed Model estimated

Though we have used only five sample points, the estimated quintile values are good approximations of the observed quintile values. In the tail areas, $Q_1$ and $Q_5$ showed larger departures than did $Q_2$, $Q_3$, and $Q_4$ in the middle of the distribution.

Once we have seen that (3.2) works well for the coarse observations, we can apply this method to actual unemployment duration data collected by the Bureau of Labor Statistics. In the estimation performed, we included only those individuals who experienced unemployment for a portion of 52 weeks, and excluded those who never worked or never experienced unemployment. We also include one artificial sample point, so that 1 per cent of people in the sample are considered to be unemployed for 0.3 of a week. In (3.2), we need $D(z) = 0$ for $z = 0$, but this is obviously impossible for the logarithmic form given in (3.2). The reason for including the artificial sample point is the following. For the years 1966–1969, the performance of (3.2) was not satisfactory. These are 'baby boom' years and their distribution was quite different from the distribution in other years. Therefore, we applied (3.2) to the years 1958–1965, and found that on average 1 per cent of people were unemployed for 0.3 week. This condition is imposed rather than attempting an impossible boundary condition, $D(0) = 0$, over the problem period. We derived the duration density function based on $D(0.01) = 0.3$, $D(z_1) = 4$, $D(z_2) = 10$, $D(z_3) = 14$, $D(z_4) = 26$, and $D(z_5) = 52$ for 1958–1994, and we report the quintile values in Table 4 for the entire population.

The Gini coefficient is a convenient summary measure to describe the distributive nature of a given distribution and it can be calculated from the mean and higher moments of a given distribution. Nonetheless, a brief discussion is probably necessary since the application here is not the usual one. When a Gini coefficient increases in value towards one, this indicates that the level of inequality has increased. How does one interpret this fact in the context of the distribution of unemployment duration? Shorrocks (1992, 1993) argues that an increase in the level of inequality in the unemployment duration distribution means social welfare has decreased. This is because the Pigou–Dalton principle applies. An increase of (say) one week of unemployment for one individual with


Table 4

Quintile values, Gini coefficient, and average weeks of unemployment duration (total unemployment of men and women)

Year   Q1      Q2      Q3      Q4      Q5      Gini    AVWEEKS
1958   0.0196  0.0827  0.1522  0.2570  0.4884  0.4561  11.07
1959   0.0181  0.0754  0.1493  0.2500  0.5072  0.4760   9.58
1960   0.0185  0.0769  0.1497  0.2512  0.5038  0.4721   9.85
1961   0.0174  0.0758  0.1555  0.2573  0.4939  0.4674  10.11
1962   0.0186  0.0790  0.1543  0.2509  0.4972  0.4659   9.82
1963   0.0185  0.0802  0.1574  0.2518  0.4921  0.4615   9.89
1964   0.0170  0.0725  0.1520  0.2516  0.5070  0.4793   9.28
1965   0.0146  0.0625  0.1490  0.2509  0.5230  0.5007   8.27
1966   0.0124  0.0508  0.1364  0.2490  0.5513  0.5319   7.52
1967   0.0126  0.0522  0.1400  0.2478  0.5474  0.5279   7.39
1968   0.0109  0.0435  0.1304  0.2485  0.5667  0.5503   7.02
1969   0.0120  0.0484  0.1343  0.2468  0.5584  0.5391   7.21
1970   0.0169  0.0723  0.1530  0.2480  0.5099  0.4814   8.83
1971   0.0179  0.0774  0.1560  0.2542  0.4945  0.4661   9.91
1972   0.0159  0.0687  0.1515  0.2567  0.5071  0.4835   9.42
1973   0.0148  0.0629  0.1475  0.2518  0.5230  0.5000   8.49
1974   0.0164  0.0693  0.1494  0.2500  0.5148  0.4876   8.90
1975   0.0189  0.0833  0.1600  0.2571  0.4806  0.4510  10.76
1976   0.0187  0.0811  0.1567  0.2554  0.4881  0.4580  10.45
1977   0.0174  0.0750  0.1536  0.2541  0.4999  0.4720   9.75
1978   0.0173  0.0743  0.1539  0.2495  0.5049  0.4760   9.16
1979   0.0173  0.0746  0.1551  0.2486  0.5043  0.4754   9.04
1980   0.0199  0.0856  0.1581  0.2520  0.4844  0.4507  10.51
1981   0.0213  0.0911  0.1600  0.2481  0.4795  0.4418  10.58
1982   0.0252  0.1008  0.1518  0.2460  0.4762  0.4285  11.92
1983   0.0229  0.0954  0.1554  0.2516  0.4745  0.4337  11.78
1984   0.0202  0.0866  0.1575  0.2540  0.4816  0.4479  10.91
1985   0.0203  0.0877  0.1599  0.2514  0.4807  0.4463  10.60
1986   0.0215  0.0919  0.1599  0.2491  0.4776  0.4398  10.81
1987   0.0209  0.0896  0.1596  0.2490  0.4808  0.4442  10.54
1988   0.0201  0.0861  0.1581  0.2488  0.4868  0.4517  10.19
1989   0.0202  0.0867  0.1606  0.2446  0.4879  0.4519   9.93
1990   0.0210  0.0903  0.1616  0.2456  0.4814  0.4339  10.11
1991   0.0187  0.0799  0.1571  0.2363  0.5081  0.4724   8.42
1992   0.0199  0.0845  0.1574  0.2373  0.5010  0.4628   8.98
1993   0.0236  0.0963  0.1566  0.2270  0.4965  0.4470   9.02
1994   0.0273  0.1050  0.1511  0.2170  0.4996  0.4390   9.00

decline. Thus, we assume that higher values of the Gini coefficient are a worse state of nature than lower values. An increase in the unemployment rate and/or in the mean duration of unemployment has the same interpretation. From the derived density, the Gini coefficients are calculated and reported in the seventh


Table 5

Quintile values, Gini coefficient, and average weeks of unemployment duration (unemployment of men)

Year   Q1      Q2      Q3      Q4      Q5      Gini    AVWEEKS
1958   0.0231  0.0935  0.1505  0.2494  0.4835  0.4411  11.50
1959   0.0218  0.0891  0.1530  0.2430  0.4932  0.4522  10.09
1960   0.0218  0.0894  0.1534  0.2437  0.4917  0.4509  10.19
1961   0.0219  0.0914  0.1570  0.2447  0.4851  0.4448  10.34
1962   0.0222  0.0921  0.1567  0.2427  0.4863  0.4448  10.20
1963   0.0218  0.0926  0.1607  0.2427  0.4822  0.4419  10.05
1964   0.0200  0.0840  0.1558  0.2442  0.4960  0.4597   9.53
1965   0.0170  0.0742  0.1582  0.2472  0.5034  0.4754   8.68
1966   0.0149  0.0631  0.1489  0.2447  0.5283  0.5035   7.77
1967   0.0150  0.0645  0.1525  0.2439  0.5240  0.4994   7.65
1968   0.0126  0.0534  0.1450  0.2471  0.5418  0.5232   7.17
1969   0.0145  0.0613  0.1488  0.2434  0.5320  0.5080   7.45
1970   0.0199  0.0842  0.1578  0.2413  0.4969  0.4603   9.12
1971   0.0205  0.0876  0.1590  0.2470  0.4859  0.4496  10.10
1972   0.0191  0.0819  0.1571  0.2479  0.4940  0.4609   9.65
1973   0.0179  0.0778  0.1587  0.2454  0.5002  0.4695   8.84
1974   0.0193  0.0816  0.1559  0.2445  0.4986  0.4640   9.30
1975   0.0238  0.0999  0.1606  0.2450  0.4707  0.4267  11.24
1976   0.0222  0.0925  0.1558  0.2491  0.4804  0.4406  11.09
1977   0.0212  0.0904  0.1603  0.2450  0.4831  0.4448  10.11
1978   0.0215  0.0905  0.1585  0.2407  0.4888  0.4483   9.67
1979   0.0199  0.0804  0.1425  0.2633  0.4939  0.4615  11.93
1980   0.0247  0.1009  0.1564  0.2411  0.4768  0.4292  11.04
1981   0.0247  0.1014  0.1578  0.2420  0.4742  0.4271  11.15
1982   0.0306  0.1114  0.1412  0.2351  0.4817  0.4190  12.43
1983   0.0270  0.1044  0.1473  0.2442  0.4771  0.4248  12.37
1984   0.0245  0.0944  0.1533  0.2465  0.4762  0.4302  11.72
1985   0.0247  0.1017  0.1583  0.2416  0.4737  0.4265  11.10
1986   0.0260  0.1045  0.1555  0.2401  0.4739  0.4234  11.38
1987   0.0264  0.1062  0.1563  0.2374  0.4737  0.4214  11.18
1988   0.0231  0.0962  0.1582  0.2430  0.4794  0.4360  10.63
1989   0.0231  0.0975  0.1626  0.2389  0.4779  0.4340  10.04
1990   0.0239  0.1000  0.1613  0.2410  0.4738  0.4286  10.68
1991   0.0187  0.0799  0.1571  0.2363  0.5081  0.4724   8.42
1992   0.0217  0.0901  0.1565  0.2334  0.4983  0.4549   9.15
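As a rough cross-check on the tabulated Gini coefficients, one can compute the Gini implied by the five quintile shares alone, using a piecewise-linear Lorenz curve. The sketch below is an illustration, not the authors' procedure (they calculate the Gini from the derived continuous density); the function name `gini_from_shares` is hypothetical. Because a piecewise-linear Lorenz curve ignores inequality within each quintile, the result is a lower bound on the Gini of the underlying distribution.

```python
def gini_from_shares(shares):
    """Gini coefficient of a piecewise-linear Lorenz curve.

    `shares` are the shares of the total held by equal-sized population
    groups, ordered from the lowest group to the highest.
    """
    total = sum(shares)
    n = len(shares)
    cumulative = 0.0
    area = 0.0  # area under the Lorenz curve (trapezoid rule)
    for s in shares:
        previous = cumulative
        cumulative += s / total
        area += (previous + cumulative) / (2 * n)
    return 1.0 - 2.0 * area

# Quintile shares Q1..Q5 for 1958 from Table 4 (men and women combined)
q_1958 = [0.0196, 0.0827, 0.1522, 0.2570, 0.4884]
g_1958 = gini_from_shares(q_1958)
```

For 1958 this grouped-data value falls somewhat below the 0.4561 reported in Table 4, which is the expected direction of the gap.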

In Tables 5 and 6, we repeat the analysis done in Table 4 using the unemployment data for men only and women only. We note that the average number of unemployed weeks for women is shorter than that of men. This suggests that women have a higher chance on average of finding new jobs. Since women have a higher Gini coefficient with a lower $Q_1$ and higher $Q_5$ values, the burden of


Table 6

Quintile values, Gini coefficient, and average weeks of unemployment duration (unemployment of women)

Year   Q1      Q2      Q3      Q4      Q5      Gini    AVWEEKS
1958   0.0146  0.0609  0.1397  0.2647  0.5200  0.4998  10.01
1959   0.0132  0.0519  0.1293  0.2531  0.5525  0.5305   8.45
1960   0.0140  0.0557  0.1321  0.2577  0.5405  0.5183   9.14
1961   0.0112  0.0496  0.1415  0.2768  0.5209  0.5137   9.62
1962   0.0140  0.0585  0.1408  0.2592  0.5275  0.5075   9.07
1963   0.0143  0.0613  0.1449  0.2635  0.5161  0.4973   9.61
1964   0.0131  0.0552  0.1406  0.2610  0.5300  0.5129   8.86
1965   0.0118  0.0466  0.1295  0.2505  0.5616  0.5429   7.59
1966   0.0095  0.0354  0.1148  0.2506  0.5897  0.5739   7.17
1967   0.0098  0.0370  0.1184  0.2485  0.5863  0.5701   7.01
1968   0.0091  0.0326  0.1099  0.2458  0.6026  0.5854   6.84
1969   0.0093  0.0331  0.1100  0.2457  0.6019  0.5844   6.87
1970   0.0134  0.0561  0.1414  0.2553  0.5339  0.5145   8.39
1971   0.0147  0.0635  0.1480  0.2627  0.5112  0.4918   9.62
1972   0.0122  0.0518  0.1385  0.2665  0.5310  0.5173   9.08
1973   0.0120  0.0474  0.1288  0.2542  0.5577  0.5392   8.04
1974   0.0135  0.0552  0.1371  0.2534  0.5407  0.5196   8.35
1975   0.0149  0.0643  0.1471  0.2655  0.5083  0.4890  10.07
1976   0.0156  0.0677  0.1512  0.2591  0.5064  0.4842   9.58
1977   0.0142  0.0586  0.1383  0.2595  0.5295  0.5085   9.26
1978   0.0137  0.0575  0.1421  0.2552  0.5315  0.5114   8.51
1979   0.0138  0.0588  0.1452  0.2550  0.5272  0.5074   8.46
1980   0.0158  0.0683  0.1505  0.2610  0.5053  0.4828   9.79
1981   0.0182  0.0787  0.1570  0.2522  0.4939  0.4642   9.79
1982   0.0202  0.0843  0.1517  0.2559  0.4880  0.4541  11.14
1983   0.0189  0.0812  0.1548  0.2582  0.4870  0.4571  10.89
1984   0.0166  0.0711  0.1509  0.2571  0.5044  0.4791   9.77
1985   0.0164  0.0716  0.1531  0.2589  0.5000  0.4759   9.90
1986   0.0175  0.0766  0.1564  0.2563  0.4932  0.4662  10.04
1987   0.0164  0.0706  0.1512  0.2567  0.5051  0.4801   9.65
1988   0.0171  0.0735  0.1540  0.2496  0.5058  0.4775   9.08
1989   0.0179  0.0725  0.1446  0.2452  0.5198  0.4870   9.07
1990   0.0172  0.0741  0.1535  0.2534  0.5018  0.4742   9.58
1991   0.0187  0.0799  0.1571  0.2363  0.5081  0.4724   8.42
1992   0.0177  0.0768  0.1570  0.2422  0.5062  0.4745   8.74

women belonging to $Q_4$ and $Q_5$ will have real difficulty in finding new jobs while those women belonging to $Q_1$ and $Q_2$ suffer only shorter spells of unemployed weeks. In comparison, those men belonging to $Q_1$ and $Q_2$ will suffer longer spells of unemployed weeks relative to those of women because the average number of unemployed weeks of men is longer than that of women and the $Q_1$ and $Q_2$ values of men are larger than the $Q_1$ and $Q_2$ for


Table 7

Pearson and Spearman rank correlation coefficients

Year   GINI (G)   AVWEEKS (W)   UN RATE (U)

no association is rejected for all three cases because Prob($R > 0.478$) $= 0.005$ if $n = 30$.

In Table 7 we present the Pearson and Spearman rank correlation coefficients. In the second column, we report the rank of the Gini coefficient. The


Fig. 3. Average unemployed weeks vs. unemployment Gini coefficient.

unemployment is shared by many people and the distribution has less inequality relative to other years. In contrast, 1968 is a boom year with the smallest average unemployed weeks and possesses the largest Gini coefficient. Relatively few people suffered unemployment for a short duration, but the burden of unemployment is borne by those few people who were unemployed for a relatively long time. In the third and fourth columns, we report the ranks of average unemployment weeks and the unemployment rate. These two values move together so that they have smaller numbers (lower ranks) during the boom years and larger numbers (upper ranks) during the recession years. The Spearman rank correlation number between the Gini and Average Weeks is -0.8112, and thus we reject the null hypothesis of no association between the Gini and Average Weeks at the significance level of 0.005. Similarly, the Spearman rank correlation number between the Gini and Unemployment Rate is -0.7409 and between Average Weeks and Unemployment Rate is 0.7267.
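The rank correlation between the Gini coefficient and average weeks can be sketched as below, using the Gini and AVWEEKS columns of Table 4 for 1958–1994. This is an illustration only: the helper names are hypothetical, ties are given average ranks (an assumption about tie handling), and because Table 7 may be based on a different sample (its note refers to $n = 30$), the value obtained here need not reproduce the reported -0.8112.

```python
from math import sqrt

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Gini and AVWEEKS columns of Table 4, 1958-1994
gini = [0.4561, 0.4760, 0.4721, 0.4674, 0.4659, 0.4615, 0.4793, 0.5007,
        0.5319, 0.5279, 0.5503, 0.5391, 0.4814, 0.4661, 0.4835, 0.5000,
        0.4876, 0.4510, 0.4580, 0.4720, 0.4760, 0.4754, 0.4507, 0.4418,
        0.4285, 0.4337, 0.4479, 0.4463, 0.4398, 0.4442, 0.4517, 0.4519,
        0.4339, 0.4724, 0.4628, 0.4470, 0.4390]
avweeks = [11.07, 9.58, 9.85, 10.11, 9.82, 9.89, 9.28, 8.27,
           7.52, 7.39, 7.02, 7.21, 8.83, 9.91, 9.42, 8.49,
           8.90, 10.76, 10.45, 9.75, 9.16, 9.04, 10.51, 10.58,
           11.92, 11.78, 10.91, 10.60, 10.81, 10.54, 10.19, 9.93,
           10.11, 8.42, 8.98, 9.02, 9.00]

rho = spearman(gini, avweeks)
```

The sign of `rho` is negative, in line with the text: years with shorter average spells tend to have higher Gini coefficients.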

In Fig. 3, we plotted average unemployed weeks versus unemployment Gini coefficients. As indicated in Table 7, smaller average weeks are combined with larger Gini coefficients.

Table 8 shows the relationship between the change in the Gini coefficient and the change in the mean duration of unemployment and the head count ratio over time. Recalling Table 4, we see that from (say) 1965 to 1966 the Gini coefficient increased from 0.5007 to 0.5319 while the mean duration of unemployment fell from 8.27 to 7.52 weeks. In Table 8, this is reflected in a '+' in the second column and a '-' in the third column. If the Gini coefficient increases in


Table 8

Comovement of Gini, AV weeks, and unemployment rate

Year   GINI   AVWEEKS   UNEMPL
1958
1959   +      -         -
1960   -      +         +
1961   -      +         +
1962   -      -         -
1963   -      +         +
1964   +      -         -
1965   +      -         -
1966   +      -         -
1967   -      -         +
1968   +      -         -
1969   -      +         -
1970   -      +         +
1971   -      +         +
1972   +      -         -
1973   +      -         -
1974   -      +         +
1975   -      +         +
1976   +      -         -
1977   +      -         -
1978   +      -         -
1979   -      -         -
1980   -      +         +
1981   -      +         +
1982   -      +         +
1983   +      -         -
1984   +      -         -
1985   -      -         -
1986   -      +         -
1987   +      -         -
1988   +      -         -
1989   +      -         -
1990   -      +         +
1991   +      -         +
1992   -      +         +
1993   -      +         -
1994   -      -         -
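The GINI and AVWEEKS columns of Table 8 are simply the signs of the year-to-year changes in Table 4. The sketch below reproduces them for 1959–1967 as an illustration; the dictionary `table4` holds the Gini and AVWEEKS values copied from Table 4, and the names `sign`, `signs`, and `same_direction` are hypothetical. The unemployment rate column is omitted because its levels are not tabulated here.

```python
# (Gini, AVWEEKS) by year, copied from Table 4
table4 = {
    1958: (0.4561, 11.07), 1959: (0.4760, 9.58), 1960: (0.4721, 9.85),
    1961: (0.4674, 10.11), 1962: (0.4659, 9.82), 1963: (0.4615, 9.89),
    1964: (0.4793, 9.28), 1965: (0.5007, 8.27), 1966: (0.5319, 7.52),
    1967: (0.5279, 7.39),
}

def sign(delta):
    """Return '+' for an increase, '-' for a decrease, '0' for no change."""
    if delta > 0:
        return "+"
    if delta < 0:
        return "-"
    return "0"

signs = {}
for year in sorted(table4):
    if year - 1 in table4:
        gini_now, weeks_now = table4[year]
        gini_prev, weeks_prev = table4[year - 1]
        signs[year] = (sign(gini_now - gini_prev), sign(weeks_now - weeks_prev))

# Years in which the Gini and average weeks moved in the same direction
same_direction = [year for year, (g, w) in signs.items() if g == w]
```

Over this subsample, `same_direction` picks out 1962 and 1967, the first two of the exceptional year pairs listed in the text.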


Obviously, the 1958–1959 changes reflect a shorter mean unemployment duration but more hardship for a small number of the population. This group is likely the hardcore unemployed and might well deserve special emphasis in the implementation of training programs. This holds for 1958–1994 except for the years of 1961–1962, 1966–1967, 1978–1979, 1984–1985, and 1993–1994. A policymaker or government agency that focused only on the head count ratio or the mean duration of unemployment would have concluded that welfare had increased. The Gini coefficient sent a different message: a certain portion of the population was hit disproportionately hard. The point is that additional information from examining the Gini coefficient of unemployment duration can be relevant in targeting certain groups of the population for policy initiatives.

5. Conclusion

This paper has introduced some new methods to deal with the problems that arise in trying to analyze unemployment duration. It is well known that looking at the head count ratio of unemployment does not give an accurate representation of the distribution of unemployment. Unfortunately, unemployment data based on duration of unemployment suffers from several defects. The first one is that the so-called digit preference effect tends to distort the true distribution of unemployment since individuals tend to focus on particular periods of time when they were unemployed. Secondly, the coarse nature of the data makes it difficult to get a reasonable picture of the true distribution of the number of weeks of unemployment.

This paper has introduced a method, the exponential polynomial series expansion framework, to overcome both of these problems and presents a more appropriate way to analyze the density function of the duration of unemployment. In estimating a density function for the duration of unemployment, we considered two departures from what would be ideal conditions. If the so-called digit preference effect produced local distortion in observed samples, we could apply a maximum entropy density estimation method. To establish the functional form of the density, we maximized entropy subject to moment restrictions. The global shape of the density was determined by the lower ordered sample moments which were not affected much by the digit preference effect. As a by-product of this method, we could establish the local transition structure of the digit preference effect. As a second case of departure from an ideal condition, we considered coarse sample observations where unemployment duration was observed only for 4, 10, 14, 26, and 52 weeks. Once the unemployment duration density was derived, quintile behavior over time was examined, the Lorenz curve was derived and the Gini coefficient of unemployment


6. For further reading

Hall, 1981

Appendix A. Parameter estimation for maximum entropy density

Theorem A.1. The problem stated in (2.4) has a unique analytic solution. The first parameter $c_0$ is a normalization constant but the remaining parameters $c = \{c_1, \ldots, c_J\}'$ can be determined from the following relationships.

$$Bc = d \;\Rightarrow\; c = B^{-1}d \qquad (*)$$

where the $J \times J$ square matrix $B$ and the $J \times 1$ vector $d$ are defined as follows. If the domain of $x$ is finite, we can always transform it to $[0, 1]$,

$$B_{mn} \equiv -mn[\mu_{m+n} - \mu_{m+n-1}] \quad \text{and} \quad d_m \equiv [m(m+1)\mu_m - m^2\mu_{m-1}] \qquad (**)$$

where $m, n = 1, \ldots, J$. If $x \in (-\infty, +\infty)$ or $x \in [0, +\infty)$, we define

$$B_{mn} \equiv mn\,\mu_{m+n} \quad \text{and} \quad d_m = -m(m+1)\mu_m,$$

where $m, n = 1, \ldots, J$. Since $B$ is a positive-definite matrix, Eq. (*) determines $\{c_1, \ldots, c_J\}$ and $c_0$ can then be found by normalization of the pdf. Here, we assume knowledge of $\mu_0, \ldots, \mu_{2J}$. Since these values are moments of a certain distribution, these values are restricted to satisfy certain conditions. For example, $\mu_2, \mu_4, \ldots$ should be positive and there are relations connecting higher- and lower-order moments.

In Theorem A.1, we have assumed that the moments have known values. If we consider the effect of sampling error on parameter estimation that occurs because the known moment $\mu_m$ is replaced by its sample mean, $\hat{\mu}_m = (1/n)\sum_{i=1}^{n} x_i^m$, use the following theorem. The proof can be found in Ryu (1993).

Theorem A.2. Suppose a population density is given by a power series of degree $J$, $\log f_0(x) = \sum_{j=0}^{J} c_{0j} x^j$. Now we estimate the ME parameters from the sample information.

$$\hat{B}\hat{c} = \hat{d} \;\Rightarrow\; \hat{c} = \hat{B}^{-1}\hat{d},$$

where the $J \times J$ square matrix $\hat{B}$ and the $J \times 1$ vector $\hat{d}$ are defined as follows. For a compact domain $x \in [0, 1]$,

\hat{B}_{mn} \equiv -mn[\hat{\mu} ...

\hat{d} ...

which is the population mean of the asymptotic distribution of $\hat{c}$, and $x = \sqrt{T}(\hat{c} - c_0)$ has a multivariate normal distribution with mean zero and a $J \times J$ asymptotic covariance matrix $\Sigma = B^{-1} E[u_i u_i'] B^{-1}$.

References

Aroian, L., 1948. The fourth degree exponential distribution function. Annals of Mathematical Statistics 19, 589–592.

Barron, A.R., Sheu, C., 1991. Approximation of density functions by sequences of exponential families. The Annals of Statistics 19 (3), 1347–1369.

Cleveland, W.S., 1979. Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association 74, 829–836.

Collomb, G., 1984. Propriétés de convergence presque complète du prédicteur à noyau. Zeitschrift für W.V.G. 66, 441–460.

Devroye, L., 1987. A Course in Density Estimation. Birkhäuser, Boston.

Devroye, L., Györfi, L., 1985. Nonparametric Density Estimation: The L1 View. Wiley, New York.

Györfi, L., Härdle, W., Sarda, P., Vieu, P., 1989. Nonparametric Curve Estimation from Time Series, Lecture Notes in Statistics. Springer, Berlin.

Hall, P., 1981. On trigonometric series estimates of densities. The Annals of Statistics 9, 683–685.

Handbook of Labor Statistics, 1989. US Department of Labor Bureau of Labor Statistics.

Härdle, W., 1990. Applied Nonparametric Regression. Cambridge University Press, Cambridge.

Heitjan, D., 1989. Inference from grouped data: a review. Statistical Science 4, 164–183.

Heitjan, D., Rubin, D., 1990. Inferences from coarse data via multiple imputation: age heaping in a Third World nutrition study. Journal of the American Statistical Association 85, 304–314.

Heitjan, D., Rubin, D., 1991. Ignorability and coarse data. Annals of Statistics 19, 2244–2253.

Jeffreys, H., 1961. Theory of Probability. Oxford University Press, Oxford.

Kapur, J., Kesavan, H., 1992. Entropy Optimization Principles with Applications. Academic Press, New York.

Nadaraya, E., 1965. On non-parametric estimation of density functions and regression curves. Theory of Probability and its Applications 10, 186–190.

Pickering, R.M., 1992. Digit preference in estimated gestational age. Statistics in Medicine 11, 1225–1238.

Prakasa Rao, B., 1983. Nonparametric Functional Estimation. Academic Press, Orlando.

Ridout, M.S., Morgan, B.J., 1991. Modeling digit preference in fecundability studies. Biometrics 47, 1423–1433.

Ryu, H.K., 1990. Orthonormal basis and maximum entropy estimation of probability density and regression functions. Unpublished Ph.D. Dissertation, University of Chicago, Chicago, IL.

Ryu, H.K., 1993. Maximum entropy estimation of density and regression functions. Journal of Econometrics 56, 397–440.

Ryu, H.K., Slottje, D.J., 1998. Measuring Trends in US Income Inequality: Theory and Applications, Lecture Notes in Economics and Mathematical Systems, Vol. 459. Springer, Berlin.

Silverman, B., 1986. Density Estimation for Statistics and Data Analysis. Chapman & Hall, London.

Shorrocks, A.F., 1992. Spell incidence, spell duration and the measurement of unemployment. Unpublished mimeo, University of Essex.

Shorrocks, A.F., 1993. On the measurement of employment. Unpublished mimeo, University of Essex.

Torelli, N., Trivellato, R., 1993. Modeling inaccuracies in job-search duration data. Journal of Econometrics 59, 187–211.