
Mathematical Biosciences, Vol. 167, Issue 1 (September 2000)


Estimation of HIV infection and incubation via state space models

Wai-Yuan Tan*, Zhengzheng Ye

Department of Mathematical Sciences, The University of Memphis, 335 Winfield Dunn, Memphis, TN 38152, USA

Received 1 February 1999; received in revised form 28 August 1999; accepted 3 September 1999

Abstract

By using the state space model (Kalman filter model) of the HIV epidemic, in this paper we have developed a general Bayesian procedure to estimate simultaneously the HIV infection distribution, the HIV incubation distribution, the numbers of susceptible people, infective people and AIDS cases. The basic approach is to use the Gibbs sampling method combined with the weighted bootstrap method. We have applied this method to the San Francisco AIDS incidence data from January 1981 to December 1992. The results show clearly that both the probability density function of the HIV infection and the probability density function of the HIV incubation are curves with two peaks. The results of the HIV infection distribution are clearly consistent with the finding by Tan et al. [W.Y. Tan, S.C. Tang, S.R. Lee, Estimation of HIV seroconversion and effects of age in San Francisco homosexual populations, J. Appl. Stat. 25 (1998) 85]. The results of HIV incubation distribution seem to confirm the staged model used by Satten and Longini [G. Satten, I. Longini, Markov chain with measurement error: estimating the 'true' course of marker of the progression of human immunodeficiency virus disease, Appl. Stat. 45 (1996) 275]. © 2000 Elsevier Science Inc. All rights reserved.

Keywords: Backcalculation method; Chain binomial distribution; Gibbs sampler; HIV infection distribution; HIV incubation distribution; Observation model; Prior distribution; Stochastic system model

*Corresponding author. Tel.: +1-901 678 2492; fax: +1-901 678 2480.

E-mail address: waitan@memphis.edu (W.-Y. Tan).

0025-5564/00/$ - see front matter © 2000 Elsevier Science Inc. All rights reserved.

1. Introduction

To estimate the numbers of susceptible people (S people), HIV-infected people (I people) and AIDS cases, Tan and Xiang [3,4] have proposed some state space models in homosexual populations. In these models, the stochastic system models are the chain multinomial and binomial distributions expressed in terms of stochastic equations, whereas the observation model is a statistical model based on AIDS incidence data. A major problem in the classical Kalman filter method is that one needs to assume that the parameters are known in order to derive optimal estimates or predictions of the state variables. Hence, in the HIV epidemic, to derive estimates of the numbers of S people, I people and AIDS cases, one needs to assume that the probabilities of infection of S people by HIV (denoted by $p_S(t)$) and the transition rates to AIDS of I people (denoted by $\gamma(t)$) are known or can be estimated from other sources. Thus, in Tan and Xiang [3,4], the $p_S(t)$ were estimated from studies by Tan et al. [1] based on the San Francisco City Clinic Cohort (SFCCC) data set, whereas the estimates of $\gamma(t)$ were derived from studies by Satten and Longini [2] based on the San Francisco Men's Health Study (SFMHS) data set. In this paper, we use the state space model to develop a general Bayesian procedure to estimate simultaneously the HIV infection distribution, the HIV incubation distribution and the numbers of S people, I people and AIDS cases over the time span. The advantages of the new method over the approach in Tan and Xiang [3,4] are: (1) One does not need to assume $p_S(t)$ and $\gamma(t)$ as known, although if some information on these parameters is available from previous studies, it can always be incorporated into the analysis through the Bayesian component of the method. Thus, the method is applicable even when there is no prior information about $p_S(t)$ and $\gamma(t)$ or no data sets from previous studies to estimate these parameters [5]. (2) The method permits us to combine information from three sources: (a) information from the stochastic system model; (b) information from the data set through the observation model; (c) information on $p_S(t)$ and $\gamma(t)$ from previous studies through the prior distribution of these parameters.
Notice that if there is no prior information on the parameters due to a lack of previous studies, to implement the method one may always assume a non-informative (uniform) prior, which reflects the situation that our prior information about the parameters is lacking or vague and imprecise.

In Sections 2 and 3, we will introduce the chain binomial model and illustrate how to develop a state space model for a large population at risk for the HIV epidemic. By using this state space model, in Sections 4 and 5 we will propose a general procedure for estimating simultaneously the HIV infection distribution, the HIV incubation distribution and the numbers of S people, I people and AIDS cases. In Section 6, we will apply the method to the San Francisco homosexual population to estimate these distributions as well as the numbers of S people, I people and AIDS cases. Finally, in Section 7, we will draw some conclusions and discuss some issues relevant to the model and the method.

2. The chain binomial model of the HIV epidemic


this person develops AIDS symptoms and/or when his/her CD4+ T-cell counts fall below 200/mm³. In this section, we will illustrate how to develop a discrete-time stochastic model for the HIV epidemic with variable infection duration in this city. (With no loss of generality, we will let a month be the time unit unless otherwise stated.)

To begin with, letS…t† denote the number of S people at timet,Z…t† the number of new AIDS cases during the month‰tÿ1;t† andI…u;t†the number of I people who have contracted HIV at timetÿu…tPu†. (We referu as the infection duration of I people and denote byI…u† infective people with infection duration in‰u;u‡1†.) Suppose that at timet0 ˆ0, a few HIV were intro-duced into the population to start the HIV epidemic so that with probability one,I…u;t† ˆ0 if

uPtP0. When time is discrete, we are then entertaining a multi-dimensional discrete time

stochastic process X…t† ˆ fS…t†;I…u;t†; uˆ0;1;. . .;tg and Z…t†. For this stochastic process, let

pS…t† be the probability that a S person will contract HIV to become an I…0† person during ‰t;t‡1†andc…u;t† the probability that anI…u†person will develop AIDS symptoms to become a clinical AIDS patient during‰t;t‡1†. Further, we make the following assumptions:

1. As shown in [7], the $p_S(t)$ are functions of the dynamics of the HIV epidemic and of the state variables, and hence are basically stochastic probabilities. However, through Monte Carlo studies, Tan and Byers [7] and Tan et al. [8] have shown that one may in practice ignore the randomness in $p_S(t)$. That is, one may derive $p_S(t)$ by replacing the random state variables by the corresponding expected numbers; see Remark 1. Thus, in this paper we will assume that $p_S(t)$ and $\gamma(u,t)$ are deterministic functions of time $t$. As in the literature, we further assume that $\gamma(u,t) = \gamma(u)$; see [9,10].

2. Due to AIDS awareness, one may assume that there is no immigration or recruitment of AIDS cases.

3. As the total population size changes very little over time aside from deaths from AIDS, for S people and I people one may assume that the numbers entering through immigration and recruitment are almost equal to the numbers leaving through death and emigration [11,12]. This is equivalent to assuming that the immigration and recruitment rate $\lambda$ equals the death and emigration rate $\mu$ of people in the population. Notice that for the San Francisco homosexual population, Hethcote and Van Ark [11] have shown that the number of immigrants per year is almost identical to the number of emigrants per year (about 5% annually); see also [13]. Further, based on census data [14], they have estimated the death rate for people between ages 24 and 54 as 0.000532 per month. Thus, for this population one may expect that this assumption would not significantly affect the estimates of the HIV infection distribution and the HIV incubation distribution; see Remark 2.

4. As in the literature, we assume that there are no reverse transitions from I to S and from A to I; see [10,11].


dynamics of the HIV epidemic to construct $p_S(t)$, while the latter approach tries to estimate $p_S(t)$ while avoiding the dynamics of the HIV epidemic. It is to be understood that, regardless of the approach, the $p_S(t)$ are functions of the dynamics, such as the mixing pattern of the epidemic, except that in the latter approach the functional form is not explicitly given in terms of the dynamics. By assuming a preferred mixing pattern, one may express $p_S(t)$ as a mixture of two functions, one relating to the restricted mixing pattern and the other to the proportional mixing pattern. Then, with estimates of $p_S(t)$ available, one may derive estimates of the proportion of restricted mixing as well as other parameters such as the per-contact probability of transmission. This is the approach used by Tan and Xiang [3] to estimate the per-contact probability of HIV transmission from I people to S people.

Remark 2. Our Monte Carlo studies have indicated that in general this assumption has little impact on the estimates of the HIV infection distribution and the HIV incubation distribution. For the San Francisco homosexual population, the estimates of the HIV infection and HIV incubation distributions in Section 6 are almost identical to those given in Tan and Xiang [3,4] using other data sets. It is to be noted, however, that this assumption does have some impact on the estimates of the numbers of S people and I people. In applying our theory to the San Francisco data, we have therefore made an adjustment by incorporating a 1% annual increase (i.e., $\lambda - \mu = 0.01$ per year) in estimating the numbers of S people and I people. Notice that with 50,000 people in January 1970 and a 1% annual increase, the estimated size of the San Francisco homosexual population in 1985 is $50{,}000 \times 1.01^{15} \approx 58{,}048$, which is very close to the survey result of 58,500 in 1985 by Lemp et al. [16].
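The population figure in Remark 2 is easy to check. The short Python sketch below (the helper name `projected_size` is hypothetical) compounds a 1% increase from 50,000 people in January 1970 over the 15 years to January 1985: compounding once a year reproduces 58,048, whereas compounding 1% every month over the same 180 months would give roughly 300,000.

```python
def projected_size(initial: float, rate: float, periods: int) -> float:
    """Hypothetical helper: size after `periods` compounding steps at `rate`."""
    return initial * (1.0 + rate) ** periods

# 1% per year for the 15 years from January 1970 to January 1985
annual = projected_size(50_000, 0.01, 15)
# 1% per month over the same 15 years (180 months), for comparison
monthly = projected_size(50_000, 0.01, 180)

print(round(annual))   # 58048
print(round(monthly))
```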

Given the above assumptions, one may readily derive some basic results for the process $X(t) = \{S(t), I(u,t),\ u = 0, 1, \ldots, t\}$ and $Z(t)$. In Section 2.1, we will derive some stochastic equations for $S(t)$ and $I(u,t)$, $u = 0, 1, \ldots, t$. In Section 2.2, we will derive the probability distributions of $X(t)$.

2.1. Stochastic equations for $S(t)$, $I(u,t)$, $u = 0, 1, \ldots, t$, and $Z(t)$

Let $F_S(t)$ denote the number of S people who contract HIV to become $I(0)$ people during $[t, t+1)$, and $F_I(u,t)$ the number of $I(u)$ people who develop AIDS symptoms to become clinical AIDS patients during $[t, t+1)$. Then the conditional distribution of $F_S(t)$ given $S(t)$ is binomial with parameters $\{S(t), p_S(t)\}$, i.e., $F_S(t) \mid S(t) \sim B\{S(t), p_S(t)\}$. Similarly, $F_I(u,t) \mid I(u,t) \sim B\{I(u,t), \gamma(u)\}$. Further, under assumptions (1)–(4) given above, we have:

$$S(t+1) = S(t) - F_S(t), \tag{1}$$

$$I(0,t+1) = F_S(t), \tag{2}$$

$$I(u+1,t+1) = I(u,t) - F_I(u,t), \quad u = 0, \ldots, t, \tag{3}$$

$$Z(t+1) = \sum_{u=0}^{t} F_I(u,t). \tag{4}$$

Let

$$e(t+1) = [e_S(t+1), e_I(0,t+1), e_I(u+1,t+1),\ u = 0, 1, \ldots, t,\ e_Z(t+1)]^T$$

denote the vector of random noises, i.e., the deviations from the respective conditional mean numbers. From the above distribution results, one may readily derive the conditional means of the random variables in the above equations. Then, by subtracting these conditional means from the respective random variables in the above equations, and noting that the time unit of one month is very small, we obtain

$$
\begin{aligned}
e_S(t+1) &= -[F_S(t) - S(t)p_S(t)],\\
e_I(0,t+1) &= F_S(t) - S(t)p_S(t),\\
e_I(u+1,t+1) &= -[F_I(u,t) - I(u,t)\gamma(u)], \quad u = 0, 1, \ldots, t,\\
e_Z(t+1) &= \sum_{u=0}^{t} [F_I(u,t) - I(u,t)\gamma(u)].
\end{aligned}
$$

Then Eqs. (1)–(4) are equivalent to the following stochastic difference equations:

$$S(t+1) = S(t) - S(t)p_S(t) + e_S(t+1), \tag{5}$$

$$I(0,t+1) = S(t)p_S(t) + e_I(0,t+1), \tag{6}$$

$$I(u+1,t+1) = I(u,t) - I(u,t)\gamma(u) + e_I(u+1,t+1), \quad u = 0, 1, \ldots, t, \tag{7}$$

$$Z(t+1) = \sum_{u=0}^{t} I(u,t)\gamma(u) + e_Z(t+1). \tag{8}$$

In Eqs. (5)–(8), given $X(t)$ the random noises $e(t+1)$ have expectation 0. It follows that the unconditional expected value of these random noises is also 0. Using the basic formula $\operatorname{Cov}(U,V) = E[\operatorname{Cov}(U, V \mid W)] + \operatorname{Cov}[E(U \mid W), E(V \mid W)]$, it is also clear that elements of $e(t+1)$ are uncorrelated with elements of $X(t)$, and that elements of $e(t)$ are uncorrelated with elements of $e(s)$ for all $t \ne s$. Further, because the random noises are basically linear combinations of binomial random variables, the variances and covariances of elements of $e(t)$ are easily derived.
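As a worked illustration of the last sentence, the conditional second moments follow directly from the binomial variance. This version assumes, as the chain binomial model implies, that $F_S(t)$ and the $F_I(u,t)$, $u = 0, \ldots, t$, are conditionally independent given $X(t)$, and writes $e_S(t+1)$ for the noise term of Eq. (5):

$$
\begin{aligned}
\operatorname{Var}\{e_S(t+1) \mid X(t)\} &= \operatorname{Var}\{e_I(0,t+1) \mid X(t)\} = S(t)\,p_S(t)\,[1-p_S(t)],\\
\operatorname{Cov}\{e_S(t+1),\, e_I(0,t+1) \mid X(t)\} &= -S(t)\,p_S(t)\,[1-p_S(t)],\\
\operatorname{Var}\{e_I(u+1,t+1) \mid X(t)\} &= I(u,t)\,\gamma(u)\,[1-\gamma(u)],\\
\operatorname{Var}\{e_Z(t+1) \mid X(t)\} &= \sum_{u=0}^{t} I(u,t)\,\gamma(u)\,[1-\gamma(u)],\\
\operatorname{Cov}\{e_I(u+1,t+1),\, e_Z(t+1) \mid X(t)\} &= -I(u,t)\,\gamma(u)\,[1-\gamma(u)].
\end{aligned}
$$

Because the conditional means are zero, the unconditional variances are the expectations of these expressions; for example, $\operatorname{Var}\{e_S(t+1)\} = E[S(t)]\,p_S(t)\,[1-p_S(t)]$, since $p_S(t)$ is deterministic under assumption 1.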

2.2. The probability distributions of $X(t)$

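Let $X = \{X(1), \ldots, X(t_M)\}$, where $t_M$ is the last time point, and $\Theta = \{p_S(t), \gamma(t),\ t = 1, \ldots, t_M\}$. Then $X$ is the collection of all the state variables and $\Theta$ the collection of all the parameters. Using results in Section 2.1, the conditional probability distribution $\Pr\{X \mid X(0)\}$ of $X$ given $X(0)$ is

$$\Pr\{X \mid X(0)\} = \prod_{j=0}^{t_M-1} \Pr\{X(j+1) \mid X(j), \Theta\}, \tag{9}$$

where, by Eqs. (1)–(3) and the binomial distributions of $F_S(t)$ and $F_I(u,t)$,

$$\Pr\{X(t+1) \mid X(t), \Theta\} = \binom{S(t)}{I(0,t+1)} p_S(t)^{I(0,t+1)}\,[1-p_S(t)]^{S(t+1)} \prod_{u=0}^{t} \binom{I(u,t)}{I(u,t)-I(u+1,t+1)} \gamma(u)^{I(u,t)-I(u+1,t+1)}\,[1-\gamma(u)]^{I(u+1,t+1)}. \tag{10}$$

Notice that Eq. (10) is a product of binomial distributions, so that the above distribution is referred to as a chain binomial distribution.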
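The chain binomial probability in Eqs. (9) and (10) is usually evaluated on the log scale. The Python sketch below computes $\log \Pr\{X(t+1) \mid X(t), \Theta\}$ for one month from the binomial laws of $F_S(t)$ and $F_I(u,t)$; the function names and the list layout of $I(u,t)$ are assumptions made for illustration, and it requires $0 < p_S(t) < 1$ and $0 < \gamma(u) < 1$.

```python
from math import inf, lgamma, log

def log_binom_pmf(k: int, n: int, p: float) -> float:
    """log of C(n, k) * p**k * (1 - p)**(n - k), for 0 < p < 1."""
    if not 0 <= k <= n:
        return -inf
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1.0 - p))

def log_transition_prob(S_t, I_t, S_next, I_next, p_S, gamma):
    """log Pr{X(t+1) | X(t), Theta} under the chain binomial model.

    I_t[u]    = I(u, t),   u = 0, ..., t
    I_next[u] = I(u, t+1), u = 0, ..., t+1
    gamma[u]  = probability that an I(u) person develops AIDS in [t, t+1)
    """
    # Eqs. (1)-(2): F_S(t) = I(0, t+1) must also equal S(t) - S(t+1)
    if S_next != S_t - I_next[0]:
        return -inf
    # F_S(t) | S(t) ~ B{S(t), p_S(t)}
    total = log_binom_pmf(I_next[0], S_t, p_S)
    # Eq. (3): F_I(u, t) = I(u, t) - I(u+1, t+1) ~ B{I(u, t), gamma(u)}
    for u, I_ut in enumerate(I_t):
        total += log_binom_pmf(I_ut - I_next[u + 1], I_ut, gamma[u])
    return total
```

Summing this quantity over $t = 0, \ldots, t_M - 1$ gives the logarithm of Eq. (9).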

2.3. The mean numbers of $X(t)$

function of HIV incubation. Then, from Eqs. (5)–(8), we have

where $g(u) = R_I(u-1)\gamma(u)$ is the probability density function of HIV incubation. Notice that Eq. (11) is a convolution of the HIV infection distribution and the HIV incubation distribution.

3. The state space model of the HIV epidemic

State space models (Kalman filter models) are stochastic models consisting of two sub-models: one, referred to as the stochastic system model, is the stochastic model of the system; the other, referred to as the observation model, is a statistical model based on some data from the system. For the model of the HIV epidemic in Section 2, the stochastic system model of the state space model is the set of stochastic difference equations given in Eqs. (5)–(8); the observation model of the state space model is a statistical model based on some available AIDS incidence data.

For the observation model, let $Y(j)$ be the observed number of new AIDS cases during the month $[j, j+1)$. Then the observation model is given by the equation

$$Y(j) = Z(j+1) + e(j) = EZ(j+1) + \epsilon(j) + e(j), \tag{12}$$

where $\epsilon(j) = Z(j+1) - EZ(j+1)$ and $e(j)$ is the measurement error (the reporting error for AIDS incidence) in observing $Y(j)$. Because reporting delay has been corrected for in the CDC surveillance data, one may assume that the $e(j)$ are independently normally distributed with mean zero and conditional variance, given $Z(j+1)$, of $\sigma_j^2 = Z(j+1)\sigma^2$; for the justification of such a variance, see [4]. Thus, the conditional probability density of the observation $Y = \{Y(1), \ldots, Y(t_M)\}$

incubation, which is the basic formula used by the backcalculation method (see [9,10]). It follows that the backcalculation method as given in [9,10] is a special case of the above observation model.
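Under the normal measurement-error assumption above, the log-density of the observations given the counts $Z(j+1)$ can be sketched in Python as follows. This only illustrates that assumption and is not a transcription of the paper's own likelihood formula; the function name and argument layout are hypothetical, and each $Z(j+1)$ must be positive for the variance $Z(j+1)\sigma^2$ to be defined.

```python
from math import log, pi

def obs_log_likelihood(Y, Z_next, sigma2):
    """Log-density of Y given Z, assuming Y(j) = Z(j+1) + e(j) with
    independent e(j) ~ N(0, Z(j+1) * sigma2).

    Y[j] is the observed count for month j; Z_next[j] holds Z(j+1) > 0.
    """
    total = 0.0
    for y, z in zip(Y, Z_next):
        var = z * sigma2                     # sigma_j^2 = Z(j+1) * sigma^2
        total += -0.5 * (log(2.0 * pi * var) + (y - z) ** 2 / var)
    return total
```

Evaluated for a single month, this term can serve as the weight of a simulated candidate in the weighted bootstrap of Section 4.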

To present the state space model in matrix form, let

where $F(t+1,t)$ is given by

4. A general procedure for simultaneously estimating the state variables and the unknown parameters

Consider a state space model with the stochastic system model given by (15) and the observation model given by (16). random noises and $e(j+1)$ are the random measurement errors.

Let $\Theta$ denote the unknown parameters in $F(t+1,j)$, $H(j+1)$ and in the probability distributions of the random noises and of the measurement errors $e(j+1)$. Let $P(X \mid \Theta)$ be the probability density function of $X$ given $\Theta$ and $X(0)$, derived from the stochastic system model, and $P(Y \mid X, \Theta)$ the probability density function of $Y$ given $\{X, \Theta\}$, derived from the observation model. ($P(Y \mid X, \Theta)$ is usually referred to as the likelihood function of the parameters.) Let $P(\Theta)$ be the prior distribution of $\Theta$ derived from previous studies or from prior knowledge about $\Theta$. (If there is no prior information, or the prior information is vague and imprecise, one usually assumes a non-informative or uniform prior; see [17].) Based on the type of probability distributions being used, the standard inference procedures in the literature may be classified as:

1. The Sampling Theory Inference: Given $X$, inference about $\Theta$ is derived only from the likelihood function $P(Y \mid X, \Theta)$. For example, the backcalculation method in the HIV epidemic derives estimates of $\Theta = \{p_S(t),\ t = 1, \ldots, t_M\}$ by maximizing $P(Y \mid X, \Theta)$, see [9,10]; these are the maximum likelihood estimators (MLE) of $\Theta$.

2. The Bayesian Inference: Given $X$, the Bayesian inference about $\Theta$ is derived from the posterior distribution of $\Theta$, which is proportional to the product of $P(\Theta)$ and $P(Y \mid X, \Theta)$. For example, one may use the posterior mean $E\{\Theta \mid X, Y\}$ of $\Theta = \{p_S(t), \gamma(t),\ t = 1, \ldots, t_M\}$ given $\{X, Y\}$, or the posterior mode of $\Theta$ given $\{X, Y\}$, as an estimate of $\Theta$. These are the empirical Bayesian estimates of $\Theta$; see [18,19].


published to date; see, for example, [20–22]. For example, by using estimates of $p_S(t)$ and $\gamma(t)$ from other sources, Tan and Xiang [3,4] have estimated the numbers of S people, I people and AIDS cases in the homosexual population.

In the above, notice that in the sampling theory inference, the prior information about $\Theta$ and the information about $X$ from the stochastic system model are completely ignored; in the Bayesian inference, the information from the stochastic system model is ignored. In the classical Kalman filter theory, the parameters $\Theta$ are assumed known. Thus, in each of these cases, some information is lost or ignored. In this section, we develop a general procedure to estimate simultaneously the unknown parameters and the state variables by using the multi-level Gibbs sampler method [23–25]. We call this method a general Bayesian method because it not only combines information from the likelihood and the prior distribution but also incorporates information from the stochastic system model. To proceed, note first that the joint probability density function of $(\Theta, X, Y)$ is

$$P(\Theta, X, Y) = P(\Theta)\,P(X \mid \Theta)\,P(Y \mid X, \Theta). \tag{17}$$

Thus the conditional distribution of $X$ given $(Y, \Theta)$ is

$$P(X \mid Y, \Theta) \propto P(X \mid \Theta)\,P(Y \mid X, \Theta), \tag{18}$$

and the conditional distribution of $\Theta$ given $(Y, X)$ is

$$P(\Theta \mid Y, X) \propto P(\Theta)\,P(X \mid \Theta)\,P(Y \mid X, \Theta). \tag{19}$$

The multi-level Gibbs sampler method is a Monte Carlo method to estimate $P(X \mid Y)$ (the conditional distribution of $X$ given $Y$) and $P(\Theta \mid Y)$ (the posterior distribution of $\Theta$ given $Y$) through a sequential procedure. The algorithm of this method iterates through the following loop:

(1) Given $\Theta^{(*)}$ and $Y$, generate $X^{(*)}$ from $P(X \mid Y, \Theta^{(*)})$.

(2) Generate $\Theta^{(*)}$ from $P(\Theta \mid Y, X^{(*)})$, where $X^{(*)}$ is the value obtained in (1).

(3) Using $\Theta^{(*)}$ obtained from (2) as initial values, go back to (1) and repeat the (1)–(2) loop until convergence.
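The loop (1)–(3) is an ordinary two-block Gibbs sampler. The Python sketch below shows only that control flow: `draw_X` and `draw_theta` are hypothetical user-supplied samplers for $P(X \mid Y, \Theta)$ and $P(\Theta \mid Y, X)$ with the data $Y$ already bound in, and the fixed burn-in length is an assumption, since the text only says to iterate until convergence. The toy target at the end (a standard bivariate normal) is used only because its full conditionals are known exactly; it is not the HIV model.

```python
import random
from statistics import fmean

def gibbs(draw_X, draw_theta, theta0, n_iter, burn_in=0):
    """Two-block Gibbs loop following steps (1)-(3).

    draw_X(theta) samples X from P(X | Y, theta); draw_theta(X) samples theta
    from P(theta | Y, X). Both are hypothetical user-supplied callables.
    Returns the (X, theta) pairs kept after the first `burn_in` iterations.
    """
    theta = theta0
    kept = []
    for it in range(n_iter):
        X = draw_X(theta)          # step (1)
        theta = draw_theta(X)      # step (2): new theta given the X just drawn
        if it >= burn_in:          # step (3): loop again; keep post-burn-in draws
            kept.append((X, theta))
    return kept

# Toy target (not the HIV model): standard bivariate normal with correlation rho,
# whose full conditionals are N(rho * other, 1 - rho**2).
rho = 0.8
sd = (1.0 - rho ** 2) ** 0.5
random.seed(1)
draws = gibbs(lambda th: random.gauss(rho * th, sd),
              lambda x: random.gauss(rho * x, sd),
              theta0=5.0, n_iter=20_000, burn_in=500)
print(fmean(th for _, th in draws))   # should be close to the true mean 0
```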

Since in practice it is often very difficult to derive $P(X \mid Y, \Theta)$, whereas it is easy to generate $X$ from $P(X \mid \Theta)$, we will apply the weighted bootstrap method due to Smith and Gelfand [26] to generate $X$ from $P(X \mid Y, \Theta)$. The algorithm of the weighted bootstrap method is given by the following steps (for proof, see [26]):

(a) Given $\Theta^{(*)}$ and $X(j)$, generate a large random sample of size $N$ for $X(j+1)$ by using $P\{X(j+1) \mid X(j)\}$; denote it by $\{X^{(1)}(j+1), \ldots, X^{(N)}(j+1)\}$.

(b) Compute $w_k$ and $q_k$ from $P(Y(j+1) \mid X(s),\ s = 0, 1, \ldots, j+1;\ \Theta^{(*)})$, $k = 1, \ldots, N$, where $w_k = P(Y(j+1) \mid X^{(k)}(s),\ s = 0, 1, \ldots, j+1;\ \Theta^{(*)})$ and $q_k = w_k / \sum_{i=1}^{N} w_i$.

(c) Construct a population $P$ with elements $\{E_1, \ldots, E_N\}$ and with $P(E_k) = q_k$. (Note that $\sum_{i=1}^{N} q_i = 1$.) Draw an element randomly from $P$. If the outcome is $E_k$, then $X^{(k)}(j+1)$ is an element generated from the conditional distribution of $X$ given the observed data and the parameter values.

Starting with $j = 0$ and continuing until $j = t_M$, by combining the above two iterative
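Steps (a)–(c) of the weighted bootstrap amount to one sampling-importance-resampling draw. In the Python sketch below, the candidates are assumed to have been generated already from $P\{X(j+1) \mid X(j)\}$, and their likelihoods $w_k$ are passed on the log scale; subtracting the largest log-weight before exponentiating avoids numerical underflow and leaves every $q_k$ unchanged. The function name is hypothetical.

```python
import random
from math import exp

def weighted_bootstrap_draw(candidates, log_weights, rng=random):
    """One weighted-bootstrap draw (steps (a)-(c)).

    candidates:  X^(1)(j+1), ..., X^(N)(j+1), already generated from P{X(j+1) | X(j)}
    log_weights: log w_k = log P(Y(j+1) | X^(k), theta); at least one must be finite
    """
    m = max(log_weights)
    w = [exp(lw - m) for lw in log_weights]   # rescaled w_k; the q_k are unchanged
    total = sum(w)
    q = [wk / total for wk in w]              # q_k = w_k / sum_i w_i
    return rng.choices(candidates, weights=q, k=1)[0]   # draw E_k with prob q_k
```

Repeating the draw independently gives a sample from the conditional distribution of $X(j+1)$ given the data, as described in step (c).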

5. A general procedure for simultaneously estimating the HIV infection, the HIV incubation and the numbers of S people, I people and AIDS cases

In this section, we apply the general theory of Section 4 to develop a general procedure to estimate simultaneously the HIV infection distribution, the HIV incubation distribution as well as the numbers of S people, I people and AIDS cases in the model given in Section 3. For this model, $P(X \mid X(0), \Theta)$ is given by Eq. (9) and $P(Y \mid X, \Theta)$ is given by Eq. (14). We will estimate the prior distribution $P(\Theta)$ by using results from previous studies. (This is the empirical Bayesian approach.)

5.1. The prior distribution $P(\Theta)$

Since the HIV incubation is usually not affected by HIV infection [9,10], we assume that a prior $\theta_1$

Similarly, a natural conjugate prior for $\theta_2$ is

the prior distributions can be estimated from previous studies. For example, suppose that a prior study with sample size $n$ has been conducted to estimate the HIV infection distribution $f_I(t)$. Let


non-informative uniform prior. This is equivalent to no prior information; in this case the results of the Bayesian approach are numerically equivalent to the results from the sampling theory approach, although the two approaches are very different conceptually.

5.2. Generating $X$ from the conditional density $P(X \mid \Theta, Y)$

To use the weighted bootstrap method as described in Section 4, we need to generate $X$ from $P(X \mid \Theta)$. This can be achieved by using the stochastic Eqs. (1)–(4) given in Section 2. Thus, given $X(t) = \{S(t), I(u,t),\ u = 0, 1, \ldots, t\}$ and given the parameter values, we use the binomial generator to generate $F_S(t)$ and $F_I(u,t)$ through the conditional binomial distributions $F_S(t) \mid S(t) \sim B\{S(t), p_S(t)\}$ and $F_I(u,t) \mid I(u,t) \sim B\{I(u,t), \gamma(u)\}$. These lead to $S(t+1) = S(t) - F_S(t)$, $I(0,t+1) = F_S(t)$, $I(u+1,t+1) = I(u,t) - F_I(u,t)$, $u = 0, 1, \ldots, t$, and $Z(t+1) = \sum_{u=0}^{t} F_I(u,t)$. The binomial generator is readily available from the IMSL subroutines [28] or other software packages such as SAS. With the generation of $X$ from $P(X \mid \Theta)$, one may then apply the weighted bootstrap method to generate $X$ from $P(X \mid Y, \Theta)$.
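A minimal Python sketch of generating $X(t+1)$ from $X(t)$ with Eqs. (1)–(4) follows. The naive `binomial` helper (a sum of Bernoulli trials) stands in for the IMSL or SAS binomial generator mentioned above and is adequate only for illustration; the function names and the list layout of $I(u,t)$ are assumptions.

```python
import random

def binomial(n, p, rng=random):
    """Naive B(n, p) draw by summing n Bernoulli trials (fine for a sketch)."""
    return sum(rng.random() < p for _ in range(n))

def step(S_t, I_t, p_S, gamma, rng=random):
    """Generate X(t+1) and Z(t+1) from X(t) using Eqs. (1)-(4).

    I_t[u] = I(u, t), u = 0..t; gamma[u] = AIDS transition probability at duration u.
    Returns (S(t+1), [I(0,t+1), ..., I(t+1,t+1)], Z(t+1)).
    """
    F_S = binomial(S_t, p_S, rng)                     # F_S(t) | S(t) ~ B{S(t), p_S(t)}
    F_I = [binomial(I_ut, gamma[u], rng)              # F_I(u,t) | I(u,t) ~ B{I(u,t), gamma(u)}
           for u, I_ut in enumerate(I_t)]
    S_next = S_t - F_S                                # Eq. (1)
    I_next = [F_S] + [I_ut - f for I_ut, f in zip(I_t, F_I)]   # Eqs. (2)-(3)
    Z_next = sum(F_I)                                 # Eq. (4)
    return S_next, I_next, Z_next
```

Calling `step` repeatedly from $X(0)$ produces one realization of $X$ from $P(X \mid \Theta)$; in the weighted bootstrap, many such one-month draws form the candidate set of step (a).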

5.3. Generating $\Theta$ from the conditional density $P(\Theta \mid X, Y)$

Using Eqs. (1)–(4) given in Section 2 and the prior distribution from Section 5.1, we obtain

$P(\Theta \mid X, Y) \propto \prod$

The above equation shows that the conditional distribution of $p_S(t)$ given $X$ and $Y$ is a $\beta$-distribution with parameters $\{I(0,t+1) + a_1(t),\ S(t+1) + a_2(t)\}$. Similarly, the conditional distribution of $\gamma(u)$ given $X$ and $Y$ is a $\beta$-distribution with parameters $\{c_1(u) + b_1(u),\ c_2(u) + b_2(u)\}$. Since the mean of a large sample generated from a $\beta$-distribution is numerically almost identical to the mean of that $\beta$-distribution, the estimates of $p_S(t)$ and $\gamma(u)$ are then given by the posterior means

$$\hat{p}_S(t) = \frac{I(0,t+1) + a_1(t)}{I(0,t+1) + a_1(t) + S(t+1) + a_2(t)}, \qquad \hat{\gamma}(u) = \frac{c_1(u) + b_1(u)}{c_1(u) + b_1(u) + c_2(u) + b_2(u)}.$$

We will use these estimates as the generated sample means.
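The replacement of a large-sample mean from a $\beta$-distribution by its exact mean is easy to check numerically. In the Python sketch below, the counts and the uniform prior $(1, 1)$ are hypothetical values chosen for illustration, and `random.betavariate` is the standard-library beta generator.

```python
import random
from statistics import fmean

def beta_posterior_mean(successes, failures, prior_a, prior_b):
    """Mean of Beta(successes + prior_a, failures + prior_b).

    For p_S(t): successes = I(0, t+1), failures = S(t+1),
    prior_a = a_1(t), prior_b = a_2(t).
    """
    a, b = successes + prior_a, failures + prior_b
    return a / (a + b)

# Hypothetical month: 3 new infections, 97 still susceptible, uniform prior (1, 1)
exact = beta_posterior_mean(3, 97, 1.0, 1.0)          # 4 / 102

random.seed(2)
draws = [random.betavariate(3 + 1.0, 97 + 1.0) for _ in range(20_000)]
print(exact, fmean(draws))   # the two values should be close
```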

Using the above approach, we can readily estimate simultaneously the numbers of S people, I people and AIDS cases as well as the parameters $\{p_S(t), \gamma(t)\}$. With the estimates of $\{p_S(t), \gamma(t)\}$, one may readily estimate the HIV infection distribution $f_I(t)$ and the HIV incubation distribution $g(t)$ through the formulas $f_I(t) = p_S(t)\prod_{i=1}^{t-1}[1 - p_S(i)]$ and $g(t) = \gamma(t)\prod_{i=1}^{t-1}[1 - \gamma(i)]$.
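The formulas for $f_I(t)$ and $g(t)$ both turn a sequence of monthly transition probabilities into a probability density. A minimal Python sketch (the function name is hypothetical) that applies equally to $p_S(t) \mapsto f_I(t)$ and to $\gamma(t) \mapsto g(t)$:

```python
def hazard_to_density(h):
    """f(t) = h(t) * prod_{i=1}^{t-1} (1 - h(i)) for t = 1, ..., len(h).

    h[0] holds h(1), h[1] holds h(2), and so on.
    """
    density, survival = [], 1.0
    for ht in h:
        density.append(ht * survival)   # h(t) * prod_{i=1}^{t-1} (1 - h(i))
        survival *= 1.0 - ht            # now prod_{i=1}^{t} (1 - h(i))
    return density
```

For example, a constant monthly probability of 0.5 gives densities 0.5, 0.25 and 0.125 for the first three months.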

6. Simultaneous estimation of the HIV infection, the HIV incubation and the numbers of S people, I people and AIDS cases in the San Francisco homosexual population

As an application of the method given in the previous section, in this section we proceed to estimate simultaneously the HIV infection distribution, the HIV incubation distribution and the numbers of S people, I people and AIDS cases in the San Francisco homosexual population. For this population, the monthly AIDS incidence and the monthly deaths from AIDS are available from January 1981 through December 1994 from the gopher server of the CDC in Atlanta, GA. This data set is given in Table 1 and is used to construct the observation model of the state space model. (To avoid the problem of reporting delay and the confusion caused by the change in the AIDS case definition effective in January 1993, we have used the data only up to December 1992.)

6.1. The initial size

Since the average AIDS incubation period is around 10 years and the first AIDS case was reported in 1981, as in [3], we take 1 January 1970 as $t_0 = 0$. It is also assumed that at time 0 there are no AIDS cases and no HIV-infected people with infection duration $u > 0$, but that, to start the HIV epidemic, some HIV-infected people were introduced into the population at time 0.

For the initial population size at time 0 in the city of San Francisco, we follow [3] and assume that $S(0) = 40{,}000$ and $I(0,0) = 36$. Following [3], we also assume that there were 10,000 more S people who would not contribute to AIDS, so that there were 50,000 S people at time 0; for more details, see [3].

Table 1

San Francisco AIDS case reports for 1981–1994 by month of primary diagnosis

Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

1981 1 3 2 1 1 3 3 3 5 3 3 8

1982 6 5 0 6 6 15 12 10 10 14 20 12

1983 24 19 31 25 19 21 27 35 28 31 26 31

1984 46 32 39 43 39 48 67 59 70 54 60 56

1985 77 62 72 77 73 80 94 89 76 89 70 89

1986 104 93 114 100 103 109 121 138 112 149 99 142

1987 136 137 142 130 149 149 150 148 161 140 123 131

1988 156 144 184 141 130 155 139 136 156 117 132 155

1989 159 141 179 197 163 201 169 160 138 155 138 142

1990 200 174 191 156 180 173 176 195 156 175 182 150

1991 220 195 195 190 208 193 229 243 220 303 227 241

1992 286 303 229 215 218 241 267 244 249 240 196 226

6.2. The prior distributions of $p_S(t)$ and $\gamma(u)$

For the San Francisco homosexual population, Tan and Xiang [3,4,18] have estimated both the HIV infection density $f_I(t)$ and the HIV incubation density $g(t)$ by using the SFCCC data available from CDC and the SFMHS data, respectively. Let $\hat{f}_I(j)$ and $\hat{g}(j)$ denote the estimates of $f_I(j)$ and $g(j)$, respectively. Since the sample size of the SFCCC data is $n = 1095$, we have $n(j) = 1095\,\hat{f}_I(j)$ and $N(j) = \sum_{l=j}^{t_M} n(l)$. (For practical purposes, one may take $t_M = 360$ months.) Thus, $a_1(j) = n(j) + 1$ and $a_2(j) = N(j) - n(j) + 1$. These estimates are given in Table 2. Similarly, since the minimum size of the SFMHS data is 711, we have $m(j) = 711\,\hat{g}(j)$ and $M(j) = \sum_{l=j}^{t_M} m(l)$, so that $b_1(j) = m(j) + 1$ and $b_2(j) = M(j) - m(j) + 1$. These estimates are given in Table 3.
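The construction of the prior hyperparameters above can be written as a short Python function. It is a sketch under the following reading: $n(j)$ is the estimated number of events in month $j$ out of a previous study of size $n$ (1095 for the SFCCC infection data, 711 for the SFMHS incubation data), and $N(j)$ is the number still at risk at month $j$. The function name and the input format are hypothetical.

```python
def beta_prior_params(density_hat, n):
    """Hyperparameters a1(j) = n(j) + 1 and a2(j) = N(j) - n(j) + 1.

    density_hat[j] is an estimated monthly density (f_I or g) from a previous
    study of size n; n(j) = n * density_hat[j] and N(j) = sum_{l >= j} n(l).
    """
    counts = [n * d for d in density_hat]
    a1, a2 = [], []
    tail = sum(counts)                 # N(j) for the first month
    for c in counts:
        a1.append(c + 1.0)             # events in month j, plus 1
        a2.append(tail - c + 1.0)      # still at risk after month j, plus 1
        tail -= c                      # N(j+1) = N(j) - n(j)
    return a1, a2
```

Adding 1 to each count corresponds to starting from a uniform Beta(1, 1) prior before incorporating the earlier study.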

Given the above prior distribution and given $\{S(0), I(0,0) = I(0)\}$, by using the procedures given in Section 5, one may readily derive simultaneously the estimates of the HIV infection distribution, the HIV incubation distribution and the numbers of S people, I people and AIDS cases over the time span. These results are plotted in Figs. 1–3. Below we summarize our basic findings:

Table 2

Prior information for infection distribution

Time Jun 1977 Dec 1977 Jun 1978 Dec 1978 Jun 1979 Dec 1979 Jun 1980 Dec 1980

$a_1(t)$ 4.03 5.92 7.42 8.39 11.15 15.61 15.50 14.02

$a_2(t)$ 1060.42 1033.92 1001.62 957.60 903.19 829.79 733.92 650.26

Time Jun 1981 Dec 1981 Jun 1982 Dec 1982 Jun 1983 Dec 1983 Jun 1984 Dec 1984

$a_1(t)$ 13.53 10.25 6.67 4.95 3.40 2.45 2.03 1.85

$a_2(t)$ 574.57 511.87 469.05 441.70 424.51 413.57 406.48 401.15

Time Jun 1985 Dec 1985 Jun 1986 Dec 1986 Jun 1987 Dec 1987 Jun 1988 Dec 1988

$a_1(t)$ 1.52 1.52 1.28 1.31 1.61 1.87 1.47 1.25

$a_2(t)$ 397.31 394.28 391.98 390.37 387.22 383.12 377.77 375.83

Time Jun 1989 Dec 1989 Jun 1990 Dec 1990 Jun 1991 Dec 1991 Jun 1992 Dec 1992

$a_1(t)$ 1.41 1.18 1.22 1.69 5.09 9.33 10.25 9.04

$a_2(t)$ 373.92 372.56 371.38 368.87 355.00 313.97 259.07 207.49

Table 3

Prior information for incubation distribution

Time Jun 1977 Dec 1977 Jun 1978 Dec 1978 Jun 1979 Dec 1979 Jun 1980 Dec 1980

$b_1(t)$ 5.34 5.46 5.51 5.52 5.48 5.40 5.29 5.15

$b_2(t)$ 528.82 502.34 475.37 448.24 421.23 394.61 368.58 343.32

Time Jun 1981 Dec 1981 Jun 1982 Dec 1982 Jun 1983 Dec 1983 Jun 1984 Dec 1984

$b_1(t)$ 4.99 4.82 4.63 4.44 4.25 4.05 3.86 3.67

$b_2(t)$ 318.96 295.62 273.36 252.24 232.27 213.48 195.84 179.34

Time Jun 1985 Dec 1985 Jun 1986 Dec 1986 Jun 1987 Dec 1987 Jun 1988 Dec 1988

$b_1(t)$ 3.49 3.31 3.15 2.99 2.84 2.69 2.56 2.44

$b_2(t)$ 163.95 149.63 136.34 124.02 112.62 102.10 92.41 83.48

Time Jun 1989 Dec 1989 Jun 1990 Dec 1990 Jun 1991 Dec 1991 Jun 1992 Dec 1992

$b_1(t)$ 2.32 2.21 2.11 2.02 1.93 1.85 1.78 1.71

(a) From Fig. 1, the estimated density of the HIV infection clearly shows a mixture of distributions with two obvious peaks. The first peak occurs in May 1981, exactly two months earlier than the estimated peak of seroconversion (July 1981) by Bacchetti [29] and by Tan et al. [1]. The second peak occurs in June 1995 and is considerably lower than the first peak. Comparing the estimated density of the HIV infection in Fig. 1 with the estimated density of the HIV seroconversion by Tan et al. [1], one may note that the two curves are quite similar to each other.

(b) From Fig. 2, the estimated density of the HIV incubation distribution also appears to be a mixture of distributions with two peaks. The higher peak occurs at around 143 months after infection and the lower peak at around 83 months after infection. This result seems to suggest a staged model for HIV incubation, as used by Satten and Longini [2].

Fig. 1. Plots of the estimated HIV infection distribution.


(c) From Fig. 3(a), we observe that the estimates of the AIDS incidence by the Gibbs sampler are almost identical to the corresponding observed AIDS incidence. This indicates that the estimates by the Gibbs sampler trace the observed values very closely, suggesting the usefulness of the method.

(d) To estimate the numbers of S people and I people, we figured in a 1% annual increase in population size and assumed a population size of 50,000 at $t_0 = 0$. Then, as shown in Fig. 3(b), the total number of S people was always above 50,000 before January 1978 and was between 31,000 and 32,000 from January 1983 to January 1992. The total number of people who do not have AIDS was estimated at around 50,000 before January 1992.

(e) Results in Fig. 3(c) showed that the total number of infected people reached a peak around the middle of 1985 and then decreased gradually to the lowest level around 1992. The results before 1992 appeared to be consistent with those obtained by Bacchetti et al. [30] through backcalculation method.


(f) To assess the influence of prior information on $\{p_S(t), \gamma(t)\}$, we plot in Figs. 1 and 2 the estimates of the HIV infection and HIV incubation distributions both with and without prior information (the latter using a non-informative uniform prior). The results show clearly that the prior information has little effect, especially in the case of HIV infection.

(g) To start the procedure, one needs some initial parameter values for $p_S(t)$ and $\gamma(u)$. In this paper, we first assume an initial incubation distribution with a mean of 10 yr and derive estimates of the infection distribution by using the standard backcalculation method (see [9,10]). This assumed incubation distribution and the associated estimate of the infection distribution are then used to give initial values for the parameters $p_S(t)$ and $\gamma(u)$. To check the effects of the initial incubation distribution, we have tried different incubation distributions as the initial assumed distribution. These include the uniform distribution, the exponential distribution, the gamma distribution, the Weibull distribution and the generalized gamma distributions, all with the same mean value of 10 yr. We found that all initial distributions gave almost identical estimates. (As an illustration, we plot in Fig. 4 the estimated HIV infection distributions under four different initial incubation distributions.) This robustness property indicates that the procedure is quite insensitive to the initial values of $\{p_S(t), \gamma(u)\}$.

7. Conclusions and discussion

In the classical Kalman filter method, one has to assume that the parameters are known in order to derive optimal estimates or predicted values of the state variables. For example, in the HIV epidemic, one has to assume that the probabilities $p_S(t)$ and the transition rates $\gamma(t)$ are known in order to derive optimal estimates of the numbers of S people, I people and AIDS cases. In this paper, we have developed a general procedure to estimate simultaneously the state variables and the unknown parameters in the HIV epidemic via state space models. By using the San Francisco homosexual population as an example, we have illustrated how to use the methodology to estimate simultaneously the HIV infection distribution, the HIV incubation distribution as well as the numbers of S people, I people and AIDS cases in this population. From this analysis we have drawn the following conclusions:

(1) The estimates of the AIDS cases traced the observed AIDS cases extremely well.

(2) Our analysis predicted two waves of HIV infection. The first wave peaked around the middle of 1985, a result consistent with the findings of Bacchetti et al. [30]. However, our analysis also predicted a second wave of HIV infection, peaking some time around the year 2000. This indicates a high proportion of restricted mixing (i.e., like-with-like mixing) in the San Francisco homosexual population, consistent with the suggestions of Tan et al. [31] and of Becker and Egerton [32].

(3) In studying the HIV epidemic, Satten and Longini [2] used a Markov staged model that partitions the infective stage into five substages based on the CD4+ T-cell count per mm³. It follows that the probability distribution of the HIV incubation is a mixture of several exponential distributions. (For proofs of these results, see [33,34].) Our estimates show that the probability distribution of the HIV incubation is a mixture of distributions with two obvious peaks, thus providing strong support for the staging of the infective stage by Satten and Longini [2].

(4) Results from Figs. 1 and 2 show that the estimated curves of the HIV infection distribution and the HIV incubation distribution obtained with conjugate priors do not differ significantly from those obtained with the non-informative uniform prior. Since the non-informative uniform prior corresponds to no prior information, these results suggest that the information from the data and from the stochastic system model alone provides almost all of the information on the HIV epidemic.


and the proportional mixing pattern, respectively. Then, as illustrated in [3], with the estimates of $p_S(t)$ and the state variables, one can always estimate the relevant parameters in $p_S(t)$.
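Conclusion (3) rests on the fact that a staged model builds the incubation distribution out of exponential pieces. A minimal sketch, assuming the simplest case in which the infective period passes through five substages strictly in sequence, each with an exponentially distributed holding time and pairwise-distinct rates. The rates below are hypothetical, chosen only so that the mean is 10 yr, and Satten and Longini's model [2] also allows transitions this sketch ignores. Under these assumptions the incubation period is hypoexponential, and its density is a linear combination of exponential densities:

```python
import math


def staged_incubation_density(t: float, rates: list[float]) -> float:
    """Density at time t of a sum of independent exponential stage durations.

    For pairwise-distinct stage rates r_i this is the hypoexponential density
        f(t) = sum_i c_i * r_i * exp(-r_i * t),
        c_i  = prod_{j != i} r_j / (r_j - r_i),
    i.e. a linear combination of exponential densities whose weights c_i
    sum to 1 (some of them are negative).
    """
    if len(set(rates)) != len(rates):
        raise ValueError("stage rates must be pairwise distinct")
    total = 0.0
    for i, r_i in enumerate(rates):
        c_i = 1.0
        for j, r_j in enumerate(rates):
            if j != i:
                c_i *= r_j / (r_j - r_i)
        total += c_i * r_i * math.exp(-r_i * t)
    return total


# Hypothetical monthly exit rates for five sequential substages
# (mean holding times 40, 30, 25, 15 and 10 months: total mean 120 months)
RATES = [1 / 40, 1 / 30, 1 / 25, 1 / 15, 1 / 10]
```

Integrating `staged_incubation_density(t, RATES)` numerically over a long horizon should give total mass 1 and mean 120 months, the sum of the stage means.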

In deriving the results, we have assumed that during a one-month period, the numbers of immigrants and recruits among S people and I people are equal to the numbers of deaths and out-migrations of these people, respectively. Our Monte Carlo studies seemed to indicate that this assumption has little impact on the estimates of the HIV infection distribution and the HIV incubation distribution. As further confirmation, we note that for the San Francisco homosexual population, the estimates of the HIV infection distribution and the incubation distribution are almost identical to those derived before by Tan and Xiang [3,4] using other data sets. We note, however, that this assumption does have some impact on the estimation of the numbers of S people and I people. To see this, denote by $\{\lambda_S, \lambda_I\}$ the immigration rates of S people and I people, respectively, and by $\{\mu_S, \mu_I\}$ the death rates of S people and I people, respectively. To account for the effects of immigration and death, one then needs to add $S(t)(\lambda_S - \mu_S)$ to Eq. (1) and $I(u,t)(\lambda_I - \mu_I)$ to Eq. (3). Thus, if assumption (2) in Section 2 fails because immigration exceeds death, one would expect the method to underestimate these numbers. To correct for this, we have applied a 1% upward adjustment to the estimated numbers of S people and I people. With these adjustments, the estimates of the numbers of S people and I people are almost identical to those given in [3,4].

In studies of the HIV epidemic, Brookmeyer and Gail [9,10] proposed a backcalculation method to estimate the HIV infection and to give short-term projections of future AIDS cases. This method uses AIDS incidence data and is based on the formulation that the distribution of the time to AIDS onset is a convolution of the distribution of the HIV infection and the distribution of the HIV incubation. However, there are two major difficulties associated with this method. First, the method is not identifiable if both the distribution of the HIV infection and the distribution of the HIV incubation are unknown. Hence, one would need to assume the distribution of the HIV incubation as known in order to estimate the distribution of the HIV infection [9]. Similarly, one would need to assume the distribution of the HIV infection as known in order to estimate the distribution of the HIV incubation [35]. Second, the method is very sensitive to the choice of the distribution of the HIV incubation or the distribution of the HIV infection [10,30,36]. In this paper, we have addressed these problems through the state space models. In Sections 2 and 3, we have in fact shown that the backcalculation method is a special case of the observation model. Thus, in addition to the information from the data, the stochastic system model provides additional information from the system, which helps solve the identifiability problem confronting the backcalculation method.
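The first difficulty of the backcalculation method can be made concrete with a small sketch, assuming discrete monthly time, expected values only (no randomness), and no infections before the first month. Because the convolution is symmetric in its two arguments, swapping the shape of the infection curve with the incubation distribution leaves the expected AIDS incidence unchanged, so AIDS incidence data alone cannot tell the two apart. The function name and the numbers are illustrative, not taken from [9,10].

```python
def expected_aids_incidence(infections: list[float],
                            incubation_pmf: list[float]) -> list[float]:
    """Expected AIDS cases by month: A(t) = sum_{s <= t} I(s) * f(t - s).

    infections[s]     -- number of people infected in month s
    incubation_pmf[u] -- probability that the incubation period is u months
    """
    out = [0.0] * (len(infections) + len(incubation_pmf) - 1)
    for s, i_s in enumerate(infections):
        for u, f_u in enumerate(incubation_pmf):
            out[s + u] += i_s * f_u
    return out


# Two different (infection, incubation) pairs: the second swaps the shape of
# the infection curve with the incubation distribution (hypothetical numbers).
N = 1000.0
g = [0.2, 0.5, 0.3]  # shape of the infection curve over months 0..2
f = [0.1, 0.6, 0.3]  # incubation pmf over 0..2 months
a1 = expected_aids_incidence([N * x for x in g], f)
a2 = expected_aids_incidence([N * x for x in f], g)
```

Both `a1` and `a2` come out as approximately `[20, 170, 390, 330, 90]`, even though the assumed infection and incubation distributions differ.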


Acknowledgements

The research of this paper was partially supported by a research grant from the National Institute of Allergy and Infectious Diseases/NIH, Grant No. R21 AI31869.

References

[1] W.Y. Tan, S.C. Tang, S.R. Lee, Estimation of HIV seroconversion and effects of age in San Francisco homosexual populations, J. Appl. Stat. 25 (1998) 85.

[2] G. Satten, I. Longini, Markov chain with measurement error: estimating the `true' course of marker of the progression of human immunodeficiency virus disease, Appl. Stat. 45 (1996) 275.

[3] W.Y. Tan, Z. Xiang, The state space model of the HIV epidemic in homosexual populations and some applications, Math. Biosci. 152 (1998) 26.

[4] W.Y. Tan, Z. Xiang, The state space model of the HIV epidemic with variable infection in homosexual populations, J. Stat. Planning Inference 78 (1999) 71.

[5] W.Y. Tan, Z.Z. Ye, Some state space models of the HIV epidemic and applications for the estimation of HIV infection and HIV incubation, Comm. Stat. (Theory Method) 29 (2000) 1059.

[6] CDC, 1993, Revised classification system for HIV infection and expanded surveillance case definition for AIDS among adolescents and adults, MMWR, 41, No. RR17, 1992.

[7] W.Y. Tan, R.H. Byers, A stochastic model of the HIV epidemic and the HIV infection distribution in a homosexual population, Math. Biosci. 113 (1993) 115.

[8] W.Y. Tan, S.C. Tang, S.R. Lee, Effects of randomness of risk factors on the HIV epidemic in homosexual populations, SIAM J. Appl. Math. 55 (1995) 1697.

[9] R. Brookmeyer, M.H. Gail, A method for short-term projections and lower bounds on the size of the AIDS epidemic, J. Am. Stat. Assoc. 83 (1988) 301.

[10] R. Brookmeyer, M.H. Gail, AIDS Epidemiology: A Quantitative Approach, Oxford University Press, Oxford, UK, 1994.

[11] H.W. Hethcote, J.W. Van Ark, Modeling HIV transmission and AIDS in the United States, Lecture Notes in Biomath., Springer, Berlin, 1992.

[12] N.T.J. Bailey, Prediction and validation in the public health modelling of HIV/AIDS, Stat. Med. 13 (1994) 1933.

[13] H.W. Hethcote, J.W. Van Ark, I.M. Longini, A simulation model of AIDS in San Francisco: I. Model formulation and parameter estimation, Math. Biosci. 106 (1991) 203.

[14] U.S. Bureau of the Census, Statistical Abstract of the United States: 108th Ed., Washington, DC, 1987.

[15] J.A. Jacquez, C.P. Simon, J. Koopman, L. Sattenspiel, T. Perry, Modelling and analyzing HIV transmission: The effect of contact patterns, Math. Biosci. 92 (1988) 119.

[16] G.F. Lemp, S.F. Payne, G.W. Rutherford, et al., Projections of AIDS morbidity and mortality in San Francisco, J. Am. Med. Assoc. 263 (1990) 1497.

[17] G.E.P. Box, G.C. Tiao, Bayesian Inference in Statistical Analysis, Addison-Wesley, Reading, MA, 1973.

[18] W.Y. Tan, Z. Xiang, Bayesian estimation of HIV infection and incubation through backcalculation method, Invited paper at the IMS Asian and Pacific Region and ICSA meeting in Taipei, Taiwan, 7–9 July 1997.

[19] Z.H. Xiang, Modelling the HIV epidemic: part I. Bayesian estimation of the HIV infection and incubation via backcalculation: part II. The state space model of the HIV epidemic in homosexual populations, PhD thesis, 1997, Department of Mathematical Sciences, University of Memphis, TN, USA.

[20] D.E. Catlin, Estimation, Control and Discrete Kalman Filter, Springer, New York, 1989.

[21] A. Gelb, Applied Optimal Estimation, MIT Press, Cambridge, MA, 1974.

[22] A.P. Sage, J.L. Melsa, Estimation Theory With Application to Communication and Control, McGraw-Hill, New York, 1971.

[23] N. Shephard, Partial non-Gaussian state space, Biometrika 81 (1994) 115.


[25] G. Kitagawa, A self organizing state space model, J. Am. Stat. Assoc. 93 (1998) 1203.

[26] A.F.M. Smith, A.E. Gelfand, Bayesian statistics without tears: A sampling–resampling perspective, Am. Stat. 46 (1992) 84.

[27] H. Raiffa, R. Schlaifer, Applied Statistical Decision Theory, MIT Press, Cambridge, MA, 1968.

[28] IMSL, MATH/LIBRARY User's Manual, IMSL, Houston, TX, US, 1989.

[29] P. Bacchetti, Estimating the incubation period of AIDS comparing population infection and diagnosis pattern, J. Am. Stat. Assoc. 85 (1990) 1002.

[30] P.R. Bacchetti, M.R. Segal, N.P. Jewell, Backcalculation of HIV infection rates, Stat. Sci. 8 (1993) 82.

[31] W.Y. Tan, S.R. Lee, S.C. Tang, Characterization of HIV infection and seroconversion by a stochastic model of HIV epidemic, Math. Biosci. 126 (1995) 81.

[32] N.G. Becker, L.R. Egerton, A transmission model for HIV with application to the Australian epidemic, Math. Biosci. 119 (1994) 205.

[33] W.Y. Tan, On the incubation distributions of the HIV epidemic, Stat. Probability Lett. 18 (1993) 279.

[34] W.Y. Tan, S.C. Tang, S.R. Lee, Characterization of HIV incubation and some comparative studies, Statist. Med. 15 (1996) 197.

[35] P. Bacchetti, A.R. Moss, Incubation period of AIDS in San Francisco, Nature 338 (1989) 251.
