Summary - PDF eres.library.adelaide.edu.au

Chapter 4 Importance sampling for multiple observations of a single outbreak

In this chapter we develop a particle filtering approach for modelling epidemics when more than one event type is observed. The particle filter uses importance sampling to simulate realisations of an epidemic model that are consistent with all of the observed event types. We use this particle filter within a particle marginal Metropolis-Hastings (pmMH) procedure to infer the parameters of the 1995 Ebola outbreak in Kikwit. We compare the results against previous studies of the Kikwit outbreak that used data-augmented MCMC (DA-MCMC) and find strong agreement with our method.

4.1 Case study: 1995 DRC Ebola outbreak

The 1995 Ebola outbreak in the Democratic Republic of the Congo (DRC) was one of the largest outbreaks in the country [13, 41]. The outbreak began on January 6, 1995 and ended 191 days later, on July 16 [41]. Intervention measures were introduced on May 9, 123 days into the outbreak. These measures primarily concerned public education and increased use of personal protective equipment (PPE) among health care workers [18, 41, 45]. It is documented that a charcoal mine worker was identified with a case of Ebola on January 6. However, Ebola was only confirmed as the causative agent after an analysis, on May 9, of specimens from the early stages of the outbreak [41].

The outbreak was confirmed on March 2, which leaves a period of 55 days without reports. It is reasonable to assume that cases were not widespread during this period, as otherwise the authorities would have been alerted to the outbreak earlier.
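As a quick check of the day counts quoted above, the sketch below converts each calendar date into a day index relative to the index case on January 6, 1995 (day 0). It uses only Python's standard `datetime` module. The names `START`, `day_index` and `milestones` are illustrative and our own.

```python
from datetime import date

# Day 0: the index case on 6 January 1995, taken as the initial condition.
START = date(1995, 1, 6)

def day_index(d: date) -> int:
    """Number of days elapsed since the index case."""
    return (d - START).days

milestones = {
    "outbreak confirmed": date(1995, 3, 2),
    "interventions introduced": date(1995, 5, 9),
    "outbreak ended": date(1995, 7, 16),
}

for label, d in milestones.items():
    print(f"{label}: day {day_index(d)}")  # days 55, 123 and 191
```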

[Plot: (a) observed onsets and (b) observed removals, each against time (days)]

Figure 4.1: Daily onset and removal incidence for the 1995 DRC Ebola outbreak. Interventions were introduced on May 9 (123 days after the first case), and this date is indicated by the red line. The index case on January 6 is taken as the initial condition.

The dataset comprises two time series. The first records the incidence of symptom onset, which we assume coincides with the onset of infectiousness (a reasonable assumption for Ebola). The second records the incidence of removal. Both time series are at daily resolution and record the number of individuals who developed symptoms or died, respectively. A total of 316 individuals, including the index case, were identified as having been infected over the course of the outbreak. Some onset and removal times were missing, so only the times of 291 onsets and 236 deaths were known to daily precision. Figure 4.1 shows the daily incidence. It shows the absence of observations over the first 54 days, the time at which reporting began, and the time at which interventions were introduced.

Comparing the numbers of reported onsets and removals with the reported final size suggests that at least 25 dates of symptom onset (316 − 291) and a further 80 dates of removal (316 − 236) are missing.
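The missing-date counts follow directly from the totals given above. The short sketch below makes the subtraction explicit. The variable names are illustrative only.

```python
FINAL_SIZE = 316         # infected individuals identified, including the index case
OBSERVED_ONSETS = 291    # onset times known to daily precision
OBSERVED_REMOVALS = 236  # death times known to daily precision

# Minimum numbers of missing event dates implied by the reported final size
missing_onsets = FINAL_SIZE - OBSERVED_ONSETS      # 25
missing_removals = FINAL_SIZE - OBSERVED_REMOVALS  # 80
print(missing_onsets, missing_removals)
```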

[Diagram: S → E → I → R, with the transitions labelled Z₁, Z₂ and Z₃]

Figure 4.2: The SEIR compartmental model, showing the transitions of individuals through the compartments. The variables Z₂ and Z₃ correspond to the observed onset and removal events.
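To make the transitions in Figure 4.2 concrete, here is a minimal sketch of an exact (Gillespie) simulation of a Markovian SEIR model. It records daily onset (Z₂) and removal (Z₃) counts, which are the two observed series.

The sketch rests on several assumptions of our own:

- The transmission rate `beta` is constant, so the effect of the May 9 interventions is omitted.
- The onset rate is `kappa`, so the mean incubation period is 1/`kappa`.
- The removal rate is `gamma`.
- The index case starts in the infectious compartment.

The parameter names are hypothetical. This is only a forward simulator of the underlying model, not the importance-sampling particle filter developed in this chapter.

```python
import random

def simulate_seir(beta, kappa, gamma, N, t_max, E0=0, I0=1, seed=None):
    """Exact (Gillespie) simulation of a Markovian SEIR model.

    Returns daily onset counts (Z2) and daily removal counts (Z3) for
    days 0, ..., t_max - 1, plus the final (S, E, I, R) state.
    """
    rng = random.Random(seed)
    S, E, I, R = N - E0 - I0, E0, I0, 0
    onsets = [0] * t_max
    removals = [0] * t_max
    t = 0.0
    while True:
        rate_inf = beta * S * I / N  # S -> E (Z1, unobserved)
        rate_onset = kappa * E       # E -> I (Z2, observed onset)
        rate_rem = gamma * I         # I -> R (Z3, observed removal)
        total = rate_inf + rate_onset + rate_rem
        if total == 0.0:
            break  # no exposed or infectious individuals remain
        t += rng.expovariate(total)  # exponential waiting time to the next event
        if t >= t_max:
            break
        day = int(t)
        u = rng.random() * total     # pick the event in proportion to its rate
        if u < rate_inf:
            S -= 1
            E += 1
        elif u < rate_inf + rate_onset:
            E -= 1
            I += 1
            onsets[day] += 1
        else:
            I -= 1
            R += 1
            removals[day] += 1
    return onsets, removals, (S, E, I, R)
```

For example, `simulate_seir(0.3, 1/6, 1/7, 5_364_500, 191, seed=1)` returns 191 days of simulated onset and removal counts. The parameter values are arbitrary and not estimates.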

Throughout this chapter we consider the same SEIR model (see Figure 4.2) as Lekone and Finkenstädt [45], with some modifications to account for the missing onset and removal times. We compare our results against those obtained in previous studies [18, 45, 51, 55]. Each of these papers takes a different approach, but they produce broadly comparable results.

Lekone and Finkenstädt [45] was the original motivation for developing these methods and for considering this dataset. Their approach was to approximate the process in discrete time using a chain-binomial model. Their modelling assumptions are reasonably well explained. However, there is little model checking to demonstrate whether their approach appropriately captures the heterogeneity in the process.

The more recent study of McKinley et al. [51] compares DA-MCMC and Approximate Bayesian Computation (ABC) approaches on the same dataset. The approach in [51] extends that of [45], but it neither approximates the process with binomial draws nor assumes that the process evolves in discrete time. Removing these assumptions yields a model that appears to capture the full dynamics of the process more faithfully. The DA-MCMC approach in particular provides a good basis for comparison with our methods, as it targets the exact posterior.

Under this model, all individuals transition to the same removal compartment, so an individual does not necessarily die upon removal. Consequently, the average infectious period is estimated over individuals who either die or recover.
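For comparison, a discrete-time chain-binomial approximation of the kind used by Lekone and Finkenstädt [45] replaces the continuous-time transitions with binomial draws once per day. The sketch below assumes a common form of this model:

- Each susceptible is infected with probability 1 − exp(−βI/N) per day.
- Each exposed individual leaves its compartment with probability 1 − exp(−κ) per day.
- Each infectious individual leaves its compartment with probability 1 − exp(−γ) per day.

The exact parameterisation in [45], including any time-varying transmission rate, may differ. The function and parameter names are our own. NumPy's `Generator.binomial` supplies the draws, and the index case is assumed to start as infectious.

```python
import numpy as np

def chain_binomial_step(state, beta, kappa, gamma, N, rng, dt=1.0):
    """One step of a discrete-time SEIR chain-binomial model.

    All three binomial draws use the state at the start of the step.
    """
    S, E, I, R = state
    p_inf = 1.0 - np.exp(-beta * I / N * dt)  # per-susceptible infection probability
    p_onset = 1.0 - np.exp(-kappa * dt)       # per-exposed onset probability
    p_rem = 1.0 - np.exp(-gamma * dt)         # per-infectious removal probability
    B = rng.binomial(S, p_inf)    # new infections, S -> E (Z1)
    C = rng.binomial(E, p_onset)  # new onsets, E -> I (Z2)
    D = rng.binomial(I, p_rem)    # new removals, I -> R (Z3)
    return (S - B, E + B - C, I + C - D, R + D), (B, C, D)

def simulate_chain_binomial(beta, kappa, gamma, N, days, I0=1, seed=None):
    """Simulate `days` daily steps starting from I0 infectious individuals."""
    rng = np.random.default_rng(seed)
    state = (N - I0, 0, I0, 0)
    onsets, removals = [], []
    for _day in range(days):
        state, (_new_inf, C, D) = chain_binomial_step(state, beta, kappa, gamma, N, rng)
        onsets.append(int(C))
        removals.append(int(D))
    return onsets, removals, state
```

Unlike the Gillespie simulation, events within a day are not ordered, which is the discrete-time approximation that [51] avoids.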

The issue with the analysis in [55] is that the MCMC approach is not well documented. Priors are not mentioned, and the model appears to be fitted only to the available data without accounting for any missing information. A further issue is that some of the parameter estimates are grossly inconsistent with the previously presented estimates. In particular, the incubation period is estimated to be 1.65 days by the least-squares approach and 1.69 days by the MCMC approach. In the literature the incubation period is typically estimated to be around 6 days on average, which suggests that the priors used may have been poorly informed. For this reason we compare only against the other three analyses [18, 45, 51].

The main difficulty in comparing our results with these previous studies is that Chowell et al. [18] and Lekone and Finkenstädt [45] appear to consider only the second phase of the outbreak (day 55 onwards) in their model fitting. In contrast, McKinley et al. [51] fit their model to the entire duration of the outbreak, taking January 6 as the initial day. We also take January 6 as the date on which the outbreak began. We assume the same population size as [18, 45, 51], N = 5,364,500.

The first, and perhaps most appropriate, approach to handling the missing data is data-augmented MCMC (DA-MCMC) [33, 58]. DA-MCMC infers the missing event times and the entire missing latent curve jointly with the parameters. When all the event times are known, the likelihood is straightforward to calculate. The approach is outlined in Gibson [28] and essentially involves proposing moves to the missing event times and calculating the likelihood analytically.

While this method is appropriate, it suffers from some issues. It can be difficult to verify that the DA-MCMC chain has converged to the correct stationary distribution [52, 58]. Furthermore, it scales poorly to higher dimensions [58]. We highlight this scaling issue because in Chapter 5 we aim to conduct inference on multiple outbreaks of Zaire ebolavirus using a hierarchical model. Due to the sheer amount of data, the augmented parameter space for that problem has of the order of 1,000 dimensions. The dimension is this high because the latent curve for each outbreak is completely missing. The DA-MCMC approach would be computationally inefficient for such a problem, and it would be very difficult to determine whether the chains had converged.

Here we instead propose a pmMH approach with a particle filter that uses importance sampling to match the counts in the two time series exactly and to account for some of the missing data.
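The pmMH idea can be sketched generically as a Metropolis-Hastings chain in which the intractable likelihood is replaced by a non-negative unbiased estimate, such as one produced by a particle filter. The estimate for the current state is retained rather than recomputed.

The sketch below makes these assumptions:

- A user-supplied estimator `loglik_hat(theta, rng)` returns the log of such an estimate.
- The proposal is a Gaussian random walk.
- `log_mean_exp` illustrates how a particle filter typically turns its log importance weights into a per-step likelihood estimate.

The particle filter itself, including the importance sampling that matches the observed counts exactly, is not reproduced here. The toy estimator at the bottom is a hypothetical stand-in that multiplies an exact Gaussian likelihood by unit-mean log-normal noise. All names are our own.

```python
import math
import random

def log_mean_exp(log_w):
    """Log of the mean of exp(log_w): the per-step likelihood estimate a
    particle filter forms from its log importance weights."""
    m = max(log_w)
    if m == -math.inf:
        return -math.inf  # every particle is inconsistent with the data
    return m + math.log(sum(math.exp(x - m) for x in log_w) / len(log_w))

def pmmh(loglik_hat, log_prior, theta0, n_iter, step, seed=0):
    """Particle marginal Metropolis-Hastings with a Gaussian random walk.

    loglik_hat(theta, rng) must return the log of a non-negative,
    unbiased likelihood estimate (e.g. from a particle filter).
    """
    rng = random.Random(seed)
    theta = list(theta0)
    lp = log_prior(theta)
    ll = loglik_hat(theta, rng)
    chain, accepted = [], 0
    for _ in range(n_iter):
        prop = [x + rng.gauss(0.0, s) for x, s in zip(theta, step)]
        lp_prop = log_prior(prop)
        if lp_prop > -math.inf:  # only estimate the likelihood inside the prior support
            ll_prop = loglik_hat(prop, rng)
            log_alpha = (ll_prop + lp_prop) - (ll + lp)
            # 1 - random() lies in (0, 1], so the log is always defined
            if math.log(1.0 - rng.random()) < log_alpha:
                # Keep the accepted *estimate*; it is never recomputed for the
                # current state, which keeps the exact posterior invariant.
                theta, lp, ll = prop, lp_prop, ll_prop
                accepted += 1
        chain.append(list(theta))
    return chain, accepted / n_iter

# Hypothetical toy check: y_i ~ N(mu, 1) with a N(0, 10^2) prior on mu.
y = [1.2, 0.8, 1.5, 0.9, 1.1]
SIGMA_NOISE = 0.5

def toy_loglik_hat(theta, rng):
    exact = sum(-0.5 * (yi - theta[0]) ** 2 - 0.5 * math.log(2 * math.pi) for yi in y)
    # exp(noise) has mean 1, so the likelihood estimate is unbiased
    return exact + rng.gauss(0.0, SIGMA_NOISE) - 0.5 * SIGMA_NOISE ** 2

def toy_log_prior(theta):
    return -0.5 * (theta[0] / 10.0) ** 2

chain, acc = pmmh(toy_loglik_hat, toy_log_prior, [0.0], 5000, [0.8], seed=1)
post_mean = sum(c[0] for c in chain[1000:]) / len(chain[1000:])
```

Because the noisy estimate is unbiased on the natural scale, the chain still targets the exact posterior. Noisier estimates only lower the acceptance rate and slow mixing.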
