Epidemic Surveillance of Novel Coronavirus 2019

This dissertation/thesis entitled "2019 NOVEL CORONAVIRUS EPIDEMIC SURVEILLANCE THROUGH PROBABILISTIC MODELS" was prepared by SHERILYNN NGERNG SIEW FONG and submitted in partial fulfillment of the requirements for the Master of Mathematics degree at Rayon Universiti. Due acknowledgment shall always be made of the use of any material contained in, or derived from, this report.

Introduction

Background of Study

Given the severity of the epidemic, precautions should be taken in advance to contain it and prevent a subsequent outbreak. A more detailed description and evaluation of the three methods mentioned above is given in the literature review section of this study.

Importance of the Study

In this research study, we implement each of the three approaches to explore their properties as a COVID-19 surveillance system and to suggest how the strengths of the different methods can be combined to compensate for each system's weaknesses. The study could therefore provide insight into the shortcomings of the models when applied to a long-lasting pandemic with no historical data, and show how they complement each other for monitoring COVID-19.

Problem Statements

This study is important because no one expected COVID-19 to last longer than its predecessor (the SARS outbreak) and to affect our livelihoods for two years and beyond. Most of the modelling studies discussed were conducted on seasonal diseases or on diseases rich in historical data.

Objectives

In addition, information about the coronavirus is not presented on a scale large enough to observe and identify possible disease hotspots or origins in real time. In this research study, a network visualization of the disease distribution will be produced in RStudio to illustrate the impact of the epidemic studied.

Project Scope

Overview of Epidemic Surveillance Models

2016 Quasi-Poisson versus negative binomial regression models in identifying factors influencing initial change in CD4 cell count due to antiretroviral therapy administered to HIV-positive adults in Northwestern Ethiopia (Amhara Region)

2020 Investigating the Significant Individual Historical Factors of Driving Risk Using Hierarchical Clustering Analysis and Quasi-Poisson Regression Model

2020 Modeling Burglary Incident Data

2017 Applying spatio-temporal models to assess variations across health care areas and regions: lessons from the decentralized Spanish national health system

The EARS detection algorithms depend heavily on the choice of syndrome definition, which influences the daily syndrome counts (Hagen et al., 2011). This supports the view that a quasi-Poisson regression is preferable for accurately predicting diurnal alarms (Noufaily et al., 2019).

Research Flowchart

As shown in Figure 1, this research study assumes a hypothetical environment in which control data are reported immediately. It is therefore assumed that the control data are obtained automatically through the blockchain system. The chief complaint is then interpreted as COVID-19 symptoms and a diagnosis, which are in turn translated into surveillance data.

After the surveillance data have been extracted, they are converted into an 𝑛 × 𝑚 matrix format to study the development of COVID-19 over the days. The models will also be compared on their ability to adapt to rapid disease outbreaks with little to no historical surveillance data, which is often the most alarming and least expected situation.

Type of Data to be Stored

Qualitative Analysis Parameters

Overview of the Software

Program Setup and Display

Data Pre-Processing

The data in this study were collected from January and reflect ECDC's move from a daily to a weekly reporting schedule, under which all daily updates stop from December 14, 2020. Under the weekly schedule, the number of cases and deaths reported worldwide is published every Thursday. A snapshot of ECDC's raw global data on COVID-19 cases and deaths is presented below.

However, the models read the data by matching the number of cases reported on each day to the corresponding country, and they study the data from column B onwards. The transformed data, now converted to an 𝑛 × 𝑚 matrix format, are shown in Figure 3.

Figure 3: Snapshot of transformed surveillance dataset in 𝑛 × 𝑚 matrix format
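The reshaping step can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the record fields `date`, `country`, and `cases`, the function name `to_matrix`, and the sample values are all hypothetical, and country-day pairs with no report are assumed to be zero.

```python
def to_matrix(records):
    """Pivot long-format records into an n x m matrix.

    Rows are reporting dates (n of them) and columns are countries (m of them).
    Dates are assumed to be ISO strings so that sorting is chronological.
    """
    dates = sorted({r["date"] for r in records})
    countries = sorted({r["country"] for r in records})
    row = {d: i for i, d in enumerate(dates)}
    col = {c: j for j, c in enumerate(countries)}

    # Assumption: a missing country-day pair is treated as zero cases.
    matrix = [[0] * len(countries) for _ in dates]
    for r in records:
        matrix[row[r["date"]]][col[r["country"]]] += r["cases"]
    return dates, countries, matrix


# Hypothetical sample records (not ECDC figures).
records = [
    {"date": "2020-03-01", "country": "A", "cases": 5},
    {"date": "2020-03-02", "country": "A", "cases": 7},
    {"date": "2020-03-01", "country": "B", "cases": 2},
]
dates, countries, matrix = to_matrix(records)
```

Each row of `matrix` then holds one reporting date and each column one country, which corresponds to reading the spreadsheet from column B onwards with the dates in the first column.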

Exploratory Data Analysis of Preliminary Data and Findings

However, Brazil has a higher mortality rate than India: it ranks third in the number of infected cases but has the next-highest number of deaths. The number of deaths in India is close to that of Mexico. The donut chart in Figure 6 shows that America leads in the number of infected cases by continent, as it also leads the by-country ranking of infected patients and deaths.

Asia, on the other hand, ranks third by continent in the number of infected patients. As seen in Figure 8, the number of COVID-19 cases shows an overall exponential increase from April 2020.

Figure 6: Donut Chart of Number of Infected Cases by Continent

Construction of Epidemic Surveillance Models

In particular, the CUSUM chart shows the cumulative sums of the deviations of the sample values from a target value. In this study, the C2 algorithm is selected to study the nature of the EARS algorithm while overcoming a limitation of the C1 algorithm, namely the absence of a 2-day guard band interval. This choice is supported by the earlier discussion: a guard band improves the outbreak detection sensitivity of the C2 algorithm because outbreaks that spread over several days are not missed (Salmon et al., 2015).
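As an illustration of the CUSUM idea described above, the following Python sketch computes the plain cumulative sum of deviations from a target, together with a common one-sided variant that resets to zero. The function names, the allowance parameter `k`, and the sample values are assumptions made for this example and are not taken from the study.

```python
def cusum(values, target):
    """Cumulative sums of the deviations of each value from the target."""
    total = 0.0
    sums = []
    for v in values:
        total += v - target
        sums.append(total)
    return sums


def upper_cusum(values, target, k=0.0):
    """One-sided (upper) CUSUM that never drops below zero.

    k is an allowance subtracted from each deviation (an assumed tuning value).
    """
    s = 0.0
    sums = []
    for v in values:
        s = max(0.0, s + (v - target - k))
        sums.append(s)
    return sums


# Hypothetical counts with a target level of 10.
counts = [10, 12, 9, 15]
print(cusum(counts, 10))           # [0.0, 2.0, 1.0, 6.0]
print(upper_cusum(counts, 10, 1))  # [0.0, 1.0, 0.0, 4.0]
```

The one-sided form only accumulates upward shifts and returns to zero when counts fall back, which is one way a chart can stay responsive to a new increase.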

There are five critical aspects of the statistical model for this quasi-Poisson regression-based algorithm. Sometimes the population is divided into compartments, so that each individual is categorized into one of three states: susceptible (S), infectious (I), or removed (R).

Selection of Epidemic Surveillance Models

One potential drawback of CUSUM is that, as measurement continues, it loses sensitivity over time: accumulated error makes it slower to respond to small changes in the process average. An adaptive reweighting scheme is one alternative, and it is roughly equivalent to the scaled Anscombe residuals reweighting method. A distinctive feature of the newly fitted models is their use of baseline data, which allows for better estimation of trend and variance.

These investigations also support the credibility of the negative binomial model, but only under suitable conditions. In the review of the Spatio-Temporal Endemic-Epidemic model, the additional covariate in the epidemic component is not flexible in terms of modeling infectivity and susceptibility.

Table 2: Summary of Function and Limitations of Surveillance Models Studied

The EARS methods were initially designed for a drop-in surveillance system with little or no baseline data available (Fricker, 2010).

Implementation of Surveillance Data for Models

EARS C1 Model

Since the EARS method acts as a drop-in surveillance system for studying diseases with minimal historical data, Figure 13 shows that it outperformed its rivals during this pandemic in identifying subsequent waves of outbreaks and raising the alarm. The model continuously rescaled itself to zero to maintain sensitivity to outbreak development. A statistical alert is produced in week 𝑡, with observed count 𝑌(𝑡), only if the count exceeds the threshold, which is the baseline mean count plus a multiple of the baseline standard deviation (equivalently, if the C1 statistic exceeds that multiple). The EARS model would therefore aid quality control in assessing outbreaks over any time interval and in identifying the next potential increase in infected cases.

As an option, underreporting of cases can be tracked by comparing the model's predictions with those of the Farrington model (Figure 15). Since no historical data are available from before the pandemic, the model could only study the outbreak once it had occurred; the specificity is therefore taken to be 1, as alarms were raised at the beginning of this research study.
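The alert rule can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in the study: the 7-point baseline, the multiple `k = 3`, the function name `ears_alarms`, and the sample counts are all assumed for the example. Setting `guard=0` gives a C1-style rule, while `guard=2` leaves a two-point gap between the baseline and the current observation, in the spirit of C2.

```python
from statistics import mean, stdev


def ears_alarms(counts, baseline=7, guard=0, k=3.0):
    """Flag time points whose count exceeds baseline mean + k * baseline sd.

    baseline: number of past observations used as the baseline (at least 2).
    guard:    number of most recent observations skipped before the baseline.
    k:        multiple of the standard deviation (assumed value).
    """
    alarms = []
    for t in range(len(counts)):
        start = t - guard - baseline
        if start < 0:
            # Not enough history yet, so no alert can be raised.
            alarms.append(False)
            continue
        window = counts[start:t - guard]
        threshold = mean(window) + k * stdev(window)
        alarms.append(counts[t] > threshold)
    return alarms


# Hypothetical counts with a sudden jump at the last time point.
counts = [10, 12, 11, 9, 10, 11, 12, 10, 11, 40]
c1_like = ears_alarms(counts, guard=0)
c2_like = ears_alarms(counts, guard=2)
```

Because the baseline is a short sliding window, the rule needs only a handful of past observations, which is consistent with using it as a drop-in system when little historical data exists.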

Figure 13: Performance of the EARS C1 model

Farrington's (Quasi-Poisson) Model

Farrington's model suits the data well, as ECDC updates its reports weekly and the model allows for weekly detection. The quasi-Poisson formulation of the Farrington model is popular in current epidemic studies because it is simple and builds on the familiar Poisson model. However, the surveillance package implements no feature for the Farrington model that handles a pandemic by monitoring sudden outbreaks.

Furthermore, the Farrington model did not identify the outbreak period as correctly as the EARS C1 model did. The EARS C1 model was able to locate and display monitoring results from 2020 to 2021, whereas the Farrington model displayed 2001, even though the same data set was used throughout this research study.
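To make the quasi-Poisson idea concrete: a Poisson model assumes the variance equals the mean, whereas a quasi-Poisson model allows the variance to be a multiple φ of the mean, with φ > 1 indicating overdispersion. The sketch below estimates φ with the Pearson statistic for the simplest possible case, an intercept-only model in which every fitted value is the sample mean. It is an illustration only; the Farrington algorithm itself fits a regression with additional terms and reweighting, which are omitted here, and the function name and sample counts are assumptions.

```python
from statistics import mean


def pearson_dispersion(counts):
    """Pearson estimate of the dispersion for an intercept-only count model.

    phi = sum((y - mu)^2 / mu) / (n - 1), where mu is the sample mean and
    one degree of freedom is used for the single fitted parameter.
    """
    n = len(counts)
    mu = mean(counts)
    if n < 2 or mu <= 0:
        raise ValueError("need at least two counts with a positive mean")
    pearson = sum((y - mu) ** 2 / mu for y in counts)
    return pearson / (n - 1)


# Hypothetical weekly counts.
phi = pearson_dispersion([2, 4, 6, 8])
```

A value of `phi` near 1 is compatible with a plain Poisson model, while larger values suggest that the quasi-Poisson variance inflation is needed.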

Figure 15: Performance of the Farrington (Quasi-Poisson) model

Spatio-Temporal Models (Spatio-Temporal Endemic-Epidemic Model and Spatio-Temporal SIR Model)

In the early stages of the pandemic, 26 per 1,000 individuals were infected, while the remaining unreported individuals stayed susceptible as the outbreak progressed. The model interprets this transition as most of the studied individuals eventually being removed. Once the number of infections reaches a plateau around the first quarter of 2020, this state lasts for about four months, until the number of infected individuals drops exponentially as they become "removed".

The spatio-temporal SIR model shows how a disease can infect individuals and develop into an outbreak in just two months. In the early stages of an outbreak, large numbers of individuals are susceptible to the disease.
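The movement of individuals between the susceptible, infectious, and removed states can be illustrated with a basic discrete-time SIR simulation. This sketch has no spatial component and is not the model fitted in the study; the transmission rate `beta`, the removal rate `gamma`, and the function name are assumed values chosen only for illustration, with the initial 26 infected per 1,000 echoing the figure quoted above.

```python
def simulate_sir(population, infected0, beta, gamma, days):
    """Simulate a simple SIR epidemic with one update per day.

    beta:  assumed transmission rate per day.
    gamma: assumed removal (recovery or death) rate per day.
    Returns a list of (S, I, R) tuples, one per day including day 0.
    """
    s, i, r = float(population - infected0), float(infected0), 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population
        new_removals = gamma * i
        s = s - new_infections
        i = i + new_infections - new_removals
        r = r + new_removals
        history.append((s, i, r))
    return history


# Assumed parameters: 1,000 individuals, 26 initially infected.
history = simulate_sir(population=1000, infected0=26, beta=0.3, gamma=0.1, days=120)
peak_day = max(range(len(history)), key=lambda d: history[d][1])
```

In such a run the susceptible pool starts large and shrinks as infections rise, and the infectious curve eventually falls as individuals move to the removed state.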

Figure 17: Study of the syndromic outbreaks in the SIR states using the Spatio-Temporal model

Challenges of the Proposed Model

The raw data contain negative values for the number of cases reported from one day to the next, as well as instances where a large number of patients were included in the report again. The geospatial visualization of the spread of COVID-19 could not be applied in this study, as no population density data were readily available. The concepts of the epidemic surveillance models discussed in the literature review have been categorically adapted to recent studies based on the pandemic.
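A simple first check for the negative values mentioned above is to list the positions at which they occur before any model is fitted. The sketch below only locates such entries and does not decide how they should be corrected; the function name and the sample series are hypothetical.

```python
def find_negative_counts(daily_cases):
    """Return (index, value) pairs for every negative entry in the series."""
    return [(day, n) for day, n in enumerate(daily_cases) if n < 0]


# Hypothetical daily series containing one negative correction.
flagged = find_negative_counts([120, 135, -14, 160])
print(flagged)  # [(2, -14)]
```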

Based on this, the Farrington model can be incorporated into the EARS model to handle highly scattered and inconsistent data. It provided more precise disease monitoring because it takes into account the overdispersion of the rapidly changing number of infected cases. This research study could be improved if real-time data updates were easily and openly available.

Creating a geospatial analysis of the spread of the covid-19 disease by simulating the epidemic in Yerevan as referenced from the