• Tidak ada hasil yang ditemukan

Materials and Methods

Taylor & Francis

17.2 Materials and Methods

In this chapter, a combined model of wavelet and neural network was used to predict COVID-19 cases based on data reported by WHO, due to its ability to complex time series prediction [3].

Also, its hot/cold spots were identified by specified important periods and features. The theoretical foundations of the methods used in this research were described in this section.

17.2.1 Wavelet transforms

Wavelet transform is a mathematical approach to the decomposition of data into a variety of frequency components. It can extract special patterns hidden in a huge amount of data. The wavelet transform can be used to analyze non-stationary time series data at several different frequencies.

Wavelet transforms are generally divided into the Continuous Wavelet Transformation (CWT) and Discrete Wavelets Transformation (DWT) [15]. The DWT and CWT are the wavelet transform

Materials and Methods 213 implementation using discrete and arbitrary sets of the wavelet scales, respectively. They can be used to decompose wavelets, process signals, extract features, and denoise noisy signals. There are many types of mother wavelets that can be used for wavelet transforms. The different mother wavelets that are used to examine the same signal will yield varying results. Therefore, different mother wavelet types of CWT and DWT were examined to select best of them for capturing the multiscale features of signals [16,17].

17.2.1.1 CWT

The CWT is an important method for assessing non-stationary signals and providing a number of signal information, such as time, frequency, scale, and local signal correlation. The CWT is used to decompose a signal into small wavelets that are highly localized in time [18]. It generally used short-time Fourier transform for decomposes a signal into unlimited length sines and cosines, basically eliminating all time-localization information and replacing it with time-frequency signal representation that provides very strong time and frequency localization. The wavelet function is defined as Eq. (17.1) The basic functions of the CWT are scaled and shifted versions of a function called the time-localized mother wavelet Ψ(t). Eq. (17.2) express wavelet transform that is the convolution of time series data and wavelets [18,19].

Z

−∞

Ψ(t)dt = 0. (17.1)

F(a, b) = 1

√a Z

−∞

f(t)Ψ t−b a

dt. (17.2)

Where a and b are the scale and shift parameters,tis the time,f(t) is the data, Ψ(t) is the mother wavelet, and F(a, b) is the time-scale representation of the signal. Several mother wavelets are available for the CWT, including Mexican hat wavelet, analytic Morlet wavelet, generalized Morse wavelet, Bump wavelet, and so on. The Morlet is a wavelet composed mainly of an exponential function multiplied by a Gaussian window. This wavelet is closely related to human perception and demonstrates good performance in analyzing the periodicity of local signals. Generalized Morse wavelets are a family of analytical wavelets with two parameters, symmetry and time-bandwidth of the product. Bump wavelet has a larger variance in time and smaller variance in frequency. Each of these wavelets has different parameters for studying different behaviors and properties [18,20].

17.2.1.2 DWT

The DWT is a powerful time series analysis tool used to break down the original time series into different components, each of which can produce meaningful information from the original data. The DWT can be decomposed the signal into low and high frequencies. The low and high frequencies are also called as approximation and detail coefficients. Commonly the approximation is decomposed to a higher level after the first level. The DWT is used to reduce time series data in order to save storage space while losing a small amount of detailed information. The last coefficient of approximation and a few of the high-level detail coefficients are typically chosen for preservation that are the only coefficients needed for perfect reconstruction. Hence, The DWT is known as a lossless transformation, whereby transformed domain data can collectively rebuild the original data [16,21].

The DWT has several mother wavelet families, such as Daubechies, Coiflets, Symlets, Fej´er-Korovkin, discrete Meyer, Biorthogonal, and reverse Biorthogonal, which each of them has different orders. The Daubechies family is an orthonormal wavelet, which makes the analysis of wavelets possible in discrete time. The first order Daubechies (db1) wavelet resembles a simple step function and the higher-order Daubechies functions (db2, db3, db4, etc.) are not easy to define with an analytical expression. The order of this function shows the number of vanishing moments or the number of zero wavelet moments. The Coiflet (coif) wavelet family is more symmetrical and has more vanishing moments than the Daubechies wavelets. Symlets’ (sym) properties are similar to Daubechies, that are near-symmetric and have the least asymmetry. Fej´er-Korovkin (fk) wavelet family minimizes the gap between the ideal since lowpass filter and the valid scaling filter. The

214 COVID-19 Spatiotemporal Hotspots and Prediction discrete Meyer (dmey) wavelet family is defined in the frequency domain. The Biorthogonal (bio) and reverse Biorthogonal (rbio) families use separate wavelet and scaling functions for the analysis, synthesis, and vice versa [22].

17.2.2 Neural networks

ANNs are human brain-inspired methods that consist of a large number of simple and highly interconnected computing elements and use them as a huge data processing system. ANN-based methods are very useful for prediction problems. Three main parts of the ANN are the input, hidden, and output layers [23]. Different models of the ANNs were introduced to solve specific problems. In this research Multi Layer Perceptron (MLP) was used as a class of feedforward neural networks which is an efficient and popular algorithm. The MLP is a type of supervised learning algorithms that uses backpropagation to train. The MLP contains a number of layers, neurons, weights, and transfer functions. Transfer functions are used to aggregate the input neurons with different weights to the output where the neuron is a link between the layers. First, the input neurons are multiplied by their respective weights, then the output neuron is summed up and determined via the transfer function. Tansig is one of the widely used transfer functions that has been selected for this research. This function makes MLP networks so powerful, because of the ability to represent nonlinear functions [23,24].

Finally, normalized Root Means Square Error by the Inter-Quartile Ranges (RMSEIQR) was used to evaluate the accuracy of the prediction results of the proposed model because it is suitable for comparing the different values obtained for various countries with a different population and confirmed cases. The RMSEIQR is an interquartile range of the RMSE that normalizes values and is less sensitive to extreme values (outliers) than the RMSE. The RMSEIQR is defined as Eq. (17.3) where ˆytandytare predicted and observed values over T time and IQR is the difference of quartile functions. In other words, the RMSEIQR calculates by dividing the RMSE into the IQR [25].

RM SEIQR= q

PT

t=1(ˆyt−yt)2 IQR√

T . (17.3)

17.2.3 Hot/Cold spot analysis

Spatiotemporal analysis was used to identified hot/cold spots by specified important periods and features in previous steps. STHCSA is an important component of spatiotemporal analysis since location and time are two critical aspects of important events such as the Coronavirus epidemic. The outputs of such analyses can provide useful information to guide the activities aimed at preventing, detecting, and responding to pandemic problems [26]. There are various methods for spatiotemporal analysis. In this study, Mann-Kendall test was used to detect trends of data and Getis-Ord Gi statistic was used to identify hot/cold spots.

Network Common Data Form (NetCDF) is a file format to store multi-dimensional scientific data such as temperature, humidity, disease, and crime. The NetCDF cube is generated using the COVID-19 x, y, and time data as x, y, and z axes. It summarizes a collection of points into a NetCDF by aggregating them into space-time bins. The Mann-Kendall p-values and z-scores show the statistical significance of the trend in a hot spot (spatial clusters of high values) or cold spot (spatial clusters of low values) at a location. A positive or negative z-score indicates an upward or downward trend respectively [9,27]. Then, the pattern in the spatiotemporal data was identified with Getis-Ord Gi statistic based on neighborhood distance and neighborhood time step. The Getis-Ord Gistatistic is calculated for each bin as follows [28,29]:

¯ x=

Pn j=1xj

n . (17.4)

s= sPn

j=1x2j

n −x¯2. (17.5)

Results of Proposed Model 215 Gi =

Pn

j=1wijxj−x¯Pn j=1wij

s r

nPn

j=1wij2−(Pn j=1wij)2 n−1

. (17.6)

Wherexj is the value of featurexat locationj,n is the number of data andwij is the element of the weight matrix. The Gi is recorded as a z-score for each variable in the dataset. The more intense the clustering of high values or hot spots are, the larger positive z-scores become and the more intense the clustering of low values or cold spots are, the smaller negative z-scores get. Based on p-values and z-scores of this statistic, 17 pattern types include: no pattern detected, new hot spot, consecutive hot spot, intensifying hot spot, persistent hot spot, diminishing hot spot, sporadic hot spot, oscillating hot spot, historical hot spot, new cold spot, consecutive cold spot, intensifying cold spot, persistent cold spot, diminishing Cold Spot, sporadic cold spot, oscillating cold spot and historical cold spot are extracted [9].