
Analysis of Covid-19 data and predicting future coronavirus cases by using Machine learning


Academic year: 2023


Effective strategies to manage the pandemic require accurate and timely prediction of the spread of the virus. Objectives: This thesis aims to analyze coronavirus data and case counts and to predict the future behavior of Covid-19 in Kazakhstan, which can support key decisions related to the virus and help protect the country from the global economic crisis. The dataset included daily counts of confirmed Covid-19 cases, deaths, recoveries and tests in various countries and regions worldwide.

Four ML algorithms were used in this study: decision tree, random forest, linear regression (LR), and polynomial regression. Results: The results showed that all four ML algorithms produced relatively accurate predictions of Covid-19 cases. Conclusion: This study demonstrates the potential of ML algorithms to predict the number of Covid-19 cases.

The findings show that the random forest algorithm is the most effective in predicting Covid-19 cases. The results of this study can help inform policymakers and health professionals in developing effective strategies to manage the Covid-19 pandemic.

Askar Boranbayev for supervising my thesis and guiding me during the thesis writing process and providing useful feedback.

Siamac Fazli, who helped me write my thesis correctly and provided criteria and tips for doing so.

Background and context

Aim

Research Questions

Research approach and methodology

Scope and limitations

The other limitation that may occur is data quality, which is the completeness and accuracy of data that will be used to test and train the model.

Outline

During the writing of the thesis, access to datasets may be limited for geographical and privacy reasons. For this research, the authors collected a large dataset of X-ray images, and the system classified each image as Covid-19 positive or normal. They presented a comprehensive survey on the use of IoT and machine learning in the context of Covid-19.

They highlighted the promise of these tools in improving the accuracy and speed of COVID-19 diagnosis, predicting disease progression, and monitoring patients remotely. The two types of regression models were evaluated using the R-squared score and error values. This article not only predicts whether a person has Covid-19, but also suggests strategies to combat Covid-19 such as social distancing and mask detection.

The authors used the publicly available dataset from Johns Hopkins University. Other deep learning models included multi-layer perceptrons (MLPs), convolutional neural networks (CNNs) and recurrent neural networks (RNNs). In this thesis, two research methodologies were used: a systematic literature review and an experiment.

To answer research question 1, a systematic literature review was conducted to find out which machine learning algorithm or method is best for predicting Covid-19 cases. In the second part of the thesis, an experiment was carried out to predict future cases of Covid-19 accurately. Keyword determination: The most likely keywords are Machine Learning Algorithms, Covid-19, Prediction, Comparison and Forecasting.

Quality assessment: Evaluate the quality of the studies using a predetermined set of factors, such as study design, sample size and statistical techniques. Synthesizing the results: Combine the results of the selected studies using a variety of techniques, including narrative synthesis and meta-analysis. Results: Present the results of the systematic literature review in a clear, structured manner in accordance with recognized reporting standards.

Experiment

  • Software Toolset
  • Dataset
  • Data Preprocessing and Analysis
  • Model Selection and Implementation
  • Performance metrics

pandas is used mostly for data cleaning, merging and joining datasets, and data wrangling. It is especially useful for data scientists and machine learning engineers who need to perform prediction tasks on large datasets. Among the most popular datasets are those obtained from the WHO website and the Johns Hopkins University website.

Covid-19 cases by region of Kazakhstan were also taken from this site - https://data.humdata.org/dataset/kazakhstan-coronavirus-covid-19-subnational-cases. The columns 'Confirmed_pastday', 'Confirmed_2daysago' and 'Confirmed_3daysago' are then created, which hold the previous values of 'Daily_confirmed'. Mean absolute error (MAE) measures the average size of the errors in a set of forecasts, without considering their direction.
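The lagged columns described above can be sketched with the pandas `shift` method. This is a minimal illustration, not the thesis code: it assumes one row per day sorted by date, the helper name `add_lag_features` is hypothetical, the sample values are made up, and dropping the first three incomplete rows is one possible choice among several.

```python
import pandas as pd


def add_lag_features(df: pd.DataFrame, source: str = "Daily_confirmed") -> pd.DataFrame:
    """Return a copy of df with the three lagged columns named in the text.

    Assumes df has one row per day and is already sorted by date.
    """
    out = df.copy()
    out["Confirmed_pastday"] = out[source].shift(1)   # value from 1 day earlier
    out["Confirmed_2daysago"] = out[source].shift(2)  # value from 2 days earlier
    out["Confirmed_3daysago"] = out[source].shift(3)  # value from 3 days earlier
    # The first three rows lack a full history, so they are dropped here.
    return out.dropna().reset_index(drop=True)


# Illustrative values only, not taken from the thesis dataset.
demo = pd.DataFrame({"Daily_confirmed": [10, 12, 15, 11, 20, 18]})
lagged = add_lag_features(demo)
print(lagged)
```

Each remaining row then pairs a day's count with the counts of the three preceding days, which is the shape a regression model needs to learn from recent history.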

Here we need to find out which method or model is best for predicting coronavirus cases. Prediction of Covid-19 by reference to three supervised ML algorithms: A comparative study using WEKA [9]. The purpose of this paper was to determine whether a person is infected with Covid-19, using classification algorithms such as Decision Tree, Random Forest, Support Vector Machine, Naive Bayes and Logistic Regression.

Analysis and implementation of a novel AI-based hybrid model for the detection, prediction and identification of the spread of COVID-19 [12]. The authors demonstrate the prediction of Covid-19 cases using regression models such as linear and polynomial regression. COVID-19 time series forecasts of daily cases, deaths and recovered cases using long short-term memory networks [6].

This research predicts upcoming Covid-19 cases based on the current situation using a polynomial-based linear regression model. Diabetes is one of the most common diseases and at some point it cannot be cured. The paper predicts the number of Covid-19 cases with high accuracy using support vector regression (SVR) and polynomial regression (PR) models.

In this research, the price of used cars was predicted using artificial neural networks and machine learning. Heart disease is one of the most dangerous diseases for humans and it is really important to detect it in the early stages.

Figure 3-1: Jupyter notebook

Results of Experiment

Data Visualization

To predict Covid-19 cases, libraries were first imported for visualization, data manipulation, performance evaluation, and the LR, PR, DT and RF algorithms. Research Question 3 was: “What problems may arise in predicting the Covid-19 situation?” The answer follows. During the writing of this thesis, many problems arose in predicting Covid-19.
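As a sketch of the library imports and the four algorithms (LR, PR, DT, RF) mentioned above, the following fits each model on three lagged inputs and reports R-squared, mean squared error and mean absolute error. Several things here are assumptions rather than details from the thesis: the use of scikit-learn, the polynomial degree of 2, the random 80/20 split, the hyperparameters, and the synthetic series that stands in for the real data. The scores it produces say nothing about the thesis results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.tree import DecisionTreeRegressor

# Synthetic daily series (assumption): a slow wave plus noise.
rng = np.random.default_rng(0)
t = np.arange(203)
series = 200 + 150 * np.sin(t / 15) + rng.normal(0, 5, t.size)

# Features: counts from 1, 2 and 3 days earlier; target: the current day.
X = np.column_stack([series[2:-1], series[1:-2], series[:-3]])
y = series[3:]

# Random 80/20 split (assumption; a chronological split is also common).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "LR": LinearRegression(),
    # Polynomial regression = polynomial feature expansion + linear regression.
    "PR": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "DT": DecisionTreeRegressor(random_state=0),
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "r2": r2_score(y_test, pred),
        "mse": mean_squared_error(y_test, pred),
        "mae": mean_absolute_error(y_test, pred),
    }

for name, s in scores.items():
    print(f"{name}: R2={s['r2']:.3f}  MSE={s['mse']:.2f}  MAE={s['mae']:.2f}")
```

Keeping the models in one dictionary means every algorithm is trained and scored on exactly the same split, so the three metrics are directly comparable across models.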

The COVID-19 pandemic has been a major challenge for countries around the world, with many struggling to control the spread of the virus and minimize its impact on public health and the economy. This thesis analyzed COVID-19 data and developed machine learning models to predict future cases of the coronavirus. A systematic literature review was carried out to find the most suitable machine learning algorithm for predicting Covid-19 cases.

The results suggest that machine learning algorithms can be a valuable tool in predicting cases of COVID-19 and can help policymakers make informed decisions about public health interventions and resource allocation. Overall, this thesis demonstrates the potential of machine learning algorithms in the analysis and prediction of COVID-19 cases and provides valuable insights for future research and public health efforts. For future work, several areas could be explored to build on the findings of this thesis and further improve the accuracy of COVID-19 predictions using machine learning algorithms.

Finally, it may be useful to include data from other countries and regions to develop a more comprehensive understanding of COVID-19 transmission patterns and improve the generalizability of machine learning models.

Roja Edinburgh, "Analysis and Prediction of COVID-19 Using Regression Models and Time Series Forecasting," th International Conference on Cloud Computing, Data Science.

U, "Analysis And Implementation of a Novel-AI-Based Hybrid Model for Detecting, Predicting and Identification of the Spread of COVID-19," rd International Conference on Advances in Computing, Communication Control and Networking (ICAC3N), Greater Noida, India, 2021, p.

Iyer, "Artificial Intelligence Supported Covid-19 Disease Forecasting Methods and Anti-Covid Strategies," IEEE International Conference on Distributed.

Trurupthi, "COVID-19 Time Series Forecasting of Daily Cases, Deaths Caused and Recovered Cases using Long Short Term Memory Networks," IEEE 5th International Conference on Computer Communication and Automation (ICCCA), Greater Noida, India, 2020, p.

Spread rd International Conference on Advances in Computing, Communication Control and Networking (ICAC3N), Greater Noida, India, 2021, p.

Gupta, "Polynomial-based linear regression model to predict cases of COVID-19," International Conference on Recent Trends in Electronics, Information, Communication Technology (RTEICT), Bangalore, India, 2021, pp.

Gupta, "Global Forecast of COVID-19 Cases and Deaths Using Machine Learning," Sixth International Conference on Image Information Processing (ICIIP), Shimla, India, 2021, p.

Figure 4-2: Confirmed cases of COVID-19 in Kazakhstan

Features description of dataset 1

Features description of dataset 2,3,4

Accuracy of Linear Regression

Accuracy of Polynomial Regression

Accuracy of Random Forest

Accuracy of Decision Tree

R2 score comparison of ML Algorithms

Mean squared error of ML Algorithms

Mean absolute error of ML Algorithms
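The three metrics named in the captions above have standard definitions, written out here for reference. The notation is an assumption of this sketch, not the thesis's own: $y_i$ is the observed count on day $i$, $\hat{y}_i$ the predicted count, $\bar{y}$ the mean of the observed counts, and $n$ the number of test points.

```latex
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
\qquad
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
               {\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
```

Lower MAE and MSE indicate smaller errors, with MSE penalizing large errors more heavily, while an $R^2$ closer to 1 indicates that the model explains more of the variance in the observed counts.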

Table 3.1: Features description of dataset 1
Figure 3-2: Time series Covid-19 global dataset
Table 3.2: Features description of dataset 2,3,4