Jln. Khatib Sulaiman Dalam No. 1, Padang, Indonesia
Website: ijcs.stmikindonesia.ac.id | E-mail: [email protected]
Sentiment Analysis of Tweets About Allowing Outdoor Mask Wear Using Naïve Bayes and TextBlob
Ilham Firman Ashari, Fadhillah A., M. Daffa, Sekar Ali [email protected]
Institut Teknologi Sumatera
Article Information
Submitted: 14 Jun 2023
Reviewed: 21 Jun 2023
Accepted: 29 Jun 2023
Abstract
Covid-19, a virus that attacks the respiratory tract and has a fairly high mortality rate, has spread throughout the country. On March 11, 2020, the WHO declared Covid-19 a global pandemic. The government has made various efforts to reduce the number of people infected, ranging from lockdowns and PPKM to Government Regulations on the use of masks and other personal protection. In June 2021, Covid-19 cases in Indonesia spiked and the number of patients increased drastically. Conditions at that time were chaotic and left some people traumatized. On May 17, 2022, the government relaxed the use of masks in open spaces, provided social distancing is maintained, even though masks play an important role in preventing the spread of the virus. For this reason, the research "Analysis of Sentiment on Tweets regarding Allowance for the Use of Masks in Outdoors using Naive Bayes" was carried out to find out public opinion. The research was conducted using text mining of Twitter sentiment, with Naive Bayes for classification.
Based on the results, the majority of Twitter users gave a neutral response: 75.76% of the tweets, or about 757 tweets, had neutral sentiment. The data used in this study consisted of 1000 Indonesian tweets with the keyword 'jokowi mask'. Using 20% of the data for testing resulted in a more accurate model, with an accuracy of about 85%, while the model using 30% of the data for testing produced an accuracy of only about 83%.
Keywords
Covid-19, Naïve Bayes, Sentiment Analysis, Text Mining
A. Introduction
Covid-19 is a virus that has spread throughout the country. On March 11, 2020, WHO (World Health Organization) declared Covid-19 a global pandemic.
The virus spreads through the air. It attacks the respiratory tract and has a fairly high mortality rate. Severely ill patients can have difficulty breathing and need breathing aids such as ventilators, because their airways are blocked by large amounts of mucus [1].
Since Covid-19 entered Indonesia, the government has made various efforts to reduce the number of people infected [2]. These range from lockdowns to break the chain of transmission and gradual PPKM to reduce social contact, to basic measures such as the mandatory use of masks stated in Government Regulation number 21 of 2020. With these measures, Covid-19 cases subsided.
Even so, Covid-19 did not disappear; instead it mutated and new variants emerged. The number of cases has continued to rise and fall. Because the pandemic has been protracted and people need to carry out activities to meet their daily needs, they have grown used to following health protocols outside the home, such as wearing masks and keeping a safe distance. In addition, researchers around the world are trying to develop a vaccine [3].
In June 2021, Covid-19 cases in Indonesia spiked and the number of patients increased drastically. Several hospitals were forced to close their emergency rooms and add more ICU rooms. Because there were not enough places to treat patients, treatment was carried out in emergency tents, and even the athletes' guesthouse was used to house Covid-19 patients. At that time, a number of medical personnel were also infected, because exhaustion from treating patients lowered their immunity and made them susceptible to infection. The imbalance between the number of medical personnel and patients meant that treatment was not optimal. In addition, hospitals do not serve only Covid patients. Non-Covid patients worried about being exposed to the virus when they came to the hospital, but their conditions would worsen if left untreated. Conditions at that time were truly chaotic, and many died: medical personnel, non-Covid patients, and Covid patients alike. This left considerable trauma for those left behind [4].
On May 17, 2022, President Joko Widodo announced directly that the government was relaxing the use of masks in open spaces, provided social distancing is maintained. This is despite the fact that masks play an important role in inhibiting the spread of the virus, because they shield the external respiratory system (mouth and nose) from the open air, which acts as a transmission medium. The policy has drawn both support and opposition in society, where some people are still traumatized by the Covid-19 virus [5].
Related research on public sentiment about Covid-19 was carried out by [6], which used CrowdTangle on 74 uploads and found that face-to-face learning was more popular and better liked among users of Instagram.
Other research from [7], the research data used is 34 data with the classification method using naïve Bayes, it is found that the percentage accuracy
regarding the Covid-19 vaccination, in his research using more than 4708 data which was divided into 80% training data and 20% testing data, an accuracy of 73.6% was obtained using the Support Vector Machine (SVM) algorithm.
Data is an important aspect of information processing. Accordingly, the research "Sentiment Analysis of Tweets regarding the Allowance for Wearing Masks in Open Spaces using Naive Bayes" was conducted to find out more about public sentiment regarding the government's new policy.
B. Research Method
The method used in this research is text mining of Twitter sentiment, with Naive Bayes as the classifier. The stages carried out in the research are shown in Figure 1 [9].
Figure 1. Research Stages
2.1. Crawling Data
In this study, data was collected by crawling. The crawl retrieved the latest 1000 (one thousand) tweets from Twitter with the keyword "Jokowi mask", filtering out retweets and keeping only tweets in Indonesian. The stages of crawling can be seen in Figures 2 to 6.
Figure 2. Key to make a connection with the Twitter API
Figure 3. Import the tweepy module to help access the Twitter API
Figure 4. Import library needed
Figure 5. Retrieve tweets data with the keyword "jokowi mask"
Figure 6. Saving data in the form of a csv file
An example of the tweets taken can be seen in Figure 7 below:
Figure 7. Some of the results of the tweets taken
2.2. Preprocessing
Preprocessing is the stage of preparing the data that has been obtained to facilitate data processing at a later stage [10]. The preprocessing stage in this study includes tokenizing and cleaning.
2.2.1. Tokenizing
The program code for tokenizing can be seen in Figure 8 below.
Figure 8. Program code for tokenizing
At this stage, all tweets are converted to lowercase, and special characters are then removed using regex. Special characters include emoji, spaces, and symbols such as '@', '#', '$', and '*'. After that, the tweets are split into individual words to simplify the cleaning process.
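A minimal sketch of this tokenizing step is shown below. It is not the code in Figure 8; the function name `tokenize` and the exact regular expression are assumptions chosen to match the description (lowercase, strip special characters, split into words).

```python
import re


def tokenize(tweet: str) -> list[str]:
    """Lowercase a tweet, drop special characters, and split it into words."""
    text = tweet.lower()
    # Replace anything that is not a lowercase letter, digit, or whitespace
    # (emoji, '@', '#', '$', '*', punctuation) with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # split() with no arguments also collapses repeated whitespace.
    return text.split()
```

For example, `tokenize("Pak @Jokowi: lepas MASKER!! #covid19")` yields `['pak', 'jokowi', 'lepas', 'masker', 'covid19']`.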
2.2.2. Cleaning
The program code in the cleaning phase can be seen in Figures 9, 10 and 11.
Figure 9. Program code for cleaning (1)
Figure 10. Program code for cleaning (2)
Figure 11. Program code for cleaning (3)
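As this subsection describes, cleaning removes stop words from the tokenized tweets and joins the remaining words back into a sentence. The sketch below illustrates that idea only; the stop word set is a tiny hypothetical sample (Indonesian equivalents of words such as "and", "to", and "there"), not the list used in Figures 9 to 11.

```python
# Hypothetical, deliberately small stop word list for illustration.
STOP_WORDS = {"dan", "ke", "ada", "di", "yang"}


def remove_stop_words(tokens: list[str], stop_words: set[str] = STOP_WORDS) -> list[str]:
    """Keep only the tokens that are not stop words."""
    return [token for token in tokens if token not in stop_words]


def clean(tokens: list[str]) -> str:
    """Remove stop words, then rejoin the remaining tokens into one sentence."""
    return " ".join(remove_stop_words(tokens))
```

In practice a full Indonesian stop word list from an NLP library would replace the sample set.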
Data that has gone through the tokenizing stage is then cleaned of unwanted words [11], in this case stop words such as “and”, “to”, and “there”. After the stop words have been removed, the words that were separated during tokenizing are rejoined into one complete sentence. The results of cleaning can be seen in Figure 12 below:
Figure 12. Some of the results of the cleaning process
2.3. Feature Extraction
Feature extraction transforms text into a more structured form to facilitate the classification process [12]. In this research, feature extraction consists of word count, count vectorizer, and visualization.
2.3.1. Word count
The program code for word count can be seen in Figure 13.
Figure 13. Program code to perform word count
At this stage, the cleaned data is split into individual words again so that the most frequently occurring words can be counted. The word count obtained is as follows:
Figure 14. Results of the word count process
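The word count step can be illustrated with the standard library. This is a sketch under the assumption that each cleaned tweet is a space-separated string; the function name `word_counts` is hypothetical and the code is not taken from Figure 13.

```python
from collections import Counter


def word_counts(cleaned_tweets: list[str]) -> Counter:
    """Count how often each word appears across all cleaned tweets."""
    counts = Counter()
    for tweet in cleaned_tweets:
        counts.update(tweet.split())
    return counts
```

Calling `word_counts(tweets).most_common(10)` then lists the ten most frequent words with their counts.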
2.3.2. Count vectorizer
The program code for the count vectorizer can be seen in Figures 15 and 16.
Figure 15. The program code for the count vectorizer using cleaning results
Figure 16. Program code to do the count vectorizer
At this stage, the cleaned data is converted from text into numeric data.
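To make the text-to-numbers conversion concrete, the following hand-written sketch builds a bag-of-words matrix in plain Python. It is a simplified stand-in for a library count vectorizer such as scikit-learn's `CountVectorizer`, not the code in Figures 15 and 16; the function names are hypothetical, and a library implementation may tokenize differently.

```python
def fit_vocabulary(docs: list[str]) -> dict[str, int]:
    """Map every distinct word to a column index, in alphabetical order."""
    words = sorted({word for doc in docs for word in doc.split()})
    return {word: index for index, word in enumerate(words)}


def transform(docs: list[str], vocabulary: dict[str, int]) -> list[list[int]]:
    """Turn each document into a row of word counts, one column per word."""
    rows = []
    for doc in docs:
        row = [0] * len(vocabulary)
        for word in doc.split():
            if word in vocabulary:  # words unseen during fitting are ignored
                row[vocabulary[word]] += 1
        rows.append(row)
    return rows
```

Each row of the resulting matrix represents one tweet, and each column holds the number of times a vocabulary word occurs in it.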
2.3.3. Visualization
The program code for visualization using bar graph plots can be seen in Figure 17.
Figure 17. Program code for visualization using data counting
The program code using wordcloud can be seen in Figure 18.
Figure 18. Program code for word cloud visualization
The previously obtained word count will be visualized in the form of a word cloud and also a bar chart which can be seen in Figure 19 and Figure 20.
Figure 19. Word cloud visualization results
Figure 20. Bar chart visualization results
As the figures above show, the most frequently used words are "mask" and "jokowi".
2.4. Sentiment Analysis Using Textblob
The complete program code for sentiment analysis can be seen in Figure 21.
Figure 21. Program code for sentiment analysis (1)
Sentiment analysis in this study uses the TextBlob library, which determines the polarity value of the input text. However, TextBlob only supports English, so the tweets are first translated into English using the deep_translator library. After translation, the results are passed to the TextBlob polarity function to determine the polarity value. The program code for sentiment analysis using TextBlob can be seen in Figures 22 and 23. A polarity > 0 is positive, a polarity < 0 is negative, and a polarity = 0 is neutral.
Figure 22. Program code for sentiment analysis (2)
Figure 23. Program code for sentiment analysis (3) saving into a data frame
After the polarity value is obtained, it is stored in the dataframe to be used as training data.
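The translate-then-score procedure and the labeling rule can be sketched as follows. This is an illustration rather than the code in Figures 21 to 23: the function names are hypothetical, and the choice of `GoogleTranslator` is an assumption, since the text names only the deep_translator library and not a specific translator.

```python
def label_from_polarity(polarity: float) -> str:
    """Map a polarity score to a sentiment label using the thresholds above."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def analyze_tweet(tweet: str) -> tuple[float, str]:
    """Translate an Indonesian tweet to English, then score it with TextBlob."""
    # Imported lazily: both are third-party packages and translation needs
    # network access. GoogleTranslator is an assumed choice of backend.
    from deep_translator import GoogleTranslator
    from textblob import TextBlob

    english = GoogleTranslator(source="id", target="en").translate(tweet)
    polarity = TextBlob(english).sentiment.polarity
    return polarity, label_from_polarity(polarity)
```

Because the neutral class requires a polarity of exactly zero, any tweet whose translation contains no words from TextBlob's lexicon falls into it.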
2.5. Naive Bayes Classifier
Text classification in this study was carried out using the Naive Bayes Algorithm with sub-processes including training data and naive Bayes classifier.
2.5.1. Training data
At this stage, two experiments were carried out. In the first experiment, the testing data used was 20% of the total data. The program code for training data can be seen in Figures 24 and 25.
Figure 24. Program code for training data (1)
As for the second experiment, the testing data used was 30% of the total data.
Figure 25. Program code for training data (2)
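The two splits can be illustrated with a small shuffle-and-slice helper. This plain-Python sketch is a stand-in for a library routine such as scikit-learn's `train_test_split` and is not the code in Figures 24 and 25; the function name, the fixed seed, and the rounding of the test set size are assumptions.

```python
import random


def split_data(features: list, labels: list, test_size: float, seed: int = 42):
    """Shuffle the data and hold out a `test_size` fraction for testing."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    n_test = round(len(indices) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(sequence, chosen):
        return [sequence[i] for i in chosen]

    return (
        pick(features, train_idx),
        pick(features, test_idx),
        pick(labels, train_idx),
        pick(labels, test_idx),
    )
```

With 1000 tweets, `test_size=0.2` holds out 200 tweets for testing and `test_size=0.3` holds out 300, matching the first and second experiments respectively.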
2.5.2. Naive bayes classifier
The training process in this study uses the MultinomialNB class from the sklearn naive_bayes module, with the training data created in the previous step as its input. After that, the testing data is also supplied to determine the accuracy of the resulting model. The program code can be seen in Figure 26.
Figure 26. Program code for classification using Naive Bayes
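For reference, the decision rule of a multinomial Naive Bayes classifier over word counts can be written as below. The notation is introduced here for illustration and does not come from the paper: $x_i$ is the count of word $i$ in a tweet, $N_{c,i}$ is the total count of word $i$ in the training tweets of class $c$, $N_c = \sum_i N_{c,i}$, $n$ is the vocabulary size, and $\alpha$ is the smoothing parameter (scikit-learn's default is $\alpha = 1$; whether the authors changed it is not stated).

```latex
\[
\hat{c} = \arg\max_{c \in \{\text{positive},\,\text{negative},\,\text{neutral}\}}
  \left[ \log P(c) + \sum_{i=1}^{n} x_i \log \theta_{c,i} \right],
\qquad
\theta_{c,i} = \frac{N_{c,i} + \alpha}{N_c + \alpha n}
\]
```

The smoothing term keeps $\theta_{c,i}$ above zero, so a word that never appeared in a class during training does not eliminate that class outright.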
C. Result and Discussion
The following are the results and analysis based on research that has been conducted regarding "Sentiment Analysis of Tweets regarding the Allowance for Wearing Masks in Open Spaces using Naive Bayes."
2.6. Sentiment Analysis
After getting the polarity value, the sentiment from the tweet will be classified into three classes, namely positive, negative, and neutral. Data with a polarity value of more than zero will receive a positive label, data with a polarity value of less than zero will receive a negative label, and data with a zero value will receive a neutral label. The results of the sentiment analysis can be seen in the image below:
Figure 27. Results of sentiment analysis
The results can also be seen in the pie chart below:
Figure 28. The results of the sentiment analysis in the form of a pie chart
It can be seen that 75.76% of the tweets have a neutral sentiment value, 14.41% of the tweets have a positive sentiment value, and 9.83% of the tweets have a negative sentiment value.
2.7. Naive Bayes Classifier
2.7.1. The First Experiment
For the first experiment, where the testing data used was 20%, the resulting accuracy was around 85% as can be seen in Figure 29 below.
Figure 29. The accuracy value of the naive Bayes classifier experiment 1
The confusion matrix and values for precision, recall, and F1-score can be seen in Figures 30 and 31 below.
Figure 30. Confusion Matrix from naive Bayes classifier experiment 1
Figure 31. Confusion Matrix from naive Bayes classifier experiment 1
2.7.2. The Second Experiment
For the second experiment, where the testing data used was 30%, the resulting accuracy was around 83% as can be seen in Figure 32 below.
The confusion matrix and values for precision, recall, and F1-score can be seen in Figures 33 and 34 below.
Figure 33. Confusion Matrix from naive Bayes classifier experiment 2
Figure 34. Confusion Matrix from naive Bayes classifier experiment 2
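The accuracy, confusion matrix, precision, recall, and F1-score reported for both experiments can all be derived from the true and predicted labels of the testing data. The sketch below shows how, using hypothetical function names and plain Python rather than the library reports shown in the figures.

```python
from collections import Counter


def confusion_counts(y_true: list[str], y_pred: list[str]) -> Counter:
    """Count (true label, predicted label) pairs: the confusion matrix cells."""
    return Counter(zip(y_true, y_pred))


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of test items whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true: list[str], y_pred: list[str], label: str):
    """Per-class precision, recall, and F1-score for one sentiment label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Because roughly three quarters of the tweets are neutral, per-class precision and recall give a fuller picture than accuracy alone.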
D. Conclusion
After conducting research on “Sentiment Analysis of Tweets regarding Allowance for Wearing Masks in Open Spaces using Naive Bayes”, two conclusions can be drawn. First, most people on Twitter do not give a personal opinion on the policy of relaxing the use of masks. This is indicated by the dominance of neutral sentiment: out of 1000 tweets, 75.76%, or around 757 tweets, have neutral sentiment values. Second, for the data used in this study, namely 1000 tweets in Indonesian with the keyword 'jokowi mask', using 20% of the data for testing produces a more accurate model. The model with 20% testing data reaches an accuracy of around 85%, while the model with 30% testing data reaches only around 83%.
E. References
[1] S. Y. Nursyi’ah, A. Erfina, and C. Warman, “Analisis Sentimen Pembelajaran Daring Pada Masa Pandemi Covid-19 Di Twitter Menggunakan Algoritma Naïve Bayes,” J. Media Inform. Budidarma, pp. 117–123, 2021.
[2] M. D. Alizah, A. Nugroho, U. Radiyah, and W. Gata, “Sentimen Analisis Terkait ‘Lockdown’ pada Sosial Media Twitter,” CSRID (Computer Sci. Res. Its Dev. Journal), vol. 12, no. 3, p. 143, 2021, doi: 10.22303/csrid.12.3.2020.143-149.
[3] A. Harun and D. P. Ananda, “Analysis of Public Opinion Sentiment About Covid-19 Vaccination in Indonesia Using Naïve Bayes and Decission Tree Analisa Sentimen Opini Publik Tentang Vaksinasi Covid-19 di Indonesia Menggunakan Naïve Bayes dan Decission Tree,” Indones. J. Mach. Learn. Comput. Sci., vol. 1, no. April, pp. 58–63, 2021.
[4] N. P. G. Naraswati, R. Nooraeni, D. C. Rosmilda, D. Desinta, F. Khairi, and R. Damaiyanti, “Analisis Sentimen Publik dari Twitter Tentang Kebijakan Penanganan Covid-19 di Indonesia dengan Naive Bayes Classification,” Sistemasi, vol. 10, no. 1, p. 222, 2021, doi: 10.32520/stmsi.v10i1.1179.
[5] Nasrullah and L. Sulaiman, “Analisis Pengaruh COVID-19 Terhadap Kesehatan Mental Masyarakat di Indonesia,” Media Kesehat. Masy. Indones., vol. 20, no. 3, pp. 206–211, 2021.
[6] A. S. Afif and A. R. Pratama, “Analisis Sentimen Kebijakan Pendidikan di Masa Pandemi COVID-19 dengan CrowdTangle di Instagram,” Automata, 2021, [Online]. Available: https://journal.uii.ac.id/AUTOMATA/article/view/19429.
[7] F. Fathonah and A. Herliana, “Penerapan Text Mining Analisis Sentimen Mengenai Vaksin Covid-19 Menggunakan Metode Naïve Bayes,” J. Sains dan Inform., vol. 7, no. 2, pp. 155–164, 2021, doi: 10.34128/jsi.v7i2.331.
[8] Herwinsyah and A. Witanti, “Analisis Sentimen Masyarakat Terhadap Vaksinasi Covid-19 Pada Media Sosial Twitter Menggunakan Algoritma Support Vector Machine (Svm),” J. Sist. Inf. dan Inform., vol. 5, no. 1, pp. 59–67, 2022, doi: 10.47080/simika.v5i1.1411.
[9] M. M. Nurrochman and A. L. Prasasti, “Implementasi Machine Learning Untuk Mendeteksi Unsur Depresi Pada Tweet Menggunakan Metode Naïve Bayes (Machine Learning Implementation for Depression Detection in Tweet Using Naïve Bayes Method),” e-Proceeding Eng., vol. 8, no. 5, pp. 6250–6257, 2021.
[10] I. F. Ashari, “Analysis Sentiments In Facebook Down Case Using Vader And Naive Bayes Classification Method,” Multitek Indones. J. Ilm., vol. 16, no. 2, pp. 75–89, 2022.
[11] R. P. Sidiq, B. A. Dermawan, and Y. Umaidah, “Sentimen Analisis Komentar Toxic pada Grup Facebook Game Online Menggunakan Klasifikasi Naïve Bayes,” J. Inform. Univ. Pamulang, vol. 5, no. 3, p. 356, 2020, doi: 10.32493/informatika.v5i3.6571.
[12] M. C. Untoro, M. Praseptiawan, I. F. Ashari, and A. Afriansyah, “Evaluation of Decision Tree, K-NN, Naive Bayes and SVM with MWMOTE on UCI Dataset,” J. Phys. Conf. Ser., vol. 1477, no. 3, 2020, doi: 10.1088/1742-6596/1477/3/032005.