ORIGINAL ARTICLE

Exploring YouTube Comments to Understand Public Sentiment on COVID-19 Vaccines through Deep Learning-based Sentiment Analysis

Mohd Suffian Sulaiman¹*, Farizul Azlan Maskan¹, Zuraidah Derasit², Noor Hasimah Ibrahim Teo³

¹School of Computing Sciences, College of Computing, Informatics & Mathematics, Universiti Teknologi MARA, 40450 Shah Alam, Selangor, Malaysia

²School of Mathematical Sciences, College of Computing, Informatics & Mathematics, Universiti Teknologi MARA, 40450 Shah Alam, Selangor, Malaysia

³School of Computing Sciences, College of Computing, Informatics & Mathematics, Universiti Teknologi MARA, Melaka Branch, Jasin Campus, 77300 Merlimau, Melaka, Malaysia

*Corresponding author: [email protected]

Received: 01/04/2023, Accepted: 15/08/2023, Available Online: 31/10/2023

Abstract

COVID-19 was first identified in China in 2019. Since then, it has spread quickly around the world, generating a large volume of news stories and social media posts about the pandemic. YouTube, a popular video-sharing website, has become a valuable source of information on COVID-19 and other topics. However, it can be difficult to extract useful insights from the vast array of user comments that accompany these videos. One potential method for understanding public sentiment is sentiment analysis, which involves classifying text as positive, negative, or neutral. In this study, a dataset of over 44,000 YouTube comments related to COVID-19 vaccines was used, which was filtered to a total of 16,073 comments for analysis. The data was cleaned and organised using NeatText and then processed using GloVe word embedding, a technique for establishing statistical relationships between words. The performances of three deep learning techniques, recurrent neural networks (RNN), gated recurrent units (GRU) and long short-term memory (LSTM), were then compared in classifying the sentiment of the comments. The study found that the GRU had the highest accuracy of 80.19%, followed by the LSTM with 79.00% accuracy, and the RNN with 67.15% accuracy. These findings highlight the effectiveness of deep learning techniques for capturing the nuanced sentiment expressed in YouTube comments about COVID-19 vaccination. The comparative examination of the models shows that the GRU and LSTM approaches are better than the simple RNN at identifying subtle differences in sentiment.

Keywords: Sentiment Analysis; COVID-19; Vaccine; YouTube; Deep learning

https://journal.unisza.edu.my/myjas

Introduction

In the current digital era, the internet has become an indispensable component of human life. Young people and adults use the internet to carry out a variety of activities effectively and efficiently. When direct contact with other people must be minimized, the internet becomes even more beneficial for supporting activities such as work, education, and communication with the closest people without the need to meet in person. There are now 5.16 billion internet users globally. This number has increased by 1.9% over the prior year, a rise of approximately 98 million users, and constitutes around 64.4% of the world's population of 8.01 billion individuals (Simon Kemp, 2023).

The YouTube platform is one example of internet usage (Tiago Bianchi, 2023). Anyone with a YouTube account can upload as many videos as they like, and the uploaded videos are accessible to the entire world. A comment feature enables users to post opinions or remarks about a video's subject. Nonetheless, the demand for online video is hard to predict because of variations in audience psychology, educational level, and theme choices, and this ambiguity frequently appears in video comments because of the varying viewpoints of YouTube users. Video comments also leave a digital communication trail, which other users can read as a reference for the video material. As a result, the comments left on a YouTube video indicate its reputation. The COVID-19 pandemic has brought unprecedented global challenges, leading to a race for effective vaccines. However, the rapid development of vaccines has sparked discussions, debates, and concerns among the public. Social media platforms, such as YouTube, have become popular channels for people to express their opinions and emotions on COVID-19 vaccines. Sentiment analysis is a powerful tool to extract insights from large volumes of unstructured text data and identify the sentiment of the authors (Hassan & Islam, 2021).

While there has been growing research interest in sentiment analysis of social media data related to COVID-19 (Alhujaili & Yafooz, 2021; Alkaff et al., 2020; Alzazah et al., 2022; Aufar et al., 2020; Kapali et al., 2022; Putri et al., 2021; Rama Prasad Kollu & Garapati, 2022), there is still a research gap in exploring public sentiment towards COVID-19 vaccines specifically on YouTube. YouTube is one of the most popular social media platforms where users can express their opinions and emotions through comments, and it has been recognized as an important source of information during the pandemic. However, little attention has been paid to the sentiment of YouTube comments towards COVID-19 vaccines using deep learning-based sentiment analysis methods. Therefore, this research aims to fill this gap by proposing a deep learning-based approach to analyse the sentiment of YouTube comments related to COVID-19 vaccines. Additionally, there is a need to evaluate the effectiveness of the proposed approach on a large-scale YouTube dataset and compare its performance with other sentiment analysis methods. The results of this study could provide insights for public health organizations and policymakers to develop effective communication strategies to address the public's concerns and doubts about COVID-19 vaccines.

Methodology

Project Methodology

A hybrid methodology, which combines the agile and waterfall methodologies, is employed, as illustrated in Figure 1. Requirement, design, development, testing, and evaluation are the five stages of the hybrid methodology paradigm (Rasydan Ismail et al., 2019). The waterfall methodology is used for the requirement and evaluation phases, whereas the agile methodology is used for the design, development, and testing phases (Sulaiman & Azmi, 2021). Deep learning techniques, tools, and datasets are obtained during the requirements phase of the project.

Throughout the design phase, graphical user interfaces were created to make it simple for users to enter data and view sentiment analysis results. The deep learning-based predictive models were created and evaluated during the development phase until the best outcomes were obtained.

Performance indicators, including precision, recall, F1-score, and accuracy, are used to assess the models in the evaluation phase.

Figure 1. Hybrid Methodology Model

Model Training Flow

Figure 2 shows the model training flow that was used to train and test the deep learning models during development. The dataset passes through data pre-processing, sentiment labelling, model training on the training data, and model evaluation on the test data. Data transformation, data balancing, missing value removal, outlier removal, and other steps are all part of the preparation of the data. After it has been cleaned, the data is divided into training data and testing data. The model learns from the training data, while the testing data is used to assess the model's performance.

Figure 2. Model Training Flow

High-level Graphical User Interface

Figure 3 shows the graphical user interface (GUI) of the sentiment analysis system, which allows content developers to interact with the system. The GUI is composed of seven web pages that contain input forms and display the sentiment results (Azmy et al., 2021).

Figure 3. Proposed Graphical User Interface

Development of Deep Learning Model

Dataset

For this study, the data was gathered from secondary sources, notably the Kaggle website (https://www.kaggle.com/datasets/seungguini/youtube-comments-for-covid19-related-videos).

Although the Kaggle dataset contains over 40,000 comments, this research concentrates only on comments that mention or relate to vaccines. Filtering the Kaggle dataset left roughly 4,593 comments, and a more comprehensive dataset was required to achieve higher accuracy. Therefore, additional comments were extracted from YouTube videos using Python.

To extract the comments from a YouTube video, the video ID must first be captured from the URL. For example, in https://www.youtube.com/watch?v=Can7wPZZ-g0 the ID is the 11 characters after the '=' sign, 'Can7wPZZ-g0'. The next step is to create an array to store all the comments. The name, comment, time, number of likes, and number of replies are the columns of this array. Once all the data has been collected, a CSV file is created from it, as illustrated in Figure 4. A total of 16,073 comments were collected and are used in this project.
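The ID-capture and CSV steps can be sketched with the Python standard library alone. This is a minimal illustration, not the authors' script: the function names and the column identifiers in `COLUMNS` are assumptions chosen to mirror the five columns listed above, and the actual download of comments from YouTube is deliberately left out.

```python
import csv
import io
from urllib.parse import parse_qs, urlparse

# Hypothetical column names mirroring the five fields listed in the text.
COLUMNS = ["name", "comment", "time", "likes", "reply_count"]


def extract_video_id(url):
    """Return the 11-character video ID from a standard YouTube watch URL."""
    query = parse_qs(urlparse(url).query)
    video_id = query.get("v", [""])[0]
    if len(video_id) != 11:
        raise ValueError(f"no 11-character video ID found in {url!r}")
    return video_id


def comments_to_csv(rows):
    """Serialise a list of comment dicts (keyed by COLUMNS) as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling `extract_video_id("https://www.youtube.com/watch?v=Can7wPZZ-g0")` returns the ID quoted in the example URL, and `comments_to_csv` produces text that can be written to a file for the later cleaning steps.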

Figure 4. Sample of Data in CSV file

Tools

The predictive model is developed using Google Colab, a cloud-based environment for running Jupyter notebooks that uses Python as its primary programming language (Msomi & Thango, 2023). The GUI forms are designed using Tkinter, Python's GUI toolkit binding. All the data is stored in a CSV file.

Pre-processing and Data Cleaning

In sentiment analysis, pre-processing is very important because all the data must be cleaned and processed before it can be classified. HTML tags, scripts, and ads are all examples of noise and useless information that often show up in reviews and comments. Retaining such noisy terms complicates classification because each word in a text is evaluated independently. Removing textual noise should improve the classifier's performance and speed up the classification process, making real-time text sentiment analysis easier.

During the data cleaning phase, NeatText, a simple NLP tool for cleaning and pre-processing text data, is used to clean the text (Rawat et al., 2022). It can extract emails, phone numbers, web links, and emojis from sentences. It can also be used to set up text pre-processing pipelines. The cleaning procedure employs NeatText functions such as normalisation, remove_stopwords, remove_hashtags, remove_userhandles, remove_multiple_spaces, remove_urls, remove_emojis, and remove_special_characters. Lemmatization then follows. Lemmatization is the process of combining the various inflected forms of a word into a single unit so that they can be analysed together, which makes the study of the word more efficient. Lemmatization is similar to stemming, but it takes the meaning of the words into account. Consequently, it associates terms with similar meanings with a single term.
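The order of the cleaning steps can be illustrated with a small standard-library sketch. It is not NeatText itself: the regular expressions only approximate what the named NeatText functions do, the stopword list is a tiny assumed sample, and lemmatization is omitted because it needs a dictionary-backed library.

```python
import re

# Tiny illustrative stopword list (an assumption; real lists are far longer).
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "i", "it", "this"}


def clean_comment(text):
    """Apply URL, handle, hashtag, symbol, space and stopword removal in order."""
    text = text.lower()                                  # normalise case
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # remove URLs
    text = re.sub(r"@\w+", " ", text)                    # remove user handles
    text = re.sub(r"#\w+", " ", text)                    # remove hashtags
    text = re.sub(r"[^a-z0-9\s]", " ", text)             # remove emojis and special characters
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)                              # also collapses multiple spaces
```

URLs are removed before special characters on purpose: stripping punctuation first would break a link into fragments that the URL pattern could no longer recognise.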

Sentiment Labelling

After the data has been imported from the CSV files, it first goes through the data cleaning process and then through the sentiment labelling process. Each comment is marked as positive, negative, or neutral, and labelled as 1, 0, or 2, respectively. As a result of this procedure, out of 16,073 comments, 6,804 were positive, 3,269 were negative, and 6,000 were neutral, as displayed in Table 1.

Table 1. Sentiments Label and Results

Sentiment  Label  Data
Negative   0      3,269
Positive   1      6,804
Neutral    2      6,000
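The label scheme in Table 1 can be written down as a small mapping, together with the one-hot form that a softmax output trained with categorical cross-entropy expects. The helper name `one_hot` is an assumption for illustration; the counts are those reported in Table 1.

```python
# Integer labels and comment counts as reported in Table 1.
LABELS = {"negative": 0, "positive": 1, "neutral": 2}
COUNTS = {"negative": 3269, "positive": 6804, "neutral": 6000}


def one_hot(sentiment):
    """Return the one-hot vector for a sentiment name, indexed by its label."""
    vector = [0] * len(LABELS)
    vector[LABELS[sentiment]] = 1
    return vector


total = sum(COUNTS.values())
# Share of each class in percent, useful for judging class imbalance.
shares = {name: round(100 * n / total, 2) for name, n in COUNTS.items()}
```

The three counts add up to the 16,073 comments used in the study, and the negative class is the smallest of the three, which is worth keeping in mind when reading the per-class scores later on.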

Word Embedding

In this study, the GloVe technique was used to embed words. In natural language processing, word embedding is a technique used to represent words as dense, low-dimensional vectors that capture their semantic and syntactic relationships (Cheng & Tsai, 2019). GloVe can capture the semantic similarity of words, which is evaluated through the cosine similarity of their word vectors. GloVe is trained on how frequently the words in the corpus co-occur with other terms. In addition, its weighting scheme limits the influence of very frequent co-occurrences so that they do not dominate the less frequent ones (Alhuri et al., 2020).
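Cosine similarity between two word vectors can be sketched as follows. The three-dimensional vectors are made-up toy values, not real GloVe embeddings (the study uses 100 dimensions), and the word choices are only illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for an all-zero vector
    return dot / (norm_u * norm_v)


# Toy vectors (assumed values for illustration only).
TOY_VECTORS = {
    "vaccine": [0.9, 0.1, 0.3],
    "jab": [0.8, 0.2, 0.4],
    "banana": [0.1, 0.9, 0.0],
}
```

With these toy values, `cosine_similarity(TOY_VECTORS["vaccine"], TOY_VECTORS["jab"])` is larger than the similarity between "vaccine" and "banana", which is the behaviour expected of embeddings for related and unrelated words.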

Parameter Train and Test

In this study, the dataset is divided as follows: 80 percent is used for training and 20 percent for testing. In the source code, this is set as test_size = 0.2, and it is not changed while the experiments are conducted. The random state is the second parameter of the train-test split, and it is varied across three experiments to compare the results. Table 2 contains all the relevant information.

Table 2. Parameters for Train and Test Data

Parameters           Experiment 1  Experiment 2  Experiment 3
Data size            16,073        16,073        16,073
Train: Test (ratio)  80:20         80:20         80:20
Random State         32            42            52
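The effect of the two split parameters can be sketched without any external library. The function below is an assumed stand-in for a library splitter, so it will not reproduce the exact partition used in the study; it only shows how a fixed seed makes the split repeatable and how a 20 percent share of 16,073 rows, rounded up, gives the 3,215 test rows that appear as the support total in the classification reports.

```python
import math
import random


def split_indices(n_samples, test_size=0.2, random_state=32):
    """Shuffle row indices with a fixed seed and split them into train and test."""
    n_test = math.ceil(test_size * n_samples)  # test share rounded up
    indices = list(range(n_samples))
    random.Random(random_state).shuffle(indices)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(16073, test_size=0.2, random_state=32)
```

Re-running with the same `random_state` returns the same partition, while changing it to 42 or 52 yields a different partition of the same sizes, which is why the class supports differ slightly between the three experiments.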

Fine-Tuning Hyper-Parameter

Deep learning algorithms change their internal parameters automatically based on what they have learned. These parameters are known as "model parameters." Other parameters, however, are not modified during the learning process and must be configured before learning begins. These are frequently referred to as "hyperparameters" (Weerts et al., 2020). The model parameters specify how input data are transformed into the desired output, while the hyperparameters define the structure of the model (Elgeldawi et al., 2021). The performance of a machine learning model can vary significantly based on the selection and configuration of its hyperparameters. Table 3 displays all the parameters and inputs of the sequential container. Most parameters stay the same across experiments; only the batch size and the number of units change.

Table 3. Parameters for Sequential Container

Functions      Layer            Input         Experiment 1  Experiment 2  Experiment 3
model.add      embedding        nb_words      15,386        15,386        15,386
                                dimension     100           100           100
                                input_length  15            15            15
                                trainable     false         false         false
               RNN              units         100           200           300
               LSTM             units         100           200           300
               GRU              units         100           200           300
               dense            units 2, activation = softmax
model.compile  loss             categorical_crossentropy
               optimiser        adam
               metrics          accuracy
model.fit      epochs                         20            20            20
               batch_size                     120           130           140
               verbose                        1             1             1
               validation_data  valid_x, valid_y

Model Training

In this study, three deep learning techniques: Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) have been employed. The brief methodology, working procedure and differences are provided as follows.

Recurrent Neural Network (RNN)

RNN is a type of neural network that effectively processes sequential data, such as time-series data, natural language text, and audio. It has a "memory" that can store information from previous inputs and predict future inputs based on this information. In an RNN, the previous time step's output is fed back into the network as an input alongside the current input. This creates a loop that enables the network to maintain context and temporal data (Zhang et al., 2021).

RNNs can be used for sentiment analysis by sequentially processing text and predicting the sentiment of the entire text based on the sequence of hidden states. RNNs are trained on labelled datasets of positive and negative texts and learn to map each word to a hidden state that encapsulates its meaning and context within the entire text. Word embeddings and attention mechanisms can be added to the RNN to improve its performance. After training, the RNN can predict the sentiment of new texts. RNNs are useful for sentiment analysis because they capture complex word-context relationships (Kohsasih et al., 2022).
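The feedback loop described above can be sketched as a plain-Python forward pass of a simple (Elman-style) recurrent layer. The weight names `W_x`, `W_h` and `b` are assumptions for illustration, and the sketch covers only the forward computation, not training.

```python
import math


def rnn_step(x, h_prev, W_x, W_h, b):
    """One time step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    return [
        math.tanh(
            sum(W_x[i][j] * x[j] for j in range(len(x)))
            + sum(W_h[i][k] * h_prev[k] for k in range(len(h_prev)))
            + b[i]
        )
        for i in range(len(b))
    ]


def rnn_forward(sequence, W_x, W_h, b):
    """Process a sequence of input vectors and return the final hidden state."""
    h = [0.0] * len(b)  # the "memory" starts empty
    for x in sequence:
        h = rnn_step(x, h, W_x, W_h, b)
    return h
```

Because each hidden state depends on the previous one, feeding the same word vectors in a different order generally gives a different final state, which is what lets the layer react to word order in a comment.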

Long Short-Term Memory (LSTM)

LSTM is a form of RNN designed to deal with long-term dependencies in sequential data. LSTMs solve the "vanishing gradient" problem of simple RNNs by controlling the flow of information with gates. Three gates comprise LSTM: input, forget, and output. The input gate determines how much new data is allowed into the memory cell, the forget gate determines how much data is removed from the cell, and the output gate determines how much data is output. The memory cell stores information from previous inputs, and the gates regulate the flow of information, enabling the LSTM to selectively remember or forget information over long sequences (Lee et al., 2019).

For sentiment analysis, an LSTM can process text one word at a time and predict the overall sentiment of the text based on the sequence of hidden states. LSTMs are well suited to sentiment analysis because they can pick up on long-term dependencies in the text, which is important for predicting sentiment accurately. The input, forget, and output gates of the LSTM can learn to remember or forget information based on the context of the text. This lets the network pick up on negation, sarcasm, and other subtleties in language that affect the expressed sentiment (Chai et al., 2021).
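For reference, the three gates and the memory cell described above are commonly written as follows. This is the standard textbook formulation rather than notation taken from this study: x_t is the input at step t, h_t the hidden state, c_t the memory cell, σ the logistic sigmoid, ⊙ element-wise multiplication, and the W, U and b symbols are assumed names for the learned weights and biases.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate memory)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(memory cell update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

The additive form of the memory cell update is what allows information, and gradients, to persist across many time steps when the forget gate stays close to one.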

Gated Recurrent Unit (GRU)

GRUs are a type of RNN that can handle long-term dependencies in sequential data with fewer parameters. GRUs use reset and update gates. The reset gate determines how much of the previous hidden state to discard, whereas the update gate determines how much of the current input to include in the new hidden state. GRUs combine the input and forget gates of the LSTM into a single gate, making them more computationally efficient and simpler to train (X. Wang et al., 2019).

GRUs can be used for sentiment analysis by processing text one word at a time and predicting the sentiment of the whole text based on the sequence of hidden states. GRUs are well suited to sentiment analysis because they can find long-term dependencies in the text and need fewer parameters than LSTMs, which makes them easier and faster to train. The reset and update gates of the GRU can learn to forget or keep information based on the context of the text. This lets the network capture negation, sarcasm, and other subtleties in language that affect the expressed sentiment (Zouzou & Azami, 2021).
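The claim about parameter counts can be checked with a short calculation. The sketch assumes the classic formulations, in which each gate or candidate block has one input weight matrix, one recurrent weight matrix and one bias vector; some framework implementations add extra bias terms, so library-reported counts may be slightly higher. The function name is hypothetical.

```python
# Number of weight blocks per layer type in the classic formulations:
# simple RNN has 1, GRU has 3 (reset, update, candidate),
# LSTM has 4 (input, forget, output, candidate).
BLOCKS = {"rnn": 1, "gru": 3, "lstm": 4}


def recurrent_param_count(kind, input_dim, units):
    """Trainable parameters of one recurrent layer (classic formulation)."""
    per_block = units * (input_dim + units + 1)  # input weights + recurrent weights + bias
    return BLOCKS[kind] * per_block


# Embedding dimension 100 and 100 units, as in the first experiment.
counts = {kind: recurrent_param_count(kind, 100, 100) for kind in BLOCKS}
```

With 100-dimensional inputs and 100 units, this gives 20,100 parameters for the simple RNN, 60,300 for the GRU and 80,400 for the LSTM, so the GRU layer is a quarter smaller than the LSTM layer of the same width.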

Evaluation Model

Deep learning systems are judged by their accuracy, precision, F1-score, and recall (Sulaiman et al., 2015). Based on the confusion matrix, these evaluation metrics use four fundamental attributes: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). The accuracy value is calculated using Eq. (1).

Accuracy = (TP + TN) / (TP + FP + TN + FN) (1)

The accuracy numbers are supported by the precision, recall, and F1-score metrics, which are defined in Eq. (2), Eq. (3), and Eq. (4), respectively.

Precision = TP / (TP + FP) (2)

Recall = TP / (TP + FN) (3)

F1-score = 2 x (Recall x Precision) / (Recall + Precision) (4)
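Eqs. (1) to (4) translate directly into code. The sketch below computes them from the four confusion-matrix counts of a single class; the function name and the example counts are assumptions for illustration and are not results from the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)          # Eq. (1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0    # Eq. (2)
    recall = tp / (tp + fn) if (tp + fn) else 0.0       # Eq. (3)
    if precision + recall == 0.0:
        f1 = 0.0
    else:
        f1 = 2 * (recall * precision) / (recall + precision)  # Eq. (4)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts: 8 true positives, 8 true negatives, 2 false positives, 2 false negatives.
example = classification_metrics(tp=8, tn=8, fp=2, fn=2)
```

For a three-class problem such as this one, the counts are taken per class (that class against the other two), and the per-class scores are then averaged to obtain the macro and weighted averages shown in the classification reports.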

Result and Discussion

In the first experiment, the random state used when separating the train and test data is set to 32, so every execution produces identical train and test sets. For model training, the number of units is set to 100 for all models and the batch size passed to the fit function is set to 120, while all other values remain the same. Tables 4, 5 and 6 show each model's outcomes from the first experiment.

Table 4. Classification Report – First Experiment of RNN

Class         Precision  Recall  F1-Score  Support
0             0.56       0.36    0.44      650
1             0.82       0.59    0.69      1220
2             0.60       0.86    0.71      1345
Accuracy                         0.66      3215
Macro Avg     0.66       0.60    0.61      3215
Weighted Avg  0.68       0.66    0.65      3215

Accuracy Score: 65.63%

Table 5. Classification Report – First Experiment of LSTM

Class         Precision  Recall  F1-Score  Support
0             0.64       0.68    0.66      650
1             0.86       0.82    0.84      1220
2             0.81       0.81    0.81      1345
Accuracy                         0.79      3215
Macro Avg     0.77       0.77    0.77      3215
Weighted Avg  0.79       0.79    0.79      3215

Table 6. Classification Report – First Experiment of GRU

Class         Precision  Recall  F1-Score  Support
0             0.78       0.58    0.66      650
1             0.87       0.83    0.85      1220
2             0.76       0.89    0.82      1345
Accuracy                         0.80      3215
Macro Avg     0.80       0.76    0.78      3215
Weighted Avg  0.81       0.80    0.80      3215

Based on Tables 4, 5, and 6, the GRU model has the highest accuracy, at 80.19%. The number of epochs was initially set to 20, but training was halted early to avoid overfitting. The experiment shows that GRU, which stops at epoch 14, performs better than RNN, which stops at epoch 6.

The goal of this project is to obtain the most accurate model among the techniques compared. For this initial experiment, the GRU model was therefore selected because it produced the highest accuracy.
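Early stopping of the kind mentioned above can be sketched as a patience rule on the validation loss. The source does not state which quantity was monitored or what patience was used, so both are assumptions here, as are the function name and the example loss values.

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch (1-based) at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs; otherwise it runs to the end.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)


# Made-up validation losses: the best value is reached at epoch 3.
stop = early_stop_epoch([0.90, 0.80, 0.70, 0.75, 0.72, 0.71, 0.60], patience=3)
```

A rule like this explains why the three architectures stop at different epochs even though all of them are configured for 20: each stops when its own validation curve stalls.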

To determine whether the results would improve on the first experiment, the random state's value was changed to 42 in the second experiment. For model training, the number of units was set to 200 for all models and the batch size passed to the fit function was set to 140, while all other values stayed the same. The classification report for each model is shown in Tables 7, 8, and 9.

Table 7. Classification Report – Second Experiment of RNN

Class         Precision  Recall  F1-Score  Support
0             0.52       0.42    0.47      676
1             0.79       0.69    0.74      1204
2             0.64       0.78    0.70      1335
Accuracy                         0.67      3215
Macro Avg     0.65       0.63    0.64      3215
Weighted Avg  0.67       0.67    0.67      3215

Table 8. Classification Report – Second Experiment of LSTM

Class         Precision  Recall  F1-Score  Support
0             0.59       0.75    0.66      676
1             0.84       0.81    0.83      1204
2             0.82       0.73    0.77      1335
Accuracy                         0.76      3215
Macro Avg     0.75       0.76    0.75      3215
Weighted Avg  0.78       0.76    0.77      3215

Table 9. Classification Report – Second Experiment of GRU

Class         Precision  Recall  F1-Score  Support
0             0.66       0.69    0.68      676
1             0.87       0.82    0.85      1204
2             0.79       0.81    0.80      1335
Accuracy                         0.79      3215
Macro Avg     0.77       0.77    0.77      3215
Weighted Avg  0.79       0.79    0.79      3215

The results of the second experiment show that the GRU model again has the highest accuracy, with a score of 78.94% (Table 9). The experiment also shows that the hyperparameter modifications influence the early stopping of each model. RNN stops at epoch 6 in the first experiment but at epoch 5 in the second. LSTM stops at epoch 12 in the first experiment but at epoch 7 in the second. Compared with the first experiment, the accuracy of the RNN model increased by 1.52 percentage points, while the accuracy of the LSTM model decreased by 2.58 percentage points and the accuracy of the GRU model decreased by 1.25 percentage points. Out of all the models tested in this experiment, the GRU model provides the highest accuracy.

In the third experiment, the random state's value was changed to 52. For model training, the number of units was set to 300 for all models and the batch size passed to the fit function was set to 130, while all other parameters remained the same. The outcomes of this third experiment for each model are shown in Tables 10, 11, and 12.

Table 10. Classification Report – Third Experiment of RNN

Class         Precision  Recall  F1-Score  Support
0             0.46       0.39    0.42      655
1             0.76       0.65    0.70      1211
2             0.63       0.76    0.69      1349
Accuracy                         0.64      3215
Macro Avg     0.62       0.60    0.60      3215
Weighted Avg  0.64       0.64    0.64      3215

Table 11. Classification Report – Third Experiment of LSTM

Class         Precision  Recall  F1-Score  Support
0             0.68       0.64    0.66      655
1             0.82       0.85    0.84      1211
2             0.80       0.79    0.79      1349
Accuracy                         0.78      3215
Macro Avg     0.77       0.76    0.76      3215
Weighted Avg  0.78       0.78    0.78      3215

Table 12. Classification Report – Third Experiment of GRU

Class         Precision  Recall  F1-Score  Support
0             0.66       0.68    0.67      655
1             0.83       0.83    0.83      1211
2             0.80       0.79    0.79      1349
Accuracy                         0.78      3215
Macro Avg     0.76       0.77    0.76      3215
Weighted Avg  0.78       0.78    0.78      3215

The third experiment revealed that the LSTM model reached an accuracy of 78.41%, which is the highest in this experiment. The experiments show a relationship between the accuracy outcomes for each model and the hyperparameter adjustments. The RNN stops at epoch 6 with an accuracy of 65.63% in the first experiment, at epoch 5 with 67.15% in the second, and at epoch 7 with 64.26% in the third. The LSTM stops at epoch 12 with an accuracy of 79% in the first experiment, at epoch 7 with 76.42% in the second, and at epoch 12 with 78.41% in the third. The GRU stops at epoch 14 (80.19%) in the first experiment, at epoch 11 (78.94%) in the second, and at epoch 13 (78.07%) in the third. The LSTM model therefore has the highest accuracy among the models tested in the third experiment.

Three experiments were performed to assess the accuracy of the RNN, LSTM, and GRU models, as summarised in Table 13. In the first experiment, the GRU model demonstrated the highest accuracy, 80.19%, and was therefore chosen. In the second experiment, variations in hyperparameters affected each model's early termination. GRU again had the highest accuracy at 78.94% but lost 1.25 percentage points in comparison to the first experiment. The LSTM model achieved the highest accuracy of 78.41% in the third experiment. In all experiments, the RNN model had the lowest accuracy, while the GRU model outperformed the LSTM model in the first two experiments. All models' accuracy results were influenced by hyperparameter changes. Table 13 summarises the performance of the three deep learning techniques investigated in this study.

Table 13. Summarization of Experiment

Experiment  RNN                   LSTM                  GRU
            Epoch stop  Accuracy  Epoch stop  Accuracy  Epoch stop  Accuracy
First       6           65.63     12          79.00     14          80.19
Second      5           67.15     7           76.42     11          78.94
Third       7           64.16     12          78.41     13          78.07
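The comparisons drawn from Table 13 can be reproduced with a few lines of arithmetic. The accuracies below are copied from the table; the variable and function names are assumptions for illustration.

```python
# Accuracy (%) per experiment, copied from Table 13.
ACCURACY = {
    "RNN": [65.63, 67.15, 64.16],
    "LSTM": [79.00, 76.42, 78.41],
    "GRU": [80.19, 78.94, 78.07],
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Average accuracy of each model over the three experiments.
mean_accuracy = {model: round(mean(acc), 2) for model, acc in ACCURACY.items()}

# Change from the first to the second experiment, in percentage points.
change_exp2 = {model: round(acc[1] - acc[0], 2) for model, acc in ACCURACY.items()}

# Best model within each experiment.
best_per_experiment = [
    max(ACCURACY, key=lambda model: ACCURACY[model][i]) for i in range(3)
]
```

Averaged over the three experiments, the GRU comes out ahead of the LSTM by roughly one percentage point, and both are more than ten points ahead of the simple RNN, even though the LSTM narrowly wins the third experiment on its own.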

The objective of this study was to report on the performance of deep learning techniques for sentiment analysis of COVID-19 vaccines using YouTube comments and a publicly available dataset from the Kaggle repository. Deep learning techniques offer an efficient method for extracting insights from large volumes of unstructured text data and identifying the authors' sentiments. In our study, the predictive accuracy of the LSTM and GRU models was comparable, while that of the RNN was clearly lower. This is consistent with previous findings utilizing the GRU model (Lee et al., 2019; Ni & Cao, 2020; Q. Wang et al., 2019). The GRU model achieved the highest accuracy of the deep learning models for sentiment analysis of COVID-19 vaccines based on YouTube comments.

The findings of the three experiments lead us to conclude that these parameters influence the accuracy of the models. In the first experiment, the random state was set to 32, the number of units to 100 for each model, and the batch size to 120, so that subsequent comparisons could use these values as a baseline. In the following experiments, these three values were set as given in Table 2 and Table 3, and the accuracy of the models generally decreased. The value of the random state and the number of units in the models have the most significant effect on accuracy. To sum up, a random state of 32 and 100 units per model gave the best results in these experiments, and the GRU model was the most accurate.

Conclusion

This paper discussed the sentiment analysis of COVID-19 vaccines from YouTube comments using deep learning techniques. In this study, a comparison of three deep learning techniques, GRU, LSTM, and RNN was performed, and GRU was found to achieve the best performance score. This study demonstrates the feasibility of applying deep learning techniques that could help public health organizations and policymakers develop effective communication strategies to address the public's concerns and scepticism regarding COVID-19 vaccines. For further research, a hybrid approach combining a deep learning technique with an optimization algorithm and more data can be used to further improve the accuracy.

Acknowledgements

The authors would like to thank the College of Computing, Informatics & Mathematics, Universiti Teknologi MARA, Shah Alam, Selangor, Malaysia for all the support.

References

Alhujaili, R. F., & Yafooz, W. M. S. (2021). Sentiment Analysis for Youtube Videos with User Comments: Review. Proceedings - International Conference on Artificial Intelligence and Smart Systems, ICAIS 2021, 814–820.

Alhuri, L. A., Aljohani, H. R., Almutairi, R. M., & Haron, F. (2020). Sentiment Analysis of COVID-19 on Saudi Trending Hashtags Using Recurrent Neural Network. Proceedings - International Conference on Developments in ESystems Engineering, DeSE, 2020-December, 299–304. https://doi.org/10.1109/DeSE51703.2020.9450746

Alkaff, M., Rizky Baskara, A., & Hendro Wicaksono, Y. (2020, November 3). Sentiment Analysis of Indonesian Movie Trailer on YouTube Using Delta TF-IDF and SVM. 5th International Conference on Informatics and Computing, ICIC. https://doi.org/10.1109/ICIC50835.2020.9288579

Alzazah, F., Cheng, X., & Gao, X. (2022). Predict Market Movements Based on the Sentiment of Financial Video News Sites. Proceedings - 16th IEEE International Conference on Semantic Computing, ICSC 2022, 103–110.

Aufar, M., Andreswari, R., & Pramesti, D. (2020). Sentiment Analysis on YouTube Social Media Using Decision Tree and Random Forest Algorithm: A Case Study. International Conference on Data Science and Its Applications (ICoDSA).

Bin Azmy, I. H., Azmi, A. Bin, Sulaiman, M. S., & Yusop, O. B. M. (2021). Digital Transformation in Oil and Gas Industry: Developing an OSDU Third-Party Application. 7th International Conference on Engineering and Emerging Technologies, ICEET. https://doi.org/10.1109/ICEET53442.2021.9659636

Chai, Z., Huang, J., Hao, F., & Wang, R. (2021). Sentiment Analysis of E-Commerce Reviews Based on Long Short-Term Memory Networks with Dropout Layer and Optimization. 20th International Conference on Ubiquitous Computing and Communications, 590.

Cheng, L. C., & Tsai, S. L. (2019). Deep learning for automated sentiment analysis of social media. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM, 1001–1004. https://doi.org/10.1145/3341161.3344821

Elgeldawi, E., Sayed, A., Galal, A. R., & Zaki, A. M. (2021). Hyperparameter tuning for machine learning algorithms used for arabic sentiment analysis. Informatics, 8(4). https://doi.org/10.3390/informatics8040079

Hassan, R., & Islam, M. R. (2021). Impact of Sentiment Analysis in Fake Online Review Detection. 2021 International Conference on Information and Communication Technology for Sustainable Development, ICICT4SD 2021 - Proceedings, 21–24. https://doi.org/10.1109/ICICT4SD50815.2021.9396899

Kapali, N., Tuhin, T., Pramanik, A., Rahman, M. S., & Noori, S. R. H. (2022). Sentiment Analysis of Facebook and YouTube Bengali Comments Using LSTM and Bi-LSTM. 13th International Conference on Computing Communication and Networking Technologies, ICCCNT. https://doi.org/10.1109/ICCCNT54827.2022.9984395

Kohsasih, K. L., Hayadi, B. H., Robet, Juliandy, C., Pribadi, O., & Andi. (2022). Sentiment Analysis for Financial News Using RNN-LSTM Network. 2022 4th International Conference on Cybernetics and Intelligent System, ICORIS 2022. https://doi.org/10.1109/ICORIS56080.2022.10031595

Lee, J. S., Zuba, D., & Pang, Y. (2019). Sentiment analysis of Chinese product reviews using gated recurrent unit. Proceedings - 5th IEEE International Conference on Big Data Service and Applications, BigDataService 2019, Workshop on Big Data in Water Resources, Environment, and Hydraulic Engineering and Workshop on Medical, Healthcare, Using Big Data Technologies, 173–181.

Msomi, N. L. Z., & Thango, B. A. (2023). Development of Dissolved Gas Analysis-based Fault Identification System using Machine Learning with Google Colab. Proceedings of the 31st Southern African Universities Power Engineering Conference, SAUPEC 2023. https://doi.org/10.1109/SAUPEC57889.2023.10057713

Ni, R., & Cao, H. (2020). Sentiment Analysis based on GloVe and LSTM-GRU. 39th Chinese Control Conference, 7492–7497.

Putri, A. M., Ananda Putra Basya, D., Ardiyanto, M. T., & Sarathan, I. (2021). Sentiment Analysis of YouTube Video Comments with the Topic of Starlink Mission Using Long Short Term Memory. International Conference on Artificial Intelligence and Big Data Analytics, ICAIBDA, 28–32. https://doi.org/10.1109/ICAIBDA53487.2021.9689718

Rama Prasad Kollu, S., & Garapati, Y. (2022). Social and Movie Video Data Analysis for Representing Sentiments based on ML Approaches. Proceedings of the International Conference on Electronics and Renewable Systems, ICEARS, 1463–1469. https://doi.org/10.1109/ICEARS53579.2022.9752110

Rasydan Ismail, M. E., Ahmad Shukri, I. F., Azmi, A., Yahya, Y., Ismail, S. A., & Sulaiman, M. S. (2019). Development of Agronomist Station System for Water Table Management at Peatland. International Conference on Research and Innovation in Information Systems, ICRIIS.

Rawat, A., Maheshwari, H., Khanduja, M., Kumar, R., Memoria, M., & Kumar, S. (2022). Sentiment Analysis of Covid19 Vaccines Tweets Using NLP and Machine Learning Classifiers. 2022 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing, COM-IT-CON 2022, 225–230. https://doi.org/10.1109/COM-IT-CON54601.2022.9850629

Simon Kemp. (2023). Digital 2023: Global Overview Report. https://datareportal.com/reports/digital-2023-global-overview-report

Suffian Sulaiman, M., Nordin, S., & Jamil, N. (2015). Enhancing The Performance of Multi-Modality Ontology Semantic Image Retrieval Using Object Properties Filter. 5th International Conference on Computing and Informatics, ICOCI. http://www.uum.edu.my

Sulaiman, M. S., & Azmi, A. (2021). Evaluation of Interactive WebSIR Using Software Usability Measurement Inventory (SUMI). Lecture Notes in Electrical Engineering, 741 LNEE, 109–116. https://doi.org/10.1007/978-981-33-6490-5_10

Tiago Bianchi. (2023). Most popular websites worldwide as of November 2022, by total visits. https://www.statista.com/statistics/1201880/most-visited-websites-worldwide/

Wang, Q., Sun, L., & Chen, Z. (2019). Sentiment Analysis of Reviews Based on Deep Learning Model. IEEE/ACIS 18th International Conference on Computer and Information Science (ICIS), 258–261.

Wang, X., Xu, J., Shi, W., & Liu, J. (2019). OGRU: An Optimized Gated Recurrent Unit Neural Network. Journal of Physics: Conference Series, 1325(1). https://doi.org/10.1088/1742-6596/1325/1/012089

Weerts, H. J. P., Mueller, A. C., & Vanschoren, J. (2020). Importance of Tuning Hyperparameters of Machine Learning Algorithms. ArXiv Preprint ArXiv:2007.07588. http://arxiv.org/abs/2007.07588

Zhang, F., Zeng, Q., Lu, L., & Li, Y. (2021). Sentiment analysis of movie reviews based on deep learning. Journal of Physics: Conference Series, 1754(1). https://doi.org/10.1088/1742-6596/1754/1/012234

Zouzou, A., & Azami, I. El. (2021). Text sentiment analysis with CNN GRU model using GloVe. 5th International Conference on Intelligent Computing in Data Sciences, ICDS 2021. https://doi.org/10.1109/ICDS53782.2021.9626715

How to cite this paper:

Sulaiman, M. S., Maskan, F. A., Derasit, Z., & Ibrahim Teo, N. H. (2023). Exploring YouTube Comments to Understand Public Sentiment on COVID-19 Vaccines through Deep Learning-based Sentiment Analysis. Malaysian Journal of Applied Sciences, 8(2), 22–27.