Sentiment Analysis for opinion leaders on Twitter: A Case Study of COVID-19

The author, whose copyright is stated on the title page of the work, has granted the British University in Dubai the right to loan his/her research work to users of its library and to make partial copies or for educational and research use only. The author has also granted permission to the University to retain or make a digital copy for similar use and for the purpose of digitally preserving the work.

Experiments were conducted targeting specific tweets about COVID-19 from four opinion leaders by applying machine learning models.

Since tweets directly affect people's thoughts, the purpose of this study was to learn more about the sentiment of tweets from various public opinion leaders around the world during COVID-19. The first case of the virus was discovered in 2019 in Wuhan, China (WHO, 2020), and it was later named coronavirus or COVID-19. Opinion leaders are individuals who have a great deal of authority within the community and the power to shape the opinions of others with whom they are connected.

We specifically target Twitter to analyze the sentiment of posts from famous public leaders. In our research we selected four famous public opinion leaders: Donald Trump, Emmanuel Macron, Justin Trudeau and Boris Johnson. Since Twitter played an important role in communication during the virus outbreak (Zhu et al., 2016), tweets from public opinion leaders are used to identify the sentiments. el Barachi et al. (2021) also studied this topic.

With this in mind, we perform sentiment analysis on these tweets to categorize the tweets and further examine the results using machine learning models.

Problem Statement

Related work

Research Questions

Contribution

Tools

Scope

What are the results of public leaders' tweets about COVID-19 based on sentiment analysis?

Organization of Thesis

This research uses various machine learning algorithms to perform sentiment analysis and determine whether the tweets are positive or negative. Machine learning is a branch of computer science that uses various algorithms to build models. These models are trained to recognize different patterns in the data (Bi et al., 2019).

In our research, we use five machine learning models, the details of which are discussed below. We used the decision tree model because it is one of the well-known supervised learning methods. The MLP classifier is trained iteratively, since at each step the partial derivatives of the loss function with respect to the model parameters are computed to update the parameters (scikit-learn developers, 2022b). Data provided as NumPy arrays of floating-point values is compatible with this approach.

In this research, word embedding (word2vec) and the bag-of-words technique are used to convert tweets into frequency-based scores. The scope of each phrase is shown through a data visualization approach that expresses textual information by displaying occurrence and relevance. We discuss the outcomes and findings of the machine learning models applied to the dataset. Before implementing the models and showing their output, the data is first visualized using the frequent words in the dataset.
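The bag-of-words conversion mentioned above can be illustrated with a minimal sketch. It is not the thesis implementation: the whitespace tokenizer, the function names and the two example tweets are assumptions made only for illustration, and the word2vec embedding is not covered here.

```python
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and split it on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def build_vocabulary(tweets):
    """Return the sorted list of distinct tokens found across all tweets."""
    return sorted({token for tweet in tweets for token in tokenize(tweet)})


def bag_of_words(tweet, vocabulary):
    """Return one count per vocabulary word; word order is ignored."""
    counts = Counter(tokenize(tweet))
    return [counts[word] for word in vocabulary]


# Hypothetical example tweets, not taken from the thesis dataset.
tweets = ["get the vaccine", "the vaccine will protect the people"]
vocabulary = build_vocabulary(tweets)
vectors = [bag_of_words(tweet, vocabulary) for tweet in tweets]
```

Each tweet becomes a vector of word counts over a shared vocabulary, which is the kind of frequency-based input a classifier can consume.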

In conclusion, our research investigated sentiment analysis of COVID-19 data on Twitter using different machine learning models. The models worked effectively on the dataset, with good results recorded. The case study also reveals negative emotions in the tweets, reflecting sadness and fear among people.

This can also be extended by implementing other machine learning algorithms and combining two machine learning models as one algorithm while testing the dataset. Moreover, it is crucial to explore additional social media platforms in terms of sentiment analysis using machine learning models.

Literature Review

Twitter API

To obtain the dataset through the Twitter API, a Twitter account was first created. Twitter requires several questions to be answered before it approves access. After numerous questions, queries and emails, Twitter developer account access was granted with Twitter API v2 elevated access.

Twitter Sentiment analysis

Machine Learning Models

We used 80 percent of the data for training the decision tree model and 20 percent for testing. In our research, we used an MLP with one hidden layer, which is shown in Figure 1 below. We used various methods and models to obtain the results for our dataset.
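The 80/20 split described above can be sketched in plain Python as follows. This is a sketch under stated assumptions, not the code used in the thesis: the function name, the fixed seed and the toy data are hypothetical. In practice, scikit-learn's `train_test_split` with `test_size=0.2` produces a similar split.

```python
import random


def train_test_split_80_20(samples, labels, test_fraction=0.2, seed=42):
    """Shuffle paired samples and labels, then hold out test_fraction for testing."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = round(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [samples[i] for i in train_idx]
    x_test = [samples[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


# Hypothetical toy data: ten "tweets" with alternating labels.
samples = [f"tweet {i}" for i in range(10)]
labels = [i % 2 for i in range(10)]
x_train, x_test, y_train, y_test = train_test_split_80_20(samples, labels)
```

Shuffling indices rather than the data itself keeps every tweet paired with its own label after the split.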

Irrelevant tweets, including tweets containing information other than the selected topic, and tweets in languages other than English were excluded from the dataset. Based on best practices, 80 percent of the data is used for training and 20 percent for testing. We used the TextBlob sentiment analysis library to rate each tweet as positive, negative, or neutral.
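One common way to turn a TextBlob polarity score into the three labels mentioned above is sketched here. TextBlob reports polarity as a float between -1.0 and 1.0; the cut-off at zero and the function name are assumptions, since the source does not state which thresholds were used.

```python
def label_from_polarity(polarity):
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    The zero threshold is an assumption for illustration only.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie between -1.0 and 1.0")
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


# With TextBlob installed, the score would typically come from:
#     from textblob import TextBlob
#     polarity = TextBlob(tweet_text).sentiment.polarity
# Hypothetical scores are used here so the sketch runs without TextBlob.
example_labels = [label_from_polarity(score) for score in (0.5, -0.25, 0.0)]
```

Keeping the thresholding in its own function makes it easy to try a wider neutral band later without touching the scoring step.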

The function returns all non-overlapping matches of the pattern in a single list after iterating through each row in the dataset. By disregarding the relationships between the words in the data, the dataset is tagged only with word frequencies. The result shows the group of words that appear most often in the dataset and highlights the main words that most public leaders focus on.
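The pattern matching described above resembles the behaviour of Python's `re.findall`, which returns all non-overlapping matches as a list. The sketch below assumes the pattern targets hashtags; the regular expression, the function names and the example tweets are hypothetical and are not taken from the thesis.

```python
import re
from collections import Counter

# Assumed pattern: a '#' followed by one or more word characters.
HASHTAG_PATTERN = re.compile(r"#\w+")


def extract_hashtags(text):
    """Return every non-overlapping hashtag match in one tweet, lowercased."""
    return [tag.lower() for tag in HASHTAG_PATTERN.findall(text)]


def hashtag_counts(tweets):
    """Iterate over each row (tweet) and count how often each hashtag appears."""
    counts = Counter()
    for tweet in tweets:
        counts.update(extract_hashtags(tweet))
    return counts


# Hypothetical example tweets.
tweets = [
    "Stay safe #COVID19 #StayHome",
    "Book your jab today #getboost #covid19",
]
counts = hashtag_counts(tweets)
```

Lowercasing the matches treats `#COVID19` and `#covid19` as the same hashtag when the frequencies are tallied.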

In our research, we used the Porter stemmer method, which is known for its efficiency and simplicity. When the pandemic began there was no vaccine, and vaccines became available only later. As a result, the hashtag #getboost, which opinion leaders used to motivate people to get vaccinated against COVID-19 and limit the spread of the virus, had the smallest number of tweets. Two graphs were plotted for the CNN; the second graph, named model accuracy, shows the per-epoch accuracy recorded in the model history of the CNN.

The metrics of the confusion matrix above are used to estimate accuracy, precision, and recall. In our research, we used scikit-learn library functions to calculate precision, accuracy, and recall. The logistic regression model achieved the highest accuracy, 86.61%, followed by the MLP at 83.17% and then the SVM, CNN and DT models.
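The relationship between the confusion matrix and the three statistics above can be made explicit with a short sketch. The thesis reports using scikit-learn for these calculations; the plain-Python function below only illustrates the standard definitions, and the counts in the example are hypothetical rather than the reported results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision and recall from binary confusion matrix counts.

    tp: actual positive, predicted positive
    fp: actual negative, predicted positive
    fn: actual positive, predicted negative
    tn: actual negative, predicted negative
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("the confusion matrix must contain at least one sample")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical counts for 100 test tweets, chosen only for illustration.
metrics = classification_metrics(tp=70, fp=15, fn=5, tn=10)
```

With these counts, accuracy is 80/100, precision is 70/85 and recall is 70/75, which shows how a model can have high recall on the positive class while precision stays lower.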

Figure 2 shows the summary of our model. We used a sequential model with a one-dimensional (1D) convolutional layer and one max pooling layer.

Methodology

Decision Tree

Support Vector Machine

Multi-Layer Perceptron

Convolutional Neural Network

Results and Discussion

Word Cloud of tweets

Before implementing and displaying the results of machine learning models, the data is first visualized using common words in the dataset (Flores-Ruiz et al., 2021). The figures show a word cloud from a dataset that includes opinion leaders' COVID-19 tweets. In our setting, it was often helpful for quickly memorizing or retrieving target keywords as the data was gathered. It has English libraries that are used to recognize words by applying different algorithmic rules through a lookup table.

Across all positive tweets, the most common terms were 'vaccine', 'health', 'protect', 'will', 'together'.

Figure 6 Word cloud of preprocessed tweets

Tweet Hashtags

Performance Evaluation of Machine learning models

From the confusion matrix table above, in the decision tree model, 72.12% of the tweets were positive in both the actual and the predicted values. In the multilayer perceptron model, 10.58% of tweets were predicted as negative when they were actually negative, and only 1.44% were predicted as negative when they were actually positive. Similarly, for the convolutional neural network using an embedded library, 70.67% of the values were accurately predicted with respect to the actual values, and 10.10% were accurately predicted as negative, that is, the tweets were also negative in the real data.

Table 3 Confusion matrix of Machine learning models

Graph Evaluation of Machine learning models

The decision tree shows an ROC value of 0.6, which indicates weaker performance of the classification model. The highest precision, 80.11%, was recorded for the support vector machine and the multilayer perceptron models, followed by the LR and CNN models with the lowest values. Our analysis shows that the opinion leaders expressed a more positive tone on Twitter during the global COVID-19 outbreak.
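To make the ROC figure above easier to interpret, the sketch below computes the area under the ROC curve (AUC) in plain Python. It assumes that the reported value of 0.6 refers to AUC, which the source does not state explicitly; the function name and the example scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    AUC equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one,
    with ties counted as half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = positive tweet) and classifier scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, so a value near 0.6 suggests the classifier ranks positive tweets above negative ones only slightly better than chance.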

The sentiment analysis of tweets posted during COVID-19 by opinion leaders from different countries shows that they promote a safer environment to protect people and use '#getboost' to guide them to get vaccinated and reduce the risk of COVID-19.