Aspect-Based Sentiment Analysis on Twitter Using Bidirectional Long Short-Term Memory

Rizki Annas Sholehat*, Erwin Budi Setiawan, Yuliant Sibaroni School of Computing, Informatics, Telkom University, Bandung, Indonesia

Email: 1,*rizkiannas@student.telkomuniversity.ac.id, 2erwinbudisetiawan@telkomuniversity.ac.id, 3yuliant@telkomuniversity.ac.id

Correspondence Author Email: rizkiannas@student.telkomuniversity.ac.id

Abstract−Twitter, one of the most widely used social media platforms in the world, is often used as a medium for sharing opinions, which can be positive or negative. Movie reviews that contain many complex explanations and judgments are challenging to classify. Therefore, an aspect-based sentiment analysis process is needed to analyze the polarity of movie-review opinions with respect to predetermined aspects. This research analyzes the polarity of movie-review opinions by aspect using the Bidirectional Long Short-Term Memory method and GloVe feature expansion. The study uses the plot, acting, and director aspects over a dataset of 17,247 tweets. Bidirectional Long Short-Term Memory is shown to produce relevant and accurate results for sentiment analysis, with the highest accuracy of 56.29% for the plot aspect, 87.07% for the acting aspect, and 85.55% for the director aspect. GloVe feature expansion is shown to increase performance by up to 13.57% for the plot aspect, 4.16% for the acting aspect, and 10.48% for the director aspect.

Keywords: Deep Learning; Bidirectional Long Short-Term Memory; GloVe; Sentiment Analysis; Aspect

1. INTRODUCTION

Modern society depends on technology for its daily needs in today's digital era. Social media is a technology that continues to develop and is used by the wider community: through it, users can communicate, share, and even create new things. One of the most used social media platforms today is Twitter. Based on statistics compiled by [1], Indonesia ranks fourth in the world in the number of active Twitter users, which shows that Twitter engagement in Indonesia is quite high.

The ease of sharing on social media also affects one of the most beloved realms of the entertainment world: the film industry. Twitter users widely use this access to post tweets containing reviews of films they have watched, and these reviews can be positive or negative. As more information is shared by Twitter users, often in ambiguous language, it greatly affects other users' understanding of the context of the film being reviewed. Therefore, sentiment analysis is needed to classify these film reviews.

Sentiment analysis is a process of extracting information about emotions or feelings from someone in response to something [2]. The current combination of sentiment analysis has merged entities or aspects in its application to achieve more precise analysis results. Aspect-based sentiment analysis (ABSA) provides certainty as to whether a positive opinion also produces a positive opinion on all aspects of the entity in it [3]. This combination supports this research because film reviews usually contain aspects such as genre, cast, storyline, and others.

In its processing, sentiment analysis is currently using deep learning as the newest processing method that continues to develop. Deep learning (DL) is a machine learning method that studies data representation using artificial neural network algorithms that can be used in layers [4]. The deep learning model often used for sentiment analysis is Bidirectional Long Short-Term Memory (Bi-LSTM). Bi-LSTM is a developmental variant of the Long Short-Term Memory model. Bi-LSTM has two types of input, namely forward input and backward input. This allows Bi-LSTM to learn past and future information in each input sequence. The performance of Bi-LSTM is evident from research [5] conducted by Cheng and Tsai on social media sentiment analysis using several deep learning models showing that Bi-LSTM produces the highest accuracy, precision, recall, specificity, and F1 scores compared to other deep learning models.

The problem addressed in this final project is the effect on performance of applying the TF-IDF feature extraction technique and feature expansion using the GloVe method to the Bidirectional Long Short-Term Memory algorithm for aspect-based sentiment analysis on Twitter.

The research in this final project is limited to a collection of Indonesian-language sentiment data: a total of 17,247 tweets covering only the topic of film reviews, with sentiment labeled manually into three categories, namely positive, neutral, and negative.

The purpose of this study is to implement, measure the performance of, and analyze the results of a sentiment classification system built using the TF-IDF feature extraction method and GloVe feature expansion on the prepared Indonesian tweet dataset.

Several studies have previously been conducted on sentiment analysis using deep learning. The review classification process makes it easier for users to categorize opinions as positive or negative more precisely.


In research [6], Guixian Xu et al. in 2019 conducted a sentiment analysis study of comment texts on hotel reservation service provider sites using the Bi-LSTM method. That study combined TF-IDF with word-vector representations to obtain the best-performing input for the Bi-LSTM model in its sentiment analysis process. To prove the effectiveness of the Bi-LSTM method, the researchers compared its performance with the RNN, CNN, LSTM, and Naive Bayes methods: Bi-LSTM obtained an F1-score of 92.18%, while the other methods scored between 84% and 89%.

In another study, Kemal Hernandi et al. in 2021 [7] conducted a sentiment analysis study of IndiHome customer experience on Twitter using the Bi-LSTM method. The researchers built two models: model 1 detects negative versus non-negative sentiment, and model 2 detects positive versus neutral sentiment. Among the many possible model configurations, RMSProp proved to be the best optimizer: with a learning rate of 0.0001, it achieved a test accuracy of 89.22% for model 1 and 88.2% for model 2.

Subsequent research on using Bi-LSTM in sentiment analysis by Elfaik in 2020 [8] concluded that Bi-LSTM is more effective than CNN and LSTM at learning word-to-word context in a sentence because this method combines forward hidden layers and backward hidden layers.

Related to the use of aspects in sentiment analysis of film reviews, research by Parkhe in 2014 [9] considered film aspects such as screenplay, plot, movie, direction, acting, and music, and concluded that the most important aspects were plot, movie, and acting.

2. RESEARCH METHODOLOGY

2.1 Research Stages

The system is built as shown in Figure 1 and is divided into several parts: data crawling, data labeling, preprocessing, feature extraction, GloVe feature expansion, data splitting, and sentiment analysis modeling. Each stage is explained below.

Figure 1. Sentiment Analysis System

2.2 Crawling and Labeling

The data crawling process uses the Application Programming Interface (API) that Twitter provides openly through the Twitter Developer platform. Crawling data on Twitter can use two search systems: by user and by keyword [10]. The keywords used relate to the three predetermined aspects: for example, the plot aspect uses the keywords "cerita" and "alur", the acting aspect uses "aktor" and "pemeran", and the director aspect uses "cinematography" and "filmmaking". Five people then label and validate the data based on subjective judgment. Table 1 shows an example of labeled tweet data.

Table 1. Data Labeling

Tweet                                                                    Plot   Acting   Director
the pursuit of happyness verry nice y cerita yg menarik dan memotivasi   1      0        0
red notice jelek akting nya ga dapet semua                               0      1        0

2.3 Pre-Processing Data

At this stage, the collected data is prepared. Data quality improves as the cleaning process proceeds, which affects the accuracy of the classification results [11]. The preparation process includes data cleaning, data normalization, tokenization, case folding, stop word removal, and stemming.


a. Data Cleaning removes punctuation, symbols, numbers, and emojis from tweets, since they would reduce the accuracy of the analysis results.

b. Data Normalization identifies words with non-standard spellings and replaces them with spellings that match KBBI, the official Indonesian dictionary.

c. Tokenization separates each word in a sentence at the spaces, with the aim of obtaining the words that frequently appear for a topic.

d. Case Folding changes all letters to lowercase.

e. Stop Word Removal removes words that are considered unimportant, i.e., short function words that carry no distinctive meaning.

f. Stemming changes words that have affixes into their base words.
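The six steps above can be sketched as a small pipeline. The normalization dictionary, stop-word list, and affix rules below are illustrative stand-ins only; a real run would use a full KBBI-based dictionary, a complete Indonesian stop-word list, and a proper stemmer such as Sastrawi.

```python
import re

# Illustrative stand-ins for the real linguistic resources (assumptions).
NORMALIZATION = {"yg": "yang", "ga": "tidak", "bgt": "banget"}
STOPWORDS = {"yang", "dan", "di", "ke", "nya"}
PREFIXES = ("meng", "men", "mem", "me", "ber", "di", "ter", "pe")
SUFFIXES = ("kan", "nya", "an", "i")

def clean(text):
    # a. data cleaning: drop punctuation, symbols, numbers, emojis
    return re.sub(r"[^a-zA-Z\s]", " ", text)

def preprocess(tweet):
    # a, c, d: clean, tokenize on whitespace, case fold
    tokens = clean(tweet).lower().split()
    # b. normalization against the (toy) dictionary
    tokens = [NORMALIZATION.get(t, t) for t in tokens]
    # e. stop-word removal
    tokens = [t for t in tokens if t not in STOPWORDS]
    # f. naive affix stripping as a stand-in for a real stemmer
    stemmed = []
    for t in tokens:
        for p in PREFIXES:
            if t.startswith(p) and len(t) - len(p) > 3:
                t = t[len(p):]
                break
        for s in SUFFIXES:
            if t.endswith(s) and len(t) - len(s) > 3:
                t = t[:-len(s)]
                break
        stemmed.append(t)
    return stemmed
```

For example, `preprocess("Red notice jelek, akting nya ga dapet!")` strips the punctuation, drops the stop word "nya", and normalizes "ga" to "tidak".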

2.4 Term Frequency - Inverse Document Frequency (TF-IDF)

The feature extraction process is a main factor affecting the accuracy of the classification stage. TF-IDF is the most widely used feature extraction method today; it calculates a weight for each word [12]. The representation of the data depends on the number of features obtained across topics. The TF-IDF weight is calculated with the following formula:

W_ij = tf_ij × idf_j,   idf_j = log(N / df_j)   (1)

where tf_ij is the frequency of term j in document i, N is the total number of documents, and df_j is the number of documents containing term j.
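A minimal sketch of Eq. (1), assuming raw term counts for tf and a base-10 logarithm (the log base is not stated in the paper):

```python
import math
from collections import Counter

def tfidf(docs):
    """Weight w_ij = tf_ij * log10(N / df_j), following Eq. (1)."""
    N = len(docs)
    df = Counter()                 # df_j: number of documents containing term j
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)          # tf_ij: raw count of term j in document i
        weights.append({t: tf[t] * math.log10(N / df[t]) for t in tf})
    return weights

# Toy corpus of three preprocessed "documents".
docs = [["alur", "cerita", "bagus"],
        ["akting", "bagus"],
        ["alur", "lambat"]]
w = tfidf(docs)
# "bagus" appears in 2 of 3 documents, so its idf is log10(3/2) ≈ 0.176.
```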

2.5 Global Vector (GloVe)

Global Vector, or GloVe, is a model that stores global word co-occurrence statistics, which are later used to represent meaning. The GloVe model was built on the observation that the frequency with which words occur near each other can lead to conclusions about their meaning [13]. These values are used to expand the sentiment analysis features. The GloVe algorithm produces, for each word, a ranked list of words considered similar; for example, Table 2 lists the ranked words most similar to the word "film".
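The similarity ranking behind such a list can be sketched with cosine similarity over word vectors. The 3-dimensional vectors below are illustrative stand-ins for trained GloVe embeddings, which would have far more dimensions and be learned from co-occurrence counts:

```python
import math

# Toy embeddings standing in for trained GloVe vectors (assumption).
vectors = {
    "film":       [0.9, 0.1, 0.3],
    "cerita":     [0.8, 0.2, 0.3],
    "dokumenter": [0.7, 0.1, 0.4],
    "harga":      [0.0, 0.9, 0.1],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def most_similar(word, k=3):
    """Rank all other vocabulary words by cosine similarity to `word`."""
    q = vectors[word]
    ranked = sorted(
        ((cosine(q, v), w) for w, v in vectors.items() if w != word),
        reverse=True)
    return [w for _, w in ranked[:k]]
```

With these toy vectors, the nearest neighbours of "film" are "cerita" and "dokumenter", mirroring the kind of ranking shown in Table 2.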

Table 2. Similarity Word

Rank 1   Rank 2       Rank 3   Rank 4     Rank 5    Rank 6    Rank 7   Rank 8      Rank 9
cerita   dokumenter   sekuel   hostiles   keinget   animasi   mcu      superhero   sequel

2.6 Feature Expansion

After a corpus has been created with GloVe, the features can be broadened using feature expansion. Feature expansion is used in sentiment analysis to increase the number of features used in the model, improving its ability to understand the context of the text and make more accurate predictions [14]. Feature expansion replaces vector entries containing the value "0" with similar words from the GloVe corpus [15].
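A minimal sketch of this replacement, assuming a `similar` mapping (Top-N ranked neighbours per word) has already been built from a GloVe corpus; the entries shown are illustrative:

```python
# Hypothetical Top-N neighbour lists, assumed built beforehand with GloVe.
similar = {"film": ["cerita", "dokumenter"], "alur": ["cerita"]}

def expand(vector, top_n=1):
    """Replace zero-weighted terms with the weight of a similar, present term."""
    expanded = dict(vector)
    for term, weight in vector.items():
        if weight != 0:
            continue
        for neighbour in similar.get(term, [])[:top_n]:
            if vector.get(neighbour, 0) != 0:
                expanded[term] = vector[neighbour]   # replace the "0" entry
                break
    return expanded

# A document vector where "film" and "alur" received no TF-IDF weight.
doc = {"film": 0.0, "cerita": 0.48, "alur": 0.0}
```

Here `expand(doc)` fills both zeros from the weight of their Top-1 neighbour "cerita", which is how the Top-1 through Top-30 variants in Scenario 3 differ: only the number of neighbours consulted changes.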

2.7 Bidirectional Long Short-Term Memory (Bi-LSTM)

Bidirectional Long Short-Term Memory (Bi-LSTM) is a derivative variant of Long Short-Term Memory (LSTM). Compared to LSTM, Bi-LSTM can complete sequential modeling tasks better because LSTM only exploits past context [16]. Bi-LSTM simultaneously takes two inputs into its architecture, a forward input and a backward input, while the architecture produces a single output. Bi-LSTM thus processes both prior and subsequent information in a two-way manner; the result is then passed to a feed-forward neural network for a more detailed classification. Because the hidden layers run in opposite directions, the training process gains a better understanding of time-series data [17].

Figure 2. Bi-LSTM Structure

The figure above shows the unrolled structure of the Bi-LSTM over three sequential steps. The forward LSTM layer produces its output sequence h→ by processing the input in its original order, while the backward LSTM layer produces h← from the input reversed from time t-1 to t-n. These output sequences are then passed to a function σ that combines them into a vector y_t [18]. As with an LSTM layer, the final output of the Bi-LSTM layer can be represented by a vector Y_t = [y_{t-n}, ..., y_{t-1}], where the last element, y_{t-1}, is the estimate for the next iteration.
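The forward/backward mechanics can be sketched with a toy NumPy implementation: one gated LSTM pass per direction, with the backward pass run on the reversed input and its outputs realigned before concatenation. This is a schematic forward pass only (random untrained weights, no bias terms), not the trained model used in the experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_pass(xs, params):
    """One unidirectional LSTM pass; returns the hidden state at each step."""
    Wf, Wi, Wc, Wo, d = params
    h, c, hs = np.zeros(d), np.zeros(d), []
    for x in xs:
        z = np.concatenate([h, x])      # previous hidden state + current input
        f = sigmoid(Wf @ z)             # forget gate
        i = sigmoid(Wi @ z)             # input gate
        o = sigmoid(Wo @ z)             # output gate
        c = f * c + i * np.tanh(Wc @ z) # cell state update
        h = o * np.tanh(c)
        hs.append(h)
    return hs

def bilstm(xs, fwd, bwd):
    """Concatenate forward states with backward states run on reversed input."""
    h_fwd = lstm_pass(xs, fwd)
    h_bwd = lstm_pass(xs[::-1], bwd)[::-1]   # realign after the reverse pass
    return [np.concatenate([f, b]) for f, b in zip(h_fwd, h_bwd)]

def make_params(d_in, d_hidden):
    shape = (d_hidden, d_hidden + d_in)
    return tuple(rng.normal(size=shape) for _ in range(4)) + (d_hidden,)

# Toy run: 5 time steps of 4-dimensional inputs, hidden size 8 per direction.
xs = [rng.normal(size=4) for _ in range(5)]
out = bilstm(xs, make_params(4, 8), make_params(4, 8))
```

Each output step is a 16-dimensional vector: the 8-dimensional forward state concatenated with the 8-dimensional backward state for the same position.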

2.8 Performance Measurement

Performance is measured using a confusion matrix, which relates actual values to predicted values. There are four basic terms in the confusion matrix. First, True Positives (TP): the actual and predicted values are both positive. Second, True Negatives (TN): both the actual and predicted values are negative. Third, False Positives (FP): the actual value is negative but the predicted value is positive, commonly known as a Type 1 error. Fourth, False Negatives (FN): the actual value is positive but the predicted value is negative, commonly known as a Type 2 error [19]. These terms are summarized in Table 3.

Table 3. Confusion Matrix Table

                      Actual Positive   Actual Negative
Predicted Positive    TP                FP
Predicted Negative    FN                TN

Then the performance value can be calculated with the following formulas:

a) Accuracy

Accuracy helps measure how often the classifier makes correct predictions.

accuracy = (TP + TN) / (TP + FP + TN + FN)   (2)

b) Precision

Precision is the ratio of correctly predicted positive cases to all cases predicted as positive.

precision = TP / (TP + FP)   (3)

c) Recall

Recall is the ratio of correctly predicted positive cases to all actual positive cases.

recall = TP / (TP + FN)   (4)

d) F1 Measure

F1 Measure is a performance metric that takes recall and precision into account.

F1 score = 2 × (recall × precision) / (recall + precision)   (5)
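Eqs. (2) through (5) map directly to code, given the four confusion-matrix counts:

```python
def scores(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts,
    following Eqs. (2)-(5)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * (recall * precision) / (recall + precision)
    return accuracy, precision, recall, f1

# Example counts (illustrative, not from the study's experiments).
acc, prec, rec, f1 = scores(tp=50, tn=30, fp=10, fn=10)
```

Note that when precision equals recall, as in this example, the F1 score equals both of them, since it is their harmonic mean.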

3. RESULT AND DISCUSSION

3.1 Data Distribution

The tweet data from the crawling process totaled 17,247 tweets in Indonesian, focusing on film reviews. Figure 3 shows the labeling results for the tweet data.

Figure 3. Data Distribution Amount of Each Aspect

The word dictionary is then built from data drawn from several news media sources and from the tweet dataset. Three GloVe corpora are built for later use in modeling: a corpus from the tweet dataset, a corpus from the news dataset, and a corpus from the combined tweet+news dataset.



Table 4. Corpus Data Count

Corpus           Amount
Twitter           7,296
News             86,853
Twitter + News   89,119
Total           183,268

3.2 Test Result

In this study, four scenarios are carried out using the Bidirectional Long Short-Term Memory classification. The first scenario compares 90:10, 80:20, and 70:30 splits of training to test data, judged on the best accuracy and F1-score. The second scenario applies feature extraction using TF-IDF. The third scenario then performs feature expansion, replacing zero values in the data so that they carry a similarity value. The last scenario adds an oversampling technique using SMOTE, to see whether performance still increases when the data is balanced.

3.2.1 Scenario 1 (Baseline)

This scenario is carried out to find the best train/test split ratio, which is then used as the reference ratio for the following scenarios. The Plot and Acting aspects produce their best accuracy and F1-score at a ratio of 90:10, while for the Director aspect the values differ little between ratios, with the best at 70:30.

Table 5. Scenario 1

Ratio   Aspect     Accuracy   F1-score
70:30   Plot       52.52%     45.82%
70:30   Acting     83.56%     42.85%
70:30   Director   85.55%     30.73%
80:20   Plot       53.77%     42.74%
80:20   Acting     86.00%     48.89%
80:20   Director   85.54%     30.73%
90:10   Plot       56.29%     50.48%
90:10   Acting     87.07%     51.51%
90:10   Director   85.51%     30.72%

3.2.2 Scenario 2 (TF-IDF)

The following scenario performs feature extraction using TF-IDF so that each word has a weight. Table 6 shows a significant increase in accuracy and F1-score in all aspects after applying TF-IDF.

Table 6. Scenario 2

Aspect     Accuracy (Before → After)      F1-score (Before → After)
Plot       56.29% → 68.02% (+11.73%)      50.48% → 65.29% (+14.81%)
Acting     87.07% → 89.32% (+2.25%)       51.51% → 65.29% (+13.78%)
Director   85.55% → 88.79% (+3.24%)       30.73% → 51.71% (+20.98%)

3.2.3 Scenario 3 (GloVe Feature Expansion)

This scenario aims to expand the vocabulary of words in the dataset. As seen in Table 7, Table 8, and Table 9, the plot and acting aspects show increased accuracy and F1-score with the Top-1 feature, while the director aspect improves with the Top-20 feature. The increases in accuracy and F1-score in this scenario are not significant; the highest is only 2.36%, in the F1-score for the acting aspect.

Table 7. Plot Aspect

Accuracy
Feature   Before    Corpus Tweet   Corpus News   Corpus Tweet+News
Top-1     68.02%    66.75%         66.78%        68.09%
Top-5     68.02%    67.56%         67.66%        67.35%
Top-10    68.02%    67.27%         66.76%        67.55%
Top-20    68.02%    67.58%         67.65%        67.62%

F1-score
Feature   Before    Corpus Tweet   Corpus News   Corpus Tweet+News
Top-1     65.29%    63.85%         66.78%        65.45%
Top-5     65.29%    64.58%         67.66%        64.91%
Top-10    65.29%    64.65%         66.76%        64.96%
Top-20    65.29%    64.96%         67.65%        64.57%

Table 8. Acting Aspect

Accuracy
Feature   Before    Corpus Tweet   Corpus News   Corpus Tweet+News
Top-1     89.32%    89.41%         89.47%        89.68%
Top-5     89.32%    89.11%         89.16%        89.43%
Top-10    89.32%    89.43%         89.34%        89.36%
Top-20    89.32%    89.26%         89.44%        89.02%

F1-score
Feature   Before    Corpus Tweet   Corpus News   Corpus Tweet+News
Top-1     63.20%    62.19%         64.82%        65.56%
Top-5     63.20%    63.77%         62.56%        62.38%
Top-10    63.20%    65.49%         64.06%        62.21%
Top-20    63.20%    63.95%         63.04%        62.21%

Table 9. Director Aspect

Accuracy
Feature   Before    Corpus Tweet   Corpus News   Corpus Tweet+News
Top-1     88.79%    88.57%         88.60%        88.50%
Top-5     88.79%    88.51%         88.55%        88.66%
Top-10    88.79%    88.47%         88.51%        88.61%
Top-20    88.79%    88.65%         88.57%        88.91%
Top-30    88.79%    -              -             88.56%

F1-score
Feature   Before    Corpus Tweet   Corpus News   Corpus Tweet+News
Top-1     51.71%    49.24%         51.19%        49.62%
Top-5     51.71%    50.38%         51.19%        50.32%
Top-10    51.71%    50.38%         50.19%        50.51%
Top-20    51.71%    51.39%         49.51%        51.84%
Top-30    51.71%    -              -             51.05%

3.2.4 Scenario 4 (SMOTE)

Because this dataset is imbalanced, the final scenario applies an oversampling technique, SMOTE, to balance the data [20]. This scenario is carried out last so that the effects of TF-IDF and GloVe on the original dataset can be observed first. As seen in Table 10, all aspects show a significant increase in accuracy and F1-score.

Table 10. Scenario 4

Aspect     Accuracy (Before → After)      F1-score (Before → After)
Plot       68.09% → 69.86% (+1.77%)       65.45% → 69.69% (+4.24%)
Acting     89.68% → 91.23% (+1.55%)       65.56% → 91.04% (+25.48%)
Director   88.91% → 96.03% (+7.12%)       51.84% → 96.02% (+44.18%)
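The idea behind SMOTE can be sketched as follows: each synthetic sample is an interpolation between a minority-class point and one of its k nearest minority-class neighbours. This toy version works on raw 2-d points rather than the study's feature vectors, and in practice a library implementation such as imbalanced-learn's SMOTE would be used:

```python
import random

random.seed(0)

def smote(minority, n_new, k=2):
    """Synthesize n_new points by interpolating between a minority sample
    and one of its k nearest minority neighbours (squared Euclidean)."""
    synthetic = []
    for _ in range(n_new):
        x = random.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))[:k]
        nb = random.choice(neighbours)
        gap = random.random()                       # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Toy minority class of three 2-d points (illustrative only).
minority = [(1.0, 1.0), (1.2, 0.9), (2.0, 2.1)]
new_points = smote(minority, n_new=3)
```

Because every synthetic point lies on a segment between two existing minority points, oversampling stays inside the region the minority class already occupies instead of duplicating samples verbatim.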

3.2.5 Analysis and Evaluation

Figure 4. Accuracy Growth

[Bar chart: cumulative accuracy gain over the Scenario 1 baseline. Plot: +11.73% (Scenario 2), +11.80% (Scenario 3), +13.57% (Scenario 4); Acting: +2.25%, +2.61%, +4.16%; Director: +3.24%, +3.36%, +10.48%.]


Figure 5. F1-score Growth

[Bar chart: cumulative F1-score gain over the Scenario 1 baseline. Plot: +14.81% (Scenario 2), +14.97% (Scenario 3), +19.21% (Scenario 4); Acting: +13.78%, +14.05%, +39.53%; Director: +20.98%, +21.11%, +65.29%.]

In the first scenario, tests on train/test ratios of 90:10, 80:20, and 70:30 show that 90:10 is best for the Plot and Acting aspects and 70:30 for the Director aspect. The second scenario applies TF-IDF feature extraction, which yields a very significant increase in accuracy and F1-score in all aspects. In the third scenario, the data is tested with the GloVe feature expansion using three corpora: a tweet corpus, a news corpus, and a tweet+news corpus. The best results come from the tweet+news corpus with the Top-1 feature for the Plot and Acting aspects and the Top-20 feature for the Director aspect; performance increases because the features are expanded from the corpus built beforehand. The last scenario tests the data with the oversampling technique using SMOTE. Here, performance increases significantly because the tested data becomes balanced, building on the data tested in the previous scenarios.

4. CONCLUSION

After the four test scenarios, each scenario is seen to impact the performance of the Bidirectional Long Short-Term Memory model that has been built. Balancing the data using SMOTE increases the performance of every aspect, and the GloVe feature expansion is proven to raise accuracy further. The highest scores achieved are 69.86% accuracy and 69.69% F1-score for the plot aspect, 91.23% accuracy and 91.04% F1-score for the acting aspect, and 96.03% accuracy and 96.02% F1-score for the director aspect. Suggestions for further research are to try combinations with other feature extraction methods, such as Bag of Words (BoW), and other feature expansions, such as word2vec or FastText. When building datasets, more attention should be paid to data quality in order to achieve higher accuracy.

REFERENCES

[1] H. Tankovska, “Twitter: most users by country | Statista,” Jan. 2022. https://www.statista.com/statistics/242606/number-of-active-twitter-users-in-selected-countries/ (accessed May 14, 2022).

[2] H. H. Do, P. Prasad, A. Maag, and A. Alsadoon, “Deep Learning for Aspect-Based Sentiment Analysis: A Comparative Review,” Expert Syst Appl, vol. 118, pp. 272–299, Mar. 2019, doi: 10.1016/j.eswa.2018.10.003.

[3] S. M. Jiménez-Zafra, M. T. Martín-Valdivia, E. Martínez-Cámara, and L. A. Ureña-López, “Combining resources to improve unsupervised sentiment analysis at aspect-level,” J Inf Sci, vol. 42, no. 2, pp. 213–229, Apr. 2016, doi: 10.1177/0165551515593686.

[4] L. Zhang, S. Wang, and B. Liu, “Deep learning for sentiment analysis: A survey,” WIREs Data Mining and Knowledge Discovery, vol. 8, no. 4, Jul. 2018, doi: 10.1002/widm.1253.

[5] L.-C. Cheng and S.-L. Tsai, “Deep learning for automated sentiment analysis of social media,” in Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Aug. 2019, pp. 1001–1004, doi: 10.1145/3341161.3344821.

[6] G. Xu, Y. Meng, X. Qiu, Z. Yu, and X. Wu, “Sentiment Analysis of Comment Texts Based on BiLSTM,” IEEE Access, vol. 7, pp. 51522–51532, 2019, doi: 10.1109/ACCESS.2019.2909919.

[7] M. K. Hernandi, S. A. Wibowo, and S. Suyanto, “Sentiment Analysis Implementation For Detecting Negative Sentiment Towards Indihome In Twitter Using Bidirectional Long Short Term Memory,” in 2021 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT), Jul. 2021, pp. 143–147, doi: 10.1109/IAICT52856.2021.9532546.

[8] H. Elfaik and E. H. Nfaoui, “Deep Bidirectional LSTM Network Learning-Based Sentiment Analysis for Arabic Text,” Journal of Intelligent Systems, vol. 30, no. 1, pp. 395–412, Dec. 2020, doi: 10.1515/jisys-2020-0021.



[9] V. Parkhe and B. Biswas, “Aspect Based Sentiment Analysis of Movie Reviews: Finding the Polarity Directing Aspects,” in 2014 International Conference on Soft Computing and Machine Intelligence, Sep. 2014, pp. 28–32, doi: 10.1109/ISCMI.2014.16.

[10] J. Eka Sembodo, E. Budi Setiawan, and Z. Abdurahman Baizal, “Data Crawling Otomatis pada Twitter,” in INDOSC 2016, Sep. 2016, pp. 11–16. doi: 10.21108/INDOSC.2016.111.

[11] P. Badjatiya, S. Gupta, M. Gupta, and V. Varma, “Deep Learning for Hate Speech Detection in Tweets,” in Proceedings of the 26th International Conference on World Wide Web Companion - WWW ’17 Companion, 2017, pp. 759–760, doi: 10.1145/3041021.3054223.

[12] K. Kumar, B. S. Harish, and H. K. Darshan, “Sentiment Analysis on IMDb Movie Reviews Using Hybrid Feature Extraction Method,” International Journal of Interactive Multimedia and Artificial Intelligence, vol. 5, no. 5, p. 109, 2019, doi: 10.9781/ijimai.2018.12.005.

[13] F. Anistya and E. B. Setiawan, “Hate Speech Detection on Twitter in Indonesia with Feature Expansion Using GloVe,” Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi), vol. 5, no. 6, pp. 1044–1051, Dec. 2021, doi: 10.29207/resti.v5i6.3521.

[14] S. P., O. V. Ramana Murthy, and S. Veni, “Sentiment analysis by deep learning approaches,” TELKOMNIKA (Telecommunication Computing Electronics and Control), vol. 18, no. 2, p. 752, Apr. 2020, doi: 10.12928/telkomnika.v18i2.13912.

[15] E. B. Setiawan, D. H. Widyantoro, and K. Surendro, “Feature expansion using word embedding for tweet topic classification,” in 2016 10th International Conference on Telecommunication Systems Services and Applications (TSSA), Oct. 2016, pp. 1–5. doi: 10.1109/TSSA.2016.7871085.

[16] G. Liu and J. Guo, “Bidirectional LSTM with attention mechanism and convolutional layer for text classification,” Neurocomputing, vol. 337, pp. 325–338, Apr. 2019, doi: 10.1016/j.neucom.2019.01.078.

[17] M. Ilse, J. M. Tomczak, and M. Welling, “Attention-based Deep Multiple Instance Learning,” Feb. 2018.

[18] Z. Cui, R. Ke, Z. Pu, and Y. Wang, “Deep Bidirectional and Unidirectional LSTM Recurrent Neural Network for Network-wide Traffic Speed Prediction,” Jan. 2018.

[19] A. Suresh, “What is a confusion matrix?,” Nov. 17, 2020. https://medium.com/analytics-vidhya/what-is-a-confusion-matrix-d1c0f8feda5 (accessed May 24, 2022).

[20] L. Demidova and I. Klyueva, “SVM classification: Optimization with the SMOTE algorithm for the class imbalance problem,” in 2017 6th Mediterranean Conference on Embedded Computing (MECO), Jun. 2017, pp. 1–4, doi: 10.1109/MECO.2017.7977136.
