Hate Speech Hashtag Classification Using Hybrid Artificial Neural Network (ANN) Method


JURIKOM (Jurnal Riset Komputer), Vol. 9 No. 4, August 2022 e-ISSN 2715-7393 (Online Media), p-ISSN 2407-389X (Print Media) DOI 10.30865/jurikom.v9i4.4425

pp. 784−789 http://ejurnal.stmik-budidarma.ac.id/index.php/jurikom


Lintang Aryasatya¹, Yuliant Sibaroni²,*

Faculty of Informatics, Informatics, Telkom University, Bandung, Indonesia
Email: ¹[email protected], ²,*[email protected]

Corresponding author email: [email protected]
Submitted 04-07-2022; Accepted 14-08-2022; Published 30-08-2022

Abstract

The social networking site Twitter is frequently used by communities, forums, and individuals as a platform for gathering information and discussing particular topics. Information disseminated on Twitter can be positive or negative. One form of negative information is hate speech carried in hashtags on Twitter. In this study, hate speech hashtag classification was carried out using the Hybrid Artificial Neural Network (ANN) method, with the aim of producing better results than earlier methods such as KNN, because the large amount of data on Twitter is well suited to Hybrid Learning and can yield good accuracy. With 5-fold cross validation, Hybrid Learning reached a highest accuracy of 79%, a lowest accuracy of 73%, and an average accuracy of 76%.

Keywords: Twitter; Hate Speech; Hybrid Artificial Neural Network (ANN); Classification; Social Networking

1. INTRODUCTION

The Internet's global accessibility has fundamentally altered the way we view the world. Social media takes various forms, such as social networks, news sites, forums, dating services, and online games. Examples of social media with different purposes are YouTube for streaming video, Instagram for sharing photos, LinkedIn for business, and Facebook and Twitter for sharing opinions [1].

Twitter has about 3.7 million users who post over 10 million tweets a day, and it is the second most popular social media platform [2].

Twitter is a prime medium for researchers because it is one of the most important data sources [3]. It has also evolved into a valuable resource for analysis to predict crime, track terrorists, and detect and predict hate speech.

Because of Twitter's popularity and the volume of user-generated tweets, the amount of hate speech continues to rise. Active research continues to focus on the classification of hate speech using social media data [1]. Classification with the Naive Bayes method has given unsatisfactory results: although Naive Bayes slightly outperformed the other methods tested, its accuracy was only 60% [4].

In Indonesia, researchers have studied the classification of hate speech [5][6][7][8] on the Twitter and Instagram platforms. Classification of English hate speech has also been done previously [9][10][11][12]. The study in [13] automatically detected online cyber hate using a specific hashtag about women's clothing on the Twitter platform in Turkey. The study collected data using a hashtag filter. Then, for the classification stage, feature extraction was performed and several classifiers were applied, such as Decision Tree, Naïve Bayes, Support Vector Machine (SVM), and J48. In the last stage, the model was validated with 4-fold cross validation. Preliminary results show that hateful content can be detected with high precision (97%), but a more sophisticated approach is needed to increase recall. In the experiments, the Naïve Bayes algorithm and Linear SVM gave good results compared to the other methods but had low recall.

Pereira-Kohatsu et al. created HaterNet, an intelligent system currently used by the Spanish National Office to identify hate speech on Twitter. The study used 2 million tweets as data, and feature extraction was done using word2vec. The LASSO model was then used to select features for the classification stage, which included Logistic Regression (LR), Random Forest (RF), SVM, Neural Networks, Quadratic Discriminant Analysis (QDA), and Linear Discriminant Analysis (LDA), with evaluation by confusion matrix. The experiments obtained an AUC of 0.828 using word embeddings, emojis, and token expressions [1].

That research used deep learning, so the required dataset was quite large and processing took a long time. Subsequent research by Al-Hassan et al. identified and categorized Arabic-language tweets into five categories: General Hatred, None, Sexism, Racism, and Religious. The study used data from 11 thousand Arabic-language tweets. Deep learning and SVM methods were used for classification, and the classification was evaluated using a confusion matrix. In detecting hateful tweets, the four deep learning models outperformed the SVM model: SVM achieved a recall of 74%, while the deep learning models achieved an average recall of 75%. Adding a CNN layer to the LSTM improved overall detection performance, giving 72% precision, 75% recall, and a 73% F1-score [5].

Pratama and Sarno conducted research to improve on the accuracy of previous work. The study also discusses how to create a system that predicts personality from texts written by Twitter users. The dataset is derived from users' usernames and English tweets [14]. Naive Bayes, K-Nearest Neighbor (KNN), and SVM were used for classification, and the classification results were validated using 10-fold cross validation. Based on the text of the tweets, the personality of Twitter users was successfully predicted. With an average accuracy of 60 percent, Naive Bayes slightly outperformed the other two methods.

Research on classification problems has also been carried out by Kaur & Kalra using the Hybrid ANN method. The combination of swarm intelligence (SI) optimization and ANN is extremely beneficial in accelerating the convergence of Hybrid ANN in classification on various benchmark problems. The results demonstrated that Hybrid ANN reduced misclassification [15].

Previous research on hate speech classification by Al-Hassan and Al-Dossari showed that the four deep learning models outperformed SVM, with SVM achieving 74 percent recall and deep learning achieving an average of 75 percent [5]. However, deep learning requires a very large dataset and takes a long time to process. This research instead uses a Hybrid Artificial Neural Network, so the dataset does not need to be very large and processing does not take long. Artificial Neural Networks are a class of flexible nonlinear models designed to mimic biological nervous systems, and they can handle problems with high data complexity. This is the rationale for applying the Hybrid Artificial Neural Network algorithm to hate speech hashtag classification with the aim of high accuracy.

2. RESEARCH METHODOLOGY

2.1 Research Flow Chart

In this study, a flowchart is used to describe the system built for Hate Speech Hashtag Classification Using the Hybrid Artificial Neural Network (ANN) Method. The flowchart of the Hate Speech Detection System can be seen in Figure 1.

Figure 1. Hate Speech Detection System

2.2 System Development Method

The development of the Hate Speech Detection System consists of several activities that follow the stages of the system development process flow, detailed as follows:

a. Data Crawling

The first step is to crawl data using the API provided through Twitter Developer. During crawling, tweets containing the selected keywords are extracted by topic, and trending topics are used to determine the topic of the tweets. Data crawling took place from November 2021 to June 2022.

b. Data Labeling

Data labeling is a step in the machine learning process. The labeling process uses the tweets that have been collected. In a legal sense, hate speech is a word or sentence, as well as behavior, writing, or performance, that is prohibited because it can trigger social conflict, prejudice, and violence on the part of the perpetrator or the victims of such acts [16]. After the data are collected using the Twitter API, they are labeled manually to form the dataset by reading each tweet and deciding whether it contains hate speech. Labeling is done by two people simultaneously, so that if they disagree, the tweet is discussed again to reach the right label. Data labeling aims to determine the type of each tweet that has been crawled.

c. Preprocessing Data

Data preprocessing is an important step in classifying hate speech hashtags, because the raw data still contain many imperfections, redundancies, and inconsistencies. Data preprocessing is carried out in several stages, as follows:

1. Case Folding

A process that converts the phrases and words in the tweet text to lowercase letters (a to z) using regular expressions. Its purpose is to resolve the issue of the same word being capitalized differently.

2. Data Cleaning

In Data Cleaning, numbers and leading and trailing spaces are removed, multiple spaces are collapsed into a single space, and @, RT, #, URLs, and other unwanted elements are deleted using regular expressions.

3. Stemming

Stemming is the process of reducing a word or token to its base word by removing prefixes, infixes, suffixes, and confixes from derived words. In this study, stemming was done using the Sastrawi library.
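The preprocessing stages above can be sketched as follows. This is a minimal illustration, not the authors' exact code: the function name clean_tweet and the specific regular expressions are assumptions, and stemming is left as an optional callable so that a stemmer such as the one from the Sastrawi library can be plugged in.

```python
import re

def clean_tweet(text, stem=None):
    """Case folding and data cleaning for one tweet (illustrative sketch).

    `stem` is an optional callable applied to the cleaned text, for example
    the `stem` method of a Sastrawi stemmer if that package is installed.
    """
    text = text.lower()                          # case folding
    text = re.sub(r"https?://\S+", " ", text)    # remove URLs
    text = re.sub(r"[@#]\w+", " ", text)         # remove mentions and hashtags
    text = re.sub(r"\brt\b", " ", text)          # remove the retweet marker
    text = re.sub(r"[^a-z\s]", " ", text)        # remove numbers and punctuation
    text = re.sub(r"\s+", " ", text).strip()     # collapse and trim spaces
    return stem(text) if stem else text

raw = ("Katanya mau brantas KKN BOHONG....!!!!??? "
       "#JokowiKejarSetoran #JokowiKejarSetoran https://t.co/sVGGrtP7yo")
cleaned = clean_tweet(raw)
print(cleaned)
```

The sample input is the tweet used in the paper's preprocessing table, and the regular expressions were chosen so that the cleaned text matches the result reported there.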

d. Feature Extractions

The feature extraction approach employed in this study is the count vectorizer, a technique that turns a sentence into a vector of the number of times each word appears in it. Feature extraction takes the text of each data item as input, extracts features in the form of tokens, and produces a word matrix of size (number of data items x number of tokens). The count vectorizer also includes stopword removal to minimize irrelevant features in the data.
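A small pure-Python sketch may make the count matrix concrete. The helper names fit_vocabulary and transform are hypothetical, the second document is made up for illustration, and a library implementation such as scikit-learn's CountVectorizer would normally be used instead.

```python
from collections import Counter

def fit_vocabulary(docs, stopwords=frozenset()):
    """Map each non-stopword token to a column index (alphabetical order)."""
    tokens = {tok for doc in docs for tok in doc.split() if tok not in stopwords}
    return {tok: i for i, tok in enumerate(sorted(tokens))}

def transform(docs, vocab):
    """Return one row of token counts per document."""
    matrix = []
    for doc in docs:
        counts = Counter(tok for tok in doc.split() if tok in vocab)
        matrix.append([counts.get(tok, 0) for tok in vocab])
    return matrix

# First document: the stemmed example from the paper; second: illustrative only.
docs = ["kata mau brantas kkn bohong", "tolak khilafah dan tolak"]
stopwords = {"ini", "atau", "adalah", "dan", "yang"}

vocab = fit_vocabulary(docs, stopwords)
X = transform(docs, vocab)   # shape: (number of documents) x (number of tokens)
print(list(vocab))
print(X)
```

Each row of X corresponds to one tweet and each column to one token, so the matrix has the (number of data items x number of tokens) shape described above.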

e. K-fold Cross Validation

K-fold cross validation is one method used to divide data into training and test sets. Because of its ability to lessen sampling bias, this strategy is frequently used by researchers. The data are split into K partitions, and each partition gets the chance to serve as the test set [17] while the remaining partitions are used for training. K-fold cross validation thus repeats the split into training data and test data K times.
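The splitting scheme can be illustrated with a short sketch. The function k_fold_indices is a hypothetical helper that partitions indices without shuffling; in practice the data are usually shuffled first, and a library routine such as scikit-learn's KFold would typically be used.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# 3119 is the number of tweets reported in this study.
splits = list(k_fold_indices(3119, k=5))
for fold, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold}: train={len(train)} test={len(test)}")
```

Every tweet appears in exactly one test fold, which is what gives each data item the chance to be tested.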

f. Model Hybrid ANN Learning

A hybrid is a combination of two or more systems in one function [18]. Hybrid model classification works by combining more than one model. In this case, a voting method combines a Multi-Layer Perceptron, a Support Vector Machine, and a Naive Bayes classifier to produce a prediction. A voting classifier is a meta-classifier that makes predictions by combining the predictions of several independent classifiers according to a predetermined voting strategy.

The hybrid works on the principle that errors made by one classifier can be corrected by another. The hybrid model trains several models and predicts as output the class with the highest probability of being selected among them.
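As a rough illustration of the voting idea, the sketch below applies majority (hard) voting to per-model predictions. The paper does not state whether hard or soft voting was used, so hard voting is an assumption here, and the prediction lists are made-up values. With scikit-learn, the equivalent building block would be VotingClassifier.

```python
from collections import Counter

def hard_vote(predictions_per_model):
    """Return, for each sample, the label predicted by most models."""
    voted = []
    for votes in zip(*predictions_per_model):
        label, _ = Counter(votes).most_common(1)[0]
        voted.append(label)
    return voted

# Hypothetical predictions of the three base classifiers for four tweets.
mlp = ["HS", "Non_HS", "HS", "Non_HS"]
svm = ["HS", "HS", "Non_HS", "Non_HS"]
nb = ["Non_HS", "HS", "HS", "Non_HS"]

hybrid = hard_vote([mlp, svm, nb])
print(hybrid)
```

In the first three tweets one classifier disagrees with the other two, and the vote overrides that single error, which is the behavior described above.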

The hybrid process can be seen in the following example in Figure 2.

Figure 2. Model Hybrid ANN

g. Performance Evaluation

The confusion matrix is a performance indicator for machine learning classification problems in which the output can be two or more classes. For two classes, it is a table containing the four possible combinations of predicted and actual values. Accuracy and the F1 Score are the two metrics used in this performance evaluation: accuracy measures how well the model classifies the data overall, and the F1 Score combines precision and recall. An example of the confusion matrix is shown in Table 1.

Table 1. Confusion Matrix

Actual Class    Prediction Class
                Negative               Positive
Negative        True Negative (TN)     False Positive (FP)
Positive        False Negative (FN)    True Positive (TP)

The following output values can be obtained from Table 1:

1. Recall

The proportion of actual positive cases that are correctly identified as positive.

$$\text{Recall} = \frac{TP}{TP + FN} \tag{1}$$

2. Accuracy

The ratio of all correctly classified predictions to all of the data.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2}$$

3. Precision

The ratio of cases correctly classified as positive to all cases predicted as positive.

$$\text{Precision} = \frac{TP}{TP + FP} \tag{3}$$

4. F1 Score

The harmonic mean of Precision and Recall.

$$\text{F1 Score} = \frac{2 \cdot TP}{2 \cdot TP + FP + FN} \tag{4}$$
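The four measures can be computed directly from the confusion matrix counts. The sketch below uses made-up counts purely for illustration; the function name classification_metrics is hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute recall, accuracy, precision, and F1 from confusion matrix counts."""
    return {
        "recall": tp / (tp + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }

# Illustrative counts only, not results from this study.
m = classification_metrics(tp=40, tn=35, fp=10, fn=15)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

The F1 value obtained this way equals the harmonic mean 2PR / (P + R) of precision P and recall R, which can be used as a consistency check.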

3. RESULT AND DISCUSSION

3.1 Data Crawling

This study, which uses the Hybrid Artificial Neural Network (ANN) method, starts from data generated by a crawling process that retrieves tweets with keywords chosen according to trending topics. Data crawling was carried out from November 2021 to June 2022. The crawl produced 3119 tweets, which were used for training and testing. The hashtags are #JokowiKejarSetoran and #BubarkanKhilafatulMuslimin.

3.2 Data Labelling

After the hate speech data were collected using the Twitter API, they were labeled manually to form the dataset by reading each tweet and deciding whether it contains hate speech. Data labeling aims to determine the type of each crawled tweet. The labels used are HS and Non_HS. Examples of labeled data are shown in Table 2.

Table 2. Dataset Label

Label     Tweet
HS        #JokowiKejarSetoran @jokowi Mundur lah!!bikin susah rakyat aja bisa ny!!
HS        Masarakat jangan mau di bodohi dengan orang orang yang mau mendirikan Negara khilafah. Organisasi khilafatul muslimin adalah iblis bersorban. Tolak khilafah. #BubarkanKhilafatulMuslimin https://t.co/1UqqHtWTQs
Non_HS    Ggaasssss pooolll #JokowiKejarSetoran
Non_HS    #BubarkanKhilafatulMuslimin , usut siapa saja yg terlibat, sikaat sampe tuntas!! NKRI HARGA MATI!! https://t.co/jE9r6odKBr

3.3 Preprocessing Data

Preprocessing is carried out on the tweet dataset and includes case folding, eliminating numbers, removing punctuation marks, removing blank spaces, removing URLs, removing unnecessary text, and encoding labels. The results of data preprocessing can be seen in Table 3.

Table 3. Preprocessing Data Stage

Preprocessing Stage: Case Folding & Data Cleaning
Before Preprocessing: Katanya mau brantas KKN BOHONG....!!!!??? #JokowiKejarSetoran #JokowiKejarSetoran https://t.co/sVGGrtP7yo
After Preprocessing: katanya mau brantas kkn bohong

Preprocessing Stage: Stemming
Before Preprocessing: katanya mau brantas kkn bohong
After Preprocessing: kata mau brantas kkn bohong

3.4 Feature Extractions

Feature extraction is done using the count vectorizer, in which each feature records how many times a word occurs in the tweet. Stopword removal is included in the count vectorizer to remove non-topic words such as “ini”, “atau”, “adalah”, “dan”, “yang”, and others. Stopword removal uses the NLTK library, and its goal is to minimize irrelevant features in the data. An example of the count vectorizer output is shown in Figure 3; a total of 4392 unique words were obtained.

Figure 3. Count Vectorizer Example

3.5 K-fold Cross Validation and Hybrid Process

This system uses 5-fold cross validation to repeatedly split the data into training data and test data. The data split under K-fold cross validation is illustrated in Table 4. The MLP classifier, Support Vector Machine (SVM), Naive Bayes (NB), and the hybrid of MLP, SVM, and NB were all used to test this system. Table 5 displays the test results; all values in Table 5 are percentages.

Table 4. K-fold Cross Validation

Fold 1    Fold 2    Fold 3    …    Fold K
Test      Train     Train     …    Train
Train     Test      Train     …    Train
Train     Train     Test      …    Train
…         …         …         …    …
Train     Train     Train     …    Test

Table 5. Model Testing

Method    Fold 1     Fold 2     Fold 3     Fold 4     Fold 5     Average
          A    F1    A    F1    A    F1    A    F1    A    F1    A    F1
MLP       70   70    74   74    74   74    71   71    71   71    72   72
SVM       74   74    71   71    77   77    72   72    75   75    74   74
NB        70   70    72   72    74   74    73   72    74   74    73   72
Hybrid    73   73    74   74    79   79    76   76    75   75    76   75

Description: A: Accuracy; F1: F1 Score

For MLP with 5-fold cross validation, the highest accuracy is 74%, the lowest is 70%, and the average is 72%. For SVM, the highest accuracy is 77%, the lowest is 71%, and the average is 74%. For NB, the highest accuracy is 74%, the lowest is 70%, and the average is 73%. The best results are obtained with Hybrid Learning, for which the highest accuracy is 79%, the lowest is 73%, and the average is 76%.
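The summary figures can be re-derived from the per-fold accuracies in Table 5, as in the sketch below. Because the table reports rounded percentages, a mean recomputed from them (for example 75.4 for the hybrid) may differ slightly from the reported average, presumably because the reported averages were computed from unrounded values.

```python
from statistics import mean

# Per-fold accuracy (%) copied from Table 5.
accuracy = {
    "MLP": [70, 74, 74, 71, 71],
    "SVM": [74, 71, 77, 72, 75],
    "NB": [70, 72, 74, 73, 74],
    "Hybrid": [73, 74, 79, 76, 75],
}

# (lowest, highest, mean) accuracy for each method.
summary = {name: (min(v), max(v), mean(v)) for name, v in accuracy.items()}
best = max(summary, key=lambda name: summary[name][2])

for name, (low, high, avg) in summary.items():
    print(f"{name}: lowest={low} highest={high} mean={avg:.1f}")
print("best mean accuracy:", best)
```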

4. CONCLUSION

This research has given good results in classifying, or predicting, the types of tweets on hate speech hashtags. To clean the noise in the data, preprocessing is applied: case folding, eliminating numbers, removing punctuation marks, eliminating blank spaces, eliminating URLs, eliminating unimportant words, and encoding labels. Feature extraction is performed to remove irrelevant features for the testing process. Experiments with the Multi-Layer Perceptron, Support Vector Machine, Naive Bayes, and hybrid learning techniques demonstrated that the Hybrid Learning approach performed the best. The accuracy and F1-score of the system for predicting the type of tweet in the hate speech hashtags are highest in the 3rd fold, at 79%.


REFERENCES

[1] J. C. Pereira-Kohatsu, L. Quijano-Sánchez, F. Liberatore, And M. Camacho-Collados, “Detecting And Monitoring Hate Speech In Twitter,” Sensors, Vol. 19, No. 21, P. 4654, 2019, Doi: 10.3390/S19214654.

[2] M. Yoosefi Nejad, M. S. Delghandi, A. O. Bali, And M. Hosseinzadeh, “Using Twitter To Raise The Profile Of Childhood Cancer Awareness Month,” Netw. Model. Anal. Heal. Informatics Bioinforma., Vol. 9, No. 1, Pp. 1–5, 2020, Doi: 10.1007/S13721-019-0206-4.

[3] E. Fehn Unsvåg And B. Gambäck, “The Effects Of User Features On Twitter Hate Speech Detection,” In Proceedings Of The 2nd Workshop On Abusive Language Online (Alw2), 2018, Pp. 75–85. Doi: 10.18653/V1/W18-5110.

[4] B. Y. Pratama And R. Sarno, “Personality Classification Based On Twitter Text Using Naive Bayes, Knn And Svm,” In 2015 International Conference On Data And Software Engineering (Icodse), 2015, Pp. 170–174. Doi: 10.1109/Icodse.2015.7436992.

[5] A. Al-Hassan And H. Al-Dossari, “Detection Of Hate Speech In Arabic Tweets Using Deep Learning,” Multimed. Syst., Pp. 1–12, Jan. 2021, Doi: 10.1007/S00530-020-00742-W.

[6] A. Briliani, “Deteksi Ujaran Kebencian Dalam Bahasa Indonesia Pada Kolom Komentar Instagram Dengan Metode Klasifikasi K-Nearest Neighbor,” 2019.

[7] A. Fadilah, “Penerapan Algoritma K-Nearest Neighbor Untuk Mendeteksi Ujaran Kebencian Dan Bahasa Kasar Pada Twitter Bahasa Indonesia.” Universitas Islam Negeri Sultan Syarif Kasim Riau, 2021.

[8] U. S. A. Rahman, Y. Wibisono, And E. P. Nugroho, “Implementasi Multinomial Naive Bayes Untuk Klasifikasi Ujaran Kebencian Pada Dataset Kicauan (Twitter) Bahasa Indonesia,” J. Apl. Dan Teor. Ilmu Komput., Vol. 3, No. 2.

[9] J. Salminen, M. Hopf, S. A. Chowdhury, S. Jung, H. Almerekhi, And B. J. Jansen, “Developing An Online Hate Classifier For Multiple Social Media Platforms,” Human-Centric Comput. Inf. Sci., Vol. 10, No. 1, P. 1, Dec. 2020, Doi: 10.1186/S13673-019-0205-6.

[10] F. E. Ayo, O. Folorunso, F. T. Ibharalu, I. A. Osinuga, And A. Abayomi-Alli, “A Probabilistic Clustering Model For Hate Speech Classification In Twitter,” Expert Syst. Appl., Vol. 173, P. 114762, Jul. 2021, Doi: 10.1016/J.Eswa.2021.114762.

[11] R. Martins, M. Gomes, J. J. Almeida, P. Novais, And P. Henriques, “Hate Speech Classification In Social Media Using Emotional Analysis,” In 2018 7th Brazilian Conference On Intelligent Systems (Bracis), Oct. 2018, Pp. 61–66. Doi: 10.1109/Bracis.2018.00019.

[12] G. Rizos, K. Hemker, And B. Schuller, “Augment To Prevent: Short-Text Data Augmentation In Deep Learning For Hate-Speech Classification,” In Proceedings Of The 28th Acm International Conference On Information And Knowledge Management, Nov. 2019, Pp. 991–1000. Doi: 10.1145/3357384.3358040.

[13] H. Sahi, Y. Kilic, And R. B. Saglam, “Automated Detection Of Hate Speech Towards Woman On Twitter,” In 2018 3rd International Conference On Computer Science And Engineering (Ubmk), Sep. 2018, Pp. 533–536. Doi: 10.1109/Ubmk.2018.8566304.

[14] B. Y. Pratama And R. Sarno, “Personality Classification Based On Twitter Text Using Naive Bayes, Knn And Svm,” In 2015 International Conference On Data And Software Engineering (Icodse), Nov. 2015, Pp. 170–174. Doi: 10.1109/Icodse.2015.7436992.

[15] J. Kaur And A. Kalra, “Hybrid Artificial Neural Network For Data Classification Problem,” In 2017 4th International Conference On Signal Processing, Computing And Control (Ispcc), Sep. 2017, Pp. 66–71. Doi: 10.1109/Ispcc.2017.8269651.

[16] I. Kamalludin And B. N. Arief, “Kebijakan Formulasi Hukum Pidana Tentang Penanggulangan Tindak Pidana Penyebaran Ujaran Kebencian (Hate Speech) Di Dunia Maya,” Law Reform, Vol. 15, No. 1, P. 113, May 2019, Doi: 10.14710/Lr.V15i1.23358.

[17] T. Ridwansyah, “Implementasi Text Mining Terhadap Analisis Sentimen Masyarakat Dunia Di Twitter Terhadap Kota Medan Menggunakan K-Fold Cross Validation Dan Naïve Bayes Classifier,” Klik Kaji. Ilm. Inform. Dan Komput., Vol. 2, No. 5, Pp. 178–185, Apr. 2022, Doi: 10.30865/Klik.V2i5.362.

[18] Fatri Nurul Inayah, Sri Suryani Prasetiyowati, And Yuliant Sibaroni, “Classification Of Dengue Hemorrhagic Fever (Dhf) Spread In Bandung Using Hybrid Naïve Bayes, K-Nearest Neighbor, And Artificial Neural Network Methods,” Int. J. Inf. Commun. Technol., Vol. 7, No. 1, Pp. 10–20, Jun. 2021, Doi: 10.21108/Ijoict.V7i1.562.