
E-ISSN: 2623-064X | P-ISSN: 2580-8737

Comparative Analysis of NLP Techniques for Hate Speech Classification in Online Communications

Gregorius Airlangga1

1 Information System Study Program, Atma Jaya Catholic University of Indonesia, Indonesia

Article Information

Article History:
Received: January 18, 2024
Revised: January 20, 2024
Accepted: January 22, 2024

ABSTRACT

This research aims to compare the effectiveness of two Natural Language Processing (NLP) techniques, word embeddings and TF-IDF vectorization, in identifying hate speech in online comments. Hate speech is an increasingly relevant issue in the digital era, where online interactions can easily be influenced by harmful and damaging content. Therefore, it is important to develop automated tools that can quickly and accurately identify and address hate speech. By using a balanced dataset, each model is carefully evaluated for its ability to classify comments as 'harmful' or 'not harmful.' The evaluation metrics employed include precision, recall, F1 score, and overall accuracy. The model using SpaCy word embeddings achieved an accuracy of 65%, with equivalent precision and recall for both classes. Meanwhile, the TF-IDF Sklearn vectorization model demonstrated superior performance with an overall accuracy of 75% and better capabilities in identifying harmful comments, as evidenced by a recall rate of 77%. This indicates that the TF-IDF model is more proficient in recognizing nuanced expressions of hate speech.

Keywords: NLP; Hateful Speech Detection; Machine Learning; Word Embedding; TF-IDF


Corresponding Author : Gregorius Airlangga

Information System Study Program, Atma Jaya Catholic University of Indonesia, Jakarta, Indonesia

E-mail: [email protected]


INTRODUCTION

In the evolving landscape of digital communication, the proliferation of social media platforms has led to an exponential increase in user-generated content (Çinar, 2020; Eusebius, 2020; Heath, 2020). This content, while fostering global connectivity and freedom of expression, has also become a breeding ground for harmful online behaviours, notably hate speech (Anansaringkarn & Neo, 2021; De Gregorio, 2020; Tworek, 2021). Hate speech, defined as any communication that disparately targets individuals or groups based on characteristics such as race, religion, ethnicity, gender, or sexual orientation, poses significant social, psychological, and cultural threats. Its detection and mitigation are paramount in maintaining the integrity of online discourse and protecting users from cyberbullying, discrimination, and incitement to violence (Iosifidis & Nicoli, 2020; Lukings & Habibi Lashkari, 2022; Mann, 2020). The urgency to address hate speech in digital platforms is underscored by its growing prevalence and the profound impact it has on victims (Markov & Đorđević, 2024; Tworek, 2021; Yin & Zubiaga, 2021). Studies have linked exposure to hate speech with increased stress, anxiety, and fear, which can lead to real-world consequences such as social isolation and exacerbation of community tensions. The need for effective detection mechanisms is further amplified by the sheer volume of content generated daily, making manual monitoring unfeasible (Chen et al., 2022; Homayounfar & Andrew, 2020; Nguyen et al., 2023).

Extensive research has been conducted in the realm of automated hate speech detection. Early studies primarily focused on keyword-based approaches, which, while straightforward, often resulted in high false positive rates due to the inability to understand context and linguistic nuances (Abardazzou, 2023; Miric, Jia, & Huang, 2023; Zhu et al., 2023). Recent advancements have shifted towards machine learning and natural language processing techniques, offering more sophisticated and context-aware solutions. A notable direction in recent literature is the use of supervised learning algorithms for text classification. Researchers have explored various models, including Support Vector Machines, Random Forests, and Neural Networks, demonstrating promising results in differentiating hate speech from benign content (Nodehi et al., n.d.; Shawkat, 2023; Vidgen & Yasseri, 2020). However, challenges persist, particularly in addressing the subtleties of language, such as sarcasm and metaphor, which can lead to misclassification. Another emerging area is the application of deep learning techniques, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), which have shown significant potential in capturing semantic and syntactic relationships in text (Duwairi, Hayajneh, & Quwaider, 2021; Machova et al., 2021; McDermid, Jia, Porter, & Habli, 2021). These models, however, require large datasets and substantial computational resources, limiting their accessibility for some researchers.

The current state of the art in hate speech detection is characterized by a move towards more nuanced and context-sensitive approaches. This includes the integration of socio-linguistic features, user profiling, and cross-platform analysis to understand the multifaceted nature of hate speech. Additionally, there is an increasing focus on developing models that are not only accurate but also explainable and transparent, addressing ethical considerations in AI deployment (Borrego-Díaz & Galán-Páez, 2022; Hassija et al., 2023; Xi et al., 2023). Despite these advancements, the field faces ongoing challenges. One such challenge is the dynamic nature of language and hate speech, which continuously evolves, necessitating constant model updates (Kiritchenko, Nejadgholi, & Fraser, 2021; Liu et al., 2023). Another issue is the balancing act between detecting hate speech and preserving freedom of speech, a particularly delicate issue given the subjective nature of what constitutes offensive content.

This research aims to contribute to the field by developing a comprehensive hate speech detection model that incorporates both traditional machine learning and advanced natural language processing techniques. Our approach begins with the utilization of a balanced dataset of online comments, categorized into hateful and non-hateful classes, to train and test our model. By employing word embeddings (Rodriguez & Spirling, 2022) and TF-IDF (Term Frequency-Inverse Document Frequency) techniques (Kalra, Kashyap, & Kaur, 2022), we seek to capture the nuanced linguistic features that distinguish hate speech. The integration of the word cloud visualization provides a novel way to understand and present the most frequent vocabulary in both categories of comments, offering insights into the linguistic patterns that characterize hate speech.

Furthermore, the use of the Naive Bayes classifier, known for its effectiveness in text classification tasks, allows for the creation of a model that is not only accurate but also computationally efficient.

Our methodology includes a rigorous comparison between the word embeddings approach, using SpaCy's large language model, and the TF-IDF technique, providing a comprehensive analysis of their respective efficacies in hate speech detection. This comparative study is crucial, as it offers valuable insights into the strengths and limitations of each method, guiding future research directions in the field. In addition to the methodological contribution, this research addresses the critical need for effective hate speech detection mechanisms in the digital era.

By developing a reliable and efficient model, we aim to aid in the automation of content moderation on social media platforms, enhancing their ability to promptly identify and mitigate hate speech. This, in turn, contributes to creating a safer online environment, reducing the exposure of users to harmful content. The remaining structure of the research paper unfolds as follows: In the Research Methods section, we delve into the source and nature of the dataset used, meticulously detailing the data collection process. This is followed by a thorough explanation of the preprocessing techniques employed to cleanse and ready the data for in-depth analysis. The development of the model is then described, focusing on the construction of the Naive Bayes classifier and the intricate implementation of word embeddings and TF-IDF techniques.

Additionally, the validation techniques employed to assess the accuracy and reliability of these models are outlined.

Moving to the Results and Discussion section, we present our findings from the initial data exploration and visualization, highlighting key observations. This includes a detailed report on the performance of the classifiers, focusing on metrics such as accuracy, precision, recall, and F1-score.

A comparative analysis follows, scrutinizing the differences in performance between the models based on word embeddings and TF-IDF techniques. In addition, the paper delves into interpreting these results, analysing their implications within the broader context of hate speech detection. This segment also acknowledges the limitations and potential biases inherent in the study, maintaining a critical perspective. Finally, the Conclusion succinctly restates the key findings of the research, underscoring the principal outcomes. It reflects on the broader implications of these findings for future research and the evolving practice of online content moderation. Concluding with recommendations, the paper proposes directions for future research and potential improvements in the development of more sophisticated models for hate speech detection, aiming to contribute meaningfully to this critical field.

RESEARCH METHODS

The research method for this study on hate speech detection and analysis is designed to be comprehensive, systematic, and replicable, ensuring both rigor and reliability in its findings. This section details the various components of the research method, including data collection, data preprocessing, model development, and validation techniques.

Data Collection

This research is anchored in the meticulous assembly of a comprehensive dataset, designated as the ‘HateSpeechDatasetBalanced.’ The core objective of this dataset is to enable an in-depth examination of hate speech within the realm of online communication. Recognizing the complexities and varying manifestations of hate speech, this dataset is methodically compiled from a wide array of online platforms. This strategic selection of sources is instrumental in capturing a broad and authentic representation of the digital discourse landscape. To ensure the dataset's relevance and effectiveness, it is composed of comments meticulously categorized as either ‘hateful’ or ‘non-hateful.’ This binary classification serves as a foundational element for subsequent analyses. What sets this dataset apart is its balanced composition. Care has been taken to evenly distribute the examples of hate speech and non-hate speech. This balance is crucial in mitigating potential biases that could skew the outcomes of model training, thereby enhancing the reliability and validity of the analysis.

One of the standout features of the ‘HateSpeechDatasetBalanced’ is its all-encompassing nature. It doesn't merely focus on generic instances of hate speech; instead, it delves into various forms and expressions of hate speech, encompassing multiple dimensions such as race, religion, gender, and other pertinent factors. This diversity in the dataset is not just about the range of topics covered but also about the depth and nuances within each category. By incorporating a spectrum of hate speech examples, from subtle insinuations to overt expressions, the dataset presents a rich resource for understanding the multifaceted nature of hate speech in the digital era. Furthermore, the dataset’s extensive coverage and detailed categorization make it a valuable tool for developing and refining algorithms aimed at identifying and analyzing hate speech. The granularity of the dataset facilitates a more nuanced understanding of the patterns, triggers, and contexts that characterize online hate speech. Such insights are vital for stakeholders, including social media platforms, policymakers, and researchers, in their efforts to combat the pervasive issue of hate speech on digital platforms.
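As an illustrative sketch of this first step (not drawn from the paper itself), the balanced dataset can be loaded and its class distribution inspected as follows; the file name and the `Content`/`Label` column names are assumptions for illustration and should be adjusted to the actual CSV schema:

```python
import pandas as pd

# Hypothetical file and column names; adjust to the actual dataset schema.
df = pd.read_csv("HateSpeechDatasetBalanced.csv")

# Confirm the binary labels ('hateful' vs. 'non-hateful') are evenly balanced.
print(df["Label"].value_counts(normalize=True))
print(df.sample(3))
```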

Data Preprocessing

In the realm of data analysis, especially when dealing with complex datasets like those involving online communications, data preprocessing emerges as a pivotal phase. This process is instrumental in transforming the raw, often cluttered dataset into a refined, analysis-ready format.

The primary objective of data preprocessing is to enhance the quality and interpretability of the data, thereby ensuring more accurate and insightful analysis outcomes. The initial phase of data preprocessing is data cleaning. This stage is akin to sieving through the data to remove impurities and inconsistencies. It involves several critical tasks such as the elimination of irrelevant data points that do not contribute to the analysis objectives. Additionally, this step addresses any errors in the dataset, such as typographical mistakes or inaccuracies in the data entries. Handling missing values is another cornerstone of this phase. Missing data can distort analysis results, and thus, it’s crucial to either fill these gaps with appropriate estimations or to eliminate the incomplete records, depending on the context and the potential impact on the dataset’s integrity.

Following data cleaning, the preprocessing journey leads to text normalization. This step is essential in homogenizing the dataset, particularly crucial for text data derived from diverse online platforms. Normalization includes converting all text to lower case, which is vital in maintaining consistency and avoiding discrepancies caused by case sensitivity. Punctuation removal is another aspect of normalization, as these symbols often do not contribute to the semantic analysis and can be distractions in processing. Furthermore, the standardization of expressions, such as transforming abbreviations to their full forms or unifying variant spellings, is undertaken to ensure clarity and uniformity across the dataset. The subsequent stage is tokenization, where the text is segmented into its fundamental units - words or tokens. This decomposition is not merely a splitting process but a strategic division that facilitates deeper linguistic and semantic analysis of the text. Tokenization enables the identification of patterns and frequencies of word usage, which are critical in understanding the nuances of the dataset. An additional, yet crucial step in the preprocessing workflow is the removal of stop words. Stop words are commonly used words in any language (like ‘the’, ‘is’, ‘at’) that, while essential for sentence construction, offer minimal value in the context of text analysis. Their removal is pivotal as it allows the analytical focus to shift to the more meaningful and impactful words within the dataset, thereby enhancing the efficacy of subsequent analyses like sentiment analysis or topic modeling.
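A minimal sketch of such a preprocessing pipeline is shown below, assuming spaCy's en_core_web_sm model for tokenization and its built-in stop-word list; the exact cleaning rules used in the study are not specified, so this is one plausible realization:

```python
import re
import spacy

nlp = spacy.load("en_core_web_sm")  # lightweight model, used here only for tokenization

def preprocess(text: str) -> list[str]:
    """Normalize one comment and return its content-bearing tokens."""
    text = text.lower()                   # case normalization
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation removal
    doc = nlp(text)                       # tokenization
    # Drop stop words and whitespace tokens so analysis focuses on meaningful words.
    return [tok.text for tok in doc if not tok.is_stop and not tok.is_space]

print(preprocess("This is an EXAMPLE comment, with punctuation!"))
# ['example', 'comment', 'punctuation']
```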

Model Development

At the heart of this research lies the development of a sophisticated machine learning model, meticulously engineered to discern between hateful and non-hateful comments in online discourse. This model is the culmination of advanced computational techniques and linguistic analysis, aimed at addressing the nuanced and complex nature of hate speech. Two primary methodologies are employed to empower this model: word embeddings and TF-IDF (Term

(5)

Frequency-Inverse Document Frequency) vectorization, each contributing uniquely to the model’s classification capabilities.

Word Embeddings Approach

Utilizing the robust capabilities of SpaCy’s Natural Language Processing (NLP) library, the model employs word embeddings to transform words into vectors. This vector representation is not arbitrary; it encapsulates the rich contextual meanings and nuances associated with each word. By representing words as vectors, the model gains a profound understanding of the semantic relationships between words. This comprehension is pivotal, especially in the realm of hate speech, where nuances and context dictate the difference between offensive and benign expressions. The word embeddings approach enables the model to navigate through these subtleties, enhancing its ability to detect and interpret various forms of hate speech that may not be overt but are harmful nonetheless.
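As a sketch of this step, spaCy's en_core_web_lg model provides 300-dimensional static word vectors, and `doc.vector` averages them into a single vector per comment. The paper names SpaCy's large model but not the pooling strategy, so the averaging here is simply the library default rather than a stated design choice:

```python
import numpy as np
import spacy

nlp = spacy.load("en_core_web_lg")  # large model ships with static word vectors

comments = ["you are wonderful", "you are disgusting"]
# doc.vector is the mean of the token vectors: one 300-d embedding per comment.
X = np.vstack([nlp(c).vector for c in comments])
print(X.shape)  # (2, 300)
```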

TF-IDF Vectorization

In conjunction with word embeddings, the model leverages the TF-IDF vectorization technique. This method is instrumental in quantifying the textual data, transforming the words into numerical values based on their frequency and significance within the dataset. The uniqueness of TF-IDF lies in its dual focus: it considers not just the frequency of words in a single comment but also their relative rarity across the entire dataset. As presented in (1)-(3), this approach is particularly effective in distinguishing the linguistic patterns prevalent in hateful comments as opposed to non-hateful ones. It highlights words that are significantly more common in either category, providing a clear linguistic demarcation that aids the model in classification.

$$\text{TF-IDF}(t, d, D) = \text{TF}(t, d) \times \text{IDF}(t, D) \tag{1}$$

$$\text{TF}(t, d) = \frac{\text{Number of times term } t \text{ appears in document } d}{\text{Total number of terms in document } d} \tag{2}$$

$$\text{IDF}(t, D) = \log\left(\frac{\text{Total number of documents in set } D}{\text{Number of documents with term } t}\right) \tag{3}$$
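A minimal sketch using Sklearn's TfidfVectorizer (the library named later in the paper); note that its default smoothed IDF differs slightly from the textbook form in (3):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "you people are the problem",
    "thanks for the helpful answer",
]

# fit_transform learns the vocabulary and IDF weights, then returns a sparse
# (n_documents x n_terms) matrix of TF-IDF scores per equations (1)-(3),
# up to sklearn's default smoothing of the IDF term.
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
X = vectorizer.fit_transform(corpus)
print(vectorizer.get_feature_names_out())
print(X.toarray().round(2))
```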

Integration with Naive Bayes Classifier

The choice of the Naive Bayes classifier further strengthens the model’s classification prowess. Renowned for its efficiency and accuracy in text classification tasks, the Naive Bayes classifier as presented in (4) is an excellent fit for this application. It operates on the principle of probabilistic inference, estimating the likelihood of a comment being hateful or non-hateful based on the linguistic features identified by the word embeddings and TF-IDF vectorization.

$$P(c \mid \mathbf{x}) = \frac{P(\mathbf{x} \mid c)\, P(c)}{P(\mathbf{x})} \tag{4}$$

Training and Learning from the Processed Dataset

The model’s training is conducted on the meticulously preprocessed dataset, which has undergone extensive cleaning, normalization, and tokenization. This training phase is crucial as it is where the model learns to differentiate between hateful and non-hateful comments. By analyzing and understanding the linguistic patterns, word usage, and contextual nuances present in the dataset, the model develops the ability to accurately classify new, unseen comments.
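A compact sketch of this training step pairs the TF-IDF features with a Naive Bayes classifier in an Sklearn pipeline. The toy comments are invented for illustration, and the multinomial variant is an assumption, since the paper does not state which Naive Bayes formulation was used; for the embedding features, which contain negative values, GaussianNB would be the natural counterpart:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy stand-in for the preprocessed dataset; 1 = hateful, 0 = non-hateful.
comments = ["i hate you all", "have a great day", "you are trash", "nice work"]
labels = [1, 0, 1, 0]

# MultinomialNB estimates P(c | x) via equation (4) from term-frequency features.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(comments, labels)
print(model.predict(["you are all trash"]))  # [1] (hateful) on this toy corpus
```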

Validation Techniques

In order to validate the effectiveness of the developed models, a series of techniques are employed. Firstly, we split the data: the dataset is divided into training and testing sets, with 80% of the data used for training and the remaining 20% for testing. This separation ensures that the model’s performance is evaluated on unseen data, providing a more accurate measure of its predictive capabilities. Secondly, we focus on performance metrics: the models are evaluated using accuracy, precision, recall, and F1-score. Accuracy, as presented in (5), measures the overall correctness of the model, while precision and recall provide insights into its ability to correctly identify hateful comments without misclassifying non-hateful ones. The F1-score, as presented in (8), offers a balance between precision (6) and recall (7), providing a comprehensive view of model performance. Lastly, we provide a confusion matrix, which visualizes the performance of the classifier, showing the true positives, true negatives, false positives, and false negatives. This matrix also helps in understanding the model's strengths and weaknesses in classifying different classes.

$$\text{Accuracy} = \frac{\text{True Positives (TP)} + \text{True Negatives (TN)}}{\text{Total Observations}} \tag{5}$$

$$\text{Precision} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Positives (FP)}} \tag{6}$$

$$\text{Recall} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Negatives (FN)}} \tag{7}$$

$$\text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{8}$$
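Putting the split and the metrics together, the following is a self-contained sketch using Sklearn's helpers; the toy corpus is invented for illustration, and `classification_report` computes equations (5)-(8) per class:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy corpus standing in for the preprocessed dataset; 1 = hateful.
comments = ["i hate you", "lovely post", "you are scum", "great point",
            "go away idiot", "well argued", "disgusting people", "thank you"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# 80/20 split, stratified so both classes appear in the held-out test set.
X_train, X_test, y_train, y_test = train_test_split(
    comments, labels, test_size=0.20, random_state=0, stratify=labels)

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Per-class precision, recall, and F1 (equations (5)-(8)), plus the raw counts.
print(classification_report(y_test, y_pred,
                            target_names=["non-hateful", "hateful"], zero_division=0))
print(confusion_matrix(y_test, y_pred))  # rows: true class, columns: predicted
```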

RESULTS AND DISCUSSION

As presented in Figure 1, the deployment of SpaCy’s word embeddings in the construction of a Naive Bayes classifier has yielded interesting insights into the model’s ability to decipher and classify textual data into hateful and non-hateful comments. The word embedding approach aims to capture the semantic context of words by mapping them into a high-dimensional space, where the distance and direction between vectors are intended to represent the relationship between words. In this study, the precision for classifying non-hateful comments (class 0) stood at 66%, indicating that two-thirds of the comments predicted as non-hateful were indeed non-hateful.

Conversely, for the hateful comments (class 1), the precision was marginally lower at 63%, implying that the model was slightly less adept at correctly identifying hateful comments when it predicted them.

The recall metric, which gauges the model's ability to identify all relevant instances within a dataset, was uniform across classes at 65%. This suggests that the model was equally proficient—or deficient—in identifying both non-hateful and hateful comments within the data. It indicates that of all the actual non-hateful and hateful comments present, the model successfully retrieved 65% of each. When we look at the F1-Score, which combines precision and recall into a single metric by calculating their harmonic mean, we find a consistent value of 65% for both classes. This conveys that there is a balance between the model’s precision and recall, a desirable trait in classification tasks, especially when both false positives and false negatives carry significant implications. The model’s overall accuracy was 65%, a metric that encapsulates its general predictive capabilities across both classes of comments. While this figure provides a quick snapshot of model performance, it does not capture the nuances of class-specific performance, which is crucial in the context of hate speech detection.

Figure 1. Naïve Bayes + Word Embedding


Figure 2. Naïve Bayes + TFIDF

As presented in Figure 2, transitioning to the model developed using TF-IDF vectorization, there is a noticeable enhancement in its classification metrics. TF-IDF, an acronym for Term Frequency-Inverse Document Frequency, is a numerical statistic intended to reflect how important a word is to a document within a collection or corpus. The underlying principle behind TF-IDF is to downplay the significance of words that appear frequently across documents, presuming that they are less informative than words that appear less frequently. The precision for non-hateful comments in this model increased to 77%, suggesting that when the model predicts a comment as non-hateful, it is correct approximately three-quarters of the time. For hateful comments, precision stood at 73%, indicating that the model is relatively consistent in its predictive precision across both classes, albeit slightly more precise with non-hateful comments.

In terms of recall, the model registered 73% for non-hateful comments and an improved 77% for hateful comments. This increment in recall for hateful comments is particularly significant, as it reflects the model’s enhanced sensitivity to the detection of hate speech, which is the primary focus of this research. Detecting as many instances of hate speech as possible is vital to prevent the potential harm that such speech can propagate. The F1-scores for this model were 75% for both classes, showing that the model has a balanced performance between precision and recall and indicating that neither metric is disproportionately high or low. This balance is especially pertinent in practical applications where both false positives and false negatives have considerable consequences. The overall accuracy of the model was 75%. This marks a 10-percentage-point improvement over the model using SpaCy's word embeddings, signifying that the model is correct in its predictions three out of four times. This is a substantial improvement and suggests that the TF-IDF model is more adept at generalizing its predictions to new data. A confusion matrix for each model offers a visual representation of their classification accuracy, providing a breakdown of true positives, true negatives, false positives, and false negatives. From the matrices, it was observed that the TF-IDF model had a higher number of true positives and true negatives compared to the word embedding model, suggesting that the former is more competent in correctly classifying both hateful and non-hateful comments. For the detection of hateful comments, the high recall rate of 77% in the TF-IDF model is particularly noteworthy. This rate indicates that the TF-IDF model is less likely to overlook hateful content, a vital attribute for applications where the cost of missing such content is high.


CONCLUSION

The exploration of machine learning models to filter and classify online commentary as either 'hateful' or 'non-hateful' represents a significant endeavor in the quest to mitigate the proliferation of hate speech on digital platforms. This study has rigorously tested and compared two distinct models—one employing SpaCy's word embeddings and the other utilizing Sklearn's TF-IDF vectorization—to gauge their efficacy in this vital task. The results of the comprehensive evaluation, encompassing a suite of metrics including precision, recall, F1-score, and overall accuracy, have yielded insightful findings that contribute to the broader dialogue on automated content moderation. The model based on SpaCy's word embeddings showcased a commendable level of performance, with an overall accuracy of 65%. It displayed an equitable balance between precision and recall across both classes, indicating a model that does not disproportionately misclassify one category over the other. While the symmetrical F1-score of 65% for both classes highlights a satisfactory balance, it also suggests room for improvement, especially in a domain where the stakes of misclassification are high.

In contrast, the model that incorporated Sklearn's TF-IDF vectorization emerged as a superior classifier in this study, with a notable overall accuracy of 75%. The increase in precision and recall for both classes—particularly the recall for hateful comments at 77%—underscores the model's heightened sensitivity to the nuances of hate speech. The TF-IDF model has proven adept at discerning the contextual importance of terms, which is pivotal in distinguishing between hateful and non-hateful content accurately. Moreover, the confusion matrices provided a stark visual contrast between the two models' predictive capabilities, with the TF-IDF model demonstrating a higher number of true positives and true negatives—a testament to its robustness. Given the primary objective of identifying and curbing hateful content, the higher recall for hateful comments is particularly significant, as it reflects the model's capacity to capture a greater proportion of such detrimental content.

It is imperative to recognize that the model's effectiveness is inextricably linked to the dynamic and evolving nature of language and online discourse. The TF-IDF model's adeptness at identifying hate speech today does not guarantee its future effectiveness, necessitating continual retraining and updating of the model to adapt to new linguistic patterns and emergent slang. This research also underscores the criticality of selecting appropriate vectorization and machine learning techniques tailored to the unique requirements of content moderation systems. While the TF-IDF model has demonstrated promising results, a one-size-fits-all approach is not feasible due to the varied nature of online platforms and the context-dependent interpretation of hate speech.

Future research should, therefore, explore the integration of additional linguistic features and consider the ethical implications of automated moderation, such as the potential for over-censorship or bias. In conclusion, the findings advocate for a nuanced application of NLP techniques in the automated detection of online hate speech. The Sklearn TF-IDF vectorization model, with its superior recall and balanced precision, has proven to be more effective in this context. However, the evolution of online communication will inevitably require the development of even more sophisticated models and a commitment to iterative refinement. The ultimate goal remains steadfast: to create online spaces that are both safe and respectful, free from the scourge of hate speech.

REFERENCES

Abardazzou, N. (2023). Unmasking implicit abuse: a data-centric approach to detect online abusive language.


Anansaringkarn, P., & Neo, R. (2021). How can state regulations over the online sphere continue to respect the freedom of expression? A case study of contemporary ‘fake news’ regulations in Thailand. Information & Communications Technology Law, 30(3), 283–303.

Borrego-Díaz, J., & Galán-Páez, J. (2022). Explainable Artificial Intelligence in Data Science: From Foundational Issues Towards Socio-technical Considerations. Minds and Machines, 32(3), 485–531.

Çinar, N. (2020). The Rise Of Consumer Generated Content And Its Transformative Effect On Advertising. In Reimagining Communication: Mediation (pp. 193–209). Routledge.

Chen, J.-L., Dai, Y.-N., Grimaldi, N. S., Lin, J.-J., Hu, B.-Y., Wu, Y.-F., & Gao, S. (2022). Plantar Pressure-Based Insole Gait Monitoring Techniques for Diseases Monitoring and Analysis: A Review. Advanced Materials Technologies, 7(1), 2100566.

De Gregorio, G. (2020). Democratising online content moderation: A constitutional framework. Computer Law & Security Review, 36, 105374.

Duwairi, R., Hayajneh, A., & Quwaider, M. (2021). A deep learning framework for automatic detection of hate speech embedded in Arabic tweets. Arabian Journal for Science and Engineering, 46, 4001–4014.

Eusebius, S. (2020). Customer-based brand equity in a digital age: An analysis of brand associations in user-generated social media content. University of Otago.

Hassija, V., Chamola, V., Mahapatra, A., Singal, A., Goel, D., Huang, K., … Hussain, A. (2023). Interpreting black-box models: a review on explainable artificial intelligence. Cognitive Computation, 1–30.

Heath, R. (2020). Branding in a Digitally Empowered World: The Role of User-Generated Content. Auckland University of Technology.

Homayounfar, S. Z., & Andrew, T. L. (2020). Wearable sensors for monitoring human motion: a review on mechanisms, materials, and challenges. SLAS TECHNOLOGY: Translating Life Sciences Innovation, 25(1), 9–24.

Iosifidis, P., & Nicoli, N. (2020). Digital democracy, social media and disinformation. Routledge.

Kalra, V., Kashyap, I., & Kaur, H. (2022). Improving document classification using domain-specific vocabulary: hybridization of deep learning approach with TFIDF. International Journal of Information Technology, 14(5), 2451–2457.

Kiritchenko, S., Nejadgholi, I., & Fraser, K. C. (2021). Confronting abusive language online: A survey from the ethical and human rights perspective. Journal of Artificial Intelligence Research, 71, 431–478.

Liu, Y., Han, T., Ma, S., Zhang, J., Yang, Y., Tian, J., … others. (2023). Summary of ChatGPT-related research and perspective towards the future of large language models. Meta-Radiology, 100017.

Lukings, M., & Habibi Lashkari, A. (2022). Technical Complexities. In Understanding Cybersecurity Law in Data Sovereignty and Digital Governance: An Overview from a Legal Perspective (pp. 117–180). Springer.

Machova, K., Srba, I., Sarnovský, M., Paralič, J., Kresnakova, V. M., Hrckova, A., … others. (2021). Addressing False Information and Abusive Language in Digital Space Using Intelligent Approaches. Towards Digital Intelligence Society: A Knowledge-Based Approach, 3–32.

Mann, B. L. (2020). Applying Internet Laws and Regulations to Educational Technology. IGI Global.


Markov, Č., & Đorđević, A. (2024). Becoming a Target: Journalists’ Perspectives on Anti-Press Discourse and Experiences with Hate Speech. Journalism Practice, 18(2), 283–300.

McDermid, J. A., Jia, Y., Porter, Z., & Habli, I. (2021). Artificial intelligence explainability: the technical and ethical dimensions. Philosophical Transactions of the Royal Society A, 379(2207), 20200363.

Miric, M., Jia, N., & Huang, K. G. (2023). Using supervised machine learning for large-scale classification in management research: The case for identifying artificial intelligence patents. Strategic Management Journal, 44(2), 491–519.

Nguyen, T. T., Huynh, T. T., Yin, H., Weidlich, M., Nguyen, T. T., Mai, T. S., & Nguyen, Q. V. H. (2023). Detecting rumours with latency guarantees using massive streaming data. The VLDB Journal, 32(2), 369–387.

Nodehi, I., Hassannataj Joloudari, J., Sharifrazi, D., Nematollahi, M., Marefat, A., Çifçi, M. A., & Hussain, S. (n.d.). OCSVM-CNN: Malicious Script Detection Using One-Class Support Vector Machine Combined with Convolutional Neural Network.

Rodriguez, P. L., & Spirling, A. (2022). Word embeddings: What works, what doesn’t, and how to tell the difference for applied research. The Journal of Politics, 84(1), 101–115.

Shawkat, N. (2023). Evaluation of Different Machine Learning, Deep Learning and Text Processing Techniques for Hate Speech Detection.

Tworek, H. J. S. (2021). Fighting hate with speech law: Media and German visions of democracy. The Journal of Holocaust Research, 35(2), 106–122.

Vidgen, B., & Yasseri, T. (2020). Detecting weak and strong Islamophobic hate speech on social media. Journal of Information Technology & Politics, 17(1), 66–78.

Xi, Z., Chen, W., Guo, X., He, W., Ding, Y., Hong, B., … others. (2023). The rise and potential of large language model based agents: A survey. ArXiv Preprint ArXiv:2309.07864.

Yin, W., & Zubiaga, A. (2021). Towards generalisable hate speech detection: a review on obstacles and solutions. PeerJ Computer Science, 7, e598.

Zhu, Y., Yuan, H., Wang, S., Liu, J., Liu, W., Deng, C., … Wen, J.-R. (2023). Large language models for information retrieval: A survey. ArXiv Preprint ArXiv:2308.07107.
