Sentence-Level Sentiment Analysis using Dependency Embedding and a Pre-trained BERT Model
Fariska Zakhralativa Ruskanda 1,*, Stefanus Stanley Yoga Setiawan 1, Nadya Aditama 1, Masayu Leylia Khodra 1
* Correspondence Author: e-mail: [email protected]
1 Sekolah Teknik Elektro dan Informatika, Institut Teknologi Bandung; Jl. Ganesha 10 Bandung, Indonesia; telp: (022) 2502260; e-mail:
[email protected], [email protected], [email protected], [email protected] Submitted : 14/02/2023
Revised : 28/02/2023 Accepted : 14/03/2023 Published : 31/03/2023
Abstract
Sentiment analysis is a valuable field of research in NLP with many applications. The dependency tree is one of the linguistic features that can be exploited in this field. Dependency embedding, a semantic representation of a sentence, has been shown to outperform other embeddings, which makes it a promising way to improve the performance of sentiment analysis tasks. This study investigated the effect of dependency embedding on sentence-level sentiment analysis through experimental research. We replaced the Vocabulary Graph embedding in the VGCN-BERT sentiment classification architecture with several dependency embedding representations: the word vector, the context vector, the average of the word and context vectors, a weighted combination of the word and context vectors, and the concatenation of the word and context vectors. The experiments were conducted on two datasets, SST-2 and CoLA, comprising more than 19 thousand labeled sentences in total. The results indicate that dependency embedding can enhance the performance of sentence-level sentiment analysis.
Keywords: dependency embedding, sentiment analysis, sentence-level, pretrained model
1. Introduction
Sentiment analysis, also known as opinion mining, is a valuable technique used to analyze people's opinions, sentiments, evaluations, attitudes, and emotions towards a particular entity or attribute through text (Liu, 2020).
This field, which falls under Natural Language Processing (NLP), has garnered significant interest and has been applied successfully in various cases.
Sentiment analysis research is typically conducted at three levels of granularity: document, sentence, and aspect level (Liu, 2020). While both document- and sentence-level analyses predict whether a text is positive or negative, aspect-level analysis goes a step further by identifying the target of an opinion and assigning a positive or negative polarity to that specific opinion.
To conduct sentiment analysis at these levels, supervised learning methods are commonly used, often with word embeddings as the word representation technique. One study that applied this approach to sentence-level sentiment analysis is Lu et al. (2020), who proposed the VGCN-BERT architecture for text classification. This architecture combines local information from BERT with global information from the language to improve performance, and it outperformed other text classification methods in their experiments. One interesting aspect of this architecture is that it combines two types of embeddings, word and graph embeddings, which interact with each other through self-attention. As a result, other word embedding techniques can also be plugged into this architecture.
Various techniques have been used to generate word embeddings, such as the neural skip-gram embedding introduced by Mikolov, Chen, et al. (2013) together with the negative sampling objective proposed by Mikolov, Sutskever, et al. (2013). However, this technique is limited in capturing the context of the target word because only a fixed window of surrounding words is considered. To address this issue, Levy & Goldberg (2014) proposed dependency embedding, an extension of the negative-sampling skip-gram that uses the relations between words, extracted by a dependency parser, as the contexts of a word. Dependency embedding has been applied in various tasks, including text classification (Komninos & Manandhar, 2016) and sequence labeling (Reimers & Gurevych, 2017). These studies indicate that dependency embedding, which encodes syntactic information, performs better than skip-gram word embedding (Komninos & Manandhar, 2016; Levy & Goldberg, 2014).
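As an illustration of how such dependency-based contexts are formed, the sketch below extracts (word, context) pairs in the style of Levy & Goldberg (2014) from a hand-written toy parse. The parse triples, the example sentence, and the function name are illustrative assumptions, not the authors' code; in practice the triples come from a dependency parser.

```python
# Sketch of dependency-based context extraction (Levy & Goldberg style).
# Each dependent gets the context 'head_word/relation'; each head gets
# the inverse context 'dep_word/relation-1'.

def dependency_contexts(parse):
    """Yield (word, context) pairs from (word, head_word, relation) triples."""
    pairs = []
    for word, head, rel in parse:
        if head is None:                         # the root has no head context
            continue
        pairs.append((word, f"{head}/{rel}"))    # dependent side
        pairs.append((head, f"{word}/{rel}-1"))  # inverse, head side
    return pairs

# Hand-written toy parse of "the movie was great" (illustrative only)
toy_parse = [
    ("the", "movie", "det"),
    ("movie", "great", "nsubj"),
    ("was", "great", "cop"),
    ("great", None, "root"),
]
```

Running `dependency_contexts(toy_parse)` produces pairs such as `("movie", "great/nsubj")`, matching the "movie" with nsubj example discussed in the Research Method section.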
In this study, we conducted an experiment to investigate the effectiveness of incorporating dependency embedding into sentence-level sentiment analysis through text classification with a pre-trained BERT model. The aim was to assess the impact of this embedding on the performance of sentiment analysis. The results show that dependency embedding improved performance compared to other embeddings. Concretely, we modified the VGCN-BERT architecture by replacing the Vocabulary Graph with dependency embedding. Our contribution is the exploration of the effect of dependency embedding on sentence-level sentiment analysis with a pre-trained BERT model. Notably, little prior research has examined the effect of dependency embedding in sentiment analysis tasks, making this study an important addition to the field.
2. Research Method
Dependency embedding is constructed from the dependency parse trees of the sentences in the corpus. An example of a dependency parse tree is shown in Figure 1. A dependency tree contains the words in a sentence and the relations between them (Marneffe & Manning, 2008).
Source: Marneffe & Manning (2008)
Figure 1. Dependency Parse tree
In the studies conducted by Levy and Goldberg (2014) and Komninos and Manandhar (2016), two types of vectors were generated from dependency embedding, namely word vectors and context vectors. Word vectors represent a word without considering its dependency type. On the other hand, context vectors represent a word along with its dependency relation, such as the word
"movie" with the nsubj (nominal subject) dependency relation. In this study, two types of embeddings were used: Levy embedding (Levy & Goldberg, 2014) and Komninos embedding (EXT) (Komninos & Manandhar, 2016).
In the experiments, several methods were used to build word vectors and context vectors, namely:
a. Word vector method only: This method uses the word vector from the dependency embedding for each word in a sentence (Levy embedding).
b. Context vector method only: This method uses the context vector for each word and its dependency pair in a sentence. If a word has more than one dependency, the context vectors are averaged.
c. Average method of both vectors: This method combines the previous two by averaging the word vector and the context vector for each word. For words with more than one dependency, the context vectors are averaged first before being averaged with the word vector.
d. Weighting method for each vector: This method is similar to method (c), but the word vector and context vector are given different weights. In method (c) the weights are equal (1:1), whereas here they can differ, for example 3:2 (i.e., 0.6:0.4).
e. Concatenation method of word and context vectors: This method concatenates the averaged context vector with the word vector of a word. If a word has no dependency, so that no context vector exists, the word vector is concatenated with a zero vector.
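The five combination methods above can be sketched as follows. The toy embedding tables, dimension, and function names are illustrative assumptions standing in for the real Levy / Komninos (EXT) embedding tables, not the study's actual implementation.

```python
import numpy as np

# Sketch of the five vector-combination methods (a)-(e) described above.
# word_vec and ctx_vecs are toy lookup tables; DIM and all values are made up.
DIM = 4
rng = np.random.default_rng(0)
word_vec = {"movie": rng.normal(size=DIM), "great": rng.normal(size=DIM)}
ctx_vecs = {"movie": [rng.normal(size=DIM)],                         # one dependency
            "great": [rng.normal(size=DIM), rng.normal(size=DIM)]}   # two dependencies

def avg_context(w):
    # Words with several dependencies get their context vectors averaged.
    return np.mean(ctx_vecs[w], axis=0)

def word_only(w):                 # method (a): word vector only
    return word_vec[w]

def context_only(w):              # method (b): (averaged) context vector only
    return avg_context(w)

def average(w):                   # method (c): equal-weight (1:1) average
    return (word_vec[w] + avg_context(w)) / 2

def weighted(w, a=0.6, b=0.4):    # method (d): unequal weights, e.g. 0.6:0.4
    return a * word_vec[w] + b * avg_context(w)

def concat(w):                    # method (e): concatenation; a zero vector
    ctx = avg_context(w) if ctx_vecs.get(w) else np.zeros(DIM)  # if no dependency
    return np.concatenate([word_vec[w], ctx])
```

Note that `weighted(w, 0.5, 0.5)` reduces to `average(w)`, making (c) a special case of (d).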
In this study, an experiment was conducted using dependency embedding in sentence-level sentiment analysis tasks, inspired by the VGCN-BERT architecture described in the previous section. Dependency extraction was performed with the Stanford parser version 3.5 for the Levy dependencies, and with the Stanza 1.4.0 parser, which uses Universal Dependencies, for the Komninos (EXT) dependencies. It should be noted that the two parsers may extract different dependencies: the dependencies used by Levy follow the Stanford typed dependency annotation, while the Stanza parser follows the Universal Dependencies annotation scheme. To align them, the labels produced by the Stanford parser were converted according to the rules set by Nivre & McDonald (2008).
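This label conversion can be sketched as a simple lookup table, as below. The specific mapping entries shown are illustrative examples of Stanford-to-UD correspondences, not the complete rule set applied in the study.

```python
# Hedged sketch of converting Stanford typed dependency labels to
# Universal Dependencies labels, so that Levy-style contexts line up
# with Stanza output. Only a few illustrative entries are shown.

SD_TO_UD = {
    "nsubjpass": "nsubj:pass",   # passive nominal subject
    "dobj": "obj",               # direct object
}

def convert_label(label):
    """Map a Stanford label to its UD counterpart, keeping it unchanged
    when the two schemes already agree (e.g. nsubj)."""
    return SD_TO_UD.get(label, label)
```

Labels shared by both schemes, such as `nsubj`, pass through unchanged.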
Source: Research Result (2023)
Figure 2. Proposed Method
To incorporate dependency embedding with BERT, the padding tokens in BERT are substituted with the output of the dependency embedding before inputting it to the transformer block. This process is carried out by replacing the right padding of BERT with the result of dependency embedding, as illustrated in Figure 2. The proposed method is based on the VGCN approach (Lu et al., 2020), which integrates the output of the Vocabulary Graph with BERT before feeding it to the transformer block.
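The padding-substitution step can be sketched as follows, assuming the dependency vectors have already been projected to BERT's hidden size. All shapes, names, and the zero-valued padding convention are illustrative assumptions, not the study's actual code.

```python
import numpy as np

# Sketch of the padding substitution described above: the first
# right-padding slots of BERT's input embedding sequence are overwritten
# with dependency-embedding vectors before the transformer blocks.
HIDDEN, SEQ_LEN = 8, 10
n_real = 6                                   # number of real (non-pad) tokens
token_embs = np.random.randn(SEQ_LEN, HIDDEN)
token_embs[n_real:] = 0.0                    # right padding, zeroed for clarity

dep_embs = np.random.randn(2, HIDDEN)        # dependency vectors, assumed already
                                             # projected to BERT's hidden size

def substitute_padding(token_embs, dep_embs, n_real):
    out = token_embs.copy()
    k = min(len(dep_embs), len(out) - n_real)   # never overrun the sequence
    out[n_real:n_real + k] = dep_embs[:k]       # overwrite the first pad slots
    return out

fused = substitute_padding(token_embs, dep_embs, n_real)
```

The real-token embeddings are left untouched; only trailing pad positions carry the dependency information into the transformer.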
3. Results and Analysis
The experiments in this study were conducted on two datasets. The first is the SST-2 (Stanford Sentiment Treebank) dataset, whose statistics are shown in Table 1 (Socher et al., 2013). The second is CoLA (Corpus of Linguistic Acceptability), a binary single-sentence classification dataset (Warstadt et al., 2019). It contains 9,594 instances in total, with 8,551 for training and 1,043 for testing. The label composition of this dataset is 6,744 positive labels and 2,850 negative labels.
Table 1. SST-2 Dataset Statistics

                Train   Dev   Test
Total Data       6920   872   1821
Total Label 0    3310   444    912
Total Label 1    3610   428    909

Source: Research Result (2023)
In this study, two pretrained BERT models were used: bert-base-uncased and bert-large-uncased. The optimizer was Adam with a learning rate of 1e-5. In the experiments with BERT large, several trials with different weight initializations were conducted because of inconsistencies in which the model would only predict the majority class. The results of the experiments are shown in Table 2.
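For reference, the experimental configuration described above can be summarized as a plain settings dictionary. This is a sketch of the reported settings, not the authors' training script; the key names are illustrative.

```python
# Summary of the reported experimental configuration. The model names are
# the standard Hugging Face identifiers for the checkpoints mentioned in
# the text; all key names here are illustrative.
TRAINING_CONFIG = {
    "pretrained_models": ["bert-base-uncased", "bert-large-uncased"],
    "optimizer": "Adam",
    "learning_rate": 1e-5,
    # bert-large runs were repeated with different weight initializations
    # because some runs collapsed to predicting the majority class only.
    "bert_large_reruns": "multiple weight initializations",
}
```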
Table 2. Experimental Results of BERT – Dependency Embedding

Dataset  Model                                               Dev F1 (wtd)  Dev Acc  Test F1 (wtd)  Test Acc
CoLA     BERT-Deps-Words                                     0.8295        0.8384   0.8058         0.8169
         BERT-Deps-Contexts                                  0.5768        0.7002   0.5651         0.6913
         BERT-Deps-Words-Contexts                            0.5768        0.7002   0.5651         0.6913
         BERT-BoW2-Words                                     0.7891        0.8056   0.7959         0.8111
         BERT-BoW2-Contexts                                  0.5768        0.7002   0.5651         0.6913
         BERT-BoW2-Words-Contexts                            0.8154        0.8267   0.8183         0.8284
         BERT-BoW5-Words                                     0.5768        0.7002   0.5651         0.6913
         BERT-BoW5-Contexts                                  0.5768        0.7002   0.5651         0.6913
         BERT-BoW5-Words-Contexts                            0.8259        0.8314   0.8078         0.8150
         BERT-Agg-Word-Context                               0.8069        0.8173   0.8081         0.8178
SST-2    VGCN
         BERT-Deps-Words                                     0.7984        0.7993   0.8068         0.8078
         BERT-Deps-Contexts                                  0.9047        0.9048   0.8955         0.8957
         BERT-Deps-Words-Contexts                            0.7861        0.7867   0.8184         0.8188
         BERT-BoW2-Words                                     0.9174        0.9174   0.9011         0.9012
         BERT-BoW2-Contexts                                  0.9140        0.9140   0.9033         0.9033
         BERT-BoW2-Words-Contexts                            0.9014        0.9014   0.9028         0.9028
         BERT-BoW5-Words                                     0.8040        0.8050   0.8086         0.8094
         BERT-BoW5-Contexts                                  0.7991        0.7993   0.8133         0.8131
         BERT-BoW5-Words-Contexts                            0.9002        0.9002   0.9083         0.9083
         BERT-Agg-Word-Contexts (without relation conversion) 0.9140       0.9140   0.9017         0.9017
         BERT-Agg-Word-Contexts                              0.9082        0.9083   0.9055         0.9055
         BERT-Fixed-Agg                                      0.9071        0.9071   0.9094         0.9094
         BERT-Fixed-Agg-0.6Word-0.4Deps                      0.9106        0.9106   0.9099         0.9099
         BERT-Fixed-Agg-0.7Word-0.3Deps                      0.9105        0.9106   0.9104         0.9105
         BERT-Fixed-Agg-0.8Word-0.2Deps                      0.9105        0.9106   0.9094         0.9094
         BERT-Fixed-Agg-0.9Word-0.1Deps                      0.9106        0.9106   0.9088         0.9088
         BERT-Ext-Agg-0.5Word-0.5Deps                        0.9117        0.9117   0.9055         0.9055
         BERT-Ext-Agg-0.6Word-0.4Deps                        0.9151        0.9151   0.9061         0.9061

Source: Research Result (2023)
Discussion
The experiments on the SST-2 dataset processed with the VGCN method show that BERT base achieves a weighted F1 score of 89.91 without dependency embedding. With dependency embedding, the score increases by about 2 points. Using the EXT embedding and weighting the word vector and context vector with a ratio of 0.6 to 0.4 yields a score of 91.51, while the Levy embedding with the same weighting yields 91.06. On the CoLA dataset, the best result is obtained with the BERT-Deps-Words model, with an accuracy of 83.84.
From these results, it can be concluded that dependency embedding can improve the performance of the BERT model, although the improvement is not large. This is likely because BERT already captures dependency information implicitly in its representations.
4. Conclusion
In this article, an experimental study on the use of dependency embedding for sentence-level sentiment analysis was presented. The best performance was achieved with a dependency embedding representation consisting of a 0.6:0.4 weighted aggregation of the word and context vectors, with a weighted F1 score of 0.9151. However, combining dependency embedding with BERT did not yield a significant performance improvement, likely because the information accessed by dependency embedding is already largely captured by the BERT model itself. Additionally, the results indicate that the context representation in dependency embedding did not yield better results than the word representation.
Acknowledgements
This work is supported by the PPMI - Young Researcher Grant 2022 from STEI ITB.
Author Contributions
Fariska proposed the topic; Fariska and Masayu conceived the proposed method; Stefanus and Nadya designed and performed the experiments; Fariska, Masayu, and Nadya wrote the paper.
Conflicts of Interest
The authors declare no conflict of interest.
References
Komninos, A., & Manandhar, S. (2016). Dependency Based Embeddings for Sentence Classification Tasks. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics:
Human Language Technologies, 1490–1500.
Levy, O., & Goldberg, Y. (2014). Dependency-based Word Embeddings.
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 302–308.
Liu, B. (2020). Sentiment Analysis: Mining Opinions, Sentiments, and Emotions (2nd ed.). Cambridge University Press.
Lu, Z., Du, P., & Nie, J. Y. (2020). VGCN-BERT: Augmenting BERT with Graph Embedding for Text Classification. Advances in Information Retrieval: 42nd European Conference on IR Research, ECIR 2020, Proceedings, 369–382.
Marneffe, M.-C. de, & Manning, C. D. (2008). The Stanford Typed Dependencies Representation. Coling 2008: Proceedings of the Workshop on Cross-Framework and Cross-Domain Parser Evaluation, 1–8.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed Representations of Words and Phrases and Their Compositionality. Advances in Neural Information Processing Systems, 26.
Nivre, J., & McDonald, R. (2008). Integrating Graph-based and Transition-based Dependency Parsers. Proceedings of ACL-08: HLT, 950–958.
Reimers, N., & Gurevych, I. (2017). Optimal Hyperparameters for Deep LSTM-Networks for Sequence Labeling Tasks. arXiv preprint arXiv:1707.06799.
Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A. Y., &
Potts, C. (2013). Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 1631–1642.
Warstadt, A., Singh, A., & Bowman, S. R. (2019). Neural network acceptability judgments. Transactions of the Association for Computational Linguistics, 7, 625–641.