Sentiment Analysis: Amazon Electronics Reviews


Sentiment Analysis: Amazon Electronics Reviews Using BERT and Textblob

Authors ElKafrawy, Passent; Mahgoub, Abdulrahman; Atef, Hesham; Nasser, Abdulrahman; Yasser, Mohamed; Medhat, Walaa M.; Darweesh, M. Saeed

Citation A. Mahgoub et al., "Sentiment Analysis: Amazon Electronics Reviews Using BERT and Textblob," 2022 20th International Conference on Language Engineering (ESOLEC), Cairo, Egypt, 2022, pp. 6-10,

DOI https://doi.org/10.1109/ESOLEC54569.2022.10009176

Publisher IEEE

Download date 21/06/2023 04:22:22

Link to Item http://hdl.handle.net/20.500.14131/704


Sentiment Analysis: Amazon Electronics Reviews

Abdulrahman Emad Mahgoub¹, Abdulrahman Nasser¹, Mohamed Yasser Anwar¹, Hesham Atef¹, M. Saeed Darweesh²,³, and Passent M. El-Kafrawy¹,⁴

¹ School of Information Technology and Computer Science, Nile University, Giza 12588, Egypt

² School of Engineering and Applied Sciences, Nile University, Giza 12677, Egypt

³ Wireless Intelligent Networks Center (WINC), Nile University, Giza 12677, Egypt

⁴ Mathematics and Computer Science Department, Menoufia University, Shebin El-Kom 32511, Egypt

Abstract — The market needs a deeper and more comprehensive understanding of its customers, which is where analytics methodologies such as sentiment analysis come in. These methods can help people, especially business owners, gain real-world insights into their businesses and determine whether customers are satisfied. This paper gathers real-world Amazon customer reviews from Egypt and applies two sentiment analysis methods, Bidirectional Encoder Representations from Transformers (BERT) and TextBlob, to determine the overall satisfaction of Egyptian customers. The study focuses on a specific domain, the electronics department, and compares the two models in Arabic and English. The results show that customers on Amazon.eg are mostly satisfied, with positive sentiment averaging 47%. In terms of performance, BERT outperformed TextBlob, which indicates that word-embedding models (BERT) are superior to rule-based models (TextBlob), with a difference in accuracy of 15% to 25%.

Keywords— Sentiment, Analysis, BERT, TextBlob, Amazon

I. INTRODUCTION

The internet is regarded as one of the most important sources of consumer opinion, and it has enabled the launch of many review websites. On these websites, customers can offer their ratings and thoughts about many things, including films, eateries, hotels, gadgets, and books. Amazon, which offers millions of user evaluations across various product categories, is an example of the increasing availability and popularity of opinion-rich resources. It is no different in Egypt, where Amazon.eg received more than 8.9M visits in the last month alone, which makes it a massive market with plenty of opportunities [10]. Our question is how many of those customers are really satisfied with the services they are getting. Amazon has its own rating system for each product, but it does not provide a measure of satisfaction across the entire website, which is what we aim to produce.

Sentiment analysis is the contextual mining of text, typically feedback or reviews about brands or products on social media, which helps marketers determine whether their product is likely to attract demand in the market. As a data mining technique, it uses NLP, computational linguistics, and text analysis to identify and extract content of interest from a body of text, which can help measure customer satisfaction on Amazon.eg.

This paper also compares rule-based models (textblob-ar) and word-embedding models (CAMeLBERT) in Arabic.

The main objectives of this paper are:

• Measuring customer satisfaction on Amazon.eg. To achieve this, the study scrapes customer reviews from Amazon.eg and applies preprocessing, such as handling missing data and removing links, tags, numerical values, and stop words, along with other cleaning techniques. BERT and TextBlob are then applied, the data are visualized, and conclusions are drawn.

• Comparing and evaluating TextBlob and BERT in both Arabic and English. To evaluate the performance of the two sentiment analysis models, the outputs of both models were compared with the actual review ratings. This step is crucial for evaluating the models used.

The rest of this paper is organized as follows. Section II presents related work that addressed the same problem as ours. Section III presents the background. Section IV presents the methodology, consisting of dataset specifications, data collection and preprocessing, and sentiment analysis and evaluation. The results are discussed in Section V, and Section VI concludes the paper.

II. RELATED WORK

In [2], the authors classified customer reviews and then determined the sentiment of each review in order to visualize and summarize the results. Combining review classification with sentiment analysis provided accurate reviews to the user.

In [3], the author examined the effectiveness of different supervised machine learning techniques for classifying online reviews, as well as the extraction of product-feature perception for deducing adjective polarity when the polarity is unknown.

Sentiment analysis was applied to previously gathered training data. The product-feature perception subtask lacked sufficient test data and was more complicated than document-level sentiment analysis, so more care is needed to obtain verifiable results. The polarity deduction subtask was something of an afterthought compared with the other subtasks conducted in that research.


In [4], the authors polarized customer feedback on different products by applying supervised learning to a large-scale Amazon dataset, achieving an accuracy of over 90%. Different simulations were run using cross-validation, varying training-testing ratios, and different feature extraction processes to compare results across varying amounts of data.

In [1], the author implemented and tested an aspect-level approach on Amazon customer reviews, in which aspect terms are first identified for each review. The system performs preprocessing operations so that meaningful information can be extracted and classified as either positive or negative. Words that change polarity depending on context were identified, and their effect on the overall rating of the product, along with the aspect, was analyzed.

III. BACKGROUND

Web Scraping:

Web scraping is the process of using software tools and packages to extract data from a website or a single webpage. Such tools are typically preferred because they are more effective and quicker than manual approaches. The main aim of web scraping is to collect particular data from various websites.

The collected data are then transformed by these tools into an organized format for users.
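The extract-then-organize idea can be sketched with the Python standard library alone. This is a minimal illustration, not the scraper used in this study: the HTML snippet, the class names `review`, `rating`, and `content`, and the helper `reviews_to_csv` are all hypothetical, and real Amazon pages use different and more complex markup.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup for illustration only; real review pages differ.
SAMPLE_HTML = """
<div class="review"><span class="rating">4.0</span><p class="content">Works well, charges fast.</p></div>
<div class="review"><span class="rating">1.0</span><p class="content">Stopped working after a week.</p></div>
"""


class ReviewParser(HTMLParser):
    """Collects one {rating, content} dict per hypothetical review block."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished reviews
        self._field = None   # name of the field currently being read
        self._current = {}   # review under construction

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls == "review":
            self._current = {}
        elif cls in ("rating", "content"):
            self._field = cls

    def handle_data(self, data):
        # Text outside a rating/content element is ignored.
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if self._field:
            self._field = None            # closing tag of a field element
        elif tag == "div" and self._current:
            self.rows.append(self._current)  # closing tag of a review block
            self._current = {}


def reviews_to_csv(html: str) -> str:
    """Turn raw HTML into organized CSV text with a header row."""
    parser = ReviewParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["rating", "content"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(parser.rows)
    return out.getvalue()


csv_text = reviews_to_csv(SAMPLE_HTML)
print(csv_text)
```

The unstructured page becomes one row per review, which is the organized format that later steps such as preprocessing and classification operate on.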

Sentiment Analysis:

Sentiment analysis is the process of analyzing text, such as product reviews on the internet, to determine the overall opinions expressed about a product, so that the text can be classified as positive, negative, or neutral. It specifically focuses on evaluating the opinions on a topic of interest using machine learning techniques. The machine learning approach, which is the automatic approach to sentiment analysis, is more widely used for sentiment classification than the linguistic approach, which is rule-based.

However, these approaches do not perform as well on sentiment classification as they do on topic categorization. Because opinionated text requires a deeper understanding of the text, machine learning classifiers such as naïve Bayes, maximum entropy, and support vector machines are used for sentiment classification to achieve high accuracy.

Feedback and reviews, which are user-generated content, are a rich source for marketing specialists concerned with public mood and with customers' personal attitudes towards the brands and products a marketer offers. Because of the diversity and size of social media data, sentiment analysis is applied instead of having individuals or companies collect data manually, as it provides automated, real-time opinion extraction and mining.

Bidirectional Encoder Representations from Transformers Model [BERT Model]:

Bidirectional Encoder Representations from Transformers, or BERT, is a pre-trained language model designed to consider the context of a word from both the left and the right side simultaneously. It improves results on several NLP tasks, such as sentiment analysis and question answering.

As a pre-trained language model, BERT learns contextual representations of words from unannotated training data. By considering the left and right sides simultaneously, it can extract more context features from a sequence.

BERT can be adapted to different NLP tasks with state-of-the-art accuracy, in a way similar to transfer learning in computer vision, which allows accurate models to be built in a time-saving way.

TextBlob Model:

TextBlob is a Python library that offers a simple API, with objects that behave much like Python strings, for performing NLP tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, and translation.

TextBlob produces polarity and subjectivity as its output. The polarity score ranges from -1 to 1, where 1 indicates the most positive terms (best, great, good, etc.) and -1 indicates the most negative terms (worst, terrible, bad, etc.). The subjectivity score ranges from 0 to 1 and indicates how much the text is based on individual opinion. If a sentence is highly subjective, that is, close to 1, the material is more opinionated than factual.

As a lexicon-based sentiment analyzer, TextBlob contains predefined rules together with a dictionary of words and weights, whose scores help in calculating the polarity of a sentence. Lexicon-based sentiment analyzers can therefore also be referred to as rule-based sentiment analyzers.
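To make the lexicon-based idea concrete, the following is a toy scorer written for illustration. It is not TextBlob's implementation: the word weights, the negation list, and the function name `toy_polarity` are assumptions chosen only to show how a word-and-weight dictionary plus a simple rule can yield a polarity in the range -1 to 1.

```python
# Toy lexicon: the words and weights are illustrative assumptions,
# not TextBlob's actual dictionary.
LEXICON = {
    "best": 1.0, "great": 0.8, "good": 0.7,
    "bad": -0.7, "terrible": -1.0, "worst": -1.0,
}
NEGATIONS = {"not", "never", "no"}


def toy_polarity(sentence: str) -> float:
    """Average the weights of known words.

    Rule: a known word that immediately follows a negation has its sign flipped.
    Sentences with no known words score 0.0 (neutral).
    """
    scores = []
    negate = False
    for token in sentence.lower().split():
        word = token.strip(".,!?")
        if word in NEGATIONS:
            negate = True
            continue
        if word in LEXICON:
            weight = LEXICON[word]
            scores.append(-weight if negate else weight)
        negate = False  # the negation only affects the next word
    return sum(scores) / len(scores) if scores else 0.0


for text in ["The best charger", "not good", "Arrived on Monday"]:
    print(text, "->", toy_polarity(text))
```

Because every weight lies between -1 and 1, the average does too. A scorer of this kind has no notion of context beyond its hand-written rules, which is the main contrast with an embedding model such as BERT.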

IV. METHODOLOGY

A. Data collection

The first step of the proposed solution is to collect the reviews, since any sentiment analysis needs data first. We used an existing web scraper to fetch user reviews of given products on Amazon.eg, reading the review URLs from a .txt file.

The result is saved in .csv format and loaded into a DataFrame. Its attributes are the date of the review, the review title, the URL of the review, the rating, the username, the user review, the number of product reviews collected, the product name, the variant, and whether the user is verified.


title: خلى بالك واقرأ عنه اكتر قبل ماتشتريه (roughly: "Be careful and read more about it before you buy it")
content: ما بعد استخدمته اسبوعين قدرة البطاريه نزلت ١٪ ... (roughly: "After two weeks of use the battery capacity dropped 1% ...")
date: 16 Oct 2021
variant: Color: White
images: NaN
verified: Yes
author: معاذ م.
rating: 4.0
product: Anker USB C Charger 20W, 511 Charger ( Nano ),...
url: https://www.amazon.eg/-/en/Anker-Charger-Durab...

Figure [1]: A sample of the scraped data

B. Preprocessing of data

This phase is one of the most important, because cleaning the data improves the effectiveness of the results. Preprocessing included handling missing data and removing links, tags, numerical values, and stop words, along with other cleaning techniques. Reviews written in any language other than English or Arabic were dropped.
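The cleaning steps listed above can be sketched as a single function. This is a minimal sketch under stated assumptions rather than the pipeline used in the study: the function name `clean_review`, the regular expressions, and the tiny stop-word set are hypothetical, and a real pipeline would use fuller stop-word lists for each language.

```python
import re

# Tiny illustrative stop-word set (assumption); real lists are much longer.
STOP_WORDS = {"the", "a", "an", "is", "it", "and", "to", "of", "في", "من", "على"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
TAG_RE = re.compile(r"<[^>]+>")                 # HTML tags
DIGIT_RE = re.compile(r"\d+")                   # \d also matches Arabic-Indic digits
PUNCT_RE = re.compile(r"[^\w\s]")               # punctuation and symbols


def clean_review(text):
    """Return a cleaned, lower-cased review; missing or empty input gives ''."""
    if text is None or not text.strip():
        return ""  # treat missing data as an empty review
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    text = DIGIT_RE.sub(" ", text)
    text = PUNCT_RE.sub(" ", text)
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return " ".join(tokens)


print(clean_review("It is a <b>GREAT</b> charger, see https://example.com/item 100%!"))
print(clean_review("البطاريه نزلت ١٪"))
```

Links are removed before punctuation so that a URL is dropped as a whole instead of being split into stray word fragments.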

C. Sentiment and Evaluation

BERT and TextBlob were then applied to determine whether each review is negative or positive within each model's respective range. This was done by passing the “content” column of the data [Figure 1] to the models. New columns were then created with the results from each model, which can be considered the predicted data.

After the sentiment analysis, the next step is the evaluation of the models. This was done with the basic evaluation metrics [Figure 2], using the “rating” column [Figure 1] as the actual data and the newly added “Sentiment” column (whether from TextBlob or BERT) as the predicted data.

As for the ranges used in the evaluation, the study opted for a simpler three-class range [negative, neutral, positive] rather than the original five-point range [1, 2, 3, 4, 5] (where 1 is very negative, 3 is neutral, and 5 is very positive).
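One way to bring ratings and model outputs onto the same three-class range is sketched below. The cut-offs are assumptions made for illustration, since the text does not state them: ratings below 3 are treated as negative and above 3 as positive, and polarity scores within a small band around zero (the hypothetical parameter `neutral_band`) are treated as neutral.

```python
def rating_to_label(rating: float) -> str:
    """Collapse a 1-5 star rating into three classes (assumed cut-offs)."""
    if rating < 3:
        return "negative"
    if rating > 3:
        return "positive"
    return "neutral"


def polarity_to_label(polarity: float, neutral_band: float = 0.05) -> str:
    """Map a polarity score in [-1, 1] to three classes.

    neutral_band is a hypothetical threshold, not a value from the study.
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


ratings = [1.0, 3.0, 4.0, 5.0]
polarities = [-0.6, 0.0, 0.3, -0.2]

actual = [rating_to_label(r) for r in ratings]
predicted = [polarity_to_label(p) for p in polarities]
print(list(zip(actual, predicted)))
```

With both columns expressed as the same labels, the actual and predicted values can be compared row by row.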

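\[
\mathrm{Accuracy} = \frac{tp + tn}{N}, \qquad
\mathrm{Recall} = \frac{tp}{tp + fn}, \qquad
\mathrm{Precision} = \frac{tp}{tp + fp}, \qquad
\mathrm{F1\ Score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
\]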

Figure [2]: Basic evaluation metrics
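The four metrics can be computed directly from the counts of true and false positives and negatives. The sketch below does this for one class treated as the positive class against all others; the function name `binary_metrics` and the six sample labels are made up for illustration and are not data from this study.

```python
def binary_metrics(actual, predicted, positive_label):
    """Accuracy, precision, recall, and F1 for one class versus the rest."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive_label and p == positive_label for a, p in pairs)
    fp = sum(a != positive_label and p == positive_label for a, p in pairs)
    fn = sum(a == positive_label and p != positive_label for a, p in pairs)
    tn = sum(a != positive_label and p != positive_label for a, p in pairs)
    n = len(pairs)

    # Guard every division so empty or one-sided inputs return 0.0.
    accuracy = (tp + tn) / n if n else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration only.
actual = ["positive", "positive", "negative", "neutral", "positive", "negative"]
predicted = ["positive", "negative", "negative", "positive", "positive", "negative"]

metrics = binary_metrics(actual, predicted, positive_label="positive")
print(metrics)
```

Here tp = 2, fp = 1, fn = 1, and tn = 2, so accuracy, precision, recall, and F1 all come out to 2/3.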

V. RESULTS

The main aim of our analysis is to ensure fair sentiment results. We also do not want users to spend a lot of time reading long textual descriptions in the reviews, so we summarize our results as charts (statistical graphs). Data visualization is becoming increasingly important as data grow in size and complexity. Our system therefore summarizes the results in forms such as pie charts, which help users view and directly understand the extracted sentiment.

A. “Textblob” Sentiment analysis:

Figure [3]: Sentiment analysis results for Textblob “English”

Figure [4]: Sentiment analysis results for Textblob “Arabic”


B. “Bert” Sentiment analysis

Figure [5]: Sentiment analysis results for BERT “English”

Figure [6]: Sentiment analysis results for BERT “Arabic”

C. Evaluation

class          precision    recall  f1-score   support
1.0                 1.00      0.88      0.94        60
2.0                 0.53      0.62      0.57        13
3.0                 0.77      1.00      0.87        17
4.0                 0.89      0.89      0.89        18
5.0                 0.97      0.97      0.97        70
accuracy                                0.91       178
macro avg           0.83      0.87      0.85       178
weighted avg        0.92      0.91      0.91       178

Figure [7]: Evaluation results for BERT “English”
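The "macro avg" and "weighted avg" rows of such a report follow from the per-class rows: the macro average is the plain mean over classes, while the weighted average weights each class by its support. The sketch below recomputes both for precision and recall from the per-class values of the BERT “English” report; the helper names are hypothetical.

```python
# Per-class values from the BERT "English" report: (precision, recall, support)
ROWS = {
    "1.0": (1.00, 0.88, 60),
    "2.0": (0.53, 0.62, 13),
    "3.0": (0.77, 1.00, 17),
    "4.0": (0.89, 0.89, 18),
    "5.0": (0.97, 0.97, 70),
}


def macro_average(values):
    """Unweighted mean: every class counts equally."""
    return sum(values) / len(values)


def weighted_average(values, weights):
    """Mean weighted by class support: large classes count more."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


precisions = [p for p, _, _ in ROWS.values()]
recalls = [r for _, r, _ in ROWS.values()]
supports = [s for _, _, s in ROWS.values()]

macro_p = macro_average(precisions)
macro_r = macro_average(recalls)
weighted_p = weighted_average(precisions, supports)
weighted_r = weighted_average(recalls, supports)

print(round(macro_p, 2), round(macro_r, 2))
print(round(weighted_p, 2), round(weighted_r, 2))
```

Rounded to two decimals, these give 0.83 and 0.87 for the macro averages and 0.92 and 0.91 for the weighted averages, matching the report. The weighted values are higher because the two largest classes (1.0 and 5.0) are also the best classified.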

class          precision    recall  f1-score   support
1.0                 0.80      0.45      0.58        73
2.0                 0.10      0.18      0.13        17
3.0                 0.75      0.91      0.82        88
accuracy                                0.65       178
macro avg           0.55      0.51      0.51       178
weighted avg        0.71      0.65      0.66       178

Figure [8]: Evaluation results for Textblob “English”

class          precision    recall  f1-score   support
2.0                 0.76      0.76      0.76        58
3.0                 0.67      0.67      0.67        42
accuracy                                0.72       100
macro avg           0.71      0.71      0.71       100
weighted avg        0.72      0.72      0.72       100

Figure [9]: Evaluation results for BERT “Arabic”

class          precision    recall  f1-score   support
1.0                 0.79      0.57      0.67        87
2.0                 0.14      0.10      0.12        49
3.0                 0.59      0.94      0.73        64
accuracy                                0.57       200
macro avg           0.51      0.54      0.50       200
weighted avg        0.57      0.57      0.55       200

Figure [10]: Evaluation results for Textblob “Arabic”

VI. CONCLUSION

In summary, this research focuses on measuring overall customer satisfaction in Amazon Egypt’s electronics section and on evaluating rule-based models (TextBlob) and word-embedding models (BERT) in Arabic (textblob-ar and CAMeLBERT) and English.

The results showed that the two models are fairly similar: both found that the sentiment was mostly positive, with an average percentage of 47%, and the rest split between neutral and negative (Figures [3], [4], [5], [6]). We believe that the BERT results are probably more accurate because of the complexity of the model.


These results show that customers are mostly satisfied with the services offered on Amazon.eg.

As for the evaluation of the models, BERT achieved an accuracy of 91% in English, compared with 65% for TextBlob (Figures [7], [8]).

For Arabic, BERT achieved an accuracy of 72% and TextBlob an accuracy of 57% (Figures [9], [10]).

This is a difference in accuracy of roughly 15% to 25% in favor of BERT, which indicates that word-embedding models (BERT) are superior to rule-based models (TextBlob) in both Arabic and English.

REFERENCES

[1] Neha Nandal, (2020). Machine learning based aspect level sentiment analysis for Amazon products.

[2] Aashutosh Bhatt, Ankit Patel, Harsh Chheda, Kiran Gawande, (2015). Amazon Review Classification and Sentiment Analysis.

[3] Alexander Wallin, (2014). Sentiment analysis of Amazon reviews and perception of product features.

[4] Tanjim Ul Haque, Nudrat Nawal Saber, Faisal Muhammad Shah, (2018). Sentiment analysis on large scale Amazon product reviews.

[5] Shekhawat, Bhupender Singh (2019) Sentiment Classification of Current Public Opinion on BREXIT: Naïve Bayes Classifier Model vs Python’s TextBlob Approach. Masters thesis, Dublin, National College of Ireland.

[6] Lendave, V. (2021, November 11). How to Obtain a Sentiment Score for a Sentence Using TextBlob? Analytics India Magazine. https://analyticsindiamag.com/how-to-obtain-a-sentiment-scorefor-a-sentence-using-textblob/

[7] Batra, H. (2021). Indian Institute of Information Technology Allahabad. BERT-Based Sentiment Analysis: A Software Engineering Perspective.

[8] Rambocas, M. (2013). Marketing research: The role of sentiment analysis. ResearchGate. https://www.researchgate.net/publication/301549590_Marketing_research_The_role_of_sentiment_analysis

[9] Sentiment analysis techniques in recent works. (2015, July 1). IEEE Conference Publication | IEEE Xplore. https://ieeexplore.ieee.org/document/7237157

[10] amazon.eg Traffic Analytics & Market Share | Similarweb