This project/internship titled "Sentiment Analysis on Bangla Conversational Data Using Machine Learning Approach", submitted by Md. Shahriar Shakil, ID No and Department of Computer Science and Engineering, Daffodil International University, is accepted as satisfactory in partial fulfillment of the requirements for the degree of B.Sc., Department of Computer Science and Engineering, Faculty of Information Science and Technology, Daffodil International University.
We hereby declare that this project was done by us under the supervision of Ms. We also declare that neither this project nor any part of this project has been submitted elsewhere for the award of any degree or diploma. The deep knowledge and great interest of our supervisor in the field of "Machine Learning and Natural Language Processing" helped us to carry out this project.
Her endless patience, scholarly guidance, constant encouragement, energetic supervision, constructive criticism, valuable advice, and her reading of many inferior drafts and correcting them at every stage have made it possible to complete this project. We are also grateful to Touhid Bhuiyan, Head, Department of CSE, for his kind help in completing our project, and to the other faculty members and staff of the CSE Department of Daffodil International University. We would like to thank all our coursemates at Daffodil International University who took part in discussions while completing the coursework.
This research, titled “Sentiment Analysis of Bangla Call Data Using Machine Learning Approach”, is based on conversations, from which people's sentiment during the conversation period can be extracted as valuable information.
INTRODUCTION
- Motivation
- Rationale of the Study
- Research Questions
- Expected Outcome
- Report Layout
Natural language processing is one of the popular fields for text analysis and summarization. The main goal of machine learning is to build a model that performs well based on the data set it sees during the training process, and this capability gives the model promising accuracy. Machine learning works at scale to solve critical and complex forecasting and analysis tasks.
People's state of mind often changes with the environmental conditions and situations they face. Tracing the specific feeling of a specific moment from one's conversations is a critical task. We are confident that our model can predict whether a conversation is positive or negative.
We used seven machine learning algorithms: Support Vector Machine, Multinomial Naive Bayes, K-Nearest Neighbors, Logistic Regression, Decision Tree, Stochastic Gradient Descent and Random Forest. In this research work, we propose a model that can classify the sentiment of a conversation as positive or negative. The accuracy of the model depends heavily on the training data set.
We used a number of techniques, such as tuning the parameters of the machine learning models, to get more accurate results.
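Parameter tuning of this kind can be sketched with a small grid search. This is a minimal illustration, not the setup used in this work: the scikit-learn library, the TF-IDF features, the choice of the parameter C and its candidate values, and the toy English sentences standing in for Bangla conversations are all assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus; English placeholders stand in for Bangla lines.
texts = [
    "i am very happy today", "this is a wonderful day",
    "what a lovely surprise", "i love talking with you",
    "you made me so happy", "that was a great evening",
    "i am very sad today", "this is a terrible day",
    "what an awful surprise", "i hate talking with you",
    "you made me so angry", "that was a horrible evening",
]
labels = [1] * 6 + [0] * 6  # 1 = positive, 0 = negative

# Feature extraction and classifier chained so both are refit per fold.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", SVC(kernel="linear")),
])

# Assumed search space: only the regularization strength C is varied.
param_grid = {"clf__C": [0.1, 1.0, 10.0]}

# 3-fold cross-validation picks the C with the best mean accuracy.
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(texts, labels)

best_c = search.best_params_["clf__C"]
best_score = search.best_score_
print("best C:", best_c, "mean CV accuracy:", round(best_score, 3))
```

The same pattern applies to the other classifiers by swapping the "clf" step and its parameter grid.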
This section presents the foundation of our work and examines related works, similar investigations, and the extent of the issues and difficulties.
In this chapter, we summarize the predictions and conclusions and outline directions for further study.
BACKGROUND
- Introduction
- Related Works
- Research Summary
- Challenges
They collected data from Facebook group posts and used two methods to find the polarity of each post. In paper [4], sentiment analysis was performed on Romanized Bangla and Bangla text collected from various social media. With categorical cross-entropy loss, they achieved 78% accuracy using an RNN with LSTM to train their model.
First, they used a rule-based classifier to train on the data and divide the posts into positive and negative polarity. They used a bag-of-words method and a lexical analysis approach to extract sentiment from a paragraph. In paper [7], the authors analyze Twitter posts using machine learning.
They proposed a new feature vector that can classify positive and negative sentiments from Twitter posts. Research work [9] used the Naive Bayes and Decision Tree machine learning algorithms to analyze Twitter data for sentiment analysis. Paper [10] attempted to analyze customer reviews of a restaurant using several machine learning classifiers.
In this research work, we have tried to develop a model that can extract sentiment from Bangla conversation data, which is the most significant part of this research. Support Vector Machine, Multinomial Naive Bayes, K-Nearest Neighbors, Logistic Regression, Decision Tree, Random Forest and Stochastic Gradient Descent were implemented in Python to obtain the results of our research work. We faced many problems while collecting data, as datasets and resources in the Bengali language are not readily available.
We collected our data from Bangla movie and short film scripts, as they are a great source of Bangla conversational data. This procedure was not easy, as film and short film scripts are subject to copyright. In this case, our esteemed supervisor directed us to the scriptwriters so that we could collect our data.
RESEARCH METHODOLOGY
- Introduction
- Research Subject and Instrumentation
- Data Collection Procedure
- Data Preprocessing and Organizing
- Machine Learning Algorithms
- Statistical Analysis
- Implementation Requirements
- Experimental Setup
- Model Summary
- Experimental Result and Analysis
- Prediction
- Discussion
In natural language processing, identifying and removing stop words is essential for any language. Figure 3.4 shows the Python code for removing Bangla stop words and punctuation. To extract features from each conversation, the number of words and the number of characters are needed.
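The preprocessing and counting steps described here can be sketched as follows. This is an illustrative sketch only, not the code of Figure 3.4: the stop word list is a tiny hypothetical sample, and the function names `clean_text` and `extract_counts` are assumptions introduced for this example.

```python
import string

# Hypothetical, very small stop word list; a real Bangla list is much longer.
BANGLA_STOP_WORDS = {"এবং", "এই", "ও", "যে", "তো"}

# ASCII punctuation plus the Bangla full stop (dari).
PUNCTUATION = string.punctuation + "।"


def clean_text(text):
    """Strip punctuation, split on whitespace, and drop stop words."""
    table = str.maketrans("", "", PUNCTUATION)
    tokens = text.translate(table).split()
    return [t for t in tokens if t not in BANGLA_STOP_WORDS]


def extract_counts(text):
    """Return word and character counts of the cleaned text.

    Note: len() counts Unicode code points, so Bangla vowel signs
    are counted as separate characters.
    """
    tokens = clean_text(text)
    return {
        "num_words": len(tokens),
        "num_chars": sum(len(t) for t in tokens),
    }


sentence = "আমি আজ খুব খুশি, এবং এই দিনটা ভালো।"
tokens = clean_text(sentence)
counts = extract_counts(sentence)
print(tokens)
print(counts)
```

Negation words are deliberately left out of the sample stop word list, since removing them could flip the sentiment of a sentence.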
The Decision Tree classifier works like a flowchart with a tree structure: each internal node indicates a test on an attribute, each branch represents an outcome of the test, and each leaf node holds a class label. Random Forest sends a new case to each of its trees to classify it. Stochastic Gradient Descent is an optimization method widely used in machine learning to find the model parameters that best fit the predicted output to the actual output.
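As a rough sketch of how the seven algorithms named in this report could be trained and compared side by side, the example below uses scikit-learn. The library choice, the bag-of-words features, all parameter values, and the toy English sentences standing in for Bangla conversations are assumptions made for illustration; they are not the configuration used in this work.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data; 1 = positive conversation, 0 = negative.
train_texts = [
    "i am very happy today", "this is a wonderful day",
    "what a lovely surprise", "i love talking with you",
    "you made me so happy", "that was a great evening",
    "i am very sad today", "this is a terrible day",
    "what an awful surprise", "i hate talking with you",
    "you made me so angry", "that was a horrible evening",
]
train_labels = [1] * 6 + [0] * 6
test_texts = ["i am happy", "what a terrible evening",
              "a lovely day", "i am so angry"]
test_labels = [1, 0, 1, 0]

# Bag-of-words counts; the vocabulary is learned from training data only.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

classifiers = {
    "Support Vector Machine": SVC(kernel="linear"),
    "Multinomial Naive Bayes": MultinomialNB(),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=3),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Stochastic Gradient Descent": SGDClassifier(random_state=0),
}

# Fit each classifier and record its accuracy on the held-out texts.
scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, train_labels)
    scores[name] = clf.score(X_test, test_labels)

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Scores on a toy corpus of this size say nothing about real performance; the point is only the shared fit-and-score loop.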
Figure 3.6 shows, step by step, how we proceed toward our target. We collect the conversations as sentences and save them in an xlsx file, labeling each one as a positive or negative conversation. For the test set, it is necessary to know how many samples were correctly classified.
Here it is clear that Support Vector Machine gives the highest accuracy score of 0.85589. Precision is used to measure the class agreement of the data labels with the positive labels given by the classifier. We need to calculate the precision score for each of the two class labels because it is directly relevant to the class labels.
In Table 4.2 the values for each of the classifiers are given, along with the two labels we used in this research work. Recall, also known as sensitivity, measures the effectiveness of the classifier in identifying class labels. It is clear from the tables that SVM, Multinomial Naive Bayes and Random Forest have the best performance as individual classifiers.
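To make the per-label scores concrete, the sketch below computes accuracy, precision and recall from true and predicted labels using their standard definitions. The function names and the small label lists are hypothetical and are not taken from the tables of this report.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, TN, treating `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision_recall(y_true, y_pred, positive):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels for eight test conversations.
y_true = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "pos"]
y_pred = ["pos", "pos", "neg", "neg", "neg", "pos", "neg", "pos"]

acc = accuracy(y_true, y_pred)
for label in ("pos", "neg"):
    p, r = precision_recall(y_true, y_pred, label)
    print(label, "precision:", p, "recall:", r)
print("accuracy:", acc)
```

Calling `precision_recall` once per label mirrors the per-class reporting described above, while accuracy is a single number over all samples.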
To avoid overfitting and to achieve robustness, it is necessary to have a strong correlation across fitting nuts; though not exceptional. Since Decision Trees are not robust against noise and do not generalize well to future observed data, they do not work too well.
SUMMARY, CONCLUSION, IMPLICATION FOR FUTURE RESEARCH
- Summary of the Study
- Conclusion
- Future Work
On a larger scale, individual emotions such as sadness, anger, neutrality, happiness and fear can also be extracted. Sentiment analysis can also be applied to real-time conversation data by converting the conversations into text and analyzing the sentiment of that text.