Online News Monitoring and Sentiment Analysis using BERT Approach
Kennedy B. Mateo1*, Aleta C. Fabregas1
1 Master in Information Technology, Polytechnic University of the Philippines, Open University System, Sta. Mesa, Manila, Philippines
*Corresponding Author: [email protected]
Accepted: 15 December 2022 | Published: 31 December 2022
DOI: https://doi.org/10.55057/ijarti.2022.4.4.2
_________________________________________________________________________________________
Abstract: With the advancement of Information Technology (IT), the medium for publishing news and events has become faster. How people consume news has changed since the introduction of the Internet, online, and mobile technology. With the growth and evolution of digital media in today's society, Media Monitoring (MM) services have also become popular. While media monitoring has obvious advantages, it also incurs considerable costs depending on the level of features offered. Further, despite extensive research on sentiment analysis of social media and user input, little has been done so far to automatically predict the sentiment of narrative text. In this work, the proponent employed a pre-trained BERT model for the sentiment analysis feature and an RSS feed to automate the collection of related online news, in aid of the manual monitoring and analysis of online news stories and articles and to avoid the costs associated with online media monitoring services. The developed system produced appropriate features and obtained a highly acceptable usability rating. The results indicate that the developed software can improve end-user productivity by automating the existing manual system of monitoring and analyzing online news, while reducing business costs by cutting the charges imposed by online media monitoring services.
Keywords: media monitoring, natural language processing, machine learning, sentiment analysis, BERT approach
___________________________________________________________________________
1. Introduction
According to the Digital 2022: The Philippines Report, one of the main reasons why 73.1% of Filipino internet user respondents aged 16 to 64 use the internet is to keep up with news and current events. Before the internet, keeping up with the news required physical copies of tabloids and broadsheets reporting the previous day’s events. Today, a few clicks are all it takes to read the local paper as well as any news source from anywhere in the world, all of which are updated to the minute via the internet.
People's ways of consuming news have changed as a result of the development of the Internet, online, and mobile technology. Online versions of traditional print newspapers have largely taken the place of those publications. In light of this, online news monitoring and analysis, which can be done through Media Monitoring (MM) services, are becoming increasingly popular. While media monitoring has obvious advantages, it also incurs considerable costs depending on the level of features offered. For this reason, the Department of the Interior and Local Government (DILG) - Public Affairs and Communications Service (PACS) manually monitors and analyzes news from various online media channels in order to avoid additional costs.
Upgrading online media monitoring can enable enterprises and other institutions to gather direct and up-to-date data which can provide valuable insights concerning an organization in various application fields (Stavrakantonakis et al., 2012). Thus, with the growth and evolution of the media landscape in today's society, it is necessary to upgrade the manual news monitoring and analysis function of the PACS as it is crucial to anticipate and interpret the changing framing and priming nature of the media in packaging stories relative to the DILG.
However, limited studies, particularly in the local context, have been done on news and media monitoring. Hence, media monitoring activities in the online environment look set to remain a priority of the research agenda for some time to come (Zhang & Vos, 2014). Despite extensive research being done on sentiment analysis of social media and user input, there hasn't been much done so far to automatically predict the sentiment of narrative text (Lyu et al., 2020).
Although this is a rising subject in academia today, it has not yet gained much traction in real-world corporate settings. These factors provide the rationale for this study, which aims to close this research-to-practice gap by employing freely available and unstructured online data to produce actionable reports and insights through the proposed system.
In this paper, to complement the mandate of the DILG PACS in the formulation and implementation of plans, programs, and projects on public information and communication, public assistance, and modernization and maintenance of Department-wide telecommunications systems, the researcher intends to develop a web-based online news monitoring system in aid of the manual monitoring and analysis of DILG-related news online.
2. Related Literature
Digital and Social Media Monitoring Services
The fundamental nature of communication and its relationship with the audience in today's society has changed as a result of the Internet. Online news and weblogs, commonly known as digital media, are essentially virtual versions of traditional print newspapers and periodicals and have mostly replaced them (Taj et al., 2019). Accordingly, as Information Technology (IT) advances, the medium for publishing news and events has become faster.
Relative to this, with the growth and evolution of digital media in today's society, Media Monitoring (MM) services are also becoming popular. Barile et al. (2019), Ruggiero & Vos (2020), and Umar (2020) describe MM in complementary ways: as services, often referred to as clipping services, that give a daily selection of media material pertinent to an organization or its clientele; as “listening solutions” that support organizations in listening to, interpreting, and responding to what people are saying online; and as a process of gathering data pertinent to an organization from a social media platform and analyzing that data for purposes such as academic research, decision-making, and marketing strategy.
Technologies and Tools used in Media Monitoring
In the research of Stavrakantonakis et al. (2012), the authors presented the technological features that social media monitoring tools should provide to determine the extent of the effect on the client’s enterprise: listening grid adjustment, near real-time processing, integration with third-party applications (API), sentiment analysis, and historical data. The authors also discussed various social media monitoring tools to choose from, including NM Incite My BuzzMetrics, Radian6, Sysomos, and Visible Technologies Intelligence, to name a few. However, such tools are quickly outdated due to the rapid development of the market: new functionalities, takeovers, and the appearance of new players make it difficult to rely solely on them. In addition, dashboard services that offer an overview of online activity, including HootSuite, Netvibes, and Trackur, are among the tools and solutions mentioned by Ruggiero & Vos (2020) in their study of social media monitoring for crisis communication.
The authors also cited Tweet Archivist, TweetDeck, Twapperkeeper, Gawk, Wordle, and Gephi as other analytical tools. Since the development of social media monitoring tools is fast-paced, software mergers and rebranding may occur. Moreover, Barile et al. (2019) identified current MM services, namely Meltwater, Cision Media Monitoring, Mention Reviews, and News Exposure, that enable the customer company to evaluate its reputation and visibility in addition to assisting the company in strategic planning.
Purpose and Application of Media Monitoring
The application of media monitoring varies in different fields and purposes: to improve daily work and increase productivity, provide situational awareness, public policy-making, and formulation, and early detection and prognoses supporting strategy making. Barile et al. (2019) created a news recommender system for media monitoring that reduces the daily work of an editor for generating press releases from documents provided by a media monitoring system.
In the work of Pieterse et al. (2022), to help analysts obtain and interpret open-source data more effectively and provide situational awareness, they have developed a dedicated media monitoring tool. Meanwhile, Androutsopoulou et al. (2015) conducted their study to devise a framework for evaluating the use of media monitoring for supporting public policy-making and formulation. Said attempt aims to reach higher levels of maturity in improving relevant ICT platforms and practices. Furthermore, the work of Zhang & Vos (2014) identifies that in order to support strategy development, international organizations require real-time monitoring software, knowledge, and dynamic visualization. Even though monitoring social media activity has many clear benefits, it also incurs considerable costs. Stavrakantonakis et al. (2012) further state that upgrading online media monitoring can enable enterprises and other institutions to gather direct and up-to-date data which can provide valuable insights concerning an organization in various application fields.
Natural Language Processing and Sentiment Analysis
Because of the rapid rise of newsgroups, it is now possible to examine opinions and feelings in the news domain as well. As a result, sentiment analysis, which aims to discern a speaker's or writer's ideas, attitudes, and emotions toward a specific issue as represented in a text using a computer program (Kumar & Garg, 2020), has become a hot topic in the field of Natural Language Processing (NLP). According to Stoy (2021), sentiment analysis is a subset of NLP, a branch of study that investigates the use of computational methods to decipher data from natural language. NLP allows computer programs to analyze textual data, which is abundant in publicly available form, using machine learning methods and to extract meaning, known as semantics, from the data. Extracting and defining human feelings from unstructured text can be done through sentiment analysis (Jindal & Aron, 2021), which is accomplished using NLP and machine learning. Because sentiment analysis is timely, many studies are being conducted on this topic.
Approaches and Techniques Used in Sentiment Analysis
There are several approaches to sentiment analysis, including classical machine learning approaches and deep learning approaches. Examples include lexicon-based approaches such as SentiStrength, SentiWordNet, Linguistic Inquiry and Word Count (LIWC), and Affective Norms for English Words (ANEW); machine learning approaches such as Naive Bayes (NB), Multi-Layer Perceptron (MLP), Multinomial Naive Bayes (MNB), Random Forest (RF), Maximum Entropy, and Support Vector Machine (SVM); and hybrid approaches that use both lexicon-based and machine learning techniques (Shofiya & Abidi, 2021).
Classical Machine Learning Approach
Several authors have used the classical machine learning approach in their studies. Pitogo & Ramos (2020) examined how social media platforms can provide an opportunity to improve civic involvement in e-Participation, using a lexicon-based model to analyze sentiments and an unsupervised machine learning algorithm to determine and classify emotions in a dataset captured from Facebook. A similar study using machine learning algorithms was conducted by Arispe et al. (2020), who used stratified cross-validation and Support Vector Machines (SVMs) to analyze the sentiment of tweets related to disaster evacuation and relief operations in the Philippines. Another machine learning algorithm used by some authors is the Naïve Bayes (NB) algorithm. Vangara et al. (2020) used NB to label user reviews of Amazon products and combined the reviews to give a final rating to each product. In most cases, no rating system is available when broad opinions on specific issues or products need to be assessed, so establishing rapid and efficient opinion-mining methods is a current concern. With this in mind, Dinu & Iuga (n.d.) concentrated on developing a method that employs the Naïve Bayes algorithm as a time-efficient classification algorithm. Additionally, Bilog (2020) sought to categorize comments made by Filipinos on Facebook as positive or negative through web sentiment analysis using the NB algorithm.
Deep Learning Approach: BERT
Aside from the classical approach, some proponents conducted their studies using the deep learning approach, a combination of several techniques, or the BERT approach. To get a comprehensive understanding of what people are saying about an airline's brand online and to acquire useful information from airline company data, Kang et al. (2021) compared several sentiment analysis models, namely Random Forest, Multinomial Naive Bayes, Linear Support Vector Classifier, an ensemble method, Bidirectional Long Short-Term Memory (Bi-LSTM), and BERT, in order to identify and build the optimal model.
Prottasha et al. (2022) applied BERT’s transfer learning ability to a deep integrated CNN-BiLSTM model to enhance decision-making performance in sentiment analysis. Li et al. (2021) also proposed a new model based on BERT and deep learning techniques for sentiment analysis. The model employs BERT to turn the words of a text into word vectors, adds a sentiment dictionary to increase the sentiment intensity of the word vectors, and then employs a BiLSTM network to extract the forward and reverse contextual information. Similarly, Özçift et al. (2021) employed BERT to examine six datasets and five problem types from the Turkish language domain, comparing the outcomes to the best predictions of standard ML algorithms found in the literature. Gonzalez-Carvajal & Garrido-Merchan (2020) likewise compared the BERT model with the traditional NLP approach, in which an ML model is trained on features obtained with TF-IDF, to examine how BERT behaves on NLP tasks. In four different NLP cases, the authors demonstrated that BERT performs better than the conventional NLP technique.
The BERT (Bidirectional Encoder Representations from Transformers) model, introduced in 2018, makes it possible to build a high-quality model quickly, with little effort and training time, for example through the PyTorch interface, regardless of the particular NLP application, and to produce state-of-the-art results. Using the BERT model, a sentiment analysis of the effect of the coronavirus on social life obtained 94% validation accuracy on the collected datasets (Singh et al., 2021). In short, BERT is one of the most capable NLP models currently available, requiring little data while producing state-of-the-art results with few task-specific modifications for a variety of NLP tasks such as named entity recognition, language inference, semantic similarity, question answering, and classification tasks like sentiment analysis (Kang et al., 2021).
Sentiment Analysis on News Domain
Because of the rapid rise of newsgroups, some studies were done to examine opinions and feelings in the news domain using sentiment analysis. Sentiment categorization techniques were applied to political news from columns on several Turkish news websites, in the early work of Kaya et al. (2012). The authors compared four supervised machine learning algorithms and went over the issue of sentiment classification in political news in-depth. Taj et al. (2019) used a lexicon-based strategy to investigate sentiment analysis of news and blogs using a dataset from the BBC consisting of news articles published between 2004 and 2005.
Meanwhile, Shirsat et al. (2018) focus on sentence-level negation identification from news articles using data from BBC news as well. Results are analyzed using Machine Learning Algorithms like SVM and NB. Another related study was conducted by Shuhidan et al. (2018) to explain the detailed stages of the conduct of sentiment analysis of Malaysia's financial news headlines using a machine learning algorithm: Opinion Lexicon-based algorithm and Naïve Bayes algorithm. Meanwhile, in the work of Zhang & Yamana (2020), the authors designed and compared systems built using BERT and NB-SVM models to cope with the humor assessment in newly updated headlines.
3. Methodology
The researcher utilized a descriptive and developmental research design for the development of the proposed system. The descriptive method was used to determine the manual process of the DILG PACS in monitoring, analyzing, and tagging news from various online media channels. Developmental research was employed as a way of developing instructional and non-instructional products and resources on an empirical foundation. Purposive sampling was used to identify the participants necessary for the conduct of the study. The identified participants were selected because they are part of, and knowledgeable about, the existing process of the DILG PACS in monitoring, analyzing, and tagging news from various online media channels.
Survey questionnaires were employed to collect information and to ensure that the respondents' evaluation of the developed application system was as precise as possible. The survey questionnaires, which serve as the study's research instrument, were constructed with indicators to assess the software's acceptability. The concept of usability was included to determine whether the system met the ISO/IEC 25010:2011 standard. The researcher also conducted Focus Group Discussions (FGD) and gathered resources to identify the challenges/problems encountered in the existing manual process of the DILG PACS in monitoring, analyzing, and tagging news from various online media channels.
The proponent applied descriptive statistics to the information gathered from the respondents. The information gathered from the questionnaire was analyzed, described, and summarized using metrics such as frequency, percentage, and weighted mean. For the sentiment analysis, the results were assessed using a confusion matrix and common performance criteria (Yadav et al., 2020) as evaluation metrics.
Table 1: Confusion Matrix Example
| | Classified Positive | Classified Negative | Classified Neutral |
|---|---|---|---|
| Actual Positive | True Positive (TP) | False Negative (FN) | False Neutral (FNe) |
| Actual Negative | False Positive (FP) | True Negative (TN) | False Neutral (FNe) |
| Actual Neutral | False Positive (FP) | False Negative (FN) | True Neutral (TNe) |

The confusion matrix was created with the use of manual labels provided by DILG-PACS.
Moreover, the accuracy of the model is calculated using the confusion matrix and is as follows:
Accuracy. How often the classifier makes correct predictions. It measures the ratio of correct predictions to the total number of instances evaluated, expressed as a percentage.
$$\text{accuracy} = \frac{\text{Correct Predictions}}{\text{Total Number of Instances}} \times 100$$
Software Development
The researcher used the Rapid Application Development (RAD) approach. Daud et al. (n.d.) stated that Rapid Application Development is suited for a small system with a quick implementation since the client can view the result in a short time and the developer can still oversee the development process. The aforementioned paradigm takes a methodical approach to systems development using the following processes:
Figure 1: Rapid Application Development (RAD) Approach
1) Requirements Planning: The researcher will determine the application's functionalities and features through the conduct of Focus Group Discussions (FGD) and will gather resources to identify the challenges/problems encountered by the end users. This will aid in creating a Use Case diagram to visualize the user-system interaction.
2) User Design: After defining the app's functionalities and features, the researcher will create the app's User Experience (UX), which includes the overall layout and how it will act. As a result, User Interface (UI) is created. Based on the data collected, the researcher will develop models and prototypes that depict all system inputs, processes, and outputs. In order to create a system that is both visually appealing and easy to use, the researcher will also take programming languages into account. This encompasses several aspects of the overall design, such as the wireframe for the system and the database design. The user interface and database design, which are crucial for integrating the system's functionality, will be built as part of the initial configuration during this phase. An iterative approach is used to fix bugs and problems. The prototype will be created, and while it is being tested by end users, adjustments will be made as necessary until all of the requirements have been satisfied.
3) Construction: This is the phase where the development of the main functionalities of the proposed system will take place. This includes the data in the database, managing the contents and the learning process in using the application, and integrating the BERT model created using Python into the Pentaho software. The researcher and end-users will collaborate during this phase to ensure that everything is operating as intended and that the final product meets the goals and expectations of the end users. Because the client continues to have input throughout the process, this third phase is crucial. The end-users can recommend adjustments, modifications, or even brand-new concepts that can address issues as they arise. The researcher used Pentaho Data Integration to extract, transform, and load data; PostgreSQL for the database; Pentaho Dashboards, HTML, CSS, and JavaScript for dashboarding and visualization; and Yii2 (PHP framework) for the portal as the medium for creating the main functionalities of the proposed system.
4) Cutover: The application is now developed and will be pilot tested by the client. It is also in this phase that the application is ready for deployment. The application's quality will be assessed for acceptability. The researcher will examine the system and its contents to see if they are well organized. The researcher put the entire concept to the test to see if it was acceptable. The researcher will utilize the ISO/IEC 25010:2011 Software Product Evaluation to determine the level of acceptability of the proposed system. Appropriateness recognizability, learnability, operability, user interface aesthetic, and accessibility are all criteria used to evaluate the software.
Proposed System Architecture
Figure 2 shows the proposed system architecture for Online News Monitoring and Sentiment Analysis using BERT Approach. The architecture presents the process flow for the target end-user, starting with obtaining DILG-related news articles from Google News through an RSS feed that extracts the news articles containing the necessary keywords. After the selected news articles are captured, they undergo data transformation before being stored in the data warehouse in preparation for dashboard visualization.
Figure 2: Proposed System Architecture of the Online News Monitoring and Sentiment Analysis using BERT Approach
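The collection step described above can be sketched as follows. This is a minimal illustration and not the Pentaho transformation used in the study: the Google News search-feed URL pattern and its `hl`, `gl`, and `ceid` parameters are assumptions, the function names are hypothetical, and the sample feed is invented so that the example runs without a network request.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed endpoint: Google News is taken to expose keyword searches as RSS here.
GOOGLE_NEWS_RSS = "https://news.google.com/rss/search"


def build_feed_url(keywords, lang="en-PH", country="PH"):
    """Build a search-feed URL that matches any of the given keywords."""
    query = " OR ".join(f'"{k}"' for k in keywords)
    params = {
        "q": query,
        "hl": lang,
        "gl": country,
        "ceid": f"{country}:{lang.split('-')[0]}",
    }
    return f"{GOOGLE_NEWS_RSS}?{urlencode(params)}"


def parse_items(rss_xml):
    """Extract title, link, and publication date from each RSS <item>."""
    root = ET.fromstring(rss_xml)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
        }
        for item in root.iter("item")
    ]


# Hypothetical feed content, used instead of a live request.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <title>sample</title>
  <item><title>DILG issues new memo</title><link>https://example.com/a</link>
        <pubDate>Mon, 05 Dec 2022 08:00:00 GMT</pubDate></item>
  <item><title>SILG visits province</title><link>https://example.com/b</link>
        <pubDate>Tue, 06 Dec 2022 09:30:00 GMT</pubDate></item>
</channel></rss>"""

articles = parse_items(SAMPLE_RSS)
feed_url = build_feed_url(["DILG", "SILG"])
```

In a deployment, the XML text would come from fetching `feed_url` on a schedule, and each parsed item would be transformed and loaded into the data warehouse.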
The proponent employed the BERT model developed by Google (Devlin et al., 2018) as a reference. BERT uses a bidirectional Transformer, so its representations are jointly conditioned on both left and right contexts in all layers. The input representation for a specific token is created by adding together the relevant token, segment, and position embeddings. The model thus takes into account the words that come before and after a word in the text to understand the sentiment of a phrase or sentence.
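The way the three embeddings combine can be illustrated with a small numeric sketch. The 4-dimensional vectors below are hypothetical values chosen for readability (BERT-base uses 768 dimensions), and `add_embeddings` is an illustrative helper, not part of any library.

```python
def add_embeddings(token_vec, segment_vec, position_vec):
    """Return the element-wise sum of the three embedding vectors for one token."""
    return [t + s + p for t, s, p in zip(token_vec, segment_vec, position_vec)]


# Hypothetical embeddings for a single token.
token_embedding = [0.5, -1.0, 0.25, 2.0]      # identifies the word piece
segment_embedding = [0.5, 0.5, 0.5, 0.5]      # identifies the sentence segment
position_embedding = [1.0, 0.0, -0.25, 0.5]   # identifies the position in the sequence

input_representation = add_embeddings(
    token_embedding, segment_embedding, position_embedding
)
print(input_representation)  # [2.0, -0.5, 0.5, 3.0]
```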
The researcher deployed the BERT model using Python scripts, a Python executor, and a Python virtual environment. To build the sentiment analysis feature of the system, a Jupyter Notebook was used with the required libraries and dependencies, such as PyTorch, Transformers, NumPy, and Pandas, before implementing it in Pentaho PDI. The process of the sentiment analysis using the BERT model is shown in Figure 3. A tokenizer is first applied to the given input news content. The tokenizer splits the input string into units called tokens at separators such as whitespace, punctuation marks, and line breaks.
Because the BERT model is a neural network that works only with numerical values, the input string is converted into token, segment, and position embeddings, which together form the input representation for each token. The output of the embeddings is passed to the BERT transformer, and the sequence classifier outputs, for each class, the probability that the class is the sentiment of the input. The researcher used argmax to select the class with the highest value. A resulting score of 1 to 2 indicates that the given news content or summary is ‘Negative’, a score of 3 is tagged as ‘Neutral’, and a score of 4 to 5 indicates ‘Positive’.
Figure 3: Process of Sentiment Analysis Module using BERT Approach
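The final step of this process, from classifier output to sentiment tag, can be sketched in plain Python. The sketch assumes a classifier with five output classes whose index i corresponds to score i + 1; the logits are hypothetical values standing in for the output of the sequence classifier, and the function names are illustrative.

```python
import math


def softmax(logits):
    """Convert raw classifier outputs into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the peak for stability
    total = sum(exps)
    return [e / total for e in exps]


def score_to_label(score):
    """Map a 1-5 score to the three sentiment tags described in the text."""
    if score <= 2:
        return "Negative"
    if score == 3:
        return "Neutral"
    return "Positive"


def classify(logits):
    """Pick the most probable class (argmax) and translate it into a tag."""
    probs = softmax(logits)
    best_index = max(range(len(probs)), key=probs.__getitem__)  # argmax
    score = best_index + 1  # assumption: class index 0 corresponds to score 1
    return score, score_to_label(score), probs[best_index]


# Hypothetical logits for one news summary.
example_logits = [2.1, 0.3, -0.5, -1.2, -0.7]
score, label, confidence = classify(example_logits)
print(score, label)  # 1 Negative
```

With the Transformers library, the logits would typically be taken from the output of a sequence classification model applied to the tokenized summary; that part is omitted here because the exact checkpoint used in the study is not stated.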
The system architecture shows how the system performs the sentiment analysis feature. As shown in Figure 4, sentiment analysis is performed using the BERT model once the user enters the summary of the news content, or the portion of the news mentioning the DILG or the SILG, provided that the item is not from a DILG press release.
Figure 4: Sentiment Analysis Module of the Online News Monitoring and Sentiment Analysis using BERT Approach
After the classification of the given summary of the news content, the system shows the recommended topics or keywords with positive, neutral, and negative orientations based on the result of the sentiment analysis. As shown in Figure 5, it identifies the topics that need immediate attention or intervention and the topics to maintain and further uphold.
Figure 5: List of Topics with Positive/Neutral/Negative Orientations
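One way such a topic list could be derived from tagged articles is sketched below. The decision rule is an assumption made for illustration (a topic needs attention when negative articles outnumber positive ones, and is worth maintaining in the opposite case); the paper does not specify the rule, and the topics and tags are hypothetical.

```python
from collections import Counter, defaultdict


def recommend_topics(tagged_articles):
    """Group topics by the dominant sentiment of their articles.

    tagged_articles: iterable of (topic, sentiment) pairs, where sentiment
    is 'Positive', 'Neutral', or 'Negative'.
    """
    counts = defaultdict(Counter)
    for topic, sentiment in tagged_articles:
        counts[topic][sentiment] += 1

    recommendations = {"needs_attention": [], "maintain": [], "monitor": []}
    for topic, tally in counts.items():
        if tally["Negative"] > tally["Positive"]:
            recommendations["needs_attention"].append(topic)
        elif tally["Positive"] > tally["Negative"]:
            recommendations["maintain"].append(topic)
        else:
            recommendations["monitor"].append(topic)  # balanced or neutral only
    return recommendations


# Hypothetical tagged articles.
sample = [
    ("road clearing", "Negative"),
    ("road clearing", "Negative"),
    ("road clearing", "Positive"),
    ("disaster response", "Positive"),
    ("disaster response", "Neutral"),
    ("barangay elections", "Neutral"),
]
result = recommend_topics(sample)
```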
4. Results and Discussion
- Challenges/Problems Encountered in the Existing System of Monitoring and Analysis of Online News Stories and/or Articles. After the conduct of Focus Group Discussions (FGD), the proponent identified the challenges/problems encountered by the DILG-PACS in their existing system of monitoring and analysis of DILG-related news articles, presented in Table 2.
Table 2: List of Identified Challenges/Problems Encountered
| Statement | Percentage (%) | Rank |
|---|---|---|
| It is tedious and laborious to do the manual monitoring tasks of DILG-related online news to avoid the corresponding charges entailed by online media monitoring services. | 33.33% | 1 |
| It is hard to identify the sentiment of the news content. | 26.67% | 2 |
| It is difficult to immediately create the necessary news monitoring report needed by the DILG Top Management. | 20% | 3.5 |
| It is time-consuming to produce the daily news monitoring report. | 20% | 3.5 |
| Total | 100% | |
- Features of the Proposed System that may Address the Identified Challenges/Problems Encountered. For the first challenge, which is the tedious and laborious manual searching of DILG-related news online, the system automates the collection and gathering of related news mentioning the DILG and/or SILG from Google News using the RSS feed and presents it in an interactive dashboard. Through the developed software, users can improve their productivity because the manual searching for related news is automated. Moreover, the system assists with the analysis and tagging of related news articles through its sentiment analysis feature using the BERT approach. Likewise, to aid the user in generating the daily online news monitoring report, the system has a facility to generate and download the report based on the gathered news articles and the results of the sentiment analysis. This addresses the laborious and time-consuming effort involved in preparing the reports to be submitted to the DILG Top Management.
Table 3: Actual Confusion Matrix
| | Classified Positive | Classified Negative | Classified Neutral |
|---|---|---|---|
| Actual Positive | 14 | 2 | 0 |
| Actual Negative | 1 | 33 | 1 |
| Actual Neutral | 1 | 0 | 8 |
- Result of the Performance Criteria Used in the Evaluation Metrics of the Sentiment Analysis. To classify the news sentiment, the researcher implemented the pre-trained model called BERT. The accuracy of the model during the testing phase is 91.67%. Table 3 displays the confusion matrix from which this accuracy was computed.
The model correctly predicted 33 of the 35 contents with a negative label and 14 of the 16 contents with a positive label. Only one of the 9 neutral contents was classified incorrectly. This means that the model predicted 55 correct labels out of the total 60 labeled contents in the dataset, yielding a 91.67% accuracy rate.
$$\text{accuracy} = \frac{\text{Correct Predictions}}{\text{Total Number of Instances}} \times 100 = \frac{55}{60} \times 100 = 0.9167 \times 100 = 91.67\%$$
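The same figure can be reproduced directly from the counts in Table 3. The short sketch below sums the diagonal of the confusion matrix (the correct predictions) and divides by the total number of instances; the variable and function names are illustrative.

```python
LABELS = ["Positive", "Negative", "Neutral"]

# Rows are actual labels, columns are classified labels (values from Table 3).
CONFUSION = [
    [14, 2, 0],   # Actual Positive
    [1, 33, 1],   # Actual Negative
    [1, 0, 8],    # Actual Neutral
]


def accuracy_percent(matrix):
    """Share of instances on the diagonal, expressed as a percentage."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total * 100


correct_predictions = sum(CONFUSION[i][i] for i in range(len(LABELS)))
total_instances = sum(sum(row) for row in CONFUSION)
actual_totals = {label: sum(row) for label, row in zip(LABELS, CONFUSION)}

print(correct_predictions, total_instances)   # 55 60
print(round(accuracy_percent(CONFUSION), 2))  # 91.67
```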
- Intelligent Component of the Proposed System with Respect to the Result of the Sentiment Analysis. After the classification of the given summary of the news content, the system will show the recommended topics or keywords with positive and negative orientations based on the result of the sentiment analysis. This will be the intelligent component of the system that will generate the recommended topics that need immediate attention or intervention and what topics are to maintain and further sustain. This will further support the goal to have up-to-date data and information that is fundamental to gauge the relevant insights in interpreting the changing framing and priming nature of the media landscape by employing freely available and unstructured online data to produce actionable reports and valuable media insights.
Knowing the sentiment of news/topics towards an organization can help the organization understand how it is perceived by the public and the media. This can be useful for reputation management, as the organization can take steps to address any negative sentiment and improve its reputation.
Table 4: Level of Usability of Online News Monitoring and Sentiment Analysis
| Criteria | Mean | Descriptive Equivalent | Descriptive Interpretation |
|---|---|---|---|
| Appropriateness Recognizability | 4.6 | Very Strongly Agree | Very Highly Acceptable |
| Learnability | 5.0 | Very Strongly Agree | Very Highly Acceptable |
| Operability | 4.8 | Very Strongly Agree | Very Highly Acceptable |
| User Interface Aesthetic | 5.0 | Very Strongly Agree | Very Highly Acceptable |
| Accessibility | 4.8 | Very Strongly Agree | Very Highly Acceptable |
| Grand Mean | 4.84 | Very Strongly Agree | Very Highly Acceptable |
- Level of Usability of the Online News Monitoring and Sentiment Analysis. The level of usability of the Online News Monitoring and Sentiment Analysis was measured using the ISO/IEC 25010:2011 usability standard in terms of appropriateness recognizability, learnability, operability, user interface aesthetic, and accessibility. The user interface aesthetic and learnability indicators got the highest sub-mean of 5.0, while appropriateness recognizability was rated lowest with a sub-mean of 4.6. Both operability and accessibility were rated 4.8. All the usability indicators were rated very highly acceptable as the equivalent descriptive interpretation.
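As a quick check of Table 4, the grand mean can be recomputed as the average of the five sub-means. The sketch assumes each criterion carries equal weight, which is consistent with the reported value.

```python
# Sub-means per usability criterion, taken from Table 4.
SUB_MEANS = {
    "Appropriateness Recognizability": 4.6,
    "Learnability": 5.0,
    "Operability": 4.8,
    "User Interface Aesthetic": 5.0,
    "Accessibility": 4.8,
}


def grand_mean(sub_means):
    """Average of the criterion sub-means, assuming equal weights."""
    values = list(sub_means.values())
    return sum(values) / len(values)


highest = max(SUB_MEANS.values())
lowest_criterion = min(SUB_MEANS, key=SUB_MEANS.get)

print(round(grand_mean(SUB_MEANS), 2))  # 4.84
```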
5. Conclusion and Recommendation
Summary of Findings
According to the findings, the DILG PACS coped with the challenges of its existing manual system for monitoring and analyzing related online news in order to avoid the costs associated with online media monitoring services. The proponent found that these manual tasks reduce the overall efficiency and productivity of the technical staff concerned in carrying out their day-to-day tasks and operations, since the users were forced to perform the required tasks manually in an attempt to save on additional costs.
The developed system successfully automates the collection of related news mentioning the DILG and/or SILG from Google News using the RSS feed and presents it in an interactive dashboard. This feature addresses the main issue encountered by the DILG PACS in its tedious and laborious manual monitoring of DILG-related online news.
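The RSS-based collection step can be illustrated with a minimal sketch using only the Python standard library. The feed URL pattern, the function names `build_feed_url` and `parse_feed`, and the sample feed content are assumptions made for illustration; the paper does not give the system's actual code. To keep the example self-contained, it parses an inline sample feed instead of making a network request.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_feed_url(query):
    # Assumed URL pattern for a Google News RSS search feed.
    return "https://news.google.com/rss/search?" + urlencode({"q": query})


def parse_feed(xml_text):
    """Extract title, link, and publication date from each RSS <item>."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
        })
    return items


# Made-up feed content, for illustration only.
SAMPLE_FEED = """
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>Sample headline mentioning DILG</title>
      <link>https://example.com/news/1</link>
      <pubDate>Mon, 05 Dec 2022 08:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Another sample headline</title>
      <link>https://example.com/news/2</link>
      <pubDate>Tue, 06 Dec 2022 09:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>
""".strip()

feed_url = build_feed_url("DILG")
articles = parse_feed(SAMPLE_FEED)
for article in articles:
    print(article["published"], "|", article["title"])
```

In a running system, the text returned from fetching `feed_url` would be passed to `parse_feed` on a schedule, and the resulting records would feed the dashboard.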
The pre-trained BERT model used to classify the sentiment of the given news content yielded an accuracy rate of 91.67%. This leads to the conclusion that the sentiment analysis feature of the system is very highly satisfactory.
Through the development and implementation of online news monitoring and sentiment analysis, organizations and entities can have up-to-date data and information, which is fundamental for interpreting the changing framing and priming of the media landscape. The system uses freely available, unstructured online data to produce actionable reports and valuable media insights, as demonstrated by its intelligent component, which generates recommended topics or keywords with positive and negative orientations.
The developed “Online News Monitoring and Sentiment Analysis using BERT Approach” was deemed very highly acceptable by the respondents. The system is strongest in user interface aesthetic and learnability, as evidenced by its highest ratings for UI design, software look, and systems effectiveness. In terms of appropriateness recognizability, users recognize that the features provided by the application are appropriate for their needs. Users also recognize that the application is simple to use and that its features are easy to comprehend. Overall, the system received a grand mean rating of 4.84, interpreted as very highly acceptable, as an aid to the existing DILG PACS system for monitoring and tagging DILG-related online news.
Recommendations
1) This study suggests that the developed software, rated very highly acceptable by the respondents, is a useful tool for users in automating their monitoring and analysis of related online news. It will increase their productivity by automating the generation of the needed news reports and reduce business costs by cutting the corresponding charges entailed by online media monitoring services.
2) As for the features of the developed system “Online News Monitoring and Sentiment Analysis using BERT Approach”, the study recommends gathering and fetching related news in real time using the RSS feed and expanding the news sources beyond Google News to widen the scope of the news to be analyzed and monitored.
3) For future researchers, the study recommends fetching the news content from other online sources through an Application Programming Interface (API) rather than manually inputting it into the system, to further reduce human intervention and human error in the actual system procedure.
4) The study proposes to fine-tune the presently employed pre-trained BERT model to upgrade the sentiment analysis feature of the system and further increase its accuracy. The proponent also suggests employing another NLP technique to automatically identify the keyword or topic based on the news content.
References
Androutsopoulou, A., Charalabidis, Y., & Loukis, E. N. (2015). Social Media Monitoring for
Public Policy Making - An Evaluation (Vol. 8).
http://aisel.aisnet.org/mcis2015http://aisel.aisnet.org/mcis2015/8
Arispe, M. C. A., Bigueras, rosemarie T., Torio, J. O., & Maligat, D. J. E. (2020). Sentiment Analysis on Evacuation and Relief Operation in the Philippines. International Journal of Advanced Trends in Computer Science and Engineering, 9(1.3), 298–302.
https://doi.org/10.30534/ijatcse/2020/4591.32020
Barile, F., Ricci, F., Tkalcic, M., Magnini, B., Zanoli, R., Lavelli, A., & Speranza, M. (2019).
A news recommender system for media monitoring. Proceedings - 2019 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2019, 132–140.
https://doi.org/10.1145/3350546.3352510
Bilog, R. J. (2020). Application of Naïve Bayes Algorithm in Sentiment Analysis of Filipino, English and Taglish Facebook Comments. International Journal of Management and Humanities, 4(5), 73–77. https://doi.org/10.35940/ijmh.E0524.014520
Daud, N. M. N., Abu Bakar, N. A. A., & Rusli, H. M. (n.d.). Implementing Rapid Application Development (RAD) Methodology in Developing Practical Training Application System.
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. http://arxiv.org/abs/1810.04805 Dinu, L. P., & Iuga, I. (n.d.). The Naive Bayes Classifier in Opinion Mining: In Search of the
Best Feature Set. http://twittersentiment.appspot.com/
Gonzalez-Carvajal, S., & Garrido-Merchan, E. (2020). Comparing BERT against Traditional Machine Learning Text Classification. http://arxiv.org/abs/1607.06450
Jindal, K., & Aron, R. (2021). A systematic study of sentiment analysis for social media data.
Materials Today: Proceedings. https://doi.org/10.1016/j.matpr.2021.01.048
Kang, H. W., Chye, K. K., Yuan, O. Z., & Tan, C. W. (2021). THE SCIENCE OF EMOTION:
MALAYSIAN AIRLINES SENTIMENT ANALYSIS USING BERT APPROACH.
https://www.researchgate.net/publication/356493369
Kaya, M., Fidan, G., & Toroslu, I. H. (2012). Sentiment analysis of Turkish political news.
Proceedings - 2012 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2012, 174–180. https://doi.org/10.1109/WI-IAT.2012.115
Kumar, A., & Garg, G. (2020). Systematic literature review on context-based sentiment analysis in social multimedia. Multimedia Tools and Applications, 79(21–22), 15349–
15380. https://doi.org/10.1007/s11042-019-7346-5
Li, H., Ma, Y., Ma, Z., & Zhu, H. (2021). Weibo Text Sentiment Analysis Based on BERT and
Deep Learning. Applied Sciences (Switzerland), 11(22).
https://doi.org/10.3390/app112210774
Lyu, C., Ji, T., & Graham, Y. (2020). Incorporating Context and Knowledge for Better Sentiment Analysis of Narrative Text. http://ceur-ws.org
Özçift, A., Akarsu, K., Yumuk, F., & Söylemez, C. (2021). Advancing Natural Language Processing (NLP) applications of Morphologically Rich Languages with Bidirectional Encoder Representations from Transformers (BERT): an empirical case study for Turkish.
Automatika, 62(2), 226–238. https://doi.org/10.1080/00051144.2021.1922150
Pieterse, H., Van ’t Wout, C., Khan, Z., & Serfontein, C. (2022). Specialised Media Monitoring Tool to Observe Situational Awareness Network Threats View project Professional offensive cyber operations View project Specialised Media Monitoring Tool to Observe Situational Awareness. https://www.researchgate.net/publication/359082210
Pitogo, V. A., & Ramos, C. D. L. (2020). Social media enabled e-Participation: A lexicon- based sentiment analysis using unsupervised machine learning. ACM International Conference Proceeding Series, 518–528. https://doi.org/10.1145/3428502.3428581 Ruggiero, A., & Vos, M. (2020). Social Media Monitoring for Crisis Communication: Process,
Methods and Trends in the Scientific Literature. Online Journal of Communication and Media Technologies, 4(1). https://doi.org/10.29333/ojcmt/2457
Shirsat, V. S., Jagdale, R. S., & Deshmukh, S. N. (2018). Sentence level sentiment identification and calculation from news articles using machine learning techniques. In Advances in Intelligent Systems and Computing (Vol. 810, pp. 371–376). Springer Verlag. https://doi.org/10.1007/978-981-13-1513-8_39
Shofiya, C., & Abidi, S. (2021). Sentiment analysis on covid-19-related social distancing in Canada using twitter data. International Journal of Environmental Research and Public Health, 18(11). https://doi.org/10.3390/ijerph18115993
Shuhidan, S. M., Hamidi, S. R., Kazemian, S., Shuhidan, S. M., & Ismail, M. A. (2018).
Sentiment analysis for financial news headlines using machine learning algorithm.
Advances in Intelligent Systems and Computing, 739, 64–72.
https://doi.org/10.1007/978-981-10-8612-0_8
Singh, M., Jakhar, A. K., & Pandey, S. (2021). Sentiment analysis on the impact of coronavirus in social life using the BERT model. Social Network Analysis and Mining, 11(1).
https://doi.org/10.1007/s13278-021-00737-z
Stavrakantonakis, I., Gagiu, A.-E., Kasper, H., Toma, I., & Thalhammer, A. (2012). An approach for evaluation of social media monitoring tools.
Stoy, L. (2021, June 23). Sentiment Analysis: A Deep Dive Into The Theory, Methods, And Applications.
Taj, S., Shaikh, B. B., & Meghji, A. F. (2019). Sentiment Analysis of News Articles:A Lexicon based Approach.
Umar, H. (2020). A DYNAMIC MODEL OF SOCIAL MEDIA MONITORING TOOLS WITH SENTIMENT ANALYSIS.
Vangara*, R. V. B., Thirupathur, K., & Vangara, S. P. (2020). Opinion Mining Classification u sing Naive Bayes Algorithm. International Journal of Innovative Technology and Exploring Engineering, 9(5), 495–498. https://doi.org/10.35940/ijitee.E2402.039520 Yadav, A., Jha, C. K., Sharan, A., & Vaish, V. (2020). Sentiment analysis of financial news
using unsupervised approach. Procedia Computer Science, 167, 589–598.
https://doi.org/10.1016/j.procs.2020.03.325
Zhang, B., & Vos, M. (2014). Social media monitoring: Aims, methods, and challenges for international companies. Corporate Communications, 19(4), 371–383.
https://doi.org/10.1108/CCIJ-07-2013-0044
Zhang, C., & Yamana, H. (2020). Combining BERT and Naive Bayes-SVM for Humor
Assessment in Edited News Headlines. Online.
https://github.com/HeroadZ/SemEval2020-task7