Chapter 4

The VHope System

VHope (Virtual Hope) is a hybrid empathetic conversational agent that combines a retrieval-based model with a neural conversational model. It gives students a place where they can talk freely about their feelings and emotions without feeling invalidated, and it helps them maintain their well-being, which is measured using the PERMA model of well-being (Seligman, 2012).

This hybrid model combines an existing retrieval-based model with a publicly available neural conversational model that was trained for this work. The retrieval-based model used was first EREN (Santos et al., 2020) and later MHBot (Ong, Go, Lao, Pastor, & To, 2021). EREN provided the emotion classification of the input, as it aims to teach emotional intelligence to children. However, its target users are aged 9 to 11, far from VHope's 17 to 20, so the model was later changed to MHBot, which also uses EREN as its base system but targets college students. MHBot's goal is likewise to help college students with their well-being, so its responses are more appropriate for VHope's use. VHope was made available as a web page using Python Flask so users can access it on their phone, tablet, or computer, wherever they are comfortable.

4.1 Software Objectives

4.1.1 General Objective

To combine a retrieval-based model with a generative-based model that can generate empathetic responses for newer and more human-like conversation.


4.1.2 Specific Objectives

• To use an existing retrieval-based model with the trained neural conversational model.

• To train a neural conversational model with a collection of empathetic dialogues to teach the model emotions and empathy.

• To detect the user's well-being level using the PERMA model as the measurement and the Mental Health Continuum as the label.

• To act as a facilitator that helps users label and recognize their well-being, and as a listener that can empathize and encourage them to share their feelings and stories.

4.2 Scope and Limitations

Retrieval-based model. Since VHope is a hybrid model, it uses a retrieval-based and a generative model working together to produce the desired output. The retrieval-based model first used in this system is an existing system called Emotion Reflective Entity, or EREN for short (Santos et al., 2020). EREN uses the OCC model to classify an event's emotion into 14 emotions: anger, disappointment, distress, fears-confirmed, sorry for, gratitude, hate, joy, love, surprise, relief, resentment, satisfaction, and shock. When MHBot became available, EREN was replaced with MHBot, as its responses are more appropriate for the target audience and objective of VHope. The retrieval-based system was modified to accommodate the modified conversation flow.

Neural conversational model. As for the generative model, the publicly available DialoGPT (Y. Zhang et al., 2019) is used. DialoGPT's pre-trained models in the small, medium, and large sizes are fine-tuned on the EmpatheticDialogues (Rashkin et al., 2018b) dataset to train the model to generate empathetic responses. Perplexity is used to evaluate the performance of each trained variation of the model before integrating it with the retrieval-based model.
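To illustrate, below is a minimal sketch of how perplexity could be computed for one fine-tuned DialoGPT variant on held-out dialogue text; the model path and sample utterances are placeholders, not the actual VHope checkpoints or evaluation data.

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "microsoft/DialoGPT-small"  # placeholder; could point to a fine-tuned checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH)
model.eval()

held_out = [
    "I failed my exam today and I feel terrible.",
    "That sounds really stressful. Do you want to talk about it?",
]

total_loss, total_tokens = 0.0, 0
with torch.no_grad():
    for text in held_out:
        ids = tokenizer(text + tokenizer.eos_token, return_tensors="pt").input_ids
        # The causal LM loss is the mean cross-entropy over the predicted tokens.
        loss = model(input_ids=ids, labels=ids).loss
        n_predicted = ids.size(1) - 1  # labels are shifted by one position
        total_loss += loss.item() * n_predicted
        total_tokens += n_predicted

# Perplexity is the exponential of the average per-token cross-entropy.
print(f"perplexity: {math.exp(total_loss / total_tokens):.2f}")
```

A lower perplexity indicates that the model assigns higher probability to the held-out dialogue, which is how the three fine-tuned sizes can be compared before integration.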

PERMA model. To help users maintain their well-being, their well-being must first be computed and then classified into labels for easier identification. Well-being is measured using the PERMA lexica introduced by Schwartz et al. (2016), based on the PERMA model of Seligman (2012). This measurement is then classified into five categories: in crisis, struggling, surviving, thriving, and excelling, which are derived from the Mental Health Continuum described by Delphis.


VHope as facilitator and listener. As an empathetic conversational agent, VHope is expected to act as a facilitator that helps the user recognize their well-being and as a listener that shows empathy as users share their stories.

To perform the role of a facilitator, VHope should be able to encourage the user to share their stories, label their well-being, and help them recognize why they are feeling that way, which can be achieved by asking questions. As a listener, VHope should be able to recognize the user's well-being and respond to it accordingly while showing empathy.

4.3 Architectural Design

The overall architectural design of the system is shown in Figure 4.1. The hybrid model is composed of two submodels, retrieval and generative. These two models are expected to work together to cover each other's weaknesses: the retrieval-based model provides rules, a defined conversation flow, and specific context to the open-domain generative model, while the latter generates more empathetic, newly constructed responses that differ from the predefined responses of the retrieval-based model.

Figure 4.1: VHope hybrid model system architecture

To start conversing with VHope, users will log in to the web page and will be prompted with a welcome message from VHope, mainly asking about the user's day so that the user will feel more comfortable starting to share stories. Once a user gives an utterance, this input will be accepted by the retrieval model.

The utterance will be processed in the Text Understanding module, which passes the text to the Story World Extraction and Event Extraction processes. In Story World Extraction, characters, objects, settings, and events are extracted from the text utterance, whereas Event Extraction identifies creation, description, or action events. The retrieval model's Emotion Recognition module uses the OCC Model to classify emotions from action events and the Keyword Spotting Technique for description events. Once an emotion is detected and classified, the model uses the Dialogue Manager to find the most suitable move, which then chooses a template at random that is later filled in by Response Generation.

Both the detected emotion and the generated candidate response are used as input to the generative model to generate an appropriate response. For the generative model to be able to generate a response, it must first be fine-tuned on three resources: the EmpatheticDialogues (Rashkin et al., 2018b) dataset, so it learns to generate responses with empathy; the Well-being Conversations (Sia, Yu, Daliva, Montenegro, & Ong, 2021), retrieved from a past study that focused on providing conversations encouraging students to improve their lifestyle habits and well-being, so the model learns the words used in well-being related conversations; and the PERMA lexicon, to teach the model words related to each category of the PERMA model.
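As an illustration of this hand-off, the sketch below composes the detected emotion and the retrieval-based candidate into a single conditioning prompt for a DialoGPT-style model; the prompt format, model path, and generation settings are assumptions, not VHope's exact implementation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "microsoft/DialoGPT-small"  # placeholder for the fine-tuned FTER checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH)
model.eval()

def generate_reply(user_utterance: str, detected_emotion: str, candidate: str) -> str:
    # Condition the generative model on the user turn, the emotion label from the
    # retrieval model, and the template-based candidate response.
    context = f"{user_utterance} [emotion: {detected_emotion}] {candidate}"
    input_ids = tokenizer.encode(context + tokenizer.eos_token, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            input_ids,
            max_length=input_ids.shape[-1] + 40,
            pad_token_id=tokenizer.eos_token_id,
            do_sample=True,
            top_p=0.9,
        )
    # Keep only the tokens generated after the conditioning context.
    return tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True)

print(generate_reply("I failed my exam today.", "distress", "I'm sorry to hear that."))
```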

The PERMA module uses the PERMA lexica (Schwartz et al., 2016) to measure the current well-being of the user. It computes the score of each word in the utterance for each PERMA category: positive emotion, engagement, relationships, meaning, and accomplishment. The overall score is then labeled as in crisis, struggling, surviving, thriving, or excelling, as adapted from Delphis1. If the user's label is in crisis or struggling, VHope notifies the user and asks if they want to seek help, as suggested by the psychology expert. Otherwise, it simply continues the storytelling by asking whether the detected PERMA label matches what they are currently feeling.

4.3.1 Retrieval-based model

The retrieval-based model is responsible for processing the input, understanding it, recognizing emotion, and giving a candidate response that follows the designed conversation flow. MHBot was created using the architecture of EREN; the only difference is in their templates, wherein MHBot provides responses regarding college students' well-being and advice to help them maintain their mental well-being, whereas EREN tries to guide children to share their stories and recognize their emotions.

1https://delphis.org.uk/mental-health/continuum-mental-health/


Figure 4.2: Retrieval-based model Knowledge Base

The retrieval model uses a variety of knowledge bases to understand the text input and perform its tasks. The three knowledge bases used are the following:

Story World Representation. This maintains two types of event chain: one is the set of action and description events, and the other is the emotion event chain, which contains the set of emotions and is used to check for recurring emotions throughout the conversation.

Commonsense Ontology. This provides the commonsense knowledge needed to understand the basic words and concepts in the real world. It is utilized to recognize the familiarity of the object detected and is also used to properly label an emotion.

NRC Emotion Lexicon. This helps the model understand affective words. It is based on the NRC Emotion Lexicon, which is composed of words, their sentiment (either positive or negative), and eight basic emotions, namely anger, anticipation, disgust, fear, joy, sadness, surprise, and trust.

Text Understanding

In text understanding, the user's text input is processed and its important elements are extracted. These elements create an Event, which can be categorized into three types: Action, Description, or Creation. Every actual event performed in the input is categorized as an Action event, information that describes an object or character is a Description event, and a Creation event is where new characters or events are introduced in the story. Only Action and Description events are passed to the Emotion Recognition module, since emotions can be influenced by actions performed and by adjectives that explicitly describe how the user is feeling.

Emotion Recognition

Emotion Recognition classifies emotions from the detected events. Action events, whose elements such as the subject, verb, direct object, and adverb are passed to the Emotion Recognition module, are processed using the OCC model, while Description events use the Keyword Spotting Technique.
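A minimal sketch of keyword spotting for description events is given below; the tiny word-to-emotion table is a hypothetical stand-in for the full NRC Emotion Lexicon.

```python
from collections import Counter
from typing import Optional

# Hypothetical affective keywords mapped to NRC-style basic emotions.
EMOTION_KEYWORDS = {
    "happy": "joy",
    "glad": "joy",
    "scared": "fear",
    "terrified": "fear",
    "angry": "anger",
    "sad": "sadness",
}

def spot_emotion(description_event_text: str) -> Optional[str]:
    """Return the most frequent emotion signalled by affective keywords, if any."""
    tokens = description_event_text.lower().split()
    hits = Counter(EMOTION_KEYWORDS[t] for t in tokens if t in EMOTION_KEYWORDS)
    return hits.most_common(1)[0][0] if hits else None

print(spot_emotion("I was so scared and sad after the exam"))  # -> "fear" (first of the tied hits)
```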

Dialogue Manager

The Dialogue Manager decides which dialogue move is appropriate to use. Different from chatbots designed to assist humans with a specific goal or task, conversational agents providing social and mental health support need to mimic and maintain the common human-to-human exchange of turns. This is also essential to provide the acceptable conversation flow expected from an actual health care professional or therapist. Thus, this study adapts the conversation flow of EREN, which originated from Gottman et al. (1996)'s Emotion Coaching Model and, as seen in Figure 4.3, consists of five (5) phases, namely introduction, labeling, listening, reflecting, and evaluating; a minimal sketch of this phase progression is given after the phase descriptions below. This conversation flow guides the retrieval-based model on what candidate responses to generate, which are then used as input to the FTER model.

Introduction. Every conversation starts with an introduction phase where the agent greets the user, gives a simple self-introduction of what it can offer, and ends by asking the user about his/her day, signaling the user to share his/her story or feelings. This uses the simple retrieval-based model to output a predefined response. This is also where the system tries to detect an emotion using the retrieval model's Emotion Recognition module and assesses the user's well-being from his/her utterance using the PERMA module of the generative model.

Labeling. In this phase, the agent asks the user if the detected well-being state is correct. In case the user disagrees and does not provide an alternative well-being state, it will either ask for the specific well-being label or process the new utterance to detect and assess again. Once the user confirms, related keywords are collected and used as a guide in generating empathetic and relevant responses throughout the succeeding conversation.


Figure 4.3: Retrieval-based model’s updated conversation flow

Listening. With the well-being state confirmed, the agent now tries to gain the user's trust and build rapport by asking questions about why he/she feels that way and is in that state. The agent should also be able to generate statements that positively validate the user's situation. This phase makes full use of the FTER model in generating empathetic responses.

Reflecting. This phase continues to make use of the FTER model, guided by the candidate responses generated by the retrieval-based model. The agent also offers statements containing positive thoughts and advice to allow the user to reflect and think of better actions, triggering the user's self-reflection; note, however, that the system does not check whether the user's action was correct or not.

Evaluating. Lastly, the agent asks for the user's insights about their conversation and whether it positively affected and helped his/her concern. The conversation is then ended by the agent's statement expressing its availability and willingness to support and listen to the user's stories and/or concerns anytime and anywhere.
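The sketch below gives a minimal view of how these five phases could be stepped through as a simple state machine; the transition rules are simplified placeholders, since the actual Dialogue Manager decides moves from the detected emotion, the PERMA label, and user confirmations.

```python
from enum import Enum, auto

class Phase(Enum):
    INTRODUCTION = auto()
    LABELING = auto()
    LISTENING = auto()
    REFLECTING = auto()
    EVALUATING = auto()

# Linear ordering of the Emotion Coaching phases described above.
NEXT_PHASE = {
    Phase.INTRODUCTION: Phase.LABELING,
    Phase.LABELING: Phase.LISTENING,
    Phase.LISTENING: Phase.REFLECTING,
    Phase.REFLECTING: Phase.EVALUATING,
    Phase.EVALUATING: None,  # the conversation ends after evaluation
}

def advance(phase: Phase, user_confirmed: bool) -> Phase:
    """Move to the next phase; stay in LABELING until the user confirms the label."""
    if phase is Phase.LABELING and not user_confirmed:
        return Phase.LABELING
    return NEXT_PHASE[phase] or Phase.EVALUATING

phase = Phase.INTRODUCTION
for confirmed in (True, False, True, True, True):
    phase = advance(phase, confirmed)
    print(phase.name)
```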

4.3.2 Generative-based model

The generative-based model used in this system is composed of the publicly available neural conversational model DialoGPT, fine-tuned for empathetic responses (the FTER model), and the PERMA Module. The base model of DialoGPT was adapted from the GPT-2 architecture released by OpenAI in 2019, but it is trained on a much larger dataset consisting of twelve years' worth of 147 million conversation-like exchanges from Reddit comment chains. Three sizes of the model were trained, small, medium, and large, consisting of 117 million, 345 million, and 762 million parameters, respectively.

Figure 4.4: Generative-based model

All these variations are made available in the HuggingFace Transformers repository. For FTER to fulfill its objective of generating empathetic responses, it was trained and evaluated with three datasets, namely the EmpatheticDialogues (Rashkin et al., 2018b) dataset, the Well-being Conversations (Sia et al., 2021), and the PERMA Lexica (Schwartz et al., 2016).
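A minimal sketch of such fine-tuning with the HuggingFace Trainer is shown below; the file paths, hyperparameters, and the preprocessing into one dialogue per line are illustrative assumptions rather than the exact training setup.

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    TextDataset,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "microsoft/DialoGPT-medium"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Assumed preprocessing: one dialogue per line, turns joined with the EOS token.
train_dataset = TextDataset(tokenizer=tokenizer, file_path="train.txt", block_size=128)
eval_dataset = TextDataset(tokenizer=tokenizer, file_path="valid.txt", block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="fter-dialogpt",
        num_train_epochs=3,
        per_device_train_batch_size=2,
        evaluation_strategy="epoch",
    ),
    data_collator=collator,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()
trainer.save_model("fter-dialogpt")
```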

EmpatheticDialogues Dataset

The EmpatheticDialogues (Rashkin et al., 2018b) dataset contains 24,850 conversations grounded in emotional situations. Its 32 emotion labels were chosen by aggregating labels from several emotion prediction datasets. Each dialogue involves two roles, the Speaker and the Listener. The Speaker is the person who chooses the emotion, describes the situation, and then instigates a conversation about it. The participant who understands and responds to the given situation is the Listener. Each conversation ranges from four to eight utterances; the average number of utterances per conversation is 4.31 and the average utterance length is 15.2 words. The dataset was split so that all conversations with the same speaker providing the initial situation description would be in the same partition.

There are eight features in the dataset, including conv_id for the conversation id, utterance_idx for the utterance id within each conversation, context, which contains the emotion in which the conversation is grounded, prompt, the initial emotional situation of the conversation, and utterance, which contains the alternating replies of the Speaker and Listener. In its original format, the dataset spreads the same prompt and the successive utterances between one Speaker and Listener across multiple rows. These data were processed to place a whole conversation within a single row, resulting in 17,841 rows of conversation for training, 2,758 rows for validation, and 2,539 rows for testing.
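A minimal sketch of this row-collapsing step with pandas is shown below, assuming the released column names; the turn-joining token is an assumption.

```python
import pandas as pd

# Load the per-utterance CSV (columns include conv_id, utterance_idx, context,
# prompt, and utterance, as described above).
df = pd.read_csv("empatheticdialogues/train.csv")

# Collapse the alternating Speaker/Listener rows of each conversation into a
# single row, keeping the original turn order.
conversations = (
    df.sort_values(["conv_id", "utterance_idx"])
      .groupby("conv_id")["utterance"]
      .apply(lambda turns: " <EOS> ".join(turns))  # turn separator is an assumption
      .reset_index(name="dialogue")
)

print(len(conversations), "conversations")
print(conversations.loc[0, "dialogue"][:200])
```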

Well-being Conversations

Well-being Conversations (Sia et al., 2021) is a collection of well-being related conversation logs from a rule-based chatbot used by twenty-five Grade 12 students aged 17 to 18. The conversations aimed to promote a healthy lifestyle and well-being that can help prevent further illness and distress. By training the generative model on this type of conversation, it is expected to learn to give helpful advice regarding well-being.

PERMA Lexica

PERMA Lexica (Schwartz et al., 2016) is composed of lexicons which can be used to predict well-being through the PERMA scales. Each lexicon entry has a respective category and weight. The categories are POS P, POS E, POS R, POS M, POS A, NEG P, NEG E, NEG R, NEG M, and NEG A, covering the positive and negative sides of all five categories of the PERMA model. The weight is the score of the word in that category and ranges roughly from -0.37 to 0.84 across categories, as shown in Table 4.1.

Table 4.1: PERMA Lexica minimum and maximum scores per category

Category    Minimum Score    Maximum Score    Mean
POS P       -0.36639         0.76549          0.04172
POS E       -0.30074         0.34065          0.03234
POS R       -0.28884         0.78376          0.03824
POS M       -0.16748         0.77167          0.02517
POS A       -0.19784         0.55031          0.03990
NEG P       -0.32731         0.70697          0.04705
NEG E       -0.15230         0.84017          0.04354
NEG R       -0.28648         0.62033          0.04045
NEG M       -0.14987         0.31674          0.03416
NEG A       -0.15369         0.24760          0.03426


4.3.3 PERMA Module

The PERMA module is responsible for computing the well-being score of the user. To compute the well-being score, the PERMA Lexica (Schwartz et al., 2016) is used, which contains a collection of words and their scores in each PERMA category.
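A minimal sketch of this lexicon-based scoring is given below; the tiny lexicon is a hypothetical stand-in for the full PERMA Lexica, and the category totals are simple sums rather than the exact weighting scheme of Schwartz et al. (2016).

```python
from collections import defaultdict

# Hypothetical (category, weight) entries; the real lexica cover all ten
# POS/NEG PERMA categories listed in Table 4.1.
PERMA_LEXICON = {
    "happy": ("POS_P", 0.52),
    "friends": ("POS_R", 0.41),
    "failed": ("NEG_A", 0.37),
    "alone": ("NEG_R", 0.33),
}

def perma_scores(utterance: str) -> dict:
    """Sum lexicon weights per PERMA category for the words in the utterance."""
    scores = defaultdict(float)
    for token in utterance.lower().split():
        if token in PERMA_LEXICON:
            category, weight = PERMA_LEXICON[token]
            scores[category] += weight
    return dict(scores)

print(perma_scores("I failed my exam and I feel alone"))
# -> {'NEG_A': 0.37, 'NEG_R': 0.33}
```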

Butler and Kern (2016) also provided a table to interpret these scores, shown in Table 4.2. These clusters are renamed after the Mental Health Continuum phases so they are easier for users to understand.

Table 4.2: PERMA Profiler Score Interpreter (Butler and Kern, 2016)

Cluster                    Well-being Score    Negative Emotion Score
Very High Functioning      9 and above         0 to 1
High Functioning           8 to 8.9            1.1 to 3
Normal Functioning         6.5 to 7.9          3 to 5
Sub-optimal Functioning    5 to 6.4            5.1 to 6.5
Languishing Functioning    Below 5             Above 6.5
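Below is a minimal sketch of mapping an overall well-being score to its Mental Health Continuum label using the Table 4.2 clusters, together with the crisis check described earlier; the exact correspondence between each cluster and its renamed label is assumed to follow the order of the clusters.

```python
def continuum_label(wellbeing_score: float) -> str:
    """Map a well-being score to its renamed Mental Health Continuum label."""
    if wellbeing_score >= 9:
        return "excelling"      # Very High Functioning
    if wellbeing_score >= 8:
        return "thriving"       # High Functioning
    if wellbeing_score >= 6.5:
        return "surviving"      # Normal Functioning
    if wellbeing_score >= 5:
        return "struggling"     # Sub-optimal Functioning
    return "in crisis"          # Languishing Functioning

def needs_help_prompt(label: str) -> bool:
    """VHope offers to help the user seek support when the label is low."""
    return label in {"in crisis", "struggling"}

label = continuum_label(4.2)
print(label, needs_help_prompt(label))  # -> in crisis True
```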

4.4 Physical Environment and Resources

The VHope system requires the following resources for development:

• Python

• Python Flask

• MySQL

• HuggingFace Transformers

To fine-tune the models, re-training is required, which consumes a lot of resources. CCS Cloud's current Alpha Testing provided the researcher with a JupyterHub environment with 2x NVIDIA Tesla V100 GPUs at 32 GB each.

To make the web page available for access, a server machine is needed. CCS Cloud also provided an environment configured with Ubuntu 18.04, 4 GB RAM, and 32 GB storage, since neural models are large.

The minimum requirement for accessing the web page is any updated browser installed on a phone, tablet, or laptop and a stable internet connection.
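For illustration, a minimal sketch of exposing the chatbot through a Python Flask endpoint is given below; the respond() helper is a placeholder for the full retrieval and generative pipeline, and the route and payload format are assumptions.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def respond(utterance: str) -> str:
    # Placeholder: the real system runs text understanding, emotion
    # recognition, PERMA scoring, and FTER generation here.
    return "Thank you for sharing. How did that make you feel?"

@app.route("/chat", methods=["POST"])
def chat():
    user_utterance = request.get_json(force=True).get("utterance", "")
    return jsonify({"reply": respond(user_utterance)})

if __name__ == "__main__":
    # Bind to 0.0.0.0 so the page is reachable from phones and tablets on the network.
    app.run(host="0.0.0.0", port=5000)
```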


Figure 4.5: CCS Cloud JupyterHub Interface

Figure 4.6: CCS Cloud allocated resources

4.4.1 Tools

ConceptNet

The pre-existing retrieval-based models, EREN and MHBot, used ConceptNet to provide commonsense knowledge to their models. ConceptNet is a large semantic network which contains a variety of concepts extracted from different sources such as Open Mind Common Sense, WordNet, DBPedia, UMBEL, Wiktionary, and Verbosity.


Spacy

spaCy is an open-source library written in Python and Cython used for Natural Language Processing tasks such as tokenization, part-of-speech tagging, dependency parsing, and named entity recognition. This tool helps the retrieval-based model understand the input and extract useful information.
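A minimal sketch of pulling such elements with spaCy is given below; the pipeline name and the example sentence are assumptions.

```python
import spacy

# Load the standard small English pipeline (assumed; install with
# `python -m spacy download en_core_web_sm`).
nlp = spacy.load("en_core_web_sm")
doc = nlp("I failed my math exam yesterday and I feel terrible.")

# Elements the text-understanding module relies on: subjects, verbs,
# objects, adjectives, and named entities.
subjects = [t.text for t in doc if t.dep_ == "nsubj"]
verbs = [t.lemma_ for t in doc if t.pos_ == "VERB"]
objects = [t.text for t in doc if t.dep_ in ("dobj", "obj")]
adjectives = [t.text for t in doc if t.pos_ == "ADJ"]
entities = [(ent.text, ent.label_) for ent in doc.ents]

print(subjects, verbs, objects, adjectives, entities)
```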

HuggingFace Transformers

For the generative-based model, HuggingFace Transformers is used. It provides a variety of pre-trained models used in many tasks, including conversational modelling. DialoGPT, a large open-source neural conversational model, uses transformers and is readily available from the HuggingFace repository.
