
Chapter 2

Review of Related Literature

This chapter discusses the approach, features, and limitations of existing research related or similar to the proposed research. It consists of three sections: the first describes existing chatbots used for mental health and well-being; the second discusses the technology applied in creating empathic systems; and the third details natural language techniques for response generation in artificial intelligence (AI) conversational agents.

2.1 Chatbots for Mental Health and Well-being

Conversational agents have long been present in human lives, serving the roles of virtual assistants (Dibitonto et al., 2018), software tutors (Hobert, 2019), and learning companions (Cassell et al., 2007; Fryer & Carpenter, 2006). As they became more prevalent in society, their use was extended to the medical field. The creative use of mobile health applications such as chatbots showed potential for reducing the cost of health care and improving well-being in countless ways (Kumar et al., 2013). Fryer and Carpenter also observed that most students enjoyed talking with a chatbot during their study; students even felt more comfortable chatting with the chatbot than with a student partner or teacher. People in general tend to appreciate and have more relaxed conversations with chatbots, as chatbots do not get bored or angry even when users repeatedly share their stories or talk about a certain topic endlessly. Chatbots are also capable of giving fast feedback and sharing broader knowledge than human teachers can. Given those qualities, a conversational agent can be an acceptable medium for support in the field of mental health.


In 1966, Weizenbaum created ELIZA, one of the first chatterbots. ELIZA is often described as a therapist chatbot that follows simple Rogerian psychotherapy rules to impersonate a real-life therapist. Using pattern matching and word substitution, it was able to respond appropriately to text input. The script it follows lets it seem to reflect on the user's questions by turning them back to its patient. Even though it was able to respond appropriately, the chatterbot could not really understand or remember what it was told, which could sometimes result in repetitive answers.
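The mechanism can be illustrated with a short sketch. The following is a minimal, illustrative rendering of ELIZA-style pattern matching and word substitution in Python; the rules and reflection table are invented for the example and are far simpler than Weizenbaum's actual script.

```python
import re

# Minimal ELIZA-style responder: match a pattern, reflect pronouns,
# and turn the user's statement back as a question.
REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you", "you": "I"}

RULES = [
    (re.compile(r"i feel (.*)", re.I), "Why do you feel {0}?"),
    (re.compile(r"i am (.*)", re.I), "How long have you been {0}?"),
    (re.compile(r"(.*)\?$", re.I), "Why do you ask that?"),
]

def reflect(fragment: str) -> str:
    """Swap first- and second-person words to turn a statement back."""
    return " ".join(REFLECTIONS.get(w, w) for w in fragment.lower().split())

def respond(utterance: str) -> str:
    for pattern, template in RULES:
        match = pattern.match(utterance.strip())
        if match:
            return template.format(*(reflect(g) for g in match.groups()))
    return "Please tell me more."  # default when no rule matches

print(respond("I feel alone lately"))  # -> "Why do you feel alone lately?"
```

Because the script never stores what was said, each turn is handled in isolation, which is exactly why repeated inputs yield repetitive answers.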

Liu et al. (2013) first offered a chatterbot where anyone could ask non-obstructive psychological questions, with answers derived from an online Q&A forum, to address people's lack of mental health awareness. This application's limitations became a stepping stone for Huang et al. (2015) to develop a better conversational agent that can detect stress in adolescents and give positive responses to shift their negative feelings. This chatbot accepts text input from the user and detects whether the user is stressed. If stress is detected and the user's input is classified as a declarative sentence or rhetorical question, the response comes from a local knowledge base of positive responses; if the user's input is an interrogative question, the chatbot follows the same idea as Liu et al. (2013) and draws the response from a large Q&A community. Every response delivered by this chatbot is pre-defined and limited to knowledge retrieved from available public resources.

To make chatbots more appealing and human-like to users, especially in the field of mental health where showing empathy is essential, researchers started to explore making chatbots emotionally aware. Ghandeharioun, McDuff, Czerwinski, and Rowan presented an emotionally-aware agent that can respond appropriately to a specified emotion. They used valence (pleasure vs. displeasure) and arousal (high vs. low energy) to make it easier for their users to determine their current affective states. Responses generated for the users are based on a manually scripted dialog that contains emotionally expressive statements and emojis to better communicate emotions.

Deep learning methods for mental health intervention were explored by Yin et al. in 2019, using recurrent neural networks for detecting stress and a sequence-to-sequence (Seq2Seq) model for response generation. Their chatbot, EveBot, was able to diagnose students' negative, depressive, and anxious emotions from their input and act as a psychological therapist and virtual friend that gives comfort, shifts attention to positive emotions, and offers friendly solutions. For their mood detection model, they used a bi-directional LSTM RNN-based model, a deep learning approach that classifies responses into either positive or negative categories. For response generation, a fully generative Seq2Seq model trained with a Maximum Mutual Information (MMI) objective function is used.


This objective function evaluates the mutual dependence between input and output. The study argued that since their chatbot uses deep learning-based models, more interaction means more data can be collected; system performance can therefore improve over time by using the collected data for further training.
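To make the MMI idea concrete, the sketch below reranks candidate replies with a score of the common bidirectional form, score(T) = log p(T|S) + λ log p(S|T). The toy overlap-based scorer is an assumption standing in for trained forward and backward Seq2Seq models, which EveBot's paper does not publish.

```python
import math
import re
from collections import Counter

def toy_log_prob(target: str, source: str) -> float:
    """Toy word-overlap score standing in for a Seq2Seq log-likelihood."""
    tokens = lambda s: re.findall(r"[a-z']+", s.lower())
    tgt = tokens(target)
    overlap = sum((Counter(tokens(source)) & Counter(tgt)).values())
    return math.log((overlap + 1) / (len(tgt) + 1))

def mmi_rerank(user_input: str, candidates: list[str], lam: float = 0.5) -> str:
    """Pick the candidate maximizing mutual dependence between input and
    output, which penalizes bland replies that fit any input equally well."""
    def score(resp: str) -> float:
        forward = toy_log_prob(resp, user_input)   # log p(response | input)
        backward = toy_log_prob(user_input, resp)  # log p(input | response)
        return forward + lam * backward
    return max(candidates, key=score)

replies = ["I don't know.", "I am sorry you are stressed about your exams."]
print(mmi_rerank("I am so stressed about my exams", replies))
```

The backward term is what distinguishes MMI from plain likelihood: a generic reply such as "I don't know" predicts almost no input well, so it scores poorly.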

Sunny (Narain et al., 2020), on the other hand, was a Facebook Messenger bot that improves psychological well-being by promoting positive social connections. Using a state machine composed of states and the transitions between them, the conversation flows between the user and the bot through pre-defined messages. Sunny prompts and helps users compose a message that they may want to send to a friend. These prompts aim to promote the user's gratitude to the friend, positive reflection on their relationship, and appreciation of their friendship. Though it follows a fixed conversation flow with specific states, participants enjoyed Sunny, saying it gave them a venue to say meaningful things that might feel awkward in day-to-day life.
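A state-machine flow of this kind can be sketched in a few lines. The states, prompts, and transitions below are illustrative inventions, not Sunny's actual script; the point is that user input fills in the path but never changes it.

```python
# Each state maps to (prompt, next_state); the flow always follows
# the fixed transitions regardless of what the user types.
STATES = {
    "greet":      ("Hi! Who is a friend you appreciate?", "ask_memory"),
    "ask_memory": ("What moment with them are you grateful for?", "compose"),
    "compose":    ("Let's turn that into a message you could send them.", "done"),
}

def run_flow(ask) -> list[str]:
    """Walk the fixed transitions, collecting the user's replies."""
    state, replies = "greet", []
    while state != "done":
        prompt, next_state = STATES[state]
        replies.append(ask(prompt))  # user text is collected, but the flow
        state = next_state           # advances on fixed transitions only
    return replies

# e.g. run_flow(lambda prompt: input(prompt + "\n> "))
```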

Figure 2.1: Features and interaction flow of Woebot (De Nieva et al., 2020)

Two of the most popular mental health chatbots available in the market to date are Woebot and Wysa. Woebot, as studied by Fitzpatrick et al. (2017), is a fully automated conversational agent that delivers self-help cognitive behavioral therapy (CBT) for college students who self-identified as showing signs of anxiety and depression. The chatbot was able to generate empathetic responses, showing excitement for good news and sympathy for loneliness. Its interaction flow is shown in Figure 2.1. It also has goal-setting and reflection features, which are identified as therapeutic. Overall, Woebot mainly helps with a person's relationships, grief, and addiction.


Wysa was likewise studied by Inkster et al. (2018), who presented a preliminary evaluation using real-world data of the chatbot's effectiveness for users with self-reported depression. Aside from CBT, the application also includes other evidence-based practices such as dialectical behavior therapy, motivational interviewing, positive behavior support, behavioral reinforcement, mindfulness, and guided microactions and tools that encourage users to build emotional resilience skills. Wysa helps those suffering from stress, anxiety, and sleep loss.

Although both Woebot and Wysa are proven to be effective and acceptable as self-help applications for mental health support, they still run on rule-based designs with scripted statements, which may sometimes result in repetitive responses, as their developers cannot risk these publicly available chatbots generating strange responses to their users. This also limits the chatbots' understanding of, and adaptation to, the user's interests during conversation.


Table 2.1: Chatbots for Mental Health and Well-being: Summary of Sources

| Chatbot | Target user | Clinical approach | Technology | Limitations |
|---|---|---|---|---|
| ELIZA (1966) | not specified | acts like a Rogerian psychotherapist, reflecting on questions by turning them back | rule-based pattern matching and substitution | no framework for understanding the context of conversation |
| TeenChat (2015) | 15 to 22 y/o | detects stress, then replies with positive messages to shift the user's attention | retrieval-based | response data is limited to a local knowledge base and an online Q&A community |
| EMMA (2019) | 16 to 49 y/o | emotion-aware, responding positively to positive emotions and sympathetically to negative ones | rule-based | all textual interactions are manually scripted |
| EveBot (2019) | teenagers | diagnoses students' negative emotions to prevent depression through positively suggestive responses | RNN-LSTM generative model | poor performance on grammar and meaning of responses due to small training data |
| Sunny (2020) | university students and young professionals | promotes positive social connections to enhance psychological well-being | rule-based | designed with a state machine, a set of states and transitions between them |
| Woebot (2017) | 18 to 28 y/o | self-help cognitive behavioral therapy (CBT) | rule-based | works with pre-defined statements, as the developers could never risk it saying something really weird |
| Wysa (2018) | anonymous population | CBT, dialectical behavior therapy, motivational interviewing, positive behavior support, behavioral reinforcement, mindfulness, and guided microactions and tools to encourage users to build emotional resilience skills | rule-based | entirely scripted and generates repetitive responses |


2.2 Empathic Computing

Empathy, as described by psychotherapist Alfred Adler, is one's ability to see with the eyes of another, listen with the ears of another, and feel with the heart of another; generally speaking, to look at another's perspective and respect their opinions. He also acknowledged the inevitable presence of empathy in people's lives and its essential role in the delivery of counselling and therapy (Clark, 2016). Machines and computers, by contrast, were for years branded as apathetic objects that can only accept and reject human instructions. Technology has now advanced to the point where it lets humans show greater empathy for one another.

Empathic computing was originally defined as technology or computer systems that can create deeper understanding or empathy between people (Billinghurst, 2017). This field focuses on embedding elements of empathy, such as sensitivity to emotions, so that systems interact with users beyond the purely technical aspects. The chatbot ELIZA (Weizenbaum, 1966) is possibly the earliest technology that could engage with its user in an empathetic manner. But empathic computing is not limited to systems showing care to their users. There are three types of empathic computing systems (Billinghurst, 2017): first, understanding, where systems are designed to understand the user's feelings and emotions; second, experiencing, where systems bring users into the recorded world of others; and third, sharing, where systems allow users to share the feelings of others in real time.

Conversational agents with the ability to empathize with their users fall under the area of understanding. Research in this area studies how to develop systems that can recognize a person's emotional state. This area of empathic computing is also called affective computing, a term first coined by Picard, who defined it as "computing that relates to, arises from, or deliberately influences emotion or other affective phenomena". With a therapeutic chatbot able to understand a patient's emotions appropriately, it can suggest suitable treatment or therapy. There are three types of empathy that a patient can exhibit: cognitive, emotional, and compassionate (Devaram, 2020). Knowing which type of empathy to handle, an application can understand emotions from the user's perspective and relate them appropriately to the correct emotions such as happiness, sadness, anger, and fear.

Using artificial intelligence and deep learning techniques paired with natural language processing, emotions can be extracted from user input. Zhong et al. (2019) used a deep learning approach with recurrent neural networks embedded with attention to create an open-domain neural conversational model. To make their model generate affective words in its responses, they adopted the Valence, Arousal, and Dominance (VAD) affective notations to annotate each word with affect. Some of the words from their corpus with either the highest or lowest ratings on valence, arousal, and dominance are plotted in Figure 2.2.
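The annotation step itself is straightforward to sketch. The mini-lexicon and fallback values below are illustrative placeholders; Zhong et al. drew their VAD ratings from an established affective lexicon.

```python
# Attach a Valence-Arousal-Dominance (VAD) triple to each token from an
# affective lexicon; the triples then feed into the model's embeddings.
VAD_LEXICON = {
    # word: (valence, arousal, dominance), illustrative values
    "happy":  (8.2, 6.5, 7.1),
    "afraid": (2.0, 6.7, 3.0),
    "calm":   (6.9, 1.7, 6.6),
}
NEUTRAL = (5.0, 3.0, 5.0)  # mid-scale fallback for out-of-lexicon words

def annotate(sentence: str) -> list[tuple[str, tuple[float, float, float]]]:
    """Pair every token with a VAD triple; these affect annotations are
    combined with ordinary word embeddings in the conversational model."""
    return [(w, VAD_LEXICON.get(w, NEUTRAL)) for w in sentence.lower().split()]

print(annotate("i am afraid of tomorrow"))
```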

Their results showed that the affect-incorporated weights of their model maintained a good balance between fluency and emotion quality in its responses. The model also underwent human evaluation with a preference test, in which most annotators preferred their model over the state-of-the-art baseline by a large margin in terms of both content quality and emotion quality.

Figure 2.2: 2D representation showing words with the highest or lowest ratings in valence (V), arousal (A), and dominance (D) (Zhong et al., 2019)

Another chatbot that exhibits a strong emotional connection with its human users is XiaoIce (Zhou, Gao, Li, & Shum, 2020), literally "Little Ice", known as the most popular social chatbot created by Microsoft, with over 660 million active users. As a social bot, it was designed to address the human need for companionship, affection, and social belonging, integrating both intelligent quotient and emotional quotient in its system design. Four key parts make up its system architecture: the dialogue manager, core chat, skills, and the empathetic computing module. The empathetic computing module consists of three components: contextual query understanding, user understanding, and interpersonal response generation. In contextual query understanding, all mentioned entities are labelled and linked to the entities in the memory of the state tracker. These contextual queries are then utilized by the Core Chat to generate responses using either a retrieval-based engine or a neural response generator. User understanding is responsible for generating the query empathy vectors, lists of key-value pairs representing the user's intents, emotions, topics, opinions, and persona; a set of machine learning classifiers, namely topic detection, intent detection, sentiment analysis, and opinion detection, helps generate these key-value pairs. Lastly, interpersonal response generation produces the response empathy vectors, which specify the empathetic aspects the response needs to embody along with the persona of XiaoIce: an 18-year-old girl who is always reliable, sympathetic, affectionate, and has a wonderful sense of humor. This empathetic computing module was first released in July 2018 and became the most important feature of XiaoIce's sixth generation, substantially strengthening XiaoIce's emotional connections to human users and increasing its number of active users.
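The empathy-vector idea can be sketched as a plain data structure. The field names and example values below are assumptions for illustration, not Microsoft's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class EmpathyVector:
    """Key-value summary of a conversational turn, in the spirit of
    XiaoIce's query/response empathy vectors (fields are illustrative)."""
    intent: str = "chat"       # e.g. filled by an intent-detection classifier
    emotion: str = "neutral"   # e.g. from sentiment analysis
    topic: str = ""            # e.g. from topic detection
    opinion: str = "neutral"   # e.g. from opinion detection
    persona: dict = field(default_factory=dict)

# Query-side vector: what the classifiers inferred about the user's turn.
query_ev = EmpathyVector(intent="seek_comfort", emotion="sad", topic="exams")

# Response-side vector: what empathetic aspects the reply should embody.
response_ev = EmpathyVector(intent="console", emotion="sympathetic",
                            topic="exams", persona={"age": 18, "tone": "warm"})
```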

Facebook AI Research also released their own open-domain chatbot named Blender (Roller et al., 2020), named after its 'Blended Skill Task' setup. The initial goal of their study was to emphasize that, aside from large parameter and data sizes, other important elements need to be considered for a high-functioning chatbot. These include the ability to engage in an interesting way, to listen to conversation partners, and to display knowledge and empathy, all while maintaining a consistent persona. Their work covered three types of architectures: retrieval, generative, and retrieve-and-refine. After training their models, they fine-tuned them on cleaner, more task-focused datasets such as ConvAI2 (S. Zhang et al., 2018), which focuses on personality and engaging the other speaker; EmpatheticDialogues (Rashkin et al., 2018b) for empathy; and Wizard of Wikipedia (Dinan et al., 2018) for knowledge. With the Blended Skill Task applied, which emphasizes desirable traits while minimizing undesirable traits learned from large corpora, their models achieved improved performance in terms of humanness and engagingness in their human evaluations.


Table 2.2: Empathic Computing: Summary of Sources

| Research | Aim | Technology |
|---|---|---|
| Zhong et al. (2019) | capture the input's emotion and generate affect-rich responses related to the context | Recurrent Neural Networks (RNN) with attention and Valence, Arousal, and Dominance (VAD) word embeddings |
| Zhou et al. (2020) | recognize human feelings and states, understand user intents, and respond to the user's needs | empathetic computing consisting of three modules: contextual query understanding, user understanding, and interpersonal response generation |
| Roller et al. (2020) | display empathy, knowledge, and personality | fine-tuning the model on domain-specific datasets such as EmpatheticDialogues to teach empathy |


2.3 Response Generation Models

Conversational agents can be classified into different types depending on several criteria. These may include the system's core design philosophy, the extent to which context needs to be stored and considered in order to comprehend the conversation, or the type and purpose of the conversation for which the chatbot is designed (Ramesh, Ravishankaran, Joshi, & Chandrasekaran, 2017). Machine learning-based conversational agents can also be classified by their response generation models.

2.3.1 Retrieval-based Models

Retrieval-based models are designed to query a pre-defined set of responses fed into the system's database. Such a model retrieves the closest candidates in the database matching the current utterance, then chooses the most appropriate response to output. As a result, these models are incapable of generating new text or content. Different heuristics can be applied in choosing the proper response, from a simple rule-based expression match to a complex combination of machine learning classifiers.
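A minimal sketch of this retrieval step is shown below, using cosine similarity over bags of words as the matching heuristic; the stored prompt-response pairs are invented for the example.

```python
import math
from collections import Counter

# Illustrative response database: stored prompt -> curated reply.
RESPONSE_DB = {
    "how much do you drink per week": "Let's walk through a quick assessment.",
    "what is a standard drink": "A standard drink is roughly 10 g of pure alcohol.",
    "i want to cut down": "Setting a weekly limit is a good first step.",
}

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(utterance: str) -> str:
    """Return the reply paired with the closest stored prompt."""
    query = Counter(utterance.lower().split())
    best = max(RESPONSE_DB, key=lambda k: cosine(query, Counter(k.split())))
    return RESPONSE_DB[best]

print(retrieve("what counts as a standard drink"))
```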

One such system is a chatbot that performs a standard assessment of alcohol drinking habits to determine the user's level of health risk (Elmasri & Maeder, 2016). This chatbot offers two functionalities: providing alcohol education and performing an alcohol risk assessment of the user. The chatbot's design allows it to ask interview-styled questions capable of mimicking a pragmatic consultation or session with an actual health care professional. Based on the user's responses, including the user's pre-existing drinking behavior, the chatbot assesses the user's level of risk, and as feedback shares appropriate recommendations and educational information. The alcohol education map in Figure 2.3 shows the three main alcohol education topics offered by the chatbot.

The chatbot was designed to make users feel that the conversation is remembered and that it is aware of the context. To achieve this, the chatbot performs one of two actions: (1) store the response given by the user if the chatbot believes it is a sufficient answer, in which case the chatbot can prepare and collate appropriate feedback for the response; or (2) if the answer is not sufficient, try to collect an appropriate response from the user. In the case of an open-ended question, the chatbot asks again in simpler words; if close-ended, the question is posed again, asking the user to say 'Yes' or 'No'. Once complete, step (1) is repeated, and so on. After the evaluation, a few users commented on the chatbot's inability to recognize different but related keywords, and on there being too much information resulting from the use of highly structured conversation maps.

Figure 2.3: Alcohol education conversation map of Elmasri and Maeder (2016)
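The two-action answer-collection loop of Elmasri and Maeder's chatbot can be sketched as follows; the predicate and helper functions are assumptions standing in for the chatbot's own sufficiency checks.

```python
def collect_answer(question, ask, is_sufficient, is_open_ended, simplify):
    """Keep asking until a storable answer arrives, then return it:
    re-ask open-ended questions in simpler words, close-ended ones as yes/no."""
    answer = ask(question)
    while not is_sufficient(answer):
        if is_open_ended(question):
            answer = ask(simplify(question))  # re-ask in simpler words
        else:
            answer = ask(question + " Please say 'Yes' or 'No'.")
    return answer  # caller stores it and collates feedback for the user

# e.g. collect_answer("Do you drink on weekdays?", ask=input,
#                     is_sufficient=lambda a: a.lower() in {"yes", "no"},
#                     is_open_ended=lambda q: False, simplify=lambda q: q)
```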

2.3.2 Generative Models

In contrast to retrieval-based models, generative models construct their own responses using different machine learning techniques. Models in this category are trained on a dataset of dialogues and generate responses by 'translating' inputs into responses, rather than translating from one language into another (Kulatska, 2019). They also have the ability to refer back to previous information, making them smarter and their responses more human-like. However, these models are hard to train, are prone to grammatical errors, and usually require large amounts of computing resources and training and testing data.

Chawla and Anuradha (2018) built a generative chatbot that can act like a counselor, responding with advice based on the premise it is given. Their dataset was trained on an LSTM or GRU, types of RNN, where each character's relevance is considered to understand words based on which characters appear in a word and which words appear near which words; in this way, the system eventually learns and stores the context of what kind of sentence follows which. In their conclusion, they wrote that their chatbot was able to generate grammatically correct English responses and that the overall mood of users improved since they were able to express their feelings to the chatbot.

Another generative chatbot is CAiRE, presented by Lin et al. (2020), which focused on the fully data-driven integration of empathy into chatbots. Their model adapted the publicly available Generative Pre-trained Transformer (GPT) and used transfer learning to integrate the empathetic response generation task. The chatbot is accessible through a web link, where the authors continuously collect user feedback to further improve the model's empathy.

2.3.3 Hybrid Models

Hybrid models combining retrieval and generative models have recently been explored in several studies. This is done to address the limited responses of retrieval-based models and the sometimes ungrammatical responses of generative models.

In 2017, Tammewar et al. proposed a hybrid model that combines a rule-based retrieval system and a neural conversational model. Their rule-based retrieval system shows high precision (the fraction of relevant instances among the retrieved instances) and can provide grammatically correct responses, but is said to have low recall (the fraction of relevant instances that were retrieved). Thus, the generative model is used when the conversation deviates from the expected conversation flow, resulting in the ability to handle more complex and varied requests. Their generative model uses recurrent neural networks (RNNs), which take a sequence of words as input and output a predictive sequence in return. They also used an attention mechanism to allow the decoder block to look back at the input sequence, which overcomes the limitation of a fixed context size for long input sequences.

Another study by Day and Hung (2019) developed an artificial intelligence affective conversational robot (AIACR) that uses sentiment analysis, as shown in Figure 2.4, to be emotion-aware. In their study, they populated a corpus for the retrieval-based model and used an LSTM for the generative model. For training the sentiment prediction model, they used LSTM and BiLSTM. A similarity ranking model chooses the better response between those generated by the retrieval-based and generative models, treating them equally.


Figure 2.4: Hybrid chatbot system architecture by Day and Hung (2019)

In this study, the proposed architecture has the generative model as an extension of the retrieval model. It does not use a ranking or similarity model to treat both models' responses equally as in Day and Hung (2019), and it does not treat the generative model as a backup as proposed by Tammewar et al. (2017), but instead as a full extension of the guided retrieval model to generate empathetic and non-repetitive responses. The retrieval model maintains a conversation flow and supplies keywords to guide and limit the context that can be generated by the generative model.
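Under this design, the interaction between the two models might look like the following sketch; the function names and signatures are assumptions for illustration only.

```python
def respond(utterance, state, retrieve_step, generate):
    """retrieve_step maps (utterance, state) -> (next_state, keywords);
    generate conditions on those keywords, so replies stay on-topic and
    empathetic without being limited to scripted text."""
    next_state, keywords = retrieve_step(utterance, state)
    reply = generate(utterance, keywords)  # keywords bound the generated context
    return reply, next_state
```

Unlike a fallback, the generative model fires on every turn here, with the retrieval side acting purely as a guide that keeps generation within the intended conversation flow.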


Table 2.3: Response Generation Models: Summary of Sources

| Research | Response Generation | Description |
|---|---|---|
| Elmasri and Maeder (2016) | Retrieval-based | provides information regarding alcohol education using a pre-populated database of factual response contents following a conversation map |
| Chawla and Anuradha (2018) | Generative | trains itself using RNN-LSTM/GRU and context classification in improving its responses and expanding the responses it uses |
| Lin et al. (2020) | Generative | learns empathy via transfer learning using the EmpatheticDialogues dataset and PersonaChat to give the chatbot a more consistent persona |
| Tammewar et al. (2017) | Hybrid | a rule-assisted retrieval system is backed up by a generative neural model to generate responses in case the input's state cannot be identified or cannot be handled by the retrieval model's design |
| Day and Hung (2019) | Hybrid | uses a similarity ranking model to choose the better response between the retrieval-based and generative models (retrieval-based has a pre-defined corpus and generative uses LSTM) |
