Developing an Expressive Intelligent Agent
Lamia Alam
Department of Computer Science & Engineering, Chittagong University of Engineering & Technology
Chittagong-4349, Bangladesh. E-mail: [email protected]
Mohammed Moshiul Hoque
Department of Computer Science & Engineering, Chittagong University of Engineering & Technology
Chittagong-4349, Bangladesh. E-mail: [email protected]
Abstract—Displaying facial expressions is one of the major ingredients of human-human and human-virtual agent communication. This capability makes an interaction episode more natural, effective, and interesting. However, most previous online chatting systems have used emoticons to display the emotions of the interacting partners. In this paper, we focus on developing a human-like virtual agent that produces facial expressions with motions (such as head movements and eye blinks) in a chatting environment by analyzing the input text messages of users. The agent is capable of displaying six facial expressions, namely happy, sad, angry, disgust, fear, and surprise, based on the chatting partner's input text. Evaluation results suggest that the proposed agent displays emotive expressions correctly 92% of the time from the users' text input.
Index Terms—human-computer interaction; virtual agent; expression visualization; facial expression; eye blinks.
I. INTRODUCTION
The ability to express emotions and feelings is a unique feature of humans that sets them apart from other living creatures. The face plays a vital role here, since changes in facial expression and motion of face components (e.g., eye blinks, head movements) are the major means of showing emotions and feelings. A face is capable of producing about twenty thousand different facial expressions [1], and it is the area of the human body that is most closely observed during interaction. When people speak with one another, they tend to adapt their head movements and facial expressions in response to each other's head movements and facial expressions [2]. This is not the case, however, when people are engaged in non-verbal communication with each other through a computer.
A major challenge is to develop an expressive intelligent agent that can generate facial expressions and head movements while people interact through it. To do so, the agent should understand the emotional content of the user's input, expressed through his/her text message, and respond accordingly. This requires both a model to recognize emotions and a model to generate the facial expressions that produce the appropriate emotional response. People's thoughts, feelings, and intents are not expressed completely when they communicate by exchanging text messages (especially in chatting scenarios) through a computer. Moreover, communicating via text messages alone often seems boring to the interacting partners. To make the chatting interaction amusing, a few systems use virtual characters to represent the chatting partners. These are mostly 2D or 3D models of human, cartoon, or animal-like characters. However, most people find such virtual characters unrealistic and incompatible with their personality because of their static and unsuitable representation.
In this paper, we exhibit the expressive capability of an intelligent agent to show various emotions by generating appropriate facial expressions and motions. According to Nasr et al. [3], this type of agent requires a visual interface, typically a human face, which is familiar and non-intimidating, making users feel comfortable when interacting with a computer. This visual interface should support expressions that fit the emotional states of humans. In this work, we use the six basic emotions (i.e., happy, sad, fear, surprise, anger, and disgust) due to their universality [4]. To visualize these emotions through facial expressions, we developed a human-like agent that understands the emotional intent of the user expressed through textual input (in the form of words or sentences). In addition to these basic emotions, we adopted eye blinks and head movements to make the interaction more engaging and to maintain more natural communication with other people in the collaborative virtual environment.
II. RELATED WORK
A few research activities have been conducted on intelligent agents with expressive capabilities. There are some 2D or 3D virtual agents with very limited ability to demonstrate non-verbal behavior, such as displaying predefined facial expressions controlled through an emotion wheel [5], moving face components (lips, eyebrows, etc.) to accompany input speech [6], and low-level communication such as gaze direction [7]. Another class of interactive animated creatures, called woggles, are autonomous and self-controlled, and have the ability to jump, slide, move their eyes, and change body and eye shape while interacting among themselves and with the outside world [8].
Most of these studies focus on generating static facial expressions, while the motions of some face components are neglected, in particular the eyes and the rigid motion of the head. Another aspect not addressed in these previous works is the adaptation of expressions with intermediate states.
These virtual agents are used not only for entertainment but also widely for customer service. Two such agents are Anna, a virtual assistant [9], and Rea, the real estate agent [10]. These agents interact with users in a human-like way with very limited emotion and provide the desired information. In this work, we focus on developing an expressive agent that displays different facial expressions by analyzing the textual content provided by the user.
III. PROPOSED SYSTEM FRAMEWORK
The main objective of our work is to develop an expressive agent that is able to display various facial expressions during interaction with users. Fig. 1 shows a schematic representation of the proposed system. The sender's input text is first analyzed to generate appropriate facial expressions. Once the text is processed, the system recognizes emotional words (keywords) and matches them against the database. The database stores the emotional words and modifiers that define the emotional states and the levels of emotional intensity. Once emotional words are matched, the system proceeds to determine the intensity level of the recognized emotion. This module determines the emotional intensity within a set of predefined intensity levels stored in the database.
The system consists of three main modules: emotion recognition and intensity determination, facial expression visualization, and movement generation.
Fig. 1. Proposed framework for the expressive agent.
A. Recognizing Emotion and Determining Intensity
To produce a facial expression, the module first recognizes the emotional words in the sender's text. The module tokenizes the sender's input text by splitting the input sentence on specified delimiter characters (such as spaces, ",", ".", "?", etc.). From the set of tokens, it searches for keywords and related modifiers. After recognizing keywords and modifiers, the module represents the emotional state associated with each word.
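To make this step concrete, the following Python sketch shows one possible way to tokenize a message and match tokens against emotion keywords. The keyword lists, function names, and the in-memory dictionary (instead of the system's actual database) are illustrative assumptions, not the paper's implementation.

import re

# Illustrative keyword lists; the real system stores these words in a database.
EMOTION_KEYWORDS = {
    "happy":    {"happy", "glad", "delighted"},
    "sad":      {"sad", "sorry", "unhappy"},
    "angry":    {"angry", "furious", "mad"},
    "fear":     {"afraid", "scared", "horrid"},
    "surprise": {"surprised", "amazed"},
    "disgust":  {"disgusted", "gross"},
}

def tokenize(text):
    """Split the sender's message on space and punctuation delimiters."""
    return [t for t in re.split(r"[ ,.?!]+", text.lower()) if t]

def recognize_emotions(text):
    """Return the emotional states whose keywords appear in the text."""
    tokens = tokenize(text)
    return [emotion
            for token in tokens
            for emotion, words in EMOTION_KEYWORDS.items()
            if token in words]

print(recognize_emotions("I am angry with you."))  # -> ['angry']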
In the current implementation, we have used six basic emotional states: happy, sad, anger, surprise, fear, and disgust.
The module classifies each recognized keyword into one of these categories with an appropriate emotional strength. The recognized words are mainly adjectives (like happy, sad, furious, serious, horrid, etc.) that give a clearer idea of the emotion. For example, consider two people chatting with each other and trying to express their feelings:
User A: I am angry with you.
User B: Well, I am sorry.
Here, user A's sentence, "I am angry with you", is first divided into tokens: "I", "am", "angry", "with", and "you". Then the keyword "angry" is extracted from these tokens to characterize user A's emotional state. Similarly, user B's emotional state is characterized by the word "sorry".
Next, the module assigns an intensity level to the corresponding emotional state. Consider another example, user A's statement with a different emotional intensity:
User A: I am very angry with you.
In this case, the modifier "very" is used to determine the intensity level of the emotional state. The intensity levels of "very angry" and "angry" differ, so the corresponding facial expression changes from a low to a high intensity value. Other modifiers, such as really, extremely, highly, and so on, can be stored in the database. Once the emotional strength of a particular category passes a certain threshold, the user's agent representation is changed to show the appropriate expression. The system also generates facial motions such as eye blinks and head movements based on certain text inputs such as yes, ya, no, nope, etc.
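A minimal sketch of how a modifier could scale the intensity of a recognized emotion and trigger an expression change once a threshold is crossed is given below; the modifier weights, base strength, and threshold value are assumptions for illustration and are not the values stored in the system's database.

# Illustrative modifier weights and threshold; the actual values live in the
# system's database and may differ.
MODIFIER_WEIGHTS = {"very": 1.5, "really": 1.5, "extremely": 2.0, "highly": 1.8}
EXPRESSION_THRESHOLD = 0.5   # minimum strength before the agent changes expression

def emotion_intensity(tokens, keyword, base_strength=0.6):
    """Scale a keyword's base strength by any modifier immediately preceding it."""
    strength = base_strength
    idx = tokens.index(keyword)
    if idx > 0 and tokens[idx - 1] in MODIFIER_WEIGHTS:
        strength *= MODIFIER_WEIGHTS[tokens[idx - 1]]
    return min(strength, 1.0)

tokens = ["i", "am", "very", "angry", "with", "you"]
strength = emotion_intensity(tokens, "angry")
if strength >= EXPRESSION_THRESHOLD:
    print(f"show 'angry' expression at intensity {strength:.2f}")  # 0.90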
B. Facial Expression Visualization
To visualize the expressions, we created 3D human-like characters using the MakeHuman software [11]. Table I lists some characteristics of the 3D characters designed with MakeHuman.
TABLE I. CHARACTERISTICS OF THE 3D CHARACTERS

Characteristics   Race    Age   Weight   Height (cm)   Muscle
Male Agent        Asian   25    100%     180.41        50%
Female Agent      Asian   22    75%      153.5         32%
Another piece of software, Blender [12], is used to make these agents more realistic and to generate the facial expressions for each emotion. Fig. 2 shows a snapshot of the Blender interface. To make the agents more natural, we manipulated the texture of the agents and performed cloth simulation. Fig. 2(a) shows the cloth panels used for the cloth simulation of the agents in Blender. The basic idea is to manipulate the geometric representation of the face of these models using the facial parameters specified by Nasr et al. [3]. Fig. 2(b) shows the Blender interface in pose mode, used to manipulate the face models for generating the various facial expressions.
Fig. 2. Blender interface: (a) cloth simulation panel; (b) pose mode.
We generated the six basic facial expressions and motions (i.e., eye blinks and head movements) by manipulating the facial parameters based on the involvement of facial muscles and other non-verbal behaviors. Table II shows the facial expressions of the six basic emotions along with the neutral expression for both the male and female agents.
TABLE II. GENERATED FACIAL EXPRESSIONS (FEMALE AND MALE AGENTS)

Expression     Facial Muscles
(a) Neutral    No contraction of facial muscles
(b) Happy      Eyebrows up; cheeks raised; lip corners pulled up
(c) Sad        Inner eyebrows raised; eyes downcast; lip corners pulled down
(d) Fear       Eyebrows pulled up; mouth stretched
(e) Surprise   Entire eyebrows pulled up; mouth hangs open
(f) Angry      Inner eyebrows pulled down; lip corners pulled down; mouth slightly open
(g) Disgust    Eyebrows pulled down; upper lip pulled up; lip corners stretched downward
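Although the expressions were posed on the face rigs in Blender, a scripted mapping from a recognized emotion and its intensity onto the face model could look like the following sketch using Blender's Python API (bpy); the object name and the per-emotion shape keys are assumptions for illustration, not the parameterization actually used.

import bpy  # Blender's Python API; this sketch must run inside Blender

# Hypothetical shape keys, one per basic emotion, sculpted on the agent's face mesh.
EMOTION_SHAPE_KEYS = {"happy": "Happy", "sad": "Sad", "angry": "Angry",
                      "fear": "Fear", "surprise": "Surprise", "disgust": "Disgust"}

def show_expression(object_name, emotion, intensity):
    """Blend the face toward one basic expression; intensity is in [0, 1]."""
    mesh = bpy.data.objects[object_name].data
    for key_name in EMOTION_SHAPE_KEYS.values():
        mesh.shape_keys.key_blocks[key_name].value = 0.0   # reset the other emotions
    mesh.shape_keys.key_blocks[EMOTION_SHAPE_KEYS[emotion]].value = intensity

show_expression("FemaleAgent", "angry", 0.9)   # object and key names are assumed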
C. Movement Generation
Ideally, humans would interact with virtual agents as naturally as they interact with other people. To facilitate this kind of social interaction, agent behavior should reflect life-like qualities. To produce natural behaviors, we included movements of facial components, such as eye blinks and head movements, and clear transitions between emotional states using a frame-by-frame animation technique. Table III shows the frames used to generate an eye blink. We set the agent to blink every 3 seconds, with each blink lasting about 1/3 second.
TABLE III. FRAMES USED TO GENERATE EYE BLINKS

Frame sequence (female and male agents): eyes open; eyes closed; eyes open.
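The blink timing described above (a blink roughly every 3 seconds, each lasting about 1/3 second) could be keyframed with a bpy script along the following lines; the object name and the "EyesClosed" shape key are assumed for illustration.

import bpy
from math import ceil

def keyframe_blinks(object_name, duration_s, period_s=3.0, blink_s=1.0 / 3.0):
    """Insert open-closed-open keyframes so the agent blinks every period_s seconds."""
    fps = bpy.context.scene.render.fps
    blink_key = bpy.data.objects[object_name].data.shape_keys.key_blocks["EyesClosed"]
    for i in range(ceil(duration_s / period_s)):
        f0 = int(i * period_s * fps) + 1            # eyes open
        f1 = f0 + int(blink_s * fps / 2)            # eyes fully closed
        f2 = f0 + int(blink_s * fps)                # eyes open again
        for frame, value in ((f0, 0.0), (f1, 1.0), (f2, 0.0)):
            blink_key.value = value
            blink_key.keyframe_insert(data_path="value", frame=frame)

keyframe_blinks("FemaleAgent", duration_s=12)   # blinks at roughly 0 s, 3 s, 6 s, and 9 s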
To generate the head movements, we manipulated the bone group of the neck. Fig. 3 shows some frames for the head movements. Specifically, we focused on two kinds of head movement: (i) nodding (i.e., the head is tilted in alternating up and down arcs along the sagittal plane) and (ii) head shaking (i.e., the head is turned left and right along the transverse plane repeatedly in quick succession). Nodding is used to indicate acceptance, whereas head shaking is used to indicate disagreement, denial, or rejection. Fig. 3(a) shows the frames used to generate nodding by the female agent, and Fig. 3(b) shows the frames used to generate head shaking by the male agent. A frame-by-frame animation technique is used to achieve these head movements.
Fig. 3. Frames used to generate head movements: (a) nodding head; (b) shaking head.
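A comparable sketch for the head movements keyframes the neck bone of the armature, tilting it down and back up for a nod (a shake would instead alternate the rotation about the vertical axis); the armature and bone names are assumptions.

import bpy
from math import radians

def keyframe_nod(armature_name, bone_name="neck", start_frame=1,
                 step=5, angle_deg=15, cycles=2):
    """Tilt the head down and back up along the sagittal plane (a nod)."""
    bone = bpy.data.objects[armature_name].pose.bones[bone_name]
    bone.rotation_mode = 'XYZ'
    frame = start_frame
    for _ in range(cycles):
        for x_angle in (0.0, -radians(angle_deg), 0.0):   # up, down, up
            bone.rotation_euler = (x_angle, 0.0, 0.0)
            bone.keyframe_insert(data_path="rotation_euler", frame=frame)
            frame += step

keyframe_nod("FemaleAgentArmature")   # armature and bone names are assumed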
IV. EXPERIMENTS
To evaluate the system, we conducted an experiment to determine the appropriate expression for each basic emotion.
A. Participants
A total of ten participants took part in this experiment. The average age of the participants was 32.4 years (SD = 4.45). Most of them are employees of private organizations.
B. Experimental Design
To select the appropriate facial expression, we designed several types of each expression. For happy, sad, angry, fear, and disgust, we designed five different types of
expression for each category. However, we used only three for surprise. Before the experiment, we explained that its purpose was to evaluate the most suitable expression for each emotion. Each trial started by showing the participants the different types of expression for each emotion twice. Each participant interacted with both the male and female agents, and each session took approximately 45 minutes. After experiencing all expressions, participants were asked to rate their feelings for each type on a 1-to-7 Likert scale.
C. Evaluation Measures
We measured the following two items in this experiment:
Appropriateness: We asked all participants a question ('Which type of expression do you find most preferable for representing an emotion?') to choose the appropriate expression for each category.
Accuracy: To evaluate the system performance, we analyzed emotive input texts and observed the corresponding recognition and the expressions generated by the system. We counted the total number of emotive keywords in the texts (N), the total number of emotive words that were correctly recognized (C), and the total number of incorrectly recognized emotive words (M). We used the following equation to measure the accuracy (A) of emotion recognition:
A = ((N − M) / N) × 100%
D. Results
We conducted a repeated-measures analysis of variance (ANOVA) on the participants' scores for both the male and female agents.
1) Appropriate Expressions
We collected a total of 600 (10 participants × 6 expressions × 5 types × 2 agents) interaction responses for both agents. Figs. 4 and 5 show the results of the questionnaire analysis.
For the happy expression of the female agent, the results show that the differences among conditions were statistically significant [F(4, 49) = 31.5, p < 0.0001, η² = 0.75]. The results also indicate that the type 3 expression gained higher scores than the other types [Fig. 4(a)]; thus, we chose the type 3 expression for the happy expression of the female agent. For the same expression of the male agent, we also found significant differences among conditions [F(4, 49) = 11.8, p < 0.0001, η² = 0.51]. The results reveal that the type 4 expression gained higher scores than the other types [Fig. 5(a)]; thus, we decided to use the type 4 expression for producing the happy expression of the male agent.
For the sad expression of the female agent [F(4, 49) = 58.1, p < 0.0001, η² = 0.83] and the male agent [F(4, 49) = 29.3, p < 0.0001, η² = 0.72], the results show that the differences among conditions were statistically significant. The results [Figs. 4(b) and 5(b)] also indicate that for both the female and male agents, the type 3 expression gained higher scores than the other types. Thus, we chose the type 3 expression for the sad expression of both the female and male agents.
Fig. 4. Evaluation results for the female agent: (a) happy; (b) sad; (c) angry; (d) fear; (e) disgust; (f) surprise.
Similarly, the angry expression showed significant differences among conditions for the female agent [F(4, 49) = 65.2, p < 0.0001, η² = 0.85] and the male agent [F(4, 49) = 44.8, p < 0.0001, η² = 0.80]. The results [Figs. 4(c) and 5(c)] revealed that the type 2 expression gained higher scores than the other types for both the female and male agents. Thus, we decided to use the type 2 expression for producing the angry expression.
For the fear expression of the female agent [F(4, 49) = 34.3, p < 0.0001, η² = 0.75] and the male agent [F(4, 49) = 60.3, p < 0.0001, η² = 0.80], the results [Figs. 4(d) and 5(d)] show that the differences among conditions were statistically significant and indicate that the type 4 expression gained higher scores than the other types for both agents. Thus, we chose the type 4 expression for the fear expression.
For the disgust expression, the results show statistically significant differences among conditions for the female agent [F(4, 49) = 31.3, p < 0.0001, η² = 0.74] and the male agent [F(4, 49) = 51.1, p < 0.0001, η² = 0.82]. The results also indicate that the type 4 expression gained higher scores than the other types for the female agent [Fig. 4(e)], while for the male agent the type 5 expression gained higher scores than the other types [Fig. 5(e)]. Thus, we decided to use expression types 4 and 5 for producing the disgust expression of the female and male agents, respectively.
Fig. 5. Evaluation results for the male agent: (a) happy; (b) sad; (c) angry; (d) fear; (e) disgust; (f) surprise.
For the surprise expression, the results show statistically significant differences among conditions for the female agent [F(2, 29) = 50.4, p < 0.0001, η² = 0.78] and the male agent [F(2, 29) = 32.8, p < 0.0001, η² = 0.84]. The results also indicate that the type 3 expression gained higher scores than the other types for the female agent [Fig. 4(f)], and the type 2 expression gained higher scores than the other types for the male agent [Fig. 5(f)]. Thus, we decided to use expression types 3 and 2 for producing the surprise expression of the female and male agents, respectively.
Table IV shows the summarized results of analysis of both female and male agents’ expressions.
TABLE IV. SUMMARY OF ANALYSIS RESULTS FOR MALE AND FEMALE AGENTS

Type     Agent    Happy   Sad   Angry   Fear   Disgust   Surprise
Type 1   Female   2.7     3.4   3.4     3.5    3.5       2.7
         Male     3.9     2.8   3.4     2.6    3.5       3.4
Type 2   Female   4.2     2.5   6.0     2.7    2.5       4.1
         Male     3.2     3.6   6.2     2.0    4.0       6.5
Type 3   Female   5.8     6.0   2.5     4.3    4.2       6.2
         Male     4.0     6.1   4.2     3.6    2.6       4.5
Type 4   Female   4.0     4.1   4.7     6.0    5.9       -
         Male     5.6     4.0   4.2     5.8    2.0       -
Type 5   Female   2.5     2.5   3.3     4.1    2.6       -
         Male     3.8     4.0   2.4     4.4    6.0       -
2) Accuracy
Table V summarizes the results of the data analysis. We calculated the accuracy of the system in recognizing emotions and generating the corresponding expressions. The results reveal that the system is about 92% accurate in recognizing emotion from the input texts.
TABLE V. ACCURACY OF RECOGNIZING EMOTIVE WORDS

No. of emotive keywords (N)   No. of correctly recognized emotive words (C)   No. of incorrectly recognized emotive words (M)
50                            46                                              4
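For example, substituting the values from Table V into the accuracy equation gives A = ((50 − 4) / 50) × 100% = 92%, the figure reported above.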
V. CONCLUSION
The primary focus of this work is to develop an expressive agent that can generate facial expressions, along with some facial movements, as a means of communication. For this purpose, we developed an agent that can display six facial expressions depending on the text input by users. This agent can be used in chatting scenarios in place of emoticons to make text-based chatting more interesting and enjoyable. Although the current version of the agent has some limitations, it can provide an enjoyable form of recreation, allowing people to represent themselves online through the agent. Experimental results show that the system performs quite satisfactorily. Full-body embodiment with various gestures, which would further enhance the interaction quality of the agent, is left as future work.
REFERENCES
[1] R. L. Birdwhistell, Kinesics and Context: Essays on Body Motion Communication, Philadelphia: University of Pennsylvania Press, 1970.
[2] S. M. Boker, J. F. Cohn, B. J. Theobald, I. Matthews, T. R. Brick and J. R. Spies, "Effect of damping head movement and facial expression in dyadic conversation using real-time facial expression tracking and synthesized avatars," Phil. Trans. Roy. Soc. B, vol. 364, pp. 3485-3495, December 2009.
[3] M. S. E. Nasr, T. R. Ioerger, J. Yen, D. H. House, and F. I. Parke, "Emotionally expressive agents," in Proc. of Int. Conf. on Computer Animation, pp. 48-57, Washington, DC, USA, 1999.
[4] P. Ekman, E. R. Sorenson, and W. V. Friesen, “Pan-cultural elements in facial displays of emotions”, Science, vol. 4, pp. 86-88, April 1969.
[5] D. Kurlander, T. Skelly and D. Salesin, "Comic chat," in Proc. of Int. Conf. on Computer Graphics and Interactive Techniques, pp. 225-236, 1996.
[6] K. Nagao and A. Takeuchi, "Speech dialogue with facial displays: Multimodal human-computer conversation," in Proc. Annual Meeting of the Association for Computational Linguistics, pp. 102-109, Las Cruces, NM, USA, 1994.
[7] H. H. Vilhjalmsson and J. Cassell, "BodyChat: Autonomous communicative behaviors in avatars," in Proc. 2nd Int. Conf. on Autonomous Agents, pp. 269-276, Minneapolis, MN, USA, 1998.
[8] A. B. Loyall and J. Bates, "Real-time control of animated broad agents," in Proc. Annual Conf. of the Cognitive Science Society, pp. 664-675, Boulder, Colorado, USA, 1993.
[9] I. Mount, “Cranky consumer: Testing online service reps,” The Wall Street Journal, February 1, 2005.
[10] J. Cassell, T. Bickmore, M. Billinghurst, L. Campbell, K. Chang, H. Vilhjalmsson and H. Yan, "Embodiment in conversational interfaces: Rea," in Proc. of Int. Conf. on Human Factors in Computing Systems, pp. 520-527, Pittsburgh, Pennsylvania, USA, 1999.
[11] MakeHuman, http://www.makehuman.org/.
[12] Blender, http://www.blender.org/.