
INC-Hg: A COLLABORATIVE HAPTIGRIPPER VIRTUAL REALITY SYSTEM

7.6 Results and Discussions

7.6.2 System Performance

INC-Hg automatically recorded task-related data from 20 games totaling 60 minutes of gameplay (10 participants × 2 games × 3 minutes per game). This yielded 327 socially-directed participant utterances and 558 agent utterances, meaning that participants produced an average of approximately five utterances per minute. This frequency suggests that the game was well designed to promote verbal communication.

To evaluate the classification accuracy of the system, we recruited two human coders (trained graduate students with experience collecting and analyzing qualitative and quantitative data) to manually classify the recorded participant utterances into the five dialogue act classes as the ground truth. We provided the coders with a framework giving concrete definitions of the five dialogue act classes and a few examples of how to read and code the recorded files. Each coder independently coded all the data, after which a percent agreement of 93.5% was found. For the codings that were not in agreement, the two coders discussed and resolved the discrepancies to reach a consensus.
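The 93.5% figure is a simple percent agreement between the two coders' label sequences. As a minimal sketch (the labels below are hypothetical, for illustration only; the actual study coded 327 utterances):

```python
def percent_agreement(coder_a, coder_b):
    """Fraction of items to which two coders assigned the same label."""
    assert len(coder_a) == len(coder_b)
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)

# Hypothetical labels for illustration only.
a = ["Ack", "ProObj", "DirAct", "Que", "ReqObj", "ProObj"]
b = ["Ack", "ProObj", "ProObj", "Que", "ReqObj", "ProObj"]
print(f"{percent_agreement(a, b):.1%}")  # prints "83.3%" (5 of 6 labels agree)
```

Note that raw percent agreement does not correct for chance agreement; a chance-corrected statistic such as Cohen's kappa is a common complement.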

Figure 7-7 shows the confusion matrix of the dialogue act classification results. The classification accuracy of the system reached 70.34% (230 of 327 utterances were correctly classified), which is much higher than chance for five classes (20%). From the matrix, we see that most “DirAct” utterances were mistakenly classified as “ProObj” (57.1%). The likely reason is that participants usually mentioned the name of the target object when giving action commands (e.g., “pick up the pig”). These results suggest the potential of INC-Hg to automatically recognize and analyze conversation content.

Figure 7-7. The Confusion Matrix for Dialogue Act Classification (overall accuracy: 70.34%). Rows are target classes; columns are output classes; each cell shows the utterance count and its percentage of the target class.

Target \ Output   Ack          ReqObj       ProObj        DirAct       Que
Ack               73 (78.5%)   0 (0.0%)     13 (14.0%)    7 (7.5%)     0 (0.0%)
ReqObj            0 (0.0%)     12 (75.0%)   0 (0.0%)      4 (25.0%)    0 (0.0%)
ProObj            27 (16.1%)   2 (1.2%)     125 (74.4%)   14 (8.3%)    0 (0.0%)
DirAct            3 (8.6%)     0 (0.0%)     20 (57.1%)    12 (34.3%)   0 (0.0%)
Que               1 (6.7%)     3 (20.0%)    3 (20.0%)     0 (0.0%)     8 (53.3%)
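The accuracy figures reported for Figure 7-7 can be recomputed from the matrix counts. The sketch below transcribes the counts (rows = target class, columns = output class) and derives the overall and per-class accuracies:

```python
classes = ["Ack", "ReqObj", "ProObj", "DirAct", "Que"]
# Rows: target (ground-truth) class; columns: system output class.
counts = [
    [73, 0, 13, 7, 0],    # Ack
    [0, 12, 0, 4, 0],     # ReqObj
    [27, 2, 125, 14, 0],  # ProObj
    [3, 0, 20, 12, 0],    # DirAct
    [1, 3, 3, 0, 8],      # Que
]
total = sum(sum(row) for row in counts)
correct = sum(counts[i][i] for i in range(len(classes)))
print(f"overall: {correct}/{total} = {correct / total:.2%}")  # 230/327 = 70.34%
for i, (name, row) in enumerate(zip(classes, counts)):
    # Diagonal cell over row sum = per-class (target-normalized) accuracy.
    print(f"{name}: {row[i]}/{sum(row)} = {row[i] / sum(row):.1%}")
```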

To analyze how accurately the agent responded to the user and how accurately it initiated conversations and interactions with the user, the two human coders were also asked to watch the videotapes and code the participant’s and agent’s utterances into two types, Initiation and Response. An Initiation was a player’s statement that started a conversation; a Response was a player’s feedback to the partner’s statement. The inter-rater reliability, using the same procedures described above, was 100% for this process.

Based on the final coding data, we counted the frequencies of proper and improper Initiations and Responses of the agent (Table 7-6). Among the 558 agent utterances, 312 were Responses and 246 were spontaneously initiated utterances. We found that 88.46% (276 out of 312) of the agent’s Responses were proper feedback to the participants’ speech or actions. However, there were 7 occasions on which the agent gave no response to a participant’s request, reducing the accuracy to 86.52% (276 out of 319). On average, the proper Response rate per participant was 87.40% (SD: 0.11). This Response accuracy is comparable to those of existing conversational agent systems [46-48]. For example, the conversational agent Max, developed by Kopp et al., can act as a museum guide that provides information to visitors; it accurately responded to 63% of 50,423 visitors’ natural language inputs [46]. The conversational intelligent tutor Oscar, developed by Latham et al., can lead a tutoring conversation and dynamically predict a student’s learning style with an accuracy of 61-100% [47].

In addition, 97.56% (240 out of 246) of the agent’s Initiations were proper. On average, the proper Initiation rate per participant was 96.54% (SD: 0.10). All six improper Initiations occurred when the participant and the agent initiated conversations at about the same time. For example, the participant might say “Let’s get the pig” right after the agent started a speech, “What do you want to get now?” Although the system could recognize the participant’s intention and send the agent’s handle movement commands quickly, the agent’s speech audio “What do you want to get now?” had to finish playing before the next speech audio, “Okay. I am moving to the pig,” could begin. We will add functionality to play, pause, and stop the agent’s speech audio in the next version of INC-Hg. Despite this weakness, the results indicated that the agent was able to provide appropriate instructions, recommendations, and answers, and to interact with the participant at the right time. They also indicated that although the initial classification accuracy was only 70.34%, INC-Hg could raise the agent’s communication accuracy to 91.33% (516 proper utterances out of 565 total utterances) by considering contextual information. A sample of user-agent conversations is shown in Table 7-7.

Table 7-6. The Response Accuracy and Initiation Accuracy of the Agent.

Agent’s Utterances   Proper          Improper               Total
Response             276 (86.52%)    36 + 7 (no response)   319
Initiation           240 (97.56%)    6                      246
Total                516 (91.33%)    53                     565
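The percentages in Table 7-6 follow directly from the proper/total counts; a quick arithmetic check:

```python
# Counts taken from Table 7-6.
proper_resp, total_resp = 276, 319    # 312 delivered Responses + 7 unanswered requests
proper_init, total_init = 240, 246
print(f"Response accuracy:   {proper_resp / total_resp:.2%}")   # prints "86.52%"
print(f"Initiation accuracy: {proper_init / total_init:.2%}")   # prints "97.56%"
combined = (proper_resp + proper_init) / (total_resp + total_init)
print(f"Combined accuracy:   {combined:.2%}")                   # prints "91.33%"
```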

Table 7-7. A Sample of Human-to-Agent Conversations.

User: Now what do you want to get?

Agent: Let’s get the pig.

User: Okay.

(The user does not move the gripper for a while.)

Agent: Move your gripper down.

User: Okay. (The user moves her gripper down.) Are you ready to get the pig?

Agent: Yes. (Both grip to make the claw down to get the pig.)

(The user does not use the appropriate force when they are trying to get the pig.)

Agent: We should squeeze slightly to get the pig.

User: Okay. (The user adjusts the force and gets the pig successfully.)

(The user inadvertently releases the gripper and drops the pig on the table.)

Agent: No, pick it up again.

User: (The user moves the gripper and tries to pick up the pig again.)