Multimodal Human-Computer Interaction
New Interaction Techniques 22.1.2001
Roope Raisamo (rr@cs.uta.fi)
Department of Computer and Information Sciences
University of Tampere, Finland
Multimodal human-computer interaction
A definition [Raisamo, 1999e, p. 2]:
Multimodal interaction techniques
Our definition of an interaction technique [Raisamo, 2000]:
• An interaction technique is a way to carry out an interactive task. It is defined in the binding, sequencing, and functional levels, and is based on using a set of input and output devices or technologies.
– In a multimodal interaction technique there are
Two views
• A Human-Centered View
– common in psychology
– often considers human input channels, i.e., computer output modalities, and most often vision and hearing
– applications: a talking head, audio-visual speech recognition, ...
• A System-Centered View
– common in computer science
Multimodal human-computer interaction
[Figure: the interaction loop between human (cognition) and computer ("cognition"). Human output channels feed computer input modalities; computer output media feed human input channels.]
Senses and modalities
Sensory perception   Sense organ            Modality
Sense of sight       Eyes                   Visual
Sense of hearing     Ears                   Auditive
Sense of touch       Skin                   Tactile
Sense of smell       Nose                   Olfactory
Sense of taste       Tongue                 Gustatory
Sense of balance     Organ of equilibrium   Vestibular
Design space for multimodal user interfaces
                 Use of modalities
Fusion           Sequential    Parallel
Combined         ALTERNATE     SYNERGISTIC
Independent      EXCLUSIVE     CONCURRENT

Levels of abstraction: each cell is further divided into Meaning and No Meaning.
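
The design space can be read as a lookup from two axes to a category name. The following is a minimal sketch, assuming the cell assignment commonly given for this design space (combined fusion gives ALTERNATE or SYNERGISTIC, independent use gives EXCLUSIVE or CONCURRENT). The names Use, Fusion, and classify are hypothetical and chosen only for illustration; the Meaning / No Meaning axis is left out.

```python
from enum import Enum


class Use(Enum):
    """Use of modalities: one after another, or at the same time."""
    SEQUENTIAL = "sequential"
    PARALLEL = "parallel"


class Fusion(Enum):
    """Whether input from different modalities is fused or kept apart."""
    COMBINED = "combined"
    INDEPENDENT = "independent"


# Assumed cell assignment for the four categories named on the slide.
DESIGN_SPACE = {
    (Use.SEQUENTIAL, Fusion.COMBINED): "ALTERNATE",
    (Use.PARALLEL, Fusion.COMBINED): "SYNERGISTIC",
    (Use.SEQUENTIAL, Fusion.INDEPENDENT): "EXCLUSIVE",
    (Use.PARALLEL, Fusion.INDEPENDENT): "CONCURRENT",
}


def classify(use: Use, fusion: Fusion) -> str:
    """Return the design-space category for a (use, fusion) pair."""
    return DESIGN_SPACE[(use, fusion)]
```

For example, a system that accepts speech and pointing at the same time and merges them into one command would be classified with classify(Use.PARALLEL, Fusion.COMBINED).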
An architecture for multimodal user interfaces
Components shown in the architecture figure:
• Input processing: motor, speech, vision, …
• Output generation: graphics, animation, speech, sound, …
• Media analysis: language, recognition, gesture, …
• Media design: language, modality, gesture, …
• Interaction management
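
One way to read these components is as a processing pipeline. The following is a toy sketch, assuming the stages run in the order input processing, media analysis, interaction management, media design, output generation; that ordering, the Event type, and every function body are assumptions made for illustration, not taken from the slide.

```python
from dataclasses import dataclass


@dataclass
class Event:
    modality: str  # e.g. "speech" or "motor"
    data: str      # raw or analysed content of the event


def input_processing(raw):
    """Wrap raw (modality, data) pairs as events."""
    return [Event(modality, data) for modality, data in raw]


def media_analysis(events):
    """Toy analysis: trim and lower-case the data of each event."""
    return [Event(e.modality, e.data.strip().lower()) for e in events]


def interaction_management(events):
    """Toy manager: build one response message from all analysed events."""
    heard = " + ".join(f"{e.modality}:{e.data}" for e in events)
    return f"received {heard}"


def media_design(message):
    """Toy design step: decide which output modalities carry the message."""
    return {"graphics": message, "speech": message}


def output_generation(plan):
    """Render each planned output as a line of text."""
    return [f"[{modality}] {content}" for modality, content in plan.items()]


def run(raw):
    """Pass raw input through all five stages in the assumed order."""
    events = media_analysis(input_processing(raw))
    return output_generation(media_design(interaction_management(events)))
```

Calling run([("speech", " Put that there "), ("motor", "POINT")]) passes one speech event and one pointing event through all five stages and returns one rendered line per output modality.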
Modeling
Put – That – There
Potential benefits
A list by Maybury and Wahlster [1998, p. 15]:
– Efficiency
– Redundancy
– Perceptibility
– Naturalness
– Accuracy
– Synergy
Common misconceptions
A list by Oviatt [1999b]:
1. If you build a multimodal system, users will interact multimodally.
2. Speech and pointing is the dominant multimodal integration pattern.
3. Multimodal input involves simultaneous signals.
4. Speech is the primary input mode in any multimodal system that includes it.
Common misconceptions
6. Multimodal integration involves redundancy of content between modes.
7. Individual error-prone recognition technologies combine multimodally to produce even greater unreliability.
8. All users’ multimodal commands are integrated in a uniform way.
9. Different input modes are capable of transmitting comparable content.
Two paradigms for multimodal user interfaces
1. Computer as a tool
– multiple input modalities are used to enhance direct manipulation behavior of the system
– the machine is a passive tool and tries to understand the user through all different input modalities that the system recognizes
– the user is always responsible for initiating the operations
Two paradigms for multimodal user interfaces
2. Computer as a dialogue partner
– the multiple modalities are used to increase the anthropomorphism in the user interface
– multimodal output is important: talking heads and other human-like modalities
– speech recognition is a common input modality in these systems
Two hypotheses on combining modalities
1. The combination of human output channels effectively increases the bandwidth of the human-to-machine channel.
This has been discovered in many empirical
Two hypotheses on combining modalities
2. Adding an extra output modality requires more neurocomputational resources and will lead to deteriorated output quality, resulting in reduced effective bandwidth.
Two types of effects are usually observed: a slow-down of all output processes, and interference errors, because selective attention cannot be divided between the increased number of output channels.
Call for research
A summary in [Raisamo, 1999e] pointed out that more research is needed to understand the following:
– How the brain works and which modalities can best be used to gain the synergy advantages that are possible with multimodal interaction?
– When a multimodal system is preferred to a unimodal system?
– Which modalities make up the best combination for a given interaction task?
– Which interaction devices to assign to these modalities in a given computing system?
– How to use these interaction devices, that is, which interaction techniques
Touch’n’Speak
[Raisamo, 1998]
• Touch’n’Speak is a multimodal user interface framework that makes use of combined touch and speech input and different output modalities
– Input: touch buttons, touch lists, touch gestures in area selection (time, location, pressure), speech commands
– Output: graphical, textual, and auditory (non-speech) output, speech feedback
• The framework was used to implement a restaurant information system that provides information on
Examples
• CHI2000 Video Proceedings: The Efficiency of Multimodal Interaction for a Map-Based Task (8:18)
• SIGGRAPH Video Review 76, CHI’92 Technical Video Program: Multi-Modal Natural Dialogue (10:25)
• SIGGRAPH Video Review 77, CHI’92 Technical Video Program: Combining Gestures and Direct Manipulation (9:56)
Homework
• Read Chapter 2 (Multimodal interaction) in [Raisamo, 1999e].
– [Raisamo, 1999e] is available online at http://granum.uta.fi/pdf/951-44-4702-6.pdf
– A printable version is available online at