Short description:

Multimodal dialogue means that user and computer interact with each other using a choice of different modalities (such as speech, pictures, gestures). Providing natural multimodal dialogue requires dialogue management that can handle both the user's natural language and natural communication in other modalities. We summarise several HMI projects involving multimodal dialogue management in several different application domains.


Keywords: multimodal interaction, dialogue management, question answering, route navigation

Research Themes:

Some themes present in this work are:


This showcase lists information about Trung Bui's work on the ICIS project, the Virtual Music Centre Guide by Dennis Hofs, and the Vidiam project.

ICIS dialogue modeling and management (Trung Bui)

Affective dialogue modeling for affective multimodal dialogue systems
This work aims to develop a dialogue model that can take aspects of the user's emotional state into account and act appropriately. We explore the Partially Observable Markov Decision Process (POMDP) for this approach. A prototype is under development for analysing how the user's stress influences their actions and how the system should respond in crisis situations.
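A POMDP maintains a belief (a probability distribution) over hidden user states and updates it after each observation. The sketch below is a minimal, hypothetical illustration of such a belief update over the user's stress level; the state names, observation cues, and all probabilities are invented for the example and are not taken from the prototype.

```python
# Hypothetical sketch of a POMDP belief update over the user's stress level.
# States, observations, and probabilities are illustrative only.

def belief_update(belief, obs, trans, obs_model):
    """One Bayesian belief update step: b'(s') ∝ O(o|s') · Σ_s T(s'|s) b(s)."""
    states = list(belief)
    new_belief = {}
    for s2 in states:
        predicted = sum(trans[s][s2] * belief[s] for s in states)
        new_belief[s2] = obs_model[s2][obs] * predicted
    norm = sum(new_belief.values())
    return {s: p / norm for s, p in new_belief.items()}

# Two assumed hidden user states: calm vs. stressed.
belief = {"calm": 0.5, "stressed": 0.5}
trans = {"calm": {"calm": 0.8, "stressed": 0.2},
         "stressed": {"calm": 0.3, "stressed": 0.7}}
# Observation model: a prosodic cue ("tense" vs. "relaxed" speech).
obs_model = {"calm": {"tense": 0.2, "relaxed": 0.8},
             "stressed": {"tense": 0.9, "relaxed": 0.1}}

belief = belief_update(belief, "tense", trans, obs_model)
print(belief)  # probability mass shifts toward "stressed"
```

On top of such a belief state, a POMDP policy can then choose dialogue actions that remain appropriate even when the system is uncertain about the user's emotional state.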


Generic dialogue modeling for multi-application, multimodal dialogue systems
This work aims to create a unified multimodal dialogue model for a large number of applications. The applications (for example, car route navigation, air route navigation, traffic lanes, map and fire management, tunnel sensors management, weather forecast, virtual control room, road surface temperature monitoring, patient information search, and medical worker verification) are first constructed using the rapid dialogue prototyping methodology and then integrated into a hierarchy using vector-space model techniques. The system uses this hierarchy to switch between applications based on the user's application of interest.
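The application switching described above can be illustrated with a bag-of-words vector-space model: the user's utterance and each application description are represented as term vectors, and the most similar application is selected. This is only a sketch under assumed application descriptions, not the project's actual implementation.

```python
# Illustrative sketch (not the project's actual code): selecting the user's
# application of interest with a bag-of-words vector-space model.
import math
from collections import Counter

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical descriptions for a few of the applications named above.
apps = {
    "car_route_navigation": "car route navigation road driving directions",
    "weather_forecast": "weather forecast temperature rain wind",
    "patient_info": "patient information search hospital medical records",
}
vectors = {name: Counter(text.split()) for name, text in apps.items()}

def select_application(utterance):
    """Return the application whose description is most similar to the utterance."""
    query = Counter(utterance.lower().split())
    return max(vectors, key=lambda name: cosine(query, vectors[name]))

print(select_application("what is the weather forecast for tomorrow"))
# → weather_forecast
```

In the real system the applications are additionally organised into a hierarchy, so the comparison can proceed top-down rather than over a flat list.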


Twente route navigation demo.
An example of the multimodal dialogue management prototype with three input/output modalities (text, speech, pointing gestures).


ICIS Home page:
Trung's personal home page:

Virtual Music Centre Guide (Dennis Hofs)

The virtual guide is an agent in the Virtual Music Centre that can help users find their way in the building. It includes a multimodal Dutch dialogue system that accepts speech or text input as well as pointing gestures (mouse clicks) from the user. Output comes in the form of the virtual guide speaking and making gestures.

The dialogue system is mainly suited to be used in a virtual world with objects and agents that can be talked about or pointed at. It consists of several components:

  • a natural language parser;
  • a fusion agent that merges parsed text or speech input with pointing gestures;
  • a dialogue act recogniser that acts on a parsed Dutch phrase or sentence using the dialogue history;
  • a reference resolver based on salience factors that links noun phrases to objects in the virtual world;
  • an action stack that the dialogue manager fills by matching the user's dialogue acts to action templates.
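The last step, matching dialogue acts to action templates and filling the action stack, can be sketched as follows. The act names, template fields, and slot names here are invented for illustration and do not come from the Virtual Guide's code.

```python
# Hypothetical sketch of the dialogue manager's action stack: the user's
# dialogue acts are matched against action templates, and matching actions
# are pushed onto the stack. Act and template names are invented.

ACTION_TEMPLATES = [
    {"act": "request_route", "requires": {"destination"}, "action": "plan_route"},
    {"act": "request_info", "requires": {"object"}, "action": "describe_object"},
]

def match_templates(dialogue_act, slots, stack):
    """Push actions whose template matches the act and whose required slots are filled."""
    for tpl in ACTION_TEMPLATES:
        if tpl["act"] == dialogue_act and tpl["requires"] <= slots.keys():
            stack.append((tpl["action"], slots))
    return stack

stack = []
# e.g. "How do I get to the concert hall?", after reference resolution:
match_templates("request_route", {"destination": "concert hall"}, stack)
print(stack)  # [('plan_route', {'destination': 'concert hall'})]
```

Because it is a stack, the dialogue manager can interleave actions: a clarification sub-dialogue can be pushed on top and popped off before the original action resumes.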

Screenshot of the virtual guide

Virtual Guide homepage:

Vidiam - Dialogue management and the visual channel

Vidiam is part of the IMIX (Interactive Multimodal Information eXtraction) project, which concerns a multimodal interactive Question Answering (QA) system. Unlike most QA systems, IMIX can give answers with pictures in them, and enables the user's information need to be satisfied through a natural dialogue. The Vidiam dialogue manager recognises several types of follow-up questions, and several kinds of feedback on the quality of the answer. In addition to supporting dialogue, Vidiam enables the user to communicate multimodally. The user can point to or encircle words or visual elements on the screen.
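Recognising the type of a follow-up utterance could in the simplest case be approached with surface cues, as in the toy sketch below. The type labels and cue words are assumptions made for the example; Vidiam's actual classifier is not shown here.

```python
# Toy rule-based sketch (not Vidiam's actual classifier) of distinguishing
# a few follow-up utterance types; labels and cue words are assumptions.

def classify_followup(utterance):
    u = utterance.lower()
    words = set(u.rstrip("?!.").split())
    if words & {"wrong", "irrelevant", "incorrect"}:
        return "negative_feedback"          # feedback on answer quality
    if words & {"this", "that", "here"}:
        return "deictic_followup"           # may accompany a pointing gesture
    if u.endswith("?"):
        return "followup_question"
    return "other"

print(classify_followup("What is this part called?"))  # deictic_followup
```

A deictic follow-up would then be fused with the user's on-screen gesture (a pointing click or an encircling stroke) to resolve what "this" refers to.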


Extending QA with dialogue and multimodality is still a relatively unexplored area. Basic questions still have to be answered, such as: how do users naturally react to QA answers? And: how should multimodal follow-up questions be handled? We try to answer these questions with the help of several follow-up utterance corpora that we designed in the Vidiam project.


Screen grab of the IMIX system. Top left: animated system architecture. Top right: Ruth-based talking face. Bottom: interaction window.
An interaction from the multimodal interaction experiment. Top: question and answer. Bottom: follow-up question. The green sketch line is a user gesture.


Vidiam homepage:

Publications related to this showcase are:

H.J.A. op den Akker, H.C. Bunt, S. Keizer and B.W. van Schooten. From question answering to spoken dialogue: towards an information search assistant for interactive multimodal information extraction. In Proc. 9th European Conference on Speech Communication and Technology (Interspeech 2005), European Speech Communication Association (ESCA) / CEP Consultants, Edinburgh, ISSN 1018-4074, pp. 2793-2796, 2005.
D.H.W. Hofs, H.J.A. op den Akker and A. Nijholt. A generic architecture and dialogue model for multimodal interaction. In Proceedings of the 1st Nordic Symposium on Multimodal Communication, P. Paggio, K. Jokinen and A. Jönsson (eds), volume 1, CST Publication, Center for Sprogteknologi, Copenhagen, ISSN 1600-339X, pp. 79-91, 2003.
T.H. Bui, J. Zwiers, A. Nijholt and M. Poel. Generic dialogue modeling for multi-application dialogue systems. In Proceedings of the 2nd Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms, S. Renals and S. Bengio (eds), Lecture Notes in Computer Science, volume 3869, Springer-Verlag, Berlin, ISBN 3-540-32549-2, ISSN 0302-9743, pp. 174-185, 2006.

HMI-members working on this showcase are:

Former HMI-members:

Projects involved with this showcase:

  • ICIS [Interactive Collaborative Information Systems]
  • Vidiam [IMIX/VIDIAM]
