Remote participants in hybrid meetings often find it difficult to
follow what is going on in the (physical) meeting room they are
connected to.
The system has been developed as a research vehicle to explore how
technology based on automatic real-time recognition of conversational
behavior in meetings can be used to improve engagement and floor
control by remote participants. The system uses modules for online
speech recognition and real-time recognition of visual focus of
attention, as well as a module that signals who is being addressed by
the speaker. A built-in keyword spotter allows an automatic meeting
assistant to call the remote participant's attention when a topic of
interest is raised, pointing at the transcription of the fragment to
help them catch up.
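The keyword-spotting behavior described above can be sketched as follows. This is an illustrative sketch only: the class name, keyword list, and alert format are invented here, and the real AMIDA module spots keywords in the audio input stream rather than in transcribed text.

```java
import java.util.List;
import java.util.Locale;

// Illustrative sketch: scan a transcribed fragment for keywords of
// interest and produce an alert for the remote participant.
// (The actual AMIDA spotter works on the audio stream itself.)
public class KeywordSpotter {
    private final List<String> keywords;

    public KeywordSpotter(List<String> keywords) {
        this.keywords = keywords;
    }

    // Returns an alert message if the fragment contains a keyword,
    // or null if no topic of interest was raised.
    public String check(String fragment) {
        String lower = fragment.toLowerCase(Locale.ROOT);
        for (String kw : keywords) {
            if (lower.contains(kw)) {
                return "Topic of interest \"" + kw + "\" raised in: " + fragment;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        KeywordSpotter spotter = new KeywordSpotter(List.of("budget", "deadline"));
        System.out.println(spotter.check("we should revisit the budget next week"));
    }
}
```

In the demo, such an alert would be paired with the transcription of the fragment so the remote participant can catch up on the discussion.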
We present the first version of a system that was
implemented to demonstrate how recognition and generation
modules could be used to support remote meeting participation. The
User Engagement and Floor Control (UEFC) demo is meant to
show how AMIDA research can contribute to technology that makes
remote meetings more engaging by giving remote participants more
control in discussions and decision making processes. The UEFC
demo is one system developed in a general Meeting Recorder
Framework that is being used as a research vehicle for
experimental studies of how outcomes and processes in remote meetings
depend on properties of communication channels and how engagement
and efficiency are affected by meeting support technology.
The UEFC technology is being used in the AMIDA TXchange
Miniproject and is also available as an HMI showcase.
In the paper cited below, Yankelovich et al. give a list of problems that
people experience with communication in hybrid meetings (i.e.
meetings where some people are local and some are remote):
Yankelovich, N., Kaplan, J., Simpson, N., Provino, J.: Porta-person:
Telepresence for the connected meeting room. In: Proceedings of CHI
2007, pp. 2789-2794 (2007).
- Audio problems:
  - Poor quality speakerphones
  - Too much background noise
  - Multiple people speaking at the same time are hard to follow
  - People speaking too far from microphones
- Remote attendee problems:
  - Inability to conduct side conversations
  - In-room attendees forget about remote people
  - Challenging to break into lively conversation
  - Difficult to detect in-room speaker changes
  - Hard to identify people currently in the meeting room
  - Hard to identify the current speaker
  - Difficult to participate in brainstorming sessions
  - Cannot see in-room demonstrations or artifacts
- Meeting room problems:
  - Local people are more emotionally salient than remote ones
  - Easy to forget about remote participants
  - Often local people do not know who is still connected
The visual channel is important when people discuss objects or
documents. Moreover, it helps to identify who is speaking and to signal
the speaker's focus of attention, which helps in understanding verbal
referring expressions. These considerations guided the development of
the UEFC Demonstrator.
The Meeting Recorder Framework contains a package for media streaming.
Audio and video streams can be produced in real time by devices (in
on-line use during a live meeting) as well as from files. This makes
it possible to exploit the MRF for building systems that play back
recorded audio and video files and that use the annotation layers of
recorded meetings, in what we call ``off-line'' systems, as well
as for building live, ``on-line'', tele-meeting systems.
The UEFC demonstrator is an application built on a software
architecture and framework that we developed for experimenting with
remote meetings, one-to-one or hybrid. The framework can also be used
for off-line applications and for building software agents and virtual
characters. There are three kinds of client applications:
one for the remote participant, one for the meeting room (the location
where the overview of the meeting room is recorded and the remote
participant is presented), and finally one for each local participant.
The user interface is implemented in an integrated application named
the meeting recorder.
The UEFC modules that receive media input streams send their respective
outputs to a central database application known as the Hub,
which forwards them to the modules that rely on the data.
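The Hub's role as a central exchange point can be sketched as a minimal publish/subscribe dispatcher. All names here (the `Hub` class, the data-type strings) are hypothetical; the real AMIDA Hub is a database application with its own Java middleware, not this toy in-memory map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Minimal in-memory sketch of a Hub-style dispatcher: producing modules
// publish typed data, and the Hub forwards each item to every module
// that registered an interest in that data type.
class Hub {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    // A consuming module registers a callback for a data type.
    void subscribe(String dataType, Consumer<String> consumer) {
        subscribers.computeIfAbsent(dataType, k -> new ArrayList<>()).add(consumer);
    }

    // A producing module publishes its output; the Hub forwards it on.
    void publish(String dataType, String payload) {
        for (Consumer<String> c : subscribers.getOrDefault(dataType, List.of())) {
            c.accept(payload);
        }
    }
}

public class HubDemo {
    public static void main(String[] args) {
        Hub hub = new Hub();
        // E.g. a downstream module could subscribe to ASR word output.
        List<String> received = new ArrayList<>();
        hub.subscribe("asr.words", received::add);
        hub.publish("asr.words", "let's move to the next agenda item");
        System.out.println(received.get(0));
    }
}
```

The design decouples producers from consumers: the ASR module need not know that, say, the keyword spotter and the dialogue act recognizer both read its output.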
Figure 1: The HMI meeting room and a remote meeting participant using the system.
Figure 2: The dependencies between the modules.
A description of the modules:
- Automatic speech recognition: The ASR system receives the incoming
audio streams from all participants on different sockets, which allows
the system to be split between Windows and Linux systems easily. A
Java wrapper allows the results of recognition to be streamed via the
Java middleware to the Hub. From the Hub, metadata is available to
all other consumers.
- Dialogue act recognition: The Dialogue Act Recognition module
segments the words from the ASR module into Dialogue Act segments and
classifies them with a Dialogue Act Tag from the AMI tag set.
- On-line keyword spotting: The Keyword Spotting module analyses the
audio input stream for the occurrence of certain keywords.
- Visual focus of attention recognition: Visual focus of attention
(VFOA) of participants provides important cues to recognize
interactions in meetings. The Visual Focus of Attention module
analyses the video streams of each individual meeting participant. It
tracks the pose of the head in terms of tilt (vertical movement) and
pan (horizontal movement) and maps these values to predefined targets.
In the UEFC demo the main consumer of the VFOA data is the Addressee
Detection module.
- Addressee Detection: The addressee detection module (ADR) used in
the UEFC system identifies the addressee of the speaker.
- The graphical user interface: The GUI of the UEFC demo shows a close-up
view of each of the local participants and an overview of the meeting room.
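The VFOA module's mapping of head pose to predefined targets can be illustrated with a nearest-neighbor sketch. The target names and angles below are invented for illustration; the actual module's target model and distance measure are not specified here.

```java
import java.util.List;

// Illustrative sketch: map an estimated head pose (pan and tilt, in
// degrees) to the nearest predefined visual focus-of-attention target.
// Targets and angles are hypothetical examples.
public class VfoaDemo {
    record Target(String name, double pan, double tilt) {}

    static String nearestTarget(double pan, double tilt, List<Target> targets) {
        Target best = null;
        double bestDist = Double.MAX_VALUE;
        for (Target t : targets) {
            // Euclidean distance in (pan, tilt) space as a simple proxy
            // for angular distance.
            double d = Math.hypot(pan - t.pan, tilt - t.tilt);
            if (d < bestDist) {
                bestDist = d;
                best = t;
            }
        }
        return best.name;
    }

    public static void main(String[] args) {
        List<Target> targets = List.of(
            new Target("participant_left", -30, 0),
            new Target("participant_right", 30, 0),
            new Target("table", 0, -25),
            new Target("screen", 0, 10));
        // A pose of pan -28, tilt 2 lies closest to participant_left.
        System.out.println(nearestTarget(-28, 2, targets));
    }
}
```

A downstream module such as addressee detection would then consume the resulting target labels rather than raw head-pose angles.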
We acknowledge our AMIDA partners from IDIAP in Martigny, DFKI in
Saarbrücken and the Universities of Brno, Sheffield, and
Edinburgh for their contributions to the UEFC demonstrator. The work
is sponsored by the European IST Programme Project FP6-0033812
(AMIDA). It only reflects the authors' views, and the funding agencies are
not liable for any use that may be made of the information contained in it.
H.J.A. op den Akker, D.H.W. Hofs, G.H.W. Hondorp, H. op den Akker, J. Zwiers and A. Nijholt: Supporting Engagement and Floor Control in Hybrid Meetings. In: Cross-Modal Analysis of Speech, Gestures, Gaze and Facial Expressions, A.M. Esposito and R. Vich (eds.), Lecture Notes in Computer Science, vol. 5641, Springer Verlag, Berlin, ISBN 978-3-642-03319-3, ISSN 0302-9743, pp. 276-290, 2009.
H.J.A. op den Akker, J. Zwiers, G.H.W. Hondorp, E.M.A.G. van Dijk, O. Kulyk, D.H.W. Hofs, A. Nijholt and D. Reidsma: Engagement and Floor Control in Hybrid Meetings. AMI Newsletter, V. Devanthéry (ed.), 20(20):6-6, 2010.
- AMI [Augmented Multi-party Interaction]
- AMIDA [Augmented Multi-party Interaction with Distance Access]