
Abstract Piroska

This talk evaluates the role that selected features and feature combinations play in error detection in spoken dialogue systems. We investigate the relevance of various readily available features by presenting the results of recent machine learning experiments. User utterances extracted from a corpus of Dutch train timetable information dialogues were characterized by a large number of low-level features. The following classes of features were used: prosodic features (energy, pitch and duration; both raw and normalized values), word-graph features, ASR confidence scores (raw and normalized) and dialogue history (the six most recent system question types). We systematically tested the learnability of data represented by combinations of these features using both Ripper (a rule-induction algorithm) and IB1-IG (a memory-based learning algorithm). The learning task was to identify communication problems (due to speech recognition errors or incorrect interpretations) arising in either the previous turn or the current turn of the dialogue.

Previous research has shown that combining system question types and word-graph features is beneficial for detecting errors (in particular for the previous turn, cf. Van den Bosch et al., 2001). It has also been shown that combining prosodic and ASR characteristics is helpful (primarily for the current turn, cf. Hirschberg et al., 1999). Our results characterize the performance of large-scale combinations of these feature types on the two tasks above. We demonstrate that combining the various feature classes can indeed improve performance in identifying or predicting problematic turns in our dialogue corpus. Interestingly, while we do find a significant effect of prosody, the effect is much smaller for our corpus than for the corpus of Hirschberg et al. (1999). In the talk we raise the question of how far methods for error detection and error handling generalize across spoken dialogue systems.
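
To make the memory-based side of the setup more concrete, the following is a minimal sketch of the idea behind IB1-IG: nearest-neighbour classification with an overlap distance in which each feature mismatch is weighted by that feature's information gain. It is an illustration under stated assumptions, not the configuration used in the experiments. The three symbolic features (system question type, a binned ASR confidence, a binned pitch value), their values and the six training instances are all hypothetical, and neighbour selection is simplified to the k closest stored instances.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, index):
    """Reduction in label entropy obtained by knowing feature `index`."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[index]].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def ib1_ig_classify(train_rows, train_labels, query, k=1):
    """Majority label among the k stored instances closest to `query`.

    Distance is the sum of information-gain weights of mismatching features.
    """
    weights = [information_gain(train_rows, train_labels, i)
               for i in range(len(query))]

    def distance(row):
        return sum(w for w, a, b in zip(weights, row, query) if a != b)

    ranked = sorted(zip(train_rows, train_labels), key=lambda p: distance(p[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy instances, invented for illustration only:
# (previous system question type, ASR confidence bin, pitch bin)
train_rows = [
    ("open", "low", "high"),
    ("open", "high", "low"),
    ("yes_no", "low", "high"),
    ("yes_no", "high", "low"),
    ("verify", "low", "low"),
    ("verify", "high", "high"),
]
train_labels = ["problem", "ok", "problem", "ok", "problem", "ok"]

if __name__ == "__main__":
    print(ib1_ig_classify(train_rows, train_labels, ("open", "low", "low")))
```

In this toy data the confidence bin fully determines the label, so it receives the largest weight and dominates the distance; testing a different feature combination amounts to dropping or adding columns before classification.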

Last modified $Date: 2001/10/04 15:26:05 $ by Parlevink Webmaster