Searching Spontaneous Conversational Speech

ACM SIGIR 2007 Workshop - 27 July 2007

Context

This workshop is part of ACM SIGIR 2007, 23-27 July, Amsterdam, The Netherlands (http://www.sigir2007.org/).

Background

Nearly a decade ago, we learned from the TREC Spoken Document Retrieval (SDR) track that searching speech was a "solved problem." Three factors were key to this success:

  1. Broadcast news has a "story" structure that resembles written documents.
  2. The redundancy present in human language meant that search effectiveness held up well over a reasonable range of transcription accuracy.
  3. Sufficiently accurate Large-Vocabulary Continuous Speech Recognition (LVCSR) systems could be built for the planned speech of news announcers.

The long-term trend in speech recognition research has been toward transcription of progressively more challenging sources. Over the last few years, LVCSR for spontaneous conversational speech has improved to the point where transcription accuracy comparable to what was previously found to be effective for broadcast news can now be achieved for a diverse range of sources. This has inspired a renaissance in research on search and browse technology for spoken word collections in communities focused on:

  1. Archived cultural heritage materials (e.g., interviews and parliamentary debates).
  2. Discussion venues (e.g., business meetings and classroom instruction).
  3. Broadcast conversations (e.g., in-studio talk shows and call-in programs).

Test collections are being developed in individual projects around the world, and some comparative evaluation activity for speech search technology has developed over this period. The time now seems right to look more broadly across these research communities for potential synergies that can help to shape the information retrieval research agenda of each of these communities by sharing ideas and resources.