CHoral - access to oral history

CHoral

Project description

CHoral aims to advance the state of the art in Spoken Document Retrieval and will develop technology for making oral history collections accessible. The focus will be on spoken word collections containing stories and testimonies about historical events. In this audio mining project a variety of techniques will be studied and integrated, including automatic speech recognition and information retrieval.

With speech recognition, transcripts can be generated for spoken documents (audio and video). The transcripts have timestamps associated with the words and will be used to build an index that allows the audio files to be searched at fragment level. The core dataset will be the archive of material broadcast by the regional radio station Radio Rijnmond, maintained at the Municipal Archives of Rotterdam. In combination with recordings of non-broadcast interviews and background texts, the archive is a potentially rich source for historical research. The tools to be developed will also be tested on other oral history collections.
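As an illustration of fragment-level search over timestamped transcripts, the following is a minimal sketch and not the project's actual indexing system. It assumes the recognizer output is available as (document id, word, start time in seconds) tuples and that a fragment is a fixed 30-second window; the function names, the window length and the sample data are all hypothetical.

```python
from collections import defaultdict


def build_fragment_index(words, fragment_seconds=30.0):
    """Map each lowercased word to the set of (doc_id, fragment_no) it occurs in.

    `words` is an iterable of (doc_id, word, start_seconds) tuples,
    an assumed stand-in for timestamped recognizer output.
    """
    index = defaultdict(set)
    for doc_id, word, start in words:
        fragment_no = int(start // fragment_seconds)  # fixed-length windows
        index[word.lower()].add((doc_id, fragment_no))
    return index


def search(index, query, fragment_seconds=30.0):
    """Return (doc_id, fragment_start_seconds) for fragments containing all query terms."""
    terms = query.lower().split()
    if not terms:
        return []
    # A fragment matches only if every query term occurs in it.
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted((doc_id, no * fragment_seconds) for doc_id, no in hits)


# Hypothetical sample data, for illustration only.
sample_words = [
    ("interview_01", "haven", 12.4),
    ("interview_01", "Rotterdam", 13.0),
    ("interview_01", "bombardement", 47.2),
    ("interview_02", "Rotterdam", 95.5),
    ("interview_02", "haven", 100.1),
]
sample_index = build_fragment_index(sample_words)
print(search(sample_index, "rotterdam haven"))
```

Each hit gives the offset at which playback of the matching fragment could start. A real system would have to cope with recognition errors and would probably segment on pauses or speaker turns instead of fixed windows.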

The speech recognition technology to be deployed will be partly based on a system for Dutch speech recognition that was originally developed for the broadcast news domain. The target content of CHoral will require efforts to improve robustness and to incorporate adaptive speech and language modeling, as well as metadata extraction technology. The coupling of speech data to related textual records will also be addressed.

In parallel with advancing and tuning existing speech and indexing technology for this domain, the project aims to contribute to the development of a methodological framework for the handling and use of multimedia oral history content in historical research.

Research themes

Robust ASR for spontaneous Dutch speech, speaker diarization, topic detection, audio indexing, speech retrieval, user interface design, user studies.

Time schedule

Start date: October 1, 2005
End date: July 2010

Project Funding

CHoral is a project of the CATCH programme, funded by NWO.

Consortium

CHoral is a joint initiative of the Municipal Archives Rotterdam (Gemeentearchief) and the Human Media Interaction (HMI) group of the University of Twente, with participation from Radio Rijnmond and Erasmus Universiteit Rotterdam.

NWO CATCH, University of Twente Human Media Interaction, Rotterdam Municipal Archives, RTV Rijnmond