Project Name: CHoral: access to oral history
October 1, 2005
December 31, 2011
CHoral aims to advance the state of the art in Spoken Document
Retrieval and will develop technology for the disclosure of oral
history collections. The focus of attention will be on spoken word
collections with stories and testimonies on historical events. In
this audio mining project, a variety of techniques will be studied and
integrated, among them automatic speech recognition and information
retrieval.
With speech recognition, transcripts can be generated for spoken
documents (audio and video). The transcripts have timestamps
associated with the words and will be used to build an index that
allows searching the audio files at fragment level. The core dataset
will be the archive of material broadcast by regional radio station
Radio Rijnmond, maintained at the Municipal Archives of Rotterdam. In
combination with recordings of non-broadcast interviews and
texts, the archive is a potentially rich source for historical
research. The tools to be developed will also be tested on other oral
history collections.
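
As an illustration of the fragment-level index described above, the following is a minimal sketch and not the project's actual implementation. It assumes each transcript is a list of (word, start time in seconds) pairs; the file names, sample words, and the functions build_index and search are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def build_index(transcripts):
    """Map each lower-cased word to the (recording, start time) pairs where it occurs."""
    index = defaultdict(list)
    for recording_id, words in transcripts.items():
        for word, start_seconds in words:
            index[word.lower()].append((recording_id, start_seconds))
    return index


def search(index, term):
    """Return the sorted (recording, start time) hits for a single query term."""
    return sorted(index.get(term.lower(), []))


# Hypothetical recognizer output: (word, start time in seconds) per recording.
transcripts = {
    "interview_01.wav": [("the", 0.0), ("harbour", 0.4), ("was", 0.9), ("rebuilt", 1.2)],
    "broadcast_02.wav": [("Harbour", 12.5), ("workers", 13.1)],
}

index = build_index(transcripts)
hits = search(index, "harbour")
for recording_id, start_seconds in hits:
    print(f"{recording_id} at {start_seconds:.1f}s")
```

Each hit points to a position inside a recording, so a player could start playback at the matching fragment rather than at the beginning of the file.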
The speech recognition technology to be deployed will be partly
based on a system for speech recognition of Dutch that was
originally developed for the broadcast news domain. The target
domain of CHoral will require efforts to improve robustness and to
incorporate adaptive speech and language modeling, as well as
extraction technology. The coupling of speech data to related textual
records will also be addressed.
In parallel to advancing and tuning existing speech and indexing
technology for this domain, the project aims to contribute to the
advent of a methodological framework for the handling and use of
oral history content for historical research.
CHoral is a project of the CATCH programme, funded by NWO.
CHoral is a joint initiative of the Municipal Archives Rotterdam
(Gemeentearchief) and HMI, with participation from Radio Rijnmond
and Erasmus Universiteit Rotterdam.