Short description:

This showcase demonstrates the application of speech recognition to searching Dutch audiovisual collections. The collection contains interviews with and lectures by Willem Frederik Hermans. The spoken word in the audio is recognised using a Dutch speech recognition system. The speech transcripts are then used for searching the audio documents with standard text-based retrieval techniques.


speech recognition, multimedia retrieval, spoken-word archives, spoken document retrieval

Research Themes:

Some themes present in this work are:


This demo shows how Spoken Document Retrieval can be used to disclose a Dutch spoken-word archive: a collection of interviews and lectures of the famous Dutch novelist Willem Frederik Hermans. A Dutch speech recognition system developed at HMI is used to generate a full-text transcript, labelled with time-codes, of the speech in the collection. This transcript was indexed so that the collection can now be searched in a Google-like fashion.
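As an illustration of how a time-coded transcript can be made searchable, the following minimal sketch builds an inverted index from recognised words to the positions where they occur. The data layout, the sample words, the document names and the functions `build_index` and `search` are hypothetical and chosen for illustration only; they do not describe the indexing software actually used for the demo.

```python
from collections import defaultdict

# Hypothetical time-coded transcript: (word, start time in seconds, document id).
transcript = [
    ("de", 0.0, "interview_01"),
    ("schrijver", 0.4, "interview_01"),
    ("spreekt", 1.1, "interview_01"),
    ("de", 0.0, "lecture_01"),
    ("roman", 0.3, "lecture_01"),
]


def build_index(words):
    """Map each lower-cased word to the (document, time) positions where it was recognised."""
    index = defaultdict(list)
    for word, start, doc_id in words:
        index[word.lower()].append((doc_id, start))
    return index


def search(index, query):
    """Return the documents containing all query terms, with the times at which the terms occur."""
    terms = set(query.lower().split())
    if not terms:
        return {}
    hits = {}
    for term in terms:
        for doc_id, start in index.get(term, []):
            hits.setdefault(doc_id, {}).setdefault(term, []).append(start)
    return {
        doc_id: sorted(t for times in by_term.values() for t in times)
        for doc_id, by_term in hits.items()
        if len(by_term) == len(terms)  # keep documents that match every term
    }


index = build_index(transcript)
results = search(index, "de roman")
```

Because every hit carries a time-code, a search interface built on such an index could start playback at the moment the query word is spoken rather than at the beginning of the recording.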

For every query, the search engine returns relevant "audio documents". The document "boundaries" were generated for this demo using speaker segmentation and speech/non-speech detection.
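One way such boundaries could be derived is sketched below, assuming the segmentation output is available as a list of `(start, end, speaker)` tuples in which `speaker` is `None` for non-speech. The rule used here (a new document at every speaker change or after a long pause), the function `make_documents` and the `max_pause` threshold are assumptions made for illustration; the exact procedure used for the demo is not described here.

```python
def make_documents(segments, max_pause=2.0):
    """Group speech segments into "audio documents".

    segments: list of (start, end, speaker) tuples sorted by start time;
              speaker is None for non-speech stretches.
    A new document starts at a speaker change, or when the gap since the
    previous speech segment is longer than max_pause seconds.
    """
    documents = []
    current = None  # [start, end, speaker] of the document being built
    for start, end, speaker in segments:
        if speaker is None:
            continue  # non-speech never becomes part of a document
        if current and speaker == current[2] and start - current[1] <= max_pause:
            current[1] = end  # same speaker after a short pause: extend the document
        else:
            if current:
                documents.append(tuple(current))
            current = [start, end, speaker]
    if current:
        documents.append(tuple(current))
    return documents


# Hypothetical segmentation output for a short recording.
segments = [
    (0.0, 5.0, "A"),
    (5.0, 5.5, None),
    (5.5, 9.0, "A"),
    (9.0, 14.0, None),
    (14.0, 20.0, "A"),
    (20.0, 25.0, "B"),
]
documents = make_documents(segments)
```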

The speech recognition system (based on the SONIC speech recognition system) uses a broadcast news acoustic model that is slightly adapted to the task domain using a small amount of adaptation data. The language model was created using newspaper data, written interviews with Hermans and text transcripts from the Spoken Dutch Corpus. The vocabulary consists of 30K words derived from Hermans-specific text material.
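A common way to derive a recognition vocabulary from domain text is to keep the most frequent words; the sketch below illustrates this and also measures the out-of-vocabulary rate on held-out text. Whether the 30K-word vocabulary was selected by frequency is an assumption, and the functions `select_vocabulary` and `oov_rate` and the sample sentences are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def select_vocabulary(texts, size):
    """Return the `size` most frequent words in the given texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:size]]


def oov_rate(vocabulary, text):
    """Fraction of tokens in `text` that are not covered by the vocabulary."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    known = set(vocabulary)
    return sum(token not in known for token in tokens) / len(tokens)


# Tiny hypothetical example; a real vocabulary would use size=30000.
vocabulary = select_vocabulary(["de roman de schrijver", "de roman"], size=2)
rate = oov_rate(vocabulary, "de schrijver")
```

A word outside the vocabulary can never be recognised correctly, which is why names and terms typical of the collection are worth including.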

Speech recognition in the spoken-word archive domain can be very difficult, especially when collections contain older material with low audio quality and old-fashioned speech. Adapting a system to the domain is crucial, but it requires sufficient amounts of training data (both speech and text). For the WFH case study, only little training data was available. Using this data, the word error rate could be brought down substantially relative to a standard broadcast news recognition system, but it is still high: above 60% for certain fragments. However, the demo shows that using speech technology for indexing spoken-word archives is a promising approach: although the speech recognition output contains errors, it at least enables searching collections without the need for labour-intensive manual annotation.
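Word error rate is conventionally computed as the number of word substitutions, deletions and insertions needed to turn the recognised text into the reference transcript, divided by the number of words in the reference. The sketch below implements this standard definition with a word-level edit distance; the function name `word_error_rate` and the example sentences are hypothetical and are not taken from the WFH evaluation.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two texts, divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing from the hypothesis
            insertion = current[j - 1] + 1  # extra word in the hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# One substitution and one deletion against a six-word reference.
wer = word_error_rate(
    "de schrijver spreekt over de roman",
    "de schrijver sprak de roman",
)
```

Because insertions count as errors, the word error rate can exceed 100%. A rate above 60% still leaves many words correctly recognised, which helps explain why retrieval on such transcripts can remain useful.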

The main Willem Frederik Hermans website can be found here.


The demo can be viewed here:


Publications related to this showcase are:

R.J.F. Ordelman, M.A.H. Huijbregts and F.M.G. de Jong. Unravelling the Voice of Willem Frederik Hermans: an Oral History Case Study. CTIT report TR-CTIT-05-72, CTIT, University of Twente, 8 pages, 2005.
R.J.F. Ordelman. De stem van Willem Frederik Hermans ontrafeld. Audiovisuele archieven geïndexeerd. DIXIT, Tijdschrift voor toegepaste Taal- en Spraaktechnologie, 3(3):15-17, ISSN 1572-6037, 2005.

