This ShowCase demonstrates the application of speech recognition for
searching in Dutch audiovisual collections. The collection contains
interviews and lectures of Willem Frederik Hermans. The spoken word in
the audio is recognised
using a Dutch speech recognition system. The speech transcripts are
then used for searching the audio documents using standard text-based
retrieval techniques.
Some themes present in this work are: speech recognition, multimedia
retrieval, spoken-word archives, spoken document retrieval.
This demo shows how Spoken Document Retrieval can be used to disclose a
Dutch spoken-word archive: a collection of interviews and lectures of
the famous Dutch novelist Willem Frederik Hermans. A Dutch speech
recognition system developed at HMI is used to generate a full-text
transcript, labeled with time-codes, of the speech in the collection.
This transcript was indexed so that the collection can now be searched
in a 'Google' kind of fashion. For every query, the search engine
returns the relevant "audio documents".
Document "boundaries" were generated for this demo using speaker
segmentation and speech/non-speech detection.
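The indexing and search step described above can be illustrated with a minimal sketch. It is not the demo's actual implementation: the document ids, words and time-codes are invented, and a plain inverted index with boolean AND matching stands in for whatever text retrieval engine was really used. It only shows how time-coded transcripts let a text query point back into the audio.

```python
from collections import defaultdict

# Hypothetical time-coded transcripts: document id -> list of (start_seconds, word).
# The ids, words and times below are invented for illustration.
transcripts = {
    "interview_01": [(0.0, "de"), (0.4, "roman"), (1.1, "is"), (1.3, "af")],
    "lecture_02": [(0.0, "een"), (0.5, "roman"), (1.2, "over"), (1.6, "oorlog")],
}


def build_index(docs):
    """Map each lower-cased word to the (document id, start time) pairs where it was recognised."""
    index = defaultdict(list)
    for doc_id, words in docs.items():
        for start, word in words:
            index[word.lower()].append((doc_id, start))
    return index


def search(index, query):
    """Return documents containing every query term, with the times of the matching words."""
    terms = query.lower().split()
    if not terms:
        return {}
    hits_per_term = [index.get(term, []) for term in terms]
    # Keep only documents in which all terms occur (boolean AND).
    doc_sets = [{doc_id for doc_id, _ in hits} for hits in hits_per_term]
    common = set.intersection(*doc_sets)
    results = {}
    for doc_id in sorted(common):
        times = sorted(start for hits in hits_per_term for d, start in hits if d == doc_id)
        results[doc_id] = times
    return results


index = build_index(transcripts)
print(search(index, "roman"))
```

Each hit carries a start time, so a player could jump straight to the moment the query word was recognised in the audio document.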
The speech recognition system used (based on the SONIC speech
recognition system) uses a broadcast news acoustic model that is
slightly adapted to the task domain using a small amount of adaptation
data. The language model was created using newspaper data, written
interviews of Hermans and text transcripts from the Spoken Dutch
Corpus. The vocabulary consists of 30K words derived from
Hermans-specific text material.
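How a fixed-size vocabulary can be derived from domain text is sketched below. The text above does not say which selection criterion was used, so picking the most frequent word types is an assumption, and the sample sentences and the helper names `select_vocabulary` and `oov_rate` are hypothetical. The second helper measures how many word tokens of a new text fall outside the chosen vocabulary, which is one common way to judge whether a vocabulary fits a domain.

```python
import re
from collections import Counter


def select_vocabulary(texts, size):
    """Return the `size` most frequent lower-cased word types in `texts`."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[^\W\d_]+", text.lower()))
    # Sort by descending count, then alphabetically, so ties are resolved deterministically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:size]]


def oov_rate(vocabulary, text):
    """Fraction of word tokens in `text` that fall outside `vocabulary`."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    known = set(vocabulary)
    return sum(token not in known for token in tokens) / len(tokens)


# Invented sample sentences standing in for the domain text material.
domain_texts = ["de schrijver schrijft de roman", "de roman is af"]
vocab = select_vocabulary(domain_texts, size=3)
print(vocab)
print(oov_rate(vocab, "de schrijver leest de roman"))
```

With real material, `size` would be set to 30000 to match the vocabulary size mentioned above.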
Speech recognition in the spoken-word archive domain can be very
difficult, especially when the collections contain 'older' material
with low audio quality and old-fashioned speech. Adapting a system to
the domain is crucial, but it requires sufficient amounts of training
data (both speech and text). For the WFH case study, only a little
training data was available. Using this data, the word error rate
(relative to a standard broadcast news recognition system) could be
brought down substantially, but it is still somewhat high: above 60%
for certain fragments. However, the demo shows that using speech
technology for indexing spoken-word archives is a promising approach:
although speech recognition is error-prone, it at least enables
searching collections without the need for labour-intensive manual
annotation.
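For readers unfamiliar with the measure, word error rate is conventionally computed as the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recogniser output, divided by the number of reference words. The sketch below follows that standard definition; it is not the scoring tool used for the case study, and the example sentences are invented.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two strings, divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Invented example: one substituted word and one deleted word in a five-word reference.
print(word_error_rate("de roman is nu af", "de romein is af"))
```

Because insertions count as errors, the rate can exceed 100% when the recogniser outputs many more words than were spoken.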
The main Willem Frederik Hermans website can be found here.
The demo can be viewed here.