Multi-Context Learning for Speech Emotion Recognition

Title: Multi-Context Learning for Speech Emotion Recognition
Institute: University of Twente (HMI)
Place: Enschede, The Netherlands
Type: Capita selecta and Research Topics
End date: not present
HMI Contact: Jaebok Kim


We aim to build a speech emotion recognition system that is robust in the wild. Unlike speech recognition, where corpora generally contain more than 100 hours of speech, data for speech emotion recognition is sparse. We use six different corpora, each with different speakers, microphones and room conditions. Because of this severe variance, the aggregated-corpora approach does not achieve significant gains over a single-corpus approach, but it does give a more realistic measure of the system's real-world performance. To deal with the unknown characteristics of a target speaker or environment, we build a deep neural network model (a type of model known to generalise well).
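
The description above does not say how the aggregated corpora are split for evaluation. One common way to approximate an unknown target speaker or environment is to hold out one corpus entirely and train on the rest. The sketch below illustrates that idea only. It is an assumption, not the project's confirmed protocol, and the corpus names, the utterance records and the function leave_one_corpus_out are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical utterance record: (utterance_id, emotion_label).
Utterance = Tuple[str, str]


def leave_one_corpus_out(
    corpora: Dict[str, List[Utterance]],
) -> Iterator[Tuple[str, List[Utterance], List[Utterance]]]:
    """Yield (held_out_name, train, test) with each corpus held out once.

    The held-out corpus stands in for an unseen speaker/microphone/room
    condition; the remaining corpora are aggregated into the training set.
    """
    for held_out in sorted(corpora):
        train = [
            utt
            for name in sorted(corpora)
            if name != held_out
            for utt in corpora[name]
        ]
        test = list(corpora[held_out])
        yield held_out, train, test


# Toy placeholder data (not real corpora or results).
corpora = {
    "corpus_a": [("a1", "happy"), ("a2", "sad")],
    "corpus_b": [("b1", "angry")],
    "corpus_c": [("c1", "neutral"), ("c2", "happy")],
}

splits = list(leave_one_corpus_out(corpora))
for name, train, test in splits:
    print(name, len(train), len(test))
```

Each split keeps the test corpus completely separate from training, so a score on it reflects conditions the model has not seen, which is the sense in which a multi-corpus setup gives a more realistic estimate than testing within a single corpus.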