Abstract Prins

To reduce lexical ambiguity when parsing Dutch sentences with the Alpino system, we implemented a POS filter, a bigram HMM POS-tagger, that removes bad tags. To decide which tags are bad, the 'a posteriori' probability of each tag is computed and compared with that of the best-scoring tag for the same position in the sentence. What is remarkable about our method is that the model does not require hand-annotated training data; instead it is trained on the output of the parser itself, which makes it possible to use large sets of training data. With this filter, parsing performance improves in both speed and accuracy: in tests the system ran more than ten times as fast as the system without the filter, while also showing a slight increase in accuracy.

Last modified 2001/10/04 13:39:48 by Parlevink Webmaster
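The filtering step described in the abstract can be illustrated with a minimal sketch. It computes per-position tag posteriors for a bigram HMM with the standard forward-backward algorithm and then drops every tag whose posterior falls too far below the best one at that position. Everything concrete here is an assumption made for illustration and is not taken from Alpino: the function names `tag_posteriors` and `filter_tags`, the ratio-style `threshold`, the three-tag tag set, and the toy probabilities (the real model is estimated from parser output, not written by hand).

```python
from typing import Dict, List, Set

Dist = Dict[str, float]


def tag_posteriors(words: List[str], tags: List[str], start: Dist,
                   trans: Dict[str, Dist], emit: Dict[str, Dist]) -> List[Dist]:
    """Return, for each position, P(tag | whole sentence) under a bigram HMM."""
    n = len(words)
    # Forward pass: fwd[i][t] = P(words[0..i], tag at i = t)
    fwd: List[Dist] = [{} for _ in range(n)]
    for t in tags:
        fwd[0][t] = start[t] * emit[t].get(words[0], 0.0)
    for i in range(1, n):
        for t in tags:
            incoming = sum(fwd[i - 1][s] * trans[s][t] for s in tags)
            fwd[i][t] = incoming * emit[t].get(words[i], 0.0)
    # Backward pass: bwd[i][t] = P(words[i+1..] | tag at i = t)
    bwd: List[Dist] = [{} for _ in range(n)]
    for t in tags:
        bwd[n - 1][t] = 1.0
    for i in range(n - 2, -1, -1):
        for t in tags:
            bwd[i][t] = sum(trans[t][s] * emit[s].get(words[i + 1], 0.0) * bwd[i + 1][s]
                            for s in tags)
    total = sum(fwd[n - 1][t] for t in tags)
    if total == 0.0:
        raise ValueError("sentence has zero probability under the model")
    return [{t: fwd[i][t] * bwd[i][t] / total for t in tags} for i in range(n)]


def filter_tags(posteriors: List[Dist], threshold: float) -> List[Set[str]]:
    """Keep a tag only if its posterior is at least `threshold` times the best one.

    The ratio test is an assumed way of "comparing to the best scoring tag".
    """
    kept = []
    for dist in posteriors:
        best = max(dist.values())
        kept.append({t for t, p in dist.items() if p > 0.0 and p >= threshold * best})
    return kept


# Toy model with invented probabilities (illustration only).
TAGS = ["det", "noun", "verb"]
START = {"det": 0.6, "noun": 0.2, "verb": 0.2}
TRANS = {
    "det": {"det": 0.05, "noun": 0.9, "verb": 0.05},
    "noun": {"det": 0.2, "noun": 0.2, "verb": 0.6},
    "verb": {"det": 0.5, "noun": 0.3, "verb": 0.2},
}
EMIT = {
    "det": {"de": 1.0},
    "noun": {"was": 0.5, "man": 0.5},
    "verb": {"was": 1.0},
}
WORDS = ["de", "was"]  # "was" is ambiguous between noun and verb

if __name__ == "__main__":
    posteriors = tag_posteriors(WORDS, TAGS, START, TRANS, EMIT)
    for word, kept in zip(WORDS, filter_tags(posteriors, threshold=0.2)):
        print(word, sorted(kept))
```

A stricter threshold removes more tags, so the parser has fewer lexical categories to consider but runs a higher risk of losing the correct one; a looser threshold does the opposite. Which setting gives the speed and accuracy reported above is not stated in the abstract.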