
Abstract Fien de Mulder

Although named-entity recognition (NER) is a well-developed area for English, where high levels of accuracy are achieved, we are not aware of any work on NER for Dutch, with the notable exception of Buchholz and Van den Bosch (2000).

In this paper, (i) we describe an NER system for persons, companies, and locations, based on gazetteers and simple "sure-fire" rules. The precision of this baseline system on unseen parts of our corpus (a part of the Flemish journal FET) is between 90% and 99%, depending on the type of NE, but recall is low, between 22% and 55%.
(ii) We show how this baseline system can be used to provide a seed for the extraction of more sophisticated rules (using Ripper, a rule induction algorithm), which improves recall.
(iii) We describe an NP chunker consisting of a hand-crafted regular-expression part and a (memory-based) machine-learning part.
(iv) We show the interaction between the two components: the chunk information can be used to improve NER, and the NER information can be used to improve chunking.
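
To make the baseline in (i) concrete, here is a minimal sketch of a gazetteer lookup combined with "sure-fire" contextual rules. It is not the system described in the paper: the gazetteer entries, the cue words (`dhr.`, `mevr.`, `dr.`, `NV`, `BVBA`), the label names, and the `tag` function are all assumptions made for illustration only.

```python
import re

# Hypothetical toy gazetteers; the lists used in the paper are not given here.
GAZETTEER = {
    "PER": {"Jan Peeters"},
    "ORG": {"Agfa"},
    "LOC": {"Antwerpen", "Gent"},
}

# A capitalized token (assumed definition, for illustration only).
CAP = r"[A-Z][\w-]+"

# Hypothetical "sure-fire" rules: a strong cue next to capitalized tokens.
RULES = [
    # A title followed by one or more capitalized tokens is tagged as a person.
    ("PER", re.compile(rf"\b(?:dhr\.|mevr\.|dr\.)\s+({CAP}(?:\s+{CAP})*)")),
    # Capitalized tokens followed by a company suffix are tagged as a company.
    ("ORG", re.compile(rf"\b({CAP}(?:\s+{CAP})*\s+(?:NV|BVBA))\b")),
]


def tag(text):
    """Return a sorted list of (start, end, label, surface) tuples."""
    found = set()
    # Gazetteer pass: exact, case-sensitive matches on whole words.
    for label, names in GAZETTEER.items():
        for name in names:
            for m in re.finditer(rf"\b{re.escape(name)}\b", text):
                found.add((m.start(), m.end(), label, m.group()))
    # Rule pass: only the captured group is tagged, not the cue word itself.
    for label, pattern in RULES:
        for m in pattern.finditer(text):
            found.add((m.start(1), m.end(1), label, m.group(1)))
    return sorted(found)


if __name__ == "__main__":
    sentence = "Volgens dhr. Piet Janssens verhuist Barco NV van Kortrijk naar Gent."
    for start, end, label, surface in tag(sentence):
        print(label, surface)
```

In this sketch, a location such as "Kortrijk" is missed because it is neither in the toy gazetteer nor next to a cue word. That is the kind of high-precision, low-recall behaviour such a baseline would be expected to show, and the entities it does find could serve as seed annotations for a rule learner as in (ii).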

Last modified 2001/10/04 13:39:44 by Parlevink Webmaster