19th Twente Workshop on Language Technology

Workshop on Information Extraction in Molecular Biology
November 11-14, 2001


We are witnessing an incredible explosion in the production of information relevant to molecular biology and biomedicine. In most cases this information is stored as free text. Information is currently being produced far faster than it can be utilised. Therefore, there is growing interest in automated techniques for harvesting information from the available texts. A number of groups are active in applying information extraction techniques to free-text sources. The field is very young and in need of an exchange of methods and results.

The techniques used vary widely, ranging from statistical techniques to techniques rooted in computational linguistics. A comparative analysis of the pros and cons of these techniques is lacking. Indeed, the tasks themselves, as seen by different practitioners, seem ill-defined.

The applications in biology today concentrate on a few questions, particularly protein-protein interactions, but surely other fields are equally promising.

This workshop proposes to address the question of information extraction at two levels:
1) At the object level, some would advocate shallow (mostly statistical) techniques, e.g. as used in text mining, while others would advocate deeper but more expensive techniques. There is a trade-off involved, about which we want to learn more.
2) At the meta-level, the task, or, more precisely, the range of tasks, must be better defined. There are diverse models one can derive from work in computer science, and natural-language engineering in particular: text mining, indexing for purposes of information retrieval, DARPA's Message Understanding Project, and more.


Important Dates
Registration: until October 1, 2001
Deadline for camera-ready paper submission: October 12, 2001
Workshop: November 11-14, 2001



A maximum of 29 places is available for this workshop.

