|
|
|
| Description: |
Web pages consist of different segments with different semantic meanings. Examples are the navigation menu, advertisements, and the actual information the website is built around. The core of the project is to identify these parts, which makes it possible to extract useful information or to transform the web page by performing operations on them.
Three subproblems can be distinguished (not necessarily in this order):
• Segmentation: determining the different elementary texts with their properties
• Segment clustering: merging segments with corresponding semantic contents
• Classification: identifying the (clustered) segments based on contents
These subproblems are not independent of each other. For instance, segments may need to be classified before they can be clustered.
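The dependency between classification and clustering can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not part of the project itself: the `Segment` type, the `link_words` feature, and the 0.5 threshold are hypothetical, and the rule-based `classify` stands in for whatever classifier would actually be used. Each segment is labeled first, and runs of adjacent segments with the same label are then merged into one cluster.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import groupby


@dataclass
class Segment:
    text: str        # visible text of the segment
    link_words: int  # hypothetical feature: number of words inside hyperlinks


def classify(segment: Segment) -> str:
    """Toy rule (an assumption): mostly-link text is treated as navigation."""
    words = len(segment.text.split())
    if words == 0:
        return "empty"
    return "navigation" if segment.link_words / words > 0.5 else "content"


def cluster(segments: list[Segment]) -> list[tuple[str, list[Segment]]]:
    """Merge runs of adjacent segments that received the same label."""
    return [(label, list(run)) for label, run in groupby(segments, key=classify)]


segments = [
    Segment("Home", 1),
    Segment("About us", 2),
    Segment("Web pages consist of different segments.", 0),
    Segment("Each segment has its own meaning.", 0),
]
clusters = cluster(segments)
for label, members in clusters:
    print(label, [s.text for s in members])
```

Here clustering depends on the classifier's output. A real system could just as well cluster on structural or visual similarity first and classify the clusters afterwards.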
Several sources of information are available in a web page and can be accessed through its DOM (Document Object Model):
• Structural information, such as paragraphs and tables
• Visual information, such as color and spatial properties
• Semantic information, by analyzing the contents of segments
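As a rough illustration of the structural source only, the sketch below uses Python's standard `html.parser` module to collect each text node together with the path of tags leading to it. The class name `SegmentExtractor`, the sample markup, and the choice of a tag path as the structural property are assumptions made for this example. Visual information such as color and position is not covered, since obtaining it would require a rendering engine.

```python
from html.parser import HTMLParser


class SegmentExtractor(HTMLParser):
    """Collects each non-empty text node with the tag path leading to it."""

    # Elements without a closing tag; they never enclose text.
    VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.path = []      # currently open tags, outermost first
        self.segments = []  # (tag path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID_TAGS:
            self.path.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.path:
            while self.path.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())  # normalize whitespace
        if text:
            self.segments.append(("/".join(self.path), text))


# Hypothetical sample page with a menu, a table, and a paragraph.
html = (
    "<html><body>"
    "<ul><li><a href='/'>Home</a></li></ul>"
    "<table><tr><td>Price</td></tr></table>"
    "<p>Main text.</p>"
    "</body></html>"
)
parser = SegmentExtractor()
parser.feed(html)
parser.close()
for path, text in parser.segments:
    print(path, "->", text)
```

The tag path gives a cheap structural feature: text under `ul/li/a` is more likely to be navigation than text under `p`, which is the kind of hint a later clustering or classification step could use.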
The project will take place at the Telematica Instituut in cooperation with Carp Technologies.
|
|
|
| Description: |
|
Although a lot of progress has been made in the field of question answering, the technology is still rarely used by search engines on the Internet. Since question answering systems provide a more natural way of interacting with computer systems and allow users to express their demands more clearly, the technology will inevitably find its way into Internet search engines. In this paper we try to identify the changes we can expect in the interface of such a hybrid system. By looking at the current state of question answering technology, environmental properties, and other work on interface design, we composed a list of guidelines that aim to improve online search engine interfaces. Comparing these guidelines with the systems readily available on the Internet made clear that there is still room for various improvements, among which clustering and classification of results can have a substantial effect on search engine interface design. | |
|