
Zwier Kanis

Email: 


Final project assignment

Title: Determining and Classifying Semantically Coherent Segments in Web Pages
Institute: Content2context (formerly known as Carp Technologies)
Place: 7559 ST Hengelo
Country: The Netherlands
Startdate: 31-08-2011
Completed: Yes
Mentor: Betsy van Dijk
Research themes: Multimedia Retrieval, Information Engineering
Description:
Web pages consist of different segments with different semantic meanings. Examples are the navigation menu, advertisements, or the actual information the website is built around. The heart of the project is to identify these different parts, making it possible to extract useful information or to transform the web page by performing operations on these parts.

Three different sub problems can be distinguished (not necessarily in this order):
• Segmentation: determining the different elementary texts with their properties
• Segment clustering: merging segments with corresponding semantic contents
• Classification: identifying the (clustered) segments based on contents

These sub problems are not independent of each other. For instance, classification of segments may be needed to cluster them.

Different resources are available in a webpage and can be accessed through the webpage DOM (Document Object Model):
• Structural information, such as paragraphs and tables
• Visual information, color and spatial properties
• Semantic information, by analyzing the contents of segments
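
As a rough illustration of the three subproblems and of how structural DOM information might be used, a minimal Python sketch follows. It is not the method developed in the project. The names SegmentCollector, segment, cluster and classify, the sample page, the rule of merging adjacent segments with identical tag paths, and the 0.5 link-density threshold are all assumptions made for illustration. Only the standard-library html.parser module is used.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not be kept on the path.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class SegmentCollector(HTMLParser):
    """Segmentation: collect elementary texts with their tag path (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.path = []      # open tags from the root to the current node
        self.segments = []  # one dict per elementary text

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.path.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags.
        if tag in self.path:
            while self.path.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())
        if text:
            self.segments.append(
                {"text": text, "path": tuple(self.path), "in_link": "a" in self.path}
            )


def segment(html):
    collector = SegmentCollector()
    collector.feed(html)
    collector.close()
    return collector.segments


def cluster(segments):
    """Segment clustering: merge adjacent segments that share the same tag path."""
    clusters = []
    for seg in segments:
        if clusters and clusters[-1][-1]["path"] == seg["path"]:
            clusters[-1].append(seg)
        else:
            clusters.append([seg])
    return clusters


def classify(segments, threshold=0.5):
    """Classification: label a cluster by the share of its text that sits inside links."""
    total = sum(len(s["text"]) for s in segments)
    linked = sum(len(s["text"]) for s in segments if s["in_link"])
    return "navigation" if linked / total > threshold else "content"


# Assumed sample page, for illustration only.
SAMPLE_PAGE = """
<html><body>
<nav><ul><li><a href="/">Home</a></li><li><a href="/about">About</a></li></ul></nav>
<div><p>Web pages consist of different segments.</p><p>Each has its own meaning.</p></div>
</body></html>
"""

if __name__ == "__main__":
    for group in cluster(segment(SAMPLE_PAGE)):
        print(classify(group), [s["text"] for s in group])
```

The sketch relies on structural information only. Visual properties (color, position) and a semantic analysis of the segment contents would need a rendering engine and text analysis, which are outside its scope.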

The project will take place at the Telematica Instituut in cooperation with Carp Technologies.


Capita selecta and Research Topics assignment

Title: Online Search Interfaces: A future look
Institute: Teezir Search Solutions
Place: 6710 BJ Ede
Country: The Netherlands
Startdate: 14-11-2007
Completed: Yes
Report: http://hmi.ewi.utwente.nl/verslagen/capita-selecta/CS-Kanis-Zwier.pdf
Mentor: Theo Huibers
External mentor: Thijs Westerveld
Research themes: Multimedia Retrieval, Information Engineering
Description:
Although a lot of progress is made in the field of question answering, the technology is still not used much among search engines on the Internet. Since question answering systems provide a more natural way of interacting with computer systems and allow users to express their demands more clearly, the technology will inevitably find its way into Internet search engines. In this paper we try to find the changes we can expect regarding the interface of such a hybrid system. By looking at the current state of question answering technology, environmental properties and other work on interface design we composed a list of guidelines that aim to improve on-line search engine interfaces. Comparing these guidelines with the systems readily available on the Internet made clear that there is still room for various improvements, among which clustering and classification of results can have a substantial effect on search engine interface design.
