Google-based Information Extraction. Finding John Lennon and Being John Malkovich
Gijs Geleijnse, Jan Korst, Verus Pronk (Philips Research)
We discuss a method to extract information from text fragments found with a search engine. We populate an ontology using handcrafted, domain-specific relation patterns and class-dependent rules that recognize instances of the classes. The algorithm uses the instances of one class found in the Google excerpts to find instances of other classes. The work is illustrated by two case studies. The first involves the population of an ontology in the movie domain. The second is a search for famous people and the collection of their biographical entries, such as nationality and profession.
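The pattern-based step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the pattern string, the person-name rule, the function name `extract_directors`, and the sample excerpts are all hypothetical, chosen only to show how a handcrafted relation pattern and a class-dependent recognition rule might combine. The excerpts are passed in as a plain list; retrieving them from a search engine is left out.

```python
import re

# Hypothetical handcrafted relation pattern for the movie domain.
PATTERN = "{movie}, directed by {director}"

# Hypothetical class-dependent rule: a person name is assumed to be
# two or three capitalized words.
PERSON_RULE = r"[A-Z][a-z]+(?: [A-Z][a-z]+){1,2}"


def extract_directors(movie, snippets):
    """Return the person names that fill the pattern for a known movie."""
    regex = re.compile(
        PATTERN.format(movie=re.escape(movie), director="(" + PERSON_RULE + ")")
    )
    found = set()
    for snippet in snippets:
        # findall returns the captured director name for every match.
        found.update(regex.findall(snippet))
    return found


# Made-up excerpts standing in for search-engine results.
sample_snippets = [
    "Being John Malkovich, directed by Spike Jonze, is a 1999 film.",
    "Reviews of Being John Malkovich and other films.",
]
directors = extract_directors("Being John Malkovich", sample_snippets)
```

In the same spirit, the names found this way could be placed in another pattern to look for instances of a further class, which is how instances of one class would lead to instances of others.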
paper (116.3 KB)
slides (2.6 MB)

6th Dutch-Belgian Information Retrieval Workshop (DIR 2006)