Google-based Information Extraction. Finding John Lennon and Being John Malkovich
Gijs Geleijnse, Jan Korst, Verus Pronk
Philips Research
We discuss a method to extract information from text fragments
found with a search engine. We populate an ontology using handcrafted
domain-specific relation patterns and class-dependent rules
to recognize instances of the classes. The algorithm uses the instances
for one class found in the Google excerpts to find instances
of other classes. The work is illustrated by two case studies. The
first involves the population of an ontology in the movie domain.
The second is a search for famous people and the collection of their
biographical entries such as nationality and profession.
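
The alternating search described above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the relation pattern ("directed by"), the two recognition rules, the `populate` function, and the `toy_search` stand-in for a search engine are all hypothetical, and the snippets are hand-written toy data rather than real Google excerpts.

```python
import re
from collections import deque

# Assumed class-dependent rules (illustrative, not the authors' rules):
# a person is two or three capitalized words; a movie title is a quoted string.
PERSON = r"[A-Z][a-z]+(?: [A-Z][a-z]+){1,2}"
MOVIE = r'"([^"]+)"'


def populate(seed_movie, search):
    """Alternate between classes: each newly found instance forms the next query.

    `search` is any callable mapping a query string to a list of text snippets.
    """
    movies, directors, pairs = {seed_movie}, set(), set()
    queue = deque([("movie", seed_movie)])
    while queue:
        cls, instance = queue.popleft()
        if cls == "movie":
            # Known movie: look for a person name right after the pattern.
            query = f'"{instance}" directed by'
            regex = re.compile(re.escape(query) + " (" + PERSON + ")")
        else:
            # Known director: look for a quoted title right before the pattern.
            query = f"directed by {instance}"
            regex = re.compile(MOVIE + " " + re.escape(query))
        for snippet in search(query):
            for found in regex.findall(snippet):
                if cls == "movie":
                    pairs.add((instance, found))
                    if found not in directors:
                        directors.add(found)
                        queue.append(("director", found))
                else:
                    pairs.add((found, instance))
                    if found not in movies:
                        movies.add(found)
                        queue.append(("movie", found))
    return movies, directors, pairs


# Toy data standing in for search-engine excerpts (hand-written for this sketch).
SNIPPETS = [
    'I loved "Being John Malkovich" directed by Spike Jonze, a strange film.',
    'Then came "Adaptation" directed by Spike Jonze and written by Kaufman.',
]


def toy_search(query):
    """Hypothetical search stub: return the snippets that contain the query."""
    return [s for s in SNIPPETS if query in s]


movies, directors, pairs = populate("Being John Malkovich", toy_search)
```

Starting from a single seed movie, the loop finds a director, uses that director to find a further movie, and stops when no query yields a new instance. A real system would replace `toy_search` with calls to a search engine and would need more robust recognition rules than the two regular expressions assumed here.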