Improving Query Parsing for AskReaxys (at Elsevier)

Title: Improving Query Parsing for AskReaxys (at Elsevier)
Institute: University of Twente (HMI)
Place: Enschede, The Netherlands
Type: final project
Start date: 1 February 2017
End date: not present
HMI Contact: Mariët Theune



Established in 1880, Elsevier is the world's biggest scientific publisher. Elsevier publishes over 2,500 impactful journals, including Tetrahedron, Cell and The Lancet. Flagship products include ScienceDirect, Scopus and Reaxys. Increasingly, Elsevier is becoming a major scientific information provider. For specific domains, structured scientific knowledge is extracted from millions of Elsevier and third-party scientific publications (journals, patents and books) and made available for querying and searching. In this way, Elsevier is positioning itself as the leading information provider for the scientific and corporate research community.

Task Description

Reaxys is a chemistry knowledge base. The knowledge in Reaxys is extracted and compiled from articles and patents. AskReaxys is a search interface for Reaxys that allows users to enter queries in natural language. Query parsing translates these user queries into internal structured queries, which can then be executed to find related information in the Reaxys knowledge base.
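
As a rough illustration of what query parsing means here, the sketch below turns a free-text query into a small structured record. It is a toy example only: the StructuredQuery fields, the regular expressions and the stopword list are all hypothetical and are not the actual AskReaxys parser or the internal Reaxys query format.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StructuredQuery:
    """Hypothetical structured form; NOT the real Reaxys query format."""
    author: Optional[str] = None
    year_from: Optional[int] = None
    terms: List[str] = field(default_factory=list)


# Toy patterns chosen only for this illustration (assumptions).
AUTHOR_RE = re.compile(r"\bby ((?:[A-Z]\. ?)+[A-Z][a-z]+)")  # e.g. "by J. Smith"
YEAR_RE = re.compile(r"\bsince (\d{4})\b")                   # e.g. "since 2010"
STOPWORDS = {"papers", "articles", "on", "about", "the", "of"}


def parse_query(text: str) -> StructuredQuery:
    """Extract an author, a start year and remaining search terms."""
    query = StructuredQuery()

    author = AUTHOR_RE.search(text)
    if author:
        query.author = author.group(1)
        text = text.replace(author.group(0), " ")

    year = YEAR_RE.search(text)
    if year:
        query.year_from = int(year.group(1))
        text = text.replace(year.group(0), " ")

    # Whatever is left, minus stopwords, is treated as plain search terms.
    query.terms = [w for w in text.lower().split() if w not in STOPWORDS]
    return query


if __name__ == "__main__":
    print(parse_query("papers on aspirin synthesis by J. Smith since 2010"))
```

A real parser would need far more than hand-written patterns, for example to tell author names apart from chemical names; the sketch is only meant to show the input and output of the task.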

One important subtask is author name recognition: recognizing and disambiguating author names in user queries. A previous UT intern made a good start on this task; we expect the student to continue this work and improve on the results.
Depending on the student's interests, many other work items can be identified, such as chemical name recognition, user purpose classification, patent number recognition and relevance ranking of the returned results. Students are encouraged to explore new techniques and publish their work as papers.

Students have the opportunity to work with Elsevier's powerful Spark cluster in the Databricks framework.

Location: Amsterdam or Frankfurt.

Host Group: NLP group, Content & Innovation, Operations, Elsevier.