Preliminary study into query translation for patent retrieval

Research output: Chapter in Book/Report/Conference proceeding › Book chapter › Research › peer-review

Patent retrieval is a branch of Information Retrieval (IR) that aims to support patent professionals in retrieving patents that satisfy their information needs. Patent granting bodies often require patents to be partially translated into one or more major foreign languages, so that language boundaries do not hinder their accessibility. This multilinguality of patent collections offers opportunities for improving patent retrieval. In this work we exploit these opportunities by applying query translation to patent retrieval. We expand monolingual patent queries with their translations, using both a domain-specific patent dictionary that we extract from the patent collection and a general, domain-free dictionary. Experimental evaluation on a standard CLEF-IP dataset shows that the two translation dictionaries yield similar results: query translation can help patent retrieval, but not always, and without great improvement over standard statistical monolingual query expansion (Rocchio). The improvement is greater when the source language is English rather than French or German, a finding due partly to the effect of complex French and German morphology on translation accuracy, and partly to the prevalence of English in the collection. A thorough per-query analysis reveals that cases where standard query expansion fails (e.g. zero recall) can benefit from query translation.
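The idea of expanding a monolingual query with its translations can be illustrated with a minimal sketch. It is not the chapter's implementation: the function name `expand_with_translations`, the cap `max_translations`, and the toy dictionary entries are all hypothetical, and the sketch assumes a dictionary that maps a lowercased source term to a list of candidate translations.

```python
def expand_with_translations(terms, dictionary, max_translations=2):
    """Return the original terms followed by their dictionary translations.

    `dictionary` is assumed to map a lowercased source term to a list of
    candidate translations; terms with no entry are left unexpanded.
    """
    expanded = list(terms)
    for term in terms:
        for translation in dictionary.get(term.lower(), [])[:max_translations]:
            if translation not in expanded:  # avoid duplicate query terms
                expanded.append(translation)
    return expanded


# Toy entries invented for illustration only (not taken from the chapter).
toy_patent_dictionary = {
    "valve": ["Ventil", "soupape"],
    "housing": ["Gehäuse", "boîtier"],
}

query = ["valve", "housing", "actuator"]
expanded_query = expand_with_translations(query, toy_patent_dictionary)
print(expanded_query)
```

The same function could be called with a general-purpose dictionary instead of a patent-specific one, so that the two expansions can be compared on the same queries.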
Original language: English
Title of host publication: Proceedings of the 3rd international workshop on Patent information retrieval
Number of pages: 10
Publisher: Association for Computing Machinery
Publication date: 2010
ISBN (Electronic): 978-1-4503-0384-2
Publication status: Published - 2010
Externally published: Yes
Event: 3rd International Workshop on Patent Information Retrieval - Toronto, Canada
Duration: 26 Oct 2010 → 26 Oct 2010
Conference number: 3


Conference: 3rd International Workshop on Patent Information Retrieval

ID: 49484546