University of Washington Department of Computer Science & Engineering
 Panlingual Lexical Translation

Demonstrations
 PanImages
 
How can we scale machine translation to cover the thousands of languages spoken on Earth? Current statistical machine translation methods require large, hand-built “aligned” corpora as input for each language pair of interest, but we cannot expect to hand-craft millions of such corpora.

To address this quandary, we are investigating the following hypotheses:

  1. Lexical translation, the translation of individual words or phrases, is a translation task that is amenable to more scalable methods.
  2. People can be motivated to disambiguate the “content” they author, which will facilitate both panlingual translation and human-machine communication.
We constructed a translation graph by compiling over 600 freely available dictionaries into a common resource. Our novel inference procedures then inferred additional translations from this graph. This led to the construction of PanDictionary, a massively multilingual, sense-distinguished lexical resource with over 200 million pairwise translations. We release the resource to researchers for non-commercial use. To obtain a copy please email .
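
The idea of a sense-distinguished translation graph can be illustrated with a toy sketch. This is not the project's actual data model or inference procedure: the entries, the sense labels, and the function names build_graph and translate are all hypothetical, and the "inference" here is simply the assumption that two words attached to the same sense translate each other.

```python
from collections import defaultdict

# Hypothetical toy entries: (sense_id, language, word). Imagine they were
# merged from an English-French and an English-Spanish dictionary; the rows
# and sense labels are invented for illustration only.
ENTRIES = [
    ("season", "en", "spring"),
    ("season", "fr", "printemps"),
    ("season", "es", "primavera"),
    ("coil", "en", "spring"),
    ("coil", "fr", "ressort"),
]


def build_graph(entries):
    """Index the entries both ways: sense -> words and word -> senses."""
    sense_to_words = defaultdict(set)
    word_to_senses = defaultdict(set)
    for sense, lang, word in entries:
        sense_to_words[sense].add((lang, word))
        word_to_senses[(lang, word)].add(sense)
    return sense_to_words, word_to_senses


def translate(word, source_lang, target_lang, graph):
    """Return {sense: sorted target-language words} for one source word.

    Assumption: words that share a sense node are translations of each
    other, even if no single source dictionary listed the pair directly.
    """
    sense_to_words, word_to_senses = graph
    result = {}
    for sense in word_to_senses.get((source_lang, word), ()):
        targets = sorted(
            w for lang, w in sense_to_words[sense] if lang == target_lang
        )
        if targets:
            result[sense] = targets
    return result


graph = build_graph(ENTRIES)
print(translate("spring", "en", "fr", graph))
print(translate("printemps", "fr", "es", graph))
```

In this sketch the ambiguous English word "spring" yields a separate French translation per sense, and the French-Spanish pair is recovered through the shared sense even though neither imagined source dictionary covers that language pair.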

To demonstrate the utility of lexical translation, we have created PanImages, a cross-lingual image search system that lets users issue queries in over 1,000 languages, using more than 10,000,000 words, and translates the queries automatically. The translated queries are then sent to Google’s Image Search engine and to Flickr.

Users are able to add and correct translations in their own language, turning the project into a panlingual community effort.

Publications


Department of Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA 98195-2350