How can we scale machine translation to cover the
thousands of languages
spoken on Earth? Current statistical machine translation methods
require large, hand-built “aligned” corpora as input for each language
pair of interest, but we cannot expect to hand-craft millions of such
corpora.
To address this quandary, we are investigating the following
hypotheses:
- Lexical translation, the translation of individual words
or phrases, is a translation task that is amenable to more scalable
methods.
- People can be motivated to disambiguate the “content”
they author, which will facilitate both Panlingual translation and
human-machine communication.
To demonstrate the utility of lexical translation, we have created PanImages, a cross-lingual image
search system that lets users issue queries in over 1,000
languages, with a vocabulary of more than 2,500,000 words, and translates the queries
automatically. The translated queries are then sent to Google’s Image
Search engine and to Flickr.
Users can add and correct translations in their own
language, turning the project into a panlingual community effort.
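The query flow described above (a query in one language is translated word by word, then sent to image search engines, and users can add or correct translations) can be illustrated with a small sketch. This is a minimal sketch under stated assumptions, not PanImages' actual implementation: the in-memory `LEXICON`, the three-letter language codes, and the helpers `translate_query` and `add_translation` are all hypothetical, untranslatable words are assumed to pass through unchanged, and the step that submits queries to Google Image Search or Flickr is left out.

```python
# Hypothetical toy lexicon: (source_lang, word) -> {target_lang: [translations]}.
# The entries and language codes are illustrative only, not PanImages data.
LEXICON: dict[tuple[str, str], dict[str, list[str]]] = {
    ("eng", "dog"): {"fra": ["chien"], "deu": ["Hund"]},
    ("eng", "black"): {"fra": ["noir"], "deu": ["schwarz"]},
}


def translate_query(query: str, source_lang: str, target_langs: list[str]) -> dict[str, str]:
    """Translate a query word by word (lexical translation) into each target language.

    Assumptions for this sketch: the first listed translation is used, and a word
    with no entry for a given target language is kept unchanged.
    """
    words = query.lower().split()
    result: dict[str, str] = {}
    for target in target_langs:
        translated = []
        for word in words:
            options = LEXICON.get((source_lang, word), {}).get(target, [])
            translated.append(options[0] if options else word)
        result[target] = " ".join(translated)
    return result


def add_translation(source_lang: str, word: str, target_lang: str, translation: str) -> None:
    """Hypothetical stand-in for a user contribution: add or promote a translation."""
    entry = LEXICON.setdefault((source_lang, word.lower()), {})
    options = entry.setdefault(target_lang, [])
    if translation in options:
        options.remove(translation)
    options.insert(0, translation)  # the user's translation becomes the preferred one


if __name__ == "__main__":
    print(translate_query("black dog", "eng", ["fra", "deu"]))
    add_translation("eng", "red", "fra", "rouge")
    print(translate_query("red dog", "eng", ["fra"]))
```

With the toy lexicon, `translate_query("black dog", "eng", ["fra", "deu"])` returns `{"fra": "noir chien", "deu": "schwarz Hund"}`. The sketch does not reorder words, so the French output is not grammatical; for keyword-style image queries this is arguably a smaller problem than it would be for sentence translation.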
Publications