How can we scale machine translation to cover the
thousands of languages
spoken on earth? Current statistical machine translation methods
require large, hand-built “aligned” corpora as input for each language
pair of interest, but we cannot expect to hand-craft millions of such
corpora.
To address this quandary we are investigating the following
hypotheses:
- Lexical translation, the translation of individual words
or phrases, is a translation task that is amenable to more scalable
methods.
- People can be motivated to disambiguate the “content”
they author, which will facilitate both Panlingual translation and
human-machine communication.
We constructed a translation graph by compiling over 600 freely available
dictionaries into a common resource. Our novel inference procedures inferred
additional translations based on this translation graph. This led to the
construction of PanDictionary, a massively multilingual, sense-distinguished,
lexical resource with over 200 million pairwise translations. We release
our resource for use by researchers for non-commercial purposes. To obtain a
copy please email .
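The idea of a translation graph can be illustrated with a minimal sketch. It is not the project's inference procedure: the toy entries, the function names `build_graph` and `infer_translations`, and the pivot-counting heuristic are all assumptions made for illustration. Nodes are (language, word) pairs, an edge joins two words that some dictionary lists under the same sense, and a new translation is proposed when two words share an intermediate word.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy entries: (dictionary_id, sense_id, {language: word}).
ENTRIES = [
    ("dict_en_fr", "s1", {"en": "spring", "fr": "printemps"}),
    ("dict_fr_de", "s2", {"fr": "printemps", "de": "Frühling"}),
    ("dict_en_de", "s3", {"en": "spring", "de": "Feder"}),
    ("dict_en_fr2", "s4", {"en": "spring", "fr": "ressort"}),
    ("dict_fr_de2", "s5", {"fr": "ressort", "de": "Feder"}),
]

def build_graph(entries):
    """Nodes are (language, word); an edge joins words listed under one sense."""
    graph = defaultdict(set)
    for _dict_id, _sense_id, words in entries:
        for a, b in combinations(sorted(words.items()), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def infer_translations(graph, source, target_lang, min_pivots=1):
    """Propose target-language words that share at least min_pivots
    intermediate words with the source (a simplified heuristic)."""
    pivots = defaultdict(set)
    known = graph[source]
    for pivot in known:
        for candidate in graph[pivot]:
            is_new = candidate != source and candidate not in known
            if candidate[0] == target_lang and is_new:
                pivots[candidate].add(pivot)
    return {c: p for c, p in pivots.items() if len(p) >= min_pivots}

graph = build_graph(ENTRIES)
proposed = infer_translations(graph, ("en", "spring"), "de")
```

Here "spring" is ambiguous (a season or a coil), so a candidate supported by a single pivot is weak evidence. Raising `min_pivots` is one crude way to demand more support; a sense-distinguished resource needs a more careful, probabilistic treatment of that ambiguity.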
To demonstrate the utility of lexical translation, we have created PanImages, a cross-lingual image
search system that enables users to issue queries in over 1,000
languages using more than 10,000,000 words, and translates the queries
automatically. The translated queries are then sent to Google’s Image
Search engine and to Flickr.
Users are able to add and correct translations, using their own
language, turning the project into a panlingual community effort.
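The query flow described above (translate a query word, then send each translation to an image search engine) might be sketched as follows. Everything here is hypothetical: the toy `LEXICON`, the function names, and the stub backend stand in for the real lexical resource and search services, whose interfaces are not described on this page.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Word = Tuple[str, str]  # (language code, word)

# Hypothetical toy lexicon; a real system would consult a far larger resource.
LEXICON: Dict[Word, List[Word]] = {
    ("fi", "ruusu"): [("en", "rose"), ("nl", "roos")],
}

def translate_query(query: Word,
                    lexicon: Dict[Word, List[Word]],
                    target_langs: Iterable[str]) -> List[Word]:
    """Return the original query followed by its translations
    in the requested target languages, without duplicates."""
    wanted = set(target_langs)
    results = [query]
    for translation in lexicon.get(query, []):
        if translation[0] in wanted and translation not in results:
            results.append(translation)
    return results

def search_all(queries: List[Word],
               backends: Dict[str, Callable[[str], List[str]]]):
    """Send every query word to every caller-supplied backend function."""
    hits = {}
    for lang, word in queries:
        for name, search in backends.items():
            hits[(name, lang, word)] = search(word)
    return hits

# Usage with a stub backend that only fabricates a file name.
queries = translate_query(("fi", "ruusu"), LEXICON, {"en"})
results = search_all(queries, {"stub": lambda w: [w + ".jpg"]})
```

Passing the backends in as plain functions keeps the translation step independent of any particular search service.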
Publications
- "Compiling a Massive, Multilingual
Dictionary via Probabilistic Inference"
Mausam, Stephen Soderland, Oren Etzioni, Daniel S. Weld, Michael Skinner, and Jeff Bilmes
Proceedings of the 47th Annual
Meeting of the Association for Computational Linguistics and 4th International
Joint Conference on Natural Language Processing (ACL-IJCNLP 2009)
- "A Rose is a Roos is a Ruusu: Querying Translations for Web Image Search"
Janara Christensen, Mausam, and Oren Etzioni
Proceedings of the 47th Annual
Meeting of the Association for Computational Linguistics and 4th International
Joint Conference on Natural Language Processing (ACL-IJCNLP 2009)
- "Lexical
Translation with Application to Image Search on the Web"
Oren Etzioni, Kobi Reiter, Stephen Soderland, and Marcus Sammer
Proceedings of Machine Translation Summit XI, 2007
- "Building
a Sense-Distinguished Multilingual Lexicon from Monolingual Corpora and
Bilingual Lexicons"
Marcus Sammer and Stephen Soderland
Proceedings of Machine Translation Summit XI, 2007
- "Disambiguating
for the Web: A Test of Two Methods"
Jonathan Pool and S. M. Colowick
Proceedings of the 4th International Conference on
Knowledge Capture (K-CAP 2007)