From: Pravin Bhat (firstname.lastname@example.org)
Date: Sun Dec 05 2004 - 23:36:50 PST
Proverb: The Probabilistic Cruciverbalist
Keim et al
The paper tackles the "AI-complete" problem of solving crossword puzzles by
describing an architecture that performs better than an average human on a
popular class of crossword puzzles - namely the New York Times(NYT) style crossword
The key contribution made by this paper is the learning algorithm to generate
candidate word lists for a given clue. This learning algorithm can be thought of
as a variant of the "ensemble of recognizers" technique that uses probability theory.
The authors manage to build a strong probabilistic recognizer that recognizes the
candidate crossword-targets given a clue using an ensemble of weak recognizers.
Individually the strongest recognizer in the system, the Dijkstra3 module, recognizes
the right target as it's best candidate with an accuracy of 13.3%. But the merger
manages to the combine the results from all the recognizers to raise the accuracy
level to 40.9% after which the grid-filling algorithm takes over.
The Dijkstra module described in the paper was particularly interesting. Essentially
this technique manages to build a n-gram model for a high dimensional space, here
the number of dimensions is equal to the number of words in the dictionary, using
sparse training data and limited memory. Given a large enough data corpus the authors
could have built an accurate n-gram model by simply counting frequency of word-pairs,
word-triplets, etc. Instead the authors relax this calculation by calculating n-grams
as a summation of individual word-pair bigrams. The bigram probabilities are stored
in a graph instead of a table to minimize memory consumption. The authors further
strengthen their n-gram model using the assumption that most phrases in English language
continue to make sense when one of its word is replaced by an "equivalent" word. To
account for such phrase transformations the authors add synonyms to their word-pairs
during the bigram calculations.
In a sense the paper is disappointing once the authors reveal their techniques. One
has to wonder if their system really "solves" crossword puzzles. PROVERB is more like
the non-Chinese speaking human in the Chinese room experiment. PROVERB has a set of
rules, some of which are encoded using probabilities of prior examples, which are
used to make a shallow analysis of a given crossword puzzle. The solver's performance
could drop at the slightest changes - say if the current editor of NYT crossword puzzles
retires or if NYT has an independence day special where all the answers are
On a similar note, the authors mention that once they tested their recognizer on a NYT
dataset they further debugged and modified their expert-modules before testing it on
another NYT dataset. While it is reasonable for a learning algorithm to require
constants/parameters to be hand-tuned for a given data-domain it is unreasonable for
the learning algorithm to require changes to the algorithm itself to account for the
changes in the domain. The authors in the "modification" stage could have strengthen
their rule based expert-modules, like the transformation-module, specifically for NYT
type crossword puzzles which would skew their performance results.
The learning component of PROVERB needs to be further analyzed for its dependence on
hand written rules. It would be interesting to see how PROVERB performances on
non-American crosswords after training on the right dataset without any changes to
the rule-based modules. Another simple test would be to see how well PROVERB performs
on NYT crossword puzzles from several decades ago or maybe on crossword puzzles from
a rival newspaper that uses a different style of clues.
Another area of research would be analysis of global clue patterns from partially
generated results. For example, if PROVERB recognized that every candidate-list generated
so far contains the name of an ex-president then it could hypothesize that all targets for the
current puzzle involve president names. This hypothesis could then be tested on results from
other clues. Highly probable hypothesis would then be enforced in the grid-filling step.
This archive was generated by hypermail 2.1.6 : Sun Dec 05 2004 - 23:36:51 PST