Horn-clause inference rules learned by Sherlock

These files contain the Horn-clause inference rules learned by the Sherlock system. The methodology for constructing and evaluating these rules is described in:

"Learning First-Order Horn Clauses from Web Text"
S. Schoenmackers, J. Davis, O. Etzioni, and D. Weld
In EMNLP 2010
(paper available at: http://ai.cs.washington.edu/pubs/184)

There are two versions of the files available at http://www.cs.washington.edu/research/sherlock-hornclauses/:

sherlockrules.zip *RECOMMENDED* - contains the subset of the rules that were accepted by Sherlock.
allrules.zip - contains all rules evaluated by Sherlock that had at least two ground instances that were both observed in the corpus and inferred by the rule (i.e. all rules that have at least minimal support).
Additionally, the 1.1 million (class, instance) pairs used by Sherlock are available in the file allclassinstances.txt.gz.

The contents of the rules zip files are:

allclasses.txt - a file listing all of the classes used
alltypedrelations.txt - a file listing all of the typed relations used. All classes and relations are given in stemmed and normalized form (e.g. 'was born in' becomes 'be bear in').
allrules.<arg1class>.<arg2class>.out - files listing all rules such that the first argument of the head of the rule is an instance of <arg1class>, and the second argument of the head of the rule is an instance of <arg2class> (e.g. Contain(Food, Nutrient) ).
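
As a minimal sketch, the two argument classes can be recovered from a rules file name by splitting on periods. The function name parse_rules_filename and the sample file name are hypothetical, and the sketch assumes that class names themselves contain no periods.

```python
def parse_rules_filename(name):
    """Return (arg1class, arg2class) from 'allrules.<arg1class>.<arg2class>.out'.

    Assumption: class names contain no '.' characters.
    """
    parts = name.split(".")
    if len(parts) != 4 or parts[0] != "allrules" or parts[3] != "out":
        raise ValueError("unexpected rules file name: %r" % name)
    return parts[1], parts[2]


# Hypothetical file name, modelled on the Contain(Food, Nutrient) example.
arg1class, arg2class = parse_rules_filename("allrules.food.nutrient.out")
```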

Each file contains a header line (beginning with a '#') and then a series of rules/typed relations/class names, one per line. Within a line, the fields are tab-separated. The header line describes what each field corresponds to.
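
The layout described above (one '#' header line, then tab-separated records) can be read with a few lines of Python. This is only a sketch: the function name read_sherlock_table and the field names in the sample are hypothetical, and the real field names are whatever each file's header line says.

```python
def read_sherlock_table(lines):
    """Return (header_fields, rows) from an iterable of text lines.

    The first line starting with '#' is treated as the header; every other
    non-empty line is split on tabs into a list of string fields.
    """
    header = None
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if header is None and line.startswith("#"):
            header = line.lstrip("#").strip().split("\t")
            continue
        rows.append(line.split("\t"))
    return header, rows


# Made-up sample lines; the header names are illustrative, not the real ones.
sample = [
    "# relation\targ1class\targ2class\n",
    "be bear in\twriter\tcity\n",
]
header, rows = read_sherlock_table(sample)
```

To read a file from disk, pass an open file handle instead of a list, for example read_sherlock_table(open("alltypedrelations.txt", encoding="utf-8")), where the UTF-8 encoding is an assumption.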

In the alltypedrelations.txt file, the fields list the relation, the class of the first argument, the class of the second argument, and scores indicating how often the relation occurred in our corpus and how much more likely it was to occur than random (log PMI).

In each of the rules files, the fields list, in order, the rule, the number of relations in the body of the rule (1 or 2), and the rule's score according to several rule-scoring metrics.

The rules are formatted to be human readable and reasonably easy to parse mechanically. In the files they are written like Prolog/Datalog rules. For example, one of the rules is:

rule "be bear in(writer_A, place_B) :- be bear in(writer_A, city_C), be locate in(city_C, place_B) "

This can be read as:
If 'A is born in C' and 'C is located in B', then 'A is born in B', where A is a member of the class of writers, B is a member of the class of places, and C is a member of the class of cities.
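
To illustrate mechanical parsing, here is a rough Python sketch that splits a rule of this form into its head and body atoms. The names Atom, parse_atoms, and parse_rule are hypothetical. The sketch assumes that relation names contain no parentheses or commas, that every argument has the form <class>_<variable>, and that the rule text is passed without the leading word "rule" (surrounding double quotes, if present, are stripped).

```python
import re
from typing import List, NamedTuple, Tuple


class Atom(NamedTuple):
    relation: str                      # e.g. "be bear in"
    args: Tuple[Tuple[str, str], ...]  # (class, variable) pairs


# A relation name followed by a parenthesised, comma-separated argument list.
_ATOM = re.compile(r"([^(),]+)\(([^()]*)\)")


def parse_atoms(text: str) -> List[Atom]:
    atoms = []
    for relation, arg_text in _ATOM.findall(text):
        # Split "writer_A" into ("writer", "A") at the last underscore.
        args = tuple(
            tuple(arg.strip().rsplit("_", 1)) for arg in arg_text.split(",")
        )
        atoms.append(Atom(relation.strip(), args))
    return atoms


def parse_rule(rule: str) -> Tuple[Atom, List[Atom]]:
    """Return (head, body) for a rule written as 'head :- atom, atom'."""
    head_text, body_text = rule.strip().strip('"').split(":-", 1)
    head = parse_atoms(head_text)
    if len(head) != 1:
        raise ValueError("expected exactly one head atom: %r" % rule)
    return head[0], parse_atoms(body_text)


example = ("be bear in(writer_A, place_B) :- "
           "be bear in(writer_A, city_C), be locate in(city_C, place_B) ")
head, body = parse_rule(example)
```

For the example rule, head holds the relation 'be bear in' with arguments (writer, A) and (place, B), and body holds the two atoms on the right-hand side of ':-'.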

Unfortunately, due to licensing restrictions we are unable to provide the raw extractions.

Each line of the class instances file contains 4 fields separated by tabs:

The stemmed and normalized class name.
The stemmed and normalized instance name of the class.
The number of times this (class, instance) pair was extracted in our corpus.
The total number of times this instance was seen with any class in our corpus (including relatively rare classes, which may not be included in the file).
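
The four fields above can be read as follows. This is a sketch under several assumptions: the names ClassInstance, parse_class_instance, and read_class_instances are hypothetical, the two counts are taken to be integers, the file is taken to be UTF-8 text, and any line starting with '#' is skipped in case the file has a header line. The sample line is made up.

```python
import gzip
from typing import Iterator, NamedTuple


class ClassInstance(NamedTuple):
    class_name: str      # stemmed and normalized class name
    instance: str        # stemmed and normalized instance name
    pair_count: int      # times this (class, instance) pair was extracted
    instance_total: int  # times the instance was seen with any class


def parse_class_instance(line: str) -> ClassInstance:
    class_name, instance, pair_count, instance_total = (
        line.rstrip("\r\n").split("\t")
    )
    return ClassInstance(class_name, instance,
                         int(pair_count), int(instance_total))


def read_class_instances(path: str) -> Iterator[ClassInstance]:
    # Assumption: UTF-8 text; '#' lines (if any) are treated as headers.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                yield parse_class_instance(line)


# Made-up record, for illustration only.
record = parse_class_instance("city\tseattle\t120\t150\n")
fraction = record.pair_count / record.instance_total
```

Dividing the third field by the fourth, as in the last line, gives the fraction of an instance's extractions that fall under the given class. That ratio is one possible use of the counts, not something the files themselves provide.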

This page was last updated March 1, 2011.