Title: Semi-Supervised Event Extraction with Paraphrase Clusters
Advisors: Hannaneh Hajishirzi and Dan Weld
Abstract: Supervised event extraction systems are limited in their accuracy due to the lack of available training data. We present a method for self-training event extraction systems by taking advantage of the occurrence of multiple mentions of the same event instances across newswire articles from various sources. If our system can make a high-confidence extraction of some mentions in a cluster of news articles, it can then leverage those mentions to identify additional occurrences of the event in the cluster. This allows us to collect a large amount of new, diverse training data for a specific event. Using this method, our experiments show an increase of 1.7 F1 points for a near-state-of-the-art baseline extractor on the ACE 2005 dataset.