Project Description

Activities and events in our lives are structured, be it a vacation, a camping trip, or a wedding. While individual details vary, each of these scenarios has its own characteristic patterns. For example, a wedding typically consists of a sequence of events such as "walking down the aisle", "exchanging vows", and "dancing". This work focuses on learning hierarchical and temporal event knowledge from a large collection of photo albums that depict common scenarios. Hierarchical knowledge identifies which events make up a scenario: in the example above, "walking down the aisle", "exchanging vows", and "dancing" are the events of a wedding. Temporal knowledge captures whether the events follow an order that is fundamental to the scenario. In a wedding, walking down the aisle generally must happen before the exchange of vows, and this ordering is crucial to understanding the scenario. Conversely, on a trip to Paris or New York City, events such as climbing the Eiffel Tower or visiting the Louvre may follow a much less rigid temporal structure: either can generally be done first without changing the nature of the scenario. Check out our publication below for details on our approach to learning this knowledge.


For those interested in this line of research, we have made the dataset publicly available here.

A second version of the dataset, with some of the noisier albums manually removed, is forthcoming.