University of Washington Department of Computer Science & Engineering
UW Mangrove Project

Annotation of Regular Structures

To simplify the task of annotating regular structures (e.g., tables or lists), the Mangrove system provides the reglist macro command. This document describes how to use it.

The reglist command can help annotate any regular collection of items (such as publications, events, or interests). For concreteness, this document describes how to annotate all "events" in a table with a single template statement, called a regular list, or simply a reglist. Using one template instead of annotating each item separately simplifies annotation and makes the annotations easier to maintain when you edit your document.

Assumptions:

  • You already have a web-accessible HTML document, which contains a table (or list) of events that you wish to add to the Department Calendar.
  • The time and location of the events in the example are already annotated outside of the table. (For details on semantic annotation see the annotation example.)
  • Annotating the table then reduces to adding a single reglist macro that describes the structure of one recurring element (e.g., a row in a table).
    This template must appear before the first actual event in the table. For example, if the first row of the table contains the column headings, the template should immediately follow that row; otherwise, it should come right after the <table> tag.
    The position of the closing reglist tag defines the scope of the macro: the semantic structure defined in the template is applied only to the rows enclosed between the opening and closing reglist tags.
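To make the placement and scoping rules concrete, here is a minimal Python sketch of how a tool could locate a reglist template and the rows it governs. It is an illustration only, not the Mangrove parser: the helper name reglist_scope and its regular expressions are hypothetical, and it assumes the comment forms used in the example on this page (<!-- <reglist="..."> --> and <!-- </reglist> -->) with rows written as plain <tr>...</tr>.

```python
import re

# Hypothetical patterns modeled on the comment syntax shown in this document;
# they are not taken from the Mangrove implementation.
OPEN_RE = re.compile(r'<!--\s*<reglist="(.*?)">\s*-->', re.DOTALL)
CLOSE_RE = re.compile(r'<!--\s*</reglist>\s*-->')
ROW_RE = re.compile(r'<tr>.*?</tr>', re.DOTALL)


def reglist_scope(html: str) -> tuple[str, list[str]]:
    """Return the reglist template and the <tr> rows it applies to."""
    open_m = OPEN_RE.search(html)
    if open_m is None:
        raise ValueError("no opening reglist comment found")
    # Look for the closing comment only after the opening one.
    close_m = CLOSE_RE.search(html, open_m.end())
    if close_m is None:
        raise ValueError("no closing reglist comment found")
    # Only rows between the two comments are in scope, so a heading row
    # placed before the template is ignored.
    body = html[open_m.end():close_m.start()]
    return open_m.group(1), ROW_RE.findall(body)
```

Rows before the opening comment (such as the heading row) and rows after the closing comment are not returned, which mirrors the rule that the template applies only to the rows enclosed by the reglist tags.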

    Example

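    The following HTML fragment is annotated with a reglist macro. The relevant code is the opening reglist comment, which holds the template, and the closing <!-- </reglist> --> comment after the last row.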

    <html xmlns:uw="http://www.cs.washington.edu/research/semweb/vocab#v1_0">

    <table>
    <tr>
      <th>Date</th>
      <th>Topic</th>
      <th>Presenter</th>
      <th>Paper</th>
      <th>Additional Notes</th>
    </tr>

    <!-- <reglist="<tr><uw:event>
      <td><uw:date>...</uw:date>
      <td><uw:topic>...</uw:topic>
      <td><uw:presenter>...</uw:presenter>
      <td><uw:paper>...</uw:paper>
      <td>...
    </uw:event></tr>"> -->


    <tr>
      <td>Feb 3, 2003</td>
      <td>Semantic Web</td>
      <td>Luke McDowell</td>
      <td>Evolving the Semantic Web with Mangrove</td>
      <td>cookies provided...</td>
    </tr>

    ...

    <!-- </reglist> -->
    </table>
    </html>


    Syntax of the <reglist> Element

    As you may have noticed, the reglist element has its own syntax, which is loosely similar to a regular expression: it spells out the shape of one recurring item, with placeholders where the actual data goes. Let's look at the details.

    1. The reglist element is enclosed in an HTML comment, whereas ordinary semantic tags are added directly among the other HTML tags. The comment contains no tags other than the reglist element itself.
    2. The other notable difference from semantic tags is that reglist has no “uw:” prefix. This is because reglist is a macro command and is available in all user-defined namespaces.
    3. The value of the reglist tag describes the structure of one recurring element, enhanced with semantic tags.

    Let’s walk through the table example above and examine the actual syntax of the element.
    In this case the reglist element describes the structure of a single table row; the first and last tags of the reglist value (<tr> and </tr> here) delimit that row. The value is the skeleton of a table row (its HTML elements) extended with semantic tags and the special symbol “...” (without the quotes).
    The sample reglist tells the semantic parser to treat each row in this table as a <uw:event> object. The data in the first column is interpreted as the event’s date, the data in the second column as its topic, and so on. The last column has no semantic tags (it contains only ‘<td>...’). This means that the column exists in the table, but either we do not want to annotate its contents or the namespace we are using has no suitable semantic tag for it.
    The symbol “...” is a placeholder for the data present in the actual table.
    The order of a semantic tag and its neighboring HTML tag can be switched; for example, both "<tr><uw:event>" and "<uw:event><tr>" are correct.
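The following minimal Python sketch shows one way a reglist value could be read and applied to a single row: the text before the first <td> names the object (here event), each <td> segment either names a property or is a bare “...”, and the cells of an actual row fill those placeholders in order. This is a hypothetical illustration, not Mangrove's implementation; the function names parse_template and annotate_row, the fixed “uw” prefix, and the dictionary output are assumptions, and it only handles flat rows like the one in the example.

```python
import re
from typing import Optional

PREFIX = "uw"  # assumed namespace prefix, as in the example above


def parse_template(template: str) -> tuple[Optional[str], list[Optional[str]]]:
    """Split a reglist value into the object tag and one property tag per <td>."""
    # Text before the first <td> holds the row-level tags, in either order
    # ("<tr><uw:event>" or "<uw:event><tr>").
    head, *columns = template.split("<td>")
    obj = re.search(rf"<{PREFIX}:(\w+)>", head)
    props: list[Optional[str]] = []
    for col in columns:
        # A column like "<uw:date>...</uw:date>" names a property;
        # a bare "..." column is left unannotated (None).
        m = re.search(rf"<{PREFIX}:(\w+)>\.\.\.</{PREFIX}:\1>", col)
        props.append(m.group(1) if m else None)
    return (obj.group(1) if obj else None), props


def annotate_row(template: str, row: str) -> dict[str, Optional[str]]:
    """Fill the "..." placeholders of the template with one row's cell data."""
    obj, props = parse_template(template)
    cells = re.findall(r"<td>(.*?)</td>", row, re.DOTALL)
    record: dict[str, Optional[str]] = {"type": obj}
    for prop, value in zip(props, cells):
        if prop is not None:  # skip columns the template leaves unannotated
            record[prop] = value.strip()
    return record
```

Applied to the example row, annotate_row would produce a record of type "event" with date, topic, presenter, and paper values, while the "Additional Notes" cell is skipped because its template column carries no semantic tag.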

    Here is an example of a reglist element used for annotation of an HTML list.

    Additional Notes

    Troubleshooting