4. Experiments

5. Design Recommendations

Given these experimental results, how should we configure handwriting recognition, adaptive menus, and predictive fillin in another application (or a redesigned Names++)? For handwritten input, recognition should use dictionaries specific to each type of field: numbers only for numeric fields, and lists of domain terms for text fields.

5.1 Adaptive Menus

For adaptive menus, add a menu to any field that might have repeated values. If we accidentally added an adaptive menu to a field that never had the same value twice, our mistake would be harmless. The user would surely notice that the choices were useless and avoid checking the menu. When the menu was appropriate, the user would save time by choosing common values from it.

How long should each menu be? Long enough to include the most common values but short enough to be checked quickly. To make sure the menu is long enough, study how often a field's values repeat. For Names++, Figures 9a and 9b show frequency histograms of the 20 most common values for each of 8 fields, drawn from the 448 name records used in the experiments. Overlaid on each plot is a line indicating what percentage of field values could be chosen from a menu of a given size. For instance, for the First Name field in Figure 9a, the histogram is almost flat. A menu including only "John" would allow the user to choose a value for that field less than 5% of the time. If a menu included all 20 of the first names shown, the user could choose a value 25% of the time. This field should not have a menu, because a menu long enough to include the most common values would take too long to check. (Also, the Newton computer we used in Section 4 limits menus to 23 choices because of its screen size.) In contrast, for the Company field in Figure 9a, a menu including only "Boeing" would allow the user to choose a value more than 10% of the time. If it included the 20 values shown, the user could choose a value 50% of the time.

Figure 9a: Frequency of values in the First Name, Last Name, Title, and Company fields for the 448 names used in Section 4. Each plot is a histogram of the 20 most common values. Dark lines indicate what percentage of values could be chosen from menus of different sizes. A menu that includes the choices from the top down to a given vertical position on the line would allow the user to choose the percentage of field values indicated by the line's horizontal position at that point.

Figure 9b: Frequency of values in Address, City, State, and Zip Code fields for the 448 names used in Section 4.

Studying the histograms and aiming for menus that include 50% of the field's values, we might re-engineer Names++ to have menus of size 20 for the Company field, size 10 for the City field, and size 5 for the State field. Other fields have very flat histograms and would need large menus to include a high percentage of field values. Recall that Section 4 reports the subject's frustration with the Title field. Only "President" seems to be repeated for this field in the 448 names we used.
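The menu-sizing procedure described above can be sketched in a few lines. This is only an illustration, not the analysis code used for Figures 9a and 9b: the function names menu_coverage and smallest_menu, the sample data, and the choice to return None when no menu within the size cap reaches the target are all assumptions made for the example. The cap of 23 choices and the 50% target come from the text.

```python
from collections import Counter

def menu_coverage(values, max_size=20):
    """coverage[k-1] = fraction of entries whose value is among the k most common.

    `values` is the list of values seen so far for one field (hypothetical input).
    """
    total = len(values)
    coverage, covered = [], 0
    for _, count in Counter(values).most_common(max_size):
        covered += count
        coverage.append(covered / total)
    return coverage

def smallest_menu(values, target=0.5, max_size=23):
    """Smallest menu size whose coverage reaches `target`, or None if the
    field's histogram is too flat for any menu of at most `max_size` choices."""
    for size, fraction in enumerate(menu_coverage(values, max_size), start=1):
        if fraction >= target:
            return size
    return None

# Made-up sample: a skewed field (like State) and a flat one (like First Name).
states = ["WA"] * 6 + ["OR"] * 2 + ["ID", "CA"]
first_names = ["Ann", "Bob", "Cy", "Dee", "Eve", "Flo"]

print(menu_coverage(states))                  # cumulative coverage per menu size
print(smallest_menu(states))                  # a short menu suffices
print(smallest_menu(first_names, max_size=2)) # None: no short menu reaches 50%
```

A field for which smallest_menu returns None, or a size too long to scan quickly, is a candidate for having no menu at all.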

5.2 Predictive Fillin

For predictive fillin, set up fillin for any field that is functionally dependent (Ullman, 1988) on another. A functional dependency is related to the artificial intelligence idea of a determination (Russell, 1989). Intuitively, one field R, for range, functionally depends on another field D, for domain, if, given a value for D, we can compute a unique value for R. If predictive fillin can find a previous entry with the same value for D as the new entry, it copies the previous entry's value for R into the new entry. In Names++, the Company field is the domain and the Address field is the range of a functional dependency.
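The copying step can be sketched as follows. This is a minimal illustration under stated assumptions, not the Names++ implementation: entries are modeled as plain dictionaries, the function name predictive_fill is hypothetical, and the choices to search the most recent entries first and to leave an already filled range field untouched are assumptions of the example.

```python
def predictive_fill(new_entry, previous_entries, domain, range_):
    """Copy the range field from a previous entry with the same domain value.

    Returns a new dictionary; `new_entry` itself is not modified.
    """
    key = new_entry.get(domain)
    # Nothing to match on, or the user already supplied a value: do nothing.
    if not key or new_entry.get(range_):
        return new_entry
    # Assumption: prefer the most recently entered matching record.
    for old in reversed(previous_entries):
        if old.get(domain) == key and old.get(range_):
            return {**new_entry, range_: old[range_]}
    return new_entry

# Made-up records for illustration only.
names = [
    {"Company": "Boeing", "Address": "1 Example Way"},
    {"Company": "Acme", "Address": "2 Sample Street"},
]
entry = predictive_fill({"Company": "Boeing", "Address": ""}, names,
                        domain="Company", range_="Address")
print(entry["Address"])  # the address copied from the earlier Boeing entry
```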

Predictive fillin for all and only functionally dependent fields is probably too strict a strategy. Some functional dependencies are not useful for predictive fillin because their domain values are unique in the database. When this is so, predictive fillin cannot find a matching previous entry and so cannot copy over relevant information. For instance, a US citizen's address is functionally dependent on their Social Security number. In an application like Names++ we don't expect to see the same Social Security number twice, so predictive fillin would never have the opportunity to help the user by filling the address. Functional dependencies with repeated domain values in the database, or dense functional dependencies, should be used to set up predictive fillin.

Conversely, some non-functional dependencies may be close enough to functional to be useful for predictive fillin. Technically, a dependency is not functional unless exactly one value in the range can be computed for every value in the domain. If only a few values in the range were computed for most values in the domain, the dependency might still be useful (Raju & Majumdar, 1988; Russell, 1989; Ziarko, 1992). For instance, most companies have a single office and address, but some may have more than one. It is still quite useful to fill address fields when Names++ finds a previous name with a matching Company field. Other user interface strategies can compensate for the other possible range values when they arise; for instance, Names++ puts alternate addresses into the Address field's adaptive menu. Therefore, dense dependencies that are functional or nearly so, or dense approximately-functional dependencies, should be used to set up predictive fillin.
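One way to screen a candidate dependency against existing data is to score it on both properties. The two scores below are assumptions introduced for this sketch, not definitions from the cited work: "density" is taken to be the fraction of entries whose domain value occurs more than once, and "consistency" the fraction of entries whose range value is the most common one for their domain value. The function name dependency_stats and the sample records are likewise hypothetical.

```python
from collections import Counter, defaultdict

def dependency_stats(entries, domain, range_):
    """Return (density, consistency) for the candidate dependency domain -> range_.

    density:     share of entries whose domain value is repeated in the data
    consistency: share of entries whose range value is the majority value
                 for their domain value (1.0 means strictly functional)
    """
    groups = defaultdict(Counter)  # domain value -> counts of range values
    for entry in entries:
        d, r = entry.get(domain), entry.get(range_)
        if d and r:
            groups[d][r] += 1
    sizes = {d: sum(counts.values()) for d, counts in groups.items()}
    total = sum(sizes.values())
    if total == 0:
        return 0.0, 0.0
    repeated = sum(n for n in sizes.values() if n > 1)
    agreeing = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return repeated / total, agreeing / total

# Made-up records: one company with two addresses, one seen only once.
records = (
    [{"Company": "Boeing", "Address": "1 Example Way"}] * 3
    + [{"Company": "Boeing", "Address": "9 Other Road"}]
    + [{"Company": "Acme", "Address": "2 Sample Street"}]
)
print(dependency_stats(records, "Company", "Address"))
```

Under these assumed scores, a dependency keyed on a unique identifier such as a Social Security number would have consistency 1.0 but density 0.0, which matches the argument that it offers predictive fillin nothing to copy.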

To determine which dense approximately-functional dependencies hold for a new application area, it may be necessary to repeat the type of empirical domain analysis described above for adaptive menus. For Names++, we used common-sense knowledge about people, companies, and addresses to set up predictive fillin. Recall that our goal is to discover whether the end result of automatic learning is worthwhile (e.g., Dent et al., 1992; Hermens & Schlimmer, 1994; Schlimmer & Hermens, 1993; Yoshida, 1994). We recommend considering each field as a number of logical components because dependencies may exist between parts rather than whole fields. For instance, each person in a company may share a common telephone number area code and prefix, but they are likely to have different extensions. By predictively filling in all but the last component of a phone number, Names++ fills as much as it can without adding poor quality information.
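Component-wise fillin for the telephone example might look like the sketch below. It assumes phone numbers are stored as hyphen-separated strings of the form "AAA-PPP-XXXX" and that entries are dictionaries with "Company" and "Phone" keys; the function names split_phone and fill_phone_prefix are hypothetical, and none of this is taken from the Names++ source.

```python
def split_phone(phone):
    """Split a phone number into (shared part, last component).

    Assumes hyphen-separated components; a number without a hyphen
    has no shared part, so the first element is then empty.
    """
    head, sep, last = phone.rpartition("-")
    return head + sep, last

def fill_phone_prefix(new_entry, previous_entries):
    """Fill all but the last phone component from a colleague's entry."""
    company = new_entry.get("Company")
    if not company or new_entry.get("Phone"):
        return new_entry
    for old in reversed(previous_entries):
        if old.get("Company") == company and old.get("Phone"):
            prefix, _ = split_phone(old["Phone"])
            if prefix:
                # The user still writes the final component by hand.
                return {**new_entry, "Phone": prefix}
    return new_entry

# Made-up record for illustration only.
known = [{"Company": "Acme", "Phone": "555-010-1234"}]
print(fill_phone_prefix({"Company": "Acme"}, known)["Phone"])
```

Leaving the final component blank reflects the recommendation above: the fill supplies only the part that is likely to be shared.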

6. Related Work


Jeffrey C. Schlimmer, schlimme@eecs.wsu.edu, 5 December 1994