The experiment used a within-subject design in which each subject participated in each of the six conditions summarized in Table 2. Conditions were designed to assess the contribution of each interface component separately and collectively. In the control condition, Typed, the subject types all values without using any of the components. In the Null condition, the subject writes all words using remedial recognition steps (to be described) and types words only if they are not recognizable. The subject does not add words to Newton's dictionary when asked and does not have the assistance of either adaptive menus or predictive fillin. The D condition extends Null by requiring the subject to add words to Newton's dictionary when asked. The AM condition extends Null by adding adaptive menus. The PF condition extends Null by adding predictive fillin. The All condition combines the extensions of D, AM, and PF.
![]()
Table 2: Experimental conditions, one row per condition. Columns indicate which user interface components were used. Blank cells represent "No".
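As a reading aid, the six conditions described above can be written down as a small data structure. This is only a sketch of the design as stated in the text; the class and field names (`Condition`, `handwriting`, and so on) are illustrative assumptions and not part of Names++ or Newton.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """One experimental condition (field names are illustrative)."""
    name: str
    handwriting: bool        # False: every value is typed
    dictionary: bool         # subject adds words to the dictionary when asked
    adaptive_menus: bool
    predictive_fillin: bool


# Conditions as described in the text: D, AM, and PF each extend Null
# by one component, and All combines the three extensions.
CONDITIONS = [
    Condition("Typed", False, False, False, False),
    Condition("Null", True, False, False, False),
    Condition("D", True, True, False, False),
    Condition("AM", True, False, True, False),
    Condition("PF", True, False, False, True),
    Condition("All", True, True, True, True),
]


def components(condition):
    """Return the names of the interface components active in a condition."""
    flags = {
        "dictionary": condition.dictionary,
        "adaptive menus": condition.adaptive_menus,
        "predictive fillin": condition.predictive_fillin,
    }
    return [name for name, active in flags.items() if active]


for c in CONDITIONS:
    print(c.name, components(c))
```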
We used a pair of Apple Newton MessagePad 100 computers (running Newton OS version 1.3) for the experiment and three versions of the Names++ application. One version has all interface components disabled and was used for the Typed, Null, and D conditions. A second version has adaptive menus and was used for AM. A third version has adaptive menus and predictive fillin and was used for PF and All.
A set of 448 name records for the experiments was donated by a development officer from Washington State University. Her job involves contacting alumni and others to solicit support for university programs. Almost all of the records include a first and last name, a full mailing address, and one to three phone numbers. Few include an honorific, country, or e-mail address. Informal tests indicated that the MessagePads could hold about 250 names after Names++ was installed, so we selected a random set of 200 of these records.
To simulate a worst case for recognition, adaptive menus, and predictive fillin, we chose 5 names (listed below) from the residual 248 such that each name's company was not in the preload set of 200 names. (To preserve anonymity here, first and last names are swapped and phone numbers replaced with artificial values. Actual first and last name pairs and phone numbers were used in the experiment.)
| Name | Title | Company | Address | Phone numbers |
| --- | --- | --- | --- | --- |
| Robert Anderson | Account Marketing Rep | IBM | W 201 N River Drive; Spokane, WA 99201 | 509 555 0000; 509 555 1111 |
| Eric Brice | Director of Engineering | RAIMA Corp | 3245 146th Place SE; Bellevue, WA 98007 | 206 555 2222; 206 555 3333; 205 555 4444 |
| Mike Carlson | VP Engineering & Estimating | General Construction | 2111 N Northgate Way; Suite 305; Seattle, WA 98133 | 206 555 5555; 206 555 6666 |
| Peter Friedman | President | NOVA Information Systems | 12277 134th Court NE; Suite 203; Redmond, WA 98052 | 206 555 7777 |
| Thomas Leland | Staffing Manager | Aldus Corporation | 411 First Ave South; Seattle WA 98104 2871 | 206 555 8888; 206 555 9999 |
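The selection procedure described above (a random preload of 200 records, then test names drawn from the remainder whose company does not appear in the preload) can be sketched as follows. The record layout, the function names, and the synthetic data are assumptions made for illustration; the actual records and selection tooling are not described in the text.

```python
import random


def split_records(records, preload_size=200, seed=0):
    """Randomly split records into a preload set and a residual set."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled[:preload_size], shuffled[preload_size:]


def worst_case_candidates(preload, residual):
    """Residual records whose company never occurs in the preload set."""
    known_companies = {r["company"] for r in preload}
    return [r for r in residual if r["company"] not in known_companies]


# Synthetic stand-in for the 448 donated records (hypothetical values).
records = [
    {"name": f"Person {i}", "company": f"Company {i % 300}"}
    for i in range(448)
]

preload, residual = split_records(records)
candidates = worst_case_candidates(preload, residual)
test_names = candidates[:5]
print(len(preload), len(residual), len(test_names))
```

Because none of the test names' companies are in the preload, neither an adaptive menu nor predictive fillin can offer the company the first time such a name is entered, which is what makes the first entry a worst case.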
To score how words in names were entered and the total time, we used the sheet in Figure 5. Fictitious data corresponding to a subject's entering of the second name in the All condition is also depicted.
![]()
Figure 5: Scoring sheet used each time a name was added. A "1" in the center to right columns indicates how the first word of a field value was entered using recognition (cf. Figure 6), an adaptive menu (cf. Figure 2), or predictive fillin (cf. Figure 3). A "2" indicates the second word, and so on. The highest digit in a row corresponds to the number of words in that field's value.
To facilitate setting up each condition, we constructed backup images of MessagePads correctly configured for each of the six conditions. For all images, the 200 names and the appropriate version of Names++ were installed. To the images for D and All, we added all First, Last, and Company names to Newton's dictionary using a built-in feature of Newton. To initialize the adaptive menus in the images for AM and All, we used a special purpose application. Prior to each use, the MessagePads were completely erased and then restored from the backup image appropriate to the condition to be tested.
The task of the subject was to enter each of the five names twice in each of the six conditions. The first time a name is entered in a condition simulates a worst-case scenario; the second time, a best-case scenario.
Subjects were given a precise script to follow when entering a name. This was done partially to bias results against the hypotheses and partially to minimize individual variation. Specifically, the subject was instructed to enter values for each field in order, from top to bottom, completing one before going to the next (cf. Figure 1). In conditions involving handwriting, if a word was not correctly recognized, the subject was to check the menu of alternate recognitions (depicted in the left panel of Figure 6). If the intended word was not in this list, they were to select "Try letters" which attempts recognition without the dictionary. If the result of this was not correct, they were to check a second menu of alternative recognitions (depicted in the center panel of Figure 6). If the intended word was not in this second menu, they were to tap the button with the keyboard picture, type in the word using the on-screen keyboard, and close the keyboard. If this word was not already part of the dictionary, Newton asked if they would like to add it (depicted in the right panel of Figure 6). Note that for each of the recognition menus, the original handwriting is shown near the bottom. The first choice is Newton's best guess, and the second choice is its best guess with different capitalization. The subject was instructed to ensure that words were correctly capitalized.
![]()
![]()
![]()
Figure 6: Remedial steps when a handwritten word is not correctly recognized. In this example, the subject wrote "Brice", which was misrecognized as "Brian". When the subject double-taps on the word, a menu of alternative recognitions appears (left panel). If none of these is correct, the subject requests recognition without the dictionary (letter by letter). Another double-tap on the word generates a second menu of alternatives (middle panel). If none of these is correct, the subject enters the word by tapping on the buttons of an on-screen keyboard (right panel).
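The scripted sequence of remedial steps can be summarized as a simple decision cascade. The sketch below is not Newton code; the function name, its parameters, and the menu contents in the example are hypothetical, and recognition results are passed in as plain values rather than produced by a recognizer.

```python
def entry_method(intended, first_guess, first_menu, letters_guess, second_menu):
    """Return the scripted step at which the intended word is obtained.

    All arguments are hypothetical stand-ins for what the recognizer shows:
    its first guess, the first menu of alternates, the "Try letters" result,
    and the second menu of alternates.
    """
    if first_guess == intended:
        return "recognized immediately"
    if intended in first_menu:
        return "first menu of alternates"
    # "Try letters": recognition without the dictionary.
    if letters_guess == intended:
        return "letter-by-letter recognition"
    if intended in second_menu:
        return "second menu of alternates"
    # Last resort: on-screen keyboard.
    return "typed"


# "Brice" misrecognized as "Brian", as in Figure 6; menu contents are made up.
step = entry_method(
    intended="Brice",
    first_guess="Brian",
    first_menu=["Brian", "brian", "Bruce"],
    letters_guess="Bricc",
    second_menu=["Bricc", "bricc"],
)
print(step)
```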
For Typed, the subject was instructed to enter all data using Newton's on-screen soft keyboard. For Null, the subject was to enter all data by handwriting. For D and All, the subject was instructed to add any words to Newton's dictionary if asked. For AM and All, the subject was instructed to check a field's menu (if there was one) before writing any data. No special instructions were required for PF beyond the default of not adding words to the dictionary.
A stopwatch was started when a subject tapped the "New" button and stopped when the last field value had been correctly entered. Choosing a manual timing method simplified development of the experimental software. The method by which each word of each field was entered was recorded on a scoring sheet as indicated in Figure 5.
The experiment took between three and five hours for each subject and was spread over two or more sessions of approximately two hours within the same week. Subjects took short breaks after adding each name to minimize fatigue.
After each subject completed the experiment, they were asked to rank the methods for entering names from most to least preferred.
![]()
Table 3: Median time in minutes to add a new name over five names and five subjects (25 samples per cell, standard deviation in parentheses). Columns list six experimental conditions.
The difference within D, AM, and PF across worst and best cases confirms our hypothesis that these interfaces can speed entering names, by 29%, 210%, and 110%, respectively, compared to Null. We were surprised to find that predictive fillin was not as fast as adaptive menus (though the difference is not statistically significant). When designing a data entry system one might be tempted to implement just adaptive menus given their algorithmic simplicity, especially compared to sophisticated methods in machine learning that have been proposed for predictive fillin. However, the latter do not suffer from recency effects imposed by the limited size of adaptive menus; when entering new data related to some in the distant past, predictive fillin would have little difficulty providing assistance where adaptive menus could not. Adaptive menus could be further refined to use a frequency or frequency-recency combination, but the performance of All suggests implementing both adaptive menus and predictive fillin. Combined with adding words to a dictionary, they can speed entering names by 294%. In practical terms, these interfaces could make entering a name into an electronic organizer faster than writing it down on paper and certainly fast enough to capture the information during a phone conversation.
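The recency effect mentioned above can be illustrated with a toy menu. The sketch assumes a fixed-size menu that keeps the most recently entered values, which is our reading of "recency effects" and not a description of the Names++ implementation; the class name, the menu size of three, and the sequence of cities are all hypothetical. A frequency-ordered alternative is shown for contrast.

```python
from collections import Counter


class RecencyMenu:
    """Fixed-size menu holding the most recently entered values (assumed policy)."""

    def __init__(self, size):
        self.size = size
        self.items = []  # most recent first

    def record(self, value):
        if value in self.items:
            self.items.remove(value)
        self.items.insert(0, value)
        del self.items[self.size:]  # drop whatever no longer fits


def frequency_menu(history, size):
    """Alternative policy: the most frequently entered values."""
    return [value for value, _ in Counter(history).most_common(size)]


history = ["Seattle", "Seattle", "Seattle", "Spokane", "Bellevue", "Redmond"]

menu = RecencyMenu(size=3)
for city in history:
    menu.record(city)

print(menu.items)                  # the frequent city has been bumped
print(frequency_menu(history, 3))  # the frequent city is retained
```

Under the recency policy a frequently used value is lost after a few other entries, whereas the frequency policy keeps it; a combined score would trade off the two.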
Prior work confirms the difference between the Typed and D conditions. Ward and Blesser (1986) state that normal writing speed is rarely greater than 69 characters per minute (cpm) for a single line of text. Given that the mean number of characters per name in our experiment is 98.2, our subjects achieved 30 cpm. MacKenzie, Nonnecke, Riddersma, McQueen, and Meltz (1994) compare four interfaces for entering numeric and text data on pen-based computers, including hand printing and using an on-screen keyboard. (The other two interfaces were experimental gesture-based techniques for entering single characters.) For numeric entry conditions, they found that the on-screen keyboard was 30 words per minute (wpm) with 1.2% error whereas hand printing was 18.5 wpm with 10.4% error. For text entry conditions, the keyboard was 23 wpm with 1.1% error whereas hand printing was 16 wpm with 8.1% error. Given that the mean number of words per name in our experiment is 20.8, our subjects achieved 8.3 wpm typing and 6.3 wpm handwriting for mixed numeric/text input. The key point of comparison is that both their study and ours found that using a stylus to tap an on-screen keyboard is faster than handwriting or printing. Differences in speed between these studies and ours are likely a result of differences between experimental procedures (theirs versus ours): single versus multiple field fillin, copying information from memory or screen versus paper, and block or comb-type (letter) versus open (word) interface.
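The rates quoted above follow from dividing the mean size of a name by the time taken to enter it. As a consistency check, the sketch below inverts that relation to recover the entry times implied by the stated rates; it uses only figures from the paragraph above, and the function name is an illustrative assumption.

```python
def implied_minutes(units_per_name, rate_per_minute):
    """Minutes per name implied by a rate: time = units / rate."""
    return units_per_name / rate_per_minute


CHARS_PER_NAME = 98.2
WORDS_PER_NAME = 20.8

from_cpm = implied_minutes(CHARS_PER_NAME, 30)         # about 3.27 minutes
handwriting = implied_minutes(WORDS_PER_NAME, 6.3)     # about 3.30 minutes
typing = implied_minutes(WORDS_PER_NAME, 8.3)          # about 2.51 minutes

print(round(from_cpm, 2), round(handwriting, 2), round(typing, 2))
print(round(CHARS_PER_NAME / WORDS_PER_NAME, 2))       # about 4.72 characters per word
```

The time implied by 30 cpm agrees closely with the time implied by 6.3 wpm, which suggests that the cpm figure describes handwriting.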
Figure 7 presents box plot summaries of the time data. Of interest is the reduction in the variance of time by adaptive menus and predictive fillin in the best case (right plot). Differences in individual performance are reduced by these interface components.
![]()
Figure 7: Box plots of time to enter a name by condition in the worst and best cases. Each box summarizes 25 values. Values outside the inner "fences" are plotted with asterisks. Values outside the outer "fences" are plotted with empty circles (Wilkinson, Hill, Vang, 1992).
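For readers unfamiliar with the fence convention, the following sketch classifies values the way such box plots commonly do: inner fences lie 1.5 times the interquartile spread beyond the quartiles and outer fences 3 times. These multipliers are the usual Tukey convention and are assumed here, not taken from the plotting software cited in the caption; quartiles are used in place of hinges, and the sample times are hypothetical.

```python
import statistics


def classify(values):
    """Label each value as inside the fences, an asterisk, or a circle."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    inner = (q1 - 1.5 * spread, q3 + 1.5 * spread)
    outer = (q1 - 3.0 * spread, q3 + 3.0 * spread)

    labels = []
    for v in values:
        if v < outer[0] or v > outer[1]:
            labels.append((v, "circle"))     # beyond the outer fences
        elif v < inner[0] or v > inner[1]:
            labels.append((v, "asterisk"))   # between inner and outer fences
        else:
            labels.append((v, "inside"))
    return labels


# Hypothetical entry times in minutes, not data from the experiment.
times = [2.0, 2.2, 2.4, 2.5, 2.6, 2.8, 3.0, 4.5, 7.0]
for value, label in classify(times):
    print(value, label)
```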
The left half of Table 4 lists the recognition accuracy for each field over all conditions, subjects, and names. The first row indicates that 94% of the first names written were correctly recognized immediately. By checking the first menu of alternate recognitions, that accuracy rises to 95%. Similarly, the second row indicates that 59% of all second names written were correctly recognized immediately. This rate rose to 74% when letter-by-letter recognition was invoked and again to 79% by checking the second menu of alternate recognitions. Phone numbers had the second highest recognition rate, after first names.
![]()
Table 4: The left columns list cumulative recognition accuracy by field over all words that were written in all conditions, names and subjects. The right columns list percentage of all words by field entered by typing, adaptive menus, and predictive fillin. 5190 values total. Blank cells represent 0.
For reference, Cesar and Shinghal (1990) report a recognition rate of over 90% on hand-printed Canadian postal codes, which have the form {letter, digit, letter, space, digit, letter, digit}. This is comparable to our observed rates for first names, second address lines, and phone numbers.
The right half of Table 4 lists the percentage of words entered using typing, adaptive menus, or predictive fillin by field over all conditions, subjects, and names. The first row indicates that 5% of first names were typed. The row for State indicates that 32% of state names were typed, 20% were chosen from an adaptive menu, and 39% were predictively filled in. (Note that the numbers in each row do not total to 100% because the left half of the table lists percentages for words that were written while the right half lists percentages of all words.)
Combining the left and right halves of Table 4 reveals that many of the difficult-to-recognize fields have considerable assistance from adaptive menus and predictive fillin. This accentuates the speed improvements by providing help where it is most needed. Figure 8 depicts the relationship between the fields, their recognition accuracy, and which have adaptive menus or predictive fillin. Several fields have near-perfect recognition accuracy; they can be recognized without resorting to typing. For instance, numeric fields are easier to recognize; the Phone Number fields were recognized at nearly 90% even though the area code, prefix, and suffix varied from name to name. The First and Last name fields also had high recognition accuracy. All of the first names were in the built-in dictionary. All but two of the last names were, and the others were often recognized letter by letter. Recognition was poorer in the Company and Address fields. Words in full capitals (e.g., "RAIMA") and words with a combination of numbers and letters (e.g., "146th") were difficult to recognize. The low recognition accuracy of the State field is apparently due to an oversight in Newton's dictionary: "WA" is not included, but many other two-letter abbreviations for US states are. To compensate for low accuracy, Names++ includes an adaptive menu and/or predictive fillin for each of the difficult-to-recognize fields.
![]()
Figure 8: Recognition rate as a function of the number of total words entered in all conditions by all subjects for all names. Fields with adaptive menus or predictive fillin (or both) are marked. Note that every field with less than 75% accuracy has either an adaptive menu or predictive fillin (or both).
Table 5 summarizes subjects' preference for condition for entering a name. It lists frequency of ranking over the five subjects. Subjects partitioned conditions into non-overlapping groups of (Typed, Null), (D, AM, PF), and (All). (The authors know of no suitable statistic for asserting these differences.) These results contradict those of MacKenzie et al. (1994), who found that subjects preferred typing to handwriting, mildly for text entry and more strongly for numeric entry. They restricted hand printing input to a block or comb-type interface; this unnaturalness may account for some of the dispreference toward handwriting. Writing with a stylus does have its advantages. As Meyer (1995) points out, keyboards are faster for linear text entry, but a pen input device can be more natural, can handle text and graphic input, and can jump quickly from point to point. Writing with a pen also supports "heads up" writing, allowing the user to visually attend to other aspects of the task at hand. Typing with an on-screen keyboard requires heads down entry.
![]()
Table 5: Subjects' frequency of ranking of preference for different conditions as a means to enter a name. 30 values total. Blank cells represent 0.
One subject experimented with Names++ outside the experimental setting and offered a number of observations. First, the adaptive menus were too short, and sometimes menus would be useless no matter how long they were. She wished that the City and Company fields' menus were longer (especially City). It was frustrating to have one of the common city names for a large metropolitan region bumped from the short list. In contrast, the Title field's menu was rarely useful, and she did not see the point of maintaining it. The principles to be outlined in Section 5 suggest similar revisions.
Second, she found the predictive fillin helpful. Sometimes it filled when she didn't expect it to. She also noted that because predictive fillin copies over many fields, it encourages the user to add a more complete name. This may be an advantage in a harried setting.