To implement the two modes of the note taking software, the system internally learns two structures. To characterize the syntax of user's notes, it learns finite-state machines (FSMs). To generate predictions, it learns decision tree classifiers situated at states within the FSMs. In order to construct a graphical user interface, the system converts a FSM into a set of buttons. This section describes the representation and method for learning FSMs. The next section discusses learning of the embedded classifiers.
Prior to learning a finite-state machine, the user's note must first be converted into a sequence of tokens. Useful tokenizers can be domain independent. However, handcrafted domain-specific tokenizers lead to more useful representations. The generic tokenizer used for the results reported here uses normal punctuation, whitespace, and alpha-numeric character boundaries as token delimiters. For example, our generic tokenizer splits the sample PowerBook note in Example 1 into the following 16 tokens:
:NULL "4096" " K" " PowerBook" " 170" ", 1.4" "MB" " and" " 40" "MB" " Int." " Drives" ", 2400/9600" " Baud" " FAX" " Modem" .
The token :NULL is prepended by the tokenizer. This convention simplifies the code for constructing a FSM.
Deterministic finite-state machines (FSMs) are one candidate approach for describing the syntax of a user's notes because they are well understood and relatively expressive. Moreover, Angluin (1982) and Berwick and Pilato (1987) present a straightforward algorithm for learning a specific subclass of FSMs called k-reversible FSMs. The algorithm is incremental and does not suffer from presentation order effects. Berwick and Pilato define a k-reversible FSM as:
"A regular language is k-reversible, where k is a non-negative integer, if whenever two prefixes whose last k words [tokens] match have a tail in common, then the two prefixes have all tails in common. In other words, a deterministic finite-state automaton (DFA) [FSM] is k-reversible if it is deterministic with lookahead k when its sets of initial and final states are swapped and all of its arcs [transitions] are reversed."
Given a list of tokens, the k-reversible FSM algorithm first constructs a prefix tree, where all token sequences with common k-leaders share a k-length path through the FSM. For example, Figure 4a depicts a simple FSM constructed for a single fabric pattern note. The text of the user's note was converted into a sequence of tokens. Then a transition was created for each token and a sequence of states was created to link them together. One state serves as the initial state, and another indicates the completion of the sequence. For convenience, this latter, terminal state is depicted with a double circle. If the FSM is able to find a transition for each token in the sequence, and it arrives at the terminal state, then the FSM accepts the token sequence as an instance of the language it defines. Figure 4b depicts the same FSM after another path has been added corresponding to a second fabric pattern note (Example 2). Now the FSM will accept either note if expressed as a sequence of tokens. This FSM is a trivial prefix tree because only the first state is shared between the two paths.
![]()
Figure 4 (a): Degenerate finite-state machine after processing a single fabric pattern note.
![]()
Figure 4 (b): Prefix tree finite-state machine after adding a second fabric pattern note (cf. Example 2).
A prefix tree is minimal for observed token sequences, but it may not be general enough for use in prediction. (The prefix tree is, in essence, an expensive method for memorizing token sequences - which is not the desired result.) For the sake of prediction, it is desirable to have a FSM that can accept new, previously unseen combinations of tokens. The prefix tree automaton can be converted into a more general FSM by merging some of its states. A particular method for doing this converts a prefix tree into a k-reversible FSM via Angluin's (1982) algorithm. The algorithm merges states that have similar transitions, and it creates a FSM that accepts all token sequences in the prefix tree, as well as other candidate sequences. Table 1 lists the three rules for deciding when to merge a pair of states in a prefix tree to form a k-reversible FSM. In the special case where k equals zero, all states have a common k-leader, and Rule 2a ensures that there will be only one accepting state.
A k-leader is defined as a path of length k that accepts in the given state. Merge any two states if either of the following is true: 1. Another state transitions to both states on the same token; or (This enforces determinism.) 2. Both states have a common k-leader and a. Both states are accepting states, or b. Both states transition to a common state via the same token.
Table 1: FSM state merging rules from (Angluin, 1982).
Because the rules in Table 1 must be applied to each pair of states in the FSM, and because each time a pair of states is merged the process must be repeated, the asymptotic complexity of the process is , where n is the number of states in the FSM.
Applying these rules to the prefix tree in Figure 4b with k equal to zero results in a FSM depicted in Figure 5a. Notice that the first two states have been merged to make the FSM deterministic (Rule 1). The accepting states have also been merged in compliance with Rule 2a. The resulting FSM has fewer states but is not more general. It only accepts the two token sequences originally seen. Extending this example, Figure 5b illustrates the addition of a third fabric pattern note as a prefix tree path to the FSM. Reapplying the rules results in the FSM shown in Figure 6. The first two states have been merged as before through the action of the determinism Rule 1. Note that a pair of latter states have also been merged because they share a common zero-leader (true of all pairs of states) and because they transition to the common terminal state on the token "dress".
![]()
Figure 5 (a): Finite-state machine after processing two fabric pattern notes and applying state merging rules in Table 1.
![]()
Figure 5 (b): Prefix tree finite-state machine after adding a third fabric pattern note.
![]()
Figure 6: Sample finite-state machine after processing three fabric pattern notes.
Figure 7 depicts a more sophisticated result; it shows a learned zero-reversible FSM for notes about PowerBook computers. This example shows that the model number "100" is never followed by a specification for an internal floppy drive, but that other model numbers are. Any model may have an external floppy drive. Note that there is a single terminal state. Whitespace and punctuation have been eliminated for clarity in the figure.
![]()
Figure 7: Zero-reversible FSM characterizing PowerBook notes (cf. Example 1).
The rules listed in Table 1 are generalization operators that allow the FSM to accept previously unobserved sequences. Whenever two or more states are merged into one, the FSM will accept more sequences than before if the new state is at the tail end of more transitions than one of the previous states and if the new state is at the head end of at least one transition. For example, the state just after State 1 in Figure 7 was merged from several previous states and generalizes memory sizes for PowerBook models. These rules comprise a heuristic bias and may be too conservative. For example, Figure 8 depicts a FSM for notes about fabric patterns. Many of the states prior to the accepting state could be usefully merged, but using only the rules listed in Table 1, many more notes will have to be processed before this happens. If the FSM in Figure 8 were rendered as a button-box interface, it would reflect little of the true structure of the domain of fabric patterns. Table 2 lists specializations of Rules 2a and 2b and an additional pair of rules we developed to make the FSM generalize more readily. Note that the parameter k has been set to zero in Rule 2 and to one in Rule 3. Effectively, two states are merged by Rules 3a or 2b' if they share an incoming or outgoing transition. Rule 3b is a Kleene rule that encourages the FSM to generalize the number of times a token may appear in a sequence. If one state has a transition to another, then merging them will result in a transition that loops from and to the newly merged state. Figure 9 depicts a FSM for notes about fabric patterns learned using all three generalization rules in Table 2. The resulting FSM accurately captures the syntax of the user's fabric pattern notes and correctly indicates the syntactically optional tokens that may appear at the end of note. When rendered as a button-box interface, it clearly depicts the user's syntax (as illustrated later by Figure 12). The added generalization rules may have only marginal effects on the system's ability to accurately predict a completion as the user writes out a note (as Table 14 below indicates). Their purpose is to improve the quality of the custom interface.
![]()
Figure 8: Zero-reversible finite-state machine characterizing fabric pattern notes learned using merging rules listed in Table 1.
Merge any two states if any of the following are true: 1. Another state transitions to both states on the same token; or (This enforces determinism.) 2'. Both states have a common 0-leader and a. Both states are accepting states, or b. Both states transition to a common state via the same token; or 3. Both states have a common 1-leader and a. Both states transition to a common state via any token, or b. One transitions to the other via any token.
Table 2: Extended FSM state merging rules.
![]()
Figure 9: Finite-state machine characterizing fabric pattern notes learned using extended rules in Table 2. Compare to zero-reversible finite-state machine for the same domain in Figure 8.
Cohen (1988) uses an interesting alternative representation for learning a syntactic form. The goal in his work is to guide the generation of proof structures. Intuitively, the representation is a finite-state machine that accepts a tree rather than a sequence, and for this reason it is termed a tree automaton. Like the rules in Tables 1 and 2, tree automatons are generalized by merging states that share similar transitions. Oddly enough, one motivation for using tree automatons is that they are less likely to introduce extraneous loops, the opposite of the problem with the original FSM merging rules in Table 1. It is not clear how to map the sequence of tokens in the user's notes into a tree structure, but the less sequential nature of the tree automaton may help alleviate sequencing problems in rendering the custom user interface (see Section 9, Observations/Limitations).
To use the finite-state machine for prediction, the software needs a strategy for dealing with novel tokens. For example, when the user takes a note about a PowerBook computer with a new memory configuration, the FSM will not have a transition for the first token. If the software is to prompt the user, then it must have a means for deciding where novel tokens lie in a note's syntax - which state to predict from. Without such a mechanism, no meaningful prediction can be generated after novel tokens.
A state may not have a transition for the next token. In general, this is a single symptom with three possible causes: (1) a novel token has been inserted, (2) a suitable token has been omitted and the next token would be accepted by a subsequent state, or (3) a token has been simply replaced by another in the syntax. For example, in the sequence of tokens {:NULL, "12288", "K", "PB"}, "12288" is a novel token, a familiar memory size has been omitted, and "PowerBook" has been replaced by "PB".
An optimal solution would identify the state requiring a minimum number of insertions, omissions, and replacements necessary to parse the new sequence. An efficient, heuristic approximation does a greedy search using a special marker. Each time the marked state in the FSM has a transition for the next token written by the user, the marker is moved forward, and a prediction is generated from that state. When there is no transition for the next token, a greedy search is conducted for some state (including the marked one and those reachable from it) that has a transition for some token (including the next one and those following). If such a state is found, the marker is moved forward to that state, tokens for the transitions of skipped states are assumed omitted, and novel tokens are assumed inserted. If no state past the marker has a transition for any of the remaining tokens, the remaining tokens are assumed to be replacements for the same number of the most likely transitions; the marker is not moved. If the user writes a subsequent token for which some state has a transition, the marker is moved as described above, and the syntax of the user's note is realigned with the learned syntax. Continuing with the simple PowerBook example, the marker is moved to State 1 of the FSM in Figure 7 because the initial state had a transition for the first token :NULL. Because State 1 doesn't have a transition for the next token "12288", a greedy search is conducted to find a nearby state that accepts either "12288", "K", or "PB". The state just before State 2 accepts "K", so the marker is moved to that state. Another greedy search is started to find a state that accepts "PB". Because one cannot be found, the heuristic parsing assumes that it should skip to the next transition. In this case the one labeled "PowerBook". Consequently, the system generates a prediction from State 2 to prompt the user.
If the user decides to take notes about multiple domains, it may be necessary to learn a separate syntax for each domain. For example, a single syntax generalized over both the PowerBook and fabric pattern notes is likely to yield confusing predictions and an unnatural user interface. Maintenance of multiple finite-state machines is an instance of the clustering problem-deciding which notes should be clustered together to share a FSM. As Fisher (1987) discusses, this involves a trade-off between maximizing similarity within a cluster and minimizing similarity between clusters. Without the first criteria, all notes would be put into a single cluster. Without the second criteria, each note would be put into its own cluster.
One obvious approach would be to require the user to prepend each note with a unique token to identify each note's domain. This simplifies the clustering computation. All notes sharing the first token would share a FSM. However, with this scheme, the user would have to remember the identifying token or name for each domain. An interface could provide a pop-up list of all previously used domain identifiers. This is not satisfactory because it requires overhead not needed when taking notes on paper.
An alternative approach doesn't require any extra effort on the part of the user. A new note is grouped with the FSM that skips the fewest of its tokens. This heuristic encourages within cluster similarity because a FSM will accept new token sequences similar to those it summarizes. To inhibit the formation of single-note FSMs, a new FSM is constructed only if all other FSMs skip more than half of the new note's tokens. This is a parametrized solution to encourage between-cluster dissimilarity.