Information filters should present each user with all information relevant to that user's information needs, and not present any irrelevant information. However, casual users of information, such as browsers of news stories, look for interesting information rather than information relevant to a specific need they can state at that time. How can filters for such users be specified and acquired?
Building models of users' interests is difficult because interests are hard for users to specify, differ from one user to the next, and change constantly. Interests may be related to affect, domain beliefs, information goals, information types, and information characteristics such as quality and complexity. Checking for patterns of keywords is not enough to model interests; semantic and contextual information must also be used.
Can a model of interests be encoded, enabling it to filter information? Mapping available information to users' interests is difficult because of the vocabulary problem[1]. The terms contained in the information available are different from those the user would use to specify his or her interests. Further, users may encounter a conceptualization problem[2], where the concepts used to represent available information are different from the concepts the user has for the domain.
What kinds of regularities may be captured that can predict a user's interests? In a preliminary study of Usenet News readers' interests[3], we found regularities among all readers in the categories they used to describe what a message was about and why they did or did not want to read it. We grouped these categories into five types (see Table 1). Each user drew regularly on the same small set of categories to describe messages, and the sets used by different users overlapped greatly. This indicates that users share a sufficiently common conceptual structure to support a knowledge base of description categories over which individual models of users' interests can be defined. These categories can be determined a priori by a knowledge engineer, or through a dialogue with each potential user, and then encoded into the knowledge base.
We also found some regularities in users' reading behavior, which can be formulated into rules correlating conjunctions of description categories with whether or not the user would read a message. These rules can be used as information filters: messages that a rule predicts the user would read are passed through to the user, and messages that a rule predicts the user would not read are filtered out (see Table 2 for rule examples).
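To make the filtering step concrete, the following is a minimal sketch in Python of rules applied as filters. It assumes each message has already been described by a set of categories, merged here with the user's goals and beliefs into one set of facts. The names `Rule`, `decide`, and `RULES`, and the string encoding of categories, are hypothetical and not taken from the study. The study does not say what should happen when no rule applies, so the sketch returns `None` in that case.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Rule:
    # Categories that must all hold (the conjunction on the rule's left side).
    required: FrozenSet[str]
    # Categories that must not hold (the "not ..." terms of the rule).
    forbidden: FrozenSet[str]
    # True stands for read(msg), False for not read(msg).
    read: bool

    def applies(self, facts: FrozenSet[str]) -> bool:
        return self.required <= facts and not (self.forbidden & facts)


def decide(rules: Iterable[Rule], facts: FrozenSet[str]) -> Optional[bool]:
    """Return the decision of the first applicable rule, or None if none applies."""
    for rule in rules:
        if rule.applies(facts):
            return rule.read
    return None


# Three of the Table 2 examples in the assumed encoding.
RULES = [
    Rule(frozenset({"about(printer)", "want(printer)", "for-sale"}), frozenset(), True),
    Rule(frozenset({"about(SE)"}), frozenset(), False),
    Rule(frozenset({"about(LC)", "short"}), frozenset({"know(LC)"}), True),
]

# A short message about the LC, for a user who does not know the LC.
print(decide(RULES, frozenset({"about(LC)", "short"})))
```

Taking the first applicable rule is a simplification. It is only safe if all rules that apply to a message agree on the decision.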
Many of the rules were shared by more than one user. Such rules could form the basis of stereotypes, which could be used as initial filters for new users.
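One simple way to derive such a stereotype is to keep the rules held by at least two users. The sketch below assumes this; the study does not specify how stereotypes would be formed. The rule encoding, the function `shared_rules`, the `min_users` threshold, and the users `u1` to `u3` with their rule sets are all invented for illustration and are not study data.

```python
from collections import Counter
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical encoding: (conjunction of conditions, read decision).
Rule = Tuple[FrozenSet[str], bool]


def shared_rules(rules_by_user: Dict[str, Set[Rule]], min_users: int = 2) -> Set[Rule]:
    """Collect the rules held by at least `min_users` users."""
    # Each user's rules form a set, so a rule is counted once per user.
    counts = Counter(rule for rules in rules_by_user.values() for rule in rules)
    return {rule for rule, n in counts.items() if n >= min_users}


HELP_RULE: Rule = (frozenset({"help(msg)"}), False)
SE_RULE: Rule = (frozenset({"about(SE,msg)"}), False)
LC_RULE: Rule = (frozenset({"about(LC,msg)", "not know(LC)", "short(msg)"}), True)

# Invented assignment of rules to users.
RULES_BY_USER = {
    "u1": {HELP_RULE, SE_RULE},
    "u2": {HELP_RULE, LC_RULE},
    "u3": {HELP_RULE, SE_RULE},
}

STEREOTYPE = shared_rules(RULES_BY_USER)
```

A new user could start with `STEREOTYPE` as an initial filter and later add or drop rules.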
Preliminary results also show that users employed a small number of goal types and message types in their rules, and that each user's rules were consistent among themselves. There were never two rules where the same conjunctive clause implied that a user would both read and not read a message. Also, if more than one rule applied to a message, all of the rules would imply the same reading decision. This provides a basis for automating rule learning because many machine-learning algorithms require little or no noise in the data set (e.g., no contradictions).
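Both consistency properties can be checked mechanically before a rule set is given to a learning algorithm. The following is a minimal sketch in Python, assuming rules are encoded as a conjunction of condition strings plus a reading decision, and that negated terms such as "not know(LC)" appear as explicit facts in a message description. The encoding and the function names are hypothetical, not the study's.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical encoding: (conditions that must all hold, read decision).
Rule = Tuple[FrozenSet[str], bool]


def contradictory_pairs(rules: List[Rule]) -> List[Tuple[Rule, Rule]]:
    """Pairs of rules with the same conjunction but opposite decisions."""
    return [(a, b) for a, b in combinations(rules, 2) if a[0] == b[0] and a[1] != b[1]]


def conflicting_messages(
    rules: List[Rule], messages: Iterable[FrozenSet[str]]
) -> List[FrozenSet[str]]:
    """Messages on which the applicable rules disagree about reading."""
    conflicts = []
    for facts in messages:
        # Collect the decisions of every rule whose conditions all hold.
        decisions = {read for conditions, read in rules if conditions <= facts}
        if len(decisions) > 1:
            conflicts.append(facts)
    return conflicts


# Three of the Table 2 examples, treated here as one user's rule set.
RULES: List[Rule] = [
    (frozenset({"help(msg)"}), False),
    (frozenset({"about(SE,msg)"}), False),
    (frozenset({"about(LC,msg)", "not know(LC)", "short(msg)"}), True),
]

# Invented message descriptions for illustration.
MESSAGES = [
    frozenset({"help(msg)", "about(SE,msg)"}),
    frozenset({"about(LC,msg)", "not know(LC)", "short(msg)"}),
]
```

On this rule set, both checks return empty lists. Adding an invented rule such as `(frozenset({"help(msg)"}), True)` would be reported by both functions.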
Some of the goals and domain concepts news readers used to describe messages were more abstract and will be more difficult to acquire for the user models. Abstract goals (e.g., look in documentation, ignore impossible configurations) and abstract domain categories (e.g., actually experienced, factual, popular easy, new, informative) are more difficult to find or parse from a message and are harder to acquire from the user because they develop as the user reads messages and cannot be known beforehand. Thus, the abstract categories cannot be used in the initial phase of knowledge acquisition when the user first specifies his or her user model. In the study, news readers used them to describe the message and their reading decision for as many as one-third of the messages. These categories will require more elaborate representation and acquisition techniques than those currently used for the user models.
It is important for information filters to model users' interests, though it is difficult to keep such models accurate. However, the regularities reported here (description categories shared across users, consistent per-user rules, and rules common to several users) suggest ways to make such models, and the information filters based on them, more useful.
Table 1. Description Category Types
Category Type          |Definition             |Example Categories
=======================|=======================|=========================
Domain concepts        |semantic categories    |Mac IIfx
                       |used to describe the   |
                       |subject of a message   |
                       |and whether to read    |
                       |the message or not     |
-----------------------|-----------------------|-------------------------
Goals                  |related to the users'  |want to understand, able
                       |interests              |to help
-----------------------|-----------------------|-------------------------
Message Types          |describe major message |for sale, discussion,
                       |classes                |bug, problem-solution
-----------------------|-----------------------|-------------------------
Message Characteristics|contextual information |length of the message,
                       |about messages         |whether there is an
                       |                       |address listed
-----------------------|-----------------------|-------------------------
Relations              |usually relate goals to|have documentation of
                       |domain concepts        |HFS, want to buy color
                       |                       |monitor

Table 2. Rule Format and Rule Examples
Rule Format
========================================================================
[about(topic,msg)|goal(topic)|msg-type(msg)|
 msg-characteristic([msg|topic])]* -> [read(msg)|not read(msg)]

Rule Examples
========================================================================
help(msg) -> not read(msg)
about(SE, msg) -> not read(msg)
about(printer,msg) and not have(printer) -> not read(msg)
about(printer,msg) and want(printer) and for-sale(msg) -> read(msg)
about(LC, msg) and not know(LC) and short(msg) -> read(msg)

Robert Kass is a senior research engineer at the EDS Center for Advanced Research. His research interests focus on methods for automatically acquiring models of users' beliefs and interests, knowledge-based human-computer interfaces, and computer-supported cooperative work.
kass@cmi.com