
Learning to Classify Text

Target concept $Interesting?: Document \rightarrow \{+, -\}$ (more generally, a mapping from documents to a finite set of target values $V$)

  1. Represent each document by a vector of words: one attribute per word position in the document (see the sketch after this list)
  2. Learning: Use training examples to estimate the required probabilities $P(v_j)$ and $P(doc \mid v_j)$
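
A minimal sketch of the representation in step 1, assuming whitespace tokenization (the notes do not fix a particular tokenizer):

def words(text):
    """Represent a document as a vector of words: one attribute per word
    position, i.e. attribute a_i is the token at position i.
    Whitespace tokenization is an assumed choice here."""
    return text.split()

print(words("our approach to representing arbitrary text"))
# ['our', 'approach', 'to', 'representing', 'arbitrary', 'text']  -> a_1 .. a_6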
Naive Bayes conditional independence assumption


$$
P(doc \mid v_j) = \prod_{i=1}^{\mathrm{length}(doc)} P(a_i = w_k \mid v_j)
$$

where $P(a_i = w_k \mid v_j)$ is the probability that the word in position $i$ is $w_k$, given $v_j$


one more assumption (position independence): $P(a_i = w_k \mid v_j) = P(a_m = w_k \mid v_j), \;\; \forall i, m$
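
With this position-independence assumption, the position of a word no longer matters, so the product over positions collapses to one factor per vocabulary word; writing $n_k$ for the number of times $w_k$ occurs in $doc$:

$$
P(doc \mid v_j)
  = \prod_{i=1}^{\mathrm{length}(doc)} P(a_i = w_k \mid v_j)
  = \prod_{w_k \in Vocabulary} P(w_k \mid v_j)^{\,n_k}
$$

This is why the procedure below estimates a single term $P(w_k \mid v_j)$ per vocabulary word rather than one term per document position.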

LEARN_NAIVE_BAYES_TEXT($Examples$, $V$)

1. collect all words and other tokens that occur in $Examples$

  • $Vocabulary \leftarrow$ all distinct words and other tokens in $Examples$

2. calculate the required $P(v_j)$ and $P(w_k \mid v_j)$ probability terms

For each target value $v_j$ in $V$ do
  • $docs_j \leftarrow$ the subset of $Examples$ for which the target value is $v_j$

  • $P(v_j) \leftarrow \frac{|docs_j|}{|Examples|}$

  • $Text_j \leftarrow$ a single document created by concatenating all members of $docs_j$

  • $n \leftarrow$ total number of words in $Text_j$ (counting duplicate words multiple times)

  • for each word $w_k$ in $Vocabulary$
    • $n_k \leftarrow$ number of times word $w_k$ occurs in $Text_j$

    • $P(w_k \mid v_j) \leftarrow \frac{n_k + 1}{n + |Vocabulary|}$

CLASSIFY_NAIVE_BAYES_TEXT($Doc$)

  • $positions \leftarrow$ all word positions in $Doc$ that contain tokens found in $Vocabulary$

  • Return $v_{NB}$, where
    $v_{NB} = \mathop{\mathrm{argmax}}_{v_j \in V} \; P(v_j) \prod_{i \in positions} P(a_i \mid v_j)$
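
A matching sketch of the classification step, reusing the estimates returned by the learn_naive_bayes_text sketch above; it sums log probabilities instead of taking the product, purely to avoid floating-point underflow on long documents (the argmax is unchanged):

import math

def classify_naive_bayes_text(doc, vocabulary, priors, cond_probs):
    """Return the target value v_NB maximizing P(v_j) * prod_i P(a_i | v_j),
    using only word positions whose tokens appear in vocabulary."""
    positions = [w for w in doc.split() if w in vocabulary]

    best_value, best_score = None, -math.inf
    for v_j, prior in priors.items():
        # log P(v_j) + sum_i log P(a_i | v_j)
        score = math.log(prior)
        for w in positions:
            score += math.log(cond_probs[(w, v_j)])
        if score > best_score:
            best_value, best_score = v_j, score
    return best_value

# Example use with the learn sketch above (hypothetical data):
# vocab, priors, probs = learn_naive_bayes_text(examples, V)
# classify_naive_bayes_text("some new document", vocab, priors, probs)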


Don Patterson 2001-12-14