
Learning to Classify Text

Target concept $Interesting?: Document \rightarrow \{+, -\}$ (more generally, a mapping from documents to a finite set of target values $V$)

  1. Represent each document by a vector of words: one attribute per word position in the document (see the sketch after this list)
  2. Learning: Use training examples to estimate the required probabilities $P(v_j)$ and $P(doc \mid v_j)$
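
A minimal sketch of the representation in step 1, assuming whitespace tokenization (the notes do not fix a particular tokenizer):

def words(text):
    """Represent a document as a vector of words: one attribute per word
    position, i.e. attribute a_i is the token at position i.
    Whitespace tokenization is an assumed choice here."""
    return text.split()

print(words("our approach to representing arbitrary text"))
# ['our', 'approach', 'to', 'representing', 'arbitrary', 'text']  -> a_1 .. a_6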
Naive Bayes conditional independence assumption


$$
P(doc \mid v_j) = \prod_{i=1}^{\mathrm{length}(doc)} P(a_i = w_k \mid v_j)
$$

where $P(a_i = w_k \mid v_j)$ is the probability that the word in position $i$ is $w_k$, given $v_j$


one more assumption (position independence): $P(a_i = w_k \mid v_j) = P(a_m = w_k \mid v_j), \;\; \forall i, m$
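
With this position-independence assumption, the position of a word no longer matters, so the product over positions collapses to one factor per vocabulary word; writing $n_k$ for the number of times $w_k$ occurs in $doc$:

$$
P(doc \mid v_j)
  = \prod_{i=1}^{\mathrm{length}(doc)} P(a_i = w_k \mid v_j)
  = \prod_{w_k \in Vocabulary} P(w_k \mid v_j)^{\,n_k}
$$

This is why the procedure below estimates a single term $P(w_k \mid v_j)$ per vocabulary word rather than one term per document position.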

LEARN_NAIVE_BAYES_TEXT($Examples$, $V$)

1. collect all words and other tokens that occur in $Examples$

  • $Vocabulary \leftarrow$ all distinct words and other tokens in $Examples$

2. calculate the required $P(v_j)$ and $P(w_k \mid v_j)$ probability terms

For each target value $v_j$ in $V$ do
  • $docs_j \leftarrow$ the subset of $Examples$ for which the target value is $v_j$

  • $P(v_j) \leftarrow \frac{|docs_j|}{|Examples|}$

  • $Text_j \leftarrow$ a single document created by concatenating all members of $docs_j$

  • $n \leftarrow$ total number of words in $Text_j$ (counting duplicate words multiple times)

  • for each word $w_k$ in $Vocabulary$
    • $n_k \leftarrow$ number of times word $w_k$ occurs in $Text_j$

    • $P(w_k \mid v_j) \leftarrow \frac{n_k + 1}{n + |Vocabulary|}$

CLASSIFY_NAIVE_BAYES_TEXT($Doc$)

  • $positions \leftarrow$ all word positions in $Doc$ that contain tokens found in $Vocabulary$

  • Return $v_{NB}$, where
    $v_{NB} = \mathop{\mathrm{argmax}}_{v_j \in V} \; P(v_j) \prod_{i \in positions} P(a_i \mid v_j)$
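
A matching sketch of the classification step, reusing the estimates returned by the learn_naive_bayes_text sketch above; it sums log probabilities instead of taking the product, purely to avoid floating-point underflow on long documents (the argmax is unchanged):

import math

def classify_naive_bayes_text(doc, vocabulary, priors, cond_probs):
    """Return the target value v_NB maximizing P(v_j) * prod_i P(a_i | v_j),
    using only word positions whose tokens appear in vocabulary."""
    positions = [w for w in doc.split() if w in vocabulary]

    best_value, best_score = None, -math.inf
    for v_j, prior in priors.items():
        # log P(v_j) + sum_i log P(a_i | v_j)
        score = math.log(prior)
        for w in positions:
            score += math.log(cond_probs[(w, v_j)])
        if score > best_score:
            best_value, best_score = v_j, score
    return best_value

# Example use with the learn sketch above (hypothetical data):
# vocab, priors, probs = learn_naive_bayes_text(examples, V)
# classify_naive_bayes_text("some new document", vocab, priors, probs)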


Don Patterson 2001-12-14