Learning to Classify Text
Target concept: Interesting? : Document → {+, −}
- Represent each document by a vector of words
  - one attribute per word position in the document
- Learning: use training examples to estimate
  - P(+)
  - P(−)
  - P(doc | +)
  - P(doc | −)
Naive Bayes conditional independence assumption:

  P(doc | v_j) = Π_{i=1}^{length(doc)} P(a_i = w_k | v_j)

where P(a_i = w_k | v_j) is the probability that the word in position i is w_k, given class v_j.
One more assumption (position independence):

  P(a_i = w_k | v_j) = P(a_m = w_k | v_j)  for all i, m
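To make the two assumptions concrete, consider a hypothetical two-word document "our approach". Conditional independence factors its class-conditional probability into per-position terms, and position independence collapses those into a single per-word table P(w_k | v_j):

```latex
% Conditional independence: the document probability factors by position.
P(\text{``our approach''} \mid v_j)
  = P(a_1 = \text{our} \mid v_j)\, P(a_2 = \text{approach} \mid v_j)

% Position independence: the same word has the same probability
% in every position, so one table P(w_k \mid v_j) per class suffices.
P(a_1 = \text{our} \mid v_j) = P(a_2 = \text{our} \mid v_j) = P(\text{our} \mid v_j)
```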
LEARN_NAIVE_BAYES_TEXT(Examples, V)

1. Collect all words and other tokens that occur in Examples
   - Vocabulary ← all distinct words and other tokens in Examples
2. Calculate the required P(v_j) and P(w_k | v_j) probability terms
   - For each target value v_j in V do
     - docs_j ← subset of Examples for which the target value is v_j
     - P(v_j) ← |docs_j| / |Examples|
     - Text_j ← a single document created by concatenating all members of docs_j
     - n ← total number of words in Text_j (counting duplicate words multiple times)
     - For each word w_k in Vocabulary
       - n_k ← number of times word w_k occurs in Text_j
       - P(w_k | v_j) ← (n_k + 1) / (n + |Vocabulary|)
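The training procedure can be sketched in Python roughly as follows. This is a minimal sketch under two assumptions not fixed by the pseudocode: documents arrive already tokenized as (token list, label) pairs, and the set of target values V is read off the examples rather than passed separately. The final dictionary comprehension is the Laplace-smoothed estimate (n_k + 1) / (n + |Vocabulary|).

```python
from collections import Counter

def learn_naive_bayes_text(examples):
    """Train multinomial naive Bayes on (tokens, label) pairs.

    `examples`: list of (list_of_tokens, target_value) pairs -- a
    hypothetical input format, since the pseudocode leaves
    tokenization open.  Returns (vocabulary, priors, cond_probs).
    """
    # 1. Vocabulary <- all distinct words and other tokens in Examples
    vocabulary = {w for tokens, _ in examples for w in tokens}

    priors, cond_probs = {}, {}
    targets = {v for _, v in examples}      # V, derived from the data
    for v_j in targets:
        # docs_j <- subset of Examples with target value v_j
        docs_j = [tokens for tokens, v in examples if v == v_j]
        priors[v_j] = len(docs_j) / len(examples)          # P(v_j)
        # Text_j <- concatenation of all members of docs_j
        text_j = [w for tokens in docs_j for w in tokens]
        n = len(text_j)                 # total words, duplicates counted
        counts = Counter(text_j)        # n_k for each word w_k
        cond_probs[v_j] = {
            w: (counts[w] + 1) / (n + len(vocabulary))     # Laplace
            for w in vocabulary
        }
    return vocabulary, priors, cond_probs
```

Note that smoothing gives every vocabulary word a nonzero probability in every class, so the estimates for each class sum to one over the vocabulary.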
CLASSIFY_NAIVE_BAYES_TEXT(Doc)

- positions ← all word positions in Doc that contain tokens found in Vocabulary
- Return v_NB, where

  v_NB = argmax_{v_j ∈ V} P(v_j) Π_{i ∈ positions} P(a_i | v_j)
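The classification step can be sketched as follows, assuming the trained model is the (vocabulary, priors, cond_probs) triple from a training routine like the one above. One deviation from the pseudocode: the product of probabilities is replaced by a sum of logs, a standard trick to avoid floating-point underflow on long documents; the argmax is unchanged.

```python
import math

def classify_naive_bayes_text(doc_tokens, vocabulary, priors, cond_probs):
    """Return v_NB, the most probable target value for a token list."""
    # positions <- tokens of Doc that appear in Vocabulary;
    # out-of-vocabulary tokens are simply skipped
    positions = [w for w in doc_tokens if w in vocabulary]
    best, best_score = None, -math.inf
    for v_j, prior in priors.items():
        # log P(v_j) + sum_i log P(a_i | v_j)  (same argmax as the product)
        score = math.log(prior) + sum(
            math.log(cond_probs[v_j][w]) for w in positions
        )
        if score > best_score:
            best, best_score = v_j, score
    return best
```

For example, with a toy two-class model, a document containing only unknown tokens falls back to the class priors, since `positions` is then empty.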
Don Patterson
2001-12-14