CSE 473 Autumn 1998, Copyright, S. Tanimoto, Univ. of Washington 
Introduction to Artificial Intelligence (Nov 16, 1998)

"Processing Evidence With Bayes' Rule"


 

Motivation: We Need Something Like Modus Ponens that Works With Probabilistic Information

Modus Ponens:
  P -> Q                    ; implication relationship between P and Q (usually a general rule)
  P                              ; the "fact" of P  (usually representing a specific situation)
  ------
  Q                             ; the conclusion (for the specific situation).

Limitation:  This works under the assumption that statements are either TRUE or FALSE.
What can we do if we have uncertainty associated with each statement?
 


 

An Example

Situation: It is flu season and a number of people are coming down with it.  John, a particular person, has just developed a runny nose, and we wonder what the probability is that he has the flu.

Hypothesis H:  John has the flu.
Evidence E: John has a runny nose.

Other available general information...
The probability that somebody has the flu (during this period) is P(H).  This is called the a priori probability.  Let's suppose P(H) = 0.1.

If someone has the flu, then there's high probability that they have a runny nose.  That is,
P(E | H) = 0.7.
However, they can have a runny nose without having the flu.
P(E | ~H) = 0.2.

Given that John has a runny nose, it is more likely that he has the flu than we thought before we knew about the runny nose.  We can compute this probability by using Bayes' rule.

P(H | E) = P(E | H) P(H) / P(E)

where P(E) = P(E | H) P(H) + P(E | ~H) (1 - P(H))

So P(H | E) = (0.7) (0.1) / ((0.7) (0.1) + (0.2) (0.9))
 = 0.07 / (0.07 + 0.18) = 0.07 / 0.25 = 0.28,
almost three times the a priori probability of 0.1.
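To double-check the arithmetic, here is a minimal sketch in Python (the numbers are the ones assumed in this example):

  # Flu example: compute P(H|E) from the prior and the two likelihoods.
  p_h          = 0.1   # P(H), the a priori probability of flu
  p_e_given_h  = 0.7   # P(E|H), runny nose given flu
  p_e_given_nh = 0.2   # P(E|~H), runny nose given no flu

  p_e = p_e_given_h * p_h + p_e_given_nh * (1 - p_h)   # P(E) = 0.25
  p_h_given_e = p_e_given_h * p_h / p_e                # P(H|E) = 0.28
  print(p_h_given_e)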
 
 


 

Bayes' Rule as a Variation of the Formula for Conditional Probability

Let A and B be events in an outcome space Omega.

1. P(A and B) = P(A) * P(B given A)

2. Equivalently (assuming P(A) > 0),  P(B given A) = P(A and B) / P(A)

By symmetry,

3. P(A and B) = P(B) * P(A given B)

Putting 2 and 3 together,

P(B given A) = P(A given B) * P(B) / P(A)

Replace A by E for evidence
Replace B by H for hypothesis

Use "|" for "given"

P(H|E) = P(E|H) P(H) / P(E)             ; Bayes' Rule.

Note that P(E) can be expressed as P(E|H)P(H) + P(E|~H) P(~H)
Here P(~H) = 1 - P(H), so that

P(E) = P(E|H)P(H) + P(E|~H) (1 - P(H))

So we can write Bayes' Rule as

                        P(E|H) P(H)
 P(H|E) = -------------------------------------
            P(E|H) P(H)  +  P(E|~H) (1 - P(H))
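This expanded form translates directly into a small function; a minimal sketch in Python (the function name is mine, not part of the lecture):

  def posterior(p_h, p_e_given_h, p_e_given_nh):
      # Bayes' rule with P(E) expanded: return P(H|E).
      p_e = p_e_given_h * p_h + p_e_given_nh * (1 - p_h)
      return p_e_given_h * p_h / p_e

  # posterior(0.1, 0.7, 0.2) gives 0.28, the flu-example value computed above.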
 


 

Odds Formulation of Bayes' Rule

Odds:

Let A be an event with probability P(A).
Then by the Odds of A, we mean the ratio of P(A) to P(~A).
For example, the odds of a fair coin coming up heads are P(heads) : P(not heads)
 = 0.5 / 0.5 = 1.
By the way, this is equivalent to the popular way of expressing odds, "fifty-fifty,"
which means 50 / 50 = 1.
Let's write O(A) for the odds on A.
The definition is then,  O(A) = P(A) / (1 - P(A))
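Since odds are just a re-expression of probability, converting back and forth is a one-line computation each way; a minimal sketch in Python (the helper names are mine), assuming 0 < P(A) < 1:

  def prob_to_odds(p):
      return p / (1 - p)      # O(A) = P(A) / (1 - P(A))

  def odds_to_prob(o):
      return o / (1 + o)      # inverse: P(A) = O(A) / (1 + O(A))

  # prob_to_odds(0.5) == 1.0, the fair-coin example above.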

Conditional Odds:

O(H|E) =  P(H|E) / (1 - P(H|E))

Now let's write down Bayes' Rule twice: once in the usual way to get P(H|E), and once for the complementary value P(~H|E).

             P(E|H) P(H)
 P(H|E)  =  -------------
                 P(E)

             P(E|~H) P(~H)
 P(~H|E) =  ---------------
                  P(E)

Now let's divide the top equation by the bottom one.  We get...

 P(H|E)       P(E|H)     P(H)
---------  = ---------  ------
 P(~H|E)      P(E|~H)    P(~H)

Let's call the middle term Lambda.  Then we have, by the definition of odds,

O(H|E)  =  Lambda  O(H)

This says that to take into account some evidence E, we update the odds on the hypothesis H by multiplying by some number Lambda.

By learning that E is true, we update the odds on H being true.

If, on the other hand, we were to learn that E is false, then that also should impact the odds we place on H.  A complementary formula turns out to be

O(H|~E) = Lambda' O(H)

where Lambda' is some other number.

The two numbers Lambda and Lambda' characterize the relationship between the evidence E and the hypothesis H for the purpose of updating odds.
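From the derivation above, Lambda is the middle term P(E|H) / P(E|~H); by the same division with ~E in place of E, Lambda' = P(~E|H) / P(~E|~H).  For the flu example, Lambda = 0.7 / 0.2 = 3.5 and Lambda' = 0.3 / 0.8 = 0.375.  A minimal sketch of the odds update in Python (the names are mine):

  def update_odds(prior_odds, lam, lam_prime, e_is_true):
      # Multiply the prior odds on H by Lambda or Lambda', depending on E.
      return prior_odds * (lam if e_is_true else lam_prime)

  # Flu example: O(H) = 0.1 / 0.9.  update_odds(0.1/0.9, 3.5, 0.375, True)
  # is about 0.389, which converts back to P(H|E) = 0.389 / 1.389 = 0.28, as before.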
 
 



 

Probabilistic Inference Networks

In realistic diagnosis problems, there are usually multiple items of evidence and multiple hypotheses.  It is usually insufficient to model the items of interest with a one-level evidence-to-hypothesis network.  Rather, the evidence input to the network affects a set of hypotheses that in turn affect other hypotheses, and so on, creating a multi-level network of probabilistic relationships.

In order to manage the updating of probability or odds values through such a network, we need a way to generalize Bayes' rule so that it can handle not only situations in which E is certainly true or certainly false, but also situations in which E is established only with some probability or odds.
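One common way to handle such uncertain evidence (not spelled out in these notes) is to interpolate between the posterior with E true and the posterior with E false, weighted by the reported probability of E; a minimal sketch in Python under that assumption:

  def posterior_soft(p_h_given_e, p_h_given_not_e, p_e_reported):
      # Weight the two extreme posteriors by how strongly E is believed.
      return p_h_given_e * p_e_reported + p_h_given_not_e * (1 - p_e_reported)

  # If E is certain (p_e_reported = 1) this reduces to P(H|E); if E is
  # certainly false (p_e_reported = 0) it reduces to P(H|~E).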

Important research in probabilistic inference networks was performed in the development of two classic AI programs:  MYCIN and PROSPECTOR.  For MYCIN, E. Shortliffe developed a general framework for probabilistic inference networks in which certainty values are updated according to rules.  For the PROSPECTOR system, Nilsson et al. developed a set of probability-updating techniques that respected the attributes of necessity, sufficiency, and independence of evidence with respect to a hypothesis.
 


 
 

Last modified: November 2, 1998

Steve Tanimoto

tanimoto@cs.washington.edu