Belief Networks I
Lecture 23
(Chapter 15.1-2)
Artificial Intelligence I
Autumn 2001
Henry Kautz

Outline

Conditional independence

Bayesian networks: syntax and semantics

Exact inference

Approximate inference

Independence

Two random variables $A$ and $B$ are (absolutely) independent iff $P(A \mid B) = P(A)$ or $P(A, B) = P(A)\,P(B)$
e.g., $A$ and $B$ are two coin tosses

If $n$ Boolean variables are independent, the full joint is $P(X_1, \ldots, X_n) = \prod_i P(X_i)$,
hence it can be specified by just $n$ numbers

Absolute independence is a very strong requirement, seldom met

Conditional independence

Consider the dentist problem with three random variables: $Toothache$, $Cavity$, $Catch$ (steel probe catches in my tooth)

The full joint distribution has $2^3 - 1 = 7$ independent entries

If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache:
(1) $P(Catch \mid Toothache, Cavity) = P(Catch \mid Cavity)$
i.e., $Catch$ is conditionally independent of $Toothache$ given $Cavity$

The same independence holds if I haven't got a cavity:
(2) $P(Catch \mid Toothache, \lnot Cavity) = P(Catch \mid \lnot Cavity)$

Conditional independence contd.

Product rule:

$P(Toothache, Catch, Cavity) = P(Catch \mid Toothache, Cavity)\,P(Toothache \mid Cavity)\,P(Cavity)$

Independence:

$P(Toothache, Catch, Cavity) = P(Catch \mid Cavity)\,P(Toothache \mid Cavity)\,P(Cavity)$

The full joint distribution now requires only 5 independent numbers (instead of 7): one for $P(Cavity)$ and two each for $P(Toothache \mid Cavity)$ and $P(Catch \mid Cavity)$
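A small numerical check of this factorization. The five numbers are made up for illustration (the slides give none); the sketch builds the full joint from them and confirms that equations (1) and (2) hold:

```python
from itertools import product

# Five illustrative numbers (assumed, not from the slides).
P_CAVITY = 0.2
P_TOOTHACHE_GIVEN = {True: 0.6, False: 0.1}  # P(toothache | Cavity)
P_CATCH_GIVEN = {True: 0.9, False: 0.2}      # P(catch | Cavity)


def bern(p, value):
    """P(X = value) for a Boolean X with P(X = true) = p."""
    return p if value else 1 - p


# P(Toothache, Catch, Cavity) = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)
joint = {
    (t, c, cav): (bern(P_TOOTHACHE_GIVEN[cav], t)
                  * bern(P_CATCH_GIVEN[cav], c)
                  * bern(P_CAVITY, cav))
    for t, c, cav in product([True, False], repeat=3)
}


def prob(pred):
    """Sum the joint over all worlds (toothache, catch, cavity) satisfying pred."""
    return sum(p for world, p in joint.items() if pred(*world))


for cav in (True, False):
    lhs = prob(lambda t, c, v: c and t and v == cav) / prob(lambda t, c, v: t and v == cav)
    rhs = prob(lambda t, c, v: c and v == cav) / prob(lambda t, c, v: v == cav)
    # lhs = P(catch | toothache, cav), rhs = P(catch | cav): they agree
    print(cav, round(lhs, 6), round(rhs, 6))
```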

Belief networks

A simple, graphical notation for conditional independence assertions
and hence for compact specification of full joint distributions

Syntax:
a set of nodes, one per variable
a directed, acyclic graph (link ≈ "directly influences")
a conditional distribution for each node given its parents: $P(X_i \mid Parents(X_i))$

In the simplest case, the conditional distribution is represented as
a conditional probability table (CPT)

Example

I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes the alarm is set off by minor earthquakes. Is there a burglar?

Variables: $Burglar$, $Earthquake$, $Alarm$, $JohnCalls$, $MaryCalls$
Network topology reflects "causal" knowledge:

[Figure: burglary network. Burglar → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls]

Note: $\le k$ parents $\Rightarrow O(d^k n)$ numbers vs. $O(d^n)$ (for $n$ variables with $d$ values each)
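A rough illustration of the note above, assuming every variable has $d$ values and every node has at most $k$ parents (both helper names are hypothetical):

```python
def bn_parameter_bound(n, k, d=2):
    """At most (d - 1) * d**k CPT numbers per node: n nodes, each with <= k parents."""
    return n * (d - 1) * d ** k


def full_joint_parameters(n, d=2):
    """Independent numbers in an unrestricted joint over n variables with d values each."""
    return d ** n - 1


for n in (10, 20, 30):
    # Boolean variables (d = 2), at most k = 5 parents per node
    print(n, bn_parameter_bound(n, k=5), full_joint_parameters(n))
```

For $n = 30$ Boolean variables with at most 5 parents each, the bound is 960 numbers versus $2^{30} - 1 \approx 10^9$ for the full joint.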

Semantics

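"Global" semantics defines the full joint distribution as
the product of the local conditional distributions:

$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i))$

e.g., abbreviating each variable by its initial, $P(J \land M \land A \land \lnot B \land \lnot E)$ is given by $P(\lnot B)\,P(\lnot E)\,P(A \mid \lnot B \land \lnot E)\,P(J \mid A)\,P(M \mid A)$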
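The sketch below evaluates the global semantics numerically. The CPT values are assumptions: they are the numbers commonly used for this example in Russell and Norvig's textbook and do not appear in these slides. It also answers the opening question (John calls, Mary does not) by summing the joint over the unobserved $Earthquake$ and $Alarm$, i.e. exact inference by enumeration:

```python
from itertools import product

# Assumed CPTs: each entry is P(node = true | parent values).
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}  # P(j | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(m | Alarm)


def bern(p, value):
    """P(X = value) for a Boolean X with P(X = true) = p."""
    return p if value else 1 - p


def joint(b, e, a, j, m):
    """Global semantics: product of each node's CPT entry given its parents."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))


# P(j, m, a, not b, not e) = P(not b) P(not e) P(a | not b, not e) P(j | a) P(m | a)
print(joint(False, False, True, True, True))  # about 0.000628


def query_burglar(j, m):
    """P(Burglar = true | JohnCalls = j, MaryCalls = m), summing out Earthquake and Alarm."""
    unnorm = {b: sum(joint(b, e, a, j, m) for e, a in product([True, False], repeat=2))
              for b in (True, False)}
    return unnorm[True] / (unnorm[True] + unnorm[False])


print(round(query_burglar(j=True, m=False), 4))
```

With these assumed numbers the posterior probability of a burglary comes out at roughly 0.5%.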

"Local" semantics: each node is conditionally independent
of its nondescendants given its parents

Theorem: Local semantics $\Leftrightarrow$ global semantics

Markov blanket

Each node is conditionally independent of all others given its
Markov blanket: parents + children + children's parents

[Figure: a node's Markov blanket]
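A direct reading of the definition as code, assuming the network is stored as a hypothetical mapping from each node to its list of parents (here the burglary topology from the example):

```python
def markov_blanket(node, parents):
    """Parents + children + children's other parents of `node`.

    `parents` maps every node of the DAG to the list of its parents.
    """
    children = [x for x, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for child in children:
        blanket |= set(parents[child])  # children's parents
    blanket.discard(node)  # a node is not in its own blanket
    return blanket


PARENTS = {
    "Burglar": [],
    "Earthquake": [],
    "Alarm": ["Burglar", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

print(sorted(markov_blanket("Burglar", PARENTS)))  # ['Alarm', 'Earthquake']
print(sorted(markov_blanket("Alarm", PARENTS)))    # ['Burglar', 'Earthquake', 'JohnCalls', 'MaryCalls']
```

Note that $Earthquake$ enters the blanket of $Burglar$ only as a co-parent of their common child $Alarm$.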

Constructing belief networks

Need a method such that a series of locally testable assertions of
conditional independence guarantees the required global semantics

1. Choose an ordering of variables $X_1, \ldots, X_n$
2. For $i = 1$ to $n$:
   add $X_i$ to the network
   select parents from $X_1, \ldots, X_{i-1}$ such that $P(X_i \mid Parents(X_i)) = P(X_i \mid X_1, \ldots, X_{i-1})$

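This choice of parents guarantees the global semantics:
$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1})$ (chain rule)
$= \prod_{i=1}^{n} P(X_i \mid Parents(X_i))$ (by construction)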
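The construction procedure can be sketched as below. This is an illustrative sketch, not the lecture's method: it assumes the full joint is available as an oracle (built from assumed textbook CPT values for the burglary network, which are not in these slides) and tests each candidate parent set numerically, choosing a smallest set that works. In practice the independence judgments come from domain knowledge rather than from a known joint.

```python
from itertools import combinations, product

# Assumed CPT values (illustrative; not given in these slides).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}
VARS = ["B", "E", "A", "J", "M"]


def bern(p, v):
    """P(X = v) for a Boolean X with P(X = true) = p."""
    return p if v else 1 - p


# The full joint, used as an oracle for conditional-independence questions.
WORLDS = []
for vals in product([True, False], repeat=len(VARS)):
    w = dict(zip(VARS, vals))
    p = (bern(P_B, w["B"]) * bern(P_E, w["E"]) * bern(P_A[(w["B"], w["E"])], w["A"])
         * bern(P_J[w["A"]], w["J"]) * bern(P_M[w["A"]], w["M"]))
    WORLDS.append((w, p))


def cond(x, given, assignment):
    """P(x = true | each variable in `given` takes its value from `assignment`)."""
    num = den = 0.0
    for w, p in WORLDS:
        if all(w[g] == assignment[g] for g in given):
            den += p
            if w[x]:
                num += p
    return num / den


def same_conditional(x, subset, preds, tol=1e-9):
    """True if P(x | subset) equals P(x | preds) for every assignment to preds."""
    for vals in product([True, False], repeat=len(preds)):
        a = dict(zip(preds, vals))
        if abs(cond(x, subset, a) - cond(x, preds, a)) > tol:
            return False
    return True


def build_network(ordering):
    """Add variables in order; give each a smallest parent set from its predecessors
    satisfying P(X_i | Parents(X_i)) = P(X_i | X_1, ..., X_{i-1})."""
    parents = {}
    for i, x in enumerate(ordering):
        preds = ordering[:i]
        for size in range(len(preds) + 1):
            match = next((list(s) for s in combinations(preds, size)
                          if same_conditional(x, s, preds)), None)
            if match is not None:
                parents[x] = match
                break
    return parents


print(build_network(["M", "J", "A", "B", "E"]))
```

Running it on the causal ordering $B, E, A, J, M$ instead recovers the original topology.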

Example

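Suppose we choose the ordering $M$, $J$, $A$, $B$, $E$

[Figure: network constructed for the ordering M, J, A, B, E]

$P(J \mid M) = P(J)$?    No
$P(A \mid J, M) = P(A \mid J)$? $P(A \mid J, M) = P(A)$?    No
$P(B \mid A, J, M) = P(B \mid A)$?    Yes
$P(B \mid A, J, M) = P(B)$?    No
$P(E \mid B, A, J, M) = P(E \mid A)$?    No
$P(E \mid B, A, J, M) = P(E \mid A, B)$?    Yes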
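Counting CPT entries for the parent sets implied by these answers (a sketch assuming all variables are Boolean; the causal parent sets assume the standard burglary topology) shows the cost of the noncausal ordering:

```python
def boolean_cpt_parameters(parents):
    """Independent CPT numbers when all variables are Boolean: 2 ** (number of parents) per node."""
    return sum(2 ** len(ps) for ps in parents.values())


# Parent sets implied by the answers above for the ordering M, J, A, B, E.
noncausal = {"M": [], "J": ["M"], "A": ["J", "M"], "B": ["A"], "E": ["A", "B"]}
# Parent sets for the causal ordering B, E, A, J, M.
causal = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}

print(boolean_cpt_parameters(noncausal))  # 1 + 2 + 4 + 2 + 4 = 13
print(boolean_cpt_parameters(causal))     # 1 + 1 + 4 + 2 + 2 = 10
```

Both networks represent the same joint, but the noncausal one needs 13 numbers instead of 10.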

Example: Car diagnosis

Initial evidence: engine won't start
Testable variables (thin ovals), diagnosis variables (thick ovals)
Hidden variables (shaded) ensure sparse structure, reduce parameters

[Figure: car diagnosis network]

Example: Car insurance

Predict claim costs (medical, liability, property)
given data on the application form (other unshaded nodes)

[Figure: car insurance network]

Compact conditional distributions

CPT grows exponentially with the number of parents
CPT becomes infinite with a continuous-valued parent or child

Solution: canonical distributions that are defined compactly

Deterministic nodes are the simplest case: $X = f(Parents(X))$ for some function $f$

E.g., Boolean functions: $NorthAmerican \Leftrightarrow Canadian \lor US \lor Mexican$

E.g., numerical relationships among continuous variables

$\frac{\partial Level}{\partial t} = inflow + precipitation - outflow - evaporation$
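A deterministic node still fits the CPT framework: every entry is 0 or 1. A minimal sketch for the Boolean example (the helper `deterministic_cpt` is hypothetical):

```python
from itertools import product


def deterministic_cpt(f, n_parents):
    """CPT of a deterministic Boolean node X = f(parents): every entry is 0 or 1."""
    return {vals: 1.0 if f(*vals) else 0.0
            for vals in product([True, False], repeat=n_parents)}


# NorthAmerican <=> Canadian or US or Mexican (parent order: Canadian, US, Mexican)
north_american = deterministic_cpt(lambda canadian, us, mexican: canadian or us or mexican, 3)

for vals, p in north_american.items():
    print(vals, p)  # only (False, False, False) gives 0.0
```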

Compact conditional distributions contd.

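Noisy-OR distributions model multiple noninteracting causes:
1) Parents $U_1, \ldots, U_k$ include all causes (can add a leak node)
2) Independent failure probability $q_i$ for each cause alone
$\Rightarrow P(X \mid U_1, \ldots, U_j, \lnot U_{j+1}, \ldots, \lnot U_k) = 1 - \prod_{i=1}^{j} q_i$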

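$$
\begin{array}{ccc|cc}
U_1 & U_2 & U_3 & P(X) & P(\lnot X) \\ \hline
F & F & F & 0 & 1 \\
F & F & T & 1 - q_3 & q_3 \\
F & T & F & 1 - q_2 & q_2 \\
F & T & T & 1 - q_2 q_3 & q_2 q_3 \\
T & F & F & 1 - q_1 & q_1 \\
T & F & T & 1 - q_1 q_3 & q_1 q_3 \\
T & T & F & 1 - q_1 q_2 & q_1 q_2 \\
T & T & T & 1 - q_1 q_2 q_3 & q_1 q_2 q_3
\end{array}
$$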

Number of parameters linear in number of parents
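The noisy-OR rule can be expanded into a full CPT from just the $k$ failure probabilities. The $q_i$ values below are illustrative assumptions, and the sketch has no leak term, so $P(X) = 0$ when no cause is present:

```python
from itertools import product
from math import prod


def noisy_or_cpt(q):
    """Full CPT P(X = true | U_1..U_k) from k failure probabilities q_i (no leak).

    P(X = false | parents) is the product of q_i over the causes that are present.
    """
    return {us: 1 - prod(qi for qi, u in zip(q, us) if u)
            for us in product([False, True], repeat=len(q))}


q = [0.6, 0.2, 0.1]  # illustrative failure probabilities (assumed)
cpt = noisy_or_cpt(q)
for us, p in cpt.items():
    print(us, round(p, 3))
```

Here 3 numbers generate all $2^3 = 8$ rows; a general CPT for the same node would need 8 independent numbers.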

Naive Bayes

Very simple but surprisingly useful model: all findings are conditionally independent given the cause:
$P(Cause, Effect_1, \ldots, Effect_n) = P(Cause) \prod_i P(Effect_i \mid Cause)$

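Therefore the cause $c$ that maximizes $P(c \mid e_1, \ldots, e_n)$ is just the one that maximizes $P(c) \prod_i P(e_i \mid c)$, since the normalizing term $P(e_1, \ldots, e_n)$ is the same for every $c$!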
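A minimal naive Bayes diagnosis sketch. The diseases, findings and probabilities are hypothetical. Summing logarithms ranks causes the same way as the product $P(c) \prod_i P(e_i \mid c)$ but avoids underflow when there are many findings:

```python
import math


def naive_bayes_map(prior, likelihood, findings):
    """Return the cause c maximizing P(c) * prod_i P(e_i | c).

    prior:      {cause: P(cause)}
    likelihood: {cause: {finding: P(finding = true | cause)}}
    findings:   {finding: observed bool}
    """
    def score(c):
        s = math.log(prior[c])
        for f, observed in findings.items():
            p = likelihood[c][f]
            s += math.log(p if observed else 1 - p)
        return s

    return max(prior, key=score)


# Hypothetical two-disease example (all numbers assumed for illustration).
prior = {"flu": 0.1, "cold": 0.9}
likelihood = {
    "flu": {"fever": 0.9, "cough": 0.8},
    "cold": {"fever": 0.1, "cough": 0.7},
}
print(naive_bayes_map(prior, likelihood, {"fever": True, "cough": True}))
```

With fever and cough observed, the scores are $0.1 \cdot 0.9 \cdot 0.8 = 0.072$ for flu and $0.9 \cdot 0.1 \cdot 0.7 = 0.063$ for cold, so the less likely disease a priori wins on the evidence.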

Pathfinder: first BN medical diagnosis system. Its naive Bayes version outperformed doctors! The full BN version saved 1 life in 1000.
1) Better at incorporating the prior probabilities of different diseases
2) Uses all the evidence, whereas humans focus on only 7-9 pieces

CPCS (internal diseases): 448 nodes, 906 edges