Belief Networks I (Lecture 23)
(Chapter 15.1-2)
Artificial Intelligence I
Autumn 2001
Henry Kautz
Outline
Conditional independence
Bayesian networks: syntax and semantics
Exact inference
Approximate inference
Independence
Two random variables $A$ and $B$ are (absolutely) independent iff
$P(A, B) = P(A)\,P(B)$
or
$P(A \mid B) = P(A)$
e.g., $A$ and $B$ are two coin tosses
If $n$ Boolean variables are independent, the full joint is
$P(X_1, \ldots, X_n) = \prod_i P(X_i)$
hence can be specified by just $n$ numbers
Absolute independence is a very strong requirement, seldom met
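A runnable sketch of absolute independence (the coin probabilities are illustrative, not from the slides): for two independent coin tosses $A$ and $B$, the full joint is recovered from the marginals alone, so $n$ independent Boolean variables need only $n$ numbers instead of $2^n - 1$.

```python
from itertools import product

p_a = {True: 0.5, False: 0.5}   # P(A), one number suffices
p_b = {True: 0.5, False: 0.5}   # P(B)

# Full joint recovered from the marginals: P(A, B) = P(A) * P(B)
joint = {(a, b): p_a[a] * p_b[b] for a, b in product([True, False], repeat=2)}

assert abs(sum(joint.values()) - 1.0) < 1e-12   # a proper distribution
assert joint[(True, False)] == p_a[True] * p_b[False]
```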
Conditional independence
Consider the dentist problem with three random variables: $Toothache$, $Cavity$, $Catch$ (steel probe catches in my tooth)
The full joint distribution has $2^3 - 1 = 7$ independent entries
If I have a cavity, the probability that the probe catches in it
doesn't depend on whether I have a toothache:
(1) $P(Catch \mid Toothache, Cavity) = P(Catch \mid Cavity)$
i.e., $Catch$ is conditionally independent of $Toothache$ given $Cavity$
The same independence holds if I haven't got a cavity:
(2) $P(Catch \mid Toothache, \lnot Cavity) = P(Catch \mid \lnot Cavity)$
Conditional independence contd.
Product rule:
$P(Toothache, Catch, Cavity) = P(Toothache \mid Catch, Cavity)\,P(Catch \mid Cavity)\,P(Cavity)$
Conditional independence:
$= P(Toothache \mid Cavity)\,P(Catch \mid Cavity)\,P(Cavity)$
The full joint distribution now requires only 5 independent numbers (instead of 7)
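The 5-number bookkeeping can be made concrete (CPT values here are assumed for illustration): $P(Cavity)$ is 1 number, and $P(Toothache \mid Cavity)$ and $P(Catch \mid Cavity)$ are 2 numbers each, yet they determine all 8 joint entries.

```python
from itertools import product

p_cavity = 0.2                            # 1 number
p_toothache = {True: 0.6, False: 0.1}     # 2 numbers: P(toothache | cavity), P(toothache | ¬cavity)
p_catch = {True: 0.9, False: 0.2}         # 2 numbers: P(catch | cavity), P(catch | ¬cavity)

def joint(toothache, catch, cavity):
    """P(t, c, cav) = P(t | cav) P(c | cav) P(cav) by conditional independence."""
    p = p_cavity if cavity else 1 - p_cavity
    p *= p_toothache[cavity] if toothache else 1 - p_toothache[cavity]
    p *= p_catch[cavity] if catch else 1 - p_catch[cavity]
    return p

# The 5 numbers induce a proper distribution over all 8 worlds
total = sum(joint(t, c, v) for t, c, v in product([True, False], repeat=3))
assert abs(total - 1.0) < 1e-12
```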
Belief networks
A simple, graphical notation for conditional independence assertions
and hence for compact specification of full joint distributions
Syntax:
a set of nodes, one per variable
a directed, acyclic graph (link $\approx$ ``directly influences'')
a conditional distribution for each node given its parents: $P(X_i \mid Parents(X_i))$
In the simplest case, conditional distribution represented as
a conditional probability table (CPT)
Example
I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?
Variables: $Burglary$, $Earthquake$, $Alarm$, $JohnCalls$, $MaryCalls$
Network topology reflects ``causal'' knowledge:
[Figure: burglary network topology with CPTs]
Note: if each of the $n$ variables has at most $k$ parents, the network requires $O(n \cdot 2^k)$ numbers vs. $O(2^n)$ for the full joint
Semantics
``Global'' semantics defines the full joint distribution as
the product of the local conditional distributions:
$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i))$
``Local'' semantics: each node is conditionally independent
of its nondescendants given its parents
Theorem: Local semantics $\Leftrightarrow$ global semantics
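The global semantics can be exercised directly. The CPT values below are the course textbook's burglary-network numbers (the figure carrying them did not survive conversion, so treat them as assumed here); any entry of the full joint is a product of one CPT entry per node.

```python
# Textbook burglary-network CPTs (assumed values; figure lost in conversion)
P_B = 0.001                      # P(Burglary)
P_E = 0.002                      # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,    # P(Alarm | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}  # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(MaryCalls | Alarm)

def joint(b, e, a, j, m):
    """P(b, e, a, j, m) = product of each node's CPT entry given its parents."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# P(j, m, a, ¬b, ¬e) = 0.90 * 0.70 * 0.001 * 0.999 * 0.998 ≈ 0.00063
print(joint(False, False, True, True, True))
```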
Markov blanket
Each node is conditionally independent of all others given its
Markov blanket: parents + children + children's parents
[Figure: Markov blanket of a node]
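Reading off a Markov blanket is purely structural. A minimal sketch over the burglary network's topology (parent lists only, no CPTs needed):

```python
# Network structure as a child -> parents map (burglary example)
parents = {"Burglary": [], "Earthquake": [],
           "Alarm": ["Burglary", "Earthquake"],
           "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"]}

def markov_blanket(node):
    """Parents + children + children's other parents, excluding the node itself."""
    children = [c for c, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for c in children:
        blanket |= set(parents[c])   # children's other parents
    blanket.discard(node)
    return blanket

print(markov_blanket("Alarm"))
```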
Constructing belief networks
Need a method such that a series of locally testable assertions of
conditional independence guarantees the required global semantics
1. Choose an ordering of variables
$X_1, \ldots, X_n$
2. For $i$ = 1 to $n$
add $X_i$ to the network
select parents from
$X_1, \ldots, X_{i-1}$ such that
$P(X_i \mid Parents(X_i)) = P(X_i \mid X_1, \ldots, X_{i-1})$
This choice of parents guarantees the global semantics:
$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1})$ (chain rule)
$= \prod_{i=1}^{n} P(X_i \mid Parents(X_i))$ (by construction)
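The construction loop can be sketched by brute force: answer each conditional-independence query against a small full joint over (Cavity, Toothache, Catch), and pick the smallest predecessor set that shields each variable. The CPT numbers and the enumeration-based test are illustrative assumptions, not part of the slides.

```python
from itertools import combinations, product

order = ["Cavity", "Toothache", "Catch"]   # chosen variable ordering

def joint(cav, t, c):
    """Illustrative full joint with Catch ⟂ Toothache | Cavity."""
    p = 0.2 if cav else 0.8
    p *= (0.6 if t else 0.4) if cav else (0.1 if t else 0.9)
    p *= (0.9 if c else 0.1) if cav else (0.2 if c else 0.8)
    return p

worlds = list(product([True, False], repeat=3))

def cond(i, given):
    """P(X_i = True | the assignment 'given' to predecessor indices)."""
    den = sum(joint(*w) for w in worlds if all(w[j] == v for j, v in given.items()))
    num = sum(joint(*w) for w in worlds if w[i] and all(w[j] == v for j, v in given.items()))
    return num / den

def minimal_parents(i):
    """Smallest S among predecessors with P(X_i | S) = P(X_i | X_1..X_{i-1})."""
    preds = list(range(i))
    for r in range(len(preds) + 1):
        for S in combinations(preds, r):
            if all(abs(cond(i, {j: vals[j] for j in S}) -
                       cond(i, dict(zip(preds, vals)))) < 1e-9
                   for vals in product([True, False], repeat=len(preds))):
                return [order[j] for j in S]

for i, name in enumerate(order):
    print(name, "<-", minimal_parents(i))
```

With this causal ordering, Cavity gets no parents and both symptoms get Cavity alone, matching the conditional independence built into the joint.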
Example
Suppose we choose the ordering $M$, $J$, $A$, $B$, $E$
[Figure: incremental construction of the network under this ordering]
$P(J \mid M) = P(J)$? No
$P(A \mid J, M) = P(A \mid J)$? $P(A \mid J, M) = P(A)$? No
$P(B \mid A, J, M) = P(B \mid A)$? Yes
$P(B \mid A, J, M) = P(B)$? No
$P(E \mid B, A, J, M) = P(E \mid A)$? No
$P(E \mid B, A, J, M) = P(E \mid A, B)$? Yes
Example: Car diagnosis
Initial evidence: engine won't start
Testable variables (thin ovals), diagnosis variables (thick ovals)
Hidden variables (shaded) ensure sparse structure, reduce parameters
[Figure: car diagnosis network]
Example: Car insurance
Predict claim costs (medical, liability, property)
given data on application form (other unshaded nodes)
[Figure: car insurance network]
Compact conditional distributions
CPT grows exponentially with no. of parents
CPT becomes infinite with continuous-valued parent or child
Solution: canonical distributions that are defined compactly
Deterministic nodes are the simplest case: $X = f(Parents(X))$ for some function $f$
E.g., Boolean functions: $NorthAmerican \Leftrightarrow Canadian \lor US \lor Mexican$
E.g., numerical relationships among continuous variables
Compact conditional distributions contd.
Noisy-OR distributions model multiple noninteracting causes:
1) Parents $U_1, \ldots, U_k$ include all causes (can add leak node)
2) Independent failure probability $q_i$ for each cause alone
$\Rightarrow P(X \mid U_1, \ldots, U_j, \lnot U_{j+1}, \ldots, \lnot U_k) = 1 - \prod_{i=1}^{j} q_i$
Cold | Flu | Malaria | P(Fever) | P($\lnot$ Fever)
F | F | F | 0.0 | 1.0
F | F | T | 0.9 | 0.1
F | T | F | 0.8 | 0.2
F | T | T | 0.98 | 0.02 = 0.2 x 0.1
T | F | F | 0.4 | 0.6
T | F | T | 0.94 | 0.06 = 0.6 x 0.1
T | T | F | 0.88 | 0.12 = 0.6 x 0.2
T | T | T | 0.988 | 0.012 = 0.6 x 0.2 x 0.1
Number of parameters linear in number of parents
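The whole table above collapses to three numbers under noisy-OR: the per-cause failure probabilities $q_{cold} = 0.6$, $q_{flu} = 0.2$, $q_{malaria} = 0.1$. A sketch:

```python
# Per-cause failure (inhibition) probabilities for the fever example
q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}

def p_fever(active_causes):
    """Noisy-OR: fever fails only if every present cause is inhibited."""
    p_not_fever = 1.0
    for cause in active_causes:
        p_not_fever *= q[cause]
    return 1.0 - p_not_fever

print(p_fever({"Cold", "Flu", "Malaria"}))  # 1 - 0.6*0.2*0.1
```

With no leak node, no causes present gives probability 0, matching the table's first row.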
Naive Bayes
Very simple but surprisingly useful model: all findings conditionally independent given the cause
$P(Cause, Effect_1, \ldots, Effect_n) = P(Cause) \prod_i P(Effect_i \mid Cause)$
Therefore the cause $c$ that maximizes
$P(c \mid e_1, \ldots, e_n)$ is
just the one that maximizes $P(c) \prod_i P(e_i \mid c)$
Pathfinder: first BN medical diagnosis system. Using naive Bayes it outperformed doctors! The full BN version saved 1 life in 1000:
1) Better at incorporating prior probabilities of different diseases
2) Uses all evidence -- humans focus on only 7-9 pieces
CPCS: internal diseases -- 448 nodes, 906 edges
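The naive Bayes maximization is a one-liner. A sketch with toy causes, findings, and probabilities (all illustrative assumptions, not Pathfinder's actual model):

```python
from math import prod

# Toy priors P(c) and likelihoods P(finding | c); values are invented
priors = {"flu": 0.1, "cold": 0.3, "healthy": 0.6}
likelihood = {
    "fever": {"flu": 0.9, "cold": 0.2, "healthy": 0.01},
    "cough": {"flu": 0.7, "cold": 0.8, "healthy": 0.05},
}

def diagnose(findings):
    """Return the cause c maximizing P(c) * prod_i P(e_i | c)."""
    return max(priors, key=lambda c: priors[c] * prod(likelihood[f][c] for f in findings))

print(diagnose(["fever", "cough"]))
```

Normalizing by $P(e_1, \ldots, e_n)$ is unnecessary for the argmax, which is why the unnormalized product suffices.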