Belief Networks II
Lecture 24
(Chapter 15.3-4 + new)
Artificial Intelligence I
Autumn 2001
Henry Kautz
Outline
Exact inference by enumeration
Exact inference by variable elimination
Approximate inference by stochastic simulation
Approximate inference by Markov chain Monte Carlo
Inference tasks
Causal: Given burglary, what is probability John calls?
Diagnostic: Given John calls, what is probability of earthquake?
Mixed: Given John calls and there is an earthquake, what is the probability of burglary?
Most Probable Explanation: Given earthquake, what is the most likely simultaneous setting of all of the other variables?
Inference by enumeration
Slightly intelligent way to sum out variables from the joint without actually constructing its explicit representation
Simple query on the burglary network:
$P(B \mid j, m) = P(B, j, m)/P(j, m)$
$= \alpha\, P(B, j, m)$
$= \alpha \sum_e \sum_a P(B, e, a, j, m)$
Rewrite full joint entries using product of CPT entries:
$P(B \mid j, m) = \alpha \sum_e \sum_a P(B)\, P(e)\, P(a \mid B, e)\, P(j \mid a)\, P(m \mid a)$
$= \alpha\, P(B) \sum_e P(e) \sum_a P(a \mid B, e)\, P(j \mid a)\, P(m \mid a)$
Enumeration algorithm
Exhaustive depth-first enumeration: $O(n)$ space, $O(d^n)$ time
[Pseudocode: the Enumeration-Ask algorithm]
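As a concrete illustration, here is a minimal Python sketch of depth-first enumeration on the burglary network. The CPT values are the standard textbook numbers (an assumption; they are not given on this slide), with B, E, A, J, M abbreviating Burglary, Earthquake, Alarm, JohnCalls, MaryCalls:

```python
# Burglary network: parents and CPTs (standard textbook values, an assumption).
# Each CPT maps a tuple of parent values to P(variable = true | parents).
parents = {'B': [], 'E': [], 'A': ['B', 'E'], 'J': ['A'], 'M': ['A']}
cpt = {
    'B': {(): 0.001},
    'E': {(): 0.002},
    'A': {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    'J': {(True,): 0.90, (False,): 0.05},
    'M': {(True,): 0.70, (False,): 0.01},
}

def prob(var, value, assignment):
    """P(var = value | parents), read off the CPT."""
    p_true = cpt[var][tuple(assignment[pa] for pa in parents[var])]
    return p_true if value else 1.0 - p_true

def enumerate_all(variables, assignment):
    """Depth-first sum over all unassigned variables (topological order)."""
    if not variables:
        return 1.0
    first, rest = variables[0], variables[1:]
    if first in assignment:
        return prob(first, assignment[first], assignment) * enumerate_all(rest, assignment)
    return sum(prob(first, v, {**assignment, first: v}) *
               enumerate_all(rest, {**assignment, first: v})
               for v in (True, False))

def enumeration_ask(X, evidence, order=('B', 'E', 'A', 'J', 'M')):
    """Normalized distribution P(X | evidence) by full enumeration."""
    dist = {v: enumerate_all(list(order), {**evidence, X: v}) for v in (True, False)}
    z = sum(dist.values())
    return {v: p / z for v, p in dist.items()}

# P(Burglary | JohnCalls = true, MaryCalls = true) is about <0.284, 0.716>.
print(enumeration_ask('B', {'J': True, 'M': True}))
```

Note the exponential blowup: the recursion revisits the same subproblems once per branch, which is exactly the inefficiency variable elimination removes.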
Inference by variable elimination
Enumeration is inefficient: repeated computation, e.g., computes $P(j \mid a)\, P(m \mid a)$ for each value of $e$
Variable elimination: carry out summations right-to-left,
storing intermediate results (factors) to avoid recomputation
$P(B \mid j, m) = \alpha\, f_B(B) \times \sum_e f_E(e) \times \sum_a f_A(a, B, e) \times f_J(a) \times f_M(a)$
$= \alpha\, f_B(B) \times \sum_e f_E(e) \times f_{\bar{A}JM}(B, e)$ (sum out $A$)
$= \alpha\, f_B(B) \times f_{\bar{E}\bar{A}JM}(B)$ (sum out $E$)
A form of dynamic programming. Can also be implemented using message passing of intermediate results.
Variable elimination: Basic operations
Pointwise product of factors $f_1$ and $f_2$:
$f_1(x_1, \ldots, x_j, y_1, \ldots, y_k) \times f_2(y_1, \ldots, y_k, z_1, \ldots, z_l)$
$= f(x_1, \ldots, x_j, y_1, \ldots, y_k, z_1, \ldots, z_l)$
E.g.,
$f_1(a, b) \times f_2(b, c) = f(a, b, c)$
Summing out a variable from a product of factors: move any constant factors outside the summation:
$\sum_x f_1 \times \cdots \times f_k = f_1 \times \cdots \times f_i \sum_x f_{i+1} \times \cdots \times f_k = f_1 \times \cdots \times f_i \times f_{\bar{X}}$
assuming $f_1, \ldots, f_i$ do not depend on $X$
Variable elimination algorithm
[Pseudocode: the Elimination-Ask algorithm]
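A runnable Python sketch of variable elimination, with factors represented as tables over boolean variables. The burglary-network CPT values are again the standard textbook numbers (an assumption), and the elimination order follows the slide's derivation: sum out $A$, then $E$:

```python
from itertools import product

# A factor maps assignments of its variables (tuples of booleans) to numbers.
class Factor:
    def __init__(self, variables, table):
        self.variables = variables
        self.table = table

def pointwise_product(f1, f2):
    """Pointwise product: f(X,Y,Z) = f1(X,Y) * f2(Y,Z)."""
    variables = f1.variables + [v for v in f2.variables if v not in f1.variables]
    table = {}
    for values in product([True, False], repeat=len(variables)):
        a = dict(zip(variables, values))
        table[values] = (f1.table[tuple(a[v] for v in f1.variables)] *
                         f2.table[tuple(a[v] for v in f2.variables)])
    return Factor(variables, table)

def sum_out(f, var):
    """Sum a variable out of a factor, collapsing rows that agree elsewhere."""
    i = f.variables.index(var)
    variables = f.variables[:i] + f.variables[i + 1:]
    table = {}
    for values, p in f.table.items():
        key = values[:i] + values[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return Factor(variables, table)

# Burglary network CPTs as factors (standard textbook values, an assumption).
f_B = Factor(['B'], {(True,): 0.001, (False,): 0.999})
f_E = Factor(['E'], {(True,): 0.002, (False,): 0.998})
f_A = Factor(['A', 'B', 'E'], {
    (True, True, True): 0.95,   (True, True, False): 0.94,
    (True, False, True): 0.29,  (True, False, False): 0.001,
    (False, True, True): 0.05,  (False, True, False): 0.06,
    (False, False, True): 0.71, (False, False, False): 0.999})
# Evidence j = true, m = true turns the J and M CPTs into factors over A alone.
f_J = Factor(['A'], {(True,): 0.90, (False,): 0.05})
f_M = Factor(['A'], {(True,): 0.70, (False,): 0.01})

# Eliminate A, then E (right-to-left in the slide's expression).
f1 = sum_out(pointwise_product(pointwise_product(f_A, f_J), f_M), 'A')
f2 = sum_out(pointwise_product(f_E, f1), 'E')
posterior = pointwise_product(f_B, f2)
z = sum(posterior.table.values())
print(posterior.table[(True,)] / z)   # P(B = true | j, m), about 0.284
```

Each intermediate factor is computed once and reused, which is the dynamic-programming saving over plain enumeration.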
Complexity of Bayes net inference
Singly connected networks (or polytrees):
- any two nodes are connected by at most one (undirected) path
- time and space cost of variable elimination are $O(d^k n)$, i.e., linear in the size of the network
Multiply connected networks:
- can reduce 3SAT to exact inference $\Rightarrow$ NP-hard
- equivalent to counting 3SAT models $\Rightarrow$ #P-complete
[Figure: reduction of 3SAT to Bayes net inference]
Inference by stochastic simulation
Basic idea:
1) Draw $N$ samples from a sampling distribution $S$
2) Compute an approximate posterior probability $\hat{P}$
3) Show this converges to the true probability $P$
Outline:
- Sampling from an empty network
- Rejection sampling: reject samples disagreeing with evidence
- Likelihood weighting: use evidence to weight samples
- MCMC: sample from a stochastic process whose stationary distribution is the true posterior
Sampling from an empty network
[Pseudocode: the Prior-Sample algorithm]
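A runnable sketch of prior sampling on the four-variable sprinkler network used in the later examples. The CPT values are the standard textbook numbers (an assumption; they are not given on this slide):

```python
import random

# Sprinkler network: Cloudy -> {Sprinkler, Rain} -> WetGrass.
# CPT values are the standard textbook numbers (an assumption).
def prior_sample(rng):
    """Sample each variable in topological order from its CPT."""
    c = rng.random() < 0.5                          # P(Cloudy)
    s = rng.random() < (0.1 if c else 0.5)          # P(Sprinkler | Cloudy)
    r = rng.random() < (0.8 if c else 0.2)          # P(Rain | Cloudy)
    w = rng.random() < {(True, True): 0.99, (True, False): 0.90,
                        (False, True): 0.90, (False, False): 0.0}[(s, r)]
    return c, s, r, w

rng = random.Random(0)
samples = [prior_sample(rng) for _ in range(10000)]
# The fraction of samples with Rain = true approaches the prior P(Rain) = 0.5.
print(sum(r for _, _, r, _ in samples) / len(samples))
```

Because each variable is sampled from its CPT given already-sampled parents, a complete sample occurs with exactly its joint probability, which is the consistency property proved on the next slide.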
[Figure: the sprinkler network]
Sampling from an empty network contd.
Probability that PriorSample generates a particular event
$S_{PS}(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i)) = P(x_1, \ldots, x_n)$
i.e., the true prior probability
Let $N_{PS}(\mathbf{V} = \mathbf{v})$ be the number of samples generated for which $\mathbf{V} = \mathbf{v}$, for any set of variables $\mathbf{V}$.
Then $\hat{P}(\mathbf{V} = \mathbf{v}) = N_{PS}(\mathbf{V} = \mathbf{v})/N$ and
$\lim_{N \to \infty} \hat{P}(\mathbf{V} = \mathbf{v}) = S_{PS}(\mathbf{V} = \mathbf{v}) = P(\mathbf{V} = \mathbf{v})$
i.e., the estimates are consistent
Rejection sampling
$\hat{\mathbf{P}}(X \mid \mathbf{e})$ estimated from samples agreeing with $\mathbf{e}$
[Pseudocode: the Rejection-Sampling algorithm]
E.g., estimate $\mathbf{P}(Rain \mid Sprinkler = true)$ using 100 samples
27 samples have $Sprinkler = true$
Of these, 8 have $Rain = true$ and 19 have $Rain = false$.
$\hat{\mathbf{P}}(Rain \mid Sprinkler = true) = \alpha\, \langle 8, 19 \rangle = \langle 0.296, 0.704 \rangle$
Similar to a basic real-world empirical estimation procedure
Analysis of rejection sampling
$\hat{\mathbf{P}}(X \mid \mathbf{e}) = \alpha\, \mathbf{N}_{PS}(X, \mathbf{e})$ (algorithm defn.)
$= \mathbf{N}_{PS}(X, \mathbf{e}) / N_{PS}(\mathbf{e})$ (normalized by $N_{PS}(\mathbf{e})$)
$\approx \mathbf{P}(X, \mathbf{e}) / P(\mathbf{e})$ (property of PriorSample)
$= \mathbf{P}(X \mid \mathbf{e})$ (defn. of conditional probability)
Hence rejection sampling returns consistent posterior estimates
Problem: hopelessly expensive if $P(\mathbf{e})$ is small
Likelihood weighting
Idea: fix evidence variables, sample only nonevidence variables,
and weight each sample by the likelihood it accords the evidence
[Pseudocode: the Likelihood-Weighting algorithm]
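A minimal Python sketch for the sprinkler network (standard textbook CPT values, an assumption): the evidence variables $Sprinkler = true$ and $WetGrass = true$ are never sampled; instead each sample accumulates the probability the evidence would have had.

```python
import random

rng = random.Random(0)

# Likelihood weighting for P(Rain | Sprinkler = true, WetGrass = true)
# in the sprinkler network (standard textbook CPT values, an assumption).
def weighted_sample():
    """Sample nonevidence variables; weight by the likelihood of the evidence."""
    w = 1.0
    c = rng.random() < 0.5                     # sample Cloudy from its prior
    w *= 0.1 if c else 0.5                     # evidence: P(Sprinkler = true | c)
    r = rng.random() < (0.8 if c else 0.2)     # sample Rain given Cloudy
    w *= 0.99 if r else 0.90                   # evidence: P(WetGrass = true | s = true, r)
    return r, w

num = den = 0.0
for _ in range(10000):
    r, w = weighted_sample()
    den += w
    num += w if r else 0.0
print(num / den)   # close to the exact posterior, about 0.320
```

Every sample is used, unlike rejection sampling, but samples whose nonevidence values make the evidence unlikely contribute only tiny weights.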
Likelihood weighting example
Estimate $\mathbf{P}(Rain \mid Sprinkler = true, WetGrass = true)$
[Figure: the sprinkler network]
LW example contd.
Sample generation process:
1. $w \leftarrow 1.0$
2. Sample $\mathbf{P}(Cloudy) = \langle 0.5, 0.5 \rangle$; say $true$
3. $Sprinkler$ has evidence value $true$, so $w \leftarrow w \times P(Sprinkler = true \mid Cloudy = true) = 0.1$
4. Sample $\mathbf{P}(Rain \mid Cloudy = true) = \langle 0.8, 0.2 \rangle$; say $true$
5. $WetGrass$ has evidence value $true$, so $w \leftarrow w \times P(WetGrass = true \mid Sprinkler = true, Rain = true) = 0.099$
Likelihood weighting analysis
Sampling probability for WeightedSample is
$S_{WS}(\mathbf{z}, \mathbf{e}) = \prod_{i=1}^{l} P(z_i \mid parents(Z_i))$
Note: pays attention to evidence in ancestors only
$\Rightarrow$ somewhere ``in between'' prior and posterior distribution
Weight for a given sample $\mathbf{z}, \mathbf{e}$ is $w(\mathbf{z}, \mathbf{e}) = \prod_{i=1}^{m} P(e_i \mid parents(E_i))$
Weighted sampling probability is
$S_{WS}(\mathbf{z}, \mathbf{e})\, w(\mathbf{z}, \mathbf{e}) = \prod_{i=1}^{l} P(z_i \mid parents(Z_i)) \prod_{i=1}^{m} P(e_i \mid parents(E_i)) = P(\mathbf{z}, \mathbf{e})$ (by standard global semantics of network)
Hence likelihood weighting returns consistent estimates
but performance still degrades with many evidence variables
Approximate inference using MCMC
``State'' of network = current assignment to all variables
Generate next state by sampling one variable given Markov blanket
Sample each variable in turn, keeping evidence fixed
[Pseudocode: the MCMC-Ask (Gibbs sampling) algorithm]
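A runnable Gibbs-sampling sketch for the sprinkler network (standard textbook CPT values, an assumption), estimating $P(Rain \mid Sprinkler = true, WetGrass = true)$. Each nonevidence variable is resampled from its distribution given its Markov blanket:

```python
import random

rng = random.Random(0)

# Gibbs sampling for P(Rain | Sprinkler = true, WetGrass = true) in the
# sprinkler network (standard textbook CPT values, an assumption).

def p_cloudy(r):
    """P(Cloudy = true | Sprinkler = true, Rain = r): Cloudy's Markov blanket."""
    pt = 0.5 * 0.1 * (0.8 if r else 0.2)    # P(c) P(s = true | c) P(r | c), c = true
    pf = 0.5 * 0.5 * (0.2 if r else 0.8)    # same with c = false
    return pt / (pt + pf)

def p_rain(c):
    """P(Rain = true | Cloudy = c, Sprinkler = true, WetGrass = true)."""
    pt = (0.8 if c else 0.2) * 0.99         # P(r | c) P(w = true | s = true, r), r = true
    pf = (0.2 if c else 0.8) * 0.90         # same with r = false
    return pt / (pt + pf)

c, r = True, False          # arbitrary initial state; evidence stays fixed
count = 0
N = 50000
for _ in range(N):
    c = rng.random() < p_cloudy(r)     # resample Cloudy given its blanket
    r = rng.random() < p_rain(c)       # resample Rain given its blanket
    count += r
print(count / N)   # long-run fraction of Rain = true approaches the posterior
```

The long-run fraction of states with $Rain = true$ converges to the true posterior (about 0.32 with these CPTs), illustrating the stationary-distribution claim above; in practice one would also discard an initial burn-in period.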
Approaches stationary distribution: long-run fraction of time spent in each state is exactly proportional to its posterior probability
MCMC Example
Estimate $\mathbf{P}(Rain \mid Sprinkler = true, WetGrass = true)$
Sample $Cloudy$, then $Rain$; repeat.
Count number of times $Rain$ is true and false in the samples.
Markov blanket of $Cloudy$ is $Sprinkler$ and $Rain$
Markov blanket of $Rain$ is $Cloudy$, $Sprinkler$, and $WetGrass$
[Figure: the sprinkler network]
MCMC example contd.
Random initial state: $Cloudy = true$ and $Rain = false$
1. $\mathbf{P}(Cloudy \mid MB(Cloudy)) = \mathbf{P}(Cloudy \mid Sprinkler = true, Rain = false)$; sample $\Rightarrow$ $false$
2. $\mathbf{P}(Rain \mid MB(Rain)) = \mathbf{P}(Rain \mid Cloudy = false, Sprinkler = true, WetGrass = true)$; sample $\Rightarrow$ $true$
Visit 100 states: 31 have $Rain = true$, 69 have $Rain = false$
$\hat{\mathbf{P}}(Rain \mid Sprinkler = true, WetGrass = true) = \alpha\, \langle 31, 69 \rangle = \langle 0.31, 0.69 \rangle$
MCMC analysis: Outline
Transition probability $q(\mathbf{x} \to \mathbf{x}')$
Occupancy probability $\pi_t(\mathbf{x})$ at time $t$
Equilibrium condition on $\pi_t$ defines stationary distribution $\pi(\mathbf{x})$
Note: stationary distribution depends on choice of $q(\mathbf{x} \to \mathbf{x}')$
Pairwise detailed balance on states guarantees equilibrium
Gibbs sampling transition probability:
sample each variable given current values of all others
$\Rightarrow$ detailed balance with the true posterior
For Bayesian networks, Gibbs sampling reduces to
sampling conditioned on each variable's Markov blanket
Stationary distribution
$\pi_t(\mathbf{x})$ = probability in state $\mathbf{x}$ at time $t$
$\pi_{t+1}(\mathbf{x}')$ = probability in state $\mathbf{x}'$ at time $t+1$
$\pi_{t+1}$ in terms of $\pi_t$ and $q(\mathbf{x} \to \mathbf{x}')$:
$\pi_{t+1}(\mathbf{x}') = \sum_{\mathbf{x}} \pi_t(\mathbf{x})\, q(\mathbf{x} \to \mathbf{x}')$
Stationary distribution: $\pi_t = \pi_{t+1} = \pi$, so $\pi(\mathbf{x}') = \sum_{\mathbf{x}} \pi(\mathbf{x})\, q(\mathbf{x} \to \mathbf{x}')$ for all $\mathbf{x}'$
In equilibrium, expected ``outflow'' = expected ``inflow''
Detailed balance
``Outflow'' = ``inflow'' for each pair of states:
$\pi(\mathbf{x})\, q(\mathbf{x} \to \mathbf{x}') = \pi(\mathbf{x}')\, q(\mathbf{x}' \to \mathbf{x})$ for all $\mathbf{x}, \mathbf{x}'$
Detailed balance implies stationarity:
$\sum_{\mathbf{x}} \pi(\mathbf{x})\, q(\mathbf{x} \to \mathbf{x}') = \sum_{\mathbf{x}} \pi(\mathbf{x}')\, q(\mathbf{x}' \to \mathbf{x}) = \pi(\mathbf{x}') \sum_{\mathbf{x}} q(\mathbf{x}' \to \mathbf{x}) = \pi(\mathbf{x}')$
MCMC algorithms typically constructed by designing a transition probability $q$ that is in detailed balance with desired $\pi$
Gibbs sampling
Sample each variable in turn, given all other variables
Sampling $X_i$, let $\bar{\mathbf{Y}}_i$ be all other nonevidence variables
Current values are $x_i$ and $\bar{\mathbf{y}}_i$; $\mathbf{e}$ is fixed
Transition probability is given by
$q(x_i, \bar{\mathbf{y}}_i \to x_i', \bar{\mathbf{y}}_i) = P(x_i' \mid \bar{\mathbf{y}}_i, \mathbf{e})$
Markov blanket sampling
A variable is independent of all others given its Markov blanket:
$P(x_i' \mid \bar{\mathbf{y}}_i, \mathbf{e}) = P(x_i' \mid mb(X_i))$
Probability given the Markov blanket is calculated as follows:
$P(x_i' \mid mb(X_i)) = \alpha\, P(x_i' \mid parents(X_i)) \prod_{Z_j \in Children(X_i)} P(z_j \mid parents(Z_j))$
Hence computing the sampling distribution over $X_i$ for each flip requires just $O(kd)$ multiplications if $X_i$ has $k$ children and $d$ values; can cache it if $k$ is not too large.
Main computational problems:
1) Difficult to tell if convergence has been achieved
2) Can be wasteful if Markov blanket is large: $P(x_i' \mid mb(X_i))$ won't change much (law of large numbers)
Performance of approximation algorithms
Absolute approximation: $|P(X \mid \mathbf{e}) - \hat{P}(X \mid \mathbf{e})| \le \epsilon$
Relative approximation: $\frac{|P(X \mid \mathbf{e}) - \hat{P}(X \mid \mathbf{e})|}{P(X \mid \mathbf{e})} \le \epsilon$
Relative $\Rightarrow$ absolute since $0 \le P \le 1$ (but $P$ may be $O(2^{-n})$)
Randomized algorithms may fail with probability at most $\delta$
Polytime approximation: $poly(n, \epsilon^{-1}, \log \delta^{-1})$
Theorem (Dagum and Luby, 1993): both absolute and relative approximation for either deterministic or randomized algorithms are NP-hard for any $\epsilon, \delta < 0.5$
(Absolute approximation polytime with no evidence--Chernoff bounds)