Lecture 22: Uncertainty
Artificial Intelligence I
Autumn 2001
Henry Kautz
Outline
Uncertainty
Probability
Syntax
Semantics
Inference rules
Uncertainty
Let action A_t = leave for airport t minutes before flight
Will A_t get me there on time?
Problems:
1) partial observability (road state, other drivers' plans, etc.)
2) noisy sensors (KCBS traffic reports)
3) uncertainty in action outcomes (flat tire, etc.)
4) immense complexity of modelling and predicting traffic
Hence a purely logical approach either
1) risks falsehood: ``A_25 will get me there on time''
or 2) leads to conclusions that are too weak for decision making:
``A_25 will get me there on time if there's no accident on the bridge
and it doesn't rain and my tires remain intact etc etc.''
(A_1440 might reasonably be said to get me there on time
but I'd have to stay overnight in the airport ...)
Probability
Subjective or Bayesian probability:
Relate propositions to one's own state of knowledge
e.g., P(A_25 | no reported accidents) = 0.06
Probabilities of propositions change with new evidence:
e.g., P(A_25 | no reported accidents, 5 a.m.) = 0.15
Objective probabilities based only on observed frequencies of events. On 1,000 flips this coin came up heads 90 times. What is the probability next flip will be heads?
Subjective probabilities can be assigned to anything: What is the probability that there will be a terrorist attack on Thanksgiving?
Making decisions under uncertainty
Suppose I believe the following:
P(A_25 gets me there on time | ...) = 0.04
P(A_90 gets me there on time | ...) = 0.70
P(A_120 gets me there on time | ...) = 0.95
P(A_1440 gets me there on time | ...) = 0.9999
Which action should I choose?
Depends on my preferences for missing flight vs. airport cuisine, etc.
Utility theory is used to represent and infer preferences
Decision theory = utility theory + probability theory
Axioms of probability
For any propositions A, B:
1. 0 ≤ P(A) ≤ 1
2. P(True) = 1 and P(False) = 0
3. P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
[Figure: Venn diagram illustrating axiom 3 (axiom3-venn.ps)]
de Finetti (1931): an agent who bets according to probabilities that violate these axioms can be forced to bet so as to lose money regardless of outcome.
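The axioms can be checked mechanically for any concrete assignment of probabilities to possible worlds. A minimal Python sketch (not part of the lecture; the world probabilities are made-up illustrative numbers):

```python
# Checking Kolmogorov's axioms numerically for propositions A and B
# over the four possible worlds. Probabilities are illustrative.
worlds = {  # (A, B) -> probability of that world
    (True, True): 0.30,
    (True, False): 0.25,
    (False, True): 0.20,
    (False, False): 0.25,
}

def P(event):
    """Probability of an event, given as a predicate over worlds."""
    return sum(p for w, p in worlds.items() if event(w))

p_a = P(lambda w: w[0])
p_b = P(lambda w: w[1])
p_a_or_b = P(lambda w: w[0] or w[1])
p_a_and_b = P(lambda w: w[0] and w[1])

assert 0.0 <= p_a <= 1.0                      # axiom 1
assert abs(P(lambda w: True) - 1.0) < 1e-9    # axiom 2: P(True) = 1
assert P(lambda w: False) == 0.0              # axiom 2: P(False) = 0
# axiom 3: P(A or B) = P(A) + P(B) - P(A and B)
assert abs(p_a_or_b - (p_a + p_b - p_a_and_b)) < 1e-9
```

Any table of non-negative world probabilities summing to 1 passes these checks; de Finetti's result says an agent whose betting odds fail them can be exploited.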
Syntax
Similar to propositional logic: possible worlds defined by assignment of values to random variables.
Propositional or Boolean random variables
e.g., Cavity (do I have a cavity?)
Include propositional logic expressions
e.g., Burglary ∨ Earthquake
Multivalued random variables
e.g., Weather is one of ⟨sunny, rain, cloudy, snow⟩
Values must be exhaustive and mutually exclusive
Proposition constructed by assignment of a value: e.g., Weather = sunny; also Cavity = true for clarity
Syntax contd.
Unconditional probabilities of propositions
P(Cavity) = 0.1 and
P(Weather = sunny) = 0.72
correspond to belief prior to arrival of any (new) evidence
Probability distribution gives values for all possible assignments: P(Weather) = ⟨0.72, 0.1, 0.08, 0.1⟩ (note: must sum to 1)
Joint probability distribution for a set of variables gives
values for each possible assignment to all the variables
P(Weather, Cavity) = a 4 × 2 matrix of values
Conditional probabilities, e.g., P(Cavity | Toothache) = 0.8, i.e., given that Toothache is all I know
New evidence may be irrelevant, allowing simplification, e.g.,
P(Cavity | Toothache, 49ersWin) = P(Cavity | Toothache) = 0.8
Conditional probability
Definition of conditional probability:
P(A | B) = P(A ∧ B) / P(B)  if P(B) ≠ 0
Product rule gives an alternative formulation:
P(A ∧ B) = P(A | B) P(B) = P(B | A) P(A)
General version holds for whole distributions, e.g.,
P(Weather, Cavity) = P(Weather | Cavity) P(Cavity)
(View as a 4 × 2 set of equations, not matrix mult.)
Chain rule is derived by successive application of product rule:
P(X_1, ..., X_n) = P(X_1, ..., X_{n-1}) P(X_n | X_1, ..., X_{n-1})
= P(X_1, ..., X_{n-2}) P(X_{n-1} | X_1, ..., X_{n-2}) P(X_n | X_1, ..., X_{n-1})
= ... = ∏_{i=1}^{n} P(X_i | X_1, ..., X_{i-1})
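The chain rule can be verified numerically for any joint distribution, since each conditional P(X_i | X_1, ..., X_{i-1}) is a ratio of marginals. A small Python sketch (illustrative; the joint is randomly generated, not from the lecture):

```python
import itertools
import random

# Verify the chain rule P(x1,...,xn) = prod_i P(xi | x1..x_{i-1})
# on a random joint distribution over three binary variables.
random.seed(0)
vals = list(itertools.product([0, 1], repeat=3))
raw = [random.random() for _ in vals]
z = sum(raw)
joint = {v: r / z for v, r in zip(vals, raw)}  # proper joint: sums to 1

def marginal(prefix):
    """P(X1=prefix[0], ..., Xk=prefix[k-1]), summing out the rest."""
    return sum(p for v, p in joint.items() if v[:len(prefix)] == prefix)

for v in vals:
    # product of conditionals; each is marginal(i) / marginal(i-1)
    prod = 1.0
    for i in range(3):
        prod *= marginal(v[:i + 1]) / marginal(v[:i])
    assert abs(prod - joint[v]) < 1e-9  # telescopes to the joint entry
```

The product telescopes: every intermediate marginal cancels, leaving exactly the joint probability.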
Bayesian Updating
Prior probabilities = what you believe before seeing evidence.
Posterior probabilities = what you believe after seeing evidence.
Bayesian updating: if Toothache is observed, then the probability of Cavity becomes P(Cavity | Toothache)
Hence ``posterior'' is sometimes used to refer to conditional probabilities.
Bayes' Rule
Product rule:
P(A ∧ B) = P(A | B) P(B) = P(B | A) P(A)
⇒ Bayes' rule: P(A | B) = P(B | A) P(A) / P(B)
For assessing diagnostic probability from causal probability:
P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)
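A worked instance in Python (a sketch; the meningitis/stiff-neck numbers are textbook-style illustrative values, not data from these slides):

```python
# Diagnostic probability from causal probability via Bayes' rule:
#   P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)
# Illustrative numbers (assumptions, not measurements):
p_effect_given_cause = 0.8   # causal: P(StiffNeck | Meningitis)
p_cause = 0.0001             # prior:  P(Meningitis)
p_effect = 0.1               # prior:  P(StiffNeck)

p_cause_given_effect = p_effect_given_cause * p_cause / p_effect
# approximately 0.0008: the posterior probability is still very small
assert abs(p_cause_given_effect - 0.0008) < 1e-9
```

Causal probabilities (symptom given disease) are usually easier to assess and more stable than diagnostic ones, which is why Bayes' rule is applied in this direction.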
Normalization
Suppose we wish to compute a posterior distribution over A
given B = b, and suppose A has possible values
a_1, ..., a_m
We can apply Bayes' rule for each value of A:
P(A = a_1 | B = b) = P(B = b | A = a_1) P(A = a_1) / P(B = b)
...
P(A = a_m | B = b) = P(B = b | A = a_m) P(A = a_m) / P(B = b)
Note the denominator is always P(B = b), and that
P(B = b) = Σ_i P(B = b | A = a_i) P(A = a_i) = sum of the numerators
So (1 / sum of the numerators) is the normalization factor α:
P(A | B = b) = α P(B = b | A) P(A)
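The trick in code: compute the unnormalized numerators and divide by their sum, never computing P(B = b) explicitly. A minimal Python sketch (the prior and likelihood values are illustrative assumptions):

```python
# Bayes' rule with normalization over a three-valued variable A.
prior = {"a1": 0.5, "a2": 0.3, "a3": 0.2}        # P(A = ai), illustrative
likelihood = {"a1": 0.9, "a2": 0.4, "a3": 0.1}   # P(B = b | A = ai)

# numerators P(B = b | A = ai) P(A = ai); their sum equals P(B = b)
numerators = {a: likelihood[a] * prior[a] for a in prior}
alpha = 1.0 / sum(numerators.values())           # normalization factor
posterior = {a: alpha * n for a, n in numerators.items()}

assert abs(sum(posterior.values()) - 1.0) < 1e-9  # proper distribution
```

Dividing by the sum guarantees the posterior sums to 1, which is exactly the statement that P(B = b) equals the sum of the numerators.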
Conditioning
Introducing a variable as an extra condition:
P(X | Y) = Σ_z P(X | Y, Z = z) P(Z = z | Y)
When Y is absent, we have summing out or marginalization:
P(X) = Σ_z P(X | Z = z) P(Z = z) = Σ_z P(X, Z = z)
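Both identities can be checked on a small joint. A Python sketch (the joint entries are made-up illustrative numbers):

```python
# Summing out (marginalization) over a hidden variable Z.
# joint[(x, z)] = P(X = x, Z = z); illustrative values summing to 1.
joint = {
    (0, 0): 0.10, (0, 1): 0.30,
    (1, 0): 0.40, (1, 1): 0.20,
}

def p_x(x):
    """Marginalization: P(X=x) = sum_z P(X=x, Z=z)."""
    return sum(p for (xv, zv), p in joint.items() if xv == x)

def p_z(z):
    return sum(p for (xv, zv), p in joint.items() if zv == z)

def p_x_given_z(x, z):
    return joint[(x, z)] / p_z(z)

# Conditioning gives the same answer: P(X=x) = sum_z P(X=x|Z=z) P(Z=z)
for x in (0, 1):
    cond = sum(p_x_given_z(x, z) * p_z(z) for z in (0, 1))
    assert abs(cond - p_x(x)) < 1e-9
```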
Full joint distributions
A complete probability model specifies every entry in the joint
distribution for all the variables
P(X_1, ..., X_n)
I.e., a probability for each possible world
P(X_1 = x_1 ∧ ... ∧ X_n = x_n)
Note: possible world = model
E.g., suppose Toothache and Cavity are the random variables:

                 Toothache  ¬Toothache
Cavity = true    0.04       0.06
Cavity = false   0.01       0.89

Thus, P(Cavity ∨ Toothache) = 0.04 + 0.01 + 0.06 = 0.11
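The same computation in code; a minimal Python sketch assuming the standard 2 × 2 toothache/cavity joint:

```python
# Full joint over two Boolean variables; standard textbook numbers.
joint = {
    ("cavity", "toothache"): 0.04,
    ("cavity", "no toothache"): 0.06,
    ("no cavity", "toothache"): 0.01,
    ("no cavity", "no toothache"): 0.89,
}

# P(Cavity or Toothache): sum the possible worlds where either holds
p = sum(v for (c, t), v in joint.items()
        if c == "cavity" or t == "toothache")
assert abs(p - 0.11) < 1e-9
```

Any proposition's probability is a sum of joint entries, i.e., a sum over the possible worlds where the proposition is true.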
Inference from joint distributions
Typically, we are interested in the posterior joint distribution of the query variables X, given specific values e for the evidence variables E, taking into account the hidden variables Y
Then the required summation of joint entries is done by summing out
the hidden variables:
P(X | E = e) = α P(X, E = e) = α Σ_y P(X, E = e, Y = y)
Obvious problems:
1) Worst-case time complexity O(d^n), where n is the number of variables and d is the largest arity
2) Space complexity O(d^n) to store the joint distribution
3) How to find the numbers for O(d^n) entries???
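The summing-out procedure fits in a few lines for a small joint. A Python sketch (the three-variable toothache/cavity/catch joint follows the standard textbook example; it is an assumption here, not given in these slides):

```python
# Inference by enumeration over a full joint distribution:
#   P(X | E=e) = alpha * sum_y P(X, E=e, Y=y)
# Variables: Cavity (query), Toothache (evidence), Catch (hidden).
joint = {
    # (cavity, toothache, catch): probability  (textbook-style numbers)
    (True, True, True): 0.108,   (True, True, False): 0.012,
    (True, False, True): 0.072,  (True, False, False): 0.008,
    (False, True, True): 0.016,  (False, True, False): 0.064,
    (False, False, True): 0.144, (False, False, False): 0.576,
}

def query_cavity(toothache):
    """Posterior distribution P(Cavity | Toothache = toothache)."""
    unnorm = {}
    for cavity in (True, False):
        # sum out the hidden variable Catch
        unnorm[cavity] = sum(joint[(cavity, toothache, catch)]
                             for catch in (True, False))
    alpha = 1.0 / sum(unnorm.values())  # normalize over query values
    return {c: alpha * p for c, p in unnorm.items()}

post = query_cavity(True)
assert abs(post[True] - 0.6) < 1e-9  # 0.12 / 0.20
```

The loop over hidden-variable values is exactly the O(d^n) blowup the slide warns about: with n variables of arity d, the joint has d^n entries to store and to sum over.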