
Project students
Omid Madani

Markov Decision Processes

Overview

The Markov Decision Process (MDP) framework for decision making, planning, and control is surprisingly rich, capturing the essence of purposeful activity in a wide range of situations. These models and the problems associated with them arise in many areas, including medical decision making, maintenance planning, and robot navigation, and they account for a substantial portion of research in operations research. MDPs are also a source of clean and challenging computational problems. My Ph.D. thesis research focuses primarily on the computational complexity and algorithmic issues in infinite-horizon MDPs and related models and problems. I am advised by professors Richard Anderson and Steve Hanks. My committee members include Jim Burke from the mathematics department, Anne Condon, and Dick Karp.
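For readers new to the model, here is a minimal formal sketch (standard textbook definitions, not taken from this page): a finite MDP is a tuple $(S, A, P, R)$, where $P(s' \mid s, a)$ is the probability of moving to state $s'$ when action $a$ is taken in state $s$, and $R(s, a)$ is the expected immediate reward. Under the discounted criterion, a stationary policy $\pi : S \to A$ is valued by

\[
V^{\pi}(s) \;=\; \mathbb{E}\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, R(s_t, \pi(s_t)) \;\middle|\; s_0 = s \,\right],
\qquad 0 \le \gamma < 1,
\]

and the goal is a policy maximizing $V^{\pi}(s)$ simultaneously for all start states $s$.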

Recently, Steve Hanks, Anne Condon, and I investigated the computability of several infinite-horizon Partially Observable MDP (POMDP) problems. We had shown the undecidability of most problems under the so-called undiscounted optimality criteria by making connections to results on probabilistic finite automata, but the discounted case remained open and vexing! We recently settled this, and I presented the paper, titled On the Undecidability of Probabilistic Planning and Infinite-Horizon Partially Observable Markov Decision Problems, at AAAI-99. A prior paper on the subject reports on some of the then-open problems and also discusses properties that discounted and undiscounted unobservable MDPs have or lack, including a counterexample to a desirable periodicity property that discounted unobservable MDPs might have had. I presented that paper at the AAAI-98 Fall Symposium on Planning with POMDPs.
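For concreteness, the optimality criteria mentioned above take the following standard forms (a sketch; consult the papers for the exact variants studied). The discounted criterion maximizes

\[
\mathbb{E}\left[\, \sum_{t=0}^{\infty} \gamma^{t} r_t \,\right], \qquad 0 \le \gamma < 1,
\]

while typical undiscounted criteria maximize the expected total reward $\mathbb{E}[\sum_{t} r_t]$ or the long-run average reward

\[
\liminf_{T \to \infty} \; \frac{1}{T}\, \mathbb{E}\left[\, \sum_{t=0}^{T-1} r_t \,\right].
\]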

Currently, I am investigating the computational complexity of infinite-horizon (fully observable) MDPs. For an overview of the area and some open problems, you may want to check out my generals exam report.
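For orientation, the central object in this setting is the Bellman optimality equation, which characterizes the optimal value function $V^{*}$ of a discounted, fully observable MDP (standard material, included here for context):

\[
V^{*}(s) \;=\; \max_{a \in A} \left[\, R(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^{*}(s') \,\right].
\]

The algorithmic question is how efficiently this fixed point, or equivalently an optimal policy, can be computed.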

An important open problem here is whether the algorithm commonly called policy iteration in the MDP literature, credited to Richard Bellman and Ron Howard, runs in polynomial time. Variants of the algorithm exhibit very good performance in practice. A preliminary technical report describes a few structural properties that MDPs and related problems enjoy, from which not only the correctness of algorithms such as policy iteration can be shown, but also some constraints on the search path of policy iteration can be derived. The report presents these constraints and raises the question of whether they lead to upper bounds better than exponential. I later discovered a probabilistic argument showing that a random process constrained in this way has expected exponential running time, so these constraints alone do not yield a subexponential bound.
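Below is a minimal sketch of policy iteration itself, for readers unfamiliar with it; the array shapes, variable names, and the tie-breaking rule are my own illustrative choices, not anything prescribed by the report.

    import numpy as np

    def policy_iteration(P, R, gamma=0.9):
        """Exact policy iteration for a finite discounted MDP.

        P: array of shape (m, n, n); P[a, s, s'] = Pr(s' | s, a).
        R: array of shape (m, n);    R[a, s]    = expected reward.
        """
        m, n, _ = P.shape
        states = np.arange(n)
        policy = np.zeros(n, dtype=int)         # arbitrary initial policy
        while True:
            # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
            P_pi = P[policy, states, :]         # n x n matrix of current policy
            r_pi = R[policy, states]
            v = np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)
            # Policy improvement: act greedily with respect to v, keeping
            # the current action when it is still among the best
            # (avoids cycling between equally good policies).
            Q = R + gamma * (P @ v)             # (m, n) action values
            best = Q.max(axis=0)
            stay = np.isclose(Q[policy, states], best)
            new_policy = np.where(stay, policy, Q.argmax(axis=0))
            if np.array_equal(new_policy, policy):
                return policy, v                # greedy w.r.t. own value: optimal
            policy = new_policy

    # A tiny made-up example: two states, two actions.
    P = np.array([[[0.8, 0.2],
                   [0.0, 1.0]],
                  [[0.1, 0.9],
                   [0.5, 0.5]]])
    R = np.array([[1.0, 0.0],
                  [0.0, 2.0]])
    policy, v = policy_iteration(P, R, gamma=0.9)
    print(policy, v)

Each improvement step strictly increases the value of at least one state, so the loop visits each policy at most once; the open question discussed above is whether the number of iterations can be bounded by a polynomial.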

Concurrently, I have been looking at a restricted class of MDPs, which I call MDP(2) or the leaky model, to simplify the problem and gain insight into structure that may carry over to the general case. I have made connections to feasibility in two-variable-per-inequality (TVPI) linear systems and to the related generalized network flow and generalized shortest path problems, along with the applicability of their corresponding algorithms. Further "lower-level" constraints, based on the linearity of the MDP problem and its graph structure, have also been discovered; these suggest analysis techniques and new algorithms, and point to a polynomial running time for policy iteration on MDP(2). A technical report on this is underway.
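To make the TVPI connection concrete (this is my own illustrative reading; the page does not define the leaky model precisely): if each action $a$ in state $s$ moves to a single successor state $\sigma(s, a)$ with probability $p(s, a)$ and otherwise "leaks" to an absorbing zero-reward sink, then the linear-programming formulation of the Bellman equations needs only inequalities with at most two value variables each:

\[
v_s \;\ge\; R(s, a) + \gamma\, p(s, a)\, v_{\sigma(s, a)}
\qquad \text{for all } s \in S,\ a \in A,
\]

which is exactly the two-variable-per-inequality form $a x_i + b x_j \le c$. Presumably this is what lets feasibility algorithms for TVPI systems, and the machinery of generalized network flow, be brought to bear.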

Publications


