Markov Decision Processes

Overview

The Markov Decision Process (MDP) framework for decision making, planning, and control is surprisingly rich in capturing the essence of purposeful activity in many situations. These models and their associated problems arise in many areas, including medical decision making, maintenance planning, and robot navigation, and they occupy a substantial portion of research in operations research. MDPs are also a source of clean and challenging computational problems. My Ph.D. thesis research focuses primarily on the computational complexity and algorithmic issues of infinite-horizon MDPs and related models and problems. I am advised by professors Richard Anderson and Steve Hanks. My committee members include Jim Burke from the mathematics department, Anne Condon, and Dick Karp.

Recently, Steve Hanks, Anne Condon, and I investigated the computability of several infinite-horizon Partially Observable MDP (POMDP) problems. We had shown most problems undecidable under the so-called undiscounted optimality criteria, by making connections to results on probabilistic finite automata, but the discounted case remained open and vexing! We recently settled it, and I presented the paper, titled "On the Undecidability of Probabilistic Planning and Infinite-Horizon Partially Observable Markov Decision Problems," at AAAI-99. A prior paper on the subject reports on some of the then-open problems, and also discusses properties that discounted and undiscounted unobservable MDPs have or lack, including a counterexample to a desirable periodicity property that discounted unobservable MDPs might have had. I presented that paper at the AAAI-98 Fall Symposium on planning with POMDPs.

Currently, I am investigating the computational complexity of infinite-horizon (fully observable) MDPs. To get an idea of the area and some open problems here, you may want to check out my generals exam report.
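To make the discounted, fully observable model concrete, here is a minimal sketch of value iteration on a tiny MDP. The two-state, two-action model and all of its transition and reward numbers are invented purely for illustration; they do not come from any of the papers mentioned above.

```python
# A minimal discounted MDP and value iteration (illustrative numbers only).
import numpy as np

# Two states, two actions.
# P[a][s][s'] = probability of moving from s to s' under action a.
# R[a][s]     = expected immediate reward for taking action a in state s.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.7, 0.3]],   # action 1
])
R = np.array([
    [1.0, 0.0],                 # action 0
    [0.5, 2.0],                 # action 1
])
beta = 0.9                      # discount factor, 0 < beta < 1

def value_iteration(P, R, beta, iters=1000):
    """Iterate the Bellman optimality operator; it is a beta-contraction,
    so the iterates converge to the optimal value function."""
    V = np.zeros(P.shape[1])
    for _ in range(iters):
        Q = R + beta * P @ V    # Q[a][s]: one-step lookahead value
        V = Q.max(axis=0)       # act greedily over actions
    return V

V = value_iteration(P, R, beta)
```

Since the Bellman operator contracts distances by a factor of beta per step, 1000 iterations drive the error far below floating-point noise for this example.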
An important open problem here is whether the algorithm commonly called policy iteration in the MDP literature, credited to Richard Bellman and Ron Howard, has polynomial running time. Variants of the algorithm exhibit very good performance in practice. A preliminary technical report describes a few structural properties that MDPs and related problems enjoy, from which not only the correctness of algorithms such as policy iteration can be shown, but also some constraints on the search path of policy iteration can be derived. The report presents these constraints and raises the question of whether they lead to upper bounds better than exponential. I later discovered a probabilistic argument showing that a random process constrained in this way has expected exponential running time.

Concurrently, I have been looking at a restricted class of MDPs, which I call MDP(2), or the leaky model, to simplify the problem and gain insight into structure that is perhaps applicable to the general case. I have made connections to feasibility in two-variable-per-inequality (TVPI) linear systems and to the related generalized network flow and generalized shortest path problems, along with the applicability of their corresponding algorithms. More "lower level" constraints, based on the linearity of the MDP problem and on graph structure, have also been discovered; these suggest analysis techniques and new algorithms, and point to a polynomial running time for policy iteration on MDP(2). A technical report on this is underway.
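For reference, the policy iteration algorithm discussed above alternates exact policy evaluation (solving a linear system) with greedy policy improvement, terminating when the policy stops changing. Below is a hedged sketch on the same kind of tiny, invented two-state MDP; the numbers are illustrative only and are not taken from any report mentioned here.

```python
# Howard-style policy iteration on a small discounted MDP (invented numbers).
import numpy as np

P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0 transitions
    [[0.5, 0.5], [0.7, 0.3]],   # action 1 transitions
])
R = np.array([
    [1.0, 0.0],                 # action 0 rewards
    [0.5, 2.0],                 # action 1 rewards
])
beta = 0.9

def policy_iteration(P, R, beta):
    n_actions, n_states, _ = P.shape
    pi = np.zeros(n_states, dtype=int)          # start from an arbitrary policy
    while True:
        # Policy evaluation: solve (I - beta * P_pi) V = R_pi exactly.
        idx = np.arange(n_states)
        P_pi = P[pi, idx]                       # P_pi[s][s'] under current policy
        R_pi = R[pi, idx]
        V = np.linalg.solve(np.eye(n_states) - beta * P_pi, R_pi)
        # Policy improvement: act greedily with respect to V.
        Q = R + beta * P @ V
        new_pi = Q.argmax(axis=0)
        if (new_pi == pi).all():                # no change: pi is optimal
            return pi, V
        pi = new_pi

pi, V = policy_iteration(P, R, beta)
```

Each improvement step either strictly improves the value of some state or leaves the policy fixed, so on a finite MDP the loop terminates; the open question discussed above is how many such iterations can occur in the worst case.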
Computer Science & Engineering, University of Washington, Box 352350, Seattle, WA 98195-2350. (206) 543-1695 voice, (206) 543-2969 FAX. [comments to Omid Madani]