Title: Human Priors for Reinforcement Learning

Advisors: Pedro Domingos & Sidd Srinivasa

Abstract: Current methods for deep reinforcement learning incorporate only minimal prior knowledge about the environment, learning from the reward signal and at most a few other sparse signals. This greatly limits the speed of learning. We propose to incorporate a few basic priors that humans are known to use: "Infants divide perceptual arrays into units that move as connected wholes, that move separately from one another, that tend to maintain their size and shape over motion, and that tend to act upon each other only on contact" [1]. Based on these, we learn a model of the environment directly from the entire perceptual stream, by detecting objects, predicting their motion, and learning from the errors made. This model requires no supervision, yet quickly and accurately develops a compact object-level representation of its environment. We demonstrate the power of such a representation by combining it with a simple linear value function estimator, using object properties and relations as features, to learn effective control policies. We test this approach on Atari video games, and learn up to four orders of magnitude faster than deep Q-networks, rendering rapid desktop experiments in this domain feasible. Our system is also, to our knowledge, the first to learn faster than humans.
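The combination the abstract describes, a linear value function over object-level features, can be sketched as follows. This is a minimal illustration, not the speakers' implementation: the feature names (paddle and ball positions, their horizontal distance) and the TD(0) update rule are assumptions chosen to make the idea concrete.

```python
import numpy as np

def object_features(paddle_x, ball_x, ball_y):
    """Hypothetical object-level features: a bias term, object properties
    (positions), and one relation (ball-paddle horizontal distance).
    These names are illustrative, not taken from the talk."""
    return np.array([1.0, paddle_x, ball_x, ball_y, abs(ball_x - paddle_x)])

class LinearValueEstimator:
    """V(s) = w . phi(s), with weights updated by tabular-style TD(0)."""
    def __init__(self, n_features, alpha=0.01, gamma=0.99):
        self.w = np.zeros(n_features)  # linear weights, one per feature
        self.alpha = alpha             # learning rate
        self.gamma = gamma             # discount factor

    def value(self, phi):
        return float(self.w @ phi)

    def td_update(self, phi, reward, phi_next):
        # TD(0): move V(s) toward r + gamma * V(s')
        td_error = reward + self.gamma * self.value(phi_next) - self.value(phi)
        self.w += self.alpha * td_error * phi
        return td_error

# Toy usage: one transition where keeping the paddle under the ball pays off.
est = LinearValueEstimator(n_features=5)
phi = object_features(paddle_x=0.4, ball_x=0.5, ball_y=0.9)
phi_next = object_features(paddle_x=0.5, ball_x=0.5, ball_y=0.8)
err = est.td_update(phi, reward=1.0, phi_next=phi_next)
```

Because the feature vector has only a handful of entries per object, each update is a few multiply-adds, which is one plausible reason such a representation can be trained orders of magnitude faster than a deep network operating on raw pixels.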

[1] Spelke, Elizabeth S. (1990). "Principles of Object Perception". In: Cognitive Science 14.1, pp. 29–56.


CSE2 371 (Gates Center)
Friday, March 1, 2019 - 15:00 to 16:30