Current Projects

Dynamic High Resolution Deformable Articulated Tracking

For robots to operate safely in human environments, they must accurately track the pose of humans and other dynamic articulated objects, both to avoid collisions and to perform physical interaction safely. The last several years have seen significant progress in using depth cameras to track articulated objects such as human bodies, hands, and robotic manipulators. Most approaches track the skeletal parameters of a fixed shape model, which makes them insufficient for applications that require accurate estimates of deformable object surfaces. To overcome this limitation, we present a 3D model-based tracking system for articulated deformable objects. Our system tracks human body pose and high-resolution surface contours in real time using a commodity depth sensor and GPU hardware.

SE3-Nets: Learning Rigid Body Motion using Deep Neural Networks

In this work, we explore the use of deep learning to acquire a notion of physical intuition. We introduce SE3-Nets, deep networks designed to model rigid body motion from raw point cloud data. Given only pairs of 3D point clouds, a continuous action vector, and point-wise data associations, SE3-Nets learn to segment the affected object parts and predict their motion resulting from the applied force. We show on three simulated scenarios that the structure underlying SE3-Nets enables them to generate far more consistent predictions of object motion than traditional flow-based networks.
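
To make the architecture concrete, here is a minimal sketch in PyTorch of the output layer at the heart of this idea (names and shapes are our illustration, not the paper's code): the network predicts soft per-point masks over K rigid parts plus one SE(3) transform per part, and the predicted next point cloud is the mask-weighted blend of the rigidly transformed inputs.

    import torch

    def se3_blend(points, masks, rotations, translations):
        # points:       (B, N, 3) input point cloud
        # masks:        (B, N, K) soft assignment of each point to K rigid parts
        # rotations:    (B, K, 3, 3) predicted rotation per part
        # translations: (B, K, 3)   predicted translation per part
        # Apply every part transform to every point: (B, K, N, 3)
        moved = torch.einsum('bkij,bnj->bkni', rotations, points) \
                + translations[:, :, None, :]
        # Blend over parts with the mask weights: (B, N, 3)
        return torch.einsum('bnk,bkni->bni', masks, moved)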

The Best of Both Modes: Separately Leveraging RGB and Depth for Unseen Object Instance Segmentation

For robots to function in unstructured environments, they need the ability to recognize novel, unseen objects. We take a step in this direction by tackling the problem of segmenting unseen object instances in tabletop environments. However, the kind of large-scale real-world dataset required for this task typically does not exist for most robotic settings, which motivates the use of synthetic data. We propose a novel method that separately leverages synthetic RGB and synthetic depth for unseen object instance segmentation. Our method comprises two stages: the first operates only on depth to produce rough initial masks, and the second refines these masks with RGB. Surprisingly, our framework is able to learn from synthetic RGB-D data in which the RGB is non-photorealistic. To train our method, we introduce a large-scale synthetic dataset of random objects on tabletops. We show that our method, trained on this dataset, produces sharp and accurate masks, outperforming state-of-the-art methods on unseen object instance segmentation. We also show that our method can segment unseen objects for robot grasping. Code, models, and video can be found at https://rse-lab.cs.washington.edu/projects/unseen-object-instance-segmen....
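
As a rough illustration of the two-stage design (module internals and names here are assumptions, not the released code): depth alone proposes instance masks, since depth transfers from synthetic to real data more gracefully than non-photorealistic RGB, and an RGB network then sharpens the proposed boundaries.

    import torch

    def segment_unseen(rgb, depth, depth_net, refine_net):
        # rgb:   (B, 3, H, W) color image; depth: (B, 1, H, W) depth image
        with torch.no_grad():
            initial_masks = depth_net(depth)          # stage 1: rough masks from depth only
            refined = refine_net(rgb, initial_masks)  # stage 2: RGB refines mask boundaries
        return refined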

DeepIM: Deep Iterative Matching for 6D Pose Estimation

Estimating the 6D pose of objects from images is an important problem in applications such as robot manipulation and virtual reality. While direct regression from images to object poses has limited accuracy, matching rendered images of an object against the observed image can produce accurate results. In this work, we propose DeepIM, a novel deep neural network for 6D pose matching. Given an initial pose estimate, the network iteratively refines the pose by matching the rendered image against the observed image. The network is trained to predict a relative pose transformation using an untangled representation of 3D location and 3D orientation and an iterative training process. Experiments on two commonly used benchmarks for 6D pose estimation demonstrate that DeepIM achieves large improvements over state-of-the-art methods. We furthermore show that DeepIM is able to match previously unseen objects.
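
The refinement loop itself is simple; a hedged sketch, where render and predict_delta stand in for the actual renderer and trained network:

    import numpy as np

    def refine_pose(pose, observed, render, predict_delta, iters=4):
        # pose: 4x4 object-to-camera transform from any coarse estimator
        for _ in range(iters):
            rendered = render(pose)                    # model rendered at current estimate
            delta = predict_delta(rendered, observed)  # predicted relative SE(3) correction
            pose = delta @ pose                        # apply correction and re-render
        return pose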

PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes

Estimating the 6D pose of known objects is important for robots interacting with the real world. The problem is challenging due to the variety of objects as well as the complexity of scenes caused by clutter and occlusions between objects. In this work, we introduce PoseCNN, a new convolutional neural network for 6D object pose estimation. PoseCNN estimates the 3D translation of an object by localizing its center in the image and predicting its distance from the camera. The 3D rotation of the object is estimated by regressing to a quaternion representation. We also introduce a novel loss function that enables PoseCNN to handle symmetric objects. In addition, we contribute a large-scale video dataset for 6D object pose estimation, the YCB-Video dataset, which provides accurate 6D poses of 21 objects from the YCB dataset observed in 92 videos with 133,827 frames. Extensive experiments on YCB-Video and the OccludedLINEMOD dataset show that PoseCNN is highly robust to occlusions, handles symmetric objects, and provides accurate pose estimates using only color images as input. When depth data is used to further refine the poses, our approach achieves state-of-the-art results on the challenging OccludedLINEMOD dataset.
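
Recovering the 3D translation from the localized 2D center and the predicted distance is standard pinhole back-projection; a minimal sketch with assumed camera intrinsics (fx, fy, px, py):

    def center_to_translation(cx, cy, z, fx, fy, px, py):
        # (cx, cy): object center in pixels; z: predicted distance from the camera
        return ((cx - px) * z / fx, (cy - py) * z / fy, z)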

Building Hierarchies of Concepts via Crowdsourcing

In this project, we propose a novel crowdsourcing system for inferring hierarchies of concepts. We develop a principled algorithm powered by the crowd that is robust to noise, efficient in picking questions, and cost-effective, and that builds high-quality hierarchies.

Graph-Based Inverse Optimal Control for Robot Manipulation

This project explores an approach to teaching robots manipulation tasks via human demonstrations. A human demonstrates the desired task (say, carrying a cup of water without spilling) by physically moving the robot. Given many such kinesthetic demonstrations, the robot applies a learning algorithm to learn a model of the underlying task. In a new scene, the robot uses this task model to plan a path that satisfies the task requirements.

DART: Dense Articulated Real-Time Tracking

This project aims to provide a unified framework for tracking any arbitrary articulated model, given its geometric and kinematic structure. Our approach uses dense input data (computing an error term on every pixel), which we process in real time by leveraging GPGPU programming and a very efficient representation of model geometry based on signed distance functions.
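
The flavor of the per-pixel error term can be sketched as follows (a simplification, not DART's actual code; the real system handles articulated models with per-joint frames and runs on the GPU): observed depth points are transformed into the model frame, where the model's signed distance function directly gives each point's residual.

    import numpy as np

    def sdf_error(points_cam, T_model_from_cam, sdf):
        # points_cam: (N, 3) observed depth points in the camera frame
        # sdf(p): interpolated signed distance of the model geometry at point p
        pts = points_cam @ T_model_from_cam[:3, :3].T + T_model_from_cam[:3, 3]
        residuals = np.array([sdf(p) for p in pts])  # zero for points on the surface
        return np.sum(residuals ** 2)                # minimized over the pose parameters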

Language Grounding in Robotics

A number of long-term goals in robotics, for example using robots in household settings, require robots to interact with humans. In this project, we explore how robots can learn to correlate natural language with the physical world being sensed and manipulated, an area of research known as grounded language acquisition.

Hierarchical Matching Pursuit for RGB-D Recognition

Hierarchical Matching Pursuit uses sparse coding to learn codebooks at each layer in an unsupervised way and then builds hierarchical feature representations from the learned codebooks. It achieves state-of-the-art results on many types of recognition tasks.
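
A toy version of one layer, using scikit-learn's dictionary learning as a stand-in for the K-SVD and orthogonal matching pursuit used in the actual work:

    import numpy as np
    from sklearn.decomposition import MiniBatchDictionaryLearning

    patches = np.random.rand(10000, 75)   # stand-in data: 5x5x3 patches, flattened
    layer = MiniBatchDictionaryLearning(n_components=128,
                                        transform_algorithm='omp',
                                        transform_n_nonzero_coefs=5)
    codes = layer.fit(patches).transform(patches)  # sparse codes over the learned codebook
    feature = codes.max(axis=0)  # max pooling (here over all patches; in practice
                                 # over the cells of a spatial pyramid)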

RGB-D Object Dataset

The RGB-D Object Dataset is a large dataset of 300 common household objects, organized into 51 categories arranged using WordNet hypernym-hyponym relationships (similar to ImageNet). The dataset was recorded using a Kinect-style 3D camera that captures synchronized and aligned 640x480 RGB and depth images at 30 Hz.


RGB-D Mapping: Using Depth Cameras for Dense 3D Mapping

Simultaneous localization and mapping (SLAM) has been a major focus of mobile robotics work for many years. We combine state-of-the-art visual odometry and pose-graph estimation techniques with a combined color and depth camera to make accurate, dense maps of indoor environments.

RGB-D Object Recognition and Detection

In this project we address joint object category, instance, and pose recognition in the context of rapid advances in RGB-D cameras that combine visual and 3D shape information. The focus is on the detection and classification of objects in indoor scenes, such as domestic environments.

Attribute Based Object Identification

We introduce an approach for identifying objects based on natural language descriptions that contain appearance and name attributes.


Data-Efficient Robot Reinforcement Learning

This project aims to develop and apply novel reinforcement learning methods to low-cost, off-the-shelf robots so that they can learn tasks in only a few trials.

Robotic In-Hand 3D Object Modeling

We address the problem of active object investigation using robotic manipulators and Kinect-style RGB-D depth sensors. To do so, we jointly tackle the issues of sensor to robot calibration, manipulator tracking, and 3D object model construction. We additionally consider the problem of motion and grasp planning to maximize coverage of the object.

Object Modeling During Scene Reconstruction

We segment objects during scene reconstruction rather than afterward, as is typical. The emphasis is on merging information gathered at different points in time to improve existing object and scene models.

Gaussian Processes for Bayesian State Estimation

The goal of this project is to integrate Gaussian process prediction and observation models into Bayes filters. The resulting GP-BayesFilters are more accurate than standard Bayes filters based on parametric models. In addition, GP models naturally supply the process and observation noise needed by Bayesian filters.
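
A toy sketch of the idea for a particle filter's prediction step, with synthetic one-dimensional dynamics (everything here is illustrative, not the project's code):

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor

    rng = np.random.default_rng(0)
    # Learn the motion model delta = f(state, control) from data instead of
    # specifying it parametrically.
    states = rng.uniform(-1, 1, (200, 1))
    controls = rng.uniform(-1, 1, (200, 1))
    deltas = np.sin(states[:, 0]) * controls[:, 0]   # stand-in "true" dynamics
    gp = GaussianProcessRegressor().fit(np.hstack([states, controls]), deltas)

    def gp_predict_step(particles, control):
        x = np.hstack([particles[:, None], np.full((len(particles), 1), control)])
        mean, std = gp.predict(x, return_std=True)   # GP also supplies the process noise
        return particles + mean + rng.normal(scale=std)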

Object Segmentation from Motion

We cannot be sure where one object ends and another begins unless we see them move relative to each other. In this project we investigate using motion as a cue to segment objects. We can make use of passive sensing or active vision, and of both long-term and short-term motion, to aid segmentation.



Inactive Projects

Learning to Navigate Through Crowded Environments

In this project we use inverse reinforcement learning to train a planner for natural and efficient robotic motion in crowded environments.

RGB-D Kernel Descriptors

Kernel descriptors provide a general approach for extracting multi-level representations from high-dimensional structured data such as images, depth maps, and 3D point clouds.

3-D Object Discovery Using Motion

In contrast to object recognition or object detection, which match data to existing object models, object discovery creates object models. We therefore need other information sources to compensate for the lack of models. In this project, we investigate using the 3-D motion of surface patches between multiple maps of the same environment as such a cue.

Robot Game Playing

The Gambit manipulator is a novel robotic arm combined with an RGB-D camera, used for interacting dexterously with small-scale physical objects, as in game playing.

Robotic Pile Sorting and Manipulation

We are investigating strategies for robot interaction with piles of objects and materials in cluttered scenes. In particular, interaction with unstructured sets of objects allows a robot to explore and manipulate novel items in order to perform useful tasks, such as counting, arranging, or sorting, even without a prior model of the objects.


Active Mapping

We can improve any geometric or semantic scene reconstruction technique by adding a robot: the robot can choose how to use its sensors to gather new information. In the case of active vision, the robot can move a camera around to view, for example, all sides of an object. We use active vision for the map completion and object segmentation problems. Map completion is the problem of viewing all surfaces in a map; active segmentation involves looking at areas likely to contain object boundaries in order to determine which of them are in fact boundaries.

Robot Localization

Robot localization is an important application driving our research in belief representations and particle filtering for state estimation. Localization is one of the most fundamental problems in mobile robotics. With our collaborators, we introduced grid-based approaches, tree-based representations, and particle filters for robot localization. We were the first to solve the global localization problem, which requires a robot to estimate its position within an environment from scratch, i.e., without knowledge of its start position.
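
A generic sketch of one particle filter localization step (the motion and measurement models are placeholders for the application-specific ones):

    import numpy as np

    def mcl_step(particles, weights, control, z, sample_motion, likelihood):
        # particles: (N, 3) poses (x, y, heading)
        particles = sample_motion(particles, control)  # predict: sample the motion model
        weights = weights * likelihood(z, particles)   # correct: weight by the sensor model
        weights /= weights.sum()
        idx = np.random.choice(len(particles), size=len(particles), p=weights)
        return particles[idx], np.full(len(particles), 1.0 / len(particles))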


Mapping and Exploration

We are interested in developing robust and efficient map building techniques. We have developed different solutions to this problem, ranging from expectation maximization (EM) to Rao-Blackwellised particle filters. We have also introduced novel coordination strategies for large teams of mobile robots. Within the Centibots project, we developed a decision-theoretic approach that enables teams of robots to build a consistent map of an environment even when the robots start from different, completely unknown locations.

Centibots: the Hundred-Robots Project

The Centibots system is a framework for very large teams of robots that are able to perceive, explore, plan and collaborate in unknown environments. The Centibots were developed in collaboration with SRI International, funded under DARPA's SDR program.

Museum Tour-guide Robots

The reliability of probabilistic methods for mobile robot navigation has been demonstrated through the deployment of the mobile robots Rhino and Minerva as tour guides in two crowded museums. The task of these robots was to guide people through the exhibitions of the "Deutsches Museum Bonn" in Germany and the "National Museum of American History" in Washington, D.C.

Plant Care

The plant care project helps us to investigate how mobile robots can interact with environments that are equipped with networks of sensors. The task of the robot is to water the plants and calibrate the sensors in the environment.


People Tracking

Knowing and predicting the locations of people moving through an environment is a key component of many proactive service applications, including mobile robots. Depending on the task and the available sensors, we apply joint probabilistic data association filters, Rao-Blackwellised particle filters, and Voronoi-based particle filters to estimate the locations of people. These estimates form the foundation for learning typical motion patterns of people, as used in the activity recognition project.

Active Sensing and Estimation in RoboCup

The task sounds simple: program Sony AIBO robots to play soccer. We use RoboCup to investigate techniques for multi-robot collaboration, active sensing, and efficient state estimation. Our multi-model technique for ball tracking allows our robots to accurately track the ball and its interactions with the environment, even under the highly non-linear dynamics typically occurring during a soccer game. Our active sensing strategy is based on reinforcement learning and takes into account which uncertainty has to be minimized at each point in time.

Particle Filters

With our collaborators, we introduced particle filters as a powerful tool for state estimation in mobile robotics. More recently, we developed several improvements to particle filters, including adaptive particle filters, which dynamically adapt the size of sample sets to the complexity of the underlying belief. We also developed real-time particle filters, which avoid loss of sensor information even under limited computational resources.
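
For instance, KLD-sampling (Fox, 2003) picks the number of samples so that, with confidence given by the quantile z, the KL divergence between the sampled and true belief stays below a bound epsilon, where k is the number of histogram bins currently occupied by particles. A sketch of the bound:

    from math import sqrt

    def kld_sample_size(k, epsilon=0.05, z=2.326):   # z: 0.99 quantile of the normal
        if k <= 1:
            return 1
        a = 2.0 / (9.0 * (k - 1))
        return int((k - 1) / (2.0 * epsilon) * (1.0 - a + sqrt(a) * z) ** 3)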

Activity Recognition

This project aims at learning and estimating high-level activities from raw sensor data. To do so, we rely strongly on the estimates generated by our people tracking approaches. We recently demonstrated that it is possible to learn a person's typical outdoor navigation patterns from raw GPS data. For example, our approach uses EM to learn where a person typically gets on or off the bus. Such techniques allow hand-held computing devices to assist people with cognitive disorders in their everyday lives.

Semantic Mapping

The goal of this project is to generate models that describe environments in terms of objects and places. Such representations contain far more useful information than traditional maps, and enable robots to interact with humans in a more natural way.