RGB-D Object Dataset
- December 13, 2012 - Software and data for detection-based object labeling in Kinect videos now available here.
- October 3, 2012 - The dataset is now available for download directly from the website! No more sending emails necessary (questions and suggestions are, of course, still welcomed!).
- April 6, 2012 - RGB-D kernel descriptors are now available.
- March 22, 2012 - 3D reconstructions created by aligning video frames of all 8 scenes in the RGB-D Scenes Dataset are now available.
- June 20, 2011 - Pose annotations for all 300 objects in the RGB-D Object Dataset are now available.
The RGB-D Object Dataset is a large dataset of 300 common household objects. The objects are organized into 51 categories arranged using WordNet hypernym-hyponym relationships (similar to ImageNet). The dataset was recorded using a Kinect-style 3D camera that records synchronized and aligned 640x480 RGB and depth images at 30 Hz. Each object was placed on a turntable and video sequences were captured for one full rotation. For each object there are 3 video sequences, each recorded with the camera mounted at a different height, so that the object is viewed from different angles relative to the horizon.
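Since each frame pairs an RGB image with an aligned depth image, the depth data can be back-projected into a 3D point cloud with the standard pinhole camera model. The sketch below illustrates this; the focal lengths and principal point are assumed placeholder values for a Kinect-style 640x480 camera, not the dataset's actual calibration.

```python
import numpy as np

# Assumed intrinsics for a Kinect-style 640x480 depth camera; the
# dataset's actual calibration parameters may differ.
FX, FY = 570.3, 570.3   # focal lengths in pixels (assumption)
CX, CY = 320.0, 240.0   # principal point (assumption: image center)

def depth_to_points(depth_mm):
    """Back-project a 640x480 depth image (millimeters) to 3D points (meters).

    Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, with Z the
    metric depth. Pixels with zero depth (no sensor reading) are dropped.
    """
    h, w = depth_mm.shape
    v, u = np.mgrid[0:h, 0:w]                 # pixel row/column grids
    z = depth_mm.astype(np.float64) / 1000.0  # mm -> m
    valid = z > 0
    x = (u[valid] - CX) * z[valid] / FX
    y = (v[valid] - CY) * z[valid] / FY
    return np.column_stack([x, y, z[valid]])  # (N, 3) array of points
```

A pixel at the principal point with a 1000 mm reading maps to the point (0, 0, 1) in camera coordinates, which is a quick sanity check for the intrinsics.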
Unlike many existing datasets, such as Caltech 101 and ImageNet, objects in this dataset are organized into both categories and instances. In those datasets, the class dog contains images of many different dogs, and there is no way to tell whether two images show the same dog; in the RGB-D Object Dataset, the category soda can is divided into physically unique instances like Pepsi Can and Mountain Dew Can. The dataset also provides ground truth pose information for all 300 objects.
Here are some example objects that have been segmented from the background.
RGB-D Scenes Dataset
In addition to isolated views of the 300 objects, the RGB-D Object Dataset includes 8 annotated video sequences of natural scenes containing objects from the dataset. The scenes cover common indoor environments, including office workspaces, meeting rooms, and kitchen areas. The objects are visible from different viewpoints and distances and may be partially or completely occluded in some frames.
This work was funded in part by an Intel grant, ONR MURI grants N00014-07-1-0749 and N00014-09-1-1052, and NSF award IIS-0812671.