Scanning with Shadows
(CSE 558 Project)

Jiwon Kim

Jia-chi Wu


Overview

The basic idea of this project is to wave a stick in front of a light source so that its shadow sweeps across the object of interest, and to recover the object's 3D shape from the distortion of the shadow. More specifically, we first compute the "shadow plane" defined by the shadow edge on the ground plane and the light source. Intersecting this plane with the optical ray through an image pixel lying on the shadow edge then gives the 3D coordinates of the corresponding scene point.

This technique was developed by Jean-Yves Bouguet and Pietro Perona at Caltech. More information can be found on Jean-Yves's web page.


Experimental Setup

The following two pictures show the experimental setups used in this project, one for each of the two variants of the technique. The first picture shows the one-plane configuration, where the shadow is cast on the object and a single ground plane, and the light source (located outside the top-left corner of the picture) must be calibrated. The second picture shows the two-plane setup, with one horizontal and one vertical plane; here the light source need not be calibrated.


One-plane setup


Two-plane setup

A Canon Optura PI camcorder was used to capture the scene. We tested a few variations of the equipment. To cast the shadow, we tried both a solid stick and a rope held taut by two people, one at each end. The light source (a regular incandescent light) was tested both bare and covered with a sheet of paper with a hole in the center to enhance the focus.


Description of the Method

Camera and light calibration

For camera calibration, we used the Camera Calibration Toolbox from Caltech (and a checkerboard).

For light calibration, which is needed for the one-plane method, we used the following approach:

  1. Take pictures of a calibration object, preferably sharp-edged, and its shadow at two different positions.
  2. Calculate the 3D coordinates of the base (B) and the tip (Ts) of the shadow from their image coordinates and the extrinsic parameters of the camera, and the 3D coordinates of the tip of the object (T) using its measured height (h).
  3. Calculate the intersection, in the least-squares sense, of the two lines through T and Ts (one line per image), which gives the position of the light source.

Light calibration with the calibration object at two different positions
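Step 3 of the light calibration can be sketched as follows. This is only an illustration under our own assumptions, not the code used in the project: the function name `estimate_light_position` and the sample coordinates are made up. Because of measurement error the two lines rarely meet exactly; for two lines, the point that minimizes the sum of squared distances to both is the midpoint of the shortest segment joining them.

```python
def sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def estimate_light_position(tip1, shadow_tip1, tip2, shadow_tip2):
    """Least-squares intersection of the 3D lines Ts1->T1 and Ts2->T2.

    Hypothetical helper: each argument is an [x, y, z] point.
    """
    d1 = sub(tip1, shadow_tip1)        # direction of line 1
    d2 = sub(tip2, shadow_tip2)        # direction of line 2
    w = sub(shadow_tip1, shadow_tip2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("lines are (nearly) parallel")
    s = (b * e - c * d) / denom        # parameter of closest point on line 1
    t = (a * e - b * d) / denom        # parameter of closest point on line 2
    q1 = [shadow_tip1[i] + s * d1[i] for i in range(3)]
    q2 = [shadow_tip2[i] + t * d2[i] for i in range(3)]
    return [(q1[i] + q2[i]) / 2 for i in range(3)]


# Made-up example with the ground plane at z = 0 and object height h = 5.
light = estimate_light_position([1, 0, 5], [2, 0, 0], [0, 2, 5], [0, 4, 0])
```

With more than two positions of the calibration object, the same idea generalizes to a small linear least-squares problem over all lines.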

Extracting structure from shadow

The following two figures illustrate the geometric principle of the one-plane and two-plane methods.


One-plane method


Two-plane method

This part of the procedure can be summarized in three basic steps:

1. Shadow plane estimation

Compute the "shadow plane" at each frame, formed by the shadow edge on the ground plane and the light source (or by the shadow edges on the ground and vertical planes for the two-plane method).

This involves finding, for each image row specified by the user, the pixel at which the shadow edge crosses that row, a step the authors call spatial shadow edge localization. More specifically, we need to identify the pair of adjacent pixels between which the intensity changes from "normal" to "dark". Since a second localization takes place along the time dimension (in the next step), a single per-pixel criterion is used to define "normal" and "dark" in both:

Imin(x,y) = min over t of I(x,y,t)
Imax(x,y) = max over t of I(x,y,t)
Ishadow(x,y) = (Imin(x,y) + Imax(x,y)) / 2

where (x, y) are the pixel coordinates and t is the time (or frame index). An intensity above Ishadow is normal, and an intensity below it is dark.
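As a rough sketch of this criterion, the per-pixel thresholds could be computed as below. The function name `shadow_thresholds` and the `frames[t][y][x]` layout (a list of grayscale frames, each a list of rows) are assumptions made for illustration, not the layout of our program.

```python
def shadow_thresholds(frames):
    """Return per-pixel (Imin, Imax, Ishadow) images.

    frames[t][y][x] is the intensity of pixel (x, y) in frame t
    (assumed layout, for illustration only).
    """
    height, width = len(frames[0]), len(frames[0][0])
    i_min = [[min(f[y][x] for f in frames) for x in range(width)]
             for y in range(height)]
    i_max = [[max(f[y][x] for f in frames) for x in range(width)]
             for y in range(height)]
    # The threshold is halfway between the darkest and brightest value
    # the pixel ever takes during the sweep.
    i_shadow = [[(i_min[y][x] + i_max[y][x]) / 2 for x in range(width)]
                for y in range(height)]
    return i_min, i_max, i_shadow


# Three tiny 1x2 frames with made-up intensities.
frames = [[[200, 100]], [[40, 90]], [[180, 110]]]
i_min, i_max, i_shadow = shadow_thresholds(frames)
```

The difference Imax - Imin at a pixel is also a natural measure of its contrast: a pixel whose intensity barely changes during the sweep gives an unreliable threshold.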

Once the edge pixels are collected, a 2D line is fitted to them by the least-squares method. The 3D back-projection of this 2D line is then calculated using the equation of the plane on which the shadow edge lies. The equation of the horizontal (ground) plane is known from camera calibration. The equation of the vertical plane is computed from the horizontal plane and the 2D projection of the line along which the two planes intersect, which is calculated from a set of points the user specifies on the image.
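The line fit can be sketched as an ordinary least-squares regression. We assume here, purely for illustration, that there is one edge pixel per selected row, so the line is written as x = m*y + c and the error is measured along x; the function name `fit_edge_line` is hypothetical.

```python
def fit_edge_line(edge_pixels):
    """Fit x = m*y + c to a list of (x, y) edge pixels by least squares."""
    n = len(edge_pixels)
    if n < 2:
        raise ValueError("need at least two edge pixels")
    mean_x = sum(x for x, _ in edge_pixels) / n
    mean_y = sum(y for _, y in edge_pixels) / n
    s_yy = sum((y - mean_y) ** 2 for _, y in edge_pixels)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in edge_pixels)
    if s_yy == 0:
        raise ValueError("all edge pixels lie on the same row")
    m = s_xy / s_yy
    c = mean_x - m * mean_y
    return m, c


# Made-up edge pixels found on rows 100, 120 and 140.
m, c = fit_edge_line([(60, 100), (70, 120), (80, 140)])
```

Two points on the fitted line can then be back-projected onto the known plane to obtain the 3D shadow edge.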

Finally, in the one-plane method, the shadow plane is defined as the plane containing the shadow edge on the ground plane and the light source. In the two-plane method, the shadow edge on the vertical plane is used in place of the light position. In this case, the plane is fitted by least squares because the two lines may not be exactly coplanar.
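For the one-plane case, the construction can be sketched as a plane through three points: two 3D points on the ground shadow edge and the light source. The function name `shadow_plane` and the (n, d) representation of the plane as n . X = d are our own choices for this sketch; the least-squares fit for the two-plane case is not shown.

```python
import math


def sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def shadow_plane(edge_a, edge_b, light):
    """Plane through two points of the ground shadow edge and the light.

    Returns (n, d) with unit normal n such that n . X = d on the plane.
    """
    n = cross(sub(edge_b, edge_a), sub(light, edge_a))
    length = math.sqrt(dot(n, n))
    if length < 1e-12:
        raise ValueError("light source lies on the shadow edge line")
    n = [component / length for component in n]
    return n, dot(n, edge_a)


# Made-up example: shadow edge along the x axis, light 10 units above the origin.
n, d = shadow_plane([0, 0, 0], [1, 0, 0], [0, 0, 10])
```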

2. Shadow time estimation

Compute the "shadow time" for each pixel of the image, i.e. the frame at which the shadow edge passes over the pixel. This requires analyzing the intensity of each pixel over all frames and pinning down the pair of consecutive frames between which the intensity changes from "normal" to "dark". The intensity criterion is the same as in the previous step.
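A minimal sketch of this search for a single pixel is given below. The function name `shadow_time` is hypothetical, and the linear interpolation between the two frames is an optional refinement we assume for illustration; the description above only requires identifying the pair of frames.

```python
def shadow_time(intensities, i_shadow):
    """Return the (sub-frame) time at which a pixel goes from normal to dark.

    intensities[t] is the pixel's intensity in frame t and i_shadow is its
    threshold. Returns None if the shadow edge never crosses the pixel.
    """
    for t in range(len(intensities) - 1):
        before, after = intensities[t], intensities[t + 1]
        if before > i_shadow >= after:
            # Assumed refinement: interpolate linearly between frames t and t+1.
            return t + (before - i_shadow) / (before - after)
    return None


# Made-up intensity profile: the pixel darkens between frames 1 and 2.
t_shadow = shadow_time([200, 180, 60, 40], 120)
t_none = shadow_time([200, 190, 195], 120)
```

Pixels for which the function returns None have no shadow time and cannot be triangulated.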

3. Triangulation

The 3D coordinates of each image pixel can now be obtained by intersecting the optical ray that originates at the camera's optical center and passes through the pixel on the image plane with the shadow plane at the pixel's shadow time.
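The intersection itself is a ray-plane computation, sketched below under simplifying assumptions: an ideal pinhole camera with no lens distortion, intrinsics `fx`, `fy`, `cx`, `cy` taken from the camera calibration, and the shadow plane expressed in camera coordinates as n . X = d. The function names are hypothetical.

```python
def pixel_ray(u, v, fx, fy, cx, cy):
    """Direction of the optical ray through pixel (u, v), camera at the origin."""
    return [(u - cx) / fx, (v - cy) / fy, 1.0]


def triangulate(u, v, plane_n, plane_d, fx, fy, cx, cy):
    """Intersect the ray through pixel (u, v) with the plane n . X = d.

    Returns the 3D point in camera coordinates, or None if the ray is
    (nearly) parallel to the plane.
    """
    ray = pixel_ray(u, v, fx, fy, cx, cy)
    denom = sum(plane_n[i] * ray[i] for i in range(3))
    if abs(denom) < 1e-12:
        return None
    lam = plane_d / denom              # distance along the ray, in units of ray
    return [lam * r for r in ray]


# Made-up intrinsics and a plane at depth z = 2 in front of the camera.
point = triangulate(420, 240, [0, 0, 1], 2.0, 500.0, 500.0, 320.0, 240.0)
```

In the full pipeline, the plane passed in for each pixel would be the shadow plane of the frame given by that pixel's shadow time.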

Strengths and limitations

The biggest advantage of this method is the simplicity of the setup: it can be performed with equipment easily found at home.

The possible drawbacks of this approach include:

  1. Since the pattern on the object is created by *blocking* the light rather than projecting it, it is difficult to scan objects with dark-colored textures, as well as parts of objects that are already in shadow (the latter can be handled by taking multiple scans with different light source locations and merging them).
  2. As a natural byproduct of the simplicity of the method, there is a higher chance of error in the overall process. Possible sources of error are: the light calibration, where a small error in the standing angle of the calibration object, the measurement of its height, or the pixel selection may make a large difference; the vertical plane, which may not be exactly perpendicular to the ground plane; and the sweeping speed of the stick, which, if too fast, makes the shadow skip over too many pixels, making them invalid.

Results

Below are the pictures of the object we used for scanning.

For one-plane triangulation, the user specifies a set of rows that the program uses to localize the shadow edge. The triangulation result is shown in the picture to the right below. Noise in the video frames made the estimated shadow times inaccurate, so for some pixels our program could not find the correct shadow plane for ray intersection. As a result, there are many "spikes" in the reconstructed scene.

Increasing the intensity threshold (used to determine which pixels can be used to localize the shadow edge) eliminates the spikes, but then fewer pixels can be used to reconstruct the scene (as shown in the picture below).

For two-plane triangulation, the user specifies a set of points along the intersection line of the horizontal and vertical planes, in addition to the set of rows used for shadow edge localization. There are some missing stripes in the reconstructed scene (picture to the right below). Because we were not able to move the stick at a slow, constant speed, our program could not reconstruct locations where the shadow of the stick passed too quickly, which resulted in those stripes.

For our program to work, brightness and high contrast are more important than a sharp shadow edge. Dim, low-contrast scenes recorded by the digital video camera contain much more noise, which our program does not handle well, so many spikes appear in the reconstructed scene.

We also tried objects with colored textures, but the colors decreased the contrast and we were not able to reconstruct the objects with our program.

Here is a zip file that contains the PLY files (for Scanalyze), kindly processed (unneeded data dropped and the data transformed) and provided by our instructors.


April 2001