Jon and Harlan's
Video Alpha Matting Project

CSE557 Winter 2001: Brian Curless


Motivation

We want to extract a foreground element from a scene and composite it onto a new background. We implemented a Java program to help automate the task. In addition, our program is designed for use on video sequences. Here is a simple example of our work.

Starting with the original picture:

Scissoring (Harlan's)

We can scissor along the edges:

Then assign thicknesses to create the trimap:

Alpha Matte (Jon's)

We then run this through the matting program to obtain a matte:

Then we composite it with a new image in GIMP:


Intelligent Scissoring

The first component of our program is intelligent scissoring, based on Intelligent Scissors for Image Composition by Eric Mortensen and William Barrett, http://www.cs.washington.edu/education/courses/cse490cv/02wi/readings/mort-sigg95.pdf, and more directly on the 490cv project description, http://www.cs.washington.edu/education/courses/cse490cv/CurrentQtr/projects/project1/project1.htm.
Once we got the basic implementation correct, we tried introducing other link cost metrics, but without improvement. Adding the Laplacian zero-crossing cost (as mentioned in the paper) and a color-difference-along-link cost gave surprisingly similar results.

I expected the area indicated with the arrow to improve when the color difference was included, but it did not, and the area indicated with the circle actually worsened. So we just left the original costs as detailed in the 490cv page.
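For reference, here is a minimal sketch of the kind of per-link cost we kept (names are ours for illustration, not the program's exact code): the intensity change D across a link is turned into a cost by subtracting it from the image-wide maximum, so links along strong edges are cheap and the minimum-cost path hugs the edge.

    // Sketch of a horizontal link cost (x,y)->(x+1,y). maxD is the largest
    // cross-link derivative over the whole image; diagonal links would
    // additionally be scaled by their sqrt(2) length.
    static double horizontalLinkCost(float[][] gray, int x, int y, double maxD) {
        // derivative perpendicular to the link, averaged over the two
        // rows bracketing it
        double above = (gray[y - 1][x] + gray[y - 1][x + 1]) / 2.0;
        double below = (gray[y + 1][x] + gray[y + 1][x + 1]) / 2.0;
        double d = Math.abs(above - below) / 2.0;
        return maxD - d; // high cross-link gradient -> low cost
    }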

Alpha Extraction

After all the regions are defined using intelligent scissoring or the freehand tool, the alpha component can be extracted for each pixel in the unknown region. The Alpha Matte program can be run in two modes and takes several parameters, described below.

The "speed" mode strips out much of the computation done in Chuang's algorithm described in [1], so it runs quite quickly. At each pixel, the mean foreground and background colors are estimated using neighboring pixels. In speed mode, known foreground and background colors are weighed exactly the same as unknown colors that have already been estimated, i.e., pixels in the unknown region whose alphas have already been computed. The algorithm tries increasingly larger windows until a certain number of foreground and background pixels have been found; the initial and maximum window sizes can be set. The mean foreground and background colors, Fmean and Bmean, are simply the averages of all these pixel colors.

In "accuracy" mode, k-means is used to compute multiple means for the foreground and background colors. The "errorbias" parameter indicates the preferred number of means: a low bias means a high number of means, a high bias a small number. Accuracy mode also weighs closer and higher-alpha pixels more heavily, so the mean colors found by this algorithm are weighted means of the neighboring pixels. The covariances of the mean colors are found similarly.
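To make the speed-mode search concrete, here is a minimal sketch with assumed names (label, solved, and the window parameters stand in for our actual variables); it illustrates the scheme above rather than reproducing the program verbatim.

    // label[y][x] is FG, BG, or UNKNOWN; solved[y][x] is true once an
    // unknown pixel's alpha has been computed. Grow the window until
    // enough foreground samples are found, then return their plain
    // average (Fmean). Bmean is gathered the same way with BG.
    static final int FG = 0, BG = 1, UNKNOWN = 2;

    static float[] meanForeground(float[][][] rgb, int[][] label,
                                  boolean[][] solved, int px, int py,
                                  int initWindow, int maxWindow, int minSamples) {
        int h = rgb.length, w = rgb[0].length;
        for (int win = initWindow; win <= maxWindow; win++) {
            float r = 0, g = 0, b = 0;
            int n = 0;
            for (int y = Math.max(0, py - win); y <= Math.min(h - 1, py + win); y++)
                for (int x = Math.max(0, px - win); x <= Math.min(w - 1, px + win); x++)
                    // known FG and already-solved unknowns weigh the same
                    if (label[y][x] == FG
                            || (label[y][x] == UNKNOWN && solved[y][x])) {
                        r += rgb[y][x][0]; g += rgb[y][x][1]; b += rgb[y][x][2];
                        n++;
                    }
            if (n >= minSamples) return new float[] { r / n, g / n, b / n };
        }
        return null; // not enough samples even at the maximum window
    }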

After the mean foreground and background colors are found, an iterative optimization procedure computes the alpha for the pixel. The first step optimizes the estimated foreground and background colors, F and B, with respect to alpha; the second step optimizes alpha with respect to the new F and B. The procedure is detailed in Chuang's paper. This is done for each pair of foreground and background means, and the pair with the least error is chosen. In speed mode, the first step is removed: the estimated F and B are simply Fmean and Bmean, and alpha is found using a single iteration of the second step. Alpha, F, and B are stored for future use.
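That single second-step iteration has a closed form: given fixed F and B, the best alpha is the projection of the observed color C onto the line from B to F (this is the standard alpha update from [1]; the names below are illustrative).

    // Closed-form alpha given fixed F and B. Colors are RGB float[3].
    static double alphaStep(float[] c, float[] f, float[] b) {
        double num = 0, den = 0;
        for (int i = 0; i < 3; i++) {
            num += (c[i] - b[i]) * (f[i] - b[i]);
            den += (f[i] - b[i]) * (f[i] - b[i]);
        }
        double alpha = den > 0 ? num / den : 0;
        return Math.min(1.0, Math.max(0.0, alpha)); // clamp to [0,1]
    }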

The actual color saved at a pixel is the original color with the estimated background contribution subtracted out. This helps to recover the true color of transparent and semi-transparent objects in the scene.
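Assuming the saved color is the premultiplied foreground (our reading of the compositing equation C = alpha*F + (1-alpha)*B), the subtraction looks like this sketch:

    // Subtracting the background term from C = alpha*F + (1-alpha)*B
    // leaves alpha*F, which composites correctly over any new background.
    static float[] storedColor(float[] c, float[] b, double alpha) {
        float[] out = new float[3];
        for (int i = 0; i < 3; i++)
            out[i] = (float) Math.max(0, c[i] - (1 - alpha) * b[i]); // = alpha*F
        return out;
    }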

The major problems I faced in implementing the program were speed and efficiency. For some reason, the matrix package I used was horribly slow; I got a 3x improvement simply by replacing matrix operations with direct array operations. Even then, the program still took about a minute for the lighthouse image. Since we wanted to make this work for video, further improvement was necessary.
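As an illustration of the kind of rewrite that helped, here is a 3-vector dot product done through Jama Matrix objects versus directly on arrays; the per-call object allocation in the former is one likely culprit for the slowdown.

    import Jama.Matrix;

    // Allocates two Matrix objects (plus intermediates) on every call.
    static double dotJama(double[] a, double[] b) {
        return new Matrix(a, 3).transpose().times(new Matrix(b, 3)).get(0, 0);
    }

    // The direct-array equivalent: no allocation, no indirection.
    static double dotDirect(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }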

Comparison with Chuang's Implementation

In accuracy mode, our two implementations are basically the same. The major difference is in the mean color calculation at each pixel. The paper refers to Orchard and Bouman's paper [2], which describes a way to cluster colors using a binary tree representation that expands each leaf node to minimize a weighted error in the color difference. Our program uses an iterative k-means algorithm to extract mean colors. We also talked to Yung-Yu a little about his program. It seemed to run relatively quickly considering all the necessary calculations. He mentioned using a very small window of 8 pixels, and also some kind of data structure that finds the nearest foreground and background pixels; I believe that is where much of the speed difference arises. Here are the comparison images.
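For concreteness, here is a minimal sketch of an iterative k-means of this kind (illustrative names; a fixed iteration count stands in for a real convergence test).

    // Assign each color sample to its nearest mean, recompute the means,
    // and repeat. samples and means are RGB float[3] triples.
    static float[][] kMeans(float[][] samples, float[][] means, int iters) {
        int k = means.length;
        for (int it = 0; it < iters; it++) {
            float[][] sum = new float[k][3];
            int[] count = new int[k];
            for (float[] s : samples) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int j = 0; j < k; j++) {       // nearest mean by squared
                    double d = 0;                   // color distance
                    for (int i = 0; i < 3; i++) {
                        double diff = s[i] - means[j][i];
                        d += diff * diff;
                    }
                    if (d < bestDist) { bestDist = d; best = j; }
                }
                count[best]++;
                for (int i = 0; i < 3; i++) sum[best][i] += s[i];
            }
            for (int j = 0; j < k; j++)             // recenter each mean
                if (count[j] > 0)
                    for (int i = 0; i < 3; i++) means[j][i] = sum[j][i] / count[j];
        }
        return means;
    }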

 

The original image. Magnified x2. (10768 pixels to process.)

Using our program in speed mode. Time taken: 4 sec.

Using our program in accuracy mode. Time taken: 66 sec.

Chuang's program. Time taken: ?? (probably between 4 and 66 sec).


Speed mode produces mattes that are not as accurate as those from accuracy mode, but the speed gained is well worth it. All of the programs are able to extract the rails. I am very impressed with Chuang's program; the resulting alpha mattes are very smooth, though there are still errors, like under the upper-left rail.

Alpha Matting for Video

After we got the extraction working, we put a little time into video. Our idea for evolving the boundary is pretty simple: find the best match for each seed point from one frame to the next by searching a small neighborhood for the best-matching region in terms of sum of squared error. Once the locations of all the new seed points are known, the minimum spanning paths between them are calculated on the new image. If the resulting paths are wrong, the user can simply modify them. A friend was in need of foreground extraction, and we did some work on the video available here: http://mis.pier41.net/post/car-mission.avi
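A sketch of that tracking step, with illustrative names and bounds checks omitted: scan a small search radius around the old seed location and keep the offset whose patch in the new frame has the lowest SSD against the patch around the old seed.

    // prev/next are grayscale frames; patch is the patch half-size and
    // radius the search half-size. Returns the new (x, y) of the seed.
    static int[] trackSeed(float[][] prev, float[][] next, int sx, int sy,
                           int patch, int radius) {
        int bestX = sx, bestY = sy;
        double bestErr = Double.MAX_VALUE;
        for (int dy = -radius; dy <= radius; dy++)
            for (int dx = -radius; dx <= radius; dx++) {
                double err = 0; // sum of squared differences for this offset
                for (int v = -patch; v <= patch; v++)
                    for (int u = -patch; u <= patch; u++) {
                        double d = prev[sy + v][sx + u]
                                 - next[sy + dy + v][sx + dx + u];
                        err += d * d;
                    }
                if (err < bestErr) { bestErr = err; bestX = sx + dx; bestY = sy + dy; }
            }
        return new int[] { bestX, bestY };
    }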
A representative input frame:

We then started the process with intelligent scissoring:
From this, as usual, we can extract the alpha matte:

Then we composite on a new background

Once we got it started, creating the alpha matte for each frame required very little user intervention. A portion of the resulting alpha extraction video is here.
We still need to address speed problems in transitioning between frames. We expect a significant speed increase from calculating only as much of the minimum spanning tree for the image as is absolutely necessary, but there hasn't been time to implement these changes yet.
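One way to do this is an early-exit Dijkstra: settle pixels in cost order from one seed and stop as soon as the next seed is settled, instead of expanding the tree over the entire image. A sketch, with assumed Node and Link types:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.PriorityQueue;

    static class Node {
        double cost = Double.MAX_VALUE; // best path cost found so far
        boolean settled = false;        // true once the cost is final
        Node prev;                      // back-pointer for path recovery
        List<Link> links = new ArrayList<Link>();
    }

    static class Link {
        Node target;
        double cost; // link cost from the scissoring cost function
    }

    static void shortestPathEarlyExit(Node start, Node goal) {
        PriorityQueue<Node> queue = new PriorityQueue<Node>(
                (a, b) -> Double.compare(a.cost, b.cost));
        start.cost = 0;
        queue.add(start);
        while (!queue.isEmpty()) {
            Node n = queue.poll();
            if (n.settled) continue; // stale queue entry; skip it
            n.settled = true;
            if (n == goal) return;   // path to the next seed is final; stop
            for (Link l : n.links) {
                Node m = l.target;
                if (!m.settled && n.cost + l.cost < m.cost) {
                    m.cost = n.cost + l.cost;
                    m.prev = n;
                    queue.add(m);    // lazy decrease-key: duplicates allowed
                }
            }
        }
    }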

Links

Our code including all necessary libraries
Our gallery page

Resources

Jama: a Java matrix package
Java Advanced Imaging: APIs for image manipulation

[1] Yung-Yu Chuang, Brian Curless, David Salesin, and Richard Szeliski. A Bayesian Approach to Digital Matting. In Proceedings of IEEE CVPR 2001, Vol. II, pp. 264-271, December 2001.

[2] M. T. Orchard and C. A. Bouman. Color Quantization of Images. IEEE Transactions on Signal Processing, 39(12):2677–2690, December 1991.