Intelligent Scissors with Alpha Matting

Stavan Parikh and Parag

 

CSE 573 - Winter 2003 

Instructor: Brian Curless

 

 

 

Artifact

 

 

Introduction

 

Intelligent scissors is a method to extract objects from an image accurately by interactively drawing along the object contours while the computer “snaps” the curve to the nearest edge. While this method provides a good way to segment images, it does not allow for fractional mixing of foreground and background colors. This can be achieved via alpha matting, which allows the extraction of a foreground element from a background.  Combining these two techniques, one can first define a nice foreground-background separation using intelligent scissors and then use alpha matting for extracting the desired foreground. This is what we implemented in this project.

 

Intelligent Scissors

 

The technique of Intelligent Scissors is described by Mortensen and Barrett [1]. The idea behind this tool is to allow a use to interactively define the boundary of an object by simple mouse movements.  The way it works is that the user first clicks on a ‘seed point’ which can be any pixel in the image. The program then computes a path called the live wire from the seed point to the mouse cursor that will snap to the edges of the image as closely as possible.

 

The live wire boundary detection scheme is modeled as a two-dimensional graph search problem and uses discrete dynamic programming (DP).  Each pixel in the image is considered a node in a graph and each neighboring pixel is connected by an edge. While the exact details of the algorithm are described below, the idea is that one can compute the minimum cost path from any one pixel to any other pixel in the image such that this path would hug the contours of the image.

 

For this implementation we follow a simplified scheme described on the CSE 455 course page [2].

 

Algorithm


There are two separate processing phases that take place. In the first phase the cost of each link in the image is pre-computed and stored, while in the second phase a minimum spanning tree from the seed point to every other pixel in the image is calculated.

 

Phase 1

 

Every time an image is loaded, the cost of each link in the image is pre-computed and stored. Each pixel in the image has 8 neighbors and thus 8 links (less for boundary pixels). The cost of each link is based on the magnitude of intensity derivative across the link. If this link is across an edge we expect the intensity derivative to be high and we can thus compute its cost to be low so the link is include on the path computed in Phase II.

 

The cost of each link is computed based on the following equations:

 

Where Xlink is the magnitude of the intensity derivative in each color channel (R, G, B) across the link. The computations of the intensity derivatives for the horizontal, diagonal, and vertical links are different and are approximated by applying correlation filters over each pixel. For the sake of brevity these are not reproduced here but are described in [2].

 

Dmax is the maximum link cost over all links in the image. The length of the link is a scaling factor – it is 1 for horizontal and vertical links and sqrt(2) for diagonal links. From the cost equation it follows that higher derivate intensities will lead to lower link costs.

 

Phase II

 

Every time a user selects a seed point, we can compute the shortest path from the seed point to every other pixel in the image. This is done using a variant of Dijkstra’s shortest path algorithm. A minimum spanning tree )MST) from the seed point as the root and every other pixel as leaves is computed and stored. Then as the user moves the mouse over the image, the live wire is the shortest path from the current mouse position to the seed point which is directly extracted from the MST.

 

This approach leads to good results in extracting objects from an image. If the live wire strays from the desired object boundary one can insert new seed points to get the desired results. Examples are shown in the results section.

 

Digital Matting using the Bayesian Technique

 

After we are done with the intelligent scissors part, the next step is to determine the foreground, background and the a value for each pixel in the image. This technique is described in [3]. The idea is to partition the image into three regions – the foreground region with a =1,background region with a = 0 and the unknown region with unknown values of a, F and B. Then, the problem is solved by building foreground and background probability distributions from a given neighborhood in the unknown region. The idea is to construct an oriented Gaussian distribution for a pixel using the known color values in its neighborhood and use the maximum a posteriori  technique to compute the matting parameters in the unknown region.

  

Determining the Unknown Region

 

Given the edge detected by the intelligent scissors part, we need to find the unknown region. We use a sliding window of a fixed size (which depends upon the image being matted), to expand the unknown region around the edge. The rest of the pixels are labeled are foreground/background depending upon whether they lie inside/outside the boundary detected by the scissors part. In the images below white marks the unknown region, black is the background and grey is the foreground.

 

    

    

Estimating the Unknown Parameters:

 

First, the color probability distribution (Gaussian) is built using the known and previously estimated colors within each pixel’s neighborhood. The size of the neighborhood is adaptively increased till we get enough samples (set to 30 in our implementation).

 

Once we have constructed the distribution for foreground and background colors, we set up a set of equations maximize probability P(F,B, a|C). We alternatively refine the values F, B and a values to get the most likely values for these parameters at a given pixel. In the current implementation, we run the iterative procedure for 30 iterations. Instead, we could also set a threshold such that if the refinement in the values being estimated is less than the given threshold, then we stop the iterative refinement.

 

Results

 

 

Improvements

 

Intelligent Scissors

 

We can improve the performance of intelligent scissors if we reused the link cost data across pixels - each link in a pixel can be used at least twice however we currently re-compute the cost each time. We could also implement a snap tool for the initial seed so that the seed would also snap to the nearest edge within a neighborhood. Another option is to allow the user to mark the region within which to search for edges (instead of the whole image) which would help accuracy as well as speed in images with a lot of edges close by.

 

Image Matting

 

Currently, we use a single oriented isotropic Gaussian distribution to estimate the F and B values for a given pixel in the unknown region. Introducing a variable number of clusters is expected to improve the results for the cases where the color distribution does not fit very well a single cluster. Similarly, using the non-isotropic distribution would improve the results for the case where the distribution can not be well estimated a by a uniform variance in all directions.

 

[1] Mortensen, Eric N.; Barrett, William, A; Intelligent Scissors for Image Composition  UPDATE LINK

[2] Computer Vision - CSE 455 - Intelligent Scissors Project

[3] Chuang's Digital Matting Technique.