Assignment 1
Assignment Summary
CSE 576: Image Understanding, Spring 2013


How my feature is designed:

The window feature is based on the gray value of each of the 25 neighboring pixels (a 5x5 window), so the descriptor is 25 dimensions long and each dimension is the gray value of the corresponding pixel.

My own feature combines the Harris corner detector with the descriptor of Speeded Up Robust Features (SURF). First, I detect corner points in the original image using the Harris corner response. Instead of an absolute threshold value, I use a relative one: I compute the global maximum of the Harris response image and set the threshold to 0.04 times that maximum, removing all points whose Harris values fall below it. The ratio 0.04 is flexible; a good value keeps the number of resulting points neither too small nor beyond a reasonable range. For each remaining point, I compute the feature following the idea of SURF. The basic idea is to first find the dominant direction by computing the Haar wavelet responses of the sample points within a radius of 6 and picking the direction with the largest response; then, for each of the 4x4 subregions around the point, accumulate dx, |dx|, dy, and |dy| of the responses relative to the dominant direction into the feature array, so in the end the feature is 4x4x4 = 64 dimensions long for each Harris point.
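A minimal sketch of the detection step with the relative threshold (Python/NumPy; the function and parameter names are mine, not the assignment skeleton's):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def harris_points(gray, sigma=1.0, k=0.04, rel_thresh=0.04):
        # gray: float grayscale image; k: Harris constant;
        # rel_thresh: fraction of the global maximum used as the cutoff.
        Iy, Ix = np.gradient(gray)
        # Smoothed entries of the second-moment matrix M
        Sxx = gaussian_filter(Ix * Ix, sigma)
        Syy = gaussian_filter(Iy * Iy, sigma)
        Sxy = gaussian_filter(Ix * Iy, sigma)
        # Harris response: det(M) - k * trace(M)^2
        R = (Sxx * Syy - Sxy * Sxy) - k * (Sxx + Syy) ** 2
        # Relative threshold: 0.04 times the strongest response
        keep = R > rel_thresh * R.max()
        ys, xs = np.nonzero(keep)
        return list(zip(xs, ys)), R

The relative cutoff adapts automatically to images whose Harris responses differ in overall magnitude, which is the advantage over a fixed absolute threshold.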


Why I designed my feature this way:

The intuition for my feature comes from SURF, a popular feature that is fast and robust enough for real-time computer vision systems. To get a rotation-invariant descriptor, we need to sample exactly the same neighborhood of each interest point no matter how the image is rotated; and to make it contrast invariant, we cannot use the raw colors of the original image. A good choice is the normalized values of the difference image, because the normalized difference at each pixel does not change when the image contrast changes. To sample a consistent neighborhood, we need the dominant direction of each interest point and a surrounding square oriented along that direction. The dominant direction is obtained by computing the "average" direction of the neighboring sample points. Specifically, for each interest point, the surrounding area within a radius is divided into 6 regions, each a 60-degree sector; the "average" direction is computed in each sector, and the strongest of the 6 directions is the dominant direction. After that, the information in the oriented square is extracted into a 64-dimensional vector, and that vector is the descriptor of the interest point. In short, the descriptor is invariant to translation, rotation, and contrast changes.
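As a sketch of how the dominant direction and the 64-dimensional accumulation could look (Python/NumPy; the 20x20 sampling grid and the helper names are my assumptions, not the actual assignment code):

    import numpy as np

    def dominant_direction(dx, dy):
        # dx, dy: arrays of Haar wavelet responses of the sample points
        # within the radius-6 neighborhood of an interest point.
        angles = np.arctan2(dy, dx)
        best_len, best_dir = -1.0, 0.0
        # Six fixed 60-degree sectors covering the full circle
        for lo in np.arange(-np.pi, np.pi, np.pi / 3):
            in_sector = (angles >= lo) & (angles < lo + np.pi / 3)
            # "Average" direction of a sector = direction of the summed responses
            sx, sy = dx[in_sector].sum(), dy[in_sector].sum()
            if np.hypot(sx, sy) > best_len:
                best_len, best_dir = np.hypot(sx, sy), np.arctan2(sy, sx)
        return best_dir

    def build_descriptor(patch_dx, patch_dy, theta):
        # patch_dx, patch_dy: 20x20 grids of Haar responses sampled on the
        # square around the point (grid size is an assumption);
        # theta: the dominant direction from above.
        c, s = np.cos(theta), np.sin(theta)
        # Rotate responses into the dominant-direction frame
        rx = c * patch_dx + s * patch_dy
        ry = -s * patch_dx + c * patch_dy
        desc = []
        for i in range(4):                      # 4x4 subregions ...
            for j in range(4):
                bx = rx[5 * i:5 * i + 5, 5 * j:5 * j + 5]
                by = ry[5 * i:5 * i + 5, 5 * j:5 * j + 5]
                # ... each contributing (sum dx, sum |dx|, sum dy, sum |dy|)
                desc += [bx.sum(), np.abs(bx).sum(), by.sum(), np.abs(by).sum()]
        return np.array(desc)                   # 4 * 4 * 4 = 64 dimensions

Normalizing the final vector to unit length is what makes the entries behave like the "normalized differences" mentioned above, so a global contrast change cancels out.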


Experiment results:

The ROC curves for the graf and Yosemite image pairs:

[Image: ROC curves]

The threshold curves for the graf and Yosemite image pairs:

[Image: threshold curves]
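All AUC numbers below are areas under ROC curves like these. As a reminder of the computation (a generic sketch, not the benchmark harness itself):

    import numpy as np

    def roc_auc(scores, is_correct):
        # scores: match scores (lower = better); is_correct: whether each
        # match is a correct match (e.g. consistent with the ground-truth
        # homography).
        order = np.argsort(scores)                   # best matches first
        correct = np.asarray(is_correct, dtype=bool)[order]
        tpr = np.cumsum(correct) / max(correct.sum(), 1)
        fpr = np.cumsum(~correct) / max((~correct).sum(), 1)
        # Sweep the threshold over the sorted scores and integrate
        return np.trapz(tpr, fpr)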

The Harris image for the first graf image (img1.ppm) is:

[Image: Harris response for img1.ppm]

The Harris image for the first Yosemite image (Yosemite1.jpg) is:

[Image: Harris response for Yosemite1.jpg]

Window descriptor benchmarks:

1. AUC for bikes with window descriptor + SSD: 0.239683. Average pixel error: 384.714433 pixels.

2. AUC for graf with window descriptor + SSD: 0.503880. Average pixel error: 296.868838 pixels.

3. AUC for leuven with window descriptor + SSD: 0.272655. Average pixel error: 399.100743 pixels.

4. AUC for wall with window descriptor + SSD: 0.311029. Average pixel error: 363.126621 pixels.

5. AUC for bikes with window descriptor + ratio test: 0.504100. Average pixel error: 384.714433 pixels.

6. AUC for graf with window descriptor + ratio test: 0.579534. Average pixel error: 296.868838 pixels.

7. AUC for leuven with window descriptor + ratio test: 0.563801. Average pixel error: 399.100743 pixels.

8. AUC for wall with window descriptor + ratio test: 0.597256. Average pixel error: 363.126621 pixels.

My own descriptor benchmarks:

1. AUC for bikes with my own descriptor + SSD: 0.789941. Average pixel error: 296.187212 pixels.

2. AUC for graf with my own descriptor + SSD: 0.755271. Average pixel error: 242.802867 pixels.

3. AUC for leuven with my own descriptor + SSD: 0.769996. Average pixel error: 232.604831 pixels.

4. AUC for wall with my own descriptor + SSD: 0.867096. Average pixel error: 221.532884 pixels.

5. AUC for bikes with my own descriptor + ratio test: 0.765272. Average pixel error: 296.187212 pixels.

6. AUC for graf with my own descriptor + ratio test: 0.768096. Average pixel error: 242.802867 pixels.

7. AUC for leuven with my own descriptor + ratio test: 0.884735. Average pixel error: 232.604831 pixels.

8. AUC for wall with my own descriptor + ratio test: 0.865170. Average pixel error: 221.532884 pixels.
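The two scores compared in the benchmarks above can be sketched as follows (illustrative names, not the actual matcher):

    import numpy as np

    def match_features(desc1, desc2, use_ratio_test=True):
        # desc1, desc2: (n, d) arrays of descriptors; desc2 needs >= 2 rows.
        matches = []
        for i, d in enumerate(desc1):
            ssd = np.sum((desc2 - d) ** 2, axis=1)   # SSD to every candidate
            order = np.argsort(ssd)
            best, second = ssd[order[0]], ssd[order[1]]
            # SSD score: distance to the best match.
            # Ratio score: best / second-best, which penalizes ambiguous matches.
            score = best / second if use_ratio_test else best
            matches.append((i, int(order[0]), score))  # lower score = better
        return matches

Note that both scores pick the same nearest-neighbor matches and only rank them differently, which is why the average pixel errors above are identical for SSD and the ratio test even though the AUCs differ.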


Strengths and weaknesses of my feature:

The strength of my feature is its robustness to rotation and contrast changes. For the wall image directory, which contains the same wall under different contrast, the average AUC is above 0.85, which shows good performance. For the graf images, the highest AUC is 0.928302, obtained when matching img1.ppm against img2.ppm, where img2.ppm is a slightly rotated version of img1.ppm.

The weakness of my feature is that it is not scale invariant, so its accuracy drops when the image is slightly zoomed in or out. For the graf images, img4.ppm is a rotated and zoomed-out version of img1.ppm, so my feature only reaches an AUC of 0.753738 there.


My own experiments:

The first experiment is image matching between two croissant images. The result is:

[Image: croissant matching result]


The second experiment is image matching between two pairs of dinner roll images. The result is:

The first pair:

[Image: matching result for the first dinner roll pair]

The second pair:

[Image: matching result for the second dinner roll pair]