Feature Detection and Matching (by Xiaotao Chen)

CSE 576 Project 1:

Feature Detection and Matching

Xiaotao Chen
cxt1993 at cs dot washington dot edu

In this project, we implement feature detection, description, and matching for images. The algorithm also has to be invariant to translation, orientation, scale, and illumination.

Part 1: Feature Detection

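In the first part, we need to detect feature points in an image. The detector I used is the Harris corner detector. The Harris matrix at each pixel is defined as follows:

$$H = \sum_{(x,y)} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$

where $w$ is a 5x5 Gaussian kernel and $I_x$, $I_y$ are the image gradients in the x and y directions.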

I used the Sobel filter to calculate the gradients in the x and y directions. Then I calculated the corner strength at each point.
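The detection step described above can be sketched as follows. This is a minimal numpy-only illustration, not the project's actual code. The corner strength used here, det(H) / trace(H), is an assumption (the original formula is not preserved in this text), as are the Gaussian sigma of 1.0 and the helper names `filter2d`, `gaussian_kernel`, and `harris_response`.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T


def filter2d(img, kernel):
    """Correlate img with kernel; output has the same size, borders replicated."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def gaussian_kernel(size=5, sigma=1.0):
    """Normalized size x size Gaussian kernel (sigma is an assumed value)."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()


def harris_response(img, eps=1e-8):
    """Corner strength per pixel from the Gaussian-weighted gradient products."""
    img = np.asarray(img, dtype=float)
    ix = filter2d(img, SOBEL_X)   # gradient in x
    iy = filter2d(img, SOBEL_Y)   # gradient in y
    w = gaussian_kernel(5, 1.0)   # 5x5 Gaussian window, as in the text
    a = filter2d(ix * ix, w)      # weighted sum of Ix^2
    b = filter2d(ix * iy, w)      # weighted sum of Ix*Iy
    c = filter2d(iy * iy, w)      # weighted sum of Iy^2
    det = a * c - b * b
    trace = a + c
    # Assumed corner strength: det(H) / trace(H); eps avoids division by zero.
    return det / (trace + eps)
```

On a synthetic image containing a bright square, this response is positive at the square's corners and zero along straight edges and in flat regions.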

The following are Harris images from the Yosemite and Graf image sets.

After calculating the Harris image for the original image, I chose around 1000 feature points from the Harris image where the value is above a threshold and is a local maximum in its 3x3 neighborhood. The threshold is not fixed. Instead, I set its initial value to 0.9, and whenever more feature points are needed, the threshold is decreased by multiplying it by a factor of 0.9. For example, if the previous threshold is 0.9, the updated threshold is 0.81.
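A minimal sketch of this selection loop is shown below. It assumes that the threshold is relative to the maximum of the Harris image (the text does not say how the response is scaled), and the function names and the `min_threshold` safety stop are hypothetical.

```python
import numpy as np


def local_maxima_3x3(response):
    """Boolean mask of pixels that are >= all 8 neighbours."""
    h, w = response.shape
    padded = np.pad(response, 1, mode="constant", constant_values=-np.inf)
    mask = np.ones((h, w), dtype=bool)
    for dy in range(3):
        for dx in range(3):
            if dy == 1 and dx == 1:
                continue  # skip the centre pixel itself
            mask &= response >= padded[dy:dy + h, dx:dx + w]
    return mask


def select_features(response, target=1000, start=0.9, decay=0.9,
                    min_threshold=1e-6):
    """Lower the threshold by a factor of `decay` until `target` points are found."""
    response = np.asarray(response, dtype=float)
    peak = response.max()
    if peak <= 0:
        return np.empty((0, 2), dtype=int), start
    normalized = response / peak          # assumption: threshold is relative to the max
    is_peak = local_maxima_3x3(normalized)
    threshold = start
    while True:
        points = np.argwhere(is_peak & (normalized > threshold))  # rows of (y, x)
        # Stop when enough points are found, or when the threshold is tiny.
        if len(points) >= target or threshold < min_threshold:
            return points, threshold
        threshold *= decay                # 0.9 -> 0.81 -> 0.729 -> ...
```

With isolated peaks of 1.0, 0.85, and 0.5 and `target=2`, the first pass at 0.9 finds one point, so the threshold drops to 0.81 and the second pass returns two points.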

The following are a couple of sample images with detected points.

Part 2: Feature Description

The feature detection part only decides which points are fiducial points. Position information alone is not enough for matching features. We also need a way to describe each feature that is invariant to scale, orientation, and illumination.

First, I tried a 5x5 square window around the feature point as the descriptor. It is easy to implement, but it is not a good descriptor: it is invariant only to translation, so changes in scale, orientation, and illumination remain problems.

The final descriptor is based on the Multi-Scale Oriented Patches (MOPS) descriptor. It works as follows.

Calculate the x-gradient and y-gradient for the image;
Convolve the gradients with a 5x5 Gaussian kernel;
For every detected point, set five windows around it, of sizes 153x153, 109x109, 78x78, 56x56, and 40x40 (each about 1.4 times the size of the next);
Calculate the orientation of the point from its x-gradient and y-gradient, then rotate the windows so that this orientation is horizontal;
Downsample the five windows around the detected point to five 8x8 patches and put the values of each 8x8 patch into a vector, giving five vectors;
For each vector, calculate its mean and standard deviation, then standardize the data by subtracting the mean and dividing by the standard deviation.

For this descriptor,

rotating the window to horizontal according to its orientation makes the descriptor invariant to orientation;
choosing five windows of different sizes and then downsampling them to 8x8 patches makes the descriptor invariant to scale;
standardizing the data in the last step makes the descriptor invariant to illumination;
the descriptor is invariant to translation by construction, because every window is centered on the detected point.
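The descriptor steps can be sketched roughly as below. This is a simplified illustration rather than the project's code: it samples each rotated window on an 8x8 grid with nearest-neighbour lookup instead of properly prefiltering and downsampling, it clamps samples that fall outside the image, and it expects the smoothed gradient images `gx` and `gy` to be computed beforehand. All function names are hypothetical.

```python
import numpy as np

WINDOW_SIZES = (153, 109, 78, 56, 40)  # window sizes listed in the text


def sample_patch(img, cy, cx, size, angle, out=8):
    """Sample an out x out grid from a size x size window rotated by `angle`."""
    half = (size - 1) / 2.0
    offsets = np.linspace(-half, half, out)
    u, v = np.meshgrid(offsets, offsets)  # u: patch x-axis, v: patch y-axis
    cos_a, sin_a = np.cos(angle), np.sin(angle)
    # The patch x-axis is aligned with the feature orientation in the image,
    # so the orientation becomes horizontal inside the patch.
    xs = cx + u * cos_a - v * sin_a
    ys = cy + u * sin_a + v * cos_a
    # Nearest-neighbour lookup, clamped to the image border (a simplification).
    xi = np.clip(np.rint(xs).astype(int), 0, img.shape[1] - 1)
    yi = np.clip(np.rint(ys).astype(int), 0, img.shape[0] - 1)
    return img[yi, xi]


def standardize(vec, eps=1e-8):
    """Subtract the mean and divide by the standard deviation."""
    return (vec - vec.mean()) / (vec.std() + eps)


def describe(img, cy, cx, gx, gy):
    """Return a 5 x 64 array: one standardized 8x8 patch per window size."""
    img = np.asarray(img, dtype=float)
    angle = np.arctan2(gy[cy, cx], gx[cy, cx])  # orientation from the gradients
    vectors = [
        standardize(sample_patch(img, cy, cx, size, angle).ravel())
        for size in WINDOW_SIZES
    ]
    return np.stack(vectors)
```

Because of the standardization step, applying a gain and offset to the image (for example `2 * img + 10`) leaves the descriptor essentially unchanged, which is the illumination invariance described above.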

Here are ROC curves that compare my descriptor, the SIFT descriptor, and the plain 5x5 window.

Yosemite ROC Curve:

Graf ROC Curve:

I think the Yosemite ROC curve looks much better than the Graf ROC curve because the two images in the Yosemite set differ only by a translation, whereas the images in the Graf set differ by translation, scale, and rotation.

My Descriptor

Average AUC for Benchmark Sets (using SSD)
- Bikes: 0.8104
- Graf: 0.4975
- Leuven: 0.8312
- Wall: 0.6974
Average AUC for Benchmark Sets (using ratio test)
- Bikes: 0.8613
- Graf: 0.6058
- Leuven: 0.8421
- Wall: 0.8008

5x5 Window

Average AUC for Benchmark Sets (using SSD)
- Bikes: 0.4464
- Graf: NaN
- Leuven: NaN
- Wall: NaN
Average AUC for Benchmark Sets (using ratio test)
- Bikes: 0.587723
- Graf: NaN
- Leuven: NaN
- Wall: NaN

Part 3: Feature Matching

For this part, I used the formula given in lecture: find the two best SSD matches, then use the ratio of these two SSDs (best over second best) as the final score. The advantage of this algorithm is that it avoids ambiguous matches: for an ambiguous match the two SSDs are similar, so the score is large (close to 1).
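A minimal sketch of this ratio test is given below. It assumes that each descriptor has been flattened into a single row vector and that the second image has at least two features; how the five per-scale vectors are combined is not stated in the text, so that part is left out. The function name is hypothetical.

```python
import numpy as np


def ratio_test_match(desc1, desc2):
    """For each row of desc1, return the best match in desc2 and its ratio score.

    The score is (best SSD) / (second-best SSD): small for distinctive
    matches, close to 1 for ambiguous ones.
    """
    desc1 = np.asarray(desc1, dtype=float)
    desc2 = np.asarray(desc2, dtype=float)
    # Pairwise sum of squared differences, shape (len(desc1), len(desc2)).
    diff = desc1[:, None, :] - desc2[None, :, :]
    ssd = (diff ** 2).sum(axis=2)
    order = np.argsort(ssd, axis=1)          # candidates sorted by SSD
    rows = np.arange(len(desc1))
    best = ssd[rows, order[:, 0]]
    second = ssd[rows, order[:, 1]]
    ratio = best / np.maximum(second, 1e-12)  # guard against division by zero
    return order[:, 0], ratio
```

For example, a feature with one very close candidate and no other nearby candidates gets a score near 0, while a feature with two equally close candidates gets a score of 1 and can be rejected by thresholding the score.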

Strengths and weaknesses

Strengths

The descriptor is invariant to scale, illumination, rotation, and translation
The ratio test is more stable than pure SSD

Weaknesses

The detector is not invariant to scale
The detector might detect strong edges as corners
The matching process is not efficient enough; it could be optimized by implementing a kd-tree

My Pictures

Extra Credit

I implemented my own feature descriptor that is invariant to scale, orientation, illumination, and translation. (I compared it with the SIFT descriptor and the 5x5 window in Part 2, where the implementation details are also given.)