Feature descriptor

I used a single-scale variant of MOPS (Multi-Scale Oriented Patches) as the feature descriptor. For each detected feature, the algorithm applies the following operations to generate a descriptor (a rough sketch of the pipeline follows the list):

  1. Compute an orientation angle for the pixel from the responses of 9x9 horizontal and vertical Sobel filters convolved with the grayscale image.
  2. Extract a 40x40 subimage of the grayscale image, centered at the pixel, after the image has been blurred with a 7x7 Gaussian filter (σ = 2.0), then compute the mean and standard deviation of the subimage's values.
  3. Rotate the 40x40 subimage so that the computed orientation points "up".
  4. Sample 18 values from the subimage along concentric circles of radius 5, 10, and 20 pixels about the pixel. This sampling pattern was proposed in a student's project from a previous quarter and produces a better descriptor than simply sampling on a grid.
  5. Subtract the mean from each sampled value and then divide by the standard deviation to account for differences in illumination.
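
In outline, the pipeline looks something like the sketch below. This is an illustrative reconstruction rather than my actual code: the OpenCV calls, the 6-samples-per-circle split of the 18 values, the clamping of the outermost ring to the patch border, and the rotation sign convention are all assumptions.

```python
import cv2
import numpy as np

def _binomial(n):
    """Row n of Pascal's triangle, used to build separable Sobel-style kernels."""
    row = np.array([1.0])
    for _ in range(n - 1):
        row = np.convolve(row, [1.0, 1.0])
    return row

def mops_like_descriptor(gray, x, y, patch_size=40,
                         radii=(5, 10, 20), samples_per_circle=6):
    """Single-scale, MOPS-like descriptor for the feature at integer (x, y).

    gray is a float32 grayscale image. Splitting the 18 samples as 6 per
    circle is my reading of the write-up, not a documented constant.
    """
    # 1. Orientation from 9x9 Sobel-style derivative filters. cv2.Sobel only
    #    supports kernel sizes up to 7, so build the extended 9x9 kernels by
    #    hand (binomial smoothing x central-difference derivative).
    smooth = _binomial(9)                                  # [1, 8, 28, ..., 8, 1]
    deriv = np.convolve(_binomial(7), [1.0, 0.0, -1.0])    # 9-tap derivative
    kx = np.outer(smooth, deriv)                           # d/dx kernel (9x9)
    ky = np.outer(deriv, smooth)                           # d/dy kernel (9x9)
    gx = cv2.filter2D(gray, cv2.CV_32F, kx)
    gy = cv2.filter2D(gray, cv2.CV_32F, ky)
    angle = np.degrees(np.arctan2(gy[y, x], gx[y, x]))

    # 2. Blur with a 7x7 Gaussian (sigma = 2.0), cut out a 40x40 patch
    #    centred on the feature, and record its mean and standard deviation.
    blurred = cv2.GaussianBlur(gray, (7, 7), 2.0)
    patch = cv2.getRectSubPix(blurred, (patch_size, patch_size),
                              (float(x), float(y)))
    mean, std = patch.mean(), patch.std()

    # 3. Rotate the patch so the dominant orientation points "up"
    #    (the sign convention here is an assumption).
    center = (patch_size / 2.0, patch_size / 2.0)
    rot = cv2.getRotationMatrix2D(center, angle, 1.0)
    upright = cv2.warpAffine(patch, rot, (patch_size, patch_size))

    # 4. Sample along concentric circles about the patch centre, clamping
    #    the outermost ring (radius 20) to the patch border.
    samples = []
    for r in radii:
        for k in range(samples_per_circle):
            theta = 2.0 * np.pi * k / samples_per_circle
            sx = int(np.clip(round(center[0] + r * np.cos(theta)), 0, patch_size - 1))
            sy = int(np.clip(round(center[1] + r * np.sin(theta)), 0, patch_size - 1))
            samples.append(upright[sy, sx])

    # 5. Normalise for illumination: subtract the mean, divide by the std.
    return (np.asarray(samples, dtype=np.float32) - mean) / (std + 1e-8)
```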

Design choices

I chose to implement a variant of MOPS simply out of curiosity. The rest of the techniques fell into place through reading and experimentation; I found that using larger Sobel filters and adding the Gaussian pre-blur led to better benchmark scores.

Performance

ROC curves

Here are the ROC curves for the graf and yosemite image sets provided with the project. The Harris threshold was set to 10 (on a semi-arbitrary scale) to generate the curves.

graf ROC plot

The area under the curve (AUC) for the graf image set is a low 0.47568 when using MOPS plus the SSD test, but is much higher, at 0.981271, when using MOPS plus the ratio test.

yosemite ROC plot

The AUCs for the yosemite image set are 0.662229 when using MOPS plus the SSD test and 0.756059 when using MOPS plus the ratio test.
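
The curves and AUC values come from sweeping an acceptance threshold over the match scores (SSD distance or ratio-test value) and comparing accepted matches against the ground-truth correspondences. The following is a minimal sketch of that computation, not the project skeleton's code; the function names and the trapezoidal AUC are my own choices.

```python
import numpy as np

def roc_points(scores, is_correct):
    """ROC points from match scores and ground-truth labels.

    scores:     one score per match, lower = better (SSD distance or
                ratio-test value).
    is_correct: True where the match agrees with the ground-truth
                homography within some pixel tolerance.
    """
    order = np.argsort(scores)              # accept the best matches first
    correct = np.asarray(is_correct)[order]
    tp = np.cumsum(correct)                 # true positives accepted so far
    fp = np.cumsum(~correct)                # false positives accepted so far
    tpr = tp / max(int(correct.sum()), 1)
    fpr = fp / max(int((~correct).sum()), 1)
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve via the trapezoidal rule."""
    return float(np.trapz(tpr, fpr))
```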

Harris operators

Here are the Harris operator images produced by running feature detection on the graf and yosemite images.

graf Harris operator image

yosemite Harris operator image

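For context, each Harris operator image visualizes the per-pixel corner response. The sketch below shows one common formulation; the 3x3 gradients, Gaussian window, and k = 0.04 are conventional defaults rather than the values used in my implementation, and the scale of this response will not line up directly with the threshold of 10 quoted above.

```python
import cv2
import numpy as np

def harris_response(gray, sigma=1.0, k=0.04):
    """Per-pixel Harris corner response R = det(H) - k * trace(H)^2,
    where H is the Gaussian-weighted structure tensor."""
    gray = gray.astype(np.float32)
    ix = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    iy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)

    # Structure-tensor entries, smoothed over a Gaussian window.
    ixx = cv2.GaussianBlur(ix * ix, (0, 0), sigma)
    iyy = cv2.GaussianBlur(iy * iy, (0, 0), sigma)
    ixy = cv2.GaussianBlur(ix * iy, (0, 0), sigma)

    det = ixx * iyy - ixy * ixy
    trace = ixx + iyy
    return det - k * trace * trace
```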

Average AUC for benchmark sets

The following is a report of the performance of the feature detection and matching on four different benchmark sets.

Benchmark performance - simple 5x5 descriptor using SSD

| Benchmark name | bikes     | graf      | leuven    | wall      |
|----------------|-----------|-----------|-----------|-----------|
| img1 AUC       | 0.457351  | 0.545338  | 0.072379  | 0.172003  |
| img2 AUC       | 0.439248  | 0.475632  | 0.008186  | 0.187861  |
| img3 AUC       | 0.420252  | 0.77915   | 0         | 0.174284  |
| img4 AUC       | 0.352668  | 0.544325  | 0         | 0.460291  |
| img5 AUC       | 0.341132  | 0.338019  | 0         | 0.260319  |
| Mean AUC       | 0.4021302 | 0.5364928 | 0.016113  | 0.2509516 |
| Median AUC     | 0.420252  | 0.544325  | 0         | 0.187861  |

Benchmark performance - simple 5x5 descriptor using ratio test

| Benchmark name | bikes     | graf      | leuven    | wall      |
|----------------|-----------|-----------|-----------|-----------|
| img1 AUC       | 0.503754  | 0.607881  | 0.174869  | 0.211139  |
| img2 AUC       | 0.506199  | 0.552674  | 0.042857  | 0.249494  |
| img3 AUC       | 0.537099  | 0.597185  | 0         | 0.172718  |
| img4 AUC       | 0.435306  | 0.484037  | 0         | 0.225602  |
| img5 AUC       | 0.414593  | 0.099666  | 0         | 0.235734  |
| Mean AUC       | 0.4793902 | 0.4682886 | 0.0435452 | 0.2189374 |
| Median AUC     | 0.503754  | 0.552674  | 0         | 0.225602  |

Benchmark performance - MOPS descriptor using SSD

| Benchmark name | bikes     | graf      | leuven    | wall*     |
|----------------|-----------|-----------|-----------|-----------|
| img1 AUC       | 0.775454  | 0.47568   | 0.552104  | 0.685016  |
| img2 AUC       | 0.784334  | 0         | 0.60273   | 0.658239  |
| img3 AUC       | 0.797402  | 0.426372  | 0.628107  | 0.604564  |
| img4 AUC       | 0.774967  | 0.131104  | 0.632951  | 0         |
| img5 AUC       | 0.793431  | 0         | 0.628162  | 0.329022  |
| Mean AUC       | 0.7851176 | 0.2066312 | 0.6088108 | 0.4553682 |
| Median AUC     | 0.784334  | 0.131104  | 0.628107  | 0.604564  |

Benchmark performance - MOPS descriptor using ratio test

| Benchmark name | bikes     | graf      | leuven    | wall*     |
|----------------|-----------|-----------|-----------|-----------|
| img1 AUC       | 0.71926   | 0.982255  | 0.710841  | 0.722628  |
| img2 AUC       | 0.70888   | 0.50659   | 0.693035  | 0.710747  |
| img3 AUC       | 0.679203  | 0.809851  | 0.603455  | 0.626793  |
| img4 AUC       | 0.667751  | 0.740979  | 0.615342  | 0.886966  |
| img5 AUC       | 0.766179  | 0.811105  | 0.649444  | 0         |
| Mean AUC       | 0.7082546 | 0.770156  | 0.6544234 | 0.5894268 |
| Median AUC     | 0.70888   | 0.809851  | 0.649444  | 0.710747  |

*I used a Harris threshold of 0 for the benchmark using the wall image set. Some images still resulted in AUC scores of 0, but the scores for the others were significantly better than they were at a threshold of 10.

Strengths and weaknesses

My descriptor is missing the "M" (multi-scale) part of MOPS and hence does not perform as well as it could under affine and scale transformations. Comparing the first image to the last image of the wall image set, for instance, the bricks differ significantly in apparent size because the wall is imaged from an angle rather than head-on. Even with the large number of descriptors produced by lowering the Harris threshold to 0, the matching algorithm fails to find any true correspondences using the ratio test. Overcoming this limitation would be a matter of sampling the image at multiple scales when generating the descriptor, in line with the original idea of MOPS.
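
As a rough idea of what that could look like, the sketch below builds a Gaussian pyramid and describes each feature at several scales, reusing the hypothetical mops_like_descriptor routine sketched earlier. True MOPS detects features independently at each pyramid level; this simplified version only re-describes already-detected features, so it is a starting point rather than the real fix.

```python
import cv2

def multiscale_descriptors(gray, keypoints, num_levels=4):
    """Describe each feature at several levels of a Gaussian pyramid.

    keypoints: iterable of (x, y) positions in the full-resolution image.
    mops_like_descriptor is the hypothetical single-scale routine sketched
    earlier in this write-up.
    """
    level = gray.astype('float32')
    pyramid = [level]
    for _ in range(num_levels - 1):
        level = cv2.pyrDown(level)           # Gaussian blur + downsample by 2
        pyramid.append(level)

    all_descs = []
    for (x, y) in keypoints:
        per_level = []
        for lvl, img in enumerate(pyramid):
            sx, sy = int(x / (2 ** lvl)), int(y / (2 ** lvl))
            h, w = img.shape[:2]
            # Skip levels where the 40x40 window would leave the image.
            if 20 <= sx < w - 20 and 20 <= sy < h - 20:
                per_level.append(mops_like_descriptor(img, sx, sy))
        all_descs.append(per_level)
    return all_descs
```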

The descriptor performs reasonably well for translations and rotations, however, as evidenced by the respectable scores on the other benchmark sets; in particular, the graf image set showcases how well the descriptor handles rotations when the ratio test is used, with an AUC of 0.98 between the first and second images. What is interesting is how poorly the descriptor performs with simple SSD, where the graf scores are abysmally low. I theorize that the ratio test is better at rejecting ambiguous matches, whereas SSD alone admits many false positives. Sampling the descriptor from a large (40x40) patch of the blurred image helps performance on the bikes image set, whose input images have had a Gaussian blur applied. Subtracting the mean of the patch and dividing by the standard deviation also helps account for differences in lighting between images.
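
For reference, the sketch below contrasts the two match scores as I understand them; the brute-force nearest-neighbour loop and the epsilon guard are my own illustrative choices. It shows why the ratio test suppresses ambiguous matches that SSD alone would accept: a best match that is barely better than the second-best one gets a ratio near 1 and is effectively rejected.

```python
import numpy as np

def match_features(desc1, desc2):
    """Brute-force matching of two descriptor sets.

    desc1, desc2: (N, D) and (M, D) arrays, with M >= 2. For each row of
    desc1, returns (index in desc1, best index in desc2, SSD score, ratio
    score); lower scores mean better matches for both tests.
    """
    # Pairwise SSD distances between every descriptor pair.
    diff = desc1[:, None, :] - desc2[None, :, :]
    ssd = np.sum(diff * diff, axis=2)

    matches = []
    for i in range(ssd.shape[0]):
        order = np.argsort(ssd[i])
        best, second = order[0], order[1]
        ssd_score = ssd[i, best]
        # Ratio test: best distance over second-best distance. A value near
        # 1 means the match is ambiguous, so it scores poorly.
        ratio_score = ssd_score / (ssd[i, second] + 1e-8)
        matches.append((i, best, ssd_score, ratio_score))
    return matches
```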

Real-world performance

To test the real-world performance of my system, I used an image of a hummingbird that I had taken with my smartphone camera. The image is slightly out of focus and has some motion blur, since I was trying to take the picture quickly.

Down-scaled hummingbird image

For input images, I used a cropped subset of the photo around the hummingbird and feeder, along with a rotated and brightened copy. The red lines in the comparison image below indicate which feature points the system determined were in correspondence with one another.

Hummingbird comparison image

There are quite a few false positives, but the system does a reasonably good job of matching feature points related to the "stem" at the center of the glass receptacle as well as some of the "flowers" around the base.
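
The rotated, brightened input can be produced with a few standard image operations. The sketch below is only illustrative: the crop window, rotation angle, brightness offset, and file names are placeholders, not the values I actually used.

```python
import cv2

img = cv2.imread('hummingbird.jpg')

# Cropped subset around the hummingbird and feeder
# (placeholder coordinates: rows y0:y1, columns x0:x1).
crop = img[200:800, 300:900]

# Rotated, brightened copy of the same region (placeholder angle and offset).
h, w = crop.shape[:2]
rot = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), 30, 1.0)
rotated = cv2.warpAffine(crop, rot, (w, h))
brightened = cv2.convertScaleAbs(rotated, alpha=1.0, beta=40)

cv2.imwrite('input1.png', crop)
cv2.imwrite('input2.png', brightened)
```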

Extra credit

My feature descriptor did not perform as well as I would have liked, but it is designed to be robust to translations, rotations, and changes in illumination, and the benchmark results give some indication that it is.