Patrick Williams

Computer Vision (CSE 455), Winter 2012

Project 4: Eigenfaces

Project Abstract

Objectives

In this project, we compute from a training set of faces the eigenvectors of the space of faces (eigenfaces), and use those that best characterize the space to simplify recognizing and finding faces. Eigenfaces let us approximate faces with a compact, abstract representation that requires less storage and computation, and potentially lets us recognize new faces that fit within our perceived space of faces.

Challenges

Much of the difficulty in this assignment came from accurately coding each intended step of the process. Many errors were the result of performing the right operation in the wrong direction, or of scaling (or failing to scale) values where appropriate. The provided skeleton handled the heavy numerical computation, letting us focus on applying it to the project.

Lessons Learned

As a result of experimenting with this method of facial recognition and verification, we found that eigenvector decomposition via principal component analysis lets us examine faces in images in a clearer and more abstract way, and understand how to recognize, reconstruct, or locate them computationally. Understanding these concepts also suggests how they could apply to other multi-dimensional classification problems.

Implementation

The result of this project is an executable with multiple capabilities for generating eigenfaces and for building and using face representations composed of eigenface coefficients.

The rest of this report covers three experiments: Recognition, Find Faces, and Verify Faces.

Experiment: Recognition

Methodology

For this experiment I attempted to recognize faces by first computing a set of eigenfaces, along with an average face, from a group of photos of non-smiling users (the average face and first 10 eigenfaces are shown below).

Our average face

10 eigenfaces
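To make this step concrete, here is a minimal sketch in Python/NumPy (the project itself builds on the provided C++ skeleton) of how the average face and eigenfaces can be derived from a stack of flattened training images. The names `training_faces` and `compute_eigenfaces`, and the use of an SVD rather than an explicit covariance matrix, are my own illustrative choices, not the skeleton's API.

```python
import numpy as np

def compute_eigenfaces(training_faces, num_eigenfaces=10):
    """training_faces: (n_images, n_pixels) array of flattened,
    same-size grayscale face images."""
    # The average face is the per-pixel mean over the training set.
    average_face = training_faces.mean(axis=0)

    # Center the images so the principal components describe
    # variation around the average face.
    centered = training_faces - average_face

    # SVD of the centered stack yields the principal directions
    # (rows of vt), sorted by decreasing variance, without ever
    # forming the huge n_pixels x n_pixels covariance matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)

    # Keep only the leading eigenfaces.
    return average_face, vt[:num_eigenfaces]
```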

I then generated a set of user profiles based on the projection of these images onto the space spanned by the eigenfaces. This userbase and set of eigenfaces were then tested on a set of photos of the same users with smiling expressions. By varying the number of eigenfaces and the corresponding userbase data, I hoped to determine the number of eigenfaces that best expresses the features of faces without overfitting the training set.
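A sketch of the projection and matching step, under the same hypothetical NumPy representation as above: each face reduces to its vector of eigenface coefficients, and recognition returns the user whose stored coefficients lie closest.

```python
import numpy as np

def project(face, average_face, eigenfaces):
    # Coefficients of the face in the eigenface basis: dot the
    # centered face with each eigenface.
    return eigenfaces @ (face - average_face)

def build_userbase(faces, names, average_face, eigenfaces):
    # Each profile is just the user's name and coefficient vector.
    return [(name, project(face, average_face, eigenfaces))
            for name, face in zip(names, faces)]

def recognize(face, userbase, average_face, eigenfaces):
    # Pick the user whose stored coefficients are nearest (in
    # squared Euclidean distance) to the query's coefficients.
    coeffs = project(face, average_face, eigenfaces)
    best = min(userbase, key=lambda u: np.sum((u[1] - coeffs) ** 2))
    return best[0]
```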

Results & Discussion

Overall, we find a noticeable but acceptable error rate. With our default choice of 10 eigenfaces we correctly identify users just over two-thirds of the time. In fact, we find that varying the number of eigenfaces doesn't provide much improvement.

Above I've plotted recognition accuracy versus the number of eigenfaces used. We naturally see a sharp increase over the first few eigenfaces, but after 5 eigenfaces accuracy drops off and then barely improves, apart from a very small gain beyond 21 eigenfaces. The dropoff might be explained by overfitting: the 6th or 7th eigenface may emphasize a feature such as the size of the mouth, which introduces error on the set of smiling faces. Regardless, the overall shape of the plot shows that only a few eigenfaces are necessary to express the space of faces accurately.

I did end up with a number of faces that were regularly misidentified; a couple are shown below with their two closest matches given 10 eigenfaces. In several of these cases, though, the correct match was the second-best match. One common factor that might explain the errors is the slight variation in cropping and angles of the faces. Especially in cases where dark hair borders the image, the error can be inflated considerably when hair pixels are compared against skin pixels. The last example also shows that the match can be skewed when the orientation of one face matches another user's better. The eyes, with or without glasses, also carry a lot of detail that changes with expression, which might explain cases like the first example.

Experiment: Find Faces

Methodology

To find faces in an image, the program does a simple search through the image, considering every possible face position over a range of sizes and choosing the candidate closest to the subspace spanned by the eigenfaces. For example, with our previous 10 eigenfaces we're able to detect and isolate the face in the image below.

We did this with several images, varying the range and granularity of the sizes searched, to find settings that worked well and ran in a reasonable amount of time; a sketch of the search loop follows, and results are discussed after it.
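As a rough sketch of this search, again in hypothetical NumPy terms (and ignoring the brightness normalization a real implementation would want), the scan might look like the following. A fixed 25x25 window slides over rescaled copies of the image, and each patch is scored by its distance from the face subspace; `find_face` and its parameters are illustrative, not the skeleton's interface.

```python
import numpy as np
from scipy.ndimage import zoom  # simple image rescaling

def face_space_distance(patch, average_face, eigenfaces):
    # "Distance from face space": project the patch onto the
    # eigenfaces, reconstruct it, and measure the squared residual.
    centered = patch.ravel() - average_face
    coeffs = eigenfaces @ centered
    residual = centered - eigenfaces.T @ coeffs
    return np.sum(residual ** 2)

def find_face(image, average_face, eigenfaces, face_shape=(25, 25),
              scales=np.arange(0.4, 1.01, 0.1)):
    """Scan a fixed-size window over every position of each scaled
    copy of the image; return the best (distance, scale, row, col)."""
    h, w = face_shape
    best = None
    for scale in scales:
        scaled = zoom(image, scale)
        for r in range(scaled.shape[0] - h + 1):
            for c in range(scaled.shape[1] - w + 1):
                patch = scaled[r:r + h, c:c + w]
                d = face_space_distance(patch, average_face, eigenfaces)
                if best is None or d < best[0]:
                    best = (d, scale, r, c)
    return best
```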

Results & Discussion

In general, we find that the appropriate scale for a face can vary quite a bit, depending on the resolution of the image and the size of the eigenfaces computed. For our eigenfaces of size 25x25 pixels, I found that ranging between scales of 0.4 and 1.0 with a step of 0.1 worked reasonably well. For larger faces or resolutions it might be necessary to use smaller scales with more granularity. For this image we used the parameters above; the resulting scale chosen was 0.5.

We used the same parameters to identify the faces in this group photo.

And this one, which was unfortunately not as accurate.

Refining the scale range to 0.6 to 0.8 with a finer granularity of 0.02, we were able to detect faces a little better by disregarding the smaller squares.

The errors we're getting appear to be more or less random points in the image with little resemblance (to the human eye) to anything like a face. The fact that we're still detecting valid faces implies that the false positives lie close to the subspace of faces spanned by the eigenfaces, despite being far from any actual face. We could possibly improve this by also rewarding closeness to the average face; a sketch of that idea follows. Why some faces aren't recognized nearly as well as these false positives is less obvious, though discarding candidates based on color might also eliminate some of them.
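A hedged sketch of that tweak: augment the distance-from-face-space score with a term that penalizes coefficient vectors lying far from the average face (which has all-zero coefficients in the centered basis). The weight `alpha` is a made-up tuning parameter, not something from the assignment.

```python
import numpy as np

def combined_distance(patch, average_face, eigenfaces, alpha=0.5):
    centered = patch.ravel() - average_face
    coeffs = eigenfaces @ centered
    # Distance from face space, as before...
    dffs = np.sum((centered - eigenfaces.T @ coeffs) ** 2)
    # ...plus a penalty for being far from the average face
    # within that space.
    difs = np.sum(coeffs ** 2)
    return dffs + alpha * difs
```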

However, one concerning point that doesn't seem to be directly addressed in the assignment directions is the warping of the eigenfaces relative to the candidate faces we match against. Specifically, if I'm understanding the code correctly, we generate a set of square eigenfaces from faces that have all been warped to fit a square, but we then search the unwarped image and attempt to match these square eigenfaces against regions containing non-square faces. It seems we might improve accuracy by stretching the image's height by the average height-to-width ratio of the training faces, or something similar.

Experiment: Verify Faces

Methodology

For this experiment I tested my implementation of a verify-face method, which uses a set of eigenfaces and a userbase, generated as before, to verify whether a face matches a given user's. Specifically, given a face we project it onto the space spanned by the eigenfaces (essentially choosing the closest representation of the face as a linear combination of eigenfaces), and then compare the coefficients of this projection to the coefficients computed in the same way for the user when the userbase was created. If the coefficients are within some mean-squared-error (MSE) threshold, we accept the face as the same person. My goal was to vary the MSE threshold to determine a value that worked well for verifying faces. To test this I generated 6 eigenfaces and a userbase from the non-smiling student faces and attempted verification on the smiling student faces, considering both the false positive and false negative error rates. Since batch files are not well suited to even simple arithmetic, my false positive rate is an average obtained by comparing all faces against a single user's profile, repeated for three different users.
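In the same hypothetical NumPy terms as the earlier sketches, the verification test reduces to a few lines; `mse_threshold` stands in for the threshold varied below, and `user_coeffs` is the coefficient vector stored in the userbase for the claimed user.

```python
import numpy as np

def verify(face, user_coeffs, average_face, eigenfaces, mse_threshold):
    # Project the candidate face and compare its coefficients to
    # the coefficients stored for the claimed user.
    coeffs = eigenfaces @ (face.ravel() - average_face)
    mse = np.mean((coeffs - user_coeffs) ** 2)
    return mse <= mse_threshold
```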

Results & Discussion

I tried a range of thresholds from 10,000 to 200,000. Initially I attempted thresholds from 10k to 100k in steps of 10k, but after seeing remarkably low false positive rates I extended the range upward until the two error-rate curves clearly crossed. The resulting data is plotted below. Naturally, one error rate tends to increase while the other decreases, so depending on the application we might prefer to keep the false positive rate low while accepting some false negatives (say, in a security setting). The crossover point, where both error rates were roughly equal at a relatively low 10%, and which we'll call our 'best' threshold, occurred at around 120,000.
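For what it's worth, the sweep itself is trivial once the MSEs are computed outside of batch files. This hypothetical helper assumes precomputed arrays of MSE values for genuine (same-user) and impostor (different-user) comparisons.

```python
import numpy as np

def sweep_thresholds(genuine_mses, impostor_mses, thresholds):
    """genuine_mses: MSEs for same-user comparisons;
    impostor_mses: MSEs for different-user comparisons."""
    for t in thresholds:
        fn = np.mean(np.asarray(genuine_mses) > t)    # genuine rejected
        fp = np.mean(np.asarray(impostor_mses) <= t)  # impostor accepted
        print(f"threshold {t}: FP {fp:.1%}, FN {fn:.1%}")

# e.g. sweep_thresholds(genuine, impostor, range(10_000, 200_001, 10_000))
```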