Jason Mackay

Computer Vision (CSE 455), Winter 2012

Project 4: Eigenfaces

Project Abstract

Objectives

(Note: 'we' below is used in the editorial sense; all work was performed by the above student.)

In this project, we used PCA to perform face detection in images. The process involves constructing a vector subspace in the space of images such that faces are well represented by linear combinations of the subspace basis vectors. The first step is to input a number of training samples into the algorithm, find the average face, and then compute the delta from the average face to each sample face. These deltas are then used to construct an outer-product matrix for each sample image, and these matrices are summed into an overall covariance matrix A. By finding the n eigenvectors of A with the largest eigenvalues, which represent the directions of greatest variance among the faces, ordered from most variant to least, we construct a basis set that represents the images with an arbitrary degree of accuracy (if all of the eigenvectors are used then we can perfectly reconstruct the training set). A new image can then be projected into this space by first subtracting the average face and then taking the dot product of the residual with each basis vector, yielding a set of coefficients which can be used to construct the closest image to the new image on the face hyperplane.
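To make the training step concrete, here is a minimal C++ sketch of the procedure just described. The Image type, the flattening of each 25x25 face into a vector, and the jacobi() declaration are stand-ins for the skeleton code's own types and its Jacobi eigensolver, not its actual API:

// Sketch of eigenface training. jacobi() stands in for the skeleton's
// Jacobi eigensolver and is assumed to return eigenvectors (as rows)
// sorted by decreasing eigenvalue.
#include <cstddef>
#include <vector>

using Image = std::vector<double>;   // one face, flattened to N = 25*25 pixels

void jacobi(const std::vector<std::vector<double>>& A,
            std::vector<std::vector<double>>& eigvecs,
            std::vector<double>& eigvals);   // assumed stand-in

void trainEigenfaces(const std::vector<Image>& faces, std::size_t n,
                     Image& avgFace, std::vector<Image>& eigenfaces)
{
    const std::size_t N = faces[0].size();

    // 1. Average face.
    avgFace.assign(N, 0.0);
    for (const Image& f : faces)
        for (std::size_t i = 0; i < N; ++i) avgFace[i] += f[i];
    for (double& v : avgFace) v /= faces.size();

    // 2. Sum the outer products of the residuals into the covariance matrix A.
    std::vector<std::vector<double>> A(N, std::vector<double>(N, 0.0));
    for (const Image& f : faces)
        for (std::size_t i = 0; i < N; ++i)
            for (std::size_t j = 0; j < N; ++j)
                A[i][j] += (f[i] - avgFace[i]) * (f[j] - avgFace[j]);

    // 3. Keep the n eigenvectors with the largest eigenvalues as the basis.
    std::vector<std::vector<double>> eigvecs;
    std::vector<double> eigvals;
    jacobi(A, eigvecs, eigvals);
    eigenfaces.assign(eigvecs.begin(), eigvecs.begin() + n);
}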

We can then perform various feats with this reconstruction. We can, for example, represent each training sample by its coefficients, and then, given a new image, take its projection and determine the distance to each training sample in face-space, taking the closest such sample as the "recognized" sample (not unlike a nearest neighbor classifier). Another interesting application is face detection. In this application we take an input image and move a window over it at various locations and scales. For each subimage so defined, we can compute its projection in face-space and then "reconstruct" the original image using only a linear combination of the basis vectors. Then we can take the difference between the original and the reconstruction and apply a threshold to decide if it is a face or not. The approach is a bit naive, and the accuracy is not terribly good without some additional heuristics (some of which we tried and will discuss below). The overall concept is quite a useful one and is common in statistics, computer vision, and machine learning. It was challenging to implement and fun to apply in a domain where we get to see what eigenvectors look like.
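The projection, reconstruction, and thresholding steps described above might look roughly like the following sketch (the same assumed Image type as in the training sketch; project() and reconstructionError() are illustrative names rather than the skeleton's):

// Sketch of projection onto the eigenface basis and reconstruction error.
#include <cstddef>
#include <vector>

using Image = std::vector<double>;

double dot(const Image& a, const Image& b)
{
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Project (img - avgFace) onto each eigenface to get the coefficients.
std::vector<double> project(const Image& img, const Image& avgFace,
                            const std::vector<Image>& eigenfaces)
{
    Image residual(img.size());
    for (std::size_t i = 0; i < img.size(); ++i) residual[i] = img[i] - avgFace[i];
    std::vector<double> coeffs;
    for (const Image& e : eigenfaces) coeffs.push_back(dot(residual, e));
    return coeffs;
}

// Rebuild the closest point on the face hyperplane and measure the MSE
// against the original; thresholding this gives a face / non-face decision.
double reconstructionError(const Image& img, const Image& avgFace,
                           const std::vector<Image>& eigenfaces)
{
    std::vector<double> coeffs = project(img, avgFace, eigenfaces);
    double err = 0.0;
    for (std::size_t i = 0; i < img.size(); ++i) {
        double recon = avgFace[i];
        for (std::size_t k = 0; k < eigenfaces.size(); ++k)
            recon += coeffs[k] * eigenfaces[k][i];
        err += (img[i] - recon) * (img[i] - recon);
    }
    return err / img.size();
}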

Challenges

The most challenging part of the project was definitely face detection. The basic algorithm of thresholding distance to the hyperplane ended up detecting lots of garbage. Many additional heuristics were applied in an attempt to improve the accuracy of this part of the project, and the best performing such heuristics were left in the final code. Hopefully it performs well on test data, but it's clear that building a robust face detector on top of PCA would require quite a bit of time, lots of training data, and plenty of creativity. Other algorithmic aspects of this problem were also challenging, including ensuring that overlapping windows were not returned in the result set, and generally keeping track of the various coordinates and scales involved.
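To illustrate the overlap issue, one simple scheme is to accept candidate windows in best-score-first order and skip any window that overlaps one already accepted. This is only a sketch, with an assumed Window type, and not necessarily how the final code resolves ties:

// Greedy overlap rejection: lower score means more face-like here.
#include <algorithm>
#include <cstddef>
#include <vector>

struct Window { int x, y, size; double score; };

bool overlaps(const Window& a, const Window& b)
{
    return a.x < b.x + b.size && b.x < a.x + a.size &&
           a.y < b.y + b.size && b.y < a.y + a.size;
}

std::vector<Window> rejectOverlaps(std::vector<Window> candidates, std::size_t keep)
{
    // Best (lowest) scores first.
    std::sort(candidates.begin(), candidates.end(),
              [](const Window& a, const Window& b) { return a.score < b.score; });
    std::vector<Window> accepted;
    for (const Window& w : candidates) {
        if (accepted.size() >= keep) break;
        bool clash = false;
        for (const Window& a : accepted) clash = clash || overlaps(w, a);
        if (!clash) accepted.push_back(w);
    }
    return accepted;
}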

Lessons Learned

As a result of this project we learned a technique for constructing a linear subspace from samples in a vector space. This approach can be used for recognition of previously seen samples and for recognition of new samples, for compression and dimensionality reduction, for feature selection, and for a variety of other applications. It was quite interesting and fun to try it out in the context of face detection as this gave us an opportunity to literally see the results of the process. It was also interesting to see how it worked in practice on unseen data and fun to explore various heuristics to improve its classification performance. 

Implementation

The project involved filling in major functionality in the skeleton of a face recognition program. The program was written using Microsoft Visual C++ and came with a number of important routines already written, including support for reading and writing images, vector operations, image manipulation, and some important mathematical support routines such as the Jacobi algorithm for computing eigensystems.

The following sections describe the routines implemented.

Experiment: Recognition

Methodology

In this experiment we constructed an eigenspace using the non-smiling students dataset, which consisted of cropped pictures of students from our class in "neutral" expressions. We used the entire set of pictures to create a set of 10 eigenvectors of size 25x25 pixels to represent the subspace. The average face and the 10 eigenvectors are shown below:

Average face using the non-smiling student pictures


The 10 eigenfaces produced from our class's non-smiling dataset, 0-4 on the top row, 5-9 on the bottom row.

With these eigenvectors in hand, we then computed a "userbase" consisting of the coefficients for each of our input samples in eigenspace, and then attempted to classify each of the original people. Instead of using the original images, however, we now used the "smiling" images. These pictures had our students in various facial poses, in an attempt to make things more difficult for our recognition program. To give the reader an idea of how difficult, here is a sample from the "smiling" collection.

Jason Mackay, a fool for the camera.
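Classification against the userbase amounts to nearest neighbor search in coefficient space, roughly as in the sketch below (UserEntry, faceSpaceDist(), and recognize() are illustrative names, not the skeleton's API):

// Sketch of recognition as nearest neighbor in face-space.
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

struct UserEntry { std::string name; std::vector<double> coeffs; };

// Squared distance between two coefficient vectors in face-space.
double faceSpaceDist(const std::vector<double>& a, const std::vector<double>& b)
{
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return s;
}

std::string recognize(const std::vector<double>& query,
                      const std::vector<UserEntry>& userbase)
{
    double best = std::numeric_limits<double>::max();
    std::string bestName;
    for (const UserEntry& u : userbase) {
        double d = faceSpaceDist(query, u.coeffs);
        if (d < best) { best = d; bestName = u.name; }
    }
    return bestName;   // the closest training sample wins
}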

In order to get a sense of how the algorithm performed, we swept the number of eigenfaces used from 1 to 33, in steps of 2, and computed the number of correctly recognized faces. The results are given in the chart below. Apologies for the scaling on the chart; we could not get Kompozer to scale the image any larger in the y direction. Click on it to see the full size image.

Number of recognized faces vs. number of eigenfaces used

Questions

1. Describe the trends you see in your plots. Discuss the tradeoffs; how many eigenfaces should one use? Is there a clear answer?
It appears that there is a sharp increase in accuracy initially, then a period of gradual improvement seasoned with periodic reductions in accuracy. It is expected that the accuracy improves quickly at first, since the eigenvectors are ordered by their contribution to the overall variance in the images seen. As we increase the number of eigenvectors used, the remaining eigenvectors to choose from are less informative. The undulation in the accuracy is probably because the set of eigenvectors with the largest eigenvalues does not capture certain subtleties in the data, and sometimes the eigenvectors with smaller eigenvalues are needed in the basis set in order to correct inaccuracies introduced by the larger ones. One interesting point is that we might have expected to see 100% accuracy with 33 eigenvectors, since the entire space could be spanned, but this is not the case since these are completely new images. In fact, we did try this on the original input images, and 100% accuracy was achieved well before 33 eigenvectors.
2. You likely saw some recognition errors in step 3; show images of a couple. How reasonable were the mistakes? Did the correct answer at least appear highly in the sorted results?
We did see plenty of recognition errors, as explained above. Here is an example:
The input image.
The first guess.
The second guess.
The third guess.
While it seems clear to a human observer that these first three images are not of the same people (humans are incredibly sensitive to the cues used for identifying other humans), it may be reasonable for the computer to make these mistakes. The skin tone, highlights, and shadows in these images are similar, and the fact that the person on the left is sticking his tongue out makes it look similar to the mouth of the person on the right. The noses and eyes are clearly different, even at this low resolution, but we can see how the algorithm might have made this mistake. The actual person we were looking for was in fact third in the list of images selected by the algorithm as matches for this input image. Based on the metric we used, we can see why this might be the case. The last image has a much thinner mouth area, lighter creases around the mouth, and a darker overall skin tone.

Experiment: Find Faces

Methodology

In this experiment we used the same set of eigenvectors as in the previous experiment to automatically detect unknown faces in various images.

In the first part of this experiment we used the program to crop a face from a given image (the "elf" image) automatically. The scale parameters were swept from .45 to .55 at increments of .01. The source image and the final cropped image are shown below.

Original "elf" imageCropped "elf" image
The original "elf" image (showing detection box) and the cropped version using our face detection algorithm.
This took quite a bit of tweaking to get right. Initially we used just the basic detection algorithm (distance of the reconstructed image on the face hyperplane to the original image), but this proved to be woefully inaccurate. We were getting boxes in all sorts of places except for faces. The algorithm tended to like the top left corner of the bald man's head for some reason. Frequently the windows that came back, even when as many as 10 were requested, included no faces at all. This was quite frustrating as we scoured our code for bugs. We could not find any. So we set about attempting to improve the algorithm heuristically.

The first thing we did was to implement the suggested hack in the project write-up. Unfortunately, the description of this hack could have been interpreted as using distance and variance in either face-space or pixel-space. Since we were frequently seeing boxes on the walls (low variance in pixel space), we elected to implement the scheme in pixel space. So we penalized the distance of the window image from the average face and rewarded window images with high variance. This resulted in no more walls, and our baby's face was now in the top three faces detected, but it was not the top face. The top face was a box including the bald man's right shoulder and the baby's hand. So in desperation we elected to try color cues. We took the baby's face, cropped it, and computed the average color. This was used in our code to provide a distance metric in color space, and by incorporating this into our face score we were finally able to get the baby's face to crop correctly.
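Putting these heuristics together, the window score we ended up with has roughly the shape sketched below. The weights shown are placeholders (the real values were tuned by hand), and the helper declarations stand in for routines sketched elsewhere in this writeup:

// Sketch of the combined window score: reconstruction error, plus a
// penalty for straying from the average face, a reward for pixel variance,
// and a penalty for straying from the reference skin tone. Lower is better.
#include <cstddef>
#include <vector>

using Image = std::vector<double>;

double pixelVariance(const Image& img)
{
    double mean = 0.0;
    for (double v : img) mean += v;
    mean /= img.size();
    double var = 0.0;
    for (double v : img) var += (v - mean) * (v - mean);
    return var / img.size();
}

// Assumed helpers, sketched elsewhere in this writeup:
double reconstructionError(const Image&, const Image&, const std::vector<Image>&);
double distanceFromAverage(const Image& img, const Image& avgFace);
double skinToneDistance(const Image& img);   // color cue; see Bells & Whistles

double faceScore(const Image& window, const Image& avgFace,
                 const std::vector<Image>& eigenfaces)
{
    const double kAvg = 1.0, kVar = 1.0, kSkin = 1.0;    // illustrative weights
    return reconstructionError(window, avgFace, eigenfaces)
         + kAvg  * distanceFromAverage(window, avgFace)  // penalize far-from-average
         - kVar  * pixelVariance(window)                 // reward high variance
         + kSkin * skinToneDistance(window);             // penalize non-skin colors
}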

Another interesting point about this picture was that the bald man was never recognized, not even once, with all the varieties of settings and heuristics we tried. It must be the case that his face is too far from the face hyperplane, probably due to his glasses, bald head, and the cocked orientation of the face with respect to the image axes.  


Questions

1. What min_scale, max_scale, and scale step did you use for each image?
Each image required a different setting for these parameters, which was determined by manual exploration and by looking at the size of the detected "faces". For the "elf" image, we used the settings prescribed by the assignment and then tweaked the algorithm in order to get the crop to work correctly. The other images were played with until the boxes were roughly correct, and then a range was specified around that point. The step size was set to give a nice sweep, say ten or so scalings per image; a sketch of the sweep loop appears after these questions.
2. Did your attempt to find faces result in any false positives and/or false negatives? Discuss each mistake, and why you think they might have occurred.
See the detailed analysis of results below.
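For reference, the scale sweep mentioned in question 1 has roughly the following shape (a sketch only: resample() and scoreWindow() are assumed stand-ins for the skeleton's image routines, and the real code keeps several best windows rather than one):

// Sketch of sweeping window positions over a range of image scales.
#include <limits>
#include <vector>

using Image = std::vector<double>;   // flattened grayscale pixels

struct Detection { int x, y; double scale, score; };

// Assumed stand-ins for the skeleton's image routines:
Image resample(const Image& img, int w, int h, double scale);
double scoreWindow(const Image& img, int imgW, int x, int y, int winSize);

Detection sweepScales(const Image& img, int w, int h, int winSize,
                      double minScale, double maxScale, double step)
{
    Detection best{0, 0, minScale, std::numeric_limits<double>::max()};
    for (double s = minScale; s <= maxScale + 1e-9; s += step) {
        Image scaled = resample(img, w, h, s);
        int sw = int(w * s), sh = int(h * s);
        for (int y = 0; y + winSize <= sh; ++y)
            for (int x = 0; x + winSize <= sw; ++x) {
                double score = scoreWindow(scaled, sw, x, y, winSize);
                if (score < best.score)                       // lower = more face-like
                    best = {int(x / s), int(y / s), s, score};  // original coordinates
            }
    }
    return best;
}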

Sample Results


The "me" image and its detection result (me_faces).

eigenfaces --findface me.tga eigenfaces.txt .06 .1 .001 mark 4 me_faces.tga

This is a large image, originally 1003x1172 pixels. Despite dramatic scaling and sweeping a large range of parameters, we were unable to get the only face to be detected. Instead the program selected an area consisting mostly of wall texture. This may be a confusing image for our face detector, since there is a lot of featureless wall in the image, and the wall is roughly skin-toned. The high resolution may cause the image to darken significantly when it is scaled down, or may cause aliasing or other artifacts that inhibit the detection process. Since our algorithm includes a heuristic to detect face colors, the wall color here may be getting extra points that cause the algorithm to misdetect it as a face.

The "india" image and its detection result (india_faces).

eigenfaces --findface india.tga eigenfaces.txt 1.0 1.2 .1 mark 10 india_faces.tga

This was a smaller image, 400x300 in size. Unfortunately, the algorithm seemed very distracted by all the trees in the background and did not detect any faces. It's possible that the heuristics added for variance may have caused the algorithm to favor the highly variant areas with trees and sky. Other areas that seemed to get preference included the drab section of road highlighted. My face in this picture includes sunglasses, which may have confused the algorithm further.

The "group_neutral (2)" image and its detection result.

eigenfaces --findface "..\faceimages\group\group_neutral (2).tga" eigenfaces.txt .3 .6 .1 mark 3 group_neutral(2)_faces.tga

This was a medium-sized image, 600x400 pixels. We had the program sweep the scaling from .3 to .6 in increments of .1 and mark the top three faces. We were delighted to see that we got two out of three faces correct. The program apparently thought that the guy on the left's elbow was more face-like than the guy on the right's face. We were not entirely sure why this would be. Our face detector uses reconstruction error and distance from the average face in pixel space, penalizes low-variance windows, and prefers skin tone. The reconstruction error and distance from the average face seem like they should favor the face on the right. The elbow window seems like it should have lower variance in pixel space, and so it should be penalized. The skin tone component should be about the same for both, although perhaps this is the culprit, since the face on the right has more dark tones in it. Perhaps a more sophisticated skin tone detector would help with this image.

The class panorama image and its detection result (pano).

eigenfaces --findface ..\faceimages\group\class_pano_handheld.tga eigenfaces.txt .1 .7 .01 mark 33 pano.tga

This was a very large image with loads of faces and lots of artifacts due to the panorama stitching process. The viewer can click on the lower image to get a full resolution version; it's easier to see the boxes in the full resolution image. This image produced a high number of false positives but did manage to find three faces. Looking at the various patches detected as faces, I would say that there seems to be too much preference for high variance in our algorithm and not enough emphasis on closeness to the average face. I think that training on a much larger dataset, using more eigenfaces, and preprocessing each subimage for contrast invariance might be helpful.

Experiment: Verify Faces

Methodology

In this experiment we used the -verifyface feature of our executable to determine if a picture was of a particular individual. First we created an eigensystem with 6 eigenfaces using the "non-smiling" class training dataset. Then we ran a test for each individual using a known positive picture (the individual's corresponding "smiling" photo) and a known negative picture (a randomly selected "smiling" photo of a different individual). We then swept the MaxMSE parameter for the verify face function from 0 to 1150000 in increments of 10000 (we determined the range with an initial low resolution sweep; a finer step might have resulted in a prettier graph but would have taken longer to run) and recorded the true positive and false positive rates for each setting. With this data in hand we created an ROC plot so we could examine the performance visually.
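Gathering the ROC data is then a simple counting loop over the threshold sweep, roughly as in this sketch (the per-pair MSE values are assumed to have been precomputed by running the verifier on each test pair):

// Sketch of the MaxMSE sweep: one (FPR, TPR) point per threshold.
#include <cstdio>
#include <vector>

int main()
{
    // Assumed precomputed verification MSEs, one per test pair.
    std::vector<double> positiveMSE = { /* MSE for each true pair */ };
    std::vector<double> negativeMSE = { /* MSE for each impostor pair */ };

    for (double t = 0.0; t <= 1150000.0; t += 10000.0) {
        int tp = 0, fp = 0;
        for (double m : positiveMSE) if (m <= t) ++tp;   // accepted, correctly
        for (double m : negativeMSE) if (m <= t) ++fp;   // accepted, wrongly
        double tpr = positiveMSE.empty() ? 0.0 : double(tp) / positiveMSE.size();
        double fpr = negativeMSE.empty() ? 0.0 : double(fp) / negativeMSE.size();
        std::printf("%g, %g, %g\n", t, fpr, tpr);        // one ROC point per line
    }
}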

Questions

1. What MSE thresholds did you try? Which one worked best? What search method did you use to find it?

The MSE thresholds used were 0 to 1150000 in steps of 10000. We did an initial lower resolution parameter sweep to determine the range of interesting values. This range of values encompassed all behaviors, including getting all the positives right, all the positives wrong, all the negatives right, and all the negatives wrong. We might have used a finer step size for producing our data, but we were short on time.

2. Using the best MSE threshold, what was the false negative rate? What was the false positive rate?

ROC plot for face verification.
Using the ROC plot we can choose the point which maximizes the criteria we are interested in. For example, if our primary goal is to recognize known individuals as often as possible and we don't mind a few false positives, we might choose the point at a 0.27 false positive rate and a 1.0 true positive rate (i.e., a 0 false negative rate), which corresponds to a threshold of 200000 and gets all of our positive examples correct but thinks that 27% of our negative examples are positive examples. On the other hand, suppose we were using this to build a security system, so false positives were very, very bad. Then we might choose the point which minimizes false positives first and maximizes true positives second. That point is at a 0 false positive rate and a 0.46 true positive rate (i.e., a 0.54 false negative rate), corresponding to an MSE threshold of 40000. So I would say that the "best" MSE really depends on your application. Apologies for the scaling on the chart; we could not get Kompozer to scale the image any larger in the y direction. Click on it to see the full size image.

Bells & Whistles

Skin tone recognition

We attempted to improve the performance of the face detection classifier by adding a skin tone component. To do this, we took the "elf" picture, cropped the baby's face, and computed the average red, green, and blue values. We encoded that into a reference vector in our findFace function, and then for each window under consideration we computed the average color in the window and penalized the window by the distance of this average color from the skin tone. We experimented with support for multiple skin tones, and even started to implement a PCA version of this, but time ran out and we reverted to the simple baby skin tone detector in order to ensure that we were able to automatically crop the baby (for some reason it really liked the man's elbow, hmmm... it likes elbows in general). Our PCA idea is to take a variety of average skin tone vectors and consider these to be "examples" of a linear color subspace, just as we consider faces to be examples in a linear subspace of pixels. We would then construct the average skin color, the matrix A of covariances of the skin color residuals, and compute the eigenvectors of our skin tone subspace. This would allow us to perform a procedure not unlike the face detection procedure, which would help us more accurately determine if a window was a skin area. It would probably be a good idea to use a histogram instead of an RGB vector for this, since that would capture more information about the distribution of colors. Given more time, I think this would be a really cool thing to try.
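A minimal sketch of the simple version of this cue, assuming an RGB pixel type and a made-up reference tone (the real reference came from the cropped baby face):

// Sketch of the skin tone penalty: distance of a window's average color
// from a reference skin tone. The reference values here are placeholders.
#include <cmath>
#include <vector>

struct RGB { double r, g, b; };

double skinToneDistance(const std::vector<RGB>& window)
{
    const RGB skin = {0.8, 0.6, 0.5};   // assumed reference tone, not the real values
    RGB avg = {0.0, 0.0, 0.0};
    for (const RGB& p : window) { avg.r += p.r; avg.g += p.g; avg.b += p.b; }
    avg.r /= window.size(); avg.g /= window.size(); avg.b /= window.size();
    return std::sqrt((avg.r - skin.r) * (avg.r - skin.r) +
                     (avg.g - skin.g) * (avg.g - skin.g) +
                     (avg.b - skin.b) * (avg.b - skin.b));
}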