Jason Mackay
Computer Vision (CSE 455), Winter 2012
Project 4: Eigenfaces
Objectives
(Note: 'we' below is used in the proverbial sense; all work was performed by the above student.)
In this project, we used PCA to perform face detection in images. The process involves constructing a vector subspace in the space of images such that faces are well represented by linear combinations of the subspace basis vectors. The first step is to input a number of training samples into the algorithm, find the average face, and then compute the delta from the average face to each sample face. The outer product of each delta with itself is then computed, and these matrices are summed into an overall covariance matrix A. By finding the largest n eigenvectors of A, which represent the directions of greatest variance among the faces, ordered from most variant to least, we construct a basis set that represents the images to an arbitrary degree of accuracy (if all of the eigenvectors are used then we can perfectly reconstruct the training set). A new image can then be projected into this space by first subtracting the average face and then taking the dot product of the residual with each basis vector, yielding a set of coefficients which can be used to construct the closest image on the hyperplane to the new image.
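The construction just described can be sketched in a few lines of NumPy. This is an illustrative Python sketch, not the project's actual C++ code; the function names `build_eigenfaces` and `project` are our own:

```python
import numpy as np

def build_eigenfaces(faces, n_components):
    """faces: (m, d) array, one flattened training image per row.
    Returns the average face and the top n_components eigenfaces."""
    avg = faces.mean(axis=0)
    deltas = faces - avg                     # residual of each sample from the average face
    A = deltas.T @ deltas                    # sum of outer products = covariance matrix A
    vals, vecs = np.linalg.eigh(A)           # eigensystem of the symmetric matrix A
    order = np.argsort(vals)[::-1][:n_components]  # largest eigenvalues first
    return avg, vecs[:, order].T             # (n_components, d) orthonormal basis

def project(image, avg, eigenfaces):
    """Coefficients of an image in face-space: dot the residual with each basis vector."""
    return eigenfaces @ (image - avg)
```

Reconstructing `avg + eigenfaces.T @ coeffs` then gives the closest point on the face hyperplane, exactly as described above.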
We can then perform various feats with this reconstruction. We can, for example, represent each training sample by its coefficients, and then, given a new image, take its projection and determine the distance to each training sample in face-space, taking the closest such sample as the "recognized" sample (not unlike a nearest-neighbor classifier). Another interesting application is face detection. In this application we take an input image and move a window over it at various locations and scales. For each subimage so defined we compute its projection in face-space and then "reconstruct" the original subimage using only a linear combination of the basis vectors. We can then take the difference between the original and the reconstruction and apply a threshold to decide whether it is a face or not. The approach is a bit naive, and the accuracy is not terribly good without some additional heuristics (some of which we tried and will discuss below). The overall concept is quite a useful one and is common in statistics, computer vision, and machine learning. It was challenging to implement and fun to apply in a domain where we get to see what eigenvectors look like.
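The project-reconstruct-threshold idea for detection might look as follows. Again this is an illustrative Python sketch under our own assumptions (orthonormal eigenfaces, MSE as the difference measure, `detection_score` and `find_faces` are hypothetical names), not the project's C++ implementation:

```python
import numpy as np

def detection_score(window, avg, eigenfaces):
    """Reconstruction error (MSE) of a candidate window: project the residual
    into face-space, reconstruct it, and measure what the subspace missed."""
    residual = window - avg
    coeffs = eigenfaces @ residual
    recon = eigenfaces.T @ coeffs            # reconstruction from the basis vectors only
    return np.mean((residual - recon) ** 2)  # small error => window lies near face-space

def find_faces(windows, avg, eigenfaces, threshold):
    """Return indices of windows whose reconstruction MSE falls under the threshold."""
    return [i for i, w in enumerate(windows)
            if detection_score(w, avg, eigenfaces) < threshold]
```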
Challenges
The most challenging part of the project was definitely face detection. The basic algorithm of thresholding distance to the hyperplane ended up detecting lots of garbage. Many additional heuristics were applied in an attempt to improve the accuracy of this part of the project, and the best-performing heuristics were left in the final code. Hopefully it performs well on test data, but it's clear that building a robust face detector on top of PCA would require quite a bit of time, lots of training data, and plenty of creativity. Other algorithmic aspects of this problem were also challenging, including ensuring that overlapping windows were not returned in the result set, and generally keeping track of the various coordinates and scales involved.
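One way to keep overlapping windows out of the result set is a greedy pass over the score-sorted candidates. A minimal sketch, assuming lower scores are better and boxes are `(x, y, w, h)` tuples (our own conventions, not necessarily the project's):

```python
def overlaps(a, b):
    """Axis-aligned overlap test for (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def top_nonoverlapping(candidates, k):
    """candidates: list of (score, box). Keep the k best-scoring boxes,
    greedily skipping any box that overlaps one already kept."""
    kept = []
    for score, box in sorted(candidates, key=lambda c: c[0]):
        if all(not overlaps(box, kb) for kb in kept):
            kept.append(box)
            if len(kept) == k:
                break
    return kept
```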
Lessons Learned
As a result of this project we learned a technique for constructing a linear subspace from samples in a vector space. This approach can be used for recognition of previously seen samples and for recognition of new samples, for compression and dimensionality reduction, for feature selection, and for a variety of other applications. It was quite interesting and fun to try it out in the context of face detection as this gave us an opportunity to literally see the results of the process. It was also interesting to see how it worked in practice on unseen data and fun to explore various heuristics to improve its classification performance.
The project involved filling in major functionality in the skeleton of a face recognition program. The program was written using Microsoft Visual C++ and came with a number of important routines already written, including support for reading and writing images, vector operations, image manipulation, and some important mathematical support routines such as the Jacobi algorithm for computing eigensystems.
The following describes the routines implemented.
Methodology
In this experiment we constructed an eigenspace using the non-smiling students dataset, which consisted of cropped pictures of students from our class in "neutral" expressions. We used the entire set of pictures to create a set of 10 eigenvectors of size 25x25 pixels to represent the subspace. The average face and the 10 eigenvectors are shown below:
Average face using the non-smiling student pictures
The 10 eigenfaces produced from our class non-smiling dataset, 0-4 on the top row, 5-9 on the bottom row.
With these eigenvectors in hand we then computed a "userbase" consisting of the coefficients for each of our input samples in eigenspace and then attempted to classify each of the original persons. Instead of using the original images however, we now used the "smiling" images. These pictures had our students in various facial poses, in an attempt to make things more difficult for our recognition program. To give the reader an idea of how difficult, here is a sample from the "smiling" collection.
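Recognition against the userbase reduces to a nearest-neighbor lookup in coefficient space. A minimal sketch (the `recognize` helper and the dict representation of the userbase are our own, for illustration):

```python
import numpy as np

def recognize(query_coeffs, userbase):
    """userbase: dict mapping person name -> eigenspace coefficient vector.
    Returns names ranked by distance in face-space, nearest first."""
    dists = {name: np.linalg.norm(query_coeffs - coeffs)
             for name, coeffs in userbase.items()}
    return sorted(dists, key=dists.get)
```

The first element of the returned list is the "recognized" person; the next elements give the second and third guesses used in the table below.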
In order to get a sense of how the algorithm performed, we swept the number of eigenfaces used from 1 to 33, in steps of 2, and computed the number of correctly recognized faces. The results are given in the chart below. Apologies for the scaling on the chart; we could not get Kompozer to scale the image any larger in the y direction. Click on it to see the full size image.
Questions
The input image. | The first guess. | The second guess. | The third guess. |
Methodology
In this experiment we used the same set of eigenvectors we used in the previous experiment to automatically detect unknown faces in various images.
In the first part of this experiment we used the program to crop a face from a given image (the "elf" image) automatically. The scale parameters were swept from .45 to .55 at increments of .01. The source image and the final cropped image are shown below.
Questions
Sample Results
eigenfaces --findface me.tga eigenfaces.txt .06 .1 .001 mark 4 me_faces.tga
This is a large image, originally 1003x1172 pixels. Despite dramatic scaling and sweeping a large range of parameters, we were unable to get the program to detect the only face in the image. Instead it selected an area consisting mostly of wall texture. This may be a confusing image for our face detector since there is a lot of featureless wall in the image, and the wall is roughly skin-toned. The high resolution may cause the image to darken significantly when it is scaled down, or may cause aliasing or other artifacts that inhibit the detection process. Since our algorithm includes a heuristic to detect face colors, the wall color here may be getting extra points that cause the algorithm to misdetect it as a face.
eigenfaces --findface india.tga eigenfaces.txt 1.0 1.2 .1 mark 10 india_faces.tga
This was a smaller image, 400x300 in size. Unfortunately the algorithm seemed very distracted by all the trees in the background and did not detect any faces. It's possible that the heuristics added for variance may have caused the algorithm to favor the highly variant areas with trees and sky. Other areas that seemed to get preference included the drab section of road highlighted. My face in this picture includes sunglasses, which may have confused the algorithm further.
eigenfaces --findface "..\faceimages\group\group_neutral (2).tga" eigenfaces.txt .3 .6 .1 mark 3 group_neutral(2)_faces.tga
This was a medium sized image 600x400 in size. We had the program sweep the scaling from .3 to .6 in increments of .1 and mark the top three faces. We were delighted to see that we got two out of three faces correct. The program apparently thought that the guy on the left's elbow was more face-like than the guy on the right's face. We were not entirely sure why this would be. Our face detector uses reconstruction error, distance from the average face in pixel space, penalizes low-variance windows, and prefers skin tone. The reconstruction error and distance from the average face seem like they should favor the face on the right. The elbow window seems like it should have lower variance in pixel space and so it should be penalized. The skin tone component should be about the same on both, although perhaps this is the culprit since the face on the right has more dark tones in it. Perhaps a more sophisticated skin tone detector would help with this image.
eigenfaces --findface ..\faceimages\group\class_pano_handheld.tga eigenfaces.txt .1 .7 .01 mark 33 pano.tga
This was a very large image with loads of faces and lots of artifacts due to the panorama stitching process. The viewer can click on the lower image to get a full resolution version; it's easier to see the boxes in the full resolution image. This image produced a high number of false positives but did manage to find three faces. Looking at the various patches detected as faces, I would say that there seems to be too much preference for high variance in our algorithm and not enough emphasis on closeness to the average face. I think that training on a much larger dataset, using more eigenfaces, and preprocessing each subimage for contrast invariance might be helpful.
Methodology
Questions
1. What MSE thresholds did you try? Which one worked best? What search method did you use to find it?
The MSE thresholds tried ranged from 0 to 1,150,000 in steps of 10,000. We did an initial lower-resolution parameter sweep to determine the range of interesting values. This range encompassed all behaviors, including getting all the positives right, all the positives wrong, all the negatives right, and all the negatives wrong. We might have used a finer step size for producing our data, but we are short on time here.
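The sweep itself is simple to sketch. Given per-window reconstruction errors for known faces and known non-faces, count the misclassifications at each threshold (an illustrative Python sketch; `sweep_thresholds` and the "accept if error below threshold" convention are our own assumptions):

```python
def sweep_thresholds(face_errors, nonface_errors, thresholds):
    """For each MSE threshold, count false negatives (true faces whose
    error is at or above the threshold, so they are rejected) and false
    positives (non-faces whose error is below it, so they are accepted)."""
    results = []
    for t in thresholds:
        fn = sum(e >= t for e in face_errors)
        fp = sum(e < t for e in nonface_errors)
        results.append((t, fn, fp))
    return results
```

Picking the threshold is then a matter of choosing the `(t, fn, fp)` row with the most acceptable trade-off.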
2. Using the best MSE threshold, what was the false negative rate? What was the false positive rate?
Skin tone recognition
We attempted to improve the performance of the face detector by adding a skin tone component. To do this we took the "elf" picture, cropped the baby's face, and then took a histogram which gave us average red, green, and blue values. We encoded that into a reference vector in our findFace function, and then for each window under consideration we computed the average color in the window and penalized the window by the distance of this average color from the skin tone.

We experimented with support for multiple skin tones, and even started to implement a PCA version of this, but time ran out and we reverted to the simple baby skin tone detector in order to ensure that we were able to automatically crop the baby (for some reason it really liked the man's elbow, hmmm... it likes elbows in general). Our PCA idea is to take a variety of average skin tone vectors and consider these to be "examples" of a linear color subspace, just as we consider faces to be examples in a linear subspace of pixels. We would then construct the average skin color and the matrix A of covariances of skin color residuals, and compute the eigenvectors of our skin tone subspace. This would allow us to perform a procedure not unlike the face detection procedure that would help us more accurately determine whether a window is a skin area. It would probably be a good idea to use a histogram instead of an RGB vector for this, since that would capture more information about the distribution of colors. Given more time I think this would be a really cool thing to try.
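The simple single-tone penalty can be sketched in a few lines (illustrative Python; the `skin_penalty` name and the plain Euclidean distance in RGB are our own assumptions about what the heuristic computes):

```python
import numpy as np

def skin_penalty(window_rgb, reference_rgb):
    """Penalty = Euclidean distance of the window's mean color from a
    reference skin tone (in our case, the average RGB of the cropped
    baby's face from the "elf" picture)."""
    mean_rgb = window_rgb.reshape(-1, 3).mean(axis=0)  # average color over the window
    return np.linalg.norm(mean_rgb - np.asarray(reference_rgb, dtype=float))
```

A window whose average color matches the reference tone incurs no penalty; the further its mean color drifts from skin tone, the more its detection score is penalized.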