CSE 455: Project 4

The Making of Face Recognition

Author

Alissa Harrison

Skeleton code by David Dewey

Project Overview

The program uses eigenfaces to do face recognition. A selection of the eigenfaces used is shown in Appendix E.
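As a rough illustration of the idea (not the project's actual code), the sketch below projects a face onto a set of eigenfaces and ranks known users by distance in coefficient space. Python is used only for illustration; the function names, the 4-pixel "images", and the two eigenfaces are all made-up assumptions.

```python
from math import fsum

def project(face, mean_face, eigenfaces):
    """Return the coefficients of `face` in the space spanned by `eigenfaces`."""
    # Subtract the mean face, then take a dot product with each eigenface.
    centered = [p - m for p, m in zip(face, mean_face)]
    return [fsum(c * e for c, e in zip(centered, ef)) for ef in eigenfaces]

def rank_matches(coeffs, users):
    """Sort user names by squared distance between coefficient vectors."""
    def dist(name):
        return fsum((a - b) ** 2 for a, b in zip(coeffs, users[name]))
    return sorted(users, key=dist)

# Toy 4-pixel "images" and two orthonormal eigenfaces (hypothetical values).
mean_face = [0.5, 0.5, 0.5, 0.5]
eigenfaces = [[0.5, 0.5, -0.5, -0.5], [0.5, -0.5, 0.5, -0.5]]
users = {
    "alice": project([0.9, 0.8, 0.2, 0.1], mean_face, eigenfaces),
    "bob": project([0.2, 0.9, 0.1, 0.8], mean_face, eigenfaces),
}
query = [0.8, 0.9, 0.1, 0.2]
ranking = rank_matches(project(query, mean_face, eigenfaces), users)
print(ranking)  # closest match first
```

The ranked list is what the experiments refer to as the first, second, third, and later matches.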

Experiment 1: Testing recognition with cropped class images

How many faces did the program recognize correctly? Incorrectly?
Of the 22 people matched, 19 were correctly recognized on the first match and 3 were not (see Appendix A, Fig 1).
For instances where the program was wrong, what was the average position of the correct answer in the list of closest matches?
For the 3 that did not match immediately, the correct answer appeared on average at the 4th match.
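One way to compute this average is sketched below, assuming each query yields a ranked list of candidate names. The helper name and the sample data are hypothetical and are not the class results.

```python
def average_miss_position(ranked_lists, truths):
    """Mean 1-based position of the correct answer, counting only the
    queries whose first match was wrong. Returns None if nothing missed."""
    positions = []
    for ranked, truth in zip(ranked_lists, truths):
        pos = ranked.index(truth) + 1  # 1 means the first match was correct
        if pos > 1:
            positions.append(pos)
    return sum(positions) / len(positions) if positions else None

# Made-up rankings for three queries; the first one is recognized immediately.
ranked_lists = [["a", "b", "c"], ["b", "c", "a"], ["c", "a", "b"]]
truths = ["a", "a", "b"]
avg = average_miss_position(ranked_lists, truths)
print(avg)
```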
If you recognize a female face, will the second, third and fourth matches usually be female? How about for male faces? Give supporting data.
There was a slight correlation in matching based on sex (see Appendix A, Fig 2). Due to the sex ratio imbalance in the class, it's hard to verify the degree of correlation. Males matched to males on the second match an overwhelming 92% of the time, but this dropped quickly to 38% on the third and fourth matches. For females, 50% of second order matches were female and 75% of third order matches were female, but only 13% of fourth order matches were female.
In a real-life scenario with thousands of users would you use the entire user set to compute the face space? Why or why not?
I would use only a fraction of the faces to compute the face space. Ideally, I would pick the faces that are clearest and that most resemble the users in lighting and head position. If I used thousands of faces, I would run the risk of producing a poorly defined face space and hence poor eigenfaces.
Why might it be better to use a face set independent of the user set to compute the eigenfaces?
A face set independent of the user set would diversify the face space more. For instance, if everyone in the user set was smiling and looked very similar in pose, but the pictures to match against were very diverse, the results would be poor with such a homogeneous face space.

Experiment 2: Recognizing the undergraduate faces

Why is it best to use as few eigenfaces as possible while still getting good results?
Too many eigenfaces cause the face space to grow and produce more false positives, because the quality of the eigenfaces becomes progressively worse. The subspace of images that are faces should be clearly defined, neither too narrow nor too broad.
Of the students in the class set who are also in the class undergraduate set, how many did the program recognize correctly? Incorrectly?
5 eigenfaces: Of the 18 people matched, 2 were correctly recognized on the first match and the other 16 were not.
10 eigenfaces: Of the 18 people matched, 3 were correctly recognized on the first match and the other 15 were not.
12 eigenfaces: Of the 18 people matched, 4 were correctly recognized on the first match and the other 14 were not.
15 eigenfaces: Of the 18 people matched, 4 were correctly recognized on the first match and the other 14 were not.
20 eigenfaces: Of the 18 people matched, 4 were correctly recognized on the first match and the other 14 were not.
Are the incorrect identifications reasonable? Do they look similar to the actual person? Give some example images.
Appendix B, Fig 1 shows 4 examples of the incorrect matches. The mismatches do not seem to occur because another person looked the same, but because of the balance of colors and the position of the face. The same faces repeatedly showed up in the first, second, and third order matches, whether or not they were similar to the target person.
Why does the program perform more poorly in this recognition task than the previous one? Give at least three reasons.
The lighting conditions are considerably different between the two sets of images, which causes the reflections and shading on the faces to differ. Also, for some people the position of the head is noticeably different. Combined with a different expression, these differences would make it hard even for the human eye to pick out which match is correct.
How did changing the number of eigenfaces used change your results? What number worked best?
There was no consistent difference in the results between the various numbers of eigenfaces used. The matching order changed for only a couple of people, and only in trivial ways. Ideally, one would think there is an optimal number of eigenfaces that defines a clear face subspace, neither too narrow nor too broad. However, I believe that, given the poor results of matching between these two sets, an optimal number simply cannot be found.

Experiment 3: Cropping the undergraduate faces

What min_scale, max_scale, and scale step did you end up using?
I used two different scale ranges: one that runs in under 10 seconds per image, and one that takes about 1 minute per image but gives results more consistent with the sample cropped images. The fast range is from 0.1 to 0.2 with a 0.1 step. The slower range is from 0.2 to 0.35 with a 0.05 step.
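To make the two ranges concrete, the following sketch lists the scales each one visits. The helper scale_steps is a hypothetical name, not part of the skeleton code, and it assumes the search includes both endpoints.

```python
from math import floor

def scale_steps(min_scale, max_scale, step):
    """List the scales from min_scale up to max_scale (inclusive) in
    increments of `step`, guarding against floating-point drift."""
    # The small epsilon keeps a quotient like 2.9999999999999996 from
    # being floored to 2 instead of 3.
    count = floor((max_scale - min_scale) / step + 1e-9) + 1
    return [round(min_scale + i * step, 10) for i in range(count)]

fast = scale_steps(0.1, 0.2, 0.1)
slow = scale_steps(0.2, 0.35, 0.05)
print(fast)
print(slow)
```

Under these assumptions the fast range visits two scales and the slower range visits four.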
How many of your crop results look correct (cropped to the same part of the face as the pre-cropped images)? How many look incorrect?
All of them are correct in the sense that they crop only the face region of the image; however, the precise scales they were cropped to are not necessarily the same (see Appendix C, Fig 1). The slower scale range tends to match the sample solution better because it allows for a larger scale. I would say none of the cropped results are incorrect; there are only trivial differences in the scales used.
What is the problem with using a min_scale that is too small?
A min_scale that is too small causes two problems: one, it unnecessarily slows the running time, and two, it can result in the picture becoming too noisy to determine the actual content. It could easily cause faces to be found due to noise rather than actual face features. For the most efficient results, it's best to use scales where the faces will most likely be found rather than trying to do an exhaustive search.

Experiment 4: Finding faces in a group photo

Show the results of the face detections
See Appendix D.
What min_scale, max_scale, and scale_step did you use for each image?
For the picture of friends I used a scale from 0.8 to 1.2 and a step of 0.05. For the picture of the Gilligan's Island cast, I used a scale from 1.0 to 1.45 with a step of 0.05.
If there are any errors, explain why the program might have failed and how you could improve the input or the algorithm to correct this.
On the picture of friends, one face was not detected: the woman with a very unusual expression. I think her face wasn't found because of this expression. Perhaps including more diversity in expression when creating the face space would improve this. At the same time, I think it may cause more false matches.
On the picture of the celebrities, I think that the low-texture area of the T-shirt threw off the face detection. I found this problem recurring in many pictures, and I think it would be best to find a better function to adjust the MSE than the one suggested in the homework, which I used. I also think some thresholding might be required to disregard areas that are quite uniform in intensity. As for the woman's face being off, I am not sure what could be improved; it is one of the rare cases I had of a false face overlapping a real face. In almost all situations, the face was either dead on or completely off.
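The thresholding idea could look something like the sketch below, which discards candidate windows whose pixel intensities are nearly uniform before comparing MSE values. This is not what the program currently does; the function names, the candidate format, and the variance cutoff are all assumptions for illustration.

```python
from statistics import pvariance

def is_low_texture(pixels, min_variance):
    """True when the intensities in a window are nearly uniform."""
    return pvariance(pixels) < min_variance

def filter_candidates(candidates, min_variance):
    """Drop candidate windows too uniform to plausibly be a face.
    `candidates` is a list of (mse, pixels) pairs (hypothetical format)."""
    return [(mse, px) for mse, px in candidates
            if not is_low_texture(px, min_variance)]

# Made-up windows: a flat T-shirt patch with a low MSE, and a textured face.
shirt = (100.0, [200, 201, 199, 200])
face = (350.0, [40, 180, 90, 220])
kept = filter_candidates([shirt, face], min_variance=25.0)
print([mse for mse, _ in kept])
```

A cutoff like this would need tuning per image set, since a value that is too high would also reject softly lit faces.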

Extra-Credit: Verify Face

What MSE thresholds did you try? Which one worked best? What search method did you use to find it?
I tried 60000, 50000, and 20000 for the MSE thresholds, and found that 20000 worked best. I looked at the MSEs and estimated the threshold that would minimize the errors.
Using the best MSE threshold, what was the false negative rate? What was the false positive rate?
The false negative rate was 0%, but the false positive rate was 25%.
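For reference, the two rates can be computed for any candidate threshold as in the sketch below. It assumes a face is accepted when its MSE is at or below the threshold; the function name and the MSE values are made up and are not the measurements from this experiment.

```python
def error_rates(samples, threshold):
    """Return (false_negative_rate, false_positive_rate) for a threshold.
    `samples` is a list of (mse, is_same_person) pairs; a face is
    accepted when its MSE is at or below the threshold."""
    genuine = [mse for mse, same in samples if same]
    impostor = [mse for mse, same in samples if not same]
    false_neg = sum(1 for mse in genuine if mse > threshold)
    false_pos = sum(1 for mse in impostor if mse <= threshold)
    return false_neg / len(genuine), false_pos / len(impostor)

# Made-up MSE values: four genuine attempts and four impostor attempts.
samples = [(8000, True), (12000, True), (15000, True), (26000, True),
           (18000, False), (30000, False), (45000, False), (70000, False)]
fn_rate, fp_rate = error_rates(samples, threshold=20000)
print(fn_rate, fp_rate)
```

Calling the same function for each candidate threshold gives a simple way to compare them side by side.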
In a real-life verification scenario, why might it be better to have a low false positive rate than a low false negative rate?
In real life, one might want to use face verification for security. In that case, you would not want many false positive verifications, or else the system would be easily compromised.