CSE 455: Homework Set 3
Content-Based Image Retrieval

For this assignment, you may work in teams of up to two people. However, there will be more requirements for a two person team.

Materials

Download the set of images.
Download the set of image thumbnails. (The image thumbnails, in jpeg format, should help keep your write-up a manageable size.)

Note that there is no provided code for this assignment. You can start with your k-means code from the previous assignment if you like.

Contents

In this assignment, you will develop a content-based image retrieval system that retrieves database images based on the similarity between their regions and those of a query image.

mountain image segmented by color

another mountain image segmented by color

What To Do

The main idea is to represent each image (the database images and the query image) by a set of regions obtained from color clustering, their attributes, and (for two person teams) their simple spatial relationships. Then, come up with a distance function between two images in this representation, and use this distance function to find the images most similar to a query image.
  1. For each image in the database you will perform the following procedure:

    1. Run your color clustering on it to obtain a labeled cluster image.
    2. Run connected components on it to obtain a labeled segmentation image. Possibly perform some noise cleaning/merging operations to improve the regions. Don't vary any parameters between images.
    3. For each major region (use a size threshold), compute at least the following attributes:

      • size
      • mean color, in RGB or whatever space you like
      • at least the following co-occurrence texture features (use spatial relationship d = (1,1)):
        1. energy
        2. entropy
        3. contrast
      • centroid (row, column)
      • bounding box or other representation of where the region is
      • Two person teams: RAG (region adjacency graph). This should probably be represented as a set of adjacency lists, one for each node (region) in the graph.
    4. Two person teams: For each pair of adjacent regions in the RAG, find and record the following possible relationships: inside, above_adjacency, below_adjacency, left_adjacency, right_adjacency, other_adjacency (if none of the others are satisfied by whatever requirements you impose to define them).
    5. Store the attributes and relationships in a data structure that you define and that can be both used in memory and also saved in a file so you don't have to keep rerunning the analysis. We'll refer to the data structure for image I as DS(I)

  2. Develop a distance measure that will compute the distance RELDIST(I1,I2) between DS(I1) and DS(I2) for any two images I1 and I2. To do this, you need to find a correspondence between the regions of I1 and the regions of I2.

    One person teams: Compute this correspondence greedily. That is, for each region in I1, find its closest match in I2, and so on.
    Two person teams: Compute the optimal correspondence. You can do this with an exponential search procedure, since the number of regions will be small.

    Finding correspondences is in Chapter 11 and will be covered in the February 8 lecture.

    Once you have the correspondence, the distance between I1 and I2 should be some function of:

  3. Create a query system in which you can select a query image Q and compare it to each image I in your database by computing RELDIST(Q,I). Then you order the images in the database according to their distance to the query and return the ordered list (the images and associated distances).

    Extra credit: Create a simple GUI for performing these queries.

  4. Test your system as indicated below.

Sample Two Person Timeline
Sample One Person Timeline

Turnin

Your turnin should describe all aspects of your system, and show the required results. You should also submit all of your code.

  1. Describe your color clustering and region-finding algorithm.
  2. List the region attributes you used.
  3. List the region relationships you used and explain how you determine the relationship between two regions (two person teams).
  4. Describe your distance measure, including the region correspondence algorithm.

  5. Use the provided thumbnail images to show the results for each of the required queries. Also include the distances between the query image and each result image. Print the images in order of ascending distance. Hopefully, the query image will have distance zero to itself, and similar images will have smaller distances than dissimilar ones.

    Example Query Results for boat_2

    boat_2
    d = 0

    boat_4
    d = 0.05

    boat_3
    d = 0.07

    boat_5
    d = 0.07

    beach_3
    d = 0.12

    beach_2
    d = 0.13

    beach_4
    d = 0.15

    crater_2
    d = 0.16

    boat_1
    d = 0.20
    ...
    ...
    ...
    ...
    ...
    ...
    ... ... ... ...
    sunset1_5
    d = 0.98

  6. Write a short discussion of your results.
  7. Include an appendix that contains all of your code.

This assignment is due on Thursday, February 17th, by 11:59 PM.

Also, you will have to show your results in class on Tuesday, February 22nd.