Project 4 Report: Automatic Tone Mapping By Learning User Preferences

CSE576, Spring 2008

Shulin Yang, Xiaoyu Chen, and Xing Li

June 06, 2008

1 Introduction

In this project, we developed an automatic tone mapping system that produces images in a user-specific manner by learning the tone-mapping parameters a user prefers for different images. After training with a user on a limited set of images, and with some further learning on the fly, our system is able to provide parameter-free tone mapping. Compared with ordinary tone-mapping software, our system can save users many manual operations and much time.

Moreover, we implemented HDR construction and compression to generate an HDR image from a set of images with different exposures. Our goal is to provide automatic tone manipulation for both ordinary images and HDR images.

2 Related works

2.1 Tone manipulation

A great deal of work has been done on the tone manipulation problem [1, 2, 3, 4]. Some of it focuses on high dynamic range compression for HDR images, which we discuss in Section 2.2. For tone reproduction on ordinary images, many different tone mapping operators have been proposed over the decades. These approaches are commonly classified into global operators and local operators. Global operators use a single curve to map each pixel to a display value, whereas local operators use information from the image patch around a pixel to adjust its value. Generally speaking, global operators are faster, while local (spatially variant) operators are better at preserving the local contrasts of an image.

Although tone mapping algorithms have been developed for a long time [5] and some operators are described as “automatic”, most of them still require parameter tweaking to obtain a good result for a particular input image. Parameter tweaking remains a problem for all tone mapping algorithms, and the adjustment of tonal values is still an active research topic. Some operators have been extended to dynamic and interactive settings [4].

2.2 HDR compression

The goal of HDR compression is to spatially vary the mapping from scene luminance to display luminance while preserving local contrasts. There are multiple ways to compress an HDR image [6, 7]. One common method is to decompose the HDR image into multi-scale detail layers and a base layer. The contrast of the base layer is reduced while the fine-scale detail layers are boosted, and a reduced dynamic range image with well-preserved details is then reconstructed by combining the compressed base layer with the boosted detail layers. The multi-scale decomposition can be computed with different smoothing filters, such as the bilateral filter, anisotropic diffusion, and weighted least squares. All of these are local, non-linear, edge-preserving filters. The problem with global linear smoothing filters, such as those used in the Laplacian pyramid, is the halo artifacts they produce near edges. This problem may also exist for some non-linear filters and is still an open research area.

Apart from the multi-scale decomposition approach described above, HDR compression can also be performed in the gradient domain [8]. The key observation is that any drastic change in luminance across a high dynamic range image must give rise to large-magnitude luminance gradients at some scale. Fine details, such as texture, on the other hand, correspond to gradients of much smaller magnitude. Based on this observation, HDR compression can be achieved by identifying large gradients at various scales and attenuating their magnitudes while keeping their directions unaltered. The attenuation must be progressive, penalizing larger gradients more heavily than smaller ones, thus compressing drastic luminance changes while preserving fine details. A reduced dynamic range image is then reconstructed from the attenuated gradient field.
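The progressive attenuation can be illustrated on a one-dimensional log-luminance signal. The sketch below is not part of our system; it uses a power-law attenuation of the kind proposed in [8], and the values of alpha and beta, the single scale, and the function name are assumptions chosen only for illustration.

```python
def attenuate_gradients(log_lum, alpha=0.1, beta=0.85):
    """Attenuate the gradients of a 1-D log-luminance signal and re-integrate.

    alpha and beta are illustrative values, not tuned settings.
    """
    grads = [b - a for a, b in zip(log_lum, log_lum[1:])]
    out = [log_lum[0]]
    for g in grads:
        mag = abs(g)
        if mag > 0.0:
            # Power-law scale factor: below 1 when mag > alpha (for beta < 1),
            # so large gradients shrink more than small ones.
            phi = (alpha / mag) * (mag / alpha) ** beta
        else:
            phi = 1.0
        # The sign (direction) of the gradient is kept; only its size changes.
        out.append(out[-1] + g * phi)
    return out


# A large luminance step surrounded by fine texture.
signal = [0.0, 0.02, 0.04, 3.04, 3.06, 3.08]
compressed = attenuate_gradients(signal)
```

In two dimensions the re-integration step is no longer a running sum and requires solving a Poisson equation, which this sketch does not attempt.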

2.3 GIST

Research in scene understanding has traditionally treated objects as the atoms of recognition. However, behavioral experiments on fast scene perception suggest that we do not need to perceive the objects in a scene to identify its semantic category; the spatial layout is more important for scene recognition [9]. The literature in this area has proposed different approaches to represent the gist of a scene; some methods, for example, are based on the analysis of texture [10]. In particular, a computational model of scene recognition has been proposed [11] that bypasses segmentation and the processing of individual objects or regions. The procedure is based on a very low dimensional representation of the scene, called the spatial envelope. The dominant spatial structure of a scene is represented by a set of perceptual dimensions: naturalness, roughness, openness, expansion, and ruggedness. These dimensions can be reliably estimated using spectral and coarsely localized information. Together they span a multidimensional space in which similar scenes, such as streets and highways, are projected close together.

3 Approaches

We applied and developed several tone-mapping techniques, covering detail manipulation, intensity manipulation, and color manipulation. Based on six intuitive parameters, our system provides simple but effective ways to tone map an image. In parallel, we implemented HDR construction and compression to generate an HDR image.

3.1 System design

Our system first learns a user’s preferences on tone-mapping parameters. An initial user database is set up by training with the user on a set of images. After that, a new image loaded by the user is automatically tone mapped with parameters learned from the user database. The user can choose to manually change the parameters and re-process the image. Finally, the final tone-mapping parameters, along with the features of the input image, can be stored to extend the user database; we call this online learning of preferences. The details of our system design are shown in the following diagram.

3.2 Tone mapping techniques

Our system uses three kinds of tone mapping techniques: detail manipulation, intensity manipulation, and color manipulation.

3.2.1 Detail manipulation

In some images, there may be either too little or too much detail. In our tone mapping method, we use an edge-preserving operator based on the WLS (weighted least squares) optimization framework to extract an edge layer at a certain scale, and then increase or decrease the edge information by exaggerating or attenuating this layer.

We chose the WLS-based operator because it has been shown to extract detail at arbitrary scales while effectively avoiding halo artifacts [4]. Moreover, the WLS-based operator is robust and particularly well suited for progressive coarsening of images and for detail extraction at various spatial scales.

We use WLS to extract an edge layer E from the original image I: E = I - wls(I). Then we can exaggerate or attenuate the edge layer: I’ = I + E*t. When t>0, the edge information in the image is increased, and the image appears to contain more detail; when t<0, the edge information is decreased and the image appears smoother. The results of detail manipulation are shown in Section “Experiments”.
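The two formulas above can be sketched on a single image row. This is a minimal illustration only: a moving average stands in for the WLS filter (it is not edge-preserving, so it would produce halos on real images), and the function names are hypothetical.

```python
def box_smooth(signal, radius=2):
    """Stand-in smoother (moving average). The report uses a WLS filter here."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def manipulate_detail(signal, t, smooth=box_smooth):
    """Return I' = I + E*t, where E = I - smooth(I) is the edge layer."""
    base = smooth(signal)
    return [i + (i - b) * t for i, b in zip(signal, base)]


row = [0.2, 0.8, 0.3, 0.7, 0.25, 0.75]      # one row of pixel intensities
sharper = manipulate_detail(row, t=0.5)     # t > 0: more detail
smoother = manipulate_detail(row, t=-0.5)   # t < 0: less detail
unchanged = manipulate_detail(row, t=0.0)   # t = 0: image left as is
```

Passing the smoother as an argument keeps the formula separate from the choice of filter, so an edge-preserving filter can be substituted without touching the rest.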

3.2.2 Intensity manipulation

Many of the problems of an image come from its lighting conditions, for example, being too bright or too dark. Although some of these problems lose scene information and thus cannot be repaired simply by modifying the image, adjusting the intensity of an image can still be effective in improving how it looks. Therefore, we provide two kinds of intensity operations: intensity shift and intensity exaggeration. For both operations, we use a function that maps the original intensity value of a pixel to a new intensity value. The mapping function determines how the intensity values of an image are adjusted.

For intensity shift, we use a concave or convex function to modify the intensity values. The curvature of the function determines the extent to which the intensity values are modified. The following graphs are examples of the two kinds of intensity shift: the left one lightens the whole image since it is a concave function, and the right one darkens the whole image since it is a convex function.

For intensity exaggeration, we use functions with an “S” shape to modify the intensity contrast. When the function has a positive “S” shape (as in the right graph), it spreads the intensity values of an image away from the center and enlarges their differences. When the function has the opposite “S” shape (as in the left graph), it pulls the intensity values of an image toward the center.
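The exact mapping functions are not specified above, so the following sketch uses power-law curves as one plausible choice. The function names and the parameters gamma and p are assumptions, and intensities are taken to lie in [0, 1].

```python
def intensity_shift(i, gamma):
    """Power-law curve: gamma < 1 is concave (lightens), gamma > 1 is convex (darkens)."""
    return i ** gamma


def intensity_contrast(i, p):
    """S-shaped curve around the center 0.5.

    p < 1 spreads values away from the center (more contrast);
    p > 1 pulls values toward the center (less contrast).
    """
    x = 2.0 * i - 1.0                 # re-center to [-1, 1]
    sign = 1.0 if x >= 0 else -1.0
    return 0.5 + 0.5 * sign * abs(x) ** p


lighter = intensity_shift(0.25, 0.5)       # concave curve raises the value
darker = intensity_shift(0.5, 2.0)         # convex curve lowers the value
spread = intensity_contrast(0.75, 0.5)     # moves away from 0.5
converged = intensity_contrast(0.75, 2.0)  # moves toward 0.5
```

Both curves keep 0 and 1 fixed, so the overall range of the image is unchanged and only the distribution of values inside it moves.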

3.2.3 Color manipulation

Adjusting the colors of an image can also play an important role in making it look better. Therefore, as with intensity, the values of the color channels can be shifted or exaggerated.

Color shift enables a user to shift the color of an image along the R, G, or B channel. Specifically, R shift increases the value of the R channel while reducing the values of the other two channels. Similarly, G shift increases the G channel while reducing the R and B channels, and B shift increases the B channel while reducing the R and G channels. The formulas for R shift are:

r’ = r (1+t)

g’ = g (1-t’)

b’ = b (1-t’)

In the above formulas, r, g, and b are the original values of the three channels, and r’, g’, and b’ are the values after the R shift. t is a parameter representing the extent to which the value of the R channel is increased. We define t’ = k_r*r*t/(k_g*g + k_b*b), where k_r, k_g, and k_b are the weights of the three channels in the pixel intensity i = k_r*r + k_g*g + k_b*b, so that the intensity value of the pixel is not changed.
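The R shift can be checked numerically on a single pixel. The sketch below assumes the common luma weights 0.299, 0.587, and 0.114 for k_r, k_g, and k_b; the weights actually used in our system are not stated here, and the function names are hypothetical.

```python
# Assumed channel weights (standard luma coefficients); they sum to 1.
K_R, K_G, K_B = 0.299, 0.587, 0.114


def intensity(r, g, b):
    """Weighted pixel intensity i = k_r*r + k_g*g + k_b*b."""
    return K_R * r + K_G * g + K_B * b


def r_shift(r, g, b, t):
    """Apply r' = r(1+t), g' = g(1-t'), b' = b(1-t') with intensity preserved."""
    denom = K_G * g + K_B * b
    if denom == 0.0:
        # No green or blue to trade against; leave the pixel unchanged.
        return r, g, b
    t_prime = K_R * r * t / denom
    return r * (1 + t), g * (1 - t_prime), b * (1 - t_prime)


pixel = (0.4, 0.5, 0.3)
shifted = r_shift(*pixel, t=0.2)
```

The sketch does not clamp the result, so a large t can push channel values outside [0, 1]; a practical implementation would need to handle that case.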

Color exaggeration enables a user to enlarge the contrast of different color channels. Specifically, the new values for the color channels r’, g’, b’ are calculated as follows:

r’ = i + (r-i) t

g’ = i + (g-i) t

b’ = i + (b-i) t

where t is a parameter representing the extent to which all color channels are exaggerated, and i is the intensity value of the pixel. The results of color shift and color exaggeration are also shown in Section “Experiments”.
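A per-pixel sketch of color exaggeration follows. As an assumption, the intensity i is computed with the common luma weights 0.299, 0.587, and 0.114; any weights that sum to 1 leave the intensity unchanged by the operation. The function name is hypothetical.

```python
# Assumed channel weights for the intensity; they sum to 1.
K_R, K_G, K_B = 0.299, 0.587, 0.114


def exaggerate_color(r, g, b, t):
    """Scale each channel's distance from the pixel intensity by t."""
    i = K_R * r + K_G * g + K_B * b
    return i + (r - i) * t, i + (g - i) * t, i + (b - i) * t


pixel = (0.6, 0.4, 0.2)
vivid = exaggerate_color(*pixel, t=1.5)   # t > 1: more saturated
gray = exaggerate_color(*pixel, t=0.0)    # t = 0: all channels equal i
same = exaggerate_color(*pixel, t=1.0)    # t = 1: unchanged
```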

3.3 Image similarity

We measure the similarity of two images from two aspects: 1) the overall color and intensity of the images; 2) the scene content of the images. For overall intensity and color, we compare the intensity histograms and color histograms of the two images. For scene content, we represent each image with a Gist feature descriptor and compare the Gist features. The Pearson correlation coefficient is computed separately for the histogram feature vectors and for the Gist feature vectors, and the weighted sum of the two coefficients is used as the final similarity of the two images.
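The metric can be sketched as follows, with short lists standing in for the histogram and Gist feature vectors. The equal weighting (w_hist=0.5) and the function names are assumptions, since the weights we used are not listed here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0  # a constant vector carries no correlation information
    return cov / (sx * sy)


def image_similarity(hist_a, hist_b, gist_a, gist_b, w_hist=0.5):
    """Weighted sum of the histogram and Gist correlations."""
    return (w_hist * pearson(hist_a, hist_b)
            + (1.0 - w_hist) * pearson(gist_a, gist_b))


# Toy feature vectors, for illustration only.
score = image_similarity([1, 4, 2, 0], [2, 5, 2, 1], [0.1, 0.9, 0.3], [0.2, 0.8, 0.4])
```

Because each coefficient lies in [-1, 1] and the weights sum to 1, the combined score also lies in [-1, 1], with larger values meaning more similar images.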

3.4 HDR construction and compression

3.4.1 HDR construction

High dynamic range radiance maps of real scenes can be constructed from a few photographs of the scene taken with different exposures. The specific method we use is described in [12]. The following image acquisition pipeline shows how scene radiance becomes pixel values. Unknown nonlinear mappings can occur during exposure, development, scanning, digitization, and remapping. The algorithm determines the aggregate mapping from scene radiance L to pixel values Z from a set of differently exposed images.

 

After the response function of the imaging process has been recovered, the algorithm can fuse the multiple photographs into a single, high dynamic range radiance map whose pixel values are proportional to the true radiance values in the scene.
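The fusion step can be sketched for a single pixel, following the weighted average of log-radiance estimates described in [12]. The response function used here (toy_response, an ideal linear sensor) and the simulated pixel values are assumptions for illustration; in practice the response is recovered from the photographs.

```python
import math


def hat_weight(z, z_min=0, z_max=255):
    """Trust mid-range pixel values more than near-black or near-white ones."""
    mid = (z_min + z_max) / 2
    return z - z_min if z <= mid else z_max - z


def fuse_pixel(pixel_values, exposure_times, g):
    """Weighted average of ln(radiance) = g(z) - ln(dt) over all exposures."""
    num = den = 0.0
    for z, dt in zip(pixel_values, exposure_times):
        w = hat_weight(z)
        num += w * (g(z) - math.log(dt))
        den += w
    if den == 0.0:
        # Every sample is fully under- or over-exposed: use the middle exposure.
        k = len(pixel_values) // 2
        return g(pixel_values[k]) - math.log(exposure_times[k])
    return num / den


def toy_response(z):
    """Hypothetical log-response of an ideal linear 8-bit sensor."""
    return math.log((z + 0.5) / 256.0)


times = [1 / 750, 1 / 180, 1 / 45]
true_radiance = 40.0
# Simulate what the ideal sensor would record at each exposure time.
values = [min(255, int(true_radiance * dt * 256)) for dt in times]
radiance = math.exp(fuse_pixel(values, times, toy_response))
```

With quantized 8-bit inputs the recovered radiance is only approximately equal to the true value; the weighting reduces the influence of the least reliable samples.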

3.4.2 HDR compression

In our project, we implemented HDR compression using WLS-based multi-scale decomposition. Specifically, we decompose the log-luminance channel into four levels (one coarse base layer and three detail layers), multiply each level by a scaling factor, and reconstruct a new log-luminance channel. There are two reasons for working in the log domain: 1) the logarithm of the luminance is a crude approximation of perceived brightness; 2) gradients in the log domain correspond to ratios (local contrasts) in the luminance domain. The processing block diagram [7] is shown below.

Our goal is to generate a rather flat image with exaggerated local contrasts. This is achieved by compressing the base layer and boosting the fine-scale detail layers. Specifically, the scaling factor for the base layer needs to be smaller than the other scaling factors, whereas the scaling factor for the finest-scale detail layer needs to be the largest. Usually the factor for the finest detail layer is set to 1 and the factor for the base layer is set to around 0.2. Consequently, the factors for the two intermediate detail layers are chosen to be values in between these two. As for the scaling factor for the color channel , it is suggested to be set as equal to .
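The decompose, scale, and recombine steps can be sketched on a one-dimensional log-luminance signal. Two assumptions apply: a moving average replaces the WLS filter (it is not edge-preserving and would cause halos near strong edges), and the factors 1.0, 0.8, 0.4, and 0.16 reported in Section 4.5.2 are read as ordered from the finest detail layer to the base layer. The function names and smoothing radii are hypothetical.

```python
def box_smooth(signal, radius):
    """Stand-in smoother (moving average) used in place of the WLS filter."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def decompose(log_lum, radii=(1, 2, 4)):
    """Split a signal into three detail layers (fine to coarse) and a base layer."""
    details = []
    current = list(log_lum)
    for radius in radii:
        smoothed = box_smooth(current, radius)
        details.append([c - s for c, s in zip(current, smoothed)])
        current = smoothed
    return details, current


def compress(log_lum, detail_factors=(1.0, 0.8, 0.4), base_factor=0.16):
    """Scale every layer and sum them into a new log-luminance signal."""
    details, base = decompose(log_lum)
    out = [base_factor * b for b in base]
    for factor, layer in zip(detail_factors, details):
        out = [o + factor * d for o, d in zip(out, layer)]
    return out


log_lum = [0.0, 0.1, 0.0, 0.1, 8.0, 8.1, 8.0, 8.1]
compressed = compress(log_lum)
```

Setting all four factors to 1 returns the input signal, which is a convenient check that the layers add back up to the original.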

3.5 GUI design

We designed a GUI to integrate most of the functions of our system and to facilitate user interaction. The training interface shown below is used to train the system with pre-processed images. Given a training image A, we pre-process it with four different settings of tone-mapping parameters. After the user selects a favorite processed image B, the tone-mapping parameters used to generate image B, together with the features of image A, are stored in the user’s database.

The functions of the training interface include:

The main interface shown below allows a user to process a new image. After a new image is loaded, the system will search in the user’s database to find the most similar image by comparing the features of the new image and the features of each image in the database. Then, the system will extract the parameters used to process the most similar image and apply them to process the new image. The user is allowed to adjust the automatically extracted parameters and to re-process the image. The user can also save the new image and its tone-mapping parameters to the database.
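The lookup and save steps described above amount to a nearest-neighbor search over a growing list of (features, parameters) pairs. The sketch below is a simplified illustration: the class name, the parameter names, and the use of a single feature vector per image (instead of the weighted histogram and Gist similarity) are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


class UserDatabase:
    """Hypothetical per-user store of image features and preferred parameters."""

    def __init__(self):
        self.entries = []  # list of (features, params) pairs

    def add(self, features, params):
        """Save an image's features with the parameters the user accepted."""
        self.entries.append((list(features), dict(params)))

    def suggest_params(self, features):
        """Return the parameters of the most similar stored image."""
        if not self.entries:
            raise ValueError("the database is empty; train the user first")
        _, params = max(self.entries, key=lambda e: pearson(features, e[0]))
        return dict(params)


db = UserDatabase()
db.add([1, 2, 3, 4], {"detail": 0.5, "red_shift": 0.0})
db.add([4, 3, 2, 1], {"detail": -0.2, "red_shift": 0.1})

new_image = [1, 2, 3, 5]
suggested = db.suggest_params(new_image)       # taken from the first entry
# The user adjusts the result and saves it: online learning.
db.add(new_image, {"detail": 0.8, "red_shift": -0.1})
relearned = db.suggest_params(new_image)       # now the saved adjustment
```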

The functions of the main interface include:

4 Experiments

4.1 Applying individual tone manipulation techniques

The following images show the results of detail manipulation. The left one is the original image, the middle one is the result with image details attenuated, and the right one is the result with image details exaggerated. Both the attenuated and the exaggerated results look natural.

The following images show the results of color manipulation. The left one is the original image, the middle one is the result of applying an R shift to the original image, and the right one is the result of exaggerating the image color. With the image intensity unchanged, both color shift and color exaggeration produce interesting and good-looking output.

4.2 System evaluation

We selected five pairs of images (called original pairs) with different scenes and colors, including cottage, flower, iceberg, sea, and waterfall. Separating each pair into two groups, we got two sets of images to evaluate our system: one set was used as training images, while the other set was used as test images. After user training, the features of the five training images and the tone-mapping parameters used to process them were stored into the user database (details below). For each test image, the system first identified the most similar image in the database. (Ideally, the most similar image of a test image should be its partner in the original pair.) Then, the parameters applied to the most similar image were used to process the test image. If the system works well, we will be able to see that the test image was processed in the same way as its partner in the original pair.

4.2.1 User training

For each user, our system generates a database to record the features of a set of images and the tone-mapping parameters used to process the images in a user-specific manner. For simplicity, we assume that there is only one user in the following experiments.

We show below the training interface for each training image, which contains four pre-processed images of different styles. The selected radio-button (the one with red dot) indicates which processed image the user preferred, and the corresponding parameters were stored in the database along with the features of the training image.

waterfall_train cottage_train

flower_train iceberg_train

sea_train

4.2.2 Learning the most similar image

As described in Section “Image similarity”, we use the weighted sum of two Pearson correlation coefficients, which measures image similarity in terms of scene content as well as color and intensity. The following table contains the similarity of each test image to each training image. Every test image has its highest similarity with its partner in the original pair (the diagonal entries of the table). This indicates that our metric is effective in measuring image similarity, and it explains the success of our system in identifying the most similar image by finding the nearest neighbor.

Image           Cottage_train  Flower_train  Iceberg_train  Sea_train  Waterfall_train
Cottage_test    0.442          0.375         0.263          0.331      0.309
Flower_test     0.418          0.561         0.044          -0.087     0.264
Iceberg_test    0.326          -0.062        0.558          0.364      0.149
Sea_test        0.135          -0.243        0.134          0.365      -0.244
Waterfall_test  0.383          0.354         0.161          0.172      0.681
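The nearest-neighbor selection can be reproduced directly from the similarity values in the table. The short sketch below copies the table into a dictionary and picks the training image with the highest similarity for each test image; the variable and function names are illustrative.

```python
similarity = {
    "Cottage_test":   {"Cottage_train": 0.442, "Flower_train": 0.375, "Iceberg_train": 0.263,
                       "Sea_train": 0.331, "Waterfall_train": 0.309},
    "Flower_test":    {"Cottage_train": 0.418, "Flower_train": 0.561, "Iceberg_train": 0.044,
                       "Sea_train": -0.087, "Waterfall_train": 0.264},
    "Iceberg_test":   {"Cottage_train": 0.326, "Flower_train": -0.062, "Iceberg_train": 0.558,
                       "Sea_train": 0.364, "Waterfall_train": 0.149},
    "Sea_test":       {"Cottage_train": 0.135, "Flower_train": -0.243, "Iceberg_train": 0.134,
                       "Sea_train": 0.365, "Waterfall_train": -0.244},
    "Waterfall_test": {"Cottage_train": 0.383, "Flower_train": 0.354, "Iceberg_train": 0.161,
                       "Sea_train": 0.172, "Waterfall_train": 0.681},
}


def best_match(row):
    """Return the training image with the highest similarity in one table row."""
    return max(row, key=row.get)


matches = {test: best_match(row) for test, row in similarity.items()}
```

For every test image, the selected training image is its partner from the original pair.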

4.2.3 Results for the test images

After learning the most similar image for a test image, the tone-mapping parameters associated with the most similar image were applied to the test image. We show below the results of automatic tone mapping for each test image. To compare, we also show the image in the database that was identified as the most similar image to the test image. (Top left: test image; Top right: automatically tone-mapped image; Bottom left: the most similar image identified from the database; Bottom right: the pre-processed image that the user selected during training.)

Waterfall

Cottage

Flower

Iceberg

Sea

4.3 Online learning

We used the system to process one of our own images (shown below on the left). The system performed automatic tone mapping based on the database of five images (described in Section “System evaluation”), and the resulting image is shown below in the middle. The result looked reddish and lacked detail, so we removed a little red and added more detail. The processed image is shown below on the right. It looks much better than the original one, and we saved the parameters used to generate it into the database along with the features of the original image. In total, we then had six images in the database.

We then used our system to process two more images taken at a similar place. The results after automatic tone mapping are shown below.

The results suggest that our system identified the last image saved to the database as the most similar image to the two new images, and that the parameters we had just saved to the database were applied to the new images. This illustrates the value of the online learning function of our system: although it starts with a small database, our system is able to enrich the database in a user-specific manner while it is used.

4.4 Results on more images

To generate results on more images, we first expanded our database to contain ten images by performing online learning. The four added images are shown below. (Left: original image; Right: processed image.)

We then applied our system to some randomly picked images. Some results are interesting as shown below:

However, the results for some images are not very good:

Since the original photo was taken in very dark conditions, most detail information is lost in the original image. As a result, it is hard to recover the information simply by adjusting its lighting and color globally.

For some images, even if both their spatial features and their color/intensity features are very similar, they can still have very different image styles. In such cases, unnatural results can still be generated by applying the same parameters to those images.

Our similarity metric is based on the spatial features and the color/intensity features of an image, and no object oriented measure is involved. For images that focus on the content of a specific object in the scene, measuring similarity based on Gist features may lead to failures in finding the most similar image.

4.5 Results on HDR

4.5.1 HDR reconstruction

The following photographs were taken with increasing exposure times: 1/750s, 1/180s, and 1/45s.

To recover the high dynamic range radiance map, we first combine the photographs to estimate the response function for each color channel. Applying the construction algorithm described in Section 3.4.1, we obtain the estimated response functions shown below. Polynomial fitting is used to suppress the effect of noise. The recovered response curve corresponds to the solid line in each plot.

Once we have the response function, we can fuse the multiple photographs into a single high dynamic range radiance map. The dynamic range of the reconstructed radiance map is about 3000:1. If we map the high dynamic range image linearly to the display range of 0 to 255 without any tone manipulation, we get a really ugly picture, as follows.
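A small calculation illustrates why the linear mapping fails for a 3000:1 range. The radiance values below are in arbitrary relative units chosen to match that ratio, and the function name is hypothetical.

```python
import math


def linear_to_8bit(radiance, r_max):
    """Map radiance linearly onto the 0..255 display range."""
    return round(255.0 * radiance / r_max)


r_min, r_max = 1.0, 3000.0           # relative units with a 3000:1 ratio
r_mid = math.sqrt(r_min * r_max)     # middle of the range on a log scale

darkest_level = linear_to_8bit(r_min, r_max)
mid_level = linear_to_8bit(r_mid, r_max)
brightest_level = linear_to_8bit(r_max, r_max)
```

Under this mapping, the darker half of the range on a logarithmic scale is squeezed into display levels 0 to 5, so almost all shadow detail disappears while the remaining levels are spent on the brightest regions.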

 

4.5.2 HDR compression

After constructing the HDR radiance map, we apply the WLS-based decomposition to the HDR image. The scaling factors for the four levels are set to 1.0, 0.8, 0.4, and 0.16. The compressed image is shown below. As we can see, the details are well preserved, but the color does not look pleasant. The reason for this problem is twofold: 1) when reconstructing the response curve of the imaging system for each color channel, the scaling factors relating relative radiance to absolute radiance for each channel are unknown, so the color balance of the radiance map may be changed; 2) during the multi-scale decomposition, the scaling factor for the color channel may not have been chosen properly.

In this experiment, we found that it is very important to choose the proper smoothness coefficients for the WLS formulation. Otherwise, some details from certain scales may be lost during the decomposition. Another important problem is about how to set the scaling factor for each decomposition level. Different choices will affect the whole dynamic range and local contrasts of the resulting image.

4.5.3 Apply automatic tone mapping to HDR

To refine the resulting low dynamic range image, we feed it into our system to perform automatic tone mapping. We then get a lovely HDR image, as follows. This example indicates the effectiveness of our system.

Here is another example. The exposure times of the photographs are 1/500s, 1/125s, 1/30s, 1/8s, and 1/2s. All the results are obtained in the same manner as described above.

The color of the compressed image again looks washed out.

By refining it with our automatic tone manipulation system, the HDR image looks pretty nice.

 

5 Conclusion & future work

5.1 Conclusion

In summary, we designed a system for automatic image tone mapping. The basic idea is to apply previous tone mapping parameters to new images that have similar color/intensity histograms and spatial features. At the beginning, when the database contains only a small set of images, online learning is important, since it makes use of the same user’s earlier parameter settings for later images. As the user database is enriched, the system becomes more capable of producing images without the user adjusting parameters. The more the system is used, the more powerful it becomes.

5.2 Future work

Firstly, we can extend the system to provide more options for color manipulation, rather than merely color shift and exaggeration for the R/G/B channels. Secondly, we want to use an alternative image similarity metric that combines spatial information and tone information, rather than merely taking their weighted sum. Thirdly, we can generate multiple automatic tone-mapping outputs for a new image, instead of only one. Finally, we want to expand the database to include a large number of pre-processed images.

References

[1] Erik Reinhard et al. Photographic tone reproduction for digital images.

[2] Kresimir Matkovic et al. A survey of tone mapping techniques.

[3] Dani Lischinski et al. Interactive local adjustment of tonal values.

[4] Zeev Farbman et al. Edge-preserving decompositions for multi-scale tone and detail manipulation.

[5] Erik Reinhard et al. High dynamic range imaging.

[6] Fredo Durand et al. Fast bilateral filtering for the display of high dynamic range images.

[7] Jack Tumblin et al. LCIS: a boundary hierarchy for detail-preserving contrast reduction.

[8] Raanan Fattal et al. Gradient domain high dynamic range compression.

[9] Aude Oliva et al. Building the gist of a scene: the role of global image features in recognition.

[10] Laura Walker Renninger et al. When is scene identification just texture recognition?

[11] Aude Oliva et al. Modeling the shape of the scene: a holistic representation of the spatial envelope.

[12] Paul E. Debevec et al. Recovering high dynamic range radiance maps from photographs.