Exploring the Mathematics of Images with the XFORM Programming Language

(A User's Guide) -- THIS DOCUMENT IS UNDER CONSTRUCTION

by Steve Tanimoto

8 January 1996

Mathematics Experiences Through Image Processing

Dept of Computer Science and Engineering, University of Washington, Seattle, WA 98195

1. Introduction

What is XFORM?

There are lots of programs available nowadays that provide image processing capabilities -- photo enhancement, touch-up, and darkroom effects. All of these programs use digital image processing technology, which is based on mathematical operations. Most of these programs prevent the user from using or even seeing the mathematical representations of the data inside the program. This results in a separation of the user from an accurate understanding of the principles involved in image processing, and it also limits what the users can do. XFORM is a programming language that works together with a visual interface to permit the user to add an extra degree of understanding and control to the image processing experience.

XFORM works on IBM PC/486 and Pentium systems running Windows 3.1, Windows 95, or Windows NT. It will also work on 386 systems, but then should be restricted to small images and simple operations to prevent long waits for the image processing to be done.

The XFORM Language and Lisp

The XFORM language is embedded in (and includes) Lisp, a mature programming language commonly used for rapid prototyping, symbolic computation, scripting languages, and artificial intelligence. The user of XFORM may use as much or as little of LISP as desired. There are three components to the XFORM language:

1. The Transformation language -- a small subset of Lisp. Here are the technical specifications for what is valid in an XFORM transformation expression.

2. The METIP library -- a collection of functions that the user program calls, which manipulate the user interface, perform image processing operations, and access images in bitmap format from the hard disk. Here is a Lisp file that contains descriptions of these functions.

3. Common Lisp -- the language and functions of Lisp itself, as implemented by Betz, Tierney and others in the XLISP-STAT for Windows release from the University of Minnesota. (Here is an Introduction to XLispStat The Tech Rept for XLISP-STAT is also available.

The most important part of XFORM for most users is the transformation language. This is a little language in which arithmetic expressions are passed to the "transformation engine" to produce new images from old. Supported are operations for transformations in both cartesian and polar modes.

2. Getting Started

After downloading and installing the XFORM system (also known as the METIP Transformation Programmer), start up a session by double-clicking on XLISP-STAT & METIP Prog Envt.

When the XLISP-STAT prompt appears, enter the following line of code, followed by a carriage return.

(xform-startup 2 "mona.bmp")

You should see some flashing of new windows followed by the loading of the Mona Lisa image into two new windows on the screen.

To get back to the XLISP-STAT prompt, hold the Alt key and press the Tab key.

Now perform an operation on the Mona Lisa image by typing,

(xform-vertical-flip)

To see the result, go back to the image screen by holding the Alt key and typing that Tab key. (this is a way to get back and forth between different programs running in Windows. The XFORM system is actually implemented as two separate programs -- one for XLISP-STAT and one for the image processing subsystem.)

Another transformation you can try is a reflection of Mona Lisa around the line y = x.

(xform-x-y-flip)

Here is something a little more strange -- a fisheye lens transformation:

(xform-fisheye)

Now that you have tried some "canned" transformations that have already been defined so that you can call them by name, try one that is not already pre-programmed.

(xform '(source1 (mod (* 2 x) x-max) y))

Here you must be careful to type each parenthesis exactly as shown. Also, the apostrophe, called the "quote" character in XFORM and in Lisp, must be typed correctly for this to work.

Don't put any spaces between the characters of names such as "x-max". The hyphen is part of the name.

What did this transform do? It causes a new image to be determined by computing, at each (x,y) position, a value for the new image's pixel. The formula says it should first double the x coordinate and take that value modulo the value of x-max (the value of the highest x-coordinate in the Mona Lisa image). Let's call this resulting value xnew. It should then look up the pixel at (xnew, y) in the Mona Lisa image (i.e., the source image SOURCE1). That is the pixel value it should plot at (x,y). If all went correctly when you typed in this formula, you should see in the new image window (the "destination" window) a pair of squashed Mona Lisas next to each other.

If you got an error because of typing in something incorrectly, the simplest thing to do is to exit XLISP-STAT (automatically exiting the Image processing subsystem, too), and restart it. There are more graceful ways to recover from various Lisp and XFORM errors, but there is not time to explain them now.

Exiting from XFORM.

The best way to exit from the XFORM environment (XLISP-STAT and the Image processing interface), assuming you have saved any images and program dialogs that you don't want to go away, is to get back to XLISP-STAT, go to the file menu, and selected "Exit". This will close down the Image processing interface and then close down XLISP-STAT all in one operation.

3. Formula-Based Image Transformations with XFORM

Arithmetic operations on pixel values.

Let us consider the following simple transformation, which creates, in the destination window, an image whose pixel values are twice those of the corresponding source image ones.

(xform '(* 2 (source1 x y)))

This tells XFORM to transform the image at SOURCE1 by computing at each destination pixel (x,y), two times the value of the source1 pixel at (x,y). The subexpression (SOURCE1 X Y) accesses the pixel of the source1 image at position (X, Y). An expression of the form (* A B) causes the values of A and B to be multiplied.

Regarding notation, lower-case inputs are automatically converted to UPPER CASE by XLISP-STAT, except within strings. This document uses the following convention: displayed examples of input are given in Courier font lower case. Corresponding output is given in Courier font UPPER-CASE. Both inputs and outputs given within paragraphs of text are shown in Times New Roman UPPER CASE.

Arithmetic operations on coordinate values.

Suppose we wish to scale an image down. We can do this by multiplying the coordinates of the source pixel by a number greater than 1, thus mapping a larger range of the source image into the destination image, effectively shrinking the image. Here is an example.

(xform '(source1 (* x 3) (* y 3)))

We don't have to use integers in these formulas. We can use floating point numbers like 1.5 as follows.

(xform '(source1 (* x 1.5) (* y 1.5)))

Using the conditional form ("IF").

The value of a destination pixel can depend on a logical condition. An example of this is shown by the following, which makes all pixels to the right of the line x=50 turn white.

(xform '(if (> x 50) 255 (source1 x y)))

Here, the IF expression has the form (IF E1 E2 E3) where the first expression (represented by E1) gives a condition, the second expression (represented by E2) gives the value to return if the condition is true, and the third expression (represented by E3) gives the value to return if the condition is false. In the example above, if the X coordinate of the destination pixel is greater than 50, then the value for the destination pixel is 255 (white); otherwise it is the value of the source pixel at (X, Y), which means the image is unchanged to the left of the line x=50.

X-MAX and Y-MAX

The coordinates of pixels in the image being computed (the destination image) range between 0 and an upper limit. The upper limits for each of X and Y depend upon the size of the bitmap already loaded in the destination window. In a formula, these limits can be referenced using the symbols X-MAX and Y-MAX. The canned transformation XFORM-VERTICAL-FLIP is actually implemented with the following transformation formula.

(xform '(source1 x (- y-max y)))

This means that at each destination pixel (x, y), the y value is subtracted from the maximum y value of the image, thus complementing the y value (reflecting it to the bottom or top of the image). Then the source pixel value at that new height is read and assigned to (x, y) in the destination image.

Using polar coordinates.

Although many interesting transformations can be expressed easily with Cartesian coordinates (x and y), it is often more convenient to use polar coordinates (rho and theta). Here is a transformation that complements the rho values and thus turns an image "inside out".

(use-polar)

(xform '(source1-polar (- rho-max rho) theta))

Note that we use the command (USE-POLAR) to put the transformation engine into polar-coordinates mode. Then we reference the source image using the function SOURCE1-POLAR followed by a radius (from the center of the image) and an angle (in radians).

Resampling transformations in cartesian coordinates.

We showed earlier how to shrink an image by multiplying the source coordinates by a number larger than 1 We can, of course, also expand or blow up and image by using factors less than 1. These operations, implemented in this fashion, are sometimes called "resampling."

(xform '(source1 (* x 0.5) (* y 0.5)))

Resolution reduction, periodic replication.

We can also make an image become "blocky". We essentially ignore some of the pixel data of the source image, but we reuse the pixels we do take, so that the destination image has the same size as the source.

(xform '(source1 (- x (mod x 8)) (- y (mod y 8))))

Here, the MOD function gives the remainder after dividing X or Y by 8.

Resampling transformations in polar coordinates.

The resampling operations can also be applied using polar coordinates. Here is an example in which we expand an image by oversampling the radius coordinate.

(use-polar)

(xform '(source1-polar (* 0.5 rho) theta))

However, if we do the same thing with the angle coordinate (theta) we don't obtain a very natural-looking image.

(xform '(source1-polar rho (* 0.5 theta)))

(use-cartesian)

Resolution reduction, periodic replication.

By combining the use of shrinking an image (via undersampling) and the use of modular arithmetic on the cartesian coordinates, we can get many small copies of the source image. Here's a transformation that gives us a dozen tiny versions of the source image in a 3 by 4 array.

(xform '(source1 (mod (* x 3) x-max) (mod (* y 4) y-max)))

Combining multiple images.

Until now we have been working with only one source image, referenced using the function SOURCE1 or the function SOURCE1-POLAR. The transformation engine can use either one or two source images at a time. In order to use a second source image, we should already have a window containing that image on the screen. For example, the function call (METIP-LOAD-FILE 2 "mona.bmp") can be used to load the Mona Lisa image into Window 2. One could also use an image whose value has been computed using other formulas.

To add two images together, where the first source is in window 1, the second source is in window 2, and the destination is in window 3, we can use the following sequence of statements.

(set-source1 1)

(set-source2 2)

(set-destination 3)

(xform '(+ (source1 x y) (source2 x y))

This specifies that each destination pixel (x, y) gets computed as the sum of the source1 pixel at (x, y) plus the source1 pixel at (x, y).

5. Activities and Simple Projects for Transformations.

Make up a transformation that rotates the Mona Lisa 180 degrees (pi radians), (a) using Cartesian coordinates only, and (b) using polar coordinates only.

Make up a transformation, using the IF conditional form, that leaves pixels unchanged if they are above 128 but which subtracts them from 255 if they are less than or equal to 128.

Create block versions of the Mona Lisa with different block sizes. (The 8 by 8 block size example is given in the section on resolution reduction, above). Display each image full-screen, using the Image processing interface window controls and magnifying tool. How far away to you have to get from the screen to perceive and recognize the Mona Lisa in each case?

Make up a transformation that takes the average of two images. That is, each pixel of the destination image is equal to the mean of the two corresponding source image pixels. Can you still recognize each of the two source image scenes or portraits in the "double exposure" you have created?

Sequences of Calls to the METIP Library.

Here are some additional projects and exercises that require more sophisticated use of the transformation engine and/or Lisp.

Programming sequences of enhancement operations.

Using a program to do the same sequences of operations to several different images.

Writing a program that will "monogram" your initials into the lower-right corner of any image you choose.

Creating a mystery image by writing a subtle pattern into the pixels.

Generating "double exposure" images with control of weights on each source image.

How to compute a new value for a pixel or set of pixels.

Emulating the Pixel Calculator in a program -- How to perform the same image enhancement operations that the pixel calcular does, but in XFORM code (more slowly, but using a general language)

Representing and Processing Images with Common Lisp

Representing a small image as a Lisp array.

The Common Lisp language includes facilities for handling arrays of any dimension. Using a 2-dimensional array, Common Lisp can easily store digital images. Here is a simple example.

(setq *nrows* 8)

(setq *ncols* 8)

(make-array (list *nrows* *ncols*)

:initial-contents '(

(1 1 1 1 1 1 0 0)

(0 1 0 0 0 0 1 0)

(0 1 0 0 0 0 0 1)

(0 1 0 0 0 0 0 1)

(0 1 0 0 0 0 0 1)

(0 1 0 0 0 0 0 1)

(0 1 0 0 0 0 1 0)

(1 1 1 1 1 1 0 0) )) )

In order to manipulate such images, we will need to write functions in Lisp. Here is a pair of functions, PRINT-ROW and PRINT-IMAGE that work together to display one of these numeric images on the screen.

(defun print-row (image row)

"Prints one row of an image."

(dotimes (i *nrows*)

(format t "~5D" (aref image row i)) ) )

(defun print-image (image)

"Print out IMAGE nicely formatted."

(dotimes (i *nrows*)

(print-row image i)

(format t "~%") ) )

Next, let's see a definition for a function that does something to an image represented in Lisp. This function, called AMPLIFY-IMAGE, returns a new image whose pixels have been obtained from the corresponding input image pixels by multiplying by the value FACTOR.

(defun amplify-image (image1 factor)

"Returns result of multiplying each element of

IMAGE1 by FACTOR."

(let ((image2 (make-array (list *nrows* *ncols*))))

(dotimes (row *nrows*)

(dotimes (col *ncols*)

(setf (aref image2 row col)

(* (aref image1 row col) factor) ) ) )

image2) )

For a full account of using the Common Lisp constructs DEFUN, LET, MAKE-ARRAY, LIST, DOTIMES, SETF and AREF, consult a text on Common Lisp, such as Touretsky -- Common Lisp: A Gentle Introduction to Symbolic Computation, or Chapter 2 of Tanimoto -- The Elements of Artificial Intelligence Using Common Lisp, 2nd Edition.

Exercises: Write Lisp programs for (a) Thresholding an image, (b) Computing edges in an image, and (c) Finding connected components in an image.

Converting from METIP bitmap images to Lisp arrays and vice-versa.

It is possible to convert images between the format used by the Image processing component of XFORM and Lisp arrays. To convert a METIP bitmap to a Lisp array, first load the bitmap into a window using, e.g., (METIP-LOAD-FILE 1 "mona.bmp"). Then allocate a Lisp array large enough for the image (Note: avoid using large images in Lisp arrays). The size of the loaded image can be obtained by making

the call (metip-get-win-size), which returns a list of the width and the height of the image. Then using (metip-get-pix window x y) inside of a loop, retrieve all the image pixels and store them in the Lisp array using something like the form (setf (aref myimage x y) the-value).

It is possible to convert Lisp arrays to METIP bitmaps by going in the other direction. One must already have an image window open with a blank image in it (for example by loading the image BLANK128.BMP). In a loop, call (metip-put-pix window x y gra) to transfer each Lisp array value into the appropriate pixel of the image.

Programming a Presentation

Using all three components of XFORM

(to be added)

A sample presentation

(to be added)

How to Specify Image Transformations

Here are the technical specifications for what is valid in an XFORM transformation expression.

Suggested Projects

Explore translation, scaling, and rotations by arbitrary angles. Create a way to easily combine these three types of transformations into a single transformation that takes several arguments. Make up puzzles for students, where they are given a pair of images that are related by such a sequence of transformations, and they must come up with the parameters that correctly map one image into the other.
Here is a sample solution to this. http://www.cs.washington.edu/homes/marla/trans_scale_rot.html

Write a routine that takes two images A and B, and generates the frames of a dissolve sequence, whereby the sequence starts out with A and ends up with B, but A gradually fades out as B fades in. Display either a 6-frame dissolve or an 8-frame dissolve on your screen, depending on the size of your monitor. Experiment with the weighting of the two images. Does a linear fade-out/fade-in work as well as a sigmoid function such as one of those used commonly in neural net backpropagation algorithms?

g(h) = 1 / ((1 + e^(-h))

Here is a sample solution to the dissolve problem.

Illustrate the use of symmetry groups of transformations on images. Provide a function that takes a transformation (the "generator") and applies over and over again to an image (with each result being shown in a different window) until the image that results is the same as the original, or a maximum number of windows is reached. If you use a transformation that causes a loss of image information, demonstrate the effects of error accumulation. At what point is the loss of information noticeable? At what point, if any, does all the information become lost?

Create a collection of functions and transformations that make it easy to create stencils (images with holes through which you can see other images below). Explore the mathematical properties of stencils. When you combine two stencils, is the operation commutative? When you combine three, is the operation associative?