by Steve Tanimoto
8 January 1996
Mathematics Experiences Through Image Processing
Dept of Computer Science and Engineering, University of Washington,
Seattle, WA 98195
There are lots of programs available nowadays that provide image
processing capabilities -- photo enhancement, touch-up, and darkroom
effects. All of these programs use digital image processing technology,
which is based on mathematical operations. Most of these programs
prevent the user from using or even seeing the mathematical representations
of the data inside the program. This results in a separation
of the user from an accurate understanding of the principles involved
in image processing, and it also limits what the users can do.
XFORM is a programming language that works together with a visual
interface to permit the user to add an extra degree of understanding
and control to the image processing experience.
XFORM works on IBM PC/486 and Pentium systems running Windows 3.1, Windows 95, or Windows NT. It will also work on 386 systems, but then should be restricted to small images and simple operations to prevent long waits for the image processing to be done.
The XFORM language is embedded in (and includes) Lisp, a mature programming language commonly used for rapid prototyping, symbolic computation, scripting languages, and artificial intelligence. The user of XFORM may use as much or as little of LISP as desired. There are three components to the XFORM language:
1. The Transformation language -- a small subset of Lisp. Here are the technical specifications for what is valid in an XFORM transformation expression.
2. The METIP library -- a collection of functions that the user program calls, which manipulate the user interface, perform image processing operations, and access images in bitmap format from the hard disk. Here is a Lisp file that contains descriptions of these functions.
3. Common Lisp -- the language and functions of Lisp itself, as
implemented by Betz, Tierney and others in the XLISP-STAT for
Windows release from the University of Minnesota.
(Here is an
Introduction to XLispStat
The Tech Rept for XLISP-STAT is also available.
The most important part of XFORM for most users is the transformation
language. This is a little language in which arithmetic expressions
are passed to the "transformation engine" to produce
new images from old. Supported are operations for transformations
in both cartesian and polar modes.
After downloading and installing the XFORM system (also known as the METIP Transformation Programmer), start up a session by double-clicking on XLISP-STAT & METIP Prog Envt.
When the XLISP-STAT prompt appears, enter the following line of
code, followed by a carriage return.
(xform-startup 2 "mona.bmp")
You should see some flashing of new windows followed by the loading
of the Mona Lisa image into two new windows on the screen.
To get back to the XLISP-STAT prompt, hold the Alt key and press
the Tab key.
Now perform an operation on the Mona Lisa image by typing,
(xform-vertical-flip)
To see the result, go back to the image screen by holding the
Alt key and typing that Tab key. (this is a way to get back and
forth between different programs running in Windows. The XFORM
system is actually implemented as two separate programs -- one
for XLISP-STAT and one for the image processing subsystem.)
Another transformation you can try is a reflection of Mona Lisa
around the line y = x.
(xform-x-y-flip)
Here is something a little more strange -- a fisheye lens transformation:
(xform-fisheye)
Now that you have tried some "canned" transformations
that have already been defined so that you can call them by name,
try one that is not already pre-programmed.
(xform '(source1 (mod (* 2 x)
x-max) y))
Here you must be careful to type each parenthesis exactly as shown. Also, the apostrophe, called the "quote" character in XFORM and in Lisp, must be typed correctly for this to work.
Don't put any spaces between the characters of names such as "x-max".
The hyphen is part of the name.
What did this transform do? It causes a new image to be determined by computing, at each (x,y) position, a value for the new image's pixel. The formula says it should first double the x coordinate and take that value modulo the value of x-max (the value of the highest x-coordinate in the Mona Lisa image). Let's call this resulting value xnew. It should then look up the pixel at (xnew, y) in the Mona Lisa image (i.e., the source image SOURCE1). That is the pixel value it should plot at (x,y). If all went correctly when you typed in this formula, you should see in the new image window (the "destination" window) a pair of squashed Mona Lisas next to each other.
If you got an error because of typing in something incorrectly,
the simplest thing to do is to exit XLISP-STAT (automatically
exiting the Image processing subsystem, too), and restart it.
There are more graceful ways to recover from various Lisp and
XFORM errors, but there is not time to explain them now.
The best way to exit from the XFORM environment (XLISP-STAT and the Image processing interface), assuming you have saved any images and program dialogs that you don't want to go away, is to get back to XLISP-STAT, go to the file menu, and selected "Exit". This will close down the Image processing interface and then close down XLISP-STAT all in one operation.
Let us consider the following simple transformation, which creates,
in the destination window, an image whose pixel values are twice
those of the corresponding source image ones.
(xform '(* 2 (source1 x y)))
This tells XFORM to transform the image at SOURCE1 by computing
at each destination pixel (x,y), two times the value of the source1
pixel at (x,y). The subexpression (SOURCE1 X Y) accesses the
pixel of the source1 image at position (X, Y). An expression
of the form (* A B) causes the values of A and B to be multiplied.
Regarding notation, lower-case inputs are automatically converted
to UPPER CASE by XLISP-STAT, except within strings. This document
uses the following convention: displayed examples of input are
given in Courier font lower case. Corresponding output is given
in Courier font UPPER-CASE. Both inputs and outputs given within
paragraphs of text are shown in Times New Roman UPPER CASE.
Suppose we wish to scale an image down. We can do this by multiplying
the coordinates of the source pixel by a number greater than 1,
thus mapping a larger range of the source image into the destination
image, effectively shrinking the image. Here is an example.
(xform '(source1 (* x 3) (* y
3)))
We don't have to use integers in these formulas. We can use floating point numbers like 1.5 as follows.
(xform '(source1 (* x 1.5) (*
y 1.5)))
The value of a destination pixel can depend on a logical condition.
An example of this is shown by the following, which makes all
pixels to the right of the line x=50 turn white.
(xform '(if (> x 50) 255 (source1
x y)))
Here, the IF expression has the form (IF E1 E2 E3) where the first expression (represented by E1) gives a condition, the second expression (represented by E2) gives the value to return if the condition is true, and the third expression (represented by E3) gives the value to return if the condition is false. In the example above, if the X coordinate of the destination pixel is greater than 50, then the value for the destination pixel is 255 (white); otherwise it is the value of the source pixel at (X, Y), which means the image is unchanged to the left of the line x=50.
The coordinates of pixels in the image being computed (the destination
image) range between 0 and an upper limit. The upper limits for
each of X and Y depend upon the size of the bitmap already loaded
in the destination window. In a formula, these limits can be
referenced using the symbols X-MAX and Y-MAX. The canned transformation
XFORM-VERTICAL-FLIP is actually implemented with the following
transformation formula.
(xform '(source1 x (- y-max y)))
This means that at each destination pixel (x, y), the y value
is subtracted from the maximum y value of the image, thus complementing
the y value (reflecting it to the bottom or top of the image).
Then the source pixel value at that new height is read and assigned
to (x, y) in the destination image.
Although many interesting transformations can be expressed easily
with Cartesian coordinates (x and y), it is often more convenient
to use polar coordinates (rho and theta). Here is a transformation
that complements the rho values and thus turns an image "inside
out".
(use-polar)
(xform '(source1-polar (- rho-max
rho) theta))
Note that we use the command (USE-POLAR) to put the transformation engine into polar-coordinates mode. Then we reference the source image using the function SOURCE1-POLAR followed by a radius (from the center of the image) and an angle (in radians).
We showed earlier how to shrink an image by multiplying the source
coordinates by a number larger than 1 We can, of course, also
expand or blow up and image by using factors less than 1. These
operations, implemented in this fashion, are sometimes called
"resampling."
(xform '(source1 (* x 0.5) (* y 0.5)))
We can also make an image become "blocky". We essentially
ignore some of the pixel data of the source image, but we reuse
the pixels we do take, so that the destination image has the same
size as the source.
(xform '(source1 (- x (mod x 8))
(- y (mod y 8))))
Here, the MOD function gives the remainder after dividing X or Y by 8.
The resampling operations can also be applied using polar coordinates.
Here is an example in which we expand an image by oversampling
the radius coordinate.
(use-polar)
(xform '(source1-polar (* 0.5
rho) theta))
However, if we do the same thing with the angle coordinate (theta)
we don't obtain a very natural-looking image.
(xform '(source1-polar rho (* 0.5 theta)))
(use-cartesian)
By combining the use of shrinking an image (via undersampling)
and the use of modular arithmetic on the cartesian coordinates,
we can get many small copies of the source image. Here's a transformation
that gives us a dozen tiny versions of the source image in a 3
by 4 array.
(xform '(source1 (mod (* x 3) x-max) (mod (* y 4) y-max)))
Until now we have been working with only one source image, referenced
using the function SOURCE1 or the function SOURCE1-POLAR. The
transformation engine can use either one or two source images
at a time. In order to use a second source image, we should already
have a window containing that image on the screen. For example,
the function call (METIP-LOAD-FILE 2 "mona.bmp") can
be used to load the Mona Lisa image into Window 2. One could
also use an image whose value has been computed using other formulas.
To add two images together, where the first source is in window
1, the second source is in window 2, and the destination is in
window 3, we can use the following sequence of statements.
(set-source1 1)
(set-source2 2)
(set-destination 3)
(xform '(+ (source1 x y) (source2
x y))
This specifies that each destination pixel (x, y) gets computed
as the sum of the source1 pixel at (x, y) plus the source1 pixel
at (x, y).
Make up a transformation that rotates the Mona Lisa 180 degrees
(pi radians), (a) using Cartesian coordinates only, and (b) using
polar coordinates only.
Make up a transformation, using the IF conditional form, that
leaves pixels unchanged if they are above 128 but which subtracts
them from 255 if they are less than or equal to 128.
Create block versions of the Mona Lisa with different block sizes.
(The 8 by 8 block size example is given in the section on resolution
reduction, above). Display each image full-screen, using the
Image processing interface window controls and magnifying tool.
How far away to you have to get from the screen to perceive and
recognize the Mona Lisa in each case?
Make up a transformation that takes the average of two images.
That is, each pixel of the destination image is equal to the
mean of the two corresponding source image pixels. Can you still
recognize each of the two source image scenes or portraits in
the "double exposure" you have created?
Here are some additional projects and exercises that require more sophisticated use of the transformation engine and/or Lisp.
The Common Lisp language includes facilities for handling arrays
of any dimension. Using a 2-dimensional array, Common Lisp can
easily store digital images. Here is a simple example.
(setq *nrows* 8)
(setq *ncols* 8)
(make-array (list *nrows* *ncols*)
:initial-contents '(
(1 1 1 1 1 1 0 0)
(0 1 0 0 0 0 1 0)
(0 1 0 0 0 0 0 1)
(0 1 0 0 0 0 0 1)
(0 1 0 0 0 0 0 1)
(0 1 0 0 0 0 0 1)
(0 1 0 0 0 0 1 0)
(1 1 1 1 1 1 0 0) )) )
In order to manipulate such images, we will need to write functions
in Lisp. Here is a pair of functions, PRINT-ROW and PRINT-IMAGE
that work together to display one of these numeric images on the
screen.
(defun print-row (image row)
"Prints one row of an image."
(dotimes (i *nrows*)
(format t "~5D"
(aref image row i)) ) )
(defun print-image (image)
"Print out IMAGE nicely formatted."
(dotimes (i *nrows*)
(print-row image i)
(format t "~%")
) )
Next, let's see a definition for a function that does something
to an image represented in Lisp. This function, called AMPLIFY-IMAGE,
returns a new image whose pixels have been obtained from the corresponding
input image pixels by multiplying by the value FACTOR.
(defun amplify-image (image1 factor)
"Returns result of multiplying each element of
IMAGE1 by FACTOR."
(let ((image2 (make-array (list *nrows* *ncols*))))
(dotimes (row *nrows*)
(dotimes (col *ncols*)
(setf (aref image2 row col)
(* (aref image1 row col) factor) ) ) )
image2) )
For a full account of using the Common Lisp constructs DEFUN,
LET, MAKE-ARRAY, LIST, DOTIMES, SETF and AREF, consult a text
on Common Lisp, such as Touretsky -- Common Lisp: A Gentle
Introduction to Symbolic Computation, or Chapter 2 of Tanimoto
-- The Elements of Artificial Intelligence Using Common Lisp,
2nd Edition.
It is possible to convert images between the format used by the Image processing component of XFORM and Lisp arrays. To convert a METIP bitmap to a Lisp array, first load the bitmap into a window using, e.g., (METIP-LOAD-FILE 1 "mona.bmp"). Then allocate a Lisp array large enough for the image (Note: avoid using large images in Lisp arrays). The size of the loaded image can be obtained by making
the call (metip-get-win-size), which returns a list of the width
and the height of the image. Then using (metip-get-pix window
x y) inside of a loop, retrieve all the image pixels and store
them in the Lisp array using something like the form (setf (aref
myimage x y) the-value).
It is possible to convert Lisp arrays to METIP bitmaps by going in the other direction. One must already have an image window open with a blank image in it (for example by loading the image BLANK128.BMP). In a loop, call (metip-put-pix window x y gra) to transfer each Lisp array value into the appropriate pixel of the image.
Here are the
technical specifications for what is valid in
an XFORM transformation expression.
Explore translation, scaling, and rotations by arbitrary angles.
Create a way to easily combine these three types of transformations
into a single transformation that takes several arguments. Make
up puzzles for students, where they are given a pair of images
that are related by such a sequence of transformations, and they
must come up with the parameters that correctly map one image
into the other.
Here is a sample solution to this.
http://www.cs.washington.edu/homes/marla/trans_scale_rot.html
Write a routine that takes two images A and B, and generates the
frames of a dissolve sequence, whereby the sequence starts out
with A and ends up with B, but A gradually fades out as B fades
in. Display either a 6-frame dissolve or an 8-frame dissolve
on your screen, depending on the size of your monitor. Experiment
with the weighting of the two images. Does a linear fade-out/fade-in
work as well as a sigmoid function such as one of those used commonly
in neural net backpropagation algorithms?
g(h) = 1 / ((1 + e^(-h))
Here is a sample solution to the dissolve problem.
Illustrate the use of symmetry groups of transformations on images.
Provide a function that takes a transformation (the "generator")
and applies over and over again to an image (with each result
being shown in a different window) until the image that results
is the same as the original, or a maximum number of windows is
reached. If you use a transformation that causes a loss of image
information, demonstrate the effects of error accumulation. At
what point is the loss of information noticeable? At what point,
if any, does all the information become lost?
Create a collection of functions and transformations that make
it easy to create stencils (images with holes through which you
can see other images below). Explore the mathematical properties
of stencils. When you combine two stencils, is the operation
commutative? When you combine three, is the operation associative?