Clifford Watson,
Consider a linear operator S, a function f(x) and a function g(x) = S(f(x)) (this is sometimes called a system). In the case of image processing, f(x) could represent the original image (a one-dimensional image in this case), S a filtering operation and g(x) the filtered output image.
If S is also shift invariant, then we can write

a convolution integral, also written g = f*h. h(t) is called the point spread function or impulse response and completely characterizes the filter. Discretizing this equation, we can write

which is appropriate for dealing with digital images. h(t) is usually non-zero only in some range including x, say [-w,+w], so we may rewrite this equation

Thus the output image at a pixel x has a value given by a weighted sum of the values of the pixels in [-w,+w] in the original image, with the weights given by h(k). For the output at x+1, h(k) is shifted by one and the weighted sum is recomputed. This series of shift-multiply-sum operations is performed over the whole image; this is called a digital convolution. For a two-dimensional image, we have

which is similarly computed by a series of shift-multiply-sum operations. h(x,y)'s values are often referred to as the filter weights, filter kernel or filter mask.
Note: we can create non-linear filters by varying the kernel over the image. Such filters are more complicated to represent, but can often remove the noise from regions of an image without smearing edges.