Fractal Data
Stuart Reges, Principal Lecturer
This page has a collection of resources from a talk given at the 2011 CS4HS workshop at the University of
Washington.
The theme of the talk was using real-world data to make our courses more
relevant and interesting to our students. I mentioned two examples from my
intro programming class:
- I use data from the Social Security Administration about popular baby names as a
programming assignment in which students show how the popularity of
particular names has changed over time (1900 to 2000).
- I use a large zip code data file that gives the latitude and longitude of
each zip code to find all zip codes within a certain search radius. I have a
Java program that does the search.
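To make the zip code search concrete, here is a minimal sketch of one way it could be done. This is not the Java program from the talk. It assumes the data has already been read into `(code, latitude, longitude)` tuples, and it uses the haversine great-circle formula with an approximate mean Earth radius. The names `haversine_miles` and `within_radius` and the sample records are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius (assumption)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(records, center_code, radius_miles):
    """Return the codes (other than the center) within radius_miles of center_code.

    records is an iterable of (code, latitude, longitude) tuples.
    """
    coords = {code: (lat, lon) for code, lat, lon in records}
    lat0, lon0 = coords[center_code]
    return sorted(
        code
        for code, (lat, lon) in coords.items()
        if code != center_code
        and haversine_miles(lat0, lon0, lat, lon) <= radius_miles
    )


# Made-up placeholder records, not real zip codes.
sample = [("A", 0.0, 0.0), ("B", 0.0, 1.0), ("C", 0.0, 2.0)]
print(within_radius(sample, "A", 70))
```

One degree of longitude at the equator is roughly 69 miles, so only the placeholder point `B` falls inside a 70-mile radius of `A`. A linear scan like this is usually fast enough for a file of a few tens of thousands of zip codes.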
I mentioned three kinds of distributions:
- Uniform: In this kind of distribution, you don't expect to see any
special patterns for things like the first digit of a number. This is
the boring case that you get with random values and with sequences that
increase in a linear manner. They don't have much of a pattern to
them.
- Gaussian: This is the normal
distribution that we all studied in statistics class with the classic
bell curve.
- Exponential: I focused most on this because we don't have very good
intuitions about exponential phenomena.
When we think of fractals, we normally think of those pretty pictures you can
produce with a fractal
shape. Fractals have a property known as self-similarity. One way to
think of it is that if you zoom in and out, you see the same kind of pattern.
Many natural phenomena have this same property. Think of looking at a mountain
range and zooming in and out. You tend to see the same kinds of patterns at
every scale.
Exponential sequences have this same property of self-similarity, and that
gives them some curious properties. For example, if you have numbers that come
from an exponential process, then you'll find that more of them start with a 1
than with any other digit (about 30%). This odd distribution of leading digits
is known as Benford's Law. We see this property in all sorts of real-world
data.
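A short experiment can make this visible. The sketch below is an illustration only, not material from the talk. It compares the leading digits of the powers of 2 (an exponential sequence) with those of the integers 1 to 999 (a linear sequence), and with the usual statement of Benford's Law, under which digit d leads with probability log10(1 + 1/d). The function name `leading_digit_shares` is hypothetical.

```python
import math
from collections import Counter


def leading_digit_shares(values):
    """Fraction of positive integers in values that start with each digit 1-9."""
    counts = Counter(int(str(v)[0]) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


# Exponential sequence: 2^0, 2^1, ..., 2^999
exponential = leading_digit_shares(2 ** k for k in range(1000))

# Linear sequence: 1, 2, ..., 999
linear = leading_digit_shares(range(1, 1000))

# Benford's Law prediction for each leading digit
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d in range(1, 10):
    print(f"{d}: exponential {exponential[d]:.3f}  "
          f"linear {linear[d]:.3f}  Benford {benford[d]:.3f}")
```

The exponential column tracks the Benford column closely, with about 30% of the values starting with a 1, while the linear column is flat at one ninth for every digit over this particular range.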
We explored why this is so using an Excel spreadsheet.
I used a program for counting the distribution of leading digits that is
available either as a Java program or a Python program.
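For readers who want to try this on their own data, here is a minimal sketch of a leading-digit counter. It is not the Java or Python program from the talk. It assumes the input is lines of whitespace-separated tokens, and it takes the first nonzero digit of each token as its leading digit, so that values such as `0.0042` or `-17` are handled. The names `first_significant_digit` and `tally_leading_digits` are hypothetical.

```python
import re
from collections import Counter


def first_significant_digit(token):
    """Return the first nonzero digit in token as an int, or None if there is none."""
    match = re.search(r"[1-9]", token)
    return int(match.group()) if match else None


def tally_leading_digits(lines):
    """Count leading digits over all whitespace-separated tokens in lines."""
    counts = Counter()
    for line in lines:
        for token in line.split():
            digit = first_significant_digit(token)
            if digit is not None:  # skip tokens with no nonzero digit
                counts[digit] += 1
    return counts


# Made-up sample input, just to show the calling convention.
counts = tally_leading_digits(["12 0.5", "-300 x"])
total = sum(counts.values())
for d in range(1, 10):
    print(d, counts[d], f"{counts[d] / total:.1%}")
```

To run it on a file, pass an open file object as `lines`, since iterating over a file yields one line at a time.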
I mentioned three data sets as examples:
Stuart Reges
Last modified: Fri Aug 12 09:32:26 PDT 2011