"Genetic Search"
Genetic search: search in which
a "population" of states is gradually changed according to "mutation" and
"crossover" transformations with some combination of directed (by a fitness
function) and random guidance.
Motivation
Large search spaces are often "irregular";
it's easy to get stuck in
local minima.
A combination of random and directed search is often effective.
"Natural selection" offers one compelling model.
We can take advantage of parallel
processing.
Individuals, the Population, and Solutions
An individual may correspond to
a state in a state space.
In the usual biological
model, each individual is represented as a string, as in a DNA molecule:
e.g., ACGTCCGACAGTT
(In Lisp, we can use a list.)
The population is a collection of individuals currently "alive" (under consideration).
Any solution to the problem should be in the state space.
We want the final population to
contain a solution.
Mutations
A mutation is a (small) transformation to an individual. It is analogous to a move in a state space.
Mutations are usually effected with some degree of randomness.
Examples:
Start with ACGTCCGACAGTT
1. random selection of location
in the string, followed by replacement of that symbol by a randomly chosen,
different one from the alphabet.
ACTTCCGACAGTT (third letter changed
to T)
2. random selection of two locations
in the string, followed by an exchange of the two selected elements.
AAGTCCGCCAGTT (second and eighth
elements traded places)
3. random deletion of one element.
ACGCCGACAGTT (fourth element,
T, deleted)
4. random insertion of a random
element.
ACGTCACGACAGTT (A inserted
after 5th element).
Crossovers
Two individuals are selected.
Each is split into two parts
The first part of one is combined with the second part of the other, producing a new individual.
Example: Start with
ACGTCCGACAGTT
TTTTTTTTTTTTT
Cut both after the 5th character
ACGTC
CGACAGTT
TTTTT
TTTTTTTT
Combine 1st of 1st with 2nd of
2nd
ACGTCTTTTTTTT
Fitness Functions
A fitness function is a function that maps from the set of individuals to the real numbers and that is used to determine which individuals survive from one generation to the next.
Fitness functions serve several
purposes:
1. to help discriminate between
individuals that correspond to "feasible solutions" and those that correspond
to infeasible solutions.
For example, in the travelling
salesman problem, we might interpret "feasible solution" to mean a sequence
of cities corresponding to an actual tour (visiting all cities, repeating
none).
Or in the painted squares
puzzle, a feasible solution might be thought of as one in which all four
pieces have been placed, but not necessarily with matching sides.
2. to indicate, within the set of
feasible solutions, which ones seem best or better than the others,
and within the set of infeasible
solutions, which ones seem closest to being feasible.
3. to lend some direction to what otherwise might be a completely random search.
Fitness functions can actually hamper the search if (a) they are overly influenced by irrelevant criteria, or (b) influenced in the wrong way by relevant criteria, or (c) if they are not appropriately counterbalanced by enough randomness in the mutation and crossover operations.
Fitness functions tend to drive the search (and the whole population) towards minima in the space.
The randomness of mutation and crossover operations keep the search moving and can help escape from local, non-global minima.
These two kinds of mechanisms must
be in balance for genetic search to work well.
Termination Criteria
In some problems such as puzzles, there are easily recognizable solutions.
At the end of each "generation" of new individuals, if any of them satisfies a test for being a solution to the problem, then we can stop there.
In other problems, the solution
is defined to be the "best" among a set of candidates.
For example, in the TSP, the candidates
are tours, and the best one is the one with the smallest total distance.
It is sometimes easy to identify
a solution to such a problem, particularly if there is a method for computing
tight lower bounds on the cost of a solution.
However, often no such method is
available.
Some termination criteria are these...
1. Stop after a fixed number of
generations, or a fixed amount of computer time has been spent.
2. Stop as soon as the population
changes from one generation to the next get small enough.
(And if the population fails to
change, then stop immediately.)
In addition to termination criteria, there are other policies to set and adjust:
1. Size of the population.
It should be big enough to accommodate
a diversity of individuals. Otherwise, crossovers will not lead to
new individuals.
It should be small enough to keep
the computational load manageable.
2. Balance of mutations and crossovers.
Crossovers seem to offer best results
when the good parts of two existing individuals can be combined into something
better.
Mutations seem to work best when
something random needs to be done to produce an individual that is different
enough from its "peers" to help the whole population get out if its local
minimum.
Genetic search is related to "simulated
annealing."
Related to these are "Boltzmann
machines" and the "Metropolis algorithm".
Last modified: October 27, 1998
Steve Tanimoto