C.1. Genetic Algorithms in function maximization
Genetic algorithms are mechanisms which can be used for finding maxima
in complex functions. GAs probe into the function space at plausible points,
and need no information other than the value of the function at each probe
point. Since no other knowledge of the function is
used, GAs can work well in cases where more specific methods fail. However, since they do not take advantage of
other information when it is known, they perform less well than other
methods that can take advantage of such information.
GAs work with fixed size
populations of fixed-length strings. Each string is translated into a probe value,
the function is evaluated at each probe point, and the function value
is used as a fitness value for that string.
After all string fitness values have been calculated, genetic operators
are applied to the population, using high-fitness strings as parents,
and combining their contents into new children strings. If the function is not too ill-behaved, some
of these children will likely have higher fitness (function) values than
any of their parents. This process
is iterated until a satisfactory result is found.
If the function has many maxima or near maxima, several of them
will tend to be found, each by a different string.
If there is only one maximum, all strings will tend to converge on that
value. If the function is sufficiently complex, a maximum
may not be found in a reasonable number of steps, but a value might be
found that is good enough.
The genetic operators
most often used are reproduction, crossover, and mutation.
Reproduction is applied to strings selected on a stochastic basis
from the population; selected strings are placed into a mating pool.
The selection is weighted to give high fitness strings more opportunities
to be included. As the selection
is with replacement, high fitness strings will have multiple occurrences
in the mating pool.
After the mating pool
is filled, strings in it are randomly selected for further genetic operations.
For crossover, two parents are selected.
Then two offspring are created by swapping all characters up to
a randomly chosen point. These two offspring replace the parents in the
mating pool. After two strings
have been operated on, and replaced by their offspring, the offspring
are not further manipulated by the crossover operator.
The proportion of the mating pool used as parents is a variable
that can be manipulated by the experimenter according to the problem at
hand.
After crossover is completed,
some of the strings in the mating pool are chosen at random for application
of the mutation operator. A randomly chosen character in the selected
string is changed to another character.
The purpose of the mutation operator is to prevent the premature
loss of genetic material. Reproduction
and crossover can occasionally lose some genetic material (particular
characters at particular locations). Mutation,
sparingly used, can replace this without turning the entire process into
a random walk. Empirical results
of GAs show that good results are obtained by mutation rates of about
1 mutation per 1000 bits transferred (Goldberg 1988, page 14).
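The mutation operator described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function name `mutate` and its `alphabet` and `rng` parameters are assumptions introduced here.

```python
import random

def mutate(string, alphabet="01", rng=random):
    """Change one randomly chosen character to a different character."""
    position = rng.randrange(len(string))
    # Exclude the current character so the string always changes.
    replacements = [c for c in alphabet if c != string[position]]
    return string[:position] + rng.choice(replacements) + string[position + 1:]

parent = "11001"
child = mutate(parent, rng=random.Random(0))  # seeded only for repeatability
```

The child always differs from its parent in exactly one position. How often the operator is applied is a separate setting, governed by the mutation rate.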
Examination of a simple
example is instructive. Consider
the function y=(abs(x-16))**2 on the interval (0,31). Generating 6 five bit strings by coin flips,
we start with these strings, which have the corresponding function values:
String   Value
11001    81
11110    196
10100    16
11011    121
10000    0
00101    25
The average of these
function values is 73. There are
a number of ways to select strings to be placed into the mating pool.
One is to construct a biased roulette wheel, where each string
is given an area of the wheel in proportion to its fitness.
Spin the wheel once for each string to be placed into the pool,
and pick the string thus selected. This
can be simulated by adding the fitness values together to get the size
of the simulated wheel. Each string
is then given an interval whose width equals its fitness value.
Random integers are picked from 0 up to, but not including, the size of the wheel.
Now, to apply the reproduction
operator to these strings, calculate the ranges of the simulated
biased roulette wheel from their fitness values. The ranges will
be (0-80, 81-276, 277-292, 293-413, none, 414-438); the string 10000 has
fitness 0 and so receives no range.
Picking a random number between 0 and 438 allows us to select one
of these strings. Picking 6 random numbers by computer gives us
320, 76, 138, 133, 236, 426. Picking
these strings to place into the reproduction pool gives us the following
pool:
11011
11001
11110
11110
11110
00101
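The selection just described can be sketched in Python as follows. The fitness values and random numbers are taken as listed in the text; the function name `roulette_select` and the use of `bisect` are choices made for this sketch only.

```python
from bisect import bisect_right
from itertools import accumulate

def roulette_select(strings, fitnesses, spins):
    """Pick one string per spin; a spin is an integer in [0, sum(fitnesses))."""
    # Upper bound (exclusive) of each string's slice of the wheel.
    bounds = list(accumulate(fitnesses))
    # bisect_right skips zero-width slices, so a zero-fitness string is never picked.
    return [strings[bisect_right(bounds, spin)] for spin in spins]

strings = ["11001", "11110", "10100", "11011", "10000", "00101"]
fitnesses = [81, 196, 16, 121, 0, 25]   # values as listed in the text
spins = [320, 76, 138, 133, 236, 426]   # random numbers quoted in the text
pool = roulette_select(strings, fitnesses, spins)
```

With these inputs, `pool` comes out identical to the mating pool listed above.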
Now, to apply the crossover
operator, we pair the strings at random, and pick a point within the string
at random. Pairing (4, 1), (2,
6), and (5, 3) and crossing over at 3, 4, and 1, gives us the following
population, for which we calculate the function values:
String   Value
11111    225
11010    100
11001    81
00101    25
11110    196
11110    196
The
average of these function values is 137.
So, in one generation, we have approximately doubled the average
function value.
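The crossover step that produced this population can be reproduced with a short Python sketch. The pairings and crossover points are those quoted in the text; the function name `crossover` is an assumption, as is the order in which the two offspring of a pair are returned.

```python
def crossover(parent_a, parent_b, point):
    """Swap the first `point` characters of two equal-length strings."""
    child_a = parent_b[:point] + parent_a[point:]
    child_b = parent_a[:point] + parent_b[point:]
    return child_a, child_b

pool = ["11011", "11001", "11110", "11110", "11110", "00101"]
pairs = [(4, 1), (2, 6), (5, 3)]   # 1-based positions in the pool, from the text
points = [3, 4, 1]                 # crossover points, from the text

offspring = []
for (i, j), point in zip(pairs, points):
    offspring.extend(crossover(pool[i - 1], pool[j - 1], point))
```

Only the first pair yields strings that differ from their parents. The second pair differs only in its first four characters, so swapping them reproduces the two parents, and the third pair is identical to begin with.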
To compute the next step,
first construct the roulette wheel. The ranges now will be (0-224, 225-324, 325-405,
406-430, 431-626, and 627-822). Picking
by computer 6 random numbers between 0 and 822 gives 98, 774, 486, 63,
268, 208. Picking these strings
to place into the reproduction pool gives the following pool:
11111
11110
11110
11111
11010
11111
To construct the third
generation, we apply the crossover operator to this pool.
Pairing at random (4, 1), (2, 5), (6, 3) and crossing over at
1, 4, and 2, gives us the following population, for which we calculate
the function values:
String   Value
11111    225
11111    225
11110    196
11010    100
11111    225
11110    196
The average function
value for this population is 194. Skipping the details on the roulette wheel,
picking for the mating pool, and crossover, the fourth generation is:
String   Value
11111    225
11111    225
11111    225
11111    225
11111    225
11111    225
The system has converged,
in three generations, on the string 11111, which has a function value
of 225.
There are several instructive
points about this example. One is that it did not converge on the optimal
point, which is 00000, for a function value of 256. The primary reason for this is the small number
of strings in the population. With
only a few strings, there is great danger of losing good genetic material,
as shown in this example when 00101 disappeared in the third generation.
No other strings had 0s in their left hand positions, and mutation
is not fast enough to regenerate them.
Another, much smaller problem is shown by the string 10000, whose function value
in the first generation was 0.
Since its fitness value was 0, there was no way that this string
could propagate into the next generation. Fitness functions should be selected so that
they are always greater than 0.
Even in this small system,
where the number of strings was much too small for practical problems,
the GA converged on an answer that was very close to the maximum.
While GAs are very good
at searching for a function maximum, they cannot be directly applied to
many problems in AI. Applying them requires
linking GAs with a performance system, which can accept inputs and produce
outputs. The performance system
used with GAs is the bucket brigade classifier system. This will be explained in two steps: first simple
classifier systems, then those with the bucket brigade added.
C.2. Classifier Systems
Classifier systems (Holland
& Reitman, 1978) are basically production systems. They have a memory mechanism, an input mechanism,
an output mechanism, and a set of rules which relate elements of the first
three. All inputs, outputs, and
memories are represented as fixed length strings over a fixed alphabet. The memory mechanism is called the message list.
It consists of a number of messages, each of which is a string.
The input mechanism adds messages to the message list depending
on events and states in the outside world.
The output mechanism can examine the message list and cause actions
in the outside world. The rule
set is called a classifier list. Each
rule contains one or more conditions which match messages in the message
list, and an action, which creates a new message in the message list.
Since all messages are
represented as fixed length strings over a fixed alphabet, these elements
are completely general, knowing nothing about the content or structure
of the environment. The relationship
between the strings and the outside world is totally contained in the
input mechanisms (detectors) and output mechanisms (effectors).
A classifier system operates
by looping through several steps:
(1) The detectors post zero or more messages to the message list.
(2) All messages on the message list are compared to all conditions of all
rules.
(3) For each match, fire the rule, which will generate a message on the new
message list.
(4) Replace the entire message list with the contents of the new message list.
(5) If any messages are output messages, the effectors will generate an output.
Even though classifier
systems can operate with strings over any alphabet, there are some theoretical reasons, which will
be discussed later, to use the alphabet (0, 1). This allows complete generality, and makes operation
and analysis easier.
The detectors translate
the real world into fixed length strings by encoding particular states
or events of the environment. For
example, property detectors could be used to set a specific bit if the
particular property was true, or reset it if it was false. Other detectors could be used, such as determining
in which window a mouse click occurred, or which key the user pressed.
Once the detectors generate a message, it is added to the message
list.
The rules in the classifier
list consist of two parts: one or more conditions and an action.
Each condition consists of a fixed length string from the alphabet
(0, 1, #), with # being interpreted as 'don't care'.
Conditions are compared to messages, and match in the case that,
where the condition is 0 or 1, the message contains the same value.
If there is more than one condition in a rule, each must match
some message (probably a different message for each condition) for the
rule to be considered matching. A
condition may be preceded with a negate sign ("-"), which will
cause the condition to be true if there are no messages on the message
list that match the remainder of the condition.
The action consists of
a fixed length string from the alphabet (0, 1, #) , with # being interpreted
as 'pass-through'. When a match
is found, a new message is posted to the new message list.
The content of the message is specified by the action, with the
content of the new message matching the action whenever the action is
0 or 1. Where the action contains
#, the corresponding letter of the matching message is used.
Since the entire message
list is replaced by the new message list at each cycle, special steps
must be taken to regenerate any messages that need to remain, as they
would otherwise be lost.
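The matching and pass-through rules above can be sketched in Python. The function names `matches` and `build_message` are assumptions introduced for this illustration, and the example strings are arbitrary.

```python
def matches(condition, message):
    """A condition matches when every 0 or 1 in it equals the message letter."""
    return all(c == "#" or c == m for c, m in zip(condition, message))

def build_message(action, matched_message):
    """Copy each 0 or 1 from the action; '#' passes the matched letter through."""
    return "".join(m if a == "#" else a for a, m in zip(action, matched_message))

hit = matches("000#11#", "0000110")              # every 0 and 1 agrees
miss = matches("000#11#", "0010110")             # third letter differs
fixed = build_message("1100100", "0000110")      # no '#': the action is copied as is
copied = build_message("#######", "0100001")     # all '#': the message is repeated
mixed = build_message("11#0###", "0010110")      # partial pass-through
```

An action made entirely of '#' simply repeats the matched message, which is one way to regenerate a message that must survive into the next cycle.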
Certain messages are
sent to the effectors and are translated to actions there. Only those messages with certain predefined
patterns in predefined tag fields are accepted by the effectors. The effectors make whatever translation is necessary
to cause actions in the outside world. For example, a certain message
could cause a new window of a particular type to be created at a specified
place on the screen. Output messages
are not just sent to the effectors; they stay on the message list until
replaced in the next cycle and may be matched by conditions in other rules
just like non-effector messages.
The rules need not be
all at one semantic level. Rules
at higher levels might recognize a certain situation from the input, and
post a message stating what the current goal of the system is. This message would be posted every cycle until
the goal was achieved, or the situation changed enough that it was no
longer relevant. The presence of
this message would cause other rules that would help accomplish this goal
to become active. Intermediate
level goals are handled in precisely the same fashion.
Now let us consider a
simple example. This example will
have messages and conditions that are 7 letters long. The detector has 5 bits of input: bit 0: (least
significant bit) mouse in window A; bit 1: mouse in window B; bit 2:
mouse down; bit 3: window A open, and bit 4: window B open.
The mouse button is assumed to be debounced, so that only one 'down'
message is posted per mouse click. Bits 5 and 6 of all detector messages
will be '00' to indicate that this message came from the input detectors.
The effectors have 4
bits of output: bit 0: open window A; bit 1: close window A; bit 2: open
window B; and bit 3: close window B. Bit 4 is not used and is ignored. Bits 5 and 6 of messages sent to the effectors
will be '11' to indicate that this message is intended for output.
The system includes the
following classifiers with two conditions each:
(a) '000#11#', '#######' / '1100100' which translates to "if mouse down,
and mouse in window B, and window B closed, then send 'open window B'".
(b) '001#11#', '#######' / '1101000' which translates to "if mouse down, and mouse in window B,
and window B open, then send 'close window B'".
The action of this classifier
system will be simply to toggle window B between the open and closed states
every time the mouse is clicked inside the window. Note that the position of the window may change
at any time; the input detectors must take care of finding it.
Now, let us consider
a slightly more complex example. The input detectors have only 3 bits of input:
bits 0, 1, and 2 as in the previous example. Bits 3 and 4 will always be zero. Note that the detectors that read whether a
window is open or not are not present.
The effectors are the same. The
classifier system must remember
whether the windows are open or closed as the detector no longer supplies
that information.
This example includes
the following classifiers:
(a) '00##11#', '-0100001' / '1100100'. A
literal translation of this is "if a detector message that has '11'
in bits 1 and 2 is present, and if there is not a message that matches
'0100001', then output to the effectors '0100'".
A free translation is "if the mouse is down in window B, and
there is not a message that says that window B is open, then open window
B".
(b) '00##11#', '-0100001' / '0100001'. Note
that the condition part is the same as that of the previous classifier,
so this classifier will operate at precisely the same time as the previous. A free translation is "If the mouse is
down in window B, and window B is not open, then post a message that says
that it is now open." This
message, as all messages do, remains on the message list for only one
cycle.
(c) '01000##', '#######' / '#######'. A
literal translation is "if there is a message with '01000' in bits
2..6, and there is any message, then post a message identical to the first".
Making a free translation yields "If any window is open, remember
that fact". Note that this is necessary because all messages
are replaced at each cycle.
This classifier system
will open window B the first time that the mouse is clicked inside window
B, and will remember that fact forever.
The extension of this
example to closing windows involves modifications to the control structure.
In the modified control structure, the inputs are allowed to post
messages only every other cycle. This
is necessary to avoid a race condition.
The following classifier
is added to the classifier list:
(d) '00##11#', '0100001' / '1101000'. A
free translation is "If the mouse is down in window B, and window
B is open, close it".
Classifier
(c) is replaced with the following:
(c) '0100001', '-11#1###' / '#######'. A
free translation is "If there is a message that says that the state
of window B is open, and there is not a message directing the effectors
to close window B, remember that it is open."
The effector message
will be generated only on one cycle, and will not persist, so the classifier
system must arrange on that cycle to forget that window B is open.
On all other cycles, if the message saying that the window is open
is there, it will be remembered.
The action of this classifier
system is the same as that of the first example: namely that the state
of window B will be toggled between open and closed every time the mouse
is clicked inside that window. Now
the classifier system is able to remember the current state of window
B for itself, rather than requiring the input detectors to keep track
of that.
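The behaviour of this rule set can be checked with a small Python simulation. It is a sketch under stated assumptions, not a definitive implementation: each rule fires at most once per cycle, a '#' in the action copies from the message that matched the first condition, and the mouse is clicked in window B on every cycle in which the detectors are allowed to post. All function and variable names are introduced here.

```python
def matches(condition, message):
    return all(c == "#" or c == m for c, m in zip(condition, message))

def satisfied(condition, messages):
    """Return (ok, matched_message); a leading '-' negates the condition."""
    if condition.startswith("-"):
        return not any(matches(condition[1:], m) for m in messages), None
    for m in messages:
        if matches(condition, m):
            return True, m
    return False, None

def fire(rule, messages):
    """Return the message a rule posts, or None if either condition fails."""
    first, second, action = rule
    ok_first, source = satisfied(first, messages)
    ok_second, _ = satisfied(second, messages)
    if not (ok_first and ok_second):
        return None
    # Assumption: '#' copies from the message matching the first condition.
    return "".join(s if a == "#" else a for a, s in zip(action, source))

RULES = [
    ("00##11#", "-0100001", "1100100"),  # (a) open window B
    ("00##11#", "-0100001", "0100001"),  # (b) note that B is now open
    ("0100001", "-11#1###", "#######"),  # (c) keep remembering unless closing
    ("00##11#", "0100001", "1101000"),   # (d) close window B
]
CLICK_IN_B = "0000110"

def cycle(messages, detector_messages):
    current = messages + detector_messages
    posted = [fire(rule, current) for rule in RULES]
    new_messages = [m for m in posted if m is not None]
    outputs = [m for m in new_messages if m.startswith("11")]
    return new_messages, outputs

messages, log = [], []
for step in range(6):
    # Detectors post only on every other cycle, as in the modified control structure.
    detected = [CLICK_IN_B] if step % 2 == 0 else []
    messages, outputs = cycle(messages, detected)
    log.append(outputs)
```

Under these assumptions the effector messages alternate between "open window B" and "close window B" on successive clicks. The cycle after a close is the one in which the memory message is dropped.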
Classifier systems are
just rule based production systems, rather simple ones at that.
Without some real advantage over other production systems, there
would be no reason to bother with them. The advantages come in when adding mechanisms
to manipulate the rules; rules of this form are easier to manipulate than
LISP predicates or other production rule representations. Adding mechanisms to manipulate the rules turns
a simple classifier system into a learning classifier system.
C.3. Bucket Brigade Classifier Systems
A bucket brigade classifier
system is composed of a classifier system with the bucket brigade learning
mechanism. This learning mechanism
takes a reward function as feedback from the external world for each output;
positive for desirable outputs, negative for undesirable outputs, zero
for others.
The Bucket Brigade algorithm
is a mechanism for changing the probability that a rule will be used when
its conditions are matched by messages. In this system, each classifier has a strength,
which affects the probability of that classifier being executed. A matched classifier bids to be executed, with
the bid being the product of a bid constant less than 1 (typically 1/8),
its strength, its specificity (proportion of letters that are not "don't
care"), and its support (the sum of the strengths of the rules that
posted messages that match this rule).
The highest bids are stochastically accepted, and the winning classifiers
are allowed to post messages.
The bid constant is used
to ensure that classifiers do not squander their entire strength in a
few unsuccessful bids. The strength
serves as an estimate of a classifier's ability to predict correctly. High strength classifiers are those that have
been successful at predicting good outcomes in the past. The specificity is used so that classifiers
that are highly specific to the particular situation are favored over
those that are more general, but have the same strength and support. The specificity increases as the number of "don't
care" specifiers decreases. Including
this parameter allows default hierarchies to form, as rules that are highly
specific to the situation will bid more in relation to the more general
rules of the same strength and support.
Support is another measure of relevance.
Rules that match messages posted by rules that bid a lot tend to
be very relevant to the current situation.
Support is passed from rule to rule by messages.
If a rule's strength
is a measure of its ability to predict correctly, there must be some method
of modifying the strength. When
a rule makes or supports a correct output, its strength must be increased.
If the prediction is incorrect or not rated, the strength
must decrease. This is accomplished by passing strength from
rule to rule by way of messages. A
classifier's strength is reduced by the bid amount when it wins and posts
a message. This strength is redistributed
to the classifiers that posted the messages that this classifier responded
to. So, when a rule posts a message
that is consumed by another rule, its strength is to some extent replenished. Rules that post messages that are not consumed
by some other rule or by the output reward function lose strength.
The ultimate source of
strength to replenish the classifiers is the reward function connected
to the effectors. The classifier
that produces an effector message that results in a positive reward function
has its strength increased by the reward. The next time that the same situation arises,
rules that feed the final classifier are rewarded by that final classifier's
bid. In this way, strength is eventually
redistributed to all the classifiers that help produce useful outcomes
(positive reward functions). Classifiers
that post messages that do not lead to positive rewards are eventually
reduced in strength to the level where they no longer win bids and become
inactive.
An example of a classifier
system using the bucket brigade is useful here. Consider a classifier system that has the same
inputs and outputs as the first example, above, and that uses the following
rules:
(a) '00#01#1', '#######' / '1100001', strength 70; which translates to "If
the mouse is down in window A and window A is closed, open window A."
(b) '00#011#', '#######' / '1100001', strength 80; which translates to "If
the mouse is down in window B and window A is closed, open window A."
(c) '000#1#1', '#######' / '1100100', strength 90; which translates to "If
the mouse is down in window A and window B is closed, open window B."
(d) '000#11#', '#######' / '1100100', strength 100; which translates to "If
the mouse is down in window B and window B is closed, open window B."
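With the bid constant
set at 0.1, and support for input messages arbitrarily set at 5, the bids
for the input message '0000110' ("mouse is down inside window B,
and both A and B are closed") will be (a) no bid, (b) 28, (c) no
bid, (d) 35. These figures are consistent with a specificity of 5/7, the
proportion of letters in the first condition that are not "don't care", with
fractions dropped: for (d), 0.1 x 100 x 5/7 x 5 is about 35.7. Classifier (d) will
probably win, posting its message. The
strength of (d) will be reduced by 35 to 65.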
The output message, "open window B" is the correct output,
so the classifier that posted the message will be rewarded, and have its
strength increased by 100 to 165. So,
when the classifier system produces the correct response, it is rewarded,
and the chance of it producing the correct response again for the same
input is increased.
Now, if the classifier
system is given the input '0000101' ("mouse is down inside window
A, and both A and B are closed"), the bids will be (a) 25, (b) no
bid, (c) 32, and (d) no bid. Rule
(c) will probably win, posting its message.
Its strength will be reduced by 32 to 58. The output message "open window B"
is not the correct output, so no reward will be given. The next time that the same input message is
posted, classifier (a) will probably win, since its strength is now higher
than that of classifier (c). If
it does win, it will be rewarded for producing the correct output, and
its strength will increase.
Classifier (c) will probably
not win any more bids, so its strength will not decrease further.
It is still available in case the environment changes, and the
response to classifier (a) becomes incorrect.
After a few trials where (a) produces an incorrect response, and
has its strength reduced, (c) will begin to win bids again.
If its response is now correct, its strength will increase.
Note that if the system were given a negative reward for producing
the incorrect output, the result in this case would have been the same,
but the strength of classifier (c) would be reduced by more than just
the amount it lost making its bid.
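The bids and strength changes in this example can be reproduced with a short Python sketch. Two points are assumptions rather than something the text states: specificity is taken over the first condition only (5 of 7 letters), and each bid is truncated to a whole number. Together they reproduce the quoted figures. The names `bid` and `bids_for` are introduced here.

```python
from fractions import Fraction

BID_CONSTANT = Fraction(1, 10)
INPUT_SUPPORT = 5          # support for input messages, as set in the text
REWARD = 100               # reward for a correct output, as in the text

RULES = {                  # name: (first condition, strength)
    "a": ("00#01#1", 70),
    "b": ("00#011#", 80),
    "c": ("000#1#1", 90),
    "d": ("000#11#", 100),
}

def matches(condition, message):
    return all(c == "#" or c == m for c, m in zip(condition, message))

def bid(condition, strength, support=INPUT_SUPPORT):
    # Assumption: specificity is the non-'#' fraction of the first condition,
    # and the product is truncated to a whole number.
    specificity = Fraction(sum(c != "#" for c in condition), len(condition))
    return int(BID_CONSTANT * strength * specificity * support)

def bids_for(message):
    return {name: bid(cond, strength)
            for name, (cond, strength) in RULES.items()
            if matches(cond, message)}

bids_click_b = bids_for("0000110")   # mouse down in window B, both closed
bids_click_a = bids_for("0000101")   # mouse down in window A, both closed

# (d) wins, pays its bid, and is rewarded; (c) wins, pays, and gets nothing.
strength_d = RULES["d"][1] - bids_click_b["d"] + REWARD
strength_c = RULES["c"][1] - bids_click_a["c"]
```

Exact fractions are used instead of floating point so that the truncation step cannot be thrown off by rounding error.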
Now that we have examined
simple classifier systems, and those using the bucket brigade, it's time
to combine those with Genetic Algorithms. This combination is called a Learning Classifier
System.
C.4. Learning Classifier Systems
Recall that genetic algorithms
are a way of finding maxima in function values. Now consider the performance of a bucket brigade
classifier system. As the system
performs better, it will be rewarded more often. The strengths of the rules that cause it to
perform better will increase. This
increase can be considered as the function to optimize.
If the rules can be changed,
there is a chance that the system of rules will work together better,
and produce a higher total reward. This is what we are trying to accomplish by
applying the genetic algorithm. The
GA changes the population of rules by combining characteristics of the
most fit individuals, the higher strength rules.
This requires some small
changes in the mechanism of the Genetic Algorithm. There is no longer a requirement that strings
be translated into probe values and that the function be evaluated. The strength can be used directly as the function
value.
The example problems
given above are too simple to benefit from learning new classifiers, but
provide material for the following examples of application of the various
genetic operators. Consider the classifier '000#11#', '#######' / '1100100',
which translates to "If the mouse is down in window B and window
B is closed, open window B." If
the mutation operator changes the fourth letter, the result might be '000111#',
'#######' / '1100100', which translates to "If the mouse is down
in window B, and window B is closed and window A is open, open window
B". This is a specialization
of the more general parent rule. In
this case, it is probably not a useful specialization, and will probably
not last long in the competition. Note
that the parent rule is not replaced by the offspring.
Offspring replace low-strength rules which have not fared well
in the competition.
Crossover between '000#11#',
'#######' / '1100100' and '00#01#1', '#######' / '1100001' between the
third and fourth letters gives the following offspring: '000 01#1', '#######'
/ '1100001' and '00# #11#', '#######' / '1100100' which translate to "If
the mouse is down in window A and both A and B are closed, then open window
A" and "If the mouse is down in window B, then open window B"
respectively. These are plausible
rules in the current context, but again will probably not survive long
in the competition.
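Both operators can be applied to these rules with a few lines of Python. Treating a rule as one 21-letter string (condition, condition, action, joined together) is an assumption of this sketch, chosen because it reproduces the offspring given above. The helper names are likewise introduced here.

```python
def mutate_at(rule, position, letter):
    """Replace the letter at a 1-based position of the rule string."""
    return rule[:position - 1] + letter + rule[position:]

def crossover(rule_a, rule_b, point):
    """Swap the first `point` letters of two rule strings."""
    return rule_b[:point] + rule_a[point:], rule_a[:point] + rule_b[point:]

def show(rule):
    """Split a 21-letter rule string back into condition, condition, action."""
    return f"'{rule[:7]}', '{rule[7:14]}' / '{rule[14:]}'"

parent_1 = "000#11#" + "#######" + "1100100"
parent_2 = "00#01#1" + "#######" + "1100001"

mutant = mutate_at(parent_1, 4, "1")                  # fourth letter: '#' becomes '1'
child_1, child_2 = crossover(parent_1, parent_2, 3)   # between letters 3 and 4
```

Because the crossover point falls inside the first condition, each offspring keeps the second condition and the action of the parent that supplied its tail.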
Consider the world outside
the LCS as a search space, with the reward function being the function
to be optimized. The search space
can be very complex, multi-dimensional, and time varying. Searching this space can be done with many different
techniques, with greater or lesser success. Much of AI consists of techniques for searching
such spaces. Some search strategies
(e.g. hill-climbing) are subject to being fooled by local maxima.
Others are not computationally tractable for useful problems.
LCSs using genetic algorithms result in fast searching through
the possible solution space, without many of the problems of other methods. A more complete analysis may be found in Holland
1986.