
C.1. Genetic Algorithms in function maximization

Genetic algorithms (GAs) are search mechanisms that can be used to find maxima of complex functions. GAs probe the function space at plausible points and need no information other than the value of the function at each probe point. Since no other knowledge of the function is used, GAs can work well in cases where more specific methods fail. However, because they do not exploit additional information even when it is available, they perform less well than specialized methods that can use it.

GAs work with fixed-size populations of fixed-length strings. Each string is translated into a probe value, the function is evaluated at each probe point, and the function value is used as the fitness value for that string. After all string fitness values have been calculated, genetic operators are applied to the population, using high-fitness strings as parents and combining their contents into new child strings. If the function is not too ill-behaved, some of these children will likely have higher fitness (function) values than any of their parents. This process is iterated until a satisfactory result is found. If the function has many maxima or near maxima, several of them will tend to be found, each by a different string. If there is only one maximum, all strings will tend to converge on that value. If the function is sufficiently complex, a maximum may not be found in a reasonable number of steps, but a value might be found that is good enough.

The genetic operators most often used are reproduction, crossover, and mutation. Reproduction is applied to strings selected on a stochastic basis from the population; selected strings are placed into a mating pool. The selection is weighted to give high-fitness strings more opportunities to be included. As the selection is with replacement, high-fitness strings will tend to have multiple occurrences in the mating pool.

After the mating pool is filled, strings in it are randomly selected for further genetic operations. For crossover, two parents are selected. Then two offspring are created by swapping all characters up to a randomly chosen point. These two offspring replace the parents in the mating pool. After two strings have been operated on, and replaced by their offspring, the offspring are not further manipulated by the crossover operator. The proportion of the mating pool used as parents is a variable that can be manipulated by the experimenter according to the problem at hand.
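The crossover step described above can be sketched as follows. This is a minimal illustration, not a reference implementation; the function names `crossover` and `random_crossover` are hypothetical, and the choice to keep at least one character from each parent is an assumption.

```python
import random

def crossover(parent_a: str, parent_b: str, point: int):
    """Swap the first `point` characters of two equal-length strings."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def random_crossover(parent_a: str, parent_b: str, rng=random):
    """Pick the crossover point at random (assumed to lie strictly inside the string)."""
    point = rng.randint(1, len(parent_a) - 1)
    return crossover(parent_a, parent_b, point)

# Crossing two five-character strings after the third character.
print(crossover("11110", "11011", 3))   # ('11111', '11010')
```

Both offspring have the same length as their parents, so the population keeps its fixed string length.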

After crossover is completed, some of the strings in the mating pool are chosen at random for application of the mutation operator. A randomly chosen character in the selected string is changed to another character. The purpose of the mutation operator is to prevent the premature loss of genetic material. Reproduction and crossover can occasionally lose some genetic material (particular characters at particular locations). Mutation, sparingly used, can replace this without turning the entire process into a random walk. Empirical results of GAs show that good results are obtained by mutation rates of about 1 mutation per 1000 bits transferred (Goldberg 1988, page 14).
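Putting the three operators together, one generation might look like the sketch below. Several details are assumptions made for illustration only: the objective function, the +1 shift that keeps every fitness above zero, the use of `random.choices` for weighted selection, crossing every pair in the pool, and mutation implemented as an independent per-bit flip at the quoted rate of about 1 in 1000.

```python
import random

def fitness(bits: str) -> int:
    # Hypothetical objective; the +1 keeps every selection weight positive.
    return (int(bits, 2) - 16) ** 2 + 1

def next_generation(population, rng, mutation_rate=0.001):
    if len(population) % 2:
        raise ValueError("this sketch assumes an even population size")
    # Reproduction: fitness-weighted sampling with replacement fills the mating pool.
    weights = [fitness(s) for s in population]
    pool = rng.choices(population, weights=weights, k=len(population))
    # Crossover: pair the pool at random and swap heads at a random point.
    rng.shuffle(pool)
    children = []
    for a, b in zip(pool[0::2], pool[1::2]):
        point = rng.randint(1, len(a) - 1)
        children += [a[:point] + b[point:], b[:point] + a[point:]]
    # Mutation: flip each bit independently with a small probability.
    flip = {"0": "1", "1": "0"}
    return [
        "".join(flip[c] if rng.random() < mutation_rate else c for c in s)
        for s in children
    ]

rng = random.Random(0)
population = ["".join(rng.choice("01") for _ in range(5)) for _ in range(6)]
for _ in range(10):
    population = next_generation(population, rng)
print(population)
```

The population size and string length stay fixed from one generation to the next; only the contents of the strings change.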

Examination of a simple example is instructive. Consider the function y=(abs(x-16))**2 on the interval (0,31). Generating 6 five bit strings by coin flips, we start with these strings, which have the corresponding function values:

11001 81

11110 196

10100 16

11011 121

10000 0

00101 25

The average of these function values is 73. There are a number of ways to select strings to be placed into the mating pool. One is to construct a biased roulette wheel, where each string is given an area of the wheel in proportion to its fitness. Spin the wheel once for each string to be placed into the pool, and pick the string thus selected. This can be simulated by adding the fitness values together to get the size of the simulated wheel. Each string is then given an interval whose width equals its fitness value. Random numbers are picked between 0 and the size of the wheel, and each number selects the string whose interval contains it.

Now, to apply the reproduction operator, construct a simulated biased roulette wheel by calculating ranges based on the fitness values for these strings. The ranges will be (0-80, 81-276, 277-292, 293-413, 413-413, 414-438). Picking a random number between 0 and 438 allows us to select one of these strings. Picking 6 random numbers by computer gives us 320, 76, 138, 133, 236, 426. Picking these strings to place into the reproduction pool gives us the following pool:

11011

11001

11110

11110

11110

00101
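The roulette wheel lookup can be sketched with cumulative sums and a binary search. The fitness values and random draws below are copied from the text as listed; the function name `roulette_select` is hypothetical, and using `bisect_right` is just one convenient way to find the interval containing a draw.

```python
from bisect import bisect_right
from itertools import accumulate

def roulette_select(fitnesses, spin):
    """Return the index of the string whose wheel interval contains `spin`."""
    upper_bounds = list(accumulate(fitnesses))  # exclusive upper edge of each interval
    if not 0 <= spin < upper_bounds[-1]:
        raise ValueError("spin must lie in [0, total fitness)")
    return bisect_right(upper_bounds, spin)

strings = ["11001", "11110", "10100", "11011", "10000", "00101"]
fitnesses = [81, 196, 16, 121, 0, 25]    # values as listed in the text
spins = [320, 76, 138, 133, 236, 426]    # random draws quoted in the text

pool = [strings[roulette_select(fitnesses, s)] for s in spins]
print(pool)   # ['11011', '11001', '11110', '11110', '11110', '00101']
```

A string with fitness 0 gets an empty interval, so no draw can ever select it.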

Now, to apply the crossover operator, we pair the strings at random, and pick a point within the string at random. Pairing (4, 1), (2, 6), and (5, 3) and crossing over at 3, 4, and 1, gives us the following population, for which we calculate the function values:

11111 225

11010 100

11001 81

00101 25

11110 196

11110 196

The average of these function values is 137. So, in one generation, we have approximately doubled the average function value.

To compute the next step, first construct the roulette wheel. The ranges now will be (0-224, 225-324, 325-405, 406-430, 431-626, and 627-822). Picking by computer 6 random numbers between 0 and 822 gives 98, 774, 486, 63, 268, 208. Picking these strings to place into the reproduction pool gives the following pool:

11111

11110

11110

11111

11010

11111

To construct the third generation, we apply the crossover operator to this pool. Pairing, at random (4, 1), (2, 5), (6, 3) and crossing over at 1, 4, and 2, gives us the following population, for which we calculate the function values:

11111 225

11111 225

11110 196

11010 100

11111 225

11110 196

The average function value for this population is 194. Skipping the details on the roulette wheel, picking for the mating pool, and crossover, the fourth generation is:

11111 225

11111 225

11111 225

11111 225

11111 225

11111 225

The system has converged, in three generations, on the string 11111, which has a function value of 225.

There are several instructive points about this example. One is that it did not converge on the optimal point, which is 00000, for a function value of 256. The primary reason for this is the small number of strings in the population. With only a few strings, there is great danger of losing good genetic material, as shown in this example when 00101 disappeared in the third generation. No other strings had 0s in their left hand positions, and mutation is not fast enough to regenerate them. Another, much smaller problem was shown for the function value 0 for the string 10000 in the first generation. Since the fitness value was 0, there was no way that this string could propagate into the next generation. Fitness functions should be selected so that they are always greater than 0.

Even in this small system, where the number of strings was much too small for practical problems, the GA converged on an answer that was very close to the maximum.

While GAs are very good at searching for a function maximum, they cannot be directly applied to many problems in AI. Applying them to such problems requires linking GAs with a performance system, which can accept inputs and produce outputs. The performance system used with GAs is the bucket brigade classifier system. This will be explained in two steps: first simple classifier systems, then those with the bucket brigade added.

C.2. Classifier Systems

Classifier systems (Holland & Reitman, 1978) are basically production systems. They have a memory mechanism, an input mechanism, an output mechanism, and a set of rules which relate elements of the first three. All inputs, outputs, and memories are represented as fixed-length strings over a fixed alphabet. The memory mechanism is called the message list. It consists of a number of messages, each of which is a string. The input mechanism adds messages to the message list depending on events and states in the outside world. The output mechanism can examine the message list and cause actions in the outside world. The rule set is called a classifier list. Each rule contains one or more conditions which match messages in the message list, and an action, which creates a new message in the message list.

Since all messages are represented as fixed length strings over a fixed alphabet, these elements are completely general, knowing nothing about the content or structure of the environment. The relationship between the strings and the outside world is totally contained in the input mechanisms (detectors) and output mechanisms (effectors).

A classifier system operates by looping through several steps:

(1) The detectors post zero or more messages to the message list.

(2) All messages on the message list are compared to all conditions of all rules.

(3) For each match, fire the rule, which will generate a message on the new message list.

(4) Replace the entire message list with the contents of the new message list.

(5) If any messages are output messages, the effectors will generate an output.

Even though classifier systems can operate with strings over any alphabet, there are some theoretical reasons, which will be discussed later, to use the alphabet (0, 1). This allows complete generality, and makes operation and analysis easier.

The detectors translate the real world into fixed length strings by encoding particular states or events of the environment. For example, property detectors could be used to set a specific bit if the particular property was true, or reset it if it was false. Other detectors could be used, such as determining in which window a mouse click occurred, or which key the user pressed. Once the detectors generate a message, it is added to the message list.

The rules in the classifier list consist of two parts: one or more conditions and an action. Each condition consists of a fixed length string from the alphabet (0, 1, #), with # being interpreted as 'don't care'. Conditions are compared to messages, and match in the case that, where the condition is 0 or 1, the message contains the same value. If there is more than one condition in a rule, each must match some message (probably a different message for each condition) for the rule to be considered matching. A condition may be preceded with a negate sign ("-"), which will cause the condition to be true if there are no messages on the message list that match the remainder of the condition.

The action consists of a fixed length string from the alphabet (0, 1, #) , with # being interpreted as 'pass-through'. When a match is found, a new message is posted to the new message list. The content of the message is specified by the action, with the content of the new message matching the action whenever the action is 0 or 1. Where the action contains #, the corresponding letter of the matching message is used.
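The matching and pass-through rules above can be sketched in a few lines. The function names are hypothetical, and the sketch assumes that conditions, actions, and messages all have the same length.

```python
def matches(condition: str, message: str) -> bool:
    """True if every letter of the condition that is not '#' equals the message letter."""
    return len(condition) == len(message) and all(
        c == "#" or c == m for c, m in zip(condition, message)
    )

def condition_holds(condition: str, messages) -> bool:
    """Evaluate one condition against the message list; a leading '-' negates it."""
    if condition.startswith("-"):
        return not any(matches(condition[1:], m) for m in messages)
    return any(matches(condition, m) for m in messages)

def apply_action(action: str, matched: str) -> str:
    """Build the new message; '#' in the action passes the matched letter through."""
    return "".join(m if a == "#" else a for a, m in zip(action, matched))

print(matches("000#11#", "0000110"))          # True
print(condition_holds("-0100001", ["0000110"]))  # True: no such message is present
print(apply_action("11##000", "0101101"))     # 1101000
```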

Since the entire message list is replaced by the new message list at each cycle, special steps must be taken to regenerate any messages that need to remain, as they would otherwise be lost.

Certain messages are sent to the effectors and are translated to actions there. Only those messages with certain predefined patterns in predefined tag fields are accepted by the effectors. The effectors make whatever translation is necessary to cause actions in the outside world. For example, a certain message could cause a new window of a particular type to be created at a specified place on the screen. Output messages are not just sent to the effectors; they stay on the message list until replaced in the next cycle and may be matched by conditions in other rules just like non-effector messages.

The rules need not be all at one semantic level. Rules at higher levels might recognize a certain situation from the input, and post a message stating what the current goal of the system is. This message would be posted every cycle until the goal was achieved, or the situation changed enough that it was no longer relevant. The presence of this message would cause other rules that would help accomplish this goal to become active. Intermediate level goals are handled in precisely the same fashion.

Now let us consider a simple example. This example will have messages and conditions that are 7 letters long. The detector has 5 bits of input: bit 0: (least significant bit) mouse in window A; bit 1: mouse in window B; bit 2: mouse down; bit 3: window A open, and bit 4: window B open. The mouse button is assumed to be debounced, so that only one 'down' message is posted per mouse click. Bits 5 and 6 of all detector messages will be '00' to indicate that this message came from the input detectors.

The effectors have 4 bits of output: bit 0: open window A; bit 1: close window A; bit 2: open window B; and bit 3: close window B. Bit 4 is not used and is ignored. Bits 5 and 6 of messages sent to the effectors will be '11' to indicate that this message is intended for output.

The system includes the following classifiers with two conditions each:

(a) '000#11#', '#######' / '1100100' which translates to "if mouse down, and mouse in window B, and window B closed, then send 'open window B'".

(b) '001#11#', '#######' / '1101000' which translates to "if mouse down, and mouse in window B, and window B open, then send 'close window B'".

The action of this classifier system will be simply to toggle window B between the open and closed states every time the mouse is clicked inside the window. Note that the position of the window may change at any time; the input detectors must take care of finding it.
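As a rough check of this example, the detector encoding and the two classifiers can be sketched as below. The helper names `detector_message` and `respond` are hypothetical. The second condition '#######' of each rule is left out because it matches any message, and bit 0 is taken to be the rightmost letter of the string, as the bit numbering in the text implies.

```python
def detector_message(mouse_in_a, mouse_in_b, mouse_down, a_open, b_open):
    """Encode the five detector bits; bits 5 and 6 stay 0 to mark an input message."""
    value = (
        (mouse_in_a << 0) | (mouse_in_b << 1) | (mouse_down << 2)
        | (a_open << 3) | (b_open << 4)
    )
    return format(value, "07b")   # leftmost letter is bit 6, rightmost is bit 0

def matches(condition, message):
    return all(c == "#" or c == m for c, m in zip(condition, message))

RULES = [
    ("000#11#", "1100100"),   # (a) click in closed window B: open window B
    ("001#11#", "1101000"),   # (b) click in open window B: close window B
]

def respond(message):
    """Return the effector messages posted by the rules that match `message`."""
    return [action for condition, action in RULES if matches(condition, message)]

click_closed = detector_message(False, True, True, False, False)
click_open = detector_message(False, True, True, False, True)
print(click_closed, respond(click_closed))   # 0000110 ['1100100']
print(click_open, respond(click_open))       # 0010110 ['1101000']
```

A click inside window A produces a message that neither rule matches, so no effector message is posted.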

Now, let us consider a slightly more complex example. The input detectors have only 3 bits of input: bits 0, 1, and 2 as in the previous example. Bits 3 and 4 will always be zero. Note that the detectors that read whether a window is open or not are not present. The effectors are the same. The classifier system must remember whether the windows are open or closed as the detector no longer supplies that information.

This example includes the following classifiers:

(a) '00##11#','-0100001' / '1100100' A literal translation of this is "if a detector message that has '11' in bits 1 and 2 is present, and if there is not a message that matches '0100001', then output to the effectors '0100'". A free translation is "if the mouse is down in window B, and there is not a message that says that window B is open, then open window B".

(b) '00##11#', '-0100001' / '0100001' Note that the condition part is the same as that of the previous classifier, so this classifier will operate at precisely the same time as the previous. A free translation is "If the mouse is down in window B, and window B is not open, then post a message that says that it is now open." This message, as all messages do, remains on the message list for only one cycle.

(c) '01000##', '#######' / '#######' A literal translation is "if there is a message with '01000' in bits 2..6, and there is any message, then post a message identical to the first". Making a free translation yields "If any window is open, remember that fact". Note that this is necessary because all messages are replaced at each cycle.

This classifier system will open window B the first time that the mouse is clicked inside window B, and will remember that fact forever.
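The behaviour described here can be traced with a small simulation of the message list. This is only a sketch: the function `step` is hypothetical, it fires a rule once for each message matching its first condition, and it assumes that '#' in an action copies letters from the message that matched the first condition, as the literal translation of (c) suggests.

```python
def matches(condition, message):
    return all(c == "#" or c == m for c, m in zip(condition, message))

def step(rules, messages):
    """One cycle: fire every rule whose conditions hold and return the new message list."""
    new_messages = []
    for first, second, action in rules:
        hits = [m for m in messages if matches(first, m)]
        if second.startswith("-"):
            second_ok = not any(matches(second[1:], m) for m in messages)
        else:
            second_ok = any(matches(second, m) for m in messages)
        if not second_ok:
            continue
        for hit in hits:
            posted = "".join(h if a == "#" else a for a, h in zip(action, hit))
            if posted not in new_messages:
                new_messages.append(posted)
    return new_messages

RULES = [
    ("00##11#", "-0100001", "1100100"),   # (a) open window B
    ("00##11#", "-0100001", "0100001"),   # (b) note that window B is open
    ("01000##", "#######", "#######"),    # (c) keep the note alive
]

messages, trace = [], []
# Detector inputs: click in B, mouse idle in B, click in B again.
for detector in ["0000110", "0000010", "0000110"]:
    messages = step(RULES, messages + [detector])
    trace.append(messages)
print(trace)
# [['1100100', '0100001'], ['0100001'], ['0100001']]
```

Only the first cycle produces an effector message (one beginning with '11'); after that, the memory message '0100001' blocks rules (a) and (b) and is re-posted by rule (c) every cycle.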

The extension of this example to closing windows involves modifications to the control structure. In the modified control structure, the inputs are allowed to post messages only every other cycle. This is necessary to avoid a race condition.

The following classifier is added to the classifier list:

(d) '00##11#','0100001' / '1101000' A free translation is "If the mouse is down in window B, and window B is open, close it".

Classifier (c) is replaced with the following:

(c) '0100001','-11#1###' / '#######' A free translation is "If there is a message that says that the state of window B is open, and there is not a message directing the effectors to close window B, remember that it is open."

The effector message will be generated only on one cycle, and will not persist, so the classifier system must arrange on that cycle to forget that window B is open. On all other cycles, if the message saying that the window is open is there, it will be remembered.

The action of this classifier system is the same as that of the first example: namely that the state of window B will be toggled between open and closed every time the mouse is clicked inside that window. Now the classifier system is able to remember the current state of window B for itself, rather than requiring the input detectors to keep track of that.

Classifier systems are just rule based production systems, rather simple ones at that. Without some real advantage over other production systems, there would be no reason to bother with them. The advantages come in when adding mechanisms to manipulate the rules; rules of this form are easier to manipulate than LISP predicates or other production rule representations. Adding mechanisms to manipulate the rules turns a simple classifier system into a learning classifier system.

C.3. Bucket Brigade Classifier Systems

A bucket brigade classifier system is composed of a classifier system with the bucket brigade learning mechanism. This learning mechanism takes a reward function as feedback from the external world for each output; positive for desirable outputs, negative for undesirable outputs, zero for others.

The Bucket Brigade algorithm is a mechanism for changing the probability that a rule will be used when its conditions are matched by messages. In this system, each classifier has a strength, which affects the probability of that classifier being executed. A matched classifier bids to be executed, with the bid being the product of a bid constant less than 1 (typically 1/8), its strength, its specificity (proportion of letters that are not "don't care"), and its support (the sum of the strengths of the rules that posted messages that match this rule). The highest bids are stochastically accepted, and the winning classifiers are allowed to post messages.

The bid constant is used to ensure that classifiers do not squander their entire strength in a few unsuccessful bids. The strength is used as a predictor of ability to predict correctly. High-strength classifiers are those that have been successful at predicting good outcomes in the past. The specificity is used so that classifiers that are highly specific to the particular situation are favored over those that are more general, but have the same strength and support. The specificity increases as the number of "don't care" specifiers decreases. Including this parameter allows default hierarchies to form, as rules that are highly specific to the situation will bid more in relation to the more general rules of the same strength and support. Support is another measure of relevance. Rules that match messages posted by rules that bid a lot tend to be very relevant to the current situation. Support is passed from rule to rule by messages.

If a rule's strength is a measure of its ability to predict correctly, there must be some method of modifying the strength. When a rule makes or supports a correct output, its strength must be increased. Otherwise, if the prediction is incorrect or not rated, the strength must decrease. This is accomplished by passing strength from rule to rule by way of messages. A classifier's strength is reduced by the bid amount when it wins and posts a message. This strength is redistributed to the classifiers that posted the messages that this classifier responded to. So, when a rule posts a message that is consumed by another rule, its strength is to some extent replenished. Rules that post messages that are not consumed by some other rule or by the output reward function lose strength.

The ultimate source of strength to replenish the classifiers is the reward function connected to the effectors. The classifier that produces an effector message that results in a positive reward function has its strength increased by the reward. The next time that the same situation arises, rules that feed the final classifier are rewarded by that final classifier's bid. In this way, strength is eventually redistributed to all the classifiers that help produce useful outcomes (positive reward functions). Classifiers that post messages that do not lead to positive rewards are eventually reduced in strength to the level where they no longer win bids and become inactive.

An example of a classifier system using the bucket brigade is useful here. Consider a classifier system that has the same inputs and outputs as the first example, above, and that uses the following rules:

(a) '00#01#1', '#######' / '1100001', strength 70; which translates to "If the mouse is down in window A and window A is closed, open window A."

(b) '00#011#', '#######' / '1100001', strength 80; which translates to "If the mouse is down in window B and window A is closed, open window A."

(c) '000#1#1', '#######' / '1100100', strength 90; which translates to "If the mouse is down in window A and window B is closed, open window B."

(d) '000#11#', '#######' / '1100100', strength 100; which translates to "If the mouse is down in window B and window B is closed, open window B."

With the bid constant set at .1, and support for input messages arbitrarily set at 5, the bids for the input message '0000110' ("mouse is down inside window B, and both A and B are closed") will be (a) no bid, (b) 28, (c) no bid, (d) 35. Classifier (d) will probably win, posting its message. The strength of d will be reduced by 35 to 65. The output message, "open window B" is the correct output, so the classifier that posted the message will be rewarded, and have its strength increased by 100 to 165. So, when the classifier system produces the correct response, it is rewarded, and the chance of it producing the correct response again for the same input is increased.

Now, if the classifier system is given the input '0000101' ("mouse is down inside window A, and both A and B are closed"), the bids will be (a) 25, (b) no bid, (c) 32, and (d) no bid. Rule (c) will probably win, posting its message. Its strength will be reduced by 32 to 58. The output message "open window B" is not the correct output, so no reward will be given. The next time that the same input message is posted, classifier (a) will probably win, since its strength is now higher than that of classifier (c). If it does win, it will be rewarded for producing the correct output, and its strength will increase.
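The bids in these two rounds can be reproduced with the sketch below. Two details are assumptions chosen because they reproduce the numbers quoted above: specificity is computed over the first condition only (5 of 7 letters are not '#'), and bids are truncated to whole numbers. Exact fractions are used to avoid floating-point rounding. The sketch also simply lets the highest bidder win, whereas the text describes the choice as stochastic.

```python
from fractions import Fraction

BID_CONSTANT = Fraction(1, 10)
INPUT_SUPPORT = 5   # support assigned to detector messages in the example

RULES = {           # name: (first condition, initial strength)
    "a": ("00#01#1", 70),
    "b": ("00#011#", 80),
    "c": ("000#1#1", 90),
    "d": ("000#11#", 100),
}

def matches(condition, message):
    return all(c == "#" or c == m for c, m in zip(condition, message))

def specificity(condition):
    """Proportion of letters in the condition that are not "don't care"."""
    return Fraction(sum(c != "#" for c in condition), len(condition))

def bid(condition, strength, support=INPUT_SUPPORT):
    # int() truncates toward zero, matching the whole-number bids in the text.
    return int(BID_CONSTANT * strength * specificity(condition) * support)

def bids_for(message):
    return {
        name: bid(cond, strength)
        for name, (cond, strength) in RULES.items()
        if matches(cond, message)
    }

first = bids_for("0000110")    # mouse down in window B, both windows closed
second = bids_for("0000101")   # mouse down in window A, both windows closed
print(first)    # {'b': 28, 'd': 35}
print(second)   # {'a': 25, 'c': 32}

# Strength bookkeeping for the two winners described in the text.
strength_d = 100 - first["d"] + 100   # pays its bid, then receives a reward of 100
strength_c = 90 - second["c"]         # pays its bid, receives no reward
print(strength_d, strength_c)         # 165 58
```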

Classifier (c) will probably not win any more bids, so its strength will not decrease further. It is still available in case the environment changes, and the response to classifier (a) becomes incorrect. After a few trials where (a) produces an incorrect response, and has its strength reduced, (c) will begin to win bids again. If its response is now correct, its strength will increase. Note that if the system were given a negative reward for producing the incorrect output, the result in this case would have been the same, but the strength of classifier (c) would be reduced by more than just the amount it lost making its bid.

Now that we have examined simple classifier systems, and those using the bucket brigade, it's time to combine those with Genetic Algorithms. This combination is called a Learning Classifier System.

C.4. Learning Classifier Systems

Recall that genetic algorithms are a way of finding maxima in function values. Now consider the performance of a bucket brigade classifier system. As the system performs better, it will be rewarded more often. The strengths of the rules that cause it to perform better will increase. This increase can be considered as the function to optimize.

If the rules can be changed, there is a chance that the system of rules will work together better, and produce a higher total reward. This is what we are trying to accomplish by applying the genetic algorithm. The GA changes the population of rules by combining characteristics of the most fit individuals, the higher strength rules.

This requires some small changes in the mechanism of the Genetic Algorithm. There is no longer a requirement that strings be translated into probe values and that the function be evaluated. The strength can be used directly as the function value.

The example problems given above are too simple to benefit from learning new classifiers, but provide material for the following examples of application of the various genetic operators. Consider the classifier '000#11#', '#######' / '1100100', which translates to "If the mouse is down in window B and window B is closed, open window B." If the mutation operator changes the fourth letter, the result might be '000111#', '#######' / '1100100', which translates to "If the mouse is down in window B, and window B is closed and window A is open, open window B". This is a specialization of the more general parent rule. In this case, it is probably not a useful specialization, and will probably not last long in the competition. Note that the parent rule is not replaced by the offspring. Offspring replace low-strength rules which have not fared well in the competition.

Crossover between '000#11#', '#######' / '1100100' and '00#01#1', '#######' / '1100001' between the third and fourth letters gives the following offspring: '000 01#1', '#######' / '1100001' and '00# #11#', '#######' / '1100100' which translate to "If the mouse is down in window A and both A and B are closed, then open window A" and "If the mouse is down in window B, then open window B" respectively. These are plausible rules in the current context, but again will probably not survive long in the competition.
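The mutation and crossover examples above can be checked with a short sketch. The function names are hypothetical, and treating a classifier as one string made by joining its two conditions and its action is an assumption that happens to reproduce the offspring given in the text.

```python
def mutate_letter(condition: str, position: int, new_letter: str) -> str:
    """Replace the letter at a zero-based position with a letter from {0, 1, #}."""
    if new_letter not in "01#":
        raise ValueError("letter must be one of 0, 1, #")
    return condition[:position] + new_letter + condition[position + 1:]

def crossover_classifiers(parent_a, parent_b, point):
    """Cross two (condition, condition, action) triples after `point` letters."""
    a, b = "".join(parent_a), "".join(parent_b)
    width = len(parent_a[0])   # all parts are assumed to have the same length

    def split(joined):
        return tuple(joined[i:i + width] for i in range(0, len(joined), width))

    return split(a[:point] + b[point:]), split(b[:point] + a[point:])

# Mutating the fourth letter of the condition from '#' to '1'.
print(mutate_letter("000#11#", 3, "1"))   # 000111#

parent_a = ("000#11#", "#######", "1100100")
parent_b = ("00#01#1", "#######", "1100001")
child_a, child_b = crossover_classifiers(parent_a, parent_b, 3)
print(child_a)   # ('00001#1', '#######', '1100001')
print(child_b)   # ('00##11#', '#######', '1100100')
```

Because the crossover point falls inside the first condition, each offspring inherits the rest of the rule, including the action, from the other parent.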

Consider the world outside the LCS as a search space, with the reward function being the function to be optimized. The search space can be very complex, multi-dimensional, and time varying. Searching this space can be done with many different techniques, with greater or lesser success. Much of AI consists of techniques for searching such spaces. Some search strategies (e.g. hill-climbing) are subject to being fooled by local maxima. Others are not computationally tractable for useful problems. LCSs using genetic algorithms search the space of possible solutions quickly, without many of the problems of other methods. A more complete analysis may be found in Holland 1986.

This website's first version is by Ryan Knowles and is maintained by Pat McGee.