Abstract

This page summarizes the research conducted during the development of Ascension's genetic algorithm and also serves as the user's guide. The genetic algorithm is briefly introduced and the problem of premature convergence is outlined. Many existing and novel techniques are then examined and the results of a large-scale experimental study are presented.

Contents

Introduction
The role of diversity
Examined techniques
Experimental setup
Experimental results
Conclusion
References
API

Introduction

A genetic algorithm is inspired by the process of evolution. It maintains a population of solutions and relies on the principles of heredity, genetic variability and natural selection to improve the solutions over time. New solutions are created by means of sexual reproduction and replace the existing ones. There are two principal forms of GA: generational, where the entire population is replaced in each iteration, and steady-state, where solutions are created and replaced one at a time.

Reproduction consists of two steps: a crossover, which combines two parent solutions into a new one, and a mutation, which randomly modifies the result. It is important to select crossover and mutation operators appropriate for the problem, since they largely determine the algorithm's performance. Some common crossover operators can be found in the Crossovers.pas module.

Ascension employs the steady-state version, as it is far more flexible. It is especially good for experimentation, as various parts of the algorithm can be modified independently without having to rewrite the whole thing. A steady-state iteration consists of the following steps:

  1. Selection: select two individuals from the population.
  2. Reproduction: create a new solution by performing a crossover between the selected solutions and then applying the mutation operator.
  3. Replacement: select a solution to be replaced by the newly created one and decide whether to actually perform the replacement.
Various options are possible during the selection and replacement steps, but the baseline steady-state algorithm is very simple, with a single iteration looking like this:
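
The original listing is not reproduced here; a minimal sketch of such an iteration, built on the Crossover and Mutate routines from the API section, could look as follows. The Population array, the Score function and the exact acceptance rule are assumptions of this sketch rather than Ascension's actual interface.

// Sketch of one baseline steady-state iteration: uniform parent selection,
// crossover plus mutation, worst-parent replacement with elitist acceptance.
// Score is a hypothetical routine returning the value to be minimized.
procedure BaselineIteration(var Population: array of TSolution);
var
   i, j, Worse :  Integer;
   Child       :  TSolution;
begin
   // Selection: two distinct parents picked uniformly at random
   i := Random(Length(Population));
   repeat
      j := Random(Length(Population));
   until j <> i;

   // Reproduction: crossover followed by mutation
   Crossover(Child, Population[i], Population[j], True);
   Mutate(Child);

   // Replacement: the worse parent is replaced if the child is no worse
   if Score(Population[i]) >= Score(Population[j]) then
      Worse := i
   else
      Worse := j;
   if Score(Child) <= Score(Population[Worse]) then
      Population[Worse] := Child;
end;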

The role of diversity

Most basic GA examples found in textbooks and tutorials use selection and replacement strategies that ensure the reproduction and survival of the fittest solutions alone. This strategy may seem natural, but in the long term it leads to poor performance. Good solutions proliferate exponentially, quickly taking over the entire population. The algorithm has no time for exploration, converging to a single local optimum early on. Maintaining a diverse population is therefore vital for solving complex multimodal problems. Diversity is a precious resource that should not be squandered. But simply maintaining it is not sufficient: it acts like potential energy and must be "converted" into good solutions.

It is quite easy to modify the GA in order to achieve near-perfect diversity preservation. But strategies that focus on diversity alone also tend to perform poorly, as they sacrifice depth for breadth. It appears that a good GA variant must satisfy contradictory requirements. The observation that such contradictions arise in many engineering problems lies at the heart of TRIZ (ТРИЗ, Теория Решения Изобретательских Задач, Theory of Inventive Problem Solving). Although TRIZ was developed for engineering problems specifically, its key idea is very much relevant to the design of efficient metaheuristics: an inventive solution does not compromise the conflicting objectives, but resolves the contradiction dialectically, on a deeper level. Simulated annealing is a great example of an inventive solution: instead of relying on a fixed exploration-exploitation tradeoff, it achieves the conflicting objectives by gradually changing the nature of the search. Tabu search also benefits from this strategy.

TRIZ sheds new light on the GA and makes it possible to construct diversity preservation methods in a systematic manner. A general principle that can be used to resolve the exploration-exploitation contradiction is the division of labor: divide the search into parts specializing in each objective. A variety of GA variants can be constructed by choosing different exploration methods, division types and part interactions. The division can be temporal, spatial or rank-based, further abbreviated as TD, SD and RD. Curiously, the SD and TD types even have corresponding TRIZ principles: "3. Local quality" and "15. Dynamics".

Examined techniques

The behavior of many GA variants can be analysed using a very simple model. Suppose that the population consists of two solution types, A and B, B being the better one. Crossover produces an offspring having the type of a random parent and mutation does nothing. This model represents a problem with two local optima and a mutation that can only make small changes, but cannot change the solution's basin of attraction. An important quantity in this model is the takeover time: the mean time until a single type B solution takes over the entire population. The takeover time quantifies the GA's exploration ability. Approximate solutions to this model were obtained by ignoring finite population size effects (genetic drift), which leads to differential equations describing the GA dynamics. Due to the nature of this approximation, the takeover time was defined as the time between B(t) = 1 and A(t) = 1.

Definitions
N: population size
A: number of solutions of type A
B: number of solutions of type B
ΔB: expected change of B after a single steady-state iteration
α: A / N
β: B / N
τ: takeover time in generations
r: normalized rank, 0 corresponding to the best and 1 to the worst solution
t: normalized time, 0 corresponding to the start of the algorithm and 1 to the maximum number of fitness function evaluations being reached
P: a selection or replacement parameter
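
The model is simple enough to check by direct simulation. The sketch below estimates τ for a baseline-like scheme (uniform parent selection, worst-parent replacement, elitist acceptance); these rules are an assumption of the sketch, and a different replacement scheme would give different numbers.

program TakeoverModel;
// Monte Carlo estimate of the takeover time of the two-type model, starting
// from a single type B individual and running until B fills the population.
const
   N     = 100;               // population size
   NRuns = 1000;              // number of simulated runs
var
   Pop         :  array [0 .. N - 1] of Boolean;   // True = type B (the better type)
   Run, i, j, k,
   B, Iters    :  Integer;
   MeanTau     :  Double;
   Child       :  Boolean;
begin
   Randomize;
   MeanTau := 0;
   for Run := 1 to NRuns do
   begin
      for i := 0 to N - 1 do
         Pop[i] := False;
      Pop[0] := True;
      B     := 1;
      Iters := 0;
      while B < N do
      begin
         // select two distinct parents uniformly at random
         i := Random(N);
         repeat
            j := Random(N);
         until j <> i;
         // crossover: the child gets the type of a random parent
         if Random(2) = 0 then
            Child := Pop[i]
         else
            Child := Pop[j];
         // worst-parent replacement: replace the type A parent
         // (when both parents have the same type the choice does not matter)
         if Pop[i] = Pop[j] then
            k := i
         else if Pop[i] then
            k := j
         else
            k := i;
         // elitist acceptance: with this scheme B can only grow
         if Child and not Pop[k] then
         begin
            Pop[k] := True;
            Inc(B);
         end;
         Inc(Iters);
      end;
      MeanTau := MeanTau + Iters / N;   // N iterations = one generation
   end;
   WriteLn('Estimated takeover time: ', MeanTau / NRuns : 0 : 1, ' generations');
end.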

Score-based replacement

Score-based selection

Distance-based selection

Distance-based techniques rely on a distance operator that quantifies the degree of dissimilarity between two solutions. In fact, it only needs to satisfy two of the three properties of a true distance, identity and symmetry, but not necessarily the triangle inequality. Calculating the distance can be as expensive as evaluating the fitness function. For this reason, many well-known techniques that require numerous distance calculations, such as niching and clearing, were not implemented.
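
For concreteness, here is what a Distance implementation could look like for a plain bit-string representation. The actual TSolution type is defined by the problem, so the declarations below are assumptions of this sketch.

// Hamming distance for a hypothetical bit-string solution type. It satisfies
// identity and symmetry (and, incidentally, the triangle inequality too).
type
   TSolution         =  array [0 .. 63] of Boolean;
   TSolutionDistance =  Integer;

function Distance(
   const Solution1,
         Solution2   :  TSolution
         )           :  TSolutionDistance;
var
   i  :  Integer;
begin
   Result := 0;
   for i := Low(Solution1) to High(Solution1) do
      if Solution1[i] <> Solution2[i] then
         Inc(Result);
end;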

Distance-based replacement

A child's contribution to diversity (henceforth called originality) can be measured in two different ways: by its distance either to the most similar parent or to the best individual in the population. All presented distance-based replacement methods support both.
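
In code, the two measures could look as follows (a sketch built on the Distance routine from the API; Best denotes the best individual currently in the population):

// Originality as the distance to the most similar parent ...
function OriginalityToParents(
   const Child, Parent1, Parent2 :  TSolution
         )                       :  TSolutionDistance;
var
   D2 :  TSolutionDistance;
begin
   Result := Distance(Child, Parent1);
   D2     := Distance(Child, Parent2);
   if D2 < Result then
      Result := D2;
end;

// ... and as the distance to the best individual in the population
function OriginalityToBest(
   const Child, Best :  TSolution
         )           :  TSolutionDistance;
begin
   Result := Distance(Child, Best);
end;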

It is possible to transition between exploration and exploitation smoothly. Suppose that the probability of using an exploitation strategy is given by t^q. Then the effective exploitation time is ∫ t^q dt = 1 / (1 + q). The number of exploitation events per generation follows a binomial distribution with mean N t^q. Near t = 0 it can be approximated by a Poisson distribution, which gives the probability of an exploitation-free generation Exp(-N t^q). The effective exploration time can then be defined as ∫ Exp(-N t^q) dt ≈ N^(-1/q) (1/q)! ≈ N^(-1/q). A good value of q can be obtained by maximizing the product of the exploration and exploitation times, since both are important, which gives q ≈ Ln(N) + 1. This estimate is used many times in this work, except that for the rank-based methods the value of q is sometimes reduced by 1, since the worst parent replacement introduces an additional rank bias.
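
For reference, the calculation behind this estimate can be written out as follows (a sketch under the same approximations as above):

\[
T_{\mathrm{exploit}} = \int_0^1 t^q\,dt = \frac{1}{1+q},
\qquad
T_{\mathrm{explore}} = \int_0^1 e^{-N t^q}\,dt
 \approx N^{-1/q}\,\Gamma\!\left(1+\tfrac{1}{q}\right)
 \approx N^{-1/q}.
\]

Maximizing the product \(T_{\mathrm{explore}}\,T_{\mathrm{exploit}}\) is equivalent to maximizing its logarithm:

\[
\frac{d}{dq}\left[-\frac{\ln N}{q} - \ln(1+q)\right]
 = \frac{\ln N}{q^2} - \frac{1}{1+q} = 0
 \;\Longrightarrow\; q^2 = (1+q)\ln N
 \;\Longrightarrow\; q \approx \ln N + 1 .
\]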

Topological selection

Methods from this group impose a certain structure upon the population in order to increase the takeover time. This structure (topology) is defined by a graph whose vertices correspond to the individuals. Selection is performed by picking a random vertex and then picking a random edge originating from that vertex. The graph can be weighted, in which case the edge selection probabilities are proportional to their weights. Cellular genetic algorithms [3] belong to this category, with the graph being a regular grid. Calculating the takeover times is generally hard, but they can be bounded by the graph diameter. Asymptotics marked with "?" are educated guesses. All the topologies under consideration have a well-defined structure, which makes an explicit graph representation unnecessary. Instead, the selection process can be defined in terms of vertex indices ranging from 0 to N - 1.
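
As an example of such index-based selection, the ring topology needs only the following (a sketch; Ascension's actual routine and neighbourhood definition may differ):

// Selection on a ring of N vertices: the first parent is a uniformly random
// vertex, the second one of its two neighbours. No explicit graph is stored.
procedure SelectOnRing(
         N        :  Integer;    // population size
   var   Parent1,
         Parent2  :  Integer);   // indices in 0 .. N - 1
begin
   Parent1 := Random(N);
   if Random(2) = 0 then
      Parent2 := (Parent1 + 1) mod N
   else
      Parent2 := (Parent1 + N - 1) mod N;
end;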

Another popular group of techniques, also topological in nature, is the island genetic algorithms [4]. The population is partitioned into a number of weakly interacting subpopulations ("islands"), which increases the takeover time. The traditional island interaction method is the exchange of individuals between islands at regular time intervals ("migration"). Ascension features a number of island-like methods whose island interactions are defined implicitly, through the population topology alone. The subpopulations can have a ring or a complete graph topology, called simply "isles" in the latter case. The connections between them can also form a ring or a complete graph, with several options in case of a ring:

This results in eight possible topologies. Two of them (bridged rings and chained rings) are not sufficiently distinct from the ring topology (effectively two rings intertwined, τ = O(N)) and thus were not implemented. The number of subpopulations was set to √N in all cases (N assumed to be a square), a middle ground most distinct from both extremes. This choice also makes the island topologies directly comparable to the torus, which can be viewed as √N linked subpopulations.

Another way to create an island structure is to have a small chance of randomly selecting the parents from distinct subpopulations. This probability was set to 1 / √N so that there is one such event per subpopulation per generation on average. The subpopulations can be selected at random or form a ring. Combined with two possible topologies of the subpopulations themselves, this gives four more options:

Yet another option is to use completely disconnected islands and gradually decrease their number until a single population is reached. In the current implementation the number of islands is initially set to √N and then decreased linearly.

Population topologies

Injection of new solutions

Injection of new solutions into the population can serve two purposes: it provides new genetic material and it creates fitness "holes" that can then be occupied by offspring less fit than the rest of the population.

Non-elitist acceptance

Techniques from this category may accept the child even if it is worse than the solution being replaced, which should prevent the population from getting stuck in a local optimum.

Hybridization with simulated annealing has been reported in the literature. However, this approach requires at least two extra parameters, the initial and the final annealing temperatures. The former can be selected automatically before running the algorithm, but the latter cannot. It is possible to achieve a similar effect without the additional parameters by using quantiles of the population fitness as the acceptance threshold:
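
The exact rule is not reproduced here; purely as an illustration of the idea, an acceptance test driven by a population fitness quantile could look like the sketch below, where P is the method's parameter, scores are to be minimized and TScore is a placeholder type.

// Illustrative quantile-based acceptance: the child is accepted if it is no
// worse than the solution at quantile P of the current population. P = 0
// means the child must match the current best, P = 1 that it only has to
// beat (or tie) the current worst.
type
   TScore = Double;   // placeholder score type

function AcceptByQuantile(
         ChildScore  :  TScore;
   const Scores      :  array of TScore;   // population scores sorted ascending
         P           :  Double
         )           :  Boolean;
begin
   Result := ChildScore <= Scores[ Round(P * High(Scores)) ];
end;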

Another possibility is to take the originality into account:

Experimental setup

Comparing multiple GA configurations is not a trivial task. It is quite easy to outperform a baseline GA if the population size is kept constant across the tests, but such a comparison would be flawed: the diversity deficit of the faster converging configurations could be compensated by increasing the population size. Given a fixed computational budget, each configuration has its own optimal population size, and the better its diversity preservation, the smaller that size. Thompson sampling was therefore employed to select the optimal value in each case without a huge computational cost. The tested values are a subset of squares (for topological selection compatibility), with the ratio of two subsequent values never exceeding 4 / 3: 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 225, 289, 361, 441, 576, 729, 961, 1225, 1600, 2116, 2809, 3721, 4900, 6400, 8464. Each configuration was normally tested until the sample size reached 1000, although the underperforming ones were shut down earlier. A number of multimodal problems that could in principle benefit from diversity preservation were selected for testing.
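
A sketch of how such size selection can work is given below. It assumes a Gaussian reward model and a stubbed RunGA routine, neither of which is taken from the study itself; the actual procedure may differ in both respects.

program SizeTuning;
// Thompson sampling over candidate population sizes: each size is a bandit
// arm; a mean result is sampled from each arm's (Gaussian) posterior and the
// arm with the best sample is run next.
uses
   Math;
const
   NSizes = 5;
   Sizes  :  array [0 .. NSizes - 1] of Integer = (25, 36, 49, 64, 81);
   NTests = 1000;

// Placeholder: in reality this would run Ascension with the given population
// size and return the result to be maximized.
function RunGA(PopSize: Integer): Double;
begin
   Result := Random - 1 / PopSize;
end;

var
   Mean, Vrnc  :  array [0 .. NSizes - 1] of Double;
   Count       :  array [0 .. NSizes - 1] of Integer;
   Test, i, Arm:  Integer;
   Draw, Best,
   x, d        :  Double;
begin
   Randomize;
   for i := 0 to NSizes - 1 do
   begin
      Mean[i] := 0;  Vrnc[i] := 1;  Count[i] := 0;
   end;
   for Test := 1 to NTests do
   begin
      // draw a plausible mean result for every size, pick the most promising
      Arm := 0;  Best := -1e300;
      for i := 0 to NSizes - 1 do
      begin
         Draw := RandG(Mean[i], Sqrt(Vrnc[i] / (Count[i] + 1)));
         if Draw > Best then
         begin
            Best := Draw;  Arm := i;
         end;
      end;
      // run the chosen configuration and update its running statistics
      x := RunGA(Sizes[Arm]);
      Inc(Count[Arm]);
      d         := x - Mean[Arm];
      Mean[Arm] := Mean[Arm] + d / Count[Arm];
      Vrnc[Arm] := Vrnc[Arm] + (Sqr(d) - Vrnc[Arm]) / Count[Arm];  // crude running variance
   end;
   for i := 0 to NSizes - 1 do
      WriteLn('Size ', Sizes[i], ': ', Count[i], ' runs');
end.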

Optima grid

A 2D test function similar to Rastrigin's. It consists of a square grid of local optima plus a global quadratic term (τ here denotes the circle constant 2π and F the grid frequency).
cx = Cos(τ F x)
cy = Cos(τ F y)
f(x, y) = x² + y² + √3 - cx - cy - cx cy

Despite being multimodal, this function does not benefit from diversity preservation. Separability is likely the reason: the global minimum can be found by optimizing it along the x and y axes independently, which greatly simplifies the optimization.

Rotated optima grid

Same as the optima grid, but the coordinates are first rotated by the arctangent of the golden ratio. This transformation destroys the separability and makes this function much more difficult.

Double optima grid

A sum of two optima grids, one rotated by the arctangent of the golden ratio and the other unrotated. Combined with the lack of the global term, this results in a complicated moiré-like pattern. The parameters are identical to those of the previous two problems.

Double optima grid test function

Fractal

A 2D fractal landscape given by the superposition of rotated cosine waves with exponentially increasing frequencies and exponentially decaying amplitudes:
f(x, y) = ( (kA - 1) / kA ) ∑i kA^(-i) ( 1 - Cos( τ kF^i ( x Cos(i θ) + y Sin(i θ) ) ) ),
where θ equals τ divided by the golden ratio and i runs from 0 to ∞.

The fractal nature of the landscape results in a challenging test function. The difficulty can be controlled by adjusting the fractal dimension. The domain was intentionally shifted to avoid the global optimum being located in the center. This was done to negate the effect of any potential bias of the crossover towards the center.

Fractal test function
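
Below is a sketch of this landscape with the infinite sum truncated once the amplitude factor becomes negligible; the kA and kF values are placeholders, not the parameters used in the study.

// Fractal landscape: superposition of rotated cosine waves with frequencies
// growing by kF and amplitudes decaying by kA per level.
function FractalLandscape(x, y: Double): Double;
const
   Tau = 6.2831853071795865;   // circle constant
   Phi = 1.6180339887498949;   // golden ratio
   kA  = 2.0;                  // placeholder amplitude decay factor
   kF  = 2.0;                  // placeholder frequency growth factor
   Eps = 1e-12;                // truncation threshold for the amplitude
var
   Theta, Amp, Freq  :  Double;
   i                 :  Integer;
begin
   Theta  := Tau / Phi;
   Result := 0;
   Amp    := 1;
   Freq   := 1;
   i      := 0;
   while Amp > Eps do
   begin
      Result := Result + Amp * (1 - Cos(Tau * Freq * (x * Cos(i * Theta) + y * Sin(i * Theta))));
      Amp    := Amp / kA;
      Freq   := Freq * kF;
      Inc(i);
   end;
   Result := Result * (kA - 1) / kA;
end;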

Funnel

An important feature of many real problems is the so-called globally convex (also called funnel or big valley) fitness landscape: a multitude of local optima, with the global one located in the "center" (the distance to the global optimum is negatively correlated with the quality of the local optima). This test function was designed to imitate such a structure. It is the sum of multiple 1D functions with two local optima:
f(x) = ∑i ( √( 1 - (2 ||xi| - 1/2|)² ) + |xi| / n ),
where n is the problem size.

CoreWar

CoreWar is a programming game in which programs written in an assembly-like language fight each other in virtual memory. Each program has a number of integer parameters that must be optimized for the best combat efficiency. Due to complex parameter interactions this is usually a difficult problem. This test relies on a precalculated 800 x 800 score surface to avoid the computational cost of emulating the battles. The most challenging surface was chosen from several instances. One instance that had very little parameter interaction was found to be particularly easy, further evidence of separability being a good difficulty indicator.

CoreWar score surface test function

Heilbronn

The problem consists of placing a number of points within a given 2D region so as to maximize the area of the minimal triangle formed by any three points. This problem quickly becomes very difficult as the number of points increases. An anomalous case, with most configurations failing and unconditional acceptance performing the best.

Binary epistasis

Consider the simplest binary minimization problem with interacting variables: f(0, 0) = 0, f(1, 1) = 1, f(0, 1) = f(1, 0) = 2. A more difficult problem can be constructed by concatenating multiple 2-bit blocks and taking the sum of f over them. If the population has converged to the wrong local optimum on a certain block, it takes a long time (quadratic in the problem size) to escape it by a lucky mutation. This is the only test that does not use a fixed computational budget. Instead, the algorithm was executed until the global optimum was found and the number of fitness function evaluations (NFE) was recorded.
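
A direct transcription of this definition (the length of the bit string is assumed to be even):

// Sum of the 2-bit block scores: 00 adds 0, 11 adds 1, 01 and 10 add 2.
// The global optimum is the all-zero string.
function BinaryEpistasis(const Bits: array of Boolean): Integer;
var
   i  :  Integer;
begin
   Result := 0;
   i := 0;
   while i + 1 <= High(Bits) do
   begin
      if Bits[i] <> Bits[i + 1] then
         Inc(Result, 2)         // mixed block: the worst case
      else if Bits[i] then
         Inc(Result);           // 11: the deceptive local optimum
      Inc(i, 2);                // 00 contributes nothing
   end;
end;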

Hierarchy

A hierarchic version of binary epistasis, subject to maximization. The score is calculated by going through consecutive 2-bit blocks and adding S0 in case of 00 and S1 in case of 11. The bit string is then downsampled by replacing 00 with 0, 11 with 1 and the other blocks with a third, "neither" symbol. The block score values are then multiplied by K and exchanged. The process is repeated until there is only one bit left. The K factor ensures that the higher levels have the most impact. The alternation of the block scores makes the problem deceptive: bits optimal at one level are suboptimal at the next one. The impact of diversity preservation on this problem was particularly large.
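
A sketch of this scoring scheme is shown below; the S0, S1 and K values are placeholders, and the string length is assumed to be a power of two.

// Hierarchic score, to be maximized. Each level adds S0 per 00 block and S1
// per 11 block, then the string is downsampled (00 -> 0, 11 -> 1, anything
// else -> "neither"), the block scores are multiplied by K and exchanged,
// and the process repeats until a single symbol remains.
function HierarchyScore(const Bits: array of Integer): Double;   // entries 0 or 1
var
   Level, Next    :  array of Integer;   // 0, 1 or 2 ("neither")
   S0, S1, K, Tmp :  Double;
   i, n           :  Integer;
begin
   S0 := 1;  S1 := 2;  K := 2;           // placeholder parameter values
   SetLength(Level, Length(Bits));
   for i := 0 to High(Bits) do
      Level[i] := Bits[i];
   Result := 0;
   while Length(Level) > 1 do
   begin
      n := Length(Level) div 2;
      SetLength(Next, n);
      for i := 0 to n - 1 do
         if (Level[2 * i] = 0) and (Level[2 * i + 1] = 0) then
         begin
            Result  := Result + S0;
            Next[i] := 0;
         end
         else if (Level[2 * i] = 1) and (Level[2 * i + 1] = 1) then
         begin
            Result  := Result + S1;
            Next[i] := 1;
         end
         else
            Next[i] := 2;                // no score for mixed or "neither" blocks
      // multiply the block scores by K and exchange them
      Tmp := S0;
      S0  := K * S1;
      S1  := K * Tmp;
      Level := Copy(Next, 0, n);
   end;
end;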

Minimal linear arrangement

The problem consists of arranging the integers from 1 to n² on an n × n grid so as to minimize the sum of the absolute differences between the numbers in adjacent squares. The optimal values are given by A195647.
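
The objective is easy to state in code; the row-major layout of the grid below is an assumption of this sketch.

// Sum of absolute differences between horizontally and vertically adjacent
// cells of an n x n grid stored row-major; Grid holds a permutation of 1 .. n².
function ArrangementCost(const Grid: array of Integer; n: Integer): Integer;
var
   i, j  :  Integer;
begin
   Result := 0;
   for i := 0 to n - 1 do
      for j := 0 to n - 1 do
      begin
         if j + 1 < n then
            Inc(Result, Abs(Grid[i * n + j] - Grid[i * n + j + 1]));
         if i + 1 < n then
            Inc(Result, Abs(Grid[i * n + j] - Grid[(i + 1) * n + j]));
      end;
end;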

Experimental results

The results are presented in the form of effect sizes. Specifically, Glass's Δ was used: Δ = ±(μx - μc) / σc, where μx is the mean result of an experimental configuration and μc and σc are the mean and the standard deviation of the baseline algorithm's results. The sign was chosen for each problem so that positive values correspond to better results. This measure was chosen because it preserves the ranking of the algorithms by mean. In the context of the current study, an effect size of 1 should be considered very large: if the baseline results were exponentially distributed starting at the global optimum, an oracle algorithm that always finds this optimum would have Δ = 1. Positive effect sizes with p-values less than 0.01 are highlighted. The "S", "R", "A" and "O" prefixes stand for selection, replacement, acceptance and originality. "RPS" stands for relative population size and is calculated as the geometric mean of the ratio of the optimal population size of an experimental configuration to that of the baseline algorithm.

The rating measures overall performance across all problems and is calculated using an Elo-type system. The score of one configuration against another on a particular problem is defined as the probability of it achieving a better result in a single run. A normal approximation was used: S(A, B) = NCDF(±(μA - μB) / √(σA² + σB²)), where NCDF is the cumulative distribution function of the standard normal distribution. In the Elo model, the expected score is given by the logistic function: ES(A, B) = 1 / (1 + Exp(RB - RA)), where RA and RB are the configuration ratings. The ratings were initialized with zeroes and then iteratively updated using the inverse logistic function: R = Rμ + Ln(Sμ / (1 - Sμ)), where Sμ is the mean score against the other configurations across all problems and Rμ is their mean rating. The results were then normalized by subtracting the rating of the baseline configuration.
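
A sketch of the rating computation under these definitions is given below; the pairwise mean scores S[i, j] are assumed to have been computed beforehand with the NCDF formula above, and the values filled in here are placeholders so that the fragment runs.

program Ratings;
// Iterative Elo-style ratings: each configuration's rating is repeatedly set
// to the mean rating of the others plus the inverse logistic of its mean score.
const
   NConf   = 3;      // illustration only; the study compares far more configurations
   NPasses = 100;
var
   S           :  array [0 .. NConf - 1, 0 .. NConf - 1] of Double;
   R           :  array [0 .. NConf - 1] of Double;
   i, j, Pass  :  Integer;
   Smu, Rmu    :  Double;
begin
   // placeholder pairwise mean scores in (0, 1); real values come from NCDF
   for i := 0 to NConf - 1 do
      for j := 0 to NConf - 1 do
         S[i, j] := 0.5 + 0.1 * (i - j);
   for i := 0 to NConf - 1 do
      R[i] := 0;
   for Pass := 1 to NPasses do
      for i := 0 to NConf - 1 do
      begin
         Smu := 0;
         Rmu := 0;
         for j := 0 to NConf - 1 do
            if j <> i then
            begin
               Smu := Smu + S[i, j];
               Rmu := Rmu + R[j];
            end;
         Smu  := Smu / (NConf - 1);
         Rmu  := Rmu / (NConf - 1);
         R[i] := Rmu + Ln(Smu / (1 - Smu));   // inverse logistic update
      end;
   // normalize by subtracting the baseline rating (downto so R[0] changes last)
   for i := NConf - 1 downto 0 do
      R[i] := R[i] - R[0];
   for i := 0 to NConf - 1 do
      WriteLn('Configuration ', i, ': rating ', R[i] : 0 : 2);
end.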

Configuration Grid RotGrid DoubleGrid Fractal Funnel ScoreSurf Heilbronn MinLA Hierarchy BinEpistasis Minimum Rating RPS
R.WorstParent 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
R.Worst -0.03 -0.07 -0.06 -0.87 -0.19 -0.01 -0.17 0.00 0.24 -0.01 -0.87 -0.14 2.64
R.RandParent -0.17 -0.21 0.06 -0.01 -0.18 0.06 -0.13 0.09 0.40 0.32 -0.21 0.01 0.65
R.InvRank1 -0.18 -0.12 -0.14 -0.83 -0.44 -0.01 -0.22 -0.07 -0.16 -0.33 -0.83 -0.25 2.36
S.Ring -1.08 0.55 1.73 -1.40 0.70 1.68 -0.50 0.40 1.68 -0.24 -1.40 0.28 0.60
S.Torus -0.18 0.56 1.85 -0.25 0.57 1.04 -0.23 0.22 1.58 0.65 -0.25 0.52 0.91
S.LeakyRing -0.78 0.55 1.75 -1.46 0.66 1.67 -0.43 0.37 1.58 0.22 -1.46 0.32 0.67
S.LeakyRingTD -0.55 0.45 1.70 -1.42 0.84 1.54 -0.55 0.52 4.01 -- -1.42 0.47 0.62
S.Segment -0.53 0.37 1.18 -0.23 0.46 1.05 -0.30 0.29 0.67 -1.10 -1.10 0.21 0.71
S.SegmentTD -0.21 0.14 1.13 -0.72 0.84 1.04 -0.74 0.44 2.04 -- -0.74 0.35 0.50
S.Crescent -0.38 0.40 1.12 0.09 0.46 0.85 -0.24 0.18 0.69 -1.01 -1.01 0.23 0.87
S.VarDegreeTD -0.10 -0.01 1.02 -0.68 0.90 0.81 -0.75 0.42 3.96 -- -0.75 0.39 0.48
S.Halo -0.50 0.36 1.25 -0.21 -0.30 1.02 -0.26 0.30 0.07 -2.12 -2.12 0.05 0.74
S.DirHalo -0.40 -0.08 0.93 -0.19 0.02 0.75 -0.53 0.27 1.86 -1.09 -1.09 0.10 0.54
S.RandNearConRings -0.43 0.61 1.84 -0.98 0.59 1.46 -0.32 0.33 1.34 0.36 -0.98 0.41 0.67
S.RandFarConRings -0.20 0.64 1.85 -0.92 0.46 1.47 -0.21 0.21 0.63 0.39 -0.92 0.38 0.67
S.RandNearConIsles -0.50 0.50 1.38 0.91 0.57 0.78 -0.25 0.37 1.75 0.34 -0.50 0.51 0.95
S.RandFarConIsles -0.23 0.51 1.28 0.86 0.39 0.75 -0.20 0.26 0.92 0.43 -0.23 0.45 1.01
S.ThreadedRings -0.76 0.56 1.74 -1.30 0.26 1.62 -0.40 0.30 0.52 -0.87 -1.30 0.14 0.70
S.InterconRings -0.67 0.58 1.66 -1.13 -1.29 1.55 -0.42 0.12 -0.30 -2.62 -2.62 -0.14 0.65
S.ThreadedIsles -1.13 0.47 1.33 1.00 0.53 1.04 -0.41 0.49 1.16 -0.86 -1.13 0.36 0.78
S.BridgedIsles -1.18 0.45 1.34 0.98 0.59 1.03 -0.40 0.51 1.18 -0.81 -1.18 0.36 0.81
S.ChainedIsles -0.54 0.48 1.32 0.95 0.58 0.80 -0.29 0.30 1.60 0.21 -0.54 0.47 0.85
S.InterconIsles -0.52 0.46 1.32 0.91 0.19 0.92 -0.34 0.32 0.37 -0.42 -0.52 0.29 0.93
S.DisconIslesTD -0.23 0.10 0.89 0.55 0.81 0.50 -0.80 0.66 3.45 -- -0.80 0.48 0.58
S.RankProp1 0.05 0.15 -0.08 -0.18 -0.22 -0.01 0.09 -0.00 -0.23 -0.66 -0.66 -0.09 1.74
S.RankPropNeg1 -0.21 -0.61 0.26 -0.81 0.03 0.04 -0.37 0.13 3.21 1.01 -0.81 0.09 0.60
S.RankPropNeg2 -0.32 -0.84 0.37 -1.31 -0.07 0.04 -0.64 0.10 3.00 0.93 -1.31 -0.04 0.52
S.NearRank -0.28 -1.03 -0.60 -1.14 -0.69 0.11 -0.62 0.15 3.13 0.57 -1.14 -0.17 0.54
S.Dist3 -0.33 -0.84 -0.94 -0.02 0.37 -0.37 0.12 0.31 0.11 1.01 -0.94 -0.02 0.89
S.DistNeg3 -0.09 0.16 0.99 1.22 -1.30 0.24 0.06 -0.06 -0.14 -4.03 -4.03 -0.04 1.21
S.DistToBest3 -1.31 -1.49 -0.94 -0.36 0.37 -0.14 -0.54 0.07 0.52 0.89 -1.49 -0.21 0.86
R.InfluxRare 0.02 0.11 -0.04 0.21 -0.79 -0.02 0.31 0.20 -0.12 0.57 -0.79 0.04 0.54
R.InfluxRD -0.71 -0.11 -0.16 -0.20 -0.98 -0.12 -0.19 -0.44 -0.37 -1.26 -1.26 -0.39 0.85
R.InfluxSD -1.04 -0.48 -0.26 -0.31 -1.05 -0.17 -0.28 -0.48 -0.40 -1.23 -1.23 -0.49 0.76
R.InfluxTD -0.21 0.31 -0.15 -0.13 -0.79 -0.06 0.23 -0.23 -0.44 -- -0.79 -0.14 0.92
A.Threshold -0.08 -0.02 -0.04 -0.13 -0.12 -0.02 0.01 0.10 -0.09 -0.01 -0.13 -0.04 1.19
A.ThresholdSD -0.11 -0.02 -0.11 -0.16 -0.20 -0.02 0.05 0.05 -0.12 -0.18 -0.20 -0.07 0.94
A.ThresholdTD -0.44 0.09 -0.15 -0.88 -0.48 -0.04 0.12 0.06 -0.31 -- -0.88 -0.22 1.42
A.Uncond -3.94 0.16 -1.19 -1.30 -3.71 -0.50 0.42 -1.31 -0.92 -4.21 -4.21 -0.92 0.93
A.DistToParentRD -0.02 0.47 -0.05 -0.48 -0.13 -0.02 0.11 -0.07 -0.07 -0.05 -0.48 -0.02 0.74
A.DistToBestRD -0.84 -0.38 -0.25 -0.71 -0.08 -0.11 0.04 -0.07 -0.09 -0.13 -0.84 -0.25 0.76
A.DistToParentSD -0.49 -0.18 -0.05 -0.34 -0.17 -0.01 0.02 -0.12 -0.08 -0.02 -0.49 -0.15 0.92
A.DistToBestSD -0.92 -0.82 -0.36 -0.31 -0.16 -0.10 -0.07 -0.11 -0.09 -0.05 -0.92 -0.29 0.98
A.DistToParentTD 0.23 0.93 -0.05 -0.63 0.01 -0.00 0.06 0.06 -0.08 -- -0.63 0.08 0.70
A.DistToBestTD -1.68 -0.50 -0.36 -0.60 0.04 -0.22 0.06 0.19 -0.05 -- -1.68 -0.26 0.73
R.InvRank6_UC -1.05 0.05 -0.38 -1.01 -0.52 -0.07 -0.09 0.06 -0.20 -0.36 -1.05 -0.36 1.87
R.InvRankTD_UC -0.61 0.51 -0.37 -1.25 -0.98 0.29 0.13 0.16 -0.41 -- -1.25 -0.27 0.99
R.SimParent -0.54 -0.77 0.16 -1.97 -0.00 0.38 -0.97 -0.23 1.98 0.73 -1.97 -0.22 0.38
R.CenParent -5.07 -4.04 -1.45 -0.68 -4.08 -0.50 -1.61 -0.09 0.88 -0.19 -5.07 -0.97 0.43
R.SimWorseParent -0.11 -0.10 0.00 -0.17 0.03 0.08 0.01 0.13 0.46 1.17 -0.17 0.14 0.90
R.CenWorseParent -0.35 -0.35 -0.62 0.50 0.40 -0.15 0.03 0.21 0.69 1.27 -0.62 0.15 0.83
R.NoveltySim -0.54 -0.80 0.18 -1.99 0.00 0.39 -0.97 -0.19 2.01 0.79 -1.99 -0.20 0.43
R.NoveltyCen -5.19 -3.75 -1.47 -0.67 -1.69 -0.50 -1.36 0.01 1.32 0.72 -5.19 -0.75 0.39
R.CompHardSimRD -0.31 -0.31 0.11 -0.57 -0.02 0.18 -0.36 0.18 2.16 1.11 -0.57 0.13 0.60
R.CompHardCenRD -0.89 -0.89 -0.86 -0.21 0.03 -0.29 -0.57 0.19 1.25 1.12 -0.89 -0.11 0.61
R.CompHardSimSD -0.31 -0.42 0.09 -0.51 0.02 0.16 -0.49 -0.17 0.61 0.38 -0.51 -0.08 0.57
R.CompHardCenSD -1.35 -1.18 -0.99 0.26 -1.48 -0.29 -0.66 -0.02 0.58 -2.15 -2.15 -0.55 0.69
R.CompHardSimTD3 -0.37 -0.31 0.15 -0.26 0.24 0.28 -0.11 0.13 1.40 -- -0.37 0.10 0.63
R.CompHardCenTD3 -1.97 -1.61 -1.28 0.80 -0.16 -0.43 -0.51 0.16 2.86 -- -1.97 -0.24 0.69
R.CompSoftSimRD -0.42 -0.56 0.22 -0.98 -0.04 0.29 -0.53 0.12 2.09 0.96 -0.98 0.01 0.49
R.CompSoftCenRD -1.82 -1.75 -1.41 -0.46 -0.51 -0.39 -1.08 0.09 1.06 0.98 -1.82 -0.45 0.49
R.CompSoftSimSD -0.49 -0.72 0.14 -1.79 -0.04 0.35 -0.76 0.06 2.14 0.78 -1.79 -0.13 0.38
R.CompSoftCenSD -3.14 -2.67 -1.39 -0.62 -0.89 -0.42 -1.33 -0.03 1.00 0.65 -3.14 -0.66 0.51
R.CompSoftSimTD -0.51 -0.62 0.15 -0.70 0.15 0.34 -0.52 0.16 2.43 -- -0.70 0.00 0.42
R.CompSoftCenTD -2.95 -2.66 -1.35 -0.02 -0.44 -0.45 -1.10 0.09 1.19 -- -2.95 -0.56 0.50

Detailed experimental data for each problem is also available.

Conclusion

The experiments demonstrate the effectiveness of diversity preservation, with the effect sizes of the best performing configurations normally ranging from medium to large. However, there is no silver bullet, as the techniques in question may have a detrimental effect on simpler problems.

Worst parent, random parent and worst-in-population replacement, as well as rank-proportional selection, demonstrated similar performance on most problems, with the first two methods coming out slightly ahead. This suggests that the baseline algorithm provides sufficient evolutionary pressure and there is no point in accelerating it. The conventional practice of using fitness-proportional or rank-proportional selection is therefore dubious. Inverse rank-proportional replacement was worse than the baseline in every test. This is an interesting comparison, since worst parent replacement results in an identical distribution of the ranks of the replaced solutions. The discrepancy in results is likely due to the fact that diversity is better preserved when replacing one of the parents.

Topological selection was the best performing group of techniques. Even the ring, the slowest of the graphs, demonstrated good results. The only underperforming configurations are halo, directed halo, threaded rings and interconnected rings. Chained isles appears to be the best unweighted island-like topology.

The distance-based replacement group demonstrated modest results at best. Methods that focus on diversity alone (novelty and least original parent replacement) performed poorly. Similar worse parent replacement is a safe choice, having almost no negative impact in the worst case. Compound hard RD and TD are two other decent methods. Of the two originality measures, the parent-centric one was generally the better choice.

Non-elitist acceptance methods did not perform well. Perhaps the lack of temperature parameters is to blame. The only noteworthy configuration is the one using inverse rank TD replacement, improving upon the baseline on 4 problems.

Rare influx was the only decent method from the new solution injection group, normally having a small to medium positive effect. It performed well on the Heilbronn problem where most other configurations failed.

Inverse rank-proportional, near rank and distance-based selection demonstrated mixed results. These methods should be considered situational, capable of greatly improving performance on some problems while failing on others. The results of the positive and the negative distance selection are negatively correlated. Either can be beneficial depending on the problem. A dynamic choice of the optimal selection method during the algorithm execution is a promising avenue for future research.

Hopefully this study has convinced the reader of the importance of diversity. Now let's consider some philosophical implications. The form of diversity that matters most in human society is the diversity of ideas. Ideas exist at two different levels: inside one's head and within the human population. At each level they are subject to evolutionary processes not too different from the steps of a GA. It should come as no surprise that their diversity is vital for healthy thinking and a healthy culture. For this reason it can be a good strategy to avoid reading the existing works on a subject before thinking about it independently (that is partly the reason why this study is light on citations). Unfortunately, schools and universities condition students to only ever consider a single viewpoint (the one handed down by their superiors) by punishing any semblance of critical thinking. Academia is a toxic community hostile to anything that does not conform to the established norms. Peer review is the primary anti-diversity mechanism, ensuring that works that challenge the mainstream consensus never see the light of day. As a species that likes to call itself sentient, we should be ashamed.

References

  1. Marcus Hutter. Fitness Uniform Selection to Preserve Genetic Diversity. Proceedings of the 2002 Congress on Evolutionary Computation.
  2. Kenneth O. Stanley, Risto Miikkulainen. Evolving Neural Networks Through Augmenting Topologies. Evolutionary Computation 10 (2), 2002.
  3. Enrique Alba, Bernabe Dorronsoro. Cellular Genetic Algorithms. Springer US, 2008.
  4. Zbigniew Maciej Skolicki. An Analysis of Island Models in Evolutionary Computation. PhD thesis, 2007.

API

Data types

Routines

// Apply a crossover operator to Parent1 and Parent2 to obtain a Child. Recalc
// indicates whether a score recalculation is strictly necessary.
procedure Crossover(
   var   Child    :  TSolution;
   const Parent1,
         Parent2  :  TSolution;
         Recalc   :  Boolean);
		 
// Apply a mutation operator to the Solution
procedure Mutate(
   var   Solution    :  TSolution);
   
// Return the distance between Solution1 and Solution2
function Distance(
   const Solution1,
         Solution2   :  TSolution
         )           :  TSolutionDistance;