Roughly speaking NFL says ``the average performance of [search]
algorithms across all possible problems is identical''
[Wolpert and
Macready1997, page 67,].
There are only 256 () three input Boolean functions
however it is not practical to use them to experimentally confirm any
of the NFL theorems.
This is because the number of fitness functions (i.e. problems) is too big.
In GP we consider only a small number of fitness functions, typically only one. The three input boolean functions are unusual in that 256 fitness functions have been investigated. However this is still a miniscule proportion of all the possible fitness functions and so NFL does not apply. Indeed the performance of three search algorithms (GP [Koza1992], hill climbing [Lang1995] and uniform random search) are not identical when averaged across all 256 three input Boolean functions.
Conventionally the fitness of a Boolean function of n inputs
is defined as the number of times its output matches that required
when tested upon all possible combinations of inputs.
The total number of test patterns is .
Therefore the fitness function yields a value in
and the total number of possible fitness functions is
where |x| is the size of the search space.
If we consider searching in the space of trees then |x| is already
very big and so considering all
fitness functions is infeasible.
(A program to calculate |x|
is available from ftp node ftp.mad-scientist.com directory pub/genetic-programming/code sub-directory gp-code in file ntrees.cc and via my home page).
If instead of considering searching the space of trees,
we consider search in the space of the Boolean functions
themselves
then there are only
possible fitness
functions.
In the case of n=3 this is
.
Clearly comparing the performance of two or more search algorithms
on this number of problems is still infeasible.
If
we limit ourselves to n=2,
i.e. the 16 () Boolean functions of 2 bits,
there are only
possible
fitness functions.
While it might be feasible to evaluate the performance of various
search algorithms on them and so
experimentally confirm NFL,
such a demonstration is unlikely to be persuasive
due to the small size of the search space.
The small number of points in the search space
makes it likely that stochastic search techniques such as GAs
will resample one or
more points.
NFL explicitly requires the algorithms not to do so.
Any measured difference in performance between algorithms is likely to
be due to this and stochastic noise inherent in randomised search
techniques.