The ERA method for training neural networks


The most commonly used methods for training neural networks are well known to have difficulties with local minima (becoming trapped in sub-optimal solutions); these can be demonstrated to occur even for simple problems such as XOR that require a network with only a few neurons. Worse, the more efficient the algorithm -- moving for example from error backpropagation to conjugate gradient descent -- the more likely it is to become trapped in a local minimum. This was clearly demonstrated in the thesis work of Dr Adrian J Shepherd and in subsequent investigations by Gorse, Shepherd and Prof John G Taylor (then of Dept of Mathematics, KCL).

It is commonly believed that the only generally applicable neural network training techniques guaranteed to converge to a global minimum with a probability approaching 1.0 are stochastic in character (tending to be time-consuming and unreliable unless carefully applied), with methods based on simulated annealing and genetic algorithms amongst the most popular. But this belief could be mistaken. A classical global minimisation method, TRUST, presented by Cetin, Barhen et al. in the early 1990s, succeeded in avoiding local minima by 'tunnelling' through unfavourable regions of the neural network error surface. TRUST is however somewhat complex to apply, and utilises concepts from dynamical systems theory that may be unfamiliar to many workers in neural computing.

Expanded Range Approximation (ERA) was developed by Gorse, Shepherd and Taylor in the mid-1990s as an alternative to more complex classical global minimisation methods such as TRUST, and is guaranteed to avoid local minima in a large class of neural network training problems. ERA is not tied to any particular training algorithm and can be used with methods such as conjugate gradient descent and the quasi-Newton method, as well as the more widely familiar error backpropagation method. The method works by first compressing the training target values down to their mean value and then progressively expanding these compressed targets back toward their original values as training proceeds. Simulation work with the ERA algorithm demonstrated that it could be very successful in helping to avoid local minima during training, and analytical work indicated that intractable cases (where the global minimum is lost as the range of the targets is expanded) are likely to be relatively rare.
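
As an illustration of the idea, the sketch below (not the authors' original code) applies an ERA-style target schedule to the XOR problem using a small NumPy multilayer perceptron trained by plain gradient descent. The function names, the number of expansion stages, and the learning-rate and epoch settings are illustrative assumptions rather than anything prescribed by the method.

```python
# Illustrative sketch of ERA-style target scheduling on XOR (assumed details, not the authors' code).
import numpy as np

rng = np.random.default_rng(0)

# XOR problem: inputs and original (uncompressed) targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

def era_targets(T, lam):
    """Partially expanded targets: lam=0 gives the mean, lam=1 the original values."""
    return T.mean(axis=0) + lam * (T - T.mean(axis=0))

def init_weights(n_in=2, n_hid=3, n_out=1):
    return [rng.normal(scale=0.5, size=(n_in, n_hid)), np.zeros(n_hid),
            rng.normal(scale=0.5, size=(n_hid, n_out)), np.zeros(n_out)]

def forward(params, X):
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)                    # hidden activations
    y = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))    # sigmoid outputs
    return h, y

def train_stage(params, X, T_stage, epochs=2000, lr=0.5):
    """Gradient descent on squared error against the current (partially expanded) targets."""
    W1, b1, W2, b2 = params
    for _ in range(epochs):
        h, y = forward([W1, b1, W2, b2], X)
        err = y - T_stage                       # derivative of squared error w.r.t. outputs
        dy = err * y * (1 - y)                  # back through the sigmoid
        dW2 = h.T @ dy; db2 = dy.sum(axis=0)
        dh = (dy @ W2.T) * (1 - h ** 2)         # back through tanh
        dW1 = X.T @ dh; db1 = dh.sum(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return [W1, b1, W2, b2]

# ERA training loop: start with targets compressed to their mean (lam = 0),
# then expand in small steps, warm-starting each stage from the previous weights.
params = init_weights()
for lam in np.linspace(0.0, 1.0, 11):
    params = train_stage(params, X, era_targets(T, lam))

_, y_final = forward(params, X)
print(np.round(y_final, 3))   # should be close to the XOR targets [0, 1, 1, 0]
```

At lam = 0 all targets coincide with their mean, which is typically an easy problem to fit; each later stage warm-starts from the solution of the previous one, so the network is led gradually back to the original training task with its full target range.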