Chapter 2: Literature overview
2.3 Neural networks
2.3.4 Training neural networks
Training a neural network is the process in which the network's weights are adjusted using a chosen training algorithm. The algorithm makes use of certain learning rules to adjust the weights according to the training data provided. The training data usually consists of example patterns representing the specific task the network is being trained to perform. The weights can be updated in an iterative manner to improve the performance of the neural network, i.e. the network can be trained repeatedly. The ability to learn without a specified set of rules makes neural networks superior to conventional expert systems [10].
Training a neural network successfully involves understanding the environment in which it is to operate. A model of this environment can then be used to determine what information is available to the neural network; such a model is called a learning paradigm. The different learning paradigms are as follows [7], [10]:
Supervised learning – the network is supplied with example patterns of correct behaviour in the form of input-output pairs and learns to reproduce the relationship present in the data. An explicit teacher is present.
Unsupervised learning – the desired outputs are not available during training and the network learns through self-organizing behaviour using competitive learning rules or a pre-defined cost function. The network learns to extract features or patterns from input data with no feedback from the environment about desired outputs.
Reinforcement learning – a computational agent interacts with its environment and learns correct actions from a reinforcement signal grading its performance. The aim is to discover a policy for selecting actions that minimize an estimated cost function.
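The supervised paradigm can be illustrated with the simplest case in Table 2-2: a single perceptron trained with the perceptron rule on input-output pairs supplied by a teacher. The following is a minimal sketch, not taken from the cited sources; the learning-rate value, epoch count and the choice of the logical AND function as the target are illustrative assumptions.

```python
# Minimal sketch of supervised learning: a single perceptron trained with
# the perceptron rule on teacher-supplied input-output pairs (logical AND).
# Learning rate, epoch count and target task are illustrative assumptions.

def step(x):
    """Hard-limit activation: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0

def train_perceptron(pairs, epochs=20, lr=1.0):
    w = [0.0, 0.0]   # weights
    b = 0.0          # bias
    for _ in range(epochs):
        for inputs, target in pairs:          # the "teacher" supplies targets
            error = target - step(w[0]*inputs[0] + w[1]*inputs[1] + b)
            w[0] += lr * error * inputs[0]    # perceptron rule: w <- w + e*p
            w[1] += lr * error * inputs[1]
            b += lr * error
    return w, b

and_pairs = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_pairs)
print([step(w[0]*p[0] + w[1]*p[1] + b) for p, _ in and_pairs])  # [0, 0, 0, 1]
```

Because AND is linearly separable, the perceptron convergence theorem guarantees that this rule finds a correct weight set in a finite number of updates; unsupervised and reinforcement learning differ only in where the error signal comes from, not in this iterative weight-adjustment structure.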
There are three major practical issues when learning from example patterns: capacity, sample complexity and computational complexity. First of all, the issue of capacity must be addressed. It concerns determining how many examples can be stored and what exactly a specific network is capable of – what functions it can approximate or decision boundaries it can form [10].
The second issue is sample complexity and considers the number of examples necessary to obtain a network which generalizes well. Computational complexity is the third issue and refers to the time taken to properly train the network [10]. Addressing these issues can be traced back directly to the choices in the neural network design process discussed earlier.
2.3.4.1 Training algorithms
A training algorithm or learning rule is a procedure used to adjust the weights of a network i.e. train the network. The purpose of training is to improve the performance of the network as measured by some performance or error function. This procedure can be applied to one pattern at a time or a whole batch of patterns at once. These two training styles are called incremental and batch training respectively [9].
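The distinction between the two training styles can be sketched with the Widrow-Hoff (LMS) rule on a single linear neuron: incremental training updates the weight after every pattern, while batch training accumulates the error over all patterns before making one update. The target relationship (y = 3x), learning rate and epoch count below are illustrative assumptions.

```python
# Incremental vs. batch training with the Widrow-Hoff (LMS) rule on a
# single linear neuron y = w*x. The target relationship y = 3x, the
# learning rate and the epoch count are illustrative assumptions.

patterns = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # (input, target) pairs
lr = 0.02

# Incremental training: the weight is adjusted after every single pattern.
w_inc = 0.0
for _ in range(200):
    for x, t in patterns:
        error = t - w_inc * x
        w_inc += lr * error * x

# Batch training: errors are accumulated over the whole batch of patterns,
# then one update is applied per pass.
w_bat = 0.0
for _ in range(200):
    grad = sum((t - w_bat * x) * x for x, t in patterns)
    w_bat += lr * grad

print(round(w_inc, 3), round(w_bat, 3))  # both approach the true value 3.0
```

Both styles converge to the same weight here; in general they trade off differently between update noise and computational cost per pass.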
A multitude of training algorithms exists, and an obvious problem is choosing the appropriate one for the task at hand. This variety arose because the original algorithms have been modified, extended or combined into different forms to suit the tasks they were intended for. Certain algorithms are appropriate only for certain network types, so the choice of algorithm becomes simpler once a network architecture has been chosen [9].
Table 2-2 provides a summary of network architectures and which training algorithms or learning rules are commonly used to train each of them. The summary also shows which purpose each network was initially intended for. As with the multitude of algorithms, network architectures have also been modified over time and resulted in various different forms. For this reason, the following summary has been restricted to the original algorithms and architectures.
Note that BP in the summary indicates the backpropagation algorithm – an extremely popular training algorithm. The second stage of the BP algorithm in which the weights are adjusted can be accomplished using a variety of different optimization schemes including conjugate gradient methods as well as evolutionary programming. BP in the summary refers to all of these variations as well as standard gradient descent. Hagan, Demuth and Beale [9] provided the information for this summary.
Table 2-2: Summary of neural networks
| Category | Name | Algorithm | Purpose | Behaviour | Paradigm |
|---|---|---|---|---|---|
| Feed-forward neural networks | Perceptron | Perceptron rule | Pattern recognition | Static | Supervised |
| | Linear layer | Widrow-Hoff rule | Pattern recognition and function approximation | Static | Supervised |
| | Adaline | Widrow-Hoff rule | Pattern recognition and function approximation | Static | Supervised |
| | Multi-layer perceptron | BP | Pattern recognition and function approximation | Static | Supervised |
| | Wavelet | BP | Function approximation | Static | Supervised |
| Time-delay neural networks | Linear with tapped delay | Widrow-Hoff rule | Digital signal processing | Dynamic | Supervised |
| | Focused time-delay | BP | Time-series prediction | Dynamic | Supervised |
| | Distributed time-delay | BP | Pattern recognition | Dynamic | Supervised |
| Radial basis function neural networks | Generalized regression | Orthogonal least squares / kernel methods | Function approximation | Static | Supervised |
| | Probabilistic | Orthogonal least squares / kernel methods | Pattern recognition | Static | Supervised |
| | Simple radial basis | Orthogonal least squares / kernel methods | Pattern recognition and function approximation | Static | Supervised |
| Associative neural networks | Hopfield | Hebb rule | Pattern recognition and optimization | Dynamic | Unsupervised |
| | Simple recall | Outstar learning | Pattern recall | Static | Unsupervised |
| | Simple associative | Hebbian learning | Pattern association | Static | Unsupervised |
| Recurrent neural networks | Layered-recurrent | BP | Filtering and modelling applications | Dynamic | Supervised |
| | Nonlinear autoregressive with exogenous input | BP | Modelling of nonlinear dynamic systems | Dynamic | Supervised |
| | Real-time recurrent | BP | Real-time applications | Dynamic | Supervised |
| | Elman | BP | Recognition and generation of spatial and temporal patterns | Dynamic | Supervised |
| Competitive neural networks | Kohonen self-organizing map | Kohonen (instar) learning | Pattern recognition | Static | Unsupervised |
| | Competitive layer | Kohonen learning | Pattern recognition | Dynamic | Unsupervised |
| | Hamming | Kohonen learning | Pattern recognition | Dynamic | Unsupervised |
| | Grossberg | Continuous-time instar learning | Clustering | Dynamic | Unsupervised |
| | Adaptive resonance theory | Instar / outstar learning | Pattern recognition | Dynamic | Unsupervised |
| | Learning vector quantization | Kohonen (instar) learning | Pattern recognition | Static | Hybrid |