Lecture AI Generalization

(1)

Generalization

Generalization concept

ANN model should be trained so that it performs as well in new situations as it does in training phase

Key Strategy by Ockham’s Razor

Generate simplest model that explains well the data.

The more complexity you have on your model, the greater the possibility of errors.

Simple model in ANN

The model with smallest number of free parameters (Weight and biases).

Develop a model with smallest number of neurons, yet still capable of maintaining high accuracy.

(2)

Generalization method

Growing: Start with no neurons in the network and then add neurons until the performance is adequate.

Pruning: Start with large networks, which likely overfit, and then remove neurons one at a time until the performance degrades significantly.

Global searches: Genetic algorithm approach can be used to search the space of all possible network architectures to locate the simplest model that explains the data

Keep the network small by constraining the magnitude of the weights, rather than by constraining the number of network weights.

Regularization Early stopping

(3)

Generalization

Suppose we have input and target pairs

The output can be defined as:

The error of prediction is:

(4)

Overfitting

Overfitting; Poor extrapolation

Blue curve: function g(.)

Open circles: noisy target point

Black curve: trained network response Overfit

The network response exactly matches the training points.

However, It does a very poor job of matching the underlying function

Interpolation error (Overfitting): occurs for input [-3 0]

Extrapolating error (lack of training data): occurs for input [0 3]

(5)

Good Interpolation

Good Interpolation; Poor extrapolation Overfitting; Poor extrapolation

Poor extrapolation can be solved by providing sufficient training data

(6)

Extrapolation error

Extrapolation error can be solved by providing sufficient training data It is not difficult to prepare training data with single input.

For multiple input, we should consider the combination of all input ranges

(7)

Generalization indicator

Training data

Training set

Testing set

 Testing set is not included during training phase Testing set have same range/region with training set

 After the ANN model has been trained, we will compute the errors that the trained network makes on this test set.

 The test set errors indicates how the network will perform in the future; they are a measure of the generalization capability of the network.

Training error

Testing error

(8)

Restricting the magnitude of free parameters

Early stopping

 As training progresses to minimum error, the network uses more of its weights.

 Increasing number of iteration means increasing the complexity of the network.

 If training is stopped before the minimum is reached, the network will effectively be using fewer parameters and will be less likely to overfit.

When should we properly stop the training progress?

(9)

Early stopping

Training data

Training set

Testing set

Training set

Validation set

Training error

Validation error

 Cross validation decides when the training progress should be stopped by evaluating the accuracy on validation set.

 The training set is used to compute gradients and to determine the weight update at each iteration.

 The validation set is an indicator to check the performance of network model. When the error on the validation set goes up for several iterations, the training is stopped, and the weights with minimum error on the validation set are used as the final trained network weights.

70%

15%

(10)

Early stopping

Validation results have better

interpolation when iteration stops at point a

As iteration increases, the training error keeps decreasing, while validation error fluctuates

Error

(11)

Restricting the magnitude of free parameters

Regularization

Tikhonov introduces the concept of regularization

Modify sum squared error

(12)

Regularization

Sum squared error Sum of squares of the weights

Why do we need penalize the sum squared of weights ?

(13)

Regularization

 When the weights are large, the function created by the network can have large slopes (Overfit)

 If we restrict the weights to be small, then the

network function will create a smooth interpolation through the training data.

(14)

Regularization

 Blue line: the true function

 Large open circles: the noisy data.

 Black curve: the network response

 Circles filled with crosses: the network response at the training points.

(15)

Regularization

Bayesian regularization aims to adjust regularization parameter automatically.

The method puts the training of neural networks into a Bayesian statistical framework.

Bayes rule: