
Lecture AI Generalization

Wanda Rulita

Academic year: 2024


(1)

Generalization

Generalization concept

An ANN model should be trained so that it performs as well in new situations as it does in the training phase.

Key strategy: Ockham's razor

Find the simplest model that explains the data well.

The more complex the model, the greater the possibility of errors.

Simple model in ANN

The simplest model is the one with the smallest number of free parameters (weights and biases).

Develop a model with the smallest number of neurons that is still capable of maintaining high accuracy.
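To make "number of free parameters" concrete, the sketch below counts the weights and biases of a fully connected feedforward network. The function name `count_free_parameters` and the list-of-layer-sizes convention are illustrative assumptions, not part of the lecture.

```python
def count_free_parameters(layer_sizes):
    """Count weights and biases of a fully connected feedforward network.

    layer_sizes lists the number of units per layer, inputs first,
    e.g. [1, 10, 1] is 1 input, 10 hidden neurons, 1 output neuron.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out  # weights connecting the two layers
        total += n_out         # one bias per neuron in the receiving layer
    return total


small = count_free_parameters([1, 2, 1])    # 1*2 + 2 + 2*1 + 1 = 7
large = count_free_parameters([1, 10, 1])   # 1*10 + 10 + 10*1 + 1 = 31
print(small, large)
```

Going from 2 to 10 hidden neurons in this toy case raises the count from 7 to 31, which is why the number of neurons is a convenient handle on model complexity.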

(2)

Generalization method

Growing: Start with no neurons in the network and then add neurons until the performance is adequate.

Pruning: Start with large networks, which likely overfit, and then remove neurons one at a time until the performance degrades significantly.

Global searches: A genetic algorithm can be used to search the space of all possible network architectures to locate the simplest model that explains the data.

Keep the network small by constraining the magnitude of the weights, rather than by constraining the number of network weights.

Two methods constrain the weight magnitudes in this way: regularization and early stopping.
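The growing strategy can be sketched as a simple loop. This is a minimal illustration under stated assumptions: `train_and_evaluate` is a hypothetical callable standing in for whatever training routine is used, and `target_error` is an assumed definition of "adequate" performance.

```python
def grow_network(train_and_evaluate, target_error, max_neurons):
    """Add hidden neurons one at a time until the error is adequate.

    train_and_evaluate(n) is a hypothetical callable that trains a network
    with n hidden neurons and returns its error on held-out data.
    """
    error = float("inf")
    for n_neurons in range(1, max_neurons + 1):
        error = train_and_evaluate(n_neurons)
        if error <= target_error:
            return n_neurons, error  # smallest adequate network found
    return max_neurons, error        # budget exhausted without reaching the target


# Stand-in for real training: pretend the error falls as 1/n.
print(grow_network(lambda n: 1.0 / n, target_error=0.25, max_neurons=10))
```

Pruning would run the same kind of loop in the opposite direction, starting large and removing neurons until the error degrades.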

(3)

Generalization

Suppose we have input and target pairs

The output can be defined as:

The error of prediction is:
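One common way to write these three quantities is sketched below. The notation is an assumption borrowed from standard textbook treatments, not necessarily that of the original slide: $p_q$ and $t_q$ are the inputs and targets, $a_q$ is the network output for $p_q$, $\varepsilon_q$ is noise, $Q$ is the number of pairs, and $E_D$ is the sum squared error. The function $g(\cdot)$ is the underlying function referred to on the next slide.

```latex
% Training set of Q input/target pairs
\{p_1, t_1\}, \{p_2, t_2\}, \dots, \{p_Q, t_Q\}

% Targets generated by an underlying function plus noise
t_q = g(p_q) + \varepsilon_q

% Prediction error: sum squared error over the training set
E_D = \sum_{q=1}^{Q} (t_q - a_q)^{T} (t_q - a_q)
```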

(4)

Overfitting

Overfitting; Poor extrapolation

Blue curve: function g(.)

Open circles: noisy target points

Black curve: trained network response

Overfit

The network response exactly matches the training points.

However, it does a very poor job of matching the underlying function.

Interpolation error (overfitting): occurs for inputs in [-3, 0]

Extrapolation error (lack of training data): occurs for inputs in [0, 3]
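The two kinds of error can be reproduced with a small stand-in experiment. This sketch does not train a neural network: it uses an exact polynomial interpolant as an assumed proxy for an overfit model, `math.sin` as an assumed underlying function g(.), and alternating offsets of 0.1 as assumed "noise" on the targets.

```python
import math


def lagrange_interpolant(xs, ys):
    """Return the polynomial that passes exactly through every (x, y) pair."""
    def model(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return model


def mean_squared_error(model, target, points):
    """Average squared difference between model and target over the points."""
    return sum((model(x) - target(x)) ** 2 for x in points) / len(points)


g = math.sin  # assumed underlying function

# Eight training inputs in [-3, 0]; targets carry alternating +/-0.1 "noise".
train_x = [-3.0 + 3.0 * i / 7 for i in range(8)]
train_t = [g(x) + 0.1 * (-1) ** i for i, x in enumerate(train_x)]
model = lagrange_interpolant(train_x, train_t)

# Error on the training points themselves: the fit is exact.
train_mse = sum((model(x) - t) ** 2 for x, t in zip(train_x, train_t)) / len(train_x)

inside = [-3.0 + 0.05 * k for k in range(61)]   # inputs in [-3, 0]
outside = [0.05 * k for k in range(1, 61)]      # inputs in (0, 3]
interpolation_mse = mean_squared_error(model, g, inside)
extrapolation_mse = mean_squared_error(model, g, outside)
print(train_mse, interpolation_mse, extrapolation_mse)
```

The training error is zero because the model reproduces the noisy targets, yet the error against g(.) is nonzero inside the training range and grows sharply outside it.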

(5)

Good Interpolation

Figure panels: "Good interpolation; poor extrapolation" and "Overfitting; poor extrapolation"

Poor extrapolation can be solved by providing sufficient training data.

(6)

Extrapolation error

Extrapolation error can be solved by providing sufficient training data.

It is not difficult to prepare training data for a single input.

For multiple inputs, we must consider the combinations of all input ranges.
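The cost of covering all input combinations can be seen by building a grid. In this sketch the function name `input_grid`, the example ranges, and the choice of five evenly spaced values per input are all illustrative assumptions.

```python
from itertools import product


def input_grid(ranges, points_per_input):
    """Build every combination of evenly spaced values across all input ranges.

    ranges is a list of (low, high) pairs, one per network input.
    points_per_input must be at least 2.
    """
    axes = []
    for low, high in ranges:
        step = (high - low) / (points_per_input - 1)
        axes.append([low + step * k for k in range(points_per_input)])
    return list(product(*axes))


one_input = input_grid([(-3, 3)], points_per_input=5)
three_inputs = input_grid([(-3, 3), (0, 1), (10, 20)], points_per_input=5)
print(len(one_input), len(three_inputs))  # 5 and 5**3 = 125
```

With k values per input and n inputs the grid has k**n points, so the amount of training data needed to avoid extrapolation grows quickly with the number of inputs.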

(7)

Generalization indicator

The training data is split into a training set and a testing set.

• The testing set is not used during the training phase.

• The testing set covers the same range/region as the training set.

• After the ANN model has been trained, we compute the errors that the trained network makes on the test set.

• The test set errors indicate how the network will perform in the future; they are a measure of the generalization capability of the network.

Training error

Testing error

(8)

Restricting the magnitude of free parameters

Early stopping

• As training progresses toward the minimum error, the network uses more of its weights.

• Increasing the number of iterations therefore increases the effective complexity of the network.

• If training is stopped before the minimum is reached, the network will effectively be using fewer parameters and will be less likely to overfit.

When should training be stopped?

(9)

Early stopping

Diagram: the training data is divided into a training set, a validation set, and a testing set; the training error and the validation error are monitored during training.

• Cross-validation decides when training should stop by evaluating performance on the validation set.

• The training set is used to compute gradients and to determine the weight update at each iteration.

• The validation set serves as an indicator of the network's performance. When the error on the validation set goes up for several iterations, training is stopped, and the weights that gave the minimum validation error are used as the final trained network weights.
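The stopping rule described above can be sketched as follows. The names `update`, `val_error` and `patience` are assumptions introduced for illustration: `update` stands for one training iteration on the training set, `val_error` computes the error on the validation set, and `patience` is the number of consecutive non-improving iterations tolerated before stopping.

```python
import copy


def train_with_early_stopping(weights, update, val_error, max_iters=1000, patience=5):
    """Train until the validation error fails to improve for `patience` iterations.

    Returns the weights with the lowest validation error seen, that error,
    and the iteration at which it occurred.
    """
    best_weights = copy.deepcopy(weights)
    best_error = val_error(weights)
    best_iter = 0
    bad_iters = 0
    for iteration in range(1, max_iters + 1):
        weights = update(weights)          # one step using the training set
        error = val_error(weights)         # check on the validation set
        if error < best_error:
            best_weights = copy.deepcopy(weights)
            best_error, best_iter, bad_iters = error, iteration, 0
        else:
            bad_iters += 1
            if bad_iters >= patience:
                break                      # validation error stopped improving
    return best_weights, best_error, best_iter


# Toy stand-in: the "weight" moves by +1 per iteration and the validation
# error is lowest at w = 4, so training overshoots unless it is stopped.
result = train_with_early_stopping(
    0, update=lambda w: w + 1, val_error=lambda w: (w - 4) ** 2, patience=3
)
print(result)
```

Keeping a copy of the best weights matters because, by the time the rule triggers, the current weights are already several iterations past the validation minimum.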

Data split: 70% training set, 15% validation set, 15% testing set
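A 70% / 15% / 15% division into training, validation and testing sets could be implemented as below. The mapping of the three percentages to the three sets, the random shuffling, and the fixed `seed` are assumptions made for this sketch.

```python
import random


def split_dataset(samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle samples and split them into training, validation and testing sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(train_frac * len(shuffled)))
    n_val = int(round(val_frac * len(shuffled)))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

The three sets are disjoint, so the testing set stays untouched during both weight updates and the early stopping decision.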

(10)

Early stopping

The network interpolates better when training stops at point a.

As the number of iterations increases, the training error keeps decreasing, while the validation error fluctuates.

(Figure: training and validation error curves, with stopping point a marked.)

(11)

Restricting the magnitude of free parameters

Regularization

Tikhonov introduced the concept of regularization.

Modify the sum squared error by adding a penalty term.

(12)

Regularization

The modified performance index combines two terms: the sum squared error and the sum of squares of the weights.
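A common form of such a regularized performance index is sketched below. The symbols are assumptions, not taken from the slide: $E_D$ is the sum squared error, $E_W$ is the sum of squares of the weights $x_i$, $n$ is the number of weights, and $\alpha$, $\beta$ are coefficients that set the relative importance of the two terms.

```latex
F = \beta E_D + \alpha E_W,
\qquad
E_W = \sum_{i=1}^{n} x_i^{2}
```

A larger ratio $\alpha / \beta$ puts more emphasis on keeping the weights small; a smaller ratio puts more emphasis on fitting the training data.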

Why do we need to penalize the sum of squares of the weights?

(13)

Regularization

• When the weights are large, the function created by the network can have large slopes, which makes overfitting more likely.

• If we restrict the weights to be small, the network function will create a smooth interpolation through the training data.
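The effect of the penalty on weight size can be checked on the smallest possible case, a one-weight linear model a = w * p, where the weight is literally the slope. This is a sketch under stated assumptions: the coefficient names `alpha` and `beta`, the helper names, and the toy data are all illustrative and not from the lecture.

```python
def regularized_index(errors, weights, alpha, beta):
    """Weighted sum of the sum squared error and the sum of squared weights."""
    sum_squared_error = sum(e ** 2 for e in errors)
    sum_squared_weights = sum(w ** 2 for w in weights)
    return beta * sum_squared_error + alpha * sum_squared_weights


def fit_single_weight(inputs, targets, alpha, beta=1.0):
    """Minimise beta * sum((t - w*p)**2) + alpha * w**2 for a model a = w*p.

    Setting the derivative with respect to w to zero gives the closed form below.
    """
    sum_pt = sum(p * t for p, t in zip(inputs, targets))
    sum_pp = sum(p * p for p in inputs)
    return beta * sum_pt / (beta * sum_pp + alpha)


inputs, targets = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
for alpha in (0.0, 14.0, 140.0):
    print(alpha, fit_single_weight(inputs, targets, alpha))
```

With `alpha = 0` the fit recovers the unpenalized slope; as `alpha` grows the fitted weight, and hence the slope, shrinks toward zero.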

(14)

Regularization

• Blue line: the true function

• Large open circles: the noisy data

• Black curve: the network response

• Circles filled with crosses: the network response at the training points

(15)

Regularization

Bayesian regularization aims to adjust the regularization parameter automatically.

The method places the training of neural networks in a Bayesian statistical framework.

Bayes' rule:
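In its general form, for two events $A$ and $B$ with $P(B) > 0$, Bayes' rule reads as below. Reading $A$ as the network weights and $B$ as the training data is an interpretation of the framework described above, not a formula taken from the slide.

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```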
