
Machine Learning: Model Evaluation

Khairul Huda

Academic year: 2023


(1)

Program Studi Teknik Informatika

Fakultas Teknik – Universitas Surabaya

Model Evaluation

Week 11

1604C055 - Machine Learning

(2)

Model Evaluation

• Model evaluation is the process of measuring the performance of a classification model.

• Aims to estimate the generalization performance of a model on future (unseen/new/out-of-sample) data.

• Model evaluation is required to build a robust model.

(3)

How to evaluate the Classifier’s Generalization Performance?

Assume that we test a binary classifier on some test set and obtain the following confusion matrix:

                              Predicted class
                              Pos        Neg
  Actual class   Pos          TP         FN         P
                 Neg          FP         TN         N

(4)

Metrics for Classifier’s Evaluation

• Accuracy = (TP + TN) / (TP + TN + FP + FN)

• Error = (FP + FN) / (TP + TN + FP + FN) = 1 − Accuracy

• Precision = TP / (TP + FP)

• Recall / Sensitivity / TP Rate = TP / (TP + FN)

• FP Rate = FP / (FP + TN)

• Specificity = TN / (TN + FP) = 1 − FP Rate

• F1 score = 2 × Precision × Recall / (Precision + Recall)
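As a hypothetical worked example (the counts below are illustrative, not taken from the slides): suppose TP = 40, FN = 10, FP = 5, TN = 45, so P = 50 and N = 50. Then Accuracy = (40 + 45) / 100 = 0.85, Error = 0.15, Precision = 40 / 45 ≈ 0.89, Recall = 40 / 50 = 0.80, FP Rate = 5 / 50 = 0.10, Specificity = 0.90, and F1 score = 2 × 0.89 × 0.80 / (0.89 + 0.80) ≈ 0.84.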

(5)

Metrics for Classifier’s Evaluation

• For multiclass classification

– Accuracy

– Precision and recall for a certain class can be calculated by treating that class as the positive class and all the remaining classes as the negative class

(6)

How to Estimate the Metrics?

• We can use:

– Training data;

– Independent test data;

– Hold-out method;

– k-fold cross-validation method;

– Leave-one-out method;

– Bootstrap method;

– And many more…

(7)

Estimation with Training Data

• The accuracy/error estimates on the training data are not good indicators of performance on future data.

– Q: Why?

– A: Because new data will probably not be exactly the same as the training data!

• The accuracy/error estimates on the training data measure the degree of the classifier's overfitting.

[Slide diagram: a classifier is built from the training set and evaluated on that same training set.]

(8)

Estimation with Independent Test Data

• Estimation with independent test data is used when we have plenty of data and there is a natural way of forming training and test data.

• For example: Quinlan in 1987 reported experiments in a medical domain for which the classifiers were trained on data from 1985 and tested on data from 1986.

[Slide diagram: a classifier is built from the training set and evaluated on an independent test set.]

(9)

Hold-out Method

• The hold-out method splits the data into training data and test data (e.g., 60:40, 2/3:1/3, or 70:30).

• Then we build a classifier using the training data and test it using the test data.

• The hold-out method is usually used when we have thousands of instances, including several hundred instances from each class.

[Slide diagram: the data is split into a training set, used to build the classifier, and a test set, used to evaluate it.]

(10)

Train, Validation, Test Split

[Slide diagram: data with known results is split into a training set and a validation set; the model builder learns a classifier from the training set, the classifier is evaluated and tuned on the validation set, and a separate final test set is held back for the final evaluation.]

The test data can't be used for parameter tuning!

(11)

Making the Most of the Data

• Once evaluation is complete, all the data can be used to build the final classifier.

• Generally, the larger the training set, the better the classifier (but with diminishing returns).

• The larger the test set, the more accurate the error estimate.

(12)

Stratification

• The holdout method reserves a certain amount for testing and uses the remainder for training.

– Example: one third for testing, the rest for training.

• For “unbalanced” datasets, samples might not be representative.

– Few or no instances of some classes.

• Stratified sampling: an advanced version of balancing the data.

– Make sure that each class is represented with approximately equal proportions in both subsets.

(13)

Repeated Holdout Method

• Holdout estimate can be made more reliable by repeating the process with different subsamples.

– In each iteration, a certain proportion is randomly selected for training (possibly with stratification).

– The error rates on the different iterations are averaged to yield an overall error rate.

• This is called the repeated holdout method.

(14)

Repeated Holdout Method

• Still not optimal: the different test sets overlap, whereas we would like every instance in the data to be tested at least once.

• Can we prevent overlapping?

witten & eibe

(15)

k-Fold Cross-Validation

• k-fold cross-validation avoids overlapping test sets:

– First step: data is split into k subsets of equal size;

– Second step: each subset in turn is used for testing and the remainder for training.

• The subsets are stratified before the cross-validation.

• The estimates are averaged to yield an overall estimate.

Classifier

Data

train train test train test train

test train train

(16)

More on Cross-Validation

• Standard method for evaluation: stratified 10-fold cross-validation.

• Why 10? Extensive experiments have shown that this is the best choice to get an accurate estimate.

• Stratification reduces the estimate’s variance.

• Even better: repeated stratified cross-validation:

– E.g., ten-fold cross-validation is repeated ten times and the results are averaged (this reduces the variance); see the sketch below.
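A minimal sketch of repeated stratified cross-validation, assuming scikit-learn's RepeatedStratifiedKFold; the Iris dataset and the decision tree classifier are illustrative assumptions, not taken from the slides:

from sklearn.datasets import load_iris
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Illustrative data and model; 10-fold stratified CV repeated 10 times
X, y = load_iris(return_X_y=True)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=42)
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=cv)
print("10x10 repeated CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))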

(17)

Leave-One-Out Cross-Validation

• Leave-One-Out is a particular form of cross-validation:

– Set number of folds to number of training instances;

– I.e., for n training instances, the classifier is built n times.

• Makes best use of the data.

• Involves no random sub-sampling.

• Very computationally expensive.

• A disadvantage of Leave-One-Out-CV is that stratification is not possible:

– It guarantees a non-stratified sample because there is only one instance in the test set!

(18)

Model Evaluation in Python:

Train Test Split
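A minimal hold-out split sketch, assuming scikit-learn's train_test_split; the Iris dataset and the decision tree classifier are illustrative choices, not taken from the slide:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 70:30 hold-out split; stratify=y keeps the class proportions equal in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
print("Hold-out accuracy:", clf.score(X_test, y_test))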

(19)

Model Evaluation in Python:

Confusion Matrix
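A sketch of computing the confusion matrix, assuming scikit-learn's confusion_matrix and continuing from the hold-out sketch above (clf, X_test and y_test are assumed from that sketch):

from sklearn.metrics import confusion_matrix

# clf, X_test, y_test come from the train/test split sketch above
y_pred = clf.predict(X_test)
cm = confusion_matrix(y_test, y_pred)  # rows = actual class, columns = predicted class
print(cm)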

(20)

Model Evaluation in Python:

Graphical Confusion Matrix
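A sketch of a graphical confusion matrix, assuming matplotlib and ConfusionMatrixDisplay from a recent scikit-learn version (clf, X_test and y_test are assumed from the earlier hold-out sketch):

import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

# Plot the confusion matrix of clf on the held-out test data
ConfusionMatrixDisplay.from_estimator(clf, X_test, y_test)
plt.show()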

(21)

Model Evaluation in Python:

Evaluation Metrics
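A sketch of the evaluation metrics, assuming scikit-learn's metrics module and the y_test/y_pred arrays from the confusion-matrix sketch above; average='macro' handles the multiclass case by treating each class in turn as the positive class:

from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, classification_report)

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, average='macro'))
print("Recall   :", recall_score(y_test, y_pred, average='macro'))
print("F1 score :", f1_score(y_test, y_pred, average='macro'))
print(classification_report(y_test, y_pred))  # per-class precision, recall and F1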

(22)

Model Evaluation in Python: Train, Validation, Test Split
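A sketch of a train/validation/test split via two chained train_test_split calls (a 60:20:20 split, tuning max_depth on the validation set; the split ratios, dataset and classifier are illustrative assumptions):

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# X, y are assumed from the earlier hold-out sketch; 60:20:20 split in two steps
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, stratify=y_temp, random_state=42)

# Tune a hyperparameter on the validation set only
best_depth, best_acc = None, 0.0
for depth in (1, 2, 3, 5, 10):
    model = DecisionTreeClassifier(max_depth=depth, random_state=42)
    model.fit(X_train, y_train)
    acc = model.score(X_val, y_val)
    if acc > best_acc:
        best_depth, best_acc = depth, acc

# The test set is touched only once, for the final evaluation
final = DecisionTreeClassifier(max_depth=best_depth, random_state=42)
final.fit(X_train, y_train)
print("Final test accuracy:", final.score(X_test, y_test))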

(23)

Model Evaluation in Python:

Stratified Repeated Hold Out
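A sketch of stratified repeated hold-out, assuming scikit-learn's StratifiedShuffleSplit (ten stratified 70:30 splits; the classifier and the data X, y from the earlier sketches are illustrative assumptions):

import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.tree import DecisionTreeClassifier

# 10 stratified random 70:30 hold-out splits; error rates are averaged at the end
sss = StratifiedShuffleSplit(n_splits=10, test_size=0.3, random_state=42)
scores = []
for train_idx, test_idx in sss.split(X, y):
    model = DecisionTreeClassifier(random_state=42)
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))

print("Mean accuracy over 10 repeats:", np.mean(scores))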

(24)

Model Evaluation in Python: k-Fold Cross-Validation
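A sketch of stratified 10-fold cross-validation, assuming scikit-learn's StratifiedKFold and cross_val_score (X, y and the decision tree are the illustrative choices from the earlier sketches):

from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Stratified 10-fold CV; cross_val_score trains and tests once per fold
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=cv)
print("10-fold CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))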

(25)

Model Evaluation in Python:

Leave-One-Out Cross-Validation
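A sketch of leave-one-out cross-validation, assuming scikit-learn's LeaveOneOut (X, y and the decision tree are illustrative); note that this builds the classifier n times, so it can be slow on large datasets:

from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Number of folds equals the number of instances: each instance is tested once
loo = LeaveOneOut()
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=loo)
print("Leave-one-out accuracy:", scores.mean())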
