Project Implementation
5.1 System Performance
After setting up and training the model, the performance of the implemented system was evaluated. The normalized testing data was used to calculate the accuracy of the implemented system. Accuracy is the fraction of predictions the classifier model got right, and it measures the algorithm's performance in an interpretable way. The test labels predicted by the trained model were compared with the original test labels to calculate the accuracy score. The training and classification process was carried out on both Visual Studio 2019 and Jupyter Notebook using the C++ and Python programming languages, as shown in Figure 5.1.1 and Figure 5.1.2. Jupyter Notebook was used to visualize the system performance for evaluation using Python.
Figure 5.1.1 Training and Classification Process on Visual Studio 2019 using C++
Figure 5.1.2 Training and Classification Process on Jupyter Notebook using Python
The accuracy score obtained on the test data was 0.95, which represents an accuracy of 95%.
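The accuracy score was obtained by comparing the labels predicted by the trained model against the original test labels. The snippet below is a minimal sketch of this step using scikit-learn; the variable names clf, X_test and y_test are illustrative placeholders for the trained SVC model and the normalized test data, not names taken from the actual implementation.

```python
from sklearn.metrics import accuracy_score

# Predict the class labels of the normalized test data with the trained model
# (clf, X_test and y_test are assumed variable names used for illustration).
y_pred = clf.predict(X_test)

# Accuracy = number of correct predictions / total number of predictions
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
```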
This shows that the implemented system achieved the defined performance target, namely that the accuracy of the system should be higher than 90%. A confusion matrix was also used to evaluate the system performance on the testing data. It is a matrix of the actual classes versus the predicted results. Figure 5.1.3 below is the confusion matrix generated from the testing data, labelled with the two classes, fighting and normal.
Figure 5.1.3 Confusion Matrix Visualization
There are four ways to check if a prediction is right or wrong:
1. True Positive - the case was positive and predicted positive.
2. True Negative - the case was negative and predicted negative.
3. False Positive (Type 1 Error) - the case was negative but predicted positive.
4. False Negative (Type 2 Error) - the case was positive but predicted negative.
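The confusion matrix in Figure 5.1.3 can be produced with scikit-learn. The sketch below is a minimal illustration, reusing the assumed y_test and y_pred variables from the accuracy snippet and labelling the axes with the two classes of this project.

```python
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# Tabulate true labels against the model's predictions
# (y_test and y_pred as in the accuracy sketch above).
cm = confusion_matrix(y_test, y_pred)

# Visualize the matrix with the class names used in this project.
disp = ConfusionMatrixDisplay(confusion_matrix=cm,
                              display_labels=["fighting", "normal"])
disp.plot(cmap="Blues")
plt.show()
```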
From the confusion matrix in Figure 5.1.3, the fighting case represents the positive class and the normal case represents the negative class. We can observe that the classifier model has a very high accuracy and very few false predictions. Only one fighting case was misclassified as a normal case; this misclassification is a False Negative (Type 2 Error). A classification report showing the classification metrics related to the confusion matrix was also generated. Table 5.1 below is the classification report for the testing data.
Table 5.1 Classification Metrics Report
Precision refers to the accuracy of the positive predictions. It is computed, for each class, as the ratio of true positives to the sum of true positives and false positives. Recall is the proportion of actual positives that are correctly detected. It is computed, for each class, as the ratio of true positives to the sum of true positives and false negatives. The F1 score is the harmonic mean of precision and recall, with 1.0 being the best value and 0.0 the worst. Support is the number of actual occurrences of the fighting and normal classes in the dataset.
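These metrics correspond to the entries of Table 5.1 and can be obtained directly from scikit-learn. The sketch below is a minimal illustration, again assuming the y_test and y_pred placeholders from the earlier snippets.

```python
from sklearn.metrics import classification_report

# Per class: Precision = TP / (TP + FP), Recall = TP / (TP + FN),
# F1 = 2 * precision * recall / (precision + recall).
print(classification_report(y_test, y_pred,
                            target_names=["fighting", "normal"]))
```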
In the report, 0 represents the fighting class and 1 represents the normal class. The fighting class has a slightly lower recall than the normal class, since the confusion matrix shows one fighting case that was misclassified as normal, i.e. a false negative for the fighting class.
Other than that, we can observe that the precision, recall and F1-score values of both the fighting and normal classes are very high. This demonstrates the validity of the implemented system: it has a high accuracy for positive predictions, a high ratio of positive samples that are correctly detected, and a high F1 score.
Next, the AUC-ROC curve was generated to further evaluate the system performance. The ROC is a probability curve, whereas the AUC represents the degree or measure of separability.
It indicates how well the model can distinguish between classes, that is, how well the model predicts fighting cases as fighting and normal cases as normal. The higher the AUC, the better the model is at distinguishing between fighting and normal events. An excellent model has an AUC close to 1, indicating a high level of separability, while an AUC approaching 0 indicates a poor model with a very low measure of separability.
The receiver operating characteristic (ROC) curve was plotted using the True Positive Rate (TPR) and the False Positive Rate (FPR), as shown in Figure 5.1.4 below. The area under the curve (AUC) was also calculated. From the ROC curve, we can observe that the orange line is very close to the conceptual optimal point with a recall value of 1, and the area under the curve is 0.95, which indicates that our system has a high recall and a low false positive rate (FPR).
Figure 5.1.4 AUC-ROC Curve
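A ROC curve of this kind can be plotted with scikit-learn. The sketch below is a minimal illustration, assuming the same illustrative clf, X_test and y_test names as before, and assuming that the fighting (positive) class is encoded as label 0, consistent with the classification report.

```python
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt

# Decision scores from the trained SVC (clf is an assumed variable name).
# For a binary SVC, decision_function returns larger values for the class
# stored in clf.classes_[1]; since the fighting class (label 0) is treated
# as the positive class here, the scores are negated.
scores = -clf.decision_function(X_test)

# TPR = TP / (TP + FN) and FPR = FP / (FP + TN) at each score threshold.
fpr, tpr, thresholds = roc_curve(y_test, scores, pos_label=0)
roc_auc = auc(fpr, tpr)

plt.plot(fpr, tpr, color="orange", label=f"ROC curve (AUC = {roc_auc:.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="Random classifier")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
```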
The learning curve for the SVC model, computed on the training and validation sets, was plotted and is shown in Figure 5.1.5 below. It shows the relationship between the training score and the test score and is used to evaluate whether the model is underfitting or overfitting. Underfitting happens when the model fails to sufficiently learn the problem, performing poorly on both the training dataset and the testing data. Overfitting happens when a model learns the training dataset too well, performing well on the training dataset but poorly on the testing data. A good fit model learns the training dataset suitably and generalizes well to the testing data.
Figure 5.1.5 Learning Curve
In Figure 5.1.5 above, the training set's accuracy score is labelled "Training Score," whereas the validation set's accuracy score is labelled "Cross-Validation Score." The SVC model's training score is substantially higher than the cross-validation score up to about 90 training instances. If the current dataset had fewer than 90 training examples, increasing the number of training instances would improve generalization. However, beyond 90 training instances, the model is unlikely to benefit significantly from additional training data.
The plotted graph shows that the trained model has a good fit, since the gap between the accuracy scores of the training and validation sets is not large.
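A learning curve like the one in Figure 5.1.5 can be produced with scikit-learn's learning_curve utility. The sketch below is a minimal illustration under the assumption that an SVC classifier is used and that the normalized training data is held in the placeholder variables X_train and y_train.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import learning_curve
from sklearn.svm import SVC

# Compute training and cross-validation accuracy for increasing training set sizes
# (X_train and y_train are assumed variable names for the normalized training data).
train_sizes, train_scores, val_scores = learning_curve(
    SVC(), X_train, y_train, cv=5,
    train_sizes=np.linspace(0.1, 1.0, 10), scoring="accuracy")

# Average the scores over the cross-validation folds before plotting.
plt.plot(train_sizes, train_scores.mean(axis=1), "o-", label="Training Score")
plt.plot(train_sizes, val_scores.mean(axis=1), "o-", label="Cross-Validation Score")
plt.xlabel("Training instances")
plt.ylabel("Accuracy score")
plt.legend()
plt.show()
```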