The signal represents approximately 1 s of data. The sample rate of the data is 51200 Hz, and the STFT window size was 0.01 s. The data consist of two classes, normal (OK) and abnormal (NG), with a normal-to-abnormal ratio of 95:5, which makes the data set imbalanced. Figure 14 shows the result of signal-to-image conversion using the STFT.
Figure 14 The sound signal and STFT
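As a rough sketch of the signal-to-image conversion described above, the following Python snippet computes a log-magnitude STFT using the stated 51200 Hz sample rate and 0.01 s window (512 samples). The window overlap and any cropping or resizing needed to reach the 256 × 218 input size are not specified in the text, so SciPy defaults are assumed here.

```python
import numpy as np
from scipy.signal import stft

FS = 51200        # sample rate stated in the text (Hz)
WIN_SEC = 0.01    # STFT window length stated in the text (s)

def signal_to_stft_image(x, fs=FS, win_sec=WIN_SEC):
    """Convert a ~1 s sound signal to a log-magnitude STFT image.

    Window overlap and the cropping/resizing used to reach the paper's
    256 x 218 input size are not specified, so defaults are assumed.
    """
    nperseg = int(fs * win_sec)                 # 512 samples per window
    _, _, Z = stft(x, fs=fs, nperseg=nperseg)   # Z: (freq bins, time frames)
    return 20 * np.log10(np.abs(Z) + 1e-10)     # magnitude in dB

# Example with a synthetic 1 s signal standing in for a real recording
image = signal_to_stft_image(np.random.randn(FS))
print(image.shape)
```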
In this study, to verify the performance of the classification model, we compared the performance of a support vector machine (SVM), a tree-based model, and a neural network (NN), which are widely used classification models. The training and test data were selected at random in a 3:1 ratio. We used the STFT image as the input for all models, and the size of the input image was 256 × 218.
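A minimal sketch of the 3:1 random split described above, using scikit-learn's train_test_split. The arrays below are dummy stand-ins for the real data, and the random seed and whether the split was stratified are assumptions, since the text only states that the split is random.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy stand-ins for the real data: flattened 256 x 218 STFT images and
# labels (0 = OK, 1 = NG) drawn with roughly the 95:5 class ratio.
rng = np.random.default_rng(0)
X = rng.random((200, 256 * 218), dtype=np.float32)
y = (rng.random(200) < 0.05).astype(int)

# 3:1 random split as described in the text; random_state is arbitrary.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
print(X_train.shape, X_test.shape)
```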
Over-sampling
Speaker production data are used in the experiment; however, there is a class imbalance between OK and NG in these data. This can be a major problem for the optimization of the model [47].
We oversampled the NG data to overcome the imbalance issue [48]. By repeating the NG data, we ensure that it contributes multiple times to the loss function. The NG data in the test set is not repeated.
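A minimal sketch of the repetition-based over-sampling, applied to the training set only. The repetition factor is not given in the text, so repeating NG until it roughly balances the OK class is an assumption.

```python
import numpy as np

def oversample_by_repetition(X_train, y_train, ng_label=1, factor=None):
    """Repeat NG (minority) samples so they appear multiple times in the
    training set and therefore contribute more to the loss function.

    The repetition factor is not given in the text; if None, the NG class
    is repeated until it roughly matches the OK class (an assumption).
    Applied to the training data only -- the test set is left untouched.
    """
    ng_idx = np.where(y_train == ng_label)[0]
    ok_idx = np.where(y_train != ng_label)[0]
    if factor is None:
        factor = max(1, len(ok_idx) // max(1, len(ng_idx)))
    idx = np.concatenate([ok_idx, np.repeat(ng_idx, factor)])
    np.random.shuffle(idx)
    return X_train[idx], y_train[idx]

# Example with the dummy split from the previous sketch:
# X_train_os, y_train_os = oversample_by_repetition(X_train, y_train)
```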
Performance Comparison
Support Vector Machine
The SVM is a classification algorithm suited to classifying high-dimensional nonlinear data. We used the radial basis function (RBF) kernel as the SVM kernel and set the margin parameter to five. The left side of Table 2 shows the confusion matrix of the SVM when the input is the raw signal. The overall accuracy is 94.43%, but the SVM classifies every input as OK regardless of its content. This is the biggest problem with imbalanced data: the model is unable to reflect the small portion of the data. When the input is the STFT image, the accuracy of the SVM is 93.35%, and the corresponding confusion matrix is shown on the right side of Table 2. Although the overall accuracy is as high as 93.35%, the confusion matrix indicates that the accuracy for NG is low, whereas the accuracy for OK is high.
Table 2: Confusion matrix of SVM

                  Signal                 STFT image
            Model OK   Model NG     Model OK   Model NG
True OK        1.0        0.0          0.97       0.03
True NG        1.0        0.0          0.70       0.30
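The SVM setup above might look as follows in scikit-learn. Interpreting the "margin parameter" as the regularization parameter C, and standardizing the flattened STFT images first, are assumptions not spelled out in the text.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# RBF-kernel SVM with the margin parameter set to five; mapping that
# parameter to scikit-learn's C is an assumption.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=5.0))

# Usage (with the oversampled training data from the sketches above):
# svm.fit(X_train_os, y_train_os)
# print(svm.score(X_test, y_test))
```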
Neural Network
The NN is the most basic of the various deep learning models. The input data are the same as those used for the SVM. The NN in this study consists of two hidden layers, each with 100 nodes, and the ReLU function is used as the activation function. The left side of Table 3 shows the result of the NN when the input is the raw signal. The overall accuracy is 48.42%, from which it can be inferred that the NN is not properly optimized; the complexity of the proposed NN is insufficient to find a pattern in the raw signal form. The right side of Table 3 shows the result when the input is the STFT image. In this case, the NN identifies the pattern correctly, and the accuracy is 95.84%.
Table 3: Confusion matrix of NN

                  Signal                 STFT image
            Model OK   Model NG     Model OK   Model NG
True OK        0.48       0.52         0.97       0.03
True NG        0.40       0.60         0.42       0.58
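A sketch of the NN described above (two hidden layers of 100 nodes with ReLU) using scikit-learn's MLPClassifier. The optimizer, learning rate, and number of training epochs are not stated, so library defaults are assumed.

```python
from sklearn.neural_network import MLPClassifier

# Fully connected NN as described: two hidden layers of 100 nodes with
# ReLU activations; remaining settings are scikit-learn defaults.
nn = MLPClassifier(hidden_layer_sizes=(100, 100), activation="relu",
                   random_state=0)
# nn.fit(X_train_os, y_train_os)
# print(nn.score(X_test, y_test))
```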
Random Forest
The random forest is an ensemble method that trains a number of randomly varied decision trees. By combining several models, it is possible to increase performance and reduce overfitting. The random forest used in this study consists of 20 base trees, and the maximum depth of each tree is 10. The random forest also takes both the raw signal and the STFT image as input. The accuracy was 96.01% with the signal input and 94.85% with the STFT image. Although the accuracy with the signal is higher, the confusion matrices show that both cases still suffer from the imbalance problem.
Table 4: Confusion Matrix of Random Forest

                  Signal                 STFT image
            Model OK   Model NG     Model OK   Model NG
True OK        0.99       0.01         0.99       0.01
True NG        0.40       0.60         0.57       0.43
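The random forest configuration above (20 base trees, maximum depth 10) can be sketched as follows; all other hyperparameters are assumed to be scikit-learn defaults.

```python
from sklearn.ensemble import RandomForestClassifier

# Random forest as described: 20 base trees, maximum depth of 10.
rf = RandomForestClassifier(n_estimators=20, max_depth=10, random_state=0)
# rf.fit(X_train_os, y_train_os)
# print(rf.score(X_test, y_test))
```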
Convolutional Neural Network
We used the CNN architecture described in Figure 13. Because the CNN is specialized for analyzing image-type inputs, only the case where the input is given as an STFT image was considered. The overall accuracy is the highest at 97.42%, and the NG accuracy in the confusion matrix is also high. The accuracy for OK is slightly lower than that of the random forest, but the accuracy for NG is clearly higher. In this problem, catching NG products reliably is the more important goal, since defective products must not be passed on to customers.
Table 5: Confusion Matrix of CNN

            Model OK   Model NG
True OK        0.98       0.02
True NG        0.20       0.80
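The actual architecture is the one in Figure 13 and is not reproduced here; the PyTorch sketch below is only a placeholder with assumed layer sizes, ending in global average pooling and a linear layer so that the class activation map in the next subsection can be computed from it.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Placeholder CNN for 1-channel 256 x 218 STFT images.

    The real architecture is the one in Figure 13; these layer sizes are
    illustrative assumptions only. The model ends in global average
    pooling (GAP) plus a linear layer so that a CAM can be derived.
    """
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.gap = nn.AdaptiveAvgPool2d(1)   # global average pooling
        self.fc = nn.Linear(64, n_classes)   # its weights feed the CAM

    def forward(self, x):
        fmap = self.features(x)                    # (N, 64, H', W')
        logits = self.fc(self.gap(fmap).flatten(1))
        return logits, fmap

model = SmallCNN()
logits, fmap = model(torch.randn(1, 1, 256, 218))
print(logits.shape, fmap.shape)
```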
Table 6 compares the overall accuracy and the NG accuracy of each methodology. The CNN shows the highest overall accuracy. For the classification of NG data, the other models remain close to 50%, whereas the CNN reaches almost 80%.
Table 6: Performance Comparison by Methodology

                     SVM     Neural Network   Random Forest    CNN
NG accuracy          30%          58%              43%          80%
Overall accuracy    93.35%       95.84%           94.85%       97.42%
Class Activation Map
The class activation map (CAM) is an additional layer used to interpret the classification results. The CAM assigns high scores to the areas on which the model focuses, so we can interpret the model's response to NG by comparing these scores with the original image.
Figure 15 shows an example of the CAM results on the input image. A high score is given to the parts of the image that the model judges most likely to be NG. For the NG data, it can be seen that the problematic areas are highlighted in red. Using this information, we can discover unexpected but important features for distinguishing NG from OK.
Figure 15 Class Activation Map for OK and NG
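A minimal NumPy sketch of the CAM computation, assuming a network that ends in global average pooling followed by a linear classifier (as in the placeholder CNN above). The feature maps and target-class weights would come from the trained model; the random arrays here are only stand-ins.

```python
import numpy as np

def class_activation_map(fmap, class_weights):
    """Weighted sum of conv feature maps, as in the original CAM idea.

    fmap:          feature maps for one image, shape (C, H, W)
    class_weights: final-layer weights for the target class (e.g. NG),
                   shape (C,)
    Returns an (H, W) map; high values mark regions the model relies on.
    """
    cam = np.tensordot(class_weights, fmap, axes=([0], [0]))  # (H, W)
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()   # normalize to [0, 1] for visualization
    return cam             # upsample to 256 x 218 before overlaying

# Random stand-ins for a trained model's outputs
cam = class_activation_map(np.random.rand(64, 64, 54), np.random.rand(64))
print(cam.shape)
```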
In the NG example in Figure 16, it can be observed that the deep learning model is capable of finding NG patterns that occur in various parts of the image. In particular, the STFT image used as the input in this study contains time information on the x-axis and frequency information on the y-axis. Thus, it is easy to see which times and frequencies are important when classifying the NG.
Figure 16 Example of an NG signal