
The bidirectional RNN model performed better than the other models. However, the per-person classification performance in Table 8 shows that, prior to data augmentation, the accuracy was 100% for some people but only 73% for others. To apply the model in real-world environments, more robust results are essential. The difference in performance is caused by physical discrepancies among people. We addressed this issue by applying preprocessing and data augmentation techniques, after which the accuracy for the person with the lowest accuracy increased from 73% to 87%.
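The per-person comparison described above can be sketched as follows. This is a minimal illustration only: the function names, the record layout, and the sample predictions are assumptions made for the example and are not taken from the study.

```python
from collections import defaultdict

def per_person_accuracy(records):
    """Map each person ID to the fraction of correctly classified samples.

    `records` is an iterable of (person_id, true_label, predicted_label).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for person, truth, predicted in records:
        total[person] += 1
        correct[person] += int(truth == predicted)
    return {person: correct[person] / total[person] for person in total}

def accuracy_spread(per_person):
    """Gap between the best and the worst per-person accuracy."""
    return max(per_person.values()) - min(per_person.values())

# Hypothetical predictions for two people (not data from the study).
records = [
    ("p1", "lift", "lift"), ("p1", "walk", "walk"),
    ("p2", "lift", "walk"), ("p2", "walk", "walk"),
]
scores = per_person_accuracy(records)
print(scores, accuracy_spread(scores))
```

A smaller spread after preprocessing and augmentation would indicate that the model has become less sensitive to differences between individuals.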

Figure 27(a) shows the classification performance of the bidirectional RNN without data augmentation; the overall motion classification accuracy is 89.23%. Figure 27(b) shows the performance of the bidirectional RNN after data augmentation, where the overall accuracy is 96.35%. The performance is therefore higher when data augmentation is performed.

Table 9: Performance Comparison by Methodology

| Model | Non-augmented | Augmented |
|---|---|---|
| SVM | 77.32% | - |
| Neural Network | 73.46% | 79.40% |
| RNN | 35.72% | 35.12% |
| LSTM RNN | 86.89% | 93.42% |
| Bidirectional RNN | 89.23% | 96.35% |

Table 9 compares the overall accuracy of each methodology. Although the standard RNN is a deep learning model, it exhibited the lowest performance due to structural problems, whereas the LSTM RNN and the bidirectional RNN achieved the highest accuracies. Data augmentation is executed during the learning stage of the deep learning models. Therefore, we compared the pre- and post-augmentation results of the NN, RNN, LSTM RNN and bidirectional RNN. For the standard RNN, the performance was poor regardless of whether data augmentation was applied, due to the vanishing gradient problem. The performance of the other models improved with data augmentation.
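The effect of augmentation reported in Table 9 can be expressed as a difference in percentage points per model. The short sketch below does this arithmetic; the accuracies are copied from Table 9, while the dictionary and function names are assumptions introduced for illustration.

```python
# Overall accuracies (%) copied from Table 9: (non-augmented, augmented).
TABLE_9 = {
    "Neural Network": (73.46, 79.40),
    "RNN": (35.72, 35.12),
    "LSTM RNN": (86.89, 93.42),
    "Bidirectional RNN": (89.23, 96.35),
}

def augmentation_gain(table):
    """Return the accuracy change in percentage points for each model."""
    return {model: round(aug - base, 2) for model, (base, aug) in table.items()}

gains = augmentation_gain(TABLE_9)
for model, gain in gains.items():
    print(f"{model}: {gain:+.2f} percentage points")
```

With these values the gains are +5.94 (Neural Network), -0.60 (RNN), +6.53 (LSTM RNN) and +7.12 (bidirectional RNN) percentage points, which matches the observation that every model except the standard RNN benefits from augmentation.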


Performance fluctuations between different people were found across all models. These fluctuations can be attributed to the physical characteristics of each individual.

Data augmentation was applied to address this issue, and the results showed that the performance fluctuation was reduced. With data augmentation, the accuracy of the LSTM RNN increased from 86.39% to 93.42%, and the accuracy of the bidirectional RNN increased from 89.23% to 96.35%. The standard RNN tends to perform poorly because vanishing gradients prevent the model from training properly. In conclusion, the bidirectional RNN performed better than the other models, and the performance fluctuation could be reduced through data augmentation.
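The vanishing gradient effect mentioned above can be illustrated with a scalar toy model. The sketch assumes a one-unit RNN h_t = tanh(w * h_{t-1} + x_t); the weight, the inputs, and the function name are hypothetical and are not the configuration used in the study.

```python
import math

def gradient_through_time(w, inputs, h0=0.0):
    """Return |d h_T / d h_0| for the scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h = h0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
    return abs(grad)

# Hypothetical recurrent weight and constant input sequences.
short_seq = gradient_through_time(0.9, [0.5] * 5)
long_seq = gradient_through_time(0.9, [0.5] * 50)
print(short_seq, long_seq)
```

Because every factor has a magnitude of at most |w|, the gradient over T steps is bounded by |w|^T when |w| < 1, so the contribution of early time steps shrinks exponentially with sequence length. Gated architectures such as the LSTM are designed to mitigate this.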


6 CONCLUSION AND FUTURE RESEARCH

Deep learning models extract features automatically, but deciding how to present the data to the model remains an important issue. In this study, we used a rich time-frequency representation of the sound signal, obtained by applying the short-time Fourier transform (STFT). The STFT of each sound signal was used as the input to the CNN model. As a result, we were able to improve the classification accuracy and reduce the false positive rate.
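As a reminder of what the STFT input looks like, the following minimal sketch computes a magnitude spectrogram with a Hann window and a naive DFT using only the standard library. The frame length, hop size, test tone, and function name are assumptions chosen for illustration; the study's actual STFT parameters are not restated here, and a real pipeline would normally use an FFT routine.

```python
import cmath
import math

def stft_magnitude(signal, frame_len, hop):
    """Return a list of frames, each a list of DFT magnitudes (bins 0..frame_len/2)."""
    # Periodic Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT of a single bin; an FFT would be used in practice.
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            spectrum.append(abs(coeff))
        frames.append(spectrum)
    return frames

# Hypothetical test tone: a 16 Hz sine sampled at 64 Hz for one second.
sample_rate = 64
tone = [math.sin(2 * math.pi * 16 * n / sample_rate) for n in range(sample_rate)]
spectrogram = stft_magnitude(tone, frame_len=16, hop=8)
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in spectrogram]
```

Each row of `spectrogram` is one time frame and each column one frequency bin, so the result can be treated as a two-dimensional image and passed to a CNN. For the test tone, the energy concentrates in bin 4, which corresponds to 16 Hz at this frame length and sample rate.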

However, the biggest disadvantage of deep learning is that it is a black-box model that people cannot interpret. Adding a class activation mapping (CAM) layer helps to address this reliability issue. The CAM results indicate which parts of the input are important features, and these features can be used as prior information when creating a new classification model for similar systems.
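In the standard CAM formulation, the map for a class is the sum of the final convolutional feature maps weighted by that class's output-layer weights. The sketch below shows this weighted sum on tiny hand-written arrays; the function name, feature maps, and weights are hypothetical, and the exact CAM configuration used in the study is not assumed.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum over channels: cam[y][x] = sum_k w_k * f_k[y][x]."""
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for weight, fmap in zip(class_weights, feature_maps):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam

# Hypothetical 2-channel, 2x2 feature maps and the weights for one class.
feature_maps = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
cam = class_activation_map(feature_maps, class_weights=[0.5, 1.5])
print(cam)
```

Locations with large values in `cam` are the regions that contribute most to the class score, which is how the map points to the parts of the spectrogram the model relies on.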

In future studies, it will be necessary to propose a model that can incorporate the feature information acquired from similar previous models. In addition, it will be necessary to create a model that can identify the cause of a problem from the regions on which the model focuses, so that recurrence can be prevented.

For motion recognition, a bidirectional RNN was applied to analyze the motions. The results showed that the bidirectional RNN performs better than the other models. However, the diversity of physical characteristics induces performance fluctuations between different people. Data augmentation was applied to address this issue. As a result, we were able to improve the overall classification performance and reduce the performance fluctuation between individuals.
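The idea behind a bidirectional RNN can be sketched with two scalar recurrent passes, one reading the sequence forwards and one backwards, whose states are paired per time step. The weights, inputs, and function names below are assumptions for illustration and do not reproduce the network used in the study.

```python
import math

def rnn_pass(inputs, w_in, w_rec):
    """Run a scalar tanh RNN over `inputs` and return the state at every step."""
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

def bidirectional_states(inputs, fwd=(0.5, 0.8), bwd=(0.5, 0.8)):
    """Pair the forward state with the backward state for each time step."""
    forward = rnn_pass(inputs, *fwd)
    # The backward pass reads the sequence in reverse; its states are
    # reversed again so that index t refers to the same time step.
    backward = rnn_pass(inputs[::-1], *bwd)[::-1]
    return list(zip(forward, backward))

# Hypothetical sequence whose only non-zero input arrives at the last step.
states = bidirectional_states([0.0, 0.0, 1.0])
print(states)
```

At the first time step the forward state is still zero, whereas the backward state already reflects the input that arrives at the end of the sequence. This access to both past and future context at every step is what distinguishes the bidirectional model from a one-directional RNN.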

For future studies, we will carry out practical studies to monitor the workers. First, we will quantify the work efficiency in order to calculate the cycle time. Second, we will develop an abnormal action detection model. This model is necessary to improve the production efficiency and protect the physical health of the workers.


