The bidirectional RNN model outperformed the other models. However, the per-person classification
results in Table 8 show that, before data augmentation, the accuracy was 100% for some people but
only 73% for others. To apply the model in real-world environments, it must produce more robust
results. We attribute this difference in performance to physical differences among people. We
addressed the issue with preprocessing and data augmentation, after which the accuracy for the
person with the lowest score rose from 73% to 87%.
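This section does not state which augmentation was used. As one illustrative possibility only, the sketch below simulates differences in body size by scaling every joint of a skeleton frame about a root joint. The frame layout (a list of (x, y, z) joints), the function names `scale_skeleton` and `augment_sequence`, and the scale range are all assumptions made for this example, not the method of the study.

```python
import random

def scale_skeleton(frame, factor, root_index=0):
    """Scale all joints of one frame about the root joint by `factor`."""
    rx, ry, rz = frame[root_index]
    return [
        (rx + factor * (x - rx), ry + factor * (y - ry), rz + factor * (z - rz))
        for (x, y, z) in frame
    ]

def augment_sequence(sequence, n_copies=3, scale_range=(0.8, 1.2), seed=None):
    """Return the original sequence plus n_copies uniformly rescaled copies.

    The scale range is a hypothetical choice; each copy uses one factor for
    all frames so the simulated body size stays constant within a motion.
    """
    rng = random.Random(seed)
    augmented = [sequence]
    for _ in range(n_copies):
        factor = rng.uniform(*scale_range)
        augmented.append([scale_skeleton(frame, factor) for frame in sequence])
    return augmented
```

Using a single factor per copy keeps the motion itself unchanged and varies only the apparent size of the person, which is the kind of variation the paragraph above attributes the accuracy gap to.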
Figure 27(a) shows the classification performance of the bidirectional RNN without data
augmentation; the overall motion classification accuracy is 89.23%. Figure 27(b) shows the
performance of the bidirectional RNN after data augmentation; the overall accuracy is 96.35%.
The performance is therefore higher when data augmentation is performed.
Table 9: Performance Comparison by Methodology (accuracy)
Model               Non-augmented   Augmented
SVM                 77.32% (single value; not split by augmentation)
Neural Network      73.46%          79.40%
RNN                 35.72%          35.12%
LSTM RNN            86.89%          93.42%
Bidirectional RNN   89.23%          96.35%
Table 9 compares the overall accuracy of each methodology. Although the standard RNN is a deep
learning model, it had the lowest accuracy because of structural problems, whereas the LSTM RNN
and the bidirectional RNN performed well. Data augmentation is applied during the training stage
of the deep learning models, so we compared the pre- and post-augmentation results of the NN,
RNN, LSTM RNN and bidirectional RNN. The RNN performed poorly whether or not data augmentation
was applied, because of the vanishing gradient problem. The other models improved with
augmentation.
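The vanishing gradient effect can be illustrated with a scalar tanh RNN, h_t = tanh(w * h_{t-1} + x_t). The gradient of the final state with respect to the initial state is a product of per-step factors w * (1 - h_t^2), each smaller than 1 in magnitude when |w| <= 1, so it shrinks roughly geometrically with sequence length. The weight, inputs, and function name below are hypothetical values chosen for illustration, not parameters of the models in this study.

```python
import math

def gradient_through_time(w, inputs, h0=0.0):
    """Return d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h, grad = h0, 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)  # chain rule factor: d h_t / d h_{t-1}
    return grad

# The gradient magnitude drops quickly as the sequence gets longer.
for steps in (1, 10, 50):
    g = gradient_through_time(w=0.9, inputs=[0.5] * steps)
    print(f"{steps:>2} steps: gradient = {g:.3e}")
```

Gated architectures such as the LSTM are designed to limit this decay, which is consistent with the gap between the RNN and LSTM RNN rows of Table 9.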
Performance fluctuations between different people were found across all the models. These
fluctuations can be attributed to the physical characteristics of each individual. Data
augmentation was applied to address this issue, and the results showed that the fluctuation was
reduced. With data augmentation, the accuracy of the LSTM RNN increased from 86.39% to 93.42%,
and the accuracy of the bidirectional RNN increased from 89.23% to 96.35%. The standard RNN
performed poorly because vanishing gradients prevented the model from training properly. In
conclusion, the bidirectional RNN outperformed the other models, and the performance fluctuation
was reduced through data augmentation.
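One way to quantify the performance fluctuation between people is the gap between the best and worst per-person accuracy. The sketch below assumes predictions are available as (person_id, true_label, predicted_label) records; this record layout and the function names are hypothetical and only illustrate the calculation.

```python
from collections import defaultdict

def per_person_accuracy(records):
    """records: iterable of (person_id, true_label, predicted_label)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for person, truth, predicted in records:
        total[person] += 1
        correct[person] += int(truth == predicted)
    return {person: correct[person] / total[person] for person in total}

def accuracy_spread(accuracy_by_person):
    """Gap between the best and worst per-person accuracy."""
    values = list(accuracy_by_person.values())
    return max(values) - min(values)
```

Comparing `accuracy_spread` before and after augmentation gives a single number for how much the fluctuation was reduced.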
6 CONCLUSION AND FUTURE RESEARCH
Deep learning models extract features automatically, but how the data is presented to the model
remains an important issue. In this study, we used a rich time-frequency representation of the
sound signal, obtained by applying the short-time Fourier transform (STFT). The STFT of each sound
signal was used as the input to the CNN model. As a result, we were able to improve the
classification accuracy and reduce the false positive rate.
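As a rough illustration of how an STFT turns a one-dimensional signal into a two-dimensional time-frequency input, the sketch below splits the signal into overlapping frames, applies a Hann window, and takes the magnitude of a direct DFT of each frame. The frame length, hop size, window choice, and test tone are assumptions made for this example; the parameters used in the study are not given in this section.

```python
import cmath
import math

def stft_magnitude(signal, frame_len=64, hop=32):
    """Return a list of magnitude spectra, one per Hann-windowed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # keep non-negative frequencies only
    spectrogram = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = [
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(n_bins)
        ]
        spectrogram.append(spectrum)
    return spectrogram

# Hypothetical test tone with 8 cycles per 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = stft_magnitude(tone)
print(len(spec), "frames x", len(spec[0]), "bins")
```

The direct DFT keeps the example dependency-free; a practical pipeline would use an FFT routine instead. The resulting frames-by-bins array is the kind of image-like input a CNN can consume.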
However, the biggest disadvantage of deep learning is that it is a black-box model that people
cannot interpret. Adding a CAM layer helps address this reliability issue. We can use the CAM
results to determine which parts of the input are important features, and these features can
serve as prior information when creating a new classification model for similar systems.
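In the commonly used formulation of class activation mapping, the map for a class is the weighted sum of the final convolutional feature maps, with the weights taken from the dense layer that follows global average pooling. The sketch below assumes that formulation and uses plain nested lists; the function names and data layout are hypothetical and are not taken from the implementation in this study.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of feature maps: M[i][j] = sum_k w[k] * F[k][i][j]."""
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * fmap[i][j]
    return cam

def normalize(cam):
    """Rescale to [0, 1] so the map can be overlaid on the input spectrogram."""
    flat = [value for row in cam for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in cam]
    return [[(value - lo) / (hi - lo) for value in row] for row in cam]
```

Regions of the normalized map close to 1 are the time-frequency areas that contributed most to the predicted class.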
In future studies, a model should be proposed that can incorporate the feature information
acquired from similar previous models. In addition, a model is needed that can identify the cause
of a problem from the region on which the model concentrates, in order to prevent recurrence.
For motion recognition, the bidirectional RNN was applied to analyze the motions. The results
showed that the bidirectional RNN outperforms the other models. However, the diversity of
physical characteristics causes performance fluctuations between different people. Data
augmentation was applied to address this issue. As a result, we were able to improve the overall
classification performance and reduce the performance fluctuation between individuals.
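The idea behind a bidirectional RNN can be sketched with two scalar tanh RNNs, one reading the sequence forward and one reading it backward, whose states are paired at each time step so that every step sees both past and future context. The weights and function names below are hypothetical, and the sketch omits training and the output layer.

```python
import math

def rnn_pass(inputs, w_in, w_rec):
    """Run a scalar tanh RNN over inputs and return the hidden state per step."""
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

def bidirectional_states(inputs, fwd=(1.0, 0.5), bwd=(1.0, 0.5)):
    """Pair each step's forward state with its backward state."""
    forward = rnn_pass(inputs, *fwd)
    backward = rnn_pass(inputs[::-1], *bwd)[::-1]  # re-align to input order
    return list(zip(forward, backward))
```

For a motion sequence, the backward state at the first frame already summarizes the rest of the motion, which a forward-only RNN cannot provide.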
In future work, we will carry out practical studies on monitoring workers. First, we will
calculate the cycle time in order to quantify work efficiency. Second, we will develop an
abnormal action detection model. This model is needed to improve production efficiency and
protect the physical health of the workers.