
Copyright © 2021, Ida Bagus Kresna Sudiatmika. This is an open access article distributed under the Creative Commons Attribution License.

Indonesian Shadow Puppet Recognition Using VGG-16 and Cosine Similarity

Ida Bagus Kresna Sudiatmika1,*, I Gusti Ayu Agung Sari Dewi2

1 Information Technology Department, STMIK Primakara, Denpasar, Indonesia

2 Accounting Information System, STMIK Primakara, Denpasar, Indonesia
Email: 1,*[email protected], 2[email protected]

Corresponding Author: [email protected]

Submitted: 16/11/2020; Accepted: 14/03/2021; Published: 29/03/2021

Abstract−Wayang is one of Indonesia's cultural heritages and must be preserved. Wayang serves various functions: it is not only a means of art but is also used in religious ceremonies in Indonesia. It is therefore very important to preserve the wayang tradition so that it remains well known to the public, especially young people. In this research, to participate in the preservation of wayang, the authors propose a preservation approach using artificial intelligence and web scraping techniques. Artificial intelligence, in this case, carries out the process of recognizing wayang images and finding information about the recognized puppet. For the information search, web scraping techniques are used, in which information is automatically retrieved from the internet and then processed using the TF-IDF method and cosine similarity. This study uses an existing CNN architecture, namely VGG-16. The training process is carried out with 200 iterations and reaches an accuracy of 89%. The CNN architecture and web scraping have been successfully applied and will be developed further in the future to obtain maximum results.

Keywords: Wayang; Convolutional Neural Network; Web Scraping; TF-IDF; Cosine Similarity

1. INTRODUCTION

Indonesia is rich in culture that should be preserved. Each region in Indonesia has a culture with its own characteristics; one of the cultures that has long been inherent in Indonesia is Wayang, or shadow puppetry. Shadow puppetry is one of Indonesia's traditional arts, usually performed at artistic or religious events. A shadow puppet performance tells a story that describes human character; puppet stories also tell of heroism, in which good characters fight and eradicate evil ones. The popularity of shadow puppetry in Indonesia and even worldwide has led UNESCO to record it as a work of art. Shadow puppetry is an extraordinary art form whose values range from spirituality, philosophy of life, ethics, and music to complex art. Because the art of shadow puppetry is so popular, in this study the authors try to participate in preserving it. With current technological developments, it is possible to preserve shadow puppetry using technology, so the authors combine the art of shadow puppetry with technology for preservation. Of the many algorithms and methods in computer science, the authors use two in this study: deep learning and web scraping.

In previous research, there were several studies that applied pattern recognition to Shadow puppet as a case study.

Kristian Adi Nugraha et al. used a backpropagation algorithm with edge detection to perform shadow puppet pattern recognition; this technique is still traditional because feature extraction and recognition are separate processes [1], and the result depends heavily on the output of the edge detection. Backpropagation is a long-established artificial neural network whose computational capacity is limited. Similar research was carried out by Muhathir et al., who used SVM and GLCM for pattern recognition [2]. This study aims to extend that earlier work using deep learning, which is very popular today and is expected to recognize shadow puppet patterns better than previous approaches. One of the novelties of this study is displaying the story of each puppet automatically; this matters because each shadow puppet character has its own story, which is very important to know.

Deep Learning is one of the newest methods in machine learning [3]. It consists of neural networks with several hidden layers and applies nonlinear transformations and high-level model abstractions to very large datasets. Much research has been done with deep learning, which has made it very popular: it has solved problems such as image classification, voice recognition, and object detection [4], and it provides end-to-end techniques that do not require a separate feature-extraction step before recognition [5]. The deep learning algorithm used in this research is the Convolutional Neural Network (CNN) [6], which performs image object recognition in this study. A CNN is a type of artificial neural network containing multiple convolution layers; it consists of several very important layers, namely the convolution layer, the pooling layer, and the fully connected layer [7]. CNNs are able to learn very complex features and functions that represent a high level of abstraction, and the CNN architecture has proven itself in object recognition, sentence recognition, text classification, multivariate time series analysis, and health and biological image classification.

Apart from CNN, the authors also use web scraping. Web scraping is a technique for retrieving semi-structured documents from web pages written in HTML and XHTML. In this research, web scraping automates information retrieval from a web page, where the information retrieved is determined by the output category of the CNN. Much research has been done with web scraping, and it has been successfully applied.


The purpose of this research is to preserve shadow puppetry through recognition of the objects in puppet images. With this recognition, it is hoped that the community, especially the younger generation, will easily recognize a puppet's character, keep learning the characters of shadow puppetry, and know each puppet's story, thereby helping to prevent the extinction of Indonesian puppetry.

In addition, this study also aims to test deep learning for recognizing shadow puppet objects and to use web scraping to retrieve information about shadow puppets. This paper consists of a research methodology section that describes the research steps, a results and discussion section that explains the experimental results, and conclusions.

2. RESEARCH METHODOLOGY

This research takes several steps: collecting the dataset, determining the CNN architecture, testing the architecture, and integrating the model's predictions with web scraping.

2.1 Data Collection

The dataset is the data used to train and test the model. The dataset used in this research contains several types of puppets, namely Yudistira, Bima, Nakula, Arjuna, Drona, and Gatot Kaca puppets. Each image has dimensions of 224x224 (RGB), and each puppet class has 200 images, so the total dataset in this study is 1200 images.
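The paper does not publish its data-loading code; the following is a minimal sketch of how such a six-class, folder-per-class dataset could be listed and split for training and validation. The directory layout, class-folder names, and 80/20 split ratio are assumptions, not taken from the paper.

```python
import os
import random

# Hypothetical layout: one sub-directory per puppet class.
# Folder names are assumed; the class list follows the paper's six puppets.
CLASSES = ["yudistira", "bima", "nakula", "arjuna", "drona", "gatot_kaca"]

def list_dataset(root, split=0.8, seed=42):
    """Collect (path, label) pairs from class folders and split train/validation."""
    pairs = []
    for label, cls in enumerate(CLASSES):
        cls_dir = os.path.join(root, cls)
        for name in sorted(os.listdir(cls_dir)):
            pairs.append((os.path.join(cls_dir, name), label))
    # Shuffle deterministically, then cut into train and validation lists.
    random.Random(seed).shuffle(pairs)
    cut = int(len(pairs) * split)
    return pairs[:cut], pairs[cut:]
```

With the paper's 200 images per class, this would yield 1200 pairs in total before splitting.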

2.2 Web Scraping

In this research, web scraping is used to retrieve information related to the model's predictions [16]–[17]. Web scraping retrieves text related to the predicted class, making it easier to find and collect text data. Figure 1 describes the use of web scraping in this study.
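The paper does not list its scraper, so the HTML-extraction step can only be sketched. The snippet below shows one way to pull paragraph text out of a fetched page using only Python's standard library; in the actual system the page would first be retrieved from the internet using a query built from the CNN's predicted label (the fetching step is omitted here).

```python
from html.parser import HTMLParser

class ParagraphScraper(HTMLParser):
    """Collect the text content of every <p> element on an HTML page."""
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            # Accumulate text, including text inside nested inline tags.
            self.paragraphs[-1] += data

def scrape_paragraphs(html):
    """Return the non-empty paragraph texts found in an HTML string."""
    s = ParagraphScraper()
    s.feed(html)
    return [p.strip() for p in s.paragraphs if p.strip()]
```

The extracted paragraphs would then feed the TF-IDF and cosine similarity steps described later.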

Figure 1. CNN and Web Scraping

2.3 CNN Architecture

The architecture used in this study is the VGG-16 model, which performs the wayang image recognition process.

We implemented the pre-trained VGG-16 without its top layers for feature extraction, followed by a fully connected network (MLP) for classification. The model consists of convolutional layers, pooling layers, and fully connected layers; the pooling layers reduce the number of parameters used in computation [8]. VGG-16 is a model with a very deep architecture created by the Visual Geometry Group (VGG) at Oxford University [9]. The model was tested in the 2014 ILSVRC, trained on roughly 1,000,000 ImageNet images. The convolution layers in this model use the ReLU activation function, while the output layer uses the softmax activation function to estimate the probability of each class/label. Equation 1 shows ReLU and equation 2 shows softmax.

f(x) = max(0, x)                          (1)

σ(z)_i = e^{z_i} / Σ_{k=1}^{K} e^{z_k}    (2)
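Equations (1) and (2) can be implemented directly; the small sketch below uses only the standard library (the max-subtraction in softmax is a standard numerical-stability trick, not something stated in the paper).

```python
import math

def relu(x):
    # Equation (1): f(x) = max(0, x)
    return max(0.0, x)

def softmax(z):
    # Equation (2): sigma(z)_i = exp(z_i) / sum_{k=1}^{K} exp(z_k)
    m = max(z)                       # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]
```

The softmax output sums to 1 and can be read as class probabilities, which is how the VGG-16 output layer assigns a label.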

We also use dropout regularization to prevent overfitting. In general, deep neural networks require a lot of training data to learn a representation of the data; if the dataset is small, several techniques can be used, one of which is transfer learning. The model we use is a VGG-16 pretrained on roughly 1,000,000 ImageNet images. Although that VGG-16 model was not trained to classify shadow puppets, in this study the authors classify shadow puppets with transfer learning [10]. Previous research on shadow puppet pattern recognition used backpropagation with edge detection for feature extraction [11], but that approach was still conventional because the image processing step had to be completed before the backpropagation network could learn from its output [12]. With deep learning, feature extraction is learned from the images themselves, which is more optimal, and it is learned by the deep VGG-16 network [13]. The training process uses a GPU to speed up computation [14]. Training is carried out with several numbers of epochs, namely 50, 100, 150, and 200, and is evaluated by accuracy [15]. After training, the testing stage is carried out to verify the correctness of the model's output.
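The paper does not include its training code; the following is a minimal sketch of the transfer-learning setup described above, assuming TensorFlow/Keras. The 256-unit head size, dropout rate, and optimizer are assumptions, and `weights=None` keeps the sketch offline (the paper's setup would load the ImageNet weights instead).

```python
# Minimal transfer-learning sketch (assumes TensorFlow/Keras is installed).
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

def build_model(num_classes=6, input_shape=(224, 224, 3)):
    # Pre-trained convolutional base without the top (classifier) layers.
    # In the paper's setting this would be weights="imagenet"; None avoids
    # downloading weights in this illustration.
    base = VGG16(weights=None, include_top=False, input_shape=input_shape)
    base.trainable = False           # freeze the base for transfer learning

    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),        # MLP head (size assumed)
        layers.Dropout(0.5),                         # dropout, as in the paper
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Training would then call `model.fit(...)` for 50, 100, 150, or 200 epochs on the six-class wayang dataset.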

3. RESULT AND DISCUSSION

In this research, the author trains with several iteration counts, namely 50, 100, 150, and 200, to determine the accuracy produced by each. The first training run used 50 iterations; Figure 2 shows its results.

Figure 2. Training and Validation using epoch 50

Figure 2 shows that with 50 iterations the training accuracy obtained is 88%, with a validation accuracy of 88%; the training still shows slight overfitting. The loss value from training is also observed, as shown in Figure 3.

Figure 3. Loss function using epoch 50

Figure 3 shows that the model can still be trained, as the loss value is still decreasing, so testing continues with 100 iterations.

Figure 4. Training and validation using epoch 100


With 100 iterations, an accuracy of up to 87% is obtained, and it can most likely still increase. The loss function obtained is shown in Figure 5.

Figure 5. Loss Function using epoch 100

The model converges quickly but still overfits slightly, so training continues with 150 epochs. With 150 epochs the accuracy reaches 88%, and the model does not look overfit compared to the 50-epoch training.

Figure 6. Training and Validation using epoch 150

The loss value also decreases, reaching 0.35, better than the loss in the previous runs.

Figure 7. Loss Function using epoch 150

Training with 200 iterations shows training and validation curves that are closer together and better than in the previous runs; the accuracy reaches 89%, and the loss decreases to 0.20.

(a) (b)

Figure 8. Training and Validation (a), Loss Function (b) using epoch 200


Based on the training above, the author uses the 200-iteration model. After training, the author makes predictions on puppet images using the previously trained model.

Table 1. Prediction Result

Wayang       True Prediction   False Prediction
Yudistira    8                 2
Arjuna       8                 2
Bima         7                 3
Nakula       6                 4
Drona        8                 2
Gatot Kaca   7                 3

The table above shows that the model predicts well, with true predictions dominating over false ones.

The next step is integration with web scraping to obtain information related to the predicted output; scraping helps retrieve information about the predicted puppet. The web scraping stage uses a term-weighting method, TF-IDF, to minimize the number of identical words or sentences that appear during scraping. Term Frequency (TF) is the frequency with which a term appears in the document concerned: the more often a term appears (higher TF), the greater its weight or suitability value. Inverse Document Frequency (IDF) measures how widely a term is distributed across the document collection. After the weights are determined, sentence similarity is measured using cosine similarity, which is also very good for finding the relevance of a keyword in scraped text. The test results are shown in Table 2.

Table 2. Cosine Similarity Results

Query               Cosine Similarity Score
Wayang Yudistira    0.231
Wayang Arjuna       0.256
Wayang Bima         0.0056
Wayang Nakula       0.017
Wayang Drona        0.104
Wayang Gatot Kaca   0.0080
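The paper describes TF-IDF weighting followed by cosine similarity but does not show the computation; the sketch below implements both from scratch with the standard library (the exact normalization the authors used is not stated, so the common TF × log-IDF variant is assumed).

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (dict term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: (tf[t] / len(doc)) * idf[t] for t in tf})
    return vecs

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Scraped sentences sharing weighted terms with the query (e.g. "Wayang Arjuna") score higher and are kept; unrelated sentences score near zero, matching the spread seen in Table 2.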

Testing is done with prepared test data. Each test item has its own label, making predictions easier to read; the test labels are stored in an array so that the model output can be matched against it.

One of the tests uses an Arjuna image: the image is fed to the previously created model, which classifies it and produces an output. The output is then matched against the array containing the label values. As the figure shows, web scraping then adjusts the retrieved text based on the model's predicted output. In the future, this system will be made available as a mobile version, making it easier for users to use.
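The label-array matching described above amounts to taking the argmax of the softmax scores and looking it up in the label array. A minimal sketch follows; the label order is hypothetical (it follows the paper's listing of the six puppets, but the actual index-to-name mapping of the trained model is not given).

```python
# Hypothetical label array: order follows the paper's listing of the six
# puppet classes, but the trained model's actual index mapping is assumed.
LABELS = ["Yudistira", "Bima", "Nakula", "Arjuna", "Drona", "Gatot Kaca"]

def predicted_label(scores, labels=LABELS):
    """Map the highest softmax score to its wayang name via the label array."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best]
```

The returned name (e.g. "Arjuna") then becomes the query term for the web scraping step.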

At this stage the system is still under development, and its ability to recognize objects will continue to be improved.

Figure 9. Output Process

4. CONCLUSION

This research was conducted in several stages: collecting the dataset, defining the architecture, training and testing the model, and finally evaluating the research. The research uses deep learning and web scraping methods. For feature extraction, the convolutional layers of the convolutional neural network play a very important role in providing the information that the network processes. VGG-16 is a very deep network that has proven successful in image classification, and in this study the VGG-16 architecture has been successfully applied using a fine-tuning approach. In future work, the authors will combine several methods and compare them with other methods that may provide more accurate results. The application of web scraping in this study has been carried out and has succeeded in providing information based on the prediction results from the CNN.

REFERENCES

[1] K. A. Nugraha et al., "Algoritma Backpropagation Pada Jaringan Saraf Tiruan Untuk Pengenalan Pola Wayang Kulit," Semin. Nas. Inform., vol. 2013, no. semnasIF, pp. 8–13, 2013.
[2] M. Muhathir, M. H. Santoso, and D. A. Larasati, "Wayang Image Classification Using SVM Method and GLCM Feature Extraction," JITE, vol. 4, no. 2, pp. 373–382, Jan. 2021, doi: 10.31289/jite.v4i2.4524.
[3] A. Waleed Salehi, P. Baglat, and G. Gupta, "Review on machine and deep learning models for the detection and prediction of Coronavirus," Materials Today: Proceedings, Jun. 2020, doi: 10.1016/j.matpr.2020.06.245.
[4] V. Chandrasekar, V. Sureshkumar, T. S. Kumar, and S. Shanmugapriya, "Disease prediction based on micro array classification using deep learning techniques," Microprocessors and Microsystems, vol. 77, p. 103189, Sep. 2020, doi: 10.1016/j.micpro.2020.103189.
[5] D. Karimi, H. Dou, S. K. Warfield, and A. Gholipour, "Deep learning with noisy labels: Exploring techniques and remedies in medical image analysis," Medical Image Analysis, vol. 65, p. 101759, Oct. 2020, doi: 10.1016/j.media.2020.101759.
[6] S. Han, C. Seng, S. Joseph, and P. Remagnino, "How deep learning extracts and learns leaf features for plant classification," Pattern Recognition, vol. 71, pp. 1–13, 2017, doi: 10.1016/j.patcog.2017.05.015.
[7] X. Wu et al., "Deep learning-based multi-view fusion model for screening 2019 novel coronavirus pneumonia: A multicentre study," European Journal of Radiology, vol. 128, p. 109041, Jul. 2020, doi: 10.1016/j.ejrad.2020.109041.
[8] Z. Yang, J. Yue, Z. Li, and L. Zhu, "Vegetable Image Retrieval with Fine-tuning VGG Model and Image Hash," IFAC-PapersOnLine, vol. 51, no. 17, pp. 280–285, 2018.
[9] M. V. Valueva, N. N. Nagornov, P. A. Lyakhov, G. V. Valuev, and N. I. Chervyakov, "Application of the residue number system to reduce hardware costs of the convolutional neural network implementation," Mathematics and Computers in Simulation, vol. 177, pp. 232–243, Nov. 2020, doi: 10.1016/j.matcom.2020.04.031.
[10] T. Shanthi, R. S. Sabeenian, and R. Anand, "Automatic diagnosis of skin diseases using convolution neural network," Microprocessors and Microsystems, vol. 76, p. 103074, Jul. 2020.
[11] X. Hao, G. Zhang, and S. Ma, "Deep Learning," International Journal of Semantic Computing, vol. 10, no. 3, pp. 417–439, Sep. 2016, doi: 10.1142/s1793351x16500045.
[12] N. J. Majaj and D. G. Pelli, "Deep learning—Using machine learning to study biological vision," Journal of Vision, vol. 18, no. 13, p. 2, Dec. 2018, doi: 10.1167/18.13.2.
[13] S. Rahman, L. Wang, C. Sun, and L. Zhou, "Deep Learning Based HEp-2 Image Classification: A Comprehensive Review," Medical Image Analysis, p. 101764, Jul. 2020, doi: 10.1016/j.media.2020.101764.
[14] K. Shankar, A. R. W. Sait, D. Gupta, S. K. Lakshmanaprabu, A. Khanna, and H. M. Pandey, "Automated detection and classification of fundus diabetic retinopathy images using synergic deep learning model," Pattern Recognition Letters, vol. 133, pp. 210–216, May 2020, doi: 10.1016/j.patrec.2020.02.026.
[15] F. Ertam, "An effective gender recognition approach using voice data via deeper LSTM networks," Applied Acoustics, vol. 156, pp. 351–358, Dec. 2019, doi: 10.1016/j.apacoust.2019.07.033.
[16] F. Polidoro, R. Giannini, R. L. Conte, S. Mosca, and F. Rossetti, "Web scraping techniques to collect data on consumer electronics and airfares for Italian HICP compilation," SJI, vol. 31, no. 2, pp. 165–176, May 2015.
[17] B. Zhao, "Web Scraping," in Encyclopedia of Big Data, Springer International Publishing, 2017, pp. 1–3.
