
Deep Learning

In the document automated detection and classification (Pages 39-46)

Figure 2.4: Illustration of ML In Comparison with DL (Adapted from Alzubaidi, et al., 2021)

Recent studies also show that DL models have already surpassed humans at classifying images. This is observed in Figure 2.5, where the top-5 error rate of state-of-the-art DL models is compared with the human error rate, estimated at 5.1% according to Russakovsky, et al. (2015). The top-5 error rate is the percentage of images for which the correct label does not appear among the model's five highest-ranked predictions. It is one of the metrics used to evaluate machine learning models in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). The annual challenge, held since 2010, uses a subset of ImageNet with approximately 1000 images for each of its 1000 classes. The images are split into 1.2 million for training, 50 thousand for validation, and 150 thousand for testing (Krizhevsky, Sutskever and Hinton, 2017). The goal each year is to improve on the error rates of the previous models.
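The top-5 error rate described above is straightforward to compute. The pure-Python sketch below (with invented toy scores, purely for illustration) counts the samples whose true label is missing from the five highest-scoring classes:

```python
def top5_error(predictions, true_labels):
    """Fraction of samples whose true label is absent from the
    model's five highest-scoring predictions."""
    misses = 0
    for scores, label in zip(predictions, true_labels):
        # Rank class indices by score, highest first, and keep the top 5.
        top5 = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:5]
        if label not in top5:
            misses += 1
    return misses / len(true_labels)

# Toy check with 10 classes and 2 samples: the first sample's true class (3)
# is in its top 5, the second's (9) is not, so the top-5 error is 0.5.
scores_a = [0.0, 0.1, 0.2, 0.9, 0.3, 0.05, 0.0, 0.0, 0.0, 0.0]
scores_b = [0.9, 0.8, 0.7, 0.6, 0.5, 0.0, 0.0, 0.0, 0.0, 0.01]
print(top5_error([scores_a, scores_b], [3, 9]))  # → 0.5
```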

However, ImageNet announced that, starting from 2018, the challenge would also involve the classification of 3D objects (Wikipedia, 2021). Hence, the DL models that participated in ILSVRC and are used for comparison in Figure 2.5 are dated up to 2017 only.

The three main techniques for DL are deep supervised learning, deep unsupervised learning, and deep reinforcement learning. There are also hybrid techniques such as semi-supervised learning, and another widely used method known as deep transfer learning. Deep transfer learning is discussed in greater depth than the other techniques, as it will be utilized in this project.

Figure 2.5: ImageNet Top-5 Error Rate of DL Models Compared to Human Error (Adapted from Alzubaidi, et al., 2021)

Table 2.3: DL Models and Their References

DL models                   ILSVRC Results       Reference
SENet                       2017 Winner          Hu, et al., 2019
ResNeXt                     2016 1st Runner-up   Xie, et al., 2017
Inception-V4                -                    Szegedy, et al., 2017
ResNet                      2015 Winner          He, et al., 2015
Inception-V3                2015 1st Runner-up   Szegedy, et al., 2016
Inception-V2                -                    Ioffe and Szegedy, 2015
DenseNet                    -                    Huang, et al., 2017
Inception V1 (GoogLeNet)    2014 Winner          Szegedy, et al., 2015
VGGNet                      2014 1st Runner-up   Simonyan and Zisserman, 2015
ZFNet                       2013 Winner          Zeiler and Fergus, 2014
AlexNet                     2012 Winner          Krizhevsky, Sutskever and Hinton, 2017
XRCE                        2011 Winner          Sanchez, et al., 2013
NEC-UIUC                    2010 Winner          Lin, et al., 2011

[Figure 2.5 chart data — ImageNet top-5 error rate: NEC-UIUC 28.20%, XRCE 25.80%, AlexNet 16.40%, ZFNet 11.70%, VGGNet 7.30%, Inception V1 6.67%, DenseNet 5.54%, Human 5.10%, Inception V2 4.80%, Inception V3 3.58%, ResNet 3.57%, Inception V4 3.08%, ResNeXt 3.03%, SENet 2.25%]

2.4.1 Deep Supervised Learning

Deep supervised learning is the most common and simplest approach to DL. It is used to train on labelled datasets: during training, the input data are presented together with the corresponding desired outputs. The model maps the inputs to predictions and optimizes the network's internal parameters using a loss function computed from those predictions, minimizing the error until the desired output is sufficiently well approximated.

Algorithms used for deep supervised learning include recurrent neural networks (RNN), convolutional neural networks (CNN), and deep neural networks (DNN). This approach benefits from being able to produce outputs based on previously learned knowledge. However, the decision boundary may be overstrained when the training set lacks samples of an existing class (Alzubaidi, et al., 2021). An illustration of deep supervised learning is shown below.

Figure 2.6: Deep Supervised Learning (Qian, et al., 2020)
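As a minimal sketch of this supervised loop, the pure-Python example below trains a single sigmoid neuron on a toy AND-gate dataset. The update rule is the standard log-loss gradient; the data and hyperparameters are invented purely for illustration and do not come from any model discussed above:

```python
import math

# Labelled training data: inputs x paired with desired outputs y (AND gate).
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

# Internal parameters of a single-neuron network, plus a learning rate.
w = [0.0, 0.0]
b = 0.0
lr = 0.5

def predict(x):
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation

# Training loop: the error between prediction and label drives the updates.
for epoch in range(2000):
    for x, y in data:
        p = predict(x)
        err = p - y                 # gradient of the log loss w.r.t. z
        w[0] -= lr * err * x[0]
        w[1] -= lr * err * x[1]
        b -= lr * err

print([round(predict(x)) for x, _ in data])  # → [0, 0, 0, 1]
```

After training, the rounded predictions match the labels, which is the sense in which the loop drives the error toward zero.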

2.4.2 Deep Unsupervised Learning

Deep unsupervised learning is a type of DL approach that involves unlabelled data. This means that only input data are applied in the training process; internal representations or notable features are learnt instead, to uncover the underlying relationships in the input data. Algorithms used for deep unsupervised learning include dimensionality reduction, generative adversarial networks, and, most popularly, clustering. The disadvantage of this approach is its complexity, and the output generated may be inaccurate for tasks related to sorting data (Alzubaidi, et al., 2021). An illustration of deep unsupervised learning is shown below.

Figure 2.7: Deep Unsupervised Learning (Qian, et al., 2020)
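As a toy illustration of clustering on unlabelled data, the pure-Python sketch below runs k-means with k = 2 on one-dimensional points drawn from two groups; no labels are given, and the centroids that emerge are the "internal representation" the algorithm discovers. All data and settings are invented for the example:

```python
import random

random.seed(1)

# Unlabelled 1-D data drawn from two groups; no outputs are provided.
points = [random.gauss(0.0, 0.5) for _ in range(50)] + \
         [random.gauss(5.0, 0.5) for _ in range(50)]

# k-means with k = 2: alternate assignment and centroid-update steps.
centroids = [points[0], points[1]]
for _ in range(20):
    clusters = [[], []]
    for p in points:
        # Assign each point to its nearest centroid.
        nearest = min(range(2), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    # Move each centroid to the mean of its assigned points.
    centroids = [sum(c) / len(c) if c else centroids[i]
                 for i, c in enumerate(clusters)]

print(sorted(round(c, 1) for c in centroids))  # centroids land near 0.0 and 5.0
```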

2.4.3 Deep Reinforcement Learning

Deep reinforcement learning is another DL approach, in which an agent interacts with its environment. The agent makes a series of actions and, at each iteration, receives a reward from the environment, which it uses to optimize its state for the next iteration (Amiri, et al., 2018). This approach performs a large number of iterations with the objective of reducing loss while maximizing reward (Neftci and Averbeck, 2019). Deep reinforcement learning is used in applications such as corporate strategy planning and industrial robots, because it can identify the preferred action for obtaining the maximum reward. However, it can require more time and computing power in a larger workspace (Alzubaidi, et al., 2021).

Figure 2.8: Deep Reinforcement Learning (Amiri, et al., 2018)
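The reward-driven loop described above can be sketched with tabular Q-learning, a basic building block that deep reinforcement learning extends with neural networks. The corridor environment and hyperparameters below are invented for illustration only:

```python
import random

random.seed(0)

# A 5-state corridor: the agent starts at state 0 and is rewarded
# only on reaching state 4. Actions: 0 = move left, 1 = move right.
N_STATES, GOAL = 5, 4
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.2

def step(state, action):
    nxt = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward

for episode in range(200):
    s = 0
    while s != GOAL:
        # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
        a = random.randrange(2) if random.random() < epsilon else Q[s].index(max(Q[s]))
        nxt, r = step(s, a)
        # Move the state-action value toward reward plus discounted future value.
        Q[s][a] += alpha * (r + gamma * max(Q[nxt]) - Q[s][a])
        s = nxt

# The learned greedy policy moves right toward the reward in every state.
print([Q[s].index(max(Q[s])) for s in range(GOAL)])  # → [1, 1, 1, 1]
```

The reward signal propagates backwards through the value table over many iterations, which mirrors the loss-reduction and reward-maximization objective described above.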

2.4.4 Deep Semi-supervised Learning

Deep semi-supervised learning is a hybrid DL approach that involves both labelled and unlabelled data, and it inherits the pros and cons of both supervised and unsupervised learning. This approach is favourable when labelled data are difficult to collect compared to the widely available unlabelled data; hence, it can benefit DL where labelled data are scarce. The main drawback is that certain assumptions must hold for the approach to work effectively, namely the smoothness, cluster, and manifold assumptions (Ouali, Hudelot and Tami, 2020). One application of deep semi-supervised learning is in classification tasks concerning text documents (Alzubaidi, et al., 2021).
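One simple semi-supervised scheme consistent with the cluster assumption is self-training with pseudo-labels. The pure-Python sketch below (toy 1-D data and a nearest-centroid classifier, all values invented) fits a model on the few labelled points, pseudo-labels the unlabelled ones, and retrains on the combined set:

```python
# Two classes along one feature; only four points carry labels.
labelled = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]
unlabelled = [0.5, 1.5, 2.0, 8.0, 8.5, 9.5]

def centroids(pairs):
    """Mean feature value per class: a nearest-centroid classifier."""
    means = {}
    for label in (0, 1):
        vals = [x for x, y in pairs if y == label]
        means[label] = sum(vals) / len(vals)
    return means

# Step 1: fit the classifier on the labelled data alone.
means = centroids(labelled)

# Step 2: pseudo-label each unlabelled point with its nearest class centroid.
pseudo = [(x, min(means, key=lambda l: abs(x - means[l]))) for x in unlabelled]

# Step 3: retrain on labelled + pseudo-labelled data combined.
means = centroids(labelled + pseudo)

print({l: round(m, 2) for l, m in means.items()})  # → {0: 1.0, 1: 9.0}
```

The scheme only works because the unlabelled points fall into the same well-separated clusters as the labelled ones, which is exactly the cluster assumption at work.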

2.4.5 Deep Transfer Learning

Following the widespread use of deep architectures, most notably the convolutional neural network (CNN), deep transfer learning has become increasingly popular for DL. This is because it is an effective approach for DL on undersized annotated datasets, where overfitting is a major issue (Tan, et al., 2018). In contrast, traditional approaches often demand vast amounts of data, as well as considerable time and computing resources, to train and build a DL model from scratch (Alzubaidi, et al., 2021). The main idea behind transfer learning is to repurpose existing DL models, previously trained for one task, for another novel task (Best, Ott and Linstead, 2020). These DL models are also known as pre-trained models.

The two common terminologies used in transfer learning are source and target, each of which is further described by a domain and a task. The domain where knowledge is learned is known as the source domain, 𝒟S, while the domain to which the knowledge is transferred is known as the target domain, 𝒟T. The following notations and definitions closely follow the survey paper by Pan and Yang (2010).

A domain, denoted 𝒟, can be expressed mathematically as 𝒟 = {ℱ, P(X)}. There are two components in this expression: a feature space, ℱ, and a marginal probability distribution, P(X), where X = {x1, ..., xm} ∈ ℱ (Tan, et al., 2018).

For a task concerning a binary classification problem, ℱ is the space containing all feature vectors, xi is the ith feature vector corresponding to the ith term, and X is the particular sample used for training. In general, if two domains are different, then either their feature spaces or their marginal probability distributions differ. For a given 𝒟, a task, denoted 𝒯, can be expressed mathematically as 𝒯 = {𝒴, f(·)}. It also consists of two parts: the first is a label space, 𝒴; the second is a predictive function, f(·), which is learned from the training pairs of instances and labels {xi, yi}, where xi ∈ X and yi ∈ 𝒴, and which can alternatively be viewed as a conditional probability distribution P(y|x). The source task is denoted 𝒯S with source predictive function fS(·), whereas the target task is denoted 𝒯T with target predictive function fT(·). Returning to the binary classification problem, 𝒴 is the collection of all labels, containing true and false, and yi can take a value of either true or false (Weiss, Khoshgoftaar and Wang, 2016).

With the notation defined above, 𝒟S can be formally expressed as 𝒟S = {(xS1, yS1), …, (xSm, ySm)}, where ySi ∈ 𝒴 is the class label corresponding to the data instance xSi ∈ ℱS. Likewise, 𝒟T can be formally expressed as 𝒟T = {(xT1, yT1), …, (xTm, yTm)}, where yTi ∈ 𝒴 is the class label corresponding to the data instance xTi ∈ ℱT. Finally, transfer learning can be formally defined: given a source domain 𝒟S with its corresponding learning task 𝒯S, and a target domain 𝒟T with its corresponding learning task 𝒯T, the goal of transfer learning is to enhance the performance of fT(·) by leveraging the knowledge in 𝒟S and 𝒯S, where 𝒟S ≠ 𝒟T or 𝒯S ≠ 𝒯T (Tan, et al., 2018). The outcome of transfer learning can be categorized as positive or negative transfer. Given a predictive learner fT1(·) trained with 𝒟T only, and another predictive learner fT2(·) trained with 𝒟S and 𝒟T combined, the transfer is said to be negative if fT1(·) performs better than fT2(·), and positive if fT2(·) performs better than fT1(·) (Weiss, Khoshgoftaar and Wang, 2016). Figure 2.9 illustrates the transfer learning process.

There are two common approaches to performing transfer learning with pre-trained models: feature extraction and fine-tuning. The former extracts the feature maps from the pre-trained model and builds a shallow model on top of them, while the latter makes small adjustments to the pre-trained model to increase its accuracy and performance on the new task whilst retaining the weights initially learned by the model (Mustafid, Pamuji and Helmiyah, 2020). All in all, transfer learning requires a smaller dataset and less training time, while improving performance and network generalization (Alzubaidi, et al., 2021).

Figure 2.9: Transfer Learning Process (Tan, et al., 2018)
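The feature-extraction approach can be sketched in miniature: below, a frozen layer stands in for a pre-trained source network, and only a new head is trained on the target task (XOR, which is linearly separable in the extracted feature space but not in the raw input space). The frozen weights and all hyperparameters are hand-picked for illustration and do not correspond to any real pre-trained model:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Frozen "pre-trained" layer (stands in for a source-task network):
# its weights stay fixed and are only used to extract features.
def extract_features(x):
    h1 = sigmoid(10 * (x[0] + x[1]) - 5)    # roughly OR(x0, x1)
    h2 = sigmoid(10 * (x[0] + x[1]) - 15)   # roughly AND(x0, x1)
    return (h1, h2)

# Target task: XOR, labelled data for the new task only.
target = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

# Train only the new head on the target data; the extractor never changes.
w, b, lr = [0.0, 0.0], 0.0, 0.5
for epoch in range(5000):
    for x, y in target:
        f = extract_features(x)
        p = sigmoid(w[0] * f[0] + w[1] * f[1] + b)
        err = p - y                 # log-loss gradient w.r.t. the head's input
        w[0] -= lr * err * f[0]
        w[1] -= lr * err * f[1]
        b -= lr * err

preds = [round(sigmoid(w[0] * extract_features(x)[0] +
                       w[1] * extract_features(x)[1] + b)) for x, _ in target]
print(preds)  # → [0, 1, 1, 0]
```

The point of the sketch is that the head alone, a tiny model, solves the new task because the frozen features already encode useful structure, which is exactly the appeal of feature extraction on undersized datasets.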
