Molecular cancer classification method on microarrays gene expression data using hybrid deep neural network and grey wolf algorithm


https://doi.org/10.1007/s12652-020-02478-x ORIGINAL RESEARCH


AliReza Hajieskandar¹ · Javad Mohammadzadeh¹ · Majid Khalilian¹ · Ali Najafi²

Received: 23 December 2019 / Accepted: 14 August 2020

Abstract

Gene selection is critical in cancer classification, which depends on the expression of a small number of biomarker genes, and it has been the subject of many recent studies. Microarray technology makes it possible to generate gene expression datasets for tumors. Cancer classification on these datasets typically involves a small number of samples relative to the number of genes, and often multiple classes. In this paper, the grey wolf algorithm was used to extract informative features in the pre-processing stage, and a deep neural network (DNN) was used to improve the accuracy of cancer detection on three datasets: STAD (stomach adenocarcinoma), LUAD (lung adenocarcinoma), and BRCA (breast invasive carcinoma). The proposed method achieved the highest accuracy on all three datasets.

The proposed method achieved accuracy close to 100%. Furthermore, it was compared with linear support vector machine, RBF-kernel support vector machine, nearest neighbor, linear regression, one-vs-all, Naive Bayes, and decision tree algorithms. The proposed method achieved improvements of 0.57 on the LUAD dataset, 1.11 on the STAD dataset, and 0.78 on the BRCA dataset.

Keywords Cancer classification · DNA Microarray · Deep neural networks · Grey wolf algorithm · Deep learning

1 Introduction

Cancer is a complex, multifactorial disorder caused mostly by acquired mutations and epigenetic alterations that affect gene expression. Accordingly, most cancer research focuses on identifying genetic biomarkers that can support precise diagnosis and effective treatment (Butterfield et al. 2017; Knudson 2000). Ninety percent of human cancers are of epithelial origin and show aneuploidy, deletions, duplications, and genetic instability. These complexities probably explain the clinical diversity of similar tumor tissues and the need for a comprehensive understanding of the genetic changes in tumors (Chen et al. 2005; Gray and Collins 2000).

The initial sequencing of the human genome, together with advanced technologies, has led to the identification of the genetic complexity of common cancers. High-throughput technologies now exist to identify cancer abnormalities at the DNA, RNA, and protein levels. Gene expression studies in human cancers can identify genetic biomarkers of malignant transformation. Conventionally, such studies were limited to assessing a few genes at a time. However, high-throughput methods, especially DNA microarrays and RNA-Seq, now allow comprehensive analysis of RNA expression (the expression profile) in human tumor samples, which is a significant breakthrough (Young 2000).

Microarray technology is a promising tool for analyzing and investigating thousands of cancer gene expression profiles and related issues (Bunz 2016). It has made it possible to study cell behavior at the molecular level and to produce gene expression data on a large scale, so that large amounts of genetic data can be analyzed and managed quickly and precisely. Furthermore, statistical procedures and machine learning algorithms applied to these data have given rise to an analytical perspective (Dwivedi 2018).

* Javad Mohammadzadeh · AliReza Hajieskandar · Majid Khalilian · Ali Najafi

1 Department of Computer Engineering, Karaj Branch, Islamic Azad University, Karaj, Iran

2 Molecular Biology Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran

Because tumor scanning can harm the human body whereas the microarray method is harmless, clinicians have used microarrays for cancer detection. Preparing a microarray dataset produces large amounts of data, which makes analyzing changes in gene expression a great challenge.

The data are high-dimensional while the number of samples is small, and only a few genes play a significant role in distinguishing cancers (Tabakhi 2015; Motieghader et al. 2017; Ram et al. 2017). Gene selection methods are therefore critical in this area. In general, gene selection is a crucial pre-processing stage in cancer classification that aims to remove irrelevant and uninformative genes. In this way, it makes classification models easier to understand while keeping accuracy and precision high. In most cases, finding a small subset of biomarker genes is also critical (Zhou et al. 2015; Varadharajan 2018; Moslehi and Haeri 2019; Chen et al. 2016).

In this paper, the grey wolf algorithm was used in the pre-processing stage to select genes useful for cancer detection, and a 15-layer deep neural network was used to classify the selected genes (Agrawal and Agrawal 2015; Tavakoli 2019). The proposed hybrid method was then applied to three cancer datasets: STAD, LUAD, and BRCA. It was compared with the following classification algorithms: linear support vector machine, RBF-kernel support vector machine, nearest neighbor, linear regression, one-vs-all, Naive Bayes, and decision tree. The results indicate that the proposed method performs better than the other methods. The main contributions of this study are as follows:

• The grey wolf algorithm is used as a pre-processor to select significant features from among all features. A meta-heuristic algorithm such as grey wolf can help select useful, near-optimal features, and a good choice of significant features helps reach an optimal response and improves performance.

• The main algorithm is based on deep learning: a 15-layer DNN with ReLU non-linear activations, consisting of dropout layers, dense layers, and a fully connected layer.

• Combining deep learning with the grey wolf algorithm is intended to improve how well the approach generalizes to other problems.

The rest of the paper is organized as follows: Sect. 2 describes deep neural networks and the grey wolf meta-heuristic algorithm; Sect. 3 briefly reviews related work on this research problem; Sect. 4 introduces and describes the proposed method; Sect. 5 reports the simulation results and compares the proposed method with other methods; and finally, Sect. 6 concludes the paper.

2 Background

This section briefly reviews DNNs and the grey wolf algorithm.

2.1 Deep neural network

Artificial neural networks (ANNs) are computational methods and systems for machine learning, knowledge representation, and applying the acquired knowledge to predict the output responses of complex systems.

The idea behind these networks is inspired by the way biological neural networks process information, so that information is learned and knowledge is acquired. The key element of ANNs is a new structure for the information processing system, consisting of a large number of interconnected processing elements, i.e., neurons. To solve a problem, the neurons work in concert and transmit information through synapses. If one cell is damaged, the other cells can compensate for its absence and work together to repair and recover it. ANNs are able to learn. For example, after a burn, the neurons of the touch organs learn not to move toward hot objects; in the same way, the system is trained to correct its mistakes. Learning in these systems is adaptive: using samples, the synaptic weights are adjusted so that the system produces the correct response when new inputs are given. Deep learning, also referred to as deep structured learning or hierarchical learning, is a sub-branch of machine learning based on a set of algorithms that aim to model high-level abstractions in the data. It uses a deep graph with several processing layers, composed of multiple linear and non-linear transformations. In other words, deep learning is based on representing knowledge and features in layers.
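The adaptive weight adjustment described above can be illustrated with the classic perceptron learning rule, in which a weight changes only when the unit's output is wrong. This sketch is illustrative and is not part of the proposed method: the function names, the learning rate, and the toy logical-AND data are assumptions.

```python
def train_perceptron(samples, labels, lr=1.0, epochs=20):
    """Perceptron rule: weights change only when the current prediction is wrong."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - predict(w, b, x)        # 0 if correct, +1 or -1 if wrong
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b

def predict(w, b, x):
    """Threshold unit: fires (1) when the weighted input sum exceeds 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]                                    # logical AND
w, b = train_perceptron(X, y)
print([predict(w, b, x) for x in X])                # [0, 0, 0, 1]
```

Because the AND data are linearly separable, the rule stops changing the weights once every sample is classified correctly.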


In general, deep learning is a class of machine learning algorithms that use several layers to extract progressively higher-level features from raw input (Wilmott 2019). Figure 1 depicts the structure of an ANN. As shown in this figure, the hidden layers determine the depth of the network; such networks are referred to as deep neural networks (DNNs). The DNN is the most important structure and technique in deep learning (Aggarwal 2018), and, like deep learning in general, it is intended to model high-level abstractions (Aggarwal 2018). For example, in image processing, lower layers can detect edges, whereas higher layers can detect meaningful concepts such as faces and words (Wilmott 2019). In this study, we use DNNs and deep machine learning to identify and classify cancer.
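As a minimal sketch of what "several layers" means computationally, the forward pass below chains fully connected layers, each an affine map followed by a ReLU non-linearity. The layer sizes (20 inputs, hidden layers of 64, 32 and 16 units, 3 outputs) and the He-style random initialization are illustrative assumptions, not the configuration used in this paper.

```python
import numpy as np

def relu(z):
    """Element-wise rectified linear unit: max(z, 0)."""
    return np.maximum(z, 0.0)

def init_layers(sizes, seed=0):
    """Random weights (He-style scaling) and zero biases for consecutive fully connected layers."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x, layers):
    """Hidden layers: affine map followed by ReLU. Output layer: affine map only (raw scores)."""
    h = x
    for W, b in layers[:-1]:
        h = relu(h @ W + b)
    W_out, b_out = layers[-1]
    return h @ W_out + b_out

layers = init_layers([20, 64, 32, 16, 3])            # three hidden layers
x = np.random.default_rng(1).normal(size=(5, 20))    # 5 samples with 20 features each
scores = forward(x, layers)
print(scores.shape)                                  # (5, 3)
```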

2.2 Grey wolf algorithm

The grey wolf algorithm (GWO) is a nature-inspired meta-heuristic algorithm based on the hierarchical structure and social hunting behavior of grey wolves. GWO is population-based and has a simple tuning procedure. Four types of grey wolves, namely alpha, beta, delta, and omega, are used to simulate the leadership hierarchy, and the three main steps of hunting, i.e., searching for prey, encircling prey, and attacking prey, are implemented (Mirjalili et al. 2014). Grey wolves are at the top of the food chain and live in packs of 5 to 12 wolves on average. Each pack has four ranks:

• Alpha wolves (alpha): the leaders of the pack, which may be male or female. They dominate the pack and decide matters such as resting and how to hunt. Despite this dominance, a kind of democratic behavior is also observed within wolf packs.

• Beta wolves (beta): beta wolves help the alphas in decision-making and are likely to replace them in the future.

• Delta wolves (delta): delta wolves rank below the betas and include elders, hunters, and caretakers of the pack.

• Omega wolves (omega): these wolves have the lowest rank in the pack. They have the fewest rights of all members, eat after the other wolves, and play no role in decision-making.

Then, the fitness of each search agent is calculated, and the three best agents are designated as follows:

• Xα = the best search agent

• Xβ = the second-best search agent

• Xδ = the third-best search agent
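For reference, the position-update rules of the standard GWO as published by Mirjalili et al. (2014) are summarized below; the text above does not restate them, so they are added here from that original formulation. X_p is the prey position, t the current iteration, T the maximum number of iterations (our notation), r1 and r2 uniform random vectors in [0, 1], and ⊙ the element-wise product.

```latex
\begin{aligned}
\vec{D} &= \lvert \vec{C} \odot \vec{X}_p(t) - \vec{X}(t) \rvert, &
\vec{X}(t+1) &= \vec{X}_p(t) - \vec{A} \odot \vec{D}, \\
\vec{A} &= 2a\,\vec{r}_1 - a, &
\vec{C} &= 2\,\vec{r}_2, \qquad a = 2 - \frac{2t}{T}, \\
\vec{X}_k &= \vec{X}_\ell - \vec{A}_k \odot \lvert \vec{C}_k \odot \vec{X}_\ell - \vec{X}(t) \rvert, &
\vec{X}(t+1) &= \tfrac{1}{3}\left(\vec{X}_1 + \vec{X}_2 + \vec{X}_3\right),
\end{aligned}
```

In the last row, (k, ℓ) ranges over (1, α), (2, β), and (3, δ): since the prey position is unknown, the alpha, beta, and delta wolves each propose a position and every wolf moves to their average.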

Figure 2 shows the flowchart of this algorithm.
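A compact NumPy sketch of the continuous GWO loop follows, implementing the standard update rules of Mirjalili et al. (2014). The function name `gwo_minimize`, the population size, the iteration budget, the search bounds, and the sphere test function are assumptions for illustration; keeping the best three solutions across iterations mirrors the usual practice of tracking the alpha, beta, and delta scores.

```python
import numpy as np

def gwo_minimize(fitness, dim, n_wolves=20, max_iter=200, lb=-10.0, ub=10.0, seed=0):
    """Minimise `fitness` over the box [lb, ub]^dim with the standard GWO update rules."""
    rng = np.random.default_rng(seed)
    wolves = rng.uniform(lb, ub, size=(n_wolves, dim))
    leaders = np.empty((0, dim))          # alpha, beta, delta positions (best first)
    leader_scores = np.empty(0)
    for t in range(max_iter):
        scores = np.array([fitness(w) for w in wolves])
        # Rank the current wolves together with the previous leaders so the best solutions are kept.
        pool = np.vstack([leaders, wolves])
        pool_scores = np.concatenate([leader_scores, scores])
        top = np.argsort(pool_scores)[:3]
        leaders, leader_scores = pool[top].copy(), pool_scores[top].copy()
        a = 2.0 - 2.0 * t / max_iter      # decreases linearly from 2 towards 0
        new_wolves = np.empty_like(wolves)
        for i, x in enumerate(wolves):
            guided = []
            for leader in leaders:        # alpha, beta and delta each propose a position
                A = 2.0 * a * rng.random(dim) - a
                C = 2.0 * rng.random(dim)
                D = np.abs(C * leader - x)
                guided.append(leader - A * D)
            new_wolves[i] = np.clip(np.mean(guided, axis=0), lb, ub)
        wolves = new_wolves
    return leaders[0], float(leader_scores[0])

def sphere(v):
    """Simple convex test function with its minimum (0) at the origin."""
    return float(np.sum(v ** 2))

best_position, best_score = gwo_minimize(sphere, dim=5)
print(best_score)   # a value close to 0 for this test function
```

Large |A| (early iterations, when a is near 2) pushes wolves away from the leaders and encourages exploration; as a shrinks, |A| < 1 makes them converge on the leaders, which corresponds to attacking the prey.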

3 Related work

Research in the early 1990s found that when an intelligent ANN-based structure is used to identify and classify anomalies in medical images, not only is the identification rate enhanced, but also fewer false-positive

Fig. 1 An example of the ANNs structure (Aggarwal, 2018)


detections (FP) are obtained than with conventional image processing on identical problems (Lo and Lou 1995).

Frigui and Nasraoui proposed the SCAD (simultaneous clustering and attribute discrimination) algorithm, which simultaneously weights features and partitions the data during the clustering process. SCAD has two merits: first, it divides the dataset into more meaningful clusters; second, it can be used as part of a more complex learning system to improve learning behavior. SCAD was developed for clustering text documents (based on the weights of word sets), where it performs better than traditional clustering. It may also be used to select significant genes in gene expression data.

However, studies indicate that SCAD does not perform well in this field, which is attributed to its use of feature weights that differ within each cluster (local adaptive distance). In some cases, local adaptive distance may not be appropriate because it leads to local optima and indeterminate solutions (Frigui and Nasraoui 2000).

Wang et al. proposed another approach for analyzing DNA microarrays that selects genes from among a large number of candidates. To extract the most informative genes, they used a correlation-based method, and then applied classifiers such as SVM, decision tree, and Naive Bayes to predict the class (Wang et al. 2005).

To classify DNA microarrays and detect cancer, Sharma and Paliwal addressed the small sample size (SSS) problem using standard learning algorithms applied to a database of different genes and the LDA (linear discriminant analysis) algorithm. Using LDA to extract genomic information, they reported that this method produces outstanding results (Sharma and Paliwal 2008).

Mishra et al. proposed another algorithm in which a meta-heuristic method is applied to classify DNA microarrays and detect cancer, using PSO (particle swarm optimization) and SA (simulated annealing). The results of the PSO-FLANN implementation indicated that this method is highly capable of evaluating and identifying the genome (Mishra et al. 2012).

Dashtban and Balafar developed an intelligent evolutionary algorithm, IDGA (immune detector generation algorithm), based on the genetic algorithm and artificial intelligence (Dashtban and Balafar 2017). It has two stages: the first applies a score-based method to reduce dimensionality and, more importantly, to provide statistically significant genes for the next stage. The second stage uses the Fisher and Laplace scoring methods (Yang et al. 2016). The performance of the Fisher score and its robustness to noise have been demonstrated (Olyaee et al. 2013).
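The Fisher score mentioned above ranks each gene by the ratio of between-class to within-class scatter. The text does not give the formula, so this NumPy sketch assumes the common definition F_j = Σ_k n_k (μ_{k,j} − μ_j)² / Σ_k n_k σ²_{k,j}, where n_k is the size of class k and μ, σ² are means and variances of gene j; the small demo matrix is invented.

```python
import numpy as np

def fisher_score(X, y):
    """Fisher score of every column (gene) of X, a samples-by-genes matrix, given class labels y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)          # population variance of each gene within class c
    return between / np.maximum(within, 1e-12)  # guard against genes that are constant within classes

# Invented toy data: gene 0 separates the two classes, gene 1 does not.
X_demo = [[1.0, 5.0], [1.2, 3.0], [3.0, 4.0], [3.2, 6.0]]
y_demo = [0, 0, 1, 1]
scores = fisher_score(X_demo, y_demo)
print(np.argsort(-scores))                      # genes ordered from highest to lowest score
```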

Fakoor et al. introduced a two-stage method based on feature extraction, with dimensionality reduction via PCA followed by classification via an autoencoder. The method was evaluated on several datasets, and its accuracy on the AML dataset was 95.15% (Fakoor et al. 2013).
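The PCA step in such pipelines can be sketched with a singular value decomposition of the centered expression matrix. This is a generic illustration under stated assumptions (random data with 30 samples and 500 genes, 10 components), not Fakoor et al.'s code, and the autoencoder stage is not shown.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project samples onto the top principal components, computed from an SVD of the centred data."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each gene
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]               # principal directions, one per row
    explained = S ** 2 / np.sum(S ** 2)          # fraction of total variance per component
    return Xc @ components.T, explained[:n_components]

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 500))                   # 30 samples, 500 genes: the usual p >> n shape
Z, ratio = pca_reduce(X, n_components=10)
print(Z.shape)                                   # (30, 10)
```

With far fewer samples than genes, at most n − 1 components carry variance, which is why PCA is a natural first step before training a classifier on microarray data.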

Hui Chen et al. developed a method that assumes the weight of a feature is identical for all clusters (global adaptive distance). In this kernel-based clustering method for gene selection (KBCGS), different weights are assigned to different genes, and each class is regarded as a known cluster. The optimal gene weights are then obtained by minimizing the clustering objective function. In the clustering process, the variance is measured as the sum of Euclidean distances between samples and cluster

Fig. 2 The flowchart of GWO algorithm


centers. The variance of each gene is computed separately using kernel functions. Finally, the top genes are ranked and selected for cancer classification (Chen et al. 2016).
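The kernel-based variance relies on the fact that squared distances to a class mean in a kernel-induced feature space can be computed from kernel values alone: for class k with n_k samples, Σ_i ||φ(x_i) − m_k||² = Σ_i K(x_i, x_i) − (1/n_k) Σ_i Σ_j K(x_i, x_j). The sketch below evaluates this per gene with a Gaussian kernel and, as an assumption of ours, divides by the same quantity computed over all samples so that genes with little overall spread are not favored automatically. It is not the full KBCGS algorithm, which learns gene weights by minimizing a clustering objective; the function names, σ = 1, and the toy data are hypothetical.

```python
import numpy as np

def kernel_scatter(values, sigma):
    """Sum of squared feature-space distances to the mean for one gene, Gaussian kernel (K(x, x) = 1)."""
    diff = values[:, None] - values[None, :]
    K = np.exp(-diff ** 2 / (2.0 * sigma ** 2))
    n = len(values)
    return n - K.sum() / n

def gene_compactness(X, y, sigma=1.0):
    """Within-class kernel scatter divided by total kernel scatter, per gene (lower = tighter classes)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    ratios = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        col = X[:, j]
        within = sum(kernel_scatter(col[y == c], sigma) for c in np.unique(y))
        total = kernel_scatter(col, sigma)
        ratios[j] = within / max(total, 1e-12)
    return ratios

# Hypothetical toy data: gene 0 forms tight, well-separated classes; gene 1 does not.
X_k = [[0.0, 0.0], [0.1, 2.0], [3.0, 0.1], [3.1, 2.1]]
y_k = [0, 0, 1, 1]
ratios = gene_compactness(X_k, y_k)
print(np.argsort(ratios))        # genes ordered from most to least class-compact
```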

Abdel-Zaher and Eldeib applied a deep belief network-based method to the Wisconsin Breast Cancer dataset and obtained an accuracy of 99.69% (Abdel-Zaher and Eldeib 2016).

Lingo Gao et al. proposed another hybrid algorithm which, like other feature selection algorithms (Peng et al. 2005), aims to achieve high performance.

In this algorithm, gene selection is carried out in two stages, by IG (information gain) and SVM. IG is a ranking filter method that analyzes the correlation between features and classes. SVM is a machine learning algorithm based on structural risk minimization; it achieves high classification performance because it finds a global optimum and generalizes better than traditional classifiers (Gao et al. 2017).
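Information gain, the filter criterion in the first stage, measures how much knowing a gene's discretized expression reduces the entropy of the class label: IG(Y; X) = H(Y) − Σ_v P(X = v) H(Y | X = v). A pure-Python sketch follows; the median split used to discretize expression values and the four-sample toy data are assumptions, not part of Gao et al.'s description.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG = H(Y) - sum_v P(X = v) * H(Y | X = v) for a discrete feature."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

def binarize_at_median(values):
    """Discretize a continuous expression profile into 'low'/'high' (an illustrative choice)."""
    m = sorted(values)[len(values) // 2]
    return ['high' if v >= m else 'low' for v in values]

labels = ['tumor', 'tumor', 'normal', 'normal']
gene_a = [5.1, 4.8, 1.2, 0.9]    # high in tumors only: perfectly informative
gene_b = [5.0, 1.0, 4.9, 1.1]    # high/low independent of class: uninformative
print(information_gain(binarize_at_median(gene_a), labels))   # 1.0
print(information_gain(binarize_at_median(gene_b), labels))   # 0.0
```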

In a technical study, Liu et al. (2018) compared KBCGS with SCAD and found the following differences:

• SCAD is an unsupervised learning algorithm, but KBCGS is a supervised learning algorithm.

• SCAD uses local adaptive distance, which can lead to local optima; KBCGS uses global adaptive distance and can prevent this problem.

• SCAD measures variance simply with Euclidean distance, which is appropriate only when the data are linearly separable. In KBCGS, variance is measured with a kernel, which can produce non-linear boundaries between clusters. These changes improve performance in this area. KBCGS also uses adaptive distances that change at each iteration; this kind of measurement is appropriate for learning feature weights during clustering and improves part of the algorithm's performance. Furthermore, KBCGS is simple to implement, and its parameters do not need to be tuned for each dataset.

The results given in Table 1 indicate that Chen et al.'s method is more efficient and faster than the other feature selection methods. It is also essentially parameter-free.

Each method has its own characteristics, which affect the consistency of the results. Moreover, Laplace score analysis shows competitive performance in identifying predictive genes. Although the Laplace score is unsupervised and depends on the structure of the dataset, its competitive performance allows it to be used as a pre-processing stage. However, IDGA is a stronger criterion than the Laplace and Fisher scores. After dimensionality reduction and selection of statistically significant genes, IDGA is applied.

The evolutionary strategy is a numerical genetic algorithm with dynamic genotypes, adaptive parameters, and modified genetic operators. Some of the evaluations are random (the populations consist of chromosomes of random length). The algorithm uses variable-length chromosomes with numerical encoding for feature selection.

These numerically encoded chromosomes, combined with compatible genetic operators, make the algorithm effective, and its convergence speed further motivates its application to very high-dimensional problems.

In this algorithm, the crossover rate and an adaptive mutation rate are set according to a social strategy of reward and punishment. IDGA obtains its parameters simply, based on the quality of the discovered solutions relative to all solutions (Liu et al. 2018).

Shekar and Dagnew used L1 regularization of features and deep learning algorithms based on a linear SVM to extract features and classify DNA microarrays. They found that extracting these features has a significant impact on identifying cancer, and they used the softmax activation function (Shekar and Dagnew 2019).

Guia et al. proposed another deep learning algorithm for classifying DNA microarrays that identifies microarray features in two- and three-dimensional forms. Using a DNN on two-dimensional images, it extracts features with 95.65% precision (Guia et al. 2019).

Another hybrid method, based on ensemble learning of features selected by a genetic algorithm, first extracts significant features in a nested way using the genetic algorithm (Lee and Leu 2011). It was then evaluated on a lung cancer dataset (Dong and Markovic 2018) with a support vector machine classifier and reached 98.4% accuracy (Sayed et al. 2019).

The method introduced by Janse et al. selects genes in microarray data with a two-stage genetic algorithm: the first stage selects relevant genes, and the second selects the best candidates from among the genes chosen in the first stage. Classification is then performed with a support vector machine (Rani and Devaraj 2019).

Lu et al. proposed a novel pathological brain detection method that applies AlexNet through transfer learning (Lu et al. 2018).


Table 1 Performance of KBCGS and six other gene selection methods (Chen et al. 2016)

Dataset Method ACC TPR FPR Time Dataset Method ACC TPR FPR ACC_IE

On the KNN 2-class dataset On the SVM 2-class dataset

AMLALL KBCGS 97.54 1 0.072 0.2411 AMLALL KBCGS 97.84 0.9872 0.04 86.36 ± 7.91

χ2-Statistic 97.89 0.9936 0.052 1.9400 χ2-Statistic 98.07 0.9915 0.04 90.48 ± 6.47

GINI 97.77 0.9957 0.056 22.6152 GINI 98.18 0.9936 0.04 91.57 ± 6.49

Info.Gain 97.43 0.9830 0.04 1.8732 Info.Gain 98.04 0.9915 0.04 90.25 ± 6.62

KW 95.82 0.9787 0.08 19.8651 KW 95.43 0.9766 0.088 83.58 ± 8.27

Relief-F 96.11 0.9787 0.072 7.8097 Relief-F 97.64 0.9851 0.04 85.22 ± 7.31

MRMR 97.29 0.9787 0.04 45.4010 MRMR 97.20 0.9830 0.048 90.21 ± 6.78

DLBCL KBCGS 98.45 0.9579 0.0069 0.2148 DLBCL KBCGS 98.25 0.9368 0.0017 83.89 ± 7.54

χ2-Statistic 96.29 0.9632 0.0362 1.7611 χ2-Statistic 96.34 0.9316 0.0259 88.54 ± 6.82

GINI 94.25 0.9421 0.0569 27.6935 GINI 95.91 0.8947 0.0207 87.46 ± 6.63

KW 90.91 0.9947 0.1190 17.5190 KW 91.68 0.9316 0.0879 84.63 ± 6.98

Relief-F 97.21 0.9421 0.0190 2.9264 Relief-F 97.77 0.9737 0.0207 80.05 ± 7.90

MRMR 97.39 0.9895 0.0310 43.3957 MRMR 97.55 0.9526 0.0172 83.75 ± 7.60

Lung KBCGS 87.00 0.8750 0.14 0.0370 Lung KBCGS 78.67 0.9042 0.4133 73.68 ± 14.57

χ2-Statistic 81.00 0.7417 0.0733 1.2936 χ2-Statistic 73.42 0.8167 0.4 71.78 ± 14.18

GINI 81.33 0.7500 0.0933 5.0624 GINI 72.75 0.8208 0.42 71.54 ± 14.43

KW 73.08 0.7875 0.36 5.7797 KW 63.17 0.7542 0.5733 70.63 ± 14.06

Relief-F 78.33 0.8250 0.28 1.0456 Relief-F 71.75 0.8042 0.42 70.26 ± 15.58

MRMR 74.08 0.7375 0.26 44.1737 MRMR 65.25 0.7458 0.4867 71.62 ± 14.20

Prostate KBCGS 95.17 0.9231 0.022 0.4503 Prostate KBCGS 94.71 0.9173 0.022 82.31 ± 10.72

χ2-Statistic 96.80 0.9558 0.02 3.0632 χ2-Statistic 96.67 0.9596 0.026 87.84 ± 7.11

GINI 95.47 0.9462 0.036 28.6469 GINI 96.85 0.9673 0.03 86.17 ± 6.60

KW 92.95 0.9481 0.09 42.9346 KW 95.47 0.9346 0.024 80.72 ± 8.01

Relief-F 94.21 0.9192 0.034 9.6568 Relief-F 92.37 0.9096 0.062 83.17 ± 6.96

MRMR 95.88 0.9423 0.024 38.8481 MRMR 95.60 0.9423 0.03 86.24 ± 7.78

Average KBCGS 94.54 0.94 0.06 0.24 Average KBCGS 92.37 0.9364 0.1193

χ2-Statistic 92.99 0.91 0.05 2.01 χ2-Statistic 91.13 0.9248 0.1230

GINI 92.21 0.91 0.06 21.00 GINI 90.92 0.9191 0.1277

Info.Gain 93.01 0.92 0.05 1.89 Info.Gain 90.62 0.9211 0.1341

KW 88.19 0.93 0.16 21.52 KW 86.44 0.8992 0.1933

Relief-F 91.47 0.92 0.10 5.36 Relief-F 89.88 0.9181 0.1357

MRMR 91.16 0.91 0.09 42.95 MRMR 88.90 0.9059 0.1455

Dataset Method ACC Kappa ACC_IE Dataset Method ACC Kappa Time (s)

On the multi-class KNN dataset On the multi-class SVM dataset

Brain_Tumor1 KBCGS 90.67 0.8033 75.84 ± 6.64 Brain_Tumor1 KBCGS 90 0.7963 0.2865

χ2-Statistic 87.67 0.7363 76.11 ± 5.48 χ2-Statistic 86.33 0.7040 1.4253

GINI 89.67 0.7852 76.38 ± 5.61 GINI 89.89 0.7931 38.9763

Info.Gain 90.89 0.8087 77.47 ± 5.51 Info.Gain 89.11 0.7824 1.4857

KW 83.33 0.6279 74.61 ± 6.73 KW 79 0.5329 15.6658

Relief-F 86.89 0.7149 75.20 ± 5.74 Relief-F 88.33 0.7541 3.4569

MRMR 88.11 0.7450 75.79 ± 6.16 MRMR 89 0.7714 35.1308


4 The proposed method

Using deep learning, and deep neural networks in particular, to identify and extract information from microarray datasets for classifying cancer cells is a novel approach. The deep neural network is one of the most important learning approaches and is the one considered in this article. In our proposed method, GW-HDNN (grey wolf hybrid deep neural network), the grey wolf algorithm is used to extract features and a DNN is used to classify the microarray cancer datasets. The DNN has 15 layers built from dropout and dense layers: 14 hidden layers and a final output layer. Fully connected layers are used, so that every unit in one layer is connected to the units of the next. These connected layers are intended to avoid adding further layers and to accelerate network training, while some of the inputs to the next layer are removed during training. ReLU is used as the non-linear activation function. Figure 3 shows the architecture of the proposed method. A dropout function (0.01) was used for regularization and batch resetting. The last layer of the DNN model is a fully connected layer that uses the sigmoid activation function for binary classification and the softmax activation function for multiclass cases. Table 2 gives the layout of the layers.
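The two output activations mentioned above differ in how they turn scores into probabilities: the sigmoid gives one probability for a binary decision, whereas softmax normalizes one score per class so that the outputs sum to 1. A NumPy sketch follows; subtracting the row maximum inside softmax is a standard numerical-stability step, and a two-class softmax over [z, 0] reproduces sigmoid(z).

```python
import numpy as np

def sigmoid(z):
    """Binary output: a single unit interpreted as P(class = 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Multiclass output: one unit per class; each row is non-negative and sums to 1."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max(axis=-1, keepdims=True))   # shifting by the row maximum avoids overflow
    return e / e.sum(axis=-1, keepdims=True)

print(sigmoid(0.0))                                  # 0.5
probs = softmax([[2.0, 1.0, 0.1]])
print(int(np.argmax(probs)))                         # 0: the class with the largest score
```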

Figure 3 shows the flowchart and a general outline of the proposed method. The input dataset enters the proposed method. First, the population is initialized according to the grey wolf algorithm. Then, the fitness function is computed for each input row, and the best features are kept. The current iteration number is compared with the maximum number of iterations: if it is smaller, the feature states are updated and the best current state is used as the input of the first step. After this stage, the best extracted features are used as the input of the DNN.
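The loop just described (initialize, evaluate fitness, keep the best, update until the iteration limit) can be sketched as a binary GWO wrapper that searches over gene subsets. The paper does not specify the fitness function or how continuous wolf positions become selected or unselected genes, so the pieces below are hypothetical: a nearest-centroid training accuracy with a small penalty on subset size as fitness (`subset_fitness`), and a sigmoid transfer function (slope 10, centered at 0.5) of the kind used in binary GWO variants. The synthetic data place class information in genes 0 to 2 only.

```python
import numpy as np

def centroid_accuracy(X, y, mask):
    """Training accuracy of a nearest-class-centroid rule restricted to the selected genes."""
    if not mask.any():
        return 0.0                                      # an empty subset cannot classify anything
    Xs = X[:, mask]
    classes = np.unique(y)
    centroids = np.array([Xs[y == c].mean(axis=0) for c in classes])
    dists = ((Xs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float(np.mean(classes[dists.argmin(axis=1)] == y))

def subset_fitness(X, y, mask, penalty=0.01):
    """Lower is better: error rate plus a small cost proportional to the fraction of genes kept."""
    return (1.0 - centroid_accuracy(X, y, mask)) + penalty * mask.mean()

def binary_gwo_select(X, y, n_wolves=10, max_iter=30, seed=0):
    rng = np.random.default_rng(seed)
    dim = X.shape[1]
    wolves = rng.random((n_wolves, dim)) < 0.5          # each wolf is a random gene subset
    leaders = np.zeros((0, dim), dtype=bool)
    leader_fit = np.zeros(0)
    for t in range(max_iter):
        fit = np.array([subset_fitness(X, y, w) for w in wolves])
        pool = np.vstack([leaders, wolves])
        pool_fit = np.concatenate([leader_fit, fit])
        top = np.argsort(pool_fit)[:3]                  # alpha, beta, delta, kept across iterations
        leaders, leader_fit = pool[top].copy(), pool_fit[top].copy()
        a = 2.0 - 2.0 * t / max_iter
        new_wolves = np.empty_like(wolves)
        for i, x in enumerate(wolves.astype(float)):
            guided = []
            for leader in leaders.astype(float):
                A = 2.0 * a * rng.random(dim) - a
                C = 2.0 * rng.random(dim)
                guided.append(leader - A * np.abs(C * leader - x))
            step = np.mean(guided, axis=0)
            prob = 1.0 / (1.0 + np.exp(-10.0 * (step - 0.5)))   # sigmoid transfer to [0, 1]
            new_wolves[i] = rng.random(dim) < prob
        wolves = new_wolves
    return leaders[0], float(leader_fit[0])

rng = np.random.default_rng(42)
y = np.repeat([0, 1], 30)                    # 60 synthetic samples, two classes
X = rng.normal(size=(60, 20))                # 20 synthetic "genes"
X[:, :3] += 3.0 * y[:, None]                 # only genes 0, 1 and 2 differ between classes
mask, best_fit = binary_gwo_select(X, y)
print(np.flatnonzero(mask), round(best_fit, 3))
```

The selected mask would then define the input columns of the DNN; in practice the fitness should be estimated on held-out data rather than training accuracy to avoid selecting genes that merely overfit.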

Table 1 (continued)

Dataset Method ACC Kappa ACC_IE Dataset Method ACC Kappa Time (s)

Lymphoma KBCGS 100 1 87.86 ± 6.11 Lymphoma KBCGS 100 1 0.2292

χ2-Statistic 100 1 86.89 ± 6.03 χ2-Statistic 100 1 0.9138

GINI 98.40 0.9670 85.77 ± 7.13 GINI 98.10 0.9603 15.7760

Info.Gain 100 1 86.49 ± 6.20 Info.Gain 100 1 0.9328

KW 98.38 0.9670 86.57 ± 6.31 KW 100 1 9.8692

Relief-F 100 1 85.90 ± 6.78 Relief-F 100 1 1.4333

MRMR 100 1 87.49 ± 5.81 MRMR 100 1 34.1350

NCI60 KBCGS 78.24 0.7503 43.63 ± 11.49 NCI60 KBCGS 80.52 0.7761 0.2004

χ2-Statistic 72.98 0.6883 39.99 ± 12.01 χ2-Statistic 75.29 0.7149 1.2021

GINI 33.67 0.2314 28.82 ± 9.12 GINI 38.50 0.2901 38.7968

Info.Gain 76.81 0.7326 39.99 ± 11.83 Info.Gain 78.05 0.7470 1.3379

KW 55.40 0.4861 38.11 ± 10.87 KW 66.76 0.6188 14.2269

Relief-F 75.24 0.7158 38.56 ± 12.91 Relief-F 73.02 0.6887 2.4940

MRMR 71.76 0.6733 40.12 ± 10.96 MRMR 75.69 0.7192 50.8011

SRBCT KBCGS 100 1 87.61 ± 8.78 SRBCT KBCGS 100 1 0.0768

χ2-Statistic 98.81 0.9833 86.27 ± 7.43 χ2-Statistic 99.75 0.9967 0.5811

GINI 97.60 0.9668 84.72 ± 6.62 GINI 98.65 0.9817 14.0525

Info.Gain 100 1 88.41 ± 6.34 Info.Gain 100 1 0.5623

KW 93.22 0.9061 76.7 ± 9.54 KW 99.28 0.9899 5.7861

Relief-F 100 1 90.36 ± 5.58 Relief-F 100 1 1.1407

MRMR 100 1 91.29 ± 5.64 MRMR 100 1 37.5048

Average KBCGS 92.23 0.89 Average KBCGS 92.63 0.89 0.20

χ2-Statistic 89.86 0.85 χ2-Statistic 90.34 0.85 1.03

GINI 79.83 0.74 GINI 81.28 0.76 26.90

Info.Gain 91.92 0.89 Info.Gain 91.79 0.88 1.08

KW 82.59 0.75 KW 86.26 0.79 11.39

Relief-F 90.53 0.86 Relief-F 90.34 0.86 2.13

MRMR 89.97 0.85 MRMR 91.17 0.87 39.39


The proposed DNN consists of 15 layers. The first and second layers are dense layers. Adding a fully connected layer is a cheap way to learn high-level non-linear combinations of the features. The flattened output is fed to a feed-forward neural network, and backpropagation is applied in each training iteration. The next layer is a dropout layer, which controls over-fitting. Over-fitting occurs when a neural network fits the training data well but does not generalize to the test set. In this layer, the inputs to the next layer are dropped at random with probability 0.2 during training. The dropout (release) layer does not change the size of the data passed to the next layer; it is used only to control training. The next layer is a dense layer, which applies a non-linear function, namely ReLU, to its inputs so that the network learns better.
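Dropout as described here can be sketched in NumPy. The version below uses inverted dropout, which is also how TensorFlow/Keras `Dropout` layers behave: during training each unit is zeroed with probability `rate` and the survivors are scaled by 1/(1 − rate); at inference the layer is the identity. In both cases the output has the same shape as the input. The function name and the 160-unit activation matrix are assumptions for illustration.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: in training, zero each entry with probability `rate` and rescale the rest."""
    if not training or rate == 0.0:
        return x                                     # identity at inference time
    keep = rng.random(x.shape) >= rate               # True with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
h = np.ones((1000, 160))                             # e.g. activations of a 160-unit dense layer
out = dropout(h, rate=0.2, training=True, rng=rng)
print(out.shape, round(float((out == 0).mean()), 2))   # same shape; about 20% of entries zeroed
print(np.array_equal(dropout(h, 0.2, training=False, rng=rng), h))   # True
```

Rescaling the surviving units keeps the expected activation unchanged, so the next layer sees inputs of the same scale in training and at inference.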

The next layer is another dropout layer with probability 0.2.

This pattern continues up to the 14th layer. The final layer is a fully connected layer with a sigmoid activation function.

This layer maps the outputs of the previous 14 layers to the probability of the respective class. In this way, the microarray features selected by the grey wolf algorithm are assigned to their class after passing through the 15 layers.
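Table 2 lists 3680 parameters for the first dense layer, which has 160 output units. For a standard dense layer with a bias vector (the Keras default), the count is n_in × n_units + n_units, so 3680 parameters imply n_in = 22 input features; this is an inference from the table, not a value stated in the text. The helper names below are hypothetical, and dropout layers contribute no parameters.

```python
def dense_params(n_inputs, n_units, use_bias=True):
    """Trainable parameters of a fully connected layer: a weight per (input, unit) pair plus a bias per unit."""
    return n_inputs * n_units + (n_units if use_bias else 0)

def input_size_from_params(n_params, n_units):
    """Recover the input width of a biased dense layer from its parameter count."""
    n_inputs, remainder = divmod(n_params - n_units, n_units)
    if remainder:
        raise ValueError("count is not consistent with a dense layer that has a bias")
    return n_inputs

print(input_size_from_params(3680, 160))   # 22
print(dense_params(22, 160))               # 3680
print(dense_params(160, 128))              # 20608, for a hypothetical 128-unit dense layer after it
```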

5 Simulation results

With respect to the above-mentioned discussion, the results of the proposed method are examined and evaluated here.

The proposed method was evaluated on three microarray cancer datasets: BRCA, LUAD, and STAD. It was implemented in Python with the TensorFlow package. Figure 4 depicts the error (loss) function of the proposed method over approximately 500 iterations on BRCA. The training error decreased steadily thanks to the arrangement of the dropout (release) and dense layers, and it converged to 0.05, which is a desirable value. Furthermore, Table 3 gives

Fig. 3 The flowchart of proposed method

Table 2 Arrangement of analysis layers in proposed method

Layer (type) Output shape Param #

dense_253 (Dense) (None, 160) 3680

dropout_169 (Dropout) (None, 160) 0

dropout_170 (Dropout) (None, 128) 0


a summary of the comparison of the proposed method with seven other algorithms. The proposed method had the highest recall, accuracy, and F1-score on the BRCA dataset. As shown in Fig. 4, the loss for this dataset was negligible (0.05). For the LUAD dataset, the error approached 0.1 over 500 iterations; however, around iteration 300 it rose sharply, up to 2.5. The proposed method then brought the error under control, and it settled at 0.1.

Figure 5 shows the loss function and accuracy of the proposed method on the LUAD dataset. The error function for this dataset fluctuates, which may be attributed to over-fitting. The dropout layers then controlled the error very well and reduced it to 0.1. Around iteration 300 the error was high, which might be due to the learning rate. Nevertheless, the fluctuation was well controlled and reduced to a steady value of 0.1. The learning rate of the proposed method was set to 0.1.

As shown in Table 4, the accuracy of the proposed method on the LUAD dataset was close to 100, i.e., 99.89, and the proposed method had the highest value in every evaluation criterion. The error function for the LUAD dataset settled at about 0.1. Compared with the other algorithms, the precision of the proposed method was about 8 percentage points higher than that of Naive Bayes, its recall was about 2 points higher, and its F1-score was about 4 points higher. Regarding accuracy, the proposed method improved on Naive Bayes by about 12 points.
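The LUAD gains quoted above appear to be absolute differences (percentage points) between Table 4 entries rather than relative improvements. A quick check using only values copied from Table 4 makes the distinction explicit; the helper names are hypothetical.

```python
# Values copied from Table 4 (LUAD dataset), all in %.
GW_HDNN = {"precision": 96.0, "recall": 89.68, "f1": 92.732, "accuracy": 99.89}
NAIVE_BAYES = {"precision": 88.0, "recall": 87.0, "f1": 88.0, "accuracy": 87.719}


def point_gains(model, baseline):
    """Absolute improvement in percentage points for each metric."""
    return {k: round(model[k] - baseline[k], 3) for k in model}


def relative_gains(model, baseline):
    """Improvement relative to the baseline value, in %."""
    return {k: round(100 * (model[k] - baseline[k]) / baseline[k], 2) for k in model}


print(point_gains(GW_HDNN, NAIVE_BAYES))
# {'precision': 8.0, 'recall': 2.68, 'f1': 4.732, 'accuracy': 12.171}
print(relative_gains(GW_HDNN, NAIVE_BAYES))
```

The first line reproduces the roughly 8, 2, 4, and 12 point gains stated in the text; the relative gains (about 9.1%, 3.1%, 5.4%, and 13.9%) are larger because each difference is divided by the Naive Bayes score.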

Figure 6 shows that the error function on the STAD dataset was small. After ten iterations the error stabilized and, as for the LUAD dataset, remained at 0.1. On this dataset the error function decreased well, although it

Table 3 Results for the BRCA dataset

Algorithm            Precision (%)  Recall (%)  F1-score (%)  Accuracy (%)
GW-HDNN              95             100         97            99.19
Naive Bayes          90             95          92            94.94
SVM(rbf)             91             95          93            95.344
SVM(linear)          90             94          93            94.941
Logistic regression  96             94          96            95.546
Decision tree        91             93          92            92.914
One VS all           89             94          91            93.94
K-nearest neighbor   91             95          93            95.344
Xgboost              47.8           49.82       48.78         95.29
LightGBM             47.72          50          48.83         95.45
CNN                  47.81          50          48.88         95.62

Fig. 4 The loss function associated with the BRCA dataset for the proposed method

Fig. 5 The loss function associated with the LUAD dataset

Table 4 Results for the LUAD dataset

Algorithm            Precision (%)  Recall (%)  F1-score (%)  Accuracy (%)
GW-HDNN              96             89.68       92.732        99.89
Naive Bayes          88             87          88            87.719
SVM(rbf)             74             86          79            85.964
K-nearest neighbor   74             86          79            85.964
SVM(linear)          84             87          83            86.421
One VS all           89             88          83            87.709
Xgboost              71.69          66.24       76.01         90.35
LightGBM             70.53          54.06       61.2          90.35
CNN                  50.61          30.95       38.41         84.53


had an ascending peak at around iterations 4 to 6. The loss then stabilized, and the error decreased.
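The text reports loss values of about 0.1 (LUAD, STAD) and 0.05 (BRCA) without naming the loss function. Assuming categorical cross-entropy, the usual choice for a softmax multiclass classifier (an assumption), a mean loss L means the model assigns the true class a geometric-mean probability of exp(-L). The helper names below are hypothetical.

```python
import math


def categorical_cross_entropy(true_labels, probabilities, eps=1e-12):
    """Mean negative log-probability assigned to the true class.
    `probabilities` holds one predicted probability vector per sample."""
    total = 0.0
    for label, probs in zip(true_labels, probabilities):
        total -= math.log(max(probs[label], eps))  # eps guards against log(0)
    return total / len(true_labels)


def typical_true_class_probability(loss):
    """Geometric-mean true-class probability implied by a mean loss."""
    return math.exp(-loss)


for loss in (0.1, 0.05):
    print(loss, round(typical_true_class_probability(loss), 3))
# 0.1 0.905
# 0.05 0.951
```

Under this assumption, a loss of 0.1 corresponds to roughly 90% confidence in the correct class on average, and 0.05 to roughly 95%.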

As shown in Table 5, the proposed method remarkably outperformed the other methods on the STAD dataset with respect to the evaluation criteria, achieving an accuracy of 99.37 and a precision of 100. Overall, the STAD results indicate that the proposed method is about 5% better than the other algorithms. Similarly, its recall was almost 107% better than that of the best algorithm (linear regression), its F1-score was 4% better than that of K-nearest neighbor, and its accuracy was 2% better than that of linear regression. In short, the proposed method outperformed the other algorithms in all criteria.

6 Conclusion

The selection of relevant and informative genes for cancer classification is a common task in most high-throughput gene expression studies. DNA microarrays can detect the expression levels of thousands of genes under various experimental conditions, and microarray technology helps researchers learn about different kinds of diseases, especially cancer. Biomarker gene selection is useful for the early detection and effective treatment of cancer, and early diagnosis, for example through gene expression profiling of patients, generally increases the chances of successful treatment. In this paper, we used a hybrid method that combines the grey wolf algorithm for feature extraction with a 15-layer DNN for detecting cancer from microarray data. The proposed method obtained accuracy values of 99.37 on the STAD dataset, 99.19 on the BRCA dataset, and 99.89 on the LUAD dataset, the highest among all the compared methods. The proposed method was compared with linear support vector machine, RBF support vector machine, K-nearest neighbor, linear regression, one versus all, Naive Bayes, and decision tree classifiers with regard to the evaluation criteria, and in most cases it was remarkably better than the other algorithms. These comparisons with the seven algorithms are reported in three tables and three figures.
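To make the grey wolf stage summarized above more concrete, here is a minimal, generic sketch of binary grey wolf optimization for gene selection. It follows the position-update rules of Mirjalili et al. (2014): coefficients A = 2a·r1 - a and C = 2·r2, with a decreasing linearly from 2 to 0, and each wolf moving to the average of the positions suggested by the alpha, beta, and delta wolves. Positions are binarized with a sigmoid thresholded at 0.5. This is not the authors' implementation: the transfer function, population size, iteration count, and especially the toy fitness function (`toy_fitness`, with a hypothetical set of "informative" genes) are assumptions. In practice the fitness would typically be a classifier's validation error plus a penalty on the number of selected genes.

```python
import math
import random


def binary_gwo(fitness, n_features, n_wolves=8, n_iter=30, bound=4.0, seed=0):
    """Minimal binary grey wolf optimizer for gene (feature) selection.

    Continuous positions follow the GWO update of Mirjalili et al. (2014);
    a sigmoid transfer thresholded at 0.5 maps each position to a 0/1 mask.
    `fitness(mask)` is minimised. Returns (best_mask, best_fitness).
    """
    if n_wolves < 3:
        raise ValueError("GWO needs at least three wolves (alpha, beta, delta)")
    rng = random.Random(seed)

    def to_mask(position):
        return [1 if 1.0 / (1.0 + math.exp(-x)) > 0.5 else 0 for x in position]

    wolves = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)]
              for _ in range(n_wolves)]
    best_mask, best_fit = None, math.inf

    for t in range(n_iter + 1):
        # Rank the pack; the three fittest wolves act as alpha, beta, delta.
        scored = sorted(((fitness(to_mask(w)), w) for w in wolves),
                        key=lambda pair: pair[0])
        if scored[0][0] < best_fit:
            best_fit, best_mask = scored[0][0], to_mask(scored[0][1])
        if t == n_iter:
            break  # final pack has been evaluated; no further move
        leaders = [w for _, w in scored[:3]]
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0

        new_wolves = []
        for wolf in wolves:
            new_position = []
            for j, x in enumerate(wolf):
                total = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - x)
                    total += leader[j] - A * D
                # Average the three leader-guided estimates, then clip.
                new_position.append(max(-bound, min(bound, total / 3.0)))
            new_wolves.append(new_position)
        wolves = new_wolves

    return best_mask, best_fit


# Hypothetical toy fitness: genes 0, 3 and 7 are "informative"; each missed
# informative gene costs 1 and each extra selected gene costs 0.1.
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask):
    missed = sum(1 for g in INFORMATIVE if not mask[g])
    extra = sum(1 for g, bit in enumerate(mask) if bit and g not in INFORMATIVE)
    return missed + 0.1 * extra


mask, fit = binary_gwo(toy_fitness, n_features=10)
print([g for g, bit in enumerate(mask) if bit], fit)
```

In a pipeline like the one described in this paper, the selected gene indices would then define the input columns of the DNN.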

The proposed method performed better than the other seven algorithms with regard to precision, accuracy, recall, and F1-score. The error was 0.1 on the LUAD and STAD datasets and even lower (0.05) on the BRCA dataset. Compared with the method of Xiao et al. (2018), the proposed method achieved an improvement of 0.57 on the LUAD dataset, 1.11 on the STAD dataset, and 0.78 on the BRCA dataset. Given these results and its high accuracy (the closeness of the predicted values to the real values), the proposed hybrid method can be considered a harmless, low-cost, reliable, and fast way of detecting cancer with minimal error, which may help reduce deaths caused by cancer.

As a direction for further research, multi-task learning based on deep learning, in which related problems are learned jointly as separate tasks, could be used. In addition, meta-heuristic algorithms such as grey wolf, the genetic algorithm, and other methods could be combined with the proposed method to detect cancer along with deep learning.

References

Abdel-Zaher AM, Eldeib AM (2016) Breast cancer classification using deep belief networks. Expert Syst Appl 46:139–144

Fig. 6 The loss function associated with the STAD dataset

Table 5 Results for the STAD dataset

Algorithm            Precision (%)  Recall (%)  F1-score (%)  Accuracy (%)
GW-HDNN              100            98.73       99.3          99.37
Naive Bayes          78             72          72            71
SVM(rbf)             77             60          51            60
K-nearest neighbor   93             89          95.9          92.5
SVM(linear)          95             97          95.982        86.421
One VS all           95             94          94.49         95
Xgboost              98.71          97.72       98.21         98.33
LightGBM             97.5           95.45       96.46         96.67
CNN                  96.88          96.15       96.51         96.39


Aggarwal CC (2018) Neural networks and deep learning: a textbook. Springer, p 497

Agrawal S, Agrawal J (2015) Neural network techniques for cancer prediction: a survey. Procedia Comput Sci 60:769–774

Bunz F (2016) Principles of cancer genetics. Springer, p 343

Butterfield LH, Kaufman HL, Marincola FM (2017) Cancer immunotherapy principles and practice. Demos Medical, p 920

Chen D, Liu Z, Ma X, Hua D (2005) Selecting genes by test statistics. J Biomed Biotechnol 2005(2):132–138

Chen H, Zhang Y, Gutman I (2016) A kernel-based clustering method for gene selection with gene expression data. J Biomed Inform 62:12–20

Dashtban M, Balafar M (2017) Gene selection for microarray cancer classification using a new evolutionary method employing artificial intelligence concepts. Genomics 109(2):91–107

Dong H, Markovic SN (2018) The basics of cancer immunotherapy. Springer, New York, p 172

Dwivedi AK (2018) Artificial neural network model for effective cancer classification using microarray gene expression data. Neural Comput Appl 29(12):1545–1554

Fakoor R, Ladhak F, Nazi A, Huber M (2013) Using deep learning to enhance cancer diagnosis and classification. In: Proceedings of the international conference on machine learning. ACM, New York

Frigui H, Nasraoui O (2000) Simultaneous clustering and attribute discrimination. In: Ninth IEEE international conference on fuzzy systems. https://doi.org/10.1109/FUZZY.2000.838651

Gao L, Ye M, Lu X, Huang D (2017) Hybrid method based on information gain and support vector machine for gene selection in cancer classification. Genom Proteom Bioinform 15(6):389–395

Gray JW, Collins C (2000) Genome changes and gene expression in human solid tumors. Carcinogenesis 21:443–452

Guia JM, Devaraj M, Leung CK (2019) DeepGx: deep learning using gene expression for cancer classification. In: 2019 IEEE/ACM international conference on advances in social networks analysis and mining. https://doi.org/10.1145/3341161.3343516

Knudson AG (2000) Chasing the cancer demon. Ann Rev Genet 34:1–19

Lee CP, Leu Y (2011) A novel hybrid feature selection method for microarray data analysis. Appl Soft Comput 11(1):208–213

Liu S, Xu C, Zhang Y, Liu J, Yu B, Liu X, Dehmer M (2018) Feature selection of gene expression data for cancer classification using double RBF-kernels. BMC Bioinformatics 19:396

Lo SB, Lou SA (1995) Artificial convolution neural network techniques and applications for lung nodule detection. IEEE Trans Med Imaging 14(4):711–718

Lu S, Lu Z, Zhang Y-D (2018) Pathological brain detection based on AlexNet and transfer learning. J Comput Sci. https://doi.org/10.1016/j.jocs.2018.11.008

Mirjalili S, Mirjalili SM, Lewis A (2014) Grey wolf optimizer. Adv Eng Softw 69:46–61

Mishra S, Shaw K, Mishra D (2012) A new meta-heuristic bat inspired classification approach for microarray data. Proc Technol 4:802–806

Moslehi F, Haeri A (2019) A novel hybrid wrapper–filter approach based on genetic algorithm, particle swarm optimization for feature subset selection. J Ambient Intell Human Comput 11:1105–1127

Motieghader H, Najafi A, Sadeghi B, Masoudi-Nejad A (2017) A hybrid gene selection algorithm for microarray cancer classification using genetic algorithm and learning automata. Inform Med Unlocked 9:246–254

Olyaee S, Dashtban Z, Dashtban MH (2013) Design and implementation of super-heterodyne nano-metrology circuits. Front Optoelectron 6(3):318–326

Peng H, Long F, Ding C (2005) Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238

Ram M, Najafi A, Shakeri MT (2017) Classification and biomarker genes selection for cancer gene expression data using random forest. Iran J Pathol 12(4):339–347

Rani MJ, Devaraj D (2019) Two-stage hybrid gene selection using mutual information and genetic algorithm for cancer data classification. J Med Syst 43(8):235

Sayed S, Nassef M, Badr A, Farag I (2019) A nested genetic algorithm for feature selection in high-dimensional cancer microarray datasets. Expert Syst Appl 121:233–243

Sharma A, Paliwal KK (2008) Cancer classification by gradient LDA technique using microarray gene expression data. Data Knowl Eng 66(2):338–347

Shekar BH, Dagnew G (2019) L1-regulated feature selection and classification of microarray cancer data using deep learning. In: Proceedings of 3rd international conference on computer vision and image processing, pp 227–242

Tabakhi S, Najafi A, Ranjbar R, Moradi P (2015) Gene selection for microarray data classification using a novel ant colony optimization. Neurocomputing 168:1024–1036

Tavakoli N, Karimi M, Norouzi A, Karimi N, Samavi S, Soroushmehr SMR (2019) Detection of abnormalities in mammograms using deep features. J Ambient Intell Human Comput. https://doi.org/10.1007/s12652-019-01639-x

Varadharajan R, Priyan MK, Panchatcharam P et al (2018) A new approach for prediction of lung carcinoma using backpropagation neural network with decision tree classifiers. J Ambient Intell Human Comput. https://doi.org/10.1007/s12652-018-1066-y

Wang Y, Tetko IV, Hall MA, Frank E, Facius A, Mayer KFX, Mewes HW (2005) Gene selection from microarray data for cancer classification—a machine learning approach. Comput Biol Chem 29(1):37–46

Wilmott P (2019) Machine learning: an applied mathematics introduction. Panda Ohana Publishing, p 242

Yang J, Liu YL, Feng CS, Zhu GQ (2016) Applying the Fisher score to identify Alzheimer’s disease-related genes. Genet Mol Res 15(2):gmr15028798

Young RA (2000) Biomedical discovery with DNA arrays. Cell 102:9–15

Zhou M, Luo Y, Sun G, Mai G, Zhou F (2015) Constraint programming based biomarker optimization. Biomed Res Int 2015:910515

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.