Target-AMP: Computational prediction of antimicrobial peptides by coupling sequential information with evolutionary profile

Asad Jan ^a, Maqsood Hayat ^a,*, Mohammad Wedyan ^b, Ryan Alturki ^c, Foziah Gazzawe ^c, Hashim Ali ^a, Fawaz Khaled Alarfaj ^d


Computers in Biology and Medicine 151 (2022) 106311


^a Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan

^b Department of Autonomous Systems, Faculty of Artificial Intelligence, Al-Balqa Applied University, Al-Salt, 19117, Jordan

^c Department of Information Science, College of Computer and Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia

^d College of Computer & Information Technology, King Faisal University, Saudi Arabia

ARTICLE INFO

Keywords: Machine learning; Antimicrobial peptides; SVM; RF; DPC; PSSM

ABSTRACT

Antimicrobial peptides (AMPs) are gaining considerable attention as cutting-edge treatments for many infectious diseases. AMPs have remained effective against bacteria, fungi, and viruses over a long period, making them a promising option for addressing the growing problem of antibiotic resistance. Due to their wide-ranging activity, AMPs have become more prominent, particularly in therapeutic applications. The explosive increase in the number of AMPs documented in databases has made their prediction a challenging task for researchers. Wet-lab investigations to find antimicrobial peptides are exceedingly costly, time-consuming, and even impossible for some species. Therefore, an efficient computational method is needed to select the best AMP candidates prior to in-vitro trials. In this study, a machine learning-based classification system was developed to distinguish antimicrobial peptides from non-antimicrobial peptides effectively and accurately. The position-specific scoring matrix (PSSM), pseudo amino acid composition (PseAAC), di-peptide composition (DPC), and the combination of these three were used to extract salient features from AMP sequences. K-nearest neighbor (KNN), Random Forest (RF), and Support Vector Machine (SVM) were employed as classifiers. The proposed predictor (Target-AMP) achieved accuracies of 97.07% on the training dataset and 95.71% on the independent dataset. The results show that Target-AMP attained a higher accuracy than the existing methods it was compared with.

1. Introduction

Antimicrobial peptides (AMPs) are small molecules with strong activity against bacteria, viruses, and other microorganisms [1]. AMPs, also known as host defense peptides (HDPs), form a large class of compounds that are commonly found in nature. Due to their wide range of activities, including anticancer, antibacterial, antifungal, and antiviral properties, AMPs have emerged as a template for new antimicrobial drugs that may help address the growing multidrug resistance of pathogenic microorganisms [2]. All multicellular organisms have developed defense mechanisms to fend off bacterial intruders and infections, and antimicrobial peptides are therefore crucial for their innate immune system [3]. The identification of AMPs can serve as a theoretical foundation for the research and development of new and more effective antimicrobial drugs [4]. AMPs are small, positively charged amphipathic molecules that are active against both gram-positive and gram-negative bacteria, fungi, protozoa, and viruses.

As a potential replacement for traditional antibiotics, AMPs have received considerable attention in recent decades [5]. These peptides can disrupt the membrane of microbial cells or interfere with intracellular processes, which can then cause cell death. They typically carry a positive charge and contain between 15 and 45 amino acid residues [6]. Many scientists working in the field of bioinformatics are concentrating on modeling new AMPs [7]. AMPs have potential as peptide pharmacological agents because of their short length and quick, efficient activity against microorganisms [8,9]. Due to their broad spectrum of activity against multidrug-resistant microbes, and particularly the low tendency of microbes to develop resistance to them, AMPs have drawn increased interest in clinical applications [10]. Because AMPs interact with the structural components of microbial cell membranes and have several biological targets, microbes do not easily become resistant to them [11]. From prokaryotes to humans, practically all living things produce AMPs, which have been conserved in genomes throughout evolution [12]. They can considerably boost the resistance and survival of aquatic products.

* Corresponding author. E-mail address: [email protected] (M. Hayat).

https://doi.org/10.1016/j.compbiomed.2022.106311

Received 23 September 2022; Received in revised form 2 November 2022; Accepted 13 November 2022

A number of computational tools have been introduced for the identification of antimicrobial peptides in order to accelerate the discovery of antimicrobial drugs, including AntiBP [13], AntiBP2 [11], AMPer [14], AVPpred [15], iAMP-2L [16], EFC-FCBF [17], and ClassAMP [18].

In the AntiBP and AntiBP2 schemes, peptide sequences were translated into numeric descriptors using binary (0, 1) features, and these features were then employed as input data for several classifiers [11,13]. Support vector machines (SVM), quantitative matrices (QM), and artificial neural networks (ANN) were used as classifiers in AntiBP, of which SVM yielded the best outcomes. For AntiBP2, data was also gathered from the Antimicrobial Peptide Database (APD), and an SVM-based model was introduced using the amino acid composition of the peptide sequences. The training sets for AntiBP and AntiBP2 were restricted to the residues at the N- and C-termini of the peptides. In the CAMP technique, physicochemical characteristics were used to encode peptide sequences, and learning algorithms such as Random Forest (RF), Discriminant Analysis (DA), and SVM were applied. Antimicrobial classes such as antifungal, antibacterial, antiviral, and AMPs were used to train the algorithm [19].

Wang et al. developed a computational model for classifying AMPs that identifies antimicrobial peptides from sequence alignment data [20]. Combining sequence alignment with feature selection, the method assigns a query peptide to the peptide category with the highest degree of sequence similarity to it. Each peptide was encoded with 270 attributes by combining the amino acid composition and pseudo amino acid composition (PseAAC). The sequence alignment model outperformed all others in terms of prediction accuracy. In 2015, Ng et al. suggested a new computational technique for predicting AMPs in which sequence alignment was employed and SVM was applied as the classifier [21]. The ClassAMP tool [18] was designed to classify antimicrobial peptides as antifungal, antibacterial, or antiviral peptides based on sequence properties, using SVM and RF algorithms. Furthermore, Xiao et al. proposed “iAMP-2L”, a two-level predictor for AMPs that uses fuzzy k-nearest neighbors (FKNN) as the classifier and PseAAC for feature extraction. Because an antimicrobial peptide may belong to two or more functional classes, iAMP-2L is designed to address this multi-label issue.

Veltri et al. developed EFC-FCBF, which is based on machine learning and genetic programming [22], to improve the prediction rate of AMPs. They used evolutionary feature construction (EFC) to capture the sequence-based features contained in a peptide sequence, and then applied a fast correlation-based filter (FCBF) to select the most informative, nonredundant features. Meher et al. created an SVM-based computational technique for AMP identification in which physicochemical, compositional, and structural properties were extracted from AMPs and used as input to the SVM, reporting improved accuracy [23]. In 2018, Bhadra et al. proposed a model for identifying antimicrobial peptides from sequence data using RF as the learning classifier [2].

2. Methods and materials

2.1. Dataset

The accuracy and success rate of a computational model are significantly influenced by the use of a reliable dataset. Datasets generally include both a training and a testing part: the former is used for training the proposed model, and the latter for testing it. A primary (training) dataset and an independent dataset were used as benchmark datasets in this study; both were downloaded from Xiao et al. [16]. The training dataset consists of 3175 peptide sequences, of which 770 are antimicrobial peptides and 2405 are non-antimicrobial peptides. The independent dataset contains 1840 peptides, of which 920 are antimicrobial peptides and 920 are non-antimicrobial peptides.

Fig. 1. Illustration of the proposed model.


2.2. Feature extraction techniques

Feature extraction is a crucial stage for machine learning-based predictions [24]. Numerical characteristics must be extracted from the peptide sequences because statistical models accept only numerical variables for training. Three feature extraction approaches have been applied in this work: Di-peptide Composition (DPC), Pseudo amino acid composition (PseAAC), and Position-Specific Scoring Matrix (PSSM). In addition, these descriptors are combined to create a hybrid descriptor space in which the strengths of one technique make up for the shortcomings of another. The framework of the proposed model is illustrated in Fig. 1. The individual descriptors are described below.

2.2.1. Di-peptide composition (DPC)

DPC captures local sequence-order information from the ordering of neighboring amino acids [25]. Specifically, it records how frequently each pair of adjacent residues appears in the primary sequence [26]. Mathematically, it can be expressed as:

$$F(i) = \frac{M(i)}{T} \tag{1}$$

where i = 1, 2, 3, …, 400 indexes the possible di-peptides, M(i) denotes the number of occurrences of the i-th di-peptide, and T signifies the total number of di-peptides in the sequence [27].
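As an illustration of Eq. (1), the following minimal sketch computes the 400-dimensional DPC vector for one peptide. The function name `dipeptide_composition`, the ordering of the 400 di-peptides, the decision to skip pairs containing non-standard residues, and the example sequence are assumptions made for this sketch and are not taken from the original implementation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed ordering)


def dipeptide_composition(sequence):
    """Return the 400-dimensional DPC vector F(i) = M(i) / T of Eq. (1)."""
    dipeptides = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(dipeptides, 0)
    seq = sequence.upper()
    total = 0  # T: number of adjacent residue pairs that were counted
    for first, second in zip(seq, seq[1:]):
        pair = first + second
        if pair in counts:  # assumption: skip pairs with non-standard residues
            counts[pair] += 1  # M(i): occurrences of this di-peptide
            total += 1
    if total == 0:
        return [0.0] * len(dipeptides)
    return [counts[d] / total for d in dipeptides]


# Example peptide string used only for illustration.
features = dipeptide_composition("GIGKFLHSAKKFGKAFVGEIMNS")
```

Because every counted pair contributes to exactly one of the 400 entries, the entries of the returned vector sum to one for any sequence with at least two standard residues.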

2.2.2. Position-specific scoring matrix (PSSM)

PSSM is employed to express patterns (motifs) in biological sequences. It is represented as a two-dimensional matrix in which the number of rows equals the length of the peptide sequence and the number of columns equals the size of the alphabet [28]. PSSM is typically used to describe the pattern inherent in a multiple sequence alignment of a group of related sequences. Its main purpose is to score a query sequence against the alignment of sequences retrieved from the database [29,30]. For a peptide sequence P with L amino acid residues, the PSSM can be formulated as:

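$$
P_{PSSM} =
\begin{bmatrix}
p_{1 \to 1} & p_{1 \to 2} & \cdots & p_{1 \to j} & \cdots & p_{1 \to 20} \\
p_{2 \to 1} & p_{2 \to 2} & \cdots & p_{2 \to j} & \cdots & p_{2 \to 20} \\
\vdots & \vdots & & \vdots & & \vdots \\
p_{i \to 1} & p_{i \to 2} & \cdots & p_{i \to j} & \cdots & p_{i \to 20} \\
\vdots & \vdots & & \vdots & & \vdots \\
p_{L \to 1} & p_{L \to 2} & \cdots & p_{L \to j} & \cdots & p_{L \to 20}
\end{bmatrix}
\tag{2}
$$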

In Eq. (2), $p_{i \to j}$ is the score for the amino acid residue at the i-th position of the peptide sequence being substituted by amino acid type j, and j = 1, …, 20 stands for the 20 native amino acids. The PSSM is generated by running the PSI-BLAST tool [31].
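Because the matrix in Eq. (2) has one row per residue, peptides of different lengths yield matrices of different sizes, whereas a classifier needs a fixed-length input. The text does not state how this reduction is done, so the sketch below shows only one common option under stated assumptions: each score is squashed with the logistic function and every column is averaged over the L positions, giving a 20-dimensional vector. The function name `pssm_to_vector` and the toy matrix are hypothetical.

```python
import math


def pssm_to_vector(pssm):
    """Collapse an L x 20 PSSM into a 20-dimensional vector.

    Assumption (not from the paper): each score is mapped into (0, 1)
    with the logistic function and columns are averaged over positions.
    """
    if not pssm:
        raise ValueError("PSSM must contain at least one row")
    if any(len(row) != 20 for row in pssm):
        raise ValueError("every PSSM row must hold 20 scores")
    length = len(pssm)
    scaled = [[1.0 / (1.0 + math.exp(-score)) for score in row] for row in pssm]
    return [sum(row[j] for row in scaled) / length for j in range(20)]


# Toy 3 x 20 matrix with made-up scores, only to show the shapes involved.
toy_pssm = [[0] * 20, [2] + [-1] * 19, [-2] + [1] * 19]
vector = pssm_to_vector(toy_pssm)
```

Other reductions (for example, auto-covariance transforms of the profile) are equally plausible; the choice here is purely illustrative.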

2.2.3. Pseudo amino acid composition

PseAAC is one of the most well-known discrete methods for representing protein sequences (Chou, 2009; Nanni et al., 2014). Amino acid residues differ in their functional groups and therefore have distinct physicochemical properties. The conventional amino acid composition (AAC) extracts only the occurrence frequencies of the twenty natural amino acid residues and ignores their physicochemical characteristics. In addition, it cannot describe the position of amino acid residues or the sequence order. Therefore, the PseAAC concept (Chou, 2009), which has been used to represent samples of membrane proteins, was adopted here. PseAAC can be formulated as:

$$P = [p_1, \ldots, p_{20}, p_{20+1}, \ldots, p_{20+\lambda}]^T \tag{3}$$

where $p_1, p_2, \ldots, p_{20}$ represent the normalized occurrence frequencies of the 20 natural amino acids, and $p_{20+1}, \ldots, p_{20+\lambda}$ are the sequence-order correlation factors, which are derived from the hydrophobicity, hydrophilicity, stiffness, flexibility, and irreplaceability of each amino acid.

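$$
\begin{cases}
\tau_1 = \dfrac{1}{L-1} \sum\limits_{i=1}^{L-1} J_{i,i+1} \\
\tau_2 = \dfrac{1}{L-2} \sum\limits_{i=1}^{L-2} J_{i,i+2} \\
\tau_3 = \dfrac{1}{L-3} \sum\limits_{i=1}^{L-3} J_{i,i+3} \\
\tau_4 = \dfrac{1}{L-4} \sum\limits_{i=1}^{L-4} J_{i,i+4} \\
\tau_5 = \dfrac{1}{L-5} \sum\limits_{i=1}^{L-5} J_{i,i+5} \\
\quad \vdots \\
\tau_\lambda = \dfrac{1}{L-\lambda} \sum\limits_{i=1}^{L-\lambda} J_{i,i+\lambda}
\end{cases}
\tag{4}
$$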

where, in Eq. (4), L denotes the length of the protein sequence and $\tau_1, \tau_2, \ldots, \tau_\lambda$ are the first-, second-, and up to λ-th-rank sequence-order correlation factors. Following empirical findings, we chose λ = 21, which means that the first 21 ranks of sequence-order correlation factors are considered.

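$$
\begin{cases}
H_1(i) = \dfrac{H_1^0(i) - \sum\limits_{k=1}^{20} \dfrac{H_1^0(k)}{20}}{\sqrt{\dfrac{\sum\limits_{k=1}^{20} \left[ H_1^0(k) - \sum\limits_{m=1}^{20} \dfrac{H_1^0(m)}{20} \right]^2}{20}}} \\[3ex]
H_2(i) = \dfrac{H_2^0(i) - \sum\limits_{k=1}^{20} \dfrac{H_2^0(k)}{20}}{\sqrt{\dfrac{\sum\limits_{k=1}^{20} \left[ H_2^0(k) - \sum\limits_{m=1}^{20} \dfrac{H_2^0(m)}{20} \right]^2}{20}}}
\end{cases}
\tag{5}
$$

where $H_1^0(i)$ and $H_2^0(i)$ are the original values of two physicochemical properties for the i-th amino acid, and $H_1(i)$ and $H_2(i)$ are the corresponding values standardized to zero mean and unit standard deviation over the 20 amino acids.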

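$$
p_u =
\begin{cases}
\dfrac{f_u}{\sum\limits_{i=1}^{20} f_i + w \sum\limits_{j=1}^{\lambda} \Theta_j}, & (1 \le u \le 20) \\[3ex]
\dfrac{w \, \Theta_{u-20}}{\sum\limits_{i=1}^{20} f_i + w \sum\limits_{j=1}^{\lambda} \Theta_j}, & (20+1 \le u \le 20+\lambda)
\end{cases}
\tag{6}
$$

Here $f_u$ is the normalized occurrence frequency of the u-th amino acid, $\Theta_j$ is the j-th sequence-order correlation factor (written $\tau_j$ in Eq. (4)), and w is the weight factor for the sequence-order effect, which has been set to 0.5 in this study.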
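The construction in Eqs. (3) to (6) can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it uses a single property (Kyte-Doolittle hydropathy, chosen here as a stand-in), and it assumes that the coupling term J between two residues is the squared difference of their standardized property values. The function names `standardize` and `pseaac` and the example sequence are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here as a stand-in property.
HYDROPATHY = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8, "G": -0.4,
    "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8, "M": 1.9, "N": -3.5,
    "P": -1.6, "Q": -3.5, "R": -4.5, "S": -0.8, "T": -0.7, "V": 4.2,
    "W": -0.9, "Y": -1.3,
}


def standardize(raw):
    """Eq. (5): zero mean and unit standard deviation over the 20 residues."""
    mean = sum(raw.values()) / 20
    std = math.sqrt(sum((v - mean) ** 2 for v in raw.values()) / 20)
    return {aa: (v - mean) / std for aa, v in raw.items()}


def pseaac(sequence, lam=21, w=0.5):
    """Return the (20 + lam)-dimensional vector of Eqs. (3), (4) and (6)."""
    seq = [aa for aa in sequence.upper() if aa in HYDROPATHY]
    n = len(seq)
    if n == 0 or lam >= n:
        raise ValueError("lam must be smaller than the sequence length")
    h = standardize(HYDROPATHY)
    # f_u: occurrence frequency of each of the 20 amino acids
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]
    # tau_k of Eq. (4); J is assumed to be a squared property difference
    taus = []
    for k in range(1, lam + 1):
        total = sum((h[seq[i]] - h[seq[i + k]]) ** 2 for i in range(n - k))
        taus.append(total / (n - k))
    denom = sum(freqs) + w * sum(taus)  # shared denominator of Eq. (6)
    return [f / denom for f in freqs] + [w * t / denom for t in taus]


# A small lam keeps the example short; the paper uses lambda = 21, w = 0.5.
vector = pseaac("GIGKFLHSAKKFGKAFVGEIMNS", lam=3, w=0.5)
```

Note that λ must be smaller than the peptide length, since the k-th correlation factor averages over L − k residue pairs. With λ = 21, sequences shorter than 22 residues would need separate handling, which the text does not describe.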

2.3. Classification algorithms

2.3.1. Random Forest

Random Forest (RF) is an ensemble technique that is often used for classification and regression tasks, including protein classification, cloud screening, and biomass mapping [32,33]. RF was initially proposed by Breiman [34]. It consists of several trees, each of which is built from a distinct bootstrap sample drawn from the same distribution [35]. The benefits of RF include its high accuracy [36], its ability to operate well on large datasets [37], and outlier detection [38].
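Two ingredients of the description above, drawing a bootstrap sample for each tree and combining the trees' outputs, can be sketched in a few lines. Tree construction itself is omitted, and the helper names `bootstrap_sample` and `majority_vote`, the seed, and the example labels are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(n_samples, rng):
    """Draw n_samples indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def majority_vote(tree_predictions):
    """Combine per-tree class labels into the forest's prediction."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(42)  # fixed seed only to make the sketch repeatable
sample = bootstrap_sample(10, rng)  # indices of the rows given to one tree
forest_label = majority_vote(["AMP", "non-AMP", "AMP"])
```

Because sampling is done with replacement, some training rows appear several times in a tree's sample and others not at all, which is what makes the trees differ from one another.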

2.3.2. Support vector machine

SVM is a leading classification technique that is often applied in bioinformatics [39,40], image processing, and machine learning [33,41]. The idea was initially put forward by Vapnik [42]. To solve a binary classification problem, SVM constructs a linear separating hyperplane in a higher-dimensional feature space [18]. SVM is typically used in pattern recognition tasks, which makes it useful for categorizing biological data [42,43], such as discriminating protein subcellular localizations and types of membrane proteins [44]. SVM is based on structural risk minimization theory [45].

2.3.3. K-nearest neighbor (KNN)

In bioinformatics and machine learning, the KNN classification algorithm is a simple yet widely used method. It is well recognized for its respectable performance, simplicity, and interpretability. Despite its apparent simplicity, KNN has in some cases outperformed other classification learners. KNN is regarded as an instance-based (lazy) learner since it classifies a sample based on its neighbors in the feature space. Its decisions are based on the Euclidean distance: when a new sample has to be classified, KNN computes its distance to each sample in the training data. The Euclidean distance is given by:

$$E_{dis}(x_1, x_2) = \sqrt{\sum_{i=1}^{n} (x_{i1} - x_{i2})^2} \tag{7}$$

KNN then sorts the training samples by their distances so that $d_i \le d_{i+1}$, where i = 1, 2, 3, …, k.

For decision-making, KNN uses majority voting among the k nearest neighbors or a related voting scheme. In the event of a tie, the first class is chosen as the target class. K is frequently chosen as an odd number in order to prevent ties. The appropriate number of nearest neighbors depends on the data: k is typically kept small for small datasets and made larger for large datasets.
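The KNN procedure described above (compute Euclidean distances, sort them, and vote among the k nearest samples) can be sketched as follows. The function names, the two-dimensional toy vectors, and the class labels are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter


def euclidean(x1, x2):
    """Eq. (7): Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    distances = sorted(
        (euclidean(x, query), label) for x, label in zip(train_x, train_y)
    )
    nearest = [label for _, label in distances[:k]]
    # most_common keeps first-seen order for equal counts, so a tie goes
    # to the label that appears first among the sorted neighbours.
    return Counter(nearest).most_common(1)[0][0]


# Toy training data: two-dimensional vectors with made-up values.
train_x = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
train_y = ["non-AMP", "non-AMP", "AMP", "AMP"]
predicted = knn_predict(train_x, train_y, [0.8, 0.9], k=3)
```

Choosing an odd k, as the text suggests, avoids ties in a two-class problem, so the tie-breaking rule only matters for even k.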

2.4. Validation check methods

The effectiveness of a predictor is commonly validated in the literature using cross-validation techniques, among which the jackknife, k-fold, and independent dataset tests are the most often applied [37,46–48]. In this work, 10-fold cross-validation is used for model training and evaluation. Additionally, the assessment metrics accuracy, sensitivity, specificity, F-measure, precision, and MCC were used. These metrics can be computed as:

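$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{8}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN} \times 100 \tag{9}$$

$$\text{Specificity} = \frac{TN}{FP + TN} \times 100 \tag{10}$$

$$\text{F-measure} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \tag{11}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{12}$$

$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{[TP + FP][TP + FN][TN + FP][TN + FN]}} \tag{13}$$

where TN is the number of true negatives, TP the number of true positives, FN the number of false negatives, and FP the number of false positives. Recall in Eq. (11) is the same quantity as sensitivity.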
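A minimal sketch of Eqs. (8) to (13) is given below. All ratio metrics are returned as percentages, matching how the results are reported in the tables, and recall is taken to be sensitivity. The function name `evaluate` and the confusion-matrix counts in the usage line are hypothetical and do not come from the experiments.

```python
import math


def evaluate(tp, tn, fp, fn):
    """Compute the metrics of Eqs. (8)-(13) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn) * 100
    sensitivity = tp / (tp + fn) * 100  # also called recall
    specificity = tn / (fp + tn) * 100
    precision = tp / (tp + fp) * 100
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "mcc": mcc,  # MCC stays in [-1, 1]; it is not a percentage
    }


# Made-up counts, only to show how the function is called.
scores = evaluate(tp=90, tn=80, fp=20, fn=10)
```

The sketch does not guard against empty classes; if any of the bracketed sums in Eq. (13) is zero, the MCC is undefined and the function raises a division error.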

3. Results and discussion

In this study, a dynamic and high-throughput classification model was developed in which peptide motifs were formulated using three feature extraction techniques, namely DPC, PseAAC, and PSSM. In addition, a hybrid space was designed by combining the feature spaces. The effectiveness and success rates of the classification algorithms are discussed below.
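Combining feature spaces into a hybrid space amounts to concatenating the per-descriptor vectors of each peptide. The sketch below illustrates this with placeholder zero vectors. The DPC length (400) and the PseAAC length (20 + λ with λ = 21) follow from the definitions in Section 2.2, whereas the PSSM vector length of 20 is an assumption, since the text does not state it. The function name `hybrid_features` is hypothetical.

```python
def hybrid_features(*feature_vectors):
    """Concatenate per-descriptor vectors into one hybrid vector."""
    hybrid = []
    for vector in feature_vectors:
        hybrid.extend(vector)
    return hybrid


# Placeholder vectors that only carry the expected dimensions.
dpc = [0.0] * 400            # 400 di-peptide frequencies
pseaac = [0.0] * (20 + 21)   # 20 composition terms plus lambda = 21 factors
pssm = [0.0] * 20            # assumed length; not stated in the text
hybrid = hybrid_features(pssm, dpc, pseaac)
```

In practice the descriptors may be on different scales, so some form of feature scaling before training is a common design choice, although the text does not say whether one was applied.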

3.1. The performance results of classifiers on different feature spaces

Table 1 displays the outcomes of the various classifiers using a 10-fold cross-validation test. On DPC, RF attained an accuracy of 79.59%, KNN yielded 80.81%, and SVM obtained 83.35%. On PseAAC, RF obtained 80.65% accuracy, SVM 82.94%, and KNN 84.42%. The performance of the classifiers improves on PSSM, where RF obtained 88.25% accuracy, SVM 93.60%, and KNN 87.73%. Further, all three feature spaces were combined to form a hybrid space, on which RF yielded 91.43% accuracy, KNN 92.13%, and SVM 97.07%. A further examination shows that SVM on the hybrid space had the best success rate, with an accuracy of 97.07%, sensitivity of 91.68%, specificity of 98.79%, F-measure of 93.82%, and MCC of 0.91. SVM achieved the highest accuracy on every feature space except PseAAC, where KNN performed best.
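The 10-fold cross-validation protocol used for these results can be sketched as an index-splitting routine: the samples are shuffled once, divided into ten folds, and each fold serves as the test set exactly once. This is a plain (non-stratified) split, chosen here as an assumption because the text does not say whether class proportions were preserved in each fold. The function name `k_fold_indices` and the seed are hypothetical; 3175 is the size of the training dataset.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(3175, k=10))
```

A classifier would be trained on `train_idx` and scored on `test_idx` for each of the ten pairs, and the reported metrics would then be averaged over the folds.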

3.2. The success rate of classifiers on an independent dataset

A computational system that exhibits good generalization capability is regarded as more effective. In order to evaluate the generalizability of the suggested method, an independent dataset is employed in this study. Table 2 reports the success rates of the learning methods on the independent dataset. Using DPC, RF achieved an accuracy of 86.49%, SVM 87.0%, and KNN 86.14%. On the PseAAC feature space, RF obtained 87.14%, SVM 86.6%, and KNN 87.54% accuracy.

Table 1

The experimental outcomes of learners on various feature encoding schemes.

| Learner | Encoding method | Acc (%) | Sn (%) | Sp (%) | F-measure (%) | MCC |
|---|---|---|---|---|---|---|
| RF | DPC | 79.59 | 73.96 | 85.22 | 67.76 | 0.56 |
| RF | PseAAC | 80.65 | 75.11 | 86.40 | 69.32 | 0.58 |
| RF | PSSM | 88.25 | 90.76 | 87.80 | 70.33 | 0.66 |
| RF | PSSM + DPC + PseAAC | 91.43 | 91.23 | 91.48 | 80.20 | 0.75 |
| SVM | DPC | 83.35 | 73.50 | 93.21 | 75.51 | 0.68 |
| SVM | PseAAC | 82.94 | 78.54 | 87.34 | 74.12 | 0.68 |
| SVM | PSSM | 93.60 | 82.72 | 97.08 | 86.25 | 0.83 |
| SVM | PSSM + DPC + PseAAC | 97.07 | 91.68 | 98.79 | 93.82 | 0.91 |
| KNN | DPC | 80.81 | 76.40 | 85.23 | 69.56 | 0.64 |
| KNN | PseAAC | 84.42 | 80.45 | 88.39 | 76.26 | 0.69 |
| KNN | PSSM | 87.73 | 85.91 | 89.56 | 85.23 | 0.83 |
| KNN | PSSM + DPC + PseAAC | 92.13 | 91.48 | 92.79 | 92.12 | 0.90 |

Table 2

Experimental outcomes of learners on an independent dataset.

| Learner | Encoding method | Acc (%) | Sn (%) | Sp (%) | Precision (%) | MCC |
|---|---|---|---|---|---|---|
| RF | DPC | 86.49 | 88.71 | 84.28 | 79.76 | 0.81 |
| RF | PseAAC | 87.14 | 87.76 | 86.52 | 81.17 | 0.83 |
| RF | PSSM | 89.16 | 88.38 | 89.94 | 84.62 | 0.86 |
| RF | PSSM + DPC + PseAAC | 94.51 | 92.81 | 92.61 | 96.47 | 0.89 |
| SVM | DPC | 87.0 | 87.22 | 86.77 | 81.55 | 0.79 |
| SVM | PseAAC | 86.6 | 88.28 | 84.92 | 80.91 | 0.79 |
| SVM | PSSM | 89.14 | 87.63 | 90.66 | 87.32 | 0.90 |
| SVM | PSSM + DPC + PseAAC | 95.71 | 93.77 | 92.93 | 96.23 | 0.91 |
| KNN | DPC | 86.14 | 85.83 | 86.45 | 81.55 | 0.79 |
| KNN | PseAAC | 87.54 | 86.95 | 88.13 | 82.34 | 0.80 |
| KNN | PSSM | 90.64 | 89.65 | 91.63 | 87.32 | 0.90 |
| KNN | PSSM + DPC + PseAAC | 91.22 | 90.87 | 91.57 | 92.75 | 0.91 |


Likewise, on PSSM the accuracy of RF is 89.16%, SVM 89.14%, and KNN 90.64%. The experimental results show that the outcomes of the classifiers on the PSSM feature space are better, although the improvement is modest. The feature spaces were then combined into a hybrid space in which the individual feature spaces compensate for each other's deficiencies. On the hybrid feature space, RF obtained 94.51% accuracy, KNN achieved 91.22%, and SVM yielded 95.71%. Table 2 shows that SVM again obtained the highest accuracy in the hybrid feature space, with an accuracy of 95.71%, sensitivity of 93.77%, specificity of 92.93%, precision of 96.23%, and MCC of 0.91. These results are ascribed to the high performance of SVM and the discriminative motifs captured by the combined feature spaces.

3.3. Comparison of the proposed system with published models

Table 3 compares the introduced model (Target-AMP) with existing models. The accuracy of the iAMP-2L model was 86.32% [16], whereas that of the iAMPpred model was 95.93%. In contrast, the accuracy of the developed model is 97.07%, the highest among the compared models. The proposed predictor also achieves the best specificity, F-measure, and MCC in Table 3, although iAMPpred reports a higher sensitivity (96.28% versus 91.68%).

4. Conclusion

Here, a robust and high-throughput computational model has been introduced. The peptide sequences were expressed using well-known discrete techniques, namely PSSM, PseAAC, and DPC. Additionally, the feature spaces were fused to design a hybrid feature space. Several classification techniques, including SVM, KNN, and RF, were applied. Experimental analysis showed that SVM on the hybrid space combining DPC, PseAAC, and PSSM achieves the maximum accuracy of 97.07%. The outcomes of SVM on the other measures were also strong. The proposed model was also compared with recently published models. In comparison to these approaches, the suggested scheme achieves the best accuracy, specificity, F-measure, and MCC. In the future, a web predictor tool will be launched for the proposed model.

Declaration of competing interest

The authors declare no conflict of interest.

References

[1] W. Kamysz, M. Okrój, J. Łukasiak, Novel properties of antimicrobial peptides, Acta Biochim. Pol. 50 (2) (2003) 461–469.

[2] P. Bhadra, et al., AmPEP: sequence-based prediction of antimicrobial peptides using distribution patterns of amino acid properties and random forest, Sci. Rep. 8 (1) (2018) 1697.

[3] K.A. Brogden, Antimicrobial peptides: pore formers or metabolic inhibitors in bacteria? Nat. Rev. Microbiol. 3 (3) (2005) 238.

[4] N.Y. Yount, M.R. Yeaman, Multidimensional signatures in antimicrobial peptides, Proc. Natl. Acad. Sci. USA 101 (19) (2004) 7363–7368.

[5] H. Jenssen, P. Hamill, R.E. Hancock, Peptide antimicrobial agents, Clin. Microbiol. Rev. 19 (3) (2006) 491–511.

[6] H. Boman, Antibacterial peptides: basic facts and emerging concepts, J. Intern. Med. 254 (3) (2003) 197–215.

[7] E.B. Hadley, R.E.W. Hancock, Strategies for the discovery and advancement of novel cationic antimicrobial peptides, Curr. Top. Med. Chem. 10 (18) (2010) 1872–1881.

[8] A. Loffet, Peptides as drugs: is there a market? J. Pept. Sci.: an official publication of the European Peptide Society 8 (1) (2002) 1–7.

[9] W. van 't Hof, et al., Antimicrobial peptides: properties and applicability, Biol. Chem. 382 (4) (2001) 597–619.

[10] A.K. Marr, W.J. Gooderham, R.E. Hancock, Antibacterial peptides for therapeutic use: obstacles and realistic outlook, Curr. Opin. Pharmacol. 6 (5) (2006) 468–472.

[11] S. Lata, N.K. Mishra, G.P. Raghava, AntiBP2: improved version of antibacterial peptide prediction, BMC Bioinf. 11 (1) (2010) S19.

[12] R.E. Hancock, Cationic antimicrobial peptides: towards clinical applications, Expet Opin. Invest. Drugs 9 (8) (2000) 1723–1729.

[13] S. Lata, B. Sharma, G. Raghava, Analysis and prediction of antibacterial peptides, BMC Bioinf. 8 (1) (2007) 263.

[14] C.D. Fjell, R.E. Hancock, A. Cherkasov, AMPer: a database and an automated discovery tool for antimicrobial peptides, Bioinformatics 23 (9) (2007) 1148–1155.

[15] N. Thakur, A. Qureshi, M. Kumar, AVPpred: collection and prediction of highly effective antiviral peptides, Nucleic Acids Res. 40 (W1) (2012) W199–W204.

[16] X. Xiao, et al., iAMP-2L: a two-level multi-label classifier for identifying antimicrobial peptides and their functional types, Anal. Biochem. 436 (2) (2013) 168–177.

[17] G. Wang, X. Li, Z. Wang, APD3: the antimicrobial peptide database as a tool for research and education, Nucleic Acids Res. 44 (D1) (2015) D1087–D1093.

[18] S. Joseph, et al., ClassAMP: a prediction tool for classification of antimicrobial peptides, IEEE ACM Trans. Comput. Biol. Bioinf 9 (5) (2012) 1535–1538.

[19] M.R. Yeaman, N.Y. Yount, Mechanisms of antimicrobial peptide action and resistance, Pharmacol. Rev. 55 (1) (2003) 27–55.

[20] P. Wang, et al., Prediction of antimicrobial peptides based on sequence alignment and feature selection methods, PLoS One 6 (4) (2011), e18476.

[21] X.Y. Ng, B.A. Rosdi, S. Shahrudin, Prediction of antimicrobial peptides based on sequence alignment and support vector machine-pairwise algorithm utilizing LZ-complexity, BioMed Research International 2015 (2015).

[22] D. Veltri, U. Kamath, A. Shehu, Improving recognition of antimicrobial peptides and target selectivity through machine learning and genetic programming, IEEE ACM Trans. Comput. Biol. Bioinf 14 (2) (2015) 300–313.

[23] P.K. Meher, et al., Predicting antimicrobial peptides with improved accuracy by incorporating the compositional, physico-chemical and structural features into Chou’s general PseAAC, Sci. Rep. 7 (2017), 42362.

[24] N. Zhang, T. Huang, Y.-D. Cai, Discriminating between deleterious and neutral non-frameshifting indels based on protein interaction networks and hybrid properties, Mol. Genet. Genom. 290 (1) (2015) 343–352.

[25] S. Gupta, et al., Identification of B-cell epitopes in an antigen for inducing specific class of antibodies, Biol. Direct 8 (1) (2013) 27.

[26] Q.-B. Gao, et al., Prediction of protein subcellular location using a combined feature of sequence, FEBS Lett. 579 (16) (2005) 3444–3448.

[27] F. Ali, M. Hayat, Machine learning approaches for discrimination of Extracellular Matrix proteins using hybrid feature space, J. Theor. Biol. 403 (2016) 30–37.

[28] Z.U. Khan, et al., piEnPred: a bi-layered discriminative model for enhancers and their subtypes via novel cascade multi-level subset feature selection algorithm, Front. Comput. Sci. 15 (6) (2021) 1–11.

[29] Z.U. Khan, et al., iRSpot-SPI: deep learning-based recombination spots prediction by incorporating secondary sequence information coupled with physio-chemical properties via Chou’s 5-step rule and pseudo components, Chemometr. Intell. Lab. Syst. 189 (2019) 169–180.

[30] I.A. Khan, et al., A Privacy-Conserving Framework Based Intrusion Detection Method for Detecting and Recognizing Malicious Behaviours in Cyber-Physical Power Networks, Applied Intelligence, 2021, pp. 1–16.

[31] T. Liu, et al., Accurate prediction of protein structural class using auto covariance transformation of PSI-BLAST profiles, Amino Acids 42 (6) (2012) 2243–2249.

[32] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, John Wiley & Sons, 2012.

[33] S. Akbar, et al., iAtbP-hyb-EnC: prediction of antitubercular peptides via heterogeneous feature representation and genetic algorithm based ensemble learning model, Comput. Biol. Med. (2021), 104778.

[34] L. Breiman, Random forests, Mach. Learn. 45 (1) (2001) 5–32.

[35] F. Ali, et al., DP-BINDER: machine learning model for prediction of DNA-binding proteins by fusing evolutionary and physicochemical information, J. Comput. Aided Mol. Des. (2019) 1–14.

[36] Z.N.K. Swati, et al., Brain tumor classification for MR images using transfer learning and fine-tuning, Comput. Med. Imag. Graph. 75 (2019) 34–46.

[37] S. Akbar, et al., iHBP-DeepPSSM: identifying hormone binding proteins using PsePSSM based evolutionary features and deep learning approach, Chemometr. Intell. Lab. Syst. 204 (2020), 104103.

[38] M. Ullah, et al., A foreground extraction approach using convolutional neural network with graph cut, in: 2018 IEEE 3rd International Conference on Image, Vision and Computing (ICIVC), IEEE, 2018.

[39] F. Ali, et al., DBPPred-PDSD: machine learning approach for prediction of DNA-binding proteins using Discrete Wavelet Transform and optimized integrated features space, Chemometr. Intell. Lab. Syst. 182 (2018) 21–30.

[40] F. Ali, et al., SDBP-Pred: prediction of single-stranded and double-stranded DNA-binding proteins by extending consensus sequence and K-segmentation strategies into PSSM, Anal. Biochem. 589 (2020), 113494.

[41] A. Ahmad, et al., Identification of Antioxidant Proteins Using a Discriminative Intelligent Model of K-Space Amino Acid Pairs Based Descriptors Incorporating with Ensemble Feature Selection, Biocybernetics and Biomedical Engineering, 2020.

Table 3

Comparison of the proposed approach with the published models.

| Method | Acc (%) | Sn (%) | Sp (%) | F-measure (%) | MCC |
|---|---|---|---|---|---|
| iAMP-2L | 86.32 | 87.13 | 86.03 | 75.52 | 0.67 |
| iAMPpred | 95.93 | 96.28 | 95.58 | 91.65 | 0.88 |
| Proposed method | 97.07 | 91.68 | 98.79 | 93.82 | 0.91 |


[42] V.N. Vapnik, The Nature of Statistical Learning Theory, 1995.

[43] K.-R. Muller, et al., An introduction to kernel-based learning algorithms, IEEE Trans. Neural Network. 12 (2) (2001) 181–201.

[44] M. Hayat, A. Khan, MemHyb: predicting membrane protein types by hybridizing SAAC and PSSM, J. Theor. Biol. 292 (2012) 93–102.

[45] Y.-D. Cai, S.-l. Lin, K.-C. Chou, Support vector machines for prediction of protein signal sequences and their cleavage sites, Peptides 24 (1) (2003) 159–161.

[46] A. Ahmad, et al., Deep-AntiFP: prediction of antifungal peptides using distanct multi-informative features incorporating with deep neural networks, Chemometr. Intell. Lab. Syst. 208 (2021), 104214.

[47] O. Barukab, F. Ali, S.A. Khan, DBP-GAPred, An intelligent method for prediction of DNA-binding proteins types by enhanced evolutionary profile features with ensemble learning, J. Bioinf. Comput. Biol. (2021), 2150018.

[48] M. Arif, et al., TargetCPP: accurate prediction of cell-penetrating peptides from optimized multi-scale features using gradient boost decision tree, J. Comput. Aided Mol. Des. 34 (8) (2020).