SAGDTI: self-attention and graph neural network with multiple information representations for the prediction of drug-target interactions
Item Type: Article
Authors: Li, Xiaokun; Yang, Qiang; Luo, Gongning; Xu, Long; Dong, Weihe; Wang, Wei; Dong, Suyu; Wang, Kuanquan; Xuan, Ping; Gao, Xin
Citation: Li, X., Yang, Q., Luo, G., Xu, L., Dong, W., Wang, W., Dong, S., Wang, K., Xuan, P., & Gao, X. (2023). SAGDTI: self-attention and graph neural network with multiple information representations for the prediction of drug-target interactions. Bioinformatics Advances. https://doi.org/10.1093/bioadv/vbad116
Eprint version: Post-print
DOI: 10.1093/bioadv/vbad116
Publisher: Oxford University Press (OUP). Journal: Bioinformatics Advances
Rights: Archived with thanks to Bioinformatics Advances under a Creative Commons license, details at: https://creativecommons.org/licenses/by/4.0/
Item License: https://creativecommons.org/licenses/by/4.0/
Link to Item: http://hdl.handle.net/10754/693827
Systems Biology
SAGDTI: self-attention and graph neural network with multiple information representations for the prediction of drug-target interactions
Xiaokun Li 2,3, Qiang Yang 2,3, Gongning Luo 1,∗, Long Xu 2,3, Weihe Dong 3,4, Wei Wang 1, Suyu Dong 4, Kuanquan Wang 1,∗, Ping Xuan 2,5 and Xin Gao 6
1 Biocomputing Research Center, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
2 School of Computer Science and Technology, Heilongjiang University, Harbin, 150080, China
3 Postdoctoral Program of Heilongjiang Hengxun Technology Co., Ltd., Harbin, 150090, China
4 College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150040, China
5 Department of Computer Science, School of Engineering, Shantou University, Shantou, 515063, China
6 Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal 23955, Saudi Arabia
∗To whom correspondence should be addressed.
Associate Editor: Shanfeng Zhu
Received on XXXXX; revised on XXXXX; accepted on XXXXX
Abstract
Motivation:
Accurate in silico identification of target proteins that interact with drugs is a vital step that can significantly foster drug repurposing and drug discovery. In recent years, numerous deep learning-based methods have treated drug-target interaction (DTI) prediction as a classification task whose output is a binary label indicating the absence or presence of an interaction. However, existing studies often (i) neglect the unique molecular attributes when embedding drugs and proteins, and (ii) determine the interaction of drug-target pairs without considering biological interaction information.
Results:
In this study, we propose SAGDTI, an end-to-end attention-derived method based on the self-attention mechanism and graph neural networks, which aims to overcome the aforementioned drawbacks in DTI identification. SAGDTI is the first method to sufficiently consider the unique molecular attribute representations of both drugs and targets, taking as input SMILES sequences and three-dimensional structure graphs. In addition, our method aggregates the feature attributes of biological information between drugs and targets through multi-scale topologies and diverse connections. Experimental results illustrate that SAGDTI outperforms existing prediction models, benefiting from the unique molecular attributes embedded by atom-level attention and the biological interaction information aggregated by node-level attention. Moreover, a case study on SARS-CoV-2 shows that our model is a powerful tool for identifying DTIs in real-world scenarios.
Availability:
https://github.com/lixiaokun2020/SAGDTI
Contact:
[email protected] or [email protected]
Supplementary information:
Supplementary data are available at Bioinformatics Advances online.
1 Introduction
Research on drug-target interaction (DTI) prediction is a crucial step in the repurposing of current drugs and the discovery of new drugs (Rifaioglu et al., 2019; Ashburn et al., 2004; Zheng et al., 2020). Accurate identification of potential DTIs greatly reduces the need for costly and time-consuming high-throughput screening (Bagherian et al., 2021; Da et al., 2019). However, it is impractical to rapidly screen every possible compound-target pair because of the large-scale chemical space of
© The Author(s) 2023. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License
(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
molecular compounds, proteins and protein-ligand complexes (Maia et al., 2020; Zhao et al., 2022). Based on this observation, various computational methods have been introduced to identify potential DTIs. In the early stage of computational DTI identification, molecular simulation and molecular docking were the mainstream methods (Śledź et al., 2018; Csermely et al., 2013). These methods strongly depend on the three-dimensional (3D) structural information of proteins; however, their performance is limited because adequate spatial structures are unavailable for many proteins. In the past decade, machine learning-based approaches were introduced to overcome some of these difficulties. For example, Ezzat et al., 2017 established a matrix factorization framework with k nearest known neighbors for drug-target interaction prediction that incorporates both drug and target similarities. Conventional shallow machine learning-based methods, such as NRLMF (Liu et al., 2016) and KronRLS (Pahikkala et al., 2015), only considered the similarity features of drug-target pairs. Consequently, these approaches cannot fully exploit the comprehensive relationships between drugs and proteins.
With the rapid development of computing power and data mining, deep learning has been widely adopted in natural language processing and computer vision (Hazra et al., 2021; Bharath et al., 2019). Deep learning techniques have elevated traditional computational approaches thanks to their ability to comprehensively extract latent feature representations, rendering DTI prediction a research hotspot (Ru et al., 2021). Deep learning methods for inferring DTIs fall into two main types. The first type manages a sequential input representation that transforms all feature information into a vector. For instance, DeepDTA (Öztürk et al., 2018) utilizes the Simplified Molecular Input Line Entry System (SMILES) of ligands and the amino acid sequences of proteins to determine the binding affinity of drug-target pairs through convolutional neural networks (CNNs). Similarly, molecular transformer-based models (Shin et al., 2019; Huang et al., 2021) were introduced to infer the high-dimensional structure of a molecule from SMILES strings and character-embedding sequences through the self-attention mechanism. However, these methods cannot model the potential relationships within compounds, since the positional distribution of atoms and their relative arrangement are fixed, which limits prediction performance. Because a more flexible input representation of drug-target pairs is required, a second type of deep learning model, namely graph-based neural networks, is frequently discussed. The application of graph descriptions to DTI prediction treats atoms as nodes and chemical bonds as the corresponding edges. Typically, GraphDTA (Nguyen et al., 2021) and IGT (Liu et al., 2022) are outstanding identifiers for predicting drug-target interactions based on graph representations of ligands and target receptors. Regrettably, most graph neural network (GNN)-based methods perform poorly at extracting the spatial information of proteins from amino acid sequence representations, which is a staple factor in DTI determination. Proteins are composed of abundant atoms that require a large-scale sparse 3D matrix to capture the entire constrained spatial structure; hence, it is difficult to obtain an accurate high-resolution 3D structure of a protein. An alternative strategy was developed to handle this problem, in which target proteins are expressed as a contact/distance map that transforms the interactions within proteins into a matrix (Zheng et al., 2020; Jiang et al., 2020). However, this contact/distance map is based on a heuristic approach; it therefore provides only an abstract approximation of the true protein structure, which is generally determined by X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy (Tradigo et al., 2013).
Several previously reported prediction models convert multi-scale neighboring topologies and diverse connections (i.e. interactions, associations and similarities) among biological entities into feature representations to predict DTIs. For example, GCDTI (Xuan et al., 2022) was proposed to capture and fuse neighboring topology and diverse connection information based on three-way GCNs and a CNN with multi-level attention. Similarly, Xuan et al., 2022 employed graph convolutional and variational autoencoders to encode multiple pairwise representations between a drug and its targets, such as attention-enhanced topology, attribution, and distribution. Unfortunately, the precision of these models in DTI prediction is relatively low because they ignore the molecular and chemical features of drug-protein pairs. Furthermore, existing deep learning methods often construct the interaction of drugs with related targets simply by concatenating feature representations, which cannot sufficiently express the interactions. Due to space constraints, more relevant studies are included in the supplementary materials.
Considering the above information, in this study an end-to-end attention-derived method, SAGDTI, is introduced. SAGDTI accepts three input representations from multiple molecular and biological data sources: the SMILES strings of drugs with relative atom distances, the graph embedding of proteins that contains spatial information of binding sites, and graph-based interaction information from a biological heterogeneous network. First, we introduce a molecular transformer module based on multi-head self-attention (MSA) to extract the feature embeddings of SMILES sequences and protein graphs. The feature representations of each drug-target pair are subsequently transformed into an interaction pairing map and fed into CNNs to represent the molecular attributes at the atom level. Second, graph attention networks (GATs) are constructed to capture the interaction information covering multi-scale topologies and diverse connections in the form of a heterogeneous network (matrix) via graph-based attention. Finally, we use convolutional-pooling (C-P) networks to allocate the importance weights of the two obtained attribute representations, and fully connected neural networks (FNNs) to predict the interaction score for drug-target pairs.
The main contributions of our study are summarized below.
• To accurately capture the unique molecular information of both compounds and receptors, a molecular transformer module is designed to exploit the relative element information among atoms in drug compounds and the 3D structural features of the binding pockets of target proteins.
• Our proposed model is the first DTI prediction method to transform the input representations of molecular drugs and targets into entirely different forms. Moreover, it aggregates the attributes of interaction information from biological entities based on graph attention networks.
• Owing to its self-attention, the proposed method is attention-derived, offering high interpretability when aggregating the attribute representations from both molecular and biological feature information.
• To the best of our knowledge, our experimental results are state-of-the-art (SoTA) among feature representation learning models for DTI prediction, as confirmed on three benchmark datasets.
2 Materials and Methods
In this study, we regarded the problem of predicting DTIs as a binary classification task. We model an objective function F(·) with multiple information representations for drugs and targets to forecast the presence or absence of a DTI. The framework of SAGDTI is shown in Figure 1. Our model consists of four main parts: the molecules input embedding module, the molecular transformer module, the biological interaction information aggregating module, and the potential DTI classification module. Specifically, in the molecules input embedding (Figure 1B), we transformed the unique molecular information of drugs and proteins into different input forms. The SMILES strings of small compounds were embedded as sequential vectors that cover long-distance relative position information among atoms, using the strategy introduced in (Zeng et al., 2021). Inspired by the graph input representation of proteins (Yazdani et al., 2022), the protein pockets with binding sites that interact with compounds were embedded as 3D graphs. Feature representations of drugs and proteins were subsequently fused through the molecular transformer to obtain the latent drug-target complex attributes that include their unique characteristics (Figure 1C). Next, we modeled and aggregated the biological attributes of interaction information between drugs and targets using GATs (Figure 1D). Finally, the two obtained attribute representations were incorporated and fed into C-Ps and FNNs for DTI prediction (Figure 1E).
2.1 Molecules input embedding
2.1.1 Drug SMILES strings
Small molecular drugs are expressed as SMILES strings. Motivated by previous research (Zeng et al., 2021; Li et al., 2022), the SMILES strings are converted to continuous numerical vectors representing atoms and chemical information ({C, C, l, ..., F, ), C} → {42, 42, 25, ..., 11, 31, 42}). A given drug is expressed as:

D = {a_1^d, a_2^d, ..., a_i^d, ..., a_{N*}^d}   (1)

where a_i^d is the i-th atom and N* is a flexible sequence length determined by the number of atoms in the compound. In this study, we restrict the maximal SMILES sequence length with a hyperparameter ℓ_d. Following the traditional transformer model (Vaswani et al., 2017), the initial input is an embedding vector composed of a tokenized symbol and a position signal, which can be formulated as:

X_D = E_D^T + E_D^P   (2)

where E_D^T ∈ R^{ℓ_d×ϖ_d} and E_D^P ∈ R^{ℓ_d×ϖ_d} represent the token embedding and the position embedding, respectively, ϖ_d is the hidden feature dimension of atom a_i, and X_D denotes the embedding vector of the atom sequence. To model the relative distance between atoms, we describe the unique molecular characteristics using m kinds of relative correlations between atoms. Inspired by (Shaw et al., 2018), the input embedding of drugs with spatial information, D_In, can be mathematically expressed as:

D_In = SoftMax( ((X_D W^Q)(X_D W^K)^T + A^R) / √ϖ_d ) (X_D W^V)   (3)

A_ij^R = (X_D^i W^Q)(W_{min(k,|i−j|)}^R)^T   (4)

where W^Q, W^K, W^V ∈ R^{ϖ_d×ϖ_d} denote the parameter matrices of the query, key and value in the attention layer, A^R ∈ R^{ℓ_d×ℓ_d} indicates the relative-aware relationship matrix between atoms, and X_D^i is the i-th atom embedding in X_D. W^R ∈ R^{m×ϖ_d} holds the learnable parameters for the m kinds of relative relationships between atoms, and the rows of W^R corresponding to virtual "nodes" (e.g., @, =, [, etc.) are set to zero. Thus, when we model the relative distance between atoms, the distance between special characters in SMILES strings is 0.
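A minimal NumPy sketch of Eqs. (3)-(4) — single-head attention with a clipped relative-distance bias — is given below. All dimensions and the random weights are toy assumptions, and the zeroing of virtual-token rows in W^R is assumed to have happened upstream:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relative_self_attention(X, WQ, WK, WV, WR, k):
    """Single-head self-attention with a relative-distance bias A^R (Eqs. 3-4).
    X  : (L, d) token + position embeddings of the SMILES sequence
    WR : (m, d) one learnable row per clipped relative distance
    k  : clipping threshold for |i - j|."""
    L, d = X.shape
    Q, K, V = X @ WQ, X @ WK, X @ WV
    # A^R[i, j] = (X_i W^Q) . W^R_{min(k, |i-j|)}                 (Eq. 4)
    idx = np.minimum(k, np.abs(np.subtract.outer(np.arange(L), np.arange(L))))
    AR = np.einsum('id,ijd->ij', Q, WR[idx])
    scores = (Q @ K.T + AR) / np.sqrt(d)                          # (Eq. 3)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
L, d, m, k = 6, 8, 5, 4
X = rng.normal(size=(L, d))
WQ, WK, WV = (rng.normal(size=(d, d)) for _ in range(3))
WR = rng.normal(size=(m, d))
out = relative_self_attention(X, WQ, WK, WV, WR, k)
print(out.shape)  # (6, 8)
```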
2.1.2 Protein graphs
We utilized the spatial structures of large-scale proteins collected from the Protein Data Bank (PDB) (Berman et al., 2000), a worldwide database that provides experimentally determined protein structures (e.g., from X-ray diffraction, cryo-electron microscopy, and NMR). We sought to capture the binding sites of protein pockets from 3D structures. For this purpose, we adopted the algorithm introduced by (Saberi et al., 2014) to transform the bounding-box coordinates of each protein's binding site into a set of peptide fragments. A given protein is expressed as:

P = {b_1^p, b_2^p, ..., b_j^p, ..., b_{M*}^p}   (5)

where b_j^p ∈ R^{ℓ_p×υ} is the j-th binding site in the protein, ℓ_p and υ are the maximum number of binding sites and the maximum number of atoms in a binding site, respectively, and M* denotes the varying number of binding sites in a protein. Once the binding sites of a protein were recognized, we drew an individual graph for each binding site, representing each atom a_p as a node and the relationships between atoms as edges (denoted by e_p). For each atom a_p, we created a feature vector of size k (see Table S1 for details) comprising the atom symbol, degree, electrical charge, attached hydrogens, etc. Thus, the node matrix of each binding-site graph is denoted as X_P ∈ R^{υ×k}. Moreover, we use a simple linear transformation to encode P_In = W_P(X_P)^T, leading to a real-valued dense matrix P_In ∈ R^{υ×D_p} as the input graph embedding, where D_p is the high-level feature dimension.
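As a concrete illustration of this linear encoding step, the sketch below builds a toy node matrix X_P and projects it with a weight matrix W_P; the feature layout and all dimensions are hypothetical, and the transposes are arranged so that P_In comes out with shape (υ, D_p) as stated above:

```python
import numpy as np

# Hypothetical layout: each binding-site atom gets a length-k feature vector
# (atom-symbol one-hot, degree, charge, attached hydrogens, ...), stacked into
# X_P (υ x k); a linear map W_P then yields a dense D_p-dimensional embedding.
n_atoms, k, D_p = 10, 16, 32                   # υ, feature size, high-level dim
rng = np.random.default_rng(1)
X_P = rng.integers(0, 2, size=(n_atoms, k)).astype(float)  # toy node matrix
W_P = rng.normal(size=(D_p, k)) * 0.1                      # learnable weights

P_In = (W_P @ X_P.T).T                         # dense graph embedding (υ, D_p)
print(P_In.shape)  # (10, 32)
```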
2.2 Molecular transformer module
In recent years, many attention-based algorithms have been developed for the identification of drug-related targets, such as the transformer (Vaswani et al., 2017) and BERT (Kang et al., 2022). Inspired by previous studies (Shin et al., 2019; Huang et al., 2021), we introduce a molecular transformer module to extract the feature representations of drugs and proteins, which cover their unique molecular characteristics.
Specifically, the drug embedding representation D_In is fed into an MSA layer to acquire the attention scores of each atom in the drug. The output of the MSA layer is then passed to an FNN layer, and both the MSA and FNN layers are combined with a residual operation (He et al., 2015) and layer normalization. The structure of the molecular transformer encoder is shown in supplementary Figure S1. Mathematically, the drug feature representation can be formulated as:

MTs(D_In) = FNN(X_MSA + D_In)   (6)

X_MSA = Concat(head_1, head_2, ..., head_m) W^o   (7)

head_i = Att(Q, K, V)   (8)

Att(Q, K, V) = SoftMax(QK^T / √e_d) V   (9)

where Q ∈ R^{p×e_d}, K ∈ R^{p×e_d} and V ∈ R^{q×e_v} describe a query and key-value pairs for the attention function Att(·); head_i and W^o are the i-th head and the learnable parameter matrix in the MSA layer, respectively; and X_MSA is the output feature of the MSA layer. The molecular transformer encoders MTs are stacked with residual operations to enhance molecular feature extraction, which is expressed as:

D_R^L = MTs(D_R^{L−1}) + D_R^{L−1}   (10)

where D_R^{L−1} is the input of the L-th molecular transformer encoder; when L − 1 = 0, D_R^{L−1} equals D_In. Similarly, we model the latent feature representation of proteins, P_R^L, from the molecular graph embedding P_In based on the transformer encoders.
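The stacked encoders of Eqs. (6)-(10) can be sketched as follows; this is a single-head toy version that omits layer normalization and multi-head concatenation, with all dimensions and weights chosen arbitrarily:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mts_block(X, params):
    """One molecular-transformer encoder: single-head MSA with a residual
    connection, then a two-layer FNN (Eqs. 6-9, layer norm omitted)."""
    WQ, WK, WV, W1, W2 = params
    d = X.shape[1]
    att = softmax((X @ WQ) @ (X @ WK).T / np.sqrt(d)) @ (X @ WV)
    h = X + att                                # residual around attention
    return np.maximum(h @ W1, 0) @ W2          # FNN with ReLU

rng = np.random.default_rng(2)
L, d = 6, 8
X = rng.normal(size=(L, d)) * 0.1
layers = [tuple(rng.normal(size=(d, d)) * 0.1 for _ in range(5))
          for _ in range(3)]                   # three stacked encoders

D = X
for params in layers:                          # Eq. 10:
    D = mts_block(D, params) + D               # D^L = MTs(D^{L-1}) + D^{L-1}
print(D.shape)  # (6, 8)
```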
To further learn the molecular attributes of drug-protein pairs at the atom level, the two feature representations obtained for a given drug and its binding sites are converted into an interaction pairing map. Specifically, for each atom a_i^d in a given drug and each atom a_j^p in a target protein, we construct pairwise interactions via the following formula:

IM = Λ(D_R^L, P_R^L)   (11)

where IM is the interaction pairing map and Λ(·) denotes the scalar product operation. Scalar products provide an interpretable approach
Fig. 1. The framework of our proposed model. (A) Open-source data for drug SMILES strings and 3D structures of proteins. (B) Molecules input embedding module that identifies the non-covalent intermolecular attributes of drugs and proteins, and subsequently represents the features in the forms of sequences and graphs. (C) Molecular transformer module, in which we construct sequential transformer encoders to learn the unique feature representations of drug SMILES and the binding sites of proteins, transform them into an interaction pairing map, and then feed them into a convolutional layer to extract molecular attributes. (D) Biological interaction information aggregating module that models the multi-scale topologies and diverse connections; the biological attributes are represented using a graph attention network. (E) Potential DTI classification module, where we predict unknown interactions in a drug-target pair.
to integrating the intensity of the interaction between individual drug-protein pairs at the atom level. In IM, a value close to 1 indicates that the corresponding pair of atoms is almost certainly binding; otherwise, the value is close to 0. Thus, this map explicitly identifies the atoms of a drug-protein pair that contribute to the interaction result.
A CNN layer is employed to extract the adjacent information of neighboring atoms in IM. Let W_Conv ∈ R^{n_w×n_l} be the convolutional filter, where n_w and n_l correspond to the width and length of the filter, respectively. The CNN layer uses n_f channels to describe the molecular attributes of a drug and its binding protein, and zero-padding is applied to preserve the marginal information of the atom binding features. The molecular attribute representation X_Mol between drugs and targets is then obtained via a flattening operation:

X_Mol = Flatten(Conv(W_Conv IM, n_f))   (12)
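Eqs. (11)-(12) — the scalar-product pairing map followed by a convolution and flattening — can be sketched as follows. This toy version uses a single filter, no zero-padding and arbitrary dimensions; the real model uses n_f filters with padding:

```python
import numpy as np

def interaction_map(D, P):
    """IM[i, j] = scalar product of drug-atom i and protein-atom j (Eq. 11)."""
    return D @ P.T

def conv2d_valid(im, kernel):
    """Plain 2-D valid convolution standing in for the CNN layer (Eq. 12)."""
    kh, kw = kernel.shape
    H, W = im.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (im[i:i + kh, j:j + kw] * kernel).sum()
    return out

rng = np.random.default_rng(3)
D = rng.normal(size=(6, 8))   # toy drug atom embeddings D_R^L
P = rng.normal(size=(9, 8))   # toy protein atom embeddings P_R^L
IM = interaction_map(D, P)                                  # (6, 9) pairing map
X_Mol = conv2d_valid(IM, rng.normal(size=(3, 3))).ravel()   # flattened attributes
print(IM.shape, X_Mol.shape)  # (6, 9) (28,)
```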
2.3 Biological interaction information aggregating module
In this module, we introduce GATs to model the multi-scale neighboring topologies and diverse connections from biological interaction information, with the goal of improving DTI prediction performance. Generally, existing DTI prediction models only consider the non-covalent intermolecular interactions between SMILES strings and amino acid sequences. When unexpected noise occurs, accuracy may drop significantly because the biological feature information of drug-protein pairs is unavailable. Therefore, we incorporate the biological characteristics of drug-protein pairs to increase the stability and robustness of the model. Numerous studies have demonstrated that the biological interaction information of drugs and targeted proteins plays an important role in predicting DTIs (Peng et al., 2021; Zhao et al., 2021). A schematic of the GAT mechanism is shown in supplementary Figure S2.
Specifically, we first construct a heterogeneous network containing all interaction information of multi-scale topologies and diverse correlations at the node level. We convert the heterogeneous network into an undirected graph G = (V, E), where V represents the nodes and E represents the edges between nodes. Afterward, a GAT layer is employed to extract the input representation h of each node:

h = {h_1, h_2, ..., h_i, ..., h_{K*}},  h_i ∈ R^Z   (13)

where h_i and K* are the i-th node and the number of adjacent nodes, respectively, and Z denotes the feature dimension. For each node h_i, we employ masked attention to calculate the attention coefficients, which means we focus only on the node h_i and its first-order neighboring nodes h_j ∈ N_i:

e_ij = β(α^T [W h_i ∪ W h_j])   (14)

where e_ij and W denote the attention coefficient and the weight matrix of the node feature transformation, respectively. Intuitively, the weights in the GAT layer are shared, which greatly improves computational efficiency. The symbol ∪ denotes the concatenation of the two node representations, α ∈ R^{2Z} is a weight vector, and β(·) is the LeakyReLU activation function with a negative-input slope of 0.2. The value of e_ij reveals the importance of h_j to h_i. To better allocate the weights and calculate the correlation strength between the central node h_i and all its neighbors, the attention coefficients are normalized using a SoftMax function:

a_ij = SoftMax_j(e_ij)   (15)

To represent the interaction features between nodes more comprehensively, we also adopt the MSA strategy when normalizing the attention coefficients, employing a non-linear activation function. The output feature can be formulated as:

h'_i = ∥_{k=1}^{Head} β( Σ_{j∈N_i} a_ij^k W^k h_j )   (16)

where k indicates the k-th head of the attention coefficients a_ij^k and ∥ denotes concatenation over heads.
After extracting the complex feature representation of the interaction information between a drug and its target through consecutive GAT layers, we utilize global max pooling to output the biological attribute representation vector X_Bio:

X_Bio = MaxPooling(h'_i)   (17)
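A single-head version of Eqs. (14)-(17) can be sketched in NumPy as follows; the multi-head concatenation is omitted for brevity, and the toy path-graph adjacency and all dimensions are assumptions:

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(H, A, W, a):
    """Single-head graph attention (Eqs. 14-16, one head).
    H: (N, Z) node features; A: (N, N) adjacency with self-loops;
    W: (Z, Zp) shared transform; a: (2*Zp,) attention weight vector."""
    Wh = H @ W
    Zp = Wh.shape[1]
    # e_ij = LeakyReLU(a^T [W h_i || W h_j]) for all pairs          (Eq. 14)
    e = leaky_relu((Wh @ a[:Zp])[:, None] + (Wh @ a[Zp:])[None, :])
    e = np.where(A > 0, e, -np.inf)            # masked attention: neighbours only
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)  # a_ij via row SoftMax (Eq. 15)
    return leaky_relu(alpha @ Wh)              # aggregated features  (Eq. 16)

rng = np.random.default_rng(4)
N, Z = 5, 8
H = rng.normal(size=(N, Z))
A = np.eye(N) + np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)  # path graph
W = rng.normal(size=(Z, Z)) * 0.1
a = rng.normal(size=2 * Z)
H_out = gat_layer(H, A, W, a)
X_Bio = H_out.max(axis=0)   # global max pooling over nodes (Eq. 17)
print(H_out.shape, X_Bio.shape)  # (5, 8) (8,)
```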
2.4 DTI classification module
Most existing methods simply concatenate the learned feature representations or pass them through several FNN layers, which cannot fully capture the real-world interaction knowledge between a drug and its binding target. Inspired by Zeng et al., 2021 in modeling the DTI feature, C-P and FNN layers are applied to map the extracted features into a final classification output. First, the molecular attribute representation X_Mol and the biological attribute representation X_Bio are concatenated side by side to form the ultimate feature representation X_Ult:

X_Ult = [X_Mol; X_Bio]   (18)

Second, the fused attribute representation X_Ult of a drug-target pair is fed into C-P and FNN layers to learn the overall DTI features, and a Sigmoid function outputs the final interaction score:

Y_Out = S(FNN(Pool(Conv(W'_Conv X_Ult, n'_f))))   (19)

where W'_Conv and n'_f denote the convolutional kernel and the corresponding number of channels, respectively, and S is the Sigmoid function, S(x) = 1 / (1 + exp(−x)). We apply the Adam algorithm to optimize the objective and use the binary cross-entropy (BCE) loss to minimize the difference between the ground truth and the predicted probability:

Loss = −(1/λ) Σ_{i=1}^{λ} [ Y_i^* · log(Y_i^Out) + (1 − Y_i^*) · log(1 − Y_i^Out) ]   (20)

where λ is the number of training samples and Y_i^* represents the real label of a drug-target pair. A Loss value close to 0 indicates that the SAGDTI model can predict DTIs with high accuracy.
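Eq. (20) is the standard mean BCE loss, which can be sketched as follows (clipping is added here for numerical stability; the labels and predictions are toy values):

```python
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy over lambda training samples (Eq. 20)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0, 0.0])
perfect = bce_loss(y_true, np.array([1.0, 0.0, 1.0, 0.0]))  # ideal predictions
chance  = bce_loss(y_true, np.full(4, 0.5))                 # uninformative 0.5
print(round(perfect, 4), round(chance, 4))  # 0.0 0.6931
```

As the text notes, the loss approaches 0 for perfect predictions, while chance-level predictions give −log(0.5) ≈ 0.693.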
3 Results
3.1 Benchmark datasets
In this study, we evaluated the merit of our proposed model using three benchmark datasets: the BindingDB dataset (Gilson et al., 2016), the kinase dataset Davis (Davis et al., 2013), and the Kinase Inhibitor BioActivity (KIBA) dataset (Tang et al., 2014), which are widely utilized in previous studies (Li et al., 2022; Cheng et al., 2022). The SMILES strings of molecular drugs were collected from the PubChem database (Kim et al., 2021) based on their PubChem IDs, the 3D structure information of proteins was derived from the PDB database (Berman et al., 2000), and the interaction information of drug-target pairs was obtained according to prior work (Zhao et al., 2021). When the structure information of a protein is not available in the PDB database, we remove it to ensure the model can fully learn the interaction features of drug-protein pairs. Table 1 summarizes the statistical information for these datasets. The datasets were processed to filter out invalid drug-target interactions (see supplementary materials for data processing).
3.2 Evaluation criteria
To estimate the utility of our DTI prediction model, we employed five metrics in total: the Area Under the Receiver Operating Characteristic curve (AUROC), the Area Under the Precision-Recall curve (AUPR), the Matthews correlation coefficient (MCC) (He et al., 2022), the F1-score, and Balanced Accuracy (B.Acc, composed of Sensitivity and Specificity). The mathematical formulas of these five metrics are given in the supplementary materials.
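For reference, MCC, F1-score and balanced accuracy can be computed from the confusion counts as in this minimal sketch; the toy labels are illustrative only, and AUROC/AUPR are omitted since they additionally require ranked scores:

```python
import math

def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def f1(tp, tn, fp, fn):
    return 2 * tp / (2 * tp + fp + fn)

def balanced_accuracy(tp, tn, fp, fn):
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

y_true = [1, 1, 1, 0, 0, 0, 0, 1]   # toy ground-truth labels
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]   # toy predictions
c = confusion(y_true, y_pred)
print(round(mcc(*c), 3), round(f1(*c), 3), round(balanced_accuracy(*c), 3))
# 0.5 0.75 0.75
```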
3.3 Experimental setup
To comprehensively evaluate the performance of DTI prediction, we adopted three experimental settings: a new-target setting, a new-drug setting, and a pairwise setting. In the new-target setting, targets are randomly split into five equal subsets; four of them are regarded as inference targets, while the remaining subset is treated as test targets. During training, the SAGDTI model learns from the inference data, which contain the inference targets and all the drugs. The trained model is then employed to suggest pairwise interactions between the test targets and all the drugs. The new-drug setting is analogous to the new-target setting: we arbitrarily divide the drugs into five groups, and the SAGDTI model is trained and applied to predict the interactions between new drugs and all the targets. In the pairwise setting, the known DTI pairs, like the drugs or targets in the other two settings, are grouped into five equal folds.
Similar to Co-VAE, we split the dataset into training data and testing data at a 5:1 ratio. The training data were used as a training set and a validation set to identify the optimal parameters through a five-fold cross-validation strategy. For each setting, we applied the trained SAGDTI model to the test set and repeated this process five times to obtain the final mean and standard deviation. We adopted identical settings for impartial comparison when comparing our proposed model with existing methods. The hyperparameters used to run the SAGDTI model are shown in Table 2 (see supplementary materials for training details).
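The five-fold, new-target style split described above can be sketched as follows; the target identifiers and the fold count are illustrative, and the actual SAGDTI pipeline pairs each fold's held-out targets with all drugs:

```python
import random

def new_target_folds(targets, n_folds=5, seed=0):
    """Shuffle targets and split them into n_folds disjoint subsets; each
    subset in turn serves as the test targets while the rest are used for
    training (paired with every drug)."""
    targets = list(targets)
    random.Random(seed).shuffle(targets)
    return [targets[i::n_folds] for i in range(n_folds)]

targets = [f"T{i:03d}" for i in range(20)]   # hypothetical target IDs
folds = new_target_folds(targets)
print([len(f) for f in folds])  # [4, 4, 4, 4, 4]
```

The same helper could be reused for the new-drug setting by passing drug identifiers instead.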
3.4 Comparison with SoTA methods
The performance of the proposed SAGDTI for DTI prediction was compared with several SoTA methods, including DDR (Olayan et al., 2018), DeepDTA (Öztürk et al., 2018), GraphDTA (Nguyen et al., 2021), IGT (Liu et al., 2022), Moltrans (Huang et al., 2021), AttentionSiteDTI (Yazdani et al., 2022), MATTDTI (Zeng et al., 2021), and Co-VAE (Li et al., 2022). For a fair comparison, we used the same experimental settings for all the models and the original hyperparameters reported in the corresponding publications. Details of the compared models can be found in the supplementary materials.
We first evaluated the SAGDTI model on the BindingDB dataset by drawing the Receiver Operating Characteristic and Precision-Recall curves, which record the highest performance in the experimental results. As plotted in Figure 2, our proposed model achieved the best AUROC and AUPR values compared with the other advanced methods. Specifically, SAGDTI obtained the best performance with an AUROC of 0.967 and an AUPR of 0.917, which are 0.008 and 0.017 higher than the second-best model, respectively. These results show that SAGDTI specializes in inferring pairwise drug-target interactions. In the BindingDB dataset, the training set and test set are extremely unbalanced, indicating that the AUPR metric is more credible for expressing prediction performance.
The results for the three experimental settings are presented in Table 3. According to the AUROC performance, SAGDTI achieved the best values and was 1.8%, 0.97% and 1.9% better than the other prediction models in terms of the three settings (i.e. the new-target setting, the new-drug setting and the pairwise
Table 1. Statistics of the three benchmark datasets: BindingDB, Davis and KIBA.
Datasets Metric Target Drug Interaction
BindingDB KD 587 6,704 31,239
 KI 1,225 126,122 249,598
 IC50 2,582 407,462 653,344
 EC50 699 75,142 105,364
Davis KD 442 68 30,056
KIBA KIBA score 229 2,111 118,254
Table 2. Hyperparameter settings of the SAGDTI model.
Parameters Range
Max length (Drug) 150
Max length (Target) 1200
Attention heads in MTs (Drug) [4-12]
Attention heads in MTs (Target) [4-12]
Attention heads in GAT [8-12]
Filter size 3×3
Number of filters [32, 64, 96]
Kernel size of the pooling layer 2×2
Dropout [0.1-0.6]
Optimizer Adam
Learning rate [0.1, 0.01, 0.001, 0.0001]
Epoch [30, 50, 100]
Batch size [32, 64, 128, 256]
Fig. 2. Comparison of SAGDTI with eight SoTA models in AUROC and AUPR metrics. Left: ROC curves of the different DTI models. Right: PR curves of the different DTI models.
setting). Furthermore, we computed the average performance of the five metrics across the three settings on the BindingDB dataset (Table 4). It demonstrates that SAGDTI is competent in DTI prediction, as reflected by the average metric values obtained under the three experimental settings: AUROC (0.967), AUPR (0.914), MCC (0.649), F1-Score (0.861) and B.Acc (0.901).
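The three less common metrics in Table 4 can all be derived from a binary confusion matrix; the sketch below shows the standard formulas. The confusion-matrix counts are hypothetical and chosen only to exercise the arithmetic, not taken from the paper's experiments.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """MCC, F1-score and balanced accuracy from a binary confusion matrix."""
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                 # sensitivity / true positive rate
    f1 = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)            # true negative rate
    b_acc = (recall + specificity) / 2      # balanced accuracy
    return mcc, f1, b_acc

# Hypothetical confusion matrix from an imbalanced DTI test split.
mcc, f1, b_acc = classification_metrics(tp=80, fp=20, tn=880, fn=20)
print(round(mcc, 3), round(f1, 3), round(b_acc, 3))  # 0.778 0.8 0.889
```

Note that MCC and balanced accuracy both weigh the negative class explicitly, which is why they complement AUROC on imbalanced test sets.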
Existing prediction models (e.g., Co-VAE, MATTDTI and AttentionSiteDTI) also exhibited excellent performance. The Co-VAE model introduced a novel co-regularized variational autoencoder framework with a strong generative capability. Co-VAE can retain all the learned DTI features by generating the SMILES strings and amino acid sequences to model the original drug or target features. Thus, Co-VAE demonstrated excellent performance in the identification of DTIs. However, the Co-VAE model ignored the unique molecular information and simply fused the features of drugs and targets through concatenation. MATTDTI considered the long-distance relative relationships among atoms in drugs and used multi-head attention to compute the DTI similarities. The AttentionSiteDTI method took advantage of the 3D structural information based on the binding sites of proteins and utilized an attention-based module to represent the output vector. All these models captured the DTI features without fully exploiting the unique molecular feature attributes. We speculate that SAGDTI outperformed the other SoTA models because it included the whole set of unique characteristics of both drugs and targets, as well as aggregated the neighboring topologies
Table 3. Comparison of SAGDTI with eight other SoTA methods in AUROC and AUPR metrics under three different experimental settings, using three benchmark datasets. Bold: best results.
Datasets Methods new-target new-drug pairwise
 AUROC(std) AUPR(std) AUROC(std) AUPR(std) AUROC(std) AUPR(std)
BindingDB DDR 0.834 (0.033) 0.747 (0.051) 0.827 (0.037) 0.725 (0.062) 0.823 (0.031) 0.719 (0.054)
 DeepDTA 0.868 (0.007) 0.823 (0.011) 0.864 (0.006) 0.776 (0.011) 0.842 (0.006) 0.811 (0.009)
 GraphDTA 0.875 (0.008) 0.835 (0.010) 0.852 (0.009) 0.783 (0.013) 0.851 (0.006) 0.829 (0.009)
 Moltrans 0.893 (0.011) 0.842 (0.015) 0.878 (0.012) 0.812 (0.013) 0.858 (0.009) 0.847 (0.011)
 IGT 0.912 (0.007) 0.855 (0.011) 0.891 (0.009) 0.833 (0.015) 0.887 (0.006) 0.852 (0.009)
 AttentionSiteDTI 0.937 (0.005) 0.883 (0.009) 0.912 (0.006) 0.853 (0.011) 0.908 (0.005) 0.871 (0.007)
 MATTDTI 0.928 (0.004) 0.875 (0.008) 0.907 (0.005) 0.864 (0.007) 0.906 (0.003) 0.869 (0.005)
 Co-VAE 0.955 (0.002) 0.897 (0.004) 0.921 (0.001) 0.893 (0.002) 0.914 (0.002) 0.884 (0.003)
 SAGDTI 0.967 (0.001) 0.914 (0.002) 0.934 (0.002) 0.886 (0.002) 0.946 (0.001) 0.901 (0.001)
Davis DDR 0.784 (0.035) 0.451 (0.067) 0.763 (0.033) 0.407 (0.056) 0.739 (0.023) 0.411 (0.049)
 DeepDTA 0.799 (0.013) 0.526 (0.021) 0.773 (0.015) 0.491 (0.022) 0.752 (0.011) 0.485 (0.018)
 GraphDTA 0.816 (0.015) 0.547 (0.026) 0.802 (0.011) 0.514 (0.019) 0.783 (0.013) 0.501 (0.019)
 Moltrans 0.849 (0.016) 0.577 (0.022) 0.826 (0.014) 0.531 (0.021) 0.815 (0.012) 0.524 (0.017)
 IGT 0.873 (0.013) 0.593 (0.023) 0.849 (0.013) 0.546 (0.019) 0.839 (0.011) 0.537 (0.018)
 AttentionSiteDTI 0.902 (0.009) 0.601 (0.017) 0.872 (0.011) 0.578 (0.020) 0.865 (0.008) 0.561 (0.015)
 MATTDTI 0.917 (0.006) 0.625 (0.014) 0.905 (0.007) 0.612 (0.019) 0.873 (0.005) 0.594 (0.009)
 Co-VAE 0.922 (0.003) 0.644 (0.009) 0.911 (0.005) 0.641 (0.009) 0.886 (0.002) 0.632 (0.003)
 SAGDTI 0.937 (0.002) 0.645 (0.005) 0.921 (0.003) 0.636 (0.004) 0.903 (0.001) 0.627 (0.002)
KIBA DDR 0.792 (0.024) 0.446 (0.041) 0.773 (0.027) 0.416 (0.040) 0.746 (0.019) 0.404 (0.035)
 DeepDTA 0.804 (0.007) 0.517 (0.015) 0.784 (0.008) 0.485 (0.017) 0.765 (0.008) 0.492 (0.015)
 GraphDTA 0.824 (0.009) 0.556 (0.014) 0.806 (0.007) 0.512 (0.013) 0.799 (0.006) 0.516 (0.011)
 Moltrans 0.856 (0.008) 0.583 (0.015) 0.831 (0.006) 0.558 (0.014) 0.827 (0.007) 0.567 (0.010)
 IGT 0.886 (0.008) 0.607 (0.013) 0.853 (0.007) 0.601 (0.015) 0.841 (0.007) 0.591 (0.013)
 AttentionSiteDTI 0.907 (0.006) 0.625 (0.012) 0.886 (0.007) 0.615 (0.013) 0.864 (0.006) 0.586 (0.009)
 MATTDTI 0.921 (0.005) 0.632 (0.009) 0.914 (0.005) 0.627 (0.010) 0.889 (0.004) 0.619 (0.007)
 Co-VAE 0.932 (0.002) 0.645 (0.003) 0.918 (0.001) 0.651 (0.003) 0.894 (0.001) 0.645 (0.002)
 SAGDTI 0.945 (0.001) 0.657 (0.002) 0.927 (0.001) 0.643 (0.002) 0.911 (0.001) 0.645 (0.001)
Table 4. Performance evaluation for predicting DTIs on the BindingDB dataset using the average values of five metrics. Bold: best results.
AUROC AUPR MCC F1score B.Acc
DDR 0.805 (0.037) 0.684 (0.057) 0.425 (0.071) 0.529 (0.043) 0.758 (0.035)
DeepDTA 0.883 (0.009) 0.817 (0.011) 0.597 (0.046) 0.654 (0.037) 0.811 (0.026)
GraphDTA 0.886 (0.007) 0.821 (0.010) 0.584 (0.041) 0.676 (0.034) 0.819 (0.027)
Moltrans 0.917 (0.007) 0.834 (0.013) 0.605 (0.031) 0.669 (0.035) 0.823 (0.031)
IGT 0.926 (0.008) 0.857 (0.011) 0.631 (0.035) 0.817 (0.031) 0.842 (0.028)
AttentionSiteDTI 0.925 (0.006) 0.874 (0.009) 0.627 (0.029) 0.840 (0.027) 0.857 (0.019)
MATTDTI 0.931 (0.005) 0.886 (0.004) 0.644 (0.021) 0.833 (0.021) 0.849 (0.016)
Co-VAE 0.959 (0.002) 0.897 (0.003) 0.653 (0.012) 0.856 (0.019) 0.871 (0.011)
SAGDTI 0.967 (0.002) 0.914 (0.002) 0.649 (0.013) 0.861 (0.016) 0.901 (0.011)
and diverse connections. In summary, we intuitively compared the prediction capability of the eight models using the curve plots and the different experimental settings. Our SAGDTI exhibited the highest overall performance.
We further employed the Davis and KIBA datasets to evaluate the prediction performance of our SAGDTI model. On the Davis dataset, our model exhibited the best performance in almost every experimental setting; in the new-drug setting, it ranked second only to the Co-VAE model in terms of the AUPR value. As shown in Table 3, SAGDTI obtained 1.6%, 1.1% and 1.9% enhancements in the AUROC metric under the new-target, new-drug and pairwise settings, respectively. In terms of AUPR values, the performance of all methods decreased markedly due to the obvious shortage of DTI training samples in the Davis dataset. However, our proposed model remained SoTA, with only the Co-VAE model performing comparably in the new-drug and pairwise settings. We also noted excellent performance on the KIBA dataset, with AUROC values of 0.945, 0.927 and 0.911, and AUPR values of 0.657, 0.643 and 0.645 for the corresponding settings, respectively.
Generally, the high accuracy obtained under random split settings is known to be overestimated and does not reflect a model's real-world prediction performance. When there is little information about the interaction between a protein and a drug, the learning performance of most models drops greatly. Thus, we adopted a non-overlapping sampling strategy (i.e., cold start splitting) for a better evaluation. Inspired by (Atas et al., 2023), we used the modified Davis (mDavis) dataset to make sure the model learns general principles from the data rather than memorizing it. We randomly selected 10% or 15% of the DTI pairs as test samples and removed all of their associated drugs and proteins from the training samples. The results in Table 5 indicate that all methods suffer a large performance decline under cold start splitting. However, SAGDTI still achieves the best performance against the other SoTA baselines, and, interestingly, its advantage is even more pronounced. According to the overall experimental results we obtained, it can be concluded that SAGDTI outperforms existing SoTA models in the task of DTI prediction. We attribute this improvement to the full consideration of the unique non-covalent intermolecular attributes at the atom level and the biological interaction attributes at the node level between drugs and targets.
Table 5. Results of cold start splitting on the mDavis dataset. Bold: best results.
Methods Remove 10% Remove 15%
 AUROC AUPR AUROC AUPR
DeepDTA 0.780 0.501 0.728 0.453
GraphDTA 0.778 0.526 0.732 0.511
Moltrans 0.805 0.564 0.773 0.547
MATTDTI 0.809 0.584 0.785 0.564
Co-VAE 0.816 0.605 0.796 0.585
SAGDTI 0.863 0.617 0.843 0.604
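The cold start splitting procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: each interaction is modeled as a `(drug, protein, label)` triple, and the toy pair list is invented for the example.

```python
import random

def cold_start_split(pairs, test_fraction, seed=0):
    """Hold out a fraction of DTI pairs, then purge every drug and protein
    that appears in the held-out pairs from the training set."""
    rng = random.Random(seed)
    test = rng.sample(pairs, int(len(pairs) * test_fraction))
    held_drugs = {d for d, p, _ in test}
    held_prots = {p for d, p, _ in test}
    train = [(d, p, y) for d, p, y in pairs
             if d not in held_drugs and p not in held_prots]
    return train, test

# Toy interaction list: (drug, protein, label).
pairs = [(f"d{i}", f"p{i % 5}", i % 2) for i in range(20)]
train, test = cold_start_split(pairs, test_fraction=0.10)

# No drug or protein in the training set also appears in the test set,
# so the model cannot succeed by memorizing entities it has already seen.
assert not ({d for d, _, _ in train} & {d for d, _, _ in test})
assert not ({p for _, p, _ in train} & {p for _, p, _ in test})
```

Because purging removes every pair touching a held-out drug or protein, the training set shrinks by more than the nominal 10% or 15%, which is part of why all methods decline under this split.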
3.5 Ablation study
To ensure stability and robustness, we also conducted a comprehensive ablation study on the contribution of the different modules in SAGDTI. AUROC and AUPR were applied to evaluate the effectiveness of five SAGDTI variants (see supplementary materials for detailed descriptions). For example, SAGDTI-II denotes the proposed model with the biological interaction information aggregation module removed. In this variant, we overlooked the attributes of topologies and diverse connections between drugs and targets and considered only the molecular attributes. Consequently, the performance of SAGDTI (without GATs) declined markedly (see Tables S3 and S4). In our work, we introduce multi-source input information for SAGDTI, and thus prior knowledge (e.g., the drug SMILES sequence, the protein 3D structure, and multi-scale drug-target interaction information) may be incorporated. To avoid an over-optimistic evaluation of the proposed method, we removed the different input elements accordingly. The performance of SAGDTI with different input information is shown in Table S5. Obviously, the performance of our proposed model declines to some extent without adequate input sources. However, SAGDTI still shows competitive performance in identifying the interactions between drugs and targets, even when the protein 3D structure and the multi-scale biological information are neglected simultaneously. Table S5 indicates that comprehensively considering the molecular and biological information is beneficial for learning high-level feature representations of drugs and targets and for improving prediction performance. These results also demonstrate that SAGDTI has good generalization ability.
3.6 Case study
Since December 2019, infection with the novel coronavirus (SARS-CoV-2) has rapidly spread worldwide, leading to a pandemic (Kirtipal et al., 2020). Hence, there is an urgent need to identify effective drugs for patients infected with SARS-CoV-2. We therefore conducted case studies on five SARS-CoV-2-related drugs (i.e., Remdesivir, Lopinavir, Budesonide, Dexamethasone and Aripiprazole) to further verify the capability of SAGDTI for determining possible DTIs. Each drug interacts with at least 12 different targets. Owing to space limitations, the top 10 candidate proteins identified by SAGDTI for each drug are listed in Table S6.
For brevity, we discuss below our findings for the drugs remdesivir and lopinavir. Remdesivir (a nucleoside analog) possesses antiviral activity, with EC50 values of 74 nM against SARS-CoV and MERS-CoV in HAE cells and 30 nM against murine hepatitis virus in delayed brain tumor cells (Olender et al., 2021). In animal and in vitro models, remdesivir has demonstrated activity against the viral pathogens of SARS and MERS, which are also coronaviruses and are structurally very similar to SARS-CoV-2 (Kokic et al., 2021). We ranked the top 10 targets verified to interact with remdesivir. The top two targets, the Replicase polyprotein 1ab (Rep) and the RNA-directed RNA polymerase L (L), happen to be encoded by the genome sequences of SARS-CoV-2-related viruses. Lopinavir is a protease inhibitor of human immunodeficiency virus (HIV)-1 and HIV-2. It is combined with ritonavir for the treatment of HIV infection (Rebello et al., 2018). Through our experiments, we identified a SARS-CoV-2-related target, the Pol polyprotein (Pol), that interacts with lopinavir. According to these results, our proposed prediction model can assist scientists in drug development and clinicians in real-life practice.
3.7 Discussion
We have demonstrated the advantageous performance of SAGDTI through evaluations on the BindingDB, Davis and KIBA datasets under three experimental settings and through the SARS-CoV-2 case study. The experimental results proved that our proposed SAGDTI model performs well in DTI prediction and may be useful in practical applications. However, some problems have not yet been explained clearly and some difficulties urgently need to be solved: for example, how to accurately quantify the binding affinity (interaction strength) when an interaction happens, where the docking position lies in 3D space, and whether docking postures matter to drug-target interactions in real-life situations (Wei et al., 2022). In this article, we treat the DTI prediction problem as a binary classification, which simply suggests the absence or presence of interactions. Therefore, we intend to expand our research to drug-target binding affinity and investigate the mechanisms through which the docking position and posture influence the results. We believe that our proposed model can perform well in practical DTI prediction and advance the modeling of drug-target binding affinities in the future.
4 Conclusion
In this article, we proposed an end-to-end attention-derived model, termed SAGDTI. The aim of this model is to capture the unique molecular attributes of drugs as well as targets and to aggregate the biological interaction information of drug-target pairs to predict DTIs. The proposed framework enables learning the SMILES sequence with the relative distances among atoms of a given drug and the 3D structural features of proteins from their binding sites at the atom level through the molecular transformer module. SAGDTI also allows the aggregation of neighboring topologies and diverse connections (i.e., interactions, associations and similarities) at the node level based on the GATs. Experimental results demonstrated that, in most cases, our proposed SAGDTI model could predict DTIs with superior performance compared with other advanced models (e.g., Co-VAE, MATTDTI, and AttentionSiteDTI) under the three experimental settings. Furthermore, we employed the trained model to predict the interactions of five SARS-CoV-2-related drugs with potential targets to assist scientists. Our model displayed SoTA performance, was highly interpretable based on the self-attention mechanism and performed well in a real-life case study. The limitations and advantages of SAGDTI have been fully discussed. We believe that this work can contribute to research on drug discovery as well as drug repurposing, and serve as a powerful tool in practical DTI prediction.
The authors declare no conflicts of interest.
Funding
The project is supported by the National Natural Science Foundation of China (Nos. 81273649, 61501132, 61672181, 62001144, 62202092, 62272135), the Natural Science Foundation of Heilongjiang Province (Nos. LH2019F049, LH2019A029), the China Postdoctoral Science Foundation (No. 2019M650069), the Research Funds for the Central Universities (No. 3072019CFT0603), the Fund for Young Innovation Team of Basic Scientific Research in Heilongjiang Province (No. RCYJTD201805), a fund from the China Scholarship Council (CSC), and the King Abdullah University of Science and Technology (KAUST) Office of Research Administration (ORA) under Award Nos. FCC/1/1976-44-01, FCC/1/1976-45-01, and REI/1/5234-01-01. The funding bodies played no role in the design of the study, the collection, analysis, and interpretation of data, or the writing of the manuscript.
References
Rifaioglu AS, Atas H, Martin MJ, et al. (2019) Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases. Brief Bioinform., 20(5), 1878-1912.
Ashburn TT, Thor KB. (2004) Drug repositioning: identifying and developing new uses for existing drugs. Nat Rev Drug Discov., 3(8), 673-683.
Zheng S, et al. (2020) Predicting drug protein interaction using quasi-visual question answering system. Nat. Mach. Intell., 2, 134-140.
Bagherian M, Sabeti E, Wang K, et al. (2021) Machine learning approaches and databases for prediction of drug-target interaction: a survey paper. Brief Bioinform., 22(1), 247-269.
Da Silva Rocha SFL, Olanda CG, et al. (2019) Virtual Screening Techniques in Drug Discovery: Review and Recent Applications. Curr Top Med Chem., 19(19), 1751-1767.
Maia EHB, Assis LC, de Oliveira TA, et al. (2020) Structure-Based Virtual Screening: From Classical to Artificial Intelligence. Front Chem., 8, 343.
Zhao Q, Yang M, Cheng Z, et al. (2022) Biomedical Data and Deep Learning Computational Models for Predicting Compound-Protein Relations. IEEE/ACM Trans Comput Biol Bioinform., 19(4), 2092-2110.
Śledź P, Caflisch A. (2018) Protein structure-based drug design: from docking to molecular dynamics. Curr Opin Struct Biol., 48, 93-102.
Csermely P, Korcsmáros T, Kiss HJM, et al. (2013) Structure and dynamics of molecular networks: a novel paradigm of drug discovery: a comprehensive review. Pharmacol Ther., 138(3), 333-408.
Ezzat A, Zhao P, Wu M, et al. (2017) Drug-Target Interaction Prediction with Graph Regularized Matrix Factorization. IEEE/ACM Trans Comput Biol Bioinform., 14(3), 646-656.
Liu Y, Wu M, Miao C, et al. (2016) Neighborhood Regularized Logistic Matrix Factorization for Drug-Target Interaction Prediction. PLoS Comput Biol, 12(2), e1004760.
Pahikkala T, Airola A, Pietilä S, et al. (2015) Toward more realistic drug-target interaction predictions. Brief Bioinform., 16(2), 325-337.
Hazra A, et al. (2021) Recent advances in deep learning techniques and its applications: an overview. In: Rizvanov AA, et al. (eds) Advances in Biomedical Engineering and Technology, pp. 103-122.
Bharath Ramsundar PE, Walters P, Pande V. (2019) Deep Learning for the Life Sciences: Applying Deep Learning to Genomics, Microscopy, Drug Discovery, and More.
Ru X, Ye X, Sakurai T, et al. (2021) Current status and future prospects of drug-target interaction prediction. Brief Funct Genomics., 20(5), 312-322.
Öztürk H, Özgür A, Ozkirimli E. (2018) DeepDTA: deep drug-target binding affinity prediction. Bioinformatics., 34(17), i821-i829.
Shin B, et al. (2019) Self-attention based molecule representation for predicting drug-target interaction. In: Proceedings of the 4th Machine Learning for Healthcare Conference, Ann Arbor, Michigan, USA, Vol. 106, pp. 230-248.
Huang K, Xiao C, Glass LM, et al. (2021) MolTrans: Molecular Interaction Transformer for drug-target interaction prediction. Bioinformatics., 37(6), 830-836.
Nguyen T, Le H, Quinn TP, et al. (2021) GraphDTA: predicting drug-target binding affinity with graph neural networks. Bioinformatics., 37(8), 1140-1147.
Liu S, Wang Y, Deng Y, et al. (2022) Improved drug-target interaction prediction with intermolecular graph transformer. Brief Bioinform., 23(5), bbac162.
Jiang M, Li Z, Zhang S, et al. (2020) Drug-target affinity prediction using graph neural network and contact maps. RSC Adv., 10(35), 20701-20712.
Tradigo G. (2013) Protein Contact Maps. New York, NY: Springer New York, 1771-1773.
Xuan P, Zhang X, Zhang Y, et al. (2022) Multi-type neighbors enhanced global topology and pairwise attribute learning for drug-protein interaction prediction. Brief Bioinform., 23(5), bbac120.
Xuan P, Fan M, Cui H, et al. (2022) GVDTI: graph convolutional and variational autoencoders with attribute-level attention for drug-protein interaction prediction. Brief Bioinform., 23(1), bbab453.
Zeng Y, Chen X, Luo Y, et al. (2021) Deep drug-target binding affinity prediction with multiple attention blocks. Brief Bioinform., 22(5), bbab117.
Yazdani-Jahromi M, Yousefi N, Tayebi A, et al. (2022) AttentionSiteDTI: an interpretable graph-based model for drug-target interaction prediction using NLP sentence-level relation classification. Brief Bioinform., 23(4), bbac272.
Li T, Zhao XM, Li L. (2022) Co-VAE: Drug-Target Binding Affinity Prediction by Co-Regularized Variational Autoencoders. IEEE Trans Pattern Anal Mach Intell., 44(12), 8861-8873.
Vaswani A, Shazeer N, Parmar N, et al. (2017) Attention is all you need. In: Advances in Neural Information Processing Systems, NIPS, 5998-6008.
Shaw P, Uszkoreit J, Vaswani A. (2018) Self-attention with relative position representations. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT, New Orleans, Louisiana, USA, June 1-6, 464-468.
Berman HM, Westbrook J, Feng Z, e