SAGDTI: self-attention and graph neural network with multiple information representations for the prediction of drug-target interactions
Item Type: Article
Authors: Li, Xiaokun; Yang, Qiang; Luo, Gongning; Xu, Long; Dong, Weihe; Wang, Wei; Dong, Suyu; Wang, Kuanquan; Xuan, Ping; Gao, Xin
Citation: Li, X., Yang, Q., Luo, G., Xu, L., Dong, W., Wang, W., Dong, S., Wang, K., Xuan, P., & Gao, X. (2023). SAGDTI: self-attention and graph neural network with multiple information representations for the prediction of drug-target interactions. Bioinformatics Advances. https://doi.org/10.1093/bioadv/vbad116
Eprint version: Post-print
DOI: 10.1093/bioadv/vbad116
Publisher: Oxford University Press (OUP). Journal: Bioinformatics Advances
Rights: Archived with thanks to Bioinformatics Advances under a Creative Commons license, details at: https://creativecommons.org/licenses/by/4.0/
Item License: https://creativecommons.org/licenses/by/4.0/
Link to Item: http://hdl.handle.net/10754/693827
Systems Biology
SAGDTI: self-attention and graph neural network with multiple information representations for the prediction of drug-target interactions
Xiaokun Li 2,3, Qiang Yang 2,3, Gongning Luo 1,∗, Long Xu 2,3, Weihe Dong 3,4, Wei Wang 1, Suyu Dong 4, Kuanquan Wang 1,∗, Ping Xuan 2,5 and Xin Gao 6
1 Biocomputing Research Center, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
2 School of Computer Science and Technology, Heilongjiang University, Harbin, 150080, China
3 Postdoctoral Program of Heilongjiang Hengxun Technology Co., Ltd., Harbin, 150090, China
4 College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150040, China
5 Department of Computer Science, School of Engineering, Shantou University, Shantou, 515063, China
6 Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal 23955, Saudi Arabia
∗To whom correspondence should be addressed.
Associate Editor: Shanfeng Zhu
Received on XXXXX; revised on XXXXX; accepted on XXXXX
Abstract
Motivation:
Accurate in silico identification of target proteins that interact with drugs is a vital step that can significantly foster drug repurposing and drug discovery. In recent years, numerous deep learning-based methods have treated drug-target interaction (DTI) prediction as a classification task whose output is a binary label indicating the absence or presence of an interaction. However, existing studies often (i) neglect the unique molecular attributes when embedding drugs and proteins, and (ii) determine the interaction of drug-target pairs without considering biological interaction information.
Results:
In this study, we propose SAGDTI, an end-to-end attention-derived method based on the self-attention mechanism and graph neural networks, which aims to overcome the aforementioned drawbacks in DTI identification. SAGDTI is the first method to sufficiently consider the unique molecular attribute representations of both drugs and targets, taking as input SMILES sequences and three-dimensional structure graphs. In addition, our method aggregates the feature attributes of biological information between drugs and targets through multi-scale topologies and diverse connections. Experimental results illustrate that SAGDTI outperforms existing prediction models, benefiting from the unique molecular attributes embedded by atom-level attention and the biological interaction information aggregated by node-level attention. Moreover, a case study on SARS-CoV-2 shows that our model is a powerful tool for identifying DTIs in real-world scenarios.
Availability:
https://github.com/lixiaokun2020/SAGDTI
Contact:
[email protected] or [email protected]
Supplementary information:
Supplementary data are available at Bioinformatics Advances online.
1 Introduction
Research on drug-target interaction (DTI) prediction is a crucial step in the repurposing of current drugs and the discovery of new drugs (Rifaioglu et al., 2019; Ashburn et al., 2004; Zheng et al., 2020). Accurate identification of potential DTIs greatly reduces the need for costly and time-consuming high-throughput screening (Bagherian et al., 2021; Da et al., 2019). However, it is impractical to rapidly screen every possible compound-target pair because of the large-scale chemical space of
© The Author(s) 2023. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License
(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
molecular compounds, proteins and protein-ligand complexes (Maia et al., 2020; Zhao et al., 2022). Based on this observation, various computational methods have been introduced to identify potential DTIs. In the early stage of computational DTI identification, molecular simulation and molecular docking were the mainstream methods (Śledź et al., 2018; Csermely et al., 2013). These methods strongly depend on the three-dimensional (3D) structural information of proteins; however, their performance is limited because adequate spatial structures are unavailable for many proteins. In the past decade, machine learning-based approaches were introduced to overcome some of these difficulties. For example, Ezzat et al., 2017 established a matrix factorization framework with k nearest known neighbors for drug-target interaction prediction that incorporates both drug and target similarities. Conventional shallow machine learning-based methods, such as NRLMF (Liu et al., 2016) and KronRLS (Pahikkala et al., 2015), only considered the similarity features of drug-target pairs. Consequently, these approaches cannot fully exploit the comprehensive relationships between drugs and proteins.
With the rapid development of computing power and data mining, deep learning has been widely adopted in natural language processing and computer vision (Hazra et al., 2021; Bharath et al., 2019). Deep learning techniques have elevated traditional computational approaches thanks to their ability to comprehensively extract latent feature representations, rendering DTI prediction a research hotspot (Ru et al., 2021). Deep learning methods for inferring DTIs fall into two main types. The first type manages a sequential input representation that transforms all feature information into a vector. For instance, DeepDTA (Öztürk et al., 2018) utilizes the Simplified Molecular Input Line Entry System (SMILES) of ligands and the amino acid sequences of proteins to determine the binding affinity of drug-target pairs through convolutional neural networks (CNNs). Similarly, molecular transformer-based models (Shin et al., 2019; Huang et al., 2021) were introduced to infer the high-dimensional structure of a molecule from SMILES strings and character-embedding sequences through the self-attention mechanism. However, these methods cannot model the potential relationships within compounds, since the positional distribution of atoms and their relative arrangement are fixed, which limits prediction performance. Because a more flexible input representation of drug-target pairs is required, a second type of deep learning model, namely graph-based neural networks, is frequently discussed. The application of graph descriptions to DTI prediction treats atoms as nodes and chemical bonds as the corresponding edges. Typically, GraphDTA (Nguyen et al., 2021) and IGT (Liu et al., 2022) are outstanding identifiers for predicting drug-target interactions based on graph representations of ligands and target receptors. Regrettably, most graph neural network (GNN)-based methods perform poorly at extracting the spatial information of proteins from amino acid sequence representations, which is a staple factor in DTI determination. Proteins are composed of abundant atoms that require a large-scale sparse 3D matrix to capture the entire constrained spatial structure; hence, it is difficult to obtain an accurate high-resolution 3D structure of a protein. An alternative strategy was developed to handle this problem, in which target proteins are expressed as a contact/distance map that transforms the interactions within proteins into a matrix (Zheng et al., 2020; Jiang et al., 2020). However, this contact/distance map is based on a heuristic approach; it therefore provides only an abstract approximation of the true protein structure, which is generally determined by X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy (Tradigo et al., 2013).
Several previously reported prediction models convert multi-scale neighboring topologies and diverse connections (i.e. interactions, associations and similarities) among biological entities into feature representations to predict DTIs. For example, GCDTI (Xuan et al., 2022) was proposed to capture and fuse neighboring topology and diverse connection information based on three-way GCNs and a CNN with multi-level attention. Similarly, Xuan et al., 2022 employed graph convolutional and variational autoencoders to encode multiple pairwise representations between a drug and its targets, such as attention-enhanced topology, attribution, and distribution. Unfortunately, the precision of these models in DTI prediction is relatively low because they ignore the molecular and chemical features of drug-protein pairs. Furthermore, existing deep learning methods often construct the interaction of drugs with related targets simply by concatenating feature representations, which cannot sufficiently express the interactions. Due to space constraints, more relevant studies are included in the supplementary materials.
Considering the above information, in this study an end-to-end attention-derived method, SAGDTI, is introduced. SAGDTI accepts three input representations from multiple molecular and biological data sources: the SMILES strings of drugs with relative atom distances, the graph embedding of proteins that contains spatial information of binding sites, and graph-based interaction information from a biological heterogeneous network. First, we introduce a molecular transformer module based on multi-head self-attention (MSA) to extract the feature embeddings of SMILES sequences and protein graphs. The feature representations of each drug-target pair are subsequently transformed into an interaction pairing map and fed into CNNs to represent the molecular attributes at the atom level. Second, graph attention networks (GATs) are constructed to capture the interaction information covering multi-scale topologies and diverse connections in the form of a heterogeneous network (matrix) via graph-based attention. Finally, we use convolutional-pooling (C-P) networks to allocate the importance weights of the two obtained attribute representations, and fully connected neural networks (FNNs) to predict the interaction score for drug-target pairs.
The main contributions of our study are summarized below.
• To accurately capture the unique molecular information of both compounds and receptors, a molecular transformer module is designed to exploit the relative element information among atoms in drug compounds and the 3D structural features of the binding pockets of target proteins.
• Our proposed model is the first DTI prediction method to transform the input representations of molecular drugs and targets into entirely different forms. Moreover, it aggregates the attributes of interaction information from biological entities based on graph attention networks.
• Owing to its self-attention, the proposed method is attention-derived, offering high interpretability when aggregating the attribute representations from both molecular and biological feature information.
• To the best of our knowledge, our experimental results are state-of-the-art (SoTA) among feature representation learning models for DTI prediction, as confirmed on three benchmark datasets.
2 Materials and Methods
In this study, we regarded the problem of predicting DTIs as a binary classification task. We model an objective function F(·) with multiple information representations for drugs and targets to forecast the presence or absence of a DTI. The framework of SAGDTI is shown in Figure 1. Our model consists of four main parts: the molecules input embedding module, the molecular transformer module, the biological interaction information aggregating module, and the potential DTI classification module. Specifically, in the molecules input embedding (Figure 1B), we transformed the unique molecular information of drugs and proteins into different input forms. The SMILES strings of small compounds were embedded as sequential vectors that cover long-distance relative position information among atoms, using the strategy introduced in (Zeng et al., 2021). Inspired by the graph input representation of proteins (Yazdani et al., 2022), the protein pockets with binding sites that interact with compounds were embedded as 3D graphs. Feature representations of drugs and proteins were subsequently fused through the molecular transformer to obtain the latent drug-target complex attributes that include their unique characteristics (Figure 1C). Next, we modeled and aggregated the biological attributes of interaction information between drugs and targets using GATs (Figure 1D). Finally, the two obtained attribute representations were incorporated and fed into C-Ps and FNNs for DTI prediction (Figure 1E).
2.1 Molecules input embedding
2.1.1 Drug SMILES strings
Small molecular drugs are expressed as SMILES strings. Motivated by previous research (Zeng et al., 2021; Li et al., 2022), the SMILES strings are converted to continuous numerical vectors representing atoms and chemical information ({C, C, l, ..., F, ), C} → {42, 42, 25, ..., 11, 31, 42}). A given drug is expressed as:

D = {a_1^d, a_2^d, ..., a_i^d, ..., a_{N*}^d}   (1)

where a_i^d is the i-th atom and N* is a flexible sequence length determined by the number of atoms in the compound. In this study, we restrict the maximal SMILES sequence length with a hyperparameter ℓ_d. Following the traditional transformer model (Vaswani et al., 2017), the initial input is an embedding vector composed of a tokenized symbol and a position signal, which can be formulated as:

X_D = E_D^T + E_D^P   (2)

where E_D^T ∈ R^{ℓ_d×ϖ_d} and E_D^P ∈ R^{ℓ_d×ϖ_d} represent the token embedding and the position embedding, respectively, ϖ_d is the hidden feature dimension of atom a_i, and X_D denotes the embedding vector of the atom sequence. To model the relative distance between atoms, we describe the unique molecular characteristics using m kinds of relative correlations between atoms. Inspired by (Shaw et al., 2018), the input embedding of drugs with spatial information, D_In, can be mathematically expressed as:

D_In = SoftMax( ((X_D W^Q)(X_D W^K)^T + A^R) / √ϖ_d ) (X_D W^V)   (3)

A_ij^R = (X_D^i W^Q)(W_{min(k,|i−j|)}^R)^T   (4)

where W^Q, W^K, W^V ∈ R^{ϖ_d×ϖ_d} denote the parameter matrices of the query, key and value in the attention layer, A^R ∈ R^{ℓ_d×ℓ_d} indicates the relative-aware relationship matrix between atoms, and X_D^i is the i-th atom embedding in X_D. W^R ∈ R^{m×ϖ_d} holds the learnable parameters for the m kinds of relative relationships between atoms, and the rows of W^R corresponding to virtual "nodes" (e.g., @, =, [, etc.) are set to zero. Thus, when we model the relative distance between atoms, the distance between special characters in SMILES strings is 0.
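A minimal NumPy sketch of Eqs. (3)-(4) — single-head attention with a clipped relative-distance bias — is given below. All dimensions and the random weights are toy assumptions, and the zeroing of virtual-token rows in W^R is assumed to have happened upstream:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relative_self_attention(X, WQ, WK, WV, WR, k):
    """Single-head self-attention with a relative-distance bias A^R (Eqs. 3-4).
    X  : (L, d) token + position embeddings of the SMILES sequence
    WR : (m, d) one learnable row per clipped relative distance
    k  : clipping threshold for |i - j|."""
    L, d = X.shape
    Q, K, V = X @ WQ, X @ WK, X @ WV
    # A^R[i, j] = (X_i W^Q) . W^R_{min(k, |i-j|)}                 (Eq. 4)
    idx = np.minimum(k, np.abs(np.subtract.outer(np.arange(L), np.arange(L))))
    AR = np.einsum('id,ijd->ij', Q, WR[idx])
    scores = (Q @ K.T + AR) / np.sqrt(d)                          # (Eq. 3)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
L, d, m, k = 6, 8, 5, 4
X = rng.normal(size=(L, d))
WQ, WK, WV = (rng.normal(size=(d, d)) for _ in range(3))
WR = rng.normal(size=(m, d))
out = relative_self_attention(X, WQ, WK, WV, WR, k)
print(out.shape)  # (6, 8)
```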
2.1.2 Protein graphs
We utilized the spatial structures of large-scale proteins collected from the Protein Data Bank (PDB) (Berman et al., 2000), a worldwide database that provides experimentally determined protein structures (e.g., from X-ray diffraction, cryo-electron microscopy, and NMR). We sought to capture the binding sites of protein pockets from 3D structures. For this purpose, we adopted the algorithm introduced by (Saberi et al., 2014) to transform the bounding-box coordinates of each protein's binding site into a set of peptide fragments. A given protein is expressed as:

P = {b_1^p, b_2^p, ..., b_j^p, ..., b_{M*}^p}   (5)

where b_j^p ∈ R^{ℓ_p×υ} is the j-th binding site in the protein, ℓ_p and υ are the maximum number of binding sites and the maximum number of atoms in a binding site, respectively, and M* denotes the varying number of binding sites in a protein. Once the binding sites of a protein were recognized, we drew an individual graph for each binding site, representing each atom a_p as a node and the relationships between atoms as edges (denoted by e_p). For each atom a_p, we created a feature vector of size k (see Table S1 for details) comprising the atom symbol, degree, electrical charge, attached hydrogens, etc. Thus, the node matrix of each binding-site graph is denoted as X_P ∈ R^{υ×k}. Moreover, we use a simple linear transformation to encode P_In = W_P(X_P)^T, leading to a real-valued dense matrix P_In ∈ R^{υ×D_p} as the input graph embedding, where D_p is the high-level feature dimension.
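As a concrete illustration of this linear encoding step, the sketch below builds a toy node matrix X_P and projects it with a weight matrix W_P; the feature layout and all dimensions are hypothetical, and the transposes are arranged so that P_In comes out with shape (υ, D_p) as stated above:

```python
import numpy as np

# Hypothetical layout: each binding-site atom gets a length-k feature vector
# (atom-symbol one-hot, degree, charge, attached hydrogens, ...), stacked into
# X_P (υ x k); a linear map W_P then yields a dense D_p-dimensional embedding.
n_atoms, k, D_p = 10, 16, 32                   # υ, feature size, high-level dim
rng = np.random.default_rng(1)
X_P = rng.integers(0, 2, size=(n_atoms, k)).astype(float)  # toy node matrix
W_P = rng.normal(size=(D_p, k)) * 0.1                      # learnable weights

P_In = (W_P @ X_P.T).T                         # dense graph embedding (υ, D_p)
print(P_In.shape)  # (10, 32)
```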
2.2 Molecular transformer module
In recent years, many attention-based algorithms have been developed for the identification of drug-related targets, such as the transformer (Vaswani et al., 2017) and BERT (Kang et al., 2022). Inspired by previous studies (Shin et al., 2019; Huang et al., 2021), we introduce a molecular transformer module to extract the feature representations of drugs and proteins, which cover their unique molecular characteristics.
Specifically, the drug embedding representation D_In is fed into an MSA layer to acquire the attention scores of each atom in the drug. The output of the MSA layer is then passed to an FNN layer, and both the MSA and FNN layers are combined with a residual operation (He et al., 2015) and layer normalization. The structure of the molecular transformer encoder is shown in supplementary Figure S1. Mathematically, the drug feature representation can be formulated as:

MTs(D_In) = FNN(X_MSA + D_In)   (6)

X_MSA = Concat(head_1, head_2, ..., head_m) W^o   (7)

head_i = Att(Q, K, V)   (8)

Att(Q, K, V) = SoftMax(QK^T / √e_d) V   (9)

where Q ∈ R^{p×e_d}, K ∈ R^{p×e_d} and V ∈ R^{q×e_v} describe a query and key-value pairs for the attention function Att(·); head_i and W^o are the i-th head and the learnable parameter matrix in the MSA layer, respectively; and X_MSA is the output feature of the MSA layer. The molecular transformer encoders MTs are stacked with residual operations to enhance molecular feature extraction, which is expressed as:

D_R^L = MTs(D_R^{L−1}) + D_R^{L−1}   (10)

where D_R^{L−1} is the input of the L-th molecular transformer encoder; when L − 1 = 0, D_R^{L−1} equals D_In. Similarly, we model the latent feature representation of proteins, P_R^L, from the molecular graph embedding P_In based on the transformer encoders.
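The stacked encoders of Eqs. (6)-(10) can be sketched as follows; this is a single-head toy version that omits layer normalization and multi-head concatenation, with all dimensions and weights chosen arbitrarily:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mts_block(X, params):
    """One molecular-transformer encoder: single-head MSA with a residual
    connection, then a two-layer FNN (Eqs. 6-9, layer norm omitted)."""
    WQ, WK, WV, W1, W2 = params
    d = X.shape[1]
    att = softmax((X @ WQ) @ (X @ WK).T / np.sqrt(d)) @ (X @ WV)
    h = X + att                                # residual around attention
    return np.maximum(h @ W1, 0) @ W2          # FNN with ReLU

rng = np.random.default_rng(2)
L, d = 6, 8
X = rng.normal(size=(L, d)) * 0.1
layers = [tuple(rng.normal(size=(d, d)) * 0.1 for _ in range(5))
          for _ in range(3)]                   # three stacked encoders

D = X
for params in layers:                          # Eq. 10:
    D = mts_block(D, params) + D               # D^L = MTs(D^{L-1}) + D^{L-1}
print(D.shape)  # (6, 8)
```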
To further learn the molecular attributes of drug-protein pairs at the atom level, the two feature representations obtained for a given drug and its binding sites are converted into an interaction pairing map. Specifically, for each atom a_i^d in a given drug and each atom a_j^p in a target protein, we construct pairwise interactions via the following formula:

IM = Λ(D_R^L, P_R^L)   (11)

where IM is the interaction pairing map and Λ(·) denotes the scalar product operation. Scalar products provide an interpretable approach
Fig. 1. The framework of our proposed model. (A) Open-source data for drug SMILES strings and 3D structures of proteins. (B) Molecules input embedding module that identifies the non-covalent intermolecular attributes of drugs and proteins, and subsequently represents the features in the forms of sequences and graphs. (C) Molecular transformer module, in which we construct sequential transformer encoders to learn the unique feature representations of drug SMILES and the binding sites of proteins, transform them into an interaction pairing map, and then feed them into a convolutional layer to extract molecular attributes. (D) Biological interaction information aggregating module that models the multi-scale topologies and diverse connections; the biological attributes are represented using a graph attention network. (E) Potential DTI classification module, where we predict unknown interactions in a drug-target pair.
to integrating the intensity of the interaction between individual drug-protein pairs at the atom level. In IM, a value close to 1 indicates that the corresponding pair of atoms is almost certainly binding; otherwise, the value is close to 0. Thus, this map explicitly identifies the atoms of a drug-protein pair that contribute to the interaction result.
A CNN layer is employed to extract the adjacent information of neighboring atoms in IM. Let W_Conv ∈ R^{n_w×n_l} be the convolutional filter, where n_w and n_l correspond to the width and length of the filter, respectively. The CNN layer uses n_f channels to describe the molecular attributes of a drug and its binding protein, and zero-padding is applied to preserve the marginal information of the atom binding features. The molecular attribute representation X_Mol between drugs and targets is then obtained via a flattening operation:

X_Mol = Flatten(Conv(W_Conv IM, n_f))   (12)
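Eqs. (11)-(12) — the scalar-product pairing map followed by a convolution and flattening — can be sketched as follows. This toy version uses a single filter, no zero-padding and arbitrary dimensions; the real model uses n_f filters with padding:

```python
import numpy as np

def interaction_map(D, P):
    """IM[i, j] = scalar product of drug-atom i and protein-atom j (Eq. 11)."""
    return D @ P.T

def conv2d_valid(im, kernel):
    """Plain 2-D valid convolution standing in for the CNN layer (Eq. 12)."""
    kh, kw = kernel.shape
    H, W = im.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (im[i:i + kh, j:j + kw] * kernel).sum()
    return out

rng = np.random.default_rng(3)
D = rng.normal(size=(6, 8))   # toy drug atom embeddings D_R^L
P = rng.normal(size=(9, 8))   # toy protein atom embeddings P_R^L
IM = interaction_map(D, P)                                  # (6, 9) pairing map
X_Mol = conv2d_valid(IM, rng.normal(size=(3, 3))).ravel()   # flattened attributes
print(IM.shape, X_Mol.shape)  # (6, 9) (28,)
```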
2.3 Biological interaction information aggregating module
In this module, we introduce GATs to model the multi-scale neighboring topologies and diverse connections from biological interaction information, with the goal of improving DTI prediction performance. Generally, existing DTI prediction models only consider the non-covalent intermolecular interactions between SMILES strings and amino acid sequences. When unexpected noise occurs, accuracy may drop significantly because the biological feature information of drug-protein pairs is unavailable. Therefore, we incorporate the biological characteristics of drug-protein pairs to increase the stability and robustness of the model. Numerous studies have demonstrated that the biological interaction information of drugs and targeted proteins plays an important role in predicting DTIs (Peng et al., 2021; Zhao et al., 2021). A schematic of the GAT mechanism is shown in supplementary Figure S2.
Specifically, we first construct a heterogeneous network containing all interaction information of multi-scale topologies and diverse correlations at the node level. We convert the heterogeneous network into an undirected graph G = (V, E), where V represents the nodes and E represents the edges between nodes. Afterward, a GAT layer is employed to extract the input representation h of each node:

h = {h_1, h_2, ..., h_i, ..., h_{K*}},  h_i ∈ R^Z   (13)

where h_i and K* are the i-th node and the number of adjacent nodes, respectively, and Z denotes the feature dimension. For each node h_i, we employ masked attention to calculate the attention coefficients, which means we focus only on the node h_i and its first-order neighboring nodes h_j ∈ N_i:

e_ij = β(α^T [W h_i ∪ W h_j])   (14)

where e_ij and W denote the attention coefficient and the weight matrix of the node feature transformation, respectively. Intuitively, the weights in the GAT layer are shared, which greatly improves computational efficiency. The symbol ∪ denotes the concatenation of the two node representations, α ∈ R^{2Z} is a weight vector, and β(·) is the LeakyReLU activation function with a negative-input slope of 0.2. The value of e_ij reveals the importance of h_j to h_i. To better allocate the weights and calculate the correlation strength between the central node h_i and all its neighbors, the attention coefficients are normalized using a SoftMax function:

a_ij = SoftMax_j(e_ij)   (15)

To represent the interaction features between nodes more comprehensively, we also adopt the MSA strategy when normalizing the attention coefficients, employing a non-linear activation function. The output feature can be formulated as:

h'_i = ∥_{k=1}^{Head} β( Σ_{j∈N_i} a_ij^k W^k h_j )   (16)

where k indicates the k-th head of the attention coefficients a_ij^k and ∥ denotes concatenation over heads.
After extracting the complex feature representation of the interaction information between a drug and its target through consecutive GAT layers, we utilize global max pooling to output the biological attribute representation vector X_Bio:

X_Bio = MaxPooling(h'_i)   (17)
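A single-head version of Eqs. (14)-(17) can be sketched in NumPy as follows; the multi-head concatenation is omitted for brevity, and the toy path-graph adjacency and all dimensions are assumptions:

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(H, A, W, a):
    """Single-head graph attention (Eqs. 14-16, one head).
    H: (N, Z) node features; A: (N, N) adjacency with self-loops;
    W: (Z, Zp) shared transform; a: (2*Zp,) attention weight vector."""
    Wh = H @ W
    Zp = Wh.shape[1]
    # e_ij = LeakyReLU(a^T [W h_i || W h_j]) for all pairs          (Eq. 14)
    e = leaky_relu((Wh @ a[:Zp])[:, None] + (Wh @ a[Zp:])[None, :])
    e = np.where(A > 0, e, -np.inf)            # masked attention: neighbours only
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)  # a_ij via row SoftMax (Eq. 15)
    return leaky_relu(alpha @ Wh)              # aggregated features  (Eq. 16)

rng = np.random.default_rng(4)
N, Z = 5, 8
H = rng.normal(size=(N, Z))
A = np.eye(N) + np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)  # path graph
W = rng.normal(size=(Z, Z)) * 0.1
a = rng.normal(size=2 * Z)
H_out = gat_layer(H, A, W, a)
X_Bio = H_out.max(axis=0)   # global max pooling over nodes (Eq. 17)
print(H_out.shape, X_Bio.shape)  # (5, 8) (8,)
```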
2.4 DTI classification module
Most existing methods simply concatenate the learned feature representations or pass them through several FNN layers, which cannot fully capture the real-world interaction knowledge between a drug and its binding target. Inspired by Zeng et al., 2021 in modeling the DTI feature, C-P and FNN layers are applied to map the extracted features into a final classification output. First, the molecular attribute representation X_Mol and the biological attribute representation X_Bio are concatenated side by side to form the ultimate feature representation X_Ult:

X_Ult = [X_Mol; X_Bio]   (18)

Second, the fused attribute representation X_Ult of a drug-target pair is fed into C-P and FNN layers to learn the overall DTI features, and a Sigmoid function outputs the final interaction score:

Y_Out = S(FNN(Pool(Conv(W'_Conv X_Ult, n'_f))))   (19)

where W'_Conv and n'_f denote the convolutional kernel and the corresponding number of channels, respectively, and S is the Sigmoid function, S(x) = 1 / (1 + exp(−x)). We apply the Adam algorithm to optimize the objective and use the binary cross-entropy (BCE) loss to minimize the difference between the ground truth and the predicted probability:

Loss = −(1/λ) Σ_{i=1}^{λ} [ Y_i^* · log(Y_i^Out) + (1 − Y_i^*) · log(1 − Y_i^Out) ]   (20)

where λ is the number of training samples and Y_i^* represents the real label of a drug-target pair. A Loss value close to 0 indicates that the SAGDTI model can predict DTIs with high accuracy.
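Eq. (20) is the standard mean BCE loss, which can be sketched as follows (clipping is added here for numerical stability; the labels and predictions are toy values):

```python
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy over lambda training samples (Eq. 20)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0, 0.0])
perfect = bce_loss(y_true, np.array([1.0, 0.0, 1.0, 0.0]))  # ideal predictions
chance  = bce_loss(y_true, np.full(4, 0.5))                 # uninformative 0.5
print(round(perfect, 4), round(chance, 4))  # 0.0 0.6931
```

As the text notes, the loss approaches 0 for perfect predictions, while chance-level predictions give −log(0.5) ≈ 0.693.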
3 Results
3.1 Benchmark datasets
In this study, we evaluated the merit of our proposed model using three benchmark datasets: the BindingDB dataset (Gilson et al., 2016), the kinase dataset Davis (Davis et al., 2013), and the Kinase Inhibitor BioActivity (KIBA) dataset (Tang et al., 2014), which are widely utilized in previous studies (Li et al., 2022; Cheng et al., 2022). The SMILES strings of molecular drugs were collected from the PubChem database (Kim et al., 2021) based on their PubChem IDs, the 3D structure information of proteins was derived from the PDB database (Berman et al., 2000), and the interaction information of drug-target pairs was obtained according to prior work (Zhao et al., 2021). When the structure information of a protein is not available in the PDB database, we remove it to ensure the model can fully learn the interaction features of drug-protein pairs. Table 1 summarizes the statistical information for these datasets. The datasets were processed to filter out invalid drug-target interactions (see supplementary materials for data processing).
3.2 Evaluation criteria
To estimate the utility of our DTI prediction model, we employed five metrics in total: the Area Under the Receiver Operating Characteristic curve (AUROC), the Area Under the Precision-Recall curve (AUPR), the Matthews correlation coefficient (MCC) (He et al., 2022), the F1-score, and Balanced Accuracy (B.Acc, composed of Sensitivity and Specificity). The mathematical formulas of these five metrics are given in the supplementary materials.
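For reference, MCC, F1-score and balanced accuracy can be computed from the confusion counts as in this minimal sketch; the toy labels are illustrative only, and AUROC/AUPR are omitted since they additionally require ranked scores:

```python
import math

def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def f1(tp, tn, fp, fn):
    return 2 * tp / (2 * tp + fp + fn)

def balanced_accuracy(tp, tn, fp, fn):
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

y_true = [1, 1, 1, 0, 0, 0, 0, 1]   # toy ground-truth labels
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]   # toy predictions
c = confusion(y_true, y_pred)
print(round(mcc(*c), 3), round(f1(*c), 3), round(balanced_accuracy(*c), 3))
# 0.5 0.75 0.75
```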
3.3 Experimental setup
To comprehensively evaluate the performance of DTI prediction, we adopted three experimental settings: a new-target setting, a new-drug setting, and a pairwise setting. In the new-target setting, targets are randomly split into five equal subsets; four of them are regarded as inference targets, while the remaining subset is treated as test targets. During training, the SAGDTI model learns from the inference data, which contain the inference targets and all the drugs. The trained model is then employed to suggest pairwise interactions between the test targets and all the drugs. The new-drug setting is analogous to the new-target setting: we arbitrarily divide the drugs into five groups, and the SAGDTI model is trained and applied to predict the interactions between new drugs and all the targets. In the pairwise setting, the known DTI pairs, like the drugs or targets in the other two settings, are grouped into five equal folds.
Similar to Co-VAE, we split the dataset into training data and testing data at a 5:1 ratio. The training data were used as a training set and a validation set to identify the optimal parameters through a five-fold cross-validation strategy. For each setting, we applied the trained SAGDTI model to the test set and repeated this process five times to obtain the final mean and standard deviation. We adopted identical settings for impartial comparison when comparing our proposed model with existing methods. The hyperparameters used to run the SAGDTI model are shown in Table 2 (see supplementary materials for training details).
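The five-fold, new-target style split described above can be sketched as follows; the target identifiers and the fold count are illustrative, and the actual SAGDTI pipeline pairs each fold's held-out targets with all drugs:

```python
import random

def new_target_folds(targets, n_folds=5, seed=0):
    """Shuffle targets and split them into n_folds disjoint subsets; each
    subset in turn serves as the test targets while the rest are used for
    training (paired with every drug)."""
    targets = list(targets)
    random.Random(seed).shuffle(targets)
    return [targets[i::n_folds] for i in range(n_folds)]

targets = [f"T{i:03d}" for i in range(20)]   # hypothetical target IDs
folds = new_target_folds(targets)
print([len(f) for f in folds])  # [4, 4, 4, 4, 4]
```

The same helper could be reused for the new-drug setting by passing drug identifiers instead.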
3.4 Comparison with SoTA methods
The performance of the proposed SAGDTI for DTI prediction was compared with several SoTA methods, including DDR (Olayan et al., 2018), DeepDTA (Öztürk et al., 2018), GraphDTA (Nguyen et al., 2021), IGT (Liu et al., 2022), Moltrans (Huang et al., 2021), AttentionSiteDTI (Yazdani et al., 2022), MATTDTI (Zeng et al., 2021), and Co-VAE (Li et al., 2022). For a fair comparison, we used the same experimental settings for all the models and the original hyperparameters reported in the corresponding publications. Details of the compared models can be found in the supplementary materials.
We first evaluated the SAGDTI model on the BindingDB dataset by drawing the Receiver Operating Characteristic and Precision-Recall curves, which record the highest performance in the experimental results. As plotted in Figure 2, our proposed model achieved the best AUROC and AUPR values compared with the other advanced methods. Specifically, SAGDTI obtained the best performance with an AUROC of 0.967 and an AUPR of 0.917, which are 0.008 and 0.017 higher than the second-best model, respectively. These results show that SAGDTI specializes in inferring pairwise drug-target interactions. In the BindingDB dataset, the training set and test set are extremely unbalanced, indicating that the AUPR metric is more credible for expressing prediction performance.
The results for the three experimental settings are presented in Table 3. According to the AUROC performance, SAGDTI achieved the best values and was 1.8%, 0.97% and 1.9% better than the other prediction models in terms of the three settings (i.e. the new-target setting, the new-drug setting and the pairwise
Table 1. Statistics of the three benchmark datasets: BindingDB, Davis and KIBA.
Datasets Metric Target Drug Interaction
BindingDB KD 587 6,704 31,239
 KI 1,225 126,122 249,598
 IC50 2,582 407,462 653,344
 EC50 699 75,142 105,364
Davis KD 442 68 30,056
KIBA KIBA score 229 2,111 118,254
Table 2. Hyperparameter settings of the SAGDTI model.
Parameters Range
Max length (Drug) 150
Max length (Target) 1200
Attention heads in MTs (Drug) [4-12]
Attention heads in MTs (Target) [4-12]
Attention heads in GAT [8-12]
Filter size 3×3
Number of filters [32, 64, 96]
Kernel size of the pooling layer 2×2
Dropout [0.1-0.6]
Optimizer Adam
Learning rate [0.1, 0.01, 0.001, 0.0001]
Epoch [30, 50, 100]
Batch size [32, 64, 128, 256]
Fig. 2. Comparison of SAGDTI with eight SoTA models in AUROC and AUPR metrics. Left: ROC curves of the different DTI models. Right: PR curves of the different DTI models.
setting). Furthermore, we computed the average performance of the five metrics across the three settings on the BindingDB dataset (Table 4). It demonstrates that SAGDTI is competent in DTI prediction, as reflected by the average metric values obtained under the three experimental settings: AUROC (0.967), AUPR (0.914), MCC (0.649), F1-Score (0.861) and B.Acc (0.901).
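The three less common metrics in Table 4 can all be derived from a binary confusion matrix; the sketch below shows the standard formulas. The confusion-matrix counts are hypothetical and chosen only to exercise the arithmetic, not taken from the paper's experiments.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """MCC, F1-score and balanced accuracy from a binary confusion matrix."""
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                 # sensitivity / true positive rate
    f1 = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)            # true negative rate
    b_acc = (recall + specificity) / 2      # balanced accuracy
    return mcc, f1, b_acc

# Hypothetical confusion matrix from an imbalanced DTI test split.
mcc, f1, b_acc = classification_metrics(tp=80, fp=20, tn=880, fn=20)
print(round(mcc, 3), round(f1, 3), round(b_acc, 3))  # 0.778 0.8 0.889
```

Note that MCC and balanced accuracy both weigh the negative class explicitly, which is why they complement AUROC on imbalanced test sets.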
Existing prediction models (e.g., Co-VAE, MATTDTI and AttentionSiteDTI) also exhibited excellent performance. The Co-VAE model introduced a novel co-regularized variational autoencoder framework with a strong generative capability. Co-VAE can retain all the learned DTI features by generating the SMILES strings and amino acid sequences to model the original drug or target features. Thus, Co-VAE demonstrated excellent performance in the identification of DTIs. However, the Co-VAE model ignored the unique molecular information and simply fused the features of drugs and targets through concatenation. MATTDTI considered the long-distance relative relationships among atoms in drugs and used multi-head attention to compute the DTI similarities. The AttentionSiteDTI method took advantage of the 3D structural information based on the binding sites of proteins and utilized an attention-based module to represent the output vector. All these models captured the DTI features without fully exploiting the unique molecular feature attributes. We speculate that SAGDTI outperformed the other SoTA models because it included the whole set of unique characteristics of both drugs and targets, as well as aggregated the neighboring topologies
Table 3. Comparison of SAGDTI with eight other SoTA methods in AUROC and AUPR metrics under three different experimental settings, using three benchmark datasets. Bold: best results.
Datasets Methods new-target new-drug pairwise
 AUROC(std) AUPR(std) AUROC(std) AUPR(std) AUROC(std) AUPR(std)
BindingDB DDR 0.834 (0.033) 0.747 (0.051) 0.827 (0.037) 0.725 (0.062) 0.823 (0.031) 0.719 (0.054)
 DeepDTA 0.868 (0.007) 0.823 (0.011) 0.864 (0.006) 0.776 (0.011) 0.842 (0.006) 0.811 (0.009)
 GraphDTA 0.875 (0.008) 0.835 (0.010) 0.852 (0.009) 0.783 (0.013) 0.851 (0.006) 0.829 (0.009)
 Moltrans 0.893 (0.011) 0.842 (0.015) 0.878 (0.012) 0.812 (0.013) 0.858 (0.009) 0.847 (0.011)
 IGT 0.912 (0.007) 0.855 (0.011) 0.891 (0.009) 0.833 (0.015) 0.887 (0.006) 0.852 (0.009)
 AttentionSiteDTI 0.937 (0.005) 0.883 (0.009) 0.912 (0.006) 0.853 (0.011) 0.908 (0.005) 0.871 (0.007)
 MATTDTI 0.928 (0.004) 0.875 (0.008) 0.907 (0.005) 0.864 (0.007) 0.906 (0.003) 0.869 (0.005)
 Co-VAE 0.955 (0.002) 0.897 (0.004) 0.921 (0.001) 0.893 (0.002) 0.914 (0.002) 0.884 (0.003)
 SAGDTI 0.967 (0.001) 0.914 (0.002) 0.934 (0.002) 0.886 (0.002) 0.946 (0.001) 0.901 (0.001)
Davis DDR 0.784 (0.035) 0.451 (0.067) 0.763 (0.033) 0.407 (0.056) 0.739 (0.023) 0.411 (0.049)
 DeepDTA 0.799 (0.013) 0.526 (0.021) 0.773 (0.015) 0.491 (0.022) 0.752 (0.011) 0.485 (0.018)
 GraphDTA 0.816 (0.015) 0.547 (0.026) 0.802 (0.011) 0.514 (0.019) 0.783 (0.013) 0.501 (0.019)
 Moltrans 0.849 (0.016) 0.577 (0.022) 0.826 (0.014) 0.531 (0.021) 0.815 (0.012) 0.524 (0.017)
 IGT 0.873 (0.013) 0.593 (0.023) 0.849 (0.013) 0.546 (0.019) 0.839 (0.011) 0.537 (0.018)
 AttentionSiteDTI 0.902 (0.009) 0.601 (0.017) 0.872 (0.011) 0.578 (0.020) 0.865 (0.008) 0.561 (0.015)
 MATTDTI 0.917 (0.006) 0.625 (0.014) 0.905 (0.007) 0.612 (0.019) 0.873 (0.005) 0.594 (0.009)
 Co-VAE 0.922 (0.003) 0.644 (0.009) 0.911 (0.005) 0.641 (0.009) 0.886 (0.002) 0.632 (0.003)
 SAGDTI 0.937 (0.002) 0.645 (0.005) 0.921 (0.003) 0.636 (0.004) 0.903 (0.001) 0.627 (0.002)
KIBA DDR 0.792 (0.024) 0.446 (0.041) 0.773 (0.027) 0.416 (0.040) 0.746 (0.019) 0.404 (0.035)
 DeepDTA 0.804 (0.007) 0.517 (0.015) 0.784 (0.008) 0.485 (0.017) 0.765 (0.008) 0.492 (0.015)
 GraphDTA 0.824 (0.009) 0.556 (0.014) 0.806 (0.007) 0.512 (0.013) 0.799 (0.006) 0.516 (0.011)
 Moltrans 0.856 (0.008) 0.583 (0.015) 0.831 (0.006) 0.558 (0.014) 0.827 (0.007) 0.567 (0.010)
 IGT 0.886 (0.008) 0.607 (0.013) 0.853 (0.007) 0.601 (0.015) 0.841 (0.007) 0.591 (0.013)
 AttentionSiteDTI 0.907 (0.006) 0.625 (0.012) 0.886 (0.007) 0.615 (0.013) 0.864 (0.006) 0.586 (0.009)
 MATTDTI 0.921 (0.005) 0.632 (0.009) 0.914 (0.005) 0.627 (0.010) 0.889 (0.004) 0.619 (0.007)
 Co-VAE 0.932 (0.002) 0.645 (0.003) 0.918 (0.001) 0.651 (0.003) 0.894 (0.001) 0.645 (0.002)
 SAGDTI 0.945 (0.001) 0.657 (0.002) 0.927 (0.001) 0.643 (0.002) 0.911 (0.001) 0.645 (0.001)
Table 4. Performance evaluation for predicting DTIs on the BindingDB dataset using the average values of five metrics. Bold: best results.
AUROC AUPR MCC F1score B.Acc
DDR 0.805 (0.037) 0.684 (0.057) 0.425 (0.071) 0.529 (0.043) 0.758 (0.035)
DeepDTA 0.883 (0.009) 0.817 (0.011) 0.597 (0.046) 0.654 (0.037) 0.811 (0.026)
GraphDTA 0.886 (0.007) 0.821 (0.010) 0.584 (0.041) 0.676 (0.034) 0.819 (0.027)
Moltrans 0.917 (0.007) 0.834 (0.013) 0.605 (0.031) 0.669 (0.035) 0.823 (0.031)
IGT 0.926 (0.008) 0.857 (0.011) 0.631 (0.035) 0.817 (0.031) 0.842 (0.028)
AttentionSiteDTI 0.925 (0.006) 0.874 (0.009) 0.627 (0.029) 0.840 (0.027) 0.857 (0.019)
MATTDTI 0.931 (0.005) 0.886 (0.004) 0.644 (0.021) 0.833 (0.021) 0.849 (0.016)
Co-VAE 0.959 (0.002) 0.897 (0.003) 0.653 (0.012) 0.856 (0.019) 0.871 (0.011)
SAGDTI 0.967 (0.002) 0.914 (0.002) 0.649 (0.013) 0.861 (0.016) 0.901 (0.011)
and diverse connections. In summary, we intuitively compared the prediction capability of the eight models using the curve plots and the different experimental settings. Our SAGDTI exhibited the highest overall performance.
We further employed the Davis and KIBA datasets to evaluate the prediction performance of our SAGDTI model. On the Davis dataset, our model exhibited the best performance in almost every experimental setting; in the new-drug setting, it ranked second only to the Co-VAE model in terms of the AUPR value. As shown in Table 3, SAGDTI obtained 1.6%, 1.1% and 1.9% enhancements in the AUROC metric under the new-target, new-drug and pairwise settings, respectively. In terms of AUPR values, the performance of all methods decreased markedly due to the obvious shortage of DTI training samples in the Davis dataset. However, our proposed model remained SoTA, with only the Co-VAE model performing comparably in the new-drug and pairwise settings. We also noted excellent performance on the KIBA dataset, with AUROC values of 0.945, 0.927 and 0.911, and AUPR values of 0.657, 0.643 and 0.645 for the corresponding settings, respectively.
Generally, the high accuracy obtained under random split settings is known to be overestimated and does not reflect a model's real-world prediction performance. When there is little information about the interaction between a protein and a drug, the learning performance of most models drops greatly. Thus, we adopted a non-overlapping sampling strategy (i.e., cold start splitting) for a better evaluation. Inspired by (Atas et al., 2023), we used the modified Davis (mDavis) dataset to make sure the model learns general principles from the data rather than memorizing it. We randomly selected 10% or 15% of the DTI pairs as test samples and removed all of their associated drugs and proteins from the training samples. The results in Table 5 indicate that all methods suffer a large performance decline under cold start splitting. However, SAGDTI still achieves the best performance against the other SoTA baselines, and, interestingly, its advantage is even more pronounced. According to the overall experimental results we obtained, it can be concluded that SAGDTI outperforms existing SoTA models in the task of DTI prediction. We attribute this improvement to the full consideration of the unique non-covalent intermolecular attributes at the atom level and the biological interaction attributes at the node level between drugs and targets.
Table 5. Results of cold start splitting on the mDavis dataset. Bold: best results.
Methods Remove 10% Remove 15%
 AUROC AUPR AUROC AUPR
DeepDTA 0.780 0.501 0.728 0.453
GraphDTA 0.778 0.526 0.732 0.511
Moltrans 0.805 0.564 0.773 0.547
MATTDTI 0.809 0.584 0.785 0.564
Co-VAE 0.816 0.605 0.796 0.585
SAGDTI 0.863 0.617 0.843 0.604
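The cold start splitting procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: each interaction is modeled as a `(drug, protein, label)` triple, and the toy pair list is invented for the example.

```python
import random

def cold_start_split(pairs, test_fraction, seed=0):
    """Hold out a fraction of DTI pairs, then purge every drug and protein
    that appears in the held-out pairs from the training set."""
    rng = random.Random(seed)
    test = rng.sample(pairs, int(len(pairs) * test_fraction))
    held_drugs = {d for d, p, _ in test}
    held_prots = {p for d, p, _ in test}
    train = [(d, p, y) for d, p, y in pairs
             if d not in held_drugs and p not in held_prots]
    return train, test

# Toy interaction list: (drug, protein, label).
pairs = [(f"d{i}", f"p{i % 5}", i % 2) for i in range(20)]
train, test = cold_start_split(pairs, test_fraction=0.10)

# No drug or protein in the training set also appears in the test set,
# so the model cannot succeed by memorizing entities it has already seen.
assert not ({d for d, _, _ in train} & {d for d, _, _ in test})
assert not ({p for _, p, _ in train} & {p for _, p, _ in test})
```

Because purging removes every pair touching a held-out drug or protein, the training set shrinks by more than the nominal 10% or 15%, which is part of why all methods decline under this split.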
3.5 Ablation study
To ensure stability and robustness, we also conducted a comprehensive ablation study on the contribution of the different modules in SAGDTI. AUROC and AUPR were applied to evaluate the effectiveness of five SAGDTI variants (see supplementary materials for detailed descriptions). For example, SAGDTI-II denotes the proposed model with the biological interaction information aggregation module removed. In this variant, we overlooked the attributes of topologies and diverse connections between drugs and targets and considered only the molecular attributes. Consequently, the performance of SAGDTI (without GATs) declined markedly (see Tables S3 and S4). In our work, we introduce multi-source input information for SAGDTI, and thus prior knowledge (e.g., the drug SMILES sequence, the protein 3D structure, and multi-scale drug-target interaction information) may be incorporated. To avoid an over-optimistic evaluation of the proposed method, we removed the different input elements accordingly. The performance of SAGDTI with different input information is shown in Table S5. Obviously, the performance of our proposed model declines to some extent without adequate input sources. However, SAGDTI still shows competitive performance in identifying the interactions between drugs and targets, even when the protein 3D structure and the multi-scale biological information are neglected simultaneously. Table S5 indicates that comprehensively considering the molecular and biological information is beneficial for learning high-level feature representations of drugs and targets and for improving prediction performance. These results also demonstrate that SAGDTI has good generalization ability.
3.6 Case study
Since December 2019, infection with the novel coronavirus (SARS-CoV-2) has rapidly spread worldwide, leading to a pandemic (Kirtipal et al., 2020). Hence, there is an urgent need to identify effective drugs for patients infected with SARS-CoV-2. We therefore conducted case studies on five SARS-CoV-2-related drugs (i.e., Remdesivir, Lopinavir, Budesonide, Dexamethasone and Aripiprazole) to further verify the capability of SAGDTI for determining possible DTIs. Each drug interacts with at least 12 different targets. Owing to space limitations, the top 10 candidate proteins identified by SAGDTI for each drug are listed in Table S6.
For brevity, we discuss below our findings for the drugs remdesivir and lopinavir. Remdesivir (a nucleoside analog) possesses antiviral activity, with EC50 values of 74 nM against SARS-CoV and MERS-CoV in HAE cells and 30 nM against murine hepatitis virus in delayed brain tumor cells (Olender et al., 2021). In animal and in vitro models, remdesivir has demonstrated activity against the viral pathogens of SARS and MERS, which are also coronaviruses and are structurally very similar to SARS-CoV-2 (Kokic et al., 2021). We ranked the top 10 targets verified to interact with remdesivir. The top two targets, the Replicase polyprotein 1ab (Rep) and the RNA-directed RNA polymerase L (L), happen to be encoded by the genome sequences of SARS-CoV-2-related viruses. Lopinavir is a protease inhibitor of human immunodeficiency virus (HIV)-1 and HIV-2. It is combined with ritonavir for the treatment of HIV infection (Rebello et al., 2018). Through our experiments, we identified a SARS-CoV-2-related target, the Pol polyprotein (Pol), that interacts with lopinavir. According to these results, our proposed prediction model can assist scientists in drug development and clinicians in real-life practice.
3.7 Discussion
We have demonstrated the advantageous performance of SAGDTI through evaluations on the BindingDB, Davis and KIBA datasets under three experimental settings and through the SARS-CoV-2 case study. The experimental results proved that our proposed SAGDTI model performs well in DTI prediction and may be useful in practical applications. However, some problems have not yet been explained clearly and some difficulties urgently need to be solved: for example, how to accurately quantify the binding affinity (interaction strength) when an interaction happens, where the docking position lies in 3D space, and whether docking postures matter to drug-target interactions in real-life situations (Wei et al., 2022). In this article, we treat the DTI prediction problem as a binary classification, which simply suggests the absence or presence of interactions. Therefore, we intend to expand our research to drug-target binding affinity and investigate the mechanisms through which the docking position and posture influence the results. We believe that our proposed model can perform well in practical DTI prediction and advance the modeling of drug-target binding affinities in the future.
4 Conclusion
In this article, we proposed an end-to-end attention-derived model, termed SAGDTI. The aim of this model is to capture the unique molecular attributes of drugs as well as targets and to aggregate the biological interaction information of drug-target pairs to predict DTIs. The proposed framework enables learning the SMILES sequence with the relative distances among atoms of a given drug and the 3D structural features of proteins from their binding sites at the atom level through the molecular transformer module. SAGDTI also allows the aggregation of neighboring topologies and diverse connections (i.e., interactions, associations and similarities) at the node level based on the GATs. Experimental results demonstrated that, in most cases, our proposed SAGDTI model could predict DTIs with superior performance compared with other advanced models (e.g., Co-VAE, MATTDTI, and AttentionSiteDTI) under the three experimental settings. Furthermore, we employed the trained model to predict the interactions of five SARS-CoV-2-related drugs with potential targets to assist scientists. Our model displayed SoTA performance, was highly interpretable based on the self-attention mechanism and performed well in a real-life case study. The limitations and advantages of SAGDTI have been fully discussed. We believe that this work can contribute to research on drug discovery as well as drug repurposing, and serve as a powerful tool in practical DTI prediction.
The authors declare no conflicts of interest.
Funding
The project is supported by the National Natural Science Foundation of China (Nos. 81273649, 61501132, 61672181, 62001144, 62202092, 62272135), the Natural Science Foundation of Heilongjiang Province (Nos. LH2019F049, LH2019A029), the China Postdoctoral Science Foundation (No. 2019M650069), the Research Funds for the Central Universities (No. 3072019CFT0603), the Fund for Young Innovation Team of Basic Scientific Research in Heilongjiang Province (No. RCYJTD201805), a fund from the China Scholarship Council (CSC), and the King Abdullah University of Science and Technology (KAUST) Office of Research Administration (ORA) under Award Nos. FCC/1/1976-44-01, FCC/1/1976-45-01, and REI/1/5234-01-01. The funding bodies played no role in the design of the study, the collection, analysis, and interpretation of data, or the writing of the manuscript.
References
Rifaioglu AS, Atas H, Martin MJ, et al. (2019) Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases. Brief Bioinform., 20(5), 1878-1912.
Ashburn TT, Thor KB. (2004) Drug repositioning: identifying and developing new uses for existing drugs. Nat Rev Drug Discov., 3(8), 673-683.
Zheng S, et al. (2020) Predicting drug protein interaction using quasi-visual question answering system. Nat. Mach. Intell., 2, 134-140.
Bagherian M, Sabeti E, Wang K, et al. (2021) Machine learning approaches and databases for prediction of drug-target interaction: a survey paper. Brief Bioinform., 22(1), 247-269.
Da Silva Rocha SFL, Olanda CG, et al. (2019) Virtual Screening Techniques in Drug Discovery: Review and Recent Applications. Curr Top Med Chem., 19(19), 1751-1767.
Maia EHB, Assis LC, de Oliveira TA, et al. (2020) Structure-Based Virtual Screening: From Classical to Artificial Intelligence. Front Chem., 8, 343.
Zhao Q, Yang M, Cheng Z, et al. (2022) Biomedical Data and Deep Learning Computational Models for Predicting Compound-Protein Relations. IEEE/ACM Trans Comput Biol Bioinform., 19(4), 2092-2110.
Śledź P, Caflisch A. (2018) Protein structure-based drug design: from docking to molecular dynamics. Curr Opin Struct Biol., 48, 93-102.
Csermely P, Korcsmáros T, Kiss HJM, et al. (2013) Structure and dynamics of molecular networks: a novel paradigm of drug discovery: a comprehensive review. Pharmacol Ther., 138(3), 333-408.
Ezzat A, Zhao P, Wu M, et al. (2017) Drug-Target Interaction Prediction with Graph Regularized Matrix Factorization. IEEE/ACM Trans Comput Biol Bioinform., 14(3), 646-656.
Liu Y, Wu M, Miao C, et al. (2016) Neighborhood Regularized Logistic Matrix Factorization for Drug-Target Interaction Prediction. PLoS Comput Biol, 12(2), e1004760.
Pahikkala T, Airola A, Pietilä S, et al. (2015) Toward more realistic drug-target interaction predictions. Brief Bioinform., 16(2), 325-337.
Hazra A, et al. (2021) Recent advances in deep learning techniques and its applications: an overview. In: Rizvanov AA, et al. (eds) Advances in Biomedical Engineering and Technology, pp. 103-122.
Bharath Ramsundar PE, Walters P, Pande V. (2019) Deep Learning for the Life Sciences: Applying Deep Learning to Genomics, Microscopy, Drug Discovery, and More.
Ru X, Ye X, Sakurai T, et al. (2021) Current status and future prospects of drug-target interaction prediction. Brief Funct Genomics., 20(5), 312-322.
Öztürk H, Özgür A, Ozkirimli E. (2018) DeepDTA: deep drug-target binding affinity prediction. Bioinformatics., 34(17), i821-i829.
Shin B, et al. (2019) Self-attention based molecule representation for predicting drug-target interaction. In: Proceedings of the 4th Machine Learning for Healthcare Conference, Ann Arbor, Michigan, USA, Vol. 106, pp. 230-248.
Huang K, Xiao C, Glass LM, et al. (2021) MolTrans: Molecular Interaction Transformer for drug-target interaction prediction. Bioinformatics., 37(6), 830-836.
Nguyen T, Le H, Quinn TP, et al. (2021) GraphDTA: predicting drug-target binding affinity with graph neural networks. Bioinformatics., 37(8), 1140-1147.
Liu S, Wang Y, Deng Y, et al. (2022) Improved drug-target interaction prediction with intermolecular graph transformer. Brief Bioinform., 23(5), bbac162.
Jiang M, Li Z, Zhang S, et al. (2020) Drug-target affinity prediction using graph neural network and contact maps. RSC Adv., 10(35), 20701-20712.
Tradigo G. (2013) Protein Contact Maps. New York, NY: Springer New York, 1771-1773.
Xuan P, Zhang X, Zhang Y, et al. (2022) Multi-type neighbors enhanced global topology and pairwise attribute learning for drug-protein interaction prediction. Brief Bioinform., 23(5), bbac120.
Xuan P, Fan M, Cui H, et al. (2022) GVDTI: graph convolutional and variational autoencoders with attribute-level attention for drug-protein interaction prediction. Brief Bioinform., 23(1), bbab453.
Zeng Y, Chen X, Luo Y, et al. (2021) Deep drug-target binding affinity prediction with multiple attention blocks. Brief Bioinform., 22(5), bbab117.
Yazdani-Jahromi M, Yousefi N, Tayebi A, et al. (2022) AttentionSiteDTI: an interpretable graph-based model for drug-target interaction prediction using NLP sentence-level relation classification. Brief Bioinform., 23(4), bbac272.
Li T, Zhao XM, Li L. (2022) Co-VAE: Drug-Target Binding Affinity Prediction by Co-Regularized Variational Autoencoders. IEEE Trans Pattern Anal Mach Intell., 44(12), 8861-8873.
Vaswani A, Shazeer N, Parmar N, et al. (2017) Attention is all you need. In: Advances in Neural Information Processing Systems, NIPS, 5998-6008.
Shaw P, Uszkoreit J, Vaswani A. (2018) Self-attention with relative position representations. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT, New Orleans, Louisiana, USA, June 1-6, 464-468.
Berman HM, Westbrook J, Feng Z, e