Optimization of binding affinities in chemical space with generative pretrained transformer and deep reinforcement learning
Item Type Preprint
Authors Xu, Xiaopeng;Zhou, Juexiao;Zhu, Chen;Zhan, Qing;Li, Zhongxiao;Zhang, Ruochi;Wang, Yu;Liao, Xingyu;Gao, Xin Citation Xu, X., Zhou, J., Zhu, C., Zhan, Q., Li, Z., Zhang, R., Wang, Y.,
Liao, X., & Gao, X. (2023). Optimization of binding affinities in chemical space with generative pretrained transformer and deep reinforcement learning. https://doi.org/10.26434/
chemrxiv-2023-7v4sw Eprint version Pre-print
DOI 10.26434/chemrxiv-2023-7v4sw
Publisher American Chemical Society (ACS)
Rights This is a preprint version of a paper and has not been peer reviewed. Archived with thanks to American Chemical Society (ACS) under a Creative Commons license, details at: https://
creativecommons.org/licenses/by-nc/4.0/
Download date 2023-11-02 02:13:31
Item License https://creativecommons.org/licenses/by-nc/4.0/
Link to Item http://hdl.handle.net/10754/690881
Optimization of binding affinities in chemical space with transformer and deep reinforcement learning
Xiaopeng Xu
1,2, Juexiao Zhou
1,2, Chen Zhu
3, Qing Zhan
1,2, Zhongxiao Li
1,2, Ruochi Zhang
4, Yu Wang
4, Xingyu Liao
1,2, and Xin Gao
∗,1,21Computer Science Program, Computer, Electrical and Mathematical Science and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
2Computational Bioscience Research Center (CBRC), KAUST, Thuwal, 23955-6900, Saudi Arabia
3KAUST Catalysis Center (KCC), KAUST, Thuwal, 23955-6900, Saudi Arabia
4Syneron Technology, Guangzhou, China
∗Corresponding author: [email protected].
1 Evaluation of SGPT-RL on distribution learning
SGPT-RL and Reinvent prior models were evaluated on the Moses Benchmark. The results of the Moses metrics are shown in Supplementary Table 1. We can see that the SGPT-RL prior model learned a good validity and novelty on Moses benchmark.
Supplementary Table 1: Comparison of models on the Moses benchmark
Model Validity Uniqueness SNN IntDiv Novelty
Baseline 1.000 1.000 0.642 0.856 0.000
CharRNN 0.975 0.999 0.602 0.856 0.842
VAE 0.977 0.998 0.626 0.856 0.695
AAE 0.937 0.997 0.608 0.856 0.793
JT-VAE 1.000 1.000 0.548 0.855 0.914
LatentGAN 0.897 0.997 0.537 0.857 0.950
MCMG 0.886 - 0.427 0.835 0.983
MolGPT 1.000 1.000 0.529 0.871 0.931
Reinvent prior 0.986 1.000 0.619 0.856 0.783
SGPT-RL prior 0.936 0.997 0.563 0.856 0.946
The property distributions of molecules generated by the prior models and training references were plotted as shown in Supplementary Figure 1. The SGPT-RL and the Reinvent prior models follow the same distributions as the training reference.
0.0 0.2 0.4 0.6 0.8 1.0
DRD2 activity 0.0
0.2 0.4 0.6 0.8
Density
a
Training ref.Reinvent prior SGPT-RL prior
12 10 8 6 4 2
ACE2 docking score 0.00
0.02 0.04 0.06 0.08 0.10
Density
b
Training ref.Reinvent prior SGPT-RL prior
0.0 0.2 0.4 0.6 0.8 1.0
QED 0.00
0.02 0.04 0.06 0.08 0.10
Density
c
Training ref.Reinvent prior SGPT-RL prior
0.0 0.2 0.4 0.6 0.8 1.0
SAscore 0.000
0.025 0.050 0.075 0.100 0.125 0.150 0.175
Density
d
Training ref.Reinvent prior SGPT-RL prior
10 20 30 40 50 60 70 80
Length 0.00
0.02 0.04 0.06 0.08 0.10 0.12
Density
e
Training ref.Reinvent prior SGPT-RL prior
200 250 300 350 400
Weight 0.000
0.005 0.010 0.015 0.020 0.025
Density
f
Training ref.Reinvent prior SGPT-RL prior
Supplementary Figure 1: Property distribution of molecules generated by prior models. 10,000 molecules sampled from training dataset were used as a reference. Curves of SGPT-RL and Reinvent prior samples are shown with shading. Curves of training references are shown without shading. We can see that molecules from the three sources follow similar property distributions.
The distribution of ring counts were analyzed as shown in Supplementary Figure 2. The prior models follow the same distribution as the training reference.
0 1 2 3 4 5 6 7
Ring counts 0.0
0.1 0.2 0.3 0.4 0.5 0.6 0.7
Density
a Training ref.
SGPT-RL prior
0 1 2 3 4 5 6 7
Ring counts 0.0
0.1 0.2 0.3 0.4 0.5 0.6 0.7
Density
b Training ref.
Reinvent prior
Supplementary Figure 2: Ring count distribution of molecules generated by the SGPT-RL and Reinvent prior models. 10,000 molecules sampled from the training data were used as a reference. The molecules generated by SGPT-RL and the Reinvent prior model follow the same distributions as the training reference.
2
1.1 Scaffold clustering threshold selection
Analysis of different distance cutoffs for scaffold clustering is shown in Supplementary Figure 3. A cutoff of 0.2 can be a good threshold as the curve in Supplementary Subfigure 3a is becoming flat before 0.2 and most of the clusters contain only a single scaffold using a cutoff of 0.2.
0.0 0.2 0.4 0.6 0.8
Cutoff 0
2500 5000 7500 10000 12500 15000 17500
Number of clusters
a
Number of clusters for different distance cutoffs0 2500 5000 7500 10000 12500 15000 17500
Cluster index 1
2 3 4 5 6
Number of scaffolds
b
Threshold: 0.2Supplementary Figure 3: Cutoffs for scaffold clustering. a) The number of clusters given a cutoff range from 0 to 0.9. A cutoff of 0.2 can be a good threshold as the curve is becoming flat before 0.2. b) The number of scaffolds for each cluster when using a cutoff of 0.2. Most of the clusters contain only a single scaffold using a cutoff of 0.2.
2 Hyper-parameter tuning of SGPT-RL
The hyper-parameter of SGPT-RL was fine-tuned to find the suitable value for goal-directed generation tasks. The main hyper-parameter, sigma, was evaluated as described below.
The step curves of different sigma values are shown in Supplementary Figure 4. From Supplementary Figure 4a, we can see that with larger sigma, the agents achieved a better validity after 100 steps. Besides, from Supplementary Figure 4b, we can see that larger sigma value also enabled generating more active molecules in the initial steps.
0 20 40 60 80 100
Step 0.75
0.80 0.85 0.90 0.95 1.00
Validity
a
Sigma = 60Sigma = 600 Sigma = 6000
0 20 40 60 80 100
Step 0.0
0.2 0.4 0.6 0.8 1.0
DRD2 activity
b
Sigma = 60Sigma = 600 Sigma = 6000
Supplementary Figure 4: Learning curves of different sigma values in SGPT-RL on the DRD2 task However, when comparing the agents trained after 3000 steps on Moses metrics, we found larger sigma values lead to poorer uniqueness and internal diversity of generated molecules, as shown in Supplementary Table 2. This was probably caused by mode collapse during training the agent. As good diversity was preferred in our study, a sigma value of 60 was chosen for the SGPT-RL agent.
Supplementary Table 2: Moses metrics of different sigma values in SGPT-RL on the DRD2 task
Sigma Validity Uniqueness SNN IntDiv Novelty
60 0.998 0.933 0.515 0.683 0.995
120 0.998 0.797 0.486 0.631 0.999
600 0.999 0.334 0.340 0.481 1.000
1920 1.000 0.257 0.392 0.474 1.000
4
3 Comparison of SGPT-RL and Reinvent on the DRD2 task
The property distributions of agents trained after the final step are shown in Supplementary Figure 5. Both SGPT-RL and Reinvent agents were able to generate molecules with high DRD2 activities.
0.0 0.2 0.4 0.6 0.8 1.0
DRD2 activity 0.0
0.2 0.4 0.6 0.8
Density
a
Training ref.Reinvent SGPT-RL
0.0 0.2 0.4 0.6 0.8 1.0
QED 0.00
0.02 0.04 0.06 0.08 0.10 0.12
Density
b
Training ref.Reinvent SGPT-RL
0.0 0.2 0.4 0.6 0.8 1.0
SAscore 0.000
0.025 0.050 0.075 0.100 0.125 0.150 0.175
Density
c
Training ref.Reinvent SGPT-RL
6 4 2 0 2 4 6
LogP 0.00
0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08
Density
d
Training ref.Reinvent SGPT-RL
10 20 30 40 50 60
Length 0.000
0.025 0.050 0.075 0.100 0.125 0.150 0.175 0.200
Density
e
Training ref.Reinvent SGPT-RL
100 200 300 400
Weight 0.00
0.02 0.04 0.06 0.08
Density
f
Training ref.Reinvent SGPT-RL
Supplementary Figure 5: Property distributions of molecules generated by agent models on the DRD2 task. 10,000 molecules are sampled from training dataset to be used as the reference. Both SGPT-RL and Reinvent agents were able to generate molecules with high DRD2 activities.
Supplementary Figure 6 shows the six top scoring molecules generated by the agents in the 1,000 th step.
SGPT-RL generated molecules are more similar to each other comparing to Reinvent generated molecules.
SGPT-RL
Reinvent
Act.: 1.0 Act.: 1.0 Act.: 1.0
F HN
HO N
O HO NH
Cl
NH2
F NH HO
HO F
F NH
HN
HN NH HO
OH OH
Cl NH
Act.: 1.0 Act.: 1.0 Act.: 1.0
HO Cl
NH HN HO
NH2 HO
HO N
NH2 N N
HO
HO
F N
N
OH F
NH HO
OH
OH
NH F
Act.: 1.0 Act.: 1.0 Act.: 1.0
Act.: 1.0 Act.: 1.0 Act.: 1.0
Supplementary Figure 6: Top scoring molecules generated by the agents in the 1,000th step of the DRD2 task. SGPT-RL generated molecules are more similar to each other comparing to Reinvent generated molecules.
6
4 Structure-based molecular generation with ACE2 as the target
Supplementary Figure 7 shows the molecules generated by the Reinvent agent in the initial steps on the ACE2 task. This agent was randomly exploring the sequences, with no clear patterns observed.
STEP 1
STEP 11
STEP 21
STEP 31
STEP 41
DS: -10.1 DS: -7.6 DS: -8.1 DS: -9.7
Reinvent generated molecules
O HN
O NH N
S
N NN S O
S O N
N O Cl N Cl
F NN N
NH2 N N
S HN
O H
N O
N N O
O NH O
N N N
H O O
O Cl
O O
N O HN NN
NH2
Cl
F FF
O
N NH2
O
NH2 S
OH H N
O O
O O
O HN
O
N N O
ON N N O
HN
N O O HN
N HN O O FF F
N O
NH O O
N N O
HN O
F F N HN
NH O N N N
N
O N Cl
S O O
O
N
N N N O O
NH O HN N
DS: -8.4 DS: -6.7 DS: -7.9 DS: -8.8
DS: -9.0 DS: -9.3 DS: -10.0 DS: -9.1
DS: -8.7 DS: -9.1 DS: -8.4 DS: -9.7
DS: -9.4 DS: -9.3 DS: -9.8 DS: -8.5
Supplementary Figure 7: Reinvent generated molecules in the initial steps of the ACE2 task. This model was randomly exploring the sequences, with no clear patterns observed.
The property distributions of molecules generated by the agents and the training reference on the ACE2 task are shown in Supplementary Figure 8. SGPT-RL shifted the distribution towards better docking scores.
14 12 10 8 6 4 2
ACE2 docking score 0.00
0.02 0.04 0.06 0.08 0.10 0.12
Density
a
Training ref.Reinvent SGPT-RL
0.0 0.2 0.4 0.6 0.8 1.0
QED 0.00
0.02 0.04 0.06 0.08 0.10 0.12 0.14
Density
b
Training ref.Reinvent SGPT-RL
0.0 0.2 0.4 0.6 0.8 1.0
SAscore 0.00
0.05 0.10 0.15 0.20
Density
c
Training ref.Reinvent SGPT-RL
2 0 2 4 6
LogP 0.00
0.01 0.02 0.03 0.04
Density
d
Training ref.Reinvent SGPT-RL
10 20 30 40 50 60 70
Length 0.00
0.02 0.04 0.06 0.08 0.10 0.12 0.14 0.16
Density
e
Training ref.Reinvent SGPT-RL
150 200 250 300 350 400 450 Weight
0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07
Density
f
Training ref.Reinvent SGPT-RL
Supplementary Figure 8: Property distributions of molecules generated by the agent models on the ACE2 task. 10,000 molecules from the training dataset were evaluated as a reference. SGPT-RL shifted the distribution towards better docking scores.
8
Supplementary Figure 9 shows the six top scoring molecules generated by the agents in the last step of the ACE2 task. SGPT-RL generated molecules are more similar to each other comparing to Reinvent generated ones.
SGPT-RL
Reinvent
DS: -12.8 DS: -12.4 DS: -12.4
DS: -12.2 DS: -12.1 DS: -12.1
DS: -11.9 DS: -11.2 DS: -10.9
DS: -10.7 DS: -10.6 DS: -10.5
NH O N HN
O
O N H2N
HN
NN O N N
O
H2N
O N
N
N N HN
O O
N O N
O O HN O
O
O
N N O N N O
NH
O O NH2
N N O
O N N N
O NH S NH O NH N N N
Cl NN O NH O
N N
F O
NH O
N
F H O
N O N O
F F
O
NH N H O
NH F
N NH O NH HN
O
O
Supplementary Figure 9: Top scoring molecules generated in the last step of the ACE2 task. SGPT- RL generated molecules are more similar to each other comparing to Reinvent generated ones.