Optimization of binding affinities in chemical space with generative pretrained transformer and deep reinforcement learning

(1)

Optimization of binding affinities in chemical space with generative pretrained transformer and deep reinforcement learning

Item Type Preprint

Authors Xu, Xiaopeng;Zhou, Juexiao;Zhu, Chen;Zhan, Qing;Li, Zhongxiao;Zhang, Ruochi;Wang, Yu;Liao, Xingyu;Gao, Xin Citation Xu, X., Zhou, J., Zhu, C., Zhan, Q., Li, Z., Zhang, R., Wang, Y.,

Liao, X., & Gao, X. (2023). Optimization of binding affinities in chemical space with generative pretrained transformer and deep reinforcement learning. https://doi.org/10.26434/

chemrxiv-2023-7v4sw Eprint version Pre-print

DOI 10.26434/chemrxiv-2023-7v4sw

Publisher American Chemical Society (ACS)

Rights This is a preprint version of a paper and has not been peer reviewed. Archived with thanks to American Chemical Society (ACS) under a Creative Commons license, details at: https://

creativecommons.org/licenses/by-nc/4.0/

Download date 2023-11-02 02:13:31

Item License https://creativecommons.org/licenses/by-nc/4.0/

Link to Item http://hdl.handle.net/10754/690881

(2)

Optimization of binding affinities in chemical space with transformer and deep reinforcement learning

Xiaopeng Xu

^1,2

, Juexiao Zhou

^1,2

, Chen Zhu

³

, Qing Zhan

^1,2

, Zhongxiao Li

^1,2

, Ruochi Zhang

⁴

, Yu Wang

⁴

, Xingyu Liao

^1,2

, and Xin Gao

^∗,1,2

1Computer Science Program, Computer, Electrical and Mathematical Science and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia

2Computational Bioscience Research Center (CBRC), KAUST, Thuwal, 23955-6900, Saudi Arabia

3KAUST Catalysis Center (KCC), KAUST, Thuwal, 23955-6900, Saudi Arabia

4Syneron Technology, Guangzhou, China

∗Corresponding author: [email protected].

1 Evaluation of SGPT-RL on distribution learning

SGPT-RL and Reinvent prior models were evaluated on the Moses Benchmark. The results of the Moses metrics are shown in Supplementary Table 1. We can see that the SGPT-RL prior model learned a good validity and novelty on Moses benchmark.

Supplementary Table 1: Comparison of models on the Moses benchmark

Model Validity Uniqueness SNN IntDiv Novelty

Baseline 1.000 1.000 0.642 0.856 0.000

CharRNN 0.975 0.999 0.602 0.856 0.842

VAE 0.977 0.998 0.626 0.856 0.695

AAE 0.937 0.997 0.608 0.856 0.793

JT-VAE 1.000 1.000 0.548 0.855 0.914

LatentGAN 0.897 0.997 0.537 0.857 0.950

MCMG 0.886 - 0.427 0.835 0.983

MolGPT 1.000 1.000 0.529 0.871 0.931

Reinvent prior 0.986 1.000 0.619 0.856 0.783

SGPT-RL prior 0.936 0.997 0.563 0.856 0.946

(3)

The property distributions of molecules generated by the prior models and training references were plotted as shown in Supplementary Figure 1. The SGPT-RL and the Reinvent prior models follow the same distributions as the training reference.

0.0 0.2 0.4 0.6 0.8 1.0

DRD2 activity 0.0

0.2 0.4 0.6 0.8

Density

a

Training ref.

Reinvent prior SGPT-RL prior

12 10 8 6 4 2

ACE2 docking score 0.00

0.02 0.04 0.06 0.08 0.10

Density

b

Training ref.

0.0 0.2 0.4 0.6 0.8 1.0

QED 0.00

0.02 0.04 0.06 0.08 0.10

Density

c

Training ref.

0.0 0.2 0.4 0.6 0.8 1.0

SAscore 0.000

0.025 0.050 0.075 0.100 0.125 0.150 0.175

Density

d

Training ref.

10 20 30 40 50 60 70 80

Length 0.00

0.02 0.04 0.06 0.08 0.10 0.12

Density

e

Training ref.

200 250 300 350 400

Weight 0.000

0.005 0.010 0.015 0.020 0.025

Density

f

Training ref.

Supplementary Figure 1: Property distribution of molecules generated by prior models. 10,000 molecules sampled from training dataset were used as a reference. Curves of SGPT-RL and Reinvent prior samples are shown with shading. Curves of training references are shown without shading. We can see that molecules from the three sources follow similar property distributions.

The distribution of ring counts were analyzed as shown in Supplementary Figure 2. The prior models follow the same distribution as the training reference.

0 1 2 3 4 5 6 7

Ring counts 0.0

0.1 0.2 0.3 0.4 0.5 0.6 0.7

Density

a Training ref.

SGPT-RL prior

0 1 2 3 4 5 6 7

Ring counts 0.0

0.1 0.2 0.3 0.4 0.5 0.6 0.7

Density

b Training ref.

Reinvent prior

Supplementary Figure 2: Ring count distribution of molecules generated by the SGPT-RL and Reinvent prior models. 10,000 molecules sampled from the training data were used as a reference. The molecules generated by SGPT-RL and the Reinvent prior model follow the same distributions as the training reference.

2

(4)

1.1 Scaffold clustering threshold selection

Analysis of different distance cutoffs for scaffold clustering is shown in Supplementary Figure 3. A cutoff of 0.2 can be a good threshold as the curve in Supplementary Subfigure 3a is becoming flat before 0.2 and most of the clusters contain only a single scaffold using a cutoff of 0.2.

0.0 0.2 0.4 0.6 0.8

Cutoff 0

2500 5000 7500 10000 12500 15000 17500

Number of clusters

a

Number of clusters for different distance cutoffs

0 2500 5000 7500 10000 12500 15000 17500

Cluster index 1

2 3 4 5 6

Number of scaffolds

b

Threshold: 0.2

Supplementary Figure 3: Cutoffs for scaffold clustering. a) The number of clusters given a cutoff range from 0 to 0.9. A cutoff of 0.2 can be a good threshold as the curve is becoming flat before 0.2. b) The number of scaffolds for each cluster when using a cutoff of 0.2. Most of the clusters contain only a single scaffold using a cutoff of 0.2.

(5)

2 Hyper-parameter tuning of SGPT-RL

The hyper-parameter of SGPT-RL was fine-tuned to find the suitable value for goal-directed generation tasks. The main hyper-parameter, sigma, was evaluated as described below.

The step curves of different sigma values are shown in Supplementary Figure 4. From Supplementary Figure 4a, we can see that with larger sigma, the agents achieved a better validity after 100 steps. Besides, from Supplementary Figure 4b, we can see that larger sigma value also enabled generating more active molecules in the initial steps.

0 20 40 60 80 100

Step 0.75

0.80 0.85 0.90 0.95 1.00

Validity

a

Sigma = 60

Sigma = 600 Sigma = 6000

0 20 40 60 80 100

Step 0.0

0.2 0.4 0.6 0.8 1.0

DRD2 activity

b

Sigma = 60

Sigma = 600 Sigma = 6000

Supplementary Figure 4: Learning curves of different sigma values in SGPT-RL on the DRD2 task However, when comparing the agents trained after 3000 steps on Moses metrics, we found larger sigma values lead to poorer uniqueness and internal diversity of generated molecules, as shown in Supplementary Table 2. This was probably caused by mode collapse during training the agent. As good diversity was preferred in our study, a sigma value of 60 was chosen for the SGPT-RL agent.

Supplementary Table 2: Moses metrics of different sigma values in SGPT-RL on the DRD2 task

Sigma Validity Uniqueness SNN IntDiv Novelty

60 0.998 0.933 0.515 0.683 0.995

120 0.998 0.797 0.486 0.631 0.999

600 0.999 0.334 0.340 0.481 1.000

1920 1.000 0.257 0.392 0.474 1.000

4

(6)

3 Comparison of SGPT-RL and Reinvent on the DRD2 task

The property distributions of agents trained after the final step are shown in Supplementary Figure 5. Both SGPT-RL and Reinvent agents were able to generate molecules with high DRD2 activities.

0.0 0.2 0.4 0.6 0.8 1.0

DRD2 activity 0.0

0.2 0.4 0.6 0.8

Density

a

Training ref.

Reinvent SGPT-RL

0.0 0.2 0.4 0.6 0.8 1.0

QED 0.00

0.02 0.04 0.06 0.08 0.10 0.12

Density

b

Training ref.

Reinvent SGPT-RL

0.0 0.2 0.4 0.6 0.8 1.0

SAscore 0.000

0.025 0.050 0.075 0.100 0.125 0.150 0.175

Density

c

Training ref.

Reinvent SGPT-RL

6 4 2 0 2 4 6

LogP 0.00

0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08

Density

d

Training ref.

Reinvent SGPT-RL

10 20 30 40 50 60

Length 0.000

0.025 0.050 0.075 0.100 0.125 0.150 0.175 0.200

Density

e

Training ref.

Reinvent SGPT-RL

100 200 300 400

Weight 0.00

0.02 0.04 0.06 0.08

Density

f

Training ref.

Reinvent SGPT-RL

Supplementary Figure 5: Property distributions of molecules generated by agent models on the DRD2 task. 10,000 molecules are sampled from training dataset to be used as the reference. Both SGPT-RL and Reinvent agents were able to generate molecules with high DRD2 activities.

(7)

Supplementary Figure 6 shows the six top scoring molecules generated by the agents in the 1,000 th step.

SGPT-RL generated molecules are more similar to each other comparing to Reinvent generated molecules.

SGPT-RL

Reinvent

Act.: 1.0 Act.: 1.0 Act.: 1.0

F HN

HO N

O HO NH

Cl

NH₂

F NH HO

HO F

F NH

HN

HN NH HO

OH OH

Cl NH

Act.: 1.0 Act.: 1.0 Act.: 1.0

HO Cl

NH HN HO

NH₂ HO

HO N

NH₂ N N

HO

F N

N

OH F

NH HO

OH

NH F

Act.: 1.0 Act.: 1.0 Act.: 1.0

Supplementary Figure 6: Top scoring molecules generated by the agents in the 1,000th step of the DRD2 task. SGPT-RL generated molecules are more similar to each other comparing to Reinvent generated molecules.

6

(8)

4 Structure-based molecular generation with ACE2 as the target

Supplementary Figure 7 shows the molecules generated by the Reinvent agent in the initial steps on the ACE2 task. This agent was randomly exploring the sequences, with no clear patterns observed.

STEP 1

STEP 11

STEP 21

STEP 31

STEP 41

DS: -10.1 DS: -7.6 DS: -8.1 DS: -9.7

Reinvent generated molecules

O HN

O NH N

S

N NN S O

S O N

N O Cl N Cl

F NN N

NH2 N N

S HN

O H

N O

N N O

O NH O

N N N

H O O

O Cl

O O

N O HN NN

NH2

Cl

F FF

O

N NH2

O

NH2 S

OH H N

O O

O HN

O

N N O

ON N N O

HN

N O O HN

N HN O O FF F

N O

NH O O

N N O

HN O

F F N HN

NH O N N N

N

O N Cl

S O O

O

N

N N N O O

NH O HN N

DS: -8.4 DS: -6.7 DS: -7.9 DS: -8.8

DS: -9.0 DS: -9.3 DS: -10.0 DS: -9.1

DS: -8.7 DS: -9.1 DS: -8.4 DS: -9.7

DS: -9.4 DS: -9.3 DS: -9.8 DS: -8.5

Supplementary Figure 7: Reinvent generated molecules in the initial steps of the ACE2 task. This model was randomly exploring the sequences, with no clear patterns observed.

(9)

The property distributions of molecules generated by the agents and the training reference on the ACE2 task are shown in Supplementary Figure 8. SGPT-RL shifted the distribution towards better docking scores.

14 12 10 8 6 4 2

ACE2 docking score 0.00

0.02 0.04 0.06 0.08 0.10 0.12

Density

a

Training ref.

Reinvent SGPT-RL

0.0 0.2 0.4 0.6 0.8 1.0

QED 0.00

0.02 0.04 0.06 0.08 0.10 0.12 0.14

Density

b

Training ref.

Reinvent SGPT-RL

0.0 0.2 0.4 0.6 0.8 1.0

SAscore 0.00

0.05 0.10 0.15 0.20

Density

c

Training ref.

Reinvent SGPT-RL

2 0 2 4 6

LogP 0.00

0.01 0.02 0.03 0.04

Density

d

Training ref.

Reinvent SGPT-RL

10 20 30 40 50 60 70

Length 0.00

0.02 0.04 0.06 0.08 0.10 0.12 0.14 0.16

Density

e

Training ref.

Reinvent SGPT-RL

150 200 250 300 350 400 450 Weight

0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07

Density

f

Training ref.

Reinvent SGPT-RL

Supplementary Figure 8: Property distributions of molecules generated by the agent models on the ACE2 task. 10,000 molecules from the training dataset were evaluated as a reference. SGPT-RL shifted the distribution towards better docking scores.

8

(10)

Supplementary Figure 9 shows the six top scoring molecules generated by the agents in the last step of the ACE2 task. SGPT-RL generated molecules are more similar to each other comparing to Reinvent generated ones.

SGPT-RL

Reinvent

DS: -12.8 DS: -12.4 DS: -12.4

DS: -12.2 DS: -12.1 DS: -12.1

DS: -11.9 DS: -11.2 DS: -10.9

DS: -10.7 DS: -10.6 DS: -10.5

NH O N HN

O

O N H₂N

HN

NN O N N

O

H₂N

O N

N

N N HN

O O

N O N

O O HN O

O

N N O N N O

NH

O O NH₂

N N O

O N N N

O NH S NH O NH N N N

Cl NN O NH O

N N

F O

NH O

N

F H O

N O N O

F F

O

NH N H O

NH F

N NH O NH HN

O

Supplementary Figure 9: Top scoring molecules generated in the last step of the ACE2 task. SGPT- RL generated molecules are more similar to each other comparing to Reinvent generated ones.