
Chapter IV: Multiscale Equivariant Score-based Generative Modeling for

4.2 Method

We assume the model inputs are a receptor protein backbone template containing the amino acid sequence $\mathbf{s}$ and (N, C$\alpha$, C) atomic coordinates $\tilde{\mathbf{x}} \in \mathbb{R}^{n_{\mathrm{res}} \times 3 \times 3}$, and a set of ligand molecular graphs $\{\mathcal{G}_k\}_{k=1}^{K}$ containing atom/bond types and stereochemistry labels (e.g., tetrahedral or E/Z isomerism [208]). We aim to sample $(\mathbf{x}, \mathbf{y}) \sim q_{\phi}(\cdot \mid \mathbf{s}, \tilde{\mathbf{x}}, \{\mathcal{G}\})$ from a generative model $q_{\phi}$, where $\mathbf{x} \in \mathbb{R}^{n \times 3}$ denotes the predicted 3D heavy-atom coordinates of the protein and $\mathbf{y} \in \mathbb{R}^{m \times 3}$ those of the ligands. This can be understood as a conditional generative modeling problem for partially observed systems.

NeuralPLexer adopts a two-stage architecture for protein-ligand structure prediction (Figure 4.1a). The input protein backbone template and molecular graphs are first encoded and passed into a contact predictor that iteratively samples binding-interface spatial proximity distributions for each ligand in $\{\mathcal{G}\}$; the output contact map parameterizes the geometry prior, a finite-time marginal of a designed SDE that progressively injects structured noise into the data distribution. An equivariant structure diffusion module (ESDM) then jointly generates 3D protein and ligand structures by denoising the atomic coordinates sampled from the geometry prior through a learned reverse-time SDE (Figure 4.1b).

Figure 4.1: NeuralPLexer enables protein-ligand complex structure prediction with full receptor flexibility. (a) Method overview. (b) Sampling from NeuralPLexer. The protein (colored red-blue from N- to C-terminus) and ligand (colored grey) 3D structures are jointly generated from a learned SDE, with a partially-diffused initial state $q_{T^*}$ approximated by the protein backbone template and predicted interface contact maps. (c-e) Key elements of the NeuralPLexer technical design. (c) Ligand molecules and monomeric entities are encoded as a collection of atoms, local coordinate frames (depicted as semi-transparent triangles), and stereospecific pairwise embeddings (depicted as dashed lines) representing their interactions. (d) The forward-time SDE introduces relative drift terms among protein C$\alpha$ atoms, non-C$\alpha$ atoms, and ligand atoms, such that the SDE erases local-scale details at $t = T^*$ to enable resampling from a noise distribution. (e) Information flow in the equivariant structure diffusion module (ESDM). ESDM operates on a heterogeneous graph formed by protein atoms (P), ligand atoms (L), protein backbone frames (B), and ligand local frames (F) to predict clean atomic coordinates $\hat{\mathbf{x}}_0, \hat{\mathbf{y}}_0$ using the coordinates at a finite diffusion time $t > 0$.

Protein-ligand structure generation with biophysics-informed diffusion processes

Diffusion models [201] introduce a forward SDE that diffuses data into a noised distribution and a neural-network-parameterized reverse-time SDE that generates data by reversing the noising process. To motivate the design principles for our biomolecular structure generator, we first consider a general class of linear SDEs known as the multivariate Ornstein-Uhlenbeck (OU) process [209] for a point cloud $\mathbf{Z} \in \mathbb{R}^{N \times 3}$:

$$\mathrm{d}\mathbf{Z}_t = -\boldsymbol{\Theta} \mathbf{Z}_t \, \mathrm{d}t + \sigma \, \mathrm{d}\mathbf{W}_t \qquad (4.1)$$

where $\boldsymbol{\Theta} \in \mathbb{R}^{N \times N}$ is an invertible matrix of affine drift coefficients and $\mathbf{W}_t$ is a standard $3N$-dimensional Wiener process. The forward noising SDEs used in standard diffusion models [210, 211] can be recovered by setting $\boldsymbol{\Theta} = \theta \mathbf{I}$, converging to an isotropic Gaussian prior distribution in the $t \to \infty$ limit (often expressed as $t \to 1$ with reparameterized $t$ [212]). In contrast, we design a multivariate SDE with a data-dependent drift matrix $\boldsymbol{\Theta}(\mathbf{Z}_0)$ and truncate the SDE at $t = T^* < \infty$ such that the final state of the forward noising process is a partially-diffused, structured distribution $q_{T^*}$ that can be well approximated by a coarse-scale model. We propose a set of SDEs depicted in Figure 4.1d and detailed in Table 4.1, with separate lengthscale parameters $\sigma_1, \sigma_2$ such that the forward diffusion process erases residue-scale local details but retains global information about protein domain packing and ligand binding interfaces, yielding the following time-dependent transition kernels:

π‘žπ‘‘ xC𝛼(𝑑) |x(0),y(0)

=N xC𝛼(0);𝜎2

1𝜏˜I

(4.2) π‘žπ‘‘ xnonC𝛼(𝑑) βˆ’xC𝛼(𝑑) |x(0),y(0)

=N π‘’βˆ’πœΛœ xnonC𝛼(0) βˆ’xC𝛼(0)

; 2𝜎2

1(1βˆ’π‘’βˆ’2 ˜𝜏)I (4.3) π‘žπ‘‘ y(𝑑) βˆ’cTxC𝛼(𝑑) |x(0),y(0)

=N π‘’βˆ’πœΛœ y(0) βˆ’cTxC𝛼(0)

;𝜎2

1(1βˆ’π‘’βˆ’2 ˜𝜏) (I+cTc) (4.4) where we use an exponential schedule ˜𝜏 = (𝜎2

min/𝜎2

1)𝑒𝑑 with truncation π‘‡βˆ— = 2 log(𝜎2/𝜎min). c is a softmax-transformed contact map as detailed in Sec. 4.2, which attracts the diffused ligand coordinates y(𝑑) towards binding interface C𝛼 atoms while preserving SE(3)-equivariance. We choose𝜎1 =2.0 Γ… to match the average radius of standard amino acids with task-specific𝜎2> 𝜎1such that at𝑑 =π‘‡βˆ—: (a) the terms involvingxnonC𝛼(0) andy(0) approximately vanishes thus are set to zeros to initialize the reverse-time SDE, and (b) the C𝛼-atom coordinate marginal π‘žπ‘‡βˆ— xC𝛼(𝑑) |x(0)

is sufficiently close to which approximated by the backbone template π‘žπ‘‡βˆ— xC𝛼(𝑑) |x˜

, guided by the theoretical result proposed in [213]. Proofs regarding SE(3)-equivariance are stated in the Appendix 4.5.
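Because the kernels (4.2)-(4.4) are Gaussian, the forward process can be sampled in closed form at any diffusion time without integrating the SDE. The following NumPy sketch illustrates this under stated assumptions: the helper names, the value `SIGMA_MIN = 0.5`, and the per-atom residue index array are illustrative choices, not the original implementation.

```python
import numpy as np

SIGMA_1 = 2.0    # residue-scale lengthscale in Angstrom, as stated in the text
SIGMA_MIN = 0.5  # assumed minimum noise scale; sigma_2 / sigma_min are task-specific

def tau_tilde(t):
    """Exponential schedule: tau(t) = (sigma_min / sigma_1)^2 * exp(t)."""
    return (SIGMA_MIN / SIGMA_1) ** 2 * np.exp(t)

def sample_forward(x_ca0, x_non0, ca_index, y0, c, t, rng):
    """One draw from the forward transition kernels (4.2)-(4.4).

    x_ca0:    (n_res, 3) clean C-alpha coordinates
    x_non0:   (n_non, 3) clean non-C-alpha coordinates
    ca_index: (n_non,)   residue index of each non-C-alpha atom (assumed bookkeeping)
    y0:       (m, 3)     clean ligand coordinates
    c:        (n_res, m) softmax-transformed contact map (columns sum to 1)
    """
    tau = tau_tilde(t)
    # (4.2): variance-growing kernel centered on the clean C-alpha coordinates.
    x_ca_t = x_ca0 + SIGMA_1 * np.sqrt(tau) * rng.standard_normal(x_ca0.shape)
    # (4.3): OU contraction of each atom's offset from its residue's C-alpha.
    rel0 = x_non0 - x_ca0[ca_index]
    rel_t = (np.exp(-tau) * rel0
             + np.sqrt(2.0 * SIGMA_1**2 * (1.0 - np.exp(-2.0 * tau)))
             * rng.standard_normal(rel0.shape))
    x_non_t = x_ca_t[ca_index] + rel_t
    # (4.4): ligand offset from the contact-weighted anchor c^T x_ca, with
    # correlated covariance sigma_1^2 (1 - e^{-2 tau}) (I + c^T c) over ligand atoms.
    m = y0.shape[0]
    chol = np.linalg.cholesky(np.eye(m) + c.T @ c)
    eps = chol @ rng.standard_normal((m, 3))
    rel_y = (np.exp(-tau) * (y0 - c.T @ x_ca0)
             + SIGMA_1 * np.sqrt(1.0 - np.exp(-2.0 * tau)) * eps)
    y_t = c.T @ x_ca_t + rel_y
    return x_ca_t, x_non_t, y_t
```

At $t = T^*$, `np.exp(-tau)` is small, so the clean-data terms in (4.3) and (4.4) are negligible, which is exactly the condition (a) above that allows them to be zeroed when initializing the reverse-time SDE.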

Contact map prediction and sampling from the truncated reverse-time SDE

Given protein-ligand coordinates $(\mathbf{x}, \mathbf{y})$, we define the contact map $\mathbf{L} \in \mathbb{R}^{n_{\mathrm{res}} \times m}$ with matrix elements

$$L_{Ai} = \log\left(\frac{\sum_{j \in \{A\}} e^{-2\alpha \|\mathbf{x}_j - \mathbf{y}_i\|^2}}{\sum_{j \in \{A\}} e^{-\alpha \|\mathbf{x}_j - \mathbf{y}_i\|^2}}\right)$$

where $j$ runs over all protein atoms in amino acid residue $A$ and $\alpha = 0.2$ Å$^{-1}$. The term $\mathbf{c}$ in (4.4) is then defined as $c_{Ai}(\mathbf{L}) = \exp(L_{Ai}) / \sum_{A} \exp(L_{Ai})$. To sample from the reverse-time SDE, we use the contact predictor to generate inferred contact maps $\hat{\mathbf{L}}$ and parameterize the geometry prior $q_{T^*}(\cdot \mid \tilde{\mathbf{x}}, \hat{\mathbf{L}})$, the initial condition of the reverse-time SDE, by replacing $\mathbf{x}(0)$ in $q_{T^*}$ with the backbone template $\tilde{\mathbf{x}}$ and the ligand-C$\alpha$ relative drift coefficient $\mathbf{c}$ with the prediction $\mathbf{c}(\hat{\mathbf{L}})$. In the general multivariate OU formulation, this corresponds to replacing the clean-data-dependent drift coefficients $\boldsymbol{\Theta}(\mathbf{Z}_0)$ by a model estimate $\hat{\boldsymbol{\Theta}}$. To account for the multimodal nature of protein-ligand contact distributions, the contact predictor models $\mathbf{L}$ as the logits of a categorical posterior distribution over a sequence of one-hot observations $\{\mathbf{l}_k\}_{k=1}^{K}$ sampled for individual molecules in $\{\mathcal{G}\}$. The forward pass of the contact predictor $\psi$ takes an iterative form:

LΛ†π‘˜ =πœ“(

π‘˜

βˆ‘οΈ

π‘Ÿ=1

lπ‘Ÿ;s,x˜,{G}); lπ‘˜ =OneHot(π΄π‘˜, π‘–π‘˜); (π΄π‘˜, π‘–π‘˜) ∼Categorical𝑛resΓ—π‘š(LΛ†π‘˜βˆ’1), π‘–π‘˜ ∈ Gπ‘˜

(4.5) where π‘˜ ∈ {1,Β· Β· Β· , 𝐾} and we set Λ†L :=Lˆ𝐾. All results reported in this study are obtained with𝐾 =1 due to the curation scheme of standard annotated protein-ligand datasets, but we note that the model can be readily trained on more diverse structural databases with multi-ligand samples.
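The contact-map construction and one categorical draw from (4.5) can be sketched as follows. This is a minimal NumPy illustration; the function names and the flat-softmax sampling convention are assumptions, and the real $\psi$ is the learned contact predictor rather than a closed-form map.

```python
import numpy as np

ALPHA = 0.2  # Gaussian contact sharpness, as stated in the text

def contact_logits(x, y, residue_index, n_res):
    """L_{Ai} = log( sum_{j in A} e^{-2*alpha*d_ij^2} / sum_{j in A} e^{-alpha*d_ij^2} )
    for protein atom coordinates x (n_atoms, 3) and ligand coordinates y (m, 3)."""
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)       # (n_atoms, m)
    num = np.zeros((n_res, y.shape[0]))
    den = np.zeros((n_res, y.shape[0]))
    np.add.at(num, residue_index, np.exp(-2.0 * ALPHA * d2))  # residue-wise sums
    np.add.at(den, residue_index, np.exp(-ALPHA * d2))
    return np.log(num / den)

def contact_coefficient(L):
    """c_{Ai} = exp(L_{Ai}) / sum_A exp(L_{Ai}): softmax over residues per ligand atom."""
    e = np.exp(L - L.max(axis=0, keepdims=True))              # numerically stabilized
    return e / e.sum(axis=0, keepdims=True)

def sample_one_contact(L_hat, rng):
    """One categorical draw (A_k, i_k) over the flattened n_res x m logits, as in (4.5)."""
    p = np.exp(L_hat - L_hat.max())
    p = (p / p.sum()).ravel()
    flat = rng.choice(p.size, p=p)
    return np.unravel_index(flat, L_hat.shape)
```

Each column of `contact_coefficient(L)` sums to one over residues, which is what makes $\mathbf{c}^{\mathsf{T}}\mathbf{x}^{\mathrm{C}\alpha}$ in (4.4) a convex combination of C$\alpha$ positions (and hence SE(3)-equivariant as an anchor point).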

Architecture overview

Here we outline the key neural network design ideas and defer the featurization, architecture, and training details to the Appendix. To enable stereospecific molecular geometry generation and explicit reasoning about long-range geometrical correlations, NeuralPLexer hybridizes two types of elementary molecular representations (Figure 4.1c): (a) atomic nodes and (b) rigid-body nodes representing coordinate frames formed by two adjacent chemical bonds. For small-molecule ligand encoding, we introduce a graph transformer with learnable chirality-aware pairwise embeddings constructed through graph-diffusion-kernel-like transformations [214]; these pairwise embeddings are pretrained to align with the intramolecular 3D coordinate distributions of experimental and computed molecular conformers. The protein backbone template encoding module and the contact predictor are built upon a sparsified version of invariant point attention (IPA) adapted from AlphaFold2 [5] and are combined with standard graph attention layers [187, 215] and edge update blocks.
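As a rough illustration of a graph-diffusion-kernel-like pairwise transformation, one can stack powers of the symmetrically normalized adjacency matrix so that entry $(i, j, k)$ encodes $k$-step diffusion between atoms $i$ and $j$. This is a generic, non-learnable sketch of the idea only; NeuralPLexer's actual embeddings are learnable and chirality-aware.

```python
import numpy as np

def diffusion_kernel_features(adj, n_steps=4):
    """Stack powers of the symmetrically normalized adjacency D^{-1/2} A D^{-1/2}
    as pairwise features; a fixed stand-in for learnable graph-diffusion kernels."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))  # guard isolated nodes
    a_norm = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    feats, p = [], np.eye(adj.shape[0])
    for _ in range(n_steps):
        p = p @ a_norm            # k-step diffusion operator
        feats.append(p)
    return np.stack(feats, axis=-1)  # (n_atoms, n_atoms, n_steps)
```

Because the normalized adjacency is symmetric, each feature channel is a symmetric pairwise map, a convenient property for initializing pair embeddings in a graph transformer.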

The architecture of ESDM (Figure 4.1e) is inspired by prior works on 3D graph and attentional neural networks for point clouds [216, 217], rigid-body simulations [218], and biopolymer representation learning [5, 219-221]. In ESDM, each node is associated with a stack of standard scalar features $\mathbf{f}_{\mathrm{s}} \in \mathbb{R}^{c}$ and Cartesian vector features $\mathbf{f}_{\mathrm{v}} \in \mathbb{R}^{3 \times c}$ representing the displacements of a virtual point set relative to the node's Euclidean coordinate $\mathbf{t} \in \mathbb{R}^{3}$. A rotation matrix $\mathbf{R} \in \mathrm{SO}(3)$ is additionally attached to each rigid-body node. Geometry-aware messages are synchronously propagated among all nodes by encoding the pairwise distances among virtual point sets into graph transformer blocks. Explicit non-linear transformation of the vector features $\mathbf{f}_{\mathrm{v}}$ is performed only on rigid-body nodes through a coordinate-frame-inversion mechanism, such that the node update blocks are sufficiently expressive without sacrificing equivariance or computational efficiency. Conversely, 3D coordinates are updated only for atomic nodes, while the rigid-body frames $(\mathbf{t}, \mathbf{R})$ are passively reconstructed from the updated atomic coordinates, circumventing numerical issues with fitting quaternion or axis-angle variables when manipulating rigid-body objects. The nontrivial action of a parity inversion operation on rigid-body nodes ensures that ESDM can capture the correct chiral-symmetry-breaking behavior that adheres to molecular stereochemistry constraints.
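The passive frame reconstruction described above can be illustrated with a standard Gram-Schmidt construction in the spirit of the AlphaFold2 rigid-frame convention: after every coordinate update, the frame is recomputed directly from three atoms rather than integrated, so no quaternion or axis-angle fitting is needed. The atom roles and exact axis convention below are illustrative assumptions.

```python
import numpy as np

def frame_from_atoms(a, b, c):
    """Reconstruct a right-handed local frame (t, R) from three atom positions,
    e.g. (N, C-alpha, C) for a protein backbone frame, via Gram-Schmidt.
    Returns the origin t and a rotation matrix R whose columns are the frame axes."""
    t = b                                  # frame origin at the central atom
    v1 = c - b
    v2 = a - b
    e1 = v1 / np.linalg.norm(v1)           # first axis along the b -> c bond
    u2 = v2 - np.dot(e1, v2) * e1          # remove the e1 component of v2
    e2 = u2 / np.linalg.norm(u2)
    e3 = np.cross(e1, e2)                  # cross product fixes the handedness,
                                           # so mirrored inputs yield a distinct frame
    R = np.stack([e1, e2, e3], axis=-1)
    return t, R
```

The cross product in the last axis is what gives frames a nontrivial response to parity inversion: reflecting the input atoms does not simply reflect the frame, which is the property the text invokes for capturing chiral-symmetry-breaking behavior.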