
Chapter IV: Multiscale Equivariant Score-based Generative Modeling for

4.2 Method

We assume the model inputs are a receptor protein backbone template, comprising the amino acid sequence $\mathbf{s}$ and the (N, Cα, C) atomic coordinates $\tilde{\mathbf{x}} \in \mathbb{R}^{n_{\text{res}} \times 3 \times 3}$, and a set of ligand molecular graphs $\{\mathcal{G}_k\}_{k=1}^{K}$ containing atom/bond types and stereochemistry labels (e.g., tetrahedral chirality or E/Z isomerism [208]). We aim to sample $(\mathbf{x}, \mathbf{y}) \sim q_\phi(\cdot \mid \mathbf{s}, \tilde{\mathbf{x}}, \{\mathcal{G}\})$ from a generative model $q_\phi$, where $\mathbf{x} \in \mathbb{R}^{n \times 3}$ are the predicted 3D heavy-atom coordinates of the protein and $\mathbf{y} \in \mathbb{R}^{m \times 3}$ are those of the ligands. This can be understood as a conditional generative modeling problem for partially observed systems.
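A minimal container for the conditioning inputs makes the shapes above concrete. The class and field names, the tuple-based coordinate layout, and the string encoding of bond and stereo labels below are hypothetical choices for illustration only; they are not NeuralPLexer's featurization, which is deferred to the Appendix.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


@dataclass
class LigandGraph:
    """One molecular graph G_k (hypothetical minimal container)."""
    atom_types: List[str]                  # heavy-atom element symbols
    bonds: List[Tuple[int, int, str]]      # (atom i, atom j, bond type)
    stereo: Dict[str, str] = field(default_factory=dict)  # hypothetical label encoding

    @property
    def num_atoms(self) -> int:
        return len(self.atom_types)


@dataclass
class ComplexInput:
    """Conditioning inputs (s, x~, {G_k}) for q_phi(x, y | s, x~, {G})."""
    sequence: str                                # s: one letter per residue
    backbone: List[Tuple[Coord, Coord, Coord]]   # x~: (N, CA, C) per residue
    ligands: List[LigandGraph]                   # {G_k}, k = 1..K

    def __post_init__(self) -> None:
        # x~ must have shape (n_res, 3, 3), consistent with the sequence length.
        if len(self.backbone) != len(self.sequence):
            raise ValueError("backbone template needs one (N, CA, C) triple per residue")
        for triple in self.backbone:
            if len(triple) != 3 or any(len(atom) != 3 for atom in triple):
                raise ValueError("each residue needs three 3D coordinates")

    @property
    def n_res(self) -> int:
        return len(self.sequence)

    @property
    def m(self) -> int:
        """Total ligand heavy atoms; a sampled y has shape (m, 3)."""
        return sum(g.num_atoms for g in self.ligands)
```

The protein output $\mathbf{x}$ has its own atom count $n$, which depends on the side-chain composition of $\mathbf{s}$ and is therefore not stored here.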

NeuralPLexer adopts a two-stage architecture for protein-ligand structure prediction (Figure 4.1a). The input protein backbone template and molecule graphs are first encoded and passed into a contact predictor that iteratively samples binding-interface spatial proximity distributions for each ligand in $\{\mathcal{G}\}$; the output contact map parameterizes the geometry prior, a finite-time marginal of a designed SDE that progressively injects structured noise into the data distribution. An equivariant structure diffusion module (ESDM) then jointly generates 3D protein and ligand structures by denoising the atomic coordinates sampled from the geometry prior through a learned reverse-time SDE (Figure 4.1b).

Figure 4.1: NeuralPLexer enables protein-ligand complex structure prediction with full receptor flexibility. (a) Method overview. (b) Sampling from NeuralPLexer. The protein (colored from red to blue, N- to C-terminus) and ligand (colored grey) 3D structures are jointly generated from a learned SDE, with a partially diffused initial state $q_{T^*}$ approximated by the protein backbone template and predicted interface contact maps. (c-e) Key elements of the NeuralPLexer technical design. (c) Ligand molecules and monomeric entities are encoded as a collection of atoms, local coordinate frames (depicted as semi-transparent triangles), and stereospecific pairwise embeddings (depicted as dashed lines) representing their interactions. (d) The forward-time SDE introduces relative drift terms among protein Cα atoms, non-Cα atoms and ligand atoms, such that the SDE erases local-scale details at $t = T^*$ to enable resampling from a noise distribution. (e) Information flow in the equivariant structure diffusion module (ESDM). ESDM operates on a heterogeneous graph formed by protein atoms (P), ligand atoms (L), protein backbone frames (B) and ligand local frames (F) to predict clean atomic coordinates $\hat{\mathbf{x}}_0, \hat{\mathbf{y}}_0$ from the coordinates at a finite diffusion time $t > 0$.

Protein-ligand structure generation with biophysics-informed diffusion processes

Diffusion models [201] introduce a forward SDE that diffuses data into a noised distribution and a neural-network-parameterized reverse-time SDE that generates data by reversing the noising process. To motivate the design principles for our biomolecular structure generator, we first consider a general class of linear SDEs known as the multivariate Ornstein–Uhlenbeck (OU) process [209] for a point cloud $\mathbf{Z} \in \mathbb{R}^{N \times 3}$:

$$d\mathbf{Z}_t = -\Theta \mathbf{Z}_t \, dt + \sigma \, d\mathbf{W}_t \tag{4.1}$$

where $\Theta \in \mathbb{R}^{N \times N}$ is an invertible matrix of affine drift coefficients and $\mathbf{W}_t$ is a standard $3N$-dimensional Wiener process. The forward noising SDEs used in standard diffusion models [210, 211] are recovered by setting $\Theta = \theta \mathbf{I}$, which converges to an isotropic Gaussian prior distribution in the $t \to \infty$ limit (often expressed as $t \to 1$ with a reparameterized $t$ [212]). In contrast, we design a multivariate SDE with a data-dependent drift matrix $\Theta(\mathbf{Z}_0)$ and truncate it at $t = T^* < \infty$, such that the final state of the forward noising process is a partially diffused, structured distribution $q_{T^*}$ that can be well approximated by a coarse-scale model. We propose a set of SDEs, depicted in Figure 4.1d and detailed in Table 4.1, with separate lengthscale parameters $\sigma_1, \sigma_2$ such that the forward diffusion process erases residue-scale local details but retains global information about protein domain packing and ligand binding interfaces. This yields the following time-dependent transition kernels:

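$$q_t\big(\mathbf{x}_{\mathrm{C}\alpha}(t) \mid \mathbf{x}(0), \mathbf{y}(0)\big) = \mathcal{N}\big(\mathbf{x}_{\mathrm{C}\alpha}(0);\ \sigma_1^2 \tilde{\tau}\, \mathbf{I}\big) \tag{4.2}$$

$$q_t\big(\mathbf{x}_{\mathrm{nonC}\alpha}(t) - \mathbf{x}_{\mathrm{C}\alpha}(t) \mid \mathbf{x}(0), \mathbf{y}(0)\big) = \mathcal{N}\big(e^{-\tilde{\tau}}\,(\mathbf{x}_{\mathrm{nonC}\alpha}(0) - \mathbf{x}_{\mathrm{C}\alpha}(0));\ 2\sigma_1^2 (1 - e^{-2\tilde{\tau}})\, \mathbf{I}\big) \tag{4.3}$$

$$q_t\big(\mathbf{y}(t) - \mathbf{c}^{\top}\mathbf{x}_{\mathrm{C}\alpha}(t) \mid \mathbf{x}(0), \mathbf{y}(0)\big) = \mathcal{N}\big(e^{-\tilde{\tau}}\,(\mathbf{y}(0) - \mathbf{c}^{\top}\mathbf{x}_{\mathrm{C}\alpha}(0));\ \sigma_1^2 (1 - e^{-2\tilde{\tau}})\, (\mathbf{I} + \mathbf{c}^{\top}\mathbf{c})\big) \tag{4.4}$$

where we use an exponential schedule $\tilde{\tau} = (\sigma_{\min}^2 / \sigma_1^2)\, e^{t}$ with truncation $T^* = 2 \log(\sigma_2 / \sigma_{\min})$. Here $\mathbf{c}$ is a softmax-transformed contact map, as detailed below in Sec. 4.2, which attracts the diffused ligand coordinates $\mathbf{y}(t)$ towards binding-interface Cα atoms while preserving SE(3)-equivariance. We choose $\sigma_1 = 2.0$ Å to match the average radius of standard amino acids, with a task-specific $\sigma_2 > \sigma_1$ such that at $t = T^*$: (a) the terms involving $\mathbf{x}_{\mathrm{nonC}\alpha}(0)$ and $\mathbf{y}(0)$ approximately vanish and are therefore set to zero to initialize the reverse-time SDE, and (b) the Cα-atom coordinate marginal $q_{T^*}(\mathbf{x}_{\mathrm{C}\alpha}(t) \mid \mathbf{x}(0))$ is sufficiently close to its approximation from the backbone template, $q_{T^*}(\mathbf{x}_{\mathrm{C}\alpha}(t) \mid \tilde{\mathbf{x}})$, guided by the theoretical result in [213]. Proofs regarding SE(3)-equivariance are given in Appendix 4.5.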
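At the truncation time the schedule gives $\tilde{\tau}(T^*) = (\sigma_{\min}^2/\sigma_1^2)(\sigma_2/\sigma_{\min})^2 = \sigma_2^2/\sigma_1^2$, so under (4.2) each Cα coordinate has standard deviation $\sigma_2$ around its clean position regardless of $\sigma_{\min}$, and the clean-data terms in (4.3) and (4.4) are scaled by $e^{-\sigma_2^2/\sigma_1^2}$. For an illustrative $\sigma_2 = 4$ Å with $\sigma_1 = 2.0$ Å, this factor is $e^{-4} \approx 0.018$. The minimal pure-Python sketch below draws one sample from the kernels (4.2)-(4.4). The helper names (`tau_tilde`, `truncation_time`, `anchor`, `sample_forward`), the list-of-lists layout, and the choice to draw the three relative terms independently given the clean structure are assumptions for illustration, not the thesis implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = List[float]


def tau_tilde(t: float, sigma1: float, sigma_min: float) -> float:
    """Exponential schedule: tau~ = (sigma_min^2 / sigma1^2) * e^t."""
    return (sigma_min ** 2 / sigma1 ** 2) * math.exp(t)


def truncation_time(sigma2: float, sigma_min: float) -> float:
    """Truncation time T* = 2 log(sigma2 / sigma_min)."""
    return 2.0 * math.log(sigma2 / sigma_min)


def anchor(c: Sequence[Sequence[float]], pts: Sequence[Sequence[float]], i: int) -> Vec:
    """(c^T P)_i = sum_A c[A][i] * P[A]: contact-weighted average of per-residue points."""
    return [sum(c[a][i] * pts[a][d] for a in range(len(pts))) for d in range(3)]


def sample_forward(
    x_ca0: List[Vec],
    nonca0: List[Tuple[int, Vec]],
    y0: List[Vec],
    c: List[Vec],
    t: float,
    sigma1: float,
    sigma_min: float,
    rng: random.Random,
) -> Tuple[List[Vec], List[Vec], List[Vec]]:
    """Draw x_CA(t), x_nonCA(t), y(t) from kernels (4.2)-(4.4).

    nonca0 holds (residue index A, clean xyz) for every non-CA protein atom;
    c is n_res x m with columns summing to 1.
    ASSUMPTION: the three relative terms are drawn independently given clean data.
    """
    tau = tau_tilde(t, sigma1, sigma_min)
    decay = math.exp(-tau)

    def g() -> float:
        return rng.gauss(0.0, 1.0)

    # (4.2): x_CA(t) ~ N(x_CA(0), sigma1^2 * tau~ * I)
    s_ca = sigma1 * math.sqrt(tau)
    x_ca = [[v + s_ca * g() for v in p] for p in x_ca0]

    # (4.3): offsets to the parent CA shrink by e^{-tau~};
    # variance 2 sigma1^2 (1 - e^{-2 tau~})
    s_rel = sigma1 * math.sqrt(2.0 * (1.0 - decay ** 2))
    x_nonca = [
        [x_ca[a][d] + decay * (p[d] - x_ca0[a][d]) + s_rel * g() for d in range(3)]
        for a, p in nonca0
    ]

    # (4.4): per Cartesian axis, eps1 + c^T eps2 (independent standard normals)
    # has covariance I + c^T c over ligand atoms.
    s_lig = sigma1 * math.sqrt(1.0 - decay ** 2)
    eps2 = [[g() for _ in range(3)] for _ in x_ca0]
    y = []
    for i, p in enumerate(y0):
        a_t, a_0, n_c = anchor(c, x_ca, i), anchor(c, x_ca0, i), anchor(c, eps2, i)
        y.append([a_t[d] + decay * (p[d] - a_0[d]) + s_lig * (g() + n_c[d]) for d in range(3)])
    return x_ca, x_nonca, y
```

To mimic the geometry prior instead, one would evaluate the same expressions at $t = T^*$ with template Cα coordinates in place of `x_ca0`, predicted contact weights in place of `c`, and the `decay * (...)` terms omitted, as described in (a) above.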

Contact map prediction and sampling from the truncated reverse-time SDE

Given protein-ligand coordinates $(\mathbf{x}, \mathbf{y})$, we define the contact map $\mathbf{L} \in \mathbb{R}^{n_{\text{res}} \times m}$ with matrix elements

$$L_{Ai} = \log \frac{\sum_{j \in \{A\}} e^{-2\alpha \|\mathbf{x}_j - \mathbf{y}_i\|}}{\sum_{j \in \{A\}} e^{-\alpha \|\mathbf{x}_j - \mathbf{y}_i\|}}$$

where $j$ runs over all protein atoms in amino acid residue $A$ and $\alpha = 0.2$ Å$^{-1}$. The term $\mathbf{c}$ in (4.4) is then defined as

$$c_{Ai}(\mathbf{L}) = \frac{\exp(L_{Ai})}{\sum_{A'} \exp(L_{A'i})}.$$

To sample from the reverse-time SDE, we use the contact predictor to generate inferred contact maps $\hat{\mathbf{L}}$ and parameterize the geometry prior $q_{T^*}(\cdot \mid \tilde{\mathbf{x}}, \hat{\mathbf{L}})$, the initial condition of the reverse-time SDE, by replacing $\mathbf{x}(0)$ in $q_{T^*}$ with the backbone template $\tilde{\mathbf{x}}$ and the ligand-Cα relative drift coefficient $\mathbf{c}$ with the predicted $\mathbf{c}(\hat{\mathbf{L}})$. Note that in the general multivariate OU formulation, this corresponds to replacing the clean-data-dependent drift coefficients $\Theta(\mathbf{Z}_0)$ with a model estimate $\hat{\Theta}$. To account for the multimodal nature of protein-ligand contact distributions, the contact predictor models $\mathbf{L}$ as the logits of a categorical posterior distribution over a sequence of one-hot observations $\{\mathbf{l}_k\}_{k=1}^{K}$ sampled for individual molecules in $\{\mathcal{G}\}$. The forward pass of the contact predictor $\psi$ takes an iterative form:

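$$\hat{\mathbf{L}}_k = \psi\Big(\sum_{r=1}^{k} \mathbf{l}_r;\ \mathbf{s}, \tilde{\mathbf{x}}, \{\mathcal{G}\}\Big); \quad \mathbf{l}_k = \mathrm{OneHot}(A_k, i_k); \quad (A_k, i_k) \sim \mathrm{Categorical}_{n_{\text{res}} \times m}(\hat{\mathbf{L}}_{k-1}),\ i_k \in \mathcal{G}_k \tag{4.5}$$

where $k \in \{1, \dots, K\}$ and we set $\hat{\mathbf{L}} := \hat{\mathbf{L}}_K$. All results reported in this study are obtained with $K = 1$ due to the curation scheme of standard annotated protein-ligand datasets, but we note that the model can readily be trained on more diverse structural databases with multi-ligand samples.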
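The contact-map definition, the softmax weights $\mathbf{c}$, and the iterative sampler (4.5) can be sketched in plain Python. Because each column of $\mathbf{c}$ sums to one over residues and $\mathbf{L}$ depends only on interatomic distances, $\mathbf{c}^{\top}\mathbf{x}_{\mathrm{C}\alpha}$ places each ligand atom's anchor at a convex combination of Cα positions, which moves with any rotation or translation applied to the whole complex. The log-sum-exp evaluation, the list-based layout, and `psi` (a hypothetical stand-in for the trained contact predictor, with its conditioning on $\mathbf{s}$, $\tilde{\mathbf{x}}$ and $\{\mathcal{G}\}$ assumed to live inside the callable) are illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def logsumexp(vals: Sequence[float]) -> float:
    mx = max(vals)
    return mx + math.log(sum(math.exp(v - mx) for v in vals))


def contact_logits(residue_atoms, ligand_coords, alpha: float = 0.2) -> Matrix:
    """L[A][i] = log sum_j e^{-2 alpha d_ij} - log sum_j e^{-alpha d_ij}, j over atoms of residue A."""
    L = []
    for atoms in residue_atoms:
        row = []
        for y in ligand_coords:
            d = [math.dist(x, y) for x in atoms]
            row.append(logsumexp([-2 * alpha * dj for dj in d]) - logsumexp([-alpha * dj for dj in d]))
        L.append(row)
    return L


def contact_weights(L: Matrix) -> Matrix:
    """c[A][i]: softmax over residues A of column L[:, i]; every column sums to 1."""
    n_res, m = len(L), len(L[0])
    c = [[0.0] * m for _ in range(n_res)]
    for i in range(m):
        z = logsumexp([L[a][i] for a in range(n_res)])
        for a in range(n_res):
            c[a][i] = math.exp(L[a][i] - z)
    return c


def sample_contacts(psi: Callable[[Matrix], Matrix], n_res: int,
                    ligand_atoms: List[List[int]], rng: random.Random) -> Matrix:
    """Iterative sampling in the form of (4.5).

    psi (hypothetical): maps the accumulated one-hot map sum_r l_r (n_res x m) to logits.
    ligand_atoms[k]: column indices of the atoms of ligand G_k.
    """
    m = sum(len(atoms) for atoms in ligand_atoms)
    acc = [[0.0] * m for _ in range(n_res)]       # empty sum before any observation
    logits = psi([row[:] for row in acc])         # L_0
    for atoms_k in ligand_atoms:
        cells = [(a, i) for a in range(n_res) for i in atoms_k]  # restrict i_k to G_k
        z = logsumexp([logits[a][i] for a, i in cells])
        probs = [math.exp(logits[a][i] - z) for a, i in cells]
        a_k, i_k = rng.choices(cells, weights=probs, k=1)[0]
        acc[a_k][i_k] += 1.0                      # add l_k = OneHot(A_k, i_k)
        logits = psi([row[:] for row in acc])     # L_k = psi(sum_{r<=k} l_r)
    return logits                                 # L_hat := L_K
```

For a residue with a single atom at distance $d$ from ligand atom $i$, the ratio in $L_{Ai}$ collapses to $e^{-\alpha d}$, so $L_{Ai} = -\alpha d$; with several atoms the expression acts as a soft minimum over the residue.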

Architecture overview

Here we outline the key neural network design ideas and defer the featurization, architecture, and training details to the Appendix. To enable stereospecific molecular geometry generation and explicit reasoning about long-range geometrical correlations, NeuralPLexer hybridizes two types of elementary molecular representations (Figure 4.1c): (a) atomic nodes and (b) rigid-body nodes representing coordinate frames formed by two adjacent chemical bonds. For small-molecule ligand encoding, we introduce a graph transformer with learnable chirality-aware pairwise embeddings that are constructed through graph-diffusion-kernel-like transformations [214]; these pairwise embeddings are pretrained to align with the intramolecular 3D coordinate distributions of experimental and computed molecular conformers. The protein backbone template encoding module and the contact predictor are built upon a sparsified version of invariant point attention (IPA) adapted from AlphaFold2 [5], combined with standard graph attention layers [187, 215] and edge update blocks.
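The phrase "graph-diffusion-kernel-like" can be made concrete with the classic heat kernel $\exp(-\beta \Delta_{\mathcal{G}})$ of a bond graph's Laplacian $\Delta_{\mathcal{G}} = \mathbf{D} - \mathbf{A}$, evaluated at several diffusion times $\beta$ to give each atom pair a multi-hop similarity vector. This is only the fixed, non-learnable ancestor of the idea: NeuralPLexer's embeddings are learnable, chirality-aware, and pretrained on conformers, none of which this sketch reproduces. The function names and the $\beta$ grid are hypothetical, and the matrix exponential uses the standard scaling-and-squaring trick.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def laplacian(n: int, bonds: Sequence[Tuple[int, int]]) -> Matrix:
    """Combinatorial Laplacian D - A of an unweighted molecular (bond) graph."""
    lap = [[0.0] * n for _ in range(n)]
    for i, j in bonds:
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
        lap[i][i] += 1.0
        lap[j][j] += 1.0
    return lap


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def heat_kernel(lap: Matrix, beta: float, taylor_terms: int = 12) -> Matrix:
    """exp(-beta * L) by scaling and squaring around a truncated Taylor series."""
    n = len(lap)
    norm = beta * max(sum(abs(v) for v in row) for row in lap)
    # Choose s so that ||beta * L|| / 2^s <= 0.5, where the short series is accurate.
    s = math.ceil(math.log2(2.0 * norm)) if norm > 0.5 else 0
    step = -beta / 2 ** s
    eye = [[float(i == j) for j in range(n)] for i in range(n)]
    result, term = [row[:] for row in eye], eye
    for k in range(1, taylor_terms):
        term = [[step * v / k for v in row] for row in matmul(term, lap)]  # (step*L)^k / k!
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(s):
        result = matmul(result, result)  # exp(X)^2 = exp(2X)
    return result


def pairwise_diffusion_features(n: int, bonds: Sequence[Tuple[int, int]],
                                betas: Sequence[float] = (0.25, 0.5, 1.0, 2.0)):
    """For each atom pair (i, j), a vector of heat-kernel values at several diffusion times."""
    lap = laplacian(n, bonds)
    kernels = [heat_kernel(lap, b) for b in betas]
    return [[[K[i][j] for K in kernels] for j in range(n)] for i in range(n)]
```

For a three-atom chain at $\beta = 0.5$, the first kernel row is approximately (0.674, 0.259, 0.067): values decay with graph distance, and each row sums to one because the Laplacian's rows sum to zero.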

The architecture of ESDM (Figure 4.1e) is inspired by prior work on 3D graph and attentional neural networks for point clouds [216, 217], rigid-body simulations [218], and biopolymer representation learning [5, 219–221]. In ESDM, each node is associated with a stack of standard scalar features $\mathbf{f}_s \in \mathbb{R}^{c}$ and Cartesian vector features $\mathbf{f}_v \in \mathbb{R}^{3 \times c}$, which represent the displacements of a virtual point set relative to the node's Euclidean coordinate $\mathbf{t} \in \mathbb{R}^3$. A rotation matrix $\mathbf{R} \in SO(3)$ is additionally attached to each rigid-body node. Geometry-aware messages are propagated synchronously among all nodes by encoding the pairwise distances among virtual point sets into graph transformer blocks. Explicit nonlinear transformations of the vector features $\mathbf{f}_v$ are performed only on rigid-body nodes, through a coordinate-frame-inversion mechanism, so that the node update blocks are sufficiently expressive without sacrificing equivariance or computational efficiency. Conversely, 3D coordinates are updated only for atomic nodes, while the rigid-body frames $(\mathbf{t}, \mathbf{R})$ are passively reconstructed from the updated atomic coordinates; this circumvents the numerical issues of fitting quaternion or axis-angle variables when manipulating rigid-body objects. The nontrivial action of a parity inversion on rigid-body nodes ensures that ESDM captures the correct chiral-symmetry-breaking behavior required by molecular stereochemistry constraints.
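The passive frame reconstruction and the coordinate-frame-inversion idea can be illustrated with a short pure-Python sketch. It assumes (i) that a rigid-body frame is rebuilt from the two bonds meeting at a center atom by Gram-Schmidt orthogonalization, the same construction AlphaFold2 uses for its backbone frames, and (ii) that "coordinate-frame inversion" means mapping a vector into the local frame with $\mathbf{R}^{\top}$, applying an arbitrary nonlinearity, and mapping back with $\mathbf{R}$. Both are interpretations for illustration; ESDM's exact update is deferred to the Appendix, and all function names are hypothetical.

```python
import math
from typing import Callable, Tuple

Vec = Tuple[float, float, float]
Frame = Tuple[Vec, Tuple[Vec, Vec, Vec]]  # (t, (e1, e2, e3)); e_k are the columns of R


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def normalize(a: Vec) -> Vec:
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def frame_from_bonds(p_prev: Vec, center: Vec, p_next: Vec) -> Frame:
    """Rebuild (t, R) from two bonds meeting at `center` via Gram-Schmidt."""
    e1 = normalize(sub(p_next, center))
    u = sub(p_prev, center)
    e2 = normalize(sub(u, scale(e1, dot(u, e1))))
    e3 = cross(e1, e2)  # right-handed, so det(R) = +1
    return center, (e1, e2, e3)


def to_local(frame: Frame, v: Vec) -> Vec:
    """R^T v: components of a global vector along the frame axes."""
    return tuple(dot(e, v) for e in frame[1])


def to_global(frame: Frame, w: Vec) -> Vec:
    """R w: map local components back to global coordinates."""
    e1, e2, e3 = frame[1]
    return tuple(w[0] * e1[d] + w[1] * e2[d] + w[2] * e3[d] for d in range(3))


def frame_inversion_update(frame: Frame, v: Vec, fn: Callable[[Vec], Vec]) -> Vec:
    """R fn(R^T v): any nonlinear fn keeps the update rotation-equivariant."""
    return to_global(frame, fn(to_local(frame, v)))
```

Because $\mathbf{R}^{\top}\mathbf{v}$ is unchanged when the frame atoms and $\mathbf{v}$ are rotated together, the update commutes with rotations for any `fn`. Under parity inversion ($\mathbf{p} \mapsto -\mathbf{p}$), however, the rebuilt frame becomes $(-\mathbf{e}_1, -\mathbf{e}_2, \mathbf{e}_3)$, still a proper rotation, so the third local component of a vector flips sign. This is one way a frame-based representation can tell mirror-image (chiral) geometries apart.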

From: Physics-Informed Neural Approaches for Multiscale Molecular Modeling and Design (pp. 116-120)