Computational methods to predict protein interactions

Literature review

1.14 Computational methods to predict protein interactions

1.14.1 Homology modelling

While the collection of protein structures is slowly gaining momentum, the structure of numerous proteins remains unsolved. Though it is impossible to solve all protein structures in existence, innovation into the computational prediction of protein tertiary structures is slowly filling this

gap. One such method includes homology modelling. Homology or comparative modelling is a method used to predict the tertiary structure of proteins from its AA sequence. The notion is that the structures exhibits greater stability and evolves more slowly over time. As such, homology modelling assumes that while similar sequences will adopt an almost identical structure, distant sequences will also retain a high level of similarity. Consequently, the method relies on a template of a known structure to compare against and predict the unknown structure (Muhammed and Aki- Yalcin, 2018). Several steps are involved in homology modelling, namely, template selection, alignment of the template and target sequences, model building, model optimization and validation. Currently, several homology modelling software are available, however, the two most commonly used software include MODELLER and Swiss-Model. Unfortunately, a drawback of homology modelling is that these theoretical tertiary structures cannot be predicted if the corresponding template’s structure of a protein in the same family has not been solved (e.g. NMR or X-ray crystallography) (Chenug and Yu, 2018). Despite this, homology modelling is considered highly accurate, fast and low cost with clearly defined steps (Muhammed and Aki- Yalcin, 2018).

1.14.2 Molecular docking

Proteins do not act in an independent manner in complex biological processes. In fact, proteins often interact with small molecules or ligands to complete important functional tasks (Salmaso, 2018). This phenomenon has since attracted a great many scientists in the drug industry. The ability to accurately predict binding modes and interpret these data can provide invaluable information for the development of novel drug binding ligands (Salmaso and Moro, 2018). These methods have since been referred to as molecular docking. Since molecular docking estimates the best binding pose of a ligand and its associated protein, it requires an experimental or theoretical predicted tertiary structure (Salmaso, 2018).

Molecular docking occurs in two stages, i.e. the search for conformation or binding poses and the scoring function that correlates a score to each binding pose generated (Kitchen et al., 2004;

Huang and Zou, 2010). During conformational searches, the algorithm should accurately evaluate the specified conformation space outlined by the free energy landscape whereas the binding scores should be associated with the global minimum energy of the hypersurface¹¹ (Salmaso and Moro, 2018). In general, two approaches can be considered during docking, i.e. rigid and flexible docking. In the first approach, the independent protein and ligand are bound based on shape and

11 A physical space that considers the N^th dimension instead of only 2D and 3D models.

volume. The second approach assumes flexible docking in which binding is accomplished based on reciprocal effects of both the ligand and protein (Prieto-Martínez et al., 2018). Based on these approaches, there are currently several molecular docking software currently available for commercial and research purposes (Ciemny et al., 2018). Importantly, while molecular docking is fervently employed in the drug industries, this approach has been used to evaluate the interactions of existing protein-ligand systems, such as in HIV (Tong et al., 2017; Tarasova et al., 2018; Vora et al., 2019).

1.14.3 Molecular dynamics

Since protein-ligand interactions play vital roles in key biological processes, studying these interactions is an important approach in understanding the basis of protein functionality (Fu et al., 2018). A popular approach to evaluate these interactions is molecular dynamics (MD). Simply put, a MD simulation is a method used to compute the behavior of atoms and molecules over time (Lipkowitz, 1990). Briefly, the atomic particles are modelled with a specified charge and mass.

The electrostatic force field, calculated from the atomic charge is used to estimate the force on each atom in the system. The force is then used to update the positions of the atoms and evolve the system over time. Consequently, MD simulations provide useful information on protein conformation and dynamics (Frenkel and Smit, 2002). This method has gained enormous popularity over the years due to software upgrades that has vastly improved the speed of calculations (Perricone et al., 2018).

MD software include Chemistry of HARvard Macromolecular Mechanics (CHARMM), GROningen Machine for Chemical Simulations (GROMACS) and Assisted Model Building with Energy Refinement (AMBER). Apart from classical MD simulations, these packages also provide useful tools for protein assessment, including predicting the binding-free energy scores of protein- ligand systems as discussed below.

1.14.4 Predicting protein-ligand binding-free energies

As previously discussed, the recognition of ligands by proteins drive many biological processes.

The strength of this recognition is characterized by binding affinities (Jiao et al., 2008). The two most popular methods used to determine binding-free energies are the Molecular Mechanics- Poisson Boltzmann and Generalized Born Surface Area calculations (MM-PBSA and MM- GBSA). The MM-P(G)BSA approach to estimate binding-free energies are based on thermodynamics as shown in equation (1) (Case, 2014). The equation shows that the binding-free energy is the difference between the solvated complex, ligand and protein.

∆𝐺

_{𝑏𝑖𝑛𝑑,𝑠𝑜𝑙𝑣}

= ∆𝐺

_{𝑐𝑜𝑚,𝑠𝑜𝑙𝑣}

− (∆𝐺

_{𝑟𝑒𝑐,𝑠𝑜𝑙𝑣}

+ ∆𝐺

_{𝑙𝑖𝑔,𝑠𝑜𝑙𝑣}

)

(1)

Each of these components are calculated based on the trajectory of the complex as well as the atomic interactions. Thus, equation (1) can also be written as equation (2):

𝐺 = 𝐸

_𝑏𝑛𝑑

+ 𝐸

_𝑒𝑙

+ 𝐸

_𝑣𝑑𝑊

+ 𝐺

_𝑝𝑜𝑙

+ 𝐺

_𝑛𝑝

− 𝑇𝑆

(2)

In equation (2) the first three terms (Ebnd, Eel, EvdW) are standard MM terms corresponding to bond, electrostatic and van der Waals (vdW) interactions. The polar and non-polar contributions are represented by Gpol and Gnp. In the MM-P(G)BSA calculations, the polar contributions are estimated either using the GB model or PB equation. Contrastingly, the non-polar contribution is estimated by the solvent accessible surface area (SASA). The TS term refers to the absolute temperature (T) multiplied by the entropy (S). Entropy calculations are estimated using the normal mode analysis (Genheden and Ryde, 2015). Normal mode analyses have a large margin of error therefore introducing significant uncertainty. Additionally, these calculations are computationally expensive (Graham et al., 2013). As a result, entropy calculations can be omitted if comparisons are being made between ligands binding to the same protein.

Dalam dokumen The HIV-1 gag and protease: exploring the coevolving nature and structural implications of complex drug resistance mutational patterns in subtype C. (Halaman 44-47)