
Introduction and literature review

1.7 Available theoretical prediction models for predicting stabilizing mutations

To overcome the demerits of directed evolution approaches, numerous in silico algorithms have been proposed that predict whether proposed mutations will be extreme-stabilizing. These models were developed by comparing extremostable proteins with non-extremostable proteins at different levels of protein organization: from the codons in their genes, through the amino acid preferences of their protein sequences, to their tertiary structures. The algorithms available to date that can identify thermostabilizing mutants are mostly knowledge based¹⁸⁴. A few are support vector machine (SVM) based¹⁸⁵, and others are based on molecular dynamics¹⁸⁶. For example, ROSETTA is a powerful modeling package with modules for protein structure prediction, modeling, docking and design. At present, most methods can predict stability changes only when the atomic structure of the protein is available¹⁸⁵. Thus, methods for predicting extreme-stability have focused on the three-dimensional structures of proteins. Table 1.5 lists the software and tools available to predict and design mutations that can stabilize proteins.

Table 1.5: Popular software tools that predict stability changes upon mutation.

Tools        Salient features                                                                                              References
ROSETTA      A dynamic and evolving macromolecular modeling suite addressing biomolecular structure prediction and design  184
I-Mutant     Support vector machine based, both sequence and structure can be used, single mutation                        185
Cupsat       Sequence as input, single amino acid mutations                                                                73
MUPRO        Support vector machine based, sequence as input, single mutation                                              187
ERIS         Structure as input, multiple mutations                                                                        188
iPTREE-STAB  Machine learning based, single mutation                                                                       189
PoPMuSiC     Single mutation                                                                                               190
WET-STAB     Machine learning based, multiple mutations                                                                    191
MUSTAB       Support vector machine based, sequence as input, multiple mutations                                           192
AUTO-MUTE    Machine learning based, structure as input, single mutation                                                   193
SDM          Sequence/structure as input, single mutation                                                                  194
iSTABLE      Support vector machine based, structure/sequence as input, single mutation                                    195
NeEMO        Machine learning based, structure as input                                                                    196
ENCoM        Neural network based, single mutation                                                                         197
iRDP         Ensemble of servers                                                                                           116

Although considerable work has gone into identifying stabilizing mutations, the protein engineering methods used to obtain them are still largely random, and their success rate is probabilistic. Accurate in silico prediction of the thermodynamic consequences of mutations remains challenging¹⁹⁸. Khan and Vihinen recently evaluated and compared 11 online stability predictors and found that their predictions were only moderately accurate¹⁹⁹. One limitation is that the majority of these tools demand substantial computational power and expertise. Another drawback is that they rely on features calculated from protein sequences, can consider only one point mutation at a time, and require several empirical parameters or heuristics, such as the patterning of residues, for their calculations. Moreover, statistical analysis based on Tm values (the midpoint of the thermal transition) suffers from the fact that Tm is available for only a few proteins in a high-resolution protein structural dataset, which limits the ability to examine correlations in a significant way²⁶. Molecular dynamics simulations of a mutation are several orders of magnitude more complicated than evaluation with a knowledge-based scoring function¹⁹⁸. A further concern is that only a few algorithms can predict the effect of multiple mutations. Multi-site mutations are expected to have a more complex effect on protein thermostability than single point mutations²⁰⁰. For example, the predictive model WET is a weighted decision table method for predicting the change in protein thermostability upon double mutation from amino acid sequences (Huang and Gromiha 2009). However, its accuracy drops to 0.57 when it is tested on hypothetical reverse mutations²⁰¹. Another model, PROTS-RF, is based on the Random Forest algorithm and achieves an accuracy of 78.7% for multiple mutations²⁰¹.
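The reverse-mutation test mentioned above can be illustrated with a small sketch. It assumes that the stability change of a reverse mutation is the negative of the forward one, and that a positive value means stabilizing; sign conventions differ between tools, so both are assumptions. The function name `sign_accuracy` and every number below are hypothetical and are not taken from WET, PROTS-RF or any cited study.

```python
def sign_accuracy(predicted, experimental):
    """Fraction of mutations whose predicted class matches experiment.

    A mutation is classed as stabilizing when its value is > 0
    (assumed sign convention).
    """
    hits = sum((p > 0) == (e > 0) for p, e in zip(predicted, experimental))
    return hits / len(experimental)


# Hypothetical experimental stability changes for four forward mutations.
forward_experimental = [1.2, -0.8, 0.5, -2.0]
# Assumption: the reverse mutation has the opposite stability change.
reverse_experimental = [-x for x in forward_experimental]

# Hypothetical outputs of a predictor that does well on forward mutations
# but is not consistent when the same mutations are reversed.
forward_predicted = [0.9, -1.1, 0.2, -1.5]
reverse_predicted = [-0.4, -0.6, 0.3, 0.2]

forward_accuracy = sign_accuracy(forward_predicted, forward_experimental)
reverse_accuracy = sign_accuracy(reverse_predicted, reverse_experimental)

print(forward_accuracy)  # 1.0
print(reverse_accuracy)  # 0.5
```

A predictor that has learned the underlying physics should score similarly on both sets; a large gap, as in this toy case, suggests that it has partly memorized the direction of the mutations in its training data.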
The accuracy achieved to date becomes a limitation when more than two mutations are to be introduced.

Molecular dynamics (MD) simulation has recently become a popular methodology for understanding the rationale behind protein thermostability. To design thermostable proteins with MD simulation, one can examine the root mean square fluctuation (RMSF), root mean square deviation (RMSD) and radius of gyration plots of the wild-type protein at high temperatures. These plots represent residue flexibility, backbone rigidity and compactness, respectively. MD simulations can be carried out with packages such as GROMACS, NAMD and DESMOND. Replacing highly flexible residues with more rigid ones can enhance the thermal stability of the protein. Manjunath and Sekar carried out 50 ns MD simulations of SAICAR synthetase from mesophilic and hyperthermophilic sources²⁰². They concluded that the thermostable proteins were more rigid, and that long-distance interactions observed in the thermostable proteins were lost in their mesostable counterparts. Paul et al. performed 10 ns simulations each at 300 and 350 K, and 20 ns simulations each at 400 and 450 K, for a chemotaxis protein from Thermotoga maritima and its mesophilic counterpart from Salmonella enterica. They observed that the mesostable protein had greater flexibility at higher temperatures⁴⁹. In another work, 16 thermostable mutants of Bacillus subtilis lipase A were studied, and a direct correlation between structural rigidity and thermostability was derived¹⁴⁹. Despite the success of the method, its drawback is that it is case specific: it can guide the mutation of a particular protein to enhance its temperature stability, but it does not lay the foundation for a universal approach.
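As an illustration of the three quantities discussed above, the following minimal sketch computes RMSD, per-atom RMSF and radius of gyration for a toy trajectory in pure Python. It assumes that all frames have already been superimposed on a common reference and that all atoms have equal mass. The function names and coordinates are hypothetical; in practice these analyses are usually run with the tools bundled with the MD package.

```python
import math

# A trajectory is a list of frames; each frame is a list of (x, y, z) positions.
# Assumption: frames are already superimposed on a common reference structure.


def rmsd(frame, reference):
    """Root mean square deviation of one frame from a reference structure."""
    squared = sum(
        (a - b) ** 2 for p, q in zip(frame, reference) for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(frame))


def rmsf(trajectory):
    """Per-atom root mean square fluctuation about the time-averaged position."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    fluctuations = []
    for i in range(n_atoms):
        mean = [sum(f[i][k] for f in trajectory) / n_frames for k in range(3)]
        msf = sum(
            (f[i][k] - mean[k]) ** 2 for f in trajectory for k in range(3)
        ) / n_frames
        fluctuations.append(math.sqrt(msf))
    return fluctuations


def radius_of_gyration(frame):
    """Radius of gyration of one frame, assuming equal atomic masses."""
    n_atoms = len(frame)
    center = [sum(p[k] for p in frame) / n_atoms for k in range(3)]
    squared = sum((p[k] - center[k]) ** 2 for p in frame for k in range(3))
    return math.sqrt(squared / n_atoms)


# Toy two-atom, two-frame trajectory (hypothetical coordinates).
toy_trajectory = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
]

toy_rmsd = rmsd(toy_trajectory[1], toy_trajectory[0])
toy_rmsf = rmsf(toy_trajectory)
toy_rg = radius_of_gyration(toy_trajectory[0])
```

In this toy case the second atom is the only one that moves, so it alone has a non-zero RMSF. In a real protein, residues with high RMSF at elevated temperature are the flexible positions that would be considered as candidates for rigidifying mutations.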

Additionally, the cumulative effect of all the mutations on the physicochemical features, or the structural changes associated with the mutations, cannot be predicted as such using the aforementioned algorithms. Another lacuna is that all these methods offer multiple candidate stabilizing mutations without establishing whether they will actually lead to thermostability. In doing so, they also fail to indicate which point mutation (single or multiple) or which combination of mutations will actually render the protein thermostable. In short, they are unable to rank or prioritize plausible mutations according to their effect on protein stability. Therefore, a new method is needed that can prioritize features according to their importance in rendering proteins thermostable at a desired temperature. This would give rise to a guided approach to extreme-stabilizing proteins.

1.8 Salient features deduced from engineered extreme-stable mutants to better
