1.7. Theoretical prediction models of thermostability

To overcome the drawbacks of directed evolution approaches, numerous in silico algorithms have been proposed that predict whether a proposed mutation will be thermostabilizing. These models were developed by comparing thermostable proteins with mesostable proteins at different levels of protein organization: from the nucleotide codons in their genes, through the amino acid preferences in their sequences, to their tertiary structures. The algorithms currently available for identifying thermostabilizing mutants are mostly knowledge based (Rohl et al. 2004). A few are based on support vector machines (SVMs) (Capriotti et al. 2005), and fewer still are based on molecular dynamics (Benedix et al. 2009). Table 1.5 presents the existing methods that have been used to predict protein thermostability.

TH-1690_10610619

Attaining Protein Thermostability – A Rationalised Approach 2016

Chapter I 22

Table 1.5. Existing popular software tools that predict the stability of mutations

| Tools | Salient Features | References |
|---|---|---|
| I-Mutant | Support vector machine based, both sequence and structure can be used, single mutation | Capriotti et al. 2005 |
| CUPSAT | Sequence as input, single amino acid mutations | Parthiban et al. 2006 |
| MUPRO | Support vector machine based, sequence as input, single mutation | Cheng et al. 2006 |
| ERIS | Structure as input, multiple mutations | Yin et al. 2007 |
| iPRESTAB | Machine learning based, single mutation | Huang et al. 2007 |
| PoPMuSiC | Single mutation | Dehouck et al. 2009 |
| WET-STAB | Machine learning based, multiple mutations | Huang et al. 2009 |
| MUSTAB | Support vector machine based, sequence as input, multiple mutations | Teng et al. 2010 |
| AUTO-MUTE | Machine learning based, structure as input, single mutation | Masso et al. 2011 |
| SDM | Sequence/structure as input, single mutation | Worth et al. 2011 |
| iSTABLE | Support vector machine based, structure/sequence as input, single mutation | Chen et al. 2013 |
| NeEMO | Machine learning based, structure as input | Giollo et al. 2014 |
| ENCoM | Neural network based, single mutation | Frappier et al. 2014 |
| iRDP | Ensemble of servers | Panigrahi et al. 2015 |

Most of the stability prediction methods presented in Table 1.5 apply machine learning to protein datasets in order to classify thermostable proteins and to discriminate between stabilizing and destabilizing mutations. They achieve higher accuracies than most statistical and molecular dynamics simulation methods. The latter also have the disadvantage of requiring high computational power and expertise. Machine learning has been applied in several ways: methods based on support vector machines, neural networks and decision trees have been used to predict the effects of mutations on thermostability (Bava et al. 2004; Capriotti et al. 2005; Kumar et al. 2000). Large datasets of known primary, secondary and tertiary structures of proteins were used to train the machine learning algorithms. Gromiha et al. analyzed the amino acid compositions of 3075 mesophilic and 1609 thermophilic proteins using logistic functions, neural networks, support vector machines and decision trees, and found that charged as well as hydrophobic residues occur more frequently in thermophiles (Gromiha et al. 2008). In 2010, the Prethermut software was developed, based on machine learning methods, to predict the effect of single- or multi-site mutations on protein thermostability (Tian et al. 2010). Ebrahimi et al. employed various supervised and unsupervised machine learning algorithms to find amino acid composition features that contribute to enzyme thermostability (Ebrahimi et al. 2011). They reported Gln content and the frequency of hydrophilic residues as the most important protein features for thermostability. They also reported that, although the amino acid sequence is the main indicator of protein function, characteristics such as thermostability cannot be predicted directly from the primary amino acid sequence (Ebrahimi et al. 2011). Consequently, methods to predict thermostability have focused on the three-dimensional structures of proteins. From these examples it is clear that the bulk of the work on predicting protein thermostability addresses the primary sequences and tertiary structures of proteins.
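The composition features discussed above (charged residues, hydrophobic residues, Gln content) are simple to compute from a sequence. The following is a minimal sketch only: the residue groupings, the function name `composition_features` and the example sequence are illustrative assumptions and are not taken from the cited studies.

```python
from collections import Counter

# Assumed residue groupings (one common convention, not the cited authors' definition).
CHARGED = set("DEKR")
HYDROPHOBIC = set("AVLIMFW")


def composition_features(sequence):
    """Return simple amino acid composition features for one protein sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    n = len(seq)
    return {
        "charged_fraction": sum(counts[a] for a in CHARGED) / n,
        "hydrophobic_fraction": sum(counts[a] for a in HYDROPHOBIC) / n,
        "gln_fraction": counts["Q"] / n,
    }


# Hypothetical 10-residue peptide, used only to exercise the function.
features = composition_features("MKQLEDKVAE")
print(features)
```

Feature vectors of this kind, computed for sets of thermophilic and mesophilic proteins, are the sort of input that a classifier such as a support vector machine or decision tree would be trained on.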

Moreover, although it has been reported that thermophiles can be distinguished by their pattern of synonymous codon usage for several amino acids (Lynn et al. 2002; Lobry et al. 2003), very little work has been done on building models at the nucleotide and codon usage levels. It has also been reported that, at elevated temperatures, selective constraints at all three molecular levels (nucleotide content, codon usage and amino acid composition) are important for stabilizing thermophilic proteins (Lynn et al. 2002). Only recently, Lu et al. developed a hybrid fractal algorithm that predicts thermophilic nucleotide sequences with an average accuracy of 0.945 (Lu et al. 2012).
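To make the notion of a synonymous codon usage pattern concrete, the sketch below computes relative synonymous codon usage (RSCU), a widely used measure in which a value of 1 means a codon is used exactly as often as expected if all synonymous codons were used equally. RSCU is chosen here for illustration and is an assumption: the cited studies may have used other codon usage statistics. The function name `rscu` and the toy sequence are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group synonymous codons by the amino acid they encode.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)


def rscu(cds):
    """Relative synonymous codon usage for an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    values = {}
    for aa, synonyms in SYNONYMS.items():
        total = sum(counts[c] for c in synonyms)
        if aa == "*" or total == 0:
            continue  # skip stop codons and amino acids absent from the sequence
        for c in synonyms:
            # observed count divided by the count expected under equal usage
            values[c] = counts[c] * len(synonyms) / total
    return values


# Toy sequence with three lysine codons: AAA, AAA, AAG.
usage = rscu("AAAAAAAAG")
print(usage)
```

Comparing such per-codon values between thermophilic and mesophilic genomes is one way the codon-level differences mentioned above could be turned into model features.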

Although much work has been done on identifying stabilizing mutations, the protein engineering methods used to achieve them are still largely random, and their success is a matter of probability. Accurate in silico prediction of the thermodynamic consequences of mutations remains challenging (Seeliger et al. 2010). Khan and Vihinen evaluated and compared 11 online stability predictors and found that their predictions were only moderately accurate (Khan et al. 2010). One limitation is that the majority of these methods require considerable computational power and expertise. Another drawback is that they rely on features calculated from protein sequences, can consider only a single point mutation at a time, and require several empirical parameters or heuristics, such as residue patterning, for their calculations. Moreover, statistical analysis based on Tm values (the midpoint of the thermal transition) suffers from the fact that Tm is available for only a few proteins in high-resolution structural datasets. This limits the ability to examine correlations in a statistically meaningful way (Kumar et al. 2000). Molecular dynamics simulations of mutations are several orders of magnitude more complex than predictions made with a knowledge-based scoring function (Seeliger et al. 2010). A further concern is that only a few algorithms can predict the effect of multiple mutations.

Multi-site mutations are expected to have a more complex effect on protein thermostability than single point mutations (Tian et al. 2010). For example, WET-STAB is a weighted decision table method for predicting the change in protein thermostability upon double mutation from amino acid sequences (Huang et al. 2009). However, its accuracy drops to 0.57 when it is tested on hypothetical reverse mutations (Li et al. 2012). Another model, the Protein Thermostability Random Forest model (PROTS-RF), is based on the Random Forest algorithm and achieves an accuracy of 78.7% for multiple mutations (Li et al. 2012). The accuracy achieved to date is limiting when more than two mutations are to be introduced. Additionally, the cumulative effect of all the mutations on the physicochemical features, and the associated structural changes, cannot be predicted by the aforementioned algorithms. Another gap is that all these methods offer multiple candidate stabilizing mutations without concluding whether they will actually confer thermostability. In doing so, they also fail to identify which point mutation (single or multiple) or which combination of mutations will actually thermostabilize a protein. In short, they are unable to rank or prioritize plausible mutations by their effect on protein stability.
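The reverse-mutation test mentioned above rests on the fact that the stability change of a mutation and of its reverse must be equal in magnitude and opposite in sign. The sketch below illustrates how such a test could be set up. It is not the protocol of the cited studies: the toy data, the sign convention (positive ΔΔG taken as stabilizing) and the deliberately biased predictor `biased_predict` are all hypothetical.

```python
def reverse_mutations(dataset):
    """Build hypothetical reverse mutations: swap the residues and negate ddG."""
    return [(mut, pos, wt, -ddg) for wt, pos, mut, ddg in dataset]


def sign_accuracy(predict, dataset):
    """Fraction of mutations whose predicted ddG sign matches the measured sign."""
    hits = sum((predict(wt, pos, mut) > 0) == (ddg > 0)
               for wt, pos, mut, ddg in dataset)
    return hits / len(dataset)


# Hypothetical toy data: (wild-type residue, position, mutant residue, ddG in kcal/mol).
# Assumed convention: positive ddG means the mutation is stabilizing.
toy = [("A", 10, "V", 1.2), ("G", 25, "P", -0.8),
       ("L", 40, "I", 0.5), ("K", 7, "E", -1.5)]


def biased_predict(wt, pos, mut):
    """Hypothetical predictor that looks only at the mutant residue."""
    return 1.0 if mut in "VILMFW" else -1.0


forward_acc = sign_accuracy(biased_predict, toy)
reverse_acc = sign_accuracy(biased_predict, reverse_mutations(toy))
print(forward_acc, reverse_acc)
```

In this toy case the predictor classifies every forward mutation correctly but only one of the four reverse mutations, because it ignores the wild-type residue. A gap of this kind between forward and reverse accuracy is what the reverse-mutation test is designed to expose.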

Therefore, a new method is needed that can prioritize features according to their importance in rendering proteins thermostable at a desired temperature. Such a method would enable a guided approach to thermostabilizing proteins.