It is declared that the work described in this thesis, entitled “Attaining Protein Thermostability – A Rationalized Approach”, carried out by Ms. Debamitra Chakravorty (roll number …) for the award of the degree of Doctor of Philosophy, is an authentic report of the results obtained from research carried out under my supervision at the Department of Biosciences and Bioengineering, Indian Institute of Technology Guwahati, India, and that this work has not been submitted elsewhere for a degree. It is essential to know whether such enzymes can be produced recombinantly through protein engineering approaches.
Protein attributes responsible for thermostability
Genome
Thermophiles and hyperthermophiles have been reported to have high genomic GC content (Bao et al. 2002; Saunders et al. 2003). More recently, Zeldovich et al. (2007) concluded that an increased purine (A+G) content in thermophilic bacterial genomes, arising from a preference for isoleucine, valine, tyrosine, tryptophan, arginine, glutamate and leucine, which have purine-rich codons, may be a primary mechanism of adaptation to thermophilicity.
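Both genome-level measures above are simple ratios. The Python sketch below computes the GC and purine (A+G) fractions of a DNA sequence, and the fraction of a protein made up of the IVYWREL set reported by Zeldovich et al. (2007), in which E is glutamate. The function names are hypothetical helpers for illustration, not part of the thesis pipeline.

```python
# Hypothetical helpers for genome- and proteome-level composition measures.

# IVYWREL set: Ile, Val, Tyr, Trp, Arg, Glu (E), Leu.
IVYWREL = set("IVYWREL")


def _bases(dna: str) -> list:
    """Uppercase A/C/G/T bases only; other characters (N, gaps) are ignored."""
    bases = [b for b in dna.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    return bases


def gc_content(dna: str) -> float:
    """Fraction of G + C among unambiguous bases."""
    bases = _bases(dna)
    return sum(b in "GC" for b in bases) / len(bases)


def purine_content(dna: str) -> float:
    """Fraction of purines (A + G) among unambiguous bases."""
    bases = _bases(dna)
    return sum(b in "AG" for b in bases) / len(bases)


def ivywrel_fraction(protein: str) -> float:
    """Fraction of residues in a protein sequence that belong to the IVYWREL set."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty protein sequence")
    return sum(aa in IVYWREL for aa in protein) / len(protein)
```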
Proteome
A study of 16 protein families showed that thermostable proteins consistently contain more hydrogen bonds (Vogt et al. 1997). Salt bridges reduce the heat capacity change of unfolding (ΔCp), which flattens the protein stability curve and raises the melting temperature (Chan et al. 2011).
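The effect of a smaller ΔCp on the melting temperature can be illustrated with the stability-curve form of the Gibbs-Helmholtz relation, referenced to the temperature of maximum stability Ts (where ΔS = 0 and ΔG = ΔH = ΔGmax): ΔG(T) = ΔGmax + ΔCp[(T - Ts) - T ln(T/Ts)]. The sketch below evaluates this expression and locates the heat-denaturation midpoint Tm by bisection. The parameter values are hypothetical, chosen only to show the trend, and holding ΔGmax and Ts fixed while ΔCp changes is itself a simplifying assumption.

```python
import math


def delta_g(t, dg_max, dcp, ts):
    """Unfolding free energy (kJ/mol) at temperature t (K).

    Stability curve referenced to ts, the temperature of maximum stability,
    where dS = 0 and dG = dH = dg_max:
        dG(T) = dg_max + dCp * ((T - ts) - T * ln(T / ts))
    """
    return dg_max + dcp * ((t - ts) - t * math.log(t / ts))


def melting_temperature(dg_max, dcp, ts, t_max=600.0, tol=1e-6):
    """Heat-denaturation midpoint Tm (K), where dG = 0 above ts.

    For dcp > 0, dG decreases monotonically above ts, so bisection applies.
    """
    lo, hi = ts, t_max
    if delta_g(hi, dg_max, dcp, ts) > 0:
        raise ValueError("protein is still folded at t_max")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if delta_g(mid, dg_max, dcp, ts) > 0:
            lo = mid  # still folded at mid: Tm lies higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical parameters: dG_max = 40 kJ/mol, Ts = 290 K.
    for dcp in (10.0, 5.0):  # kJ/(mol K)
        tm = melting_temperature(40.0, dcp, 290.0)
        print(f"dCp = {dcp:4.1f} kJ/(mol K) -> Tm = {tm:.1f} K")
```

With these values, halving ΔCp moves Tm upward, which is the flattening argument made above.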
Thermodynamic stability
Role of water
As bound water is released at higher temperatures, the local protein structure around water-binding residues such as Ser or Thr can change enough to destabilize the protein (Denisov, 1999; Nagendra et al. 1998). Consistent with this, studies have shown that thermophilic proteins have a much lower frequency of Ser than mesophilic proteins (Chakravarty and Varadarajan, 2000; Kumar et al. 2000).
Approaches to develop thermostable proteins by protein engineering
A limitation is that this can also produce neutral and deleterious mutations (Lehmann et al. 2001). Moreover, the predictive power of these concepts is limited: candidate mutations are essentially chosen at random and must each be tested individually by site-directed mutagenesis (Spector et al. 2000).
Available Thermostable Protein Databases
Theoretical prediction models of thermostability
In 2010, the Prethermut software, based on machine learning methods, was developed to predict the effect of single- or multi-site mutations on protein thermostability (Tian et al. 2010). Multi-site mutations are expected to have a more complex effect on thermostability than single point mutations (Tian et al. 2010).
An insight from molecular dynamics simulation
Another limitation is that these methods propose multiple candidate stabilizing mutations without establishing whether they will actually confer thermostability. Nor can they determine which mutations (single or multiple), or which combination of mutations, will actually make a protein thermostable.
Origin of the work
Despite the success of the method, its main drawback concerns its sensitivity and specificity.
Hypothesis
The relevance and expected outcome of the proposed study
The work draws on the available knowledge of protein stability to develop a ranking model, which is intended to predict whether mutations in a protein will lead to thermostability.
Conclusions
This method is less costly and less time-consuming than current computational and experimental approaches. In addition, no previous work has prioritized the factors according to their contribution to thermostability.
Creation of Thermostable Protein Database
Prologue
Introduction
To date, no database has been dedicated exclusively to thermostable proteins. This chapter outlines the development and integration of a curated database of thermostable proteins.
Methodology
- Data collection, database architecture and integration
- Data analysis
Classification
Amino acid composition analysis of all thermostable proteins
Structural analysis of all thermostable proteins and feature generation
Refinement of collected data and generated features
Mesostable homologous counterparts were obtained by a BLAST search against all other structures in the PDB using optimized parameters. From each search, the top-ranked hit (lowest E-value) was chosen, and only wild-type thermophilic and mesophilic proteins were retained.
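Picking the top-ranked homologue from each search can be automated from BLAST+ tabular output. The sketch below assumes the default 12-column `-outfmt 6` layout. It skips self-hits and keeps the lowest-E-value subject per query, breaking ties by bit score. The helper name is hypothetical, and the subsequent filter for wild-type thermophilic and mesophilic entries would need organism and mutation annotation that BLAST itself does not report.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text: str) -> dict:
    """Map each query to its best subject: lowest E-value, then highest bit score.

    Hits of a query against itself are skipped, so only *other* PDB entries
    are considered as homologous counterparts.
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # blank or comment line
        hit = dict(zip(COLUMNS, row))
        if hit["qseqid"] == hit["sseqid"]:
            continue  # self-hit
        # Smaller tuple = better hit: lower E-value first, then higher bit score.
        key = (float(hit["evalue"]), -float(hit["bitscore"]))
        query = hit["qseqid"]
        if query not in best or key < best[query][0]:
            best[query] = (key, hit["sseqid"])
    return {query: subject for query, (_, subject) in best.items()}
```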
Amino acid composition analysis of the refined dataset
Motif Discovery in refined data of thermostable and mesostable pairs
Of all the proteins in our database, structures with a resolution greater than 2.5 Å were removed.
Refinement of features and intra-protein interaction analysis of the refined dataset
Results and Discussion
- Data collection, database architecture and integration
Data collection
Database architecture
Data analysis
In addition, 30% of the thermostable protein crystal structures belong to hydrolases, and most proteins are stable in the range 70-80 °C. The proteins in the database come from 132 thermostable organisms. Analysis of the amino acid composition of 378 thermostable proteins showed that the percentage of charged and non-polar amino acids is higher than …
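A composition analysis of this kind reduces to counting residues per class. In the Python sketch below, the grouping into charged, non-polar and polar uncharged residues is an assumption for illustration; His and Tyr, in particular, are classified differently by different authors. Non-standard letters are ignored, and the function name is hypothetical.

```python
# Residue classes are an illustrative assumption; groupings differ between studies.
RESIDUE_CLASSES = {
    "charged": set("DEKR"),
    "nonpolar": set("AVLIMFWPG"),
    "polar_uncharged": set("STNQCYH"),
}
STANDARD = set().union(*RESIDUE_CLASSES.values())  # the 20 standard residues


def class_composition(sequence: str) -> dict:
    """Percentage of each residue class in a protein sequence.

    Letters outside the 20 standard amino acids (e.g. X, B, Z) are ignored.
    """
    residues = [aa for aa in sequence.upper() if aa in STANDARD]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    return {
        name: 100.0 * sum(aa in members for aa in residues) / len(residues)
        for name, members in RESIDUE_CLASSES.items()
    }
```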
Refinement of collected data and generated features
Amino acid composition analysis of the refined data
- Conclusions
- Introduction
- Methodology
- Sequence collection and characterization
- Multiple Sequence Alignment (MSA) of thermostable and mesostable lipases
- Study of percentage amino acid composition of thermostable and mesostable lipases
- Structural characterization by tree based annotation of thermostable and mesostable lipases
- Structural analysis
- Study of structurally important residues of thermostable- mesostable lipases
- Results and Discussion
- Sequence characterization
Although Ala has been reported to have a high helix propensity (Panja et al. 2015), its proportion was lower in thermostable proteins. Moreover, the PDB structures showed large RMSDs as calculated by CE (Shindyalov et al. 1998).
Active site residues
Oxyanion hole
The lid of lipases
This is consistent with our observation of poly-Ala stretches in the lid helix of thermoalkalophilic Bacillus lipases, which may contribute substantially to their thermostability. Furthermore, the stability of the lid helix at elevated temperature may be critical for the thermal activity of lipases.
Ion binding
Moreover, a P-loop-like motif with an Arg-to-Lys substitution has been observed in protein tyrosine phosphatases (Zang et al. 1998). Our data analysis identified this P-loop-like motif as a conserved pattern in thermoalkalophilic Bacillus lipases; since Zn²⁺ binding has been reported to induce lipase thermostability (Fujii et al. 1996), the motif may contribute to their thermostability through ion binding.
The AXXXA and GXXXG motifs
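Sequences can be scanned for these small-residue motifs with regular expressions. The hypothetical helper below uses Python's `re` module to report 1-based start positions. It wraps the pattern in a zero-width lookahead so that overlapping occurrences, such as the two GXXXG sites in GXXXGXXXG, are all reported.

```python
import re

# Small-residue motifs: X stands for any residue.
GXXXG = "G...G"
AXXXA = "A...A"


def find_motif(sequence: str, pattern: str) -> list:
    """1-based start positions of every match of pattern, overlaps included.

    Wrapping the pattern in a zero-width lookahead makes re.finditer advance
    one residue at a time instead of skipping past each match.
    """
    return [m.start() + 1 for m in re.finditer(f"(?=(?:{pattern}))", sequence.upper())]
```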
- Comparison of amino acid composition of thermostable and mesostable lipases
- Structural analysis of lipases by subfamily tree annotation
- HotSpot Wizard and CUPSAT analysis of structurally important residues
- Conclusions
It was hypothesized that increasing the strand length to seven residues near β-hairpins would increase the conformational stability of the protein (Stanger et al. 2001). This is entropically favorable at certain turn positions and increases protein rigidity (Trevino et al. 2007).
Rationalizing Protein Thermostability by Multiple Feature Ranking for Model
Introduction
Moreover, favorable mutations are related to the global stability of a protein (Wijma et al. 2013). Achieving such stabilization rationally still requires a deeper understanding of the mechanisms underlying protein thermostability (Eijsink et al. 2004).
Materials and Methods
- Datasets for feature generation
- Classification of thermostable proteins through machine learning algorithms
The two sets in each case were labelled TP (thermostable proteins) and MP (mesostable proteins). The next goal was to build a model able to distinguish thermostable from mesostable proteins.
Application of attribute weighting to enumerate important thermostabilizing features
The final dataset, with its features, was imported into RapidMiner (RapidMiner 5.3.000, Rapid-I GmbH, Stockumer Str., Dortmund, Germany), and the thermostable and mesostable classes (labelled T and M) were set as the label attribute. The dataset was then subjected to unsupervised clustering, lazy modeling and supervised algorithms.
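Lazy modeling defers all work to prediction time, and k-nearest neighbours is the classic example. The minimal Python sketch below labels a protein's feature vector as T or M by majority vote among its k nearest training proteins. It is not RapidMiner's implementation, and the function name is hypothetical. It assumes the features have already been scaled to comparable ranges, since unscaled Euclidean distance lets large-valued features such as surface areas dominate.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict 'T' or 'M' for a feature vector by majority vote of k nearest neighbours.

    train: list of (feature_vector, label) pairs. Nothing is fitted in advance,
    which is what makes the method 'lazy'. Features are assumed pre-scaled.
    """
    if not 0 < k <= len(train):
        raise ValueError("k must be between 1 and the training-set size")
    # Euclidean distance from the query to every training protein.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```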
Application of unsupervised clustering for model generation for protein thermostability
Feature weighting identified important and tractable properties, but on its own it was insufficient to generate models of protein thermostability. Since biological datasets may have missing values, the Expectation Maximization (EM) algorithm is appropriate: it estimates maximum-likelihood parameters in models with incomplete data (Do et al. 2008).
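As a minimal illustration of EM, the sketch below fits a two-component, one-dimensional Gaussian mixture. Here the "missing data" are the unknown cluster memberships rather than missing feature values. The initialization (component means at the minimum and maximum of the data) and the fixed iteration count are arbitrary assumptions, and the function names are hypothetical.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(xs, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by expectation maximization.

    Returns (weights, means, variances).
    """
    mu = [min(xs), max(xs)]  # crude but deterministic initialization
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in xs:
            p = [w[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p] if s > 0 else [0.5, 0.5])
        # M-step: re-estimate weights, means and variances from responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            var[k] = max(var[k], 1e-6)  # guard against a collapsing component
    return w, mu, var
```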
Application of supervised clustering for model generation for protein thermostability
Application of Multicriteria Decision Making algorithm
The next objective was to rank or prioritize protein structural features according to their contribution to thermostability. Machine learning methods can classify thermostable proteins but cannot prioritize thermostability factors by ranking them according to their importance in making proteins thermostable.
Hierarchical clustering
Deriving feature weights and the pairwise comparison matrix
Development of RankProt
Ranks were derived by multiplying the feature matrix of the test set by the priority vector (the principal eigenvector) of the features. Accordingly, if the mutated structure is ranked higher than the reference structure, the mutations qualify as thermostabilizing.
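The scoring rule described above is a matrix-vector product followed by a comparison. The sketch below is an illustration, not the RankProt code itself. It assumes that each structure's features have already been normalized to a common scale and that the priority vector has been derived from the pairwise comparisons. All function names are hypothetical.

```python
def rank_scores(features: dict, priorities: list) -> dict:
    """Score = dot product of a structure's feature vector with the priority vector."""
    scores = {}
    for name, values in features.items():
        if len(values) != len(priorities):
            raise ValueError(f"{name}: expected {len(priorities)} features")
        scores[name] = sum(v * p for v, p in zip(values, priorities))
    return scores


def rank_structures(features: dict, priorities: list) -> list:
    """Structure names ordered from highest to lowest score."""
    scores = rank_scores(features, priorities)
    return sorted(scores, key=scores.get, reverse=True)


def is_stabilizing(features: dict, priorities: list, wild_type: str, mutant: str) -> bool:
    """True if the mutant scores higher than the wild-type (reference) structure."""
    scores = rank_scores(features, priorities)
    return scores[mutant] > scores[wild_type]
```

Because every structure is scored with the same priority vector, any number of single or combined mutants can be ranked in one pass.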
Performance and validation
Thus, given the wild-type and mutated constructs, one can predict from their RankProt ranks whether a mutation will confer thermostability. Under this model, stabilizing mutations lead to protein thermostabilization if, and only if, they increase the higher-priority features.
Ranking proteins and mutations
Accuracy
Results and Discussion
- Classification of thermostable proteins through machine learning algorithms
Datasets for feature generation
Table 4.1 shows that main chain-main chain hydrogen bonds, polar accessible surface area, charged accessible surface area and ionic interactions were the properties assigned weights by 10 of the weighting algorithms. Increases in main chain-main chain hydrogen bonds and ionic interactions have been reported to increase protein stability (Sadeghi et al. 2006).
Unsupervised clustering to generate model for protein thermostability
Supervised clustering to generate model for protein thermostability
Lazy modeling to generate model for protein thermostability
Decision Trees to generate model for protein thermostability
Multicriteria decision making to rank thermostabilizing features
The application of AHP for ranking thermostabilizing features
Generation of feature weights for thermostability factors
Using this method, all properties were prioritized according to their contribution to protein thermostability. The judgments used to derive the priority vectors can be accepted as consistent, since the consistency ratio (CR) is below 0.10 (Saaty et al. 2008).
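The priority vector and consistency ratio follow from Saaty's eigenvector method. The priorities are the normalized principal eigenvector of the pairwise comparison matrix A, the consistency index is CI = (λmax - n)/(n - 1), and CR = CI/RI, where RI is Saaty's random index for matrices of order n. The Python sketch below obtains the eigenvector by power iteration. Its RI table covers n ≤ 10 only, and the function names are hypothetical.

```python
# Saaty's random consistency index (RI) for matrix orders 1..10.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def _mat_vec(matrix, vec):
    return [sum(a * v for a, v in zip(row, vec)) for row in matrix]


def ahp_priorities(matrix, n_iter=1000, tol=1e-12):
    """Principal eigenvector (normalized to sum 1) and lambda_max by power iteration.

    Converges for positive reciprocal matrices (Perron-Frobenius).
    """
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(n_iter):
        aw = _mat_vec(matrix, w)
        total = sum(aw)
        new_w = [x / total for x in aw]
        converged = max(abs(a - b) for a, b in zip(new_w, w)) < tol
        w = new_w
        if converged:
            break
    aw = _mat_vec(matrix, w)
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    return w, lambda_max


def consistency_ratio(matrix):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1); CR < 0.10 is acceptable."""
    n = len(matrix)
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    _, lambda_max = ahp_priorities(matrix)
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]
```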
Ranking obtained for features contributing to thermostability
In main chain-main chain hydrogen bonds, both the donor (NH) and acceptor (CO) groups come from the backbone. This indicates that not all types of hydrogen bonds contribute equally to protein thermostability.
RankProt: Validation and Accuracy
Conclusions
A further advantage of this method is that multiple combinations of mutations can be prioritized at once, with higher ranks assigned to the more stabilizing ones. The software package is available on request, and the download link is provided on the web interface of the Thermostable Protein Structural Database.
Attaining Plausible Mutations to Enhance Protein Thermostability
Introduction
To further validate RankProt, the wild-type lipase A of Bacillus subtilis, which is stable at 35 °C, was chosen in this chapter as the model enzyme for carrying out mutations (Acharya et al. 2004).
Methodology
- Selecting model enzyme for experimentation
- In silico mutagenesis
- Contact map analysis
- Molecular dynamics simulation
The contact type and distance cutoff were selected within the tool (Vehlow et al. 2011). In addition, the HB-plot tool (Bikadi et al. 2007) was used to analyze the hydrogen-bond networks of the wild-type and mutant structures.
Analysis of MD Simulation Trajectories
Results and Discussion
- Selecting model enzyme for experimentation
… Parrinello-Rahman barostat (Parrinello et al. 1981), with temperature and pressure coupling time constants of 1.0 ps. Interestingly, in addition to the features described above as leading to thermostabilization, the number of γ-turns was observed to increase in the mutated Bacillus subtilis lipases.
Homology modeling and docking studies
Ranking via RankProt
Contact map analysis to enumerate the importance of predicted stabilizing mutations
The number of unique contacts is clearly higher in the mutants than in the wild-type structure.
Comparative bar graphs of unique contacts in thermostable mutants of Bacillus subtilis lipase and the wild type (1i6w).
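Unique contacts can be obtained as the set difference between two contact maps. The sketch below assumes one representative atom per residue (for example Cα), an 8 Å cutoff and a minimum sequence separation of 3. These are illustrative choices, not necessarily the contact type and cutoff used in the thesis, and the helper names are hypothetical.

```python
import math
from itertools import combinations


def contact_set(coords: dict, cutoff: float = 8.0, min_seq_sep: int = 3) -> set:
    """Residue pairs (i, j), i < j, whose representative atoms lie within cutoff (Å).

    coords: residue number -> (x, y, z), e.g. C-alpha positions.
    Pairs closer than min_seq_sep in sequence are ignored as trivial contacts.
    """
    contacts = set()
    for (i, a), (j, b) in combinations(sorted(coords.items()), 2):
        if abs(i - j) >= min_seq_sep and math.dist(a, b) <= cutoff:
            contacts.add((i, j))
    return contacts


def unique_contacts(wt_coords: dict, mut_coords: dict, **kwargs):
    """Return (contacts only in the mutant, contacts only in the wild type)."""
    wt = contact_set(wt_coords, **kwargs)
    mut = contact_set(mut_coords, **kwargs)
    return mut - wt, wt - mut
```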
Hydrogen bond analysis of mutated and wild type structures
Molecular dynamics simulation analysis of the predicted thermostabilizing mutations of Bacillus subtilis
MD simulations of the wild type (1i6w; WT), mut 1 and mut 2 were performed at elevated temperatures because protein denaturation has been reported to occur on the microsecond time scale (Duan et al. 1998); raising the temperature accelerates unfolding so that it can be observed within the simulated time. In both mut 1 and mut 2, Gln121 was replaced by Asn (Q121N).
Root Mean Square Deviation (RMSD)
The root mean square deviation (RMSD) from the initial structure as a function of time is shown for WT and the mutants over the 30 ns simulation. Abbreviations: Rg, radius of gyration; RMSD, root mean square deviation; RMSF, root mean square fluctuation; WT, wild type.
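RMSD and Rg reduce to short formulas. The sketch below computes the RMSD between two coordinate sets that are assumed to be already superposed, since MD analysis tools normally perform a least-squares fit first. It also computes the mass-weighted radius of gyration, using unit masses by default as a simplifying assumption. The function names are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two already-superposed coordinate sets (same units as input)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))


def radius_of_gyration(coords, masses=None):
    """Mass-weighted Rg about the center of mass; unit masses if none are given."""
    masses = masses or [1.0] * len(coords)
    total = sum(masses)
    center = [sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)]
    return math.sqrt(sum(m * math.dist(c, center) ** 2 for m, c in zip(masses, coords)) / total)
```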
Root Mean Square Fluctuation (RMSF)
Both mutants show lower flexibility than the wild type in these regions at all three temperatures, indicating that the mutations decreased flexibility relative to the wild type.
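RMSF measures each atom's fluctuation about its time-averaged position. The sketch below assumes that the frames have already been fitted to a common reference and that every frame lists the same atoms in the same order. The function name is hypothetical.

```python
import math


def rmsf(frames):
    """Per-atom RMSF about the time-averaged position.

    frames: list of frames, each a list of (x, y, z) for the same atoms,
    assumed already fitted to a common reference structure.
    """
    n_frames = len(frames)
    if n_frames == 0:
        raise ValueError("no frames given")
    result = []
    for i in range(len(frames[0])):
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        msd = sum(math.dist(f[i], mean) ** 2 for f in frames) / n_frames
        result.append(math.sqrt(msd))
    return result
```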
Radius of Gyration (Rg)
Hydrogen Bonds
Interestingly, the average number of main chain hydrogen bonds is highest for mut 2, followed by WT, and lowest for mut 1. As the temperature increases, the protein unfolds and its side chains form hydrogen bonds with the solvent instead, reducing the number of hydrogen bonds between side chains within the protein.
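Hydrogen-bond counts of this kind can be approximated with a geometric criterion. The sketch below is a simplification. It counts backbone N···O pairs closer than a cutoff (3.5 Å by default, an assumption) without any angle criterion, skips pairs from the same or adjacent residues, and averages the count over frames. Dedicated tools also test the donor-hydrogen-acceptor angle. The function names are hypothetical.

```python
import math


def count_backbone_hbonds(n_atoms: dict, o_atoms: dict,
                          cutoff: float = 3.5, min_seq_sep: int = 2) -> int:
    """Count backbone N...O pairs closer than cutoff (Å); distance-only criterion.

    n_atoms, o_atoms: residue number -> (x, y, z) of backbone N and O.
    Pairs from the same or adjacent residues are skipped, because N(i) and
    O(i-1) sit close together across the peptide bond without H-bonding.
    """
    return sum(
        1
        for i, n in n_atoms.items()
        for j, o in o_atoms.items()
        if abs(i - j) >= min_seq_sep and math.dist(n, o) < cutoff
    )


def mean_hbonds(trajectory, cutoff: float = 3.5) -> float:
    """Average H-bond count over frames; each frame is (n_atoms, o_atoms)."""
    counts = [count_backbone_hbonds(n, o, cutoff) for n, o in trajectory]
    return sum(counts) / len(counts)
```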
Secondary structure analysis
Conclusions
The number of hydrogen bonds shorter than 3 Å was much larger in the mutants than in the wild-type structure. The results revealed several factors supporting the greater stability of these mutants relative to the wild type.
Conclusion and Future Perspective
- Conclusions
- Commercial Viability
- Research Output
- Future perspective
Molecular dynamics simulations of the wild type and mutants were performed at 320 K, 330 K and 350 K for 30 ns each. The calculated Tm values of the two mutants were 63 °C and 66 °C, compared with 59 °C for the wild type.
BIBLIOGRAPHY
A coarse-grained elastic network atom contact model and its use in the simulation of protein dynamics and the prediction of the effect of mutations.
Thermostability of proteins: role of metal binding and pH on the stability of the dinuclear CuA site of Thermus thermophilus.
PUBLICATIONS
Thesis Publications
Journal Papers
Book Chapters
Award
Conference/Workshop presentation
Journal (Other than Thesis work)
Conference (Other than Thesis work)