Attaining Protein Thermostability – A Rationalised Approach

Academic year: 2023


Full text

It is declared that the work described in this thesis entitled “Attaining Protein Thermostability – A Rationalized Approach” carried out by Ms. Debamitra Chakravorty (roll number for the award of the degree of Doctor of Philosophy is an authentic report of the results obtained from the research work carried out under my supervision at the Department of Biosciences and Bioengineering, Indian Institute of Technology Guwahati, India, and this work has not been submitted elsewhere for a degree. It is essential to know whether such enzymes can be produced recombinantly through protein engineering approaches.

Fig. 1.1. Sources of some chosen thermostable enzymes. This figure has been adapted from Stetter KO

Protein attributes responsible for thermostability

Genome

Thermophiles and hyperthermophiles have been reported to have high GC content (Bao et al. 2002; Saunders et al. 2003). More recently, Zeldovich et al. (2007) concluded that an increase in the purine (A+G) content of thermophilic bacterial genomes, driven by a preference for isoleucine, valine, tyrosine, tryptophan, arginine, glutamate and leucine, which have purine-rich codon patterns, is a possible primary adaptation mechanism for thermophilicity.

Proteome

A study of 16 protein families showed that thermostable proteins have a consistently higher number of hydrogen bonds (Vogt et al. 1997). Salt bridges reduce the heat capacity change of unfolding (ΔCp), thereby shifting the thermostability curve (Chan et al. 2011).
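The effect of ΔCp on the stability curve can be illustrated with the Gibbs-Helmholtz relation. The sketch below is not taken from the thesis: the melting temperature, unfolding enthalpy and ΔCp values are hypothetical, chosen only to show that, for a fixed Tm and unfolding enthalpy, a smaller ΔCp gives a higher ΔG of unfolding away from Tm.

```python
import math

def delta_g_unfolding(temp_k, tm_k, dh_m, dcp):
    """Gibbs-Helmholtz free energy of unfolding (kJ/mol) at temperature temp_k.

    tm_k : melting temperature (K)
    dh_m : enthalpy of unfolding at Tm (kJ/mol)
    dcp  : heat capacity change of unfolding (kJ/mol/K), assumed constant
    """
    return (dh_m * (1.0 - temp_k / tm_k)
            - dcp * ((tm_k - temp_k) + temp_k * math.log(temp_k / tm_k)))

# Hypothetical parameters, for illustration only.
TM_K, DH_M = 340.0, 400.0
for dcp in (8.0, 4.0):
    curve = [delta_g_unfolding(t, TM_K, DH_M, dcp) for t in (280.0, 300.0, 320.0, 340.0)]
    print(f"dCp = {dcp}: " + ", ".join(f"{g:.1f}" for g in curve))
```

Under these assumptions the curve with the lower ΔCp lies above the other at every temperature except Tm, where both are zero.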

Table 1.2. Contribution of amino acid residues towards protein thermostability

Thermodynamic stability

Role of water

As water is released at higher temperatures, the local protein structure around water-binding residues such as Ser or Thr can change enough to destabilize the protein (Denisov, 1999; Nagendra et al. 1998). Consistent with this, studies have shown that thermophilic proteins have a very low frequency of Ser compared to mesophilic proteins (Chakravarty and Varadarajan, 2000; Kumar et al. 2000).

Approaches to develop thermostable proteins by protein engineering

The limitation is that this can lead to neutral or deleterious mutations (Lehmann et al. 2001). Moreover, the predictive power of these concepts is quite limited: the candidate mutations are essentially random and must be tested individually by site-directed mutagenesis (Spector et al. 2000).

Table 1.4. Present approaches for thermostabilizing proteins (columns: Proposed features; Contributing factors)

Available Thermostable Protein Databases

Theoretical prediction models of thermostability

In 2010, the Prethermut software was developed, based on machine learning methods, to predict the effect of single or multi-site mutations on protein thermostability (Tian et al. 2010). Multi-site mutations are expected to have a more complex effect on protein thermostability than single point mutations (Tian et al. 2010).

Table 1.5. Existing popular software tools that predict the stability of mutations

An insight from molecular dynamics simulation

Another gap is that all these methods offer multiple candidate stabilizing mutations without establishing whether they will actually lead to thermostability. Moreover, they cannot select which point mutation (single or multiple) or which combination of mutations will actually make a protein thermostable.

Origin of the work

Regardless of the success of the method, the only drawback is that it is sensitive and specific.

Hypothesis

The relevance and expected outcome of the proposed study

The work draws on the available knowledge about protein stability to develop a ranking model. The model is intended to predict whether mutations in a protein will lead to thermostability.

Conclusions

This method is a low-cost and time-saving technology compared to current computational and experimental approaches. Fourth, no existing work prioritizes factors according to their role in contributing to thermostability.

Creation of Thermostable Protein Database

Prologue

Introduction

It is clear that no database dedicated exclusively to thermostable proteins exists to date. This chapter outlines the development and integration of a curated database for thermostable proteins.

Fig. 2.1. The schema for thermostable protein database development.

Methodology

  • Data collection, database architecture and integration
  • Data analysis

Classification

Amino acid composition analysis of all thermostable proteins

Structural analysis of all thermostable proteins and feature generation

Refinement of collected data and generated features

The mesostable homologous counterparts were obtained via a BLAST search against all other structures in the PDB using optimal parameters. From each BLAST search, the highest-ranking protein (by E-value) was chosen, and only wild-type thermophilic and mesophilic proteins were retained.
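The selection step can be sketched as follows. This is a minimal illustration, assuming the BLAST hits have already been parsed into (query, subject, E-value) records. The `Hit` type, the identifiers and the E-values are hypothetical, and the wild-type filter is not modelled here.

```python
from typing import Dict, List, NamedTuple

class Hit(NamedTuple):
    query: str     # identifier of the thermostable query protein
    subject: str   # identifier of the BLAST hit
    evalue: float  # BLAST E-value; smaller means more significant

def best_hit_per_query(hits: List[Hit]) -> Dict[str, Hit]:
    """Keep, for each query, the hit with the lowest E-value."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.query == hit.subject:
            continue  # ignore self-hits
        current = best.get(hit.query)
        if current is None or hit.evalue < current.evalue:
            best[hit.query] = hit
    return best

# Hypothetical identifiers and E-values, for illustration only.
hits = [
    Hit("thermo_A", "meso_X", 1e-50),
    Hit("thermo_A", "meso_Y", 1e-80),
    Hit("thermo_B", "meso_Z", 3e-12),
]
best = best_hit_per_query(hits)
for query, hit in best.items():
    print(query, "->", hit.subject, hit.evalue)
```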

Amino acid composition analysis of the refined dataset

Motif Discovery in refined data of thermostable and mesostable pairs

From the total proteins in our database, protein structures with resolution greater than 2.5 Å were removed.
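A resolution filter of this kind could look like the sketch below. The entry names and resolution values are hypothetical, and dropping entries without a reported resolution is an assumption rather than something stated in the text.

```python
MAX_RESOLUTION_ANGSTROM = 2.5  # cutoff stated in the text

def filter_by_resolution(structures, cutoff=MAX_RESOLUTION_ANGSTROM):
    """Keep structures whose resolution is at or below the cutoff (in Å).

    `structures` maps an entry name to its resolution, or to None when no
    resolution is reported; such entries are dropped (an assumption).
    """
    return {name: res for name, res in structures.items()
            if res is not None and res <= cutoff}

# Hypothetical entries, for illustration only.
structures = {"entry_1": 1.8, "entry_2": 2.5, "entry_3": 3.1, "entry_4": None}
kept = filter_by_resolution(structures)
print(sorted(kept))
```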

Refinement of features and intra-protein interaction analysis of the refined dataset

Results and Discussion

  • Data collection, database architecture and integration

Data collection

Database architecture

Data analysis

In addition, 30% of the thermostable protein crystal structures belong to hydrolases, and most proteins have a temperature stability range of 70-80 °C. The proteins in the database represent 132 thermostable organisms. The analysis of the amino acid composition of 378 thermostable proteins led to the conclusion that the percentage of charged and non-polar amino acids is higher than

Fig. 2.4. A) Pie charts illustrating the classification of thermostable proteins by kingdom

Refinement of collected data and generated features

Amino acid composition analysis of the refined data

  • Conclusions
  • Introduction
  • Methodology
    • Sequence collection and characterization
    • Multiple Sequence Alignment (MSA) of thermostable and mesostable lipases
    • Study of percentage amino acid composition of thermostable and mesostable lipases
    • Structural characterization by tree based annotation of thermostable and mesostable lipases
    • Structural analysis
    • Study of structurally important residues of thermostable-mesostable lipases
  • Results and Discussion
    • Sequence characterization

Although Ala has been reported to have a high helix propensity (Panja et al. 2015), its percentage was observed to be lower in thermostable proteins. Moreover, the PDB structures showed large RMSD values as calculated by CE (Shindyalov et al. 1998).
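A percentage composition of this kind can be computed as in the sketch below. The residue groupings are one common convention and an assumption here (for example, His is left out of the charged set), and the sequence is hypothetical.

```python
from collections import Counter

# Residue groupings are assumptions; definitions vary between studies.
CHARGED = set("DEKR")
NON_POLAR = set("AVLIMFWPG")

def percent_composition(sequence):
    """Return {residue: percentage} for a one-letter amino acid sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}

def group_percent(sequence, group):
    """Summed percentage of all residues belonging to `group`."""
    comp = percent_composition(sequence)
    return sum(pct for aa, pct in comp.items() if aa in group)

seq = "MKAAEELLKR"  # hypothetical sequence
print("Ala %:", percent_composition(seq)["A"])
print("charged %:", group_percent(seq, CHARGED))
print("non-polar %:", group_percent(seq, NON_POLAR))
```

Comparing such percentages between a thermostable protein and its mesostable counterpart gives the per-residue differences discussed in the text.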

Table 2.3. The 17 statistically significant features responsible for protein thermostability

Active site residues

Oxyanion hole

The lid of lipases

Our result supports this point: we noted the presence of poly-Ala residues in the lid helix of thermoalkalophilic Bacillus lipases, which may contribute substantially to their thermostability. Furthermore, the stability of the lid helix at elevated temperature may be critical for the thermal activity of lipases.

Ion binding

Moreover, a P-loop-like motif with an Arg to Lys substitution is observed in protein tyrosine phosphatases (Zang et al. 1998). Since Zn2+ binding has been reported to induce thermostability in lipases (Fujii et al. 1996), our data analysis suggests that this P-loop-like motif can be considered a conserved pattern in thermoalkalophilic Bacillus lipases.

The AXXXA and GXXXG motifs
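Occurrences of these motifs in a sequence can be located with a regular expression. The sketch below is an illustration rather than the procedure used in the thesis, and the example sequence is hypothetical. A lookahead is used so that overlapping motifs are all reported.

```python
import re

# Zero-width lookahead: every start position is tested, so overlaps are kept.
MOTIF_PATTERN = re.compile(r"(?=(A.{3}A|G.{3}G))")

def find_axxxa_gxxxg(sequence):
    """Return (0-based start, matched 5-mer) for every AXXXA or GXXXG motif."""
    return [(m.start(), m.group(1))
            for m in MOTIF_PATTERN.finditer(sequence.upper())]

motifs = find_axxxa_gxxxg("GAVLGAAAAA")  # hypothetical sequence
for start, motif in motifs:
    print(start, motif)
```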

  • Comparison of amino acid composition of thermostable and mesostable lipases
  • Structural analysis of lipases by subfamily tree annotation
  • HotSpot Wizard and CUPSAT analysis of structurally important residues
  • Conclusions

It was hypothesized that increasing the strand length to seven residues near β-hairpins would increase the conformational stability of the protein (Stanger et al. 2001). This is entropically favorable at certain turn positions and leads to an increase in protein rigidity (Trevino et al. 2007).

Fig. 3.2. Average % amino acid composition of bacterial and fungal thermostable and mesostable lipases

Rationalizing Protein Thermostability by Multiple Feature Ranking for Model

Introduction

Moreover, favored mutations are related to the global stability of a protein (Wijma et al. 2013). Second, achieving this still requires a deeper understanding of the mechanisms underlying protein thermostability (Eijsink et al. 2004).

Fig. 4.1. Schema to identify and generate thermostable mutants.

Materials and Methods

  • Datasets for feature generation
  • Classification of thermostable proteins through machine learning algorithms

The two sets in each case were named TP for thermostable proteins and MP for mesostable proteins. The next goal was to create a model able to distinguish between thermostable and mesostable proteins.

Application of attribute weighting to enumerate important thermostabilizing features

The final dataset with its features was imported into RapidMiner (RapidMiner 5.3.000, Rapid-I GmbH, Stockumer Str., Dortmund, Germany), and the thermostable and mesostable classes (categorized as T and M) were set as the label attribute. For the same reason, the dataset was further subjected to unsupervised, lazy modeling and supervised algorithms.

Application of unsupervised clustering for model generation for protein thermostability

Feature weighting identified important features and tractable properties, but alone was insufficient to generate models for protein thermostability. Since biological datasets may have missing values, the Expectation Maximization algorithm is suitable because it estimates likelihood parameters in models with incomplete data (Do et al. 2008).
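The idea behind Expectation Maximization can be illustrated with a two-component Gaussian mixture on a single feature, where the unknown cluster membership of each point plays the role of the incomplete data. This is a generic sketch, not the RapidMiner operator or the settings used in the thesis. The data values and the initialisation scheme are assumptions.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_two_gaussians(data, n_iter=50):
    """Fit a two-component 1D Gaussian mixture; returns (weights, means, variances)."""
    overall_mean = sum(data) / len(data)
    overall_var = sum((x - overall_mean) ** 2 for x in data) / len(data)
    means = [min(data), max(data)]        # simple deterministic initialisation
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate parameters from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # floor to avoid a collapsing variance
    return weights, means, variances

# Hypothetical values of one structural feature for ten proteins.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances = em_two_gaussians(data)
print(weights, means, variances)
```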

Application of supervised clustering for model generation for protein thermostability

Application of Multicriteria Decision Making algorithm

The next objective was to rank or prioritize protein structural features according to their contribution to thermostability. Machine learning methods can classify thermostable proteins but cannot prioritize thermostability factors by ranking them according to their importance in making proteins thermostable.

Hierarchical clustering

Deriving weights of features and the pairwise comparison matrix

Development of RankProt

Ranks were derived by matrix multiplication of the features in the test set by the priorities (eigenvectors) of the features. Therefore, if the rank of the mutated structure is higher than that of the reference structure, such mutations qualify as thermostabilizing.
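The ranking principle can be sketched as a weighted sum of feature values. The code below is not RankProt itself: the priority vector, the three features and their values are hypothetical, and how RankProt scales features before multiplication is not stated here, so pre-normalised values are assumed.

```python
def rank_scores(feature_rows, priorities):
    """Multiply each structure's feature vector by the priority vector."""
    scores = []
    for row in feature_rows:
        if len(row) != len(priorities):
            raise ValueError("feature vector and priority vector differ in length")
        scores.append(sum(f * w for f, w in zip(row, priorities)))
    return scores

def is_predicted_stabilizing(wild_type, mutant, priorities):
    """A mutant qualifies when its score exceeds that of the reference structure."""
    wt_score, mut_score = rank_scores([wild_type, mutant], priorities)
    return mut_score > wt_score

# Hypothetical priorities and normalised feature values, for illustration only.
priorities = [0.6, 0.3, 0.1]
wild_type = [0.50, 0.40, 0.70]
mutant = [0.58, 0.40, 0.60]
print(rank_scores([wild_type, mutant], priorities))
print(is_predicted_stabilizing(wild_type, mutant, priorities))
```

In this toy case the mutant gains in the highest-priority feature and loses in the lowest-priority one, so its overall score is higher.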

Performance and validation

Thus, if the wild-type and mutated constructs are available, one can predict whether the mutation will lead to thermal stability from the ranks given by RankProt for the wild-type and mutated constructs. If, and only if, such stabilizing mutations increase the number of higher priority features, they will lead to protein thermostabilization.

Ranking proteins and mutations

Accuracy

Results and Discussion

  • Classification of thermostable proteins through machine learning algorithms

Datasets for feature generation

From Table 4.1 it can be observed that main chain–main chain hydrogen bonds, polar accessible surface area, charged accessible surface area and ionic interactions were the properties assigned weights by 10 of the weighting algorithms. An increase in main chain–main chain hydrogen bonds and ionic interactions has been reported to increase the stability of proteins (Sadeghi et al. 2006).

Table 4.1. Results of attribute weighting

Unsupervised clustering to generate model for protein thermostability

Supervised clustering to generate model for protein thermostability

Lazy modeling to generate model for protein thermostability

Decision Trees to generate model for protein thermostability

Multicriteria decision making to rank thermostabilizing features

The application of AHP for ranking thermostabilizing features

Generation of feature weights for thermostability factors

Using this method, all properties were prioritized according to their importance in contributing to protein thermostability. The above-mentioned judgments for deriving the preference vectors can be accepted as consistent, since the value of the consistency ratio (CR) is less than 0.10 (Saaty et al. 2008).
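The consistency check can be sketched as follows. The pairwise comparison matrix is a hypothetical three-feature example, not the matrix used in the thesis. Priorities are estimated by power iteration, and the random index values are Saaty's commonly tabulated ones.

```python
# Saaty's random consistency index (RI) for matrix orders 1 to 10.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def principal_eigen(matrix, n_iter=100):
    """Power iteration; returns (priority vector summing to 1, lambda_max)."""
    n = len(matrix)
    vec = [1.0 / n] * n
    lam = float(n)
    for _ in range(n_iter):
        product = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        lam = sum(product)  # vec sums to 1, so this estimates lambda_max
        vec = [p / lam for p in product]
    return vec, lam

def consistency_ratio(matrix):
    """CR = CI / RI, with CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    if n not in RANDOM_INDEX:
        raise ValueError("random index not tabulated for this matrix order")
    if n <= 2:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    _, lam = principal_eigen(matrix)
    return ((lam - n) / (n - 1)) / RANDOM_INDEX[n]

# Hypothetical reciprocal pairwise comparison matrix for three features.
comparison = [
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 3.0],
    [1.0 / 5.0, 1.0 / 3.0, 1.0],
]
priority_vector, lambda_max = principal_eigen(comparison)
cr = consistency_ratio(comparison)
print(priority_vector, lambda_max, cr)
```

A CR below 0.10 means the judgments in the matrix are acceptably consistent; a higher value would call for revising the pairwise comparisons.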

Ranking obtained for features contributing to thermostability

In such hydrogen bonds, the donor (NH) and acceptor (CO) atoms come from the backbone. This indicates that not all types of hydrogen bonds contribute equally to thermostabilizing proteins.

RankProt: Validation and Accuracy

Conclusions

Moreover, the advantage of this method is that multiple combinations of mutations can be prioritized at once, with a higher rank assigned to the more stabilizing ones. The software package can be downloaded on demand, and the download link is available on the web interface of the Thermostable Protein Structural Database.

Attaining Plausible Mutations to Enhance Protein Thermostability

Introduction

To further validate RankProt, the wild-type lipase A of Bacillus subtilis, stable at 35 °C, was chosen in this chapter as a model enzyme for carrying out mutations (Acharya et al. 2004).

Methodology

  • Selecting model enzyme for experimentation
  • In silico mutagenesis
  • Contact map analysis
  • Molecular dynamics simulation

The contact type and distance cutoff are provided by the tool (Vehlow et al. 2011). Therefore, the HB-plot tool (Bikadi et al. 2007) was used to analyze the network of hydrogen bonds in wild-type and mutant structures.

Analysis of MD Simulation Trajectories

Results and Discussion

  • Selecting model enzyme for experimentation

Parrinello−Rahman barostat (Parrinello et al. 1981) with a temperature and pressure coupling time constant of 1.0 ps. Interestingly, apart from the mentioned features leading to thermostabilization, γ-turns were observed to increase in the mutated Bacillus subtilis lipases.

Table 5.1. Comparative analysis of reported and predicted features in thermostability of Bacillus subtilis lipases

Homology modeling and docking studies

Ranking via RankProt

Contact map analysis to enumerate the importance of predicted stabilizing mutations

It can be clearly observed that the number of unique contacts in the mutants is much higher than in the wild-type structure.
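Counting unique contacts amounts to a set difference between two contact maps. The sketch below assumes C-alpha coordinates keyed by residue number, an 8 Å cutoff and a minimum sequence separation of two residues. In the thesis the contact type and cutoff come from the contact map tool, so these values and the toy coordinates are assumptions.

```python
import math
from itertools import combinations

def contact_set(ca_coords, cutoff=8.0, min_separation=2):
    """Residue pairs whose C-alpha atoms lie within `cutoff` Å of each other.

    ca_coords maps residue number -> (x, y, z). The cutoff and the minimum
    sequence separation are assumed values, not those used in the thesis.
    """
    contacts = set()
    for (i, a), (j, b) in combinations(sorted(ca_coords.items()), 2):
        if j - i >= min_separation and math.dist(a, b) <= cutoff:
            contacts.add((i, j))
    return contacts

def unique_contacts(mutant_contacts, wild_type_contacts):
    """Contacts present in the mutant but absent from the wild type."""
    return mutant_contacts - wild_type_contacts

# Toy coordinates, for illustration only.
wt_coords = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 3: (7.6, 0.0, 0.0), 4: (20.0, 0.0, 0.0)}
mut_coords = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 3: (7.6, 0.0, 0.0), 4: (7.6, 5.0, 0.0)}
unique = unique_contacts(contact_set(mut_coords), contact_set(wt_coords))
print(unique)
```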

Fig. 5.3. Comparative bar graphs of unique contacts in thermostable mutants of Bacillus subtilis lipase and wild type (1i6w)

Hydrogen bond analysis of mutated and wild type structures

Molecular dynamics simulation analysis of the predicted thermostabilizing mutations of Bacillus subtilis

MD simulations at elevated temperatures were performed for the wild type (1i6w; WT), mut 1 and mut 2 because protein denaturation has been reported to occur on the microsecond time scale (Duan et al. 1998), and higher temperatures accelerate unfolding so that it can be observed within shorter simulations. Both mut 1 and mut 2 carried the same mutation at position 121, where Gln was replaced by Asn.

Root Mean Square Deviation (RMSD)

Rg: radius of gyration; RMSD: root mean square deviation; RMSF: root mean square fluctuation; WT: wild type. The root mean square deviation (RMSD) from the initial structure as a function of time for WT and mutants during the 30 ns simulation time course is shown.
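The RMSD from the initial structure can be computed per frame as in the sketch below. It assumes that every frame has already been superimposed on the reference, since no fitting is performed here, and the coordinates are toy values rather than simulation output.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) coordinates.

    No superposition is done; the two frames are assumed to be aligned.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

def rmsd_over_time(trajectory):
    """RMSD of every frame from the first frame of the trajectory."""
    reference = trajectory[0]
    return [rmsd(frame, reference) for frame in trajectory]

# Toy two-atom trajectory with three frames, for illustration only.
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
    [(0.0, 0.0, 2.0), (1.0, 0.0, 0.0)],
]
series = rmsd_over_time(trajectory)
print(series)
```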

Root Mean Square Fluctuation (RMSF)

Both mutants show lower flexibility than the wild type in these regions at all three temperatures. This observation shows that the mutations decreased the flexibility of the mutants relative to the wild type.
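A per-residue RMSF and its difference between mutant and wild type can be sketched as follows. The trajectories are toy values with one coordinate per residue, and the frames are assumed to be aligned beforehand. A negative difference indicates that the residue is less flexible in the mutant.

```python
import math

def rmsf(trajectory):
    """Per-atom RMSF: root mean square deviation from the time-averaged position."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    values = []
    for atom in range(n_atoms):
        mean = [sum(frame[atom][d] for frame in trajectory) / n_frames for d in range(3)]
        msd = sum(sum((frame[atom][d] - mean[d]) ** 2 for d in range(3))
                  for frame in trajectory) / n_frames
        values.append(math.sqrt(msd))
    return values

def rmsf_difference(mutant_traj, wild_type_traj):
    """Mutant RMSF minus wild-type RMSF, atom by atom."""
    return [m - w for m, w in zip(rmsf(mutant_traj), rmsf(wild_type_traj))]

# Toy trajectories: the second atom oscillates less in the mutant.
wild_type_traj = [
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
mutant_traj = [
    [(0.0, 0.0, 0.0), (-0.5, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)],
]
diff = rmsf_difference(mutant_traj, wild_type_traj)
print(diff)
```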

Fig. 5.10. Graph showing difference in RMSF between mut 1, mut 2 and wild type lipase at A) 320 K B) 330 K C) 350 K

Radius of Gyration (Rg)

Hydrogen Bonds

Interestingly, the average number of main chain hydrogen bonds is also much higher for mut 2, followed by WT, and the lowest is observed for mut 1. As the temperature increases, unfolding occurs and hydrogen bonds form between the side chains of the amino acid residues and the solvent, thereby reducing the number of hydrogen bonds among protein side chains.

Fig. 5.12. The average number of hydrogen bonds per frame of the 30 ns MD simulations for A) All hydrogen bonds B) Main chain C) Side chain hydrogen bonds at 320, 330 and 350 K

Secondary structure analysis

Conclusions

The number of hydrogen bonds shorter than 3 Å was much higher in the mutants than in the wild-type structure. The results revealed several factors supporting the conclusion that these mutants are more stable than the wild type.

Conclusion and Future Perspective

  • Conclusions
  • Commercial Viability
  • Research Output
  • Future perspective

Molecular dynamics simulations of the wild type and mutants were performed at 320 K, 330 K and 350 K for 30 ns each. The Tm of the mutants was calculated to be 63 °C and 66 °C, compared with 59 °C for the wild type.

BIBLIOGRAPHY

A coarse-grained elastic network atom contact model and its use in the simulation of protein dynamics and the prediction of the effect of mutations. Thermostability of proteins: role of metal binding and pH on the stability of the dinuclear CuA site of Thermus thermophilus.

PUBLICATIONS

Thesis Publications

Journal Papers

Book Chapters

Award

Conference/Workshop presentation

Journal (Other than Thesis work)

Conference (Other than Thesis work)

Images

Table 1.1. Sources of thermostable enzymes
Table 1.3. Existing popular tools to predict intra-protein interactions
