Analysis of attributes contributing to extreme-stability of proteins
4.3 Results and discussion
4.3.2 Relative abundance of protein feature in extremophiles
making approach (analytic hierarchy process, AHP) for ranking model predictions, so as to finally devise ranking models (for each extremophilic/extremophilic protein dataset) for categorizing future proteins and prioritizing the attributes.
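As a rough illustration of how AHP turns pairwise judgments into a ranking, the sketch below derives priority weights from a reciprocal pairwise comparison matrix using the principal eigenvector and reports Saaty's consistency ratio. The function name `ahp_priorities`, the three unnamed criteria and the judgment values are hypothetical and are not taken from this work; the criteria actually used for model ranking are not restated here.

```python
import numpy as np

# Saaty's random consistency index for matrices of size 1..10.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_priorities(pairwise):
    """Return (weights, consistency_ratio) for a reciprocal pairwise matrix.

    pairwise[i][j] states how strongly item i is preferred over item j,
    with pairwise[j][i] == 1 / pairwise[i][j].
    """
    a = np.asarray(pairwise, dtype=float)
    n = a.shape[0]
    eigvals, eigvecs = np.linalg.eig(a)
    k = int(np.argmax(eigvals.real))       # index of the principal eigenvalue
    weights = np.abs(eigvecs[:, k].real)   # eigenvector sign is arbitrary
    weights = weights / weights.sum()      # normalise so weights sum to 1
    lam_max = eigvals[k].real
    ci = (lam_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return weights, cr


# Hypothetical judgments for three criteria (illustrative values only).
example = [[1.0, 2.0, 6.0],
           [1 / 2, 1.0, 3.0],
           [1 / 6, 1 / 3, 1.0]]
weights, cr = ahp_priorities(example)
print(weights, cr)
```

A consistency ratio below about 0.1 is conventionally taken to mean that the pairwise judgments are acceptably consistent.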
Table 4.3: Enumerating statistically significant protein features by KS test.
Dataset | Total features | Significant features (n) | Statistically significant features
T-M | 29 | 19 | AA dataset: Ala, Cys, Asp, Glu, His, Ile, Lys, Asn, Gln, Arg, Ser, Thr, Val, Tiny, Sml, Pol, Chrg, Bsc, Acd
P-M | 49 | 15 | AA + ST dataset: Ala, His, Met, Asn, Ser, Thr, Trp, Tiny, Ali, Aro, Bsc, Acd, NPASA, CASA, GT
T-P | 29 | 21 | AA dataset: Ala, Cys, Asp, Glu, His, Lys, Met, Asn, Gln, Arg, Ser, Thr, Val, Trp, Tiny, Sml, Aro, Pol, Chrg, Bsc, Acd
T-P | 20 | 11 | ST dataset: GT, HI, MMH, MSH, II, CPI, SB, NPASA, PASA, CASA, PV
A-B | 29 | 27 | AA dataset: Ala, Phe, Asp, Glu, His, Ile, Lys, Leu, Met, Asn, Pro, Gln, Arg, Ser, Thr, Val, Trp, Tyr, Tiny, Sml, NPol, Pol, Ali, Aro, Chrg, Bsc, Acd
A-B | 20 | 16 | ST dataset: II, MMH, MSH, SSH, HB, SB, PASA, CASA, PV, AAI, ASI, CPI, HI, DS, AH, BSH
H-Nh | 49 | 17 | AA + ST dataset: Asp, Ile, Asn, Arg, Thr, Tiny, Sml, NPol, Pol, Chrg, Bsc, Acd, HI, CPI, NPASA, SB
B-Nb | 49 | 13 | AA + ST dataset: Gln, Arg, Ser, Tiny, Chrg, Bsc, Acd, HI, II, CPI, SB, BSH, BST
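The feature selection summarized in Table 4.3 can be sketched as a per-feature two-sample Kolmogorov-Smirnov test between the main and the counter dataset. The snippet below is a minimal sketch using `scipy.stats.ks_2samp`; the significance threshold `alpha = 0.05`, the helper name `significant_features` and the toy values are assumptions made for illustration, not the settings or data of this work.

```python
from scipy.stats import ks_2samp


def significant_features(main, counter, alpha=0.05):
    """Return {feature: (statistic, p_value)} for features whose value
    distributions differ between the two datasets (two-sample KS test).

    main, counter: dicts mapping feature name -> per-protein values.
    alpha: assumed significance threshold (not stated in the text).
    """
    selected = {}
    for name in sorted(main.keys() & counter.keys()):
        result = ks_2samp(main[name], counter[name])
        if result.pvalue < alpha:
            selected[name] = (result.statistic, result.pvalue)
    return selected


# Made-up per-protein percentages, for illustration only.
thermo = {
    "Glu": [8.1, 8.4, 9.0, 8.7, 9.3, 8.9, 9.1, 8.6],
    "Gly": [7.0, 7.2, 6.9, 7.1, 7.3, 7.0, 6.8, 7.2],
}
meso = {
    "Glu": [5.9, 6.2, 6.0, 6.4, 5.7, 6.1, 6.3, 5.8],
    "Gly": [7.1, 6.9, 7.0, 7.2, 7.3, 6.8, 7.0, 7.1],
}

for feature, (d, p) in significant_features(thermo, meso).items():
    print(f"{feature}: D = {d:.3f}, p = {p:.2e}")
```

In this toy input only Glu passes the threshold, because its two samples do not overlap, whereas the Gly samples are nearly identical. With many features tested per dataset, a multiple-testing correction may also be worth considering.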
Figure 4.2: Relative abundance of statistically significant protein features for extremophilicity in the (A) T-M, (B) P-M, (C) T-P, (D) A-B, (E) H-Nh and (F) B-Nb datasets. Green bars represent positive contributors of the main dataset and negative contributors of the counter dataset, whereas dark blue bars represent positive contributors of the counter dataset and negative contributors of the main dataset.
The statistical analysis of relative abundance revealed the following. In the T-M dataset, Chrg, Glu, Acd, Bsc, Arg, Lys, etc. are abundant in thermophilic proteins, whereas Sml, Tiny, Gln, Ser, etc. are abundant in mesophilic proteins. In the P-M dataset, Tiny, Ala, Ali, Met, Ser, etc. are abundant in psychrophilic proteins, whereas Bsc, GT, Acd, CASA, etc. are abundant in mesophilic proteins. In the T-P dataset, GI, HI, HB, SB, etc. are abundant in thermophilic proteins, whereas MSH, MMH, PV, PASA, etc. are abundant in psychrophilic proteins. In the A-B dataset, HB, AH, MMH, CASA, etc. are abundant in alkaliphilic proteins, whereas BS, MSH, SSH, HI, AAI, etc. are abundant in acidophilic proteins. In the H-Nh dataset, Sml, Acd, Tiny, Asp, Thr, etc. are abundant in halophilic proteins, whereas Bsc, Arg, Ile, NPol, Ali, etc. are abundant in non-halophilic proteins. In the B-Nb dataset, Chrg, Bsc, Arg, Acd, BST, II, HI, SB, etc. are abundant in barophilic proteins, whereas Tiny, Sml, Gln, etc. are abundant in non-barophilic proteins.
Previous reports corroborate the present results. At the sequence level, charged amino acids such as Glu, Arg and Lys showed higher frequency in thermophilic, alkaliphilic and barophilic proteins [7,10,29], whereas small, polar and acidic amino acids such as Ser, Asp, Gln, Thr and Ala showed higher frequency in psychrophilic and halophilic proteins [7,20]. Psychrophilic proteins also contain a significantly higher proportion of amino acids that contribute to greater flexibility in the coil regions of proteins, such as those with tiny/small or neutral side chains [7]. In contrast, Pyrococcus abyssi, a hyperthermophilic piezophile, has been found to show an increase in small amino acids (Gly, Ala, Ser, Thr, Pro, Asp and Asn) across its proteome when compared with the related thermophilic but non-piezophilic archaeon Pyrococcus furiosus [13]. Adaptation of proteins to low pH appears to be attributable to the prevalence of acidic amino acids (Asp and Glu, negatively charged at neutral pH), whereas adaptation to basic pH appears to be attributable to the prevalence of basic amino acids (Lys, Arg and His) on the surface of these enzymes and proteins [30]. Correspondingly, Reed et al. also reported that basic amino acids such as arginine are beneficial for the stabilization of proteins under extreme temperature as well as extreme pressure conditions [10]. In contrast to barophilic proteins, Metpally and Reddy reported that in psychrophilic proteins small amino acids such as Ser, Asp, Gln, Thr and Ala are overrepresented in the coil regions of secondary structures, whilst charged and basic side chains such as Arg and Glu are underrepresented in the helical regions [7].
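The sequence-level comparison discussed above can be illustrated with a short sketch that computes the percentage of each amino acid and of each physicochemical class in a sequence, and then the difference in mean abundance between a main and a counter dataset. The class memberships below follow the grouping used by tools such as EMBOSS pepstats and are an assumption; the definitions used in this work may differ. The function names and the toy sequences are hypothetical. Under this assumption, 20 amino acids plus 9 classes give 29 features, which agrees with the total listed for the AA datasets in Table 4.3.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed class definitions (pepstats-style); not confirmed by the text.
CLASSES = {
    "Tiny": set("ACGST"),
    "Sml": set("ACDGNPSTV"),
    "Ali": set("AILV"),
    "Aro": set("FHWY"),
    "NPol": set("ACFGILMPVWY"),
    "Pol": set("DEHKNQRST"),
    "Chrg": set("DEHKR"),
    "Bsc": set("HKR"),
    "Acd": set("DE"),
}


def composition(seq):
    """Percentage of each amino acid and each class in one sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    n = len(seq)
    feats = {aa: 100.0 * counts[aa] / n for aa in AMINO_ACIDS}
    for name, members in CLASSES.items():
        feats[name] = 100.0 * sum(counts[aa] for aa in members) / n
    return feats


def abundance_difference(main_seqs, counter_seqs):
    """Mean feature value in the main set minus the mean in the counter set.

    Positive values mark features more abundant in the main dataset,
    negative values mark features more abundant in the counter dataset.
    """
    main = [composition(s) for s in main_seqs]
    counter = [composition(s) for s in counter_seqs]
    return {
        name: sum(f[name] for f in main) / len(main)
        - sum(f[name] for f in counter) / len(counter)
        for name in main[0]
    }


# Toy sequences, for illustration only.
diff = abundance_difference(["KKEA"], ["SSGA"])
print(diff["Chrg"], diff["Tiny"])
```

The sign convention mirrors the positive and negative contributors plotted in Figure 4.2, although the exact abundance measure used there is not restated in this section.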
At the structural level, hydrogen bonds, electrostatic interactions, hydrophobic interactions, charge accessible surface area, gamma turns, etc. are the most prominent factors related to the enhanced thermal stability of proteins, and they make the protein structure robust enough to withstand extreme conditions [2,16,31]. An increase in main chain-main chain hydrogen bonds and ionic interactions has been reported to increase the thermostability of proteins [32]. Electrostatic interactions and salt bridges are among the most prominent factors related to the enhanced thermal stability of proteins from extremophiles. It was also reported that barophilic P. abyssi tends to substitute arginine (Arg) for all other amino acids in sequences homologous to those of non-barophilic P. furiosus [13]. This agrees with the sequence-level results reported here, in which arginine is a frequently occurring amino acid in the primary structure of most barophilic proteins [13]. One of the major findings of this work is that gamma turns are more frequent in thermostable lipases than in their mesostable counterparts [1].