
A thorough literature review showed that a comprehensive, curated database is needed for thermostable proteins. Database creation has therefore been one of the main focuses of this dissertation. A user-friendly, web-compatible relational database, the Thermostable Protein Structural database, was created and can be accessed at www.extreme-stabledb.in. The database was built on the Apache, PHP and MySQL platform. Its salient features include information about all the thermostable proteins available in the Protein Data Bank. The database contains information about 378 thermostable proteins from 132 thermophilic and hyperthermophilic organisms, together with 261 mutants. Details about their source organisms, phylogeny and enzyme classification were also collected. The database also holds information about the engineered mutants that have played a pivotal role in understanding protein thermostability. Thermostability has been attributed to the cumulative effect of a multitude of factors. Protein features such as physicochemical properties and amino acid composition can therefore also be searched in the database. The database further provides access to the extensive literature on thermostable proteins and to data dedicated to understanding the rationale behind protein thermostability. The database has also been used to generate meaningful data through data refinement, collection of homologous mesostable protein structures and enumeration of their features for comparison with those of the thermostable proteins. These data have been employed for an in-depth study of protein thermostability.

Outputs:

1. Thermostable protein structural database and Intra-Protein Interaction Enumerator.

2. Debamitra Chakravorty, Mohd. Faheem Khan, Sanjukta Patra. The creation of Extremostable Protein Database (Manuscript under preparation).

2.1. Introduction

Biological databases are computerized repositories of organized and refined biological data whose chief objective is easy retrieval of information. There are innumerable biological databases, with strong emphasis on sequence and structural data for nucleotides and proteins. It is interesting to note that the bulk of biological databases relate to enzymes, the proteins that support all biochemical reactions of life. This is because biological web databases provide very useful information and insights into biological systems (Zou et al. 2015). Global databases that have been developed include those of the National Center for Biotechnology Information (NCBI), the Protein Data Bank (PDB) and UniProt. Huge amounts of data are still being generated and deposited in such global databases. This necessitates curating the data and channelling them into comprehensive repositories, so that they are easily accessible and can be used for further knowledge development.

Along similar lines, a large amount of data has been generated for thermostable proteins over the past decades. This is evident from the fact that text queries for “thermophile”, “hyperthermophile” and “thermostable” return 44 genomes in NCBI and 1280 protein structures in the Protein Data Bank. In the direction of data curation related to protein thermodynamics, Gromiha et al. (1999) created the Protherm database for proteins and mutants. That database hosts only thermodynamic data for wild-type and mutant proteins, which are not exclusively thermostable. It is clear that no database dedicated solely to thermostable proteins exists to date. Further, in recent years theoretical predictions of protein stability have led to the accumulation of thermostable mutants, which throw light on the mechanism of thermostability.

However, such knowledge gets masked when these structures are deposited in global databases like the PDB. Therefore, there is a need to curate such data so that a universal protocol for temperature stability can be obtained. This will aid in designing mutants of mesophilic proteins with enhanced stability. Such mutants are industrially important, as thermostable proteins find use in the paper, dairy, detergent and many other industries (Kumar et al. 2000). Additionally, databases dedicated to proteins lack information about features such as hydrogen bonds, salt bridges and ionic interactions. Thus, it is important to host a database that holds all such information.

This chapter outlines the development and integration of a curated database for thermostable proteins. The database is a collection of data relevant to thermostable proteins. It was built on an “entity relationship model”, in which data are arranged in tables and the tables are related to one another by primary and foreign keys. A user-accessible web platform was then built in HTML and CSS, together with a user-friendly search engine that can be used to browse data from the database in a meaningful format. The immediate application is that these data can be further processed to acquire knowledge about thermostable proteins. The overall schema is illustrated in Fig. 2.1.

Fig. 2.1. The schema for thermostable protein database development.
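The entity relationship model described above can be sketched in a few lines. The example below is only an illustration under stated assumptions: the table names, column names and inserted rows are hypothetical and are not the actual schema or contents of the database, and Python's built-in sqlite3 module is used as a self-contained stand-in for the MySQL platform on which the database was built. It shows how primary and foreign keys link an organism table, a protein table and a mutant table.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the actual tables of the database described in this chapter.
SCHEMA = """
CREATE TABLE organism (
    organism_id           INTEGER PRIMARY KEY,
    name                  TEXT NOT NULL,
    optimal_growth_temp_c REAL
);
CREATE TABLE protein (
    pdb_id      TEXT PRIMARY KEY,
    name        TEXT NOT NULL,
    ec_number   TEXT,
    organism_id INTEGER NOT NULL REFERENCES organism(organism_id)
);
CREATE TABLE mutant (
    mutant_id INTEGER PRIMARY KEY,
    pdb_id    TEXT NOT NULL REFERENCES protein(pdb_id),
    mutation  TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database with one made-up row per table."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    # Placeholder values only; they do not describe real entries.
    conn.execute("INSERT INTO organism VALUES (1, 'Example thermophile', 75.0)")
    conn.execute("INSERT INTO protein VALUES ('XXXX', 'Example enzyme', '3.2.1.1', 1)")
    conn.execute("INSERT INTO mutant VALUES (1, 'XXXX', 'A10P')")
    conn.commit()
    return conn


def proteins_with_mutants(conn):
    """Join the three tables through their primary/foreign key pairs."""
    return conn.execute(
        """
        SELECT o.name, p.pdb_id, m.mutation
        FROM protein AS p
        JOIN organism AS o ON o.organism_id = p.organism_id
        JOIN mutant   AS m ON m.pdb_id = p.pdb_id
        ORDER BY p.pdb_id
        """
    ).fetchall()
```

Calling `proteins_with_mutants(build_demo_db())` returns one joined row for the placeholder entry. Because each mutant row refers to its parent protein by key, and each protein row to its source organism, shared details are stored once and recovered through joins, which is the main benefit of arranging the data relationally.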

2.2. Methodology

2.2.1. Data collection, database architecture and integration

Sequences and structures of all the available thermostable proteins were collected from UniProt KB and the Protein Data Bank (PDB) using the keyword searches “thermostable”, “thermophilic” and “hyperthermophilic”. As the main motive of this research was to correlate sequence and structural features with protein thermostability, sequences without a crystal structure in the PDB were excluded from the study. Mutant structures engineered for an increase or decrease in thermostability were also collected from the PDB and the Protherm database. Redundant information was discarded. The sequences were collected in FASTA format and the structures in .pdb format. Information regarding temperature stability, source organism, the optimal growth temperature of the source organism, phylogenetic classification, enzyme classification number (E.C. No.), publications and patents was also collected from the literature. Finally, with all this information in hand, a thermostable protein structural database was created using the MySQL, Apache and PHP platforms. A web interface was created using Dreamweaver.
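The selection rules described above (keyword match, exclusion of entries without a PDB structure, removal of redundant entries) can be expressed as a simple filter. The following is a minimal sketch, not the procedure actually used in this work: the `Record` type, its fields and the function name are hypothetical, and redundancy is assumed here to mean a repeated PDB identifier, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Search terms quoted in the text.
KEYWORDS = ("thermostable", "thermophilic", "hyperthermophilic")


@dataclass(frozen=True)
class Record:
    """Hypothetical container for one collected entry."""
    accession: str
    description: str
    pdb_id: Optional[str]  # None when no structure is available


def select_records(records: Iterable[Record]) -> List[Record]:
    """Keep keyword-matching entries that have a structure, dropping repeats."""
    seen = set()
    kept = []
    for rec in records:
        text = rec.description.lower()
        # Rule 1: the description must contain one of the search terms.
        if not any(keyword in text for keyword in KEYWORDS):
            continue
        # Rule 2: entries without a PDB structure are excluded.
        if not rec.pdb_id:
            continue
        # Rule 3 (assumption): an entry is redundant if its PDB identifier
        # has already been seen, compared case-insensitively.
        key = rec.pdb_id.upper()
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept
```

In practice the redundancy criterion could equally be based on sequence identity or on the source organism; the identifier-based rule is used here only because it keeps the sketch short.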

Servers and software used

Dreamweaver, XAMPP, PROTPARAM, COPID, MEME suite, VADAR, Intra Protein calculator (IPI) program (Python) and PROMOTIF.

2.2.2. Data analysis