Because the sequence features that distinguish coding from non-coding regions are not known exactly, identifying protein-coding regions in DNA sequences remains a challenging problem in bioinformatics. In this chapter, an efficient time-frequency filtering approach is proposed for identifying coding regions in DNA sequences. The proposed method employs a multiresolution approach to analyze both small and large coding regions, and it does not depend on a predefined window length, as Fourier-based methods do. The performance of the proposed method is compared with that of existing methods, and the results show its superiority in identifying exon regions.
Localization of Hot Spots in Proteins using a Novel S-transform based Filtering Approach
4.1 Introduction
Biological mechanisms of living organisms, such as metabolism and gene regulatory and interaction networks, pose numerous challenges to modern biomolecular research. In particular, the identification and characterization of protein-protein interactions is a central problem in protein science. Proteins are the basic building blocks of all living organisms, and protein-protein interactions are the basis of all biological processes, both inside and outside the cell [53] [54]. A protein is made up of amino acids.
There are twenty standard amino acids, and a protein sequence is represented as a string of alphabetical symbols, one per amino acid, with typical lengths ranging from 100 to 10000 [2]. Protein molecules fold into a highly specific three-dimensional (3-D) shape, which defines their particular biological activities. The 3-D structure of a protein is important because structure is linked to biological function. This shape allows the protein to interact with other molecules, known as targets, at specific sites referred to as the active sites of the protein [55] [56]. Within the active sites, certain residues (amino acids) act as an interface in the binding and recognition between the interacting molecules [58]; these residues are termed hot spots. The target molecules are typically proteins, DNA stretches, or other small molecules.
The study of protein function involves identifying and characterizing each protein and yields in-depth knowledge of its interactions with other proteins and DNA molecules. Identifying protein-protein interactions, or hot spots, provides a basis for identifying and analyzing drugs, molecular medicines, and similar agents. Identifying protein hot spots and solving the protein structure-function problem [57] is a challenging task for researchers in biology, engineering, and computer science. Many protein-interaction networks have been modeled to uncover the mechanisms of protein complexes, but a deep understanding of these mechanisms requires knowledge of the interface amino acids that take part in protein-protein interactions [59]-[61].
A biological experimental technique known as Alanine Scanning Mutagenesis (ASM) has been used to identify hot spots [62]-[64]. It measures the energy contribution of each interface amino acid by mutating that amino acid to alanine. Some ambiguity remains, because protein structure and interactions are too complex to be reduced to a sum of the features of individual residues, so a single mutation cannot fully capture a residue's contribution to the interaction. Nevertheless, alanine scanning is considered a good method for identifying hot spots and is widely accepted by researchers. Alanine is chosen because substituting it truncates the original side chain at the β-carbon without altering the main-chain conformation, and its small methyl side chain is not directly involved in protein function. It also does not impose any extreme electrostatic or steric effects on the main-chain conformation.
Figure 4.1: A schematic view of the hot spots in the complex of human growth hormone and its receptor. The human growth hormone (yellow) is bound to the extracellular portion of its homodimeric receptor (grey). Available online at: doi:10.1371/journal.pcbi.0030119.g001
The protein-target interaction is highly specific: the protein binds to its target much as a key fits its corresponding lock. A schematic view of the interaction of a protein with its target through the hot spots is shown in Fig. 4.1. Binding of the protein to the target releases energy, known as the binding free energy (∆G). When an interface amino acid is mutated to alanine, the binding free energy of the mutated protein-target complex is measured, and the change in binding free energy (∆∆G) before and after the mutation is evaluated. If ∆∆G for the mutation of an amino acid exceeds a threshold of 2.0 kcal/mol, that amino acid is considered a hot spot [62] [65]. This criterion has been accepted by biologists and is used by researchers. The ASM procedure is expensive because it involves wet-lab experiments that require a variety of chemicals, instruments, and other resources. It is also time-consuming and labor-intensive. Hence, there is a need for advanced computational techniques that make it easier to identify hot spot locations [58] [77]. The output of such techniques helps address the localization problem and avoids unnecessary mutations in wet-lab experiments.
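The ∆∆G criterion described above can be illustrated with a minimal sketch. It assumes the sign convention ∆∆G = ∆G(mutant) − ∆G(original), so that a mutation which weakens binding gives a positive ∆∆G; the text does not state the convention explicitly. The class and function names, the residue labels, and the energy values are hypothetical and serve only as illustration; they are not experimental data.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Threshold quoted in the text (kcal/mol).
HOT_SPOT_THRESHOLD = 2.0


@dataclass(frozen=True)
class AlanineMutation:
    """One alanine-scanning measurement (hypothetical container)."""
    residue: str        # label of the mutated interface residue
    dg_original: float  # binding free energy before mutation, kcal/mol
    dg_mutant: float    # binding free energy after mutation to alanine, kcal/mol

    @property
    def ddg(self) -> float:
        # Assumed convention: positive when the mutation weakens binding.
        return self.dg_mutant - self.dg_original


def is_hot_spot(m: AlanineMutation, threshold: float = HOT_SPOT_THRESHOLD) -> bool:
    """Return True if the change in binding free energy exceeds the threshold."""
    return m.ddg > threshold


def find_hot_spots(mutations: Iterable[AlanineMutation]) -> List[AlanineMutation]:
    """Keep only the mutations classified as hot spots."""
    return [m for m in mutations if is_hot_spot(m)]


if __name__ == "__main__":
    # Illustrative values only, not measurements.
    scan = [
        AlanineMutation("X1", dg_original=-12.0, dg_mutant=-9.5),
        AlanineMutation("X2", dg_original=-12.0, dg_mutant=-11.4),
    ]
    for m in scan:
        label = "hot spot" if is_hot_spot(m) else "not a hot spot"
        print(f"{m.residue}: ddG = {m.ddg:.2f} kcal/mol ({label})")
```

The threshold is kept as a parameter because it is a convention and not a physical constant; a computational predictor would be evaluated against labels produced this way from ASM data.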