The biological functions of proteins can be analyzed using a model of protein-target interactions known as the resonant recognition model (RRM). The RRM is a physico-mathematical approach that interprets protein sequence information using signal analysis methods. According to this model, there is a significant correlation between the spectra of the numerical representation of amino acid sequences and their biological activities [78] [79]. To apply signal processing methods to the analysis, the character string of the protein needs to be converted to a suitable numerical sequence. This is achieved by assigning a numerical value to each amino acid that forms the protein.
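As a minimal sketch of this character-to-number conversion, the lookup below uses the EIIP scale introduced in the following paragraph, with the values of Table 4.1. The one-letter residue codes, the function name `encode_eiip` and the example peptide are illustrative assumptions and are not taken from the original work.

```python
# EIIP value for each amino acid, keyed by its standard one-letter code
# (values as listed in Table 4.1).
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def encode_eiip(sequence: str) -> list[float]:
    """Replace every residue of a protein sequence by its EIIP value."""
    try:
        return [EIIP[residue] for residue in sequence.upper()]
    except KeyError as err:
        # Ambiguity codes such as X or B have no EIIP value in Table 4.1.
        raise ValueError(f"unknown amino acid code: {err.args[0]!r}") from None


# Hypothetical five-residue peptide, used only to illustrate the mapping.
signal = encode_eiip("MKALD")
```

The resulting list has one sample per residue, so the sample index plays the role of the time axis in the signal analysis that follows.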
The numerical value assigned to each amino acid is based on a physical property that is relevant to the protein's biological function. A variety of amino acid indices have been reported in the literature. Cosic et al. [80] have demonstrated that the best correlation is obtained with parameters related to the energy of the delocalized electrons of each amino acid, since these have the strongest impact on the electronic distribution of the whole protein. An effective choice for the numerical value is therefore the electron-ion interaction potential (EIIP). The EIIP is defined as the average energy of the delocalized electrons of the amino acid and can be evaluated by the pseudopotential model reported in [80]. The EIIP values for the 20 amino acids are listed in Table 4.1. Hence the primary sequence of a protein can be converted to a numerical sequence by replacing each amino acid with its corresponding EIIP value.

Table 4.1: EIIP values of the 20 amino acids
Amino acid             EIIP      Amino acid             EIIP
Leucine (Leu)          0.0000    Tyrosine (Tyr)         0.0516
Isoleucine (Ile)       0.0000    Tryptophan (Trp)       0.0548
Asparagine (Asn)       0.0036    Glutamine (Gln)        0.0761
Glycine (Gly)          0.0050    Methionine (Met)       0.0823
Valine (Val)           0.0057    Serine (Ser)           0.0829
Glutamic acid (Glu)    0.0058    Cysteine (Cys)         0.0829
Proline (Pro)          0.0198    Threonine (Thr)        0.0941
Histidine (His)        0.0242    Phenylalanine (Phe)    0.0946
Lysine (Lys)           0.0371    Arginine (Arg)         0.0959
Alanine (Ala)          0.0373    Aspartic acid (Asp)    0.1263

Veljkovic et al. [81] have reported that the Fourier spectral analysis of the EIIP sequence of a protein has strong relevance to its functional activity. All the proteins belonging to a functional family share
a common spectral component, which characterizes a particular function of the group. This component is defined as the characteristic frequency of the functional group. In a protein-target interaction, the protein and its target share the same characteristic frequency but are opposite in phase. This is believed to provide resonant recognition in the binding process, hence the name RRM. The analysis of protein function using the RRM is generally performed in two stages. First, the symbolic sequence of the protein is converted into a numerical sequence using the EIIP values. Then the discrete Fourier transform (DFT) of each numerical sequence is computed, and the resulting spectra are combined to evaluate the consensus spectrum. It has been observed that the spectra of a family of protein sequences sharing a common frequency show a peak in the cross-spectral function [78] [80]. The common characteristic frequency of a functional group of K proteins can be computed from the cross-spectral function defined in (4.1).
\[ S(w) = |X_1(w)| \, |X_2(w)| \, |X_3(w)| \cdots |X_K(w)| \tag{4.1} \]
where $X_1(w), X_2(w), \ldots, X_K(w)$ are the DFTs corresponding to the K proteins. The product of the amplitude spectra of the protein sequences of a functional group, as in Eq. (4.1), is referred to as the consensus spectrum. Peak frequencies in the consensus spectrum denote the characteristic frequencies common to all the proteins analyzed. It has been demonstrated that if a group of proteins has only one common function, then the consensus spectrum has one significant peak. If a protein performs more than one function, then each function corresponds to a unique characteristic frequency in the cross-spectrum. The numerical sequences of two members (basic bovine and acidic bovine) and the consensus spectrum of the group of fibroblast growth factors (FGF) are shown in Figs. 4.2 and 4.3, respectively. The FGFs constitute a family of proteins that affect the growth, differentiation and survival of certain cells. This particular function of the family appears as a clear peak at the normalized frequency of 0.90 in the consensus spectrum.
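The product of amplitude spectra in Eq. (4.1) can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used for the figures: each numerical sequence has its mean removed so that the zero-frequency term does not dominate, shorter sequences are zero-padded to the length of the longest one, the DFT is written out directly with the standard library (a library FFT such as `numpy.fft.fft` would normally be preferred for long sequences), and frequencies are expressed as k/N in cycles per residue, which may differ from the normalization of the frequency axis in the figures. The function names and the two synthetic test sequences are hypothetical.

```python
import cmath
import math


def amplitude_spectrum(signal, n_points):
    """|X(k)| for k = 0..n_points//2 of a mean-removed, zero-padded sequence."""
    mean = sum(signal) / len(signal)
    padded = [x - mean for x in signal] + [0.0] * (n_points - len(signal))
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * n / n_points)
                for n, x in enumerate(padded)))
        for k in range(n_points // 2 + 1)
    ]


def consensus_spectrum(signals):
    """Pointwise product of the amplitude spectra, as in Eq. (4.1)."""
    n_points = max(len(s) for s in signals)
    spectra = [amplitude_spectrum(s, n_points) for s in signals]
    consensus = [math.prod(values) for values in zip(*spectra)]
    freqs = [k / n_points for k in range(len(consensus))]
    return freqs, consensus


def characteristic_frequency(signals):
    """Frequency of the largest consensus peak, ignoring the k = 0 term."""
    freqs, consensus = consensus_spectrum(signals)
    peak = max(range(1, len(consensus)), key=consensus.__getitem__)
    return freqs[peak]


# Two synthetic sequences (not protein data) that share a component at
# 0.25 cycles per sample but otherwise contain different frequencies.
seq_a = [0.05 + 0.03 * math.cos(2 * math.pi * 0.25 * n)
         + 0.01 * math.cos(2 * math.pi * 0.10 * n) for n in range(40)]
seq_b = [0.05 + 0.03 * math.cos(2 * math.pi * 0.25 * n + 1.0)
         + 0.01 * math.cos(2 * math.pi * 0.40 * n) for n in range(32)]
peak = characteristic_frequency([seq_a, seq_b])
```

Because the spectra are multiplied, a component present in only one sequence is suppressed by the near-zero amplitude of the other sequences at that frequency, so only the shared component survives as a pronounced peak.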
[Figure 4.2: The numerical sequence and corresponding DFTs of the basic bovine (left) and acidic bovine (right). Axes: amplitude versus base location for the numerical sequences, amplitude versus frequency for the DFTs.]
[Figure 4.3: The consensus spectrum of the FGF family (amplitude versus frequency). The peak corresponds to the characteristic frequency relevant to a certain biological function.]
The RRM characteristic frequency in the consensus spectrum corresponds to a particular biological function of the family of proteins. Therefore, determination
of the characteristic frequency enables identification of the individual amino acids, i.e. the hot spots, that contribute to it. A simple procedure has been adopted to identify the hot spots by altering the amplitude of the Fourier coefficients corresponding to the characteristic frequencies. It determines those amino acids which are most affected by the change in amplitude at the characteristic frequency. The difficulty with this method is that a change in one Fourier coefficient affects all the samples in the numerical sequence of the protein and therefore gives an unreliable result. Moreover, the spectrum of the protein contains many frequency components besides the characteristic frequency, which indicates that the characteristics of the signal change from sample to sample, i.e. the signal is non-stationary. A joint time-frequency analysis is therefore needed to analyze how the characteristic frequency varies along the sequence. This issue is addressed in this chapter by using the S-transform, which is well suited to time-frequency analysis. A new method of time-frequency filtering using the S-transform is therefore proposed to identify the amino acids (hot spots) corresponding to the characteristic frequencies.
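The difficulty with the coefficient-altering procedure described above can be illustrated with a short sketch. It is only one plausible reading of that procedure, under assumptions that are not in the original: the coefficient at bin k is scaled by a fixed factor `gain`, the conjugate bin N - k is scaled as well so that the rebuilt sequence stays real, and residues are ranked by the absolute change of their sample. The function names and the synthetic test signal are hypothetical.

```python
import cmath
import math


def dft(signal):
    """Direct DFT of a real sequence (all N complex coefficients)."""
    n = len(signal)
    return [sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(signal)) for k in range(n)]


def idft_real(coeffs):
    """Real part of the inverse DFT."""
    n = len(coeffs)
    return [sum(c * cmath.exp(2j * math.pi * k * i / n)
                for k, c in enumerate(coeffs)).real / n for i in range(n)]


def perturbation_profile(signal, k, gain=2.0):
    """Absolute change of every sample when the coefficient at bin k is scaled."""
    n = len(signal)
    altered = dft(signal)
    for b in {k % n, (-k) % n}:  # bin k and its conjugate partner N - k
        altered[b] *= gain
    rebuilt = idft_real(altered)
    return [abs(r - s) for r, s in zip(rebuilt, signal)]


def rank_residues(change, top=5):
    """Indices of the samples that changed most (candidate hot spots)."""
    return sorted(range(len(change)), key=change.__getitem__, reverse=True)[:top]


# Synthetic 20-sample signal (not protein data) with one component at bin 5.
signal = [0.05 + 0.03 * math.cos(2 * math.pi * 5 * i / 20 + 0.3)
          for i in range(20)]
change = perturbation_profile(signal, k=5, gain=2.0)
candidates = rank_residues(change)
```

Scaling a single coefficient adds a sinusoid of that frequency across the whole sequence, so in this example every sample changes and the ranking merely follows the crests of that sinusoid. This is the lack of localization that motivates a joint time-frequency approach.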