Evolution of the genetic code: the nonsense, antisense, and
antinonsense codes make no sense
G. Houen *
Department of Protein Chemistry, Statens Serum Institut, Artillerivej 5, DK-2300 Copenhagen S, Denmark
Received 16 April 1999; received in revised form 26 May 1999; accepted 6 July 1999
Abstract
According to the molecular recognition theory, the complementarity of the sense and nonsense DNA strands is reflected in a complementarity of polypeptides and the corresponding nonsense polypeptides. A comparison of the sense and nonsense code matrices, and of the antisense and antinonsense code matrices, either by visual inspection or by comparing the corresponding hydrophobicity matrices (e.g. by simply adding them together), revealed no complementarity of these pairs of matrices in terms of possible attractive physical forces. Instead, it was evident that the codes divide the amino acids into two major groups: hydrophilic and hydrophobic, a division which is directly correlated with the folding property of proteins. A simple primordial genetic code distinguishing between these two types of amino acids would have been capable of generating three-dimensionally folded peptides, which could stabilize coding RNAs by forming ribonucleoprotein complexes. This evolutionary scheme is reflected in the present organisation of information processing and storage in essentially all organisms. RNAs are processed and translated into proteins by ribonucleoproteins, while other steps in information retrieval and processing, such as DNA replication, transcription, protein folding and posttranslational processing, are catalyzed by proteins. This shows that the evolution of DNA as an information storage medium was a secondary event, unrelated to the evolution of the genetic code. From the primordial hydrophilic/hydrophobic (e.g. Leu/Arg) code, evolution proceeded by introduction of a catalytic amino acid (Ser). The further evolution of the code has mainly served to increase the number of functional hydrophilic amino acids, since there has not been a great advantage in increasing the number of structural, hydrophobic amino acids. At some stage during the evolution of the genetic code, double-stranded DNA was introduced as a maximally safe genetic copy of RNA. This required the action of highly specific enzymes, and was therefore preceded by the refinement of the genetic code. As a conclusion of this evolutionary scheme, it can be inferred that, in general, only the sense strand encodes proteins. © 1999 Elsevier Science Ireland Ltd. All rights reserved.
Keywords: Genetic code; Nonsense; Antisense
www.elsevier.com/locate/biosystems
1. Introduction
The diversity of living organisms reflects the complexity of the earth as an open nonequilibrium thermodynamic system (Nicolis and Prigogine, 1977), in which a complex phylogenetic evolution has taken place. Despite this complexity, all living organisms use essentially the same genetic code for storing and retrieving information, and therefore this code must have evolved at a relatively early stage of evolution and then remained unchanged after its ‘perfection’ (Crick et al., 1961; Crick, 1968; Orgel, 1968; Eigen et al., 1989; Osawa et al., 1992).
* Tel.: +45-3268-3276; fax: +45-3268-3149.
E-mail address: [email protected] (G. Houen)
Crick (1968) has discussed two fundamentally different theories of genetic code evolution: (1) the ‘frozen accident theory’, postulating that amino acids became linked to codons purely by chance; and (2) the ‘stereochemical principle’, which assumes that stereochemical constraints guided the association of amino acids with codons. Wong (1975) has proposed a co-evolution theory of the genetic code which postulates that ‘the codon system is primarily an imprint of the prebiotic pathways of amino acid formation’.
In addition to these theories, a theory has been put forward about a possible complementarity between the putative protein products of the two complementary strands of DNA: the molecular recognition theory (Blalock and Smith, 1984; Blalock and Bost, 1986; Zull and Smith, 1990).
According to this theory, the complementarity between the sense and nonsense strands of DNA is revealed in a complementarity between, for example, receptors and peptide hormones (Bost et al., 1985a,b; Blalock and Bost, 1986; Carr et al., 1986; Weigent et al., 1986; Brentani, 1988; Brentani et al., 1988; Baranyi et al., 1995) and a high affinity of peptides for their antisense peptides (Shai et al., 1987; Fassina et al., 1989; Pasqualini et al., 1989; Shai et al., 1989). While this theory may hold for some peptides and antisense peptides, it can be shown to have no structural basis in the genetic code. Instead, the genetic code evolved as a maximally effective information transfer system, based on RNA, amino acids and DNA.
2. The sense, antisense, nonsense and antinonsense genetic codes
From the DNA double helix three different transcription reading frames can be defined in addition to the genetic code. These four codes are logical transformations of each other, as shown below.
A: The genetic code (the sense code) translates the sense strand into protein by reading the codons in the 5′–3′ direction and using the genetic code matrix (Table 1).
B: The antisense code is obtained by reading codons in the opposite (3′–5′) direction (Table 2).
C: The nonsense code is obtained by reading complementary codons in the 5′–3′ direction (Table 3).
D: The antinonsense code is obtained by reading complementary codons in the 3′–5′ direction (Table 4).
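The four readings A to D can be written as simple string transformations of a codon. The following is a minimal Python sketch and not part of the original work: it assumes the standard genetic code written with RNA bases, and it assumes that "complementary codon" means the Watson-Crick complement taken base by base. The names SENSE, COMPLEMENT and the four functions are illustrative only.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in one-letter notation ('*' = stop), ordered with the
# first base varying slowest and the third base varying fastest.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SENSE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("UCAG", "AGUC")


def sense(codon):
    """A: read the codon itself in the 5'-3' direction."""
    return SENSE[codon]


def antisense(codon):
    """B: read the same codon in the opposite (3'-5') direction."""
    return SENSE[codon[::-1]]


def nonsense(codon):
    """C: read the complementary codon 5'-3' (the reverse complement)."""
    return SENSE[codon.translate(COMPLEMENT)[::-1]]


def antinonsense(codon):
    """D: read the complementary codon 3'-5' (the plain complement)."""
    return SENSE[codon.translate(COMPLEMENT)]


for codon in ("UUU", "AUG"):
    print(codon, sense(codon), antisense(codon), nonsense(codon), antinonsense(codon))
```

Under these assumptions the codon AUG is read as Met, Val, His and Tyr by the sense, antisense, nonsense and antinonsense codes, respectively.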
The genetic code (Table 1) groups the amino acids into two major groups: hydrophobic (first column) and hydrophilic (columns 2–4). This grouping can be seen either by visual inspection of the amino acids or by constructing a hydrophobicity matrix from the genetic code matrix using any set of hydrophobicity/hydrophilicity values (Table 5). The first column only contains amino acids with very hydrophobic side chains (Cys and Tyr) or a high dipole moment (Trp).
Thus, with only minor exceptions, the genetic code divides the amino acids into two major groups (hydrophobic and hydrophilic), a basic pattern which has a direct correlation with the folding pattern of proteins: a hydrophobic core surrounded by a hydrophilic surface (Bajaj and Blundell, 1984).
Table 1
The genetic code matrix
[Matrix not recovered; surviving row, first base A: Ile Thr Lys Arg]
Table 2
The antisense genetic code matrix
[Matrix not recovered; column headings: U C A G]
Table 3
The nonsense genetic code matrix
[Matrix not recovered; column headings: U C A G; surviving row: His Arg Leu Pro]
Table 4
The antinonsense genetic code matrix
[Matrix not recovered; column headings: U C A G]
Table 5
The genetic code hydrophobicityᵃ matrix
[Matrix not recovered; surviving row: −9.2 6.5 −1.9 1.4]
ᵃ HPLC-derived hydrophobicity values (Parker et al., 1986).
Table 6
The nonsense genetic code hydrophobicity matrix
[Matrix not recovered; surviving row: 5.7 4.2 8.0 5.2]
Comparison of the four code matrices (Tables 1–4) reveals no complementarity of the corresponding amino acid side chains. On the contrary, the different codes pair very different side chains with each other, e.g. in the first column, rows 5–8:
sense  nonsense  antisense  antinonsense
Phe
Some ‘complementarity’ is observed when comparing the code with the antisense code, and when comparing the nonsense code with the antinonsense code, due to the fact that residues only change positions within columns. This is, however, only a mathematical property of the genetic code and reflects its general property of a maximally safe and effective information transfer system.
Another way of analyzing possible relations between the sense and nonsense codes is by comparing the sense genetic code hydrophobicity matrix (Table 5) with the nonsense genetic code hydrophobicity matrix (Table 6). When these matrices are added, a matrix is obtained which describes possible physical interactions between amino acid residues and the corresponding nonsense residues (Table 7). Table 7, however, shows no signs of complementarity between sense and nonsense residues. Hydrophobic residues would be expected to interact primarily with hydrophobic nonsense residues, and this should give a more negative sum. Hydrophilic residues should interact preferentially with hydrophilic nonsense residues by hydrogen bonds and ionic interactions, thus giving rise to a more positive sum. None of this is observed; instead, a more or less random-appearing sum matrix is obtained.
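The addition of the sense and nonsense hydrophobicity matrices can be illustrated for a single row. The sketch below is an illustration under stated assumptions, not the full calculation behind Table 7: it assumes the standard genetic code, that the nonsense residue of a codon is the translation of its reverse complement, and that the rows of Tables 5 and 6 shown in the text belong to the codons UUU, UCU, UAU and UGU. Only the eight values that appear in those rows are included, and the Ile value is assumed to be negative (-8.0), since negative values denote hydrophobic residues in this scale. The names SCALE, nonsense_residue and sum_entry are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code ('*' = stop), first base varying slowest.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SENSE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("UCAG", "AGUC")

# Subset of HPLC-derived values taken from the table rows shown in the text.
# ASSUMPTION: the Ile entry is -8.0 (negative = hydrophobic).
SCALE = {"F": -9.2, "S": 6.5, "Y": -1.9, "C": 1.4,
         "K": 5.7, "R": 4.2, "I": -8.0, "T": 5.2}


def nonsense_residue(codon):
    """Residue encoded by the complementary codon read 5'-3'."""
    return SENSE[codon.translate(COMPLEMENT)[::-1]]


def sum_entry(codon, scale=SCALE):
    """One entry of the sum matrix: sense value plus nonsense value."""
    total = scale[SENSE[codon]] + scale[nonsense_residue(codon)]
    return round(total, 1)  # rounding hides binary floating-point noise


ROW = ["UUU", "UCU", "UAU", "UGU"]
sums = [sum_entry(codon) for codon in ROW]
print(sums)  # [-3.5, 10.7, -9.9, 6.6]
```

In this row the hydrophobic Phe is paired with the hydrophilic Lys, while the mildly hydrophobic Tyr is paired with the strongly hydrophobic Ile, so even within one row the pairing follows no consistent pattern.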
Table 7
Sum of genetic code hydrophobicity matrix and nonsense code hydrophobicity matrix
3. The primordial genetic code
Several reviews summarizing current knowledge of codon specificities have been published, and many authors have integrated this knowledge with different theories of genetic code evolution (Crick, 1968; Jukes, 1973, 1978; Wong, 1975, 1988; Kocherlakota and Acland, 1982; Macchiato and Tramontano, 1982; Soto and Toha, 1985; Cedergren et al., 1986; Figureau, 1987, 1989; Osawa and Jukes, 1988, 1989; Lehmann and Jukes, 1988; Di Guilio, 1989a,b; Osawa et al., 1992; Baumann and Oro, 1993).
Presumably, the code evolved from a primitive form, but no matter how the code reached its present form the grouping of the amino acids into two major groups cannot be accidental, but rather reflects an important property of the genetic code: U as second base determines that a codon will code for a very hydrophobic amino acid. With this notion a simple genetic alphabet can be constructed:
     U            C/A/G
N    Hydrophobic  Hydrophilic  N
N: U/C/A/G
This simple code contains the central property of a protein folding code: the ability to discriminate between a structural, hydrophobic amino acid, which tends to be in the interior of a protein, and a functional, hydrophilic amino acid, which tends to be at the surface of a protein. This self-folding property of proteins, sometimes named ‘the second alphabet’ of the genetic code (Jaenicke, 1987; Levitt, 1991), is strongly conserved in the genetic code. Mutations at position 3 result in an identical or very similar amino acid, while mutations at position 1 result in a similar amino acid.
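The two-letter alphabet and the position dependence of mutations can both be checked against the present code. The following Python sketch is illustrative only: it assumes the standard genetic code, the function names are hypothetical, and it counts only mutations that leave the residue identical, which is a stricter criterion than the "similar amino acid" used in the text.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code ('*' = stop), first base varying slowest.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SENSE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def primordial_class(codon):
    """Two-class reading: second base U means hydrophobic, otherwise hydrophilic."""
    return "hydrophobic" if codon[1] == "U" else "hydrophilic"


# Residues of the present code that fall into the hydrophobic class.
hydrophobic_residues = {
    aa for codon, aa in SENSE.items() if primordial_class(codon) == "hydrophobic"
}


def unchanged_after_point_mutation(position):
    """Count single-base changes at `position` (0, 1 or 2) that keep the residue."""
    count = 0
    for codon, aa in SENSE.items():
        for base in BASES:
            if base != codon[position]:
                mutant = codon[:position] + base + codon[position + 1:]
                count += SENSE[mutant] == aa
    return count


counts = [unchanged_after_point_mutation(p) for p in range(3)]
print(sorted(hydrophobic_residues))
print(counts)  # silent changes at positions 1, 2 and 3
```

With the standard code, the hydrophobic class contains exactly Phe, Leu, Ile, Met and Val, and silent changes are most frequent at position 3 and least frequent at position 2.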
In principle, the simple code described above could have generated primitive folded proteins (Brack and Orgel, 1975), which catalyzed evolution of the code by stabilizing some RNAs relative to others.
The instability of RNA, which is today a major problem in studying many processes in living cells, favoured evolution of the protective action of proteins, and the limited catalytic capabilities of RNA favoured the evolution of protein enzymes. This process required the establishment of a genetic code for reading the encoded information.
At this stage it is difficult to envisage stereochemical constraints on the association of RNAs with amino acids, and it seems more likely that a few amino acids were selected by availability and by their ability to catalyze chemical evolution. From this point, evolution proceeded by increasing the number of amino acids and the complexity of the protein synthetic machinery.
4. Refining the genetic code
In the present genetic code only three amino acids have six codons: Leu, Ser and Arg. These three amino acids make up a set covering the properties needed for a hydrophobic protein core (Leu), interaction with nucleic acids (Arg) and catalysis of chemical reactions (Ser). Ser and Arg are further related by their coexistence in the fourth column, rows 5–8. The primitive code could therefore have been:
     U    C/A/G
N    L    R/S    N
Arg was possibly recruited earlier than Ser, due to its ability to interact with and stabilize nucleic acids by ionic forces.
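The codon counts behind the choice of Leu, Arg and Ser can be verified directly. The sketch below is a small illustration assuming the standard genetic code; the variable names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code ('*' = stop), first base varying slowest.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SENSE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of codons per amino acid, ignoring stop codons.
codon_counts = Counter(aa for aa in SENSE.values() if aa != "*")
six_fold = sorted(aa for aa, n in codon_counts.items() if n == 6)

# Residues sharing the codon box with first base A and second base G.
ag_box = {SENSE["AG" + third] for third in BASES}

print(six_fold)        # ['L', 'R', 'S']
print(sorted(ag_box))  # ['R', 'S']
```

No other amino acid reaches six codons, and the AGN box is the one place where Ser and Arg share the first two bases.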
Since the third position of the codons is highly redundant, it is likely that the next step was the ability of the first base to discriminate between Ser and Arg. The ability of U or C as first base to discriminate between Ser and Arg indicates that C was the second primordial base, and a possible step in the evolution of the code could have been:
C
This code has the ability to distinguish between a structural hydrophilic amino acid and a catalytic hydrophilic amino acid.
The further evolution of the code must have been an intimate interplay between proteins and RNA exploring all possibilities and resulting in the present amino acids, and the start and stop codons.
Codons in the first row, columns 2–4, code for hydrophilic amino acids with catalytic properties, whereas the same columns in row two code for structural amino acids. It can also be seen that the introduction of A and G added the possibility of discriminating between uncharged (column 2) and charged (columns 3 and 4) as well as start/stop, Asp/Glu, etc. This evolutionary scheme is reflected in the present code, since at places where the third position is important, U and C code for the same amino acid and A and G code for the same amino acid, e.g. Tyr-stop, Cys-stop, His-Gln, Asn-Lys, Asp-Glu, Ser-Arg.
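The third-position pattern described above can be checked box by box, where a box is the set of four codons sharing the first two bases. This is a minimal Python sketch assuming the standard genetic code; the names uc_always_same, ag_differ and split_boxes are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code ('*' = stop), first base varying slowest.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SENSE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

boxes = ["".join(p) for p in product(BASES, repeat=2)]

# Do U and C in the third position always give the same residue?
uc_always_same = all(SENSE[b + "U"] == SENSE[b + "C"] for b in boxes)

# Boxes in which A and G in the third position give different residues.
ag_differ = [b for b in boxes if SENSE[b + "A"] != SENSE[b + "G"]]

# Boxes in which the third position matters at all, with their residues.
split_boxes = {
    b: "".join(SENSE[b + third] for third in BASES)
    for b in boxes
    if len({SENSE[b + third] for third in BASES}) > 1
}

print(uc_always_same)
print(ag_differ)
print(split_boxes)
```

In the standard code U and C are always equivalent in the third position, and A and G differ only in the AUN box (Ile/Met) and the UGN box (stop/Trp). The UGN exception fits the later introduction of Trp at the expense of a stop codon.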
A possible stage in evolution could thus have been:
This relatively simple code contains all information for start, stop, structure and function.
Later in evolution Cys and Trp were introduced at the expense of the number of stop codons.
Since the hydrophobic amino acids have structural functions, there has not been a great advantage in enlarging the number of different hydrophobic amino acids, a notion supported by the relatively low number of codons for them. The increase in the number of bases from 2 to 4 has mainly served to increase the number of functional amino acids.
5. Discussion
Inspection of the genetic code and its derived hydrophobicity matrix reveals that the code divides the amino acids into two major groups with hydrophobic and hydrophilic residues. This division is directly related to the folding property of proteins, which have a hydrophobic core and a hydrophilic surface.
On the basis of this fundamental division, it seems likely that the genetic code evolved from a primitive code, which only discriminated between a hydrophobic and a hydrophilic residue, most likely Leu and Arg. From this primordial code, evolution proceeded by introduction of a functional catalytic residue (Ser), and by a further limited increase in structural hydrophobic residues, and a larger increase in functional hydrophilic residues.
Comparison of the present genetic code and the nonsense code reveals no complementarity in terms of possible attractive physical forces, and it can be concluded that, in general, only the sense strand encodes functional proteins.
Some nonsense messages do code for proteins, but this is only rarely found and may reflect the early existence of double-stranded RNA. For example, the nonsense strand of the erbA locus has been found to encode an erbA homologue with altered T3 binding capacity (Miyajima et al., 1989).
The conclusion reached above is consistent with the notion that RNA was the primordial autoreplicative genetic material.
Transcription of DNA and the folding and modification of proteins are catalyzed by enzymes, while the translation and maturation of RNA are catalyzed by ribonucleoproteins. This organization reflects the evolution of the genetic code, also implying that ribonucleoproteins catalyzed early evolutionary steps.
The evolution of DNA as a more stable copy of the information contained in RNA clearly had a major evolutionary advantage, as all cells use DNA in this way.
If it is assumed that DNA evolved as a copy of RNA by the loss of a hydroxyl group, it also follows that the genetic code was established solely by RNA – protein interactions.
References
Bajaj, M., Blundell, T., 1984. Evolution and the tertiary structure of proteins. Ann. Rev. Biophys. Bioeng. 13, 453–492.
Baranyi, L., Campbell, W., Ohshima, K., Fujimoto, S., Boros, M., Okada, H., 1995. The antisense homology box: a new motif within proteins that encode biologically active peptides. Nat. Med. 1, 894–901.
Baumann, U., Oro, J., 1993. Three stages in the evolution of the genetic code. Biosystems 29, 133–141.
Blalock, J.E., Bost, K.L., 1986. Binding of peptides that are specified by complementary RNAs. Biochem. J. 234, 679–683.
Blalock, J.E., Smith, E.M., 1984. Hydropathic anti-complementarity of amino acids based on the genetic code. Biochem. Biophys. Res. Commun. 121, 203–207.
Bost, K.L., Smith, E.M., Blalock, J.E., 1985a. Similarity between the corticotropin (ACTH) receptor and a peptide encoded by an RNA that is complementary to ACTH mRNA. Proc. Natl. Acad. Sci. USA 82, 1372–1375.
Bost, K.L., Smith, E.M., Blalock, J.E., 1985b. Regions of complementarity between the messenger RNAs for epidermal growth factor, transferrin, interleukin-2 and their respective receptors. Biochem. Biophys. Res. Commun. 128, 1373–1380.
Brack, A., Orgel, L.E., 1975. β-Structures of alternating polypeptides and their possible prebiotic significance. Nature 256, 383–387.
Brentani, R.R., 1988. Biological implications of complementary hydropathy of amino acids. J. Theor. Biol. 135, 495–499.
Brentani, R.R., Ribeiro, S.F., Potocnjak, P., Pasqualina, R., Lopes, J.D., Nakaie, C.R., 1988. Characterization of the cellular receptor for fibronectin through a hydropathic complementarity approach. Proc. Natl. Acad. Sci. USA 85, 364–367.
Carr, D.J.J., Bost, K.L., Blalock, J.E., 1986. An antibody to a peptide specified by an RNA that is complementary to endorphin mRNA recognizes an opiate receptor. J. Neuroimmunol. 12, 329–337.
Cedergren, R., Grosjeau, H., Larue, B., 1986. Primordial reading of genetic information. Biosystems 19, 259–266.
Crick, F.H.C., 1968. The origin of the genetic code. J. Mol. Biol. 38, 367–379.
Crick, F.H.C., Barnett, L., Brenner, S., Watts-Tobin, R.J., 1961. General nature of the genetic code for proteins. Nature 192, 1227–1232.
Darnell, J.E., Doolittle, W.F., 1986. Speculations on the early course of evolution. Proc. Natl. Acad. Sci. USA 83, 1271–1275.
Di Guilio, M., 1989a. Some aspects of the organization and evolution of the genetic code. J. Mol. Evol. 29, 191–201.
Di Guilio, M., 1989b. The extension reached by the minimization of the polarity distances during the evolution of the genetic code. J. Mol. Evol. 29, 288–293.
Doudna, J.A., Szostak, J.W., 1989. RNA-catalyzed synthesis of complementary-strand RNA. Nature 339, 519–522.
Eigen, M., Lindemann, B.F., Tietze, M., Winkler-Oswatitsch, R., Dress, A., von Haesler, A., 1989. How old is the genetic code? Statistical geometry of tRNA provides an answer. Science 244, 673–679.
Ekland, E.H., Bartel, D.P., 1996. RNA-catalyzed RNA polymerization using nucleoside triphosphates. Nature 382, 373–376.
Ekland, E.C., Szostak, J.W., Bartel, D.P., 1995. Structurally complex and highly active RNA ligases derived from random RNA sequences. Science 269, 364–370.
Fassina, G., Roller, P.P., Olson, A.D., Thorgeirsson, S.S., Omichinski, J.G., 1989. Recognition properties of peptides hydropathically complementary to residues 356–375 of the c-raf protein. J. Biol. Chem. 264, 11252–11257.
Figureau, A., 1987. Information theory and the genetic code. Orig. Life 17, 439–449.
Figureau, A., 1989. Optimization and the genetic code. Orig. Life Evol. Biosph. 19, 57–67.
Jaenicke, R., 1987. Folding and association of proteins. Prog. Biophys. Mol. Biol. 49, 117–237.
Jukes, T.H., 1973. Possibilities for the evolution of the genetic code from a preceding form. Nature 246, 22–26.
Jukes, T.H., 1978. The amino acid code. Adv. Enzymol. 47, 375–432.
Kocherlakota, R.R., Acland, N.D., 1982. Ambiguity and the evolution of the genetic code. Orig. Life 12, 71–80.
Kuhn, H., 1972. Angew. Chem. Int. Ed. 11, 798–820.
Lehmann, N., Jukes, T.H., 1988. Genetic code development by stop codon takeover. J. Theor. Biol. 135, 203–214.
Levitt, M., 1991. Protein folding. Curr. Opin. Struct. Biol. 1, 224–229.
Macchiato, M.F., Tramontano, A., 1982. Thermodynamic approach to a possible theory of the evolution of a genetic code. Zeitschr. Naturforsch. 37c, 1031–1037.
Miyajima et al., 1989. erbA homologs encoding proteins with different T3 binding capacities are transcribed from opposite DNA strands of the same locus. Cell 57, 31–39.
Nicolis, G., Prigogine, I., 1977. Self-Organization in Nonequilibrium Systems. Wiley, New York.
Orgel, L.E., 1968. Evolution of the genetic apparatus. J. Mol. Biol. 38, 381–393.
Orgel, L.E., 1972. A possible step in the origin of the genetic code. Isr. J. Chem. 10, 287–292.
Osawa, S., Jukes, T.H., 1988. Evolution of the genetic code as affected by anticodon content. Trends Genet. 4, 191–198.
Osawa, S., Jukes, T.H., 1989. Codon reassignment (codon capture) in evolution. J. Mol. Evol. 28, 271–278.
Osawa, S., Jukes, T.H., Watanabe, K., Muto, A., 1992. Recent evidence for evolution of the genetic code. Microbiol. Rev. 56, 229–264.
Parker, J.M.R., Guo, D., Hodges, R.S., 1986. New hydrophilicity scale derived from high performance liquid chromatography peptide retention data: correlation of predicted surface residues with antigenicity and X-ray-derived accessible sites. Biochemistry 25, 5425–5432.
Pasqualini, R., Chamone, D.F., Brentani, R.R., 1989. Determination of the putative binding site for fibronectin on platelet glycoprotein IIb–IIIa complex through a hydropathic complementarity approach. J. Biol. Chem. 264, 14566–14570.
Shai, Y., Flashner, M., Chaiken, I.M., 1987. Anti-sense peptide recognition of sense peptides: direct quantitative characterization with the ribonuclease S-peptide system using analytical high-performance affinity chromatography. Biochemistry 26, 669–675.
Shai, Y., Brunck, T.K., Chaiken, I.M., 1989. Antisense peptide recognition of sense peptides: sequence simplification and evaluation of forces underlying the interaction. Biochemistry 28, 8804–8811.
Soto, M.A., Toha, C.J., 1985. A hardware interpretation of the evolution of the genetic code. Biosystems 18, 209–215.
Weigent, D.A., Hoeprich, P.D., Bost, K.L., Brunck, T.K., Reiher, W.E., Blalock, J.E., 1986. The HTLV-III envelope protein contains a hexapeptide homologous to a region of interleukin-2 that binds to the interleukin-2 receptor. Biochem. Biophys. Res. Commun. 139, 367–374.
Wong, J.T.-F., 1975. A co-evolution theory of the genetic code. Proc. Natl. Acad. Sci. USA 72, 1909–1912.
Wong, J.T.-F., 1988. Evolution of the genetic code. Microbiol. Sci. 5, 174–181.
Zull, J.E., Smith, S.K., 1990. Is genetic code redundancy related to retention of structural information in both DNA strands? Trends Biochem. Sci. 15, 257–261.