1.4 Review of some fundamentals in genomics

1.4.2 Protein synthesis

A protein is a complex biomolecule that consists of a long chain of amino acids. The amino acids are linked to each other by strong covalent bonds called peptide bonds, and the resulting amino acid chain is also known as a polypeptide. There are 20 different kinds of amino acids in proteins, each with a different side chain. Therefore, a protein can be conveniently represented as a sequence of amino acids, where each of the 20 distinct amino acids is denoted by a 3-letter code or a 1-letter code. For example, the amino acid alanine is denoted by ‘Ala’ or ‘A,’ and cysteine is denoted by ‘Cys’ or ‘C.’
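As a minimal sketch of this representation, the following Python snippet rewrites a protein given in 1-letter codes using 3-letter codes. The names `THREE_LETTER` and `to_three_letter` and the example sequence are illustrative assumptions, not taken from the text; the abbreviations themselves are the standard ones.

```python
# Standard 1-letter -> 3-letter codes for the 20 amino acids found in proteins.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def to_three_letter(protein: str) -> str:
    """Rewrite a 1-letter protein sequence as hyphen-separated 3-letter codes."""
    return "-".join(THREE_LETTER[residue] for residue in protein.upper())


# Hypothetical three-residue peptide: methionine, alanine, cysteine.
print(to_three_letter("MAC"))  # Met-Ala-Cys
```

The 1-letter form is the one normally used for computational sequence analysis, since a protein then becomes a plain string over a 20-symbol alphabet.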

Proteins are involved in virtually every biological process in the cell and therefore play a crucial role in all living organisms. The information that is needed for encoding proteins is stored in the DNA.

Portions of the DNA that contain the information for producing proteins are called protein-coding genes, or often simply genes.² Each gene in the DNA is first copied into an RNA molecule (transcription), which is then used to produce proteins (translation). Therefore, it can be said that the genetic information flows from DNA to RNA to protein. This basic principle is typically called the central dogma of molecular biology [1], and it explains how the genetic instructions contained in the DNA are used to synthesize RNAs and proteins. Figure 1.9 illustrates this principle in a simple diagram.

The main steps in a typical protein synthesis process are shown in Figure 1.10. Each step in the process is discussed in the following subsections.

² Note that there also exist ncRNA (noncoding RNA) genes, which are portions of DNA that give rise to functional RNAs that are not translated into proteins.

[Figure 1.9 diagram: DNA → RNA by RNA synthesis (transcription); RNA → Protein by protein synthesis (translation).]

Figure 1.9: The central dogma of molecular biology states that the genetic information flows from DNA to RNA to protein.

1.4.2.1 Transcription

The process of copying the content of a gene into an RNA is called transcription. Transcription is carried out by an enzyme called RNA polymerase, where an enzyme is a protein that catalyzes a specific chemical reaction. Initially, the RNA polymerase binds to a special region in the DNA called the promoter, which is located upstream of a gene and designates the starting point of the transcription process. During transcription, the RNA polymerase uses one strand of the DNA (called the template strand) to copy the content into an RNA molecule. While copying the content from DNA to RNA, each thymine (T) in the original DNA sequence is replaced by a uracil (U) in the RNA that is being synthesized. The resulting transcript of a protein-coding gene is called a pre-mRNA (pre-messenger RNA).
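At the sequence level, transcription can be sketched in a few lines of Python. The sketch below makes one assumption that the text does not spell out: the RNA is built by base pairing against the template strand, so it is complementary to the template and reads the same as the other (coding) strand with every T replaced by U. The function names and the nine-base gene fragment are hypothetical.

```python
# Base-pairing partners used when a DNA template is read into RNA.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_template(template_3_to_5: str) -> str:
    """Return the RNA (5'->3') paired against a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in template_3_to_5.upper())


def transcribe_coding(coding_5_to_3: str) -> str:
    """Equivalent shortcut: copy the coding strand, replacing T with U."""
    return coding_5_to_3.upper().replace("T", "U")


coding = "ATGGCCTGT"    # hypothetical gene fragment, written 5'->3'
template = "TACCGGACA"  # its complementary template strand, written 3'->5'

print(transcribe_template(template))  # AUGGCCUGU
print(transcribe_coding(coding))      # AUGGCCUGU
```

Both functions give the same transcript, which is why sequence-analysis software usually stores only the coding strand and applies the T-to-U substitution.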

[Figure 1.10 diagram: four panels, (a) DNA containing Gene A, Gene B, and Gene C; (b) pre-mRNA with a 5’ UTR, Exon 1, an intron, Exon 2, an intron, Exon 3, and a 3’ UTR; (c) mRNA 1 (Exon 1, Exon 2) and mRNA 2 (Exon 1, Exon 3); (d) Protein 1 and Protein 2. The panels are connected by the steps transcription, splicing, and translation.]

Figure 1.10: Illustration of a typical protein synthesis process.

UUU : Phenylalanine   UUC : Phenylalanine   UUA : Leucine         UUG : Leucine
CUU : Leucine         CUC : Leucine         CUA : Leucine         CUG : Leucine
AUU : Isoleucine      AUC : Isoleucine      AUA : Isoleucine      AUG : Methionine, Start
GUU : Valine          GUC : Valine          GUA : Valine          GUG : Valine
UCU : Serine          UCC : Serine          UCA : Serine          UCG : Serine
CCU : Proline         CCC : Proline         CCA : Proline         CCG : Proline
ACU : Threonine       ACC : Threonine       ACA : Threonine       ACG : Threonine
GCU : Alanine         GCC : Alanine         GCA : Alanine         GCG : Alanine
UAU : Tyrosine        UAC : Tyrosine        UAA : Stop            UAG : Stop
CAU : Histidine       CAC : Histidine       CAA : Glutamine       CAG : Glutamine
AAU : Asparagine      AAC : Asparagine      AAA : Lysine          AAG : Lysine
GAU : Aspartic acid   GAC : Aspartic acid   GAA : Glutamic acid   GAG : Glutamic acid
UGU : Cysteine        UGC : Cysteine        UGA : Stop            UGG : Tryptophan
CGU : Arginine        CGC : Arginine        CGA : Arginine        CGG : Arginine
AGU : Serine          AGC : Serine          AGA : Arginine        AGG : Arginine
GGU : Glycine         GGC : Glycine         GGA : Glycine         GGG : Glycine

Figure 1.11: The genetic code.

Living organisms can be categorized into two types, namely, prokaryotes and eukaryotes. Prokaryotes are simple organisms (mostly unicellular) that do not have a cell nucleus. Bacteria are common examples of prokaryotes. Eukaryotes, on the other hand, are organisms that have complex cells with membrane-bound nuclei. Most of them are multicellular, and higher organisms such as worms, plants, insects, and mammals are eukaryotes. Most protein-coding genes in eukaryotes consist of two types of regions called exons and introns (see Figure 1.10).³ The introns are removed from the pre-mRNA and the remaining exons are concatenated to form an mRNA (messenger RNA). This process is called splicing. Sometimes, one pre-mRNA gives rise to multiple mRNAs by combining different exons. This phenomenon is called alternative splicing, and it is widely observed in eukaryotes.

³ The protein-coding genes of prokaryotes do not have introns.
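Splicing and alternative splicing, as described in this subsection, amount to selecting and concatenating substrings of the pre-mRNA. The Python sketch below illustrates the idea under simplifying assumptions: the exon boundaries are given as known index intervals, and the sequence, the interval values, and the function name `splice` are hypothetical. The two exon combinations mirror mRNA 1 and mRNA 2 in Figure 1.10.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Concatenate the given exon intervals (0-based, end-exclusive) in order."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical pre-mRNA: exons in upper case, introns in lower case for readability.
pre_mrna = "AUGGCC" + "guaaguag" + "UGUAAA" + "guccacag" + "GGGUAA"
exon1, exon2, exon3 = (0, 6), (14, 20), (28, 34)

# Alternative splicing: the same pre-mRNA yields two different mRNAs.
mrna_1 = splice(pre_mrna, [exon1, exon2])
mrna_2 = splice(pre_mrna, [exon1, exon3])

print(mrna_1)  # AUGGCCUGUAAA
print(mrna_2)  # AUGGCCGGGUAA
```

In a real genome the exon boundaries are not given in advance; locating them is itself a sequence-analysis problem.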

1.4.2.2 Translation

During the translation process, the mRNA that was transcribed from DNA is decoded by the ribosome and tRNAs (transfer RNAs) to generate a polypeptide (or a protein). A polypeptide is a long sequence of amino acids that are interconnected via peptide bonds. The translation of mRNAs into proteins is governed by the genetic code, which maps each of the 64 codons (triplets of nucleotides) to one of the 20 amino acids or to a stop signal. Figure 1.11 shows the genetic code that holds true for most genes in the vast majority of organisms. However, deviations from the standard code shown in Figure 1.11 have also been observed. For example, in several human mitochondrial mRNAs, the triplet ‘UGA’ was observed to code for tryptophan instead of serving as a stop codon [11].
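The decoding step can be sketched as a table lookup over consecutive codons. The Python sketch below builds the standard code of Figure 1.11 from a compact 64-character string (1-letter amino acid codes, with `*` marking a stop codon) and then derives a variant in which ‘UGA’ codes for tryptophan. It is a simplification under stated assumptions: translation starts at the first ‘AUG’ in the string, the variant table changes only the ‘UGA’ entry mentioned in the text, and all names and the example mRNA are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Amino acids (1-letter codes, '*' = stop) for codons UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Variant table: only the 'UGA' -> tryptophan reassignment is applied here.
UGA_TRP_CODE = {**STANDARD_CODE, "UGA": "W"}


def translate(mrna: str, code: dict[str, str] = STANDARD_CODE) -> str:
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = code[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = "AUGGCCUGAUGUUAA"  # hypothetical: AUG GCC UGA UGU UAA
print(translate(mrna))                # MA
print(translate(mrna, UGA_TRP_CODE))  # MAWC
```

Under the standard code the polypeptide ends at ‘UGA’, whereas under the variant table translation continues until ‘UAA’, so the same mRNA yields a longer product.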

For a comprehensive introduction to genomics and cell biology, see [1, 11].

[Figure 1.12 diagram: primary sequences and folded structures of two RNAs, with the stem, loop, stem-loop, and pseudoknot regions labeled.]

Figure 1.12: Two examples of RNAs with secondary structures. The primary sequence of each RNA is shown along with its structure after folding. The dashed lines indicate interactions between bases. (a) RNA with two stem-loops. (b) RNA with a pseudoknot.

From the document Signal Processing Methods for Genomic Sequence Analysis (pages 32-36)