CHAPTER 4 NUCLEIC ACIDS
Cohen and Boyer also devised a technique for high-frequency transformation, so that recombinant DNA molecules could be introduced into living bacterial cells. This originally involved treating bacteria with calcium chloride followed by heat shock, but this has been supplanted by more efficient methods, notably electroporation (application of a brief electric pulse). The vector is a DNA molecule into which the gene to be cloned is inserted. As schematized in Figure 4B.2, the vector is a plasmid (small circular DNA) containing a gene that specifies resistance to a particular antibiotic, such as ampicillin. The plasmid contains just one site for cleavage by the restriction enzyme to be used. Both the plasmid and a DNA molecule or chromosome containing the gene of interest are cleaved with the same restriction enzyme, such as EcoRI.
Annealing, or renaturation, followed by enzymatic ligation, yields recombinant DNA molecules in which the gene of interest has been spliced into the vector. Note that the vector need not be a plasmid. Any DNA capable of independent replication within a cell, such as a virus genome, can serve as a vector.
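The cleavage step can be illustrated with a short sketch. It assumes the well-known EcoRI recognition sequence GAATTC, cut between G and AATTC, which is not stated in the passage above; the example sequence and the function names are hypothetical. Only the top strand of a linear molecule is modeled, so this shows where the cuts fall rather than simulating a real digest.

```python
# Minimal sketch: locate EcoRI sites and cut a linear top-strand sequence.
ECORI_SITE = "GAATTC"    # EcoRI recognition sequence (assumption, not from the text)
ECORI_CUT_OFFSET = 1     # the cut falls between G and AATTC


def find_sites(seq, site=ECORI_SITE):
    """Return the 0-based start position of every occurrence of `site`."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def digest_linear(seq, site=ECORI_SITE, offset=ECORI_CUT_OFFSET):
    """Cut a linear sequence (top strand only) at every recognition site."""
    fragments = []
    previous = 0
    for pos in find_sites(seq, site):
        cut = pos + offset
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])
    return fragments


# Hypothetical sequence containing two EcoRI sites.
example = "TTACGGAATTCCATGGCTAGAATTCGGTA"
fragments = digest_linear(example)
print(fragments)
```

Every internal fragment begins with AATTC on the top strand, which is the single-stranded overhang that lets a cut plasmid and a cut insert anneal before ligation.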
After annealing and ligation, the DNA, containing a mixture of recombinant and nonrecombinant DNAs, is introduced into recipient bacteria by transformation. Any transformed bacteria can be identified by plating in the presence of ampicillin. Only those bacteria that have been transformed will grow because they now contain the antibiotic resistance gene. The investigator must then carry out additional experiments to establish that a particular transformed cell contains the gene being cloned. A variety of methods are available, including additional antibiotic-resistance screens, DNA-DNA renaturation reactions, or analysis of expression of the cloned gene with an antibody or activity assay.
FIGURE 4B.2
Cloning a fragment of DNA into a plasmid vector and introducing the recombinant molecule into bacteria. (1) Foreign DNA to be inserted, containing the gene of interest. (2) Vector: plasmid pBR322. (3) Marker: antibiotic resistance gene. Ligation yields the recombinant DNA molecule, which is introduced into the host cell; cells containing recombinant DNA molecules are selected by growth in the presence of antibiotic.
FIGURE 4B.3
pBR322 (4362 bp), one of the earliest cloning vectors. Some of the restriction sites are shown (AvaI, SalI, BamHI, EcoRV, PvuII, EcoRI, PvuI, PstI, HindIII), as well as the replication origin and the direction of transcription of the ampicillin and tetracycline resistance genes. The bottom diagram shows the effect of cloning a novel sequence into the HindIII site: the tetR gene is split by the inserted DNA fragment, and the ampR gene remains intact.
MANIPULATING DNA
Figure 4B.3 shows the structure of pBR322, a plasmid engineered for gene cloning in 1977 and still in occasional use today. This small recombinant DNA molecule, 4362 base pairs in length, contains two antibiotic resistance genes, one for ampicillin resistance (ampR) and one for tetracycline resistance (tetR). Note that each of these genes contains restriction cleavage sites within its sequence. A site for the restriction enzyme HindIII lies within the tetR sequence. Hence, if this site is used for cloning, the tetR gene is split and, therefore, inactive.
Because the ampR gene is intact, all bacteria that have acquired a plasmid, whether recombinant or not, can be selected on the basis of their resistance to ampicillin. Bacteria carrying recombinant plasmids can then be identified because they are ampicillin-resistant but tetracycline-sensitive, whereas bacteria that acquired the original plasmid, without an insert, are resistant to both drugs.
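The two-antibiotic screen amounts to a small decision table. The sketch below is only an illustration of that logic, assuming the insert was cloned into a site that inactivates tetracycline resistance; the function name and the result labels are hypothetical.

```python
def classify_colony(grows_on_ampicillin, grows_on_tetracycline):
    """Interpret replica-plating results for a pBR322-style vector.

    Assumes the insert, if present, disrupts the tetracycline
    resistance gene and leaves the ampicillin resistance gene intact.
    """
    if not grows_on_ampicillin:
        # Without ampicillin resistance the cell took up no plasmid.
        return "no plasmid"
    if grows_on_tetracycline:
        # Both genes work, so the plasmid carries no insert.
        return "plasmid without insert"
    # Ampicillin-resistant but tetracycline-sensitive.
    return "recombinant plasmid"


for amp, tet in [(False, False), (True, True), (True, False)]:
    print(amp, tet, classify_colony(amp, tet))
```

In practice the colonies are first selected on ampicillin and then replica-plated onto tetracycline, so the first branch describes cells that never form colonies on the selection plate.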
Many variations have been devised on this original approach.
Blunt-ended DNA molecules, containing no sticky ends, can now be cloned. Many cloning vectors have been devised, usually as recombinant DNA molecules themselves. One such vector, bacteriophage M13, is particularly useful because the recombinant DNA can be isolated as a single-stranded molecule, readily amenable to DNA sequence analysis (page 1039). Particularly widely used are expression vectors, in which signals to regulate and activate high-level expression of the cloned gene are built into the vector. Other modifications include sequences that aid in purification of the recombinant protein (see Figure 5A.1). In the biotechnology industry, techniques such as these led to the introduction as early as 1982 of human insulin as a recombinant gene product, used for diabetes treatment. Other recombinant proteins later approved for clinical use include blood-clotting factors, clot-dissolving enzymes used to treat heart attack victims, pituitary growth hormone, and interferons.
Automated Oligonucleotide Synthesis
In 1977 the British biochemist Fred Sanger introduced dideoxy DNA sequencing, the rapid DNA sequence analysis that led to near-completion of the 3-billion-base-pair human genome sequence in 2001. In 1978 the Canadian biochemist Michael Smith devised a method for site-directed mutagenesis, which allows an investigator to introduce any desired mutation into a cloned gene. Both researchers received the Nobel Prize for their contributions.
Both dideoxy DNA sequencing and site-directed mutagenesis depend upon the availability of oligodeoxyribonucleotide molecules of precisely known sequence to serve as primers in DNA polymerase-catalyzed reactions. The oligonucleotide synthesis process most widely used at present is the phosphoramidite method. This method is popular because it can be carried out in automated fashion, with the oligonucleotide linked to a solid support as nucleotides are added one at a time.
The method is illustrated in Figure 4B.4. Intermediates in this chemical synthesis are nucleoside phosphoramidites, with the phosphorus in the reactive trivalent form. The process begins with attachment of the 3′ hydroxyl of the first nucleotide to a silica matrix. The 5′-hydroxyl group is protected with a dimethoxytrityl (DMTr) blocking group. Amino groups on the purine or pyrimidine base are also protected. The next nucleotide is introduced as the blocked derivative, with a phosphoramidite group on position 3′. Tetrazole is present, leading to protonation of the diisopropylamine moiety on the incoming nucleotide and facilitating its loss upon reaction with the unprotected 5′-hydroxyl.
Oxidation with iodine converts the trivalent phosphorus to a phosphotriester. This process is repeated in stepwise fashion until up to 150 nucleotides have been added in a precisely determined sequence. Finally, the blocking groups are removed, and the finished chain is removed from the solid support, followed by chromatographic purification if necessary. Each step proceeds with about 98% efficiency, meaning that a 20-mer, containing 20 nucleotide residues, can be produced at about 67% final yield (0.98 raised to the 20th power is about 0.67).
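The overall yield of a stepwise synthesis is the per-step efficiency raised to the number of steps, so small losses compound quickly as the chain grows. The sketch below works through that arithmetic; the function names are hypothetical, and counting one step per residue is a simplification (the first residue is already attached to the support).

```python
def overall_yield(step_efficiency, n_steps):
    """Fraction of chains that survive n_steps couplings intact."""
    return step_efficiency ** n_steps


def required_efficiency(target_yield, n_steps):
    """Per-step efficiency needed to reach target_yield after n_steps."""
    return target_yield ** (1.0 / n_steps)


# A 20-mer at 98% and at 99% per step, and a 150-mer at 98% per step.
for efficiency, length in [(0.98, 20), (0.99, 20), (0.98, 150)]:
    print(f"{length}-mer at {efficiency:.0%} per step: "
          f"{overall_yield(efficiency, length):.1%}")

# Per-step efficiency needed for an 80% yield of a 20-mer.
print(f"needed for 80% of a 20-mer: {required_efficiency(0.80, 20):.2%}")
```

At 98% per step a 20-mer comes out at roughly two-thirds yield, and an 80% yield would need close to 99% per step, which is why long oligonucleotides usually require purification.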
Note that the phosphoramidite synthetic method proceeds from the 3′ end of the polymer toward the 5′ end, whereas the enzymatic synthesis of DNA by DNA polymerase proceeds in the opposite direction.
Because of the ease with which oligonucleotides of defined sequence can be synthesized, and because of the regularity of DNA secondary structure, scientists have devised numerous methods for creating defined DNA-based nanostructures. Figure 4B.5 illustrates one example. In this example synthetic oligonucleotides were designed to fold into the form of a perfect tetrahedron. By means such as these, synthetic DNAs have been used to form structures with possible applications in nanotechnology, including gears, tubes, and even mechanical devices.
Dideoxynucleotide Sequence Analysis
On page 110 we mentioned the Maxam-Gilbert method for DNA-sequence analysis, which involves treating DNA with reagents that cleave at specific nucleotides, yielding a population of molecules that can be resolved on the basis of molecular weight by gel electrophoresis. The enzymatic method introduced by Sanger similarly involves electrophoretic analysis of DNA fragments terminated at specific nucleotides, but Sanger’s method allows analysis of longer stretches of DNA, and it lends itself more readily to automation. The method, as originally carried out, uses bacteriophage M13 as a cloning vector for the DNA sequence being analyzed. M13 DNA, as isolated from virus particles, is circular and single-stranded. However, the molecule replicates through a two-stranded intermediate form, called the replicative form, or RF. With introduction of useful restriction cleavage sites, M13 RF becomes a cloning vector comparable to pBR322. As shown in Figure 4B.6, the DNA fragment to be analyzed is cloned into M13 RF. After ligation the recombinant DNA is introduced into E. coli by transformation, and the infected cells are incubated and allowed to produce phage particles. The phage particles are purified, and DNA is isolated from each, as single-stranded DNA containing the sequence for analysis.
Isolated single-stranded circular DNA from phage genomes carrying the desired insert becomes the template for four DNA polymerase-catalyzed reactions. The primer is an oligonucleotide that is complementary to an M13 sequence lying just to the 3′ side of the insert. Extension of this primer by DNA polymerase copies the insert. The polymerase reactions are run in the presence of deoxyribonucleoside triphosphate analogs, the 2′,3′-dideoxyribonucleoside triphosphates (ddNTPs), which serve as inhibitors of chain extension because they lack 3′ hydroxyl termini. The dideoxy analog (ddATP) of deoxyadenosine triphosphate is shown here.
2′,3′-Dideoxyadenosine triphosphate: adenine linked to the 1′ carbon of a sugar that carries the triphosphate on the 5′ carbon and H in place of OH at both the 2′ and 3′ positions.
To generate a series of A-terminated fragments, the DNA polymerase reactions are run in the presence of equal concentrations of dATP, dCTP, dGTP, and dTTP, plus 1/10th that concentration of ddATP. When T is in the template strand, DNA polymerase occasionally inserts ddAMP instead of dAMP. When that happens, DNA replication stops and the fragment is released from the enzyme. Thus, a series of fragments of varying lengths accumulates, with a common 5′ end (the primer) and variable 3′ ends, and with each 3′ end identifying a T residue in the insert sequence that is being analyzed. Similarly, sites terminated by C, G, and T are identified simply by running polymerase reactions with the other three dideoxy analogs, one at a time. Inclusion of a radioactive nucleotide in the polymerization mixture and gel electrophoresis followed by radioautography yields four “sequencing ladders,” as shown in Figure 4B.6. Each band in the radioautographic image of the electrophoretic gel identifies one of the four bases at that site.
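The logic of reading the four ladders can be sketched in a few lines. This is an idealized illustration, not a model of the chemistry: it assumes every possible terminated chain is present in each reaction, ignores the primer, and uses hypothetical function names. The 14-base template corresponds to the short example sequence labeled in Figure 4B.6.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def terminated_lengths(new_strand, dd_base):
    """Lengths of the chains that end where dd_base was incorporated."""
    return [i + 1 for i, base in enumerate(new_strand) if base == dd_base]


def run_four_reactions(template):
    """Return {ddNTP base: fragment lengths} for the four reactions."""
    new_strand = reverse_complement(template)
    return {base: terminated_lengths(new_strand, base) for base in "ACGT"}


def read_ladders(ladders):
    """Read bands from shortest to longest across the four lanes."""
    lane_by_length = {}
    for base, lengths in ladders.items():
        for length in lengths:
            lane_by_length[length] = base
    return "".join(lane_by_length[n] for n in sorted(lane_by_length))


template = "ACGTAATTCAGGTG"          # analyzed strand, written 5'->3'
ladders = run_four_reactions(template)
synthesized = read_ladders(ladders)  # newly made strand, 5'->3'
recovered_template = reverse_complement(synthesized)
print(ladders)
print(synthesized, recovered_template)
```

Because each chain length occurs in exactly one lane, sorting the bands by length and noting the lane of each band gives the sequence of the synthesized strand; its reverse complement is the strand that was cloned.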
FIGURE 4B.4
Solid-phase synthesis of oligonucleotides by the phosphoramidite method. The 3′ residue is attached to a silica matrix through the linker R1. Step 1: remove the DMTr blocking group. Step 2: add the next residue. Step 3: oxidize. Steps 1, 2, and 3 are repeated until all residues are added. Steps 4, 5, and 6: remove all blocking groups on bases, remove CH3 from phosphates, and cleave the finished chain from the silica support. Reactive groups on all bases (Base*) are blocked by chemical reagents.
As noted above, M13 vectors were originally used because they permitted DNA synthesis on single-stranded templates. Modifications of the polymerase chain reaction (Tools of Biochemistry 24A) now permit the preparation of single-stranded DNA molecules for sequencing without the need for M13-based vectors.
Sanger sequencing is now done automatically. Each ddNTP is derivatized with a fluorescent dye, each of a different color.
Thus, each fragment has a distinct color based upon the identity of the ddNTP that terminated the sequencing reaction. This allows all four reaction mixtures to be resolved in one lane of a sequencing gel, permitting analysis of far more DNA in one sequencing operation. The gel is scanned fluorometrically, and a computer reads the DNA sequence directly from the resulting pattern of differently colored peaks (Figure 4B.7).
Further refinements of this method have greatly increased its
“throughput,” or the amount of sequencing information derived from one operation. These methods yielded complete genomic sequences for several bacteria in the mid-1990s, and the near-completion of the human genome sequence in 2001. Since then, further modifications have yielded several approaches that greatly expand the speed and accuracy of DNA sequencing operations.
Several of the most prominent of these “second-generation”
sequencing technologies are described in articles cited at the end of this section. As detailed later in this book, enormous amounts of information about health and disease, biological individuality, and evolutionary relationships have come from these developments.
Site-directed Mutagenesis
Analysis of the function of a protein involves altering the structure of the protein and then determining whether and how the biological function of the protein has been altered. Two methods have been used classically. One is to alter certain residues chemically by treatment with protein-modifying reagents. The approach lacks specificity because all residues of a given amino acid may be altered, not just the one or two of special interest.
Another approach is to mutagenize an organism with ultraviolet light, ionizing radiation, or a chemical mutagen and then select for surviving organisms containing the mutation of interest, followed by isolation of the mutant protein. The problem with this approach is the difficulty in targeting the action of the mutagen to a specific region of the gene of interest, typically the catalytic site of an enzyme or a region involved in regulatory interactions with DNA or with other proteins.
Once it became possible to clone the gene encoding a protein of interest, it became possible to systematically alter the gene at specific sites to generate virtually any desired mutation, a technique known as site-directed mutagenesis. Introduction of the cloned mutant gene into a host cell, followed by its expression, could then yield the mutant protein for study of its altered function.
The most powerful and widely used method for site-directed mutagenesis, conceived by Michael Smith, allows the introduction of practically any mutation at any site, including single-base substitutions, short deletions, or insertions. The approach, illustrated in Figure 4B.8, requires that the gene first be cloned into a single-stranded vector, such as phage M13, as discussed for DNA sequence analysis. Next an oligodeoxynucleotide, about 20 nucleotides long, is synthesized that is complementary in sequence to the cloned gene at the site of the desired mutation, except in the center of the sequence.
Here the oligonucleotide sequence contains one or two deliberate mistakes: either single nucleotides that do not pair with the template or insertions or gaps of a few nucleotides. Upon annealing to the cloned gene, these alterations create either non–Watson–Crick base pairs or bases that have no partners and therefore form a “looping out.” The correctly matched bases on both sides of the mismatch keep the oligonucleotide annealed, despite the mismatch. DNA polymerases are then used to synthesize around the circular vector from this primer, followed by enzymatic ligation to create a closed circular duplex. After introduction of this circular molecule into bacteria by transformation, both strands replicate and yield phage. In principle, 50% of the phage should contain the desired mutation within the inserted sequence. In practice, that percentage is considerably less, but it can be increased by various techniques (see reference by Kunkel et al.). In any event, the mutant gene is then cleaved out of the modified phage genome by restriction nuclease treatment, and it can be recloned into an expression vector for large-scale preparation and subsequent isolation of the mutant protein.
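The design of a mismatched primer can be sketched as follows. This is a minimal illustration for a single-base substitution only; the 21-nucleotide sequence, the function names, and the choice of 10 matched bases on each side are hypothetical, chosen so that the mismatch sits in the center of the oligonucleotide as described above.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def make_mutagenic_primer(wild_type, index, new_base, flank=10):
    """Build a primer carrying new_base at `index`, flanked by matches.

    `wild_type` is the strand the primer replaces, i.e. the strand
    complementary to the single-stranded template in the vector.
    """
    start, end = index - flank, index + flank + 1
    if start < 0 or end > len(wild_type):
        raise ValueError("not enough flanking sequence around the site")
    return wild_type[start:index] + new_base + wild_type[index + 1:end]


def count_mismatches(primer, template_site):
    """Count non-Watson-Crick positions when primer anneals to template."""
    perfect_partner = reverse_complement(template_site)
    return sum(1 for p, q in zip(primer, perfect_partner) if p != q)


# Hypothetical wild-type sequence; change the central A (CAG) to G (CGG).
wild_type = "ACTGATGCCCAGTCCGATACG"
template_site = reverse_complement(wild_type)   # strand present in the vector
primer = make_mutagenic_primer(wild_type, index=10, new_base="G")
print(primer, count_mismatches(primer, template_site))
```

With ten matched bases on either side of the single mismatch, the primer still anneals, and extension plus ligation gives a duplex in which one strand carries the mutation and the other does not, which is the origin of the 50% expectation.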
Although single-stranded DNA phages, such as M13, were originally used to prepare templates for site-directed mutagenesis, such molecules are now readily prepared by polymerase chain reaction (PCR; Tools of Biochemistry 24A), and that has made an already relatively straightforward technique even simpler.
FIGURE 4B.5
Design and synthesis of a three-dimensional DNA nanostructure, in this case a DNA tetrahedron. (a) Design of four synthetic oligonucleotides (strands 1 to 4) with complementary base sequences indicated by matching colors. The four oligonucleotides were annealed, by heating and slow cooling, followed by ligation, that is, treatment with an enzyme (Chapter 25) that creates covalent bonds between the DNA ends. (b) Two images of a space-filling representation of a tetrahedron with three 30-nucleotide sides (A, B, and C) and three 20-nucleotide sides (D, E, and F).
(a) From Science 310:1661–1665, R. P. Goodman, I. A. T. Schaap, C. F. Tardin, C. M. Erben, R. M. Berry, C. F. Schmidt, and A. J. Turberfield, Rapid chiral assembly of rigid DNA building blocks for molecular nanofabrication. © 2005. Reprinted with permission from AAAS.
FIGURE 4B.6
Cloning into M13 and sequencing by the Sanger method. The sequence to be analyzed is inserted into the double-strand replicative form of M13 by restriction and ligation, and the recombinant DNA is introduced into bacteria by transformation. Single-strand DNA isolated from phage particles is annealed to an oligonucleotide primer and copied by DNA polymerase in the presence of dATP, dTTP, dCTP, dGTP, an α-32P-labeled nucleotide, and one of ddATP, ddCTP, ddGTP, or ddTTP. Electrophoresis of the four reactions gives ladders of terminated fragments (for example, CACCddT, CACCTddG, CAddC, and CddA), from which the sequence of bases on the complementary strand (CACCTGAATTACGT, 5′ to 3′) and on the analyzed strand (GTGGACTTAATGCA, 3′ to 5′) is read.
References
Ding, B., and N. C. Seeman (2006) Operation of a DNA robot arm inserted into a 2D DNA crystalline substrate. Science 314:1583–1585. An early application in DNA nanotechnology.
Douglas, S. M., H. Dietz, T. Liedl, B. Högberg, F. Graf, and W. M. Shih (2009) Self-assembly of DNA into nanoscale three-dimensional shapes. Nature 459:414–418. Both two- and three-dimensional shapes can be designed from DNA.
Drmanac, R., and 66 coauthors (2010) Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 327:78–81. Using second- and third-generation technology, these workers sequenced three human genomes at a cost of $4400 per genome and accuracy of one error per 100 kb.
Endo, M., and H. Sugiyama (2009) Chemical approaches to DNA nanotechnology. ChemBioChem 10:2420–2443. A detailed and informative review.
Kunkel, T. A., J. D. Roberts, and R. A. Zakour (1989) Rapid and efficient site-specific mutagenesis without phenotypic selection. In: Recombinant DNA Methodology, edited by R. Wu, L. Grossman, and K. Moldave, pp. 587–601. Academic Press, San Diego, CA. Laboratory instructions for the most widely used method of site-directed mutagenesis.
Mardis, E. R. (2011) A decade’s perspective on DNA sequencing technology. Nature 470:198–203. A recent review of the sequencing methods arising since the Sanger technique.
Matteucci, M. D., and M. H. Caruthers (1981) Synthesis of deoxyoligonucleotides on a polymer support. J. Am. Chem. Soc. 103:3185–3191. More detailed description of the chemistry involved.
Metzker, M. L. (2010) Sequencing technologies—the next generation. Nature Reviews Genetics 11:31–46. This review article discusses the principles and applications of six “second-generation” high-throughput DNA sequencing technologies.
Sambrook, J., and D. W. Russell (2001) Molecular Cloning, A Laboratory Manual, Volumes 1–3, 3rd ed. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. The definitive laboratory handbook of molecular biological methods.
Zheng, J., and eight coauthors (2009) From molecular to macroscopic via the rational design of a self-assembled 3D DNA crystal. Nature 461:74–77. A triangular DNA structure formed from synthetic oligodeoxyribonucleotides forms large crystals, well beyond nanoscale.
FIGURE 4B.7
Data from a DNA sequencing gel.
Courtesy of Dr. Robert H. Lyons, The University of Michigan’s DNA Sequencing Core.
FIGURE 4B.8
Use of a mismatched synthetic oligonucleotide primer to introduce mutations in a gene cloned into a single-stranded vector. An oligonucleotide primer with one mismatch is annealed to circular DNA carrying the cloned gene. DNA polymerase, DNA ligase, and dNTPs produce a closed circular duplex M13 vector, and replication in E. coli yields both the original gene and the mutant gene.
We have seen that one class of biopolymers, the nucleic acids, stores and transmits the genetic information of the cell. Much of that information is expressed in another class of biopolymers, the proteins. Proteins play an enormous variety of roles: Some carry out the transport and storage of small molecules; others make up a large part of the structural framework of cells and tissues. Muscle contraction, the immune response, and blood clotting are all mediated by proteins. An important class of proteins is the enzymes, the catalysts that promote the tremendous variety of reactions that are required to support the living state. Each type of cell in every organism has several thousand kinds of proteins to serve these many functions.
In keeping with the multiplicity of their functions, proteins are extremely complex molecules. This complexity is illustrated in Figure 5.1, which depicts the molecular structure of myoglobin, a relatively small protein that functions primarily in oxygen binding and storage in animal tissues. In this and the following three chapters, we analyze in detail the structures and functions of a handful of proteins, including myoglobin. We will see that although there are general features of protein structure shared by most proteins, each protein has a distinct structure that is optimally suited to its function. Protein structures may appear at first glance to be hopelessly complex; however, there is an elegant and readily comprehensible logic to protein structure, which we will describe here and in Chapter 6. We begin with a description of the simple “building blocks” that are found in all proteins: the amino acids.
Amino Acids
Structure of the α-Amino Acids
All proteins are polymers, and the monomers that combine to make them are α-amino acids. A general representation of an α-amino acid is shown in Figure 5.2a.
The amino group is attached to the α-carbon, the carbon next to the carboxylic acid group; hence the name α-amino acid. To the α-carbon of every amino acid are also