[Figure 5.1 panel titles: Bacteriophage, Nucleosome, Lac Repressor. Scale bars: ∼50 nm, ∼10 nm, ∼10 nm.]
Figure 5.1: The bending of DNA is ubiquitous in the cellular processes associated with genome management. In the figure above, we depict cartoons of the crystal structures of three biological systems that bend DNA. The bacteriophage φX174, a virus that infects bacteria, is a DNA bending expert. The protein capsid shell, shown in the figure, is densely packed with the virus genome, which assumes a spool conformation. Our own genome is tightly wrapped around many protein spools, called histone core particles (shown in the second panel). The complex, consisting of the protein and the DNA, is called a nucleosome. DNA bending is also intimately involved in gene regulation.
The lac repressor is a simple example of a very common motif in which gene regulatory proteins loop DNA. (These images are taken from David Goodsell, the Protein Data Bank.)
Widom) to chain statistics. In Sect. 5.4, I present the experimental results of Cloutier and Widom [2] and discuss their implication for high-curvature DNA bending. Finally, in Sect. 5.5, I present a summary of the following chapter on DNA mechanics in which I describe an exact statistical mechanics theory of kinkable semi-flexible polymers and discuss the application of this theory to DNA mechanics.
[Figure 5.2 labels: Beads-on-a-string; Chromatin fiber]
Figure 5.2: DNA packaging in eukaryotic cells. The figure above is a schematic illustration of the many levels of structural organization that exist in a mitotic chromosome. The fundamental packaging unit of chromatin is the nucleosome complex: a histone core (yellow) tightly wrapped one-and-three-quarter times by double-stranded DNA (red). This figure is from Ref. [5].
Nor can it use the original copy. Doing so might damage this copy, or another process might need to use the same gene. In fact, at any one time in the cell, there are typically many RNA transcripts of a few essential genes. The RNA transcripts are then translated into many copies of the gene product, the protein.
5.1.1 DNA packaging
Of course, things are not quite that simple. A eukaryotic cell’s library is called the nucleus. The nucleus is the organelle in which the genome is stored. Much like the complicated system in our own Caltech library, the cell also has an organized system for storing its genome. Consider the physical length of our genome: there are roughly three billion base pairs, each a third of a nanometer in length. If the genome were stretched out, it would be roughly a meter in length! The diameter of the cell nucleus is just one millionth of this length (∼1 µm)!
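The length estimate above can be checked with a few lines of arithmetic. The sketch below uses only the round numbers quoted in the text; the variable names are illustrative.

```python
# Back-of-the-envelope genome length, using the round numbers quoted in the text.
BASE_PAIRS = 3e9            # roughly three billion base pairs
RISE_PER_BP_NM = 1.0 / 3.0  # "a third of a nanometer" per base pair
NUCLEUS_DIAMETER_M = 1e-6   # nucleus diameter, ~1 micrometer

# Contour length of the stretched-out genome, converted from nm to m.
genome_length_m = BASE_PAIRS * RISE_PER_BP_NM * 1e-9

# Ratio of the nucleus diameter to the genome contour length.
size_ratio = NUCLEUS_DIAMETER_M / genome_length_m

print(f"genome contour length ~ {genome_length_m:.2f} m")
print(f"nucleus diameter / genome length ~ {size_ratio:.1e}")
```

With these inputs the contour length comes out at about one meter and the ratio at about one part in a million, as stated.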
Different organisms have learned to cope with this “DNA packaging problem” in different ways.
Eukaryotic cells, like those that we are composed of, have several scales of DNA packaging. On the shortest length scale, DNA is tightly wrapped around protein “spools” called histone core particles.
These protein spools are roughly 11 nm in diameter. One complex, ∼ 200 bp of DNA wrapped around the histone core, is called the nucleosome (See Fig. 5.1.1). Nucleosomes collectively form fibers called chromatin. Due to the large number of nucleosomes required to package our genome, histones are by far the most common DNA binding protein in eukaryotic cells [4]. The typical state
[Figure 5.3 panels A, B, C. Panel titles: Bacteriophage φX174 (scale bar ∼50 nm); Bacteriophage T7 (scale bar 25 nm).]
Figure 5.3: DNA bacteriophages are protein capsids tightly packed with DNA. Panel A shows a cartoon of the crystal structure of the capsid of bacteriophage φX174. The genome is believed to be packaged in an inverse-spool configuration, starting from large radius coils and working inwards as depicted schematically in panel B. Panel C shows experimental evidence for the spool packaging hypothesis. This image is from a three-dimensional reconstruction of a T7 phage, generated by cryo-electron tomography [6]. The dark rings in the image correspond to the ordered DNA spool. The inner radius of the DNA spool is believed to be a few nm [6,7]. (Panel A is from David Goodsell, at the Protein Data Bank. Panel C is from Ref. [6].)
of our DNA, due to the high curvature induced by histones, is much more condensed than free DNA in solution.
To see what a huge effect histones have on DNA condensation, it is a useful exercise to estimate how much space the DNA would take up in the absence of any confinement. As we shall discuss later, thermal fluctuations bend DNA spontaneously. The mean squared end-to-end distance of a long polymer in solution is [8]
\[
\left\langle \Delta\vec{X}^{\,2} \right\rangle = bL, \tag{5.1}
\]
where L is the polymer contour length and b is the Kuhn length, which is proportional to the stiffness of the polymer. (We shall define it more precisely later in the chapter.) For DNA, the Kuhn length is 100 nm. The mean squared end-to-end distance is approximately the square of the physical size of the polymer in solution. To estimate the volume that DNA occupies in solution, we shall cube this approximate length:
\[
V_{\text{free}} \sim (bL)^{3/2} \sim 10^{-10}\ \text{m}^3 = 0.1\ \mu\text{L}. \tag{5.2}
\]
A tenth of a microliter may seem pretty small, but in the lab we routinely pipette volumes of just 2 µL! By contrast, the volume of the nucleus is
\[
V_{\text{nucleus}} \sim (1\ \mu\text{m})^3 = 10^{-9}\ \mu\text{L}, \tag{5.3}
\]
one one-hundred-millionth of the free DNA volume! This incredible condensation is predominantly the work of the histones.
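The comparison between Eqs. (5.2) and (5.3) can be reproduced numerically. The following is a minimal sketch using the values quoted in the text (b = 100 nm, L ≈ 1 m, a nucleus of linear size ≈ 1 µm); the function and variable names are illustrative. Evaluated literally, (bL)^{3/2} is about 3 × 10^{-11} m^3, which the text rounds up to 10^{-10} m^3, so the computed ratio agrees with "one one-hundred-millionth" only to within an order of magnitude.

```python
# Order-of-magnitude volume comparison following Eqs. (5.1)-(5.3).
KUHN_LENGTH_M = 100e-9    # b: Kuhn length of DNA, 100 nm
CONTOUR_LENGTH_M = 1.0    # L: genome contour length, ~1 m
NUCLEUS_SIZE_M = 1e-6     # linear size of the nucleus, ~1 micrometer
M3_TO_MICROLITER = 1e9    # 1 m^3 = 1e3 L = 1e9 microliters


def free_coil_volume_m3(kuhn_length_m, contour_length_m):
    """Cube of the rms end-to-end distance: (b L)^(3/2)."""
    return (kuhn_length_m * contour_length_m) ** 1.5


v_free_m3 = free_coil_volume_m3(KUHN_LENGTH_M, CONTOUR_LENGTH_M)
v_nucleus_m3 = NUCLEUS_SIZE_M ** 3
compaction = v_nucleus_m3 / v_free_m3

print(f"V_free    ~ {v_free_m3 * M3_TO_MICROLITER:.1e} microliters")
print(f"V_nucleus ~ {v_nucleus_m3 * M3_TO_MICROLITER:.1e} microliters")
print(f"V_nucleus / V_free ~ {compaction:.1e}")
```

Because both volumes are scaling estimates, only the exponent of the ratio is meaningful; prefactors of order one are not.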
The properties of nucleosomes are of great biological interest because chromatin forms the substrate for all eukaryotic-cellular processes associated with DNA: from transcription to replication, recombination, DNA repair, and cell division [9]. The chemical and physical properties of chromatin are intimately related to the mechanics of tightly bent DNA. I shall give one explicit example of this interplay in Sect. 5.4.
Perhaps viruses face a more pronounced “DNA packaging problem” than even eukaryotic cells.
Although biologists argue about whether viruses should be counted as organisms, they replicate their genetic information with amazing efficiency. Viruses are parasites that cannot self-replicate, but they are one of the smallest replicating genetic units [4]. Viruses are weight conscious. For example, nearly all of the genome sequence of a virus is coding sequence, whereas a very large fraction of our own genome (90%) does not appear to code for proteins at all! In fact, there are many known examples of regions of virus genomes that code for multiple proteins at once.
The virus must be a parsimonious traveler to meet the challenge of efficient transfer from host to host while protecting its genome. Like eukaryotic cells, viruses must pack their long genomes by tightly bending their DNA. Many bacteriophages (viruses that infect bacteria), like the one pictured in Fig. 5.1, achieve this tight packaging by compressing their DNA into a very small protein capsid shell, typically less than 50 nm in diameter. The DNA is packed into the capsid in an inverse spool configuration by a DNA packaging motor. As the capsid fills from its periphery, the motor must insert DNA coils of increasingly high curvature (See Fig. 5.1.1). The end of the genome is probably packed with a radius of curvature of just a few nanometers! The energy expended to bend the DNA in this packaging process is not wasted, but stored, like a compressed spring, to be used to infect another host. This amazing story is told in many other places [7], but by almost all accounts the mechanics of tightly bent DNA plays a starring role!
5.1.2 Transcriptional regulation
In the previous section, I alluded to the fact that only ten percent of our genome codes for proteins.
Even in the coding regions, not all proteins are expressed (synthesized) at once. In fact, only very few “housekeeping genes” are generically expressed in all of our cells all of the time. The rest are
“turned on” only when they are needed. How does the cell control its genetic information and ensure that the correct genes are expressed?
Many gene regulatory mechanisms have been discovered over the past fifty years, but one of the most important and universal mechanisms controls the transcriptional process itself. If no protein is needed, no RNA transcript is produced. This regulatory mechanism is called transcriptional regulation. On the microscopic scale, gene regulatory proteins that bind particular sequences of DNA, called operators, interact with the cellular machinery responsible for transcription. Gene regulatory proteins exist which exert both positive (activation) and negative (repression) control
Figure 5.4: DNA looping is a common functional motif in eukaryotic gene regulation. The EM micrograph shows two DNA loops formed in the CyIIIa cis-regulatory region of the sea urchin genome.
These DNA loops are the result of regulatory proteins trapping rare looped DNA conformations.
(This figure is taken from Ref. [10].)
[Figure 5.5 panels A, B, C. Labels in the schematic: om, os, Repressor, Promoter.]
Figure 5.5: Transcriptional regulation and chain statistics. DNA looping is a common motif in transcriptional regulation, the mechanism by which the cell regulates transcription. The lac operon is an example of regulatory looping in a prokaryotic cell. The gene regulatory protein lac repressor can bind to two operators and induce a loop, increasing its affinity for the DNA. Panel A shows the crystal structure of the repressor bound to DNA. (This panel is taken from David Goodsell, the Protein Data Bank.) Panel B shows a schematic drawing of the looping mechanism. The lac repressor can capture rare thermal fluctuations that bring the auxiliary operator into proximity with the DNA binding domain of the protein. (This panel is taken from Ref. [11].) Panel C shows a schematic depiction of the effective concentration of the auxiliary operator (blue dot) once the primary operator is bound to the repressor (the pin). The intermediate DNA (red line) acts as a tether which increases the local, effective concentration of the auxiliary operator in the vicinity of the DNA binding domain. (This panel is adapted from Ref. [4].)
over transcription.
One extremely important and common motif in transcriptional regulation is DNA looping. DNA looping is induced by gene regulatory complexes which bind the DNA at more than one operator.
(See Fig. 5.1.2.) DNA looping implies that the regulatory control region can be thousands of base pairs in length and can include many different operators, and therefore it can be sensitive to many different stimuli. Such complicated regulatory machinery is generic in eukaryotic cells and especially in multicellular organisms. A typical example is the 2500 base pair Endo16 cis-regulatory region in the sea urchin. In this regulatory circuit, more than 40 different looping configurations are possible [12,10]! (See Fig. 5.1.2.)
From a biological perspective, we would like to understand and predict the levels of gene transcription in these systems. The formation of DNA loops implies that DNA chain statistics plays an integral role in determining the function of these regulatory circuits. Once the gene regulatory complex has bound one operator, the DNA, between this operator and those adjacent to it, acts as a tether, increasing the effective concentration of the adjacent operators at the regulatory complex.
This mechanism is illustrated in Fig. 5.1.2. If the operators are too closely spaced, the inherent stiffness of the DNA can hold the two operators apart, preventing a gene regulatory complex from binding both operators. As a result, the behavior of gene regulatory circuits depends sensitively on the base pair spacing of the operators. In Sect. 5.3, we shall return to the idea of effective concentration and develop it more rigorously.
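To give a feel for the size of the tethering effect, the sketch below estimates the effective concentration of one operator at the other in the ideal (Gaussian) chain limit. This is an assumption introduced here for illustration, not the treatment developed in Sect. 5.3: it uses the standard zero-separation density of a Gaussian chain with mean squared end-to-end distance bL, namely (3/(2πbL))^{3/2}, and it ignores bending stiffness entirely. It is therefore only meaningful for operator spacings much longer than the Kuhn length, and it cannot capture the suppression of looping at short spacings described above. The function name is hypothetical.

```python
import math

AVOGADRO = 6.02214076e23    # molecules per mole
KUHN_LENGTH_NM = 100.0      # b, Kuhn length of DNA quoted in the text
RISE_PER_BP_NM = 1.0 / 3.0  # "a third of a nanometer" per base pair
NM3_PER_LITER = 1e24        # 1 L = 1e-3 m^3 = 1e24 nm^3


def gaussian_effective_concentration_molar(spacing_bp):
    """Effective molar concentration of one tether end at the other.

    Assumes an ideal Gaussian chain with <R^2> = b L, for which the
    probability density of zero end-to-end separation is
    (3 / (2 pi b L))^(3/2). Stiffness is neglected, so this is only a
    rough guide when the contour length L greatly exceeds b.
    """
    contour_nm = spacing_bp * RISE_PER_BP_NM
    density_per_nm3 = (3.0 / (2.0 * math.pi * KUHN_LENGTH_NM * contour_nm)) ** 1.5
    return density_per_nm3 * NM3_PER_LITER / AVOGADRO


for spacing in (3000, 6000, 12000):
    conc = gaussian_effective_concentration_molar(spacing)
    print(f"{spacing:6d} bp spacing: effective concentration ~ {conc:.1e} M")
```

In this limit the effective concentration falls off as L^{-3/2}, so quadrupling the operator spacing reduces it eightfold.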
Fortunately, prokaryotic cells exhibit regulatory circuits which do not loop in forty different configurations! These systems provide an in vivo proving ground for understanding gene regulatory looping. For example, in E. coli, the lac operon has been extensively studied. The binding of the lac repressor induces a DNA loop and represses the transcription of the gene lacZ. (See Fig. 5.1.2.) The stability of the induced DNA loop is measured indirectly by measuring protein expression as a function of inter-operator spacing (loop length) [13,14,15,16]. (See Fig. 5.1.2.)