In vitro, these enzymes are among the essential tools of the modern molecular biologist, who exploits the same effects of the same molecules (the dideoxynucleotides and their analogs) to determine the sequence of DNA molecules. The final part of this thesis is devoted to a statistical analysis of the data and an investigation of the role of sequence context in determining the incorporation of dideoxynucleotides.
Sanders presents statistics from a total of 903 bases sequenced by each of the 3 polymerases, but does not relate this to sequence context, except for s.
An Aside
This raises the level of characterization of this defect from merely anecdotal and qualitative to a statistically justifiable quantitative measurement.
Chapter 2
DNA, Polymerases and Dideoxy Sequencing
Chemistry of DNA and its Polymerases
In 1953, Watson and Crick showed that DNA molecules naturally exist as double helices, consisting of two complementary DNA strands held together by hydrogen bonds between the bases. The two strands in a double helix are antiparallel: the 5' end of one strand is adjacent to the 3' end of the other. Although the entire process of DNA replication is rather complicated, its central step is carried out by a class of enzymes known as DNA polymerases.
A typical polymerase requires three main things: a piece of template DNA, a short piece of DNA (or RNA) complementary to the 3' end of the template (the "primer"), and a supply of the 4 deoxynucleotide triphosphates.
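As a small illustration of the antiparallel pairing rule, the following Python sketch checks whether a primer is complementary to the 3' end of a template. The function names are hypothetical, both sequences are assumed to be written 5' to 3', and only exact Watson-Crick matches are accepted.

```python
# Hypothetical helpers for illustration; all sequences are written 5' -> 3'.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, also written 5' -> 3' (strands are antiparallel)."""
    return seq.translate(_COMPLEMENT)[::-1]


def primer_anneals_to_3prime_end(template: str, primer: str) -> bool:
    """True if the primer is exactly complementary to the 3' end of the template."""
    return template.upper().endswith(reverse_complement(primer.upper()))
```

For the template `ATGCCGTAAGC`, the primer `GCTT` anneals to the 3'-terminal `AAGC`, whereas `AAGC` itself (the same letters, not the complement) does not.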
Sanger Dideoxy Sequencing Reactions
Suppose a reaction mixture is prepared consisting of a template-primer complex, sufficient amounts of the 4 deoxynucleotide triphosphates, and a small proportion (say one percent relative to dT) of, e.g., dideoxythymidine triphosphate (ddT). Assuming that the polymerase does not distinguish between dT and ddT (often an invalid assumption, it turns out), each A in the template (A is complementary to T) has a one percent chance of terminating elongation at that position, because a dideoxynucleotide lacks the 3' hydroxyl group needed to attach the next nucleotide. The products of this reaction (the “T” reaction) will be dominated by DNA segments ending in ddT and complementary to the initial sequence of the template (Figure 2.3). If the polymerase molecule falls off in the middle of extending the primer, and neither it nor another polymerase molecule picks up where it left off, the fragment ends at a position that does not correspond to a ddT, producing a spurious product.
This loop inhibits the action of the polymerase, making it more likely to fall off at the wrong place and leave a prematurely terminated primer.
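Under the idealized assumptions of the T reaction (no discrimination between dT and ddT, and no premature dissociation), the expected product distribution follows directly: the k-th A met by the polymerase terminates a fraction p(1 - p)^(k - 1) of the primers. The Python sketch below computes this; `t_reaction_products`, `primer_len` and `p_term` are hypothetical names, and the primer is assumed to cover the last `primer_len` bases at the template's 3' end.

```python
def t_reaction_products(template: str, primer_len: int, p_term: float):
    """Expected fraction of primers stopping at each product length in a ddT reaction.

    Assumes p_term = ddT / (dT + ddT) is the same at every template A and that the
    polymerase never falls off. Returns ({product_length: fraction}, fraction_fully_extended).
    """
    # Bases still to be copied, in the order the polymerase meets them:
    # the primer grows 5' -> 3', so the template is read 3' -> 5'.
    to_copy = template.upper()[: len(template) - primer_len][::-1]
    surviving = 1.0
    products = {}
    for n_added, base in enumerate(to_copy, start=1):
        if base == "A":  # a T or ddT is incorporated opposite each template A
            products[primer_len + n_added] = surviving * p_term
            surviving *= 1.0 - p_term
    return products, surviving
```

With `p_term = 0.01`, the products decay only slowly with length, which is why a low dideoxy ratio yields readable bands over a long stretch of template.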
- Gel Electrophoresis
- Detection
- Chapter 3 Technologies
- Sequenase
- The ABI 373 Sequencer
- Running A Gel
- Chapter 4
It then takes 0.50 seconds to turn around before returning to the other edge of the gel. There is some play in the horizontal position of the comb, so the lanes are never in exactly the same position on the gel from run to run. The surface of the gel often has a bump in the center of each dimple, because the teeth of the comb push down on its edges.
This appears in the image as an added glow at the edges of the lanes.
Generating the Data: A Sequencing Project
Cosmid 2-47
All data used in this thesis came from a cosmid that was sequenced as part of a project to sequence part of the mouse T-cell receptor locus, a component of the immune system. The cosmid was sequenced primarily between January and May 1992 by Don Seto et al., using four ABI 373 sequencers at Caltech. The four dyes attached to the primers are known as FAM, JOE, TAMRA and ROX.
Chapter 5
- Common-Time Resampling
- Gel Straightening
It will be important later, when the four-color data for each pixel is transformed into four-dye data, that the different colors be temporally matched. For each pixel, in each color, the 2 values before the reference time and the value after the reference time are fit with a quadratic function. The value of this quadratic at the reference time is used as the value of that pixel in that color for this four-color scan (see Figure 5.1).
This effectively resamples all data in a four-color scan at the common reference time.
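Because exactly three samples determine a unique quadratic, the "fit" is an exact interpolation and can be evaluated directly in Lagrange form. The Python sketch below is a minimal illustration of that step for one pixel in one color; the name `resample_quadratic` and its arguments are hypothetical.

```python
def resample_quadratic(times, values, t_ref):
    """Value at t_ref of the quadratic through three (time, value) samples.

    Intended use: two samples taken before t_ref and one after it, for a single
    pixel in a single filter color. Uses the Lagrange interpolation form.
    """
    (t0, t1, t2), (y0, y1, y2) = times, values
    l0 = (t_ref - t1) * (t_ref - t2) / ((t0 - t1) * (t0 - t2))
    l1 = (t_ref - t0) * (t_ref - t2) / ((t1 - t0) * (t1 - t2))
    l2 = (t_ref - t0) * (t_ref - t1) / ((t2 - t0) * (t2 - t1))
    return y0 * l0 + y1 * l1 + y2 * l2
```

For samples of y = t² at t = 0, 1 and 3, the value at t_ref = 2 is 4, as expected.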
[Figure: path of the scanning optics in space-time; filter color in front of the photomultiplier tube]
- Lane Finding
- Horizontal Averaging
- Comparison with ABI Software
Starting from the left edge, the correlations between adjacent segments are calculated for a series of displacements of the right-hand segment. The algorithm then shifts over one segment to the right and repeats the process, using the newly shifted segment as the left-hand segment of the pair. The purpose of the track-finding procedure is to find the left and right boundaries of the lanes.
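The correlation-versus-displacement step for one pair of adjacent segments can be sketched as follows. This assumes each segment has already been reduced to a one-dimensional intensity profile and uses the Pearson correlation; `best_displacement` and `max_shift` are hypothetical names, not the thesis code.

```python
import numpy as np


def best_displacement(left, right, max_shift):
    """Displacement d of `right` relative to `left` that maximizes their correlation.

    A displacement d compares right[i] with left[i + d]. Returns (d, r).
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    n = min(len(left), len(right))
    left, right = left[:n], right[:n]
    best_d, best_r = 0, -np.inf
    for d in range(-max_shift, max_shift + 1):
        if d >= 0:
            a, b = left[d:], right[: n - d]
        else:
            a, b = left[: n + d], right[-d:]
        if len(a) < 3 or a.std() == 0 or b.std() == 0:
            continue  # correlation is undefined on very short or flat overlaps
        r = np.corrcoef(a, b)[0, 1]
        if r > best_r:
            best_d, best_r = d, r
    return best_d, best_r
```

Repeating this for each successive pair, moving one segment to the right each time, gives the chain of displacements that the track-finding procedure follows across the gel.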
To extend the lane edges along the gel, certain adjustments are necessary because the lanes tend to bend.
Extracting Incorporation Ratios
- Dye-Space Transformation
- Baseline Subtraction
- Mobility-Shift Corrections
- Base Calling (first two passes)
- Filtering
- Finding the Primer Peak
- Finding Peaks
- First-Pass Base Calling
- Graph Normalization
- Second-Pass Base Calling
- Base Calling (third pass)
This transformation varies slightly between machines because of variations in the exact transmission spectra of the filters. The presence of the dye moiety on the 5' end of the DNA molecules also changes the mobility of the molecules. This is the second most important reason for calling bases incorrectly, but correcting for it would require a priori knowledge of the sequence.
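One common way to express a dye-space transformation is as a linear unmixing: each filter color sees a known mixture of the four dyes, so the dye amounts are recovered by solving a 4×4 linear system per scan. The sketch below assumes that linear model; the matrix `crosstalk` is a hypothetical per-machine calibration, which is where the machine-to-machine variation in filter spectra would enter.

```python
import numpy as np


def to_dye_space(filter_signals, crosstalk):
    """Convert four filter-color intensities into four dye amounts.

    Assumes a linear model: signal[:, i] = sum_j crosstalk[i, j] * dye[:, j],
    where crosstalk[i, j] is filter i's response to a unit amount of dye j.
    `filter_signals` has shape (n_scans, 4); the result has the same shape.
    """
    signals = np.asarray(filter_signals, dtype=float)
    # Solve crosstalk @ dye_column = signal_column for every scan at once.
    return np.linalg.solve(crosstalk, signals.T).T
```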
The sequence is determined by the order of the different colored peaks in the data. Specifically, the sequence obtained is that of the primer strand, read in the 5' to 3' direction with increasing scan number. The parameters of the two Gaussians vary as a function of scan number to compensate slightly for the increasing peak width during the run.
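Reading the sequence from the order of the colored peaks amounts to sorting the detected peaks by scan number and translating each dye into a base. In the sketch below, peaks are assumed to be `(scan_number, dye)` pairs, and the dye-to-base table is a hypothetical assignment; the real one depends on which primer carries which dye.

```python
# Hypothetical dye-to-base assignment; the real one depends on the primer set used.
DYE_TO_BASE = {"FAM": "C", "JOE": "A", "TAMRA": "G", "ROX": "T"}


def call_bases(peaks, dye_to_base=DYE_TO_BASE):
    """Read bases in order of increasing scan number (primer strand, 5' to 3')."""
    return "".join(dye_to_base[dye] for _, dye in sorted(peaks))
```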
This filtering scheme tends to sharpen peaks, but the result is not intended to be interpreted as a "corrected" representation of the data. Since the primer peak represents the beginning of the real data for a lane, it is an important milestone. Later, after the second pass, I locate the SmaI site of the fragment as a more accurate landmark.
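The text does not spell out how the two Gaussians are combined, so the sketch below swaps in a standard unsharp-mask construction: a narrow Gaussian smoothing plus a weighted difference from a wide one. The widths and weight are hypothetical, and they are held fixed here rather than varied with scan number.

```python
import numpy as np


def gaussian_kernel(sigma, half_width):
    x = np.arange(-half_width, half_width + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()  # unit area, so flat regions keep their level


def sharpen(trace, sigma_narrow=1.0, sigma_wide=3.0, weight=0.5):
    """Unsharp-mask style filter built from two Gaussians (fixed widths here)."""
    trace = np.asarray(trace, dtype=float)
    half = int(np.ceil(4 * max(sigma_narrow, sigma_wide)))
    narrow = np.convolve(trace, gaussian_kernel(sigma_narrow, half), mode="same")
    wide = np.convolve(trace, gaussian_kernel(sigma_wide, half), mode="same")
    # The weights sum to 1, so a constant baseline is preserved, while broad
    # features are partly subtracted, emphasizing narrow peaks.
    return (1.0 + weight) * narrow - weight * wide
```

As with the filter described above, the output exaggerates peak height and produces negative side lobes, so it is useful for locating peaks but is not itself a corrected trace.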
When this ratio is lowest, the primer peak is in the middle of the 100-scan window. This routine looks at a portion of the data to estimate the relative strengths of the different color signals.
6.5.1 The SmaI Site
The primer position is too coarse for the final results; the main use I make of it is to find the SmaI site.
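The SmaI recognition sequence is CCCGGG, which is palindromic and therefore reads the same on both strands. A minimal sketch of using the primer position as a starting point is an exact search of the called bases in a window after the primer; the window size `search_width` and the function name are hypothetical, and a practical version would need to tolerate base-calling errors.

```python
SMAI_SITE = "CCCGGG"  # SmaI recognition sequence (palindromic)


def find_smai_site(called_bases, primer_index, search_width=200):
    """Index of the first SmaI site within `search_width` bases after the primer, or -1."""
    start = max(primer_index, 0)
    return called_bases.upper().find(SMAI_SITE, start, start + search_width)
```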
- Reference Spacing
- Third-Pass Base Calling
- Consensus Alignment
- Computing The Spacing Graph
- Quantitating Peaks
- Chapter 7 The Data
- Intro
- Statistical Analysis
was incorporated into his idea of the proper base spacing, causing more errors to be made later. To a first approximation, the velocity with which a molecule moves through the gel is proportional to the electric field and inversely proportional to the viscosity of the buffer solution in the gel. I assume that the electric field in the gel is proportional to the recorded voltage and that the viscosity of the buffer solution is proportional to the viscosity of water at the recorded temperature.
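Under these two assumptions, the distance a molecule has traveled is proportional to the time integral of V/η(T), so the recorded voltage and temperature can be turned into an "effective time" that removes run-to-run variations. In the sketch below, the water-viscosity formula is a commonly quoted empirical Vogel-type fit (A = 0.02414 mPa·s, B = 247.8 K, C = 140 K), swapped in for illustration; only ratios matter, so A cancels. The function names are hypothetical.

```python
import numpy as np


def water_viscosity_mpa_s(temp_c):
    """Approximate viscosity of liquid water (mPa*s) from an empirical Vogel-type fit."""
    t_k = np.asarray(temp_c, dtype=float) + 273.15
    return 0.02414 * 10.0 ** (247.8 / (t_k - 140.0))


def effective_time(times_s, voltages, temps_c):
    """Integrate the relative velocity V / eta(T) over time (trapezoid rule).

    Normalized by the rate at the first sample, so constant voltage and
    temperature give back the elapsed time.
    """
    times = np.asarray(times_s, dtype=float)
    rate = np.asarray(voltages, dtype=float) / water_viscosity_mpa_s(temps_c)
    rate = rate / rate[0]
    steps = 0.5 * (rate[1:] + rate[:-1]) * np.diff(times)
    return np.concatenate(([0.0], np.cumsum(steps)))
```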
Note that FBASE is relative to FRAGD->INSERT_BASE; this makes FBASE compatible with COMP_NOM_SPACING. It is necessary to compare the fragment to both the given consensus and its reverse complement, because the fragment could come from either strand of the cosmid. That raw positioning information can then be used to align the entire fragment to the correct strand of the consensus.
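The two-strand comparison can be sketched as an ungapped search: try every placement of the fragment and of its reverse complement along the consensus, and keep the placement with the fewest mismatches. Comparing the reverse-complemented fragment with the consensus is equivalent to comparing the fragment with the reverse-complemented consensus. This sketch ignores insertions and deletions, and `place_fragment` is a hypothetical name.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def place_fragment(fragment: str, consensus: str):
    """Best ungapped placement of `fragment` on either strand of `consensus`.

    Returns (strand, offset, mismatches); strand "-" means the reverse complement
    of the fragment matched. Returns None if the fragment is longer than the consensus.
    """
    fragment, consensus = fragment.upper(), consensus.upper()
    best = None
    for strand, query in (("+", fragment), ("-", reverse_complement(fragment))):
        for offset in range(len(consensus) - len(query) + 1):
            window = consensus[offset : offset + len(query)]
            mismatches = sum(a != b for a, b in zip(query, window))
            if best is None or mismatches < best[2]:
                best = (strand, offset, mismatches)
    return best
```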
One of the criteria used in base calling is peak height; peaks that are too low are rejected, along with everything within 20 bases of them.
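The peak-height criterion translates into a simple mask over the called bases. In this sketch the threshold `min_height` is a hypothetical parameter, and "within 20 bases" is taken to include positions exactly 20 bases away on either side.

```python
def peaks_to_keep(heights, min_height, radius=20):
    """Mask of called bases to keep.

    A peak lower than `min_height` rejects itself and every base within
    `radius` positions of it on either side.
    """
    keep = [True] * len(heights)
    for i, h in enumerate(heights):
        if h < min_height:
            lo, hi = max(0, i - radius), min(len(heights), i + radius + 1)
            for j in range(lo, hi):
                keep[j] = False
    return keep
```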
Figure 7.2 is a graph of the mean intensity of the data versus the position of this choking fragment. The first goal is to get an idea of how noisy the data are.
7.2.2 Which Base Positions Are Important?
Effect of Particular Bases
From this point on, I consider the data obtained by averaging the intensities of the fragment bases at each consensus position. I sorted all the consensus positions by average intensity and divided the resulting list into bins, each having at least 30 points and spanning at least 0.005 intensity units.

Sequence Context          Intensity     Sequence Context          Intensity
CTTCCCATAC T CTGAGTGCTA   0.079         CCCCCAGCAG G TGCATCTTTT   2.016
AGCCAACTAT T CTTTAATTAT   0.146         AGTCGCTGAG G CTGCACTACC   1.856
(illegible)               (illegible)   (illegible)               1.656
AACAGTCCTT T CTTTTTCCTA   0.281         TCTTTTGTGG G TGATTCAGTT   1.616
(illegible)               (illegible)   (illegible)               1.595
TCTTTTATAT T CTGTTAGTGA   0.289         AGCAATTAGC G GGGGTTTTTG   1.590
TATCATATAC T CAAAATGCTT   0.295         TCCTCCAGTG G AGGCCCTGTC   1.581
(illegible)               (illegible)   (illegible)               1.528
GTTTGAATAT G CTTGGCCCAG   0.309         TTCCTAG C CCCCACACAA      1.479
ATTACATAC A CTAGCAAGAT    0.327         AAATGTTATC C CCTTTCCTGG   1.479
(illegible)               (illegible)   (illegible)               (illegible)
TATGCCTTAC T CTGGTATAGG   0.341         ATTATTTACA G AATCTCAATA   1.447
AGTAATGTAT G CAGCTTGAAT   0.343         CCCTTCCAAAG G ACAGCCATGC  1.423
TCCAGAATAC G TGACTCACGG   (illegible)   (illegible)               1.423
TTTCCTTTAT T CTCATTACAC   0.348         GCATCTGCCG C CACCACTTCT   1.421
AGCCAACTAA G CCCTCCTGGT   0.352         AATGGCTGAG C ATGGACCATG   1.406
TTTATGACTT T CTTTGTCCTG   (illegible)   (illegible)               (illegible)
(illegible)               0.362         GTGACTCACG G TCTACAACAA   1.391
TTTCCTATAC A ATGTATTCAT   0.369         TTTCTGTAATG T CACCAAGGAG  1.388
TTTAGACTAT G TTCACTGTGA   0.375         CATTACAACT T TCAGGATTGT   1.385
GGTACCGACA G GTTCCTCTTC   0.376         TCCCCATGTT T TTGAGTAATA   1.382
CCATCCATAC T CATGTACCAA   0.377         TATGTATGTG T TGAATCACTA   1.366
CCAAGACAAT G CTGAAAAGGA   (illegible)   (illegible)               (illegible)
(illegible)               0.386         TTTTTAAAAA A AGAAAGAAGA   1.359
TTTTAAATAT T CACAGCTAAG   0.387         CTTCCCATTG T TGAACATTTC   1.358
CAAATATTAT T CACTTTCCAG   0.391         TGGTTGTATT T TGTTGTAGAC   1.358

Table 7.
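The averaging and binning steps can be sketched as below. Two readings are assumptions: "spanning at least 0.005 intensity units" is taken to mean that the largest and smallest values in a bin differ by at least 0.005, and a final remainder that fails either criterion is merged into the previous bin. The function names are hypothetical.

```python
from statistics import mean


def mean_by_position(base_intensities):
    """Average the intensities of all fragment bases aligned to each consensus position.

    `base_intensities` maps a consensus position to a list of fragment-base intensities.
    """
    return {pos: mean(vals) for pos, vals in base_intensities.items() if vals}


def bin_by_intensity(values, min_count=30, min_span=0.005):
    """Sort values and cut them into consecutive bins.

    A bin is closed once it holds at least `min_count` values AND its values span
    at least `min_span`; a short final remainder is merged into the previous bin.
    """
    bins, current = [], []
    for v in sorted(values):
        current.append(v)
        if len(current) >= min_count and current[-1] - current[0] >= min_span:
            bins.append(current)
            current = []
    if current:
        if bins:
            bins[-1].extend(current)
        else:
            bins.append(current)
    return bins
```

Requiring both a minimum count and a minimum span keeps each bin statistically usable while preventing a dense cluster of nearly identical intensities from being split into bins that are indistinguishable.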
Chapter 8 Conclusions
Oxygen atom at the 2' position. Is the same mechanism responsible for the distinction between dATP and ddATP, two molecules that differ by one oxygen atom at the 3' position? Kristensen [Kristensen et al. 1988] suggests that knowing the effect of the sequence on peak height could be useful in calling bases.
Appendix A
- Escherichia coli thioredoxin confers processivity on the DNA polymerase activity of the gene 5 protein of bacteriophage T7.
- Effect of manganese ions on the incorporation of dideoxynucleotides by bacteriophage T7 DNA polymerase and Escherichia coli DNA polymerase I.
- DNA sequence analysis with a modified bacteriophage T7 DNA polymerase (effect of pyrophosphorolysis and metal ions).