University of Cape Town
Plasmodium falciparum general transcription factors TFIIB and TLP
By
Steven Bing
Dissertation presented for the degree of Master of Science
In the Department of Molecular and Cell Biology University of Cape Town
February 2015
Supervisor
Dr Thomas Oelgeschläger
The financial assistance of the National Research Foundation (DAAD-NRF) towards this research is hereby acknowledged. Opinions expressed and conclusions arrived at, are those of the author and are not necessarily to be attributed to the DAAD-NRF.
No quotation from it or information derived from it is to be published without full acknowledgement of the source.
The thesis is to be used for private study or non-commercial research purposes only.
Published by the University of Cape Town (UCT) in terms of the non-exclusive license granted to UCT by the author.
I know the meaning of Plagiarism and declare that all of the work in the document, save for that which is properly acknowledged, is my own.
Signed: Steven Bing Signature:
Signature removed
I am deeply grateful to my family for the tireless love and support they have shown me, not only during my academic career, but in all things. Mom, Dad, and Graeme, thank you.
Thank you to my supervisor, Dr Thomas Oelgeschläger, for the guidance and advice, as well as the time and energy you have spent on my education.
To the members of my lab, Tom, Alma, Gertrud, and of course, Rob, I am so glad to have had the opportunity to work with you. You have all been such great lab mates and I appreciate the friendships we share. Thank you for the help, support, and buffers.
I have had the supreme pleasure of spending the last few years in the excellent company of the postgraduate and faculty members of the Department of Molecular and Cell Biology. I will continue our tradition of tea at 10:30, and think of you often.
A special thank you must of course be extended to the support staff of the Department. The research undertaken in the Department is truly a team effort, and I am grateful for the role you play in making sure research runs smoothly.
I would like to extend a general sense of gratitude to one and all who directly or indirectly, have lent their helping hand in this research project.
Finally, thank you Tyronne McCrindle. I truly do not know how I could have done this without you.
The following reagents were obtained through the MR4 as part of the BEI Resources Repository, NIAID, NIH: Plasmodium falciparum P. falciparum 3D7 library, MRA-296-299, deposited by D. Chakrabarti, and for that, we extend our gratitude.
Malaria is a leading cause of morbidity and mortality worldwide, and results in approximately 600,000 deaths annually. The life cycle of the parasite is complex, and has several distinct stages of development. The transitions between these stages are brought about through tightly controlled and highly synchronized changes in gene expression.
Plasmodium falciparum causes the most lethal form of malaria in humans. The parasite is particularly virulent as it is able to evade immune detection by the infected host. This virulence is directly related to the expression of variable antigens on the surface of infected red blood cells. The control of gene expression is known to be largely regulated via RNA Polymerase II (RNAPII) transcription initiation, but in P. falciparum the underlying mechanisms have not been determined. This is primarily because very little is known about both the key protein factors and DNA elements which guide the assembly of RNAPII components into the transcription initiation complex. Bioinformatics studies have shown that there is very little amino acid sequence conservation between human and Plasmodium RNAPII transcription initiation components. Together with the observation that the Plasmodium genome has an extremely high A+T content, this suggests that Plasmodium may have specific mechanisms to initiate transcription, which could be targeted by novel anti-malarials. The general transcription factor TFIIB and the TBP-like protein (TLP) are key proteins involved in the recognition of the core promoter, and the initiation of RNAPII transcription initiation complex assembly. TFIIB stabilises DNA binding of the primary promoter recognition factor, TATA-box binding protein (TBP), and is involved in promoter recognition through interactions with specific DNA sequences up- and downstream of the TBP DNA binding site. TBP-like protein is a member of the TBP protein family that has been implicated in life cycle stage-specific gene transcription initiation in various eukaryotic model organisms. This research study reports the first successful expression and purification of recombinant epitope-tagged Plasmodium falciparum TFIIB and TLP proteins.
Preliminary assays demonstrate DNA-binding activity for the recombinant Plasmodium TBP-like protein, and suggest DNA-binding activity in Plasmodium TFIIB protein, which has not been demonstrated before in eukaryotic TFIIB.
A, T, G, C - adenine, thymine, guanine, cytosine
Ad2ML - adenovirus 2 major late
ApiAP2 - Apicomplexan AP2
BLAST - Basic Local Alignment Search Tool
BLASTp - protein BLAST
bp - base pairs
BREu/d - upstream/downstream B-recognition element
BSA - bovine serum albumin
C-; N-terminus - carboxyl-; amino-terminus
CAP - catabolite activator protein
cDNA - DNA copy synthesized from mRNA
COBALT - Constraint-based Multiple Protein Alignment Tool
DNA - deoxyribonucleic acid
dNTP - deoxyribose nucleoside triphosphates
DPE - downstream promoter element
DTT - Dithiothreitol
E. coli - Escherichia coli
EDTA - Ethylenediaminetetraacetic acid
EMSA - electrophoretic mobility shift assay
EtBR - ethidium bromide
FastAP - FastAP Thermosensitive Alkaline Phosphatase
GBP - glycophorin-binding protein
GBP-130 - glycophorin binding protein 130
GSH - glutathione
GST - glutathione-S-transferase
GTF - general transcription factor
H. sapiens - Homo sapiens
HEPES - 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid
HRP - horseradish peroxidase
HTH - helix-turn-helix
Inr - initiator
IPTG - Isopropyl β-D-1-thiogalactopyranoside
iRBCs - infected red blood cells
KAHRP - knob-associated histidine rich protein
LA - lysogenic agar
LB - lysogenic broth
LCR - low-complexity region
mRNA - messenger RNA
MSA - Multiple sequence alignments
MTE - motif ten element
Mw - molecular weight
Ni-beads - PureProteome™ Nickel Magnetic Beads (Merck Millipore)
NP 40 - Nonidet™ P 40 Substitute (Sigma-Aldrich®)
OD - optical density
ORF - open reading frame
P. falciparum - Plasmodium falciparum
PCF - pooled column fraction
PCR - polymerase chain reaction
Pf - Plasmodium falciparum
PfEMP - P. falciparum erythrocyte membrane protein
PfTBP - Plasmodium falciparum TATA-box binding protein
PfTFIIA - Plasmodium falciparum transcription factor IIA
PfTFIIB - Plasmodium falciparum transcription factor IIB
PfTLP - Plasmodium falciparum TBP-like protein
PfTLPco - codon optimised PfTLP
PIC - pre-initiation complex
PVDF - polyvinylidene difluoride
Q-, DEAE-, CM-, SP-Sepharose® - quaternary ammonium, diethylaminoethanol, carboxymethyl, sulphopropyl
RBC - red blood cells
RCF - relative centrifugal force
RNA - ribonucleic acid
RNAPII - RNA polymerase II
rRNA - ribosomal RNA
S. cerevisiae - Saccharomyces cerevisiae
SDS - sodium dodecyl sulphate
SDS-PAGE - SDS polyacrylamide gel electrophoresis
SELEX - Systematic evolution of ligands by exponential enrichment
SOC - Super Optimal broth with Catabolite repression
TAE - tris base, acetic acid and EDTA
TBE - tris base, boric acid and EDTA
TBP - TATA-box binding protein
TE - Tris-EDTA
TEV - Tobacco Etch Virus
TF - transcription factor
TFII-A, B, D - transcription factor II-A, B, D
TAF - TBP-associated factor
TLP - TBP-like protein
TRF - TBP-related factor
tRNA - transfer RNA
TSS - transcription start site
UTR - untranslated region
Contents
Chapter 1 Introduction
Plasmodium falciparum
1.1.1 The impact of Plasmodium infection
1.1.2 Life cycle of Plasmodium falciparum
1.1.3 Pathology of Plasmodium falciparum
1.1.4 The Plasmodium genome
1.1.5 Gene expression in Plasmodium falciparum
1.1.6 The RNA polymerase II pre-initiation complex
1.1.1 The general transcription factors
The aims of this study
Chapter 2 Materials and Methods
Bioinformatics analysis of protein structure and function
2.1.1 Multiple sequence alignments
2.1.2 Domain identification in protein sequences
2.1.3 Secondary and tertiary structural prediction of proteins
Vectors and primers used for gene cloning and sequencing
2.2.1 Vectors used for gene cloning
2.2.2 Primers used for gene cloning and sequencing
Cloning and sequencing of Plasmodium TLP
2.3.1 Agarose gel electrophoresis
2.3.2 Polymerase chain reactions (PCRs)
2.3.3 Restriction enzyme digestion and ligation of PCR amplified inserts
2.3.4 Preparation of pET11d-vectors for cloning
2.3.5 Isolation of PfTLP expressing pET11d-vectors
2.3.6 Transformation of protein expression vectors into protein expressing cells
Expression of recombinant Plasmodium proteins in E. coli
2.4.1 Expression of recombinant PfTFIIB protein in E. coli BL21-CodonPlus® (DE3)-RIL cells
2.4.1 Expression of recombinant PfTLP protein in E. coli BL21-CodonPlus® (DE3)-RIL cells
2.4.2 Preparation of E. coli glycerol stocks
2.4.3 Determination of recombinant protein expression by pilot-scale affinity purification
2.4.4 Identification of 6His-PfTLP and 6His-PfTFIIB proteins by mass spectrometry based analysis
Accumulation and purification of recombinant Plasmodium TFIIB protein
2.5.1 Accumulation of recombinant 6His-PfTFIIB expressed protein E. coli cell mass
2.5.2 Bulk purification of 6His-PfTFIIB protein by Nickel-affinity purification
2.5.3 Further purification of 6His-PfTFIIB by Sepharose® resins
Accumulation and purification of recombinant Plasmodium TLP
2.6.1 Accumulation of recombinant GST-6His-PfTLP expressed protein E. coli cell mass
2.6.2 Preparation of soluble protein
2.6.3 Binding to Glutathione-Agarose (Sigma-Aldrich®):
2.6.4 Thrombin cleavage of GST-6His-PfTLP
2.6.5 Nickel affinity purification of cleaved 6His-PfTLP protein
2.6.6 Investigation of SP-Sepharose® purification of 6His-PfTLP
2.6.7 Depletion of GST from PfTLP protein preparation
Protocol for immunoblot analysis
Characterisation of PfTLP antibody
DNA binding assays.
2.9.1 Preparation of DNA probes
2.9.2 Polyacrylamide gel electrophoresis mobility shift assay
2.9.3 Agarose gel electrophoresis mobility shift assay
2.9.1 Immobilised template assay
Chapter 3 Results and Discussion
Bioinformatics analysis of putative PfTFIIB – structure and function
3.1.1 Amino acid sequence alignment analysis
3.1.2 Prediction of PfTFIIB secondary and tertiary structures
3.1.3 Summary of in silico analysis of PfTFIIB
Bioinformatics analysis of putative PfTLP – structure and function
3.2.1 Amino acid sequence alignment analysis
3.2.2 Prediction of PfTLP secondary and tertiary structures
3.2.3 Summary of in silico analysis of PfTLP
Expression and Purification of Plasmodium TFIIB
3.3.1 IPTG-induced overexpression of recombinant Plasmodium TFIIB leads to a reduction in cell growth in E. coli.
3.3.2 Expression of recombinant Plasmodium TFIIB can be achieved from non-induced cell cultures.
3.3.3 Expression of recombinant Plasmodium TFIIB can be achieved from the release of catabolite repression in non-induced cell cultures.
3.3.4 Bulk purification of recombinant 6His-PfTFIIB protein
3.3.5 Further purification of 6His-PfTFIIB
3.3.1 Summary of PfTFIIB expression and purification
Expression of recombinant PfTLP protein
3.4.1 Cloning of PfTLP
3.4.2 IPTG-induced expression of recombinant Plasmodium TLP leads to a reduction in cell growth in E. coli.
3.4.3 Expression of GST-6His-PfTLP
3.4.4 Expression of codon optimised GST-6His-PfTLP
3.4.5 Purification of PfTLP protein
3.4.6 Summary of PfTLP expression and purification
Examination of the DNA-binding potential of PfTLP and PfTFIIB
3.5.1 Electrophoretic mobility shift assays (EMSAs)
3.5.2 Immobilised template assays
3.5.3 Agarose EMSAs
3.5.4 Summary of the DNA binding potential of PfTLP and PfTFIIB
Chapter 4 Conclusions
In silico analysis of PfTFIIB and PfTLP
Expression and purification of recombinant PfTFIIB
Expression and purification of recombinant PfTLP
DNA-binding potential of PfTLP and PfTFIIB
Overall conclusions and future work
Appendix
Accession numbers of proteins used in bioinformatics analysis
Plasmodium sequences used for alignment purposes
Predicted secondary structures for PfTFIIB and PfTLP
Vector information for this study
Nickel affinity purification of soluble proteins expressed in E. coli BL21-codonPlus® (DE3)-RIL E. coli cells not carrying an expression vector
SDS-PAGE showing the contaminants present in 6His-PfTFIIB protein preparation
Rare codon analysis of PfTLP ORF
DNA alignment of the optimized regions (red) of the PfTLP gene
Analysis of additional codons in pET11d-GST-6His-PfTLPco(aa)
References
Chapter 1 Introduction
Plasmodium falciparum
1.1.1 The impact of Plasmodium infection
Malaria is a leading cause of morbidity and mortality across the globe (World Health Organisation 2014). Approximately 40% of the world population lives in malaria endemic areas. In 2013 malaria resulted in about 200 million infections, and approximately six hundred thousand deaths. The vast majority of infections (~80%) occur in Africa, with ~90% of the deaths occurring on this continent, the majority of them among children under the age of five years. Malaria in humans is caused by four species of Plasmodium. These are P. malariae, P. vivax, P. ovale and, as the cause of the cases with the highest rates of complications and death, Plasmodium falciparum.
1.1.2 Life cycle of Plasmodium falciparum
Plasmodium falciparum is an alveolate protist of the group Apicomplexa. Apicomplexans are single-celled eukaryotic organisms, many of which are parasitic to both invertebrates and vertebrates, leading to a host of diseases. As a group, the apicomplexans typically have multi-stage life cycles, with variable morphologies as they inhabit their multiple species-specific hosts (Neva & Brown 1994).
Plasmodium falciparum infects the female Anopheles gambiae mosquito and the human host. In taking up a blood meal from an infected human host, the mosquito takes up male and female gametocytes. Within the mosquito midgut, the male gametocyte-derived microgametes and female gametocyte-derived macrogametes of the parasite fuse to form a zygote (entering into the sporogenic cycle), which proceeds to form the ookinete. The ookinete passes through the gut wall, and encysts on the exterior wall of the gut to form the oocyst. The oocyst eventually ruptures, and releases sporozoites into the mosquito body cavity. The sporozoites then travel to the mosquito salivary glands.
The sporozoites are released into the bloodstream of the human host when the mosquito next feeds (the exo-erythrocytic cycle). The sporozoites then rapidly invade the hepatocytes of the liver. For P. falciparum, in the next 6-9 days the sporozoites develop into trophozoites, and then undergo multiple rounds of nuclear division, forming a schizont, a single cell containing multiple nuclei without cell segmentation. After several rounds of nuclear division, the cell segments to form thousands of merozoites. Note that in P. vivax, the sporozoites may instead differentiate into hypnozoites, which may lie dormant in the liver, and are the cause of recurrent bouts of malaria (Karunaweera et al. 1992). Several tens of thousands of merozoites may be released from a bursting hepatocyte, and then individually invade red blood cells (erythrocytes; RBCs), entering the erythrocytic cycle (Cowman & Crabb 2006).
Symptoms typically appear 7-14 days post sporozoite inoculation. In order to infect the erythrocytes, P. falciparum parasites recognise and bind to erythrocyte surface receptors, for example the sialoglycoproteins glycophorin A and B. The parasites bind to glycophorin surface receptors with glycophorin-binding proteins (GBPs) via tandem binding repeats present on the GBP surface (Perkins 1984; Cowman & Crabb 2006). Having bound to the RBC, the parasite invades the target cell and resides within the parasitophorous vacuole.
Figure 1: Overview of the erythrocytic-stage life cycle of Plasmodium falciparum
A. Life cycle of P. falciparum. Main stages of development are indicated.
B. Stages of the erythrocytic cycle of P. falciparum. Approximate time post-inoculation for the stages of development are indicated.
Illustrations are referenced from microscopy images of infected erythrocytes (White 2008; Alano 2007).
In the erythrocytes, the merozoites differentiate into (immature) trophozoites. This is often called the ‘ring’ stage due to the morphology seen on a blood smear (Neva & Brown 1994). This is a highly metabolically active stage, in which the trophozoite enlarges over the next 12 hours, ingesting the erythrocyte cytoplasm and proteolysing the host haemoglobin (Francis et al. 1997). The mature trophozoites then differentiate further into schizonts. The schizonts produce a large number of merozoites (32 nuclei per schizont). The released merozoites then invade additional erythrocytes (Neva & Brown 1994).
Note that during the early erythrocytic stage (approximately 6 hours post-erythrocyte infection) some merozoites may differentiate into male and female gametocytes. These gametocytes reside within the erythrocytes until taken up again by the mosquito vector, and so complete the life cycle (Alano 2007). The mechanisms behind the commitment to sexual reproduction are unknown (Winzeler 2009).
The erythrocytic stage of Plasmodium falciparum occurs over approximately 48 hours, from invasion to the bursting of the infected erythrocyte. Development into the ring stage occurs approximately 6 hours post-erythrocyte infection, the trophozoites mature at approximately 24 hours, and the schizont forms at approximately 40 hours (Neva & Brown 1994).
1.1.3 Pathology of Plasmodium falciparum
The symptoms of malaria caused by P. falciparum differ from those of malaria caused by other Plasmodium species. As with all malaria infections, classic symptoms include paroxysm, a cyclical onset of sudden coldness followed by fever and sweating (Greenberg & Lobel 1990). Typically, paroxysms occur synchronously with the merozoite release in the intraerythrocytic stages. In P. vivax/ovale infections, this occurs every two days, and in P. malariae infections every three days. P. falciparum infection may lack the classic paroxysms and instead present with a continuous fever (Neva & Brown 1994).
A primary determinant of the severity of falciparum malarial infection is the expression of proteins which are exported to the surface of infected erythrocytes (Miller et al. 2013). The proteins expressed include the knob-associated histidine rich protein (KAHRP) and P. falciparum erythrocyte membrane proteins 2 and 3 (PfEMP2, PfEMP3), which form a layer directly beneath the cell membrane, inducing a protruding ‘knob’ in the area by interacting with the host cellular structures and proteins (Lanzer et al. 1993). KAHRP proteins act as a platform for the presentation of the P. falciparum erythrocyte membrane protein 1 (PfEMP1). This presentation of knobs and associated structures is a primary determinant of cytoadherence and severe malaria (Lanzer et al. 1993). The knobs cause infected red blood cells (iRBCs) to cytoadhere both to one another and to other cells.
Cytoadherence of iRBCs to non-infected RBCs as well as endothelial and other intravascular cells of the body prevents the clearance of the iRBCs by the spleen, and facilitates the positioning of infected cells to regions of optimum parasite growth (Buffet et al. 2010).
These infected cells become sequestered in the micro-vascular structures of organs such as the heart, lungs, brain, and placenta, separate from peripheral blood circulation (Khoury et al. 2014). The consequence of this accumulation and sequestration of cells in organs is the onset of a severe disease state in the human host, such as cerebral malaria, the outcome of which may be fatal comas and seizures (Buffet et al. 2010).
PfEMP1 is encoded by the highly variable (var) genes. P. falciparum possesses approximately 60 var genes, although only one is expressed at any one time (Guizetti & Scherf 2013). P. falciparum infections are detected by the host immune system when the parasites are in free peripheral circulation, and through the detection of iRBCs by the antigenic PfEMP1 variants. The mechanism of switching the var genes allows the parasite to evade immune attack by the host, preventing long-term immunity, and so persist as a chronic infection (Bachmann et al. 2012).
The control of gene expression is required not only for the parasite to rapidly alter the antigens displayed on the erythrocyte; the complex life cycle of P. falciparum as a whole requires strictly controlled and co-ordinated changes in gene expression. Clearly, an understanding of the molecular mechanisms underlying the control of gene expression is a prerequisite for fully understanding Plasmodium biology and the molecular basis of Plasmodium pathogenesis.
1.1.4 The Plasmodium genome
The genome of P. falciparum was published in 2002 (Gardner et al. 2002). This has allowed remarkable insight into the parasite, and it is hoped that insights from this resource will drive the knowledge of this parasite forward, not only in facilitating investigations into potential treatments and vaccines, but also into understanding the physiology and pathology of the organism. The P. falciparum genome contains up to 27 million bases, is separated into 14 chromosomes and expresses approximately 5500 genes (Florens et al. 2002).
An interesting feature of the P. falciparum genome is its adenine and thymine nucleotide (A+T) richness, with A+T making up 80-90% of the genome on average (Gardner et al. 2002). Other Plasmodium species do not have this degree of A+T richness; it is the highest A+T content of any genome sequenced. Several theories have been posited to explain the A+T content, including that the high levels of A+T richness may allow for rapid evolution and recombination, and so allow for increased immune system evasion. Owing to this A+T richness, protein sequences have diverged so extensively that assigning identities to hypothetical proteins is difficult. Of the ~5300 proteins predicted to be encoded by the genome, 60% do not have sufficient similarity to proteins in other organisms to imply function/identity. This is likely due to the A+T richness of the genome and, consequently, the suite of codons available for use.
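As a minimal illustration of the A+T content measure referred to above, the following Python sketch computes the A+T fraction of a nucleotide sequence. The function name `at_fraction` and the example sequence are hypothetical and chosen for illustration only; they are not taken from the genome analyses cited.

```python
def at_fraction(seq: str) -> float:
    """Return the fraction of A and T among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return (counts["A"] + counts["T"]) / total


# Made-up 20-base sequence with 18 A/T bases, for illustration only.
example = "ATATTTAAGCATTATAAATA"
print(f"A+T fraction: {at_fraction(example):.2f}")  # prints "A+T fraction: 0.90"
```

Ambiguous characters such as N are ignored in this sketch, so the result is the A+T share of the bases that were actually called.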
Proteins encoded by the P. falciparum genome are heavily populated with low complexity regions (Zilversmit, Volkman, DePristo, et al. 2010). It has been found that the P. falciparum proteome is richer in low complexity regions (LCRs) than that of other organisms studied, with at least one LCR present in 87% of genes. In other organisms, on average 65-70% of genes have LCRs. These LCRs also fall into different classes, which may have effects on their function and rate of evolution. There are multiple theories about the function of LCRs. LCRs may be an adaptive mechanism to adjust translation rates (Frugier et al. 2010), may be a method of regulating messenger ribonucleic acid (mRNA) stability, may be involved in generating antigenic diversity through recombination, or may simply be the result of increased recombination rates due to the A+T rich genome, as discussed by Zilversmit et al. (2010).
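One simple way to make the notion of a low complexity region concrete is to measure the Shannon entropy of the amino acid composition in a sliding window. The Python sketch below does this under stated assumptions: the window size of 12 residues, the entropy threshold of 2.2 bits, the function names and the example sequence are all hypothetical, and this is not the method used in the studies cited above.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(protein: str, window: int = 12, max_entropy: float = 2.2):
    """Return start indices of windows whose entropy is at or below max_entropy.

    Both the window size and the threshold are illustrative assumptions.
    """
    return [
        i
        for i in range(len(protein) - window + 1)
        if shannon_entropy(protein[i:i + window]) <= max_entropy
    ]


# Made-up sequence: an asparagine-rich stretch flanked by diverse residues.
protein = "MKVLAGTRESPD" + "NNNNNNKNNNNN" + "QWERTYHGFDSA"
hits = low_complexity_windows(protein)
print(hits)
```

Windows that overlap the asparagine-rich stretch are reported, whereas the two flanking windows, each made of twelve distinct residues, are not.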
1.1.5 Gene expression in Plasmodium falciparum
There is evidence to support regulation of gene expression at various levels (reviewed in Horrocks et al. 2009), including gene activation, post-transcriptional regulation, and epigenetic regulation.
Evidence for gene expression regulation arising from control of gene transcription has been shown in whole-genome/proteome studies (Florens et al. 2002; Bozdech, Llinás, et al. 2003). These studies make use of both microarray and proteomic data, and support a model of expression of genes in a life cycle stage-dependent manner, in which transcription of genes is tuned temporally for the physiological processes that are likely to occur at that stage of development. There is some evidence that this ‘just in time’ model needs revision (Horrocks et al. 2009). For example, it has been found that changes in the environment, such as the stresses caused by temperature changes or glucose starvation, result in global expression changes. This suggests that the parasite is not locked into a specific pattern of gene expression (Horrocks et al. 2009). Further studies have also shown a lag between changes in mRNA transcript and protein levels, as well as a life cycle stage-specific rate of controlled mRNA decay, with increasing mRNA stability across the 48 hour erythrocytic cycle (Shock et al. 2007). Similarly, it has been found that the parasite also produces different sets of ribosomal RNAs (rRNAs) depending on life cycle stage, which is likely to be an additional level of translational control (Gardner et al. 2002). Altogether, there is now evidence of a deeper level of control of gene expression than the ‘just in time’ model would suggest, and this will remain a focus of future research.
The presence of LCRs may be an additional mechanism used by the organism to regulate translation rate (Frugier et al. 2010). Other organisms may adjust translation rates by altering the codons in a gene to make use of more or less abundant transfer RNAs (tRNAs), and so either speed up or slow translation, but P. falciparum only has one copy of each tRNA-coding gene, suggesting equal numbers of the tRNAs, and so the loss of this method of translational regulation. Instead, by over-utilising a set of tRNAs in an LCR, the translational rate of the region is likely to be reduced, allowing more time for the folding of complex secondary structures, such as the alignment of β-strands to form β-sheets (Frugier et al. 2010).
There has been a focus on regulation of transcription through epigenetics, due to the identification of multiple chromatin remodelling proteins (Coulson et al. 2004). Chromatin structure affects gene expression either by increasing the accessibility of genes, allowing their activation, or by silencing the region. A partial explanation for the stage-specific gene expression seen previously (Florens et al. 2002) is that genes of a shared process may be closely located on the chromosomes, and so share activation by chromatin remodelling. Multiple studies investigating histone modifications have been performed, as reviewed in Hoeijmakers et al. (2012), Guizetti & Scherf (2013) and Cui & Miao (2010).
Control of gene expression in P. falciparum at the transcriptional level is supported by the presence of features common to other eukaryotes, as found in early studies (Lanzer et al. 1992a; Lanzer et al. 1992b; Horrocks et al. 1998; Lanzer et al. 1993). Similar to model eukaryotes, transcription is regulated through a bipartite promoter system, consisting of (i) a core promoter region, at which RNA polymerase II (RNAPII) forms the pre-initiation complex (PIC) with the general transcription factors (GTFs) and drives transcription from the transcription start site (TSS), and (ii) additional cis-acting regulatory elements, which control the activity of the core promoter region and to which specific transcription factors (TFs) may bind. Analogous to other model organisms, the binding of these cis-acting regulatory elements by positive or negative regulators may either support or inhibit the formation of the PIC, and so is correlated with the activity of the promoter and the accumulation of mRNA (Horrocks et al. 2009). Transcription of P. falciparum genes typically results in monocistronic transcripts which contain intronic and exonic regions, conserved splicing sites (and machinery) as well as 5’ and 3’ untranslated regions (UTRs), which are themselves modified by capping and polyadenylation, respectively (Horrocks et al. 2009).
Because the formation of the PIC is the key step in transcriptional gene control, it became a target for finding the Plasmodium protein orthologues involved. The key components of the PIC machinery were identified by hidden Markov model profile searching in the P. falciparum genome (Bischoff & Vaquero 2010). This search uncovered orthologues of RNAPII and most, but not all, GTFs. These include the TATA binding protein (TBP) and some of the other components of the transcription factor IID (TFIID) system, the TBP-associated factors (TAFs). So far, only TAFs 1, 2, and 7 could be identified in P. falciparum. By contrast, metazoans possess at least 13 TAFs. The search also uncovered orthologues for the transcription factors (TFII-) A, B, E, F, and H, and the TBP homologue TBP-like protein (Bischoff & Vaquero 2010). Interestingly, the TAFs that have not been found (excepting TAF5) all have histone-fold domains. Similarly, the histone H1, which also contains this motif, is absent in P. falciparum. A search for other transcription-associated proteins identified only approximately 1.3% of the predicted proteins as such, about one third of the number found in free-living eukaryotes (Coulson et al. 2004). This has led to the suggestion that control of gene expression may arise predominantly from post-transcriptional and epigenetic regulatory mechanisms.
The A+T richness of the Plasmodium genome may be partly responsible for the paucity of transcription factors found by bioinformatics approaches, due to low sequence homology of Plasmodium genes to other protists. Of the proteins which have been putatively identified as being involved in transcriptional regulation, very few have been fully characterised, either in vitro or in vivo. The Plasmodium Apicomplexan AP2 (ApiAP2) factors are a group of transcription factors common to the apicomplexan group, which have been studied in vitro through EMSA and protein-binding microarray assays. Five of the ApiAP2 factors have also been characterised in vivo. These include factors involved in the function of sporozoites (Yuda et al. 2010), the expression of invasion related genes, var gene silencing, and two factors important for gametocytogenesis (De Silva et al. 2008; Painter et al. 2011; Sinha et al. 2014). Two other transcription factors have been partly characterised, the PfMyb1 protein which binds to several promoters and controls the expression of the associated genes (Gissot et al. 2005), and the PREB protein, which has been shown to bind and activate gene expression at the Pf1-cys-prx cis-element (Komaki-Yasuda et al. 2013).
Locating cis-acting recognition elements, as well as the sequences of the core promoter, has been particularly challenging, in part due to the A+T content of the genome (~90% in intergenic regions), but also due to technical challenges in the mapping of TSSs (Wakaguri et al. 2009; Horrocks et al. 2009; Brick et al. 2008). Several studies have attempted to identify these regions through bioinformatics (Young et al. 2008; Jurgelenaite et al. 2009; Brick et al. 2008); however, the sequences obtained sometimes conflict, and cannot be confirmed without biochemical investigation.
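To illustrate why a ~90% A+T background complicates motif searches, the Python sketch below estimates how often an A+T-rich motif would match by chance alone. It uses the metazoan TATA-box consensus TATAWAAR as the example motif (W = A or T, R = A or G in IUPAC notation). The assumptions are: bases are drawn independently at each position, A and T are equally frequent (45% each), and G and C are equally frequent (5% each). The function name and the frequency tables are hypothetical, and the consensus may not apply to Plasmodium promoters.

```python
# IUPAC nucleotide codes used by the consensus: W = A or T, R = A or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "R": "AG"}


def chance_match_probability(consensus: str, base_freq: dict) -> float:
    """Probability that a random position matches the consensus,
    assuming independent bases with the given frequencies."""
    p = 1.0
    for symbol in consensus:
        p *= sum(base_freq[b] for b in IUPAC[symbol])
    return p


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
# Assumed split of a 90% A+T background; the real base ratios may differ.
at_rich = {"A": 0.45, "C": 0.05, "G": 0.05, "T": 0.45}

p_uniform = chance_match_probability("TATAWAAR", uniform)
p_at_rich = chance_match_probability("TATAWAAR", at_rich)
print(f"uniform: {p_uniform:.2e}, A+T-rich: {p_at_rich:.2e}, "
      f"ratio: {p_at_rich / p_uniform:.0f}")
```

Under these assumptions a chance match is roughly 61 times more likely in the A+T-rich background than in a uniform one (about 3.7 x 10^-3 versus 6.1 x 10^-5 per position), so sequence matches alone are weak evidence for a functional element.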
Although several of the general transcription factors and RNAPII have been identified in silico, there is still very little information available regarding the function of this crucial transcription apparatus. Without this information, neither transcriptional regulation in Plasmodium nor the characteristics of its unique genome can be fully understood.
1.1.6 The RNA polymerase II pre-initiation complex
The formation of the PIC and its components, as described below, are thoroughly reviewed in Thomas & Chiang (2006). The PIC forms at the core promoter region surrounding the TSS (designated as +1 on the sense strand; Figure 2B).
Core promoter elements have been identified in other eukaryotes, such as humans and yeast (Decker & Hinton 2013). They include the TATA box, a conserved region of sequence TATAWAAR (in metazoans), which is occasionally present in the core promoter, 25-30 nucleotides upstream of the TSS, and to which TBP binds. In metazoans and yeast, the PIC is nucleated by the binding of TBP, as part of the TFIID complex, to the core promoter region. The binding of the TFIID protein complex is an induced-fit mechanism, where the binding of DNA by TBP and the insertion of the conserved phenylalanine residues lead to bending of the DNA, causing torsional stress on the DNA as well as a conformational shift in the TFIID complex itself (Cianfrocco & Nogales 2013). The PIC then forms in a step-wise fashion. The binding of TFIIA stabilises TBP-DNA binding, and competes with proteins which inhibit TBP DNA-binding and/or function (Kokubo et al. 1998; Grünberg & Hahn 2013). TFIIB binds next, and enhances TBP binding to the TATA box (Zhao & Herr 2002; Imbalzano et al. 1994). TFIIB then assists in the recruitment of TFIIF and RNAPII (Chen & Hampsey 2004; Kostrewa et al. 2009; Bushnell et al. 2004; Tubon et al. 2004), and finally TFIIE and TFIIH. The binding of additional GTFs causes the DNA to melt, the promoter to open, and transcription to begin (Figure 2A). The TATA box is not ubiquitous in the core promoter, and other core promoter elements may be present, either in tandem with TATA or without it (Thomas & Chiang 2006). The combination of core promoter elements differs between genes, and none has been found to be indispensable, although some combinations do occur more frequently than others.
Figure 2: Simplified model of the core promoter and RNA polymerase II pre-initiation complex (PIC) assembly.
A. The PIC is initiated by (i) the binding of TBP as part of the TFIID complex (not shown) to the TATA-box (pink region) which causes a bend in the DNA. (ii) TFIIA and TFIIB bind to TBP, stabilise the association with DNA, and recruit (iii) RNAPII and the rest of the GTFs. The DNA around the transcription start site is melted and transcription is initiated.
B. A simplified illustration of gene regulatory regions present in eukaryotes. The core promoter contains the transcription start site (TSS) and core promoter elements, such as the TATA box (recognised by TBP) and B-recognition elements up- (u) and downstream (d) of TATA, recognised by TFIIB. PIC assembly at the core promoter is regulated by proximal and distal (enhancer) promoter regions that contain binding sites for sequence-specific DNA-binding transcription activator or repressor proteins.
The core promoter may contain an initiator (Inr) sequence, which surrounds the TSS and is bound by TAFs 1/2, and a downstream promoter element (DPE) 28-34 nucleotides downstream of the TSS (in Drosophila), bound by TAFs 6/9 (Thomas & Chiang 2006). There may also be a motif ten element (MTE, at +18 to +29) which functions in tandem with the Inr region, and may substitute for a missing TATA box and/or DPE (in conjunction with the Inr region).
The downstream core element (DCE), present as three sub-elements (at +6 to +11, +16 to +21, and +30 to +34), is mutually exclusive with the DPE (located at +23 to +34); the DCE and DPE interact with TAFs 1 and 6 respectively (Thomas & Chiang 2006). TFIIB stabilises the TBP-DNA interaction through the recognition of elements in the core promoter, the B-recognition elements up- and downstream of the TATA box (BREu and BREd, -38 to -32 and -23 to -17 respectively, in yeast) (Lagrange et al. 1998; Deng & Roberts 2005). The presence or absence of these BREs also selects for the mutually exclusive recruitment of either TFIIA or NC2 respectively (Deng et al. 2009).
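The difficulty of recognising a TATA box by sequence alone in an A+T-rich genome can be illustrated with a minimal Python sketch. It translates the metazoan consensus TATAWAAR quoted above into a regular expression and reports every position at which it matches. The two test sequences are invented for illustration and are not taken from any genome; the function names are likewise hypothetical.

```python
import re

# IUPAC codes needed for the consensus TATAWAAR (W = A or T, R = A or G)
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "R": "[AG]"}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in consensus)


def find_matches(sequence, consensus="TATAWAAR"):
    """Return the 0-based start positions of all consensus matches."""
    # The lookahead lets finditer report overlapping matches as well
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Hypothetical sequences, invented for this example
gc_balanced = "GCGCTATAAAAGGCGCGC"
at_rich = "TATATAAATATATAAATATAAAAG"

print("GC-balanced:", find_matches(gc_balanced))
print("A+T-rich:   ", find_matches(at_rich))
```

In the invented A+T-rich sequence the consensus matches several times within 24 nucleotides, which is the kind of cryptic TATA-like element that makes purely computational promoter prediction unreliable.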
1.1.1 The general transcription factors TFIID
TFIID is a multi-protein complex, composed of TBP and the TAFs. TBP is the predominant DNA-binding component of TFIID and recognises the TATA-box (in core promoters with this element) while the TAFs recognise other core promoter elements (Thomas & Chiang 2006; Juven-Gershon et al. 2008).
TBP (and its homologues/orthologues) consists of a very highly-conserved carboxyl- (C-) terminal domain, with a large and variable amino- (N-) terminal domain, to which no structure has been ascribed (Akhtar & Veenstra 2011). The C-terminal domain, or core domain, is responsible for DNA binding. The core is saddle-shaped, with a concave and positively charged surface which makes contact with the DNA, while the convex surface makes contact with other proteins/factors (Figure 3). The functional role of the variable N-terminal domain is not well characterised. Early in vitro studies suggested that the domain is required for TATA-dependent transcription (Lescure et al. 1994), although others have shown no loss of cell growth or transcription in mice with homozygous deletions of the N-terminus (Schmidt et al. 2003). In mice, the loss of the N-terminus has only been shown to lead to rejection of implanted embryos in females, while males were seemingly unaffected (Hobbs et al. 2002). The expansion of the glutamine repeats in the region leads to spino-cerebellar ataxia in humans (van Roon-Mom et al. 2005), although the mechanism is not understood.
Figure 3: Tertiary structure of human TATA-box binding protein (core region).
Concave region formed by β-sheets (orange) which interact with DNA. Phenylalanine residues highlighted in red. Convex region formed by α-helices in purple. Structure model shown was prepared using PyMol based on the PDB entry 1CDW (Nikolov et al. 1996).
P. falciparum TBP (PfTBP) was cloned before the bioinformatic identification of the general transcription factors (McAndrew et al. 1993), but has thus far only been partially characterised, in a single study (Ruvalcaba-Salazar et al. 2005). In this study, the core region of PfTBP was recombinantly expressed and N-terminally tagged with glutathione-S-transferase (GST). The DNA-binding sites of the protein were examined by electrophoretic mobility shift assay (EMSA) and DNase I footprinting assay on the kahrp and gbp-130 genes. The study succeeded in mapping PfTBP binding to TATA-like sequences at positions -81 and -186 relative to the TSS of the kahrp and gbp-130 genes respectively. This conservation of the TATA-locality of TBP binding is remarkable, given the A+T-rich genome and the prevalence of cryptic TATA-like elements. Interestingly, Brick et al. (2008) suggest that physicochemical signals in the region of the TSS, identified computationally, may be used to identify the core-promoter elements.
TBP-related factors.
Although TBP is the most studied of the proteins in the family, there exist several TBP- related factors (TRFs), reviewed extensively by Akhtar & Veenstra (2011).
The discovery of TRFs at different times and in different organisms, with different numbers of TRFs, combined with difficulty in establishing the homology of these TRFs, has led to a fair amount of confusion in their naming. The first TRF was discovered in Drosophila, and was named TRF1. TRF1 is not found in vertebrates. Metazoans may have another TRF, named TRF2 in both Drosophila and humans, which may alternatively be named the TBP-related protein (TRP), TBP-like factor (TLF), TBP-like protein (TLP), or TBP-like protein 1 (TBPL1). TLP is the name by which the orthologue will be referred to in this thesis. Another TRF is present in vertebrates, named alternatively TBP2, TRF3, or TBPL2 (Zhang et al. 2001).
The functional roles of the TRFs are less well characterised than those of TBP. It has been suggested that the role of the TRFs is to provide an additional method of gene transcriptional regulation. In this model, the TBP of TFIID is exchanged for a TBP-like protein, allowing expression from tissue-specific promoters (Hochheimer & Tjian 2003).
TBP2 is the TRF most similar to TBP, with ~90% similarity between the human paralogues over the core domain. TBP2 is known to be able to associate with the TATA box, and may replace TBP at certain genes (Bartfai et al. 2004; Jallow et al. 2004). For example, TBP2 of Xenopus laevis has been shown to be expressed predominantly in the oocytes, as an alternative to TBP (Akhtar & Veenstra 2009). In mouse gametogenesis, TBP2 appears to complement TBP, seemingly activating a different set of genes (Gazdag et al. 2007). TRF1 from Drosophila appears to function predominantly in RNA polymerase III transcription (Akhtar & Veenstra 2011).
TLP is a more distant paralogue of TBP and TBP2 (Akhtar & Veenstra 2011), with lower sequence similarity to TBP than TBP2 has (over the core domain). It has not been demonstrated to have affinity for the TATA box (Moore et al. 1999); however, TLP does appear to be able to bind both TFIIA and TFIIB, as well as the TFIIA-like factor (ALF). TLP has been shown to act both as a transcriptional inhibitor and as an activator (Moore et al. 1999). Interestingly, in a study of the TATA-less promoter of deoxynucleotidyl transferase, insertion of a TATA element removed the ability of TLP to activate transcription (Ohbayashi et al. 2003). A dependence on TLP by some embryogenesis genes, complementary to TBP, has been seen in TLP knockouts of X. laevis (Veenstra et al. 2000). Similar results have been seen for embryogenesis in C. elegans (Dantonel et al. 2000). Knockout studies in mice have shown that TLP does not appear to have a role in embryogenesis, but is important for spermatogenesis (Martianov et al. 2001). In these knockout mice, females had no phenotype and were fertile, while males were unable to complete sperm production, due to inhibition of the elongation of the spermatids.
In the identification of GTFs in P. falciparum, a candidate gene named TFIID-like was found (Bischoff & Vaquero 2010) (accession number PF3D7_1428800.2 on PlasmoDB, www.plasmodb.org). In Plasmodium there appears to be only a single TBP-like protein (hereafter referred to as Plasmodium falciparum TLP or PfTLP). This protein has not been characterised thus far, but may offer an intriguing avenue of exploration for the characterisation of transcription initiation. Notably, and similarly to other TLPs described thus far, proteomic and transcriptomic data suggest upregulation of the PfTLP gene/protein during the sexual development stages, such as gametocytogenesis (Aurrecoechea et al. 2009). This is in contrast to PfTBP expression, which appears to be either downregulated or expressed at lower levels than PfTLP during these stages. Should PfTLP be involved in the transcription of specific genes, this may provide interesting and possibly useful insights into transcription initiation, and potentially future drug targets.
TFIIB
TFIIB has been implicated in many aspects of transcription, both in initiation and in the elongation of the RNA strand (Kostrewa et al. 2009). As TBP induces a dramatic bend in the DNA, TFIIB binds to TBP and strengthens the association of the complex with DNA through its interactions with the BREu and BREd regions up- and downstream of the TATA box. TFIIB also recruits RNAPII, and assists in the identification of the transcription start site (Imbalzano et al. 1994; Zhao & Herr 2002; Nikolov et al. 1995; Lagrange et al. 1998; Chen & Hahn 2004; Chen & Hahn 2003; Kostrewa et al. 2009).
Structurally, TFIIB is composed of an N-terminal region which binds and directs RNAPII to the TSS (Figure 4; Tubon et al. 2004; Kostrewa et al. 2009). The region in the N-terminus which binds to RNAPII has been termed the B-ribbon, and consists of a zinc-finger domain which is conserved in eukaryotic and archaeal TFIIB proteins (Qureshi & Jackson 1998; Gietl et al. 2014). Adjacent to the B-ribbon is the B-reader, which contains a helix-loop structure thought to assist in the localisation of RNAPII to the TSS through the identification of Inr sequence motifs.
The N-terminal region also regulates transcription through a highly conserved region consisting of multiple charged residues, called the charged cluster domain. TFIIB may exist in either a closed state, where intramolecular interactions between the N and C terminals appear to prevent transcription initiation (but not the formation of the PIC), or the open state where initiation is possible (Elsby & Roberts 2004; Glossop et al. 2004; Zhang et al. 2000).
The C-terminal or core domain of the TFIIB molecule is composed of two cyclin-like repeats, each made up of 5 α-helices, and connected by a linker region which contains an additional α-helix (Figure 4). This linker region has been shown to form a cleft, into which TBP binds, while the α-helix region interacts with and stimulates the activity of RNAPII (Liu et al. 2010; Tsai & Sigler 2000; Nikolov et al. 1995).
Figure 4: Tertiary structure of TFIIB.
The structure of the N-terminus of ScTFIIB (left; Kostrewa et al. 2009) is composed of a zinc-finger domain, a B-ribbon made up of anti-parallel β-sheets (orange), the B-reader composed of an α-helix (purple) and a loop region, and the B-linker composed of a β-strand and α-helix. The C-terminal cyclin-like repeats of human TFIIB (right; Nikolov et al. 1995) are each composed of four amphipathic α-helices surrounding a core α-helix, separated by a linker region.
Structure models shown were prepared using PyMol based on PDB entries 3K1F.M (Kostrewa et al. 2009) and 1VOL (Nikolov et al. 1995).
TFIIB is theorised to be involved in the transition between initiation and elongation of transcription (Kostrewa et al. 2009). TFIIB assists TBP in the melting of the DNA, opening a bubble in the DNA and exposing this DNA to RNAPII. Whilst bound to RNAPII after the formation of the DNA bubble, the B-linker is then located in a position to stabilise the transcription bubble, while the B-reader is able to locate the Inr, and localise RNAPII to the TSS. The RNA-strand grows as a DNA-RNA hybrid, and after growth beyond seven nucleotides, TFIIB is released and the elongation complex is formed (promoter escape).
The identification of GTFs in P. falciparum led to identification of a TFIIB orthologue (Bischoff & Vaquero 2010). Given the importance of TFIIB in the formation of the PIC, through interactions with various GTFs and RNAPII, the characterisation of the molecule is predicted to be fundamentally important for the characterisation of transcriptional initiation in the parasite.
The aims of this study
Currently, a major focus of the work in the research group is to identify the DNA sequences which direct the formation of the pre-initiation complex, and so direct transcription in Plasmodium falciparum. Studies which have utilised bioinformatics approaches to identify TSSs to map and predict promoter regions and regulatory sequences have yielded largely inconclusive results due to the A+T-richness of the P. falciparum genome.
In the formation of the PIC, the P. falciparum GTFs must localise to the gene promoter regions. It may therefore be assumed that characterising the binding specificity of the GTFs, specifically that of PfTBP/PfTLP in conjunction with PfTFIIB and PfTFIIA, will provide an important starting point for identifying P. falciparum promoter regions, and subsequently the mechanisms behind P. falciparum transcriptional regulation.
Currently, the expression and characterisation of PfTBP and PfTFIIA are being investigated by another student in the research group (Robert Milton). TFIIB is known to be important for promoter recognition, discussed above, through the identification of the BREs. The possibility that PfTFIIB performs a similar role in P. falciparum promoters must be investigated. Additionally, the potential of PfTLP to regulate gene expression as an alternative to PfTBP at certain genes, and hence, recognise alternative promoter sequences to PfTBP is a possibility that must be explored. Canonically, TBP is known to bind to the TATA-box with high affinity. It is the focus of this research group to answer the question of how the P. falciparum TBP orthologue is able to recognise a TATA-box, in a veritable sea of A+T.
The specific aims of this research project were to investigate whether epitope-tagged PfTFIIB and PfTLP can be expressed in a soluble form in E. coli BL21-CodonPlus RIL cells and, if so, to optimise expression conditions for maximal yield; to establish the affinity purification of PfTFIIB and PfTLP; and finally, to begin an investigation into the potential DNA-binding properties of PfTFIIB and PfTLP.
The aims of this research project form part of current research activities in the lab that aim to make use of recombinant P. falciparum GTFs to determine the sites of PIC assembly, both in previously characterised putative P. falciparum promoters, and through a systematic evolution of ligands by exponential enrichment (SELEX) strategy (Ogawa & Biggin 2012).
Chapter 2
Materials and Methods
Bioinformatics analysis of protein structure and function
2.1.1 Multiple sequence alignments
Gene sequences used for multiple sequence alignments:
The TBP/TRF and TFIIB protein sequences used in the multiple sequence alignments are listed in Table 6 in the Appendix 5.1, and were retrieved from the National Centre for Biotechnology Information (NCBI) online database (www.ncbi.nlm.nih.gov/). Plasmodium protein sequences were retrieved from the PlasmoDB database (Aurrecoechea et al. 2009) with the exception of P. reichenowi, retrieved from NCBI.
Multiple sequence alignments (MSA) were performed using Clustal Omega, hosted on the Analysis Tool Web Services from the EMBL-EBI (Larkin et al. 2007; McWilliam et al. 2013; Sievers et al. 2011), as well as the Constraint-based Multiple Protein Alignment Tool (COBALT, www.ncbi.nlm.nih.gov/blast; Papadopoulos & Agarwala 2007), both on the default settings.
Phylogenetic analysis of proteins was performed in the MEGA6 program (Tamura et al. 2013), using the Minimum Evolution method on the default settings: Interior-branch test (bootstrap) with 500 iterations; Substitution model: Poisson model; Uniform rate among sites. Tree Inference Options: Close-Neighbour-Interchange; initial tree by Neighbour-joining.
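As an illustration of the Poisson substitution model named above, the following minimal Python sketch computes the Poisson-corrected distance d = -ln(1 - p) between two aligned protein sequences, where p is the proportion of differing residues. It is not the MEGA6 implementation: the function names are hypothetical, the short sequences are invented, and skipping gapped columns (pairwise deletion) is an assumption rather than a documented setting of this study.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing residues over ungapped alignment columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance, d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("distance undefined when every residue differs")
    return -math.log(1.0 - p)


# Invented aligned fragments, for illustration only
print(poisson_distance("ACDEFGHIKL", "ACDEFGHIKV"))
```

The correction matters most for divergent sequences: as p grows, repeated substitutions at the same site become more likely, and d increasingly exceeds p.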
2.1.2 Domain identification in protein sequences
BLASTp as well as Domain Enhanced Lookup Time Accelerated (DELTA)-BLASTp (Altschul et al. 1997; States & Gish 1994) were performed on the full-length PfTLP and PfTFIIB translated protein sequences (Appendix 5.1) with the default parameters. Conserved domain information was determined from the results of the Conserved Domain Database search (Marchler-Bauer et al. 2009; Marchler-Bauer & Bryant 2004; Marchler-Bauer et al. 2011).
2.1.3 Secondary and tertiary structural prediction of proteins
Phyre (version 2.0; Phyre2) was used to predict the secondary and tertiary structures of PfTLP and PfTFIIB, using the Intensive modelling mode (Kelley & Sternberg 2009).
Visualisation of tertiary protein structures
Visualisation and analysis of protein structures were performed in the PyMol v1.6.00 (Schrödinger, LLC 2010) and Chimera v1.8.1 (Pettersen et al. 2004) programs. Solved protein structures used for comparison purposes include: HsTFIIB C-terminus: PDB entry 1VOL (Nikolov et al. 1995); ScTFIIB N-terminus: PDB entry 3K1F.M (Kostrewa et al. 2009); HsTBP core: PDB entry 1CDW (Nikolov et al. 1996).
Vectors and primers used for gene cloning and sequencing
2.2.1 Vectors used for gene cloning
The plasmid vectors for the cloning and expression of the Plasmodium proteins PfTLP and PfTFIIB were derived from the commercially available pET11d vectors (Novagen). The pET11d expression vectors contain the β-lactamase gene which confers ampicillin resistance. The vectors used in this study are summarised in Table 9 in the Appendix 5.4, as are the vector diagrams. The NdeI and BamHI restriction enzyme sites of the multiple cloning sites in the vectors were used for cloning in this study. The pET11d-6His-HsTBP vector was obtained from Dr T. Oelgeschläger, and this served as the vector frame for the cloning of pET11d-6His-PfTLP. The pET11d-GST-6His vector was constructed by Chenjerai Muchapirei, and served for the cloning of PfTLPco.
2.2.2 Primers used for gene cloning and sequencing
The primers used for cloning and sequencing of the PfTLP open reading frame (ORF) are listed in Table 1. Primers for cloning were designed to overlap the N-terminus and C-terminus of the target gene. Restriction enzyme (endonuclease) sites were incorporated in the primers for insertion of the PCR product into the digested vector frame. Primers for cloning were designed by hand in SerialCloner (Version 2.6.1, SerialBasics); melting temperatures and self-complementarity of the primers were checked using OligoCalc (www.basic.northwestern.edu/biotools/oligocalc.html).
Table 1: List of primers for cloning and sequencing
Endonuclease restriction sites are shown underlined, start codons are bold, and the stop codons are in italics.
Number  Primer Name     5' to 3' sequence                                 Restriction Site  Gene
1       PfTLP-FWD       GGCAGCCATATGTATCCCCCTTGTAAAAAGAAAAAAC             NdeI              PfTLP
2       PfTLP-REV       GCAGCCAGATCTCTATTAATGTTGCGATTTACTTTTAATTAAATATGG  BglII             PfTLP
3       PfTLP-IntFWD    GTCCCCGTCACTTTAAGTAC                              N/A               PfTLP
4       PfTLP-IntREV    ATGTCCACTTTGCTTTTATCCT                            N/A               PfTLP
5       PfTLPco-FWD     TTATAAGGCATATGTATCCGCCGTGTAAA                     NdeI              PfTLPco
6       PfTLPco-REV     TATGGATCCTTATCAGTGTTGGCTTTTAC                     BamHI             PfTLPco
7       PfTLPco-IntFWD  GTGCCGGTTACCCTGAGTACG                             N/A               PfTLPco
8       PfTLPco-IntREV  CTTTGTTGTCGTTAGATTTGTTTTCATCGTTC                                    PfTLPco
9       T7-FWD          TAATACGACTCACTATAGG                               N/A
10      pET11d-REV      GTCAGGCACCGTGTATGAAA                              N/A
11      GST-FWD         ACAAATTGATAAGTACTTGAAATCCA
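The design features of the cloning primers in Table 1 can be checked programmatically. The sketch below is a minimal illustration in Python, not the procedure used in this study (the primers were designed by hand in SerialCloner): it confirms that each cloning primer contains the recognition sequence of the enzyme listed for it, and that the reverse complement of PfTLP-REV carries two consecutive stop codons. The function names are hypothetical; the recognition sequences for NdeI, BglII and BamHI are the standard ones.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Standard recognition sequences; all three are palindromic
SITES = {"NdeI": "CATATG", "BglII": "AGATCT", "BamHI": "GGATCC"}

# Cloning primers from Table 1, with the line breaks removed
PRIMERS = {
    "PfTLP-FWD": ("GGCAGCCATATGTATCCCCCTTGTAAAAAGAAAAAAC", "NdeI"),
    "PfTLP-REV": ("GCAGCCAGATCTCTATTAATGTTGCGATTTACTTTTAATTAAATATGG", "BglII"),
    "PfTLPco-FWD": ("TTATAAGGCATATGTATCCGCCGTGTAAA", "NdeI"),
    "PfTLPco-REV": ("TATGGATCCTTATCAGTGTTGGCTTTTAC", "BamHI"),
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def site_position(primer_name):
    """0-based position of the listed restriction site, or -1 if absent."""
    sequence, enzyme = PRIMERS[primer_name]
    return sequence.find(SITES[enzyme])


for name in PRIMERS:
    print(name, PRIMERS[name][1], "site at index", site_position(name))

# The reverse primer anneals to the sense strand, so the stop codons of the
# ORF appear in its reverse complement
print("TAATAG" in reverse_complement(PRIMERS["PfTLP-REV"][0]))
```

Because the recognition sequences are palindromic, searching one strand of each primer is sufficient.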
Cloning and sequencing of Plasmodium TLP
2.3.1 Agarose gel electrophoresis
Electrophoresis of DNA was performed using 0.8-2% (w/v) agarose containing 0.02µg/ml ethidium bromide (EtBr). Gels were prepared using either 0.5×TBE (40mM Tris-Cl pH 8.3, 45mM boric acid, 1mM ethylenediaminetetraacetic acid [EDTA]) or 1×TAE (40mM Tris-Cl pH 8.3, 0.11% (v/v) glacial acetic acid, 1mM EDTA) buffer. Electrophoresis was conducted at 80-100 volts until the DNA was sufficiently separated. DNA bands were visualised using a long-wavelength (365 nm) UV trans-illuminator.
2.3.2 Polymerase chain reactions (PCRs)
PCR reactions (Sambrook et al. 1989) were performed using KAPA Taq PCR Kits making use of Buffer A (Kapa Biosystems), following the manufacturer’s instructions, in 50µl reaction volumes. dNTPs for the reactions were supplied by Thermo Scientific.
Amplification of the PfTLP and PfTLPco genes, with their respective primers, made use of the same reaction cycling conditions (Table 2). The PCR reaction profile was adjusted to include a longer elongation step at a lower temperature due to the A+T richness of the gene sequences (Su et al. 1996).
PCR amplification of the open reading frame for Plasmodium TLP was performed using 2µl of each of the cDNA libraries MRA-296 and 297 (MR4, ATCC® Manassas Virginia; contributed by D. Chakrabarti).
A synthesised PfTLP gene, optimised for expression in E. coli, was ordered from GenScript (www.genscript.com/), termed ‘PfTLP codon optimised’ or the PfTLPco gene, and was supplied in a pUC57 vector as a lyophilised powder. The vector was reconstituted in reverse osmosis purified water (RoH2O; Milli-Q purified, Millipore) to a final concentration of 200ng/µl. 5.5fmol of the vector was used per PCR reaction.
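The relationship between a molar amount of template and the mass (and hence volume of stock) to pipette can be sketched as below. This is a minimal illustration under stated assumptions: an average mass of about 650 g/mol per base pair of double-stranded DNA is a common approximation, and the plasmid length used in the example (3500 bp) is a hypothetical placeholder, since the length of the pUC57-PfTLPco construct is not given here. The function names are likewise hypothetical.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair of dsDNA; approximate, assumed value


def fmol_to_ng(fmol, length_bp):
    """Mass in ng of `fmol` femtomoles of double-stranded DNA."""
    # fmol x (g/mol) gives femtograms; 1 ng = 1e6 fg
    return fmol * length_bp * AVG_BP_MASS / 1e6


def ng_to_fmol(ng, length_bp):
    """Femtomoles of double-stranded DNA in `ng` nanograms."""
    return ng * 1e6 / (length_bp * AVG_BP_MASS)


def stock_volume_ul(fmol, length_bp, conc_ng_per_ul):
    """Volume of stock (µl) that contains the requested molar amount."""
    return fmol_to_ng(fmol, length_bp) / conc_ng_per_ul


# Hypothetical plasmid length; replace with the real construct length
plasmid_bp = 3500
mass = fmol_to_ng(5.5, plasmid_bp)
print(f"5.5 fmol of a {plasmid_bp} bp plasmid is about {mass:.2f} ng")
print(f"volume of a 200 ng/µl stock: {stock_volume_ul(5.5, plasmid_bp, 200):.4f} µl")
```

Under these assumptions the required volume of a 200ng/µl stock is far below what can be pipetted accurately, so in practice a working dilution of the stock would be needed.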
All PCR products were analysed by agarose gel electrophoresis, and then purified using either the MinElute™ Reaction Cleanup Kit (Qiagen) or QIAquick PCR Purification Kit (Qiagen), following the manufacturer’s instructions. Purified DNA concentrations were determined with a NanoDrop 2000 spectrophotometer (‘Nanodrop’, Thermo Scientific).
2.3.3 Restriction enzyme digestion and ligation of PCR amplified inserts
The purified PCR product of the original PfTLP ORF was simultaneously digested with NdeI and BglII restriction enzymes in 2×Tango buffer (Thermo Scientific) to cleave the N- and C-terminal ends respectively, leaving ‘sticky’ ends compatible with the expression vector. The digested PCR product was then isolated after agarose gel electrophoresis, and purified with the QIAEX II Gel Extraction Kit (Qiagen) following the manufacturer’s protocol. The concentration was determined by Nanodrop and confirmed by agarose gel electrophoresis with a calibrated molecular weight marker.
Table 2: PCR cycling conditions for PfTLP gene amplification
Duration  Temperature (°C)  Number of Cycles
10 min    95                1
30 sec    90                25
30 sec    58                25
90 sec    60                25
10 min    60                1
10 min    72                1
The purified PfTLPco PCR product was digested with NdeI restriction enzyme in NdeI buffer (New England Biolabs) and purified with the MinElute Reaction Cleanup Kit (Qiagen). The purified DNA was then digested with BamHI restriction enzyme in BamHI buffer (Thermo Scientific), and purified with the MinElute Reaction Cleanup Kit. All procedures followed the manufacturer’s instructions. The concentration of the purified insert DNA was determined by Nanodrop and confirmed by agarose gel electrophoresis with a calibrated molecular weight marker.
2.3.4 Preparation of pET11d-vectors for cloning
Plasmid isolation
In order to generate sufficient vector frame for the cloning of the PfTLP and PfTLPco genes, either One Shot® Stbl3™ or TOP10 Chemically Competent E. coli cells (Stbl3, TOP10, Invitrogen) were transformed with the appropriate vector (pET11d-6His-HsTBP or pET11d-GST-6His), according to the manufacturer’s instructions. Transformants were spread-plated on lysogenic agar plates (LA; 1% w/v tryptone powder, 0.5% w/v yeast extract, 0.5% w/v NaCl, 1.5% w/v agar, pH adjusted to 7.5) containing 100µg/ml ampicillin, and grown overnight (8-12 hours) at 37°C. The transformed cells were then used to inoculate 10ml lysogenic broth (LB; 1% w/v tryptone powder, 0.5% w/v yeast extract, 0.5% w/v NaCl, pH adjusted to 7.5) cultures containing 100µg/ml ampicillin, and grown overnight at 37°C with shaking. Plasmid DNA was isolated from the cultures using the GeneJET Plasmid Miniprep Kit (Thermo Scientific), following the manufacturer’s instructions. The concentration of purified plasmid DNA (pDNA) was determined by Nanodrop, and confirmed by linearizing the DNA by restriction enzyme digestion, followed by agarose gel electrophoresis with a calibrated molecular weight marker.
Vector restriction enzyme digestion
Purified plasmid DNA was double digested with BamHI and NdeI in 2×Tango buffer (Thermo Scientific) and analysed by agarose gel electrophoresis. The cleaved vector frame was dephosphorylated with FastAP Thermosensitive Alkaline Phosphatase (FastAP, Thermo Scientific) and the sample heat inactivated at 75°C for 10 minutes. The cleaved vector frame was then isolated after agarose gel electrophoresis, and purified with the QIAEX II Gel Extraction Kit. The concentration of purified plasmid vector frame was determined by Nanodrop, and confirmed by agarose gel electrophoresis with a calibrated molecular weight marker. All procedures were performed following the manufacturer’s instructions.
2.3.5 Isolation of PfTLP expressing pET11d-vectors
Ligation, plasmid isolation and identification of successfully cloned vectors
Digested insert and vector frame were ligated by incubation with T4 DNA Ligase (Thermo Scientific), at insert:vector ratios of 1:1 and 3:1, in 20µl volumes. The ligation mixes were incubated for 2 hours at room temperature. One tenth of the volume of each ligation mix was then transformed into 25-100µl of either Stbl3 or TOP10 Chemically Competent E. coli cells, following the manufacturer’s instructions. The transformed cells were then spread-plated on ampicillin-containing lysogenic agar (LA) plates, and incubated overnight at 37°C.
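Insert:vector ratios for ligation are normally understood as molar ratios, so the mass of insert required depends on the relative lengths of insert and vector. The following minimal Python sketch shows the usual calculation; it assumes the ratios are molar, and the vector mass and fragment lengths in the example are hypothetical placeholders rather than values from this study.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, insert_to_vector=3.0):
    """Insert mass (ng) giving the requested insert:vector molar ratio.

    For double-stranded DNA, moles are proportional to mass / length, so the
    insert mass scales with the length ratio and the desired molar ratio.
    """
    if vector_bp <= 0 or insert_bp <= 0:
        raise ValueError("fragment lengths must be positive")
    return vector_ng * (insert_bp / vector_bp) * insert_to_vector


# Hypothetical example: 50 ng of a 5000 bp vector frame, 1000 bp insert
for ratio in (1.0, 3.0):
    mass = insert_mass_ng(50, 5000, 1000, ratio)
    print(f"{ratio:.0f}:1 insert:vector needs about {mass:.1f} ng insert")
```

A shorter insert therefore needs proportionally less mass than the vector to reach the same molar amount.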
Colonies successfully transformed with insert-containing vectors were identified by colony PCR. In this assay, a colony is lightly touched with a sterile pipette tip. The tip is dipped into 2µl of RoH2O, which is then heated to 90°C for 5 minutes. This is then used in the PCR reaction (described above) with the appropriate forward and reverse primers, and the cycling conditions in Table 2. PfTLP-FWD and PfTLP-REV primers were used to identify transformants containing the pET11d-6His-PfTLP and pET11d-GST-6His-PfTLP vectors, and PfTLPco-FWD and PfTLPco-REV primers for transformants containing the pET11d-GST-6His-PfTLPco vector (Table 1). One fifth of the volume of each PCR reaction was analysed by agarose gel electrophoresis, and vector DNA was isolated from positively identified colonies, as described above. The sequences of the vectors were then confirmed by DNA sequence analysis (Stellenbosch Central Analytical Facility, Stellenbosch, South Africa) on an ABI3730xl DNA analyser (Applied Biosystems, Foster City, USA). The pET11d-6His-PfTLP and pET11d-GST-6His-PfTLP vectors were sequenced with the primers 3, 4, 9, and 10; the pET11d-GST-6His-PfTLPco vector was sequenced with the primers 7, 8, 10 and 11 (Table 1) to ensure full sequence coverage. Sequence data were analysed using Chromas software (Version 2.01, Technelysium Pty Ltd, Queensland, Australia), and SerialCloner.
2.3.6 Transformation of protein expression vectors into protein expressing cells
Note that the cloning of the pET11d-GST-6His-PfTLPco was initially unsuccessful, and a vector was isolated with an additional two codons in between the 6His-tag and PfTLPco ORF. This vector, termed pET11d-GST-6His-PfTLPco(aa), was used in expression trials until the correct pET11d-GST-6His-PfTLPco vector could be cloned.
The pET11d-6His-PfTLP, pET11d-6His-PfTFIIB (cloned in the lab, and provided by Dr Thomas Oelgeschläger), pET11d-GST-6His-PfTLP, pET11d-GST-6His-PfTLPco and pET11d-GST-6His-PfTLPco(aa) vectors were transformed into BL21-CodonPlus® (DE3)-RIL competent cells (Agilent Technologies, Stratagene 2005) for expression, following the manufacturer’s instructions. Transformants were spread-plated onto either LA plates or Super Optimal Broth with Catabolite Repression (SOC; 2% w/v tryptone powder, 0.5% w/v yeast extract, 10mM NaCl, 2.5mM KCl, 10mM MgCl2, 20mM glucose, 1.5% w/v agar) plates, containing 50µg/ml chloramphenicol and 100µg/ml ampicillin, and incubated overnight at 37°C.