EXTENDED EXPERIMENTAL DETAILS - Regulation in Escherichia coli

Extended details of experimental design Choosing target genes for Reg-Seq

Genes in this study were chosen to cover several different categories. 29 genes had some information on their regulation already known to validate our method under a number of conditions. 37 were chosen because the work of Schmidt, Kochanowski, Vedelaar, Ahrné, et al., 2016 demonstrated that gene expression changed significantly under different growth conditions. A handful of genes such as minC, maoP, orfdhE were chosen because we found either their physiological significance interesting, as in the case of the cell division gene minC or that we found the gene regulatory question interesting, such for the intra-operon regulation demonstrated by fdhE. The remainder of the genes were chosen because they had no regulatory information, often had minimal information about the function of the gene, and had an annotated transcription start site (TSS) in RegulonDB.

Choosing transcription start sites for Reg-Seq

A known limitation of the experiment is that the mutational window is limited to 160 bp. As such, it is important to correctly target the mutation window to the location around the most active TSS. To do this we first prioritized those TSS which have been extensively experimentally validated and catalogued in RegulonDB. Secondly we selected those sites which had evidence of active transcription from RACE experiments (Mendoza-Vargas et al., 2009) and were listed in RegulonDB. If the intergenic region was small enough, we covered the entire region with our mutation window. If none of these options were available, we used computationally predicted start sites.

Reg-Seq Sequencing

The total library was first sequenced by PCR amplifying the region containing the variant promoters as well as the corresponding barcodes. This allowed us to uniquely associate each random 20 bp barcode with a promoter variant. Any barcode which was associated with a promoter variant with insertions or deletions was removed from further analysis. Similarly, any barcode that was associated with multiple pro-

moter variants was also removed from the analysis. The paired end reads from this sequencing step were then assembled using the FLASH tool (Magoc and Salzberg, 2011). Any sequence with PHRED score less than 20 was removed using the FastX toolkit. Additionally, when sequencing the initial library, sequences which only appear in the dataset once were not included in further analysis in order to remove possible sequencing errors.

For all the MPRA experiments, only the region containing the random 20 bp barcode was sequenced, since the barcode can be matched to a specific promoter variant using the initial library sequencing run described above. For a given growth con- dition, each promoter yielded 20,000 to 500,000 usable sequencing reads. Under some growth conditions, genes were not analyzed further if they did not have at least 20,000 reads.

To determine which base pair regions were statistically significant a 99% confidence interval was constructed using the MCMC inference to determine the uncertainty.

Reg-Seq Growth conditions

The growth conditions studied in this study were inspired by (Schmidt, Kochanowski, Vedelaar, Ahrne, et al., 2015) and include differing carbon sources such as growth in M9 with 0.5% Glucose, M9 with acetate (0.5%), M9 with arabinose (0.5%), M9 with Xylose (0.5%) and arabinose (0.5%), M9 with succinate (0.5%), M9 with fumarate (0.5%), M9 with Trehalose (0.5%), and LB. In each case cell harvesting was done at an OD of 0.3. These growth conditions were chosen so as to span a wide range of growth rates, as well as to illuminate any carbon source specific regulators.

We also used several stress conditions such as heat shock, where cells were grown in M9 and were subjected to a heat shock of 42 degrees for 5 minutes before harvesting RNA. We grew in low oxygen conditions. Cells were grown in LB in a container with minimal oxygen, although some will be present as no anaerobic chamber was used. This level of oxygen stress was still sufficient to activate FNR binding, and so activated the anaerobic metabolism. We also grew cells in M9 with Glucose and 5mM sodium salycilate.

Growth with zinc was preformed at a concentration of 5mM ZnCl₂and growth with

iron was preformed by first growing cells to an OD of 0.3 and then adding FeCL₂ to a concentration of 5mM and harvesting RNA after 10 minutes. Growth without cAMP was accomplished by the use of the JK10 strain which does not maintain its cAMP levels.

All knockout experiment were preformed in M9 with Glucose except for the knock- outs forarcA,hdfR, andphoPwhich were grown in LB.

Reg-Seq Constructs

The following mutated constructs were integrated into the plasmid.

fdoH,sdaB,thiM,yedJ,ykgE,sdiA yqhC,yicI,ybjT,mtgA,aphA,bdcR yncD,rumB,yagH,eco,y�G,htrB iap,ygjP,yedK,holC,aegA,rapA dusC,fdhE,dnaE,ycgB,yehS,yeiQ ydhO,hslU,ymgG,rlmA,modE,ycbZ yajL,yecE,ybiP,ybjL,ygdH,pcm rcsF,sbcB,ygeR,mscK,mscS,ynaI ybdG,hicB,arcB,minC,ybeZ,ydjA yggW,acuI,yehU,yehT,ybiO,mscL zapB,waaA-coaD,coaA,yjjJ,groSL,pyrLBI

�g,�f,maoP,poxB,rspA,mscM arcA,tar,dpiBA,araAB,araC,xylF xylA,dicA,dicC,dicB,ompR,xapAB ilvC,asnA,idnK,dinJ,yjiY,motAB

�sK,cra,uvrD,adiY,znuCB,znuA zupT,pitA,ecnB,leuABCD

RplKAJL-rpoBC,yodB,atpI,msyB,ndk,thrLABC

Genes Forward Integration Site Mutated Wild 5’ RNA Sequence

Type Sequence Barcode

TTCGTCTTCACCTCGAGCACGCTTATTCGTGCCGTGTTAT TTCGTCTTCACCTCGAGCACTTTGCTTCAGTCAGATTCGC TTCGTCTTCACCTCGAGCACGTCGAGTCCTATGTAACCGT TTCGTCTTCACCTCGAGCACGTAAGATGGAAGCCGGGATA TTCGTCTTCACCTCGAGCACGGTGTCGCAACATGATCTAC TTCGTCTTCACCTCGAGCACGTGCTAAGTCACACTGTTGG TTCGTCTTCACCTCGAGCACTCTAAACAGTTAGGCCCAGG TTCGTCTTCACCTCGAGCACGTCTTTATACTTGCCTGCCG TTCGTCTTCACCTCGAGCACCACCGCGATCAATACAACTT TTCGTCTTCACCTCGAGCACTTCGGATAGACTCAGGAAGC TTCGTCTTCACCTCGAGCACCCATTGATAGATTCGCTCGC TTCGTCTTCACCTCGAGCACTTTTCTACTTTCCGGCTTGC TTCGTCTTCACCTCGAGCACATGACTATTGGGGTCGTACC TTCGTCTTCACCTCGAGCACTCGACAATAGTTGAGCCCTT TTCGTCTTCACCTCGAGCACGAGCCATGTGAAATGTGTGT TTCGTCTTCACCTCGAGCACCGTATACGTAAGGGTTCCGA TTCGTCTTCACCTCGAGCACTTATGATGTCCGGATACCCG TTCGTCTTCACCTCGAGCACTCTTAGAAATCCACGGGTCC

TACTTTTGATTGCTGTGCCCTATTAGGCTTCTCCTCAGCGCTAGTCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN GTTCAATCACTGAATCCCGGTATTAGGCTTCTCCTCAGCGCTCCTCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN CAGGGGTCGTCATATCTTCATATTAGGCTTCTCCTCAGCGGGACTCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN CACCTCATAGAGCTGTGGAATATTAGGCTTCTCCTCAGCGTCGGTCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN CGGTTCCTAGTCATGTTTGCTATTAGGCTTCTCCTCAGCGCCAATCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN TTGTACTAATCTCGTCCCGGTATTAGGCTTCTCCTCAGCGTATCTCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN TTATGTTCACAACTGGCGTGTATTAGGCTTCTCCTCAGCGGAGTTCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN TGGAACTGATTTGGCCTTTGTATTAGGCTTCTCCTCAGCGAGTATCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN TATAGTTCCTCCCATGCACCTATTAGGCTTCTCCTCAGCGTTGGTCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN ACAATAGACAGACCCATGCATATTAGGCTTCTCCTCAGCGGCCTTCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN GAGTCGAGCTAGCATAGGAGTATTAGGCTTCTCCTCAGCGAATTTCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN TTGTGGGAGCTTCTTACCATTATTAGGCTTCTCCTCAGCGACAATCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN TCGTACGGGAATGACCATAGTATTAGGCTTCTCCTCAGCGTAAATCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN AGACACAACGTAGCCGATTATATTAGGCTTCTCCTCAGCGCGGTTCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN CGGACTAAAGGATCGAGTCATATTAGGCTTCTCCTCAGCGGCCATCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN CATCGGATAACACAAAGCGTTATTAGGCTTCTCCTCAGCGGGCCTCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN GATGTATACTCCACCGTGGTTATTAGGCTTCTCCTCAGCGCACCTCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN TGAGATATGTACCTGGTGCCTATTAGGCTTCTCCTCAGCGATTGTCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN

Figure A.1: Promoter constructs for Reg-Seq. We show examples of the constructs generated for the Reg-Seq project. Here we display the different 5’ mRNA

sequences that follow the mutated region for each gene. The mutated region will be 160 base pairs, with 115 base pairs upstream and 45 base pairs downstream of the TSS for that gene listed in Table A.1.

Gene Start Site Transcription Direction fdoH 4085867 rev

sdaB 2928035 fwd thiM 2185451 rev yedJ 2033449 rev ykgE 321511 fwd sdiA 1996867 rev yqhC 3155262 rev yicI 3836664 rev ybjT 909320 rev mtgA 3350504 rev aphA 4269355 fwd bdcR 4474096 fwd yncD 1523276 rev rumB 897947 fwd yagH 285350 fwd eco 2303851 fwd yfhG 2690181 rev htrB 1116709 rev iap 2876547 fwd ygjP 3235915 fwd yedK 2009866 fwd holC 4484273 rev aegA 2585570 rev

rapA 63358 rev

dusC 2230395 rev fdhE 4081359 rev dnaE 197026 fwd ycgB 1237285 rev yehS 2212241 rev yeiQ 2266214 rev ydhO 1734357 fwd hslU 4122354 rev ymgG 1223097 rev rlmA 1907086 rev modE 794644 rev ycbZ 1018330 rev yajL 443748 rev yecE 1950778 fwd ybiP 904523 rev ybjL 889945 rev ygdH 2926272 fwd pcm 2870686 rev rcsF 220022 rev sbcB 2082728 fwd ygeR 2999918 rev mscK 486492 fwd

Gene Start Site Transcription Direction

mscS 3069871 rev

ynaI 1395973 rev

ybdG 604684 rev

hicB 1509221 fwd

arcB 3353049 rev

minC 1226139 rev

ybeZ 693469 rev

ydjA 1848700 rev

yggW 3096620 fwd

acuI 3403446 fwd

yehU 2214673 rev

yehT 2212969 rev

ybiO 845736 rev

mscL 3438001 fwd

zapB 4118427 fwd

WaaA-coaD 3808516 fwd

coaA 4175107 rev

yjjJ 4621716 fwd

groSL 4370616 fwd

pyrLBI 4472553 rev

rplKAJL-rpoBC 4178354 fwd

yodB 2042294 fwd

atpIBEFHAGDC 3922525 rev

msyB 1114213 rev

ndk 2644913 rev

thrLABC 148 fwd

tig 455077 fwd

tff-rpsB-tsf 189712 fwd

maoP 3948058 fwd

poxB 911076 rev

rspA 1655186 rev

mscM 4390638 rev

arcA 4640508 rev

tar 1972716 rev

dpiBA 652172 fwd

araAB 70075 rev

araC 70241 fwd

xylF 3731069 fwd

xylA 3730807 rev

dicA 1647979 fwd

dicC 1647876 rev

dicB 1649597 fwd

ompR 3536707 rev

xapAB 2524910 rev

ilvC 3957912 fwd

asnA 3927129 fwd

Gene Start Site Transcription Direction

idnK 4494597 fwd

dinJ 246533 rev

yjiY 4591397 rev

motAB-cheAW 1977302 rev

ftsK 933138 fwd

cra 87969 fwd

uvrD 3997907 fwd

adiY 4338042 rev

znuCB 1942634 fwd

znuA 1942661 rev

zupT 3182433 fwd

pitA 3637612 fwd

ecnB 4376509 fwd

leuABCD 83735 rev

Table A.1: All TSS for all genes investigated in Reg-Seq.

There are 160 base pairs of mutated sequence for each regulatory region, 115 base pairs upstream and 45 base pairs down stream of the transcription start site. The

TSS location is shown, as well as whether the gene is transcribed in a 5’ to 3’

direction or a 3’ to 5’ direction.

A p p e n d i x B

Dalam dokumen Regulation in Escherichia coli (Halaman 151-157)