Extended details of experimental design Choosing target genes for Reg-Seq
Genes in this study were chosen to cover several different categories. 29 genes had some information on their regulation already known to validate our method under a number of conditions. 37 were chosen because the work of Schmidt, Kochanowski, Vedelaar, Ahrné, et al., 2016 demonstrated that gene expression changed significantly under different growth conditions. A handful of genes such as minC, maoP, orfdhE were chosen because we found either their physiological significance interesting, as in the case of the cell division gene minC or that we found the gene regulatory question interesting, such for the intra-operon regulation demonstrated by fdhE. The remainder of the genes were chosen because they had no regulatory information, often had minimal information about the function of the gene, and had an annotated transcription start site (TSS) in RegulonDB.
Choosing transcription start sites for Reg-Seq
A known limitation of the experiment is that the mutational window is limited to 160 bp. As such, it is important to correctly target the mutation window to the location around the most active TSS. To do this we first prioritized those TSS which have been extensively experimentally validated and catalogued in RegulonDB. Secondly we selected those sites which had evidence of active transcription from RACE experiments (Mendoza-Vargas et al., 2009) and were listed in RegulonDB. If the intergenic region was small enough, we covered the entire region with our mutation window. If none of these options were available, we used computationally predicted start sites.
Reg-Seq Sequencing
The total library was first sequenced by PCR amplifying the region containing the variant promoters as well as the corresponding barcodes. This allowed us to uniquely associate each random 20 bp barcode with a promoter variant. Any barcode which was associated with a promoter variant with insertions or deletions was removed from further analysis. Similarly, any barcode that was associated with multiple pro-
moter variants was also removed from the analysis. The paired end reads from this sequencing step were then assembled using the FLASH tool (Magoc and Salzberg, 2011). Any sequence with PHRED score less than 20 was removed using the FastX toolkit. Additionally, when sequencing the initial library, sequences which only appear in the dataset once were not included in further analysis in order to remove possible sequencing errors.
For all the MPRA experiments, only the region containing the random 20 bp bar- code was sequenced, since the barcode can be matched to a specific promoter variant using the initial library sequencing run described above. For a given growth con- dition, each promoter yielded 20,000 to 500,000 usable sequencing reads. Under some growth conditions, genes were not analyzed further if they did not have at least 20,000 reads.
To determine which base pair regions were statistically significant a 99% confidence interval was constructed using the MCMC inference to determine the uncertainty.
Reg-Seq Growth conditions
The growth conditions studied in this study were inspired by (Schmidt, Kochanowski, Vedelaar, Ahrne, et al., 2015) and include differing carbon sources such as growth in M9 with 0.5% Glucose, M9 with acetate (0.5%), M9 with arabinose (0.5%), M9 with Xylose (0.5%) and arabinose (0.5%), M9 with succinate (0.5%), M9 with fumarate (0.5%), M9 with Trehalose (0.5%), and LB. In each case cell harvesting was done at an OD of 0.3. These growth conditions were chosen so as to span a wide range of growth rates, as well as to illuminate any carbon source specific regulators.
We also used several stress conditions such as heat shock, where cells were grown in M9 and were subjected to a heat shock of 42 degrees for 5 minutes before harvesting RNA. We grew in low oxygen conditions. Cells were grown in LB in a container with minimal oxygen, although some will be present as no anaerobic chamber was used. This level of oxygen stress was still sufficient to activate FNR binding, and so activated the anaerobic metabolism. We also grew cells in M9 with Glucose and 5mM sodium salycilate.
Growth with zinc was preformed at a concentration of 5mM ZnCl2and growth with
iron was preformed by first growing cells to an OD of 0.3 and then adding FeCL2 to a concentration of 5mM and harvesting RNA after 10 minutes. Growth without cAMP was accomplished by the use of the JK10 strain which does not maintain its cAMP levels.
All knockout experiment were preformed in M9 with Glucose except for the knock- outs forarcA,hdfR, andphoPwhich were grown in LB.
Reg-Seq Constructs
The following mutated constructs were integrated into the plasmid.
fdoH,sdaB,thiM,yedJ,ykgE,sdiA yqhC,yicI,ybjT,mtgA,aphA,bdcR yncD,rumB,yagH,eco,y�G,htrB iap,ygjP,yedK,holC,aegA,rapA dusC,fdhE,dnaE,ycgB,yehS,yeiQ ydhO,hslU,ymgG,rlmA,modE,ycbZ yajL,yecE,ybiP,ybjL,ygdH,pcm rcsF,sbcB,ygeR,mscK,mscS,ynaI ybdG,hicB,arcB,minC,ybeZ,ydjA yggW,acuI,yehU,yehT,ybiO,mscL zapB,waaA-coaD,coaA,yjjJ,groSL,pyrLBI
�g,�f,maoP,poxB,rspA,mscM arcA,tar,dpiBA,araAB,araC,xylF xylA,dicA,dicC,dicB,ompR,xapAB ilvC,asnA,idnK,dinJ,yjiY,motAB
�sK,cra,uvrD,adiY,znuCB,znuA zupT,pitA,ecnB,leuABCD
RplKAJL-rpoBC,yodB,atpI,msyB,ndk,thrLABC
Genes Forward Integration Site Mutated Wild 5’ RNA Sequence
Type Sequence Barcode
TTCGTCTTCACCTCGAGCACGCTTATTCGTGCCGTGTTAT TTCGTCTTCACCTCGAGCACTTTGCTTCAGTCAGATTCGC TTCGTCTTCACCTCGAGCACGTCGAGTCCTATGTAACCGT TTCGTCTTCACCTCGAGCACGTAAGATGGAAGCCGGGATA TTCGTCTTCACCTCGAGCACGGTGTCGCAACATGATCTAC TTCGTCTTCACCTCGAGCACGTGCTAAGTCACACTGTTGG TTCGTCTTCACCTCGAGCACTCTAAACAGTTAGGCCCAGG TTCGTCTTCACCTCGAGCACGTCTTTATACTTGCCTGCCG TTCGTCTTCACCTCGAGCACCACCGCGATCAATACAACTT TTCGTCTTCACCTCGAGCACTTCGGATAGACTCAGGAAGC TTCGTCTTCACCTCGAGCACCCATTGATAGATTCGCTCGC TTCGTCTTCACCTCGAGCACTTTTCTACTTTCCGGCTTGC TTCGTCTTCACCTCGAGCACATGACTATTGGGGTCGTACC TTCGTCTTCACCTCGAGCACTCGACAATAGTTGAGCCCTT TTCGTCTTCACCTCGAGCACGAGCCATGTGAAATGTGTGT TTCGTCTTCACCTCGAGCACCGTATACGTAAGGGTTCCGA TTCGTCTTCACCTCGAGCACTTATGATGTCCGGATACCCG TTCGTCTTCACCTCGAGCACTCTTAGAAATCCACGGGTCC
TACTTTTGATTGCTGTGCCCTATTAGGCTTCTCCTCAGCGCTAGTCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN GTTCAATCACTGAATCCCGGTATTAGGCTTCTCCTCAGCGCTCCTCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN CAGGGGTCGTCATATCTTCATATTAGGCTTCTCCTCAGCGGGACTCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN CACCTCATAGAGCTGTGGAATATTAGGCTTCTCCTCAGCGTCGGTCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN CGGTTCCTAGTCATGTTTGCTATTAGGCTTCTCCTCAGCGCCAATCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN TTGTACTAATCTCGTCCCGGTATTAGGCTTCTCCTCAGCGTATCTCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN TTATGTTCACAACTGGCGTGTATTAGGCTTCTCCTCAGCGGAGTTCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN TGGAACTGATTTGGCCTTTGTATTAGGCTTCTCCTCAGCGAGTATCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN TATAGTTCCTCCCATGCACCTATTAGGCTTCTCCTCAGCGTTGGTCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN ACAATAGACAGACCCATGCATATTAGGCTTCTCCTCAGCGGCCTTCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN GAGTCGAGCTAGCATAGGAGTATTAGGCTTCTCCTCAGCGAATTTCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN TTGTGGGAGCTTCTTACCATTATTAGGCTTCTCCTCAGCGACAATCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN TCGTACGGGAATGACCATAGTATTAGGCTTCTCCTCAGCGTAAATCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN AGACACAACGTAGCCGATTATATTAGGCTTCTCCTCAGCGCGGTTCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN CGGACTAAAGGATCGAGTCATATTAGGCTTCTCCTCAGCGGCCATCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN CATCGGATAACACAAAGCGTTATTAGGCTTCTCCTCAGCGGGCCTCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN GATGTATACTCCACCGTGGTTATTAGGCTTCTCCTCAGCGCACCTCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN TGAGATATGTACCTGGTGCCTATTAGGCTTCTCCTCAGCGATTGTCACTGGCCGTCGTTTTACATGACTGACTGANNNNNNNNNNNNNNNNNNNN
Figure A.1: Promoter constructs for Reg-Seq. We show examples of the constructs generated for the Reg-Seq project. Here we display the different 5’ mRNA
sequences that follow the mutated region for each gene. The mutated region will be 160 base pairs, with 115 base pairs upstream and 45 base pairs downstream of the TSS for that gene listed in Table A.1.
Gene Start Site Transcription Direction fdoH 4085867 rev
sdaB 2928035 fwd thiM 2185451 rev yedJ 2033449 rev ykgE 321511 fwd sdiA 1996867 rev yqhC 3155262 rev yicI 3836664 rev ybjT 909320 rev mtgA 3350504 rev aphA 4269355 fwd bdcR 4474096 fwd yncD 1523276 rev rumB 897947 fwd yagH 285350 fwd eco 2303851 fwd yfhG 2690181 rev htrB 1116709 rev iap 2876547 fwd ygjP 3235915 fwd yedK 2009866 fwd holC 4484273 rev aegA 2585570 rev
rapA 63358 rev
dusC 2230395 rev fdhE 4081359 rev dnaE 197026 fwd ycgB 1237285 rev yehS 2212241 rev yeiQ 2266214 rev ydhO 1734357 fwd hslU 4122354 rev ymgG 1223097 rev rlmA 1907086 rev modE 794644 rev ycbZ 1018330 rev yajL 443748 rev yecE 1950778 fwd ybiP 904523 rev ybjL 889945 rev ygdH 2926272 fwd pcm 2870686 rev rcsF 220022 rev sbcB 2082728 fwd ygeR 2999918 rev mscK 486492 fwd
Gene Start Site Transcription Direction
mscS 3069871 rev
ynaI 1395973 rev
ybdG 604684 rev
hicB 1509221 fwd
arcB 3353049 rev
minC 1226139 rev
ybeZ 693469 rev
ydjA 1848700 rev
yggW 3096620 fwd
acuI 3403446 fwd
yehU 2214673 rev
yehT 2212969 rev
ybiO 845736 rev
mscL 3438001 fwd
zapB 4118427 fwd
WaaA-coaD 3808516 fwd
coaA 4175107 rev
yjjJ 4621716 fwd
groSL 4370616 fwd
pyrLBI 4472553 rev
rplKAJL-rpoBC 4178354 fwd
yodB 2042294 fwd
atpIBEFHAGDC 3922525 rev
msyB 1114213 rev
ndk 2644913 rev
thrLABC 148 fwd
tig 455077 fwd
tff-rpsB-tsf 189712 fwd
maoP 3948058 fwd
poxB 911076 rev
rspA 1655186 rev
mscM 4390638 rev
arcA 4640508 rev
tar 1972716 rev
dpiBA 652172 fwd
araAB 70075 rev
araC 70241 fwd
xylF 3731069 fwd
xylA 3730807 rev
dicA 1647979 fwd
dicC 1647876 rev
dicB 1649597 fwd
ompR 3536707 rev
xapAB 2524910 rev
ilvC 3957912 fwd
asnA 3927129 fwd
Gene Start Site Transcription Direction
idnK 4494597 fwd
dinJ 246533 rev
yjiY 4591397 rev
motAB-cheAW 1977302 rev
ftsK 933138 fwd
cra 87969 fwd
uvrD 3997907 fwd
adiY 4338042 rev
znuCB 1942634 fwd
znuA 1942661 rev
zupT 3182433 fwd
pitA 3637612 fwd
ecnB 4376509 fwd
leuABCD 83735 rev
Table A.1: All TSS for all genes investigated in Reg-Seq.
There are 160 base pairs of mutated sequence for each regulatory region, 115 base pairs upstream and 45 base pairs down stream of the transcription start site. The
TSS location is shown, as well as whether the gene is transcribed in a 5’ to 3’
direction or a 3’ to 5’ direction.
A p p e n d i x B