Regulation in Escherichia coli

First, I would like to thank my advisor Rob Phillips for his always wonderful scientific insights and attitude. Next, we note that much of the quantitative insight available on transcriptional regulation relies on work on only a few model regulatory systems such as LacI as considered above.

INTRODUCTION

Chapter Summaries

In Chapter 2, we discuss the modeling of DNA sequence-specific transcription factor binding energy in vivo. We used the resulting energy matrices to predict the binding energies of Lac O1 binding site mutants and confirmed our predictions with experimental measurements. In Chapter 3, we identified regulatory architectures for unannotated promoters.

Figure 1.1: Applying quantitative models of gene regulation across the genome.

The central dogma of molecular biology

As a result, we have a deep understanding of protein-coding regions of the genetic code. The functions of the regulatory regions are just as much of a mystery as they were decades ago.

Thermodynamic models and gene regulation

Simple repression occurs when a single transcription factor binds near the RNAP binding site and prevents RNAP binding. The value of 𝑝_bound is given by the sum of the Boltzmann weights for all microstates in which an RNAP is bound to the specific site, divided by the sum of the weights of all microstates.
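The states-and-weights calculation for simple repression can be sketched numerically. The sketch below assumes the standard three-state picture (empty promoter, RNAP bound, repressor bound) in which RNAP and repressor binding are mutually exclusive. The function names, argument names, and numerical values are illustrative assumptions and are not taken from the text.

```python
import math

def p_bound(P, R, N_NS, d_eps_pd, d_eps_rd):
    """Probability that RNAP is bound in a three-state simple repression model.

    P, R      : RNAP and repressor copy numbers (illustrative)
    N_NS      : number of nonspecific binding sites (illustrative)
    d_eps_pd  : RNAP binding energy, in units of k_B T
    d_eps_rd  : repressor binding energy, in units of k_B T
    """
    w_empty = 1.0                                   # empty promoter
    w_rnap = (P / N_NS) * math.exp(-d_eps_pd)       # RNAP bound
    w_rep = (R / N_NS) * math.exp(-d_eps_rd)        # repressor bound
    return w_rnap / (w_empty + w_rnap + w_rep)

def fold_change(P, R, N_NS, d_eps_pd, d_eps_rd):
    """Ratio of p_bound with repressor to p_bound without repressor."""
    return (p_bound(P, R, N_NS, d_eps_pd, d_eps_rd)
            / p_bound(P, 0, N_NS, d_eps_pd, d_eps_rd))

# Hypothetical parameter values, chosen only to exercise the functions.
params = dict(P=1000, N_NS=4.6e6, d_eps_pd=-5.0, d_eps_rd=-15.0)
for R in (0, 10, 100, 1000):
    print(R, fold_change(R=R, **params))
```

Increasing the repressor copy number or strengthening the repressor binding energy (making it more negative) lowers the fold-change, which is the qualitative behavior an input-output function for simple repression should show.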

Figure 1.3: Input-output function for simple repression. For a simple repression architecture, with one RNAP site, and one repressor binding site, Garcia and Phillips, 2011 measured the output of lacZ for transcription factor binding sites of different DNA

Quantitative modeling of gene regulation

One can write quantitative models for any combination of interacting transcription factor binding sites, and such models have been written for each of the transcription factor architectures found in this work (Bintu et al., 2005). Other studies (Boedicker et al., 2013; Scott et al., 2010) apply the states-and-weights approach to the case of DNA looping in the lac operon. Phillips et al. (2012) found that the occupancy hypothesis could not adequately describe the mechanism of repression in a simple repression architecture where the repressor bound upstream of the RNAP and could simultaneously bind to the RNAP.

Lack of current regulatory knowledge

In this case, gene regulation occurred independently of the occupancy of RNAP. Strikingly, over half of the operons lack any listed transcription factor binding sites.

Figure 1.8: Distribution of regulatory architectures in E. coli. The percentage prevalence of each type of regulatory architecture in RegulonDB before the work in this dissertation

MAPPING DNA SEQUENCE TO TRANSCRIPTION FACTOR BINDING ENERGY IN VIVO

Introduction

Massively parallel reporter assays (MPRAs) have made it possible to read out transcription factor binding position and occupancy in vivo with base pair resolution and provide a means to analyze non-binding features such as "insulator" sequences (Levo, Avnit-Sagi, et al., 2017; Melnikov et al., 2012; Levy et al., 2017). Current in vivo methods for determining transcription factor binding affinities, such as bacterial one-hybrid (Christensen et al., 2011; Xu and Noyes, 2015), require restructuring the promoter so that it no longer resembles its genomic counterpart.

Results

We use the MPAthic software to infer the parameters of the energy matrices and the thermodynamic parameters of 𝑝_bound (Kinney and Callan, 2010; Ireland and Kinney, 2016). As might be expected, we find that the quality of the matrix predictions decreased as we predicted the energies of sequences further from the wild-type binding site sequence used to generate the energy matrix.
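To make the use of an energy matrix concrete, the sketch below predicts the binding energy of a sequence by summing one matrix entry per position. It assumes an additive model in which each base contributes independently. The toy 4 bp matrix, its values, and the function name are hypothetical, and the code does not use the MPAthic interface.

```python
BASES = "ACGT"

def binding_energy(seq, matrix):
    """Sum the per-position energy contributions for a sequence.

    matrix[i] holds the energies (in k_B T) of A, C, G, T at position i.
    """
    if len(seq) != len(matrix):
        raise ValueError("sequence and matrix must have the same length")
    return sum(matrix[i][BASES.index(base)] for i, base in enumerate(seq))

# Hypothetical matrix, expressed relative to a wild-type site "ACGT"
# so that every wild-type base contributes zero energy.
toy_matrix = [
    [0.0, 1.2, 0.8, 1.5],
    [0.9, 0.0, 1.1, 0.7],
    [1.3, 0.6, 0.0, 1.0],
    [0.5, 1.4, 0.9, 0.0],
]

for site in ("ACGT", "CCGT", "CAGT"):
    print(site, binding_energy(site, toy_matrix))
```

Because the model is additive, a mutant that differs from the wild type at several positions accumulates one matrix entry per mutation. This is one reason predictions may degrade for sequences far from the wild-type site, where interactions between positions that the additive form ignores could matter more.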

Figure 2.1: Process flow for using Sort-Seq to obtain energy matrices. (A) A simple repression motif was designed in which a LacI repressor binding site is placed immediately downstream of the RNAP site

Discussion

We frame this work in the context of a simple repression architecture, which represents a widespread bacterial regulatory architecture (Rydenfelt et al., 2014) commonly used in synthetic biology (Brophy and Voigt, 2014; Khalil and Collins, 2010; Purnick and Weiss, 2009). Deep learning algorithms can provide an alternative approach to model DNA-protein interactions (Sun et al., 2017).

Methods

Sort-Seq libraries

Generation of final strains containing a simple repression motif and a specific LacI copy number was achieved by P1 transduction.

A SYSTEMATIC APPROACH FOR DISSECTING THE MOLECULAR MECHANISMS OF TRANSCRIPTIONAL REGULATION IN BACTERIA

Introduction

CRISPR assays have also shown promise for identifying longer-range enhancer-promoter interactions in mammalian cells (Fulco et al., 2016). DNA affinity chromatography and mass spectrometry (Mittler, Butter, and Mann, 2008; Mirzaei et al., 2013) are then used to identify the regulatory proteins that recognize these sites.

Figure 3.1: Summary of transcriptional regulatory knowledge in E. coli.

Results

Two binding sites for XylR were identified (see also Figure 3.4) along with a CRP binding site. Several RNAP binding sites were identified using Sort-Seq performed in the ΔdgoR strain grown in M9 minimal medium with 0.5% glucose.

Figure 3.2: Overview of approach to characterize transcriptional regulatory DNA, using Sort-Seq and mass spectrometry

Discussion

In particular, we were able to identify multiple RNAP binding sites along the length of the promoter. Landing platform technologies for chromosomal integration (Kuhlman and Cox, 2010; Zhang et al., 2016) should allow massively parallel reporter assays to be performed in chromosomes rather than on plasmids.

Methods

We begin with a description of how we construct the mutated promoters used in the Sort-Seq experiment itself. In the case of the lacZ promoter, the associated Sort-Seq data were also used in the analysis of Razo-Mejia et al. (2014).

Bacterial strains

To calculate the overall protein ratio, the unnormalized protein replicate ratios were log-transformed and then shifted so that the median protein log ratio within each replicate was zero (i.e., the median protein ratio was 1:1). The overall experimental log ratio was then calculated from the mean of the replicate ratios.
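The normalization described above can be sketched as follows. Each replicate is log-transformed and shifted so that its median log ratio is zero, and the per-protein results are then averaged across replicates. The use of log base 2, the dictionary layout, the protein names, and the function names are assumptions made for illustration, since the text does not specify them.

```python
import math
from statistics import mean, median

def center_replicate(ratios):
    """Log-transform one replicate and shift its median log ratio to zero.

    ratios maps a protein name to its unnormalized ratio.
    """
    logs = {protein: math.log2(r) for protein, r in ratios.items()}
    shift = median(logs.values())
    return {protein: value - shift for protein, value in logs.items()}

def overall_log_ratios(replicates):
    """Average the median-centered log ratios across replicates.

    A protein missing from a replicate is averaged over the replicates
    in which it was observed (an assumption of this sketch).
    """
    centered = [center_replicate(rep) for rep in replicates]
    proteins = set().union(*centered)
    return {p: mean(c[p] for c in centered if p in c) for p in proteins}

# Hypothetical replicate data.
rep1 = {"protA": 2.0, "protB": 4.0, "protC": 8.0}
rep2 = {"protA": 1.0, "protB": 1.0, "protC": 4.0}
print(overall_log_ratios([rep1, rep2]))
```

Centering on the median rather than the mean makes the shift insensitive to the few strongly enriched proteins, which are the ones of interest.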

Figure 3.12: Related to Fig. 3.2 and Fig. 3.3. Analysis of the library mutation spectrum and effect of Sort-Seq sorting conditions

Supplemental Information: Characterization of library diversity and sorting sensitivity

Analysis of library diversity using data from the mar promoter

Although we intended to have a similar mutation rate, we found the mutation rate to be closer to three percent per nucleotide position. To better understand how the mutation rate varies between libraries, we plot a histogram of the number of mutations per base pair for the entire set of sequences found in the marR promoter library (Fig.

Supplemental Information: Generation of sequence logos

While we obtained an average mutation rate of 10.4% in this library, close to our target rate of 9%, there is some variation in this mutation rate, as might be expected since the incorporation of mutations during DNA synthesis is a random process. Due to the large number of possible triple point mutants in a 60 bp region, only a small subset of the possible sequences will be found in the library.
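The size of the triple-mutant space can be checked with a short calculation: choose which 3 of the 60 positions are mutated, then choose one of the 3 alternative bases at each. The function name below is illustrative.

```python
from math import comb

def n_point_mutants(length, k, alphabet_size=4):
    """Number of distinct sequences with exactly k substitutions."""
    return comb(length, k) * (alphabet_size - 1) ** k

print(n_point_mutants(60, 3))  # 923940
```

Nearly a million distinct triple mutants exist for a single 60 bp window, before counting sequences with four or more mutations.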

Generating position weight matrices from known genomic binding sites

Finally, the values of the position weight matrix are found by computing the log probabilities with respect to a background model (Stormo, 2000). This reflects the fact that base frequencies matching the background pattern tell us nothing about transcription factor binding preferences, while deviation from this background frequency indicates sequence specificity.
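A minimal sketch of this calculation is given below. It counts bases at each position of a set of aligned binding sites, converts the counts to probabilities, and takes the log ratio with respect to a background model. The pseudocount, the uniform background, the choice of log base 2, and the example sites are assumptions of this sketch rather than details taken from the text.

```python
import math

BASES = "ACGT"

def position_weight_matrix(sites, background=None, pseudocount=1.0):
    """Log2 ratio of observed base probability to background, per position.

    sites       : aligned binding site sequences of equal length
    background  : dict of background base probabilities (uniform if None)
    pseudocount : added to every count to avoid log(0) (an assumption)
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for i in range(length):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background[b])
                    for b in BASES})
    return pwm

# Hypothetical aligned sites.
example_sites = ["ACGT", "ACGA", "ACGT", "TCGT"]
for position, scores in enumerate(position_weight_matrix(example_sites)):
    print(position, {b: round(s, 2) for b, s in scores.items()})
```

In this example, T at the first position ends up with a probability equal to the background value of 0.25 and therefore scores exactly zero, while the invariant C at the second position receives a positive score.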

Generating position weight matrices from Sort-Seq data

Figure 3.13: Extended analysis of the dgoR promoter. (A) Sequence logos were generated for the most upstream 60 bp region containing the putative RNAP and CRP binding sites

DECIPHERING THE REGULATORY GENOME OF ESCHERICHIA COLI, ONE HUNDRED PROMOTERS AT A TIME

Introduction

One of the most exciting MPRA-based X-Seq tools with broad biophysical reach is the Sort-Seq approach developed by Kinney, Murugan, et al. (2010), with proof-of-concept work on unannotated genes discussed in the previous chapter. In the results section, we provide a global view of the discoveries we made while surveying more than 100 promoters in E. coli.

Results

(0, 1) architectures, where the locations of the putative binding sites are highlighted in red and the identity of the bound transcription factors is revealed in the mass spectrometry data. 36% of all multigene operons have at least one TSS that transcribes only a subset of the genes in the operon (Conway et al., 2014).

Figure 4.1: The E. coli regulatory genome and the genes studied with Reg-Seq.

Discussion

The case studies described in the main text demonstrate the method's ability to deliver on the promise of systematically uncovering the regulatory genome. The current incarnation of the methods described here has focused on regions adjacent to the transcription start site.

Methods

Figure 4.6: Reg-Seq analysis of broadly-acting transcription factors. (A) GlpR as a widely-acting regulator

EXTENDED EXPERIMENTAL DETAILS

For each regulatory region, the mutated sequence spans 160 base pairs: 115 base pairs upstream and 45 base pairs downstream of the transcription start site (TSS) listed for that gene in Table A.1.
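The 160 bp window can be illustrated with a small helper that slices a genome string around a TSS. This is only a sketch under several assumptions that the text does not state: the TSS is given as a 0-based index, the TSS base is counted as the first of the 45 downstream bases, the genome is treated as linear, and genes on the reverse strand are returned as the reverse complement. The function name and the toy genome are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def promoter_window(genome, tss, strand, upstream=115, downstream=45):
    """Return the region around a TSS, read 5' to 3' on the gene's strand."""
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the reverse strand, upstream lies at higher genome coordinates.
        start, end = tss - downstream + 1, tss + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    if start < 0 or end > len(genome):
        raise ValueError("window extends past the end of the sequence")
    window = genome[start:end]
    if strand == "-":
        window = window.translate(COMPLEMENT)[::-1]  # reverse complement
    return window

toy_genome = "ACGT" * 100  # hypothetical 400 bp sequence
forward = promoter_window(toy_genome, 200, "+")
reverse = promoter_window(toy_genome, 200, "-")
print(len(forward), len(reverse))  # 160 160
```

With these conventions, index 115 of the returned window always corresponds to the TSS base, regardless of strand.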

Figure A.1: Promoter constructs for Reg-Seq. We show examples of the constructs generated for the Reg-Seq project

EXTENDED ANALYSIS DETAILS

Validating Reg-Seq against previous methods and results
Information footprints
Estimating mutual information from observed data and model predictions
Markov Chain Monte Carlo fitting procedure
TOMTOM motif comparison
BioInformatic methods - TOMTOM motif comparison

Looking at the marR promoter in Figure B.2(C), we can only see the signature of two MarR sites, although CpxR, Fis and CRP are known to bind to the promoter. We will define 𝑧_𝑞 as the order in the 𝑞th-order binding energy predictions.

Figure B.1: A summary of four direct comparisons of measurements using fluorescence and sorting and using RNA-Seq

Sigma Factors

Even for a smaller 𝜎 factor such as 𝜎³², there are about 100 known RNAP binding sites (Keseler et al., 2013). This is a similar scale to the number of known CRP or FNR binding sites (Keseler et al., 2013).

Binding sites regulating divergent operons

We compare the consensus sequences for each 𝜎 factor with the binding energy matrices for each of the discovered regulatory elements. A comparison of the types of architectures found in RegulonDB (Santos-Zavaleta et al., 2019) with the architectures with newly discovered binding sites found in the Reg-Seq study.

Table B.2: Identification of the 𝜎 factors used for each RNAP binding site.

Diffeomorphic modes

One of the first things we notice about this expression is that changing the value of 𝛼 will not affect the rank of any prediction (unless it is zero). If the differential change in R is a function of only the value of R, then there is no way that the ranking of the cells can change, and thus the transformation is a diffeomorphic mode.
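The rank argument can be checked numerically. The sketch below scales a set of hypothetical predictions by several positive values of 𝛼 and confirms that their rank order is unchanged, while 𝛼 = 0 collapses all predictions to the same value and destroys the ordering. Only positive 𝛼 is tested for rank preservation, which is an assumption of this sketch. The function names and the example values are illustrative.

```python
def ranks(values):
    """Rank of each value (0 = smallest); ties keep their input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result

def rank_preserved(values, alpha):
    """True if multiplying every value by alpha leaves the ranking unchanged."""
    return ranks([alpha * v for v in values]) == ranks(values)

predictions = [0.3, -1.2, 2.5, 0.9]  # hypothetical model predictions
print(all(rank_preserved(predictions, a) for a in (0.1, 1.0, 7.0)))  # True
print(rank_preserved(predictions, 0.0))                              # False
```

Any quantity that depends only on the ranking of the predictions is therefore insensitive to such a rescaling, so the data cannot constrain the scale.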

Figure B.11: The microstates of a one transcription factor promoter. We assume that any state with RNAP bound will transcribe at the rate given by "activity"

Diffeomorphic Mode Calculations

This equation cannot be expressed only as a function of R; therefore, an additive shift to the repressor site is not a diffeomorphic mode. This equation cannot be expressed only as a function of R; therefore, a multiplicative shift to the repressor site is not a diffeomorphic mode.

Genes

Tig is one of the chaperones that cooperates in the folding of newly synthesized cytosolic and secretory proteins (Keseler et al., 2013). Lipid A is the hydrophobic part of lipopolysaccharide (LPS), a glucosamine-containing saccharolipid that forms the outer layer of the outer membrane of most gram-negative bacteria (Raetz et al., 2007).

Construction of sequence logos

The relative heights of the bases 𝐴, 𝐶, 𝐺 and 𝑇 are displayed according to their relative probabilities scaled by information content (Schneider et al., 1986). We construct sequence logos using WebLogo (Crooks et al., 2004) and custom code from Justin Kinney available on the GitHub repo for the Sort-Seq project (https://github.com/RPGroup-PBoC/sortseq_belliveau).
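The scaling of letter heights can be sketched for a single position. The information content is taken as the maximum entropy (2 bits for four bases) minus the Shannon entropy of the observed base probabilities, and each letter's height is its probability times that information content. This sketch assumes a uniform background and omits the small-sample correction. It is not the WebLogo or Sort-Seq project code, and the function name is illustrative.

```python
import math

def logo_heights(probs):
    """Letter heights in bits for one logo position.

    probs maps each base to its probability at this position.
    """
    entropy = -sum(p * math.log2(p) for p in probs.values() if p > 0)
    information = math.log2(len(probs)) - entropy  # 2 bits maximum for DNA
    return {base: p * information for base, p in probs.items()}

# Hypothetical positions: fully conserved, two-base, and uninformative.
print(logo_heights({"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}))
print(logo_heights({"A": 0.5, "C": 0.5, "G": 0.0, "T": 0.0}))
print(logo_heights({"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}))
```

A fully conserved position yields a single letter 2 bits tall, while a position with uniform base usage has zero information and every letter has zero height.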