Decoding the Regulatory Genome

Of course, a lab is nothing without collaborators, and I landed with some of the best. By fitting our model to experimental data, we can determine the unknown parameter values in our model.

INTRODUCTION

The central dogma of molecular biology

In a process known as transcription, an RNA polymerase (RNAP) recognizes and binds to a region upstream of the gene known as a promoter and then copies the gene into a single-stranded messenger RNA (mRNA). During translation, the ribosome facilitates the matching of the mRNA message to transfer RNAs (tRNAs), which are RNA molecules that carry a codon recognition sequence (the anticodon) at one end and the corresponding amino acid as cargo at the other end.

Regulation of protein abundance and activity

Another common mechanism for post-transcriptional regulation in prokaryotes is the use of small RNA regulators (sRNAs). While mechanisms of regulation at all stages of the central dogma certainly inform each other, the focus of this work is the quantitative analysis of transcriptional regulation in prokaryotes.

Quantitative models of transcriptional regulation

The partition function $Z$ can be thought of as the sum of the statistical mechanical weights of all microstates in the system, $Z = \sum_s e^{-\beta E_s}$, where $E_s$ is the energy of microstate $s$ and $\beta = 1/k_BT$. The value of $p_{\text{bound}}$ is given by the sum of the Boltzmann weights of all microstates in which an RNAP is bound to the specific site, divided by $Z$.
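To make the microstate sum concrete, here is a minimal Python sketch of the simplest case. It assumes that P RNAP molecules are distributed over N_NS nonspecific genomic sites and a single promoter site, with one polymerase per site. The energy difference Δε = ε_S - ε_NS is given in units of k_BT. The values P = 500, N_NS = 4.6 × 10^6 and Δε = -5 k_BT are illustrative assumptions, not parameters taken from this work.

```python
import math


def p_bound_exact(P: int, N_NS: int, delta_eps: float) -> float:
    """Probability that one of P RNAP molecules occupies the promoter.

    Microstates: all P polymerases on the N_NS nonspecific sites (promoter
    empty), or P - 1 on nonspecific sites and one on the promoter.
    delta_eps = eps_S - eps_NS, in units of k_B T.
    """
    # Ratio of multiplicities C(N_NS, P - 1) / C(N_NS, P) = P / (N_NS - P + 1)
    multiplicity_ratio = P / (N_NS - P + 1)
    # Total weight of "promoter bound" states relative to "promoter empty" states
    x = multiplicity_ratio * math.exp(-delta_eps)
    return x / (1.0 + x)


def p_bound_approx(P: int, N_NS: int, delta_eps: float) -> float:
    """Same quantity in the limit N_NS >> P, where the multiplicity ratio is P / N_NS."""
    x = (P / N_NS) * math.exp(-delta_eps)
    return x / (1.0 + x)


for de in (-2.0, -5.0, -8.0):
    print(de, p_bound_exact(500, 4_600_000, de), p_bound_approx(500, 4_600_000, de))
```

For a genome-sized N_NS, the multiplicity ratio P/(N_NS - P + 1) is essentially P/N_NS. The two functions therefore agree to about one part in 10^4, which is why the large-N_NS form is the one normally used.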

Figure 1.4: Modeling transcription using statistical mechanics. To model gene expression, we make the assumption that gene expression is proportional to the probability that RNAP is bound to the promoter, $p_{\text{bound}}$ [24].

The diversity of transcriptional regulatory mechanisms

IPTG is used because, unlike allolactose, IPTG is not broken down by β-galactosidase, meaning that its concentration remains constant for the duration of the experiment. Furthermore, although the mechanisms of loop formation are known to depend on the DNA sequence of the loop region, the bending due to HU is not affected by the relative “stiffness” of the DNA [8].

Figure 1.6: Distribution of regulatory architectures in E. coli. (A) We classify regulatory architectures according to the number of activator sites A and repressor sites R in a promoter region, using the notation (A, R).

The state of knowledge of transcriptional regulation

Experimental methods for discovering and analyzing regulatory motifs

A central goal of the present work is to dissect regulatory regions “from start

The fluorescence intensity of each DNA spot on the microarray serves as a readout of the transcription factor's affinity for the DNA sequence within that spot, so measuring the intensity of every spot determines the affinity of the transcription factor for each sequence on the array [55].

Figure 1.9: Diverse methods assay multiple aspects of transcription factor binding.

Computational methods for analyzing data

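Now that we have expressions for the prior $P(\Delta\varepsilon_{TF})$ and the likelihood $P(D \mid \Delta\varepsilon_{TF})$, we can use Bayes' theorem to write an expression for the posterior, $P(\Delta\varepsilon_{TF} \mid D) \propto P(D \mid \Delta\varepsilon_{TF})\, P(\Delta\varepsilon_{TF})$. As mentioned earlier, the posterior distribution of the binding energy of the transcription factor can be represented by the Student's t-distribution given in Equation 1.30.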
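The claim that the posterior takes the form of a Student's t-distribution can be checked numerically. The sketch below rests on several hypothetical assumptions:

* Repeated measurements x_i scatter around the binding energy with Gaussian noise of unknown σ.
* The prior on the energy is uniform.
* The prior on σ is the Jeffreys prior 1/σ.

Under these assumptions, integrating out σ leaves a Student's t-distribution with n - 1 degrees of freedom. These priors may differ from the ones behind Equation 1.30, and the data values are made up.

```python
import math

import numpy as np


def marginal_log_posterior(mu: np.ndarray, data: np.ndarray) -> np.ndarray:
    """Unnormalized log P(mu | D) after integrating out sigma.

    Assumes x_i = mu + Gaussian noise of unknown sigma, a uniform prior on mu
    and a Jeffreys prior 1/sigma. Then P(mu | D) ∝ [sum_i (x_i - mu)^2]^(-n/2).
    """
    n = data.size
    sum_sq = np.sum((data[None, :] - mu[:, None]) ** 2, axis=1)
    return -0.5 * n * np.log(sum_sq)


def grid_posterior(data: np.ndarray, width: float = 20.0, n_points: int = 4001):
    """Evaluate and normalize the marginal posterior on a grid centered on the mean."""
    scale = data.std(ddof=1) / math.sqrt(data.size)
    mu = np.linspace(data.mean() - width * scale, data.mean() + width * scale, n_points)
    log_post = marginal_log_posterior(mu, data)
    post = np.exp(log_post - log_post.max())  # subtract the max to avoid underflow
    post /= post.sum() * (mu[1] - mu[0])      # normalize so the density integrates to 1
    return mu, post


def student_t_posterior(mu: np.ndarray, data: np.ndarray) -> np.ndarray:
    """Analytic result: Student's t with n - 1 dof, location = mean, scale = s / sqrt(n)."""
    n = data.size
    nu = n - 1
    scale = data.std(ddof=1) / math.sqrt(n)
    z = (mu - data.mean()) / scale
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi) - math.log(scale))
    return np.exp(log_norm - 0.5 * (nu + 1) * np.log1p(z ** 2 / nu))


# Hypothetical repeated binding-energy measurements, in k_B T
data = np.array([-13.9, -15.2, -14.6, -15.0, -14.1, -14.8, -15.3, -14.4])
mu, post = grid_posterior(data)
print(mu[np.argmax(post)], np.max(np.abs(post - student_t_posterior(mu, data))))
```

The numerically marginalized posterior and the analytic t density coincide. Its width, set by s/√n with heavier-than-Gaussian tails, reflects the uncertainty in σ.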

Figure 1.10: Using Bayesian inference to determine the value of a parameter.

BIBLIOGRAPHY

Modulation of DNA loop lifetime by free energy of loop formation. Proceedings of the National Academy of Sciences.
Composability of regulatory sequences controlling transcription and translation in Escherichia coli. Proceedings of the National Academy of Sciences.

TUNING TRANSCRIPTIONAL REGULATION THROUGH SIGNALING: A PREDICTIVE THEORY OF ALLOSTERIC INDUCTION

Introduction

For example, when we have sufficient confidence in the model, a single set of data can be used to accurately extrapolate the behavior of a system to other conditions. Induction is characterized by the addition of an effector which binds to the repressor and stabilizes the inactive state (defined as the state that has a low affinity for DNA), thereby increasing gene expression.

Figure 2.1: Transcription regulation architectures involving an allosteric repressor.

Results

Instead, we measure the fold change in gene expression due to the presence of a repressor, defined as the ratio of gene expression in the presence of repressor to that in its absence. Specifically, we show predictions for the (F) leakiness, (G) saturation, (H) dynamic range, (I) [EC50], and (J) effective Hill coefficient of the induction profiles.
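The five properties can all be computed directly from a fold-change function. The sketch below assumes the standard MWC simple-repression form with n = 2 inducer binding sites:

* p_active(c) = (1 + c/K_A)^n / [(1 + c/K_A)^n + e^(-βΔε_AI) (1 + c/K_I)^n]
* fold-change = [1 + p_active(c) (R/N_NS) e^(-βΔε_RA)]^(-1)

K_A = 139 µM matches the best-fit value quoted in Figure 2.22. The remaining values (K_I = 0.53 µM, Δε_AI = 4.5 k_BT, N_NS = 4.6 × 10^6, R = 260 and Δε_RA = -13.9 k_BT) are illustrative assumptions. EC50 is found by bisection. The effective Hill coefficient is taken as twice the log-log slope of the normalized response at EC50, which recovers h exactly for a pure Hill function.

```python
import math

# Illustrative parameters (assumed): concentrations in uM, energies in k_B T.
K_A = 139.0      # matches the best-fit value quoted in Figure 2.22
K_I = 0.53       # assumed
EPS_AI = 4.5     # assumed energy difference between inactive and active states
N_SITES = 2      # assumed number of inducer binding sites per repressor
N_NS = 4.6e6     # assumed number of nonspecific sites


def p_active(c: float) -> float:
    """MWC probability that the repressor is active at inducer concentration c."""
    active = (1 + c / K_A) ** N_SITES
    inactive = math.exp(-EPS_AI) * (1 + c / K_I) ** N_SITES
    return active / (active + inactive)


def fold_change(c: float, R: float, eps_RA: float) -> float:
    """Simple-repression fold change for R repressors binding with energy eps_RA."""
    return 1.0 / (1.0 + p_active(c) * (R / N_NS) * math.exp(-eps_RA))


def effective_log_slope(f, x: float, step: float = 1e-4) -> float:
    """Central-difference estimate of d ln f / d ln x at x."""
    up = math.log(f(x * math.exp(step)))
    down = math.log(f(x * math.exp(-step)))
    return (up - down) / (2 * step)


def induction_properties(R: float, eps_RA: float):
    """Return leakiness, saturation, dynamic range, EC50 and effective Hill coefficient."""
    leakiness = fold_change(0.0, R, eps_RA)
    # As c -> infinity, p_active -> 1 / (1 + e^{-EPS_AI} (K_A / K_I)^n)
    p_sat = 1.0 / (1.0 + math.exp(-EPS_AI) * (K_A / K_I) ** N_SITES)
    saturation = 1.0 / (1.0 + p_sat * (R / N_NS) * math.exp(-eps_RA))
    dynamic_range = saturation - leakiness

    def normalized(c: float) -> float:
        return (fold_change(c, R, eps_RA) - leakiness) / dynamic_range

    # Fold change rises monotonically with c (because K_I < K_A), so bisect in log10(c).
    lo, hi = -6.0, 6.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if normalized(10 ** mid) < 0.5:
            lo = mid
        else:
            hi = mid
    ec50 = 10 ** (0.5 * (lo + hi))
    # Effective Hill coefficient: twice the log-log slope of the normalized response at EC50
    hill = 2 * effective_log_slope(normalized, ec50)
    return leakiness, saturation, dynamic_range, ec50, hill


print(induction_properties(R=260, eps_RA=-13.9))
```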

Figure 2.2: States and weights for the simple repression motif. (A) RNAP (light blue) and a repressor compete for binding to a promoter of interest

Discussion

We are consistently able to accurately predict the leakiness, saturation and dynamic range for each of the strains. In conclusion, our application of the MWC model provides an accurate, predictive framework for understanding simple repression by allosteric transcription factors.

Methods

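If we assume that these errors $\epsilon_i$ are normally distributed with mean zero and standard deviation $\sigma$, the likelihood of the data given the parameters is

$$P(D \mid K_A, K_I, \sigma) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{\left(y_i - \text{fold-change}(c_i; K_A, K_I)\right)^2}{2\sigma^2}\right],$$

where $y_i$ is the $i$-th measured fold change at IPTG concentration $c_i$. All data used in this work as well as all relevant code can be found on the dedicated website http://rpgroup-pboc.github.io/mwc_induction.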
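A minimal sketch of how such a Gaussian likelihood can be evaluated and maximized follows. It uses synthetic data and a coarse grid search over (K_A, K_I) at fixed σ. The fold-change function, all default parameter values, the IPTG concentrations and the noise level are assumptions for illustration. This is not the code used in this work.

```python
import numpy as np


def fold_change(c, K_A, K_I, R=260.0, eps_RA=-13.9, eps_AI=4.5, n=2, N_NS=4.6e6):
    """MWC simple-repression fold change; all defaults are illustrative assumptions."""
    active = (1 + c / K_A) ** n
    inactive = np.exp(-eps_AI) * (1 + c / K_I) ** n
    p_active = active / (active + inactive)
    return 1.0 / (1.0 + p_active * (R / N_NS) * np.exp(-eps_RA))


def log_likelihood(K_A, K_I, sigma, c, y):
    """Gaussian log-likelihood: y_i = fold_change(c_i) + e_i with e_i ~ N(0, sigma^2)."""
    resid = y - fold_change(c, K_A, K_I)
    return (-0.5 * y.size * np.log(2 * np.pi * sigma ** 2)
            - np.sum(resid ** 2) / (2 * sigma ** 2))


# Synthetic data from assumed "true" values; these are not measurements from this work
rng = np.random.default_rng(0)
c = np.logspace(-2, 4, 12)  # IPTG concentrations in uM
sigma = 0.02
y = fold_change(c, 139.0, 0.53) + rng.normal(0.0, sigma, size=c.size)

# Coarse grid search for the maximum-likelihood (K_A, K_I) at fixed sigma
K_A_grid = np.logspace(1, 3, 81)
K_I_grid = np.logspace(-1, 1, 81)
ll = np.array([[log_likelihood(ka, ki, sigma, c, y) for ki in K_I_grid] for ka in K_A_grid])
i_best, j_best = np.unravel_index(np.argmax(ll), ll.shape)
print(f"K_A ~ {K_A_grid[i_best]:.3g} uM, K_I ~ {K_I_grid[j_best]:.3g} uM")
```

The grid search is used here only because it is transparent. In practice one would sample the full posterior, for example with Markov chain Monte Carlo, and report credible regions rather than a single best-fit point.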

Supplemental Information: Inferring Allosteric Parameters from Previ- ous Data

10], who both measured fold change with the same simple repression system in the absence of inducer (c = 0), but at various repressor copy numbers R. Note that this functional form does not exactly match our fold-change Equation 2.5 in the limit c = 0.

Figure 2.8: Multiple sets of parameters yield identical fold-change responses.

Supplemental Information: Induction of Simple Repression with Multiple Promoters or Competitor Sites

However, if the number of competitor sites and their average binding energy are known, they can be accounted for in the model. This mimics the common scenario in which a transcription factor has multiple binding sites in the genome.

Figure 2.10: Induction with variable R and multiple specific binding sites.

Supplemental Information: Flow Cytometry

The fold change in gene expression for equivalent simple repression constructs was determined using three independent methods: flow cytometry (this work), colorimetric Miller assays [9] and video microscopy [10]. The consistency of these three readouts validates the quantitative use of flow cytometry and unsupervised gating to determine the fold change in gene expression.

Figure 2.15: Plate arrangements for flow cytometry. (A) Samples were measured primarily in the forward arrangement with a subset of samples measured in reverse.

Supplemental Information: Single-Cell Microscopy

For comparison with the flow cytometry results, cells were cultured in the same manner as described in the main text. We note that the credible regions from the microscopy data shown in Figure 2.21B are much broader than those from flow cytometry due to the smaller number of replicates performed.

Figure 2.18: Experimental workflow for single-cell microscopy. For comparison with the flow cytometry results, the cells were grown in an identical manner to those described in the main text

Supplemental Information: Fold-Change Sensitivity Analysis

As in panel A, but showing how fold-change sensitivity varies across repressor copy numbers. As in panel C, the sensitivity of fold change with respect to $K_I$ is again smallest (by an order of magnitude) for the low repressor copy number strains.

Figure 2.22: Determining how sensitive the fold-change values are to the fit values of the dissociation constants. (A) The difference $\Delta\text{fold-change}_{K_A}$ in fold change when the dissociation constant $K_A$ is slightly offset from its best-fit value $K_A = 139^{+29}$

Supplemental Information: Alternate Characterizations of Induction

In this section we discuss a different way to describe the induction data, namely,

Although the curves in Figure 2.25 are almost identical to those in Figure 2.4 (made using the MWC model, Equation 2.5), we emphasize that the Hill function approximation is more complex than the MWC model (it contains four parameters instead of three) and it hides the relationship to the physical parameters of the system. For our purposes, the Hill function in Equation 2.46 does not capture the connection to the physics of the system and provides no intuition about how transcription depends on such mutations.

Figure 2.23: Hill function and MWC analysis of each induction profile. Data for each individual strain was fit to the general Hill function in Equation 2.44

Supplemental Information: Global Fit of All Parameters

The fit values for the repressor copy numbers were all within one standard deviation of the previously reported values given in Ref. Note that there is overlap between all the repressor copy numbers and that the net difference in the repressor-DNA binding energies is less than 1 $k_BT$.

Table 2.2: Key model parameters for induction of an allosteric repressor.

Supplemental Information: Applicability of Theory to the Oid Operator Sequence

The same experimental data are plotted against the best-fitting parameters obtained from the full O1, O2, O3 and Oid data sets, which were used to derive $K_A$, $K_I$, the repressor copy numbers and the binding energies of all operators (see Supplementary Section 2.11). Fold-change curves for the different repressor-DNA binding energies $\Delta\varepsilon_{RA}$ are plotted as a function of repressor copy number at IPTG concentration $c = 0$.

Figure 2.28: Predictions of fold-change for strains with an Oid binding sequence versus experimental measurements with different repressor copy numbers

Supplemental Information: Comparison of Parameter Estimation and Fold-Change Predictions across Strains

Fold change in expression is plotted as a function of IPTG concentration for all strains containing an O1 operator. Fold change in expression is plotted as a function of IPTG concentration for all strains containing an O2 operator.

Figure 2.30: O1 strain fold-change predictions based on strain-specific parameter estimation of $K_A$ and $K_I$.

Supplemental Information: Properties of Induction Titration Curves

In this section, we expand on the phenotypic properties of the induction response

Both [EC50] and h vary significantly with repressor copy number for sufficiently strong operator binding energies. Interestingly, for weak operator binding energies of the order of the O3 operator, it is predicted that the effective Hill coefficient should not change with repressor copy number.

Figure 2.33: Dependence of leakiness, saturation, and dynamic range on the operator binding energy and repressor copy number

Supplemental Information: Applications to Other Regulatory Architectures

In the case of inducible activation, binding of an effector molecule to an activator transcription factor increases the fold change in gene expression. The right panel shows how varying the polymerase-activator interaction energy $\varepsilon_{AP}$ alters the fold change.

Figure 2.35: Representative fold-change predictions for allosteric corepression and activation

Supplemental Information: E. coli Primer and Strain List

Asymmetric configurations in a redesigned homodimer reveal multiple subunit communication pathways in protein allostery. The Journal of Biological Chemistry.
Chemosensing in Escherichia coli: Two regimes of two-state receptors. Proceedings of the National Academy of Sciences.

Table 2.4: Promoter sequences and primers used in this work. The listed promoter sequences were randomly mutated to produce libraries for use in Sort-Seq experiments

A SYSTEMATIC APPROACH FOR DISSECTING THE MOLECULAR MECHANISMS OF TRANSCRIPTIONAL REGULATION IN BACTERIA

Introduction

In recent years, a number of massively parallel reporter assays have been developed for dissecting the functional architecture of transcriptional regulatory sequences in bacteria, yeast and metazoans. However, no approach has yet been established to use massively parallel reporter technologies to decipher the functional mechanisms of previously uncharacterized regulatory sequences.

Results

The switching graphs of the coupled expressions are shown in Figure 3.5A. While we performed Sort-Seq on a larger. Multiple RNAP binding sites were identified using Sort-Seq performed on a ΔdgoR strain grown in M9 minimal medium with 0.5% glucose. detailed further in Supplementary Figure 3.16).

Figure 3.1: Overview of approach to characterize transcriptional regulatory DNA, using Sort-Seq and mass spectrometry

Discussion

Microarray-synthesized promoter libraries, combined with measurement of expression from barcoded transcripts using RNA-seq instead of flow cytometry, could allow multiple loci to be studied simultaneously [14, 18]. Landing pad technologies for chromosomal integration [58–60] should enable massively parallel reporter assays to be performed in chromosomes rather than on plasmids.

Methods

Bacterial strains

In Figure 3.9C, the coefficient of variation is plotted for each of the proteins measured in this study, separated by whether their promoter contains any known transcription factor binding sites (identified from RegulonDB [1]). The number of copies was measured to be 675 copies per cell when cells were grown in galactose, and 15 copies per cell or less under all other conditions considered.
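The coefficient of variation of a protein's copy number across growth conditions is its standard deviation divided by its mean. The sketch below assumes the sample standard deviation. The copy numbers are hypothetical and loosely patterned on a protein that is highly expressed in only one condition.

```python
import statistics


def coefficient_of_variation(copy_numbers):
    """Sample standard deviation divided by the mean, for one protein across conditions."""
    return statistics.stdev(copy_numbers) / statistics.mean(copy_numbers)


# Hypothetical copy numbers per cell across four growth conditions
induced_once = [675, 15, 15, 15]   # strongly expressed in one condition only
steady = [200, 210, 190, 200]      # roughly constant expression
print(coefficient_of_variation(induced_once), coefficient_of_variation(steady))
```

A protein expressed strongly in a single condition has a coefficient of variation well above 1. A constitutively expressed protein stays close to 0.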

Figure 3.8: Summary of transcriptional regulatory knowledge in E. coli. Left panel: well-characterized promoters considered in this work.

Supplemental Information: Characterization of library diversity and sorting sensitivity

Due to the large number of possible triple point mutants in a 60 bp region, only a small subset of the possible sequences will be found in the library. In (B), cells were sorted as described in the main text, into 4 bins each containing 15% of the population.
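The scale of the problem follows from simple counting. A sequence of length L has C(L, k) · 3^k variants with exactly k substitutions, because each of the k chosen positions can mutate to any of the three other bases. A short sketch:

```python
from math import comb


def n_variants(length: int, k: int) -> int:
    """Number of sequences differing from a reference at exactly k positions."""
    # Choose which k positions mutate, then one of 3 alternative bases at each
    return comb(length, k) * 3 ** k


for k in (1, 2, 3):
    print(k, n_variants(60, k))
```

For a 60 bp region this gives 180 single, 15,930 double and 923,940 triple mutants.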

Figure 3.10: Analysis of the library mutation spectrum and effect of Sort-Seq sorting conditions.

Supplemental Information: Generation of sequence logos

Finally, the values of the position weight matrix are found by calculating the log-likelihood of each base relative to the background model [86]. The total information content of the position weight matrix is then the sum of the information content at each position along the binding site.
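A minimal sketch of these two steps follows. It takes per-position base counts as input and assumes a pseudocount of 1, a uniform background (q_b = 0.25) and base-2 logarithms, so that information is measured in bits. These choices are assumptions; the procedure behind the logos in Figure 3.11 may use a different background or pseudocount.

```python
import numpy as np


def pwm_and_information(counts, pseudocount=1.0, background=(0.25, 0.25, 0.25, 0.25)):
    """Position weight matrix (bits) and information content from base counts.

    counts: array of shape (L, 4), columns in A, C, G, T order.
    """
    counts = np.asarray(counts, dtype=float) + pseudocount
    p = counts / counts.sum(axis=1, keepdims=True)  # per-position base frequencies
    q = np.asarray(background, dtype=float)
    pwm = np.log2(p / q)                            # log-likelihood ratio vs. background
    info = np.sum(p * pwm, axis=1)                  # information content per position
    return pwm, info, info.sum()                    # total = sum over positions


# Hypothetical counts for a 3 bp site: conserved A, uniform, mostly G
counts = [[100, 0, 0, 0], [25, 25, 25, 25], [5, 5, 85, 5]]
pwm, info, total = pwm_and_information(counts)
print(info, total)
```

A uniform position carries 0 bits. A fully conserved position approaches the maximum of 2 bits, and the pseudocount keeps it slightly below that.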

Figure 3.11: Comparison between Sort-Seq and genomic-based sequence logos.

Supplemental Information: Statistical mechanical model of the DNA affinity chromatography approach

With respect to our two purifications shown in Figure 3.12D, $\Delta\varepsilon_{TF,s}$ refers to the binding energy of the transcription factor to its target binding site, while $\Delta\varepsilon_{TF,ns}$ refers to the non-specific binding energy to the non-target (reference) DNA. Equation 3.8 can be used to calculate the expected ratio of transcription factor bound to target DNA versus reference DNA (Equation 3.10), obtained by evaluating Equation 3.8 once with $\Delta\varepsilon_{TF,s}$ and once with $\Delta\varepsilon_{TF,ns}$ and taking their quotient.
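To see how this ratio behaves, the sketch below assumes, hypothetically, that each occupancy has the simple two-state form p = x/(1 + x) with x = A·e^(-βΔε). Here A is a lumped prefactor standing in for the transcription factor concentration, and this form may not match Equation 3.8 exactly. Two limits follow from the assumption:

* When binding is weak (x ≪ 1), the ratio approaches e^(-β(Δε_TF,s - Δε_TF,ns)).
* When both DNAs are saturated, it approaches 1.

Strong overall binding therefore compresses the measurable enrichment.

```python
import math


def occupancy(prefactor: float, delta_eps: float) -> float:
    """Two-state occupancy p = x / (1 + x), with x = prefactor * exp(-delta_eps).

    delta_eps is in units of k_B T; prefactor is a hypothetical lumped
    concentration factor, not a parameter taken from Equation 3.8.
    """
    x = prefactor * math.exp(-delta_eps)
    return x / (1.0 + x)


def enrichment_ratio(prefactor: float, eps_s: float, eps_ns: float) -> float:
    """Expected ratio of TF bound to target DNA vs. reference DNA."""
    return occupancy(prefactor, eps_s) / occupancy(prefactor, eps_ns)


# Hypothetical energies (k_B T) for the target and the non-specific reference DNA
eps_s, eps_ns = -15.0, -5.0
for prefactor in (1e-12, 1e-6, 1e3):
    print(prefactor, enrichment_ratio(prefactor, eps_s, eps_ns))
```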

Supplemental Information: DNA affinity chromatography and mass spectrometry experimentation and analysis

In the left panel of Figure 3.12B we show the mean enrichment values that were measured for each of the detected proteins. Here we have taken the logarithm of the enrichment ratios so that the bins have equal width on a logarithmic scale.
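As a small illustration of why the logarithm helps, the sketch below averages hypothetical replicate enrichment ratios in log2 space and then bins them with equal-width bins. Averaging in log space is equivalent to taking a geometric mean; that this is how replicates are combined is our assumption. On a log scale, a two-fold enrichment (+1) and a two-fold depletion (-1) sit symmetrically about zero, and every bin spans the same fold change.

```python
import numpy as np

# Hypothetical enrichment ratios (target DNA / reference DNA): one row per
# protein, one column per replicate purification; not data from this work.
replicates = np.array([
    [1.1, 0.9, 1.0],
    [0.5, 0.45, 0.55],
    [8.0, 12.0, 10.0],
    [1.3, 1.2, 1.5],
    [0.8, 1.0, 0.9],
])
# Average in log2 space (equivalent to a geometric mean of the ratios)
mean_log2 = np.log2(replicates).mean(axis=1)
# Equal-width bins in log2 space, so each bin spans the same fold change
counts, edges = np.histogram(mean_log2, bins=5)
print(mean_log2, counts, edges)
```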

Figure 3.12: Identification of transcription factors using DNA-affinity chromatography and mass spectrometry.

Supplemental Information: Selection of the mutagenesis window for promoter dissection by Sort-Seq