INTRODUCTION
1.6 Experimental methods for discovering and analyzing regulatory motifs
A central goal of the present work is to dissect regulatory regions “from start to finish,” which includes discovering and identifying transcription factor binding sites for unannotated promoters, determining the identities and regulatory roles of these transcription factors, and measuring transcription factor binding energies.
Ultimately it is desirable to perform such an analysis in a high-throughput manner for unannotated promoters genome-wide, but this will require the development of new methodology. However, a number of methods currently exist that can be leveraged to dissect regulatory architectures on a low- to mid-throughput scale.
Here we discuss three classes of methods that have been used previously to analyze regulatory architectures: occupancy assays (Figure 1.9A), in vitro affinity assays (Figure 1.9B), and massively parallel reporter assays (MPRAs) (Figure 1.9C). We give special attention to MPRAs as they are used extensively in this work.
Occupancy assays create maps of the locations of nucleotide-binding elements throughout the genome. For example, Chromatin ImmunoPrecipitation (ChIP)-seq is a common technique for determining the locations of transcription factors and histones [47, 48]. In a ChIP-seq experiment, the genome is fragmented and antibodies are introduced which target a transcription factor or histone of interest. These antibodies are immunoprecipitated to produce a sample containing the targeted protein along with any bound DNA fragments. These DNA fragments are sequenced and aligned to the genome to create a map of the binding sites for the targeted protein.
Similar occupancy-based techniques can be used for applications such as determining the distribution of ribosomes [49] and identifying nucleosome binding regions, regions of open chromatin, and other regulatory elements in eukaryotes [50–54].
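As a rough illustration of the final mapping step in an occupancy assay, the sketch below piles up aligned fragments into per-position counts and reports the regions whose coverage reaches a threshold. The function names (`coverage`, `call_peaks`), the toy fragment coordinates, and the fixed threshold are assumptions made for illustration only; actual ChIP-seq analyses rely on dedicated aligners and peak callers with statistical background models.

```python
def coverage(fragments, genome_length):
    """Count how many aligned fragments overlap each genome position.

    fragments: list of (start, end) half-open intervals, 0-based.
    """
    # Difference array: +1 where a fragment starts, -1 just past its end.
    diff = [0] * (genome_length + 1)
    for start, end in fragments:
        diff[start] += 1
        diff[end] -= 1
    counts, running = [], 0
    for delta in diff[:genome_length]:
        running += delta
        counts.append(running)
    return counts


def call_peaks(counts, threshold):
    """Return (start, end) half-open intervals where counts >= threshold."""
    peaks, start = [], None
    for pos, c in enumerate(counts):
        if c >= threshold and start is None:
            start = pos
        elif c < threshold and start is not None:
            peaks.append((start, pos))
            start = None
    if start is not None:
        peaks.append((start, len(counts)))
    return peaks


# Hypothetical aligned fragments on a 20-bp toy genome.
fragments = [(2, 8), (4, 10), (5, 9), (14, 18)]
counts = coverage(fragments, 20)
peaks = call_peaks(counts, threshold=3)
```

In this toy example the three overlapping fragments produce a single candidate binding region, while the isolated fragment does not reach the threshold.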
Occupancy assays can be used to determine the rough sequence specificity of a given transcription factor, as they provide multiple examples of sequences that bind to the transcription factor of interest. However, these assays provide no information regarding transcription factor affinity, that is, the binding energy between the transcription factor and a given sequence. A number of in vitro methods have been devised to sensitively determine transcription factor sequence specificity and binding affinity [55–58]. In vitro methods allow one to assay the interactions between purified transcription factors and thousands of sequence variants. For example, protein-binding microarrays (PBMs) are a common, straightforward assay for assessing sequence specificity. In this assay, a microarray spotted with thousands of DNA sequence variants is incubated with fluorescently labeled transcription factor.
The transcription factor binds with some probability to each DNA spot, depending on the affinity of the transcription factor for the DNA sequence. Measuring the fluorescent intensity of each spot allows one to determine the affinity of the transcription factor for the DNA sequence within the spot [55]. Other in vitro methods such as MITOMI [56], HT-SELEX [57], and Spec-seq [58] similarly allow for high-sensitivity measurements of transcription factor binding affinities and sequence specificities. A distinct advantage of in vitro techniques is that they can be used to analyze low-affinity binding events [56, 59, 60]. However, a major drawback of in vitro techniques is that they cannot fully capture the subtleties of in vivo protein binding, which include competition from other proteins, the influence of small molecules, and DNA shape effects. Additionally, both in vitro methods and occupancy methods focus on specific proteins that must be purified or immunoprecipitated, and thus are not especially useful for analyzing the regulatory architectures of specific promoters which may lack full regulatory annotation.
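To make the link between spot intensity and affinity in a PBM-style measurement concrete, the following sketch assumes a simple two-state equilibrium model in which the probability that a site is bound is [TF] / ([TF] + K_d), and in which spot intensity is proportional to that probability. Both assumptions, along with the function names and the numerical values, are illustrative and are not taken from the cited methods, which use more careful calibration.

```python
def occupancy(tf_conc, kd):
    """Equilibrium probability that a site is bound (simple two-state model).

    tf_conc and kd must be in the same concentration units.
    """
    return tf_conc / (tf_conc + kd)


def kd_from_intensity(intensity, max_intensity, tf_conc):
    """Invert the occupancy relation, assuming intensity is proportional
    to occupancy and max_intensity corresponds to a fully bound spot."""
    frac = intensity / max_intensity
    if not 0.0 < frac < 1.0:
        raise ValueError("intensity must lie strictly between 0 and max_intensity")
    return tf_conc * (1.0 - frac) / frac


# Hypothetical values: 10 nM transcription factor, two DNA spots.
tf_conc = 10.0
p_strong = occupancy(tf_conc, kd=1.0)    # high-affinity sequence
p_weak = occupancy(tf_conc, kd=100.0)    # low-affinity sequence

# Recover the dissociation constant from a simulated intensity reading.
kd_estimate = kd_from_intensity(intensity=p_weak * 5000.0,
                                max_intensity=5000.0,
                                tf_conc=tf_conc)
```

Under these assumptions, a sequence with a larger dissociation constant yields a dimmer spot, and the relation can be inverted to estimate the dissociation constant whenever the spot is neither empty nor saturated.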
Massively parallel reporter assays (MPRAs, reviewed in Refs. [61, 62] and schematized in Figure 1.9C) are a diverse, versatile class of assays that can be used to analyze multiple aspects of transcriptional regulation either locally or genome-wide. In general, MPRAs are performed by positioning a library of promoter variants upstream of a reporter gene. Variations to the promoter can include single-nucleotide mutations to the promoter region [13, 63–65], transcription factor arrangement [66–68], spacing between binding sites [66, 68], or any other modification that can be made at the nucleotide level. These variations alter the promoter’s regulatory properties, resulting in a change in the reporter gene’s expression level. If the reporter gene is fluorescent, the cells may then be sorted into bins according to their fluorescence using fluorescence-activated cell sorting (FACS) [13, 66]. The contents of each bin are sequenced, and the sequence of each promoter variant is thereby associated with the reporter gene’s expression level. Another common strategy is to associate the promoter variant with a barcode that is transcribed along with the reporter gene [63–65, 67, 68]. One can then sequence the reporter gene’s mRNA transcripts and count the number of times that each barcode appears, thus associating gene expression with promoter variant in a fine-grained manner.
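The two readout strategies described above can be sketched in a few lines. The first function averages the FACS bin index over all reads of each promoter variant; the second sums barcode counts from mRNA reads for each variant. Normalizing the mRNA counts by barcode counts from the DNA library is an assumption added here to correct for uneven library representation, and the function names, toy sequences, and barcodes are all hypothetical.

```python
from collections import Counter, defaultdict


def mean_bin_per_variant(sorted_reads):
    """Average FACS bin index for each promoter variant.

    sorted_reads: iterable of (bin_index, promoter_sequence) pairs,
    one pair per sequencing read.
    """
    totals = defaultdict(int)
    reads = Counter()
    for bin_index, seq in sorted_reads:
        totals[seq] += bin_index
        reads[seq] += 1
    return {seq: totals[seq] / reads[seq] for seq in reads}


def expression_from_barcodes(rna_barcodes, dna_barcodes, barcode_to_variant):
    """RNA/DNA barcode count ratio, pooled over the barcodes of each variant."""
    rna = Counter(rna_barcodes)
    dna = Counter(dna_barcodes)
    rna_total = Counter()
    dna_total = Counter()
    for barcode, variant in barcode_to_variant.items():
        rna_total[variant] += rna[barcode]
        dna_total[variant] += dna[barcode]
    # Skip variants that were never observed in the DNA library.
    return {v: rna_total[v] / dna_total[v] for v in dna_total if dna_total[v] > 0}


# Toy sorted reads: (bin, promoter variant).
sorted_reads = [(1, "AATTGTGAGCGG"), (1, "AATTGTGAGCGG"), (2, "AATTGTGAGCGG"),
                (4, "AATCGTGAGCGG"), (3, "AATCGTGAGCGG")]
mean_bins = mean_bin_per_variant(sorted_reads)

# Toy barcode data: two barcodes tag variant "wt", one tags variant "mut".
barcode_to_variant = {"ACGT": "wt", "TTAG": "wt", "GGCA": "mut"}
rna_reads = ["ACGT"] * 6 + ["TTAG"] * 2 + ["GGCA"] * 1
dna_reads = ["ACGT"] * 2 + ["TTAG"] * 2 + ["GGCA"] * 2
expression = expression_from_barcodes(rna_reads, dna_reads, barcode_to_variant)
```

In both cases the result is a table that pairs each promoter variant with a scalar expression estimate, which is the starting point for any downstream analysis of how sequence changes affect regulation.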
A number of studies use MPRAs to assay systematic perturbations to individual promoters [13, 63–68]. In addition, many variations on MPRAs have been developed to assay numerous other aspects of transcriptional regulation, such as analyzing regulatory “parts” for synthetic biology applications [69] or testing models for predicting regulatory motifs in human cells [70, 71]. With minor modifications, the technique is also well-suited to genome-wide analysis and discovery of transcription factor binding sites [72–77]. Additionally, MPRAs can be combined with other techniques to obtain a more detailed understanding of regulatory systems. For example, MPRAs have been combined with occupancy assays to identify candidate enhancers and correlate transcription factor occupancy with regulation [78–80].
In this work we make use of the MPRA Sort-Seq [13] to discover transcription factor binding sites, infer the sites’ regulatory roles and interactions, and create predictive models of transcription factor-DNA binding. We innovate by showing that we can thoroughly dissect promoters with diverse regulatory mechanisms given little to no initial information regarding a promoter’s regulation. Additionally, we show that when combined with the proper analysis techniques, Sort-Seq can be used to determine transcription factor binding affinity with accuracy comparable to in vitro assays.
[Figure 1.9 image. Panel (A), occupancy assays (example: ChIP-seq): 1. Transcription factors bind to DNA; 2. Fragment DNA and immunoprecipitate target protein; 3. Sequence remaining DNA and map out binding sites (plot of counts versus sequence position). Panel (B), in vitro binding assays (example: PBM): 1. Create microarray with thousands of DNA sequence variants; 2. Add fluorescently-labeled TF and measure intensity of each DNA spot. Panel (C), massively parallel reporter assays: 1. Create promoter mutant library; 2. Sort by fluorescence or purify RNA; 3. Sequence promoters and match sequences to expression level (plot of counts versus fluorescence; table of promoter sequences by bin).]
Figure 1.9: Diverse methods assay multiple aspects of transcription factor binding. (A) Occupancy-based methods can determine the locations of regulatory elements throughout the genome. For example, ChIP-seq works by digesting DNA and then immunoprecipitating a transcription factor of interest. Sequencing the DNA fragments attached to the transcription factor and aligning these fragments to the genome provides a map of transcription factor binding sites. (B) In vitro assays provide highly accurate readings of transcription factor sequence specificity and affinity. For example, protein binding microarrays (PBMs) contain thousands of DNA sequences to which fluorescently-labeled transcription factors can bind. The fluorescent intensity of each DNA spot on the microarray then serves as a readout of the transcription factor’s affinity for the associated DNA sequence. (C) MPRAs are performed by positioning a library of promoter variants upstream of a reporter gene. Measuring the expression of the reporter gene and correlating expression with promoter sequence makes it possible to ascertain the roles of promoter elements.