Chapter III: Transcriptional topology
III.2: Results
requirements surely contain a large number of “housekeeping” genes with varied levels of expression relative to the other tissues in the body. Furthermore, even genes that are known to be regulated by enhancers can appear to drive native expression patterns with their proximal promoters alone, as was once the case with myogenin (Yee and Rigby 1993), so the lack of apparent necessity, in assay, for CRMs does not disprove their existence.
whether they connect with only one gene or multiple genes, all connect to the nearest annotated active gene, even the 25% of elements that connect to multiple genes (Fig. III-2, right). The majority of genes without a ChIA-PET connection are not detectably expressed (Fig. III-1, compare Figures I-3 to III-4). Using available C2C12 active and repressive chromatin mark data, I found that skipped-over genes lack any of the active or repressive chromatin marks assayed, as opposed to connected genes, which have active marks as expected (data not shown). However, because relatively few data on repressive marks are available in our laboratory collection or in the literature, I cannot comment on the biochemistry underlying gene-skipping, other than to say that it is consistent with the claim from a previous microscopy experiment that inactive genes “loop out” of transcription factories (Mitchell and Fraser 2008).
III.2.3: Connectivity and changes in gene expression
I next wanted to know how ChIA-PET connections change over time. It has been reported that E:P interactions do change with transcription, but also that many are constant across tissue types (Simonis, Klous et al. 2006). To determine how changes in gene expression relate to ChIA-PET connectivity, I defined four trajectories of gene expression (up, flat, down, and off) to analyze with respect to each other, and set aside genes that are ambiguously expressed or that have an ambiguous trajectory (Fig. III-4). The change in ChIA-PET connectivity from myoblast to myocyte correlates strongly with the change in gene expression (Fig. III-5). This could mean that flat genes are unlikely to change their architecture, while developmentally regulated genes might. Alternatively, heeding the cautions from Fig. II-5, the architecture could remain constant while increased input from CRMs causes the increase in gene expression. Another, less likely, possibility is that high expression at one promoter somehow bleeds over into the surrounding genomic region, increasing the number of ChIA-PET connections returned.
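As an illustration of how genes could be assigned to the four trajectory groups while leaving ambiguous cases out, the following is a minimal sketch. The function name, the pseudocount, and every threshold (`off_max`, `min_fold`, `flat_fold`) are hypothetical choices made for this example; they are not the cutoffs used in this analysis.

```python
def classify_trajectory(myoblast_fpkm, myocyte_fpkm,
                        off_max=1.0, min_fold=2.0, flat_fold=1.25):
    """Assign one gene to 'off', 'up', 'down', or 'flat'.

    Returns None for genes with an ambiguous trajectory.
    All thresholds are illustrative assumptions, not the thesis values.
    """
    # Not detectably expressed at either time point.
    if max(myoblast_fpkm, myocyte_fpkm) < off_max:
        return "off"
    # A pseudocount of 1 FPKM avoids division by zero.
    ratio = (myocyte_fpkm + 1.0) / (myoblast_fpkm + 1.0)
    if ratio >= min_fold:
        return "up"
    if ratio <= 1.0 / min_fold:
        return "down"
    if 1.0 / flat_fold <= ratio <= flat_fold:
        return "flat"
    # Between the 'flat' band and the fold-change cutoff: set aside.
    return None
```

Under these assumed cutoffs, a gene that falls between the flat band and the fold-change threshold returns `None` and would be excluded from the comparison.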
III.2.4: Distal degree shows a preference for gene type
To further quantify ChIA-PET connectivity within the up, flat, down, and off groups, I measured the degree of each gene. The degree of a gene is not as closely correlated with its expression (data not shown) as edge weight is (Fig. III-5). When I regressed the distal degree (the number of distal connections per gene) against gene expression, it became clear that although distal degree is only weakly correlated with expression level (data not shown), it is strongly correlated with gene trajectory, specifically for the upregulated genes (Fig. III-7, top). Flat genes show no such correlation (data not shown).
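The distal degree described here can be sketched as a count over an edge list. This is a minimal example under stated assumptions: the connection graph is represented as pairs of vertex identifiers, any vertex not in `gene_ids` is treated as a distal element, and the vertex names in the usage example are placeholders rather than real data.

```python
from collections import defaultdict

def distal_degree(edges, gene_ids):
    """Count the distinct distal (non-gene) vertices connected to each gene.

    edges    -- iterable of (vertex_a, vertex_b) pairs
    gene_ids -- set of vertex ids that are gene (TSS-centered) vertices
    Gene-gene edges are ignored; repeated edges are counted once.
    """
    partners = defaultdict(set)
    for a, b in edges:
        if a in gene_ids and b not in gene_ids:
            partners[a].add(b)
        elif b in gene_ids and a not in gene_ids:
            partners[b].add(a)
    # Genes with no distal partner get a degree of zero.
    return {gene: len(partners[gene]) for gene in gene_ids}

# Toy edge list with placeholder names.
toy_edges = [("geneA", "distal1"), ("geneA", "distal2"),
             ("distal2", "geneB"), ("geneA", "geneB"),
             ("distal1", "geneA")]
toy_degrees = distal_degree(toy_edges, {"geneA", "geneB", "geneC"})
```

The resulting dictionary could then be regressed against expression level, or split by trajectory group, with whatever statistics package is at hand.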
I have shown that the amount of ChIA-PET connectivity is partially related to the level of a gene’s expression. However, I wanted to determine whether the striking results in the upregulated set of genes were due to expression change alone, or whether these genes showed evidence of being connected in a qualitatively different way from other genes. To further explore the distinction between upregulated and flat genes, I asked how distal connectivity is distributed with respect to RNA abundance within each trajectory group. For each RNA abundance class, I quantified the global myocyte distal degree. There is a strong tendency (P < 10⁻⁵) for medium- and high-abundance upregulated genes to have a higher distal degree than flat genes of the same abundance (Fig. III-7, bottom).
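One way to compare distal degree between upregulated and flat genes within a single abundance class is a permutation test on the difference in means, sketched below. This is an assumed, swapped-in technique for illustration only; the text does not state which test produced the reported P value, and the function name and parameters are hypothetical.

```python
import random

def permutation_p_value(group_a, group_b, n_perm=10000, seed=0):
    """One-sided permutation p-value for mean(group_a) > mean(group_b).

    group_a, group_b -- distal degrees of two gene groups drawn from the
                        same RNA abundance class (e.g. up versus flat)
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    hits = 0
    for _ in range(n_perm):
        # Shuffle group labels and recompute the difference in means.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Running the test separately for each abundance class keeps expression level from confounding the comparison between trajectory groups. Note that the smallest p-value this estimator can return is 1/(n_perm + 1), so `n_perm` would need to be raised well above the default to resolve very small values.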
III.2.5: Promoter-promoter connections
Almost a third of pol2 ChIA-PET interactions are gene-gene (G-G) interactions, which in our data are defined by gene vertices centered at transcription start sites (see Materials and Methods). In several contemporary studies, the authors suggested that such ChIA-PET connections reflect, and may even cause, co-regulation (Chepelev, Wei et al. 2012; Li, Ruan et al. 2012; Kieffer-Kwon, Tang et al. 2013). My differentiation system is well-suited to test this with XX G-G connections overall, YY containing at least one significantly up- or down-regulated gene, and ZZ containing genes of substantial trajectory >50 FPKM. I interrogated my data in several different ways to ask whether, globally, G-G connectivity predicts co-regulation of the paired genes. When I confined the analysis to active genes, I found no statistically significant correlation overall between the expression levels of connected gene pairs (data not shown). I excluded unexpressed genes because they are not expected to connect, so including them in the null hypothesis would produce a false-positive result. Similarly, there was no significant correlation between the expression trajectories of pairs of connected gene vertices (data not shown). However, the G-G landscape is not completely random. Flat genes, the most expressed gene type, connect equally often with each other and with upregulated or downregulated genes, while the two differential classes are almost never connected to each other (Fig. III-6, left versus center bars).
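The pairwise expression test restricted to active genes can be sketched as follows. This is a hedged illustration, not the analysis code: the `min_fpkm` activity cutoff, the log2 transform, and the choice of Pearson correlation are assumptions. Because a connected pair has no natural order, each pair is entered in both orientations.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def connected_pair_correlation(pairs, expression, min_fpkm=1.0):
    """Correlate log2 expression across connected gene pairs.

    Pairs in which either gene falls below min_fpkm (an assumed cutoff)
    are dropped, so unexpressed genes cannot inflate the correlation.
    Returns (r, number_of_pairs_used).
    """
    xs, ys = [], []
    for gene_1, gene_2 in pairs:
        e1 = expression.get(gene_1, 0.0)
        e2 = expression.get(gene_2, 0.0)
        if e1 >= min_fpkm and e2 >= min_fpkm:
            l1, l2 = math.log2(e1), math.log2(e2)
            # Enter both orientations so pair order does not matter.
            xs.extend([l1, l2])
            ys.extend([l2, l1])
    return pearson_r(xs, ys), len(xs) // 2
```

Significance could then be judged against a null built by shuffling partners among active genes only, which is the point of excluding unexpressed genes from the background.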
Although there was no global evidence of co-regulation associated with gene-gene pol2 connectivity, inspection of several loci of known biological interest led me to ask whether a more narrowly defined set of developmental genes, isolating the most extreme expression differences, is overrepresented in certain expression patterns. I compared the activity of genes that directly or indirectly connect to super-differential “seed” genes with that of all genes within the 2Mb window available to the ChIA-PET connections of the “seed” genes (Fig. III-6, top). I found that members of the group of 252 extremely upregulated muscle genes are more likely than other groups of genes to be close to other myogenic genes, but even taking this into consideration, they were also more likely to connect to other upregulated genes (Fig. III-6, bottom). A similar analysis with downregulated genes narrowly missed statistical significance, perhaps because of low n, and there was no such result with flat genes (data not shown). This suggests that only the small subsets of developmentally regulated genes are candidates for co-regulation, while the majority of G-G interactions reflect co-expression.
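The comparison of connected genes against all genes in the window around each seed can be framed as an enrichment test. The sketch below uses a one-sided hypergeometric tail probability, which is an assumed stand-in for whatever test was actually applied; the function and parameter names are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_window, n_up_window, n_connected, n_up_connected):
    """P(at least n_up_connected upregulated genes among the connected set).

    n_window       -- genes within the window around the seed genes
    n_up_window    -- how many of those window genes are upregulated
    n_connected    -- window genes connected (directly or indirectly) to a seed
    n_up_connected -- how many of the connected genes are upregulated
    Using the window as background accounts for upregulated genes
    simply lying near one another.
    """
    total = comb(n_window, n_connected)
    upper = min(n_up_window, n_connected)
    tail = sum(
        comb(n_up_window, i) * comb(n_window - n_up_window, n_connected - i)
        for i in range(n_up_connected, upper + 1)
    )
    return tail / total
```

For example, with 6 genes in a window, 3 of them upregulated, and 3 connected genes that are all upregulated, the function returns 1/20 = 0.05.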
For large graph analysis, one interpretation is that the additive effect of enhancers causes high expression. Another interpretation is that expression at some modestly and steadily expressed genes may be a byproduct of being physically connected to an important gene.