1.5 Evolution of enhancers drives species divergence

1.5.6 Theory and models of enhancer sequence evolution

If enhancer activity is species-specific, yet the sequences underlying enhancer elements are ancient and present across species' genomes, then what determines the rapid turnover of gene regulatory sequences?

Below, I discuss what is known about the evolution of enhancer sequences and its relationship with species-divergent activity.

1.5.6.1 Nucleation model of enhancer sequences with multiple ages

Most enhancer activity is extinguished over long evolutionary periods due to turnover. For regions with species-specific enhancer activity, then, what sequence features in ancient enhancers promote species divergence? Species-specific enhancer sequences identified by comparing human and mouse H3K27ac-marked regions of the developing neocortex first illustrated that enhancer sequences are composites of older core sequences and younger derived sequences (Emera et al. (2016)). This finding suggested that one underappreciated mode of enhancer sequence evolution involves sequences produced by genomic rearrangements that accumulate during species divergence.

The authors proposed the nucleation model of enhancer sequence evolution: the sequences of active enhancers that are not extinguished have evolved by adding new, younger pieces of DNA (Figure 1.6). In this model, a de novo sequence would emerge in the ancestral genome, possibly through repeat element transposition, and nucleate TFBSs to create a “proto-enhancer”, or minimally active regulatory sequence. The gain of TFBS motifs would likely place that proto-enhancer sequence under some level of evolutionary constraint. The authors suggest that a species' genome may have many proto-enhancers, which would allow mature, species-specific gene regulatory sequences to form at any time. However, it is unclear whether proto-enhancer sequences perform gene regulatory functions at all. Over time, many proto-enhancer sequences may turn over, yet it is unclear whether this turnover perturbs gene regulatory activity. A few of the sequences that resist turnover presumably go on to gain younger sequences with new TFBSs that could either produce new gene regulatory activity or reinforce existing activity. This model would explain the authors' observation of divergent developmental neocortical enhancer sequences and their multiple sequence ages, but questions remain about the function and relevance of proto-enhancers and composite enhancers to gene regulatory function genome-wide.
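The idea of a composite enhancer, an older core flanked by younger derived sequence, can be made concrete with a small data-structure sketch. This is only an illustration under stated assumptions: the `Segment` type, the function names, and the segment ages are all hypothetical, and the sketch is not the procedure used by Emera et al. (2016).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One contiguous piece of an enhancer with a single estimated age."""
    start: int
    end: int
    age_mya: float  # hypothetical sequence age in millions of years


def is_composite(segments):
    """An enhancer is composite here if its segments span more than one age."""
    return len({seg.age_mya for seg in segments}) > 1


def split_core_and_derived(segments):
    """Return (core, derived): the oldest segments and all younger segments."""
    oldest = max(seg.age_mya for seg in segments)
    core = [seg for seg in segments if seg.age_mya == oldest]
    derived = [seg for seg in segments if seg.age_mya < oldest]
    return core, derived


# Invented example: an old core flanked by two younger derived segments.
enhancer = [
    Segment(0, 120, 30.0),
    Segment(120, 400, 300.0),
    Segment(400, 520, 90.0),
]
core, derived = split_core_and_derived(enhancer)
```

Under the nucleation model, the single oldest segment in this toy enhancer would correspond to the surviving proto-enhancer, and the two flanking segments to sequence gained later.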

Figure 1.6: Enhancer sequence nucleation model

A model of enhancer evolution in the neocortex. Based on characterization of the most recently evolved enhancers in our dataset, enhancers in the neocortex likely emerge as proto-enhancers, short sequences with low regulatory information content. These may emerge in situ from unconstrained sequences or arise from transposable element repeats. Many of these elements are likely lost over time, but some serve as nucleation points for complex enhancer cores to evolve. Based on characterization of the ancient enhancers in our dataset, proto-enhancers that survive undergo substantial modification, becoming composites of ancient and derived functional segments. From Emera et al. (2016).

1.5.6.2 Transposable element integration may produce gene regulatory elements

Transposable elements (TEs) are repetitive sequences that replicate and insert copies of their genetic sequence throughout eukaryotic genomes (Chuong et al. (2016)). Retrotransposition of TEs is considered a primary source of genome expansion, and a recent estimate attributes 53% of the human genome to TE-derived sequences (TEDS) (Nurk et al. (2022)). Autonomous classes of TEs, such as long interspersed nuclear elements (LINEs), encode open reading frames for the replication machinery that copies L1 repeats, while non-autonomous classes, such as short interspersed nuclear elements (SINEs), rely on LINE replication machinery to spread genetic copies. TEs are typically species-specific and co-evolve as their host genomes diverge. Three classes of TEDS (L1, SINE/Alu, and SVA families) have evidence of active retrotransposition in the human genome, and their random insertions have previously been associated with germline diseases and cancer (Belancio et al. (2009); Burns (2017); Chen et al. (2005)).

Evidence suggests that TEDS have gene regulatory activity and that random insertions throughout the genome can gain gene regulatory activity if host factors, such as zinc finger proteins, do not actively silence them (Elbarbary et al. (2016); Chuong et al. (2013, 2016, 2017)). The evidence for this phenomenon ranges from specific examples of the necessity and sufficiency of a TEDS for gene regulatory activity, such as the role of an L2 LINE element in a stickleback GDF6 enhancer (Indjeian et al. (2016)), to broad descriptions of the prevalence of TEDS in regions annotated as putative enhancers and promoters (Chuong et al. (2013); Sundaram and Wysocka (2020)). Enrichment of TEDS in TFBS ChIP-seq data further reinforces that TEDS have regulatory potential (Marnetto et al. (2018); Schmidt et al. (2012); Fueyo et al. (2022)). Evaluating the evolutionary history of SINE/Alu subfamilies suggests that SINE/Alu TEDS may acquire H3K4me1 gene regulatory annotations over time (Su et al. (2014)). However, cis-regulatory elements are depleted of TEDS compared with the genomic background, suggesting that gene regulatory activity does not favor TEDS insertions. Some have proposed that TEDS are “domesticated” or “co-opted” to become active gene regulatory regions (Figure 1.7). While this concept provides a facile interpretation of the observed links between gene regulation and TE origins, it clashes with the widespread genomic depletion of TEDS-based enhancers. The prevalence of TEDS-associated regulation may be predicted by whether the regulated gene has a duplicate (Correa et al. (2021)). Together, at specific loci with specific genomic features, TEDS may provide the raw genetic material for species-specific enhancer elements, but more work is needed on the genomic contexts that tolerate this type of regulatory evolution.
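To make the notion of depletion relative to the genomic background concrete, the following minimal sketch compares the fraction of regulatory-element base pairs that overlap TEDS with the fraction of the whole genome covered by TEDS. It is an illustration only: the function names, the toy coordinates, and the 1,000 bp "genome" are assumptions, and published analyses typically use permutation-based or otherwise matched backgrounds rather than this simple ratio.

```python
def merged(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    out = []
    for start, end in sorted(intervals):
        if out and start <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], end))
        else:
            out.append((start, end))
    return out


def covered_bp(intervals):
    """Total base pairs covered by an interval set."""
    return sum(end - start for start, end in merged(intervals))


def overlap_bp(a, b):
    """Base pairs shared by two interval sets (two-pointer sweep)."""
    a, b = merged(a), merged(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def te_fold_enrichment(elements, tes, genome_size):
    """TEDS fraction within elements divided by TEDS fraction genome-wide."""
    element_frac = overlap_bp(elements, tes) / covered_bp(elements)
    genome_frac = covered_bp(tes) / genome_size
    return element_frac / genome_frac


# Invented toy data: TEDS cover 500 of 1,000 bp; enhancers cover 200 bp,
# of which 50 bp overlap TEDS.
genome_size = 1000
tes = [(0, 200), (400, 700)]
enhancers = [(150, 250), (800, 900)]
fold = te_fold_enrichment(enhancers, tes, genome_size)
```

A value below 1 indicates depletion of TEDS in the regulatory elements relative to the background, and a value above 1 indicates enrichment. In this toy case the ratio is 0.25 / 0.5 = 0.5.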

Figure 1.7: Proposed model of how transposable element-derived sequences may form species-specific gene regulatory elements

Hierarchy of evidence to consider when determining whether a TE has been co-opted for host functions. Many TEs have biochemical hallmarks of regulatory activity on the basis of genome-wide assays. However, additional evidence is required to determine which of these TEs alter the regulation of host genes and affect organismal phenotypes and fitness. From Chuong et al. (2016).

1.5.6.3 Mechanisms of gene regulatory evolution in cis and trans

A major gap in our understanding of gene regulatory evolution is how often species' differences in gene regulation arise from cis-regulatory mutations, which affect local gene regulatory function and activity at target genes, versus trans-regulatory changes in the cellular environment (for example, TF protein abundance), which drive widespread differences in gene regulatory activity between species. If the cellular environment completely determined gene regulatory differences between species, we would expect that controlling for environmental factors would reveal no quantifiable difference in gene regulation between species. However, tests of gene regulatory activity across homologous sequences from different species in a single cellular environment support that 30-40% of divergent gene regulation can be attributed to changes in cis-regulatory DNA. These tests include STARR-seq across the genomes of Drosophila species (Arnold et al. (2014)), comparative MPRAs in human and mouse embryonic stem cells (Mattioli et al. (2020)), and allele-specific gene expression in chimeric human and chimpanzee tetraploid neural progenitor stem cells and differentiated cranial neural crest cells (Agoglia et al. (2021); Gokhman et al. (2021)).
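The logic of separating cis from trans effects with allele-specific expression can be sketched with a commonly used log-ratio decomposition. In a hybrid cell, both species' alleles share one trans environment, so the allelic expression ratio reflects cis divergence; whatever remains of the between-species ratio is assigned to trans. This is a minimal sketch and not necessarily the exact statistical procedure of the studies cited above; the function name and the expression values are hypothetical.

```python
import math


def cis_trans_decomposition(parent_a, parent_b, hybrid_allele_a, hybrid_allele_b):
    """Split log2 expression divergence of one gene into cis and trans parts.

    parent_a, parent_b: expression of the gene in each species' own cells.
    hybrid_allele_a, hybrid_allele_b: expression of each species' allele
    measured in the shared cellular environment of a hybrid cell.
    """
    total = math.log2(parent_a / parent_b)             # total divergence
    cis = math.log2(hybrid_allele_a / hybrid_allele_b) # persists in shared environment
    trans = total - cis                                # remainder
    return {"total": total, "cis": cis, "trans": trans}


# Invented values: a 4-fold difference between species, of which a
# 2-fold difference persists between alleles in the hybrid.
result = cis_trans_decomposition(
    parent_a=400.0, parent_b=100.0,
    hybrid_allele_a=200.0, hybrid_allele_b=100.0,
)
```

With these toy numbers the total divergence is 2 on the log2 scale, split equally into a cis component of 1 and a trans component of 1. Real analyses additionally need replicate-aware statistical tests to decide whether each component differs from zero.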

Some speculate that cis- and trans-regulatory variation contribute differently to gene regulatory divergence. In theory, trans-regulatory variation may contribute more to gene regulatory variation within populations, while cis-regulatory variation might fix heritable gene expression patterns into the genome (Hill et al. (2021)). Under the omnigenic model, trans-variation is estimated to explain 70% of trait heritability (Liu et al. (2019)). Indeed, trans-acting variation explains a proportion of variation in some eQTL studies (Hill et al. (2021); Rotival et al. (2011)) and can affect the expression of many downstream gene targets. Most recently, in large eQTL studies of human population variation in blood gene expression, trans-eQTLs affect gene regulatory variation through transcription factors (Võsa et al. (2021)). Between yeast species, cis-regulatory variation is thought to contribute more to divergence (Metzger et al. (2017); Coolon et al. (2014)) via phenotypic variation that becomes fixed in species' genomes. Beyond this, the evolutionary dynamics that fix phenotypic variation into the regulatory genome are not well understood.