Dissecting the evolution of human enhancer sequences

Academic year: 2023

Genomes, their organization, and their differences between species

The genome encodes cellular life

Genome sequences are organized into units

A major driver of species divergence is changes to gene regulation

Enhancers are DNA sequences that regulate gene expression

  • Enhancers are genomic elements that regulate transcription
    • A note on the word “enhancer”
  • Transcription factors bind gene regulatory sequences to regulate transcription
  • Enhancer gene regulation is cell-type- and context-specific
  • Enhancers are enriched for human genetic variation, disease-associated variation

TFs, co-activators and gene regulatory sequences cooperate to control gene transcription (Shlyueva et al. Reference maps of gene regulatory elements generated by large consortia such as ROADMAP, FANTOM5 and ENCODE (Roadmap Epigenomics Consortium et al.).

Figure 1.1: Tissue-specific enhancers bind transcription factors and interact with transcription start sites Shlyueva et al

Annotations and methods for enhancer characterization

  • Enhancer activity requires open chromatin
  • Histone markers
  • Transcription factor binding
  • Transcribed enhancer RNAs
  • Gene regulatory reporter assays
    • in vivo reporter assays
    • Massively parallel reporter assays (MPRAs)
    • STARR-seq reporter assays
    • Evaluating effects of human genetic variation on gene regulation using
    • SHARPR-MPRA
    • ATAC-STARR-seq reporter assays

One of the most powerful approaches for evaluating developmental gene regulatory activity uses transgenic in vivo reporter assays. Variations on the MPRA have been used to measure different features of gene regulatory activity.

Figure 1.2: Approaches for identifying and testing the activity of candidate enhancer sequences (a) Biochemical annotations of candidate enhancers: schematic depiction of an enhancer and a target gene, marked with the biochemical annotations used to nomina

Methods for estimating enhancer evolution using comparative genomics

  • Sequence homology, synteny, and multiple sequence alignments
  • Sequence conservation and measuring substitution rates
  • Human acceleration and positive selection in enhancers
  • Sequence ages
  • Comparative histone modification and chromatin accessibility reveal alignable se-
  • Comparative reporter assays reveal differences in gene regulatory activity between
    • Comparing regulatory activity of evolutionary divergent sequences
  • Chimeric cellular models
  • Lymphoblastoid cellular models for comparing within and between species gene

Considering this evidence, it is unclear what specific contributions HAR substitutions make to human-specific gene regulatory activity. Other factors that do not alter the underlying genetic sequence, such as nuclear TF abundance or epigenetic changes in chromatin accessibility (Gershman et al. 2022), may drive divergent gene regulatory activity between species.

Figure 1.4: ATAC-STARR-seq workflow

Evolution of enhancers drives species divergence

  • Gene expression patterns are largely conserved, despite functional gene regulatory
    • Populations do not tolerate variation with large effects on phenotype; vari-
    • Turnover and rearrangement of gene regulatory sequences can affect tran-
  • How useful is measuring sequence conservation for determining gene regulatory
  • Conserved enhancer sequences
    • Ultra-conserved and conserved regulatory sequences
    • Conservation of TFBS, neutrality of spacing in-between
  • Divergent enhancer activity—rapid turnover between species
  • Mechanisms of functional gene regulatory evolution in humans
  • Theory and models of enhancer sequence evolution
    • Nucleation model of enhancer sequences with multiple ages
    • Transposable element integration may produce gene regulatory elements

Changes in gene expression levels are not the only result of variation in the gene regulatory sequence. Effective modeling of gene regulatory sequence evolution must consider both attributes when assessing effects on gene regulatory activity.

Figure 1.6: Enhancer sequence nucleation model

Chapters Outline

Chapter 1—Models of human enhancer sequence evolution

Using sequence features, functional data, human genetic variation, and transposable element information, I address the nucleation model of enhancer evolution and expand on the various ways in which enhancer sequences can evolve over time. We first assigned ages to human transcribed enhancer sequences from the FANTOM5 tissue datasets (Andersson et al. 2014) by estimating the most recent common ancestor of each sequence from multiple sequence alignments.
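
The age-assignment idea can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes each sequence block comes with the list of species it aligns to, and the clade ordering and species-to-clade lookup (`CLADE_ORDER`, `SPECIES_TO_MRCA_CLADE`) are hypothetical, coarse stand-ins for a real phylogeny.

```python
# Hypothetical clade ordering, youngest to oldest (illustrative only).
CLADE_ORDER = ["Human", "Primates", "Eutheria", "Mammalia", "Vertebrata"]

# Hypothetical lookup: clade of the most recent common ancestor (MRCA)
# that each aligned species shares with human.
SPECIES_TO_MRCA_CLADE = {
    "human": "Human",
    "bushbaby": "Primates",
    "elephant": "Eutheria",
    "platypus": "Mammalia",
    "lamprey": "Vertebrata",
}


def sequence_age(aligned_species):
    """Return the oldest MRCA clade among the species a block aligns to."""
    clades = [SPECIES_TO_MRCA_CLADE[s] for s in aligned_species]
    return max(clades, key=CLADE_ORDER.index)


print(sequence_age(["human", "bushbaby", "elephant"]))
```

Under these assumptions, a block that aligns to human, bushbaby, and elephant is assigned the age of the oldest shared ancestor among them, here the Eutheria clade.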

Chapter 2—Enhancers with multiple sequence origins are functional, under evolu-

Chapter 3—Genome-wide dissection of the mechanisms of gene regulatory diver-

Finally, we exploited PheWAS data from the UK Biobank to link genetic variation to electronic health record traits and determine the biological function of divergent regulatory regions of genes under positive selection. Modeling the evolutionary architecture of transcribed human enhancer sequences reveals diverse origins, functions, and links to human trait variation.

ABSTRACT

INTRODUCTION

Nevertheless, most of the gains in enhancer activity do not occur under strong positive selection (Pollard et al. Although important, TE-derived sequences (TEDS) are depleted in sequences with enhancer activity compared to the rest of the genome (Emera et al.

RESULTS

  • Estimating enhancer ages using vertebrate multiple species alignments
  • Enhancers are older, longer, and more conserved than the genomic background
  • Enhancers are enriched for simple evolutionary sequence architectures
  • The oldest sequences occur in the middle of complex enhancers
  • Complex enhancers are longer and older than simple enhancers
  • Complex enhancers are more pleiotropic and more conserved in activity across
  • Simple and complex enhancers are under similar levels of purifying selection
  • Genetic variants in simple enhancers are more likely to be associated with human
  • Genetic variants in simple enhancers are enriched for changes in biochemical regu-
  • Transposable element sequences can both nucleate and remodel enhancers
  • Different TE families are enriched in simple and complex enhancers
  • Age architectures of enhancers identified by histone modifications show similar trends

Enhancer pleiotropy generally increases with age, and complex enhancers are consistently more pleiotropic than age-matched simple enhancers (Figure 2.3A). Simple enhancers are more enriched for GWAS variants than complex enhancers (p = 0.01, two-tailed permutation test).

Figure 2.1: Illustration of the method for mapping enhancer sequence age architecture.

METHODS

  • Syntenic block aging strategy
  • eRNA enhancer identification, aging, and architecture assignment
  • ChIP-peak enhancer identification, aging, and architecture assignment
  • Trimming and expansion of ChIP-peak enhancer lengths
  • Human syntenic block PhastCons conservation
  • Background random genome regions and architectures
  • Enhancer pleiotropy
  • Cross-species enhancer activity
  • Enhancer sequence constraint
  • GWAS catalog enrichment
  • ClinVar variant enrichment
  • eQTL enrichment
  • Massively parallel reporter assay data
  • Transposable element derived sequence enrichment

We explored enhancer architectures identified by the Roadmap Epigenomics Consortium et al. Human liver enhancers from a cross-species analysis of vertebrate livers (Villar et al. 2015) were assigned ages and architectures. The Emera et al. dataset is derived from the Reilly et al. dataset and filtered based on human-mouse active enhancer overlap and alignment.

Enrichment for GTEx v6 eQTL from 46 tissues (last downloaded 23 July 2019) (GTEx Consortium et al. 2017a) in enhancers with simple and complex architectures was tested against a null distribution determined by shuffling observed enhancers using the same strategy as described for GWAS variant enrichment.
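
A shuffle-based enrichment test of this kind can be sketched as below. The excerpt does not state the exact statistic, so the fold-enrichment definition (observed count over the median shuffled count) and the add-one, doubled-tail p-value are assumptions made for illustration; `permutation_enrichment` is a hypothetical helper name. The sketch also assumes the median of the shuffled counts is nonzero.

```python
from statistics import median


def permutation_enrichment(observed, null_counts):
    """Fold enrichment and two-tailed empirical p-value from shuffled counts.

    observed: number of variants overlapping the real enhancers.
    null_counts: overlap counts from each shuffled enhancer set.
    """
    fold = observed / median(null_counts)
    n = len(null_counts)
    upper = sum(c >= observed for c in null_counts)
    lower = sum(c <= observed for c in null_counts)
    # Add-one correction keeps the estimate above zero; doubling the
    # smaller tail gives a two-tailed value, capped at 1.
    p_value = min(1.0, 2 * (min(upper, lower) + 1) / (n + 1))
    return fold, p_value


fold, p = permutation_enrichment(30, [10, 12, 14, 16, 18])
print(f"fold enrichment = {fold:.2f}, empirical p = {p:.3f}")
```

With only five shuffles the smallest attainable p-value is 1/3, which is why such tests are typically run with many more permutations.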

DATA AVAILABILITY

The following datasets were derived from sources in the public domain

ACKNOWLEDGEMENTS

[Figure residue: GWAS Catalog variant fold enrichment plotted across tissue and cell-type samples (for example Brain_Cingulate_Gyrus, CD14_Primary_Cells, HepG2_Hepatocellular_Carcinoma, K562_Leukemia, Fetal_Thymus); the individual axis labels are not recoverable.]

ABSTRACT

Introduction

However, models that synthesize the evolutionary origins of enhancer sequences with an understanding of functional modules are needed. Thus, deeper understanding of enhancer sequence evolution will contribute valuable context for resolving gene regulatory functions of candidate disease. Indeed, the relationship between the sequences of different evolutionary origins in these enhancers and the gene regulatory functions they produce is poorly understood.

For example, whether the sequences from different evolutionary periods have independent gene regulatory functions is unclear for most complex enhancers.

Results

  • Enhancers are commonly composed of older core and younger derived sequences
  • Both derived and core regions are older than expected from matched background
  • Complex enhancers are enriched for core and derived sequences from consecutive
  • Derived sequences have higher transcription factor binding site density than cores
  • Core and derived regions have similar activity in MPRAs
  • Derived sequences are less evolutionarily constrained than core sequences
  • Derived enhancer regions have more genetic variation than core regions
  • Derived enhancer regions are enriched for eQTL

Derived sequences are also enriched for older ages compared to derived regions of background sequences with corresponding core ages. These results indicate that both core and derived sequences are older than expected and suggest that both components often have limited regulatory function. In general, enhancers are enriched for core and derived sequences from the successive phylogenetic branches compared to background complex regions (Figure 3.3).

Both core and derived regions are shorter than simple enhancers (dashed line, median 260 bp simple, p<2.2e-308). B) Both core and derived sequences are enriched for older range ages and depleted from younger range ages.
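
The simple versus complex distinction, and the core and derived labels within complex enhancers, can be sketched as follows. This is an illustration under stated assumptions: ages are given as numbers where larger means older, the oldest block or blocks are taken as the core, and every younger block is labeled derived. The function name `label_blocks` is hypothetical.

```python
def label_blocks(block_ages):
    """Classify one enhancer from the ages of its syntenic blocks.

    block_ages: ages in genomic order; a larger number means an older block.
    Returns (architecture, per-block labels).
    """
    if len(set(block_ages)) == 1:
        # A single sequence age across the enhancer: simple architecture.
        return "simple", ["simple"] * len(block_ages)
    oldest = max(block_ages)
    labels = ["core" if age == oldest else "derived" for age in block_ages]
    return "complex", labels


print(label_blocks([2, 5, 2]))
print(label_blocks([4]))
```

In this toy input the oldest block sits between two younger blocks, so the middle block is labeled core and its flanks derived.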

Figure 3.1: Complex enhancers consist of older core and younger derived sequences.

Discussion

  • What is the functional importance of derived enhancer sequences to their core regions?
  • Can considering enhancer evolutionary architecture aid interpretation of rare and
  • Limitations
  • Conclusion

Thus, the core and derived sequence often appear to be functional, but they also have different evolutionary and functional properties. However, we do not know how often the core and derived sequences alone are sufficient for independent regulatory activity. Future work is needed to determine when derived sequences enhance or diversify the regulatory function of genes across species.

Without further biochemical assays, we cannot test whether most of the core and derived sequences have regulatory activity when separated.

Methods

  • Assigning ages to sequences based on alignment syntenic blocks
  • eRNA enhancer data, age assignment, and architecture mapping
  • cCRE enhancer data, age assignment, and architecture mapping
  • MPRA activity data
  • Genome-wide shuffles to determine expected background distributions
  • TFBS density and enrichment
  • LINSIGHT purifying selection estimates
  • TFBS motif sequence specificity
  • eQTL enrichment

Variation in gene regulatory sequences underlies much of the phenotypic variation between individuals and species. These shuffled sets were matched to the chromosome and length distribution of the observed regions in each data set. Variant density was estimated as the number of SNPs overlapping a syntenic block divided by the length of the syntenic block.
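
The variant density calculation can be written out as a short sketch. It assumes BED-style, zero-based, half-open block coordinates, which the excerpt does not specify; `variant_density` is a hypothetical helper name.

```python
def variant_density(block_start, block_end, snp_positions):
    """SNPs per base pair within a half-open [start, end) syntenic block."""
    length = block_end - block_start
    if length <= 0:
        raise ValueError("syntenic block must have positive length")
    n_snps = sum(block_start <= pos < block_end for pos in snp_positions)
    return n_snps / length


# Three of the five positions fall inside the 100 bp block.
print(variant_density(100, 200, [99, 100, 150, 199, 200]))
```

Dividing by block length makes densities comparable between short and long blocks.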

LINSIGHT provides per base pair estimates of the probability of negative selection (Huang et al.

Data availability

11 S11 High TFBS density in core regions correlates with high TFBS density in derived regions within. 12 S12 High TFBS density in core regions correlates with high TFBS density in derived regions within. Derived regions make up a significant proportion of complex K562 cCREs (N = 24,415 cCREs), are shorter (top left) and older (top right) than expected compared to mixed complex enhancer architectures (N cCREs).

TFBS density of core and matched derived regions per enhancer is plotted on the X and Y axis respectively.

Figure S1: Number of derived regions per complex enhancer

ABSTRACT

INTRODUCTION

Cis and trans effects are difficult to study independently because the cellular environment and genomic sequence are intrinsically linked within the endogenous environment. In addition, ATAC-STARR-seq does not require prior knowledge of the DNA sequences and is therefore unbiased in assessing cis and trans effects in the genome. ATAC-STARR-seq is uniquely tailored to investigate cis and trans effects on a global scale.

We apply ATAC-STARR-seq to human and rhesus macaque lymphoblastoid cell lines (LCLs) to systematically identify cis and trans effects on gene regulatory divergence genome-wide.

RESULTS

  • Comparative ATAC-STARR-seq produces a multi-omic view of human and macaque
  • Decoupling of cis v. trans regulatory divergence
  • Most regulatory differences are driven by changes in cis and trans
  • Trans regions are significantly conserved while cis regions are enriched for acceler-
  • SINE/Alu TEs are enriched in cis & trans divergence
  • Trans-only sequence ages are older than cis-only and cis & trans
  • Trans-only elements are enriched for composite sequences with multiple-origins
  • Key transcriptional regulators of immune pathways are differentially expressed be-
  • The majority of trans regions are bound by differentially expressed TFs
  • Human accelerated cis-element regulates NLRP1 and impacts human-specific cel-

Among human active regions (N = 6,509 regions), we found evidence of differences in cis activity; that is, the human sequence, but not its macaque homolog, was active in the human cellular environment (Figure 4.2D). However, when comparing human sequence activity between the human and macaque environments, we found human active regions with trans effects: human DNA was active in the human, but not in the macaque, cellular environment (Figure 4.2F). In addition, human-specific genes are enriched for immune pathways, such as interferon signaling and interleukin-10 signaling (Figure 4.5B).
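
The logic of the cis and trans comparisons for a human active region can be sketched as below. This simplifies the analysis to booleans, whereas the thesis calls differential activity from reporter data; the function name and the "conserved active" label are hypothetical.

```python
def classify_human_active(human_dna_human_cells,
                          macaque_dna_human_cells,
                          human_dna_macaque_cells):
    """Label a human active region by which comparison loses activity."""
    if not human_dna_human_cells:
        raise ValueError("region is not human active")
    # cis: same (human) cellular environment, different sequence.
    cis = not macaque_dna_human_cells
    # trans: same (human) sequence, different cellular environment.
    trans = not human_dna_macaque_cells
    if cis and trans:
        return "cis & trans"
    if cis:
        return "cis only"
    if trans:
        return "trans only"
    return "conserved active"


print(classify_human_active(True, False, True))   # sequence change only
print(classify_human_active(True, True, False))   # environment change only
```

Holding the environment fixed isolates sequence effects, and holding the sequence fixed isolates environment effects, which is what lets the two be decoupled.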

Stratification of the footprint enrichment by differential gene expression revealed immune regulators, including IRF7, that are differentially expressed in humans and are enriched for binding in human active trans-only regions (Figure 4.5D).

Hansen/Fong et al. 2022, Figure 1

DISCUSSION

  • Why do we observe so many trans effects?
  • What are cis & trans elements and why are they so abundant?
  • Divergence time may affect the abundance of cis and trans elements observed
  • Why are cis & trans elements less conserved?
  • What is the significance of the TEDS enrichment in cis & trans elements?
  • Is the LCL cell model relevant for evaluating gene regulatory divergence?
  • What is the significance of NLRP1 evolution in humans?
  • Limitations

Our ATAC-STARR-seq strategy directly tests differences in gene regulatory activity due to the environment. However, the abundance of cis- and trans-activity differences between species suggests that species-specific regulatory evolution is tightly coordinated between sequence and cellular environment. The abundance of cis and trans elements may reflect an evolutionary transition in the mechanism of gene regulatory divergence between humans and rhesus macaques from predominantly trans to predominantly cis.

Together, our work presents a broad assessment of gene regulatory variation and divergence between humans and rhesus macaques in cis and trans.

Figure 4.6: Human accelerated cis regulatory elements contribute to trans-regulation of inflammatory responses in humans

METHODS

  • Cell Culture
  • ATAC-STARR-seq
  • Read Processing
  • Chromatin Accessibility Peak Calling and Filtering
  • Differential Accessibility Analysis
  • TF Footprinting
  • Genome Browser
  • Active Region Calling Within Shared Accessible Peaks
  • Active Region Calling
  • Generation of ATAC-STARR-seq activity bigWigs
  • Heatmaps
  • Differential Activity Analysis
  • Functional Characterization of Cis and Trans Effects
  • TF Motif Enrichment
  • Gene Ontology
  • Histone modification heatmaps
  • Distance to ChrAcc peak summits
  • FANTOM B cell element enrichment
  • Evolutionary Analysis
    • Generating expected background datasets from shared accessible, inactive
    • PhastCons enrichment analysis
    • Human acceleration enrichment analysis
    • Repeatmasker transposable element enrichment
    • Multiple sequence origin enrichment analysis
    • Population Genetics Analysis
    • UKBB GWAS trait enrichment
  • RNA-sequencing
  • Gene Expression Analysis
    • Data Collection
    • Fastq Processing
    • Differential Expression Analysis
    • Correlation Plot
    • Principal Component Analysis
  • TF Footprint Enrichment Analysis
  • Trans only TF footprint enrichment vs. differential expression

We called active regions for each of the four experimental conditions using 2,028,304 filtered sliding window bins as input. To call active regulatory regions, we first assigned reads to filtered sliding window bins using the featureCounts function from the Subread package with the following parameters: -p -B -O --minOverlap 1 (cite Subread); for rheMac10-mapped reads, we used bins in rheMac10 coordinates (linked to hg38 coordinates by a unique bin ID). We first sampled inactive bins for each condition using the Unix command shuf (-n 150000) to reduce the number of regions plotted.
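
The subsampling step can be mimicked in Python for readers working outside the shell. This is a stand-in for `shuf -n`, not the command used in the thesis; `sample_bins` and its `seed` parameter are hypothetical additions for reproducibility.

```python
import random


def sample_bins(bins, n=150000, seed=None):
    """Sample up to n bins uniformly without replacement, like `shuf -n`."""
    rng = random.Random(seed)
    # Like shuf, return every bin when fewer than n are available.
    return rng.sample(bins, min(n, len(bins)))


inactive_bins = [f"bin_{i}" for i in range(10)]
print(len(sample_bins(inactive_bins, n=4, seed=0)))
```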

To identify regions that were divergent in both cis and trans ("cis & trans"), we asked whether the exact same region was contained in both the cis and trans effect region sets using BEDTools intersect with the -f 1.0 -r parameters; we maintained species specificity by comparing only human-specific cis with human-specific trans and macaque-specific cis with macaque-specific trans.
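
Requiring a reciprocal overlap fraction of 1.0 means two intervals match only when their coordinates are identical, so the comparison reduces to a set intersection. The sketch below illustrates that equivalence in plain Python; it is not the BEDTools call itself, and the region tuples and function name are hypothetical.

```python
def cis_and_trans(cis_regions, trans_regions):
    """Regions with identical (chrom, start, end) in both effect sets."""
    return sorted(set(cis_regions) & set(trans_regions))


human_cis = [("chr1", 100, 150), ("chr1", 300, 350), ("chr2", 10, 60)]
human_trans = [("chr1", 300, 350), ("chr2", 20, 60), ("chr3", 5, 55)]

# Only chr1:300-350 is identical in both sets; chr2:10-60 and chr2:20-60
# overlap but are not the same interval, so they do not match.
print(cis_and_trans(human_cis, human_trans))
```

Keeping the human and macaque region sets in separate calls preserves the species specificity described above.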

Supplemental Figures

The work elucidates how differences in the cellular environment (in trans) influence species-specific gene regulatory activity across human and rhesus macaque LCL open chromatin genome-wide. In this model, we propose two possible ways in which inactive sequences transition to gain gene regulatory activity. However, perturbations of gene regulatory sequences can occur without altering gene regulatory activity or without perturbing gene expression levels.

A table of the gene regulatory code would likely be much more complex than the codon table.

Hansen/Fong et al. 2022, Figure 1 Supplement A

Approaches for identifying and testing the activity of candidate enhancer sequences

SHARPR-MPRA design and per base pair analysis strategy

ATAC-STARR-seq workflow

Estimating human acceleration

Enhancer sequence nucleation model

Proposed model of how transposable element derived sequences may form into species-

Illustration of the method for mapping enhancer sequence age architecture

Simple and complex enhancers have distinct evolutionary architectures, lengths, and ages.

Simple enhancers are enriched for GWAS hits and variants with significant regulatory

Simple and complex enhancers are enriched for sequences derived from different trans-

Model of enhancer evolutionary architecture change and activity

Complex enhancers consist of older core and younger derived sequences

Derived sequences are shorter than cores and older than expected from the non-coding

Complex enhancers are enriched for core and derived sequences from consecutive phylo-

Derived regions have high transcription factor binding site densities and bind different

Both core and derived regions have regulatory activity in massively parallel reporter assays.

ATAC-STARR-seq methods for comparing chromatin accessibility and reporter activity

Widespread Cis and Trans differences in gene regulatory activity for both human-active

Trans effect sequences are conserved, cis effect sequences enriched for human accelera-

Active ATAC-STARR regions are enriched for older sequence ages, multi-origin enhancer

Trans-only regions are enriched for TF footprints with differential expression

Human accelerated cis regulatory elements contribute to trans-regulation of inflammatory

ATAC-STARR-seq methods for comparing chromatin accessibility and reporter activity

Support of differential activity calls

Evolutionary sequence features of divergently active regulatory elements

Figures

Figure 1.3: SHARPR-MPRA design and per base pair analysis strategy
Figure 1.7: Proposed model of how transposable element derived sequences may form into species-specific gene regulatory elements
Figure 2.1: Illustration of the method for mapping enhancer sequence age architecture.
Figure 2.2: Simple and complex enhancers have distinct evolutionary architectures, lengths, and ages.