compared the number of simple/complex enhancers overlapping a TEDS family with the number of simple/complex architectures overlapping any other TEDS family of that age. Enrichment significance was evaluated using Fisher’s Exact Test and FDR controlled at 10%.
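The per-family test described above can be sketched in a few lines. This is a minimal standard-library illustration, not the authors' code: the function names, the family labels, and the counts are hypothetical, and Benjamini-Hochberg is assumed as the FDR procedure because the text does not name one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    p = sum(px for px in map(prob, range(lo, hi + 1)) if px <= p_obs * (1 + 1e-7))
    odds = (a * d) / (b * c) if b * c else float("inf")
    return odds, min(1.0, p)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def teds_enrichment(counts, fdr=0.10):
    """counts maps family -> (simple in family, complex in family,
    simple in other families of that age, complex in other families)."""
    tested = [(fam, *fisher_exact_two_sided(*cells)) for fam, cells in counts.items()]
    qvalues = benjamini_hochberg([p for _, _, p in tested])
    return {
        fam: {"odds_ratio": odds, "p": p, "q": q, "significant": q < fdr}
        for (fam, odds, p), q in zip(tested, qvalues)
    }


# Hypothetical counts for two families of the same age.
example = teds_enrichment({
    "family_A": (120, 40, 900, 700),
    "family_B": (50, 48, 970, 692),
})
```

An odds ratio above 1 here would indicate that the family is over-represented in simple relative to complex enhancers, given the table layout assumed in the docstring.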
Supplemental Material Table of Contents
Supplemental Figures
1 – Strategy for generating architecture matched genomic background coordinates.
2 – Transcribed enhancer sequence ages are enriched for older sequence ages, depleted of younger sequence ages.
3 – Hg19 genome syntenic block age distribution.
4 – Masking exons, simple repeats, and transposable elements from the transcribed enhancer dataset and genomic background does not change interpretations of simple and complex enhancer architectures.
5 – Complex enhancers have fewer age segments than expected.
6 – Both simple and complex transcribed enhancers are enriched for older sequences, depleted of younger sequences compared to expectation. Odds of complex architecture are depleted or random across ages.
7 – Complex age architecture landscapes in transcribed enhancers.
8 – Complex age architecture landscapes for transcribed enhancer with 3+ breaks.
9 – Simple and complex transcribed enhancer lengths versus architecture-matched expectation, per age.
10 – Simple transcribed enhancer syntenic blocks are longer than complex syntenic blocks across ages.
11 – Transcribed simple and complex enhancer architecture enrichment across FANTOM tissue and cell line datasets.
12 – Tissue pleiotropy is correlated with transcribed enhancer length per age.
13 – Developmental human neocortical enhancers from the Reilly et al. and Emera et al. datasets are enriched for simple architectures.
14 – Complex developmental human neocortical enhancers overlap more mouse and rhesus neocortical active enhancers than simple enhancers.
15 – PhastCons estimates for complex and simple transcribed enhancers.
16 – Tissue pleiotropy is weakly correlated with purifying selection in simple and complex enhancers per age.
17 – Simple transcribed enhancers are more enriched than complex enhancers for GWAS variants across ages.
18 – ClinVar annotations in transcribed enhancer architectures.
19 – eQTL variants are similarly enriched in simple and complex transcribed enhancers.
20 – Variants in simple transcribed and histone enhancers are enriched for significant effects on regulatory activity in massively parallel reporter assays.
21 – TEDS enrichment in simple and complex transcribed enhancer sequences.
22 – TEDS are enriched in cores of younger complex transcribed enhancers, depleted from cores of older complex enhancers.
23 – Histone-defined enhancers are enriched for older sequence ages.
24 – Distribution of median number of age segments for 98 ROADMAP histone enhancer datasets.
25 – Simple and complex histone enhancer age architectures.
26 – Removing exons overlapping Roadmap histone enhancers increases enrichment of simple enhancers across tissues.
27 – Exon overlap flanking regions in complex histone enhancers.
28 – Complex histone enhancer age architecture landscapes.
29 – Histone (non-exon) simple and complex enhancer age architectures.
30 – Trimmed histone (non-exon, 310 bp) simple and complex enhancer age architectures.
31 – Tissue pleiotropy across 98 tissue and developmental samples is higher in complex histone enhancers versus simple.
32 – LINSIGHT purifying selection estimates in histone brain, blood, and developmental datasets.
33 – Histone simple and complex enhancer GWAS tag-SNP enrichment in 98 tissue and cell datasets.
34 – ClinVar annotations in histone enhancer architectures.
Supplemental Tables
1 – Summary of key FANTOM and ROADMAP findings.
Supplemental Figure 1. Strategy for generating architecture matched genomic background coordinates. Shuffled, non-exonic genomic background coordinates were matched on enhancer length and chromosome number (Methods). Syntenic sequence age was then assigned to matched-background datasets and simple/complex architecture was determined from the median number of age segments per enhancer in the corresponding enhancer dataset. On the right, kernel density estimates of simple and complex genomic background sequence lengths are comparable to matched simple and complex FANTOM sequence lengths.
Supplemental Figure 2. Transcribed enhancer sequence ages are enriched for older sequence ages, depleted of younger sequence ages. (A) The distribution of enhancer sequence ages across 30,474 FANTOM transcribed enhancers compared to 100 sets of length-matched random genomic regions (N = 2,700,459 shuffled regions, gray). Enhancers are significantly older than expected compared to length-matched random genomic regions (p < 2.2e-308, Mann Whitney U test). Sample sizes for FANTOM (black) and shuffled (grey) bars are annotated below. (B) Enhancer lengths by age versus 100 sets of length-matched random genomic background sets. Older enhancers are longer than expected (median 321 bp versus 310 bp, enhancers versus random regions older than placental mammals; p < 2.2e-308), and younger enhancers are shorter than expected (median 277 bp versus 286 bp random regions; p = 3e-15). (C) Older enhancers are more conserved than younger enhancers and more conserved than expected (28% enhancers versus 12% random regions overlap a PhastCons element). Sample sizes for FANTOM (black) and shuffled (grey) bars are annotated below. (D) Enhancers are enriched for older sequence ages compared with 100x length-matched random genomic shuffle regions. Fold-change is log2-scaled. Numbers in parentheses represent estimated MYA since the last common ancestor. (E) Younger enhancers are shorter than expected. Linear regression was fit to enhancer and shuffled lengths over ages.
[Supplemental Figure 2 plot, panels A-E. Axes: Frequency, Length (bp), and PhastCons overlap per age (Frequency) by sequence age (Homo, Prim, Euar, Bore, Euth, Ther, Mam, Amni, Tetr, Vert); Fold ratio v. bkgd (log2-scaled); Enhancer Length versus Millions of years ago. Legend: FANTOM, SHUFFLE. Enhancer N = 30438, Shuffle N = 2700459.]
Supplemental Figure 3. Hg19 human genome age distribution. (A) Most human syntenic bases (N = 2,877,464,452) are derived from the placental (Eutherian) ancestor. Sequence age was determined for each base pair from hg19 UCSC 46-way MultiZ sequence alignments. Sequence ages are assigned to each syntenic block based on the oldest most recent common ancestor (MRCA) of extant species alignable with humans. Number of bases per age is annotated. (B) Younger syntenic blocks are longer than older syntenic blocks. Median syntenic block length per age is shown. Syntenic block sample sizes are annotated per bar. (C) A minority of syntenic blocks per age overlap phastCons elements. Percent of syntenic blocks overlapping phastCons elements within each age. Number of syntenic blocks overlapping PhastCons elements is annotated in black.
[Supplemental Figure 3 plot, panels A-C. Axes: % of syntenic bases, Syntenic block length (bp), and Frequency of syntenic blocks overlapping phastCons per age. Age categories (MYA): Homo sapiens (0), Primate (74), Euarchontoglires (90), Boreoeutheria (96), Eutheria (105), Theria (159), Mammalia (177), Amniota (312), Tetrapoda (352), Vertebrata (615). N = 2,877,464,452 bp.]
[Supplemental Figure 4 plot. Panel groups: Raw (n = 30462 enhancers), No Ensembl Exons (n = 30439), No Simple Tandem Repeats (n = 28719), No Repeatmasker (n = 15564). Axes per group: Number of Age Segments versus Fold Change v. Bkgd (log2-scaled); and, per sequence age (Homo sapiens to Vertebrata), # age segments, Enhancer Length (bp), Frequency, and log2(Fold Change). Legend: FANTOM, SHUFFLE.]
Supplemental Figure 4. Masking exons, simple repeats, and transposable elements from the transcribed enhancer dataset and genomic background does not change interpretations of simple and complex enhancer architectures. Hg19 Ensembl exon coordinates (downloaded from the UCSC Genome Browser on 2020-09-25), tandem repeats, and TEs (RepeatMasker hg19 open-4.0.5 - Repeat Library 20140131) were masked from both the FANTOM enhancer datasets and the 100x enhancer length- and chromosome-matched shuffle datasets (from which blacklisted ENCODE regions had previously been removed) using the BEDTools subtract function.
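As a rough illustration of what the masking step does to a single enhancer, the sketch below removes masked bases from one interval. It is pure Python, assumes BED-style half-open coordinates on a single chromosome, and mimics only the default behaviour of `bedtools subtract` (dropping the overlapped portion); the function name and coordinates are hypothetical.

```python
def subtract_intervals(feature, masks):
    """Remove masked bases from one (start, end) interval.

    Coordinates are half-open, as in BED files. Returns the surviving
    sub-intervals in ascending order.
    """
    start, end = feature
    kept = []
    cursor = start  # left edge of the part not yet resolved
    for m_start, m_end in sorted(masks):
        if m_end <= cursor:
            continue  # mask lies entirely to the left of what remains
        if m_start >= end:
            break  # mask (and all later ones) lie to the right
        if m_start > cursor:
            kept.append((cursor, m_start))
        cursor = max(cursor, m_end)
        if cursor >= end:
            break
    if cursor < end:
        kept.append((cursor, end))
    return kept


# Hypothetical enhancer overlapped by an exon and a repeat element.
pieces = subtract_intervals((100, 400), [(150, 200), (380, 450)])
```

Because masking can split one enhancer into several pieces, any downstream count of age segments would need to decide whether the pieces are treated jointly or separately; the source does not say which, so that choice is left open here.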
Supplemental Figure 5. Complex enhancers have fewer age segments than expected based on length-matched regions from the genomic background (overall mean 2.54 total age segments in complex enhancers (N = 10,581) v. 2.68 in complex shuffled regions (N = 1,123,232), p = 9.9e-42, Mann Whitney U). The number of age segments in complex transcribed enhancers differs significantly across MRCAs (p = 1.9e-88, Kruskal Wallis) and from the genomic background. Number of segments of different ages in complex enhancers of different ages (green) compared to length-matched complex regions selected randomly from the genomic background (gray). At every age, the random regions have at least as many segments of different ages as complex enhancers. The largest differences are observed in Eutheria and Theria enhancers. Error bars are estimated from bootstrapped 95% confidence intervals. Sample sizes are annotated per bar.
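Bootstrapped error bars of this kind can be approximated with a percentile bootstrap. The sketch below is one common variant using only the standard library; the resampling count, the percentile method, the seed, and the segment counts are illustrative assumptions, since the source does not specify how its intervals were computed.

```python
import random
from statistics import mean


def bootstrap_ci(values, stat=mean, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(values)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    stats = sorted(stat(rng.choices(values, k=n)) for _ in range(n_boot))
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Hypothetical age-segment counts for a handful of complex enhancers.
segment_counts = [2, 3, 2, 4, 3, 2, 3, 3, 2, 2]
ci_low, ci_high = bootstrap_ci(segment_counts)
```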
Supplemental Figure 6. Both simple and complex transcribed enhancers are enriched for older sequences, depleted of younger sequences compared to expectation. Odds of complex architecture are depleted or random across ages. (A) Fold-ratio was estimated from simple (left, N = 19857) and complex enhancers (right, N = 10581) against 100x permuted architecture-matched background genome regions. Sample size is annotated for each bar. (B) Odds ratio of observing complex enhancer architecture versus simple enhancer architecture per age was estimated using Fisher’s Exact Test with FDR < 5%. Error bars represent 95% confidence intervals. Sample size is annotated for each bar.
[Plots for Supplemental Figures 5 and 6. Figure 5, panels A-B: Mean age segments overall and per age (Primate to Vertebrata); FANTOM N = 10581, SHUFFLE N = 1123232; *p = 9.9e-42. Figure 6, panels A-B: Fold Ratio v. Bkgd (log2-scaled) per age (Homo to Vert) for simple and complex enhancers; Odds Ratio (log2-scaled) of complex enrichment per age; *q < 0.05.]
Supplemental Figure 7. Complex transcribed enhancer age architecture landscapes. Enhancer sequence age landscapes were quantified across 100 bins and stratified by oldest sequence age. Sequence age architecture was sampled from 10,956 complex autosomal FANTOM enhancers and 17,277 autosomal non-exonic background regions matched on complex architecture, enhancer length, and chromosome number. Mean age distributions across complex enhancer sequences are shown, one panel per age. Middle 50% versus outer 50% Mann-Whitney U values were calculated for each age classification. Shaded areas represent 95% confidence intervals from 1,000 bootstraps.
[Supplemental Figure 7 plot, one panel per age: Primate (74), Euarchontoglires (90), Boreoeutheria (96), Eutheria (105), Theria (159), Mammalia (177), Amniota (312), Tetrapoda (352), Vertebrata (615). X-axis: Normalized Enhancer Bin (0-100); y-axis: mean mrca. Each panel is annotated with the enhancer n, the shuffle n, and the enhancer and shuffle Mann-Whitney U p-values.]
Supplemental Figure 8. Complex enhancer age architecture landscapes with 3+ ages in FANTOM are distinct from matched background. Enhancer sequence age landscapes were quantified across 100 bins and stratified by oldest sequence age. Sequence age architecture was sampled from 1,338 complex autosomal FANTOM enhancers and 7,674 non-exonic genomic background regions matched on length, chromosome, and complex architecture. Grey lines represent complex shuffled architectures with 3+ ages; one panel is shown per age. Grey numbers represent mean ages in inner 50% and outer 25% quadrants. Middle 50% versus outer 50% Mann-Whitney U values were calculated for each age classification. Shaded areas represent 95% confidence intervals from 1,000 bootstraps.
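One way to picture the 100-bin landscapes and the middle-versus-outer comparison is sketched below. It assumes each enhancer is a list of contiguous (start, end, age) syntenic segments and assigns each bin the age found at its midpoint; that sampling rule, the function names, and the example enhancer are illustrative assumptions, not the authors' stated procedure.

```python
def age_landscape(segments, n_bins=100):
    """Sample sequence age at the midpoint of each length-normalized bin.

    segments: contiguous (start, end, age) tuples sorted by start,
    with half-open coordinates.
    """
    enh_start, enh_end = segments[0][0], segments[-1][1]
    length = enh_end - enh_start
    profile = []
    for b in range(n_bins):
        midpoint = enh_start + (b + 0.5) * length / n_bins
        for start, end, age in segments:
            if start <= midpoint < end:
                profile.append(age)
                break
    return profile


def mean_landscape(enhancers, n_bins=100):
    """Average the per-bin ages over many enhancers."""
    profiles = [age_landscape(e, n_bins) for e in enhancers]
    return [sum(col) / len(col) for col in zip(*profiles)]


def core_and_flank(profile):
    """Split a profile into its middle 50% and its outer 50% of bins."""
    n = len(profile)
    quarter = n // 4
    core = profile[quarter:n - quarter]
    flank = profile[:quarter] + profile[n - quarter:]
    return core, flank


# Hypothetical 400 bp enhancer: an older (312 MYA) core between
# younger (105 MYA) flanks.
enhancer = [(0, 100, 105), (100, 300, 312), (300, 400, 105)]
profile = age_landscape(enhancer)
core, flank = core_and_flank(profile)
```

The `core` and `flank` values pooled over all enhancers of one age class are the two samples that a Mann-Whitney U test would then compare.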
Supplemental Figure 9 – Simple and complex transcribed enhancer lengths versus architecture-matched expectation per age. (A) Median lengths for transcribed enhancer and shuffled genome architectures stratified by age. Complex enhancer sequences are slightly, but significantly, longer than expected (median 347 bp versus 339 bp; p = 2.5e-06, Mann Whitney U test). Simple enhancers are slightly longer than expected (median 259 bp simple versus 255 bp simple genomic background; p = 7.3e-05). Per bar sample sizes are annotated below. (B) Linear regression models were fit to simple and complex transcribed enhancer lengths, and to architecture-matched genomic background lengths, against divergence time estimates in millions of years ago (MYA) from TimeTree (Hedges et al., 2015). Complex enhancers have a steeper slope than matched genomic background (10.6 bp/100 million years (MY) complex enhancer slope; p = 1.1e-17 versus 4.3 bp/100 MY complex genomic region slope; p = 3.7e-251, linear regression). In contrast, simple enhancers maintain a flat slope over time (-0.7 bp/100 MY simple enhancer slope; p = 0.5, versus -5.5 bp/100 MY simple genomic region slope; p < 2.2e-308).
[Supplemental Figure 9 plot, panels A-B. Panel A: Length (bp) per age (Homo to Vert). Panel B: Length (bp) versus mya. Legend: FANTOM-simple, SHUFFLE-simple, FANTOM-complexenh, SHUFFLE-complexenh.]
Sample sizes per age (N):
Age    simple FANTOM    simple Shuffle    complex FANTOM    complex Shuffle
Homo   46               24835             0                 0
Prim   1051             359511            280               95753
Euar   149              26304             99                29564
Bore   1250             133830            541               128314
Euth   13476            843519            2968              472729
Ther   1577             67325             2251              130146
Mam    1081             38884             2100              92461
Amni   473              27385             945               64923
Tetr   133              10223             483               35794
Vert   621              45410             914               73464
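For readers who want to see how a slope in bp per 100 MY is obtained, the following is a minimal ordinary least squares sketch. The divergence times are the MYA values shown on the figure axes; the length values are synthetic, built to have a known slope, and are not the study data. The p-values reported in the caption would require a full regression routine and are not reproduced here.

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Divergence times (MYA) for Primate through Vertebrata.
ages_mya = [74, 90, 96, 105, 159, 177, 312, 352, 615]
# Synthetic lengths with a built-in slope of 0.106 bp per MY.
lengths_bp = [300 + 0.106 * age for age in ages_mya]

slope, intercept = ols_slope_intercept(ages_mya, lengths_bp)
slope_per_100_my = slope * 100  # express as bp per 100 million years
```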
Supplemental Figure 10. Simple transcribed enhancer syntenic blocks are longer than complex syntenic blocks across ages. Shown is the mean syntenic length per enhancer age. Syntenic blocks in
simple enhancers range between 216-279 bp long (median), while complex syntenic blocks range between 122-168 bp (median) across ages. Random non-exonic genomic shuffles matched on age and architecture are shown. Confidence intervals were estimated with 1000 bootstraps. Sample sizes for each bar are reported in Fig S9.
[Supplemental Figure 10 plot. X-axis: enhancer age (homo, prim, euar, bore, euth, ther, mam, amni, tetr, vert); y-axis: syntenic length (bp). Legend: simple-FANTOM, simple-SHUFFLE, complex-FANTOM, complex-SHUFFLE.]
Supplemental Figure 11. Simple enhancers are significantly enriched across FANTOM tissue and cell line datasets. Simple enhancer enrichment for each tissue dataset was evaluated versus 100 non-exonic, length-matched, chromosome-matched random genomic datasets. Fold enrichment was measured using Fisher’s Exact Test and odds ratio confidence intervals are plotted. All datasets with significant enrichment (*p<0.05) are annotated with an asterisk.
Supplemental Figure 12. Pleiotropy is correlated with transcribed enhancer length per age. Tissue pleiotropy was measured using trimmed FANTOM enhancers (310 bp long) to control for random overlap between longer enhancers and multiple tissue datasets. We stratified simple and complex enhancers into 20 equally-sized tissue pleiotropy bins (points) and evaluated the correlation between pleiotropy and raw, original enhancer lengths. Bootstrapped confidence intervals are shown for each data point. Linear regression lines were fit to the data and correlation coefficients are shown in the legend.
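The binning-then-correlation step can be sketched as follows. The records are synthetic, the choice of Pearson's r on per-bin means is an assumption (the caption does not name the correlation coefficient), and the function names are hypothetical.

```python
from math import sqrt


def equal_sized_bins(records, n_bins):
    """Sort (pleiotropy, length) pairs by pleiotropy and split them
    into n_bins groups of near-equal size."""
    ordered = sorted(records)
    n = len(ordered)
    return [ordered[i * n // n_bins:(i + 1) * n // n_bins] for i in range(n_bins)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Synthetic enhancers: (number of tissues active, original length in bp).
records = [(i, 200 + 5 * i) for i in range(1, 41)]
bins = equal_sized_bins(records, 20)

# One point per bin: mean pleiotropy and mean original length.
bin_pleiotropy = [sum(p for p, _ in b) / len(b) for b in bins]
bin_length = [sum(l for _, l in b) / len(b) for b in bins]
r = pearson_r(bin_pleiotropy, bin_length)
```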
Supplemental Figure 13 – Developmental human neocortical enhancers from the Reilly et al. and Emera et al. datasets are enriched for simple architectures. Developmental neocortical enhancers from Reilly 2015 (n = 40,176) and Emera 2016 (n = 29,706) were aged, masked for exon overlap, and evaluated for enhancer architecture enrichment. Enhancers from Emera et al. were previously filtered for homologous mouse developmental neocortex H3K27ac+ peaks, thus excluding human-specific and primate-specific sequences. (A) Enrichment in the number of enhancer age segments (top) was calculated against a matched-genomic background dataset using Fisher’s Exact Test. Error bars represent 95th percentile confidence intervals. (B) Cumulative distribution of enhancer age segments. Blue line represents the relative simple definition (less than the median number of age segments in the dataset) for Reilly (median 5 age segments) and Emera (median 6 age segments). (C) Enhancer and genomic background age frequency. (D) Frequency of architecture stratified across ages. Sample sizes are annotated per bar and over the entire architecture dataset. (E) Fold-change measured as the ratio of enhancer to genomic background frequency per age. Error bars represent bootstrapped 95th percentile confidence intervals.
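The relative definition in panel B amounts to a median split. A minimal sketch, assuming the strict "fewer than the median" rule stated in the caption; the function name and the segment counts are made up for illustration.

```python
from statistics import median


def classify_architecture(segment_counts):
    """Label each enhancer simple or complex relative to the dataset median.

    An enhancer is 'simple' when it has fewer age segments than the
    dataset median, and 'complex' otherwise.
    """
    cutoff = median(segment_counts)
    labels = ["simple" if n < cutoff else "complex" for n in segment_counts]
    return labels, cutoff


# Hypothetical age-segment counts for seven enhancers.
labels, cutoff = classify_architecture([1, 2, 5, 5, 6, 9, 12])
```

With a dataset whose median is 5, as reported for Reilly, this rule labels enhancers with 1 to 4 age segments as simple.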
[Supplemental Figure 13 plot, panels A-E. Axes: Number of Age Segments versus Fold Change v. Bkgd (log2-scaled); number of segments versus cumulative distribution (Reilly15 or Emera16 versus Shuffle); Fold-change (log2-scale) per age; % of architecture per age (Homo to Vert); Fold change (log2-scaled) per age. Reilly 2015 N = 40176 (Simple N = 17670, Complex N = 22506); Emera 2016 N = 29685 (Simple N = 12830, Complex N = 16855).]
Supplemental Figure 14. Complex enhancers in human developmental neocortical tissues overlap more mouse and rhesus developmental neocortical active enhancers than simple enhancers. (A) Reilly et al. 2015 mouse and rhesus non-exon neocortical enhancers were lifted over using liftOver and intersected with simple and complex human neocortical enhancers. Simple architecture was defined as enhancers with fewer than 5 age segments. Length-matched complex enhancers (n = 17,061) overlap significantly more species than simple enhancers (n = 17,155), though the difference is slight (1.29 v. 1.26 species overlaps for complex and simple enhancers, p = 7.9e-13, Mann-Whitney U). (B) Emera et al. 2016 human enhancers intersected with mouse and rhesus non-exon neocortical enhancers. Simple architecture was defined as enhancers with fewer than 6 age segments. Length-matched complex enhancers (n = 12,707) overlap significantly more species than simple enhancers (n = 11,481), though the difference is slight (2.40 v. 2.36 species overlaps for complex and simple enhancers, p = 1.1e-4, Mann-Whitney U). Error bars represent 95% bootstrapped confidence intervals for both. Sample size for each bar is annotated in white.
[Supplemental Figure 14 plot, panels A-B. Y-axis: Number of Active Species, shown for Simple versus Complex enhancers overall and per sequence age (Homo sapiens to Vertebrata).]
Supplemental Figure 15. PhastCons estimates for complex and simple transcribed enhancers. (A) Complex enhancers are more frequently conserved than simple enhancers. Overall frequency of enhancers overlapping PhastCons elements among simple or complex enhancer datasets (N = 4766 simple and N = 3703 complex enhancers overlap PhastCons elements). (B) Frequency of enhancers overlapping PhastCons elements within each age.
[Supplemental Figure 15 plot. Axes: % of architecture overlapping phastCons by Architecture (Complex 32%, Simple 24%); % PhastCons overlap per age (Prim to Vert).]
Supplemental Figure 16. Tissue pleiotropy is weakly correlated with purifying selection in simple and complex enhancers per age. We stratified simple and complex enhancers into 10 equally-sized tissue pleiotropy bins (points) and evaluated the correlation between pleiotropy and purifying selection. Bootstrapped confidence intervals are shown for each data point. Linear regression lines were fit to the data and correlation coefficients are shown in the legend.
Supplemental Figure 17. FANTOM simple enhancers are more enriched than complex enhancers for GWAS variants across ages. Simple and complex FANTOM enhancers were stratified by age and tested for GWAS variant enrichment compared with 100 length-matched and architecture-matched permuted background sets. Error bars represent 95% confidence intervals bootstrapped 10,000 times. The number of overlapping GWAS variants is annotated for each bar.
[Supplemental Figure 17 plot. X-axis: Enhancer Sequence Age (Prim to Vert); y-axis: Fold-change.]
Supplemental Figure 18. ClinVar annotations in transcribed architectures. Simple FANTOM enhancers