Chapter 4. COELACANTH-SPECIFIC ADAPTIVE GENES GIVE
4.4 Results
these genes, I identified 47 PSGs unique to coelacanth compared with the holostean and teleostean lineages (Fig. 4.2A). To determine whether the positively selected sites were located in functional regions of each protein, I conducted an NCBI Conserved Domain search (Marchler-Bauer et al., 2011). Together, the 47 coelacanth-specific PSGs contained 122 functional domains. However, only 34 of these PSGs had positively selected sites within their domains, amounting to 159 sites in 52 domains. Among these 34 PSGs, neurochondrin (NCDN) had the largest number of positively selected sites within functional domains (23 sites; Fig. 4.2B).
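The domain-overlap step above can be sketched as follows. This is a minimal illustration, not the pipeline used in this chapter: it assumes that Bayes empirical Bayes (BEB) posterior probabilities are available per site in coelacanth protein coordinates, and that domain boundaries from the Conserved Domain search are 1-based inclusive intervals on the same sequence. The function names and toy values are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (domain name, start, end): 1-based, inclusive coordinates on the coelacanth protein (assumed).
Domain = Tuple[str, int, int]


def classify_sites(beb: Dict[int, float], strong: float = 0.95, weak: float = 0.5) -> Dict[int, str]:
    """Label sites 'strong' (BEB > strong) or 'weak' (weak < BEB <= strong); drop the rest."""
    labels: Dict[int, str] = {}
    for pos, prob in beb.items():
        if prob > strong:
            labels[pos] = "strong"
        elif prob > weak:
            labels[pos] = "weak"
    return labels


def sites_in_domains(positions: Set[int], domains: List[Domain]) -> Dict[str, List[int]]:
    """Map each domain name to the sorted selected positions that fall inside it."""
    hits: Dict[str, List[int]] = {}
    for name, start, end in domains:
        inside = sorted(p for p in positions if start <= p <= end)
        if inside:
            hits[name] = inside
    return hits


# Hypothetical toy input (not data from this study).
toy_beb = {12: 0.99, 40: 0.97, 41: 0.80, 88: 0.96, 150: 0.30}
toy_domains = [("domain_A", 30, 60), ("domain_B", 80, 120)]

labels = classify_sites(toy_beb)
strong_sites = {pos for pos, label in labels.items() if label == "strong"}
hits = sites_in_domains(strong_sites, toy_domains)
# Count each site once, even if two domain annotations overlap.
n_sites_in_domains = len(set().union(*hits.values()))
print(hits, n_sites_in_domains)
```

Counting over a set rather than summing the per-domain lists avoids counting a site twice when it lies in two overlapping domain annotations.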
Figure 4.2. Positively selected genes in Teleostei, Holostei, and coelacanth.
(A) Venn diagram of the number of genes under positive selection in each lineage (dN/dS > 1, FDR < 0.05, posterior probability > 0.95). Red, blue, and sky blue indicate the numbers of PSGs in the coelacanth, Teleostei, and Holostei lineages, respectively. (B) Distribution of posterior probabilities from the dN/dS analysis of the NCDN gene. X-axis: positions in the coelacanth peptide sequence; Y-axis: posterior probability calculated by Bayes empirical Bayes (BEB). Black dashed line: threshold of statistical significance (BEB = 0.95); red bars: BEB > 0.95; grey bars: 0.5 < BEB ≤ 0.95. The bottom of the graph indicates conserved domains (blue boxes) and sites under positive selection (red pins: BEB > 0.95; grey pins: 0.5 < BEB ≤ 0.95).
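The gene-level thresholds in panel (A) combine a per-gene test with a multiple-testing correction. The chapter does not name the FDR procedure; the sketch below assumes the Benjamini-Hochberg method and shows how per-gene p-values could be converted to adjusted values and combined with the dN/dS > 1 criterion (the site-level BEB > 0.95 filter is applied separately). The function names and inputs are hypothetical.

```python
from typing import List


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def passes_gene_filter(omega: float, q: float, fdr: float = 0.05) -> bool:
    """Gene-level filter: dN/dS above 1 and adjusted p-value below the FDR cutoff."""
    return omega > 1.0 and q < fdr


# Hypothetical toy input (not data from this study).
pvals = [0.01, 0.04, 0.03, 0.2]
omegas = [2.3, 1.8, 0.9, 3.1]
qvals = bh_adjust(pvals)
selected = [i for i, (w, q) in enumerate(zip(omegas, qvals)) if passes_gene_filter(w, q)]
print(qvals, selected)
```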
Functional annotation and protein network of PSGs
To infer the functions of the 47 PSGs uniquely identified in coelacanth, I performed a functional annotation analysis using DAVID (Huang et al., 2009).
These genes were enriched in four major clusters of biological processes: nitrogen compound metabolic process (NCMP), metabolic process, spindle organization, and cellular transition metal ion homeostasis (Fig. 4.3). Of these, NCMP includes the interconversion of nitrogenous organic compounds and ammonium, a key process in adapting to the changing environment during the water-to-land transition. In a previous study, Amemiya et al. (2013) found that CPS1, a gene involved in ammonium conversion, showed accelerated evolution in the MRCA of tetrapods and in the MRCA of amniotes, which they attributed to adaptation to land. Of the 14 NCMP PSGs, 4 (DDX11, DDX49, MMS19, and TRMT1) showed protein interactions with 2 urea-cycle genes, ARG2 and CPS1 (Fig. 4.4). Among these 4 genes, MMS19 and TRMT1 had the highest numbers of positively selected residues within functional domains of all NCMP genes, and both were directly associated with ARG2 and CPS1.
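The enrichment idea behind the DAVID analysis can be illustrated with a one-sided Fisher's exact (hypergeometric) test for a single annotation term. This is a simplified sketch, not DAVID's implementation: DAVID reports the EASE score, a more conservative modification of Fisher's exact test, and also clusters related terms, neither of which is reproduced here. The counts in the example are hypothetical; in practice the background would be whatever gene universe was chosen for the analysis.

```python
from math import comb


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric: n list genes drawn from
    N background genes, of which K carry the annotation term."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    upper = min(n, K)
    # Number of ways to draw i annotated and n - i unannotated genes, for i >= k.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / comb(N, n)


# Hypothetical toy numbers: 3 of 4 list genes carry a term annotated to 5 of 20 background genes.
p = enrichment_p(k=3, n=4, K=5, N=20)
print(round(p, 4))
```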
Figure 4.3. Enriched GO terms of coelacanth-specific PSGs. The four clusters of biological processes are distinguished by shades of red.
Figure 4.4. Protein-protein interaction network among urea-cycle genes and coelacanth-specific PSGs of the nitrogen compound metabolic process. Red and yellow circles indicate coelacanth-specific PSGs and urea-cycle genes, respectively.
Non-synonymous substitutions in the homeobox gene superfamily
In a previous study (Amemiya et al., 2013), alterations in the regulatory regions of HOX genes associated with morphological development were investigated to uncover the molecular evolution of limb emergence in tetrapods.
However, that study did not identify alterations in coding regions, which change the gene products themselves, responsible for the anatomical changes that distinguish the MRCA of lobe-finned fishes and tetrapods from actinopterygian fishes.
dN/dS analysis describes the molecular evolutionary history of coelacanth in terms of non-synonymous and synonymous substitutions, but it does not identify the individual coelacanth-specific amino acid substitutions that may alter the function of the resulting proteins. I therefore conducted a target-specific amino acid substitution (TAAS) analysis (Zhang et al., 2014) to identify coelacanth-specific variation in the homeobox gene superfamily.
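The core of a target-specific substitution scan can be sketched as a column-by-column comparison of an alignment. The rules below are assumptions made for illustration and may differ from those of Zhang et al. (2014): a column is reported when the target's residue is absent from every background sequence (gaps and unknown residues ignored) and, optionally, only when at least one "supporting" sequence, such as a tetrapod, shares the target residue. The sequences and species labels are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

IGNORED = {"-", "X", "?"}  # gaps and unknown residues are not compared (assumed convention)


def target_specific_sites(
    alignment: Dict[str, str],
    target: str,
    background: List[str],
    supporters: Optional[List[str]] = None,
) -> List[Tuple[int, str, Set[str]]]:
    """Return (1-based column, target residue, background residues) for columns where
    the target residue differs from every background residue. If `supporters` is given,
    keep only columns where at least one supporter shares the target residue."""
    length = len(alignment[target])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all sequences must have the same aligned length")
    sites = []
    for col in range(length):
        residue = alignment[target][col]
        if residue in IGNORED:
            continue
        bg = {alignment[s][col] for s in background} - IGNORED
        if not bg or residue in bg:
            continue  # no informative background, or residue is not target-specific
        if supporters is not None and not any(alignment[s][col] == residue for s in supporters):
            continue
        sites.append((col + 1, residue, bg))
    return sites


# Hypothetical toy alignment (not real sequences).
toy_alignment = {
    "target": "MSTLV",
    "fish_1": "MLTLA",
    "fish_2": "MLTIA",
    "tetrapod_1": "MSTLG",
    "tetrapod_2": "MNTLA",
}
fish = ["fish_1", "fish_2"]
tetrapods = ["tetrapod_1", "tetrapod_2"]

specific = target_specific_sites(toy_alignment, "target", fish)
shared = target_specific_sites(toy_alignment, "target", fish, supporters=tetrapods)
print(specific)
print(shared)
```

Running the scan against fishes alone and then again with tetrapod supporters mirrors the two-step comparison reported in the next paragraph; the second pass can only keep the same number of sites or fewer.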
Of the 3,538 conserved one-to-one orthologues, 43 belonged to the homeobox gene superfamily. Of these, 40 genes carried a total of 603 amino acid substitutions specific to coelacanth relative to ray-finned fishes. When 4 outgroup species from the tetrapod lineage were included, only 35 genes retained coelacanth-specific substitutions shared with tetrapods, 300 substitutions in total. None of these 35 genes showed strong statistical significance; however, 6 had a higher likelihood under the alternative model than under the null model (D > 0), which may indicate positive selection on parts of these genes. Among these 6 genes, 3 contained a total of 4 coelacanth-specific amino acid substitutions with significant posterior probability (BEB > 0.95). SHOX, in particular, contained the largest number of amino acid substitutions. At one SHOX site, coelacanth shares a serine with some tetrapods, whereas ray-finned fishes carry a leucine.
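The likelihood comparison above reduces to a likelihood-ratio statistic. The chapter reports D but not its exact definition or reference distribution; the sketch below assumes the common definition D = 2(lnL_alt - lnL_null) and a chi-square reference distribution with one degree of freedom, as is typical for codeml-style branch-site tests. For that test the null distribution is often taken as a 50:50 mixture of a point mass at zero and chi-square(1), which halves the p-value; both options are shown. The log-likelihood values are hypothetical.

```python
import math


def lrt_statistic(lnl_alt: float, lnl_null: float) -> float:
    """Likelihood-ratio statistic D = 2 * (lnL_alt - lnL_null)."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_df1_sf(x: float) -> float:
    """Upper-tail probability P(X >= x) for chi-square with 1 degree of freedom,
    using the identity P(chi2_1 >= x) = erfc(sqrt(x / 2))."""
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def lrt_pvalue(d: float, mixture: bool = False) -> float:
    """p-value for D under chi2_1, or under a 50:50 mixture of 0 and chi2_1 if `mixture`."""
    p = chi2_df1_sf(d)
    return p / 2.0 if mixture and d > 0.0 else p


# Hypothetical log-likelihoods (not values from this study).
d = lrt_statistic(lnl_alt=-1000.0, lnl_null=-1002.5)
d_small = lrt_statistic(lnl_alt=-1000.0, lnl_null=-1000.2)
print(d, lrt_pvalue(d), lrt_pvalue(d, mixture=True))
print(d_small, lrt_pvalue(d_small))
```

The second example shows why D > 0 alone is weak evidence: the alternative model fits slightly better, yet the p-value is far above 0.05.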
Focusing on SHOX, I collected and aligned amino acid sequences of 100 vertebrates (83 tetrapod species and 17 fishes, including coelacanth) from the UCSC Genome Browser (Meyer et al., 2013). SHOX was present in 81 of these vertebrates and absent in the remaining 19 species (Fig. 4.5). At the candidate site, all sarcopterygians examined, including tetrapods and coelacanth, carried non-synonymous substitutions (asparagine, serine, threonine, or glycine) that differ from the leucine found in Actinopterygii (Fig. 4.5).
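Checking whether the residues at a candidate site are mutually exclusive between clades, as at the SHOX site, can be sketched by grouping one alignment column by clade and testing whether the residue sets overlap. Species lacking the gene are skipped and counted separately. The species labels, clade assignments, and residues below are hypothetical placeholders, not the UCSC alignment.

```python
from typing import Dict, Optional, Set, Tuple


def partition_site(
    site: Dict[str, Optional[str]], clade_of: Dict[str, str]
) -> Tuple[Dict[str, Set[str]], int]:
    """Group the residues of one alignment column by clade.
    Species with no residue (None, gene absent) or a gap are skipped and counted."""
    residues: Dict[str, Set[str]] = {}
    missing = 0
    for species, aa in site.items():
        if aa is None or aa == "-":
            missing += 1
            continue
        residues.setdefault(clade_of[species], set()).add(aa)
    return residues, missing


def mutually_exclusive(residues: Dict[str, Set[str]], clade_a: str, clade_b: str) -> bool:
    """True if both clades have residues at the site and share none of them."""
    a = residues.get(clade_a, set())
    b = residues.get(clade_b, set())
    return bool(a) and bool(b) and a.isdisjoint(b)


# Hypothetical toy column (not the actual SHOX alignment).
clade_of = {
    "sarco_1": "Sarcopterygii", "sarco_2": "Sarcopterygii", "sarco_3": "Sarcopterygii",
    "actino_1": "Actinopterygii", "actino_2": "Actinopterygii", "actino_3": "Actinopterygii",
}
column = {
    "sarco_1": "S", "sarco_2": "N", "sarco_3": "T",
    "actino_1": "L", "actino_2": "L", "actino_3": None,  # None: gene absent in this species
}
residues, missing = partition_site(column, clade_of)
print(residues, missing, mutually_exclusive(residues, "Sarcopterygii", "Actinopterygii"))
```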
Figure 4.5. Amino acid substitutions in SHOX specific to coelacanth and tetrapods and mutually exclusive with other fishes. The red box in the peptide alignment marks the site with a coelacanth- and tetrapod-specific amino acid replacement relative to other fishes. Numbers above the alignment indicate positions in the human peptide sequence. In the amino acid alignment and common names, green, red, and blue indicate tetrapods, coelacanth, and other fishes, respectively. The tree and alignment are from the UCSC Genome Browser database (Meyer et al., 2013).