Materials and Methods - COELACANTH-SPECIFIC ADAPTIVE GENES GIVE

Chapter 4. COELACANTH-SPECIFIC ADAPTIVE GENES GIVE

4.3 Materials and Methods


Table 4.1. Versions of reference sequences of species.

| Common name | Scientific name | Class | Infraclass | Order | Reference version |
|---|---|---|---|---|---|
| Amazon molly | Poecilia formosa | Actinopteri | Teleostei | Cyprinodontiformes | Poecilia_formosa-5.1.2 |
| Cave fish | Astyanax mexicanus | Actinopteri | Teleostei | Characiformes | AstMex102 |
| Cod | Gadus morhua | Actinopteri | Teleostei | Gadiformes | gadMor1 |
| Fugu | Takifugu rubripes | Actinopteri | Teleostei | Tetraodontiformes | FUGU 4.0 |
| Medaka | Oryzias latipes | Actinopteri | Teleostei | Beloniformes | HdrR |
| Platyfish | Xiphophorus maculatus | Actinopteri | Teleostei | Cyprinodontiformes | Xipmac4.4.2 |
| Stickleback | Gasterosteus aculeatus | Actinopteri | Teleostei | Gasterosteiformes | BROAD S1 |
| Tetraodon | Tetraodon nigroviridis | Actinopteri | Teleostei | Tetraodontiformes | TETRAODON 8.0 |
| Tilapia | Oreochromis niloticus | Actinopteri | Teleostei | Perciformes | Orenil1.0 |
| Zebrafish | Danio rerio | Actinopteri | Teleostei | Cypriniformes | GRCz10 |
| Spotted gar | Lepisosteus oculatus | Actinopteri | Holostei | Lepisosteiformes | LepOcu1 |
| Coelacanth | Latimeria chalumnae | Sarcopterygii | | Coelacanthiformes | LatCha1 |
| Anole lizard | Anolis carolinensis | Reptilia | | Squamata | AnoCar2.0 |
| Chinese softshell turtle | Pelodiscus sinensis | Reptilia | | Testudines | PelSin_1.0 |
| Human | Homo sapiens | Mammalia | | Primates | GRCh38.p7 |
| Xenopus | Xenopus tropicalis | Amphibia | | Anura | JGI 4.2 |


Figure 4.1. Cladogram of the Osteichthyes. Bold lines in the tree indicate the most recent ancestral branch of each lineage. Blue, sky blue, and red indicate the Teleostei, Holostei, and coelacanth lineages, respectively.

Orthologous gene set alignments

Multiple sequence alignments of suitable coding gene sets were prepared for the detection of positive selection in the following steps. First, to exclude the possibility of functional changes caused by gene expansion (gain and loss of genes), I focused on genes with one-to-one orthologues across the 12 fishes. Using the coelacanth genome as the representative dataset, I retrieved 4160 coding gene sets from Ensembl BioMart (Kinsella et al., 2011). Second, I filtered out 28 genes whose sequence lengths were not a multiple of 3. Third, I aligned the remaining 4132 gene sets using PRANK (Löytynoja and Goldman, 2008) with two options: ‘-codon’ for codon-wise alignment and ‘-F’ for the most accurate alignment, to identify homologous sites in each species. Fourth, to exclude poorly aligned regions caused by indels and mismatches, I trimmed the 4132 alignments using Gblocks (Talavera and Castresana, 2007) with the option ‘-t=c’ for codon-wise adjustment. These steps yielded conserved coding sequence alignments of 3538 genes.
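The frame-length filter in the second step can be illustrated with a minimal Python 3 sketch. The function name `split_by_codon_frame`, the gene identifiers, and the toy sequences are hypothetical and are not taken from the original pipeline. The sketch also assumes that a gene set is dropped as soon as any one of its sequences is out of frame, which the text does not state explicitly.

```python
def split_by_codon_frame(cds_by_gene):
    """Partition orthologous gene sets by reading-frame consistency.

    cds_by_gene maps a gene-set ID to {species: coding sequence}.
    A gene set is kept only if every sequence length is a multiple of 3
    (assumption: one out-of-frame sequence discards the whole set).
    """
    kept, dropped = {}, {}
    for gene_id, seqs in cds_by_gene.items():
        in_frame = all(len(seq) % 3 == 0 for seq in seqs.values())
        (kept if in_frame else dropped)[gene_id] = seqs
    return kept, dropped


# Toy input (hypothetical IDs and sequences, for illustration only).
toy = {
    "geneA": {"coelacanth": "ATGGCCTAA", "zebrafish": "ATGGCATAA"},
    "geneB": {"coelacanth": "ATGGCCTA", "zebrafish": "ATGGCATAA"},
}
kept, dropped = split_by_codon_frame(toy)
print(sorted(kept), sorted(dropped))
```

Only gene sets in `kept` would then be passed on to codon-wise alignment and trimming.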

PSGs specific to coelacanth

To identify genes responsible for the evolution of coelacanth, I screened for molecular signatures of episodic adaptive evolution. For each of the 3538 orthologous genes from the 12 fishes, excluding the 4 tetrapods used as an outgroup, I calculated dN (the number of non-synonymous substitutions per non-synonymous site), dS (the number of synonymous substitutions per synonymous site), and their ratio dN/dS. To detect selection signatures accurately and to estimate site-wise selection on the most recent ancestral branch of each of the coelacanth, spotted gar, and Teleostei lineages in the species tree (Fig. 4.1), I ran the ‘branch-site model’ of ‘CodeML’ in the PAML program (version 4.8) (Yang, 2007) with 3 options: ‘model = 2’ to allow 2 or more dN/dS ratios among branches, ‘NSsites = 2’ to detect sites under positive selection on a foreground branch, and ‘CodonFreq = 2’ to calculate codon frequencies based on ‘F3X4’.

Based on the parameters estimated from the test, I compared the maximum likelihoods of the null and alternative models using the likelihood ratio test (LRT, D = 2Δℓ, where Δℓ is the difference in log-likelihood between the alternative and null models). Statistical significance was calculated using the chi-square test, and the false discovery rate (FDR) was used for multiple test correction in R (version 3.2.3) (Team, 2013). Consequently, I identified sites under positive selection on each lineage together with their posterior probabilities. Positively selected genes (PSGs) were detected with strict filtering criteria (dN/dS value of site class 2 on the foreground branch > 1, D > 0, and adjusted p < 0.05). After identifying significant PSGs, I used Bayes empirical Bayes (BEB) inference to find the specific sites under positive selection (site class 2) in each gene, requiring a posterior probability > 0.95. Finally, PSGs specific to coelacanth were identified by comparing the PSGs of coelacanth, Holostei, and Teleostei.
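The likelihood ratio test and PSG filtering criteria can be sketched in Python 3 as follows. This is an illustrative sketch, not the original R code. It assumes 1 degree of freedom for the chi-square test (the source does not state the degrees of freedom) and the Benjamini-Hochberg procedure for the FDR (the procedure that R's `p.adjust(method = "fdr")` computes). All function names and the toy log-likelihoods are hypothetical.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """D = 2 * (lnL_alternative - lnL_null)."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf_df1(d):
    """Upper-tail p-value of a chi-square variable with 1 df (assumed df)."""
    if d <= 0:
        return 1.0
    return math.erfc(math.sqrt(d / 2.0))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_psg(omega_fg_class2, d, adj_p):
    """Filtering criteria quoted in the text."""
    return omega_fg_class2 > 1 and d > 0 and adj_p < 0.05


# Toy values (hypothetical): (lnL null, lnL alternative, foreground omega).
genes = {"g1": (-1000.0, -990.0, 5.2), "g2": (-800.0, -799.9, 1.4)}
stats = {g: lrt_statistic(n, a) for g, (n, a, _) in genes.items()}
names = list(genes)
adj = benjamini_hochberg([chi2_sf_df1(stats[g]) for g in names])
for g, q in zip(names, adj):
    print(g, stats[g], q, is_psg(genes[g][2], stats[g], q))
```

Each gene passes through one null and one alternative CodeML fit, so the two log-likelihoods per gene are the only inputs this step needs.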

Conserved domain search

To determine whether sites under positive selection are located in functional domains of each gene, I performed domain analysis using the Batch Web CD-Search tool at NCBI (Marchler-Bauer et al., 2011). Peptide sequences of PSGs unique to coelacanth were used as the query set, and the following options were applied: Data source: CDSEARCH/cdd v3.15; Expected value: 0.01; Composition-corrected scoring: Applied; Low-complexity regions: Not filtered.
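Once domain coordinates have been obtained, checking whether a positively selected site lies inside a domain reduces to an interval lookup. The Python 3 sketch below assumes 1-based, inclusive peptide coordinates. The function name `sites_in_domains`, the domain label, and the coordinates are hypothetical and do not come from the CD-Search output format.

```python
def sites_in_domains(sites, domains):
    """Map each site to the domains whose [start, end] interval contains it.

    sites   : iterable of 1-based peptide positions
    domains : iterable of (name, start, end), 1-based inclusive (assumption)
    """
    hits = {}
    for site in sites:
        hits[site] = [name for name, start, end in domains
                      if start <= site <= end]
    return hits


# Toy example with a hypothetical domain spanning residues 100 to 160.
print(sites_in_domains([12, 150], [("domain_A", 100, 160)]))
```

A site mapped to an empty list falls outside every annotated domain of that peptide.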

Gene ontology analysis

To examine the group functions of PSGs specific to coelacanth, I performed gene ontology analysis with gene set enrichment tests using DAVID functional annotation (Huang et al., 2009). To allow comparison with other fishes, zebrafish was used as the representative background model. The significance cutoff for the enrichment test was set to the default p-value < 0.1 because of the small number of coelacanth-specific PSGs. I summarized the gene ontology biological process terms by hierarchical clustering with the ‘hclust’ function in R (version 3.2.3) (Team, 2013).
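The idea behind a gene set enrichment test can be illustrated with a plain one-sided hypergeometric test in Python 3 (3.8 or later for `math.comb`). This is a simplified stand-in and not DAVID's own scoring: DAVID reports an EASE score, a more conservative modification of Fisher's exact test. The function name and the toy counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric.

    N : genes in the background
    K : background genes annotated with the term
    n : genes in the query list
    k : query genes annotated with the term
    """
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, upper + 1)) / total


# Toy counts (hypothetical): 5 of 5 query genes carry a term that
# annotates 5 of 10 background genes.
p = hypergeom_enrichment_p(k=5, n=5, K=5, N=10)
print(p, p < 0.1)
```

With a small query list, as is the case for the coelacanth-specific PSGs, few terms can reach very small p-values, which is the reason the text gives for the lenient 0.1 cutoff.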

Protein-protein interaction network analysis

To investigate interactions among genes, I used the Search Tool for the Retrieval of Interacting Genes (STRING) online database (http://string-db.org/) (Szklarczyk et al., 2014). STRING provides direct (physical) and indirect (functional) associations among genes based on multiple resources (Szklarczyk et al., 2014). I searched for interactions between 5 urea cycle genes and 14 coelacanth-specific PSGs of the nitrogen compound metabolic process to generate a network with the following options: Organism: Danio rerio; Active interaction sources: Text-mining, Experiments, Databases, Co-expression, Neighborhood, Gene fusion, and Co-occurrence; Minimum required interaction score: medium confidence (0.4). The network was visualized using Cytoscape 3.3.0 (Shannon et al., 2003).
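The effect of the score threshold and the two gene sets can be sketched as a filter over an edge list in Python 3. This sketch does not call STRING itself. It assumes that interactions are already available as `(gene_a, gene_b, score)` tuples with scores on a 0 to 1 scale, and that an edge is kept when both endpoints belong to the union of the two gene sets. The function name and all gene labels are hypothetical placeholders.

```python
def filter_network(edges, set_a, set_b, min_score=0.4):
    """Keep edges among the union of two gene sets at or above min_score."""
    members = set(set_a) | set(set_b)
    return [(a, b, s) for a, b, s in edges
            if a in members and b in members and s >= min_score]


# Toy edge list with placeholder gene labels (not real gene names).
edges = [
    ("urea_1", "psg_1", 0.72),
    ("urea_1", "psg_2", 0.31),   # below the medium-confidence cutoff
    ("psg_1", "other_1", 0.90),  # endpoint outside both gene sets
    ("urea_2", "psg_2", 0.40),
]
print(filter_network(edges, {"urea_1", "urea_2"}, {"psg_1", "psg_2"}))
```

The surviving edges form the network that would be exported for visualization.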

Amino acid changes specific to coelacanth

Target-specific amino acid substitution (TAAS) analysis (Zhang et al., 2014) was conducted to find mutually exclusive amino acid substitutions between coelacanth and the other fishes. The TAAS module and a codon translator were written and run in Python (version 2.7.9, http://www.python.org). For one of the homeobox genes, SHOX, I conducted an additional TAAS analysis with the 100-way Multiz alignment of 100 vertebrates (Blanchette et al., 2004) from the UCSC Genome Browser (Meyer et al., 2013).
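A minimal sketch of a codon translator and a TAAS scan is given below in Python 3 (the original module was written for Python 2.7.9 and is not reproduced here). It assumes the standard genetic code, codon-aligned input sequences of equal length, and that any column containing a gap or an unknown residue is skipped. That gap rule is an assumption of this sketch. The function names and toy sequences are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(cds):
    """Translate a codon-aligned CDS; '---' becomes '-', unknown codons 'X'."""
    cds = cds.upper()
    peptide = []
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[pos:pos + 3]
        peptide.append("-" if codon == "---" else CODON_TABLE.get(codon, "X"))
    return "".join(peptide)


def taas_sites(proteins, target):
    """Return (1-based position, target residue, sorted other residues)
    for columns where the target residue occurs in no other species."""
    others = [s for s in proteins if s != target]
    sites = []
    for idx, residue in enumerate(proteins[target]):
        column = [proteins[s][idx] for s in others]
        if residue in "-X" or any(r in "-X" for r in column):
            continue  # assumption: skip columns with gaps or unknowns
        if residue not in column:
            sites.append((idx + 1, residue, sorted(set(column))))
    return sites


# Toy codon alignment (hypothetical sequences).
cds = {"coelacanth": "ATGAAAGGC", "zebrafish": "ATGCGTGGC", "medaka": "ATGCGTGGC"}
proteins = {sp: translate(seq) for sp, seq in cds.items()}
print(taas_sites(proteins, "coelacanth"))
```

In this toy alignment only the second residue differs between the target and every other species, so it is the only column reported.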
