2 Methods - Copy Number Variants

2.1 Sample Selection

For this population genetic study, unrelated individuals were selected from genotype data based on the Pi-hat value (Pi-hat < 0.25).

Pi-hat was estimated with the --genome option implemented in PLINK. For instance, given pop.ped and pop.map in PLINK format, the following command generates a table (pop.genome) of estimated Pi-hat values for all pairs of individuals:

plink --file pop --genome --out pop --cow --noweb

Unrelated individuals were then kept for downstream analyses.

Raw data were extracted from Illumina’s GenomeStudio application (see Note 1).

2.2 CNVs Segmentation and Genotyping

To facilitate population genetic analysis, CNV segments were extracted from Golden Helix SVS 8.0 using the default multivariate option (see Note 2). After merging across all individuals, we discovered 263 nonredundant CNVs that are commonly shared within the whole population.

Since the SVS multivariate option was developed to identify moderate- to high-frequency CNVs, only segments with frequencies above 1% were retained for further analysis in order to filter out potential false-positive calls. Details of the SVS multivariate option can be found online (http://doc.goldenhelix.com/SVS/latest/svsmanual/copy_number_analysis.html).

Finally, a total of 257 CNVs were retained and used to categorize the samples as one of three types (loss, neutral, and gain events) according to a three-state model with strict threshold levels of marker mean log R Ratio (LRR) ± 0.3. While all 257 CNVs were used for frequency and VST calculations, only the 184 deletion CNV regions were used in all other subsequent population genetic analyses (see Note 3).

We then utilized the copy number analysis module (CNAM) under the multivariate option to segment chromosomes with default settings and a significance level of p = 0.01 for pairwise permutations (n = 1000), as described previously. The three-state covariates with a comparatively strict threshold (segment mean ±0.3) were used to genotype the CNVs as one of three types (loss, neutral, and gain events) across all samples. We noted that the multivariate method tends to detect common deletions of relatively small size across multiple samples.
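The three-state call can be illustrated outside SVS with a few lines of R. This is only a sketch under stated assumptions: the matrix values are made up, the object names (lrr, states, threshold) are hypothetical, and treating the ±0.3 boundary as strict is our assumption rather than documented SVS behavior.

```r
# Toy mean LRR matrix (made-up values): rows = individuals, columns = CNV segments
lrr <- matrix(c(-0.95,  0.02,  0.45,
                 0.05, -0.40,  0.10,
                -0.10,  0.31, -0.29),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("ind1", "ind2", "ind3"),
                              c("seg1", "seg2", "seg3")))

threshold <- 0.3  # three-state threshold on the segment mean LRR

# Below -0.3 is called a loss, above +0.3 a gain, everything else neutral.
# ifelse() keeps the dimensions and dimnames of the input matrix.
states <- ifelse(lrr < -threshold, "loss",
                 ifelse(lrr > threshold, "gain", "neutral"))

print(states)
print(table(states))  # number of calls per state
```

The resulting character matrix has the same layout as the LRR matrix, so it can be passed on to frequency calculations or recoding steps.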

We generated the mean LRR matrix from Golden Helix SVS, in which each row represents an individual and each column represents a CNV segment. The R function heatmap.2 (http://www.inside-r.org/packages/cran/gplots/docs/heatmap.2) was used to graph the segment mean LRR values and generate hierarchical cluster dendrograms using 257 CNVs for all animals.

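###Read the mean LRR matrix (rows: individuals, columns: CNV segments)
Seg257<-read.table("Seg_257.txt",header=T);dim(Seg257)
library(gplots)
#help(heatmap.2)
###Open the PDF device, draw the heat map, and close the device
###Note: Rowv=NA and Colv=NA appear to keep the input order and suppress the dendrograms;
###omit them if the hierarchical cluster dendrograms are wanted
pdf("Seg_renames_257Seg_2.pdf")
seg257_heatmap <- heatmap.2(as.matrix(Seg257), Rowv=NA, Colv=NA, col = redblue(256), scale="column")
dev.off()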

We first checked the normalized LRR distribution for all 184 deletion CNVs. The vast majority (99%) of deletions had two distinct peaks, representing the neutral (around 0) and homozygous deletion (around −1) states, respectively. Only a couple of deletions had a few samples located at the midpoint between −1 and 0, suggesting a lack of heterozygous events. Additionally, it was difficult to define a universal threshold between homozygous and heterozygous deletions for all deletions; therefore, we decided to categorize all deletion events using one state: homozygous deletion. To use population genetic programs originally developed for SNPs, we manually recoded each of the 184 deletion CNVs by converting a loss event into “12” and a neutral event into “22,” where “12” represented a homozygous deletion.
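The manual recoding can also be scripted. The sketch below assumes a character matrix of calls (hypothetical name states) with one row per individual and one column per deletion CNV, and writes PLINK PED-style lines in which "12" becomes the allele pair "1 2" and "22" becomes "2 2". The individual IDs, the use of the individual ID as family ID, and the output file name are illustrative assumptions; a matching MAP file is still needed.

```r
# Toy calls (made-up): rows = individuals, columns = deletion CNVs
states <- matrix(c("loss",    "neutral",
                   "neutral", "loss"),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("ind1", "ind2"), c("del1", "del2")))

# Loss -> "1 2", neutral -> "2 2" (two alleles per marker, as in a PED file)
geno <- ifelse(states == "loss", "1 2", "2 2")

# One PED line per individual: FID IID PAT MAT SEX PHENOTYPE genotypes...
# 0 = unknown parent/sex and -9 = missing phenotype in PLINK conventions.
ids <- rownames(states)
ped_lines <- paste(ids, ids, 0, 0, 0, -9,
                   apply(geno, 1, paste, collapse = " "))

ped_file <- file.path(tempdir(), "del_toy.ped")  # hypothetical file name
writeLines(ped_lines, ped_file)
print(ped_lines)
```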

We then performed multidimensional scaling (MDS) and admixture analysis to determine how the 205 unrelated individuals clustered according to these CNV genotypes. Using a total of 184 deletion CNVs, we performed MDS analysis of pairwise genetic distance (4 dimensions) to detect the relationship between populations with PLINK 1.07 (--mds-plot 4). For a separate verification, we also carried out cluster analysis based on mean LRR values using the prcomp function in R.
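As an illustration of the prcomp verification step, the following sketch runs a principal component analysis on a simulated mean LRR matrix. The data, the group structure, and the choice to center without scaling are assumptions made for the example and are not taken from the study.

```r
set.seed(42)

# Simulated mean LRR matrix: 10 individuals x 5 segments (made-up data)
lrr <- matrix(rnorm(50, mean = 0, sd = 0.05), nrow = 10,
              dimnames = list(paste0("ind", 1:10), paste0("seg", 1:5)))

# Give individuals 6-10 a deletion-like shift at the first two segments
lrr[6:10, 1:2] <- lrr[6:10, 1:2] - 1

# PCA on the individuals; columns are centered but not rescaled
pca <- prcomp(lrr, center = TRUE, scale. = FALSE)

pc_scores <- pca$x[, 1:2]                       # coordinates on PC1 and PC2
var_explained <- pca$sdev^2 / sum(pca$sdev^2)   # proportion of variance per PC

print(round(pc_scores, 3))
print(round(var_explained, 3))
```

In this toy data the two groups separate along PC1, which is the kind of pattern one would compare against the MDS result.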

2.3 Population Differences Across Population

Prepare the PLINK format files (PED and MAP) from the deletion-based CNV genotypes: del_184.ped and del_184.map. Pairwise IBS metrics were calculated, and clustering based on the 184 deletions was evaluated using the following command lines. The second command reads the del_184.genome file written by the first.

>plink --file del_184 --genome --out del_184 --cow --noweb

>plink --file del_184 --read-genome del_184.genome --cluster --ppc 1e-3 --mds-plot 2 --out del_184 --cow --noweb

Finally, we obtained the del_184.mds file, which was used to draw the MDS plot with the R plot function.

To perform multidimensional scaling analysis on the N x N matrix of genome-wide IBS pairwise distances, use the --mds-plot option in conjunction with --cluster. This command takes a single parameter, the number of dimensions to be extracted. Plotting the C1 values against C2, for example, will generate a scatter plot in which each point represents an individual; the two axes correspond to a reduced representation of the data in two dimensions, which can be useful for identifying any clustering.
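A minimal sketch of the plotting step is given below. It writes and reads back a small made-up table with the column layout of a PLINK .mds file (FID, IID, SOL, C1, C2) so that it runs on its own; in practice the table would come from del_184.mds. Using the family ID as the breed label and the output file name are assumptions of this example.

```r
# Made-up table in the layout of a PLINK .mds file
mds <- data.frame(FID = c("HOL", "HOL", "NEL", "NEL"),
                  IID = paste0("ind", 1:4),
                  SOL = 0,
                  C1  = c(-0.05, -0.04, 0.06, 0.05),
                  C2  = c( 0.01, -0.01, 0.02, -0.02))
mds_file <- file.path(tempdir(), "toy.mds")
write.table(mds, mds_file, quote = FALSE, row.names = FALSE)

# Read the table back, as one would read del_184.mds
mds_in <- read.table(mds_file, header = TRUE)
breed <- factor(mds_in$FID)  # assumption: FID holds the breed label

# Scatter plot of the first two dimensions, one color per breed
out_pdf <- file.path(tempdir(), "toy_mds.pdf")
pdf(out_pdf)
plot(mds_in$C1, mds_in$C2, col = as.integer(breed), pch = 19,
     xlab = "C1", ylab = "C2", main = "MDS of pairwise IBS distances")
legend("topright", legend = levels(breed),
       col = seq_along(levels(breed)), pch = 19)
dev.off()
```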

The genetic distance matrix was generated using the following command:

>plink --file del_184 --cluster --distance-matrix --out del_184 --noweb --cow

This step generates the “del_184.mdist” file. To explore admixture using the 184 deletions, we prepared the input file for the STRUCTURE software using the --recode-structure option in PLINK.

> plink --file del_184 --recode-structure --cow --noweb

This command generates the input file for STRUCTURE. The second line of this file, which contains position information, should be removed to fit the STRUCTURE format in which the “data file stores data for individuals in a single line.” Population structure was examined using STRUCTURE 2.3. Each admixture analysis was performed using 5000 replicates and 2000 burn-in cycles under the admixture and correlated allele frequencies models. For detailed usage of STRUCTURE, refer to the website of the Pritchard Lab, Stanford University: http://web.stanford.edu/group/pritchardlab/structure_software/release_versions/v2.3.4/html/structure.html.
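Removing the second line can be done in any text editor; a small R sketch is shown here for reproducibility. The file contents and file names are made up for illustration, and the real name of the PLINK output depends on the --out setting.

```r
# Made-up stand-in for the PLINK STRUCTURE-format output:
# line 1 = marker names, line 2 = position information, then one line per individual
str_file <- file.path(tempdir(), "toy.strct_in")
writeLines(c("del1 del2 del3",
             "-1 1000 2000",
             "ind1 1 1 2 2 2 2 2",
             "ind2 1 2 2 2 2 1 2"),
           str_file)

# Drop the second line (position information) and write the edited file
str_lines <- readLines(str_file)
out_file <- file.path(tempdir(), "toy_structure_input.txt")
writeLines(str_lines[-2], out_file)

print(readLines(out_file))
```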

Neighbor-joining clustering analysis was performed using PHYLIP 3.69 (http://www.phylip.com/) based on pairwise genetic distance. The pairwise genetic distance (D) between individuals was calculated using PLINK 1.07 as D = 1 − [(IBS2 + 0.5 × IBS1)/N], where IBS2 and IBS1 are the numbers of loci that share 2 or 1 alleles identical by state (IBS), respectively, and N is the number of loci.
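The distance formula can be checked by hand on a small example. The helper below (ibs_distance, a hypothetical name) assumes genotypes are coded as allele counts (0, 1, or 2 copies of one allele) without missing values; it is a sketch of the formula and not PLINK's implementation.

```r
# Hypothetical helper: D = 1 - (IBS2 + 0.5 * IBS1) / N for two individuals.
# g1, g2: allele counts (0, 1 or 2) at each locus, no missing values.
ibs_distance <- function(g1, g2) {
  stopifnot(length(g1) == length(g2))
  shared <- 2 - abs(g1 - g2)   # alleles shared IBS at each locus: 0, 1 or 2
  ibs2 <- sum(shared == 2)     # loci sharing both alleles
  ibs1 <- sum(shared == 1)     # loci sharing one allele
  n <- length(g1)              # number of loci
  1 - (ibs2 + 0.5 * ibs1) / n
}

g1 <- c(0, 1, 2, 2)
g2 <- c(0, 2, 0, 2)

# Here IBS2 = 2, IBS1 = 1 and N = 4, so D = 1 - 2.5 / 4
d <- ibs_distance(g1, g2)
print(d)
```

With the "12"/"22" coding used for the deletions, a "12" individual compared with a "22" individual shares one allele at that locus and therefore contributes to IBS1.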

The clustering dendrograms were plotted in FigTree 1.3.1 (http://tree.bio.ed.ac.uk/software/figtree/). Based on del_184.mdist, reformat the distance file for the executable neighbor.exe in the PHYLIP package. Detailed usage of PHYLIP and its programs is presented at http://evolution.genetics.washington.edu/phylip.html. Next, neighbor.exe is used to generate the outtree file, which can be loaded into FigTree 1.3.1.
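The reformatting step might look like the sketch below, which writes a square distance matrix in the PHYLIP layout (number of taxa on the first line, then one row per individual with the name padded to 10 characters). The matrix values and names are made up, and we assume that the distances have already been read into a square R matrix with individual IDs as row names; the layout of the real del_184.mdist file should be checked before adapting this.

```r
# Made-up square distance matrix with individual IDs as row names
d <- matrix(c(0.0, 0.1, 0.4,
              0.1, 0.0, 0.3,
              0.4, 0.3, 0.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("ind1", "ind2", "ind3"), NULL))

# PHYLIP names occupy exactly 10 characters: truncate, then pad with spaces
taxa <- sprintf("%-10s", substr(rownames(d), 1, 10))

# One text row per individual with distances in fixed decimal notation
rows <- apply(d, 1, function(x) {
  paste(formatC(x, format = "f", digits = 4), collapse = " ")
})

# First line is the number of taxa, followed by the labeled rows
phylip_lines <- c(as.character(nrow(d)), paste0(taxa, rows))

infile <- file.path(tempdir(), "infile")  # PHYLIP programs read "infile" by default
writeLines(phylip_lines, infile)
print(phylip_lines)
```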

2.4 Signatures of Selection for CNVs or CNVR

To detect lineage-differentiated CNV events, we calculated VST for each CNV using the following equation: VST = (VT − VS)/VT, where VT is the total variance in mean LRRs across all individuals and VS is the average variance within each breed (see Note 4).

The test file contains the mean LRRs of 257 segments across all individuals. The first column is the population ID, and the second through last columns are the mean LRRs for the 257 segments. A map file (cnv-map) describing each CNV segment is also required; its columns are the first SNP name, chromosome, segment start, and segment end. VST was calculated as follows.

###R code for estimating the VST between pair-wise groups.
###Set the working directory and read the ped file (mean LRR table)
setwd("~://PATH/Vst")
CNV<-read.table("Segs257_min3_Max100_tran_1.txt", header=F)
###Read map file
CNVmap<-read.table("cnvmap.txt", header=F)
###Calculate the pair-wise VST between BT and BI
###Extract populations from the Bos taurus group
BTHOL<-CNV[CNV[,1]=="HOL",];dim(BTHOL)
BTANG<-CNV[CNV[,1]=="ANG",];dim(BTANG)
BTHFD<-CNV[CNV[,1]=="HFD",];dim(BTHFD)
BTBWS<-CNV[CNV[,1]=="BWS",];dim(BTBWS)
###Merge into the Bos taurus (BT) group
BT<-rbind(BTHOL,BTANG,BTHFD,BTBWS);dim(BT)
nBT<-dim(BT)[1]
nBTHOL<-dim(BTHOL)[1]
nBTANG<-dim(BTANG)[1]
###Extract populations from the Bos indicus group
BINEL<-CNV[CNV[,1]=="NEL",];dim(BINEL)
BIBRM<-CNV[CNV[,1]=="BRM",];dim(BIBRM)
###Merge into the Bos indicus (BI) group
BI<-rbind(BINEL,BIBRM);dim(BI)
nBI<-dim(BI)[1]
############################
###Calculate variance in BT group (columns 2 to 258 hold the 257 segments)
BTtype<-BT[,2:258];dim(BTtype)
SX2 <- function(x){ var(x) }
BTvar <- apply(BTtype,2,SX2)
###Calculate variance in BI group
BItype<-BI[,2:258]
BIvar <- apply(BItype,2,SX2)
###Calculate variance among all unrelated individuals
Atype<-rbind(BTtype,BItype)
ABTBIvar <- apply(Atype,2,SX2)
###Calculate weighted mean of variance across two groups (BT and BI)
wBTBIvar <- (BTvar*nBT+BIvar*nBI)/(nBT+nBI)
###Calculate VST for BT vs BI
VstBTBI<-(ABTBIvar-wBTBIvar)/ABTBIvar
length(VstBTBI)
###Combine the map information and VST values
ctVstBTBI<-cbind(CNVmap,VstBTBI)
###Assign the column names for the output file
###(labels as in the original script; adjust them to the actual map file layout)
names(ctVstBTBI)<-c("Chr","CNV","Start","Strand","ctVstBTBI")
write.table(ctVstBTBI,"wctVstBTBI_205_OK.txt",sep="\t",quote = FALSE)
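For readers who want to check the calculation on a small example, the sketch below wraps the same VST definition in a function for a single CNV segment. The function name (vst), the toy LRR values, and the group labels are hypothetical; as in the script above, the within-group variance is weighted by group size.

```r
# Hypothetical helper: VST = (VT - VS) / VT for one CNV segment.
# lrr: mean LRR per individual; group: group label per individual.
vst <- function(lrr, group) {
  v_total  <- var(lrr)                    # VT: variance across all individuals
  sizes    <- tapply(lrr, group, length)  # group sizes
  v_group  <- tapply(lrr, group, var)     # variance within each group
  v_within <- sum(v_group * sizes) / sum(sizes)  # VS: size-weighted mean
  (v_total - v_within) / v_total
}

group <- rep(c("BT", "BI"), times = c(6, 4))

# Made-up segment that is deleted (LRR near -1) in one group only
lrr_diff <- c(0.02, -0.03, 0.01, 0.00, -0.01, 0.03,
              -0.98, -1.02, -1.01, -0.99)

# Made-up segment with no difference between the groups
lrr_same <- c(0.02, -0.03, 0.01, 0.00, -0.01, 0.03,
              0.01, -0.02, 0.02, -0.01)

vst_diff <- vst(lrr_diff, group)  # close to 1: strongly differentiated
vst_same <- vst(lrr_same, group)  # near or below 0: not differentiated
print(c(differentiated = vst_diff, undifferentiated = vst_same))
```

Because the sample variances use n − 1 denominators, VST can be slightly negative for segments without any group difference; such values are usually read as no differentiation.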
