Proc. Assoc. Advmt. Anim. Breed. Genet. 17: 223-226
DESIGN OF A WHOLE GENOME SCAN EXPERIMENT FOR A MULTI-BREED BEEF CATTLE POPULATION WITH A COMPLEX PEDIGREE
M.J. Kelly¹, B.J. Hayes² and S.P. Miller¹
¹Department of Animal and Poultry Science, University of Guelph, Canada
²DPI, Victoria.
SUMMARY
This paper describes a method for selecting animals to genotype in a whole genome scan with dense single nucleotide polymorphism (SNP) markers from a much larger existing beef cattle breeding experiment in which expensive traits had been measured. A genetic algorithm was used to select animals on a set of criteria that affect the power of the whole genome scan and the subsequent applicability of its results to industry. Using the genetic algorithm, we derived experimental designs that balanced these competing objectives and minimised the marginal cost to each individual criterion.
INTRODUCTION
The availability of tens of thousands of bovine SNP markers, following the sequencing of the bovine genome and recent developments in genotyping technology, has made whole genome scans (WGS) possible and allows the exploitation of linkage disequilibrium (LD) between markers and quantitative trait loci (QTL) affecting important traits in beef and dairy cattle. The size of such studies is limited by the costs of recording phenotypes and of genotyping. This creates a design problem: selecting the set of animals that will have the most power to detect QTL. Power can be increased by selective genotyping both within and across families. In multi-breed beef populations, experimental design is further complicated by the need to find markers that are in LD with QTL across breeds. Care must also be taken to ensure that there are sufficient animals within fixed-effect groups, such as herd, year or season, for these effects to be estimable. It is also desirable that a diverse group of animals is represented in the selected set. In this paper, we contrast several criteria for selecting animals for WGS from a multi-breed beef population with a complex pedigree. These criteria include selection of phenotypically extreme animals, maintenance of adequate family sizes to reconstruct haplotypes, and sufficient observations per fixed-effect class. A genetic algorithm was used to select a set of animals balanced across all of the selection criteria.
MATERIALS AND METHODS
Experimental population. Candidates were selected from the University of Guelph Maternal-Terminal Line Crossbreeding Project, an experiment that ran from 1998 to 2005.
These animals were bred at one of three research stations in Ontario, Canada. At approximately 200 days of age, cattle were transported to the Elora Beef Research Centre for finishing. After an adjustment period, cattle were allocated to a nutritional treatment for the feedlot finishing period. Between 2 and 6 treatments were used during the experimental period. Allocation was balanced across herd of origin, sex, breed and sire. After finishing to a constant backfat of 9 mm at the 12/13th rib (determined by ultrasound appraisal), animals were processed at the University of Guelph Meat Research Laboratory. Shear force of steaks from the longissimus dorsi muscle was measured after 7 days of aging.
Genomics 1
Optimisation. Four criteria were considered: (1) selection of animals with extreme estimated breeding values (EBVs) for longissimus dorsi shear force (LM7D); (2) at least 5 animals per herd of origin by year by season (H-Y-S) class and at least 5 animals per treatment by year (T-Y) class, with a bonus applied to encourage a minimum number of animals from every treatment-year class; (3) a balanced number of animals with respect to sire breed (BS), with at least 5 progeny per sire (MPS); and (4) mean co-ancestry between the selected animals (x′Ax), minimised to encourage a genetically diverse selected group, where A is the relationship matrix between the animals and x is a vector of contributions (1/n, where n is the number of animals selected). A was calculated using the PyPedal module for Python (Cole 2007).
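Criterion (4) needs the relationship matrix A and the quadratic form x′Ax. The authors used PyPedal for A; the sketch below does not use PyPedal's API but computes A directly with the standard tabular method, assuming animals are ordered so that parents precede offspring and unknown parents are coded as -1 (both conventions are assumptions of this sketch). The toy pedigree is hypothetical.

```python
import numpy as np


def relationship_matrix(sires, dams):
    """Numerator relationship matrix A by the tabular method.

    Animals are numbered 0..N-1 with parents listed before their offspring;
    an unknown parent is coded as -1.
    """
    n = len(sires)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sires[i], dams[i]
        # Diagonal: 1 + F_i, where F_i is half the relationship between the parents.
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
        for j in range(i):
            # Off-diagonal: half the sum of j's relationships with i's parents
            # (an unknown parent contributes 0).
            a_js = A[j, s] if s >= 0 else 0.0
            a_jd = A[j, d] if d >= 0 else 0.0
            A[i, j] = A[j, i] = 0.5 * (a_js + a_jd)
    return A


def mean_coancestry(A, selected):
    """The x'Ax criterion: x_k = 1/n for the n selected animals, 0 otherwise.

    Equivalent to averaging A over all ordered pairs of selected animals,
    self-pairs included.
    """
    x = np.zeros(A.shape[0])
    x[list(selected)] = 1.0 / len(selected)
    return float(x @ A @ x)


# Hypothetical toy pedigree: 0 and 1 are unrelated founders, 2 and 3 are
# their full-sib offspring, and 4 is the inbred offspring of 2 x 3.
sires = [-1, -1, 0, 0, 2]
dams = [-1, -1, 1, 1, 3]
A = relationship_matrix(sires, dams)
print(A[4, 4])                     # 1.25 (inbreeding coefficient 0.25)
print(mean_coancestry(A, [0, 1]))  # 0.5  (unrelated founders)
print(mean_coancestry(A, [2, 3]))  # 0.75 (full sibs)
```

Selecting related animals raises x′Ax, which is why minimising it favours a more diverse genotyped group.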
A genetic algorithm was used to optimise across all parameters simultaneously, with each potential experimental design assessed against an objective function. The objective function was defined as the sum, over the design parameters, of the relative difference between the optimal value of each parameter and its observed value in a trial solution.
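The objective can be sketched in a few lines. This is a minimal illustration, assuming the relative-difference form (E_i − o_i)/E_i, the averaging of the two fixed-effect terms and of the four sire-breed terms, and the 0-1 rescaling of coancestry described in this section, with the E_i values taken from Table 1. The paper does not say whether absolute differences were used, so this sketch follows the signed form; a value above E_i therefore lowers the objective. The dictionary keys and the coancestry bounds in the usage lines are hypothetical.

```python
# E_i values from Table 1 of this paper.
E_SINGLE = {"extremes": 480, "mps": 9}
E_FIXED = {"hys": 65, "ty": 38}
E_BREED = {"AN": 141, "SM": 140, "CH": 60, "LM": 60}


def rel_diff(e, o):
    """Relative difference (E_i - o_i) / E_i for one design parameter."""
    return (e - o) / e


def objective(obs, xax, xax_min, xax_max):
    """Objective to be minimised; it is 0 when every parameter equals its E_i.

    obs: observed value of each parameter, keyed as in the dicts above.
    xax: mean coancestry x'Ax of the trial design.
    xax_min, xax_max: lowest and highest x'Ax attainable for the candidates.
    """
    total = sum(rel_diff(E_SINGLE[k], obs[k]) for k in E_SINGLE)
    # The two fixed-effect (treatment) parameters contribute their average.
    total += sum(rel_diff(E_FIXED[k], obs[k]) for k in E_FIXED) / len(E_FIXED)
    # The four sire breeds contribute their average as a single parameter.
    total += sum(rel_diff(E_BREED[k], obs[k]) for k in E_BREED) / len(E_BREED)
    # Coancestry rescaled to 0-1 (1 = lowest attainable), so its E_i is 1.
    scaled = 1 - (xax - xax_min) / (xax_max - xax_min)
    total += rel_diff(1.0, scaled)
    return total


# Hypothetical designs: one that hits every E_i, one with half the extremes.
perfect = {"extremes": 480, "mps": 9, "hys": 65, "ty": 38,
           "AN": 141, "SM": 140, "CH": 60, "LM": 60}
half_extremes = dict(perfect, extremes=240)
print(objective(perfect, 0.006, 0.006, 0.012))        # 0.0
print(objective(half_extremes, 0.006, 0.006, 0.012))  # 0.5
```

Dividing by E_i puts counts in the hundreds and a mean of about 9 progeny per sire on a comparable scale, so no single parameter dominates the sum.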
$$\mathrm{obj} = \sum_{i=1}^{m} \frac{E_i - o_i}{E_i}$$

where $m$ is the number of design parameters, $E_i$ is the optimal value of parameter $i$ and $o_i$ is the observed value of that parameter for a particular design. The optimal value of each parameter ($E_i$) was determined by running the genetic algorithm with all other parameters removed (Table 1). The individual sire breeds were treated as a single parameter by averaging $(E_i - o_i)/E_i$ over all breeds. A similar approach was used for the penalty applied to the two treatment design parameters (H-Y-S and T-Y): the average of the two components was taken. As the range of potential solutions for mean coancestry was very small compared with the other parameters, it was rescaled to 0-1 based on the highest and lowest mean coancestry possible for this set of candidates.

Table 1. Optimal value for each design parameter
Design parameter  Objective                           Ei
Extremes          Count in extremes                   480
MPS               Mean number of progeny per sire     9
H-Y-S             Count of groups with > 5 animals    65
T-Y               Count of groups with > 5 animals    38
AN (sire breed)   Count                               141
SM (sire breed)   Count                               140
CH (sire breed)   Count                               60
LM (sire breed)   Count                               60
xAx               Mean co-ancestry                    1*
* Calculated as 1 - (xAx - min xAx) / (max xAx - min xAx)

Genetic Algorithm. A simple binary-encoded genetic algorithm (Holland 1992) was used to optimise the objective function. Each bit in the binary string represented whether an animal was selected or not (0 = not selected, 1 = selected). The selected group comprised the animals corresponding to the first n bits equal to 1, where n is the size of the group to be genotyped. If fewer than n bits were equal to 1, a penalty was applied to the objective function.
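The encoding and penalty described above can be sketched as follows. The paper does not report its selection, crossover and mutation operators, population size, number of generations or penalty size, so tournament selection, one-point crossover, bit-flip mutation, elitism and all parameter values here are assumptions; the toy objective and candidate scores are hypothetical stand-ins for the objective function defined earlier.

```python
import random


def decode(bits, n):
    """Indices of the animals given by the first n bits equal to 1.

    Returns fewer than n indices when the string has fewer than n ones.
    """
    chosen = []
    for i, b in enumerate(bits):
        if b == 1:
            chosen.append(i)
            if len(chosen) == n:
                break
    return chosen


def penalised_cost(bits, n, objective, penalty=10.0):
    """Objective of the decoded design plus a penalty per missing animal."""
    chosen = decode(bits, n)
    base = objective(chosen) if chosen else 0.0
    return base + penalty * (n - len(chosen))


def run_ga(objective, n_candidates, n, pop_size=60, generations=200, seed=1):
    """Minimise `objective` over binary strings of length n_candidates."""
    rng = random.Random(seed)
    p_mut = 1.0 / n_candidates                 # about one bit flip per child
    p_one = min(0.95, 1.2 * n / n_candidates)  # about 1.2 n ones per string

    def cost(bits):
        return penalised_cost(bits, n, objective)

    def tournament(pop):
        a, b = rng.sample(pop, 2)              # binary tournament selection
        return a if cost(a) <= cost(b) else b

    pop = [[1 if rng.random() < p_one else 0 for _ in range(n_candidates)]
           for _ in range(pop_size)]
    best = min(pop, key=cost)
    for _ in range(generations):
        children = [best[:]]                   # elitism: keep the best design
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n_candidates)
            child = p1[:cut] + p2[cut:]        # one-point crossover
            # Bit-flip mutation.
            child = [1 - b if rng.random() < p_mut else b for b in child]
            children.append(child)
        pop = children
        best = min(pop, key=cost)
    return decode(best, n), cost(best)


# Hypothetical toy problem: 20 candidates with scores in [0, 1]; the
# objective rewards a high mean score among the 5 selected animals.
scores = [i / 19 for i in range(20)]


def toy_objective(idx):
    return -sum(scores[i] for i in idx) / len(idx)


selected, best_cost = run_ga(toy_objective, n_candidates=20, n=5)
print(sorted(selected), best_cost)
```

Because only the first n ones are decoded, extra ones later in the string are ignored; the penalty is set large relative to the objective's range so that strings with too few ones never win.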
RESULTS AND DISCUSSION
When half of the cattle were to be selected for a whole genome scan from the existing population, designs optimised on a single criterion differed greatly (Table 2).
Table 2. Observed values of each design parameter (rows) for designs optimised with respect to a single parameter only (columns 1-5), for the average of 1000 designs in which animals were selected at random (SD in parentheses), and for the design optimised across all parameters (column 6)

Parameter         Extreme (1)  Sire (2)  Treatment (3)  SB (4)  xAx (5)  Random (SD)     Optimal (6)
Extremes          480          286       250            303     187      243 (8.11)      430
Progeny per sire  8.87         9         4.84           5.75    4.35     4.85 (0.11)     6.78
H*Y*S             43           57        65             59      61       61 (1.54)       63
Y*Treat*Trial     33           35        38             36      34       36 (0.89)       38
AN                158          158       144            140     99       126 (6.75)      142
SM                121          90        81             140     66       80 (5.75)       117
CH                55           45        39             60      37       36 (3.89)       60
LM                37           41        27             60      23       33 (3.83)       60
xAx               0.012        0.011     0.011          0.009   0.006    0.010 (0.0002)  0.0083
obj               1.298        1.528     1.761          1.604   1.659    1.810 (0.040)   0.991
For example, selecting the animals with extreme EBVs for LM7D led to a design with a high average number of progeny per sire, but the mean relationship between the selected animals and the balance across herd-year-season groups were severely compromised. Similarly, using the minimum number of animals per treatment group as the criterion resulted in a low number of progeny per sire and compromised the balance across the major breeds.

Figure 1 shows the relationships between the design variables across 1000 potential designs and illustrates the trade-offs required in arriving at an experimental design. To illustrate, consider the AN and xAx criteria. The arrows for these parameters point in opposite directions on the biplot, so increasing the number of Angus cattle in the design results in a design with a higher mean coancestry.

The optimisation of experimental design can be based on a variety of criteria, such as A-optimality, which minimises the trace of (X′X)⁻¹, and hence the average variance of the parameter estimates, using only the design matrix X, before the response y has been observed. For example, Gondro and Kinghorn (2006) used a genetic algorithm to optimise experimental designs for microarray experiments with A-optimality as the objective. Unfortunately, in genomic selection or QTL detection the SNP genotypes that would form X are unknown before genotyping, so this approach cannot be used.
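To show why A-optimality needs X in advance, here is a minimal sketch of the criterion trace((X′X)⁻¹) for two hypothetical four-unit designs with an intercept and one covariate. Spreading the covariate to the extremes gives the smaller (better) value, much as selective genotyping favours phenotypic extremes; in a genome scan, however, the columns of X would be SNP genotypes that are not known until after genotyping.

```python
import numpy as np


def a_criterion(X):
    """A-optimality criterion trace((X'X)^-1); smaller means more precise estimates."""
    return float(np.trace(np.linalg.inv(X.T @ X)))


# Hypothetical designs: intercept plus one covariate observed on four units.
spread = np.column_stack([np.ones(4), [-1.0, -1.0, 1.0, 1.0]])
bunched = np.column_stack([np.ones(4), [-0.5, -0.5, 0.5, 0.5]])
print(a_criterion(spread))   # 0.5
print(a_criterion(bunched))  # 1.25
```

The response y never enters the calculation, which is what makes the criterion usable before an experiment is run, provided X itself is known.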
Figure 1. Biplot representing the relationships amongst the following experimental design variables: the number of progeny sired by Angus (AN), Charolais (CH), Limousin (LM) and Simmental (SM) bulls, the average number of progeny per sire, the average coancestry between the selected animals, and balance across treatments (Treat*Trial and H*Y*S). Principal components 1 and 2 explain 46% and 28% of the variance, respectively.
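A biplot like Figure 1 comes from a principal component analysis of the design variables across candidate designs. The paper does not state how the variables were scaled, so this minimal sketch assumes standardised variables and a correlation-type biplot, in which each arrow's coordinates are the correlations of a variable with the components and opposite arrows indicate negatively correlated variables. The data are random placeholders, not the study's 1000 designs.

```python
import numpy as np


def pca_biplot(D):
    """PCA of design variables for a biplot.

    D has one row per candidate design and one column per design variable.
    Returns the proportion of variance explained by each component, the
    variable loadings (arrow coordinates) and the design scores (points).
    """
    Z = (D - D.mean(axis=0)) / D.std(axis=0, ddof=1)  # standardise each variable
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    loadings = Vt.T * s / np.sqrt(Z.shape[0] - 1)     # variable-PC correlations
    scores = U * s
    return explained, loadings, scores


# Hypothetical data: 1000 designs, three variables, the first two negatively related.
rng = np.random.default_rng(0)
a = rng.normal(size=1000)
D = np.column_stack([a, -a + 0.3 * rng.normal(size=1000), rng.normal(size=1000)])
explained, loadings, scores = pca_biplot(D)
print(explained[:2])     # share of variance on PC1 and PC2
print(loadings[:2, :2])  # arrows for variables 0 and 1 on PC1 and PC2
```

In this toy data the first two variables load with opposite signs on PC1, the same pattern the text reads off Figure 1 for AN and xAx.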
In this instance we have considered selective genotyping for a single trait only. Selective genotyping on a single trait can reduce the power to detect QTL for other traits. Given the cost of WGS, the returns from a single experiment would be increased if a single set of genotypes were used to search for QTL affecting multiple traits.
Acknowledgements. The funding for this project was provided by the Ontario Cattlemen’s Association, the Canadian Agriculture Adaptation Council and the Ontario Ministry of Agriculture, Food and Rural Affairs. The advice and assistance of Margaret Quinton and John Cole are also acknowledged.
REFERENCES
Cole JB (2007) (In press) Comput. Electron. Agric. doi:10:101.
Gondro C, Kinghorn BP (2006) 8th World Congress on Genetics Applied to Livestock Production. 23:15.
Holland JH (1992) 'Adaptation in Natural and Artificial Systems.' (The MIT Press: Cambridge, Massachusetts, USA).