There are multiple approaches to designing a genetic study, each with a unique set of challenges and benefits. As the technology used to perform these studies changes, the methods used to analyze the data must adapt to maximize the utility of the results. Many genetic studies follow the common disease/common variant (CDCV) hypothesis, under which common genetic variants confer the majority of disease susceptibility (Reich and Lander 2001). Because these risk alleles are common (typically defined as having a minor allele frequency >5%) and are not subject to strong natural selection, as mutations are, they are likely ancient and shared across most populations. Each common variant is expected to confer a small effect on disease risk; therefore, common diseases are expected to involve multiple risk alleles and interactions with environmental factors. Genome-wide association studies, discussed below, were developed on the CDCV premise (Manolio et al. 2009). For most common disorders, studies based on CDCV have so far identified only a small fraction of the genetic component, suggesting that rare variants with large effect sizes and/or complex interactions between genetic variants and environmental factors may play a substantial role in disease susceptibility (Manolio et al. 2009; Eichler et al. 2010; Cirulli and Goldstein 2010). For women’s health, two main approaches have been used to identify the genetic variants associated with disease: candidate gene and genome-wide association studies.
Candidate gene approach
Based on prior knowledge or biological plausibility, the candidate gene association study interrogates specific variants, genes, or regions for association with the disease or quantitative trait.
Candidate gene association studies (CGAS) were the first type of association study performed and are still widely used. Benefits of this approach include lower costs than other methods, the hypothesis-driven nature of the study, and the limited number of tests performed. However, this design is not without drawbacks. If the correct variant or gene is not selected, no association with the phenotype will be found, a potential hazard given genetic (both locus and allelic) heterogeneity. Few candidate gene studies have replicated, in part because of small sample sizes, though completely excluding a gene based on negative results is difficult. As with GWAS (see below), properly designed (appropriately powered) studies are essential when attempting to replicate a genotype-phenotype association.
Despite the challenges inherent in CGAS, they have been successful in identifying genes associated with numerous phenotypes important in women’s health. Early CGAS evaluating the role of hormone biosynthesis pathways in the timing of the reproductive lifespan were largely unsuccessful; however, more recent studies have identified FSHB and ESR2 associations with timing of AM (He et al. 2010). A CGAS was used to identify inflammatory pathway genes that were associated with endometrial cancer in the Shanghai Endometrial Cancer Genetics Study (Delahanty et al. 2013). CGAS may also be used to confirm results obtained from genome-wide association studies (GWAS) (see below); O’Mara et al. assessed five SNPs previously associated with endometrial cancer in GWAS but failed to confirm these associations (O'Mara et al. 2011b). In addition, CGAS may be used to prioritize studies hoping to generalize results from association studies in one population to another. With the majority of genetic studies performed in European-descent populations, findings that extend to more diverse populations may suggest underlying biological disease mechanisms, while those that fail to associate may suggest population-specific disease risk or false positives. A recent CGAS assessed forty SNPs, previously identified in GWAS of European women, for association with breast cancer in Chinese women. The SNP rs9693444 was associated with overall breast cancer in this Chinese cohort (p = 6.44 × 10⁻⁴), while others were associated with various breast cancer subtypes (Zhang et al. 2014). Though there are significant challenges to the CGAS approach, it is a useful and relevant method to identify genotype-phenotype associations.
Genome-wide association studies
Unlike candidate gene studies, where an a priori hypothesis about a relationship between the genetic variant and phenotype exists, genome-wide association studies (GWAS) require no previous knowledge about such a relationship, relying instead on the CDCV hypothesis. In a GWAS, interrogation occurs across the genome, generally capturing common variants in European-descent populations; the exact number of variants tested varies by platform and assay. It is common for hundreds of thousands or millions of SNPs to be tested. Linkage disequilibrium, the non-random association of alleles, allows a fraction of the genome to be genotyped while inferring information about untyped variants. Though the GWAS approach offers researchers the ability to discover new genotype-phenotype associations, it comes with a high statistical price: correcting for multiple statistical tests. This burden is often addressed with the Bonferroni method, where the alpha value for a single hypothesis test is divided by the total number of tests performed. For GWAS, the rule of thumb is that a result is significant if the p-value is below 5 × 10⁻⁸ (0.05/1 million) (Dudbridge and Gusnanto 2008). With newer GWAS chips capable of genotyping 5 million SNPs, this threshold may be inadequate; however, some have suggested that the Bonferroni method is too stringent and that other methods, such as false-discovery rates, may be better (Zablocki et al. 2014; Pan 2013; Lin and Lee 2012; Wei 2012).
This stringency arises from linkage disequilibrium; many of the SNPs tested in a GWAS are not independent, and the Bonferroni correction of these non-independent tests can result in a greater number of false negatives (missed true associations) (De, Bush, and Moore 2014). In addition, GWAS chips primarily focus on common SNPs with allele frequencies greater than 0.05, limiting their ability to identify rare variants that are associated with a particular phenotype. The ability of genotyping chips to tag common variants also depends on the population under study (Hoffmann et al. 2011; Eberle et al. 2007). Furthermore, most SNPs associated with a disease phenotype have small effect sizes, explaining only a small amount of the phenotypic variance. These small effect sizes may fail to be clinically meaningful and may frustrate replication attempts, as increasingly larger sample sizes are required to replicate the initial discovery. For example, a recent meta-analysis performed in the GIANT consortium with more than 250,000 cases and controls identified novel variants associated with overweight with an odds ratio (OR) of 1.04 (Berndt et al. 2013); successful replication of these results will require many more thousands of individuals derived from the same population and not already used in one of the contributing studies, demonstrating the practical challenges of replicating results with very small effect sizes.
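The multiple-testing corrections discussed above can be illustrated with a short sketch. It computes the Bonferroni threshold for 1 million and 5 million tests and compares Bonferroni with the Benjamini-Hochberg step-up procedure, one common false-discovery-rate method, on a small set of p-values. The Benjamini-Hochberg choice, the function names, and the toy p-values are assumptions made for illustration and are not taken from the cited studies.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values, q=0.05):
    """Flag p-values rejected by the Benjamini-Hochberg step-up procedure.

    Returns a list of booleans in the same order as p_values.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * q.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank
    # Reject every hypothesis ranked at or below max_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Thresholds for the test counts mentioned in the text.
gwas_threshold = bonferroni_threshold(0.05, 1_000_000)        # about 5e-8
dense_chip_threshold = bonferroni_threshold(0.05, 5_000_000)  # about 1e-8

# Hypothetical p-values, for illustration only.
toy_p = [0.001, 0.008, 0.012, 0.02, 0.04, 0.3, 0.6, 0.9]
bonf_hits = [p <= bonferroni_threshold(0.05, len(toy_p)) for p in toy_p]
bh_hits = benjamini_hochberg(toy_p, q=0.05)

print(f"Bonferroni threshold, 1M tests: {gwas_threshold:.1e}")
print(f"Bonferroni threshold, 5M tests: {dense_chip_threshold:.1e}")
print(f"Bonferroni rejections: {sum(bonf_hits)}, BH rejections: {sum(bh_hits)}")
```

On these toy values Bonferroni rejects one hypothesis and Benjamini-Hochberg rejects four, which illustrates why a false-discovery-rate approach is considered less stringent. Neither procedure, as sketched here, models the correlation between SNPs induced by linkage disequilibrium.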
Despite these limitations, GWAS have been successful in identifying genetic variants associated with many women’s health traits and complex diseases and have led to additional hypotheses about the biological mechanisms responsible for disease. For example, a recent GWAS for systemic lupus erythematosus (SLE) identified novel HLA-region genes and replicated four genes previously associated with the autoimmune disorder (Armstrong et al. 2014). Numerous GWAS have been performed for breast cancer (briefly, Low et al. 2013; Garcia-Closas et al. 2013; Michailidou et al. 2013), endometriosis (Albertsen et al. 2013; Nyholt et al. 2012; Painter et al. 2011), and cervical cancer (Chen et al. 2013; Shi et al. 2013). Traits like gestational diabetes (Hayes et al. 2013) and fibroid tumors (Cha et al. 2011) have also been assessed with GWAS. These studies represent only a few of the women’s health traits that have been investigated using GWAS. Though GWAS have not been successful in identifying causal variants with large effect sizes for most diseases and traits, the findings may point to underlying genetic architecture and biological mechanisms.
Interactions
Interactions, both gene-gene (GxG) and gene-environment (GxE), have been suggested as explanations for the “missing heritability” from GWAS studies (Zuk et al. 2012; Manolio et al. 2009).
Interactions are challenging to identify in human genetic studies for a variety of reasons. Sample size requirements differ based on the study design (e.g., case-only vs. matched case-control), the type of interaction (GxG, GxE), and the expected effect size of the interaction (Gauderman 2002a; Gauderman 2002b). Testing for statistical interactions among all the genetic variants is computationally intensive and leads to sparse or no data for some interactions; limiting GxG testing to variants with significant associations with the phenotype eases the computational challenges, including corrections for multiple tests, but compromises the ability to identify interactions between variants without main effects. Testing for interactions may be done using data reduction methods (e.g., combinatorial partitioning (Nelson et al. 2001), restricted partitioning (Culverhouse 2007), multifactor dimensionality reduction (Ritchie et al. 2001)), extensions to regression analysis (e.g., classification and regression trees (CART) (Breiman, Friedman, and Olshen), multivariate adaptive regression splines (MARS) (Lin et al. 2008)), and pattern recognition methods (e.g., neural networks (Turner, Dudek, and Ritchie 2010)).
Prioritizing variants for GxG or GxE by biological plausibility reduces the number of statistical tests and computational burdens, yet restricts the potential to identify novel interactions that may be clinically meaningful.
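To make the scale of exhaustive GxG testing concrete, the sketch below counts the pairwise tests among a set of variants and the Bonferroni threshold that would result. The figure of 1 million variants follows the GWAS scale described earlier; the filtered set of 100 variants and the function names are hypothetical and chosen only for illustration.

```python
from math import comb


def n_pairwise_tests(n_variants):
    """Number of distinct variant pairs, i.e. n choose 2."""
    return comb(n_variants, 2)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


# Exhaustive pairwise testing across 1 million variants.
exhaustive_pairs = n_pairwise_tests(1_000_000)
exhaustive_threshold = bonferroni_threshold(0.05, exhaustive_pairs)

# Hypothetical filtered analysis: only 100 variants retained,
# e.g. those with significant main effects or prior biological support.
filtered_pairs = n_pairwise_tests(100)
filtered_threshold = bonferroni_threshold(0.05, filtered_pairs)

print(f"Exhaustive: {exhaustive_pairs:,} tests, threshold {exhaustive_threshold:.1e}")
print(f"Filtered:   {filtered_pairs:,} tests, threshold {filtered_threshold:.1e}")
```

Roughly 5 × 10¹¹ pairwise tests push the Bonferroni threshold to about 10⁻¹³, whereas filtering to 100 variants leaves 4,950 tests and a threshold near 10⁻⁵. The price of filtering, as noted above, is that interactions involving the excluded variants can no longer be detected.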
Gene-gene interactions
Despite the issues addressed above, GxG interactions have been identified for a variety of phenotypes. Gene-level interactions between SMAD3 and NEDD9 affecting lipid levels were found in the Atherosclerosis Risk in Communities (ARIC) study and replicated in an independent sample from the Multi-Ethnic Study of Atherosclerosis (MESA) (Ma, Clark, and Keinan 2013). Samples with age-related macular degeneration (AMD) (Klein et al. 2005) were used to find variants in several genes interacting with CFH, a well-characterized AMD gene (Zhang, Long, and Ott 2014). Notably, the AMD results are not only biologically plausible, but the interaction between BBS9 and CFH also replicates earlier studies that used different methodology to detect the interactions (Chen et al. 2007; Wang et al. 2009).
Gene-environment interactions
Based on epidemiologic studies, environmental factors, which include such variables as body mass index (BMI), dietary intake, and carcinogen exposure, are known to play a role in susceptibility to numerous disorders and complex traits (Cecchini et al. 2012; Turati et al. 2014; Steenland et al. 1996). How environmental exposures, in conjunction with genetic variants, contribute to the genetic architecture of complex diseases and traits is not fully understood. Examples of GxE interactions include the combined effect of exposure to farming and genetic variants on asthma risk (Ege et al. 2011; Ober and Vercelli 2011) and the effect of early childhood environment with genetic predisposition on mental health traits (Cicchetti and Rogosch 2012; Forsyth et al. 2013).
Despite some success, there is no systematic approach to identifying GxG or GxE interactions, and relatively few phenotypes have been adequately assessed for these potentially important interactions. The potential importance of GxG and GxE interactions should be considered in the context of personalized medicine. Individuals with significantly higher or lower disease risks based on GxG interactions may benefit from modified screening schedules; for example, in the absence of a family history predisposing to colorectal cancer (CRC), someone with a GxG interaction that significantly increases their risk of developing CRC could benefit from more frequent colonoscopies. In addition, modifying exposures, such as alcohol intake, may reduce the risk of some diseases, like liver cancer. For individuals with increased genetic risk for a specific disease, understanding how environmental factors may interact with genetic factors may be a motivational tool to encourage healthy lifestyle choices.