There are several different study designs that potentially allow GxE to be detected, each with its own strengths and potential problems [59, 60].
4.9.1 Cohort design
The cohort study has been the design of choice for common disorders; however, rarer disorders and those with a late or very wide range of age of onset may require sample sizes that are too large to be practically viable. In this design, DNA samples and environmental exposure information are obtained from an initially healthy sample that is followed up prospectively. As the environmental exposure is assessed prior to onset of the disorder, it is relatively free of information (recall) bias. High rates of follow-up are, however, necessary to reduce selection bias; this can be difficult if the disorder has a long incubation period requiring years or even decades of observation. It is also necessary to try to ensure that DNA samples come from a high proportion of both cases and controls, as differential take-up can lead to selection bias. Information about ethnic background or genomic control methods [61] should be considered if population stratification is a potential problem due to migration. A nested case–control approach can be used to compare cases with those individuals who did not develop the disorder. The analysis uses the group of individuals who have neither exposure to the environmental risk factor nor the high-risk gene variant as the reference group, so estimating odds ratios. Measured confounding variables can be adjusted for by stratification procedures or by multivariable modelling such as logistic regression, Poisson regression or Cox’s proportional hazards models [62, 63].
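The reference-group comparison described above can be illustrated with a minimal sketch. The counts are hypothetical and chosen only for illustration, the function and variable names are assumptions, and no adjustment for confounders is attempted.

```python
def odds_ratio(cases, controls, ref_cases, ref_controls):
    """Odds ratio of one stratum relative to the reference stratum."""
    return (cases * ref_controls) / (controls * ref_cases)

# Hypothetical (cases, controls) counts for each gene/environment stratum.
counts = {
    ("G-", "E-"): (50, 200),  # reference: neither risk variant nor exposure
    ("G+", "E-"): (30, 100),
    ("G-", "E+"): (60, 120),
    ("G+", "E+"): (90, 60),
}

ref_cases, ref_controls = counts[("G-", "E-")]

# Every stratum is compared with the same doubly unexposed reference group.
odds_ratios = {
    stratum: odds_ratio(cases, controls, ref_cases, ref_controls)
    for stratum, (cases, controls) in counts.items()
}

for stratum, value in odds_ratios.items():
    print(stratum, round(value, 2))
```

With these invented counts the odds ratio for the variant alone is 1.2, for the exposure alone 2.0, and for both together 6.0.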
4.9.2 Case–control design
Generally, case–control studies are more economical than cohort studies. Further, they are potentially powerful methods for the investigation of rarer disorders. Selection bias, particularly due to the selection of controls who may not be representative of the population at risk, is a major limitation. Further, if cases are collected from a clinical setting then the sample is enriched with individuals who are help-seeking. This is especially problematic when investigating less severe disorders, as many people will not engage with services for mild disability. Information on environmental exposure is often collected retrospectively, which may result in information (recall) bias; it is therefore preferable if the estimation of previous exposure comes from multiple sources or contemporaneous records. A high and non-differential take-up rate for DNA analyses is required if we wish to obtain unbiased estimates of genetic main effects and interactions; however, biased main effect estimates for the environmental factor [64] and biased genetic main effects [65] may still result in relatively unbiased interaction parameters. If there is an ethnic differential between cases and controls then population stratification could result in spurious gene variant associations. The controls can be unrelated individuals or relatives of the cases. Studies using unrelated controls can be analysed by taking the group of individuals who have neither exposure to the environmental risk factor nor the high-risk gene variant as the reference comparison and estimating odds ratios, controlling for measured confounding variables by stratification procedures. Multivariable methods such as logistic regression have traditionally measured interaction terms on the multiplicative scale, but more recently extensions have been developed to assess interactions on the additive scale [66, 67].
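The difference between the two scales can be shown with a short sketch. The odds ratios are hypothetical, each taken relative to the group with neither the variant nor the exposure. The additive measure used here, the relative excess risk due to interaction (RERI), is one commonly used choice and is an assumption of this example; it is not necessarily the method of [66, 67].

```python
def multiplicative_interaction(or_g, or_e, or_ge):
    """Ratio of the joint odds ratio to the product of the single-factor odds ratios.

    A value of 1 means no interaction on the multiplicative scale.
    """
    return or_ge / (or_g * or_e)

def reri(or_g, or_e, or_ge):
    """Relative excess risk due to interaction.

    A value of 0 means no interaction on the additive scale.
    """
    return or_ge - or_g - or_e + 1.0

# Hypothetical odds ratios relative to the doubly unexposed reference group.
or_g = 1.2   # risk variant only
or_e = 2.0   # environmental exposure only
or_ge = 6.0  # both variant and exposure

ratio = multiplicative_interaction(or_g, or_e, or_ge)
excess = reri(or_g, or_e, or_ge)
print(round(ratio, 2), round(excess, 2))
```

With these values both measures indicate a positive interaction (a ratio of about 2.5 and a RERI of about 3.8), but the two scales need not agree in general.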
When relatives are used as controls, detection of interactions may be more efficient as we are enriching the sample with the high-risk gene. However, if the risk variant has a high frequency in the family controls there will be a loss of contrast, which will reduce the study power, such that in the most extreme case, monozygotic twins, testing for GxE will more likely reflect main effects rather than the interaction effect. Each case is matched to one or more unaffected relatives and conditional logistic regression models are used to estimate the GxE. The main threat to the validity of findings from such studies is that both genes and environment are generally shared by family members, so correlation on unmatched risk factors within the matched case–control pair is likely. Furthermore, gene–environment correlation is built into the design, reducing its power to detect GxE. Twin studies share the same disadvantages [68, 69].
Few candidate/susceptibility genes have been replicated in psychiatric disorders to date. When no candidate gene is known, GxE can be measured indirectly using family-based approaches:
(i) Case–control studies using both relatives and unrelated (population based) controls. The analytical strategy is to compare the odds ratio for the effect of the environmental factor, in the cases with relative controls, to the odds ratio estimated from the case and non-related controls. The premise is that if there is GxE operating you would expect to find higher odds ratios when relatives are used as controls as compared to the analyses using population-based controls, while you would expect equivalence of the risk across control groups if there was no GxE interaction [59]. Of course it is quite possible that the ‘family effect’ could result from a shared environmental factor which has been unmeasured.
(ii) The use of proxy (surrogate) measures of genetic liability, such as family history or confirmed intermediate (endo)phenotypes (heritable biomarkers lying along the causal pathway from gene to disorder, but at a position more proximal to the gene than the manifest symptoms), is also possible.
4.9.3 Case only design
When a genotype is independent of an environmental exposure in the population and the disorder is rare, GxE can be tested in cases only [70]. In this case–case design, if there is no interaction, the prevalence of the exposure in the genotype-positive cases is expected to be the same as the prevalence of the exposure in the cases without the high-risk genetic variant. Thus, statistically significant departures from equal prevalence are indicative of an interaction between genotype and environmental exposure. However, independence of genotype and environmental exposure is rare and gene–environment correlation is generally the rule rather than the exception. Violation of this assumption of independence has been shown to produce grossly inflated type 1 errors [71]. Furthermore, this method only allows estimation of the interaction, not the main effects of the genotype and environment. Simulation studies demonstrate that GxE can be subsumed into the main effect of the genotype; therefore this design fails to provide a comprehensive test of the causal mechanism and should only be used with great caution.
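A minimal sketch of the case-only comparison follows. The counts are hypothetical and the function name is an assumption. The odds ratio relating genotype to exposure among cases is read as a multiplicative interaction estimate only under the independence and rare-disorder assumptions stated above. The confidence interval uses the standard large-sample (Woolf) approximation on the log scale.

```python
import math

def case_only_interaction(g_pos_e_pos, g_pos_e_neg, g_neg_e_pos, g_neg_e_neg):
    """Genotype-by-exposure odds ratio among cases, with an approximate 95% CI."""
    estimate = (g_pos_e_pos * g_neg_e_neg) / (g_pos_e_neg * g_neg_e_pos)
    # Standard error of the log odds ratio from the four cell counts.
    se_log = math.sqrt(
        1 / g_pos_e_pos + 1 / g_pos_e_neg + 1 / g_neg_e_pos + 1 / g_neg_e_neg
    )
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, (lower, upper)

# Hypothetical counts of cases cross-classified by genotype and exposure.
estimate, (lower, upper) = case_only_interaction(
    g_pos_e_pos=90, g_pos_e_neg=30, g_neg_e_pos=60, g_neg_e_neg=50
)
print(round(estimate, 2), round(lower, 2), round(upper, 2))
```

An interval that excludes 1 would suggest interaction, but the same result would arise from gene–environment correlation in the source population, which this design cannot detect.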
4.9.4 Family designs
Sib-pair analyses are linkage techniques based on the simple premise that pairs of phenotypically concordant siblings (the affected sib-pair design) will demonstrate excess sharing of commonly inherited genomic segments, while phenotypically discordant siblings (the unaffected sib pair design) will tend to have lower proportions of shared variants. Estimating the degree of genetic similarity within pairs (at the region of interest, or across the genome) should help us identify the chromosomal location of candidate genes. This is achieved by estimating the sharing pattern, that is, the number of alleles at a given locus that are the same (identical by descent, IBD). The expected sharing pattern in siblings approximates to z0 = 25% for no identical allele, z1 = 50% for one identical allele and z2 = 25% for two identical alleles. Departure from this pattern suggests linkage, and statistical significance can be estimated within the likelihood framework [72, 73]. Sib-pair studies can be extended to include GxE by using stratification or extensions of common multivariable models [74].
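As a rough illustration of testing departure from the expected 25%/50%/25% sharing pattern, the sketch below applies a simple Pearson goodness-of-fit test. This is a swapped-in simplification and not the likelihood-based approach of [72, 73]. It assumes that the IBD status of every pair is known without ambiguity, and the counts are hypothetical.

```python
import math

# Expected proportions of sib pairs sharing 0, 1 and 2 alleles IBD (z0, z1, z2).
EXPECTED_IBD = (0.25, 0.50, 0.25)

def ibd_goodness_of_fit(observed):
    """Pearson chi-square statistic and p-value (2 degrees of freedom)."""
    n = sum(observed)
    chi2 = sum(
        (obs - n * p) ** 2 / (n * p) for obs, p in zip(observed, EXPECTED_IBD)
    )
    # For 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value

# Hypothetical counts for 100 affected sib pairs sharing 0, 1 and 2 alleles IBD.
chi2, p_value = ibd_goodness_of_fit((15, 50, 35))
print(round(chi2, 2), round(p_value, 4))
```

Here the excess of pairs sharing two alleles gives a statistic of 8.0, which would be read as evidence of excess sharing at the locus in this invented sample.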
The case–parent trio design has been used to test candidate gene associations, including testing GxE interactions. This model uses the genotypes of all three members of the trio but only the environmental exposure of the case (i.e., a partial case–control design). The basic premise of the design is to stratify the genetic relative risk estimates from the case–parent trio by the environmental exposure status of the case. If there is no GxE interaction, the two genetic relative risks would be expected to be the same; however, if an interaction is present, their ratio will be an estimate of the interactive relative risk. Currently, stratified analyses are used to control for known within-family variables which may influence the risk; however, multi-level analytical models are being developed to deal with these factors [75].
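The stratified comparison can be sketched as follows. This is an assumption-laden simplification rather than the method of [75]. It estimates each genetic relative risk as the ratio of transmitted to untransmitted risk alleles from heterozygous parents, which is appropriate only under a multiplicative allelic model. The counts and names are hypothetical.

```python
def transmission_ratio(transmitted, untransmitted):
    """Transmitted/untransmitted ratio of the risk allele from heterozygous parents."""
    return transmitted / untransmitted

# Hypothetical (transmitted, untransmitted) counts, split by exposure of the case.
trios = {
    "exposed": (60, 30),
    "unexposed": (40, 40),
}

# One genetic relative risk estimate per exposure stratum.
genetic_rr = {
    stratum: transmission_ratio(t, u) for stratum, (t, u) in trios.items()
}

# The ratio of the two stratum-specific estimates is the interaction estimate.
interaction_rr = genetic_rr["exposed"] / genetic_rr["unexposed"]
print(genetic_rr, interaction_rr)
```

In this invented example the variant doubles the risk only in exposed cases, so the ratio of the two genetic relative risks is 2.0.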
Multi-generational pedigrees may be useful for indirectly testing the hypothesis that the penetrance of a known high-risk variant has changed over time due to changes in environmental factors. However, this approach is most useful when the high-risk gene variant is highly penetrant, which is required to allow familial aggregation to be adequately detected.
4.9.5 Gene–environment wide interaction studies (GEWIS)
The candidate (susceptibility) gene approach to the identification of genetic determinants of common psychiatric disorders is impeded by:
• Lack of a definitive allelic architecture model for the disorders: the polygenetic model is generally considered the best approximation; however, this has been strongly contested [76].
• Substantial gaps in pathophysiological understanding of the disorders.
If the multifactorial (polygenetic) model is a good approximation to the allelic architecture of common psychiatric disorders, GWAS will provide a potentially unbiased method to search the genome for causative variants of small effect [77]. However, we should bear in mind that if substantial allelic heterogeneity is present, due to rare variants or epigenetic phenomena (i.e., low allelic identity), this method will be less successful, as each genetic variant will arise from an independent haplotype (set of genetic markers in linkage disequilibrium, LD) background, so cancelling out each other’s signal [78, 79]. Experience from GWAS in non-psychiatric conditions suggests that for some disorders as many as 30 000 cases and similar numbers of controls will be required to robustly identify high-risk genetic variants [77]. The need for such large-scale studies has led to the formation of consortia to coordinate the development of methodology and carry out the studies in psychiatry [80].
To date, GWAS methods have only been used to detect main (direct) effects of single or linked (haplotype) markers. GWAS SNPs cover more than four-fifths of the SNPs known to HapMap (http://www.hapmap.org). CNVs are also detected, but with less reliability using current technology. However, in complex multi-factorial diseases, scanning for main effects might miss important genetic variants, especially in subgroups of individuals with specific environmental exposure interactions. Furthermore, GxE with opposite effects in groups with different exposure profiles, that is, a crossing interaction, will not be identified, as no direct main effect will be found. Therefore, to be clinically relevant, GWAS will have to be placed in an epidemiological and public health context. One way of doing this is to enrich GWAS with environmental information, a technique known as GEWIS. No GEWIS study has yet been done, due to considerable methodological and logistic challenges. However, a number of analytical approaches have been proposed which attempt to deal with the substantial problems of prior probability errors that will occur when estimating main effects on 1 000 000 or more markers, and that are even more likely with concomitant estimation of E exposures and GxE. GEWIS studies will therefore require new statistical approaches, as the current log linear regression methods do not effectively test the global null hypothesis of a genetic variant not being associated with the disorder in any of the E strata. Extensions of the interaction methods beyond the currently employed simple departures from additive (or multiplicative) joint effects will be required, most likely based on multivariate latent variable modelling techniques that can deal with ‘mega-variate’ data [81–83].
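The point about crossing interactions can be made concrete with a small sketch. The counts are hypothetical and constructed so that the variant raises risk in unexposed individuals and lowers it in exposed individuals. The names are assumptions of this example.

```python
def odds_ratio(case_pos, case_neg, ctrl_pos, ctrl_neg):
    """Odds ratio for carrying the variant, from case and control carrier counts."""
    return (case_pos * ctrl_neg) / (case_neg * ctrl_pos)

# Hypothetical counts per exposure stratum:
# (cases with variant, cases without, controls with variant, controls without).
strata = {
    "E-": (40, 20, 30, 30),
    "E+": (20, 40, 30, 30),
}

# Genetic effect within each exposure stratum.
stratum_or = {exposure: odds_ratio(*cells) for exposure, cells in strata.items()}

# Marginal (main-effect) analysis: pool the strata and ignore the exposure.
pooled = [sum(column) for column in zip(*strata.values())]
marginal_or = odds_ratio(*pooled)

print(stratum_or, marginal_or)
```

The stratum-specific odds ratios are 2.0 and 0.5, yet the pooled odds ratio is 1.0, so a scan restricted to main effects would not flag this variant.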