true difference between the proportions in the two groups or on testing the hypothesis that the proportions in the populations from which the samples were selected are equal. That discussion was presented within the context of the cohort or follow-up study. The key element in the cohort study is that individuals are grouped according to whether or not they have a certain characteristic which is suspected to be related to the outcome of interest.
This characteristic will be called the exposure variable. Individuals in the various exposure groups (usually presence and absence of exposure) are subsequently followed until a determination of the outcome characteristic (e.g., disease or no disease) can be made. For the epidemiologist, however, this type of study design may not be practical when the time between exposure and outcome is lengthy and/or unknown. For example, a cohort study to assess the relationship between consumption of artificial sweeteners and bladder cancer would not be practical for at least two major reasons. First, because the disease is relatively rare, a very large sample of disease-free individuals would have to be identified and then followed for several years to obtain a sufficiently large group of patients who develop the disease. Second, because exposure is at relatively low levels, a long exposure time would be necessary before the disease could be expected to develop. This presents numerous logistical problems, not the least of which is keeping track of and staying in contact with a large number of study subjects for a long period of time. Finally, from a practical point of view, if sweeteners are suspected of being associated with bladder cancer, we would not want to wait twenty years or so for confirmatory scientific evidence. Hence, a cohort design is not a realistic option for many modern epidemiologic investigations of chronic diseases.
In a case-control design, subjects are selected on the basis of their outcome status (e.g., patients with bladder cancer are enlisted into the study, as is a group of "controls" or non-cancer patients) and all subjects are studied with respect to their prior and current exposure to suspected risk factors. From a practical point of view, this type of study may be carried out at relatively low cost and within a relatively short time frame, since it is not necessary to wait for the disease to develop in previously disease-free individuals.
There is a third type of epidemiologic study known as a prevalence study. In this type of study, a representative sample is selected from the population in order to estimate the proportion of individuals with a condition of interest at a specific point in time. This condition can relate either to exposure or to disease and is typically presented as a proportion. This proportion is termed the prevalence and represents an instantaneous snapshot of the number of people with the condition at a specified point in time relative to the total number of eligible individuals in the population. The concept of prevalence is distinct from that of incidence, which is a measure of the number of new cases occurring in the population in a specified time period. Prevalence may be either greater than or less than incidence, depending upon the duration of the condition and the rate at which incident cases die or leave the population. Hence one measure cannot be substituted for the other.
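The dependence of prevalence on the duration of the condition can be illustrated with a small simulation. This is a sketch under stated assumptions, not a method from the text: the `Episode` class and both functions are hypothetical, the population is fixed at 1,000 with no deaths or migration, and incidence is expressed as new cases per member of the whole population rather than per person at risk.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One person's episode of the condition (hypothetical data model)."""
    onset: float  # time the condition starts, in years
    end: float    # time it stops: recovery, death, or leaving the population


def point_prevalence(episodes, population_size, t):
    """Proportion of the population that has the condition at the instant t."""
    current = sum(1 for e in episodes if e.onset <= t < e.end)
    return current / population_size


def period_incidence(episodes, population_size, start, stop):
    """New cases with onset in [start, stop), per member of the population.

    Simplification: the denominator is the whole population, not only those at risk.
    """
    new_cases = sum(1 for e in episodes if start <= e.onset < stop)
    return new_cases / population_size


N = 1000  # hypothetical population size, assumed constant
onsets = [year + k / 10 for year in range(10) for k in range(10)]  # 10 new cases per year
chronic = [Episode(t0, t0 + 20.0) for t0 in onsets]  # each episode lasts 20 years
acute = [Episode(t0, t0 + 0.05) for t0 in onsets]    # each episode lasts about 18 days

print(period_incidence(chronic, N, 9, 10))  # incidence during year 9 (same for both)
print(point_prevalence(chronic, N, 9.92))   # prevalence of the long-lasting condition
print(point_prevalence(acute, N, 9.92))     # prevalence of the short-lived condition
```

With the same 10 new cases per year (incidence 0.01), the long-lasting condition has a point prevalence of 0.1, while the short-lived one has a prevalence of only 0.001, so neither measure can stand in for the other.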
72 Foundations of Sampling and Statistical Theory
Table II.3 Tabular display of disease/exposure relationship

                        Exposure
Disease          Present (E)   Absent (Ē)   Total
Present (D)           a             b         n1
Absent (D̄)            c             d         n2
Total                 m1            m2        n
In the cohort study, n individuals are enrolled in the study. All of these individuals are disease-free at the beginning of the study, and m1 of them are known to be exposed to a suspected risk factor while m2 of them are not.
The relative risk, denoted "RR", is a population parameter defined as the ratio of the probability of disease development among exposed individuals to the probability of disease development among nonexposed individuals. The expression $\text{Prob}\{A \mid B\}$ will denote the probability of the event "A" among all individuals having the characteristic "B". Using this notation, the relative risk is:
$$\mathrm{RR} = \text{Prob}\{D \mid E\} \,/\, \text{Prob}\{D \mid \bar{E}\}.$$
This parameter may be estimated directly only in a cohort study, since in that type of study the outcome, presence or absence of disease, is the measured variable. In fact, $\text{Prob}\{D \mid E\}$ and $\text{Prob}\{D \mid \bar{E}\}$ may be estimated by $a/m_1$ and $b/m_2$, respectively. Thus, the relative risk may be estimated as
$$\widehat{\mathrm{RR}} = \frac{a/m_1}{b/m_2}$$
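As a sketch, this estimate can be computed directly from the four cell counts. The function name and the counts are hypothetical; the cells follow the layout of the table above (rows: disease present/absent; columns: exposure present/absent), so $m_1 = a + c$ and $m_2 = b + d$.

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Estimate RR = (a/m1)/(b/m2) from 2x2 cohort counts.

    a: diseased and exposed        b: diseased and unexposed
    c: disease-free and exposed    d: disease-free and unexposed
    """
    m1 = a + c  # total exposed
    m2 = b + d  # total unexposed
    return (a / m1) / (b / m2)


# Hypothetical cohort: 30 of 1000 exposed and 10 of 1000 unexposed develop the disease.
rr = relative_risk(a=30, b=10, c=970, d=990)
print(f"{rr:.3f}")  # 3.000
```

For these made-up counts the exposed group has three times the risk of disease of the unexposed group.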
The remarkable feature of a case-control study is that it permits estimation of the relative risk under certain conditions. This is accomplished via the odds ratio. The odds of an event is defined as the ratio of the probability that the event will occur to the probability that the event will not occur. For example, in the cohort study, let the odds in the exposed group be denoted as $O_1$, where
$$O_1 = \text{Prob}\{D \mid E\} \,/\, \text{Prob}\{\bar{D} \mid E\}.$$
Let the odds in the unexposed group be denoted as $O_2$, where
$$O_2 = \text{Prob}\{D \mid \bar{E}\} \,/\, \text{Prob}\{\bar{D} \mid \bar{E}\}.$$
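As its name implies, the odds ratio, denoted by "OR", is the ratio of $O_1$ to $O_2$. That is,
$$\mathrm{OR} = \frac{O_1}{O_2} = \frac{\text{Prob}\{D \mid E\}/\text{Prob}\{\bar{D} \mid E\}}{\text{Prob}\{D \mid \bar{E}\}/\text{Prob}\{\bar{D} \mid \bar{E}\}}.$$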
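From Table II.4 it is clear that to estimate the odds ratio for a cohort study we compute
$$\widehat{\mathrm{OR}} = \frac{(a/m_1)/(c/m_1)}{(b/m_2)/(d/m_2)} = \frac{a/c}{b/d} = \frac{ad}{bc}.$$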
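In a case-control study, the measured variable is the exposure status of the individual. Thus, for the cases, the odds is defined as $\text{Prob}\{E \mid D\}/\text{Prob}\{\bar{E} \mid D\}$ while, for the controls, the odds is given by $\text{Prob}\{E \mid \bar{D}\}/\text{Prob}\{\bar{E} \mid \bar{D}\}$. Hence, the odds ratio for case-control studies is
$$\mathrm{OR} = \frac{\text{Prob}\{E \mid D\}/\text{Prob}\{\bar{E} \mid D\}}{\text{Prob}\{E \mid \bar{D}\}/\text{Prob}\{\bar{E} \mid \bar{D}\}}.$$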
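From Table II.4 it can be seen that to estimate this quantity from a case-control study we compute
$$\widehat{\mathrm{OR}} = \frac{(a/n_1)/(b/n_1)}{(c/n_2)/(d/n_2)} = \frac{a/b}{c/d} = \frac{ad}{bc}.$$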
The quantity ad/bc is also called the cross product ratio in the literature on 2×2 contingency tables. We see that the odds ratio, as estimated by the cohort study, is identical to the odds ratio as estimated by the case-control study. While odds ratios may be estimated from either case-control or cohort studies, relative risks may only be estimated directly from cohort studies.
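The equality of the two estimates can be checked numerically. In this sketch the counts and function names are hypothetical; `fractions.Fraction` from the Python standard library keeps the arithmetic exact, so the cohort-style odds ratio, the case-control-style odds ratio, and the cross product ratio can be compared with `==`.

```python
from fractions import Fraction


def or_cohort(a: int, b: int, c: int, d: int) -> Fraction:
    """Disease odds among the exposed divided by disease odds among the unexposed."""
    m1, m2 = a + c, b + d
    odds_exposed = Fraction(a, m1) / Fraction(c, m1)    # (a/m1)/(c/m1) = a/c
    odds_unexposed = Fraction(b, m2) / Fraction(d, m2)  # (b/m2)/(d/m2) = b/d
    return odds_exposed / odds_unexposed


def or_case_control(a: int, b: int, c: int, d: int) -> Fraction:
    """Exposure odds among cases divided by exposure odds among controls."""
    n1, n2 = a + b, c + d
    odds_cases = Fraction(a, n1) / Fraction(b, n1)      # (a/n1)/(b/n1) = a/b
    odds_controls = Fraction(c, n2) / Fraction(d, n2)   # (c/n2)/(d/n2) = c/d
    return odds_cases / odds_controls


def cross_product_ratio(a: int, b: int, c: int, d: int) -> Fraction:
    """The cross product ratio ad/bc."""
    return Fraction(a * d, b * c)


counts = (30, 10, 970, 990)  # hypothetical cell counts a, b, c, d
print(or_cohort(*counts), or_case_control(*counts), cross_product_ratio(*counts))
```

All three print as `297/97` (about 3.06): the row and column totals cancel, leaving ad/bc in every case.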
In many, if not all, epidemiologic studies, the parameter of primary interest is the relative risk, RR, since this parameter quantifies how much more (or less) likely an exposed individual is to develop the outcome than an unexposed individual. In the epidemiologic literature, relative risks of about 2 or larger (provided they are statistically significant) are often considered important evidence of an exposure effect.
In many epidemiologic studies the diseases being studied are relatively rare events and, as a result, $\text{Prob}\{D \mid E\}$ and $\text{Prob}\{D \mid \bar{E}\}$ are both small. The relative risk, as estimated from a cohort study, is
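$$
\begin{aligned}
\widehat{\mathrm{RR}} &= \frac{a/m_1}{b/m_2} = \frac{a/(a+c)}{b/(b+d)} \\
&= \frac{ab + ad}{ab + bc} \\
&= \frac{ad}{bc}\left[\frac{(b/d) + 1}{(a/c) + 1}\right] \\
&= \widehat{\mathrm{OR}}\left[\frac{(b/d) + 1}{(a/c) + 1}\right]
\end{aligned}
$$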
If the odds ratio is used as an estimate of the relative risk, the factor $[(b/d)+1]/[(a/c)+1]$ should be approximately one, which will be the case if $b/d$ and $a/c$ are small. This will be true if the number of individuals without the disease is very large relative to the number of individuals with the disease in both the exposed and unexposed groups. Thus, when the disease is rare, we may approximate the value of the relative risk by the value of the odds ratio, which in turn may be estimated from the case-control study design.
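A short numerical check of the identity and of the rare-disease condition follows. The function name and counts are hypothetical: both tables have a true relative risk of 3, but in the first the disease affects 3% and 1% of the exposed and unexposed groups, and in the second 30% and 10%.

```python
def rr_or_and_factor(a: int, b: int, c: int, d: int):
    """Return (RR, OR, factor) for counts a, b (diseased) and c, d (disease-free).

    RR     = [a/(a+c)] / [b/(b+d)]
    OR     = ad / bc
    factor = [(b/d) + 1] / [(a/c) + 1], so that RR = OR * factor
    """
    rr = (a / (a + c)) / (b / (b + d))
    odds_ratio = (a * d) / (b * c)
    factor = (b / d + 1) / (a / c + 1)
    return rr, odds_ratio, factor


# Hypothetical tables: exposed vs. unexposed risk of 3% vs. 1% (rare) and 30% vs. 10% (common).
for label, counts in [("rare", (30, 10, 970, 990)), ("common", (300, 100, 700, 900))]:
    rr, odds_ratio, factor = rr_or_and_factor(*counts)
    print(f"{label}: RR={rr:.3f} OR={odds_ratio:.3f} factor={factor:.3f}")
```

This prints `RR=3.000 OR=3.062 factor=0.980` for the rare table and `RR=3.000 OR=3.857 factor=0.778` for the common one: the factor stays near one only when the disease is rare, and in the common case the odds ratio overstates the relative risk substantially.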
In the previous discussion of hypothesis testing and estimation of proportions, the symbols $P_1$ and $P_2$ were used to denote the proportions with the condition in populations 1 and 2, respectively. Putting this into the framework of the cohort study, where the subscript 1 denotes the exposed cohort and the subscript 2 denotes the unexposed cohort, it follows that
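$$P_1 = \text{Prob}\{D \mid E\}, \qquad 1 - P_1 = \text{Prob}\{\bar{D} \mid E\},$$
$$P_2 = \text{Prob}\{D \mid \bar{E}\}, \qquad 1 - P_2 = \text{Prob}\{\bar{D} \mid \bar{E}\}.$$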
For the case-control study, similar notation may be defined for the probabilities of exposure given disease presence or absence. That is, define
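$$P_1^* = \text{Prob}\{E \mid D\}, \qquad 1 - P_1^* = \text{Prob}\{\bar{E} \mid D\},$$
$$P_2^* = \text{Prob}\{E \mid \bar{D}\}, \qquad 1 - P_2^* = \text{Prob}\{\bar{E} \mid \bar{D}\}.$$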
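It follows that
$$\mathrm{RR} = \frac{P_1}{P_2}$$
and that
$$\mathrm{OR} = \frac{P_1/(1-P_1)}{P_2/(1-P_2)} = \frac{P_1^*/(1-P_1^*)}{P_2^*/(1-P_2^*)}.$$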
In terms of the notation of the 2×2 table in Table II.4, the parameters may be estimated as follows:
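$$\hat{P}_1 = a/m_1, \quad 1 - \hat{P}_1 = c/m_1, \qquad \hat{P}_2 = b/m_2, \quad 1 - \hat{P}_2 = d/m_2,$$
$$\hat{P}_1^* = a/n_1, \quad 1 - \hat{P}_1^* = b/n_1, \qquad \hat{P}_2^* = c/n_2, \quad 1 - \hat{P}_2^* = d/n_2.$$
Estimates of the relative risk and odds ratio may be defined as: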
$$\widehat{\mathrm{RR}} = \frac{\hat{P}_1}{\hat{P}_2} = \frac{a/m_1}{b/m_2} = \frac{a\,m_2}{b\,m_1}$$
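These estimates also show why the relative risk cannot be estimated directly from a case-control study. In a case-control design the investigator decides how many non-diseased controls to sample, which amounts to multiplying $c$ and $d$ by an arbitrary factor. The sketch below uses hypothetical counts and a hypothetical helper `summarize`, with exact arithmetic via the standard-library `fractions.Fraction`. It keeps the same 30 exposed and 10 unexposed cases but samples only one in ten of the non-diseased: $\hat{P}_1^*$, $\hat{P}_2^*$ and the odds ratio are unchanged, while $a m_2/(b m_1)$ is not.

```python
from fractions import Fraction


def odds(p: Fraction) -> Fraction:
    """Odds corresponding to probability p."""
    return p / (1 - p)


def summarize(a: int, b: int, c: int, d: int) -> dict:
    """Estimated P*'s, relative risk, and odds ratio from 2x2 counts (hypothetical helper)."""
    m1, m2 = a + c, b + d  # exposed and unexposed totals
    n1, n2 = a + b, c + d  # diseased and disease-free totals
    p1, p2 = Fraction(a, m1), Fraction(b, m2)    # estimates of P1, P2
    p1s, p2s = Fraction(a, n1), Fraction(c, n2)  # estimates of P1*, P2*
    return {
        "P1*": p1s,
        "P2*": p2s,
        "RR": p1 / p2,                 # equals a*m2 / (b*m1)
        "OR": odds(p1) / odds(p2),     # disease-odds form
        "OR*": odds(p1s) / odds(p2s),  # exposure-odds form
    }


cohort = summarize(30, 10, 970, 990)      # everyone followed up (hypothetical counts)
case_control = summarize(30, 10, 97, 99)  # same cases, 1 in 10 non-diseased sampled

for label, s in [("cohort", cohort), ("case-control", case_control)]:
    print(label, {k: round(float(v), 3) for k, v in s.items()})
```

The cohort table gives a relative risk of exactly 3, whereas the subsampled table gives $a m_2/(b m_1) = 327/127 \approx 2.57$, a number that depends on the arbitrary sampling fraction; the odds ratio is $297/97 \approx 3.06$ in both.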