CHAPTER 10: Case-control Studies

10.3: Selection of controls

Control sampling options

As discussed in chapter 2, there are three main options for selection of controls: (i) cumulative sampling involves selecting controls from those who do not experience the outcome during the risk period (i.e. the survivors) and will estimate the incidence odds ratio; (ii) case-cohort sampling involves selecting controls from the entire source population and will estimate the risk ratio; (iii) density sampling involves selecting controls longitudinally throughout the course of the study and will estimate the rate ratio. Density sampling is therefore usually the preferred approach since the rate ratio is usually the effect measure of interest. Fortunately, although case-control studies have traditionally been presented in terms of cumulative sampling (e.g. Cornfield, 1951), most case-control studies actually involve density sampling (Miettinen, 1976), often with matching on a time variable (such as calendar time and/or age), and therefore estimate the rate ratio without the need for any rare disease assumption (Pearce, 1993). In particular, the “standard” population-based case-control design, in which all cases occurring in a country (or state or city) in a particular year are compared with a control sample of all other people living in the same country during the same year, actually involves density sampling with calendar year as the “time” matching variable (possibly with additional matching on the additional “time” variable of age).
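
The correspondence between sampling scheme and effect measure can be checked numerically. The following is a minimal sketch, not taken from the text: it assumes a closed cohort with constant incidence rates, no competing risks, and controls drawn exactly in proportion to the relevant denominator (survivors, baseline population, or person-time). The function name and all input values are hypothetical.

```python
import math


def control_sampling_odds_ratios(n_exposed, n_unexposed,
                                 rate_exposed, rate_unexposed, years):
    """Expected exposure odds ratio under each control sampling scheme.

    Assumes a closed cohort with constant hazard rates (hypothetical inputs).
    """
    # Expected cases in each exposure group over the risk period.
    cases_exposed = n_exposed * (1 - math.exp(-rate_exposed * years))
    cases_unexposed = n_unexposed * (1 - math.exp(-rate_unexposed * years))
    case_odds = cases_exposed / cases_unexposed

    # Denominators that the controls represent under each scheme.
    survivors = (n_exposed - cases_exposed, n_unexposed - cases_unexposed)
    baseline = (n_exposed, n_unexposed)
    # With a constant rate, person-time at risk equals cases / rate.
    person_years = (cases_exposed / rate_exposed,
                    cases_unexposed / rate_unexposed)

    return {
        "cumulative": case_odds / (survivors[0] / survivors[1]),
        "case_cohort": case_odds / (baseline[0] / baseline[1]),
        "density": case_odds / (person_years[0] / person_years[1]),
    }


# Hypothetical common disease: rates of 0.10 and 0.05 per year over 10 years.
for scheme, value in control_sampling_odds_ratios(
        10000, 10000, 0.10, 0.05, 10).items():
    print(f"{scheme}: {value:.2f}")
```

With these assumed inputs the density-sampled odds ratio equals the rate ratio of 2.0, the case-cohort odds ratio equals the risk ratio (about 1.61), and the cumulative odds ratio equals the incidence odds ratio (about 2.65). The three diverge here because the disease is deliberately not rare.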

Sources of controls

In a population-based case-control study, controls are usually sampled at random from the entire source population (perhaps with matching on factors such as age and gender). In some instances, it may be necessary to restrict the source population in order to achieve valid control sampling. For example, if controls are to be selected from voter registration rolls, and these are known to be less than 100% complete for the geographical area under study, then the source population might be restricted to persons appearing on the voter registration roll, and cases that were not registered to vote would be excluded; controls would then be sampled from this redefined source population by taking a random sample of the roll.
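
The voter roll example can be sketched as follows. This is an illustration only: the identifiers, the function name and the use of a simple random sample are assumptions, and a real study would also need to handle matching and the timing of control selection.

```python
import random


def restrict_and_sample(roll_ids, case_ids, n_controls, seed=None):
    """Restrict cases to the roll and draw a simple random control sample.

    roll_ids and case_ids are hypothetical person identifiers.
    """
    roll = set(roll_ids)
    # Cases not on the roll fall outside the redefined source population.
    eligible_cases = [c for c in case_ids if c in roll]
    rng = random.Random(seed)
    # Sort first so that a given seed always yields the same sample.
    controls = rng.sample(sorted(roll), n_controls)
    return eligible_cases, controls


# Hypothetical roll of 100 people; case 150 is not registered to vote.
eligible_cases, controls = restrict_and_sample(
    range(1, 101), [5, 50, 150], n_controls=10, seed=1)
print(eligible_cases, controls)
```

In this sketch controls are drawn from the whole roll, so a person who is also a case can occasionally be selected as a control, which is consistent with sampling from the entire source population.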

In registry-based studies, selection of controls may not be so straightforward because the source population may not be so easy to define and enumerate. For example, if there are two major hospitals in a city, and a study is based on lung cancer admissions in one of them during a defined risk period, then the source population is “all those who would have come to this hospital for treatment if they had developed lung cancer during this risk period”. This population may be difficult to define and enumerate, particularly if cases may also be referred from smaller regional hospitals. The best solution is usually to define a more specific source population (e.g. all people living in the city) and to attempt to identify all cases generated by that source population, e.g. by including admissions from all major hospitals in the city and excluding cases who do not live in the city; controls can then be sampled from that defined source population.

If it proves impossible to define and enumerate the source population, then one possibility is to select controls from people appearing in the same “register” for other health conditions (e.g. admissions to the hospital for other causes). This may not only produce a valid sample of the “source population”, but may also have advantages in making the case and control recall more comparable (Smith et al, 1988). However, it may result in bias if the other health conditions are also caused (or prevented) by the exposure under study (Pearce and Checkoway, 1988). For this reason, the population-based approach is preferable, although registry-based studies may still be valuable when population-based studies are not practicable, provided that careful consideration is given to possible sources of bias.

Matching

In some instances it may be appropriate to match cases and controls on potential confounders (e.g. age and gender). This can be done by 1:1 matching (e.g. for each case, choose a control of the same age and gender) or by frequency matching (e.g. if there are 25 male cases in the 30-34 age-group then choose the same number of male controls for this age-group). It is important to emphasize, however, that this will not remove confounding in a case-control study, but will merely facilitate its control in the analysis. For example, in a case-control study of lung cancer, the cases will generally be relatively old whereas a random general population control sample will be relatively young. This may lead to inefficiencies when age is controlled in the analysis, since the older age-groups will contain many cases and few controls, whereas the younger age-groups will contain many controls and few cases. Matching on age will ensure that there are approximately equal numbers of cases and controls in each age stratum and will thereby improve the precision of the effect estimates (given a fixed number of cases and controls). However, it will not remove confounding by age; it merely makes it easier to control in the analysis (Checkoway et al, 2004).
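
Frequency matching as described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: each person is represented as a hypothetical `(person_id, age_group, gender)` tuple, controls are drawn at random within each stratum, and if a stratum has too few candidates the sketch simply takes all that are available.

```python
import random
from collections import Counter


def frequency_match(cases, candidates, ratio=1, seed=None):
    """Sample controls so each (age_group, gender) stratum mirrors the cases.

    cases, candidates: lists of (person_id, age_group, gender) tuples.
    """
    rng = random.Random(seed)
    # Number of cases in each matching stratum.
    needed = Counter((age, sex) for _, age, sex in cases)

    # Group candidate controls by the same strata.
    pool = {}
    for person in candidates:
        pool.setdefault((person[1], person[2]), []).append(person)

    controls = []
    for stratum, n_cases in sorted(needed.items()):
        available = pool.get(stratum, [])
        # Take `ratio` controls per case, or all candidates if too few.
        n_wanted = min(n_cases * ratio, len(available))
        controls.extend(rng.sample(available, n_wanted))
    return controls


# Hypothetical data: two male cases aged 30-34 and one female case aged 35-39.
cases = [(1, "30-34", "M"), (2, "30-34", "M"), (3, "35-39", "F")]
candidates = ([(100 + i, "30-34", "M") for i in range(5)]
              + [(200 + i, "35-39", "F") for i in range(5)]
              + [(300 + i, "40-44", "M") for i in range(5)])
print(frequency_match(cases, candidates, seed=1))
```

The matched sample balances the strata, but age and gender must still be controlled in the analysis, as the text stresses.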

It is also important to emphasize that if “pair” matching (i.e. 1:1 matching) has been done, then the matching factors must be controlled in the analysis, but this need not involve a “matched analysis”. For example, if pair matching has been done on age and gender, then age and gender must be controlled in the analysis, but this can be done with simple stratification on age (e.g. by five-year age-groups) and gender, and it is not necessary to retain the 1:1 matched pairs in the analysis (Rothman and Greenland, 1998).
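
One common way to carry out such a stratified analysis is the Mantel-Haenszel summary odds ratio, which pools the stratum-specific 2×2 tables. The text does not prescribe a particular estimator, so the sketch below is one possible choice rather than the authors' method; the counts are hypothetical.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio across strata.

    strata: iterable of (a, b, c, d) counts per stratum, where
      a = exposed cases,    b = unexposed cases,
      c = exposed controls, d = unexposed controls.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Hypothetical counts for two age-by-gender strata.
strata = [(10, 20, 5, 40), (30, 10, 20, 20)]
print(round(mantel_haenszel_or(strata), 2))
```

With a single stratum the estimator reduces to the ordinary odds ratio ad/bc, which is a convenient check on the implementation.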

There are also potential disadvantages of matching. In particular, matching may actually reduce precision in a case-control study if it is done on a factor that is associated with exposure but is not a risk factor for the disease under study and hence is not a true confounder (Rothman and Greenland, 1998). Furthermore, matching is often expensive and/or time consuming. For these reasons, it is usually sufficient, and preferable, to match only on basic demographic factors such as age and gender, and then to control for other potential confounders (along with age and gender) in the analysis (Checkoway et al, 2004).

Example 10.3

Cole et al (2000) studied time urgency and risk of non-fatal myocardial infarction (MI) in a study of 340 cases and an equal number of age, sex and community-matched controls. Cases were identified from admissions to the coronary or intensive care units of six suburban Boston hospitals between 1 January 1982 and 31 December 1983. Those eligible for inclusion were white men and women under 76 years old living in the Boston area with no previous history of MI. For each case, a control subject of the same sex and age (± 5 years) was selected at random from the residents’ list of the town in which the patient resided.

Each subject was interviewed in his or her home by one of two trained nurse interviewers approximately 8 weeks after discharge from the hospital. A sense of time urgency/impatience was ascertained using four items from the 10-item Framingham Type A scale. A dose-response relation was apparent among subjects who rated themselves higher on the four-item urgency/impatience scale, with a matched odds ratio for non-fatal MI of 4.45 (95% CI 2.20-8.99) comparing those with the highest rating to those with the lowest.
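
For readers unfamiliar with matched odds ratios, the simplest case is a binary exposure in 1:1 matched pairs, where the estimate is the ratio of the two kinds of discordant pairs. The sketch below illustrates that calculation with a Wald-type confidence interval on the log scale. The pair counts are hypothetical and are not the data of Cole et al (2000), whose analysis method is not described here.

```python
import math


def matched_pair_or(case_only_exposed, control_only_exposed, z=1.96):
    """Matched-pair odds ratio from discordant pairs, with an approximate CI.

    case_only_exposed:    pairs where only the case is exposed
    control_only_exposed: pairs where only the control is exposed
    Concordant pairs do not contribute to the estimate.
    """
    odds_ratio = case_only_exposed / control_only_exposed
    # Standard error of log(OR) for the discordant-pair estimator.
    se_log = math.sqrt(1 / case_only_exposed + 1 / control_only_exposed)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical discordant-pair counts, for illustration only.
estimate, lower, upper = matched_pair_or(40, 10)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```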