Directory UMM :Data Elmu:jurnal:J-a:Journal Of Business Research:Vol48.Issue3.2000:

(1)

Error in Health-Care Survey Research

Debi Prasad Mishra

STATEUNIVERSITY OFNEWYORK ATBINGHAMTON

This article estimates the average amount of error variance in a typical Buckley, 1989). Commenting on the deleterious impact of

health-care measure by reanalyzing published multitrait–multimethod measurement error on validity, Lowe and Ryan-Wenger (1992)

(MTMM) matrices via the confirmatory factor analysis (CFA) procedure. note that “the discipline of nursing needs to be more

sophisti-For the set of studies under consideration, a typical health-care measure cated in its study of construct validity,” and that “the questions

contains a relatively high proportion (64%) of error variance (method of convergence and discrimination are central to the validity

and random). The magnitude of this error variance estimate seems to be of the instruments being developed” (p. 74). Likewise, Cote

higher than corresponding variance estimates reported in the other social and Buckley (1987) observe that “measures used in social

sciences. Using principles from psychometric theory, we note how measure- science research have a large error component” (p. 318), and

ment error can confound validity, and we provide directions to researchers go on to say that “in the future, researchers must be more

for improving the quality of survey-based health-care measures.J BUSN RES resolute in their desire to develop construct measures that are

Given the potential of measurement error to confound the reliability and validity of constructs seriously, we would expect that summary psychometric analyses (c.f., Cote and Buckley,

I

n recent years, the use of paper and pencil items for measur- 1987) of paper and pencil items used in health-care research ing theoretical concepts has become widespread in the would be readily available. However, a careful scrutiny of the health-care discipline. For example, researchers have used literature belies this expectation and underscores the paucity survey-based questionnaires for gauging theoretical constructs of research on measurement and validity issues in health-care involving patient satisfaction (Singh, 1991), depression (Beck, research. For example, in the context of nursing, Gustafson, 1967), sleep patterns (Fontaine, 1989), juvenile health (Green- Sainfort, Van Konigsveld, and Zimmerman (1990) note that baum, Dedrick, Prange, and Friedman, 1994; Marsh and “there have been few detailed evaluations of measures of qual-Gouvernet, 1989), family health (Sidani and Jones, 1995), ity of care in nursing homes” and that “much research on caregiving (Phillips, Rempusheski, and Morrison, 1989), nurs- factors affecting nursing home quality has used measures of ing (Gustafson, Sainfort, Van Konigsveld, and Zimmerman, questionable reliability and validity . . . some measures cur-1990), and total quality management in hospitals (Lammers, rently in use have been developed using methodologies not Cretin, Gilman, and Calingo, 1996), among others. based on solid conceptual grounds, offering little reason to Although the use of subjective paper and pencil measures expect them to have much internal or external validity” (p. 97). has advanced our understanding of theoretical concepts in a Our objective in this paper is twofold: (1) to provide empir-number of health-care areas, it has also been recognized that ical estimates of the amount of measurement error in paper the presence of error in such measures can seriously under- and pencil health-care measures; and in a related vein, (2) to mine the reliability and validity of concepts being measured illustrate the use of the confirmatory factor analysis (CFA) (Bagozzi, Yi, and Phillips, 1991; Bagozzi and Yi, 1990; Cole, method for modeling measurement error in health-care re-Howard, and Maxwell, 1981; Cote and Buckley, 1987; search. By conducting such an empirical analysis, we seek to Churchill, 1979; Figuerdo, Ferketich, and Knapp, 1991; Fer- address three important issues in the health-care field. First, ketich, Figuerdo, and Knapp, 1991; Lowe and Ryan-Wenger, by ascertaining the pervasiveness of measurement error and 1992; Waltz, Strickland, and Lenz, 1984; Williams, Cote, and understanding its deleterious impact upon construct validity, health-care researchers will be in a position to interpret prop-erly the results of statistical analyses conducted to uncover

Address correspondence to Debi Prasad Mishra, School of Management,

State University of New York, Binghamton, NY 13902, USA. relationships among theoretical constructs. For example, Cote

Journal of Business Research 48, 193–205 (2000)

(2)

194

J Busn Res D. P. Mishra 2000:48:193–205

Figure 1. Process for establishing construct validity of latent constructs.

and Buckley (1987) found that on average, measures in the ing TQM practices. Again, the presence of measurement error may seriously impinge upon efforts to implement, analyze, social sciences contained 41.7% trait variance, 26.3% method

and modify TQM efforts in health care. variance, and 32% error variance. Consequently, using these

In keeping with out objectives, this paper is organized as estimates, Cote and Buckley reported that “the true value for

follows. The first section offers a perspective on construct the slope in a two-variable (regression) case would be 2.4

validity and ties in the notion of measurement error with times the estimated value for measures containing 50% error

related psychometric concepts (i.e., scaling, validity, reliabil-variance” (p. 318). In other words, interpreting statistical

re-ity). The next section details the plan of data analysis. Spe-sults without assessing measurement error can seriously

un-cifically, the nested confirmatory factor analysis approach of dermine construct validity. Second, much of the concern over

Widaman (1985) based on MTMM (multitrait–multimethod) an appropriate regulatory role in health care has focused on

matrices (Campbell and Fiske, 1959) is outlined. The third efforts to define, measure, and ensure the quality of care of

section describes those health-care studies from which MTMM nursing homes and hospitals (Lemke and Moos, 1986). Given

data were reanalyzed. This is followed by a discussion of the that quality of care measures are often of the paper and pencil

results of our analysis. Finally, we describe the limitations of type, greater attention to measurement issues is warranted.

our study together with some implications for future research. Finally, there is a growing need for health-care institutions to

adopt “total quality management” (TQM) practices for

manag-ing the efficient delivery of health care (Casalou, 1991; Lam-

_{A Psychometric Perspective on}

mers, Cretin, Gilman, and Calingo, 1996). The central theme

Construct Validity

of TQM is patient satisfaction, which is a latent construct that

(3)

implement-related to the question of what the instrument is in fact measur- stones were tossed on one occasion might correlate highly with how far they were tossed on another occasion, and thus, ing—what construct, trait, or concept underlies a person’s

performance or score on a measure” (p. 70). In other words, being repeatable, the measure would be highly reliable; but obviously the tossing of stones would not constitute a valid constructs are latent entities constructed by researchers for

describing and understanding an underlying phenomenon of measure of intelligence” (pp. 191–192). In other words, “relia-bility is a necessary but not sufficient condition for validity” interest. Because these constructs are latent, they can only be

indirectly determined by using a set of observable measures. (Nunnally, 1978, p. 192). The fundamental construct validity question can be phrased

as follows.Given that a set of items purportedly measures a latent

_{Measurement Error and Validity}

construct, how confident is the researcher about a correspondence _{According to classical test theory (Gulliksen, 1950), the}

vari-between these measures and the construct in question?For

exam-ance can be partitioned into trait variexam-ance and error variexam-ance. ple, when a researcher measuresdepressionby using

physiolog-Furthermore, error variance can be subdivided into method ical indicators (e.g., heartbeat), can he or she assert that

heart-variance and random error heart-variance. Mathematically, this rela-beat is in fact a valid measure of anxiety?

tionship is expressed as Equation (1): Although the fundamental premise of construct validity is

rather simple, investigators must pay careful attention to the so5 st 1 sm1 sr (1)

underlying psychometric processes that are related to validity.

wheresois the observed variance,stthe trait (or valid)

vari-More specifically, the relationship between scaling, reliability,

ance,smthe method variance, andsrthe random error

vari-measurement error, and validity must be fully explored by

ance. Construct validity entails explaining the trait variance researchers. The following paragraphs discuss the

interrela-only. Note that method variance and random error variance tionship between validity and other psychometric concepts

may serve to attenuate or inflate theoretical relationships (scaling, reliability, measurement error). Let us consider the

among measures (Cote and Buckley, 1988). To this extent, a conceptual process involved in measuring a latent construct

rigorous estimation of method variance and random error as depicted in Figure 1.

variance should necessarily precede any attempt at investigat-As a first step, a researcher specifies the domain of the

ing construct validity. construct from which items are sampled to form a scale. The

The effect of measurement error (method variance and process of selecting a relevant subset of items from a universal

random error variance) can be highlighted with the following set of items is termed “domain sampling” (Nunnally, 1978).

example. Let us assume that a researcher administers the Note that this universal set is an infinite set of all possible

Beck Depression Inventory (BDI; Beck, 1967) to respondents. items that tap the latent construct. Because of constraints of

Furthermore, let us assume that approximately half the items time and costs, a researcher typically selects a few

representa-on the BDI are negatively scored. It is entirely possible that tive items from the universal set to form a “scale” where

re-some items on the scale may correlate highly (and conse-spondents assign numbers to questions. This process of

trans-quently load on a latent factor) just because of the particular forming responses into scores is calledscaling. Scores obtained

method (i.e., negatively worded items). In such a situation, from scales are calledobserved scores. Before proceeding

fur-it is erroneous to conclude that the set of negatively worded ther, researchers determine the relationship of the observed

items measure a trait factor. On the other hand, these items score to the score that would have been obtained had all

merely tap a method factor. In a similar vein, we can visualize items from the infinite set (i.e., the construct domain) been

how random error (respondent fatigue, length of questions, administered to aninfinite set of respondentsacross aninfinite

format) may also produce a spurious factor. In sum, measure-set of situations. The score on such an infinite set (of items,

ment error seriously compromises the construct validity of situations, and respondents) is called thetrue score.

measures. The relationship between the observed score (from a finite

Given the potential of measurement error to undermine set) and the true score (from an infinite set) is calledreliability.

construct validity severely, researchers first must estimate the Mathematically, reliability is calculated as the ratio of the

amount of error variance (both method variance and random observed variance to the true variance (Churchill, 1979). Note

error variance) in their measures before addressing the ques-that high reliability for a set of items merely implies ques-that

the correlations among items constituting a scale reflect an tion of validity. Such as exercise will help investigators ascer-tain the degree to which measurement error bias may be underlying latent factor. In other words, if one administered

a scale with high reliability across multiple situations and present. Furthermore, health-care researchers will also be compelled to make use of such statistical estimation tech-respondents, the amount of variance in the observed score

would be low; that is, the scale would beconsistentandrepeat- niques as latent variables structural equations modeling (Ben-tler, 1992), which explicitly permits the estimation of mea-able. Whether a scale exhibiting high reliability actually

mea-sures the underlying construct of interest is a question of surement error while studying theoretical relationships among constructs.

(4)

196

Figure 2. Nested confirmatory factor analysis models for analyzing MTMM data. j1,j2,j3,j45trait factors;j5,j65method factors;

l 5factor loading;φ5factor correlation; x5measured variable;d 5random error.

pointed out several problems with Campbell and Fiske’s

Method

(1959) method (see Schmitt and Stutts, 1986 for a good

The Use of MTMM Matrices for

_{discussion). Most of the criticism against the MTMM technique}

Estimating Variance Components

has been directed at its underlying assumption of the use of “maximally dissimilar methods” (Marsh, 1989), because Per conventional research practice (Bagozzi and Yi, 1991;

methods are rarely maximally dissimilar in practice. In this Bentler, 1992; Byrne, 1994; Cote and Buckley, 1987, 1988;

vein, Williams, Cote, and Buckley (1989) note that “if maxi-Williams, Cote, and Buckley, 1989), variance components

mally dissimilar methods are not used in MTMM analysis, it of measures can be estimated by analyzing MTMM matrices

becomes difficult to identify the existence of methods effects,” (Campbell and Fiske, 1959) by means of CFA. Briefly, a

(p. 462). Furthermore, the inability of the MTMM technique MTMM matrix represents intercorrelations of more than one

to provide meaningful statistical estimates of different variance trait measured by more than one method. In its original

formu-components of measures (i.e., trait, method, and random er-lation, Campbell and Fiske (1959) assumed that methods

ror) provides, at best, an incomplete assessment of validity were maximally dissimilar and that traits and methods were

(Schmitt and Stutts, 1986). not correlated.

In contrast to the well-documented shortcomings of the Despite the widespread use of MTMM matrices for assessing

MTMM technique, CFA of MTMM data represents a viable construct validity of measures (Fiske and Campbell, 1992),

alternative for assessing the construct validity of measures and researchers (Lowe and Ryan-Wenger, 1992; Marsh 1989;

(5)

its various components. The present study empirically assesses to the data than the baseline model. In the second step, M4 the amount of measurement error in measures by using CFA was compared with M3. A significantx2_{difference test further} to reanalyze published MTMM matrices in health care. The indicates that trait factors are present. Likewise, the presence nested CFA approach that formed the basis for analyzing of method factors was determined by comparing M3 with M1,

MTMM data is described next. and M4 and M2, respectively. Once trait and method factors

were detected, a confirmatory factor analysis model was esti-mated and different variance components calculated per

stan-Nested CFA Approach

dard procedures (Cote and Buckley, 1987, 1988; Williams, The nested CFA approach subsumes two interrelated

tech-Cote, and Buckley, 1989). niques for partitioning an item’s total variance: (1) estimating

While estimating the variance components of health-care a sequence of successively restricted nested models; and (2)

measures we assume that measure variation is a linear combi-comparing these nested models to determine incremental

nation of traits, methods, and error. In other words, we assume goodness-of-fit.

that trait and method factors do not vary. In contrast, recent For each of the studies in the health-care area that used

research (Bagozzi and Yi, 1991; Browne, 1989; Kim and Lee, the MTMM approach, the amount of measurement error in

1997) using the direct product (DP) model permits the estima-indicators of latent constructs was assessed by estimating a set

tion of models where trait and method factors interact. The of four hierarchically nested models per guidelines originally

multiplicative effect of trait and method factors on measure offered by Widaman (1985) and subsequent modifications

suggested in recent research that use the MTMM technique variation may be justified when differential augmentation or (Bagozzi and Yi, 1990, 1991; Bagozzi, Yi, and Phillips, 1991; differential attenuation is expected to be present among trait Williams, Cote, and Buckley, 1989). To implement the nested and method factors (Bagozzi and Yi, 1991). However, in the approach, four confirmatory factor analysis models were esti- absence of compelling reasons for considering these interac-mated and compared to yield meaningful tests of hypotheses. tions, using multiplicative models may not be justified. Based The nested models that show the possible sources of variation upon our interpretation of the studies from where data have in measures and their representation in a confirmatory factor been reanalyzed, we did not find any strong rationale for analytic model are graphically depicted in Figure 2 and de- including trait and method interations. Consequently, we have

scribed in more detail below. _{refrained from specifying and estimating DP models.}

Recall that from the set of hierarchically nested models Model 1 (M1): is a “null” model that assumes the

intercorre-described earlier, M1 represents a situation where intercorrela-lations among measures can be explained by random error

tions among measures are caused by random error alone. It only and no trait or method factors are present. This null

is highly unlikely that M1 alone can explain the data very model is depicted in panel A of Figure 2.

well, because various measures used in a study are likely to Model 2 (M2): is a “trait-only” model that hypothesizes _{stem from a conceptual framework and are expected to contain} variation in measures can be explained completely by traits _{some amount of trait and method variances as well. Likewise,} assumed to be correlated and by random error. Panel B in _{M3 holds that method factors and error variances alone can} Figure 2 depicts the trait-only model. _{explain all the variation in measures. This situation of method}

Model 3 (M3): depicted in Panel C of Figure 2 is a “method- and error variances accounting for all the variance in measures only” model that hypothesizes variation in measures can does not seem reasonable, because a researcher who employs be completely explained by correlated method factors plus the MTMM technique has likely used some theoretical

guide-random error. lines to develop measures. Although M1 and M3 are not

related to theory at all, M2 suggests that measure variance is Model 4 (M4): is a “trait and method” model where variance

a function of trait factors and error variance. The situation among measures is explained completely by traits,

meth-depicted by M2 is also unlikely, because paper and pencil ods, and random error, with freely estimated correlations

measures used to measure latent constructs are expected to among trait factors and correlated method factors. Panel

contain some degree of method variance. Hence, M4, which D in Figure 2 depicts the trait and method model.

is a trait and method model, is expected to outperform M1, To implement the nested approach and determine the best _{M2, and M3. In other words, M4 is more representative of} fitting model, a sequence ofx2_{difference tests were conducted.}

actual data than the other nested models. Given these observa-First, Model 2 (M2) was compared with the null model (M1) _{tions we hypothesize that:}

to determine whether trait factors were present or not. Model

H1: For the studies under consideration, the trait and comparison entails computing thex2_{difference between two}

method model (M4), which holds that the variance models and calculating the corresponding difference in the

in measures of latent constructs can be explained by degrees of freedom. A rejection of the null hypothesis (H0:

traits, methods, and random error, will outperform The compared models offers no incremental fit of model to

(6)

198

Figure 3. CFA model of MTMM data in Saylor Finch, Baskin, Furey, and Kelly’s (1984)j1,j2,5trait factors;j3,j4,j5,j65method factors;l 5factor loading;φ5factor correlation; x5measured variable;d 5random error.

only), M2 (trait plus error), and M3 (method plus tional Journal of Nursing Studies, and American Journal of Public

error). Health.

Based upon our search, we uncovered 19 studies that claimed to have used the MTMM method for examining the

Data Collection

convergent and discriminant validity of measures. Of these To identify studies in the health-care area that have used

19 studies, 15 articles were excluded, because either they did MTMM matrices for assessing the validity of measures, we

not publish the MTMM matrices at all (e.g., Greenbaum, turned to several published sources and implemented a

sys-Dedrick, Prange, and Friedman, 1994) or published a partial tematic search process along the lines followed by Cote and

MTMM matrix (e.g., Sidani and Jones, 1995) that could not Buckley (1987). First, we examined Turner’s (1981) article,

be used for reanalyzing the data. Hence, our empirical analysis which lists 70 studies in which the MTMM method was used.

is based upon correlation data provided in four different stud-Second, we examined the PSYCHLIT database for the 1979–

ies that have used the MTMM design. 1994 time period and searched for studies in the health-care

Although the few studies that have used the MTMM method area by using “MTMM,” “multitrait,” and “multimethod” as

in health-care research might seem atypical, a closer scrutiny keywords. In addition, we also examined the MEDLINE and

of available evidence from other streams in the social sciences ABI-INFORM databases for the 1980–1995 time period to

suggests that this is not so. For example, Bagozzi and Yi (1991) identify published studies that used the MTMM technique.

reported that only four studies that used the MTMM design Finally, we visually examined all articles that appeared in

were published over a 17-year period in theJournal of Con-major health-care journals between 1990 and 1994 for

evi-sumer Research. Explaining the relatively infrequent use of the dence of use of the MTMM procedure. The major journals

MTMM method in social science research, Bagozzi and Yi that we examined wereResearch in Nursing and Health, Medical

(1991) note that “one explanation [for this paucity of pub-Care, Heart and Lung, Journal of Health Care Marketing,

Ad-lished studies] is the difficulty in obtaining multiple measures vances in Nursing Science, Journal of Consulting and Clinical

of each construct in one’s theory and using different methods Psychology, Nursing Research, Journal of the American Medical

(7)

(1992) in a review article note that over a 10-year period, WOLFE (1991). The authors developed the “Children’s Impact of Traumatic Events Scale (CITES)” for assessing the impact only six studies that reported the use of the MTMM procedure

of sexual abuse from the child’s perspective. Seventy-eight were published in the major nursing journals. Given the

rela-scale items were used to measure four traits;PTSD(a compos-tively infrequent use of MTMM matrices for construct

valida-ite factor of intrusive thoughts, avoidance, hyperarousal, and tion across a number of disciplines, the present dataset seems

sexual anxiety),Social Reactions,Abuse Attributions, and Eroti-adequate enough to estimate empirically the extent to which

cism. Two methods were used for construct validation: (1) measurement error is present in health-care research measures.

the CITES-R scale; and (2) a combined scale comprising items from a number of scales (i.e., SAFE, CBCL-FTSD etc.; Wolfe,

Description of Studies

Gentile, Michienzi, Sas, and Wolfe, 1991). In a sample size A few studies in health-care research (e.g., Marsh and

Gouver-of 60 respondents, the researchers found some evidence Gouver-of net, 1989; Saylor, Finch, Baskin, Furey, and Kelly, 1984;

convergent and discriminant validity but acknowledged that Sidani and Jones, 1995; Wolfe, Gentile, Michienzi, Sas, and

their results were prone to measurement error biases. The Wolfe, 1991) have used MTMM matrices for examining the

CFA model for the Wolfe study is shown in Figure 4. construct validity of measures. Specifically, the Saylor and

colleagues study investigates the construct validity for mea- MARSH AND GOUVERNET (1989). The construct validity of sures of “childhood depression” using an MTMM approach, children’s responses to two multidimensional self-concept and the Wolfe and colleagues study develops and validates a scales (methods) for measuring four traits (physical skill, social scale for assessing the impact of sexual abuse from the child’s skill, general skill, and cognitive skill) was assessed by means perspective. In both these articles, a reasonable degree of of an MTMM analysis based on 508 responses. The authors “convergent” and “discriminant” validity was reported by the of this study found “strong support for both the convergent authors. However, these studies did not use a CFA framework, and discriminant validity of responses to these two multidi-rendering the observed results prone to measurement error mensional self-concept instruments” (p. 63). Figure 5 illus-biases. In fact, Wolfe and colleagues explicitly acknowledge trates the CFA model pertaining to the Marsh and Gouvernet that “of concern is the magnitude of error variance . . . which (1989) study.

means that scores on the tests are more subject to unknown

SINGH (1991). In a study involving 368 subjects, the author sources of variance than they are to the recognized sources,”

used a pseudo-MTMM approach, where three different traits and go on to say that “these results call for a subsequent

(i.e., physician satisfaction, hospital satisfaction, and insurance confirmatory factor analysis of the items” (p. 382). We hope

satisfaction) were measured by two different methods; namely, that the results of our analysis will answer some of their

the over-all item and multi-item ratings. Convergent and dis-concerns. We describe these aforementioned studies more

criminant validity of measures was evaluated by means of fully in the following paragraphs.

CFA. Although the MTMM matrix was not reported in the study, estimates of trait and method loadings were provided

SAYLOR, FINCH, BASKIN, FUREY, AND KELLY (1984). This study

by the author, which, in turn, permits us to estimate the investigated the construct validity of childhood depression

amount of trait, method, and error variances in the measures. measures. Specifically,two trait factors,angerand depression,

Note that because Singh’s (1991) study does not publish the were measured byeight items(four each for anger and

depres-pseudo-MTMM matrix, we do not reanalyze the MTMM ma-sion) and four methods. The CFA model pertaining to this

trix by CFA. On the other hand, we use the factor loadings study is depicted in Figure 3. The first method used was a

(for traits and methods) reported in the study to compute “Children’s Depression and Anger” inventory (Kovacs, 1981;

variance estimates for the measures. Figure 6 depicts the CFA Nelson and Finch, 1978). The second format was the “Peer

model for the Singh (1991) study. Nomination Inventory of Depression and Anger” (Finch and

MTMM matrices reported in the studies described above Eastman, 1983; Lefkowitz and Tesiny, 1980). The last two

(with the exception of Singh, 1991) were reanalyzed by using methods were the “Teacher Nominated Inventory of

Depres-the CFA procedure (Bentler, 1992). Specifically, Depres-the nested sion and Anger” (Saylor, Finch, Baskin, Furey, and Kelly,

approach of Widaman (1985) described earlier was used to 1984), and the “AML Rating Form” (Cowen, Door, Clarfield,

estimate a set of hierarchical (nested) factor analytic models. and Kreling, 1973), respectively. Construct validity of

mea-The results of our analysis are discussed in the next section. sures was investigated by means of an 838 multitrait (2 traits)

multimethod (4 methods) matrix based on 133 responses.

Interestingly, although the authors found some support for

_Results

convergent and discriminant validity, they also expressed

con-Model Estimation Results

cern over measurement error, which was an important source

(8)

200

Figure 4. CFA model of MTMM data in Wolfe, Gentile, Michienzi, Sas, and Wolfe’s (1991) study.j1,j2,j3,j45trait factors;j5,j65

method factors;l 5factor loading;φ5factor correlation; x5measured variable;d 5random error.

EQS was preferred over Lisrel VII (Joresko¨g and Sorbo¨m considered in the present study, intercorrelations among items can be best explained by a combination of random error, 1989), because the EQS program automatically imposes

bounds on negative error variances (Heywood cases), which method factors, and trait factors.

For estimating the amount of trait, method, and random are constrained at the lower limit of “0,” yielding stable

param-eter estimates. As Williams, Cote, and Buckley (1989) note error variance in each measure, we squared the loading of each item in the final model (M4) on its corresponding trait “Heywood cases are quite common in MTMM analyses using

confirmatory factor analyses” (p. 463). By using the EQS soft- and method factor. Error variances were freely estimated by the EQS program. Note that for some items, error variances ware we attempted to minimize estimation problems

associ-ated with negative error variances. had to be fixed at the lower bound of zero for model identifica-tion (Dillon, Kumar, and Mulani, 1987) and for avoiding For each dataset we analyzed, thetrait and methodmodel

(M4) provides the best fit of data to theory, thereby supporting Heywood cases. The results of variance partitioning are de-tailed in Table 2.

H1. Specifically, different model comparisons yield significant values of thex2 _{difference test for the various nested model}

Variance Partitioning Results

comparisons. For example, in the Saylor, Finch, Baskin, Furey,

and Kelly (1984) study, each model comparison is significant For the Saylor, Finch, Baskin, Furey, and Kelly (1984) study, (M2–M1: Dx2 ₅ _156, _df ₅ _9, _p_, _{.0001; M4–M3:}_Dx2 ₅

estimates of trait variance vary from a low of 7% to a high of 133,df5 14, p, .0001; M3–M1:Dx2 ₅ _95,_df ₅_9,_p _,

(9)

Figure 5. CFA model of MTMM data in Marsh and Gouvernet (1991).j1,j2,j3,j45trait factors;j5,j65method factors;l 5factor loading;φ5factor correlation; x5measured variable;d 5random error.

of 21% to a high of 77%, and both the method (17–50%) in health-care research seem to perform more poorly than their social science counterparts in measuring trait variance. and error variance (1–34%) estimates seem moderately high.

For the Singh (1991) study, estimates of trait variances vary Specifically, although the typical health-care measure in this study contains an average of only 35.6% trait variance, Cote from a low of 2% to a high of 74%, and the method (25–64%)

and error variance (2–53%) values seem to be high. Across and Buckley (1987) reported that, on average, measures in the social sciences contained 41.7% trait variance.

all studies (including two published studies that were not

reanalyzed), the mean values of trait (35.6%), method (30%), We can only speculate on some of the possible reasons for the presence of high measurement error in health-care and random error (34.4%) variances suggest the widespread

prevalence of measurement error in health-care measures. We research. First, because some subdisciplines within health-discuss these results in greater detail in the following sections. care research (e.g., patient satisfaction, TQM) are relatively new, a number of new instruments are being developed by researchers to understand phenomena without paying

ade-Conclusions

_{quate attention to the construct validity of these instruments.}

Consequently, measures have not been successively refined This paper has two main objectives: (1) to estimate the amount

and validated. For example, many studies in the Journal of of measurement error in health care research; and (2) to

Clinical Psychology represent attempts at developing new

illustrate the use of the CFA technique as a viable method

scales. Such an emphasis on new scales is understandable for modeling measurement error in health-care research. We

given the relative “youth” of the discipline where new concepts hoped to estimate empirically the degree of measurement error

and scales have to be developed for tackling the impact of in health-care measures so that researchers would be sensitized

changing social mores on everyday life (e.g., industrialization to the interaction between construct validity and error variance.

on depression; alternative life styles on AIDS). Although the The results of our reanalysis underline the presence of

direction of research effort toward the development of new high levels of measurement error in health-care research. In

(10)

202

Figure 6. CFA model for MTMM data in Singh (1991).j1,j2,j35trait factors;j4,j55method factors;l 5factor loading;φ5factor correlation; x5measured variable;d 5random error.

attention to refining existing scales. To this extent, scale purifi- undertaken in line with guidelines established by psychomet-ric theory (Nunnally, 1978). Several excellent illustrations of cation efforts may be greatly aided by placing a relatively

greater emphasis on replication studies (Hubbard and Arm- the use of proper procedures for developing better measures of latent constructs exist in the literature (Churchill, 1979; strong, 1994).

Another reason for the presence of high amounts of mea- Gerbing and Anderson, 1988; Kim and Lee, 1997; Singh and Rhoads, 1991; Suen 1990). Second, researchers should surement error seems to be the dominance of a paradigm that

has historically emphasized the assessment of technical aspects undertake more replication studies in the future (Hubbard and Armstrong, 1994). Specifically, researchers may use existing of health-care quality. For example, a historical quality

mea-scales as a basis for replication and consequent purification sure for nursing homes may involve computing composite

of measures. Finally, in undertaking statistical analyses, re-scores on such tangible and objective entities as room odor,

searchers may make use of such techniques as covariance rodent control, and soiled linen handling (Gustafson, Sainfort,

structure modeling, which allows the decomposition and esti-Van Konigsveld, and Zimmerman, 1990). In contrast to the

mation of total variance in measures into its various error use of such direct measures, today’s health-care quality

mea-components. Covariance structure modeling is easily imple-sures are often more perceptual in nature (e.g., patient

satisfac-mented by using such a technique as EQS (Bentler 1992). tion). Consequently, while making a transition from the use of

Although covariance structure models do not eliminate mea-direct measures to more perceptual measures, construct

valida-surement error in measures, they provide less biased estimates tion efforts have lagged behind.

of true relationships among various theoretical constructs. How should health-care researchers improve upon the

rela-tively poor quality of measures? We suggest a multipronged

approach to this problem. First, irrespective of the chronologi-

_Limitations

cal stage of the discipline, health-care researchers should pay

(11)

Table 1. Statistics for Nested Models and Model Differences

Study x2(df) p CFIa

Saylor, Finch, Baskin, Furey, and Kelly (1984)

Null model x2₍₂₈₎₅₂₃₃ _,_.001

Trait only x2₍₁₉₎₅₇₇ _,_.001 _0.668

Method only x2₍₁₉₎₅₁₃₈ _,_.001 _0.408

Trait and method x2₍₅₎ ₅₅ ₅_0.6 _0.978

Null-trait only x2₍₉₎ ₅₁₅₆ _,_.001

Trait and Method-method only x2₍₁₄₎₅₁₃₃ _,_.001

Method only-null x2₍₉₎ ₅₉₅ _,_.01

Trait and method-trait only x2₍₁₄₎₅₇₂ _,_.01

Wolfe, Gentile, Michienzi, Sas, and Wolfe (1991)

Null model x2₍₂₈₎₅₁₁₁ _,_.001

Trait and method-method only x2₍₁₄₎₅₁₅₃₈ _,_.001

Method only-null x2₍₇₎ ₅₆₇ _,_.01

Trait and method-trait only x2₍₉₎ ₅₃₈₃ _,_.01

All models were estimated by the EQS software package.

a_{CFI per Bentler (1990).}

Bentler, P. M.: Comparative Fit Indexes in Structural Models.

Psycho-matrices. As a result, our findings may not be sufficiently

logical Bulletin107 (1990): 238–246.

generalizable. However, as Bagozzi and Yi (1991) note, there

Bentler, P. M.: EQS: Structural Equations Program Manual. BMDP

is a paucity of published MTMM matrices in many disciplines.

Statistical Software, Los Angeles, CA. 1992.

To the extent that many studies have not measured traits

Blalock, Hubert M.:Conceptualization and Measurement in the Social

using different methods, we suspect that the overall picture

Sciences.Sage, Thousand Oaks, CA. 1982.

in terms of measurement error may be even worse. However,

Browne, Michael W.: Relationship Between an Addictive Model and

the results of our analyses are in line with extant literature on

a Multiplicative Model for Multitrait Multimethod Matrices, in

measurement error and validity in the social sciences (Blalock,

Multiway Data Analysis. R. Coppi and S. Bolaco, eds.,

North-1982; Venkatraman and Grant, 1986), and may serve as a _{Holland, Amsterdam. 1989, pp. 507–520.}

starting point for calling more attention to construct validity _{Byrne, Barbara:}_{Structural Equation Modeling with EQS and}

EQS/Win-issues in health-care research. _dows_{. Sage, Thousand Oaks, CA. 1994.}

Campbell, Donald T., and Fiske, Donald W.: Convergent and Dis-criminant Validation by the Multitrait–Multimethod Matrix.

Psy-References

chological Bulletin56 (March 1959): 81–105. Bagozzi, Richard P., and Yi, Youjae: Assessing Method Variance in

Casalou, Robert F.: Total Quality Management in Health Care.

Hospi-Multitrait–Multimethod Matrices: The Case of Self-Reported

Af-tal and Health Services Administration36(1) (Spring 1991): 134– fect and Perceptions at Work.Journal of Applied Psychology 75

146. (1990): 547–560.

Churchill, Gilbert A., Jr.: A Paradigm for Developing Better Measures Bagozzi, Richard P., and Yi, Youjae: Multitrait–Multimethod Matrices

of Marketing Construct.Journal of Marketing Research16 (Febru-in Consumer Research.Journal of Consumer Research17 (March

ary 1979): 64–72. 1991): 426–439.

Cole, D. A., Howard, G. S., and Maxwell, S. E.: Effects of Mono Bagozzi, Richard P., Yi, Youjae, and Phillips, Lynn W.: Assessing

versus Multiple Operationalization in Construct Validation Ef-Construct Validity in Organizational Research.Administrative

Sci-forts.Journal of Consulting and Clinical Psychology49 (1981): 395–

ence Quarterly36 (September 1991): 421–458.

405. Beck, A. T.:Depression: Clinical, Experimental, and Theoretical Aspects.

(12)

204

Table 2. Variance Partitioning Results and Error Variance: Generalizing Across 70 Construct Validation Studies.Journal of Marketing Research24 (August 1987): 315–318.

Variance Partitioning (%)

Study _{Cote, Joseph A., and Buckley, M. Ronald: Measurement Error and} (Year) Items Trait Method Error _{Theory Testing in Consumer Research: An Illustration of the}

Importance of Construct Validation.Journal of Consumer Research

Saylor, Finch, Baskin, Furey, and Kelly _{14 (March 1988): 579–582.} (1984)

Cowen, E., Door, D., Clarfield, S., and Kreling, B.: The AMLIA

Quick-CIA 7 41 52

Screening Device for Early Identification of School Maladaptation.

CDI 15 29 56

American Journal of Community Psychology1 (1973): 12–35.

PNIA 54 2 44

Dillon, William R., Kumar, A., and Mulani, N.: Offending Estimates

PNID 24 76 ——a

in Covariance Structure Analysis: Comments on the Causes of

TNIA 47 0.5 52.5

and Solutions to Heywood Cases.Psychological Bulletin 101(1)

TNID 27 73 ——a

(1987): 126–135.

AML-A 55 45 ——a

AML-M 24 10 66 _{Ferketich, Sandra L., Figuerdo, Aurelio J., and Knapp, Thomas R.:} Marsh and The Multitrait–Multimethod Approach to Construct Validity.

Re-Gouvernet search in Nursing and Health14 (August 1991): 315–320. (1989) PHYSICAL 1 64 20 16 _{Figuerdo, Aurelio J., Ferketich, Sandra L., and Knapp, Thomas R.:}

SOCIAL 1 49 20 31 _{More on MTMM: The Role of Confirmatory Factory Analysis.} GENERAL 1 21 50 29 _{Research in Nursing and Health}_{14 (October 1991): 387–391.}

COGNITIVE 1 65 34 1

Finch, A. J., and Eastman, E. S.: A Multimethod Approach to

Measur-PHYSICAL 2 49 17 34

ing Anger in Children.Journal of Psychology115 (1983): 55–60.

SOCIAL 2 64 26 10

GENERAL 2 77 22 1 Fiske, Donald W., and Campbell, Donald T.: Citations Do Not Solve COGNITIVE 2 21 50 29 Problems.Psychological Bulletin112(3) (1992): 393–395. Singh _{Fontaine, D. K.: Measurement of Nocturnal Sleep Patterns in Trauma}

(1991) SAT 1 49 49 2 _Patients._{Heart and Lung}_{18 (1989): 402–410.}

SAT 2 2 49 49

Gerbing, David W., and Anderson, James C.: An Updated Paradigm

SAT 3 38 25 37

for Scale Development Incorporating Unidemensionality and its

SAT 4 74 49 ——a

Assessment.Journal of Marketing Research25 (May 1988): 186–

SAT 5 22 25 53

192.

SAT 6 36 25 39

SAT 7 41 64 ——a _{Greenbaum, Paul E., Dedrick, Robert F., Prange, Mark E., and}

Fried-SAT 8 49 49 2 man, Robert M.: Parent, Teacher, and Child Ratings of Problem SAT 9 64 25 11 Behaviors of Youngsters With Serious Emotional Disturbances. Wolfe et al. Psychological Assessment6(2) (1994): 141–148.

(1991) PTSD 1 46 52 2 _{Gulliksen, H.:}_{Theory of Mental Tests}_{, Wiley, New York. 1950.}

SOCRE 1 0.1 79 20.9

Gustafson, David H., Sainfort, F. C., Van Konigsveld, Richard, and

ATTRIB 1 31 68 1

Zimmerman, David R.: The Quality Assessment Index (QAI) for

EROTIC 1 13 12 75

Measuring Nursing Home Quality.Health Services Research25(1)

PTSD 2 8 11 81

(1990): 97–127.

SOCRE 2 85 15 ——a

ATTRIB 2 2 28 70 Hubbard R., and Armstrong, J. S.: Replications and Extensions in EROTIC 2 7 7 76 Marketing: Rarely Published but Quite Contrary. International

Cole et al. Journal of Research in Marketing11 (1994): 233–248.

(1981) SELFR 1 44 44 12 _{Joresko¨g, Karl G., and Sorbo¨m, Dag:}_{Lisrel 7: User’s Reference Guide.} SIGOT 1 38 41 21 _{Indiana Scientific Software. 1989.}

ROLEP 1 24 22 54

Kim, Chankon, and Lee, Hanjon: Development of Family Triadic

INVIV 1 10 4 86

Measures for Children’s Purchase Influence.Journal of Marketing

SELFR 2 28 55 17

Research35 (August 1997): 307–321.

SIGOT 2 45 41 14

ROLEP 2 18 22 60 Kovacs, M.: Rating Scales To Assess Depression in School-Age Chil-INVIV 2 24 4 72 dren.Acta Paedopsychiatry23 (1981): 437–457.

Fontaine,b

Lammers, John C., Cretin, Shan, Gilman, Stuart, and Calingo, E.: (1989) Overall 22 4 74 _{Total Quality Management in Hospitals: The Contributions of} Over-allc

Commitment, Quality Councils, Teams, Budgets, and Training estimates 35.6 30 34.4 _{to Perceived Improvement at Veterans Health Administration}

Hos-pitals.Medical Care34(5) (1996): 463–478.

a_{Value of this estimate was fixed at zero for identification purposes.}

Lefkowitz, M. M., and Tesiny, E. P.: Assessment of Childhood

De-b_{Over-all value estimated by ANOVA by Rowe and Ryan-Wenger (1992).}

pression.Journal of Consulting and Clinical Psychology48 (1980):

c_{Fontaine (1989) excluded.}

43-50.

Lemke, S., and Moos, R.: Quality of Residential Settings for Elderly Adults.Journal of Gerontology4(2) (1986): 268–276.

(13)

and Fiske: Assessment of Convergent and Discriminant Validity. Sidani, Souraya, and Jones, Elaine: Use of Multrait Multimethod (MTMM) to Analyze Family Relational Data. Western Journal of Research in Nursing and Health15 (1992): 67–75.

Nursing Research17(5) (1995): 556–570. Marsh, Herbert W.: Confirmatory Factor Analyses of Multitrait–

Singh, Jagdip: Understanding the Structure of Consumers’ Satisfaction Multimethod Data: Many Problems and a Few Solutions.Applied

Evaluations of Service Delivery.Journal of the Academy of Marketing Psychometric Measurement13(4) (December 1989): 335–361.

Science19(3) (1991): 223–244. Marsh, Herbert W., and Hocevar, Dennis: A New More Powerful

Singh, Jagdip, and Rhoads, Gary K.: Boundary Role Ambiguity in Approach to Multitrait–Multimethod Analyses: Application of

Sec-Marketing Oriented Positions: A Multidimensional, Multifaceted, ond-Order Confirmatory Factor Analysis.Journal of Applied

Psychol-Operationalization.Journal of Marketing Research28 (August 1991):

ogy73(1) (1988): 107–117.

328–338. Marsh, Herbert W., and Gouvernet, Paul J.: Multidimensional

Self-Suen, Hoi K.:Principles of Test Theories.Lawrence Erlbaum, Hillsdale, Concepts and Perceptions of Control: Construct Validation of

Re-NJ. 1990. sponses by Children.Journal of Educational Psychology81(1) (1989):

Turner, Carol J.:The Multitrait–Multimethod Matrix: 1967–1980.JSAS

57–69.

Catalog of Selected Documents in PsychologyMs. 2280. 1981. Nelson, W. M., and Finch, A. J.: The Children’s Inventory of Anger

Venkatraman, N., and Grant, John H.: Construct Measurement in (CIA). Unpublished manuscript, Medical University of South

Caro-Organizational Strategy Research: A Critique and Proposal.Academy

lina. 1978.

of Management Review11(1) (1986): 71–87. Nunnally, Jum C.:Psychometric Theory.McGraw-Hill, New York. 1978.

Waltz, C. F., Strickland, O. L., and Lenz, E. R.:Measurement in Nursing

Phillips, L. R., Rempusheski, V. F., and Morrison, E.: Developing and Research.Davis, Philadelphia. 1984.

Testing the Beliefs about Caregiving Scale.Research in Nursing and _{Widaman, Keith F.: Hierarchically Nested Covariance Structure Models} Health12 (1989): 207–220. _{for Multitrait–Multimethod Data.}_{Applied Psychological Measurement}

9 (March 1985): 1–26. Saylor, Conway Fleming, Finch, A. J. Jr., Baskin, Cathy Haas, Furey,

William, and Kelly, Mary Margaret: Construct Validity for Measures _{Williams, L. J., Cote, J. R., and Buckley, M. R.: Lack of Method Variance} of Childhood Depression: Application of Multitrait–Multimethod _{in Self-Reported Affect and Perceptions at Work: Reality or Artifact?} Methodology.Journal of Consulting and Clinical Psychology52(6) _{Journal of Applied Psychology}_{74 (1989): 462–468.}

(1984): 977–985. _{Wolfe, Vicky Veitch, Gentile, Carole, Michienzi, Teresa, Sas, Louise,} Schmitt, Neal, and Stutts, Daniel M.: Methodology Review: Analysis of and Wolfe, David A.: The Children’s Impact of Traumatic Events Multitrait–Multimethod Matrices.Applied Psychological Measurement Scale: A Measure of Post-Sexual-Abuse PTSD Symptoms.Behavioral