
5 Validity and reliability


• systemic validity;

• catalytic validity;

• ecological validity;

• cultural validity;

• descriptive validity;

• interpretive validity;

• theoretical validity;

• evaluative validity.

It is not our intention in this chapter to discuss all of these terms in depth. Rather, the main types of validity will be addressed. The argument will be made that, whilst some of these terms are more comfortably the preserve of quantitative methodologies, this is not exclusively the case.

Indeed, validity is the touchstone of all types of educational research. That said, it is important that validity in different research traditions is faithful to those traditions; it would be absurd to declare a piece of research invalid if it were not striving to meet certain kinds of validity, e.g. generalizability, replicability, controllability.

Hence the researcher will need to locate her discussions of validity within the research paradigm that is being used. This is not to suggest, however, that research should be paradigm-bound: that is a recipe for stagnation and conservatism.

Nevertheless, validity must be faithful to its premises and positivist research has to be faithful to positivist principles, e.g.:

• controllability;

• replicability;

• predictability;

• the derivation of laws and universal statements of behaviour;

• context-freedom;

• fragmentation and atomization of research;

• randomization of samples;

• observability.

By way of contrast, naturalistic research has several principles (Lincoln and Guba, 1985; Bogdan and Biklen, 1992):

• the natural setting is the principal source of data;

• context-boundedness and ‘thick description’;

• data are socially situated, and socially and culturally saturated;

• the researcher is part of the researched world;

• as we live in an already interpreted world, a doubly hermeneutic exercise (Giddens, 1979) is necessary to understand others’ understandings of the world; the paradox here is that the only instrument sufficiently complex to understand human life is another human (Lave and Kvale, 1995:220), but that this risks human error in all its forms;

• holism in the research;

• the researcher—rather than a research tool—is the key instrument of research;

• the data are descriptive;

• there is a concern for processes rather than simply for outcomes;

• data are analysed inductively rather than using a priori categories;

• data are presented in terms of the respondents rather than researchers;

• seeing and reporting the situation through the eyes of participants—from the native’s point of view (Geertz, 1974);

• respondent validation is important;

• catching meaning and intention are essential.

Indeed Maxwell (1992) argues that qualitative researchers need to be cautious not to be working within the agenda of the positivists in arguing for the need for research to demonstrate concurrent, predictive, convergent, criterion-related, internal and external validity. The discussion below indicates that this need not be so. He argues, with Guba and Lincoln (1989), for the need to replace positivist notions of validity in qualitative research with the notion of authenticity.

Maxwell, echoing Mishler (1990), suggests that ‘understanding’ is a more suitable term than ‘validity’ in qualitative research. We, as researchers, are part of the world that we are researching, and we cannot be completely objective about that, hence other people’s perspectives are equally as valid as our own, and the task of research is to uncover these. Validity, then, attaches to accounts, not to data or methods (Hammersley and Atkinson, 1983); it is the meaning that subjects give to data and inferences drawn from the data that are important. ‘Fidelity’ (Blumenfeld-Jones, 1995) requires the researcher to be as honest as possible to the self-reporting of the researched.

The claim is made (Agar, 1993) that, in qualitative data collection, the intensive personal involvement and in-depth responses of individuals secure a sufficient level of validity and reliability. This claim is contested by Hammersley (1992:144) and Silverman (1993:153), who argue that these are insufficient grounds for validity and reliability, and that the individuals concerned have no privileged position on interpretation. (Of course, neither are actors ‘cultural dopes’ who need a sociologist or researcher to tell them what is ‘really’ happening!) Silverman argues that, whilst immediacy and authenticity make for interesting journalism, ethnography must have more rigorous notions of validity and reliability. This involves moving beyond selecting data simply to fit a preconceived or ideal conception of the phenomenon or because they are spectacularly interesting (Fielding and Fielding, 1986). Data selected must be representative of the sample, the whole data set, the field, i.e. they must address content, construct and concurrent validity.

Hammersley (1992:50–1) suggests that validity in qualitative research replaces certainty with confidence in our results, and that, as reality is independent of the claims made for it by researchers, our accounts will only be representations of that reality rather than reproductions of it.

Maxwell (1992) argues for five kinds of validity in qualitative methods that explore his notion of ‘understanding’:

• descriptive validity (the factual accuracy of the account, that it is not made up, selective, or distorted); in this respect validity subsumes reliability; it is akin to Blumenfeld-Jones’s (1995) notion of ‘truth’ in research—what actually happened (objectively factual);

• interpretive validity (the ability of the research to catch the meaning, interpretations, terms, intentions that situations and events, i.e. data, have for the participants/subjects themselves, in their terms); it is akin to Blumenfeld-Jones’s (1995) notion of ‘fidelity’—what it means to the researched person or group (subjectively meaningful); interpretive validity has no clear counterpart in experimental/positivist methodologies;

• theoretical validity (the theoretical constructions that the researcher brings to the research (including those of the researched)); theory here is regarded as explanation. Theoretical validity is the extent to which the research explains phenomena; in this respect it is akin to construct validity (discussed below); in theoretical validity the constructs are those of all the participants;

• generalizability (the view that the theory generated may be useful in understanding other similar situations); generalizing here refers to generalizing within specific groups or communities, situations or circumstances validly and, beyond, to specific outsider communities, situations or circumstances (external validity); internal validity has greater significance here than external validity;

• evaluative validity (the application of an evaluative framework, judgemental of that which is being researched, rather than a descriptive, explanatory or interpretive one). Clearly this resonates with critical-theoretical perspectives, in that the researchers’ own evaluative agenda might intrude.

Both qualitative and quantitative methods can address internal and external validity.

Internal validity

Internal validity seeks to demonstrate that the explanation of a particular event, issue or set of data which a piece of research provides can actually be sustained by the data. In some degree this concerns accuracy, which can be applied to quantitative and qualitative research. The findings must accurately describe the phenomena being researched.

This chapter sets out the conventional notions of validity as derived from quantitative methodologies. However, in ethnographic research internal validity can be addressed in several ways (LeCompte and Preissle, 1993:338):

• using low-inference descriptors;

• using multiple researchers;

• using participant researchers;

• using peer examination of data;

• using mechanical means to record, store and retrieve data.

In ethnographic, qualitative research there are several overriding kinds of internal validity (LeCompte and Preissle, 1993:323–4):

• confidence in the data;

• the authenticity of the data (the ability of the research to report a situation through the eyes of the participants);

• the cogency of the data;

• the soundness of the research design;

• the credibility of the data;

• the auditability of the data;

• the dependability of the data;

• the confirmability of the data.

The writers provide greater detail on the issue of authenticity, arguing for the following:

• fairness (that there should be a complete and balanced representation of the multiple realities in and constructions of a situation);

• ontological authenticity (the research should provide a fresh and more sophisticated understanding of a situation, e.g. making the familiar strange, a significant feature in reducing ‘cultural blindness’ in a researcher, a problem which might be encountered in moving from being a participant to being an observer (Brock-Utne, 1996:610));

• educative authenticity (the research should generate a new appreciation of these understandings);

• catalytic authenticity (the research gives rise to specific courses of action);

• tactical authenticity (the research should benefit all those involved—the ethical issue of ‘beneficence’).

Hammersley (1992:71) suggests that internal validity for qualitative data requires attention to:

• plausibility and credibility;

• the kinds and amounts of evidence required (such that the greater the claim that is being made, the more convincing the evidence has to be for that claim);

• clarity on the kinds of claim made from the research (e.g. definitional, descriptive, explanatory, theory generative).

Lincoln and Guba (1985:219, 301) suggest that credibility in naturalistic inquiry can be addressed by:

• prolonged engagement in the field;

• persistent observation (in order to establish the relevance of the characteristics for the focus);

• triangulation (of methods, sources, investigators and theories);

• peer debriefing (exposing oneself to a disinterested peer in a manner akin to cross-examination, in order to test honesty, working hypotheses and to identify the next steps in the research);

• negative case analysis (in order to establish a theory that fits every case, revising hypotheses retrospectively);

• member checking (respondent validation), to assess intentionality, to correct factual errors, to offer respondents the opportunity to add further information or to put information on record, to provide summaries and to check the adequacy of the analysis.

Whereas in positivist research history and maturation are viewed as threats to the validity of the research, ethnographic research simply assumes that this will happen; ethnographic research allows for change over time—it builds it in. Internal validity in ethnographic research is also addressed by the reduction of observer effects by having the observers sample both widely and stay in the situation long enough for their presence to be taken for granted. Further, by tracking and storing information, it is possible for the ethnographer to eliminate rival explanations of events and situations.


External validity

External validity refers to the degree to which the results can be generalized to the wider population, cases or situations. The issue of generalization is problematical. For positivist researchers generalizability is a sine qua non, whilst this is attenuated in naturalistic research. For one school of thought, generalizability through stripping out contextual variables is fundamental, whilst, for another, generalizations that say little about the context have little that is useful to say about human behaviour (Schofield, 1993). For positivists variables have to be isolated and controlled, and samples randomized, whilst for ethnographers human behaviour is infinitely complex, irreducible, socially situated and unique.

Generalizability in naturalistic research is interpreted as comparability and transferability (Lincoln and Guba, 1985; Eisenhart and Howe, 1992:647). These writers suggest that it is possible to assess the typicality of a situation—the participants and settings, to identify possible comparison groups, and to indicate how data might translate into different settings and cultures (see also LeCompte and Preissle, 1993:348). Schofield (1992:200) suggests that it is important in qualitative research to provide a clear, detailed and in-depth description so that others can decide the extent to which findings from one piece of research are generalizable to another situation, i.e. to address the twin issues of comparability and translatability. Indeed, qualitative research can be generalizable, the paper argues (p. 209), by studying the typical (for its applicability to other situations—the issue of transferability (LeCompte and Preissle, 1993:324)) and by performing multi-site studies (e.g. Miles and Huberman, 1984), though it could be argued that this is injecting (or infecting!) a degree of positivism into non-positivist research. Lincoln and Guba (1985:316) caution the naturalistic researcher against this; they argue that it is not the researcher’s task to provide an index of transferability; rather, they suggest, researchers should provide sufficiently rich data for the readers and users of research to determine whether transferability is possible. In this respect transferability requires thick description.

Bogdan and Biklen (1992:45) argue that generalizability, construed differently from its usage in positivist methodologies, can be addressed in qualitative research. Positivist researchers, they argue, are more concerned to derive universal statements of general social processes than to provide accounts of the degree of commonality between various social settings (e.g. schools and classrooms). Bogdan and Biklen are interested not in whether their findings are generalizable in the widest sense but in the settings, people and situations to which they might be generalizable.

In naturalistic research threats to external validity include (Lincoln and Guba, 1985:189, 300):

• selection effects (where constructs selected in fact are only relevant to a certain group);

• setting effects (where the results are largely a function of their context);

• history effects (where the situations have been arrived at by unique circumstances and, therefore, are not comparable);

• construct effects (where the constructs being used are peculiar to a certain group).

Content validity

To demonstrate this form of validity the instrument must show that it fairly and comprehensively covers the domain or items that it purports to cover. It is unlikely that each issue will be able to be addressed in its entirety simply because of the time available or respondents’ motivation to complete, for example, a long questionnaire. If this is the case, then the researcher must ensure that the elements of the main issue to be covered in the research are both a fair representation of the wider issue under investigation (and its weighting) and that the elements chosen for the research sample are themselves addressed in depth and breadth. Careful sampling of items is required to ensure their representativeness. For example, if the researcher wished to see how well a group of students could spell 1,000 words in French but decided only to have a sample of fifty words for the spelling test, then that test would have to ensure that it represented the range of spellings in the 1,000 words—maybe by ensuring that the spelling rules had all been included or that possible spelling errors had been covered in the test in the proportions in which they occurred in the 1,000 words.
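
To make the arithmetic of such proportional item sampling concrete, a minimal sketch in Python follows. It is an illustration only, under invented assumptions: the rule categories, the example words and the helper proportional_item_sample are hypothetical and are not drawn from the passage above.

    import random

    def proportional_item_sample(words_by_rule, sample_size=50, seed=1):
        """Draw test items so that each spelling-rule category appears in
        roughly the same proportion as in the full word list."""
        rng = random.Random(seed)
        total = sum(len(words) for words in words_by_rule.values())
        sample = []
        for rule, words in words_by_rule.items():
            # Number of items this category should contribute, by proportion.
            k = round(sample_size * len(words) / total)
            sample.extend(rng.sample(words, min(k, len(words))))
        return sample[:sample_size]

    # Invented categories standing in for the 1,000-word French list.
    words_by_rule = {
        "silent final consonant": ["trois", "blanc", "petit", "grand"],
        "accented vowels": ["été", "élève", "père", "déjà"],
        "double consonants": ["comme", "belle", "terre", "attendre"],
    }
    test_items = proportional_item_sample(words_by_rule, sample_size=6)

The same logic could weight categories by how often each spelling error occurs, rather than by the number of words in each category, as the passage suggests.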

Construct validity

A construct is an abstract; this separates it from the previous types of validity which dealt in actualities—defined content. In this type of validity agreement is sought on the ‘operationalized’ forms of a construct, clarifying what we mean when we use this construct. Hence in this form of validity the articulation of the construct is important; is my understanding of this construct similar to that which is generally accepted to be the construct? For example, let us say that I wished to assess a child’s intelligence (assuming, for the sake of this example, that it is a unitary quality). I could say that I construed intelligence to be demonstrated in the ability to sharpen a pencil. How acceptable a construction of intelligence is this? Is not intelligence something else (e.g. that which is demonstrated by a high result in an intelligence test)?

To establish construct validity I would need to be assured that my construction of a particular issue agreed with other constructions of the same underlying issue, e.g. intelligence, creativity, anxiety, motivation. This can be achieved through correlations with other measures of the issue or by rooting my construction in a wide literature search which teases out the meaning of a particular construct (i.e. a theory of what that construct is) and its constituent elements.

Demonstrating construct validity means not only confirming the construction with that given in relevant literature, but looking for counter-examples which might falsify my construction.

When I have balanced confirming and refuting evidence I am in a position to demonstrate construct validity. I am then in a position to stipulate what I take this construct to be. In the case of conflicting interpretations of a construct, I might have to acknowledge that conflict and then stipulate the interpretation that I shall use.

In qualitative/ethnographic research construct validity must demonstrate that the categories that the researchers are using are meaningful to the participants themselves (Eisenhart and Howe, 1992:648), i.e. that they reflect the way in which the participants actually experience and construe the situations in the research; that they see the situation through the actors’ eyes.

Campbell and Fiske (1959) and Brock-Utne (1996) suggest that convergent validity implies that different methods for researching the same construct should give a relatively high inter-correlation, whilst discriminant validity suggests that using similar methods for researching different constructs should yield relatively low inter-correlations.
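
As a minimal sketch of this pattern (the scores below are invented purely for illustration and come from none of the studies cited), the correlation between two different methods measuring the same construct should come out markedly higher than the correlation between measures of different constructs:

    from statistics import correlation  # Pearson's r; available from Python 3.10

    # Invented scores for ten students; not real data.
    anxiety_questionnaire = [12, 15, 9, 20, 14, 11, 18, 16, 10, 13]
    anxiety_observation = [11, 16, 8, 19, 15, 10, 17, 15, 9, 14]   # same construct, different method
    motivation_scale = [24, 19, 27, 25, 18, 30, 22, 26, 21, 28]    # different construct

    convergent = correlation(anxiety_questionnaire, anxiety_observation)
    discriminant = correlation(anxiety_questionnaire, motivation_scale)
    print(f"convergent r = {convergent:.2f}, discriminant r = {discriminant:.2f}")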

Ecological validity

In quantitative, positivist research variables are frequently isolated, controlled and manipulated in contrived settings. For qualitative, naturalistic research a fundamental premise is that the researcher deliberately does not try to manipulate variables or conditions, that the situations in the research occur naturally. The intention here is to give accurate portrayals of the realities of social situations in their own terms, in their natural or conventional settings. In education, ecological validity is particularly important and useful in charting how policies are actually happening ‘at the chalk face’ (Brock-Utne, 1996:617). For ecological validity to be demonstrated it is important to include and address in the research as many characteristics in, and factors of, a given situation as possible. The difficulty for this is that the more characteristics are included and described, the more difficult it is to abide by central ethical tenets of much research—non-traceability, anonymity and non-identifiability.


A related type of validity is the emerging notion of cultural validity (Morgan, 1999). This is particularly an issue in cross-cultural, inter-cultural and comparative kinds of research, where the intention is to shape research so that it is appropriate to the culture of the researched.

Cultural validity, Morgan (1999) suggests, applies at all stages of the research, and affects its planning, implementation and dissemination. It involves a degree of sensitivity to the participants, cultures and circumstances being studied.

Catalytic validity

Catalytic validity embraces the paradigm of critical theory discussed in Chapter 1. Put neutrally, catalytic validity simply strives to ensure that research leads to action. However, the story does not end there, for discussions of catalytic validity are substantive; like critical theory, catalytic validity suggests an agenda. Lather (1986, 1991) and Kincheloe and McLaren (1994) suggest that the agenda for catalytic validity is to help participants to understand their worlds in order to transform them. The agenda is explicitly political, for catalytic validity suggests the need to expose whose definitions of the situation are operating in the situation. Lincoln and Guba (1986) suggest that the criterion of ‘fairness’ should be applied to research, meaning that it should (a) augment and improve the participants’ experience of the world, and (b) improve the empowerment of the participants. In this respect the research might focus on what might be (the leading edge of innovations and future trends) and what could be (the ideal, possible futures) (Schofield, 1992:209).

Catalytic validity—a major feature in feminist research which, Usher (1996) suggests, needs to permeate all research—requires solidarity in the participants, an ability of the research to promote emancipation, autonomy and freedom within a just, egalitarian and democratic society (Masschelein, 1991), to reveal the distortions, ideological deformations and limitations that reside in research, communication and social structures (see also LeCompte and Preissle, 1993). Validity, it is argued (Mishler, 1990; Scheurich, 1996), is no longer an ahistorical given, but contestable, suggesting that the definitions of valid research reside in the academic communities of the powerful. Lather (1986) calls for research to be emancipatory and to empower those who are being researched, suggesting that catalytic validity, akin to Freire’s notion of ‘conscientization’, should empower participants to understand and transform their oppressed situation.

Validity, it is proposed (Scheurich, 1996), is but a mask that in fact polices and sets boundaries to what is considered to be acceptable research by powerful research communities; discourses of validity, in fact, are discourses of power to define worthwhile knowledge. Valid research, if it is to meet the demands of catalytic validity, must demonstrate its ability to empower the researched as well as the researchers.

How defensible it is to suggest that researchers should have such ideological intents is, perhaps, a moot point, though not to address this area is to perpetuate inequality by omission and neglect. Catalytic validity reasserts the centrality of ethics in the research process, for it requires the researcher to interrogate her allegiances, responsibilities and self-interestedness (Burgess, 1989a).

Criterion-related validity

This form of validity endeavours to relate the results of one particular instrument to another external criterion. Within this type of validity there are two principal forms: predictive validity and concurrent validity.

Predictive validity is achieved if the data acquired at the first round of research correlate highly with data acquired at a future date. For example, if the results of examinations taken by 16-year-olds correlate highly with the examination results gained by the same students when aged 18, then we might wish to say that the first examination demonstrated strong predictive validity.
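
A minimal sketch of that check, using invented scores for illustration only (they are not real examination data): a high correlation between the age-16 and age-18 results of the same students would support a claim of predictive validity.

    from statistics import correlation  # Pearson's r; available from Python 3.10

    # Invented examination scores for the same eight students at ages 16 and 18.
    scores_age_16 = [48, 62, 55, 71, 66, 43, 58, 77]
    scores_age_18 = [52, 65, 51, 74, 70, 40, 61, 80]

    r = correlation(scores_age_16, scores_age_18)
    print(f"correlation between age-16 and age-18 results: r = {r:.2f}")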

A variation on this theme is encountered in the notion of concurrent validity. To demonstrate this form of validity the data gathered from using one instrument must correlate highly with data gathered from using another instrument.

For example, suppose I wished to research a student’s problem-solving ability. I might observe the student working on a problem, or I might talk to the student about how she is tackling the problem, or I might ask the student to write down how she tackled the problem. Here I have three different data-collecting instruments—observation, interview and documentation respectively. If the results all agreed—concurred—that, according to given criteria for problem-solving ability, the student demonstrated a good ability to solve a problem, then I would be able to say with greater confidence (validity) that the student was good at problem-solving than if I had arrived at that judgement simply from using one instrument.

Concurrent validity is very similar to its partner—predictive validity—in its core concept (i.e. agreement with a second measure); what differentiates concurrent and predictive validity is the absence of a time element in the former; concurrence can be demonstrated simultaneously with another instrument.

An important partner to concurrent validity, which is also a bridge into later discussions of reliability, is triangulation.

Triangulation

Triangulation may be defined as the use of two or more methods of data collection in the study of some aspect of human behaviour. It is a technique of research to which many subscribe in principle, but which only a minority use in practice. In its original and literal sense, triangulation is a technique of physical measurement: maritime navigators, military strategists and surveyors, for example, use (or used to use) several locational markers in their endeavours to pinpoint a single spot or objective. By analogy, triangular techniques in the social sciences attempt to map out, or explain more fully, the richness and complexity of human behaviour by studying it from more than one standpoint and, in so doing, by making use of both quantitative and qualitative data. Triangulation is a powerful way of demonstrating concurrent validity, particularly in qualitative research (Campbell and Fiske, 1959).

The advantages of the multimethod approach in social research are manifold and we examine two of them. First, whereas the single observation in fields such as medicine, chemistry and physics normally yields sufficient and unambiguous information on selected phenomena, it provides only a limited view of the complexity of human behaviour and of situations in which human beings interact. It has been observed that as research methods act as filters through which the environment is selectively experienced, they are never atheoretical or neutral in representing the world of experience (Smith, 1975). Exclusive reliance on one method, therefore, may bias or distort the researcher’s picture of the particular slice of reality she is investigating. She needs to be confident that the data generated are not simply artefacts of one specific method of collection (Lin, 1976). And this confidence can only be achieved as far as normative research is concerned when different methods of data collection yield substantially the same results. (Where triangulation is used in interpretive research to investigate different actors’ viewpoints, the same method, e.g. accounts, will naturally produce different sets of data.) Further, the more the methods contrast with each other, the greater the researcher’s confidence. If, for example, the outcomes of a questionnaire survey correspond to those of an observational study of the same phenomena, the researcher will be more confident about the findings. Or, more extreme, where the results of a rigorous experimental investigation are replicated in, say, a role-playing exercise, the researcher will experience even greater assurance. If findings are artefacts of method, then the use of contrasting methods considerably reduces the chances that any consistent findings are attributable to similarities of method (Lin, 1976).

We come now to a second advantage: some theorists have been sharply critical of the limited use to which existing methods of inquiry in the social sciences have been put (Smith, 1975).

The use of triangular techniques, it is argued, will help to overcome the problem of ‘method-boundedness’, as it has been termed. One of the earliest scientists to predict such a condition was Boring, who wrote:

as long as a new construct has only the single operational definition that it received at birth, it is just a construct. When it gets two alternative operational definitions, it is beginning to be validated. When the defining operations, because of proven correlations, are many, then it becomes reified.

(Boring, 1953)

In its use of multiple methods, triangulation may utilize either normative or interpretive techniques; or it may draw on methods from both these approaches and use them in combination.

Types of triangulation and their characteristics

We have just seen how triangulation is characterized by a multi-method approach to a problem in contrast to a single-method approach.

Denzin (1970) has, however, extended this view of triangulation to take in several other types as well as the multi-method kind which he terms ‘methodological triangulation’, including:

• time triangulation (expanded by Kirk and Miller (1986) to include diachronic reliability—stability over time—and synchronic reliability—similarity of data gathered at the same time);

• space triangulation;

• combined levels of triangulation (e.g. individual, group, organization, societal);

• theoretical triangulation (drawing on alternative theories);

• investigator triangulation (more than one observer);

• methodological triangulation (using the same method on different occasions or different methods on the same object of study).

The vast majority of studies in the social sciences are conducted at one point only in time, thereby ignoring the effects of social change and process. Time triangulation goes some way to rectifying these omissions by making use of cross-sectional and longitudinal approaches.

Cross-sectional studies collect data concerned with time-related processes from different groups at one point in time; longitudinal studies collect data from the same group at different points in the time sequence. The use of panel studies and trend studies may also be mentioned in this connection. The former compare the same measurements for the same individuals in a sample at several different points in time; and the latter examine selected processes continually over time. The weaknesses of each of these methods can be offset by using a combined approach to a given problem.

Space triangulation attempts to overcome the limitations of studies conducted within one culture or subculture. As one writer says, ‘Not only are the behavioural sciences culture-bound, they are sub-culture-bound. Yet many such scholarly works are written as if basic principles have been discovered which would hold true as tendencies in any society, anywhere, anytime’ (Smith, 1975).

Cross-cultural studies may involve the testing of theories among different people, as in Piagetian and Freudian psychology; or they may measure differences between populations by using several different measuring instruments. Levine describes how he used this strategy of convergent validation in his comparative studies:

I have studied differences of achievement motivation among three Nigerian ethnic groups by the analysis of dream reports, written expressions of values, and public opinion survey data. The convergence of findings from the diverse set of data (and samples) strengthens my conviction…that the differences among the groups are not artifacts produced by measuring instruments.

(Levine, 1966)
