There are many different types of validity and reliability. Threats to validity and reliability can never be erased completely; rather, the effects of these threats can be attenuated by attention to validity and reliability throughout a piece of research.
This chapter discusses validity and reliability in quantitative and qualitative, naturalistic research.
It suggests that both of these terms can be applied to these two types of research, though how validity and reliability are addressed in these two approaches varies. Finally, validity and reliability are addressed in relation to the different instruments used for data collection. It is suggested that reliability is a necessary but insufficient condition for validity in research; reliability is a necessary precondition of validity, and validity may be a sufficient but not necessary condition for reliability. Brock-Utne (1996: 612) contends that the widely held view that reliability is the sole preserve of quantitative research has to be exploded, and this chapter demonstrates the significance of her view.
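The claim that reliability is necessary but not sufficient for validity can be illustrated with a deliberately simple, hypothetical case: a scale that always reads 2 kg too heavy gives identical readings on every occasion (it is consistent) yet never reports the true weight (it is not valid). The Python sketch below is only an illustration under stated assumptions: "reliability" is taken to mean agreement between two occasions, "validity" to mean closeness to a known true value, and the data and function name are invented.

```python
from statistics import mean

# Hypothetical true weights (kg) of five objects.
true_kg = [10.0, 12.5, 15.0, 17.5, 20.0]

def miscalibrated_scale(true_weight: float) -> float:
    """Hypothetical instrument: perfectly consistent, but always 2 kg too heavy."""
    return true_weight + 2.0

# Weigh every object on two separate occasions.
occasion_1 = [miscalibrated_scale(w) for w in true_kg]
occasion_2 = [miscalibrated_scale(w) for w in true_kg]

# Reliability check: the largest disagreement between the two occasions.
max_retest_gap = max(abs(a - b) for a, b in zip(occasion_1, occasion_2))
# Validity check: the average distance between a reading and the true weight.
mean_error = mean(abs(r - w) for r, w in zip(occasion_1, true_kg))

print(f"test-retest gap: {max_retest_gap} kg")  # test-retest gap: 0.0 kg
print(f"mean error: {mean_error} kg")           # mean error: 2.0 kg
```

The instrument is perfectly reliable by this criterion and still invalid, so reliability alone cannot guarantee validity.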
Defining validity
Validity is an important key to effective research. If a piece of research is invalid then it is worthless. Validity is thus a requirement for both quantitative and qualitative/naturalistic research (see http://www.routledge.com/textbooks/9780415368780 – Chapter 6, file 6.1.ppt).
While earlier versions of validity were based on the view that it was essentially a demonstration that a particular instrument in fact measures what it purports to measure, more recently validity has taken many forms. For example, in qualitative data validity might be addressed through the honesty, depth, richness and scope of the data achieved, the participants approached, the extent of triangulation and the disinterestedness or objectivity of the researcher (Winter 2000). In quantitative data validity might be improved through careful sampling, appropriate instrumentation and appropriate statistical treatments of the data. It is impossible for research to be 100 per cent valid; that is the optimism of perfection. Quantitative research possesses a measure of standard error which is inbuilt and which has to be acknowledged. In qualitative data the subjectivity of respondents, their opinions, attitudes and perspectives together contribute to a degree of bias. Validity, then, should be seen as a matter of degree rather than as an absolute state (Gronlund 1981). Hence at best we strive to minimize invalidity and maximize validity.
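The "inbuilt" standard error of quantitative research is, for a sample mean, conventionally estimated as the sample standard deviation divided by the square root of the sample size (SE = s / √n). A minimal Python sketch under that assumption; the scores and the function name are hypothetical:

```python
import math
import statistics

def standard_error(sample: list[float]) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    if len(sample) < 2:
        raise ValueError("need at least two observations")
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical test scores from eight pupils.
scores = [52, 61, 58, 49, 70, 65, 55, 60]
print(f"mean = {statistics.mean(scores):.2f}, SE = {standard_error(scores):.2f}")
```

Because the error shrinks only with the square root of n, quadrupling the sample roughly halves it but never removes it.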
There are several different kinds of validity (see http://www.routledge.com/textbooks/9780415368780 – Chapter 6, file 6.2.ppt):
O content validity
O criterion-related validity
O construct validity
O internal validity
O external validity
O concurrent validity
O face validity
O jury validity
O predictive validity
O consequential validity
O systemic validity
O catalytic validity
O ecological validity
O cultural validity
O descriptive validity
O interpretive validity
O theoretical validity
O evaluative validity.
It is not our intention in this chapter to discuss all of these terms in depth. Rather, the main types of validity will be addressed. The argument will be made that, while some of these terms are more comfortably the preserve of quantitative methodologies, this is not exclusively the case. Indeed, validity is the touchstone of all types of educational research. That said, it is important that validity in different research traditions is faithful to those traditions; it would be absurd to declare a piece of research invalid if it were not striving to meet certain kinds of validity, e.g. generalizability, replicability and controllability. Hence the researcher will need to locate discussions of validity within the research paradigm that is being used. This is not to suggest, however, that research should be paradigm-bound; that is a recipe for stagnation and conservatism. Nevertheless, validity must be faithful to its premises, and positivist research has to be faithful to positivist principles, for example:
O controllability
O replicability
O predictability
O the derivation of laws and universal statements of behaviour
O context-freedom
O fragmentation and atomization of research
O randomization of samples
O observability.
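The principle of randomization listed above can be sketched in a few lines of Python: draw a simple random sample from a sampling frame, then allocate it at random to two groups. The sampling frame, the group sizes and the `randomize` helper are hypothetical; the sketch relies on the standard library's `random.Random.sample`, which selects without replacement and returns items in selection order.

```python
import random
from typing import Optional

def randomize(population: list[str], k: int,
              seed: Optional[int] = None) -> tuple[list[str], list[str]]:
    """Draw a simple random sample of k people, then split it into two groups.

    random.Random.sample picks without replacement and returns the chosen
    items in selection order, so any slice of the result is itself random.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, k)
    half = k // 2
    return sample[:half], sample[half:]

# Hypothetical sampling frame of 200 pupils.
frame = [f"pupil{i:03d}" for i in range(200)]
experimental, control = randomize(frame, 40, seed=1)
```

Fixing the seed makes the draw reproducible, which supports the companion principle of replicability.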
By way of contrast, naturalistic research has several principles (Lincoln and Guba 1985; Bogdan and Biklen 1992):
O The natural setting is the principal source of data.
O Context-boundedness and ‘thick description’ are important.
O Data are socially situated, and socially and culturally saturated.
O The researcher is part of the researched world.
O As we live in an already interpreted world, a doubly hermeneutic exercise (Giddens 1979) is necessary to understand others’ understandings of the world; the paradox here is that the instrument best suited to understanding the complexity of human life is another human (Lave and Kvale 1995: 220), but that this risks human error in all its forms.
O There should be holism in the research.
O The researcher – rather than a research tool – is the key instrument of research.
O The data are descriptive.
O There is a concern for processes rather than simply with outcomes.
O Data are analysed inductively rather than using a priori categories.
O Data are presented in terms of the respondents rather than researchers.
O Seeing and reporting the situation should be through the eyes of participants – from the native’s point of view (Geertz 1974).
O Respondent validation is important.
O Catching meaning and intention is essential.
Indeed, Maxwell (1992) argues that qualitative researchers need to be careful not to work within the positivists’ agenda when arguing that research must demonstrate concurrent, predictive, convergent, criterion-related, internal and external validity. The discussion below indicates that this need not be so. He argues, with Guba and Lincoln (1989), for the need to replace positivist notions of validity in qualitative research with the notion of authenticity. Maxwell (1992), echoing Mishler (1990), suggests that ‘understanding’ is a more suitable term than ‘validity’ in qualitative research.
We, as researchers, are part of the world that we are researching, and we cannot be completely objective about that; hence other people’s perspectives are as valid as our own, and the task of research is to uncover these. Validity, then, attaches to accounts, not to data or methods (Hammersley and Atkinson 1983); it is the meaning that subjects give to data, and the inferences drawn from the data, that are important. ‘Fidelity’ (Blumenfeld-Jones 1995) requires the researcher to be as honest as possible to the self-reporting of the researched.
The claim is made (Agar 1993) that, in qualitative data collection, the intensive personal involvement and in-depth responses of individuals secure a sufficient level of validity and reliability.
This claim is contested by Hammersley (1992: 144) and Silverman (1993: 153), who argue that these are insufficient grounds for validity and reliability, and that the individuals concerned have no privileged position on interpretation. (Of course, neither are actors ‘cultural dopes’ who need a sociologist or researcher to tell them what is ‘really’ happening!) Silverman (1993) argues that, while immediacy and authenticity make for interesting journalism, ethnography must have more rigorous notions of validity and reliability. This involves moving beyond selecting data simply to fit a preconceived or ideal conception of the phenomenon or because they are spectacularly interesting (Fielding and Fielding 1986). Data selected must be representative of the sample, the whole data set, the field, i.e. they must address content, construct and concurrent validity.
Hammersley (1992: 50–1) suggests that validity in qualitative research replaces certainty with confidence in our results, and that, as reality is independent of the claims made for it by researchers, our accounts will be only representations of that reality rather than reproductions of it.
Maxwell (1992) argues for five kinds of validity in qualitative methods that explore his notion of ‘understanding’:
O Descriptive validity (the factual accuracy of the account, that it is not made up, selective or distorted): in this respect validity subsumes reliability; it is akin to Blumenfeld-Jones’s (1995) notion of ‘truth’ in research – what actually happened (objectively factual).
O Interpretive validity (the ability of the research to catch the meaning, interpretations, terms, intentions that situations and events, i.e. data, have for the participants/subjects themselves, in their terms): it is akin to Blumenfeld-Jones’s (1995) notion of ‘fidelity’ – what it means to the researched person or group (subjectively meaningful); interpretive validity has no clear counterpart in experimental/positivist methodologies.
O Theoretical validity (the theoretical constructions that the researcher brings to the research, including those of the researched): theory here is regarded as explanation. Theoretical validity is the extent to which the research explains phenomena; in this respect it is akin to construct validity (discussed below); in theoretical validity the constructs are those of all the participants.
O Generalizability (the view that the theory generated may be useful in understanding other similar situations): generalizing here refers to generalizing within specific groups or communities, situations or circumstances validly and, beyond, to specific outsider communities, situations or circumstances (external validity); internal validity has greater significance here than external validity.
O Evaluative validity (the application of an evaluative framework, judgemental of that which is being researched, rather than a descriptive, explanatory or interpretive one). Clearly this resonates with critical-theoretical perspectives, in that the researcher’s own evaluative agenda might intrude.
Both qualitative and quantitative methods can address internal and external validity.
Internal validity
Internal validity seeks to demonstrate that the explanation of a particular event, issue or set of data which a piece of research provides can actually be sustained by the data. To some degree this concerns accuracy, which applies to quantitative and qualitative research alike. The findings must describe accurately the phenomena being researched.
In ethnographic research internal validity can be addressed in several ways (LeCompte and Preissle 1993: 338):
O using low-inference descriptors
O using multiple researchers
O using participant researchers
O using peer examination of data
O using mechanical means to record, store and retrieve data.
In ethnographic, qualitative research there are several overriding kinds of internal validity (LeCompte and Preissle 1993: 323–4):
O confidence in the data
O the authenticity of the data (the ability of the research to report a situation through the eyes of the participants)
O the cogency of the data
O the soundness of the research design
O the credibility of the data
O the auditability of the data
O the dependability of the data
O the confirmability of the data.
LeCompte and Preissle (1993) provide greater detail on the issue of authenticity, arguing for the following:
O Fairness: there should be a complete and balanced representation of the multiple realities in, and constructions of, a situation.
O Ontological authenticity: the research should provide a fresh and more sophisticated understanding of a situation, e.g. making the familiar strange, a significant feature in reducing ‘cultural blindness’ in a researcher, a problem which might be encountered in moving from being a participant to being an observer (Brock-Utne 1996: 610).
O Educative authenticity: the research should generate a new appreciation of these understandings.
O Catalytic authenticity: the research gives rise to specific courses of action.
O Tactical authenticity: the research should bring benefit to all involved – the ethical issue of ‘beneficence’.
Hammersley (1992: 71) suggests that internal validity for qualitative data requires attention to
O plausibility and credibility
O the kinds and amounts of evidence required (such that the greater the claim that is being made, the more convincing the evidence has to be for that claim)
O clarity on the kinds of claim made from the research (e.g. definitional, descriptive, explanatory, theory generative).
Lincoln and Guba (1985: 219, 301) suggest that credibility in naturalistic inquiry can be addressed by
O Prolonged engagement in the field.
O Persistent observation: in order to establish the relevance of the characteristics for the focus.
O Triangulation: of methods, sources, investigators and theories.
O Peer debriefing: exposing oneself to a disinterested peer in a manner akin to cross-examination, in order to test honesty and working hypotheses and to identify the next steps in the research.
O Negative case analysis: in order to establish a theory that fits every case, revising hypotheses retrospectively.
O Member checking: respondent validation, to assess intentionality, to correct factual errors, to offer respondents the opportunity to add further information or to put information on record; to provide summaries and to check the adequacy of the analysis.
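Negative case analysis is at heart an interpretive activity, but its bookkeeping side (checking a working hypothesis against every recorded case and flagging those it fails to fit) can be sketched in Python. The coded cases, their field names and both hypotheses below are invented for illustration and are not drawn from Lincoln and Guba:

```python
from typing import Callable

Case = dict[str, object]

# Hypothetical coded field notes: one record per pupil observed.
cases: list[Case] = [
    {"id": "P1", "peer_support": True, "high_expectations": True, "persisted": True},
    {"id": "P2", "peer_support": True, "high_expectations": True, "persisted": True},
    {"id": "P3", "peer_support": False, "high_expectations": False, "persisted": False},
    {"id": "P4", "peer_support": True, "high_expectations": False, "persisted": False},
]

def negative_cases(cases: list[Case], hypothesis: Callable[[Case], bool]) -> list[str]:
    """Return the ids of cases that the working hypothesis fails to fit."""
    return [str(c["id"]) for c in cases if not hypothesis(c)]

def h1(c: Case) -> bool:
    # Working hypothesis: pupils persist exactly when they have peer support.
    return c["persisted"] == c["peer_support"]

def h2(c: Case) -> bool:
    # Revised after examining P4: persistence needs peer support AND high expectations.
    return c["persisted"] == (bool(c["peer_support"]) and bool(c["high_expectations"]))

print(negative_cases(cases, h1))  # ['P4']
print(negative_cases(cases, h2))  # []
```

The revision loop stops only when no negative cases remain, which mirrors the aim of a theory that fits every case.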
Whereas in positivist research history and maturation are viewed as threats to the validity of the research, ethnographic research simply assumes that they will occur; it allows for change over time and builds it in. Internal validity in ethnographic research is also addressed by reducing observer effects: the observers both sample widely and stay in the situation for so long that their presence is taken for granted.
Further, by tracking and storing information clearly, it is possible for the ethnographer to eliminate rival explanations of events and situations.
External validity
External validity refers to the degree to which the results can be generalized to the wider population, cases or situations. The issue of generalization is problematical. For positivist researchers generalizability is a sine qua non, while this is attenuated in naturalistic research. For one school of thought, generalizability through stripping out contextual variables is fundamental, while, for another, generalizations that say little about the context have little that is useful to say about human behaviour (Schofield 1990). For positivists variables have to be isolated and controlled, and samples randomized, while for ethnographers human behaviour is infinitely complex, irreducible, socially situated and unique.
Generalizability in naturalistic research is interpreted as comparability and transferability (Lincoln and Guba 1985; Eisenhart and Howe 1992: 647). These writers suggest that it is possible to assess the typicality of a situation – the participants and settings, to identify possible comparison groups, and to indicate how data might translate into different settings and cultures (see also LeCompte and Preissle 1993: 348). Schofield (1990: 200) suggests that it is important in qualitative research to provide a clear, detailed and in-depth description so that others can decide the extent to which findings from one piece of research are generalizable to another situation, i.e. to address the twin issues of comparability and translatability. Indeed, qualitative research can be generalizable (Schofield 1990: 209), by studying the typical (for its applicability to other situations – the issue of transferability: LeCompte and Preissle 1993: 324) and by performing multi-site studies (e.g. Miles and Huberman 1984), though it could be argued that this is injecting a degree of positivism into non-positivist research. Lincoln and Guba (1985: 316) caution the naturalistic researcher against this; they argue that it is not the researcher’s task to provide an index of transferability; rather, they suggest, researchers should provide sufficiently rich data for the readers and users of research to determine whether transferability is possible. In this respect transferability requires thick description.
Bogdan and Biklen (1992: 45) argue that generalizability, construed differently from its usage in positivist methodologies, can be addressed in qualitative research. Positivist researchers, they argue, are more concerned to derive universal statements of general social processes than to provide accounts of the degree of commonality between various social settings (e.g. schools and classrooms). Bogdan and Biklen (1992) are less interested in whether their findings are generalizable in the widest sense than in the question of the settings, people and situations to which they might be generalizable.
In naturalistic research threats to external validity include (Lincoln and Guba 1985: 189, 300):
O selection effects: where constructs selected in fact are only relevant to a certain group
O setting effects: where the results are largely a function of their context
O history effects: where the situations have been arrived at by unique circumstances and, therefore, are not comparable
O construct effects: where the constructs being used are peculiar to a certain group.
Content validity
To demonstrate this form of validity the instrument must show that it fairly and comprehensively covers the domain or items that it purports to cover. It is unlikely that every issue can be addressed in its entirety, simply because of the time available or respondents’ motivation to complete, for example, a long questionnaire. If this is the case, then the researcher must ensure both that the elements of the main issue to be covered in the research are a fair representation of the wider issue under investigation (and its weighting) and that the elements chosen for the research sample are themselves addressed in depth and breadth.
Careful sampling of items is required to ensure their representativeness. For example, if the researcher wished to see how well a group of students could spell 1,000 words in French but decided to have a sample of only 50 words for the spelling test, then that test would have to ensure that it represented the range of spellings in the 1,000 words – maybe by ensuring that the spelling rules had all been included or that possible spelling errors had been covered in the test in the proportions in which they occurred in the 1,000 words.
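Sampling test items in the proportions in which they occur in the full word list is, in effect, stratified sampling with proportional allocation. A minimal Python sketch, assuming (hypothetically) that the 1,000 French words have already been grouped by the spelling rule each one exercises; the group names, group sizes and function names are illustrative, and placeholder strings stand in for real words:

```python
import math
import random

def proportional_allocation(stratum_sizes: dict[str, int], n: int) -> dict[str, int]:
    """Split n test items across strata in proportion to each stratum's size.

    Uses largest-remainder rounding so that the quotas sum to exactly n.
    """
    total = sum(stratum_sizes.values())
    quotas = {k: n * size / total for k, size in stratum_sizes.items()}
    alloc = {k: math.floor(q) for k, q in quotas.items()}
    shortfall = n - sum(alloc.values())
    # Hand the leftover items to the strata with the largest fractional parts.
    ranked = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for stratum in ranked[:shortfall]:
        alloc[stratum] += 1
    return alloc

def stratified_item_sample(pool: dict[str, list[str]], n: int, seed: int = 0) -> list[str]:
    """Draw n items, sampling at random within each stratum up to its quota."""
    rng = random.Random(seed)
    alloc = proportional_allocation({k: len(v) for k, v in pool.items()}, n)
    sample: list[str] = []
    for stratum, words in pool.items():
        sample.extend(rng.sample(words, alloc[stratum]))
    return sample

# Hypothetical 1,000-word pool, grouped by spelling rule.
sizes = {"accents": 438, "silent endings": 312, "double consonants": 150, "irregular plurals": 100}
pool = {rule: [f"{rule} #{i}" for i in range(size)] for rule, size in sizes.items()}

print(proportional_allocation(sizes, 50))
# {'accents': 22, 'silent endings': 16, 'double consonants': 7, 'irregular plurals': 5}
test_items = stratified_item_sample(pool, 50)
```

Proportional allocation keeps each rule's weighting in the test equal to its weighting in the full list, which is the content-validity requirement the example describes.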
Construct validity
A construct is an abstraction; this separates construct validity from the previous types of validity, which dealt in actualities – defined content. In this type of validity agreement is sought on the ‘operationalized’ forms of a construct, clarifying what we mean when we use this construct.
Hence in this form of validity the articulation of the construct is important; is the researcher’s understanding of this construct similar to that which is generally accepted to be the construct?
For example, let us say that the researcher wished to assess a child’s intelligence (assuming, for the sake of this example, that it is a unitary quality).
The researcher could say that he or she construed intelligence to be demonstrated in the ability to sharpen a pencil. How acceptable a construction of intelligence is this? Is not intelligence something else (e.g. that which is demonstrated by a high result in an intelligence test)?
To establish construct validity the researcher would need to be assured that his or her construction of a particular issue agreed with other constructions of the same underlying issue, e.g. intelligence, creativity, anxiety, motivation.
This can be achieved through correlations with other measures of the issue or by rooting the researcher’s construction in a wide literature search which teases out the meaning of a particular construct (i.e. a theory of what that construct is) and its constituent elements. Demonstrating construct validity means not only checking the construction against that given in the relevant literature, but also looking for counter-examples which might falsify the researcher’s construction. When the confirming and refuting evidence is balanced, the researcher is in a position to demonstrate construct validity, and can stipulate what he or she takes this construct to be. In the case of conflicting interpretations of a construct, the researcher might have to acknowledge that conflict and then stipulate the interpretation that will be used.
In qualitative/ethnographic research construct validity must demonstrate that the categories that the researchers are using are meaningful to the participants themselves (Eisenhart and Howe 1992: 648), i.e. that they reflect the way in which the participants actually experience and construe the situations in the research, that they see the situation through the actors’ eyes.
Campbell and Fiske (1959), Brock-Utne (1996) and Cooper and Schindler (2001) suggest that construct validity is addressed by convergent and discriminant techniques. Convergent techniques imply that different methods for researching the same construct should give a relatively high inter-correlation, while discriminant techniques suggest that using similar methods for researching different constructs should yield relatively low inter-correlations, i.e. that the construct in question is different from other potentially similar constructs. Such discriminant validity can also be yielded by factor analysis, which clusters together similar issues and separates them from others.
Ecological validity
In quantitative, positivist research variables are frequently isolated, controlled and manipulated in contrived settings. For qualitative, naturalistic research a fundamental premise is that the researcher deliberately does not try to manipulate variables or conditions, that the situations in the research occur naturally. The intention here is to give accurate portrayals of the realities of social situations in their own terms, in their natural or conventional settings. In education, ecological validity is particularly important and useful in charting how policies are actually happening ‘at the chalk face’ (Brock-Utne 1996: 617). For ecological validity to be demonstrated it is important to include and address in the research as many characteristics in, and factors of, a given situation as possible. The difficulty for this is that the more characteristics are included and described, the more difficult it