Validity is a term that is often invoked in decisions to use neuropsychological tests.
Unfortunately, the context of this use is usually negative, as when a test is cited as invalid.
The use of the term implies that a test can be determined to be either valid or invalid. Of course, most clinical neuropsychologists agree that a test that is "valid" for one population may be "invalid" for another. If this is true, can a test ever be evaluated as universally valid or invalid? A second question relates to how a test is evaluated as valid or invalid. This is a question both of method (How do we evaluate a test?) and of epistemology (How do we know what we know?). Although method may be discussed separately from epistemology, the obverse is not necessarily true. That is, how we know something is highly related to how we investigate that something. This chapter discusses general issues in the relationship between epistemology and method, and Chapter 5 discusses the methodological issues more directly.
Historically, validity in clinical neuropsychological research has involved either the demonstration that scores derived from a test can accurately separate neurologically impaired individuals from unimpaired individuals or the demonstration of a statistical relationship between scores on a neuropsychological test and the results of a medical neurodiagnostic procedure such as postmortem surgical investigations or CT scans. We say that we know the validity of a test by systematic, empirical investigation. Limiting neuropsychological validity studies to these variables was the result of the questions posed to the neuropsychologist in the clinical setting. Clinical neuropsychological assessment did not have its own canon of methods or its own set of mature scientific principles. To a large extent, clinical neuropsychology still does not have these. However, along with the development of clinical neuropsychology as a form of behavioral science with unique training requirements and professional identity has come a growth in methods that, although not unique in principle, are unique in application.
These developments have made necessary the examination of the concepts of both reliability and validity as applied to clinical neuropsychological assessment. The methods for investigating these concepts are formed partly by the nascent body of neuropsychological assumptions and principles and partly by the changing questions that are posed to the neuropsychologist in the clinical setting. Instead of being asked to localize the site of a lesion, clinical neuropsychologists are being asked to predict the limits of the behavior of a patient in the open environment or to determine whether a substantial change in skill level has occurred as the result of applications of a rehabilitation strategy.
Earlier, we stated that we know the validity of a test by empirical observations. However, there is a leap that needs to be made before statements can be made regarding the validity of a test. That leap is from the specific results of a particular procedure to statements regarding the test used as part of the procedure. The investigation is actually an evaluation of the conclusions drawn from the use of the test. These conclusions may relate to localization or to prediction issues, but they always depend on the procedure used and the context in which the procedures are used. These issues are usually discussed in terms of internal and external validity, and they are applied to the interpretations of the results of empirical investigations. Threats to internal validity are presented by those events or processes that cast doubt on the reasonableness of the conclusions drawn. Threats to external validity are presented by those events or processes that cast doubt on the generalizability of the results to other populations. These terms may be easily applied to the investigations of neuropsychological tests and may also be appropriate to discussions of the conclusions drawn from the use of these tests in a clinical situation. It may be misleading to speak of a test as valid or invalid when our research actually investigates specific hypotheses.
There is yet another consideration linking validity to method. In discussing personality constructs, Fiske (1971) argued that there was too much variance in the results when constructs are measured by different methods. Instead, Fiske proposed that the unit of analysis be the construct-operation unit. Huba and Hamilton (1976) replied that there was too much convergence among the data to support such a notion. Even though different instruments give slightly different results, they seem to share a central construct, as demonstrated by covariation among the instruments. Huba and Hamilton implicitly suggested that the best way to measure a construct is through multioperationalization; however, they did not suggest a way to concatenate the data into a single index. Fiske (1976) replied that the presence of even small variations in relationships among different methods of measuring constructs indicates the need to include the method as an integral part of the measuring unit.
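The point about convergence can be made concrete with a few lines of code. The following is a minimal sketch in Python, using simulated scores and hypothetical instruments of our own invention: three measures that share a latent construct, each with its own method variance, still covary substantially.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
construct = rng.normal(size=n)   # shared latent trait

# Three hypothetical instruments: each reflects the construct plus
# its own method-specific noise.
instruments = [construct + 0.5 * rng.normal(size=n) for _ in range(3)]

r = np.corrcoef(instruments)     # 3 x 3 correlation matrix
print(np.round(r, 2))            # off-diagonal entries near .80
```

Fiske's counterpoint is visible in the same sketch: the intercorrelations are high but not perfect, and that residual method variance is precisely what the construct-operation unit is intended to capture.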
Not all of these arguments may be applied to clinical neuropsychological assessment;
however, parts of the arguments are very pertinent to the present discussion. Clinicians are familiar with the pattern of results in which a patient performs well on a test of verbal memory but not on a test of visual memory. Alternatively, the patient may perform well on a test of recognition memory but not on a test of free recall. When these results occur, clinical neuropsychologists do not generally throw up their hands and conclude that the construct is singular and that the discrepancies are due to method variance. Instead, more than one construct is used to explain the pattern of results.
Clinical neuropsychologists often attempt to delineate the actual skill or ability that is deficient by presenting a task to a patient under different conditions. A useful method for conceptualizing this set of relationships is to consider aspects of the stimulus (e.g., the sensory modality used and the potency of the stimulus compared to other stimuli in the environment), aspects of the processing required to perform the task (e.g., mental arithmetic vs. paper-and-pencil calculation, or verbal encoding vs. abstract visual encoding), and aspects of the response (motoric, verbal, or recognition). In this way, we arrive at an assessment of the ability of the individual to copy abstract line drawings and not an assessment of the construct of constructional apraxia. The construct-operation unit may be specified by means of the three aspects of the behavior requested: stimulus, processing, and response. Validity investigations can then be aimed at the evaluation of the conclusions drawn from the results of a specific test procedure.
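One way to make this specification explicit is to treat each test procedure as a record of its three aspects. The sketch below is illustrative only; the class name and field values are our own, not a standard taxonomy.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConstructOperationUnit:
    """One test procedure, described by the three aspects of the
    behavior requested."""
    stimulus: str     # e.g., sensory modality, relative stimulus potency
    processing: str   # e.g., verbal vs. abstract visual encoding
    response: str     # e.g., motoric, verbal, or recognition

copy_design = ConstructOperationUnit(
    stimulus="visual: abstract line drawing",
    processing="visual-spatial encoding",
    response="motoric: drawing",
)

def differing_aspects(a, b):
    """List the aspects on which two procedures differ; procedures
    differing in exactly one aspect support the sharpest comparisons."""
    return [f for f in ("stimulus", "processing", "response")
            if getattr(a, f) != getattr(b, f)]
```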
It is still important to consider the underlying central trait, that is, memory. The trait may help us to generate hypotheses that can then be tested with data. For example, knowing that spatial skills are related to certain types of mathematical skills allows us to generate some hypotheses regarding performance on mathematical tasks when a subject demonstrates spatial manipulation deficits. One of the tasks is to determine the conditions and subjects for which the relationships occur or do not occur. By focusing on the construct-operation unit, we do not as easily commit the error of assuming that the traits measured are singular. By focusing on the construct-operation unit, we can remain close to the behavioral data and can use a more parsimonious set of cognitive constructs.
THE NATURE OF VALIDITY
In the area of general psychological research, there is an unsettled debate about whether validation is tripartite or unitary. Cronbach and Meehl (1955) suggested that validity is composed of three varieties: criterion-related validity (comprising both predictive and concurrent validity), content validity, and construct validity. Other theorists, such as Landy (1986), have suggested that validity has a unitary nature. In this view, validation is a multidimensional activity in which the type of validity is determined by the inference attempted. Landy suggested that validation be viewed as hypothesis testing. Taking this suggestion a step further, one concentrates on evaluating the validity of the inference rather than the validity of the test.
Discussions of the validity of a test may be clarified by considering the conditions, populations, and types of generalization that form the parameters of the inference. Instead of stating that a given test is valid, we should state that it is valid (or invalid) for drawing certain conclusions when it is administered to a certain individual in a certain setting. Doing so helps to make it clear that validity has as its central concern the evaluation of hypotheses formed by attempts to generalize past the test situation.
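To emphasize that the unit of evaluation is the inference rather than the instrument, a validity claim can be represented with its parameters spelled out. This is a small sketch with illustrative field values of our own choosing, not a standard formalism.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ValidityClaim:
    """A claim that evidence can confirm or refute: not 'the test is
    valid,' but 'this conclusion holds for this population in this
    setting.'"""
    test: str
    conclusion: str   # the generalization drawn from the score
    population: str   # to whom the inference is meant to apply
    setting: str      # conditions under which it is meant to hold

claim = ValidityClaim(
    test="design-copying task",
    conclusion="low scores indicate impaired visual-spatial "
               "reproduction skills",
    population="adults with documented right-hemisphere lesions",
    setting="quiet office, standard administration",
)
```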
Some people draw distinctions between constructs (such as reasoning ability) and observable facts (such as the accuracy of an individual's attempt to solve some problem).
The construct helps one to make predictions beyond the individual, the setting, and the observed behavior by hypothesizing a central commonality. In this way, we reduce the uncertainty of each new clinical question by pointing out the similarities with other previously answered questions. Concentrating on the observable fact allows greater accuracy in predicting a specific outcome. There is an obvious tradeoff in this situation: the greater the extent to which constructs are used, the greater the generality of the predictions made. On the other hand, the obverse is true of the reliance on observable facts: the greater the extent to which the predictions are restricted to observable facts, the more accurately predictions can be made in a given situation. The distinction is not just semantic, for the clinician is faced with a decision that has implications for the eventual validity of inferences.
There is no single answer to the question of how high the level of abstraction should be in naming the skills evaluated by clinical neuropsychological methods. In one sense, the task is completed during the assessment when the clinician gathers data from extratest sources, that is, clinical and collateral interviews, behavioral observations, and reviews of previous test results. The limits of the inferences drawn regarding the visual-spatial skills of a subject as determined by a test score are partly determined by data regarding the performance of the subject on real-life tasks that require visual-spatial skills.
Unfortunately, the strategy of allowing all of the limitations of generalizability to be described by extratest data removes the process from quantification and public observation.
A preferable strategy is to limit the description of the construct being assessed to the most basic level of behavioral description that still allows generalizability to other situations but not to other skills. As a result, the inferences remain open to the public scrutiny of the community of clinicians and researchers. The inferences can then be quantified and empirically evaluated.
For example, a test of visual-spatial skills may require the subject to reproduce a simple abstract line drawing after having viewed the test stimulus for 10 seconds. Because the task requires memory, motor skills, and visual-spatial perception, it would be misleading to label the test as purely an index of visual-spatial constructional skills. Performance on the test may not generalize to situations in which memory is not required. Although it may seem more cumbersome to describe the test as assessing visual-spatial motor-reproduction skills for which short-term memory is required, it is less cumbersome than the theoretical excess baggage required to explain discrepancies in performance between a situation that requires memory and a situation that does not. The clinician can identify which skill component of the construct-operation unit is actually deficient by comparing the results of the application of various construct-operation units (tests or procedures) that vary only slightly in the content of their components. This method is similar both to Luria's method of qualifying the symptom and to Teuber's concept of double dissociation. However, to show the similarity to double dissociation, the goal of assessment must be changed from physical localization to functional localization. For example, if a patient performs poorly on a test of visual-recognition memory but performs adequately on tests of verbal recognition and verbal free recall, we might hypothesize that some aspect of visual encoding is impaired.
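The logic of this comparison can be sketched in a few lines. In the sketch below, the task names and component tags are illustrative; a component shared by every failed task but by no passed task emerges as a candidate functional deficit, reproducing the visual-encoding example above.

```python
# Tasks tagged with their hypothesized processing components.
failed = {
    "visual recognition memory": {"visual encoding", "recognition"},
}
passed = {
    "verbal recognition memory": {"verbal encoding", "recognition"},
    "verbal free recall": {"verbal encoding", "free recall"},
}

# A component implicated in all failures but in no successes is a
# candidate locus of the functional deficit.
shared_by_failures = set.intersection(*failed.values())
seen_in_successes = set.union(*passed.values())
print(shared_by_failures - seen_in_successes)  # {'visual encoding'}
```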
Limiting construct descriptions to the lowest possible level of necessary abstraction has its roots in the concept of face validity and has implications for both construct and content validity. A prerequisite for naming a test as an index of a given neuropsychological skill is that the test appear to tap the construct of interest. A test of visual-spatial constructional skills should contain tasks that require the subject to perform using those skills or else that require the performance of skills highly related to the construct of interest, for example, drawing to command.
We return now to a consideration of the construct-operation unit. As an abstract entity, the construct is not actually measurable. When we specify the construct-operation unit, we provide both an abstract definition of the skill and a public observational system for assessing that skill. The validity of the inferences drawn from test scores is related to the similarity of the method to the demands of the environment in the performance of the behavioral products of the skill. Memory tests generally assess the skill of an individual in receiving, encoding, and retrieving discrete bits of information in a relatively distraction-free environment. Those particular conditions are rarely met in the free environment. As a result, predictions from test scores may be inaccurate (that is, may have limited validity) in describing the performance of the subject in everyday memory tasks. By concentrating on the construct-operation unit, we explicitly accept the theoretical considerations underlying the use of the test, namely, that memory performance differs under differing levels of distraction. We may wish to devise and use two tests, one with and one without distraction
methods. When we wish to make predictions about extratest behavior, we would choose the test with the method that best approximates the conditions under which the subject is expected to perform. Or, conversely, we may specify the environmental limitations and conditions under which performance of the central task is expected, for example, telling the subject to learn new material under minimal distraction.
VALIDITY AND NONNEUROLOGICAL VARIABLES
There is always a problem with omitting variables. When we omit relevant variables in our research, we consign to error variance those sources of variance that might otherwise be explained systematically. Clinical neuropsychological assessment tends to look at the score derived from a test and the possible membership in a certain class, such as a diagnostic subgroup. In doing so, it ignores the context of the evaluation, the demographic characteristics of the subject, the learning history of the subject, and the influence of conative variables such as level of motivation or affective state. This occurs even though theory states that these variables are important. The role of these variables is relegated to the domain of clinical inference, intuition, or decision. The price paid for this omission ranges from unfortunate (in the form of lowered validity coefficients) to inexcusable (in the form of misleading information resulting in disservice to the patient). These variables can be theoretically argued to have import in the decisions regarding the validity of inferences drawn from test scores; however, in reality, the import of these variables is an empirical question yet to be answered.
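The statistical cost of the omission is easy to demonstrate. The following is a minimal sketch with simulated data; the variable names and effect sizes are invented for illustration. When a relevant nonneurological variable is left out of the model, its systematic contribution is absorbed into error variance, and the apparent validity coefficient shrinks.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
lesion_severity = rng.normal(size=n)   # neurological variable of interest
education = rng.normal(size=n)         # nonneurological covariate
score = lesion_severity + 0.8 * education + rng.normal(size=n)

def r_squared(predictors, outcome):
    """Proportion of outcome variance explained by an OLS fit."""
    X = np.column_stack([np.ones(len(outcome))] + predictors)
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    resid = outcome - X @ beta
    return 1 - resid.var() / outcome.var()

print(r_squared([lesion_severity], score))             # lower R^2
print(r_squared([lesion_severity, education], score))  # higher R^2
```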
At this point in the development of clinical neuropsychological assessment as a scientific endeavor, it may not be possible to directly specify the effects of nonneurological variables on assessment results. It is still necessary to attempt to delineate these effects. We are evaluating the validity of inferences drawn from test results. Therefore, we need to rule out the extraneous effects of conative variables, or else we need to make some statements regarding the likely effects of these variables. Table 4.1 describes a model for determining the possible relevant variables. The variables are divided into three major classes: examiner variables, contextual variables, and subject variables.
The effects of these variables may be different for different levels of other variables in the model. That is, the variables may have moderating effects on each other. An obvious example would be the gender of the examiner, which may have different effects, depending on the gender and the learning history of the subject. Again, these are all empirical questions that need to be addressed if we are to increase the validity of the inferences drawn from our test results.
TABLE 4.1. Variables Affecting Test Results

Situation
    Setting
    Reason for assessment (perceived objective, rationale provided)
    Forensic
    For child: school versus medical setting

Subject
    Gender
    History
    IQ
    Occupation
    Reaction to examiner

Examiner
    Gender
    Voice inflection
    Level of skill and experience
It is likely that there are other variables that need to be placed on the table. More conceptual as well as empirical work needs to be done.
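Moderating effects of the kind just described are empirically tractable. The following is a minimal sketch with simulated data; the coding and effect sizes are invented for illustration. A moderating effect appears as a nonzero coefficient on the product (interaction) term of the two variables.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
examiner_gender = rng.integers(0, 2, size=n)  # 0/1 coding
subject_gender = rng.integers(0, 2, size=n)   # 0/1 coding

# Simulated score in which the examiner effect depends on the
# subject: the interaction term carries the moderation.
score = (0.2 * examiner_gender + 0.1 * subject_gender
         + 0.6 * examiner_gender * subject_gender
         + rng.normal(size=n))

X = np.column_stack([np.ones(n), examiner_gender, subject_gender,
                     examiner_gender * subject_gender])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
print(round(beta[3], 2))  # interaction coefficient, near 0.6
```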
Fiske (1978) observed that most psychologists recognize (or pay lip service to) the importance of the person-situation interaction. His comments were made in the context of discussing personality assessment, but some of the same considerations apply to clinical neuropsychological assessment. It is not sufficient to state that anxiety plays a role in the assessment of memory functions or that forensic settings affect test results. It would be better, instead, to make the person-situation interaction the focus of our investigations. By stating and experimentally controlling the situations in which assessment takes place, we bring the moderating influence of these situations into the arena of public scrutiny, and the result is greater agreement regarding the confirmation or disconfirmation of the inferences drawn.
Every person is different. Each of us has different levels of skills as the result of genetic heterogeneity, different learning histories, and differing current states. However, faced with these differences, we should not throw up our hands at the insurmountability of the task. As Fiske (1978) recommended, it would be more productive to attempt to determine whether any regularities exist in the phenomenon under study and to uncover the conditions in which these regularities exist.
CONCLUSIONS
Validity is a term that is better applied to inferences than to tests. We can know little about the validity of a test, but we can know the accuracy of the hypotheses and inferences associated with its use. It may be difficult to change the language of a profession, but doing so could have beneficial effects on our use of test instruments. When we focus on the validity of inferences, we draw attention to the construct-operation unit. In essence, we become more behavior-minded; our conclusions are limited by the characteristics of the observed data rather than by references to categorical abstractions. In addition, when we focus on the validity of inferences, we focus on the decision-making processes of the clinician.
No test can be valid in the hands of an inadequately trained clinician. We also become more aware of the nonneurological variables impinging on test performance. Much work needs to be done, but we can be cheered and motivated by the fact that much work has already been done. We know some of the basics regarding how performance on certain tests differs across diagnostic groups. We now need to determine the basics regarding performance that differs because of other variables.