Neurobehavioral Tests: Problems, Potential, and Prospects

J. Graham Beaumont

There seems to be general agreement that any monitoring of the effects of environmental and occupational exposure to neurotoxins should include behavioral measures. An important element in the effects of known toxins is the response of the nervous system, including peripheral sensory and motor components and higher central effects upon the function of the forebrain. This response has clear behavioral aspects following gross acute exposure and significant chronic exposure to a range of neurotoxins. More subtle behavioral effects are also thought to follow less severe acute exposure or sustained exposure to lower levels of the relevant substances.

The assessment of behavioral effects is considered to be the primary approach to the systematic monitoring of neurotoxic exposure, and where mass screening is considered for large populations at risk, it may be the only practicable approach, at least for initial selection. It is obvious that automated screening by the use of computer-based assessment could contribute significantly to the development of appropriate techniques.

The essential context for the adoption of acceptable assessment techniques is that the potential behavioral changes have been identified, and that reliable measures of these changes are available which have been demonstrated to be valid and for which appropriate normative data exist.

It may also be desirable that the test be stable under conditions of repeated testing. Particularly when relatively subtle changes, with a low base-rate in the population (as may be typical of mass screening), are to be detected, it is essential that the validity (and therefore the reliability) be exceptionally high.

Behavioral Measures of Neurotoxicity http://www.nap.edu/catalog/1352.html

None of this is in conflict with the preceding chapters; indeed, there is remarkable agreement as to the current state of the field, the methodological principles that apply, and the standards that should be adopted. The areas of potential disagreement are whether the currently available tests are sufficient for their purposes, and whether the introduction of new test instruments is to be encouraged. This chapter therefore concentrates principally upon those issues.

TESTS CURRENTLY IN USE

The last three chapters have covered the history and description of current tests, with particularly helpful tabulations by Hanninen and Anger, and it would be redundant to repeat much of this material.

It is worth, however, drawing attention to the version of the World Health Organization's Neurobehavioral Core Test Battery (WHO-NCTB) in a computer-based form developed by the Institute of Occupational Health at the University of Milan. The battery is much as in its original form except that the Santa Ana Rotation Test of the original battery has, for pragmatic reasons, been replaced by a test which assesses rather different cognitive functions, and the modality of the Digit Span task has been changed in a way that is known to alter the cognitive functions involved (Beaumont, 1985).

A preliminary study of the psychometric characteristics of this implementation of the NCTB has been reported (Camerino, 1987). This indicates that there are some serious questions concerning the validity of these tests in terms of their suitability for the assessment purposes under consideration.

A group of 30 volunteers—young, relatively well-educated adults—were retested at weekly intervals on certain of the tests [excluding Benton Visual Retention Test (VRT) and Aiming Pursuit], and estimates of the reliability and validity of the measures were made from the results. It is a little unclear what reliability should be expected from an instrument that assesses mood "over the past week," given at weekly intervals: the range of values of r from -0.24 to +0.87 on the various individual scales is probably not remarkable. The reliabilities on the cognitive tasks are more acceptable, being in the range 0.62 to 0.89, if reaction time (RT) variability and Digit Learning are excluded.

The reliabilities of the Digit Learning test at 0.40 and 0.19 (for occasions 1–2, 2–3, respectively) are clearly quite inadequate and suggest that the test should be abandoned as part of this assessment.
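One way to see why reliabilities this low are disqualifying is the classical test theory result that a test's correlation with any criterion cannot exceed the square root of the product of the two reliabilities. The sketch below applies that bound to the Digit Learning figures quoted above. It assumes that the test-retest correlations can stand in for reliability and, generously, that the criterion is measured without error; the function name is illustrative only.

```python
import math


def validity_ceiling(reliability: float, criterion_reliability: float = 1.0) -> float:
    """Upper bound on the test-criterion correlation under classical test theory.

    Assumption: the criterion is perfectly reliable unless stated otherwise,
    which gives the most favorable possible ceiling for the test.
    """
    if not (0.0 <= reliability <= 1.0 and 0.0 <= criterion_reliability <= 1.0):
        raise ValueError("reliabilities must lie in [0, 1]")
    return math.sqrt(reliability * criterion_reliability)


# Test-retest values for Digit Learning as quoted in the text.
digit_learning = {"occasions 1-2": 0.40, "occasions 2-3": 0.19}

for label, r in digit_learning.items():
    print(f"Digit Learning, {label}: reliability {r:.2f}, "
          f"validity ceiling {validity_ceiling(r):.2f}")
```

Even on these favorable assumptions the ceiling is roughly 0.63 and 0.44 for the two retest intervals, well short of what individual classification would require.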


Correlations were also calculated with paper-and-pencil versions, as a crude measure of construct validity. Correlations were modest, ranging from 0.55 (Serial Digit) to 0.79 (Benton VRT). These values are not atypical of values that might be expected on psychometric tests of this type.

However, these results do raise certain doubts about the psychometric suitability of these tests for the purposes for which they are being employed. For purposes of debate, assume that the validity of the measures is on average about 0.75. This is probably rather generous: reliability limits the upper extent of validity, and reliabilities are in some cases below this level. In addition, the sample employed was likely to provide relatively high levels of reliability and validity. At this level of validity, if we are trying to identify pathological effects which are present in 50 percent of those tested, the best that the test can theoretically achieve is 77 percent correct classification of the test subjects. In practice, a much more unfavorable base-rate of the condition is likely to apply in the test population. If the incidence to be detected falls to 1 in 10, the theoretical maximum achievement of the test will be 90 percent overall correct classification, but of those affected only 50 percent will be correctly identified. Of those achieving "positive" results on the test, half will be misclassified because they are false positives.

As the base-rate or the validity falls, these statistics become even more unacceptable. It should be clear that in psychometric terms, these tests as implemented in the Milan study are insufficiently powerful to allow any valid assessment of the neurobehavioral functions under study.
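The classification figures quoted above can be reproduced under one set of assumptions, which the chapter does not itself state: test score and true status are treated as a standard bivariate normal pair with correlation equal to the validity, and the test flags the same proportion of people as the base rate. The following Python sketch, using only the standard library, is offered as an illustration of that model rather than as the author's own calculation; all function names are hypothetical.

```python
import math
from statistics import NormalDist


def median_split_accuracy(validity: float) -> float:
    """Sheppard's formula: probability that test and criterion fall on the
    same side of their medians (the 50 percent base-rate case)."""
    return 0.5 + math.asin(validity) / math.pi


def joint_upper_tail(validity: float, base_rate: float, steps: int = 4000) -> float:
    """P(criterion > cut and test > cut) for a standard bivariate normal,
    with the same cut on both variables, by midpoint-rule integration."""
    nd = NormalDist()
    cut = nd.inv_cdf(1.0 - base_rate)
    resid_sd = math.sqrt(1.0 - validity ** 2)
    upper = cut + 10.0  # far enough into the tail to stand in for infinity
    h = (upper - cut) / steps
    total = 0.0
    for i in range(steps):
        x = cut + (i + 0.5) * h
        # Density of the criterion at x times P(test > cut | criterion = x).
        total += nd.pdf(x) * (1.0 - nd.cdf((cut - validity * x) / resid_sd))
    return total * h


def screening_outcomes(validity: float, base_rate: float) -> dict:
    """Overall accuracy, sensitivity, and positive predictive value,
    assuming the proportion flagged equals the base rate."""
    tp = joint_upper_tail(validity, base_rate)
    fn = base_rate - tp  # affected but not flagged
    fp = base_rate - tp  # flagged but not affected
    tn = 1.0 - tp - fn - fp
    return {
        "accuracy": tp + tn,
        "sensitivity": tp / base_rate,
        "ppv": tp / (tp + fp),
    }


if __name__ == "__main__":
    print(f"50% base rate: accuracy {median_split_accuracy(0.75):.3f}")
    for name, value in screening_outcomes(0.75, 0.10).items():
        print(f"10% base rate: {name} {value:.3f}")
```

With a validity of 0.75 the first function gives about 0.77, and the 1-in-10 case should land close to the 90 percent accuracy and 50 percent detection figures given in the text. Lowering either argument to `screening_outcomes` shows how quickly detection and the proportion of true positives deteriorate.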

This must be a serious concern because (quite reasonably) the NCTB has been adopted in a number of centers around the world. Swedish studies conducted at the National Board of Safety and Health (Iregren, 1986) have used these tasks among a battery of others administered in both traditional and computer-based formats, as well as some other automated modes. The computer-based tests include Memory Reproduction (letter and digit sequences, rather like Digit Span), Simple and Choice Reaction Time, and Color Word Vigilance, and others are under development. Studies conducted with the full range of tests have demonstrated some significant and interesting differences between criterion groups selected for contrast on relevant variables, mostly relating to solvent exposure. Some reliability data are reported by Iregren in this volume.

The methodological rigor of this approach is to be applauded, and the data show some reliabilities for certain of the assessments significantly higher than the Milan data for their battery. Nevertheless, with assessments of higher cognitive functions and of affect, the psychometric adequacy of the instruments remains a problem.


A battery that shares some provenance with the WHO-NCTB, although it strictly just predates it, is Baker and Letz's Neurobehavioral Evaluation System (NES). The NES has been adopted by a major study being carried out by the Institute of Occupational Health in Birmingham, United Kingdom (Spurgeon and Harrington, 1987). This study will use the Clinical Interview Schedule together with the Hogstedt Symptom Questionnaire, Stress and Arousal Checklist, Cognitive Failures Questionnaire, and Prospective Memory Test, in addition to the NES tests. At present only preliminary pilot data are available.

Many other published studies have, of course, employed a wide variety of tests. A survey of the literature on the effects of lead on intelligence reveals the WISC-R to be the most popular test in a traditional format to have been employed in this research (Yule and Rutter, 1985). A great variety of more specific tests of individual functions have also been employed (Anger, 1985).

Further contributions concerning the use of computer-based assessment in this domain are to be found in Braconnier (1985). A useful collection of papers concerned more generally with the issues raised by computer-based assessment appeared in Applied Psychology (e.g., see Huba, 1987).

The preceding chapters seem to be in agreement that (1) there are problems evident in the construction of various batteries, (2) most of the tests currently in use are relatively inadequate, and (3) there is poverty in the current psychological descriptions of neurotoxic syndromes.

SOME SPECIFIC POINTS

Some specific points made in the preceding chapters are highlighted here.

Methods in Behavioral Toxicology (Hanninen)

The problems concerning the definition and description of the neurotoxic deficit are well taken: this is clearly of crucial significance for any advance in the field and emphasizes the need for more fundamental research into the cognitive processes affected.

The suggestion that the in-depth study of individual patients might be profitable is also a valuable one. There are now many good single case experimental designs that might be appropriately deployed in this area, and they should be considered in order to further clarify the description of the relative deficits.

The dilemma that Hanninen discusses between the "conservative" and "progressive" approaches is a real one and, to some extent, is fundamental to much of the discussion that follows. It is of central importance to decide whether to make the best of the rather poor tests that are currently in use, or whether to adopt a more radical reevaluation of current tests and the potential new instruments that might be created.

Current Status of Test Development (Williamson)

Williamson sensibly highlights the potential for tests that relate explicitly to psychological theory (although the distinction between those that relate to "cognitive structure" and those that are "theory-based" may not be so easy to sustain). If it becomes possible to elaborate our understanding of the psychological processes (and, perhaps, as a contribution to that understanding), there is obvious merit in the use of such tests.

The "potential barrier" of computer-based testing must be taken seriously. There is clearly no value in developing computer-based tests if they confer few advantages, introduce extraneous sources of error, and hinder the wide application of tests. There may be benefits from the application of computers that outweigh these disadvantages, at least in parts of the world where they can practicably be used, but it is important to be clear about the advantages in any given case.

The need for more basic research is again emphasized, and the proposal that the adaptive nature of some of the changes which take place be considered may be a particularly useful insight.

Human Neurobehavioral Tests (Anger)

Anger's useful and authoritative view is clear and correct about the potential contribution that the test batteries may make in this field. It is necessary, however, to ensure that this potential is realized in practice. It is certainly possible that the relevant changes could be detected. It is much less certain that current batteries are capable of detecting the changes (and there is some reason to believe that they are not).

The case also has to be argued more clearly for the value of cross-cultural data collection. It is naturally important, indeed essential, that appropriate local norms be available. However, given that there are inevitably differences among cultures in education, cognitive processes, cultural experience, exposure to testing and test materials, and even (some believe) in intelligence, test performance will differ in different cultures and subcultural contexts. In this case, the results will be difficult, perhaps impossible, to interpret, and little will have been gained by international comparisons. The idea of a worldwide pool of test results may be superficially attractive, yet it is not based in the psychometric realities of the situation.

THE ADEQUACY OF CURRENT TESTS

The problems inherent in current assessment batteries appear to be twofold. First, the tests employed have been selected on the basis of their previous use in experimental studies of the effects of exposure to neurotoxins. It is natural that, when a test has been shown to distinguish between a criterion group of exposed individuals and a control group, this test should be considered suitable for inclusion in an assessment battery. This is, however, not necessarily the case. Only if the test can be shown to have sufficient psychometric power for the role of general screening can it be considered useful in this way. It is important throughout to maintain a careful distinction between tests that are useful for group experiments and those that may be used for individual screening.

Second, there is a temptation to select tests that are generally considered to be capable of indicating central nervous system (CNS) dysfunction. Here the temptation has been to take tests that are believed capable of revealing the effects of dementia, cerebral disease, or gross trauma, and to adopt them for detection of the effects of neurotoxins. This procedure is open to two misconceptions: that the effects of neurotoxins will be the same (in cognitive terms) as the effects of dementia, cerebral disease, or trauma, and that there are tests capable of simply discriminating among these other disorders. There seems to be little basis for accepting either of these proposals. CNS poisoning is unlikely to resemble other cerebral pathology in its effects, any more than, say, dementia resembles trauma. The history of neuropsychology is littered with failed attempts to identify general cerebral pathology by means of a single measure or a small group of measures. The problem is especially difficult when the effects are relatively diffuse.

An example is the difficulty of distinguishing, by cognitive measures alone, dementia of the Alzheimer type in the elderly—at least in its early stages—from either functional psychiatric illness or acute systemic illnesses. Much the same problem must apply to the effects of neurotoxins.

It is therefore not surprising that the battery of tests now generally employed is not of strong validity and is probably inadequate for the general detection of the behavioral effects of neurotoxins. There is simply insufficient power in the basic psychological instruments being employed.

The critical problem is the psychometric power of the tests, and the critical question is whether the WHO-NCTB (including related batteries such as the NES) is adequate to the task. It is important at this point to be clear as to what the task is: either to conduct group experiments or to undertake individual screening.

If the task is to investigate the differences between criterion groups, then the NCTB may be adequate to the task. Its psychometric power is still weak, and there might well be better tools available. It is probably, as a psychometric instrument, best described as "premature." Nevertheless, the fact that it is available, and already quite widely adopted, is of some importance, and it is clearly capable of discriminating between carefully selected groups under favorable conditions. Its use is certainly justified in this context, although efforts should be made to dramatically increase the size of the standardization samples available and to improve the basic reliability of the tests. In the context of such studies using the NCTB, it might be that computers are an impediment and that administration in the standard form is to be preferred.

However, if the aim is to carry out screening for exposed and affected individuals, the NCTB is likely to be quite inadequate on psychometric grounds. As discussed above, the available data suggest that the battery is not reliable enough to permit sufficiently accurate classification of affected and nonaffected individuals.
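The contrast between group experiments and individual screening can be made concrete with a purely hypothetical example. The sketch below assumes two normally distributed groups with equal variance, separated by a standardized mean difference `d`, with equal numbers in each group and a cut score midway between the means. None of these values comes from the studies discussed; they are chosen only to show how a test can separate groups convincingly while classifying individuals poorly.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def group_z(d: float, n_per_group: int) -> float:
    """z statistic for the difference between two group means
    (known unit variance, equal group sizes; a simplifying assumption)."""
    return d * math.sqrt(n_per_group / 2.0)


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z statistic."""
    return 2.0 * (1.0 - _STD_NORMAL.cdf(abs(z)))


def individual_accuracy(d: float) -> float:
    """Proportion of individuals classified correctly by a cut score
    midway between the two group means, with equal group sizes."""
    return _STD_NORMAL.cdf(d / 2.0)


if __name__ == "__main__":
    d, n = 0.5, 100  # hypothetical effect size and group size
    z = group_z(d, n)
    print(f"group comparison: z = {z:.2f}, p = {two_sided_p(z):.4f}")
    print(f"individual classification accuracy = {individual_accuracy(d):.3f}")
```

Under these assumptions a difference of half a standard deviation is clearly detectable with 100 people per group, yet the same test would classify only about 60 percent of individuals correctly, little better than chance.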

This implies that if screening is a goal of the research (or if significant improvements are to be made in the sensitivity of the tests for detecting differences between criterion groups), then the whole basis of the assessments currently employed needs to be reexamined. Better fundamental research is needed to generate a psychological description of the deficits and better models of the effects which can be related to that description. In achieving this it may well be advantageous to make better use of new developments in psychometrics and in the explicit models of cognitive performance. It is at this point that computers might well be introduced. One way in which this might be done is described below.

SOME NEW DEVELOPMENTS IN COMPUTER-BASED ASSESSMENT

It seems worth inquiring whether there are alternative approaches that could potentially provide more satisfactory solutions to the assessment of cognitive performance. There seem to be at least two potentially
