TEACHING EXPERIMENTAL DESIGN TO BIOLOGISTS
James F. Zolman
Department of Physiology, University of Kentucky College of Medicine, Lexington, Kentucky 40536-0298
The teaching of research design and data analysis to our graduate students has been a persistent problem. A course is described in which students, early in their graduate training, obtain extensive practice in designing experiments and interpreting data. Lecture-discussions on the essentials of biostatistics are given, and then these essentials are repeatedly reviewed by illustrating their applications and misapplications in numerous research design problems. Students critique these designs and prepare similar problems for peer evaluation. In most problems the treatments are confounded by extraneous variables, proper controls may be absent, or data analysis may be incorrect. For each problem, students must decide whether the researchers’ conclusions are valid and, if not, must identify a fatal experimental flaw. Students learn that an experiment is a well-conceived plan for data collection, analysis, and interpretation. They enjoy the interactive evaluations of research designs and appreciate the repetitive review of common flaws in different experiments. They also benefit from their practice in scientific writing and in critically evaluating their peers’ designs. AM. J. PHYSIOL. 277 (ADV. PHYSIOL. EDUC. 22): S111–S118, 1999.
Key words: biostatistics; designing experiments; analyzing and interpreting data; experimental flaws; writing abstracts; peer reviewing
Many assessment reviews have found that much of biological research is flawed because of incorrect conclusions drawn from confounded experimental designs and the misuse of simple statistical methods (1, 2, 8, 14, 22). These flaws are relatively simple mistakes such as not randomly allocating experimental units to treatments, neglecting to include a control group or condition, or misapplying statistical models to data (e.g., having correlated observations when the applied model requires independent observations). Along with a decline in the rigor and quality of research (3, 11, 19), these errors result in significant research costs. Money is spent and time is taken to collect data that may not be replicated because of extraneous confounds or may not be correctly interpreted because inappropriate statistical models have been unknowingly applied.
One major reason for these research flaws is the inherent difficulty of statistics: it deals in uncertainty, relies on specialized terminology and abstraction, and uses the quirky logic of hypothesis testing in which an explanation based on chance factors (null hypothesis) must be rejected to assert a possible effect of a treatment (alternative hypothesis). These problems of uncertainty, terminology, abstraction, and quirky logic can only be minimally addressed by nonstatisticians (5, 21).
The other two major reasons for general research flaws that we, as scientists, can dramatically change are 1) our erroneous separation of data collection (what we wrongfully believe is a real experiment) from statistical analysis and interpretation (what we wrongfully assume are only computations of test statistics) and 2) the limited practice our students receive in designing experiments and critically evaluating the experiments of others. Too often, statistical methods are not included in the design of a study. In these cases data collection is usually completed before suitable data analysis procedures are even considered. In addition, we typically expect students in their research proposals to provide detailed descriptions of measurement techniques, some information on possible treatment confounds and their controls during data collection, and little, if any, information on suitable statistical procedures. As teachers and research mentors, we must, by example, convince students that an experiment is a well-conceived plan (design) for data collection, data analysis, and data interpretation. Also, early in their academic careers, we must provide them with numerous opportunities for designing experiments and analyzing and interpreting experimental data. These are the objectives of an annual research design course developed for second-year graduate students in the Department of Physiology at the University of Kentucky.
GENERAL PHILOSOPHY AND COURSE FORMAT
Biological scientists are concerned primarily with designing and interpreting their own experiments as well as critically evaluating articles and papers in their research field. Consequently, they must know whether an experiment is properly conceived, correctly controlled, adequately analyzed, and correctly interpreted. To accomplish these aims I focus on experimental design and statistical literacy and do not cover derivations of probability distributions or computational procedures. Lecture-discussions are given on the essentials of experimental design and statistical inference. These essentials then are reviewed by
illustrating their applications and misapplications in numerous research design problems. Course participants critique these research design problems and prepare similar design problems for peer evaluation. In most of these problems the treatment conditions are confounded by subject-, environment-, or time-related extraneous variables, proper controls may be absent, or data analysis may be incorrect (7, 9, 15, 16, 23).
As researchers we strive to translate important questions into appropriate treatment conditions, obtain accurate and valid measurements of functional processes, do experiments that produce unequivocal data, and integrate our data with current knowledge. Therefore, throughout the course I repeatedly emphasize that an appropriate experimental design with proper statistical methods will not ensure correct scientific conclusions if a research hypothesis is not clearly defined, the applied treatment conditions are not appropriate, or the data collected are inaccurate or invalid. These latter requirements are based on specialized discipline knowledge. Also, I emphasize that the logic of experimental design can be learned and a critical perception can be cultivated so that important research hypotheses are more likely to be evaluated in well-designed experiments. Fortunately, in a well-designed experiment the appropriate statistical methods are relatively easy to select (17, 20, 23). In short, I reassure students that their discipline-based ideas and techniques are extremely important but that any idea may be evaluated in either a well-designed or a poorly designed experiment. Consequently, they must not expect even an ingenious idea to be accepted when evaluated in a poorly designed experiment. They also should not expect a trivial idea to become important when evaluated in a well-designed experiment.
LECTURE-DISCUSSION SESSIONS
Topics covered in the initial 13 lecture-discussion sessions include an introduction to the design and analysis of experiments, confounding in experiments, factorial experiments with two or three factors, designs with repeated measures, interpretation of interactions, multiple comparison tests, and problems in generalizing experimental data (see Table 1). Students are expected to have some prior knowledge of common probability distributions and elementary test statistics. An introductory course in statistics is a prerequisite, and most students have completed a graduate-level course for scientists. The design knowledge of students is determined on a pretest entitled ‘‘PhD Qualifying Examination: Experimental Design and Statistical Inference’’ given at the first organizational meeting. This pretest consists of multiple-choice and short-answer questions. The multiple-choice questions are on very elementary concepts, and short answers are requested on questions such as the following: How many independent groups are necessary in a given between-group design with two factors?; What are the degrees of freedom (df) in a simple two-group design?; and What is an interaction? Test results are reviewed at the next class, but tests are retained by me. Hence, for the last three years the same pretest has been used. Much more demanding multiple-choice and short-answer questions are asked on a midterm examination on material covered in the introductory lectures and assigned readings. As examples, students are expected to be able to
reconstruct complete analysis of variance (ANOVA) tables (sources of variance, df, and F-ratios) for three-factor experimental designs with or without correlated (repeated) measures on some or all of the factors and to identify confounds common to different research designs, know how to control for common confounds, and know how to analyze and interpret two- and three-way interactions. The multiple-choice questions of the midterm examination also are reviewed only in class. The short-answer questions on experimental design change each year, and this section is returned to students.
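As one concrete illustration of the bookkeeping behind such ANOVA-table reconstructions, the short Python sketch below (not part of the course materials; the factor labels and cell size are invented) prints the degrees of freedom for a completely between-subject three-factor design with a, b, and c levels and n experimental units per cell.

def three_factor_between_df(a, b, c, n):
    # Degrees of freedom for a completely between-subject A x B x C design.
    df = {
        "A": a - 1,
        "B": b - 1,
        "C": c - 1,
        "A x B": (a - 1) * (b - 1),
        "A x C": (a - 1) * (c - 1),
        "B x C": (b - 1) * (c - 1),
        "A x B x C": (a - 1) * (b - 1) * (c - 1),
        "S/ABC (error)": a * b * c * (n - 1),
        "Total": a * b * c * n - 1,
    }
    # Sanity check: effect df plus error df must equal the total df.
    assert sum(v for k, v in df.items() if k != "Total") == df["Total"]
    return df

# Example: a hypothetical 2 x 2 x 3 between-subject design with 5 units per cell.
for source, value in three_factor_between_df(2, 2, 3, 5).items():
    print(f"{source:<15} df = {value}")

In a between-subject design each effect is tested against the same error term, S/ABC; designs with repeated measures partition the error further, as illustrated in the APPENDIX.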
The performances of students (n = 19) in the last three classes on these two examinations are shown in Fig. 1. A significant improvement in test scores from the pretest to the midterm examination occurred (P < 0.001). Overall, the percentage of correct responses increased from 39.8 (±12.6 SD) to 79.8 (±8.4 SD). No significant differences were found among the three classes in the two test scores or in the increases across the two tests.
TABLE 1
Lecture-discussions on experimental design and statistical inference
Introduction to
1) Experimental Design and Statistical Inference
2) The Anatomy of an Experiment and of a Scientific Paper
3) The Logic of Experimental Design and Statistical Inference
Advantages, Disadvantages and Confounds of
4) Between-Subject (experimental unit) Designs
5) Within-Subject Designs
6) Mixed Factorial Designs
Statistical Interpretation
7) Analysis of Variance (ANOVA) Models
8) Main Effects and Interactions in Factorial Designs
9) Interactions and Multiple Comparisons
Selected Statistical Topics
10) Regression and Correlation
11) Parametric and Nonparametric Tests
12) Correlated Measures
13) Data Analysis
FIG. 1. Performance of students (n = 19) in the last three classes on the pretest and the midterm examination.
Physiology students are encouraged to enroll in this course during the fall semester of their second year. Hence, students with limited backgrounds in statistics may do remedial work during the summer, either by taking a statistics course or by self-study. However, even with this opportunity the pretest performance of students is disappointing because most of them do not understand elementary concepts of experimental design (see Fig. 1).
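The specific statistical test behind the pretest-to-midterm comparison reported above (P < 0.001; Fig. 1) is not stated in the article; one plausible analysis of such paired scores is a within-student (paired) comparison. The sketch below uses invented numbers, not the actual class data.

from scipy import stats

# Invented scores (percent correct) for the same hypothetical students on
# the pretest and the midterm; the actual class data are not published.
pretest = [35, 42, 28, 51, 40, 33, 45, 38, 47, 30]
midterm = [72, 85, 70, 88, 79, 74, 83, 77, 86, 71]

# Paired comparison because each student contributes both scores.
t, p = stats.ttest_rel(midterm, pretest)
print(f"paired t = {t:.2f}, P = {p:.4g}")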
SEMINARS ON RESEARCH DESIGN PROBLEMS AND THEIR CRITIQUES
In the second half of the course the essentials of experimental design and statistical inference are reviewed by illustrating their applications and misapplications in experimental briefs. One hundred research design problems and their critiques have been published, and these are used as introductory and reference material (23). Each year students prepare and critique about 40 new problems.
Each student must prepare four new research design problems and their critiques (about one problem a week). The design problems are distributed to the class during the meeting before their scheduled evaluations. Critiques are given only to me. Each research design problem is expected to describe concisely the significance of the research, the research hypothesis, the preparation used, the experimental methods, the findings (including descriptive statistics for each subgroup or condition, statistical tests, and significance levels), the interpretation of the findings, and the author’s conclusions. Problems must avoid jargon, have specialized terms defined, and concentrate on general design and statistical errors (see Table 2). Essentially, a longer and more detailed abstract is expected. One of the four submitted problems must have no fatal experimental flaws.
The critiques of design problems must include a restatement of the research hypothesis, the ANOVA table, and whether the inferences of the researchers are appropriate. The submitted critique of a research design problem must be concise and cannot provide additional details of the experiment. The errors identified must be those that would make the author’s conclusions not acceptable given the information provided in the research problem. If a general methodological error is not clearly described (e.g., all control samples measured first), evaluators are instructed to assume that the measurement methods and procedures were appropriate for all conditions.
Three or four of these new problems are reviewed during each class. I select the sequence of problems for review and inform the class that the problems have been graded and that their criticisms will not change the grades. If students find additional confounds or statistical errors (and to my delight they have), I acknowledge my oversight and congratulate them on their insightful evaluations. Also, I inform the group that the author of the problem will not answer questions—the problem must stand alone. Graded problems (and critiques) are returned at the end of each class. I facilitate discussion by asking students whether they accept the author’s conclusions and, if not, why. In these discussions the emphasis is on design and inference errors, but a critical, not a cynical, perspective is encouraged. To be critical is not to deride or to demolish but to make exact and discriminating judgments. I emphasize that flaws may occur in biological studies but that every flaw is not fatal. Each student decides when to submit the requested problem without a fatal experimental flaw, and this uncertainty helps maintain the objectivity of the evaluators. In the APPENDIX are two representative design problems, their critiques, and issues that may be reviewed for each research problem.
Five critiques are also completed in class on problems that I prepared (<30 minutes for each problem).
TABLE 2
Potential confounds in between-subject, within-subject, and mixed factorial designs
I. Between-Subject (experimental unit) Design
• Selective bias in assignment of subjects (units)
• Selective loss of subjects (units) because of treatment conditions
• Treatment condition associated with another potential independent variable
II. Within-Subject Design
• General carry-over effects across treatments
• Differential carry-over effects from one treatment to another
• Time-related effects
• Treatment condition associated with another potential independent variable
III. Mixed Factorial Design
• Between-subject component: Subject selection and treatment-subject loss
• Within-subject component: Carry-over and time-related effects associated with repeated (correlated) measurements
After the students’ written critiques are submitted, potential confounds and statistical errors are discussed. At least one problem without a fatal experimental error is included in the assessment set. Three critique sessions are spaced throughout the second half of the course. The flawed problems have confounds and errors that have been repeatedly discussed but that now are embedded in different experimental contexts. The students’ critiques are graded and returned during the next class, when discussion may continue. Also, students may be requested to critique posters mounted outside of laboratories and at regional or national meetings.
The final problem set, which substitutes for a final examination, is due at the last class meeting. The first problem in the two-problem set must be a two- or three-factor design with repeated measures on at least one factor and must have a serious design flaw or inappropriate statistical inference. The critique format of this problem is the same as the format for all flawed problems. The second problem must correct the serious error in the first problem and must extend the research question(s) by using a mixed factorial design. For this latter problem the experimental design, statistical inferences, and researcher’s conclusions must all be appropriate.
STUDENT FEEDBACK
Evaluation forms are distributed a week before the last class meeting, returned at the last class, and summarized by department staff after final grades are assigned. Also, to guarantee future anonymity, students are requested to type their answers. The evaluation form asks for agreement/disagreement rankings on general statements and short answers to three open-ended questions. For the last three classes (n = 19) the percentages of ranks of strongly agree (SA), agree (A), disagree (DA), and strongly disagree (SDA) on three statements were as follows. 1) The general organization of the course enhances conceptual learning: 50% SA, 50% A. 2) Class discussions enhanced and enriched the learning experience: 69% SA, 31% A. 3) Faculty-student interactions in and out of class provided a positive learning experience: 55% SA, 45% A. Other general statements were included on the text, assigned readings, level of difficulty of the course, and subsequent course recommendations.
The three open-ended questions were as follows: What did you like about the course?; What did you like the least about the course?; and What would you change in the course? Most students enjoyed the interactive discussions on designs and potential confounds and appreciated the repetitive review of confounds and errors in different experimental problems. Many students requested more introductory lectures on applied statistics, some wanted more examples of data analysis, and a few requested calculation procedures and guided practice with computer statistics packages. To schedule these additions, most students recommended that the critiques completed in class on my design problems be reduced or eliminated. Some students were not comfortable with these assessments because they could not prepare for specific problems, had to decide relatively quickly to accept or reject the authors’ conclusions, and had to provide valid reasons for decisions to reject. Surprisingly, none of the students objected to the open evaluations of their research designs or written presentations (see Ref. 12).
RECOMMENDATIONS
The major advantage of the course format is that students are exposed to all aspects of research with an emphasis on whether an experiment is properly conceived, correctly controlled, adequately analyzed, correctly interpreted, and concisely presented. Students receive extensive practice in identifying serious design flaws, inappropriate statistical analyses, and invalid conclusions in many different experiments. Students also benefit from their practice in writing detailed abstracts and learn quickly to provide constructive criticism of the research designs and presentations of others. The long-term effects of these problem-solving exercises are difficult to quantify. However, informal assessments by mentors of the subsequent research efforts of their students indicate that significant improvement did occur. Also, evidence from cognitive science suggests that the skills of critical thinking must be practiced with a wide variety of problems for these skills to be learned and retained.
The course format can also be used with a small seminar group, a department, or a college, and participants normally maintain their interest in identifying flaws in new problems. A continuing benefit is that in subsequent consulting sessions with students, both partners use the same design and statistics terminology, which produces more productive discussions. The major shortcomings of focusing on design problems are that students receive limited exposure to real data sets and no practice in analyzing data using computer statistics programs.
Significant course modifications, of course, would be necessary if class size became larger than 10–12 students. With a large group, class evaluations of all submitted problems would not be feasible unless the number of requested problems was greatly reduced, thereby reducing the students’ experience in designing experiments. Some of the teaching exercises, however, can be adapted for use with large classes. For example, in 1996 at Monash University (Australia), after introductory lectures were given by other faculty, design problems in behavioral science were distributed to over 40 undergraduate students. These psychology honors students were very active in trying to identify flaws in the study problems. On the basis of informal assessment by the Head of the Department, students enjoyed and profited from these interactive discussions, and a critique of a published article is now part of the promotion assessment for the honors degree in psychology.
Although many statistics books, like the design course at Kentucky, focus on statistical reasoning (see, e.g., Refs. 4, 6, 10, 13, 18), many biological scientists continue to use flawed designs and give elaborate interpretations of data that most likely were produced by chance processes. To improve our scientific output, we must become more proficient in designing our own experiments and in conveying this knowledge efficiently to our students. To achieve these objectives, the three intertwined experimental phases of data collection, analysis, and interpretation must be integrated in our research planning, our students’ research proposals, and our research seminars. When discussing research articles we should critically evaluate all components—the research hypothesis, measurement methods and protocols, experimental design and statistical inferences, and authors’ conclusions. In
our department, first-year students are expected to read research articles in many areas of physiology. Obviously, given their performance on the design pretest (see Fig. 1), students cannot critically evaluate the experimental designs and statistical inferences in these articles. Hence, only research hypotheses and data collection methods can be adequately evaluated in seminar courses in the first year. This separation of discipline knowledge and protocols from experimental design perpetuates the myth of data collection as real science (an experiment) and data analysis and interpretation as only number crunching (statistics).
If nothing else, we should immediately convey to our students that all planned experiments must be well-conceived blueprints for data collection, analysis, and interpretation. As a teaching tool the evaluation of research design problems is an excellent way to reinforce this important message. The more students are exposed, early in their academic careers, to both good and bad experimental designs, the better their scientific output should become.
APPENDIX
Examples of Research Design Problems, Their Critiques, and Issues Reviewed
The following two research design problems (RDP) illustrate the evaluative experiences obtained by students. Design problems are prepared and critiqued after lectures on the essentials of biostatistics (see Table 1). Consequently, for some readers these two relatively brief problems may contain some unfamiliar material such as ANOVA tables with their sources of variances, df, and error terms for the proper F-ratios. The first problem is similar to introductory problems that I use, and the second, more detailed design is similar to those expected from students. In the design course, submitted problems can be no more than four double-spaced pages and their critiques no more than three double-spaced pages. Detailed instructions for the preparation of different types of problems and their critiques are available on request. At the end of each of these selected problems are issues, discussed in the first part of the course, that may be reviewed. One hundred sample design problems and their critiques have been published (see Ref. 23).
RDP 1. Plasma high-density lipoprotein (HDL) concentrations were measured in three groups of middle-aged men: inactive men, joggers, and marathon runners.

Mean Plasma HDL Concentrations and Standard Deviations (SD)
Inactive men (n = 30): 48.3 mg/dL, SD 10.2
Joggers (n = 30): 50.3 mg/dL, SD 19.6
Marathon runners (n = 30): 64.8 mg/dL, SD 11.9
A between-subject ANOVA was done, and there was a significant group effect, F(2,87) = 5.63, P < 0.01, and a Scheffé’s multiple comparison test indicated that the runners were significantly different from the inactive men (P < 0.05) but did not differ from the joggers and that the joggers and inactive men did not differ significantly. From these results, the researchers concluded that only very strenuous exercise will significantly increase HDL concentration; that is, jogging has no significant effect on HDL concentration.
Critique of RDP 1. The research hypothesis is whether HDL concentrations differ in middle-aged men who either run or jog, or are inactive. The ANOVA table for this study is
Source df Error term
Group 2 S/group
S/group 87
Total 89
This study, however, was not an experiment and does not permit causal inferences to be made. Classification variables are the independent variables in this study, and classification variables are always subject confounded. Inactive men, joggers, and marathon runners probably differ in their diets, their responses to stress, their self-esteem, their body weights, and so forth. Because these subject-related confounding variables cannot be controlled by the researcher, cause-and-effect conclusions are impossible when only classification variables are used. The data presented also make the author’s conclusions suspect. Inactive men and marathon runners are classes of individuals that are relatively easy to define. Inactive men do not walk, jog, or run. Marathon runners run in marathons (26 miles). However, joggers may jog very slowly once a week or rather briskly four to seven times a week. Notice that the standard deviation for the joggers is about twice that of the other two groups. Consequently, this group includes individuals who vary widely in their HDL levels and probably also in the amount of their weekly exercise. Individuals who jog briskly for 40 minutes at least 3 times a week may have the same HDL concentrations as many marathon runners. High HDL concentrations may, therefore, be associated with both moderate and strenuous exercise. Hence, the author’s conclusion that only very strenuous exercise will significantly increase HDL concentration is not justified because classification variables were used. Furthermore, a conclusion of only an association between strenuous exercise and high HDL concentration would be suspect because of the probable measurement error.
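For readers who want to try an RDP 1-style analysis, the sketch below (not part of the published problem set; the data are simulated, not the values reported in RDP 1) runs a single-factor between-subject ANOVA on HDL-like values for three groups of 30 men and adds Levene's test as one common check of the equal-variance assumption questioned in the critique. Scheffé's multiple comparison test is omitted because it is not available as a ready-made routine in scipy.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
inactive = rng.normal(48, 10, 30)   # simulated HDL values, mg/dL
joggers  = rng.normal(50, 20, 30)   # note the much larger spread
runners  = rng.normal(65, 12, 30)

# Single-factor between-subject ANOVA; with 3 groups of 30, df = (2, 87).
f, p = stats.f_oneway(inactive, joggers, runners)
print(f"F(2,87) = {f:.2f}, P = {p:.4g}")

# The critique notes that the joggers' SD is about twice that of the other
# groups; Levene's test is one common check of the equal-variance assumption.
w, p_var = stats.levene(inactive, joggers, runners)
print(f"Levene W = {w:.2f}, P = {p_var:.4g}")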
Issues that may be reviewed in addition to correlation/classification versus experimental study:
• ANOVA, single factor
• Familywise error rate
• Multiple comparison tests
• Scheffé’s test
• Power of test
• Standard deviation
• Heterogeneity of variances
• Measurement error
• ANOVA table, levels of a factor, definitions of a factor, df, and P value
• Assumptions of statistical test
RDP 2. The mammalian hippocampus has been proposed to be an important brain region for memory processing; consequently, interfering with hippocampal functioning should produce a greater effect on complex tasks (requiring extensive memory processing) than on simple tasks. To test this hypothesis, a biologist prepared a new type of knockout mouse in which the N-methyl-D-aspartate (NMDA) receptor was deleted specifically from the pyramidal cells of the hippocampus in the adult mouse in some littermates but not in others. This novel procedure apparently eliminates both developmental differences and other brain changes between the knockout mice and their normal littermates. These same outcomes are assumed to have occurred in this experiment. The behavior of 20 of these adult knockout mice was compared with that of 20 normal adult littermates (only one littermate pair of male mice from each litter) on a simple two-choice T-maze and on a complex radial arm (RA) maze with multiple choice points. The order of maze testing was counterbalanced so that one-half of the mice in each group were trained first on the T-maze and subsequently on the RA maze. The other 10 mice in each group were trained in the reverse order. Food reward was used in both tasks, and all mice were maintained at 85% of their pretraining weight. For all mice the number of trials to reach 90% correct responding on each maze was recorded.
A mixed factorial ANOVA indicated a significant group effect, F(1,38) = 8.35, P < 0.01, and a significant test effect, F(1,38) = 8.95, P < 0.01. The group × test interaction was not significant, F(1,38) = 1.20. Both the knockout and normal mice required more trials to learn the RA maze (41.2 ± 3.5, mean ± SD) than the T-maze (19.8 ± 3.7). Also, the knockout mice required more averaged trials to learn the two mazes (39.8 ± 3.6) compared with their controls (21.2 ± 2.9). From these findings, the biologist concluded that the loss of NMDA receptors from the neurons of the hippocampus in the male mouse has a greater effect on the learning of complex than simple tasks, a conclusion that supports the hypothesis that the hippocampus is involved in memory processing.
Critique of RDP 2. The ANOVA table for this mixed factorial design is

Source df Error term
Group 1 S/group
Test 1 Test × S/group
S/group 38
Group × test 1 Test × S/group
Test × S/group 38
Total 79
The statistical test selected was appropriate, but the biologist’s conclusions were wrong. If the researcher’s conclusions were correct, a significant group × test interaction would have been obtained. Subsequent analyses of this interaction should also have indicated that for the knockout mice the difference in errors made on the complex and simple tasks was greater than the trial error difference between these two behavioral tests for the normal mice. However, the group × test interaction, F(1,38) = 1.20, was not significant; consequently, the only conclusions the biologist could make would be interpretations based on the two reported main effects, namely, that the knockout mice required significantly more trials to learn the two mazes compared with their control littermates and that mice in both groups required significantly more trials to learn the complex than the simple maze. Therefore, no evidence exists that a deficiency in hippocampal function affected the mouse’s performance more on complex than on simple tasks.
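A hedged sketch of how an RDP 2-style mixed factorial ANOVA might be run follows; it assumes the third-party pingouin package (not mentioned in the article) and uses simulated trial counts constructed so that, as in RDP 2, both main effects are present but no group × test interaction is built in.

import numpy as np
import pandas as pd
import pingouin as pg   # third-party package; assumed to be installed

rng = np.random.default_rng(1)
rows = []
for group, knockout_penalty in [("knockout", 20), ("control", 0)]:
    for s in range(20):                                   # 20 mice per group
        subject = f"{group}-{s}"
        for test, base_trials in [("T-maze", 20), ("RA maze", 40)]:
            trials = base_trials + knockout_penalty + rng.normal(0, 4)
            rows.append({"subject": subject, "group": group,
                         "test": test, "trials": trials})
data = pd.DataFrame(rows)

# 'group' is the between-subject factor, 'test' the within-subject factor.
table = pg.mixed_anova(data=data, dv="trials", within="test",
                       between="group", subject="subject")
print(table)

Because the simulated knockout penalty is the same on both mazes, only the two main effects should reach significance here; obtaining the interaction that the biologist's conclusion requires would demand a larger penalty on the RA maze than on the T-maze.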
Issues that may be reviewed in addition to interaction interpretation:
• Mixed factorial design
• Main effects and interactions
• Between-subject and within-subject components
• Confounds, carryover effects
• Counterbalancing
• ANOVA table, levels of a factor, definitions of a factor, df, and P value
• Expected value of F-ratio
• Assumptions of statistical test
I am grateful to Brian Jackson, Daniel Richardson, and Eric Smart for helpful comments on this paper. I thank Kim Ng of Monash University for supporting the use of research design problems and their evaluations as a teaching method with a large class of advanced undergraduate students.
Address for reprint requests and other correspondence: J. F. Zolman, Dept. of Physiology, MS 508C Medical Center, College of Medicine, Univ. of Kentucky, Lexington, KY 40536-0298. (E-mail: [email protected]).
Received 18 November 1998; accepted in final form 25 August 1999.
References
1. Altman, D. G. Statistics in medical journals: developments in the 1980s. Stat. Med. 10: 1897–1913, 1991.
2. Altman, D. G. Statistical reviewing for medical journals. Stat. Med. 17: 2661–2674, 1998.
3. Bailar, J. C., III. The real threats to the integrity of science. Chronicle of Higher Education, April 21: B1–B2, 1995.
4. Bland, M. An Introduction to Medical Statistics (2nd ed.). Oxford, UK: Oxford Univ. Press, 1995.
5. Bradstreet, T. E. Teaching introductory statistics courses so that nonstatisticians experience statistical reasoning. American Statistician 50: 69–78, 1996.
6. Dawson-Saunders, B., and R. G. Trapp. Basic and Clinical Biostatistics (2nd ed.). East Norwalk, CT: Appleton and Lange, 1994.
7. Friedman, H. H., and L. W. Friedman. A new approach to teaching statistics: learning from misuses. New York Statistician 31: 1–3, 1980.
8. Glantz, S. A. Primer of Biostatistics (4th ed.). New York: McGraw-Hill, 1995.
9. Jaffe, A. J., and H. F. Spirer. Misused Statistics: Straight Talk for Twisted Numbers. New York: Marcel Dekker, 1987.
10. Keppel, G. Design and Analysis: A Researcher’s Handbook (3rd ed.). Englewood Cliffs, NJ: Prentice Hall, 1991.
11. Lederberg, J. Sloppy research extracts a greater toll than misconduct. The Scientist, February 20: 13, 1995.
12. Lightfoot, J. T. A different method of teaching peer review systems. Am. J. Physiol. 274 (Adv. Physiol. Educ. 19): S57–S61, 1998.
13. Munro, B. H. (Editor). Statistical Methods for Health Care Research (3rd ed.). Philadelphia, PA: Lippincott-Raven, 1997.
14. Nowak, R. Problems in clinical trials go far beyond misconduct. Science 264: 1541, 1994.
15. Riegelman, R. K., and R. P. Hirsch. Studying a Study and Testing a Test: How to Read the Health Science Literature (3rd ed.). Boston, MA: Little, Brown, 1996.
16. Scott, E. M., and J. M. Waterhouse. Physiology and the Scientific Method: The Design of Experiments in Physiology. Manchester, UK: Manchester Univ. Press, 1986.
17. Selwyn, M. R. Principles of Experimental Design for the Life Sciences. Boca Raton, FL: CRC, 1996.
18. Sokal, R. R., and F. J. Rohlf. Biometry: The Principles and Practice of Statistics in Biological Research (3rd ed.). New York: Freeman, 1995.
19. Swazey, J. P., M. S. Anderson, and K. S. Lewis. Ethical problems in academic research. Am. Sci. 81: 542–553, 1993.
20. Tweedie, R. L., K. L. Mengersen, and J. A. Eccleston. Garbage in, garbage out: can statisticians quantify the effects of poor data? Chance 7: 20–27, 1993.
21. Watts, D. G. Why is introductory statistics so difficult to learn? And what can we do to make it easier? American Statistician 45: 290–292, 1991.
22. Williamson, J. W., P. G. Goldschmidt, and T. Colton. The quality of medical literature: an analysis of validation assessments. In: Medical Uses of Statistics, edited by J. C. Bailar III and F. Mosteller. Cambridge, MA: N. Engl. J. Med., 1986, p. 370–391.
23. Zolman, J. F. Biostatistics: Experimental Design and Statistical Inference. New York: Oxford Univ. Press, 1993.