Analysis of statistical data

A useful electronic data bank at the end of the book contains a substantial collection of data. Most of the new exercises are based on new, data-based examples in the text.

ACKNOWLEDGMENTS

(Available at the companion website: www.wiley.com/college/johnson.) Contains a large number of supplementary questions for each chapter. To view a demonstration of WileyPLUS, contact your local Wiley sales representative or visit www.wiley.com/college/wileyplus.

3 DESCRIPTIVE STUDY OF BIVARIATE DATA

6 THE NORMAL DISTRIBUTION

7 VARIATION IN REPEATED SAMPLES —

8 DRAWING INFERENCES FROM LARGE SAMPLES

9 SMALL-SAMPLE INFERENCES

11 REGRESSION ANALYSIS — I

12 REGRESSION ANALYSIS — II

Multiple Linear Regression and Other Topics

APPENDIX A3 EXPECTATION AND

APPENDIX A4 THE EXPECTED VALUE AND STANDARD DEVIATION OF _

APPENDIX B TABLES

ANSWERS TO SELECTED ODD-NUMBERED EXERCISES

1. What Is Statistics?
2. Statistics in Our Everyday Life
3. Statistics in Aid of Scientific Inquiry
4. Two Basic Concepts: Population and Sample
5. The Purposeful Collection of Data
6. Statistics in Context
7. Objectives of Statistics
WHAT IS STATISTICS?
STATISTICS IN OUR EVERYDAY LIFE
STATISTICS IN AID OF SCIENTIFIC INQUIRY
TWO BASIC CONCEPTS — POPULATION AND SAMPLE
THE PURPOSEFUL COLLECTION OF DATA
STATISTICS IN CONTEXT
OBJECTIVES OF STATISTICS

Thus, the limitations of time, resources, and facilities, and sometimes the destructive nature of the testing, mean that we must work with incomplete information: the data actually collected in the course of an experimental study. These data are part of a much larger collection about which we want to make inferences, namely the set of measurements that would result if all the units in the population could be observed.

USING STATISTICS WISELY

Statistical concepts and methods make it possible to draw valid conclusions about the population based on a sample. The fundamental statistical concepts and methods described in this book are core to all areas of application.

KEY IDEAS

REVIEW EXERCISES

What proportion of the 20 experiments gives one of the students you like and one other? You want to estimate the proportion of Danes inside and decide to collect your sample by observing the first seven dogs that jump high enough to be seen over the fence. a) Explain why this is a self-selected sample that is likely to be highly misleading.

1. Introduction
2. Main Types of Data
3. Describing Data by Tables and Graphs
4. Measures of Center
5. Measures of Variation
6. Checking the Stability of the Observations over Time
7. More on Graphics
8. Statistics in Context
9. Review Exercises

The acidity of the first 50 rains recorded, measured on a pH scale from 1 (very acidic) to 7 (neutral), is summarized in the histogram. More research will undoubtedly improve our understanding of the acid rain problem and, we hope, lead to a better environment.

INTRODUCTION

MAIN TYPES OF DATA

Counts are inherently discrete and are treated as such, provided they take relatively few distinct values (for example, the number of children in a family or the number of traffic violations committed by a driver). When a count can range over a very large number of values, it is usually handled differently: the white blood cell count, the number of insects in a colony, and the number of shares traded per day are strictly discrete but are treated as continuous for practical purposes.

DESCRIBING DATA BY TABLES AND GRAPHS

CATEGORICAL DATA

The remainder of this chapter is devoted to a descriptive study of measurement data, both discrete and continuous. As with summarizing and annotating a long, wordy document, it is difficult to prescribe concrete steps for summary descriptions that work well for all types of measurement data.

Computation of numerical measures

DISCRETE DATA
DATA ON A CONTINUOUS VARIABLE

Note: Relative frequencies provide the most important information about the pattern of the data. A stem-and-leaf display retains the information carried by the leading digits of the data.

MEASURES OF CENTER

The calculation of the sample mean and its physical interpretation are illustrated in Example 6. Does the sample mean or the median give a better indication of the amount of mineral loss?
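The question of which measure of center is more informative can be explored with a small sketch. The values below are hypothetical and are not the data of Example 6; they only illustrate how a single large observation pulls the mean but not the median.

```python
from statistics import mean, median

# Hypothetical measurements (not from the text); one value is unusually large.
losses = [2.0, 3.0, 3.0, 4.0, 18.0]

sample_mean = mean(losses)      # sum of the observations divided by n
sample_median = median(losses)  # middle value of the ordered observations

print("mean:", sample_mean)
print("median:", sample_median)
```

Here the mean is three units above the median, which suggests that the median may describe the typical value better when the data contain an extreme observation.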

MEASURES OF VARIATION

One might feel that the average of the deviations from the mean would provide a numerical measure of spread. However, the deviations always sum to zero, so the sample variance is built from the squared deviations instead. To obtain a measure of variability in the same unit as the data, we take the positive square root of the variance, called the sample standard deviation. The standard deviation, rather than the variance, serves as a basic measure of variability. The sample interquartile range represents the length of the interval covered by the middle half of the observations.
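A minimal sketch of these measures, using hypothetical data, may help. It assumes the usual divisor n - 1 for the sample variance. The quartiles come from Python's `statistics.quantiles`, whose interpolation convention may differ from the one used in the text, so the interquartile range shown is illustrative only.

```python
from statistics import mean, quantiles, stdev, variance

data = [3.0, 5.0, 7.0, 9.0, 11.0]  # hypothetical observations

xbar = mean(data)
deviations = [x - xbar for x in data]
total_deviation = sum(deviations)  # deviations from the mean cancel out

n = len(data)
sample_variance = sum(d ** 2 for d in deviations) / (n - 1)
sample_sd = sample_variance ** 0.5  # same unit as the data

# Quartile conventions vary between texts and packages (assumption noted above).
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

print("sum of deviations:", total_deviation)
print("variance:", sample_variance, "library check:", variance(data))
print("standard deviation:", sample_sd, "library check:", stdev(data))
print("interquartile range:", iqr)
```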

CHECKING THE STABILITY OF THE OBSERVATIONS OVER TIME

There is a fairly strong downward trend for most of the period, so the exchange rate is certainly not in statistical control. According to the empirical rule, only about 5% of the observations will fall outside the control limits if the process is in statistical control, that is, if the observations are stable over time. Whenever an observation falls outside the control limits, an explanation should be sought.
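The check against control limits can be sketched as follows. The sketch assumes that the limits are placed two sample standard deviations on either side of the sample mean, which is consistent with the 5% figure from the empirical rule. The readings and the function names are hypothetical.

```python
from statistics import mean, stdev

def control_limits(observations):
    """Return (lower, upper) limits at mean - 2s and mean + 2s (assumed convention)."""
    xbar = mean(observations)
    s = stdev(observations)
    return xbar - 2 * s, xbar + 2 * s

def out_of_control(observations):
    """List (time index, value) pairs that fall outside the control limits."""
    lower, upper = control_limits(observations)
    return [(i, x) for i, x in enumerate(observations) if x < lower or x > upper]

# Hypothetical readings taken in time order; the last one is suspicious.
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 12.5]

flagged = out_of_control(readings)
print(control_limits(readings))
print(flagged)
```

Each flagged reading is a prompt to look for a cause, not proof that something went wrong.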

STATISTICS IN CONTEXT

Because it was near the end of her shift and the beginning of the weekend, the operator did not report the out-of-control reading to the setup person or foreman. She knew that the technician was already cleaning up before the end of the shift and that the foreman was probably thinking of going across the street to the Legion Bar for some refreshments once the shift was over. When the pressing machine was started on Monday morning, one of the stamps broke.

KEY IDEAS AND FORMULAS

Class intervals are non-overlapping and span the range of the data set from smallest to largest. A list of class intervals together with the corresponding relative frequencies provides a frequency distribution which can be displayed graphically as a histogram. The histogram is constructed to have a total area of 1, equal to the total relative frequency. Quartiles and, more generally, percentiles are other useful locators of the distribution of a data set.

TECHNOLOGY

REVIEW EXERCISES

2.115 The 50 measurements of acid rain in Wisconsin, whose histogram appears on the opening page of the chapter, are. a) Calculate the median and quartiles. Growth during the first year in freshwater is measured by the width of the growth rings for that period of life. How many of the alligators above the 90th percentile are female? a) Obtain the sample mean and standard deviation.

2. Summarization of Bivariate Categorical Data
3. A Designed Experiment for Making a Comparison

Prediction of One Variable from Another (Linear Regression)

Review Exercises

In search of clues about the origin and composition of the planets, scientists have performed chemical analyses of rocks collected by astronauts and unmanned space probes. The Apollo Moon landings made it possible to study the geology of the Moon firsthand. Other rocks typically have small amounts of both elements, indicating a positive correlation between hydrogen and carbon content.

SUMMARIZATION OF BIVARIATE CATEGORICAL DATA

In Example 1, you might want to compare the part-time work pattern for underclassmen with that of upperclassmen. A reversal of the comparison, as in Example 2, when data from several groups are combined is called Simpson's paradox. The outcome is "survival" if the patient lives for at least six weeks. a) Calculate the percentage of patients who survive surgery at each hospital.
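A small sketch can show how such a reversal arises. The counts below are hypothetical and are not those of Example 2 or of the hospital exercise; they are chosen so that Hospital A has the higher survival percentage within each patient condition but the lower percentage once the conditions are combined.

```python
# Hypothetical counts: (survived, total) by hospital and patient condition.
counts = {
    "Hospital A": {"good": (9, 10), "poor": (60, 100)},
    "Hospital B": {"good": (80, 100), "poor": (5, 10)},
}

def rate(survived, total):
    """Percentage of patients who survive."""
    return 100.0 * survived / total

def overall_rate(hospital):
    """Survival percentage after combining the two patient conditions."""
    survived = sum(s for s, _ in counts[hospital].values())
    total = sum(t for _, t in counts[hospital].values())
    return rate(survived, total)

for hospital, by_condition in counts.items():
    for condition, (survived, total) in by_condition.items():
        print(hospital, condition, round(rate(survived, total), 1))
    print(hospital, "combined", round(overall_rate(hospital), 1))
```

The reversal occurs because Hospital A treats mostly patients in poor condition, so the combined percentage hides the effect of patient condition.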

A DESIGNED EXPERIMENT FOR MAKING A COMPARISON

First, subjects or experimental units must be assigned to the two groups in such a way that neither method is favored. At the end of the study, the numbers of people in each group who abstained and who smoked were recorded. Two studies on the clinical effectiveness of the nicotine patch with different counseling treatments." Chest pp.
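One common way to assign units so that neither method is favored is to let chance decide. The sketch below assumes simple randomization into two groups of (nearly) equal size; the function name `randomize` and the subject labels are hypothetical.

```python
import random

def randomize(units, seed=None):
    """Split units into two groups by random shuffling (assumed design)."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

subjects = [f"subject_{i}" for i in range(1, 11)]  # hypothetical labels
treatment_group, control_group = randomize(subjects, seed=2024)

print("treatment:", treatment_group)
print("control:", control_group)
```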

SCATTER DIAGRAM OF BIVARIATE MEASUREMENT DATA

Example 3. A scatter diagram provides a visual representation of a relationship. Table 7 shows the data.

Example 4. Multiple Scatter Plot to Visually Compare Relationships. Environmentalists have expressed concern that pollutant spills are affecting wildlife in and around a neighboring lake. The concentrations of two steroids, estradiol and testosterone, were determined by radioimmunoassay. a) Make a scatter plot of the two concentrations for Lake Apopka alligators.

THE CORRELATION COEFFICIENT — A MEASURE OF LINEAR RELATION

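The correlation coefficient is best interpreted in terms of the standardized observations, or sample z-values. The sample correlation coefficient $r$ is the sum of the products of the standardized x observations and the standardized y observations, divided by $n - 1$:

$$r = \frac{1}{n - 1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) = \frac{S_{xy}}{\sqrt{S_{xx}}\,\sqrt{S_{yy}}}$$

The quantities $S_{xx} = \sum (x_i - \bar{x})^2$ and $S_{yy} = \sum (y_i - \bar{y})^2$ are the sums of squared deviations of the x observations and the y observations, respectively, and $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$ is the sum of the cross products of the deviations.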
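The two ways of computing the correlation coefficient can be compared in a short sketch. It assumes the standard definitions: z-values formed with the sample standard deviations, and sums of squared deviations and cross products about the means. The data and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def correlation_from_z(x, y):
    """Sum of products of standardized observations, divided by n - 1."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)
    products = (((xi - xbar) / sx) * ((yi - ybar) / sy) for xi, yi in zip(x, y))
    return sum(products) / (n - 1)

def correlation_from_sums(x, y):
    """Cross-product sum divided by the root of the two sums of squares."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / sqrt(sxx * syy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical predictor values
y = [2.0, 1.0, 4.0, 3.0, 5.0]  # hypothetical response values

r_z = correlation_from_z(x, y)
r_sums = correlation_from_sums(x, y)
print(r_z, r_sums)
```

Both routes give the same value apart from rounding, because the factor n - 1 cancels between the numerator and the two standard deviations.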

PREDICTION OF ONE VARIABLE FROM ANOTHER (LINEAR REGRESSION)

A least-squares fit of a straight line helps describe the relationship between the response (output) variable y and the predictor (input) variable x. In a spreadsheet, start with the values of the predictor variable in column A and the values of the response variable in column B. On a graphing calculator, enter the values of the predictor variable in L1 and the values of the response variable in L2.
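For readers without a statistical package, the least-squares line can also be sketched directly. The sketch assumes the standard formulas: the slope is the sum of cross products of deviations divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. The data and the function name are hypothetical.

```python
from statistics import mean

def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar  # line passes through (xbar, ybar)
    return intercept, slope

x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical predictor values
y = [2.0, 1.0, 4.0, 3.0, 5.0]  # hypothetical response values

intercept, slope = least_squares_line(x, y)
predicted_at_6 = intercept + slope * 6.0
print(intercept, slope, predicted_at_6)
```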

REVIEW EXERCISES

Eleven of the vaccinated people and 70 of the non-vaccinated people later contracted the disease. a) Present these data in the following two-way frequency table. The total tar yield is determined by laboratory analysis of the pool of smoke drawn by the machine. Use MINITAB (or another software package) to obtain the scatter plot, correlation coefficient, and regression line for: a) The GPA and GMAT score data from Table 7 in Example 3. 3.59 For fitting body length to weight for all wolves given in Table D.9 of the Data Bank, use MINITAB or another computer package to obtain:

Probability of an Event

Methods of Assigning Probability

4. Event Relations and Two Laws of Probability
5. Conditional Probability and Independence

7. Random Sampling from a Finite Population
8. Review Exercises

In general terms, the probability of an event is a numerical value that indicates how likely the event is to occur. We will see later that the probability of at least 12 heads on 15 tosses of a fair coin is 0.018, indicating that the event is unlikely to occur. So if we provisionally assume the model (or hypothesis) that the method is ineffective, twelve or more cures are very unlikely.
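The figure of 0.018 can be checked with a short sketch. It assumes the binomial model for independent tosses, which the text develops later; the function name `prob_at_least` is hypothetical.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """P(at least k successes in n independent trials with success probability p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))

p_tail = prob_at_least(12, 15)  # 12 or more heads in 15 tosses of a fair coin
print(round(p_tail, 3))  # 0.018
```

Of the 32768 equally likely head-tail sequences, only 576 have 12 or more heads, which is why the event is regarded as unlikely under the fair-coin model.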

PROBABILITY OF AN EVENT

The event B in which no head is obtained at all consists of the single element $e_4$, so $B = \{e_4\}$. The event A in which exactly one head is obtained consists of the elementary outcomes HT and TH. The probability of an event is the sum of the probabilities assigned to all the elementary outcomes contained in the event.

METHODS OF ASSIGNING PROBABILITY

EQUALLY LIKELY ELEMENTARY OUTCOMES — THE UNIFORM PROBABILITY MODEL
PROBABILITY AS THE LONG-RUN RELATIVE FREQUENCY

In many situations, it is not possible to construct a sample space where the ele-

A simple uniform probability model lies at the heart of Mendel's explanation of the mechanism of selection. The reliability of the uniform probability model can only be ascertained from a broad set of birth records. An offspring receives one gene from each parent. a) Construct the sample space for the genetic type of the offspring.

EVENT RELATIONS AND TWO LAWS OF PROBABILITY

The addition law expresses the probability of the larger event $A \cup B$ in terms of the probabilities of the smaller events $A$, $B$, and $AB$: $P(A \cup B) = P(A) + P(B) - P(AB)$. What is the probability that the selected puppies are either of the same sex or of the same age? A high concentration of any one of the chemicals and low concentrations of the other two.
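The addition law can be verified by direct counting in a uniform model. The sketch below uses one roll of a fair die as a hypothetical sample space; the events A and B are chosen only for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # hypothetical: one roll of a fair die

def prob(event):
    """Uniform model: favorable outcomes divided by all outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}  # the roll is even
B = {4, 5, 6}  # the roll exceeds 3

p_union = prob(A | B)                           # counted directly
p_by_law = prob(A) + prob(B) - prob(A & B)      # addition law
print(p_union, p_by_law)
```

Subtracting the probability of the intersection corrects for the outcomes 4 and 6, which would otherwise be counted twice.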

CONDITIONAL PROBABILITY AND INDEPENDENCE

Thus, the multiplication law of probability states that the conditional probability of an event multiplied by the probability of the conditioning event gives the probability of the intersection: $P(AB) = P(A \mid B)\,P(B)$. The event [exactly one flaw] is the union of two incompatible events. Information about the occurrence of B then has no bearing on the assessment of the probability of A.
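A brief sketch may make the multiplication law and the idea of independence concrete. It uses two tosses of a fair coin as a hypothetical uniform sample space and takes the conditional probability to be the probability of the intersection divided by the probability of the conditioning event.

```python
from fractions import Fraction
from itertools import product

# Hypothetical uniform sample space: two tosses of a fair coin.
sample_space = {"".join(t) for t in product("HT", repeat=2)}

def prob(event):
    return Fraction(len(event), len(sample_space))

def conditional(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(event & given) / prob(given)

A = {o for o in sample_space if o[0] == "H"}  # head on the first toss
B = {o for o in sample_space if o[1] == "H"}  # head on the second toss

p_joint = conditional(A, B) * prob(B)   # multiplication law
independent = conditional(A, B) == prob(A)
print(p_joint, prob(A & B), independent)
```

Here knowing that B occurred leaves the probability of A unchanged, so the two events are independent in this model.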

BAYES’ THEOREM

If the selected mouse is found to be infected, what is the probability that it is female? If the selected mouse turns out to be male, what is the probability that it is infected? If a repair turns out to be incomplete, what is the probability that the repair was done by Karl?
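Questions of this kind reverse the direction of conditioning, which is what Bayes' theorem does. The sketch below uses hypothetical probabilities, not the figures of the exercise, and the variable names are illustrative.

```python
from fractions import Fraction

# Hypothetical inputs (not from the text).
p_female = Fraction(2, 5)
p_male = 1 - p_female
p_infected_given_female = Fraction(3, 10)
p_infected_given_male = Fraction(1, 10)

# Total probability of infection, split over the two sexes.
p_infected = (p_infected_given_female * p_female
              + p_infected_given_male * p_male)

# Bayes' theorem: P(female | infected).
p_female_given_infected = p_infected_given_female * p_female / p_infected
print(p_infected, p_female_given_infected)
```

Under these assumed figures, females make up 40% of the mice but two thirds of the infected ones.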

RANDOM SAMPLING FROM A FINITE POPULATION

To motivate the formula, let us consider the number of possible choices (or sets) of three letters out of the seven letters {a, b, c, d, e, f, g}. Notation: The number of possible choices of r objects from a group of N distinct objects is denoted by $\binom{N}{r}$, which is read as "N choose r." SOLUTION The number of ways in which two persons can be selected out of five is given by $\binom{5}{2} = \frac{5 \times 4}{2 \times 1} = 10$.
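The counts discussed here can be confirmed by listing the choices explicitly. The sketch uses Python's `itertools.combinations` to enumerate the sets and `math.comb` to evaluate "N choose r".

```python
from itertools import combinations
from math import comb

letters = "abcdefg"

# Every unordered choice of three letters out of seven.
choices = list(combinations(letters, 3))
n_choices = len(choices)

print(n_choices, comb(7, 3))  # both count the same sets
print(comb(5, 2))             # ways to select two persons out of five
```

Listing the sets is practical only for small N; the formula gives the count without enumeration.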

1. Introduction
2. Random Variables
3. Probability Distribution
4. Expectation (Mean) and Standard Deviation of a Probability Distribution
5. Successes and Failures: Bernoulli Trials
6. The Binomial Distribution
7. The Binomial Distribution in Context
8. Review Exercises

The relative frequencies from a long record of summer days lead to an approximate distribution of the number of rescues per day. Often the results of an experiment are numerical values: for example, the daily number of burglaries in a city, the hourly wages of students on summer jobs, and scores on a university exam. If a new vaccine is tested on 100 people, the information relevant to an evaluation of the vaccine may be the number of responses in the categories - severe, moderate or no nausea.

RANDOM VARIABLES

SOLUTION Scanning our list, we now identify the events (i.e., the sets of elementary outcomes) that correspond to the different values of X. From this example, we observe the following general facts. (Use A, B, and C to denote the three children.) (b) Let X be the number of times Carol is.

PROBABILITY DISTRIBUTION OF A DISCRETE RANDOM VARIABLE

The fair coin model implies that the eight elementary outcomes are equally likely, so each outcome is assigned the probability $\frac{1}{8}$. The event $[X = 0]$ has the single outcome TTT, so its probability is $\frac{1}{8}$. Similarly, the probabilities of $[X = 1]$, $[X = 2]$, and $[X = 3]$ turn out to be $\frac{3}{8}$, $\frac{3}{8}$, and $\frac{1}{8}$, respectively. A probability distribution, or probability function, describes the way in which the total probability 1 is distributed over the individual values of the random variable. That is, knowing that the first student selected prefers Internet news does not change the probability that the second student prefers Internet news, and so on.
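The distribution of the number of heads in three tosses can be obtained by enumerating the eight outcomes. This is a minimal sketch under the fair coin model, with X taken to be the number of heads.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The eight equally likely outcomes of three tosses: HHH, HHT, ..., TTT.
outcomes = ["".join(t) for t in product("HT", repeat=3)]

# X = number of heads; count how many outcomes give each value of X.
counts = Counter(outcome.count("H") for outcome in outcomes)
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(x, p)
print("total:", sum(distribution.values()))
```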