Percentiles and Quartiles

From the document Statistics for Business and Economics (pages 65-70)

Percentiles and quartiles are measures that indicate the location, or position, of a value relative to the entire set of data. Suppose you are told that you scored in the 92nd percentile on your SAT mathematics exam. This means that approximately 92% of the students who took this exam scored lower than you and approximately 8% of the students who took this exam scored higher than you. Percentiles and quartiles are generally used to describe large data sets, such as sales data, survey data, or even the weights of newborn babies. Pediatricians will measure a baby’s weight in terms of percentiles. A newborn who weighs in the 5th percentile is quite small in comparison to a newborn in the 95th percentile in weight (Grummer-Strawn, Reinold, and Krebs 2010).

Statisticians do not agree on one best method to calculate percentiles and quartiles and propose different ways to calculate these measures (Langford 2006). Slightly different values for percentiles and quartiles are found using various computer software packages (such as SPSS, SAS, MINITAB, JMP), Excel, or different calculators. In this book we rely on linear interpolation between ranked values and identify the location of percentiles and quartiles, as given in Equations 2.6, 2.7, and 2.8.

Suppose that the annual growth rate is actually 5%; then the total growth over 5 years will be

$(1.05)(1.05)(1.05)(1.05)(1.05) = 1.2763$

or 27.63%. However, the annual growth rate, $r$, that would yield 25% over 5 years must satisfy this equation:

$(1 + r)^5 = 1.25$

First, solve for the geometric mean:

$x_g = 1 + r = (1.25)^{1/5} = 1.046$

The geometric mean growth rate is $r_g = 0.046$, or 4.6%.
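The arithmetic in this growth-rate illustration can be checked numerically. The following is a minimal Python sketch; the function name geometric_mean_growth is an illustrative choice and does not come from the text.

```python
def geometric_mean_growth(total_growth_factor: float, periods: int) -> float:
    """Return the per-period rate r satisfying (1 + r)**periods == total_growth_factor."""
    return total_growth_factor ** (1 / periods) - 1


# Compounding 5% a year for 5 years gives about 27.63% total growth.
compounded = 1.05 ** 5

# The annual rate that yields exactly 25% over 5 years is about 4.6%.
r_g = geometric_mean_growth(1.25, 5)

print(f"{compounded:.4f}")  # 1.2763
print(f"{r_g:.3f}")         # 0.046
```

Compounding the computed rate over 5 periods recovers the 1.25 growth factor, which is the defining property of the geometric mean growth rate.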

Percentiles and Quartiles

To find percentiles and quartiles, data must first be arranged in order from the smallest to the largest values.

The Pth percentile is a value such that approximately P% of the observations are at or below that number. Percentiles separate large ordered data sets into 100ths. The 50th percentile is the median.

The Pth percentile is found as follows:

Pth percentile = value located in the (P/100)(n + 1)th ordered position (2.6)

Quartiles are descriptive measures that separate large data sets into four quarters. The first quartile, Q1 (or 25th percentile), separates approximately the smallest 25% of the data from the remainder of the data. The second quartile, Q2 (or 50th percentile), is the median (see Equation 2.3). The third quartile, Q3 (or 75th percentile), separates approximately the smallest 75% of the data from the remaining largest 25% of the data.

Q1 = the value in the 0.25(n + 1)th ordered position (2.7)

Q2 = the value in the 0.50(n + 1)th ordered position

Q3 = the value in the 0.75(n + 1)th ordered position (2.8)

2.1 Measures of Central Tendency and Location 65

In describing numerical data, we often refer to the five-number summary. In Section 2.2 we present a graph of the five-number summary called a box-and-whisker plot.
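The ordered-position rule of Equation 2.6, with linear interpolation between ranked values, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name percentile_n_plus_1 and the seven-value sample are hypothetical, and positions that fall outside 1..n are clamped to the smallest or largest value, a convention the text does not specify.

```python
import math


def percentile_n_plus_1(data, p):
    """P-th percentile via the (P/100)(n + 1) ordered position with linear interpolation."""
    ordered = sorted(data)
    n = len(ordered)
    position = (p / 100) * (n + 1)      # 1-based ordered position, may be fractional
    # Assumption: clamp positions outside 1..n to the extremes.
    if position <= 1:
        return ordered[0]
    if position >= n:
        return ordered[-1]
    lower = math.floor(position)        # ordered position at or just below the target
    fraction = position - lower         # how far to move toward the next ranked value
    low_value = ordered[lower - 1]      # convert the 1-based position to a 0-based index
    high_value = ordered[lower]
    return low_value + fraction * (high_value - low_value)


sample = [40, 10, 70, 30, 60, 20, 50]   # hypothetical data, n = 7
q1 = percentile_n_plus_1(sample, 25)    # position 0.25 * 8 = 2
q2 = percentile_n_plus_1(sample, 50)    # position 0.50 * 8 = 4
q3 = percentile_n_plus_1(sample, 75)    # position 0.75 * 8 = 6
print(q1, q2, q3)  # 20.0 40.0 60.0
```

With n = 7 the quartile positions are whole numbers, so no interpolation is needed; a fractional position such as 3.25 would return the 3rd ordered value plus 25% of the gap to the 4th.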

Five-Number Summary

The five-number summary refers to the five descriptive measures: minimum, first quartile, median, third quartile, and maximum.

minimum < Q1 < median < Q3 < maximum

To illustrate the use of Equations 2.7 and 2.8, we include Example 2.5 with only n = 12 observations. For such a small sample size, one would rarely compute these values in practice. Percentiles and quartiles are generally used to describe large data sets. Example 2.6 has n = 104 observations and Example 2.7 has n = 4,460 observations.

Example 2.5 Demand for Bottled Water (Quartiles)

In Example 2.1 we found the measures of central tendency for the number of 1-gallon bottles of water sold in a sample of 12 hours in one store in Florida during hurricane season. In particular, the median was found to be 73.5 bottles. Find the five-number summary.

Solution We arrange the data from Example 2.1 in order from least to greatest.

60 63 65 67 70 72 75 75 80 82 84 85

Using Equation 2.7, we find the first quartile, Q1, as follows:

Q1 = the value located in the 0.25(12 + 1)th ordered position

Q1 = the value located in the 3.25th ordered position

The value in the 3rd ordered position is 65 bottles, and the value in the 4th ordered position is 67 bottles. The first quartile is found as follows:

Q1 = 65 + 0.25(67 - 65)

Q1 = 65 + 0.50 = 65.5 bottles

Using Equation 2.8, the third quartile, Q3, is located in the 0.75(12 + 1)th ordered position—that is, the value in the 9.75th ordered position. The value in the 9th ordered position is 80 bottles and the value in the 10th ordered position is 82 bottles. The third quartile is calculated as follows:

Q3 = 80 + 0.75(82 - 80)

Q3 = 80 + 0.75(2) = 81.5 bottles

The five-number summary for this data is as follows:

minimum < Q1 < median < Q3 < maximum

60 < 65.5 < 73.5 < 81.5 < 85
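As a cross-check on Example 2.5, Python's standard statistics module can reproduce these quartiles. Its statistics.quantiles function with method="exclusive" interpolates at the (n + 1)-based ordered positions, which corresponds to Equations 2.7 and 2.8; method="inclusive" uses a different convention and illustrates the earlier remark that software packages can disagree slightly. The variable names are illustrative, and this is only a sketch, not the procedure used in the book.

```python
import statistics

# Bottles sold per hour, from Example 2.5 (already ordered)
bottles = [60, 63, 65, 67, 70, 72, 75, 75, 80, 82, 84, 85]

# "exclusive" uses the (n + 1)-based position, matching Equations 2.7 and 2.8
q1, median, q3 = statistics.quantiles(bottles, n=4, method="exclusive")
five_number_summary = (min(bottles), q1, median, q3, max(bottles))
print(five_number_summary)  # (60, 65.5, 73.5, 81.5, 85)

# "inclusive" interpolates over (n - 1) intervals and gives different quartiles
alt_q1, alt_median, alt_q3 = statistics.quantiles(bottles, n=4, method="inclusive")
print(alt_q1, alt_median, alt_q3)  # 66.5 73.5 80.5
```

Both conventions agree on the median here, but the first and third quartiles differ by a full bottle, which is why it matters to know which rule a package applies.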

66 Chapter 2 Using Numerical Measures to Describe Data

Statistical software packages are useful to describe data when the sample size is very large. In Chapter 1 we developed bar charts to graph one of the categorical variables, activity level, from the Healthy Eating Index–2005 (Figure 1.1 to Figure 1.3).

Now, in Example 2.7 we find the five-number summary for the HEI–2005 data using Minitab.

Example 2.6 Shopping Times at a Mall (Percentiles)

In an endeavor to increase sales at a local mall, the management gathered data on the amount of time that current shoppers spend in the mall. A random sample of n = 104 shoppers was timed, and the results (in minutes) are listed in Table 2.1 and contained in the data file Shopping Times. Find the 25th and 85th percentiles.

Table 2.1 Shopping Times

18 34 42 37 19 37 30 40 28 34 71 18

46 42 34 30 21 23 40 37 57 69 73 47

45 38 34 25 34 23 37 20 63 57 73 52

20 31 18 42 25 40 21 40 57 69 71 55

33 38 30 41 18 31 34 18 63 57 70 25

33 21 48 34 25 45 34 21 31 70 69

21 37 51 50 25 51 42 52 67 18 68

31 37 52 52 43 45 43 18 25 70 64

23 30 19 50 59 60 60 68 69 70 59

Solution The first step is to sort the data in the data file Shopping Times from smallest to largest. Using Equation 2.6, we find the 25th percentile as follows:

25th percentile = the value located in the 0.25(n + 1)th ordered position

25th percentile = the value located in the 0.25(104 + 1)th ordered position

25th percentile = the value located in the 26.25th ordered position

The value in the 26th ordered position is 28 minutes, and the value in the 27th ordered position is 30 minutes. The 25th percentile is found as follows:

25th percentile = 28 + 0.25(30 - 28) = 28.5 minutes

Similarly, we use Equation 2.6 to locate the 85th percentile as follows:

85th percentile = the value located in the 0.85(104 + 1)th ordered position

85th percentile = the value located in the 89.25th ordered position

Since the value in the 89th ordered position is 64 minutes and the value in the 90th ordered position is 67 minutes, the value in the 89.25th ordered position lies 25% of the distance from 64 to 67. The 85th percentile is found as follows:

85th percentile = 64 + 0.25(67 - 64) = 64 + 0.75 = 64.75 minutes

Approximately 85% of the shoppers in our sample spend less than 64.75 minutes at the mall.
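The percentiles in Example 2.6 can be reproduced from the Table 2.1 values. The sketch below assumes the 104 values are transcribed correctly from the table and uses Python's statistics.quantiles with method="exclusive", which interpolates at the same (n + 1)-based positions as Equation 2.6; the variable names are illustrative.

```python
import statistics

# Shopping times in minutes, transcribed row by row from Table 2.1 (n = 104)
shopping_times = [
    18, 34, 42, 37, 19, 37, 30, 40, 28, 34, 71, 18,
    46, 42, 34, 30, 21, 23, 40, 37, 57, 69, 73, 47,
    45, 38, 34, 25, 34, 23, 37, 20, 63, 57, 73, 52,
    20, 31, 18, 42, 25, 40, 21, 40, 57, 69, 71, 55,
    33, 38, 30, 41, 18, 31, 34, 18, 63, 57, 70, 25,
    33, 21, 48, 34, 25, 45, 34, 21, 31, 70, 69,
    21, 37, 51, 50, 25, 51, 42, 52, 67, 18, 68,
    31, 37, 52, 52, 43, 45, 43, 18, 25, 70, 64,
    23, 30, 19, 50, 59, 60, 60, 68, 69, 70, 59,
]

ordered = sorted(shopping_times)
n = len(ordered)

# Ordered positions from Equation 2.6
position_25 = 25 * (n + 1) / 100   # 26.25
position_85 = 85 * (n + 1) / 100   # 89.25

# Neighbouring ordered values used in the interpolation (list indices are 0-based)
print(ordered[25], ordered[26])    # 26th and 27th values: 28 30
print(ordered[88], ordered[89])    # 89th and 90th values: 64 67

# Cutting into 100 parts yields the 1st through 99th percentiles
percentiles = statistics.quantiles(ordered, n=100, method="exclusive")
p25 = percentiles[24]
p85 = percentiles[84]
print(p25, p85)                    # 28.5 64.75
```

The list returned for n=100 holds 99 cut points, so the Pth percentile sits at index P - 1.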


Example 2.7 Healthy Eating Index–2005 (Five-Number Summary)

The HEI–2005 measures how well the population follows the recommendations of the 2005 Dietary Guidelines for Americans (Guenther et al. 2007). The HEI measures, on a 100-point scale, the adequacy of consumption of vegetables, fruits, grains, milk, meat and beans, and liquid oils. This scale is titled HEI2005 in the data file HEI Cost Data Variable Subset.

We saw in Example 1.1 that the data file HEI Cost Data Variable Subset contains considerable information on randomly selected individuals who participated in an extended interview and medical examination. Recall that there are two interviews for each person in the study. Results for the first interview are identified by daycode = 1, and data for the second interview are identified by daycode = 2. Other variables in the data file are described in the data dictionary in the Chapter 10 appendix. Find the five-number summary of the HEI scores taken during the first interview for both males (code = 0) and females (code = 1).

Solution Since the data file contains n = 4,460 observations, we use Minitab to obtain the measures in the five-number summary (Figure 2.2).

Figure 2.2 Healthy Eating Index–2005 Scores: First Interview (Five-Number Summary)

Descriptive Statistics: HEI2005 (Females; First Interview)

Variable  N      Minimum  Q1      Median  Q3      Maximum
HEI2005   2,321  11.172   42.420  53.320  63.907  92.643

Descriptive Statistics: HEI2005 (Males; First Interview)

Variable  N      Minimum  Q1      Median  Q3      Maximum
HEI2005   2,139  13.556   39.644  49.674  59.988  99.457

EXERCISES

Visit www.mymathlab.com/global or www.pearsonglobaleditions.com/newbold to access the data files.

Basic Exercises

2.1 A random sample of 5 weeks showed that a cruise agency received the following number of weekly specials to the Caribbean:

20 73 75 80 82

a. Compute the mean, median, and mode.

b. Which measure of central tendency best describes the data?

2.2 A department-store manager is interested in the number of complaints received by the customer-service department about the quality of electrical products sold by the store. Records over a 5-week period show the following number of complaints for each week:

13 15 8 16 8

a. Compute the mean number of weekly complaints.

b. Calculate the median number of weekly complaints.

c. Find the mode.

2.3 Ten economists were asked to predict the percentage growth in the Consumer Price Index over the next year. Their forecasts were as follows:

3.6 3.1 3.9 3.7 3.5 3.7 3.4 3.0 3.7 3.4

a. Compute the sample mean.

b. Compute the sample median.

c. Find the mode.

2.4 A department-store chain randomly sampled 10 stores in a state. After a review of sales records, it was found that, compared with the same period last year, the following percentage increases in dollar sales had been achieved over the Christmas period this year:

10.2 3.1 5.9 7.0 3.7 2.9 6.8 7.3 8.2 4.3

a. Calculate the mean percentage increase in dollar sales.

b. Calculate the median.

2.5 A sample of 12 senior executives found the following results for percentage of total compensation derived from bonus payments:

15.8 17.3 28.4 18.2 15.0 24.7 13.1 10.2 29.3 34.7 16.9 25.3

a. Compute the sample median.

b. Compute the sample mean.

2.6 During the last 3 years Consolidated Oil Company expanded its gasoline stations into convenience food stores (CFSs) in an attempt to increase total sales revenue. The daily sales (in hundreds of dollars) from a random sample of 10 weekdays from one of its stores are:

6 8 10 12 14 9 11 7 13 11

a. Find the mean, median, and mode for this store.

b. Find the five-number summary.

2.7 A textile manufacturer obtained a sample of 50 bolts of cloth from a day’s output. Each bolt is carefully inspected and the number of imperfections is recorded as follows:

Number of imperfections   0   1   2   3
Number of bolts          35  10   3   2

Find the mean, median, and mode for these sample data.

2.8 The ages of a sample of 12 students enrolled in an online macroeconomics course are as follows:

21 22 27 36 18 19 22 23 22 28 36 33

a. What is the mean age for this sample?

b. Find the median age.

c. What is the value of the modal age?

Application Exercises

2.9 A random sample of 156 grade point averages for students at one university is stored in the data file Grade Point Averages.

a. Compute the first and third quartiles.

b. Calculate the 30th percentile.

c. Calculate the 80th percentile.

2.10 A sample of 33 accounting students recorded the number of hours spent studying the course material during the week before the final exam. The data are stored in the data file Study.

a. Compute the sample mean.

b. Compute the sample median.

c. Comment on symmetry or skewness.

d. Find the five-number summary for this data.

2.11 The data file Sun contains the volumes for a random sample of 100 bottles (237 mL) of a new suntan lotion.

a. Find and interpret the mean volume.

b. Determine the median volume.

c. Are the data symmetric or skewed? Explain.

d. Find the five-number summary for this data.

2.2 MEASURES OF VARIABILITY

The mean alone does not provide a complete or sufficient description of data. In this section we present descriptive numbers that measure the variability or spread of the observations from the mean. In particular, we include the range, interquartile range, variance, standard deviation, and coefficient of variation.

No two things are exactly alike. Variation exists in all areas. In sports, the star basketball player might score five 3-pointers in one game and none in the next or play 40 minutes in one game and only 24 minutes in the next. The weather varies greatly from day to day and even from hour to hour; grades on a test differ for students taking the same course with the same instructor; a person’s blood pressure, pulse, cholesterol level, and caloric intake will vary daily. In business, variation is seen in sales, advertising costs, the percentage of product complaints, the number of new customers, and so forth.

While two data sets could have the same mean, the individual observations in one set could vary more from the mean than do the observations in the second set. Consider the following two sets of sample data:

Sample A: 1 2 1 36

Sample B: 8 9 10 13

Although the mean is 10 for both samples, clearly the data in sample A are farther from 10 than are the data in sample B. We need descriptive numbers to measure this spread.
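For Sample A and Sample B in Section 2.2, a short Python sketch makes the contrast concrete: the means agree, while the range (largest minus smallest observation, one of the measures named in that section) does not. The variable names and the results dictionary are illustrative choices.

```python
import statistics

sample_a = [1, 2, 1, 36]
sample_b = [8, 9, 10, 13]

results = {}
for name, sample in (("A", sample_a), ("B", sample_b)):
    mean = statistics.mean(sample)
    data_range = max(sample) - min(sample)   # largest minus smallest observation
    results[name] = (mean, data_range)
    print(f"Sample {name}: mean = {mean}, range = {data_range}")
```

Both samples have a mean of 10, but the range is 35 for Sample A and 5 for Sample B, so the mean alone hides a sevenfold difference in spread.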
