
Mean, Median, and Mode

From the document Statistics for Business and Economics (pages 61-64)

In Chapter 1 we introduced the terms parameter and statistic. A parameter refers to a specific population characteristic; a statistic refers to a specific sample characteristic. Measures of central tendency are usually computed from sample data rather than from population data. One measure of central tendency that quickly comes to mind is the arithmetic mean, usually just called the mean, or average.

Arithmetic Mean

The arithmetic mean (or simply mean) of a set of data is the sum of the data values divided by the number of observations. If the data set is the entire population of data, then the population mean, μ, is a parameter given by

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N} \qquad (2.1)$$

where N = population size and Σ means “the sum of.”

If the data set is from a sample, then the sample mean, x̄, is a statistic given by

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad (2.2)$$

where n = sample size. The mean is appropriate for numerical data.
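Equations (2.1) and (2.2) differ only in whether the data are the whole population or a sample; the arithmetic is the same. A minimal Python sketch, where `arithmetic_mean` is a hypothetical helper name (the standard library's `statistics.mean` computes the same quantity):

```python
from typing import Sequence


def arithmetic_mean(values: Sequence[float]) -> float:
    """Sum of the data values divided by the number of observations.

    Applied to all N population values this gives the parameter mu (2.1);
    applied to n sample values it gives the statistic x-bar (2.2).
    """
    if len(values) == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


print(arithmetic_mean([2, 4, 9]))  # 5.0
```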

Median

The median is the middle observation of a set of observations that are arranged in increasing (or decreasing) order. If the sample size, n, is an odd number, the median is the middle observation. If the sample size, n, is an even number, the median is the average of the two middle observations. The median will be the number located in the

$$0.5(n + 1)\text{th ordered position} \qquad (2.3)$$

To locate the median, we must arrange the data in either increasing or decreasing order.
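The position rule (2.3) can be sketched in Python as follows. The name `median_by_position` is hypothetical; its result should agree with the standard library's `statistics.median`.

```python
import math
from typing import Sequence


def median_by_position(values: Sequence[float]) -> float:
    """Median found at the 0.5(n + 1)th ordered position, as in (2.3)."""
    if len(values) == 0:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)                # arrange the data in increasing order
    position = 0.5 * (len(ordered) + 1)     # 1-based ordered position
    lower = ordered[math.floor(position) - 1]
    upper = ordered[math.ceil(position) - 1]
    # Odd n: position is a whole number, so lower and upper are the same value.
    # Even n: position ends in .5, so we average the two middle observations.
    return (lower + upper) / 2


print(median_by_position([7, 1, 4]))      # 4.0
print(median_by_position([7, 1, 4, 10]))  # 5.5
```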

Mode

The mode, if one exists, is the most frequently occurring value. A distribution with one mode is called unimodal; with two modes, it is called bimodal; and with more than two modes, the distribution is said to be multimodal. The mode is most commonly used with categorical data.
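A short Python sketch of finding the mode(s) and labeling the distribution. It assumes one common convention that the text does not spell out: when every value occurs exactly once, there is no mode. The helper name `modes` is hypothetical. Because it only counts values, it also works for categorical data such as shirt sizes.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def modes(values: Iterable[Hashable]) -> Tuple[List[Hashable], str]:
    """Return the most frequent value(s) and a unimodal/bimodal/multimodal label."""
    counts = Counter(values)
    if not counts:
        raise ValueError("the mode of an empty data set is undefined")
    top = max(counts.values())
    if top == 1:
        # Assumed convention: if no value repeats, no mode exists.
        return [], "no mode"
    most_frequent = [value for value, count in counts.items() if count == top]
    label = {1: "unimodal", 2: "bimodal"}.get(len(most_frequent), "multimodal")
    return most_frequent, label


print(modes([1, 2, 2, 3]))               # ([2], 'unimodal')
print(modes(["M", "L", "M", "S", "L"]))  # (['M', 'L'], 'bimodal')
```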

2.1 Measures of Central Tendency and Location 61

Example 2.1 Demand for Bottled Water (Measures of Central Tendency)

The demand for bottled water increases during the hurricane season in Florida. The number of 1-gallon bottles of water sold for a random sample of n = 12 hours in one store during hurricane season is:

60 84 65 67 75 72 80 85 63 82 70 75

Describe the central tendency of the data.

Solution The average or mean hourly number of 1-gallon bottles of water demanded is found as follows:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{60 + 84 + \cdots + 75}{12} = 73.17$$

Next, we arrange the sales data from least to greatest sales:

60 63 65 67 70 72 75 75 80 82 84 85

and find that the median sales value is located in the 0.5(12 + 1) = 6.5th ordered position; that is, the median number of 1-gallon bottles of water is midway between the 6th and 7th ordered data points: (72 + 75)/2 = 73.5 bottles. The mode is 75 bottles, the only value that occurs more than once.
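The hand calculations in Example 2.1 can be checked with Python's standard `statistics` module; `multimode` returns every most-frequent value, which makes a tie visible if one exists.

```python
import statistics

# Hourly sales of 1-gallon bottles from Example 2.1 (n = 12).
bottles = [60, 84, 65, 67, 75, 72, 80, 85, 63, 82, 70, 75]

mean = statistics.mean(bottles)
median = statistics.median(bottles)   # averages the 6th and 7th ordered values
mode = statistics.multimode(bottles)  # list of all most-frequent values

print(f"mean = {mean:.2f}")  # mean = 73.17
print(f"median = {median}")  # median = 73.5
print(f"mode = {mode}")      # mode = [75]
```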

The decision as to whether the mean, median, or mode is the appropriate measure to describe the central tendency of data is context specific. One factor that influences our choice is the type of data, categorical or numerical, as discussed in Chapter 1.

Categorical data are best described by the median or the mode, not the mean. If one person strongly agrees (coded 5) with a particular statement and another person strongly disagrees (coded 1), is the mean “no opinion”? An obvious use of median and mode is by clothing retailers considering inventory of shoes, shirts, and other such items that are available in various sizes. The size of items sold most often, the mode, is then the one in heaviest demand. Knowing that the mean shirt size of European men is 41.13 or that the average shoe size of American women is 8.24 is useless, but knowing that the modal shirt size is 40 or the modal shoe size is 7 is valuable for inventory decisions. However, the mode may not represent the true center of numerical data. For this reason, the mode is used less frequently than either the mean or the median in business applications.

Example 2.2 Percentage Change in Earnings per Share (Measures of Central Tendency)

Find the mean, median, and mode for a random sample of eight U.S. corporations with the following percentage changes in earnings per share in the current year compared with the previous year:

0% 0% 8.1% 13.6% 19.4% 20.7% 10.0% 14.2%

Solution The mean percentage change in earnings per share for this sample is

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{0 + 0 + 8.1 + 13.6 + \cdots + 14.2}{8} = 10.75, \text{ or } 10.75\%$$

and the median percentage change in earnings per share is 11.8%. The mode is 0%, since it occurs twice and the other percentages occur only once. But this modal percentage rate does not represent the center of this sample data.
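The same three measures for Example 2.2, computed with Python's `statistics` module. The median averages the 4th and 5th ordered values, 10.0% and 13.6%.

```python
import statistics

# Percentage changes in earnings per share from Example 2.2 (n = 8).
eps_change = [0.0, 0.0, 8.1, 13.6, 19.4, 20.7, 10.0, 14.2]

mean = statistics.mean(eps_change)
median = statistics.median(eps_change)  # (10.0 + 13.6) / 2
mode = statistics.mode(eps_change)      # the only repeated value

print(f"mean = {mean:.2f}%")      # mean = 10.75%
print(f"median = {median:.1f}%")  # median = 11.8%
print(f"mode = {mode:.0f}%")      # mode = 0%
```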

62 Chapter 2 Using Numerical Measures to Describe Data

Numerical data are usually best described by the mean. However, in addition to the type of data, another factor to consider is the presence of outliers, that is, observations that are unusually large or unusually small in comparison to the rest of the data. The median is resistant to outliers, but the mean is not. Whenever there are outliers in the data, we first need to look for possible causes. One cause could simply be an error in data entry.

The mean will be greater if unusually large outliers are present, and the mean will be less when the data contain outliers that are unusually small compared to the rest of the data.
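To see the effect of a single outlier, the sketch below reuses the Example 2.1 sales data and introduces a hypothetical data-entry error (85 mistyped as 850); the error is invented for illustration. The mean nearly doubles while the median does not move.

```python
import statistics

# Sales data from Example 2.1.
bottles = [60, 84, 65, 67, 75, 72, 80, 85, 63, 82, 70, 75]
# Hypothetical data-entry error: the value 85 mistyped as 850.
with_error = [850 if x == 85 else x for x in bottles]

for label, data in (("original", bottles), ("with error", with_error)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{label}: mean = {mean:.2f}, median = {median:.1f}")
# original: mean = 73.17, median = 73.5
# with error: mean = 136.92, median = 73.5
```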

Shape of a Distribution

In Chapter 1 we described graphically the shape of a distribution as symmetric or skewed by examining a histogram. Recall that if the center of the data divides a graph of the distribution into two mirror images, so that the portion on one side of the middle is nearly identical to the portion on the other side, the distribution is said to be symmetric. Graphs without this shape are asymmetric.

We can also describe the shape of a distribution numerically by computing a measure of skewness. In nearly all situations, we determine this measure of skewness with Excel or a statistical software package such as SPSS, SAS, or Minitab. Skewness is positive if a distribution is skewed to the right, negative for distributions skewed to the left, and 0 for distributions, such as the bell-shaped distribution, that are mounded and symmetric about their mean. Manual computation of skewness is presented in the chapter appendix.
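As one concrete example of what the software computes, Excel's `SKEW` function uses the adjusted Fisher-Pearson coefficient sketched below. The chapter appendix's manual formula may be a different variant, so treat this as a sketch under that assumption; `sample_skewness` is a hypothetical helper name.

```python
import math
from typing import Sequence


def sample_skewness(values: Sequence[float]) -> float:
    """Adjusted Fisher-Pearson skewness (the formula behind Excel's SKEW).

    skew = n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3),
    where s is the sample standard deviation (divisor n - 1).
    """
    n = len(values)
    if n < 3:
        raise ValueError("at least three observations are required")
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)


print(sample_skewness([1, 1, 1, 2, 10]) > 0)      # True: long right tail
print(sample_skewness([-10, -2, -1, -1, -1]) < 0)  # True: long left tail
```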

For continuous numerical unimodal data, the mean is usually less than the median in a skewed-left distribution and the mean is usually greater than the median in a skewed-right distribution. In a symmetric distribution the mean and median are equal. This relationship between the mean and the median may not be true for discrete numerical variables or for some continuous numerical variables (von Hippel 2005).
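The tendency described above can be illustrated with three small hypothetical samples, invented for this purpose. They are discrete integers, exactly the kind of data for which von Hippel (2005) notes the rule can fail, so they show the usual pattern rather than prove it.

```python
import statistics

# Hypothetical unimodal samples, chosen only to illustrate the three shapes.
samples = {
    "skewed right": [2, 3, 3, 4, 4, 4, 5, 5, 6, 15],
    "symmetric": [2, 3, 3, 4, 4, 4, 5, 5, 6],
    "skewed left": [-15, 5, 5, 6, 6, 6, 7, 7, 8],
}

for shape, data in samples.items():
    mean = statistics.mean(data)
    median = statistics.median(data)
    relation = ">" if mean > median else "<" if mean < median else "="
    print(f"{shape}: mean {mean:.2f} {relation} median {median:.2f}")
# skewed right: mean 5.10 > median 4.00
# symmetric: mean 4.00 = median 4.00
# skewed left: mean 3.89 < median 6.00
```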

Example 2.3 Grade Point Averages (Skewed-Left Distribution)

Describe the shape of the distribution of grade point averages stored in the data file Grade Point Averages.

Solution The data file Grade Point Averages contains a random sample of 156 grade point averages for students at one university. In Chapter 1, we described the shape of this distribution graphically with a histogram. In Figure 1.16 we saw that the shape of the distribution appears to be skewed left. Figure 2.1 gives the descriptive measures of the data using Excel. The value of the mean is approximately 3.14 and is less than the median of 3.31. Also, the median is less than the mode of 3.42. The graph, the negative value of skewness, and the comparison of the mean and the median suggest that this is a skewed-left distribution.

Figure 2.1 Grade Point Average

Measure               Grade Point Average
Mean                  3.141154
Standard Error        0.029144
Median                3.31
Mode                  3.42
Standard Deviation    0.364006
Sample Variance       0.132501
Kurtosis              0.609585
Skewness              -1.1685
Range                 1.73
Minimum               2.12
Maximum               3.85
Sum                   490.02
Count                 156
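Several entries in Figure 2.1 are tied together by simple identities, which gives a quick consistency check on software output. The sketch below uses the standard definitions (mean = sum/count, standard error = s/√n, variance = s², range = maximum − minimum) with the figure's rounded values, so it compares with a small tolerance, and it also confirms the mean < median < mode ordering and negative skewness cited in Example 2.3.

```python
import math

# Values copied from Figure 2.1 (Excel descriptive statistics).
summary = {
    "mean": 3.141154, "standard_error": 0.029144, "median": 3.31,
    "mode": 3.42, "std_dev": 0.364006, "variance": 0.132501,
    "skewness": -1.1685, "range": 1.73, "minimum": 2.12,
    "maximum": 3.85, "sum": 490.02, "count": 156,
}

# Each entry pairs a value recomputed from other entries with the reported value.
checks = {
    "mean = sum / count": (summary["sum"] / summary["count"], summary["mean"]),
    "standard error = s / sqrt(n)": (
        summary["std_dev"] / math.sqrt(summary["count"]), summary["standard_error"]),
    "variance = s^2": (summary["std_dev"] ** 2, summary["variance"]),
    "range = max - min": (summary["maximum"] - summary["minimum"], summary["range"]),
}

for name, (computed, reported) in checks.items():
    # Excel output is rounded to about six decimals, hence the tolerance.
    match = math.isclose(computed, reported, abs_tol=1e-5)
    print(f"{name}: computed {computed:.6f}, reported {reported:.6f}, match = {match}")

left_skew_signs = (summary["mean"] < summary["median"] < summary["mode"]
                   and summary["skewness"] < 0)
print("mean < median < mode and skewness < 0:", left_skew_signs)
```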

The median is the preferred measure to describe the distribution of incomes in a city, state, or country. Income distributions are often right skewed because incomes tend to contain a relatively small proportion of high values. A large proportion of the population has relatively modest incomes, but the incomes of, say, the highest 10% of all earners extend over a considerable range. As a result, the mean of such distributions is typically quite a bit higher than the median. The mean, which is inflated by the very wealthy, gives too optimistic a view of the economic well-being of the community. The median is then preferred to the mean.

We do not intend to imply that the median should always be preferred to the mean when the population or sample is skewed. There are times when the mean would still be the preferred measure even if the distribution were skewed. Consider an insurance company that most likely faces a right-skewed distribution of claim sizes. If the company wants to know the most typical claim size, the median is preferred. But suppose the company wants to know how much money needs to be budgeted to cover claims. Then, the mean is preferred.
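A small hypothetical example makes the insurance point concrete: because the mean multiplied by the number of claims equals the total amount paid, the mean (not the median) is what a budget needs. The claim amounts below are invented for illustration and are not from the text.

```python
import statistics

# Hypothetical right-skewed claim sizes in dollars (not data from the text).
claims = [500, 700, 800, 900, 1_000, 1_200, 1_500, 2_000, 4_000, 25_000]

n = len(claims)
mean_claim = statistics.mean(claims)
median_claim = statistics.median(claims)

print(f"typical (median) claim: {median_claim:,.0f}")       # 1,100
print(f"mean claim: {mean_claim:,.0f}")                      # 3,760
print(f"budget from mean x n: {mean_claim * n:,.0f}")        # 37,600
print(f"budget from median x n: {median_claim * n:,.0f}")    # 11,000
print(f"actual total paid: {sum(claims):,.0f}")              # 37,600
```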

In spite of its advantage in discounting extreme observations, the median is used less frequently than the mean. In Chapter 7 we discuss certain properties of the mean that make it more attractive than the median in many situations. The reason is that the theoretical development of inferential procedures based on the mean, and measures related to it, is considerably more straightforward than the development of procedures based on the median.
