Chapter03 - Numerical Descriptive Measures

(1)

Statistics for Managers

Using Microsoft® Excel

5th Edition

Chapter 3

(2)

Learning Objectives

In this chapter, you will learn:



_{To describe the properties of central tendency,}

variation and shape in numerical data



_{To calculate descriptive summary measures for a}

population



_{To construct and interpret a box-and-whisker plot}



_{To describe the covariance and coefficient of}

(3)

Summary Definitions



_The

_{central tendency}

_{is the extent to which}

all the data values group around a typical or

central value.



_The

_variation

_{is the amount of dispersion, or}

scattering, of values



_The

_shape

_{is the pattern of the distribution of}

(4)

Measures of Central Tendency

The Arithmetic Mean



_{The arithmetic mean (mean) is the most common}

measure of central tendency

For a sample of size n:

(5)

Measures of Central Tendency

The Arithmetic Mean



_{The most common measure of central tendency}



_{Mean = sum of values divided by the number of values}



_{Affected by extreme values (outliers)}

(6)

Measures of Central Tendency

The Median



_{In an ordered array, the median is the “middle” number (50%}

above, 50% below)



_{Not affected by extreme values}

0 1 2 3 4 5 6 7 8 9 10

Median = 4

0 1 2 3 4 5 6 7 8 9 10

(7)

Measures of Central Tendency

Locating the Median



_{The median of an ordered set of data is located at the}

ranked value.



_{If the number of values is odd, the median is the}

middle number.



_{If the number of values is even, the median is the}

average of the two middle numbers.



_{Note that is NOT the value of the median,}

only the position of the median in the ranked data.

2

1 

n

2

1 

(8)

Measures of Central Tendency

The Mode



_{Value that occurs most often}



_{Not affected by extreme values}



_{Used for either numerical or categorical data}



_{There may be no mode}



_{There may be several modes}

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

Mode = 9

0 1 2 3 4 5 6

(9)

Measures of Central Tendency

Review Example

House Prices:

$2,000,000

500,000

300,000

100,000

Sum

3,000,000



_Mean:

_{($3,000,000/5)}

=

$600,000



_Median:

_{middle value of ranked}

data

=

$300,000



_Mode:

_{most frequent value}

(10)

Measures of Central Tendency

Which Measure to Choose?



_The

_mean

_{is generally used, unless extreme}

values (outliers) exist.



_Then

_median

_{is often used, since the median}

is not sensitive to extreme values. For

example, median home prices may be

(11)

Quartile Measures



_{Quartiles split the ranked data into 4 segments with}

an equal number of values per segment.

25%

Q

1

Q

2

Q

3



The first quartile, Q

₁

, is the value for which 25% of

the observations are smaller and 75% are larger



Q

₂

is the same as the median (50% are smaller, 50% are

larger)

(12)

Quartile Measures

Locating Quartiles

Find a quartile by determining the value in the appropriate

position in the ranked data, where

First quartile position:

Q

₁

= (n+1)/4 ranked value

Second quartile position:

Q

₂

= (n+1)/2

ranked value

Third quartile position:

Q

₃

= 3(n+1)/4 ranked value

(13)

Quartile Measures

Guidelines



_{Rule 1: If the result is a whole number, then the}

quartile is equal to that ranked value.



_{Rule 2: If the result is a fraction half (2.5, 3.5, etc),}

then the quartile is equal to the average of the

corresponding ranked values.



_{Rule 3: If the result is neither a whole number or a}

(14)

Quartile Measures

Locating the First Quartile



_{Example: Find the first quartile}

Sample Data in Ordered Array:

11 12 13 16 16 17 18 21 22

First, note that n = 9.

Q

1 = is in the

(9+1)/4 = 2.5 ranked value

of the ranked

data, so use the value half way between the 2

nd

and 3

rd

ranked values,

so

Q

1 = 12.5

(15)

Measures of Central Tendency

The Geometric Mean



_{Geometric mean}



Used to measure the rate of change of a variable over time



_{Geometric mean rate of return}



Measures the status of an investment over time

(16)

Measures of Central Tendency

The Geometric Mean

An investment of $100,000 declined to $50,000 at the end of

year one and rebounded to $100,000 at end of year two:

The overall two-year return is zero, since it started and ended

at the same level.

(17)

Measures of Central Tendency

The Geometric Mean

Use the 1-year returns to compute the arithmetic mean

and the geometric mean:

(18)

Measures of Central Tendency

Summary

Central Tendency

Arithmetic

Mean

Median

Mode

Geometric Mean

n

Middle value

in the ordered

array

Most

(19)

Measures of Variation



_{Variation measures the spread, or dispersion,}

of values in a data set.



Range



Interquartile Range



Variance



Standard Deviation

(20)

Measures of Variation

Range



_{Simplest measure of variation}



_{Difference between the largest and the smallest values:}

Range = X

_largest

– X

_smallest

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

Range = 13 - 1 = 12

(21)

Measures of Variation

Disadvantages of the Range



_{Ignores the way in which data are distributed}



_{Sensitive to outliers}

7 8 9 10 11 12

Range = 12 - 7 = 5

7 8 9 10 11 12

Range = 12 - 7 = 5

1 ,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,

5

1 ,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,

120 Range = 5 - 1 = 4

(22)

Measures of Variation

Interquartile Range



_{Problems caused by outliers can be eliminated by}

using the

interquartile range.



_{The IQR can eliminate some high and low values}

and calculate the range from the remaining values.



_{Interquartile range = 3rd quartile – 1st quartile}

(23)

Measures of Variation

Interquartile Range

Median

(Q

₂

)

X

maximum

X

minimum

Q

1 Q

3 Example:

25% 25% 25% 25%

12 30 45 57 70

(24)

Measures of Variation

Variance



_The

_variance

_{is the average (approximately) of}

squared deviations of values from the mean.

(25)

Measures of Variation

Standard Deviation



_{Most commonly used measure of variation}



_{Shows variation about the mean}



_{Has the same units as the original data}

Sample standard deviation:

(26)

Measures of Variation

Standard Deviation

Steps for Computing Standard Deviation

1. Compute the difference between each value and

the mean.

2. Square each difference.

3. Add the squared differences.

4. Divide this total by n-1 to get the sample variance.

5. Take the square root of the sample variance to get

(27)

Measures of Variation

(28)

Measures of Variation

Comparing Standard Deviation

Mean = 15.5

S =

3.338 11 12 13 14 15 16 17 18 19 20

21 11 12 13 14 15 16 17 18 19 20

21 Data B

Data A

Mean = 15.5

S =

0.926 11 12 13 14 15 16 17 18 19 20 21

Mean = 15.5

(29)

Measures of Variation

Comparing Standard Deviation

Small standard deviation

(30)

Measures of Variation

Summary Characteristics



_{The more the data are spread out, the greater}

the range, interquartile range, variance, and

standard deviation.



_{The more the data are concentrated, the}

smaller the range, interquartile range,

variance, and standard deviation.



_{If the values are all the same (no variation),}

all these measures will be zero.

(31)

Coefficient of Variation



_{The coefficient of variation is the standard deviation}

divided by the mean, multiplied by 100.



_{It is always expressed as a percentage. (%)}



_{It shows variation relative to mean.}



_{The CV can be used to compare two or more sets of}

data measured in different units.

100%

X

S

CV

_



_



(32)

Coefficient of Variation



_{Stock A:}



Average price last year = $50



Standard deviation = $5



_{Stock B:}



Average price last year = $100



Standard deviation = $5

10%

Both stocks

have the

(33)

Locating Extreme Outliers

Z-Score



_{To compute the Z}

_-score

_{of a data value, subtract the}

mean and divide by the standard deviation.



_{The Z-score is the number of standard deviations a}

data value is from the mean.



_{A data value is considered an extreme outlier if its}

Z-score is less than -3.0 or greater than +3.0.



_{The larger the absolute value of the Z-score, the}

(34)

Locating Extreme Outliers

Z-Score

where X represents the data value

X is the sample mean

S is the sample standard deviation

S

X

(35)

Locating Extreme Outliers

Z-Score



_{Suppose the mean math SAT score is 490,}

with a standard deviation of 100.



_{Compute the z-score for a test score of 620.}

3 

_{A score of 620 is 1.3 standard deviations above the}

(36)

Shape of a Distribution



_{Describes how data are distributed}



_{Measures of shape}



Symmetric or skewed

Mean =

Median

Mean <

Median

<

Mean

Right-Skewed

(37)

General Descriptive Stats

Using Microsoft Excel

1. Select Tools.

2. Select Data Analysis.

3. Select Descriptive

(38)

General Descriptive Stats

Using Microsoft Excel

4. Enter the cell

range.

5. Check the

Summary

Statistics box.

(39)

General Descriptive Stats

Using Microsoft Excel

Microsoft Excel

descriptive statistics output,

using the house price data:

(40)

Numerical Descriptive

Measures for a Population



_{Descriptive statistics discussed previously described}

a

sample

, not the

population

.



_{Summary measures describing a population, called}

parameters

, are denoted with Greek letters.



_{Important population parameters are the population}

(41)

Population Mean



_The

_{population mean}

_{is the sum of the values in}

the population divided by the population size,

N

.

N

μ = population mean

N

= population size

X

_i

= i

th

value of the variable

X

(42)

Population Variance



_{The population variance is the average of squared}

deviations of values from the mean

μ = population mean

N

= population size

X

_i

= i

th

value of the variable

X

(43)

Population Standard Deviation



_The

_{population standard deviation}

_{is the most}

commonly used measure of variation.



_{It has the}

_{same units as the original data.}

N

X

N



i

1

2 i

μ)

(

σ

μ = population mean

N

= population size

X

_i

= i

th

value of the variable

X

(44)

Sample statistics versus

population parameters

Measure

Population

Parameter

Statistic

Sample

Mean

Variance

Standard

Deviation

X

2 S

S



2 

(45)

The Empirical Rule



_The

_{empirical rule}

_{approximates the variation of}

data in bell-shaped distributions.

Approximately 68% of the data in a bell-shaped

distribution lies within one standard deviation of the

mean, or

μ



1σ

μ

68%

(46)

The Empirical Rule

2σ

μ



3σ

μ





_{Approximately 95% of the data in a bell-shaped}

distribution lies within two standard deviation of the

mean, or



_{Approximately 99.7% of the data in a bell-shaped}

distribution lies within three standard deviation of the

mean, or

3σ

μ



99.7%

95%

(47)

Using the Empirical Rule



_{Suppose that the variable Math SAT scores is}

bell-shaped with a mean of 500 and a standard deviation

of 90. Then, :



68% of all test takers scored between 410 and

590 (500 +/- 90).



95% of all test takers scored between 320 and

680 (500 +/- 180).

(48)

Chebyshev Rule



_{Regardless of how the data are distributed}

(symmetric or skewed), at least (1 - 1/k

2 ) of the

values will fall within k standard deviations of

the mean (for k > 1)



_Examples:



_k=2

_{(1 - 1/2}

2 ) = 75% ……... (μ ± 2σ)



_k=3

_{(1 - 1/3}

2 ) = 89% ………. (μ ± 3σ)

(49)

Exploratory Data Analysis

The Five Number Summary



_{The five numbers that describe the spread of}

data are:



Minimum



First Quartile (Q

₁

)



Median (Q

₂

)



Third Quartile (Q

₃

)

(50)

Exploratory Data Analysis

The Box-and-Whisker Plot



_{The Box-and-Whisker Plot is a graphical display of}

the five number summary.

Minimum 1st Median 3rd Maximum

Quartile Quartile

(51)

Exploratory Data Analysis

The Box-and-Whisker Plot

Min Q

₁

Median Q

₃

Max



_{The Box and central line are centered between the}

endpoints if data are symmetric around the median.



_{A Box-and-Whisker plot can be shown in either}

(52)

Exploratory Data Analysis

The Box-and-Whisker Plot

Right-Skewed

Left-Skewed

Symmetric

(53)

Sample Covariance



_The

_{sample covariance}

_{measures the strength of the linear}

relationship between

two numerical variables

.



_The

_{sample covariance}

_:



_{The covariance is only concerned with the strength of the}

relationship.

(54)

Sample Covariance



_Covariance

_{between two random variables:}



_cov(

_X

_,

_Y

_{) > 0}

_X

_and

_Y

_{tend to move in the same}

direction



_cov(

_X

_,

_Y

_{) < 0}

_X

_and

_Y

_{tend to move in opposite}

directions

(55)

The Correlation Coefficient



_The

_{correlation coefficient}

_{measures the relative}

strength of the

linear

relationship between two

variables.



_{Sample coefficient of correlation:}

(56)

The Correlation Coefficient



_{Unit free}



_{Ranges between –1 and 1}



_{The closer to –1, the stronger the negative linear}

relationship



_{The closer to 1, the stronger the positive linear}

relationship



_{The closer to 0, the weaker any linear}

(57)

The Correlation Coefficient

Y

X

Y

X

Y

X

r = -1

r = -.6

r = 0

Y

X

Y

(58)

The Correlation Coefficient

Using Microsoft Excel

1. Select Tools/Data

Analysis

2. Choose Correlation from

the selection menu

(59)

The Correlation Coefficient

Using Microsoft Excel

3. Input data range and select

appropriate options

(60)

The Correlation Coefficient

Using Microsoft Excel



_{r = .733}



_{There is a relatively}

strong positive linear

relationship between test

score #1 and test score

#2.



_{Students who scored high}

on the first test tended to

score high on second test.

Scatter Plot of Test Scores

70

Test #1 Score

(61)

Pitfalls in Numerical

Descriptive Measures



_{Data analysis is}

_objective



Analysis should report the summary measures that best

meet the assumptions about the data set.



_{Data interpretation is}

_subjective



Interpretation should be done in fair, neutral and clear

(62)

Ethical Considerations

Numerical descriptive measures:



_{Should document both good and bad results}



_{Should be presented in a fair, objective and neutral}

manner



_{Should not use inappropriate summary measures to}

(63)

Chapter Summary



_{Described measures of central tendency}



Mean, median, mode, geometric mean



_{Discussed quartiles}



_{Described measures of variation}



_{Range, interquartile range, variance and standard}

deviation, coefficient of variation



_{Illustrated shape of distribution}



_{Symmetric, skewed, box-and-whisker plots}

(64)

Chapter Summary



_{Discussed covariance and correlation coefficient.}



_{Addressed pitfalls in numerical descriptive measures}