Probability Theory
Grouped Data
R. Fajriyah
Dept. of Statistics, UII, Jogjakarta, Indonesia
April 7, 2015
Outline
1 How to construct histogram
2 How to construct Boxplot
3 Mean, Median and Mode from grouped frequencies
4 Variance from grouped frequencies
Data
Here are the starting salaries of 1995 Psychology graduates. When constructing a histogram, it is helpful to sort the observations first.
8820 10800 12000 12500 13000 14000 15000 16000 16500
16600 16700 16900 16900 17000 17000 17600 17880 18000
18000 18000 18000 18000 18000 18000 18000 18000 18000
18500 18680 19100 20000 20000 20000 20000 20000 20300
20900 22000 23000 23000 23000 23000 23400 24000 25000
25000 26000 26000 27000 30000 30000 32500 37000 48000
Steps to construct histogram 1
1 Compute the range: Maximum − Minimum. Minimum = 8820, Maximum = 48000, Range = 39180.
2 Decide how many intervals you would like. Use the square root of the number of observations, rounded up. Here, the square root of 54 is about 7.35; round up and use 8.
3 Compute the interval width: Range / number of intervals = 39180/8 = 4897.5; round up to 5000.
4 Start the first interval at a convenient value below the minimum. Here the minimum is 8820, so begin at 7500 (other choices are equally acceptable).
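5 The intervals then begin at 7500 and have a width of 5000.
So, the first interval runs from 7500 to 12500, the second from 12500 to 17500, and so on. Eight intervals starting at 7500 end at 47500, so a ninth interval (47500 to 52500) is needed to include the maximum, 48000.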
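The steps above can be sketched in Python. This is a minimal illustration rather than a definitive recipe: the half-open convention [lower, upper) for each interval is an assumption (the slides do not say where a value such as 12500 belongs), and the variable names are made up for this example.

```python
import math

# Starting salaries from the Data slide (54 observations, already sorted).
salaries = [
    8820, 10800, 12000, 12500, 13000, 14000, 15000, 16000, 16500,
    16600, 16700, 16900, 16900, 17000, 17000, 17600, 17880, 18000,
    18000, 18000, 18000, 18000, 18000, 18000, 18000, 18000, 18000,
    18500, 18680, 19100, 20000, 20000, 20000, 20000, 20000, 20300,
    20900, 22000, 23000, 23000, 23000, 23000, 23400, 24000, 25000,
    25000, 26000, 26000, 27000, 30000, 30000, 32500, 37000, 48000,
]

data_range = max(salaries) - min(salaries)          # step 1
n_intervals = math.ceil(math.sqrt(len(salaries)))   # step 2: square root, rounded up
raw_width = data_range / n_intervals                # step 3, before rounding
width = 5000                                        # step 3, rounded up by hand
start = 7500                                        # step 4, chosen by hand

# Step 5: keep adding interval edges until the maximum is covered.
edges = [start]
while edges[-1] <= max(salaries):
    edges.append(edges[-1] + width)

# Assumption: intervals are half-open, [lower, upper).
counts = [sum(lo <= x < hi for x in salaries)
          for lo, hi in zip(edges, edges[1:])]

# Text histogram: one asterisk per observation.
for lo, hi, c in zip(edges, edges[1:], counts):
    print(f"{lo:>6} - {hi:<6} {c:>3} {'*' * c}")
```

With these choices the loop produces nine intervals, because eight intervals of width 5000 starting at 7500 stop at 47500, just short of the maximum.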
Steps to construct histogram 2
1 Construct the dot plot to see how many intervals you would like.
2 Compute the interval width: Range / number of intervals; round up to the same scale as the original data.
3 Start the first interval at a convenient value below the minimum.
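A rough text dot plot can help with the first step of this second method. The sketch below is only one way to do it: truncating each salary to the thousand below it, so that nearby values stack up, is an assumption made for display purposes and not something the slides prescribe.

```python
from collections import Counter

# Starting salaries from the Data slide (54 observations).
salaries = [
    8820, 10800, 12000, 12500, 13000, 14000, 15000, 16000, 16500,
    16600, 16700, 16900, 16900, 17000, 17000, 17600, 17880, 18000,
    18000, 18000, 18000, 18000, 18000, 18000, 18000, 18000, 18000,
    18500, 18680, 19100, 20000, 20000, 20000, 20000, 20000, 20300,
    20900, 22000, 23000, 23000, 23000, 23000, 23400, 24000, 25000,
    25000, 26000, 26000, 27000, 30000, 30000, 32500, 37000, 48000,
]

# Assumption: truncate to the thousand below so that close values share a stack.
stacks = Counter(x // 1000 * 1000 for x in salaries)

# One row per thousand, one dot per observation (empty rows show the gaps).
for value in range(min(stacks), max(stacks) + 1000, 1000):
    print(f"{value:>6} {'.' * stacks[value]}")
```

The printed rows make the cluster around 18000 and the isolated value at 48000 easy to see, which is the kind of information used to pick the number of intervals by eye.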
Steps
1 Draw a number line that includes the range of observations. Compute Q1, Q2, and Q3.
2 Above the line drawn in the first step, draw a box extending from Q1 to Q3. Inside the box, draw a line at the median (Q2). Compute the interquartile range IQR = Q3 − Q1.
3 To identify outliers, compute the lower fence fL = Q1 − 1.5 × IQR and the upper fence fU = Q3 + 1.5 × IQR.
4 Observations located beyond the fences are classified as outliers and are identified with an asterisk (*).
5 If there are no outliers, extend horizontal line segments (whiskers) from the ends of the box to the smallest and largest observations. If there are outliers, extend the whiskers to the smallest and largest non-outliers.
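The numbers needed for these steps can be computed as in the sketch below, applied to the salary data from the histogram example. The slides do not say which quartile convention to use; the "inclusive" method of Python's `statistics.quantiles` is an assumption here, and other conventions give slightly different Q1 and Q3. The function name `boxplot_summary` is hypothetical.

```python
import statistics

# Starting salaries from the histogram example (54 observations).
salaries = [
    8820, 10800, 12000, 12500, 13000, 14000, 15000, 16000, 16500,
    16600, 16700, 16900, 16900, 17000, 17000, 17600, 17880, 18000,
    18000, 18000, 18000, 18000, 18000, 18000, 18000, 18000, 18000,
    18500, 18680, 19100, 20000, 20000, 20000, 20000, 20000, 20300,
    20900, 22000, 23000, 23000, 23000, 23000, 23400, 24000, 25000,
    25000, 26000, 26000, 27000, 30000, 30000, 32500, 37000, 48000,
]


def boxplot_summary(values):
    """Quartiles, fences, whisker ends and outliers for a boxplot."""
    data = sorted(values)
    # Assumption: "inclusive" quartile convention.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers,
    }


summary = boxplot_summary(salaries)
for name, value in summary.items():
    print(f"{name:>12}: {value}")
```

Under this convention the three largest salaries fall above the upper fence, so the upper whisker stops at 30000 and those three values would be drawn as asterisks.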
Data
9, 15, 11, 12, 3, 5, 10, 20, 14, 6, 8, 8, 12, 12, 18, 15, 6, 9, 18, 11
Please compute: Mean, Median, Mode
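For the raw (ungrouped) data these three values can be checked directly with Python's standard `statistics` module. This is just a quick sketch; the variable name `games` is an assumption based on the "Number of games" column of the table.

```python
import statistics

# The 20 raw observations from the Data slide.
games = [9, 15, 11, 12, 3, 5, 10, 20, 14, 6,
         8, 8, 12, 12, 18, 15, 6, 9, 18, 11]

exact_mean = statistics.mean(games)      # sum of the values divided by 20
exact_median = statistics.median(games)  # average of the two middle values
exact_mode = statistics.mode(games)      # most frequent value

print(exact_mean, exact_median, exact_mode)
```

These exact values are the reference against which the grouped-data estimates can be compared.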
Table
Number of games Frequency
1-5 2
6-10 7
11-15 8
16-20 3
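The frequency table can be rebuilt from the raw observations by counting how many values fall in each class. A minimal sketch, assuming both class limits are included (so 10 belongs to 6-10):

```python
# The 20 raw observations from the Data slide.
games = [9, 15, 11, 12, 3, 5, 10, 20, 14, 6,
         8, 8, 12, 12, 18, 15, 6, 9, 18, 11]

# Classes as (lower limit, upper limit), both limits included.
classes = [(1, 5), (6, 10), (11, 15), (16, 20)]

frequencies = [sum(lo <= x <= hi for x in games) for lo, hi in classes]

for (lo, hi), f in zip(classes, frequencies):
    print(f"{lo}-{hi} {f}")
```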
Computation I
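For grouped data, we cannot find the exact Mean, Median and Mode; we can only give estimates.
To estimate the Mean, use the midpoints of the class intervals: multiply each midpoint by its class frequency, add the products, and divide by the total frequency.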
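As a small sketch of the midpoint method, applied to the "Number of games" table (the variable names are illustrative):

```python
# Classes (lower limit, upper limit) and frequencies from the Table slide.
classes = [(1, 5), (6, 10), (11, 15), (16, 20)]
frequencies = [2, 7, 8, 3]

# Midpoint of each class: 3, 8, 13, 18.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Each midpoint stands in for every observation in its class.
weighted_sum = sum(m * f for m, f in zip(midpoints, frequencies))
estimated_mean = weighted_sum / sum(frequencies)

print(estimated_mean)
```

The weighted sum is 220 over 20 observations, which gives the estimated mean of 11 quoted on the Results slide.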
To estimate the Median:
$$M_{gd} = L + \frac{\frac{n}{2} - cf_b}{f_m}\, w,$$
where
$L$ is the lower class boundary of the group containing the median
$n$ is the total number of data
$cf_b$ is the cumulative frequency of the groups before the median group
$f_m$ is the frequency of the median group
$w$ is the group width
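The median formula can be written as a short function. This is a sketch under stated assumptions: the function name `grouped_median` is hypothetical, and the caller decides what to pass as L for each group (class boundaries or class limits), since that choice changes the answer.

```python
def grouped_median(lower_values, frequencies, width):
    """Estimate the median as L + (n/2 - cf_b) / f_m * w.

    lower_values holds the value of L for each group, in order.
    """
    n = sum(frequencies)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    cf_b = 0  # cumulative frequency of the groups before the current one
    for L, f_m in zip(lower_values, frequencies):
        # The median group is the first one whose cumulative frequency reaches n/2.
        if cf_b + f_m >= n / 2:
            return L + (n / 2 - cf_b) / f_m * width
        cf_b += f_m


frequencies = [2, 7, 8, 3]  # "Number of games" table

# L taken as the lower class limits 1, 6, 11, 16.
median_limits = grouped_median([1, 6, 11, 16], frequencies, 5)
# L taken as the lower class boundaries 0.5, 5.5, 10.5, 15.5.
median_boundaries = grouped_median([0.5, 5.5, 10.5, 15.5], frequencies, 5)

print(median_limits, median_boundaries)
```

With the lower limits the function returns 11.625, the value on the Results slide; with the boundaries it returns 11.125.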
To estimate the Mode:
$$M_{gd} = L + \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})}\, w,$$
where
Computation II
$L$ is the lower class boundary of the modal group
$f_{m-1}$ is the frequency of the group before the modal group
$f_m$ is the frequency of the modal group
$f_{m+1}$ is the frequency of the group after the modal group
$w$ is the group width
For continuous data, use the class limits (rather than the boundaries) as L in the median and mode formulas.
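The mode formula can be sketched the same way. The function name `grouped_mode` is hypothetical, and two choices below are assumptions not covered by the slides: a missing neighbour (when the modal group is first or last) is treated as frequency 0, and ties are resolved by taking the first group with the highest frequency.

```python
def grouped_mode(lower_values, frequencies, width):
    """Estimate the mode as L + d_prev / (d_prev + d_next) * w."""
    m = frequencies.index(max(frequencies))  # modal group (first one if tied)
    f_m = frequencies[m]
    # Assumption: a missing neighbouring group counts as frequency 0.
    f_prev = frequencies[m - 1] if m > 0 else 0
    f_next = frequencies[m + 1] if m + 1 < len(frequencies) else 0
    d_prev = f_m - f_prev
    d_next = f_m - f_next
    if d_prev + d_next == 0:
        raise ValueError("modal group does not stand out from its neighbours")
    return lower_values[m] + d_prev / (d_prev + d_next) * width


frequencies = [2, 7, 8, 3]  # "Number of games" table

# L taken as the lower class limits, then as the lower class boundaries.
mode_limits = grouped_mode([1, 6, 11, 16], frequencies, 5)
mode_boundaries = grouped_mode([0.5, 5.5, 10.5, 15.5], frequencies, 5)

print(round(mode_limits, 3), round(mode_boundaries, 3))
```

Passing the lower limits gives 11.833..., and passing the boundaries gives 11.333..., which shows how much the choice of L matters for this small table.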
Results
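Our final result is
Estimated Mean: 11
Estimated Median: 11.625
Estimated Mode: 11.833...
The median and mode values take L = 11, the lower limit of the group 11-15; with the class boundary L = 10.5 they become 11.125 and 11.333...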
Another example
Length Frequency
150 - 154 5
155 - 159 2
160 - 164 6
165 - 169 8
170 - 174 9
175 - 179 11
180 - 184 6
185 - 189 3
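The same three estimates can be worked out for this table. The sketch below is not a result from the slides: it assumes lengths are recorded to the nearest unit, so each class starts half a unit below its lower limit (149.5, 154.5, and so on). A different assumption about L shifts the median and mode accordingly.

```python
# "Length" table: (lower limit, upper limit, frequency).
table = [
    (150, 154, 5), (155, 159, 2), (160, 164, 6), (165, 169, 8),
    (170, 174, 9), (175, 179, 11), (180, 184, 6), (185, 189, 3),
]

freqs = [f for _, _, f in table]
n = sum(freqs)
width = table[1][0] - table[0][0]   # distance between consecutive lower limits
# Assumption: values are rounded to the nearest unit, so L = lower limit - 0.5.
bounds = [lo - 0.5 for lo, _, _ in table]
midpoints = [(lo + hi) / 2 for lo, hi, _ in table]

# Mean: midpoints weighted by frequency.
mean = sum(x * f for x, f in zip(midpoints, freqs)) / n

# Median group: first group whose cumulative frequency reaches n/2.
cf_b = 0
for i, f in enumerate(freqs):
    if cf_b + f >= n / 2:
        break
    cf_b += f
median = bounds[i] + (n / 2 - cf_b) / freqs[i] * width

# Modal group: highest frequency (an interior group in this table).
k = freqs.index(max(freqs))
d_prev = freqs[k] - freqs[k - 1]
d_next = freqs[k] - freqs[k + 1]
mode = bounds[k] + d_prev / (d_prev + d_next) * width

print(f"mean   = {mean:.2f}")
print(f"median = {median:.2f}")
print(f"mode   = {mode:.2f}")
```

Under these assumptions the median group is 170 - 174 and the modal group is 175 - 179.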
Another example
Age Number
0 - 9 20
10 - 19 21
20 - 29 23
30 - 39 16
40 - 49 11
50 - 59 10
60 - 69 7
70 - 79 3
80 - 89 1
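Midpoints: 5, 15, 25, etc. Age is continuous, so the group 0 - 9 runs from 0 up to just under 10 and its midpoint is 5, not 4.5.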
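A short sketch of how these midpoints feed into the estimated mean for the age table. It assumes every group is exactly 10 years wide, starting at its lower limit; the variable names are illustrative.

```python
# "Age" table: lower limit of each group and the number of people in it.
lower = [0, 10, 20, 30, 40, 50, 60, 70, 80]
freqs = [20, 21, 23, 16, 11, 10, 7, 3, 1]
width = 10

# Continuous variable: each group spans [lower, lower + 10),
# so the midpoint is lower + 5.
midpoints = [lo + width / 2 for lo in lower]

estimated_mean = sum(x * f for x, f in zip(midpoints, freqs)) / sum(freqs)

print(midpoints[:3], estimated_mean)
```

The weighted sum is 3360 over 112 people, so the estimated mean age is 30.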
Steps
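$$s^2 = \frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i - 1}$$
or
$$s^2 = \frac{\sum x_i^2 f_i - \dfrac{\left(\sum x_i f_i\right)^2}{\sum f_i}}{\sum f_i - 1}$$
where $x_i$ is the midpoint of class $i$, $f_i$ is its frequency, and $\bar{x}$ is the estimated mean.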
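Both forms of the variance can be checked against each other numerically. The sketch below uses the "Number of games" table with class midpoints standing in for the observations, and assumes the sample form with the total frequency minus 1 in the denominator.

```python
# Midpoints and frequencies from the "Number of games" table.
midpoints = [3, 8, 13, 18]
freqs = [2, 7, 8, 3]

n = sum(freqs)
sum_xf = sum(x * f for x, f in zip(midpoints, freqs))
mean = sum_xf / n

# First form: squared deviations from the estimated mean, weighted by frequency.
s2_definition = sum((x - mean) ** 2 * f for x, f in zip(midpoints, freqs)) / (n - 1)

# Second form: the computational shortcut, which avoids the deviations.
sum_x2f = sum(x ** 2 * f for x, f in zip(midpoints, freqs))
s2_shortcut = (sum_x2f - sum_xf ** 2 / n) / (n - 1)

print(round(s2_definition, 4), round(s2_shortcut, 4))
```

Both forms give 370/19, about 19.47, for this table. The shortcut is convenient by hand because it only needs the two column totals.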
Thank you for your attention!