MAT 254- Probability and Statistics
Spring 2015
LECTURE 2
DATA COLLECTION AND
PRESENTATION (Charts, graphs, etc)
Types of Data
Quantitative data are measurements that are recorded
on a naturally occurring numerical scale.
Examples: height in cm, weight in kg, blood pressure
(mmHg)
Qualitative data are measurements that cannot be
measured on a natural numerical scale; they can only be
classified into one of a group of categories.
Examples: sex, tall or short, blood group
SAMPLING TECHNIQUES
3/11/2015 [email protected]
Sampling techniques help the researcher
economize on the following:
• Time
• Effort
POPULATION
SAMPLE
Sampling techniques are classified into:
• probability sampling
• non-probability sampling
PROBABILITY SAMPLING
It is a method of selecting a sample (n)
from a universe (N) such that each
member of the population has an equal
chance of being included in the sample
and all possible combinations of size (n)
have an equal chance of being chosen
as the sample.
NON-PROBABILITY SAMPLING
It is a method wherein the
manner of selecting a sample
(n) from a universe (N)
depends on some inclusion
rule as specified by the
researcher.
PROBABILITY SAMPLING
TECHNIQUES
• Simple Random (Lottery) Sampling
• Systematic Sampling
• Stratified Sampling
• Cluster or Area Sampling
SRS or Lottery Sampling
It is done by writing a number for each member of the
population on a piece of paper, placing the pieces in a
container, and drawing the desired number of samples from it.
This applies to a not-so-large population, when listing
all members is still possible.
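The lottery procedure above can be sketched in Python. This is a minimal illustration, not part of the original lecture: the population size of 50, the sample size of 10, and the helper name `simple_random_sample` are all assumptions chosen for the example.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that each member is equally likely to be chosen."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return rng.sample(population, n)


# Hypothetical population: members numbered 1..N, like numbered slips in a container.
N = 50
population = list(range(1, N + 1))

sample = simple_random_sample(population, n=10, seed=254)
print(sorted(sample))
```

Sampling is done without replacement, so no member can be drawn twice, just as a slip of paper leaves the container once it is drawn.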
SYSTEMATIC SAMPLING
Ex: N = 100, n = 25
N/n = 100/25
= 4
• This means every 4th element in a series should be
taken as a sample.
This method still uses the concept of random sampling
and involves the selection of the nth element of a series
representing the
Note:
All numbers in yellow color are the desired
samples.
1 11 21 31 41 51 61 71 81 91
2 12 22 32 42 52 62 72 82 92
3 13 23 33 43 53 63 73 83 93
4 14 24 34 44 54 64 74 84 94
5 15 25 35 45 55 65 75 85 95
6 16 26 36 46 56 66 76 86 96
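The systematic sampling example (N = 100, n = 25, interval N/n = 4) can be sketched as follows. Choosing the starting element at random from the first interval is a common convention and is assumed here; the function name `systematic_sample` is hypothetical.

```python
import random


def systematic_sample(N, n, seed=None):
    """Take every k-th element (k = N // n) after a random start in 1..k."""
    k = N // n                      # sampling interval, 4 in the lecture example
    rng = random.Random(seed)
    start = rng.randint(1, k)       # assumed convention: random start within the first interval
    return [start + i * k for i in range(n)]


sample = systematic_sample(N=100, n=25, seed=1)
print(sample)
```

Whatever the starting point, consecutive selected elements are exactly 4 positions apart, and 25 elements are selected in total.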
STRATIFIED SAMPLING
MULTI-STAGE SAMPLING
Ex: Region – 1st level
Province – 2nd level
City – 3rd level
Barangay – 4th level
A technique that considers
MULTI-STAGE SAMPLING
[Diagram: Regions branch into Divisions, Divisions into School Districts, and School Districts into Schools.]
NON-PROBABILITY SAMPLING
TECHNIQUES
• Purposive Sampling
Based on criteria or qualifications given by the researcher.
Those who satisfy the criteria are included.
• Quota Sampling
It is quick and cheap
since the interviewer is given a definite
Presentation of Data
Objectives: At the end of the lesson, the
students should be able to:
1. Prepare a stem-and-leaf plot
2. Describe data in textual form
3. Construct a frequency distribution table
4. Create graphs
5. Read and interpret graphs and tables
Presentation of Data
Textual Method
• Rearrangement from lowest to highest
• Stem-and-leaf plot
Tabular Method
• Frequency distribution table (FDT)
• Relative FDT
• Cumulative FDT
• Contingency Table
Graphical Method
• Bar Chart
• Histogram
• Frequency Polygon
• Pie Chart
• Less than, greater than Ogive
Textual Presentation of Data
Data can be presented using paragraphs or
sentences. It involves enumerating important
characteristics, emphasizing significant figures
and identifying important features of data.
Solution
First, arrange the data in order for you to identify
the important characteristics. This can be done in
two ways: rearranging from lowest to highest or
using the stem-and-leaf plot.
Below is the rearrangement of data from lowest to
highest:
 9  23  28  35  38  43  45  48
17  24  29  37  39  43  45  49
18  25  34  38  39  44  46  50
20  26  34  38  39  44  46  50
23  27  35  38  42  45  46  50
With the rearranged data, pertinent data
worth mentioning can be easily
recognized. The following is one way of
presenting data in textual form.
In the Statistics class of 40 students, 3 obtained
the perfect score of 50. Sixteen students got a score
of 40 and above, while only 3 got 19 and below.
Generally, the students performed well in the test
with 23 or 57.5% getting a passing score of 38 and
above.
Another way of rearranging data is by
making use of the stem-and-leaf plot.
Stem-and-leaf Plot
is a table which sorts
data according to a certain pattern. It involves
separating a number into two parts. In a
two-digit number, the stem consists of the first digit,
and the leaf consists of the second digit. In
a three-digit number, the stem consists of the
first two digits, and the leaf consists of the last
digit. In a one-digit number, the stem is zero.
MCPegollo/Basic Statistics/SRSTHS
Below is the stem-and-leaf plot of the
ungrouped data given in the example.
Stem Leaves
0 9
1 7,8
2 0,3,3,4,5,6,7,8,9
3 4,4,5,5,7,8,8,8,8,9,9,9
4 2,3,3,4,4,5,5,5,6,6,6,8,9
5 0,0,0
Utilizing the stem-and-leaf plot, we can readily see the
order of the data. Thus, we can say that the top ten got
scores 50, 50, 50, 49, 48, 46, 46, 46, 45, and 45, and the ten
lowest scores are 9, 17, 18, 20, 23, 23, 24, 25, 26, and 27.
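The stem-and-leaf construction described above can be sketched in Python for the 40 test scores of the example. The function name `stem_and_leaf` is hypothetical, and the sketch assumes non-negative whole numbers, with the last digit as the leaf and the remaining leading digits as the stem.

```python
from collections import defaultdict

# The 40 test scores from the example.
scores = [
    9, 17, 18, 20, 23, 23, 24, 25, 26, 27,
    28, 29, 34, 34, 35, 35, 37, 38, 38, 38,
    38, 39, 39, 39, 42, 43, 43, 44, 44, 45,
    45, 45, 46, 46, 46, 48, 49, 50, 50, 50,
]


def stem_and_leaf(data):
    """Group each value's last digit (leaf) under its leading digits (stem)."""
    plot = defaultdict(list)
    for value in sorted(data):
        plot[value // 10].append(value % 10)  # a one-digit number gets stem 0
    return dict(plot)


plot = stem_and_leaf(scores)
for stem in sorted(plot):
    print(stem, ",".join(str(leaf) for leaf in plot[stem]))
```

Because the data are sorted before grouping, the leaves in each row come out in increasing order, which is what makes the lowest and highest scores easy to read off.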
Tabular Presentation of Data
http://www.sws.org.ph/youth.htm
Parts of a table: Table Number, Table Title, Column Header,
Row Classifier, Body, Source Note
Sample of a Frequency Distribution
Table for Grouped Data
Table 1.2
Frequency Distribution Table for the Quiz Scores of 50 Students
in Geometry
Rating     Frequency
0 - 2      1
3 - 5      2
6 - 8      13
9 - 11     15
12 - 14    19
Lower Class Limits
The lower class limits are the smallest values that can belong to each class: 0, 3, 6, 9, 12.
Upper Class Limits
The upper class limits are the largest values that can belong to each class: 2, 5, 8, 11, 14.
Class Boundaries
Rating     Frequency
0 - 2      20
3 - 5      14
6 - 8      15
9 - 11     2
12 - 14    1
Class boundaries: -0.5, 2.5, 5.5, 8.5, 11.5, 14.5
Class Midpoints
The class midpoints are the midpoints of the classes.
Rating     Midpoint    Frequency
0 - 2      1           20
3 - 5      4           14
6 - 8      7           15
9 - 11     10          2
12 - 14    13          1
Class Width
Rating     Frequency    Class Width
0 - 2      20           3
3 - 5      14           3
6 - 8      15           3
9 - 11     2            3
12 - 14    1            3
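The class boundaries, midpoints, and width shown in the slides above can all be derived from the class limits. The sketch below assumes equal-width classes separated by a constant gap (here 1, since the ratings are whole numbers); the variable names are illustrative only.

```python
# Class limits (lower, upper) of the Rating classes used in the slides.
classes = [(0, 2), (3, 5), (6, 8), (9, 11), (12, 14)]

lower_limits = [lower for lower, _ in classes]
upper_limits = [upper for _, upper in classes]

# Gap between one class's upper limit and the next class's lower limit.
gap = lower_limits[1] - upper_limits[0]

# Boundaries sit halfway across each gap, extended by half a gap at both ends.
boundaries = [lower_limits[0] - gap / 2] + [upper + gap / 2 for upper in upper_limits]

# A midpoint is the average of the class's lower and upper limits.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Class width is the difference between consecutive lower class limits.
width = lower_limits[1] - lower_limits[0]

print(boundaries)
print(midpoints)
print(width)
```

Note that the width is 3 even though 2 - 0 = 2: it is measured between consecutive lower limits (or between consecutive boundaries), not between the limits of one class.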
Constructing A Frequency Table
1. Decide on the number of classes.
2. Determine the class width by dividing the range by the number of classes (range = highest score - lowest score) and round up.
   class width = round up of (range / number of classes)
3. Select for the first lower limit either the lowest score or a convenient value slightly less than the lowest score.
4. Add the class width to the starting point to get the second lower class limit, add the width to the second lower limit to get the third, and so on.
5. List the lower class limits in a vertical column and enter the upper class limits.
6. Represent each score by a tally mark in the appropriate class. Total the tally marks to find the total frequency for each class.
Relative Frequency Table
relative frequency = class frequency / sum of all frequencies
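The construction steps and the relative frequency formula above can be sketched in Python. This is an illustration under stated assumptions: it reuses the 40 test scores from the textual presentation example, the choice of 6 classes is arbitrary, and the function name `frequency_table` is hypothetical. Counting replaces the manual tally marks.

```python
import math

# The 40 test scores from the textual presentation example.
scores = [
    9, 17, 18, 20, 23, 23, 24, 25, 26, 27,
    28, 29, 34, 34, 35, 35, 37, 38, 38, 38,
    38, 39, 39, 39, 42, 43, 43, 44, 44, 45,
    45, 45, 46, 46, 46, 48, 49, 50, 50, 50,
]


def frequency_table(data, num_classes):
    """Return (class width, rows), where each row is (lower, upper, frequency)."""
    lowest, highest = min(data), max(data)
    # Step 2: class width = range / number of classes, rounded up.
    width = max(1, math.ceil((highest - lowest) / num_classes))
    rows = []
    lower = lowest                      # Step 3: start at the lowest score
    while lower <= highest:             # Steps 4-5: successive lower and upper limits
        upper = lower + width - 1       # whole-number scores assumed
        frequency = sum(lower <= x <= upper for x in data)  # Step 6: tally
        rows.append((lower, upper, frequency))
        lower += width
    return width, rows


width, rows = frequency_table(scores, num_classes=6)  # 6 classes is an assumption
total = sum(frequency for _, _, frequency in rows)
relative = [round(100 * frequency / total, 1) for _, _, frequency in rows]

for (lower, upper, frequency), percent in zip(rows, relative):
    print(f"{lower:>2} - {upper:<2}  {frequency:>2}  {percent}%")
```

The loop continues until the highest score is covered, so in cases where the rounded-up width leaves the top score just outside the last class, one extra class is added rather than dropping data.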
Relative Frequency Table
Rating     Frequency    Relative Frequency
0 - 2      20           38.5%
3 - 5      14           26.9%
6 - 8      15           28.8%
9 - 11     2            3.8%
12 - 14    1            1.9%
20/52 = 38.5%
14/52 = 26.9%
etc.
Table 2-5
Cumulative Frequency Table
Cumulative Frequencies
Rating     Frequency    <cf    >cf
0 - 2      20           20     52
3 - 5      14           34     32
6 - 8      15           49     18
9 - 11     2            51     3
12 - 14    1            52     1
Table 2-6
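The two cumulative columns in the table above are running totals of the class frequencies, taken from the top for the less-than column and from the bottom for the greater-than column. A minimal sketch, with illustrative variable names:

```python
from itertools import accumulate

# Class frequencies for the ratings 0-2, 3-5, 6-8, 9-11, 12-14.
frequencies = [20, 14, 15, 2, 1]

# Less-than cumulative frequency: running total from the first class downward.
less_than_cf = list(accumulate(frequencies))

# Greater-than cumulative frequency: running total from the last class upward.
greater_than_cf = list(accumulate(reversed(frequencies)))[::-1]

print(less_than_cf)
print(greater_than_cf)
```

The last less-than value and the first greater-than value both equal the total number of observations (52), which is a quick check on the arithmetic.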
Ahmed-Refat-ZU
Graphical Presentation
A good diagram should:
• Be simple
• Be easy to understand
• Save a lot of words
• Be self-explanatory
• Have a clear title indicating its content
• Be fully labeled
Graphical Presentation
The Line diagram
Example:
Cumulative Frequency Graph
A cumulative frequency graph, or ogive, is a line graph that
displays the cumulative frequency of each class at its upper class
boundary.
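As a sketch of this definition, the points of a less-than ogive can be listed by pairing each upper class boundary with its cumulative frequency. The example reuses the Rating classes from the frequency tables earlier in the lecture rather than the ages in the figure. Starting the graph at the first lower boundary with a cumulative frequency of 0 is a common convention and is assumed here.

```python
from itertools import accumulate

# Upper class boundaries and frequencies of the Rating classes 0-2, ..., 12-14.
upper_boundaries = [2.5, 5.5, 8.5, 11.5, 14.5]
frequencies = [20, 14, 15, 2, 1]

# Assumed convention: begin at the lowest class boundary with nothing accumulated yet.
ogive_points = [(-0.5, 0)] + list(zip(upper_boundaries, accumulate(frequencies)))

for boundary, cumulative in ogive_points:
    print(boundary, cumulative)
```

Plotting these points and joining them with straight lines gives the ogive; the final point sits at the last upper boundary with a height equal to the total frequency.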
[Figure: Ages of Students. Horizontal axis: Age (in years), marked at the class boundaries 17.5, 25.5, 33.5, 41.5, 49.5, 57.5. Vertical axis: Cumulative frequency (portion of students), from 0 to 30.]
The graph ends at the upper
Graphical Presentation
Bar chart
• It is used for presenting discrete or qualitative data.
• It represents the measured value (or %) by separated
rectangles of constant width, with lengths
proportional to the frequency.
• Types: simple, multiple
Bar diagram
Graphical Presentation
Pie diagram:
Percentage of causes of child death in Egypt:
diarrhea 50%, chest infection 30%, congenital 10%
Graphical Presentation
Histogram:
• It is very similar to the bar chart, with the difference
that the rectangles or bars are adjacent (without gaps).
• It is used for presenting a class frequency table
(continuous data).
Graphical Presentation
Frequency Polygon
• Derived from a histogram by connecting the midpoints
of the tops of the rectangles in the histogram.
• The line connecting the centers of the histogram
rectangles is called the frequency polygon.
• We can draw the polygon without the rectangles, which
gives a simpler form of line graph.
• A special type of frequency polygon is the Normal
The Frequency Polygon
• Examples:
Age in Years    Males    Females    Mid-point of interval
20-30           3        2          (20+30)/2 = 25
30-40           5        5          (30+40)/2 = 35
40-50           7        8          (40+50)/2 = 45
50-60           4        3          (50+60)/2 = 55
60-70           2        4          (60+70)/2 = 65
The Frequency Polygon
• Example:
Figure: Distribution of a group of subjects by age and sex
Graphical Presentation
Scatter diagram
This scatter diagram shows a positive (direct)
relationship between NAG and the
albumin/creatinine ratio among diabetic patients.
[Figure: Correlation between NAG and albumin creatinine ratio in a group of early diabetics. Axes: albumin creatinine ratio and NAG.]
Graphical Presentation
Box Plots
Box Plots are another way of representing all the same information that can be found on a Cumulative Frequency graph.
A box plot marks the following:
• Lowest value
• Lower Quartile
• Median
• Upper Quartile
• Highest value
• Inter-Quartile Range
• Range
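The values a box plot displays can be computed directly from the data. The sketch below reuses the 40 test scores from the textual presentation example. Quartile conventions differ between textbooks; this example uses the default method of Python's `statistics.quantiles`, which is an assumption and may give slightly different quartiles from a hand method taught in class.

```python
import statistics

# The 40 test scores from the textual presentation example.
scores = [
    9, 17, 18, 20, 23, 23, 24, 25, 26, 27,
    28, 29, 34, 34, 35, 35, 37, 38, 38, 38,
    38, 39, 39, 39, 42, 43, 43, 44, 44, 45,
    45, 45, 46, 46, 46, 48, 49, 50, 50, 50,
]

# Three cut points dividing the data into four equal groups.
lower_quartile, median, upper_quartile = statistics.quantiles(scores, n=4)

summary = {
    "lowest value": min(scores),
    "lower quartile": lower_quartile,
    "median": median,
    "upper quartile": upper_quartile,
    "highest value": max(scores),
    "inter-quartile range": upper_quartile - lower_quartile,
    "range": max(scores) - min(scores),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The box spans the lower to upper quartile with a line at the median, and the whiskers extend to the lowest and highest values.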