Business Statistics
Week 7
Agenda
Time Activity
40 minutes Sampling
60 minutes Sampling Distribution of the Mean
50 minutes Sampling Distribution of the Proportion
50 minutes Exercise
Learning Objectives
In this chapter, you learn:
• To distinguish between different sampling methods
• The concept of the sampling distribution
• To compute probabilities related to the sample mean and the sample proportion
Why Sample?
• Selecting a sample is less time-consuming than selecting every item in the population (census).
• Selecting a sample is less costly than selecting every item in the population.
• An analysis of a sample is less cumbersome and more practical than an analysis of the entire population.
A Sampling Process Begins With A
Sampling Frame
• The sampling frame is a listing of items that make up the population
• Frames are data sources such as population lists, directories, or maps
• Inaccurate or biased results can occur if a frame excludes certain portions of the population
• Using different frames to generate data can lead to dissimilar conclusions
Types of Samples
Samples
• Non-probability samples: judgment, convenience
• Probability samples: simple random, systematic, stratified, cluster
Types of Samples:
Nonprobability Sample
• In a nonprobability sample, items included are chosen without regard to their probability of occurrence.
– In convenience sampling, items are selected based only on the fact that they are easy, inexpensive, or convenient to sample.
– In a judgment sample, you get the opinions of pre-selected experts in the subject matter.
Types of Samples:
Probability Sample
• In a probability sample, items in the sample are chosen on the basis of known probabilities.
Probability Sample:
Simple Random Sample
• Every individual or item from the frame has an equal chance of being selected
• Selection may be with replacement (selected individual is returned to frame for possible reselection) or without replacement (selected individual isn’t returned to the frame).
• Samples are obtained from a table of random numbers or a computer random number generator.
Selecting a Simple Random Sample Using A Random Number Table
Sampling Frame For Population With 850 Items
Item Name    Item #
Bev R.       001
Ulan X.      002
. . .        . . .
Joann P.     849
Paul F.      850
Portion Of A Random Number Table
49280 88924 35779 00283 81163 07275 11100 02340 12860 74697 96644 89439 09893 23997 20048 49420 88872 08401
The First 5 Items in a Simple Random Sample (reading three digits at a time)
Item # 492
Item # 808
Item # 892 -- does not exist so ignore
Item # 435
Item # 779
Item # 002
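The table lookup above can be sketched in Python. This is a minimal illustration, not part of the original slides: the function name `select_from_digit_stream` is hypothetical, and skipping repeated codes assumes sampling without replacement, which the slide does not state.

```python
def select_from_digit_stream(digits, frame_size, sample_size, width=3):
    """Read consecutive `width`-digit codes and keep those that name a frame item."""
    selected = []
    for start in range(0, len(digits) - width + 1, width):
        item = int(digits[start:start + width])
        # Codes outside 1..frame_size do not exist in the frame, so they are ignored.
        # Repeats are skipped too (assumption: sampling without replacement).
        if 1 <= item <= frame_size and item not in selected:
            selected.append(item)
        if len(selected) == sample_size:
            break
    return selected

# First four groups of the random number table, with the spaces removed.
row = "49280 88924 35779 00283".replace(" ", "")
first_five = select_from_digit_stream(row, frame_size=850, sample_size=5)
print(first_five)  # [492, 808, 435, 779, 2]
```

Code 892 is dropped because the frame only has 850 items, which matches the slide.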
Probability Sample:
Systematic Sample
• Decide on sample size: n
• Divide frame of N individuals into groups of k individuals: k=N/n
• Randomly select one individual from the 1st group
• Select every kth individual thereafter
Example: N = 40, n = 4, k = N/n = 10
[Figure: frame of 40 individuals split into 4 groups of 10, with the first group marked]
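The systematic selection steps can be sketched as follows. This is an illustrative sketch only: `systematic_sample` is a hypothetical helper, it assumes n divides N evenly, and the seed is fixed just to make the run repeatable.

```python
import random

def systematic_sample(frame_size, sample_size, rng):
    """Return 1-based frame positions chosen by systematic sampling."""
    k = frame_size // sample_size   # group size k = N / n (assumes n divides N)
    start = rng.randint(1, k)       # random individual from the first group
    return [start + i * k for i in range(sample_size)]  # every kth individual after it

rng = random.Random(7)  # fixed seed, an arbitrary choice for repeatability
positions = systematic_sample(frame_size=40, sample_size=4, rng=rng)
print(positions)
```

With N = 40 and n = 4 the positions are always 10 apart, and the first one falls somewhere in 1 to 10.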
Probability Sample:
Stratified Sample
• Divide population into two or more subgroups (called strata) according to some common characteristic
• A simple random sample is selected from each subgroup, with sample sizes proportional to strata sizes
• Samples from subgroups are combined into one
• This is a common technique when sampling a population of voters, stratifying across racial or socio-economic lines.
[Figure: population divided into 4 strata]
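Proportional stratified sampling can be sketched like this. The strata labels and sizes below are made up for illustration, and `stratified_sample` is a hypothetical helper rather than anything from the slides.

```python
import random

def stratified_sample(strata, sample_size, rng):
    """Draw a simple random sample from each stratum, sized in proportion to the stratum."""
    total = sum(len(items) for items in strata.values())
    sample = {}
    for name, items in strata.items():
        # Proportional allocation; with awkward sizes, rounding can shift the total slightly.
        n_h = round(sample_size * len(items) / total)
        sample[name] = rng.sample(items, n_h)  # without replacement inside the stratum
    return sample

# Hypothetical frame with 4 strata of 40, 30, 20 and 10 items.
strata = {
    "A": [f"A{i}" for i in range(40)],
    "B": [f"B{i}" for i in range(30)],
    "C": [f"C{i}" for i in range(20)],
    "D": [f"D{i}" for i in range(10)],
}
rng = random.Random(1)
picked = stratified_sample(strata, sample_size=20, rng=rng)
sizes = {name: len(items) for name, items in picked.items()}
print(sizes)  # {'A': 8, 'B': 6, 'C': 4, 'D': 2}
```

The per-stratum samples in `picked` would then be combined into one overall sample.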
Probability Sample
Cluster Sample
• Population is divided into several “clusters,” each representative of the population
• A simple random sample of clusters is selected
• All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique
• A common application of cluster sampling involves election exit polls, where certain election districts are selected and sampled.
[Figure: population divided into 16 clusters, some of them randomly selected]
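A one-stage cluster sample, where every item in the chosen clusters is used, might look like the sketch below. The 16 clusters of 5 voters each are invented data, and `cluster_sample` is a hypothetical name.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Pick a simple random sample of clusters and keep every item inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)  # random sample of cluster ids
    items = [item for c in chosen for item in clusters[c]]
    return chosen, items

# Hypothetical population: 16 election districts with 5 voters each.
clusters = {c: [f"district{c}-voter{i}" for i in range(5)] for c in range(1, 17)}
rng = random.Random(3)
chosen, items = cluster_sample(clusters, n_clusters=4, rng=rng)
print(chosen, len(items))
```

To sample within the selected clusters instead, the list comprehension could be replaced by another probability sampling step applied to each chosen cluster.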
Probability Sample:
Comparing Sampling Methods
• Simple random sample and Systematic sample
– Simple to use
– May not be a good representation of the population’s underlying characteristics
• Stratified sample
– Ensures representation of individuals across the entire population
• Cluster sample
– More cost effective
– Less efficient (need larger sample to acquire the same level of precision)
Evaluating Survey Worthiness
• What is the purpose of the survey?
• Is the survey based on a probability sample?
• Coverage error – appropriate frame?
• Nonresponse error – follow up
• Measurement error – good questions elicit good responses
Types of Survey Errors
• Coverage error or selection bias
– Exists if some groups are excluded from the frame and have no chance of being selected
• Nonresponse error or bias
– People who do not respond may be different from those who do respond
• Sampling error
– Variation from sample to sample will always exist
• Measurement error
– Due to weaknesses in question design, respondent error, and interviewer’s effects on the respondent (“Hawthorne effect”)
Types of Survey Errors (continued)
• Coverage error: excluded from frame
• Nonresponse error: follow up on nonresponses
• Sampling error: random differences from sample to sample
• Measurement error: bad or leading question
Sampling Distributions
• A sampling distribution is a distribution of all of the possible values of a sample statistic for a given sample size selected from a population.
• For example, suppose you sample 50 students from your college and compute their mean GPA. If you obtained many different samples of 50, you would compute a different mean for each sample. We are interested in the distribution of all the mean GPAs we might calculate for any given sample of 50 students.
Developing a
Sampling Distribution
• Assume there is a population …
• Population size N = 4
• Random variable, X, is age of individuals
• Values of X: 18, 20, 22, 24 (years), for individuals A, B, C, D
[Figure: uniform population distribution, P(x) = 0.25 at each of x = 18, 20, 22, 24]
Summary Measures for the Population Distribution:
\[ \mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21 \]
\[ \sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 2.236 \]
16 possible samples (sampling with replacement)
1st Obs   2nd Observation
          18      20      22      24
18        18,18   18,20   18,22   18,24
20        20,18   20,20   20,22   20,24
22        22,18   22,20   22,22   22,24
24        24,18   24,20   24,22   24,24
16 Sample Means
1st Obs   2nd Observation
          18   20   22   24
18        18   19   20   21
20        19   20   21   22
22        20   21   22   23
24        21   22   23   24
Sampling Distribution of All Sample Means
[Figure: distribution of the 16 sample means, $P(\bar{X})$ plotted against $\bar{X}$ = 18, 19, ..., 24; no longer uniform]
Summary Measures of this Sampling Distribution:
\[ \mu_{\bar{X}} = \frac{\sum \bar{X}_i}{N} = \frac{18 + 19 + 19 + \cdots + 24}{16} = 21 \]
\[ \sigma_{\bar{X}} = \sqrt{\frac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58 \]
Comparing the Population Distribution
to the Sample Means Distribution
Population (N = 4): $\mu = 21$, $\sigma = 2.236$
[Figure: uniform P(X) over X = 18, 20, 22, 24 (A, B, C, D)]
Sample Means Distribution (n = 2): $\mu_{\bar{X}} = 21$, $\sigma_{\bar{X}} = 1.58$
[Figure: $P(\bar{X})$ over $\bar{X}$ = 18 to 24]
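The 16 samples of size n = 2 and their summary measures can be reproduced by brute-force enumeration. This is a small sketch using only the Python standard library; the variable names are illustrative.

```python
from collections import Counter
from itertools import product
from statistics import mean, pstdev

population = [18, 20, 22, 24]       # ages of individuals A, B, C, D
mu = mean(population)               # population mean
sigma = pstdev(population)          # population standard deviation (divides by N)

# All ordered samples of size 2, drawn with replacement: 4 * 4 = 16 of them.
samples = list(product(population, repeat=2))
sample_means = [mean(s) for s in samples]

mu_xbar = mean(sample_means)        # mean of the 16 sample means
sigma_xbar = pstdev(sample_means)   # standard deviation of the 16 sample means
counts = Counter(sample_means)      # how often each sample mean occurs

print(f"mu = {mu:.2f}, sigma = {sigma:.3f}")
print(f"mu_xbar = {mu_xbar:.2f}, sigma_xbar = {sigma_xbar:.2f}")
print(sorted(counts.items()))
```

The mean of the sample means equals the population mean, and their spread equals the population standard deviation divided by the square root of 2.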
Sample Mean Sampling Distribution:
Standard Error of the Mean
• Different samples of the same size from the same population will yield different sample means
• A measure of the variability in the mean from sample to sample is given by the Standard Error of the Mean:
\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \]
(This assumes that sampling is with replacement or sampling is without replacement from an infinite population)
• Note that the standard error of the mean decreases as the sample size increases
Sample Mean Sampling Distribution:
If the Population is Normal
• If a population is normally distributed with mean μ and standard deviation σ, the sampling distribution of $\bar{X}$ is also normally distributed with
\[ \mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \]
Z-value for Sampling Distribution
of the Mean
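• Z-value for the sampling distribution of $\bar{X}$:
\[ Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \]
where: $\bar{X}$ = sample mean
$\mu$ = population mean
$\sigma$ = population standard deviation
$n$ = sample size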
Sampling Distribution Properties
• $\mu_{\bar{x}} = \mu$ (i.e. $\bar{x}$ is unbiased)
[Figure: normal population distribution and normal sampling distribution, which has the same mean]
Sampling Distribution Properties (continued)
• As n increases, $\sigma_{\bar{x}}$ decreases
[Figure: sampling distributions for a larger and a smaller sample size, both centered at μ]
Example
Oxford Cereals fills thousands of boxes of cereal during an eight-hour shift. As the operations manager, you are responsible for monitoring the amount of cereal placed in each box. To be consistent with the package label, boxes should contain a mean of 368 grams of cereal. Because of the speed of the process, the cereal weight varies from box to box, so some boxes are underfilled and others are overfilled. If the process is not working properly, the mean weight in the boxes could vary too much from the label weight of 368 grams.
Example
Because weighing every box is too time-consuming, costly, and inefficient, you must take a sample. For each sample you select, you plan to weigh the individual boxes and calculate the sample mean. You need to determine the probability that such a sample mean could have come from a population whose mean is 368 grams. Based on your analysis, you will have to decide whether to maintain, adjust, or shut down the cereal-filling process.
Example
a. If you randomly select 25 boxes without replacement from the thousands of boxes filled during a shift, the sample contains much less than 5% of the population. Given that the standard deviation of the cereal-filling process is 15 grams, compute the standard error of the mean.
\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{25}} = 3 \]
Example
b. How is the standard error of the mean affected by increasing the sample size from 25 to 100 boxes?
\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{100}} = 1.5 \]
The fourfold increase in sample size halves the standard error, from 3 grams to 1.5 grams.
Example
c. If you select a sample of 100 boxes, what is the probability that the sample mean is below 365 grams?
\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{365 - 368}{15 / \sqrt{100}} = \frac{-3}{1.5} = -2.00 \]
\[ P(\bar{X} < 365) = P(Z < -2.00) = 0.0228 \]
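Parts (a) to (c) of the cereal example can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library in place of a printed Z table; the helper name `standard_error` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 368, 15   # process mean and standard deviation, in grams

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

se_25 = standard_error(sigma, 25)     # part (a)
se_100 = standard_error(sigma, 100)   # part (b)

z = (365 - mu) / se_100               # part (c): Z-value for a sample mean of 365
p_below_365 = NormalDist().cdf(z)     # P(Z < z) under the standard normal

print(se_25, se_100, z, round(p_below_365, 4))  # 3.0 1.5 -2.0 0.0228
```

The same probability could be obtained without standardizing, as `NormalDist(mu, se_100).cdf(365)`.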
Example
d. Find an interval symmetrically distributed around the population mean that will include 95% of the sample means, based on samples of 25 boxes.
[Figure: sampling distribution centered at 368, with 95% of the sample means between $\bar{X}_L$ and $\bar{X}_U$]
Thus: $P(\bar{X} < \bar{X}_L) = 0.025$ and $P(\bar{X} < \bar{X}_U) = 0.975$
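The two bounds defined by these probabilities can also be obtained numerically. This is a hedged sketch that uses `NormalDist.inv_cdf` from the Python standard library instead of a Z table, so it works with the unrounded critical value (about 1.96).

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 368, 15, 25
se = sigma / sqrt(n)                    # standard error of the mean: 3 grams

z_upper = NormalDist().inv_cdf(0.975)   # Z-value with 97.5% of the area below it
lower = mu - z_upper * se               # lower bound of the symmetric interval
upper = mu + z_upper * se               # upper bound of the symmetric interval

print(round(lower, 2), round(upper, 2))  # 362.12 373.88
```

By symmetry, the lower critical value is the negative of the upper one, so only one call to `inv_cdf` is needed.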
Example
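\[ P(\bar{X} < \bar{X}_L) = 0.025 \Rightarrow Z_L = -1.96, \qquad P(\bar{X} < \bar{X}_U) = 0.975 \Rightarrow Z_U = +1.96 \]
\[ \bar{X}_L = 368 + (-1.96)\frac{15}{\sqrt{25}} = 368 - 1.96(3) = 362.12 \]
\[ \bar{X}_U = 368 + (1.96)\frac{15}{\sqrt{25}} = 368 + 1.96(3) = 373.88 \]
Exercise 1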
The U.S. Census Bureau announced that the median sales price of new houses sold in 2009 was $215,600, and the mean sales price was $270,100. Assume that the standard deviation of the sales prices is $90,000.
a. If you select a sample of n = 100, what is the probability that the sample mean will be less than $300,000?
b. If you select a sample of n = 100, what is the probability that the sample mean will be between $275,000 and $290,000?
Exercise 2
The time spent using e-mail per session is normally distributed, with μ = 8 minutes and σ = 2 minutes. If you select a random sample of 25 sessions,
a. What is the probability that the sample mean is between 7.8 and 8.2 minutes?
b. What is the probability that the sample mean is between 7.5 and 8 minutes?
c. If you select a random sample of 100 sessions, what is the probability that the sample mean is between 7.8 and 8.2 minutes?
Exercise 3
The amount of time a bank teller spends serving each customer has a mean, μ = 3.10 minutes, and a standard deviation, σ = 0.40 minutes. If you select a random sample of 16 customers,
a. What is the probability that the mean time spent per customer is at least 3 minutes?
b. There is an 85% chance that the sample mean is less than how many minutes?
c. What assumption must you make in order to solve (a) and (b)?
d. If you select a random sample of 64 customers, there is an 85% chance that the sample mean is less than how many minutes?
Exercise 1 (answers)
a. Because the mean is greater than the median, the population distribution of sales prices is right-skewed. With n = 3 (n < 30), the sampling distribution of the mean will also be right-skewed.
b. Because n = 100, the sampling distribution of the mean will be approximately normal, with mean $270,100 and standard error $9,000.
c. P(sample mean < $300,000) = 0.9996
d. P($275,000 < sample mean < $290,000) = 0.2796
Exercise 3
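a. $P(\bar{X} > 3) = P(Z > -1.00) = 1.0 - 0.1587 = 0.8413$
b. $P(Z < 1.04) = 0.85$, so $\bar{X} = 3.10 + 1.04(0.1) = 3.204$
c. The population distribution must be at least symmetric, so that the sampling distribution of the mean is approximately normal for n = 16.
d. $P(Z < 1.04) = 0.85$, so $\bar{X} = 3.10 + 1.04(0.05) = 3.152$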
Sampling from Non-Normally Distributed
Populations—Central Limit Theorem
As the sample size gets large enough, the sampling distribution of the sample mean becomes almost normal regardless of the shape of the population.
Sample Mean Sampling Distribution: If the Population is not Normal
How Large is Large Enough?
• For most distributions, n > 30 will give a sampling distribution that is nearly normal
• For fairly symmetric distributions, n > 15 will usually give a sampling distribution that is almost normal
• For normal population distributions, the sampling distribution of the mean is always normally distributed
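The effect of sample size on the shape of the sampling distribution can be illustrated with a small simulation. Everything here is an assumption made for illustration: the population is exponential with mean 1 (strongly right-skewed), each sample size gets 2000 repetitions, and the seed is arbitrary.

```python
import random
from statistics import fmean, pstdev

def sample_means(n, repetitions, rng):
    """Means of `repetitions` samples of size n from an exponential population (mean 1)."""
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(repetitions)]

def skewness(values):
    """Population skewness; it is 0 for a symmetric distribution such as the normal."""
    m, s = fmean(values), pstdev(values)
    return fmean(((v - m) / s) ** 3 for v in values)

rng = random.Random(42)   # fixed seed so the run is repeatable
means_n2 = sample_means(2, 2000, rng)
means_n30 = sample_means(30, 2000, rng)

for n, means in ((2, means_n2), (30, means_n30)):
    print(f"n={n:>2}: mean={fmean(means):.3f}  "
          f"std error={pstdev(means):.3f}  skewness={skewness(means):.2f}")
```

For n = 30 the simulated standard error should sit close to 1/√30 ≈ 0.18, and the skewness should be much nearer to 0 than for n = 2.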
Population Proportions
π = the proportion of the population having some characteristic
Sample proportion ( p ) provides an estimate of π:
\[ p = \frac{X}{n} = \frac{\text{number of items in the sample having the characteristic of interest}}{\text{sample size}} \]
• 0 ≤ p ≤ 1
• p is approximately distributed as a normal distribution when n is large
• (assuming sampling with replacement from a finite population or without replacement from an infinite population)
Sampling Distribution of p
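• Approximated by a normal distribution if:
\[ n\pi \ge 5 \quad \text{and} \quad n(1 - \pi) \ge 5 \]
where
\[ \mu_p = \pi \quad \text{and} \quad \sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} \]
(where π = population proportion)
[Figure: sampling distribution of p, P(p) plotted against p from 0 to 1]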
Z-Value for Proportions
\[ Z = \frac{p - \pi}{\sigma_p} = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}} \]
Example
• The manager of a local bank has determined that 40% of the bank's customers hold more than one account.
• If you select a random sample of 200 customers, then because nπ = 200(0.40) = 80 ≥ 5 and n(1 – π) = 200(0.60) = 120 ≥ 5, the sample size is large enough to assume that the sampling distribution of the proportion is approximately normal.
• Compute the probability that the sample proportion of customers holding more than one account is less than 0.30.
Example
\[ Z = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}} = \frac{0.30 - 0.40}{\sqrt{\dfrac{(0.40)(0.60)}{200}}} = \frac{-0.10}{\sqrt{0.24 / 200}} = -2.89 \]
P(Z < -2.89) = 0.0019
If the population proportion is 0.40, only 0.19% of samples of n = 200 would have a sample proportion of less than 0.30.
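The bank example can be checked with a few lines of Python. This sketch uses `statistics.NormalDist` rather than a Z table, so it evaluates the probability at the unrounded Z-value; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

pi_pop = 0.40   # population proportion of customers with more than one account
n = 200         # sample size
p = 0.30        # sample proportion of interest

# Rule of thumb from the slides for using the normal approximation.
large_enough = n * pi_pop >= 5 and n * (1 - pi_pop) >= 5

sigma_p = sqrt(pi_pop * (1 - pi_pop) / n)   # standard error of the proportion
z = (p - pi_pop) / sigma_p                  # Z-value for the sample proportion
prob = NormalDist().cdf(z)                  # P(p < 0.30)

print(large_enough, round(z, 2), prob)
```

The computed probability agrees with the table value 0.0019 to within the rounding of Z to two decimals.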
Exercise 4
An independent survey organization conducts a quick count of election results. Suppose there are two candidates. If one candidate receives at least 55% of the vote in the sample, that candidate will be forecast as the winner of the election. If you select a random sample of 100 voters, what is the probability that a candidate will be forecast as the winner when
a. The population percentage of the candidate's vote is 50.1%?
b. The population percentage of the candidate's vote is 60%?
c. The population percentage of the candidate's vote is 49% (and the candidate will actually lose the election)?
d. If the sample size is increased to 400,
Exercise 5
In a recent survey of full-time female workers ages 22 to 35 years, 46% said that they would rather give up some of their salary for more personal time. (Data extracted from “I’d Rather Give Up,” USA Today, March 4, 2010, p. 1B.) Suppose you select a sample of 100 full-time female workers 22 to 35 years old.
a. What is the probability that in the sample, fewer than 50% would rather give up some of their salary for more personal time?
b. What is the probability that in the sample, between 40% and 50% would rather give up some of their salary for more personal time?
c. What is the probability that in the sample, more than 40% would rather give up some of their salary for more personal time?
d. If the sample size becomes 400 people, what is the change