Statistika 2

(1)

STATISTICS 2

Hotniar Siringoringo

Research Institute (Lembaga Penelitian)

Campus D, Building 4, 1st floor

http://staffsite.gunadarma.ac.id/hotniars


(2)

Statistics 1

• Role of statistics in data analysis
• Statistics terms
• Frequency distribution
• Central tendency and measures of variation

(3)

Statistics 2

• Sampling Distribution
• Confidence Interval
• Hypothesis Testing
• Statistical Inference Based on Two Samples
• Simple Linear Regression
• Multiple Regression

(4)

Books

1. Bowerman, Bruce and O’Connell, Richard T. 1997. Applied Statistics: Improving Business Processes. Irwin Professional Publishing, USA

(5)

SAMPLING

(6)

Population and Sample

• Population: the totality of all objects or individuals with certain clear and complete characteristics that are to be studied
• Sample: a part of the population, drawn by a specific procedure, that also has certain clear and complete characteristics and is considered able to represent the population

(7)

Symbols for Parameters and Statistics

Quantity                  Parameter symbol (population)   Statistic symbol (sample)
Mean                      μ                               x̄
Variance                  σ²                              S²
Standard deviation        σ                               S
Number of observations    N                               n
Proportion                P                               p

(8)

• A sampling distribution is the theoretical (probability) distribution of a statistic (a sample characteristic) over all possible samples of a fixed size n; the statistic is what gets generalized to the population.
• The sampling distribution makes it possible to estimate the probability of a particular sample outcome for that statistic.

(9)

Sampling Distribution

• The distribution of statistics such as the mean, standard deviation, or proportion that can arise from the samples
• In general, the information needed to characterize a distribution adequately includes:
– Measures of central tendency (mean, median, mode)
– Measures of dispersion (range, standard deviation)
– The shape of the distribution
• General strategy for applying inferential statistics

(10)

Types of Sampling Distribution

1. Sampling distribution of the mean
2. Sampling distribution of the proportion
3. Other sampling distributions

• Sampling distribution of the mean: the sampling distribution of sample means is the distribution of the arithmetic means of all possible random samples of size n drawn from a population

(11)

• Sampling distribution of the proportion: the sampling distribution of the proportion is the distribution of the proportions of all possible random samples of size n drawn from a population
• Sampling distribution of differences/sums:
– There are two populations
– For each sample of size $n_1$ from the first population a statistic $S_1$ is computed, which yields a sampling distribution of $S_1$ with mean $\mu_{S_1}$ and standard deviation $\sigma_{S_1}$
– From the second population, for each sample of size …

(12)

Sampling Distribution of the Mean

a. Sampling from a finite population

1. For sampling without replacement, or n/N > 5%:
$\mu_{\bar{x}} = \mu$, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$
2. For sampling with replacement, or n/N ≤ 5%:
$\mu_{\bar{x}} = \mu$, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

(13)

A store has 5 employees A, B, C, D, E with hourly wages 2, 3, 3, 4, 5. Treating these wages as the population, determine (sampling without replacement):

a. The means of all samples of 2 elements
b. The mean of the sample means
c. The standard deviation of the sample means

The number of possible samples is
$C_2^5 = \frac{5!}{2!\,(5-2)!} = 10$

(14)

b. Mean of the sample means

$\mu_{\bar{x}} = \mu = \frac{2+3+3+4+5}{5} = 3.4$

c. Standard deviation
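The three parts of this exercise can be checked by brute force. The sketch below is illustrative only (the variable names are not from the slides): it lists all 10 samples of size 2, averages their means, and compares the enumerated standard deviation with the finite-population formula $\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$.

```python
from itertools import combinations
from math import sqrt

wages = [2, 3, 3, 4, 5]   # hourly wages of employees A..E (the population)
N, n = len(wages), 2

# a. mean of every sample of 2 drawn without replacement
sample_means = [sum(s) / n for s in combinations(wages, n)]

# b. mean of the sample means (should equal the population mean)
mu = sum(wages) / N
mean_of_means = sum(sample_means) / len(sample_means)

# c. standard deviation of the sample means: by enumeration and by formula
sd_enumerated = sqrt(
    sum((m - mean_of_means) ** 2 for m in sample_means) / len(sample_means)
)
sigma = sqrt(sum((x - mu) ** 2 for x in wages) / N)   # population std. dev.
sd_formula = sigma / sqrt(n) * sqrt((N - n) / (N - 1))

print(len(sample_means), mean_of_means)
print(round(sd_enumerated, 4), round(sd_formula, 4))
```

Both routes to part c should agree, which is the point of the finite-population correction factor.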

(15)

Sampling Distribution of the Mean

Sampling theorem for a normally distributed population:

If random samples of size n are drawn repeatedly from a normally distributed population with mean μ and standard deviation σ, then the sampling distribution of the sample mean is normal with mean μ and standard deviation
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

(16)

Sampling Distribution

(17)

Sampling Distribution

(18)

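b. Sampling from an infinite population
$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

c. Normal distribution table for the sampling distribution of the mean

1. For a finite population, or n/N > 5%:
$Z = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}}$
2. For an infinite population, or n/N ≤ 5%:
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$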

(19)

PROBLEM

• The hourly wage of workers has a mean of Rp 500 per hour and a standard deviation of Rp 60. What is the probability that the mean wage of a random sample of 50 workers lies between Rp 510 and Rp 520?

Given: μ = 500, σ = 60, n = 50

(20)

$\bar{X} = 510$ gives $Z = 1.18$
$\bar{X} = 520$ gives $Z = 2.36$

$P(1.18 < Z < 2.36) = P(0 < Z < 2.36) - P(0 < Z < 1.18)$
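A short numerical check of this problem, assuming Python 3.8+ so that the standard-library `statistics.NormalDist` is available. It is a sketch for verification rather than part of the original solution; it recomputes both Z values from $Z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$ and then the probability between them.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 500, 60, 50            # population mean, std. dev., sample size
standard_error = sigma / sqrt(n)      # sigma of the sample mean (n/N assumed small)

z_low = (510 - mu) / standard_error
z_high = (520 - mu) / standard_error

std_normal = NormalDist()             # standard normal: mean 0, std. dev. 1
prob = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(round(z_low, 2), round(z_high, 2), round(prob, 4))
```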

(21)

Sampling Distribution of the Proportion

• The sampling distribution of the proportion is the distribution of the proportions of all possible random samples of size n drawn from a population
• The proportion of villages that succeed after receiving program assistance
• The difference in perception between poor and rich residents

(22)

Sampling Distribution of the Proportion

• The population proportion is written $P = \frac{X}{N}$
• The sample proportion is written $p = \frac{X}{n}$

1. For sampling with replacement, or when the population is large relative to the sample, i.e. n/N ≤ 5%:
$\mu_p = P$, $\sigma_p = \sqrt{\frac{P(1-P)}{n}}$

(23)

2. For sampling without replacement, or when the population is small relative to the sample, i.e. n/N > 5%:
$\mu_p = P$, $\sigma_p = \sqrt{\frac{P(1-P)}{n}}\sqrt{\frac{N-n}{N-1}}$

(24)

A store has 6 employees: A, B, C, who like reading, and X, Y, Z, who do not. If a sample of 4 employees is drawn from these 6 (sampling without replacement), determine:

a. The number of possible samples
b. The sampling distribution of the proportion
c. The mean and standard deviation of the sampling distribution of the proportion

Answer:
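One way to work through parts a to c is to enumerate every possible sample. The sketch below is illustrative and assumes that "proportion" means the share of employees in the sample who like reading. It compares the enumerated standard deviation with the without-replacement formula $\sqrt{\frac{P(1-P)}{n}}\sqrt{\frac{N-n}{N-1}}$.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import sqrt

staff = ["A", "B", "C", "X", "Y", "Z"]
readers = {"A", "B", "C"}             # employees who like reading
N, n = len(staff), 4

# a. all samples of 4 drawn without replacement
samples = list(combinations(staff, n))

# b. sampling distribution of the sample proportion of readers
proportions = [Fraction(sum(e in readers for e in s), n) for s in samples]
distribution = Counter(proportions)   # proportion -> number of samples

# c. mean and standard deviation, by enumeration and by formula
mean_p = sum(proportions) / len(proportions)
sd_p = sqrt(sum((p - mean_p) ** 2 for p in proportions) / len(proportions))
P = Fraction(len(readers), N)         # population proportion
sd_formula = sqrt(P * (1 - P) / n * Fraction(N - n, N - 1))

print(len(samples), dict(distribution), mean_p, round(sd_p, 4))
```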

(25)

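Other Sampling Distributions

a. Sampling distribution of the difference of two means

1. Mean: $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$
2. Standard deviation: $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
3. For $n_1$ and $n_2$ with $n_1, n_2 > 30$: $Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}}$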

(26)

• Suppose the mean income of managers is Rp 50,000 with a standard deviation of Rp 15,000, and the mean income of employees is Rp 12,000 with a standard deviation of Rp 1,000. A random sample of 40 managers and 150 employees is drawn. Determine:

a. The difference of the sample mean incomes
b. The standard deviation of the difference of the sample mean incomes
c. The probability that the difference between the mean incomes of managers and ordinary employees is more than Rp 35,000

Given:

$\mu_1 = 50{,}000$, $\mu_2 = 12{,}000$, $\sigma_1 = 15{,}000$, $\sigma_2 = 1{,}000$
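The three quantities can be computed directly from the formulas for the difference of two means. This is a minimal sketch assuming the normal approximation (both samples exceed 30) and Python 3.8+ for `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu1, sigma1, n1 = 50_000, 15_000, 40     # managers
mu2, sigma2, n2 = 12_000, 1_000, 150     # employees

# a. mean of the sampling distribution of (x1_bar - x2_bar)
mean_diff = mu1 - mu2

# b. standard deviation of the difference of the sample means
sd_diff = sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# c. P(x1_bar - x2_bar > 35,000) under the normal approximation
z = (35_000 - mean_diff) / sd_diff
prob_more = 1 - NormalDist().cdf(z)

print(mean_diff, round(sd_diff, 1), round(z, 2), round(prob_more, 4))
```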

(27)

b. Sampling distribution of the difference of two proportions

1. Mean: $\mu_{p_1 - p_2} = P_1 - P_2$
2. Standard deviation: $\sigma_{p_1 - p_2} = \sqrt{\frac{P_1(1-P_1)}{n_1} + \frac{P_2(1-P_2)}{n_2}}$
3. For $n_1$ and $n_2$ with $n_1, n_2 \ge 30$: $Z = \frac{(p_1 - p_2) - (P_1 - P_2)}{\sigma_{p_1 - p_2}}$

(28)

Sampling Method

• A way of collecting data that takes only some of the elements of the population
• Reasons for choosing this method:
1. The objects under study are homogeneous
2. The objects under study are easily damaged
3. Savings in cost and time
4. Accuracy considerations

(29)

Sampling techniques fall into two broad groups:

1. Probability sampling (random sample). With this technique the researcher can determine the degree of confidence in a sample. In addition, the discrepancy that arises when population parameters are estimated from sample statistics can be estimated.

2. Non-probability sampling (non-random sample). With non-probability sampling, by contrast, the deviation of the sample values from the population cannot be measured. Measuring this deviation is one form of statistical testing.

(30)

Random sampling:

1. Simple random sampling
2. Stratified random sampling
3. Multistage random sampling
4. Systematic random sampling

(31)

Non-random sampling

1. Accidental sampling
2. Quota sampling

(32)

Simple Random Sampling

1. Build the sampling frame
2. Select the sample by drawing lots or by using a random number table

(33)

• Sampling frame → a list of the units of the population from which the sample will be drawn.
• Sampling unit → the smallest unit of the population from which the sample will be drawn.
• Sample design → covers how the sample is drawn and how its size is determined.
• Random → a way of drawing a sample in which every unit in the population has an equal chance of being selected.

(34)

1. Simple random sampling
→ the simplest and easiest design, but it requires certain conditions, namely a population that is truly or nearly homogeneous and that already …

(35)

• Advantages
1. High accuracy, and every sampling unit has the same probability of being selected
2. The sampling error can be determined quantitatively
• Disadvantages
When no list of basic units (sampling frame) exists and the population is scattered, or the population is very large with infrastructure that is not …

(36)

• Procedure
1. Make a list of all sampling units, arranged and numbered consecutively
2. Write each sampling unit on a roll of paper or a token of the same shape, size, and colour, then put them all in a box and mix thoroughly
3. Draw rolls of paper or tokens according to the number of samples required
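The lottery procedure above has a direct software analogue. The sketch below is illustrative only: `simple_random_sample`, the frame contents, and the seed are hypothetical names and values, and `random.Random.sample` stands in for drawing tokens from a well-mixed box without replacement.

```python
import random


def simple_random_sample(frame, sample_size, seed=None):
    """Draw sample_size distinct units from the sampling frame.

    Every unit has the same chance of selection, like tokens in a mixed box.
    """
    rng = random.Random(seed)          # seed only makes the draw repeatable
    return rng.sample(frame, sample_size)


# Step 1: a numbered list of all sampling units (hypothetical frame of 100 units)
frame = [f"unit-{i:03d}" for i in range(1, 101)]

# Steps 2-3: mix and draw as many units as the sample requires
chosen = simple_random_sample(frame, 10, seed=42)
print(chosen)
```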

(37)

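Systematic Random Sampling

1. Build the sampling frame
2. Determine the interval: $a = \frac{\text{population size}}{\text{sample size}}$
3. Select the first sample element by drawing lots or with a random number table; call it $n_1$
4. Select the following elements at $n_1 + a$, $n_1 + 2a$, and so on, until the whole sample has been selected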
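The four steps can be sketched in a few lines. This is an illustration under stated assumptions: the function name and the frame are hypothetical, the interval is rounded down when the population size is not a multiple of the sample size, and the random start is drawn from the first interval.

```python
import random


def systematic_sample(frame, sample_size, seed=None):
    """Pick every a-th unit of the frame after a random start."""
    interval = len(frame) // sample_size    # step 2: a = population / sample size
    rng = random.Random(seed)
    start = rng.randrange(interval)         # step 3: random first element (0-based index)
    # step 4: n1, n1 + a, n1 + 2a, ... until the sample is complete
    return [frame[start + k * interval] for k in range(sample_size)]


frame = list(range(1, 101))                 # step 1: a frame of 100 numbered units
picked = systematic_sample(frame, 10, seed=7)
print(picked)
```

Only the first element is random; every later element is fixed by the interval, so a periodic pattern in the frame can bias the sample.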

(38)

2. Stratified random sampling

(39)

• Advantage:
→ higher precision, with a smaller standard deviation, than simple random sampling.
• Disadvantage:
- The condition of the population must be known, and it often is not

(40)

Steps in a stratified design:

1. Divide (group) the population subjects into several strata whose members have the same or nearly the same characteristics
2. Make a list of the subjects in each stratum (sub-population)
3. Select the sample subjects from each sub-population using pure random sampling or …

(41)

3. Multistage random sampling
→ A technique for selecting a sample by combining two or more sample designs at once

• Advantages:
1. Relatively small variance for the cost per unit
2. Better control of non-sampling errors
3. Repeat studies cost relatively little

(42)

• Disadvantage:
→ With large primary sampling units (PSUs) the population is represented poorly, whereas with small PSUs it is only possible to …

(43)

Steps in multistage random sampling

1. Carry out the steps of the cluster design (dividing the area into clusters, setting the number of clusters, and randomizing the clusters)
2. Make a list of the subjects in all clusters selected as sample clusters
3. Select the sample subjects from that list, as many as desired

(44)

4. Systematic random sampling
→ random sampling carried out sequentially at a fixed interval
→ the size of the interval (i) can be determined by dividing the population size (N) by the desired sample size (n), or $i = N/n$

(45)

Advantages:

1. A sampling frame is not strictly required, because the list of respondents can be compiled while the sample is being drawn
2. The method is relatively easy and can be carried out by field staff
3. The method is very practical when the population is kept on cards
4. The variation will be smaller than with other methods

(46)

Disadvantages:

1. Not every sampling unit has the same chance of being selected
2. If there is a trend …

(47)

5. Cluster sampling
→ A cluster is a group of subjects or units of analysis that lie close to one another in space.

The advantage of this method is that it does not require a list of the population, so there is no transportation cost.

(48)

Three ways of drawing a sample non-randomly:

a. Purposive sampling. The sample is drawn by looking for the desired elements in the existing data.

b. Accidental sampling. The sample is drawn purely as needed, with no planning or special consideration; it is taken as it happens to come, without being planned in advance.

c. Quota sampling.

(49)

Determining the Number of Possible Samples

1. Sampling with replacement → $N^n$

Example:
For a population of size 4 with members A, B, C, D and a sample of size 2, the number of possible samples is $4^2 = 16$

2. Sampling without replacement → $C_n^N = \frac{N!}{n!\,(N-n)!}$

(50)

Example:

For a population of size 5 with members A, B, C, D, E and a sample of size 2, the number of possible samples is
$C_2^5 = \frac{5!}{2!\,(5-2)!} = 10$
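Both counting rules can be confirmed by listing the samples. In this sketch (illustrative names only), sampling with replacement is modelled as ordered draws with `itertools.product`, which is what the $N^n$ count assumes, and sampling without replacement as unordered `itertools.combinations`.

```python
from itertools import combinations, product
from math import comb

# With replacement: ordered draws of size 2 from A, B, C, D  ->  N**n
population4 = ["A", "B", "C", "D"]
with_replacement = list(product(population4, repeat=2))

# Without replacement: unordered samples of size 2 from A..E  ->  N! / (n! (N-n)!)
population5 = ["A", "B", "C", "D", "E"]
without_replacement = list(combinations(population5, 2))

print(len(with_replacement), 4**2)
print(len(without_replacement), comb(5, 2))
```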







(51)

(52)

Worked Problems

1. Light bulbs made by PHILLIPS have a mean life of 1,600 hours with a standard deviation of 225 hours, while light bulbs made by SHELL have a mean life of 1,400 hours with a standard deviation of 150 hours. If a random sample of 150 bulbs of each brand is tested, determine:

a. The difference of the mean bulb lifetimes
b. The standard deviation of the difference of the mean bulb lifetimes
c. The probability that the PHILLIPS bulbs have a mean life at least 175 hours longer than the SHELL bulbs

(53)

2. Four percent of the goods in warehouse A are defective and nine percent of the goods in warehouse B are defective. If a random sample of 150 items is drawn from warehouse A and 200 items from warehouse B, determine:

a. The mean of the difference of the two sample proportions
b. The standard deviation of the difference of the two sample proportions

(54)

A medical clinic specializes in treating patients with allergies. Many of the clinic’s patients must receive allergy shots on a regular basis. The administrator of the clinic wishes to study (and eventually to reduce) the time it takes patients to get their shots. When receiving a shot, a patient must:

• Check in with a receptionist
• Wait for a nurse
• Have the shot administered
• Wait for a period of at least 15 minutes in case of an adverse reaction to the shot

(55)

• Have a nurse check the patient for signs of a reaction, and receive the nurse’s permission to check out
• Check out at the receptionist’s desk.

In order to study the process, the clinic administrator decides to observe patients’ treatment times on a typical day. The administrator selects a day when a typical patient load is expected and when no unusual delays are anticipated. On the chosen day, the treatment time for each patient receiving an allergy shot is …

(56)

• Suppose the average treatment time for 201 patients is 30 minutes with a standard deviation of 3.47 minutes. The histogram of the data appears bell-shaped and symmetric, so the population of 201 patients appears to be approximately normally distributed.
• Furthermore, the administrator wishes to monitor the effectiveness of the treatment process on a daily basis: choose 5 …

(57)

Examples

1. A chain of audio/video equipment discount stores employs 36 salespeople. Daily dollar sales for individual salespersons employed by the chain have a mound-shaped distribution with a mean of $2,000 and a standard deviation of $300.

a. Suppose that the chain’s management decides to implement an incentive program that awards a daily bonus to any salesperson who achieves a daily sales figure that exceeds $2,150. Calculate the probability that an …

(58)

b. Suppose that (as an alternative) the chain’s management decides to award a daily bonus to the entire sales force of 36 salespeople if all 36 achieve an average daily sales figure that exceeds $2,150. Calculate the probability that average daily sales for the entire sales force will exceed $2,150 (and, therefore, that the entire sales force will earn the bonus) on any particular day.

c. Intuitively, would it be more difficult for an individual to achieve a daily sales figure that exceeds $2,150, or for the entire sales force to achieve an average sales figure that exceeds $2,150? Are the probabilities you …
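The probabilities in parts a and b can be compared numerically. This is a rough sketch, not the textbook solution: it assumes the mound-shaped distribution is well approximated by a normal distribution, and it uses `statistics.NormalDist` (Python 3.8+).

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 2000, 300, 36

# a. one salesperson: individual daily sales ~ Normal(mu, sigma)  (assumed)
p_individual = 1 - NormalDist(mu, sigma).cdf(2150)

# b. the whole sales force: mean of 36 ~ Normal(mu, sigma / sqrt(n))
p_team_average = 1 - NormalDist(mu, sigma / sqrt(n)).cdf(2150)

print(round(p_individual, 4), round(p_team_average, 5))
```

The team average has a standard deviation six times smaller than an individual's, so the same $2,150 threshold sits 3 standard deviations above the mean instead of 0.5.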

(59)

Solution

Answer:

Given: $n = 36$; $\mu = 2000$; $\sigma = 300$

a. $P(x > 2150)$
b. $P(\bar{x} > 2150)$

(60)

Cases

1. A resort hotel is trying to improve service by reducing variation in the time it takes to clean and prepare rooms. In order to study the situation, five rooms are selected each day for 25 consecutive days, and the time required to clean and prepare each room is recorded. The data obtained are given below:

a. Suppose the hotel wishes to use an $\bar{x}$ chart to monitor the room cleaning and preparation process. Also suppose that, when the process is in statistical control, the process mean is μ = 16 minutes and the process standard deviation is σ = 1.2 minutes. Find the center line and the upper and lower control limits for the $\bar{x}$ chart.

(61)

Cases

b. What assumption have you made in calculating the control limits in part a? How can you verify that this assumption is reasonable?

c. Plot the sample means of the data versus the $\bar{x}$ chart center line and control limits. Are any of the sample means outside the control limits on the resulting chart?

Day 1
1 2 3 4 5 1 2 3 4 5
Time 13 12.7 11.9 12.1 11.9 13.0 11.1 10.1 12.1 12
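For part a, the control limits follow from the sampling distribution of the mean of five rooms. The sketch below assumes conventional limits at k standard errors from the center line, with k = 3 by default. That choice of k is an assumption, not something the slides state, and `xbar_limits` is a hypothetical helper name.

```python
from math import sqrt


def xbar_limits(mu, sigma, n, k=3):
    """Return (lower limit, center line, upper limit) of an x-bar chart.

    k is the number of standard errors between the center line and each
    limit; k = 3 is the conventional choice and is assumed here.
    """
    standard_error = sigma / sqrt(n)
    return mu - k * standard_error, mu, mu + k * standard_error


lcl, center, ucl = xbar_limits(mu=16, sigma=1.2, n=5)
print(round(lcl, 2), center, round(ucl, 2))

# Narrower two-standard-error limits, for comparison
lcl2, _, ucl2 = xbar_limits(mu=16, sigma=1.2, n=5, k=2)
print(round(lcl2, 2), round(ucl2, 2))
```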

(62)

Solution

•

95.45% :

•

97.

(63)

(64)

(65)

Day     21     22     23     24     25
1      16.3   15.0   16.4   16.6   17.0
2      15.3   17.6   15.9   15.1   17.5
3      14.0   14.5   16.7   14.1   17.4
4      17.4   17.5   15.7   17.4   16.2
5

(66)

Case

2. A company is using a control chart to monitor an electrical characteristic. The desired mean measurement for this characteristic is 1,000 and the standard deviation of individual measurements of this characteristic is 12. The company takes nine readings of this characteristic every hour, computes the average of the nine readings, and plots this average as a point on the control chart. The control limits for this chart have been set at 990 and 1,010. We will assume that measurements of the electrical characteristic are normally distributed:

a. How many standard deviations of the average have the upper and lower control limits been set above and below the desired mean measurement for this characteristic?

b. If the mean and standard deviation of the electrical characteristic are at their desired levels, what is the probability that an hourly average of nine readings will be outside the established control limits?
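Both parts reduce to the standard error of the hourly average. A minimal check, assuming Python 3.8+ for `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

target_mean, sigma, n = 1000, 12, 9
lower_limit, upper_limit = 990, 1010

# a. distance of each limit from the target, in standard errors of the average
standard_error = sigma / sqrt(n)
k = (upper_limit - target_mean) / standard_error

# b. probability that an hourly average falls outside the limits
averages = NormalDist(target_mean, standard_error)
p_outside = averages.cdf(lower_limit) + (1 - averages.cdf(upper_limit))

print(standard_error, k, round(p_outside, 4))
```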

(67)

Example

• A food processing company wishes to assess whether p, the proportion of all current purchasers who would stop buying the cheese spread if the new spout were used, is less than 0.10. Suppose from 1000 …

(68)

• The interval is:
$\left[p \pm 3\sqrt{\frac{p(1-p)}{n}}\right] = \left[0.10 \pm 3\sqrt{\frac{0.10\,(1-0.10)}{1000}}\right] = [0.0715,\ 0.1285]$

• Since the interval doesn’t contain 0 or 1, the sample size n is large enough to assume that the sampling distribution of $\hat{p}$ is approximately a normal distribution with mean $\mu_{\hat{p}} = p = 0.1$ and standard deviation
$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(0.1)(0.9)}{1000}} = 0.0094868$

• So that
$P(\hat{p} \le 0.063) = P\left(z \le \frac{0.063 - 0.10}{0.0094868}\right) = P(z \le -3.9)$

(69)

1. If all possible samples of size 16 are drawn from a normal population with mean 50 and standard deviation 5, what is the probability that a sample mean will fall in the interval from $\mu_{\bar{x}} - 1.9\sigma_{\bar{x}}$ to $\mu_{\bar{x}} - 0.4\sigma_{\bar{x}}$?

(70)

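Solution

• Given: n = 16; μ = 50; σ = 5; normal distribution
• Find: $P(\mu_{\bar{x}} - 1.9\sigma_{\bar{x}} < \bar{x} < \mu_{\bar{x}} - 0.4\sigma_{\bar{x}})$
• Answer: $P(\mu_{\bar{x}} - 1.9\sigma_{\bar{x}} < \bar{x} < \mu_{\bar{x}} - 0.4\sigma_{\bar{x}}) = P(-1.9 < z < -0.4)$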

(71)

(72)

Terms

• Decision space: the set of all possible estimate values that an estimator can take
• Unbiased estimator: a statistic $\hat{\Theta}$ is said to be an unbiased estimator of the parameter $\theta$ if $\mu_{\hat{\Theta}} = E(\hat{\Theta}) = \theta$
• Interval of confidence:

(73)

Estimation (Continued)

• Estimation of a population mean: large-sample case
– Point estimate for a population mean: $\bar{x}$
– Large-sample $(1-\alpha)100\%$ confidence interval for a population mean (use the fact that, for a sufficiently large sample size $n \ge 30$, the sampling distribution of the sample mean, $\bar{x}$, …

(74)

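Estimation (Continued)

• $100(1-\alpha)\%$ confidence interval for the mean of a normally distributed population when the sample size n is large:

$\left[\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right] = \left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$

In the case σ is unknown:

$\left[\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}\right] = \left[\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}\right]$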

(75)

(76)

Eg. Large sample

The company has decided to carry out a 40-hour pilot production run of the new bags. Each hour, at a randomly selected time during the hour, a bag is taken off the production line. The bag is then subjected to a breaking strength test. The 40 breaking strengths obtained during the pilot production run are given below:

(77)

$x_i$   $x_i^2$   $x_i$   $x_i^2$   $x_i$   $x_i^2$

(78)

$x_i$   $x_i^2$

52.60 2766.76

54.00 2916.00

50.60 2560.36

49.90 2490.01

51.20 2621.44

49.20 2420.64

49.30 2430.49

48.30 2332.89

50.90 2590.81

(79)

Confidence interval for breaking strength

$\left[50.5477 - 1.96\,\frac{1.64493}{\sqrt{40}},\ 50.5477 + 1.96\,\frac{1.64493}{\sqrt{40}}\right]$

(80)

Eg. Large sample

From a previous examination, it is known that breaking strength is normally distributed. It is also known that the population standard deviation is 1.598:

Solution:

$\left[50.5477 - 1.96\,\frac{1.598}{\sqrt{40}},\ 50.5477 + 1.96\,\frac{1.598}{\sqrt{40}}\right]$
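The two breaking-strength intervals can be evaluated with one small helper. This is a sketch: `z_interval` is a hypothetical name, and the critical value comes from `statistics.NormalDist().inv_cdf` (Python 3.8+), which gives about 1.96 for 95% confidence.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sd, n, confidence=0.95):
    """Large-sample confidence interval: xbar +/- z_{alpha/2} * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sd / sqrt(n)
    return xbar - margin, xbar + margin


# sigma unknown: the sample standard deviation s = 1.64493 stands in for it
ci_with_s = z_interval(50.5477, 1.64493, 40)

# sigma known from earlier work: sigma = 1.598
ci_with_sigma = z_interval(50.5477, 1.598, 40)

print([round(v, 3) for v in ci_with_s])
print([round(v, 3) for v in ci_with_sigma])
```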

(81)

Exercise

1. The mean and standard deviation of a sample of 100 bank customer waiting times are 5.46 and 2.475 minutes, respectively.

a. Calculate 95% and 99% confidence intervals for the population mean
b. Using the 95% confidence interval, can the bank manager be 95% confident that the population mean is less than 6 minutes?
c. Using the 99% confidence interval, can the bank manager be 99% confident that the population mean is less than 6 minutes?

(82)

Estimation error and sample size

•

In the case

(83)

Example

A random sample of 36 final-semester students was chosen, with a mean GPA of 2.6 and a standard deviation of 0.3. How large a sample should be drawn if we want to …

(84)

• Estimation of a population mean: small-sample case (n < 30)
– Problems arising for small sample sizes, and the assumption: the population has an approximately normal distribution.
– $(1-\alpha)100\%$ confidence interval using …

(85)

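Estimation (Continued)

• $100(1-\alpha)\%$ confidence interval for the mean of a normally distributed population when the sample size n is small (< 30):

$\left[\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}\right] = \left[\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}\right]$

where $t_{\alpha/2}$ is based on $n - 1$ degrees of freedom.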

(86)

eg

A survey of 20 households in a small city was conducted in order to estimate education expenditure. The data collected are shown in the table below:

Household            1      2      3      4      5      6      7      8      9      10
Cost (million Rp)    2.30   4.50   4.00   5.00   3.80   7.20   6.25   5.75   6.70   7.80

Household            11     12     13     14     15     16     17     18     19     20
Cost (million Rp)    6.80   5.30   8.00   15.10  13.20  4.50   2.00   4.70   5.75   10.10

a. Find the estimate of the mean yearly education expenditure per household

(87)

Solution

a. Mean estimate for education cost: $\hat{\mu} = \bar{x} = 6.44$

b. 95% confidence interval
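Part b can be worked out from the 20 household values with the small-sample t interval. The sketch below is illustrative. The critical value $t_{0.025;19} = 2.093$ is taken from a standard t table and hard-coded, because the Python standard library has no t quantile function.

```python
from math import sqrt
from statistics import mean, stdev

# education cost per household (million Rp), households 1..20
costs = [2.30, 4.50, 4.00, 5.00, 3.80, 7.20, 6.25, 5.75, 6.70, 7.80,
         6.80, 5.30, 8.00, 15.10, 13.20, 4.50, 2.00, 4.70, 5.75, 10.10]

n = len(costs)
xbar = mean(costs)          # a. point estimate of the mean
s = stdev(costs)            # sample standard deviation (n - 1 in the denominator)

t_crit = 2.093              # t_{0.025; 19} from a t table (assumed, not computed)
margin = t_crit * s / sqrt(n)
lower, upper = xbar - margin, xbar + margin   # b. 95% confidence interval

print(round(xbar, 2), round(s, 3), round(lower, 2), round(upper, 2))
```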



(88)

Determining sample size

A sample of size
$n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2$
makes the error bound of the 100(1-α) percent confidence interval for μ equal to B, where $z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = B$ = error bound.

In the case σ is unknown, use a preliminary sample.

• When n is large enough, s replaces σ
• When n is small, s replaces σ and the t distribution replaces z.

(89)

Eg.

1. Consider a population with a standard deviation equal to 10. We wish to estimate a 95.44 percent confidence interval for the mean of this population with an error bound equal to 1.

2. Suppose now that we take a random sample of the size determined in item 1. If we obtain a sample mean equal to 295, calculate the 95.44 percent confidence …
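Both items can be sketched with the sample-size formula $n = (z_{\alpha/2}\,\sigma/B)^2$. The helper name `sample_size` is hypothetical; rounding n up to the next whole unit is an assumption (the usual convention), and the second part assumes the quantity asked for is the confidence interval for the mean.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size(sigma, bound, confidence):
    """Smallest n for which z_{alpha/2} * sigma / sqrt(n) <= bound."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / bound) ** 2)


# Item 1: sigma = 10, error bound B = 1, 95.44 percent confidence (z is about 2)
n = sample_size(sigma=10, bound=1, confidence=0.9544)

# Item 2: interval around a sample mean of 295 using that n
z = NormalDist().inv_cdf(1 - (1 - 0.9544) / 2)
margin = z * 10 / sqrt(n)
low, high = 295 - margin, 295 + margin

print(n, round(low, 2), round(high, 2))
```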

(90)

Two Independent Samples

Estimation of the difference of population means (µ₁-µ₂)

Point estimate: $\hat{\mu}_{(\mu_1 - \mu_2)} = \bar{x}_1 - \bar{x}_2$

With standard error:

a. When the first ($\sigma_1^2$) and second ($\sigma_2^2$) population variances are available:
$\sigma_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

(91)

Two Independent Samples

b. Population variances are unknown but equal:
$s_{(\bar{x}_1 - \bar{x}_2)} = s_g\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, with $s_g^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

c. Population variances are unknown and not equal:
$s_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

(92)

Confidence interval $(1-\alpha)100\%$ for $\mu_1 - \mu_2$:

a. $\sigma_1$ and $\sigma_2$ known:
$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

b. $\sigma_1$ and $\sigma_2$ are unknown but assumed equal:
$(\bar{x}_1 - \bar{x}_2) - t_{(\alpha/2;\,v)}\, s_{joint}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{(\alpha/2;\,v)}\, s_{joint}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

with $s_{joint}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ and $v = n_1 + n_2 - 2$

(93)

c. $\sigma_1$ and $\sigma_2$ are unknown and not assumed equal:
$(\bar{x}_1 - \bar{x}_2) - t_{(\alpha/2;\,v)}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{(\alpha/2;\,v)}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

(94)

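Two dependent samples

Confidence interval $(1-\alpha)100\%$ for $\mu_D = \mu_1 - \mu_2$, paired observations:

$\bar{d} - t_{\alpha/2}\frac{s_d}{\sqrt{n}} < \mu_D < \bar{d} + t_{\alpha/2}\frac{s_d}{\sqrt{n}}$

$\bar{d}$: mean difference
$s_d$: standard deviation of the differences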

(95)

Example

A study aims to demonstrate the effect of a new diet that is claimed to reduce body weight by 4.5 kg per 2 weeks. Seven women tried the diet. Their body weights before and after following the diet are shown in the table below:

1 2 3 4 5 6 7

(96)

Solution

             1      2      3      4      5      6      7      Total
Before       58.5   60.3   61.7   69.0   64.0   62.6   56.7
After        60.0   54.9   58.1   62.1   58.5   59.9   54.4
d_i          -1.5   5.4    3.6    6.9    5.5    2.7    2.3    24.9
d_i^2        2.25   29.16  12.96  47.61  30.25  7.29   5.29
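The paired-difference interval can be computed from the table. The slides do not state a confidence level for this example, so 95% is assumed below. The critical value $t_{0.025;6} = 2.447$ is a standard table value, hard-coded because the standard library has no t quantile function.

```python
from math import sqrt
from statistics import mean, stdev

before = [58.5, 60.3, 61.7, 69.0, 64.0, 62.6, 56.7]
after = [60.0, 54.9, 58.1, 62.1, 58.5, 59.9, 54.4]

d = [b - a for b, a in zip(before, after)]   # paired differences d_i
n = len(d)
d_bar = mean(d)                              # mean difference
s_d = stdev(d)                               # std. dev. of the differences

t_crit = 2.447                               # t_{0.025; 6}; 95% level is an assumption
margin = t_crit * s_d / sqrt(n)
lower, upper = d_bar - margin, d_bar + margin

print(round(sum(d), 1), round(d_bar, 3), round(s_d, 3))
print(round(lower, 2), round(upper, 2))
```

Under these assumptions the claimed reduction of 4.5 kg can be compared with the resulting interval.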

(97)

Eg.

Two companies in the cardboard industry compete, and each claims to be the best in the area. A researcher interested in finding out which one is the best tested the cardboard strength of 10 sheets from each company; the data are listed below:

– Estimate the difference in cardboard strength, and calculate the standard error.
– Construct a 95% confidence interval for the difference.

(98)

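Solution

Mean difference estimator and standard error

$\bar{x}_1 = \frac{\sum x_{1i}}{n_1} = \frac{30 + 35 + \dots + 40}{10} = 42.5$, $\quad s_1^2 = \frac{n_1\sum x_{1i}^2 - \left(\sum x_{1i}\right)^2}{n_1(n_1 - 1)} = \frac{10(19025) - (425)^2}{10(9)} = 106.94$

$\bar{x}_2 = \frac{\sum x_{2i}}{n_2} = \frac{50 + 60 + \dots + 55}{10} = 56.5$, $\quad s_2^2 = \frac{n_2\sum x_{2i}^2 - \left(\sum x_{2i}\right)^2}{n_2(n_2 - 1)} = \frac{10(32525) - (565)^2}{10(9)} = 66.94$

$\hat{\mu}_{(\mu_1 - \mu_2)} = \bar{x}_1 - \bar{x}_2 = 42.5 - 56.5 = -14$

$s_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{106.94}{10} + \frac{66.94}{10}} = \sqrt{\frac{173.88}{10}} = 4.17$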

(99)

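95% Confidence Interval

$(\bar{x}_1 - \bar{x}_2) \pm t_{(0.05/2;\ df_{eff})}\, s_{(\bar{x}_1 - \bar{x}_2)} = -14 \pm 1.740\,(4.17) = -14 \pm 7.2558$, that is, $[-21.2558,\ -6.7442]$

The multiplier 1.740 is the table value $t_{0.05;17}$; with the two-sided value $t_{0.025;17} = 2.110$ the interval widens to $-14 \pm 8.80 = [-22.80,\ -5.20]$.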

(100)

Two Dependent Samples

Mean estimator for dependent populations (µ_d)

Pairs            1      2      3      …     n
Sample 1 (X₁)    x₁₁    x₁₂    x₁₃    …     x₁ₙ
Sample 2 (X₂)    x₂₁    x₂₂    x₂₃    …     x₂ₙ
D = (X₁-X₂)      d₁     d₂     d₃     …     dₙ

(101)

Confidence interval $(1-\alpha)100\%$ for $\mu_d$

$\bar{d} - t_{(\alpha/2;\ n-1)}\frac{s_d}{\sqrt{n}} < \mu_D < \bar{d} + t_{(\alpha/2;\ n-1)}\frac{s_d}{\sqrt{n}}$

(102)

Eg.

A fitness center wants to test a diet program. For that purpose it chose 10 members to follow the diet program for 3 months. The information gathered is each member's weight before and after the program:

Estimate the difference in weight before and after the diet program with a 95% confidence interval.

Participant    1    2    3    4    5    6    7    8    9    10
Before (X1)    90   89   92   90   91   92   91   93   92   91
After (X2)     85   86   87   86   87   85   85   87   86   86

(103)

Estimation (Continued)

• $100(1-\alpha)\%$ confidence interval for …

(104)

Eg. Proportion

• The manufacturer of the colorsmart-5000 television set claims that 95% of its sets last at least 5 years without needing a single repair. In order to test this claim, a consumer group randomly selects 400 consumers who have owned a colorsmart-5000 television set for 5 years. Of these 400 consumers, 316 say that their set did not need a repair, while 84 say that it needed at least one repair.

1. Find a 99% confidence interval for the proportion of all colorsmart-5000 television sets that have lasted at least 5 years without needing a single repair

(105)

Estimation (Continued)

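• In the case that the true value of p is unknown:

$\left[\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right] = \left[\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right]$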
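Applied to the colorsmart-5000 data (316 of 400 sets without a repair), the interval can be evaluated as follows. This is a sketch assuming Python 3.8+; `proportion_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence):
    """Large-sample interval: p_hat +/- z_{alpha/2} * sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - margin, p_hat + margin


p_hat, lower, upper = proportion_interval(316, 400, 0.99)
print(p_hat, round(lower, 4), round(upper, 4))

# The manufacturer's claimed 95% lies above the whole interval
print(0.95 > upper)
```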

(106)

Determining sample size

A sample of size
$n = p(1-p)\left(\frac{z_{\alpha/2}}{B}\right)^2$
makes the error bound of the 100(1-α) percent confidence interval for p equal to B.

In the case σ is unknown, use a preliminary sample.

• When n is large enough, s replaces σ
• When n is small, s replaces σ and the t distribution replaces z.

(107)

Estimation (Continued)

• Estimation of the difference between two population means: matched pairs
– Assumption: the population of paired differences is normally distributed
– Procedure
• Estimation of the difference between two population proportions
– For sufficiently large sample sizes ($n_1$ and $n_2 \ge 30$), the sampling distribution of $\hat{p}_1 - \hat{p}_2$, based on independent random samples from two populations, is approximately normal
– $(1-\alpha)100\%$ confidence interval for $p_1 - p_2$

(108)

(109)

Examples

1. In a random sample of 1000 houses in a city, 628 were found to use natural gas heating. Construct a 98% confidence interval for the proportion of houses in this city that use natural gas heating.

(110)

3. A study wants to estimate the percentage of residents of a city who approve of adding fluoride to their drinking water. How large a sample is needed if we want to be at least 95% confident that our estimate differs by no more than 1% from the true percentage?

(111)

4. A geneticist is interested in the proportions of men and women who have a certain blood disorder. In a sample of 100 men, 24 were found to have the disorder, whereas among 100 women examined, 13 were found to have it. Construct a 99% confidence interval for …

(112)

Assignment

1. A survey was conducted to find out public attitudes toward the leadership of President SBY. A questionnaire was designed and distributed to 3000 respondents, 1750 of them men and the rest women. Attitudes were classified as supportive or not supportive. Only 500 male respondents were found to support SBY's leadership, while 1000 female respondents supported it. Construct an interval …

(113)

Exercise

2. In a psychology experiment to measure a person's reaction time, a trial was run on 25 randomly selected people. Data from a previous survey show that the variance of reaction time is 4 s² …

(114)

3. The average stock indices of food and beverage companies over the last 10 years have been shown not to be normally distributed. A stock trader is interested in the difference between the stock indices of these two kinds of company. He collected the stock indices of 15 food companies and 35 beverage companies. The mean index of the 15 food companies is 2.75 with a standard deviation of 0.79, and that of the 35 beverage companies is 3.01 with a standard deviation of 0.67. Give 96% and 98% confidence intervals for …

(115)

Variance Estimation

$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$

(116)

Examples

1. A car battery manufacturer states that its batteries last 3 years on average with a variance of 1 year. If 5 batteries last 1.9, 2.4, 3.0, 3.5, and 4.2 years, construct a 95% confidence interval for $\sigma^2$
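A sketch of this variance interval, using $\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$ with $n - 1 = 4$ degrees of freedom. The chi-square critical values 11.143 and 0.484 are taken from a standard table and hard-coded, since the Python standard library has no chi-square quantile function.

```python
from statistics import variance

lifetimes = [1.9, 2.4, 3.0, 3.5, 4.2]     # battery lifetimes in years
n = len(lifetimes)
s2 = variance(lifetimes)                  # sample variance (n - 1 in the denominator)

# chi-square table values for 4 degrees of freedom (assumed, not computed)
chi2_upper_tail = 11.143                  # chi^2_{0.025; 4}
chi2_lower_tail = 0.484                   # chi^2_{0.975; 4}

lower = (n - 1) * s2 / chi2_upper_tail
upper = (n - 1) * s2 / chi2_lower_tail

print(round(s2, 3), round(lower, 3), round(upper, 3))
```

The manufacturer's stated variance of 1 can then be compared with the resulting interval.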

(117)

(118)

General Concepts of Hypothesis

testing

•

The procedures to be discussed are

(119)

Important Terms

Null hypothesis

Significance

Probability of error (alpha levels)

Confidence level

(120)

Assumptions

• Group studies – comparing experimental group with control group
• Representative values to describe performance in both groups – mean and SD

(121)

Comparing Mean Values

Mean values will be different

(122)

Null Hypothesis

Any observable difference between two mean values is simply due to chance

Significance testing either rejects or fails to reject the null hypothesis

Failing to reject means that the observable difference could plausibly be due to chance

(123)

Independent t-test

To determine significance between two independent groups, i.e., an experimental group and a control group

The experimental group receives the independent variable (IV); the control group does not

Was the difference between the experimental group mean and the control group mean great enough

(124)

Significance

In essence, what is the likelihood that the group that receives the IV will score higher than the group that does not?

In essence, what is the probability that the group that receives the IV will not score higher than the group that does not?

In essence, how confident are you that the group that receives the IV will score higher

(125)

Alpha Levels

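The symbol p means probability of error: the probability of observing a difference this large when only chance is at work

An alpha level of .05 means the accepted probability of error is at most 5 out of 100: fewer than 5 times in 100 would the IV group appear to score higher by chance alone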

(126)

Alpha levels (cont.)

An alpha level of .01 means that fewer than 1 time out of 100 would the IV group appear to score higher by chance alone

Or, we require 99% confidence before concluding that the IV group scores higher

(127)

Non-independent t-test

To determine significance between pre- and post-tests within one group

(128)

General Concepts of Hypothesis Testing

•

Formulation of Hypotheses

1. A null hypothesis H₀ is the hypothesis against which we hope to gather evidence. Usually this statement represents the status quo and is not rejected unless there is convincing sample evidence that it is false

(129)

Type I and type II errors

• Type I error (α): reject H₀ when it is true

• Type II error (β): do not reject H₀ when it is false

State of nature

Decision      H₀ true         H₀ false

Reject H₀     Type I error    Correct decision

(130)

Hypothesis Steps

1. Define H₀

2. Define H_A or H₁

3. Determine α

(131)

Large Sample Hypothesis Test for

Population Mean

1. $H_0: \mu = \mu_0$

2. $H_1: \mu \neq \mu_0$

3. Define α

4. Define the rejection region

5. Define the test statistic: $z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$

6. Reject H₀ when
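The six steps can be sketched in code as follows. This is a minimal illustration, not the lecturer's own procedure: the function name `z_test_mean`, the `alternative` keyword, and the sample numbers are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(xbar, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """Large-sample z test of H0: mu = mu0. Returns (z, reject_H0)."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if alternative == "two-sided":      # H1: mu != mu0
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    elif alternative == "greater":      # H1: mu > mu0
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif alternative == "less":         # H1: mu < mu0
        reject = z < -std_normal.inv_cdf(1 - alpha)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, reject


# Hypothetical data: sample mean 71.8 from n = 100, sigma = 8.9, H0: mu = 70
z, reject = z_test_mean(xbar=71.8, mu0=70, sigma=8.9, n=100, alpha=0.05)
print(f"z = {z:.3f}, reject H0: {reject}")
```

With these made-up numbers the statistic exceeds the two-sided critical value of about 1.96, so H₀ would be rejected at α = 0.05.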





 



(132)

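Hypothesis Test for Mean

Columns: H₀ · value of the test statistic · H₁ · critical region

$H_0: \mu = \mu_0$, with $\sigma$ known or $n \geq 30$
Test statistic: $z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
$H_1: \mu < \mu_0$, critical region $z < -z_\alpha$
$H_1: \mu > \mu_0$, critical region $z > z_\alpha$
$H_1: \mu \neq \mu_0$, critical region $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$

$H_0: \mu = \mu_0$, with $\sigma$ unknown and $n < 30$
Test statistic: $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$, $v = n - 1$
$H_1: \mu < \mu_0$, critical region $t < -t_\alpha$
$H_1: \mu > \mu_0$, critical region $t > t_\alpha$
$H_1: \mu \neq \mu_0$, critical region $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$

$H_0: \mu_1 - \mu_2 = d_0$, with $\sigma_1$ and $\sigma_2$ known
Test statistic: $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
$H_1: \mu_1 - \mu_2 < d_0$, critical region $z < -z_\alpha$
$H_1: \mu_1 - \mu_2 > d_0$, critical region $z > z_\alpha$
$H_1: \mu_1 - \mu_2 \neq d_0$, critical region $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$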

(133)

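$H_0: \mu_1 - \mu_2 = d_0$, with $\sigma_1 = \sigma_2$ but unknown
Test statistic: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - d_0}{s_p \sqrt{1/n_1 + 1/n_2}}$, $v = n_1 + n_2 - 2$, where $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
$H_1: \mu_1 - \mu_2 < d_0$, critical region $t < -t_\alpha$
$H_1: \mu_1 - \mu_2 > d_0$, critical region $t > t_\alpha$
$H_1: \mu_1 - \mu_2 \neq d_0$, critical region $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$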
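For the equal-but-unknown-variance case, the pooled statistic $t = ((\bar{x}_1 - \bar{x}_2) - d_0) / (s_p\sqrt{1/n_1 + 1/n_2})$ with $v = n_1 + n_2 - 2$ can be computed as in the sketch below. The function name `pooled_t` and the sample figures are hypothetical. The critical value must still be read from a t table, since the standard library has no t distribution.

```python
from math import sqrt


def pooled_t(x1bar, s1, n1, x2bar, s2, n2, d0=0.0):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    v = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / v
    t = ((x1bar - x2bar) - d0) / (sqrt(sp2) * sqrt(1 / n1 + 1 / n2))
    return t, v


# Hypothetical summary data for two independent samples, testing d0 = 2
t, v = pooled_t(x1bar=85, s1=4, n1=12, x2bar=81, s2=5, n2=10, d0=2)
print(f"t = {t:.3f} with v = {v} degrees of freedom")
```

The computed t is then compared with $t_\alpha$ or $t_{\alpha/2}$ for v degrees of freedom, according to the alternative hypothesis.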

(134)

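Hypothesis Test for Mean

$H_0: \mu_1 - \mu_2 = d_0$, with $\sigma_1 \neq \sigma_2$ and unknown
Test statistic: $t' = \dfrac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$, with $v = \dfrac{(s_1^2/n_1 + s_2^2/n_2)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$
$H_1: \mu_1 - \mu_2 < d_0$, critical region $t' < -t_\alpha$
$H_1: \mu_1 - \mu_2 > d_0$, critical region $t' > t_\alpha$
$H_1: \mu_1 - \mu_2 \neq d_0$, critical region $t' < -t_{\alpha/2}$ or $t' > t_{\alpha/2}$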
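When the variances are unequal and unknown, the statistic $t'$ divides by $\sqrt{s_1^2/n_1 + s_2^2/n_2}$ and the degrees of freedom are approximated rather than counted. The sketch below shows that calculation. The name `unequal_variance_t` and the input figures are hypothetical.

```python
from math import sqrt


def unequal_variance_t(x1bar, s1, n1, x2bar, s2, n2, d0=0.0):
    """t' statistic and approximate degrees of freedom for unequal variances."""
    a = s1**2 / n1  # estimated variance of the first sample mean
    b = s2**2 / n2  # estimated variance of the second sample mean
    t_prime = ((x1bar - x2bar) - d0) / sqrt(a + b)
    v = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return t_prime, v


# Hypothetical summary data, testing d0 = 0
t_prime, v = unequal_variance_t(x1bar=85, s1=4, n1=12, x2bar=81, s2=5, n2=10)
print(f"t' = {t_prime:.3f}, v = {v:.2f}")
```

The approximate v is usually not an integer; a common convention is to round it down before looking up the critical value in a t table.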

(135)

$H_0: \mu_D = d_0$ (paired observations)
Test statistic: $t = \dfrac{\bar{d} - d_0}{s_d / \sqrt{n}}$, $v = n - 1$

(136)

Case:

Customers in a certain area complain that a bank's ATM at that location often runs out of cash.

Management needs the following data:

1. Average amount of cash loaded into the ATM: 100 million rupiah

(137)

Withdrawal data (in hundreds of thousands of rupiah)

(138)

Population Proportion

Large sample:

• $H_0: p = p_0$

• $H_1: p < p_0$, or $p > p_0$, or $p \neq p_0$

• Choose the significance level α

• Critical region: $z < -z_\alpha$, or $z > z_\alpha$, or $z < -z_{\alpha/2}$ and $z > z_{\alpha/2}$

• Test statistic: $z = \dfrac{x - np_0}{\sqrt{np_0 q_0}}$

• Decision:
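The large-sample statistic $z = (x - np_0)/\sqrt{np_0q_0}$ with $q_0 = 1 - p_0$ can be sketched as follows. The function name `proportion_z` and the counts used in the call are hypothetical and are not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z(x, n, p0):
    """z statistic for H0: p = p0 using the normal approximation."""
    q0 = 1 - p0
    return (x - n * p0) / sqrt(n * p0 * q0)


# Hypothetical data: 540 successes in 1000 trials, H0: p = 0.5, H1: p != 0.5
alpha = 0.05
z = proportion_z(x=540, n=1000, p0=0.5)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject = abs(z) > z_crit
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```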

(139)

Small sample:

• $H_0: p = p_0$

• $H_1: p < p_0$, or $p > p_0$, or $p \neq p_0$

• Test statistic: compute x (the number of successes)

• Decision: critical region $x \leq k'_\alpha$, or $x \geq k_\alpha$, or $x \leq k'_{\alpha/2}$ and $x \geq k_{\alpha/2}$

(140)

Large sample:

• $H_0: p_1 = p_2$

• $H_1: p_1 < p_2$, or $p_1 > p_2$, or $p_1 \neq p_2$

• Critical region: $z < -z_\alpha$, or $z > z_\alpha$, or $z < -z_{\alpha/2}$ and $z > z_{\alpha/2}$

• Test statistic:

• Decision:
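The test statistic itself did not survive on this slide. A common choice, assumed here, is the pooled form $z = (\hat{p}_1 - \hat{p}_2)/\sqrt{\hat{p}\hat{q}(1/n_1 + 1/n_2)}$ with $\hat{p} = (x_1 + x_2)/(n_1 + n_2)$. The sketch applies it to the blood-disorder counts quoted elsewhere in these slides (24 of 100 men, 13 of 100 women). The significance level of 0.05 is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 = p2 (large samples)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # common proportion estimated under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# H1: p1 > p2 (men more susceptible), one-sided test at an assumed alpha = 0.05
alpha = 0.05
z = two_proportion_z(24, 100, 13, 100)
z_crit = NormalDist().inv_cdf(1 - alpha)
reject = z > z_crit
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```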

(141)

Variance Testing

$H_0: \sigma^2 = \sigma_0^2$

$H_1: \sigma^2 < \sigma_0^2$, or $\sigma^2 > \sigma_0^2$, or $\sigma^2 \neq \sigma_0^2$

(142)

Examples

1. In a certain city it is claimed that 50% of homes use natural gas heating. A study was conducted, and from a random sample of 1000 homes in the city,

(143)

2. A geneticist is interested in the proportions of men and women who have a certain blood disorder, because previous research suggests that men are more susceptible to it. In a sample of 100 men, 24 were found to have the disorder, whereas among 100 women

(144)

Assignment

1. A survey was conducted to gauge public attitudes toward the leadership of President SBY. It is suspected that women like and support SBY more than men do. A questionnaire was prepared and distributed to 3000 respondents, consisting of 1750 men and the rest women. Attitudes were classified as supportive or not supportive. Only 500 male respondents were found to support

(145)

Exercise

2. In a psychology experiment to measure a person's reaction time, a trial was carried out on 25 randomly selected people. Data from a previous survey show that the variance of reaction time is 4 seconds²

(146)

(147)

Scales and Measures of Association

Scale of Both Variables    Measures of Association

Nominal Scale              Pearson Chi-Square: χ²

Ordinal Scale              Spearman’s rho

(148)

Chi-Square (χ²) and Frequency Data

• Up to this point, the inference to the population has been concerned with “scores” on one or more variables, such as CAT scores, mathematics achievement, and hours spent on the computer.

• We used these scores to make inferences about population means. To be sure, not all research questions involve score data.

• Today the data that we analyze consist of frequencies; that is, the number of individuals falling into categories. In other words, the variables are measured on a nominal scale.

• The test statistic for frequency data is Pearson Chi-Square.

(149)

Determine Appropriate Test

• Chi-square is used when both variables are measured on a nominal scale.

• It can be applied to interval or ratio data that have been categorized into a small number of groups.

• It assumes that the observations are randomly sampled from the population.

• All observations are independent (an individual can appear only once in a table and there are no overlapping categories).

• It does not make any assumptions about the shape of

(150)

Calculating Test Statistics

• Contrasts observed frequencies in each cell of a contingency table with expected frequencies.

• The expected frequencies represent the number of cases that would be found in each cell if the null hypothesis were true (i.e. the nominal variables are unrelated).

• Expected frequency of two unrelated events is

(151)

Calculating Test Statistics

$\chi^2 = \sum \dfrac{(F_o - F_e)^2}{F_e}$

where $F_o$ is the observed frequency and $F_e$ is the expected frequency in each cell.

(152)

Example

• As an example, there are 290 companies whose stock price fell and 110 whose stock price rose. This is a total of 400 (290 + 110) companies.

• We expect a 3/4 : 1/4 ratio. We need to calculate the expected numbers (you MUST use the numbers of companies, NOT the proportions!); this is done by multiplying the total number of companies by the expected proportions. Thus we expect 400 * 3/4 = 300 companies with a falling stock price, and 400 * 1/4 = 100 with a rising stock price.

• Thus, for falling stock prices, obs = 290 and exp =

(153)

• Now it's just a matter of plugging into the formula:

• χ² = (290 - 300)² / 300 + (110 - 100)² / 100

• = (-10)² / 300 + (10)² / 100

• = 100 / 300 + 100 / 100

• = 0.333 + 1.000

• = 1.333.

• This is our chi-square value: now we need to see
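The arithmetic above can be checked with a short sketch. The helper name `chi_square_gof` is illustrative; it computes expected counts from the total and the expected proportions, then sums (observed - expected)² / expected over the classes.

```python
def chi_square_gof(observed, proportions):
    """Chi-square goodness-of-fit statistic from counts and expected proportions."""
    total = sum(observed)
    # Expected counts must be numbers of cases, not proportions
    expected = [total * p for p in proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected


# 290 companies with falling prices, 110 with rising prices, expected ratio 3/4 : 1/4
stat, expected = chi_square_gof([290, 110], [3 / 4, 1 / 4])
print(f"expected = {expected}, chi-square = {stat:.3f}")
```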

(154)

Degrees of Freedom

• A critical factor in using the chi-square test is the “degrees of freedom”, which is essentially the number of independent random variables involved.

• Degrees of freedom is simply the number of classes minus 1.

• For our example, there are 2 classes of

(155)

Another Example

company    observed    expected proportion    expected number

Class A    315         9/16                   312.75

(156)

• You are given the observed numbers, and you determine the expected proportions.

• To get the expected numbers, first add up the observed numbers to get the total number of companies. In this case, 315 + 101 + 108 + 32 = 556.

• Then multiply the total by the expected proportion:

• --expected class A = 9/16 * 556 = 312.75

• --expected class B = 3/16 * 556 = 104.25

• --expected class C = 3/16 * 556 = 104.25

• --expected class D = 1/16 * 556 = 34.75

(157)

Use the formula.

χ² = (315 - 312.75)² / 312.75

+ (101 - 104.25)² / 104.25

+ (108 - 104.25)² / 104.25

+ (32 - 34.75)² / 34.75

(158)

Degrees of freedom is 1 less than the number of classes. Here, 4 - 1 = 3 d.f.

For 3 d.f. and p = 0.05, the critical chi-square value is 7.815.
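Putting the four-class example together, the sketch below computes the statistic, the degrees of freedom, and the comparison with the critical value 7.815 quoted above. The variable names are illustrative, and the critical value is taken from the slide rather than computed.

```python
# Observed counts for classes A-D and their expected proportions (from the slides)
observed = [315, 101, 108, 32]
proportions = [9 / 16, 3 / 16, 3 / 16, 1 / 16]

total = sum(observed)                           # 556 companies
expected = [total * p for p in proportions]     # 312.75, 104.25, 104.25, 34.75

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                          # number of classes minus 1

CRITICAL_VALUE = 7.815  # chi-square table value for 3 d.f. at p = 0.05
reject = chi_square > CRITICAL_VALUE
print(f"chi-square = {chi_square:.3f}, d.f. = {df}, reject H0: {reject}")
```

Since the statistic is far below the critical value, the observed counts are consistent with the expected 9:3:3:1 proportions.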

(159)

(160)

TYPE OF EFFECTS

Type of treatments:

• Controls, standards, checks, or other items that may be used as points of reference in an experiment or an investigation

• Discrete levels of factors or variables (qualitative factors), e.g. types of machine, number of times of….., date of…..

• Continuous levels of factors or variables (quantitative factors), e.g. temperature, humidity, height, etc.

A factor might be called a set of random effects if the levels of that factor are a random sample from a

population of such levels.

(161)

Mixtures of k of v factors with the proportion of each factor being specified by experimenter or by the nature of the phenomenon under study and with there being one level for each factor in many cases.

Combination of two or more of the type of treatments above.

Fixed effects model: A model is called a fixed effects model if all of the factors in the model are fixed effects and it involves only one variance component.

Random effects model: A model is called a random effects model if all of the factors in the model are random effects.

(162)

Note: Most designs are mixed! Only a few designs (completely randomized designs, e.g. one-way, factorials, response surface) might be considered fixed.

Design issue : Should take sources of variation into consideration as fixed, random or residual effects!

Simple comparative experiments

• The hypothesis testing framework

• The two-sample t-test

• Checking assumptions, validity

Comparing more than two factor levels…the analysis of variance

• ANOVA decomposition of total variability

• Statistical testing & analysis

• Checking assumptions, model validity

• Post-ANOVA testing of means

(163)

Experimental design: Factorial Experiments – 1. Single factor

Experimental design

Multiple “treatments