Measures of Spread (Variability) of Data
Dr. Akhmad Rizali
Measures of variability (Ukuran keragaman)
• The three measures of central tendency alone cannot give a complete description of a data set
• We also need to know how far the observations spread out from their mean
• Two data sets may have the same mean and median but differ in variability
• Commonly used measures of variability are the range (rentang, kisaran, wilayah), deviation (simpangan), variance (varian, ragam), standard deviation (simpangan baku) and coefficient of variation (koefisien keragaman)
Measures of Dispersion and Variability
These are measurements of how spread out the data are around the center of the distribution
[Figure: two frequency distributions, f plotted against x]
Range: the difference between the lowest and highest numbers
Place the numbers in order of magnitude, then range = $X_n - X_1$
Example: 2, 2, 3, 4, 5 (ordered as $X_1, X_2, \ldots, X_n$)
Range = 5 − 2 = 3
Problem: the range gives no information about how clustered the data are
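The range calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `data_range` and the second data set `clustered` are made up here to show that two data sets with very different clustering can share the same range.

```python
def data_range(values):
    """Return the range: largest observation minus smallest observation."""
    ordered = sorted(values)           # place the numbers in order of magnitude
    return ordered[-1] - ordered[0]    # X_n - X_1

data = [2, 2, 3, 4, 5]        # the slide's data set
clustered = [2, 2, 2, 2, 5]   # hypothetical data: tightly clustered, one extreme value

print(data_range(data))       # 3
print(data_range(clustered))  # 3, same range despite different clustering
```

Because only the two extreme observations enter the calculation, the range cannot distinguish these two data sets.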
You could express dispersion in terms of deviations from the mean; however, the sum of deviations from the mean always equals 0, i.e.
$\sum (X_i - \bar{X}) = 0$
So, take absolute values to avoid this
DEVIATION (Simpangan)
Example
• Suppose the numbers of notebooks carried by 5 students are 3, 5, 7, 7, 8. The mean of these data is 30/5 = 6. The deviation is computed by subtracting the mean from each observed value

No.     Observed value x    Deviation (x − x̄)    Squared deviation (x − x̄)²
1       3                   3 − 6 = −3            9
2       5                   5 − 6 = −1            1
3       7                   7 − 6 = 1             1
4       7                   7 − 6 = 1             1
5       8                   8 − 6 = 2             4
Total                       0                     16

To avoid negative values, the deviations can be squared; the results are called squared deviations.
Mean Deviation (Simpangan Rerata)
Sample mean deviation = $\frac{\sum |X_i - \bar{X}|}{n}$
Essentially the average deviation from the mean

No.     Observed value x    Deviation (x − x̄)    (x − x̄)/n
1       3                   3 − 6 = −3            (3 − 6)/5 = −3/5
2       5                   5 − 6 = −1            (5 − 6)/5 = −1/5
3       7                   7 − 6 = 1             (7 − 6)/5 = 1/5
4       7                   7 − 6 = 1             (7 − 6)/5 = 1/5
5       8                   8 − 6 = 2             (8 − 6)/5 = 2/5
Total                       0                     0

The signed terms cancel to 0, which is why the formula uses absolute values: (3 + 1 + 1 + 1 + 2)/5 = 8/5 = 1.6
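As a quick check of the notebook example (3, 5, 7, 7, 8), the following sketch computes the signed deviations, their sum, the squared deviations and the mean deviation with absolute values. The variable names are illustrative only.

```python
values = [3, 5, 7, 7, 8]              # notebooks carried by 5 students
n = len(values)
mean = sum(values) / n                # 30 / 5 = 6.0

deviations = [x - mean for x in values]           # x - mean for each observation
sum_dev = sum(deviations)                         # signed deviations cancel out
squared = [d ** 2 for d in deviations]            # squared deviations
mean_abs_dev = sum(abs(d) for d in deviations) / n

print(deviations)      # [-3.0, -1.0, 1.0, 1.0, 2.0]
print(sum_dev)         # 0.0
print(sum(squared))    # 16.0
print(mean_abs_dev)    # 1.6
```

The signed deviations always sum to zero, so only the absolute (or squared) deviations carry information about spread.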
Sum of Squares
Another way to get around the problem of zero sums is to square the deviations. This is known as the sum of squares, or SS
Sample SS = $\sum (X_i - \bar{X})^2 = \sum X_i^2 - \frac{(\sum X_i)^2}{n}$
SS is much more commonly used than the mean deviation
Example
Data: 2, 2, 3, 4, 5 ($X_1$ to $X_5$), $\bar{X} = 3.2$
Sample SS = $\sum (X_i - \bar{X})^2$
SS = $(2 - 3.2)^2 + (2 - 3.2)^2 + (3 - 3.2)^2 + (4 - 3.2)^2 + (5 - 3.2)^2$
= 1.44 + 1.44 + 0.04 + 0.64 + 3.24
= 6.8
Problem: the more numbers in the data set, the higher the SS
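The sum of squares for the data 2, 2, 3, 4, 5 can be checked with both the definitional form and the shortcut form. The sketch below also repeats the data set once (a made-up illustration) to show how SS grows with the number of observations even when the spread is unchanged. The function names are illustrative.

```python
def ss_definition(values):
    """SS as the sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)

def ss_shortcut(values):
    """SS as sum(x^2) - (sum(x))^2 / n."""
    n = len(values)
    return sum(x * x for x in values) - sum(values) ** 2 / n

data = [2, 2, 3, 4, 5]

print(round(ss_definition(data), 4))         # 6.8
print(round(ss_shortcut(data), 4))           # 6.8  (58 - 256/5)
print(round(ss_definition(data + data), 4))  # 13.6, same spread but twice the SS
```

Dividing SS by the number of observations (or by n − 1 for a sample) removes this dependence on sample size, which is the motivation for the variance.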
VARIANCE (Varian, Ragam)
• In practice, deviations are rarely used because they are difficult to manipulate mathematically
• Instead, all the deviations are squared, summed, and then divided by the degrees of freedom, n − 1; the result is called the variance
• The divisor n − 1 is used so that the estimator is unbiased
• The population variance is denoted σ², and the sample variance is denoted s²
The mean SS is known as the variance
Population Variance ($\sigma^2$):
$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$
This is just SS / N
Problem: the units end up squared
Our best estimate of $\sigma^2$ is the sample variance ($s^2$):
$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{\sum X_i^2 - (\sum X_i)^2 / n}{n - 1}$
Note: the divisor n − 1 is known as the degrees of freedom
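To make the difference between the two divisors concrete, this sketch computes SS/N and SS/(n − 1) for the data 2, 2, 3, 4, 5 and compares them with Python's standard `statistics` module (`pvariance` divides by N, `variance` divides by n − 1). Treating the same five numbers first as a population and then as a sample is an assumption made only for illustration.

```python
import math
import statistics

data = [2, 2, 3, 4, 5]
n = len(data)
mean = sum(data) / n
ss = sum((x - mean) ** 2 for x in data)   # 6.8

pop_var = ss / n          # divide by N: data treated as a whole population
samp_var = ss / (n - 1)   # divide by n - 1: data treated as a sample

print(round(pop_var, 2))   # 1.36
print(round(samp_var, 2))  # 1.7

# The standard library gives the same results.
print(math.isclose(pop_var, statistics.pvariance(data)))   # True
print(math.isclose(samp_var, statistics.variance(data)))   # True
```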
Why is (n − 1) called the degrees of freedom?
Consider the following illustration:
• Suppose someone has to carry 100 kg of rice from the 1st floor to the 3rd floor in at most 5 trips (n = 5). That person may choose to finish the job in 2 trips, 3 trips, or up to (n − 1) trips
• For up to 4 (n − 1) trips, the person is free to choose how many kg to carry to the 3rd floor
• On the final trip (1 trip), however, the person has no choice but to carry all the remaining rice. In other words, the freedom to choose the amount carried extends over only (n − 1) trips
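The same idea applies to deviations from the mean: because they must sum to zero, once n − 1 of them are known the last one is forced, just like the last load of rice. A minimal sketch with the data 2, 2, 3, 4, 5 (variable names are illustrative):

```python
import math

data = [2, 2, 3, 4, 5]
mean = sum(data) / len(data)              # 3.2
deviations = [x - mean for x in data]

# The first n - 1 deviations can be regarded as "free" ...
free_part = deviations[:-1]
# ... but the last one is determined, because all deviations sum to zero.
forced_last = -sum(free_part)

print(math.isclose(forced_last, deviations[-1]))  # True
```

Only n − 1 deviations carry independent information, which is why the sample variance divides by n − 1.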
Standard Deviation
• Using the variance to measure variability yields units that are the square of the original units
• If the variability being computed is that of melon weights in kg, the variance has units of kg²
• If the variability being measured is that of the number of farmers, in units of persons, the variance has units of persons², which clearly makes no sense
• To obtain the same units as the original data, take the square root of the variance. The square root of the variance is called the standard deviation (s) (simpangan baku)
Standard Deviation (Standar Deviasi)
=> square root of the variance
For a population: $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = \sqrt{\sigma^2}$
For a sample: $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} = \sqrt{s^2}$
Example: compute the standard deviation
Data: 2, 2, 3, 4, 5 ($X_1$ to $X_5$), $\bar{X} = 3.2$
$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$
$s = \sqrt{\frac{(2 - 3.2)^2 + (2 - 3.2)^2 + (3 - 3.2)^2 + (4 - 3.2)^2 + (5 - 3.2)^2}{5 - 1}} = \sqrt{\frac{1.44 + 1.44 + 0.04 + 0.64 + 3.24}{4}} = 1.304$
The variance is the square of the standard deviation. What is its value?
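The standard deviation example can be reproduced with the standard library. This sketch uses `statistics.stdev`, which divides by n − 1, and then squares the result to recover the sample variance.

```python
import math
import statistics

data = [2, 2, 3, 4, 5]

s = statistics.stdev(data)        # sample standard deviation (divisor n - 1)
variance_from_s = s ** 2          # squaring s gives back the sample variance

print(round(s, 3))                # 1.304
print(round(variance_from_s, 3))  # 1.7
print(math.isclose(s, math.sqrt(6.8 / 4)))  # True: s = sqrt(SS / (n - 1))
```

Unlike the variance, s is expressed in the same units as the original observations.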
Coefficient of Variation (Koefisien Keragaman)
KK (V or sometimes CV)
$CV = \frac{s}{\bar{X}} \times 100\%$
The variance ($s^2$) and standard deviation (s) have magnitudes that depend on the magnitude of the data.
The coefficient of variation is a relative measure (the standard deviation relative to the mean), so the variability of different data sets can be compared
Note that it has no units, which emphasizes that it is a relative measure
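Example:
Data: 2, 2, 3, 4, 5 ($X_1$ to $X_5$), s = 1.304 g, $\bar{X} = 3.2$ g
$CV = \frac{s}{\bar{X}} = \frac{1.304\ \text{g}}{3.2\ \text{g}} = 0.4075$
or, multiplied by 100%, CV = 40.75%
Attention: the CV carries no unit; it is either a plain ratio or a percentage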
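A short sketch of the CV calculation, assuming the five observations are weights in grams as in the example. Converting the same weights to kilograms (an illustration added here) changes s and the mean by the same factor, so the CV is unchanged. The function name is illustrative.

```python
import math
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (a unitless ratio)."""
    return statistics.stdev(values) / statistics.mean(values)

weights_g = [2, 2, 3, 4, 5]                     # grams
weights_kg = [w / 1000 for w in weights_g]      # same weights in kilograms

cv_g = coefficient_of_variation(weights_g)
cv_kg = coefficient_of_variation(weights_kg)

print(round(cv_g, 4))              # 0.4075
print(f"{cv_g:.2%}")               # 40.75%
print(math.isclose(cv_g, cv_kg))   # True: the CV does not depend on the unit
```

This is what allows the variability of data sets measured in different units or on different scales to be compared.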
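The Normal Distribution (Distribusi Normal)
There is an equation that describes the height of the normal curve in relation to its standard deviation (σ)
[Figure: normal curve, f plotted against X, showing the area within 1, 2 and 3 standard deviations of the mean: 68.27%, 95.45% and 99.73%]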
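The percentages quoted for the normal curve can be reproduced numerically. The sketch below writes out the standard normal density formula (which the extracted slide text mentions but does not show, so its inclusion here is an addition) and uses `math.erf` for the area within k standard deviations of the mean. The middle value rounds to 95.45%; some tables give 95.44% because of rounding in tabulated z values.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def prob_within(k):
    """Area under the normal curve between mu - k*sigma and mu + k*sigma."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * prob_within(k), 2))
# 1 68.27
# 2 95.45
# 3 99.73

print(round(normal_pdf(0), 4))   # 0.3989, the peak height when sigma = 1
```

A larger σ lowers and widens the curve, while changing μ only shifts it along the x axis.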
[Figure: normal distributions with σ = 1 and varying means (μ = 0, 1, 2)]
If you have difficulty remembering these terms, consult a statistics textbook
[Figure: normal distributions with μ = 0 and varying standard deviations (σ = 1, 1.5, 2)]
Symmetry and Kurtosis
Symmetry
Symmetry means that the population is equally distributed around the mean, i.e. the curve to the right of the mean is a mirror image of the curve to the left
[Figure: symmetric curve with the mean, median and mode marked]
Data may be positively skewed (skewed to the right)
or negatively skewed (skewed to the left)
So the direction of skew refers to the direction of the longer tail
[Figure: skewed curve with the mode, median and mean marked in that order]
Kurtosis refers to how flat or peaked a curve is (sometimes referred to as peakedness or tailedness)
The normal curve is known as mesokurtic
A more peaked curve is known as leptokurtic. A flatter curve is known as platykurtic
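The slides describe skewness and kurtosis only in words. As a rough numerical illustration, the sketch below uses the common moment-based coefficients (skewness = m3 / m2^1.5, excess kurtosis = m4 / m2² − 3), which are an assumption here and not necessarily the definitions used in the course. The data sets are made up.

```python
import statistics

def shape_coefficients(values):
    """Return (skewness, excess kurtosis) from central moments m2, m3, m4."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

symmetric = [1, 2, 3, 4, 5]                 # hypothetical data
right_skewed = [1, 1, 2, 2, 3, 10]          # long tail to the right
left_skewed = [-x for x in right_skewed]    # mirror image: long tail to the left

skew_sym, _ = shape_coefficients(symmetric)
skew_right, _ = shape_coefficients(right_skewed)
skew_left, _ = shape_coefficients(left_skewed)

print(skew_sym == 0, skew_right > 0, skew_left < 0)   # True True True
# In right-skewed data the mean is pulled toward the long tail.
print(statistics.mean(right_skewed) > statistics.median(right_skewed))  # True
```

With this convention, an excess kurtosis near 0 corresponds to a mesokurtic curve, positive values to leptokurtic curves and negative values to platykurtic ones.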
Exercises (Latihan)
• The numbers of banana fruits attacked by pests on 16 plants are 4, 9, 0, 1, 3, 24, 12, 3, 30, 12, 7, 13, 18, 4, 5, and 15. Treating these data as a sample, compute the variance, standard deviation and coefficient of variation. Which statistic best describes the variability of these data?
• To study how first-grade students use their time when assigned a math task, a researcher observes 24 students and records their time off task out of 20 minutes. Times off task (minutes): 4, 0, 2, 2, 4, 1, 4, 6, 9, 7, 2, 7, 5, 4, 13, 7, 7, 10, 10, 0, 5, 3, 9 and 8. For this data set:
  Find the mean and standard deviation, and the median and range
  Display the data in a histogram, a dot diagram and a stem-and-leaf diagram
  Determine the intervals x̄ ± s, x̄ ± 2s, x̄ ± 3s
  Find the proportion of the measurements that lie in each of these intervals. Compare your findings with the empirical guideline for bell-shaped distributions
• The data below were obtained from a detailed record of purchases over several months. The weekly vegetable usage (in grams) of a household taken from a consumer panel was:
  84 58 62 65 75 76 56 87 68 77
  87 55 65 66 76 78 74 81 83 78
  75 74 60 50 86 80 81 78 74 87
  Plot a histogram of the data
  Find the relative frequency of usage that did not exceed 80
  Calculate the mean, variance and standard deviation
  Calculate the median and quartiles
• The mean weight of corn is 278 g per ear with a standard deviation of 9.64 g, based on 10 ears. If the ears were taken from ten different fields, the mean plant height is Rp 1200 and its standard deviation is Rp 90. Which is more homogeneous, the weight of the corn ears or the plant height? Explain your answer! Verify your results by direct calculation with other data
• The salaries of employees at a seed company, in abbreviated form, are as follows: 18, 15, 21, 19, 13, 15, 14, 23, 18 and 16 rupiah. If each abbreviated value is the real salary divided by Rp 100,000, find the mean, variance and standard deviation of the salaries.
• Computer-aided statistical calculations. Calculating descriptive statistics such as x̄ and s becomes increasingly tedious with large data sets. Modern computers have come a long way in alleviating the drudgery of hand calculation. Microsoft Excel, Minitab and SPSS are three computing packages that are easily accessible to students because their commands are in simple English. Find these programs and install them on your computer. Below are the main menus and submenus of Microsoft Excel, Minitab and SPSS. Use this software to find x̄, s, s², and the coefficient of variation (CV) for the data set in exercise b. Histograms and other illustrations can also be created
• Some properties of the standard deviation
  If a fixed number c is added to all measurements in a data set, will the deviations (xi − x̄) remain unchanged? And consequently, will s² and s remain unchanged, too? Try it with a sample data set.
  If all measurements in a data set are multiplied by a fixed number d, the deviations (xi − x̄) get multiplied by d. Is that right? What about s² and s? Try it with a sample data set.
  Use your computer software to examine your sample data. Verify your results with other data
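One way to explore the last exercise numerically is sketched below. The data set and the constants c = 10 and d = 3 are arbitrary choices made for illustration; try other values to verify the pattern.

```python
import math
import statistics

data = [2, 2, 3, 4, 5]
c, d = 10, 3                             # arbitrary constants for the experiment

shifted = [x + c for x in data]          # add c to every measurement
scaled = [x * d for x in data]           # multiply every measurement by d

s = statistics.stdev(data)
var = statistics.variance(data)

# Adding a constant shifts the mean but leaves the deviations, s² and s as they were.
print(math.isclose(statistics.stdev(shifted), s))               # True
# Multiplying by d multiplies s by |d| and s² by d².
print(math.isclose(statistics.stdev(scaled), abs(d) * s))       # True
print(math.isclose(statistics.variance(scaled), d ** 2 * var))  # True
```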