CLASSIFICATION
Decision Tree Algorithm
KEY PROBLEM
No | Savings | Assets | Income | Credit Risk
1  | Medium  | High   | High   | Good
2  | Low     | Low    | Medium | Bad
3  | High    | Medium | Low    | Bad
4  | Medium  | Medium | Medium | Good
5  | Low     | Medium | High   | Good
6  | High    | High   | Low    | Good
7  | Low     | Low    | Low    | Bad
8  | Medium  | Medium | Medium | Good
New record to classify:
Savings | Assets | Income | Credit Risk
Medium  | Low    | Medium | ?
CALCULATING IMPURITY
Impurity measures the similarity (homogeneity) or dissimilarity (heterogeneity) of the data in a table that contains attributes and a class label.
A table is called Pure or Homogeneous if it contains only one class. If it contains more than one class, it is called Impure or Heterogeneous.
PROBABILITY
In the training table, Savings, Assets and Income are the attributes and Credit Risk is the class.
There are 3 rows of class Bad and 5 rows of class Good, 8 rows in total.
The class probabilities are: P(Bad) = 3/8 = 0.375 and P(Good) = 5/8 = 0.625.
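A minimal Python sketch of the probability step, assuming the Credit Risk column is encoded as a plain list of strings in row order (the names `labels` and `class_probabilities` are illustrative, not from the slides). The last line also applies the pure/impure definition: a table is pure only when a single class remains.

```python
from collections import Counter

# Credit Risk column of the training table, rows 1-8 (illustrative encoding)
labels = ["Good", "Bad", "Bad", "Good", "Good", "Good", "Bad", "Good"]


def class_probabilities(labels):
    """Return {class: count / total} for a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


probs = class_probabilities(labels)
print(probs)            # {'Good': 0.625, 'Bad': 0.375}
print(len(probs) == 1)  # False: two classes remain, so the table is impure
```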
ENTROPY (PARENT)
$$\text{Entropy}(D) = -0.375\log_{10}(0.375) - 0.625\log_{10}(0.625) = -0.375(-0.426) - 0.625(-0.204) = 0.160 + 0.128 \approx 0.29$$
All logarithms in this calculation are base 10.
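The entropy calculation can be checked with a small helper. The `base` parameter is an assumption added for illustration: base 10 reproduces the value above, while base 2 (the convention in most textbooks) gives about 0.954 for the same table.

```python
import math


def entropy(probs, base=10):
    """Entropy = -sum(p * log_base(p)), skipping classes with p == 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


parent = [0.375, 0.625]  # P(Bad), P(Good) of the parent table
print(round(entropy(parent), 2))          # 0.29  (base 10)
print(round(entropy(parent, base=2), 3))  # 0.954 (base 2)
```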
GINI INDEX
$$\text{Gini}(D) = 1 - (0.375^2 + 0.625^2) = 1 - (0.14 + 0.39) = 1 - 0.53 = 0.47$$
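The same check for the Gini Index, as a minimal sketch that takes the list of class probabilities (the function name `gini` is illustrative). Without intermediate rounding the parent value is exactly 0.46875.

```python
def gini(probs):
    """Gini index = 1 - sum(p^2) over the class probabilities."""
    return 1 - sum(p * p for p in probs)


print(gini([0.375, 0.625]))  # 0.46875, shown above rounded to 0.47
print(gini([1.0]))           # 0.0 for a pure table
```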
CLASSIFICATION ERROR INDEX
$$\text{Classification Error}(D) = 1 - \max\{0.375, 0.625\} = 1 - 0.625 = 0.375$$
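Classification error only looks at the majority class. A one-line sketch (the name `classification_error` is illustrative):

```python
def classification_error(probs):
    """Classification error = 1 - probability of the majority class."""
    return 1 - max(probs)


print(classification_error([0.375, 0.625]))  # 0.375
print(classification_error([0.5, 0.5]))      # 0.5, the maximum for two classes
```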
SUBSET - SAVINGS
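Attribute: Savings. Class: Credit Risk.

Savings = Low
No | Savings | Credit Risk
1  | Low     | Bad
2  | Low     | Bad
3  | Low     | Good

Savings = Medium
No | Savings | Credit Risk
4  | Medium  | Good
5  | Medium  | Good
6  | Medium  | Good

Savings = High
No | Savings | Credit Risk
7  | High    | Bad
8  | High    | Good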
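Splitting a table into subsets is a group-by on one attribute. The sketch below assumes the training table is encoded as tuples `(Savings, Assets, Income, Credit Risk)` in row order; `ROWS`, `COLUMNS` and `partition` are illustrative names, not part of the slides.

```python
from collections import defaultdict

# Training table as (Savings, Assets, Income, Credit Risk) tuples (illustrative encoding)
ROWS = [
    ("Medium", "High", "High", "Good"),
    ("Low", "Low", "Medium", "Bad"),
    ("High", "Medium", "Low", "Bad"),
    ("Medium", "Medium", "Medium", "Good"),
    ("Low", "Medium", "High", "Good"),
    ("High", "High", "Low", "Good"),
    ("Low", "Low", "Low", "Bad"),
    ("Medium", "Medium", "Medium", "Good"),
]
COLUMNS = ["Savings", "Assets", "Income"]


def partition(rows, attribute):
    """Group the class labels (last column) of `rows` by the value of `attribute`."""
    idx = COLUMNS.index(attribute)
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[idx]].append(row[-1])
    return dict(subsets)


for value, labels in partition(ROWS, "Savings").items():
    print(value, labels)
# Medium ['Good', 'Good', 'Good']
# Low ['Bad', 'Good', 'Bad']
# High ['Bad', 'Good']
```

Calling `partition(ROWS, "Assets")` or `partition(ROWS, "Income")` produces the subsets for the other two attributes in the same way.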
SUBSET - ASSETS
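Attribute: Assets. Class: Credit Risk.

Assets = Low
No | Assets | Credit Risk
1  | Low    | Bad
2  | Low    | Bad

Assets = Medium
No | Assets | Credit Risk
3  | Medium | Bad
4  | Medium | Good
5  | Medium | Good
6  | Medium | Good

Assets = High
No | Assets | Credit Risk
7  | High   | Good
8  | High   | Good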
SUBSET - INCOME
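Attribute: Income. Class: Credit Risk.

Income = Low
No | Income | Credit Risk
1  | Low    | Bad
2  | Low    | Bad
3  | Low    | Good

Income = Medium
No | Income | Credit Risk
4  | Medium | Bad
5  | Medium | Good
6  | Medium | Good

Income = High
No | Income | Credit Risk
7  | High   | Good
8  | High   | Good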
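Under the same illustrative tuple encoding, the Gini Index of every subset can be computed for all three attributes at once. This is a sketch; `subset_gini` is a hypothetical helper that returns each subset's size and its Gini Index rounded to three decimals.

```python
from collections import Counter, defaultdict

# Training table as (Savings, Assets, Income, Credit Risk) tuples (illustrative encoding)
ROWS = [
    ("Medium", "High", "High", "Good"),
    ("Low", "Low", "Medium", "Bad"),
    ("High", "Medium", "Low", "Bad"),
    ("Medium", "Medium", "Medium", "Good"),
    ("Low", "Medium", "High", "Good"),
    ("High", "High", "Low", "Good"),
    ("Low", "Low", "Low", "Bad"),
    ("Medium", "Medium", "Medium", "Good"),
]
COLUMNS = ["Savings", "Assets", "Income"]


def gini(labels):
    """Gini index 1 - sum(p^2) of a list of class labels."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def subset_gini(rows, attribute):
    """Map each value of `attribute` to (subset size, Gini index of that subset)."""
    idx = COLUMNS.index(attribute)
    groups = defaultdict(list)
    for row in rows:
        groups[row[idx]].append(row[-1])
    return {value: (len(lbls), round(gini(lbls), 3)) for value, lbls in groups.items()}


for attribute in COLUMNS:
    print(attribute, subset_gini(ROWS, attribute))
```

For the mixed three-row subsets (Savings = Low, Income = Low, Income = Medium) it returns 4/9 ≈ 0.444, the mixed four-row subset Assets = Medium gives 0.375, and every pure subset gives 0.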
INFORMATION GAIN
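Information Gain (i) with Entropy:
Entropy of parent table D − Σ ((number of rows in subset / number of rows in parent) × Entropy of each subset)
Information Gain (i) with Gini Index:
Gini Index of parent table D − Σ ((number of rows in subset / number of rows in parent) × Gini Index of each subset)
Information Gain (i) with Classification Error:
Classification Error of parent table D − Σ ((number of rows in subset / number of rows in parent) × Classification Error of each subset)
In symbols, for an impurity measure I (Entropy, Gini Index or Classification Error) and the subsets D_v of D:
$$\text{Gain}(D) = I(D) - \sum_{v} \frac{|D_v|}{|D|}\, I(D_v)$$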
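The three information-gain variants differ only in the impurity function that is plugged in. A minimal sketch, assuming each subset is passed as its list of Credit Risk labels (all helper names are illustrative); the example uses the Assets split from the subset tables.

```python
import math
from collections import Counter


def probs(labels):
    """Class probabilities of a list of labels."""
    n = len(labels)
    return [c / n for c in Counter(labels).values()]


def entropy(labels):
    """Entropy with base-10 logarithms, matching the parent-entropy slide."""
    return -sum(p * math.log10(p) for p in probs(labels))


def gini(labels):
    return 1 - sum(p * p for p in probs(labels))


def classification_error(labels):
    return 1 - max(probs(labels))


def information_gain(parent, subsets, impurity):
    """impurity(parent) - sum(|subset| / |parent| * impurity(subset))."""
    n = len(parent)
    return impurity(parent) - sum(len(s) / n * impurity(s) for s in subsets)


# Credit Risk labels of the parent table and of the Assets subsets (Low, Medium, High)
parent = ["Good", "Bad", "Bad", "Good", "Good", "Good", "Bad", "Good"]
assets_subsets = [["Bad", "Bad"], ["Bad", "Good", "Good", "Good"], ["Good", "Good"]]

for fn in (entropy, gini, classification_error):
    print(fn.__name__, information_gain(parent, assets_subsets, fn))
```

For the Assets split this gives about 0.165 with base-10 entropy, exactly 0.28125 with the Gini Index and exactly 0.25 with classification error.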
INFORMATION GAIN
Gini Index of each subset (subset size in parentheses):
Value                   | Savings  | Assets    | Income
Low                     | (3) 0.44 | (2) 0     | (3) 0.44
Medium                  | (3) 0    | (4) 0.375 | (3) 0.44
High                    | (2) 0.5  | (2) 0     | (2) 0
Information Gain (Gini) | 0.18     | 0.28      | 0.14
Maximum information gain: Assets. The Assets subsets Low and High are pure (homogeneous), so they become leaves; only Assets = Medium still has to be split.
[Figure: root node Assets with branches Low, Medium and High; the Low branch ends in the leaf Bad]
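Choosing the root node means computing the gain of every attribute and keeping the largest. A sketch using the Gini Index and the same illustrative tuple encoding (`gini_gain` is a hypothetical helper name):

```python
from collections import Counter, defaultdict

# Training table as (Savings, Assets, Income, Credit Risk) tuples (illustrative encoding)
ROWS = [
    ("Medium", "High", "High", "Good"),
    ("Low", "Low", "Medium", "Bad"),
    ("High", "Medium", "Low", "Bad"),
    ("Medium", "Medium", "Medium", "Good"),
    ("Low", "Medium", "High", "Good"),
    ("High", "High", "Low", "Good"),
    ("Low", "Low", "Low", "Bad"),
    ("Medium", "Medium", "Medium", "Good"),
]
COLUMNS = ["Savings", "Assets", "Income"]


def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_gain(rows, attribute):
    """Gini of the parent minus the size-weighted Gini of the subsets of `attribute`."""
    idx = COLUMNS.index(attribute)
    groups = defaultdict(list)
    for row in rows:
        groups[row[idx]].append(row[-1])
    weighted = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return gini([row[-1] for row in rows]) - weighted


gains = {attr: gini_gain(ROWS, attr) for attr in COLUMNS}
print({attr: round(g, 2) for attr, g in gains.items()})  # {'Savings': 0.18, 'Assets': 0.28, 'Income': 0.14}
print(max(gains, key=gains.get))                          # Assets
```

Assets wins with a gain of 0.28125, ahead of Savings (about 0.177) and Income (about 0.135).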
DECISION TREE RULE
[Figure: decision tree after iteration #1: Assets = Low → Bad, Assets = High → Good, Assets = Medium → ? (still to be split)]
PARENT - ITERATION #2
No | Assets | Savings | Income | Credit Risk
1  | Medium | High    | Low    | Bad
2  | Medium | Medium  | Medium | Good
3  | Medium | Medium  | Medium | Good
4  | Medium | Low     | High   | Good
P(Bad) = 1/4 = 0.25    P(Good) = 3/4 = 0.75
$$\text{Gini} = 1 - (0.25^2 + 0.75^2) = 1 - (0.0625 + 0.5625) = 0.375$$
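Iteration #2 starts from the rows that reach the Assets = Medium branch. A short sketch under the same illustrative tuple encoding, where column index 1 holds Assets:

```python
from collections import Counter

# Training table as (Savings, Assets, Income, Credit Risk) tuples (illustrative encoding)
ROWS = [
    ("Medium", "High", "High", "Good"),
    ("Low", "Low", "Medium", "Bad"),
    ("High", "Medium", "Low", "Bad"),
    ("Medium", "Medium", "Medium", "Good"),
    ("Low", "Medium", "High", "Good"),
    ("High", "High", "Low", "Good"),
    ("Low", "Low", "Low", "Bad"),
    ("Medium", "Medium", "Medium", "Good"),
]


def gini(labels):
    """Gini index 1 - sum(p^2) of a list of class labels."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


# Keep only the rows that reach the Assets = Medium branch (Assets is column 1)
medium_rows = [row for row in ROWS if row[1] == "Medium"]
labels = [row[-1] for row in medium_rows]
print(labels)        # ['Bad', 'Good', 'Good', 'Good']
print(gini(labels))  # 0.375
```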
SUBSET SAVINGS - #2
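No | Savings | Credit Risk
1  | High    | Bad
2  | Medium  | Good
3  | Medium  | Good
4  | Low     | Good

Savings = High
No | Savings | Credit Risk
1  | High    | Bad

Savings = Medium
No | Savings | Credit Risk
1  | Medium  | Good
2  | Medium  | Good

Savings = Low
No | Savings | Credit Risk
1  | Low     | Good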
SUBSET INCOME - #2
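No | Income | Credit Risk
1  | Low    | Bad
2  | Medium | Good
3  | Medium | Good
4  | High   | Good

Income = Low
No | Income | Credit Risk
1  | Low    | Bad

Income = Medium
No | Income | Credit Risk
1  | Medium | Good
2  | Medium | Good

Income = High
No | Income | Credit Risk
1  | High   | Good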
INFORMATION GAIN - #2
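Maximum information gain: Savings and Income tie (Gini gain 0.375 − 0 = 0.375 for both). All Savings subsets (Low, Medium and High) are pure (homogeneous), and so are all Income subsets, so either attribute can be used to split the Assets = Medium branch.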
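The gains for iteration #2 can be recomputed on those four rows. In this sketch (same illustrative encoding and hypothetical `gini_gain` helper as before), Assets gains nothing because every remaining row has Assets = Medium, while Savings and Income tie:

```python
from collections import Counter, defaultdict

# Training table as (Savings, Assets, Income, Credit Risk) tuples (illustrative encoding)
ROWS = [
    ("Medium", "High", "High", "Good"),
    ("Low", "Low", "Medium", "Bad"),
    ("High", "Medium", "Low", "Bad"),
    ("Medium", "Medium", "Medium", "Good"),
    ("Low", "Medium", "High", "Good"),
    ("High", "High", "Low", "Good"),
    ("Low", "Low", "Low", "Bad"),
    ("Medium", "Medium", "Medium", "Good"),
]
COLUMNS = ["Savings", "Assets", "Income"]


def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_gain(rows, attribute):
    """Gini of the parent minus the size-weighted Gini of the subsets of `attribute`."""
    idx = COLUMNS.index(attribute)
    groups = defaultdict(list)
    for row in rows:
        groups[row[idx]].append(row[-1])
    weighted = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return gini([row[-1] for row in rows]) - weighted


medium_rows = [row for row in ROWS if row[1] == "Medium"]  # parent of iteration #2
gains = {attr: gini_gain(medium_rows, attr) for attr in COLUMNS}
print(gains)  # {'Savings': 0.375, 'Assets': 0.0, 'Income': 0.375}
```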
DECISION TREE RULE - RESULT
Because Savings and Income tie at iteration #2, the Assets = Medium branch can be split on either attribute: rules #3a to #5a use Savings, rules #3b to #5b use Income.
#1. If Assets = Low Then Credit Risk = Bad
#2. If Assets = High Then Credit Risk = Good
#3a. If Assets = Medium And Savings = Low Then Credit Risk = Good
#4a. If Assets = Medium And Savings = High Then Credit Risk = Bad
#5a. If Assets = Medium And Savings = Medium Then Credit Risk = Good
#3b. If Assets = Medium And Income = Low Then Credit Risk = Bad
#4b. If Assets = Medium And Income = High Then Credit Risk = Good
#5b. If Assets = Medium And Income = Medium Then Credit Risk = Good
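The rule set can be written as a plain function and checked against the training rows. `predict_credit_risk` and its `second_split` switch, which selects the a-rules (Savings) or the b-rules (Income), are illustrative names, not from the slides.

```python
# Training table as (Savings, Assets, Income, Credit Risk) tuples (illustrative encoding)
ROWS = [
    ("Medium", "High", "High", "Good"),
    ("Low", "Low", "Medium", "Bad"),
    ("High", "Medium", "Low", "Bad"),
    ("Medium", "Medium", "Medium", "Good"),
    ("Low", "Medium", "High", "Good"),
    ("High", "High", "Low", "Good"),
    ("Low", "Low", "Low", "Bad"),
    ("Medium", "Medium", "Medium", "Good"),
]


def predict_credit_risk(savings, assets, income, second_split="Savings"):
    """Apply rules #1 and #2, then the a-rules (Savings) or the b-rules (Income)."""
    if assets == "Low":   # rule #1
        return "Bad"
    if assets == "High":  # rule #2
        return "Good"
    # Remaining case: assets == "Medium" (rules #3 to #5)
    if second_split == "Savings":
        return {"Low": "Good", "High": "Bad", "Medium": "Good"}[savings]  # #3a, #4a, #5a
    return {"Low": "Bad", "High": "Good", "Medium": "Good"}[income]       # #3b, #4b, #5b


for variant in ("Savings", "Income"):
    correct = sum(predict_credit_risk(s, a, i, variant) == y for s, a, i, y in ROWS)
    print(variant, correct, "/", len(ROWS))  # 8 / 8 for both variants
```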
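Savings | Assets | Income | Credit Risk (split on Savings / split on Income)
Medium  | Low    | Medium | ?
Prediction: Bad / Bad. Assets = Low, so rule #1 applies and both variants of the tree classify the record as Bad.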
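The whole procedure the slides carry out by hand (pick the attribute with the largest gain, split, repeat on impure subsets) can be sketched as a short recursive function. This is a minimal illustration using the Gini gain and multiway splits, not the implementation of any particular library. Ties are broken by the order of the attribute list, which is how the two variants of rules #3 to #5 arise.

```python
from collections import Counter, defaultdict

# Training table as (Savings, Assets, Income, Credit Risk) tuples (illustrative encoding)
ROWS = [
    ("Medium", "High", "High", "Good"),
    ("Low", "Low", "Medium", "Bad"),
    ("High", "Medium", "Low", "Bad"),
    ("Medium", "Medium", "Medium", "Good"),
    ("Low", "Medium", "High", "Good"),
    ("High", "High", "Low", "Good"),
    ("Low", "Low", "Low", "Bad"),
    ("Medium", "Medium", "Medium", "Good"),
]
COLUMNS = ["Savings", "Assets", "Income"]


def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def split(rows, idx):
    """Group rows by the value in column `idx`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[idx]].append(row)
    return groups


def build_tree(rows, attrs):
    """Recursively split on the attribute (column index) with the largest Gini gain."""
    labels = [row[-1] for row in rows]
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # leaf: the (majority) class

    def gain(idx):
        weighted = sum(len(g) / len(rows) * gini([r[-1] for r in g])
                       for g in split(rows, idx).values())
        return gini(labels) - weighted

    best = max(attrs, key=gain)  # on a tie, max() keeps the first attribute in `attrs`
    rest = [a for a in attrs if a != best]
    children = {value: build_tree(subset, rest)
                for value, subset in split(rows, best).items()}
    return (COLUMNS[best], children)


def predict(tree, record):
    """Follow branches until a leaf (a class label string) is reached."""
    while isinstance(tree, tuple):
        name, children = tree
        tree = children[record[COLUMNS.index(name)]]
    return tree


query = ("Medium", "Low", "Medium")  # Savings, Assets, Income of the new record
for order in ([0, 1, 2], [2, 1, 0]):  # attribute order decides the Savings/Income tie
    tree = build_tree(ROWS, order)
    print(tree[0], tree[1]["Medium"][0], predict(tree, query))
# Assets Savings Bad
# Assets Income Bad
```

An attribute value never seen during training would raise `KeyError` in `predict`; a fuller implementation would fall back to, for example, the majority class of the node.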