Constructing decision trees

(1)

DMML, 21 Jan 2020

Constructing decision trees

Building the smallest tree is NP-complete.

Greedy heuristic: maximize the improvement in purity.

Measuring impurity:
- misclassification rate
- entropy ("information gain")
- Gini index
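All three impurity measures are simple functions of the class proportions at a node. A minimal sketch in Python (function and variable names are mine, not from the notes):

```python
import math

def impurities(counts):
    """Misclassification rate, entropy (bits), and Gini index of a node,
    given the number of rows in each class."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    misclassification = 1.0 - max(probs)              # 1 - p(majority class)
    entropy = -sum(p * math.log2(p) for p in probs)   # -sum_i p_i log2(p_i)
    gini = 1.0 - sum(p * p for p in probs)            # 1 - sum_i p_i^2
    return misclassification, entropy, gini

print(impurities([5, 5]))    # maximally impure two-class node: (0.5, 1.0, 0.5)
print(impurities([10, 0]))   # pure node: all three measures are zero
```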

(2)

Problem with using information gain directly:

Attribute = Aadhaar no / Passport no / Serial no / ...

Suppose we pick Aadhaar as the next question: D splits into D1, D2, ..., DN, one row per branch.

Highest possible information gain, but totally useless.
(3)

Penalize attributes with too many possible values:
compute the entropy (or Gini index) of the attribute itself!

Aadhaar: N values, each appearing once in the table, so $p_i = 1/N$ for each value.

$$\text{entropy} = -\sum_i p_i \log p_i = -\sum_i \frac{1}{N} \log \frac{1}{N} = -\log \frac{1}{N} = \log N$$
(4)

information-gain($A_i$) = improvement in purity if we choose $A_i$

$$\text{gain-ratio}(A_i) = \frac{\text{information-gain}(A_i)}{\text{entropy}(A_i)}$$
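To make pages (2)-(4) concrete, a self-contained sketch on a made-up eight-row table (all names and data are illustrative): the unique-ID attribute gets the maximal information gain, but its own entropy is $\log_2 8 = 3$ bits, so its gain ratio is small.

```python
import math
from collections import Counter

def entropy(values):
    """Entropy (bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def information_gain(attr_column, labels):
    """Reduction in label entropy after splitting on attr_column."""
    total = len(labels)
    groups = {}
    for a, y in zip(attr_column, labels):
        groups.setdefault(a, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(attr_column, labels):
    return information_gain(attr_column, labels) / entropy(attr_column)

# Toy table: a unique-ID attribute vs. a genuinely informative attribute.
labels  = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
aadhaar = list(range(8))                            # unique per row
student = ["y", "y", "n", "y", "n", "n", "y", "n"]  # perfectly predictive here

print(information_gain(aadhaar, labels))  # 1.0: maximal, labels have 1 bit of entropy
print(gain_ratio(aadhaar, labels))        # 1/3: penalized by entropy(aadhaar) = 3
print(gain_ratio(student, labels))        # 1.0: gain 1.0 / attribute entropy 1.0
```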

(5)

Two well-known implementations of decision trees:

- CART: Classification & Regression Tree (Gini index; Breiman et al.)
- C4.5: Quinlan (entropy)

Continuous attributes? e.g. "salary", "cost"
(6)

We may know a lower & upper bound for $A_i$, but not the granularity.

Typical question: is $A_i \le v$? How do we choose $v$?
($A_i \le v$? $A_i \ge v$??)

All we know about $A_i$ is what we see in the table:
at most N values across N rows, $v_1 < v_2 < \cdots < v_N$.

(7)

All the values of $A_i$ between two consecutive observed values split the table identically, so they are equivalent:

[Sketch: number line with observed values $v_1 < v_2 < \cdots < v_N$ and midpoints $m_j$ between them.]

Try each $m_j$, evaluate the purity gain, keep the best.

Should we choose the midpoints $m_j$ or the actual values $v_j$?

Better to use the actual $v_j$s:
- for interpretability of the model
- not all values are possible
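A sketch of this threshold search on a made-up column, using the Gini index as the purity measure (names and data are illustrative). Per the point above, the candidate thresholds are the actual observed values; using midpoints instead would be a one-line change.

```python
def gini(labels):
    total = len(labels)
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Try 'Ai <= v' for each observed value v; keep the split with the
    lowest weighted Gini impurity (i.e., the highest purity gain)."""
    n = len(values)
    best_v, best_impurity = None, float("inf")
    for v in sorted(set(values))[:-1]:   # the largest value cannot split the data
        left  = [y for x, y in zip(values, labels) if x <= v]
        right = [y for x, y in zip(values, labels) if x > v]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_v, best_impurity = v, impurity
    return best_v, best_impurity

salary = [20, 25, 30, 48, 52, 60]
label  = ["no", "no", "no", "yes", "yes", "yes"]
print(best_threshold(salary, label))   # (30, 0.0): "salary <= 30" splits perfectly
```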

(8)

CART - Regression: produce a number as an answer.

(Typically, fit a function $f(x_1, \ldots, x_n)$ to the data.)

Training data has a numeric value as target.

(9)

Regression tree: like a classification tree; variance is a good error metric.

[Tree sketch: at a leaf covering m rows, predict the mean of their target values, e.g. mean salary.]
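The regression analogue of the threshold search, as a sketch under the same assumptions: weighted variance replaces impurity, and each side of a split would predict its own mean.

```python
def variance(ys):
    m = len(ys)
    mean = sum(ys) / m
    return sum((y - mean) ** 2 for y in ys) / m

def best_regression_split(values, targets):
    """Pick the threshold that most reduces the weighted variance;
    each side would then predict the mean of its own rows."""
    n = len(values)
    best_v, best_err = None, float("inf")
    for v in sorted(set(values))[:-1]:
        left  = [t for x, t in zip(values, targets) if x <= v]
        right = [t for x, t in zip(values, targets) if x > v]
        err = (len(left) * variance(left) + len(right) * variance(right)) / n
        if err < best_err:
            best_v, best_err = v, err
    return best_v, best_err

years  = [1, 2, 3, 10, 12, 15]       # e.g. years of experience
salary = [30, 32, 31, 80, 85, 82]    # numeric target
print(best_regression_split(years, salary))
# (3, ~2.44): split at 3, predict mean(30, 32, 31) vs mean(80, 85, 82)
```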

(10)

CART: Classification And Regression Tree (only asks binary questions)

- How good is the classifier?
- What is the correct measure?
- On what data can we compute the measure?
  → Need data for which the correct answer is known.

(11)

The only labelled data we have is the training data.

Withhold some training data for testing: typically 70% to build the model, 30% to test.
(Be careful to select the 30% "randomly".)

- Build a model on the training set
- Evaluate on the test set

Sometimes the training data is too sparse to "waste".

(12)

Cross validation:

- Choose different subsets as test data
- Repeatedly build models

10-fold cross-validation: 90% train, 10% test
(covers the whole data across 10 iterations)

If the results are good, go back and build the model on the full data.
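A sketch of the 10-fold procedure under the same assumptions; `build_model` and `evaluate` in the comment are hypothetical placeholders for whatever learner is being tested.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds;
    each index lands in the test set exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]    # k roughly equal parts
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

scores = []
for train, test in k_fold_indices(100, k=10):
    # model = build_model(rows[j] for j in train)
    # scores.append(evaluate(model, rows[j] for j in test))
    scores.append(len(test) / 100)           # placeholder "score" for the sketch
print(sum(scores) / len(scores))             # average over the 10 folds
# If the average looks good, rebuild the model on the full data.
```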
