Chapter 04 Classification Part 1

Academic year: 2022

Chapter 4.1 - 1/44

Chapter 4

Classification: Basic Concepts, Decision Trees & Model Evaluation

Part 1

Classification With Decision Tree

Classification: Definition

Example of Classification Task

General Approach for Building Classification Model

Classification Techniques

Example of Decision Tree

Another Example of Decision Tree

Decision Tree Classification Task

Apply Model to Test Data

Decision Tree Classification Task

Decision Tree Induction

General Structure of Hunt’s Algorithm

Hunt’s Algorithm
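As commonly described, Hunt's algorithm grows a tree recursively: if all records at a node share one class, the node becomes a leaf; otherwise a test attribute is chosen, the records are partitioned by its values, and the procedure repeats on each partition. The following is a minimal sketch under stated assumptions: the functions `hunt` and `classify` are hypothetical names, the attribute is simply taken in list order instead of by an impurity measure, and the four records reuse the buys_computer sample shown later in these slides.

```python
from collections import Counter

def hunt(records, attributes):
    """Grow a tree from (feature_dict, label) pairs.

    Returns a class label for a leaf, or a dict describing an internal node.
    Assumption: attributes are nominal and are tried in the given order.
    """
    labels = [label for _, label in records]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure or no attribute is left to test.
    if len(set(labels)) == 1 or not attributes:
        return majority
    attr, rest = attributes[0], attributes[1:]
    partitions = {}
    for features, label in records:
        partitions.setdefault(features[attr], []).append((features, label))
    return {
        "attribute": attr,
        "default": majority,  # used for attribute values unseen in training
        "children": {value: hunt(part, rest) for value, part in partitions.items()},
    }

def classify(tree, features):
    """Follow the tests from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["children"].get(features[tree["attribute"]], tree["default"])
    return tree

# Sample records (credit_rating only) from the buys_computer example.
training = [
    ({"credit_rating": "excellent"}, "yes"),
    ({"credit_rating": "excellent"}, "yes"),
    ({"credit_rating": "fair"}, "no"),
    ({"credit_rating": "excellent"}, "no"),
]
tree = hunt(training, ["credit_rating"])
print(classify(tree, {"credit_rating": "fair"}))
```

A practical implementation would replace the "first attribute" choice with a search for the split that most reduces impurity.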

Design Issues of Decision Tree Induction

Methods for Expressing Test Conditions

Test Condition for Nominal Attributes

Test Condition for Ordinal Attributes

Test Condition for Continuous Attributes

Splitting Based on Continuous Attributes

How to Determine the Best Split / 1

How to Determine the Best Split / 2

Measures of Node Impurity

Finding the Best Split / 1

Finding the Best Split / 2

Measure of Impurity: GINI

Computing GINI Index of a Single Node
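The Gini index of a node is commonly defined as 1 minus the sum of the squared class fractions at that node. The sketch below assumes that standard definition; the function name `gini` and the six-record nodes are illustrative choices, not taken from the slides.

```python
from collections import Counter

def gini(labels):
    """Gini index of one node: 1 minus the sum of squared class fractions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())

# Illustrative nodes with six records each (made-up counts).
pure_node = ["C2"] * 6                 # 0 records of C1, 6 of C2
skewed_node = ["C1"] * 1 + ["C2"] * 5  # 1 record of C1, 5 of C2
even_node = ["C1"] * 3 + ["C2"] * 3    # 3 records of C1, 3 of C2

for name, node in [("pure", pure_node), ("skewed", skewed_node), ("even", even_node)]:
    print(f"{name}: {gini(node):.3f}")
```

A pure node scores 0, and for two classes the index peaks at 0.5 when the classes are evenly mixed.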

Computing GINI Index for a Collection of Nodes
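When a parent node is split into several children, the quality of the split is usually taken as the size-weighted average of the children's Gini values. The sketch below assumes that definition; `gini_from_counts`, `gini_split`, and the class counts are hypothetical and only illustrate the arithmetic.

```python
def gini_from_counts(counts):
    """Gini index of a node given its per-class record counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_split(children):
    """Weighted Gini of a split; `children` is a list of per-class count lists."""
    total = sum(sum(child) for child in children)
    return sum(sum(child) / total * gini_from_counts(child) for child in children)

# Made-up example: a parent with 6 records per class, split into two children.
parent = [6, 6]
children = [[5, 2], [1, 4]]

weighted = gini_split(children)
gain = gini_from_counts(parent) - weighted
print(f"parent={gini_from_counts(parent):.3f} split={weighted:.3f} gain={gain:.3f}")
```

The split with the lowest weighted Gini (equivalently, the largest gain over the parent) is preferred.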

Binary Attributes: Computing GINI Index

Categorical Attributes: Computing GINI Index

Continuous Attributes: Computing GINI Index / 1

Continuous Attributes: Computing GINI Index / 2
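One common way to pick a threshold for a continuous attribute is to sort the records once, then scan the midpoints between adjacent values while updating the class counts on each side incrementally. The sketch below assumes that approach; `best_threshold` is a hypothetical helper and the ten income values and labels are illustrative data, not taken from the slides.

```python
def gini_from_counts(counts):
    """Gini index of a node given its per-class record counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best binary split value <= t."""
    pairs = sorted(zip(values, labels))
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    left = [0] * len(classes)
    right = [0] * len(classes)
    for _, label in pairs:
        right[index[label]] += 1
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(n - 1):
        value, label = pairs[i]
        # Move one record from the right partition to the left partition.
        left[index[label]] += 1
        right[index[label]] -= 1
        next_value = pairs[i + 1][0]
        if value == next_value:
            continue  # no threshold fits between two equal values
        n_left = i + 1
        score = (n_left / n) * gini_from_counts(left) \
            + ((n - n_left) / n) * gini_from_counts(right)
        if score < best[1]:
            best = ((value + next_value) / 2, score)
    return best

# Illustrative data (assumed for this example).
income = [125, 100, 70, 120, 95, 60, 220, 85, 75, 90]
label = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]

threshold, score = best_threshold(income, label)
print(threshold, round(score, 3))
```

Because the counts are updated in place, the scan costs one pass after sorting instead of recomputing the counts for every candidate threshold.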

Measure of Impurity: Entropy

Computing Entropy of a Single Node

Computing Information Gain After Splitting
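Information gain is usually defined as the parent's entropy minus the size-weighted entropy of the children, with entropy computed in bits (log base 2). The sketch below assumes those standard definitions; the function names and class counts are illustrative.

```python
import math

def entropy(counts):
    """Entropy in bits of a node given its per-class record counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    # Classes with zero records contribute nothing (0 * log 0 is treated as 0).
    return sum(-(c / n) * math.log2(c / n) for c in counts if c > 0)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Made-up counts: an evenly mixed parent split into two pure children.
parent = [3, 3]
perfect_split = [[3, 0], [0, 3]]

print(round(entropy([1, 5]), 3))
print(information_gain(parent, perfect_split))
```

A split that yields only pure children recovers the full parent entropy as gain.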

Problems with Information Gain

Gain Ratio
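Gain ratio is commonly defined as information gain divided by the split information, where split information is the entropy of the partition sizes. It penalizes splits that produce many small partitions. The sketch below assumes that definition; the names and counts are illustrative, and returning 0 when the split information is 0 is a choice made for this example.

```python
import math

def entropy(counts):
    """Entropy in bits of a list of counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log2(c / n) for c in counts if c > 0)

def gain_ratio(parent, children):
    """Information gain divided by split information (entropy of child sizes)."""
    n = sum(parent)
    sizes = [sum(child) for child in children]
    gain = entropy(parent) - sum(
        size / n * entropy(child) for size, child in zip(sizes, children)
    )
    split_info = entropy(sizes)
    return gain / split_info if split_info > 0 else 0.0

parent = [3, 3]
# Two-way split into pure halves.
binary_split = [[3, 0], [0, 3]]
# Six-way split with one record per partition, as an ID-like attribute would give.
id_like_split = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]

print(round(gain_ratio(parent, binary_split), 3))
print(round(gain_ratio(parent, id_like_split), 3))
```

Both splits have the same information gain, but the six-way split is divided by a larger split information and therefore scores lower.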

Measure of Impurity: Classification Error

Computing Error of a Single Node

Comparison among Impurity Measures

For binary (2-class) classification problems
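For two classes, all three measures can be written as functions of p, the fraction of records in one class. The sketch below assumes the standard definitions (Gini as 1 minus the sum of squared fractions, entropy in bits, and classification error as 1 minus the largest fraction); the function name `binary_impurities` is hypothetical.

```python
import math

def binary_impurities(p):
    """Return (gini, entropy, error) for a two-class node with class fraction p."""
    q = 1.0 - p
    gini = 1.0 - (p * p + q * q)
    entropy = sum(-x * math.log2(x) for x in (p, q) if x > 0)
    error = 1.0 - max(p, q)
    return gini, entropy, error

for p in (0.0, 0.1, 0.3, 0.5):
    g, h, e = binary_impurities(p)
    print(f"p={p:.1f}  gini={g:.3f}  entropy={h:.3f}  error={e:.3f}")
```

All three measures are 0 for a pure node and reach their maximum at p = 0.5, where Gini and classification error equal 0.5 and entropy equals 1.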

Misclassification Error vs Gini index

Example: C4.5

Simple depth-first construction.

Uses Information Gain

Sorts Continuous Attributes at each node.

Needs entire data to fit in memory.

Unsuitable for Large Datasets.

- Needs out-of-core sorting.

You can download the software from:

http://www.cse.unsw.edu.au/~quinlan/c4.5r8.tar.gz

Scalable Decision Tree Induction / 1

How scalable is decision tree induction?

- Particularly suitable for small data sets

SLIQ (EDBT’96, Mehta et al.)

- Builds an index for each attribute; only the class list and the current attribute list reside in memory

Scalable Decision Tree Induction / 2

SLIQ

Sample data for the class buys_computer

RID  Credit_rating  Age  Buys_computer
1    excellent      38   yes
2    excellent      26   yes
3    fair           35   no
4    excellent      49   no

Disk-resident attribute lists

Credit_rating  RID
excellent      1
excellent      2
excellent      4
fair           3

Age  RID
26   2
35   3
38   1
49   4

Memory-resident class list

RID  Buys_computer  Node
1    yes            5
2    yes            2
3    no             3
4    no             6

[Figure: decision tree with nodes numbered 0 to 6]
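The attribute lists and the class list can be derived from the sample data by sorting each attribute separately and keeping the record ID (RID) as the link back to the class list. This is only a sketch of the data layout, not of the SLIQ algorithm itself; `attribute_list` is a hypothetical helper, and every record is assumed to start at root node 0, whereas the class list on the slide shows the node assignments at a later stage of tree growth.

```python
# Sample data for the class buys_computer: (RID, credit_rating, age, buys_computer).
records = [
    (1, "excellent", 38, "yes"),
    (2, "excellent", 26, "yes"),
    (3, "fair", 35, "no"),
    (4, "excellent", 49, "no"),
]

def attribute_list(rows, column):
    """Sorted (value, RID) pairs for one attribute; kept on disk in SLIQ."""
    return sorted((row[column], row[0]) for row in rows)

credit_list = attribute_list(records, 1)
age_list = attribute_list(records, 2)

# Memory-resident class list: RID -> class label and current tree node.
# Assumption: all records start at the root, numbered 0.
class_list = {rid: {"label": label, "node": 0} for rid, _, _, label in records}

# Scanning one attribute list reaches the class label through the RID.
for age, rid in age_list:
    entry = class_list[rid]
    print(age, rid, entry["label"], entry["node"])
```

Only `class_list` and the attribute list currently being scanned need to be in memory at the same time.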

Decision Tree Based Classification

Advantages

- Inexpensive to construct

- Extremely fast at classifying unknown records

- Easy to interpret for small-sized trees

- Accuracy is comparable to other classification techniques for many data sets

Practical Issues of Classification

- Underfitting and Overfitting

- Missing Values

- Costs of Classification
