Lect4 Association Rules

Academic year: 2017


Association Rules

Lecture 4/DMBI/IKI83403T/MTI/UI

Yudho Giri Sucahyo, Ph.D, CISA (yudho@cs.ui.ac.id)

Faculty of Computer Science, University of Indonesia

Objectives

- Introduction
- What is Association Mining?
- Mining Association Rules
- Algorithms for Association Rules Mining

Introduction

- You sell more if customers can see the product.
- Customers who purchase one type of product are likely to be interested in other particular products.
- Market-basket analysis: studying the composition of the shopping basket of products purchased during a single shopping event.
- Market-basket data: the transactional list of purchases by customer. It is challenging to analyze because of:
  - Very large number of records (often millions of transactions per day)
  - Sparseness (each market basket contains only a small portion of the items carried)
  - Heterogeneity (customers with different tastes tend to purchase a specific subset of items)

Introduction (2)

- Product presentations can be more intelligently planned for specific times of day, days of the week, or holidays.
- Can also involve sequential relationships.
- Market-basket analysis is an undirected DM operation (along with clustering), seeking patterns that were previously unknown.
- Cross-selling
  - The propensity for the purchaser of a specific item to purchase a different item.
  - Can be maximized by locating those products that tend to be purchased by the same consumer in places where both can be seen.

What is Association Mining?

- Association rule mining (ARM):
  - Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.
- Frequent pattern: a pattern that occurs frequently in a database.
- Motivation: finding regularities in data
  - What products were often purchased together? Beer and diapers?!
  - What are the subsequent purchases after buying a PC?
  - What kinds of DNA are sensitive to this new drug?
  - Can we automatically classify web documents?

Why is Frequent Pattern or Association Mining an Essential Task in DM?

- Foundation for many essential data mining tasks
  - Association, correlation, causality
  - Sequential patterns, temporal or cyclic association, partial periodicity, spatial and multimedia association
  - Associative classification, cluster analysis, iceberg cube, fascicles (semantic data compression)
- Broad applications
  - Basket data analysis, cross-marketing, catalog design, sale campaign analysis

What is Association Mining?

- Examples:
  - Rule form: “A → B [support, confidence]”.
  - buys(x, “diapers”) → buys(x, “beers”) [0.5%, 60%]
  - major(x, “CS”) ^ takes(x, “DB”) → grade(x, “A”) [1%, 75%]
- A support of 0.5% for an association rule means that 0.5% of all the transactions show that diapers and beers are purchased together.
- A confidence of 60% means that 60% of the customers who purchased diapers also bought beers.
- Rules that satisfy both the minimum support threshold and the minimum confidence threshold are called strong.
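
The two measures can be sketched in a few lines of Python. This is only an illustration: the function names `support` and `confidence` and the five baskets are invented here and do not come from the lecture.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical baskets, invented only to exercise the two functions.
baskets = [
    {"diapers", "beers", "milk"},
    {"diapers", "beers"},
    {"diapers", "bread"},
    {"milk", "bread"},
    {"beers"},
]

rule_support = support(baskets, {"diapers", "beers"})
rule_confidence = confidence(baskets, {"diapers"}, {"beers"})
print(f"support = {rule_support:.0%}, confidence = {rule_confidence:.0%}")
```

Here two of the five baskets contain both items, and two of the three diaper baskets also contain beers. The sketch assumes the antecedent occurs at least once; otherwise the confidence is undefined.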

What is Association Mining?

- A set of items is referred to as an itemset.
- An itemset that contains k items is a k-itemset.
  - {beer, diaper} is a 2-itemset.
- If an itemset satisfies minimum support, then it is a frequent itemset.
- The set of frequent k-itemsets is commonly denoted by Lk.
- ARM is a two-step process:
  - Find all frequent itemsets.
  - Generate strong association rules (AR) from the frequent itemsets.
- The second step is the easier of the two. Overall performance of mining AR is determined by the first step.
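
The second step can be sketched as follows, assuming step one has already produced the support of every frequent itemset. The function name `generate_rules` and the support values are illustrative assumptions, not part of the lecture.

```python
from itertools import combinations


def generate_rules(supports, min_conf):
    """Return (antecedent, consequent, support, confidence) for strong rules.

    `supports` maps each frequent itemset (a frozenset) to its support and
    is assumed to contain every subset of every itemset it lists.
    """
    rules = []
    for itemset, sup in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                antecedent = frozenset(antecedent)
                conf = sup / supports[antecedent]
                if conf >= min_conf:
                    rules.append((antecedent, itemset - antecedent, sup, conf))
    return rules


# Illustrative supports for a small set of frequent itemsets.
supports = {
    frozenset("A"): 0.75,
    frozenset("B"): 0.50,
    frozenset("C"): 0.50,
    frozenset("AC"): 0.50,
}
rules = generate_rules(supports, min_conf=0.5)
for antecedent, consequent, sup, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent), f"{sup:.0%}", f"{conf:.1%}")
```

No database scan is needed here, which is why this step is cheap compared with finding the frequent itemsets.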

Association Mining

- Mining association rules (Agrawal et al., SIGMOD93)
- Problem extension
  - Generalized A.R. (Srikant et al.; Han et al., VLDB95)
  - Quantitative A.R. (Srikant et al., SIGMOD96)
  - N-dimensional A.R. (Lu et al., DMKD'98)
  - Meta-rule guided mining
- Better algorithms
  - Fast algorithm (Agrawal et al., VLDB94)
  - Hash-based (Park et al., SIGMOD95)
  - Partitioning (Navathe et al., VLDB95)
  - Direct Itemset Counting (Brin et al., SIGMOD97)
  - Parallel mining (Agrawal et al., TKDE96)
  - Distributed mining (Cheung et al., PDIS96)
  - Incremental mining (Cheung et al., ICDE96)

Many Kinds of Association Rules

- Boolean association rule:
  - A rule that concerns associations between the presence or absence of items.
  - Example: buys(x, “diapers”) → buys(x, “beers”) [0.5%, 60%] (R1)
- Quantitative association rule:
  - Describes associations between quantitative items. Quantitative values for items are partitioned into intervals.
  - Example: age(X, “30-39”) ∧ income(X, “42K-48K”) → buys(X, “LCD TV”) (R2)
  - Age and income have been discretized.
- Single-dimensional association rule: R1
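
A minimal sketch of the discretization step behind a quantitative rule such as R2 is shown below. The bin edges and helper names are assumptions made for illustration; only the intervals “30-39” and “42K-48K” appear in the lecture.

```python
def to_interval(value, edges, labels):
    """Return the label of the half-open interval [low, high) holding value."""
    for low, high, label in zip(edges, edges[1:], labels):
        if low <= value < high:
            return label
    return None  # value falls outside every interval


# Hypothetical bin edges chosen for this example.
AGE_EDGES = [20, 30, 40, 50]
AGE_LABELS = ["20-29", "30-39", "40-49"]
INCOME_EDGES = [36_000, 42_000, 48_000, 54_000]
INCOME_LABELS = ["36K-42K", "42K-48K", "48K-54K"]


def discretize(customer):
    """Turn a raw record into Boolean items such as 'age=30-39'."""
    age = to_interval(customer["age"], AGE_EDGES, AGE_LABELS)
    income = to_interval(customer["income"], INCOME_EDGES, INCOME_LABELS)
    return {f"age={age}", f"income={income}"}


items = discretize({"age": 34, "income": 45_000})
print(sorted(items))
```

Once every record is mapped to interval items like these, the quantitative problem can be mined with the same machinery as Boolean rules.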

Many Kinds of Association Rules

- Single-level association rule
  - Example: age(X, “30-39”) → buys(X, “laptop computer”)
- Multilevel association rule
  - Example: age(X, “30-39”) → buys(X, “computer”)
  - Computer is a higher-level abstraction of laptop
- Various extensions
  - Mining maximal frequent patterns.
    - If p is a maximal frequent pattern, then any superpattern of p is not frequent.
    - Used to substantially reduce the number of frequent itemsets generated in mining.
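
The definition of a maximal frequent pattern can be checked directly on a list of frequent itemsets. The sketch below is a naive quadratic filter with an invented function name and invented data, not one of the dedicated algorithms for mining maximal patterns.

```python
def maximal_itemsets(frequent):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    frequent = [frozenset(s) for s in frequent]
    return [s for s in frequent if not any(s < other for other in frequent)]


# Illustrative frequent itemsets (not taken from the slides).
frequent = [
    {"a"}, {"b"}, {"c"}, {"d"},
    {"a", "b"}, {"a", "c"}, {"b", "c"},
    {"a", "b", "c"},
]
maximal = maximal_itemsets(frequent)
print([sorted(s) for s in maximal])
```

Eight frequent itemsets collapse to two maximal ones, and every other frequent itemset is a subset of one of them. The supports of those subsets are not recoverable from the maximal set alone.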

Mining Single-Dimensional Boolean Association Rules

- Given
  - A database of customer transactions
  - Each transaction is a list of items (purchased by a customer in a visit)
- Find all rules that correlate the presence of one set of items with that of another set of items
  - Example: 98% of people who purchase tires and auto accessories also get automotive services done
- Any number of items in the consequent/antecedent of a rule

Application Examples

- Market-basket analysis
  - * → Fanta: what the store should do to boost Fanta sales
  - Bodrex → *: what other products the store should stock up on if it has a sale on Bodrex
- Attached mailing in direct marketing
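
The two query patterns above amount to filtering mined rules by consequent or by antecedent. A small sketch follows; the rules and their confidence values are hypothetical and exist only to show the filtering.

```python
# Hypothetical mined rules: (antecedent, consequent, confidence).
rules = [
    ({"chips"}, {"Fanta"}, 0.70),
    ({"Bodrex"}, {"mineral water"}, 0.55),
    ({"bread"}, {"butter"}, 0.60),
    ({"Bodrex", "tea"}, {"honey"}, 0.40),
]


def with_consequent(rules, item):
    """Rules of the form * -> item."""
    return [r for r in rules if item in r[1]]


def with_antecedent(rules, item):
    """Rules of the form item -> *."""
    return [r for r in rules if item in r[0]]


boost_fanta = with_consequent(rules, "Fanta")
stock_up = with_antecedent(rules, "Bodrex")
print(len(boost_fanta), "rule(s) lead to Fanta")
print(len(stock_up), "rule(s) start from Bodrex")
```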

Rule Measures: Support and Confidence

- Find all the rules X & Y → Z with minimum confidence and support
  - support, s: probability that a transaction contains {X, Y, Z}
  - confidence, c: conditional probability that a transaction having {X, Y} also contains Z

[Figure: overlapping regions labeled “Customer buys diaper”, “Customer buys beer”, and “Customer buys both”]

Transaction ID   Items Bought
2000             A, B, C
1000             A, C
4000             A, D
5000             B, E, F

Let minimum support be 50% and minimum confidence be 50%; we have
- A → C (50%, 66.6%)
- C → A (50%, 100%)
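
Written as formulas, and assuming the four-transaction example database (IDs 2000, 1000, 4000, 5000), the two measures and the quoted figures work out as below. The symbols D (the set of transactions), s, and c are notation chosen here for the worked arithmetic.

```latex
\[
s(X \rightarrow Y) = \frac{|\{\, t \in D : X \cup Y \subseteq t \,\}|}{|D|},
\qquad
c(X \rightarrow Y) = \frac{s(X \cup Y)}{s(X)}
\]
\[
s(A \rightarrow C) = \frac{2}{4} = 50\%, \qquad
c(A \rightarrow C) = \frac{2/4}{3/4} = \frac{2}{3} \approx 66.7\%, \qquad
c(C \rightarrow A) = \frac{2/4}{2/4} = 100\%
\]
```

Support is symmetric in A and C, while confidence is not, because the denominators s(A) = 3/4 and s(C) = 2/4 differ.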

Mining Association Rules -- Example

Transaction ID   Items Bought
2000             A, B, C
1000             A, C
4000             A, D
5000             B, E, F

Min. support 50%, min. confidence 50%

Frequent Itemset   Support
{A}                75%
{B}                50%
{C}                50%
{A, C}             50%

For rule A → C:
  support = support({A, C}) = 50%
  confidence = support({A, C}) / support({A}) = 66.6%

The Apriori principle: any subset of a frequent itemset must be frequent.

Mining Frequent Itemsets: the Key Step

- Find the frequent itemsets: the sets of items that have minimum support
  - A subset of a frequent itemset must also be a frequent itemset, i.e., if {A, B} is a frequent itemset, both {A} and {B} must be frequent itemsets
  - Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets)
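
The subset property is what lets each iteration discard candidates without scanning the database. Below is a sketch of one way to build (k+1)-item candidates from the frequent k-itemsets; the function name `generate_candidates` and the sample L2 are assumptions for illustration, and the join is the simple pairwise union rather than the prefix-based join used in optimized implementations.

```python
from itertools import combinations


def generate_candidates(frequent_k, k):
    """Join frequent k-itemsets into (k+1)-candidates, then prune.

    A candidate survives only if every one of its k-item subsets is frequent.
    """
    frequent_k = set(frequent_k)
    candidates = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            if len(union) != k + 1:
                continue
            if all(frozenset(sub) in frequent_k for sub in combinations(union, k)):
                candidates.add(union)
    return candidates


# Illustrative frequent 2-itemsets.
L2 = {frozenset(p) for p in [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]}
C3 = generate_candidates(L2, 2)
print([sorted(c) for c in C3])
```

{A, B, D} and {B, C, D} are pruned because {A, D} and {C, D} are not frequent, so only {A, B, C} needs to be counted in the next scan.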

The Apriori Algorithm

Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;
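
The pseudocode can be turned into a short runnable sketch. The version below is one straightforward reading of it, not an optimized implementation: the function name `apriori`, the use of fractional support, and the pairwise join are choices made here. It is run on the four-transaction database from the earlier example.

```python
from collections import defaultdict
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support} for every frequent itemset.

    `min_support` is a fraction of the number of transactions.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def count(candidates):
        # One database scan: count each candidate contained in a transaction.
        counts = defaultdict(int)
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

    # L1: frequent single items.
    current = count({frozenset([i]) for t in transactions for i in t})
    frequent = dict(current)
    k = 1
    while current:
        # C(k+1): join Lk with itself, keep candidates whose k-subsets are all in Lk.
        candidates = {
            a | b
            for a in current
            for b in current
            if len(a | b) == k + 1
            and all(frozenset(s) in current for s in combinations(a | b, k))
        }
        current = count(candidates)  # L(k+1)
        frequent.update(current)
        k += 1
    return frequent


db = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
result = apriori(db, min_support=0.5)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), f"{sup:.0%}")
```

With minimum support 50% the sketch reports {A}, {B}, {C}, and {A, C}, in agreement with the frequent itemset table of the example. Each pass of the `while` loop corresponds to one full scan of the database.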

The Apriori Algorithm – Example 1

[Figure: Apriori trace on Database D (columns TID, Items). Repeated scans of D produce candidate and frequent itemset tables (columns itemset, sup.), for example {1} with support 2, ending with C3 and L3 containing {2 3 5}.]

The Apriori Algorithm – Example 2

Tid   Items
1     3 4 5 6 7 9
2     1 3 4 5 13
3     1 2 4 5 7 11
4     1 3 4 8

Frequent Patterns with MinSupport = 40%

Itemset   Tid        Support
1         2,3,4,5    80% (4)
3         1,2,4,5    80% (4)

Association Rules with MinSupport = 40% and MinConf = 70%

Rule     Support   Confidence
7 ⇒ 5    40% (2)   100%

Major Drawbacks of Apriori

- Apriori has to read the database many times to test the support of candidate patterns. To find a frequent pattern X of length 50, it has to traverse the database 50 times.
- On dense datasets with long patterns, the performance of Apriori drops rapidly as the pattern length increases, due to the explosion in the number of candidate patterns.
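
The scale of the candidate explosion follows from the Apriori principle: if a pattern of length 50 is frequent, every one of its non-empty subsets is frequent too, so each of them shows up as a candidate at some level. The short calculation below makes the count concrete; the helper name is an assumption.

```python
from math import comb


def nonempty_subsets(length):
    """Number of non-empty sub-patterns of a pattern with `length` items."""
    return 2 ** length - 1


total = nonempty_subsets(50)
# Sub-patterns per level k = 1..50, i.e. the candidates counted in pass k.
by_level = [comb(50, k) for k in range(1, 51)]

print(f"frequent sub-patterns implied by one length-50 pattern: {total:,}")
print(f"largest single level (k = 25): {max(by_level):,}")
```

The per-level counts sum to the same total, which shows that the cost is spread over all 50 database passes rather than concentrated in one.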

TreeProjection Algorithm
(Agarwal, Aggarwal & Prasad 2001)

Lexicographical Tree

Level 0: null {7534, 5134, 7514, 134, 134}
Level 1: 7 {54, 54}; 5 {34, 134, 14}; 1 {34, 4, 34, 34}; 3 {4, 4, 4, 4}; 4
Level 2: 75, 74, 51, 53, 54, 13, 14, 34
Level 3: 754, 514, 534, 134

Triangular Matrix for Counting Frequent Patterns with Length Two

      7   5   1   3
5     2
1     1   2
3     1   2   3
4     2   3   4   4

Eclat Algorithm
(Zaki et al. 1997)

[Figure: itemsets built from {} over items 1, 3, 4, 5, 7 up to 13457, with supports obtained by tid-list intersection]
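
Tid-list intersection can be sketched in a few lines. The example below covers only the support-counting idea, not the full Eclat search, and the function name is an assumption. It uses the tid-lists of items 1 and 3 as listed in Example 2, where 80% (4) implies five transactions.

```python
def tidlist_support(tidlists, itemset, n_transactions):
    """Support of `itemset` from the intersection of its items' tid-lists."""
    tids = set.intersection(*(tidlists[i] for i in itemset))
    return tids, len(tids) / n_transactions


# Tid-lists of items 1 and 3 from the Example 2 frequent pattern table.
tidlists = {1: {2, 3, 4, 5}, 3: {1, 2, 4, 5}}
tids, sup = tidlist_support(tidlists, [1, 3], n_transactions=5)
print(sorted(tids), f"{sup:.0%}")
```

The support of {1, 3} comes from the two lists alone, without another pass over the transactions. This is the main contrast with the repeated database scans of Apriori.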

FP-Growth
(Han, Pei & Yin 2000)

[Figure: FP-tree with node-links; node labels (item:count) include 1:3, 5:1, and 7:1]

CT-PRO
(Sucahyo & Gopalan 2004)

[Figure: (c) Global CFP-Tree]

Mining Very Large Database

- Partition Algorithm (Savasere, Omiecinski & Navathe 1995)

Mining Very Large Database

- Projection (Pei 2002)

[Figure: (a) Parallel Projection and (b) Partition Projection of a transaction table (columns Tid, Items) into the projected databases Projection 7, Projection 5, Projection 1, and Projection 3]

Presentation of Association Rules (Table Form)

Visualization of Association Rule Using Rule Graph

Visualization of Association Rule Using Plane Graph

Conclusion

- Association rule mining
  - Probably the most significant contribution from the database community in KDD
  - A large number of papers have been published
- Many interesting issues have been explored
- An interesting research direction
  - Association analysis in other types of data: spatial data, multimedia data, time series data, etc.

References

- Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001.
- David Olson and Yong Shi, Introduction to Business Data Mining, McGraw-Hill, 2007.
- Agarwal, R. C., Aggarwal, C. C. & Prasad, V. V. V. 2001, 'A Tree Projection Algorithm for Generation of Frequent Item Sets', Journal of Parallel and Distributed Computing (Special Issue on High-Performance Data Mining), vol. 61, no. 3, pp. 350-371.
- Han, J., Pei, J. & Yin, Y. 2000, 'Mining Frequent Patterns without Candidate Generation', in Proceedings of the ACM SIGMOD International Conference on Management of Data, Dallas, Texas, USA, pp. 1-12.
- Savasere, A., Omiecinski, E. & Navathe, S. 1995, 'An Efficient Algorithm for Mining Association Rules in Large Databases', in Proceedings of the 21st

References (2)

- Pei, J. 2002, Pattern-growth Methods for Frequent Pattern Mining, PhD Thesis, Simon Fraser University, Canada.
- Zaki, M. J. 1997, 'Parallel Algorithms for Fast Discovery of Association Rules', Data Mining and Knowledge Discovery: An International Journal, vol. 1, no. 4, pp. 343-373.
- Sucahyo, Y. G. & Gopalan, R. P. 2004, 'CT-PRO: A Bottom-Up Non Recursive Frequent Itemset Mining Algorithm Using Compressed FP-Tree Data Structure', in Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI), Brighton, UK.
