Negative and cyclic association rule

Academic year: 2018

(1)

Under the guidance of

Dr.S.K.Gupta

Negative and Cyclic Association Rules

Department of Computer Science & Engineering

Presented by

(2)

Negative Association Rules

In contrast to positive association rules, a negative association rule describes relationships between item sets and implies the occurrence of some item sets characterized by the absence of others.

(3)

Find the sets of items that are not present in a transaction together.

Need for Negative AR:

Unexpected patterns and exceptional patterns are referred to as exceptions of rules in positive association.

Example: while 'bird(x) ⇒ flies(x)' is a well-known fact, an exceptional rule is 'bird(x), penguin(x) ⇒ ¬flies(x)'.

This exception indicates that unexpected patterns can involve negative terms and can therefore be treated as a special case of negative rules.

(4)

Comparison with Positive AR

Positive association rules consider only items enumerated in transactions, like people buying milk & wheat bread together.

Negative association rules might consider the same items, but in addition consider negated items (i.e. items absent from transactions), like people buying milk & bread together but not cold-drink as well.

(5)

What is a Negative Association Rule

If X and Y are sets of items, then {X} => {~Y} is a Negative Association Rule (NAR).

It means itemsets X and Y are negatively correlated.

In most of the cases where X is present

(6)

What is Negative AR contd.

A negative rule A ⇒ ¬B also has a measure of its strength, conf, defined as the ratio supp(A ∪ ¬B)/supp(A).

Support-confidence framework for negative rules:

- A and B are disjoint itemsets, that is, A ∩ B = ∅;
- supp(A) ≥ min_sup, supp(B) ≥ min_sup;
- supp(A ⇒ ¬B) = supp(A ∪ ¬B);
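As a minimal sketch of the measures above, the following Python computes supp(A ∪ ¬B) and conf(A ⇒ ¬B) from positive counts only, using the identity supp(A ∪ ¬B) = supp(A) − supp(A ∪ B). The function names and the toy transactions are assumptions made for illustration and do not come from the slides; ¬B is read here as "the itemset B is not fully present in the transaction".

```python
def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def neg_support(a, b, transactions):
    """supp(A ∪ ¬B) = supp(A) - supp(A ∪ B), using positive counts only."""
    return (count(a, transactions) - count(a | b, transactions)) / len(transactions)


def neg_confidence(a, b, transactions):
    """conf(A ⇒ ¬B) = supp(A ∪ ¬B) / supp(A)."""
    n_a = count(a, transactions)
    if n_a == 0:
        return 0.0  # A never occurs, so the rule has no evidence
    return (n_a - count(a | b, transactions)) / n_a


# Hypothetical toy data: milk & bread bought together, usually without cold-drink.
transactions = [
    {"milk", "bread"},
    {"milk", "bread"},
    {"milk", "bread", "cold-drink"},
    {"milk"},
    {"cold-drink"},
]
A = {"milk", "bread"}
B = {"cold-drink"}

print(neg_support(A, B, transactions))     # 2 of 5 transactions
print(neg_confidence(A, B, transactions))  # 2 of the 3 transactions containing A
```

Working from counts rather than from already-divided supports keeps the arithmetic exact until the final division.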

(7)

Negative Rules Forms

(Note that although a rule in the form of ¬X -> ¬Y contains

(8)

Assumption 1: The minimum support is 30% and minimum confidence is 70%.

Assumption 2: The numeric attribute AGE ranges from 18 to 70 and is quantized into two groups - less than thirty and over thirty.

The rule that satisfies both the minimum support and the minimum confidence criteria is “{age < 30} -> {coupe}”, the confidence of which is 75%. A negative association rule also exists: “{age > 30} -> {not purchasing coupe}”, which has a confidence of 83.3%. For the purpose of identifying purchase patterns, the latter clearly has better predictive ability.

The preceding example illustrates that

(9)

Confidence of Negative AR

To avoid counting them directly, we can

(10)

Locality of similarity

We cannot mine positive rules with small support and confidence values, because that would result in many uninteresting rules.

To eliminate unwanted rules and focus on potentially interesting ones, we predict possible interesting negative ARs by incorporating domain knowledge of the data sets.

We use a taxonomy T for this, which consists of vertexes and directed edges.

Every vertex is a class; the vertex with degree 0 is the most general class and which

(11)

LOS contd.

- Taxonomy T consists of vertexes and directed edges. Each vertex represents a class.

- The semantics of the vertical relationship is that the lower-level vertex values are instances of the values of their immediate predecessor vertexes, i.e., the is-a relationship. The vertical relationship is used to discover generalized association rules.

- The semantics of the horizontal relationship is that the vertexes on the same level having the same immediate predecessor (siblings, to borrow from rooted tree terminology)

(12)

LOS Contd.

- Items belonging to the same LOS tend to participate in similar association rules. This is generally true because members of such groups tend to have similar activity patterns.

- For example, in a retail database, instances are items involved in transactions and customers are participants. If no person has any preference, the purchase probability of each item will be evenly distributed over all brands.

- LOS can be extended to different levels following the same parent node. For instance, it is more reasonable to put ‘IBM Aptiva’, ‘Compaq Deskpro’, ‘Notebook’, and ‘Parts’ into one LOS when viewing the database at a more abstract level.
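The grouping described above can be sketched in Python. The taxonomy below is hypothetical (the slides' own taxonomy figure is not available here), and the helper names `sibling_los` and `extended_los` are illustrative assumptions. The sketch treats a LOS as the set of siblings under one parent and, for the more abstract view, collects all leaves below a chosen ancestor.

```python
from collections import defaultdict


def children_of(parent):
    """Invert a child -> parent map into parent -> set of children."""
    kids = defaultdict(set)
    for child, par in parent.items():
        kids[par].add(child)
    return dict(kids)


def sibling_los(parent):
    """One LOS per parent: the vertexes sharing that immediate predecessor."""
    return {p: k for p, k in children_of(parent).items() if len(k) > 1}


def extended_los(parent, ancestor):
    """LOS at a more abstract level: all leaves below `ancestor`."""
    kids = children_of(parent)
    leaves, stack = set(), [ancestor]
    while stack:
        node = stack.pop()
        if node in kids:
            stack.extend(kids[node])  # internal class: descend further
        else:
            leaves.add(node)          # leaf: an actual item
    return leaves


# Hypothetical taxonomy, chosen only to mirror the item names in the example.
parent = {
    "Desktop": "Computer",
    "Notebook": "Computer",
    "Parts": "Computer",
    "IBM Aptiva": "Desktop",
    "Compaq Deskpro": "Desktop",
}

print(sibling_los(parent))
print(extended_los(parent, "Computer"))
```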

(13)

LOS Contd.

(14)

Discovering Negative Rules

To qualify as a negative rule, a candidate must satisfy two conditions:

First, there must exist a large deviation between the estimated and actual confidence, that is, the similarity measure (SM).

Second, the support and confidence are

(15)

Pruning

In constructing candidate negative rules, an equivalent or similar pair may be generated.

Another redundancy exists when the items come from the same LOS and all the resulting rules are sibling rules.

The pruning will either keep all positive ones or keep all negative ones that have high confidence.

An example is the pruning between rule

(16)

Algorithm

// Find all positive rules
1. FreqSet1 = {frequent 1-item sets}
2. Find all positive rules

// Generate negative rules
3. Delete all items from taxonomy t which are not frequent
4. For all rules r (positive rules)
5.     TmpRuleSet = genNegCan(r)
6.     For all rules tr (TmpRuleSet)
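The outline above stops before the body of the inner loop, so the following Python is only one possible reading, not the authors' algorithm. It assumes that `gen_neg_candidates` (a stand-in for genNegCan) swaps the consequent item of a positive rule for each of its LOS siblings, that the estimated confidence of a sibling rule equals the confidence of the positive rule, and that a candidate X ⇒ ¬S is kept when the estimated-minus-actual deviation and the negative rule's support and confidence all reach their thresholds. The `min_dev` threshold and the toy data are hypothetical.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def gen_neg_candidates(rule, los):
    """Assumed reading of genNegCan: replace the consequent by its LOS siblings."""
    x, y = rule
    return [(x, s) for s in sorted(los.get(y, ())) if s not in x]


def negative_rules(positive_rules, los, transactions, min_sup, min_conf, min_dev):
    n = len(transactions)
    found = []
    for x, y in positive_rules:
        n_x = count(x, transactions)
        if n_x == 0:
            continue
        expected = count(x | {y}, transactions) / n_x  # estimated confidence (assumption)
        for _, s in gen_neg_candidates((x, y), los):
            n_xs = count(x | {s}, transactions)
            actual = n_xs / n_x                        # actual confidence of X => S
            neg_sup = (n_x - n_xs) / n                 # supp(X ∪ ¬S)
            neg_conf = 1 - actual                      # conf(X => ¬S)
            if (expected - actual >= min_dev
                    and neg_sup >= min_sup and neg_conf >= min_conf):
                found.append((x, s, neg_conf))
    return found


# Hypothetical toy data: two sibling brands in one LOS.
transactions = [
    {"chips", "cola_a"}, {"chips", "cola_a"}, {"chips", "cola_a"},
    {"chips"}, {"cola_b"},
]
los = {"cola_a": {"cola_b"}}
positive = [(frozenset({"chips"}), "cola_a")]

rules = negative_rules(positive, los, transactions,
                       min_sup=0.3, min_conf=0.7, min_dev=0.5)
print(rules)  # one rule: chips => not cola_b, confidence 1.0
```

Because each positive rule is compared only against the siblings of its consequent, the work per rule is bounded by the size of one LOS.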

(17)

Example

(18)

Results

(19)

Results contd.

(20)

Conclusion

Given the number of positive rules P and the average size of the LOS L, the complexity of the algorithm is O(P × L).

Complexity does not depend on the number of transactions, since it is assumed that the supports of item sets have been counted and stored for use in this as well as other mining applications.

The complexity of discovering positive rules depends not only on the number of transactions, but also on the sizes of attribute domains and the number of attributes.

The overall complexity of finding Negative AR will be

(21)

Applications

Large DBs output results

- Helps to limit the search space in huge databases by combining the known positive associations with negative rules based on domain knowledge.

Example, the positive association of

(22)

Limitations

First, we cannot simply pick threshold values for support and confidence that are guaranteed to be effective in sifting both positive and negative rules.

The volume of negative rules can become impractical if the thresholds are not chosen appropriately, which might impact performance.

- For example, if there are 50,000 items in a store, then the number of possible combinations of items is 2^50000, a majority of which will never appear together even once in the entire database. If the absence of a certain item combination is taken to mean negative association, then we can generate millions of negative association rules. However, most of these rules are likely to be extremely uninteresting.

(23)

Cyclic Association Rules

Some item sets occur after a certain period of time.

The rule has the minimum confidence and support at

(24)

Overview

Step 1: The dataset is divided into time segments.

Step 2: Existing methods are used to discover frequent item sets in each segment.

Step 3: Pattern matching algorithms are then applied to detect cycles in association rules.

Step 4: Techniques called cycle pruning and cycle skipping, which allow us to significantly reduce the amount of

(25)

Problem Definition

We denote the i-th time unit, i ≥ 0, by t_i.

That is, t_i corresponds to the time interval [i·t, (i+1)·t), where t is the unit of time.

We denote the set of transactions executed in t_i by D[i].

The support of an itemset X in D[j] is the fraction of transactions in D[j] that contain the itemset.

The confidence of a rule X → Y in D[j] is the fraction of transactions in D[j] containing X that also contain Y.

An association rule X → Y holds in time unit t_j if the support of X ∪ Y in D[j] exceeds sup_min and the
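The per-segment definitions of support and confidence can be sketched as follows. This Python turns a list of time segments D[0], D[1], ... into a binary sequence whose j-th entry is 1 when the rule holds in t_j. The second half of the "holds" condition is cut off in the slide, so the confidence threshold `conf_min` is an assumption, as are the function names and the toy segments.

```python
def support(itemset, segment):
    """Fraction of transactions in the segment that contain `itemset`."""
    if not segment:
        return 0.0
    return sum(1 for t in segment if itemset <= t) / len(segment)


def confidence(x, y, segment):
    """Fraction of transactions containing X that also contain Y."""
    n_x = sum(1 for t in segment if x <= t)
    if n_x == 0:
        return 0.0
    return sum(1 for t in segment if (x | y) <= t) / n_x


def holds(x, y, segment, sup_min, conf_min):
    # conf_min is an assumed threshold; the slide text is truncated at this point.
    return support(x | y, segment) > sup_min and confidence(x, y, segment) > conf_min


def rule_sequence(x, y, segments, sup_min, conf_min):
    """Binary sequence: entry j is 1 if X -> Y holds in D[j], else 0."""
    return [int(holds(x, y, d, sup_min, conf_min)) for d in segments]


# Hypothetical segments D[0] and D[1].
D = [
    [{"tea", "biscuit"}, {"tea"}],
    [{"tea", "biscuit"}, {"tea", "biscuit"}, {"milk"}],
]
X, Y = {"tea"}, {"biscuit"}
print(rule_sequence(X, Y, D, sup_min=0.4, conf_min=0.6))  # → [0, 1]
```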

(26)

Problem Definition contd.

A cycle c is a tuple (l, o) consisting of a length l and an offset o (the first time unit in which the cycle occurs), 0 ≤ o < l. We say that an association rule has a cycle c = (l, o) if the association rule holds in every l-th time unit starting with time unit t_o.

For example, if the unit of time is an hour and “Tea => Biscuit” holds during the interval 7AM-8AM every day (i.e., every 24 hours), then “Tea => Biscuit” has a cycle (24, 7).

A cycle (l_i, o_i) is a multiple of another cycle (l_j, o_j)

(27)

Problem Definition contd.

A time unit t_i is said to be “part of cycle c” or to “participate in cycle c” if o = i mod l holds.

Example: if the binary sequence 001100010101 represents the association rule X -> Y, then X -> Y holds in
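A small Python sketch can make the cycle definition concrete. It scans a binary sequence and reports every cycle (l, o) for which the rule holds in all time units t_i with i mod l = o. The function name `detect_cycles` and the length bounds `l_min` and `l_max` are assumptions for illustration; this is a brute-force check, not the optimized detection procedure.

```python
def detect_cycles(bits, l_min, l_max):
    """Return all cycles (l, o), l_min <= l <= l_max, 0 <= o < l,
    such that bits[i] == 1 for every time unit i with i % l == o."""
    n = len(bits)
    cycles = []
    for l in range(l_min, min(l_max, n) + 1):
        for o in range(l):
            if all(bits[i] for i in range(o, n, l)):
                cycles.append((l, o))
    return cycles


# The binary sequence from the example; index 0 is the first time unit.
bits = [int(c) for c in "001100010101"]

print([i for i, b in enumerate(bits) if b])  # → [2, 3, 7, 9, 11]
print(detect_cycles(bits, 1, 4))             # → [(4, 3)]
```

On short sequences, larger cycle lengths are easily satisfied by only two or three time units, which is one reason to bound the cycle length.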

(28)

Modifying existing algorithm

The existing algorithms for discovering association rules cannot be applied directly.

Extend the set of items with time attributes,

(29)

The Sequential Algorithm

Step 1: Finding association rules (in each time segment)
– Maximal frequent item sets are generated.
– Association rules are generated from the large itemsets.

Step 2: Cycle detection
– By pattern matching algorithm

The complexity of the cycle detection phase has an upper bound of O(r × n × l_max).
- r is the number of rules detected
- n is the number of segments

(30)

The Sequential Algorithm contd.

Cycle pruning, cycle skipping and cycle elimination

The major portion of the running time of the sequential algorithm is spent calculating the support for itemsets.

A cycle of the rule X -> Y is a multiple of a cycle of the itemset X ∪ Y.

Cycle skipping: If time unit t_i is not part of a cycle of an itemset

(31)

Cycle pruning: If an itemset X has a cycle (l, o), then any of the subsets of X has the cycle (l, o).

Cycle elimination: If the support for an itemset X is below the minimum support threshold sup_min in time segment D[i],
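The cycle elimination statement is cut off in the slide. From the definition of participation (t_i is part of cycle (l, o) when o = i mod l), one consequence is that an itemset that is infrequent in D[i] cannot have any cycle containing t_i. The Python below sketches that reading under this assumption; the function names, bounds, and per-segment supports are hypothetical.

```python
def all_cycles(l_min, l_max):
    """Every candidate cycle (l, o) with l_min <= l <= l_max and 0 <= o < l."""
    return {(l, o) for l in range(l_min, l_max + 1) for o in range(l)}


def eliminate(candidates, i):
    """Drop every cycle that time unit t_i participates in (o == i mod l)."""
    return {(l, o) for (l, o) in candidates if i % l != o}


def surviving_cycles(supports, sup_min, l_min, l_max):
    """Start from all candidate cycles and eliminate those hit by an
    infrequent segment (assumed consequence of the truncated statement)."""
    candidates = all_cycles(l_min, l_max)
    for i, s in enumerate(supports):
        if s < sup_min:
            candidates = eliminate(candidates, i)
    return candidates


# Hypothetical supports of one itemset X in segments D[0]..D[5].
supports = [0.1, 0.5, 0.1, 0.6, 0.2, 0.7]
print(surviving_cycles(supports, sup_min=0.4, l_min=1, l_max=3))  # → {(2, 1)}
```

Once an itemset has no surviving candidate cycles, its support no longer needs to be counted in later segments.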

(32)

References

Mining Negative Association Rules
Xiaohui Yuan, B. P. Buckles, Zhaoshan Yuan, Jian Zhang; EECS Dept., Tulane Univ., New Orleans, LA, USA

Cyclic Association Rules
Banu Ozden, Sridhar Ramaswamy, Avi Silberschatz; Bell Laboratories Information Sciences Research Center
