Under the guidance of
Dr. S. K. Gupta
Negative and Cyclic Association Rules
Department of Computer Science & Engineering
Presented by
Negative Association Rules
• In contrast to positive association rules, a negative association rule describes relationships between itemsets and implies the occurrence of some itemsets characterized by the absence of others.
• It finds sets of items that are not present together in a transaction.
Need of Negative AR:
Unexpected patterns and exceptional patterns are referred to as exceptions of rules in positive association.
Example: while ‘bird(x) ⇒ flies(x)’ is a well-known fact, an exceptional rule is ‘bird(x), penguin(x) ⇒ ¬flies(x)’.
This exception indicates that unexpected patterns can involve negative terms and can therefore be treated as a special case of negative rules.
Comparison with Positive AR
Positive association rules consider only items enumerated in transactions, such as people buying milk & wheat bread together.
Negative association rules might also consider the same items, but in addition they consider negated items (i.e., items absent from transactions), such as people buying milk & bread together but not cold-drink.
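To make the difference concrete, here is a minimal Python sketch that counts the support of an itemset mixing present and negated (absent) items. The function name `literal_support` and the toy baskets are hypothetical, chosen only to mirror the milk, bread and cold-drink example; negation here means that none of the listed absent items occurs in the transaction.

```python
from typing import Iterable, List, Set


def literal_support(transactions: List[Set[str]],
                    present: Iterable[str],
                    absent: Iterable[str] = ()) -> float:
    """Fraction of transactions that contain every item in `present`
    and none of the items in `absent` (the negated items)."""
    need, avoid = set(present), set(absent)
    hits = sum(1 for t in transactions if need <= t and not (avoid & t))
    return hits / len(transactions)


# Hypothetical toy baskets, for illustration only.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "cold-drink"},
    {"milk", "bread"},
    {"cold-drink"},
]

print(literal_support(baskets, ["milk", "bread"]))                  # 0.75
print(literal_support(baskets, ["milk", "bread"], ["cold-drink"]))  # 0.5
```

The positive itemset {milk, bread} appears in three of four baskets, but only two of those also lack cold-drink, which is exactly the extra information a negated item carries.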
What is Negative Association Rules
If X and Y are sets of items and {X} ⇒ {¬Y} is a negative association rule (NAR):
• It means that itemsets X and Y are negatively correlated.
• In most of the cases where X is present, Y is absent.
What is Negative AR contd.
A negative rule $A \Rightarrow \neg B$ also has a measure of its strength, conf, defined as the ratio $\mathrm{conf}(A \Rightarrow \neg B) = \mathrm{supp}(A \cup \neg B) / \mathrm{supp}(A)$.
Support-confidence framework for negative rules
• A and B are disjoint itemsets, that is, $A \cap B = \emptyset$;
• $\mathrm{supp}(A) \ge minsup$, $\mathrm{supp}(B) \ge minsup$;
• $\mathrm{supp}(A \Rightarrow \neg B) = \mathrm{supp}(A \cup \neg B)$;
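The framework conditions above can be checked by scanning the transactions directly. This sketch assumes that $\mathrm{supp}(A \cup \neg B)$ counts transactions that contain A but do not contain the whole itemset B, and it assumes the framework continues with the usual requirements that the rule's support reach minsup and its confidence reach minconf (the slide is cut off). Function names and data are hypothetical.

```python
from typing import Iterable, List, Set


def supp(transactions: List[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    s = set(itemset)
    return sum(1 for t in transactions if s <= t) / len(transactions)


def supp_a_not_b(transactions: List[Set[str]],
                 A: Iterable[str], B: Iterable[str]) -> float:
    """supp(A ∪ ¬B): transactions that contain A but not the whole itemset B."""
    a, b = set(A), set(B)
    return sum(1 for t in transactions if a <= t and not (b <= t)) / len(transactions)


def is_negative_rule(transactions: List[Set[str]], A: Iterable[str],
                     B: Iterable[str], minsup: float, minconf: float) -> bool:
    a, b = set(A), set(B)
    if a & b:                                    # A and B must be disjoint
        return False
    supp_a = supp(transactions, a)
    if supp_a == 0 or supp_a < minsup or supp(transactions, b) < minsup:
        return False                             # both itemsets must be frequent
    rule_supp = supp_a_not_b(transactions, a, b)     # supp(A ⇒ ¬B)
    rule_conf = rule_supp / supp_a                   # conf(A ⇒ ¬B)
    # Assumed continuation of the truncated framework: thresholds on the rule itself.
    return rule_supp >= minsup and rule_conf >= minconf


# Hypothetical baskets, for illustration only.
baskets = [
    {"milk", "bread"},
    {"milk", "cold-drink"},
    {"milk", "bread"},
    {"bread", "cold-drink"},
    {"milk"},
]
print(is_negative_rule(baskets, {"milk"}, {"cold-drink"}, 0.3, 0.7))  # True
```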
Negative Rules Forms
(Note that although a rule of the form ¬X → ¬Y contains
Assumption 1: The minimum support is 30% and minimum confidence is 70%.
Assumption 2: The numeric attribute AGE ranges from 18 to 70 and is quantized into two groups - less than thirty and over thirty.
The rule that satisfies both the minimum support and minimum confidence criteria is “{age < 30} → {coupe}”, whose confidence is 75%. A negative association rule also exists: “{age > 30} → {not purchasing coupe}”, which has a confidence of 83.3%. For the purpose of identifying purchase patterns, the latter clearly has better predictive ability.
The preceding example illustrates that
Confidence of Negative AR
• To avoid counting them directly, we can
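One way to avoid a separate scan, assumed here because the slide is cut off, uses the fact that every transaction containing A either contains B or does not: $\mathrm{supp}(A \cup \neg B) = \mathrm{supp}(A) - \mathrm{supp}(A \cup B)$, hence $\mathrm{conf}(A \Rightarrow \neg B) = 1 - \mathrm{conf}(A \Rightarrow B)$. The sketch below computes both from positive supports only; exact fractions avoid floating-point noise. Function names and values are hypothetical.

```python
from fractions import Fraction


def neg_support(supp_a, supp_ab):
    """supp(A ∪ ¬B) = supp(A) - supp(A ∪ B)."""
    return supp_a - supp_ab


def neg_confidence(supp_a, supp_ab):
    """conf(A ⇒ ¬B) = supp(A ∪ ¬B) / supp(A) = 1 - conf(A ⇒ B)."""
    return neg_support(supp_a, supp_ab) / supp_a


# Positive supports (hypothetical values): supp(A) = 4/5, supp(A ∪ B) = 1/5.
s_a, s_ab = Fraction(4, 5), Fraction(1, 5)
print(neg_support(s_a, s_ab))     # 3/5
print(neg_confidence(s_a, s_ab))  # 3/4
```

Since positive supports are already stored after positive mining, negative supports and confidences come at no extra database cost.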
Locality of similarity
We cannot simply look for positive rules with small support and confidence values, because that would result in many uninteresting rules.
To eliminate unwanted rules and focus on potentially interesting ones, we predict possible interesting negative ARs by incorporating domain knowledge of the data sets.
We use a taxonomy T for this, which consists of vertexes and directed edges.
Every vertex is a class; the vertex with in-degree 0 is the most general class, and which
LOS contd.
Taxonomy T consists of vertexes and directed edges. Each vertex represents a class.
The semantics of the vertical relationship is that the lower-level vertex values are instances of the values of their immediate predecessor vertexes, i.e., the is-a relationship. The vertical relationship is used to discover generalized association rules.
The semantics of the horizontal relationship is that the vertexes on the same level having the same immediate predecessor (siblings, to borrow from rooted-tree terminology)
LOS Contd.
Items belonging to the same LOS tend to participate in similar association rules. This is generally true because members of such groups tend to have similar activity patterns.
For example, in a retail database, items are the instances involved in transactions and customers are the participants. If customers have no particular preference, the purchase probability of each item will be evenly distributed over all brands.
An LOS can be extended to different levels following the same parent node. For instance, it is more reasonable to put ‘IBM Aptiva’, ‘Compaq Deskpro’, ‘Notebook’, and ‘Parts’ into one LOS when viewing the database at a more abstract level.
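A taxonomy and its LOS groups can be represented as a child-to-parent map. In this minimal sketch only the four item names come from the slide; the internal classes "Desktop" and "Computer" and the shape of the tree are hypothetical, chosen so that the extended LOS reproduces the grouping above. `los` returns the siblings of an item (including the item itself), and `extended_los` moves one level up, returning the parent's siblings together with the parent's own children.

```python
from typing import Dict, List, Optional

# Hypothetical taxonomy as child -> immediate predecessor.
# Only the four item names come from the slide; "Computer" and "Desktop" are invented.
PARENT: Dict[str, Optional[str]] = {
    "Computer": None,            # in-degree 0: the most general class
    "Desktop": "Computer",
    "Notebook": "Computer",
    "Parts": "Computer",
    "IBM Aptiva": "Desktop",
    "Compaq Deskpro": "Desktop",
}


def children(vertex: Optional[str],
             parent: Dict[str, Optional[str]] = PARENT) -> List[str]:
    """Vertexes whose immediate predecessor is `vertex`."""
    return [c for c, p in parent.items() if p == vertex]


def los(item: str, parent: Dict[str, Optional[str]] = PARENT) -> List[str]:
    """Siblings of `item` (same immediate predecessor), including the item itself."""
    p = parent[item]
    return sorted(children(p, parent)) if p is not None else [item]


def extended_los(item: str, parent: Dict[str, Optional[str]] = PARENT) -> List[str]:
    """LOS one level up: the parent's siblings plus the parent's own children.
    Assumes `item` sits at least two levels below the root."""
    p = parent[item]
    g = parent[p]
    return sorted([c for c in children(g, parent) if c != p] + children(p, parent))


print(los("IBM Aptiva"))           # ['Compaq Deskpro', 'IBM Aptiva']
print(extended_los("IBM Aptiva"))  # ['Compaq Deskpro', 'IBM Aptiva', 'Notebook', 'Parts']
```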
LOS Contd.
Discovering Negative Rules
To qualify as a negative rule, a candidate must satisfy two conditions:
first, there must be a large deviation between the estimated and actual confidence, as captured by the similarity measure (SM);
second, the support and confidence are
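The slide does not give the exact form of the similarity measure, so the sketch below uses a stand-in. It assumes the estimated confidence of A ⇒ j, for an item j in the same LOS as the consequent i of a positive rule A ⇒ i, scales with the items' individual supports, i.e. $\mathrm{conf}(A \Rightarrow i) \cdot \mathrm{supp}(j) / \mathrm{supp}(i)$, following the even-distribution intuition of the LOS slides, and it measures deviation as the relative shortfall of the actual confidence below that estimate. The names `estimated_conf`, `deviation`, `is_negative_candidate` and the threshold `min_dev` are hypothetical.

```python
def estimated_conf(conf_a_i: float, supp_i: float, supp_j: float) -> float:
    """Estimate conf(A ⇒ j) from the positive rule A ⇒ i, assuming purchases are
    spread over LOS members in proportion to their individual supports."""
    return conf_a_i * supp_j / supp_i


def deviation(estimated: float, actual: float) -> float:
    """Relative shortfall of the actual confidence below the estimate
    (a hypothetical stand-in for the slides' similarity measure)."""
    return (estimated - actual) / estimated if estimated > 0 else 0.0


def is_negative_candidate(conf_a_i: float, supp_i: float, supp_j: float,
                          actual_conf_a_j: float, min_dev: float) -> bool:
    est = estimated_conf(conf_a_i, supp_i, supp_j)
    return deviation(est, actual_conf_a_j) >= min_dev


# Hypothetical numbers: A ⇒ i has confidence 0.6, supp(i) = 0.3, supp(j) = 0.15.
print(is_negative_candidate(0.6, 0.3, 0.15, 0.05, min_dev=0.5))  # True
```

Here the estimate is about 0.3, so an actual confidence of 0.05 is a large deviation and A ⇒ ¬j becomes a candidate; the second condition on support and confidence would still have to be checked.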
Pruning
In constructing candidate negative rules, an equivalent or similar pair may be generated.
Another redundancy exists when the items come from a single LOS, so that the rules are all sibling rules.
The pruning keeps either all the positive rules or all the negative rules that have high confidence.
An example is the pruning between rule
Algorithm
// Finding all positive rules
1. FreqSet1 = {frequent 1-itemsets}
2. Find all positive rules
// Generate negative rules
3. Delete all items from taxonomy t that are not frequent
4. For all rules r (positive rules)
5.     TmpRuleSet = genNegCan(r)
6.     For all rules tr (TmpRuleSet)
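The pseudocode stops at step 6. A minimal Python sketch of how the loop could continue is shown below, under stated assumptions: supports are precomputed in a dictionary keyed by frozensets (as the conclusion assumes), each positive rule has a single-item consequent, `gen_neg_candidates` is a hypothetical stand-in for genNegCan that pairs the antecedent with each frequent LOS sibling of the consequent, and the deviation test uses a uniform-within-LOS estimate rather than necessarily the authors' SM. The nested loop visits each of the P positive rules and up to L siblings, which matches the O(P x L) bound given in the conclusion.

```python
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], str]          # (antecedent A, single consequent item)


def gen_neg_candidates(rule: Rule, los_of: Dict[str, List[str]]) -> List[Rule]:
    """Hypothetical genNegCan: pair A with each LOS sibling of the consequent."""
    a, c = rule
    return [(a, s) for s in los_of.get(c, []) if s != c and s not in a]


def find_negative_rules(positive_rules: List[Rule],
                        los_of: Dict[str, List[str]],
                        supp: Dict[FrozenSet[str], float],
                        minsup: float,
                        min_dev: float) -> List[Rule]:
    """Return pairs (A, j), read as the negative rule A ⇒ ¬j."""
    # Step 3: delete items that are not frequent from the taxonomy's LOS lists.
    frequent = {x for sibs in los_of.values() for x in sibs
                if supp.get(frozenset([x]), 0.0) >= minsup}
    los_f = {x: [s for s in sibs if s in frequent] for x, sibs in los_of.items()}
    result: List[Rule] = []
    for a, c in positive_rules:                               # P positive rules
        conf_ac = supp[a | {c}] / supp[a]
        for _, j in gen_neg_candidates((a, c), los_f):        # up to L siblings
            est = conf_ac * supp[frozenset([j])] / supp[frozenset([c])]
            actual = supp.get(a | {j}, 0.0) / supp[a]
            if est > 0 and (est - actual) / est >= min_dev:
                result.append((a, j))
    return result


# Hypothetical precomputed supports; "coke" and "pepsi" form one LOS.
supports = {
    frozenset(["milk"]): 0.5,
    frozenset(["coke"]): 0.3,
    frozenset(["pepsi"]): 0.3,
    frozenset(["milk", "coke"]): 0.2,
    frozenset(["milk", "pepsi"]): 0.02,
}
los_table = {"coke": ["coke", "pepsi"], "pepsi": ["coke", "pepsi"]}
rules = [(frozenset(["milk"]), "coke")]
print(find_negative_rules(rules, los_table, supports, minsup=0.1, min_dev=0.5))
```

With these made-up numbers, milk ⇒ pepsi would be expected at about the same confidence as milk ⇒ coke, but its actual confidence is far lower, so milk ⇒ ¬pepsi is reported.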
Example
Results
Results contd.
Conclusion
Given the number of positive rules P and the average size of the LOS L, the complexity of the algorithm is O(P x L).
The complexity does not depend on the number of transactions, since it is assumed that the supports of itemsets have been counted and stored for use in this as well as other mining applications.
The complexity of discovering positive rules depends not only on the number of transactions, but also on the sizes of the attribute domains and the number of attributes.
The overall complexity of finding negative ARs will be
Applications
Large DBs output results
- Helps to limit the search space in huge databases by combining the known positive associations with negative rules based on domain knowledge.
Example: the positive association of
Limitations
First, we cannot simply pick threshold values for support and confidence that are guaranteed to be effective in sifting both positive and negative rules.
Second, if the thresholds are not chosen appropriately, an impractical volume of negative rules may be generated, which might hurt performance.
- For example, if there are 50,000 items in a store, then the number of possible combinations of items is $2^{50000}$, and the majority of them will never appear together even once in the entire database. If the absence of a certain item combination is taken to mean negative association, then we can generate millions of negative association rules. However, most of these rules are likely to be extremely uninteresting.
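The arithmetic behind this limitation is easy to check. This short Python snippet counts the 2-item combinations of 50,000 items and the number of decimal digits in $2^{50000}$; the digit count uses a logarithm rather than converting the huge integer to a string, since recent Python versions limit integer-to-string conversion by default.

```python
import math

n_items = 50_000
pairs = math.comb(n_items, 2)                       # 2-item combinations alone
digits = math.floor(n_items * math.log10(2)) + 1    # decimal digits of 2**50000
print(pairs)   # 1249975000
print(digits)  # 15052
```

Even the item pairs alone number about 1.25 billion, so "millions" of candidate negative rules is, if anything, an understatement.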
Cyclic Association Rules
Some itemsets occur periodically, i.e., after a certain period of time.
The rule has the minimum confidence and support at
Overview
Step 1: The dataset is divided into time segments.
Step 2: Existing methods for discovering frequent itemsets are applied in each segment.
Step 3: Pattern-matching algorithms are then applied to detect cycles in the association rules.
Step 4: Techniques called cycle pruning and cycle skipping allow us to significantly reduce the amount of
Problem Definition
We denote the $i$-th time unit, $i \ge 0$, by $t_i$. That is, $t_i$ corresponds to the time interval $[i \cdot t, (i+1) \cdot t)$, where $t$ is the unit of time.
We denote the set of transactions executed in $t_i$ by $D[i]$.
The support of an itemset X in $D[j]$ is the fraction of transactions in $D[j]$ that contain the itemset.
The confidence of a rule X → Y in $D[j]$ is the fraction of transactions in $D[j]$ containing X that also contain Y.
An association rule X → Y holds in time unit $t_j$ if the support of $X \cup Y$ in $D[j]$ exceeds $sup_{min}$ and the confidence of X → Y in $D[j]$ exceeds $conf_{min}$.
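These definitions translate directly into code. This sketch partitions timestamped transactions into the sets D[i] and records, for each time unit, whether X → Y holds, producing the binary sequence used for cycle detection. It assumes timestamps are non-negative numbers in the same unit as `unit`, uses strict inequalities to match "exceeds", and the function names and data are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def partition(transactions: Iterable[Tuple[float, Set[str]]],
              unit: float, n_units: int) -> List[List[Set[str]]]:
    """D[i] holds the transactions whose timestamp lies in [i*unit, (i+1)*unit)."""
    D: List[List[Set[str]]] = [[] for _ in range(n_units)]
    for ts, items in transactions:
        i = int(ts // unit)
        if 0 <= i < n_units:
            D[i].append(items)
    return D


def holds(Dj: List[Set[str]], X: Set[str], Y: Set[str],
          sup_min: float, conf_min: float) -> bool:
    """True if X -> Y holds in this time unit (strict 'exceeds' thresholds)."""
    if not Dj:
        return False
    n_x = sum(1 for t in Dj if X <= t)
    n_xy = sum(1 for t in Dj if (X | Y) <= t)
    return n_x > 0 and n_xy / len(Dj) > sup_min and n_xy / n_x > conf_min


def rule_sequence(D: List[List[Set[str]]], X: Set[str], Y: Set[str],
                  sup_min: float, conf_min: float) -> str:
    """Binary string: bit i is '1' when the rule holds in time unit t_i."""
    return "".join("1" if holds(Dj, X, Y, sup_min, conf_min) else "0" for Dj in D)


# Hypothetical hourly data (timestamps in hours, unit = 1 hour).
tx = [(0.2, {"tea", "biscuit"}), (0.7, {"tea"}),
      (1.1, {"tea", "biscuit"}), (1.5, {"tea", "biscuit"}),
      (2.4, {"coffee"})]
D = partition(tx, unit=1.0, n_units=3)
print(rule_sequence(D, {"tea"}, {"biscuit"}, sup_min=0.3, conf_min=0.6))  # 010
```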
Problem Definition contd.
A cycle c is a tuple $(l, o)$ consisting of a length $l$ and an offset $o$ (the first time unit in which the cycle occurs), $0 \le o < l$. We say that an association rule has a cycle $c = (l, o)$ if the association rule holds in every $l$-th time unit starting with time unit $t_o$.
For example, if the unit of time is an hour and “Tea ⇒ Biscuit” holds during the interval 7AM-8AM every day (i.e., every 24 hours), then “Tea ⇒ Biscuit” has a cycle (24, 7).
A cycle $(l_i, o_i)$ is a multiple of another cycle $(l_j, o_j)$
Problem Definition contd.
A time unit $t_i$ is said to be “part of cycle c” or to “participate in cycle c” if $o = i \bmod l$ holds.
Example: if the binary sequence 001100010101 represents the association rule X → Y, then X → Y holds in $t_2$, $t_3$, $t_7$, $t_9$ and $t_{11}$.
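Given such a binary sequence, a straightforward cycle detector checks, for every length $l \le l_{max}$ and offset $0 \le o < l$, whether every bit at positions $o, o + l, o + 2l, \ldots$ is 1. This is a naive sketch, not necessarily the pattern-matching algorithm the slides refer to; for each length the offsets together visit about n positions, so one rule costs $O(n \cdot l_{max})$. The function name `cycles` and the bound $l_{max} = 4$ are illustrative.

```python
from typing import List, Tuple


def cycles(seq: str, l_max: int) -> List[Tuple[int, int]]:
    """All cycles (l, o), 1 <= l <= l_max, 0 <= o < l, such that the rule holds
    (bit '1') in every l-th time unit starting with t_o."""
    n = len(seq)
    found = []
    for l in range(1, l_max + 1):
        for o in range(l):
            # o < n rules out cycles that would be satisfied vacuously.
            if o < n and all(seq[i] == "1" for i in range(o, n, l)):
                found.append((l, o))
    return found


print(cycles("001100010101", l_max=4))  # [(4, 3)]
```

For the example sequence, units $t_3$, $t_7$ and $t_{11}$ all hold, so the rule has the cycle (4, 3); every other length and offset up to 4 hits at least one 0.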
Modifying existing algorithm
The existing algorithms for discovering
association rules cannot be applied directly.
Extend the set of items with time attributes,
The Sequential Algorithm
Step 1: Finding association rules (in each time segment)
– Maximal frequent itemsets are generated.
– Association rules are generated from the large itemsets.
Step 2: Cycle detection
– By a pattern-matching algorithm.
The complexity of the cycle detection phase has an upper bound of $O(r \cdot n \cdot l_{max})$,
where r is the number of rules detected, n is the number of segments and $l_{max}$ is the maximum cycle length considered.
The Sequential Algorithm contd.
Cycle pruning, cycle skipping and cycle elimination
The major portion of the running time of the sequential algorithm is spent calculating the support for itemsets.
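Because support counting dominates, skipping unnecessary counts pays off. The sketch below is one reading of the three ideas, assumed rather than taken from the slides: cycle elimination drops every candidate cycle $(l, i \bmod l)$ once itemset X fails minimum support in $t_i$; cycle skipping avoids counting X in $t_i$ when $t_i$ is not part of any remaining candidate cycle; and cycle pruning is modeled by an optional initial candidate set, for instance the cycles shared by X's subsets. `supported_in` is a hypothetical callback that would count X's support in D[i].

```python
from typing import Callable, List, Optional, Set, Tuple

Cycle = Tuple[int, int]  # (length l, offset o)


def interleaved_cycles(n_units: int, l_max: int,
                       supported_in: Callable[[int], bool],
                       initial: Optional[Set[Cycle]] = None
                       ) -> Tuple[Set[Cycle], List[int]]:
    """Return the surviving cycles of an itemset X and the time units whose
    support was actually counted."""
    # Cycle pruning (assumed form): start from a reduced candidate set if given;
    # otherwise start from every (l, o) with 1 <= l <= l_max and 0 <= o < l.
    candidates = set(initial) if initial is not None else {
        (l, o) for l in range(1, l_max + 1) for o in range(l)}
    counted: List[int] = []
    for i in range(n_units):
        # Cycle skipping: t_i is not part of any remaining candidate cycle.
        if not any(i % l == o for l, o in candidates):
            continue
        counted.append(i)
        if not supported_in(i):
            # Cycle elimination: X cannot have any cycle (l, i mod l).
            candidates -= {(l, i % l) for l in range(1, l_max + 1)}
    return candidates, counted


bits = "001100010101"   # hypothetical per-unit support pattern for X
cands, counted = interleaved_cycles(len(bits), 4, lambda i: bits[i] == "1")
print(cands)    # {(4, 3)}
print(counted)  # [0, 1, 2, 3, 5, 6, 7, 11]
```

On this pattern the support of X is counted in only 8 of the 12 time units, yet the surviving cycle matches what a full scan would find.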