Association Rules Practice


Dataset Faceplate.csv

• Run the following program

• Enter these instructions:

# install.packages("arules")
library(arules)

fp.df <- read.csv("Faceplate.csv")

# remove first column and convert to matrix
fp.mat <- as.matrix(fp.df[, -1])

# convert the binary incidence matrix into a transactions database
fp.trans <- as(fp.mat, "transactions")
inspect(fp.trans)

## get rules
# when running apriori(), include the minimum support, minimum confidence, and target
# as arguments.
rules <- apriori(fp.trans, parameter = list(supp = 0.2, conf = 0.5, target = "rules"))

# inspect the first six rules, sorted by their lift
inspect(head(sort(rules, by = "lift"), n = 6))

Possible rule: if red, then white, meaning that if a red faceplate is purchased, a white one is purchased too.

inspect(fp.trans)

Faceplate.csv:

Transaction,Red,White,Blue,Orange,Green,Yellow
1,1,1,0,0,1,0
2,0,1,0,1,0,0
3,0,1,1,0,0,0
4,1,1,0,1,0,0
5,1,0,1,0,0,0
6,0,1,1,0,0,0
7,1,0,1,0,0,0
8,1,1,1,0,1,0
9,1,1,1,0,0,0
10,0,0,0,0,0,1


Interpreting the Results

• For example, we can read rule #4 as follows: if orange is purchased, then with confidence 100% white will also be purchased. This rule has a lift ratio of 1.43.

> inspect(head(sort(rules, by = "lift"), n = 6))

    lhs              rhs       support confidence lift     count
[1] {Red,White}   => {Green}   0.2     0.5        2.500000 2
[2] {Green}       => {Red}     0.2     1.0        1.666667 2
[3] {White,Green} => {Red}     0.2     1.0        1.666667 2
[4] {Orange}      => {White}   0.2     1.0        1.428571 2
[5] {Green}       => {White}   0.2     1.0        1.428571 2
[6] {Red,Green}   => {White}   0.2     1.0        1.428571 2

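{Red,White} => {Green}

Antecedent: {Red,White}. Consequent: {Green}.

Confidence = (transactions with Red, White and Green) / (transactions with Red and White) = 2/4 = 0.5

Lift ratio = confidence / support of {Green} = 0.5 / 0.2 = 2.5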
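The arithmetic above can be checked without arules. The following is a minimal sketch in base R: the matrix is copied from Faceplate.csv, and rule_stats() is a hypothetical helper written for this illustration, not a function from any package.

```r
# Faceplate purchases copied from Faceplate.csv (1 = colour was purchased)
fp <- matrix(
  c(1, 1, 0, 0, 1, 0,
    0, 1, 0, 1, 0, 0,
    0, 1, 1, 0, 0, 0,
    1, 1, 0, 1, 0, 0,
    1, 0, 1, 0, 0, 0,
    0, 1, 1, 0, 0, 0,
    1, 0, 1, 0, 0, 0,
    1, 1, 1, 0, 1, 0,
    1, 1, 1, 0, 0, 0,
    0, 0, 0, 0, 0, 1),
  ncol = 6, byrow = TRUE,
  dimnames = list(NULL, c("Red", "White", "Blue", "Orange", "Green", "Yellow"))
)

# rule_stats() is a hypothetical helper: support, confidence and lift of lhs => rhs
rule_stats <- function(m, lhs, rhs) {
  has_lhs <- rowSums(m[, lhs, drop = FALSE]) == length(lhs)  # all antecedent items present
  has_rhs <- rowSums(m[, rhs, drop = FALSE]) == length(rhs)  # all consequent items present
  support <- mean(has_lhs & has_rhs)
  confidence <- sum(has_lhs & has_rhs) / sum(has_lhs)
  lift <- confidence / mean(has_rhs)                          # confidence / support of rhs
  c(support = support, confidence = confidence, lift = lift)
}

stats <- rule_stats(fp, c("Red", "White"), "Green")
print(stats)
```

The same helper can be called with other item names, for example rule_stats(fp, "Red", "White"), to examine the "if red, then white" rule.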


Rule selection

• The goal is to find only the rules that indicate a strong dependence between the antecedent and consequent itemsets.

• To measure the strength of association implied by a rule, we use the measures of confidence and lift ratio.
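As an illustration of selecting rules by these two measures, here is a minimal sketch in base R. The data frame repeats the six Faceplate rules printed earlier; the cut-offs min_conf and min_lift are hypothetical values chosen only for this example.

```r
# The six Faceplate rules, typed in from the inspect() output
rules_df <- data.frame(
  lhs = c("{Red,White}", "{Green}", "{White,Green}", "{Orange}", "{Green}", "{Red,Green}"),
  rhs = c("{Green}", "{Red}", "{Red}", "{White}", "{White}", "{White}"),
  support = 0.2,
  confidence = c(0.5, 1, 1, 1, 1, 1),
  lift = c(2.5, 1.666667, 1.666667, 1.428571, 1.428571, 1.428571)
)

min_conf <- 0.8  # hypothetical threshold
min_lift <- 1.5  # hypothetical threshold

# keep only rules that are strong on both measures
strong <- rules_df[rules_df$confidence >= min_conf & rules_df$lift >= min_lift, ]
print(strong)
```

With these particular cut-offs, the rule with the highest lift ({Red,White} => {Green}) is dropped because its confidence is only 0.5, which shows why the two measures are read together.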


Implementation of association rules

RECOMMENDATIONS UNDER “FREQUENTLY BOUGHT TOGETHER” ARE BASED ON ASSOCIATION RULES

Recommendations:

• Users who buy this item often buy that item as well.

• Users who watched James Bond movies also watched Jason Bourne movies.


Dataset Movies

• A survey of 6 people rating 4 movies

• Run the following program

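# install.packages("arules")
library(arules)

fp.df <- read.csv("movies.csv")

# remove first column and convert to matrix
fp.mat <- as.matrix(fp.df[, -1])

# convert the binary incidence matrix into a transactions database
fp.trans <- as(fp.mat, "transactions")
inspect(fp.trans)

## get rules
# when running apriori(), include the minimum support, minimum confidence, and target
# as arguments. The thresholds below are an assumption, reused from the Faceplate example.
rules <- apriori(fp.trans, parameter = list(supp = 0.2, conf = 0.5, target = "rules"))

# inspect the first six rules, sorted by their lift
inspect(head(sort(rules, by = "lift"), n = 6))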

> inspect(head(sort(rules, by = "lift"), n = 6))

    lhs                      rhs              support confidence lift count
[1] {Interstellar,Hobbit} => {Frozen}         0.5     0.75       1.5  3
[2] {ImitationGame}       => {Interstellar}   0.5     1.00       1.2  3
[3] {Interstellar}        => {ImitationGame}  0.5     0.60       1.2  3
[4] {Frozen}              => {Hobbit}         0.5     1.00       1.2  3
[5] {Hobbit}              => {Frozen}         0.5     0.60       1.2  3
[6] {Frozen}              => {Interstellar}   0.5     1.00       1.2  3

movies.csv:

Survey,ImitationGame,Frozen,Interstellar,Hobbit
1,1,0,1,0
2,0,2,2,2
3,0,0,0,1
4,1,2,3,2
5,1,0,1,1
6,0,2,2,3
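The survey values range from 0 to 3, while association rules only need to know whether an item is present. The printed rules are consistent with treating any non-zero value as "present". The sketch below makes that binarization explicit in base R (an assumption about the preprocessing, not a statement of what the course code does internally) and recomputes the first rule by hand.

```r
# Survey answers copied from movies.csv (0 = no rating given)
movies <- matrix(
  c(1, 0, 1, 0,
    0, 2, 2, 2,
    0, 0, 0, 1,
    1, 2, 3, 2,
    1, 0, 1, 1,
    0, 2, 2, 3),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, c("ImitationGame", "Frozen", "Interstellar", "Hobbit"))
)

# Assumed preprocessing: any non-zero value counts as "present"
watched <- (movies > 0) * 1

# Support of each single movie
item_support <- colMeans(watched)

# Rule {Interstellar,Hobbit} => {Frozen}
both <- watched[, "Interstellar"] == 1 & watched[, "Hobbit"] == 1
confidence <- mean(watched[both, "Frozen"])
lift <- confidence / item_support[["Frozen"]]
print(c(confidence = confidence, lift = lift))
```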


Dataset Charles Book Club

ChildBks,YouthBks,CookBks,DoItYBks,RefBks,ArtBks,GeogBks,ItalCook,ItalAtlas,ItalArt,Florence
0,1,0,1,0,0,1,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0
1,1,1,0,1,0,1,0,0,0,0
0,0,1,0,0,0,1,0,0,0,0
1,0,0,0,0,1,0,0,0,0,1
0,1,0,0,0,0,0,0,0,0,0
0,1,0,0,1,0,0,0,0,0,0
1,0,0,1,0,0,0,0,0,0,0
1,1,1,0,0,0,1,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0

library(arules)

all.books.df <- read.csv("CharlesBookClub.csv")

# create a binary incidence matrix
count.books.df <- all.books.df[, 8:18]
incid.books.df <- ifelse(count.books.df > 0, 1, 0)
# note: [, -1] drops the first of the selected book columns
incid.books.mat <- as.matrix(incid.books.df[, -1])

# convert the binary incidence matrix into a transactions database
books.trans <- as(incid.books.mat, "transactions")
inspect(books.trans)

# plot data
itemFrequencyPlot(books.trans)

# run apriori function
rules <- apriori(books.trans,
    parameter = list(supp = 200/4000, conf = 0.5, target = "rules"))

# inspect rules
inspect(sort(rules, by = "lift"))

https://raw.githubusercontent.com/prnvg/Charles-Book-Club/master/CBC.csv

> inspect(sort(rules, by = "lift"))

     lhs                    rhs         support confidence lift     count
[1]  {DoItYBks,GeoBks}   => {YouthBks}  0.05450 0.5396040  2.264864 218
[2]  {CookBks,GeoBks}    => {YouthBks}  0.08025 0.5136000  2.155719 321
[3]  {CookBks,RefBks}    => {DoItYBks}  0.07450 0.5330948  2.092619 298
[4]  {YouthBks,GeoBks}   => {DoItYBks}  0.05450 0.5215311  2.047227 218
[5]  {YouthBks,CookBks}  => {DoItYBks}  0.08375 0.5201863  2.041948 335
[6]  {YouthBks,RefBks}   => {CookBks}   0.06825 0.8400000  2.021661 273
[7]  {YouthBks,DoItYBks} => {GeoBks}    0.05450 0.5278450  1.978801 218
[8]  {YouthBks,DoItYBks} => {CookBks}   0.08375 0.8111380  1.952197 335
[9]  {DoItYBks,RefBks}   => {CookBks}   0.07450 0.8054054  1.938400 298
[10] {RefBks,GeoBks}     => {CookBks}   0.06450 0.7889908  1.898895 258
[11] {YouthBks,GeoBks}   => {CookBks}   0.08025 0.7679426  1.848237 321
[12] {DoItYBks,GeoBks}   => {CookBks}   0.07750 0.7673267  1.846755 310
[13] {YouthBks,ArtBks}   => {CookBks}   0.05150 0.7410072  1.783411 206
[14] {DoItYBks,ArtBks}   => {CookBks}   0.05300 0.7114094  1.712177 212
[15] {RefBks}            => {CookBks}   0.13975 0.6825397  1.642695 559
[16] {ArtBks,GeoBks}     => {CookBks}   0.05525 0.6800000  1.636582 221
[17] {YouthBks}          => {CookBks}   0.16100 0.6757608  1.626380 644
[18] {DoItYBks}          => {CookBks}   0.16875 0.6624141  1.594258 675
[19] {ItalCook}          => {CookBks}   0.06875 0.6395349  1.539193 275
[20] {GeoBks}            => {CookBks}   0.15625 0.5857545  1.409758 625
[21] {ArtBks}            => {CookBks}   0.11300 0.5067265  1.219558 452
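The support and count columns carry the same information on different scales. A minimal sketch of the conversion, assuming the data set has 4000 transactions (inferred only from the denominator in supp = 200/4000), using the first three rows of the output:

```r
n_transactions <- 4000                  # assumption, taken from supp = 200/4000
min_support <- 200 / n_transactions     # minimum support passed to apriori()

# Support values of rules [1] to [3] from the output
support <- c(0.05450, 0.08025, 0.07450)

# Support is a fraction of transactions, so count = support * number of transactions
count <- round(support * n_transactions)
print(count)

# Every reported rule should meet the minimum support
print(all(support >= min_support))
```

Read this way, supp = 200/4000 asks for rules whose items appear together in at least 200 transactions.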

> inspect(head(sort(rules, by = "lift"), n = 6))

    lhs           rhs         support   confidence lift count
[1] {CookBks}  => {GeogBks}   0.2727273 1.00       2.75 3
[2] {GeogBks}  => {CookBks}   0.2727273 0.75       2.75 3
[3] {GeogBks}  => {YouthBks}  0.2727273 0.75       1.65 3
[4] {YouthBks} => {GeogBks}   0.2727273 0.60       1.65 3


User-Based Collaborative Filtering

• Algorithm:

• Find users who are most similar to the user of interest (neighbors). This is done by comparing the preference of our user to the preferences of other users.

• Considering only the items that the user has not yet purchased, recommend the ones that are most preferred by the user’s neighbors.

• This is the approach behind Amazon’s “Customers Who Bought This Item Also Bought…”
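The two algorithm steps above can be sketched in base R. Everything in this example is hypothetical: the purchase matrix, the choice of cosine similarity as the similarity measure, the neighborhood size k, and the helper names cosine() and recommend(). It is one simple way to realize the idea, not a reference implementation.

```r
# Hypothetical purchase matrix: rows are users, columns are items (1 = purchased)
purchases <- matrix(
  c(1, 1, 0, 0,
    1, 1, 1, 0,
    0, 0, 1, 1,
    1, 0, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("u1", "u2", "u3", "u4"), c("A", "B", "C", "D"))
)

# Cosine similarity between two users' purchase vectors (an assumed choice of measure)
cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

recommend <- function(m, user, k = 2) {
  # Step 1: find the k users most similar to the user of interest
  others <- setdiff(rownames(m), user)
  sims <- sapply(others, function(o) cosine(m[user, ], m[o, ]))
  neighbors <- names(sort(sims, decreasing = TRUE))[seq_len(k)]

  # Step 2: score items by how many neighbors bought them,
  # keeping only items the user has not purchased yet
  scores <- colSums(m[neighbors, , drop = FALSE])
  scores <- scores[m[user, ] == 0]
  names(scores)[scores == max(scores) & scores > 0]
}

recs <- recommend(purchases, "u1")
print(recs)
```

For user u1 the nearest neighbors are u2 and u4, both of whom bought item C, so C is the recommendation.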


Dataset Bread and Milk

bread-milk.csv:

transaksi,barang1,barang2,barang3,barang4
1,bread,coke,milk,
2,beer,bread,,
3,beer,coke,diaper,milk
4,beer,bread,diaper,milk
5,coke,diaper,milk,

# install.packages("stringr")
library(stringr)
library(arules)

data <- read.csv("bread-milk.csv")

# stack the item columns into one long table (transaksi, barang)
datax <- data[, c(1, 2)]
colnames(datax) <- c("transaksi", "barang")
for (i in c(3:dim(data)[2])) {
  data1 <- data[, c(1, i)]
  colnames(data1) <- c("transaksi", "barang")
  datax <- rbind(datax, data1)
}

## remove empty items
dataX <- datax[1, ]
for (i in c(2:dim(datax)[1])) {
  data1 <- datax[i, ]
  mystr <- data1$barang
  lenstr <- str_length(mystr)
  if (lenstr > 0) {
    dataX <- rbind(dataX, data1)
    print(sprintf("[%s - %d]", data1$barang, lenstr))
  }
}

X <- with(dataX, table(transaksi, barang))

## Massage it into the format you're wanting
hasilz <- cbind(name = rownames(X), apply(X, 2, as.numeric))

# remove first column and convert to matrix
fp.matx <- as.matrix(hasilz[, -1])
nmkol <- colnames(fp.matx)

#--- convert the character entries "0"/"1" back to numeric 0/1
aa <- as.numeric(factor(fp.matx))
bb <- matrix(aa, nrow = dim(fp.matx)[1], ncol = dim(fp.matx)[2])
cc <- bb - 1
colnames(cc) <- nmkol
fp.mat <- cc
#---

# convert the binary incidence matrix into a transactions database
fp.trans <- as(fp.mat, "transactions")
inspect(fp.trans)

## get rules
# when running apriori(), include the minimum support, minimum confidence, and target
# as arguments. The thresholds below are an assumption, reused from the Faceplate example.
rules <- apriori(fp.trans, parameter = list(supp = 0.2, conf = 0.5, target = "rules"))

# inspect the first 17 rules, sorted by their lift
inspect(head(sort(rules, by = "lift"), n = 17))

# frequent itemsets with support of at least 0.2
freq_is <- apriori(fp.trans, parameter = list(target = "frequent itemsets", support = 0.2))
inspect(head(sort(freq_is, by = "support"), n = 10))
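The reshaping part of this program (wide item columns to a 0/1 incidence matrix) can also be written more compactly. The following is a minimal alternative sketch in base R, not the course's method; the data frame is typed in from bread-milk.csv so the example runs without the file, and the names wide, long and tab are introduced only for this illustration.

```r
# Contents of bread-milk.csv, typed in so the example is self-contained
wide <- data.frame(
  transaksi = 1:5,
  barang1 = c("bread", "beer", "beer", "beer", "coke"),
  barang2 = c("coke", "bread", "coke", "bread", "diaper"),
  barang3 = c("milk", "", "diaper", "diaper", "milk"),
  barang4 = c("", "", "milk", "milk", ""),
  stringsAsFactors = FALSE
)

# Stack the item columns into one long table (transaksi, barang)
long <- data.frame(
  transaksi = rep(wide$transaksi, times = ncol(wide) - 1),
  barang = unlist(wide[, -1], use.names = FALSE)
)

# Remove empty items
long <- long[long$barang != "", ]

# Cross-tabulate and turn counts into a 0/1 incidence matrix
tab <- table(long$transaksi, long$barang)
fp.mat <- matrix(as.integer(tab > 0), nrow = nrow(tab), dimnames = dimnames(tab))
print(fp.mat)
```

The resulting fp.mat has one row per transaction and one column per item, the same shape that the program passes to as(fp.mat, "transactions").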

> inspect(head(sort(rules, by = "lift"), n = 30))

     lhs                     rhs       support confidence lift     count
[1]  {bread,diaper}       => {beer}    0.2     1.0000000  1.666667 1
[2]  {beer,coke}          => {diaper}  0.2     1.0000000  1.666667 1
[3]  {beer,milk}          => {diaper}  0.4     1.0000000  1.666667 2
[4]  {beer,bread,milk}    => {diaper}  0.2     1.0000000  1.666667 1
[5]  {bread,diaper,milk}  => {beer}    0.2     1.0000000  1.666667 1
[6]  {beer,coke,milk}     => {diaper}  0.2     1.0000000  1.666667 1
[7]  {coke}               => {milk}    0.6     1.0000000  1.250000 3
[8]  {milk}               => {coke}    0.6     0.7500000  1.250000 3
[9]  {diaper}             => {milk}    0.6     1.0000000  1.250000 3
[10] {milk}               => {diaper}  0.6     0.7500000  1.250000 3
[11] {bread,coke}         => {milk}    0.2     1.0000000  1.250000 1
[12] {bread,diaper}       => {milk}    0.2     1.0000000  1.250000 1
[13] {beer,coke}          => {milk}    0.2     1.0000000  1.250000 1
[14] {coke,diaper}        => {milk}    0.4     1.0000000  1.250000 2
[15] {beer,diaper}        => {milk}    0.4     1.0000000  1.250000 2
[16] {beer,bread,diaper}  => {milk}    0.2     1.0000000  1.250000 1
[17] {beer,coke,diaper}   => {milk}    0.2     1.0000000  1.250000 1
[18] {bread}              => {beer}    0.4     0.6666667  1.111111 2
[19] {beer}               => {bread}   0.4     0.6666667  1.111111 2
[20] {coke}               => {diaper}  0.4     0.6666667  1.111111 2
[21] {diaper}             => {coke}    0.4     0.6666667  1.111111 2
[22] {beer}               => {diaper}  0.4     0.6666667  1.111111 2
[23] {diaper}             => {beer}    0.4     0.6666667  1.111111 2
[24] {coke,milk}          => {diaper}  0.4     0.6666667  1.111111 2
[25] {diaper,milk}        => {coke}    0.4     0.6666667  1.111111 2
[26] {diaper,milk}        => {beer}    0.4     0.6666667  1.111111 2

> inspect(head(sort(freq_is, by = "support"), n = 10))

     items          support count
[1]  {milk}         0.8     4
[2]  {bread}        0.6     3
[3]  {coke}         0.6     3
[4]  {beer}         0.6     3
[5]  {diaper}       0.6     3
[6]  {coke,milk}    0.6     3
[7]  {diaper,milk}  0.6     3
[8]  {beer,bread}   0.4     2
[9]  {bread,milk}   0.4     2
[10] {coke,diaper}  0.4     2

Association Rules Practice - LMS-SPADA INDONESIA