Association Rules Practice
Dataset Faceplate.csv
• Run the following program
• Enter these instructions:
# install.packages("arules")
library(arules)
fp.df <- read.csv("Faceplate.csv")
# remove first column and convert to matrix
fp.mat <- as.matrix(fp.df[, -1])
# convert the binary incidence matrix into a transactions database
fp.trans <- as(fp.mat, "transactions")
inspect(fp.trans)
## get rules
# when running apriori(), include the minimum support, minimum confidence, and target
# as arguments.
rules <- apriori(fp.trans, parameter = list(supp = 0.2, conf = 0.5, target = "rules"))
# inspect the first six rules, sorted by their lift
inspect(head(sort(rules, by = "lift"), n = 6))
Possible rule: if red, then white,
meaning that if a red faceplate is purchased, a white one is purchased too.
inspect(fp.trans)
Transaction,Red,White,Blue,Orange,Green,Yellow
1,1,1,0,0,1,0
2,0,1,0,1,0,0
3,0,1,1,0,0,0
4,1,1,0,1,0,0
5,1,0,1,0,0,0
6,0,1,1,0,0,0
7,1,0,1,0,0,0
8,1,1,1,0,1,0
9,1,1,1,0,0,0
10,0,0,0,0,0,1
Interpreting the Results
• For example, we can read rule #4 as follows:
If orange is purchased, then with 100% confidence white will also be purchased. This rule has a lift ratio of 1.43.
> inspect(head(sort(rules, by = "lift"), n = 6))
    lhs              rhs      support confidence lift     count
[1] {Red,White}   => {Green}  0.2     0.5        2.500000 2
[2] {Green}       => {Red}    0.2     1.0        1.666667 2
[3] {White,Green} => {Red}    0.2     1.0        1.666667 2
[4] {Orange}      => {White}  0.2     1.0        1.428571 2
[5] {Green}       => {White}  0.2     1.0        1.428571 2
[6] {Red,Green}   => {White}  0.2     1.0        1.428571 2
{Red,White} => {Green}
antecedent: {Red,White}; consequent: {Green}
Confidence = 2/4 = 0.5
Lift ratio = 0.5 / 0.2 = 2.5
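The arithmetic for {Red,White} => {Green} can be checked by hand in base R, without arules. This is a minimal sketch: the matrix is typed in from the Faceplate table above, and `support()` is a hypothetical helper written for this illustration, not a function from any package.

```r
# Faceplate transactions typed in from the table (one row per transaction).
fp <- matrix(c(
  1,1,0,0,1,0,
  0,1,0,1,0,0,
  0,1,1,0,0,0,
  1,1,0,1,0,0,
  1,0,1,0,0,0,
  0,1,1,0,0,0,
  1,0,1,0,0,0,
  1,1,1,0,1,0,
  1,1,1,0,0,0,
  0,0,0,0,0,1),
  ncol = 6, byrow = TRUE,
  dimnames = list(NULL, c("Red", "White", "Blue", "Orange", "Green", "Yellow")))

# Hypothetical helper: share of transactions that contain every item in `items`.
support <- function(m, items) {
  mean(rowSums(m[, items, drop = FALSE]) == length(items))
}

antecedent <- c("Red", "White")
consequent <- "Green"

supp.rule  <- support(fp, c(antecedent, consequent))  # antecedent and consequent together
confidence <- supp.rule / support(fp, antecedent)     # P(consequent | antecedent)
lift       <- confidence / support(fp, consequent)    # confidence relative to the consequent's base rate

print(c(support = supp.rule, confidence = confidence, lift = lift))
```

A lift above 1 means the consequent shows up more often alongside the antecedent than it does in the data overall.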
Rule selection
• The goal is to find only the rules that indicate a strong dependence between the antecedent and consequent itemsets.
• To measure the strength of association implied by a rule, we use the measures of confidence and lift ratio.
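One way to apply this selection is to keep only rules that clear a confidence threshold and a lift threshold. The sketch below uses the six Faceplate rules from the inspect() output, typed into a plain data frame. The two thresholds (`min.conf`, `min.lift`) are illustrative assumptions, not values given in these slides.

```r
# The six rules from the inspect() output, typed in by hand.
rules.df <- data.frame(
  lhs = c("{Red,White}", "{Green}", "{White,Green}", "{Orange}", "{Green}", "{Red,Green}"),
  rhs = c("{Green}", "{Red}", "{Red}", "{White}", "{White}", "{White}"),
  support = 0.2,
  confidence = c(0.5, 1, 1, 1, 1, 1),
  lift = c(2.5, 1.666667, 1.666667, 1.428571, 1.428571, 1.428571),
  stringsAsFactors = FALSE
)

# Assumed thresholds, chosen only for illustration.
min.conf <- 0.8
min.lift <- 1.5

# Keep rules that are both reliable (confidence) and informative (lift).
strong <- subset(rules.df, confidence >= min.conf & lift >= min.lift)
print(strong)
```

With these assumed thresholds, the rule with the highest lift, {Red,White} => {Green}, is dropped because its confidence is only 0.5, which shows why the two measures are read together.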
Implementation of association rules
RECOMMENDATIONS UNDER “FREQUENTLY BOUGHT TOGETHER” ARE BASED ON ASSOCIATION RULES
Recommendations:
• Users who buy this item often buy that item as well.
• Users who watched James Bond movies also watched Jason Bourne movies.
Dataset Movies
• A survey of 6 people rating 4 movies
• Run the following program
# install.packages("arules")
library(arules)
fp.df <- read.csv("movies.csv")
# remove first column and convert to matrix
fp.mat <- as.matrix(fp.df[, -1])
# convert the binary incidence matrix into a transactions database
fp.trans <- as(fp.mat, "transactions")
inspect(fp.trans)
## get rules
# when running apriori(), include the minimum support, minimum confidence, and target
# as arguments.
rules <- apriori(fp.trans, parameter = list(supp = 0.2, conf = 0.5, target = "rules"))
# inspect the first six rules, sorted by their lift
inspect(head(sort(rules, by = "lift"), n = 6))
> inspect(head(sort(rules, by = "lift"), n = 6))
    lhs                      rhs              support confidence lift count
[1] {Interstellar,Hobbit} => {Frozen}         0.5     0.75       1.5  3
[2] {ImitationGame}       => {Interstellar}   0.5     1.00       1.2  3
[3] {Interstellar}        => {ImitationGame}  0.5     0.60       1.2  3
[4] {Frozen}              => {Hobbit}         0.5     1.00       1.2  3
[5] {Hobbit}              => {Frozen}         0.5     0.60       1.2  3
[6] {Frozen}              => {Interstellar}   0.5     1.00       1.2  3
Survey,ImitationGame,Frozen,Interstellar,Hobbit
1,1,0,1,0
2,0,2,2,2
3,0,0,0,1
4,1,2,3,2
5,1,0,1,1
6,0,2,2,3
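The movies table holds ratings from 0 to 3 rather than 0/1 flags, while association rules only use presence or absence. The sketch below assumes that any nonzero rating counts as the movie being in that person's transaction. That reading is an assumption, but it reproduces the support, confidence and lift shown for rule [1] in the output above.

```r
# Movies survey typed in from the table above.
movies <- read.csv(text = "Survey,ImitationGame,Frozen,Interstellar,Hobbit
1,1,0,1,0
2,0,2,2,2
3,0,0,0,1
4,1,2,3,2
5,1,0,1,1
6,0,2,2,3")

# Assumption: a rating above 0 means the item is present in the transaction.
watched <- as.matrix(movies[, -1]) > 0

# Share of the 6 respondents who rated each movie.
item.support <- colMeans(watched)
print(round(item.support, 3))

# Check rule [1]: {Interstellar,Hobbit} => {Frozen}
lhs.rows   <- watched[, "Interstellar"] & watched[, "Hobbit"]
rule.rows  <- lhs.rows & watched[, "Frozen"]
rule.supp  <- mean(rule.rows)
confidence <- sum(rule.rows) / sum(lhs.rows)
lift       <- confidence / item.support[["Frozen"]]
print(c(support = rule.supp, confidence = confidence, lift = lift))
```

Binarizing this way discards the rating level, so a rating of 1 and a rating of 3 count the same.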
Dataset Charles Book Club
ChildBks,YouthBks,CookBks,DoItYBks,RefBks,ArtBks,GeogBks,ItalCook,ItalAtlas,ItalArt,Florence
0,1,0,1,0,0,1,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0
1,1,1,0,1,0,1,0,0,0,0
0,0,1,0,0,0,1,0,0,0,0
1,0,0,0,0,1,0,0,0,0,1
0,1,0,0,0,0,0,0,0,0,0
0,1,0,0,1,0,0,0,0,0,0
1,0,0,1,0,0,0,0,0,0,0
1,1,1,0,0,0,1,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0
# install.packages("arules")
library(arules)
all.books.df <- read.csv("CharlesBookClub.csv")
# create a binary incidence matrix
count.books.df <- all.books.df[, 8:18]
incid.books.df <- ifelse(count.books.df > 0, 1, 0)
# note: [, -1] drops the first of the selected columns
incid.books.mat <- as.matrix(incid.books.df[, -1])
# convert the binary incidence matrix into a transactions database
books.trans <- as(incid.books.mat, "transactions")
inspect(books.trans)
# plot data
itemFrequencyPlot(books.trans)
# run apriori function
rules <- apriori(books.trans,
    parameter = list(supp = 200/4000, conf = 0.5, target = "rules"))
# inspect rules
inspect(sort(rules, by = "lift"))
https://raw.githubusercontent.com/prnvg/Charles-Book-Club/master/CBC.csv
> inspect(sort(rules, by = "lift"))
     lhs                     rhs         support confidence lift     count
[1]  {DoItYBks,GeoBks}    => {YouthBks}  0.05450 0.5396040  2.264864 218
[2]  {CookBks,GeoBks}     => {YouthBks}  0.08025 0.5136000  2.155719 321
[3]  {CookBks,RefBks}     => {DoItYBks}  0.07450 0.5330948  2.092619 298
[4]  {YouthBks,GeoBks}    => {DoItYBks}  0.05450 0.5215311  2.047227 218
[5]  {YouthBks,CookBks}   => {DoItYBks}  0.08375 0.5201863  2.041948 335
[6]  {YouthBks,RefBks}    => {CookBks}   0.06825 0.8400000  2.021661 273
[7]  {YouthBks,DoItYBks}  => {GeoBks}    0.05450 0.5278450  1.978801 218
[8]  {YouthBks,DoItYBks}  => {CookBks}   0.08375 0.8111380  1.952197 335
[9]  {DoItYBks,RefBks}    => {CookBks}   0.07450 0.8054054  1.938400 298
[10] {RefBks,GeoBks}      => {CookBks}   0.06450 0.7889908  1.898895 258
[11] {YouthBks,GeoBks}    => {CookBks}   0.08025 0.7679426  1.848237 321
[12] {DoItYBks,GeoBks}    => {CookBks}   0.07750 0.7673267  1.846755 310
[13] {YouthBks,ArtBks}    => {CookBks}   0.05150 0.7410072  1.783411 206
[14] {DoItYBks,ArtBks}    => {CookBks}   0.05300 0.7114094  1.712177 212
[15] {RefBks}             => {CookBks}   0.13975 0.6825397  1.642695 559
[16] {ArtBks,GeoBks}      => {CookBks}   0.05525 0.6800000  1.636582 221
[17] {YouthBks}           => {CookBks}   0.16100 0.6757608  1.626380 644
[18] {DoItYBks}           => {CookBks}   0.16875 0.6624141  1.594258 675
[19] {ItalCook}           => {CookBks}   0.06875 0.6395349  1.539193 275
[20] {GeoBks}             => {CookBks}   0.15625 0.5857545  1.409758 625
[21] {ArtBks}             => {CookBks}   0.11300 0.5067265  1.219558 452
> inspect(head(sort(rules, by = "lift"), n = 6))
    lhs           rhs         support   confidence lift count
[1] {CookBks} => {GeogBks} 0.2727273 1.00 2.75 3
[2] {GeogBks} => {CookBks} 0.2727273 0.75 2.75 3
[3] {GeogBks} => {YouthBks} 0.2727273 0.75 1.65 3
[4] {YouthBks} => {GeogBks} 0.2727273 0.60 1.65 3
User-Based Collaborative Filtering
• Algorithm :
• Find users who are most similar to the user of interest (neighbors). This is done by comparing the preferences of our user to the preferences of other users.
• Considering only the items that the user has not yet purchased, recommend the ones that are most preferred by the user’s neighbors.
• This is the approach behind Amazon’s “Customers Who Bought This Item Also Bought…”
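The two algorithm steps can be sketched in base R using the movie ratings from the Dataset Movies slide. Several choices here are assumptions made for illustration and are not specified in the slides: a rating of 0 is read as "not yet rated", similarity is cosine similarity, the neighborhood size is `k = 2`, and candidate items are scored by the neighbors' mean rating.

```r
# Ratings typed in from the movies table; 0 is assumed to mean "not rated".
ratings <- matrix(c(
  1,0,1,0,
  0,2,2,2,
  0,0,0,1,
  1,2,3,2,
  1,0,1,1,
  0,2,2,3),
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("user", 1:6),
                  c("ImitationGame", "Frozen", "Interstellar", "Hobbit")))

# Cosine similarity between two rating vectors (assumed similarity measure).
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

target <- "user1"  # the user of interest
k <- 2             # assumed number of neighbors

# Step 1: find the k users most similar to the target.
others <- setdiff(rownames(ratings), target)
sims <- sapply(others, function(u) cosine(ratings[target, ], ratings[u, ]))
neighbors <- names(sort(sims, decreasing = TRUE))[1:k]

# Step 2: among items the target has not rated, pick the one the neighbors prefer most.
unseen <- colnames(ratings)[ratings[target, ] == 0]
scores <- colMeans(ratings[neighbors, unseen, drop = FALSE])
recommendation <- names(which.max(scores))

print(round(sims, 3))
print(neighbors)
print(recommendation)
```

Averaging neighbor ratings with zeros included is a crude score. A fuller version would weight by similarity and skip neighbors who have not rated the item.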
Dataset Bread and Milk
transaksi,barang1,barang2,barang3,barang4
1,bread,coke,milk,
2,beer,bread,,
3,beer,coke,diaper,milk
4,beer,bread,diaper,milk
5,coke,diaper,milk,
# install.packages("arules")
library(arules)
# install.packages("stringr")
library(stringr)
data <- read.csv("bread-milk.csv")
# stack the item columns into one long (transaksi, barang) table
datax <- data[,c(1,2)]
colnames(datax) <- c("transaksi", "barang")
for (i in c(3:dim(data)[2])) {
  data1 <- data[,c(1,i)]
  colnames(data1) <- c("transaksi", "barang")
  datax <- rbind(datax, data1)
}
## remove empty items
dataX <- datax[1,]
for (i in c(2:dim(datax)[1])) {
  data1 <- datax[i,]
  mystr <- data1$barang
  lenstr <- str_length(mystr)
  if (lenstr>0) {
    dataX <- rbind(dataX, data1)
    print (sprintf ("[%s - %d]", data1$barang, lenstr))
  }
}
X <- with(dataX, table(transaksi,barang))
## Massage it into the format you're wanting
hasilz <- cbind(name = rownames(X), apply(X, 2, as.numeric))
# remove first column and convert to matrix
fp.matx <- as.matrix(hasilz[,-1])
nmkol <- colnames(fp.matx)
#--- convert the character "0"/"1" entries back to numeric 0/1
aa <- as.numeric(factor(fp.matx))
bb <- matrix(aa, nrow = dim(fp.matx)[1],ncol = dim(fp.matx)[2])
cc <- bb - 1
colnames(cc) <- nmkol
fp.mat <- cc
#---
# convert the binary incidence matrix into a transactions database
fp.trans <- as(fp.mat, "transactions")
inspect(fp.trans)
## get rules
# when running apriori(), include the minimum support, minimum confidence, and target
# as arguments.
rules <- apriori(fp.trans, parameter = list(supp = 0.2, conf = 0.5, target = "rules"))
# inspect the first 17 rules, sorted by their lift
inspect(head(sort(rules, by = "lift"), n = 17))
# frequent itemsets with support of at least 0.2, sorted by support
freq_is <- apriori(fp.trans, parameter = list(target = "frequent itemsets", support = 0.2))
inspect(head(sort(freq_is, by = "support"), n = 10))
> inspect(head(sort(rules, by = "lift"), n = 30))
     lhs                    rhs       support confidence lift     count
[1]  {bread,diaper}      => {beer}    0.2     1.0000000  1.666667 1
[2]  {beer,coke}         => {diaper}  0.2     1.0000000  1.666667 1
[3]  {beer,milk}         => {diaper}  0.4     1.0000000  1.666667 2
[4]  {beer,bread,milk}   => {diaper}  0.2     1.0000000  1.666667 1
[5]  {bread,diaper,milk} => {beer}    0.2     1.0000000  1.666667 1
[6]  {beer,coke,milk}    => {diaper}  0.2     1.0000000  1.666667 1
[7]  {coke}              => {milk}    0.6     1.0000000  1.250000 3
[8]  {milk}              => {coke}    0.6     0.7500000  1.250000 3
[9]  {diaper}            => {milk}    0.6     1.0000000  1.250000 3
[10] {milk}              => {diaper}  0.6     0.7500000  1.250000 3
[11] {bread,coke}        => {milk}    0.2     1.0000000  1.250000 1
[12] {bread,diaper}      => {milk}    0.2     1.0000000  1.250000 1
[13] {beer,coke}         => {milk}    0.2     1.0000000  1.250000 1
[14] {coke,diaper}       => {milk}    0.4     1.0000000  1.250000 2
[15] {beer,diaper}       => {milk}    0.4     1.0000000  1.250000 2
[16] {beer,bread,diaper} => {milk}    0.2     1.0000000  1.250000 1
[17] {beer,coke,diaper}  => {milk}    0.2     1.0000000  1.250000 1
[18] {bread}             => {beer}    0.4     0.6666667  1.111111 2
[19] {beer}              => {bread}   0.4     0.6666667  1.111111 2
[20] {coke}              => {diaper}  0.4     0.6666667  1.111111 2
[21] {diaper}            => {coke}    0.4     0.6666667  1.111111 2
[22] {beer}              => {diaper}  0.4     0.6666667  1.111111 2
[23] {diaper}            => {beer}    0.4     0.6666667  1.111111 2
[24] {coke,milk}         => {diaper}  0.4     0.6666667  1.111111 2
[25] {diaper,milk}       => {coke}    0.4     0.6666667  1.111111 2
[26] {diaper,milk}       => {beer}    0.4     0.6666667  1.111111 2
> inspect(head(sort(freq_is, by = "support"), n = 10))
     items         support count
[1]  {milk}        0.8     4
[2]  {bread}       0.6     3
[3]  {coke}        0.6     3
[4]  {beer}        0.6     3
[5]  {diaper}      0.6     3
[6]  {coke,milk}   0.6     3
[7]  {diaper,milk} 0.6     3
[8]  {beer,bread}  0.4     2
[9]  {bread,milk}  0.4     2
[10] {coke,diaper} 0.4     2
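The reshaping step in the Bread and Milk program (stacking the item columns, dropping blanks, and building a 0/1 matrix) can also be sketched more compactly in base R. This is an alternative sketch, not the program used to produce the output above. The data is typed in from the table, and the variable names (`wide`, `incid`, `supp`) are introduced only for this illustration.

```r
# Bread and Milk transactions typed in from the table; blank cells are empty strings.
wide <- read.csv(text = "transaksi,barang1,barang2,barang3,barang4
1,bread,coke,milk,
2,beer,bread,,
3,beer,coke,diaper,milk
4,beer,bread,diaper,milk
5,coke,diaper,milk,", stringsAsFactors = FALSE)

# Stack all item columns into one vector, column by column,
# and repeat the transaction ids so the two vectors stay aligned.
items <- unlist(wide[, -1], use.names = FALSE)
tx    <- rep(wide$transaksi, times = ncol(wide) - 1)

# Drop the blank cells.
keep <- !is.na(items) & items != ""

# Cross-tabulate into a logical incidence matrix (rows = transactions, columns = items).
incid <- unclass(table(tx[keep], items[keep])) > 0

# Item supports, for comparison with the frequent itemsets output.
supp <- colMeans(incid)
print(supp)

# Confidence of {coke} => {milk}, for comparison with rule [7].
conf.coke.milk <- sum(incid[, "coke"] & incid[, "milk"]) / sum(incid[, "coke"])
print(conf.coke.milk)
```

The logical matrix `incid` has the same shape as the binary incidence matrix that the slides pass to `as(fp.mat, "transactions")`.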