JURIKOM (Jurnal Riset Komputer), Vol. 8 No. 6, Desember 2021 e-ISSN 2715-7393 (Media Online), p-ISSN 2407-389X (Media Cetak) DOI 10.30865/jurikom.v8i6.3574 Hal 382−385
http://ejurnal.stmik-budidarma.ac.id/index.php/jurikom |Page 382
Comparative Analysis of Apriori Algorithm and Hash-Based Algorithm in Market Basket Analysis
Ahmad Ari Aldino1, Arfinia Rahma2, Damayanti2, Setiawansyah1,*
1 Faculty of Engineering and Computer Science, Informatics, Universitas Teknokrat Indonesia, Bandar Lampung, Indonesia
2 Faculty of Engineering and Computer Science, Information System, Universitas Teknokrat Indonesia, Bandar Lampung, Indonesia Email: 1[email protected], 2[email protected], 3[email protected], 4,*[email protected]
Corresponding Author Email: [email protected]
Submitted 12-06-2021; Accepted 20-12-2021; Published 30-12-2021
Abstract
Grocery stores now face increasingly tight business competition, which forces businesses to think hard about developing strategies to compete. In developing such strategies, companies can take advantage of information technology.
Information technology can help companies conduct their business. In particular, companies can use the data generated by information systems to assist decision making; if processed correctly, such data can yield valuable information. Data mining is the process of using statistical, mathematical, artificial intelligence, and machine learning techniques to extract and identify useful information and related knowledge from large databases or data warehouses (Kennedi Tampubolon, 2013). In this study, the researchers used the Apriori algorithm and the Hash-Based algorithm to determine consumer spending patterns from shopping cart data comprising 1023 transactions, with a minimum support of 0.03 and a minimum confidence of 0.5. The Apriori algorithm produced seven rules, each combining 2 items, with a rule strength of 13.14% and an accuracy of 92.80%. The Hash-Based algorithm also produced seven rules over 2-itemsets, with a rule strength of 14.35% and an accuracy of 107.76%. From this comparison, it can be concluded that the Hash-Based algorithm is better than the Apriori algorithm.
Keywords: Data Mining; Association Rules; Market Basket Analysis; Apriori Algorithm; Hash-Based Algorithm; Rapid Miner
1. INTRODUCTION
Grocery stores now face increasingly tight business competition, which forces businesses to think hard about developing strategies to compete. In developing such strategies, companies can take advantage of information technology [1][2]. Information technology can help companies conduct their business. In particular, companies can use the data generated by information systems to assist decision making; if processed correctly, such data can yield valuable information [3].
Reymart is a shop that sells a wide variety of goods such as household goods, home appliances, accessories, makeup, clothing, groceries, and others. It was established in 2016 and is located on Manggris Street, North Kota Bumi District, North Lampung Regency. The Reymart store records consumer purchase transactions every day, so a large amount of transaction data accumulates and is merely archived.
Consumer purchase data can be processed into information that supports decision-makers. The volume of purchase transaction data at Reymart can be used to reveal consumer shopping patterns. Processing this information requires an algorithm that takes the existing purchase transaction data and produces consumer shopping basket patterns.
The Apriori algorithm is a basic algorithm proposed by Agrawal and Srikant in 1994 to find frequent itemsets for association rules. Its main idea is, first, to look for frequent itemsets, that is, sets of items that meet the minimum support in the transaction database [4]. Second, it eliminates itemsets with low frequencies based on a predetermined minimum support level. Finally, it builds association rules from the itemsets that meet the minimum confidence value in the database [3], [5].
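The level-wise search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Rapid Miner implementation used in the study: the function name `apriori`, the toy baskets, and the 0.5 support threshold are all hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset (toy sketch)."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for b in baskets for item in b}
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate and keep those meeting the minimum support.
        level = {}
        for c in candidates:
            support = sum(1 for b in baskets if c <= b) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates, then prune
        # any candidate that has an infrequent k-item subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical toy baskets, not the Reymart data.
toy = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"B", "C"}, {"A", "B"}]
frequent_itemsets = apriori(toy, min_support=0.5)
```

On this toy input, the three single items and the pair {A, B} meet the threshold, while {A, C} and {B, C} are eliminated at the second level.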
In contrast, hash-based algorithms use hashing to filter out unimportant itemsets before generating the next level of itemsets. While the support of the candidate k-itemsets is being counted by scanning the database, the hash-based algorithm collects information about the (k+1)-itemsets: every possible (k+1)-itemset of each transaction is hashed into a hash table using a hash function (which uses a prime number for the modulo operation) [6][7]. This hashing phase continues as long as the hash bucket values exceed the minimum support. Once that limit is passed, the hash-based algorithm switches to the Apriori procedure, because hashing is then no more efficient than the Apriori algorithm [8], [9].
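The hash-filtering idea can be illustrated for the step from 1-itemsets to 2-itemsets. The sketch below is only one possible reading of the description above: the bucket function `(10 * i + j) mod 7`, the prime modulus 7, and the toy baskets are assumptions chosen for illustration and do not come from the study.

```python
from collections import Counter
from itertools import combinations

NUM_BUCKETS = 7  # hypothetical prime modulus for the hash function


def bucket_of(pair, index):
    """Hash a 2-itemset to a bucket from the ordinal positions of its items."""
    i, j = sorted(index[x] for x in pair)
    return (10 * i + j) % NUM_BUCKETS


def hash_pruned_pairs(transactions, min_count):
    items = sorted({x for t in transactions for x in t})
    index = {item: pos for pos, item in enumerate(items, start=1)}
    item_counts, bucket_counts = Counter(), Counter()
    # First scan: count single items and hash every pair of each transaction.
    for t in transactions:
        item_counts.update(t)
        for pair in combinations(sorted(t), 2):
            bucket_counts[bucket_of(pair, index)] += 1
    frequent_items = sorted(i for i, c in item_counts.items() if c >= min_count)
    # A pair stays a candidate only if both items are frequent AND its
    # bucket count reaches the minimum support count.
    candidates = [
        p for p in combinations(frequent_items, 2)
        if bucket_counts[bucket_of(p, index)] >= min_count
    ]
    # Second scan: exact counts for the surviving candidates only.
    exact = Counter()
    for t in transactions:
        for p in candidates:
            if set(p) <= set(t):
                exact[p] += 1
    frequent_pairs = {p: c for p, c in exact.items() if c >= min_count}
    return frequent_pairs, candidates


# Hypothetical toy baskets, not the Reymart data.
toy = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"B", "C"}, {"A", "B"}]
frequent_pairs, candidates = hash_pruned_pairs(toy, min_count=3)
```

With these toy baskets all three items are frequent, so a plain join would test three candidate pairs; the bucket counts leave only ("A", "B") to be counted exactly. Because different pairs can share a bucket, the filter may keep some infrequent pairs, which is why the exact second count is still needed.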
2. RESEARCH METHODOLOGY
2.1 Research Framework
1. Problem
The research begins by defining the research problem, namely that the pattern of consumer spending at the Reymart store is not yet known.
2. Opportunity
The opportunity identified is to learn the purchase patterns so that links between items can be found.
3. Approach
The approach refers to how the researchers address the existing problem to find a solution, in this case through the association (market basket analysis) method, to learn the pattern of customer purchases.
4. Identification & Assessment
This step concerns the attributes used in the research, so that the processed results serve the intended purpose of deriving sales transaction rules.
5. Proposed
The proposal presented in this study is to analyze transaction data with the market basket analysis method.
6. Evaluation
In this study, the researchers used the Apriori algorithm and the Hash-Based algorithm and compared them using Rapid Miner 9.8.
7. Result
This study produces frequent itemsets and association rules.
2.2 Research Stages
1. Stage 1: Literature Study
a. Literature Review: Book, Journal
b. Data collection: Observation, Document, Interview
c. Identification: Previous Research, Problem Definition, Scope of research
2. Stage 2: Analysis and Determination of Attributes
a. The need to know consumer spending patterns
b. Identify consumer spending patterns
c. The method used for identification is market basket analysis
d. It is necessary to compare the Apriori and Hash-Based algorithms
3. Stage 3: Data Selection
a. Categorization
b. Cleaning
c. Transformation
4. Stage 4: Data Processing
a. Comparing the Apriori and Hash-Based algorithms using Rapid Miner
5. Stage 5: Closing
a. Conclusion: The results of market basket analysis with the Apriori and Hash-Based algorithms are the frequent itemsets
b. Suggestion: For market basket analysis with the Apriori and Hash-Based algorithms
3. RESULT AND DISCUSSION
To obtain the results, the authors used Rapid Miner 9.8 to implement the Apriori and Hash-Based algorithms with a minimum support of 2% and a minimum confidence of 50%. The data comprise 3,425 transactions covering 557 items from December 2020 to February 2021. The data are first preprocessed and then processed with the Apriori and Hash-Based algorithms until association rules are established [10].
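To make the two thresholds concrete, the following sketch shows how the support and confidence of a single rule are usually computed and compared with the minimum values. The helper `rule_metrics` and the toy baskets are hypothetical; only the 2% and 50% thresholds are taken from the text.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    a = set(antecedent)
    both = a | set(consequent)
    n = len(transactions)
    count_a = sum(1 for t in transactions if a <= set(t))
    count_both = sum(1 for t in transactions if both <= set(t))
    support = count_both / n                      # share of all baskets with both sides
    confidence = count_both / count_a if count_a else 0.0  # share of antecedent baskets
    return support, confidence


MIN_SUPPORT = 0.02     # 2%, as stated in the text
MIN_CONFIDENCE = 0.50  # 50%, as stated in the text

# Hypothetical toy baskets, not the Reymart data.
toy = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"B", "C"}, {"A", "B"}]
support, confidence = rule_metrics(toy, {"A"}, {"B"})
rule_is_kept = support >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE
```

In the toy data, three of five baskets contain both A and B and four contain A, so the rule A -> B has support 0.6 and confidence 0.75 and passes both thresholds.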
3.1 Discussion
After applying the Apriori algorithm and the Hash-Based algorithm in Rapid Miner 9.8 with the same minimum support and minimum confidence values, seven association rules were generated:
1. If you buy Paddle Pop, then buy Jelly Mia-Mia
a. Apriori Algorithm:
- Support: 3.00%
- Confidence: 81.00%
- Accuracy: 2.43%
b. Hash-Based:
- Support: 3.71%
- Confidence: 80.85%
- Accuracy: 3.00%
2. If you buy Jelly Mia-Mia, then buy Paddle Pop
a. Apriori Algorithm:
- Support: 3.00%
- Confidence: 72.00%
- Accuracy: 2.16%
b. Hash-Based:
- Support: 3.71%
- Confidence: 64.41%
- Accuracy: 2.39%
3. If you buy Zinc, then buy French Fries with Tomato Sauce
a. Apriori Algorithm:
- Support: 3.00%
- Confidence: 65.00%
- Accuracy: 1.95%
b. Hash-Based:
- Support: 3.23%
- Confidence: 64.71%
- Accuracy: 2.09%
4. If you buy ABC Sambal Original, then buy Indomie
a. Apriori Algorithm:
- Support: 3.00%
- Confidence: 62.00%
- Accuracy: 1.86%
b. Hash-Based:
- Support: 3.13%
- Confidence: 60.38%
- Accuracy: 1.89%
5. If you buy Citra Sari Roti, then buy Lifebuoy
a. Apriori Algorithm:
- Support: 3.00%
- Confidence: 54.00%
- Accuracy: 1.62%
b. Hash-Based:
- Support: 3.13%
- Confidence: 54.24%
- Accuracy: 1.70%
6. If you buy Lifebuoy, then buy Citra Sari Roti
a. Apriori Algorithm:
- Support: 3.00%
- Confidence: 52.00%
- Accuracy: 1.56%
b. Hash-Based:
- Support: 3.13%
- Confidence: 52.46%
- Accuracy: 1.64%
7. If you buy Sedaap, then buy French Fries with Tomato Sauce
a. Apriori Algorithm:
- Support: 3.00%
- Confidence: 52.00%
- Accuracy: 1.56%
b. Hash-Based:
- Support: 3.13%
- Confidence: 53.33%
- Accuracy: 1.64%
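The rule strength reported for each algorithm (13.14% for Apriori and 14.35% for Hash-Based) appears to be the sum of the per-rule accuracy values listed above, and for the Apriori rules each accuracy value equals support multiplied by confidence (for example, 3.00% × 81.00% = 2.43%). The short check below reproduces these totals from the listed figures; the variable and function names are illustrative only.

```python
# (support %, confidence %, accuracy %) per rule, copied from the list above.
apriori_rules = [
    (3.00, 81.00, 2.43), (3.00, 72.00, 2.16), (3.00, 65.00, 1.95),
    (3.00, 62.00, 1.86), (3.00, 54.00, 1.62), (3.00, 52.00, 1.56),
    (3.00, 52.00, 1.56),
]
hash_rules = [
    (3.71, 80.85, 3.00), (3.71, 64.41, 2.39), (3.23, 64.71, 2.09),
    (3.13, 60.38, 1.89), (3.13, 54.24, 1.70), (3.13, 52.46, 1.64),
    (3.13, 53.33, 1.64),
]


def rule_strength(rules):
    """Sum the listed per-rule accuracy values, in percent."""
    return round(sum(accuracy for _, _, accuracy in rules), 2)


apriori_strength = rule_strength(apriori_rules)
hash_strength = rule_strength(hash_rules)

# For the Apriori rules, accuracy matches support x confidence to two decimals.
apriori_products_match = all(
    abs(round(s * c / 100, 2) - accuracy) < 1e-9
    for s, c, accuracy in apriori_rules
)
```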
Based on the data, the strength level of the association rules produced by the Apriori algorithm is 13.14%, while that of the Hash-Based algorithm is 14.35%, so the Hash-Based algorithm has the higher rule strength. After obtaining the rule strength levels, the accuracy value of both algorithms is calculated using the accuracy formula (Gunadi and Indra, 2012):
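\[
\text{Accuracy} = \frac{\sum \text{Support}_{\text{Algorithm } A}}{\sum \text{Support}_{\text{Algorithm } B}} \tag{1}
\]
1. Accuracy level of the Apriori algorithm
\[
\frac{0.21}{0.2263} = 0.927972 = 92.80\%
\]
2. Accuracy level of the Hash-Based algorithm
\[
\frac{0.2263}{0.21} = 1.077619 = 107.76\%
\]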
Based on the calculations above, the accuracy rate of the Apriori algorithm is 92.80% and that of the Hash-Based algorithm is 107.76%, so the Hash-Based algorithm scores higher than the Apriori algorithm.
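The two accuracy figures follow directly from formula (1) and the summed support values 0.21 (Apriori) and 0.2263 (Hash-Based) used in the calculation. A small check, with a hypothetical helper name:

```python
def accuracy_ratio(sum_support_a, sum_support_b):
    """Formula (1): summed support of algorithm A over that of algorithm B."""
    return sum_support_a / sum_support_b


APRIORI_SUPPORT_SUM = 0.21   # summed support used in the text for Apriori
HASH_SUPPORT_SUM = 0.2263    # summed support used in the text for Hash-Based

apriori_accuracy = round(accuracy_ratio(APRIORI_SUPPORT_SUM, HASH_SUPPORT_SUM) * 100, 2)
hash_accuracy = round(accuracy_ratio(HASH_SUPPORT_SUM, APRIORI_SUPPORT_SUM) * 100, 2)
```

Because the two ratios are reciprocals of each other, one of them exceeds 100% whenever the summed supports differ, which is why the Hash-Based figure is 107.76%.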
3.2 Implementation of the Apriori Algorithm
Entering the predetermined Minimum Support value of 3% and Minimum Confidence of 50% generates the rules shown in Figure 1.
Figure 1. Result of Apriori algorithm rules.
The results show that seven rules produced by the Apriori algorithm meet the minimum support and minimum confidence requirements. The rules formed are as follows:
1) If you buy Paddle Pop, then buy Jelly Mia-Mia, with a Support value of 3.00% and a Confidence value of 81.00%
2) If you buy Jelly Mia-Mia, then buy Paddle Pop, with a Support value of 3.00% and a Confidence value of 72.00%
3) If you buy Zinc, then buy French Fries Sambal Tomat, with a Support value of 3.00% and a Confidence value of 65.00%
4) If you buy ABC Sambal Asli, then buy Indomie, with a Support value of 3.00% and a Confidence value of 62.00%
5) If you buy Citra Sari Roti, then buy Lifebuoy Batang, with a Support value of 3.00% and a Confidence value of 54.00%
6) If you buy Lifebuoy Batang, then buy Citra Sari Roti, with a Support value of 3.00% and a Confidence value of 52.00%
7) If you buy Sedaap, then buy French Fries Sambal Tomat, with a Support value of 3.00% and a Confidence value of 52.00%
3.3 Hash-Based Algorithm Implementation
Entering a Minimum Support value of 4% and a Minimum Confidence of 40% produces the Hash-Based algorithm results shown in Figure 2.
Figure 2. Result of Hash-Based algorithm rules.
The results show that seven rules produced by the Hash-Based algorithm meet the minimum support and minimum confidence requirements. The rules formed are as follows:
1) If you buy Citra Sari Roti, then buy Lifebuoy Batang, with a Support value of 3.13% and a Confidence value of 54.24%
2) If you buy ABC Sambal Asli, then buy Indomie, with a Support value of 3.13% and a Confidence value of 60.38%
3) If you buy Jelly Mia-Mia, then buy Paddle Pop, with a Support value of 3.71% and a Confidence value of 64.41%
4) If you buy Lifebuoy Batang, then buy Citra Sari Roti, with a Support value of 3.13% and a Confidence value of 52.46%
5) If you buy Paddle Pop, then buy Jelly Mia-Mia, with a Support value of 3.71% and a Confidence value of 80.85%
6) If you buy Zinc, then buy French Fries Sambal Tomat, with a Support value of 3.23% and a Confidence value of 64.71%
7) If you buy Sedaap, then buy French Fries Sambal Tomat, with a Support value of 3.13% and a Confidence value of 53.33%
4. CONCLUSION
The study concludes that the market basket analysis method (consumer shopping basket analysis) using the Apriori and Hash-Based algorithms can help the Reymart store learn consumer shopping patterns and the items that consumers often purchase. Applying the two algorithms with Rapid Miner 9.8 and Ms. Excel produced seven association rules for each algorithm. The resulting rules can guide the stocking of the items listed in the rules. The comparison revealed the following weaknesses of the Apriori algorithm: a) the Support values generated by the Apriori algorithm are lower than those of the Hash-Based algorithm; b) the accuracy value produced by the Apriori algorithm is lower than that of the Hash-Based algorithm.
REFERENCES
[1] O. S. A. Destiyati and E. Aribowo, “Analisis Perbandingan Algoritma Apriori Dan Algoritma Hash Based Pada Market Basket Analysis Di Apotek UAD,” JSTIE (Jurnal Sarj. Tek. Inform.), vol. 3, no. 1, pp. 1–10, 2015.
[2] S. P. Adithama, F. K. S. Dewi, and E. Hariyadi, “Penerapan Algoritma Apriori dan Fuzzy Tsukamoto untuk Rekomendasi Jumlah Pembelian Barang dan Promo pada Toko Serba Ada (Implementation of Apriori and Fuzzy Tsukamoto Algorithms for Number of Purchases of Goods and Promo Recommendation at Convenience Store),” JUITA J. Inform., vol. 8, no. 2, pp. 261–270, 2020.
[3] D. Listriani, A. H. Setyaningrum, and F. Eka, “Penerapan Metode Asosiasi Menggunakan Algoritma Apriori Pada Aplikasi Analisa Pola Belanja Konsumen (Studi Kasus Toko Buku Gramedia Bintaro),” J. Tek. Inform., vol. 9, no. 2, 2016.
[4] A. G. Novianti and D. Prasetyo, “Penerapan Algoritma K-Nearest Neighbor (K-NN) untuk Prediksi Waktu Kelulusan Mahasiswa,” in Seminar Nasional APTIKOM (SEMNASTIKOM), 2017, pp. 108–113.
[5] N. Nurdin and D. Astika, “Penerapan Data Mining Untuk Menganalisis Penjualan Barang Dengan Menggunakan Metode Apriori Pada Supermarket Sejahtera Lhokseumawe,” TECHSI-Jurnal Tek. Inform., vol. 7, no. 1, pp. 132–155, 2019.
[6] A. K. Prasidya and C. Fibriani, “Analisis Kaidah Asosiasi Antar Item Dalam Transaksi Pembelian Menggunakan Data Mining Dengan Algoritma Apriori (Studi Kasus: Minimarket Gun Bandungan, Jawa Tengah),” JUTI J. Ilm. Teknol. Inf, vol. 15, no. 2, p. 173, 2017.
[7] P. I. Purnamasari, F. Marisa, and I. D. Wijaya, “Sistem Pendukung Keputusan Rekomendasi Paket Menu Menggunakan Algoritma Apriori Dengan Metode Market Basket Analisis Pada Kings Food Kendari,” J. SPIRIT, vol. 11, no. 1, 2019.
[8] D. Setianingsih and R. F. Hakim, “Penerapan data mining dalam analisis kejadian tanah longsor di indonesia dengan menggunakan association rule algoritma apriori,” in Prosiding Seminar Nasional Matematika dan Pendidikan Matematika UMS, 2015, vol. 2015, pp. 731–741.
[9] K. Ummi, “Analisa Data Mining Dalam Penjualan Sparepart Mobil Dengan Menggunakan Metode Algoritma Apriori (Studi Kasus: Di Pt. Idk 1 Medan),” CSRID (Computer Sci. Res. Its Dev. Journal), vol. 8, no. 3, pp. 155–164, 2016.
[10] T. D. Prakoso, “Penemuan Pola Asosiasi Pada Data Restoran Menggunakan Algoritma Hash Based,” Senamika, vol. 1, no. 1, pp. 71–80, 2020.