Data Clustering Mining Applying the K-Means Algorithm, Cervical Cancer Behavior Risk


Ridha Maya Faza Lubis¹,*, Jen-Peng Huang², Pai-Chou Wang², Kiki Khoifin¹, Mula Sigiro³, Joel Panjaitan⁴

1Department of Business and Management, Southern Taiwan University of Science and Technology, Tainan, Taiwan

2Department of Information Management, Southern Taiwan University of Science and Technology, Tainan, Taiwan

3Department of Physics Education, University of HKBP Nommensen, Medan, Indonesia

4Department of Electrical Engineering, Academy of Deli Serdang Engineering, Medan, Indonesia

Email: ¹,*[email protected], ²[email protected], ³[email protected], ⁴[email protected], ⁵[email protected], ⁶[email protected]

Correspondence Author Email: [email protected]

Abstract−Cancer is now a frequent topic of conversation for both men and women in Indonesia and around the world. Its symptoms are often not very noticeable, and public awareness of the need for periodic health checks is low, which has a negative impact on health. This lack of care has several causes: limited economic means, being too busy with work or other matters, and, for some people, not being ready to know and accept the disease they are suffering from. Despite these causes of reluctance, medical examinations are needed so that diseases can be prevented, or treated early once diagnosed. Several cancers affect predominantly or exclusively women, one of which is cervical cancer. It is estimated that in 2020 cases of cervical cancer will increase by 3.4%, from 6.6% in 2018 to 9%, and that cervical cancer will become the third deadliest disease in women after breast cancer and lung cancer. This indicates that the percentage of deaths caused by cervical cancer keeps increasing. Therefore, to help reduce the high mortality rate, a clustering technique was applied to group the data into clusters based on the similarity of characteristics between records. The algorithm used is K-Means, tested with the RapidMiner application. In the final result, cluster 1 contains more data: of the 72 records on cervical cancer, only 28 are declared as cervical cancer sufferers and the other 44 are not.

Keywords: Data Mining; Clustering; K-Means; Cervical Cancer

1. INTRODUCTION

Cancer is a frequent topic of discussion for both men and women in Indonesia and around the world today. Its symptoms are often not very noticeable, and public awareness of the need for periodic health checks is low, which has a negative effect on health. Several other factors contribute to this lack of care, such as a weak local economy, being overburdened at work or with other obligations, and a lack of readiness on the part of some individuals to recognize and accept their disease. Despite these sources of resistance to medical examinations, the examinations are needed in order to detect diseases early and begin treatment. Cervical cancer is one of the cancers that affect only women.

Cervical cancer is already the third most lethal disease for women after lung and breast cancer, with an estimated 3.4% rise in cases from 6.6% in 2018 to 9% in 2020. This indicates that the proportion of fatalities caused by cervical cancer is continuously rising [1]. Early prevention, diagnosis, and treatment initiatives can therefore be implemented to lower the high mortality rate. Gathering data about cervical cancer makes it possible to prevent and identify it. However, since there is a lot of data to sort through and diagnose, this study was designed to make it simpler for readers or medical professionals to categorize the data into groups.

This grouping is called clustering. Several clustering algorithms can be used in data mining to group items, including K-Medoids, K-Means, and AHC. The K-Means algorithm was used in this research to cluster cases of cervical cancer.

K-Means is a widely applied clustering technique in data mining and is simple to understand in use. It is an unsupervised method: data are grouped according to similarities in their characteristics, so that each group differs from the other groups in those characteristics [2][3].

In an earlier study from 2019, Rizki Muliono and Zulfikar Sembiring examined a problem with paying teachers' allowances after they had prepared and completed teaching materials such as syllabuses, lecture contracts, and lesson plans. By applying K-Means clustering, LP2MP was able to perform the assessment more efficiently by grouping the data based on values with similar characteristics. The accuracy of the study's results showed a difference of 53.33 percent [4]. Hendro Priyatman et al. conducted further research in 2019 using K-Means, the same clustering algorithm used in this study. Besides grouping, K-Means can also be used for prediction, using the clustering results to predict an object in the future. That study forecasted when students would graduate from a higher education institution. According to its results, the algorithm can display graduation information but cannot determine the time of graduation, because graduation depends on the students themselves. The iteration process was quite brief, taking only 2 iterations [5].

Other research addressed the same problem, determining cervical cancer, with classification techniques rather than the clustering technique used in this study. Siti Silvia Arifin et al. in 2021 used the SVM algorithm to classify cervical cancer into two classes, positive and negative. Of the 72 records, 59 were used as training data, with 4 of the 19 attributes. With an accuracy rate of 92.9%, the classification results are rated as very good, and the Python test results reached 87% [6]. A study published in 2020 by Andrian et al. compared two classification techniques for detecting cervical cancer, so that one method could be used as a pattern detector for cases of cervical cancer. Based on 214 test samples, the accuracy was 90.6% for KNN and 88.7% for Random Forest, so the KNN method was judged superior to the Random Forest method [7].

2. RESEARCH METHODOLOGY

2.1 Research Stages

Figure 1 below shows the stages of the research.

Figure 1. Research Stages

The stages in Figure 1 are explained as follows:

a. Problem Analysis: the researcher analyzes the problems that arise and carries out a literature review to make the following steps easier to prepare.

b. Data Collection: the dataset can be obtained from dataset repository websites, or through observation or interviews.

c. Preprocessing: inappropriate (ambiguous) data is corrected and missing values are filled in. Preprocessing is considered necessary before beginning the data mining process.

d. Implementation of the Clustering Algorithm: the clustering process is carried out with the K-Means algorithm and with the RapidMiner program.

e. Conclusion: the final results of clustering the cervical cancer data with the K-Means technique are summarized.

2.2 Data Mining

The phrase "data mining" refers to the process of discovering new knowledge in databases and using it as valuable information that can aid decision-making. Besides uncovering new information, data mining can be used to develop models and patterns suited to a given use. It can be applied to massive databases that need to be reorganized in order to find new knowledge, and it draws on machine learning, artificial intelligence, statistics, and mathematics [8–11]. More details can be seen in Figure 2.

Figure 2. Slices of Data Mining Disciplines


Data mining has now advanced to the point where it contributes significantly to solving problems. Based on hundreds or even thousands of data points, data mining can also be used to forecast future decisions in order to reduce the cost of a potentially negative outcome and to raise the intended profit [12], [13].

2.3 Clustering

Clustering is the process of grouping records so that each cluster contains identical or related items. Unlike classification and prediction, clustering does not use a target variable. Many clustering techniques can be used to group data, such as Fuzzy Subtractive, Fuzzy C-Means, K-Means, K-Medoids, DBSCAN, CLARANS, and Improved K-Means. Each has its benefits and drawbacks, but the purpose is the same: to place data in the same cluster based on the similarity (closest distance) of their attributes [14–16].

2.4 K-Means

K-Means is regarded as simple to use and understand. It groups data according to the nearest cluster average (mean). K-Means is said to be comparable to the Expectation-Maximization method in that it likewise starts from initial centroids and iterates to determine new clusters [17]. As an unsupervised clustering method, K-Means groups data into clusters whose members share characteristics and differ from the members of other clusters [2][3]. The steps of the K-Means algorithm are as follows [12], [18–22]:

1. Determine the number of clusters.

2. Calculate the cluster centers (centroids). In the first iteration, use randomly chosen data values as the initial centroids.

$$K_i = \frac{1}{M}\sum_{j=1}^{M} X_j \tag{1}$$

Here M is the number of data in cluster i and X_j is the j-th data in that cluster.

3. Calculate the distance from each data point to each centroid to find the shortest distance.

$$d_{Euclidean}(X, Y) = \sqrt{\sum_{i=1}^{n}(X_i - Y_i)^2} \tag{2}$$

Information:

d(X, Y) = distance between data X and cluster center Y

X_i = value of the i-th attribute of the data

Y_i = value of the i-th attribute of the cluster center

n = number of attributes

4. Assign each data point to the cluster whose centroid is closest.

5. Repeat the iteration using the new centroid values. Each new centroid is computed with equation (1) from the data assigned to that cluster. If the centroids move, iterate again; if the centroids remain stationary, the iteration process ends.
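The five steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names (`k_means`, `squared_distance`, `mean_point`), the fixed random seed, and the toy data are all hypothetical. The sketch compares squared distances, which selects the same nearest centroid as equation (2), and it stops when the assignments no longer change, which is equivalent to the centroids no longer moving.

```python
import random


def squared_distance(x, y):
    # Sum of squared attribute differences (equation (2) without the square root).
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def mean_point(points):
    # Equation (1): attribute-wise mean of the M points in one cluster.
    m = len(points)
    return [sum(column) / m for column in zip(*points)]


def k_means(data, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 2 (first iteration): randomly chosen data points as initial centroids.
    centroids = [list(p) for p in rng.sample(data, k)]
    assignment = None
    for _ in range(max_iter):
        # Steps 3 and 4: assign every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(x, centroids[c]))
            for x in data
        ]
        # Step 5: stop once the grouping no longer changes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recompute each centroid from its current members.
        for c in range(k):
            members = [x for x, a in zip(data, assignment) if a == c]
            if members:
                centroids[c] = mean_point(members)
    return assignment, centroids


# Hypothetical toy data: two well-separated groups of three points each.
toy = [[1, 1], [1, 2], [2, 1], [9, 9], [9, 10], [10, 9]]
labels, centers = k_means(toy, k=2)
print(labels, centers)
```

On the toy data the first three points end up in one cluster and the last three in the other; which of the two receives label 0 depends on the random initial centroids.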

3. RESULT AND DISCUSSION

Given the rise in cervical cancer deaths in Indonesia, this study applies the K-Means data mining technique to form two clusters (cluster 0 and cluster 1). The sample data is the Cervical Cancer Behavior Risk dataset, obtained from https://archive.ics.uci.edu/ml/datasets/Cervical+Cancer+Behavior+Risk, which contains 72 data points with 19 attributes. A sample of the research data is shown in Table 1 below:

Table 1. Samples of Cervical Cancer

Number C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 ... C18 C19

1 10 13 12 4 7 9 10 1 8 7 ... 11 8

2 10 11 11 10 14 7 7 5 5 4 ... 4 4

3 10 15 3 2 14 8 10 1 4 7 ... 3 15

4 10 11 10 10 15 7 7 1 5 4 ... 4 4

5 8 11 7 8 10 7 8 1 5 3 ... 4 7

6 10 14 8 6 15 8 10 1 3 4 ... 9 6

7 10 15 4 6 14 6 10 5 3 7 ... 3 5

8 8 12 9 10 10 5 10 5 5 5 ... 7 12

9 10 15 7 2 15 6 10 1 3 5 ... 15 15

10 7 15 7 6 11 8 8 5 3 3 ... 4 4

11 7 15 7 10 14 7 9 1 3 8 ... 3 9

12 10 15 8 9 15 7 10 1 3 7 ... 5 9


13 10 15 12 10 15 6 10 1 3 3 ... 6 11

14 9 12 14 9 15 10 9 3 6 3 ... 3 11

15 2 15 15 6 13 8 9 1 3 3 ... 7 3

... ... ... ... ... ... ... ... ... ... ... ... ... ...

71 9 12 13 10 13 6 6 5 14 13 ... 13 15

72 10 14 14 6 12 7 8 5 15 12 ... 15 15

In Table 1, the codes C1 to C19 stand for the following attributes: C1 is sexual risk behavior, C2 is eating behavior, C3 is personal hygiene behavior, C4 is intention aggregation, C5 is intention commitment, C6 is attitude consistency, C7 is attitude spontaneity, C8 is norm significant person, C9 is norm fulfillment, C10 is perception vulnerability, C11 is perception severity, C12 is motivation strength, C13 is motivation willingness, C14 is social support emotionality, C15 is social support appreciation, C16 is instrumental social support, C17 is empowerment knowledge, C18 is empowerment abilities, and C19 is empowerment desires.
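The code-to-attribute mapping can be kept as a small lookup table when working with the data. The sketch below is only an illustration: the dictionary name `ATTRIBUTES` and the variable names are hypothetical, and the sample values are the C1 to C10 entries of record 1 as printed in Table 1.

```python
# Codes used in Table 1 mapped to the attribute names listed in the text.
ATTRIBUTES = {
    "C1": "sexual risk behavior",
    "C2": "eating behavior",
    "C3": "personal hygiene behavior",
    "C4": "intention aggregation",
    "C5": "intention commitment",
    "C6": "attitude consistency",
    "C7": "attitude spontaneity",
    "C8": "norm significant person",
    "C9": "norm fulfillment",
    "C10": "perception vulnerability",
    "C11": "perception severity",
    "C12": "motivation strength",
    "C13": "motivation willingness",
    "C14": "social support emotionality",
    "C15": "social support appreciation",
    "C16": "instrumental social support",
    "C17": "empowerment knowledge",
    "C18": "empowerment abilities",
    "C19": "empowerment desires",
}

# C1..C10 of record 1, copied from Table 1.
record_1_first_ten = [10, 13, 12, 4, 7, 9, 10, 1, 8, 7]

# Pair each value with its attribute name for readability.
codes = [f"C{i}" for i in range(1, 11)]
labelled = {ATTRIBUTES[code]: value for code, value in zip(codes, record_1_first_ten)}
print(labelled)
```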

3.1 Implementation of the K-Means Algorithm

Iteration 1:

1. The number of clusters is K=2 (K1 and K2).

2. Determine the initial cluster centers (centroids).

Table 2. Initial Cluster Center (Initial Centroid)

Attribute Data 15 (K1) Data 45 (K2)

C1 2 10

C2 15 11

C3 15 14

C4 6 10

C5 13 15

C6 8 10

C7 9 10

C8 1 5

C9 3 15

C10 3 14

C11 4 10

C12 15 15

C13 3 9

C14 7 9

C15 6 4

C16 7 3

C17 7 14

C18 7 11

C19 3 15

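3. Calculate the distance between each data point and each cluster center using the Euclidean distance.

$$d_{Euclidean}(X, Y) = \sqrt{\sum_{i=1}^{n}(X_i - Y_i)^2} \tag{2}$$

The worked values below are the sums of squared differences, that is, the squared Euclidean distance without the final square root. Omitting the square root does not change which centroid is nearest.

Data 1:

$$\begin{aligned}
K_1 ={}& (10-2)^2 + (13-15)^2 + (12-15)^2 + (4-6)^2 + (7-13)^2 + (9-8)^2 + (10-9)^2 \\
& + (1-1)^2 + (8-3)^2 + (7-3)^2 + (3-4)^2 + (14-15)^2 + (8-3)^2 + (5-7)^2 \\
& + (7-6)^2 + (12-7)^2 + (12-7)^2 + (11-7)^2 + (8-3)^2 = 283
\end{aligned}$$

$$\begin{aligned}
K_2 ={}& (10-10)^2 + (13-11)^2 + (12-14)^2 + (4-10)^2 + (7-15)^2 + (9-10)^2 + (10-10)^2 \\
& + (1-5)^2 + (8-15)^2 + (7-14)^2 + (3-10)^2 + (14-15)^2 + (8-9)^2 + (5-9)^2 \\
& + (7-4)^2 + (12-3)^2 + (12-14)^2 + (11-11)^2 + (8-15)^2 = 433
\end{aligned}$$

Data 2:

$$\begin{aligned}
K_1 ={}& (10-2)^2 + (11-15)^2 + (11-15)^2 + (10-6)^2 + (14-13)^2 + (7-8)^2 + (7-9)^2 \\
& + (5-1)^2 + (5-3)^2 + (4-3)^2 + (2-4)^2 + (15-15)^2 + (13-3)^2 + (7-7)^2 \\
& + (6-6)^2 + (5-7)^2 + (5-7)^2 + (4-7)^2 + (4-3)^2 = 261
\end{aligned}$$

$$\begin{aligned}
K_2 ={}& (10-10)^2 + (11-11)^2 + (11-14)^2 + (10-10)^2 + (14-15)^2 + (7-10)^2 + (7-10)^2 \\
& + (5-5)^2 + (5-15)^2 + (4-14)^2 + (2-10)^2 + (15-15)^2 + (13-9)^2 + (7-9)^2 \\
& + (6-4)^2 + (5-3)^2 + (5-14)^2 + (4-11)^2 + (4-15)^2 = 571
\end{aligned}$$

The K1 sums for data 1 and data 2 evaluate to 283 and 261, whereas Table 3 lists 227 and 205 for the same entries. Both data points are closest to K1 in either case.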
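The arithmetic for data 1 and data 2 can be checked with a short script. The vectors below are read off the terms of the worked calculation (data values on the left of each difference, centroid values on the right); the function and variable names are hypothetical. The script evaluates the sum of squared differences, which is what the K2 figures of 433 and 571 correspond to. For K1 it gives 283 and 261, while Table 3 lists 227 and 205; K1 remains the nearer centroid for both records either way.

```python
import math


def squared_distance(x, y):
    # Sum of squared attribute differences between a record and a centroid.
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


# Attribute values C1..C19, taken from the terms of the worked calculation.
data_1 = [10, 13, 12, 4, 7, 9, 10, 1, 8, 7, 3, 14, 8, 5, 7, 12, 12, 11, 8]
data_2 = [10, 11, 11, 10, 14, 7, 7, 5, 5, 4, 2, 15, 13, 7, 6, 5, 5, 4, 4]
# Initial centroids from Table 2 (data 15 and data 45).
k1 = [2, 15, 15, 6, 13, 8, 9, 1, 3, 3, 4, 15, 3, 7, 6, 7, 7, 7, 3]
k2 = [10, 11, 14, 10, 15, 10, 10, 5, 15, 14, 10, 15, 9, 9, 4, 3, 14, 11, 15]

results = {
    "data 1": (squared_distance(data_1, k1), squared_distance(data_1, k2)),
    "data 2": (squared_distance(data_2, k1), squared_distance(data_2, k2)),
}

for name, (to_k1, to_k2) in results.items():
    nearest = "K1" if to_k1 <= to_k2 else "K2"
    # math.sqrt turns the squared value into the Euclidean distance of equation (2).
    print(name, to_k1, to_k2, round(math.sqrt(to_k1), 2), round(math.sqrt(to_k2), 2), nearest)
```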

Calculate the distances for the 3rd to the 72nd data in the same way as in the steps above. The resulting shortest distances for all 72 data are shown in Table 3 below.

Table 3. Shortest Distance (Iteration 1)

Data K1 K2 Closest Distance Cluster

1 227 433 227 K1

2 205 571 205 K1

3 463 845 463 K1

4 215 605 215 K1

5 183 725 183 K1

6 159 501 159 K1

7 460 952 460 K1

8 313 465 313 K1

9 645 735 645 K1

... ... ... ... ... ...

62 620 216 216 K2

63 330 456 330 K1

64 268 412 268 K1

65 547 227 227 K2

66 455 201 201 K2

67 472 120 120 K2

68 565 187 187 K2

69 719 163 163 K2

70 453 261 261 K2

71 648 168 168 K2

72 733 259 259 K2

4. Assign each data point to the cluster whose centroid is closest.

5. Repeat the iteration using the new centroid values. Each new centroid is computed with equation (1) from the data assigned to that cluster. If the centroids move, iterate again; if the centroids do not move, the iteration process stops.

$$K_i = \frac{1}{M}\sum_{j=1}^{M} X_j$$

$$\begin{aligned}
K_1(C_1) = \frac{1}{45}(&10 + 10 + 10 + 10 + 8 + 10 + 10 + 8 + 10 + 7 + 7 + 10 + 10 + 9 + 2 \\
& + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 \\
& + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 \\
& + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10) = 9.578
\end{aligned}$$

$$\begin{aligned}
K_1(C_2) = \frac{1}{45}(&13 + 11 + 15 + 11 + 11 + 14 + 15 + 12 + 15 + 15 + 15 + 15 + 15 + 12 + 15 \\
& + 15 + 15 + 12 + 11 + 12 + 15 + 12 + 13 + 15 + 13 + 15 + 11 + 14 + 8 + 15 \\
& + 10 + 11 + 15 + 3 + 15 + 10 + 9 + 14 + 12 + 15 + 11 + 13 + 12 + 13 + 13) = 12.800
\end{aligned}$$
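The two centroid components above can be reproduced directly from the listed terms. This is a small check, with hypothetical variable and function names, that applies equation (1) to the 45 values of C1 and C2 for the data assigned to K1 in iteration 1.

```python
def centroid_component(values):
    # Equation (1): the mean of one attribute over all members of a cluster.
    return sum(values) / len(values)


# C1 values of the 45 data assigned to K1: the first 15 as listed, then thirty 10s.
c1_values = [10, 10, 10, 10, 8, 10, 10, 8, 10, 7, 7, 10, 10, 9, 2] + [10] * 30

# C2 values of the same 45 data, in the order listed.
c2_values = [
    13, 11, 15, 11, 11, 14, 15, 12, 15, 15, 15, 15, 15, 12, 15,
    15, 15, 12, 11, 12, 15, 12, 13, 15, 13, 15, 11, 14, 8, 15,
    10, 11, 15, 3, 15, 10, 9, 14, 12, 15, 11, 13, 12, 13, 13,
]

k1_c1 = centroid_component(c1_values)
k1_c2 = centroid_component(c2_values)
print(round(k1_c1, 3), round(k1_c2, 3))  # 9.578 12.8
```

The sums are 431 and 576, so dividing by 45 gives the values 9.578 and 12.800 that appear in the first two rows of Table 4.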

Compute the remaining components of centroid 1 (K1) up to attribute C19, and then those of centroid 2 (K2), in the same way. Table 4 shows the resulting new centroids:

Table 4. Initial Cluster Center (Iteration 2)

Attribute K1 K2

C1 9.578 9.815

C2 12.800 12.778

C3 10.444 12.148

C4 7.689 8.259

C5 13.222 13.556

C6 7.089 7.333

C7 8.711 8.444

C8 2.378 4.370

C9 5.778 13.000

C10 5.800 13.037

C11 3.467 8.593

C12 11.822 14.037

C13 8.667 11.407

C14 7.378 9.296

C15 5.933 6.556

C16 10.000 11.000

C17 9.156 12.852

C18 8.111 11.333

C19 9.200 12.074

The cluster centers in Table 4 are used as the initial centroids for the 2nd iteration, in which the closest distances are calculated as in iteration 1. The process repeats as long as the cluster grouping changes, and it stops when the grouping of an iteration is the same as that of the previous iteration.

3.2 Using Rapidminer to evaluate the K-Means algorithm

Figure 3. Input Sample Data File

As shown in Figure 3, enter the Cervical Cancer Behavior Risk data into the RapidMiner application.

Connect the data and run the process to check whether the data needs to be adjusted (preprocessing) because of missing or ambiguous values. If a value cannot be estimated from other data, delete that data. The data is then preprocessed.

Figure 4. Data Preprocessing

Figure 4 shows that there are no missing or unclear data points, so no preprocessing is needed and the data can be used right away in the clustering procedure for the Cervical Cancer Behavior Risk. The operators and input data used in the RapidMiner test are shown below.


Figure 5. Clustering Operator Input

Based on Figure 5, before carrying out the clustering process the researcher determined the most effective number of clusters by running three K-Means clustering operators (k=2, k=4, and k=6). The Multiply operator connects the input data to the three clustering operators, as in Figure 5, and a Performance operator on each shows which number of clusters fits best. Connect each of the three Performance operators to the clustering operator with the matching parameter: k=2 (clustering k=2), k=4 (clustering k=4), and k=6 (clustering k=6). For all three Performance operators (k=2, k=4, and k=6), select Davies Bouldin in the main criterion section. After all operators are connected, run the process; the performance results can be seen in the following figure.

Figure 6. Performance for each number of clusters

Figure 6 shows that 2 clusters suit this study best: the Davies Bouldin value for 2 clusters is -1.413, while for 4 clusters it is -1.421 and for 6 clusters it is -1.570, the smallest of the three. The K-Means algorithm for grouping the Cervical Cancer Behavior Risk data is therefore applied with 2 clusters (cluster yes and cluster no).
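For readers unfamiliar with the criterion, the sketch below computes the Davies-Bouldin index in its usual textbook form: for each cluster, take the worst ratio of combined within-cluster scatter to between-centroid distance, then average over clusters. It is an illustration with hypothetical function names and toy data, not RapidMiner's implementation. Under this definition the index is non-negative and lower values indicate more compact, better separated clusters, whereas the values in Figure 6 are reported with a negative sign.

```python
import math


def centroid(points):
    # Attribute-wise mean of the points in one cluster.
    return [sum(column) / len(points) for column in zip(*points)]


def davies_bouldin(clusters):
    centers = [centroid(c) for c in clusters]
    # Scatter: average distance of a cluster's points to its own centroid.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Toy data: the same four points grouped in a sensible and in a poor way.
tight = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
loose = [[(0, 0), (10, 0)], [(0, 2), (10, 2)]]

db_tight = davies_bouldin(tight)
db_loose = davies_bouldin(loose)
print(db_tight, db_loose)
```

In the toy example the sensible grouping scores 0.2 and the poor grouping scores 5.0, so the criterion prefers the grouping whose clusters are compact and far apart.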

Figure 7. Cluster Models

Figure 7 shows that 28 data are grouped into cluster 0 and 44 data into cluster 1. The data in each cluster are detailed in Table 5 below.

Table 5. Cluster Grouping

Number C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 ... C18 C19 Cluster

1 10 13 12 4 7 9 10 1 8 7 ... 11 8 cluster 1

2 10 11 11 10 14 7 7 5 5 4 ... 4 4 cluster 0

3 10 15 3 2 14 8 10 1 4 7 ... 3 15 cluster 0


4 10 11 10 10 15 7 7 1 5 4 ... 4 4 cluster 0

5 8 11 7 8 10 7 8 1 5 3 ... 4 7 cluster 0

6 10 14 8 6 15 8 10 1 3 4 ... 9 6 cluster 0

7 10 15 4 6 14 6 10 5 3 7 ... 3 5 cluster 0

8 8 12 9 10 10 5 10 5 5 5 ... 7 12 cluster 0

9 10 15 7 2 15 6 10 1 3 5 ... 15 15 cluster 1

10 7 15 7 6 11 8 8 5 3 3 ... 4 4 cluster 0

11 7 15 7 10 14 7 9 1 3 8 ... 3 9 cluster 0

12 10 15 8 9 15 7 10 1 3 7 ... 5 9 cluster 0

13 10 15 12 10 15 6 10 1 3 3 ... 6 11 cluster 0

14 9 12 14 9 15 10 9 3 6 3 ... 3 11 cluster 1

15 2 15 15 6 13 8 9 1 3 3 ... 7 3 cluster 0

... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

71 9 12 13 10 13 6 6 5 14 13 ... 13 15 cluster 1

72 10 14 14 6 12 7 8 5 15 12 ... 15 15 cluster 1

In Table 5, cluster 0 contains the data grouped as cervical cancer sufferers, and cluster 1 contains the data grouped as not suffering from cervical cancer. The data grouped into cluster 0 are (2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 15, 16, 17, 18, 19, 21, 30, 42, 50, 53, 58, 59, 60, 61, 63, 64, 66), while the data grouped into cluster 1 are (1, 9, 14, 20, 22, 23, 24, 25, 26, 27, 28, 29, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 43, 44, 45, 46, 47, 48, 49, 51, 52, 54, 55, 56, 57, 62, 65, 67, 68, 69, 70, 71 and 72).
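The two membership lists can be cross-checked against the cluster sizes reported for Figure 7. The short script below, with hypothetical variable names, confirms that the lists contain 28 and 44 data, do not overlap, and together cover all 72 data.

```python
# Data numbers per cluster, as listed in the text.
cluster_0 = [
    2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 15, 16, 17, 18, 19, 21,
    30, 42, 50, 53, 58, 59, 60, 61, 63, 64, 66,
]
cluster_1 = [
    1, 9, 14, 20, 22, 23, 24, 25, 26, 27, 28, 29, 31, 32, 33, 34, 35,
    36, 37, 38, 39, 40, 41, 43, 44, 45, 46, 47, 48, 49, 51, 52, 54,
    55, 56, 57, 62, 65, 67, 68, 69, 70, 71, 72,
]

sizes = (len(cluster_0), len(cluster_1))
overlap = set(cluster_0) & set(cluster_1)
covered = set(cluster_0) | set(cluster_1)

print(sizes)                            # (28, 44)
print(overlap)                          # set()
print(covered == set(range(1, 73)))     # True
```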

Figure 8. Diagram of Each Cluster

Figure 8 shows that cluster 1 contains more data: of the 72 data on cervical cancer, only 28 are declared as cervical cancer sufferers and the other 44 are not.

4. CONCLUSION

The following conclusions can be drawn from applying clustering data mining techniques to cervical cancer data with 72 data and 19 attributes. The data is grouped into 2 clusters using the K-Means algorithm, and the algorithm was tested using the RapidMiner application. The clustering assigned 28 data to cluster 0, declared as cervical cancer sufferers, and 44 data to cluster 1, declared as not suffering from cervical cancer.

REFERENCES

[1] N. A. Wantini, N. Indrayani, “Deteksi dini kanker serviks dengan inspeksi visual asam asetat (IVA)”, Jurnal Ners dan Kebidanan (Journal of Ners and Midwifery), vol. 6, no. 1, pp. 27–34, 2019.

[2] M. Marsono, D. Saripurna, M. Zunaidi, “Analisis Data Mining Pada Strategi Penjualan Produk PT Aquasolve Sanaria Dengan Menggunakan Metode K-Means Clustering”, J-SISKO TECH (Jurnal Teknologi Sistem Informasi dan Sistem Komputer TGD), vol. 4, no. 1, pp. 127, 2021, doi:10.53513/jsk.v4i1.60.

[3] B. D. Mudzakkir, “Pengelompokan Data Penjualan Produk Pada Pt Advanta Seeds Indonesia Menggunakan Metode K- Means”, Jurnal Mahasiswa Teknik Informatika, vol. 2, no. 2, pp. 34–40, 2018.


[4] R. Muliono, Z. Sembiring, “Data Mining Clustering Menggunakan Algoritma K-Means Untuk Klasterisasi Tingkat Tridarma Pengajaran Dosen”, CESS (Journal of Computer Engineering, System and Science), vol. 4, no. 2, pp. 272–279, 2019.

[5] H. Priyatman, F. Sajid, D. Haldivany, “Klasterisasi Menggunakan Algoritma K-Means Clustering untuk Memprediksi Waktu Kelulusan Mahasiswa”, Jurnal Edukasi Dan Penelitian Informatika (JEPIN), vol. 5, no. 1, pp. 62, 2019.

[6] S. S. Arifin, A. M. Siregar, T. Al Mudzakir, “Klasifikasi Penyakit Kanker Serviks Menggunakan Algoritma Support Vector Machine (SVM)”, Conference on Innovation and Application of Science and Technology (CIASTECH), pp. 521–528, 2021.

[7] E. S. Salim et al., “Analisa Metode Random Forest Tree dan K-Nearest Neighbor dalam Mendeteksi Kanker Serviks”, Jurnal Ilmu Komputer Dan Sistem Informasi (JIKOMSI), vol. 3, no. 2, pp. 97–101, 2020.

[8] M. Jamaris, “Implementasi Metode Rough Set Untuk Menentukan Kelayakan Bantuan Dana Hibah Fasilitas Rumah Ibadah”, INOVTEK Polbeng - Seri Informatika, vol. 2, no. 2, pp. 161, 2017, doi:10.35314/isi.v2i2.203.

[9] S. Al Syahdan, A. Sindar, “Data Mining Penjualan Produk Dengan Metode Apriori Pada Indomaret Galang Kota”, Jurnal Nasional Komputasi dan Teknologi Informasi (JNKTI), vol. 1, no. 2, 2018, doi:10.32672/jnkti.v1i2.771.

[10] H. Juliansa, S. Defit, S. Sumijan, “Identifikaasi Tingkat Kerusakan Peralatan Laboratorium Komputer Menggunakan Metode Rough Set”, Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi), vol. 2, no. 1, pp. 410–415, 2018, doi:10.29207/resti.v2i1.274.

[11] E. Buulolo, Data Mining Untuk Perguruan Tinggi, Deepublish, 2020.

[12] Dewi Eka Putri, Eka Praja Wiyata Mandala, “Hybrid Data Mining berdasarkan Klasterisasi Produk untuk Klasifikasi Penjualan”, Jurnal KomtekInfo, vol. 9, pp. 68–73, 2022, doi:10.35134/komtekinfo.v9i2.279.

[13] S. M. Dewi, A. P. Windarto, D. Hartama, “Penerapan Datamining Dengan Metode Klasifikasi Untuk Strategi Penjualan Produk Di Ud.Selamat Selular”, KOMIK (Konferensi Nasional Teknologi Informasi dan Komputer), vol. 3, no. 1, pp. 617–621, 2019, doi:10.30865/komik.v3i1.1669.

[14] S. Sindi et al., “Analisis algoritma k-medoids clustering dalam pengelompokan penyebaran covid-19 di indonesia”, (JurTI) Jurnal Teknologi Informasi, vol. 4, no. 1, pp. 166–173, 2020.

[15] D. A. I. C. Dewi, D. A. K. Pramita, “Analisis Perbandingan Metode Elbow dan Silhouette pada Algoritma Clustering K-Medoids dalam Pengelompokan Produksi Kerajinan Bali”, Matrix : Jurnal Manajemen Teknologi dan Informatika, vol. 9, no. 3, pp. 102–109, 2019, doi:10.31940/matrix.v9i3.1662.

[16] A. N. Fadhilah, A. Jananto, “KLASTERISASI LITERATUR MAHASISWA MENGGUNAKAN METODE AHC DI DINAS KEARSIPAN DAN PERPUSTAKAAN PROVINSI JAWA TENGAH”, 2021.

[17] Y. Syahra, “Penerapan Data Mining Dalam Pengelompokkan Data Nilai Siswa Untuk Penentuan Jurusan Siswa Pada SMA Tamora Menggunakan Algoritma K-Means Clustering”, Jurnal SAINTIKOM (Jurnal Sains Manajemen Informatika dan Komputer), vol. 17, no. 2, pp. 228, 2018, doi:10.53513/jis.v17i2.70.

[18] . F., F. T. Kesuma, S. P. Tamba, “Penerapan Data Mining Untuk Menentukan Penjualan Sparepart Toyota Dengan Metode K-Means Clustering”, Jurnal Sistem Informasi dan Ilmu Komputer Prima(JUSIKOM PRIMA), vol. 2, no. 2, pp. 67–72, 2020, doi:10.34012/jusikom.v2i2.376.

[19] S. A. Rahmah, “KLASTERISASI POLA PENJUALAN PESTISIDA MENGGUNAKAN METODE K-MEANS CLUSTERING ( STUDI KASUS DI TOKO JUANDA TANI KECAMATAN HUTABAYU RAJA )”, vol. 1, no. 1, pp. 1–5, 2020.

[20] I. Nasution, A. P. Windarto, M. Fauzan, “Penerapan Algoritma K-Means Dalam Pengelompokan Data Penduduk Miskin Menurut Provinsi”, Building of Informatics, Technology and Science (BITS), vol. 2, no. 2, pp. 76–83, 2020.

[21] A. Nursia, W. Ramdhan, W. M. Kifti, “Analisis Kelayakan Penerima Bantuan Covid-19 Menggunakan Metode K – Means”, Building of Informatics, Technology and Science (BITS), vol. 3, no. 4, pp. 574–583, 2022, doi:10.47065/bits.v3i4.1399.

[22] Z. K. A. B. Kurnia Drajat Wibowo, “Movie Recommendation System Using Knowledge-Based Filtering and K-Means Clustering”, Building of Informatics, Technology and Science (BITS), vol. 3, no. 4, pp. 460–465, 2022, doi:10.47065/bits.v3i4.1236.