To improve this research, the following suggestions are proposed as next steps:
1. Apply the point center k-means algorithm to different research objects, for example biomedical datasets or other datasets.
2. Compare the results of the point center k-means algorithm with those of other clustering algorithms.
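As an illustration of suggestion 2, such a comparison could be set up along these lines in R. This is only a sketch, not part of the thesis experiments: it uses R's built-in iris data and hierarchical clustering (hclust) as an example alternative algorithm; the point center k-means initialization would replace the plain kmeans call.

```r
# Sketch: compare k-means against hierarchical clustering on the iris data.
features <- iris[, 1:4]                           # numeric attributes only

km <- kmeans(features, centers = 3, nstart = 10)  # baseline k-means
hc <- cutree(hclust(dist(features), method = "ward.D2"), k = 3)

# Cross-tabulate each clustering against the known species labels
table(km$cluster, iris$Species)
table(hc, iris$Species)
```

The cross-tabulations make the two algorithms directly comparable on how well their clusters align with the known classes.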
Aggarwal, C. C. (2015). Data Mining: The Textbook. Springer International Publishing. https://doi.org/10.1007/978-3-319-14142-8
Aggarwal, C. C., & Yu, P. S. (2008). Privacy-Preserving Data Mining: Models and Algorithms. Security, Privacy and Trust in Modern Data Management. https://doi.org/10.1145/335191.335438
Arar, Ö. F., & Ayan, K. (2015). Software defect prediction using cost-sensitive neural network. Applied Soft Computing, 33, 263–277. https://doi.org/10.1016/j.asoc.2015.04.045
Arora, I., Tetarwal, V., & Saha, A. (2015). Open issues in software defect prediction. Procedia Computer Science, 46, 906–912. https://doi.org/10.1016/j.procs.2015.02.161
Azzalini, A., & Scarpa, B. (2012). Data Analysis and Data Mining: An Introduction.
Berndtsson, M., Hansson, J., Olsson, B., & Lundell, B. (2008). Thesis Projects: A Guide for Students in Computer Science and Information Systems. Springer. https://doi.org/10.1007/978-1-84800-009-4
Berry, M. J. A., & Linoff, G. S. (2004). Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management. Retrieved from http://portal.acm.org/citation.cfm?id=983642
Bishnu, P. S., & Bhattacherjee, V. (2012). Software fault prediction using quad tree-based K-means clustering algorithm. IEEE Transactions on Knowledge and Data Engineering, 24(6), 1146–1150. https://doi.org/10.1109/TKDE.2011.163
Celebi, M. E., Kingravi, H. A., & Vela, P. A. (2013). A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Systems with Applications, 40(1), 200–210.
Cleuziou, G. (2008). An extended version of the k-means method for overlapping clustering. 2008 19th International Conference on Pattern Recognition, 1–4. https://doi.org/10.1109/ICPR.2008.4761079
Dawson, C. W. (2009). Projects in Computing and Information Systems: A Student's Guide. Retrieved from http://www.sentimentaltoday.net/National_Academy_Press/0321263553.Addison.Wesley.Publishing.Company.Projects.in.Computing.and.Information.Systems.A.Students.Guide.Jun.2005.pdf
Dean, J. (2014). Big Data, Data Mining, and Machine Learning. Canada: SAS Institute Inc.
Duwairi, R., & Abu-Rahmeh, M. (2015). A novel approach for initializing the spherical K-means clustering algorithm. Simulation Modelling Practice and Theory, 54, 49–63. https://doi.org/10.1016/j.simpat.2015.03.007
Erisoglu, M., Calis, N., & Sakallioglu, S. (2011). A new algorithm for initial cluster centers in k-means algorithm. Pattern Recognition Letters, 32(14), 1701–1705. https://doi.org/10.1016/j.patrec.2011.07.011
Huang, F., & Liu, B. (2017). Software defect prevention based on human error theories. Chinese Journal of Aeronautics, 30(3), 1054–1070. https://doi.org/10.1016/j.cja.2017.03.005
Irsoy, O., Yildiz, O. T., & Alpaydin, E. (2012). Design and analysis of classifier learning experiments in bioinformatics: Survey and case studies. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(6), 1663–1675. https://doi.org/10.1109/TCBB.2012.117
Khanmohammadi, S., Adibeig, N., & Shanehbandy, S. (2017). An improved overlapping k-means clustering method for medical applications. Expert Systems with Applications, 67, 12–18. https://doi.org/10.1016/j.eswa.2016.09.025
Kumar, K. M., & Reddy, A. R. M. (2017). An efficient k-means clustering filtering algorithm using density based initial cluster centers. Information Sciences. https://doi.org/10.1016/j.ins.2017.07.036
Larose, D. T. (2005). Discovering Knowledge in Data: An Introduction to Data Mining. John Wiley & Sons, Inc.
Li, W., Huang, Z., & Li, Q. (2015). Three-way decisions based software defect prediction. Knowledge-Based Systems, 91, 263–274. https://doi.org/10.1016/j.knosys.2015.09.035
Maimon, O., & Rokach, L. (2010). Data Mining and Knowledge Discovery Handbook. New York: Springer. https://doi.org/10.1007/0-387-25465-x_2
Mesquita, D. P. P., Rocha, L. S., Gomes, J. P. P., & Rocha Neto, A. R. (2016). Classification with reject option for software defect prediction. Applied Soft Computing Journal, 49, 1085–1093. https://doi.org/10.1016/j.asoc.2016.06.023
Moeyersoms, J., Junqué De Fortuny, E., Dejaeger, K., Baesens, B., & Martens, D. (2015). Comprehensible software fault and effort prediction: A data mining approach. Journal of Systems and Software, 100, 80–90. https://doi.org/10.1016/j.jss.2014.10.032
Naldi, M. C., & Campello, R. J. G. B. (2014). Evolutionary k-means for distributed data sets. Neurocomputing, 127, 30–42. https://doi.org/10.1016/j.neucom.2013.05.046
Nidheesh, N., Abdul Nazeer, K. A., & Ameer, P. M. (2017). An enhanced deterministic K-Means clustering algorithm for cancer subtype prediction from gene expression data. Computers in Biology and Medicine, 91, 213–221. https://doi.org/10.1016/j.compbiomed.2017.10.014
Rahman, M. A., & Islam, M. Z. (2014). A hybrid clustering technique combining a novel genetic algorithm with K-Means. Knowledge-Based Systems, 71, 345–365. https://doi.org/10.1016/j.knosys.2014.08.011
Rahman, M. A., Islam, M. Z., & Bossomaier, T. (2015). ModEx and Seed-Detective: Two novel techniques for high quality clustering by using good initial seeds in K-Means. Journal of King Saud University - Computer and Information Sciences, 27(2), 113–128. https://doi.org/10.1016/j.jksuci.2014.04.002
Reddy, D., & Jana, P. K. (2012). Initialization for K-means Clustering using Voronoi Diagram. Procedia Technology, 4, 395–400. https://doi.org/10.1016/j.protcy.2012.05.061
Siavvas, M. G., Chatzidimitriou, K. C., & Symeonidis, A. L. (2017). QATCH - An adaptive framework for software product quality assessment. Expert Systems with Applications, 86, 350–366. https://doi.org/10.1016/j.eswa.2017.05.060
Usman, G., Ahmad, U., & Ahmad, M. (2013). Improved K-means clustering algorithm by getting initial centroids. World Applied Sciences Journal, 27(4), 543–551. https://doi.org/10.5829/idosi.wasj.2013.27.04.1142
Wahono, R. S. (2015). A Systematic Literature Review of Software Defect Prediction: Research Trends, Datasets, Methods and Frameworks. Journal of Software Engineering, 1(1), 1–16.
Witten, I. H., Frank, E., & Hall, M. A. (2011). Data Mining: Practical Machine Learning Tools and Techniques (3rd ed.).
Wu, X., Kumar, V., Quinlan, J. R., Ghosh, J., Yang, Q., Motoda, H., … Steinberg, D. (2008). Top 10 algorithms in data mining. Knowledge and Information Systems, 14. https://doi.org/10.1007/s10115-007-0114-2
Žalik, K. R. (2008). An efficient k′-means clustering algorithm. Pattern Recognition Letters, 29(9), 1385–1391. https://doi.org/10.1016/j.patrec.2008.02.014
A. Source code for the k-means method using R

1. Dataset Iris
library(readxl)
iris <- read_excel("D:/Nusa Mandiri/Data Mining/Tugas/iris.xls", range = "B1:F151")
View(iris)
iris.features = iris
iris.features$X__1 <- NULL
View(iris.features)
result <- kmeans(iris.features, 3)
result
table(result$cluster, iris$X__1)

2. Dataset NASA MDP PC1
PC1 <- read_excel("D:/Nusa Mandiri/Dataset/PC1.xls", range = "A1:AL760")
View(PC1)
PC1.features = PC1
PC1.features$Defective <- NULL
View(PC1.features)
result <- kmeans(PC1.features, 2)
result
table(result$cluster, PC1$Defective)

3. Dataset NASA MDP PC2
PC2 <- read_excel("D:/Nusa Mandiri/Dataset/PC2.xls", range = "A1:AK1586")
View(PC2)
PC2.features = PC2
PC2.features$Defective <- NULL
View(PC2.features)
result <- kmeans(PC2.features, 2)
result
table(result$cluster, PC2$Defective)

4. Dataset NASA MDP PC3
PC3 <- read_excel("D:/Nusa Mandiri/Dataset/PC3.xls", range = "A1:AL1126")
View(PC3)
PC3.features = PC3
PC3.features$Defective <- NULL
View(PC3.features)
result <- kmeans(PC3.features, 2)
result

5. Dataset NASA MDP PC4
PC4 <- read_excel("D:/Nusa Mandiri/Dataset/PC4.xls", range = "A1:AL1400")
View(PC4)
PC4.features = PC4
PC4.features$Defective <- NULL
View(PC4.features)
result <- kmeans(PC4.features, 2)
result
table(result$cluster, PC4$Defective)

6. Dataset NASA MDP MW1
MW1 <- read_excel("D:/Nusa Mandiri/Dataset/MW1.xls", range = "A1:AL265")
View(MW1)
MW1.features = MW1
MW1.features$Defective <- NULL
View(MW1.features)
result <- kmeans(MW1.features, 2)
result
table(result$cluster, MW1$Defective)

7. Dataset NASA MDP CM1
CM1 <- read_excel("D:/Nusa Mandiri/Dataset/CM1.xls", range = "A1:AL345")
View(CM1)
CM1.features = CM1
CM1.features$Defective <- NULL
View(CM1.features)
result <- kmeans(CM1.features, 2)
result
table(result$cluster, CM1$Defective)

8. Dataset NASA MDP KC1
KC1 <- read_excel("D:/Nusa Mandiri/Dataset/KC1.xls", range = "A1:V2097")
View(KC1)
KC1.features = KC1
KC1.features$Defective <- NULL
View(KC1.features)
result <- kmeans(KC1.features, 2)
result
table(result$cluster, KC1$Defective)

9. Dataset NASA MDP KC3
KC3 <- read_excel("D:/Nusa Mandiri/Dataset/KC3.xls", range = "A1:AN201")
View(KC3)
KC3.features = KC3
KC3.features$Defective <- NULL
result <- kmeans(KC3.features, 2)
result
table(result$cluster, KC3$Defective)

10. Dataset NASA MDP MC2
MC2 <- read_excel("D:/Nusa Mandiri/Dataset/MC2.xls", range = "A1:AN128")
View(MC2)
MC2.features = MC2
MC2.features$Defective <- NULL
View(MC2.features)
result <- kmeans(MC2.features, 2)
result
table(result$cluster, MC2$Defective)
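Each table(result$cluster, ...) call above yields a cross-tabulation of cluster labels against the Defective flag. As a small illustration (the cluster_accuracy helper below is not part of the original appendix), such a table can be turned into an accuracy figure by assigning each cluster to its majority class:

```r
# Hypothetical helper: accuracy of a clustering treated as a classifier,
# assigning each cluster to its majority true class.
cluster_accuracy <- function(clusters, labels) {
  tab <- table(clusters, labels)
  sum(apply(tab, 1, max)) / sum(tab)  # majority class count per cluster
}

# Example with dummy data standing in for result$cluster and PC1$Defective
set.seed(1)
cl  <- sample(1:2, 20, replace = TRUE)
lab <- ifelse(cl == 1, "N", "Y")      # labels perfectly aligned with clusters
cluster_accuracy(cl, lab)             # 1 for this toy example
```

The same helper would apply unchanged to any of the dataset runs above, e.g. cluster_accuracy(result$cluster, PC1$Defective).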
B. Source code for the proposed method using R

1. Dataset Iris
library(readxl)
iris <- read_excel("D:/Nusa Mandiri/Data Mining/Tugas/iris.xls", range = "B1:F151")
head(iris)
iris_var = data.frame(iris$petal.width, iris$petal.length)
plot(iris_var, col = iris$variety)
start <- matrix(c(2, 0.2, 2.3, 6.4, 1, 6.9), nrow = 3, ncol = 2)
start
result <- kmeans(iris_var, start)
result

2. Dataset NASA MDP PC1
PC1 <- read_excel("D:/Nusa Mandiri/Dataset/PC1.xls", range = "A1:AL760")
View(PC1)
PC1_var = data.frame(PC1$HALSTEAD_LEVEL, PC1$HALSTEAD_EFFORT)
start <- matrix(c(0, 0.5, 4279633.01, 56.150), nrow = 2, ncol = 2)
start
result <- kmeans(PC1_var, start)
result
table(result$cluster, PC1$Defective)
plot(PC1_var, col = result$cluster)
points(result$centers, col = 1:4, pch = 4, cex = 2)

3. Dataset NASA MDP PC2
library(readxl)
PC2 <- read_excel("D:/Nusa Mandiri/Dataset/PC2.xls", range = "A1:AK1586")
View(PC2)
PC2_var = data.frame(PC2$HALSTEAD_ERROR_EST, PC2$HALSTEAD_EFFORT)
start <- matrix(c(5.52, 0, 882628.7, 23.22), nrow = 2, ncol = 2)
start
result <- kmeans(PC2_var, start)
result

4. Dataset NASA MDP PC3
PC3 <- read_excel("D:/Nusa Mandiri/Dataset/PC3.xls", range = "A1:AL1126")
View(PC3)
PC3_var = data.frame(PC3$HALSTEAD_LEVEL, PC3$HALSTEAD_EFFORT)
start <- matrix(c(0.01, 0.5, 12751451.28, 36.19), nrow = 2, ncol = 2)
start
result <- kmeans(PC3_var, start)
result

5. Dataset NASA MDP PC4
PC4 <- read_excel("D:/Nusa Mandiri/Dataset/PC4.xls", range = "A1:AL1400")
View(PC4)
PC4_var = data.frame(PC4$HALSTEAD_LEVEL, PC4$HALSTEAD_EFFORT)
start <- matrix(c(0.01, 0, 1401719.45, 0), nrow = 2, ncol = 2)
start
result <- kmeans(PC4_var, start)
result

6. Dataset NASA MDP MW1
MW1 <- read_excel("D:/Nusa Mandiri/Dataset/MW1.xls", range = "A1:AL265")
MW1_var = data.frame(MW1$ESSENTIAL_DENSITY, MW1$HALSTEAD_EFFORT)
start <- matrix(c(0, 0, 176348.54, 91.35), nrow = 2, ncol = 2)
start
result <- kmeans(MW1_var, start)
result

7. Dataset NASA MDP CM1
CM1 <- read_excel("D:/Nusa Mandiri/Dataset/CM1.xls", range = "A1:AL345")
View(CM1)
CM1_var = data.frame(CM1$ESSENTIAL_LEVEL, CM1$HALSTEAD_EFFORT)
start <- matrix(c(0.01, 0.08, 1804682, 36577.66), nrow = 2, ncol = 2)
start
result <- kmeans(CM1_var, start)
result

8. Dataset NASA MDP KC1
KC1 <- read_excel("D:/Nusa Mandiri/Dataset/KC1.xls", range = "A1:V2097")
View(KC1)
KC1_var = data.frame(KC1$ESSENTIAL_ERROR_EST, KC1$HALSTEAD_EFFORT)
start <- matrix(c(0.77, 0.06, 79207.4, 2607.044), nrow = 2, ncol = 2)
start
result <- kmeans(KC1_var, start)
result

9. Dataset NASA MDP KC3
KC3 <- read_excel("D:/Nusa Mandiri/Dataset/KC3.xls", range = "A1:AN201")
View(KC3)
KC3_var = data.frame(KC3$ESSENTIAL_LEVEL, KC3$HALSTEAD_EFFORT)
start <- matrix(c(0.02, 0.08, 239295.4, 18073.8), nrow = 2, ncol = 2)
start
result <- kmeans(KC3_var, start)
result
table(result$cluster, KC3$Defective)
plot(KC3_var, col = result$cluster)
points(result$centers, col = 1:4, pch = 4, cex = 2)

10. Dataset NASA MDP MC2
library(readxl)
MC2 <- read_excel("D:/Nusa Mandiri/Dataset/MC2.xls", range = "A1:AN128")
View(MC2)
MC2_var = data.frame(MC2$ESSENTIAL_LEVEL, MC2$HALSTEAD_EFFORT)
start <- matrix(c(0.01, 0.08, 627370.7, 31190.5), nrow = 2, ncol = 2)
start
result <- kmeans(MC2_var, start)
result
table(result$cluster, MC2$Defective)
plot(MC2_var, col = result$cluster)
points(result$centers, col = 1:4, pch = 4, cex = 2)