To improve this research, the following suggestions are proposed as next steps:
1. Apply the point center k-means algorithm to different research objects, for example biomedical datasets or other datasets.
2. Compare the results of the point center k-means algorithm with those of other clustering algorithms.
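As an illustration of suggestion 2, such a comparison could be set up along these lines in R. This is only a sketch, not part of the thesis experiments: it uses R's built-in iris data and hierarchical clustering (hclust) as an example alternative algorithm; the point center k-means initialization would replace the plain kmeans call.

```r
# Sketch: compare k-means against hierarchical clustering on the iris data.
features <- iris[, 1:4]                           # numeric attributes only

km <- kmeans(features, centers = 3, nstart = 10)  # baseline k-means
hc <- cutree(hclust(dist(features), method = "ward.D2"), k = 3)

# Cross-tabulate each clustering against the known species labels
table(km$cluster, iris$Species)
table(hc, iris$Species)
```

The cross-tabulations make the two algorithms directly comparable on how well their clusters align with the known classes.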
Aggarwal, C. C. (2015). Data Mining: The Textbook. Springer International Publishing. https://doi.org/10.1007/978-3-319-14142-8
Aggarwal, C. C., & Yu, P. S. (2008). Privacy-Preserving Data Mining: Models and Algorithms. Security, Privacy and Trust in Modern Data Management. https://doi.org/10.1145/335191.335438
Arar, Ö. F., & Ayan, K. (2015). Software defect prediction using cost-sensitive neural network. Applied Soft Computing, 33, 263–277. https://doi.org/10.1016/j.asoc.2015.04.045
Arora, I., Tetarwal, V., & Saha, A. (2015). Open issues in software defect prediction. Procedia Computer Science, 46, 906–912. https://doi.org/10.1016/j.procs.2015.02.161
Azzalini, A., & Scarpa, B. (2012). Data Analysis and Data Mining: An Introduction.
Berndtsson, M., Hansson, J., Olsson, B., & Lundell, B. (2008). Thesis Projects: A Guide for Students in Computer Science and Information Systems. Springer. https://doi.org/10.1007/978-1-84800-009-4
Berry, M. J. A., & Linoff, G. S. (2004). Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management. Retrieved from http://portal.acm.org/citation.cfm?id=983642
Bishnu, P. S., & Bhattacherjee, V. (2012). Software fault prediction using quad tree-based K-means clustering algorithm. IEEE Transactions on Knowledge and Data Engineering, 24(6), 1146–1150. https://doi.org/10.1109/TKDE.2011.163
Celebi, M. E., Kingravi, H. A., & Vela, P. A. (2013). A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Systems with Applications, 40(1), 200–210.
Cleuziou, G. (2008). An extended version of the k-means method for overlapping clustering. 2008 19th International Conference on Pattern Recognition, 1–4. https://doi.org/10.1109/ICPR.2008.4761079
Dawson, C. W. (2009). Projects in Computing and Information Systems: A Student's Guide. Retrieved from http://www.sentimentaltoday.net/National_Academy_Press/0321263553.Addison.Wesley.Publishing.Company.Projects.in.Computing.and.Information.Systems.A.Students.Guide.Jun.2005.pdf
Dean, J. (2014). Big Data, Data Mining, and Machine Learning. Canada: SAS Institute Inc.
Duwairi, R., & Abu-Rahmeh, M. (2015). A novel approach for initializing the spherical K-means clustering algorithm. Simulation Modelling Practice and Theory, 54, 49–63. https://doi.org/10.1016/j.simpat.2015.03.007
Erisoglu, M., Calis, N., & Sakallioglu, S. (2011). A new algorithm for initial cluster centers in k-means algorithm. Pattern Recognition Letters, 32(14), 1701–1705. https://doi.org/10.1016/j.patrec.2011.07.011
Huang, F., & Liu, B. (2017). Software defect prevention based on human error theories. Chinese Journal of Aeronautics, 30(3), 1054–1070. https://doi.org/10.1016/j.cja.2017.03.005
Irsoy, O., Yildiz, O. T., & Alpaydin, E. (2012). Design and analysis of classifier learning experiments in bioinformatics: Survey and case studies. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(6), 1663–1675. https://doi.org/10.1109/TCBB.2012.117
Khanmohammadi, S., Adibeig, N., & Shanehbandy, S. (2017). An improved overlapping k-means clustering method for medical applications. Expert Systems with Applications, 67, 12–18. https://doi.org/10.1016/j.eswa.2016.09.025
Kumar, K. M., & Reddy, A. R. M. (2017). An efficient k-means clustering filtering algorithm using density based initial cluster centers. Information Sciences. https://doi.org/10.1016/j.ins.2017.07.036
Larose, D. T. (2005). Discovering Knowledge in Data: An Introduction to Data Mining. John Wiley & Sons, Inc.
Li, W., Huang, Z., & Li, Q. (2015). Three-way decisions based software defect prediction. Knowledge-Based Systems, 91, 263–274. https://doi.org/10.1016/j.knosys.2015.09.035
Maimon, O., & Rokach, L. (2010). Data Mining and Knowledge Discovery Handbook. New York: Springer. https://doi.org/10.1007/0-387-25465-x_2
Mesquita, D. P. P., Rocha, L. S., Gomes, J. P. P., & Rocha Neto, A. R. (2016). Classification with reject option for software defect prediction. Applied Soft Computing Journal, 49, 1085–1093. https://doi.org/10.1016/j.asoc.2016.06.023
Moeyersoms, J., Junqué De Fortuny, E., Dejaeger, K., Baesens, B., & Martens, D. (2015). Comprehensible software fault and effort prediction: A data mining approach. Journal of Systems and Software, 100, 80–90. https://doi.org/10.1016/j.jss.2014.10.032
Naldi, M. C., & Campello, R. J. G. B. (2014). Evolutionary k-means for distributed data sets. Neurocomputing, 127, 30–42. https://doi.org/10.1016/j.neucom.2013.05.046
Nidheesh, N., Abdul Nazeer, K. A., & Ameer, P. M. (2017). An enhanced deterministic K-Means clustering algorithm for cancer subtype prediction from gene expression data. Computers in Biology and Medicine, 91, 213–221. https://doi.org/10.1016/j.compbiomed.2017.10.014
Rahman, M. A., & Islam, M. Z. (2014). A hybrid clustering technique combining a novel genetic algorithm with K-Means. Knowledge-Based Systems, 71, 345–365. https://doi.org/10.1016/j.knosys.2014.08.011
Rahman, M. A., Islam, M. Z., & Bossomaier, T. (2015). ModEx and Seed-Detective: Two novel techniques for high quality clustering by using good initial seeds in K-Means. Journal of King Saud University - Computer and Information Sciences, 27(2), 113–128. https://doi.org/10.1016/j.jksuci.2014.04.002
Reddy, D., & Jana, P. K. (2012). Initialization for K-means Clustering using Voronoi Diagram. Procedia Technology, 4, 395–400. https://doi.org/10.1016/j.protcy.2012.05.061
Siavvas, M. G., Chatzidimitriou, K. C., & Symeonidis, A. L. (2017). QATCH - An adaptive framework for software product quality assessment. Expert Systems with Applications, 86, 350–366. https://doi.org/10.1016/j.eswa.2017.05.060
Usman, G., Ahmad, U., & Ahmad, M. (2013). Improved K-means clustering algorithm by getting initial centroids. World Applied Sciences Journal, 27(4), 543–551. https://doi.org/10.5829/idosi.wasj.2013.27.04.1142
Wahono, R. S. (2015). A Systematic Literature Review of Software Defect Prediction: Research Trends, Datasets, Methods and Frameworks. Journal of Software Engineering, 1(1), 1–16.
Witten, I. H., Frank, E., & Hall, M. A. (2011). Data Mining: Practical Machine Learning Tools and Techniques (3rd ed.).
Wu, X., Kumar, V., Quinlan, J. R., Ghosh, J., Yang, Q., Motoda, H., … Steinberg, D. (2008). Top 10 algorithms in data mining. Knowledge and Information Systems, 14. https://doi.org/10.1007/s10115-007-0114-2
Žalik, K. R. (2008). An efficient k′-means clustering algorithm. Pattern Recognition Letters, 29(9), 1385–1391. https://doi.org/10.1016/j.patrec.2008.02.014
A. Source code for the k-means method using R

1. Dataset Iris
library(readxl)
iris <- read_excel("D:/Nusa Mandiri/Data Mining/Tugas/iris.xls", range = "B1:F151")
View(iris)
iris.features = iris
iris.features$X__1 <- NULL
View(iris.features)
result <- kmeans(iris.features, 3)
result
table(result$cluster, iris$X__1)

2. Dataset NASA MDP PC1
PC1 <- read_excel("D:/Nusa Mandiri/Dataset/PC1.xls", range = "A1:AL760")
View(PC1)
PC1.features = PC1
PC1.features$Defective <- NULL
View(PC1.features)
result <- kmeans(PC1.features, 2)
result
table(result$cluster, PC1$Defective)

3. Dataset NASA MDP PC2
PC2 <- read_excel("D:/Nusa Mandiri/Dataset/PC2.xls", range = "A1:AK1586")
View(PC2)
PC2.features = PC2
PC2.features$Defective <- NULL
View(PC2.features)
result <- kmeans(PC2.features, 2)
result
table(result$cluster, PC2$Defective)

4. Dataset NASA MDP PC3
PC3 <- read_excel("D:/Nusa Mandiri/Dataset/PC3.xls", range = "A1:AL1126")
View(PC3)
PC3.features = PC3
PC3.features$Defective <- NULL
View(PC3.features)
result <- kmeans(PC3.features, 2)
result

5. Dataset NASA MDP PC4
PC4 <- read_excel("D:/Nusa Mandiri/Dataset/PC4.xls", range = "A1:AL1400")
View(PC4)
PC4.features = PC4
PC4.features$Defective <- NULL
View(PC4.features)
result <- kmeans(PC4.features, 2)
result
table(result$cluster, PC4$Defective)

6. Dataset NASA MDP MW1
MW1 <- read_excel("D:/Nusa Mandiri/Dataset/MW1.xls", range = "A1:AL265")
View(MW1)
MW1.features = MW1
MW1.features$Defective <- NULL
View(MW1.features)
result <- kmeans(MW1.features, 2)
result
table(result$cluster, MW1$Defective)

7. Dataset NASA MDP CM1
CM1 <- read_excel("D:/Nusa Mandiri/Dataset/CM1.xls", range = "A1:AL345")
View(CM1)
CM1.features = CM1
CM1.features$Defective <- NULL
View(CM1.features)
result <- kmeans(CM1.features, 2)
result
table(result$cluster, CM1$Defective)

8. Dataset NASA MDP KC1
KC1 <- read_excel("D:/Nusa Mandiri/Dataset/KC1.xls", range = "A1:V2097")
View(KC1)
KC1.features = KC1
KC1.features$Defective <- NULL
View(KC1.features)
result <- kmeans(KC1.features, 2)
result
table(result$cluster, KC1$Defective)

9. Dataset NASA MDP KC3
KC3 <- read_excel("D:/Nusa Mandiri/Dataset/KC3.xls", range = "A1:AN201")
View(KC3)
KC3.features = KC3
KC3.features$Defective <- NULL
result <- kmeans(KC3.features, 2)
result
table(result$cluster, KC3$Defective)

10. Dataset NASA MDP MC2
MC2 <- read_excel("D:/Nusa Mandiri/Dataset/MC2.xls", range = "A1:AN128")
View(MC2)
MC2.features = MC2
MC2.features$Defective <- NULL
View(MC2.features)
result <- kmeans(MC2.features, 2)
result
table(result$cluster, MC2$Defective)
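Each table(result$cluster, ...) call above yields a cross-tabulation of cluster labels against the Defective flag. As a small illustration (the cluster_accuracy helper below is not part of the original appendix), such a table can be turned into an accuracy figure by assigning each cluster to its majority class:

```r
# Hypothetical helper: accuracy of a clustering treated as a classifier,
# assigning each cluster to its majority true class.
cluster_accuracy <- function(clusters, labels) {
  tab <- table(clusters, labels)
  sum(apply(tab, 1, max)) / sum(tab)  # majority class count per cluster
}

# Example with dummy data standing in for result$cluster and PC1$Defective
set.seed(1)
cl  <- sample(1:2, 20, replace = TRUE)
lab <- ifelse(cl == 1, "N", "Y")      # labels perfectly aligned with clusters
cluster_accuracy(cl, lab)             # 1 for this toy example
```

The same helper would apply unchanged to any of the dataset runs above, e.g. cluster_accuracy(result$cluster, PC1$Defective).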
B. Source code for the proposed method using R

1. Dataset Iris
library(readxl)
iris <- read_excel("D:/Nusa Mandiri/Data Mining/Tugas/iris.xls", range = "B1:F151")
head(iris)
iris_var = data.frame(iris$petal.width, iris$petal.length)
plot(iris_var, col = iris$variety)
start <- matrix(c(2, 0.2, 2.3, 6.4, 1, 6.9), nrow = 3, ncol = 2)
start
result <- kmeans(iris_var, start)
result

2. Dataset NASA MDP PC1
PC1 <- read_excel("D:/Nusa Mandiri/Dataset/PC1.xls", range = "A1:AL760")
View(PC1)
PC1_var = data.frame(PC1$HALSTEAD_LEVEL, PC1$HALSTEAD_EFFORT)
start <- matrix(c(0, 0.5, 4279633.01, 56.150), nrow = 2, ncol = 2)
start
result <- kmeans(PC1_var, start)
result
table(result$cluster, PC1$Defective)
plot(PC1_var, col = result$cluster)
points(result$centers, col = 1:4, pch = 4, cex = 2)

3. Dataset NASA MDP PC2
library(readxl)
PC2 <- read_excel("D:/Nusa Mandiri/Dataset/PC2.xls", range = "A1:AK1586")
View(PC2)
PC2_var = data.frame(PC2$HALSTEAD_ERROR_EST, PC2$HALSTEAD_EFFORT)
start <- matrix(c(5.52, 0, 882628.7, 23.22), nrow = 2, ncol = 2)
start
result <- kmeans(PC2_var, start)
result

4. Dataset NASA MDP PC3
PC3 <- read_excel("D:/Nusa Mandiri/Dataset/PC3.xls", range = "A1:AL1126")
View(PC3)
PC3_var = data.frame(PC3$HALSTEAD_LEVEL, PC3$HALSTEAD_EFFORT)
start <- matrix(c(0.01, 0.5, 12751451.28, 36.19), nrow = 2, ncol = 2)
start
result <- kmeans(PC3_var, start)
result

5. Dataset NASA MDP PC4
PC4 <- read_excel("D:/Nusa Mandiri/Dataset/PC4.xls", range = "A1:AL1400")
View(PC4)
PC4_var = data.frame(PC4$HALSTEAD_LEVEL, PC4$HALSTEAD_EFFORT)
start <- matrix(c(0.01, 0, 1401719.45, 0), nrow = 2, ncol = 2)
start
result <- kmeans(PC4_var, start)
result

6. Dataset NASA MDP MW1
MW1 <- read_excel("D:/Nusa Mandiri/Dataset/MW1.xls", range = "A1:AL265")
MW1_var = data.frame(MW1$ESSENTIAL_DENSITY, MW1$HALSTEAD_EFFORT)
start <- matrix(c(0, 0, 176348.54, 91.35), nrow = 2, ncol = 2)
start
result <- kmeans(MW1_var, start)
result

7. Dataset NASA MDP CM1
CM1 <- read_excel("D:/Nusa Mandiri/Dataset/CM1.xls", range = "A1:AL345")
View(CM1)
CM1_var = data.frame(CM1$ESSENTIAL_LEVEL, CM1$HALSTEAD_EFFORT)
start <- matrix(c(0.01, 0.08, 1804682, 36577.66), nrow = 2, ncol = 2)
start
result <- kmeans(CM1_var, start)
result

8. Dataset NASA MDP KC1
KC1 <- read_excel("D:/Nusa Mandiri/Dataset/KC1.xls", range = "A1:V2097")
View(KC1)
KC1_var = data.frame(KC1$ESSENTIAL_ERROR_EST, KC1$HALSTEAD_EFFORT)
start <- matrix(c(0.77, 0.06, 79207.4, 2607.044), nrow = 2, ncol = 2)
start
result <- kmeans(KC1_var, start)
result

9. Dataset NASA MDP KC3
KC3 <- read_excel("D:/Nusa Mandiri/Dataset/KC3.xls", range = "A1:AN201")
View(KC3)
KC3_var = data.frame(KC3$ESSENTIAL_LEVEL, KC3$HALSTEAD_EFFORT)
start <- matrix(c(0.02, 0.08, 239295.4, 18073.8), nrow = 2, ncol = 2)
start
result <- kmeans(KC3_var, start)
result
table(result$cluster, KC3$Defective)
plot(KC3_var, col = result$cluster)
points(result$centers, col = 1:4, pch = 4, cex = 2)

10. Dataset NASA MDP MC2
library(readxl)
MC2 <- read_excel("D:/Nusa Mandiri/Dataset/MC2.xls", range = "A1:AN128")
View(MC2)
MC2_var = data.frame(MC2$ESSENTIAL_LEVEL, MC2$HALSTEAD_EFFORT)
start <- matrix(c(0.01, 0.08, 627370.7, 31190.5), nrow = 2, ncol = 2)
start
result <- kmeans(MC2_var, start)
result
table(result$cluster, MC2$Defective)
plot(MC2_var, col = result$cluster)
points(result$centers, col = 1:4, pch = 4, cex = 2)