• Tidak ada hasil yang ditemukan

Identifying clusters structure of rare events using random forest clustering ABSTRACT

N/A
N/A
Protected

Academic year: 2024

Membagikan "Identifying clusters structure of rare events using random forest clustering ABSTRACT"

Copied!
1
0
0

Teks penuh

(1)

Identifying clusters structure of rare events using random forest clustering ABSTRACT

Given highly imbalanced data, most learning algorithms faced the challenge to accurately predict rare events, while such cases were the ones that carry importance and useful knowledge. In a binary class label dataset, the rare events are the ones in the minority class.

This study used a stroke dataset with a binary class label and the class imbalance ratio was 54:1. In addition to that, the dataset contained missing values and mixed data types. To identify the intrinsic structures in the minority class (the stroke group), Random Forest Clustering was used to produce the proximity matrix and fed to Partition around Medoid (PAM) clustering method to identify the optimal number of clusters. The proximity plot seems to show there could be cluster tendency and k=2 was identified to be the best as compared to k=3 to k=5. Based on the internal cluster validation, however, the silhouette coefficient width was small (0.1) indicating much of the data objects were within the other boundary of the other class. We have suggested a further investigation plan in this paper for the next action.

Referensi

Dokumen terkait

The elbow method to determine the optimal number of clusters and k-means as the clustering algorithm shows that courses with high average values are in cluster

This study aims to determine the performance of the LED array arrangement without making hardware first and only using input variables in the form of input temperature,

The method that will be used in this research is to take a dataset about “Public comments on the increase in fuel prices in Indonesia” that predicts the

This study uses a price intelligence approach using the k-means clustering method for price grouping based on the closest competitor and demand forecasting using linear regression to

RESEARCH METHODOLOGY This study builds a classification model to analyze the work readiness of Telkom University students using the Multinomial Logistic Regression method and Random

Conclusion Based on the results of the study, it can be concluded that the use of Machine Learning Random Forest models can be used to predict factors that affect flight delays in

The k-means clustering and cosine similarity methods are techniques that can be used in heart disease prediction systems to group patient data into similar groups or to measure the

Confusion Matrix for Urgency Metrics Source: researcher property Confusion Matrix: The confusion matrix for urgency refer to Fig 4, provides an overview of the model's classification