Classification Techniques in Data Analysis

Rizki Izandi

Academic year: 2024

(1)

Data Mining

(Logistic Regression)

Dr. Sajarwo Anggai, S.ST., M.T.

NIDN : 0421108703

(2)

Data Mining: Session 3

• Preparing the training data

• The logistic regression algorithm

• Model evaluation

• Assignment

(3)

Training Data

• Prepare the training data as an Excel or CSV file, which will later be loaded into the application.

• The data may come from domestic or international sources, or you may build it yourself to suit your needs.
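As a minimal sketch of what loading such a CSV file could look like in code, the Python example below reads rows and separates the feature columns from the target column. The column names (`glucose`, `bmi`, `outcome`), the sample values, and the helper `load_training_data` are hypothetical and only serve as illustration; the slides themselves do not prescribe a format.

```python
import csv
import io

# Hypothetical training data; in practice this would be a CSV file on disk.
SAMPLE_CSV = """glucose,bmi,outcome
85,26.6,0
183,23.3,1
89,28.1,0
137,43.1,1
"""


def load_training_data(handle, target_column):
    """Read CSV rows and split them into feature vectors and 0/1 labels."""
    reader = csv.DictReader(handle)
    features, labels = [], []
    for row in reader:
        # Remove the target from the row, then treat the rest as features.
        labels.append(int(row.pop(target_column)))
        features.append([float(value) for value in row.values()])
    return features, labels


X, y = load_training_data(io.StringIO(SAMPLE_CSV), target_column="outcome")
print(X)
print(y)
```

For a real file, the same function can be called with `open(path, newline="")` in place of the `io.StringIO` object.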

(4)

Logistic Regression

Logistic regression (sometimes called the logistic model or logit model) is used in statistics to predict the probability that an event occurs by fitting the data to the logit function of the logistic curve.

https://id.wikipedia.org/wiki/Regresi_logistik

(5)

Types of Logistic Regression

https://www.geeksforgeeks.org/understanding-logistic-regression/

1. Binomial: In binomial logistic regression, the dependent variable has only two possible outcomes, such as 0 or 1, Pass or Fail, etc.

2. Multinomial: In multinomial logistic regression, the dependent variable has three or more possible unordered categories, such as "cat", "dog", or "sheep".

3. Ordinal: In ordinal logistic regression, the dependent variable has three or more possible ordered categories, such as "Low", "Medium", or "High".

(6)

Formula

https://www.sciencedirect.com/topics/medicine-and-dentistry/logistic-regression-analysis
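As a sketch in standard textbook notation (the symbols $\beta_0, \dots, \beta_k$ for the coefficients and $x_1, \dots, x_k$ for the independent variables are assumptions, not taken from the slide), the binary logistic regression model can be written in two equivalent forms, as a probability and as log-odds:

```latex
\[
p(x) = P(y = 1 \mid x)
     = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}
\qquad\Longleftrightarrow\qquad
\ln\frac{p(x)}{1 - p(x)} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
\]
```

The right-hand form is the logit: the log-odds of the event are a linear function of the independent variables.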

(7)

Sigmoid Activation Function

(8)

Remember the linear formula?

[Figure: the linear equation, annotated with its independent variable, slope, and intercept]

https://towardsdatascience.com/introduction-to-logistic-regression-66248243c148

(9)

Sigmoid Function

https://towardsdatascience.com/introduction-to-logistic-regression-66248243c148
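To make the link between the linear formula and the sigmoid concrete, here is a small Python sketch. The function names `sigmoid` and `predict_probability` and the example numbers are illustrative assumptions; the idea is only that the linear output is passed through the sigmoid to obtain a value between 0 and 1.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use an equivalent form that avoids overflow in exp.
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(x, slope, intercept):
    """Apply the sigmoid to the linear formula slope * x + intercept."""
    return sigmoid(slope * x + intercept)


print(sigmoid(0))                           # 0.5
print(predict_probability(2.0, 1.5, -3.0))  # linear part is 0, so 0.5
print(predict_probability(4.0, 1.5, -3.0))  # linear part is 3, close to 1
```

A linear output of 0 corresponds to a probability of exactly 0.5, which is why 0.5 is a common default threshold for turning the probability into a class.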

(10)

Model Building 1
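The slide's own model-building steps are not recoverable from the text. As an alternative illustration in code, the sketch below fits a binary logistic regression with plain batch gradient descent. The toy data (hours studied versus pass/fail), the learning rate, the number of epochs, and all function names are assumptions chosen for illustration, not the method used in the lecture.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, learning_rate=0.1, epochs=5000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n_samples, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for row, label in zip(X, y):
            z = sum(w * v for w, v in zip(weights, row)) + bias
            error = sigmoid(z) - label  # gradient of the log loss w.r.t. z
            for j, v in enumerate(row):
                grad_w[j] += error * v
            grad_b += error
        # Step against the average gradient.
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(row, weights, bias, threshold=0.5):
    """Return class 1 if the predicted probability reaches the threshold."""
    z = sum(w * v for w, v in zip(weights, row)) + bias
    return 1 if sigmoid(z) >= threshold else 0


# Hypothetical toy data: hours studied -> fail (0) or pass (1).
X = [[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_logistic_regression(X, y)
print([predict(row, weights, bias) for row in X])
```

In practice a library or a visual tool would handle the optimization; the loop is written out here only to show how the weights are adjusted step by step.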

(11)

Test & Score

(12)

Cross Entropy

https://www.trivusi.web.id/2022/08/loss-function.html

Cross-entropy is a loss function widely used in classification tasks. It measures the difference between two probability distributions over the same random variable or set of events.

Cross-entropy is used when adjusting the model weights during training.

The goal is to minimize the loss: the smaller the loss, the better the model. A perfect model has a cross-entropy loss of 0. It is typically used for multi-class and multi-label classification.

Cross-entropy can be computed from the event probabilities of P and Q as follows:

$$H(P, Q) = -\sum_{x} P(x) \log Q(x)$$
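A short Python sketch of this calculation, where P is the true distribution and Q the predicted one. The three-class example values are hypothetical, and the natural logarithm is an assumed choice of base.

```python
import math


def cross_entropy(p, q):
    """H(P, Q) = -sum over x of P(x) * log(Q(x)), using the natural log."""
    # Terms with P(x) = 0 contribute nothing, so they are skipped.
    return -sum(p_x * math.log(q_x) for p_x, q_x in zip(p, q) if p_x > 0)


# Hypothetical 3-class example: the true class is the second one.
true_distribution = [0.0, 1.0, 0.0]
predicted = [0.1, 0.7, 0.2]

print(cross_entropy(true_distribution, predicted))  # -ln(0.7), about 0.357
```

When the true distribution is one-hot, as here, the sum reduces to the negative log of the probability assigned to the correct class.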

(13)

Log Loss

Log loss is binary cross-entropy. This loss function measures the performance of a classification model whose output is a probability between 0 and 1.

As the predicted probability moves further from the true label, log loss increases.

A perfect model has a log loss of 0.

https://www.trivusi.web.id/2022/08/loss-function.html
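The behaviour described here can be checked with a small Python sketch. The function `log_loss`, the clipping constant `eps`, and the example labels and probabilities are illustrative assumptions.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Average binary cross-entropy over all examples."""
    total = 0.0
    for label, prob in zip(y_true, y_prob):
        # Clip probabilities so that log(0) is never evaluated.
        prob = min(max(prob, eps), 1 - eps)
        total += label * math.log(prob) + (1 - label) * math.log(1 - prob)
    return -total / len(y_true)


labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]  # predictions close to the true labels
poor = [0.4, 0.6, 0.3, 0.7]       # predictions far from the true labels

print(log_loss(labels, confident))
print(log_loss(labels, poor))
print(log_loss(labels, confident) < log_loss(labels, poor))  # True
```

The predictions that sit closer to the true labels give the smaller loss, which matches the statement that log loss grows as predictions drift away from the labels.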

(14)

Model Evaluation

https://orangedatamining.com/widget-catalog/evaluate/testandscore/

• Area under ROC (AUC) is the area under the receiver operating characteristic (ROC) curve.

• Classification accuracy is the proportion of correctly classified examples.

• F-1 is the harmonic mean of precision and recall (see below).

• Precision is the proportion of true positives among instances classified as positive, e.g. the proportion of instances classified as Iris virginica that really are Iris virginica.

• Recall is the proportion of true positives among all positive instances in the data, e.g. the proportion of sick people who are correctly diagnosed as sick.

• Specificity is the proportion of true negatives among all negative instances, e.g. the proportion of non-sick people who are correctly identified as non-sick.

• LogLoss or cross-entropy loss takes into account the uncertainty of your prediction based on how much it varies from the actual label.

• Matthews correlation coefficient takes into account true and false positives and negatives and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes.
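Most of the metrics listed above follow directly from the four confusion-matrix counts. The Python sketch below computes them for a hypothetical binary result; the counts and the function name `classification_metrics` are assumptions for illustration, and the code assumes none of the denominators is zero.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute common binary metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts: 40 true positives, 10 false positives,
# 45 true negatives, 5 false negatives.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC and log loss are not included because they need the predicted probabilities, not just the counts.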

(15)

Model Building 2

(16)
(17)

Confusion Matrix
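As a minimal sketch, a confusion matrix can be built by counting each (actual, predicted) pair. The label lists below are hypothetical, and the layout (rows for actual classes, columns for predicted classes) is an assumed convention; some tools transpose it.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [
        [counts[(actual, predicted)] for predicted in labels]
        for actual in labels
    ]


# Hypothetical actual and predicted labels for eight examples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])
for row in matrix:
    print(row)
# [3, 1]  -> actual 0: 3 true negatives, 1 false positive
# [1, 3]  -> actual 1: 1 false negative, 3 true positives
```

The four cells are exactly the true negative, false positive, false negative, and true positive counts that the evaluation metrics are computed from.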

(18)

Assignment

1. Build a logistic regression model.

2. Describe the evaluation metrics for logistic regression, including the confusion matrix.

3. List the advantages and disadvantages of logistic regression.

4. Find 10 journal articles on applications of logistic regression.

5. Discuss in the forum.

6. Write up your work in a report (to be submitted at the midterm exam, UTS).

(19)

Additional References

1. https://sekolahstata.com/apa-itu-logistic-regression/

2. https://media.neliti.com/media/publications/31101-penerapan-metode-regresi-logistik-dalam-a796735e.pdf

3. https://www.datacamp.com/tutorial/understanding-logistic-regression-python

4. https://id.wikipedia.org/wiki/Regresi_logistik

5. https://en.wikipedia.org/wiki/Logistic_function

6. https://aws.amazon.com/id/compare/the-difference-between-linear-regression-and-logistic-regression/

7. https://towardsdatascience.com/introduction-to-logistic-regression-66248243c148

8. https://medium.com/@cmukesh8688/logistic-regression-sigmoid-function-and-threshold-b37b82a4cd79

9. https://realpython.com/logistic-regression-python

10. https://www.trivusi.web.id/2023/03/perbedaan-mae-mse-rmse-dan-mape.html

11. https://www.askpython.com/resources/regression-error-metrics

12. https://www.kaggle.com/datasets/deependraverma13/diabetes-healthcare-comprehensive-dataset

(20)

Thank You

Sajarwo Anggai

Lecturer, Universitas Pamulang

NIDN: 0421108703

Email: [email protected]

WA: 082343006557

Universitas Pamulang, Magister Teknik Informatika
