Debtors Prospective Assessment Application using Naive Bayes at Mitra Sejahtera Cooperative


Indra Griha Tofik Isa¹,*, Beni Junedi²

1 Informatics Management Department, Politeknik Negeri Sriwijaya, Palembang, Indonesia

2 Mathematics Education Department, Universitas Bina Bangsa, Serang, Indonesia

Email: ¹,*[email protected], ²[email protected]

Corresponding Author: [email protected]

Submitted: 13/03/2022; Accepted: 12/05/2022; Published: 31/07/2022

Abstract−Turning historical data into new knowledge can add value for its users, including Mitra Sejahtera Cooperative (KMS), which holds debtor data that has not been utilized. As a result, debtors with the potential to become "Not Paid Off" cannot be detected early. This study uses the Naive Bayes algorithm to classify the feasibility of prospective debtors as "Paid Off" or "Not Paid Off" based on the parameters Age, Sex, Amount of Loan, Occupation, Income, and Repayment Period. The research stages consist of (1) Research Initiation, (2) Data Selection, (3) Data Preprocessing, (4) System Design, (5) Program Implementation and (6) Program Testing. The purpose of this study is to minimize the increase in bad loans by implementing the Naive Bayes method in an application for assessing prospective debtors. The final result is a prospective debtor assessment application at Mitra Sejahtera Cooperative with an accuracy rate of 86%.

Keywords: Debtor Prospective Assessment; Naive Bayes; Mitra Sejahtera Cooperative (KMS); Classification

1. INTRODUCTION

Cooperatives are an inseparable part of Indonesian society and represent the Indonesian people's tradition of mutual cooperation, known as “gotong royong”. Cooperatives are fundamentally based on mutual assistance, driven by the desire to help others improve their economic life [1]. Cooperatives are currently under the Ministry of Cooperatives and Small and Medium Enterprises (KemenkopUKM); as of the end of 2019 there were 45,489 cooperative units with a total of 22,463,738 members throughout Indonesia [2]. The number of cooperatives in Indonesia has tended to increase from year to year; for example, from 2011 to 2015 the number of active cooperatives increased by 4,139 units [3]. West Java has a relatively high number of cooperatives, namely 13,247 cooperatives with a total membership of 2,040,509 [2]. One of these is Mitra Sejahtera Cooperative (KMS) in Sukabumi City, which was established for the welfare of teachers at SMK Negeri 1 Sukabumi City. KMS combines a consumer cooperative with a savings and loan cooperative. As of the end of 2019, KMS had 484 members and total assets of Rp 1,854,350,000. Its capital showed an overall upward trend over the five years from 2015 to 2019, from Rp 1,434,000,000 in 2015 to Rp 1,732,000,000 in 2019, as shown in Figure 1:

Figure 1. Total Capital of Mitra Sejahtera Cooperative, 2015–2019 (in million rupiah): 1,434 (2015), 1,420 (2016), 1,532 (2017), 1,662 (2018) and 1,732 (2019)

The operational activities of KMS, which consist of selling and purchasing consumer goods, savings and loans, and processing data on cooperative members, have been computerized with a desktop-based system since 2012. In its lending operations, on average around 7% to 10% of loans each year become bad loans, which reduces cash circulation and income for KMS. If this continues and the percentage of bad loans tends to rise, KMS could face cash flow stagnation or even bankruptcy, leaving it unable to support the welfare of its members. One cause of frequent bad loans is that the initial assessment of debtor data has not been carried out in depth, that is, without examining the criteria of prospective debtors and the history of previous bad credit.

To minimize the increase in bad loans, this study implements the Naive Bayes method in an application for assessing prospective debtors. Naive Bayes is a classification technique that draws conclusions using probability [4]. The many credit data reports at KMS have so far been used only for annual reports and have not been used as new “knowledge” to support recommendations in decision making. In this study, these reports are therefore used as the basis for assessment recommendations for prospective debtors. The parameters involved in the assessment are age, occupation, loan amount, sex, income, repayment period and payment status, with 862 training records from 2015 to 2019. In implementing Naive Bayes in the application, the study also focuses on functional aspects for users, to make the application easier to operate [5].

The method used in this research is classification. Data classification appears in a wide variety of forms in data mining applications [6], because it is concerned with learning the relationship between feature variables and a target variable [7]. Classification is the part of data mining that finds functions or models that distinguish classes, so that the class of an object whose label is unknown can be predicted [8], [9]. To fulfil this goal, classification builds a model that separates a set of data into different classes based on certain functions and rules.

There are various models in classification, including decision trees, mathematical formulas and "If-Then" rules [10].

Technically, the classification algorithm implemented in this research is Naive Bayes, a classification technique that uses statistical and probability methods in its calculations. Naive Bayes is based on Bayes' theorem, named after the British mathematician Thomas Bayes, and was originally used to predict future possibilities based on previous experience. Naive Bayes classification assumes that, within a given class, the presence or absence of one feature is unrelated to the presence or absence of any other feature [11].
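For reference, the decision rule that follows from this assumption can be written as below, using the notation introduced in Section 3.2 (C1 to C6 for the six predictor parameters and C7 for Payment Status, with classes L and TL). This is the standard textbook form of the classifier, shown as a sketch rather than a formula quoted from the application:

$$\hat{c} = \arg\max_{c \in \{\mathrm{L},\,\mathrm{TL}\}} \; P(C_7 = c) \prod_{i=1}^{6} P(C_i = x_i \mid C_7 = c)$$

Here x_1 to x_6 are the category values of the record being assessed. The evidence term P(X) of Bayes' theorem is omitted because it is the same for both classes, and the product over i is exactly where the naive independence assumption enters.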

Naive Bayes has also been implemented in several other studies. One study applied it to student specialization (peminatan) at SMAN 5 Pamekasan, where the data were classified based on students' exam assessments. The training data consisted of 720 records, with data retrieval of five times the amount of testing data, and the study reported an accuracy of 92.11% and an error rate of 7.02% [12]. Another study at the KSPPS BMT cooperative assessed the eligibility of prospective credit members using 472 training records; its final result was an application whose tests showed 82% precision, 80% accuracy and 94% recall [13]. Meanwhile, A. P. Fadillah applied Naive Bayes to the selection of course concentrations at the Indonesian Computer University, which offers concentrations in Information Systems Engineering and Information Technology. The selection is based on the assessment of the courses Programming Lab I, Programming Lab II, Programming Lab III, Web Programming, Computer Networking, Information System Concepts, Business Process Analysis, E-Business Concepts and Information System Management. That study used 375 training records and achieved an accuracy of 81% [14].

2. RESEARCH METHODOLOGY

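There are six stages in the research methodology, as shown in Figure 2:

Figure 2. Research Stages: Research Initiation → Data Selection → Data Pre Processing → System Design → Program Implementation → Program Testing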

As described in the previous section, the method implemented in this study is Naive Bayes. The data used are structured data sourced from the customer data and credit payment reports of Mitra Sejahtera Cooperative. Naive Bayes is a probabilistic classifier: it estimates the probability that an object with certain characteristics belongs to a certain group or class. The algorithm is called "naive" because it assumes that the occurrence of each feature is independent of the occurrence of the other features.

2.1 Research Initiation

This stage is the initial activity of the research. In this stage, the training data are determined: credit reports from the 2015–2019 period, with a total of 862 records. The system is designed using a structured approach and implemented as a desktop-based application. The minimum software and hardware specifications are listed in Table 1:

Table 1. Software and Hardware Specifications

No | Description | Minimum Specification
1 | Operating System | Windows 7 or later
2 | Database System | MySQL
3 | System Development Software | Microsoft Visual Studio 2012
4 | Processor | Intel Core i3 or equivalent
5 | RAM | 4 GB
6 | HDD | 200 GB


2.2 Data Selection

After determining the data to be used as the basis of the research, the next step is to select which parameters will be used. The credit report contains 20 parameters in total, but only 7 are used in this study: age, occupation, loan amount, sex, income, repayment period and payment status. Table 2 lists the parameters in the credit report; the parameters used in the study (Nos. 6, 7, 8, 9, 15, 16 and 17) are highlighted in yellow in the original table.

Table 2. Credit Report Parameters

No | Parameter | Description
1 | Account No. | Account No. of Debtor / Customer
2 | Customer Name | Full name of Debtor / Customer
3 | Address | Living and Correspondence Address of Debtor / Customer
4 | Place of Birth | Place of Birth of Debtor / Customer
5 | Date of Birth | Date of Birth of Debtor / Customer
6 | Age | Age of Debtor / Customer
7 | Sex | Sex of Debtor / Customer
8 | Amount of Loan | The amount of the loan borrowed by the Debtor / Customer
9 | Occupation | Occupation of Debtor / Customer
10 | Amount of Interest | Interest charged to the Debtor / Customer based on the loan amount
11 | Total of Loan | The sum of the loan amount and the amount of interest
12 | Fine | The amount of the delay multiplied by the interest variable
13 | Accumulation of Delay | Number of days of late payment from the due date
14 | Due Date of Payment | Debtor / Customer deadlines for making payments
15 | Income | Income of Debtor / Customer
16 | Repayment Period | Payment Period of Debtor / Customer
17 | Payment Status | Payment status of loan, defined by PAID OFF and NOT PAID OFF
18 | Member Date | The date the Debtor / Customer registered as a member of the cooperative
19 | Marital Status | Marital Status of Debtor / Customer
20 | Name of Heir | The Heir of Debtor / Customer

2.3 Data Pre Processing

In this stage, data cleaning is carried out on the 7 parameters selected from the 20 parameters of the KMS credit report: redundant data are eliminated and each parameter is assigned categorical values. The valid data then become the training data, which serve as the basis for recommendations when new data are entered. Table 3 shows the specific data of the parameters used.

Table 3. Specific Data of the 7 Credit Report Parameters

No | Parameter | Specific Data
1 | Age | 20–35 years; 36–55 years
2 | Sex | Male; Female
3 | Amount of Loan | 5.000.000 to 10.000.000; 10.000.000 to 20.000.000
4 | Occupation | PNS; NON-PNS
5 | Income | Less than 4.000.000; More than or equal to 4.000.000
6 | Repayment Period | 12 months; 24 months
7 | Payment Status | Paid Off; Not Paid Off

2.4 System Design

The system is designed using a structured approach, which includes a Context Diagram, a Data Flow Diagram and an Entity Relationship Diagram, as well as the database architecture and user interface mockups of the application.

2.5 Program Implementation

After the system is designed, the next stage is to implement the design in a programming language. The development tool used in this research is Microsoft Visual Studio 2012 with the Visual Basic programming language, and the database is MySQL.

2.6 Program Testing

Program testing is carried out after program implementation using two techniques: (1) blackbox testing, which tests the program from a functional perspective using test scenarios that cover the input, process and output parts of the system; and (2) accuracy testing, which measures how accurately the application classifies data. The accuracy test is carried out by inputting data records and comparing the application's results with the actual data, from which the error rate (%) of the test data is obtained. The parties involved in this test were the chairman of the cooperative, the finance department of the cooperative and several administrators of Mitra Sejahtera Cooperative.

3. RESULT AND DISCUSSION

3.1 Data Selection

The data were randomly selected from the Mitra Sejahtera Cooperative credit reports for the 2015–2019 period, giving 862 credit records in total. As described previously, the credit report has 20 parameters, but only 7 of them are used in this study. Figure 3 shows the raw data in the form of customer credit reports of Mitra Sejahtera Cooperative:

Figure 3. Customer Credit Report of Mitra Sejahtera Cooperative

The seven parameters (Age, Sex, Loan Amount, Occupation, Income, Repayment Period and Payment Status) are recapitulated into a raw data table with a total of 862 records, as shown in Table 4:

Table 4. Raw Data of the 7 Parameters

No | Age | Sex | Loan Amount | Occupation | Income | Repayment Period | Payment Status
1 | 25 | Male | 10.000.000 | Teacher | 4.200.000 | 24 months | Paid Off
2 | 43 | Male | 5.000.000 | PNS | 4.200.000 | 12 months | Paid Off
3 | 37 | Male | 5.000.000 | Private Sector | 2.500.000 | 12 months | Paid Off
4 | 44 | Female | 15.000.000 | Private Sector | 3.000.000 | 24 months | Paid Off
5 | 42 | Male | 10.000.000 | Banker at Bank Mega | 4.000.000 | 24 months | Paid Off
6 | 27 | Female | 10.000.000 | Teacher | 4.000.000 | 12 months | Not Paid Off
7 | 36 | Female | 10.000.000 | TU Staff | 4.200.000 | 12 months | Not Paid Off
8 | 28 | Female | 12.500.000 | Banker at BRI Bank | 3.850.000 | 12 months | Paid Off
9 | 52 | Male | 5.000.000 | Honor | 2.000.000 | 12 months | Not Paid Off
10 | 33 | Male | 12.500.000 | Honor | 2.000.000 | 12 months | Paid Off
11 | 42 | Female | 12.500.000 | TU Staff | 3.600.000 | 12 months | Paid Off
... | ... | ... | ... | ... | ... | ... | ...
862 | 49 | Female | 10.000.000 | PNS | 3.600.000 | 12 months | Not Paid Off

3.2 Data Pre Processing

To make the training data easier to read and interpret, preprocessing is carried out: the data are cleaned by removing redundant (duplicate) data and data with missing values, and both the parameters (classes) and the data records are categorized. The classes are coded as follows: C1 = Age; C2 = Sex; C3 = Loan Amount; C4 = Occupation; C5 = Income; C6 = Repayment Period; C7 = Payment Status. The “Occupation” class (C4) covers various kinds of jobs, namely civil servants (PNS), BUMN (state-owned enterprise) employees, private employees, entrepreneurs and daily workers. To simplify the weighting of scores, these are grouped into two main categories, namely PNS (P) and NON-PNS (NP). The class categories used are described in Table 5:


Table 5. Class Category Description

No | Parameter | Class Category | Description
1 | Age | U1 | Between 20 and 35
  |  | U2 | Between 36 and 55
2 | Sex | L | Male
  |  | P | Female
3 | Loan Amount | JP1 | 5.000.000 to 10.000.000
  |  | JP2 | 10.000.001 to 20.000.000
4 | Occupation | P | PNS
  |  | NP | NON-PNS
5 | Income | P1 | < 5.000.000
  |  | P2 | >= 5.000.000
6 | Repayment Period | W1 | 12 months
  |  | W2 | 24 months
7 | Payment Status | L | Paid Off
  |  | TL | Not Paid Off
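The cleaning and categorization steps of this section can be sketched in Python as below. This is an illustration only, not the application's Visual Basic code, and the names `clean` and `categorize` are hypothetical. Two assumptions are flagged in the code: the income cut-off uses the 5.000.000 value of Table 5 (Table 3 gives 4.000.000), and the set of occupations coded as P (PNS, Teacher, TU Staff) is a hypothetical mapping inferred from Tables 4 and 6.

```python
from typing import Iterable, List, Optional, Tuple

# Assumption: Table 5 cut-off; Table 3 states 4.000.000 instead.
INCOME_THRESHOLD = 5_000_000

# Hypothetical mapping inferred from Tables 4 and 6, where these
# occupations are coded P (PNS); every other occupation becomes NP.
PNS_OCCUPATIONS = {"PNS", "Teacher", "TU Staff"}

RawRecord = Tuple[object, ...]


def clean(records: Iterable[RawRecord]) -> List[RawRecord]:
    """Drop records with a missing value, then exact duplicates (first copy kept)."""
    seen = set()
    kept = []
    for rec in records:
        if any(value is None for value in rec) or rec in seen:
            continue
        seen.add(rec)
        kept.append(rec)
    return kept


def categorize(age: int, sex: str, loan: int, occupation: str,
               income: int, period_months: int) -> Optional[Tuple[str, ...]]:
    """Map one raw record to the Table 5 codes for C1..C6.

    Returns None if a value lies outside the ranges listed in Table 5.
    """
    if 20 <= age <= 35:
        c1 = "U1"
    elif 36 <= age <= 55:
        c1 = "U2"
    else:
        return None
    c2 = {"Male": "L", "Female": "P"}.get(sex)
    if 5_000_000 <= loan <= 10_000_000:
        c3 = "JP1"
    elif 10_000_001 <= loan <= 20_000_000:
        c3 = "JP2"
    else:
        return None
    c4 = "P" if occupation in PNS_OCCUPATIONS else "NP"
    c5 = "P1" if income < INCOME_THRESHOLD else "P2"
    c6 = {12: "W1", 24: "W2"}.get(period_months)
    if c2 is None or c6 is None:
        return None
    return (c1, c2, c3, c4, c5, c6)


# Test record from Section 3.3 (a private-sector occupation is non-PNS).
print(categorize(25, "Female", 8_000_000, "Private Sector", 3_600_000, 12))
# ('U1', 'P', 'JP1', 'NP', 'P1', 'W1')
```

Deduplication is applied to raw records, before categorization; applied afterwards it would wrongly merge distinct debtors who happen to fall into the same categories.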

In class Payment Status (C7), of the 862 records, 75 (8.7%) have Payment Status = Not Paid Off (TL), while the remaining 787 (91.3%) have Payment Status = Paid Off (L). Table 6 shows the data records after adjustment to the class categories:

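Table 6. Class Category Record Data

(C1 in years; C3 and C5 in millions of rupiah; C6 in months)

No | C1 | C2 | C3 | C4 | C5 | C6 | C7
1 | 25 | L | 10 | P | 4.2 | 24 | L
2 | 43 | L | 5 | P | 4.2 | 12 | L
3 | 37 | L | 5 | NP | 2.5 | 12 | L
4 | 44 | P | 15 | NP | 3 | 24 | L
5 | 42 | L | 10 | NP | 4 | 24 | L
6 | 27 | P | 10 | P | 4.0 | 12 | TL
7 | 36 | P | 10 | P | 4.2 | 12 | TL
8 | 28 | P | 12.5 | NP | 3.85 | 12 | L
9 | 52 | L | 5 | NP | 2 | 12 | TL
10 | 33 | L | 12.5 | NP | 2 | 12 | L
11 | 42 | P | 12.5 | P | 3.6 | 12 | L
... | ... | ... | ... | ... | ... | ... | ...
862 | 49 | P | 10 | P | 3.6 | 12 | TL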

3.3 Implementation of Naive Bayes in the Debtor Assessment Application of Mitra Sejahtera Cooperative

In implementing Naive Bayes, the conditional probability of each criterion C1 to C6 is calculated with respect to C7 (Payment Status).

The first step is to calculate the prior probabilities of L and TL for C7:

$$P(C_7 = \mathrm{L}) = 787/862 = 0.91, \qquad P(C_7 = \mathrm{TL}) = 75/862 = 0.087$$
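The prior step is plain relative-frequency arithmetic. The short Python sketch below reproduces it from the class counts given in Section 3.2; note that 787/862 is about 0.913, which the text reports as 0.91.

```python
# Class counts of C7 (Payment Status) in the 862 training records.
class_counts = {"L": 787, "TL": 75}
total = sum(class_counts.values())

# Prior probability P(C7 = c) = n(C7 = c) / n
priors = {c: n / total for c, n in class_counts.items()}

for c, p in priors.items():
    print(f"P(C7 = {c}) = {p:.3f}")
# P(C7 = L) = 0.913
# P(C7 = TL) = 0.087
```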

The next step is to calculate the probability of C1 to C6 given C7; the combination of C1 to C6 values of a record is hereinafter referred to as X. For example, consider a test record with: (1) Age = 25 years; (2) Sex = Female; (3) Loan Amount = IDR 8,000,000; (4) Occupation = Private Employee; (5) Income = IDR 3,600,000; and (6) Repayment Period = 12 months. Based on Table 5, the test record is mapped to the following criteria classes:

1. Age = 25 → C1=U1

2. Sex = Female → C2=P

3. Loan Amount = IDR 8.000.000 → C3=JP1

4. Occupation = Private Employee → C4=NP

5. Income = IDR 3.600.000 → C5=P1

6. Repayment Period = 12 Months → C6=W1

Next, the number of training records for each criterion C1 to C6 is counted for each value of C7. For example, of the 862 training records, 524 have C1 = U1 with C7 = L, while 27 have C1 = U1 with C7 = TL. Similarly, 326 records have C2 = P with C7 = L, and 40 have C2 = P with C7 = TL. The same steps are performed for C3 to C6. Each count is then divided by the number of records in the corresponding class (787 for L and 75 for TL), and P(X|C7) is calculated as follows:

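$$P(X \mid C_7 = c) = \prod_{i=1}^{6} P(C_i = x_i \mid C_7 = c), \qquad P(C_i = x_i \mid C_7 = c) = \frac{n(C_i = x_i,\ C_7 = c)}{n(C_7 = c)}$$

where c is L or TL, x_i is the category value of criterion C_i in the test record, and n(·) is the number of training records that satisfy the condition.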
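As an illustration of this estimate, the Python sketch below turns the counts quoted in the text for C1 = U1 and C2 = P into conditional probabilities. Only these two criteria are included because the counts for C3 to C6 are not reported; the dictionary layout and the `conditional` helper are assumptions of this sketch. The results (0.666, 0.360, 0.414 and 0.533) correspond to the factors 0.66, 0.36, 0.41 and 0.53 used in the worked example, which appear to be truncated to two decimals.

```python
# Class sizes for C7 and joint counts n(C_i = x_i, C7 = c) quoted in the text.
class_counts = {"L": 787, "TL": 75}
joint_counts = {
    ("C1", "U1"): {"L": 524, "TL": 27},
    ("C2", "P"): {"L": 326, "TL": 40},
}


def conditional(criterion, cls):
    """Estimate P(C_i = x_i | C7 = cls) as a relative frequency within the class."""
    return joint_counts[criterion][cls] / class_counts[cls]


for criterion in joint_counts:
    name, value = criterion
    for cls in class_counts:
        print(f"P({name} = {value} | C7 = {cls}) = {conditional(criterion, cls):.3f}")
```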

The next step is to multiply P(X|C7) by P(C7) for both C7 = L and C7 = TL, which gives the following results:

Result for the test record with C7 = L:

$$P(X \mid C_7 = \mathrm{L}) \times P(C_7 = \mathrm{L}) = (0.66 \times 0.41 \times 0.61 \times 0.56 \times 0.25 \times 0.73) \times 0.91 = 0.0153$$

Result for the test record with C7 = TL:

$$P(X \mid C_7 = \mathrm{TL}) \times P(C_7 = \mathrm{TL}) = (0.36 \times 0.53 \times 0.44 \times 0.65 \times 0.37 \times 0.46) \times 0.087 = 0.0008$$

These calculations show that the larger value of P(X|C7) × P(C7) is 0.0153, obtained for C7 = L ("Paid Off"); in other words, C7 = L scores higher than C7 = TL. Therefore, the test record with criteria values C1 = U1, C2 = P, C3 = JP1, C4 = NP, C5 = P1 and C6 = W1 has a higher probability of being "Paid Off" than "Not Paid Off".
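The comparison above can be reproduced with a few lines of Python. The sketch below reuses the rounded factors printed in the worked example (the counts behind the C3 to C6 factors are not reported), computes the unnormalized score P(X|C7) × P(C7) for each class, and picks the larger one. It also normalizes the two scores, an extra step not performed in the text, which puts the estimated probability of "Paid Off" for this record at about 0.95.

```python
from math import prod

# Factors P(C_i = x_i | C7 = c) for C1..C6 as printed in the worked example.
likelihoods = {
    "L": [0.66, 0.41, 0.61, 0.56, 0.25, 0.73],
    "TL": [0.36, 0.53, 0.44, 0.65, 0.37, 0.46],
}
priors = {"L": 0.91, "TL": 0.087}

# Unnormalized score P(X | C7 = c) * P(C7 = c) for each class.
scores = {c: prod(likelihoods[c]) * priors[c] for c in priors}
prediction = max(scores, key=scores.get)

# Dividing by the sum of the scores gives the posterior P(C7 = c | X).
evidence = sum(scores.values())
posterior = {c: s / evidence for c, s in scores.items()}

print(f"score L = {scores['L']:.5f}, score TL = {scores['TL']:.5f}")
print(f"prediction = {prediction}, P(L | X) = {posterior['L']:.2f}")
# score L = 0.01535, score TL = 0.00081
# prediction = L, P(L | X) = 0.95
```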

3.4 System Design

The system is designed with a structured approach, starting with the Context Diagram shown in Figure 4, which has one entity, namely “Admin”. The Admin inputs the customer dataset and debtor data, and receives the assessment results from the system.

Figure 4. Context Diagram of The Application

The process is described in detail in the Data Flow Diagram (DFD) in Figure 5, which consists of three processes (Input Customer Dataset, Assessment of Prospective Debtors and Print Assessment Results) and two data stores, namely the debtor dataset and the assessment results.

Figure 5. Data Flow Diagram of the Application


3.5 System Implementation

In this stage, the system is implemented with Microsoft Visual Studio 2012 and a MySQL database. Figure 6 (Left) shows the implemented assessment feature for prospective debtors. In this form, the admin enters the criteria of a prospective debtor: Age, Sex, Loan Amount, Occupation, Income and Repayment Period. Once the data are filled in and valid, clicking the CEK button runs the assessment. The result then appears in a dialog box stating whether the customer with the given ID is accepted or rejected, as shown in Figure 6 (Right), based on the input data and the training data of Mitra Sejahtera Cooperative customers.
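The application itself is written in Visual Basic; the Python sketch below only illustrates, under stated assumptions, the logic the CEK button could run on top of the training data. The names `train` and `assess`, the mapping of L to "accepted" and TL to "rejected", the tie-breaking in favour of "accepted", and the optional `alpha` parameter are all hypothetical. With `alpha = 0` it uses the plain relative frequencies of Section 3.3; `alpha = 1` applies Laplace smoothing, an extension that is not part of the method described here.

```python
from collections import Counter, defaultdict
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]  # Table 5 codes for C1..C6, e.g. ("U1", "P", "JP1", "NP", "P1", "W1")


def train(records: List[Record], labels: List[str]):
    """Count classes (C7) and, per class, how often each C_i value occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (criterion index, class) -> value counts
    for rec, label in zip(records, labels):
        for i, value in enumerate(rec):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts


def assess(x: Record, class_counts, value_counts, alpha: float = 0.0,
           n_values: Sequence[int] = (2, 2, 2, 2, 2, 2)) -> str:
    """Return "accepted" if Paid Off (L) scores at least as high as Not Paid Off (TL).

    n_values holds the number of categories of each criterion (two each in Table 5);
    it only matters when alpha > 0.
    """
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # prior P(C7 = c)
        for i, value in enumerate(x):
            count = value_counts[(i, c)][value]
            score *= (count + alpha) / (n_c + alpha * n_values[i])
        scores[c] = score
    return "accepted" if scores.get("L", 0.0) >= scores.get("TL", 0.0) else "rejected"


# Hypothetical toy training set (not KMS data) to show the call sequence.
records = ([("U1", "P", "JP1", "NP", "P1", "W1")] * 3
           + [("U2", "L", "JP2", "P", "P2", "W2")] * 2)
labels = ["L", "L", "L", "TL", "TL"]
class_counts, value_counts = train(records, labels)
print(assess(("U1", "P", "JP1", "NP", "P1", "W1"), class_counts, value_counts))  # accepted
```

With `alpha = 0`, a category that never occurs with a class makes that class's score zero, which is why some Naive Bayes applications add smoothing, as in the Laplacian smoothing of [12].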

Figure 6. Assessment Form for Prospective Debtors (Left) and Assessment Result (Right)

3.6 Program Testing

3.6.1 Blackbox Testing

Blackbox testing, with 25 test scenarios, is used to measure the functional quality of the prospective debtor assessment application. Table 7 shows the results of the blackbox testing:

Table 7. Result of Blackbox Testing

No | Testing Scenario | Result | Desc.
1 | Input the login form using a valid username and password | "Data accepted" notification appears and the main menu opens | OK
2 | Input the login form using an invalid username and password | "Data restricted" notification appears and the login menu remains | OK
3 | Click the Debtors Prospective menu button | Debtors prospective assessment form appears | OK
4 | Input the debtor prospective form using relevant data | Recommendation result appears | OK
... | ... | ... | ...
25 | Click print on the debtor assessment result | Printed page appears | OK

3.6.2 Accuracy Data Testing

This test measures the accuracy of the algorithm implemented in the prospective debtor assessment application. A total of 35 testing records, taken from customer data for the December 2019 period, were entered into the application, and the application's results were compared with the real customer data for the same period to determine the error rate. The comparison found 5 records for which the application's result differed from the customer's real data, giving an error rate of 14% (5 of 35) or an accuracy rate of 86% (30 of 35).
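The accuracy figure is simply the share of test records whose predicted status matches the real status. The Python sketch below computes it; the label lists are placeholders (only the total of 35 records and the 5 mismatches come from the text, not which labels were involved), and the function name `accuracy` is hypothetical.

```python
def accuracy(predicted, actual):
    """Share of test records whose predicted status equals the real status."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Placeholder labels: 35 test records, 5 of which disagree with reality.
actual = ["L"] * 35
predicted = ["L"] * 30 + ["TL"] * 5
acc = accuracy(predicted, actual)
print(f"accuracy = {acc:.1%}, error rate = {1 - acc:.1%}")
# accuracy = 85.7%, error rate = 14.3%
```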

4. CONCLUSION

The implementation of Naive Bayes in the Debtors Prospective Assessment Application provides recommendations for assessing whether prospective debtors are likely to pay off their loans, based on six parameters: age, sex, loan amount, occupation, income and repayment period. Testing the algorithm with 35 testing records gave an accuracy of 86%. For further development, it is suggested to increase the amount of training data so that the recommendations are more precise, and to combine other classification methods with Naive Bayes.

REFERENCES

[1] Mahasiswa Ekonomi Syariah, Ekonomi Koperasi. Pasuruan: Fakultas Agama Islam, Universitas Yudharta Pasuruan, 2018.

[2] Kementerian Koperasi dan Usaha Kecil dan Mikro, “Laporan Data Koperasi Per 31 Desember 2019,” Jakarta, 2019.

[3] S. H. Permana, “Strategi Peningkatan Usaha Mikro, Kecil, Dan Menengah (Umkm) Di Indonesia,” Aspirasi, vol. 8, no. 1, pp. 93–103, 2017.

[4] S. Suthaharan, Machine Learning Models and Algorithms for Big Data Classification. North Carolina: Springer, 2016.

[5] I. G. T. Isa, “Kansei Engineering Approach in Software Interface Design,” J. Sci. Innovare, vol. 1, no. 01, pp. 22–26, 2018, doi: 10.33751/jsi.v1i01.680.

[6] M. K. Sari, E. Ernawati, and I. Wisnubhadra, “Pembangunan Aplikasi Klasifikasi Mahasiswa Baru untuk Prediksi Hasil Studi Menggunakan Naïve Bayes Classifier,” J. Buana Inform., vol. 7, no. 2, pp. 135–142, 2016, doi: 10.24002/jbi.v7i2.492.

[7] C. C. Aggarwal, “An Introduction to Data Classification,” in Data Classification: Algorithms and Applications, C. C. Aggarwal, Ed. New York, USA: CRC Press, 2014, pp. 1–31.

[8] R. B. Hadiprakoso and I. K. S. Buana, “Performance Comparison of Feature Extraction and Machine Learning Classification Algorithms for Face Recognition,” IJICS (International J. Informatics Comput. Sci.), vol. 5, no. 3, pp. 250–257, 2021, doi: 10.30865/ijics.v5i3.3333.

[9] D. A. A. AlHammadi and M. S. Aksoy, “Data Mining in Higher Education,” Period. Eng. Nat. Sci., vol. 1, no. 2, pp. 1–4, 2013, doi: 10.21533/pen.v1i2.17.

[10] Bustami, “Penerapan Algoritma Naive Bayes untuk Mengklasifikasi Data Nasabah Asuransi,” J. Inform., vol. 8, no. 1, pp. 884–898, 2014.

[11] F. Marisa, “Educational Data Mining (Konsep dan Penerapan),” J. Teknol. Inf., vol. 4, no. 2, pp. 91–93, 2013.

[12] I. Listiowarni, “Implementasi Naïve Bayessian dengan Laplacian Smoothing untuk Peminatan dan Lintas Minat Siswa SMAN 5 Pamekasan,” J. Sisfokom (Sistem Inf. dan Komputer), vol. 8, no. 2, p. 124, 2019, doi: 10.32736/sisfokom.v8i2.652.

[13] D. A. Kurniawan and Y. I. Kurniawan, “Aplikasi Prediksi Kelayakan Calon Anggota Kredit Menggunakan Algoritma Naïve Bayes,” J. Teknol. dan Manaj. Inform., vol. 4, no. 1, 2018, doi: 10.26905/jtmi.v4i1.1831.

[14] A. P. Fadillah and B. Hardiyana, “Penerapan Naïve Bayes Classifier Untuk Pemilihan Konsentrasi Mata Kuliah,” J. Teknol. dan Inf., vol. 8, no. 2, 2018, doi: 10.34010/jati.v8i2.1039.