Loan Default Prediction in Microfinance Group Lending with Machine Learning


Kadek Dwi Pradnyana¹*, Raden Aswin Rahadi¹

¹ School of Business and Management, Institut Teknologi Bandung, Jakarta, Indonesia

*Corresponding Author: [email protected]

Accepted: 15 December 2022 | Published: 31 December 2022

DOI:https://doi.org/10.55057/ijbtm.2022.4.4.8

__________________________________________________________________________________________

Abstract: Microfinance fintech enables unbanked and underbanked communities to access credit by offering small, collateral-free loans. Microfinance institutions (MFIs) usually use credit scoring to filter out risky borrowers. Credit scoring methods for individual loans have been widely studied. However, we found none for group lending in which the members are women micro-entrepreneurs in a developing country who are jointly responsible for loan repayment. This research builds a credit default prediction model for microfinance group lending using machine learning techniques. We examine six machine learning methods: XGBoost, logistic regression, linear discriminant analysis (LDA), decision trees, k-nearest neighbours (KNN), and random forest. In the first modeling phase, the XGBoost model performs best, with an accuracy of 0.97 and an AUC score of 0.85. Decision tree and random forest give comparable outcomes, each with an AUC of 0.80 and with accuracies of 0.95 and 0.97, respectively. Class balancing is then performed in an effort to increase performance. It raises the AUC of the XGBoost model from 0.85 to 0.89, while its accuracy stays at 0.97. False positive and false negative rates for this model are both low (2.05% and 1.38%, respectively). Consequently, the model has been effectively developed and is capable of differentiating between bad and good loans.

Keywords: credit model, default prediction, machine learning, microfinance, group lending

___________________________________________________________________________

1. Introduction

Financial technology (fintech) services, including payments, lending, and investing, have experienced tremendous growth. They promote more and faster transactions, greater access to capital, and financial wellness. The fintech sector in Indonesia expanded in the 2020–2022 period. According to studies from Google, Bain, and Temasek, digital payments are expected to reach US$351 billion by 2025, with digital loan books reaching US$35 billion and fintech investment AUM reaching US$28 billion (AC Ventures, 2022).

The opportunity in fintech remains huge, however, as adoption still accounts for only a small portion of the overall addressable market. There are 63 million micro, small, and medium enterprises (MSMEs) in Indonesia, and roughly 186 million people of working age fall into the middle-to-lower per capita expenditure category (PwC, 2019). The majority of these MSMEs and middle-to-low-income individuals still lack access to finance. Fintech lending has a great opportunity to address this problem and help widen much-needed credit access.

Microfinance fintech enables unbanked and underbanked communities to access credit by offering small, collateral-free loans. Microfinance has a history that dates back to Bangladesh in 1974. When visiting a Bangladeshi village, Professor Muhammad Yunus became aware of the numerous challenges women faced in carrying out their meager economic activities (Vidal & Agustí, 2018). He calculated that as little as $27 might be sufficient to enable all of them to raise their living standards. This concept contributed to the establishment of Grameen Bank, one of the microcredit institutions with the largest customer bases worldwide, with more than $1 billion in outstanding loans and more than 8 million users (Grameen Bank, 2015).

Microfinance is well known as a critical instrument for alleviating poverty and promoting social and economic well-being. It assists borrowers in diversifying their income and stabilizing household spending (Samer et al., 2015). It has been demonstrated that microfinance has a significant influence on economic development at the macro level (Imai et al., 2012). In addition, positive effects on food security, health, literacy, and female empowerment have been observed (Littlefield et al., 2003).

Figure 1: Peer-to-peer Lending Model Source: Author’s Analysis

Typically, microfinance institutions (MFIs) adopt a peer-to-peer (P2P) business model, as shown in Figure 1. They use platforms that act as intermediaries, connecting people who want to lend money with borrowers who have listed their needs on the platforms. An MFI uses a technology platform, such as a website or mobile application, to bring lenders and borrowers together. P2P lending platforms profit from the transactions between lenders and borrowers in a variety of ways, including profit-sharing, commission fees, and extra interest charged to borrowers (Rizyameza, 2020).

There are two microfinance models: individual lending and group lending. Individual lending is a loan for personal consumption. In contrast, group lending is a loan for business owners or micro-entrepreneurs to support productive activities, such as people who run a small shop or are seeking to start a new business. In group lending, borrowers who live in the same area are arranged into a group, and each group member is jointly responsible for the debt repayments of the others under the joint liability scheme (Urs & Lehner, 2009). Every week, the borrowers gather in the same place and meet the MFI’s field officer to make their repayments. If a borrower is unable to pay, the group is responsible for repaying the loan. This scheme creates social pressure among the borrowers, thereby lowering the probability of default.

For microfinance institutions (MFIs), credit risk is the most prevalent risk. Simply put, credit risk is the potential for an unfavorable outcome in which borrowers (business owners) fail to repay the loan amount. Since many of the borrowers have low incomes and low repayment capacity, the risk is more significant for MFIs. Additionally, the loans are unsecured, which means there is no collateral and therefore no assets to cover losses in the case of a customer default.


One of the main causes of loan defaults is a poor or inadequate credit assessment of the loan. Prior to approving any loan, it is necessary to evaluate the customer’s character, ability to repay, and business cash flows. This helps in deciding whether to provide a customer with a loan and in setting a proper loan amount. As a result of a poor assessment, unworthy borrowers may receive loans or receive larger sums than they should. Offering loans to people who cannot repay them, because they lack either the money or the willingness to make the installments, leads to credit defaults.

Financial institutions use credit prediction models to determine a potential borrower's creditworthiness through statistical analysis. Such a model uses quantitative measures of the performance and characteristics of individual applicants, drawn from internal and/or external data, to calculate a credit score or a probability of default.

The goal of a credit prediction model is to classify an input sample into one of the class labels using a variety of observed variables or sample-related features. The classifier's input is made up of information that characterizes the applicant's social and demographic traits (gender, marital status, employment, level of education) and financial status (loan amount, tenure, salary, expenses). The output of the classifier must then reflect the applicant's creditworthiness. In its most common form, a credit prediction model categorizes credit applicants as good (those who are likely to repay the debt) or bad (those who should be rejected due to a high likelihood of defaulting) (García et al., 2012).

Credit scoring has been explored in many studies and implemented widely for individual consumption loans. However, we were unable to locate any documentation on the use of credit scoring for productive loans under group lending schemes. Caire et al. (2006) reported the same finding.

According to published research, group lending interactions among group members strongly influence repayment behavior. Scoring is more challenging because it is unlikely that measured features can accurately predict the result of these interactions (Schreiner, 2003).

The aim of this paper is to create a credit default prediction model for microfinance group lending schemes using recent advances in machine learning. We then evaluate the effectiveness of various machine learning algorithms for credit scoring in group lending schemes. Six models are reviewed: XGBoost, logistic regression, decision trees, random forest, KNN, and LDA. The models are compared and analyzed using accuracy and the area under the receiver operating characteristic curve (AUC ROC) to identify the best-performing model for this data.

2. Literature Review

2.1 Credit Risk

Credit risk refers to the possibility that a loan may not be repaid. It relates to the risk that a lender's cash flows will be interrupted if a borrower does not pay back the loan principal or interest. Credit risk is determined by the borrower's capacity to repay a loan. Lenders use the five Cs to evaluate the creditworthiness of potential borrowers when determining the credit risk of a consumer loan. The method weighs five borrower attributes and loan terms to assess the probability of default and, subsequently, the likelihood of the lender losing money. The five Cs of credit stand for character, capacity, capital, collateral, and conditions.


2.2 Credit Prediction

Credit risk prediction is an effective technique for assessing whether a potential borrower will pay back a loan, especially in peer-to-peer lending. Most companies use a form of credit scoring to evaluate a borrower's creditworthiness by analyzing borrower-supplied information, bureau data, and/or third-party data. Typically, the credit model uses quantitative indicators of previous loan history and loan attributes to forecast the future performance of loans with comparable characteristics.

2.3 Supervised Machine Learning

Supervised learning, also known as supervised machine learning, is a branch of machine learning. It uses training examples to help make predictions about new data. In supervised learning, the system is provided with labeled data throughout its training phase, which teaches it how to relate each input feature to an output value. It is similar to learning under the supervision of a teacher. Filtering spam emails into a separate folder is a typical example of how supervised learning is used in practice.

Supervised learning can be further grouped into regression and classification. This research uses supervised classification. A classification algorithm attempts to categorize inputs into a predetermined number of groups or classes based on the labeled sample it was trained on. Classification algorithms can be used for binary classification, such as classifying a credit card transaction as fraudulent or not, or classifying tomorrow's weather forecast as rainy or not.

2.4 Previous Studies

Among the many statistical methods available, logistic regression is the one most commonly applied in the industry. With logistic regression, it is simple to turn the coefficients into a credit scorecard. A credit scorecard may be designed and developed in a variety of ways, but a basic scorecard is made up of several features that are divided into intervals or groups, with each group assigned a number of points. When predicting an applicant’s behavior, the applicant's information is mapped onto the scorecard, which produces a final score (Hovdenakk, 2021).

However, in some default prediction cases, logistic regression may have low accuracy because it cannot effectively handle nonlinear and interacting effects of explanatory features (Lessmann et al., 2015).

Some cutting-edge machine learning methods, such as support vector machines, have demonstrated greater prediction accuracy than logistic regression (Dong et al., 2010). Since the introduction of ensemble approaches, including bagging and boosting techniques, the performance of scoring models based on machine learning has grown dramatically (E. Dumitrescu et al., 2022). A comprehensive benchmarking study examined 41 techniques using a variety of evaluation standards and credit scoring datasets (Lessmann et al., 2015). The authors found evidence that the random forest method, which is the randomized variant of bagged decision trees, significantly outperforms logistic regression, and it has grown into one of the most important methods for credit prediction applications (E.-I. Dumitrescu et al., 2020). Li (2019) found that XGBoost (extreme gradient boosting) performs significantly better than logistic regression in forecasting loan default. Tian et al. (2020) reported that XGBoost outperformed both LDA and logistic regression, achieving the highest score of 78% on personal consumption loan data from a lending organization.


3. Method

Figure 2: Research Method

The research method is depicted in Figure 2. It began with problem identification in the first chapter, continued with a review of the relevant literature, and listed the study questions and objectives. Data collection, exploration, and pre-processing make up the bulk of this research. Data pre-processing comprises multiple steps, including imputation, outlier removal, feature binning, and normalization. Before the data is fed into a machine learning algorithm, a feature selection process determines which features are essential and excludes irrelevant information. Model creation includes basic model development and class balancing. Finally, the resulting model is evaluated based on AUC score, accuracy, precision, and recall.

3.1 Data Collection

The sample dataset was taken from the internal database of a microfinance institution (MFI) and contains information on 16,715 loans that were disbursed between January and August of 2021. A loan is considered a bad loan if it is ever more than 30 days past due; otherwise, it is a good loan. In the dataset, a good loan is represented by 0 and a bad loan by 1. This label is the dependent variable in the modeling process.
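The labeling rule above can be sketched as a small function. This is only an illustration: the input name `max_days_past_due` and the example values are hypothetical, and the source does not show how the label was actually derived from the MFI database.

```python
def label_loan(max_days_past_due: int, threshold: int = 30) -> int:
    """Return 1 (bad loan) if the loan was ever more than `threshold` days past due, else 0."""
    return 1 if max_days_past_due > threshold else 0


# Hypothetical maximum days-past-due values for four loans.
max_dpd = [0, 12, 30, 45]
is_bad = [label_loan(d) for d in max_dpd]
print(is_bad)  # [0, 0, 0, 1]
```

Note that a loan exactly 30 days past due is still labeled good under the "more than 30 days" wording.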

3.2 Data Exploration

Data exploration aims to examine the characteristics and initial patterns of the data. This step describes the distribution of each feature in the sample using data visualization and statistical methods, implemented with the seaborn and matplotlib Python modules. Categorical features are presented using bar charts and numerical features using histograms.

3.3 Data Pre-processing

The obtained sample data contains noise and missing values, and some fields are in an unsuitable format, preventing their direct use in machine learning models. Data pre-processing is required to clean the data and prepare it for machine learning modeling, which improves the model's accuracy and efficiency.

Data pre-processing starts with handling missing data. Missing data can significantly increase bias, make processing and analysis more difficult, and reduce the efficiency of several machine learning models. Discarding entire rows and/or columns with missing values is a primary technique for working with incomplete datasets. Another approach is to substitute the median or mode for the missing value.
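Median and mode substitution can be illustrated with the standard library alone. The sketch below assumes missing entries are represented as `None`; the sample values are made up, and the `impute` helper is hypothetical rather than the authors' code.

```python
from statistics import median, mode


def impute(values, strategy):
    """Replace None entries with the median or mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed) if strategy == "median" else mode(observed)
    return [fill if v is None else v for v in values]


expense = [1.5, None, 2.0, 4.0]                       # numerical feature
income_source = ["trade", None, "trade", "farming"]   # categorical feature

print(impute(expense, "median"))      # [1.5, 2.0, 2.0, 4.0]
print(impute(income_source, "mode"))  # ['trade', 'trade', 'trade', 'farming']
```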

The next step is to check for outliers in the sample. An outlier is an observation that lies far from the other observations. Outliers may result from observation or implementation mistakes. Examining the maximum, minimum, mean, and standard deviation of each attribute is one technique for finding outliers. Outliers can then be eliminated by simply excluding them from the sample data set.
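As a rough sketch of this step, the snippet below computes the four summary statistics and drops values far from the mean. The cut-off of two standard deviations is an assumption chosen for illustration; the source does not state which rule was used to decide that a value is an outlier, and the ages are invented.

```python
from statistics import mean, stdev


def summarize(values):
    """Return the summary statistics inspected when looking for outliers."""
    return {"min": min(values), "max": max(values),
            "mean": mean(values), "std": stdev(values)}


def remove_outliers(values, k=2.0):
    """Keep values within k sample standard deviations of the mean (k is an assumed cut-off)."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) <= k * s]


# Hypothetical ages; 199 is an obvious data-entry error.
ages = [25, 30, 32, 35, 38, 41, 44, 47, 52, 199]
print(summarize(ages)["max"])  # 199
print(remove_outliers(ages))   # the 199 entry is dropped
```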

Data binning is a method for minimizing the effect of minor observational biases. The original data values are split into several intervals (bins), and each value is then replaced with a representative value for its bin. This smooths the input data and can reduce the chance of overfitting.
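A minimal sketch of binning, assuming fixed cut points. The edges and labels below are hypothetical; the source does not list the bins that were actually used.

```python
from bisect import bisect_right


def bin_value(value, edges, labels):
    """Map a numeric value to the label of the interval it falls in.

    `edges` are ascending cut points; `labels` has len(edges) + 1 entries.
    """
    return labels[bisect_right(edges, value)]


age_edges = [30, 45]                    # hypothetical cut points
age_labels = ["<30", "30-44", ">=45"]
ages = [22, 30, 44, 45, 61]
binned = [bin_value(a, age_edges, age_labels) for a in ages]
print(binned)  # ['<30', '30-44', '30-44', '>=45', '>=45']
```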

The objective of normalization is to bring features onto a similar scale. This increases the efficiency and consistency of the model. Numerical feature values can be normalized by rescaling them from their native range into a standard range. This process is known as min-max normalization. The minimum and maximum values of each feature are mapped to 0 and 1, respectively.
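Min-max normalization maps each value x to (x - min) / (max - min). The following sketch applies it to a made-up feature; returning zeros for a constant feature is an assumption made here to avoid division by zero.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: no spread to rescale (assumed convention).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


income = [2.0, 4.0, 6.0, 10.0]   # hypothetical values
print(min_max_scale(income))     # [0.0, 0.25, 0.5, 1.0]
```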

Several machine learning algorithms cannot operate directly on categorical features. Therefore, these features must be transformed into numerical values through an encoding process. Datasets often contain columns with categorical features, such as gender, which takes values such as male and female. One-hot encoding creates additional columns, one for each unique value of the categorical attribute. Thus, for a male borrower, the value in the male column is 1 and the value in the female column is 0, and vice versa.
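The gender example can be written out as follows. The `one_hot` helper and the `gender_` column prefix are hypothetical names used only for this illustration.

```python
def one_hot(values, column):
    """Return one dict per row with a 0/1 indicator for every category seen in `values`."""
    categories = sorted(set(values))
    return [{f"{column}_{c}": int(v == c) for c in categories} for v in values]


gender = ["male", "female", "male"]
encoded = one_hot(gender, "gender")
print(encoded[0])  # {'gender_female': 0, 'gender_male': 1}
print(encoded[1])  # {'gender_female': 1, 'gender_male': 0}
```

Each categorical column expands into as many indicator columns as it has distinct values, which is why the column count grows after encoding.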

Multicollinearity arises when there is a strong correlation between two or more predictor features. In other words, the features are redundant, and one feature can be predicted from the others. The redundant information can skew the results of some machine learning models. The variance inflation factor (VIF) is used to assess the degree of multicollinearity among independent features.
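For reference, the standard definition of the VIF for feature j is given below, where R_j^2 is the coefficient of determination obtained by regressing feature j on all the other features. The cut-off above which a feature is considered too collinear is not stated in the source.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

A feature that is uncorrelated with the others has R_j^2 = 0 and therefore a VIF of 1, while a feature that is almost perfectly predicted by the others has a VIF that grows without bound.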

3.4 Feature Selection

As a dimensionality reduction strategy, feature selection seeks to pick a small subset of features from the initial dataset by eliminating irrelevant, redundant, or noisy features. In most cases, feature selection can result in improved learning performance, increased accuracy, and reduced computing time. There are two types of feature selection techniques: supervised, which can be applied to labeled data, and unsupervised, which can be applied to unlabeled data.

3.5 Model Development

3.5.1 XGBoost

Extreme Gradient Boosting, or XGBoost, is an efficient open-source implementation of the gradient boosting technique. Gradient boosting is a machine learning method that estimates a target variable by aggregating the results of a series of weaker, simpler models. XGBoost was designed to be highly effective, more powerful than existing variants, and computationally efficient (Chen & Guestrin, 2016). Although it was initially written in C++, it provides APIs in a number of other programming languages. This study uses the XGBoost package for Python, which offers a scikit-learn-compatible interface.


3.5.2 Logistic Regression

Logistic regression is a classification method that predicts a categorical dependent variable from a given set of independent features. In this study, the dependent variable is binary, with data labeled as either 1 (representing yes) or 0 (representing no). A logistic regression model estimates P(Y=1) as a function of X. It is one of the fundamental machine learning methods and can be used for a variety of classification problems.
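To make "P(Y=1) as a function of X" concrete, the sketch below evaluates the logistic (sigmoid) function on a weighted sum of features. The weights and bias are placeholders, not fitted coefficients from this study.

```python
import math


def predict_proba(features, weights, bias):
    """Estimate P(Y=1 | X) with the logistic function applied to a linear score."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Placeholder coefficients for two already-normalized features.
weights, bias = [1.2, -0.8], 0.0
p_neutral = predict_proba([0.0, 0.0], weights, bias)  # linear score 0 gives probability 0.5
p_risky = predict_proba([1.0, 0.0], weights, bias)    # positive score gives probability above 0.5
print(p_neutral, p_risky > 0.5)
```

A loan is then assigned to class 1 when the estimated probability exceeds a chosen threshold, commonly 0.5.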

3.5.3 Decision Tree

Decision Tree is a type of supervised machine learning method in which the input is partitioned recursively based on given criteria (Wu, 2022). A tree is made of decision nodes and leaves. The decision nodes are where the data is split, and the leaves represent decisions and outcomes. For classification problems, the algorithm begins at the tree's root node. At each node, it compares the sample's attribute value with the node's split condition and follows the matching branch to the next node. The procedure is repeated until a leaf node is reached. The logic underlying a decision tree is easy to grasp because of its tree-like shape.

3.5.4 Random Forest

Random Forest is a supervised machine learning method that combines a collection of decision trees trained on various subsets of a data set to enhance predictive performance (Wu, 2022). Instead of depending on a single decision tree, the random forest algorithm considers the result from each tree and determines the overall result by majority vote. Increasing the number of trees generally improves accuracy. A random forest mitigates the limitations of a single decision tree: it reduces overfitting to the training data and improves predictive performance.
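The voting step alone can be sketched as below; training the individual trees on bootstrap samples is omitted. The per-tree predictions are invented for illustration.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical predictions from five trees for one loan (1 = bad, 0 = good).
votes = [1, 0, 1, 1, 0]
print(majority_vote(votes))  # 1
```

Using an odd number of trees avoids ties in the binary case.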

3.5.5 K-nearest Neighbors

The KNN algorithm assumes that similar items lie close to one another in feature space. It stores all available data and classifies a new data point based on how similar it is to the stored data points. As a result, new data can be readily assigned to a suitable category when it becomes available. KNN is applicable to both classification and regression problems (Zhang, 2016).

3.5.6 Linear Discriminant Analysis (LDA)

LDA is a technique for separating classes by finding a linear combination of features onto which the high-dimensional data are projected. Despite its simplicity, LDA frequently yields robust, reasonable, and interpretable classification outcomes. It is frequently used as a baseline before more complex and flexible classification methods are applied. Ronald A. Fisher introduced the linear discriminant, which has practical applications as a classifier (Fisher, 1936).

3.6 Model Evaluation

3.6.1 Confusion Matrix

The confusion matrix is a tool that describes the performance of a classification algorithm. It consists of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), as shown in Figure 3. True Positive is the number of borrowers who actually defaulted and were classified as default. True Negative is the number of borrowers who did not default and were classified as not default. False Positive is the number of borrowers who did not default but were classified as default. False Negative is the number of borrowers who actually defaulted but were classified as not default.


Figure 3: Confusion Matrix

3.6.2 Accuracy, Precision and Recall

Accuracy is the number of correctly classified samples divided by the total number of samples.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$$

Precision is the ratio of correctly predicted positive samples to all samples predicted as positive.

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

Recall is the proportion of correctly predicted positive samples among all actual positive samples.

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
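The three measures follow directly from the four confusion-matrix counts. The sketch below uses made-up labels and predictions (1 = bad loan, 0 = good loan); returning 0.0 when a denominator is zero is a convention assumed here.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels where 1 is the positive (default) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0


y_true = [1, 1, 1, 0, 0, 0, 0, 0]   # hypothetical actual outcomes
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]   # hypothetical model predictions
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)               # 2 4 1 1
print(accuracy(tp, tn, fp, fn))     # 0.75
```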

3.6.3 AUC ROC

The AUC (Area Under The Curve) ROC (Receiver Operating Characteristics) represents the degree or level of separability, or the model's ability to differentiate between classes. It measures the predictive power of a model (Hosmer et al., 2013). The quality of the model can be categorized based on the AUC score as shown in table 1.

$$\mathrm{AUC} = \frac{1}{2}\left(1 + \frac{TP}{TP + FN} - \frac{FP}{FP + TN}\right)$$

Table 1: Model Quality by AUC Score

AUC Score   Model Quality
0.5-0.7     Poor
0.7-0.8     Acceptable
0.8-0.9     Excellent
>0.9        Outstanding

Source: (Hosmer et al., 2013)
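The AUC expression in this section depends only on the confusion-matrix counts at a single classification threshold, so it can be computed as sketched below and then mapped to the quality bands of Table 1. How scores exactly on a band boundary, or below 0.5, are treated is an assumption made for this illustration.

```python
def single_threshold_auc(tp, tn, fp, fn):
    """AUC = 0.5 * (1 + TPR - FPR), computed from one confusion matrix."""
    tpr = tp / (tp + fn)   # true positive rate (recall)
    fpr = fp / (fp + tn)   # false positive rate
    return 0.5 * (1 + tpr - fpr)


def model_quality(auc):
    """Map an AUC score to the bands of Table 1 (boundary handling is assumed)."""
    if auc > 0.9:
        return "Outstanding"
    if auc >= 0.8:
        return "Excellent"
    if auc >= 0.7:
        return "Acceptable"
    return "Poor"


auc = single_threshold_auc(tp=2, tn=4, fp=1, fn=1)   # hypothetical counts
print(model_quality(auc))    # Acceptable
print(model_quality(0.89))   # Excellent
```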

4. Results

The sample data contains 16,715 rows. The dependent variable is the "is_bad" column, which indicates whether a loan is good (never more than 30 days late) or bad (ever more than 30 days late). There are 15,642 good loans and 1,073 bad loans. There are 38 independent features, of which 22 are numerical and 16 are categorical, as shown in Tables 2 and 3, respectively.


Table 2: Numerical Features

Features                  Definition
Age                       The age of the borrowers when they applied for the loans
Branch_active_borrower    Total active borrowers in a branch where the new borrower applies
Branch_attendance         The number of borrowers attending weekly meetings
Branch_dpd_30             Percentage of bad loans (late more than 30 days)
Branch_installment        Average installment amount of borrowers in the branch
Branch_non_trade_portion  Average non-trade sectors to all sectors
Branch_limit_monthly      Weekly average of existing borrowers’ loan size
Branch_join_liability     Percentage of joint liability repayments of existing borrowers
Expense                   Borrower's stated monthly expenses
Guarantor_income          Income of the loan guarantor
Other_income              Other income stated by borrowers
Area_dpd_30               The percentage of bad loans (30 or more days past due)
Branch_age                Number of months since a branch first disbursed loans
Branch_dpd_0              Total DPD 0 borrowers divided by total borrowers
Branch_frequency          Average repayment done by existing borrowers in a branch
Branch_new_borrower_fo    New monthly borrowers recruited by field officers
Branch_os                 Average outstanding amount of a branch
Branch_plafond_weekly     Weekly average of existing borrowers’ loan size
Branch_week               Average repayment week of the existing borrowers
Guarantor_age             Age of the guarantor of the borrowers
Income                    Borrower's stated income
Weekly_repayment          The amount of weekly repayment submitted by the borrower

Table 3: Categorical Features

Features                  Definition
Business_type             Type of business that the borrowers own
Guarantor_address_flag    Type of address
Guarantor_job             Job of the loan guarantor
Guarantor_relationship    Type of relationship between guarantor and borrower
Loan_amount_submission    The amount of the loan requested by the borrower
Marital_status            Marital status of the borrower
Type_of_gas               Type of gas used by the borrower
Type_of_flooring          Type of flooring in use at the borrower’s house
Education                 Borrower education
Guarantor_education       Education of the loan guarantor
Guarantor_marital_status  Marital status of the loan guarantor
Income_source             Income of the borrowers
Loan_purpose              The loan purpose stated by the borrower
Sector                    Sector of the borrower’s business
Type_of_vehicle           Type of vehicle owned by the borrower: car and motorcycle
Has_fridge                Whether or not the borrower has a fridge in their house

Some features contain noise and outliers, as seen in the data exploration step. Therefore, data pre-processing is required to prepare the data before developing a model. The guarantor address flag, income source, and expense fields have a few rows with missing values. The missing values in these features are replaced with the mode. Outliers in the age feature are also removed. All numerical features are normalized into the range 0 to 1 using a min-max scaler. Finally, the categorical features are encoded using a one-hot encoder. This operation increases the number of columns, resulting in 118 columns in the new data set.

Multicollinearity was tested using the variance inflation factor (VIF). Some features were removed because they had extremely high VIF scores. The sample data set has 94 features (columns) left after this process.

4.1 Initial Models

The sample was then split randomly into two parts: a training dataset and a testing dataset. The authors kept 20% of the data for testing and used 80% for training. Machine learning models were trained on the training dataset and later tested on the testing dataset.
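A random 80/20 split can be sketched with the standard library as follows. The seed and the helper name `split_train_test` are hypothetical, and the source does not say whether the split was stratified by class, so this sketch is not stratified.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and hold out `test_fraction` of them for testing."""
    rng = random.Random(seed)      # fixed seed (assumed) for reproducibility
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(100))            # stand-in for 100 loan records
train, test = split_train_test(rows)
print(len(train), len(test))       # 80 20
```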

XGBoost performs best, with an AUC score of 0.85, which indicates that the model is capable of differentiating between the two classes. The model's accuracy is 0.97, meaning that it correctly classifies 97% of the data points. Decision tree and random forest produce similar results, each with an AUC of 0.80 and with accuracies of 0.95 and 0.97, respectively. K-nearest neighbors, linear discriminant analysis, and logistic regression are considered poor models, with AUC scores lower than 0.7.

Table 4: Initial Model Performance Comparison

Model                         AUC   Accuracy  Precision  Recall
XGBoost                       0.85  0.97      0.86       0.71
Logistic Regression           0.67  0.75      0.14       0.58
Decision Tree                 0.80  0.95      0.64       0.62
Random Forest                 0.80  0.97      0.87       0.60
K-nearest Neighbors           0.64  0.94      0.66       0.29
Linear Discriminant Analysis  0.64  0.94      0.54       0.30

4.2 Class Balancing

The classes in this sample data set are unbalanced, since there are many more negative cases (good loans) than positive cases (bad loans). The data is therefore biased in favor of one class. However, the majority of machine learning algorithms were created under the assumption that the classes are roughly equally represented.

One popular technique for unbalanced classification is balanced class weighting. To improve model performance, it adjusts the weights of the majority and minority classes during model training so that each class carries equal total weight.

Positive cases (bad loans) make up just 7% of the sample data. In other words, the ratio of positive to negative cases is 1:13.2. Thus, to balance the weights of both classes, the weight of each positive sample must be 13.2 times that of each negative sample. Following rebalancing, the models are trained and tested once more.
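The weighting described above amounts to dividing the number of negative samples by the number of positive samples. The sketch below reproduces a 1:13.2 ratio with made-up labels. In the XGBoost library this ratio is commonly passed through the `scale_pos_weight` parameter, but the source does not state which mechanism the authors used, so that mapping is an assumption.

```python
def positive_class_weight(labels):
    """Weight for each positive sample so both classes carry equal total weight."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return n_neg / n_pos


# Hypothetical labels with the 1:13.2 positive-to-negative ratio from the text.
labels = [1] * 5 + [0] * 66
weight = positive_class_weight(labels)
print(weight)  # 13.2

# Per-sample weights: positives get the ratio, negatives get 1.
sample_weights = [weight if y == 1 else 1.0 for y in labels]
```

With these weights, the five positive samples together weigh as much as the 66 negative samples.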


Table 5: Machine Learning Performance After Class Balancing

Model                         AUC   Accuracy  Precision  Recall
XGBoost                       0.89  0.97      0.72       0.79
Logistic Regression           0.67  0.75      0.14       0.58
Decision Tree                 0.79  0.95      0.61       0.62
Random Forest                 0.80  0.97      0.85       0.61
K-nearest Neighbors           0.76  0.86      0.28       0.65
Linear Discriminant Analysis  0.66  0.73      0.14       0.58

The performance of each machine learning algorithm after class balancing is shown in Table 5. The AUC score of the XGBoost classifier rises from 0.85 to 0.89 while its accuracy remains constant. There are minimal changes for logistic regression, decision tree, and random forest. For K-nearest neighbors and linear discriminant analysis, however, AUC improves while accuracy declines. Prior to class balancing, accuracy for both KNN and LDA was 0.94; after class balancing, it was 0.86 and 0.73, respectively.

For XGBoost, the AUC score and recall improve while accuracy is unchanged. Although precision decreases from 0.86 to 0.72, recall increases from 0.71 to 0.79. In this case, recall is more important than precision, because identifying as many of the actual bad loans as possible matters more than ensuring that every loan flagged as bad is truly bad. This model also has low false positive (2.05%) and false negative (1.38%) rates, as shown in Figure 4.

Figure 4: The Final XGBoost ML Model

5. Discussion

The final model selected is an XGBoost model with class balancing, which outperforms all other machine learning models. It has an AUC of 0.89, which is considered excellent discrimination, and can distinguish between bad and good loans. The accuracy is likewise very good, at 0.97. According to other studies, eXtreme Gradient Boosting (XGBoost) performs better than other techniques. It is also frequently used by top competitors in Kaggle competitions.

This study demonstrates that it is feasible to develop a model with strong predictive ability for microfinance group lending. We therefore recommend that the model be incorporated into the MFI's risk management system.


6. Conclusion

We employ supervised machine learning techniques to build a credit default prediction model. The model is designed specifically for microfinance group lending schemes in which the members are women micro-entrepreneurs in a developing country who are jointly responsible for loan repayment.

The sample was obtained from a microfinance institution. It first goes through a pre-processing step, in which certain missing values are filled in with the median or mode, outliers are eliminated, numerical features are normalized using a min-max scaler, and categorical features are encoded. We then split the data into training and test sets in an 80%/20% ratio. The training set is used to fit the machine learning models, whereas the test set is used to evaluate their performance. We examine six different machine learning methods: XGBoost, linear discriminant analysis (LDA), decision trees, random forests, logistic regression, and k-nearest neighbour (KNN).

In the initial modeling stage, the XGBoost model performs the best. It outperforms the other models with an AUC score of 0.85 and an accuracy of 0.97. Decision tree and random forest produce similar results, with an AUC of 0.80 and accuracies of 0.95 and 0.97 respectively. The other models show weaker performance.

Class balancing is carried out in an effort to improve performance. It successfully improves the XGBoost model, increasing its AUC from 0.85 to 0.89. Its accuracy remains constant at 0.97. This model has low false positive (2.05%) and false negative (1.38%) rates.

Hence, the model has excellent capability to distinguish between bad and good loans.
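The pre-processing and splitting steps described above can be sketched for a single numerical feature as follows. This is a minimal illustration using only the Python standard library; the helper names and the toy `loan_amounts` values are hypothetical, and outlier removal and categorical encoding are omitted for brevity.

```python
import random
from statistics import median


def fill_missing_with_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - low) / (high - low) for v in values]


def train_test_split_80_20(rows, seed=0):
    """Shuffle the rows and return an 80% training part and a 20% test part."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = int(0.8 * len(rows))
    return [rows[i] for i in indices[:cut]], [rows[i] for i in indices[cut:]]


# Toy feature with two missing values (hypothetical loan amounts).
loan_amounts = [2.0, None, 4.0, 6.0, 10.0, 3.0, None, 5.0, 8.0, 7.0]
filled = fill_missing_with_median(loan_amounts)
scaled = min_max_scale(filled)
train, test = train_test_split_80_20(scaled)
print(len(train), len(test))
```

The sketch follows the order given in the text (pre-process, then split). A common alternative is to compute the median and the min-max range on the training part only and reuse them for the test part, so that no information from the test set influences pre-processing.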

References

AC Ventures. (2022). 2022 the Coming of Age of Indonesia’s Fintech Industry - AC Ventures. https://acv.vc/fintech-indonesia-2022/

Caire, D., Barton, S., de Zubiria, A., Alexiev, Z., Dyer, J., Bundred, F., & Brislin, N. (2006). A Handbook for Developing Credit Scoring Systems in a Microfinance Context. www.microLINKS.org.

Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 13-17-August-2016, 785–794. https://doi.org/10.1145/2939672.2939785

Dong, G., Lai, K. K., & Yen, J. (2010). Credit scorecard based on logistic regression with random coefficients. Procedia Computer Science, 00, 0–000. https://doi.org/10.1016/j.procs.2010.04.278

Dumitrescu, E., Hué, S., Hurlin, C., & Tokpavi, S. (2022). Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects. European Journal of Operational Research, 297(3), 1178–1192. https://doi.org/10.1016/J.EJOR.2021.06.053

Dumitrescu, E.-I., Hué, S., Hurlin, C., & Tokpavi, S. (2020). Machine Learning or Econometrics for Credit Scoring: Let’s Get the Best of Both Worlds. SSRN Electronic Journal. https://doi.org/10.2139/SSRN.3553781

Fisher, R. A. (1936). The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics, 7(2), 179–188. https://doi.org/10.1111/J.1469-1809.1936.TB02137.X

García, V., Marqués, A. I., & Sánchez, J. S. (2012). Non-parametric statistical analysis of machine learning methods for credit scoring. Advances in Intelligent Systems and Computing, 171 AISC, 263–272. https://doi.org/10.1007/978-3-642-30864-2_25

Grameen Bank. (2015). Grameen bank’s cumulative loan disbursement since inception crosses the threshold of BDT. https://demo.grameenbank.org/wp-content/uploads/bsk-pdf-manager/GB-2015_33.pdf

Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied Logistic Regression: Third Edition, 1–510. https://doi.org/10.1002/9781118548387

Hovdenakk, A. H. (2021). Machine learning vs logistic regression in credit scoring: A trade-off between accuracy and interpretability? https://bora.uib.no/bora-xmlui/handle/11250/2762661

Imai, K. S., Gaiha, R., Thapa, G., & Annim, S. K. (2012). Microfinance and Poverty—A Macro Perspective. World Development, 40(8), 1675–1689. https://doi.org/10.1016/J.WORLDDEV.2012.04.013

Lessmann, S., Baesens, B., Seow, H. V., & Thomas, L. C. (2015). Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. European Journal of Operational Research, 247(1), 124–136. https://doi.org/10.1016/J.EJOR.2015.05.030

Li, Y. (2019). Credit risk prediction based on machine learning methods. 14th International Conference on Computer Science and Education, ICCSE 2019, 1011–1013. https://doi.org/10.1109/ICCSE.2019.8845444

Littlefield, E., Morduch, J., & Hashem, S. (2003). Is Microfinance an Effective Strategy to Reach the Millennium Development Goals? CGAP FocusNote No 24.

Loiseau-Aslanidi, O., Thiagarajah, N. S., & Tolstova, V. (2020). Automating Interpretable Machine Learning Scorecards. www.economy.com, www.moodysanalytics.com

PwC. (2019). Indonesia’s Fintech Lending: Driving Economic Growth through Financial Inclusion - Executive Summary. https://www.pwc.com/id/en/fintech/PwC_FintechLendingThoughtLeadership_ExecutiveSummary.pdf

Rizyameza, A. (2020). Credit Scorecard Implementation in Agricultural Peer-to-peer (P2P) Lending: A Case of PT Berkembang.

Samer, S., Majid, I., Rizal, S., Muhamad, M. R., Sarah-Halim, & Rashid, N. (2015). The Impact of Microfinance on Poverty Reduction: Empirical Evidence from Malaysian Perspective. Procedia - Social and Behavioral Sciences, 195, 721–728. https://doi.org/10.1016/J.SBSPRO.2015.06.343

Schreiner, M. (2003). Scoring: The Next Breakthrough in Microcredit? CGAP, 7.

Tian, Z., Xiao, J., Feng, H., & Wei, Y. (2020). Credit Risk Assessment based on Gradient Boosting Decision Tree. Procedia Computer Science, 174, 150–160. https://doi.org/10.1016/J.PROCS.2020.06.070

Urs, S., & Lehner, M. (2009). Group Lending versus Individual Lending in Microfinance. www.sfbtr15.de

Vidal, R. L., & Agustí, J. S. (2018). Microcredit in the developed countries: the case of Barcelona. https://ec.europa.eu/migrant-integration/sites/default/files/2019-10/EWI05-Microcreditinthedevelopedcountries_thecaseofBarcelona.pdf

Wu, W. (2022). Machine Learning Approaches to Predict Loan Default. Intelligent Information Management, 14(5), 157–164. https://doi.org/10.4236/IIM.2022.145011

Zhang, Z. (2016). Introduction to machine learning: k-nearest neighbors. Annals of Translational Medicine, 4(11). https://doi.org/10.21037/ATM.2016.03.37