Jurnal Teknik Informatika dan Sistem Informasi ISSN 2407-4322
Vol. 10, No. 1, March 2023, pp. 1010-1018 E-ISSN 2503-2933
http://jurnal.mdp.ac.id [email protected]
Clustering Countries According To The Logistics Performance Index
M. Mujiya Ulkhaq
Department of Industrial Engineering, Diponegoro University, Indonesia e-mail: [email protected]
Abstract
This study clusters countries according to the 2018 logistics performance index (LPI) data. The LPI is an indicator of logistics sector performance based on a survey that the World Bank has conducted since 2007, and it is widely accepted throughout the world. The 2018 edition of the LPI covers 160 countries. The index helps countries understand their current position and develop strategies and policies to improve their performance in world trade. Three clustering algorithms (k-means, k-medoids, and clustering large applications) are used, and the elbow method is applied to obtain the optimal number of clusters. According to the elbow method, the optimal number of clusters is three for all three algorithms. Countries in the first cluster are considered the best performers, while countries in the third cluster are the worst performers in terms of logistics performance. This study is expected to give insight into how to apply clustering algorithms to a real-world data set and how to interpret the results.
Keywords—clustering, clustering algorithm, cluster validation, logistics performance index
1. INTRODUCTION
According to the Council of Supply Chain Management Professionals, logistics refers to “the process of planning, implementing, and controlling procedures for the efficient and effective transportation and storage of goods including services, and related information from the point of origin to the point of consumption for the purpose of conforming to customer requirements. This definition includes inbound, outbound, internal, and external movements” [1]. Logistics is one of the key elements of trade [2], and logistics performance significantly influences the volume of bilateral trade. This can increase competitiveness not only for companies, but also for countries, which are increasingly realizing the importance of logistics in world trade [3]. Consequently, these conditions create the need to develop specific measurement systems for logistics performance, and strategies to advance a country’s performance.
The logistics performance index (LPI) is an indicator of logistics sector performance based on a survey that the World Bank has conducted since 2007. It is widely accepted throughout the world, and its 2018 edition covers 160 countries. The LPI is a powerful tool for countries to compare and assess their logistics performance on a global platform, and to understand logistics challenges and areas for improvement [4]. It helps countries understand their current position and develop strategies and policies to improve their performance in world trade.
The LPI is increasingly being used by political authorities to develop strategies [5]. For example, the European Commission has used the LPI in its Transport Evaluation Panel and in its performance evaluation at the Customs Union [5]. In this regard, various international transport associations and institutions support the World Bank in preparing and implementing the LPI survey [6]. It is therefore recognized that an increase in a country’s LPI score means a greater volume of global trade [2], [3], [6], [7]. In Indonesia, the LPI is formally used to measure the performance of the Ministry of Trade, and it is also used by the Asia-Pacific Economic Cooperation organization to measure the impact of initiatives to improve connectivity in supply chains [8].
This research aims to cluster countries according to the LPI 2018 data. Clustering is the process of grouping data objects into clusters according to their features. It has been addressed in many contexts and by researchers in many disciplines, such as marketing, e.g., [9], [10], [11], [12]; biology, e.g., [13], [14]; image processing, e.g., [15], [16]; psychology, e.g., [17], [18]; and even the clustering of countries by happiness [19], [20]. Many different types of clustering algorithms have been proposed in the literature. In this research, three clustering algorithms are presented and then compared to look for “the best” way to partition the countries.
This research employed R, a programming language for statistical computing and graphics. The choice is motivated by the wide recognition of R in statistics, data mining, and machine learning, and by its well-established clustering packages. This study is also intended to assist researchers who have programming skills in R but little experience in clustering data.
2. LOGISTICS PERFORMANCE INDEX
The logistics performance index (LPI), issued by the World Bank, is one of the indicators that assess the logistics performance of a country. The LPI is based on a survey of logistics professionals about their perceptions of the logistics performance of the countries concerned. There are six indicators in the LPI, namely:
1. Customs: efficiency of customs and border management permits,
2. Infrastructure: the quality of infrastructure related to trade and transportation,
3. International shipments: the ease of arranging the delivery of goods at competitive prices,
4. Logistics quality: competence and quality of logistics services,
5. Tracking and tracing: the ability to track and trace shipments, and
6. Timeliness: the frequency of delivery of goods that can reach the recipient of the goods within the scheduled or expected time.
The six LPI indicators are divided into two main categories, namely: (i) the area of policy regulation, which is the input of the supply chain (i.e., customs, infrastructure, and logistics quality), and (ii) the results of supply chain performance, which is the output (i.e., timeliness, international shipments, and tracking and tracing).
The LPI value is built from these six indicators using principal component analysis (PCA). Scores are normalized by subtracting the sample mean and dividing by the standard deviation before PCA is performed. The result of the PCA is a single indicator, namely a weighted average of the six indicators. The weights are chosen to maximize the percentage of variation in the six LPI indicators that is accounted for by this single indicator. In the 2018 edition of the LPI, the weights of the six indicators are as follows: customs = 0.40; infrastructure = 0.42; international shipments = 0.40; logistics quality = 0.42; tracking and tracing = 0.41; timeliness = 0.40 [21].
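Because the six weights are nearly equal, the weighted average stays close to a simple mean of the indicators. The following Python sketch (the study itself used R) illustrates this with the weights quoted above. It assumes that the weights are applied directly to the indicator scores and rescaled to sum to one, which may simplify the World Bank's exact procedure; the function name and the example scores are hypothetical.

```python
# Component weights reported for the 2018 LPI (quoted in the text above).
WEIGHTS = {
    "customs": 0.40,
    "infrastructure": 0.42,
    "international_shipments": 0.40,
    "logistics_quality": 0.42,
    "tracking_and_tracing": 0.41,
    "timeliness": 0.40,
}


def weighted_lpi(scores):
    """Weighted average of the six indicator scores.

    Assumption: weights are rescaled so that they sum to one.
    """
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight


# Hypothetical indicator scores for one country (not taken from the LPI data).
example_scores = {
    "customs": 2.5,
    "infrastructure": 3.0,
    "international_shipments": 3.0,
    "logistics_quality": 3.5,
    "tracking_and_tracing": 3.0,
    "timeliness": 3.0,
}

lpi = weighted_lpi(example_scores)
simple_mean = sum(example_scores.values()) / len(example_scores)
print(f"weighted: {lpi:.3f}, simple mean: {simple_mean:.3f}")
```

With these illustrative scores the two values differ only in the third decimal place, which is what one would expect when all weights lie between 0.40 and 0.42.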
In the latest edition of the LPI report (2018), Indonesia is in 46th position with an overall LPI score of 3.15, its highest overall score across all LPI editions. This is an improvement over the previous edition (2016), in which Indonesia scored 2.98 and was in 63rd position. In the 2014 edition, Indonesia scored 3.08 and was in 53rd position. In the 2012 edition, it scored 2.94 and was in 59th position. In the 2010 edition, it scored 2.76 and was in 75th position. In the first edition of the LPI (2007), Indonesia scored 3.01 and was in 43rd position.
3. DATA
The data set used in this research is taken from the World Bank’s LPI website (https://lpi.worldbank.org/). The data consist of the overall LPI score with its lower and upper bounds, the scores for each LPI indicator (i.e., customs, infrastructure, international shipments, logistics quality, tracking and tracing, and timeliness), and the countries’ ranks according to the overall LPI value and the value of each indicator. The 2018 LPI covers 160 countries.
4. METHODOLOGY
The basic steps in the clustering process can be summarized as follows (see Figure 1).
1. Data cleansing and imputation
Real-world databases often contain errors (trivial or non-trivial, syntactic or semantic) and missing values. Data preprocessing might be necessary to ensure that the information is consistent, accurate, and of high quality before it is used in clustering analysis. The data set used in this research is described in the previous section.
2. Feature selection
This step aims to choose the features on which the clustering analysis is conducted. In this research, the three input indicators of the LPI (i.e., customs, infrastructure, and logistics quality) are used as features.
3. Clustering analysis
This step refers to the choice of clustering algorithm. Scholars have proposed many clustering algorithms, and it is not possible to present and review all of them; instead, the following subsection presents only the algorithms used in this study.
4. Cluster validation
Once clusters have been obtained from a clustering algorithm, a question arises: “How well do the obtained clusters fit the data set?” The question is essential because different clustering algorithms (or different configurations of the same algorithm) can generate different clusters; one can therefore compare several algorithms and choose the one that best fits the data.
5. Interpretation
In many cases, experts and professionals in the field of application have to integrate the result obtained from the clustering algorithm with other analyses or experimental evidence in order to draw correct conclusions and gain insightful knowledge.
Figure 1. Steps of the Clustering Process. Source: [20]
4.1 Clustering Algorithms
Scholars have proposed several different types of clustering algorithms, as well as several taxonomies to structure them; see, e.g., [16], [22]. This subsection briefly explains the three clustering algorithms used in this study.
1. k-means
k-means [23] is arguably the most widely used clustering algorithm in the literature because of its computational speed and simplicity. It requires a distance measure (typically the Euclidean distance) and a predefined number of clusters (k). Each object or observation is assigned to one cluster according to its distance to the centroid, or cluster centre; the centroids are then recomputed, and the two steps are repeated until the assignments stabilize. The objective of the algorithm is to minimize the total squared distance between the observations and the centroid of their cluster. The predefined number of clusters is one of the main limitations of this algorithm, since the final clusters depend on the choice of k. Moreover, k-means is sensitive to the initial seed selection.
2. k-medoids
This algorithm, also known as partitioning around medoids (PAM), was proposed by [24]. A medoid is the representative object of a cluster, i.e., the object whose total dissimilarity to all other objects in the cluster is minimal. k-medoids is considered a more robust alternative to k-means, less sensitive to outliers, because it uses medoids as cluster centres instead of the means used in k-means.
3. Clustering large applications (CLARA)
The algorithm by [25] is an extension of k-medoids for large data sets (with more than several thousand objects or data points). The extension aims to reduce storage requirements and computational time. Instead of searching for medoids over the whole data set, the algorithm considers only a small sample of the data with a fixed size and applies the k-medoids algorithm to find an optimal set of medoids for that sample. CLARA repeats the sampling and clustering processes a pre-specified number of times to minimize sampling bias.
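To make the difference between a centroid and a medoid concrete, the following minimal Python sketch (the study itself used R) implements the two alternating steps of k-means and a helper that finds the medoid of a group. It is an illustration under simplifying assumptions, not the implementation used in the study: the Euclidean distance is assumed, the initial centroids are supplied by the caller, and the data points and function names are hypothetical.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Alternate assignment and centroid update until the labels stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


def medoid(points):
    """The member whose total distance to all other members is smallest."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))


# Hypothetical two-dimensional data with two obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (4.0, 4.0), (4.2, 3.8), (3.8, 4.2)]
labels, centroids = kmeans(points, [points[0], points[3]])

# One outlier drags the mean far away, while the medoid stays with the bulk.
group = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (9.0, 9.0)]
mean_point = tuple(sum(axis) / len(group) for axis in zip(*group))
medoid_point = medoid(group)
print(labels, mean_point, medoid_point)
```

In the last three lines the mean of the group is pulled to roughly (3, 3) by the single outlier, whereas the medoid remains an actual observation near the other three points, which is the robustness property mentioned above.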
4.2 Clustering Performance Evaluation
The evaluation of the performance of clustering algorithms is called cluster validation; it is regarded as one of the most central concerns in clustering analysis [26]. Two criteria have been proposed for cluster validation, i.e., compactness (or cohesion) and separation. The former means that the members of each partition should be as close as possible to each other, and the latter implies that the partitions should be widely spaced. Validity measures used for assessing the performance of the algorithms with respect to these two criteria can be classified into relative, internal, and external validation.
Relative validation assesses the clustering by varying parameter values for the same algorithm (for instance, the number of clusters k). It is commonly used for investigating the optimal number of clusters. Internal validation relies on the information inherent in the data set and assesses the quality of the clustering without any external information. Conversely, external validation measures the similarity between the clustering algorithm’s result and the “correct” partitioning (or ground truth labels) of the data set.
Since ground truth labels are unavailable (this study uses a real data set, not an artificial one), only relative and internal validation can be used here. In this study, the elbow method is used as a relative validation. The method runs a particular algorithm several times with an increasing number of clusters k. The sum of squared errors is then calculated for each run and plotted against k. If the plot looks like an arm, the “elbow” of the arm corresponds to the optimal number of clusters.
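In symbols, the quantity plotted by the elbow method can be written as follows. This is a sketch that assumes the squared Euclidean distance, which the text does not state explicitly; the symbols $C_j$ (the set of observations in cluster $j$), $\mu_j$ (the centre of cluster $j$), and $W(k)$ are notation introduced here for illustration.

```latex
W(k) = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^{2}
```

A small $W(k)$ indicates compact clusters. Because $W(k)$ typically keeps decreasing as $k$ grows, the method looks for the value of $k$ beyond which additional clusters reduce $W(k)$ only marginally, rather than for the minimum itself.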
5. RESULTS AND DISCUSSION
This section describes the results of the clustering based on the algorithms previously mentioned. The first algorithm used is k-means, through the kmeans function in R (in the stats package). The format is kmeans(df, centers), where df is the data set and centers is the predefined number of clusters. In this study, the elbow method is used to investigate the optimal number of clusters. To run the elbow method, the fviz_nbclust function in the factoextra package is used. The format is fviz_nbclust(df, kmeans, method = "wss"), where wss is the total within-cluster sum of squares, which measures the compactness of the clustering and should be as small as possible. The elbow graph for k-means is shown in Figure 2 (a). The curve is plotted as a solid line, the dotted line connects the start and end points of the curve, and the dashed line, which is orthogonal to the dotted line, crosses the curve at the point farthest from the dotted line. This gives an optimal number of clusters of 3. The cluster membership is shown in the Appendix. The centroid of each cluster is shown in Table 1.
Figure 2. The elbow graphs: (a) k-means, (b) k-medoids, (c) CLARA
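The geometric rule behind the elbow graphs (choose the point of the curve farthest from the straight line joining its start and end points) can be sketched in a few lines of Python (the study itself used R). The within-cluster sums of squares below are hypothetical values chosen for illustration, not the results of this study, and the function name is likewise an assumption. Note that the selected point depends on how the two axes are scaled; raw units are used here.

```python
import math


def elbow_by_chord(ks, wss):
    """Return the k whose (k, wss) point lies farthest from the chord
    joining the first and last points of the curve."""
    x0, y0 = ks[0], wss[0]
    x1, y1 = ks[-1], wss[-1]
    chord_length = math.hypot(x1 - x0, y1 - y0)

    def distance(x, y):
        # Perpendicular distance from (x, y) to the line through both endpoints.
        return abs((y1 - y0) * (x - x0) - (x1 - x0) * (y - y0)) / chord_length

    return max(zip(ks, wss), key=lambda point: distance(*point))[0]


# Hypothetical within-cluster sums of squares for k = 1..8.
ks = list(range(1, 9))
wss = [100.0, 55.0, 25.0, 20.0, 16.0, 13.0, 11.0, 10.0]
best_k = elbow_by_chord(ks, wss)
print(best_k)
```

For this illustrative curve the farthest point is at k = 3, where the steep initial drop gives way to a nearly flat tail.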
Table 1. The Cluster Centres (or Medoids) of Each Algorithm
Algorithm    Cluster      Customs    Infrastructure    Logistics Quality
k-means      Cluster 1    3.681      3.927             3.883
k-means      Cluster 2    2.843      2.963             3.050
k-means      Cluster 3    2.277      2.230             2.368
k-medoids    Cluster 1    3.621      3.840             3.800
k-medoids    Cluster 2    2.705      2.800             2.856
k-medoids    Cluster 3    2.316      2.160             2.330
CLARA        Cluster 1    3.631      4.021             3.919
CLARA        Cluster 2    2.625      2.767             2.838
CLARA        Cluster 3    2.167      1.995             2.279
Figure 3. Map of Cluster Membership According to the k-means Algorithm
Next, k-medoids (PAM) is performed by means of the pam function in the cluster package. The format is pam(df, k), where k is the number of clusters. The elbow graph again gives an optimal number of clusters of 3 (see Figure 2 (b)). The cluster membership is shown in the Appendix, and the medoid of each cluster is presented in Table 1. The clara function in the cluster package is used for identifying cluster membership with the CLARA algorithm. The format is clara(df, k). The elbow graph again identifies 3 as the optimal number of clusters (see Figure 2 (c)). The cluster membership is shown in the Appendix, and the medoid of each cluster is presented in Table 1.
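The sampling idea that distinguishes CLARA from k-medoids can be sketched as follows in Python (the study itself used R). This is a simplified illustration, not the routine in the cluster package: the medoids of each sample are found by exhaustive search, which stands in for PAM's own search procedure and is feasible only for small samples. The function names, the sample size, the seed, and the toy data are all assumptions made for the example.

```python
import itertools
import math
import random


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def best_medoids(sample, k):
    """Exhaustive k-medoids on a small sample (a stand-in for PAM)."""
    return min(itertools.combinations(sample, k),
               key=lambda medoids: total_cost(sample, medoids))


def clara_sketch(points, k, n_samples=5, sample_size=21, seed=0):
    """Cluster several random samples and keep the medoids that fit ALL points best."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_samples):
        sample = rng.sample(points, sample_size)
        medoids = best_medoids(sample, k)
        cost = total_cost(points, medoids)  # evaluated on the full data set
        if best is None or cost < best[0]:
            best = (cost, medoids)
    return best[1], best[0]


# Toy data: three tight, well-separated groups of ten one-dimensional points.
points = [(base + i / 10,) for base in (0.0, 10.0, 20.0) for i in range(10)]
medoids, cost = clara_sketch(points, k=3)
print(sorted(medoids), round(cost, 2))
```

Only the sampled points are searched for medoids, but each candidate set of medoids is scored against the whole data set, which is how repeated sampling limits the bias of any single sample.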
Providing users with meaningful insights from the original data can be considered the ultimate goal of clustering analysis, since it allows users to effectively solve the problems they face. The remainder of this section discusses how to interpret the clustering result in order to gain such insights and knowledge. The k-means result is chosen for the analysis. Note that this choice is rather arbitrary and does not make k-means the best of the three algorithms. The map of cluster membership is shown in Figure 3. The characteristics of each cluster are discussed as follows.
The first cluster consists of 27 countries from four regions: 16 from Europe, 7 from Asia, 2 from America, and 2 from Australia and Oceania. This cluster has the highest centre values on all three features, which makes its members the best performers in terms of logistics performance. The second cluster consists of 45 countries from four regions: 16 from Europe, 15 from Asia, 8 from America, and 6 from Africa. Countries belonging to this cluster are considered the “middle class” in terms of logistics performance. Lastly, the third cluster consists of 88 countries, the largest number among all clusters. Countries belonging to this cluster are considered the worst performers in terms of logistics performance.
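The ordering of the clusters can be checked directly from the k-means centres in Table 1. The short Python sketch below (the study itself used R) assumes that the flattened table is read column by column as customs, infrastructure, and logistics quality; the ranking by unweighted mean and the dominance check are illustrative choices, not part of the original analysis.

```python
# k-means cluster centres from Table 1: (customs, infrastructure, logistics quality).
centres = {
    "Cluster 1": (3.681, 3.927, 3.883),
    "Cluster 2": (2.843, 2.963, 3.050),
    "Cluster 3": (2.277, 2.230, 2.368),
}
# Cluster sizes reported in the text.
sizes = {"Cluster 1": 27, "Cluster 2": 45, "Cluster 3": 88}

# Rank clusters by the unweighted mean of their three centre coordinates.
mean_score = {name: sum(c) / len(c) for name, c in centres.items()}
ranking = sorted(mean_score, key=mean_score.get, reverse=True)


def dominates(a, b):
    """True if cluster a has a higher centre value than cluster b on every feature."""
    return all(x > y for x, y in zip(centres[a], centres[b]))


print(ranking)
print(dominates("Cluster 1", "Cluster 2"), dominates("Cluster 2", "Cluster 3"))
print(sum(sizes.values()))
```

Under this reading of the table, the first cluster is higher than the second, and the second higher than the third, on every feature, and the three cluster sizes add up to the 160 countries in the data set.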
6. CONCLUSION
This research has demonstrated how to perform clustering using the 2018 LPI data. The features used as the basis for clustering are customs, infrastructure, and logistics quality. Three clustering algorithms are used in this study (i.e., k-means, k-medoids, and CLARA). Note that there is no single best clustering algorithm; k-means was selected arbitrarily as a representative to show how a clustering result can be interpreted. The (selected) final clustering contains three clusters whose characteristics are described in Section 5. It can be inferred that the first cluster contains the best-performing countries, while the third cluster contains the worst-performing countries in terms of logistics performance.
REFERENCES
[1] Council of Supply Chain Management Professionals, 2013, Supply Chain Management Terms and Glossary. Available at: https://cscmp.org/CSCMP/Academia/SCM_Definitions_and_Glossary_of_Terms/CSCMP/Educate/SCM_Definitions_and_Glossary_of_Terms.aspx?hkey=60879588-f65f-4ab5-8c4b-6878815ef921
[2] Martí, L., Puertas, R., & García, L. 2014, The Importance of The Logistics Performance Index In International Trade. Applied Economics, 46(24), 2982-2992.
[3] Hausman, W. H., Lee, H. L., & Subramanian, U. 2013, The Impact of Logistics Performance On Trade. Production and Operations Management, 22(2), 236-252.
[4] Gogoneata, B. 2008, An Analysis of Explanatory Factors of Logistics Performance of A Country. The Amfiteatru Economic Journal, 10(24), 143-156.
[5] das Chagas, H. X., de Moura, V. A., de Oliveira, R. M. N., de Macedo Ferreira, N., & Akabane, G. K. 2018, Brazilian Foreign Trade: A Logistics Performance Index Analysis Into The Global Environment. In POMS International Conference, December 10-12, 2018, Rio de Janeiro, Brazil.
[6] Çemberci, M., Civelek, M. E., & Canbolat, N. 2015, The Moderator Effect of Global Competitiveness Index On Dimensions of Logistics Performance Index. Procedia-social and behavioral sciences, 195, 1514-1524.
[7] Ekici, Ş. Ö., Kabak, Ö., & Ülengin, F. 2016, Linking to Compete: Logistics and Global Competitiveness Interaction. Transport Policy, 48, 117-128.
[8] Göçer, A., Özpeynirci, Ö., & Semiz, M. 2022, Logistics Performance Index-Driven Policy Development: An Application to Turkey. Transport Policy, 124, 20-32.
[9] Minako F. S., Ulkhaq M. M., ‘Sa Nu D., Pratiwi A. R. A., Akshinta P. Y. 2019, Clustering Internet Shoppers: An Empirical Finding From Indonesia. In Proceedings of The 5th International Conference on E-business and Mobile Commerce, 35-39.
[10] Ulkhaq M. M., Fidiyanti F., Adyatama A., Maulani Z. A., & Nugroho A. S. 2019, Segmentation of Cinema Audiences: An Empirical Finding From Indonesia. In Proceedings of The 2nd International Conference on Data Storage and Data Engineering, 3-8.
[11] Utami A. A., Ginanjar A. R., Fadlia N., Lubis I. A., & Ulkhaq M. M. 2019, Using Shopping and Time Attitudes To Cluster Food Shoppers: An empirical finding from Indonesia. Journal of Physics: Conference Series, 1284, 012005.
[12] Susanty, A., Akshinta, P. Y., Ulkhaq, M. M., & Puspitasari, N. B. 2021, Analysis of The Tendency of Transition Between Segments of Green Consumer Behavior With A Markov Chain Approach. Journal of Modelling in Management.
[13] Kapourani C.A., Sanguinetti G. 2019, Melissa: Bayesian Clustering and Imputation of Single-Cell Methylomes. Genome Biology, 20, 61.
[14] Wang J., Li M., Deng Y., & Pan Y. 2010, Recent Advances In Clustering Methods For Protein Interaction Networks. BMC Genomics, 11(S3), S10.
[15] Cai D., He X., Li Z., Ma W.Y., Wen J.R. 2004, Hierarchical Clustering of WWW Image Search Results Using Visual, Textual and Link Information. In Proceedings of the 12th Annual ACM International Conference on Multimedia, 952-959.
[16] Jain A.K., Murty M.N., Flynn P.J. 1999, Data Clustering: A Review. ACM Computing Surveys, 31, 264-323.
[17] Brusco M.J., Steinley D., Stevens J., Cradit J.D. 2019, Affinity Propagation: An Exemplar-Based Tool For Clustering In Psychological Research. British Journal of Mathematical and Statistical Psychology, 72, 155-182.
[18] Van Lettow B., Vermunt J.K., de Vries H., Burdorf A., van Empelen P. 2013, Clustering of Drinker Prototype Characteristics: What Characterizes The Typical Drinker? British Journal of Psychology, 104, 382-399.
[19] Ulkhaq, M. M. 2021, Clustering Countries According To The World Happiness Report. Statistica & Applicazioni, XVIII(2), 197-220.
[20] Ulkhaq, M. M. & Adyatama, A. 2020, Clustering Countries According To The World Happiness Report 2019. Engineering and Applied Science Research, 48(2), 137-150.
[21] Arvis, J.-F., Ojala, L., Shepherd, B., Raj, A., Dairabayeva, K., Kiiski, T. 2018, Connecting to Compete 2018 Trade Logistics In The Global Economy: The Logistics Performance Index and Its Indicators. Washington, DC: World Bank.
[22] Xu R., Wunsch II D. 2005, Survey of Clustering Algorithms. IEEE Transactions on Neural Networks, 16, 645-678.
[23] MacQueen J. 1967, Some Methods For Classification and Analysis of Multivariate Observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, 281-297.
[24] Kaufman L., Rousseeuw P.J. 1990, Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, Hoboken.
[25] Kaufman L., Rousseeuw P.J. 1986, Clustering Large Data Sets (With Discussion). In E.S. Gelsema and L.N. Kanal (Eds.), Pattern Recognition in Practice II (pp. 425-437). Elsevier, Amsterdam.
[26] Halkidi M., Batistakis Y., Vazirgiannis M. 2001, On Clustering Validation Techniques. Journal of Intelligent Information Systems, 17, 107-145.