Collaborative Filtering with Dimension Reduction Technique and Clustering for E-Commerce Product
Daffa Barin Tizard Riyadi, Z K A Baizal*
School of Computing, Informatics, Telkom University, Bandung, Indonesia
Email: [email protected], [email protected]
Corresponding Author Email: [email protected]
Abstract−The rapid growth of internet use over the last decade has led to an increase in the use of electronic commerce (e-commerce). The success of e-commerce is influenced by the presence of a recommender system. Collaborative Filtering (CF) is one of the most frequently used recommender system methods. However, in real cases, sparsity problems generally occur, because only a small number of users give ratings to items. In this study, we propose a combination of clustering and dimension reduction methods on the Amazon Review Data to overcome the sparsity problem. K-Means clustering is used to group users based on item preferences, while Singular Value Decomposition (SVD) is used for dimension reduction to improve the performance of the recommender system on sparse data. The results show that the combination of SVD and K-Means successfully predicts ratings with an RMSE value of less than 2, a significant performance increase compared to a previous study. The use of SVD is proven to overcome sparsity, with a decrease in RMSE of up to 9.372%.
Keywords: Recommender System; Collaborative Filtering; Amazon Review Dataset; K-Means; Singular Value Decomposition
1. INTRODUCTION
Electronic commerce (e-commerce) has experienced rapid growth, and online shopping has become a habit of internet users worldwide. This is supported by the growth of internet use over the last decade: as of January 2023, the number of internet users exceeded 5 billion [1]. The recommender system plays a vital role in the development and success of selling items in e-commerce; it has been proven to help increase user satisfaction and build user loyalty [2], and to support other essential aspects such as revenue and growth [3].
Collaborative Filtering (CF) is one of the most frequently used methods for building recommender systems [4]. One branch of CF is memory-based CF, which is divided into user-based and item-based methods. The main idea of the user-based method is that similar users share the same preferences for an item [5], while the item-based method assumes a user will buy items with the same characteristics as items purchased before [5]. Another branch is model-based CF, which uses user ratings to train a machine learning model that predicts a user's rating for an item [6]. Traditional machine learning models commonly used in model-based CF include decision trees, support vector machines, rule-based methods, regression models, Bayes classifiers, and neural networks [7].
The main problem with the CF method is sparsity in item ratings, which degrades recommendation quality and forces the use of large user and item data dimensions [5]. However, high-dimensional data can reduce the efficiency of a CF-based recommender system. Clustering can be used to deal with high-dimensional data [8], but it alone cannot solve the sparsity caused by an incomplete rating matrix, i.e., an incomplete relationship between users and items [5]; in real cases, users commonly rate only a few items. One solution to the sparsity remaining after clustering is dimension reduction, which removes non-dominant data characteristics to improve the performance of the recommender system.
Research on recommender systems has been carried out extensively in the past few years, using a variety of algorithms in different domains. In [9], Fakhri et al. used user-based CF, exploiting other users' preferences for a certain item; Pearson similarity was used to calculate the similarity between users and to predict ratings. That study produced a high error, with MAE = 2.531, caused by the sparsity problem in the dataset. In [10], Ayundhita et al. proposed a conversational recommender system based on functional requirements in the e-commerce domain, showing that a recommender system performs better when it prioritizes functional requirements, compared to the recommender systems commonly used in e-commerce. In [11], Ardimansyah et al. used matrix factorization at the preprocessing stage to deal with sparsity in memory-based CF. They found that the sparsity level of the MovieLens dataset was 93.7%. In that study, missing entries of the sparse user-item rating matrix were treated as empty values and, after preprocessing, filled with the results of the matrix factorization itself, turning the sparse matrix into a dense one. The results show a significant increase in recommender system performance after matrix factorization, for both user-based and item-based CF. In [6], Zarzour et al. created a new approach to CF recommender systems by combining clustering and dimension reduction on the MovieLens dataset, aiming to improve performance, overcome the sparsity and cold-start problems, and make the system easier to scale. K-Means was used to cluster users based on item preferences and to create a cluster center-item rating matrix representing the average rating of the users in each cluster for every item. SVD was then used to obtain the decomposition matrices, from which the similarities between clusters were calculated and used to predict ratings. That study shows the success of dimension reduction in improving performance compared with KNN-based and K-Means-based recommender systems. In [12], Stratigi et al. compared the ratings and review datasets in the Amazon Review Data [13] using the Valence Aware Dictionary for sEntiment Reasoning (VADER) approach. Their results indicate low performance, with RMSE > 2, due to potential inconsistency between user ratings and reviews.
The related work above shows that combining clustering and dimension reduction can overcome the sparsity problem. We therefore propose a recommender system that combines clustering and dimension reduction to overcome sparsity in the Amazon Review Data, using K-Means for user clustering and SVD for dimension reduction. We use the 5-core Musical Instruments category of the Amazon Review Data [13], which contains user ratings for musical instrument items, as our dataset. We expect the system to produce an accurate CF recommender system and to solve the problems that commonly occur in CF recommender systems.
2. RESEARCH METHODOLOGY
2.1 Research Stages
Figure 1 shows the process of building our proposed recommender system.
Figure 1. Recommender System Creation Flow
In this study, we built an e-commerce recommender system. To predict ratings for items, a cluster-similarity matrix

$$\begin{pmatrix} csm_{1,1} & csm_{1,2} & \cdots & csm_{1,n} \\ csm_{2,1} & csm_{2,2} & \cdots & csm_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ csm_{n,1} & csm_{n,2} & \cdots & csm_{n,n} \end{pmatrix}$$

needs to be created. Both rows and columns represent user clusters, and the notation $csm_{a,b}$ denotes the similarity between the a-th and b-th clusters. The process of building the recommender system, up to producing item predictions, is shown in Figure 1. First, the dataset is preprocessed into a user-item rating matrix. Users are then clustered with K-Means, transforming the user-item rating matrix into a centroid-item rating matrix. In the dimension reduction stage, SVD is applied to obtain the reduced U matrix, from which the system computes the cluster-similarity matrix in the similarity calculation stage. Finally, item ratings are predicted with the prediction equation (4) using the cluster-similarity matrix.
2.2 Dataset
In this study we used the 5-core Amazon Review Data [13] in the Musical Instruments category as the dataset. The dataset has 231392 ratings from 27530 users and 10620 items. Thus, the sparsity level of the data is 99.9%, as shown in Figure 2, about 6 percentage points sparser than the MovieLens dataset [11]. Before preprocessing, the dataset consisted of several columns, described in Table 1.
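The sparsity level quoted above follows directly from the dataset statistics. As a quick check, a minimal sketch in Python (the helper name is ours, for illustration only):

```python
# Sparsity level of a rating matrix: the fraction of user-item pairs
# that carry no rating. The counts below are the dataset statistics
# quoted above for the 5-core Musical Instruments category.
def sparsity_level(n_ratings: int, n_users: int, n_items: int) -> float:
    return 1.0 - n_ratings / (n_users * n_items)

print(f"{sparsity_level(231392, 27530, 10620):.1%}")  # -> 99.9%
```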
Figure 2. User-Item Rating Matrix Sparsity Level Visualization

Table 1. Dataset Column Description

Column          Description
reviewerID      User's unique value
asin            Item's unique value
reviewerName    User's name
vote            Rating's helpfulness vote
style           Product metadata
reviewText      User's review in text
overall         Item's rating given by the user
summary         User's review conclusion
unixReviewTime  User rating's Unix time
reviewTime      User rating's raw time
image           Images posted by the user

2.3 Preprocessing
In the context of our proposed recommender system, only a few columns are needed: 'reviewerID', 'asin', and 'overall'. We also kept only users who had rated items at least 25 times [14]. After preprocessing, the dataset has 133797 ratings from 27014 users and 1769 items, with a sparsity level of 99.7%. From the cleaned data, a user-item rating matrix is built according to Figure 3, in which columns represent items, rows represent users, and r(m, n) represents user m's rating of item n.
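The preprocessing step above can be sketched with pandas. This is a minimal illustration, not the paper's actual code: it assumes the raw reviews are loaded into a DataFrame with the columns of Table 1, keeps only 'reviewerID', 'asin', and 'overall', drops users below the rating threshold, and pivots the result into a user-item rating matrix (unrated cells become 0). The toy data and helper name are ours.

```python
import pandas as pd

def build_rating_matrix(reviews: pd.DataFrame, min_ratings: int = 25) -> pd.DataFrame:
    # Keep only the three columns the recommender system needs.
    df = reviews[["reviewerID", "asin", "overall"]]
    # Drop users with fewer than `min_ratings` ratings.
    counts = df["reviewerID"].value_counts()
    active = counts[counts >= min_ratings].index
    df = df[df["reviewerID"].isin(active)]
    # Pivot: rows = users, columns = items, cells = ratings (0 where unrated).
    return df.pivot_table(index="reviewerID", columns="asin",
                          values="overall", fill_value=0)

toy = pd.DataFrame({
    "reviewerID": ["u1", "u1", "u2", "u2", "u3"],
    "asin":       ["i1", "i2", "i1", "i3", "i2"],
    "overall":    [5, 4, 3, 5, 2],
})
matrix = build_rating_matrix(toy, min_ratings=2)  # u3 rated only once, so it is dropped
```

With the real dataset one would use `min_ratings=25` as in the text; the lower threshold here only makes the toy example non-empty.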
Figure 3. User-Item Rating Matrix

2.4 User Clustering
User clustering groups users based on their item preferences using K-Means, one of the most widely used clustering methods in the recommender system industry [6]. The K-Means algorithm partitions the users into K groups, each containing users with similar rating behaviour on items. The main idea of K-Means is to minimize the squared error function

$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x_j \in S_i} \| x_j - c_i \|^2 \tag{1}$$

where $\{x_1, \dots, x_n\}$ is the set of objects, $S = \{S_1, \dots, S_k\}$ is the partition of the objects that minimizes the sum of squares, and the centroid $c_i$ is the mean of $S_i$. After users are grouped by their preferences for items, the user-item rating matrix is transformed into a centroid-item rating matrix according to Figure 4.
Figure 4. Transformation of User-Item Matrix to Centroid-Item Rating Matrix
As in the user-item rating matrix, in the centroid-item rating matrix columns represent items, rows represent clusters, and $a_{k,n}$ represents the average rating of the users in cluster k for item n.
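The clustering step can be sketched as follows. This is an illustrative sketch, not the paper's implementation: scikit-learn's K-Means groups the rows (users) of the rating matrix, and each cluster is then summarized by the mean rating of its members per item, giving the K × n_items centroid-item rating matrix. The helper name and toy data are ours; the paper tests K ∈ {10, 20, 30, 40, 50}.

```python
import numpy as np
from sklearn.cluster import KMeans

def centroid_item_matrix(ratings: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    # Assign each user (row) to one of k clusters.
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(ratings)
    # Row c of the result is the mean rating vector of cluster c over all items.
    return np.vstack([ratings[labels == c].mean(axis=0) for c in range(k)])

rng = np.random.default_rng(0)
ratings = rng.integers(0, 6, size=(100, 20)).astype(float)  # 100 users, 20 items
cim = centroid_item_matrix(ratings, k=5)                    # shape (5, 20)
```

Note that this simple mean treats unrated cells as 0, matching the observation in the text that a centroid entry can be 0 when no user in the cluster rated the item.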
2.5 Dimension Reduction
One way to address the sparsity problem is dimension reduction [14]. SVD is one of the most successful and frequently used dimension reduction techniques in recommender systems. In the dimension reduction paradigm, SVD is classified as a numerical method, in contrast to PCA, which uses an analytic approach. The main idea of SVD is to find a feature space with lower dimensionality. The SVD of an m × n matrix A can be expressed as

$$A_{m,n} = U_{m,r} S_{r,r} V^T_{r,n} \tag{2}$$

where the m × r matrix U has as its columns the eigenvectors of $A A^T$ [15], S is an r × r diagonal matrix containing the singular values of A, and $V^T$ is an r × n orthogonal matrix of right singular vectors, whose columns represent items [16]. Dimension reduction with SVD keeps only the first $(100 - t)\%$ of the columns of U, the first $(100 - t)\%$ of the singular values in S, and the first $(100 - t)\%$ of the rows of $V^T$. In our proposed recommender system, only the U matrix is used for dimension reduction. The matrix below is the U matrix before dimension reduction.
$$U = \begin{pmatrix} u_{1,1} & u_{1,2} & \cdots & u_{1,r} \\ u_{2,1} & u_{2,2} & \cdots & u_{2,r} \\ \vdots & \vdots & \ddots & \vdots \\ u_{m,1} & u_{m,2} & \cdots & u_{m,r} \end{pmatrix}$$
After dimension reduction, only $(100 - t)\%$ of the columns of U remain. The matrix below is the reduced U matrix.

$$U_{reduced} = \begin{pmatrix} u_{1,1} & u_{1,2} & \cdots & u_{1,\lfloor r(100-t)\% \rfloor} \\ u_{2,1} & u_{2,2} & \cdots & u_{2,\lfloor r(100-t)\% \rfloor} \\ \vdots & \vdots & \ddots & \vdots \\ u_{m,1} & u_{m,2} & \cdots & u_{m,\lfloor r(100-t)\% \rfloor} \end{pmatrix}$$

2.6 Similarity Calculation
After the dimension reduction stage, similarity is measured using cosine similarity (COS). COS is commonly used to measure the similarity between two vectors and is one of the most frequently used similarity measures in CF recommender systems [6]. Here, COS measures the similarity between user clusters and builds the cluster-similarity matrix. In equation (3) [17], the vectors a and b are the rows of the reduced U matrix corresponding to the a-th and b-th clusters, and ‖a‖ and ‖b‖ are their lengths.

$$csm(a, b) = \frac{a \cdot b}{\|a\| \|b\|} \tag{3}$$
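The dimension reduction and similarity calculation stages can be sketched together with NumPy. This is an illustrative sketch under the assumptions above: SVD of the centroid-item rating matrix, truncation of U to the first (100 − t)% of its columns, then pairwise cosine similarity (equation (3)) between the reduced cluster vectors. The helper name and toy data are ours.

```python
import numpy as np

def cluster_similarity(cim: np.ndarray, t: float) -> np.ndarray:
    # Decompose the centroid-item matrix; U has one row per cluster.
    u, s, vt = np.linalg.svd(cim, full_matrices=False)
    # Keep the first floor(r * (100 - t)%) columns of U.
    keep = int(np.floor(u.shape[1] * (100 - t) / 100))
    u_red = u[:, :keep]
    # Pairwise cosine similarity between the reduced cluster vectors.
    norms = np.linalg.norm(u_red, axis=1, keepdims=True)
    return (u_red @ u_red.T) / (norms @ norms.T)

cim = np.random.default_rng(1).random((10, 50))  # 10 clusters, 50 items
csm = cluster_similarity(cim, t=10)              # 10 x 10, unit diagonal
```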
2.7 Rating Prediction
After obtaining the cluster-similarity matrix, item ratings within a cluster are predicted with the following equation [18],

$$\hat{r}_{c,i} = \bar{r}_c + \frac{\sum_{n \in N} csm(c,n) \, (r_{n,i} - \bar{r}_n)}{\sum_{n \in N} |csm(c,n)|} \tag{4}$$

where $\hat{r}_{c,i}$ denotes the predicted rating for item i in cluster c, $\bar{r}_c$ the average rating of cluster c, $N$ the set of neighbour clusters of cluster c, $csm(c,n)$ the similarity between clusters c and n, $r_{n,i}$ the rating of item i from cluster n, and $\bar{r}_n$ the average rating of neighbour cluster n.
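Equation (4) can be transcribed directly. This sketch uses the common mean-centred neighbourhood form, with the absolute value of the similarity weights in the denominator; `cim` is the centroid-item rating matrix, `csm` the cluster-similarity matrix, and `neighbours` the index set N. The helper name and toy numbers are ours.

```python
import numpy as np

def predict_rating(cim: np.ndarray, csm: np.ndarray, c: int, i: int,
                   neighbours: list) -> float:
    r_bar = cim.mean(axis=1)  # average rating of each cluster over all items
    # Similarity-weighted sum of the neighbours' mean-centred ratings for item i.
    num = sum(csm[c, n] * (cim[n, i] - r_bar[n]) for n in neighbours)
    den = sum(abs(csm[c, n]) for n in neighbours)
    return float(r_bar[c] + num / den)

cim = np.array([[4.0, 2.0],   # cluster 0: mean rating 3.0
                [5.0, 3.0]])  # cluster 1: mean rating 4.0
csm = np.array([[1.0, 0.5],
                [0.5, 1.0]])
pred = predict_rating(cim, csm, c=0, i=0, neighbours=[1])
# 3.0 + 0.5 * (5.0 - 4.0) / 0.5 = 4.0
```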
3. RESULT AND DISCUSSION
In this section we explain the stages of building the recommender system, providing sample data for each stage. The recommender system we built was tested 5 times with different values of K, where K = {10, 20, 30, 40, 50}. The dimension reduction process with SVD was tested 3 times with different values of t, where t = {10, 20, 30}, represented by K-Means-10-reduction, K-Means-20-reduction, and K-Means-30-reduction.
3.1 Dataset
The dataset sample before preprocessing is shown in Table 2.
Table 2. Dataset Sample
overall  verified  reviewTime   reviewerID      asin        reviewerName  …
5        True      11 16, 2016  AOZVLEOST0J3G   B009ZX8ZJG  JD            …
5        True      01 12, 2016  AJ94N3I1A434D   B00063678K  W. D.         …
5        True      04 28, 2016  A156CKCKZPH8S5  B0002E1G5C  Cenerentola   …
5        True      08 8, 2013   A1E5FQZTUM8OC1  B0002GXYVO  Kenwood       …
5        False     10 4, 2014   A1OR8R0BADATAI  B001IYODDC  Guitar Dog    …

In this study we split the dataset into training and testing data, 80% and 20% respectively, after preprocessing. We make sure that every user and item present in the testing data is also present in the training data. Samples of the training and testing data are shown in Table 3 and Table 4, respectively.
Table 3. Training Data Sample
overall reviewerID asin
5 A1RIVJ07HKE1F9 B006MVECVE
5 A2PX3NR1IVTXPG B008FDSWJ0
5 A27HOMP2IFOOYC B009MDMQJ4
5 A3EZTFLYUHCHBF B00XQCRR5U
5 A3EMSV4RL1SV4N B0002GMGYA
Table 4. Testing Data Sample
overall reviewerID asin
4 A1TDR4NT5G28NI B001TMCVDW
3 A26RNTHH6P38YJ B0002GWFEQ
5 A29A4ZAX7XDLCJ B0002E1NNC
4 A20PWG5RGSXGH5 B0002OP0WC
2 AGIW1EQ9HIZN2 B00GROJWAW
3.2 User-Item Rating Matrix
The following matrix is a sample of the user-item rating matrix,

$$\begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

where rows represent users and columns represent items. The matrix has a high sparsity level even though it is restricted to users who rated at least 25 times: its size is 26485 × 1769, and its sparsity level is 99.7%.
3.3 Centroid-Item Rating Matrix
The result of user clustering with K-Means can be transformed into centroid-item rating matrix. Matrix below is the sample data of centroid-item rating matrix,
(
0 0 4 0 0
5 0 5 0 0
5 1 5 0 0
0 0 0 0 0
0 0 0 0 0)
where rows represent clusters and column represents items. Value 0 is possible due to the high sparsity of the user-item rating matrix, as there will be a possible case of clusters without a single user rating for a certain item.
3.4 Reduced U Matrix
In this study, we will only be considering U matrix after the SVD process. in the case of K = 10 and t = 10, the U matrix has size 10 × 10 with rows representing clusters and columns representing eigenvectors of a matrix AT× A according to the following 10 × 10 matrix,
(
−0.721 0.423 ⋯ 0.007 0.011
−0.291 −0.514 ⋯ 0.001 −0.017
⋮ ⋮ ⋱ ⋮ ⋮
−0.161 −0.147 ⋯ 0.102 −0.016
−0.409 0.360 ⋯ 0.012 −0.002)
After dimension reduction, only 90% of the columns of U remain, with the system removing the last column from the U matrix, giving the 10 × 9 reduced U matrix below.

$$\begin{pmatrix} -0.721 & 0.423 & \cdots & 0.007 \\ -0.291 & -0.514 & \cdots & 0.001 \\ \vdots & \vdots & \ddots & \vdots \\ -0.161 & -0.147 & \cdots & 0.102 \\ -0.409 & 0.360 & \cdots & 0.012 \end{pmatrix}$$

3.5 Cluster-Similarity Matrix
After the similarity calculation stage, the reduced U matrix is turned into the cluster-similarity matrix,

$$\begin{pmatrix} 1 & 0.000202 & \cdots & 0.000192 & 0.000026 \\ 0.000202 & 1 & \cdots & -0.000279 & -0.000038 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0.000192 & -0.000279 & \cdots & 1 & -0.000036 \\ 0.000026 & -0.000038 & \cdots & -0.000036 & 1 \end{pmatrix}$$

measuring 10 × 10, with each row and column representing a cluster. The diagonal entries are 1 because each is the similarity of a cluster with itself.
3.6 Application Implementation
In this stage, a performance metric is needed to evaluate the recommender system; we use RMSE. The main idea of RMSE is the square root of the average squared difference between predicted and actual values, so a small RMSE value implies a better performing recommender system [19]. RMSE can be expressed as equation (5) [20], where $y_i$ denotes the actual value, $\hat{y}_i$ the predicted value given by the system, and n the number of ratings predicted in a trial.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{5}$$
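Equation (5) translates to a one-line NumPy computation; the helper name and toy numbers are illustrative.

```python
import numpy as np

# RMSE as in equation (5): the square root of the mean squared
# difference between the actual and predicted ratings.
def rmse(actual, predicted) -> float:
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

err = rmse([4, 3, 5], [4, 4, 5])  # a single error of 1 over 3 ratings
```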
Research on such data generally shows a high RMSE value, with RMSE > 1. This is common with high-dimensional data [12], and the sparsity level of our dataset still reaches 99.7% even after restricting to users with at least 25 ratings. Nevertheless, SVD is proven to increase the performance of the recommender system through the dimension reduction process. Figure 5 shows the RMSE comparison between all the tested approaches. At K = 10, the greater the value of t, the lower the performance, due to the loss of information from the U matrix, which is only 10 × 10 in size. For all values of K > 10, K-Means-30-reduction performs better than all other methods. For 10 < K < 50, K-Means-20-reduction also performs better than plain K-Means. K-Means-10-reduction is not suitable for K > 20.
Figure 5. Comparison of RMSE
Table 5 shows the performance difference in RMSE between K-Means and K-Means-30-reduction. K-Means-30-reduction generates an RMSE value up to 9.372% smaller than traditional K-Means.
Table 5. Performance Comparison Between K-Means and K-Means-30-reduction in RMSE

K        K-Means RMSE  K-Means-30-reduction RMSE  Difference (%)
20       2.055066      1.885936                   8.583%
30       2.263957      2.153186                   5.016%
40       2.398194      2.255984                   6.111%
50       2.539080      2.311766                   9.372%
average  2.31407425    2.151718                   7.270%
4. CONCLUSION
The main problem of a CF-based recommender system is sparsity, which can produce poor recommendations. This research focuses on developing a recommender system in the e-commerce domain and improving its performance by increasing accuracy. The results show that our proposed method succeeds in generating item recommendations. The system still produces relatively high RMSE values, yet significantly lower than previous research using the same dataset. This is caused by the high sparsity level of the dataset, which reaches 99.7% even after limiting the data to users with at least 25 ratings; consequently, the neighbour similarity values in the cluster-similarity matrix tend to be very low.
REFERENCES
[1] “Citation & Republication — DataReportal – Global Digital Insights.” https://datareportal.com/citation (accessed Jan. 13, 2023).
[2] C. Xu, D. Peak, and V. Prybutok, “A customer value, satisfaction, and loyalty perspective of mobile application recommendations,” Decis Support Syst, vol. 79, pp. 171–183, Nov. 2015, doi: 10.1016/J.DSS.2015.08.008.
[3] W. Lu, S. Chen, K. Li, and L. V. S. Lakshmanan, “Show Me the Money: Dynamic Recommendations for Revenue Maximization,” Proceedings of the VLDB Endowment, vol. 7, no. 14, pp. 1785–1796, Aug. 2014, doi:
10.14778/2733085.2733086.
[4] B. Ramzan et al., “An Intelligent Data Analysis for Recommendation Systems Using Machine Learning,” Sci Program, vol. 2019, 2019, doi: 10.1155/2019/5941096.
[5] P. M. Chu and S. J. Lee, “A novel recommender system for E-commerce,” Proceedings - 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics, CISP-BMEI 2017, vol. 2018- January, pp. 1–5, Feb. 2018, doi: 10.1109/CISP-BMEI.2017.8302310.
[6] H. Zarzour, Z. Al-Sharif, M. Al-Ayyoub, and Y. Jararweh, “A new collaborative filtering recommendation algorithm based on dimensionality reduction and clustering techniques,” 2018 9th International Conference on Information and Communication Systems, ICICS 2018, vol. 2018-January, pp. 102–106, May 2018, doi: 10.1109/IACS.2018.8355449.
[7] C. C. Aggarwal, “Data Mining,” 2015, doi: 10.1007/978-3-319-14142-8.
[8] H. Liu, J. Wu, T. Liu, D. Tao, and Y. Fu, “Spectral Ensemble Clustering via Weighted K-Means: Theoretical and Practical Evidence,” IEEE Trans Knowl Data Eng, vol. 29, no. 5, pp. 1129–1143, May 2017, doi:
10.1109/TKDE.2017.2650229.
[9] A. A. Fakhri, Z. K. A. Baizal, and E. B. Setiawan, “Restaurant Recommender System Using User-Based Collaborative Filtering Approach: A Case Study at Bandung Raya Region,” J Phys Conf Ser, vol. 1192, no. 1, May 2019, doi:
10.1088/1742-6596/1192/1/012023.
[10] M. S. Ayundhita, Z. K. A. Baizal, and Y. Sibaroni, “Ontology-based conversational recommender system for recommending laptop,” in Journal of Physics: Conference Series, 2019, vol. 1192, no. 1. doi: 10.1088/1742- 6596/1192/1/012020.
[11] M. I. Ardimansyah, A. F. Huda, and Z. K. A. Baizal, “Preprocessing matrix factorization for solving data sparsity on memory-based collaborative filtering,” in Proceeding - 2017 3rd International Conference on Science in Information Technology: Theory and Application of IT for Education, Industry and Society in Big Data Era, ICSITech 2017, 2017, vol. 2018-January. doi: 10.1109/ICSITech.2017.8257168.
[12] M. Stratigi, X. Li, K. Stefanidis, and Z. Zhang, “Ratings vs. Reviews in Recommender Systems: A Case Study on the Amazon Movies Dataset,” Communications in Computer and Information Science, vol. 1064, pp. 68–76, 2019, doi:
10.1007/978-3-030-30278-8_9/COVER.
[13] J. Ni, J. Li, and J. McAuley, “Justifying Recommendations using Distantly-Labeled Reviews and Fine-Grained Aspects,” EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference, pp. 188–197, 2019, doi: 10.18653/V1/D19-1018.
[14] N. T. Son, D. H. Dat, N. Q. Trung, and B. N. Anh, “Combination of Dimensionality Reduction and User Clustering for Collaborative-Filtering,” ACM International Conference Proceeding Series, pp. 125–130, Dec. 2017, doi:
10.1145/3168390.3168405.
[15] F. Anowar, S. Sadaoui, and B. Selim, “Conceptual and empirical comparison of dimensionality reduction algorithms (PCA, KPCA, LDA, MDS, SVD, LLE, ISOMAP, LE, ICA, t-SNE),” Comput Sci Rev, vol. 40, p. 100378, May 2021, doi: 10.1016/J.COSREV.2021.100378.
[16] S. M. Atif, S. Qazi, and N. Gillis, “Improved SVD-based initialization for nonnegative matrix factorization using low- rank correction,” Pattern Recognit Lett, vol. 122, pp. 53–59, May 2019, doi: 10.1016/J.PATREC.2019.02.018.
[17] M. Gupta, A. Thakkar, Aashish, V. Gupta, and D. P. S. Rathore, “Movie Recommender System Using Collaborative Filtering,” Proceedings of the International Conference on Electronics and Sustainable Communication Systems, ICESC 2020, pp. 415–420, Jul. 2020, doi: 10.1109/ICESC48915.2020.9155879.
[18] S. Suriati et al., “A Hybrid Approach using Collaborative filtering and Content based Filtering for Recommender System,” J Phys Conf Ser, vol. 1000, no. 1, p. 012101, Apr. 2018, doi: 10.1088/1742-6596/1000/1/012101.
[19] R. Sallam, M. Hussein, et al., “An enhanced collaborative filtering-based approach for recommender systems,” 2020, doi: 10.5120/ijca2020920531.
[20] M. C. M. Oo and T. Thein, “An efficient predictive analytics system for high dimensional big data,” Journal of King Saud University - Computer and Information Sciences, vol. 34, no. 1, pp. 1521–1532, Jan. 2022, doi:
10.1016/J.JKSUCI.2019.09.001.