
Lodging Recommendations Using the SparkML Engine ALS and Surprise SVD

Sageri Fikri Ramadhan*, ZK Abdurahman Baizal, Rita Rismala
School of Computing, Informatics Study Program, Telkom University, Bandung, Indonesia

Email: 1,*[email protected], 2[email protected], 3[email protected]

Corresponding Author's Email: [email protected]

Abstrak− A recommendation system is a process or tool used to give users predictions to help them choose something within an existing domain. Such systems have become a primary need for today's modern digital industry, for example in the entertainment, shopping, and services sectors. In this research, we focus on how to develop a recommendation system for lodging accommodation services. We use the Alternating Least Square and Singular Value Decomposition methods to predict and recommend lodging to users.

Kata Kunci: Recommendation, Lodging, ALS, SVD, Rating, NLTK

Abstract−Recommendation system is a process or tool used to provide predictions for users to choose something based on an existing domain. This system has become a primary need for today's modern digital industry such as in the entertainment, shopping, and service sectors. In this research, we focus on how to develop a recommendation system for accommodation services. We use the Alternating Least Square and Singular Value Decomposition methods to predict and recommend lodging to users.

Keywords: Recommendation, Lodging, ALS, SVD, Rating, NLTK

1. INTRODUCTION

Tourism is a social, cultural, and economic phenomenon in which a person visits a place or region with a different environment, either for personal recreation or for professional needs such as business [1]. Those who take part in these activities are known as tourists. As implied by this definition, tourism has a chain effect across several perspectives. From an economic perspective, for example, it stimulates market growth and purchasing power and creates mutual benefit between tourists and the places or parties they visit.

The world's demand for tourism has grown significantly over the years. This can be seen in data released by the World Tourism Organization: as of July 2019, international tourist arrivals at registered destinations reached 1.4 million visitors, with recorded gross income of 1.4 billion United States Dollars [2]. The growth in tourist arrivals and gross income can be seen in Figure 1.

According to Richard Sharpley, one sector affected by the advancement of world tourism is lodging services; he also regards this sector as a fundamental element of a successful tourism business [3]. Not infrequently, lodging services such as hotels or inns influence travelers' choice of destination, particularly through how close the rented accommodation is to the main tourist attractions.

Airbnb is an online service that acts as a third party or intermediary, connecting users looking for temporary accommodation such as hostels, inns, and apartments with parties who rent out places to stay [4]. Airbnb is well known among travelers because of its ease of access and its integration of lodging listings in almost all parts of the world. With its user-focused service, Airbnb promises 24-hour support to help users make payments and find suitable lodging, and it collaborates with various parties in providing rental housing.

Figure 1. Gross income balance


As the tourism industry develops, the accommodation sector grows in proportion to it. Under these conditions, users face a wide range of accommodation options, which can make choosing the best one confusing. There is therefore a need for a system that can recommend accommodation to users. In this work we examine two things: converting user reviews into rating values by calculating their polarity, and generating recommendations using the SparkML Engine, which provides the Alternating Least Square method, and the Surprise Engine, which provides the Singular Value Decomposition method [5]–[7].

We chose the SparkML Engine and the Surprise Engine because previous studies have used the same tools for different applications, and we analyze the two methods they contain to determine which one provides better recommendations by comparing the Root Mean Square Error obtained by each [8]. The aim of this research is to convert sentiment into rating values and to use existing engines to provide the best recommendations for users, while also showing that modern computing is supported by engines that assist in data processing.

The stages of this research are: problem identification, literature study, design of the research scheme, and comparison of the methods used. In the problem identification, as outlined in this section, we want to compare two different methods and two engines and show how user reviews can be converted into rating values. The literature review discusses existing research related to the work we are compiling. We then design the system scheme and compare the methods based on the accuracy obtained from each.

2. RESEARCH METHODOLOGY

2.1 Dataset

Before exploring how we predict ratings, we describe the dataset, which we obtained through InsideAirbnb.

The dataset includes details of listings, dates, availability, and prices in New York; an overview of each property together with its facilities and details about the host; the host description and the location of the property; price, last review, and availability; and review comments written by users for particular properties. The dataset is shaped as shown in Table 1. For rating prediction we process this data and keep only the columns used in that process.

Table 1. Dataset

listing_id | id       | date       | reviewer_id | reviewer_name | comments
2060       | 158      | 22/09/2008 | 2865        | Thom          | very nice neighborhood close enough to A train comfortable bed and clean home
2595       | 19176    | 05/12/2009 | 53267       | Cate          | Great experience.
2595       | 1238204  | 07/05/2012 | 1783688     | Sergey        | Hi to everyone!
2595       | 30430122 | 21/04/2015 | 6429364     | Sonya         | I love this space It is truly a gem

2.2 Design Scheme

In this study, we carry out the stages needed to determine the best recommendation results. Our design scheme is shown in Figure 2.

2.3 Literature Review

A literature review is a study of related published work that supports this research and guides all research activities. The first study we review deals with singular value decomposition.

Figure 2. Design Scheme


A recommendation system is one of the most important technologies today, and collaborative filtering is a popular approach for providing recommendations. With the massive development of electronic commerce systems, the numbers of users and goods are growing fast, resulting in extreme sparsity of user rating data sets.

Traditional similarity measurement methods are less successful in this situation; the sparsity of user ratings is the main cause of poor quality. To overcome this problem, that paper explains a collaborative filtering algorithm based on singular value decomposition. The approach predicts ratings for items the user has not yet rated, then uses singular value decomposition together with Pearson similarity to find the user's nearest neighbors, and finally generates recommendations [9].

The next study concerns alternating least squares. Learning user-item preferences from implicit feedback is a central problem in recommendation system research. One approach to implicit feedback is to minimize a ranking objective function rather than the mean squared prediction error. Ranking objectives are usually expensive to optimize, a difficulty commonly addressed by sampling the objective function. In that paper, the authors propose a computationally efficient ranking method that optimizes the original objective function without sampling. RankALS is inspired by implicit-feedback prediction methods; its main components are a matrix factorization model, a ranking-based objective function, and an alternating least squares optimizer [10].

Further literature concerns the SparkML Engine. In that work the authors present MLlib, Spark's distributed machine learning library and the largest library of its kind. The library targets large-scale learning settings that use data parallelism or model parallelism to store and operate on data or models. MLlib provides fast and scalable implementations of standard learning algorithms for common settings including classification, regression, collaborative filtering, clustering, and dimensionality reduction, and it also provides underlying statistics, linear algebra, and optimization primitives [11].

Based on this literature review, we design a system that converts user reviews into rating values; these ratings are then used in a recommendation system with a collaborative filtering scheme to predict the items that will be suggested to users.

2.4 Rating Prediction

Rating prediction uses the review_details.csv data, which contains fields such as listing_id, review_comments, reviewer_id, reviewer_name, and date. Only reviewer_id, listing_id, and review_comments are used to predict the rating. We use the Natural Language Toolkit to help us make rating predictions [12]. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media. VADER relies on a sentiment lexicon, a list of lexical features (e.g., words) that are labeled according to their semantic orientation as either positive or negative [12], [13].
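As a minimal illustration of this step, the sketch below loads the review file named above with pandas and keeps only the three columns used for prediction; the exact column names are assumed to follow Table 1 and may differ in the actual InsideAirbnb export.

    import pandas as pd

    # Load the InsideAirbnb review export named in the text; the column names
    # (assumed from Table 1) may differ in the actual file.
    reviews = pd.read_csv("review_details.csv")
    reviews = reviews[["reviewer_id", "listing_id", "comments"]]  # columns used for rating prediction
    reviews = reviews.dropna(subset=["comments"])                 # drop empty reviews before scoring
    print(reviews.head())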

2.5 Alternating Least Square

Alternating Least Square factorizes a large matrix of user-item interactions and learns the latent (hidden) features that relate users and items. Based on the literature we reviewed, this method has a fairly short execution time when run on GraphLab and MapReduce over the MovieLens and Netflix datasets [14].

In Figure 3, the table, also referred to as the large matrix, contains data relating users 𝑈 and items 𝑉; the product of a row of 𝑈 and a column of 𝑉 expresses how well a user matches an item. The matrix is often incomplete, in the sense that users tend to rate only the items they have used, so some cells are blank. ALS works here by predicting the latent factors. In principle the algorithm only predicts values for items that have been rated by previous users, not items that have never been rated at all, in line with the collaborative filtering principle of the wisdom of the crowd, i.e. judgments that rely on the public's judgment. The algorithm also has other important variables, namely 𝑊 (weight), 𝑓 (rank), and λ (lambda): 𝑓 is the number of latent factors arising from the factorization of the user and item matrices, 𝑊 is the magnitude of the effect an observation has on the rating, and λ is the regularization parameter that keeps the selected items in accordance with the user's preferences. The formulation of the algorithm can be seen in Equation 1 and Equation 2 [6].

Figure 3. Visualize of Dataset in Matrix Form


x_u = (Y^T W_u Y + λI)^(-1) Y^T W_u r_u^T   (1)

y_i = (X^T W_i X + λI)^(-1) X^T W_i r_i^T   (2)

For this example, we fix the item vectors y_i. We then take the derivative of the loss function with respect to the other set of vectors, the user vectors x_u, and solve for these non-constant vectors. To clarify, let us assume that we have m users and n items, so our ratings matrix is m × n; a small numerical sketch of this update follows the list below.

a. The row vector r_u represents user u's row of the ratings matrix, containing the ratings for all items (so it has dimension 1 × n).

b. We introduce the symbol Y, with dimensions n × d, to represent all item row vectors stacked vertically on top of each other.

c. Lastly, I is the d × d identity matrix, which ensures the matrix dimensions match when we add the λ term.
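To make the update concrete, the following NumPy sketch solves Equation 1 for a single user vector; the sizes and data are made up for illustration and are not the paper's dataset.

    import numpy as np

    n_items, d, lam = 6, 3, 0.1                 # toy sizes and regularization, not the paper's values
    Y = np.random.rand(n_items, d)              # item vectors stacked vertically (n x d)
    r_u = np.random.rand(n_items)               # user u's ratings for all items
    W_u = np.diag(np.random.rand(n_items))      # per-item weights for user u

    # Equation 1: x_u = (Y^T W_u Y + lambda I)^(-1) Y^T W_u r_u
    A = Y.T @ W_u @ Y + lam * np.eye(d)
    b = Y.T @ W_u @ r_u
    x_u = np.linalg.solve(A, b)                 # user u's latent factor vector of length d
    print(x_u)

Repeating this solve for every user, and the symmetric solve of Equation 2 for every item, is the alternation that gives the method its name.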

The SparkML Engine makes this easy through the ALS tools it contains: the calculations are done behind the scenes, and we only need to choose the right parameters to produce a good level of accuracy, measured by the resulting RMSE value. The procedure for using the ALS method in the SparkML Engine is described in Table 2.

Table 2. Using ALS method on SparkML Engine

In Table 2, the implementation in spark.ml has the following parameters:

a. rank is the number of latent factors in the model (defaults to 10).

b. maxIter is the maximum number of iterations to run (defaults to 10).

c. regParam specifies the regularization parameter in ALS (defaults to 1.0).

2.6 Singular Value Decomposition

Singular Value Decomposition is a matrix factorization that breaks a matrix down into two unitary matrices U and V and a diagonal matrix S containing scale factors called singular values. The concept can be seen in Equation 3 [8].

R = U S V^T   (3)

To clarify, let us again assume that we have m users and n items, so our ratings matrix R is m × n, with records (users) as rows and dimensions/features (items) as columns. U (m × m) is an orthogonal matrix containing the eigenvectors of RR^T. S (m × n) holds the ordered singular values on its diagonal, the square roots of the eigenvalues of RR^T. V (n × n) is an orthogonal matrix containing the eigenvectors of R^T R.
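As a toy illustration of Equation 3 (not the Surprise Engine's internal training procedure, which estimates the factors iteratively rather than by an exact decomposition), the sketch below decomposes a small made-up ratings matrix with NumPy and rebuilds a low-rank approximation.

    import numpy as np

    R = np.array([[5., 3., 0.],
                  [4., 0., 0.],
                  [1., 1., 5.]])                        # toy ratings matrix, not the paper's data

    U, s, Vt = np.linalg.svd(R, full_matrices=False)    # Equation 3: R = U S V^T
    S = np.diag(s)                                      # ordered singular values on the diagonal
    R_approx = U[:, :2] @ S[:2, :2] @ Vt[:2, :]         # rank-2 approximation of R
    print(np.round(R_approx, 2))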

The Surprise Engine likewise makes this easy through the SVD tools it contains: the calculations are done behind the scenes, and we only need to choose the right parameters to produce a good level of accuracy, measured by the resulting RMSE value. The procedure for using the SVD method in the Surprise Engine is described in Table 3 [7].

Table 3. Using SVD method on Surprise Engine

In Table 3, the implementation in Surprise has the following parameters:

a. n_factors, which controls the dimension of the latent space (i.e. the size of the user and item vectors). Usually, the quality of the training set predictions grows as n_factors gets higher.

b. n_epochs, which defines the number of iterations of the SGD procedure.

Algorithm 1 SparkML – ALS

    # Assumption: the SparkML Engine has been set up beforehand
    # ALS() is the SparkML procedure that contains the ALS method and its calculation
    als = ALS(maxIter=maxIter, regParam=regParam, rank=rank,
              userCol=userCol, itemCol=itemCol, ratingCol=ratingCol)
    model = als.fit(data_train)
    predictions = model.transform(data_test)

Algorithm 2 Surprise – SVD

    # Assumption: the Surprise Engine has been set up beforehand
    # SVD() is the Surprise procedure that contains the SVD method and its calculation
    svd = SVD(n_factors=n_factors, n_epochs=n_epochs)
    model = svd.fit(data_train)
    data_test = data_train.build_testset()
    predictions = svd.test(data_test)

2.7 Comparison using RMSE

To make the comparison, we compare the RMSE values produced by the two methods. Root Mean Squared Error (RMSE) is one way to evaluate regression models by measuring the accuracy of a model's estimates [15]. RMSE is calculated by squaring the errors (prediction minus observation, or in other terms y_i − ŷ_i), averaging them over the amount of data, and then taking the square root. The smaller the resulting value, the better the accuracy. The equation for RMSE is given in Equation 4.

RMSE = √( Σᵢ (yᵢ − ŷᵢ)² / n )   (4)
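A minimal sketch of Equation 4, using made-up rating and prediction values purely for illustration:

    import numpy as np

    def rmse(y_true, y_pred):
        # Equation 4: square the errors, average over n, then take the square root
        err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
        return np.sqrt(np.mean(err ** 2))

    print(rmse([4, 0, 5, 3], [3.5, 0.8, 4.6, 3.2]))     # smaller values mean better accuracy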

3. RESULTS AND DISCUSSION

3.1 Data Collection

We collected data through the site insideairbnb.com; the dataset contains userId, listingId, and review fields and describes users of Airbnb services in New York, United States. The data is shown in Table 4.

Table 4. Data Collection

Listing_id | Reviewer_id | Comment
43300250   | 332274505   | Wonderful place and an perfect location The pl...
43320440   | 318190347   | Great time
43320440   | 278355992   | Immediate responses Very cool place
43325856   | 18351733    | The photos don’t do this place justice This pl...
43327284   | 57291971    | One of the nice Airbnb’s I’ve stayed in The pl...

3.2 Data Pre-Processing

Pre-processing is a basic step toward good results, reducing the noise contained in the reviews. The pre-processing steps performed on the dataset are: counting the number of words and characters in each review, and eliminating stop words that are unnecessary for the analysis. Stop words are words with insignificant meaning, and we remove them using the stop word list from the NLTK corpus. The opinions are then extracted and ready to be analyzed with the lexicon-based algorithm VADER. Example data before and after pre-processing are shown in Table 5, and a sketch of the stop-word removal step is given after Table 5.

Table 5. Data Pre-Processing

Before: very nice neighborhoodclose enough to "a" train\ \r comfortable bed and clean home over all\rjennys cat is very sweet
After:  very nice neighborhoodclose enough to a train comfortable bed and clean home over all jennys cat is very sweet
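As referenced above, a minimal sketch of the stop-word removal step with the NLTK corpus is shown below; the exact cleaning rules used in this study may differ.

    import string
    import nltk
    from nltk.corpus import stopwords

    nltk.download("stopwords", quiet=True)              # fetch the stop word list on first use
    stop_words = set(stopwords.words("english"))

    def clean_review(text):
        # lower-case, drop punctuation, and remove NLTK stop words
        text = text.lower().translate(str.maketrans("", "", string.punctuation))
        return " ".join(w for w in text.split() if w not in stop_words)

    print(clean_review('Very nice neighborhood, close enough to the "A" train!'))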

3.3 Extraction of User Opinions with VADER

VADER is implemented by importing SentimentIntensityAnalyzer from the installed vaderSentiment package, as described in section 2.

The polarity_scores(x) method of SentimentIntensityAnalyzer is called on x, the user's opinion, and returns a sentiment score as output. This gives positive, negative, neutral, and compound scores. Based on the compound score we can determine the orientation of the sentiment. The compound score is obtained by summing the standard lexicon scores and normalizing the result to between -1 (most negative) and +1 (most positive). The output is a Python dictionary, from which only the compound score is extracted. After extracting the compound score for each user opinion, the numbers of positive, negative, and neutral opinions are calculated based on the values given in Table 6. A short usage sketch follows Table 6.

Table 6. Scoring Table

idx | Positive Score | Negative Score
0   | 28.4           | 0.0
1   | 0.0            | 2.1
2   | 80.4           | 0.0
3   | 19.0           | 0.0
4   | 18.5           | 0.0
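As referenced above, a short usage sketch of this step is shown below; the example opinion is made up, and the compound threshold of 0 for deciding orientation is a common convention rather than something stated in the paper.

    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    opinion = "Wonderful place and a perfect location"   # made-up example review

    scores = analyzer.polarity_scores(opinion)           # dict with 'neg', 'neu', 'pos', 'compound'
    compound = scores["compound"]                        # normalized to the range [-1, +1]
    orientation = "positive" if compound > 0 else "negative" if compound < 0 else "neutral"
    print(scores, orientation)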


3.4 Rating Result

The positive and negative scores obtained (shown in Table 6) are then converted to a rating value on a scale from 0 to 5. The rating is determined by how high a sentence's positive score is, and it is reduced when there is a negative score whose magnitude lowers the final points from the initially assigned rating. The final output is shown in Table 7, and an illustrative conversion sketch is given after Table 7.

Table 7. Rating Result

idx | Positive Score | Negative Score | Rating
0   | 28.4           | 0.0            | 4
1   | 0.0            | 2.1            | 0
2   | 80.4           | 0.0            | 5
3   | 19.0           | 0.0            | 3
4   | 18.5           | 0.0            | 3
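The paper does not state the exact thresholds used for this conversion, so the sketch below uses purely illustrative cutoffs chosen only to reproduce the sample rows in Table 7; it is an assumption, not the authors' rule.

    def score_to_rating(pos_score, neg_score):
        # Illustrative thresholds only; the paper does not specify its exact mapping.
        net = pos_score - neg_score                     # a negative score pulls the rating down
        if net <= 0:
            return 0
        for rating, threshold in ((5, 60), (4, 25), (3, 10), (2, 5)):
            if net >= threshold:
                return rating
        return 1

    for pos, neg in [(28.4, 0.0), (0.0, 2.1), (80.4, 0.0), (19.0, 0.0), (18.5, 0.0)]:
        print(pos, neg, "->", score_to_rating(pos, neg))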

3.5 Hyperparameter Tuning – SparkML Engine (ALS)

A hyperparameter is a parameter whose value is set before the learning process begins [16]. In the SparkML Engine, a series of parameters is used to run the ALS procedure, and the function of hyperparameter tuning is to produce good output. To determine the optimum parameters we use a method called Grid Search, a traditional way to perform hyperparameter optimization that works by searching exhaustively through a specified subset of hyperparameters [17]. The benefit of grid search is that it is guaranteed to find the optimal combination of the parameters supplied; the drawback is that it can be very time consuming and computationally expensive. The use of hyperparameter tuning can be seen in Table 8.

Table 8. Hyperparameter Tuning

The parameters rank, maxIter, and regParam are arranged in arrays as shown in Table 8, so that the tuning run iterates over the values in each array.

3.6 Run the SparkML Engine

As explained in section 2, we use the SparkML Engine to generate collaborative filtering recommendations with the Alternating Least Square method. Before running this method, we split the whole dataset into two parts: the training data (80% of the entire dataset) and the test data (20% of the entire dataset). This is important because a larger training set affects the quality of the output on the test data. Carrying out the ALS method with the SparkML Engine involves the following steps: hyperparameter tuning, which at the same time builds models on the training data; testing the resulting models on the test data; and calculating the accuracy of the results for comparison with the SVD method. The hyperparameter tuning used in the SparkML Engine is shown in Table 8. After tuning, the results are used to fill in the parameters for running the SparkML Engine, whose use is shown in Table 2. The accuracy of the designed model is then calculated on the test results.
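The following end-to-end sketch shows how the split, training, and RMSE evaluation described above could look in PySpark; the file name, column names, and the parameter values plugged in after tuning are assumptions for illustration, not the paper's exact configuration.

    from pyspark.sql import SparkSession
    from pyspark.ml.recommendation import ALS
    from pyspark.ml.evaluation import RegressionEvaluator

    spark = SparkSession.builder.appName("lodging-als").getOrCreate()
    # Assumed ratings file produced by the VADER step, with userId, listingId, rating columns
    ratings = spark.read.csv("ratings.csv", header=True, inferSchema=True)

    data_train, data_test = ratings.randomSplit([0.8, 0.2], seed=42)   # the 80/20 split described above

    als = ALS(userCol="userId", itemCol="listingId", ratingCol="rating",
              rank=10, maxIter=20, regParam=0.1,                       # example values, not the tuned result
              coldStartStrategy="drop")                                # avoid NaN predictions for unseen users/items
    model = als.fit(data_train)
    predictions = model.transform(data_test)

    evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating", predictionCol="prediction")
    print("RMSE:", evaluator.evaluate(predictions))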

3.7 Output – SparkML Engine (ALS)

The result of the recommendation using the SparkML Engine's Alternating Least Square method is a list of accommodation recommendations and predicted values for the Airbnb users whose dataset we are processing. Sample results can be seen in Table 9.

Algorithm 3 SparkML – Hyperparameter Tuning

    # Assumption: the SparkML Engine has been set up beforehand
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
    from pyspark.ml.evaluation import RegressionEvaluator

    evaluator = RegressionEvaluator(metricName='rmse', labelCol='rating',
                                    predictionCol='prediction')
    paramGrid = (ParamGridBuilder().addGrid(als.rank, [1, 5, 10])
                 .addGrid(als.maxIter, [20])
                 .addGrid(als.regParam, [0.05, 0.1, 0.5]).build())
    crossval = CrossValidator(estimator=als, estimatorParamMaps=paramGrid,
                              evaluator=evaluator, numFolds=10)
    cvModel = crossval.fit(trainingRatings)
    cvModel.bestModel.extractParamMap()      # parameter map of the best model found


Table 9. A cut of the recommendations

idx | userID   | Listing Recommendation | Rating
0   | 281328   | 5803                   | 3.1
1   |          | 389482                 | 4.5
2   | 4459383  | 12940                  | 3.7
3   |          | 5860123                | 2.8
4   | 29212599 | 16289102               | 3.0
5   |          | 16289973               | 4.7
6   | 3620415  | 1645915                | 2.7
7   |          | 4830799                | 3.7

3.8 Hyperparameter Tuning - Surprise Engine (SVD)

A hyperparameter is a parameter whose value is set before the learning process begins [16]. In the Surprise Engine, a series of parameters is used to run the SVD procedure, and the function of hyperparameter tuning is to produce good output. To determine the optimum parameters we again use Grid Search, which searches exhaustively through a specified subset of hyperparameters [17]; it is guaranteed to find the optimal combination of the parameters supplied, but it can be very time consuming and computationally expensive. The use of hyperparameter tuning can be seen in Table 10.

Table 10. Hyperparameter Tuning

The parameters n_epochs and n_factors are arranged in arrays as shown in Table 10, so that the tuning run iterates over the values in each array.

3.9 Run the Surprise Engine

As explained in section 2, we use the Surprise Engine to generate collaborative filtering recommendations with the Singular Value Decomposition method. Before running this method, we split the whole dataset into two parts: the training data (80% of the entire dataset) and the test data (20% of the entire dataset), since a larger training set affects the quality of the output on the test data. Carrying out the SVD method with the Surprise Engine involves the following steps: hyperparameter tuning, which at the same time builds models on the training data; testing the resulting models on the test data; and calculating the accuracy of the results for comparison with the ALS method. The hyperparameter tuning used in the Surprise Engine is shown in Table 10. After tuning, the results are used to fill in the parameters for running the Surprise Engine, whose use is shown in Table 3. The accuracy of the designed model is then calculated on the test results.
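Analogously, the following sketch shows how the split, training, and RMSE evaluation could look with the Surprise Engine; the file name, column names, and parameter values are again assumptions for illustration, not the paper's exact configuration.

    import pandas as pd
    from surprise import SVD, Dataset, Reader, accuracy
    from surprise.model_selection import train_test_split

    # Assumed ratings file with reviewer_id, listing_id, rating columns on a 0-5 scale
    df = pd.read_csv("ratings.csv")
    reader = Reader(rating_scale=(0, 5))
    data = Dataset.load_from_df(df[["reviewer_id", "listing_id", "rating"]], reader)

    trainset, testset = train_test_split(data, test_size=0.2, random_state=42)  # the 80/20 split described above

    algo = SVD(n_factors=100, n_epochs=10)               # example values, not the tuned result
    algo.fit(trainset)
    predictions = algo.test(testset)
    print("RMSE:", accuracy.rmse(predictions, verbose=False))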

3.10 Output - Surprise Engine (SVD)

The result of the recommendation using the Surprise Engine's Singular Value Decomposition method is a list of accommodation recommendations and predicted values for the Airbnb users whose dataset we are processing. Sample results can be seen in Table 11.

Table 11. A cut of the recommendations

idx | userID   | Listing Recommendation | Rating
0   | 60033560 | 28368234               | 4.6
1   |          | 34684755               | 4.6
2   | 1615066  | 28941710               | 4.66
3   |          | 34684755               | 4.65
4   | 44709554 | 29858182               | 4.73
5   |          | 38699887               | 4.72
6   | 34734701 | 39428172               | 4.86
7   |          | 35916489               | 4.85

Algorithm 4 Surprise – Hyperparameter Tuning

    # Assumption: the Surprise Engine has been set up beforehand
    from surprise.model_selection import GridSearchCV

    param_grid = {'n_epochs': [5, 10], 'lr_all': [0.002, 0.005, 0.1],
                  'reg_all': [0.4, 0.6], 'n_factors': [100, 500]}
    grid_search = GridSearchCV(SVD, param_grid, measures=['rmse', 'mae'])
    grid_search.fit(data)
    print(grid_search.best_params['rmse'])   # best parameter combination by RMSE


3.11 Comparison

To make the comparison, we compare the RMSE values produced by the two methods. The Alternating Least Square method run on the SparkML Engine obtains an RMSE of 1.0125, while the Singular Value Decomposition method run on the Surprise Engine obtains an RMSE of 0.3658, as shown in Figure 4.

Figure 4. Score of RMSE

4. CONCLUSION

A recommendation system is a system that aims to provide predictions within a domain to its users. In this work, we have tested recommendation systems built with the Alternating Least Square and Singular Value Decomposition methods using the SparkML Engine and the Surprise Engine. We chose both engines because we wanted to test the tools that currently exist to help IT practitioners process data faster. From the tests we performed, the Singular Value Decomposition method produces a smaller RMSE value, which means better accuracy than the Alternating Least Square method. A difficulty we experienced is that running the Singular Value Decomposition method requires a longer running time than the Alternating Least Square method. Our hope for the future is that parameter tuning on the two engines can be optimized to yield more precise accuracy and faster data processing, and we remind the reader that there are currently many kinds of tools developed for data processing and machine learning that simplify our tasks as IT practitioners.

REFERENCES

[1] R. Prosser, “Tourism,” Encycl. Appl. Ethics, pp. 386–406, 2012, doi: 10.1016/B978-0-12-373932-2.00072-7.

[2] UNWTO, “International Tourism Highlights,” UNWTO Tourism Highlights: 2019 Edition, 2019.

[3] R. Sharpley, Tourism, Tourists and Society. 2018.

[4] N. Evans and N. Evans, “Airbnb,” in Strategic Management for Tourism, Hospitality and Events, 2019.

[5] N. Deshai, B. V. D. S. Sekhar, and S. Venkataramana, “Mllib: machine learning in apache spark,” Int. J. Recent Technol. Eng., 2019.

[6] I. S. Wahyudi, “Big data analytic untuk pembuatan rekomendasi koleksi film personal menggunakan Mlib. Apache Spark,” Berk. Ilmu Perpust. dan Inf., vol. 14, no. 1, p. 11, 2018, doi: 10.22146/bip.32208.

[7] N. Hug, “Surprise, a Python library for recommender systems,” URL http://surpriselib.com, 2017.

[8] R. Mhetre and D. P. G, “Movie Recommendation Engine using Collaborative Filtering with Alternative Least Square and Singular Value Decomposition Algorithms,” IJARCCE, 2019, doi: 10.17148/ijarcce.2019.8216.

[9] L. Yanxiang, G. Deke, C. Fei, and C. Honghui, “User-based clustering with top-N recommendation on Cold-Start problem,” in Proceedings of the 2013 3rd International Conference on Intelligent System Design and Engineering Applications, ISDEA 2013, 2013, doi: 10.1109/ISDEA.2012.381.

[10] G. Takács and D. Tikk, “Alternating least squares for personalized ranking,” in RecSys’12 - Proceedings of the 6th ACM Conference on Recommender Systems, 2012, doi: 10.1145/2365952.2365972.

[11] X. Meng et al., “MLlib: Machine learning in Apache Spark,” J. Mach. Learn. Res., 2016.

[12] Harish Rao M. and Shashikumar D. R., “Automatic Product Review Sentiment Analysis Using Vader and Feature Visualization,” Int. J. Comput. Sci. Eng. Inf. Technol. Res., 2017, doi: 10.24247/ijcseitraug20178.

[13] C. J. Hutto and E. Gilbert, “VADER: A parsimonious rule-based model for sentiment analysis of social media text,” in Proceedings of the 8th International Conference on Weblogs and Social Media, ICWSM 2014, 2014.

[14] E. V. V. Cervantes, L. V. C. Quispe, and J. E. O. Luna, “Performance of alternating least squares in a distributed approach using GraphLab and MapReduce,” in CEUR Workshop Proceedings, 2015.

[15] T. Chai and R. R. Draxler, “Root mean square error (RMSE) or mean absolute error (MAE)? -Arguments against avoiding RMSE in the literature,” Geosci. Model Dev., 2014, doi: 10.5194/gmd-7-1247-2014.

[16] E. Hazan, A. Klivans, and Y. Yuan, “Hyperparameter optimization: A spectral approach,” in 6th International Conference on Learning Representations, ICLR 2018 - Conference Track Proceedings, 2018.

[17] J. Bergstra and Y. Bengio, “Random search for hyper-parameter optimization,” J. Mach. Learn. Res., 2012.
