Tourism Recommendation System using Weighted Hybrid Method in Bali Island


Diffo Elza Pratama*, Dade Nurjanah, Hani Nurrahmi
School of Computing, Informatics, Telkom University, Bandung, Indonesia
Email: 1,*[email protected], 2[email protected], 3[email protected]

Correspondence Author Email: [email protected]

Abstract−Tourism is a promising sector for global economic growth, as it has shown resilience during global crises. In Bali, tourism is a leading sector alongside agriculture and industry, making a significant contribution to regional and community development. However, Bali's popularity as a tourist destination also produces an abundance of information, and a recommendation system is needed to help tourists overcome this information overload. This study tests a tourism recommendation system for Bali using the Weighted Hybrid technique, which combines two methods, Collaborative Filtering and Content-Based, through weighted values. The Collaborative Filtering, Content-Based, and Weighted Hybrid approaches are compared in order to improve the performance and accuracy of current recommendation systems. The evaluation compares the three methods using the MAE, MSE, and RMSE evaluation metrics. With a weight value of 0.4, the Weighted Hybrid technique achieves MAE, MSE, and RMSE values of 0.4854, 0.4034, and 0.6351, respectively, and outperforms both Collaborative Filtering and Content-Based.

Keywords: Collaborative Filtering; Content-Based; Recommendation System; Weighted Hybrid

1. INTRODUCTION

The tourism sector has emerged as a promising industry for global economic growth. Despite global crises, the tourism sector has shown positive growth since 1950. At that time, the number of tourist arrivals reached 25 million people, which increased to 278 million people in 1980, 528 million people in 1995, and reached 1.1 billion people in 2014 (Ratman, 2016). These data indicate that the tourism industry has the potential to drive economic growth.

In Indonesia, the tourism sector has experienced rapid development. Bali, as one of the famous tourist destinations, is not only popular in Indonesia but also at the regional (Asia) and international levels [1].

In Bali, tourism has become one of the main focus sectors alongside agriculture and small to medium-scale industries. The growth and development of the tourism sector in Bali have significantly contributed to regional development and the local economy. Along with the progress of tourism, there is an increasing abundance of information about Bali. However, this also creates difficulties for tourists in selecting information that aligns with their preferences when planning a visit to the island of Bali [2].

Additionally, the book entitled "Recommender Systems: Introduction and Challenges" explains that users typically make poor decisions when selecting information, which is a common consequence of information overload. For instance, people frequently struggle to select the appropriate goods and services on e-commerce sites. Users are overwhelmed by the abundance of information available, which confuses them and leads to poor choices. Similarly, when choosing music genres, users frequently fail to identify the genres that match their preferences, so they quickly become bored with the songs they selected. Faced with this challenge of excessive information, a recommendation system can be an effective solution to assist tourists in finding recommendations that suit their needs [3].

A recommendation system is a software technology that provides suggestions for items that align with the user's preferences [4]. Over the years, various approaches have been developed to generate recommendations. The application of recommendation systems is widespread in the fields of music, movies, and commerce. Some examples of applications that use recommendation systems include movie recommendations on MovieLens, music recommendations on Spotify, video recommendations on YouTube, and product recommendations on Shopee [5]–[8]. Research conducted by Marwa Hussien Mohamed, Mohamed Helmy Khafagy, and Mohamed Hasan Ibrahim explains that recommendation systems use various filtering methods such as Collaborative Filtering (CF), Content-Based (CB), and Hybrid approaches [9]. Previous studies have explored various methods for developing recommendation systems.

Research [10] demonstrates that employing the CF technique for food recommendations resulted in a commendable accuracy of 86%. However, CF encounters the "Cold Start Problem," where the recommendation system lacks adequate information about users with similar interests when serving new users. Likewise, a separate study [11] employing CF in an e-commerce application also faces the "Cold Start Problem" since user preference data is unavailable for generating recommendations. Conversely, research [12] using the CB approach achieved an accuracy of 85% in recommending hotels. Nonetheless, the CB method relies on classifying users based on product features or content, lacking the ability to comprehend user preferences based on their characteristics. In summary, these studies suggest that CF struggles with finding preferences for new users, while CB methods face limitations in determining user preferences due to a scarcity of information. In contrast, combining CF and CB methods, as indicated in research [13] and [14], yielded accuracies of 75% and 83.5%, respectively. This hybrid approach not only addresses the "Cold Start Problem" but also considers user preferences based on their rating evaluations.

Based on the earlier studies discussed above, the authors compare the prior methods with the proposed strategy. This study focuses on developing a tourism recommendation system specifically tailored for the Bali region. It employs a Weighted Hybrid technique, which integrates two methods using weighted values, to generate personalized recommendations. This approach takes the prediction values derived from multiple recommendation methods and treats them as variables in a linear combination. The methods used in the research encompass CF with Singular Value Decomposition (SVD), known for its ability to reduce prediction error and provide users with relevant recommendations [7]. Additionally, the CB method was employed, incorporating Cosine Similarity and TF-IDF (Term Frequency Inverse Document Frequency) calculations, along with the RandomForest algorithm, as they have demonstrated improved accuracy and commendable performance [14]–[16]. The predictions obtained through the Weighted Hybrid technique were assessed using evaluation metrics such as MAE (Mean Absolute Error), MSE (Mean Squared Error), and RMSE (Root Mean Squared Error) [17]. The primary aim of the authors was to explore and analyze the effectiveness of the combined approach in comparison to the individual methods. Moreover, the proposed method aimed to surpass the accuracy achieved by the CF and CB methods.

2. RESEARCH METHODOLOGY

In this study, the authors developed a hybrid recommendation system by merging Collaborative Filtering based on singular value decomposition with Content-Based techniques employing TF-IDF feature extraction, Cosine Similarity, and RandomForest. This hybrid recommendation system operates in parallel, allowing the two methods to be combined and used at the same time [16]. The dataset for this investigation was collected by crawling the TripAdvisor website with the WebHarvy application. The raw crawling results generally still contain inaccurate entries, so data preparation techniques are needed to remove them.

2.1 Research Stages

Figure 1. Architectural System Design

The system's workflow is depicted in Figure 1. Initially, data on tourist destinations and user ratings are collected from the TripAdvisor website using data crawling techniques. The system then proceeds with two distinct processes: the Collaborative Filtering method for rating prediction and the Content-Based method for rating prediction. Subsequently, the Weighted Hybrid calculation combines the predictions from both methods. Finally, the evaluation phase assesses the accuracy of the Weighted Hybrid approach by computing RMSE, MSE, and MAE values.

2.2 Dataset

The researchers collected data for this study by utilizing crawling techniques with the WebHarvy application on the TripAdvisor website. The main purpose was to gather comprehensive information about tourist destinations in Bali, including ratings, user reviews, and detailed specifics such as place names, categories, and comments. Once the crawling process was concluded, the acquired data was automatically stored in (.csv) format. The dataset successfully accumulated a total of 14,163 reviews, covering 50 distinct tourist spots across Bali. Table 1 exemplifies user rating data, while Table 2 illustrates detailed information on tourist destinations in Bali.


Table 1. User Ratings Dataset

id_review    id_place    rating
Diffo        2           4.0
Dwiki        5           5.0
Diki         10          3.0

Table 1 contains the rating that each user gave to the places they have visited. The columns in the data are 'id_review', 'id_place', and 'rating'.

Table 2. Tourist information dataset

id_place    name_place            category    reviewer
2           Pantai Pasir Putih    Friends     I visited this beach 30 years ago, and it trul…
5           Tanah Lot Temple      Family      The Tanah Lot Temple is at the northwest side..
10          Gunung Batur          Solo        This amazing Mountain in Bali...

Table 2 contains information about each tourist spot together with the reviews written by users about tourist attractions on the island of Bali. The columns in the data are 'id_place', 'name_place', 'category', and 'reviewer'.

2.3 Preprocessing

Data preprocessing is an initial step in data mining that involves transforming raw data, which is collected from various sources, into cleaner and more usable information for further analysis. The goal of this stage is to improve the quality of the dataset, ensuring that the information derived from it is more refined than the original data. To achieve this, several techniques are applied during data preprocessing: converting all words to lowercase, removing special characters and punctuation, tokenization, stopword removal, and lemmatization [18].
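The five preprocessing steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the `STOPWORDS` set and the `LEMMAS` lookup table are tiny hypothetical stand-ins for the stopword list and lemmatizer of a real NLP library, and the function name `preprocess` is chosen only for this example.

```python
import re

# Hypothetical, deliberately tiny resources for illustration only.
STOPWORDS = {"in", "of", "the", "it", "i", "that", "was", "one", "you", "to", "do", "my"}
LEMMAS = {"amazing": "amaze", "sunrises": "sunrise", "takes": "take",
          "hours": "hour", "climbing": "climb"}

def preprocess(text):
    """Apply the five steps in order and return a cleaned sentence."""
    text = text.lower()                                   # 1. lowercase
    text = re.sub(r"[^a-z\s]", "", text)                  # 2. drop punctuation, digits, symbols
    tokens = text.split()                                 # 3. whitespace tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]    # 4. stopword removal
    tokens = [LEMMAS.get(t, t) for t in tokens]           # 5. dictionary-based lemmatization
    return " ".join(tokens)                               # back to a sentence for TF-IDF

print(preprocess("Amazing sunrises in Bali, it takes 2 hours climbing the mountain!"))
```

The final join mirrors the conversion of tokens back into sentences that precedes feature extraction; swapping in a full stopword list and a proper lemmatizer would not change the structure of the function.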

2.4 Content-Based

In the content-based approach, the system assumes that users who have rated an item in the past are likely to give similar ratings to similar items in the future. Thus, the relationship between items that the user has already rated and other items in the dataset is used to determine the most suitable items for the target user. In other words, the system seeks items that have a strong correlation with the user's profile [9]. To match items with the target user, the techniques of TF-IDF feature extraction and Cosine Similarity are utilized. TF-IDF (Term Frequency Inverse Document Frequency) is a statistical method designed to describe the importance of words in a document within a collection or corpus. In this study, TF-IDF is used to build item profiles in the content-based approach.

The TF formula can be expressed as follows:

$$ tf_{t,d} = \frac{f_{t,d}}{\sum_{t' \in d} f_{t',d}} \tag{1} $$

$tf_{t,d}$ measures how frequently a term appears in a document, with $f_{t,d}$ indicating the count of occurrences of term t in document d. The summation $\sum_{t' \in d} f_{t',d}$ is the total number of occurrences of all terms in document d, so the term frequency is normalized by the length of the document.

The inverse document frequency (IDF) formula is represented as follows:

$$ idf_t = \log \frac{N}{N_t} \tag{2} $$

$idf_t$ represents the inverse document frequency, where N is the total number of documents and $N_t$ is the number of documents that contain the term t. Therefore, the overall TF-IDF formula can be represented as follows:

$$ tfidf_{t,d} = tf_{t,d} \times idf_t \tag{3} $$

The term $tfidf_{t,d}$ denotes the weight of a term, produced by multiplying the term frequency ($tf_{t,d}$) by the inverse document frequency ($idf_t$). This study uses the Cosine Similarity algorithm to determine how similar user profiles and item profiles are to one another. The Cosine Similarity technique calculates the degree of similarity between any two vectors, including document and query vectors, in a collection. The formula for cosine similarity is as follows:

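$$ sim(p, q) = \frac{\vec{p} \cdot \vec{q}}{|\vec{p}| \times |\vec{q}|} = \frac{\sum_{c \in C} r_{cp} \, r_{cq}}{\sqrt{\sum_{c \in C} r_{cp}^{2}} \times \sqrt{\sum_{c \in C} r_{cq}^{2}}} \tag{4} $$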

For non-negative vectors such as TF-IDF weights, the computation of Cosine Similarity yields values ranging from 0 to 1. A value of 0 indicates no similarity between the items, whereas a value of 1 indicates the strongest possible similarity between the items. Subsequently, in this research, the RandomForest algorithm is employed for rating prediction [14], [19].
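Equations (1) to (4) can be illustrated with a small standard-library sketch. The function names `tfidf_vectors` and `cosine` and the three toy documents are assumptions made for this example; the RandomForest rating predictor is not covered here, and library implementations of TF-IDF often apply smoothing that this direct transcription omits.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # N_t: number of documents that contain term t
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = sum(counts.values())                      # denominator of Eq. (1)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])  # Eq. (3)
            for term, count in counts.items()
        })
    return vectors

def cosine(p, q):
    """Cosine similarity of two sparse vectors, Eq. (4)."""
    dot = sum(weight * q.get(term, 0.0) for term, weight in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0                                        # empty profile: no similarity
    return dot / (norm_p * norm_q)

# Hypothetical, already preprocessed descriptions of three places.
vecs = tfidf_vectors(["beach sunset beach", "temple sunset", "mountain sunrise hike"])
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

The first pair shares the term "sunset" and therefore has a similarity strictly between 0 and 1, while the second pair shares no terms and scores 0.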


2.5 Collaborative Filtering

CF (Collaborative Filtering) is an approach that gathers and analyzes user behavior data such as feedback, ratings, preferences, and activities. Implementing a recommendation system using CF involves several stages. It begins with collecting user ratings on items to construct a user-item rating matrix. This matrix is then used to identify similar users or items through similarity computations. Subsequently, the system predicts ratings for unrated items, and their rankings are based on these predicted scores [9]. The SVD (Singular Value Decomposition) model is employed in this approach, which includes steps like creating the training data and applying the SVD model to it.

SVD breaks down the matrix M into its components, making subsequent computations more manageable. In CF, SVD is applied to the user-item matrix, with users represented in rows and items in columns. The result of SVD comprises three matrices, U, Σ, and V^T, where:

$$ M = U \Sigma V^{T} \tag{5} $$

M represents the matrix to be decomposed; in our case, it is the matrix of all user ratings for tourist attractions. U represents the user feature matrix; users are the ones who have evaluated the tourist attractions. Σ represents the diagonal matrix of weights (the singular values), which provides information on how much dimensionality reduction should be applied. V^T represents the item feature matrix, where the superscript T denotes the transpose; in our case, the items are the tourist attractions. After modeling, the final step is to calculate the predicted ratings for all items that have not been evaluated by users [20].
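The idea of predicting unrated items from latent user and item features can be sketched as follows. Note the substitution: instead of an exact singular value decomposition, this sketch uses a gradient-descent matrix factorization of the kind commonly used for SVD-style collaborative filtering, because the rating matrix has missing entries. The ratings, the hyperparameters, and the names `factorize` and `predict` are all hypothetical.

```python
import random

def factorize(ratings, n_factors=2, epochs=2000, lr=0.01, reg=0.02, seed=0):
    """Learn latent user/item factors from (user, item, rating) triples via SGD."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    mean = sum(r for _, _, r in ratings) / len(ratings)   # global average rating
    P = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = mean + sum(pf * qf for pf, qf in zip(P[u], Q[i]))
            err = r - pred
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)     # regularized gradient step
                Q[i][f] += lr * (err * pu - reg * qi)
    return mean, P, Q

def predict(model, user, item):
    """Predict a rating, clipped to the 1-5 scale; unknown ids fall back to the mean."""
    mean, P, Q = model
    if user not in P or item not in Q:
        return mean
    raw = mean + sum(pf * qf for pf, qf in zip(P[user], Q[item]))
    return min(5.0, max(1.0, raw))

# Hypothetical ratings in the (id_review, id_place, rating) layout of Table 1.
data = [("Diffo", 2, 4.0), ("Diffo", 5, 5.0), ("Dwiki", 5, 5.0),
        ("Dwiki", 10, 3.0), ("Diki", 2, 4.0), ("Diki", 10, 3.0)]
model = factorize(data)
print(predict(model, "Diffo", 10))  # a place Diffo has not rated
```

The fallback to the global mean for unknown users or items is one simple way to handle the cold start case mentioned in the introduction; it is a design choice of this sketch, not something the study describes.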

2.6 Weighted Hybrid

Weighted Hybrid is a hybrid technique that integrates recommendations from two different methods: Collaborative Filtering with SVD and Content-Based with Cosine Similarity and RandomForest. It accomplishes this by assigning a weight to each method and aggregating the weighted predictions of both approaches to produce Hybrid recommendations. It is crucial to emphasize that the hybrid weights employed in the Weighted Hybrid approach must lie between 0 and 1 to ensure a balanced combination of the methods [16]. The Weighted Hybrid method can be expressed in the following manner:

$$ R_{hybrid} = ((1 - W_a) \times R_{cf}) + (W_a \times R_{cb}) \tag{6} $$

In the given equation, $R_{hybrid}$ represents the Hybrid prediction, $W_a$ denotes the weight assigned to the Content-Based prediction (so that $1 - W_a$ is the weight assigned to the Collaborative Filtering prediction), while $R_{cf}$ and $R_{cb}$ refer to the rating predictions obtained from the Collaborative Filtering and Content-Based approaches, respectively [21].
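As a worked instance of Equation (6), the sketch below combines two predictions for a single user and place. The function name `weighted_hybrid` and the input ratings 4.0 and 3.0 are hypothetical values chosen for the arithmetic.

```python
def weighted_hybrid(r_cf, r_cb, w_a):
    """Linear combination of a CF and a CB rating prediction, Eq. (6)."""
    if not 0.0 <= w_a <= 1.0:
        raise ValueError("w_a must lie in [0, 1]")
    return (1.0 - w_a) * r_cf + w_a * r_cb

# With w_a = 0.4: 0.6 * 4.0 + 0.4 * 3.0 = 2.4 + 1.2 = 3.6
print(weighted_hybrid(r_cf=4.0, r_cb=3.0, w_a=0.4))
```

Setting `w_a` to 0 reproduces the Collaborative Filtering prediction and setting it to 1 reproduces the Content-Based prediction, so the weight controls how far the hybrid leans toward either method.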

2.7 Evaluation

In general, many studies on recommendation systems have been evaluated using MAE (Mean Absolute Error), MSE (Mean Square Error), and RMSE (Root Mean Square Error). The accuracy of the generated suggestions increases as the MAE, MSE, and RMSE values decrease [16].

The MAE formula can be described as follows:

$$ MAE = \frac{\sum_{n=1}^{N} |\hat{r}_n - r_n|}{N} \tag{7} $$

The MSE formula can be described as follows:

$$ MSE = \frac{\sum_{n=1}^{N} (\hat{r}_n - r_n)^2}{N} \tag{8} $$

The RMSE formula can be described as follows:

$$ RMSE = \sqrt{\frac{\sum_{n=1}^{N} (\hat{r}_n - r_n)^2}{N}} \tag{9} $$

In the given formulas, $\hat{r}_n$ represents the predicted rating, and $r_n$ represents the actual rating during testing. N represents the total number of rating prediction pairs between the testing data and the predicted results [17].
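The three metrics translate directly into code. The sketch below is a plain transcription of Equations (7) to (9); the function names and the three sample ratings are hypothetical.

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error, Eq. (7)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def mse(predicted, actual):
    """Mean Squared Error, Eq. (8)."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Squared Error, Eq. (9): the square root of the MSE."""
    return math.sqrt(mse(predicted, actual))

predicted = [4.5, 3.0, 5.0]   # hypothetical model outputs
actual = [4.0, 3.0, 4.0]      # hypothetical true ratings
print(mae(predicted, actual), mse(predicted, actual), rmse(predicted, actual))
```

Because RMSE is the square root of MSE, the two always rank models identically on the same test set; MSE and RMSE penalize large errors more heavily than MAE does.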

3. RESULT AND DISCUSSION

The data used in this study comes from the TripAdvisor website and is collected through web scraping techniques. Dataset properties are described in more detail in Table 1 in the research methodology section. Before modeling and testing, the dataset must go through the preprocessing stage described earlier. After preprocessing, the data is separated into training data and testing data using the 5-fold cross-validation method. The training data is used to develop a model that learns how to make predictions in the recommendation system, while the test data is used to assess the effectiveness of the model in predicting ratings. At the testing stage, evaluation is done by comparing the ratings predicted by each model with the original ratings. Testing each model with 5-fold cross-validation yields rating predictions for the Collaborative Filtering, Content-Based, and Weighted Hybrid models.

Evaluation measures such as MAE, MSE, and RMSE quantify the differences between the original rating and the estimated rating. The purpose of this scenario is to examine the variations in the output of each constructed model. Computing the MAE, MSE, and RMSE values for each model makes it possible to identify which technique yields the best predictions under these three assessment criteria. For each model, each fold of the cross-validation procedure uses about 11,331 reviews as training data and 2,832 reviews as testing data. The choice of the best model is determined largely by the accuracy scores of each model.
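The 5-fold split can be sketched with the standard library alone. The generator name `k_fold_indices` and the random seed are assumptions for this example; the dataset size of 14,163 reviews is taken from Section 2.2.

```python
import random

def k_fold_indices(n_samples, k=5, seed=42):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]     # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

for train, test in k_fold_indices(14163, k=5):
    print(len(train), len(test))
```

Since 14,163 is not divisible by 5, the test folds hold either 2,832 or 2,833 reviews, with the remaining reviews used for training in that fold; every review is tested exactly once across the five folds.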

3.1 Preprocessing Results

a.) Converting all words to lowercase

Lowercasing refers to the process of converting all letters in a text to lowercase. The main objective is to remove unnecessary differences in the representation of words that have similar meanings. Lowercasing plays a crucial role in text-related algorithms like feature extraction and word frequency calculations, as it enables treating uppercase and lowercase versions of the same word as identical entities. This step is commonly employed as an initial stage in data cleaning. Table 3 showcases the outcomes obtained before and after employing the lowercase transformation.

Table 3. Converting all words to lowercase

Before: You do not need a guide!! The path is very obvious!! We send we we’re just going to the temple when asked if we wanted a guide, the temple is half way up then once at the temple we just carried on we were asked multiple times but they soon left us alone! Tried to make us pay entrance fee as well which we didn’t pay we just carried on walking.

After: you do not need a guide!! the path is very obvious!! we send we we’re just going to the temple when asked if we wanted a guide, the temple is half way up then once at the temple we just carried on we were asked multiple times but they soon left us alone! tried to make us pay entrance fee as well which we didn’t pay we just carried on walking.

In Table 3, before lowercasing, the sentences in the paragraph still contain capital letters. After lowercasing, every letter in the sentences is lowercase.

b.) Remove special character and removing punctuation

Removing special characters and removing punctuation serve the same purpose of cleaning the text. However, they differ in terms of the specific character types targeted for deletion. Removing special characters involves removing certain non-alphanumeric characters in the text, including math symbols, emoji, and other Unicode symbols. On the other hand, removing punctuation involves removing all punctuation in the text, such as periods, commas, exclamation points, question marks, quotation marks, and other punctuation symbols. By removing special characters and punctuation, the goal is to clean up the text and ensure that only alphanumeric characters remain. This can be useful in preparing texts for further analysis. This process can be done simultaneously with lowercase conversion. The results of applying special character and punctuation removal can be observed in Table 4.

Table 4. Remove special character and removing punctuation

Before: Best experience ever!!! The guide Mr. Ari was really nice. Not only expert as a guide but also perfect photographer, he takes us a lot of beautiful photos. For those who are afraid of climbing up/down the slippery slopes, He would hold ur hand and help u. There were 4 stops during the climb up. Before the climb, he would give out torch lights to help see our steps. When reached the top he provide really nice simple light breakfast. Tiring but the experiences and views were soooo worth it. One of the activities that must be included in the bucket list when you are on vacation in Bali. If you need some information about it, you can ask our guide Mr. Ari at #TripInBali

After: best experience ever the guide mr ari was really nice not only expert as a guide but also perfect photographer he takes us a lot of beautiful photos for those who are afraid of climbing updown the slippery slopes he would hold ur hand and help u there were stops during the climb up before the climb he would give out torch lights to help see our steps when reached the top he provide really nice simple light breakfast tiring but the experiences and views were soooo worth it one of the activities that must be included in the bucket list when you are on vacation in bali if you need some information about it you can ask our guide mr ari at tripinbali

In Table 4, before special characters and punctuation are removed, the sentences in the paragraph still contain punctuation marks such as exclamation points, periods, and commas, as well as the hashtag symbol. After the removal process is carried out, the punctuation marks and the hashtag symbol are gone.

c.) Tokenization


Tokenization is a method used to break down text into smaller units known as tokens. These tokens can represent meaningful components within the text, such as words, phrases, or individual characters. The primary objective of tokenization is to simplify subsequent text-processing tasks, such as sentiment analysis, language modeling, or information retrieval. Tokenization is commonly carried out after performing lowercase conversion and removing special characters and punctuation. Table 5 showcases the outcomes observed before and after implementing the tokenization process.

Table 5. Tokenization

Before: Amazing sunrise in Bali, one of the most beautiful sunrise in the world! It takes around 2 hours climbing the mountain, I think that it was one of the most incredible experience of my life, I recommend you to do it.

After: ['amazing', 'sunrise', 'in', 'bali', 'one', 'of', 'the', 'most', 'beautiful', 'sunrises', 'in', 'the', 'world', 'It', 'takes', 'around', 'hours', 'climbing', 'the', 'mountain', 'i', 'think', 'that', 'it', 'was', 'one', 'of', 'the', 'most', 'incredible', 'experience', 'of', 'my', 'life', 'i', 'recommend', 'you', 'to', 'do', 'it']

In Table 5 before the tokenization process, the sentences in the paragraph were still in the form of sentences in general. Then after the tokenization process is carried out, the sentence is split into words in the form of an array.

d.) Stopword removal

Stopword removal is the procedure of eliminating commonly used and high-frequency words from a text. This process aims to discard irrelevant words and prioritize those that carry more significant meaning, thus improving the quality of text analysis. Furthermore, by removing stopwords, the dataset size can be reduced, which helps optimize memory usage during processing. This step is usually conducted after tokenization. The outcomes of applying stopword removal before and after can be observed in Table 6.

Table 6. Stopword Removal

Before: ['amazing', 'sunrise', 'in', 'bali', 'one', 'of', 'the', 'most', 'beautiful', 'sunrises', 'in', 'the', 'world', 'It', 'takes', 'around', 'hours', 'climbing', 'the', 'mountain', 'i', 'think', 'that', 'it', 'was', 'one', 'of', 'the', 'most', 'incredible', 'experience', 'of', 'my', 'life', 'i', 'recommend', 'you', 'to', 'do', 'it']

After: ['amazing', 'sunrise', 'bali', 'most', 'beautiful', 'sunrises', 'world', 'takes', 'around', 'hours', 'climbing', 'mountain', 'think', 'most', 'incredible', 'experience', 'life', 'recommend']

In Table 6, before the stopword removal process was carried out, the sentences in the paragraph still contained frequently used words such as 'in', 'of', 'the', 'it', 'i', 'that', 'was', and 'one'. After the stopword removal process, these stopwords are removed.

e.) Applying lemmatization

Lemmatization is a text-processing technique that aims to convert words into their base or canonical form. Its primary goal is to reduce word variation and ensure a consistent representation for better analysis. By transforming words into their base forms, lemmatization helps to capture the intrinsic meaning of words and improve the overall comprehension of the text. It is often considered the final step in the text-processing pipeline. Table 7 demonstrates the outcomes before and after implementing the lemmatization process.

Table 7. Applying lemmatization

Before: ['amazing', 'sunrise', 'bali', 'most', 'beautiful', 'sunrises', 'world', 'takes', 'around', 'hours', 'climbing', 'mountain', 'think', 'most', 'incredible', 'experience', 'life', 'recommend']

After: ['amaze', 'sunrise', 'bali', 'most', 'beautiful', 'sunrise', 'world', 'take', 'around', 'hour', 'climb', 'mountain', 'think', 'most', 'incredible', 'experience', 'life', 'recommend']

In Table 7, before the lemmatization process was carried out, the sentences in the paragraph still contained inflected words such as 'sunrises', 'takes', 'hours', and 'climbing'. After the lemmatization process was carried out, these words were reduced to their base forms 'sunrise', 'take', 'hour', and 'climb'.

Table 8. Converting Tokenized data to Sentences

Before: ['amaze', 'sunrise', 'bali', 'most', 'beautiful', 'sunrise', 'world', 'take', 'around', 'hour', 'climb', 'mountain', 'think', 'most', 'incredible', 'experience', 'life', 'recommend']

After: amaze sunrise bali most beautiful sunrise world take around hour climb mountain think most incredible experience life recommend


Before proceeding with the data modeling process, the preprocessed data, which is in token form, needs to be converted into regular sentences. This step is necessary to transform the tokenized data back into a coherent sentence structure for further analysis. Table 8 provides the results obtained before and after the conversion to regular sentences.

3.2 Experiment Results

In Collaborative Filtering testing, the SVD model is used to predict ratings based on 'id_review' and 'id_place' data. This model utilizes rating patterns found in other users to generate predictions. On the other hand, in Content-Based testing, feature extraction using TF-IDF is performed to represent information from the place name, category, and user comments. Using the Cosine Similarity calculation and the RandomForest model, rating predictions are built based on similarity with the training data. Furthermore, in Weighted Hybrid testing, the Hybrid rating prediction is calculated by combining the predictions from the Collaborative Filtering and Content-Based methods using formula (6) with weights of 0.4, 0.5, and 0.7. Metrics like MAE (Mean Absolute Error), MSE (Mean Squared Error), and RMSE (Root Mean Squared Error) are employed to assess the rating predictions. These measurements reveal the degree to which the rating predictions correspond to the actual values. Insights into the precision and quality of the produced models can be obtained by comparing the predictions with the true values in the testing data. The figures below display the results of the metric evaluation.

a.) In the first experiment, a test was carried out with a Weighted Hybrid weight value of 0.4.

Figure 2. Evaluation MAE Figure 3. Average MAE

In Figure 2, it can be observed that the Weighted Hybrid method achieves the lowest MAE values in each fold compared to Collaborative Filtering and Content-Based methods. Furthermore, Figure 3 shows that the average MAE of the Weighted Hybrid method is the lowest, with a value of 0.4854, while Collaborative Filtering has a value of 0.5083, and Content-Based has a value of 0.6936.

Figure 4. Evaluation MSE Figure 5. Average MSE

In Figure 4, it can be observed that the Weighted Hybrid method achieves the lowest MSE values in each fold compared to Collaborative Filtering and Content-Based methods. Furthermore, Figure 5 shows that the average MSE of the Weighted Hybrid method is the lowest, with a value of 0.4034, while Collaborative Filtering has a value of 0.4293, and Content-Based has a value of 0.7894.


Figure 6. Evaluation RMSE Figure 7. Average RMSE

In Figure 6, it can be observed that the Weighted Hybrid method achieves the lowest RMSE values in each fold compared to Collaborative Filtering and Content-Based methods. Furthermore, Figure 7 shows that the average RMSE of the Weighted Hybrid method is the lowest, with a value of 0.6351, while Collaborative Filtering has a value of 0.6543, and Content-Based has a value of 0.8884.

b.) In the second experiment, a test was carried out with a Weighted Hybrid weight value of 0.5.

Figure 8. Evaluation MAE Figure 9. Average MAE

In Figure 8, the MAE of the Weighted Hybrid method is unstable: it fluctuates up and down across folds relative to the Collaborative Filtering method. Furthermore, Figure 9 shows that the average MAE of the Weighted Hybrid method is not much different from that of Collaborative Filtering, with a value of 0.5021, while Collaborative Filtering has a value of 0.5039 and Content-Based a value of 0.6915.

In Figure 10, the Weighted Hybrid method scores worse than the Collaborative Filtering method in each fold but still better than the Content-Based method. Furthermore, in Figure 11 the average MSE of the Weighted Hybrid method lies between those of the two compared methods, with a value of 0.4321, while Collaborative Filtering has the lowest value, 0.4206, and Content-Based has the highest value, 0.7840.

Figure 12. Evaluation RMSE Figure 13. Average RMSE

In Figure 12, the Weighted Hybrid method scores worse than the Collaborative Filtering method in each fold but still better than the Content-Based method. Furthermore, in Figure 13 the average RMSE of the Weighted Hybrid method lies between those of the two compared methods, with a value of 0.6573, while Collaborative Filtering has the lowest value, 0.6485, and Content-Based has the highest value, 0.8854.

c.) In the third experiment, a test was carried out with a Weighted Hybrid weight value of 0.7.

Figure 14. Evaluation MAE Figure 15. Average MAE

In Figure 14, the Weighted Hybrid method achieves almost the same MAE value as the Content-Based method in each fold, and the Collaborative Filtering method is much better than the other two methods. Furthermore, in Figure 15 the Collaborative Filtering method has the lowest average MAE, with a value of 0.5052, while the Weighted Hybrid method reaches 0.6852 and Content-Based reaches 0.6923.


In Figure 16, the Weighted Hybrid method has a higher (worse) MSE than the Collaborative Filtering method in every fold but is still better than the Content-Based method. In Figure 17, the Collaborative Filtering method has the lowest average MSE, 0.4240, while the Weighted Hybrid method reaches 0.6626 and Content-Based reaches 0.7871.

In Figure 18, the Weighted Hybrid method has a higher (worse) RMSE than the Collaborative Filtering method in every fold but is still better than the Content-Based method. In Figure 19, the Collaborative Filtering method has the lowest average RMSE, 0.6510, while the Weighted Hybrid method reaches 0.8140 and Content-Based reaches 0.8869.

Across the tests, the Weighted Hybrid technique with a weight value of 0.4 consistently achieves the lowest error of the three methods on all of MAE, MSE, and RMSE.

In the tests with weight values of 0.5 and 0.7, by contrast, the Collaborative Filtering method has the lowest average MSE and RMSE; at 0.5 the average MAE of the two methods is nearly identical (0.5021 for Weighted Hybrid against 0.5039 for Collaborative Filtering), and at 0.7 Collaborative Filtering is clearly better on all three metrics. Therefore, of the weights tested, the Weighted Hybrid approach performs best with a weight of 0.4.
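The weight comparison above can be sketched in Python as follows. This is an illustration only: it assumes the weight is applied to the Content-Based prediction and its complement to the Collaborative Filtering prediction, a convention inferred from the hybrid results approaching the Content-Based results at weight 0.7. The formula given in the method section takes precedence. The function names and all rating values are hypothetical and do not reproduce the results of this study.

```python
def weighted_hybrid(cf_pred, cb_pred, weight):
    """Blend two predicted ratings into one.

    ASSUMPTION: `weight` multiplies the Content-Based prediction and
    (1 - weight) multiplies the Collaborative Filtering prediction.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * cb_pred + (1.0 - weight) * cf_pred

def mae(y_true, y_pred):
    """Mean Absolute Error between actual and predicted ratings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy data: actual ratings and the two base predictions.
actual = [5.0, 3.0, 4.0]
cf_preds = [4.0, 3.5, 4.5]   # Collaborative Filtering predictions
cb_preds = [5.0, 2.0, 3.0]   # Content-Based predictions

# Evaluate the same candidate weights as in the three experiments.
mae_by_weight = {}
for weight in (0.4, 0.5, 0.7):
    hybrid = [weighted_hybrid(cf, cb, weight)
              for cf, cb in zip(cf_preds, cb_preds)]
    mae_by_weight[weight] = mae(actual, hybrid)

best_weight = min(mae_by_weight, key=mae_by_weight.get)
print(mae_by_weight, "best weight:", best_weight)
```

Sweeping a small set of candidate weights and keeping the one with the lowest error mirrors the experimental procedure; a finer grid or a validation split separate from the test folds would be a natural extension.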

4. CONCLUSION

In this work, the authors compare the proposed Weighted Hybrid method with the standalone Collaborative Filtering and Content-Based approaches in the context of a tourism recommendation system. The study focuses on the island of Bali and uses data from the TripAdvisor website. The Weighted Hybrid approach combines the Collaborative Filtering and Content-Based methods using a weight value. The evaluation metrics MAE (Mean Absolute Error), MSE (Mean Square Error), and RMSE (Root Mean Square Error) were used to compare the results. In the tests with a weight value of 0.4, the Weighted Hybrid technique consistently has the lowest scores, with MAE, MSE, and RMSE values of 0.4854, 0.4034, and 0.6351 respectively. Lower values of these metrics indicate smaller prediction error and therefore better performance and accuracy. It can thus be said that the Weighted Hybrid technique, with a weight value of 0.4, improves the performance and accuracy of the recommendation system.

REFERENCES

[1] V. Paramarta, R. Roro Vemmi Kesuma Dewi, F. Rahmanita, S. Hidayati, and D. Sunarsi, “Halal Tourism in Indonesia:

Regional Regulation and Indonesian Ulama Council Perspective,”, vol 10, pp. 497-505, Feb. 2021, doi: 10.6000/1929- 4409.2021.10.58.

[2] A.A.A Ribeka Martha Purwahita, Putu Bagus Wisnu Wardhana, I Ketut Ardiasa, and I Made Winia, “Dampak Covid-19 terhadap Pariwisata Bali Ditinjau dari Sektor Sosial, Ekonomi, dan Lingkungan (Sebuah Tinjauan Pustaka),” Jurnal Kajian dan Terapan Pariwisata, vol. 1, no. 2, pp. 68–80, May. 2021, doi: 10.53356/diparojs.v1i2.29.

[3] F. Ricci, B. Shapira, and L. Rokach, “Recommender systems: Introduction and challenges,” in Recommender Systems Handbook, Second Edition, Springer US, 2015, pp. 1–34. doi: 10.1007/978-1-4899-7637-6_1.

[4] J. Cristy Patty, E. Thea Kirana, M. Sandra Diamond Khrismayanti Giri, M. Teknik Informatika, and U. Atma Jaya Yogyakarta, “Recommendations System for Purchase of Cosmetics Using Content-Based Filtering,” International Journal of Computer Engineering and Information Technology, vol. 10, no. 1, pp. 1–5, Jan. 2018, [Online]. Available:

www.google.com

[5] A. Paullier and R. Sotelo, “A recommender systems’ algorithm evaluation using the lenskit library and movielens databases,” in IEEE International Symposium on Broadband Multimedia Systems and Broadcasting, BMSB, IEEE Computer Society, pp. 1-7, Oct. 2020. doi: 10.1109/BMSB49480.2020.9379914.

[6] F. Fessahaye et al., “T-RECSYS: A Novel Music Recommendation System Using Deep Learning.”, pp. 1-6, Jan. 2019, doi: 10.1109/ICCE.2019.8662028

(11)

[7] R. Singh and S. Vijaykumar, “A Recommender System for YouTube Video based on deep neural network,” International Journal of Computer Sciences and Engineering, vol. 7, no. 6, pp. 160–163, Jun. 2019, doi: 10.26438/ijcse/v7i6.160163.

[8] R. H. #1 and Z. K. A. B. #2, “E-Commerce Recommender System on the Shopee Platform Using Apriori Algorithm”, vol.

7, no. 2, pp. 53-64, Aug. 2022, doi: 10.34818/indojc.2022.7.2.650.

[9] M. H. Mohamed, M. H. Khafagy, and M. H. Ibrahim, “Recommender Systems Challenges and Solutions Survey,” in Proceedings of 2019 International Conference on Innovative Trends in Computer Engineering, ITCE 2019, Institute of Electrical and Electronics Engineers Inc., Feb. 2019, pp. 149–155. doi: 10.1109/ITCE.2019.8646645.

[10] N. Rajabpour, A. Mohammadighavam, A. Naserasadi, and M. Estilayee, “TFR: A Tourist Food Recommender System based on Collaborative Filtering,” Int J Comput Appl, vol. 181, no. 11, pp. 30–39, Aug. 2018, doi:

10.5120/ijca2018917695.

[11] L. Jiang, Y. Cheng, L. Yang, J. Li, H. Yan, and X. Wang, “A trust-based collaborative filtering algorithm for E-commerce recommendation system,” J Ambient Intell Humaniz Comput, vol. 10, no. 8, pp. 3023–3034, Aug. 2019, doi:

10.1007/s12652-018-0928-7.

[12] K. Wahyudi, J. Latupapua, R. Chandra, and A. S. Girsang, “Hotel content-based recommendation system,” in Journal of Physics: Conference Series, Institute of Physics Publishing, vol. 1485, no. 1, May. 2020. doi: 10.1088/1742- 6596/1485/1/012017.

[13] J. Singh, “Collaborative filtering based hybrid music recommendation system,” in Proceedings of the 3rd International Conference on Intelligent Sustainable Systems, ICISS 2020, Institute of Electrical and Electronics Engineers Inc., Dec.

2020, pp. 186–190. doi: 10.1109/ICISS49785.2020.9315913.

[14] A. Melese, “Food and Restaurant Recommendation System Using Hybrid Filtering Mechanism,” Monthly Journal by TWASP, vol. 4, no. 4, pp. 268–281, Apr. 2021, doi: 10.5281/zenodo.4712849.

[15] I. K. G. Aryadi Pramarta and Z. K. A. Baizal, “HYBRID RECOMMENDER SYSTEM USING SINGULAR VALUE DECOMPOSITION AND SUPPORT VECTOR MACHINE IN BALI TOURISM.”, vol. 7, no. 2, pp. 408-418, Jun. 2022, doi: 10.29100/jipi.v7i2.2770

[16] Y. Amri, A. #1, Z. K. A. B. #2, A. Toto, and W. #3, “Tourism Recommender System using Weighted Parallel Hybrid Method with Singular Value Decomposition”, vol. 6, no 2, pp. 53-64, Sep. 2021, doi: 10.34818/indojc.2021.6.2.579.

[17] W. Wang and Y. Lu, “Analysis of the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE) in Assessing Rounding Model,” in IOP Conference Series: Materials Science and Engineering, Institute of Physics Publishing, vol.

324, no. 1, Apr. 2018. doi: 10.1088/1757-899X/324/1/012049.

[18] D. Sarkar, Text Analytics with Python. Apress, 2019, pp. 1-674, doi: 10.1007/978-1-4842-4354-1.

[19] G. Yunanda, D. Nurjanah, and S. Meliana, “Recommendation System from Microsoft News Data using TF-IDF and Cosine Similarity Methods,” Building of Informatics, Technology and Science (BITS), vol. 4, no. 1, pp. 277-284, Jun.

2022, doi: 10.47065/bits.v4i1.1670.

[20] N. Ifada, T. F. Rahman, and M. K. Sophan, “Comparing collaborative filtering and hybrid based approaches for movie recommendation,” in Proceeding - 6th Information Technology International Seminar, ITIS 2020, Institute of Electrical and Electronics Engineers Inc., Oct. 2020, pp. 219–223. doi: 10.1109/ITIS50118.2020.9321014.

[21] Hong-Quan Do, Tuan-Hiep Le, Byeongnam Yoon, “Dynamic Weighted Hybrid Recommender Systems”. In 2020 22nd International Conference on Advanced Communication Technology (ICACT), pp. 644-650, Feb. 2020, doi:

10.23919/ICACT48636.2020.9061465