Implementation Graph Sampling and Aggregation (GraphSAGE) Method for Job Recommendation System


Dewa Made Wijaya, Kemas Rahmat Saleh Wiharja*
School of Computing, Informatics, Telkom University, Bandung, Indonesia

Email: ¹[email protected], ²,*[email protected]; Correspondence Author Email: [email protected]

Abstract−Finding a job is currently a challenge, especially for final-year students. The Career Development Centre (CDC) is a service that a university provides for its students. However, a more sophisticated system is needed that not only provides job information but also recommends jobs based on students' interests, skills, and experience. A GraphSAGE-based job recommendation system can help match jobs to user preferences. GraphSAGE learns an embedding (feature vector) for each node in a graph: it aggregates information from neighbouring nodes and propagates that information through successive model layers. By combining the feature information of each node with that of its neighbours, the resulting representation becomes richer in information and more accurate. The GraphSAGE system is developed using the "Job Recommendation Challenge" dataset from Kaggle, of which three files are used: job data, user data, and applicant data. This study also uses a Graph Attention Network (GAT) to assign a weight to each node before GraphSAGE processes the graph.

Based on the experimental results, the GraphSAGE model achieves an accuracy of 97.5%, which is 13% higher than that of the comparison model, a Feedforward Neural Network (FNN) of the kind commonly used for tabular data. This comparison indicates which model is better suited to the dataset. The model was also tested on the Movie, Food, and Epinions datasets.

Keywords: Recommendations; Jobs; GraphSAGE; Embedding; Graph Attention Network; Feedforward Neural Network.

1. INTRODUCTION

The rapid development of information technology, especially web technology, has greatly changed how final-year students, fresh graduates, and alumni who graduated within the last two years search for job information. Finding job information is very important in the current era: in February 2016, the number of unemployed university graduates in Indonesia exceeded 695 thousand, a 20% increase over the previous year [1]. Students prefer to search for job information online. The Career Development Centre, better known as the CDC, is one of the most widely used platforms at a university because it is a resource the university provides to help students develop their careers after graduation. The CDC works by listing available jobs from which users, in this case students, can choose according to their wishes. These jobs have been screened in advance by the university before being posted on the CDC platform.

In practice, the CDC is just an ordinary website that lists job vacancies. Browsing many jobs is time-consuming, and students may doubt whether a given job matches their interests; browsing through thousands of jobs to find a few relevant ones can be a tedious task for many applicants [2]. Both final-year students and fresh graduates want a job that matches their interests, skills, and experience. Job providers, for their part, may doubt whether an applicant is a suitable candidate, so screening must be carried out.

Therefore, a system that efficiently recommends suitable jobs to users is needed. A recommender system is a solution when users face too much information [3]. It also addresses the job provider's problem, because companies no longer need to carry out excessive screening of candidates. The recommendation system uses user information such as user data, occupation, and interests or experience [4], so that users can find suitable jobs without having to choose from thousands of available positions.

Job recommendation platforms already exist. The purpose of a recommendation system is to provide personalized results for each user [5]. To do so, recommendation systems use information about the user, such as demographics or personal preferences [6]. Such a system generates recommendations with an algorithm that collects data from the user's profile, activities, and job search preferences [7]. It also offers a job search feature that lets users find jobs matching their qualifications and interests, as well as a feature for recommending or referring others in the user's network [8]. However, these systems are very broad and do not focus on managing job vacancies for the alumni of a particular university. It is therefore necessary to develop a local recommendation system for a college that makes it easier for fresh graduates and alumni to find relevant jobs. Recommendation systems have been built using several approaches, such as collaborative filtering, content-based filtering, hybrid filtering, and knowledge-based filtering [9]. In addition, graph-based approaches that exploit user-item correlations can be used [10].

The underlying goal of every approach is the same: predicting the user's preference for items the user has not yet interacted with.

In 2021, Hanzhong Zhang et al. [11] used an FNN to predict tabular data and obtained an accuracy of 88.72%, which shows that this model predicts well compared with other methods such as artificial neural networks (ANN).

In previous research, an FNN was also applied to data containing video information about movies to make predictions for a recommendation system, achieving an F1-score of 0.265 [12]. Although this result can be considered good, the method only considers content-based features and does not take into account other factors such as user preferences or social context. In addition, that research only predicts videos based on their visual features and does not consider other types of content such as audio or text.

On the other hand, research by Keyulu Xu et al. [13] states that Graph Neural Networks (GNNs) are better than FNNs when working on new data, because GNNs can handle larger and more complex data.

GNNs can also extrapolate to more difficult algorithmic tasks, which means they are better at predicting values outside the range of the training data. In recommendation systems, this helps predict outcomes that are not limited to those present in the given dataset.

Previous research shows that graph neural networks provide better prediction results in recommendation systems [14]. A GNN iterates over each node, combines feature information from its neighbours, and uses it to update the node's representation [15]. GraphSAGE is one of several GNN models [4]. In job recommendation systems, GraphSAGE can be used to learn representations of job candidates and to recommend jobs that match the user's preferences and skills [16]. With a GraphSAGE-based job recommendation system, users can receive recommendations that are better targeted to their interests, expertise, and experience, increasing their chances of getting the desired job [2]. To compare the predictions of FNN and GraphSAGE, this study uses several datasets: a job recommendation dataset, a Food dataset, a Movie dataset, and an Epinions dataset.

2. RESEARCH METHODOLOGY

2.1 System Design

The research follows the system design depicted in Figure 1. The recommendation system starts from a dataset from the Kaggle data bank, originating from a job platform and capturing 13 weeks' worth of user activity. The dataset undergoes feature selection and preprocessing to prepare it for the GraphSAGE model, which is then trained iteratively until predefined stopping criteria are met. Finally, the GraphSAGE model is evaluated on the test data to generate predictions or recommendations.

Figure 1. System Flowchart

2.2 Dataset

The dataset comes from the Kaggle data bank and is called "Job Recommendation Dataset". It consists of many files, but only three are used in this research: the user dataset (Table 1), the popular job dataset (Table 2), and the applicant dataset (Table 3).

Table 1. User Dataset

| ID | City | State | Degree Type | Major | Graduation Date | Work History Count | Total Years Experience | Currently Employed | Managed Others | Managed How Many |
|---|---|---|---|---|---|---|---|---|---|---|
| 47 | Paramount | CA | High School | NaN | 1999-06-01 00:00:00 | 3 | 10.0 | Yes | No | 0 |
| 72 | La Mesa | CA | Master’s | Anthropology | 2011-01-01 00:00:00 | 10 | 8.0 | Yes | No | 0 |
| 98 | Astoria | NY | Master’s | Journalism | 2007-05-01 00:00:00 | 3 | 3.0 | Yes | No | 0 |
| 123 | Baton Rouge | LA | Bachelor’s | Agricultural | 2011-05-01 00:00:00 | 1 | 9.0 | Yes | No | 0 |
| 9981 | Leesville | SC | None | NaN | NaN | 2 | 2.0 | No | No | 0 |

Table 2. Popular Job Dataset

| User ID | Job ID |
|---|---|
| 767 | 299322 602736 1077691 625281 898514 1035141 65... |
| 769 | 340536 28811 305416 970973 765210 114951 90808... |
| 861 | 384120 1096879 23663 1003185 787775 47052 9343... |
| 1006 | 219766 379509 840434 691160 813087 751038 2980... |
| 6451 | 870988 642392 930929 642393 930930 42558 93135... |

Table 3. Applicant Dataset

| User ID | Job ID |
|---|---|
| 47 | 284009 |
| 47 | 169528 |
| 9999 | 879278 |
| 9999 | 642316 |
| 9999 | 608463 |

2.3 Preprocessing

Preprocessing is the process of converting raw data into data that is ready to be processed at a later stage [17]. At this stage, data cleaning removes or replaces empty rows, duplicates, and missing values in the dataset. A data transformation step also changes the format or type of the data, and feature selection chooses the relevant columns of the dataset [18]. This process is done using the pandas library in the Python programming language; pandas provides many functions for manipulating data.

In the user dataset, the columns used in the next stage are selected, along with the values that can be used; for example, some values in the graduation date column cannot be used. The applicant dataset is then checked to determine which users have applied for jobs, and only these users are kept for further analysis, because both user information and the history of job applications are needed. Because the recommendation model requires numeric input, categorical data are converted to numeric data by assigning a unique number to each category [19]. Each unique number represents one category without changing the meaning of the data, as shown in Table 4. In addition, the graduation date column, which previously contained too much information (a full timestamp), is reduced to the year only. Finally, the popular job data, which lists all of a user's jobs in one space-separated field, is split so that each row contains one user and one job applied for (Table 5), which makes it easier to convert into a graph later. If a user has applied for many jobs, that user appears in several rows, one for each job.
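A minimal pandas sketch of these preprocessing steps is shown below, using a few rows of Table 1. The column names (taken from the Kaggle users file), the hypothetical helper preprocess_users, filling missing majors with "Unknown", and mapping Yes/No to 1/0 are assumptions for illustration, not the study's exact code. Because the integer codes depend on the full set of categories, they will not reproduce the codes in Table 4.

```python
import pandas as pd

# Sample rows copied from Table 1. The column names follow the Kaggle users file
# and are an assumption about the exact headers used in this study.
users = pd.DataFrame({
    "UserID": [47, 72, 98, 123, 9981],
    "City": ["Paramount", "La Mesa", "Astoria", "Baton Rouge", "Leesville"],
    "State": ["CA", "CA", "NY", "LA", "SC"],
    "DegreeType": ["High School", "Master's", "Master's", "Bachelor's", "None"],
    "Major": [None, "Anthropology", "Journalism", "Agricultural", None],
    "GraduationDate": ["1999-06-01 00:00:00", "2011-01-01 00:00:00",
                       "2007-05-01 00:00:00", "2011-05-01 00:00:00", None],
    "WorkHistoryCount": [3, 10, 3, 1, 2],
    "TotalYearsExperience": [10.0, 8.0, 3.0, 9.0, 2.0],
    "CurrentlyEmployed": ["Yes", "Yes", "Yes", "Yes", "No"],
    "ManagedOthers": ["No", "No", "No", "No", "No"],
    "ManagedHowMany": [0, 0, 0, 0, 0],
})

CATEGORICAL = ["City", "State", "DegreeType", "Major"]
YES_NO = ["CurrentlyEmployed", "ManagedOthers"]

def preprocess_users(df, applicant_ids):
    """Clean, filter and label-encode the user table (hypothetical helper)."""
    # Keep only users who appear in the applicant data
    df = df[df["UserID"].isin(list(applicant_ids))].copy()
    # Cleaning: drop duplicate users and rows without a usable graduation date
    df = df.drop_duplicates(subset="UserID").dropna(subset=["GraduationDate"])
    # Assumption: missing majors become an explicit "Unknown" category
    df["Major"] = df["Major"].fillna("Unknown")
    # Keep only the graduation year instead of the full timestamp
    df["GraduationYear"] = pd.to_datetime(df["GraduationDate"]).dt.year
    df = df.drop(columns="GraduationDate")
    # Label encoding: each distinct category gets an integer code (sorted order)
    for col in CATEGORICAL:
        df[col] = df[col].astype("category").cat.codes
    # Binary Yes/No flags become 1/0
    for col in YES_NO:
        df[col] = df[col].map({"Yes": 1, "No": 0})
    return df.reset_index(drop=True)

# For illustration every sample user is treated as an applicant
encoded = preprocess_users(users, applicant_ids=users["UserID"])
print(encoded)
```

User 9981 is dropped because its graduation date is missing, which mirrors the cleaning step described above.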

Table 4. Encode Result

| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 72 | 316.0 | 4.0 | 3.0 | 24.0 | 10.0 | 8.0 | 1.0 | 0.0 | 0.0 | 2011.0 |
| 999 | 519.0 | 21.0 | 3.0 | 353.0 | 2.0 | 4.0 | 0.0 | 0.0 | 0.0 | 2007.0 |

These two examples show that users 72 and 999 have been encoded without changing the meaning of any of their attributes (column 0 holds the user ID and column 10 the graduation year). The same encoding is applied to every user.

Table 5. Popular Job Dataset After Preprocessing

| User ID | Job ID |
|---|---|
| 767 | 299322 |
| 767 | 602736 |
| 8729 | 363408 |
| 8729 | 1088564 |
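The split of Table 2 into the one-job-per-row layout of Table 5 can be sketched with pandas' str.split and explode. The column names UserID, JobIDs and JobID and the helper to_user_job_pairs are hypothetical, and the job lists are shortened versions of the Table 2 rows.

```python
import pandas as pd

# Rows in the format of Table 2: one space-separated string of job IDs per user
# (lists shortened; column names are assumptions)
popular = pd.DataFrame({
    "UserID": [767, 769],
    "JobIDs": ["299322 602736 1077691", "340536 28811"],
})

def to_user_job_pairs(df):
    """Turn one row per user into one row per (user, job) pair."""
    return (
        df.assign(JobID=df["JobIDs"].str.split())  # "a b c" -> ["a", "b", "c"]
          .explode("JobID")                         # repeat the user for each job
          .drop(columns="JobIDs")
          .astype({"JobID": int})
          .reset_index(drop=True)
    )

pairs = to_user_job_pairs(popular)
print(pairs)
```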

2.4 Graph Format

A knowledge graph is a graph consisting of nodes and edges that represent entities and the relationships between them [20,21,22]. It represents semantic relationships between items, and this representation is then used to enrich the information available to the recommendation model [4]. Based on their nodes, knowledge graphs are of two types: homogeneous and heterogeneous.

A homogeneous knowledge graph has only one node type; for example, every node is an "account", and the same kind of edge connects the account nodes. A heterogeneous graph has several node types, for example "Account" and "work" [23]. Recommendation systems most often use the heterogeneous type. A simple example of nodes and edges in a knowledge graph for a recommendation system is a node "User1" connected to a node "Job" by an edge. The graph format used in the next stage, shown in Figure 2, is obtained by transforming the CSV file. No special library is used for this conversion: the CSV file is simply split into source and target columns that describe the edges of the graph. The networkx library is used to visualize the result.
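The following plain-Python sketch shows one way to turn source/target pairs into a user-job graph. The helper name build_bipartite_graph and the typed node keys are assumptions, not the study's code; typing the nodes keeps a user and a job that share a numeric ID from merging into one node. For drawing, the same edges could be passed to networkx, e.g. nx.Graph(edge_list) followed by nx.draw(...), which requires matplotlib.

```python
from collections import defaultdict

# (user, job) application pairs, e.g. rows of Table 3 and Table 5
edges = [(47, 284009), (47, 169528), (767, 299322), (767, 602736)]

def build_bipartite_graph(pairs):
    """Return an undirected adjacency list for a user-job graph.

    Nodes are typed tuples ("user", id) / ("job", id) (an assumption) so that
    a user and a job with the same numeric ID never collapse into one node.
    """
    adj = defaultdict(set)
    for user_id, job_id in pairs:
        u, j = ("user", user_id), ("job", job_id)  # source and target nodes
        adj[u].add(j)
        adj[j].add(u)
    return dict(adj)

graph = build_bipartite_graph(edges)
print(len(graph), sorted(graph[("user", 47)]))
```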

Figure 2. Graph Format

2.5 Graph Attention

Graph Attention is a mechanism that computes how important each node is to its neighbours in the graph [24]. Graph Attention Networks (GATs) are often applied to inductive learning problems, where the model must generalize to graphs it has never seen before [25]. In GAT, an attention function scores each neighbouring node by taking into account the similarity between the node's feature vector and the feature vector of that neighbour [26].
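As an illustration of this attention computation, the NumPy sketch below implements one graph-attention layer in the style of the original GAT formulation: each neighbour is scored with a LeakyReLU of a learned vector applied to the concatenated projected features, and the scores are normalised with a softmax over the neighbourhood. It is not the study's TensorFlow layer. The name gat_layer, the added self-loops, the 0.2 LeakyReLU slope, averaging the heads, and omitting an output non-linearity are assumptions; the matrix width corresponds to the "units" parameter and the number of weight matrices to the "heads" parameter.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(h, adj, weights, attn_vecs):
    """One graph-attention layer with heads averaged (hypothetical sketch).

    h: (N, F) node features; adj: (N, N) 0/1 adjacency matrix;
    weights: one (F, units) matrix per head;
    attn_vecs: one (2 * units,) attention vector per head.
    """
    mask = (adj + np.eye(len(h))) > 0           # assumption: nodes attend to themselves
    outputs = []
    for W, a in zip(weights, attn_vecs):
        z = h @ W                                # projected features, (N, units)
        units = z.shape[1]
        # e[i, j] = LeakyReLU(a^T [z_i || z_j]), split into source and target parts
        e = leaky_relu((z @ a[:units])[:, None] + (z @ a[units:])[None, :])
        e = np.where(mask, e, -np.inf)           # ignore non-neighbours
        e = e - e.max(axis=1, keepdims=True)     # numerical stability
        alpha = np.exp(e)
        alpha /= alpha.sum(axis=1, keepdims=True)  # softmax over each neighbourhood
        outputs.append(alpha @ z)                # attention-weighted sum of neighbours
    return np.mean(outputs, axis=0)              # assumption: average the heads

rng = np.random.default_rng(0)
n_nodes, n_feats, units, heads = 4, 3, 2, 2
h = rng.normal(size=(n_nodes, n_feats))
adj = np.array([[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]])
Ws = [rng.normal(size=(n_feats, units)) for _ in range(heads)]
As = [rng.normal(size=2 * units) for _ in range(heads)]
out = gat_layer(h, adj, Ws, As)
print(out.shape)  # (4, 2)
```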

The aim is to obtain a richer representation of each node based on its interactions in the graph. The Graph Attention layer has several parameters, such as "units", which sets the dimension of the output, and "heads", which sets the number of attention heads used in this study. With GAT, the model assigns weights to the neighbours of each node, which improves its ability to understand and exploit the graph structure. The model is implemented with TensorFlow, a library commonly used for deep learning.

2.6 Graph Neural Network

A Graph Neural Network is a neural network designed specifically to process graph-structured data, in which vertices are connected to each other through edges [27,28].

In recommendation systems, a GNN can learn representations of the nodes in a knowledge graph, or of the relationships between items and users, in order to provide more personalized and relevant recommendations [4]. GNNs are widely used because they have several advantages over conventional neural networks, including the ability to learn more complex node representations, to work with large graphs, to handle various types of tasks on graph data, and to improve the performance of recommendation systems [28]. In addition, a GNN can retrieve information from the vertices and edges of the graph to build a graph representation, which can then be used to produce outputs such as recommendations to users [28]. This representation can also serve as additional features in the recommendation model to improve its performance. A GNN learns node representations by collecting information from each node's neighbours and combining it with the node's own information [4]. This process is repeated at each GNN layer, up to a predetermined number of layers. Commonly used GNN techniques include GraphSAGE, GCN, and GNNs for directed/undirected graphs and hypergraphs. In the code implementation, TensorFlow with tf.keras is used to build and train the neural network models.

2.7 GraphSAGE

GraphSAGE is a multi-layer GNN model that retrieves information from neighbouring nodes and aggregates it [16]. In GraphSAGE, each node is represented by a feature vector that is updated at each layer [4]. The algorithm uses K aggregator functions, AGGREGATE_k, ∀k ∈ {1, …, K}, to collect information from neighbouring nodes, and a set of weight matrices W^k, ∀k ∈ {1, …, K}, to propagate information between model layers. In summary, the algorithm generates node embeddings by incorporating information from neighbouring nodes and propagating it through successive layers using the weight matrices. GraphSAGE runs for a fixed depth K, which determines how many times the aggregation step is applied. At each layer, it first samples neighbours at random, then aggregates the features of these neighbours to update each vertex's representation. Combining the feature information of each vertex with that of its neighbours makes the resulting representation richer and more accurate. In the final stage, the representation produced by the last layer is used to predict relationships between vertices in the graph. This process is shown in Figure 3.
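The random sampling step draws a fixed-size set of neighbours for each node, so every node is processed with the same amount of neighbour information. A minimal sketch, assuming uniform sampling without replacement when a node has enough neighbours and with replacement otherwise (one common choice; the study does not state its sample sizes). The function name sample_neighbors and the node labels are hypothetical.

```python
import random

def sample_neighbors(adj, node, num_samples, rng=random):
    """Uniformly sample a fixed-size neighbourhood (hypothetical helper).

    Samples without replacement when the node has enough neighbours and
    with replacement otherwise; isolated or unknown nodes get an empty list.
    """
    neighbors = sorted(adj.get(node, ()))   # sorted for reproducibility
    if not neighbors:
        return []
    if len(neighbors) >= num_samples:
        return rng.sample(neighbors, num_samples)
    return [rng.choice(neighbors) for _ in range(num_samples)]

adj = {"u72": {"j1", "j2", "j3", "j4"}, "u80": {"j5"}}
rng = random.Random(42)
print(sample_neighbors(adj, "u72", 2, rng))
print(sample_neighbors(adj, "u80", 3, rng))  # always ['j5', 'j5', 'j5']
```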

Figure 3. Algorithm on GraphSAGE

In Figure 3, blue nodes represent training nodes and red nodes represent test nodes. This illustrates inductive learning: the test nodes are embedded based on the embeddings learned from the training nodes. Figure 3 thus visualizes inductive learning on large graphs and how GraphSAGE generates node embeddings that can predict vertex labels on graphs that have never been seen before. Algorithm 1 gives the details of this process.

Algorithm 1: GraphSAGE embedding generation (i.e., forward propagation) algorithm
Input: graph 𝒢(𝒱, ℰ); input features {𝐱_υ, ∀υ ∈ 𝒱}; depth K; weight matrices 𝐖^k, ∀k ∈ {1, …, K}; non-linearity σ; differentiable aggregator functions AGGREGATE_k, ∀k ∈ {1, …, K}; neighborhood function 𝒩: υ → 2^𝒱
Output: vector representations 𝐳_υ for all υ ∈ 𝒱
1  𝐡_υ⁰ ← 𝐱_υ, ∀υ ∈ 𝒱;
2  for k = 1 … K do
3      for υ ∈ 𝒱 do
4          𝐡_{𝒩(υ)}^k ← AGGREGATE_k({𝐡_u^{k−1}, ∀u ∈ 𝒩(υ)});
5          𝐡_υ^k ← σ(𝐖^k · CONCAT(𝐡_υ^{k−1}, 𝐡_{𝒩(υ)}^k));
6      end
7      𝐡_υ^k ← 𝐡_υ^k / ‖𝐡_υ^k‖₂, ∀υ ∈ 𝒱;
8  end
9  𝐳_υ ← 𝐡_υ^K, ∀υ ∈ 𝒱
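The NumPy sketch below follows Algorithm 1 line by line, using the mean aggregator (one of the aggregators proposed for GraphSAGE) and ReLU as σ. It is an illustrative forward pass only, not the study's TensorFlow model: nothing is trained, the weights are random, graphsage_forward is a hypothetical name, and each W^k is stored as a (2·d_k × d_{k+1}) matrix so that multiplying the concatenated row vector by it plays the role of 𝐖^k · CONCAT(...).

```python
import numpy as np

def graphsage_forward(features, neighbors, weights, activation=lambda x: np.maximum(x, 0)):
    """Algorithm 1 with a mean aggregator (hypothetical sketch).

    features: (N, F) input features x_v
    neighbors: neighbors[v] is a list of (sampled) neighbour indices of node v
    weights: K matrices; weights[k] has shape (2 * d_k, d_{k+1})
    """
    h = features.astype(float)                      # line 1: h_v^0 = x_v
    for W in weights:                               # line 2: for k = 1..K
        h_new = np.empty((h.shape[0], W.shape[1]))
        for v in range(h.shape[0]):                 # line 3: for each node v
            nbrs = neighbors[v]
            # line 4: mean of the neighbours' previous states (zeros if isolated)
            h_nbr = h[nbrs].mean(axis=0) if nbrs else np.zeros(h.shape[1])
            # line 5: concatenate self and neighbourhood, project, apply sigma
            h_new[v] = activation(np.concatenate([h[v], h_nbr]) @ W)
        # line 7: L2-normalise each embedding, guarding against all-zero rows
        norms = np.linalg.norm(h_new, axis=1, keepdims=True)
        h = h_new / np.maximum(norms, 1e-12)
    return h                                        # line 9: z_v = h_v^K

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))                         # 3 nodes, 4 input features
nbrs = [[1, 2], [0], [0]]                           # node 0 is linked to nodes 1 and 2
Ws = [rng.normal(size=(8, 5)), rng.normal(size=(10, 2))]  # depth K = 2
z = graphsage_forward(x, nbrs, Ws)
print(z.shape)  # (3, 2)
```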

Algorithm 1 begins with initialization, in which the initial representation of each node (h) is set to its input features (x); this is done when the model is constructed. It then iterates over depths k = 1, …, K, and each iteration consists of three stages: sampling, aggregation, and update. At each depth, the node representation is updated using the information from neighbouring nodes computed in the previous iteration. In this study, attention layers are used for the aggregation step: they calculate attention scores, and their outputs are combined by averaging. The final step is the update, in which the aggregated result is added back to the node representation (h) through the residual update x = attention_layer([x, edges]) + x.
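Written out in the notation of Algorithm 1, one plausible reading of this attention-based aggregation with the residual update is given below. It assumes standard GAT scoring with a learned vector 𝐚^(m) per head, H heads averaged, self-loops included, and an output dimension ("units") equal to the input dimension so that the addition is defined; here x corresponds to 𝐡^{k−1} and attention_layer to the averaged double sum.

$$
\mathbf{h}_\upsilon^{k} = \mathbf{h}_\upsilon^{k-1} + \frac{1}{H}\sum_{m=1}^{H}\;\sum_{u \in \mathcal{N}(\upsilon)\cup\{\upsilon\}} \alpha_{\upsilon u}^{(m)}\,\mathbf{W}^{(m)}\mathbf{h}_u^{k-1}
$$

$$
\alpha_{\upsilon u}^{(m)} = \frac{\exp\!\big(\mathrm{LeakyReLU}\big(\mathbf{a}^{(m)\top}[\mathbf{W}^{(m)}\mathbf{h}_\upsilon^{k-1} \,\|\, \mathbf{W}^{(m)}\mathbf{h}_u^{k-1}]\big)\big)}{\sum_{w \in \mathcal{N}(\upsilon)\cup\{\upsilon\}} \exp\!\big(\mathrm{LeakyReLU}\big(\mathbf{a}^{(m)\top}[\mathbf{W}^{(m)}\mathbf{h}_\upsilon^{k-1} \,\|\, \mathbf{W}^{(m)}\mathbf{h}_w^{k-1}]\big)\big)}
$$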

2.8 Stopping Criteria

Stopping criteria determine when to stop training the model, based on a limit on a monitored value [29].

This study uses two stopping criteria: validation loss stop and accuracy stop. The model is also evaluated with the F1-score, which is computed from precision and recall. These evaluations determine whether the model is good enough to make predictions or needs to be trained again.

a. Validation Loss Stop

Training stops when the loss on the validation data no longer decreases or starts to increase. This is implemented with callbacks such as EarlyStopping that monitor a metric on the validation data.

b. Accuracy Stop

Training stops when the accuracy on the validation data no longer increases. Accuracy, given in equation (1), measures how accurately the model predicts and classifies the recommendation results.

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{1}$$

c. F1-Score

The F1-score measures classification performance and is especially useful when the classes are imbalanced. It is computed from precision and recall. Precision, given in equation (2), indicates how good the model is at identifying positive cases, that is, how rarely it labels a negative case as positive. Recall, given in equation (3), indicates how well the model captures all true positive cases, that is, how few positive cases it misses. The F1-score, given in equation (4), is the harmonic mean of precision and recall and reflects the balance between them.

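$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$

$$\text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$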
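Equations (1) to (4) translate directly into code. The sketch below computes them from binary labels (1 = the job matches the user). The function name is hypothetical, the labels are toy values rather than the study's data, and returning 0 for an undefined ratio is an assumed convention. Equivalent library functions exist in scikit-learn (sklearn.metrics.accuracy_score, precision_score, recall_score and f1_score).

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 (equations 1-4) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("need at least one prediction")
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)                          # equation (1)
    precision = tp / (tp + fp) if tp + fp else 0.0             # equation (2)
    recall = tp / (tp + fn) if tp + fn else 0.0                # equation (3)
    f1 = (2 * precision * recall / (precision + recall)        # equation (4)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels, not the study's data: 1 = job matches the user
print(classification_metrics([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
# accuracy 0.6; precision, recall and F1 all equal 2/3
```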

3. RESULT AND DISCUSSION

This research consists of a training process and a testing process. The main model is GraphSAGE, which works on graph data, and the comparison model is an FNN (Feedforward Neural Network), which works on tabular data. Both models are tested on the Job Recommendation Challenge dataset, using 50,000 users. The models predict whether a job is suitable for a particular user, where suitability depends on how well the user's attributes match the job applied for. To test whether the models give good results in general, we also tested them on several datasets that differ in the amount of data, the number of attributes, and the number of predicted values. This shows whether a model with good accuracy or evaluation scores also works well on different datasets.

3.1 Result And Analysis of Training the GraphSAGE Model

Converting the dataset from tabular to graph format yields the graph shown in Figure 4.

Figure 4. Graph User to Job


Figure 4 shows users connected to the jobs they applied for. The number of edges depends on how many jobs each user applies for and how many users apply for each job; the yellow color indicates that several users applied for jobs matching their popular jobs. The training results of the GraphSAGE model are shown in Table 6.

Table 6. Loss and Accuracy on Training Data, GraphSAGE Model

| Epoch | Loss | Training Accuracy |
|---|---|---|
| 1 | 7.588265e+26 | 85% |
| 2 | 0.1893 | 94% |
| 3 | 0.2074 | 95% |
| 4 | 0.1851 | 96% |
| 5 | 0.1565 | 96% |
| 6 | 0.1435 | 96% |
| 7 | 0.1453 | 96% |

Table 6 shows that training stopped at the 7th epoch. The loss no longer decreased significantly over the last few epochs, so the early stopping criterion ended training, indicating that the model had reached stable performance. The generally decreasing loss shows that the model kept improving and reducing the difference from the actual values. With a final loss of 0.1453 and 96% accuracy on the training data, the GraphSAGE model predicts the data it has learned, in this case the training data, correctly and accurately. To check whether the model overfits, we also evaluated it on the validation data, obtaining a loss of 0.1306 and 97% accuracy. The results on the two data sets show that the model's performance remains stable even on unseen data. The model, configured to train for up to 30 epochs, reached a good level of accuracy in classifying both the training and validation samples. The training loss dropped sharply from the first to the second epoch and decreased further over the next few epochs, but accuracy improved only slightly after the first few epochs. This suggests that the model reaches a near-optimal point early in training. The model does not overfit, because its accuracy on the training and validation data is about the same.
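The patience-based logic behind validation-loss early stopping can be sketched as follows. In a tf.keras setup this role is typically played by tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=...); the helper below is a hypothetical, framework-free illustration, and the patience value and loss values in the example are made up rather than taken from Table 6.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the 1-based epoch at which training would stop (hypothetical helper).

    Training stops once the monitored loss has failed to improve on the best
    value by more than min_delta for `patience` consecutive epochs; if that
    never happens, the last epoch is returned.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best, wait = loss, 0      # improvement: reset the counter
        else:
            wait += 1                 # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(val_losses)

# Made-up validation losses, not the values behind Table 6
print(early_stopping_epoch([0.30, 0.20, 0.18, 0.19, 0.185, 0.20], patience=2))  # 5
```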

3.2 Result And Analysis of Training the FNN (Feedforward Neural Network) Model

The Feedforward Neural Network, a model usually applied to tabular data, is also tested on the graph dataset to see whether it can make predictions on graph data. Table 7 shows the training loss and accuracy of the FNN. Training stopped at the 5th epoch because the loss no longer improved significantly over the last few epochs. As with the GraphSAGE model, accuracy stop is also used as a stopping criterion for the FNN.

Table 7. Loss and Accuracy on Training Data, FNN Model

| Epoch | Loss | Training Accuracy |
|---|---|---|
| 1 | 0.6898 | 64% |
| 2 | 0.6894 | 64% |
| 3 | 0.6892 | 65% |
| 4 | 0.6917 | 65% |
| 5 | 0.6912 | 65% |

Table 7 shows that the FNN reaches a final training accuracy of 65% with a loss of 0.6912, which is moderate. Its loss and accuracy are stable from epoch to epoch, without significant fluctuations on the training dataset. We also evaluated it on the validation dataset, obtaining a final loss of 0.6914 and an accuracy of 70%. This shows that the FNN can work on data it has not seen before; its validation accuracy is even higher than its training accuracy, by about 5 percentage points (70% versus 65%).

3.3 Model Testing

Both the FNN and GraphSAGE models have been trained and validated on the Job Recommendation dataset, but these results only give an overview of each model's performance. To measure how well the models work on the testing data, we use the precision, recall, and F1-score evaluation metrics. The results are shown in Table 8.


Table 8. Model Performance

| Model | Precision | Recall | F1-Score |
|---|---|---|---|
| GraphSAGE | 0.95 | 0.97 | 0.96 |
| Feedforward Neural Network | 0.93 | 0.70 | 0.80 |

GraphSAGE achieves a precision of 0.95 (95%), slightly outperforming the FNN by a margin of 0.02 in predicting correct or relevant values. GraphSAGE also outperforms the FNN in recall and F1-score. The largest difference is in recall, where the gap is 0.27. This gap indicates that many items that should have been recommended were missed by the FNN. On this dataset, GraphSAGE is therefore more accurate than the FNN.
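As a consistency check, substituting the precision and recall values of Table 8 into equation (4) reproduces the reported F1-scores up to rounding:

$$
\begin{aligned}
\text{F1}_{\text{GraphSAGE}} &= \frac{2 \times 0.95 \times 0.97}{0.95 + 0.97} = \frac{1.843}{1.92} \approx 0.96\\
\text{F1}_{\text{FNN}} &= \frac{2 \times 0.93 \times 0.70}{0.93 + 0.70} = \frac{1.302}{1.63} \approx 0.80
\end{aligned}
$$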

3.4 Model Comparison

The previous results show that GraphSAGE outperforms the FNN on the Job Recommendation dataset. We also tested both models on the Movie, Food, and Epinions datasets, which differ in their features (columns) and in their predicted values. Whereas the Job Recommendation task predicts whether a user is suitable for the job applied for, these datasets are used to predict whether a user matches an item, as represented by a rating. The Movie dataset has features such as Title, Year, Genre, Duration, Actor, and Critics Vote. The Food dataset has features such as UserID, ProductID, and Rating. The Epinions dataset is a social network dataset, but because it contains UserID, ProductID, and Rating, we also use it to test the capabilities of both models. These different datasets show on which kinds of data each model makes better predictions. The test results of both models on these datasets are shown in Table 9.

Table 9. Comparison of Model Accuracy

| Model | Job Recommendation | Epinions | Food | Movie |
|---|---|---|---|---|
| GraphSAGE | 97% | 60% | 62% | 71% |
| Feedforward Neural Network | 71% | 63% | 59% | 70% |

As Table 9 shows, GraphSAGE has higher accuracy than the FNN on three datasets: Job Recommendation, Food, and Movie. Its highest accuracy is on the Job Recommendation dataset (97%), followed by the Movie dataset (71%), with fairly low values on the Food (62%) and Epinions (60%) datasets. On the Epinions dataset, by contrast, the FNN is better, with an accuracy 3 percentage points higher than GraphSAGE. On the Food dataset the two models differ by about 3 percentage points, and on the Movie dataset their accuracies are nearly the same, differing by only 1 percentage point. This suggests that GraphSAGE works best when the dataset has more, or more informative, features, as in the Movie and Job Recommendation datasets, where GraphSAGE is more accurate than the FNN. Compared with GraphSAGE, the FNN works relatively well on datasets with few features or little information. Although the FNN reaches its own highest accuracies on the Job Recommendation (71%) and Movie (70%) datasets, these values are still lower than those of GraphSAGE, especially on the Job Recommendation dataset, where the gap is 26 percentage points.

3.5 Experiment Result

These results show that GraphSAGE performs well in prediction, not only on the main dataset of this study but also on the other datasets, where it generally predicts or recommends items better. Having identified the better-performing model for job recommendation, we also used it to recommend job IDs to specific users. The number of recommendations for a user depends on how many jobs that user has interacted with (applied for). The recommendation results for several sample users are shown in Table 10.


Table 10. GraphSAGE Recommendation Result

UserID | Recommended JobIDs
72 | 3236, 3936, 669, 1810, 2143
47 | 1125, 30, 3286
80 | 3550
123 | 4165, 4164
767 | 377

As Table 10 shows, the five sample users receive different numbers of recommendations.

The GraphSAGE model recommends five jobs to user 72, three jobs to user 47, and two jobs to user 123, while users 80 and 767 each receive only a single job. As noted earlier, the number of recommendations depends on the user's interactions with the jobs they applied to. For this reason, this research also emphasizes the quality of the dataset, since the recommendation results depend on how interactions are recorded in it.
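The variable list lengths in Table 10 can be illustrated with a small post-processing step. The sketch below assumes, hypothetically, that the trained model produces a match score for each candidate (user, job) pair and that a job is recommended when its score reaches a threshold (0.5 here, an assumed value); users then receive as many JobIDs as pass the threshold. The function name `recommend_per_user` and the threshold are illustrative assumptions, not the authors' procedure.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def recommend_per_user(
    scored_pairs: Iterable[Tuple[int, int, float]],
    threshold: float = 0.5,
) -> Dict[int, List[int]]:
    """Turn (user_id, job_id, score) triples into per-user JobID lists.

    Hypothetical post-processing: a job is recommended when its predicted
    match score is at least `threshold`. Users with no job at or above the
    threshold do not appear in the result.
    """
    kept: Dict[int, List[Tuple[float, int]]] = defaultdict(list)
    for user_id, job_id, score in scored_pairs:
        if score >= threshold:
            kept[user_id].append((score, job_id))

    # Sort each user's jobs by descending score; ties are broken by JobID.
    return {
        user_id: [job_id for _, job_id in sorted(jobs, key=lambda p: (-p[0], p[1]))]
        for user_id, jobs in kept.items()
    }


if __name__ == "__main__":
    # Toy scores for hypothetical users and jobs (not data from this study).
    pairs = [
        (1, 101, 0.91), (1, 102, 0.55), (1, 103, 0.20),
        (2, 201, 0.88), (2, 202, 0.10),
    ]
    print(recommend_per_user(pairs))  # {1: [101, 102], 2: [201]}
```

Because the list length is driven by how many candidate pairs pass the threshold, a user with few recorded interactions naturally ends up with a short list, which is consistent with the dependence on dataset quality noted above.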

4. CONCLUSION

The purpose of this research is to provide job recommendations using GraphSAGE on the Job Recommendation dataset from Kaggle. We also compared two models, the Feedforward Neural Network and GraphSAGE. Based on the results and discussion, GraphSAGE outperforms the Feedforward Neural Network in predicting job matches for users. During training, the GraphSAGE model shows a steady increase in accuracy and a stable or decreasing loss. It also generalizes to data not seen during training, as shown by the accuracy, precision, recall, and F1-Score values, which remain stable on the validation and test data. In addition, we tested both models on other datasets with different numbers of features, namely the Movie, Epinions, and Food datasets, to check whether GraphSAGE is accurate beyond the dataset used in this study. These tests show that the Feedforward Neural Network is not always inferior to GraphSAGE, as seen on the Epinions dataset, which suggests that the performance of both models depends on the number of features and the information available in the dataset. Nevertheless, for job match prediction, the GraphSAGE model is more accurate than the Feedforward Neural Network.

REFERENCES

[1] P. Jobseeker et al., “Persaingan jobseeker bagi freshgraduate di era milenial,” Sahmiyya, vol. 1, no. 1, pp. 150–156, 2022, [Online]. Available: https://e-journal.iainpekalongan.ac.id/index.php/sahmiyya/article/view/5409

[2] W. Shalaby et al., “Help me find a job: A graph-based approach for job recommendation at scale,” Proc. - 2017 IEEE Int. Conf. Big Data, Big Data 2017, vol. 2018-Janua, pp. 1544–1553, 2017, doi: 10.1109/BigData.2017.8258088.

[3] S. Rahmawati, D. Nurjanah, and R. Rismala, “Analisis dan Implementasi pendekatan Hybrid untuk Sistem Rekomendasi Pekerjaan dengan Metode Knowledge Based dan Collaborative Filtering,” Indones. J. Comput., vol. 3, no. 2, p. 11, 2018, doi: 10.21108/indojc.2018.3.2.210.

[4] S. Wu, F. Sun, W. Zhang, X. Xie, and B. Cui, “Graph Neural Networks in Recommender Systems: A Survey,” ACM Comput. Surv., vol. 55, no. 5, 2022, doi: 10.1145/3535101.

[5] Q. Guo et al., “A Survey on Knowledge Graph-Based Recommender Systems,” IEEE Trans. Knowl. Data Eng., vol. 34, no. 8, pp. 3549–3568, 2022, doi: 10.1109/TKDE.2020.3028705.

[6] M. Naumov et al., “Deep Learning Recommendation Model for Personalization and Recommendation Systems,” 2019, [Online]. Available: http://arxiv.org/abs/1906.00091

[7] D. Aguado, J. C. Andrés, A. L. García-Izquierdo, and J. Rodríguez, “LinkedIn ‘Big Four’: Job performance validation in the ICT sector,” Rev. Psicol. del Trab. y las Organ., vol. 35, no. 2, pp. 53–64, 2019, doi: 10.5093/jwop2019a7.

[8] L. D. Kumalasari and A. Susanto, “Recommendation System of Information Technology Jobs using Collaborative Filtering Method Based on LinkedIn Skills Endorsement,” Sisforma, vol. 6, no. 2, pp. 63–72, 2020, doi: 10.24167/sisforma.v6i2.2240.

[9] M. H. Mohamed, M. H. Khafagy, and M. H. Ibrahim, “Recommender Systems Challenges and Solutions Survey,” Proc. 2019 Int. Conf. Innov. Trends Comput. Eng. ITCE 2019, no. February, pp. 149–155, 2019, doi: 10.1109/ITCE.2019.8646645.

[10] L. Chen, L. Wu, R. Hong, K. Zhang, and M. Wang, “Revisiting graph based collaborative filtering: A linear residual graph convolutional network approach,” AAAI 2020 - 34th AAAI Conf. Artif. Intell., pp. 27–34, 2020, doi: 10.1609/aaai.v34i01.5330.

[11] H. Zhang, T. Zhou, T. Xu, Y. Wang, and H. Hu, “FNN-Based Prediction of Wireless Channel with Atmospheric Duct,” IEEE Int. Conf. Commun., no. April, 2021, doi: 10.1109/ICC42927.2021.9501068.

[12] B. Markapudi, K. Chaduvula, D. N. V. S. L. S. Indira, and M. V. N. S. S. R. K. Sai Somayajulu, “Content-based video recommendation system (CBVRS): a novel approach to predict videos using multilayer feed forward neural network and Monte Carlo sampling method,” Multimed. Tools Appl., vol. 82, no. 5, pp. 6965–6991, 2023, doi: 10.1007/s11042-022-13583-8.

[13] K. Xu, M. Zhang, J. Li, S. S. Du, K. I. Kawarabayashi, and S. Jegelka, “How Neural Networks Extrapolate: From Feedforward To Graph Neural Networks,” ICLR 2021 - 9th Int. Conf. Learn. Represent., 2021.

[14] T. Bai, Y. Zhang, B. Wu, and J. Y. Nie, “Temporal Graph Neural Networks for Social Recommendation,” Proc. - 2020 IEEE Int. Conf. Big Data, Big Data 2020, pp. 898–903, 2020, doi: 10.1109/BigData50022.2020.9378444.

[15] M. Shi et al., “Genetic-GNN: Evolutionary architecture search for Graph Neural Networks,” Knowledge-Based Syst., vol. 247, 2022, doi: 10.1016/j.knosys.2022.108752.

[16] D. El Alaoui, J. Riffi, A. Sabri, B. Aghoutane, A. Yahyaouy, and H. Tairi, “Deep GraphSAGE-based recommendation system: jumping knowledge connections with ordinal aggregation network,” Neural Comput. Appl., vol. 34, no. 14, pp. 11679–11690, 2022, doi: 10.1007/s00521-022-07059-x.

[17] L. Hickman, S. Thapa, L. Tay, M. Cao, and P. Srinivasan, “Text Preprocessing for Text Mining in Organizational Research: Review and Recommendations,” Organ. Res. Methods, vol. 25, no. 1, pp. 114–146, 2022, doi: 10.1177/1094428120971683.

[18] A. Humeau-Heurtier, “Texture feature extraction methods: A survey,” IEEE Access, vol. 7, pp. 8975–9000, 2019, doi: 10.1109/ACCESS.2018.2890743.

[19] K. Jani, M. Chaudhuri, H. Patel, and M. Shah, “Machine learning in films: an approach towards automation in film censoring,” J. Data, Inf. Manag., vol. 2, no. 1, pp. 55–64, 2020, doi: 10.1007/s42488-019-00016-9.

[20] H. Wang, M. Zhao, X. Xie, W. Li, and M. Guo, “Knowledge graph convolutional networks for recommender systems,” Web Conf. 2019 - Proc. World Wide Web Conf. WWW 2019, pp. 3307–3313, 2019, doi: 10.1145/3308558.3313417.

[21] K. Wiharja, J. Z. Pan, M. Kollingbaum, and Y. Deng, “More Is Better: Sequential Combinations of Knowledge Graph Embedding Approaches,” in Semantic Technology, 2018, pp. 19–35.

[22] K. Wiharja, J. Z. Pan, M. Kollingbaum, and Y. Deng, “Pattern-Based Reasoning to Investigate the Correctness of Knowledge Graphs,” in 25th Automated Reasoning Workshop, 2018, p. 10.

[23] A. Hogan et al., “Knowledge graphs,” ACM Comput. Surv., vol. 54, no. 4, 2021, doi: 10.1145/3447772.

[24] A. Salehi and H. Davulcu, “Graph Attention Auto-Encoders,” CoRR, vol. abs/1905.10715, 2019, [Online]. Available: http://arxiv.org/abs/1905.10715

[25] P. Veličković, A. Casanova, P. Liò, G. Cucurull, A. Romero, and Y. Bengio, “Graph attention networks,” 6th Int. Conf. Learn. Represent. ICLR 2018 - Conf. Track Proc., pp. 1–12, 2018, doi: 10.1007/978-3-031-01587-8_7.

[26] V. P. Dwivedi and X. Bresson, “A Generalization of Transformer Networks to Graphs,” 2020, [Online]. Available: http://arxiv.org/abs/2012.09699

[27] N. R. Ananda, K. R. S. Wiharja, and M. A. Bijaksana, “Sentiment Analysis on Banking Chatbot using Graph-based Machine Learning Model,” in 2023 International Conference on Data Science and Its Applications (ICoDSA), 2023, pp. 310–315, doi: 10.1109/ICoDSA58501.2023.10276448.

[28] S. Racherla, “Available Digital service, Graph Convolutional Networks, GraphSage, Recommendation system, PinSage Research Article Racherla,” Res. Rev. Sci. Technol., vol. 3, no. 1, pp. 79–93, 2020.

[29] K. Wang, R. Mathews, C. Kiddon, H. Eichner, F. Beaufays, and D. Ramage, “Federated Evaluation of On-device Personalization,” 2019, [Online]. Available: http://arxiv.org/abs/1910.10252