Travel Package Recommendation

Location-Based Social Networks (LBSNs) benefit users by allowing them to share their locations and life moments with their friends. In this paper, we propose a graph-based approach to recommend a set of personalized travel packages. Given the user's current location and spatiotemporal constraints, our goal is to recommend a package that meets those constraints.

People often use LBSNs to check in to places they visit as part of their planned trips and regular activities. This check-in information can be used for a number of purposes, one of which is to recommend customized travel packages. This problem is more challenging than traditional recommendation because traditional recommender systems recommend only individual items.

Each package consists of several POIs (points of interest), and selecting those few candidate POIs from a large number of POIs is a major challenge. We evaluate our proposed approaches on a subset of the check-in data available from Jie Pang, a well-known LBSN in China.

Figure 1.1: Travel Package Recommendation

Related Work

First Level Filtering

Starting from the user's location, we consider only those POIs in the respective city that are within the distance budget and reachable within the time budget specified by the user; POIs that meet both constraints are retained. We performed several experiments in which we used the user's starting location as the seed location and estimated the default radius from the training data.

It was observed that in 99% of cases the visited POIs on that trip were present within this radius. This motivated us to use distance-based filtering as the first step to reduce the number of candidate POIs to be considered by the algorithm. If a user specifies the distance budget, this filtering can be done in a more informed way.

Therefore, first-level filtering helps to consider only those POIs that can be visited during the duration of the trip in a realistic sense. Since the proposed algorithm is a graph-based approach, reducing the number of candidate POIs results in a graph with reduced size.
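The distance-based step described above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: it assumes POIs are given as `(id, latitude, longitude)` tuples, that great-circle (haversine) distance is an acceptable proxy for travel distance, and that `radius_km` is either the user's distance budget or the default radius estimated from training data. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def first_level_filter(start, pois, radius_km):
    """Keep only the POIs within radius_km of the start location.

    start: (lat, lon) of the user's starting location (assumed format).
    pois:  iterable of (poi_id, lat, lon) tuples (assumed format).
    """
    lat0, lon0 = start
    return [p for p in pois
            if haversine_km(lat0, lon0, p[1], p[2]) <= radius_km]


if __name__ == "__main__":
    start = (30.0, 104.0)
    pois = [("a", 30.0, 104.0), ("b", 30.01, 104.0), ("c", 31.0, 104.0)]
    print([p[0] for p in first_level_filter(start, pois, radius_km=5.0)])
```

A time budget could be handled the same way by converting it to a distance with an assumed average travel speed, which is a further simplification.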

Second Level Filtering

Score Threshold: Each POI is assigned a score based on a linear combination of the above four features.

Feature Threshold: We calculate popularity, user preference, monthly popularity, and initial score for each POI and retain as candidates only those POIs whose value for each feature is at least the threshold set empirically for that feature.
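The two second-level filters can be illustrated with a short sketch. It assumes each POI is a dictionary holding the four feature values already normalized to a comparable range; the weights, the thresholds, and the function names are hypothetical placeholders, since the text states only that they are set empirically.

```python
FEATURES = ("popularity", "user_preference", "monthly_popularity", "initial_score")


def linear_score(poi, weights):
    """Linear combination of the four features (weights are assumed values)."""
    return sum(weights[f] * poi[f] for f in FEATURES)


def score_threshold_filter(pois, weights, min_score):
    """Score Threshold: keep POIs whose combined score reaches min_score."""
    return [p for p in pois if linear_score(p, weights) >= min_score]


def feature_threshold_filter(pois, thresholds):
    """Feature Threshold: keep POIs that meet the threshold on every feature."""
    return [p for p in pois if all(p[f] >= thresholds[f] for f in FEATURES)]


if __name__ == "__main__":
    pois = [
        {"id": "A", "popularity": 0.8, "user_preference": 0.8,
         "monthly_popularity": 0.8, "initial_score": 0.8},
        {"id": "B", "popularity": 1.0, "user_preference": 1.0,
         "monthly_popularity": 1.0, "initial_score": 0.0},
    ]
    weights = {f: 0.25 for f in FEATURES}
    thresholds = {f: 0.5 for f in FEATURES}
    print([p["id"] for p in score_threshold_filter(pois, weights, 0.5)])
    print([p["id"] for p in feature_threshold_filter(pois, thresholds)])
```

The example shows how the two filters can disagree: a POI with one very low feature may still pass the score threshold while failing the per-feature thresholds.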

Graph construction: Selection of Edges

The transition probability from node X to node Y is the probability that a user at node X will visit node Y next. It is influenced by other characteristics, such as the time the user spends at these nodes and the travel time between them. The transition probability is calculated only if X to Y is a global sequential pattern; otherwise it is set to 0.

The whole process results in a directed graph with N nodes and a variable number of edges for each user. Each edge is assigned a custom edge weight that represents the transition probability from one node to another along the direction of the edge.
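As a rough illustration of the edge-selection step, the sketch below estimates transition probabilities from observed check-in sequences. It makes two simplifying assumptions that are not taken from the text: the probability is a plain relative frequency (the time spent at nodes and the travel time are ignored), and "X to Y is a global sequential pattern" is approximated by requiring the pair to occur at least `min_support` times across all trips. The function name and the `min_support` parameter are hypothetical.

```python
from collections import Counter, defaultdict


def build_transition_graph(trips, min_support=2):
    """Build a directed graph {X: {Y: P(Y follows X)}} from check-in sequences.

    trips: list of POI-id sequences, one per observed trip (assumed format).
    Pairs seen fewer than min_support times get no edge, which is
    equivalent to a transition probability of 0.
    """
    pair_counts = Counter()
    for trip in trips:
        for x, y in zip(trip, trip[1:]):  # consecutive visits X -> Y
            pair_counts[(x, y)] += 1

    out_totals = Counter()
    for (x, _), count in pair_counts.items():
        out_totals[x] += count

    graph = defaultdict(dict)
    for (x, y), count in pair_counts.items():
        if count >= min_support:
            graph[x][y] = count / out_totals[x]
    return dict(graph)


if __name__ == "__main__":
    trips = [["A", "B", "C"], ["A", "B"], ["A", "C"]]
    print(build_transition_graph(trips))
```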

Recommendation Generation

Transition Count

L is the length of the path, which is also the height of the tree, counting the root as height 1. X is a node at level i and Y is a node at level i+1 in the path. The disadvantage of this approach is that only global data is taken into account when the Top-k paths are recommended, so the recommendation is not specific to the user. This may result in the most popular paths being recommended rather than paths personalized for the specific user.

We try to overcome this problem by considering user preferences while recommending top-k paths for a particular user using the approaches below.

Shortest Paths

Since the edge weight from one node to another is the inverse of its transition probability, a larger edge weight means that the corresponding POIs are less likely to be visited in that order. Conversely, paths formed by edges with lower weights are more likely to correspond to the order in which the POIs are actually visited. The target user can specify the number of POIs he wants to visit, which becomes the length of the path.

If he does not specify it, we use a default length, which is set by considering statistical patterns in the observed data. Only those paths whose length is one less than the length specified by the user or equal to the default length are retained. We then add the start node to all the paths, since the start location is given as input.

If we do not find any paths of the required length, we look for shortest paths of greater length. To filter out the best paths among all the shortest paths of the required length returned by the Floyd-Warshall algorithm, we calculate the score for each path as below. The disadvantage of this approach is that it is expensive in terms of the time taken to recommend the paths, since we use the Floyd-Warshall algorithm to find all-pairs shortest paths, which takes O(n^3) time.

It also generates all possible shortest paths, although we need only the paths of the desired length. Sometimes there may not be any shortest path of the required length, in which case we have to search for paths of greater length, which may not be acceptable to the user. The time this approach takes grows with the size of the graph generated in the Node Filtering stage.

Although we have used two levels of filtering, the graph can still be large for a few places. To recommend only the Top-1 path, we must follow the same procedure, which is again of order O(n^3). We try to overcome these two problems of the approach, namely the time taken and the way the Top-1 path is recommended, using the Personalized Transition Probability approach.
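To make the cost discussed above concrete, the following sketch runs Floyd-Warshall on edge weights defined as the inverse of the transition probability and then keeps only the shortest paths with a requested number of nodes. It is an illustrative sketch under stated assumptions: the graph is a nested dictionary of probabilities, the helper names are hypothetical, paths are ranked by total weight only (the path-scoring step is not reproduced), and the start node is not prepended.

```python
import math


def floyd_warshall(nodes, prob):
    """All-pairs shortest paths with weight(u, v) = 1 / prob[u][v]; O(n^3)."""
    dist = {u: {v: math.inf for v in nodes} for u in nodes}
    nxt = {u: {v: None for v in nodes} for u in nodes}
    for u in nodes:
        dist[u][u] = 0.0
    for u, nbrs in prob.items():
        for v, p in nbrs.items():
            if p > 0 and u != v:
                dist[u][v] = 1.0 / p  # low probability -> heavy edge
                nxt[u][v] = v
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt


def reconstruct(nxt, u, v):
    """Return the node sequence of the shortest path u -> v, or [] if none."""
    if nxt[u][v] is None:
        return []
    path = [u]
    while u != v:
        u = nxt[u][v]
        path.append(u)
    return path


def shortest_paths_with_n_nodes(nodes, prob, n_nodes):
    """Keep only shortest paths with exactly n_nodes nodes, lightest first."""
    dist, nxt = floyd_warshall(nodes, prob)
    result = []
    for u in nodes:
        for v in nodes:
            if u != v:
                path = reconstruct(nxt, u, v)
                if len(path) == n_nodes:
                    result.append((dist[u][v], path))
    return sorted(result)


if __name__ == "__main__":
    nodes = ["A", "B", "C"]
    prob = {"A": {"B": 0.5, "C": 0.1}, "B": {"C": 0.5}}
    print(shortest_paths_with_n_nodes(nodes, prob, 3))
```

The sketch also shows the second drawback noted above: every pair is processed even though only paths of one length are kept, and the result can be empty when no shortest path has the requested length.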

Figure 2.7: Top-K Recommendation - Stage wise

Personalized Transition Probability

This is based on the statistics of the day and is also very reasonable, since every tourist wants to visit at least three POIs for breakfast, lunch and dinner, and may be exhausted by visiting more than ten POIs at the same time. Thresholds are selected as the points corresponding to the knee of the ROC curves. For each parameter value that is set, we calculate the ratio of true positives to false positives.

The knee of the ROC curves gives the best possible parameter values, which helps to select a larger number of true positives. The size of the graph constructed using this method is 196, 150, and 166 for the cities of Chengdu, Nanjing, and Hong Kong, respectively. Similarly, to set the threshold for each of the parameters, we plotted ROC curves using training data.

The cut-off values are chosen as the points corresponding to the knee of the ROC curves. The size of the graph produced by this method is 241, 161 and 250 for Chengdu, Nanjing and Hong Kong, respectively.

It shows that the routes recommended by the Score, Threshold and Personalized Transition Probability methods contain 25% of the actual nodes visited by the user, whereas the Transition Count method, as a greedy approach, yields only 19% of the nodes actually visited. Personalized Transition Probability yields 34% of POIs whose order is the same as the order in which they were visited on the actual route, which is almost the same as the Score and Threshold methods. 27% of POIs whose order is the same as the order in which they were visited on the actual route.

Since this is a graph-based approach that finds the shortest paths between all pairs, reducing the size of the graph would drastically improve the performance of the algorithm. Even the Top-1 recommendation is good enough that it gives almost 38 to 40% LCS accuracy, which means that about 40% of the POIs actually visited in the trip are present in the Top-1 recommendation in the correct sequence. The Transition Count method, which takes into account only the sequence of POI visits, gives relatively poor results.

The TP method also gives almost the same results as the Score method in much less time for Top-1 and in relatively less time for Top-k recommendations. The TP method does not depend on the graph size; its execution time is proportional to the length of the required path. Experimental evaluations using longest common subsequence (LCS) and Jaccard similarity as evaluation metrics show the efficiency of the proposed approach. We note that by using efficient filtering techniques, we can limit the number of candidate POIs to be considered for the recommendation task, thus reducing the size of the underlying graph and helping to generate recommendations in a very short time.
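The two evaluation metrics named above can be computed as in the sketch below. The normalization is an assumption: LCS accuracy is taken here as the length of the longest common subsequence divided by the length of the actual route, which matches the reading "share of actually visited POIs recommended in the correct sequence", and Jaccard similarity is computed on the sets of POIs, ignoring order. The function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def lcs_accuracy(recommended, actual):
    """Order-aware overlap, normalized by the actual route length (assumed)."""
    return lcs_length(recommended, actual) / len(actual) if actual else 0.0


def jaccard_similarity(recommended, actual):
    """Order-free overlap between the two sets of POIs."""
    r, a = set(recommended), set(actual)
    union = r | a
    return len(r & a) / len(union) if union else 0.0


if __name__ == "__main__":
    recommended = ["A", "B", "C", "D"]
    actual = ["A", "C", "E", "D"]
    print(lcs_accuracy(recommended, actual))
    print(jaccard_similarity(recommended, actual))
```

Jaccard similarity rewards recommending the right POIs regardless of order, while LCS accuracy additionally requires them to appear in the visited order, so the two can differ on the same pair of routes.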

It is also known that the visiting patterns of tourists for any city differ from the residents or those who visit those cities.