4.7 Results and Discussion
that these words also influence the distribution of ratings and should therefore accompany the context words used in making rating predictions.
4.7.2 Influences of Candidate Context Words and Their Neighboring Words
As discussed in Section 4.4, the neighboring words of a candidate context word might alter its influence on the rating distributions. To analyze such influences, I follow [77] by applying the L2-norm to each column of the local context unit. This emphasizes the influence levels of the candidate context words and their neighboring words, as shown in Figure 4.8. For example, the words that follow “staff” and “very” have more influence on the rating distributions than the words that precede them. This is consistent with the following words often being “good,” “helpful,” or “friendly” for “staff,” and “clean,” “convenient,” or “comfortable” for “very.” On the other hand, words such as “breakfast” are less influenced by neighboring words, meaning that the word itself sufficiently describes the rating distributions without any help from its neighbors. Moreover, the local context units can differentiate between the influences of positive words such as “good” and “excellent.” Although the rating distributions of “good” are influenced by its neighboring words, those of “excellent” are not. This is because “excellent” by itself conveys the strongest positive meaning, whereas the semantic meaning of “good” can be altered if it follows words such as “not” or “very.”
For these reasons, the local context units can efficiently capture the influences of the candidate context words, together with those of their neighboring words, on the rating distributions. This further helps to produce high-quality region embeddings, which are capable of semantically representing the distribution of ratings for the individual contextual regions.
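The column-wise L2-norm used in this analysis can be illustrated with a minimal sketch. It assumes that a local context unit is stored as a row-major matrix whose columns correspond to the relative positions l2, l1, the word itself, r1, and r2; the function name `column_l2_norms` and the toy values are hypothetical and are not taken from the experiments.

```python
import math


def column_l2_norms(unit):
    """Return the L2 norm of each column of a row-major matrix."""
    if not unit:
        return []
    n_cols = len(unit[0])
    if any(len(row) != n_cols for row in unit):
        raise ValueError("all rows must have the same length")
    # One norm per column, i.e. per relative position in the region.
    return [
        math.sqrt(sum(row[j] ** 2 for row in unit))
        for j in range(n_cols)
    ]


# Hypothetical local context unit: 3 embedding dimensions x 5 positions.
positions = ["l2", "l1", "word", "r1", "r2"]
unit = [
    [3.0, 0.0, 1.0, 6.0, 0.0],
    [4.0, 2.0, 2.0, 8.0, 5.0],
    [0.0, 0.0, 2.0, 0.0, 12.0],
]

norms = column_l2_norms(unit)
for position, norm in zip(positions, norms):
    print(f"{position}: {norm:.2f}")
```

In this toy unit the columns for r1 and r2 have the largest norms, which is the pattern the text describes for words such as “staff,” where the following words carry more influence than the preceding ones.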
Word           l2      l1      word    r1      r2
“staff”        14.33   15.63   15.84   25.59   24.19
“very”         19.30   21.55   12.39   34.13   20.82
“clean”        20.70   24.94   19.59   14.01   12.97
“breakfast”     9.27   11.22   20.34   12.74   11.20
“good”         14.16   17.26   16.16   14.40   10.98
“excellent”     6.85    7.95   19.42    7.77    6.82
Figure 4.8: Visualization of the local context units for some chosen candidate context words.
4.7.3 Embedding Analysis
The previous subsection showed that CARE is able to extract words representing contexts from multiple recommendation domains. This subsection aims to show that the region embeddings, which are generated from context words and their neighboring words, accurately capture the rating distributions of their corresponding contextual regions and are therefore useful for rating prediction. The assumption is that contextual regions that contribute similar rating distributions should generate region embeddings that are close to each other in the embedding space.
To investigate this assumption, I first define a method for categorizing the distributions of ratings into classes. The idea is to assign a class to each rating distribution based on its direction (positive or negative). For example, the frequencies of ratings in the distribution $\mathit{dist}(c_n,d)_1 = [8, 25, 34, 56, 95]$ are positively distributed toward high rating scores, whereas those of $\mathit{dist}(c_n,d)_2 = [103, 75, 41, 18, 3]$ are negatively distributed toward low rating scores. Thus, $\mathit{dist}(c_n,d)_1$ would be categorized as belonging to a positive class, whereas $\mathit{dist}(c_n,d)_2$ would belong to a negative class. To implement this categorization, the Pearson correlation coefficient was chosen to compute a correlation score between the rating distribution and an ordinal rating vector, as expressed by
$$\rho_{\mathit{dist}(c_n,d)_m,\,\mathit{score}_R} = \frac{\operatorname{cov}\bigl(\mathit{dist}(c_n,d)_m,\ \mathit{score}_R\bigr)}{\sigma_{\mathit{dist}(c_n,d)_m}\,\sigma_{\mathit{score}_R}}, \tag{4.7.1}$$
where $\operatorname{cov}$ denotes the covariance function, $\sigma$ is a standard deviation, and $\mathit{score}_R \in \mathbb{Z}^{|\mathit{Rating}|}$ is an ordinal rating score vector (sorted in ascending order) for which $|\mathit{score}_R| = |\mathit{dist}(c_n,d)_m|$. For example, $\mathit{score}_R = [1, 2, 3, 4, 5]$ could be used for rating data with a five-point rating score.
After computing the correlation score for each $\mathit{dist}(c_n,d)_m$ using $\mathit{score}_R$, we can then assign it to a class by using the categorization criteria given in Table 4.4, where the rating distributions are categorized into five classes: Strong Positive, Positive, Neutral, Negative, and Strong Negative.
Table 4.4: Criteria for categorizing a rating distribution based on correlation score.
Correlation          Class
ρ ≥ 0.9              Strong Positive
0.4 ≤ ρ < 0.9        Positive
-0.4 < ρ < 0.4       Neutral
-0.9 < ρ ≤ -0.4      Negative
ρ ≤ -0.9             Strong Negative
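The categorization step can be sketched in a few lines of Python. This is an illustrative sketch rather than the implementation used in the experiments: the function names `pearson` and `classify` are hypothetical, the Negative band is assumed to be -0.9 < ρ ≤ -0.4, and a distribution with zero variance (for which the coefficient is undefined) is assumed to receive a score of 0.

```python
import math


def pearson(dist, score):
    """Pearson correlation between a rating distribution and a score vector."""
    if len(dist) != len(score) or len(dist) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_d = sum(dist) / len(dist)
    mean_s = sum(score) / len(score)
    # The 1/n factors of the covariance and the standard deviations cancel.
    cov = sum((d - mean_d) * (s - mean_s) for d, s in zip(dist, score))
    norm_d = math.sqrt(sum((d - mean_d) ** 2 for d in dist))
    norm_s = math.sqrt(sum((s - mean_s) ** 2 for s in score))
    if norm_d == 0 or norm_s == 0:
        return 0.0  # assumption: treat a flat distribution as uncorrelated
    return cov / (norm_d * norm_s)


def classify(rho):
    """Map a correlation score to one of the five classes."""
    if rho >= 0.9:
        return "Strong Positive"
    if rho >= 0.4:
        return "Positive"
    if rho > -0.4:
        return "Neutral"
    if rho > -0.9:
        return "Negative"
    return "Strong Negative"


score_r = [1, 2, 3, 4, 5]
dist_1 = [8, 25, 34, 56, 95]
dist_2 = [103, 75, 41, 18, 3]

for dist in (dist_1, dist_2):
    rho = pearson(dist, score_r)
    print(dist, round(rho, 3), classify(rho))
```

With these thresholds, the two example distributions given earlier score approximately 0.97 and -0.99, so they fall into the Strong Positive and Strong Negative classes, respectively.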
To visualize the subtle differences between the region embeddings, contextual regions from the TripAdvisor and Amazon Movies & TV datasets were sampled and categorized based on their corresponding rating distributions, and their region embeddings were generated. Figures 4.9 and 4.10 were obtained by applying t-distributed stochastic neighbor embedding (t-SNE) [95] to the sampled region embeddings from each dataset, where the color of each point denotes the class of its associated rating distribution. In Figure 4.9, for each dataset, 50 contextual regions from each class (250 in total) were sampled and their corresponding region embeddings were plotted. Note that the groups of region embeddings representing the positive and negative classes are fairly distinguishable. This supports the assumption that contextual regions with similar rating distributions are mapped close to each other in the embedding space.
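The per-class sampling that precedes the t-SNE projection can be sketched as follows. The text does not state how the 50 regions per class were drawn, so uniform sampling without replacement is an assumption here; the function `sample_per_class`, the region identifiers, and the labels are hypothetical placeholders.

```python
import random
from collections import defaultdict

CLASSES = [
    "Strong Positive",
    "Positive",
    "Neutral",
    "Negative",
    "Strong Negative",
]


def sample_per_class(regions, labels, per_class, seed=0):
    """Draw `per_class` regions from every class, without replacement."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    by_class = defaultdict(list)
    for region, label in zip(regions, labels):
        by_class[label].append(region)
    sampled = {}
    for label, members in by_class.items():
        if len(members) < per_class:
            raise ValueError(f"class {label!r} has fewer than {per_class} regions")
        sampled[label] = rng.sample(members, per_class)
    return sampled


# Hypothetical data: 1000 region ids with labels cycling over the classes.
regions = list(range(1000))
labels = [CLASSES[i % len(CLASSES)] for i in regions]

sampled = sample_per_class(regions, labels, per_class=50)
total = sum(len(members) for members in sampled.values())
print(total)
```

The region embeddings of the sampled regions would then be passed to a t-SNE implementation, with each projected point colored by its class label.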
The region embeddings can be analyzed in more detail by visualizing those that are associated with the contextual regions of each candidate context word. As shown in Figure 4.10, two candidate context words, “location” from TripAdvisor and “acting” from Amazon Movies & TV were selected; 10 contextual regions that contained them from each class (50 in total) were sampled and their corresponding region embeddings were plotted. Note that words that contribute positive distributions such as “great”, “good”, or “excellent” are grouped close to each other and are visually separated from negatively distributed words such as “not”, “bad”, or “but”. This again supports the assumption that neighboring words in the same text region as a candidate context word influence the distribution of ratings, and should be considered when extracting contextual information from reviews.
[Figure: two panels, (a) TripAdvisor and (b) Amazon Movies & TV; legend classes: Strong Positive, Positive, Neutral, Negative, Strong Negative.]
Figure 4.9: Projection of sampled region embeddings for the TripAdvisor and Amazon Movies & TV datasets.