
Vol. 05, Issue 03, March 2020, Available Online: www.ajeee.co.in/index.php/AJEEE


MATTHEW EFFECT REDUCTION IN COLLABORATIVE RECOMMENDER SYSTEM NEW DOMAIN OF RESEARCH PROPOSALS

Avita Fuskele Jain

Jabalpur Engineering College, Jabalpur, M. P., India

Abstract- A recommender system is an information filtering technology that presents items on web sites according to the interests of users. It is used in applications such as movies, music, venues, books, research articles, tourism and social media in general. In today's world time is valuable, and researchers have little time to spend searching for the right articles in their rapidly growing research domains. An efficient searching and filtering mechanism is therefore needed to select quality research papers, so that researchers' effort and time can be saved. The data sparsity problem directly affects the coverage of recommendation results. Collaborative filtering is a simple benchmark that is widely adopted in industry as the baseline for recommender system design. We propose a Matthew Effect reduction approach for a research-paper recommender system based on collaborative filtering, which recommends to a user the best research papers in their domain according to their queries and to similarities found with other users' queries, helping the user avoid time-consuming searches.

Keywords: Collaborative filtering, Matthew effect, Data mining, Recommender systems, Research.

1 INTRODUCTION

Recommender systems recommend or suggest products or items to interested users using either collaborative filtering or content-based filtering; the latter relies on a set of discrete characteristics of the item. These approaches can also be combined to build hybrid recommender systems [1], and every type of model has strengths and weaknesses. One weakness is the Matthew effect, which is similar to preferential attachment in network science: a node with more links is more likely to receive a new link. The Matthew effect has several disadvantages. We usually make decisions based on our past experience or on someone's recommendation, and many of today's systems recommend items based on our browsing history or on other people's searches. This can be a great disadvantage for new products, because such products are rarely suggested by the recommendation system.
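A minimal sketch of the rich-get-richer dynamic described above, assuming a simple preferential-attachment process: each new link attaches to an existing node with probability proportional to its current number of links plus one (the +1 lets nodes without links be chosen at all). The function name and its parameters are hypothetical and serve only as an illustration.

```python
import random
from collections import Counter


def simulate_preferential_attachment(num_nodes: int, num_links: int, seed: int = 0) -> Counter:
    """Hypothetical illustration of preferential attachment.

    Each new link goes to node i with probability proportional to
    (current degree of i + 1), so well-linked nodes keep attracting links.
    """
    rng = random.Random(seed)
    degree = Counter({node: 0 for node in range(num_nodes)})
    for _ in range(num_links):
        nodes = list(degree)
        weights = [degree[n] + 1 for n in nodes]
        target = rng.choices(nodes, weights=weights, k=1)[0]
        degree[target] += 1
    return degree


if __name__ == "__main__":
    degrees = simulate_preferential_attachment(num_nodes=100, num_links=2000)
    top_ten = degrees.most_common(10)
    share = sum(count for _, count in top_ten) / 2000
    print(f"Top 10% of nodes hold {share:.0%} of all links")
```

Under uniform attachment the top 10% of nodes would hold only slightly more than 10% of the links; with the degree-proportional weights their share is typically much larger, which is the concentration the Matthew effect describes.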

Recommender systems are a useful alternative to search algorithms, since they help users discover items they might not otherwise have found. They are frequently implemented using search engines that index non-traditional data.

Collaborative filtering systems use feedback collected from users in the form of scores for items in a given domain, and use the similarities and differences between user profiles to recommend items to them. Recommender systems overcome information overload by providing personalized suggestions based on the user's profile history. Search engines such as Google are an essential and successful tool for retrieving information in response to user queries. However, the growth of online publication has greatly increased the number of research papers published, so researchers struggle to find appropriate papers among them. User modeling is the process of gathering information about users by analyzing their items or ratings [2]. User models are used by various applications, including search engines and recommender systems. The search results of researchers with similar tastes could yield effective recommendations and more efficient search, but sharing search results directly is usually too bulky and time-consuming to be feasible. A recommender system for research papers lets researchers choose the best and most relevant papers in their domain with the help of the ratings given by other researchers with similar interests.


2 RELATED WORK

A recommender system is constructed using either a collaborative filtering approach or a content-based filtering approach, or as a hybrid of the two.

Collaborative filtering finds the nearest neighbors of a user with respect to the user's rating history and tries to generate the best possible recommendation list of research papers for that user. In content-based filtering, a user profile is built from the contents of the papers the user has rated, and the system proposes a list of papers that match this profile. Good recommendations are usually achieved when enough pre-existing ratings are available for collaborative filtering to draw on. However, collaborative filtering suffers from the cold-start problem [4]. In practice, a complete recommender system for research papers does not exist: although models have been published, they are only partially implemented.
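A minimal sketch of the content-based approach described above, assuming each paper is represented by a bag of keywords: the user profile is the rating-weighted sum of the keyword counts of the papers the user has rated, and unrated papers are ranked by cosine similarity to that profile. The paper identifiers, keywords and function names are made up for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse keyword-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profile(rated: dict, papers: dict) -> Counter:
    """Rating-weighted sum of the keyword vectors of the papers a user rated."""
    profile = Counter()
    for paper_id, rating in rated.items():
        for term, count in papers[paper_id].items():
            profile[term] += rating * count
    return profile


def recommend_by_content(rated: dict, papers: dict, top_n: int = 2) -> list:
    """Rank the papers the user has not rated by similarity to the user profile."""
    profile = build_profile(rated, papers)
    candidates = [p for p in papers if p not in rated]
    return sorted(candidates, key=lambda p: cosine(profile, papers[p]), reverse=True)[:top_n]


# Hypothetical papers described by keyword counts.
papers = {
    "p1": Counter({"collaborative": 2, "filtering": 2}),
    "p2": Counter({"matrix": 1, "factorization": 1, "filtering": 1}),
    "p3": Counter({"protein": 2, "folding": 1}),
    "p4": Counter({"collaborative": 1, "filtering": 1, "sparsity": 1}),
}
print(recommend_by_content({"p1": 5}, papers))  # prints ['p4', 'p2']
```

Note that this ranking depends only on paper content, so a new paper can be recommended as soon as its keywords are known, whereas collaborative filtering needs ratings for it first.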

Most published recommendation models implement content-based filtering rather than collaborative filtering. Other recommendation models include subspace-clustering algorithms [5], stereotyping and hybrid recommendations. Some papers apply collaborative filtering to item ratings, treating citations directly as ratings. CiteSeer, introduced by Giles et al. [6], was the first research-paper recommender system; citation analysis is applied to citation databases such as CiteSeer to find papers matching a user's interests. Since then, many articles on research-paper recommender systems have been published.

Our literature review revealed some weaknesses of current research-paper models. Researchers have compared the performance of collaborative filtering and content-based filtering [10]: collaborative filtering usually achieved better results than content-based filtering, but occasionally worse.

Collaborative filtering has been the most widely used recommender system algorithm, and it is also one of the earliest recommender system techniques. There is an extensive literature on collaborative filtering algorithms ([11], [15], [16]) and their applications. The Matthew Effect and the sparsity problem have long been headaches for recommender system designers, and researchers and engineers have proposed many solutions to tackle them. For example, Google computes trending categorical information to offset the Matthew Effect introduced by the most popular news articles, and Bansal et al. [17] proposed a deep text recommender model that can ameliorate the sparsity problem.

Matthew effect patterns have been observed not only in scientific collaborations but also in socio-technical and biological networks, metabolic networks, the propagation of scientific citations, the emergence of scientific progress and career longevity, as well as in education and brain development [3]. Recognition accumulates to well-known scientists, which leads to a double injustice: junior scientists are not recognized, while renowned scientists receive the credit. Within research disciplines, these patterns are mostly observed in collaborations and in independent multiple discoveries made by scientists of different standing [2].

3 SYSTEM ARCHITECTURE

A. Block Diagram

Fig. 1 Block Diagram

B. Module Description

Data Collection

This research-paper recommendation system collects the data as .csv files, shown in Table 1, which consist of the ratings of each paper together with the corresponding IDs. Taking frequency and recency into consideration, the affinity of each user-item pair per session is computed, and every session of a particular user's interaction with a particular item is aggregated.

(1) where u = user, it = item, s = session

Table 1 Research paper dataset
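Since the formula for Eq. (1) is not given in the text, the following is only a hypothetical sketch of a frequency- and recency-weighted affinity: each session in which user u interacted with item it contributes its number of interactions (frequency), discounted exponentially by the session's age (recency), and the per-session contributions are summed over sessions. The half-life parameter `half_life_days`, the event layout and the function name are assumptions, not the paper's formula.

```python
import math
from collections import defaultdict


def session_affinity(interactions, now, half_life_days=30.0):
    """Hypothetical affinity per (user, item), aggregated over sessions.

    interactions: iterable of (user, item, session, timestamp_days) events.
    Frequency: every event in a session counts once.
    Recency: a session's contribution halves every half_life_days.
    """
    # Count events per (user, item, session) and keep each session's latest time.
    counts = defaultdict(int)
    last_seen = {}
    for user, item, session, ts in interactions:
        key = (user, item, session)
        counts[key] += 1
        last_seen[key] = max(ts, last_seen.get(key, ts))

    decay = math.log(2) / half_life_days
    affinity = defaultdict(float)
    for (user, item, session), freq in counts.items():
        age = now - last_seen[(user, item, session)]
        affinity[(user, item)] += freq * math.exp(-decay * age)
    return dict(affinity)


events = [
    ("u1", "p1", "s1", 0.0), ("u1", "p1", "s1", 0.0),  # two clicks, 30 days before `now`
    ("u1", "p1", "s2", 30.0),                           # one click at `now`
    ("u2", "p1", "s3", 30.0),
]
print(session_affinity(events, now=30.0))
```

In this toy input the two older clicks are each worth half a recent click, so both user u1's sessions contribute about equally to the aggregated affinity.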

3.1 Similarity search using Collaborative Filtering Approach

The collaborative filtering approach, one of the most popular techniques for recommender systems, collects feedback from users in the form of ratings. Since content-based filtering cannot assess the quality of an item, collaborative filtering is used to overcome this problem.

Here we use cosine similarity to measure how closely a user's interests resemble those of other users in the same category. The results obtained with this similarity measure are reported to be more accurate [15].

The similarity index between two consumers U1 and U2 can be calculated using the formula for cosine similarity.

$$\mathrm{CosSim}(U_1, U_2) = \frac{v(U_1) \cdot v(U_2)}{\lVert v(U_1) \rVert \, \lVert v(U_2) \rVert} \qquad (2)$$
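As a small illustration of Eq. (2), the sketch below computes the cosine similarity of two users from their rating vectors. It assumes, for illustration only, that both vectors are indexed by the same list of papers and that 0 means "not rated", so unrated papers contribute nothing to the dot product.

```python
import math


def cosine_similarity(v1, v2):
    """Cos Sim(U1, U2) = (v(U1) . v(U2)) / (|v(U1)| |v(U2)|), as in Eq. (2)."""
    if len(v1) != len(v2):
        raise ValueError("rating vectors must have the same length")
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # a user with no ratings is treated as dissimilar to everyone
    return dot / (norm1 * norm2)


# Hypothetical ratings of four papers (0 = not rated).
u1 = [5, 3, 0, 1]
u2 = [4, 0, 0, 1]
print(round(cosine_similarity(u1, u2), 3))  # prints 0.861
```

Because ratings are non-negative, the similarity lies between 0 and 1, and it depends only on the direction of the rating vectors, not on how many papers each user rated.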

3.2 Prediction rating using user to item matrix

A predicted rating is needed to identify the best and most useful papers for a user's interests, based on the ratings of other users in the same category. The papers with the highest predicted ratings are then recommended to the user.

Finally, links between users are formed by building a UBCF (User-Based Collaborative Filtering) model. This model predicts the rating of each paper, and a set of papers matching the user's interests is recommended to the user.
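The text does not give the prediction formula, so the sketch below assumes the common user-based form: the predicted rating of paper p for user u is the similarity-weighted average of the ratings given to p by the k most similar users who rated it. The `ratings[user][paper]` layout, the parameter `k` and the function names are illustrative assumptions.

```python
import math


def cosine(r1: dict, r2: dict) -> float:
    """Cosine similarity of two users' sparse rating dictionaries."""
    common = set(r1) & set(r2)
    dot = sum(r1[p] * r2[p] for p in common)
    n1 = math.sqrt(sum(v * v for v in r1.values()))
    n2 = math.sqrt(sum(v * v for v in r2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def predict_rating(ratings: dict, user: str, paper: str, k: int = 2):
    """Similarity-weighted average of the k nearest neighbors' ratings of `paper`.

    Returns None when no similar user has rated the paper (cold start).
    """
    neighbors = [
        (cosine(ratings[user], ratings[other]), ratings[other][paper])
        for other in ratings
        if other != user and paper in ratings[other]
    ]
    neighbors = sorted(neighbors, reverse=True)[:k]
    weight = sum(sim for sim, _ in neighbors if sim > 0)
    if weight == 0:
        return None
    return sum(sim * r for sim, r in neighbors if sim > 0) / weight


ratings = {
    "alice": {"p1": 5, "p2": 3},
    "bob": {"p1": 4, "p2": 3, "p3": 5},
    "carol": {"p1": 1, "p3": 2},
}
print(round(predict_rating(ratings, "alice", "p3"), 2))  # prints 3.94
```

Bob's taste is closer to Alice's than Carol's is, so his rating of 5 dominates the weighted average; a paper nobody has rated yields None, which is the cold-start case mentioned in Section 2.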

Fig. 2 Raw Ratings

3.3 Building the Recommender Model

Collaborative Filtering Model: In a collaborative filtering model, a user's behavior (previously purchased items and the ratings given) and the similar behavior shown by other users are used to build the model, which then predicts the items (or the ratings of items) the user will be interested in.

Our recommender model, the UBCF (User-Based Collaborative Filtering) model, generates recommendations from the behavior of the closest users, using the whole stored rating matrix. Internally, the model calculates the cosine similarity among all users, each represented as a vector of ratings.
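To illustrate generating recommendations from the closest users using the whole rating matrix, the sketch below precomputes the full user-user cosine similarity matrix once and then scores each unrated paper by the similarity-weighted sum of the ratings of the user's k nearest neighbors. The neighborhood size `k`, this scoring rule and all names are assumptions for illustration, not the paper's exact UBCF configuration.

```python
import math
from itertools import combinations


def cosine(r1: dict, r2: dict) -> float:
    """Cosine similarity of two sparse rating vectors (unrated = 0)."""
    dot = sum(r1[p] * r2[p] for p in set(r1) & set(r2))
    n1 = math.sqrt(sum(v * v for v in r1.values()))
    n2 = math.sqrt(sum(v * v for v in r2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def similarity_matrix(ratings: dict) -> dict:
    """Symmetric user-user cosine similarity matrix over all user pairs."""
    sim = {u: {} for u in ratings}
    for u, v in combinations(ratings, 2):
        s = cosine(ratings[u], ratings[v])
        sim[u][v] = sim[v][u] = s
    return sim


def recommend(ratings: dict, sim: dict, user: str, k: int = 2, top_n: int = 3) -> list:
    """Top-N unrated papers, scored by the k nearest neighbors' weighted ratings."""
    neighbors = sorted(sim[user], key=sim[user].get, reverse=True)[:k]
    scores = {}
    for other in neighbors:
        for paper, rating in ratings[other].items():
            if paper not in ratings[user]:
                scores[paper] = scores.get(paper, 0.0) + sim[user][other] * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


ratings = {
    "alice": {"p1": 5, "p2": 3},
    "bob": {"p1": 4, "p2": 3, "p3": 5},
    "carol": {"p1": 1, "p3": 2, "p4": 5},
    "dave": {"p5": 4},
}
sim = similarity_matrix(ratings)
print(recommend(ratings, sim, "alice"))  # prints ['p3', 'p4']
```

Precomputing the matrix costs one similarity per user pair, which is cheap for a small set of researchers such as the test data used here; Dave shares no papers with Alice, so his paper p5 is never considered.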


Fig. 3 Histogram ratings

Fig. 4 Predicted User Rating

1. Matthew Effect: To evaluate the Matthew Effect of different approaches, we analyze the expected similarity score between users (user-based collaborative filtering) and the expected similarity score between items (item-based collaborative filtering). More specifically, we investigate how popularity affects the similarity score between the selected user or item and other users or items. The degree to which this similarity score takes disproportionately larger values for more popular users or items indicates the severity of the Matthew Effect, and we show how to quantify it analytically.

2. Sparsity Problem: To evaluate the sparsity problem of different approaches, we use the expected number of users (user-based collaborative filtering) and the expected number of items (item-based collaborative filtering) involved in the similarity score computation. The more users or items are involved in the computation, the less sparse the data set is. We compute the metrics using combinatorics and compare the sparsity effect in user-based and item-based collaborative filtering approaches.
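Both metrics can also be estimated empirically. The sketch below assumes binary click data stored as the set of clicked items per user and reports, for every item, its popularity (number of users), its mean cosine similarity to the other items (a proxy for the Matthew Effect metric) and the number of other items that share at least one user with it (a proxy for the sparsity metric). The function name and the data are hypothetical; the user-based version is obtained by swapping the roles of users and items.

```python
import math
from collections import defaultdict


def item_metrics(clicks: dict) -> dict:
    """Per-item (popularity, mean cosine similarity, number of co-clicked items).

    clicks: user -> set of clicked items (binary implicit feedback).
    For binary vectors, cos(a, b) = |users(a) & users(b)| / sqrt(|users(a)| * |users(b)|).
    """
    users_of = defaultdict(set)
    for user, items in clicks.items():
        for item in items:
            users_of[item].add(user)

    metrics = {}
    for a in users_of:
        sims = []
        involved = 0
        for b in users_of:
            if b == a:
                continue
            overlap = len(users_of[a] & users_of[b])
            sims.append(overlap / math.sqrt(len(users_of[a]) * len(users_of[b])))
            if overlap:
                involved += 1  # item b takes part in a's similarity computation
        mean_sim = sum(sims) / len(sims) if sims else 0.0
        metrics[a] = (len(users_of[a]), mean_sim, involved)
    return metrics


clicks = {
    "u1": {"hit", "p1"},
    "u2": {"hit", "p2"},
    "u3": {"hit", "p3"},
    "u4": {"hit"},
    "u5": {"p1"},
}
for item, (pop, mean_sim, involved) in sorted(item_metrics(clicks).items()):
    print(f"{item}: popularity={pop}, mean_sim={mean_sim:.3f}, items_involved={involved}")
```

In this toy data the popular item "hit" obtains both the highest mean similarity and the largest number of items involved, which is the pattern the two metrics are meant to expose.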

3.4 Matthew Effect in Item Based Collaborative Filtering

Let's consider two videos: video A is clicked by 1/m of the users and video B by 1/n of the users. To simplify our statistical model, we assume that the event that video A is clicked is independent of the event that video B is clicked. The probability that the same user clicks both video A and video B is then 1/(m*n). Assume there are W users in total. For each user, clicking both videos is a Bernoulli trial with success probability 1/(m*n), so the number of users who clicked both videos follows a binomial distribution, and on average W/(m*n) users clicked both videos. Similarly, on average W/m users clicked video A and W/n users clicked video B.
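Under the same independence assumption, and treating the clicks on each video as a binary vector over the W users (with A and B here denoting the sets of users who clicked video A and video B), the cosine similarity of the two videos can be approximated by plugging the expected counts into the cosine formula. This is a ratio of expectations, used as a rough approximation rather than the exact expectation of the ratio:

$$
\begin{aligned}
\mathbb{E}\big[|A \cap B|\big] &= \frac{W}{mn}, \qquad \mathbb{E}\big[|A|\big] = \frac{W}{m}, \qquad \mathbb{E}\big[|B|\big] = \frac{W}{n},\\
\cos(A, B) = \frac{|A \cap B|}{\sqrt{|A|\,|B|}} &\approx \frac{W/(mn)}{\sqrt{(W/m)(W/n)}} = \frac{1}{\sqrt{mn}}.
\end{aligned}
$$

Since m and n shrink as videos become more popular, the approximate similarity grows with popularity: two videos each clicked by half of the users (m = n = 2) get about 0.5, while two videos each clicked by 1% of the users (m = n = 100) get about 0.01.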

4 IMPLEMENTATION RESULTS

In this system, test data for 135 users is used to find similar interests between users and to recommend research papers to a specific user. It uses a dataset of research-paper ratings with the attributes userid, paperid and ratings. Predictive performance is measured by the deviation of the predicted values from the true values.
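A minimal sketch of loading such a dataset, assuming a .csv file whose header row is `userid,paperid,ratings` (the column names follow the attributes listed above; the exact file layout is an assumption). It builds the sparse user-to-paper rating dictionary that a user-based model can work with.

```python
import csv
import io
from collections import defaultdict


def load_ratings(csv_file) -> dict:
    """Read userid,paperid,ratings rows into {userid: {paperid: rating}}."""
    ratings = defaultdict(dict)
    for row in csv.DictReader(csv_file):
        ratings[row["userid"]][row["paperid"]] = float(row["ratings"])
    return dict(ratings)


# In-memory stand-in for the .csv file; a real file would be opened with
# open("ratings.csv", newline="") (hypothetical file name).
sample = io.StringIO("userid,paperid,ratings\n1,101,4\n1,102,5\n2,101,3\n")
ratings = load_ratings(sample)
num_papers = len({paper for user_ratings in ratings.values() for paper in user_ratings})
print(ratings)
print(f"{len(ratings)} users, {num_papers} papers")
```

Keeping only the ratings that exist, instead of a dense user-by-paper table, suits a dataset with few users and many papers, where most cells would be empty.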

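The Mean Absolute Error (MAE), or its squared counterpart, the Root Mean Square Error (RMSE), is used; they are calculated as follows:

$$\mathrm{MAE} = \frac{1}{|K|} \sum_{(i,j) \in K} \left| \hat{r}_{ij} - r_{ij} \right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{|K|} \sum_{(i,j) \in K} \left( \hat{r}_{ij} - r_{ij} \right)^2}$$

where K is the set of user-item pairings k = (i, j) with known ratings, $\hat{r}_{ij}$ is the predicted rating and $r_{ij}$ is the known rating.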
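Both error measures can be computed directly from pairs of predicted and known ratings; the sketch below uses plain Python and made-up ratings. Because RMSE squares each error before averaging, it penalizes large deviations more heavily than MAE, and it is always at least as large as MAE.

```python
import math


def mae(pairs):
    """Mean Absolute Error over (predicted, known) rating pairs."""
    return sum(abs(p - r) for p, r in pairs) / len(pairs)


def rmse(pairs):
    """Root Mean Square Error over (predicted, known) rating pairs."""
    return math.sqrt(sum((p - r) ** 2 for p, r in pairs) / len(pairs))


# Hypothetical (predicted r^_ij, known r_ij) pairs for three pairings k = (i, j).
pairs = [(4.0, 5.0), (3.5, 3.0), (2.0, 2.0)]
print(mae(pairs), rmse(pairs))
```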

Matthew Effect of similarity scores in item-based collaborative filtering as a consequence of skewness in the input data structure: the orange points in the upper-left corner and the scattered light-blue points in the lower-right corner illustrate the algorithmic structure of the data set. Popular items are seldom similar to each other, but when they are, they have a rather high similarity score.

We also compute the Matthew Effect metrics in collaborative filtering settings. Fig. 5 shows the number of users involved in computing the similarity scores of users of different ranks. The data points exhibit power-law distribution properties. According to our quantified metrics, these points should follow a Zipf's law distribution if the input data follows a Zipf's law distribution, and the real-world dataset supports the correctness of our formulas.

Fig. 6 shows the Matthew Effect metrics for item-based collaborative filtering. The x-axis shows items of different ranks, while the y-axis shows the number of items involved in computing the similarity scores. The data again follow a distribution similar to Zipf's law, which further supports our quantified results.
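One simple way to check Zipf-like behavior, sketched here under the assumption that the per-rank counts are available as a list, is to fit a straight line to log(count) against log(rank) by least squares; under Zipf's law the slope is close to -s, where s is the Zipf exponent (often near 1). The helper name is hypothetical, and a least-squares fit on a log-log plot is only a rough diagnostic, not a rigorous power-law test.

```python
import math


def zipf_slope(counts):
    """Least-squares slope of log(count) vs. log(rank), with ranks starting at 1.

    Counts are sorted in descending order first; zero counts (which sort last)
    are skipped because log(0) is undefined.
    """
    points = [
        (math.log(rank), math.log(c))
        for rank, c in enumerate(sorted(counts, reverse=True), start=1)
        if c > 0
    ]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var


# Synthetic counts that follow Zipf's law exactly with exponent 1: count = 1000 / rank.
counts = [1000 / r for r in range(1, 51)]
print(round(zipf_slope(counts), 3))
```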

Figure 5 Number of users involved in similarity score computation of users of different ranks in user-based collaborative filtering algorithm

Figure 6 Number of items involved in similarity score computation of items of different ranks in item-based collaborative filtering algorithm

This recommender model is evaluated by taking a sample of our data source, the affinity data, to train the model and using the rest of the data to validate whether the model produces the right output. This is achieved through a technique called "split". We then validated the values for both models, item-based and user-based collaborative filtering. As shown in Fig. 7, the user-based collaborative filtering model has no deviation compared with the item-based collaborative filtering model.
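A minimal sketch of the "split" evaluation described above, assuming the ratings are held as (user, paper, rating) triples: a random fraction is held out, a user-based predictor is fitted on the rest, and RMSE is computed on the held-out ratings that the model can predict. The 80/20 ratio, the fixed seed, the simple similarity-weighted predictor and the synthetic data are illustrative assumptions; an item-based model would be evaluated the same way by swapping the roles of users and papers.

```python
import math
import random
from collections import defaultdict


def cosine(r1: dict, r2: dict) -> float:
    """Cosine similarity of two sparse rating dictionaries (unrated = 0)."""
    dot = sum(r1[p] * r2[p] for p in set(r1) & set(r2))
    n1 = math.sqrt(sum(v * v for v in r1.values()))
    n2 = math.sqrt(sum(v * v for v in r2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def train_test_split(triples, test_fraction=0.2, seed=42):
    """Shuffle (user, paper, rating) triples and hold out a fraction as test data."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]


def predict_ubcf(train: dict, user, paper):
    """Similarity-weighted average of other users' ratings of `paper`, or None."""
    if user not in train:
        return None
    num = den = 0.0
    for other, other_ratings in train.items():
        if other != user and paper in other_ratings:
            s = cosine(train[user], other_ratings)
            num += s * other_ratings[paper]
            den += s
    return num / den if den > 0 else None


def evaluate(triples, test_fraction=0.2, seed=42):
    """Return (RMSE, count) over held-out ratings the user-based model can predict."""
    train_triples, test_triples = train_test_split(triples, test_fraction, seed)
    train = defaultdict(dict)
    for user, paper, rating in train_triples:
        train[user][paper] = rating
    squared_errors = []
    for user, paper, rating in test_triples:
        pred = predict_ubcf(train, user, paper)
        if pred is not None:
            squared_errors.append((pred - rating) ** 2)
    if not squared_errors:
        return float("nan"), 0
    return math.sqrt(sum(squared_errors) / len(squared_errors)), len(squared_errors)


# Hypothetical synthetic ratings: 20 users, each rating roughly half of 10 papers.
rng = random.Random(0)
triples = [
    (f"u{u}", f"p{p}", rng.randint(1, 5))
    for u in range(20)
    for p in range(10)
    if rng.random() < 0.5
]
score, n = evaluate(triples)
print(f"RMSE on {n} predictable held-out ratings: {score:.3f}")
```

Held-out ratings for users or papers absent from the training part cannot be predicted and are skipped, so the count is reported alongside the error.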

Fig. 7 UBCF and IBCF Validation Result

5 CONCLUSION

In this paper, a collaborative filtering mechanism for a recommender system designed for research-paper recommendation is proposed.

Recommending research papers can improve the reading habits of researchers with the same interests or similar ideas. Because the system uses data collected from other researchers' browsing histories, it avoids the need to analyze paper content.

The dataset consists of a small number of researchers (users) and a large number of research papers (items). Collaborative filtering, which our proposed system uses, is the most widely used recommender system technique. Its simplicity is very helpful in analyzing the data structure for more complicated algorithms, and understanding how the Matthew Effect and the sparsity problem affect collaborative filtering approaches helps in understanding other algorithms. In this paper, we proposed metrics to quantify the Matthew Effect and the sparsity problem, computed these metrics analytically for user-based and item-based collaborative filtering, and provided a theoretical foundation for analyzing the Matthew Effect and the sparsity problem in recommender system approaches. The system takes advantage of the unique features of the data in this domain and offers a solution that is fast and produces high-quality recommendations. However, because no pre-prepared dataset of research-paper ratings is available, we built our own test dataset with ratings for research papers. The system could be made more effective if research-paper websites provided an actual dataset with ratings.

In future research, we would like to explore quantitative analysis of other recommender system models such as matrix factorization, learning to rank and deep learning. Besides collaborative filtering, we would also like to find out how the input data structure affects the output of these other algorithms. We hope our research work can help industrial researchers and engineers design better algorithms and systems.

In the MAE and RMSE formulas of Section 4, k denotes a user-item pairing (i, j), $\hat{r}_{ij}$ the predicted rating and $r_{ij}$ the known rating; these formulas measure the deviation between the predicted and real values for different users and items.

REFERENCES

1. Kumar, M., Yadav, D. K., Singh, A., & Gupta, V. K. (2015). A movie recommender system: Movrec. International Journal of Computer Applications, 124(3).

2. Andreas Nürnberger, Klaus Turowski, Alesia Zuccala. (2015, March). Towards Effective Research-Paper Recommender Systems and User Modeling based on Mind Maps. Magdeburg University.

3. Ziegler, C. N., McNee, S. M., Konstan, J. A., & Lausen, G. (2005, May). Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web (pp. 22-32). ACM.

4. Ng, Y. K. (2017). MovRec: a personalized movie recommendation system for children based on online movie features. Internationa.

5. Agarwal, N., Haque, E., Liu, H., & Parsons, L. (2005, October). Research paper recommender systems: A subspace clustering approach. In International Conference on Web-Age Information Management (pp. 475-491). Springer, Berlin, Heidelberg.

6. Farooq, U., Ganoe, C. H., Carroll, J. M., Councill, I. G., & Giles, C. L. (2008). Design and evaluation of awareness mechanisms in CiteSeer. Information Processing & Management, 44(2), 596-612.

7. Rajpurkar, S., Bhatt, D., Malhotra, P., Rajpurkar, M. S. S., Bhatt, M. D. R., & Malhotra, M. P. (2015). Book recommendation system. International Journal for Innovative Research in Science and Technology, ISSN: 314–316, 1(11, 1).

8. Beel, J., Gipp, B., Langer, S., & Breitinger, C. (2016). Research-paper recommender systems: a literature survey. International Journal on Digital Libraries, 17(4), 305-338.

9. Soni, K., Goyal, R., Vadera, B., & More, S. (2017). A Three Way Hybrid Movie Recommendation System. International Journal of Computer Applications, 160(9).

10. H. Wang, N. Wang, and D. Yeung, “Collaborative Deep Learning for Recommender Systems”, KDD '15, August 10-13, 2015, Sydney, NSW, Australia

11. T. Bansal, D. Belanger, and A. McCallum, “Ask the GRU: Multi-Task Learning for Deep Text Recommendations”, RecSys '16, September 15-19, 2016, Boston, MA, USA

12. H. Wang, N. Wang, and D. Yeung, “Collaborative Deep Learning for Recommender Systems”, KDD '15, August 10-13, 2015, Sydney, NSW, Australia

13. T. Chen, J. Cai, H. Wang, D. Yu, “Instant Expert Hunting: Building an Answerer Recommender System for a Large Scale Q&A Website”, ACM Symposium on Applied Computing, 2014.

14. C. Dai, F. Qian, W. Jiang, Z. Wang, and Z. Wu, “A Personalized Recommendation System for NetEase Dating Site”, VLDB '14, Hangzhou, China

15. P. Covington, J. Adams, E. Sargin, “Deep Neural Networks for YouTube Recommendations”, RecSys '16, September 15-19, 2016.

16. S. Xiaoyuan and M. K. Taghi, “A Survey of Collaborative Filtering Techniques”, Advances in Artificial Intelligence, Vol. 2009, 2009.

17. B. Mehta, T. Hofmann and W. Nejdl, “Robust Collaborative Filtering”, RecSys '07, October 19-20, 2007

18. B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Item-Based Collaborative Filtering Recommendation Algorithms”, WWW10, May 1-5, 2001, Hong Kong

19. T. Bansal, D. Belanger, and A. McCallum, “Ask the GRU: Multi-Task Learning for Deep Text Recommendations”, RecSys '16, September 15-19, 2016, Boston, MA, USA

20. H. Wang, N. Wang, and D. Yeung, “Collaborative Deep Learning for Recommender Systems”, KDD '15, August 10-13, 2015, Sydney, NSW, Australia

21. T. Chen, J. Cai, H. Wang, D. Yu, “Instant Expert Hunting: Building an Answerer Recommender System for a Large Scale Q&A Website”, ACM Symposium on Applied Computing, 2014.

22. C. Dai, F. Qian, W. Jiang, Z. Wang, and Z. Wu, “A Personalized Recommendation System for NetEase Dating Site”, VLDB '14, Hangzhou, China

23. P. Covington, J. Adams, E. Sargin, “Deep Neural Networks for YouTube Recommendations”, RecSys '16, September 15-19, 2016.
