A comparative analysis of text similarity measures and algorithms in research paper recommender systems

Maake Benard Magara, Sunday O Ojo, Tranos Zuva

doi:10.1109/ictas.2018.8368766

Abstract

The growth in the number of research papers published online can be attributed to recent developments in internet and web technologies. However, researchers and online users have a difficult time finding relevant and accurate information owing to the information explosion on the internet. In this paper, we seek to establish which combinations of algorithms and similarity metrics can be used to optimise the search and recommendation of articles in research paper recommender systems. Our investigation utilised non-linear classification algorithms with text similarity measures. An offline evaluation approach is used to determine the models' accuracy and performance, while various similarity metrics are assessed using available datasets. We apply the Recursive PARTitioning (rpart), Random Forest and Boosted machine learning algorithms to research paper similarity evaluation datasets. The rpart algorithm generally performed well when compared to the Boosted and Random Forest algorithms, achieving an average accuracy of 80.73 and an average running time of 2.354628 seconds. Cosine similarity performed best among the similarity metrics compared. New similarity metrics and measures are to be proposed. This paper establishes that some combinations of metrics and algorithms perform better than others when developing models for research paper similarity evaluation and recommendation. Further challenges and open issues are identified.
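As an illustration of the cosine similarity metric named in the abstract, the following is a minimal sketch in Python. It is not the authors' implementation: the whitespace tokenisation, the use of raw term frequencies, and the function name `cosine_similarity` are all assumptions made for this example.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words term-frequency vectors.

    Assumption: tokens are lowercase, whitespace-separated words and
    weights are raw term counts (no TF-IDF, stemming or stop-word removal).
    """
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())

    # Only terms present in both texts contribute to the dot product.
    dot = sum(tf_a[term] * tf_b[term] for term in tf_a.keys() & tf_b.keys())

    # Euclidean norms of the two term-frequency vectors.
    norm_a = math.sqrt(sum(count * count for count in tf_a.values()))
    norm_b = math.sqrt(sum(count * count for count in tf_b.values()))

    # An empty text has no direction, so treat its similarity as zero.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With these assumptions, two texts with no words in common score 0, and two texts with identical word counts score 1 (up to floating-point rounding). A score of this kind could serve as one input feature to a classifier such as those compared in the paper.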

Full Text