Abstract

The growth in the number of research papers published online can be attributed to recent developments in internet and web technologies. However, researchers and other online users struggle to find relevant and accurate information because of the information explosion on the internet. In this paper, we seek to establish which combinations of algorithms and similarity metrics can be used to optimise the search and recommendation of articles in a research paper recommender system. Our investigation combines non-linear classification algorithms with text similarity measures. An offline evaluation approach is used to determine the models' accuracy and performance, and various similarity metrics are assessed using available datasets. We apply the Recursive PARTitioning (rpart), Random Forest and Boosted machine learning algorithms to research paper similarity evaluation datasets. The rpart algorithm generally performed well compared with the Boosted and Random Forest algorithms, achieving an average accuracy of 80.73 and an average time of 2.354628 seconds. Cosine similarity performed best among the similarity metrics assessed. New similarity metrics and measures are to be proposed. The results establish that some combinations of metrics and algorithms are better than others for developing models for research paper similarity evaluation and recommendation. Further challenges and open issues are identified.
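
To make the cosine similarity measure mentioned above concrete, the following is a minimal sketch of how it could be computed between two short texts. It assumes a raw term-frequency representation with lowercasing and whitespace tokenisation; the abstract does not state which text representation or preprocessing the authors used, so the function name `cosine_similarity` and these choices are illustrative assumptions only.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of two texts over raw term-frequency vectors.

    Assumption: lowercased, whitespace-split tokens stand in for whatever
    preprocessing a real recommender pipeline would apply.
    """
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())

    # Dot product over the terms the two texts share.
    shared_terms = vec_a.keys() & vec_b.keys()
    dot = sum(vec_a[term] * vec_b[term] for term in shared_terms)

    # Euclidean norms of the two term-frequency vectors.
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))

    # An empty text has no direction, so define its similarity as 0.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Scores range from 0 (no shared terms) to 1 (identical term distributions), so a recommender could rank candidate papers by this score against a query paper. Whether such scores feed into the classifiers as features is not described in the abstract.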
