Abstract

Collaborative filtering (CF) is a prevailing technique for recommender systems and has been extensively studied as a way to tackle information overload, particularly in the Big Data context. Traditional CF algorithms perform adequately in many circumstances; nevertheless, they suffer from shortcomings such as cold start and data sparsity. Moreover, a potential breakthrough lies in taking full advantage of the valuable semantic information contained in items. To alleviate these shortcomings, we propose a two-stage collaborative filtering approach driven by Simhash-based semantic feature analysis. The first stage is Simhash-based semantic feature extraction for items and categories, and the second stage is reinforced CF rating prediction driven by highly compressed category features. In the first stage, Simhash rapidly extracts and compresses the rich semantic features of a vast number of items and their categories, and these features are then used to enhance the traditional collaborative filtering process. In addition, to cope with the Big Data context, we design a parallel algorithm on Spark to accelerate the time-consuming process of semantic feature extraction for a vast number of items. Finally, we conduct comprehensive experiments on real-world datasets to validate the reinforced CF approach, and the results show that it achieves promising performance compared with traditional CF algorithms.
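
For readers unfamiliar with Simhash, the fingerprinting step underlying the first stage can be illustrated with a minimal, generic sketch. This is not the implementation used in the paper: the function names `simhash` and `hamming_distance`, the use of MD5 as the per-token hash, term frequency as the token weight, and the 64-bit fingerprint width are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def simhash(tokens, bits=64):
    """Return a `bits`-wide Simhash fingerprint (bits <= 128) for a token list.

    Assumptions for this sketch: MD5 as the token hash and raw term
    frequency as the token weight.
    """
    weights = Counter(tokens)
    acc = [0] * bits  # one signed accumulator per fingerprint bit
    for token, weight in weights.items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest, "big")  # 128-bit integer hash
        for i in range(bits):
            # Add the weight if bit i of the hash is set, subtract otherwise.
            acc[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i in range(bits):
        if acc[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    # Hypothetical item descriptions, tokenized by whitespace.
    item_a = "wireless noise cancelling headphones".split()
    item_b = "wireless noise cancelling earbuds".split()
    print(hamming_distance(simhash(item_a), simhash(item_b)))
```

Items whose token sets overlap heavily tend to produce fingerprints with a small Hamming distance, which is what allows a compact fingerprint to stand in for the full text of an item or category when comparing them.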
