Abstract

The amount of available scientific literature is increasing, and various methods have been proposed for evaluating document-document similarity in order to cluster or classify documents for science mapping and knowledge discovery. In this paper, we propose hybrid methods that linearly combine bibliographic coupling (BC) with text- or content-based similarity. We combined BC with BM25, Cosine, and PMRA and compared the performance of these hybrids with that of the single methods in paper recommendation tasks, using the TREC Genomics Track 2005 datasets. For paper recommendation, BC and text-based methods complement each other, and the hybrid methods performed better than either BC or a text-based method used alone. The combinations of BC with BM25 and of BC with Cosine performed better than the combination of BC with PMRA. In the hybrid methods, performance was best when the weights of BM25, Cosine, and PMRA were 0.025, 0.2, and 0.2, respectively. The choice of method should nevertheless depend on the actual data and research needs. Future work should examine the underlying reasons for the differences in performance and which specific part or type of information the methods complement in text clustering or recommendation.
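The following is a minimal sketch of the kind of hybrid scoring described above, not the authors' implementation. The abstract gives the text-method weights but not the combination formula, so several choices here are assumptions: BC is normalized as the number of shared references divided by the geometric mean of the two reference-list lengths, text similarity is a plain cosine over raw term-frequency vectors (no TF-IDF, no BM25 or PMRA), and the hybrid score is assumed to take the linear form (1 - w) * BC + w * text. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def bibliographic_coupling(refs_a, refs_b):
    """Normalized BC (assumption): shared references / sqrt(|A| * |B|)."""
    a, b = set(refs_a), set(refs_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def cosine_similarity(text_a, text_b):
    """Cosine of raw term-frequency vectors (lowercased whitespace tokens)."""
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_score(bc, text_sim, weight):
    """Assumed linear combination: (1 - weight) * BC + weight * text similarity."""
    return (1.0 - weight) * bc + weight * text_sim


def recommend(query, candidates, weight=0.2):
    """Rank candidate paper ids by hybrid score, highest first (ties by id)."""
    scored = []
    for pid, paper in candidates.items():
        bc = bibliographic_coupling(query["refs"], paper["refs"])
        txt = cosine_similarity(query["text"], paper["text"])
        scored.append((hybrid_score(bc, txt, weight), pid))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [pid for _, pid in scored]


# Hypothetical toy data: A shares references with the query, B shares wording.
query = {"refs": ["r1", "r2", "r3"], "text": "gene expression in yeast"}
candidates = {
    "A": {"refs": ["r1", "r2", "r4"], "text": "protein folding dynamics"},
    "B": {"refs": ["r5"], "text": "gene expression in yeast cells"},
    "C": {"refs": ["r6"], "text": "galaxy formation"},
}
print(recommend(query, candidates, weight=0.2))  # ['A', 'B', 'C']
print(recommend(query, candidates, weight=1.0))  # ['B', 'A', 'C']
```

With a weight of 0.2, paper A ranks first because of its shared references, while B, which is only textually similar, still ranks above the unrelated C. With weight 1.0 the score reduces to text similarity alone, and B moves to the top. This illustrates how the two signals can complement each other. A BM25 or PMRA score could be substituted for `cosine_similarity`, but because BM25 scores are not bounded to [0, 1], the weight would need to be tuned accordingly.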
