Citation recommendation helps researchers quickly find supplementary or alternative references in massive academic resources. Current research on citation recommendation focuses mainly on citing papers, so the much larger body of cited papers is ignored, including the relations among cited papers and the citation contexts in which citing papers refer to them. Moreover, a cited paper’s content is usually represented by its original title and abstract, which are hard to acquire and rarely reflect different citation motivations. Furthermore, it is unclear which method is most appropriate for semantically representing cited papers’ relations and content. This paper therefore studies citation recommendation from the perspective of the semantic representation of cited papers’ relations and content. First, four forms of citation context that take citation motivations into account are designed and extracted as cited papers’ content, and co-citation relationships are extracted as cited papers’ relations. Second, 132 methods are designed for generating the semantic vector of a cited paper: four network embedding methods, 16 methods that combine four text representation algorithms with four forms of citation content, and 112 fusion methods. Finally, the similarity among cited papers is calculated for citation recommendation, and a quantitative evaluation method based on link prediction is designed to find the most appropriate form of citation content and the optimal method. The results show that doc2vecC (Document to Vector through Corruption) with the CS&SS (Current Sentences and Surrounding Sentences) form performs best: its AUC (Area Under Curve) and MAP (Macro Average Precision) reach 0.877 and 0.889, which are 0.462 and 0.370 higher than those of the worst-performing method. Parameter adjustment slightly improves this performance, and a case study further supports the effectiveness of the method.
In addition, among the four forms of cited papers’ content, CS&SS performs best in almost all methods. Furthermore, the fusion methods do not always outperform the single methods: doc2vecC (CS&SS) performs better than the best fusion method, GCN (Graph Convolutional Network). These results not only demonstrate the effectiveness of citation recommendation from the perspective of cited papers, but also provide useful guidance for selecting methods and citation content. The data and conclusions can be extended to other text mining-related tasks. At the same time, this is preliminary research that needs to be extended to other domains using emerging semantic representation methods.
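The final step described above (similarity among cited papers, evaluated as link prediction) can be illustrated with a minimal sketch. The abstract does not state which similarity measure or AUC estimator the authors use, so cosine similarity and the pairwise-comparison AUC estimate below are assumptions, and the function names and toy vectors are hypothetical stand-ins for the semantic vectors that a method such as doc2vecC would produce.

```python
import numpy as np


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between row vectors (assumed measure)."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def recommend(sim, paper_idx, k=3):
    """Indices of the k cited papers most similar to paper_idx."""
    scores = sim[paper_idx].copy()
    scores[paper_idx] = -np.inf  # never recommend the paper itself
    return [int(i) for i in np.argsort(-scores)[:k]]


def link_prediction_auc(sim, positive_pairs, negative_pairs):
    """Probability that a held-out linked pair scores above an unlinked pair.

    Ties count as half a win, the usual convention for this estimator.
    """
    pos = [sim[i, j] for i, j in positive_pairs]
    neg = [sim[i, j] for i, j in negative_pairs]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy semantic vectors for four cited papers (hypothetical data).
vectors = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
sim = cosine_similarity_matrix(vectors)

top = recommend(sim, 0, k=1)
# Held-out co-citation links versus sampled non-links (hypothetical).
auc = link_prediction_auc(sim, [(0, 1), (2, 3)], [(0, 2), (1, 3)])
```

In this toy setup, papers 0 and 1 point in nearly the same direction, as do papers 2 and 3, so the held-out links score above the sampled non-links. On real data the positive pairs would be co-citation links removed before training, and the negative pairs would be sampled from paper pairs that are never co-cited.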