Abstract

Citation analysis is an active area of research for various reasons. So far, mainly statistical approaches have been used for citation analysis, and these do not look into the internal context of citations. Deeper analysis of citations with deep neural network algorithms may reveal interesting findings. The existing scholarly datasets are well suited to statistical approaches but lack citation context, intent, and section information. Furthermore, they are too small to be used with deep learning approaches. For citation intent analysis, a dataset must contain citation contexts labeled with different citation intent classes. Most existing datasets either lack labeled context sentences or have samples too small to generalize from. In this study, we critically investigated the available datasets for citation intent and proposed an automated technique for labeling citation contexts with citation intent. With the help of the proposed method, we then annotated ten million citation contexts from the Citation Context Dataset (C2D) with citation intent. We applied the Global Vectors (GloVe), InferSent, and Bidirectional Encoder Representations from Transformers (BERT) embedding methods and compared their Precision, Recall, and F1 scores. BERT embeddings performed significantly better, achieving a Precision of 89%. The labeled dataset, which is freely available for research purposes, will support further study of citation context analysis. Finally, it can be used as a benchmark dataset for identifying citation motivation and function from in-text citations.
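The comparison above rests on Precision, Recall, and F1 computed over citation intent classes. As a minimal sketch of how such per-class and macro-averaged scores can be computed, the snippet below uses only the Python standard library. The function names and the intent labels ("background", "method", "result") are hypothetical illustrations, not the paper's label set, and the paper does not state which averaging scheme it used; macro-averaging (the unweighted mean over classes) is assumed here.

```python
from collections import Counter


def per_class_prf(gold, pred):
    """Return {label: (precision, recall, f1)} for every label in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1  # correct prediction for class g
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[g] += 1  # true class g was missed
    scores = {}
    for label in set(gold) | set(pred):
        prec_den = tp[label] + fp[label]
        rec_den = tp[label] + fn[label]
        precision = tp[label] / prec_den if prec_den else 0.0
        recall = tp[label] / rec_den if rec_den else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of per-class precision, recall, and F1 (assumed scheme)."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


if __name__ == "__main__":
    # Hypothetical gold and predicted intent labels for five citation contexts.
    gold = ["background", "method", "result", "background", "method"]
    pred = ["background", "method", "background", "background", "result"]
    scores = per_class_prf(gold, pred)
    for label in sorted(scores):
        print(label, ["%.3f" % v for v in scores[label]])
    print("macro P/R/F1:", ["%.3f" % v for v in macro_average(scores)])
```

In this toy example, "background" has perfect recall but lower precision because one "result" citation was misclassified as background. A weighted average (by class frequency) would give different headline numbers, so the averaging scheme matters when comparing embeddings.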

Highlights

  • Citing research articles has always been an integral part of a research paper

  • Finding citation intent is vital for citation analysis, for which we need a substantial labeled dataset

  • This study provides a critical analysis of the existing datasets and discusses their limitations for citation intent extraction

Introduction

Citing research articles has always been an integral part of a research paper. Scientific papers cite other works for various reasons [1]; the reason behind a given citation is called its citation intent. Identifying the intent of a citation is crucial for analyzing scientific literature and the relationships among scientific articles. Many tasks, including citation intent classification, context analysis, research article recommendation, finding relevant papers, and creating citation networks, require state-of-the-art scholarly datasets, and the approaches proposed for each of these problems require different types of information. Citation intent can also play a role in measuring the worth of a journal, a research area, or a publication.
