Abstract

We investigated scientific research dissemination by analyzing publication and citation data, on the premise that not all citations are equally important. In contrast to existing state-of-the-art models, which rely on feature-based techniques to measure scholarly research dissemination between multiple entities, our model implements a convolutional neural network (CNN) with fastText-based pre-trained embedding vectors and uses only the citation context as input to distinguish between important and non-important citations. We further apply focal loss and class-weighting methods to address the inherent class imbalance in citation classification datasets. On a dataset of 10 K annotated citation contexts, we achieved an accuracy of 90.7% along with a 90.6% F1-score for binary classification. Finally, we present a case study evaluating the deployed model on 3100 K citations taken from the ACL Anthology Reference Corpus. We employed the state-of-the-art open-source graph visualization tool Gephi to analyze various aspects of the citation network graphs for each respective citation behavior.
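To illustrate the class-imbalance handling the abstract mentions, the following is a minimal sketch of binary focal loss with a class weight (alpha). The gamma and alpha values shown are common defaults, not values reported in the paper, and the function is illustrative rather than the authors' implementation.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for a single example.
    p: predicted probability of the positive (important-citation) class.
    y: true label (1 = important, 0 = non-important).
    gamma down-weights well-classified examples; alpha re-weights the
    classes. gamma=2.0, alpha=0.25 are common defaults, not paper values.
    """
    pt = p if y == 1 else 1.0 - p          # probability of the true class
    at = alpha if y == 1 else 1.0 - alpha  # class weight for the true class
    return -at * (1.0 - pt) ** gamma * math.log(pt)

# A confidently correct prediction contributes far less loss than a
# borderline one, so easy majority-class examples do not dominate training.
easy = focal_loss(0.95, 1)  # well-classified positive example
hard = focal_loss(0.55, 1)  # borderline positive example
assert hard > easy
```

With gamma = 0 and alpha = 0.5 the expression reduces (up to the constant weight) to ordinary cross-entropy, which is why focal loss is described as a generalization that focuses training on hard examples.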

Highlights

  • Citation analysis has been an active area of research to measure the impact of scientific publications [1,2,3,4]

  • We looked at the main communities with respect to the citation behaviors they exhibit, using the Association for Computational Linguistics (ACL) Anthology Reference Corpus (ARC) citation network

  • We examined our citation network in terms of three aspects: (a) we offered a variety of quantitative metrics to visualize the overall trend of ACL ARC data; (b) we displayed networks of selected papers to demonstrate the community interactions; (c) we illustrated the citation edges on the basis of their citation function using the top 1% nodes with respect to their node degree
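The last step above, selecting the top 1% of nodes by node degree, can be sketched as follows. The edge-list format and the helper name are hypothetical illustrations, not code or data from the paper.

```python
from collections import Counter

def top_degree_nodes(edges, fraction=0.01):
    """Return the top `fraction` of nodes ranked by (undirected) degree.
    `edges` is a list of (citing_paper, cited_paper) pairs.
    Hypothetical helper for illustration, not the authors' code."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1  # each edge contributes to both endpoints' degree
        degree[v] += 1
    k = max(1, int(len(degree) * fraction))  # keep at least one node
    return [node for node, _ in degree.most_common(k)]

# Tiny toy network: P2 is cited/citing most often, so it is selected.
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P4", "P2")]
print(top_degree_nodes(edges, fraction=0.25))  # -> ['P2']
```

On a real corpus the same ranking would typically be computed inside a graph tool such as Gephi, which exposes node degree as a built-in statistic for filtering.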

Introduction

Citation analysis has been an active area of research to measure the impact of scientific publications [1,2,3,4]. Since the inception of the internet, interactions and communication among scholars around the world have expanded. This gives rise to the concept of knowledge flow, the exchange of knowledge among different scientific communities [5,6,7]. Agarwal et al. [11] addressed the problems of high-dimensional input data and real-time consumption by introducing a scalable subspace clustering algorithm (SCuBA). They tested the proposed algorithm on a corpus from MovieLens (movie ratings by millions of users) and concluded that it outperforms the fallback clustering models at 15% precision, while being faster, more scalable, and producing high-quality recommendations.

