Abstract

Citation information in scholarly data is an important source of insight into the reception of publications and the scholarly discourse. Outcomes of citation analyses and the applicability of citation-based machine learning approaches heavily depend on the completeness of such data. One particular shortcoming of scholarly data nowadays is that non-English publications are often not included in data sets, or that language metadata is not available. Because of this, citations between publications of differing languages (cross-lingual citations) have only been studied to a very limited degree. In this paper, we present an analysis of cross-lingual citations based on over one million English papers, spanning three scientific disciplines and a time span of three decades. Our investigation covers differences between cited languages and disciplines, trends over time, and the usage characteristics as well as impact of cross-lingual citations. Among our findings are an increasing rate of citations to publications written in Chinese, citations being primarily to local non-English languages, and consistency in citation intent between cross- and monolingual citations. To facilitate further research, we make our collected data and source code publicly available.

Highlights

  • Citations are an essential tool for scientific practice

  • Concerning reason (2), the fact that unarXive is built from papers on the preprint server arxiv.org, and that the Microsoft Academic Graph (MAG) contains metadata on papers' preprint and published versions, allows us to analyze whether or not cross-lingual citations are affected by the peer review process

  • To assess the relative degree of self-citation when referring to publications in other languages, we compare the ratio of self-citations in (a) the cross-lingual citations within the documents of the cross-lingual set, and (b) the monolingual citations within the documents of the cross-lingual set
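
The comparison described in the last highlight can be illustrated with a minimal sketch. It is not the authors' implementation: the `Reference` record, the rule that a self-citation is any reference sharing at least one author with the citing paper, and the two-letter language codes are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """Hypothetical record for one reference in a citing paper."""
    citing_authors: frozenset
    cited_authors: frozenset
    cited_language: str  # assumed language code, e.g. "en" or "zh"


def is_self_citation(ref):
    # Assumption: a self-citation shares at least one author name.
    return bool(ref.citing_authors & ref.cited_authors)


def self_citation_ratio(refs):
    # Fraction of references that are self-citations (0.0 if empty).
    refs = list(refs)
    if not refs:
        return 0.0
    return sum(is_self_citation(r) for r in refs) / len(refs)


def compare_self_citation(refs, citing_language="en"):
    # (a) cross-lingual vs. (b) monolingual references, both drawn
    # from the same set of citing documents.
    cross = [r for r in refs if r.cited_language != citing_language]
    mono = [r for r in refs if r.cited_language == citing_language]
    return self_citation_ratio(cross), self_citation_ratio(mono)
```

Calling `compare_self_citation` on all references of the cross-lingual document set yields the two ratios side by side. In practice, matching authors by name alone is error-prone, so a real analysis would need author disambiguation.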

Summary

Introduction

Citations are an essential tool for scientific practice. By allowing authors to refer to existing publications, citations make it possible to position one’s work within the context of others’, to critique and compare, and to point readers to supplementary reading material. Because English is currently the de facto academic lingua franca [37], citations from non-English languages to English are significantly more prevalent than the other way around. This dichotomy is reflected in existing literature, where usually either citations from English [24,29] or to English [20,21,41,44] are analyzed. We conduct an analysis of cross-lingual citations in English papers that is considerably more extensive than existing literature in terms of corpus size as well as covered languages, time, and disciplines. This makes the results more representative of the areas covered and enables the use of our collected data for machine learning-based applications such as cross-lingual citation recommendation. Parts within the text of a paper that contain a marker connected to one of the reference section entries are called in-text citations.

Related work
Cross-lingual citations in academic publications
Cross-lingual interconnections in other types of media
Identification of cross-lingual citations
Data source selection
Data collection
#References
Prevalence
Self-citation
Geographical origin
Citation intent and sentiment
Method
Impact
Acceptance
Impact on paper success
Impact on citation data mining
Discussion and conclusion
