Abstract

Background: In this study, we present an analysis of data citation practices in full text research articles and their corresponding supplementary data files, made available in the Open Access set of articles from Europe PubMed Central. Our aim is to investigate whether supplementary data files should be considered as a source of information for integrating the literature with biomolecular databases.

Results: Using text-mining methods to identify and extract a variety of core biological database accession numbers, we found that the supplemental data files contain many more database citations than the body of the article, and that those citations often take the form of a relatively small number of articles citing large collections of accession numbers in text-based files. Moreover, citation of value-added databases derived from submission databases (such as Pfam, UniProt or Ensembl) is common, demonstrating the reuse of these resources as datasets in themselves. All the database accession numbers extracted from the supplementary data are publicly accessible from http://dx.doi.org/10.5281/zenodo.11771.

Conclusions: Our study suggests that supplementary data should be considered when linking articles with data, in curation pipelines, and in information retrieval tasks in order to make full use of the entire research article. These observations highlight the need to improve the management of supplemental data in general, in order to make this information more discoverable and useful.

Highlights

  • In this study, we present an analysis of data citation practices in full text research articles and their corresponding supplementary data files, made available in the Open Access set of articles from Europe PubMed Central

  • We investigated to what extent (1) publishers provide structurally annotated accession numbers in full text, (2) text mining extends publisher annotations and (3) text mining contributes to literature–database cross links

  • This open access article set is available on the Europe PubMed Central (PMC) FTP site and the linked supplementary data files are available via the Europe PMC RESTful web service


Summary

Introduction

We present an analysis of data citation practices in full text research articles and their corresponding supplementary data files, made available in the Open Access set of articles from Europe PubMed Central. Biomolecular and literature databases are a vital resource for the scientific community. Linking these resources enables scientists to access, analyse and process the data comprehensively. While some publishers tag (structurally annotate) accession numbers in the text of articles as part of their production process, this is not done comprehensively across all publishers [1]. In the absence of machine-actionable citation data, text mining [1,2,3,4] can be used to annotate accession numbers automatically across large volumes of published research articles. We investigated to what extent (1) publishers provide structurally annotated accession numbers in full text, (2) text mining extends publisher annotations and (3) text mining contributes to literature–database cross links. Our results show that text mining can significantly enrich publishers’ annotations and contribute to literature–database cross links (see [1] for details).
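To make the idea of annotating accession numbers by text mining concrete, the following is a minimal Python sketch of pattern-based extraction. It is not the pipeline used in the study: the function name `extract_accessions`, the `PATTERNS` table and the three regular expressions are illustrative assumptions, chosen only because Pfam, UniProt and Ensembl are named above. Patterns this simple will produce false positives on real articles, so a practical system would also need contextual checks and validation against the databases themselves.

```python
import re
from collections import defaultdict

# Illustrative patterns only (assumptions, not the ones used in the study).
PATTERNS = {
    # Pfam family accessions, e.g. PF00069
    "pfam": re.compile(r"\bPF\d{5}\b"),
    # UniProtKB accessions (6 or 10 characters), e.g. P04637
    "uniprot": re.compile(
        r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
    ),
    # Ensembl gene identifiers, e.g. ENSG00000141510 or ENSMUSG...
    "ensembl": re.compile(r"\bENS[A-Z]{0,3}G\d{11}\b"),
}


def extract_accessions(text):
    """Return a dict mapping database name to a sorted list of unique matches."""
    found = defaultdict(set)
    for database, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found[database].add(match.group(0))
    return {database: sorted(ids) for database, ids in found.items()}


if __name__ == "__main__":
    sample = "The kinase domain (PF00069) of P04637 is encoded by ENSG00000141510."
    print(extract_accessions(sample))
```

The same function could be applied both to the article body and to the text of a supplementary file, which allows the counts from the two sources to be compared per article.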

