PaperBot: open-source web-based search and metadata organization of scientific literature

Patricia Maraver, Giorgio A Ascoli, Todd A Gillette, Rubén Armañanzas

doi:10.1186/s12859-019-2613-z


Open Access


Journal: BMC Bioinformatics
Publication Date: Jan 24, 2019
Citations: 16
License type: open-access

Affiliation: Bioengineering Center, George Mason University

Abstract

Background: The biomedical literature is expanding at ever-increasing rates, and it has become extremely challenging for researchers to keep abreast of new data and discoveries even in their own domains of expertise. We introduce PaperBot, a configurable, modular, open-source crawler to automatically find and efficiently index peer-reviewed publications based on periodic full-text searches across publisher web portals.

Results: PaperBot may operate stand-alone or it can be easily integrated with other software platforms and knowledge bases. Without user interactions, PaperBot retrieves and stores the bibliographic information (full reference, corresponding email contact, and full-text keyword hits) based on pre-set search logic from a wide range of sources including Elsevier, Wiley, Springer, PubMed/PubMedCentral, Nature, and Google Scholar. Although different publishing sites require different search configurations, the common interface of PaperBot unifies the process from the user perspective. Once saved, all information becomes web accessible allowing efficient triage of articles based on their actual relevance and seamless annotation of suitable metadata content. The platform allows the agile reconfiguration of all key details, such as the selection of search portals, keywords, and metadata dimensions. The tool also provides a one-click option for adding articles manually via digital object identifier or PubMed ID. The microservice architecture of PaperBot implements these capabilities as a loosely coupled collection of distinct modules devised to work separately, as a whole, or to be integrated with or replaced by additional software. All metadata is stored in a schema-less NoSQL database designed to scale efficiently in clusters by minimizing the impedance mismatch between relational model and in-memory data structures.

Conclusions: As a testbed, we deployed PaperBot to help identify and manage peer-reviewed articles pertaining to digital reconstructions of neuronal morphology in support of the NeuroMorpho.Org data repository. PaperBot enabled the custom definition of both general and neuroscience-specific metadata dimensions, such as animal species, brain region, neuron type, and digital tracing system. Since deployment, PaperBot helped NeuroMorpho.Org more than quintuple the yearly volume of processed information while maintaining a stable personnel workforce.
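To make the idea of schema-less records with user-defined metadata dimensions more concrete, the following is a minimal sketch rather than PaperBot's actual data model. The field names, the `DIMENSIONS` set, the helper functions, and the placeholder DOI are all assumptions introduced for illustration; only the kinds of information stored (reference, contact email, keyword hits, annotated dimensions) come from the abstract.

```python
from typing import Any, Dict, List, Optional

# Hypothetical, user-configurable metadata dimensions. The concepts come from
# the NeuroMorpho.Org example in the abstract; the identifiers are assumptions.
DIMENSIONS = {"species", "brain_region", "neuron_type", "tracing_system"}


def make_article(title: str,
                 doi: Optional[str] = None,
                 pmid: Optional[str] = None,
                 contact_email: Optional[str] = None,
                 keyword_hits: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build a document that contains only the fields that are known.

    Because the record is schema-less, a missing DOI or PubMed ID is simply
    absent from the document instead of being stored as an empty column.
    """
    doc: Dict[str, Any] = {
        "title": title,
        "keyword_hits": list(keyword_hits or []),
        "metadata": {},
    }
    for key, value in (("doi", doi), ("pmid", pmid),
                       ("contact_email", contact_email)):
        if value is not None:
            doc[key] = value
    return doc


def annotate(doc: Dict[str, Any], dimension: str, value: str,
             dimensions=DIMENSIONS) -> Dict[str, Any]:
    """Attach one annotation value under a configured metadata dimension."""
    if dimension not in dimensions:
        raise KeyError(f"unknown metadata dimension: {dimension}")
    doc["metadata"].setdefault(dimension, []).append(value)
    return doc


# Usage with placeholder values (the DOI below is not a real article).
article = make_article("Example article", doi="10.1000/example",
                       keyword_hits=["reconstruction"])
annotate(article, "species", "mouse")
```

Reconfiguring the metadata dimensions in this sketch amounts to changing the `DIMENSIONS` set; existing documents need no migration because each one carries only the annotations it actually has.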

Highlights

The biomedical literature is expanding at ever-increasing rates, and it has become extremely challenging for researchers to keep abreast of new data and discoveries even in their own domains of expertise.
The tool we introduce with this work was initially developed to support the growth of and data acquisition for NeuroMorpho.Org, a data and knowledge repository aiming to provide unhindered access to all digital reconstructions of neuronal morphology [20].
Every keyword query is associated with a specific user-defined collection where the data are saved in the database. This design element provides additional flexibility: while PaperBot was created to aid dense literature coverage in a given domain, the ability to choose the destination collection for each query could be exploited by other projects with different needs, for instance where one or two references may be sufficient to support a relevant piece of knowledge.
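The association between keyword queries and user-defined collections can be sketched as follows. This is an illustrative in-memory stand-in for a document database, not PaperBot's code: the class `CollectionStore`, its method names, the collection names, and the article identifiers are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict


class CollectionStore:
    """In-memory stand-in for a document database (illustrative only)."""

    def __init__(self) -> None:
        # Maps each keyword query to the collection its hits are saved in.
        self._routes: Dict[str, str] = {}
        # Maps collection name -> {article id -> article document}.
        self._collections: Dict[str, Dict[str, Dict[str, Any]]] = defaultdict(dict)

    def register_query(self, query: str, collection: str) -> None:
        """Associate a keyword query with a user-defined collection."""
        self._routes[query] = collection

    def save_hit(self, query: str, article_id: str, doc: Dict[str, Any]) -> bool:
        """Save an article found by `query`; return False if already stored."""
        collection = self._routes[query]  # KeyError if the query is unregistered
        bucket = self._collections[collection]
        if article_id in bucket:
            return False
        bucket[article_id] = dict(doc)
        return True

    def articles(self, collection: str) -> Dict[str, Dict[str, Any]]:
        """Return a copy of the articles saved in one collection."""
        return dict(self._collections.get(collection, {}))


# Usage with placeholder queries and identifiers.
store = CollectionStore()
store.register_query('"neuron" AND "reconstruction"', "dense_coverage")
store.register_query('"tracing software"', "supporting_refs")
store.save_hit('"neuron" AND "reconstruction"', "doi:10.1000/a", {"title": "A"})
```

In this sketch the same article may appear in more than one collection, which matches the two use cases in the highlight: a dense-coverage collection for one domain and a small supporting-references collection for another project.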

Summary

Results

As a representative testbed of PaperBot, for over two years we harnessed the described functionalities in support of the data sharing repository NeuroMorpho.Org. The project’s success hinges on the systematic search for, and effective identification of, any new publications containing digitally reconstructed neuromorphological data, followed by a collegial invitation to the corresponding author(s) to share their dataset. This process entails a complex battery of combined keyword queries over several full-text search engines, followed by the critical evaluation and annotation of every article found. PaperBot vastly improved the search for relevant data from peer-reviewed publications, more than quintupling the yearly number of identified articles for NeuroMorpho.Org while eliminating human involvement in tedious, no-value-added steps. This tool, in whole or via appropriate combinations of its modules, could help other laboratories and projects improve their data acquisition pipelines and information curation workflows.
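A battery of combined keyword queries over several search engines can be sketched as a Cartesian product of keyword groups, repeated per portal. The sketch below is an assumption-laden illustration: the function names, the example keywords, and the quoted `AND` query syntax are hypothetical, and real portals each require their own search configuration, as the abstract notes.

```python
from itertools import product
from typing import List, Sequence, Tuple


def build_queries(keyword_groups: Sequence[Sequence[str]]) -> List[str]:
    """Combine one term from each group into a quoted AND query string.

    The AND syntax is a generic placeholder, not any portal's real syntax.
    """
    return [
        " AND ".join(f'"{term}"' for term in combo)
        for combo in product(*keyword_groups)
    ]


def build_battery(portals: Sequence[str],
                  keyword_groups: Sequence[Sequence[str]]) -> List[Tuple[str, str]]:
    """Pair every combined query with every portal to be searched."""
    queries = build_queries(keyword_groups)
    return [(portal, query) for portal in portals for query in queries]


# Usage with illustrative keywords; portal names are taken from the abstract.
battery = build_battery(
    ["PubMed", "Springer"],
    [["neuron", "dendrite"], ["reconstruction"]],
)
for portal, query in battery:
    print(portal, query)
```

The number of searches grows as the product of the group sizes times the number of portals, which is one reason running such a battery by hand is tedious and a periodic automated crawl is attractive.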

