Abstract

Background: Collecting scientific publications related to a specific topic is crucial for different phases of research, health care and ‘effective text mining’. Available bio-literature search engines vary in their ability to scan different sections of articles for the user-provided search terms and/or phrases. Since a thorough scientific analysis of all major bibliographic tools has not been done, their selection has often remained subjective. We considered most of the existing bio-literature search engines (http://www.shodhaka.com/startbioinfo/LitSearch.html) and performed an extensive analysis of 18 literature search engines over a period of about 3 years. Eight different topics were taken and about 50 searches were performed using the selected search engines. The relevance of retrieved citations was carefully assessed after every search to estimate the citation retrieval efficiency. Various other features of the search tools were also compared using a semi-quantitative method.

Results: The study provides the first tangible comparative account of relative retrieval efficiency, input and output features, resource coverage and a few other utilities of the bio-literature search tools. The results show that using a single search tool can lead to a loss of up to 75% of the relevant citations in some cases. Hence, use of multiple search tools is recommended. However, it would not be practical to use all, or too many, search engines. The detailed observations made in the study can assist researchers and health professionals in making a more objective selection among the search engines. A corollary study revealed relative advantages and disadvantages of the full-text scanning tools.

Conclusion: While many studies have attempted to compare literature search engines, important questions have remained unanswered to date. Following are some of those questions, along with answers provided by the current study:

a) Which tools should be used to get the maximum number of relevant citations with a reasonable effort? ANSWER: Using PubMed, Scopus, Google Scholar and HighWire Press individually, and then compiling the hits into a union list, is the best option. Citation-Compiler (http://www.shodhaka.com/compiler) can help to compile the results from each of the recommended tools.

b) What is the approximate percentage of relevant citations expected to be lost if only one search engine is used? ANSWER: About 39% of the total relevant citations were lost in searches across 4 topics; 49% of hits were lost while using PubMed or HighWire Press, while losses of 37% and 20% were noticed while using Google Scholar and Scopus, respectively.

c) Which full-text search engines can be recommended in general? ANSWER: HighWire Press and Google Scholar.

d) Among the most widely used search engines, which one can be recommended for best precision? ANSWER: EBIMed.

e) Among the most widely used search engines, which one can be recommended for best recall? ANSWER: Depending on the type of query used, best recall could be obtained by HighWire Press or Scopus.
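
The idea of compiling hits from several engines into a union list, and of measuring how much a single engine would lose relative to that list, can be illustrated with a minimal Python sketch. This is not the study's method or the Citation-Compiler implementation; the function names, engine labels and citation identifiers below are hypothetical, and the toy data are invented purely for illustration. The sketch assumes that each engine's relevant hits have already been identified and de-duplicated to a shared identifier.

```python
from typing import Dict, Set


def union_of_hits(hits_by_engine: Dict[str, Set[str]]) -> Set[str]:
    """Merge the relevant citations from every engine into one de-duplicated set."""
    merged: Set[str] = set()
    for hits in hits_by_engine.values():
        merged |= hits
    return merged


def percent_lost(hits_by_engine: Dict[str, Set[str]]) -> Dict[str, float]:
    """Percent of the union list that each engine, used alone, would miss."""
    union = union_of_hits(hits_by_engine)
    if not union:
        # No relevant citations at all: nothing can be lost.
        return {engine: 0.0 for engine in hits_by_engine}
    return {
        engine: 100.0 * len(union - hits) / len(union)
        for engine, hits in hits_by_engine.items()
    }


# Hypothetical toy data: engine labels and citation IDs are made up.
toy_hits = {
    "engine_A": {"c1", "c2", "c3", "c4"},
    "engine_B": {"c3", "c4", "c5"},
    "engine_C": {"c1", "c2", "c5", "c6", "c7", "c8"},
}

if __name__ == "__main__":
    print(len(union_of_hits(toy_hits)), "relevant citations in the union list")
    for engine, loss in percent_lost(toy_hits).items():
        print(f"{engine}: {loss:.1f}% of relevant citations lost")
```

Under this assumption, the loss for an engine is simply 100% minus its recall measured against the union list, so the union list acts as a stand-in for the full set of relevant citations, which is not known in practice.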

Highlights

  • The search for published scientific literature is a very basic, yet crucial component of hypothesis generation, protocol selection, experimental design and interpretation of observations in biology

  • The results show that using a single search tool can lead to a loss of up to 75% of the relevant citations in some cases

  • The detailed observations made in the study can assist researchers and health professionals in making a more objective selection among the search engines

Summary

Introduction

The search for published scientific literature is a very basic, yet crucial, component of hypothesis generation, protocol selection, experimental design and interpretation of observations in biology. It is not possible to comment on the extensiveness of the information derived by such programs. This is mainly because we do not know if PubMed can retrieve all (or most of) the relevant citations. A relative account of the retrieval efficiencies has not been established for most of the search engines. Collecting scientific publications related to a specific topic is crucial for different phases of research, health care and ‘effective text mining’. Available bio-literature search engines vary in their ability to scan different sections of articles for the user-provided search terms and/or phrases. Eight different topics were taken and about 50 searches were performed using the selected search engines. Various other features of the search tools were compared using a semi-quantitative method.

