Abstract

Background

As the availability of open access full text research articles increases, so does the need for sophisticated search services that make the most of this new content. Here, we present a new feature available in Europe PMC that allows selected sections of full text articles to be searched, including figures and reference lists. Users can now search particular parts of an article, reducing noise and allowing fine-tuning of searches.

Results

To the best of our knowledge, Europe PMC is the first service that provides a granular literature search by allowing users to target their search to particular sections of articles. This new functionality is based on a heuristic algorithm that identifies and categorises article sections into 17 pre-defined categories based on the section heading. Measured against a manually curated dataset of 100 full text articles, the tagger achieves an F-score of 98.02%.

Conclusions

The section search is available from the advanced search within Europe PMC (http://europepmc.org). The source code is freely available from http://europepmc.org/ftp/oa/SectionTagger/.

Electronic supplementary material

The online version of this article (doi:10.1186/s13326-015-0003-7) contains supplementary material, which is available to authorized users.
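The heading-based categorisation and its F-score evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the published SectionTagger code: the category labels, the regular expressions, and the function names `tag_heading` and `f_score` are assumptions made for illustration, and only six of the 17 categories are represented.

```python
import re
from typing import List, Sequence

# Hypothetical rules: labels and patterns are illustrative assumptions,
# not the 17 categories or the heuristics of the published tagger.
RULES = [
    ("INTRODUCTION", re.compile(r"\b(introduction|background)\b")),
    ("METHODS", re.compile(r"\b(methods?|materials|methodology)\b")),
    ("RESULTS", re.compile(r"\b(results?|findings)\b")),
    ("DISCUSSION", re.compile(r"\bdiscussion\b")),
    ("CONCLUSIONS", re.compile(r"\bconclusions?\b")),
    ("REFERENCES", re.compile(r"\b(references|bibliography)\b")),
]


def tag_heading(heading: str) -> List[str]:
    """Return every category whose pattern matches the section heading."""
    # Lower-case the heading and drop leading numbering such as "2.1 ".
    text = re.sub(r"^[\d.\s]+", "", heading.lower()).strip()
    return [label for label, pattern in RULES if pattern.search(text)]


def f_score(predicted: Sequence[Sequence[str]],
            gold: Sequence[Sequence[str]]) -> float:
    """Micro-averaged F1 over per-heading label sets."""
    true_pos = sum(len(set(p) & set(g)) for p, g in zip(predicted, gold))
    n_pred = sum(len(set(p)) for p in predicted)
    n_gold = sum(len(set(g)) for g in gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / n_pred
    recall = true_pos / n_gold
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up headings and gold labels, for demonstration only.
    headings = ["1. Background", "Materials and Methods",
                "Results and Discussion", "Acknowledgements"]
    gold = [["INTRODUCTION"], ["METHODS"],
            ["RESULTS", "DISCUSSION"], []]
    predicted = [tag_heading(h) for h in headings]
    print(predicted)
    print(f_score(predicted, gold))
```

Returning a list rather than a single label lets a combined heading such as "Results and Discussion" fall into both categories. Whether the published tagger handles combined headings this way is not stated in the abstract.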

Highlights

  • As the availability of open access full text research articles increases, so does the need for sophisticated search services that make the most of this new content

  • Analysis of the open access full text articles: the section tagger only operates on full text articles that are available as eXtensible Markup Language (XML), since OCR content lacks parsable section headings

  • We analysed the coverage of the section tagger on the open access (OA) article set (Figure 3)
