Abstract
Background
As the availability of open access full text research articles increases, so does the need for sophisticated search services that make the most of this new content. Here, we present a new feature available in Europe PMC that allows selected sections of full text articles to be searched, including figures and reference lists. Users can now search particular parts of an article, reducing noise and allowing fine-tuning of searches.
Results
To the best of our knowledge, Europe PMC is the first service that provides a granular literature search by allowing users to target their search to particular sections of articles. This new functionality is based on a heuristic algorithm that identifies and categorises article sections into 17 pre-defined categories based on the section heading. Evaluated against a manually curated dataset of 100 full text articles, the tagger achieves an F-score of 98.02%.
Conclusions
The section search is available from the advanced search within Europe PMC (http://europepmc.org). The source code is freely available from http://europepmc.org/ftp/oa/SectionTagger/.
Electronic supplementary material
The online version of this article (doi:10.1186/s13326-015-0003-7) contains supplementary material, which is available to authorized users.
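The heading-based categorisation described in the Results can be illustrated with a minimal sketch. It assumes a simple keyword-matching heuristic; the category labels and patterns below are hypothetical and cover only a handful of cases, not the 17 categories or the actual rules of the published tagger.

```python
import re

# Hypothetical label/pattern pairs for illustration only. The published
# tagger defines 17 categories; their names and rules are not reproduced here.
SECTION_PATTERNS = [
    ("INTRO", re.compile(r"\b(introduction|background)\b", re.I)),
    ("METHODS", re.compile(r"\b(methods?|materials|experimental procedures)\b", re.I)),
    ("RESULTS", re.compile(r"\b(results?|findings)\b", re.I)),
    ("DISCUSS", re.compile(r"\bdiscussion\b", re.I)),
    ("CONCL", re.compile(r"\b(conclusions?|concluding remarks)\b", re.I)),
    ("REF", re.compile(r"\b(references|bibliography)\b", re.I)),
]


def normalise_heading(heading):
    """Strip leading numbering such as '2.1 ' or 'IV. ' and collapse whitespace."""
    heading = re.sub(r"^\s*(\d+(\.\d+)*|[IVXLC]+)[.)]?\s+", "", heading)
    return re.sub(r"\s+", " ", heading).strip()


def tag_heading(heading):
    """Return every category whose pattern matches the heading.

    A combined heading such as 'Results and Discussion' receives two labels;
    an unrecognised heading receives an empty list.
    """
    text = normalise_heading(heading)
    return [label for label, pattern in SECTION_PATTERNS if pattern.search(text)]
```

Returning a list rather than a single label is a design choice of this sketch: it lets combined headings be searched under each of their constituent categories.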
Highlights
As the availability of open access full text research articles increases, so does the need for sophisticated search services that make the most of this new content
Analysis of the open access full text articles: The section tagger only operates on full text articles that are available as eXtensible Markup Language (XML), since OCR content lacks parsable section headings
We analysed the coverage of the section tagger on the open access (OA) article set (Figure 3)
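Because the tagger depends on section headings being parsable from XML, the first step can be sketched as pulling heading text out of the markup. The sketch below assumes JATS-style markup in which each section is a `sec` element with a `title` child; the `coverage` function is a hypothetical measure (the fraction of articles with at least one recognised heading) and may differ from the definition used for Figure 3.

```python
import xml.etree.ElementTree as ET


def section_titles(xml_text):
    """Return the heading text of every <sec> element that has a <title> child.

    Assumes JATS-style markup; inline tags inside the title (for example
    <italic>) are flattened to plain text.
    """
    root = ET.fromstring(xml_text)
    titles = []
    for sec in root.iter("sec"):
        title = sec.find("title")
        if title is not None:
            titles.append("".join(title.itertext()).strip())
    return titles


def coverage(articles, is_recognised):
    """Fraction of articles with at least one heading accepted by is_recognised.

    `articles` is a list of XML strings; `is_recognised` is any predicate on a
    heading string, such as a wrapper around a section tagger.
    """
    if not articles:
        return 0.0
    tagged = sum(
        1 for xml_text in articles
        if any(is_recognised(title) for title in section_titles(xml_text))
    )
    return tagged / len(articles)
```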
Summary
To the best of our knowledge, Europe PMC is the first service that provides a granular literature search by allowing users to target their search to particular sections of articles. This new functionality is based on a heuristic algorithm that identifies and categorises article sections into 17 pre-defined categories based on the section heading. Evaluated against a manually curated dataset of 100 full text articles, the tagger achieves an F-score of 98.02%.
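For readers unfamiliar with the metric, the F-score is the harmonic mean of precision and recall. The sketch below shows one way to compute it for section labels, assuming micro-averaging over all sections; the text here does not state which averaging the evaluation used, and any labels passed in would be the caller's own data, not the curated dataset.

```python
def f_score(gold, predicted):
    """Micro-averaged precision, recall and F1 over per-section label sets.

    `gold` and `predicted` are equal-length lists of sets of category labels,
    one set per section heading.
    """
    tp = fp = fn = 0
    for gold_labels, predicted_labels in zip(gold, predicted):
        tp += len(gold_labels & predicted_labels)   # labels correctly assigned
        fp += len(predicted_labels - gold_labels)   # labels assigned in error
        fn += len(gold_labels - predicted_labels)   # labels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```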