Abstract

Background: To date, many of the methods for extracting biological information from scientific articles are restricted to the abstract of the article. However, full text articles in electronic version, which offer larger sources of data, are currently available. Several questions arise as to whether the effort of scanning full text articles is worthwhile, and whether the information that can be extracted from the different sections of an article is relevant.

Results: In this work we addressed those questions, showing that the keyword content of the different sections of a standard scientific article (abstract, introduction, methods, results, and discussion) is very heterogeneous.

Conclusions: Although the abstract contains the best ratio of keywords to total words, other sections of the article may be a better source of biologically relevant data.

Highlights

  • One gene name (Pom1) could refer to two non-homologous genes, and another (Sac1) to four; such polysemous gene names complicate gene identification from text

  • We hope that future interfaces for writers of Molecular Biology articles will do the job, subject to validation by the authors

  • Derivation of associations between the words of a section: given a section from an article, we split the text into sentences using a standard part-of-speech tagger (TreeTagger)
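
The sentence-splitting step can be illustrated with a minimal sketch. The source only states that sections are split into sentences with TreeTagger; everything else here is an assumption made for illustration: a naive regular-expression splitter stands in for the tagger, an "association" is taken to mean two words appearing in the same sentence, and the function names `split_sentences` and `cooccurrences` and the sample text are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def split_sentences(text):
    # Assumption: a naive stand-in for tagger-based splitting.
    # Break after '.', '!' or '?' when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def cooccurrences(text):
    # Assumption: two words are "associated" if they share a sentence.
    # Each unordered pair is counted once per sentence it appears in.
    counts = Counter()
    for sentence in split_sentences(text):
        words = sorted(set(re.findall(r"[A-Za-z0-9]+", sentence.lower())))
        counts.update(combinations(words, 2))
    return counts


# Invented sample text, for illustration only.
section = "Pom1 regulates cell polarity. Pom1 is a kinase. Sac1 is a phosphatase."
pairs = cooccurrences(section)
```

Pairs are stored as alphabetically sorted tuples, so `pairs[("cell", "pom1")]` and not `pairs[("pom1", "cell")]` holds the count. A real pipeline would likely also filter stop words such as "a" and "is" before counting.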


Summary

Introduction

One gene name (Pom1) could refer to two non-homologous (unrelated) genes, and another (Sac1) to four; such polysemous gene names complicate gene identification from text. There is a clear need for information extraction of biological data from full text scientific articles, and the means are available: computers become better suited for fast computation every day, and new methodologies for Natural Language Processing can be applied to biomedical literature (see for example [14]). Many of the methods for extracting biological information from scientific articles are restricted to the abstract of the article. Several questions arise as to whether the effort of scanning full text articles is worthwhile, and whether the information that can be extracted from the different sections of an article is relevant. Nowadays most journals are available in electronic version, and full text articles can be used for information extraction. An abstract, as a summary, contains a high frequency of relevant terms (keywords), but this may not be the case for the rest of the article.
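
The notion of keyword frequency per section can be made concrete with a small sketch. It assumes that the measure is simply the number of keyword occurrences divided by the total number of words in a section; the keyword set, the section texts, and the function name `keyword_ratio` are invented for illustration and are not taken from the study.

```python
import re


def keyword_ratio(text, keywords):
    # Fraction of word tokens in `text` that belong to `keywords`.
    # Assumption: tokens are lowercase alphanumeric runs.
    words = re.findall(r"[A-Za-z0-9]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in keywords)
    return hits / len(words)


# Toy data, invented for illustration only.
keywords = {"pom1", "sac1", "kinase"}
sections = {
    "abstract": "Pom1 is a kinase",
    "methods": "Cells were grown overnight and Sac1 was tagged",
}
ratios = {name: keyword_ratio(text, keywords) for name, text in sections.items()}
```

A ratio of this kind favours short, dense sections; a long section can have a lower ratio and still contain more keyword occurrences in absolute terms.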
