Sequential pattern mining for discovering gene interactions and their contextual information from biomedical texts.

Peggy Cellier, Jiří Kléma, Christophe Rigotti, Olivier Gandrillon, Marc Plantevit, Thierry Charnois, Jean-Luc Manguin, Bruno Crémilleux

doi:10.1186/s13326-015-0023-3

Abstract

Background: Discovering gene interactions and their characterizations from biological text collections is a crucial issue in bioinformatics. Text collections are large, and it is very difficult for biologists to take full advantage of this amount of knowledge. Natural Language Processing (NLP) methods have been applied to extract background knowledge from biomedical texts. Some existing NLP approaches are based on handcrafted rules and are thus time-consuming and often tied to a specific corpus. Machine learning based NLP methods give good results but generate outcomes that are not easily understandable by a user.

Results: We take advantage of a hybridization of data mining and natural language processing to propose an original symbolic method that automatically produces patterns conveying gene interactions and their characterizations. Our method therefore detects not only gene interactions but also semantic information on the extracted interactions (e.g., modalities, biological contexts, interaction types). Only a limited resource is required: the text collection used as a training corpus. Our approach gives results comparable to those of state-of-the-art methods and is even better for gene interaction detection in AIMed.

Conclusions: Experiments show how our approach enables the discovery of interactions and their characterizations. To the best of our knowledge, there are few methods that automatically extract interactions together with associated semantic information. The gene interactions extracted from PubMed are available through a simple web interface at https://bingotexte.greyc.fr/. The software is available at https://bingo2.greyc.fr/?q=node/22.
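To make the idea of patterns conveying gene interactions concrete, the following is a minimal sketch of frequent sequential pattern mining over tokenized sentences. It is a naive enumeration written for illustration, not the authors' algorithm: the `GENE` placeholder token, the toy corpus, the function names, and the `min_support` and `max_len` parameters are all assumptions introduced here.

```python
from collections import Counter
from itertools import combinations


def mine_sequential_patterns(sentences, min_support=2, max_len=4):
    """Return {pattern: support} for ordered token subsequences (gaps allowed).

    Naive enumeration: feasible only for short sentences and small max_len.
    """
    support = Counter()
    for tokens in sentences:
        seen = set()
        for length in range(1, max_len + 1):
            # combinations() yields increasing indices, so token order is kept
            for idx in combinations(range(len(tokens)), length):
                seen.add(tuple(tokens[i] for i in idx))
        support.update(seen)  # a sentence supports each pattern at most once
    return {p: s for p, s in support.items() if s >= min_support}


def interaction_patterns(patterns, gene_token="GENE"):
    """Keep patterns with exactly two gene placeholders plus other tokens."""
    return {
        p: s
        for p, s in patterns.items()
        if p.count(gene_token) == 2 and len(p) > 2
    }


# Toy corpus (hypothetical): gene names already replaced by a placeholder.
corpus = [
    "GENE interacts with GENE in vivo".split(),
    "GENE interacts directly with GENE".split(),
    "GENE may inhibit GENE".split(),
]

patterns = interaction_patterns(mine_sequential_patterns(corpus, min_support=2))
for pattern, count in sorted(patterns.items()):
    print(" ".join(pattern), count)
```

Because each retained pattern is a readable token sequence, a user can inspect it directly, which is the kind of understandability the abstract contrasts with machine learning outputs. A practical miner would avoid exhaustive enumeration and prune the search space with constraints.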

Highlights

Literature on biology and medicine represents a huge amount of knowledge: more than 24 million publications are currently listed in the PubMed repository [1].
Biologists know from the context whether a sentence is about a protein or a gene.
Application: detection and characterization of gene interactions. We have evaluated the quality of the sequential patterns found in the previous section as information extraction rules.

Summary

Introduction

Literature on biology and medicine represents a huge amount of knowledge: more than 24 million publications are currently listed in the PubMed repository [1]. These text collections are large, and it is difficult for biologists to take full advantage of this amount of knowledge. NLP, and Information Extraction (IE) in particular, aims to provide accurate processing to extract specific knowledge such as named entities (e.g., gene, protein) and relationships between the recognized entities (e.g., gene-gene interactions, biological functions). Databases such as BioGRID [2] or STRING [3] store a large collection of interactions derived from different sources and indicate which gene. Machine learning based NLP methods give good results but generate outcomes that are not easily understandable by a user.

Methods

Results

Discussion

Conclusion

Journal: Journal of Biomedical Semantics	Publication Date: May 18, 2015
Citations: 44	License type: CC BY 4.0
