Abstract

A wealth of knowledge about the relations between genes and their associated diseases is present in the biomedical literature. Mining these associations from the literature can greatly support research ranging from the identification of drug-targetable pathways to biomarker discovery. However, the time and cost of manual curation slow this work considerably. In this context, biomedical text mining has become a crucial technology, and relation extraction has shown promising results for studying genes associated with diseases. We address this problem from a text mining perspective by automatically extracting gene-disease associations from the literature using joint ensemble learning. We employ a supervised machine learning approach in which a rich feature set covering conceptual, syntactic and semantic properties, jointly learned with word embeddings, is used to train an ensemble of support vector machines that extracts gene-disease relations from four gold standard corpora. On evaluation, the approach achieved F-measures of 85.34%, 83.93%, 87.39% and 85.57% on the EUADR, GAD, CoMAGC and PolySearch corpora, respectively. We strongly believe that this novel approach, which combines a rich syntactic and semantic feature set with domain-specific word embeddings through ensemble support vector machines and is evaluated on four gold standard corpora, can serve as a new baseline for future work on gene-disease relation extraction from the literature.
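
The abstract does not specify the individual features, the embedding model or how the support vector machines are combined, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. Two hypothetical hand-crafted features (normalised distance between the gene and disease mentions, and a negation cue) stand in for the conceptual and syntactic features; a tiny made-up embedding table stands in for domain-specific word embeddings; and the ensemble is assumed to be plain bagging of linear SVMs (each fitted with the Pegasos stochastic sub-gradient method) combined by majority vote. All gene and disease names are placeholders. Only the Python standard library is used.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy word vectors; a real system would load domain-specific
# embeddings (e.g. trained on biomedical text) instead.
EMBEDDINGS: Dict[str, List[float]] = {
    "mutation": [0.9, 0.1],
    "causes": [0.8, 0.3],
    "associated": [0.7, 0.4],
    "not": [-0.6, 0.2],
}
DIM = 2


def featurize(tokens: Sequence[str], gene_idx: int, disease_idx: int) -> List[float]:
    """Concatenate simple hand-crafted features with a mean word embedding."""
    lo, hi = sorted((gene_idx, disease_idx))
    between = tokens[lo + 1:hi]
    # Hypothetical stand-ins for syntactic/conceptual features:
    # normalised mention distance and a negation cue between the mentions.
    distance = (hi - lo) / len(tokens)
    negated = 1.0 if "not" in between else 0.0
    # Stand-in for semantic features: mean embedding of known words in between.
    vecs = [EMBEDDINGS[t] for t in between if t in EMBEDDINGS]
    mean = [sum(v[d] for v in vecs) / len(vecs) if vecs else 0.0 for d in range(DIM)]
    return [distance, negated] + mean + [1.0]  # trailing 1.0 acts as a bias input


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X: List[List[float]], y: List[int],
                     lam: float = 0.1, epochs: int = 1000, seed: int = 0) -> List[float]:
    """Linear SVM fitted with the Pegasos stochastic sub-gradient method."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1  # hinge loss is active
            w = [(1 - eta * lam) * wj for wj in w]  # L2 regularisation shrink
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def train_ensemble(X: List[List[float]], y: List[int],
                   n_models: int = 5, seed: int = 0) -> List[List[float]]:
    """Bagging: each SVM is trained on a bootstrap sample drawn per class."""
    rng = random.Random(seed)
    models = []
    for m in range(n_models):
        idx: List[int] = []
        for cls in (1, -1):
            members = [i for i, lab in enumerate(y) if lab == cls]
            idx.extend(rng.choice(members) for _ in members)
        models.append(train_linear_svm([X[i] for i in idx], [y[i] for i in idx], seed=seed + m))
    return models


def predict(models: List[List[float]], x: List[float]) -> int:
    """Majority vote over the ensemble: +1 = relation, -1 = no relation."""
    votes = sum(1 if dot(w, x) >= 0 else -1 for w in models)
    return 1 if votes > 0 else -1


# Hypothetical labelled sentences: (tokens, gene index, disease index, label).
DATA: List[Tuple[List[str], int, int, int]] = [
    (["GENE_A", "mutation", "causes", "DISEASE_A"], 0, 3, 1),
    (["GENE_B", "variant", "associated", "with", "DISEASE_B"], 0, 4, 1),
    (["GENE_C", "expansion", "causes", "DISEASE_C"], 0, 3, 1),
    (["GENE_D", "is", "not", "associated", "with", "DISEASE_D"], 0, 5, -1),
    (["GENE_E", "does", "not", "cause", "DISEASE_E"], 0, 4, -1),
    (["GENE_F", "is", "not", "linked", "to", "DISEASE_F"], 0, 5, -1),
]
X = [featurize(toks, g, d) for toks, g, d, _ in DATA]
Y = [label for *_, label in DATA]
models = train_ensemble(X, Y)
print([predict(models, x) for x in X])
```

Drawing the bootstrap sample separately for each class is a design choice made here so that no ensemble member is trained on a single class; an odd number of members avoids tied votes.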

Highlights

  • Advances in science and technology are a major driver of the rapid growth of scientific publications, especially in the field of biomedicine [1]

  • To evaluate the performance of the current study, we conducted a series of relation extraction experiments on the EUADR, Genetic Association Database (GAD), CoMAGC and PolySearch corpora

  • To assess our proposed methodology, we compared its results with those of other text mining tools, including BeFree [26], PKDE4J [28] and PolySearch2 [30]
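
Comparisons of this kind are typically made on the F-measure quoted in the abstract, i.e. the harmonic mean of precision and recall, computed against each corpus's gold annotations. The sketch below assumes binary relation labels (1 = related, -1 = not related) and the standard, unweighted definition; the gold labels and the two system outputs are hypothetical and are not results from this study.

```python
from typing import Dict, List, Sequence, Tuple


def precision_recall_f1(gold: Sequence[int], pred: Sequence[int],
                        positive: int = 1) -> Tuple[float, float, float]:
    """Binary precision, recall and F-measure (harmonic mean of P and R)."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold labels and system outputs for one corpus (not paper results).
gold = [1, 1, 1, 1, -1, -1, -1, -1]
systems: Dict[str, List[int]] = {
    "system_a": [1, 1, 1, -1, 1, -1, -1, -1],
    "system_b": [1, 1, -1, -1, -1, -1, -1, -1],
}
for name, pred in systems.items():
    p, r, f = precision_recall_f1(gold, pred)
    print(f"{name}: P={p:.2%} R={r:.2%} F={f:.2%}")
```

Reporting precision and recall alongside the F-measure shows whether a gap between systems comes from missed relations or from spurious ones.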

Summary

Introduction

Advances in science and technology are a major driver of the rapid growth of scientific publications, especially in the field of biomedicine [1]. Scientific progress in disease research has led to discoveries about molecular and cellular components and has revealed new insights into genetic alterations and signaling pathways [2]. To keep up with new findings and to generate valid insights, researchers must carry out difficult and tedious manual reading and analysis. In recent years, biomedical text mining has evolved and produced exceptional results in knowledge discovery through its ability to process biomedical and scientific literature automatically and at large scale [4].
