Abstract

Due to the ever-expanding volume of biomedical publications, biologists must retrieve up-to-date information from a vast literature to ensure that they do not overlook significant publications. Extracting information from biomedical texts automatically is therefore increasingly important. This paper focuses on automatically identifying relationships between human genetic diseases and genes in the biomedical literature. The experimental data are retrieved from the Mendelian Inheritance in Man (MIM) literature associated with the morbid file of the Online Mendelian Inheritance in Man (OMIM) database. We propose a hybrid method that combines rule learning with statistical techniques. To collect the research corpus, the first step is to find sentences that mention both a human genetic disease and a related gene listed in the morbid file; these are regarded as correct sentences. In the second step, sentences that mention neither the related human genetic diseases nor the genes listed in the morbid file are randomly selected; these are regarded as incorrect sentences. Next, the Memory-Based Shallow Parser analyzes these sentences to extract the information used for rule learning in the following step. A rule learner, the ALEPH system, then induces a set of rules, which are applied to capture disease-gene pairs occurring within a single sentence. The study then proposes a statistical approach, called the Z-score method, to determine whether each extracted pair is valid. Finally, experiments are conducted under several constraints and with different numbers of rules, and they are evaluated using precision, recall, and F-score.
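The abstract names a Z-score step for validating extracted pairs and an evaluation by precision, recall, and F-score, but it does not define the Z-score method. The Python sketch below is a hypothetical illustration, not the authors' definition: it assumes each candidate (disease, gene) pair is scored by the number of sentences in which the learned rules extracted it, standardizes those counts as z = (x - mean) / standard deviation, and keeps pairs whose z reaches an assumed threshold. The evaluation part uses the standard set-based definitions of precision and recall, assuming the balanced F1 score. The function names, the toy pairs `D1`/`G1` and so on, and the threshold are all hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical: a candidate pair is (disease, gene).
Pair = tuple[str, str]


def z_score_filter(pair_counts: dict[Pair, int], threshold: float = 1.0) -> set[Pair]:
    """Keep pairs whose standardized extraction count is at least `threshold`.

    ASSUMPTION: the paper's Z-score method is not defined in the abstract.
    Here each pair's count is standardized against all candidate pairs:
        z = (count - mean) / population standard deviation
    """
    counts = list(pair_counts.values())
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        # Every pair has the same count, so no pair stands out.
        return set()
    return {pair for pair, c in pair_counts.items() if (c - mu) / sigma >= threshold}


def evaluate(predicted: set[Pair], gold: set[Pair]) -> tuple[float, float, float]:
    """Precision, recall and F1 of predicted pairs against gold-standard pairs."""
    tp = len(predicted & gold)  # correctly predicted pairs
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


if __name__ == "__main__":
    # Toy, hypothetical counts of rule-extracted sentences per pair.
    counts = {("D1", "G1"): 12, ("D2", "G2"): 9, ("D1", "G2"): 1, ("D3", "G3"): 2}
    gold = {("D1", "G1"), ("D2", "G2"), ("D3", "G3")}
    kept = z_score_filter(counts, threshold=0.5)
    print(sorted(kept))  # [('D1', 'G1'), ('D2', 'G2')]
    p, r, f = evaluate(kept, gold)
    print(f"precision={p:.3f} recall={r:.3f} F-score={f:.3f}")
    # precision=1.000 recall=0.667 F-score=0.800
```

The population standard deviation is used because, in this sketch, the candidate pairs are treated as the whole population being scored. Raising the threshold trades recall for precision, which is one way different constraints could shift the reported metrics.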

Highlights

  • Substantial research effort has focused on developing methods for automatically processing the biological and medical scientific literature, a process often referred to as biomedical text mining.

Summary

Introduction

The number of biomedical publications available on the World Wide Web is increasing rapidly. For example, more than 21 million biomedical articles are currently available in MEDLINE.¹ Given this rapid growth, if biologists had to read through all retrieved texts manually to find the information they need, keeping up with new findings would be very difficult. Some researchers compute similarity values between genes and diseases based on Gene Ontology (GO) [5] or Disease Ontology (DO) [6] terms [7, 8, 9]. Other controlled vocabularies, such as MeSH [10], have already been used to link proteins to disease terminologies [11, 12, 13]. Network-based approaches to analyzing gene-disease relationships have been proposed in [25, 26, 27]. These works demonstrate that associating genes with diseases is an active area of research, as it can lead to a better understanding of diseases and can reduce both the time and the cost of developing effective drugs and treatments.
