Survey of Natural Language Processing Techniques in Bioinformatics.

Zhiqiang Zeng, Zhiling Hong, Yun Wu, Hua Shi

doi:10.1155/2015/674296


Open Access

https://doi.org/10.1155/2015/674296


Abstract

Informatics methods such as text mining and natural language processing are widely used in bioinformatics research. In this study, we discuss text mining and natural language processing methods in bioinformatics from two perspectives. First, we consider how text mining methods are used to search for biological knowledge, retrieve references, and reconstruct databases; for example, protein-protein interactions and gene-disease relationships can be mined from PubMed. Second, we analyze the applications of text mining and natural language processing techniques in bioinformatics, including predicting protein structure and function and detecting noncoding RNA. Finally, we discuss numerous methods and applications, as well as their contributions to bioinformatics, for future use by text mining and natural language processing researchers.

Highlights

Text mining and natural language processing refer to comprehending and analyzing natural language by using computer algorithms and programs
Researchers have predicted the structures and functions of proteins. Based on these two aspects, we summarize the text mining technologies used in bioinformatics research
As research on natural language and text mining methods develops, different application fields will be the key to future studies

Summary

Introduction

Text mining and natural language processing refer to comprehending and analyzing natural language by using computer algorithms and programs. Together they form an important research direction in applied artificial intelligence. With continuous and extensive research on machine learning and data mining algorithms, existing text mining technologies have achieved good results in automatic abstracting, automatic question answering, web relational network analysis, and anaphora resolution [1, 2]. Bioinformatics is an interdisciplinary field that emerged with the progress and accomplishments of the Human Genome Project. It addresses life science problems related to genetics by using computational and statistical methods. The National Center for Biotechnology Information established various databases for biological data, including sequence databases for storing DNA and protein data (e.g., dbEST and dbSNP) [8, 9], the Online Mendelian Inheritance in Man database for storing disease data, Gene
