Abstract

In recent years, the number of known proteins with uncharacterized function has grown massively with the advancement of high-throughput microarray technologies. Protein function prediction is one of the most challenging problems in bioinformatics. In the past, homology-based approaches were used to predict protein function, but they failed when a new protein was dissimilar to previously characterized ones. To alleviate the problems associated with these traditional homology-based approaches, numerous computational intelligence techniques have been proposed in the recent past. This paper presents a comprehensive state-of-the-art review of computational intelligence techniques for protein function prediction using sequence, structure, protein-protein interaction network, and gene expression data. The applications covered include prediction of DNA and RNA binding sites, subcellular localization, enzyme functions, signal peptides, catalytic residues, nuclear and G-protein coupled receptors, and membrane proteins, as well as pathway analysis from gene expression datasets. The paper also summarizes the results obtained by many researchers who applied computational intelligence techniques to these problems, using appropriate datasets to improve prediction performance. The summary shows that ensemble classifiers and the integration of multiple heterogeneous data sources are useful for protein function prediction.

Highlights

  • Protein function prediction is a very important and challenging task in bioinformatics

  • Experiment-based protein function prediction requires a huge experimental and human effort to analyze a single gene or protein. To overcome this drawback, a number of high-throughput experimental procedures have been developed to support function prediction. These procedures have generated a variety of data, such as protein sequences, protein structures, protein interaction networks, and gene expression data, that are used in function prediction

  • In paper [107], the authors proposed a fuzzy k-nearest neighbor classifier to predict nuclear receptors and their subfamilies. The classifier is based on the pseudo amino acid composition, with physicochemical and statistical features derived from the protein sequences, such as amino acid composition, dipeptide composition, complexity factor, and low-frequency Fourier spectrum components

Summary

Introduction

Protein function prediction is a very important and challenging task in bioinformatics. Artificial neural network (ANN) based methods have been proposed in papers [23, 24] to predict DNA binding sites. The method in paper [23] uses information on amino acid sequence composition, solvent accessibility, and secondary structure, while the method in paper [24] uses position specific scoring matrices (PSSM). In paper [41], the authors proposed an integrated SVM-based method for the prediction of rRNA-, RNA-, and DNA-binding proteins, using the amino acid composition of the protein sequence and physicochemical properties such as hydrophobicity, predicted secondary structure, predicted solvent accessibility, normalized van der Waals volume, polarity, and polarizability. In paper [107], the authors proposed a fuzzy k-nearest neighbor classifier to predict nuclear receptors and their subfamilies. The classifier is based on the pseudo amino acid composition, with physicochemical and statistical features derived from the protein sequences, such as amino acid composition, dipeptide composition, complexity factor, and low-frequency Fourier spectrum components.
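Two of the sequence-derived features named above, amino acid composition and dipeptide composition, can be illustrated with a short sketch. This is a minimal illustration under stated assumptions and not the implementation used in any of the cited papers: the function names, the choice to skip non-standard residue symbols, and the toy sequence are all hypothetical.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order (AA, AC, AD, ...).
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return the fraction of each standard residue in seq (20 values)."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for residue in seq.upper():
        if residue in counts:  # assumption: ignore symbols such as X or B
            counts[residue] += 1
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return the fraction of each adjacent residue pair in seq (400 values)."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs containing non-standard symbols are skipped
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[dp] / total if total else 0.0 for dp in DIPEPTIDES]


def sequence_features(seq):
    """Concatenate both compositions into one 420-dimensional vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


if __name__ == "__main__":
    # Hypothetical toy sequence, used only to show the call.
    features = sequence_features("ACDA")
    print(len(features))
```

A fixed-length vector of this kind is what allows sequences of different lengths to be passed to a classifier such as an SVM or a k-nearest neighbor model; the cited methods combine such vectors with further physicochemical and statistical features.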
