Pseudo Amino Acid Feature-based Protein Function Prediction using Support Vector Machine and K-Nearest Neighbors

Anjna Jayant Deen, Manasi Gyanchandani

doi:10.14569/ijacsa.2020.0110922

Abstract

Protein function prediction is a vital challenge in bioinformatics because protein data are mostly available only as primary structure, that is, as an amino acid sequence. Protein sequences differ in length and in the order of their residues. Every sequence is written in an alphabet of 20 amino acids; however, the information in a membrane protein's primary sequence alone is insufficient to capture the function and structure of the protein. Correctly identifying protein structure and function from the amino acid sequence is therefore a challenging task. The basic principle of PseAAC (Pseudo Amino Acid Composition) is to generate a discrete numerical representation of every protein sample. Sequence length varies with protein function: some sequences are shorter than 50 residues, and some are much longer. Because samples differ in size, sequence-order information is likely to be lost when they are converted to features. The PseAAC feature generates a fixed-size descriptor in vector space that limits this loss and is used in the subsequent analysis. Machine learning tools can then identify the structural and functional class of a membrane protein. In this study, SVM (Support Vector Machine) and KNN (K-nearest neighbors) classifiers are used to identify membrane proteins and their types.
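To illustrate how PseAAC maps sequences of any length to vectors of one fixed size, here is a minimal Python sketch that follows the general form of Chou's pseudo amino acid composition. It is not the authors' implementation. The function name `pseaac`, the parameters `lam` (number of sequence-order correlation tiers) and `w` (weight factor), and the choice of the Kyte-Doolittle hydropathy scale as the only physicochemical property are assumptions made for illustration.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here as a stand-in property scale
# (assumption: the paper does not state which properties it uses).
KYTE_DOOLITTLE = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8,
    "G": -0.4, "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8,
    "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5, "R": -4.5,
    "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}


def standardize(scale):
    """Rescale a property table to zero mean and unit standard deviation."""
    mu = mean(scale.values())
    sd = pstdev(scale.values())
    return {aa: (value - mu) / sd for aa, value in scale.items()}


PROPERTY = standardize(KYTE_DOOLITTLE)


def pseaac(seq, lam=5, w=0.05):
    """Return a (20 + lam)-dimensional descriptor for one protein sequence."""
    seq = seq.upper()
    unknown = set(seq) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")

    # First 20 terms: plain amino acid frequencies (order-independent).
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]

    # Next lam terms: mean squared property difference between residues
    # that are k positions apart (order-dependent).
    thetas = []
    for k in range(1, lam + 1):
        thetas.append(
            mean((PROPERTY[seq[i]] - PROPERTY[seq[i + k]]) ** 2
                 for i in range(n - k))
        )

    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


short = pseaac("MKTAYIAKQR", lam=5)
longer = pseaac("MKTAYIAKQRQISFVKSHFSRQ", lam=5)
print(len(short), len(longer))  # both descriptors have the same length
```

In this sketch, two sequences with the same residue counts but a different residue order produce different vectors, because the correlation terms depend on which residues are neighbors.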

Highlights

Bioinformatics is an interdisciplinary field that applies computational techniques to biological problems and deals with large-scale information in systems biology
AAC (amino acid composition) has been used to predict membrane protein types [3],[5],[13], first in [3], but amino acid composition cannot retain sequence-order information
Two feature extraction techniques, PseAAC and sequence-to-integer encoding, are used, giving a feature matrix of 62029 instances (rows) by 51 dimensions (columns) that is classified by the proposed model, with 43418×51 training samples and 18611×51 test samples
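The split reported above (43418 + 18611 = 62029 samples) is roughly 70% training to 30% test. The following Python sketch shows how a K-nearest neighbors classifier could be applied to 51-dimensional feature vectors after such a split. It runs on synthetic data, not on the paper's dataset. The helper names `split_train_test` and `knn_predict`, the value `k=3`, the Euclidean distance, and the two-class toy data are all assumptions for illustration; the SVM classifier is not shown.

```python
import math
import random
from collections import Counter


def split_train_test(rows, labels, train_fraction=0.7, seed=0):
    """Shuffle the sample indices and split them into train and test parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = round(len(indices) * train_fraction)
    train, test = indices[:cut], indices[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def knn_predict(train_x, train_y, query, k=3):
    """Predict a label by majority vote among the k nearest training rows."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Synthetic stand-in data: two well-separated classes in 51 dimensions.
rng = random.Random(42)
rows, labels = [], []
for label, center in ((0, 0.0), (1, 5.0)):
    for _ in range(50):
        rows.append([center + rng.uniform(-1, 1) for _ in range(51)])
        labels.append(label)

train_x, train_y, test_x, test_y = split_train_test(rows, labels)
predictions = [knn_predict(train_x, train_y, q) for q in test_x]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(len(train_x), len(test_x), accuracy)
```

Real protein feature vectors will not be this cleanly separated, so the accuracy on the toy data says nothing about the accuracy reported in the paper.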

Summary

Introduction

Bioinformatics is an interdisciplinary field that applies computational techniques to biological problems and deals with large-scale information in systems biology. Distinguishing outer membrane from non-outer membrane proteins is necessary for developing computational tools for new drug design and genome sequencing [10][26]. Computational methods that can predict membrane protein characteristics from the primary sequence would therefore be very helpful. Various feature extraction and classification methods have been built to predict membrane protein types. AAC (amino acid composition) has been used to predict membrane protein types [3],[5],[13], first in [3], but amino acid composition cannot retain sequence-order information. Various computational methods based on learning classifiers and ensemble methods have been used to predict cell membrane proteins with high accuracy. Pattern matching and similarity were calculated using standard tests conducted on high-dimensional multiclass protein data.
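The loss of sequence-order information under amino acid composition can be seen in a small Python sketch. The function name `amino_acid_composition` and the example sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return the 20 residue frequencies of a sequence, in a fixed order."""
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq.upper())
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


forward = amino_acid_composition("MKVLAT")
reverse = amino_acid_composition("TALVKM")  # same residues, reversed order

# The two vectors are identical, so the residue order cannot be recovered.
print(forward == reverse)
```

Because the vector only counts residues, any reordering of a sequence yields the same 20 values, which is the limitation that order-aware descriptors such as PseAAC aim to address.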



Journal: International Journal of Advanced Computer Science and Applications	Publication Date: Jan 1, 2020
License type: cc-by

