Abstract
Protein function prediction is a central challenge in bioinformatics because protein data are mostly available only as primary structure, that is, as an amino acid sequence. Protein sequences differ in length and in the order of their residues. A protein is written as a string over an alphabet of 20 amino acids; however, the information in a membrane protein's primary sequence alone is insufficient to capture its function and structure. Correctly identifying protein structure and function from the amino acid sequence is therefore a challenging task. The basic principle of PseAAC (pseudo amino acid composition) is to represent every protein sample by a discrete set of numbers. Sequence length varies with protein function: some sequences have fewer than 50 residues, while others are much longer. Because of these differences in length, sequence-order information is liable to be lost when samples are reduced to a common representation. PseAAC generates a fixed-size descriptor in vector space that limits this loss of sequence information and is used for further systematic evaluation. Machine learning tools can then support accurate identification of the structural and functional class of a membrane protein. In this study, classifiers based on SVM (support vector machine) and KNN (k-nearest neighbors) are used to identify membrane proteins and their types.
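As an illustration of how a fixed-size PseAAC-style descriptor can be built from sequences of different lengths, the sketch below follows the general shape of pseudo amino acid composition: 20 normalized composition terms plus `lam` sequence-order correlation terms. It is a simplified sketch, not the study's implementation. It assumes a single physicochemical property (the Kyte-Doolittle hydropathy scale) where the original formulation combines several, and the values `lam=5` and `w=0.05` are illustrative choices that the abstract does not specify.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values (assumption: the abstract does not
# say which physicochemical properties the study used).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standardize(scale):
    """Rescale a property table to zero mean and unit variance."""
    mean = sum(scale.values()) / len(scale)
    std = (sum((v - mean) ** 2 for v in scale.values()) / len(scale)) ** 0.5
    return {aa: (v - mean) / std for aa, v in scale.items()}


H = _standardize(KD)


def pseaac(seq, lam=5, w=0.05):
    """Return a (20 + lam)-dimensional PseAAC-style vector for one sequence.

    lam and w are hypothetical defaults chosen for this sketch.
    """
    residues = [aa for aa in seq.upper() if aa in H]  # skip unknown symbols
    length = len(residues)
    if length <= lam:
        raise ValueError("sequence must be longer than lam")

    # Composition part: relative frequency of each of the 20 amino acids.
    counts = Counter(residues)
    freqs = [counts[aa] / length for aa in AMINO_ACIDS]

    # Sequence-order part: mean squared property difference between
    # residues that are k positions apart, for k = 1..lam.
    thetas = []
    for k in range(1, lam + 1):
        total = sum((H[residues[i + k]] - H[residues[i]]) ** 2
                    for i in range(length - k))
        thetas.append(total / (length - k))

    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

Whatever the input length, the output has `20 + lam` entries, so sequences of 30 residues and 3000 residues map to vectors of the same size, and two sequences with the same composition but a different residue order generally receive different vectors.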
Highlights
Bioinformatics is an interdisciplinary field that applies computational techniques to biological problems and deals with the large-scale data of systems biology
Amino acid composition (AAC), first used in [3], has been applied to predicting membrane protein types [3],[5],[13], but it cannot retain sequence-order information
Two feature extraction techniques, PseAAC and sequence-to-integer encoding, are used. They yield a feature matrix of 62029 instances (rows) by 51 dimensions (columns), which the proposed model classifies using 43418×51 training samples and 18611×51 test samples
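The sequence-to-integer encoding and the train/test figures in the highlights above can be illustrated with a short sketch. The integer mapping, the padding value `0`, and the `max_len` parameter are assumptions made for illustration, since the text does not describe the study's exact encoding; only the three sample counts are taken from the source.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical mapping: 1..20 for the standard amino acids,
# 0 reserved for padding and unknown symbols.
AA_TO_INT = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode_sequence(seq, max_len=10):
    """Map a sequence to a fixed-length list of integer codes."""
    codes = [AA_TO_INT.get(aa, 0) for aa in seq.upper()[:max_len]]
    return codes + [0] * (max_len - len(codes))  # right-pad short sequences


# Sample counts reported in the source.
n_total, n_train, n_test = 62029, 43418, 18611

# The two subsets account for every instance in the feature matrix.
assert n_train + n_test == n_total

# Share of instances used for training (roughly a 70/30 split).
train_fraction = n_train / n_total
```

For example, `encode_sequence("ACD", max_len=5)` gives `[1, 2, 3, 0, 0]` under this mapping, and `train_fraction` rounds to 0.70.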
Summary
Bioinformatics is an interdisciplinary field that applies computational techniques to biological problems and deals with the large-scale data of systems biology. Learning to distinguish outer membrane from non-outer membrane proteins is necessary for developing computational tools for new drug design and genome sequencing [10][26]. Computational methods that can predict membrane protein characteristics from the primary sequence would therefore be very helpful. Various feature extraction and classification methods have been developed to predict membrane protein types. Amino acid composition (AAC), first used in [3], has been applied to predicting membrane protein types [3],[5],[13], but it cannot retain sequence-order information. Various computational methods based on learning classifiers and ensemble methods have been used to predict cell membrane proteins with high accuracy. Pattern matching and similarity were calculated using standard tests conducted on high-dimensional multiclass protein data.
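The limitation of plain amino acid composition noted above, namely that it discards sequence order, can be seen in a minimal sketch. The function name `amino_acid_composition` is illustrative and not taken from the cited work.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return the 20 relative frequencies of the standard amino acids."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    counts = Counter(residues)
    return [counts[aa] / len(residues) for aa in AMINO_ACIDS]


# Same residues, different order: the two descriptors are identical,
# so the composition alone cannot tell these sequences apart.
same = amino_acid_composition("ACDA") == amino_acid_composition("AADC")
```

Here `same` evaluates to `True`, which is the information loss that order-aware descriptors such as PseAAC are meant to reduce.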
More From: International Journal of Advanced Computer Science and Applications