Abstract
Predicting whether a protein is an enzyme or a non-enzyme from its sequence information remains an open problem in bioinformatics, and it is becoming more important as the amount of sequence data grows exponentially over time. We describe a novel approach for classifying proteins as enzymes or non-enzymes from their amino-acid sequences using an artificial neural network (ANN). Using 61 sequence-derived features alone, we achieved 79% correct prediction of enzymes/non-enzymes on a set of 660 proteins. For the complete set of 61 parameters, using 5-fold cross-validated classification, the ANN yields a superior model (accuracy = 78.79 ± 6.86%, Q(pred) = 74.734 ± 17.08%, sensitivity = 84.48 ± 6.73%, specificity = 77.13 ± 13.39%). The second ANN module is based on the position-specific scoring matrix (PSSM). Using the same 5-fold cross-validation set, this ANN model predicts enzymes/non-enzymes with higher accuracy (accuracy = 80.37 ± 6.59%, Q(pred) = 67.466 ± 12.41%, sensitivity = 0.9070 ± 3.37%, specificity = 74.66 ± 7.17%).
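The four statistics quoted above are standard confusion-matrix measures. A minimal sketch of how they can be computed is given here; it is not the authors' code. It assumes that enzymes are the positive class and that Q(pred) means the share of predicted enzymes that really are enzymes, TP / (TP + FP), since the text does not define Q(pred). The function name and the example counts are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, sensitivity, specificity and Q(pred) as percentages.

    Assumptions (not stated in the source): enzymes are the positive class,
    and Q(pred) = TP / (TP + FP).
    """
    total = tp + tn + fp + fn
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),   # enzymes correctly recovered
        "specificity": 100.0 * tn / (tn + fp),   # non-enzymes correctly rejected
        "q_pred": 100.0 * tp / (tp + fp),        # predicted enzymes that are correct
    }


# Hypothetical counts for a single test fold (not taken from the paper).
metrics = classification_metrics(tp=40, tn=60, fp=20, fn=12)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
```

Under 5-fold cross-validation, each statistic would be computed once per fold and then reported as mean ± standard deviation across the five folds, which is presumably how the ± values above arise.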
Highlights
It is generally accepted that protein structure is determined by its amino acid sequence [1] and that the knowledge of protein structures plays an important role in understanding their functions.
For each non-enzyme of the testing set, the correct prediction was assumed if the corresponding artificial neural network (ANN) output lies between 0.1 and
The two different ANN models developed in this study are based on sequence-derived features and on the position-specific scoring matrix (PSSM) method. A fivefold cross-validation technique has been used for training.
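The fivefold cross-validation mentioned above can be sketched as follows. This is an illustration only: the source does not say how the folds were formed, so the random shuffle, the fixed seed and the function name are assumptions, and the actual study may have balanced enzymes and non-enzymes within each fold.

```python
import random


def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [idx for j, fold in enumerate(folds) if j != k for idx in fold]
        yield train, test


# 660 is the dataset size quoted in the abstract.
for fold_number, (train, test) in enumerate(five_fold_splits(660), start=1):
    print(fold_number, len(train), len(test))
```

Each protein appears in exactly one test fold, so every sequence is used for testing once and for training four times.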
Summary
It is generally accepted that protein structure is determined by its amino acid sequence [1] and that the knowledge of protein structures plays an important role in understanding their functions. [5] Determination of three-dimensional structure is the traditional approach to functional classification of proteins. The enormous task of function determination for every entry in GenBank has prompted the development of more sophisticated methods for automatic protein classification. [3, 4] A computational method allowing for the automatic determination of protein function from its sequence alone is one of the prevailing problems in bioinformatics. This is a very time-consuming process, and the need for a faster method of classification is obvious. [6]