Abstract
It is widely hypothesized that the information determining protein hubness is encoded in amino acid sequence patterns and features. This hypothesis motivated us to revisit the problem. In this study, we propose a novel algorithm for identifying hub proteins that relies on dipeptide compositional information and the hydrophobicity ratio. To identify the most informative and prominent features, we applied two feature selection techniques that are widely used in data preprocessing for machine learning problems: CFS (Correlation-based Feature Selection) and ReliefF. Overall accuracy and model processing time were compared using a neural network classifier (RBF Network) and an ensemble classifier (Bagging). The proposed models successfully predicted hub proteins from amino acid sequence information, with accuracies of 92.94% (RBF Network) and 92.10% (Bagging) using CFS-selected features, and 94.15% (RBF Network) and 90.89% (Bagging) using ReliefF-selected features.
Keywords: Hub proteins; Protein-protein interaction networks; Machine learning; Feature vectors
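The feature extraction step named in the abstract (dipeptide composition plus a hydrophobicity ratio) could look roughly like the Python sketch below. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the hydrophobic residue set is one common choice that may differ from the one used in the study, and the hydrophobicity ratio is assumed here to mean the fraction of hydrophobic residues in the sequence.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: one common choice of hydrophobic residues; the study's set may differ.
HYDROPHOBIC = set("AVLIMFWC")

# All 400 ordered pairs of the 20 standard amino acids, in a fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Return 400 fractions, one per dipeptide, in DIPEPTIDES order.

    Pairs containing a non-standard residue (e.g. 'X') are skipped, and the
    counts are normalized by the number of valid pairs.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def hydrophobicity_ratio(seq):
    """Assumed definition: fraction of standard residues that are hydrophobic."""
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    if not residues:
        return 0.0
    return sum(c in HYDROPHOBIC for c in residues) / len(residues)


def feature_vector(seq):
    """Concatenate the 400 dipeptide fractions with the hydrophobicity ratio."""
    return dipeptide_composition(seq) + [hydrophobicity_ratio(seq)]
```

A 401-dimensional vector of this kind would then be passed to a feature selection method such as CFS or ReliefF before training a classifier.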