Abstract

Functional characterization of Unknown Protein Sequences (UPS) is important for biological research such as disease diagnosis and drug design. In this work, a Neuro-Fuzzy based machine learning framework with two levels of prediction is proposed for the functional characterization of UPS using augmented features and the subcellular localization of the protein sequences. In the first level, a Neuro-Fuzzy Approach (NFA) categorizes each UPS as a bacterial or non-bacterial protein sequence; NFA is able to overcome the likelihood prediction problem that supervised learning algorithms face with untrained samples such as UPS. In the second level, the functions of bacterial protein sequences are characterized using a Protein Subcellular Localization (PSL) model. Physicochemical and evolutionary information is extracted from the protein sequences and augmented into a protein feature vector that preserves protein-residue-property and sequence-order information. Various individual and ensemble classifiers such as Decision-Tree (C-4.5), k-Nearest-Neighbor (k-NN), Multi-Layer-Perceptron (MLP), Naïve-Bayes (NB), AdaBoost, and Gradient-Boosting-Machine (GBM) are used to build the PSL model. The PSL model is trained on augmented features of a known Gram-Negative Bacterial Protein Sequence (GN_BPS) dataset with 10-fold cross-validation, and an accuracy of 97.94% is achieved with the C-4.5 classifier. The validated PSL model is then used for the functional characterization of Unknown Gram-Negative Bacterial Protein Sequences (Unk_GN_BPS) such as the Unk_GN_156 and Unk_GN_61 datasets. The accuracy achieved is 78.20% for Unk_GN_156 with C-4.5 and 79.32% for Unk_GN_61 with k-NN.
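
As an illustration of what a feature vector that keeps both residue-property and sequence-order information might look like, the following minimal sketch concatenates amino acid composition with lag-based hydropathy correlation factors. It is an assumption made for illustration, not the feature set used in this work: the function names, the choice of the Kyte-Doolittle hydropathy scale, and the `max_lag` parameter are all hypothetical, and the evolutionary features are omitted.

```python
from typing import List

# The 20 standard amino acids, in a fixed order for the composition vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index, used here as one example of a
# physicochemical residue property (an illustrative choice only).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq: str) -> List[float]:
    """Fraction of each of the 20 amino acids in the sequence."""
    n = len(seq)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def order_factors(seq: str, max_lag: int) -> List[float]:
    """Mean squared hydropathy difference between residues `lag` apart.

    One value per lag from 1 to max_lag; these carry sequence-order
    information that plain composition discards.
    """
    factors = []
    for lag in range(1, max_lag + 1):
        diffs = [
            (HYDROPATHY[seq[i]] - HYDROPATHY[seq[i + lag]]) ** 2
            for i in range(len(seq) - lag)
        ]
        factors.append(sum(diffs) / len(diffs))
    return factors


def augmented_features(seq: str, max_lag: int = 3) -> List[float]:
    """Concatenate composition (20 values) with max_lag order factors."""
    seq = seq.upper()
    if len(seq) <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    if any(residue not in HYDROPATHY for residue in seq):
        raise ValueError("sequence contains a non-standard residue")
    return composition(seq) + order_factors(seq, max_lag)
```

Calling `augmented_features("MKTAYIAKQR")` returns a list of 23 values (20 composition fractions followed by 3 order factors), which could then be passed to any of the classifiers listed above.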
