Augmented sequence features and subcellular localization for functional characterization of unknown protein sequences.

Saurabh Agrawal, Naresh Kumar Nagwani, Dilip Singh Sisodia

doi:10.1007/s11517-021-02436-5

Abstract

Advances in high-throughput techniques have produced a large number of unknown protein sequences (UPS). Functional characterization of UPS is important for the investigation of disease symptoms and drug repositioning, and protein subcellular localization is essential for that characterization. Diverse feature extraction techniques are applied to protein sequences, but a single technique often leads to poor prediction performance. In this paper, two feature augmentations are described that combine sequence-induced, physicochemical, and evolutionary information of the amino acid residues. The augmented features preserve sequence-order information and protein residue properties. Two bacterial protein datasets, Gram-positive (G+) and Gram-negative (G-), are used for the experimental work. After essential preprocessing of the protein datasets, two sets of feature vectors are obtained. These feature vectors are used separately to train individual and ensemble classifiers such as decision tree (C4.5), k-nearest neighbor (k-NN), multi-layer perceptron (MLP), Naïve Bayes (NB), support vector machine (SVM), AdaBoost, gradient boosting machine (GBM), and random forest (RF) with fivefold cross-validation. Prediction results show that C4.5 reports the highest overall accuracy on known protein sequences: 99.57% on the G+ dataset and 97.47% on the G- dataset. For the UPS, the overall accuracy is 85.17% on the G+ dataset with SVM and 82.45% on the G- dataset with MLP.
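
The abstract does not name the specific descriptors that make up each augmented feature set, so the following is only an illustrative sketch of the general idea: concatenating two sequence-derived descriptors into one vector and splitting the samples into five folds. The choice of amino acid composition and dipeptide composition, and the function names `augmented_features` and `five_fold_indices`, are assumptions made for this example and are not taken from the paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Fraction of each of the 20 standard residues (ignores residue order)."""
    seq = seq.upper()
    n = sum(seq.count(a) for a in AMINO_ACIDS) or 1
    return [seq.count(a) / n for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each adjacent residue pair (keeps local sequence order)."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    total = sum(counts.values()) or 1
    return [counts[d] / total for d in DIPEPTIDES]


def augmented_features(seq):
    """Hypothetical augmentation: concatenate both descriptors (20 + 400 values)."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


def five_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists for a plain, unshuffled k-fold split."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

In a setup like the one described, each vector from `augmented_features` would be one training row, and any of the listed classifiers could be fitted on the `train` indices and scored on the `test` indices of each fold.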

Journal: Medical & Biological Engineering & Computing
Publication Date: Sep 20, 2021
Citations: 3

Similar Papers

Functional characterization of unknown protein sequences using Neuro-Fuzzy based machine learning approach and sequence augmented feature
Saurabh Agrawal ... Naresh Kumar Nagwani
Expert Systems With Applications | VOL. 205
06 Jun 2022

Ensemble Learners of Multiple Deep CNNs for Pulmonary Nodules Classification Using CT Images
Baihua Zhang ... Fan Yang
IEEE Access | VOL. 7
01 Jan 2019

Function Characterization of Unknown Protein Sequences Using One Hot Encoding and Convolutional Neural Network Based Model
Saurabh Agrawal ... Dilip Singh Sisodia
01 Jan 2023

Temporal and Spectral Analysis of EMG for Classification of Muscular Paralysis
Shubha V Patel et al.
International Journal on Recent and Innovation Trends in Computing and Communication | VOL. 11
05 Nov 2023
