Amphiphilic Pseudo Amino Acid Composition Research Articles

Ion channels are ion-permeable protein pores found in all cell lipid membranes, and distinct ion channels play multiple roles in biological processes. Proteomic data are accumulating quickly thanks to the rapid growth of mass spectrometry, giving us the chance to comprehensively explore ion channel classes and their subclasses. This paper proposes an eXtreme Gradient Boosting (XGBoost)-based method to predict ion channel classes and their subclasses. Twelve feature vectors are used to better characterize protein sequences: amino acid composition, pseudo-amino acid composition, normalized Moreau-Broto autocorrelation, amphiphilic pseudo-amino acid composition, dipeptide composition, Geary autocorrelation, tripeptide composition, sequence-order-coupling number, composition/transition/distribution, conjoint triad, Moran autocorrelation, and quasi-sequence-order descriptors. In total, 9920 features are extracted from the protein sequence. Principal component analysis is applied to determine the optimal number of features. In 10-fold cross-validation, the proposed XGBoost-based approach with the optimal 50 features achieved accuracies of 100%, 98.70%, 98.77%, 97.26%, 87.40%, 97.39%, 98.03%, and 96.42%, and F1-scores of 100%, 99%, 99%, 97%, 87%, 97%, 98%, and 97%, for the prediction of ion channels versus non-ion channels, voltage-gated versus ligand-gated ion channels, subclasses of voltage-gated ion channels (VGICs), subclasses of ligand-gated ion channels (LGICs), subclasses of voltage-gated calcium channels (VGCCs), subclasses of voltage-gated potassium channels (VGKCs), subclasses of voltage-gated sodium channels (VGSCs), and subclasses of voltage-gated chloride channels, respectively.
The proposed approach is also compared with other classifiers, namely support vector machine, k-nearest neighbor, Gaussian Naïve Bayes, and random forest, and with two existing methods: a support vector machine (SVM) with maximum relevance maximum distance, with accuracies of 86.6%, 83.7%, and 85.1% for ion channels, non-ion channels, and overall, respectively, and an SVM with a radial basis function kernel, with accuracies of 100%, 97%, and 99.9% for ion channels, non-ion channels, and overall, respectively.
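
Two of the twelve descriptors named in this abstract, amino acid composition and dipeptide composition, are simple enough to sketch directly. The snippet below is a minimal illustration, not the authors' code: the function names, the residue ordering, and the choice to silently skip non-standard residues are all assumptions made for this example.

```python
from collections import Counter
from itertools import product

# The 20 standard residues; this ordering is an arbitrary choice for the sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def _clean(seq):
    """Upper-case the sequence and drop anything that is not a standard residue."""
    return "".join(r for r in seq.upper() if r in AMINO_ACIDS)


def amino_acid_composition(seq):
    """Return 20 fractions, one per residue, in AMINO_ACIDS order."""
    seq = _clean(seq)
    if not seq:
        raise ValueError("sequence contains no standard residues")
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 fractions, one per ordered residue pair (AA, AC, ..., YY)."""
    seq = _clean(seq)
    if len(seq) < 2:
        raise ValueError("need at least two standard residues")
    # Count overlapping adjacent pairs: positions (0,1), (1,2), ...
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = len(seq) - 1
    return [pairs[a + b] / total for a, b in product(AMINO_ACIDS, repeat=2)]


if __name__ == "__main__":
    example = "ACAC"  # toy sequence, not real data
    print(amino_acid_composition(example)[:2])
    print(len(dipeptide_composition(example)))
```

Concatenating descriptor vectors like these gives the kind of long feature vector that the abstract then reduces with principal component analysis before classification.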


In extremely cold environments, living organisms such as plants, animals, fishes, and microbes can die from intracellular ice formation. To survive in such environments, some cold-blooded species produce antifreeze proteins (AFPs), also called ice-binding proteins. AFPs are relevant not only to medicine but also to biotechnology, agriculture, and the food industry. Different AFPs are highly heterogeneous in structure and sequence. Given their significance, several machine-learning-based models have been developed for the prediction of AFPs. However, because of the complex and diverse nature of AFPs, the prediction performance of existing methods is limited, so a reliable computational model that can accurately predict AFPs is still needed. To this end, this study presents a novel predictor for AFPs, named AFP-CMBPred. AFP sequences are encoded with four feature representation methods, Amphiphilic pseudo amino acid composition (Amp-PseAAC), Dipeptide Deviation from Expected Mean (DDE), Multi-Blocks Position Specific Scoring Matrix (MB-PSSM), and Consensus Sequence-based on Multi-Blocks Position Specific Scoring Matrix (CS-MB-PSSM), to collect local and global descriptors. The extracted feature vectors are then evaluated with Support Vector Machine (SVM) and Random Forest (RF) classifiers. The prediction performance of both classifiers is further assessed with three validation methods: the jackknife test, the 10-fold cross-validation test, and an independent test. Across these tests, the proposed model achieved prediction accuracies higher by ∼2.65%, ∼2.84%, and ∼3.37% under the jackknife, K-fold, and independent tests, respectively.
The experimental outcomes indicate that the proposed “AFP-CMBPred” predictor achieved better prediction results than the existing models for the identification of AFPs. It is further anticipated that AFP-CMBPred will be a valuable tool in academic research and drug development.
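
For readers unfamiliar with the first of these encodings, the following is a minimal sketch of an Amp-PseAAC-style vector, following the commonly cited form: 20 normalized residue frequencies plus 2λ sequence-order correlation terms built from a hydrophobicity scale and a hydrophilicity scale. It is not the AFP-CMBPred implementation. The scales used here (Kyte-Doolittle hydropathy and Hopp-Woods hydrophilicity, as commonly tabulated), the defaults `lam=2` and `weight=0.5`, and the function names are all assumptions for illustration; check the scale values against a primary source before relying on them.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed scales for this sketch (verify before use).
# Kyte-Doolittle hydropathy:
HYDROPHOBICITY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Hopp-Woods hydrophilicity:
HYDROPHILICITY = {
    "A": -0.5, "R": 3.0, "N": 0.2, "D": 3.0, "C": -1.0, "Q": 0.2, "E": 3.0,
    "G": 0.0, "H": -0.5, "I": -1.8, "L": -1.8, "K": 3.0, "M": -1.3, "F": -2.5,
    "P": 0.0, "S": 0.3, "T": -0.4, "W": -3.4, "Y": -2.3, "V": -1.5,
}


def standardize(scale):
    """Rescale a property table to zero mean and unit variance over the 20 residues."""
    values = list(scale.values())
    mu, sd = mean(values), pstdev(values)
    return {aa: (v - mu) / sd for aa, v in scale.items()}


def amp_pseaac(seq, lam=2, weight=0.5):
    """Return a vector of 20 + 2*lam values for one protein sequence."""
    seq = seq.upper()
    if any(r not in AMINO_ACIDS for r in seq):
        raise ValueError("sequence contains non-standard residues")
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    h1, h2 = standardize(HYDROPHOBICITY), standardize(HYDROPHILICITY)
    n = len(seq)
    freqs = [seq.count(a) / n for a in AMINO_ACIDS]
    # For each gap k, average the property products of residues k positions apart,
    # once for hydrophobicity and once for hydrophilicity.
    taus = []
    for k in range(1, lam + 1):
        for h in (h1, h2):
            taus.append(sum(h[seq[i]] * h[seq[i + k]] for i in range(n - k)) / (n - k))
    denom = sum(freqs) + weight * sum(taus)
    if denom == 0:
        raise ValueError("degenerate normalization for this sequence")
    return [f / denom for f in freqs] + [weight * t / denom for t in taus]


if __name__ == "__main__":
    vec = amp_pseaac("MKTAYIAKQRQISFVKSHFSRQ")  # toy sequence, not real data
    print(len(vec))
```

The first 20 entries carry composition and the last 2λ entries carry sequence-order information; because the correlation terms can be negative, the normalizing denominator is not guaranteed to be positive, which the sketch only guards against in the exact-zero case.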


Articles published on Amphiphilic Pseudo Amino Acid Composition

LLM4THP: a computing tool to identify tumor homing peptides by molecular and sequence representation of large language model based on two-layer ensemble model strategy

UmamiPreDL: Deep learning model for umami taste prediction of peptides using BERT and CNN

Stack-AAgP: Computational prediction and interpretation of anti-angiogenic peptides using a meta-learning framework

Accelerating the identification of the allergenic potential of plant proteins using a stacked ensemble-learning framework

Prediction of apoptosis protein subcellular location based on amphiphilic pseudo amino acid composition.

IGPred-HDnet: Prediction of Immunoglobulin Proteins Using Graphical Features and the Hierarchal Deep Learning-Based Approach.

MP-VHPPI: Meta predictor for viral host protein-protein interaction prediction in multiple hosts and viruses.

Machine learning-based approach for prediction of ion channels and their subclasses.

UMPred-FRL: A New Approach for Accurate Prediction of Umami Peptides Using Feature Representation Learning.

AFP-CMBPred: Computational identification of antifreeze proteins by extending consensus sequences into multi-blocks evolutionary information

Predicting Drug-Target Interactions with Electrotopological State Fingerprints and Amphiphilic Pseudo Amino Acid Composition.

Discrimination of Golgi Proteins Through Efficient Exploitation of Hybrid Feature Spaces Coupled With SMOTE and Ensemble of Support Vector Machine

IDTi-CSsmoteB: Identification of Drug–Target Interaction Based on Drug Chemical Structure and Protein Sequence Using XGBoost With Over-Sampling Technique SMOTE

Ensemble learning for protein multiplex subcellular localization prediction based on weighted KNN with different features

IACP-GAEnsC: Evolutionary genetic algorithm based ensemble classification of anticancer peptides by utilizing hybrid feature space.

Pred-binding: large-scale protein–ligand binding affinity prediction

PaPI: pseudo amino acid composition to score human protein-coding variants.

Discriminating Outer Membrane Proteins with Fuzzy K-Nearest Neighbor Algorithms Based on the General Form of Chou’s PseAAC

Prediction of Protein Subcellular Multi-Localization Based on the General form of Chou’s Pseudo Amino Acid Composition

CE-PLoc: An ensemble classifier for predicting protein subcellular locations by fusing different modes of pseudo amino acid composition
