Calibrating the classifier for protein family prediction with protein sequence using machine learning techniques: An empirical investigation

T Idhaya, A Suruliandi, S P Raja, Dragos Calitoiu

doi:10.1142/s021969132250045x

Abstract

A gene is a basic unit of heredity: a sequence of nucleotides in deoxyribonucleic acid (DNA) that encodes the synthesis of a protein. Proteins are made up of amino acid residues and are classified for use in protein-related research, which includes identifying changes in genes, finding associations with diseases and phenotypes, and identifying potential drug targets. To this end, proteins are studied and classified by family. Because determining a protein's family experimentally is time-consuming, a computational approach is used instead. Computational approaches to protein family prediction involve two important processes, feature selection and classification. Existing approaches are either alignment-based or alignment-free. The drawback of the former is that it searches for protein signatures by aligning every available sequence. The alignment-free approach is therefore taken for study, given that it needs only sequence-based features to predict the protein family and is far more efficient. Nevertheless, the sequence-based characteristics taken for study offer additional features, so there is a need to select the best of them. When it comes to classification, no technique yet classifies proteins perfectly. A comparison of different approaches is therefore carried out to find the best feature selection technique and the best classification technique for protein family prediction. In the study, the best classification accuracy, 96%, is obtained with the feature subset chosen by a filter-based feature selection technique and classified with the random forest classifier.
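The alignment-free pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not say which sequence-based features or which filter criterion the authors used, so everything specific here is an assumption made for illustration: amino acid composition as the feature set, a Fisher-style score (between-class over within-class variance) as the filter, the helper names `composition`, `fisher_scores` and `select_top_k`, and the four toy sequences with their family labels.

```python
from collections import Counter
from statistics import mean, pvariance

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Alignment-free features: fraction of each standard residue in seq."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def fisher_scores(X, y):
    """Score each feature on its own, without consulting any classifier.

    This is what makes it a filter method. The score is the ratio of
    between-class to within-class variance (an assumed criterion).
    """
    labels = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall = mean(column)
        between = within = 0.0
        for label in labels:
            values = [v for v, lab in zip(column, y) if lab == label]
            between += len(values) * (mean(values) - overall) ** 2
            within += len(values) * pvariance(values)
        # Small constant avoids division by zero for constant features.
        scores.append(between / (within + 1e-12))
    return scores


def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features."""
    scores = fisher_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical toy data: two made-up families, not real proteins.
sequences = ["MKKLKKAKKG", "KKAKKMKKSK", "MLLALLGLLV", "LLSLLALLML"]
families = ["A", "A", "B", "B"]

X = [composition(s) for s in sequences]
top = select_top_k(X, families, k=2)
top_residues = sorted(AMINO_ACIDS[j] for j in top)
X_reduced = [[row[j] for j in top] for row in X]
print(top_residues)
```

On this toy data the two retained features are the lysine (K) and leucine (L) fractions, which is what separates the two made-up families. In a fuller pipeline, `X_reduced` would then be passed to a classifier; the abstract reports its best result with a random forest, which is omitted here to keep the sketch dependency-free.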

Journal: International Journal of Wavelets, Multiresolution and Information Processing
Publication Date: Jan 25, 2023
Citations: 1
