Abstract

Accurate identification of protein-coding genes in genomes remains an open problem in computational biology, and it has received increasing attention with the explosion in sequence data. This has motivated us to revisit the problem. In this study, we propose a novel gene-finding algorithm that relies on genomic composition and dinucleotide compositional skew information. To identify the most prominent features, we applied two feature selection techniques widely used in data preprocessing for machine learning: CFS and the ReliefF algorithm. The performance of two types of neural network, a multilayer perceptron (MLP) and a radial basis function (RBF) network, was evaluated with these filter approaches. The proposed model distinguished protein-coding from non-coding sequences with 96.47% and 94.18% accuracy for the MLP and RBF network, respectively, when CFS was used, and with 94.94% and 92.46% accuracy for the MLP and RBF network, respectively, when ReliefF was used.
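
As an illustration of the kind of compositional features referred to above, the following minimal Python sketch computes dinucleotide frequencies for a sequence and a simple skew value for a dinucleotide against its reversed counterpart. The abstract does not define the skew measure, so the formula used here, (f(XY) - f(YX)) / (f(XY) + f(YX)), and the function names `dinucleotide_frequencies` and `dinucleotide_skew` are assumptions made for illustration only, not the method of this study.

```python
from collections import Counter
from itertools import product

# All 16 dinucleotides over the DNA alphabet.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]
_VALID = set(DINUCLEOTIDES)


def dinucleotide_frequencies(seq):
    """Return the relative frequency of each dinucleotide in seq.

    Overlapping pairs are counted; pairs containing characters other
    than A, C, G, T (for example N) are ignored.
    """
    seq = seq.upper()
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    counts = Counter(p for p in pairs if p in _VALID)
    total = sum(counts.values())
    return {d: (counts[d] / total if total else 0.0) for d in DINUCLEOTIDES}


def dinucleotide_skew(freqs, xy):
    """Assumed skew definition: (f(XY) - f(YX)) / (f(XY) + f(YX)).

    Returns 0.0 when neither XY nor YX occurs.
    """
    yx = xy[::-1]
    denom = freqs[xy] + freqs[yx]
    return (freqs[xy] - freqs[yx]) / denom if denom else 0.0


if __name__ == "__main__":
    freqs = dinucleotide_frequencies("ATGGCGTACGCTAGCTAA")
    features = [dinucleotide_skew(freqs, d) for d in DINUCLEOTIDES]
    print(len(features), "skew features computed")
```

A feature vector of this kind, one value per dinucleotide, could then be passed to a feature selection step and a classifier; the specific features, filters, and network settings used in the study are not given in the abstract.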
