Abstract
Studies focusing on the recognition of short genes encoding small proteins will provide essential new biological insights. This chapter presents a novel method for predicting short genes based on chaos game representation (CGR). CGR is a graphical representation of biological sequences such as DNA and proteins. It represents each DNA sequence uniquely and reveals hidden patterns in it. In this study, genomic feature extraction is implemented by computing the frequency chaos game representation (FCGR) matrix. FCGR matrices of order 2, 3 and 4, which consist of 16, 64 and 256 elements, respectively, are considered here. The elements of these matrices act as the feature descriptor for classification. We use principal component analysis (PCA) as a preprocessing step to reduce the dimensionality of the feature vector and to improve classification performance. A novel classification method is proposed that combines FCGR with a state-of-the-art pattern recognition algorithm, the Naive Bayes classifier. The experimental results reveal the potential of this representation for discriminating short genes from noncoding DNA.
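As an illustration of the feature extraction step, the following is a minimal sketch of how an order-k FCGR matrix could be computed and flattened into a feature vector of 16, 64 or 256 elements. It is not the chapter's implementation. The corner assignment (A, C, G, T at the four corners of the unit square), the normalization to relative frequencies, the skipping of windows containing ambiguous bases, and the names `CORNERS`, `fcgr` and `feature_vector` are all assumptions made for this example.

```python
# Assumed corner assignment for the CGR square (a common convention,
# not taken from the chapter): each base maps to an (x, y) corner bit pair.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(sequence, k):
    """Return a 2^k x 2^k matrix of relative k-mer frequencies.

    Each k-mer is mapped to the grid cell that its CGR point falls into:
    the last base of the k-mer selects the coarsest quadrant, the first
    base selects the finest subdivision.
    """
    size = 2 ** k
    counts = [[0] * size for _ in range(size)]
    seq = sequence.upper()
    total = 0
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if any(base not in CORNERS for base in kmer):
            continue  # assumption: skip windows with ambiguous bases such as N
        x = y = 0
        for i, base in enumerate(kmer):
            bx, by = CORNERS[base]
            x += bx << i  # later bases contribute higher-order bits
            y += by << i
        counts[y][x] += 1
        total += 1
    if total == 0:
        return [[0.0] * size for _ in range(size)]
    # Assumption: normalize counts so sequences of different length are comparable.
    return [[c / total for c in row] for row in counts]


def feature_vector(sequence, k):
    """Flatten the FCGR matrix row by row into a list of 4^k features."""
    return [value for row in fcgr(sequence, k) for value in row]


if __name__ == "__main__":
    example = "ATGGCGTACGTTAGCCGATAA"  # hypothetical sequence, for illustration only
    for order in (2, 3, 4):
        print(order, len(feature_vector(example, order)))
```

Feature vectors produced this way could then be passed to a PCA step and a Naive Bayes classifier from any standard machine learning library; that part is omitted here because the abstract does not specify the PCA settings or the Naive Bayes variant used.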