Abstract
Helitrons, an important sub-class of the class II transposable elements (TEs), have been found in diverse eukaryotic genomes. They are mobile elements with a great impact on genomic evolution. To date, no systematic classification model for helitrons exists, which motivated us to build an efficient automatic model to identify these sequences. This paper focuses on discriminating between helitrons and non-helitrons using the Support Vector Machine (SVM). In this study, we use all the SVM kernels, and the highest accuracy rates are obtained by tuning the kernel parameters (d, c and σ) to their optimal values. Further, we introduce two methods for representing the genomic sequences as features for the classification task: (i) the temporal and spectral features extracted from the Frequency Chaos Game Signals of order 2 (FCGS2), and (ii) the features extracted from the Continuous Wavelet Transform (CWT) applied to the FCGS2 signals. The dataset comprises two classes of DNA in C. elegans: helitrons, and repetitive DNA sequences that contain microsatellites and do not form helitrons. The classification results show that the wavelet energy feature is more effective than the FCGS2 features for helitron recognition. Our system achieves a high recognition rate, with a global accuracy of 92.27%.
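To make the FCGS2 representation more concrete, the following is a minimal Python sketch of one plausible construction, not the authors' implementation. It assumes that the order-2 signal assigns to each sequence position the relative frequency of the overlapping dinucleotide starting there; the abstract does not give the exact definition, and the function names `fcgs2_frequencies` and `fcgs2_signal` are hypothetical.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGT"


def fcgs2_frequencies(seq):
    """Relative frequency of each of the 16 dinucleotides in seq.

    Assumption: overlapping windows of length 2; windows containing
    symbols other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    valid = [p for p in pairs if all(ch in NUCLEOTIDES for ch in p)]
    counts = Counter(valid)
    total = len(valid)
    return {
        a + b: (counts[a + b] / total if total else 0.0)
        for a, b in product(NUCLEOTIDES, repeat=2)
    }


def fcgs2_signal(seq):
    """1-D signal: frequency of the dinucleotide starting at each position.

    Assumption: this is an illustrative reading of 'FCGS2 signal';
    positions with an invalid dinucleotide are mapped to 0.0.
    """
    freqs = fcgs2_frequencies(seq)
    seq = seq.upper()
    return [freqs.get(seq[i:i + 2], 0.0) for i in range(len(seq) - 1)]


if __name__ == "__main__":
    # Toy sequence, not taken from the paper's dataset.
    print(fcgs2_signal("ACGTAC"))
```

Under these assumptions, a signal of this kind could then be summarised by temporal and spectral statistics, or passed through a continuous wavelet transform, before being given to an SVM classifier.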