Abstract
In bioinformatics, the study of genetics is a popular research discipline. Bioinformatics systems depend on measuring the degree of similarity between biological data, which consist of DNA sequences or raw sequencing reads. In the preprocessing stage, several methods are available for measuring similarity between sequences. The two most popular families are alignment-based and alignment-free methods, which are applied to determine the degree of functional matching between sequences of DNA nucleotides, ribosomal RNA, or proteins. Alignment-based methods pose a great challenge in terms of computational complexity and also slow the search for a match, especially when the data are heterogeneous and very large; as a result, classification accuracy decreases in the post-processing stage. Alignment-free methods overcome these challenges when measuring the distance between sequences. The dataset consists of 1000 genomes obtained from the National Center for Biotechnology Information (NCBI); after eliminating missing and irrelevant values, 860 genomes remain. These are segmented into words by k-mer analysis, after which the frequency of each word is counted for each query. The size of a word depends on the value of k. In this paper we use k = 3, …, 8, and in each iteration the word frequencies are counted.
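As an illustration of the k-mer step described in the abstract, the following minimal Python sketch splits a sequence into overlapping words of length k and counts how often each word occurs, for k from 3 to 8. It is not the authors' implementation: the function names `kmer_counts` and `kmer_profiles`, the use of overlapping windows, and the choice to skip words containing characters other than A, C, G, and T are all assumptions made for this example.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count overlapping words of length k (k-mers) in a DNA sequence."""
    seq = sequence.upper()
    counts = Counter()
    # Slide a window of width k across the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        # Assumption: words with ambiguous bases (e.g. N) are ignored.
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def kmer_profiles(sequence, k_values=range(3, 9)):
    """Build one frequency table per word size, here k = 3 through 8."""
    return {k: kmer_counts(sequence, k) for k in k_values}


if __name__ == "__main__":
    # Toy sequence for illustration only; not taken from the paper's data.
    profiles = kmer_profiles("ACGTACGTAC")
    for k, table in profiles.items():
        print(k, dict(table))
```

The resulting frequency tables can be compared between genomes with any vector distance, which is what makes the approach alignment-free: no pairwise alignment of the sequences is needed to obtain them.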
More From: International Journal of Nonlinear Analysis and Applications