Similarity evaluation of DNA sequences based on frequent patterns and entropy.

Xiaojing Xie, Jihong Guan, Shuigeng Zhou

doi:10.1186/1471-2164-16-s3-s5

Abstract

Background: DNA sequence analysis is an important research topic in bioinformatics. Evaluating the similarity between sequences, which is crucial for sequence analysis, has attracted much research effort in the last two decades, and a dozen algorithms and tools have been developed. These methods are based on alignment, word frequency, and geometric representation respectively, each with its own advantages and disadvantages.

Results: In this paper, to compute the similarity between DNA sequences effectively, we introduce a novel method based on frequent patterns and entropy to construct representative vectors of DNA sequences. Experiments are conducted to evaluate the proposed method, which is compared with two recently developed alignment-free methods and the BLASTN tool. When tested on the β-globin genes of 11 species, using the results from MEGA as the baseline, our method achieves higher correlation coefficients than the two alignment-free methods and the BLASTN tool.

Conclusions: Our method is not only able to capture fine-granularity information (location and ordering) of DNA sequences via sequence blocking, but is also insensitive to noise and sequence rearrangement because it considers only the maximal frequent patterns. It outperforms major existing methods and tools.
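The abstract names the ingredients (sequence blocking, frequent patterns, entropy, representative vectors) but not the exact construction. The following is a minimal sketch of how such a vector could be built, not the authors' algorithm. It makes several assumptions that are not in the source: fixed-length k-mers stand in for maximal frequent patterns, and the parameters `n_blocks`, `k`, `min_support` and the Euclidean distance are illustrative choices.

```python
import math
from collections import Counter


def block_entropy_vector(seq, n_blocks=4, k=3, min_support=2):
    """Return one entropy value per block of seq.

    Assumption: fixed-length k-mers occurring at least min_support
    times are used as a stand-in for maximal frequent patterns.
    """
    seq = seq.upper()
    # Blocks preserve coarse positional information along the sequence.
    block_len = max(k, math.ceil(len(seq) / n_blocks))
    vector = []
    for b in range(n_blocks):
        block = seq[b * block_len:(b + 1) * block_len]
        counts = Counter(block[i:i + k] for i in range(len(block) - k + 1))
        # Keep only patterns that are frequent within this block.
        frequent = [c for c in counts.values() if c >= min_support]
        total = sum(frequent)
        # Shannon entropy (in bits) of the frequent-pattern distribution.
        entropy = sum((c / total) * math.log2(total / c) for c in frequent)
        vector.append(entropy)
    return vector


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    s1 = "ACGTACGTACGTAAAACCCCGGGGTTTTACGT"
    s2 = "ACGTACGTACGAAAAACCCCGGGGTTTTACGA"
    print(euclidean(block_entropy_vector(s1), block_entropy_vector(s2)))
```

Because rare patterns are discarded before the entropy is computed, an isolated substitution tends to change a block's value only slightly, which is one plausible reading of the noise tolerance claimed in the Conclusions.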

Highlights

DNA sequence analysis is an important research topic in bioinformatics
We examine how the distance between the human sequence and the gorilla sequence changes with block size
The distance between the original human sequence segments and the corresponding shuffled ones is quite small, so we conclude that our method is tolerant of sequence rearrangement

Summary

Results

To compute the similarity between DNA sequences effectively, we introduce a novel method based on frequent patterns and entropy to construct representative vectors of DNA sequences. When tested on the β-globin genes of 11 species, using the results from MEGA as the baseline, our method achieves higher correlation coefficients than the two alignment-free methods and the BLASTN tool.

Conclusions

Background

Methods

Experiments and results

Discussion

Conclusion

Journal: BMC Genomics
Publication Date: Jan 29, 2015
Citations: 32
License type: cc-by

Similar Papers

Association mining of dependency between time series
Alaaeldin Hafez ... Belur V Dasarathy
27 Mar 2001

Comparative analysis of genetic based approach and Apriori algorithm for mining maximal frequent item sets
Mir Md Jahangir Kabir ... Byeong Ho Kang
01 May 2015

A measure of DNA sequence similarity by Fourier Transform with applications on hierarchical clustering
Changchuan Yin ... Stephen S.-T Yau
Journal of Theoretical Biology | VOL. 359
06 Jun 2014

A novel method for comparative analysis of DNA sequences by Ramanujan-Fourier transform.
Changchuan Yin ... Xuemeng E Yin
Journal of computational biology : a journal of computational molecular cell biology | VOL. 21
01 Dec 2014
