Abstract

In this paper, we propose a new representation of DNA sequences that constructs a word frequency vector at multiple resolutions based on the chaos game representation (CGR). Compared with the traditional word frequency vector, it combines a range of resolutions and retains the higher ones while greatly reducing the dimension. We describe in detail the algorithm that computes the coding format and encodes each sequence. To evaluate the method, we encode Alu sequences in the proposed format and use the resulting vectors to train back-propagation (BP) neural networks to recognize Alu sequences. The experimental results show that this representation of DNA sequences is effective and efficient for biological data processing.
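The abstract does not spell out the coding format, so the following is only a minimal Python sketch of the underlying building block: CGR word (k-mer) frequency grids computed at several resolutions. It assumes the common corner convention A=(0,0), C=(0,1), G=(1,1), T=(1,0). The helper names `cgr_grid` and `multi_resolution_vector` are hypothetical. The plain concatenation used here grows in dimension and does not reproduce the paper's dimension-reduction scheme.

```python
import numpy as np

# Assumed CGR corner convention (one common choice; the paper's may differ):
# A=(0,0), C=(0,1), G=(1,1), T=(1,0), given as (x, y) bits.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_grid(seq: str, k: int) -> np.ndarray:
    """Count CGR points in a 2^k x 2^k grid (hypothetical helper, k >= 1).

    Each CGR step moves halfway toward the corner of the current base, so
    the grid cell of a point is fixed by its last k bases: the newest base
    supplies the most significant bit of the cell's column and row. Integer
    bit shifts avoid floating-point errors near cell borders.
    """
    n = 2 ** k
    grid = np.zeros((n, n), dtype=np.int64)
    col = row = 0
    for i, base in enumerate(seq.upper()):
        if base not in CORNERS:
            raise ValueError(f"unsupported base {base!r} at position {i}")
        cx, cy = CORNERS[base]
        # Halving toward a corner: drop the oldest bit, put the new base on top.
        col = (col >> 1) | (cx << (k - 1))
        row = (row >> 1) | (cy << (k - 1))
        if i >= k - 1:  # count only once a full k-mer has been read
            grid[row, col] += 1
    return grid


def multi_resolution_vector(seq: str, resolutions=(1, 2, 3)) -> np.ndarray:
    """Concatenate normalized CGR frequency grids at several resolutions.

    Plain concatenation only: this does NOT reproduce the paper's
    dimension reduction.
    """
    parts = []
    for k in resolutions:
        g = cgr_grid(seq, k).astype(float)
        total = g.sum()
        parts.append((g / total if total else g).ravel())
    return np.concatenate(parts)


if __name__ == "__main__":
    v = multi_resolution_vector("ACGTACGTGGCC", resolutions=(1, 2, 3))
    print(v.shape)  # (84,)
```

Each cell of the 2^k x 2^k grid corresponds to exactly one k-mer, so the grid at resolution k holds the same counts as a 4^k-dimensional word frequency vector. Combining resolutions 1 to 3 in this naive way yields 4 + 16 + 64 = 84 entries, which illustrates why a more compact multi-resolution coding is desirable.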
