Abstract

Background: Numerous essential algorithms and methods, including entropy-based quantitative methods, have been developed over the last decade to analyze complex DNA sequences. Exons and introns are the most notable components of DNA, and their identification and prediction remain a focus of state-of-the-art research.

Results: In this study, we designed an integrated entropy-based analysis approach, which combines a modified topological entropy calculation, a genomic signal processing (GSP) method and singular value decomposition (SVD), to investigate exons and introns in DNA sequences. We optimized and implemented the topological entropy and the generalized topological entropy to calculate the complexity of DNA sequences, highlighting the characteristics of repetitive sequences. By comparing the digitized entropy values of exons and introns, we observed that they are significantly different. After converting the DNA data to numerical topological entropy values, we applied the SVD method to investigate exon and intron regions on a single gene sequence. Additionally, several genes across five species were used for exon prediction.

Conclusions: Our approach not only helps to explore the complexity of DNA sequences and their functional elements, but also provides an entropy-based GSP method to analyze exon and intron regions. The approach is applicable across different species and can be extended to analyze other components in both coding and noncoding regions of DNA sequences.

Highlights

  • Numerous essential algorithms and methods, including entropy-based quantitative methods, have been developed over the last decade to analyze complex deoxyribonucleic acid (DNA) sequences

  • We optimized and implemented the topological entropy and the generalized topological entropy to calculate the complexity of DNA sequences, highlighting the characteristics of repetitive sequences

  • Modified generalized topological entropy and its application to exploring the complexity of exons, introns and promoters: topological entropy was proposed by Koslicki [16] to solve the problem of calculating entropy on finite sequences
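
To make the last highlight concrete, the following is a minimal Python sketch of topological entropy for a finite DNA string. It assumes the commonly cited form of Koslicki's definition: choose the largest word length n with 4^n + n - 1 <= |w|, keep the first 4^n + n - 1 bases, and take log base 4 of the number of distinct n-mers, divided by n. It is not the modified or generalized version developed in this study, and the function name `topological_entropy` is hypothetical.

```python
import math


def topological_entropy(seq: str) -> float:
    """Topological entropy of a finite DNA string (assumed Koslicki-style definition).

    Assumption: the input contains only A, C, G and T; other symbols are
    not filtered and would simply be counted as part of the n-mers.
    """
    seq = seq.upper()
    length = len(seq)

    # Find the largest n such that 4**n + n - 1 <= len(seq).
    n = 0
    while 4 ** (n + 1) + (n + 1) - 1 <= length:
        n += 1
    if n == 0:
        raise ValueError("sequence too short: at least 4 bases are required")

    # Truncate to the first 4**n + n - 1 bases, which contain exactly 4**n n-mers.
    window = seq[: 4 ** n + n - 1]
    distinct = {window[i : i + n] for i in range(len(window) - n + 1)}

    # Normalised so that the value lies between 0 and 1.
    return math.log(len(distinct), 4) / n
```

Under this definition a highly repetitive string such as `"A" * 17` scores 0, while a window in which every possible n-mer occurs scores 1, which is why the truncation length is tied to n.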


Summary

Introduction

Numerous essential algorithms and methods, including entropy-based quantitative methods, have been developed over the last decade to analyze complex DNA sequences. Research on deoxyribonucleic acid (DNA) is a key topic and an important foundation of biological and life science studies [1, 2]. Functional DNA elements such as genes and noncoding elements are composed of four nucleotides: adenine (A), cytosine (C), guanine (G) and thymine (T). Information theory is the science of the measurement, transmission, exchange and storage of information. Information-theoretic methods [9, 10] offer a feasible way to analyze genetic information [6, 11–13], so it is reasonable to analyze genome sequences with information entropy methods. Based on the theory of information entropy, the complexity of given sequences can be described quantitatively, and the sequences can be categorized according to their complexity.
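
As an illustration of how information entropy can quantify sequence complexity, here is a minimal Python sketch that computes the Shannon entropy of the k-mer distribution of a DNA string. It is a generic textbook measure rather than the method used in this study; the function name `kmer_shannon_entropy` and the choice of overlapping k-mers are assumptions made for the example.

```python
import math
from collections import Counter


def kmer_shannon_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy (in bits) of the overlapping k-mer distribution of seq."""
    seq = seq.upper()
    if k < 1 or len(seq) < k:
        raise ValueError("k must be at least 1 and no longer than the sequence")

    # Count every overlapping k-mer in the sequence.
    counts = Counter(seq[i : i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())

    # H = -sum(p * log2(p)) over the observed k-mers.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rank_by_complexity(seqs: list[str], k: int = 1) -> list[str]:
    """Sort sequences from lowest to highest k-mer entropy."""
    return sorted(seqs, key=lambda s: kmer_shannon_entropy(s, k))
```

With k = 1, a string that uses all four bases equally often reaches the maximum of 2 bits, and a single-base repeat gives 0 bits, so sorting by this value groups sequences by complexity in the sense described above.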

