Abstract
The rapid growth of genomic sequences (DNA and RNA) stored in databases poses a serious challenge to the storage, processing, and transmission of these data. Effective management of genetic data is therefore necessary, which makes data compression unavoidable. Current standard compression tools are insufficient for compressing DNA sequences. In this paper, we propose an efficient lossless DNA compression algorithm based on the One-Bit Compression method (OBComp) that compresses both repeated and non-repeated sequences. Unlike direct coding, where two bits are assigned to each nucleotide, resulting in a compression ratio of 2 bits per base (bpb), OBComp uses a single bit, 0 or 1, to code the two most frequent nucleotides. The positions of the other two nucleotides are saved. To further enhance the compression, a modified version of the Run-Length Encoding technique and the Huffman coding algorithm are then applied, in that order. The proposed algorithm efficiently reduces the original size of DNA sequences. Its ease of implementation and its remarkable compression ratio make it attractive to use.
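The one-bit step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `obcomp_encode` and `obcomp_decode` are hypothetical, and the sketch assumes that the two less frequent nucleotides are left out of the bit stream and stored as plain lists of positions, a detail the abstract does not specify. The subsequent modified Run-Length Encoding and Huffman coding stages are omitted.

```python
from collections import Counter


def obcomp_encode(seq):
    """Sketch of the one-bit step (assumed layout, not the paper's exact format)."""
    # Rank nucleotides from most to least frequent.
    ranked = [base for base, _ in Counter(seq).most_common()]
    top = ranked[:2]
    # The most frequent base is coded as '0', the second most frequent as '1'.
    bit_of = {base: str(i) for i, base in enumerate(top)}
    bits = []
    # Assumption: remaining bases are recorded by position only.
    positions = {base: [] for base in ranked[2:]}
    for i, base in enumerate(seq):
        if base in bit_of:
            bits.append(bit_of[base])
        else:
            positions[base].append(i)
    return "".join(bits), top, positions, len(seq)


def obcomp_decode(bits, top, positions, length):
    """Invert obcomp_encode: restore saved positions, then fill gaps from the bits."""
    out = [None] * length
    for base, indexes in positions.items():
        for i in indexes:
            out[i] = base
    bit_iter = iter(bits)
    for i in range(length):
        if out[i] is None:
            out[i] = top[int(next(bit_iter))]
    return "".join(out)


seq = "AACGTAAACCAT"
bits, top, positions, length = obcomp_encode(seq)
restored = obcomp_decode(bits, top, positions, length)
```

In this example, 9 of the 12 nucleotides are coded with one bit each and the remaining 3 are kept as positions. Whether this beats 2 bits per base on real data depends on how compactly the positions are stored, which is where the later encoding stages would come in.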