Abstract
Compression reduces the size of data so that it can be stored in computer storage and transmitted over the internet with limited bandwidth. Genomic sequences (DNA or RNA) contain billions of nucleotide bases (A, G, T, C), which results in very large files. Previous compression algorithms use a direct coding technique in which two bits code each nucleotide base, giving a compression ratio of 2 bits per base (bpb). Some authors achieve a compression ratio below 2 bpb by coding repeated bases differently. In this paper we propose an improvement over the direct coding technique that compresses both repeated and non-repeated sequences and gives better results than the existing algorithms. The existing direct coding algorithm compresses a non-repeated base (B) by prefixing a 0 to the 2-bit code assigned to that base (i.e. 0B), whereas a repeated base is compressed by prefixing a 1 to the 2-bit code assigned to that base (B), followed by a 3-bit code for the number of repetitions (N) (i.e. 1BN). Non-repeated bases are thus coded in 3 bits and repeated bases in 6 bits, but because of the 3-bit count the existing algorithm can only code a run of up to 9 repetitions. A longer run requires a multiple of 6 bits (i.e. 6, 12, 18, and so on). The proposed algorithm improves on this by coding a run longer than 9 but shorter than 15 as a 1, followed by the base (B), followed by 111, followed by N (i.e. 1B111N), in 9 bits instead of the 12 bits needed by the existing algorithm; hence the proposed algorithm provides a better compression ratio.
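The bit costs stated in the abstract can be illustrated with a minimal Python sketch. It only counts bits and does not produce an actual bitstream, because the abstract does not specify how N is mapped to its 3-bit field. The function names (`existing_run_bits`, `proposed_run_bits`, `total_bits`, `bits_per_base`) are hypothetical, and two points are assumptions rather than statements from the paper: a run longer than 9 costs 6 bits per group of up to 9 bases under the existing scheme, and runs of 15 or more fall back to the existing rule under the proposed scheme.

```python
from itertools import groupby


def existing_run_bits(run_length):
    """Bits for one run of identical bases under the existing 0B / 1BN scheme."""
    if run_length == 1:
        return 3  # 0B: prefix bit + 2-bit base code
    # 1BN costs 6 bits and covers up to 9 repetitions; longer runs are
    # assumed to need one 6-bit code per group of up to 9 bases.
    groups = (run_length + 8) // 9  # integer ceiling of run_length / 9
    return 6 * groups


def proposed_run_bits(run_length):
    """Bits for one run under the proposed scheme (1B111N for runs of 10..14)."""
    if 10 <= run_length <= 14:
        return 9  # 1 + 2-bit base + 111 + 3-bit N
    # Assumption: every other run length is coded as in the existing scheme.
    return existing_run_bits(run_length)


def total_bits(sequence, run_bits):
    """Sum the cost of every maximal run of identical bases in the sequence."""
    return sum(run_bits(len(list(group))) for _, group in groupby(sequence))


def bits_per_base(sequence, run_bits):
    """Average number of coded bits per nucleotide base."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return total_bits(sequence, run_bits) / len(sequence)


if __name__ == "__main__":
    sample = "ACG" + "T" * 12 + "A" * 4  # made-up sequence, for illustration only
    print(total_bits(sample, existing_run_bits))
    print(total_bits(sample, proposed_run_bits))
    print(bits_per_base(sample, proposed_run_bits))
```

For a run of 12 identical bases the sketch gives 12 bits under the existing rule and 9 bits under the proposed rule, matching the saving described in the abstract. The sample sequence is invented for illustration and is not data from the paper.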