Abstract

With the rapid development of genome sequencing technology, a large amount of genome data has been generated, which in turn raises the problem of storing it. The compression of genome data has therefore become a research hotspot. We propose a new genome data compression algorithm called LCMRGC (low memory consumption referential genome compressor) for FASTA format sequences. The algorithm uses a suffix array to search for matching strings and binary search to accelerate exact matching, so as to obtain a better compression ratio. Experimental results on standard genome data show that the proposed algorithm significantly reduces the memory required to run the program while remaining competitive in compression ratio and compression time.
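
The general idea of suffix-array-based referential matching can be illustrated with a minimal sketch. This is not the LCMRGC implementation, whose details are not given in the abstract. It assumes a naive suffix array construction, a greedy longest-match strategy, and a hypothetical `min_len` threshold. All function names (`build_suffix_array`, `longest_match`, `compress`, `decompress`) are illustrative only.

```python
def build_suffix_array(ref: str) -> list[int]:
    # Naive construction by sorting suffix start positions.
    # Fine for a short demo string; real tools use much faster methods.
    return sorted(range(len(ref)), key=lambda i: ref[i:])


def common_prefix_len(ref: str, i: int, target: str, j: int) -> int:
    # Length of the common prefix of ref[i:] and target[j:].
    n = 0
    while i + n < len(ref) and j + n < len(target) and ref[i + n] == target[j + n]:
        n += 1
    return n


def longest_match(ref: str, sa: list[int], target: str, j: int) -> tuple[int, int]:
    # Binary search for the position where target[j:] would be inserted
    # among the sorted suffixes of ref.
    query = target[j:]
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if ref[sa[mid]:] < query:
            lo = mid + 1
        else:
            hi = mid
    # The suffix sharing the longest prefix with the query is always
    # one of the two neighbours of the insertion point.
    best = (0, 0)
    for k in (lo - 1, lo):
        if 0 <= k < len(sa):
            n = common_prefix_len(ref, sa[k], target, j)
            if n > best[1]:
                best = (sa[k], n)
    return best  # (position in ref, match length)


def compress(ref: str, target: str, min_len: int = 4) -> list[tuple]:
    # Encode target as ("M", ref_pos, length) matches and ("L", text) literals.
    # min_len is an assumed threshold below which a match is not worth encoding.
    sa = build_suffix_array(ref)
    ops, literal, j = [], [], 0
    while j < len(target):
        pos, n = longest_match(ref, sa, target, j)
        if n >= min_len:
            if literal:
                ops.append(("L", "".join(literal)))
                literal = []
            ops.append(("M", pos, n))
            j += n
        else:
            literal.append(target[j])
            j += 1
    if literal:
        ops.append(("L", "".join(literal)))
    return ops


def decompress(ref: str, ops: list[tuple]) -> str:
    # Rebuild the target from the reference and the encoded operations.
    parts = []
    for op in ops:
        if op[0] == "M":
            parts.append(ref[op[1]:op[1] + op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


ref = "ACGTACGTTTGACCA"
target = "ACGTTTGAGGACCA"
ops = compress(ref, target)
print(ops)
print(decompress(ref, ops) == target)
```

In this toy run the target is reduced to two references into the reference sequence plus a single literal base. The sketch says nothing about memory use, which is the focus of the proposed algorithm.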

