Abstract

We propose a new algorithm for reference-based compression of genome resequencing data. First, we recap a recent reference-based technique for compressing resequencing data via the Longest Previous Factor (LPF). Viewing the problem in a new light, we call this the Sequential Longest Factor (SLF) method and introduce improvements to it. We then leverage the LPF further to propose a new compression method, the Maximal Longest Factor (MLF). For the Homo sapiens genome, our proposed MLF achieves a compression ratio of 486, a significant improvement over 399 (newly improved SLF), 360 (original SLF; Beal et al., BMC Genomics, 2016), 171 (Pinho et al., NAR, 2011), and 157 (Wang and Zhang, NAR, 2011).
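
For readers unfamiliar with the Longest Previous Factor, the following is a minimal sketch of its standard definition only: for each position i of a string, LPF[i] is the length of the longest substring starting at i that also starts at some earlier position j < i (overlaps allowed). The function name `lpf_naive` is hypothetical, and the quadratic-time loop is an illustrative assumption; it is not the SLF or MLF method of this paper, and practical implementations compute the LPF from a suffix array in linear time.

```python
def lpf_naive(s: str) -> list[int]:
    """Naive Longest Previous Factor array (illustrative, not the paper's code).

    lpf[i] is the length of the longest substring starting at i that
    also occurs starting at some position j < i (overlap is allowed).
    """
    n = len(s)
    lpf = [0] * n
    for i in range(n):
        best = 0
        for j in range(i):
            # Extend the match between the suffixes starting at j and i.
            length = 0
            while i + length < n and s[j + length] == s[i + length]:
                length += 1
            best = max(best, length)
        lpf[i] = best
    return lpf


print(lpf_naive("abab"))  # [0, 0, 2, 1]
```

A long run of large LPF values indicates a region that repeats earlier text, which is the kind of redundancy that factor-based compressors replace with (position, length) references.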
