Abstract
We propose a new algorithm for reference-based compression of genome resequencing data. First, we recap a recent reference-based technique for compressing resequencing data via the Longest Previous Factor (LPF). Viewing the problem in a new light, we call this the Sequential Longest Factor (SLF) method and introduce improvements to it. We then further leverage the LPF to propose a new compression method, the Maximal Longest Factor (MLF). For the Homo sapiens genome, our proposed MLF achieves a compression ratio of 486, a significant improvement over 399 (newly improved SLF), 360 (original SLF; Beal et al., BMC Genomics, 2016), 171 (Pinho et al., NAR, 2011), and 157 (Wang and Zhang, NAR, 2011).
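To make the general idea of reference-based factorization concrete, the following is a minimal sketch in Python. It is not the SLF or MLF method of this paper, whose details are not given in the abstract; it is a naive greedy scheme, assumed here purely for illustration, that encodes a target sequence as copies of the longest matching substrings of a reference plus literal symbols. The function names (`longest_factor`, `factorize`, `reconstruct`) and the tuple encoding are hypothetical, and the quadratic-time search stands in for the LPF-based machinery a practical implementation would use.

```python
def longest_factor(reference: str, target: str, start: int) -> tuple[int, int]:
    """Find the longest prefix of target[start:] that occurs in reference.

    Returns (position in reference, length); length is 0 if nothing matches.
    Naive O(len(reference) * match length) scan, for illustration only.
    """
    best_pos, best_len = -1, 0
    for pos in range(len(reference)):
        length = 0
        while (pos + length < len(reference)
               and start + length < len(target)
               and reference[pos + length] == target[start + length]):
            length += 1
        if length > best_len:  # strict '>' keeps the leftmost longest match
            best_pos, best_len = pos, length
    return best_pos, best_len


def factorize(reference: str, target: str) -> list[tuple]:
    """Greedily encode target as ("copy", pos, length) and ("lit", symbol) factors."""
    factors = []
    i = 0
    while i < len(target):
        pos, length = longest_factor(reference, target, i)
        if length == 0:
            # Symbol absent from the reference: store it literally.
            factors.append(("lit", target[i]))
            i += 1
        else:
            factors.append(("copy", pos, length))
            i += length
    return factors


def reconstruct(reference: str, factors: list[tuple]) -> str:
    """Rebuild the target from the reference and the factor list."""
    parts = []
    for factor in factors:
        if factor[0] == "lit":
            parts.append(factor[1])
        else:
            _, pos, length = factor
            parts.append(reference[pos:pos + length])
    return "".join(parts)


if __name__ == "__main__":
    reference = "ACGTACGTTTGA"
    target = "ACGTTTGAACGTNAC"
    factors = factorize(reference, target)
    print(factors)
    assert reconstruct(reference, factors) == target
```

In this toy example a 15-symbol target is described by four factors. When the target differs from the reference only by sparse variations, as with resequencing data, a small number of long copy factors can cover most of the sequence, which is the intuition behind the compression ratios reported above.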