Abstract

Large sequencing projects now handle tens of thousands of individuals, and the huge files summarizing their findings require compression. We propose a tool that compresses large collections of genotypes almost 30% better than the best tool to date, squeezing a human genotype to less than 62 KB. It can also compress a single sample with reference to an existing database, achieving comparable results. The tool is available at https://github.com/refresh-bio/GTShark. Supplementary data are available at Bioinformatics online.

