Abstract

Data storage has been a major and growing part of research IT budgets for many years. In biology especially, the volume of raw data keeps growing, and the advent of so-called "next-generation" sequencers has made things worse. Affordable prices have pushed scientists to sequence whole genomes on a massive scale and to screen large cohorts of patients, producing enormous amounts of data as a side effect. The need to fit as much data as possible into the available storage has encouraged the development of new compression algorithms and tools. Here we focus on state-of-the-art compression tools and measure their compression performance on ABI SOLiD data.

Highlights

  • Since 1953, when engineers at IBM’s San Jose, California laboratory invented the first “random access file” [1], storage performance has grown continuously in both capacity and speed of data access

  • We correctly retrieved more than 95% of the original variants after decompression of the BAM files compressed by CRAMtools, NGC and Quip

  • Given the dramatic expansion of Next Generation Sequencing (NGS) technology in biomedical research and the resulting production of huge amounts of data, data compression has become very important

Summary

Introduction

Since 1953, when engineers at IBM’s San Jose, California laboratory invented the first “random access file” [1], storage performance has grown continuously in both capacity and speed of data access. The cost of storage space has dropped drastically over time. It was estimated to follow the equation $\text{cost} = 10^{-0.2502 \times (\text{year} - 1980) + 6.304}$, with an extraordinarily high coefficient of correlation (r = 0.9916) [2]. This favourable trend has only partly offset the economic impact of the very high data production rates of today’s biomedical instruments. Storage costs have largely exceeded reagent costs, sometimes leading to the extreme decision to repeat an experiment rather than retain the raw data for a long time.
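As a rough illustration, the fitted trend can be evaluated numerically. The sketch below reads the cost equation as a power of ten with a negative slope, cost = 10^(−0.2502 × (year − 1980) + 6.304), so that the estimate falls over time. That reading is an assumption about the intended form of the fit in [2]. The function names `storage_cost` and `halving_time_years` are hypothetical, and the result is in whatever units [2] uses, which the text does not state.

```python
import math

# Coefficients of the log-linear fit quoted in the text (attributed to [2]).
# Assumption: the slope is negative, since the text says cost drops with time.
SLOPE = -0.2502
INTERCEPT = 6.304
BASE_YEAR = 1980


def storage_cost(year: float) -> float:
    """Estimated storage cost for a given year under the assumed fit."""
    exponent = SLOPE * (year - BASE_YEAR) + INTERCEPT
    return 10 ** exponent


def halving_time_years() -> float:
    """Years needed for the estimated cost to halve under the assumed fit."""
    # 10 ** (SLOPE * t) == 0.5  =>  t = log10(2) / |SLOPE|
    return math.log10(2) / abs(SLOPE)


if __name__ == "__main__":
    for year in (1980, 1990, 2000, 2010):
        print(year, f"{storage_cost(year):.4g}")
    print("halving time (years):", round(halving_time_years(), 2))
```

Under this reading the estimated cost halves roughly every 1.2 years, which gives a sense of how steep the decline is compared with the growth in sequencing output described above.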

