Abstract
Motivation: The reference CRAM file format implementation is in Java. We present ‘Scramble’: a new C implementation of SAM, BAM and CRAM file I/O.
Results: The C implementation of CRAM is 1.5–1.7× slower than BAM at decoding but 1.8–2.6× faster at encoding. We see file size savings of 34–55%.
Availability and implementation: Source code is available at http://sourceforge.net/projects/staden/files/io_lib/ under the BSD software licence.
Contact: jkb@sanger.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
Highlights
Storage capacity has been the primary driver behind the development of the CRAM format (Cochrane et al., 2013)
We identified a need for a C implementation, which was implemented as part of the Staden Package’s (Staden et al., 1999) ‘io_lib’ library
The test data used were a 4× coverage of a Homo sapiens sample (ERR317482) aligned by BWA, with a further 1000 Genomes, and a 654× coverage Escherichia coli test set included in the Supplementary Material
Summary
Storage capacity has been the primary driver behind the development of the CRAM format (Cochrane et al., 2013). The CRAM format (Fritz et al., 2011) is a practical implementation of reference-based compression and is a viable alternative to the earlier BAM format (Li et al., 2009). CRAM is the preferred submission format for the European Nucleotide Archive. The initial CRAM prototype was in Python, quickly followed by a Picard (http://picard.sourceforge.net/) compatible Java reference implementation (https://www.ebi.ac.uk/ena/about/cram_toolkit). We identified a need for a C implementation, which we implemented as part of the Staden Package’s (Staden et al., 1999) ‘io_lib’ library. Our primary conversion tool is named Scramble. It can read and write SAM, BAM and CRAM formats using a unified Application Programming Interface (API).
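To make the idea of reference-based compression concrete, the following is a minimal sketch in Python, not the Scramble or io_lib API. It assumes reads differ from the reference only by substitutions; real CRAM also has to handle insertions, deletions, quality values and read names, none of which are modelled here. The names `encode_read` and `decode_read` are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

# One difference from the reference: (offset within the read, observed base).
Mismatch = Tuple[int, str]
# A stored read: (alignment position, read length, list of mismatches).
Record = Tuple[int, int, List[Mismatch]]


def encode_read(reference: str, pos: int, read: str) -> Record:
    """Store an aligned read as its position, length and differences
    from the reference, instead of storing every base.
    Assumption: substitutions only, no insertions or deletions."""
    if pos < 0 or pos + len(read) > len(reference):
        raise ValueError("read does not lie within the reference")
    window = reference[pos:pos + len(read)]
    diffs = [(i, base)
             for i, (ref_base, base) in enumerate(zip(window, read))
             if base != ref_base]
    return pos, len(read), diffs


def decode_read(reference: str, record: Record) -> str:
    """Rebuild the read by copying the reference window and
    re-applying the stored substitutions."""
    pos, length, diffs = record
    bases = list(reference[pos:pos + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)


reference = "ACGTACGTTAGCCGATAGGC"
read = "ACGTTAGACGAT"  # aligned at position 4, one substitution

record = encode_read(reference, 4, read)
restored = decode_read(reference, record)
print(record)
print(restored == read)
```

A read that matches the reference exactly reduces to a position and a length, which is where the space saving comes from; the cost is that decoding needs access to the same reference sequence.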