Abstract
Background: With next-generation sequencing (NGS) technologies, the life sciences face a deluge of raw data. Classical analysis processes for such data often begin with an assembly step, needing large amounts of computing resources, and potentially removing or modifying parts of the biological information contained in the data. Our approach proposes to focus directly on biological questions, by considering raw unassembled NGS data, through a suite of six command-line tools.

Findings: Dedicated to ‘whole-genome assembly-free’ treatments, the Colib’read tools suite uses optimized algorithms for various analyses of NGS datasets, such as variant calling or read set comparisons. Based on the use of a de Bruijn graph and Bloom filter, such analyses can be performed in a few hours, using small amounts of memory. Applications using real data demonstrate the good accuracy of these tools compared to classical approaches. To facilitate data analysis and tools dissemination, we developed Galaxy tools and tool shed repositories.

Conclusions: With the Colib’read Galaxy tools suite, we enable a broad range of life scientists to analyze raw NGS data. More importantly, our approach allows the maximum biological information to be retained in the data, and uses a very low memory footprint.

Electronic supplementary material: The online version of this article (doi:10.1186/s13742-015-0105-2) contains supplementary material, which is available to authorized users.
Highlights
With next-generation sequencing (NGS) technologies, the life sciences face a deluge of raw data
With the Colib’read Galaxy tools suite, we enable a broad range of life scientists to analyze raw NGS data
Our approach allows the maximum biological information to be retained in the data, and uses a very low memory footprint
Summary
With next-generation sequencing (NGS) technologies, the life sciences face a deluge of raw data. A set of six tools based on this framework, KISSPLICE [2], MAPSEMBLER2 [3], DISCOSNP [4], TAKEABREAK [5], COMMET [6], and LORDEC [7], are described below. The outputs listed for the suite are: SNPs, small indels, and alternative splicing events; SNP sequences with their coverages; inversion breakpoints; validation and visualization of genome structure near a locus of interest; global comparison of input sets at the read level; and a corrected PacBio read set. TAKEABREAK detects patterns generated by inversions.
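The abstract attributes the suite's low memory use to a de Bruijn graph combined with a Bloom filter. The Python sketch below illustrates that general idea only and is not the Colib’read implementation: the k-mers of the reads are inserted into a Bloom filter, and graph edges are recovered on demand by querying the four possible one-nucleotide extensions of a k-mer. The names `BloomFilter`, `build_graph`, and `successors`, the filter size, and the hash scheme are all assumptions made for illustration; reverse complements and the false-positive handling a real tool would need are left out.

```python
import hashlib


class BloomFilter:
    """Bit array probed by several hashes; no false negatives, rare false positives."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        # Derive n_hashes bit positions by salting one hash function (illustrative choice).
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def kmers(read, k):
    """Yield every substring of length k from a read."""
    return (read[i:i + k] for i in range(len(read) - k + 1))


def build_graph(reads, k):
    """Store all k-mers of the raw reads; the filter is the only graph structure kept."""
    bloom = BloomFilter()
    for read in reads:
        for kmer in kmers(read, k):
            bloom.add(kmer)
    return bloom


def successors(bloom, kmer):
    """Implicit edges: drop the first base, append each nucleotide, query the filter."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bloom]


reads = ["ACGTAC", "CGTACG"]
graph = build_graph(reads, k=4)
print(successors(graph, "ACGT"))  # includes "CGTA", the next k-mer in both reads
```

Because nodes are never stored explicitly, memory is bounded by the size of the bit array rather than by the number of distinct k-mers, at the cost of occasional spurious neighbors.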