Colib'read on galaxy: a tools suite dedicated to biological information extraction from raw NGS reads.

Yvan Le Bras, Cyril Monjeaud, Susete Alves-Carvalho, Leena Salmela, Raluca Uricaru, Claire Lemaitre, Gustavo Sacomoto, Olivier Collin, Éric Rivals, Vincent Miele, Camille Marchet, Alexan Andrieux, Vincent Lacroix, Pierre Peterlongo, Amal Zine El Aabidine, Bastien Cazaux

doi:10.1186/s13742-015-0105-2

Abstract

Background: With next-generation sequencing (NGS) technologies, the life sciences face a deluge of raw data. Classical analysis processes for such data often begin with an assembly step, which needs large amounts of computing resources and can remove or modify parts of the biological information contained in the data. Our approach instead focuses directly on biological questions by considering raw, unassembled NGS data through a suite of six command-line tools.

Findings: Dedicated to ‘whole-genome assembly-free’ treatments, the Colib’read tools suite uses optimized algorithms for various analyses of NGS datasets, such as variant calling or read set comparisons. Based on the use of a de Bruijn graph and a Bloom filter, such analyses can be performed in a few hours, using small amounts of memory. Applications using real data demonstrate the good accuracy of these tools compared to classical approaches. To facilitate data analysis and tools dissemination, we developed Galaxy tools and tool shed repositories.

Conclusions: With the Colib’read Galaxy tools suite, we enable a broad range of life scientists to analyze raw NGS data. More importantly, our approach allows the maximum biological information to be retained in the data, and uses a very low memory footprint.

Electronic supplementary material: The online version of this article (doi:10.1186/s13742-015-0105-2) contains supplementary material, which is available to authorized users.
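To make the idea of a de Bruijn graph backed by a Bloom filter more concrete, here is a minimal Python sketch. It is an illustration only, not the Colib’read implementation: the class `BloomFilter`, the functions `kmers`, `build_filter`, and `successors`, and the parameters `n_bits`, `n_hashes`, and `k` are all hypothetical names chosen for this example, and reverse complements and false-positive removal are deliberately ignored. The k-mers of the reads are stored in the filter, and graph edges are never stored explicitly; they are recovered by querying the four possible one-base extensions of a k-mer.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        # Derive n_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(read, k):
    """Yield every substring of length k of a read, left to right."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def build_filter(reads, k):
    """Insert all k-mers of all reads into a fresh Bloom filter."""
    bf = BloomFilter()
    for read in reads:
        for kmer in kmers(read, k):
            bf.add(kmer)
    return bf


def successors(bf, kmer):
    """Implicit de Bruijn edges: one-base extensions present in the filter."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bf]


if __name__ == "__main__":
    reads = ["ACGTAC", "CGTACG"]  # toy reads, assumed for illustration
    bf = build_filter(reads, k=4)
    print(successors(bf, "ACGT"))
```

Because a Bloom filter can report k-mers that were never inserted, `successors` may occasionally return a spurious neighbor; in exchange, memory use is fixed by `n_bits` rather than by the number of distinct k-mers, which is the general reason such structures keep the memory footprint low.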

Highlights

With next-generation sequencing (NGS) technologies, the life sciences face a deluge of raw data
With the Colib’read Galaxy tools suite, we enable a broad range of life scientists to analyze raw NGS data
Our approach allows the maximum biological information to be retained in the data, and uses a very low memory footprint

Summary

Introduction

With next-generation sequencing (NGS) technologies, the life sciences face a deluge of raw data. A set of six tools based on this framework is described below: KISSPLICE [2], MAPSEMBLER2 [3], DISCOSNP [4], TAKEABREAK [5], COMMET [6], and LORDEC [7]. The outputs of the suite include SNPs, small indels, and alternative splicing events; SNP sequences with their coverages; inversion breakpoints; validation and visualization of genome structure near a locus of interest; a global comparison of input sets at the read level; and a corrected PacBio read set. TAKEABREAK detects patterns generated by inversions.


Journal: GigaScience
Publication Date: Feb 11, 2016
Citations: 23
License type: cc-by
