QUARTIC: QUick pArallel algoRithms for high-Throughput sequencIng data proCessing.

Frédéric Jarlier, Paul Paganiban, Philippe Hupé, Thomas Magalhaes, Nicolas Fedy, Firmin Martin, Leonor Sirotti, Michael Mcmanus, Nicolas Joly

doi:10.12688/f1000research.22954.2

Abstract

Life science has entered the so-called 'big data era', in which biologists, clinicians and bioinformaticians are overwhelmed with high-throughput sequencing data. While these data offer new insights into genome structure, their ever-increasing size raises major challenges for their use in daily clinical practice for care and diagnosis. We therefore implemented software that reduces the time-to-delivery for the alignment and sorting of high-throughput sequencing data. Our solution is implemented using the Message Passing Interface and is intended for high-performance computing architectures. The software scales linearly with respect to the size of the data and ensures total reproducibility with the traditional tools. For example, a 300X whole genome can be aligned and sorted in less than 9 hours with 128 cores. The software offers significant speed-up through multi-core and multi-node parallelization.

Highlights

Life science has entered the so-called 'big data era' where biologists, clinicians and bioinformaticians are overwhelmed with high-throughput sequencing data
As we have entered the era of genomic medicine, delivering the results to the clinicians within a short delay to guide the therapeutic decision is a challenge of the utmost importance in daily clinical practice
A typical bioinformatics workflow to analyze high-throughput sequencing (HTS) data consists of a set of systematic pre-processing steps to i) align the sequencing reads on a reference genome and ii) sort the alignments according to their coordinates on the genome
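To make the second pre-processing step concrete, the sketch below shows what sorting alignments "according to their coordinates on the genome" can mean on toy data. It is only an illustration and not the QUARTIC algorithm: the `Alignment` record, the `coordinate_sort` function and the chromosome names are hypothetical, and a plain in-memory sort stands in for the parallel sort.

```python
from typing import List, NamedTuple, Sequence


class Alignment(NamedTuple):
    """Hypothetical, minimal alignment record (not a real SAM parser)."""
    read_name: str
    chrom: str   # reference sequence the read was aligned to
    pos: int     # leftmost aligned position on that reference


def coordinate_sort(alignments: Sequence[Alignment],
                    chrom_order: Sequence[str]) -> List[Alignment]:
    """Sort by reference order first, then by position on the reference."""
    # Map each reference name to its rank in the reference genome.
    rank = {name: i for i, name in enumerate(chrom_order)}
    return sorted(alignments, key=lambda a: (rank[a.chrom], a.pos))


# Toy input: three reads aligned to two hypothetical chromosomes.
reads = [
    Alignment("r1", "chr2", 500),
    Alignment("r2", "chr1", 900),
    Alignment("r3", "chr1", 100),
]
sorted_reads = coordinate_sort(reads, chrom_order=["chr1", "chr2"])
```

Ordering by reference rank rather than by name avoids the lexicographic pitfall in which a name such as "chr10" would sort before "chr2".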

Summary

23 Jun 2020

The most recent generation of sequencers can produce terabytes of data each day, and we expect this exponential growth in sequencing to continue. This data tsunami raises many challenges, from data management to data analysis, requiring an efficient high-performance computing architecture (Lightbody et al., 2019). The alignment and sorting steps are very time-consuming (up to several days for whole genome analysis) as they suffer from bottlenecks at the CPU, IO and memory levels. Removing these bottlenecks would reduce the time-to-delivery of the results, so that they are available within a reasonable delay even when the sequencers produce very large data. The Message Passing Interface implementation allows an efficient distribution of the workload over the available resources of supercomputers, providing the expected scalability.
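One generic way to distribute a workload evenly over the available resources is a block decomposition, in which each worker receives a contiguous and nearly equal slice of the input. The sketch below is an assumption made for illustration only: the summary does not describe how QUARTIC partitions its data, and `block_ranges` is a hypothetical helper written in plain Python rather than with an MPI library.

```python
from typing import List, Tuple


def block_ranges(total: int, n_workers: int) -> List[Tuple[int, int]]:
    """Split the range [0, total) into n_workers contiguous half-open ranges.

    Range sizes differ by at most one item, so no worker is
    noticeably more loaded than another.
    """
    if n_workers <= 0:
        raise ValueError("n_workers must be positive")
    base, extra = divmod(total, n_workers)
    ranges = []
    start = 0
    for worker in range(n_workers):
        # The first `extra` workers take one additional item each.
        size = base + (1 if worker < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# Example: 10 items (for instance, file chunks) shared by 4 workers.
example = block_ranges(10, 4)
```

In an MPI program each rank would typically compute only its own range from its rank number and the total number of ranks, so no communication is needed to agree on the split.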

Journal: F1000Research	Publication Date: Jun 23, 2020
Citations: 3	License type: CC BY 4.0
