SeedsGraph: an efficient assembler for next-generation sequencing data.

Chunyu Wang, Xiaoyan Liu, Yang Liu, Quan Zou, Maozu Guo

doi:10.1186/1755-8794-8-s2-s13

Open Access

Abstract

DNA sequencing technology has been evolving rapidly and produces a large and fast-growing number of short reads. This has led to a resurgence of research in whole genome shotgun assembly algorithms. Our assembly algorithm starts by clustering the short reads in a cloud computing framework; the clustering process groups fragments according to the similarity of the consensus long sequences they originate from. We condense each group of reads into a chain of seeds, where a seed is a substring to which reads are aligned, and then build a graph from these chains. Finally, we analyze the graph to find Euler paths, assemble the reads along each path into contigs, and lay out the contigs into scaffolds using mate-pair information. The results show that our algorithm is efficient and feasible for large read sets such as those produced by next-generation sequencing technology.
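The graph step described in the abstract can be illustrated with a standard technique. The sketch below is not SeedsGraph's seed graph, whose exact construction is not given here; as an assumption it substitutes a de Bruijn-style graph in which nodes are (k-1)-mers and each distinct k-mer is an edge, finds an Euler path with Hierholzer's algorithm, and spells the path as a contig. The names `build_kmer_graph`, `euler_path` and `spell_contig` are hypothetical.

```python
from collections import defaultdict


def build_kmer_graph(reads, k):
    """Hypothetical stand-in for the seed graph: nodes are (k-1)-mers and
    every distinct k-mer found in the reads becomes one directed edge."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make traversal deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def euler_path(graph):
    """Hierholzer's algorithm: return nodes in an order that uses every edge once.
    Assumes a non-empty, connected graph that actually has an Euler path."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for u, vs in graph.items():
        out_deg[u] += len(vs)
        for v in vs:
            in_deg[v] += 1
    nodes = sorted(set(out_deg) | set(in_deg))
    # An Euler path must start where out-degree exceeds in-degree by one;
    # if no such node exists the path is a cycle and any node will do.
    start = next((n for n in nodes if out_deg[n] - in_deg[n] == 1),
                 next(iter(graph)))
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is finished
    return path[::-1]


def spell_contig(path):
    """Consecutive (k-1)-mers overlap by k-2 characters, so append last chars."""
    return path[0] + "".join(node[-1] for node in path[1:])
```

For example, `spell_contig(euler_path(build_kmer_graph(["ATGGCG", "GCGTGCA"], 4)))` reassembles `ATGGCGTGCA`. Real read sets contain repeats and sequencing errors, so such a graph typically has several valid Euler paths or none, and an assembler must split it into unambiguous paths rather than rely on a single traversal.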

Highlights

The introduction of the massively parallel next-generation sequencing (NGS) technologies has caused a great increase in the number of reads typically generated by experiments.
The whole genome shotgun (WGS) de novo assembly problem is the reconstruction of the genetic sequence information from a set of reads sequenced from genome fragments.
In this paper we present methods and implementation techniques for a new clustering-based, graph-based assembler, named SeedsGraph, which is efficient and takes advantage of cloud computing for large NGS datasets.
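The clustering idea can be pictured on a single machine. The following minimal sketch is an assumption, not the authors' cloud-based procedure: it treats two reads as related when they share at least one k-mer and merges related reads transitively with a union-find structure. `cluster_reads` and `UnionFind` are hypothetical names.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_reads(reads, k):
    """Group read indices that share at least one k-mer, transitively.
    Hypothetical single-machine stand-in for distributed read clustering."""
    uf = UnionFind(len(reads))
    first_seen = {}  # k-mer -> index of the first read that contained it
    for idx, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in first_seen:
                uf.union(first_seen[kmer], idx)
            else:
                first_seen[kmer] = idx
    groups = defaultdict(list)
    for idx in range(len(reads)):
        groups[uf.find(idx)].append(idx)
    return sorted(groups.values())
```

For example, `cluster_reads(["ATGGCG", "GCGTGCA", "TTTTAAA", "TAAACC"], 3)` returns `[[0, 1], [2, 3]]`: the first two reads share `GCG` and the last two share `TAA`. Each cluster can then be assembled independently, which is what makes this kind of grouping attractive for parallel processing.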

Summary

Introduction

The introduction of the massively parallel next-generation sequencing (NGS) technologies has caused a great increase in the number of reads typically generated by experiments. The shorter read lengths of NGS and the growing demand for more scalable assemblers have posed an important computational challenge, and genome assembly continues to be one of the most difficult and important algorithmic problems in bioinformatics. Software technology and algorithm implementation become critical when dealing with terabytes of data. Cloud computing, as a new way of processing extremely large datasets, offers a good opportunity for bioinformatics data processing, and its capability and feasibility for such applications have been discussed [1,2]. We design a graph-based method for the NGS read assembly problem and implement it as a software package, SeedsGraph. In the Background section, the NGS read assembly problem and the framework for cloud computing are discussed.

Background

9: Save T to HDFS for the next job

Journal: BMC Medical Genomics
Publication Date: May 29, 2015
Citations: 17
License type: CC BY

Similar Papers

Short Read (Next-Generation) Sequencing
Jaya Punetha ... Eric P Hoffman
Circulation: Cardiovascular Genetics | VOL. 6 | 14 Jul 2013

Next Generation Sequencing Technologies and Their Applications
Ku Chee‐Seng ... Pawitan Yudi
19 Apr 2010

Current state-of-art of sequencing technologies for plant genomics research
M Thudi ... Y Li
Briefings in Functional Genomics | VOL. 11 | 01 Jan 2012

Novel Computational Technologies for Next-Generation Sequencing Data Analysis and Their Applications.
Chuan Yi Tang ... Che-Lun Hung
International Journal of Genomics | VOL. 2015 | 01 Jan 2015
