Abstract

Next-generation sequencing has revolutionized the field of genomics by enabling accurate, rapid and cost-effective genome analysis with high-throughput sequencing technologies. This has intensified the need for accurate and performance-efficient genome assemblers that can assemble the large sets of short reads produced by next-generation sequencing technologies. Genome assembly is an NP-hard problem that is computationally challenging. Current methods therefore rely on heuristic and approximation algorithms to assemble genomes, which prevents them from arriving at the most accurate solution. This paper presents a novel approach that gamifies whole-genome shotgun assembly from next-generation sequencing data: we present "Geno", a human-computing game designed to improve the accuracy of whole-genome shotgun assembly. We evaluate the feasibility of crowdsourcing whole-genome shotgun assembly by breaking the problem into small subtasks. The evaluation on single-cell Escherichia coli K-12 substr. MG1655 (read length 25 bp, 40x coverage), which produced 144,867 game instances with a mean of 25 sequences per instance, indicates that genome assembly can feasibly be divided into subtasks and solved using crowdsourcing.
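
The read length, coverage and read count mentioned in the abstract are linked by the standard Lander-Waterman relation C = N × L / G (mean coverage C, number of reads N, read length L, genome length G). The sketch below applies it as a rough sanity check. The genome length of 4,641,652 bp is an assumption (the size usually quoted for the MG1655 reference) and is not stated in the source; the source also does not report a raw read count, so the number computed here is illustrative only and is not derived from the 144,867 game instances.

```python
import math

# Assumed genome length of E. coli K-12 MG1655 (not given in the source text).
GENOME_LENGTH = 4_641_652


def reads_needed(coverage: float, read_length: int, genome_length: int) -> int:
    """Smallest read count N such that N * read_length / genome_length >= coverage."""
    # Round up: a fraction of a read cannot be sequenced.
    return math.ceil(coverage * genome_length / read_length)


def mean_coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """Mean sequencing depth C = N * L / G."""
    return num_reads * read_length / genome_length


n = reads_needed(40, 25, GENOME_LENGTH)
print(n)  # 7426644
print(round(mean_coverage(n, 25, GENOME_LENGTH), 2))  # 40.0
```

With 25 bp reads, every extra unit of coverage costs roughly 186,000 additional reads under this assumed genome length, which is why short-read data sets at 40x become large enough to motivate splitting the work into many small instances.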

Highlights

  • Reconstruction of whole-genome sequences involves two main steps: genome fragmentation and genome assembly

  • MG1655 was used as the reference, and Next Generation Sequencing (NGS) reads were simulated using the ART NGS simulator [44] and mapped to the reference genome using the Burrows-Wheeler Aligner (BWA) software [45]; duplicates were removed and the reads were sorted using the location mappings given by BWA

  • Increasing the coverage to 40x minimized the number of noise reads without any overlaps and produced clusters with a feasible number of read sequences for an individual player to align
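
The preprocessing described in the highlights (mapping reads to the MG1655 reference, removing duplicates, sorting by mapped location, and grouping reads into clusters that one player can align) could be sketched roughly as follows. This is a hypothetical illustration, not Geno's actual pipeline: `MappedRead`, `deduplicate`, `cluster_reads`, `min_overlap` and `min_cluster_size` are names introduced here. It assumes 0-based positions and ungapped alignments, and it approximates duplicates as reads with identical position and sequence, which is simpler than what dedicated duplicate-marking tools do.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MappedRead:
    read_id: str
    pos: int  # 0-based leftmost mapped position on the reference (assumption)
    seq: str

    @property
    def end(self) -> int:
        # Exclusive end coordinate, assuming an ungapped full-length alignment.
        return self.pos + len(self.seq)


def deduplicate(reads: List[MappedRead]) -> List[MappedRead]:
    """Keep the first read for each (position, sequence) pair."""
    seen = set()
    unique = []
    for r in reads:
        key = (r.pos, r.seq)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def cluster_reads(reads: List[MappedRead], min_overlap: int = 1,
                  min_cluster_size: int = 2) -> List[List[MappedRead]]:
    """Group position-sorted reads into clusters of transitively overlapping reads.

    A read joins the current cluster when it overlaps the cluster's rightmost
    extent by at least `min_overlap` bases. Clusters smaller than
    `min_cluster_size` (for example isolated reads with no overlaps) are dropped.
    """
    ordered = sorted(deduplicate(reads), key=lambda r: (r.pos, r.read_id))
    clusters: List[List[MappedRead]] = []
    current: List[MappedRead] = []
    current_end = 0
    for r in ordered:
        if current and current_end - r.pos >= min_overlap:
            current.append(r)
            current_end = max(current_end, r.end)
        else:
            # Close the previous cluster and start a new one with this read.
            if len(current) >= min_cluster_size:
                clusters.append(current)
            current = [r]
            current_end = r.end
    if len(current) >= min_cluster_size:
        clusters.append(current)
    return clusters


reads = [
    MappedRead("r1", 0, "ACGTACGTAC"),
    MappedRead("r2", 5, "CGTACGTTTT"),
    MappedRead("r3", 5, "CGTACGTTTT"),   # duplicate of r2
    MappedRead("r4", 40, "GGGGCCCCAA"),  # no overlaps: treated as noise
    MappedRead("r5", 100, "TTTTAAAACC"),
    MappedRead("r6", 104, "AAAACCGGTT"),
]
clusters = cluster_reads(reads)
print([[r.read_id for r in c] for c in clusters])  # [['r1', 'r2'], ['r5', 'r6']]
```

Higher coverage narrows the gaps between neighbouring reads, so fewer reads end up alone in singleton clusters; this is in line with the effect the highlights report at 40x. A real pipeline would presumably also need to cap or split large clusters so that each game instance stays playable; that step is omitted here.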


Summary

Introduction

Reconstruction of whole-genome sequences involves two main steps: genome fragmentation and genome assembly. Genome fragmentation is the process in which genome sequencing technologies generate fragments of DNA called reads. These reads are the input to genome assemblers, which reconstruct the original whole genome of the organism. Genome assembly is an essential step because sequencing methods cannot read the whole genome in a single pass, so the reads must be carefully aligned and merged. There are two main genome sequencing technologies: 1. Sanger sequencing technology, introduced in 1977 [1], which produces longer DNA reads of around 2000 base pairs (bp) in length.
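
To illustrate what merging overlapping reads involves, and the kind of heuristic the abstract contrasts with Geno, the sketch below implements a textbook greedy overlap assembler: it repeatedly joins the pair of sequences with the longest suffix-prefix overlap. It is not the method used by Geno or by any particular assembler. `suffix_prefix_overlap`, `greedy_assemble` and the `min_len` threshold are hypothetical names, and the sketch assumes error-free reads from a single strand (no reverse complements, no mismatches).

```python
from typing import List


def suffix_prefix_overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`, or 0 if below min_len."""
    # Try the longest candidate first so the first hit is the maximal overlap.
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of sequences with the largest suffix-prefix overlap."""
    # Drop exact duplicates (keeping order), then reads contained in another read.
    contigs = list(dict.fromkeys(reads))
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = suffix_prefix_overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            # No remaining overlap of at least min_len: return the contigs as they are.
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


print(greedy_assemble(["ATTAGACC", "GACCTGCC", "TGCCGGAA"]))  # ['ATTAGACCTGCCGGAA']
```

Because the greedy strategy always commits to the locally longest overlap, it can join reads in the wrong order on repetitive sequence; this is one concrete way in which a heuristic can miss the most accurate assembly.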
