Using next-generation sequencers to decode DNA has been one of the major breakthroughs in genomic research in recent decades. Many current next-generation sequencers offer high throughput at relatively low cost, but assembling the reads that these sequencers produce remains challenging. In this paper, we propose a novel Hidden Markov Model-based (HMM-based) approach to next-generation genome sequence assembly. The paper introduces the major challenges that existing assemblers encounter in the next-generation setting and the four basic stages of our proposed method: (a) pre-processing filtering, (b) graph construction, (c) graph simplification, and (d) post-processing filtering. Experimental results on a number of open-source DNA datasets show that the performance of the new approach meets or exceeds the state of the art.