Abstract

DNA sequencing technologies hold great promise in generating information that will help scientists understand how the genome affects human health and organismal evolution. Generating raw genome sequence data is becoming cheaper and faster, but also more error-prone. Assembling such data into high-quality finished genome sequences remains challenging. Many genome assembly tools are available, but they differ in their performance and in their final output. More importantly, it remains largely unclear how best to assess the quality of assembled genome sequences. Here we evaluate the accuracy of several genome scaffolding algorithms using two different types of data generated from the genome of the same human individual: whole genome shotgun sequencing (WGS) and pooled clone sequencing (PCS). We observe that better assemblies can be obtained when PCS data are used than when only WGS data are used. However, current scaffolding algorithms are developed only for WGS, and designing PCS-aware scaffolding algorithms remains an open problem.

Highlights

  • Completion of the Human Genome Project (HGP) was one of the greatest achievements in all life sciences research (International Human Genome Sequencing Consortium, 2004)

  • The information we gain thanks to the reference genome built by the HGP and the subsequent analyses performed by the 1000 Genomes Project and the ENCODE Project (ENCODE Project Consortium, 2012) will be the main source of knowledge in achieving precision medicine

  • To evaluate the efficacy of pooled clone sequencing (PCS) in genome scaffolding, we focused on chromosomes 1 and 20 of the human genome, which are the longest and one of the shortest chromosomes, respectively, in the latest human reference genome (GRCh38)

Summary

Introduction

Completion of the Human Genome Project (HGP) was one of the greatest achievements in all life sciences research (International Human Genome Sequencing Consortium, 2004). The HGP started in 1990, and thanks to innovations in automated genome sequencing technologies, the human genome was completed in 2004. The HGP has allowed researchers to learn the functions of genes and the effects of their mutations, and it was the driving force and motivation for the 1000 Genomes Project (The 1000 Genomes Project Consortium, 2015). The information we gain thanks to the reference genome built by the HGP and the subsequent analyses performed by the 1000 Genomes Project and the ENCODE Project (ENCODE Project Consortium, 2012) will be the main source of knowledge in achieving precision medicine. With the help of emerging technologies, more powerful computers, and massively parallel high-throughput sequencing (HTS), scientists are able to read and assemble genomes faster than ever before (Mardis, 2008).

