Abstract

Although any two people share more than 99% of their genomic DNA sequence, millions of small-scale differences in sequence or structure distinguish individuals, giving rise to differences in appearance and in responsiveness to medical treatments. Currently, genetic variants in diseased tissues, such as tumors, are uncovered by comparing the sequences detected in the diseased tissue against the reference genome. However, the public reference genome was derived from the DNA of multiple individuals. As a result, it is incomplete and may misrepresent the sequence variants of the general population. A more reliable solution is to compare the sequences of diseased tissue with that individual’s own genome sequence, derived from tissue in a normal state. The cost of sequencing a human genome has dropped dramatically to around $1000, which makes documenting a personal genome for every individual a realistic prospect. However, de novo assembly of individual genomes at an affordable cost remains challenging, and to date only a few human genomes have been fully assembled. In this review, we introduce the history of human genome sequencing and the evolution of sequencing platforms, from Sanger sequencing to emerging “third generation sequencing” technologies. We present the currently available de novo assembly and post-assembly software packages for human genome assembly and their computational infrastructure requirements. We recommend hybrid assembly combining long and short reads as a promising way to generate good-quality human genome assemblies, and we specify parameters for assessing the quality of assembly outcomes. We offer a perspective on the benefits of using personal genomes as references and suggestions for obtaining a high-quality personal genome. Finally, we discuss the use of the personal genome in aiding vaccine design and development, monitoring host immune response, tailoring drug therapy, and detecting tumors. We believe precision medicine would benefit greatly from bioinformatics solutions, particularly for personal genome assembly.

Highlights

  • Following President Obama’s announcement of the new initiative for precision medicine, the NIH proposed a large-scale sequencing project to sequence one million human genomes [1].

  • Using a different configuration with four Xeon E7-4870 CPUs reduced the total assembly time to approximately 7 h [101]. These results demonstrate the demanding nature of the de novo genome assembly process.

  • Because the “ground truth” genome sequence of an individual subjected to whole genome sequencing (WGS) is unavailable, it is important to establish quality metrics and parameters for evaluating the validity of the assembled genome.


Summary

Introduction

Following President Obama’s announcement of the new initiative for precision medicine, the NIH proposed a large-scale sequencing project to sequence one million human genomes [1]. A growing number of genetic mutations or defects are being linked to various diseases [4], and database repositories are being created to store and disseminate such actionable mutations [5]. Identifying these variants in individual patients will be the key objective for enhanced clinical diagnosis and prognosis. Even though the reference genome has been improved over the past fifteen years, its latest build still has hundreds of gaps and unplaced scaffolds (see Table 1), owing to the different haplotypes of the original donors. Another pitfall of the current reference genome is that the reference alleles of single nucleotide polymorphisms (SNPs) may represent minor alleles in the general population. We consider the benefit of using personal genomes as references and the approaches to be taken in order to obtain a reliable personal genome.

History of Human Genome Sequencing
Evolution of Sequencing Platforms
Illumina Platforms
Roche 454
Life Technology Ion Torrent
Qiagen Intelligent Biosystems
Pacific Biosciences
Oxford Nanopore
Current Solutions for De Novo Assembly and Post-Assembly
De Novo Assembly Approaches
Post-Assembly Approach
Data Storage
Cloud Computing
Quality Metrics and Parameters for Assembled Genome
Perspectives and Remaining Challenges
Application in Pharmaceuticals and Pharmacogenomics
Summary
