Abstract

Rapid improvements in next-generation sequencing (NGS) technologies have enabled the production of huge amounts of DNA sequence data at low cost. However, NGS technologies can still generate only short DNA sequences, which has led to the development of many assembly algorithms that recover whole genome sequences from those short sequences. Unfortunately, assembly algorithms alone can construct only scaffold sequences, which are generally much shorter than chromosome sequences. Generating chromosome sequences requires additional, expensive experimental data. To overcome this problem, many studies have developed computational algorithms that further merge the scaffold sequences and produce chromosome-level sequences by using an existing genome assembly of a related species, called a reference. However, even though the quality of the chosen reference assembly is critical for generating a good final assembly, its effect has not yet been well characterized. In this study, we measured the effect of the reference genome assembly on the quality of the final assembly generated by reference-guided assembly algorithms. Using the genome assemblies of eleven reference species (eight primates and three rodents), we assembled human genome sequences from scaffold sequences with one reference-guided assembly algorithm, RACA, and compared them with known genome sequences to measure their quality in terms of the number of misassemblies. We investigated the effect of reference assembly quality in terms of divergence time from human, alignment coverage between the reference and human, and the degree of inclusion of core eukaryotic genes. We found that divergence time is a good indicator of the quality of the final assembly when high-quality reference assemblies are used. We believe this study will broaden our understanding of the effect and importance of the reference assembly in reference-guided assembly.
