Abstract

The development of next-generation sequencing (NGS) technologies has dramatically increased the throughput, speed, and efficiency of genome sequencing. The short read data generated by NGS platforms such as SOLiD and Illumina are very useful for mapping analysis. However, SOLiD reads shorter than 60 bp have been considered too short for de novo genome sequencing. Here, to investigate whether fungal genomes can be sequenced de novo using only SOLiD short read data, we performed a de novo assembly of the Aspergillus oryzae RIB40 genome using only 50-bp SOLiD reads generated from mate-paired libraries with insert sizes of 2.8 or 1.9 kb. The assembled scaffolds showed an N50 value of 1.6 Mb, a 22-fold increase over the values obtained using only SOLiD short reads in other published reports. In addition, almost 99% of the reference genome was accurately aligned to long assembled scaffold fragments. The sequences of secondary metabolite biosynthetic genes and clusters, whose products are of considerable interest in fungal studies because of their potential medicinal, agricultural, and cosmetic properties, were also well reconstructed in the assembled scaffolds. Based on these findings, we concluded that de novo genome sequencing using only SOLiD short reads is feasible and practical for molecular biological studies of fungi. We also investigated the effects of filtering low-quality data, library insert size, and k-mer size on assembly performance. For assembly, we recommend using mildly filtered read data (filtered only to the extent that the N50 is not substantially degraded), a library with an insert size of ∼2.0 kb, and a k-mer size of 33.
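N50, the statistic used above to compare assemblies, is the largest length L such that scaffolds of length L or longer together contain at least half of all assembled bases. The following is a minimal Python sketch of that common definition, not the authors' own tooling; the function name `n50` and the scaffold lengths in the example are hypothetical toy values, not data from this study.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a set of contig or scaffold lengths.

    N50 is the largest length L such that sequences of length >= L
    together contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    covered = 0
    # Walk from the longest sequence to the shortest, accumulating bases.
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy scaffold lengths (not from this study): total 290, half is 145.
print(n50([100, 80, 50, 30, 20, 10]))  # 80, since 100 + 80 = 180 >= 145
```

Because N50 depends only on the length distribution, it rewards long scaffolds but says nothing about their correctness, which is why the abstract also reports alignment to the reference genome.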

Highlights

  • Whole-genome sequencing is an invaluable tool in evolutionary and functional studies of biological systems

  • We successfully identified the secondary metabolite biosynthesis (SMB) gene cluster of kojic acid (KA) [22], which is used in cosmetics as a skin-whitening agent, using the genome sequence data of Aspergillus oryzae [23], including the genomic location of two genes involved in KA synthesis

  • We evaluated three factors affecting the performance of the de novo genome assembly: (1) the quality of the read data, (2) the insert size of the mate-paired library, and (3) the k-mer size used in the assembly program, by varying the degree of data filtering, the library insert size, and the k-mer size
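The k-mer size in factor (3) controls how k-mer-based assemblers (for example, de Bruijn graph assemblers) cut reads into overlapping words before joining them. The assembler used is not named in this excerpt, so the Python sketch below is only an illustration under that assumption: it works in base space for readability (SOLiD-aware assemblers operate on colors), and the function names `kmers` and `kmer_counts` are hypothetical. It shows why k is bounded by read length: a read of length L yields only L - k + 1 k-mers.

```python
from collections import Counter
from typing import Iterable, Iterator


def kmers(read: str, k: int) -> Iterator[str]:
    """Yield every overlapping substring of length k in a read."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # A read of length L yields L - k + 1 k-mers, or none when k > L.
    for start in range(len(read) - k + 1):
        yield read[start:start + k]


def kmer_counts(reads: Iterable[str], k: int) -> Counter:
    """Count the occurrences of each k-mer across all reads."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


# A hypothetical 50-bp read contributes 50 - 33 + 1 = 18 k-mers at k = 33.
read_50bp = "ACGT" * 12 + "AC"
print(len(list(kmers(read_50bp, 33))))  # 18
```

Longer k-mers are more likely to be unique in the genome, but each 50-bp read contributes fewer of them, so k cannot simply be pushed toward the read length; this toy code illustrates the constraint and does not reproduce the study's measurements.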

Introduction

Whole-genome sequencing is an invaluable tool in evolutionary and functional studies of biological systems. The development of next-generation sequencing (NGS) technologies, such as the SOLiD (Life Technologies), Solexa and Genome Analyzer (Illumina), and 454 GS FLX (Roche) systems, has increased the throughput and reduced the cost of sequencing by several orders of magnitude [1]. The whole genomes of several viral [2,3], bacterial [4,5,6,7,8], and fungal species [2,9,10,11,12] have been newly sequenced (de novo sequencing) by combining two or more NGS platforms, such as 454 and Solexa, which generate sequence reads of 250 to 800 bp and <100 bp, respectively. To date, only a few assemblers can handle the “color-space” format of SOLiD read data.
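In SOLiD's two-base (color-space) encoding, a read begins with a known base followed by colors 0 to 3, and each color encodes the transition between two adjacent bases rather than a base itself. With the 2-bit codes A=0, C=1, G=2, T=3, each color equals the XOR of the codes of the two adjacent bases, so decoding is a running XOR from the leading base. The Python sketch below illustrates this under simplifying assumptions (no missing-color calls, and the leading base is kept rather than trimmed as a primer base); the names `encode_colors` and `decode_colors` are hypothetical. It also suggests why color-space-aware assemblers matter: naively decoding a read with one miscalled color corrupts every base downstream of the error.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def encode_colors(seq: str) -> str:
    """Encode a base-space sequence as a leading base plus colors 0-3."""
    codes = [BASE_CODE[base] for base in seq.upper()]
    # Each color is the XOR of the 2-bit codes of two adjacent bases.
    colors = "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))
    return seq[0].upper() + colors


def decode_colors(cs_read: str) -> str:
    """Decode a color-space read such as 'T0123' back into bases."""
    code = BASE_CODE[cs_read[0].upper()]
    bases = [CODE_BASE[code]]
    for color in cs_read[1:]:
        # XOR with the color recovers the next base from the previous one.
        code ^= int(color)
        bases.append(CODE_BASE[code])
    return "".join(bases)


print(decode_colors("T0123"))  # TTGAT
print(decode_colors("T0023"))  # TTTCG: one wrong color changes 3 bases
print(encode_colors("TTGAT"))  # T0123
```

Keeping reads in color space until the final consensus lets an assembler treat a single color mismatch as one error instead of a run of base errors, which is one reason dedicated color-space assemblers are needed.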
