Abstract
Background: Generation of large mate-pair libraries is necessary for de novo genome assembly, but the procedure is complex and time-consuming. Furthermore, in some complex genomes it is hard to increase the N50 length even with large mate-pair libraries, which leads to low transcript coverage. It is therefore necessary to develop other, simpler scaffolding approaches that at least solve the elongation of transcribed fragments.
Results: We describe L_RNA_scaffolder, a novel genome scaffolding method that uses long transcriptome reads to order, orient and combine genomic fragments into larger sequences. To demonstrate the accuracy of the method, the zebrafish genome was scaffolded. With expanded human transcriptome data, the N50 of the human genome was doubled, and L_RNA_scaffolder outperformed most scaffolding results from existing scaffolders that employ mate-pair libraries. In these two examples, the transcript coverage was almost complete, especially for long transcripts. We applied L_RNA_scaffolder to the highly polymorphic pearl oyster draft genome, and the gene model length increased significantly.
Conclusions: The simplicity and high throughput of RNA-seq data make this approach suitable for genome scaffolding. L_RNA_scaffolder is available at http://www.fishbrowser.org/software/L_RNA_scaffolder.
Highlights
Generation of large mate-pair libraries is necessary for de novo genome assembly but the procedure is complex and time-consuming
With minimal length coverage (MLC) and minimal percent identity (MPI) set to 0.9, the N50 length increased to the saturation point of 176 kb when the maximal intron length (MIL) was over 100 kb (Additional file 1: Figure S2a)
To evaluate the influence of MLC, MIL was set to 100 kb and MPI to 0.9
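The three thresholds named in the highlights (MLC, MPI, MIL) can be read as filters on transcript-to-genome alignments. The following is a minimal sketch of that reading, not L_RNA_scaffolder's actual code. The `Alignment` record, its field names and the function `passes_filters` are hypothetical, and the exact definitions (coverage as the aligned fraction of the read, identity as matches over aligned bases, intron length as the largest genomic gap between aligned blocks) are assumptions made for illustration. The defaults mirror the values quoted above (0.9, 0.9, 100 kb).

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """One transcript-to-contig alignment (hypothetical record, not the tool's format)."""
    read_length: int    # full length of the transcript read, in bases
    aligned_bases: int  # read bases covered by the alignment
    matches: int        # aligned bases identical to the genomic sequence
    largest_gap: int    # longest genomic gap between aligned blocks (putative intron)


def passes_filters(aln, mlc=0.9, mpi=0.9, mil=100_000):
    """Return True if the alignment meets all three thresholds.

    Assumed definitions (not taken from the paper):
      length coverage = aligned_bases / read_length, must be >= mlc
      percent identity = matches / aligned_bases, must be >= mpi
      largest_gap must be <= mil
    """
    if aln.read_length <= 0 or aln.aligned_bases <= 0:
        return False  # nothing usable to evaluate
    length_coverage = aln.aligned_bases / aln.read_length
    identity = aln.matches / aln.aligned_bases
    return (length_coverage >= mlc
            and identity >= mpi
            and aln.largest_gap <= mil)
```

Under these assumptions, relaxing `mil` admits alignments that span longer introns, which is one way to picture why the N50 length would stop growing once MIL passes a certain value.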
Summary
Generation of large mate-pair libraries is necessary for de novo genome assembly but the procedure is complex and time-consuming. To increase the N50 length, genomic libraries with different insert sizes are used to span repeat regions and to place contigs in their likely order and orientation in the sequence. This step is repeated from small- to large-insert libraries to generate longer scaffolds. Modified clone-based or ligation-based approaches have been developed to generate large mate-pair libraries for Illumina platforms [1,2,3]. The insert size can be over 10 kb. It is necessary to develop other simple scaffolding approaches, at least to solve the elongation of transcribed fragments.
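Because N50 is the metric used throughout, a short sketch of the standard calculation may help: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` is illustrative and not part of L_RNA_scaffolder.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("N50 is undefined for an empty assembly")
    running = 0
    for length in ordered:
        running += length
        # Compare doubled running total to avoid fractional halves.
        if 2 * running >= total:
            return length
```

Joining two contigs into one scaffold replaces two entries in `lengths` with their sum, which is how scaffolding raises N50 without adding any new sequence.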