Abstract

GRAbB (Genomic Region Assembly by Baiting) is a new program dedicated to assembling specific genomic regions from NGS data. This approach is especially useful for multi-copy regions, such as the mitochondrial genome and the rDNA repeat region. These parts of the genome are often neglected or poorly assembled, although they contain interesting information from phylogenetic or epidemiologic perspectives. Single-copy regions can also be assembled. The program is capable of targeting multiple regions within a single run. Furthermore, GRAbB can be used to extract specific loci from NGS data based on homology, such as sequences that are used for barcoding. To make the assembly specific, a known part of the region, such as the sequence of a PCR amplicon or a homologous sequence from a related species, must be specified. By assembling only the region of interest, the assembly process is computationally much less demanding and may lead to assemblies of better quality. In this study the different applications and functionalities of the program are demonstrated: exhaustive assembly (rDNA region and mitochondrial genome), extraction of homologous regions or genes (IGS, RPB1, RPB2 and TEF1a), and extraction of multiple regions within a single run. The program is also compared with MITObim, which is meant for the exhaustive assembly of a single target based on a similar query sequence. GRAbB is shown to be more efficient than MITObim in terms of speed, memory and disk usage. The other functionalities (handling multiple targets simultaneously and extracting homologous regions) of the new program are not matched by other programs. The program is available with explanatory documentation at https://github.com/b-brankovics/grabb. GRAbB has been tested on Ubuntu (12.04 and 14.04), Fedora (23), CentOS (7.1.1503) and Mac OS X (10.7). Furthermore, GRAbB is available as a Docker repository: brankovics/grabb (https://hub.docker.com/r/brankovics/grabb/).
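The general idea of using a known sequence as bait to restrict assembly to a region of interest can be illustrated with a small sketch. This is not GRAbB's implementation: the function names (`kmers`, `bait_reads`), the k-mer size and the toy reads are all assumptions made for illustration, and reverse-complement matching and the assembly step itself are left out. The sketch only shows how reads sharing k-mers with a bait can be recruited iteratively, with recruited reads extending the bait for the next round.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bait_reads(reads, bait_seqs, k=5):
    """Iteratively recruit reads that share at least one k-mer with the bait.

    Hypothetical helper for illustration only. After each round the k-mers
    of newly recruited reads are added to the bait pool, so reads that
    overlap them can be recruited in the next round. The loop stops when a
    round recruits nothing new. Reverse complements are ignored here.
    """
    bait_kmers = set()
    for seq in bait_seqs:
        bait_kmers |= kmers(seq, k)

    selected = set()  # indices of recruited reads
    while True:
        new = {
            i for i, read in enumerate(reads)
            if i not in selected and kmers(read, k) & bait_kmers
        }
        if not new:
            break
        selected |= new
        for i in new:
            bait_kmers |= kmers(reads[i], k)
    return [reads[i] for i in sorted(selected)]


# Toy data (assumed): three overlapping reads and one unrelated read.
toy_reads = ["ACGTACGGA", "TACGGATTCAG", "TTCAGGCTA", "GGGGGCCCCC"]
toy_bait = ["TTACGTAC"]
recruited = bait_reads(toy_reads, toy_bait, k=5)
```

In this toy example only the first read matches the bait directly; the second and third are recruited in later rounds through their overlaps with already recruited reads, while the unrelated read is never selected. Only the recruited subset would then be passed to an assembler, which is why a targeted approach needs far less computation than assembling all reads.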

Highlights

  • To compare the performance of GRAbB with that of MITObim 1.7, simulated sequencing data generated based on the F. graminearum mitogenome sequence available from NCBI (NC_009493) were used together with either the original F. graminearum (NC_009493) or F. oxysporum (NC_017930) mitogenome as reference

Introduction

High-throughput sequencing of whole genomes is becoming routine due to advances in sequencing technologies and the reduction of the costs involved. In many studies using NGS data, the main focus is on the nuclear genome. This is reflected in the relatively low number of published organellar genome assemblies compared with the number of published nuclear genome assemblies. The mitogenome (mitochondrial genome) and the ribosomal DNA repeat region (18S rRNA–ITS1–5.8S rRNA–ITS2–28S rRNA–IGS) are generally not completely assembled, even when there is sufficient information in the NGS data [2]. These regions contain loci that are predominantly used for phylogenetic comparisons and species identification. There is no program that promises to extract barcoding loci from NGS reads.

