Abstract

The rapid development of high-throughput sequencing technologies has dramatically reduced the money and time required for de novo genome sequencing and genome resequencing projects, and new genome sequences are now released every week. The resulting plethora of updated genome assemblies creates a need for version-dependent annotation files and other compatible public datasets for downstream analysis. To handle these tasks efficiently, we developed the reference-based genome assembly and annotation tool (RGAAT), a flexible toolkit for resequencing-based consensus building and annotation update. On the four DNA-seq datasets tested in this study, RGAAT detected sequence variants with precision, specificity, and sensitivity comparable to GATK, and with higher precision and specificity than Freebayes and SAMtools. RGAAT can also identify sequence variants from cross-cultivar or cross-version genomic alignments. Unlike GATK and SAMtools/BCFtools, RGAAT builds the consensus sequence by taking the true allele frequency into account. Finally, RGAAT uses the sequence variants to generate a coordinate conversion file between the reference and query genomes, and it supports annotation file transfer. Compared to the rapid annotation transfer tool (RATT), RGAAT performs better at annotation transfer between different genome assemblies, strains, and species. In addition, RGAAT can be used for genome modification, genome comparison, and coordinate conversion. RGAAT is available at no cost at https://sourceforge.net/projects/rgaat/ and https://github.com/wushyer/RGAAT_v2.
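The idea of deriving a coordinate conversion from sequence variants can be illustrated with a minimal Python sketch. It is not RGAAT's implementation or file format: the function names `build_offsets` and `ref_to_query` and the `(position, ref_allele, alt_allele)` tuple layout are assumptions made for illustration. Each indel changes the cumulative offset between reference and query coordinates, and a lookup applies the offset accumulated before a given position.

```python
from bisect import bisect_right


def build_offsets(variants):
    """Build (ref_end, cumulative_shift) breakpoints from variants.

    variants: iterable of (pos, ref_allele, alt_allele) with 1-based
    reference positions; assumed non-overlapping (hypothetical layout).
    """
    breakpoints = []
    shift = 0
    for pos, ref, alt in sorted(variants):
        delta = len(alt) - len(ref)
        if delta != 0:  # SNPs and equal-length substitutions do not move coordinates
            shift += delta
            # The new shift applies to positions after the variant's last reference base.
            breakpoints.append((pos + len(ref) - 1, shift))
    return breakpoints


def ref_to_query(pos, breakpoints):
    """Convert a 1-based reference position to a query position.

    Limitation of this sketch: bases inside a deleted span have no true
    counterpart in the query and are mapped naively.
    """
    ends = [end for end, _ in breakpoints]
    # Count the breakpoints that end strictly before pos.
    i = bisect_right(ends, pos - 1)
    return pos + (breakpoints[i - 1][1] if i else 0)


# Toy input: a 2 bp insertion after position 5 and a 2 bp deletion after position 10.
toy_breakpoints = build_offsets([(5, "A", "ATT"), (10, "ACG", "A")])
```

Annotation transfer can then be thought of as applying such a conversion to the start and end coordinates of each feature, with special handling for features that overlap a variant.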

Highlights

  • With the development of sequencing technologies, it is getting easier to obtain the genome of various species

  • The rapid annotation transfer tool (RATT) can be used for annotation transfer, but its accuracy is relatively low in repeat regions [10], whereas iCORN can be used for correcting sequence errors but not for upgrading annotations [11]

  • For the dataset with low sequencing depth, the 30× 100 bp paired-end exome dataset was downloaded from the Genome Comparison and Analytic Testing (GCAT) website and aligned using Bowtie2 with default parameters (Table 1)

Introduction

With the development of sequencing technologies, it is getting easier to obtain the genomes of various species. However, there are few easy-to-use integrated tools that achieve both genome assembly and annotation transfer based on known reference genomes. Although some tools, such as SAMtools/BCFtools and GATK, include a module to create a consensus sequence, none of them considers the true allele frequency of each variant, which is important for reducing the false-positive rate [6,7,8,9]. Here we report the development of the reference-based genome assembly and annotation tool, RGAAT, to solve the problems encountered in the process of genome assembly and annotation. Although these problems are very common, we did not find comprehensive solutions despite searching two popular forums, including Biostars. This tool can be used to identify genome variants and to build genome consensus sequences.
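To make the role of allele frequency concrete, the following Python sketch builds a consensus by applying only those variants whose alternate allele is supported by more than a chosen fraction of reads. This is an illustration under stated assumptions, not RGAAT's algorithm: the function name `build_consensus`, the `(pos, ref, alt, alt_reads, total_reads)` tuple layout, and the 0.5 default threshold are all hypothetical.

```python
def build_consensus(reference, variants, min_alt_freq=0.5):
    """Apply variants to a reference when the alternate allele frequency is high enough.

    reference: reference sequence as a string.
    variants: iterable of (pos, ref, alt, alt_reads, total_reads) with
    1-based positions (hypothetical layout, not a real file format).
    min_alt_freq: assumed threshold; the alternate allele is used only
    when alt_reads / total_reads exceeds it.
    """
    pieces = []
    cursor = 0  # 0-based index of the next unprocessed reference base
    for pos, ref, alt, alt_reads, total_reads in sorted(variants):
        start = pos - 1
        if start < cursor or total_reads == 0:
            continue  # skip variants that overlap an applied one or lack coverage
        if reference[start:start + len(ref)] != ref:
            raise ValueError(f"reference allele mismatch at position {pos}")
        if alt_reads / total_reads <= min_alt_freq:
            continue  # reference allele is kept at this site
        pieces.append(reference[cursor:start])
        pieces.append(alt)
        cursor = start + len(ref)
    pieces.append(reference[cursor:])
    return "".join(pieces)


# Toy input: a well-supported SNP, a poorly supported insertion, and a well-supported deletion.
toy_consensus = build_consensus(
    "ACGTACGTAC",
    [(2, "C", "T", 9, 10), (5, "A", "AGG", 2, 10), (8, "TA", "T", 8, 10)],
)
```

In this toy input only the SNP and the deletion pass the threshold, so the low-frequency insertion, which might be a sequencing or alignment artifact, does not enter the consensus.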
