Abstract

The in-depth study of viral genomes is of great help in many areas, especially in the treatment of human diseases caused by viral infections. With the rapid accumulation of viral sequencing data, improved or alternative gene-finding systems have become necessary to process and mine these data. In this article, we present Vgas, a system that combines an ab initio method and a similarity-based method to automatically find viral genes and annotate their functions. Vgas was compared with existing programs, such as Prodigal, GeneMarkS, and Glimmer. In tests on 5,705 virus genomes downloaded from RefSeq, Vgas achieved the highest average precision and recall (both at least 1% higher than those of the other programs), and it performed significantly better on small virus genomes (≤ 10 kb), with precision 6% higher and recall 2% higher. Moreover, Vgas includes an annotation module that provides functional information for predicted genes based on BLASTp alignments, a feature that may be particularly useful in some cases. Combining Vgas with GeneMarkS and Prodigal gave better prediction results than any of the three programs alone, suggesting that collaborative prediction with several different programs is an alternative approach to gene prediction. Vgas is freely available at http://cefg.uestc.cn/vgas/ or http://121.48.162.133/vgas/. We hope that Vgas can serve as an alternative virus gene finder for annotating new genomes or reannotating existing ones.
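The abstract reports precision and recall against RefSeq annotations and a combined prediction from Vgas, GeneMarkS, and Prodigal, but it does not state here how predicted genes were matched to annotated ones or how the three programs' outputs were merged. The sketch below illustrates both ideas under explicit assumptions: a predicted gene counts as correct if its stop-codon coordinate and strand match an annotated gene (a common convention, assumed here), and predictions are combined by simple majority voting (one possible rule, not necessarily the one used with Vgas). `Gene`, `precision_recall`, and `majority_vote` are hypothetical names and are not part of Vgas; the toy coordinates are invented.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Gene(NamedTuple):
    """A gene call with 1-based inclusive coordinates (hypothetical type)."""
    start: int
    end: int
    strand: str  # "+" or "-"

    def stop_key(self) -> tuple:
        """Assumed matching key: stop-codon coordinate plus strand.

        On the + strand the stop codon lies at `end`; on the - strand at `start`.
        """
        return (self.end, "+") if self.strand == "+" else (self.start, "-")


def precision_recall(predicted: Iterable[Gene], reference: Iterable[Gene]) -> tuple:
    """Precision = TP / #predicted, recall = TP / #reference, matching on stop codons."""
    pred_keys = {g.stop_key() for g in predicted}
    ref_keys = {g.stop_key() for g in reference}
    true_pos = len(pred_keys & ref_keys)
    precision = true_pos / len(pred_keys) if pred_keys else 0.0
    recall = true_pos / len(ref_keys) if ref_keys else 0.0
    return precision, recall


def majority_vote(*predictions: Iterable[Gene], min_votes: int = 2) -> list:
    """Keep genes whose stop codon is called by at least `min_votes` tools.

    Hypothetical combination rule: for each agreed stop codon, the coordinates
    from the first tool (in argument order) that called it are kept.
    """
    votes = Counter()
    representative = {}
    for prediction in predictions:
        # Deduplicate within one tool so it votes at most once per stop codon.
        for key, gene in {g.stop_key(): g for g in prediction}.items():
            votes[key] += 1
            representative.setdefault(key, gene)
    return sorted(representative[k] for k, n in votes.items() if n >= min_votes)


if __name__ == "__main__":
    # Toy data invented for illustration only; not taken from the study.
    reference = [Gene(100, 400, "+"), Gene(900, 1500, "-"), Gene(2000, 2600, "+")]
    tool_a = [Gene(130, 400, "+"), Gene(900, 1500, "-"), Gene(3000, 3300, "+")]
    tool_b = [Gene(100, 400, "+"), Gene(2000, 2600, "+")]
    tool_c = [Gene(100, 400, "+"), Gene(900, 1440, "-"), Gene(3000, 3300, "+")]

    for name, calls in [("a", tool_a), ("b", tool_b), ("c", tool_c)]:
        print(name, precision_recall(calls, reference))
    consensus = majority_vote(tool_a, tool_b, tool_c)
    print("consensus", precision_recall(consensus, reference))
```

Matching on stop codons ignores start-codon disagreements, which many evaluations score separately; a stricter variant would compare full `(start, end, strand)` triples instead of `stop_key()`.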

Highlights

  • Because of the tremendous value of in-depth studies of viral genomes for the treatment of human diseases caused by viral infections, many viroinformatics resources, including web servers and databases, have been developed (Sharma et al., 2015)

  • To perform an objective evaluation, we tested 5,705 viral genomes downloaded from RefSeq

  • Systematic tests showed that Vgas was competitive with existing programs, such as Prodigal and GeneMarkS

Summary

Introduction

Because of the tremendous value of in-depth studies of viral genomes for the treatment of human diseases caused by viral infections, many viroinformatics resources, including web servers and databases, have been developed (Sharma et al., 2015). With the rapid development of sequencing technologies, the number of sequenced viral genomes stored in the RefSeq database increased more than fivefold from 2000 to 2016 (Brister et al., 2015). The first and most important step in investigating viral genomes is to annotate their genes accurately. Although wet-lab experiments are probably the most accurate way to annotate viral genes, they are often time-consuming and too costly to apply to such enormous volumes of data. Computational methods for viral gene prediction are therefore needed to assist experimental work and to provide reference annotations for it. There are two major groups of computational methods to achieve relatively
