Abstract

We previously developed a web server, CPGAVAS, for annotation, visualization and GenBank submission of plastome sequences. Here, we upgrade the server into CPGAVAS2 to address the following challenges: (i) inaccurate annotation in the reference sequence, which likely causes the propagation of errors; (ii) difficulty in annotating the small exons of genes petB, petD and rps16 and the trans-splicing gene rps12; (iii) lack of annotation and visualization for other genome features, such as repeat elements; and (iv) lack of modules for diversity analysis of plastomes. In particular, CPGAVAS2 provides two reference datasets for plastome annotation. The first dataset contains 43 plastomes whose annotations have been validated or corrected by RNA-seq data. The second contains 2544 plastomes curated with sequence alignment. Two new algorithms are also implemented to correctly annotate small exons and trans-splicing genes. Tandem and dispersed repeats are identified, and the results are displayed on a circular map together with the annotated genes. DNA-seq and RNA-seq data can be uploaded for identification of single-nucleotide polymorphism sites and RNA-editing sites. The results of two case studies show that CPGAVAS2 annotates better than several other servers. CPGAVAS2 will likely become an indispensable tool for plastome research and can be accessed from http://www.herbalgenomics.org/cpgavas2.

Highlights

  • Plastomes have been widely used in phylogenetic classification and evolutionary studies of plants [1]

  • We have upgraded our previous web server CPGAVAS into CPGAVAS2, with the addition of new functions

  • The 43-plastome dataset is curated with RNA-seq data, and the 2544-plastome dataset contains the largest number of plastome sequences among similar tools

Summary

INTRODUCTION

Plastomes have been widely used in phylogenetic classification and evolutionary studies of plants [1]. The presence of mechanisms to generate RNA diversity, such as RNA-editing, has been reported for plastids [11]. In this regard, new tools should be incorporated to explore plastome diversity at the DNA and RNA levels, taking advantage of NGS data such as RNA-seq and Iso-seq data. Regardless of how accurate a computational pipeline is, manual curation is always needed to ensure correct annotation, so the output of such a pipeline should be importable into other tools for editing. To meet these new challenges and demands, we have upgraded the CPGAVAS server into CPGAVAS2.

RESULTS AND DISCUSSION
EVALUATION
CONCLUSION
