Abstract

BackgroundRepbase is a reference database of eukaryotic repetitive DNA, which includes prototypic sequences of repeats and basic information described in annotations. Updating and maintenance of the database requires specialized tools, which we have created and made available for use with Repbase, and which may be useful as a template for other curated databases.ResultsWe describe the software tools RepbaseSubmitter and Censor, which are designed to facilitate updating and screening the content of Repbase. RepbaseSubmitter is a java-based interface for formatting and annotating Repbase entries. It eliminates many common formatting errors, and automates actions such as calculation of sequence lengths and composition, thus facilitating curation of Repbase sequences. In addition, it has several features for predicting protein coding regions in sequences; searching and including Pubmed references in Repbase entries; and searching the NCBI taxonomy database for correct inclusion of species information and taxonomic position. Censor is a tool to rapidly identify repetitive elements by comparison to known repeats. It uses WU-BLAST for speed and sensitivity, and can conduct DNA-DNA, DNA-protein, or translated DNA-translated DNA searches of genomic sequence. Defragmented output includes a map of repeats present in the query sequence, with the options to report masked query sequence(s), repeat sequences found in the query, and alignments.ConclusionCensor and RepbaseSubmitter are available as both web-based services and downloadable versions. They can be found at (RepbaseSubmitter) and (Censor).

Highlights

  • Repbase is a reference database of eukaryotic repetitive DNA, which includes prototypic sequences of repeats and basic information described in annotations

  • The current version of Repbase is based on a flexible and extensible relational database schema implemented in mySQL

  • Ongoing large-scale sequencing of eukaryotic genomes has resulted in a rapid increase in the rate at which new transposable elements are discovered

Read more

Summary

Results

We describe the software tools RepbaseSubmitter and Censor, which are designed to facilitate updating and screening the content of Repbase. RepbaseSubmitter is a java-based interface for formatting and annotating Repbase entries It eliminates many common formatting errors, and automates actions such as calculation of sequence lengths and composition, facilitating curation of Repbase sequences. It has several features for predicting protein coding regions in sequences; searching and including Pubmed references in Repbase entries; and searching the NCBI taxonomy database for correct inclusion of species information and taxonomic position. Censor is a tool to rapidly identify repetitive elements by comparison to known repeats It uses WU-BLAST for speed and sensitivity, and can conduct DNA-DNA, DNA-protein, or translated DNA-translated DNA searches of genomic sequence. Defragmented output includes a map of repeats present in the query sequence, with the options to report masked query sequence(s), repeat sequences found in the query, and alignments

Background
Results and discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call