Abstract

Background: Hepatitis B virus (HBV) DNA sequence data from thousands of samples are present in the public sequence databases. However, no publicly available, up-to-date multiple sequence alignments containing full-length and subgenomic fragments per genotype exist. Such alignments are useful in many applications, including data mining and phylogenetic analyses.

Results: By issuing a query, all HBV sequence data in the GenBank public database (67,893 sequences) were downloaded. Full-length and subgenomic sequences that had been genotyped by the submitters (30,852 sequences) were placed into a multiple sequence alignment for each genotype (genotype A: 5868 sequences, B: 4630, C: 7820, D: 8300, E: 2043, F: 985, G: 189, H: 108, I: 23), according to the results of offline BLAST searches against a custom reference library of full-length sequences. Further curation was performed to improve the alignments.

Conclusions: The algorithm described in this paper generates, for each of the nine HBV genotypes, a multiple sequence alignment containing full-length and subgenomic fragments. The alignments can be updated as new sequences become available in the online public sequence databases. The alignments are available at http://hvdr.bioinf.wits.ac.za/alignments.
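The abstract says each genotyped sequence was placed into its genotype's alignment according to offline BLAST searches against a reference library of full-length sequences, but it does not spell out the placement step. One simple way to do this, sketched below as an assumption rather than the authors' method, is to keep the best-scoring hit (by bitscore) per query from BLAST's tabular output (`-outfmt 6`, default 12 columns) and pad the matched segment with gap characters so that it starts at the hit's reference coordinate. This ungapped placement ignores insertions and deletions relative to the reference, and it does not handle fragments that span the numbering origin of HBV's circular genome. All names (`Hit`, `parse_outfmt6`, `best_hits`, `place`) are hypothetical.

```python
"""Hypothetical sketch: place fragments in reference coordinates using
offline BLAST tabular output (-outfmt 6). Not the authors' pipeline."""
from dataclasses import dataclass


@dataclass
class Hit:
    qseqid: str
    sseqid: str
    qstart: int
    qend: int
    sstart: int
    send: int
    bitscore: float


def parse_outfmt6(lines):
    """Parse BLAST -outfmt 6 rows with the default 12 columns:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore"""
    hits = []
    for line in lines:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        hits.append(Hit(f[0], f[1], int(f[6]), int(f[7]),
                        int(f[8]), int(f[9]), float(f[11])))
    return hits


def best_hits(hits):
    """Keep the highest-bitscore hit for each query sequence."""
    best = {}
    for h in hits:
        if h.qseqid not in best or h.bitscore > best[h.qseqid].bitscore:
            best[h.qseqid] = h
    return best


_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def revcomp(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def place(query_seq, hit, ref_len):
    """Ungapped placement: left-pad the aligned query segment with '-' so it
    starts at the reference coordinate, then pad/trim to ref_len.
    Indels relative to the reference will shift downstream columns."""
    seg = query_seq[hit.qstart - 1:hit.qend]
    start = hit.sstart
    if hit.sstart > hit.send:  # minus-strand hit: flip to reference strand
        seg = revcomp(seg)
        start = hit.send
    row = "-" * (start - 1) + seg
    return row[:ref_len] + "-" * (ref_len - len(row))


if __name__ == "__main__":
    blast_rows = [
        "q1\tref1\t100.0\t5\t0\t0\t1\t5\t3\t7\t1e-3\t10.0",
        "q1\tref2\t90.0\t5\t1\t0\t1\t5\t10\t14\t1e-2\t8.0",
    ]
    hits = best_hits(parse_outfmt6(blast_rows))
    print(hits["q1"].sseqid, place("ACGTA", hits["q1"], 20))
```

A production pipeline would more likely align each fragment to its reference with a gapped aligner and merge insertions into the alignment; the ungapped version is shown only because it makes the role of the BLAST coordinates explicit.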

Highlights

  • Hepatitis B virus (HBV) DNA sequence data from thousands of samples are present in the public sequence databases

  • Evolutionary relationships between samples are elucidated by phylogenetic analyses, which typically involve DNA sequences from many organisms of interest, and from many related samples for comparison

  • The aim of the present study is to provide updated, curated alignments of each HBV genotype (A to I), consisting of all available full-length and subgenomic fragments from the GenBank public database for which the genotype could be mined from the GenBank submission
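The highlights state that only sequences whose genotype could be mined from the GenBank submission were used; the exact mining rules are not given here. The following is a hypothetical, minimal Python sketch of such a step: it scans free-text fields (for example a DEFINITION line or a source-feature note) for a genotype or subgenotype label A to I, and returns `None` for missing, conflicting, or recombinant (e.g. "B/C") labels. The function name `mine_genotype` and the regular expression are illustrative assumptions.

```python
import re

# Hypothetical pattern: "genotype C", "genotype: D", "subgenotype A1",
# "genotype B/C" (recombinant, captured in group 2). "genotyped" is excluded
# by the word boundary after "genotype".
GENOTYPE_RE = re.compile(
    r"\b(?:[Ss]ub)?[Gg]enotype\b[\s:]*([A-I])\d*(/[A-I])?(?![A-Za-z])"
)


def mine_genotype(*texts):
    """Return a single genotype letter found across the given text fields,
    or None if no label, conflicting labels, or a recombinant label is found."""
    letters = set()
    for text in texts:
        for match in GENOTYPE_RE.finditer(text):
            if match.group(2):  # recombinant label such as "B/C"
                return None
            letters.add(match.group(1))
    return letters.pop() if len(letters) == 1 else None


if __name__ == "__main__":
    print(mine_genotype("Hepatitis B virus isolate P1 subgenotype A1, complete genome"))
```

The conservative design (reject rather than guess) matches the stated goal of using only submitter-assigned genotypes; real records will include phrasings this pattern misses.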

Introduction

Public sequence databases

Direct (Sanger) DNA sequencing (Sanger et al. 1977) is a relatively inexpensive and routine procedure in molecular biology. With support from the National Institutes of Health, this database was
