Abstract

Background: The volume of viral genomic sequence data continues to increase rapidly. This is especially true for the smaller RNA viruses, which are relatively easy to sequence in large numbers. The data volumes cause a number of significant problems for research applications that require large multiple alignments of essentially complete genomes, which are of the order of 10 kb.

Findings: We present a simple strategy to enable the creation of large quasi-multiple sequence alignments from pairwise alignment data. This process is suitable for large, closely related sequences such as the polyproteins of dengue viruses, which need the insertion of very few indels.

Conclusion: The quasi-multiple sequence alignments generated by KISSa are sufficiently accurate to support tree-based genome selection for interactive bioinformatics analysis tools. The speed of this process is critical to providing an interactive experience for the user.

Highlights

  • The volume of viral genomic sequence data continues to increase rapidly

  • The quasi-multiple sequence alignments generated by Keep It Simple Sequence alignment (KISSa) are sufficiently accurate to support tree-based genome selection for interactive bioinformatics analysis tools

  • Although these alignments could be done at off-peak times, or new sequences could be added to a pre-existing alignment with MUSCLE [4], Viral Bioinformatics Resource Center (VBRC) users frequently want to select specific sets of genomes, on the order of several dozen to several hundred, and perform on-the-fly multiple sequence alignments (MSA)


Summary

Conclusion

We presented a method for quickly building useful alignments of a large number of closely related sequences (DNA or protein). The alignments are constructed from alignment tags generated from pairwise alignments of query sequences against a common reference. Since most phylogenetic tree construction ignores MSA columns with gaps, minor imperfections are of little consequence for this use. The KISSa alignment protocol will be less reliable when multiple gaps are required in small regions; a true MSA algorithm is needed to score and optimize these regions.

CU conceived the idea and specifications; FM developed the code; both authors tested the tool and contributed to writing the manuscript. Both authors read and approved the final manuscript.
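As an illustration of the construction summarized in the Conclusion, the following Python sketch merges pairwise alignments against a common reference into a quasi-multiple alignment. It is not the KISSa code: the format of the alignment tags is not given here, so the sketch assumes each pairwise alignment is supplied as two gapped strings (reference row, query row), and the function names `split_pairwise` and `merge_pairwise` are hypothetical. Insertions relative to the reference are simply left-justified and padded with gaps; they are never realigned against each other.

```python
from typing import List, Tuple

GAP = "-"


def split_pairwise(ref_row: str, qry_row: str) -> Tuple[List[str], List[str]]:
    """Split one pairwise alignment into per-reference-position pieces.

    Returns (inserts, matches). inserts[i] holds the query residues inserted
    before reference position i (the last entry is the tail after the final
    reference residue). matches[i] is the query character aligned to
    reference position i, or GAP for a deletion.
    """
    if len(ref_row) != len(qry_row):
        raise ValueError("aligned rows must have equal length")
    inserts = [""]
    matches = []
    for r, q in zip(ref_row, qry_row):
        if r == GAP:
            if q != GAP:
                inserts[-1] += q  # query residue with no reference partner
        else:
            matches.append(q)
            inserts.append("")
    return inserts, matches


def merge_pairwise(reference: str, pairs: List[Tuple[str, str]]) -> List[str]:
    """Build a quasi-MSA (reference row first) from pairwise alignments."""
    n = len(reference)
    parsed = []
    for ref_row, qry_row in pairs:
        if ref_row.replace(GAP, "") != reference:
            raise ValueError("pairwise alignment does not match the reference")
        parsed.append(split_pairwise(ref_row, qry_row))

    # Widest insertion seen before each reference position, over all queries.
    width = [max((len(ins[i]) for ins, _ in parsed), default=0)
             for i in range(n + 1)]

    def build(inserts: List[str], matches: List[str]) -> str:
        out = []
        for i in range(n):
            out.append(inserts[i].ljust(width[i], GAP))  # pad the insertion slot
            out.append(matches[i])
        out.append(inserts[n].ljust(width[n], GAP))
        return "".join(out)

    rows = [build([""] * (n + 1), list(reference))]
    rows.extend(build(ins, m) for ins, m in parsed)
    return rows


if __name__ == "__main__":
    pairs = [
        ("ACGTAC", "AC-TAC"),      # deletion relative to the reference
        ("ACG-TAC", "ACGGTAC"),    # one-residue insertion
        ("ACG--TAC", "ACGTTTAC"),  # two-residue insertion at the same site
    ]
    for row in merge_pairwise("ACGTAC", pairs):
        print(row)
```

With the toy input above, the script prints the rows `ACG--TAC`, `AC---TAC`, `ACGG-TAC` and `ACGTTTAC`. The third and fourth rows show the limitation noted in the Conclusion: the two insertions at the same site are placed side by side without being scored against each other, which is where a true MSA algorithm would be needed.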

