Abstract

Background: Alignment-free (AF) sequence comparison is attracting persistent interest driven by data-intensive applications. Hence, many AF procedures have been proposed in recent years, but a lack of a clearly defined benchmarking consensus hampers their performance assessment.

Results: Here, we present a community resource (http://afproject.org) to establish standards for comparing alignment-free approaches across different areas of sequence-based research. We characterize 74 AF methods available in 24 software tools for five research applications, namely, protein sequence classification, gene tree inference, regulatory element detection, genome-based phylogenetic inference, and reconstruction of species trees under horizontal gene transfer and recombination events.

Conclusion: The interactive web service allows researchers to explore the performance of alignment-free tools relevant to their data types and analytical goals. It also allows method developers to assess their own algorithms and compare them with current state-of-the-art tools, accelerating the development of new, more accurate AF solutions.

Highlights

  • Alignment-free (AF) sequence comparison is attracting persistent interest driven by data-intensive applications

  • [...] increased attention only in the beginning of the 2000s [13]. Most of these methods are based on word statistics or word comparison, and their scalability allows them to be applied to much larger data sets than conventional multiple sequence alignment (MSA)-based methods (Zielezinski et al., Genome Biology (2019) 20:144)

  • Benchmarking service: To automate AF method benchmarking with a wide range of reference data sets, we developed a publicly available web-based evaluation framework (Fig. 1)


Summary

Introduction

Alignment-free (AF) sequence comparison is attracting persistent interest driven by data-intensive applications. Comparative analysis of DNA and amino acid sequences is of fundamental importance in biological research, particularly in molecular biology and genomics. Although alignment-based approaches generally remain the references for sequence comparison, multiple sequence alignment (MSA)-based methods do not scale with the [...] increased attention only in the beginning of the 2000s [13]. Most of these methods are based on word statistics or word comparison, and their scalability allows them to be applied to much larger data sets than conventional MSA-based methods.
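
To make the idea of a word-statistics comparison concrete, the following is a minimal sketch in Python. It is not one of the benchmarked tools or the authors' method: the function names (kmer_counts, euclidean_word_distance), the default word length k = 3, and the choice of Euclidean distance between relative word frequencies are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping word (k-mer) of length k in seq (hypothetical helper)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def euclidean_word_distance(a: str, b: str, k: int = 3) -> float:
    """Euclidean distance between the relative k-mer frequencies of a and b.

    Illustrative only: real AF tools use many different word-based measures.
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    na, nb = sum(ca.values()), sum(cb.values())
    if na == 0 or nb == 0:
        raise ValueError("both sequences must be at least k characters long")
    # A Counter returns 0 for words that are absent from a sequence.
    words = set(ca) | set(cb)
    return sqrt(sum((ca[w] / na - cb[w] / nb) ** 2 for w in words))


if __name__ == "__main__":
    print(euclidean_word_distance("ACGTACGTACGT", "ACGTTCGTACGA"))
```

No alignment is computed at any point: each sequence is reduced to a vector of word frequencies in a single pass, which is why approaches of this kind can be applied to much larger data sets than MSA-based methods.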
