Abstract

Background: Protein or nucleic acid sequences contain a multitude of associated annotations representing continuous sequence elements (CSEs). Comparing these CSEs is needed whenever we want to match identical annotations or integrate distinctive ones. Currently, no ready-to-use software provides a comprehensive statistical readout for comparing two annotations of the same type with each other while remaining adaptable to the application logic of the scientific question.

Results: We have developed a method, SLALOM (for StatisticaL Analysis of Locus Overlap Method), to perform comparative analysis of sequence annotations in a highly flexible way. SLALOM implements six major operation modes and a number of additional options that can answer a variety of statistical questions about a pair of input annotations of a given sequence collection. We demonstrate the results of SLALOM on three different examples from biology and economics and compare our method to already existing software. We discuss the importance of carefully choosing the application logic to address specific scientific questions.

Conclusion: SLALOM is a highly versatile, command-line based method for comparing annotations in a collection of sequences, with a statistical read-out for performance evaluation and benchmarking of predictors and gene annotation pipelines. Abstraction from sequence content even allows SLALOM to compare other kinds of positional data including, for example, data coming from time series.

Highlights

  • Protein or nucleic acid sequences contain a multitude of associated annotations representing continuous sequence elements (CSEs)

  • When two annotations of CSEs are compared, different scenarios of overlap and duplication may lead to considerable ambiguity during evaluation

  • In this study, we present SLALOM, a tool for in-depth comparative and statistical analysis of annotations of continuous sequence elements in a grouped collection of sequences; no tool has so far been available for this type of analysis at this level of detail

Summary

Introduction

Protein or nucleic acid sequences contain a multitude of associated annotations representing continuous sequence elements (CSEs). Comparing these CSEs is needed whenever we want to match identical annotations or integrate distinctive ones. Annotations from two different origins may both be equally reliable, or one may be more reliable and be used for benchmarking. This is, for instance, the case when we compare the results of a predictor to a gold standard of manually curated annotations. In this case, we want to compute performance measures.
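
To make the benchmarking case concrete, the following is a minimal sketch of one possible way to score a predicted annotation against a gold standard on a single sequence. It is not SLALOM's implementation and does not reproduce any of its operation modes: it assumes a simple position-wise (residue-wise) comparison, 1-based inclusive interval coordinates, and hypothetical function names (`covered_positions`, `residue_wise_metrics`) chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

# Assumed convention: a CSE is a 1-based, inclusive (start, end) interval.
Interval = Tuple[int, int]


def covered_positions(intervals: List[Interval]) -> Set[int]:
    """Return every sequence position covered by at least one interval."""
    positions: Set[int] = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def residue_wise_metrics(
    gold: List[Interval], predicted: List[Interval], seq_len: int
) -> Dict[str, float]:
    """Count position-wise agreement between a gold standard and a prediction."""
    gold_pos = covered_positions(gold)
    pred_pos = covered_positions(predicted)

    tp = len(gold_pos & pred_pos)   # annotated in both
    fp = len(pred_pos - gold_pos)   # predicted only
    fn = len(gold_pos - pred_pos)   # gold standard only
    tn = seq_len - tp - fp - fn     # annotated in neither

    # Guard against empty annotations to avoid division by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}


# Illustrative input only: two gold-standard CSEs and two predicted CSEs
# on a sequence of length 30.
metrics = residue_wise_metrics(
    gold=[(5, 10), (20, 25)],
    predicted=[(8, 12), (20, 22)],
    seq_len=30,
)
print(metrics)
```

Counting positions sidesteps the ambiguity that arises when overlapping or duplicated elements are counted as whole units, but it is only one of several defensible choices; which counting scheme is appropriate depends on the scientific question being asked.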

