SLALOM, a flexible method for the identification and statistical analysis of overlapping continuous sequence elements in sequence- and time-series data

Roman Prytuliak, Friedhelm Pfeiffer, Bianca Hermine Habermann

doi:10.1186/s12859-018-2020-x

Abstract

Background: Protein or nucleic acid sequences contain a multitude of associated annotations representing continuous sequence elements (CSEs). Comparing these CSEs is needed whenever we want to match identical annotations or integrate distinctive ones. Currently, there is no ready-to-use software available that provides a comprehensive statistical readout for comparing two annotations of the same type with each other and that can be adapted to the application logic of the scientific question.

Results: We have developed a method, SLALOM (for StatisticaL Analysis of Locus Overlap Method), to perform comparative analysis of sequence annotations in a highly flexible way. SLALOM implements six major operation modes and a number of additional options that can answer a variety of statistical questions about a pair of input annotations of a given sequence collection. We demonstrate the results of SLALOM on three different examples from biology and economics and compare our method to already existing software. We discuss the importance of carefully choosing the application logic to address specific scientific questions.

Conclusion: SLALOM is a highly versatile, command-line based method for comparing annotations in a collection of sequences, with a statistical read-out for performance evaluation and benchmarking of predictors and gene annotation pipelines. Abstraction from sequence content even allows SLALOM to compare other kinds of positional data including, for example, data coming from time series.

Highlights

Protein or nucleic acid sequences contain a multitude of associated annotations representing continuous sequence elements (CSEs)
When two annotations of CSEs are compared, different scenarios of overlap and duplication may lead to considerable ambiguity during evaluation
In this study, we present SLALOM, a tool for conducting in-depth comparative and statistical analyses of annotations of continuous sequence elements in a given grouped collection of sequences; no tool has so far been available for this type of analysis at this level of detail

Summary

Introduction

Protein or nucleic acid sequences contain a multitude of associated annotations representing continuous sequence elements (CSEs). Comparing these CSEs is needed whenever we want to match identical annotations or integrate distinctive ones. Annotations from two different origins may be equally reliable, or one may be more reliable and be used for benchmarking. This is, for instance, the case when we compare the results of a predictor to a gold standard of manually curated annotations. In this case, we want to compute performance measures.
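To make the benchmarking scenario concrete, the following is a minimal Python sketch of a position-wise comparison between a curated annotation and a predictor's output on a single sequence. It is not SLALOM's implementation or interface; the function names, the 1-based inclusive interval convention, and the example coordinates are all assumptions made for illustration. Overlapping elements within one annotation are collapsed to their union here, which is only one of several possible ways to resolve the overlap and duplication ambiguity.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical convention: 1-based, inclusive (start, end) coordinates.
Interval = Tuple[int, int]


def covered_positions(intervals: List[Interval]) -> Set[int]:
    """Return all positions covered by at least one interval."""
    covered: Set[int] = set()
    for start, end in intervals:
        covered.update(range(start, end + 1))
    return covered


def positionwise_measures(
    gold: List[Interval], predicted: List[Interval], seq_len: int
) -> Dict[str, float]:
    """Count position-wise agreement between a benchmark and a prediction.

    Overlapping or duplicated elements inside one annotation count once,
    because each annotation is reduced to its set of covered positions.
    """
    gold_pos = covered_positions(gold)
    pred_pos = covered_positions(predicted)

    tp = len(gold_pos & pred_pos)   # annotated in both
    fp = len(pred_pos - gold_pos)   # predicted only
    fn = len(gold_pos - pred_pos)   # benchmark only
    tn = seq_len - tp - fp - fn     # annotated in neither

    # Ratios are undefined when the denominator is zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    precision = tp / (tp + fp) if (tp + fp) else float("nan")

    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "sensitivity": sensitivity, "precision": precision,
    }


# Illustrative data only: two curated elements, two overlapping predictions.
gold = [(5, 14), (30, 39)]
predicted = [(10, 19), (12, 21)]
result = positionwise_measures(gold, predicted, seq_len=50)
print(result)
```

Counting matches per element rather than per position, or counting overlapping elements separately instead of merging them, would give different numbers for the same input. This is why the choice of application logic matters for the question being asked.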



Journal: BMC Bioinformatics	Publication Date: Jan 26, 2018
Citations: 2	License type: open-access
