Abstract

Background: Local trend (i.e. shape) analysis of time series data reveals co-changing patterns in the dynamics of biological systems. However, slow permutation procedures for evaluating the statistical significance of local trend scores have limited its application to high-throughput time series data, e.g., data from studies based on next generation sequencing technology.

Results: By extending the theory for the tail probability of the range of a sum of Markovian random variables, we propose formulae for approximating the statistical significance of local trend scores. Using simulations and real data, we show that the approximate p-value is close to that obtained using a large number of permutations (starting at more than 20 time points with no delay and more than 30 time points with a delay of at most three time steps), in that the non-zero decimals of the p-values obtained by the approximation and by the permutations are mostly the same when the approximate p-value is less than 0.05. In addition, the approximate p-value is slightly larger than that based on permutations, making hypothesis testing based on the approximate p-value conservative. The approximation enables efficient calculation of p-values for pairwise local trend analysis, making large scale all-versus-all comparisons possible. We also propose a hybrid approach that integrates the approximation and permutations to obtain accurate p-values for significantly associated pairs. We further demonstrate its use with the analysis of the Plymouth Marine Laboratory (PML) microbial community time series from high-throughput sequencing data and find interesting organism co-occurrence dynamic patterns.

Availability: The software tool is integrated into the eLSA software package, which now provides accelerated local trend and similarity analysis pipelines for time series data. The package is freely available from the eLSA website: http://bitbucket.org/charade/elsa.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0732-8) contains supplementary material, which is available to authorized users.

Highlights

  • Local trend analysis of time series data reveals co-changing patterns in dynamics of biological systems

  • In deriving the approximate statistical significance, i.e. p-values, for local trend analysis, we make simplifying assumptions to use Markov chain modeling on d_i^X and d_i^Y

  • We find that at D = 0, starting from n = 20 to 30, the points in the scatter plots become concentrated on the diagonal line, and they align more closely with the diagonal as n increases


Summary

Introduction

Local trend (i.e. shape) analysis of time series data reveals co-changing patterns in the dynamics of biological systems. Time series data are important resources for exploring the dynamics of biological systems, where the factors of interest could be genes in gene regulation studies, or organisms and/or environmental factors in ecological studies. Qian et al. [3] proposed a local similarity based measure to identify local and potentially time-delayed associations between gene expression profiles. This local similarity analysis technique was further extended and successfully applied to microbial ecology time series studies [5, 6, 10, 11].
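To make the permutation procedure concrete, the following is a minimal Python sketch of a local trend score and its permutation-based p-value. It is an illustration under stated assumptions, not the eLSA implementation: this excerpt does not define the score, so the sketch assumes that a trend is the sign of the change between consecutive time points, that the score is the largest absolute sum of trend products over aligned contiguous stretches within a maximum delay, and that the p-value uses the common (hits + 1) / (permutations + 1) convention. The names `trend_series`, `local_trend_score` and `permutation_p_value` are hypothetical.

```python
import random


def trend_series(x, threshold=0.0):
    """Map a series to trends in {-1, 0, 1} from consecutive differences.

    Assumption: a change larger than `threshold` is an increase (+1),
    smaller than -`threshold` a decrease (-1), anything else no change (0).
    """
    out = []
    for a, b in zip(x, x[1:]):
        diff = b - a
        out.append(1 if diff > threshold else -1 if diff < -threshold else 0)
    return out


def local_trend_score(dx, dy, max_delay=0):
    """Largest absolute sum of dx[i] * dy[j] over contiguous aligned runs
    with |i - j| <= max_delay (assumed form of the score)."""
    best = 0
    for shift in range(-max_delay, max_delay + 1):
        pos = neg = 0  # running best positive / negative association
        for i in range(len(dx)):
            j = i + shift
            if 0 <= j < len(dy):
                prod = dx[i] * dy[j]
                pos = max(0, pos + prod)
                neg = max(0, neg - prod)
                best = max(best, pos, neg)
    return best


def permutation_p_value(x, y, max_delay=0, n_perm=1000, seed=0):
    """Estimate the p-value of the observed score by shuffling x."""
    rng = random.Random(seed)
    dy = trend_series(y)
    observed = local_trend_score(trend_series(x), dy, max_delay)
    shuffled = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        score = local_trend_score(trend_series(shuffled), dy, max_delay)
        if score >= observed:
            hits += 1
    # Assumed convention: add one so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Each permutation recomputes the score from scratch, so an all-versus-all comparison over many factors repeats this loop for every pair, which illustrates the cost that an analytical approximation avoids.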

