Abstract

Background

Local similarity analysis (LSA) of time series data has been extensively used to investigate the dynamics of biological systems in a wide range of environments. Recently, a theoretical method was proposed to approximately calculate the statistical significance of local similarity (LS) scores. However, the method assumes that the time series data are independent and identically distributed, an assumption that is violated in many problems.

Results

In this paper, we develop a novel approach to accurately approximate the statistical significance of LSA for dependent time series data using a nonparametric kernel estimate of the long-run variance. We also investigate an alternative method for LSA statistical significance approximation by computing the local similarity score of the residuals based on a predefined statistical model. We show by simulations that both methods have controllable type I errors for dependent time series, while other approaches for statistical significance can be grossly oversized. We apply both methods to human and marine microbial datasets, where most of the possible significant associations are captured and false positives are efficiently controlled.

Conclusions

Our methods provide fast and effective approaches for evaluating the statistical significance of dependent time series data with controllable type I error. They can be applied to a variety of time series data to reveal inherent relationships among the different factors.
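To make the idea of a kernel-estimated long-run variance concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Bartlett (triangular) kernel with a hand-picked bandwidth, since this page does not state which kernel or bandwidth rule the paper uses. The function names `autocovariance`, `bartlett_long_run_variance` and `simulate_ar1` are hypothetical helpers introduced only for illustration.

```python
import random


def autocovariance(z, lag):
    """Sample autocovariance of z at a non-negative lag (divides by n)."""
    n = len(z)
    mean = sum(z) / n
    return sum((z[t] - mean) * (z[t - lag] - mean) for t in range(lag, n)) / n


def bartlett_long_run_variance(z, bandwidth):
    """Kernel estimate gamma_0 + 2 * sum_k w_k * gamma_k.

    Assumption: Bartlett weights w_k = 1 - k / (bandwidth + 1); the paper's
    actual kernel and bandwidth choice are not given in this text.
    """
    lrv = autocovariance(z, 0)
    for k in range(1, bandwidth + 1):
        weight = 1.0 - k / (bandwidth + 1)
        lrv += 2.0 * weight * autocovariance(z, k)
    return lrv


def simulate_ar1(n, rho, rng, burn_in=200):
    """Illustrative AR(1) series x_t = rho * x_{t-1} + e_t with N(0, 1) noise."""
    x = 0.0
    out = []
    for t in range(n + burn_in):
        x = rho * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(x)
    return out


rng = random.Random(0)
series = simulate_ar1(5000, 0.5, rng)
# For positively autocorrelated data the long-run variance exceeds the
# ordinary variance, which is why an i.i.d.-based test can be oversized.
print("sample variance:  ", autocovariance(series, 0))
print("long-run variance:", bartlett_long_run_variance(series, 20))
```

With a bandwidth of 0 the estimate reduces to the ordinary sample variance, which corresponds to the independence assumption that the abstract says can be violated.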

Highlights

  • Local similarity analysis (LSA) of time series data has been extensively used to investigate the dynamics of biological systems in a wide range of environments

  • Data-driven LSA (DDLSA) and LSA for residues (LSAres) have controlled type I error rates and other approaches do not: We investigated the effects of the autoregressive coefficients ρ1 and ρ2 and the number of time points n on the type I error rates of the six methods for evaluating statistical significance under the AR(1) (Eq 5), autoregressive moving average (ARMA)(1,1) (Eq 6) and ARMA(1,1)-threshold autoregressive (TAR)(1) (Eq 7) models

  • Comparing the power of LSAres and DDLSA: Since the Pearson correlation coefficient (PCC), Spearman’s rank correlation coefficient (SRCC), permutation and theoretical LSA (TLSA) could not control type I error, we only investigated the power of LSAres and DDLSA
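The type I error inflation described in the highlights can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reproduction of the paper's experiments: Eq 5 is not shown on this page, so the AR(1) model is assumed to have the standard form x_t = ρ x_{t-1} + ε_t with standard normal noise, and the PCC significance test is assumed to be a Fisher z-transform test that treats observations as independent. The helper names are hypothetical.

```python
import math
import random


def simulate_ar1(n, rho, rng, burn_in=100):
    """Assumed AR(1) form: x_t = rho * x_{t-1} + e_t, e_t ~ N(0, 1)."""
    x = 0.0
    out = []
    for t in range(n + burn_in):
        x = rho * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(x)
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def naive_rejection_rate(n, rho1, rho2, replicates=400, seed=1):
    """Fraction of independent series pairs declared correlated at the 5% level.

    The test uses the Fisher z-transform and assumes i.i.d. observations,
    so any rejection here is a false positive.
    """
    rng = random.Random(seed)
    critical = 1.959964  # two-sided 5% quantile of the standard normal
    rejections = 0
    for _ in range(replicates):
        x = simulate_ar1(n, rho1, rng)
        y = simulate_ar1(n, rho2, rng)  # generated independently of x
        z = math.atanh(pearson(x, y)) * math.sqrt(n - 3)
        if abs(z) > critical:
            rejections += 1
    return rejections / replicates


print("rho1 = rho2 = 0.0:", naive_rejection_rate(50, 0.0, 0.0))
print("rho1 = rho2 = 0.8:", naive_rejection_rate(50, 0.8, 0.8))
```

Because the two series in each pair are generated independently, a well-calibrated test should reject about 5% of the time. A noticeably higher rate for the autocorrelated case is what "could not control type I error" refers to.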


Summary

Introduction

Local similarity analysis (LSA) of time series data has been extensively used to investigate the dynamics of biological systems in a wide range of environments. Next-generation sequencing (NGS) technologies have made it possible to generate a large amount of time series data in both genomics and metagenomics. [...] question in time series data analysis is the identification [...]. Most commonly used approaches for identifying associated factors are to calculate the Pearson correlation coefficients (PCC) or Spearman correlation coefficients [...]. [...] previous studies that factors can be associated in a subset of time intervals (local) and maybe there are time-delays [...]. PCC and SPCC may fail to identify such local associations with/without time-delays. [...] et al [2] proposed a local similarity method to identify potential local and time-shift relationships between gene expression data. Ji and Tan [4] suggested a simi-
