Abstract

Journal of Time Series Analysis, Volume 33, Issue 5, p. 701-703. Editorial, Free Access.

Editorial: Special issue on time series analysis in the biological sciences

David S. Stoffer (corresponding author), University of Pittsburgh and University of Chicago. E-mail: stoffer@pitt.edu. Address: David S. Stoffer, Department of Statistics, University of Pittsburgh, Pittsburgh, PA 15260, USA.
Hernando Ombao, University of California, Irvine.

First published: 06 June 2012. https://doi.org/10.1111/j.1467-9892.2012.00805.x

The types of problems encountered in analyzing time series or spatial processes (or both) from the biological sciences are about as broad as the field itself. These include applications in molecular biology, environmental biology, epidemiology, neurology and bioinformatics, marine biology, oceanography, biotechnology, physiology, botany, ecology, medicine, and evolution.
Many of the problems involve departures from linearity, normality, and stationarity or homogeneity, and may involve missing data, irregular sampling, or multiple series collected at different time scales. Moreover, many current biological time series are collected under designed experiments and thus require modelling the between-subject and between-trial variations.

Most scientists who have studied time series analysis have encountered the sunspot data set or the lynx trappings data set, which are used as classic examples to demonstrate nonlinearity and non-normality. The lynx series is typical of predator-prey processes (the prey being the snowshoe hare) that are often modeled by the so-called Lotka-Volterra equations, a pair of simple nonlinear differential equations used to describe the interaction between the sizes of the predator and prey populations. These series play a prominent role, for example, in the text by Tong (1990), and they are often used to demonstrate a process that is not time reversible. Recall that if a process $\{X_t\}$ is linear and Gaussian, then the process is time reversible because $(X_1, X_2, \ldots, X_n)$ has the same distribution as the values in reverse order, $(X_n, X_{n-1}, \ldots, X_1)$.

A process that is apparently not time reversible is shown in Figure 1. These data are taken from Shumway and Stoffer (2011) and are the monthly growth rates of pneumonia and influenza deaths in the United States for 11 years, 1968–1978. The data tend to increase slowly to a peak and then decline quickly to a trough (↗↓). Moreover, although pneumonia and influenza are worse in the winter, the month with the peak number of occurrences varies annually. In addition, nonlinearity is seen in the lag plots of Figure 1. In particular, notice that in the lag-two plot, the dynamics of the present value change according to whether the growth two months prior is above or below about 12–15%.
For example, in Figure 1, the correlation between $X_t$ and $X_{t-2}$ appears to be positive if $X_{t-2} < 0.15$ and negative if $X_{t-2} > 0.15$.

Figure 1. Top: Monthly growth rates of pneumonia and influenza deaths in the United States for 1968–1978. Bottom: Lag plots of the current value, $X_t$, against one month prior, $X_{t-1}$, and two months prior, $X_{t-2}$.

The data shown in the top of Figure 2 are a single-channel EEG signal taken from the epileptogenic zone of a subject with epilepsy, but during a seizure-free interval of 23.6 seconds; it is the series shown in Figure 3(d) of Andrzejak et al. (2001). The bottom of Figure 2 shows the innovations (residuals) after the signal has been removed by fitting an AR(p) model with the order chosen by AIC. Given the large spikes in the EEG trace, it is apparent that the data are not normal. In fact, the innovations in Figure 2 are from a heavy-tailed distribution and possibly form an infinite-variance process. Infinite-variance processes are described in detail in Brockwell and Davis (1991, Chapter 13). In this case, it is possible to pose a linear process, but with stable innovations.

Figure 2. Top: A single-channel EEG signal taken from the epileptogenic zone of a subject with epilepsy during a seizure-free interval of 23.6 seconds; see Andrzejak et al. (2001). Bottom: The innovations after removal of the signal using an autoregression based on AIC.

Figure 3. The sample ACF of the EEG innovations (left) and the squared innovations (right); the EEG innovations series is shown in Figure 2.

Most time series analysts have seen data such as the returns of the S&P 500. The fact that these types of processes tend to be uncorrelated, but dependent, has led to the development of models such as GARCH-type models, stochastic volatility models, or bilinear models.
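The uncorrelated-but-dependent behaviour mentioned above can be reproduced with a small simulation. The following is a minimal sketch and is not taken from the editorial: it assumes an ARCH(1) model with illustrative parameters (`omega`, `alpha`, the sample size, and the seed are arbitrary choices), and the helper names `simulate_arch1` and `sample_acf` are hypothetical. It compares the sample ACF of the simulated series with that of its squares.

```python
import math
import random


def simulate_arch1(n, omega=0.2, alpha=0.4, seed=0):
    """Simulate x_t = sigma_t * e_t with sigma_t^2 = omega + alpha * x_{t-1}^2.

    The parameter values are illustrative assumptions, not estimates.
    """
    rng = random.Random(seed)
    x_prev = 0.0
    series = []
    for _ in range(n):
        sigma2 = omega + alpha * x_prev ** 2
        x_prev = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        series.append(x_prev)
    return series


def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag (estimator divides by n)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    return [
        sum(dev[t] * dev[t - h] for t in range(h, n)) / n / c0
        for h in range(1, max_lag + 1)
    ]


x = simulate_arch1(5000)
acf_x = sample_acf(x, 5)                    # ACF of the series itself
acf_x2 = sample_acf([v * v for v in x], 5)  # ACF of the squared series
bound = 1.96 / math.sqrt(len(x))            # approximate white-noise band

print("approximate band: +/-", round(bound, 3))
print("ACF of series:   ", [round(r, 3) for r in acf_x])
print("ACF of squares:  ", [round(r, 3) for r in acf_x2])
```

Under these assumptions, the autocorrelations of the series should be small, while the lag-one autocorrelation of the squares should lie well outside the band. The band itself is only a rough guide here, because it is derived for independent noise.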
One typical exhibition is to plot a stock return series, noting obvious departures from independence (for example, clusters of volatility), and then to observe that the sample autocorrelation function (ACF) is essentially that of white noise. Then, we exhibit the sample ACF of the squares of the data and voilà, the dependence in the process is revealed. In fact, such an occurrence is not limited to financial time series, but can also be seen in processes encountered in the sciences related to the study of life (i.e., biology). For example, the left side of Figure 3 shows the sample ACF of the EEG innovations displayed in Figure 2. The fact that the values of the ACF are small indicates that the innovations are a white noise process. However, the right side of the figure shows the sample ACF of the squared EEG innovations, where we clearly see significant autocorrelation. Thus, while the innovations appear to be white, they are clearly not independent.

Another situation in which linearity and normality are unreasonable assumptions is when the data are discrete-valued and small. One such process is the number of poliomyelitis cases reported to the U.S. Centers for Disease Control for the years 1970–1983, displayed in Figure 4. The marginal distribution appears to be overdispersed Poisson, or generalized Poisson, or perhaps negative binomial, which is a mixture of Poissons; for example, see Joe and Zhu (2005). Moreover, we see that the ACF of the process seems to imply a simple autocorrelation structure, which might be modelled by a simple non-Gaussian AR(1)-type model. The polio data set is taken from Zeger (1988), who fits a generalized linear ARMA-type model. Generalized linear ARMA models are an extension of generalized linear models to dependent data situations where ARMA-type autocorrelations are evident.
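The remark that the negative binomial is a mixture of Poissons can be checked numerically. The sketch below is illustrative only and does not use the polio data: it assumes a gamma mixing distribution with arbitrary `mean` and `shape` values, and the helper names `poisson_draw` and `dispersion_index` are hypothetical. It compares the variance-to-mean ratio of plain Poisson counts with that of gamma-mixed Poisson counts.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate by Knuth's multiplication method.

    Suitable for small lam, as used in this sketch.
    """
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def dispersion_index(counts):
    """Sample variance divided by sample mean (equals 1 in theory for Poisson)."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean


rng = random.Random(1)
mean, shape, n = 1.5, 1.0, 20000  # illustrative values, not estimated from data

# Plain Poisson counts with a fixed rate.
plain = [poisson_draw(rng, mean) for _ in range(n)]

# Poisson counts whose rate is drawn from Gamma(shape, scale=mean/shape);
# the resulting marginal distribution is negative binomial.
mixed = [poisson_draw(rng, rng.gammavariate(shape, mean / shape)) for _ in range(n)]

print("Poisson dispersion index:", round(dispersion_index(plain), 2))
print("Mixture dispersion index:", round(dispersion_index(mixed), 2))
```

With a gamma mixing distribution, the theoretical variance-to-mean ratio is 1 + mean/shape, so smaller values of `shape` give stronger overdispersion relative to the Poisson case.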
In another approach, models have been developed to have ARMA-type autocorrelation structures, but are constrained so that the process stays in a state space of integers, for example. Fortunately, for those who are interested in these problems, Jung and Tremayne (2011) provide a recent and extensive summary of the state of the art.

Figure 4. Poliomyelitis cases reported to the U.S. Centers for Disease Control for the years 1970–1983.

Many biological time series exhibit symptoms of non-stationarity, including non-constant mean and variance (structural breaks, change-points) and power spectra that evolve with time. As a specific example, brain activity is often altered following a shock to the system such as seizure onset or the presentation of some external stimulus. These changes are often reflected in the spectrum and coherence. One challenge here is to quantify the impact of a shock to the biological system. A number of non-parametric approaches have been proposed, including the locally stationary Fourier-based model of Priestley (1965) and Dahlhaus (1997), the locally stationary wavelets model in Nason et al. (2000), the SLEX (smooth localized Fourier) model in Ombao et al. (2005), and a local spline approach taken in Rosen et al. (2009). Recently, Gorrostieta et al. (2012) and Kang et al. (2013) studied stimulus-induced changes in brain dynamics using time series models with subject-specific random effects to account for between-subject variation in brain responses.

The previous problems are just a few simple examples of the types of data seen in the biological sciences, and many of these problems may seem familiar. These problems are discussed in more detail in the forthcoming text on nonlinear and non-Gaussian series, Douc et al. (2013). The reality is that many problems, such as brain connectivity, are highly complex and are currently only understood at a basic level.
The collection of 12 invited articles included in this volume is meant to demonstrate the variety of problems and approaches that are taken to solve complex problems in the biological sciences. The techniques run the gamut of time series methods, and include analyses in the time, spatial, and frequency domains. The hope is that we may motivate more experts in time series and spatial analysis to consider working on problems in the biological sciences. As will be seen from the collection, the problems are many and are rich. In fact, comprehensive solutions to these problems may require statistical techniques from a variety of areas including functional data analysis, mixed effects models, high dimensional data analysis, statistical learning and computing. The articles are arranged in alphabetical order according to the first author.

References

Andrzejak, R., Lehnertz, K., Rieke, C., Mormann, F., David, P., and Elger, C. (2001) Indications of nonlinear deterministic and finite dimensional structures in time series of brain electrical activity: dependence on recording region and brain state. Phys. Rev. E, 64, 061907.
Brockwell, P. J. and Davis, R. A. (1991) Time Series: Theory and Methods, 2nd edn. New York: Springer.
Dahlhaus, R. (1997) Fitting time series models to nonstationary processes. The Annals of Statistics, 25, 1–37.
Douc, R., Moulines, E., and Stoffer, D. S. (2013) Nonlinear Time Series: Theory, Methods and Applications with R Examples. New York: Chapman Hall.
Gorrostieta, C., Ombao, H., Bedard, P., and Sanes, J. (2012) Investigating stimulus-induced changes in connectivity using mixed effects vector autoregressive models. NeuroImage, 59, 3347–55.
Joe, H. and Zhu, R. (2005) Generalized Poisson distribution: the property of mixture of Poisson and comparison with negative binomial distribution. Biometrical Journal, 47(2), 219–29.
Jung, R. and Tremayne, A. (2011) Useful models for time series of counts or simply wrong ones? Advances in Statistical Analysis, 95, 1–33.
Kang, H., Ombao, H., Linkletter, C., Long, N., and Badre, D. (2013) Spatio-spectral mixed effects model for functional magnetic resonance imaging data. Journal of the American Statistical Association (in press).
Nason, G., von Sachs, R., and Kroisandt, G. (2000) Wavelet processes and adaptive estimation of the evolutionary wavelet spectrum. Journal of the Royal Statistical Society, Series B, 62, 271–92.
Ombao, H., von Sachs, R., and Guo, W. (2005) The SLEX analysis of multivariate non-stationary time series. Journal of the American Statistical Association, 100, 519–31.
Priestley, M. (1965) Evolutionary spectra and non-stationary processes. Journal of the Royal Statistical Society, Series B, 28, 228–40.
Rosen, O., Stoffer, D., and Wood, S. (2009) Local spectral analysis via a Bayesian mixture of smoothing splines. Journal of the American Statistical Association, 104(485), 249–62.
Shumway, R. H. and Stoffer, D. S. (2011) Time Series Analysis and Its Applications, With R Examples, 3rd ed. New York: Springer.
Tong, H. (1990) Non-linear Time Series: A Dynamical System Approach. Oxford: Oxford University Press.
Zeger, S. L. (1988) A regression model for time series of counts. Biometrika, 75, 621–9.
Volume 33, Issue 5, Special Issue: Time Series Analysis in the Biological Sciences, September 2012, Pages 701-703.
