Abstract

Motivated by the problem of inferring the molecular dynamics of DNA in solution, and linking them with its base-pair composition, we consider the problem of comparing the dynamics of functional time series (FTS), and of localizing any inferred differences in frequency and along curvelength. The approach we take is one of Fourier analysis, where the complete second-order structure of the FTS is encoded by its spectral density operator, indexed by frequency and curvelength. The comparison is broken down to a hierarchy of stages: at a global level, we compare the spectral density operators of the two FTS, across frequencies and curvelength, based on a Hilbert–Schmidt criterion; then, we localize any differences to specific frequencies; and, finally, we further localize any differences along the length of the random curves, that is, in physical space. A hierarchical multiple testing approach guarantees control of the averaged false discovery rate over the selected frequencies. In this sense, we are able to attribute any differences to distinct dynamic (frequency) and spatial (curvelength) contributions. Our approach is presented and illustrated by means of a case study in molecular biophysics: how can one use molecular dynamics simulations of short strands of DNA to infer their temporal dynamics at the scaling limit, and probe whether these depend on the sequence encoded in these strands? Supplementary materials for this article are available online.
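The global comparison described in the abstract can be illustrated with a minimal sketch. It assumes each FTS has been discretized on p grid points along the curve, so that the spectral density operator at each frequency becomes a p-by-p Hermitian matrix, and it uses a flat moving average of the periodogram as the estimator. The function names, the `bandwidth` parameter, and the choice of smoother are illustrative assumptions, not the authors' estimator or test statistic.

```python
import numpy as np


def spectral_density(x, bandwidth=5):
    """Smoothed-periodogram estimate of discretized spectral density operators.

    x: array of shape (T, p); row t is the curve at time t on p grid points.
    Returns an array of shape (T, p, p): one Hermitian matrix per Fourier
    frequency 2*pi*k/T, k = 0, ..., T-1.
    """
    T, p = x.shape
    x = x - x.mean(axis=0)                      # centre each grid point
    d = np.fft.fft(x, axis=0) / np.sqrt(2 * np.pi * T)
    # Periodogram at each frequency: outer product d_k d_k^*.
    per = d[:, :, None] * d[:, None, :].conj()
    # Flat (circular) moving average over 2*bandwidth + 1 neighbouring frequencies.
    shifts = range(-bandwidth, bandwidth + 1)
    return sum(np.roll(per, s, axis=0) for s in shifts) / (2 * bandwidth + 1)


def hs_distance(f1, f2):
    """Frobenius norm of f1 - f2 at each frequency.

    On a regular grid this equals the Hilbert-Schmidt norm of the difference
    of the two operators up to a constant grid-spacing factor.
    """
    diff = f1 - f2
    return np.sqrt((np.abs(diff) ** 2).sum(axis=(1, 2)))


# Usage on two simulated white-noise series (hypothetical data).
rng = np.random.default_rng(0)
x1 = rng.standard_normal((256, 8))
x2 = rng.standard_normal((256, 8))
dist = hs_distance(spectral_density(x1), spectral_density(x2))
```

The vector `dist` has one entry per Fourier frequency. Turning these distances into a test requires the asymptotic distribution of the estimator, which this sketch does not attempt.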

Highlights

  • The data are the result of a massive molecular dynamics computation, simulating a system of 200,000 particles and taking approximately 6000 CPU hours on a Cray XT5 at the Swiss National Supercomputing Centre

  • We have introduced a method for comparing the dynamics of two functional time series (FTS) at a hierarchy of levels, through a frequency domain approach

  • Our method was illustrated through a case study in molecular biophysics, and aimed at detecting sequence-dependent effects on the molecular dynamics of DNA at persistence length
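To make the frequency-localization level of the hierarchy concrete, here is a minimal sketch of selecting frequencies from per-frequency p-values. It uses the plain Benjamini–Hochberg step-up procedure as a stand-in; the hierarchical procedure of the article, which controls an averaged false discovery rate over the selected frequencies, is not reproduced here. The function name and the example p-values are hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a boolean mask of hypotheses rejected by the BH step-up rule."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                          # indices of sorted p-values
    thresholds = alpha * np.arange(1, m + 1) / m   # alpha * i / m for rank i
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])           # largest rank passing the test
        rejected[order[: k + 1]] = True            # reject all up to that rank
    return rejected


# Hypothetical p-values, one per candidate frequency.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
selected = benjamini_hochberg(pvals, alpha=0.05)
```

In a hierarchical scheme, only the frequencies flagged in `selected` would be passed on to the next level, where differences are localized along curvelength.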


Summary

Functional Data Analysis

Functional data analysis (FDA; Ramsay and Silverman 2005; Ferraty and Vieu 2006; Horváth and Kokoszka 2012; Wang, Chiou, and Mueller 2016) deals with inferential situations where each data point is best modeled as the realization of a stochastic process, understood as a random function or a random surface, such as weather data, neuroimages, electricity consumption curves, or phonetics, to name a few (e.g., Ramsay and Silverman 2002; Antoniadis, Paparoditis, and Sapatinas 2006; Aston and Kirch 2012a; Hadjipantelis et al. 2015). [...] XT are readily available as curves, or densely sampled (see, e.g., Dauxois, Pousse, and Romain 1982; Mas and Menneteau 2003; Hall and Hosseini-Nasab 2006). Statistical inference in the context of FDA typically involves an inverse problem, making it intrinsically harder than in the multivariate setting. This problem can be tackled by appropriate regularization, as exemplified by one-sample tests for the mean (Mas 2007), two-sample tests for the mean (Fan and Lin 1998; Cuevas, Febrero, and Fraiman 2004), and two-sample tests for covariance operators (Panaretos, Kraus, and Maddocks 2010; Kraus and Panaretos 2012; Horváth, Kokoszka, and Reeder 2013), or through resampling techniques (e.g., Benko, Härdle, and Kneip 2009; Boente, Rodriguez, and Sued 2014; Paparoditis and Sapatinas 2014).

Functional Time Series
Molecular Biophysics
Description of the Data
Preprocessing Steps
Comparison of the Spectral Density Operators
Comparing the Spectral Density Operator at a Fixed Frequency
Localization of Differences on the Frequencies
Choice of the Discretization Grid
Localizing Differences in Frequency and Along Curvelength
Concluding Remarks