Abstract

BackgroundTimeboxes are graphical user interface widgets that were proposed to specify queries on time course data. As queries can be very easily defined, an exploratory analysis of time course data is greatly facilitated. While timeboxes are effective, they have no provisions for dealing with noisy data or data with fluctuations along the time axis, which is very common in many applications. In particular, this is true for the analysis of gene expression time courses, which are mostly derived from noisy microarray measurements at few unevenly sampled time points. From a data mining point of view the robust handling of data through a sound statistical model is of great importance.ResultsWe propose probabilistic timeboxes, which correspond to a specific class of Hidden Markov Models, that constitutes an established method in data mining. Since HMMs are a particular class of probabilistic graphical models we call our method Probabilistic Graphical Query Language. Its implementation was realized in the free software package pGQL. We evaluate its effectiveness in exploratory analysis on a yeast sporulation data set.ConclusionsWe introduce a new approach to define dynamic, statistical queries on time course data. It supports an interactive exploration of reasonably large amounts of data and enables users without expert knowledge to specify fairly complex statistical models with ease. The expressivity of our approach is by its statistical nature greater and more robust with respect to amplitude and frequency fluctuation than the prior, deterministic timeboxes.

Highlights

  • Timeboxes are graphical user interface widgets that were proposed to specify queries on time course data

  • We show that timeboxes have a natural interpretation as stochastic, piece-wise linear functions, or linear Hidden Markov Models (HMM) [7]

  • We demonstrate the effectiveness of probabilistic queries on a yeast sporulation data set using our implementation of the probabilistic time-boxes

Read more

Summary

Introduction

Timeboxes are graphical user interface widgets that were proposed to specify queries on time course data. They have no provisions for dealing with noisy data or data with fluctuations along the time axis, which is very common in many applications. This is true for the analysis of gene expression time courses, which are mostly derived from noisy microarray measurements at few unevenly sampled time points. Gene expression time courses have only very few, unevenly sampled time points [3], in contrast to temporal data from other domains such as finance or multimedia. An open research problem is finding a robust method to interactively query short time courses for specific qualitative and quantitative behavior

Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call