Abstract

DNA microarray technology has been widely used in life science research for many years. It allows scientists to monitor gene expression levels simultaneously during biological processes. Analyzing massive time-series data is important for exploring the complex dynamics of biological systems. However, analyzing time-series gene expression data is difficult because noise levels and measurement uncertainties are high. Early clustering methods such as k-means, self-organizing maps, and hierarchical clustering disregard the temporal dependency between successive time points. Probabilistic model-based methods such as dynamic Bayesian networks (DBN) and hidden Markov models (HMM) are more suitable for time series but suffer from computational inefficiency. In addition, real gene datasets suffer from undersampling because of the long intervals between the time points at which expression data are harvested. In this thesis, an unsupervised clustering algorithm that combines spline interpolation and affinity propagation is proposed. The proposed method first uses interpolation to eliminate the influence of undersampling and then investigates the relationship between genes across distinct time points through interval selection. We demonstrate that our method achieves significant accuracy on real gene expression time-series datasets without a priori knowledge such as the number of clusters and exemplars. Our study provides a way of clustering gene expression time-series data for future biological investigations.

