Abstract

Using DNA microarray technology, biologists obtain large amounts of gene expression time series data. Clustering is an important approach for extracting biological information from these data. This paper proposes a novel clustering method, HMM-based hierarchical clustering (HMM-HC), to analyze gene expression time series data. We convert time-point data to discrete symbols, based on the fact that the logarithm of the data approximately follows a normal distribution, and build hidden Markov models (HMMs) from these symbols for the gene sequences. In a gene expression time series, the value at each time point is correlated with the values at other time points, and HMMs can take advantage of this correlation. We tested the method on two common datasets. The results show that it produces high-quality clusters and identifies an appropriate number of clusters.
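
The discretization step can be illustrated with a minimal sketch. It is not the paper's exact procedure: the function name `discretize`, the choice of three symbols, the equal-probability bin edges, and fitting the normal distribution to a single series are all assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def discretize(series, n_symbols=3):
    """Map positive expression values to symbols 0..n_symbols-1.

    Assumption: bin edges are equal-probability quantiles of a normal
    distribution fitted to the log-values of this one series. The
    abstract does not state which edges or which fit the paper uses.
    """
    logs = [math.log(x) for x in series]
    fitted = NormalDist(mean(logs), stdev(logs))
    # Interior cut points at probabilities 1/n, 2/n, ..., (n-1)/n.
    edges = [fitted.inv_cdf(k / n_symbols) for k in range(1, n_symbols)]
    # A value's symbol is the number of cut points it reaches or exceeds.
    return [sum(v >= e for e in edges) for v in logs]


# Hypothetical expression ratios for one gene over six time points.
symbols = discretize([0.5, 0.8, 1.0, 1.6, 2.4, 1.1])
print(symbols)
```

The resulting symbol sequence is the kind of discrete observation sequence on which an HMM could then be trained for each gene.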
