Abstract

Clustering is a prominent method for identifying similar patterns in large datasets, which makes it useful in bioinformatics studies. Classical methods such as k-means and maximum likelihood model the data with a mixture of Gaussian probability density functions (PDFs) and find clusters by maximizing the PDF. However, correlation among different groups of data and noise in the data make it difficult to detect the correct number of clusters. Furthermore, the Gaussian assumption for the PDF does not necessarily hold in real applications. This paper presents a new clustering method based on wavelet-based probability density functions. First, a mixture of PDFs is estimated for each feature using wavelets. Next, a multilevel thresholding method is applied to the mixture of PDFs of each feature to obtain the clusters. Finally, forward feature selection with memory is used to cluster the dataset based on combinations of features. The profile alignment and agglomerative clustering (PAAC) index is used to evaluate the number of clusters and features. Transcript expression across the stages of prostate cancer is considered as a case study. The experimental results show that the proposed method can detect patterns of similar transcripts throughout disease progression, and the results are promising in comparison with other methods.
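
The first two steps (per-feature density estimation with wavelets, then multilevel thresholding) can be illustrated with a minimal sketch for a single feature. This is not the authors' implementation: the Haar basis, the linear smoothing that discards the finest detail levels, and the rule that places a threshold in the middle of each low-density run between two peaks are all assumptions chosen for brevity. The function names (`haar_density`, `valley_thresholds`, `assign_labels`) and the parameters `n_bins`, `drop_levels`, and `rel_level` are hypothetical. Forward feature selection and the PAAC index are not covered.

```python
import bisect

SQRT2 = 2 ** 0.5

def haar_forward(values):
    """Orthonormal Haar decomposition of a list whose length is a power of 2."""
    approx, details = list(values), []  # details are stored finest level first
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.append([(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx, details

def haar_inverse(approx, details):
    """Invert haar_forward, rebuilding from the coarsest level to the finest."""
    out = list(approx)
    for det in reversed(details):
        nxt = []
        for a, d in zip(out, det):
            nxt.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        out = nxt
    return out

def haar_density(samples, n_bins=64, drop_levels=3):
    """Histogram density smoothed by zeroing the finest Haar detail levels.

    Assumption: linear smoothing only; n_bins must be a power of 2.
    """
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / n_bins
    counts = [0.0] * n_bins
    for x in samples:
        counts[min(int((x - lo) / width), n_bins - 1)] += 1.0
    density = [c / (len(samples) * width) for c in counts]
    approx, details = haar_forward(density)
    for level in range(drop_levels):
        details[level] = [0.0] * len(details[level])
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, haar_inverse(approx, details)

def valley_thresholds(edges, density, rel_level=0.1):
    """One threshold per low-density run that lies between two higher regions.

    Assumption: 'low' means below rel_level times the peak density.
    """
    level = rel_level * max(density)
    thresholds, run_start, seen_peak = [], None, False
    for i, d in enumerate(density):
        if d < level:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and seen_peak:
                thresholds.append(0.5 * (edges[run_start] + edges[i]))
            run_start, seen_peak = None, True
    return thresholds

def assign_labels(samples, thresholds):
    """Cluster label = number of thresholds that are <= the sample."""
    return [bisect.bisect_right(thresholds, x) for x in samples]

# Toy feature with two well-separated groups (synthetic, for illustration only).
group_a = [i * 0.1 for i in range(-20, 21)]
group_b = [10 + i * 0.1 for i in range(-20, 21)]
edges, smooth = haar_density(group_a + group_b)
thresholds = valley_thresholds(edges, smooth)
labels = assign_labels(group_a + group_b, thresholds)
print(thresholds, sorted(set(labels)))
```

On this toy feature the sketch finds a single threshold in the empty region between the two groups, so each group receives its own label. Real expression data would likely need a smoother wavelet family and coefficient thresholding rather than simply dropping levels.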
