Abstract
In molecular biology, gene expression analysis is an important research area concerned with identifying genes that have similar functionality, known as co-expressed genes. Data mining techniques such as clustering are frequently employed to group gene expressions with similar functional characteristics, and numerous clustering techniques are available for this purpose. Gene expression datasets usually result from millions of measurements, so they are high-dimensional and noisy, which makes conventional distance measures ineffective. Entropy-based distance computation, by contrast, captures the inhomogeneity in high-dimensional data more effectively and is largely insensitive to noise. To exploit these advantages, we propose a novel method that uses the concept of entropy to compute the density distribution of data points in high-dimensional, noisy gene expression datasets. Once the density distribution has been obtained, an existing technique known as “Extreme Clustering” is used to extract the clusters present in the gene expression dataset. The proposed technique is implemented and evaluated on diverse microarray gene expression datasets. Experimental results show that the proposed technique outperforms other popular density-based techniques in terms of cluster quality, robustness against noise, and biological significance of the genes within the clusters.
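The abstract does not state the entropy formulation, so the following is only an illustrative sketch of how a per-point entropy could be computed from pairwise similarities; it should not be read as the proposed method. The function name `point_entropy`, the exponential similarity, and the choice of `alpha` (similarity 0.5 at the mean pairwise distance) are all assumptions introduced here for illustration.

```python
import numpy as np

def point_entropy(X, alpha=None):
    """Per-point entropy from pairwise similarities (illustrative assumption,
    not the formulation used in the paper)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances via broadcasting: shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    off_diag = ~np.eye(n, dtype=bool)
    if alpha is None:
        mean_dist = D[off_diag].mean()
        if mean_dist == 0:
            raise ValueError("all points are identical; alpha is undefined")
        # Assumed scaling: similarity equals 0.5 at the mean pairwise distance.
        alpha = np.log(2.0) / mean_dist
    # Similarity in (0, 1]; clip to keep the logarithms finite.
    S = np.clip(np.exp(-alpha * D), 1e-12, 1 - 1e-12)
    # Binary entropy (in bits) of each pairwise similarity.
    H = -(S * np.log2(S) + (1 - S) * np.log2(1 - S))
    H[~off_diag] = 0.0  # ignore each point's similarity to itself
    return H.sum(axis=1)

# Two points: the only pairwise distance equals the mean distance,
# so the similarity is 0.5 and each point carries exactly one bit.
print(point_entropy([[0.0, 0.0], [3.0, 4.0]]))
```

In this sketch a point has low entropy when its similarities to other points are all close to 0 or 1, and high entropy when many of them are ambiguous. How such values would be turned into the density distribution passed to Extreme Clustering is not specified in the abstract and is left open here.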