Abstract

Gene expression is the process by which the information in a gene is used to synthesize a functional gene product. Genes encode proteins, which in turn dictate the function of the cell. Clustering is usually the first step in studying gene expression. Biological networks are highly complex, and the large number of genes makes the data harder to comprehend and interpret. The data themselves also exhibit vagueness, noise and imprecision. For a biological system to function, its essential cellular molecules, including RNA, DNA, metabolites and proteins, must interact with their surroundings. Clustering methods help expose the structures and patterns in the original data, and these can then guide further decisions. Traditional clustering techniques include hierarchical, model-based, partitioning, density-based, grid-based and soft clustering methods. Although many of these methods produce reliable clusterings, they fail to cope with very large volumes of gene expression data. Gene expression data also raise statistical issues, such as choosing the right method and the right dissimilarity matrix. In this work, we propose a modified Clustering Using REpresentatives algorithm (M-CURE), which is more robust to outliers than K-means clustering and can also find clusters of varying sizes.
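The robustness to outliers and to uneven cluster sizes comes from the representative-point idea behind CURE, on which M-CURE builds. The sketch below shows the native CURE idea only, not the M-CURE modifications, which this summary does not describe. It makes several assumptions. It uses Euclidean distance. Each cluster keeps up to `c` well-scattered points, which are shrunk toward the cluster centroid by a fraction `alpha`. The two clusters whose representatives are closest are merged until `k` clusters remain. The random sampling, partitioning and outlier-removal steps of the full algorithm are omitted. All function names and the default parameter values are hypothetical.

```python
import math
from itertools import combinations


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def select_representatives(points, c, alpha):
    """Pick up to c well-scattered points, then shrink them toward the centroid by alpha."""
    mean = centroid(points)
    scattered = []
    for _ in range(min(c, len(points))):
        if not scattered:
            # First representative: the point farthest from the centroid.
            best = max(points, key=lambda p: math.dist(p, mean))
        else:
            # Next: the point farthest from all representatives chosen so far.
            best = max(points, key=lambda p: min(math.dist(p, s) for s in scattered))
        scattered.append(best)
    # Shrinking toward the centroid dampens the pull of outlying points.
    return [tuple(s_i + alpha * (m_i - s_i) for s_i, m_i in zip(s, mean)) for s in scattered]


def cluster_distance(reps_a, reps_b):
    """Distance between two clusters = closest pair of their representatives."""
    return min(math.dist(a, b) for a in reps_a for b in reps_b)


def cure(points, k, c=4, alpha=0.5):
    """Agglomerative CURE-style clustering (naive O(n^3) version for illustration)."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    reps = [select_representatives(cl, c, alpha) for cl in clusters]
    while len(clusters) > k:
        # Find the pair of clusters whose representatives are closest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(reps[ij[0]], reps[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        del clusters[j], reps[j]  # j > i, so index i stays valid
        clusters[i] = merged
        reps[i] = select_representatives(merged, c, alpha)
    return clusters
```

For example, two dense groups plus one distant point produce three clusters when `k=3`, and the distant point is left on its own instead of dragging a centroid toward it. A centroid-based method such as K-means does not behave this way.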

Highlights

  • As data sets grow ever larger, good methods are needed to identify their underlying patterns for effective storage and prediction

  • We propose a modified CURE (M-CURE) algorithm that overcomes issues with the native CURE algorithm and can also cluster gene expression records

  • Many clustering methods have been proposed in the literature for separating data effectively and inferring information from the resulting clusters

Introduction

As data sets grow ever larger, good methods are needed to identify their underlying patterns for effective storage and prediction. One example is the analysis of large amounts of gene expression data to identify biologically significant subsets of samples [1]. Clustering genomic data means dealing with high-dimensional data generated by new technologies such as microarrays, next-generation sequencing and eQTL mapping [2]. Applying clustering methods to the large volumes of data produced by microarray analysis and similar technologies can help diagnose and treat various diseases on the basis of gene expression profiling. For all these reasons, computational methods are needed that can process and analyse such amounts of data in depth.
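Gene expression profiles are high-dimensional vectors, with one expression value per sample or condition. For this reason, the choice of dissimilarity measure, listed among the statistical issues in the abstract, strongly affects which genes end up together. The minimal sketch below builds a pairwise dissimilarity matrix under two common choices: Euclidean distance and Pearson correlation distance (1 − r). It is an illustration only. The paper's own choice of measure is not stated here, and the helper names `pearson`, `correlation_distance` and `dissimilarity_matrix` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (undefined for constant profiles)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # raises ZeroDivisionError if a profile has zero variance


def correlation_distance(x, y):
    """0 for perfectly co-regulated profiles, 2 for perfectly anti-correlated ones."""
    return 1.0 - pearson(x, y)


def dissimilarity_matrix(profiles, metric):
    """Symmetric n x n matrix of metric(profile_i, profile_j) with a zero diagonal."""
    n = len(profiles)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = metric(profiles[i], profiles[j])
    return d


profiles = [
    [1, 2, 3, 4],      # gene A
    [10, 20, 30, 40],  # gene B: same shape as A, ten times the magnitude
    [4, 3, 2, 1],      # gene C: opposite trend to A
]
d_euclid = dissimilarity_matrix(profiles, math.dist)
d_corr = dissimilarity_matrix(profiles, correlation_distance)
```

With these toy profiles, Euclidean distance places gene A closer to the oppositely regulated gene C than to gene B. Correlation distance instead treats A and B as identical in shape. This is why the two measures can yield very different clusterings of the same expression data.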
