City block distance and rough-fuzzy clustering for identification of co-expressed microRNAs

Sushmita Paul, Pradipta Maji

doi:10.1039/c4mb00101j

Abstract

MicroRNAs (miRNAs) are short, endogenous RNAs with the ability to regulate mRNA expression at the post-transcriptional level. Various studies have revealed that miRNAs tend to cluster on chromosomes. The members of a cluster that are in close proximity on chromosomes are highly likely to be processed as co-transcribed units. Therefore, a large proportion of miRNAs are co-expressed. Expression profiling of miRNAs generates a huge volume of data, and the complicated networks of miRNA-mRNA interaction make the resulting mass of data harder to comprehend and interpret. In this regard, this paper presents a clustering algorithm to extract meaningful information from miRNA expression data. It judiciously integrates the merits of rough sets, fuzzy sets, the c-means algorithm, and the normalized range-normalized city block distance to discover co-expressed miRNA clusters. While the membership functions of fuzzy sets enable efficient handling of overlapping partitions in a noisy environment, the concept of lower and upper approximations of rough sets deals with uncertainty, vagueness, and incompleteness in cluster definition. The city block distance is used to compute the membership functions of fuzzy sets and to find an initial partition of a data set, and therefore helps to handle minute differences between two miRNA expression profiles. The effectiveness of the proposed approach, along with a comparison with other related methods, is demonstrated for several miRNA expression data sets using different cluster validity indices. Moreover, gene ontology is used to analyze the functional consistency and biological significance of the generated miRNA clusters.
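The abstract does not give formulas, so the following is only a minimal sketch of how the three ingredients it names might fit together. It assumes that the distance divides each absolute difference by that feature's range and averages over features, that memberships follow the standard fuzzy c-means rule, and that an object enters a cluster's lower approximation only when its two highest memberships differ by more than a threshold. The function names, the fuzzifier `m`, and the threshold `delta` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def feature_ranges(data: Sequence[Sequence[float]]) -> List[float]:
    """Per-feature range (max - min) over all profiles; a zero range becomes 1.0."""
    ranges = []
    for column in zip(*data):
        span = max(column) - min(column)
        ranges.append(span if span > 0 else 1.0)
    return ranges


def city_block(x: Sequence[float], y: Sequence[float],
               ranges: Sequence[float]) -> float:
    """Range-normalized city block distance, averaged over features (assumed form)."""
    return sum(abs(a - b) / r for a, b, r in zip(x, y, ranges)) / len(ranges)


def fuzzy_memberships(x: Sequence[float], centroids: Sequence[Sequence[float]],
                      ranges: Sequence[float], m: float = 2.0) -> List[float]:
    """Standard fuzzy c-means memberships computed from the distance above."""
    dists = [city_block(x, c, ranges) for c in centroids]
    zero_count = sum(1 for d in dists if d == 0)
    if zero_count:
        # The object coincides with one or more centroids: share membership among them.
        return [1.0 / zero_count if d == 0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** exponent for dk in dists) for di in dists]


def rough_assignment(memberships: Sequence[float],
                     delta: float = 0.2) -> Dict[str, List[int]]:
    """Lower approximation if the top membership clearly dominates, else boundary.

    Assumes at least two clusters.
    """
    order = sorted(range(len(memberships)),
                   key=lambda i: memberships[i], reverse=True)
    best, second = order[0], order[1]
    if memberships[best] - memberships[second] > delta:
        return {"lower": [best], "boundary": []}
    return {"lower": [], "boundary": [best, second]}


if __name__ == "__main__":
    # Toy expression profiles (made-up numbers, two features each).
    profiles = [[0.0, 10.0], [1.0, 20.0], [0.5, 15.0], [0.1, 11.0]]
    ranges = feature_ranges(profiles)
    centroids = [profiles[0], profiles[1]]
    for p in profiles[2:]:
        u = fuzzy_memberships(p, centroids, ranges)
        print(p, u, rough_assignment(u))
```

In this sketch, a profile lying midway between two centroids falls in the boundary of both clusters, while one close to a single centroid enters that cluster's lower approximation. Dividing by the feature range keeps a feature with a large numeric scale from dominating the distance.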

Full Text