Abstract

Researchers are often interested in interpreting the multipartite structure of data entities according to their functional relationships. Data is often heterogeneous, with intricately hidden inner structure. With limited prior knowledge, researchers are likely to confront the problem of transforming this data into knowledge. We develop a new framework, called heat-passing, which exploits intrinsic similarity relationships within noisy and incomplete raw data and constructs a meaningful map of the data. The proposed framework is able to rank, cluster, and visualize the data all at once. The novelty of this framework derives from an analogy between the process of data interpretation and that of heat transfer, in which all data points contribute simultaneously and globally to reveal intrinsic similarities between regions of data, meaningful coordinates for embedding the data, and exemplar data points that lie at optimal positions for heat transfer. We demonstrate the effectiveness of the heat-passing framework for robustly partitioning complex networks, analyzing the globin family of proteins, and determining conformational states of macromolecules in the presence of high levels of noise. The results indicate that the methodology is able to reveal functionally consistent relationships in a robust fashion with no reference to prior knowledge. The heat-passing framework is very general and has the potential for application to a broad range of research fields, such as biological networks, social networks, and semantic analysis of documents.

Highlights

  • Advances in information technologies coupled with new data generation sources have resulted in the production of data at an unprecedented scale from sources as diverse as social networks, web pages, protein sequences and multimedia images

  • An effective way of interpreting data is to place all of the data in a network and to study the behavior of the network system governed by agreement among all individual interactions

  • We have carried out several sets of experiments: partitioning complex networks into a small number of clusters; clustering, ranking, and visualizing the globin protein family into biologically meaningful subfamilies; and determining the conformational states of macromolecules in the presence of a high level of noise

Introduction

Advances in information technologies coupled with new data generation sources have resulted in the production of data at an unprecedented scale from sources as diverse as social networks, web pages, protein sequences and multimedia images. Given the lack of sufficient prior knowledge about the data, it is often challenging for people to infer a clear picture of the internal mechanism through which regions of the data set interact meaningfully with each other. An effective way of interpreting data is to place all of the data in a network and to study the behavior of the network system governed by agreement among all individual interactions. The measured behavior can robustly reflect the hidden structure behind the network [1, 2].

PLOS ONE | DOI:10.1371/journal.pone.0116121 February 10, 2015
