Abstract

Similarity and distance matrices are general data structures that describe reciprocal relationships between the objects within a given dataset. Commonly used methods for representation of these matrices include heatmaps, hierarchical trees, dimensionality reduction, and various types of networks. However, despite a well-developed foundation for the visualization of such representations, the challenge of creating an interactive view that would allow for quick data navigation and interpretation remains largely unaddressed. This problem becomes especially evident for large matrices with hundreds or thousands of objects. In this work, we present a web-based platform for the interactive analysis of large (dis-)similarity matrices. It consists of four major interconnected and synchronized components: a zoomable heatmap, an interactive hierarchical tree, a scalable circular relationship diagram, and a 3D multi-dimensional scaling (MDS) scatterplot. We demonstrate the use of the platform for the analysis of amino acid covariance data in proteins as part of our previously developed CoeViz tool. The web platform enables quick and focused analysis of protein features, such as structural domains and functional sites.

Highlights

  • Similarity and distance matrices (SMs and DMs) are common data structures to represent interrelationships within a given set of objects

  • To address the aforementioned limitations of the existing approaches to visualizing large similarity and distance matrices, we present a web-based platform that combines heatmaps, dendrograms, circular relationship diagrams, and multi-dimensional scaling (MDS) plots into one interactive data visualization tool, with all components synchronized as the user examines the data

  • We incorporated an interactive cladogram of the hierarchical clustering tree, which allows for the manual highlighting of clusters, reacts to the selection

Summary

Introduction

Similarity and distance matrices (SMs and DMs) are common data structures for representing interrelationships within a given set of objects. These matrices can be used for identifying clusters of objects, inferring networks and communities, estimating the density of a distribution, and other applications that require quantitative measures of relatedness between objects. Dendrograms, circular relationship diagrams, networks, and dimensionality reduction scatterplots are popular methods for visualizing similarity and distance matrices. In a heatmap, each cell is colored according to the distance between a given pair of objects. The colors are normally a gradient of shades that represents the min–max range of all distances in the matrix.
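
The coloring scheme described above can be illustrated with a minimal sketch. It is not taken from the platform or from CoeViz: the function names `distance_matrix` and `heatmap_colors`, the Euclidean metric, the linear two-color gradient, and the sample points are all assumptions chosen for illustration. The sketch builds a small distance matrix and assigns each cell an RGB shade by scaling its value over the min–max range of the whole matrix.

```python
from math import dist


def distance_matrix(points):
    """Pairwise Euclidean distances (assumed metric); symmetric, zero diagonal."""
    return [[dist(p, q) for q in points] for p in points]


def heatmap_colors(matrix, low=(255, 255, 255), high=(178, 24, 43)):
    """Map every cell to an RGB triple by min-max scaling over the whole matrix.

    `low` and `high` are hypothetical gradient endpoints: the smallest
    distance gets `low`, the largest gets `high`, and values in between
    are linearly interpolated.
    """
    values = [v for row in matrix for v in row]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against a constant matrix

    def shade(v):
        t = (v - lo) / span  # position of v within the min-max range, 0..1
        return tuple(round(a + t * (b - a)) for a, b in zip(low, high))

    return [[shade(v) for v in row] for row in matrix]


# Three illustrative 2D objects; real inputs would be any objects with a
# defined pairwise distance.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
dm = distance_matrix(points)
colors = heatmap_colors(dm)
```

Because the scaling uses the global minimum and maximum, a single outlying distance compresses the shades of all other cells, which is one reason interactive zooming and rescaling become useful for large matrices.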

