Abstract

The advancement of techniques to visualize and analyze large-scale sequencing datasets is an area of active research and is rooted in traditional techniques such as heat maps and dendrograms. We introduce dendritic heat maps that display heat map results over aligned DNA sequence clusters for a range of clustering cutoffs. Dendritic heat maps aid in visualizing the effects of group differences on clustering hierarchy and relative abundance of sampled sequences. Here, we artificially generate two separate datasets with simplified mutation and population growth procedures, using GC content group separation as an example phenotype. In this work, we use the term phenotype to represent any feature by which groups can be separated. These sequences were clustered in a fractional identity range of 0.75 to 1.0 using agglomerative minimum-, maximum-, and average-linkage algorithms, as well as a divisive centroid-based algorithm. We demonstrate that dendritic heat maps give freedom to scrutinize specific clustering levels across a range of cutoffs, track changes in phenotype inequity across multiple levels of sequence clustering specificity, and easily visualize how deeply rooted changes in phenotype inequity are in a dataset. As genotypes diverge in sample populations, clusters are shown to break apart into smaller clusters at higher identity cutoff levels, similar to a dendrogram. Phenotype divergence, which is shown as a heat map of relative abundance bin response, may or may not follow genotype divergences. This joined view highlights the relationship between genotype and phenotype divergence for treatment groups. We discuss the minimum-, maximum-, average-, and centroid-linkage algorithm approaches to building dendritic heat maps and make a case for the divisive “top-down” centroid-based clustering methodology as the best option to visualize the effects of changing factors on clustering hierarchy and relative abundance.
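The idea of clustering aligned sequences at several identity cutoffs and then counting group membership per cluster can be sketched in a few lines of Python. This is only an illustration and not the pipeline used in this work: the function names, the toy sequences, the `low_GC`/`high_GC` labels, and the three cutoffs are assumptions, and only a simple minimum-linkage (single-linkage) grouping is shown.

```python
from collections import Counter


def fractional_identity(a, b):
    """Fraction of matching positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def single_linkage_clusters(seqs, cutoff):
    """Join sequences into one cluster whenever a chain of pairs has identity >= cutoff."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if fractional_identity(seqs[i], seqs[j]) >= cutoff:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def group_abundance(clusters, groups):
    """Count how many members of each group fall into each cluster."""
    return [Counter(groups[i] for i in members) for members in clusters]


# Hypothetical toy data: five aligned sequences in two GC-content groups.
seqs = ["AAAAAAAA", "AAAAAAAT", "AAAAAATT", "GGGGCCCC", "GGGGCCCG"]
groups = ["low_GC", "low_GC", "low_GC", "high_GC", "high_GC"]

for cutoff in (0.75, 0.875, 1.0):
    clusters = single_linkage_clusters(seqs, cutoff)
    print(cutoff, clusters, group_abundance(clusters, groups))
```

In this toy case the two clusters found at the lower cutoffs break apart into single sequences at a cutoff of 1.0, which is the kind of branching a dendritic heat map lays out across its cutoff axis.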

Highlights

  • Advances in sequencing technology and -omics research have led to rapid growth in sequencing datasets, and techniques to visualize and analyze the data are struggling to keep up

  • The clustering cutoff range starts at a minimum of 0.75 because the USEARCH manual states that the UCLUST algorithm is effective at identities of ~75% and above for nucleotide sequences; dendritic heat maps in general are not limited to this range and should aim to show as large a clustering cutoff range as possible

  • Dendritic heat maps (DHMs) represent a powerful tool for visualizing correlations in genotype and phenotype changes across evolutionary space and time, and will help decipher dynamic processes in complex, natural communities such as metatranscriptomes, where similarities occur across a multitude of scales


Summary

Introduction

Advances in sequencing technology and -omics research have led to rapid growth in sequencing datasets, and techniques to visualize and analyze the data are struggling to keep up. Tracking changes in relative abundance bin response can be useful for observing the levels at which genotypic divergence (cluster branching) correlates with phenotypic divergence (differing heat map bin response) for a population. To demonstrate this approach, we generate artificial datasets that use simplified mutation and growth processes in biological communities. Clustering and visualization of changes of state are used to track relative abundance bin responses for populations of different nucleotide usage (GC content) as various subpopulations evolve in both datasets. Using these simulated datasets, we discuss the potential of dendritic heat maps (DHMs) to describe data across varying levels of complexity.
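A simplified mutation and growth process of this kind might look like the following Python sketch. It is not the generation procedure used in this work: the substitution-only mutation model, the doubling growth rule, the sequence length, the mutation rate, and the 0.5 GC-content threshold for assigning groups are all assumptions chosen for illustration.

```python
import random

BASES = "ACGT"


def gc_content(seq):
    """Fraction of bases in the sequence that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def mutate(seq, rate, rng):
    """Replace each base, with probability `rate`, by a different base."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def grow(founder, generations, rate, rng):
    """Assumed growth rule: every sequence leaves two mutated offspring per generation."""
    population = [founder]
    for _ in range(generations):
        population = [mutate(s, rate, rng) for s in population for _ in range(2)]
    return population


rng = random.Random(0)  # fixed seed so the run is repeatable
founder = "".join(rng.choice(BASES) for _ in range(60))
population = grow(founder, generations=4, rate=0.02, rng=rng)

# Assumed phenotype: split the population into two groups by GC content.
groups = ["high_GC" if gc_content(s) >= 0.5 else "low_GC" for s in population]
print(len(population), groups.count("high_GC"), groups.count("low_GC"))
```

Because every offspring has the same length as the founder, the resulting sequences are already aligned position by position, which keeps a toy example like this easy to cluster by fractional identity.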

Sequence Generation
Clustering
Alignment
Dendritic Heat Map Construction
Dendritic Heat Map
Bottom-up Hierarchical Clustering
Top-down Hierarchical Clustering
Dendritic Heat Maps of a Growing Population
Application
Findings
Conclusion
