Abstract

Background: Mean shift, an iterative technique for identifying the local maxima of a probability density function, has been successfully used as a clustering method in computer vision and image processing. We apply the mean shift technique to the high-dimensional space of phylogenetic trees. The basic idea is to take a set of sample points and iteratively shift each point in the direction of the gradient of the underlying density function until the points concentrate at the local maxima of the density function and form natural clusters [1]. We have developed software named MSCTrees, based on a variant of the mean shift method called adaptive mean shift [2], to perform cluster analysis on a set of multidimensional data points corresponding to phylogenetic trees.

Highlights

  • The ms_cluster program performs the following steps: 1) calculate the adaptive bandwidth for each data point using the k-nearest-neighbor method; 2) initialize a set of points using the values of the original data points; 3) shift the set of initialized points to new locations based on the mean shift vectors computed at each point; 4) repeat

  • The cluster_trees.pl script maps a phylogenetic tree to a multidimensional data point by using the pairwise distances between the leaves of the tree as the coordinates of the resulting point
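
The tree-to-point mapping described above can be illustrated with a minimal Python sketch. It is not the cluster_trees.pl implementation: the edge-list input format, the function name leaf_distance_vector, the alphabetical ordering of leaf pairs, and the choice of path length (sum of branch lengths) as the leaf-to-leaf distance are all assumptions made for illustration.

```python
from itertools import combinations

def leaf_distance_vector(edges, leaves):
    """Map a tree to a point whose coordinates are pairwise leaf distances.

    edges  : iterable of (node_a, node_b, branch_length) tuples
             (hypothetical input format, not the one used by MSCTrees)
    leaves : list of leaf names; sorted order fixes the coordinate order
    """
    # Build an undirected adjacency list for the tree.
    adj = {}
    for a, b, w in edges:
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))

    def distances_from(start):
        # Iterative depth-first traversal. In a tree each node is reached
        # by exactly one path, so the first distance found is the path length.
        dist = {start: 0.0}
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt, w in adj.get(node, []):
                if nxt not in dist:
                    dist[nxt] = dist[node] + w
                    stack.append(nxt)
        return dist

    ordered = sorted(leaves)
    from_leaf = {leaf: distances_from(leaf) for leaf in ordered}
    # One coordinate per unordered leaf pair: (A,B), (A,C), ...
    return [from_leaf[a][b] for a, b in combinations(ordered, 2)]

# Four-leaf example tree ((A,B),(C,D)) with internal nodes "x" and "y".
edges = [("A", "x", 1.0), ("B", "x", 2.0), ("x", "y", 0.5),
         ("C", "y", 1.5), ("D", "y", 1.0)]
point = leaf_distance_vector(edges, ["A", "B", "C", "D"])
```

A tree with n leaves yields a point with n(n-1)/2 coordinates under this scheme, so the four-leaf example gives a 6-dimensional point. Trees can only be compared this way if they share the same leaf set and the same pair ordering.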

Summary

Introduction

Mean shift, an iterative technique for identifying the local maxima of a probability density function, has been successfully used as a clustering method in computer vision and image processing.

Methods

MSCTrees has two components: a C program called ms_cluster, which implements a clustering algorithm based on the adaptive mean shift method, and a Perl script called cluster_trees.pl, which converts phylogenetic trees to multidimensional data points and calls ms_cluster to perform cluster analysis on the resulting points. The ms_cluster program, written in C for performance, takes a set of multidimensional data points as input and outputs the clusters of the input points together with the cluster centers.
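
To make the input and output of such a clustering step concrete, here is a minimal Python sketch of adaptive mean shift with k-nearest-neighbor bandwidths. It is not the ms_cluster source code. The Gaussian kernel profile, the Euclidean distance, the convergence tolerance, the merge radius used to group converged points, and all function names are assumptions chosen for illustration.

```python
import math

def dist(p, q):
    # Euclidean distance between two points (an assumed metric).
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_bandwidths(points, k):
    # Bandwidth of each point = distance to its k-th nearest neighbour.
    # Assumes no duplicate points, so every bandwidth is positive.
    hs = []
    for i, p in enumerate(points):
        ds = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        hs.append(ds[k - 1])
    return hs

def shift(x, points, hs):
    # One mean shift step: move x to a weighted mean of the data points.
    # Weights use a Gaussian profile scaled by each point's own bandwidth.
    d = len(x)
    num = [0.0] * d
    den = 0.0
    for p, h in zip(points, hs):
        w = math.exp(-0.5 * (dist(x, p) / h) ** 2) / h ** (d + 2)
        den += w
        for t in range(d):
            num[t] += w * p[t]
    return [v / den for v in num]

def ms_cluster_sketch(points, k=2, tol=1e-6, max_iter=500, merge_radius=0.5):
    hs = knn_bandwidths(points, k)
    modes = [list(p) for p in points]      # start from the data points
    for i in range(len(modes)):
        for _ in range(max_iter):          # shift repeatedly until the move is tiny
            new = shift(modes[i], points, hs)
            moved = dist(new, modes[i])
            modes[i] = new
            if moved < tol:
                break
    # Points whose shifted positions end up close together share a cluster.
    centers, labels = [], []
    for m in modes:
        for c_idx, c in enumerate(centers):
            if dist(m, c) < merge_radius:
                labels.append(c_idx)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers

# Two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 5.1), (5.1, 4.8)]
labels, centers = ms_cluster_sketch(points)
```

In this sketch the number of clusters is not given in advance; it follows from how many distinct locations the shifted points settle at. The factor h ** (d + 2) can overflow or underflow in very high dimensions, so a practical implementation would need a more careful weighting than this illustration uses.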
