Abstract
Seriation is a useful statistical method to visualize clusters in a dataset. However, as the data are noisy or unbalanced, visualizing the data structure becomes challenging. To alleviate this limitation, we introduce a novel metric based on common neighborhood to evaluate the degree of sparsity in a dataset. A pile of matrices are derived for different levels of sparsity, and the matrices are permuted by a branch-and-bound algorithm. The matrix with the best block diagonal form is then selected by a compactness criterion. The selected matrix reveals the intrinsic structure of the data by excluding noisy data or outliers. This seriation algorithm is applicable even if the number of clusters is unknown or if the clusters are imbalanced. However, if the metric introduces too much sparsity in the data, the sub-sampled groups of data could be ousted. To resolve this problem, a multi-scale approach combining different levels of sparsity is proposed. The capability of the proposed seriation method is examined both by toy problems and in the context of spike sorting.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.