SimBa

Tamal K Dey, Yusu Wang, Dayu Shi

doi:10.1145/3284360

Abstract

In topological data analysis, point cloud data P extracted from a metric space is often analyzed by computing the persistence diagram or barcode of a sequence of Rips complexes built on P and indexed by a scale parameter. Unfortunately, even for input of moderate size, the Rips complex may become prohibitively large as the scale parameter increases. Starting with the Sparse Rips filtration introduced by Sheehy, several existing methods aim to reduce the size of the complex and thereby improve time efficiency. However, as we demonstrate, existing approaches still fall short of scaling well, especially for high-dimensional data. In this article, we investigate the advantages and limitations of existing approaches. Based on insights gained from the experiments, we propose an efficient new algorithm, called SimBa, for approximating the persistent homology of Rips filtrations with quality guarantees. Our new algorithm leverages a batch-collapse strategy as well as a new Sparse Rips-like filtration. We experiment on a variety of low- and high-dimensional datasets. We show that our strategy yields a significant size reduction and that, in practice, our algorithm for approximating Rips filtration persistence is an order of magnitude faster than existing methods.

Full Text