Distributed Computation of Persistent Homology from Partitioned Big Data

Nicholas O. Malott, Rohit P. Singh, Rishi R. Verma, Philip A. Wilsey

doi:10.1109/cluster48925.2021.00050

Abstract

Topological Data Analysis is a machine learning method that summarizes the topological features of a space. Persistent Homology (PH) identifies these topological features as they persist within a point cloud, where persistence is measured with respect to the connectedness of the point cloud at increasing distances. The utility of PH is apparent in several fields, including bioinformatics, network security, and object classification. However, the memory complexity of PH limits its application to relatively small point clouds for low-dimensional topological feature identification. For this reason, numerous approaches that optimize and approximate PH have been introduced to provide results over large point clouds. One solution, Partitioned Persistent Homology (PPH), has shown favorable approximation on a single node with significant performance improvement. However, the single-node approach is limited by the available system memory, which motivates a distributed approach that provides additional resources, especially memory. This paper studies a distributed version of PPH for use with large point clouds on a high-performance compute cluster. Experimental results comparing the distributed algorithm against previous studies are presented, along with the scalability of the distributed library.

Full Text