Abstract

Statistical topological inference is a branch of algebraic topology that analyzes the global topological properties of the geometric structure underlying a point cloud dataset. There is an increasing need to analyze massive data sets and screen large databases to address real-world problems. A central challenge in modern applied mathematics is to develop tools that simplify high-dimensional data so that the important features and relationships can be extracted during analysis. Topological data analysis (TDA) is a growing field of study at the intersection of algebraic topology, computational geometry, and statistics. This study applies TDA tools to hypothesis testing between two high-dimensional data sets, one of the most important topics in statistical topological inference. We propose a new test built on the nearest-neighbor distance function. Three existing TDA-based tests are discussed: hypothesis testing based on persistent homology, hypothesis testing based on persistence landscapes, and hypothesis testing based on density estimation. We also propose a modification of these existing tests. A Monte Carlo simulation was conducted to compare the power of the tests. Finally, the tests are illustrated with two empirical applications in biology: we demonstrate their efficacy on the Statlog heart disease dataset and the Wisconsin breast cancer dataset.
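
To make the idea of a two-sample test built on nearest-neighbor distances concrete, the following is a minimal sketch and not the test proposed in the study. It assumes a simple statistic (the absolute difference between the mean nearest-neighbor distances of the two point clouds) and a label-permutation scheme to obtain a p-value. The function names `nn_distances`, `nn_statistic`, and `permutation_test`, and the parameters `n_perm` and `seed`, are hypothetical and chosen for illustration only.

```python
import numpy as np


def nn_distances(points):
    """Distance from each point to its nearest neighbor in the same cloud."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    return dist.min(axis=1)


def nn_statistic(x, y):
    """Assumed statistic: gap between the mean nearest-neighbor distances."""
    return abs(nn_distances(x).mean() - nn_distances(y).mean())


def permutation_test(x, y, n_perm=500, seed=0):
    """Permutation p-value for H0: both clouds come from the same distribution."""
    rng = np.random.default_rng(seed)
    observed = nn_statistic(x, y)
    pooled = np.vstack([x, y])
    n = len(x)
    count = 0
    for _ in range(n_perm):
        # Shuffle the group labels and recompute the statistic.
        idx = rng.permutation(len(pooled))
        if nn_statistic(pooled[idx[:n]], pooled[idx[n:]]) >= observed:
            count += 1
    # The +1 terms count the observed labeling as one of the permutations.
    return observed, (count + 1) / (n_perm + 1)
```

Calling `permutation_test(x, y)` on two arrays of shape `(n_points, n_features)` returns the observed statistic and a p-value; a small p-value suggests that the two clouds differ in their local spacing. Repeating the call over many simulated pairs of samples and recording how often the null hypothesis is rejected gives a Monte Carlo estimate of power.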
