Abstract

High-dimensional datasets are becoming more common in a variety of scientific fields. Well-known examples include next-generation sequencing in biology, patient health status in medicine, and computer vision in deep learning. Dimension reduction, using methods like principal component analysis (PCA), is a common preprocessing step for such datasets. However, while dimension reduction can save computing and human resources, it comes at the cost of significant information loss. Topological data analysis (TDA) aims to analyze the “shape” of high-dimensional datasets, without dimension reduction, by extracting features that are robust to small perturbations in data. Persistent features of a dataset can be used to describe it and to compare it to other datasets. Persistent features can be visualized using topological barcodes or persistence diagrams (Figure 1). Application of TDA methods has granted greater insight into high-dimensional data (Lakshmikanth et al., 2017); one prominent example is its use to characterize a clinically relevant subgroup of breast cancer patients (Nicolau, Levine, & Carlsson, 2011). This study is particularly salient because Nicolau et al. (2011) used a topological method, termed Progression Analysis of Disease, to identify a patient subgroup with 100% survival that remains invisible to other clustering methods.

Highlights

  • While dimension reduction can save computing and human resources, it comes with the cost of significant information loss

  • Topological data analysis (TDA) aims to analyze the “shape” of high-dimensional datasets, without dimension reduction, by extracting features that are robust to small perturbations in data

  • The TDAstats R package is a comprehensive pipeline for conducting TDA
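To make the idea of a persistent feature concrete, the following is a minimal Python sketch, not the TDAstats implementation and not its API. It computes only the simplest case: the lifetimes of connected components (dimension 0) as a distance threshold grows over a point cloud. The function name zero_dim_persistence and the toy points are hypothetical, chosen for illustration, and the sketch assumes Euclidean distance.

```python
import math
from itertools import combinations


def zero_dim_persistence(points):
    """Return (birth, death) intervals for connected components (dimension 0).

    Every point starts as its own component at threshold 0. Two components
    merge, and one of them "dies", at the length of the shortest edge that
    joins them. Hypothetical helper for illustration only.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first (assumes Euclidean distance).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    intervals = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j          # merge the two components
            intervals.append((0.0, length))  # one component dies here
    intervals.append((0.0, math.inf))        # the last component never dies
    return intervals


# Toy data: two well-separated clusters of nearby points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
bars = zero_dim_persistence(points)
for birth, death in bars:
    print(f"[{birth:.2f}, {death:.2f})")
```

In this toy example, the short intervals correspond to points merging within a cluster, while the one long finite interval reflects the gap between the two clusters. Slightly nudging the points changes the interval lengths only slightly, which is the sense in which such features are robust to small perturbations. Higher-dimensional features such as loops require a full persistent homology computation, which this sketch does not attempt.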


