Statistical learning on measures: An application to persistence diagrams
- Research Article
- 10.1002/wics.1548
- Feb 4, 2021
- WIREs Computational Statistics
Topological data analysis (TDA) uses information from topological structures in complex data for statistical analysis and learning. This paper discusses persistent homology, a part of computational (algorithmic) topology that converts data into simplicial complexes and elicits information about the persistence of homology classes in the data. It computes and outputs the birth and death of such homology classes via a persistence diagram. Data inputs for persistent homology are usually represented as point clouds or as functions, while the outputs depend on the nature of the analysis and commonly consist of either a persistence diagram or persistence landscapes. This paper gives an introductory-level tutorial on computing these summaries for time series using R, followed by an overview of using these approaches for time series classification and clustering. This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Clustering and Classification; Data: Types and Structure > Time Series, Stochastic Processes, and Functional Data; Applications of Computational Statistics > Computational Mathematics
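The tutorial above works in R, but the core computation behind a 0-dimensional persistence diagram is language-agnostic. As a minimal illustrative sketch (not the paper's code), the following pure-Python function computes the H0 persistence pairs of a Euclidean point cloud under the Vietoris-Rips filtration, using a union-find over edges sorted by length; the function name and structure are this sketch's own choices.

```python
from itertools import combinations
from math import dist

def h0_persistence(points):
    """0-dimensional persistence pairs (birth, death) of the
    Vietoris-Rips filtration over a Euclidean point cloud.
    Every connected component is born at scale 0 and dies at
    the edge length that merges it into an older component."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Process edges in order of increasing length (the filtration order).
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            pairs.append((0.0, length))
    # One component survives to every scale (infinite persistence).
    pairs.append((0.0, float("inf")))
    return pairs
```

For two well-separated pairs of points, the diagram shows two short bars (within-pair merges), one long bar (the merge across the gap), and one infinite bar.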
- Research Article
- 10.3389/fams.2023.1179301
- Jul 6, 2023
- Frontiers in Applied Mathematics and Statistics
Persistent homology (PH) is a robust method to compute multi-dimensional geometric and topological features of a dataset. Because these features are often stable under certain perturbations of the underlying data, are often discriminating, and can be used for visualization of structure in high-dimensional data and in statistical and machine learning modeling, PH has attracted the interest of researchers across scientific disciplines and in many industry applications. However, computational costs may present challenges to effectively using PH in certain data contexts, and theoretical stability results may not hold in practice. In this paper, we define, implement, and investigate a simplicial complex construction for computing persistent homology of Euclidean point cloud data, which we call the Delaunay-Rips complex (DR). By only considering simplices that appear in the Delaunay triangulation of the point cloud and assigning the Vietoris-Rips weights to simplices, DR avoids potentially costly computations in the persistence calculations. We document and compare a Python implementation of DR with other simplicial complex constructions for generating persistence diagrams. By imposing sufficient conditions on point cloud data, we are able to theoretically justify the stability of the persistence diagrams produced using DR. When the Delaunay triangulation of the point cloud changes under perturbations of the points, we prove that DR-produced persistence diagrams exhibit instability. Since we cannot guarantee that real-world data will satisfy our stability conditions, we demonstrate the practical robustness of DR for persistent homology in comparison with other simplicial complexes in machine learning applications. We find in our experiments that using DR in an ML-TDA pipeline performs comparably to using other simplicial complex constructions.
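The key idea of the Delaunay-Rips construction, as the abstract describes it, is to restrict to simplices of the Delaunay triangulation while keeping Vietoris-Rips filtration values. A minimal sketch of that weight assignment follows, assuming the triangulation's top-dimensional simplices have already been computed elsewhere (e.g., with `scipy.spatial.Delaunay`); the function name is illustrative and this is not the authors' implementation.

```python
from itertools import combinations
from math import dist

def delaunay_rips_filtration(points, delaunay_simplices):
    """Assign Vietoris-Rips weights to a Delaunay triangulation:
    each simplex enters the filtration at the largest pairwise
    distance among its vertices.  All faces of each top simplex
    are included, so the result is a filtered simplicial complex
    returned in order of filtration value."""
    filtration = {}
    for top in delaunay_simplices:
        for k in range(1, len(top) + 1):
            for face in combinations(sorted(top), k):
                if face not in filtration:
                    weight = max((dist(points[i], points[j])
                                  for i, j in combinations(face, 2)),
                                 default=0.0)  # vertices enter at 0
                    filtration[face] = weight
    # Sort by weight, breaking ties so faces precede cofaces.
    return sorted(filtration.items(), key=lambda kv: (kv[1], len(kv[0])))
```

Because only Delaunay simplices are kept, the complex has far fewer simplices than the full Vietoris-Rips complex on the same points, which is where the computational savings in the persistence calculation come from.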
- Single Book
- 10.62311/nesx/rb-978-81-997377-2-3
- Dec 30, 2025
Topological AI Engines integrate topological data analysis with statistical inference, machine learning, and scalable data engineering to build learning systems that remain stable under noise, drift, and incomplete measurement. This manuscript develops an academic framework for using persistent structure—captured through filtrations, persistence diagrams, and topological summaries—as an organizing layer for research design, explanatory modeling, predictive generalization, and anomaly discovery. The central claim is that topology contributes not merely a feature family but a discipline of invariants, stability guarantees, and diagnostics that can be operationalized as governance artifacts for trustworthy analytics. Across five chapters, the book links uncertainty modeling to identification logic; formalizes causal explanation pipelines with sensitivity analysis; establishes prediction workflows emphasizing calibration, shift-robust evaluation, and auditing; and translates these methods into reproducible big data lifecycles with MLOps controls and lineage. The final chapter synthesizes sectoral horizons in health, finance, infrastructure, and public administration, outlining policy and industry pathways for accountable deployment. The result is a publisher-ready guide for researchers, practitioners, and policymakers seeking durable decision tools grounded in persistent structure and transparent evaluation.
Keywords: topological data analysis, persistent homology, stability, uncertainty quantification, research design, identification, causal inference, structural modeling, predictive generalization, anomaly discovery, calibration, distribution shift, robustness, reproducibility, MLOps, data lineage, governance, auditability, sectoral policy
- Single Book
- 10.62311/nesx/rb978-81-981466-7-0
- Nov 30, 2024
Abstract: This book presents a rigorous, interdisciplinary investigation into the convergence of Artificial Intelligence (AI) and Topological Data Analysis (TDA) as a transformative framework for modeling and interpreting high-dimensional data structures. It addresses a fundamental challenge in modern data science: traditional statistical and machine learning techniques often struggle to preserve the global geometric and topological properties of complex datasets. By leveraging tools from algebraic topology—such as persistent homology, simplicial complexes, and Betti numbers—TDA enables the extraction of robust, multi-scale topological features from noisy, sparse, and nonlinear data. The book introduces a comprehensive framework in which topological descriptors are integrated into AI pipelines through persistence diagrams, barcodes, and vectorized representations. Methodologies include differentiable TDA layers, topological regularization in deep learning, manifold learning via Mapper and Reeb graphs, and Bayesian inference with topological priors. Applications span across domains including neuroscience, genomics, medical imaging, finance, and computer vision. Empirical results and case studies demonstrate how topology-aware AI models enhance robustness, reduce overfitting, and provide semantically meaningful representations of data. The book concludes by identifying open challenges—such as the scalability and differentiability of topological operations—and outlines a roadmap for future developments in topology-native machine learning. Through this synthesis, the work establishes TDA not only as a diagnostic tool but as a foundational principle for next-generation AI systems in high-dimensional data environments. 
Keywords: Topological Data Analysis, Artificial Intelligence, Persistent Homology, High-Dimensional Data, Simplicial Complexes, Betti Numbers, Manifold Learning, Mapper Algorithm, Reeb Graphs, Dimensionality Reduction, Topological Priors, Differentiable TDA, Topological Regularization, Federated Learning, Bayesian Inference, Explainable AI, Algebraic Topology, Complex Systems, Geometric Machine Learning, Shape-Aware AI
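Among the vectorized representations of persistence diagrams that the abstract mentions for AI pipelines, the persistence landscape is one standard choice. As a minimal illustrative sketch (not from the book), the k-th landscape function evaluates, at each point t, the k-th largest of the "tent" functions max(0, min(t - b, d - t)) over the diagram's (birth, death) pairs; the function name here is this sketch's own.

```python
def persistence_landscape(diagram, k, ts):
    """Evaluate the k-th persistence landscape at sample points ts.

    diagram : list of finite (birth, death) pairs
    k       : landscape level (1 = outermost envelope)
    ts      : grid of evaluation points
    """
    values = []
    for t in ts:
        # Each pair contributes a triangular "tent" peaking at its midpoint.
        tents = sorted((max(0.0, min(t - b, d - t)) for b, d in diagram),
                       reverse=True)
        values.append(tents[k - 1] if k <= len(tents) else 0.0)
    return values
```

Sampling the landscape on a fixed grid turns a variable-size diagram into a fixed-length vector, which is what makes it usable as an input to ordinary statistical and machine learning models.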
- Research Article
- 10.4171/owr/2008/29
- Jun 30, 2009
- Oberwolfach Reports
The workshop was conducted jointly with a workshop in statistical learning theory. There was substantial interaction between the two groups, both formally in terms of talks attended by members of both groups, as well as via informal discussions. The intellectual themes which were presented during the workshop are described below. Sensor nets and engineering applications: In the opening talk R. Ghrist spoke about the topology necessary to develop methods for determining whether intruders have entered a net of sensors, and for counting their number. Ghrist, jointly with V. de Silva and Y. Baryshnikov, has developed techniques based directly on homological calculations as well as on integrals over Euler characteristics which hold promise for implementable algorithms. In order for such algorithms to be maximally useful, one must develop error-insensitive methods, which will require more probabilistic methods to be included within the algebraic topological framework. Combinatorial applications: Several presentations at the workshop elaborated on the subject of combinatorial algebraic topology. D. Kozlov gave a survey talk that set the accents of the subject, tying together structures, methods, and applications as they stand at the current state of development. Talks by R. Jardine and M. Raussen concerned the combinatorial and computational aspects of homotopy theory, finding applications of such abstract notions as Quillen's closed model category. K. Knudson gave an interesting account of connections between persistent homology and discrete Morse theory. Finally, the talk of E. Babson dealt with more probabilistic aspects and served as a bridge to the presentations of M. Kahle and P. Bubenik. Dynamical systems: K. Mischaikow and S. Day spoke about the use of algebraic topology to understand the qualitative structure of dynamical systems. Mischaikow introduced his paradigm of building databases of dynamical systems based on choices of parameter values.
His methods permit the construction of partitions of parameter space within which the qualitative structure remains the same. In addition, Conley index methods, or rather their computational versions, are used to prove the existence of fixed points, recurrent points, and invariant subsets within a given region in a spatial domain. Data analysis: G. Carlsson and V. de Silva spoke about applications of various kinds of diagrams to understand the qualitative geometric nature of data sets. For example, persistence diagrams allow one to recover Betti numbers of sublevel sets of a probability distribution, multidimensional persistence allows one to study sublevel sets of various functions as well, and the analysis of structure theorems for certain kinds of quivers permits one to extend bootstrap methods to clustering and Betti numbers, as well as to perform dynamic clustering (i.e., clustering over time). There are now viable computational methods for all of these applications. Probabilistic methods: M. Kahle and P. Bubenik spoke about the beginnings of stochastic algebraic topology. Work at the level of zeroth Betti numbers has already been carried out by M. Penrose, under the heading of “geometric random graphs”. What is now needed is an extension of this work to higher-dimensional homology groups, as well as to the barcodes which arise in persistent homology. Ultimately, precise results along these lines will open up the possibility of direct evaluation of the significance of various qualitative observations given a null hypothesis. There were also several talks more centered on applications, such as vision recognition (J. Giesen) and materials science (R. MacPherson). All things considered, the workshop was a great success in terms of scientific interaction, both within this group and with the researchers in statistical learning theory, as was witnessed by many involved discussions, which often lasted well into the late evenings.