Efficient persistence landscape generation
Topological summary tools such as persistence landscapes have greatly enhanced the practical use of topological data analysis for large-scale, noisy, and complex datasets. A central step in using persistence landscapes is computing the top-k landscapes. This article presents a novel output-sensitive plane sweep algorithm that computes the top-k persistence landscapes in optimal time and space, significantly outperforming previous algorithms. The algorithm can also determine in optimal O(n log n) time whether a given birth-death pair appears in the top-k landscapes. We evaluate the runtime performance of the approach on a botnet dataset and several synthetically generated point cloud topologies, showing that its algorithmic design can yield significant speedups on these datasets. Without filtering, performance ranges from slightly worse (in some extreme examples) to equal compared with previous work, while returning exactly the same output; with filtering, the algorithm is significantly faster (15x when 75% of birth-death pairs are removed). Filtering is shown to maintain machine learning performance on both synthetically generated and real-world datasets while providing orders-of-magnitude speedups, depending on how aggressively it is applied. Because of the algorithm's design, the speedup is greater when filtering with the introduced birth-death filtering algorithm. The software is freely available online in Rust with Python bindings.
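For readers new to the objects involved, the sketch below evaluates the top-k landscapes by brute force on a sampling grid, using the standard tent-function definition: lambda_k(t) is the k-th largest value of max(0, min(t - b, d - t)) over all birth-death pairs (b, d). It is a naive O(g · n log n) baseline for g grid points, given only for illustration; it is not the output-sensitive plane sweep described above. The membership check `appears_in_top_k` is a hypothetical grid-based helper and can miss a pair whose tent reaches the top k only between grid points.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]  # (birth, death), assumed finite


def tent(pair: Pair, t: float) -> float:
    """Tent function of one birth-death pair: max(0, min(t - b, d - t))."""
    b, d = pair
    return max(0.0, min(t - b, d - t))


def top_k_landscapes(pairs: Sequence[Pair], k: int, grid: Sequence[float]) -> List[List[float]]:
    """Sample lambda_1..lambda_k on a grid by sorting all tent values at each t.

    Costs O(len(grid) * n log n): a naive baseline, not an output-sensitive sweep.
    """
    rows = [[0.0] * len(grid) for _ in range(k)]
    for i, t in enumerate(grid):
        values = sorted((tent(p, t) for p in pairs), reverse=True)
        for j, v in enumerate(values[:k]):
            rows[j][i] = v  # lambda_{j+1}(t) is the (j+1)-th largest tent value
    return rows


def appears_in_top_k(pairs: Sequence[Pair], index: int, k: int, grid: Sequence[float]) -> bool:
    """Grid check: is pair `index` among the k largest positive tents at some grid point?

    Ties count in the pair's favor (only strictly larger tents are counted).
    """
    for t in grid:
        v = tent(pairs[index], t)
        if v <= 0.0:
            continue
        larger = sum(1 for j, p in enumerate(pairs) if j != index and tent(p, t) > v)
        if larger < k:
            return True
    return False
```

For example, with pairs `[(0.0, 4.0), (1.0, 3.0), (1.5, 2.5)]` the innermost pair only ever contributes to lambda_3, so it appears in the top-3 landscapes but not in the top-2.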
- Research Article
192
- 10.1016/j.physa.2017.09.028
- Oct 9, 2017
- Physica A: Statistical Mechanics and its Applications
Topological data analysis of financial time series: Landscapes of crashes
- Research Article
9
- 10.2139/ssrn.2931836
- Mar 13, 2017
- SSRN Electronic Journal
We explore the evolution of daily returns of four major US stock market indices during the technology crash of 2000 and the financial crisis of 2007-2009. Our methodology is based on topological data analysis (TDA). We use persistent homology to detect and quantify topological patterns that appear in multidimensional time series. Using a sliding window, we extract time-dependent point cloud data sets, to which we associate a topological space. We detect transient loops that appear in this space, and we measure their persistence. This is encoded in real-valued functions referred to as 'persistence landscapes'. We quantify the temporal changes in persistence landscapes via their Lp-norms. We test this procedure on multidimensional time series generated by various non-linear and non-equilibrium models. We find that, in the vicinity of financial meltdowns, the Lp-norms exhibit strong growth prior to the primary peak, which ascends during a crash. Remarkably, the average spectral density at low frequencies of the time series of Lp-norms of the persistence landscapes demonstrates a strong rising trend for 250 trading days prior to either the dotcom crash on 03/10/2000 or the Lehman bankruptcy on 09/15/2008. Our study suggests that TDA provides a new type of econometric analysis, which goes beyond the standard statistical measures. The method can be used to detect early warning signals of imminent market crashes. We believe that this approach can be used beyond the analysis of financial time series presented here.
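The Lp-norm of the landscapes, the quantity tracked over time in this study, can be approximated on a sampling grid. The sketch below is a minimal illustration under assumed simplifications (finite birth-death pairs, all landscapes included, trapezoidal integration of |lambda_k|^p on a grid that covers every interval); it is not the authors' code, and for p > 1 the trapezoidal rule only approximates the exact integral.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]  # (birth, death), assumed finite


def landscapes_on_grid(pairs: Sequence[Pair], k: int, grid: Sequence[float]) -> List[List[float]]:
    """Return lambda_1..lambda_k sampled at each grid point (brute force)."""
    rows = [[0.0] * len(grid) for _ in range(k)]
    for i, t in enumerate(grid):
        # Tent value of every pair at t, largest first.
        values = sorted((max(0.0, min(t - b, d - t)) for b, d in pairs), reverse=True)
        for j, v in enumerate(values[:k]):
            rows[j][i] = v
    return rows


def landscape_lp_norm(pairs: Sequence[Pair], grid: Sequence[float], p: float = 1.0) -> float:
    """Approximate (sum_k integral |lambda_k(t)|^p dt)^(1/p) with the trapezoidal rule.

    All n landscapes are included (k = n); the grid should cover every [birth, death].
    """
    total = 0.0
    for lam in landscapes_on_grid(pairs, len(pairs), grid):
        for i in range(len(grid) - 1):
            h = grid[i + 1] - grid[i]
            total += 0.5 * h * (abs(lam[i]) ** p + abs(lam[i + 1]) ** p)
    return total ** (1.0 / p)
```

In a sliding-window analysis like the one described above, one would compute a persistence diagram for each window's point cloud (for example with a TDA library) and apply this function per window, giving a time series of norms.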
- Research Article
147
- 10.1016/j.jsc.2016.03.009
- Mar 26, 2016
- Journal of Symbolic Computation
A persistence landscapes toolbox for topological statistics
- Research Article
4
- 10.18287/2412-6179-co-1190
- Jun 1, 2023
- Computer Optics
Traditional methods of algebraic topology for obtaining information about the shape of an object are limited by the small amount of information they produce, namely Betti numbers and Euler characteristics. The central tool of topological data analysis is the persistent homology method, which summarizes the geometric and topological information in the data using persistence diagrams and barcodes. Persistent homology methods make it possible to analyze topological data to obtain information about the shape of an object. However, persistence barcodes and persistence diagrams do not themselves form a Hilbert space with an inner product. Applying topological data analysis methods therefore relies on mapping persistence diagrams into a Hilbert space; one such mapping is the persistence landscape. It has the advantage of being invertible, so it loses no information, and it has persistence properties. The paper considers mathematical models and functions for representing persistence landscape objects based on the persistent homology method. Methods for converting persistence barcodes and persistence diagrams into persistence landscape functions are considered. Associated with persistence landscape functions is a persistence landscape kernel that forms a mapping into a Hilbert space with an inner product. A formula is proposed for the distance between persistence landscapes, which allows the distance between images of objects to be computed. The persistence landscape functions map persistence diagrams into a Hilbert space. Examples of computing the distance between images from the persistence landscape functions of these images are given. Representations of topological characteristics in various models of computational topology are considered.
Results for one-parameter persistence modules are extended to multi-parameter persistence modules.
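For reference, the commonly used definitions are sketched below: each birth-death pair (b_i, d_i) gives a tent function f, lambda_k is its k-th largest value over all pairs (written kmax), and the Lp distance and the p = 2 inner product (the landscape kernel) sum over all k. These are the standard definitions, not necessarily the specific distance formula proposed in the paper.

```latex
\begin{aligned}
f_{(b,d)}(t) &= \max\bigl(0,\ \min(t-b,\ d-t)\bigr),\\
\lambda_k(t) &= \operatorname{kmax}_{1 \le i \le n} f_{(b_i,d_i)}(t) \quad (\text{zero if fewer than } k \text{ pairs}),\\
\|\lambda - \lambda'\|_p &= \Bigl(\sum_{k \ge 1} \int_{\mathbb{R}} \lvert \lambda_k(t) - \lambda'_k(t) \rvert^p \, dt\Bigr)^{1/p},\\
\langle \lambda, \lambda' \rangle &= \sum_{k \ge 1} \int_{\mathbb{R}} \lambda_k(t)\, \lambda'_k(t)\, dt .
\end{aligned}
```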
- Book Chapter
- 10.1007/978-3-030-42196-0_18
- Jan 1, 2020
Computational topologists recently developed a method called persistent homology to analyze data presented in terms of similarity or dissimilarity. Persistent homology studies the evolution of topological features in terms of a single index and can capture higher-order features beyond the usual clustering techniques. There are three descriptive statistics of persistent homology: the barcode, the persistence diagram, and, more recently, the persistence landscape. The persistence landscape is useful for statistical inference because it belongs to a space of $p$-integrable functions, a separable Banach space. We apply tools from both computational topology and statistics to DNA sequences taken from Clostridioides difficile-infected patients treated with an experimental fecal microbiota transplantation. Our statistical and topological analyses detect interesting patterns among patients and donors. They also provide visualizations of the DNA sequences in the form of clusters and loops.
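Because landscapes live in a vector space of p-integrable functions, averages are well defined, which is part of what makes them convenient for statistical inference. A minimal sketch, assuming finite diagrams sampled on one shared grid (a discretization chosen here for illustration, not the chapter's pipeline), computes the pointwise mean landscape of a sample of diagrams.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]  # (birth, death), assumed finite


def landscapes_on_grid(pairs: Sequence[Pair], k: int, grid: Sequence[float]) -> List[List[float]]:
    """Return lambda_1..lambda_k sampled at each grid point (brute force)."""
    rows = [[0.0] * len(grid) for _ in range(k)]
    for i, t in enumerate(grid):
        values = sorted((max(0.0, min(t - b, d - t)) for b, d in pairs), reverse=True)
        for j, v in enumerate(values[:k]):
            rows[j][i] = v
    return rows


def mean_landscape(diagrams: Sequence[Sequence[Pair]], k: int, grid: Sequence[float]) -> List[List[float]]:
    """Pointwise average of lambda_1..lambda_k over a sample of diagrams."""
    if not diagrams:
        raise ValueError("need at least one diagram")
    sums = [[0.0] * len(grid) for _ in range(k)]
    for pairs in diagrams:
        for j, lam in enumerate(landscapes_on_grid(pairs, k, grid)):
            for i, v in enumerate(lam):
                sums[j][i] += v
    return [[v / len(diagrams) for v in row] for row in sums]
```

The mean of the diagrams `[(0, 2)]` and `[(0, 4)]`, for instance, is itself a valid landscape-valued summary even though no single diagram produces it, which is not true of averaging persistence diagrams directly.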
- Research Article
7
- 10.1016/j.cnsns.2021.105996
- Aug 14, 2021
- Communications in Nonlinear Science and Numerical Simulation
Topological features of multivariate distributions: Dependency on the covariance matrix
- Research Article
- 10.2139/ssrn.3833492
- Jan 1, 2021
- SSRN Electronic Journal
Topological data analysis provides a new perspective on many problems in the domain of complex systems. Here, we establish how the mean value of functional $p$-norms of 'persistence landscapes' depends on a uniform scaling of the underlying multivariate distribution. Furthermore, we demonstrate that the average value of $p$-norms decreases as the covariance in a system increases. To illustrate the complex dependency of these topological features on changes in the covariance matrix, we conduct numerical experiments using bivariate distributions with known statistical properties. Our results help explain the puzzling behavior of $p$-norms derived from daily log-returns of major equity indices on European and US markets at the onset of the global financial meltdown caused by the COVID-19 pandemic.
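A short derivation shows how each landscape's $p$-norm responds to uniform scaling, under the assumption that scaling the distribution by $c > 0$ multiplies every birth and death value by $c$ (as happens for distance-based filtrations such as Vietoris–Rips). It is offered as an illustration and may not match the exact form of the result established in the paper.

```latex
\begin{aligned}
f_{(cb,\,cd)}(t) &= \max\bigl(0,\ \min(t - cb,\ cd - t)\bigr) = c\, f_{(b,d)}(t/c),\\
\lambda^{(c)}_k(t) &= c\, \lambda_k(t/c) \qquad (\text{scaling by } c > 0 \text{ preserves the ordering of tent values}),\\
\bigl\|\lambda^{(c)}_k\bigr\|_p^p &= \int_{\mathbb{R}} c^p\, \lambda_k(t/c)^p \, dt = c^{p+1} \int_{\mathbb{R}} \lambda_k(s)^p \, ds \qquad (s = t/c),\\
\bigl\|\lambda^{(c)}_k\bigr\|_p &= c^{1 + 1/p}\, \|\lambda_k\|_p .
\end{aligned}
```

Taking expectations, the mean $p$-norm scales by the same factor $c^{1+1/p}$; for $p = \infty$ the factor is simply $c$.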
- Research Article
11
- 10.1016/j.jneumeth.2021.109324
- Aug 21, 2021
- Journal of Neuroscience Methods
Topological signal processing and inference of event-related potential response
- Research Article
2
- 10.2478/jdis-2024-0014
- Apr 1, 2024
- Journal of Data and Information Science
Purpose: Many science, technology and innovation (STI) resources are annotated with several different labels. To automatically assign these labels to an instance of interest, many approaches that perform well on benchmark datasets have been proposed in the literature for the multi-label classification task, and several open-source tools implementing them have been developed. However, the characteristics of real-world multi-label patent and publication datasets do not fully match those of benchmark datasets. The main purpose of this paper is therefore to comprehensively evaluate seven multi-label classification methods on real-world datasets. Design/methodology/approach: Three real-world datasets (Biological-Sciences, Health-Sciences, and USPTO) are constructed from the SciGraph and USPTO databases. Seven multi-label classification methods with tuned parameters (dependency-LDA, MLkNN, LabelPowerset, RAkEL, TextCNN, TextRNN, and TextRCNN) are comprehensively compared on these three real-world datasets. Performance is evaluated with three classification-based metrics: Macro-F1, Micro-F1, and Hamming Loss. Findings: The TextCNN and TextRCNN models are clearly superior in Macro-F1, Micro-F1, and Hamming Loss on small-scale datasets with a more complex hierarchical label structure and a more balanced document-label distribution. The MLkNN method works better on the larger-scale dataset with a more unbalanced document-label distribution. Research limitations: The three real-world datasets differ in statement, data quality, and purpose. In addition, open-source tools for multi-label classification differ intrinsically in their approaches to data processing and feature selection, which in turn affects the performance of a multi-label classification approach.
In the near future, we will improve experimental precision and reinforce the validity of the conclusions by controlling variables more rigorously through expanded parameter settings. Practical implications: The Macro-F1 and Micro-F1 scores observed on real-world datasets typically fall short of those achieved on benchmark datasets, underscoring the complexity of real-world multi-label classification tasks. Approaches based on deep learning offer promising solutions by accommodating the hierarchical relationships and interdependencies among labels. With ongoing improvements in deep learning algorithms and large-scale models, the efficacy of multi-label classification is expected to improve significantly, reaching practical utility in the foreseeable future. Originality/value: (1) Seven multi-label classification methods are comprehensively compared on three real-world datasets. (2) The TextCNN and TextRCNN models perform better on small-scale datasets with a more complex hierarchical label structure and a more balanced document-label distribution. (3) The MLkNN method works better on the larger-scale dataset with a more unbalanced document-label distribution.
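The three evaluation metrics can be sketched as follows, representing each document's labels as a set. This is a minimal illustration of the standard definitions, not the evaluation code used in the study; treating an undefined per-label F1 (a label that never occurs in either truth or prediction) as 0 is one convention among several, and it is what makes macro-F1 drop below micro-F1 in the usage example.

```python
from typing import Sequence, Set, Tuple


def hamming_loss(y_true: Sequence[Set[str]], y_pred: Sequence[Set[str]], labels: Sequence[str]) -> float:
    """Fraction of (document, label) decisions that are wrong."""
    wrong = sum(len(t ^ p) for t, p in zip(y_true, y_pred))  # symmetric difference per document
    return wrong / (len(y_true) * len(labels))


def _counts(y_true, y_pred, label) -> Tuple[int, int, int]:
    """True positives, false positives, false negatives for one label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if label in t and label in p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if label not in t and label in p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if label in t and label not in p)
    return tp, fp, fn


def _f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # convention: 0.0 when undefined


def micro_f1(y_true, y_pred, labels) -> float:
    """Pool counts over all labels, then compute a single F1."""
    tp = fp = fn = 0
    for label in labels:
        a, b, c = _counts(y_true, y_pred, label)
        tp, fp, fn = tp + a, fp + b, fn + c
    return _f1(tp, fp, fn)


def macro_f1(y_true, y_pred, labels) -> float:
    """Unweighted mean of per-label F1 scores."""
    return sum(_f1(*_counts(y_true, y_pred, label)) for label in labels) / len(labels)
```

With `y_true = [{"a", "b"}, {"c"}]` and `y_pred = [{"a"}, {"b", "c"}]`, adding an unused label `"d"` leaves micro-F1 at 2/3 but lowers macro-F1 from 2/3 to 0.5, which is why the two metrics can rank methods differently on unbalanced label distributions.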
- Research Article
- 10.1109/tpami.2024.3451328
- Dec 1, 2024
- IEEE Transactions on Pattern Analysis and Machine Intelligence
Topological data analysis provides a set of tools to uncover low-dimensional structure in noisy point clouds. Prominent among these tools is persistent homology, which summarizes the birth-death times of homological features using data objects known as persistence diagrams. To better aid statistical analysis, a functional representation of the diagrams, known as persistence landscapes, enables the use of functional data analysis and machine learning tools. Topological and geometric variabilities inherent in point clouds are confounded in both persistence diagrams and landscapes, and it is important to distinguish topological signal from noise to draw reliable conclusions about the structure of the point clouds when using persistent homology. We develop a framework for decomposing variability in persistence diagrams into topological signal and topological noise through alignment of persistence landscapes using an elastic Riemannian metric. Aligned landscapes (amplitude) isolate the topological signal. Reparameterizations used for landscape alignment (phase) are linked to a resolution parameter used to generate persistence diagrams, and capture topological noise in the form of geometric, global scaling and sampling variabilities. We illustrate the importance of decoupling topological signal and topological noise in persistence diagrams (landscapes) using several simulated examples. We also demonstrate that our approach provides novel insights in two real data studies.
- Research Article
29
- 10.1016/j.physa.2021.125774
- Jan 27, 2021
- Physica A: Statistical Mechanics and its Applications
Analysis of global stock markets’ connections with emphasis on the impact of COVID-19
- Research Article
1
- 10.1016/j.cviu.2021.103277
- Sep 13, 2021
- Computer Vision and Image Understanding
Mutual calibration training: Training deep neural networks with noisy labels using dual-models
- Research Article
- 10.1101/2025.07.24.666637
- Jul 29, 2025
- bioRxiv
Background: As the availability of single-cell RNA sequencing (scRNA-seq) data expands, there is a growing need for robust methods that enable integration and comparison across diverse biological conditions and experimental protocols. Persistent homology (PH), a technique from topological data analysis (TDA), provides a deformation-invariant framework for capturing structural patterns in high-dimensional data. Methods: In this study, PH was applied to a diverse collection of scRNA-seq datasets spanning eight tissue types to investigate how data integration affects the topological features and biological interpretability of the resulting representations. Clustering was performed based on PH-derived pairwise distances, and global topological structure was assessed through Betti curves, Euler characteristics, and persistence landscapes. By comparing these summaries across raw, normalized, and integrated datasets, we examined whether integration enhances the detection of biologically meaningful patterns, or, conversely, obscures fine-scale structure. Results: This approach demonstrates that PH can serve as a powerful complementary strategy for evaluating the impact of integration and reveals how topological summaries can help disentangle biological signal from batch-related noise in single-cell data. This work establishes a framework for using topological methods to assess integration quality and highlights new avenues for interpreting complex transcriptomic landscapes beyond conventional clustering.
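Two of the global summaries mentioned, Betti curves and the Euler characteristic curve, can be read directly off persistence diagrams. A minimal sketch, assuming half-open intervals [birth, death) and diagrams supplied per homology dimension (both illustrative assumptions, not details from the study):

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[float, float]  # (birth, death); death may be float("inf")


def betti_curve(pairs: Sequence[Pair], grid: Sequence[float]) -> List[int]:
    """beta(t): number of half-open intervals [birth, death) containing t."""
    return [sum(1 for b, d in pairs if b <= t < d) for t in grid]


def euler_curve(diagrams: Dict[int, Sequence[Pair]], grid: Sequence[float]) -> List[int]:
    """chi(t) = sum over dimensions q of (-1)**q * beta_q(t).

    Assumes `diagrams` holds every homology dimension of the filtered complex;
    otherwise the alternating sum is only a truncated approximation of chi.
    """
    chi = [0] * len(grid)
    for q, pairs in diagrams.items():
        sign = -1 if q % 2 else 1
        for i, beta in enumerate(betti_curve(pairs, grid)):
            chi[i] += sign * beta
    return chi
```

Unlike landscapes, these curves only count features and discard how long each one persists, which is one reason studies such as this one report them alongside landscapes.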
- Research Article
1
- 10.1093/biomet/asab022
- Nov 15, 2021
- Biometrika
Summary: Garside et al. (2021) use event history methods to analyse topological data. We provide additional background on persistent homology to contrast the hazard estimators used in Garside et al. (2021) with standard approaches in topological data analysis. In particular, Garside et al.'s approach is a local method, which has advantages and disadvantages, whereas homology is global. We also provide more details on persistence landscapes and show how a more complete use of this statistic improves its performance.
- Research Article
31
- 10.1088/2632-072x/abb4c6
- May 19, 2021
- Journal of Physics: Complexity
We use methods from computational algebraic topology to study functional brain networks in which nodes represent brain regions and weighted edges encode the similarity of functional magnetic resonance imaging (fMRI) time series from each region. With these tools, which allow one to characterize topological invariants such as loops in high-dimensional data, we are able to gain understanding of low-dimensional structures in networks in a way that complements traditional approaches that are based on pairwise interactions. In the present paper, we use persistent homology to analyze networks that we construct from task-based fMRI data from schizophrenia patients, healthy controls, and healthy siblings of schizophrenia patients. We thereby explore the persistence of topological structures such as loops at different scales in these networks. We use persistence landscapes and persistence images to represent the output of our persistent-homology calculations, and we study the persistence landscapes and persistence images using k-means clustering and community detection. Based on our analysis of persistence landscapes, we find that the members of the sibling cohort have topological features (specifically, their one-dimensional loops) that are distinct from the other two cohorts. From the persistence images, we are able to distinguish all three subject groups and to determine the brain regions in the loops (with four or more edges) that allow us to make these distinctions.
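Persistence images, the second representation used here, follow a general recipe: map each pair to (birth, persistence), weight it by persistence, and smooth with a Gaussian. The sketch below simplifies by sampling the smoothed surface at pixel centres instead of integrating over each pixel; the linear weight, `sigma`, ranges, and resolution are illustrative assumptions, not the settings used in the study.

```python
import math
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]  # (birth, death), assumed finite


def persistence_image(
    pairs: Sequence[Pair],
    sigma: float,
    birth_range: Tuple[float, float],
    pers_range: Tuple[float, float],
    resolution: Tuple[int, int],
) -> List[List[float]]:
    """Sample a persistence-weighted Gaussian surface at pixel centres.

    Rows index persistence (death - birth), columns index birth.
    """
    (b_lo, b_hi), (p_lo, p_hi) = birth_range, pers_range
    nx, ny = resolution
    # Map each pair to (birth, persistence) and weight linearly by persistence.
    max_pers = max((d - b for b, d in pairs), default=0.0)
    points = [(b, d - b, (d - b) / max_pers if max_pers > 0 else 0.0) for b, d in pairs]
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    image = [[0.0] * nx for _ in range(ny)]
    for row in range(ny):
        y = p_lo + (row + 0.5) * (p_hi - p_lo) / ny
        for col in range(nx):
            x = b_lo + (col + 0.5) * (b_hi - b_lo) / nx
            image[row][col] = sum(
                (w * norm * math.exp(-((x - bx) ** 2 + (y - py) ** 2) / (2.0 * sigma ** 2))
                 for bx, py, w in points),
                0.0,
            )
    return image
```

Flattening each image into a fixed-length vector is what allows standard tools such as k-means to be applied across subjects.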