Role of Mixup in Topological Persistence Based Knowledge Distillation for Wearable Sensor Data.
The analysis of wearable sensor data has enabled many successes in several applications. To represent the high-sampling-rate time series with sufficient detail, the use of topological data analysis (TDA) has been considered, and it is found that TDA can complement other time-series features. Nonetheless, because extracting topological features through TDA is time-consuming and computationally expensive, it is difficult to deploy topological knowledge in machine learning and various applications. To tackle this problem, knowledge distillation (KD) can be adopted, a technique that facilitates model compression and transfer learning by transferring knowledge from a larger network to a smaller model. By leveraging multiple teachers in KD, both time-series and topological features can be transferred, and finally, a superior student using only time-series data is distilled. On the other hand, mixup has been popularly used as a robust data augmentation technique to enhance model performance during training. Mixup and KD employ similar learning strategies. In KD, the student model learns from the smoothed distribution generated by the teacher model, while mixup creates smoothed labels by blending two labels. Hence, this common smoothness is the connecting link between the two methods. Although the interplay between mixup and KD has been widely studied, most of that work focuses on image-based analysis, and it remains to be understood how mixup behaves in KD that incorporates multimodal data, such as time-series and topological knowledge from wearable sensor data. In this paper, we analyze the role of mixup in KD with time-series as well as topological persistence, employing multiple teachers. We present a comprehensive analysis of various methods in KD and mixup, supported by empirical results on wearable sensor data. 
We observe that applying mixup to training a student in KD improves performance. We suggest a general set of recommendations to obtain an enhanced student.
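The "smoothness" shared by KD and mixup can be illustrated with a small sketch. It compares the entropy of a hard one-hot label, a mixup-blended label, and a teacher distribution softened by a temperature. The logits, the mixing coefficient 0.7, and the temperature values are illustrative assumptions, not values from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a larger temperature gives a flatter output."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

hard_label = [1.0, 0.0, 0.0]       # ordinary one-hot target
mixup_label = [0.7, 0.3, 0.0]      # two one-hot labels blended with an assumed coefficient of 0.7
teacher_logits = [4.0, 1.0, 0.5]   # hypothetical teacher output

teacher_t1 = softmax(teacher_logits, temperature=1.0)
teacher_t4 = softmax(teacher_logits, temperature=4.0)

for name, dist in [("hard label", hard_label), ("mixup label", mixup_label),
                   ("teacher, T=1", teacher_t1), ("teacher, T=4", teacher_t4)]:
    print(f"{name}: entropy = {entropy(dist):.3f}")
```

Both the blended label and the temperature-softened teacher output have higher entropy than the hard label, which is the sense in which the two techniques supply "smoothed" supervision.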
- Research Article
1
- 10.1140/epjds/s13688-024-00512-y
- Dec 20, 2024
- EPJ Data Science
Topological data analysis (TDA) has shown great success in various applications involving wearable sensor data. However, there are difficulties in leveraging topological features in machine learning and wearable sensors because of the large time consumption and computational resources required to extract the features. To address this problem, knowledge distillation (KD) is utilized to generate a small model and accommodate topological features with persistence image (PI) representations from the raw time series data. Deploying topological knowledge in KD enables the student to achieve better performance compared to the one trained solely on raw time series data. However, it is not yet known if there are coherent characteristics for topological features in PI, which can aid in improving the performance during KD. In this paper, we investigate the suitability and challenges of utilizing topological features in KD for wearable sensor data, thereby contributing to the advancement of the field. Our study explores the impact of transferred topological features by comparing the Teacher-to-Student framework with Multiple Teachers-to-Student where teachers utilize both time series data and persistence images obtained by TDA as inputs. Additionally, we conduct a rigorous examination of topological knowledge effects by testing under various corruptions, knowledge types, and learning strategies in the context of human activity recognition tasks. Our analysis of topological features in KD presents the optimal strategy for incorporating these features. This study includes datasets of varying scales, window lengths, and activity classes, providing a comprehensive evaluation. Our results demonstrate that leveraging topological features in KD enhances performance across datasets.
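To make the persistence image (PI) representation concrete, the following is a minimal sketch of rasterizing a persistence diagram into a grid. It assumes the common construction in which each (birth, death) pair is mapped to (birth, persistence), weighted linearly by its persistence, and spread with a Gaussian. The function name, grid resolution, and kernel width are hypothetical choices, and the Gaussian normalization constant is omitted; this is not the authors' pipeline.

```python
import math

def persistence_image(diagram, resolution=8, sigma=0.1, extent=1.0):
    """Rasterize (birth, death) pairs into a resolution x resolution grid.

    Rows index persistence (death - birth), columns index birth.
    Each pair is weighted by its persistence, so short-lived pairs
    contribute little to the image.
    """
    step = extent / resolution
    centers = [(i + 0.5) * step for i in range(resolution)]
    image = [[0.0] * resolution for _ in range(resolution)]
    for birth, death in diagram:
        pers = death - birth
        if pers <= 0:
            continue  # skip degenerate pairs
        for r, py in enumerate(centers):       # persistence axis
            for c, px in enumerate(centers):   # birth axis
                d2 = (px - birth) ** 2 + (py - pers) ** 2
                image[r][c] += pers * math.exp(-d2 / (2 * sigma ** 2))
    return image

# Hypothetical diagram: one long-lived feature and one short-lived feature.
diagram = [(0.1, 0.9), (0.4, 0.45)]
image = persistence_image(diagram)
```

The resulting fixed-size grid can be fed to a standard 2D network, which is why a PI is a convenient input for a teacher model.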
- Research Article
1
- 10.1109/tnnls.2025.3640274
- Jan 1, 2025
- IEEE transactions on neural networks and learning systems
Wearable sensors have found numerous applications in health and wellness promotion and have achieved great success leveraging advancements in deep learning. However, the development of robust models continues to be hindered by issues related to sensor noise, inconsistent sampling rates, and individual differences. Topological data analysis (TDA) has emerged as a viable solution to extract robust features from such time-series data by converting them into persistence images (PIs), which capture intrinsic characteristics and demonstrate resilience to noise and signal variations. However, the computational costs of TDA pose significant challenges for small devices with limited resources. To more efficiently incorporate topological features, we utilize knowledge distillation (KD), which is a promising way to generate a smaller model using larger models. Multiple teachers can be adopted to enrich features in KD. However, this approach has presented two key challenges: 1) differences in feature dimensions from multimodal data and 2) conflicting knowledge provided by the different teachers, both of which can degrade the student model's performance. To address these issues, we propose a novel KD framework called multimodal global latent workspace-based KD (mGLW-KD) that is motivated by global workspace theory (GWT) from cognitive neuroscience. GWT models how the brain integrates and distributes relevant information across different neural modules through a shared workspace, and it includes attentional control and working memory to prioritize and retain key information. Inspired by this theory, mGLW-KD incorporates a working memory module to unify diverse knowledge from multiple teacher models into a shared latent workspace, facilitating efficient knowledge transfer to the student model. 
By integrating topological insights with cognitive principles, mGLW-KD addresses the challenges posed by wearable sensor data and enables the student model to achieve superior performance using only time-series input during inference.
- Research Article
6
- 10.1109/ieeeconf56349.2022.10052019
- Oct 31, 2022
- Conference record. Asilomar Conference on Signals, Systems & Computers
Converting wearable sensor data to actionable health insights has attracted great interest in recent years. Deep learning methods have achieved many successes in various applications involving wearables. However, wearable sensor data has unique issues related to sensitivity and variability between subjects, and dependency on sampling rate for analysis. To mitigate these issues, a different type of analysis using topological data analysis has shown promise as well. Topological data analysis (TDA) captures robust features, such as persistence images (PI), in complex data through the persistent homology algorithm, which holds the promise of boosting machine learning performance. However, because of the computational load required by TDA methods for large-scale data, integration and implementation have lagged behind. Further, many applications involving wearables require models to be compact enough to allow deployment on edge devices. In this context, knowledge distillation (KD) has been widely applied to generate a small model (student model), using a pre-trained high-capacity network (teacher model). In this paper, we propose a new KD strategy using two teacher models: one that uses the raw time-series and another that uses persistence images from the time-series. These two teachers then train a student using KD. In essence, the student learns from heterogeneous teachers providing different knowledge. To consider different properties in features from teachers, we apply an annealing strategy and adaptive temperature in KD. Finally, a robust student model is distilled, which utilizes the time series data only. We find that incorporation of persistence features via a second teacher leads to significantly improved performance. This approach provides a unique way of fusing deep learning with topological features to develop effective models.
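The idea of a student learning from two heterogeneous teachers can be sketched with a generic multi-teacher KD loss. This is a sketch under stated assumptions, not the paper's method: the weighting scheme, the fixed per-teacher temperatures, and all parameter names (`alpha`, `temp_ts`, `temp_pi`, `kd_weight`) are hypothetical, and the annealing and adaptive-temperature components described in the abstract are not modeled.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax with max-subtraction for stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions, assuming q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def two_teacher_kd_loss(student_logits, ts_teacher_logits, pi_teacher_logits,
                        label, alpha=0.5, temp_ts=4.0, temp_pi=4.0, kd_weight=0.9):
    """Cross-entropy on the true label plus soft-target terms from two teachers.

    ts_teacher_logits: output of the teacher trained on raw time series.
    pi_teacher_logits: output of the teacher trained on persistence images.
    alpha balances the two teachers; kd_weight balances KD against cross-entropy.
    """
    ce = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps soft-target gradients on a comparable scale.
    kd_ts = temp_ts ** 2 * kl_divergence(softmax(ts_teacher_logits, temp_ts),
                                         softmax(student_logits, temp_ts))
    kd_pi = temp_pi ** 2 * kl_divergence(softmax(pi_teacher_logits, temp_pi),
                                         softmax(student_logits, temp_pi))
    kd = alpha * kd_ts + (1.0 - alpha) * kd_pi
    return (1.0 - kd_weight) * ce + kd_weight * kd

# Hypothetical logits for a three-class problem.
loss = two_teacher_kd_loss([2.0, 0.5, -1.0], [3.0, 0.0, -1.0], [1.5, 1.0, -0.5], label=0)
```

Only the student's time-series input is needed at inference; the persistence-image teacher influences the student solely through the `kd_pi` term during training.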
- Research Article
8
- 10.1016/j.engappai.2023.107719
- Dec 20, 2023
- Engineering applications of artificial intelligence
Deep learning methods have achieved a lot of success in various applications involving converting wearable sensor data to actionable health insights. A common application area is activity recognition, where deep-learning methods still suffer from limitations such as sensitivity to signal quality, sensor characteristic variations, and variability between subjects. To mitigate these issues, robust features obtained by topological data analysis (TDA) have been suggested as a potential solution. However, there are two significant obstacles to using topological features in deep learning: (1) the large computational load of extracting topological features using TDA, and (2) the different signal representations obtained from deep learning and TDA, which make fusion difficult. In this paper, to enable integration of the strengths of topological methods in deep learning for time-series data, we propose to use two teacher networks: one trained on the raw time-series data, and another trained on persistence images generated by TDA methods. These two teachers are jointly used to distill a single student model, which utilizes only the raw time-series data at test-time. This approach addresses both issues. The use of KD with multiple teachers utilizes complementary information, and results in a compact model with strong supervisory features and an integrated richer representation. To assimilate desirable information from different modalities, we design new constraints, including orthogonality imposed on feature correlation maps for improving feature expressiveness and allowing the student to easily learn from the teacher. Also, we apply an annealing strategy in KD for fast saturation and better accommodation from different features, while the knowledge gap between the teachers and student is reduced. Finally, a robust student model is distilled, which at test-time uses only the time-series data as an input, while implicitly preserving topological features. 
The experimental results demonstrate the effectiveness of the proposed method on wearable sensor data. The proposed method shows 71.74% in classification accuracy on GENEActiv with WRN16-1 (1D CNNs) student, which outperforms baselines and takes much less processing time (less than 17 sec) than teachers on 6k testing samples.
- Research Article
4
- 10.1109/tim.2023.3329818
- Jan 1, 2023
- IEEE transactions on instrumentation and measurement
Wearable sensor data analysis with persistence features generated by topological data analysis (TDA) has achieved great success in various applications; however, it requires large computational and time resources to extract topological features. In this paper, our approach utilizes knowledge distillation (KD) that involves the use of multiple teacher networks trained with the raw time-series and persistence images generated by TDA, respectively. However, direct transfer of knowledge from the teacher models utilizing different characteristics as inputs to the student model results in a knowledge gap and limited performance. To address this problem, we introduce a robust framework that integrates multimodal features from two different teachers and enables a student to learn desirable knowledge effectively. To account for statistical differences between modalities, an entropy-based constrained adaptive weighting mechanism is leveraged to automatically balance the effects of teachers and encourage the student model to adequately adopt the knowledge from two teachers. To assimilate dissimilar structural information generated by different style models for distillation, batch and channel similarities within a mini-batch are used. We demonstrate the effectiveness of the proposed method on wearable sensor data.
- Research Article
1
- 10.1109/jiot.2024.3412980
- Sep 15, 2024
- IEEE internet of things journal
In applications involving analysis of wearable sensor data, machine learning techniques that use features from topological data analysis (TDA) have demonstrated remarkable performance. Persistence images (PIs) generated through TDA prove effective in capturing features that are robust, especially to signal perturbations, thus complementing classical time-series features. Despite its promising performance, utilizing TDA to create PIs entails significant computational resources and time, posing challenges for applications on small devices. Knowledge distillation (KD) emerges as a solution to address these challenges, as it can produce a compact model. Using multiple teachers, one trained with raw time-series and another with topological features, is a viable approach to distill a single compact student model. In such a case, the two teachers will have different statistical characteristics and need some form of feature harmonization. To tackle these issues, we propose uncertainty-aware topological persistence guided knowledge distillation. This approach involves separating common and distinct components between teachers and applying varying weights to control their effects. To enhance the knowledge provided to a student, uncertain features from teachers are rectified using uncertainty scores. We leverage feature similarities to offer more valuable information and employ relationships computed based on orthogonal properties to prevent excessive feature transformation. Ultimately, our method yields a robust single student that operates solely on time-series data at test-time. We validate the effectiveness of the proposed approach through empirical evaluations across various combinations of models and datasets, demonstrating its robustness and efficacy in different scenarios. The proposed method enhances the classification performance of a student model by approximately 4.3% compared to a model learned from scratch on GENEActiv.
- Conference Article
1
- 10.1063/1.5111228
- Jan 1, 2019
- AIP conference proceedings
Streamflow data can be important climatic indicators for environmental risk problems such as flooding. Recently, topological data analysis (TDA) has given new insight into data analysis. The main idea in TDA is to use results based on topology to develop tools for studying qualitative features or shape-like structure of data. Persistent homology (PH) is one of the tools in TDA that focuses on topological features in data that persist across multiple scales. The question here is whether PH can detect floods based on streamflow data. Therefore, a first attempt at streamflow analysis using PH was conducted at Guillemard Bridge Station, Kelantan River, Malaysia. Analyses of streamflow data during the dry period, the wet period, and flood events were performed using the TDA approach. The results show that PH can detect the pattern of topological features in streamflow data. The analysis suggests that short-lived topological features indicate the dry period, while long-lived topological features indicate the wet period. Based on the streamflow data of flood events, PH consistently captured long-lived topological features of the data.
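The distinction between short-lived and long-lived features can be illustrated with 0-dimensional sublevel-set persistence of a 1D series, computed with a union-find structure. This is a minimal sketch under assumptions: the abstract does not state which filtration the study used, the function name is hypothetical, and the sample values are made up.

```python
def sublevel_persistence(series):
    """Return (birth, death) pairs of 0-dim sublevel-set persistence for a 1D series.

    Samples are added in order of increasing value. A local minimum starts a
    component; when two components meet, the younger one dies (elder rule).
    """
    n = len(series)
    parent = [None] * n   # None means the sample has not been added yet
    birth = {}            # component root -> value at which it was born

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    for i in sorted(range(n), key=lambda k: series[k]):
        parent[i] = i
        birth[i] = series[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if series[i] > birth[young]:   # drop zero-persistence pairs
                    pairs.append((birth[young], series[i]))
                parent[young] = old
    pairs.append((min(series), float("inf")))  # the global minimum never dies
    return pairs

flow = [1, 4, 2, 6, 0, 5, 3]  # hypothetical streamflow samples
pairs = sublevel_persistence(flow)
print(pairs)  # [(2, 4), (3, 5), (1, 6), (0, inf)]
```

Here the pairs (2, 4) and (3, 5) have lifetime 2, while (1, 6) has lifetime 5; the lifetime `death - birth` is the quantity that separates short-lived from long-lived features.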
- Abstract
- 10.1016/j.ijrobp.2022.07.1404
- Oct 22, 2022
- International Journal of Radiation Oncology*Biology*Physics
An Artificial Intelligence Ultrasound Platform for Screening and Staging Thyroid C
- Abstract
4
- 10.1016/j.ijrobp.2021.12.023
- Mar 11, 2022
- International Journal of Radiation Oncology*Biology*Physics
An Artificial Intelligence Ultrasound Platform for Screening and Staging of Thyroid Cancer
- Conference Article
11
- 10.1109/wacv56688.2023.00235
- Jan 1, 2023
Mixup is a popular data augmentation technique that creates new samples by linear interpolation between two given data samples, to improve both the generalization and robustness of the trained model. Knowledge distillation (KD), on the other hand, is widely used for model compression and transfer learning, which involves using a larger network's implicit knowledge to guide the learning of a smaller network. At first glance, these two techniques seem very different; however, we found that "smoothness" is the connecting link between the two and is also a crucial attribute in understanding KD's interplay with mixup. Although many mixup variants and distillation methods have been proposed, much remains to be understood regarding the role of mixup in knowledge distillation. In this paper, we present a detailed empirical study on various important dimensions of compatibility between mixup and knowledge distillation. We also scrutinize the behavior of networks trained with mixup in the light of knowledge distillation through extensive analysis, visualizations, and comprehensive experiments on image classification. Finally, based on our findings, we suggest improved strategies to guide the student network to enhance its effectiveness. Additionally, the findings of this study provide insightful suggestions to researchers and practitioners who commonly use techniques from KD. Our code is available at https://github.com/hchoi71/MIX-KD.
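The linear interpolation that defines mixup can be sketched in a few lines. The version below assumes the standard formulation, in which a mixing coefficient is drawn from a Beta(alpha, alpha) distribution and applied to both inputs and one-hot labels. The function names and the value `alpha=0.2` are illustrative assumptions and are not taken from the linked repository.

```python
import random

def mixup_pair(x_a, y_a, x_b, y_b, lam):
    """Linearly interpolate two samples and their one-hot labels."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y

def mixup_batch(inputs, labels, alpha=0.2, rng=None):
    """Mix each sample with a randomly chosen partner from the same batch."""
    if rng is None:
        rng = random.Random()
    lam = rng.betavariate(alpha, alpha)   # mixing coefficient in [0, 1]
    partners = list(range(len(inputs)))
    rng.shuffle(partners)
    mixed = [mixup_pair(inputs[i], labels[i], inputs[j], labels[j], lam)
             for i, j in enumerate(partners)]
    return mixed, lam

# Hypothetical two-feature inputs with two classes.
inputs = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
labels = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
mixed, lam = mixup_batch(inputs, labels, rng=random.Random(0))
```

Each mixed label still sums to one but is no longer one-hot, which is the label smoothing effect the abstract refers to.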
- Research Article
7
- 10.1155/2022/8246234
- Apr 25, 2022
- Computational Intelligence and Neuroscience
Increasing global development competition highlights the value of knowledge innovation ability of high-tech enterprises. In order to acquire innovative knowledge, the mediating variables of knowledge field activity and knowledge stock ranking are selected; the moderating variables of knowledge resource pooling and knowledge evolution are adopted to construct the conceptual model and theoretical analysis framework of the influence mechanism of knowledge network arrangement mechanism on knowledge distillation; the moderating mediating effect model is derived; and the influence mechanism of knowledge network allocation mechanism on knowledge distillation of high-tech enterprises is clarified. 531 valid questionnaires were obtained online and offline, and non-percentile bootstrap based on deviation correction was used to empirically investigate the influence mechanism and transmission path of knowledge network allocation mechanism on knowledge distillation of high-tech enterprises. The empirical results show that the main effect of knowledge network pairing on knowledge distillation of high-tech enterprises is significant. The results show that knowledge field activity and knowledge stock ranking play a differential intermediary role in knowledge network allocation and knowledge distillation, knowledge field activity plays a partial intermediary role in knowledge network allocation and knowledge distillation, and knowledge stock ranking plays a partial intermediary role in knowledge network allocation and knowledge distillation. Pooling knowledge resources positively moderates the positive effect of knowledge network allocation mechanism on knowledge distillation and significantly positively moderates the mediating effect of knowledge field activity, and there is a moderated mediating effect derived from it. However, there is no significant moderating effect on knowledge stock ranking between knowledge network allocation mechanism and knowledge distillation. 
Knowledge evolution positively moderates the positive effect of knowledge network allocation mechanism on knowledge distillation, significantly positively moderates the mediating effect of knowledge field activity, and derives the moderated mediating effect. However, there is no significant moderating effect on knowledge stock ranking between knowledge network allocation mechanism and knowledge distillation. This paper makes an empirical study on the effect of knowledge allocation mechanism on knowledge distillation, enriches the connotation and application scope of knowledge distillation, defines the driving factors and formation mechanism of knowledge distillation, and further promotes the knowledge value and knowledge appreciation of high-tech enterprises. It has guiding and reference significance in the acquisition of innovation knowledge and the promotion of competitiveness of high-tech enterprises.
- Book Chapter
- 10.1090/clrm/072/21
- Jan 1, 2024
Topology at the undergraduate level is often a purely theoretical mathematics course, introducing concepts from point-set topology or possibly algebraic or geometric topology. However, the last two decades have seen an explosion of growth in applied topology and topological data analysis, which are topics that can be presented in an accessible way to undergraduate students and can encourage exciting projects. For the past several years, the Topology course at Macalester College has included content from point-set and algebraic topology, as well as applied topology, culminating in a project chosen by the students. In the course, students work through a topology scavenger hunt as an activity to introduce the ideas and software behind some of the primary tools in topological data analysis, namely, persistent homology and mapper. This scavenger hunt includes a variety of point clouds of varying dimensions, such as an annulus in 2D, a bouquet of loops in 3D, a sphere in 4D, and a torus in 400D. The students' goal is to analyze each point cloud with a variety of software. This activity can fit nicely into a course where students have been introduced to some of the fundamentals of point-set topology such as connectedness, continuity, compactness, etc. of arbitrary topologies as well as tools from algebraic topology such as simplicial complexes and simplicial homology, which is accessible through the lens of linear algebra. The activity takes approximately a week of class time to provide a brief introduction to persistent homology and mapper, as well as some software resources to perform these computations, and then a week outside of class time for students to work on the scavenger hunt. After completing this activity, students are able to extend the ideas learned in the scavenger hunt to an open-ended capstone project. 
Examples of past projects include: using persistence to explore the relationship between country development and geography, to analyze congressional voting patterns, and to classify genres of a large corpus of texts by combining with tools from natural language processing and machine learning.
- Research Article
16
- 10.1016/j.ins.2019.10.074
- Nov 1, 2019
- Information Sciences
Block change learning for knowledge distillation
- Research Article
3
- 10.1063/5.0268340
- Jun 1, 2025
- Chaos (Woodbury, N.Y.)
In a dynamical system, the time series and phase space play vital roles, and we applied topological data analysis to these characteristics. More precisely, we consider the well-known Rössler-like attractor to analyze time-series and phase-space images. We studied persistent homology representations directly from the time series of the system to obtain point cloud data. In our approach, we converted the time series to a point cloud and computed homology using the Rips complex. This enabled us to measure the topological features of the system behavior. We also applied cubical homology to phase-space images for the first time, a novel contribution that represents an image-based approach to analyze phase portraits. This article provides a review of the topological data analysis of time series using examples with Python functions. Finally, we computed topological machine learning features, such as persistence landscapes, persistence images, and Betti curves. These features enable the automated analysis and classification of dynamical behaviors and, hence, connect topological data analysis with machine learning. This study is new in that it presents a comprehensive topological data analysis pipeline tailored to dynamical systems. The goal is to make these approaches accessible and usable in nonlinear dynamics for analyzing time series and phase portraits.
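One common way to convert a time series into a point cloud is a sliding-window (delay) embedding. The sketch below assumes that construction; the abstract does not say which conversion the article uses, and the function name, embedding dimension, and delay are hypothetical. The resulting points could then be passed to a Rips-complex persistence routine from a TDA library.

```python
import math

def delay_embedding(series, dimension=3, delay=2):
    """Map a scalar series to points (x[t], x[t+delay], ..., x[t+(dimension-1)*delay])."""
    span = (dimension - 1) * delay
    return [tuple(series[t + k * delay] for k in range(dimension))
            for t in range(len(series) - span)]

# A sampled sine wave as a stand-in for one coordinate of an attractor.
signal = [math.sin(0.3 * t) for t in range(100)]
cloud = delay_embedding(signal, dimension=2, delay=5)
```

For a periodic signal such as this one, the embedded points trace a closed curve, which is the kind of loop a 1-dimensional persistent homology class would detect.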
- Research Article
36
- 10.1088/1475-7516/2018/03/025
- Mar 1, 2018
- Journal of Cosmology and Astroparticle Physics
In this paper, we introduce the topological persistence diagram as a statistic for Cosmic Microwave Background (CMB) temperature anisotropy maps. A central concept in 'Topological Data Analysis' (TDA), the idea of persistence is to represent a data set by a family of topological spaces. One then examines how long topological features 'persist' as the family of spaces is traversed. We compute persistence diagrams for simulated CMB temperature anisotropy maps featuring various levels of primordial non-Gaussianity of local type. Postponing the analysis of observational effects, we show that persistence diagrams are more sensitive to local non-Gaussianity than previous topological statistics including the genus and Betti number curves, and can constrain $\Delta f_{\mathrm{NL}}^{\mathrm{loc}} = 35.8$ at the 68% confidence level on the simulation set, compared to $\Delta f_{\mathrm{NL}}^{\mathrm{loc}} = 60.6$ for the Betti number curves. Given the resolution of our simulations, we expect applying persistence diagrams to observational data will give constraints competitive with those of the Minkowski Functionals. This is the first in a series of papers where we plan to apply TDA to different shapes of non-Gaussianity in the CMB and Large Scale Structure.