Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition
Rapid progress and superior performance have been achieved for skeleton-based action recognition recently. In this article, we investigate this problem under a cross-dataset setting, which is a new, pragmatic, and challenging task in real-world scenarios. Following the unsupervised domain adaptation (UDA) paradigm, the action labels are only available on a source dataset, but unavailable on a target dataset in the training stage. Different from the conventional adversarial learning-based approaches for UDA, we utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets. Our inspiration is drawn from Cubism, an art genre from the early 20th century, which breaks and reassembles the objects to convey a greater context. By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks to explore the temporal and spatial dependency of a skeleton-based action and improve the generalization ability of the model. We conduct experiments on six datasets for skeleton-based action recognition, including three large-scale datasets (NTU RGB+D, PKU-MMD, and Kinetics) where new cross-dataset settings and benchmarks are established. Extensive results demonstrate that our method outperforms state-of-the-art approaches. The source codes of our model and all the compared methods are available at https://github.com/shanice-l/st-cubism.
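The temporal "Cubism" transform described above can be sketched as follows: a skeleton sequence is cut into segments along time, the segments are permuted, and the permutation index serves as the label for the auxiliary self-supervised classifier. This is a minimal illustration, not the authors' code; the function name, shapes, and segment count are assumptions.

```python
import itertools
import numpy as np

def temporal_cubism(seq, n_segments=3, perm_id=0):
    """Split a skeleton sequence along time, permute the segments, and
    return the shuffled sequence plus the permutation index, which an
    auxiliary classifier is trained to predict.
    `seq` has shape (T, J, C): frames x joints x coordinates."""
    perms = list(itertools.permutations(range(n_segments)))
    segments = np.array_split(seq, n_segments, axis=0)
    shuffled = np.concatenate([segments[i] for i in perms[perm_id]], axis=0)
    return shuffled, perm_id

# Toy usage: 6 frames, 2 joints, 2 coordinates; perm_id=1 maps to order (0, 2, 1).
seq = np.arange(24).reshape(6, 2, 2)
shuffled, label = temporal_cubism(seq, n_segments=3, perm_id=1)
```

The spatial variant is analogous, permuting groups of joints (body parts) along the joint axis instead of the time axis.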
- Research Article
1
- 10.1609/aaai.v39i1.32019
- Apr 11, 2025
- Proceedings of the AAAI Conference on Artificial Intelligence
Cryo-Electron Tomography (cryo-ET) is a 3D imaging technology that facilitates the study of macromolecular structures at near-atomic resolution. Recent volumetric segmentation approaches on cryo-ET images have drawn widespread interest in the biological sector. However, existing methods heavily rely on manually labeled data, which requires highly professional skills, thereby hindering the adoption of fully-supervised approaches for cryo-ET images. Some unsupervised domain adaptation (UDA) approaches have been designed to enhance segmentation network performance using unlabeled data. However, applying these methods directly to cryo-ET image segmentation tasks remains challenging due to two main issues: 1) the source dataset, usually obtained through simulation, contains a fixed level of noise, while the target dataset, collected directly from real-world raw data, has unpredictable noise levels; 2) the source data used for training typically consist of known macromolecules, whereas the target domain data are often unknown, causing the model to be biased towards the known macromolecules and leading to a domain shift problem. To address these challenges, we introduce a voxel-wise unsupervised domain adaptation approach, termed Vox-UDA, specifically for cryo-ET subtomogram segmentation. Vox-UDA incorporates a noise generation module to simulate target-like noise in the source dataset for cross-noise-level adaptation. Additionally, we propose a denoised pseudo-labeling strategy based on an improved bilateral filter to alleviate the domain shift problem. More importantly, we construct the first UDA cryo-ET subtomogram segmentation benchmark on three experimental datasets. Extensive experimental results on multiple benchmarks and newly curated real-world datasets demonstrate the superiority of our proposed approach compared to state-of-the-art UDA methods.
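The cross-noise-level idea above can be illustrated with a simple corruption step: inject Gaussian noise at a randomly drawn level so the simulated source volumes span many noise levels rather than one fixed level. This is a generic sketch, not Vox-UDA's actual noise generation module; `snr_range` is an illustrative knob.

```python
import numpy as np

def add_random_noise(volume, snr_range=(0.5, 2.0), rng=None):
    """Corrupt a clean (simulated) subtomogram with Gaussian noise whose
    level is drawn at random per call, mimicking the unpredictable noise
    of real-world target data."""
    rng = np.random.default_rng(rng)
    snr = rng.uniform(*snr_range)                 # random signal-to-noise ratio
    signal_power = np.mean(volume ** 2)
    noise_std = np.sqrt(signal_power / snr)
    return volume + rng.normal(0.0, noise_std, volume.shape)

vol = np.ones((8, 8, 8))                          # toy 3D volume
noisy = add_random_noise(vol, rng=0)
```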
- Research Article
24
- 10.1109/access.2019.2924597
- Jan 1, 2019
- IEEE Access
Given the many factors that differentiate the publicly and privately available speech emotion corpora, the premise that training and testing samples are drawn from the same distribution and parameterize the same feature space does not hold in most real-world scenarios. Addressing this challenge via domain adaptation, we propose a dual exclusive attentive transfer (DEAT) deep convolutional neural network architecture based on an unsupervised domain adaptation setting. The proposed architecture employs an unshared attentive transfer procedure for convolutional adaptation of both the source and target domains. A correlation alignment loss (CALLoss) is applied to minimize the domain shift by aligning the second-order statistics of the convolutional layers' attention maps in both domains. Then, for the proposed network to effectively model the shift between dissimilar domains, we make the weights of the corresponding layers exclusive but related. The proposed model jointly minimizes the classification loss on the labeled source domain and the correlation alignment loss of both the convolutional and fully-connected layers. We evaluate our architecture using the Interspeech 2009 Emotion Challenge FAU Aibo Emotion Corpus as the target dataset and two publicly available corpora (ABC and Emo-DB) as source datasets. Our experimental results show that our domain adaptation method is superior to other state-of-the-art methods.
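The correlation alignment loss mentioned above (CORAL, in the sense of Sun & Saenko's Deep CORAL) penalizes the squared Frobenius distance between the feature covariance matrices of the two domains. A minimal numpy sketch, applied here to plain feature matrices rather than attention maps:

```python
import numpy as np

def coral_loss(source_feats, target_feats):
    """Correlation alignment loss: squared Frobenius distance between the
    two domains' feature covariance matrices, scaled by 1/(4 d^2)."""
    d = source_feats.shape[1]
    cs = np.cov(source_feats, rowvar=False)
    ct = np.cov(target_feats, rowvar=False)
    return np.sum((cs - ct) ** 2) / (4.0 * d * d)

rng = np.random.default_rng(0)
src = rng.normal(size=(64, 8))                    # source-domain features
tgt = rng.normal(scale=2.0, size=(64, 8))         # target features with shifted statistics
loss_shifted = coral_loss(src, tgt)
loss_same = coral_loss(src, src)                  # identical statistics -> zero loss
```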
- Research Article
14
- 10.1109/tnnls.2021.3119889
- Aug 1, 2023
- IEEE Transactions on Neural Networks and Learning Systems
Typical adversarial-training-based unsupervised domain adaptation (UDA) methods are vulnerable when the source and target datasets are highly complex or exhibit a large discrepancy between their data distributions. Recently, several Lipschitz-constraint-based methods have been explored. Satisfying Lipschitz continuity guarantees remarkable performance on a target domain. However, these methods lack a mathematical analysis of why a Lipschitz constraint is beneficial to UDA and usually perform poorly on large-scale datasets. In this article, we take the principle of utilizing a Lipschitz constraint further by discussing how it affects the error bound of UDA. A connection between them is built, and an illustration of how Lipschitzness reduces the error bound is presented. A local smooth discrepancy is defined to measure the Lipschitzness of a target distribution in a pointwise way. When constructing a deep end-to-end model, to ensure the effectiveness and stability of UDA, three critical factors are considered in our proposed optimization strategy: the sample amount of the target domain, and the dimension and batch size of samples. Experimental results demonstrate that our model performs well on several standard benchmarks. Our ablation study shows that these three factors indeed greatly impact the ability of Lipschitz-constraint-based methods to handle large-scale datasets. Code is available at https://github.com/CuthbertCai/SRDA.
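A pointwise smoothness measure of the kind described above can be probed numerically: perturb an input in several random directions of fixed norm and record the largest change in the predicted distribution. This is a loose interpretation of the paper's "local smooth discrepancy", not its exact definition; the toy model and all names are assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def local_smooth_discrepancy(predict, x, eps=0.1, n_dirs=8, rng=None):
    """Perturb x in random directions of norm `eps` and report the largest
    L1 change in the predicted distribution; a smooth (Lipschitz) model
    barely moves, so small values indicate local smoothness."""
    rng = np.random.default_rng(rng)
    p0 = predict(x)
    worst = 0.0
    for _ in range(n_dirs):
        r = rng.normal(size=x.shape)
        r = eps * r / np.linalg.norm(r)
        worst = max(worst, np.abs(predict(x + r) - p0).sum())
    return worst

# Toy two-class linear classifier.
W = np.array([[1.0, -1.0], [-1.0, 1.0]])
predict = lambda x: softmax(W @ x)
d = local_smooth_discrepancy(predict, np.array([0.2, -0.1]), rng=0)
```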
- Book Chapter
28
- 10.1007/978-3-030-30671-7_2
- Jan 1, 2020
Unsupervised domain adaptation techniques have been successful for a wide range of problems where supervised labels are limited. The task is to classify an unlabeled “target” dataset by leveraging a labeled “source” dataset that comes from a similar but shifted distribution. We propose metric-based adversarial discriminative domain adaptation (M-ADDA), which performs two main steps. First, it uses a metric learning approach to train the source model on the source dataset by optimizing the triplet loss function. This results in clusters where embeddings of the same label are close to each other and those with different labels are far from one another. Next, it uses the adversarial approach of ADDA (Tzeng et al., Adversarial discriminative domain adaptation, 2017, [36]) to make the features extracted from the source and target datasets indistinguishable. Simultaneously, we optimize a novel loss function that encourages the target dataset’s embeddings to form clusters. While ADDA and M-ADDA use similar architectures, we show that M-ADDA performs significantly better on the MNIST and USPS digit adaptation datasets. This suggests that using metric learning for domain adaptation can lead to large improvements in classification accuracy. The code is available at https://github.com/IssamLaradji/M-ADDA.
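The triplet loss used in the first step has a standard form: pull an anchor toward a same-label (positive) embedding and push it at least a margin away from a different-label (negative) one. A minimal single-triplet sketch:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).
    Zero once the positive is closer than the negative by `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # same class, nearby -> satisfied triplet
n = np.array([3.0, 0.0])   # different class, far away
loss = triplet_loss(a, p, n)
```

In practice the loss is averaged over many mined triplets per batch, which is what produces the per-label clusters the abstract describes.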
- Conference Article
322
- 10.1109/cvpr42600.2020.01367
- Jun 1, 2020
For unsupervised person re-identification (re-ID), most approaches adopt an unsupervised domain adaptation (UDA) method. UDA methods typically train on a labeled source dataset and evaluate on the target dataset, focusing on learning the differences between the source and target datasets to improve the generalization of the model. Building on this, we explore how to exploit the similarity of samples to conduct a fully unsupervised method that trains only on the unlabeled target dataset. Concretely, we propose a hierarchical clustering-guided re-ID (HCR) method. We use hierarchical clustering to generate pseudo labels and use these pseudo labels as supervision to conduct training. To exclude hard examples and promote the convergence of the model, we use PK sampling in each iteration, which randomly selects a fixed number of samples from each cluster for training. We evaluate our model on Market-1501, DukeMTMC-reID, and MSMT17. Results show that HCR achieves state-of-the-art performance, with 55.3% mAP on Market-1501 and 46.8% mAP on DukeMTMC-reID. Our code will be released soon.
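PK sampling as described above builds each mini-batch from P pseudo-label clusters with K samples per cluster. A minimal sketch (names and the with-replacement fallback for small clusters are assumptions, not the paper's exact procedure):

```python
import random

def pk_sample(cluster_to_items, p, k, rng=None):
    """PK sampling: pick `p` pseudo-label clusters, then `k` items from
    each, so every mini-batch is balanced across clusters. Clusters with
    fewer than `k` items are sampled with replacement."""
    rng = rng or random.Random()
    clusters = rng.sample(sorted(cluster_to_items), p)
    batch = []
    for c in clusters:
        items = cluster_to_items[c]
        if len(items) >= k:
            batch.extend(rng.sample(items, k))
        else:
            batch.extend(rng.choices(items, k=k))
    return batch

# Toy pseudo-label clusters mapping cluster id -> sample ids.
clusters = {0: [10, 11, 12], 1: [20, 21], 2: [30, 31, 32, 33]}
batch = pk_sample(clusters, p=2, k=2, rng=random.Random(0))
```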
- Research Article
- 10.1101/2025.01.14.25320536
- May 12, 2025
- medRxiv : the preprint server for health sciences
To address the challenges of modeling time-to-event outcomes in small-sample settings, we propose a novel transfer learning approach, termed CoxTL, based on the widely used Cox proportional hazards model, accounting for potential covariate and concept shifts between source and target datasets. CoxTL utilizes a combination of density ratio weighting and importance weighting techniques to address multi-level data heterogeneity, including covariate and coefficient shifts between source and target datasets. Additionally, it accounts for potential model misspecification, ensuring robustness across a wide range of settings. We assess the performance of CoxTL through extensive simulation studies, considering data under various types of distributional shifts. Additionally, we apply CoxTL to predict End-Stage Renal Disease (ESRD) in the Hispanic population using electronic health record-derived features from the All of Us Research Program, with data from non-Hispanic White and non-Hispanic Black populations leveraged as source cohorts. Model performance is evaluated using the C-index and Integrated Brier Score (IBS). In simulation studies, CoxTL demonstrates higher predictive accuracy, particularly in scenarios involving multi-level heterogeneity between target and source datasets; in other scenarios, it performs comparably to alternative methods specifically designed to address only a single type of distributional shift. For predicting the 2-year risk of ESRD in the Hispanic population, CoxTL achieves an increase in C-index of up to 6.76% compared to the model trained exclusively on target data, and up to 17.94% compared to the state-of-the-art transfer learning method based on the Cox model. The proposed method effectively utilizes source data to enhance time-to-event predictions in target populations with limited samples. Its ability to handle various sources and levels of data heterogeneity ensures robustness, making it particularly well-suited for real-world applications involving target populations with small sample sizes, where traditional Cox models often struggle.
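Density ratio weighting, one ingredient named above, is commonly estimated with a probabilistic classifier: train it to separate source from target samples, then convert its probabilities into ratios w(x) = p_target(x) / p_source(x), so source points that look target-like are up-weighted. The sketch below uses a hand-rolled logistic regression and is a generic illustration of the technique, not CoxTL itself.

```python
import numpy as np

def density_ratio_weights(x_src, x_tgt, lr=0.1, steps=500):
    """Estimate w(x) = p_target(x) / p_source(x) via a source-vs-target
    logistic regression; the classifier's odds equal the density ratio
    up to a constant factor."""
    x = np.vstack([x_src, x_tgt])
    x = np.hstack([x, np.ones((len(x), 1))])          # bias column
    y = np.r_[np.zeros(len(x_src)), np.ones(len(x_tgt))]
    w = np.zeros(x.shape[1])
    for _ in range(steps):                            # plain gradient descent
        p = 1.0 / (1.0 + np.exp(-x @ w))
        w -= lr * x.T @ (p - y) / len(y)
    p_src = 1.0 / (1.0 + np.exp(-x[: len(x_src)] @ w))
    return p_src / (1.0 - p_src)                      # odds = density ratio

rng = np.random.default_rng(0)
x_src = rng.normal(0.0, 1.0, size=(200, 1))
x_tgt = rng.normal(1.0, 1.0, size=(200, 1))          # covariate shift toward larger x
weights = density_ratio_weights(x_src, x_tgt)        # larger-x source points weigh more
```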
- Research Article
16
- 10.1109/tse.2022.3173678
- Mar 1, 2023
- IEEE Transactions on Software Engineering
Heterogeneous defect prediction (HDP) is a promising research area in the software defect prediction domain for handling the unavailability of past homogeneous data. In HDP, the prediction is performed using a source dataset whose independent features (metrics) are entirely different from those of the target dataset. An important assumption in machine learning is that the independent features of the source and target datasets should be relevant to each other for good prediction accuracy; however, this assumption does not generally hold in HDP. Further, in HDP the source dataset selected for a given target dataset may be small, causing insufficient training. To resolve these issues, we propose a novel heterogeneous data preprocessing method, namely Transfer of Data from Target dataset to Source dataset selected using Relevance score (TDTSR), for heterogeneous defect prediction. In the proposed approach, we use the chi-square test to select the relevant metrics between source and target datasets, and we conduct experiments with the proposed approach using various machine learning algorithms. Our proposed method shows an improvement of at least 14% in AUC score in the HDP scenario compared to existing state-of-the-art models.
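A chi-square relevance score of the kind mentioned above can be computed from a contingency table between a binned software metric and the defect label: larger statistics indicate stronger association. This is a generic chi-square relevance sketch; TDTSR's exact source-to-target matching procedure is not reproduced here.

```python
import numpy as np

def chi_square_score(metric, labels, n_bins=3):
    """Chi-square statistic of independence between a quantile-binned
    metric and the defect label; larger means more relevant."""
    bins = np.quantile(metric, np.linspace(0, 1, n_bins + 1)[1:-1])
    binned = np.digitize(metric, bins)
    classes = np.unique(labels)
    obs = np.array([[np.sum((binned == b) & (labels == c)) for c in classes]
                    for b in range(n_bins)], dtype=float)
    expected = obs.sum(1, keepdims=True) * obs.sum(0, keepdims=True) / obs.sum()
    mask = expected > 0
    return float(((obs - expected) ** 2 / np.where(mask, expected, 1.0))[mask].sum())

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=300)               # toy defect labels
relevant = labels * 2.0 + rng.normal(size=300)      # metric correlated with defects
noise = rng.normal(size=300)                        # unrelated metric
score_rel = chi_square_score(relevant, labels)
score_noise = chi_square_score(noise, labels)
```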
- Conference Article
3
- 10.1109/compsac48688.2020.00-88
- Jul 1, 2020
Unlike existing cross-project defect prediction (CPDP) problems, which assume a close relation between the source and target datasets, in the heterogeneous cross-project defect prediction (HCPDP) problem the target datasets can be totally different from the source datasets. To narrow the difference between source and target datasets, we implemented our own algorithm, SLA+, based on the selective learning algorithm. We select the source dataset with the highest similarity to the target dataset from among multiple candidates, and select one or more of the other source datasets that are similar to both the target dataset and the chosen source dataset as an intermediate domain. This intermediate domain sets up a bridge between the target domain and the source domain, breaking the large distribution gap for transferring knowledge between them. Besides, we achieve dimensionality reduction by mining the potential relationships between features. We have done experiments on open-source datasets, all of which are heterogeneous. The experiments show that our method achieves results comparable with state-of-the-art HCPDP methods in most cases.
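The source-selection step above needs some dataset-level similarity measure. As a crude stand-in (the abstract does not specify SLA+'s actual measure), one can compare feature means across candidate sources:

```python
import numpy as np

def pick_source(candidate_sources, target):
    """Select the candidate source dataset whose feature means lie closest
    to the target's; a simple proxy for similarity-based source selection."""
    dists = [np.linalg.norm(s.mean(axis=0) - target.mean(axis=0))
             for s in candidate_sources]
    return int(np.argmin(dists))

rng = np.random.default_rng(0)
target = rng.normal(0.0, 1.0, size=(50, 3))
near = rng.normal(0.1, 1.0, size=(50, 3))   # statistically close to the target
far = rng.normal(5.0, 1.0, size=(50, 3))    # clearly shifted
best = pick_source([far, near], target)     # index of the closest candidate
```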
- Research Article
- 10.1002/tal.70104
- Dec 2, 2025
- The Structural Design of Tall and Special Buildings
Generalization of machine learning (ML) surrogate models across distinct databases is underexplored, despite being crucial as retraining the entire model every time new data become available is inefficient. This study proposes an incremental learning methodology to improve ML models' prediction of seismic collapse of steel moment‐resisting frames (SMRFs) across distinct datasets. Three boosting algorithms, XGBoost, LightGBM, and CatBoost, were trained on a source dataset to generate surrogate ML models that can predict the SMRF's seismic response. Thereafter, the ML models were used to predict the response on a new (target) dataset of SMRFs that differ in geometric dimensions and design approaches. Initially, boosting models trained on one dataset performed poorly on another dataset, even if the datasets displayed similar characteristics and consistent feature importance rankings. Incorporation of incremental learning improved the prediction on the target dataset, but introduced catastrophic forgetting that reduced the effectiveness of the ML model on the source dataset, a problem mitigated with a rehearsal strategy. Incremental learning with rehearsal yields results comparable to those obtained by fully retraining with both source and target datasets, resulting in an effective method for ML transferability, without having to retrain entire databases and without reducing the effectiveness of ML models on the source database.
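The rehearsal strategy described above mitigates catastrophic forgetting by mixing a replayed subset of the source data into the target update set before refitting. A minimal data-mixing sketch; the rehearsal fraction and names are illustrative choices, not the study's values.

```python
import numpy as np

def rehearsal_mix(x_src, y_src, x_tgt, y_tgt, rehearsal_frac=0.2, rng=None):
    """Build the update set for incremental learning with rehearsal: all
    new target data plus a random subset of the source data, so a refit
    on this set adapts to the target without forgetting the source."""
    rng = np.random.default_rng(rng)
    n_keep = int(rehearsal_frac * len(x_src))
    idx = rng.choice(len(x_src), size=n_keep, replace=False)
    x = np.vstack([x_src[idx], x_tgt])
    y = np.concatenate([y_src[idx], y_tgt])
    return x, y

# Toy datasets: 100 source samples, 30 target samples.
x_src, y_src = np.ones((100, 4)), np.zeros(100)
x_tgt, y_tgt = np.zeros((30, 4)), np.ones(30)
x_mix, y_mix = rehearsal_mix(x_src, y_src, x_tgt, y_tgt, rng=0)
```

The mixed set would then be passed to the boosting model's continued-training step (e.g., warm-started XGBoost/LightGBM/CatBoost fits).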
- Research Article
1
- 10.1093/bioinformatics/btaf137
- Mar 27, 2025
- Bioinformatics (Oxford, England)
Single-cell RNA sequencing (scRNA-seq) analysis relies heavily on effective clustering to facilitate numerous downstream applications. Although several machine learning methods have been developed to enhance single-cell clustering, most are fully unsupervised and overlook the rich repository of annotated datasets available from previous single-cell experiments. Since cells are inherently high-dimensional entities, unsupervised clustering can often result in clusters that lack biological relevance. Leveraging annotated scRNA-seq datasets as a reference can significantly enhance clustering performance, enabling the identification of biologically meaningful clusters in target datasets. In this article, we propose Single Cell MUlti-Source CLustering (scMUSCL), a novel transfer learning method designed to identify cell clusters in a target dataset by leveraging knowledge from multiple annotated reference datasets. scMUSCL employs a deep neural network to extract domain- and batch-invariant cell representations, effectively addressing discrepancies across various source datasets and between source and target datasets within the new representation space. Unlike existing methods, scMUSCL does not require prior knowledge of the number of clusters in the target dataset and eliminates the need for batch correction between source and target datasets. We conduct extensive experiments using 20 real-life datasets, demonstrating that scMUSCL consistently outperforms existing unsupervised and transfer learning-based methods. Furthermore, our experiments show that scMUSCL benefits from multiple source datasets as learning references and accurately estimates the number of clusters. The Python implementation of scMUSCL is available at https://github.com/arashkhoeini/scMUSCL.
- Conference Article
4
- 10.1109/icsp48669.2020.9321006
- Dec 6, 2020
Action recognition from skeleton data is an important research field in computer vision. Recently, the Graph Convolutional Network (GCN), which generalizes CNNs to more generic non-Euclidean structures, has shown encouraging improvements in skeleton-based action recognition. However, the effect of the GCN architecture has not been fully explored. In this work, we propose a Fisher Vector (FV) encoding based GCN architecture (FV-GNN), which learns better action representations by integrating the GCN model with FV encoding. On two large-scale datasets for skeleton-based action recognition, Kinetics and NTU-RGBD, it achieves improvements over recent approaches.
- Research Article
3
- 10.1002/mp.15827
- Jul 27, 2022
- Medical Physics
Computer-aided automatic pancreas segmentation is essential for the early diagnosis and treatment of pancreatic diseases. However, the annotation of pancreas images requires professional doctors and considerable expenditure. Due to imaging differences across institutional populations, scanning devices, imaging protocols, and so on, significant degradation in model inference performance is prone to occur when models trained on domain-specific (usually institution-specific) datasets are applied directly to data from a new domain (other centers/institutions). In this paper, we propose a novel unsupervised domain adaptation method based on adversarial learning to address pancreas segmentation challenges arising from the lack of annotations and domain shift interference. A 3D semantic segmentation model with attention and residual modules is designed as the backbone pancreas segmentation model. In both the segmentation model and the domain adaptation discriminator network, a multiscale progressively weighted structure is introduced to acquire different fields of view. Features of labeled and unlabeled data are fed in pairs into the proposed multiscale discriminator to learn domain-specific characteristics. Then the unlabeled data features with pseudo-domain labels are fed to the discriminator to acquire domain-ambiguous information. With this adversarial learning strategy, the segmentation network is better able to segment unseen unlabeled data. Experiments were conducted on two public annotated datasets as source datasets and one private dataset as the target dataset, whose annotations were not used for training but only for evaluation. The 3D segmentation model achieves performance comparable with state-of-the-art pancreas segmentation methods on the source domain. After implementing our domain adaptation architecture, the average dice similarity coefficient (DSC) of the segmentation model trained on the NIH-TCIA source dataset increases from 58.79% to 72.73% on the local hospital dataset, while the performance of the target domain segmentation model transferred from the Medical Segmentation Decathlon (MSD) source dataset rises from 62.34% to 71.17%. Correlations of features across data domains are utilized to train the pancreas segmentation model on the unlabeled data domain, improving the generalization of the model. Our results demonstrate that the proposed method enables the segmentation model to produce meaningful segmentations for data unseen during training. In the future, the proposed method has the potential to apply segmentation models trained on public datasets to unannotated clinical CT images from local hospitals, effectively assisting radiologists in clinical practice.
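The dice similarity coefficient (DSC) quoted in the results above is 2|A∩B| / (|A| + |B|) for prediction and ground-truth masks. A minimal implementation:

```python
import numpy as np

def dice_coefficient(pred, truth, eps=1e-8):
    """Dice similarity coefficient between two binary masks:
    2 * |intersection| / (|pred| + |truth|); 1.0 means perfect overlap."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    return 2.0 * inter / (pred.sum() + truth.sum() + eps)

# Toy 2D masks (the same formula applies voxel-wise in 3D):
a = np.zeros((4, 4), dtype=int); a[1:3, 1:3] = 1   # 4 foreground pixels
b = np.zeros((4, 4), dtype=int); b[1:3, 1:4] = 1   # 6 pixels, 4 overlapping
dsc = dice_coefficient(a, b)                       # 2*4 / (4+6) = 0.8
```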
- Conference Article
5
- 10.1109/paine56030.2022.10014879
- Oct 25, 2022
Deep learning has achieved great success in the challenging circuit annotation task by employing Convolutional Neural Networks (CNNs) for the segmentation of circuit structures. Deep learning approaches require a large amount of manually annotated training data to achieve good performance, and performance can degrade when a model trained on one dataset is applied to a different dataset. This is commonly known as the domain shift problem for circuit annotation, which stems from the possibly large variations in distribution across image datasets obtained from different devices or from different layers within a single device. To address the domain shift problem, we propose Histogram-gated Image Translation (HGIT), an unsupervised domain adaptation framework that transforms images from a given source dataset to the domain of a target dataset and utilizes the transformed images to train a segmentation network. Specifically, HGIT performs generative adversarial network (GAN)-based image translation and utilizes histogram statistics for data curation. Experiments were conducted on a single labeled source dataset adapted to three different target datasets (without labels for training), and the segmentation performance was evaluated for each target dataset. We demonstrate that our method achieves the best performance compared to the reported domain adaptation techniques and is also reasonably close to the fully supervised benchmark.
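One plausible histogram statistic for the curation step above is histogram intersection between a translated image and a target-domain reference: values near 1 mean the intensity distributions match and the translation is worth keeping. The abstract does not specify HGIT's exact gating criterion, so this is an illustrative stand-in.

```python
import numpy as np

def histogram_intersection(img_a, img_b, n_bins=32, value_range=(0.0, 1.0)):
    """Histogram intersection of two images' intensity distributions,
    in [0, 1]; higher means the distributions overlap more."""
    ha, _ = np.histogram(img_a, bins=n_bins, range=value_range)
    hb, _ = np.histogram(img_b, bins=n_bins, range=value_range)
    ha = ha / ha.sum()
    hb = hb / hb.sum()
    return float(np.minimum(ha, hb).sum())

rng = np.random.default_rng(0)
target = rng.uniform(0, 1, size=(64, 64))          # target-domain reference image
good = rng.uniform(0, 1, size=(64, 64))            # translation matching the target statistics
bad = rng.uniform(0, 0.3, size=(64, 64))           # off-distribution translation
keep_good = histogram_intersection(target, good)
keep_bad = histogram_intersection(target, bad)
```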
- Research Article
35
- 10.1088/1741-2552/ac6ca8
- Jun 1, 2022
- Journal of Neural Engineering
Objective. The recent breakthrough of wearable sleep monitoring devices has resulted in large amounts of sleep data. However, as limited labels are available, interpreting these data requires automated sleep stage classification methods with a small need for labeled training data. Transfer learning and domain adaptation offer possible solutions by enabling models to learn on a source dataset and adapt to a target dataset. Approach. In this paper, we investigate adversarial domain adaptation applied to real use cases with wearable sleep datasets acquired from diseased patient populations. Different practical aspects of the adversarial domain adaptation framework are examined, including the added value of (pseudo-)labels from the target dataset and the influence of domain mismatch between the source and target data. The method is also implemented for personalization to specific patients. Main results. The results show that adversarial domain adaptation is effective in the application of sleep staging on wearable data. When compared to a model applied on a target dataset without any adaptation, the domain adaptation method in its simplest form achieves relative gains of 7%–27% in accuracy. The performance in the target domain is further boosted by adding pseudo-labels and real target domain labels when available, and by choosing an appropriate source dataset. Furthermore, unsupervised adversarial domain adaptation can also personalize a model, improving the performance by 1%–2% compared to a non-personalized model. Significance. In conclusion, adversarial domain adaptation provides a flexible framework for semi-supervised and unsupervised transfer learning. This is particularly useful in sleep staging and other wearable electroencephalography applications. (Clinical trial registration number: S64190.)
- Research Article
- 10.1145/3715917
- Mar 10, 2025
- ACM Transactions on Multimedia Computing, Communications, and Applications
Skeleton-based Action Recognition (SAR) is widely recognized for its robustness and efficiency in human action analysis, but its performance in cross-dataset tasks has been limited due to domain shifts between different datasets. To address this challenge, current methods typically approach cross-dataset SAR as an Unsupervised Domain Adaptation (UDA) task, which is tackled using domain adaptation or self-supervised learning strategies. In this article, we propose a Dual-Domain Triple Contrast (D2TC) framework for cross-dataset SAR under the UDA setting. Unlike existing UDA methods that either focus on a single strategy or superficially combine strategies, our D2TC leverages contrastive learning to integrate both strategies into a unified framework. It performs three types of contrastive learning: Self-Supervised Contrastive Learning, Supervised Contrastive Learning, and UDA with Contrastive Learning, across both source and target domains. The triple contrasts go beyond mere summation, effectively bridging the domain gap and enhancing the model’s representational capacity. Additionally, we introduce multi-modal ensemble contrast and extreme skeleton augmentation methods to further enhance the skeleton-based representation learning. Extensive experiments on six cross-dataset settings validate the superiority of our D2TC framework over state-of-the-art methods, demonstrating its effectiveness in reducing domain discrepancies and improving cross-dataset SAR performance. The codes are available at https://github.com/KennCoder7/DualDomainTripleContrast.
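All three contrasts above build on the same InfoNCE-style building block: score an anchor embedding against a set of candidates by cosine similarity, apply a temperature-scaled softmax, and take the negative log-probability of the positive. A minimal single-anchor sketch (generic, not D2TC's exact losses):

```python
import numpy as np

def info_nce(anchor, candidates, pos_index, temperature=0.1):
    """InfoNCE loss for one anchor: temperature-scaled softmax over cosine
    similarities to all candidates, then the negative log-probability of
    the designated positive."""
    def normalize(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    sims = normalize(candidates) @ normalize(anchor)
    logits = sims / temperature
    logits -= logits.max()                           # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[pos_index]

anchor = np.array([1.0, 0.0])
candidates = np.array([[0.9, 0.1],    # positive (e.g., another augmented view)
                       [-1.0, 0.0],   # negative
                       [0.0, 1.0]])   # negative
loss = info_nce(anchor, candidates, pos_index=0)
```

Swapping what counts as "positive" (another view, a same-class sample, or a cross-domain counterpart) yields the self-supervised, supervised, and cross-domain variants respectively.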