Federated human activity recognition (FHAR) has attracted considerable attention for its potential to protect privacy. Existing FHAR methods can collaboratively learn a global activity recognition model from unimodal or multimodal data distributed across local clients. However, it remains questionable whether these methods work well in a more common scenario where local data come from different modalities, e.g., some clients may provide motion signals while others can only provide visual data. In this article, we study a new problem of cross-modal federated human activity recognition (CM-FHAR), which facilitates large-scale deployment of HAR models on more local devices. CM-FHAR poses at least three dedicated challenges: 1) distributed learning of common cross-modal features; 2) learning of modality-dependent discriminative features; and 3) the modality imbalance issue. To address these challenges, we propose a modality-collaborative activity recognition network (MCARN) that jointly learns a global activity classifier shared across all clients and multiple modality-dependent private activity classifiers. To produce modality-agnostic and modality-specific features, we learn an altruistic encoder and an egocentric encoder under the constraints of a separation loss and an adversarial modality discriminator learned collaboratively on a hypersphere. To address the modality imbalance issue, we propose an angular margin adjustment scheme that improves the modality discriminator on modality-imbalanced data by enhancing the intra-modality compactness of the dominant modality and increasing the inter-modality discrepancy. Moreover, we propose a relation-aware global-local calibration mechanism that constrains class-level pairwise relationships among the parameters of the private classifiers.
Finally, through decentralized optimization with alternating steps of adversarial local updating and modality-aware global aggregation, the proposed MCARN achieves state-of-the-art performance on both modality-balanced and modality-imbalanced data.
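The abstract does not give the exact form of the angular margin adjustment; the following is a minimal illustrative sketch in the spirit of additive angular margin (ArcFace-style) losses on a hypersphere, which matches the stated goal of tightening intra-modality compactness and widening inter-modality discrepancy. All function names, the margin value, and the scale factor are assumptions, not the paper's specification.

```python
import numpy as np

def margin_logits(features, weights, labels, margin=0.2, scale=16.0):
    """Additive angular margin on hypersphere logits (illustrative sketch).

    features: (N, D) sample features; weights: (M, D) modality prototypes.
    labels: (N,) modality index per sample. The margin is added only to the
    true-modality angle, which pulls samples of that modality closer to
    their prototype and pushes the modality decision boundaries apart.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = f @ w.T                                   # cosine logits in [-1, 1]
    theta = np.arccos(np.clip(cos, -1.0, 1.0))      # angles on the hypersphere
    target = np.zeros_like(cos, dtype=bool)
    target[np.arange(len(labels)), labels] = True
    adjusted = np.where(target, np.cos(theta + margin), cos)
    return scale * adjusted

def modality_discriminator_loss(logits, labels):
    """Softmax cross-entropy over the margin-adjusted logits."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(labels)), labels].mean()
```

Because the margin shrinks only the true-modality logit, the resulting loss is at least as large as the plain cosine-softmax loss, forcing a stricter separation between modalities during adversarial training.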