Data from multi-modal sensors, such as RGB cameras, thermal cameras, microphones, and mmWave radars, are increasingly used in classification problems to improve accuracy. Some sensors, like RGB cameras and microphones, however, capture privacy-invasive data, which clients are reluctant to share for centralized learning. Although the Federated Learning (FL) paradigm frees clients from sharing their sensor data, it comes at the cost of reduced classification accuracy and increased training time. In this article, we introduce a novel Heterogeneous Privacy Federated Learning (HPFL) paradigm that better capitalizes on the less privacy-invasive sensor data, such as thermal images and mmWave point clouds, by uploading them to the server to close the performance gap between FL and centralized learning. HPFL not only allows clients to keep the more privacy-invasive sensor data, such as RGB images and human voices, private, but also gives each client total freedom to define their level of privacy concern for each sensor modality. For example, more sensitive users may prefer to keep their thermal images private, while others do not mind sharing these images. We carry out extensive experiments to evaluate the HPFL paradigm on two representative classification problems: semantic segmentation and emotion recognition. Several key findings demonstrate the merits of HPFL: (i) compared to FedAvg, it improves foreground accuracy by 18.20% in semantic segmentation and boosts the F1-score by 4.20% in emotion recognition, (ii) with heterogeneous privacy concern levels, it achieves an even larger F1-score improvement of 6.17–16.05% in emotion recognition, and (iii) it also outperforms state-of-the-art FL approaches by 12.04–17.70% in foreground accuracy and 2.54–4.10% in F1-score.