Abstract
Digital health applications enable continuous and objective symptom reporting through remote monitoring in patients' homes. These applications employ digital biomarker models that must be trained on high-quality data. Such datasets are classified as personally sensitive and, due to privacy and consent regulations, may only be available for a restricted duration. Accordingly, the extension of training data, and thus the life-cycle of digital biomarker classifiers, is compromised. To address this problem, we present a Privacy Preserving Synthetic Data Generation pipeline. It follows the Generative Replay approach proposed in the Continual Learning literature: a Generative Adversarial Network (GAN) is trained to generate synthetic medical samples. To ensure that the identity of subjects in the GAN training set is not compromised, a privacy evaluation is performed on the synthetic samples. We test whether a sample can be related to a subject by training a Siamese Neural Network (SNN) with triplet loss. A sample is classified as anonymized when it is sufficiently distant from all subject clusters, or when k-anonymity with k = 3 is given. Finally, only the anonymized synthetic data, which preserves the main characteristics of the original data, is stored for later model training. The applicability of the proposed method is demonstrated using the example of a Respiratory Sound Classifier, which enables the reporting of respiratory symptoms in the form of a diary. Such a diary is relevant for chronic or infectious pulmonary conditions such as asthma, COPD, or COVID-19. Training the WaveGAN on our dataset of respiratory sounds with a varying number of iterations resulted in synthetic audio samples with changing psycho-acoustic appearance: from low to high iteration counts, the appearance changed from a primitive robotic to a natural human character.
Further, the identification of subjects based on respiratory sounds proved to be possible using a threshold on the distance between sound embeddings computed by the SNN. Accordingly, the method is applicable to evaluating the anonymity of the synthetic sounds. Finally, the original and the anonymized synthetic audio samples were used to train two separate Respiratory Sound Classifiers, with resulting accuracies of 90% and 85%, respectively. We conclude that the proposed method enables class-incremental learning without the issue of catastrophic forgetting, at the cost of a slight performance degradation.
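The anonymity criterion described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that each training subject is represented by a single centroid of their SNN embeddings, and that a Euclidean distance below a calibrated threshold counts as a potential re-identification. The function name, the centroid representation, and the parameter names are hypothetical.

```python
import numpy as np

def is_anonymized(sample_embedding, subject_centroids, threshold, k=3):
    """Decide whether a synthetic sample counts as anonymized.

    sample_embedding:  SNN embedding of one synthetic sample, shape (d,)
    subject_centroids: per-subject cluster centroids in embedding space, shape (n, d)
    threshold:         distance below which a sample "matches" a subject
    k:                 k-anonymity parameter (k = 3 in the described pipeline)
    """
    # Euclidean distances from the sample to every subject centroid
    dists = np.linalg.norm(subject_centroids - sample_embedding, axis=1)
    matches = int(np.sum(dists < threshold))
    # Anonymized if the sample is sufficiently distant from all subject
    # clusters, or if it matches at least k subjects, so it cannot be
    # attributed to any single one (k-anonymity).
    return matches == 0 or matches >= k
```

A sample matching exactly one or two subjects would be rejected, since it could plausibly be linked back to an individual in the GAN training set.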