To address privacy protection and data sharing issues in Chinese medical texts, this paper introduces FLCMC, a federated learning approach for Chinese medical text classification. The paper first discusses data heterogeneity in federated language modeling. It then proposes two perturbed federated learning algorithms based on the self-attention mechanism, FedPA and FedPAP. In these algorithms, self-attention is incorporated into the model aggregation module, while a perturbation term that measures the difference between the client and the server is added to the local update module, together with a customized PAdam optimizer. To enable a fair comparison of algorithm performance, existing federated algorithms are also improved by integrating a customized Adam optimizer. Experiments on synthetic datasets first analyze hyperparameters, data heterogeneity, and validity; the results show that the proposed algorithms have significant advantages in classification performance and convergence stability on heterogeneous data. The algorithms are then applied to Chinese medical text datasets to verify their effectiveness on real data. A comparative analysis of algorithm performance and communication efficiency shows that they generalize well across deep learning models for Chinese medical texts. On the synthetic dataset, compared with the baselines FedAvg, FedProx, and FedAtt and their improved versions, both FedPA and FedPAP converge significantly more accurately and stably on data with general heterogeneity.
On IMCS-V2, a real Chinese medical dataset of doctor-patient conversations, with logistic regression and a long short-term memory (LSTM) network as training models, FedPA and FedPAP achieve the best accuracy relative to the three baselines and their improved versions and converge significantly more stably and accurately, indicating that the proposed method classifies Chinese medical texts more effectively.
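The two ingredients described above, attention-weighted aggregation on the server and a perturbation term in the local update, can be illustrated with a minimal sketch. This is not the paper's FedPA or FedPAP: the abstract does not give their exact formulas, so the sketch assumes a FedAtt-style softmax over server-client distances for aggregation and a FedProx-style penalty (mu/2)·||w − w_server||² as the perturbation term, and it swaps in plain gradient descent for the customized PAdam optimizer. The function names and the parameters `step`, `lr`, and `mu` are hypothetical.

```python
import numpy as np

def attention_weights(server, clients):
    """Softmax over the distances between the server and each client model.

    Assumption: models are flattened parameter vectors, and the attention
    score is the Euclidean distance (FedAtt-style), not the paper's formula.
    """
    dists = np.array([np.linalg.norm(server - c) for c in clients])
    scores = np.exp(dists - dists.max())  # shift for numerical stability
    return scores / scores.sum()

def aggregate(server, clients, step=1.0):
    """Move the server model toward the clients, weighted by attention."""
    alpha = attention_weights(server, clients)
    delta = sum(a * (server - c) for a, c in zip(alpha, clients))
    return server - step * delta

def local_step(w, grad, server, lr=0.1, mu=0.01):
    """One local update with a proximal perturbation term.

    The penalty (mu / 2) * ||w - server||^2 contributes mu * (w - server)
    to the gradient. Plain gradient descent stands in for PAdam here.
    """
    return w - lr * (grad + mu * (w - server))

if __name__ == "__main__":
    server = np.zeros(2)
    clients = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
    print(aggregate(server, clients))
    print(local_step(np.ones(2), np.zeros(2), server, lr=0.1, mu=0.5))
```

In this sketch, a larger `mu` keeps the local model closer to the server model, which is one common way to limit client drift on heterogeneous data.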