Abstract
To apply deep-learning classification to disease diagnosis, this paper proposes a disease classification model based on multimodal feature fusion. In this model, patients' chest X-ray images serve as the image modality and the corresponding disease descriptions serve as the text modality. Through a newly proposed adaptive multimodal attention mechanism, the feature vectors extracted from the two modalities are fused and passed to a classifier. To verify the effectiveness of the proposed model in disease classification, this paper uses the chest X-ray dataset from the Open-I database. Because this dataset is small and its classes are imbalanced, the SMOTE algorithm is applied to expand the samples, and an ablation study is designed to compare model variants. The results show that the model combining the image and text modalities with SMOTE-based sample expansion alleviates the overfitting and the low recall and F1 scores caused by the small, imbalanced dataset. Moreover, the classification accuracy of the multimodal model using both image and text improves by about 0.55% and 2.69% over the single-modal models using only images or only text, respectively. Likewise, adding the adaptive multimodal attention mechanism improves the model's classification accuracy by about 0.41% compared with feature fusion by simple vector concatenation.
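To make the fusion step concrete, the following is a minimal sketch of an adaptive attention layer that learns input-dependent weights for the two modalities before fusing their feature vectors. The feature dimensions (2048 for image features, 768 for text features), the fused dimension, and all names are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AdaptiveMultimodalAttention(nn.Module):
    """Sketch of adaptive attention fusion over image and text features.

    Dimensions are assumptions: e.g. 2048-d CNN image features and
    768-d transformer text features, projected to a shared space.
    """
    def __init__(self, img_dim=2048, txt_dim=768, fused_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, fused_dim)
        self.txt_proj = nn.Linear(txt_dim, fused_dim)
        # One scalar relevance score per modality, normalized by softmax.
        self.score = nn.Linear(fused_dim, 1)

    def forward(self, img_feat, txt_feat):
        h_img = torch.tanh(self.img_proj(img_feat))    # (B, fused_dim)
        h_txt = torch.tanh(self.txt_proj(txt_feat))    # (B, fused_dim)
        scores = torch.cat([self.score(h_img),
                            self.score(h_txt)], dim=1)  # (B, 2)
        alpha = torch.softmax(scores, dim=1)            # modality weights
        # Weighted sum instead of plain concatenation.
        fused = alpha[:, :1] * h_img + alpha[:, 1:] * h_txt
        return fused                                    # input to classifier
```

Unlike simple concatenation, the learned weights let the model lean on whichever modality is more informative for a given sample, which is consistent with the roughly 0.41% accuracy gain reported over concatenation.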
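The SMOTE expansion step can be reproduced with the imbalanced-learn library. The snippet below is a sketch using synthetic data as a stand-in for the extracted feature vectors, since the paper's exact preprocessing is not given here.

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Toy imbalanced data standing in for the extracted features (assumption).
X, y = make_classification(n_samples=500, n_features=32,
                           weights=[0.9, 0.1], random_state=42)
print("before:", Counter(y))

# SMOTE synthesizes minority-class samples by interpolating between a
# minority sample and its nearest minority-class neighbours.
smote = SMOTE(random_state=42)
X_res, y_res = smote.fit_resample(X, y)
print("after: ", Counter(y_res))
```

Balancing the classes this way addresses the low recall and F1 scores that the abstract attributes to the small, imbalanced Open-I sample.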