Abstract

Targeting the application of deep learning classification to disease diagnosis, this paper proposes a disease classification model based on multi-modal feature fusion. In this model, patients' chest X-ray images serve as the image modality and the corresponding disease descriptions serve as the text modality. An adaptive multi-modal attention mechanism is proposed to fuse the feature vectors extracted from the two modalities, and the fused representation is passed to a classifier. To verify the effectiveness of the proposed model, experiments are conducted on the chest X-ray dataset from the OpenI database. Because this dataset is small and its classes are imbalanced, the SMOTE algorithm is used to oversample minority classes, and an ablation study is designed to compare model variants. The results show that the model combining image and text modalities with SMOTE-based sample expansion alleviates the overfitting and the low recall and F1 scores caused by the small, imbalanced dataset. In addition, the classification accuracy of the multi-modal model is improved by about 0.55% and 2.69% over the single-modal models using only images or only text, respectively. Likewise, the adaptive multi-modal attention mechanism improves classification accuracy by about 0.41% compared with simple vector concatenation for feature fusion.
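The abstract does not detail the fusion architecture, so the following is a minimal sketch of one plausible form of adaptive multi-modal attention: both modalities are projected into a shared space, a learned score per modality is normalized with a softmax, and the fused vector is the weighted sum of the two projections. All dimensions, layer choices, and names here are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class AdaptiveMultimodalAttention(nn.Module):
    """Hypothetical sketch: learn input-dependent weights for the image and
    text feature vectors and fuse them by a weighted sum instead of plain
    concatenation."""

    def __init__(self, img_dim: int, txt_dim: int, fused_dim: int = 512):
        super().__init__()
        # Project both modalities into a shared space.
        self.img_proj = nn.Linear(img_dim, fused_dim)
        self.txt_proj = nn.Linear(txt_dim, fused_dim)
        # One score per modality; softmax over the two scores yields
        # adaptive fusion weights.
        self.score = nn.Linear(fused_dim, 1)

    def forward(self, img_feat: torch.Tensor, txt_feat: torch.Tensor) -> torch.Tensor:
        img_h = torch.tanh(self.img_proj(img_feat))   # (B, fused_dim)
        txt_h = torch.tanh(self.txt_proj(txt_feat))   # (B, fused_dim)
        scores = torch.cat([self.score(img_h), self.score(txt_h)], dim=1)  # (B, 2)
        weights = torch.softmax(scores, dim=1)                             # (B, 2)
        return weights[:, :1] * img_h + weights[:, 1:] * txt_h             # (B, fused_dim)

# Example: fuse a 2048-d CNN image embedding with a 768-d text embedding,
# then classify with a linear head (class count is a placeholder).
fusion = AdaptiveMultimodalAttention(img_dim=2048, txt_dim=768)
classifier = nn.Linear(512, 14)
img_feat = torch.randn(8, 2048)
txt_feat = torch.randn(8, 768)
logits = classifier(fusion(img_feat, txt_feat))
```

Compared with simple concatenation, this kind of weighting lets the model down-weight whichever modality is less informative for a given sample. For the SMOTE-based sample expansion mentioned above, a standard implementation is available in imbalanced-learn; `X_features` and `y_labels` below are placeholder names for the extracted feature matrix and class labels.

```python
from imblearn.over_sampling import SMOTE

# Oversample minority disease classes by synthesizing new feature vectors.
X_resampled, y_resampled = SMOTE(random_state=42).fit_resample(X_features, y_labels)
```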
