Humanity faces many health challenges, among which respiratory diseases are a leading cause of death. Existing AI-driven pre-diagnosis approaches can improve diagnostic efficiency but still face challenges: single-modal data suffers from information redundancy or loss, makes it difficult to learn relationships between features, and fails to reveal the obscure characteristics of complex diseases. It is therefore critical to explore a method that can assist clinicians in detecting lesions early and pre-diagnosing the corresponding diseases. This paper introduces a novel network structure, SCSCS-Net, which effectively extracts image features from chest X-ray images, generates medical image descriptions, assists clinicians in analyzing patients' medical imaging information, deeply explores potential disease characteristics, and supports pre-diagnostic decisions. SCSCS-Net consists of a reinforced cross-modal feature representation (RCMFR) model and a self-critical cross-modal alignment (SCCMA) model: the RCMFR model learns the feature interdependence between images and reports through a multi-subspace self-attention structure, while the SCCMA model guides the learning of report-generation strategies to improve the professionalism and consistency of medical terms in the generated reports. We compare our model with several state-of-the-art models on the same dataset, and the results demonstrate that our method achieves better performance. The clinical efficacy (CE) and natural language generation (NLG) metrics further confirm that the proposed method generates high-quality medical reports with higher clinical consistency. Our method has the potential to improve the early detection and pre-diagnosis of respiratory diseases, narrowing the gap between artificial intelligence technology and clinical medical diagnosis and opening the way to their deeper integration.
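The abstract does not give implementation details, but a "multi-subspace self-attention structure" is conventionally realized as multi-head attention: the model dimension is split into several subspaces, attention is computed independently in each, and the results are recombined. The sketch below, a minimal NumPy illustration assuming that standard formulation, is not the authors' implementation; all function names, weight matrices, and dimensions are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_subspace_self_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Standard multi-head self-attention: split d_model into
    `num_heads` subspaces, attend in each independently, recombine."""
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Reshape to (num_heads, seq_len, d_head): one attention map per subspace.
    split = lambda M: M.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)   # scaled dot-product
    heads = softmax(scores) @ Vh                            # (heads, seq, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo                                      # mix subspaces back

# Toy example: 10 image-region tokens with a 64-dim model split into 8 subspaces.
rng = np.random.default_rng(0)
d_model, seq_len, num_heads = 64, 10, 8
X = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(4))
out = multi_subspace_self_attention(X, Wq, Wk, Wv, Wo, num_heads)
print(out.shape)  # (10, 64)
```

In a cross-modal setting such as the one the paper describes, queries would come from one modality (e.g. report tokens) and keys/values from the other (image features), but the per-subspace mechanics are the same.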