Abstract

Disease diagnosis in chest X-ray images has predominantly relied on convolutional neural networks (CNNs). However, Vision Transformer (ViT) offers several advantages over CNNs, as it excels at capturing long-term dependencies, exploring correlations, and extracting features with richer semantic information. We adapted ViT for chest X-ray image analysis by making the following three key improvements: (I) employing a sliding window approach in the image sequence feature extraction module to divide the input image into blocks to identify small and difficult-to-detect lesion areas; (II) introducing an attention region selection module in the encoder layer of the ViT model to enhance the model's ability to focus on relevant regions; and (III) constructing a parallel patient metadata feature extraction network on top of the image feature extraction network to integrate multi-modal input data, enabling the model to synergistically learn and expand image-semantic information. The experimental results showed the effectiveness of our proposed model, which had an average area under the curve value of 0.831 in diagnosing 14 common chest diseases. The metadata feature network module effectively integrated patient metadata, further enhancing the model's accuracy in diagnosis. Our ViT-based model had a sensitivity of 0.863, a specificity of 0.821, and an accuracy of 0.834 in diagnosing these common chest diseases. Our model has good general applicability and shows promise in chest X-ray image analysis, effectively integrating patient metadata and enhancing diagnostic capabilities.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.