Abstract

Chest X-ray reporting aims to automatically generate textual descriptions for chest X-ray images, in which accurate detection and description of abnormalities are essential. However, the heavily biased data distribution (e.g., normal cases usually dominate the dataset over abnormal cases) poses a major challenge for data-driven neural models trying to generate satisfactory abnormality descriptions. To this end, we propose a contrastive triplet network (CTN) built on the Transformer architecture for automatic chest X-ray reporting that alleviates this data-bias problem. Our CTN enhances the modeling of abnormalities by comparing visual and semantic information between normal and abnormal cases via a triplet network. Specifically, triplets containing both normal and abnormal cases are first constructed. Then, visual tokens of the chest X-ray are extracted and fed to the Transformer to generate the associated report. During training, normal and abnormal cases are compared by contrasting: 1) the visual embedding of the chest X-ray image encoded by the Transformer encoder, and 2) the semantic embedding of the generated report encoded by a pre-trained textual encoder. Comprehensive experiments on two publicly available databases demonstrate the good performance of our method.
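The abstract describes contrasting normal and abnormal cases over triplets in two embedding spaces (visual and semantic) but does not give the exact loss. The sketch below is a minimal, hypothetical illustration assuming a standard triplet margin loss applied in both spaces; all tensor names, dimensions, and the margin value are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

def triplet_contrastive_loss(anchor: torch.Tensor,
                             positive: torch.Tensor,
                             negative: torch.Tensor,
                             margin: float = 1.0) -> torch.Tensor:
    """Standard triplet margin loss: pull the anchor toward the positive
    and push it away from the negative by at least `margin`."""
    d_pos = F.pairwise_distance(anchor, positive)  # anchor-positive distance
    d_neg = F.pairwise_distance(anchor, negative)  # anchor-negative distance
    return F.relu(d_pos - d_neg + margin).mean()

# Toy batch of embeddings (batch_size=4, dim=256). In the setup described
# by the abstract, these would be (1) visual embeddings from the Transformer
# encoder and (2) semantic embeddings of the generated reports from a
# pre-trained textual encoder; here they are random stand-ins.
v_anchor, v_pos, v_neg = (torch.randn(4, 256) for _ in range(3))
s_anchor, s_pos, s_neg = (torch.randn(4, 256) for _ in range(3))

# Contrast in both embedding spaces and combine, one plausible reading
# of the training objective sketched in the abstract.
loss = (triplet_contrastive_loss(v_anchor, v_pos, v_neg)
        + triplet_contrastive_loss(s_anchor, s_pos, s_neg))
```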
