Abstract

Computed Tomography Report Generation (CTRG) aims to generate a medical report for a series of radiological images, extending conventional X-ray report generation, which produces a single medical description from a single X-ray snapshot. Beyond the difficulties of the traditional task, CTRG requires the model to pick out the lesion regions from sequential scans and produce a fine-grained report that conforms to medical logic and common sense. Because available datasets are limited, few methods have tried to tackle this task. In addition, although densely aggregating sequential features may be beneficial, it also introduces extra noise. Moreover, radiology reports are long narratives composed of abnormality descriptions and template sentences, yet most studies ignore this hierarchical nature and generate the entire report uniformly. This paper aims to bridge these gaps from three perspectives. First, we develop two large-scale clinical datasets, CTRG-Brain-263K and CTRG-Chest-548K, which contain 263,670 brain CT scans and 548,696 chest CT scans, respectively, with authoritative diagnosis reports. Second, we design a self-attention-based Scan Localizer (SL) that captures the representation most reflective of the lesion area, and we introduce a reconstruction loss to minimize the distance between the focused and original scans. Finally, we propose a Dynamic Generator (DG) that decouples the decoder into abnormal and template branches, whose proposals are dynamically aggregated for the final generation. Experimental results confirm that the proposed SL-DG outperforms existing methods, with gains of about +5.2% and +0.4% CIDEr points on CTRG-Brain-263K and CTRG-Chest-548K, respectively.
