Abstract

• A practical way of achieving memory efficiency under restricted resources is proposed.
• A resolution calibration guideline with regard to input image size is given.
• Entropy-based label smoothing that accounts for high class imbalance is proposed.
• Channel- and spatial-wise attention modules use query, key, and value matrices.
• The attention modules are highly compatible with CNN architectures.

In thoracic disease classification, the original chest X-ray images are high resolution, yet existing convolutional neural network (CNN) models resize them to 224 × 224 before use. Because resizing compresses information excessively, diseases confined to local areas may not be sufficiently represented, so a higher resolution is required to capture local representations. Although large input resolutions reduce memory efficiency, previous studies have used CNNs with large inputs to improve classification performance. Moreover, chest X-ray datasets have highly imbalanced pathology labels, so optimization for imbalanced classes is required. This study therefore proposes the Dual Encoder based Transfer Network (DuETNet), which counters the inefficiency caused by large input resolutions and improves classification performance by adjusting the input size with the RandomResizedCrop method, an image transformation that crops a random area of a given image and resizes it to a given size. The resulting resolution calibration guideline offers a practical way to achieve memory efficiency and performance gains under restricted resources by adjusting the scale factor σ applied to the training and test images. To handle the high class imbalance, we propose an entropy-based label smoothing method that improves generalization for the imbalanced minor classes by penalizing the major classes. The dual encoder comprises channel and spatial encoders, which apply channel- and spatial-wise attention to enhance the relatively significant features of the adjusted images. Evaluated on the ChestX-ray14 and MIMIC-CXR-JPG datasets, DuETNet achieves new state-of-the-art performance.
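As a concrete illustration of the resolution-calibration idea, the sketch below uses torchvision's RandomResizedCrop, which the abstract names as the underlying transformation. The 512 × 512 target size and the scale factor σ = 0.8 are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch: resolution calibration via RandomResizedCrop (PyTorch/torchvision).
# The target size (512) and the scale factor sigma are illustrative assumptions.
import torchvision.transforms as T

sigma = 0.8  # assumed scale factor controlling how much of the image area is kept

train_transform = T.Compose([
    T.RandomResizedCrop(512, scale=(sigma, 1.0)),  # crop a random area, resize to 512x512
    T.ToTensor(),
])

test_transform = T.Compose([
    T.Resize(int(512 / sigma)),  # enlarge first so the center crop matches the training scale
    T.CenterCrop(512),
    T.ToTensor(),
])
```

Raising the crop size trades memory for finer local detail, which is the calibration knob the guideline refers to.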
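The abstract does not give the entropy-based label smoothing formula, only that major classes are penalized to help the minor ones. The sketch below assumes a simple surrogate scheme in which the per-class smoothing strength grows with class frequency, in the multi-label (binary cross-entropy) setting typical of chest X-ray labels; it is not the paper's exact method.

```python
# Hedged sketch: class-frequency-aware label smoothing for multi-label BCE.
# The per-class smoothing strength eps_c is a hypothetical choice that grows with
# class frequency, so frequent (major) classes receive stronger smoothing.
import torch
import torch.nn.functional as F

def frequency_aware_smoothing(labels: torch.Tensor, class_freq: torch.Tensor,
                              max_eps: float = 0.1) -> torch.Tensor:
    """Smooth binary targets, penalizing major classes more (assumed scheme)."""
    # class_freq: (C,) empirical positive rate per class in [0, 1]
    eps = max_eps * class_freq / class_freq.max()        # larger eps for major classes
    return labels * (1.0 - eps) + (1.0 - labels) * eps   # standard smoothing, per class

def smoothed_bce_loss(logits: torch.Tensor, labels: torch.Tensor,
                      class_freq: torch.Tensor) -> torch.Tensor:
    targets = frequency_aware_smoothing(labels, class_freq)
    return F.binary_cross_entropy_with_logits(logits, targets)
```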
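The dual encoder's attention is described as channel- and spatial-wise, built from query, key, and value matrices and compatible with CNN feature maps. The block below is a minimal channel-attention example illustrating that general pattern; it is not the paper's exact module.

```python
# Minimal sketch of a channel-wise QKV attention block over a CNN feature map.
# Illustrates the general idea (query/key/value attention applied channel-wise);
# not the paper's exact encoder.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        b, c, h, w = x.shape
        q = x.view(b, c, -1)                              # (B, C, HW) queries
        k = x.view(b, c, -1)                              # (B, C, HW) keys
        v = x.view(b, c, -1)                              # (B, C, HW) values
        attn = torch.softmax(q @ k.transpose(1, 2), dim=-1)  # (B, C, C) channel affinities
        out = (attn @ v).view(b, c, h, w)                 # reweighted channel features
        return out + x                                    # residual keeps CNN compatibility
```

A spatial-wise counterpart would attend over the H × W positions instead of the channels; using both, as the dual encoder does, covers the two axes of a CNN feature map.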
