Background and objective: In this study, we developed a machine-learning method that detects disease lesions on chest X-ray (CXR) images using a data set annotated with information extracted from CXR reports. We chose the nodule as the target lesion. Manually annotating nodules is time-consuming, so we used the report information to produce training data for the object detection task automatically.
Methods: First, we used the semantic segmentation model PSP-Net to segment the lung fields referred to in the CXR reports. Next, the classification model ResNeSt-50 was used to discriminate nodules in the segmented right and left lung fields; it also provides an attention map via Grad-CAM. If the attention region corresponded to the nodule location given in the CXR report, an attention bounding box was generated. Finally, the object detection model Faster R-CNN was trained on the generated attention bounding boxes. The bounding boxes predicted by Faster R-CNN were then filtered so that they agreed with the location extracted from the CXR reports.
Results: For lung field segmentation, our best model achieved a mean intersection over union of 0.889. A total of 15,156 chest radiographs were used for classification. The area under the receiver operating characteristic curve was 0.843 for the left lung and 0.852 for the right lung. The detection precision of the generated attention bounding boxes ranged from 0.341 to 0.531, depending on the binarization setting of the attention map. After the object detection step, the detection precision of the bounding boxes improved to between 0.567 and 0.800.
Conclusion: We successfully generated bounding boxes containing nodules on CXR images based on the positional information of the disease extracted from the CXR reports. Our method has the potential to provide bounding boxes for various lung lesions, which could reduce the annotation burden on specialists.
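The step from attention map to attention bounding box described above can be illustrated with a minimal sketch. It assumes the attention map is a 2-D array, that binarization is a single threshold on the min-max normalized map, and that the box encloses all pixels above that threshold. The function name `attention_to_box` and the threshold value are hypothetical; the abstract does not specify how the binarization or the box extraction was actually done.

```python
import numpy as np

def attention_to_box(attention, threshold=0.5):
    """Return (x_min, y_min, x_max, y_max) enclosing all pixels whose
    normalized attention is >= threshold, or None if no box can be formed.
    The thresholding rule is an assumption, not the authors' stated method."""
    att = np.asarray(attention, dtype=float)
    span = att.max() - att.min()
    if span == 0:
        # A flat map carries no localization information.
        return None
    norm = (att - att.min()) / span          # scale to [0, 1]
    ys, xs = np.nonzero(norm >= threshold)   # row and column indices of hot pixels
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy 6x6 attention map with one hot region (rows 2-3, columns 1-2).
attention = np.zeros((6, 6))
attention[2:4, 1:3] = 1.0
box = attention_to_box(attention, threshold=0.5)
```

Raising the threshold shrinks the box toward the strongest attention, which is one plausible reason the reported precision varies with the binarization setting.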
Short abstract: Machine learning for computer-aided image diagnosis requires annotated images, but manual annotation is time-consuming for medical doctors. In this study, we developed a machine-learning method that places bounding boxes around disease lesions on chest X-ray (CXR) images using the positional information extracted from CXR reports. We chose the nodule as the target lesion. First, we used PSP-Net to segment the lung field indicated by the CXR report. Next, the classification model ResNeSt-50 was used to discriminate nodules in the segmented lung field, and an attention map was created with the Grad-CAM algorithm. If the area of attention matched the area indicated by the CXR report, the corresponding bounding box was treated as a possible nodule area. Finally, we trained an object detection model on all of the bounding boxes generated from the attention information of the nodule classification model. The object detection step improved the precision with which the bounding boxes detect nodules.
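The check of whether a candidate box agrees with the report location can be sketched as follows. This is only an illustration under stated assumptions: the reported location is represented as a boolean mask (for example, the segmented lung field named in the report), and a box "matches" when its center falls inside that mask. Both the matching rule and the names `box_center_in_mask` and `filter_boxes` are hypothetical; the abstract does not state the exact criterion.

```python
import numpy as np

def box_center_in_mask(box, region_mask):
    """True if the center of box (x_min, y_min, x_max, y_max) lies inside
    region_mask. The center-based rule is an assumed matching criterion."""
    x_min, y_min, x_max, y_max = box
    cx = int((x_min + x_max) / 2)
    cy = int((y_min + y_max) / 2)
    height, width = region_mask.shape
    inside_image = 0 <= cx < width and 0 <= cy < height
    return inside_image and bool(region_mask[cy, cx])

def filter_boxes(boxes, region_mask):
    """Keep only the boxes that agree with the report-derived region."""
    return [b for b in boxes if box_center_in_mask(b, region_mask)]

# Toy example: the left half of a 10x10 image stands in for the
# lung field that the report says contains the nodule.
mask = np.zeros((10, 10), dtype=bool)
mask[:, :5] = True
boxes = [(1, 1, 3, 3), (6, 2, 8, 4), (3, 5, 5, 7)]
kept = filter_boxes(boxes, mask)
```

The same filter could be applied both to attention bounding boxes before training and to detector outputs afterwards, since both are compared against the location extracted from the report.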