Cytology, a type of pathological examination, involves sampling cells from the human body and observing the morphology of the nucleus, the cytoplasm, and the cell arrangement. Developing classification AI to support cytology requires collecting and using a diverse, unbiased range of images; in practice, however, this is often difficult because of epidemiologic bias in cancer types and cellular characteristics. The main aim of this study was to develop a method that generates cytological diagnostic images from textual image findings using text-to-image technology, thereby producing diverse images. For the proposed method, we collected Papanicolaou-stained specimens of lung cells from 135 lung cancer patients and extracted 472 patch images from them. Descriptions of the corresponding findings were compiled for each patch image to create a dataset, which was then used to fine-tune the Stable Diffusion (SD) v1 and v2 models. The cell images generated by this method closely resemble real images, and both cytotechnologists and cytopathologists gave them positive subjective evaluations. Furthermore, SDv2 reproduced the shapes and contours of nuclei and cytoplasm more faithfully than SDv1 and achieved superior scores on quantitative evaluation metrics. When the generated images were used in cytological image classification tasks, classification performance improved. These results indicate that the proposed method may be effective for generating high-quality cytological images, enabling an image classification model to learn diverse features and thereby improving classification performance.
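The pairing of patch images with finding descriptions described above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the filenames and finding captions are hypothetical, and the `metadata.jsonl` layout shown here is the image-folder convention commonly used by text-to-image fine-tuning scripts (one JSON object per line, with a `file_name` key and a caption field), which the study's dataset may or may not have used.

```python
import json
from pathlib import Path

# Hypothetical records pairing each patch image with its finding
# description (filenames and captions are illustrative only).
records = [
    {"file_name": "patch_0001.png",
     "text": "Papanicolaou stain; enlarged nuclei with coarse chromatin "
             "and a high N/C ratio."},
    {"file_name": "patch_0002.png",
     "text": "Papanicolaou stain; orangeophilic keratinized cytoplasm "
             "with irregular hyperchromatic nuclei."},
]

def write_metadata(records, out_dir):
    """Write metadata.jsonl in the image-folder layout often expected
    by text-to-image fine-tuning scripts: one JSON object per line."""
    out = Path(out_dir) / "metadata.jsonl"
    with out.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return out

metadata_path = write_metadata(records, ".")
```

A dataset laid out this way (images plus `metadata.jsonl` in one folder) can typically be loaded as an image-caption dataset and fed to a fine-tuning script for SD v1 or v2; the choice of caption wording directly controls which cytological findings the fine-tuned model can later render.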