Abstract

OBJECTIVE: To localize structural laryngeal lesions within digital flexible laryngoscopic images and to classify them as benign or suspicious for malignancy using state-of-the-art computer vision detection models. STUDY DESIGN: Cross-sectional diagnostic study. SETTING: Tertiary care voice clinic. METHODS: Digital stroboscopic videos, together with demographic and clinical data, were collected from patients evaluated for a structural laryngeal lesion. Laryngoscopic images were extracted from the videos and manually labeled with bounding boxes encompassing the lesion. Four detection models were used to simultaneously localize and classify structural laryngeal lesions in the laryngoscopic images. Classification accuracy, intersection over union (IoU), and mean average precision (mAP) were evaluated as measures of classification, localization, and overall performance, respectively. RESULTS: In total, 8,172 images from 147 patients were included in the laryngeal image dataset. Classification accuracy was 88.5% for individual laryngeal images and increased to 92.0% when all images belonging to the same sequence (video) were considered. Mean average precision across all four detection models was 50.1% using an IoU threshold of 0.5 to determine successful localization. CONCLUSION: Deep neural network-based detection models trained on a labeled dataset of digital laryngeal images have the potential to classify structural laryngeal lesions as benign or suspicious for malignancy and to localize them within an image. This approach shows which part of the image the model used to reach its prediction, allowing clinicians to independently evaluate the model's output.
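The two evaluation ideas named in the abstract, IoU-based localization and per-sequence classification, can be illustrated with a minimal Python sketch. This is not the study's code: the box format `(x_min, y_min, x_max, y_max)`, the function names, and in particular the majority-vote rule for combining image-level labels into a video-level label are assumptions made for illustration, since the abstract does not state how images from the same sequence were combined.

```python
from collections import Counter


def iou(box_a, box_b):
    """Intersection over union of two boxes.

    Boxes are assumed to be (x_min, y_min, x_max, y_max) tuples;
    this format is an illustrative choice, not taken from the study.
    """
    # Corners of the overlapping rectangle (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so non-overlapping boxes give zero intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_localized(pred_box, true_box, threshold=0.5):
    """A prediction counts as a successful localization when its IoU
    with the labeled box reaches the threshold (0.5 in the abstract)."""
    return iou(pred_box, true_box) >= threshold


def sequence_label(image_labels):
    """Combine per-image labels from one video into a single label.

    ASSUMPTION: simple majority vote. The abstract does not specify
    the aggregation rule actually used.
    """
    return Counter(image_labels).most_common(1)[0][0]
```

For example, `is_localized((0, 0, 10, 10), (0, 0, 10, 8))` yields an IoU of 0.8 and therefore counts as a successful localization, while `sequence_label(["benign", "suspicious", "benign"])` returns `"benign"` under the assumed voting rule.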
