The surge in AI models for diagnosing skin lesions through image analysis is notable, yet their clinical implementation faces challenges. Common limitations include overreliance on dermoscopy, limited real-world applicability when only a binary output (e.g. benign/malignant) is offered, and low accuracy on rare skin conditions. To address these constraints of limited diagnostic output and real-world applicability, we developed an All-In-One Hierarchical-Out of Distribution-Clinical Triage (HOT) AI model for skin lesion analysis. Trained on a large dataset of ~208,000 lesion images, the HOT AI model generates three outputs: a hierarchical three-level prediction, an alert for out-of-distribution (OOD) images, and a recommendation for dermoscopy to improve diagnostic prediction. The hierarchical output comprises a binary Level 1 prediction (benign/malignant), a Level 2 prediction across eight categories (e.g. melanocytic and keratinocytic), and a more definitive Level 3 prediction across 44 lesion categories. The model achieved high sensitivity for Level 1 prediction (88.14%, CI: 87.42-88.51), but sensitivity was significantly lower for Level 3 prediction (63.90%, CI: 62.27-65.61). Using consensus across all three prediction levels reduced Level 1 false positives by 20-25% and false negatives by 11-13%. OOD detection was benchmarked against previous landmark models and outperformed them. Lastly, 44% of images were recommended for dermoscopy, and with this additional image input, Level 3 sensitivity increased from 48.13% (CI: 45.08-49.57) to 52.54% (CI: 50.25-55.04). The HOT AI model attempts to address common challenges in existing models by combining three tasks in one model to increase accuracy and clinical utility. By providing a more nuanced prediction and an OOD alert, the model output offers greater explainability of the AI decision process.
Prospective clinical testing is required to measure how this additional output affects user trust and how the model performs in a real-world setting.
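The abstract does not say how agreement across the three prediction levels is checked. The sketch below shows one way such a consistency check could work and is not the authors' method. It assumes three things. First, each Level 3 class has exactly one parent Level 2 category and one implied Level 1 label. Second, a prediction counts as consensus only when all three levels agree. Third, a disagreeing case is flagged rather than given a Level 1 label. The taxonomy entries and the names `TAXONOMY`, `HierarchicalPrediction`, `is_consistent` and `consensus_level1` are hypothetical and illustrative. They are not the model's actual 44 classes.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical taxonomy fragment (illustrative only): each Level 3 class maps to
# its parent Level 2 category and its implied Level 1 (benign/malignant) label.
TAXONOMY = {
    "melanoma": ("melanocytic", "malignant"),
    "melanocytic_naevus": ("melanocytic", "benign"),
    "basal_cell_carcinoma": ("keratinocytic", "malignant"),
    "seborrhoeic_keratosis": ("keratinocytic", "benign"),
}


@dataclass
class HierarchicalPrediction:
    level1: str  # benign / malignant
    level2: str  # broad category, e.g. melanocytic
    level3: str  # specific lesion class


def is_consistent(pred: HierarchicalPrediction) -> bool:
    """True when the Level 3 class agrees with both the Level 2 and Level 1 outputs."""
    parent_l2, implied_l1 = TAXONOMY[pred.level3]
    return parent_l2 == pred.level2 and implied_l1 == pred.level1


def consensus_level1(pred: HierarchicalPrediction) -> Optional[str]:
    """Return the Level 1 label only under full consensus; None flags the case (assumed policy)."""
    return pred.level1 if is_consistent(pred) else None
```

For example, `HierarchicalPrediction("malignant", "melanocytic", "melanocytic_naevus")` returns `None`. The Level 3 class implies a benign lesion, so the malignant Level 1 output is not accepted on its own. Treating such disagreements as flags for review is one plausible reading of how consensus could cut false positives and false negatives. The abstract does not specify the actual resolution rule.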