Abstract

This study introduces a novel approach for the diagnosis of Cleft Lip and/or Palate (CL/P) by integrating Vision Transformers (ViTs) and Siamese Neural Networks. Our study is the first to employ this integration specifically for CL/P classification, leveraging the strengths of both models to handle complex, multimodal data and few-shot learning scenarios. Unlike previous studies that rely on single-modality data or traditional machine learning models, we uniquely fuse anatomical data from ultrasound images with functional data from speech spectrograms. This multimodal approach captures both structural and acoustic features critical for accurate CL/P classification. Employing Siamese Neural Networks enables effective learning from a small number of labeled examples, enhancing the model’s generalization capabilities in medical imaging contexts where data scarcity is a significant challenge. The models were tested on the UltraSuite CLEFT dataset, which includes ultrasound video sequences and synchronized speech data, across three cleft types: Bilateral, Unilateral, and Palate-only clefts. The two-stage model demonstrated superior performance in classification accuracy (82.76%), F1-score (80.00–86.00%), precision, and recall, particularly distinguishing Bilateral and Unilateral Cleft Lip and Palate with high efficacy. This research underscores the significant potential of advanced AI techniques in medical diagnostics, offering valuable insights into their application for improving clinical outcomes in patients with CL/P.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.