Abstract

INTRODUCTION: YouTube, the online multimedia platform, remains an untapped resource of surgical procedure content for education and research. Recent efforts to identify this content have lacked the diversity, volume, and depth of the site's expanding corpus of surgical videos. We leverage computer vision (CV) and natural language processing (NLP) techniques to index and characterize the vast amount of open (non-minimally invasive) surgical content on YouTube.

METHODS: Twenty-three surgical procedure terms were used to programmatically query YouTube, yielding 9,197 videos. Video-level metadata were collected from YouTube. Using a CV model and an NLP tool developed by our group, each video was analyzed for detailed scene understanding (e.g., action, tool, and hand detection), and medical and procedure tags were generated from the Unified Medical Language System (UMLS).

RESULTS: We identified 477, 270, and 556 videos related to head and neck, breast, and gastrointestinal surgery, respectively. In total, 10.1 million frames were analyzed, detecting 11.9 million hands, 3.9 million surgical tools, and 5.6 million instances of surgical activity. An additional 59,776 UMLS tags were generated from video metadata. UMLS tags and temporal "surgical signatures" were generated for each type and category of surgical procedure (Fig. 1).

CONCLUSION: Using novel CV and NLP techniques, we efficiently identified and characterized a large volume of open surgical videos in an automated way that exceeds previous efforts in scope, volume, and depth. This technique enables the creation of a continuously expanding, comprehensive surgical video repository to enhance surgical education and research.
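The abstract does not specify how the temporal "surgical signatures" were constructed, but one plausible reading is that per-frame detections (tools, hands, actions) are aggregated over time bins within each video. The sketch below illustrates that idea only; the function name, binning scheme, and input format are assumptions, not the authors' method.

```python
from collections import Counter

def temporal_signature(frame_labels, n_bins=4):
    """Collapse per-frame detection labels into a coarse temporal
    'signature': for each time bin, the fraction of frames in which
    each label (e.g., a tool or action class) was detected.

    frame_labels: list with one entry per video frame, each entry a
    set of detected labels. (Illustrative only; the actual signature
    construction is not described in the abstract.)
    """
    n = len(frame_labels)
    bins = []
    for b in range(n_bins):
        # Evenly partition the frame sequence into n_bins segments.
        start = b * n // n_bins
        end = (b + 1) * n // n_bins
        counts = Counter()
        for labels in frame_labels[start:end]:
            counts.update(labels)
        size = max(end - start, 1)
        # Normalize counts to per-bin detection frequencies.
        bins.append({label: c / size for label, c in counts.items()})
    return bins
```

A signature of this kind could then be compared across procedure types, e.g., a thyroidectomy video might show scalpel detections concentrated early and suturing activity late.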
