Abstract

Advances in computing power and the availability of digital data have led to significant progress in artificial intelligence (AI) algorithms. As a result, novel applications of AI in healthcare continue to surface at a rapid pace in both the scientific community and the lay press. AI is the field of computer science that focuses on the development of algorithms that enable high-level and rational response, interaction, and advanced cognitive and perceptual functions by machines. One area of AI that has particularly burgeoned over the last decade is computer vision (CV), an interdisciplinary scientific field that deals with how computers can gain a high-level understanding of digital images or videos and perform functions such as object identification and tracking and scene recognition.1

Various fields in medicine have had significant success in the development of AI models capable of performing a variety of diagnostic functions using CV (eg, identifying abnormalities in diagnostic radiology, identifying malignant skin lesions, and interpreting electrocardiograms), and there is potential for similar success in procedural specialties such as surgery. Clinicians and innovators alike have sought to develop AI algorithms capable of improving our ability to provide therapeutic interventions, such as real-time decision support and computer-assisted surgery.

The number of scientific publications involving AI has increased steadily over the past decade, and many AI algorithms for medical applications have been approved for use by the Food and Drug Administration.2 However, despite early successes with this new technology, there are concerns regarding the most appropriate methodology for the design, development, and validation of AI algorithms.
Furthermore, the existing literature suffers from a methodological “black box” caused by incomplete reporting.2 Therefore, more transparency and interpretability of AI-based clinical research in medicine are necessary.

The Consolidated Standards of Reporting Trials (CONSORT) and Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT) guidelines have been extended to AI studies through CONSORT-AI3 and SPIRIT-AI.4 The Standards for Reporting of Diagnostic Accuracy Studies (STARD) and Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) will also be extended to STARD-AI5 and TRIPOD-ML.6 In addition, the Minimum Information about Clinical Artificial Intelligence Modeling (MI-CLAIM)7 and Minimum Information for Medical AI Reporting (MINIMAR)8 checklists have been published as minimum reporting guidelines that aim to standardize medical AI research in terms of transparency and utility. Because the majority of AI interventions in medicine involve computer-assisted diagnosis, the existing reporting guidelines have focused on studies in that area, such as diagnostic accuracy, prediction models, clinical decision support, and implementations in clinical trials (Table 1). Little attention has been paid to surgical applications of AI algorithms that could assist in decision-making based on CV analysis of operative performance.

TABLE 1 - Reporting Guidelines for Studies Involving Artificial Intelligence and Machine Learning

| Reporting Guideline (Year of Publication) | Type of Study | Number of Items | Concept/Intended Use |
| --- | --- | --- | --- |
| CONSORT-AI (2020) | RCT | 25 (5 extended items) | Recommendations for investigators to provide clear descriptions of the AI intervention regarding instructions and skills required for use, the setting in which the AI intervention is integrated, the handling of inputs and outputs of the AI intervention as well as human-AI interaction, and analysis of error cases. |
| SPIRIT-AI (2020) | Study protocols of clinical trials | 33 (7 extended items) | |
| MI-CLAIM (2020) | Clinical AI modeling | 6 | Developed to better inform readers and users about the machine learning models themselves, especially regarding their design and validation using a retrospective study. |
| MINIMAR (2020) | AI in healthcare (classification or prediction) | 4 | The minimum information necessary to understand intended predictions, target populations, and hidden biases, and the ability to generalize technologies. |
| TRIPOD-ML (underway) | Prediction model studies | – | An introduction of machine learning prediction algorithms, building on a long and established methodology of prediction research. |
| STARD-AI (underway) | Diagnostic accuracy studies | – | An attempt to address the issues and challenges raised by AI-driven diagnostic modalities, including unclear methodological interpretations, lack of standardized nomenclatures, and heterogeneity of outcome measures. |
| Computer Vision in Surgery International Collaborative | AI-based CV in surgery | – | Assessment of surgical procedure and assistance using the AI-based CV approach to build a standardized reporting methodology, while harmonizing terminology. |

AI indicates artificial intelligence; CV, computer vision; RCT, randomized controlled trial.

Recent advances in AI-based approaches to CV (eg, convolutional deep neural networks) have led to the development of several AI algorithms that can analyze and make interpretations within the operative field.9–11 Over 300 publications related to CV in surgery have been published, most of them in the last few years. While the aforementioned reporting guidelines such as CONSORT-AI and SPIRIT-AI have unequivocal roles in promoting high-quality reporting of data for AI research in medicine, there are nuances to research in CV in surgery that require more specialized guidelines. Given that AI-based CV in surgery is a relatively new field, methodological standards are lacking in the scientific and surgical communities.
The lack of reporting guidelines specific to research and innovation in this field is a major obstacle to the production of scientific work that is interpretable, reproducible, and scalable. Researchers may struggle with reporting their methodology, data collection, training, and testing of AI algorithms. Similarly, journal editors and peer reviewers may have difficulty critically appraising manuscripts to determine whether the findings can be generalized or interpreted by their readership (mostly surgeons who lack a technical background in this field). Because research and innovation in AI-based CV are inherently multidisciplinary, collaboration among clinicians, engineers, and data scientists is crucial, and such guidelines need the input of all stakeholders.

Studies involving AI-based CV in surgery have several issues that need to be addressed by these stakeholders. Chief among them are the technical and nontechnical characteristics of the surgical videos used in training and testing datasets (eg, number and characteristics of patients, surgeons, and institutions from which the data are procured) and the real-time performance characteristics of the model (eg, inference speed and computational requirements). Moreover, details need to be specified with regard to data annotation (eg, definitions of the clinical phenomena being annotated, the number and clinical experience of annotators, and interannotator reliability12,13). Most AI-based CV models require content expertise for data annotation and training of AI algorithms designed to perform specialized functions. Therefore, quality assurance measures for data annotation need to be established a priori to ensure model integrity. Furthermore, standardized reporting criteria for annotation procedures for surgical videos are necessary to enable transparent reporting and appropriate interpretation.
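
To make the notion of interannotator reliability concrete, the following is a minimal sketch of one commonly used agreement statistic, Cohen's kappa, computed for two annotators who labeled the same video frames. The choice of Cohen's kappa, the function name, and the per-frame phase labels are illustrative assumptions; the text above does not prescribe a particular statistic.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical per-frame surgical phase labels from two annotators.
annotator_1 = ["dissection", "dissection", "clipping", "clipping", "idle", "dissection"]
annotator_2 = ["dissection", "clipping", "clipping", "clipping", "idle", "dissection"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.3f}")
```

A study with more than two annotators, or with spatial annotations such as segmentation masks, would likely need a different measure; the point of the sketch is only that the statistic and the labeling unit both have to be stated for the reported agreement to be interpretable.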
Other important considerations include data privacy as well as the ethical and responsible use of this technology for patient care. Much of the existing literature has focused on model development and performance; however, it is imperative that ongoing research efforts in AI-based CV also generalize to diverse populations and adhere to ethically sound guidelines. As future infrastructure for intraoperative video data collection and sharing between institutions continues to be developed, these principles need to be clarified and incorporated into best-practice guidelines.

To address this important gap in surgical research, the Computer Vision in Surgery International Collaborative is under development. This collaboration will be composed of a multidisciplinary group of experts and stakeholders whose mission is to develop guidelines for the reporting of research and innovation specific to AI-based CV in surgery. The central objective is to devise a standardized, minimum set of requirements for reporting methodology and results in the publication of scientific work on CV in surgery. Given the unique nature of surgical videos, these reporting guidelines will focus on video and image analysis of surgical procedures using AI algorithms for performing CV tasks. Just as the quality of randomized controlled trials has greatly improved and contributed to the construction of robust evidence for medical practice since the first version of CONSORT was developed in 1996, these guidelines should help promote the reliability, transparency, and completeness of published works and improve their readability and interpretability for the readership. Ultimately, we hope they will contribute to the development of the field itself.

While the intended guideline specifically covers CV research in the field of surgery, it would not be limited to a specific clinical study design or phase. Rather, it would work in tandem with other guidelines under development.
The SPIRIT-AI and CONSORT-AI guidelines were developed for clinical trials with AI interventions. SPIRIT-AI, which aims to promote transparency and completeness in clinical trial protocols for AI interventions, is complementary to the CONSORT-AI statement. STARD-AI and TRIPOD-ML are sets of reporting standards for diagnostic accuracy studies and prediction model studies using AI, respectively, and cover the phase of development and technical validation in silico. However, research in AI-based CV in surgery raises specific issues that apply regardless of study design and phase. Therefore, it is expected that these guidelines can be used in an overlapping manner with existing guidelines and may provide value alongside other, more generic guidelines.

As part of a scoping review to identify items to include within reporting guidelines, references were retrieved from electronic databases (PubMed, MEDLINE, and Web of Science) for published articles and abstracts investigating AI-based CV in surgery, published from 2017 to 2020. The literature search criteria to identify the relevant studies were as follows: (surgery) AND (video) AND ((deep learning) OR (convolutional neural network) OR (computer vision)). Papers whose titles were considered irrelevant to the scope of this review were excluded, and only those published in English were reviewed.
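
The Boolean logic of the search criteria above can be sketched as a simple filter over bibliographic records. This is an illustration under stated assumptions, not the authors' search pipeline: the record fields (`title`, `abstract`, `year`, `language`), the sample records, and the use of plain substring matching are all hypothetical, and real database searches rely on indexed fields rather than raw text matching. The manual exclusion of papers with irrelevant titles is not modeled.

```python
METHOD_TERMS = ("deep learning", "convolutional neural network", "computer vision")


def matches_query(text):
    """(surgery) AND (video) AND ((deep learning) OR (convolutional neural network) OR (computer vision))."""
    t = text.lower()
    return "surgery" in t and "video" in t and any(term in t for term in METHOD_TERMS)


def screen(records, first_year=2017, last_year=2020):
    """Keep English-language records in the date range that satisfy the query."""
    return [
        r for r in records
        if first_year <= r["year"] <= last_year
        and r["language"] == "English"
        and matches_query(r["title"] + " " + r["abstract"])
    ]


# Hypothetical records, for illustration only.
records = [
    {"title": "Deep learning for phase recognition in laparoscopic surgery video",
     "abstract": "", "year": 2019, "language": "English"},
    {"title": "Computer vision for surgery video analysis",
     "abstract": "", "year": 2015, "language": "English"},
    {"title": "Outcomes after open surgery",
     "abstract": "A cohort study.", "year": 2018, "language": "English"},
    {"title": "Convolutional neural network for tool tracking in surgery video",
     "abstract": "", "year": 2020, "language": "German"},
]

kept = screen(records)
for r in kept:
    print(r["year"], r["title"])
```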
Through a preliminary scoping review to identify candidate items to include within reporting guidelines, the following 4 themes were identified:

1) study context (study design, study phase, and surgical procedure details);
2) dataset and annotation (dataset details, cohort characteristics, and annotation details);
3) model, evaluation, and validation (computer vision task, model optimization, computer specification, evaluation, and validation);
4) ethical and regulatory processes (institutional review board approval, informed consent, video dataset availability, anonymization, and code availability).

The list of reporting items will be drafted using a systematic mixed-method approach. Focus groups for each theme will be formed, and qualitative data from these discussions will be synthesized into a comprehensive list of items. Finally, consensus will be established using a modified Delphi methodology to draft a finalized list of reporting guidelines. The consensus-building process will be developed in close collaboration with key stakeholders, including surgeons, computer scientists, AI engineers, journal editors, bioethicists, legal experts, global health experts, and patient advocates. To ensure a diversity of opinions, the multinational group will be composed of members from a broad range of demographics, with representation from most continents (with the exception of Antarctica).

AI-based research in healthcare has grown exponentially, and its application in CV and surgical procedures is gaining significant momentum. The growing popularity of minimally invasive surgery (eg, laparoscopy and robotic surgery) as well as the increase in the storage capacity and transfer of intraoperative data have brought us closer to a new age in digital surgery that requires rigorous surgical data science to ensure high-quality evidence for its adoption.
The lack of standards for the reporting of studies in AI-based CV research for surgery may slow the development, evaluation, and adoption of these technologies and may limit hopes of using them to realize image-guided surgery, intraoperative decision support systems, and autonomous surgical platforms. As this field of research continues to grow, we hope that the Computer Vision in Surgery International Collaborative can help establish best practices to guide future work and ensure that this technology is developed and implemented in a scientifically sound, responsible, and ethical manner for the benefit of patients and the global surgical community.
