Functional outcomes for cochlear implant (CI) users are challenging to predict because of variability in individual anatomy, neural health, CI device characteristics, and linguistic and listening experience. Machine learning (ML) techniques are uniquely poised for this predictive challenge because they can analyze nonlinear interactions using large amounts of multidimensional data. The objective of this article is to systematically review the literature regarding ML models that predict functional CI outcomes, defined as sound perception and production. We analyze the potential strengths and weaknesses of various ML models, identify important features for favorable outcomes, and suggest potential future directions of ML applications for CI-related clinical and research purposes. We conducted a systematic literature search of Web of Science, Scopus, MEDLINE, EMBASE, CENTRAL, and CINAHL from database inception through September 2024. We included studies with ML models predicting a CI functional outcome (one pertaining to sound perception or production) and excluded simulation studies and those involving patients without CIs. Following Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines, we extracted participant population, CI characteristics, ML model, and performance data. Sixteen studies examining 5058 pediatric and adult CI users (range: 4 to 2489) were included from an initial 1442 publications. Studies predicted heterogeneous outcome measures pertaining to sound production (5 studies), sound perception (12 studies), and language (2 studies). The ML models used a variety of predictive features, including demographic, audiological, imaging, and subjective measures.
Some studies highlighted predictors beyond traditional CI audiometric outcomes, such as anatomical and imaging characteristics (e.g., vestibulocochlear nerve area, brain regions unaffected by auditory deprivation), health system factors (e.g., wait time to referral), and patient-reported measures (e.g., dizziness and tinnitus questionnaires). The ML models used were tree-based, kernel-based, instance-based, probabilistic, or neural networks, and the most common validation and test methods were k-fold cross-validation and train-test split. Various statistical measures were used to evaluate model performance; among studies reporting accuracy, the best-performing model in each study achieved between 71.0% and 98.83%. ML models demonstrate high predictive performance and illuminate factors that contribute to CI user functional outcomes. While many models showed favorable evaluation statistics, the majority were not adequately reported with regard to dataset characteristics, model creation, and validation. Furthermore, the extent of overfitting in these models is unclear, and overfitting would likely result in poor generalization to new data. This suggests the need for more robust validation procedures and standardization in reporting, with the ultimate hope that the iterative improvement of these models will allow for their adoption as a future clinical tool.