Objective
To develop and evaluate machine learning (ML) approaches for identifying muscles from intraoperative motor evoked potentials (MEPs), and to compare their performance with that of human experts.

Background
There is an untapped opportunity to apply ML analytic techniques to intraoperative neuromonitoring (IOM). MEPs are ideal candidates because their correct interpretation is critical during brain or spine surgery. In this work, we develop and test a set of ML models for muscle identification from intraoperative MEPs and compare their performance with that of human experts. We also review the available literature on current ML applications to IOM data in neurosurgery.

Methods
We trained and tested five ML classifiers on an MEP database recorded from six muscles in patients who underwent brain or spinal cord surgery. MEPs were obtained with both transcranial electrical stimulation (TES) and direct cortical stimulation (DCS) protocols. The models were evaluated within single patients and on previously unseen patients, considering TES and DCS signals both separately and combined. Ten expert neurophysiologists classified a set of 50 randomly selected MEPs, and their performance was compared with that of the best-performing model.

Results
A total of 25,423 MEPs were included in the study. Random Forest was the best-performing model, with 99 % accuracy on the single-patient task and 78 %–94 % accuracy on previously unseen patients. Performance was highest when MEPs were represented by features typically used in signal processing rather than by traditional neurophysiological parameters. In classifying MEPs among six muscles across acquisition modalities, the Random Forest model (79 % accuracy) significantly outperformed human experts (mean 48 %).

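The abstract does not list the exact features or the cross-patient protocol, so the following is only a minimal sketch under stated assumptions. It computes a handful of generic signal-processing features from a single MEP trace and builds leave-one-patient-out splits, which is one plausible way to test on previously unseen patients. The helpers `signal_features` and `leave_one_patient_out`, the feature names, the muscle labels, and the toy traces are all hypothetical and are not the study's pipeline.

```python
import math
from collections import defaultdict


def signal_features(trace, fs_hz):
    """Compute a few generic signal-processing features for one MEP trace.

    Hypothetical feature set for illustration; not the study's actual features.
    `trace` is a list of amplitude samples, `fs_hz` the sampling rate in Hz.
    """
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    return {
        # Root-mean-square amplitude of the raw trace.
        "rms": math.sqrt(sum(x * x for x in trace) / n),
        # Standard deviation after removing the mean (DC offset).
        "std": math.sqrt(sum(c * c for c in centered) / n),
        # Peak-to-peak amplitude, also a classic neurophysiological parameter.
        "peak_to_peak": max(trace) - min(trace),
        # Sign changes of the mean-removed signal (a crude frequency proxy).
        "zero_crossings": sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0),
        # Line length: sum of absolute sample-to-sample differences.
        "line_length": sum(abs(b - a) for a, b in zip(trace, trace[1:])),
        # Time of the largest absolute deflection, in milliseconds
        # (the first such sample if several tie).
        "peak_time_ms": 1000.0 * max(range(n), key=lambda i: abs(trace[i])) / fs_hz,
    }


def leave_one_patient_out(records):
    """Yield (patient, train, test) splits so no patient is in both sets.

    Each record is a dict with at least a "patient" key. Assumes a
    leave-one-patient-out scheme for the "unseen patients" evaluation.
    """
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec["patient"]].append(rec)
    for held_out in sorted(by_patient):
        test = by_patient[held_out]
        train = [r for p, rs in by_patient.items() if p != held_out for r in rs]
        yield held_out, train, test


if __name__ == "__main__":
    fs = 10_000  # hypothetical sampling rate in Hz
    # Synthetic toy records only; not MEPs from the study.
    records = [
        {"patient": "P1", "muscle": "muscle_A", "trace": [0.0, 0.2, -0.5, 0.3, 0.0]},
        {"patient": "P1", "muscle": "muscle_B", "trace": [0.0, -0.1, 0.4, -0.2, 0.1]},
        {"patient": "P2", "muscle": "muscle_A", "trace": [0.1, 0.3, -0.6, 0.2, 0.0]},
        {"patient": "P3", "muscle": "muscle_B", "trace": [0.0, -0.2, 0.5, -0.1, 0.0]},
    ]
    for held_out, train, test in leave_one_patient_out(records):
        X_train = [signal_features(r["trace"], fs) for r in train]
        y_train = [r["muscle"] for r in train]
        # X_train / y_train would be passed to a classifier here.
        print(held_out, len(X_train), len(test), sorted(set(y_train)))
```

The resulting feature vectors could then be fed to a classifier such as scikit-learn's `RandomForestClassifier`. Grouping the splits by patient (scikit-learn's `LeaveOneGroupOut` implements the same idea) prevents MEPs from one patient from appearing in both the training and test sets, which would otherwise inflate cross-patient accuracy.
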
Conclusions
Carefully selected ML models reliably extracted meaningful information for classifying intraoperative MEPs from a limited number of features. They proved robust across patients and signal acquisition modalities, outperformed human experts, and have the potential to act as decision support systems for the IOM team. These encouraging results pave the way for further exploration of the underlying nature of clinically important signals, with the aim of producing useful applications that make surgery safer and more efficient.