Background
Wait times affect patient satisfaction, treatment effectiveness, and the efficiency of the care that patients receive. Wait time prediction in mental health is a complex task, complicated by the difficulty of predicting the number of treatment sessions an outpatient will need, by high no-show rates, and by the possibility of using group treatment sessions. Wait time analysis becomes even more challenging when the input data have low utility, which happens when the data are highly deidentified by removing both direct and quasi identifiers.

Objective
The first aim of this study was to develop machine learning models to predict the wait time from referral to the first appointment for psychiatric outpatients by using real-time data. The second aim was to enhance the performance of these predictive models by utilizing the system's knowledge when the input data were highly deidentified. The third aim was to identify the factors that drove long wait times, and the fourth aim was to build these models such that they were practical and easy to implement (and therefore attractive to care providers).

Methods
We analyzed retrospective, highly deidentified administrative data from 8 outpatient clinics at Ontario Shores Centre for Mental Health Sciences in Canada by using 6 machine learning methods to predict the first appointment wait time for new outpatients. We used the system's knowledge to mitigate the low utility of our data. The data included 4187 patients who received care through 30,342 appointments.

Results
The average wait time varied widely between different types of mental health clinics. For more than half of the clinics, the average wait time was longer than 3 months. The number of scheduled appointments and the rate of no-shows also varied widely among clinics. Despite these variations, the random forest method provided the minimum root mean square error for 4 of the 8 clinics and the second-lowest root mean square error for the other 4 clinics.
Utilizing the system's knowledge increased the utility of our highly deidentified data and improved the predictive power of the models.

Conclusions
The random forest method, enhanced with the system's knowledge, provided reliable wait time predictions for new outpatients, despite the low utility of the highly deidentified input data and the high variation in wait times across different clinics and patient types. The priority system was identified as a factor that contributed to long wait times, and a fast-track system was suggested as a potential solution.
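
The model comparison reported above ranks methods by root mean square error within each clinic. As a minimal sketch of that kind of ranking, the following Python snippet computes the error for two candidate models and sorts them from best to worst. The function names, the model labels, and all numbers are hypothetical placeholders chosen for illustration; they are not the study's data, models, or results.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def rank_models(actual, predictions_by_model):
    """Return (model names sorted by ascending RMSE, dict of RMSE per model)."""
    scores = {name: rmse(actual, preds) for name, preds in predictions_by_model.items()}
    return sorted(scores, key=scores.get), scores


# Made-up wait times in days for one hypothetical clinic (not study data).
actual_days = [30, 95, 120, 60]
toy_predictions = {
    "model_a": [35, 90, 110, 65],
    "model_b": [50, 70, 150, 40],
}

ranking, scores = rank_models(actual_days, toy_predictions)
for name in ranking:
    print(f"{name}: RMSE = {scores[name]:.2f} days")
```

Repeating this ranking separately for each clinic would show how often a given method comes first or second, which is the form in which the results above are stated.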