Appointment no-shows disrupt healthcare clinics and may increase patient waiting time and clinic overtime, resulting in higher clinic costs. Appointment scheduling models typically mitigate the negative effects of no-shows through appointment overbooking. Recent work has proposed a predictive overbooking framework, in which a probabilistic classifier predicts the no-show probability of each appointment request and a scheduling algorithm uses those predictions to schedule appointments optimally. Because predicting no-shows is typically an imbalanced classification problem, the preferred classifier is often chosen based on the area under the receiver operating characteristic curve (AUC), a metric commonly used for many other imbalanced classification problems. Contrary to this intuition, we show that selecting a classifier by AUC results in significantly lower schedule efficiency than selecting by other metrics such as Log Loss or Brier Score. Our computational experiments, validated on large real-world appointment data, suggest that by using Log Loss or Brier Score instead of AUC, practitioners can improve schedule quality by 3–7%.
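
One way to see why these metrics can disagree is that AUC depends only on how the predicted probabilities rank the appointments, whereas Log Loss and Brier Score also penalize probabilities that are numerically far from the observed outcomes. The following is a minimal sketch of that distinction and is not taken from the experiments in this paper: the labels, the probability vectors, and the function names (`auc`, `log_loss`, `brier_score`) are hypothetical and chosen purely for illustration.

```python
import math


def auc(y_true, p):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(p, y_true) if y == 1]
    neg = [s for s, y in zip(p, y_true) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def log_loss(y_true, p, eps=1e-15):
    """Mean negative log-likelihood of the labels under the predicted probabilities."""
    total = 0.0
    for y, q in zip(y_true, p):
        q = min(max(q, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(q) + (1 - y) * math.log(1 - q)
    return total / len(y_true)


def brier_score(y_true, p):
    """Mean squared difference between predicted probability and outcome."""
    return sum((q - y) ** 2 for y, q in zip(y_true, p)) / len(y_true)


# Hypothetical labels: 1 = no-show, 0 = show.
y = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]

# Hypothetical predictions from a first classifier.
p_a = [0.05, 0.10, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.60, 0.80]

# A second set of predictions obtained by a strictly increasing distortion:
# the ranking is identical, but every probability is inflated.
p_b = [q ** 0.25 for q in p_a]

for name, p in [("A", p_a), ("B", p_b)]:
    print(name, "AUC:", auc(y, p), "LogLoss:", log_loss(y, p), "Brier:", brier_score(y, p))
```

In this toy example both sets of predictions have the same AUC because their rankings coincide, while the inflated probabilities in `p_b` yield a worse Log Loss and Brier Score. A scheduling algorithm that consumes the probabilities themselves, rather than only their order, would therefore treat the two classifiers very differently even though AUC cannot tell them apart.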