Infrared thermographs (IRTs) implemented according to standardized best practices have shown strong potential for detecting elevated body temperatures (EBT), which may be useful in clinical settings and during infectious disease epidemics. However, optimal IRT calibration methods have not been established, and the clinical performance of these devices relative to the more common non-contact infrared thermometers (NCITs) remains unclear. In addition to confirming the findings of our preliminary analysis of clinical study results, the primary intent of this study was to compare methods for IRT calibration and identify best practices for assessing the performance of IRTs intended to detect EBT. A key secondary aim was to compare IRT clinical accuracy to that of NCITs. We performed a clinical thermographic imaging study of more than 1000 subjects, acquiring temperature data from several facial locations that, along with reference oral temperatures, were used to calibrate two IRT systems based on seven different regression methods. Oral temperatures imputed from facial data were used to evaluate IRT clinical accuracy based on metrics such as clinical bias, repeatability, root-mean-square difference, and sensitivity/specificity. We proposed several calibration approaches designed to account for the non-uniform data density across the temperature range, and a constant offset approach tended to show a better ability to detect EBT. As in our prior study, inner canthi or full-face maximum temperatures provided the highest clinical accuracy. With an optimal calibration approach, these methods achieved a clinical bias within ±0.03 °C with a standard deviation of less than 0.3 °C, and sensitivity/specificity between 84% and 94%. Results of forehead-center measurements with NCITs or IRTs indicated reduced performance. An analysis of the complete clinical data set confirms the essential findings of our preliminary evaluation, with minor differences.
Our findings provide novel insights into methods and metrics for the clinical accuracy assessment of IRTs. Furthermore, our results indicate that calibration approaches providing the highest clinical accuracy in the 37–38.5 °C range may be most effective for measuring EBT. While device performance depends on many factors, IRTs can provide superior performance to NCITs.
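As an illustration of the metrics named above, the following minimal sketch fits a single additive offset between facial and oral temperatures and then computes bias, standard deviation, root-mean-square difference, and sensitivity/specificity. It is not the study's implementation: the function names, the synthetic readings, the 37.5 °C fever threshold, and the definition of bias as the mean of estimated minus reference oral temperature are all assumptions made for this example, and the offset here is fitted over all samples rather than over a restricted temperature range.

```python
import math
from statistics import mean, stdev


def fit_constant_offset(facial, oral):
    """Return one additive offset: the mean oral-minus-facial difference."""
    return mean([o - f for f, o in zip(facial, oral)])


def accuracy_metrics(estimated, oral, threshold=37.5):
    """Compare estimated oral temperatures against reference oral readings.

    threshold is a hypothetical fever cutoff in degrees Celsius.
    """
    diffs = [e - o for e, o in zip(estimated, oral)]
    pairs = list(zip(estimated, oral))
    tp = sum(1 for e, o in pairs if o >= threshold and e >= threshold)
    fn = sum(1 for e, o in pairs if o >= threshold and e < threshold)
    tn = sum(1 for e, o in pairs if o < threshold and e < threshold)
    fp = sum(1 for e, o in pairs if o < threshold and e >= threshold)
    return {
        "bias": mean(diffs),                               # mean difference
        "sd": stdev(diffs),                                # sample SD of differences
        "rmsd": math.sqrt(mean([d * d for d in diffs])),   # root-mean-square difference
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Synthetic readings in degrees Celsius, for illustration only.
facial = [34.1, 34.6, 35.0, 35.4, 36.0, 36.3]
oral = [36.6, 36.9, 37.2, 37.6, 38.1, 38.4]

offset = fit_constant_offset(facial, oral)
estimated = [f + offset for f in facial]
metrics = accuracy_metrics(estimated, oral)
```

Because the offset is fitted and evaluated on the same samples here, the bias is zero by construction; a realistic assessment would evaluate on data not used for calibration.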