Abstract

Background

Analytical and post-analytical errors are a persistent problem that can be benchmarked using sigma metrics. Manual processes are associated with error rates of 3–4 sigma. Secondary review of manual reporting is intended to reduce or eliminate errors and is widely used, yet its true efficacy is difficult to assess. Kidney stone analysis is a manually intensive process, requiring personnel to identify stone composition from spectra and to transcribe stone percent composition into a laboratory information system (LIS). We recently developed an AI program that quantitatively determines kidney stone composition from spectra. This provides a unique opportunity to estimate the error rates associated with manual reporting, and those remaining after secondary review, by means of subsequent AI quality assurance analysis.

Methods

Experienced technologists assessed stone composition and estimated the percent of each component (±10%) based on Fourier transform infrared (FTIR) spectroscopy. These manual spectral composition assessments were secondarily reviewed by other experienced technologists for accuracy of component identity and percent composition. The recently described kidney stone AI spectral analysis program (RokkStar) was applied to 159 334 stone reports to assess the reported composition identities and percent compositions for agreement following secondary review. AI-flagged discrepant reports were reviewed by personnel, and reports were revised when appropriate. The secondary reviewer error detection rate was measured in a subset of 7440 kidney stones, and that rate was extrapolated to estimate secondary review error detection across the entire data set. The sum (total error) of estimated secondary reviewer error detections, AI-mediated error detections, and all client-reported errors was used to determine error rates in sigma following manual reporting, secondary review, and AI quality assurance.

Results

Manual entry led to an estimated 2060 errors in 159 334 stone reports, yielding an initial sigma value of 3.73. Secondary manual review detected an estimated 1909 of these errors across the entire report set (an error detection rate of 92.5%), improving the sigma value to 4.61. Post-verification AI caught an additional 138 errors, leading to an improved post-AI sigma value of 5.27. After AI quality assurance, 13 errors remained: 9 were manual reporting process deviations subsequently caught by laboratory staff, and 4 were reporting errors observed by client physicians. All 13 of these errors involved unusual stone constituents or typographical errors that were not assessed by the current AI quality assurance version.

Conclusion

This study represents a unique large-scale estimate of the efficacy of secondary result review, obtained by means of AI quality assurance review. While the presence of undetected errors cannot be completely ruled out, the relative paucity of client-reported errors, as well as the alignment of the estimated manual error rate with previous reports, may indicate that most errors in this system were detected. Importantly, manual secondary review improved error rates by ∼0.9 sigma, and the use of AI further improved the estimated overall sigma metric by ∼0.7, reaching a “world class” 5+ sigma level for this complex reporting process.
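The abstract does not spell out how its sigma values were derived from the error counts, but the reported figures are consistent with the common short-term sigma convention: convert undetected errors to a defect fraction, take the standard normal quantile of the non-defect fraction, and add the conventional 1.5-sigma shift. The sketch below is illustrative only; the 1.5-shift convention and the scipy-based quantile call are assumptions, not details given in the source.

```python
# Minimal sketch of the sigma-metric arithmetic implied by the reported counts.
# Assumption (not stated in the abstract): the common short-term convention
#   sigma = z(1 - defect_fraction) + 1.5,
# where z is the standard normal quantile and 1.5 is the conventional shift.
from scipy.stats import norm

TOTAL_REPORTS = 159_334  # stone reports in the data set

def sigma_metric(remaining_errors: int, opportunities: int = TOTAL_REPORTS) -> float:
    """Convert a count of undetected errors into a sigma value (1.5-sigma shift)."""
    defect_fraction = remaining_errors / opportunities
    return norm.ppf(1.0 - defect_fraction) + 1.5

stages = {
    "manual reporting":           2060,               # estimated manual-entry errors
    "after secondary review":     2060 - 1909,        # 151 errors escaped reviewers
    "after AI quality assurance": 2060 - 1909 - 138,  # 13 errors remained
}

for stage, errors in stages.items():
    print(f"{stage}: {errors} errors -> sigma = {sigma_metric(errors):.2f}")
```

Run as written, this reproduces the abstract's values of approximately 3.73, 4.61, and 5.27, which supports the assumption that the reported metrics use the 1.5-shifted convention.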