Machine learning-based delta check method for detecting misidentification errors in tumor marker tests

Shinae Yu, Yuna Choi, Sollip Kim, Hyeon Seok Seok, Hangsik Shin, Kyung-Hwa Shin

doi:10.1515/cclm-2023-1185

Abstract

Objectives: Misidentification errors in tumor marker tests can lead to serious diagnostic and treatment errors. This study aims to develop a method for detecting these errors using a machine learning (ML)-based delta check approach that overcomes the limitations of conventional methods.

Methods: We analyzed results from five tumor marker tests: alpha-fetoprotein (AFP), cancer antigen 19-9 (CA19-9), cancer antigen 125 (CA125), carcinoembryonic antigen (CEA), and prostate-specific antigen (PSA). A total of 246,261 records were analyzed, of which 179,929 were used for model training and 66,332 for performance evaluation. We developed misidentification error detection models based on the random forest (RF) and deep neural network (DNN) methods. We performed an in silico simulation with 1% random sample shuffling. The performance of the developed models was evaluated and compared with that of conventional delta check methods: delta percent change (DPC), absolute DPC (absDPC), and reference change values (RCV).

Results: The DNN model outperformed the RF, DPC, absDPC, and RCV methods in detecting sample misidentification errors. It achieved balanced accuracies of 0.828, 0.842, 0.792, 0.818, and 0.833 for AFP, CA19-9, CA125, CEA, and PSA, respectively. Although the RF method performed better than DPC and absDPC, its performance was similar to or lower than that of RCV.

Conclusions: Our results demonstrate that an ML-based delta check method can detect sample misidentification errors more effectively than conventional delta check methods. In particular, the DNN model showed superior and stable detection performance compared with the RF, DPC, absDPC, and RCV methods.
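The conventional baselines and the evaluation setup named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the rotation-based swap used to imitate sample shuffling, the default z-value of 1.96, and the 50% threshold in the usage lines are all assumptions made for illustration. DPC and RCV follow their commonly used textbook definitions, and balanced accuracy is taken as the mean of sensitivity and specificity.

```python
import math
import random


def delta_percent_change(previous, current):
    """DPC: signed percent change from the previous result to the current one."""
    return (current - previous) / previous * 100.0


def abs_delta_percent_change(previous, current):
    """absDPC: magnitude of the percent change, ignoring direction."""
    return abs(delta_percent_change(previous, current))


def rcv_percent(cv_analytical, cv_within_subject, z=1.96):
    """Reference change value in percent (common textbook form).

    Both CVs are given in percent. z=1.96 is an assumed default.
    """
    return math.sqrt(2.0) * z * math.sqrt(cv_analytical**2 + cv_within_subject**2)


def shuffle_fraction(pairs, fraction=0.01, seed=0):
    """Imitate misidentification in (previous, current) result pairs.

    A random fraction of records is selected and their current results are
    rotated by one position, so each selected record receives another
    record's result. Returns (new_pairs, labels) with label 1 = simulated error.
    Note: a swapped-in value may coincide with the original by chance.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two records to swap results")
    rng = random.Random(seed)
    k = max(2, round(len(pairs) * fraction))
    chosen = rng.sample(range(len(pairs)), k)
    currents = [pairs[i][1] for i in chosen]
    rotated = currents[1:] + currents[:1]
    new_pairs = list(pairs)
    labels = [0] * len(pairs)
    for i, swapped in zip(chosen, rotated):
        new_pairs[i] = (pairs[i][0], swapped)
        labels[i] = 1
    return new_pairs, labels


def balanced_accuracy(labels, flags):
    """Mean of sensitivity and specificity for binary labels and flags."""
    pos = sum(labels)
    neg = len(labels) - pos
    tp = sum(1 for y, f in zip(labels, flags) if y == 1 and f)
    tn = sum(1 for y, f in zip(labels, flags) if y == 0 and not f)
    return (tp / pos + tn / neg) / 2.0


# Usage with synthetic data and a hypothetical absDPC threshold of 50%.
records = [(float(i + 1), float(i + 1) * 1.05) for i in range(200)]
shuffled, labels = shuffle_fraction(records, fraction=0.01)
flags = [abs_delta_percent_change(p, c) > 50.0 for p, c in shuffled]
score = balanced_accuracy(labels, flags)
```

Balanced accuracy is a natural metric here because only about 1% of records carry a simulated error, so plain accuracy would look high even for a method that never flags anything.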

Full Text