The impact of misclassification errors on the performance of biomarkers based on next-generation sequencing, a simulation study

Dong Wang, Sue-Jane Wang, Samir Lababidi

doi:10.1080/10543406.2023.2269251

Abstract

The development of next-generation sequencing (NGS) has opened opportunities for new applications such as liquid biopsy, in which tumor mutation genotypes are determined by sequencing circulating tumor DNA from a blood draw. However, with highly diluted samples such as those obtained by liquid biopsy, NGS invariably introduces some level of misclassification, even with improved technology. Recently, demand has been high for using mutation genotypes as biomarkers for prognosis prediction and treatment selection. Many methods have also been proposed that use machine learning algorithms to build multi-locus classifiers that serve as biomarkers. How the higher misclassification rate introduced by liquid biopsy affects the performance of these biomarkers has not been thoroughly investigated. In this paper, we report the results of a simulation study focused on the clinical utility of biomarkers when misclassification is present because of the current technological limits of NGS in the liquid biopsy setting. The simulation uses actual patient genotypes and covers a range of performance profiles for current NGS platforms, combined with different machine learning algorithms. Our results show that, at the high end of the performance spectrum, the misclassification introduced by NGS had very little effect on the clinical utility of the biomarker. In more challenging applications with lower accuracy, however, misclassification could have a notable effect on clinical utility. The pattern of this effect can be complex, especially for machine learning-based classifiers. Our results show that simulation can be an effective tool for assessing different misclassification scenarios.
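The general idea of simulating genotype misclassification can be illustrated with a minimal sketch. It is not the simulation design used in this study: it assumes a single binary locus, synthetic rather than actual patient genotypes, and hypothetical values for mutation prevalence (0.3), response rates (0.6 and 0.2), and assay sensitivity and specificity. The function names `misclassify` and `response_rate_difference` are introduced here for illustration only.

```python
import random


def misclassify(genotypes, sensitivity, specificity, rng):
    """Return observed calls after applying assay error to true 0/1 genotypes."""
    observed = []
    for g in genotypes:
        if g == 1:
            # A true mutation is detected with probability `sensitivity`.
            observed.append(1 if rng.random() < sensitivity else 0)
        else:
            # A true wild type is called correctly with probability `specificity`.
            observed.append(0 if rng.random() < specificity else 1)
    return observed


def response_rate_difference(calls, responses):
    """Response rate among biomarker-positive minus biomarker-negative patients."""
    pos = [r for c, r in zip(calls, responses) if c == 1]
    neg = [r for c, r in zip(calls, responses) if c == 0]
    if not pos or not neg:
        return float("nan")
    return sum(pos) / len(pos) - sum(neg) / len(neg)


# Hypothetical cohort: all numbers below are assumptions for illustration.
rng = random.Random(0)
n = 5000
true_genotypes = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
responses = [1 if rng.random() < (0.6 if g else 0.2) else 0 for g in true_genotypes]

# Compare an error-free assay with two assumed error profiles.
for sens, spec in [(1.0, 1.0), (0.95, 0.99), (0.7, 0.95)]:
    observed = misclassify(true_genotypes, sens, spec, rng)
    diff = response_rate_difference(observed, responses)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f} rate difference={diff:.3f}")
```

In a sketch like this, the gap in response rates between biomarker-positive and biomarker-negative groups serves as a crude stand-in for clinical utility; comparing it across error profiles shows how assay misclassification can dilute an apparent biomarker effect.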

Full Text