Background: Tuberculosis (TB) is a chronic infectious disease caused by Mycobacterium tuberculosis (MTB) and is the ninth leading cause of death worldwide. Distinguishing active TB from latent TB infection remains difficult, yet it is essential for individualized management and treatment.

Methods: A total of 220 subjects, including active TB patients (ATB, n = 97) and latent TB patients (LTB, n = 113), were recruited in this study. Forty-six features, comprising routine blood indicators and the VCS parameters (volume, conductivity, light scatter) of neutrophils (NE), monocytes (MO), and lymphocytes (LY), were collected and used to construct classification models with four machine learning algorithms: logistic regression (LR), random forest (RF), support vector machine (SVM), and k-nearest neighbor (KNN). The area under the precision-recall curve (AUPRC) and the area under the receiver operating characteristic curve (AUROC) were used to estimate each model's predictive performance in distinguishing active from latent tuberculosis infection.

Results: In the training set, LR and RF performed best among the four classifiers (AUROC = 1, AUPRC = 1), followed by SVM (AUROC = 0.967, AUPRC = 0.971) and KNN (AUROC = 0.943, AUPRC = 0.959). In the testing set, LR performed best (AUROC = 0.977, AUPRC = 0.957), followed by SVM (AUROC = 0.962, AUPRC = 0.949), RF (AUROC = 0.903, AUPRC = 0.922), and KNN (AUROC = 0.883, AUPRC = 0.901).

Conclusions: Machine learning classifiers based on leukocyte VCS parameters are of great value in distinguishing active from latent tuberculosis infection.
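To make the two evaluation metrics concrete, the following is a minimal Python sketch of how AUROC and AUPRC can be computed from a classifier's predicted scores. It is not the study's code. The function names, the convention that label 1 denotes ATB and 0 denotes LTB, and the toy labels and scores are all assumptions made for illustration. AUPRC is estimated here by average precision, one common estimator; the abstract does not state which estimator the authors used.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # tied scores count as half a win
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: precision weighted by the recall gained at each threshold."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp = 0
    ap = 0.0
    prev_recall = 0.0
    i = 0
    while i < len(ranked):
        # Treat all samples sharing the same score as one threshold step.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            j += 1
        recall = tp / n_pos
        precision = tp / j  # j samples are predicted positive at this threshold
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap


# Hypothetical toy data (1 = ATB, 0 = LTB); not taken from the study.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

toy_auroc = auroc(labels, scores)
toy_auprc = average_precision(labels, scores)
```

On this toy data, one LTB sample is scored above one ATB sample, so both metrics fall below 1. A classifier that ranks every ATB sample above every LTB sample would score 1 on both, which is the pattern reported for LR and RF in the training set.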