Abstract

We provide a pipeline for data preprocessing, biomarker selection, and classification of liquid chromatography–mass spectrometry (LC-MS) serum samples to generate a prospective diagnostic test for Lyme disease. We utilize tools of machine learning (ML), e.g., sparse support vector machines (SSVM), iterative feature removal (IFR), and k-fold feature ranking to select several biomarkers and build a discriminant model for Lyme disease. We report a 98.13% test balanced success rate (BSR) of our model based on a sequestered test set of LC-MS serum samples. The methodology employed is general and can be readily adapted to other LC-MS or metabolomics data sets.
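The balanced success rate (BSR) reported above is the mean of per-class recall, i.e., the average of sensitivity and specificity in the two-class case. A minimal sketch of the metric, with hypothetical labels (the function name and example data are illustrative, not from the study):

```python
import numpy as np

def balanced_success_rate(y_true, y_pred):
    """Balanced success rate (BSR): mean of per-class recall,
    i.e. (sensitivity + specificity) / 2 for a binary problem."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    per_class = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(per_class))

# Hypothetical labels: 1 = Lyme disease, 0 = healthy control
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(balanced_success_rate(y_true, y_pred))  # sensitivity 0.75, specificity 0.75 -> 0.75
```

Unlike plain accuracy, BSR is insensitive to class imbalance between cases and controls, which is why it is the natural summary statistic for a diagnostic test.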

Highlights

  • We provide a pipeline for data preprocessing, biomarker selection, and classification of liquid chromatography–mass spectrometry (LC-MS) serum samples to generate a prospective diagnostic test for Lyme disease

  • After the untargeted selection in XCMS, we checked the data for missingness to identify features with missing values in more than 80% of training samples; none of the features met this criterion

  • Relative to the 44 LC-MS biomarkers discovered and the LASSO diagnostic developed in Molins et al. [5], our sparse support vector machines (SSVM) diagnostic shows an 8.35% increase in test sensitivity and a 5.00% increase in test specificity
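The missingness screen in the second highlight can be sketched as follows. This is a minimal sketch, assuming missing peak areas are encoded as NaN in a samples-by-features matrix; the function name, threshold argument, and example data are illustrative:

```python
import numpy as np

def drop_sparse_features(X, max_missing_frac=0.8):
    """Drop features (columns) whose fraction of missing values in the
    training matrix exceeds max_missing_frac. X: samples x features,
    with missing peak areas encoded as NaN."""
    missing_frac = np.mean(np.isnan(X), axis=0)
    keep = missing_frac <= max_missing_frac
    return X[:, keep], keep

# Toy matrix: 3 samples x 3 features; feature 1 is missing in all samples
X = np.array([[1.0, np.nan, 2.0],
              [np.nan, np.nan, 1.5],
              [3.0, np.nan, np.nan]])
X_kept, mask = drop_sparse_features(X)
print(mask)  # feature 1 exceeds the 80% missingness threshold and is dropped
```

In the study itself no feature crossed the 80% threshold, so this filter would have passed the training matrix through unchanged.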


Introduction

We provide a pipeline for data preprocessing, biomarker selection, and classification of liquid chromatography–mass spectrometry (LC-MS) serum samples to generate a prospective diagnostic test for Lyme disease. We begin with the hypothesis that feature vectors, i.e., the vectors of metabolite peak areas, for patients with Lyme disease and their healthy counterparts are separated in space when restricted to some reduced set of discriminatory biomarkers. This is the base assumption of sparse, or minimal-feature, models for feature selection. Multivariate models in statistics and ML, such as partial least squares-discriminant analysis (PLS-DA), kernel support vector machines, deep learning networks, and decision trees, can over-fit when training on data sets with many features and relatively few samples [12,13,14]. This may be mitigated through hyperparameter tuning: controlling the balance between training and validation accuracy in a cross-validation experiment. Using a sparsity-inducing penalty in the SSVM optimization problem reduces the number of parameters available to the model and serves to prevent over-fitting by regularizing the high-dimensional model.

