Abstract

MotivationThe measurement of disease biomarkers in easily–obtained bodily fluids has opened the door to a new type of non–invasive medical diagnostics. New technologies are being developed and fine–tuned in order to make this possibility a reality. One such technology is Field Asymmetric Ion Mobility Spectrometry (FAIMS), which allows the measurement of volatile organic compounds (VOCs) in biological samples such as urine. These VOCs are known to contain a range of information on the relevant person’s metabolism and can in principle be used for disease diagnostic purposes. Key to the effective use of such data are well–developed data processing pipelines, which are necessary to extract the most useful data from the complex underlying biological structure.ResultsIn this study, we present a new data analysis pipeline for FAIMS data, and demonstrate a number of improvements over previously used methods. We evaluate the effect of a series of candidate operational steps during data processing, such as the use of wavelet transforms, principal component analysis (PCA), and classifier ensembles. We also demonstrate the use of FAIMS data in our pipeline to diagnose diabetes on the basis of a simple urine sample using machine learning classifiers. We present results for data generated from a case-control study of 115 urine samples, collected from 72 type II diabetic patients, with 43 healthy volunteers as negative controls. The resulting pipeline combines the steps that resulted in the best classification model performance. These include the use of a two–dimensional discrete wavelet transform, and the Wilcoxon rank–sum test for feature selection. We are able to achieve a best ROC curve AUC of 0.825 (0.747–0.9, 95% CI) for classification of diabetes vs control. We also note that this result is robust to changes in the data pipeline and different analysis runs, with AUC > 0.80 achieved in a range of cases. This is a substantial improvement in performance over previously used data processing methods in this area. Our ability to make strong statements about FAIMS ability to diagnose diabetes is sadly limited, as we found confounding effects from the demographics when including these data in the pipeline. The demographics alone produced a best AUC of 0.87 (0.795–0.94, 95% CI). While the combination of the demographics and FAIMS data resulted in an improvement on the AUC (0.907; 0.848–0.97, 95% CI), it did not prove to be a significant difference. Nevertheless, the pipeline itself shows a significant improvement in performance over more basic methods which have been used with FAIMS data in the past.

Highlights

  • Disease implies a change in a chemical ‘fingerprint’ of the patient’s tissue, either due to a change in the patient’s own metabolism or due to the alterations resulting from the pathogens causing the disease [1,2,3,4,5]

  • We evaluate the effect of a series of candidate operational steps during data processing, such as the use of wavelet transforms, principal component analysis (PCA), and classifier ensembles

  • We demonstrate the use of Field Asymmetric Ion Mobility Spectrometry (FAIMS) data in our pipeline to diagnose diabetes on the basis of a simple urine sample using machine learning classifiers

Read more

Summary

Introduction

Disease implies a change in a chemical ‘fingerprint’ of the patient’s tissue, either due to a change in the patient’s own metabolism (in cancer or diabetes for instance) or due to the alterations resulting from the pathogens causing the disease (fermentome) [1,2,3,4,5] In many cases these chemical changes can be detected in natural human waste, be it breath, urine, sweat, stool or other. Gas phase diagnosis offers considerable potential for the medical profession: it is non–invasive, close to real–time, has minimal consumable cost and has the opportunity to be made point–of–care For these reasons, researchers have been developing instruments focused on the use of this technology. The use of sophisticated data processing methods are required to improve the effective use of these data

Objectives
Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.