Abstract

BackgroundOver the last decade, metabolomics has evolved into a mainstream enterprise utilized by many laboratories globally. Like other “omics” data, metabolomics data has the characteristics of a smaller sample size compared to the number of features evaluated. Thus the selection of an optimal subset of features with a supervised classifier is imperative. We extended an existing feature selection algorithm, threshold gradient descent regularization (TGDR), to handle multi-class classification of “omics” data, and proposed two such extensions referred to as multi-TGDR. Both multi-TGDR frameworks were used to analyze a metabolomics dataset that compares the metabolic profiles of hepatocellular carcinoma (HCC) infected with hepatitis B (HBV) or C virus (HCV) with that of cirrhosis induced by HBV/HCV infection; the goal was to improve early-stage diagnosis of HCC.ResultsWe applied two multi-TGDR frameworks to the HCC metabolomics data that determined TGDR thresholds either globally across classes, or locally for each class. Multi-TGDR global model selected 45 metabolites with a 0% misclassification rate (the error rate on the training data) and had a 3.82% 5-fold cross-validation (CV-5) predictive error rate. Multi-TGDR local selected 48 metabolites with a 0% misclassification rate and a 5.34% CV-5 error rate.ConclusionsOne important advantage of multi-TGDR local is that it allows inference for determining which feature is related specifically to the class/classes. Thus, we recommend multi-TGDR local be used because it has similar predictive performance and requires the same computing time as multi-TGDR global, but may provide class-specific inference.

Highlights

  • Over the last decade, metabolomics has evolved into a mainstream enterprise utilized by many laboratories globally

  • We investigate the use of two multi-threshold gradient descent regularization (TGDR) approaches to analyze mass spectrometry metabolomics data

  • Synthesized data In order to study the empirical performance of both multi-TGDR frameworks, we used the real values for metabolites of the hepatocellular carcinoma (HCC)/cirrhosis data (384 metabolites and 131 samples) but assigned the class membership according to pre-determined logit functions f

Read more

Summary

Introduction

Metabolomics has evolved into a mainstream enterprise utilized by many laboratories globally. We extended an existing feature selection algorithm, threshold gradient descent regularization (TGDR), to handle multi-class classification of “omics” data, and proposed two such extensions referred to as multi-TGDR. Both multi-TGDR frameworks were used to analyze a metabolomics dataset that compares the metabolic profiles of hepatocellular carcinoma (HCC) infected with hepatitis B (HBV) or C virus (HCV) with that of cirrhosis induced by HBV/HCV infection; the goal was to improve early-stage diagnosis of HCC. Feature selection algorithms, which select a subset of the most relevant features for the underlying data mining tasks, are commonly used in combination with classifier construction to analyze “omics” data or data with high-dimensional input variables. One type includes classification algorithms that can be directly extended to handle multiclass cases, and the other type includes algorithms that arise from the decomposition of multi-class problems into a series of binary ones

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call