Abstract

Hierarchical classification (HC) stratifies and classifies data from broad classes into progressively more specific classes. Unlike commonly used flat classification strategies, HC enables the probabilistic prediction of unknown classes at different hierarchy levels, reducing the burden of incomplete databases. Despite these advantages, its translational application in the biomedical sciences has been limited. We describe and demonstrate the implementation of an HC approach for “omics-driven” classification in two settings. First, we classified 15 bacterial species at various taxonomic levels, achieving 90–100% accuracy. Second, we classified 9 cancer types into morphological types and 35 subtypes, with 99% and 76% accuracy, respectively. Using mass spectra (n = 284), unknown bacterial species were probabilistically assigned to their respective genus or family with 100% accuracy. Using mRNA data (n = 1960), most cancer subtypes were predicted with 95–100% accuracy. This is highly relevant to clinical practice. Complete datasets are difficult to compile because diseases evolve continuously and new strains emerge, yet predicting an unknown class, such as a bacterial species, at an upper hierarchy level may be sufficient to initiate antimicrobial therapy. The algorithms presented here can be directly translated into clinical use with any quantitative data. They have broad application potential, from unlabeled sample identification to hierarchical feature selection and the discovery of new taxonomic variants.
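To make the broad-to-specific idea concrete, the following is a minimal sketch of one common HC design: a top-down "local classifier per parent node" scheme. Probabilities are multiplied along the predicted path. The sketch assumes a simple nearest-centroid classifier at each node, with softmax-normalized distances used as probabilities. The class names (`NearestCentroidNode`, `HierarchicalClassifier`), the choice of classifier and the toy data are all hypothetical illustrations, not the authors' published algorithm.

```python
import math
from collections import defaultdict


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1 (numerically stable)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class NearestCentroidNode:
    """Hypothetical local classifier: one centroid per child class of a node."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            sums[label] = [a + v for a, v in zip(acc, x)]
            counts[label] += 1
        self.labels = sorted(sums)
        self.centroids = [[s / counts[lab] for s in sums[lab]] for lab in self.labels]
        return self

    def predict_proba(self, x):
        # Closer centroid -> higher score (negative squared Euclidean distance).
        scores = [-sum((a - c) ** 2 for a, c in zip(x, cen)) for cen in self.centroids]
        return dict(zip(self.labels, softmax(scores)))


class HierarchicalClassifier:
    """Top-down HC sketch: one local classifier per parent node of the taxonomy."""

    def fit(self, X, paths):
        # paths[i] is a label tuple ordered from broad to specific, e.g. (genus, species).
        self.nodes = {}
        for level in range(len(paths[0])):
            groups = defaultdict(lambda: ([], []))
            for x, path in zip(X, paths):
                parent = path[:level]  # the ancestors decide which node handles x
                groups[parent][0].append(x)
                groups[parent][1].append(path[level])
            for parent, (xs, ys) in groups.items():
                self.nodes[parent] = NearestCentroidNode().fit(xs, ys)
        return self

    def predict(self, x):
        """Return [(label, cumulative probability), ...] from the top level down."""
        path, cumulative, result = (), 1.0, []
        while path in self.nodes:  # stop once a leaf (no child classifier) is reached
            probs = self.nodes[path].predict_proba(x)
            label = max(probs, key=probs.get)
            cumulative *= probs[label]  # P(path) = product of conditional probabilities
            path += (label,)
            result.append((label, cumulative))
        return result


# Toy 2-D "spectra": two genera (A, B), each with two species.
X = [[0, 0], [1, 0], [0, 4], [1, 4], [10, 0], [11, 0], [10, 4], [11, 4]]
paths = [("A", "a1"), ("A", "a1"), ("A", "a2"), ("A", "a2"),
         ("B", "b1"), ("B", "b1"), ("B", "b2"), ("B", "b2")]
clf = HierarchicalClassifier().fit(X, paths)
print(clf.predict([0.2, 0.1]))  # predicted labels: "A", then "a1"
```

Because each node only separates the children of one parent, taxonomically distant classes never compete directly. Any real classifier with calibrated probabilities could replace the nearest-centroid stand-in.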

Highlights

  • The relentless drive towards precision medicine necessitates an ever-increasing reliance on integrating complex, large-scale multi-omics and clinical datasets derived from multiple sources[1].

  • Although the identification of bacteria using mass spectra has been reported and reviewed several times[9,10,11,12,13,14,15], prediction for previously unseen species remains difficult with conventional classification methods. These methods frequently fail to assign the correct upper-level taxonomy to species not encountered before in the dataset.

  • To validate and assess the performance of the proposed hierarchical classification (HC) algorithm, we first applied it to mass spectral profiles acquired from 15 bacterial species (Fig. 1a). We determined classification performance across six hierarchy levels: (i) Gram stain type; (ii) class; (iii) order; (iv) family; (v) genus; and (vi) species.
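Reporting performance "across six hierarchy levels" amounts to scoring each level of the predicted lineage separately. A minimal sketch of such per-level accuracy follows. It assumes that predictions are lineage tuples ordered from Gram stain type down to species, and that a prediction which stops early counts as incorrect at the levels it did not reach. Both conventions, the function name `per_level_accuracy` and the example lineages are illustrative assumptions and are not taken from the study's dataset.

```python
# The six hierarchy levels, ordered from broad to specific.
LEVELS = ("gram_stain", "class", "order", "family", "genus", "species")


def per_level_accuracy(true_paths, pred_paths):
    """Fraction of samples whose predicted label matches the true label at each level.

    A predicted path may be shorter than the true one (e.g. if the classifier
    stopped at genus); levels it did not reach are counted as incorrect.
    """
    correct = [0] * len(LEVELS)
    for true, pred in zip(true_paths, pred_paths):
        for i, true_label in enumerate(true):
            if i < len(pred) and pred[i] == true_label:
                correct[i] += 1
    n = len(true_paths)
    return {level: hits / n for level, hits in zip(LEVELS, correct)}


# Illustrative lineages (hypothetical examples, not from the study's dataset).
e_coli = ("Gram-negative", "Gammaproteobacteria", "Enterobacterales",
          "Enterobacteriaceae", "Escherichia", "Escherichia coli")
k_pneumoniae = ("Gram-negative", "Gammaproteobacteria", "Enterobacterales",
                "Enterobacteriaceae", "Klebsiella", "Klebsiella pneumoniae")

# The second sample is a K. pneumoniae spectrum misclassified as E. coli.
acc = per_level_accuracy([e_coli, k_pneumoniae], [e_coli, e_coli])
print(acc)  # 1.0 from gram_stain to family, 0.5 for genus and species
```

The example shows why the same misclassification can still count as correct at the upper levels: the two species share every rank down to family.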


Summary

Introduction

The relentless drive towards precision medicine necessitates an ever-increasing reliance on integrating complex, large-scale multi-omics and clinical datasets derived from multiple sources[1]. Commonly used bioinformatics methods for classifying biomedical datasets are based on ‘training’ a single classification method (or an ensemble thereof) to discriminate between different classes. Examples include patient outcomes, healthy versus cancerous tissue[7], or organisms such as bacterial species[8,9]. Such approaches may be referred to as ‘flat classification’. Although they often yield highly promising classification accuracy, they have key limitations that are often overlooked. (i) Class discrimination at a single level may diminish as the number of classes grows, lowering classification accuracy for large datasets. (ii) Since all classes in the model are considered to be either ‘training’ or ‘prediction’, the classification accuracy for a particular class can be influenced by other, taxonomically ‘distant’ classes. (iii) Incomplete databases offer little or no predictive capacity for new (previously unknown) classes. Although the identification of bacteria using mass spectra has been reported and reviewed several times[9,10,11,12,13,14,15], prediction for previously unseen species remains difficult with conventional classification methods. These methods frequently fail to assign the correct upper-level taxonomy to species not encountered before in the dataset.
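Limitation (iii) can be illustrated with a toy contrast. A flat model must output one of the species it was trained on. A hierarchical model, by contrast, can stop at the deepest level where it is still confident. The sketch below assumes one simple stopping rule, which halts when the cumulative path probability would drop below a threshold. The threshold value, the function names, the simplified three-level hierarchy (Gram stain, genus, species) and the probabilities are hypothetical, and the paper's actual assignment rule may differ.

```python
def flat_predict(species_probs):
    """Flat classification: always returns one of the species seen in training."""
    return max(species_probs, key=species_probs.get)


def hierarchical_predict(level_probs, threshold=0.9):
    """Descend the hierarchy while the cumulative path probability stays >= threshold.

    level_probs holds one dict per level, mapping the children of the node chosen
    at the previous level to their conditional probabilities.
    """
    path, cumulative = [], 1.0
    for probs in level_probs:
        label = max(probs, key=probs.get)
        if cumulative * probs[label] < threshold:
            break  # too uncertain: report only the levels reached so far
        cumulative *= probs[label]
        path.append(label)
    return path, cumulative


# Hypothetical spectrum from a Staphylococcus species absent from the training set:
# upper levels are confident, but probability is split across the known species.
levels = [
    {"Gram-positive": 0.99, "Gram-negative": 0.01},
    {"Staphylococcus": 0.98, "Streptococcus": 0.02},
    {"S. aureus": 0.52, "S. epidermidis": 0.48},
]
print(flat_predict(levels[-1]))  # S. aureus (forced, and wrong for an unseen species)
path, p = hierarchical_predict(levels)
print(path, round(p, 4))  # ['Gram-positive', 'Staphylococcus'] 0.9702
```

Lowering the threshold trades specificity for risk. With `threshold=0.5`, the same input would be pushed down to a species-level call.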
