Omics data provide a plethora of quantifiable information that can potentially be used to identify biomarkers targeting the physiological processes and ecological phenomena of organisms. However, omics data have not been fully utilized because current prediction methods for biomarker construction are susceptible to data multidimensionality and noise. We developed OmicSense, a quantitative prediction method that uses a mixture of Gaussian distributions as the probability distribution, yielding the most likely value of the objective variable from the predictions of the individual biomarkers. Our benchmark test using a transcriptome dataset revealed that OmicSense achieves accurate prediction that is robust against background noise and does not overfit. Weighted gene co-expression network analysis revealed that OmicSense preferentially utilized hub nodes of the network, indicating the interpretability of the method. Application of OmicSense to single-cell transcriptome, metabolome, and microbiome datasets confirmed high prediction performance (r > 0.8), suggesting applicability to diverse scientific fields. Given the rapidly expanding availability of omics data, OmicSense can accelerate the use of omics data as a “biosensor” based on an assemblage of potential biomarkers.