Abstract
In the past few years, deep learning has been successfully applied to various omics data. However, applications of deep learning in metabolomics are still relatively few compared to other omics fields. Currently, data pre-processing using convolutional neural network architectures appears to benefit the most from deep learning. Compound/structure identification and quantification using artificial neural networks/deep learning performed relatively better than traditional machine learning techniques, whereas only marginally better results were observed in biological interpretation. Before deep learning can be effectively applied to metabolomics, several challenges should be addressed, including metabolome-specific deep learning architectures, dimensionality problems, and model evaluation regimes.
Highlights
In nuclear magnetic resonance (NMR) and mass spectrometry (MS) based metabolomics, a variety of machine learning (ML) algorithms have been developed for data pre-processing, peak identification, peak integration, compound identification/quantification, data analysis, and data integration [2,3,4,5,6].
Ease of use and accessibility of artificial neural networks (ANN) and deep learning (DL) methods are increasing for the metabolomics community owing to the development of neural network frameworks, simplified interfaces to those frameworks through high-level programming languages, and reduced model computation time through optimization on graphics processing units (GPUs). GPUs can effectively parallelize complex tasks and are readily available as stand-alone graphics cards in workstation-class machines or through cloud computing services (Amazon Web Services [15], Google Cloud Platform [16], Microsoft Azure [17]).
Like the aforementioned study, DL-based regression modeling the relationship between fish sizes and their metabolic profiles yielded performance comparable to that of a traditional ML model, Random Forest (RF) [85].
Summary
Machine learning (ML), the concept of ‘training’ computational methods that improve given more ‘experience’ or data, has been a revolutionizing force in many disciplines, including metabolomics, for the last 15 years. Convolutional neural networks (CNN) were the most often utilised DL model architecture across all metabolomics data pipeline steps. These models are often used in image processing due to their shift invariant characteristics, and their application to metabolomic data varied across model complexities (e.g. numbers of neurons, hidden layers, filters, different types of optimizers, activation functions and loss functions). While many of the reviewed studies employed multiple types of neural networks in their work, including for different steps or performance comparisons, the non-linear rectified linear unit (ReLU) [31] was the most widely used activation function. This is not surprising because ReLU is generally the most widely used activation function for CNNs [32] and may offer some advantages for dealing with the sparse nature of metabolomics data. DL model architectures for other workflow steps included a mix of shallow ANN and other variants of DNN such as autoencoders and CNNs (Fig. 1C).
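The shift behaviour and the ReLU activation mentioned above can be illustrated with a minimal sketch in plain Python. It is not taken from any of the reviewed studies: the toy "spectrum", the hand-picked filter values, and the helper names `relu` and `conv1d_valid` are all assumptions made for illustration. Strictly speaking, a convolution is shift equivariant (moving the peak moves the response by the same amount), and a pooling step such as taking the maximum then gives a shift-invariant summary.

```python
def relu(values):
    """Rectified linear unit: negative inputs become 0, others pass through."""
    return [max(0.0, v) for v in values]


def conv1d_valid(signal, kernel):
    """Slide `kernel` along `signal` without padding (cross-correlation)."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]


# Hypothetical toy "spectrum": mostly zeros (sparse) with a single peak.
spectrum = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# The same peak moved two bins to the right.
shifted = [0.0, 0.0, 0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0]

# Hypothetical hand-picked filter that responds to a value higher than its neighbours.
peak_filter = [-1.0, 2.0, -1.0]

response = relu(conv1d_valid(spectrum, peak_filter))
response_shifted = relu(conv1d_valid(shifted, peak_filter))

# Equivariance: the response moves by the same two bins as the peak.
print(response_shifted[2:] == response[:-2])
# Invariance after global max pooling: the pooled value ignores the peak position.
print(max(response) == max(response_shifted))
```

In a trained CNN the filter values would be learned rather than chosen by hand, and ReLU would zero out the negative filter responses in the same way, which keeps the output sparse when the input is sparse.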