Abstract

In the domain of chemometrics, multiblock data analysis is widely performed for exploring or fusing data from multiple sources. The methods commonly used for multiblock predictive analysis are extensions of latent space modelling approaches. Recently, however, deep learning (DL) approaches such as convolutional neural networks (CNNs) have outperformed traditional single-block latent space chemometric approaches such as partial least squares (PLS) regression. CNN-based DL modelling can also be used to handle multiblock data simultaneously, but this possibility had not been explored before this study. Hence, this study presents, for the first time, the concept of parallel input CNN-based DL modelling for multiblock predictive chemometric analysis. Parallel input CNN-based DL modelling uses individual convolutional layers for each data block to extract key features, which are then combined and passed to a regression module composed of fully connected layers. The method was tested on a large, real visible and near-infrared (Vis-NIR) data set related to dry matter prediction in mango fruit. To obtain multiblock data, the visible (Vis) and near-infrared (NIR) parts were treated as two separate blocks. The performance of the parallel input CNN was compared with traditional single-block CNN-based DL modelling, as well as with a commonly used multiblock chemometric approach called sequentially orthogonalized partial least squares (SO-PLS) regression. The results showed that the proposed parallel input CNN-based deep multiblock analysis outperformed both single-block CNN-based DL modelling and SO-PLS regression. The root mean squared error of prediction (RMSEP) obtained with deep multiblock analysis was 0.818%, which is relatively 4% and 20% lower than that of the single-block CNN and SO-PLS regression, respectively. Furthermore, the deep multiblock approach attained an RMSEP ∼3% lower than the best result previously reported on the mango data set used in this study.
The deep multiblock analysis approach based on parallel input CNNs could be considered a useful tool for fusing data from multiple sources.
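The parallel input architecture described in the abstract can be illustrated with a minimal NumPy sketch of the forward pass. Everything specific below is an assumption for illustration, not the authors' configuration: one convolutional layer per block, ReLU activations, kernel widths of 7 and 11, four filters per branch, a hypothetical split index of 40 between the Vis and NIR parts, a single hidden dense layer of 16 units, random untrained weights, and synthetic spectra. Training (for example by gradient descent in a DL framework) is omitted, and the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1d_valid(x, kernels, bias):
    """'Valid' 1-D convolution (cross-correlation, as in most DL libraries).

    x: (n_samples, n_vars); kernels: (n_filters, width); bias: (n_filters,)
    returns: (n_samples, n_filters, n_vars - width + 1)
    """
    width = kernels.shape[1]
    # Sliding windows over the spectral axis: (n_samples, n_positions, width)
    windows = np.lib.stride_tricks.sliding_window_view(x, width, axis=1)
    return np.einsum("nlw,fw->nfl", windows, kernels) + bias[None, :, None]


def relu(z):
    return np.maximum(z, 0.0)


def init_branch(width, n_filters):
    """Random (untrained) weights for one block-specific convolutional branch."""
    return {"k": rng.normal(0.0, 0.1, (n_filters, width)), "b": np.zeros(n_filters)}


def branch_features(x, params):
    """Convolution -> ReLU -> flatten, applied to a single data block."""
    h = relu(conv1d_valid(x, params["k"], params["b"]))
    return h.reshape(h.shape[0], -1)


def parallel_input_cnn(blocks, branches, dense):
    """Each block goes through its own branch; the features are concatenated
    and passed to a small fully connected regression module."""
    feats = [branch_features(x, p) for x, p in zip(blocks, branches)]
    z = np.concatenate(feats, axis=1)
    h = relu(z @ dense["W1"] + dense["b1"])
    return (h @ dense["W2"] + dense["b2"]).ravel()


# Synthetic stand-in for Vis-NIR spectra (8 samples x 100 variables)
spectra = rng.normal(size=(8, 100))
split = 40  # hypothetical index separating the Vis and NIR parts
vis, nir = spectra[:, :split], spectra[:, split:]

branches = [init_branch(width=7, n_filters=4), init_branch(width=11, n_filters=4)]
n_feat = sum(branch_features(x, p).shape[1] for x, p in zip([vis, nir], branches))
dense = {
    "W1": rng.normal(0.0, 0.05, (n_feat, 16)), "b1": np.zeros(16),
    "W2": rng.normal(0.0, 0.05, (16, 1)), "b2": np.zeros(1),
}
y_hat = parallel_input_cnn([vis, nir], branches, dense)
print(y_hat.shape)  # (8,)
```

Because each block has its own branch, the two blocks can use different kernel widths and filter counts; in this sketch the only point where the blocks interact is the concatenation before the dense layers.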

Highlights

  • Data from multiple sources is widely encountered in the chemometrics domain [1,2]

  • This study presents, for the first time, a new multiblock predictive modelling approach based on parallel input convolutional neural networks (CNNs)

  • The method was compared with a single-block CNN and a popular chemometric technique called sequentially orthogonalized partial least squares (SO-PLS)
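For readers less familiar with SO-PLS, the two-block case can be sketched in NumPy: fit PLS on the first block, orthogonalize the second block with respect to the first-block scores so that it contributes only information not already captured, and fit PLS of the remaining response residual on the orthogonalized second block. This is a simplified illustration under stated assumptions (single response, NIPALS PLS1, numbers of components a1 and a2 fixed by hand rather than chosen by cross-validation, synthetic data); the class `SOPLS2` and the helper functions are hypothetical and are not the implementation used in the study.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """PLS1 (single response) via NIPALS on column-centred X and centred y.

    Returns R (so that scores T = X @ R) and y-loadings q (so y_hat = T @ q).
    """
    Xd, yd = X.copy(), y.copy()
    p = X.shape[1]
    W, P, q = np.zeros((p, n_comp)), np.zeros((p, n_comp)), np.zeros(n_comp)
    for a in range(n_comp):
        w = Xd.T @ yd
        w /= np.linalg.norm(w)
        t = Xd @ w
        tt = t @ t
        W[:, a], P[:, a], q[a] = w, Xd.T @ t / tt, yd @ t / tt
        Xd -= np.outer(t, P[:, a])  # deflate X
        yd -= q[a] * t              # deflate y
    R = W @ np.linalg.inv(P.T @ W)  # maps centred X directly to scores
    return R, q


class SOPLS2:
    """Hypothetical two-block SO-PLS sketch (block 1 first, then block 2)."""

    def __init__(self, a1, a2):
        self.a1, self.a2 = a1, a2

    def fit(self, X1, X2, y):
        self.m1, self.m2, self.my = X1.mean(0), X2.mean(0), y.mean()
        X1c, X2c, yc = X1 - self.m1, X2 - self.m2, y - self.my
        self.R1, self.q1 = pls1_fit(X1c, yc, self.a1)
        T1 = X1c @ self.R1
        # Regress X2 on the block-1 scores; subtracting the fit orthogonalizes X2
        self.D = np.linalg.solve(T1.T @ T1, T1.T @ X2c)
        X2o = X2c - T1 @ self.D
        resid = yc - T1 @ self.q1  # part of y not explained by block 1
        self.R2, self.q2 = pls1_fit(X2o, resid, self.a2)
        return self

    def predict(self, X1, X2):
        T1 = (X1 - self.m1) @ self.R1
        X2o = (X2 - self.m2) - T1 @ self.D
        return self.my + T1 @ self.q1 + X2o @ self.R2 @ self.q2


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


# Usage on synthetic data (illustrative only)
rng = np.random.default_rng(1)
X1 = rng.normal(size=(60, 30))  # e.g. a synthetic "Vis" block
X2 = rng.normal(size=(60, 50))  # e.g. a synthetic "NIR" block
y = X1[:, :3].sum(1) + 0.5 * X2[:, :2].sum(1) + 0.1 * rng.normal(size=60)
model = SOPLS2(a1=3, a2=2).fit(X1[:40], X2[:40], y[:40])
print(rmse(y[40:], model.predict(X1[40:], X2[40:])))
```

The orthogonalization step is what makes the result depend on block order: the second block can only add what is orthogonal to the first block's scores.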


Summary

Introduction

Data from multiple sources is widely encountered in the chemometrics domain [1,2]. Examples include measurements performed on a single sample with multiple spectroscopic sensors [3,4], data measured on multiple batches [5], and the same data pre-processed with several pre-processing techniques [6–8]. Traditional single-block latent variable-based chemometric approaches such as principal component analysis (PCA) [9] and partial least squares (PLS) regression [10,11] are widely used, but they are not the optimal solution when it comes to multiblock data [2]. To deal with multi-source data, dedicated techniques known as multiblock data analysis techniques exist in the domain of chemometrics [1,2,5,12–18]. Multiblock data analysis techniques exist for both data exploration [2,12,13] and predictive modelling [2,15–17]. Several feature selection methods are also available, such as sparse covariate regression [25] and sequential orthogonalized covariate selection [21], which allow key hidden features to be extracted from multi-source data while maintaining the predictive accuracy of the models.
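One of the sources of multiblock data listed above, a single data set pre-processed in several ways [6–8], can be illustrated with a small NumPy sketch. The choice of standard normal variate (SNV) and a simple finite-difference first derivative as the two pre-processings, the block names, and the synthetic spectra are assumptions for illustration only; Savitzky–Golay derivatives are a common alternative to the plain finite difference used here.

```python
import numpy as np


def snv(X):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mu) / sd


def first_derivative(X):
    """Finite-difference first derivative along the variable (wavelength) axis."""
    return np.gradient(X, axis=1)


def build_multiblock(X, preprocessors):
    """One block per pre-processing; all blocks share the same rows (samples)."""
    return {name: f(X) for name, f in preprocessors.items()}


rng = np.random.default_rng(0)
# Synthetic spectra: sloped baseline + per-sample offset + noise (illustrative only)
X = (
    np.linspace(0.0, 1.0, 200)[None, :]
    + rng.normal(0.0, 0.2, (10, 1))
    + 0.01 * rng.normal(size=(10, 200))
)
blocks = build_multiblock(X, {"snv": snv, "deriv1": first_derivative})
for name, B in blocks.items():
    print(name, B.shape)  # prints "snv (10, 200)" then "deriv1 (10, 200)"
```

Each entry of `blocks` has the same samples in the same row order, which is the basic requirement for passing them as separate input blocks to a multiblock method.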
