Abstract

Biomass estimation, fertilisation, and crop production reflect crop yield potential. Predicting these variables allows the selection of crop cultivars with high yield potential. Deep neural networks (DNNs) can predict such crop variables. However, DNNs are data-hungry algorithms that overfit or underfit on small datasets, and collecting big data is expensive and laborious. Generating large synthetic datasets is therefore preferable. This study aims to: (i) develop a trigonometric-Euclidean-smoother interpolation (TESI) for continuous time-series and non-time-series data augmentation to prevent DNNs from underfitting or overfitting; (ii) compare the TESI performance to the tabular variational autoencoder (TVAE) and the conditional tabular generative adversarial network (CTGAN); and (iii) compare the DNN performance before and after data augmentation. Two time-series datasets, oil palm production and rice production, and two non-time-series datasets, fertiliser and rice total aboveground biomass (TAGB), were augmented using the TESI, TVAE, and CTGAN algorithms. The TESI retained the features’ original probability distribution in the four datasets. The C-TESI achieved the lowest mean absolute error percentage (MAEP) on the oil palm (0.60–2.85%), rice (0.77–1.72%), and fertiliser (2.04–2.21%) datasets. The TESI kept the variance inflation factor (VIF) ranges less than 10 on the four datasets, either retaining a VIF range of 1.99–10.06 or reducing it to 1.55–6.66. Furthermore, the TESI either retained the Spearman's correlation coefficient (rs) range of 0.79–0.97 or increased it to 0.81–0.99 on the four datasets. The DNN achieved the highest coefficient of determination (R²) range (0.77–0.99) and the lowest root mean squared error (RMSE) range (2.8E+01–8.1E+05) on the four datasets augmented with the TESI.
The Q-TESI, C-TESI, and L-TESI outperformed the LN-TESI in retaining the features’ original probability distribution, minimising the augmentation loss, reducing the VIF, increasing the rs, and decreasing DNN underfitting and overfitting. The Q-TESI, C-TESI, and L-TESI may approximate the nonlinear changes of crop phenology in time-spaced sampling, thereby reducing the cost of sampling for scientists. In addition, they intensify zonal synthetic sampling, thereby reducing sampling labour.
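
Two of the diagnostics reported above, the VIF and Spearman's rs, can be illustrated with a minimal Python sketch. It does not reproduce the TESI or the study's evaluation pipeline; the function names (`vif`, `spearman`), the synthetic columns, and the use of NumPy are assumptions made only for illustration. The VIF of a feature is computed as 1 / (1 − R²), where R² comes from regressing that feature on the remaining features, and rs is computed as the Pearson correlation of the ranks.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Ordinary least squares of feature j on an intercept plus the other features.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        # ss_tot / ss_res equals 1 / (1 - R^2).
        out[j] = ss_tot / ss_res
    return out


def _ranks(x):
    """Ranks starting at 1, with tied values given their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    r = np.empty(len(x))
    r[order] = np.arange(1, len(x) + 1)
    for v in np.unique(x):
        mask = x == v
        r[mask] = r[mask].mean()
    return r


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return np.corrcoef(_ranks(x), _ranks(y))[0, 1]


# Hypothetical data: x2 is nearly a copy of x1, x3 is independent of both.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
features = np.column_stack([x1, x2, x3])

print("VIF per feature:", vif(features))
print("Spearman rs(x1, x2):", spearman(x1, x2))
```

In this made-up example the two nearly collinear columns should show VIF values far above the threshold of 10 mentioned in the abstract, while the independent column stays close to 1. Running the same two functions on a dataset before and after augmentation is one way to check whether multicollinearity and monotonic association were preserved.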
