Study on the detection of water status of tomato (Solanum lycopersicum L.) by multimodal deep learning

Abstract

Water plays a very important role in the growth of tomato (Solanum lycopersicum L.), and detecting the water status of tomato is the key to precise irrigation. The objective of this study is to detect the water status of tomato by fusing RGB, NIR and depth image information through deep learning. Five irrigation levels were set to cultivate tomatoes in different water states, with irrigation amounts of 150%, 125%, 100%, 75%, and 50% of reference evapotranspiration calculated by a modified Penman-Monteith equation. The water status of tomatoes was divided into five categories: severely deficit-irrigated, slightly deficit-irrigated, moderately irrigated, slightly over-irrigated, and severely over-irrigated. RGB images, depth images and NIR images of the upper part of the tomato plant were taken as data sets. The data sets were used to train and test tomato water status detection models built with single-modal and multimodal deep learning networks. In the single-modal deep learning network, two CNNs, VGG-16 and ResNet-50, were each trained on RGB images, depth images, or NIR images alone, for a total of six cases. In the multimodal deep learning network, two or more of the RGB images, depth images and NIR images were each trained with VGG-16 or ResNet-50, for a total of 20 combinations. Results showed that the accuracy of tomato water status detection based on single-modal deep learning ranged from 88.97% to 93.09%, while the accuracy of tomato water status detection based on multimodal deep learning ranged from 93.09% to 99.18%. Multimodal deep learning significantly outperformed single-modal deep learning. The tomato water status detection model built using a multimodal deep learning network with ResNet-50 for RGB images and VGG-16 for depth and NIR images was optimal. This study provides a novel method for non-destructive detection of the water status of tomato and gives a reference for precise irrigation management.
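
The case counts quoted in the abstract (six single-modal cases and 20 multimodal combinations) can be reproduced by simple enumeration. The sketch below is only an illustration of that arithmetic, under the assumption that each modality branch independently uses one of the two backbones; the function and variable names are hypothetical and not taken from the paper.

```python
from itertools import combinations, product

MODALITIES = ("RGB", "depth", "NIR")
BACKBONES = ("VGG-16", "ResNet-50")


def single_modal_cases():
    """One modality paired with one backbone: 3 x 2 = 6 cases."""
    return [(modality, backbone) for modality in MODALITIES for backbone in BACKBONES]


def multimodal_cases():
    """Two or more modalities, each branch with its own backbone (assumed)."""
    cases = []
    for k in range(2, len(MODALITIES) + 1):
        for subset in combinations(MODALITIES, k):
            # Every branch in the subset picks a backbone independently.
            for nets in product(BACKBONES, repeat=k):
                cases.append(tuple(zip(subset, nets)))
    return cases


singles = single_modal_cases()
multis = multimodal_cases()

# Configuration the abstract reports as optimal.
best = (("RGB", "ResNet-50"), ("depth", "VGG-16"), ("NIR", "VGG-16"))

print(len(singles), len(multis), best in multis)
```

Under this assumption, the three modality pairs contribute 3 × 2² = 12 combinations and the full triple contributes 2³ = 8, which matches the total of 20 stated in the abstract.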

Similar Papers
  • Conference Article
  • Cite Count Icon 2
  • 10.1109/icassp49357.2023.10095148
Long Range Imaging Using Multispectral Fusion of RGB and NIR Images
  • Jun 4, 2023
  • Hao Zhang + 2 more

Unlike RGB images, NIR images are robust to atmospheric effects such as Rayleigh scattering. In this paper, we propose long-range imaging using multispectral fusion of RGB and NIR images. The proposed network consists of feature extraction, feature fusion, and reconstruction. For feature extraction, we adopt pyramid feature selection to capture multi-scale information. For feature fusion, we present a learnable attention-based fusion block (AFB) to refine and fuse latent features of RGB and NIR images and a multi-scale fusion block (MFB) to integrate the fused features at multiple scales. For reconstruction, we use three convolutions corresponding to the encoder to reconstruct the fusion image. In addition, we perform image registration based on the scale-invariant feature transform (SIFT) to align the NIR image to the RGB image. Experimental results show that the proposed network successfully recovers hidden textures lost in RGB images while keeping colors and outperforms state-of-the-art fusion methods in terms of both visual quality and quantitative measurements.

  • Research Article
  • Cite Count Icon 8
  • 10.1016/j.inffus.2022.11.020
LRINet: Long-range imaging using multispectral fusion of RGB and NIR images
  • Nov 21, 2022
  • Information Fusion
  • Lu Liu + 2 more

  • Conference Article
  • Cite Count Icon 13
  • 10.1109/icpr.2018.8545108
Multi-Spectral Fusion and Denoising of RGB and NIR Images Using Multi-Scale Wavelet Analysis
  • Aug 1, 2018
  • Haonan Su + 1 more

In this paper, we propose multi-spectral fusion and denoising (MFD) of RGB and NIR images using multi-scale wavelet analysis. We formulate MFD of RGB and NIR images as a maximum a posteriori (MAP) estimation problem in the wavelet domain. The direct fusion of noisy RGB and NIR images often leads to contrast attenuation due to the discrepancy between RGB and NIR images. Thus, we generate the wavelet scale map for fusion and denoising based on the correlation between NIR and RGB wavelet coefficients. To consider local contrast and visibility of NIR data on RGB components, we provide the contrast preservation term for scale map estimation based on the local contrast and visibility. We use the regularization term to select high visibility and contrast of NIR wavelet coefficients in the scale map. Since noise generally appears in the high frequency band, we use gradients of NIR wavelet coefficients as the weight for weighted least squares (WLS) smoothing in the scale map. Based on the wavelet scale map, we perform fusion and denoising of RGB and NIR wavelet coefficients. Experimental results show that the proposed method successfully performs fusion of RGB and NIR images with noise reduction and detail preservation, and outperforms state-of-the-art methods in terms of discrete entropy (DE) and feature-based blind image quality evaluator (FBIQE).

  • Conference Article
  • Cite Count Icon 3
  • 10.1109/icpr48806.2021.9412411
Deep Fusion of RGB and NIR Paired Images Using Convolutional Neural Networks
  • Jan 10, 2021
  • Lin Mei + 1 more

In low-light conditions, captured color (RGB) images are highly degraded by noise with severe texture loss. In this paper, we propose deep fusion of RGB and NIR paired images in low-light conditions using convolutional neural networks (CNNs). The proposed deep fusion network consists of three independent sub-networks: DenoisingNet, EnhancingNet, and FusionNet. We build a denoising sub-network to eliminate noise from noisy RGB images. After denoising, we apply an enhancing sub-network to increase the brightness of low-light RGB images. Since the NIR image contains fine details, we fuse it with the Y channel of the RGB image through a fusion sub-network. Experimental results demonstrate that the proposed method successfully fuses RGB and NIR images, and generates high quality fusion results containing textures and colors.
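
To make the idea of fusing NIR detail into the Y (luma) channel concrete, here is a minimal per-pixel sketch. It is not the paper's FusionNet, which is a learned sub-network: a fixed linear blend stands in for it, and the function names and the `weight` parameter are hypothetical. The color conversion uses the standard BT.601 luma weights, with all values assumed to lie in [0, 1].

```python
def rgb_to_ycbcr(r, g, b):
    """Split a pixel into luma (Y) and chroma (Cb, Cr) using BT.601 weights."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = (b - y) / 1.772
    cr = (r - y) / 1.402
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Exact inverse of rgb_to_ycbcr."""
    r = y + 1.402 * cr
    b = y + 1.772 * cb
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b


def fuse_pixel(rgb, nir, weight=0.5):
    """Blend NIR intensity into the luma channel and keep the RGB chroma.

    A fixed blend is an assumption made for illustration; results may fall
    outside [0, 1] because clipping is deliberately omitted.
    """
    y, cb, cr = rgb_to_ycbcr(*rgb)
    fused_y = (1.0 - weight) * y + weight * nir
    return ycbcr_to_rgb(fused_y, cb, cr)


fused = fuse_pixel((0.2, 0.4, 0.6), nir=0.9, weight=0.5)
print(fused)
```

Because only Y is modified, the chroma of the RGB input is carried through unchanged, which is the property that lets a Y-channel fusion add texture from NIR while keeping colors.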

  • Research Article
  • Cite Count Icon 1
  • 10.3390/agronomy15051036
A Shooting Distance Adaptive Crop Yield Estimation Method Based on Multi-Modal Fusion
  • Apr 25, 2025
  • Agronomy
  • Dan Xu + 5 more

To address the low estimation accuracy of deep learning-based crop yield image recognition methods under untrained shooting distances, this study proposes a shooting distance adaptive crop yield estimation method by fusing RGB and depth image information through multi-modal data fusion. Taking strawberry fruit fresh weight as an example, RGB and depth image data of 348 strawberries were collected at nine heights ranging from 70 to 115 cm. First, based on RGB images and shooting height information, a single-modal crop yield estimation model was developed by training a convolutional neural network (CNN) after cropping strawberry fruit images using the relative area conversion method. Second, the height information was expanded into a data matrix matching the RGB image dimensions, and multi-modal fusion models were investigated through input-layer and output-layer fusion strategies. Finally, two additional approaches were explored: direct fusion of RGB and depth images, and extraction of average shooting height from depth images for estimation. The models were tested at two untrained heights (80 cm and 100 cm). Results showed that when using only RGB images and height information, the relative area conversion method achieved the highest accuracy, with R² values of 0.9212 and 0.9304, normalized root mean square error (NRMSE) of 0.0866 and 0.0814, and mean absolute percentage error (MAPE) of 0.0696 and 0.0660 at the two untrained heights. By further incorporating depth data, the highest accuracy was achieved through input-layer fusion of RGB images with extracted average height from depth images, improving R² to 0.9475 and 0.9384, reducing NRMSE to 0.0707 and 0.0766, and lowering MAPE to 0.0591 and 0.0610. Validation using a developed shooting distance adaptive crop yield estimation platform at two random heights yielded MAPE values of 0.0813 and 0.0593.
This model enables adaptive crop yield estimation across varying shooting distances, significantly enhancing accuracy under untrained conditions.
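
The three error measures quoted in this abstract (R², NRMSE and MAPE) can be written down in a few lines. The sketch below uses their common textbook definitions; the abstract does not say how NRMSE is normalized, so dividing RMSE by the mean of the observed values is an assumption here, and the sample weights are made-up numbers rather than data from the study.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def nrmse(y_true, y_pred):
    """RMSE divided by the mean observation (normalization is an assumption)."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / (sum(y_true) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, expressed as a fraction."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical fresh weights in grams: observed vs. estimated.
observed = [10.0, 20.0, 30.0, 40.0]
estimated = [12.0, 18.0, 33.0, 37.0]

print(r_squared(observed, estimated), nrmse(observed, estimated), mape(observed, estimated))
```

Read this way, a MAPE of 0.0591 corresponds to an average relative error of about 5.9% of the true fresh weight.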

  • Conference Article
  • Cite Count Icon 4
  • 10.1115/icef2022-89893
Deep Learning for Surface Assessment of Cylinder Liners in Large Internal Combustion Engines
  • Oct 16, 2022
  • Matthias Schwab + 6 more

As digitalization advances, improvements in technology have made it easier to conduct condition assessment of key components of large internal combustion engines such as cylinder liners. Due to their movement relative to the piston, the inner surfaces of the liners are subject to constant wear, and it is critical that the engine operator is informed in advance of any imminent damage. This study deals with wear assessment for cylinder liners from Type 6 gas engines from INNIO Jenbacher GmbH & Co OG with a cylinder displacement of approximately 6 dm³. Currently, wear quantification with this type of liner requires high-resolution microscopic surface depth measurements. The depth maps of the surface are then used for further analysis of the liner surface topography. To perform these microscopic measurements, the liners must be disassembled, cleaned and cut into segments, which is a major drawback of the current measurement process. Since the cylinder liners examined can no longer be used even if no major wear is detected, the main goal of the research presented here is to develop a method that is able to predict the depth map of a liner from a single RGB reflection image, i.e., a color image with no direct depth information. In recent years, depth map prediction from RGB images has become a vital part of image analysis in various fields such as the automotive industry, gaming and robotics. However, only a few studies deal with depth map predictions on a microscopic scale. For this study, both RGB and depth images of the cylinder liner surface with pixel-wise alignment are obtained with the help of the same confocal microscope. This data set contains 740 pairs of high-resolution microscopic depth and RGB reflection images capturing a roughly 2 × 2 mm area. As there are no landmarks, the depth of the surface is measured relative to the core of the profile. This is a main difference to most other studies, which mainly focus on absolute depth measurements.
First, the physical connection between the depth and the reflection images is investigated and described mathematically. This theoretical model provides good insight into how the information about the structure of the surface contained in the RGB image can be separated from other influencing factors such as lighting condition or color. Next, a deep learning framework is proposed to estimate the liner depth profiles from the RGB reflection images. A convolutional neural network is trained in a supervised manner to learn the correspondence between RGB and depth images. Using the physical model obtained in the first step, an RGB image is reconstructed from the predicted depth map. To ensure the physical plausibility of the model’s predictions, the similarity between the RGB input and the corresponding reconstruction is enforced by a reconstruction term. The proposed machine learning approach is comprehensively evaluated using meaningful distance measures between depth predictions and corresponding ground truth profiles. The results show that the proposed method is able to predict the depth profiles of the cylinder liners very accurately, indicating the great potential for engine liner wear assessment using microscopic RGB images.

  • Research Article
  • 10.3390/electronics13163330
Depth Image Rectification Based on an Effective RGB–Depth Boundary Inconsistency Model
  • Aug 22, 2024
  • Electronics
  • Hao Cao + 3 more

Depth images have been widely used in various 3D system tasks as depth acquisition sensors have advanced in recent years. However, depth images suffer from serious distortions near object boundaries due to the limitations of depth sensors or estimation methods. In this paper, a simple method is proposed to rectify the erroneous object boundaries of depth images with the guidance of reference RGB images. First, an RGB–Depth boundary inconsistency model is developed to measure whether collocated pixels in depth and RGB images belong to the same object. The model extracts the structures of the RGB and depth images separately using Gaussian functions. The inconsistency of two collocated pixels is then statistically determined inside large local windows. In this way, pixels near object boundaries in depth images are identified as erroneous when they are inconsistent with the collocated pixels in RGB images. Second, a depth image rectification method is proposed by embedding the model into a simple weighted mean filter (WMF). Experimental results on two datasets verify that the proposed method improves the RMSE and SSIM of depth images by 2.556 and 0.028, respectively, compared with recent optimization-based and learning-based methods.

  • Supplementary Content
  • Cite Count Icon 28
  • 10.1093/genetics/iyae161
A review of multimodal deep learning methods for genomic-enabled prediction in plant breeding
  • Nov 5, 2024
  • Genetics
  • Osval A Montesinos-López + 9 more

Deep learning methods have been applied when working to enhance the prediction accuracy of traditional statistical methods in the field of plant breeding. Although deep learning seems to be a promising approach for genomic prediction, it has proven to have some limitations, since its conventional methods fail to leverage all available information. Multimodal deep learning methods aim to improve the predictive power of their unimodal counterparts by introducing several modalities (sources) of input information. In this review, we introduce some theoretical basic concepts of multimodal deep learning and provide a list of the most widely used neural network architectures in deep learning, as well as the available strategies to fuse data from different modalities. We mention some of the available computational resources for the practical implementation of multimodal deep learning problems. We finally performed a review of applications of multimodal deep learning to genomic selection in plant breeding and other related fields. We present a meta-picture of the practical performance of multimodal deep learning methods to highlight how these tools can help address complex problems in the field of plant breeding. We discussed some relevant considerations that researchers should keep in mind when applying multimodal deep learning methods. Multimodal deep learning holds significant potential for various fields, including genomic selection. While multimodal deep learning displays enhanced prediction capabilities over unimodal deep learning and other machine learning methods, it demands more computational resources. Multimodal deep learning effectively captures intermodal interactions, especially when integrating data from different sources. To apply multimodal deep learning in genomic selection, suitable architectures and fusion strategies must be chosen. It is relevant to keep in mind that multimodal deep learning, like unimodal deep learning, is a powerful tool but should be carefully applied.
Given its predictive edge over traditional methods, multimodal deep learning is valuable in addressing challenges in plant breeding and food security amid a growing global population.

  • Conference Article
  • Cite Count Icon 3
  • 10.1109/iros40897.2019.8967616
Generate What You Can’t See - a View-dependent Image Generation
  • Nov 1, 2019
  • Karol Piaskowski + 2 more

In order to operate autonomously, a robot should explore the environment and build a model of each of the surrounding objects. A common approach is to carefully scan the whole workspace. This is time-consuming. It is also often impossible to reach all the viewpoints required to acquire full knowledge about the environment. Humans can perform shape completion of occluded objects by relying on past experience. Therefore, we propose a method that generates images of an object from various viewpoints using a single input RGB image. A deep neural network is trained to imagine the object appearance from many viewpoints. We present the whole pipeline, which takes a single RGB image as input and returns a sequence of RGB and depth images of the object. The method utilizes a CNN-based object detector to extract the object from the natural scene. Then, the proposed network generates a set of RGB and depth images. We show the results both on a synthetic dataset and on real images.

  • Research Article
  • 10.1002/cpe.6203
An object perception and positioning method via deep perception learning object detection
  • Jan 24, 2021
  • Concurrency and Computation: Practice and Experience
  • Limei Xiao + 4 more

One of the fundamental problems when building perception systems for robots is providing semantic information together with positioning in three‐dimensional (3D) space. However, two‐dimensional (2D) object detectors can only provide semantic information and pixel coordinates in 2D space, while the depth image reflects relative distance but carries little semantic description of the object. In this article, a novel object perception and positioning method via deep perception learning object detection is proposed. First, the RGB image and depth image are collected through the Kinect, and the depth image is processed to ensure the robustness of the model. Then, the object semantics and pixel location are obtained from the RGB image through an object detector based on deep learning. Finally, the object size measurement and 3D positioning are realized by combining the pixel location and the depth information. As a result, our model effectively combines the advantages of an accurate 2D detector and accurate depth information. Experimental results demonstrate that our method achieves high accuracy in size measurement and spatial positioning.
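
Combining a detector's pixel location with a depth reading to obtain a 3D position and a metric size is commonly done with the pinhole camera model. The sketch below illustrates that general idea and is not the authors' implementation: the `Intrinsics` values are hypothetical rather than real Kinect calibration data, and the object is assumed to be roughly fronto-parallel to the camera.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole camera parameters in pixels (hypothetical values below)."""
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(u, v, depth_m, k):
    """Map pixel (u, v) with depth in metres to camera-frame (X, Y, Z)."""
    x = (u - k.cx) * depth_m / k.fx
    y = (v - k.cy) * depth_m / k.fy
    return (x, y, depth_m)


def box_size_m(width_px, height_px, depth_m, k):
    """Approximate metric width and height of a bounding box at a given depth."""
    return (width_px * depth_m / k.fx, height_px * depth_m / k.fy)


camera = Intrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)

# Centre of a detected box at pixel (420, 240), measured 2.0 m away.
position = backproject(420, 240, 2.0, camera)

# The same box is 50 px wide and 100 px tall.
size = box_size_m(50, 100, 2.0, camera)

print(position, size)
```

In practice the depth value would be taken from the processed depth image at the box centre, or as a robust statistic over the box, which is where the depth preprocessing mentioned in the abstract matters.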

  • Research Article
  • Cite Count Icon 1
  • 10.5075/epfl-thesis-6766
Joint Acquisition of Color and Near-Infrared Images on a Single Sensor
  • Jan 1, 2015
  • Infoscience (Ecole Polytechnique Fédérale de Lausanne)
  • Zahra Sadeghipoor Kermani

In the past few years, fusing NIR and color images has been explored in general computational photography and computer vision tasks, where traditionally only color images are used. The additional information provided by the differences of light and scene reflections in the visible and NIR bands of the electromagnetic spectrum is used in several applications such as image denoising, image dehazing, shadow detection and removal, and high dynamic-range imaging. In this thesis, we study a system that simultaneously captures color and NIR images on a single silicon sensor. Such a camera could be manufactured with minor changes in the hardware of consumer color cameras and it could be integrated inside small devices such as cell phones. We address two main challenges in color and NIR acquisition. First, we study the spatial and spectral sampling of the scene, which is inevitable in single-sensor acquisition of multiple spectral channels. We then focus on chromatic aberration distortions. Similarly to color imaging, we use a color filter array (CFA) to sample the scene in visible and NIR bands. We address two main challenges regarding the CFA: (1) designing the CFA, and (2) developing a demosaicing algorithm to reconstruct full-resolution images from the subsampled measurements. We consider a general CFA with filters that transmit different mixtures of color and NIR channels. We develop a framework that, by exploiting the spatial and spectral correlations of color and NIR images, computes the transmittance of each filter and the demosaicing matrix. Our optimized CFA and demosaicing outperform other solutions developed for single-sensor color and NIR acquisition. We also investigate a CFA that is formed by one blue, one green, one red, and one NIR-pass filter. We call it the RGBN CFA and assume that, similarly to color cameras, it uses dye filters that do not have sharp cut-offs. Hence, color and NIR radiations leak into NIR and color filters, respectively. 
We devise an algorithm that reconstructs full-resolution images from mixed and subsampled sensor measurements. The RGBN CFA and our reconstruction algorithm perform as well as or even better than other single-sensor acquisition techniques that use more complicated hardware components. The problem of chromatic aberration is caused by deficiencies of optical elements. A simple lens converges light rays with different wavelengths at different distances from the lens. Hence, if the color image is in focus and sharp on the sensor plane, the NIR image captured with the same focus settings is out of focus and blurred. We propose an algorithm that retrieves the lost details in NIR using the gradients of the sharp color image. As the high-frequency details of color and NIR images are not strongly correlated in all image patches, our method locally adapts the contribution of color gradients in deblurring. To achieve this, we develop a multiscale scheme that iterates between deblurring NIR and estimating the correlation between color and NIR high-frequency components. Our algorithm outperforms both blind and guided deblurring approaches. We also design a method that estimates a dense blur-kernel map when the severity of chromatic aberration changes as the depths of objects vary across the image. Our method performs better than the competing methods both in estimating the blur-kernel map and in deblurring.

  • Research Article
  • Cite Count Icon 18
  • 10.1016/j.wasman.2024.01.047
Multi-modal deep learning networks for RGB-D pavement waste detection and recognition
  • Feb 6, 2024
  • Waste Management
  • Yangke Li + 1 more

  • Conference Article
  • Cite Count Icon 10
  • 10.1109/iccv48922.2021.00365
Teacher-Student Adversarial Depth Hallucination to Improve Face Recognition
  • Oct 1, 2021
  • Hardik Uppal + 3 more

We present the Teacher-Student Generative Adversarial Network (TS-GAN) to generate depth images from single RGB images in order to boost the performance of face recognition systems. For our method to generalize well across unseen datasets, we design two components in the architecture, a teacher and a student. The teacher, which itself consists of a generator and a discriminator, learns a latent mapping between input RGB and paired depth images in a supervised fashion. The student, which consists of two generators (one shared with the teacher) and a discriminator, learns from new RGB data with no available paired depth information, for improved generalization. The fully trained shared generator can then be used in runtime to hallucinate depth from RGB for downstream applications such as face recognition. We perform rigorous experiments to show the superiority of TS-GAN over other methods in generating synthetic depth images. Moreover, face recognition experiments demonstrate that our hallucinated depth along with the input RGB images boost performance across various architectures when compared to a single RGB modality by average values of +1.2%, +2.6%, and +2.6% for IIIT-D, EURECOM, and LFW datasets respectively. We make our implementation public at: https://github.com/hardik-uppal/teacher-student-gan.git.

  • Research Article
  • Cite Count Icon 400
  • 10.1007/s00371-021-02166-7
A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets
  • Jun 10, 2021
  • The Visual Computer
  • Khaled Bayoudh + 3 more

The research progress in multimodal learning has grown rapidly over the last decade in several areas, especially in computer vision. The growing potential of multimodal data streams and deep learning algorithms has contributed to the increasing universality of deep multimodal learning. This involves the development of models capable of processing and analyzing the multimodal information uniformly. Unstructured real-world data can inherently take many forms, also known as modalities, often including visual and textual content. Extracting relevant patterns from this kind of data is still a motivating goal for researchers in deep learning. In this paper, we seek to improve the understanding of key concepts and algorithms of deep multimodal learning for the computer vision community by exploring how to generate deep models that consider the integration and combination of heterogeneous visual cues across sensory modalities. In particular, we summarize six perspectives from the current literature on deep multimodal learning, namely: multimodal data representation, multimodal fusion (i.e., both traditional and deep learning-based schemes), multitask learning, multimodal alignment, multimodal transfer learning, and zero-shot learning. We also survey current multimodal applications and present a collection of benchmark datasets for solving problems in various vision domains. Finally, we highlight the limitations and challenges of deep multimodal learning and provide insights and directions for future research.

  • Conference Article
  • Cite Count Icon 12
  • 10.1117/12.2077102
Gradient-based correction of chromatic aberration in the joint acquisition of color and near-infrared images
  • Feb 27, 2015
  • Proceedings of SPIE, the International Society for Optical Engineering/Proceedings of SPIE
  • Zahra Sadeghipoor + 2 more

Chromatic aberration distortions, such as wavelength-dependent blur caused by imperfections in photographic lenses, have long been studied in color imaging. The problem becomes more challenging to solve in the case of color and near-infrared joint acquisition, as a wider range of wavelengths is captured. In this paper, we assume that the color image is in focus, hence, the NIR image captured with the same focus settings is blurred. We propose an algorithm that estimates the blur kernel and deblurs the NIR image, using the in-focus color image as a guide in both steps. In the deblurring step, we retrieve the lost details of the NIR image by using the sharp edges of the color image. The main difficulty in this task is caused by the fact that the edges of color and NIR images are not always correlated. To handle this issue, the algorithm analyzes the differences between the gradients of NIR and color channels. Simulation results verify the effectiveness of our algorithm, both in estimating the blur kernel and deblurring the NIR image, without producing ringing artifacts inherent to the results of other deblurring methods.
