Graphics Processing Units Memory Research Articles

Highlights: Lightweight deep learning models were trained on an edge device to identify weeds in aerial images. A customized configuration file was set up to train the models. These models were deployed to detect weeds in aerial images and videos in near real time. CSPMobileNet-v2 and YOLOv4-lite are the recommended models for weed detection on an edge platform.

Abstract. Deep learning (DL) techniques have proven to be a successful approach in detecting weeds for site-specific weed management (SSWM). In the past, most research work has trained and deployed pre-trained DL models on high-end systems coupled with expensive graphics processing units (GPUs). However, only a limited number of research studies have used DL models on an edge system for aerial-based weed detection. Therefore, while focusing on hardware cost minimization, eight DL models were trained and deployed on an edge device in this study to detect weeds in aerial images and videos. Four large models, namely CSPDarkNet-53, DarkNet-53, DenseNet-201, and ResNet-50, along with four lightweight models, CSPMobileNet-v2, YOLOv4-lite, EfficientNet-B0, and DarkNet-Ref, were considered for training a customized DL architecture. Along with trained model performance scores (average precision score, mean average precision (mAP), intersection over union, precision, and recall), other metrics to assess edge system performance, such as billion floating-point operations/s (BFLOPS), frames per second (FPS), and GPU memory usage, were also estimated. The lightweight CSPMobileNet-v2 and YOLOv4-lite models outperformed the others in detecting weeds in aerial images. These models achieved mAP scores of 83.2% and 82.2%, delivering 60.9 and 61.1 FPS, respectively, during near real-time weed detection in aerial videos. The popular ResNet-50 model achieved a mAP of 79.6%, which was the highest among all the large models deployed for weed detection tasks. Based on the results, the two lightweight models, CSPMobileNet-v2 and YOLOv4-lite, are recommended, and they can be used on a low-cost edge system to detect weeds in aerial images with significant accuracy.

Keywords: Aerial image, Deep learning, Edge device, Precision agriculture, Weed detection.

Abstract. Lagrangian models are fundamental tools to study atmospheric transport processes and for practical applications such as dispersion modeling for anthropogenic and natural emission sources. However, conducting large-scale Lagrangian transport simulations with millions of air parcels or more can become rather numerically costly. In this study, we assessed the potential of exploiting graphics processing units (GPUs) to accelerate Lagrangian transport simulations. We ported the Massive-Parallel Trajectory Calculations (MPTRAC) model to GPUs using the open accelerator (OpenACC) programming model. The trajectory calculations conducted within the MPTRAC model were fully ported to GPUs, i.e., except for feeding in the meteorological input data and for extracting the particle output data, the code operates entirely on the GPU devices without frequent data transfers between CPU and GPU memory. Model verification, performance analyses, and scaling tests of the Message Passing Interface (MPI) – Open Multi-Processing (OpenMP) – OpenACC hybrid parallelization of MPTRAC were conducted on the Jülich Wizard for European Leadership Science (JUWELS) Booster supercomputer operated by the Jülich Supercomputing Centre, Germany. The JUWELS Booster comprises 3744 NVIDIA A100 Tensor Core GPUs, providing a peak performance of 71.0 PFlop s−1. As of June 2021, it is the most powerful supercomputer in Europe and listed among the most energy-efficient systems internationally. For large-scale simulations comprising 10^8 particles driven by the European Centre for Medium-Range Weather Forecasts' fifth-generation reanalysis (ERA5), the performance evaluation showed a maximum speed-up of a factor of 16 due to the utilization of GPUs compared to CPU-only runs on the JUWELS Booster. In the large-scale GPU run, about 67 % of the runtime is spent on the physics calculations, conducted on the GPUs. Another 15 % of the runtime is required for file I/O, mostly to read the large ERA5 data set from disk. Meteorological data preprocessing on the CPUs also requires about 15 % of the runtime. Although this study identified potential for further improvements of the GPU code, we consider the MPTRAC model ready for production runs on the JUWELS Booster in its present form. The GPU code provides a much faster time to solution than the CPU code, which is particularly relevant for near-real-time applications of a Lagrangian transport model.

Articles published on Graphics Processing Units Memory

Network-Assisted Noncontiguous Transfers for GPU-Aware MPI Libraries

[Retracted] Optimized IANSegNet: Deep Segmentation for the Detection of Inferior Alveolar Nerve Canal

Aerial-Based Weed Detection Using Low-Cost and Lightweight Deep Learning Models on an Edge Platform

Lightweight Deep Learning Models for High-Precision Rice Seedling Segmentation from UAV-Based Multispectral Images.

Autosegmentation of brain metastases using 3D FCNN models and methods to manage GPU memory limitations

Conditions for the existence of broadcast and spatial locality in computation threads

Virtualizing GPU direct packet I/O on commodity Ethernet to accelerate GPU-NFV

Characterizing and Mitigating Soft Errors in GPU DRAM

A GPU parallel scheme for accelerating 2D and 3D peridynamics models

3D map reconstruction using a monocular camera for smart cities

Massive-Parallel Trajectory Calculations version 2.2 (MPTRAC-2.2): Lagrangian transport simulations on graphics processing units (GPUs)

Efficient cascaded V-net optimization for lower extremity CT segmentation validated using bone morphology assessment.

Efficient simulation execution of cellular automata on GPU

High-performance reconstruction of CT medical images by using out-of-core methods in GPU

Graphics Processing Unit-Based Element-by-Element Strategies for Accelerating Topology Optimization of Three-Dimensional Continuum Structures Using Unstructured All-Hexahedral Mesh

A Graphics Processing Unit–Based, Industrial Grade Compositional Reservoir Simulator

Chrono::GPU: An Open-Source Simulation Package for Granular Dynamics Using the Discrete Element Method

Multiple instance convolutional neural network with modality-based attention and contextual multi-instance learning pooling layer for effective differentiation between borderline and malignant epithelial ovarian tumors

VEDAS: an efficient GPU alternative for store and query of large RDF data sets

GPU acceleration of DEMO particle exhaust simulations
