A compression pipeline for one-stage object detection model

  • Abstract
  • Literature Map
  • Similar Papers
Abstract

Deep neural networks (DNNs) have strong fitting ability on a variety of computer vision tasks, but they also require intensive computing power and large storage space, which are not always available in portable smart devices. Although many studies have contributed to the compression of image classification networks, there are few model compression algorithms for object detection models. In this paper, we propose a general compression pipeline for one-stage object detection networks to meet real-time requirements. First, we propose a softer pruning strategy on the backbone to reduce the number of filters. Compared with direct pruning, our method maintains the integrity of the network structure and reduces the drop in accuracy. Second, we transfer the knowledge of the original model to the small model by knowledge distillation to reduce the accuracy drop caused by pruning. Finally, as edge devices are better suited to integer operations, we further transform the 32-bit floating-point model into an 8-bit integer model through quantization. With this pipeline, the model size and inference time are compressed to 10% or less of the original, while the mAP is reduced by only 2.5% or less. We verified the performance of the compression pipeline on the Pascal VOC dataset.
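The three stages in the abstract (soft filter pruning, knowledge distillation, int8 quantization) can be sketched in a few lines of NumPy. This is an illustrative reconstruction under our own assumptions, not the authors' implementation: `soft_prune_filters` zeroes the lowest-L1-norm filters instead of deleting them (keeping the layer shape intact, as a "softer" strategy requires), `soften` produces the temperature-scaled teacher targets used in distillation, and `quantize_int8` applies symmetric per-tensor quantization.

```python
import numpy as np

def soft_prune_filters(weights, prune_ratio):
    """'Softer' pruning: zero the filters with the smallest L1 norms
    instead of removing them, so the layer shape stays intact."""
    norms = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    n_prune = int(len(norms) * prune_ratio)
    pruned = weights.copy()
    if n_prune > 0:
        pruned[np.argsort(norms)[:n_prune]] = 0.0  # smallest-norm filters
    return pruned

def soften(logits, temperature):
    """Temperature-scaled softmax: the soft targets a student network
    matches during knowledge distillation."""
    z = logits / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

def quantize_int8(weights):
    """Symmetric per-tensor quantization of float32 weights to int8."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 3, 3, 3)).astype(np.float32)  # a toy conv layer
w_pruned = soft_prune_filters(w, prune_ratio=0.5)      # half the filters zeroed
q, scale = quantize_int8(w_pruned)
w_restored = q.astype(np.float32) * scale              # dequantized approximation
```

In a real pipeline the pruned network would be fine-tuned against the teacher's soft targets before quantization; the sketch only shows the three transformations themselves.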

Similar Papers
  • Abstract
  • Citations: 3
  • 10.1182/blood-2022-168780
A Fully Automated Digital Workflow for Assessment of Bone Marrow Cytomorphology Based on Single Cell Detection and Classification with AI
  • Nov 15, 2022
  • Blood
  • Christian Pohlkamp + 9 more

  • Research Article
  • Citations: 5
  • 10.3390/bioengineering10070807
Performance Comparison of Object Detection Networks for Shrapnel Identification in Ultrasound Images.
  • Jul 5, 2023
  • Bioengineering
  • Sofia I Hernandez-Torres + 2 more

Ultrasound imaging is a critical tool for triaging and diagnosing subjects, but only if images can be properly interpreted. Unfortunately, in remote or military medicine situations, the expertise to interpret images can be lacking. Machine-learning image interpretation models that are explainable to the end user and deployable in real time with ultrasound equipment have the potential to solve this problem. We have previously shown how a YOLOv3 (You Only Look Once) object detection algorithm can be used for tracking shrapnel, artery, vein, and nerve fiber bundle features in a tissue phantom. However, real-time implementation of an object detection model requires optimizing model inference time. Here, we compare the performance of five object detection deep-learning models with varying architectures and trainable parameters to determine which is most suitable for this shrapnel-tracking ultrasound image application. We used a dataset of more than 16,000 ultrasound images from gelatin tissue phantoms containing artery, vein, nerve fiber, and shrapnel features for training and evaluating each model. Every object detection model surpassed 0.85 mean average precision except the detection transformer model. Overall, the YOLOv7tiny model had the highest mean average precision and the quickest inference time, making it the obvious choice for this ultrasound imaging application. The other object detection models overfit the data, as determined by lower testing performance relative to training performance. Next steps will implement this object detection algorithm for real-time applications, an important step in translating AI models for emergency and military medicine.

  • Research Article
  • Citations: 12
  • 10.1016/j.measen.2022.100409
Split computing: DNN inference partition with load balancing in IoT-edge platform for beyond 5G
  • Aug 18, 2022
  • Measurement: Sensors
  • Jyotirmoy Karjee + 3 more

  • Research Article
  • Citations: 3
  • 10.3390/ani13182924
Bird Object Detection: Dataset Construction, Model Performance Evaluation, and Model Lightweighting.
  • Sep 14, 2023
  • Animals
  • Yang Wang + 6 more

The application of object detection technology has a positive auxiliary role in advancing the intelligence of bird recognition and enhancing the convenience of bird field surveys. However, challenges arise from the absence of dedicated bird datasets and evaluation benchmarks. To address this, we have not only constructed the largest known bird object detection dataset, but also compared the performance of eight mainstream detection models on bird object detection tasks and proposed feasible approaches for model lightweighting in bird object detection. Our bird detection dataset, GBDD1433-2023, includes 1433 globally common bird species and 148,000 manually annotated bird images. Based on this dataset, two-stage detection models such as Faster R-CNN and Cascade R-CNN demonstrated superior performance, achieving a Mean Average Precision (mAP) of 73.7%, compared to one-stage models. In addition, two-stage object detection models showed greater robustness than one-stage models to variations in foreground image scaling and background interference in bird images. On bird counting tasks, accuracy ranged between 60.8% and 77.2% for up to five birds in an image, but decreased sharply beyond that count, suggesting limitations of object detection models in multi-bird counting tasks. Finally, we proposed an adaptive localization distillation method for one-stage lightweight object detection models suitable for offline deployment, which improved the performance of the relevant models. Overall, our work furnishes an enriched dataset and practical guidelines for selecting suitable bird detection models.

  • Research Article
  • Citations: 146
  • 10.3390/rs12132136
Comparison of Object Detection and Patch-Based Classification Deep Learning Models on Mid- to Late-Season Weed Detection in UAV Imagery
  • Jul 3, 2020
  • Remote Sensing
  • Arun Narenthiran Veeranampalayam Sivakumar + 6 more

Mid- to late-season weeds that escape routine early-season weed management threaten agricultural production by creating a large number of seeds for several future growing seasons. Rapid and accurate detection of weed patches in the field is the first step of site-specific weed management. In this study, object detection-based convolutional neural network models were trained and evaluated on low-altitude unmanned aerial vehicle (UAV) imagery for mid- to late-season weed detection in soybean fields. The performance of two object detection models, Faster RCNN and the Single Shot Detector (SSD), was evaluated and compared in terms of weed detection performance using mean Intersection over Union (IoU) and inference speed. It was found that the Faster RCNN model with 200 box proposals had weed detection performance similar to that of the SSD model in terms of precision, recall, F1 score, and IoU, as well as a similar inference time. The precision, recall, F1 score, and IoU were 0.65, 0.68, 0.66, and 0.85 for Faster RCNN with 200 proposals, and 0.66, 0.68, 0.67, and 0.84 for SSD, respectively. However, the optimal confidence threshold of the SSD model was found to be much lower than that of the Faster RCNN model, which indicated that SSD might have lower generalization performance than Faster RCNN for mid- to late-season weed detection in soybean fields using UAV imagery. The performance of the object detection models was also compared with a patch-based CNN model. The Faster RCNN model yielded better weed detection performance than the patch-based CNN both with and without overlap. The inference time of Faster RCNN was similar to that of the patch-based CNN without overlap, but significantly less than that of the patch-based CNN with overlap. Hence, Faster RCNN was found to be the best model in terms of weed detection performance and inference time among the models compared in this study. This work is important in understanding the potential of, and identifying the algorithms for, on-farm, near real-time weed detection and management.
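The Intersection over Union metric central to the evaluation above is simple to state in code. The following is a generic sketch for axis-aligned boxes in `(x1, y1, x2, y2)` form, not the study's own evaluation script:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes,
    each given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A detection counts as correct when its IoU with a ground-truth box exceeds a chosen threshold (0.5 is the usual Pascal VOC convention).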

  • Research Article
  • Citations: 34
  • 10.3390/s23094432
SsFPN: Scale Sequence (S2) Feature-Based Feature Pyramid Network for Object Detection.
  • Apr 30, 2023
  • Sensors (Basel, Switzerland)
  • Hye-Jin Park + 2 more

Object detection is a fundamental task in computer vision. Over the past several years, convolutional neural network (CNN)-based object detection models have significantly improved detection accuracy in terms of average precision (AP). Furthermore, feature pyramid networks (FPNs) are essential modules for object detection models to handle various object scales. However, the AP for small objects is lower than the AP for medium and large objects. It is difficult to recognize small objects because they carry little information, and that information is lost in deeper CNN layers. This paper proposes a new FPN model named ssFPN (scale sequence (S2) feature-based feature pyramid network) to detect multi-scale objects, especially small objects. Motivated by scale-space theory, the FPN is regarded as a scale space, and a scale sequence (S2) feature is extracted by three-dimensional convolution on the level axis of the FPN to strengthen the information on small objects. The feature is essentially scale-invariant and is built on a high-resolution pyramid feature map for small objects. Additionally, the designed S2 feature can be extended to most object detection models based on FPNs. We also designed a feature-level super-resolution approach to show the efficiency of the scale sequence (S2) feature, verifying that it could improve the classification accuracy for low-resolution images when training a feature-level super-resolution model. To demonstrate the effect of the scale sequence (S2) feature, experiments with the S2 feature built into object detection models, including both one-stage and two-stage models, were conducted on the MS COCO dataset.
For the two-stage object detection models Faster R-CNN and Mask R-CNN with the S2 feature, AP improvements of up to 1.6% and 1.4%, respectively, were achieved. Additionally, the APS of each model was improved by 1.2% and 1.1%, respectively. Furthermore, the one-stage object detection models in the YOLO series were improved. For YOLOv4-P5, YOLOv4-P6, YOLOR-P6, YOLOR-W6, and YOLOR-D6 with the S2 feature, 0.9%, 0.5%, 0.5%, 0.1%, and 0.1% AP improvements were observed. For small object detection, the APS increased by 1.1%, 1.1%, 0.9%, 0.4%, and 0.1%, respectively. Experiments using the feature-level super-resolution approach with the proposed scale sequence (S2) feature were conducted on the CIFAR-100 dataset. By training the feature-level super-resolution model, we verified that ResNet-101 with the S2 feature trained on LR images achieved a 55.2% classification accuracy, which was 1.6% higher than for ResNet-101 trained on HR images.

  • Research Article
  • Citations: 23
  • 10.3390/rs14174217
Real-Time Weed Control Application Using a Jetson Nano Edge Device and a Spray Mechanism
  • Aug 26, 2022
  • Remote Sensing
  • Eduardo Assunção + 6 more

Portable devices play an essential role where edge computing is necessary and mobility is required (e.g., robots in agriculture within remote-sensing applications). With the increasing application of deep neural networks (DNNs) and accelerators for edge devices, several methods and applications have been proposed for simultaneous crop and weed detection. Although preliminary studies have investigated inference time for semantic segmentation of crops and weeds on edge devices, the performance degradation caused by the optimizations required to run models on such devices has not been evaluated in detail. This paper investigates the relationship between model tuning hyperparameters chosen to improve inference time and their effect on segmentation performance. The study was conducted using the semantic segmentation model DeepLabv3 with a MobileNet backbone. Different datasets (Cityscapes, PASCAL and ADE20K) were analyzed for a transfer learning strategy. The results show that, when using a depth multiplier (DM) hyperparameter of 0.5 and the TensorRT framework, segmentation performance (mean intersection over union, mIOU) decreased by 14.7% compared to a DM of 1.0 without TensorRT, while inference time accelerated dramatically by a factor of 14.8. At an image resolution of 1296×966, segmentation performance of 64% mIOU and inference at 5.9 frames per second (FPS) were achieved on the Jetson Nano device. With an input image resolution of 513×513 and hyperparameters output stride OS = 32 and DM = 0.5, an inference time of 0.04 s was achieved, resulting in 25 FPS. The results presented in this paper provide deeper insight into how the performance of a crop and weed semantic segmentation model degrades when it is optimized to run on edge devices. Lastly, an application is described in which the weed segmentation model is embedded in the edge device (Jetson Nano) and integrated with an orchard robot; the results show good spraying accuracy and feasibility of the method.

  • Research Article
  • Citations: 2
  • 10.1002/int.22851
DetectSec: Evaluating the robustness of object detection models to adversarial attacks
  • Feb 8, 2022
  • International Journal of Intelligent Systems
  • Tianyu Du + 9 more

Despite their tremendous success in various machine learning tasks, deep neural networks (DNNs) are inherently vulnerable to adversarial examples, which are maliciously crafted inputs that cause DNNs to misbehave. Intensive research has been conducted on this phenomenon in simple tasks (e.g., image classification). However, little is known about this adversarial vulnerability for object detection, a much more complicated task that often requires specialized DNNs and multiple additional components. In this paper, we present DetectSec, a uniform platform for robustness analysis of object detection models. Currently, DetectSec implements 13 representative adversarial attacks with 7 utility metrics and 13 defenses on 18 standard object detection models. Leveraging DetectSec, we conduct the first rigorous evaluation of adversarial attacks on state-of-the-art object detection models. We analyze the impact of factors including DNN architecture and capacity on model robustness. We show that many conclusions about adversarial attacks and defenses in image classification do not transfer to object detection; for example, the targeted attack is stronger than the untargeted attack for two-stage detectors. Our findings will aid future efforts in understanding and defending against adversarial attacks in complicated tasks. In addition, we compare the robustness of different detection models and discuss their relative strengths and weaknesses. The DetectSec platform will be open-sourced as a unique facility for further research on adversarial attacks and defenses in object detection tasks.
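Adversarial examples like those DetectSec evaluates are easiest to see on a toy model. The sketch below applies the classic Fast Gradient Sign Method (FGSM) to a hand-written logistic regression; the weights and inputs are invented for illustration, and real attacks on detectors perturb full images against detection losses.

```python
import numpy as np

def fgsm_perturb(x, grad, eps):
    """One-step FGSM: move each coordinate eps in the sign direction
    of the loss gradient, maximizing loss under an L-infinity budget."""
    return x + eps * np.sign(grad)

# Toy logistic model with cross-entropy loss; all numbers are made up.
w = np.array([1.0, -2.0, 0.5])       # fixed model weights
x = np.array([0.2, 0.1, -0.3])       # clean input
y = 1.0                              # true label
p = 1.0 / (1.0 + np.exp(-(w @ x)))   # model confidence in class 1
grad_x = (p - y) * w                 # d(loss)/dx for cross-entropy
x_adv = fgsm_perturb(x, grad_x, eps=0.1)
p_adv = 1.0 / (1.0 + np.exp(-(w @ x_adv)))  # confidence after attack
```

The perturbation stays within an L-infinity ball of radius `eps`, yet the model's confidence in the true class drops.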

  • Research Article
  • Citations: 1
  • 10.5753/jidm.2020.2026
Evaluating Edge-Cloud Computing Trade-Offs for Mobile Object Detection and Classification with Deep Learning
  • Jun 30, 2020
  • Journal of Information and Data Management
  • W F Magalhães + 5 more

Internet-of-Things (IoT) applications based on artificial intelligence, such as mobile object detection and recognition from images and videos, may greatly benefit from inferences made by state-of-the-art Deep Neural Network (DNN) models. However, adopting such models in IoT applications poses an important challenge, since DNNs usually require substantial computational resources (i.e., memory, disk, CPU/GPU, and power), which may prevent them from running on resource-limited edge devices. On the other hand, moving the heavy computation to the cloud may significantly increase running costs and latency of IoT applications. Possible strategies to tackle this challenge include: (i) DNN model partitioning between edge and cloud; and (ii) running simpler models at the edge and more complex ones in the cloud, with information exchange between models when needed. Variations of strategy (i) also include running the entire DNN on the edge device (sometimes not feasible) and running the entire DNN in the cloud. All these strategies involve trade-offs in terms of latency, communication, and financial costs. In this article we investigate such trade-offs in real-world scenarios. We conduct several experiments using deep learning models for image-based object detection and classification. Our setup includes a Raspberry Pi 3 B+ and a cloud server equipped with a GPU; different network bandwidths are also evaluated. Our results provide useful insights about the aforementioned trade-offs. The partitioning experiment showed that, overall, running inference entirely on the edge or entirely on the cloud server are the best options. The collaborative approach yielded a significant increase in accuracy without penalizing running costs too much.
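To first order, the edge-versus-cloud decision studied above reduces to a latency comparison: on-device compute time versus transfer time plus round trip plus server compute. A back-of-the-envelope sketch, with all numbers hypothetical:

```python
def cloud_latency_ms(payload_megabits, bandwidth_mbps, cloud_compute_ms, rtt_ms):
    """End-to-end latency when offloading inference to the cloud:
    upload time + network round trip + server-side compute."""
    transfer_ms = payload_megabits / bandwidth_mbps * 1000.0
    return transfer_ms + rtt_ms + cloud_compute_ms

def best_strategy(edge_compute_ms, cloud_ms):
    """Pick whichever strategy answers first (ignoring cost and energy)."""
    return "edge" if edge_compute_ms <= cloud_ms else "cloud"

# Hypothetical numbers: a 2-megabit JPEG over a 10 Mbps uplink, 40 ms RTT,
# 30 ms of GPU inference in the cloud vs 120 ms of CPU inference on the edge.
cloud_ms = cloud_latency_ms(2.0, 10.0, cloud_compute_ms=30.0, rtt_ms=40.0)
choice = best_strategy(120.0, cloud_ms)
```

On a fast link the balance flips: at 100 Mbps the transfer drops to about 20 ms and offloading wins, which mirrors the article's finding that bandwidth largely determines the best placement.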

  • Dissertation
  • 10.17760/d20416558
Towards robust image classification with deep learning and real-time DNN inference on mobile
  • Jan 1, 2020
  • Pu Zhao

With the rapidly increasing popularity of deep learning, deep neural networks (DNNs) have become fundamental and essential building blocks in various applications such as image classification and object detection. However, two main issues potentially limit the wide application of DNNs: 1) the robustness of DNN models raises security concerns, and 2) the large computation and storage requirements of DNN models make their wide deployment difficult on popular yet resource-constrained devices such as mobile phones. To investigate DNN robustness, we explore DNN attacks, robustness evaluation, and defenses. More specifically, for DNN attacks, we achieve various attack goals (e.g., adversarial examples and fault sneaking attacks) with different algorithms (e.g., alternating direction method of multipliers (ADMM) and natural gradient descent (NGD) attacks) under various conditions (white-box and black-box attacks). For robustness evaluation, we propose a fast evaluation method to obtain a model perturbation bound such that any model perturbation within the bound does not alter the model's classification outputs or incur model misbehavior. For DNN defense, we investigate the defense performance of model connection techniques and successfully mitigate fault sneaking and backdoor attacks. With a deeper understanding of DNN robustness, we further explore the deployment of DNN models on edge devices with limited resources. To satisfy the storage and computation limitations of edge devices, we adopt model pruning to remove redundancy in models, reducing storage and computation during inference. Besides, as some applications have real-time requirements with high inference speed sensitivity, such as object detection on autonomous cars, we further implement real-time DNN inference for various DNN applications on mobile devices with pruning and compiler optimization. In summary, we mainly investigate DNN robustness and implement real-time DNN inference on mobile devices. --Author's abstract

  • Research Article
  • Citations: 1
  • 10.55041/ijsrem36414
EDGE AI BASED OBJECT DETECTION SYSTEM USING TFLITE
  • Jul 14, 2024
  • INTERANTIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
  • Dominic Paul R + 1 more

Edge computing has gained prominence in recent years due to its ability to process data closer to the source, reducing latency and bandwidth requirements. In this paper, we propose an Edge AI Based Object Detection System using TensorFlow Lite (TFLite), designed to perform real-time object detection on resource-constrained edge devices. The system leverages the efficiency and portability of TFLite, a lightweight framework for deploying machine learning models on edge devices, to enable efficient inference without relying on cloud connectivity. Our proposed system integrates state-of-the-art object detection models, such as SSD (Single Shot Multibox Detector) and YOLO (You Only Look Once), into the TFLite runtime environment. Through optimization techniques such as model quantization, pruning, and architecture modifications, we tailor these models to meet the computational and memory constraints of edge devices while maintaining high detection accuracy. Furthermore, we explore hardware acceleration options, including GPU and DSP (Digital Signal Processor), to further enhance inference speed and energy efficiency. We evaluate the performance of the Edge AI Based Object Detection System on various edge devices, including smartphones, IoT (Internet of Things) devices, and embedded systems. Real-world deployment scenarios are considered, encompassing applications such as smart surveillance, industrial automation, and autonomous vehicles. The results demonstrate the system's ability to achieve real-time object detection with low latency and minimal resource consumption, making it well-suited for edge computing environments where real-time responsiveness and privacy are paramount concerns. Keywords: object detection, machine learning, TensorFlow Lite, deep learning.
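The model quantization mentioned above typically maps float32 tensors to 8-bit integers with an affine scheme (a scale plus a zero point). The following is a self-contained sketch of that arithmetic, not TFLite's actual implementation (which also handles per-channel scales and fused activations):

```python
import numpy as np

def affine_quantize(x, num_bits=8):
    """Asymmetric (affine) quantization as commonly used for uint8
    activation tensors: q = round(x / scale) + zero_point."""
    qmin, qmax = 0, 2**num_bits - 1
    lo, hi = float(x.min()), float(x.max())
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # the range must contain zero
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def affine_dequantize(q, scale, zero_point):
    """Map quantized values back to approximate float32."""
    return (q.astype(np.float32) - zero_point) * scale
```

Because the representable range is forced to contain zero, padding and zero-valued activations survive quantization exactly, which is one reason the affine scheme is preferred for activations.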

  • Conference Article
  • 10.1109/metrocad56305.2022.00010
Characterization of Real-Time Object Detection Workloads on Vehicular Edge
  • Apr 1, 2022
  • Sihai Tang + 4 more

As recent literature suggests the need for communication between autonomous vehicles, edge devices have emerged as a viable conduit to facilitate real-time data sharing. Edge devices strike a suitable medium between the alternatives of cloud centralization and full vehicle-to-vehicle decentralization, providing the computational savings of sending and receiving information from one place while also boosting speed by bypassing internet protocols. Given the novelty of both object detection models and autonomous vehicle-oriented edge device implementation, there are no standards for hardware and software specifications on the edge. In this project, we seek to address this void, investigating the GPU and CPU usage patterns of various object detection models and machine learning frameworks. We also aim to uncover optimization opportunities such as workload pipelining. One early difficulty was that only a few of the models tested achieved real-time (<33 ms) object detection. Our results show that GPU utilization varies widely between models. One interesting finding is that only one CPU core is used during the inference process, suggesting the number of CPU cores will not be a bottleneck. Meanwhile, we find that increasing CPU cores proportional to the amount of traffic will likely be necessary to preserve real-time object detection.

  • Research Article
  • Citations: 9
  • 10.1145/3589766
Energy-Efficient Approximate Edge Inference Systems
  • Jul 24, 2023
  • ACM Transactions on Embedded Computing Systems
  • Soumendu Kumar Ghosh + 2 more

The rapid proliferation of the Internet of Things and the dramatic resurgence of artificial intelligence based application workloads have led to immense interest in performing inference on energy-constrained edge devices. Approximate computing (a design paradigm that trades off a small degradation in application quality for disproportionate energy savings) is a promising technique to enable energy-efficient inference at the edge. This article introduces the concept of an approximate edge inference system (AxIS) and proposes a systematic methodology to perform joint approximations between different subsystems in a deep neural network (DNN)-based edge inference system, leading to significant energy benefits compared to approximating individual subsystems in isolation. We use a smart camera system that executes various DNN-based image classification and object detection applications to illustrate how the sensor, memory, compute, and communication subsystems can all be approximated synergistically. We demonstrate our proposed methodology using two variants of a smart camera system: (a) CamEdge, where the DNN is executed locally on the edge device, and (b) CamCloud, where the edge device sends the captured image to a remote cloud server that executes the DNN. We have prototyped such an approximate inference system using an Intel Stratix IV GX-based Terasic TR4-230 FPGA development board. Experimental results obtained using six large DNNs and four compact DNNs running image classification applications demonstrate significant energy savings (≈1.6×-4.7× for large DNNs and ≈1.5×-3.6× for small DNNs) for minimal (<1%) loss in application-level quality. Furthermore, results using four object detection DNNs exhibit energy savings of ≈1.5×-5.2× for similar quality loss. Compared to approximating a single subsystem in isolation, AxIS achieves 1.05×-3.25× gains in energy savings for image classification and 1.35×-4.2× gains for object detection on average, for minimal (<1%) application-level quality loss.

  • Conference Article
  • Citations: 7
  • 10.1109/cogmi52975.2021.00035
Parallel Detection for Efficient Video Analytics at the Edge
  • Dec 1, 2021
  • Yanzhao Wu + 2 more

Deep Neural Network (DNN) trained object detectors are widely deployed in many mission-critical systems for real time video analytics at the edge, such as autonomous driving, video surveillance and Internet of smart cameras. A common performance requirement in these mission-critical edge services is the near real-time latency of online object detection on edge devices. However, even with well-trained DNN object detectors, the online detection quality at edge may deteriorate for a number of reasons, such as limited capacity to run DNN object detection models on heterogeneous edge devices, and detection quality degradation due to random frame dropping when the detection processing rate is significantly slower than the incoming video frame rate. This paper addresses these problems by exploiting multi-model multi-device detection parallelism for fast object detection in edge systems with heterogeneous edge devices. First, we analyze the performance bottleneck of running a well-trained DNN model at edge for real time online object detection. We use the offline detection as a reference model, and examine the root cause by analyzing the mismatch among the incoming video streaming rate, the video processing rate for object detection, and the output rate for real time detection visualization of video streaming. Second, we study performance optimizations by exploiting multi-model detection parallelism. We show that the model-parallel detection approach can effectively speed up the FPS detection processing rate, minimizing the FPS disparity with the incoming video frame rate on heterogeneous edge devices. We evaluate the proposed approach using SSD300 and YOLOv3 (pre-trained DNN models) on benchmark videos of different video stream rates. The results show that exploiting multi-model detection parallelism can speed up the online object detection processing rate and deliver near real-time object detection performance for efficient video analytics at edge.

  • Research Article
  • Citations: 12
  • 10.3390/electronics12030541
Object Recognition System for the Visually Impaired: A Deep Learning Approach using Arabic Annotation
  • Jan 20, 2023
  • Electronics
  • Nada Alzahrani + 1 more

Object detection is an important computer vision technique that has increasingly attracted the attention of researchers in recent years. The literature to date in the field has introduced a range of object detection models. However, these models have largely been English-language-based, and there is only a limited number of published studies that have addressed how object detection can be implemented for the Arabic language. As far as we are aware, the generation of an Arabic text-to-speech engine to utter objects’ names and their positions in images to help Arabic-speaking visually impaired people has not been investigated previously. Therefore, in this study, we propose an object detection and segmentation model based on the Mask R-CNN algorithm that is capable of identifying and locating different objects in images, then uttering their names and positions in Arabic. The proposed model was trained on the Pascal VOC 2007 and 2012 datasets and evaluated on the Pascal VOC 2007 testing set. We believe that this is one of a few studies that uses these datasets to train and test the Mask R-CNN model. The performance of the proposed object detection model was evaluated and compared with previous object detection models in the literature, and the results demonstrated its superiority and ability to achieve an accuracy of 83.9%. Moreover, experiments were conducted to evaluate the performance of the incorporated translator and TTS engines, and the results showed that the proposed model could be effective in helping Arabic-speaking visually impaired people understand the content of digital images.
