Improved Real-Time Detection Transformer with Low-Frequency Feature Integrator and Token Statistics Self-Attention for Automated Grading of Stropharia rugoso-annulata Mushroom.
Manual grading of Stropharia rugoso-annulata mushroom is plagued by inefficiency and subjectivity, while existing detection models face inherent trade-offs between accuracy, real-time performance, and deployability on resource-constrained edge devices. To address these challenges, this study presents an Improved Real-Time Detection Transformer (RT-DETR) tailored for automated grading of Stropharia rugoso-annulata. Two innovative modules underpin the model: (1) the low-frequency feature integrator (LFFI), which leverages wavelet decomposition to preserve critical low-frequency global structural information, thereby enhancing the capture of large mushroom morphology; (2) the Token Statistics Self-Attention (TSSA) mechanism, which replaces traditional self-attention with second-moment statistical computations. This reduces complexity from O(n²) to O(n) and inherently generates interpretable attention patterns, augmenting model explainability. Experimental results demonstrate that the improved model achieves 95.2% mAP@0.5:0.95 at 262 FPS, with a substantial reduction in computational overhead compared to the original RT-DETR. It outperforms APHS-YOLO in both accuracy and efficiency, eliminates the need for non-maximum suppression (NMS) post-processing, and balances global structural awareness with local detail sensitivity. These attributes render it highly suitable for industrial edge deployment. This work offers an efficient framework for automated grading in large-target crop detection.
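The O(n²) → O(n) claim can be illustrated with a toy sketch. This is not the paper's TSSA: the projection, per-channel second-moment weighting, and normalization below are illustrative assumptions showing how attention weights derived from per-token statistics avoid forming the n × n pairwise similarity matrix.

```python
import numpy as np

def token_statistics_attention(x, w_proj):
    """Linear-complexity attention sketch: weight each token by the
    normalized second moment (squared magnitude) of its projection,
    instead of computing an n x n pairwise similarity matrix.
    Illustrative only -- not the paper's TSSA formulation."""
    z = x @ w_proj                      # (n, d) projected tokens
    second_moment = z ** 2              # per-token, per-channel second moment
    attn = second_moment / (second_moment.sum(axis=0, keepdims=True) + 1e-9)
    return attn * z                     # reweighted tokens, cost O(n * d)

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))         # 6 tokens, 4 channels
w = rng.standard_normal((4, 4))
out = token_statistics_attention(x, w)
print(out.shape)                        # (6, 4) -- cost grows linearly in n
```

Every operation touches each token once, so doubling the token count doubles the work rather than quadrupling it.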
- Research Article
- 10.1109/embc48229.2022.9871763
- Jul 11, 2022
- Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference
In recent years there have been significant improvements in the accuracy of real-time 3D skeletal data estimation software. These applications, based on convolutional neural networks (CNNs), can play a key role in a variety of clinical scenarios, from gait analysis to medical diagnosis. One of the main challenges is to apply such intelligent video analytics at a distance, which requires the system to satisfy, besides accuracy, also data privacy. To satisfy privacy by default and by design, the software has to run on "edge" computing devices, by which the sensitive information (i.e., the video stream) is elaborated close to the camera while only the processing results can be stored or sent over the communication network. In this paper we address such a challenge by evaluating the accuracy of state-of-the-art software for human pose estimation when run "at the edge". We show how the most accurate platforms for pose estimation, based on complex and deep neural networks, can become inaccurate due to subsampling of the input video frames when run on resource-constrained edge devices. In contrast, we show that, starting from less accurate and "lighter" CNNs and enhancing the pose estimation software with filters and interpolation primitives, the platform achieves better real-time performance and higher accuracy, with a deviation below the error tolerance of a marker-based motion capture system.
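The "filters and interpolation primitives" remedy can be illustrated with the simplest such primitive: linear interpolation of joint coordinates across frames dropped by subsampling. This is a generic sketch with made-up joint coordinates, not the authors' implementation.

```python
import numpy as np

def interpolate_keypoints(kp_a, kp_b, n_missing):
    """Linearly interpolate skeletal keypoints for frames dropped by
    subsampling: kp_a and kp_b are (J, 2) arrays of joint coordinates
    at two consecutive *processed* frames."""
    steps = np.linspace(0.0, 1.0, n_missing + 2)[1:-1]   # interior points only
    return [kp_a + t * (kp_b - kp_a) for t in steps]

kp0 = np.array([[0.0, 0.0], [10.0, 0.0]])   # 2 joints at processed frame t
kp1 = np.array([[4.0, 2.0], [10.0, 4.0]])   # same joints at frame t+3
filled = interpolate_keypoints(kp0, kp1, n_missing=2)
print(filled[0])   # joints one third of the way between the two frames
```

A real pipeline would pair this with a smoothing filter (e.g., a moving average over joint trajectories) to suppress per-frame jitter, as the abstract suggests.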
- Research Article
- 10.3390/electronics14193948
- Oct 7, 2025
- Electronics
This paper presents a robust and computationally efficient fault classification framework for wind energy conversion systems (WECS), built upon a Robust Random Vector Functional Link Network (Robust-RVFLN) and validated through real-time simulations on a Real-Time Digital Simulator (RTDS). Unlike existing studies that depend on high-dimensional feature extraction or purely data-driven deep learning models, our approach leverages a compact set of five statistically significant and physically interpretable features derived from rotor torque, phase current, DC-link voltage, and dq-axis current components. This reduced feature set ensures both high discriminative power and low computational overhead, enabling effective deployment in resource-constrained edge devices and large-scale wind farms. A synthesized dataset representing seven representative fault scenarios—including converter, generator, gearbox, and grid faults—was employed to evaluate the model. Comparative analysis shows that the Robust-RVFLN consistently outperforms conventional classifiers (SVM, ELM) and deep models (CNN, LSTM), delivering accuracy rates of up to 99.85% for grid-side line-to-ground faults and 99.81% for generator faults. Beyond accuracy, evaluation metrics such as precision, recall, and F1-score further validate its robustness under transient operating conditions. By uniting interpretability, scalability, and real-time performance, the proposed framework addresses critical challenges in condition monitoring and predictive maintenance, offering a practical and transferable solution for next-generation renewable energy infrastructures.
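The plain RVFL idea the framework builds on can be sketched: frozen random hidden weights, direct input-to-output links, and a closed-form ridge solution for the output weights. The toy features and labels below are stand-ins, not the paper's wind-turbine data, and the robust loss of the actual Robust-RVFLN is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_rvfln(X, Y, n_hidden=50, ridge=1e-3):
    """Minimal RVFL network: random (frozen) hidden weights, direct
    input-to-output links, and a closed-form ridge-regression solution
    for the output weights -- no iterative training."""
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                       # random nonlinear features
    D = np.hstack([X, H])                        # direct links + hidden features
    beta = np.linalg.solve(D.T @ D + ridge * np.eye(D.shape[1]), D.T @ Y)
    return W, b, beta

def predict_rvfln(X, W, b, beta):
    D = np.hstack([X, np.tanh(X @ W + b)])
    return D @ beta

# Toy 2-class problem standing in for the five physically derived features
X = rng.standard_normal((200, 5))
Y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)
W, b, beta = train_rvfln(X, Y)
acc = ((predict_rvfln(X, W, b, beta) > 0.5) == Y).mean()
print(f"training accuracy: {acc:.2f}")
```

Because training reduces to one linear solve, the computational overhead stays low, which is what makes the family attractive for edge deployment.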
- Research Article
- 10.3390/s25196170
- Oct 5, 2025
- Sensors (Basel, Switzerland)
The autonomous navigation of inspection robots in complex forest environments heavily relies on accurate trunk detection. However, existing detection models struggle to achieve both high accuracy and real-time performance on resource-constrained edge devices. To address this challenge, this study proposes a lightweight algorithm named YOLOv11-TrunkLight. The core innovations of the algorithm include (1) a novel StarNet_Trunk backbone network, which replaces traditional residual connections with element-wise multiplication and incorporates depthwise separable convolutions, significantly reducing computational complexity while maintaining a large receptive field; (2) the C2DA deformable attention module, which effectively handles the geometric deformation of tree trunks through dynamic relative position bias encoding; and (3) the EffiDet detection head, which improves detection speed and reduces the number of parameters through dual-path feature decoupling and a dynamic anchor mechanism. Experimental results demonstrate that compared to the baseline YOLOv11 model, our method improves detection speed by 13.5%, reduces the number of parameters by 34.6%, and decreases computational load (FLOPs) by 39.7%, while the average precision (mAP) is only marginally reduced by 0.1%. These advancements make the algorithm particularly suitable for deployment on resource-constrained edge devices of inspection robots, providing reliable technical support for intelligent forestry management.
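The parameter savings from depthwise separable convolutions, which the StarNet_Trunk backbone relies on, follow from simple arithmetic. The channel and kernel sizes below are arbitrary examples, not the paper's configuration.

```python
def conv_params(c_in, c_out, k):
    """Parameter count of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def dsconv_params(c_in, c_out, k):
    """Depthwise separable convolution = k x k depthwise + 1 x 1 pointwise."""
    return c_in * k * k + c_in * c_out

std = conv_params(64, 128, 3)      # 64 * 128 * 9  = 73,728 parameters
dws = dsconv_params(64, 128, 3)    # 576 + 8,192   =  8,768 parameters
print(std, dws, round(std / dws, 1))
```

For this layer the separable form needs roughly 8x fewer parameters, which is where much of the reported 34.6% model-wide reduction can come from.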
- Conference Article
- 10.1109/wf-iot54382.2022.10152039
- Oct 26, 2022
Advances in deep learning, especially Convolutional Neural Networks (CNNs), have revolutionized intelligent frameworks such as Human Activity Recognition (HAR) systems by effectively and efficiently inferring human activity from various modalities of data. However, the training and inference of CNNs are often resource-intensive. Recent research developments are focused on bringing the effectiveness of CNNs to resource-constrained edge devices through Tiny Machine Learning (TinyML). However, this is extremely hard to achieve due to the limitations in memory, compute power, and energy of resource-constrained edge devices. This paper provides a benchmark to understand these trade-offs among variations of CNN network architectures, different training methodologies, and different modalities of data in the context of HAR, TinyML, and edge devices. We tested and reported the performance of CNN and Depthwise Separable CNN (DSCNN) models as well as two training methodologies: Quantization Aware Training (QAT) and Post-training Quantization (PTQ) on five commonly used benchmark datasets containing image and time-series data: UP-Fall, Fall Detection Dataset (FDD), PAMAP2, UCI-HAR, and WISDM. We also deployed and tested the performance of the model-based standalone applications on multiple commonly available resource-constrained edge devices in terms of inference time and power consumption. The experimental results demonstrate the effectiveness and feasibility of TinyML for HAR in edge devices.
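The PTQ side of the benchmark can be sketched with minimal symmetric per-tensor int8 quantization; this is a generic illustration of the technique, not the paper's pipeline, and the weight values are made up.

```python
import numpy as np

def ptq_int8(weights):
    """Post-training quantization sketch: symmetric per-tensor int8.
    The scale maps the largest-magnitude weight to 127; no retraining
    is involved, which is what distinguishes PTQ from QAT."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.array([0.50, -0.25, 0.10, -0.05], dtype=np.float32)
q, s = ptq_int8(w)
w_hat = dequantize(q, s)
print(q)                        # int8 codes: [127, -64, 25, -13]
print(np.abs(w - w_hat).max())  # worst-case rounding error <= scale / 2
```

QAT would instead simulate this rounding inside the training loop so the network learns to compensate, which is why it usually recovers more accuracy than PTQ at the same bit width.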
- Conference Article
- 10.1145/3386367.3431666
- Nov 23, 2020
Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs) are widely used in IoT-related applications. However, inferencing pre-trained large DNNs and CNNs consumes a significant amount of time, memory, and computational resources. This makes it infeasible to use such DNNs/CNNs on resource-constrained edge devices. In this research we implement a distributed inference scheme for processing large DNNs and CNNs on such resource-constrained edge devices. Our approach to solving this issue is based on partitioning the DNN/CNN model and processing the inference tasks using two or more edge devices. Through this poster we introduce our novel approach of distributing the inference process among multiple edge devices while minimising the network overheads due to communication among devices. In addition, we present a task-sharing mechanism among working devices and idle devices in the network, and a way to convert pre-trained models to a separately executable model format.
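The partitioning idea can be sketched as a greedy split of a sequential model's layers into contiguous chunks balanced by compute cost. The heuristic and the per-layer costs below are illustrative assumptions, not the poster's actual scheme, which also accounts for communication overhead at the cut points.

```python
def partition_layers(layer_costs, n_devices):
    """Greedy sketch: split a sequential model's layers into contiguous
    chunks so each device receives roughly equal compute (costs in FLOPs).
    Real partitioners also weigh the activation size crossing each cut,
    since that determines network transfer overhead between devices."""
    target = sum(layer_costs) / n_devices
    parts, current, acc = [], [], 0.0
    for i, cost in enumerate(layer_costs):
        current.append(i)
        acc += cost
        if acc >= target and len(parts) < n_devices - 1:
            parts.append(current)
            current, acc = [], 0.0
    parts.append(current)
    return parts

costs = [10, 30, 20, 25, 15]              # per-layer compute of a toy CNN
print(partition_layers(costs, 2))         # [[0, 1, 2], [3, 4]]
```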
- Research Article
- 10.1016/j.neunet.2023.11.044
- Nov 23, 2023
- Neural Networks
OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning
- Research Article
- 10.2298/csis240503020s
- Jan 1, 2025
- Computer Science and Information Systems
Edge computing and edge intelligence have gained significant traction in recent years due to the proliferation of Internet of Things devices, the exponential growth of data generated at the network edge, and the demand for real-time and context-aware applications. Despite its promising potential, the application of artificial intelligence on the edge faces many challenges, such as edge computing resource constraints, heterogeneity of edge devices, scalability issues, security and privacy concerns, etc. The paper addresses the challenges of deploying deep neural networks for edge intelligence and traffic object detection and recognition on a video captured by edge device cameras. The primary aim is to analyze resource consumption and achieve resource-awareness, optimizing computational resources across diverse edge devices within the edge-fog computing continuum while maintaining high object detection and recognition accuracy. To accomplish this goal, a methodology is proposed and implemented that exploits the edge-to-fog paradigm to distribute the inference workload across multiple tiers of the distributed system architecture. The edge-fog related solutions are implemented and evaluated in several use cases on datasets encompassing real-world traffic scenarios and traffic objects' recognition problems, revealing the feasibility of deploying deep neural networks for object recognition on resource-constrained edge devices. The proposed edge-to-fog methodology demonstrates enhancements in recognition accuracy and resource utilization, validating the viability of both edge-only and edge-fog based approaches. Furthermore, experimental results demonstrate the system's adaptability to dynamic traffic scenarios, ensuring real-time recognition performance even in challenging environments.
- Conference Article
- 10.1109/icps49255.2021.9468162
- May 10, 2021
Non-maximum suppression (NMS) is a common post-processing procedure, indispensable to existing object detectors, responsible for merging excess detections. The standard NMS is a greedy algorithm in which detections overlapping the selected one are deleted outright. Although simple and fast, this algorithm fails completely in crowded scenarios. In this paper, faced with a crowded robot-sorting scenario, we analyze the inherent defects of standard NMS in depth, and then design an NMS based on Affinity Propagation Clustering (APC) that is aware of localization confidence and object density. Our NMS leads to satisfactory results and shows the effectiveness of using global information in NMS.
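The standard greedy NMS the paper criticizes is easy to state precisely; the boxes and threshold below are a toy example. The hard deletion in the last line of the loop is exactly the behavior that discards true positives in crowded scenes.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def greedy_nms(boxes, scores, thresh=0.5):
    """Standard greedy NMS: keep the highest-scoring box, delete every
    remaining box overlapping it above `thresh`, and repeat."""
    order = np.argsort(scores)[::-1].tolist()
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = np.array([0.9, 0.8, 0.7])
print(greedy_nms(boxes, scores))   # [0, 2] -- box 1 suppressed by box 0
```

If boxes 0 and 1 had covered two genuinely distinct, densely packed objects, box 1 would still be deleted; clustering-based variants such as the paper's APC-based NMS aim to avoid exactly this failure.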
- Research Article
- 10.3389/fpls.2025.1643967
- Jul 30, 2025
- Frontiers in Plant Science
Accurate detection of sugarcane nodes in complex field environments is a critical prerequisite for intelligent seed cutting and automated planting. However, existing detection methods often suffer from large model sizes and suboptimal performance, limiting their applicability on resource-constrained edge devices. To address these challenges, we propose Slim-Sugarcane, a lightweight and high-precision node detection framework optimized for real-time deployment in natural agricultural settings. Built upon YOLOv8, our model integrates GSConv, a hybrid convolution module combining group and spatial convolutions, to significantly reduce computational overhead while maintaining detection accuracy. We further introduce a Cross-Stage Local Network module featuring a single-stage aggregation strategy, which effectively minimizes structural redundancy and enhances feature representation. The proposed framework is optimized with TensorRT and deployed using FP16 quantization on the NVIDIA Jetson Orin NX platform to ensure real-time performance under limited hardware conditions. Experimental results demonstrate that Slim-Sugarcane achieves a precision of 0.922, recall of 0.802, and mean average precision of 0.852, with an inference latency of only 60.1 ms and a GPU memory footprint of 1434 MB. The proposed method exhibits superior accuracy and computational efficiency compared to existing approaches, offering a promising solution for precision agriculture and intelligent sugarcane cultivation.
- Research Article
- 10.1145/3736721
- Aug 12, 2025
- ACM Transactions on Design Automation of Electronic Systems
The scaling laws have become the de facto guidelines for designing large language models (LLMs), but they were studied under the assumption of unlimited computing resources for both training and inference. As LLMs are increasingly used as personalized intelligent assistants, their customization (i.e., learning through fine-tuning) and deployment onto resource-constrained edge devices will become more and more prevalent. An urgent but open question is how a resource-constrained computing environment would affect the design choices for a personalized LLM. We study this problem empirically in this work. In particular, we consider the tradeoffs among a number of key design factors and their intertwined impacts on learning efficiency and accuracy. The factors include the learning methods for LLM customization, the amount of personalized data used for learning customization, the types and sizes of LLMs, the compression methods of LLMs, the amount of time afforded to learn, and the difficulty levels of the target use cases. Through extensive experimentation and benchmarking, we draw a number of surprisingly insightful guidelines for deploying LLMs onto resource-constrained devices. For example, the optimal choice between parameter learning and RAG may vary depending on the difficulty of the downstream task, longer fine-tuning time does not necessarily help the model, and a compressed LLM may be a better choice than an uncompressed LLM to learn from limited personalized data.
- Conference Article
- 10.1109/icaiic57133.2023.10067064
- Feb 20, 2023
This paper presents a domain-based transfer learning method for deep learning-based object detection models that enables real-time computation on resource-constrained edge devices. Object detection is an essential task for intelligent platforms (e.g., drones, robots, and autonomous vehicles). However, edge devices cannot afford to run huge object detection models due to insufficient resources. Although a compressed deep learning model increases inference speed, its accuracy can significantly deteriorate. In this paper, we propose an accurate object detection method while achieving real-time computation on edge devices. Our method aims to reduce marginal detection outputs of models according to application domains (e.g., city, park, or factory). We classify crucial objects (e.g., pedestrians, cars, benches) for a specific domain and adopt transfer learning in which the learning is directed solely towards the selected objects. Such an approach improves detection accuracy even for a compressed deep learning model like the tiny versions of the YOLO (you only look once) framework. From the experiments, we validate that the method enables YOLOv7-tiny to provide detection accuracy comparable to a YOLOv7 model despite having 83% fewer parameters than the original model. In addition, we confirm that our method achieves 389% faster inference than YOLOv7 on resource-constrained edge devices (i.e., NVIDIA Jetsons).
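The domain-based class selection step can be sketched as a simple annotation filter applied before fine-tuning; the dictionary layout below is a hypothetical dataset format, not the paper's, and only illustrates restricting training labels to the domain's crucial objects.

```python
def filter_annotations(annotations, crucial_classes):
    """Keep only labels for a domain's crucial objects, and drop images
    containing none of them, so transfer learning concentrates the
    compressed model's capacity on what the domain needs."""
    keep = set(crucial_classes)
    return [
        {"image": a["image"],
         "labels": [l for l in a["labels"] if l["class"] in keep]}
        for a in annotations
        if any(l["class"] in keep for l in a["labels"])
    ]

# Hypothetical annotation records
data = [
    {"image": "a.jpg", "labels": [{"class": "car"}, {"class": "tree"}]},
    {"image": "b.jpg", "labels": [{"class": "tree"}]},
]
city = filter_annotations(data, {"car", "pedestrian"})
print([d["image"] for d in city])        # ['a.jpg']
print(city[0]["labels"])                 # [{'class': 'car'}]
```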
- Research Article
- 10.3389/fnins.2025.1665778
- Oct 15, 2025
- Frontiers in Neuroscience
The deployment of Spiking Neural Networks (SNNs) on resource-constrained edge devices is hindered by a critical algorithm-hardware mismatch: a fundamental trade-off between the accuracy degradation caused by aggressive quantization and the resource redundancy stemming from traditional decoupled hardware designs. To bridge this gap, we present a novel algorithm-hardware co-design framework centered on a Ternary-8-bit Hybrid Weight Quantization (T8HWQ) scheme. Our approach recasts SNN computation into a unified “8-bit × 2-bit” paradigm by quantizing first-layer weights to 2 bits and subsequent layers to 8 bits. This standardization directly enables the design of a unified PE architecture, eliminating the resource redundancy inherent in decoupled designs. To mitigate the accuracy degradation caused by aggressive first-layer quantization, we first propose a channel-wise dual compensation strategy. This method synergizes channel-wise quantization optimization with adaptive threshold neurons, leveraging reparameterization techniques to restore model accuracy without incurring additional inference overhead. Building upon T8HWQ, we propose a novel unified computing architecture that overcomes the inefficiencies of traditional decoupled designs by efficiently multiplexing processing arrays. Experimental results support our approach: on CIFAR-100, our method achieves near-lossless accuracy (<0.7% degradation vs. full precision) with a single time step, matching state-of-the-art low-bit SNNs. At the hardware level, implementation results on the Xilinx Virtex 7 platform demonstrate that our unified computing unit conserves 20.2% of lookup table (LUT) resources compared to traditional decoupled architectures. This work delivers a 6× throughput improvement over state-of-the-art SNN accelerators, with comparable resource utilization and lower power consumption. Our integrated solution thus advances the practical implementation of high-performance, low-latency SNNs on resource-constrained edge devices.
- Conference Article
- 10.1117/12.2522656
- May 10, 2019
The paper describes a vision for dependable application of machine learning-based inferencing on resource-constrained edge devices. The high computational cost of sophisticated deep learning techniques imposes a prohibitive overhead, both in terms of energy consumption and sustainable processing throughput, on such resource-constrained edge devices (e.g., audio or video sensors). To overcome these limitations, we propose a “cognitive edge” paradigm, whereby (a) an edge device first autonomously uses statistical analysis to identify potential collaborative IoT nodes, and (b) the IoT nodes then perform real-time sharing of various intermediate state to improve their individual execution of machine intelligence tasks. We provide an example of such collaborative inferencing for an exemplar network of video sensors, showing how such collaboration can significantly improve accuracy, reduce latency and decrease communication bandwidth compared to non-collaborative baselines. We also identify various challenges in realizing such a cognitive edge, including the need to ensure that the inferencing tasks do not suffer catastrophically in the presence of malfunctioning peer devices. We then introduce the soon-to-be deployed Cognitive IoT testbed at SMU, explaining the various features that enable empirical testing of various novel edge-based ML algorithms.
- Research Article
- 10.1109/access.2021.3136888
- Jan 1, 2021
- IEEE Access
Convolutional neural networks (CNNs) have gained huge attention for real-world artificial intelligence (AI) applications such as image classification and object detection. On the other hand, for better accuracy, the size of the CNNs’ parameters (weights) has been increasing, which in turn makes it difficult to enable on-device CNN inferences in resource-constrained edge devices. Though weight pruning and 5-bit quantization methods have shown promising results, it is still challenging to deploy large CNN models in edge devices. In this paper, we propose an encoding and hardware-based decoding technique which can be applied to 5-bit quantized weight data for on-device CNN inferences in resource-constrained edge devices. Given 5-bit quantized weight data, we employ arithmetic coding with range scaling for lossless weight compression, which is performed offline. When executing on-device inferences with underlying CNN accelerators, our hardware decoder enables a fast in-situ weight decompression with small latency overhead. According to our evaluation results with five widely used CNN models, our arithmetic coding-based encoding method applied to 5-bit quantized weights shows a better compression ratio by 9.6× while also reducing the memory data transfer energy consumption by 89.2%, on average, as compared to the case of uncompressed 32-bit floating-point weights. When applying our technique to pruned weights, we obtain better compression ratios by 57.5×–112.2× while reducing energy consumption by 98.3%–99.1% as compared to the case of 32-bit floating-point weights. In addition, by pipelining the weight decoding and transfer with the CNN execution, the latency overhead of our weight decoding with 16 decoding unit (DU) hardware is only 0.16%–5.48% and 0.16%–0.91% for non-pruned and pruned weights, respectively. Moreover, our proposed technique with 4-DU decoder hardware reduces system-level energy consumption by 1.1%–9.3%.
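Why arithmetic coding compresses 5-bit weights so well comes down to entropy: quantized CNN weights cluster heavily near zero, so the per-symbol entropy is far below 5 bits, and an ideal arithmetic coder approaches that entropy rate. The histogram below is a made-up illustration, not the paper's measured weight distribution.

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Shannon entropy in bits per symbol -- the rate an ideal
    arithmetic coder approaches, which is why skewed 5-bit weight
    histograms compress to far fewer than 5 bits per weight."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())

# Hypothetical 5-bit weight histogram: most weights land on the code
# nearest zero (16 in an offset-binary 5-bit scheme), with thin tails.
weights = [16] * 700 + [15] * 120 + [17] * 120 + [14] * 30 + [18] * 30
h = entropy_bits(weights)
ratio_vs_fp32 = 32 / h     # ideal compression ratio vs 32-bit floats
print(f"{h:.2f} bits/weight, ~{ratio_vs_fp32:.1f}x vs fp32")
```

Under this toy distribution the entropy is about 1.4 bits per weight, so even an ideal coder's ~23x ratio over fp32 is in the same regime as the paper's reported 9.6x average on real (less skewed) weight data.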
- Research Article
- 10.3390/rs17203497
- Oct 21, 2025
- Remote Sensing
Early warning systems on edge devices such as satellites and unmanned aerial vehicles (UAVs) are essential for effective forest fire prevention. Edge Intelligence (EI) enables deploying deep learning models on edge devices; however, traditional convolutional neural networks (CNNs)/Transformer-based models struggle to balance local-global context integration and computational efficiency in such constrained environments. To address these challenges, this paper proposes HybriDet, a novel hybrid-architecture neural network for wildfire detection. This architecture integrates the strengths of both CNNs and Transformers to effectively capture both local and global contextual information. Furthermore, we introduce efficient attention mechanisms—Windowed Attention and Coordinate-Spatial (CS) Attention—to simultaneously enhance channel-wise and spatial-wise features in high-resolution imagery, enabling long-range dependency modeling and discriminative feature extraction. Additionally, to optimize deployment efficiency, we also apply model pruning techniques to improve generalization performance and inference speed. Extensive experimental evaluations demonstrate that HybriDet achieves superior feature extraction capabilities while maintaining high computational efficiency. The optimized lightweight variant of HybriDet has a compact model size of merely 6.45 M parameters, facilitating seamless deployment on resource-constrained edge devices. Comparative evaluations on the FASDD-UAV, FASDD-RS, and VOC datasets demonstrate that HybriDet achieves superior performance over state-of-the-art models, particularly in processing highly heterogeneous remote sensing (RS) imagery. When benchmarked against YOLOv8, HybriDet demonstrates a 6.4% enhancement in mAP50 on the FASDD-RS dataset while maintaining comparable computational complexity. Meanwhile, on the VOC dataset and the FASDD-UAV dataset, our model improved by 3.6% and 0.2%, respectively, compared to the baseline model YOLOv8. 
These advancements highlight HybriDet’s theoretical significance as a novel hybrid EI framework for wildfire detection, with practical implications for disaster emergency response, socioeconomic security, and ecological conservation.