Amplitude-Modulated Virtual Sensing and FPGA-Enabled Accurate Recognition for Multiple Gases Using Electronic Nose

Abstract

This work presents an enhanced sensing framework for MEMS gas sensors based on tunable-amplitude periodic modulation, enabling multi-state excitation and feature enrichment without increasing the number of sensing elements. A multi-level periodic driving scheme is introduced to realize sensor virtualization, and the resulting multi-state responses are processed using a short-term baseline-tracking algorithm and a dislocated sparse-sampling strategy to improve feature discrimination. A lightweight multilayer perceptron (MLP) classifier is subsequently optimized and deployed on a field-programmable gate array (FPGA)-based accelerator to enable gas recognition under constrained hardware resources. Experimental results obtained from ternary mixtures of CH4, CO, and H2 demonstrate a classification accuracy of 98.5%, accompanied by a 60% reduction in model size and a fivefold improvement in computational speed on the FPGA accelerator.
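The abstract names three preprocessing steps (multi-level periodic driving, short-term baseline tracking, dislocated sparse sampling) without giving formulas. The Python/NumPy sketch below is only one possible reading of those steps. The staircase drive shape, the trailing-window baseline, the staggered sampling offsets, every function name and every number are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def multilevel_drive(levels, samples_per_level, n_periods):
    """Hypothetical staircase heater drive: each period steps once through all
    amplitude levels, so every level acts as one 'virtual sensor' state."""
    one_period = np.repeat(np.asarray(levels, dtype=float), samples_per_level)
    return np.tile(one_period, n_periods)

def short_term_baseline(resp, period_len, ref_periods):
    """Assumed short-term baseline: relative response of each period against
    the mean of the `ref_periods` periods immediately before it."""
    periods = resp.reshape(-1, period_len)
    out = []
    for i in range(ref_periods, len(periods)):
        baseline = periods[i - ref_periods:i].mean(axis=0)
        out.append((periods[i] - baseline) / baseline)
    return np.array(out)

def dislocated_sparse_sample(periods, n_levels, n_keep, shift):
    """Keep n_keep evenly spaced samples from each level segment. The start
    offset moves by `shift` samples from one level to the next, so the picks
    are staggered ('dislocated') instead of aligned (assumed interpretation)."""
    seg_len = periods.shape[-1] // n_levels
    step = seg_len // n_keep
    if step == 0:
        raise ValueError("n_keep must not exceed the samples per level")
    cols = []
    for lvl in range(n_levels):
        offset = (lvl * shift) % step
        cols.append(lvl * seg_len + offset + step * np.arange(n_keep))
    return periods[:, np.concatenate(cols)]

# Toy usage with made-up numbers: 4 levels, 20 samples each, 6 periods.
levels = [1.0, 1.5, 2.0, 2.5]
drive = multilevel_drive(levels, samples_per_level=20, n_periods=6)
resp = 100.0 + 10.0 * drive          # toy sensor reading, periodic in clean air
resp[-80:] *= 0.9                    # toy gas response in the last period only
rel = short_term_baseline(resp, period_len=80, ref_periods=2)
features = dislocated_sparse_sample(rel, n_levels=4, n_keep=5, shift=1)
```

Each row of `features` is a 20-value vector (5 samples from each of the 4 virtual states) that a classifier such as the MLP mentioned in the abstract could consume.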

Similar Papers
  • Book Chapter
  • Cited by 1
  • 10.1007/978-3-030-88004-0_10
High Power-Efficient and Performance-Density FPGA Accelerator for CNN-Based Object Detection
  • Jan 1, 2021
  • Gang Zhang + 6 more

The Field Programmable Gate Array (FPGA) accelerator for CNN-based object detection has attracted widespread attention in computer vision. For most existing FPGA accelerators, inference accuracy and speed are negatively affected by low power efficiency and performance density. To address this problem, we propose a software and hardware co-designed FPGA accelerator for accurate and fast object detection with high power efficiency and performance density. To develop the FPGA accelerator on CPU+FPGA heterogeneous platforms, a resource-sensitive and energy-aware FPGA accelerator framework is designed. In hardware, a hardware-sensitive neural network quantization method called Dynamic Fixed-point Data Quantization (DFDQ) is proposed to improve power efficiency. In software, an algorithm-level convolution (CONV) optimization scheme is further proposed to improve performance density by parallelizing the block execution of CONV cores. To validate the proposed FPGA accelerator, a Zynq FPGA is used to build the acceleration platform for the You Only Look Once (YOLO) network. Results demonstrate that the proposed FPGA accelerator outperforms state-of-the-art methods in power efficiency and performance density. In addition, object detection speed is increased by up to 16.5 times with less than 1.5% accuracy degradation.
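The abstract does not spell out DFDQ. As a rough illustration of the general idea behind dynamic fixed-point quantization, the sketch below picks a separate fractional bit length for each layer from that layer's largest magnitude, so small-valued layers keep more fractional precision at the same bit width. This is a common generic per-layer scheme and an assumption here, not the DFDQ algorithm itself; the layer names and weights are made up.

```python
import math
import numpy as np

def choose_frac_bits(x, bit_width):
    """Per-tensor fractional length: enough integer bits (including the sign
    bit) to cover max|x|; the remaining bits go to the fraction."""
    max_abs = float(np.max(np.abs(x)))
    if max_abs == 0.0:
        return bit_width - 1
    int_bits = math.ceil(math.log2(max_abs)) + 1
    return bit_width - int_bits

def quantize(x, bit_width, frac_bits):
    """Round to the nearest multiple of 2**-frac_bits and saturate to the
    signed bit_width range; returns integer codes and their real values."""
    scale = 2.0 ** frac_bits
    lo, hi = -(2 ** (bit_width - 1)), 2 ** (bit_width - 1) - 1
    codes = np.clip(np.round(x * scale), lo, hi).astype(np.int32)
    return codes, codes / scale

# Hypothetical layers with very different value ranges, both stored in 8 bits.
layers = {
    "conv1": np.array([0.75, -1.3, 2.9, 0.01]),
    "fc": np.array([0.75, -1.3, 2.9, 0.01]) * 0.05,
}
frac = {name: choose_frac_bits(w, 8) for name, w in layers.items()}
deq = {name: quantize(w, 8, frac[name])[1] for name, w in layers.items()}
```

Here `conv1` gets 5 fractional bits and `fc` gets 9, so the small-valued layer is quantized about 16 times more finely at the same storage cost.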

  • Research Article
  • Cited by 5
  • 10.3390/electronics12122558
Towards On-Board SAR Processing with FPGA Accelerators and a PCIe Interface
  • Jun 6, 2023
  • Electronics
  • Emilio Isaac Baungarten-Leon + 5 more

This article presents a novel methodology for using Field Programmable Gate Array (FPGA) accelerators in on-board Synthetic Aperture Radar (SAR) processing routines. The methodology uses High-Level Synthesis (HLS) to create Intellectual Property (IP) blocks and the Reusable Integration Framework for FPGA Accelerators (RIFFA) to develop a Peripheral Component Interconnect Express (PCIe) interface between the Central Processing Unit (CPU) and the FPGA, attaining transfer rates of up to 15.7 GB/s. HLS and RIFFA reduce development time (by between fivefold and tenfold) by using high-level programming languages (e.g., C/C++); moreover, HLS provides optimizations such as pipelining, cyclic array partitioning, and loop unrolling. The proposed scheme is also highly flexible and scalable: the IPs can be exchanged to perform different processing routines, and since RIFFA supports up to five FPGAs, multiple IPs can be implemented in each FPGA. Since the Fast Fourier Transform (FFT) is one of the main functions in SAR processing, we present, as a proof of concept, an FPGA accelerator for the reordering stage of VEC-FFT (an optimized version of the FFT). The FFT results are retrieved in bit-reversed order, and the conventional reordering function may consume more than half of the total clock cycles. Next, to demonstrate flexibility, an IP is implemented for matrix transposition, another process that is computationally expensive in SAR because of its memory accesses.
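The reordering stage mentioned above undoes the bit-reversal permutation in which a radix-2 FFT emits its outputs: the value for index i ends up at the index whose binary digits are those of i reversed. The plain-Python sketch below shows only that permutation, as a reference model one might check a hardware reordering IP against; it is not the VEC-FFT code, and the function names are hypothetical.

```python
def bit_reverse(i, n_bits):
    """Reverse the lowest n_bits bits of i (e.g. 0b001 -> 0b100 for n_bits=3)."""
    r = 0
    for _ in range(n_bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def reorder_bit_reversed(data):
    """Move element i to position bit_reverse(i). The permutation is its own
    inverse, so the same routine converts in either direction."""
    n = len(data)
    n_bits = n.bit_length() - 1
    if 1 << n_bits != n:
        raise ValueError("length must be a power of two")
    out = [None] * n
    for i, value in enumerate(data):
        out[bit_reverse(i, n_bits)] = value
    return out

print(reorder_bit_reversed(list(range(8))))
```

The writes land at scattered addresses, which hints at why a naive software reordering can dominate the cycle count when data sits in external memory.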

  • Research Article
  • Cited by 49
  • 10.1007/s11227-021-03849-7
FPGA acceleration on a multi-layer perceptron neural network for digit recognition
  • May 13, 2021
  • The Journal of Supercomputing
  • Isaac Westby + 3 more

This paper proposes field-programmable gate array (FPGA) acceleration of a scalable multi-layer perceptron (MLP) neural network for classifying handwritten digits. First, network architectures are investigated to find the optimal FPGA design for different classification rates. Then, as a case study, a single-hidden-layer MLP network is implemented with an eight-stage pipelined structure on a Xilinx UltraScale FPGA. It mainly consists of a timing controller written in Verilog Hardware Description Language (HDL) and sigmoid neurons built from Xilinx IPs. Finally, experimental results show a speedup of more than 10× compared with prior implementations. The proposed FPGA architecture can be extended to other specifications with different accuracy (up to 95.82%) and hardware cost.
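As a software reference for the kind of network described above, the sketch below runs a single-hidden-layer MLP forward pass on flattened 28×28 inputs. It swaps in the well-known PLAN piecewise-linear sigmoid, whose power-of-two slopes map to shifts and adds in hardware; the paper instead uses sigmoid neurons built from Xilinx IPs, so this activation, the hidden size of 64 and the random weights are all assumptions for illustration.

```python
import numpy as np

def plan_sigmoid(x):
    """PLAN piecewise-linear approximation of the logistic sigmoid.
    All slopes are powers of two, so hardware needs only shifts and adds."""
    a = np.abs(x)
    y = np.where(a >= 5.0, 1.0,
        np.where(a >= 2.375, 0.03125 * a + 0.84375,
        np.where(a >= 1.0, 0.125 * a + 0.625,
                 0.25 * a + 0.5)))
    return np.where(x >= 0, y, 1.0 - y)   # symmetry: sigma(-x) = 1 - sigma(x)

def mlp_forward(x, w1, b1, w2, b2, act=plan_sigmoid):
    """Single hidden layer with sigmoid activation; linear output logits."""
    hidden = act(x @ w1 + b1)
    return hidden @ w2 + b2

rng = np.random.default_rng(0)
x = rng.random((16, 784))                 # 16 random stand-ins for 28x28 digits
w1 = rng.normal(0.0, 0.05, (784, 64))     # hypothetical hidden size of 64
b1 = np.zeros(64)
w2 = rng.normal(0.0, 0.1, (64, 10))       # 10 digit classes
b2 = np.zeros(10)
logits = mlp_forward(x, w1, b1, w2, b2)
pred = logits.argmax(axis=1)

# Worst-case gap between PLAN and the exact sigmoid on a dense grid.
grid = np.linspace(-8.0, 8.0, 2001)
max_err = np.max(np.abs(plan_sigmoid(grid) - 1.0 / (1.0 + np.exp(-grid))))
```

On this grid the approximation stays within about 0.02 of the exact sigmoid, which is the usual trade-off accepted for a multiplier-free activation.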

  • Research Article
  • Cited by 15
  • 10.1109/tcsi.2021.3122309
FPGA Accelerator for Real-Time Non-Line-of-Sight Imaging
  • Feb 1, 2022
  • IEEE Transactions on Circuits and Systems I: Regular Papers
  • Zhengpeng Liao + 5 more

Non-line-of-sight (NLOS) imaging systems reconstruct hidden scenes using computational methods based on indirect light that is diffusely reflected from relay walls. Due to the computation and memory requirements of reconstruction algorithms, real-time NLOS imaging of room-size scenes based on non-confocal data has long been challenging. This paper proposes a field programmable gate array (FPGA) accelerator for the recently proposed Rayleigh-Sommerfeld Diffraction (RSD)-based NLOS reconstruction method. In the proposed accelerator design, ring sampling and radius sampling techniques reduce memory requirements by reconstructing the RSD kernels at runtime from a set of kernel bases and ring sampling coefficients. Based on these techniques, a customized hardware architecture and the corresponding FPGA design for real-time RSD-based NLOS reconstruction are proposed. Implementation results show that the proposed FPGA accelerator reconstructs NLOS scenes at 25 frames per second (FPS) while running at a relatively slow clock frequency of 50 MHz. To the best of the authors' knowledge, this is the first FPGA accelerator enabling real-time NLOS imaging of room-size scenes at a resolution of 128 × 128.

  • Research Article
  • Cited by 24
  • 10.1109/tim.2023.3346517
BearingPGA-Net: A Lightweight and Deployable Bearing Fault Diagnosis Network via Decoupled Knowledge Distillation and FPGA Acceleration
  • Jan 1, 2024
  • IEEE Transactions on Instrumentation and Measurement
  • Jing-Xiao Liao + 7 more

Deep learning has achieved remarkable success in bearing fault diagnosis. However, this success comes with larger models and more complex computations, which cannot be transferred to industrial settings that require models with high speed, strong portability, and low power consumption. In this article, we propose a lightweight and deployable model for bearing fault diagnosis, referred to as BearingPGA-Net, to address these challenges. First, aided by a well-trained large model, we train BearingPGA-Net via decoupled knowledge distillation (DKD). Despite its small size, our model demonstrates excellent fault diagnosis performance compared with other lightweight state-of-the-art methods. Second, we design a field-programmable gate array (FPGA) acceleration scheme for BearingPGA-Net using Verilog. This scheme involves customized quantization and the design of programmable logic gates for each layer of BearingPGA-Net on the FPGA, with an emphasis on parallel computing and module reuse to enhance computational speed. To the best of our knowledge, this is the first instance of deploying a convolutional neural network (CNN)-based bearing fault diagnosis model on an FPGA. Experimental results reveal that our deployment scheme achieves over 200× faster diagnosis than a CPU, with a performance drop of less than 0.4% in F1, recall, and precision scores on our independently collected bearing dataset. Our code is available at https://github.com/asdvfghg/BearingPGA-Net.
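For readers unfamiliar with DKD, the NumPy sketch below follows the general formulation of decoupled knowledge distillation by Zhao et al. (2022): the classic KD loss is split into a target-class term (TCKD, a KL divergence between binary "target vs. rest" probabilities) and a non-target-class term (NCKD, a KL divergence among the remaining classes only), which are weighted separately. It is a generic illustration, not the BearingPGA-Net training code; the default `alpha`, `beta` and `T` values and the random logits are assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)      # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q, eps=1e-12):
    """Row-wise KL(p || q)."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def dkd_loss(student_logits, teacher_logits, target, alpha=1.0, beta=8.0, T=4.0):
    n, k = student_logits.shape
    rows = np.arange(n)
    p_s = softmax(student_logits / T)
    p_t = softmax(teacher_logits / T)

    # TCKD: KL between binary [p_target, 1 - p_target] distributions.
    b_s = np.stack([p_s[rows, target], 1.0 - p_s[rows, target]], axis=1)
    b_t = np.stack([p_t[rows, target], 1.0 - p_t[rows, target]], axis=1)
    tckd = kl_div(b_t, b_s)

    # NCKD: KL between softmaxes over the k - 1 non-target logits only.
    mask = np.ones((n, k), dtype=bool)
    mask[rows, target] = False
    z_s = student_logits[mask].reshape(n, k - 1) / T
    z_t = teacher_logits[mask].reshape(n, k - 1) / T
    nckd = kl_div(softmax(z_t), softmax(z_s))

    # Weighted sum, scaled by T^2 as in temperature-based distillation.
    return float(np.mean(alpha * tckd + beta * nckd) * T ** 2)

rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 10))            # random stand-in logits
student = rng.normal(size=(4, 10))
labels = np.array([0, 3, 5, 9])
loss_same = dkd_loss(teacher, teacher, labels)
loss_diff = dkd_loss(student, teacher, labels)
```

Decoupling lets the non-target term, which carries most of the teacher's "dark knowledge", be weighted up (here `beta`) independently of the target-class confidence.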

  • Research Article
  • Cited by 2
  • 10.1088/1742-6596/1964/6/062014
A Survey of Low-Latency IoT System Using FPGA Accelerator
  • Jul 1, 2021
  • Journal of Physics: Conference Series
  • F P Mahimai Don Bosco + 2 more

The Internet of Things (IoT) has established itself in the world of technology over the past few years. It is expected that approximately 4 billion IoT devices will be interconnected by 2030. IoT has not yet become fully fledged in all fields of application. However, the future holds a wide spectrum of implementations of, and dependence on, IoT, which demands digital computing capabilities such as faster data processing, reduced latency, and simultaneous parallel processing of multiple data channels. This publication proposes a solution that satisfies these requirements by using FPGA (field-programmable gate array) accelerators in IoT systems. In IoT, it is necessary to achieve data-centric parameters such as a higher bitrate at a seamless flow rate, avoiding data congestion and data traffic. The predictability of the endpoint is another important parameter to consider in an IoT system. In this paper, we discuss the use of the Constrained Application Protocol (CoAP) and speculate on the possibility of improving performance parameters such as latency and predictability by accelerating cloud servers with FPGA accelerators.

  • Conference Article
  • Cited by 1
  • 10.1109/icpre55555.2022.9960650
A High-Speed IGBT Switching Characteristic Detection System Based on SoC and FPGA Acceleration
  • Sep 23, 2022
  • Yueyang Cao + 2 more

The turn-off time of an insulated gate bipolar transistor (IGBT) is an important condition parameter that is closely related to junction temperature and device health. Research shows that measuring the turn-off time of each switching event together with junction temperature is an effective way to monitor IGBT health. However, traditional methods of calculating turn-off time either directly measure the collector-emitter voltage or indirectly measure gate-side parameters, which can pose safety risks and add complexity. To overcome these problems, this paper proposes an IGBT switching characteristic detection system based on a system-on-chip (SoC), which takes the IGBT turn-off time and output current as switching characteristics and calculates the turn-off time in real time at high speed (at the microsecond level). The system uses a non-contact turn-off time calculation method to monitor IGBT health online and improves its performance through field programmable gate array (FPGA) acceleration on a Zynq SoC. Compared with existing technology, it offers high security, high performance, and simple engineering implementation. Finally, the feasibility of the system is verified by experiments.

  • Conference Article
  • Cited by 6
  • 10.1109/iccsnt50940.2020.9304996
Trusted Edge Cloud Computing Mechanism Based on FPGA Cluster
  • Nov 20, 2020
  • Hongwei Kan + 5 more

To solve the problem of computing overload in the cloud, we design a trusted edge cloud computing model and method based on FPGA (Field Programmable Gate Array) clusters. First, a device named FPGA Box, with PCIe (Peripheral Component Interconnect Express) power supply capability, is used to manage the FPGA accelerators in the model. In addition, the FPGA cluster provides heterogeneous accelerated computing services for the data center over the network. Furthermore, we propose a trusted edge cloud computing method based on the FPGA cluster. On the one hand, a bi-level encryption algorithm based on RSA is proposed to generate an authorized use code, which embeds the FPGA accelerator's IP (Internet Protocol) address and other information. On the other hand, exploiting the programmability of the FPGA accelerators, we use FPGA registers as status bits that control the accelerator's different working states. Specifically, when an accelerator is assigned, its usage deadline must also be uploaded to it. Finally, the software workflow of the entire trusted edge cloud system is described in detail, including the process of generating the authorized use code. Simulation results show that the FPGA-cluster-based edge cloud computing mechanism is trusted and effective.

  • Research Article
  • Cited by 27
  • 10.3934/mbe.2021007
Wearable on-device deep learning system for hand gesture recognition based on FPGA accelerator.
  • Dec 7, 2020
  • Mathematical Biosciences and Engineering
  • Weibin Jiang + 7 more

Gesture recognition is critical in Human-Computer Interaction, especially in healthcare, rehabilitation, sign language translation, etc. Conventionally, gesture recognition data collected by inertial measurement unit (IMU) sensors is relayed to the cloud or to a remote device with more computing power to train models. However, this is inconvenient for remote follow-up treatment in movement rehabilitation training. In this paper, based on a field-programmable gate array (FPGA) accelerator and the Cortex-M0 IP core, we propose a wearable deep learning system that can process data locally on the end device. With a pre-stage processing module and a serial-parallel hybrid method, the device achieves low power and low latency at the microcontroller unit (MCU) level while meeting or exceeding the performance of single-board computers (SBCs); for example, its performance is more than twice that of the Cortex-A53 (commonly used in the Raspberry Pi). Moreover, a convolutional neural network (CNN) and a multilayer perceptron neural network (NN) are used in the recognition model to extract features and classify gestures, achieving a high recognition accuracy of 97%. Finally, this paper offers a software-hardware co-design method that is worth referencing when designing edge devices for other scenarios.

  • Research Article
  • Cited by 1
  • 10.1117/1.jei.30.3.033034
FPGA accelerator for CNN: an exploration of the kernel structured sparsity and hybrid arithmetic computation
  • Jun 28, 2021
  • Journal of Electronic Imaging
  • Guanwen Zhang + 3 more

The deployment of large-scale deep neural networks on field programmable gate array (FPGA) platforms is severely hindered by high requirements for computational resources and off-chip data bandwidth. Traditional nonstructured sparsity algorithms can efficiently reduce the number of nonzero weights in neural network models. However, nonstructured sparse connections across channels also reduce computational parallelism and consequently severely degrade the performance of the FPGA accelerator. We propose an FPGA accelerator for the convolutional neural network (CNN) that exploits kernel structured sparsity and hybrid arithmetic computation. On the one hand, we introduce a hardware-friendly kernel pruning method to reduce the number of arithmetic operations of the CNN model. Our method maintains high accuracy (less than 0.32% accuracy loss) and achieves a high degree of parallelism. On the other hand, we design a specific hybrid arithmetic computation for the FPGA accelerator to speed up the pruned CNN model. The FPGA accelerator consists of only 64 sets of hybrid 8-bit and 16-bit floating-point units for the convolution operation. Experiments on VGGNet16 demonstrate that the proposed FPGA accelerator achieves a state-of-the-art 5× reduction in convolution operations and 3× parameter compression. The proposed FPGA accelerator runs at 13.2 FPS, and its energy efficiency reaches up to 1.9 images/J.
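To make "kernel structured sparsity" concrete: instead of zeroing individual weights, whole k_h×k_w kernels (one per input/output channel pair) are removed, so the surviving work stays in regular, hardware-friendly blocks. The NumPy sketch below ranks kernels by L1 norm and zeros the weakest fraction. The L1 criterion, the layer shape and the 80% ratio are assumptions for illustration; the paper's actual pruning criterion is not given in the abstract.

```python
import numpy as np

def prune_kernels(weights, ratio):
    """Kernel-level structured pruning: zero whole k_h x k_w kernels.
    weights has shape (out_channels, in_channels, k_h, k_w)."""
    norms = np.abs(weights).sum(axis=(2, 3))          # L1 norm of each 2-D kernel
    n_prune = int(round(ratio * norms.size))
    keep = np.ones(norms.shape, dtype=bool)
    if n_prune > 0:
        weakest = np.argsort(norms, axis=None)[:n_prune]   # flat indices
        keep.flat[weakest] = False
    return weights * keep[:, :, None, None], keep

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32, 3, 3))                   # hypothetical 3x3 conv layer
pruned, keep = prune_kernels(w, ratio=0.8)
mac_ratio = keep.size / keep.sum()   # kernel MACs before / after, per output pixel
```

Removing 80% of the 2048 kernels leaves 410, so each output pixel needs roughly one fifth of the original kernel multiply-accumulates, and an accelerator can simply skip the zeroed kernels.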

  • Research Article
  • 10.37934/araset.52.1.122131
Accelerating DNA Sequence Alignment using Altera DE2-115
  • Oct 1, 2024
  • Journal of Advanced Research in Applied Sciences and Engineering Technology
  • Syed Abdul Mutalib Al Junid + 6 more

DNA sequence alignment is a technique for discovering shared information between two base sequences, and the Smith-Waterman (SW) algorithm is an accurate method that gives more precise alignment results than others. However, its performance is affected by the dataset size and by long DNA base sequences, so the time required for alignment grows considerably with the number of DNA sequence samples. There are many ways to accelerate DNA sequence alignment, and the Field Programmable Gate Array (FPGA) is a good choice due to its parallel processing and cost efficiency. Although FPGA acceleration approaches are not new, this work investigates a purely software-based FPGA acceleration using the Altera Cyclone IV EP4CE115F29C7N FPGA as the target device. The SW algorithm was developed in C using Quartus II version 18.1 and the Nios II Software Build Tools for Eclipse. Development starts with setting up the Qsys architecture before writing the code in Eclipse to determine the computational performance. The results show the computational timing and speed of the implementation, with the highest speed achieved being 198.76 cells per millisecond. In summary, the computational performance ultimately depends on the maximum matrix size the FPGA can handle, which is in turn influenced by the DNA base-pair length, and the alignment can be completed on a low-cost FPGA.
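The "cells per millisecond" figure counts filled cells of the Smith-Waterman dynamic-programming matrix, whose size is the product of the two sequence lengths. The plain-Python sketch below computes the best local-alignment score with a linear gap penalty and reports the same throughput metric on a desktop CPU. The scoring values (match +2, mismatch -1, gap -1) and the test sequences are illustrative assumptions, not the paper's settings, and the paper's implementation runs in C on a Nios II soft processor.

```python
import time

def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score (linear gap penalty), keeping two DP rows."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            if cur[j] > best:
                best = cur[j]
        prev = cur
    return best

seq_a = "ACGT" * 50                       # hypothetical 200-base sequences
seq_b = "AGCT" * 50
start = time.perf_counter()
score = smith_waterman(seq_a, seq_b)
elapsed_ms = (time.perf_counter() - start) * 1e3
cells_per_ms = len(seq_a) * len(seq_b) / elapsed_ms   # DP cells filled per ms
```

Cells on the same anti-diagonal depend only on earlier anti-diagonals, which is the independence that hardware systolic arrays usually exploit to fill many cells per clock.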

  • Research Article
  • Cited by 5
  • 10.32620/reks.2023.3.03
Method of creation of FPGA based implementation of artificial intelligence as a service
  • Sep 29, 2023
  • Radioelectronic and Computer Systems
  • Artem Perepelitsyn

The subjects of study in this article are Field Programmable Gate Array (FPGA) technologies and the methods and tools for prototyping hardware accelerators of Artificial Intelligence (AI) and providing them as a service. The goal is to reduce the effort of creating and modifying FPGA implementations of AI projects and to provide such solutions as a service. Tasks: to analyze the possibilities of heterogeneous computing for implementing AI projects; analyze advanced FPGA technologies and accelerator cards that allow a service to be organized; analyze the languages, frameworks, and integrated environments for creating AI projects for FPGA implementation; propose a technique for prototyping modifiable FPGA projects that ensures long-term compatibility with integrated environments and target devices; propose a technique for prototyping high-performance FPGA services to improve the efficiency of FPGA-based AI projects; propose a sequence of neural network optimizations for FPGA implementation; and provide an example of the practical implementation of the research results. According to these tasks, the following results were obtained. The largest companies and vendors of FPGA technology are analyzed. Existing heterogeneous technologies and potential non-electronic media for AI computations are discussed. FPGA accelerator cards with a large amount of High Bandwidth Memory (HBM) in the same chip package for implementing AI projects are analyzed and compared. Languages, frameworks, and technologies, as well as the capabilities of libraries and integrated environments for prototyping FPGA projects for AI applications, are analyzed in detail. A prototyping sequence for FPGA projects that remain stable under changes in the environment is proposed. A prototyping sequence for highly efficient pipelined data processing projects is proposed. The steps of neural network optimization for FPGA implementation of AI applications are provided. An example of the practical use of the research results, including the use of these sequences, is provided. Conclusions. One of the main contributions of this research is the proposed method of creating FPGA-based implementations of AI projects in the form of services. The proposed sequence of neural network optimization for FPGA reduces the complexity of the initial program model by more than five times for hardware implementation, depending on the required accuracy. The described solutions allow the construction of fully scalable and modifiable FPGA implementations of AI projects provided as services.

  • Research Article
  • Cited by 10
  • 10.1155/2023/3715603
Multi-Layer Perceptron Classifier with the Proposed Combined Feature Vector of 3D CNN Features and Lung Radiomics Features for COPD Stage Classification.
  • Jan 1, 2023
  • Journal of Healthcare Engineering
  • Yingjian Yang + 9 more

Computed tomography (CT) has been regarded as the most effective modality for characterizing and quantifying chronic obstructive pulmonary disease (COPD). Therefore, chest CT images should provide more information for COPD diagnosis, such as COPD stage classification. This paper proposes a feature combination strategy that concatenates three-dimensional (3D) CNN features and lung radiomics features for COPD stage classification based on the multi-layer perceptron (MLP) classifier. First, 465 sets of chest HRCT images are automatically segmented by a trained ResU-Net, yielding lung images in Hounsfield units. Second, 3D CNN features are extracted from the lung region images based on a truncated transfer learning strategy. Then, lung radiomics features are extracted from the lung region images by PyRadiomics. Third, the MLP classifier with the best classification performance is determined using the 3D CNN features and the lung radiomics features. Finally, the proposed combined feature vector is used to improve the MLP classifier's performance. The results show that the MLP classifier achieves the best classification performance compared with CNN models and other ML classifiers. The MLP classifier with the proposed combined feature vector achieves an accuracy, mean precision, mean recall, mean F1-score, and AUC of 0.879, 0.879, 0.879, 0.875, and 0.971, respectively. Compared with the MLP classifier using 3D CNN features selected by Lasso, our method improves classification performance by 5.8% (accuracy), 5.3% (mean precision), 5.8% (mean recall), 5.4% (mean F1-score), and 2.5% (AUC). Compared with the MLP classifier using lung radiomics features selected by Lasso, our method improves classification performance by 5.0% (accuracy), 5.1% (mean precision), 5.0% (mean recall), 5.1% (mean F1-score), and 2.1% (AUC). Therefore, we conclude that our method is effective in improving COPD stage classification performance.

  • Research Article
  • Cited by 3
  • 10.1016/j.future.2024.107497
HashGrid: An optimized architecture for accelerating graph computing on FPGAs
  • Aug 28, 2024
  • Future Generation Computer Systems
  • Amin Sahebi + 2 more

Large-scale graph processing poses challenges due to its size and irregular memory access patterns, which degrade performance on common architectures such as CPUs and GPUs. Recent research includes accelerating graph processing using Field Programmable Gate Arrays (FPGAs). FPGAs can provide very efficient acceleration thanks to their reconfigurable on-chip resources. Although limited, these resources offer a larger design space than CPUs and GPUs. We propose an approach in which data are preprocessed into small chunks with an optimized graph partitioning technique for execution on FPGA accelerators. The chunks, located on the host, are streamed directly into a customized memory layer implemented in the FPGA, which is tightly coupled with the processing elements responsible for executing the graph algorithm. This improves application memory access latency, which is crucial to large-scale graph computing performance. This work presents a hardware design that, combined with graph partitioning, achieves high-performance and potentially scalable handling of large graphs (i.e., graphs with millions of vertices and billions of edges in current scenarios) with popular graph algorithms. With the PageRank algorithm, the proposed framework is 56 times faster than a CPU (multicore with 16 logical cores in our reference experiments) and 2.5 and 4 times faster than state-of-the-art FPGA and GPU solutions, respectively (the FPGA has 15 compute units, and the GPU reference has 128 streaming multiprocessors in our experiments). For the Single-Source Shortest Path (SSSP) algorithm, we achieve speedups of up to 65×, 26×, and 18× compared with CPU, GPU, and FPGA works, respectively. Lastly, for the Weakly Connected Components (WCC) algorithm, our framework achieves a speedup of up to 403× compared with the CPU and 7.4× against the GPU, and it is up to 10.3× faster than the FPGA alternatives.
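The abstract describes streaming preprocessed chunks of the graph from the host to the processing elements. The plain-Python sketch below shows the same access pattern in miniature for PageRank: the edge list is cut into fixed-size chunks and each iteration streams every chunk once, accumulating contributions. The naive consecutive chunking stands in for the paper's optimized partitioning, and the function name, chunk size, damping factor and toy graphs are assumptions.

```python
def pagerank_chunked(n, edges, chunk_size=1024, damping=0.85, iters=100):
    """PageRank over an edge list processed in fixed-size chunks.
    Rank held by nodes without out-edges is spread evenly over all nodes."""
    out_deg = [0] * n
    for src, _ in edges:
        out_deg[src] += 1
    # Naive partitioning: consecutive slices of the edge list.
    chunks = [edges[i:i + chunk_size] for i in range(0, len(edges), chunk_size)]
    rank = [1.0 / n] * n
    for _ in range(iters):
        acc = [0.0] * n
        for chunk in chunks:          # each chunk could be streamed to an accelerator
            for src, dst in chunk:
                acc[dst] += rank[src] / out_deg[src]
        dangling = sum(rank[v] for v in range(n) if out_deg[v] == 0)
        rank = [(1.0 - damping) / n + damping * (acc[v] + dangling / n)
                for v in range(n)]
    return rank

cycle = pagerank_chunked(4, [(0, 1), (1, 2), (2, 3), (3, 0)], chunk_size=2)
hub = pagerank_chunked(4, [(1, 0), (2, 0), (3, 0), (0, 1)], chunk_size=2)
```

On the 4-node cycle every node keeps rank 0.25; in the `hub` graph, node 0, which receives three in-edges, ends with the highest rank.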

  • Conference Article
  • Cited by 1
  • 10.1109/aps.2009.5171725
Field programmable gate array acceleration of bio-inspired optimization techniques for phased array design
  • Jun 1, 2009
  • Digest - IEEE Antennas and Propagation Society. International Symposium
  • Ozlem Kilic + 1 more

This paper investigates the improvement in computation time achieved by using field programmable gate arrays (FPGAs) in electromagnetic simulations. The amplitudes of a linear phased array antenna are optimized to reduce interference in multi-beam satellite communication systems by implementing the optimization algorithm and the antenna pattern calculations on a single FPGA. The ant colony optimization algorithm, a bio-inspired heuristic search method based on the survival skills of ants, is applied to the phased array antenna so that nulls can be placed in the array factor to reduce interference from beams operating in the same frequency band. Because the algorithm is inherently parallel, the calculations can be pipelined and parallelized on the FPGA, and speed improvements on the order of ten thousand times over conventional MATLAB programming have been demonstrated.
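The quantity an amplitude optimizer evaluates over and over here is the array factor of a uniformly spaced linear array, AF(θ) = Σ a_n e^{j n k d (cos θ − cos θ₀)}, with θ measured from the array axis. The plain-Python sketch below computes |AF| and a hypothetical interference cost that an optimizer such as ant colony optimization could minimize; the cost definition, half-wavelength spacing and broadside steering are assumptions, and the ant colony search itself is not shown.

```python
import cmath
import math

def array_factor(amplitudes, theta, d_over_lambda=0.5, theta0=math.pi / 2):
    """|AF(theta)| of a uniform linear array steered to theta0 (radians,
    measured from the array axis); d_over_lambda is the element spacing."""
    psi = 2 * math.pi * d_over_lambda * (math.cos(theta) - math.cos(theta0))
    return abs(sum(a * cmath.exp(1j * n * psi) for n, a in enumerate(amplitudes)))

def interference_cost(amplitudes, interferer_angles):
    """Hypothetical objective: total |AF| toward co-channel interferers,
    normalized by the main-beam (broadside) response."""
    main = array_factor(amplitudes, math.pi / 2)
    return sum(array_factor(amplitudes, t) for t in interferer_angles) / main

uniform = [1.0] * 8
null_angle = math.acos(0.25)   # first null of an 8-element, half-wavelength array
print(array_factor(uniform, math.pi / 2), array_factor(uniform, null_angle))
```

Each cost evaluation is an independent sum over elements and angles, which is why pipelining and parallelizing it on an FPGA pays off when a population-based search calls it thousands of times.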
