Articles published on FPGA design
969 Search results
- Research Article
- 10.1088/1748-0221/21/04/c04003
- Apr 1, 2026
- Journal of Instrumentation
- B Akgül + 9 more
In preparation for operations at the HL-LHC, the CMS Collaboration is upgrading its endcap calorimeters with a high granularity calorimeter (HGCAL). The HGCAL back-end electronics includes two Non-Zero Suppression (NZS) boards, which dynamically disable zero suppression in designated regions of interest. This paper presents a detailed discussion of the principal components of the implemented NZS firmware and a comprehensive account of the hardware testing performed on the Serenity platform, including validation against a Python-based emulator. Each of the 48 DAQ (Data Acquisition) boards of a single endcap receives 432-bit NZS flags, control flags that disable zero suppression for designated regions of interest in the front-end sections; they are sent via high-speed output channels operating at 25 Gbps. The NZS firmware processes data from six EMTF input links operating at 25 Gbps and produces the required non-zero-suppression control flags for real-time selection and spatial mapping of up to 27 muon candidates per bunch crossing under a 360 MHz system clock constraint. To meet the stringent timing requirements, the design adopts a fully pipelined FPGA architecture, enabling deterministic latency while sustaining continuous high-throughput operation.
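The timing constraint in this abstract can be made concrete with a little arithmetic. The sketch below assumes the nominal 40 MHz LHC bunch-crossing rate (25 ns spacing) and, as an assumption not stated in the abstract, one 432-bit flag word per DAQ board per bunch crossing; the constant and function names are illustrative.

```python
# Back-of-the-envelope timing budget for the NZS firmware described above.
# Assumptions (not stated in the abstract): one 432-bit flag word is sent to
# each DAQ board every bunch crossing, and the nominal 40 MHz bunch-crossing
# rate is used instead of the exact ~40.08 MHz.

BX_RATE_HZ = 40e6          # nominal LHC bunch-crossing rate (assumption)
SYS_CLK_HZ = 360e6         # firmware system clock (from the abstract)
MAX_MUONS_PER_BX = 27      # muon candidates per bunch crossing (from the abstract)
FLAG_BITS = 432            # NZS flag width per DAQ board (from the abstract)
LINK_GBPS = 25.0           # output line rate (from the abstract)

def timing_budget():
    cycles_per_bx = round(SYS_CLK_HZ / BX_RATE_HZ)            # clock ticks per 25 ns
    muons_per_cycle = -(-MAX_MUONS_PER_BX // cycles_per_bx)  # ceiling division
    flag_gbps = FLAG_BITS * BX_RATE_HZ / 1e9                  # flag payload per board
    return cycles_per_bx, muons_per_cycle, flag_gbps

cycles, per_cycle, gbps = timing_budget()
print(f"{cycles} cycles per BX, {per_cycle} candidates/cycle, "
      f"{gbps:.2f} Gb/s of flags per board vs {LINK_GBPS} Gb/s line rate")
```

Under these assumptions the firmware has 9 clock cycles per bunch crossing, so handling all 27 candidates within one crossing requires at least 3 per cycle, which is consistent with a parallel, fully pipelined datapath (the paper's actual partitioning may differ). The 25 Gbps figure is a line rate that typically includes encoding overhead, so the usable payload is somewhat lower than 25 Gb/s.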
- Research Article
- 10.1002/ett.70410
- Apr 1, 2026
- Transactions on Emerging Telecommunications Technologies
- L Malathi
ECG signal classification is important for the early detection of cardiovascular disorders (CVDs). Current methods struggle with the nonlinear complexity of ECG signals, making them inefficient for real‐time diagnostic analysis. This paper therefore proposes a new FPGA‐based accelerator for a deep convolutional vision transformer network (dCViTrN), called FPGA‐dCViTrN, to detect different types of arrhythmias. During ECG signal classification, an unsigned divide‐and‐conquer‐based look‐up‐table (LUT)‐oriented Booth multiplier (UDC‐LUT‐BM) performs the complex arithmetic of dCViTrN, such as multiplication, to reduce hardware complexity. Two publicly available datasets, PTB‐XL and MIT‐BIH Arrhythmia, are used for experimentation. Accuracy, recall, precision, and F1‐score are used to evaluate the deep learning accelerator, while delay, resource utilization, and power consumption are used to assess hardware complexity. The FPGA‐dCViTrN design achieves 99.25% and 99.7% classification accuracy on the MIT‐BIH and PTB‐XL datasets, respectively. Overall, this research provides a robust, high‐accuracy deep learning model backed by an optimized FPGA architecture, enabling real‐time ECG classification and assessment in medical diagnostics.
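The UDC‐LUT‐BM builds on Booth multiplication. As background, the sketch below is a generic radix-4 (modified) Booth multiplier for signed integers: it recodes the multiplier into digits in {-2, -1, 0, 1, 2}, so every partial product is a shift and optional negation of the multiplicand. It is the textbook algorithm, not the paper's divide-and-conquer LUT design, and the function names are hypothetical.

```python
def booth_radix4_digits(multiplier: int, bits: int) -> list[int]:
    """Recode a `bits`-wide two's-complement multiplier into radix-4 digits in {-2..2}."""
    assert bits % 2 == 0, "use an even width so the bits pair up"
    y = multiplier & ((1 << bits) - 1)      # two's-complement bit pattern
    digits, prev = [], 0                    # prev holds y[i-1]; y[-1] is 0
    for i in range(0, bits, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)  # digit = -2*y[i+1] + y[i] + y[i-1]
        prev = b1
    return digits

def booth_multiply(a: int, b: int, bits: int = 8) -> int:
    """a * b for b in the signed `bits`-wide range, via Booth partial products."""
    product = 0
    for k, d in enumerate(booth_radix4_digits(b, bits)):
        # d is in {-2, -1, 0, 1, 2}, so in hardware each partial product is
        # 0, +/-a or +/-2a (a shift and optional negation), weighted by 4**k.
        product += (d * a) << (2 * k)
    return product

print(booth_radix4_digits(7, 4))   # → [-1, 2], i.e. -1 + 2*4 = 7
print(booth_multiply(-13, 11))     # → -143
```

Radix-4 recoding halves the number of partial products compared with bit-by-bit shift-and-add, which is the usual motivation for Booth-based multipliers in area-constrained accelerators.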
- Research Article
- 10.1016/j.neunet.2025.108333
- Apr 1, 2026
- Neural Networks: The Official Journal of the International Neural Network Society
- Bertrand Frederick Boui A Boya + 5 more
Hopfield neural networks with diverse activation functions: impact of variable action gradients and electromagnetic radiation effects.
- Research Article
- 10.1145/3796723
- Mar 27, 2026
- ACM Transactions on Architecture and Code Optimization
- Archit Gajjar + 9 more
Modern AI models place heavy demands on compute resources, underscoring the importance of hardware accelerators that balance performance, energy, and flexibility. The growing demand for AI computing, coupled with slowing performance gains in chip manufacturing, has heightened the role of FPGA-based accelerators thanks to their rapid adaptability, reprogrammability, and support for custom parallel dataflows. In this work, we introduce a domain-optimized FPGA architecture for deep neural network (DNN) inference that embeds analog in-memory compute blocks, specifically RRAM-based Dot Product Engines (DPEs), directly into the fabric. These engines perform multiply-accumulate operations within an RRAM crossbar, minimizing data movement through the FPGA routing fabric while significantly increasing compute throughput, which improves the energy efficiency of DNNs mapped onto FPGAs. We evaluate the architecture using the Verilog-to-Routing (VTR) framework, simulate a novel 22 nm architecture, and use a custom event-driven simulator to evaluate its performance. Compared with state-of-the-art FPGA implementations across multiple DNNs (LeNet, ResNet, VGG), our design achieves an average 6.58× reduction in latency, a 5.42× throughput improvement, and an 8,741× energy-efficiency improvement.
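The dot-product engines described here rely on the crossbar principle: each RRAM cell passes a current equal to its input voltage times its conductance, and each column wire sums those currents. The idealized sketch below assumes linear, noise-free devices, a hypothetical conductance range and number of levels, and non-negative weights only; it illustrates the physics, not the paper's DPE or its simulator.

```python
# Idealized RRAM crossbar matrix-vector multiply (Ohm's law + Kirchhoff's current law).
# Assumptions: ideal devices (no wire resistance, noise, or nonlinearity), weights
# mapped linearly onto a hypothetical conductance range, and non-negative weights
# only (signed weights usually need a differential pair of columns).

G_MIN, G_MAX = 1e-6, 1e-4   # siemens; hypothetical device range
LEVELS = 16                  # hypothetical number of programmable conductance states

def weight_to_conductance(w: float, w_max: float) -> float:
    """Quantize a weight in [0, w_max] to the nearest of LEVELS conductance states."""
    level = round(w / w_max * (LEVELS - 1))
    return G_MIN + level * (G_MAX - G_MIN) / (LEVELS - 1)

def crossbar_mvm(weights, voltages):
    """weights[i][j] sits at row i (input voltage) and column j (output current)."""
    w_max = max(max(row) for row in weights)
    currents = [0.0] * len(weights[0])
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            # Ohm's law per cell (I = V * G); Kirchhoff's current law sums each column.
            currents[j] += voltages[i] * weight_to_conductance(w, w_max)
    return currents

def currents_to_dot(currents, voltages, w_max):
    """Remove the G_MIN offset and rescale column currents back to weight units."""
    offset = G_MIN * sum(voltages)
    step = (G_MAX - G_MIN) / (LEVELS - 1)
    return [(c - offset) / step * w_max / (LEVELS - 1) for c in currents]

v = [0.1, 0.2]
print(currents_to_dot(crossbar_mvm([[1, 2], [3, 15]], v), v, 15))  # approximately [0.7, 3.2]
```

In hardware the column sums form in a single analog step; an ADC then typically digitizes each column current, which is where much of the practical cost and error of such engines lies.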
- Research Article
- 10.3390/app16052625
- Mar 9, 2026
- Applied Sciences
- Yiqi Tang + 2 more
Identifying the type of a time-domain waveform under broadband conditions is a basic but challenging task. Traditional methods based on frequency-domain analysis or trained models generally suffer from high resource consumption, large delay, and poor suitability for hardware. This paper proposes an FPGA-based time-domain waveform recognition architecture that uses multi-feature voting. Several lightweight time-domain features, such as high-amplitude ratio, symmetry, slope uniformity, slope change rate, and flat-top characteristics, are extracted and used directly for waveform classification. A voting-based decision mechanism then classifies sine waves, square waves, triangular waves, and noise in the time domain. To improve reliability under non-ideal conditions, adaptive thresholds and noise-aware decision logic suppress misclassifications caused by random fluctuations and jitter. The design prioritizes low resource consumption and hardware efficiency and uses a fully pipelined FPGA architecture. Experimental results show that the system achieves high-precision identification, low power consumption, and real-time processing over a wide frequency band, providing an efficient and practical solution for embedded waveform recognition.
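Feature voting of this kind can be sketched in a few lines. The toy classifier below computes three features per window (a high-amplitude ratio, read here as the share of samples within 10% of the peak; slope uniformity, measured as the coefficient of variation of the absolute sample-to-sample slope; and crest factor, a stand-in the abstract does not list) and lets each feature vote for a class. All thresholds are assumptions tuned for clean synthetic signals; the paper's adaptive thresholds and noise-aware logic are not modeled.

```python
import math
from collections import Counter

def features(x):
    """Return (crest factor, high-amplitude ratio, slope CV); assumes a non-constant window."""
    mean = sum(x) / len(x)
    c = [v - mean for v in x]                          # remove any DC offset
    peak = max(abs(v) for v in c)
    rms = math.sqrt(sum(v * v for v in c) / len(c))
    crest = peak / rms                                 # square ~1.0, sine ~1.41, triangle ~1.73
    high_amp = sum(abs(v) >= 0.9 * peak for v in c) / len(c)   # share of near-peak samples
    slopes = [abs(b - a) for a, b in zip(c, c[1:])]
    s_mean = sum(slopes) / len(slopes)
    s_std = math.sqrt(sum((s - s_mean) ** 2 for s in slopes) / len(slopes))
    return crest, high_amp, s_std / s_mean             # slope CV is ~0 for a triangle

def classify(x):
    crest, high_amp, slope_cv = features(x)
    votes = [
        "square" if crest < 1.2 else "sine" if crest < 1.57
        else "triangle" if crest < 2.0 else "noise",
        "square" if high_amp > 0.8 else "sine" if high_amp > 0.2
        else "triangle" if high_amp > 0.05 else "noise",
        "triangle" if slope_cv < 0.2 else "sine" if slope_cv < 0.62
        else "noise" if slope_cv < 2.0 else "square",
    ]
    # Majority vote; if all three disagree, most_common keeps first-seen order,
    # so the crest-factor vote decides.
    return Counter(votes).most_common(1)[0][0]
```

A single misleading feature is outvoted as long as the other two agree, which is the robustness argument behind voting schemes of this kind.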
- Research Article
- 10.1145/3797035
- Feb 11, 2026
- ACM Transactions on Architecture and Code Optimization
- Aggelos Ferikoglou + 4 more
High-Level Synthesis (HLS) streamlines FPGA programming by abstracting low-level hardware complexities, facilitating rapid microarchitecture customization through the use of directives. However, identifying optimal directives remains a significant challenge, particularly for software developers without extensive hardware expertise. HLS-driven Design Space Exploration (DSE) addresses this challenge by automating the generation of directive configurations through a variety of techniques. The effectiveness of these methods heavily relies on the quality of the data on which they are built. Unfortunately, existing datasets are often constrained in scope, complexity, and device diversity. To address these limitations, we present GNΩSIS, to the best of our knowledge the largest open-source HLS dataset, containing almost 219K design points generated from prominent benchmark suites and public repositories. The dataset covers two FPGAs with varying resource characteristics and three clock frequencies. Additionally, we introduce a versatile framework to automate design point generation. We evaluate the impact of frequency scaling, FPGA architecture, and optimization targets on Quality of Result (QoR) metrics, analyze HLS directive effectiveness across code constructs, and study design transferability across FPGAs and frequencies. These analyses provide practical insights into QoR and HLS behavior, offering guidance for designers with limited optimization experience. We believe GNΩSIS will enhance FPGA accessibility and act as an enabler for innovation in HLS-driven DSE research. The dataset is publicly accessible on Hugging Face, and the framework is available on GitHub.
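A typical use of an HLS dataset like this is design-space exploration, where directive configurations are compared by QoR metrics. The sketch below extracts the Pareto front (the points that no other point beats on both latency and LUT count) from a handful of design points. The keys, directive-style names, and numbers are made up for illustration and are not taken from GNΩSIS or its framework.

```python
# Pareto-front extraction for an HLS design-space sweep (generic DSE step).
# Assumption: hypothetical design points with "latency" (cycles) and "luts",
# both to be minimized; the values are invented for illustration.

def dominates(p, q, keys=("latency", "luts")):
    """True if p is no worse than q on every objective and strictly better on one."""
    return all(p[k] <= q[k] for k in keys) and any(p[k] < q[k] for k in keys)

def pareto_front(points, keys=("latency", "luts")):
    return [p for p in points if not any(dominates(q, p, keys) for q in points)]

designs = [
    {"name": "baseline",     "latency": 1200, "luts": 900},
    {"name": "unroll2",      "latency": 650,  "luts": 1500},
    {"name": "pipeline",     "latency": 400,  "luts": 1600},
    {"name": "unroll2_pipe", "latency": 420,  "luts": 2100},
    {"name": "partition",    "latency": 1300, "luts": 950},
]
print([p["name"] for p in pareto_front(designs)])  # → ['baseline', 'unroll2', 'pipeline']
```

The quadratic scan is fine for a few thousand points; for sweeps on the scale of hundreds of thousands of design points, a sort-based front extraction per device and clock frequency would be the more practical choice.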
- Research Article
- 10.1002/cpe.70579
- Jan 27, 2026
- Concurrency and Computation: Practice and Experience
- Tengfei Li + 2 more
Artificial intelligence hardware accelerators are gaining importance in domains such as computer vision and robotics. However, deploying Convolutional Neural Networks (CNNs) on embedded systems with constrained compute and memory remains a major challenge. Motivated by the requirements of robotic vision, this paper presents a DSP‐Efficient Packing Strategy (DEPS) accelerator architecture tailored for lightweight CNNs, improving both computational throughput and hardware efficiency in real‐time robotic applications. Unlike previous FPGA designs that underutilize DSP blocks, the proposed DEPS enables the parallel execution of twelve 3‐bit multiplications within a single DSP48E2 unit. A layer‐wise pipelined mapping scheme is also proposed, which maps each CNN layer directly onto hardware without intermediate buffering, ensuring continuous computation and minimizing latency. The proposed accelerator is incorporated into an intelligent tennis‐serving robot as its real‐time vision module for object detection. Experimental results on VGG7‐tiny and UltraNet show throughputs of 299.4 GOPS and 340.0 GOPS, respectively, with power efficiencies of 80.1 GOPS/W and 89.2 GOPS/W. The robotic deployment confirms superior DSP utilization, enabling rapid, energy‐efficient, and reliable perception. This work highlights the potential of the proposed design for resource‐constrained edge platforms and practical robotics.
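The idea behind DSP packing is that one wide multiplier can compute several narrow products at once if the operands sit in separate bit fields with enough guard space that the partial results never overlap. The sketch below shows a generic version that shares one unsigned 3-bit operand across four products within a 27-bit input, mirroring the DSP48E2's 27 × 18 multiplier. It is not the DEPS scheme, which the abstract says fits twelve 3-bit multiplications into one DSP48E2, and the function names are hypothetical.

```python
# Generic operand packing: four unsigned 3-bit x 3-bit products from one wide multiply.
# NOT the paper's DEPS scheme; only the basic principle is shown.

FIELD = 6          # bits per product field: 7 * 7 = 49 < 2**6, so fields never overlap
A_PORT_BITS = 27   # width of the DSP48E2 multiplier's A input

def pack(values, field=FIELD):
    """Place unsigned 3-bit values side by side, one per `field`-bit slot."""
    packed = 0
    for i, v in enumerate(values):
        assert 0 <= v < 8, "operands must be unsigned 3-bit"
        packed |= v << (i * field)
    # Keep the packed word positive when interpreted as a signed 27-bit input.
    assert packed < 2 ** (A_PORT_BITS - 1), "too many fields for the 27-bit port"
    return packed

def packed_multiply(values, b, field=FIELD):
    """One wide multiply yields len(values) independent products a_i * b."""
    assert 0 <= b < 8, "shared operand must be unsigned 3-bit"
    wide = pack(values, field) * b               # the single "DSP" multiplication
    mask = (1 << field) - 1
    return [(wide >> (i * field)) & mask for i in range(len(values))]

print(packed_multiply([3, 7, 5, 1], 6))   # → [18, 42, 30, 6]
```

Because every field is wide enough for the largest product, no carries cross field boundaries; signed operands or tighter packing need correction logic and are beyond this sketch.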
- Research Article
- 10.3390/electronics15020414
- Jan 17, 2026
- Electronics
- Eleftherios Mylonas + 4 more
The ever-increasing need for energy-efficient implementations of AI algorithms has driven the research community to develop many hardware architectures and frameworks for AI. Much of this work targets FPGAs, while more sophisticated architectures such as coarse-grained reconfigurable arrays (CGRAs) have also attracted attention. However, AI ecosystems are isolated and fragmented, with no standardized way to compare frameworks through detailed Power–Performance–Area (PPA) analysis. This paper bridges the gap by presenting a unified, fully open-source, hardware-aware AI acceleration pipeline that enables seamless deployment of neural networks on both FPGA and CGRA architectures. Built around the Brevitas quantization framework, it supports two distinct backend flows: FINN for high-performance dataflow accelerators and CGRA4ML for low-power coarse-grained reconfigurable designs. To support this, a model translation layer from QONNX to QKeras is also introduced. To demonstrate its effectiveness, we use an autoencoder model for anomaly detection in wind turbines. We deploy the accelerated models on AMD's ZCU104 and benchmark them against a Raspberry Pi. Evaluation on a realistic cyber–physical testbed shows that the hardware-accelerated solutions achieve substantial performance and energy-efficiency gains (up to 10× and 37× faster inference per flow and over 11× higher efficiency) while maintaining acceptable reconstruction accuracy.
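Brevitas-based flows start from quantized networks. The sketch below shows the basic operation such frameworks model, symmetric per-tensor uniform quantization with a max-abs scale and round-to-nearest; it is a generic illustration under those assumptions, not Brevitas's API or its default quantizers.

```python
# Symmetric per-tensor uniform quantization (generic sketch, not the Brevitas API).

def quantize(xs, bits=8):
    """Map floats to signed integers in [-qmax, qmax] with a shared scale."""
    qmax = 2 ** (bits - 1) - 1                     # e.g. 127 for 8 bits
    scale = max(abs(x) for x in xs) / qmax or 1.0  # avoid a zero scale for all-zero input
    # Python's round() is round-half-to-even; the clamp guards against overflow.
    q = [max(-qmax, min(qmax, round(x / scale))) for x in xs]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from integers and the shared scale."""
    return [v * scale for v in q]

q, s = quantize([0.5, -1.27, 0.003, 1.0])
print(q, s)
```

Each dequantized value lies within half a quantization step (scale / 2) of the original, which is the error budget a backend such as FINN then works with when it maps the integer weights to hardware.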
- Research Article
- 10.3390/aerospace13010074
- Jan 10, 2026
- Aerospace
- Peijun Zhong + 5 more
This paper presents an architecture and strategy for on-orbit software updating of satellite payload control systems, based on a tightly coupled DSP and FPGA design. The architecture achieves tight coupling between the DSP and FPGA via the XINTF interface, integrating the DSP program and data into the FPGA bitstream. This enables synchronous updating of both chips with a single software package, significantly reducing both uplink data volume and update time. The system features a dual-flash redundant boot design and a mutual supervision mechanism between the DSP and FPGA, enabling cross-monitoring and autonomous reset, thereby significantly enhancing the system’s fault tolerance and reliability in orbit. Experimental results demonstrate a substantial improvement in fault recovery, with the weighted mean recovery time reduced from 27.09 s to 1.56 s, a relative improvement of 94.25% compared to conventional methods. Ground-based environmental tests confirm the system’s stability and engineering viability under extreme space conditions.
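The dual-flash redundant boot can be illustrated with a simple selection rule: verify the primary image and fall back to the backup if the check fails. The sketch assumes a hypothetical layout in which each flash stores the combined image (the abstract describes the DSP program being embedded in the FPGA bitstream) followed by a 4-byte CRC-32; the paper's actual integrity check and boot sequence are not given in the abstract, and all names here are hypothetical.

```python
import zlib

# Dual-flash boot selection with a CRC-32 integrity check (hypothetical layout:
# image bytes followed by a 4-byte little-endian CRC-32 of those bytes).

def make_image(payload: bytes) -> bytes:
    """Append the CRC-32 trailer that image_ok() expects."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")

def image_ok(blob: bytes) -> bool:
    """Check that the stored CRC matches the payload."""
    if len(blob) < 4:
        return False
    payload, stored = blob[:-4], int.from_bytes(blob[-4:], "little")
    return zlib.crc32(payload) == stored

def select_boot_image(primary: bytes, backup: bytes):
    """Try the primary flash first, then the backup; return (source, payload)."""
    for name, blob in (("primary", primary), ("backup", backup)):
        if image_ok(blob):
            return name, blob[:-4]
    return None, b""   # both copies corrupt: stay in a safe mode and request a re-upload
```

Because one image covers both chips in the described architecture, a single check like this decides the boot source for the DSP and the FPGA together.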
- Research Article
- 10.36548/jei.2025.4.002
- Jan 6, 2026
- Journal of Electronics and Informatics
- Ravi Teja C + 4 more
Noise removal is a vital pre-processing step in wearable ECG devices for accurate arrhythmia detection. This paper proposes a hardware-efficient, multiplier-less FPGA architecture for ECG denoising using a lifting-based wavelet transform. A universal thresholding function with soft thresholding enhances signal quality, while a modified lifting-based DWT eliminates multipliers and simplifies computation. An optimized median calculation and thresholding method remove the need for comparators in VLSI design. ECG data from the MIT-BIH databases validate the approach, achieving an SNR improvement of 7.4 dB and an MSE of 0.0206. FPGA implementation on the Nexys 4 DDR board demonstrates low hardware usage and a high operating frequency of 166 MHz, outperforming existing designs.
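Lifting is what makes a multiplier-less DWT possible: each step is an addition, subtraction, or shift. The sketch below uses the integer Haar (S-transform) lifting step, which is exactly invertible on integers, together with soft thresholding and the universal threshold λ = σ·√(2 ln N), with σ estimated as median(|d|)/0.6745. The Haar wavelet is an assumption standing in for the paper's modified lifting DWT, and only the lifting steps here are multiplier-free.

```python
import math

# Integer Haar lifting (S-transform) plus soft thresholding with the universal threshold.

def haar_lift(x):
    """Forward lifting on an even-length integer list: subtract/add and shift only."""
    even, odd = x[0::2], x[1::2]
    detail = [o - e for e, o in zip(even, odd)]             # predict step
    approx = [e + (d >> 1) for e, d in zip(even, detail)]   # update step (>> 1 halves)
    return approx, detail

def haar_unlift(approx, detail):
    """Exact inverse: undo the update step, then the predict step."""
    even = [a - (d >> 1) for a, d in zip(approx, detail)]
    odd = [d + e for e, d in zip(even, detail)]
    out = []
    for e, o in zip(even, odd):
        out += [e, o]
    return out

def soft_threshold(v, lam):
    """Shrink v toward zero by lam; values with |v| <= lam become zero."""
    return math.copysign(max(abs(v) - lam, 0.0), v)

def universal_threshold(detail):
    """Donoho-Johnstone threshold; sigma from the (upper) median of |detail|."""
    sigma = sorted(abs(d) for d in detail)[len(detail) // 2] / 0.6745
    return sigma * math.sqrt(2 * math.log(len(detail)))

a, d = haar_lift([5, 7, 3, -2, 10, 10, -8, 1])
print(a, d)   # → [6, 0, 10, -4] [2, -5, 0, 9]
```

Denoising then applies `soft_threshold` to the detail coefficients before `haar_unlift`; because the integer lifting steps are exactly invertible, any change in the reconstructed signal comes only from the thresholding.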
- Research Article
- 10.1109/access.2026.3684720
- Jan 1, 2026
- IEEE Access
- Y Rasheed + 6 more
AMPEREISH: Approximate Multipliers for Power Efficiency in FPGA Designs Using Internal-Self-Healing
- Research Article
- 10.1016/j.vlsi.2025.102532
- Jan 1, 2026
- Integration
- Wanzheng Weng + 1 more
RapidPnR: Accelerating the physical design for FPGAs via design-level parallelism
- Research Article
- 10.1587/elex.23.20250671
- Jan 1, 2026
- IEICE Electronics Express
- Yuqian Sun + 2 more
A Structured FPGA Architecture for High-Speed Conversion between Streaming and Segmented Buses
- Research Article
- 10.31474/1996-1588-2026-1-42-65-72
- Jan 1, 2026
- Scientific papers of Donetsk National Technical University. Series: Informatics, Cybernetics and Computer Science
- Irina Zeleneva + 3 more
Current trends in hardware resource scaling create a fundamental industry problem: the so-called "verification gap." This article analyzes the potential of generative AI in digital circuit design for improving the verification of hardware description language (HDL) code. Thanks to the self-attention mechanism of the Transformer architecture, large language models (LLMs) can analyze complex dependencies between code elements. This is particularly relevant for the VHDL hardware description language: unlike Verilog or SystemVerilog, which have a more concise C-like syntax, VHDL is characterized by strong typing, verbosity, and rigid syntactic constructs. The object of this study is the automated verification of VHDL code using LLMs. A comparative analysis of nine modern LLMs is conducted based on accuracy, performance, and cost, and an experimental method is proposed that includes a unified prompt and normalization of the obtained data. English is justified and adopted as the single language for the user interface, the system prompts, and internal code generation. A specialized dataset of 25 test cases was developed during the study. Each case contains a VHDL file with a deliberately introduced error (or correct reference code) and a corresponding JSON file describing the expected result. The test set was generated with errors distributed across categories, and the correct files were used to detect false positives, enabling a comprehensive evaluation of the AI models. To process the responses of different LLMs consistently, a normalization algorithm using an "arbiter model" was developed. The study addresses a pressing scientific and applied problem: increasing the efficiency of FPGA design verification through the informed selection of suitable LLMs.
The proposed LLM analysis algorithm can, in principle, be applied to other hardware description languages. Practical applications include digital system design and the training of IT specialists. The most promising direction for further research is the integration of Retrieval-Augmented Generation (RAG), which would allow the models to draw on technical documentation (datasheets) for specific FPGA architectures, including current device families.
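The evaluation protocol (test cases with planted errors, correct reference files to expose false positives, and normalized model answers) maps naturally onto a confusion-matrix score. The sketch below assumes a hypothetical JSON schema with `has_error` and `category` fields and assumes the arbiter model has already normalized each answer into that schema; the study's actual file format and metrics may differ.

```python
import json

# Scoring normalized LLM verdicts against expected-result JSON files.
# Hypothetical schema: {"has_error": bool, "category": str or null}.

def score(expected_files, model_answers):
    tp = fp = fn = tn = category_hits = 0
    for exp_text, ans_text in zip(expected_files, model_answers):
        exp, ans = json.loads(exp_text), json.loads(ans_text)
        if exp["has_error"] and ans["has_error"]:
            tp += 1
            category_hits += exp["category"] == ans["category"]  # right error class?
        elif exp["has_error"]:
            fn += 1                      # missed a planted error
        elif ans["has_error"]:
            fp += 1                      # flagged correct reference code
        else:
            tn += 1
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "category_accuracy": category_hits / tp if tp else 0.0,
    }
```

Keeping the correct reference files in the test set is what makes the false-positive rate measurable; without them, a model that flags every file would look perfect on recall.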
- Research Article
- 10.18127/j00338486-202601-02
- Jan 1, 2026
- Radioengineering
- A.O Slavyansky + 4 more
Problem statement. Designing custom integrated circuits from a developed FPGA design in order to solve import substitution problems. Goal. To describe the design stages of custom integrated circuits based on basic matrix crystals (BMCs) for a data-exchange interface module implemented from the developed FPGA design, in order to solve import substitution problems. Result. An example of optimizing the system's software architecture is provided, which increases the efficiency of interaction between the blocks of the designed system and improves the operability of the interface module. During refinement, the input and output interfaces were separated, which ensured closer compliance with the requirements of the BMC logic synthesizer. The necessary technical documentation was prepared, including a description of the pins used by the BMC die design, for the BMC manufacturing order form. Practical significance. The results make it possible to apply BMCs effectively to solve import substitution problems. The use of BMCs will increase reliability and reduce cost in the mass production of equipment. The results can be used in the ground, aviation, and space industries.
- Research Article
- 10.17694/bajece.1615108
- Dec 31, 2025
- Balkan Journal of Electrical and Computer Engineering
- Aydın Tarık Zengin
This study presents a novel embedded FPGA design utilizing Time-over-Threshold (ToT) and Double-Time-over-Threshold (DToT) methods, addressing the limitations of traditional analog-to-digital converters (ADCs). The ToT method has gained popularity due to its advantages in power consumption, cost, and integration, yet it faces challenges such as energy vs. time resolution trade-offs and signal nonlinearity. The proposed DToT method aims to mitigate these issues by employing two thresholds, offering improved energy resolution and reduced nonlinearity compared to single-threshold ToT methods. The system is implemented on a Zynq System-on-Chip (SoC) that integrates an FPGA with an ARM CPU, enabling dynamically adjustable thresholds, high-precision timing measurements, and flexible data processing capabilities. The evaluation was conducted using a CAEN DT4800 Digital Detector Emulator, which generated signals from a 3-inch NaI detector exposed to a ²²Na radioactive source. The results demonstrate the superior precision of the DToT method, particularly at lower energies, with σ/μ ratios of 1.82% and 3.82% for the 511 keV and 1275 keV peaks, respectively. This FPGA-based approach provides a versatile and high-precision solution for instrumentation applications, offering significant advantages over traditional ADCs and single-threshold ToT methods. The integration of an ARM CPU with FPGA logic allows for flexible and tunable signal processing, making it suitable for a wide range of applications, including particle physics experiments, medical imaging systems, and industrial sensors. The study underscores the potential of DToT for high-resolution spectroscopy and suggests areas for future research, such as optimizing the system for specific experimental setups in scintillation gamma cameras or positron emission tomography (PET).
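One way to see why a second threshold helps is an idealized pulse model, assumed here and not taken from the paper: V(t) = A·exp(-t/τ) for t ≥ 0 with an instantaneous rise. Then ToT(T) = τ·ln(A/T), which depends on the unknown decay time τ and grows only logarithmically with the amplitude A. With two thresholds T1 < T2, ToT1 - ToT2 = τ·ln(T2/T1), so τ and then A can be recovered. The sketch is illustrative only; real scintillator pulses have a finite rise time, and the paper's DToT processing may differ.

```python
import math

# Idealized two-threshold ToT model (assumption): V(t) = A * exp(-t / tau), t >= 0.

def tot(amplitude, tau, threshold):
    """Time over threshold for the idealized pulse; 0 if the pulse never crosses."""
    return tau * math.log(amplitude / threshold) if amplitude > threshold else 0.0

def amplitude_from_dtot(tot1, tot2, t1, t2):
    """Recover (A, tau) from ToTs measured at thresholds t1 < t2 (both crossed)."""
    tau = (tot1 - tot2) / math.log(t2 / t1)   # the difference removes the unknown A
    return t1 * math.exp(tot1 / tau), tau     # invert ToT1 = tau * ln(A / t1)

a_hat, tau_hat = amplitude_from_dtot(tot(0.8, 250.0, 0.05), tot(0.8, 250.0, 0.2), 0.05, 0.2)
print(a_hat, tau_hat)   # close to 0.8 and 250.0 up to floating-point rounding
```

With a single threshold, a pulse-to-pulse change in τ is indistinguishable from a change in amplitude; the second measurement separates the two, which is one plausible reading of why two thresholds can reduce nonlinearity.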
- Research Article
- 10.1145/3774651
- Dec 22, 2025
- ACM Transactions on Reconfigurable Technology and Systems
- Weihai Xu + 7 more
Deep Packet Inspection (DPI) faces significant bottlenecks in regular expression (regex) matching due to escalating rule complexity and traffic volume. Existing FPGA-based solutions inefficiently process all packets through every automaton, incurring substantial resource overhead. This article proposes OD-REM, an on-demand regex matching architecture for FPGAs that dramatically improves efficiency. In addition to this architecture, OD-REM introduces three innovations: (1) a Counter-Enabled Fast Reconfigurable Automaton (cFRA) that compresses regex states by 97.5% via counting semantics, eliminating state explosion for bounded repetitions; (2) a Ring Queue (RQ) scheduler that dynamically dispatches packets only to the automata relevant to their candidate rules (identified via pre-filtering); and (3) a modular pipeline for per-packet, on-chip run-time reconfiguration. Implemented on a Xilinx VU9P FPGA with 32 parallel cFRAs, OD-REM achieves 41.18 Gbps throughput, roughly 3× to 40× higher than similar works, while still performing a full reconfiguration on every packet. It reduces packet latency by 3.73 µs compared with sliding-window scheduling. Integrated with Pigasus, OD-REM offloads complex rules, accelerating Hyperscan software matching by up to 37×. This work demonstrates FPGA-centric regex matching as a scalable, high-throughput solution for modern DPI systems.
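The cFRA's counting semantics can be illustrated with a toy matcher for the restricted pattern shape `x y{lo,hi} z`, where x, y, and z are distinct single characters. An unrolled automaton needs on the order of hi states for the repetition; here a single counter does the same job. This is an assumption-limited sketch, not the cFRA design, the function name is hypothetical, and it is cross-checked against Python's `re` module.

```python
import re

def counting_match(text: str, x: str, y: str, lo: int, hi: int, z: str) -> bool:
    """Return True if `text` contains x, then lo..hi copies of y, then z."""
    assert len({x, y, z}) == 3, "sketch assumes three distinct single characters"
    count = None  # None: no live candidate; otherwise number of y's since the last x
    for ch in text:
        if ch == x:
            count = 0                     # any x (re)starts a candidate match
        elif ch == y and count is not None:
            count += 1
            if count > hi:
                count = None              # too many repetitions: candidate dies
        elif ch == z and count is not None and count >= lo:
            return True                   # x y{lo,hi} z found
        else:
            count = None                  # any other character breaks the candidate
    return False

# Cross-check against Python's regex engine on a few inputs.
for s in ["abbc", "abc", "abbbbbc", "aabbbc", "abbxabbbc"]:
    assert counting_match(s, "a", "b", 2, 4, "c") == bool(re.search("ab{2,4}c", s))
```

Because x, y, and z are distinct, only the most recent x can start a successful match, so one counter suffices; patterns whose characters overlap would generally need a set of counter values per state.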
- Research Article
- 10.26483/ijarcs.v16i6.7386
- Dec 21, 2025
- International Journal of Advanced Research in Computer Science
- Himanshu Barhaiya
The rapid globalization of semiconductor design and production has made hardware security a serious concern. Because a large number of ICs are now produced in untrusted facilities, hardware Trojans have become a major threat to the confidentiality, integrity, and availability of electronic systems. These malicious modifications can be introduced at any design or manufacturing stage and often remain dormant for years before activation. This paper provides a broad overview of hardware Trojan detection methods for FPGA and ASIC systems. It discusses different detection methods, such as side-channel testing, optimization-based testing, machine-learning-assisted testing, and electromagnetic (EM)-based testing. It also addresses weaknesses specific to FPGA and ASIC design and architecture, along with recent research trends and challenges, including process variation, scalability, and the design complexity of modern SoCs. The survey gives a comprehensive account of detection methodologies and argues that future hardware security depends on secure design flows, trusted manufacturing, and stronger AI-based solutions.
- Research Article
- 10.1145/3778036
- Dec 15, 2025
- ACM Transactions on Reconfigurable Technology and Systems
- Mohamed A Elgammal + 12 more
This is a corrigendum for the article “VTR 9: Open-Source CAD for Fabric and Beyond FPGA Architecture Exploration” published in ACM Trans. Reconfig. Technol. Syst. 18, 3, Article 39 (August 2025), 53 pages.
- Research Article
- 10.21275/sr251208213556
- Dec 13, 2025
- International Journal of Science and Research (IJSR)
- Nihalaparvin Abbas
The detection of irregular heart rhythms from electrocardiogram (ECG) signals is essential for the diagnosis and continuous monitoring of cardiac disorders, where the early identification of arrhythmias, such as premature ventricular contractions, ventricular tachycardia, and ventricular fibrillation, is critical for preventing sudden cardiac events [7]–[9]. ECG analysis is challenged by noise, baseline drift, and nonstationary behavior, necessitating robust preprocessing and feature extraction techniques, such as wavelet-based denoising and moving average QRS detection methods [3], [10], [13]. Advanced heart rate variability (HRV) analysis using compressed sensing and integral pulse frequency modulation (IPFM) models has demonstrated improved spectral resolution and prognostic value, particularly for unevenly sampled RR intervals [4]–[6], [14]. Statistical, machine learning, and deep learning approaches, including Bayesian frameworks, complexity-measure-based hypothesis testing, and convolutional neural networks, have further enhanced arrhythmia detection and classification accuracy [7]–[9], [11], [15]. To support continuous real-time monitoring, recent research has emphasized hardware-efficient implementations, with VLSI, FPGA, and ASIC-based architectures enabling low-power abnormal heartbeat detection through optimized neural networks and edge artificial intelligence, which are suitable for wearable and implantable devices [1], [12], [16]–[18].