Articles published on Hardware architecture
5003 Search results
- New
- Research Article
- 10.3390/electronics15020351
- Jan 13, 2026
- Electronics
- Shang-En Tsai + 2 more
The deployment of Advanced Driver-Assistance Systems (ADAS) in economically constrained markets frequently relies on hardware architectures that lack dedicated graphics processing units. Within such environments, the integration of deep neural networks faces significant hurdles, primarily stemming from strict limitations on energy consumption, the absolute necessity for deterministic real-time response, and the rigorous demands of safety certification protocols. Meanwhile, traditional geometry-based lane detection pipelines continue to exhibit limited robustness under adverse illumination conditions, including intense backlighting, low-contrast nighttime scenes, and heavy rainfall. Motivated by these constraints, this work re-examines geometry-based lane perception from a sensor-level viewpoint and introduces a Binary Line Segment Filter (BLSF) that leverages the inherent structural regularity of lane markings in bird’s-eye-view (BEV) imagery within a computationally lightweight framework. The proposed BLSF is integrated into a complete pipeline consisting of inverse perspective mapping, median local thresholding, line-segment detection, and a simplified Hough-style sliding-window fitting scheme combined with RANSAC. Experiments on a self-collected dataset of 297 challenging frames show that the inclusion of BLSF significantly improves robustness over an ablated baseline while sustaining real-time performance on a 2 GHz ARM CPU-only platform. Additional evaluations on the Dazzling Light and Night subsets of the CULane and LLAMAS benchmarks further confirm consistent gains of approximately 6–7% in F1-score, together with corresponding improvements in IoU. These results demonstrate that interpretable, geometry-driven lane feature extraction remains a practical and complementary alternative to lightweight learning-based approaches for cost- and safety-critical ADAS applications.
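The pipeline described above ends with a RANSAC line fit over candidate lane points. A minimal sketch of that final stage, assuming a simple slope-intercept model and illustrative parameter values (this is not the paper's implementation):

```python
import random

def ransac_line(points, n_iters=200, inlier_tol=2.0, seed=0):
    """Fit a line y = m*x + b to 2-D points with RANSAC.

    Illustrative sketch of a RANSAC stage after sliding-window candidate
    selection; parameter names and values are assumptions.
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:  # skip degenerate vertical sample for this model
            continue
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        inliers = [(x, y) for x, y in points
                   if abs(y - (m * x + b)) < inlier_tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (m, b), inliers
    return best_model, best_inliers

# Points mostly on y = 2x + 1, plus two gross outliers.
pts = [(x, 2 * x + 1) for x in range(10)] + [(3, 40), (7, -20)]
(m, b), inliers = ransac_line(pts)
```

The consensus set excludes the two outliers, which is what makes RANSAC robust to the spurious segments that adverse illumination produces.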
- New
- Research Article
- 10.7498/aps.75.20251386
- Jan 1, 2026
- Acta Physica Sinica
- Gaochen Yang + 13 more
<sec>As Moore's Law encounters limitations in scaling device physical dimensions and reducing computational power consumption, traditional silicon-based integrated circuit (IC) technologies, which have enjoyed half a century of success, are facing unprecedented challenges. These limitations are especially apparent in emerging fields such as artificial intelligence, big data processing, and high-performance computing, where the demand for computational power and energy efficiency is growing. Therefore, the exploration of novel materials and hardware architectures is crucial to address these challenges. Two-dimensional (2D) materials have become ideal candidates for the next-generation electronic devices and integrated circuits (ICs) due to their unique physical properties such as the absence of dangling bonds, high carrier mobility, tunable band gaps, and high photonic responses. Notably, 2D materials such as graphene, transition metal dichalcogenides (TMDs), and hexagonal boron nitride (h-BN) have demonstrated immense potential in electronics, optoelectronics, and flexible sensing applications.</sec><sec>This paper comprehensively reviews the recent advancements in the application of 2D materials in integrated circuits, analyzing the challenges and solutions related to large-scale integration, device design, functional circuit modules, and three-dimensional integration. Through a detailed examination of the basic properties of 2D materials, their constituent functional devices, and multifunctional integrated circuits, this paper presents a series of innovative ideas and methods, demonstrating the promising application prospects of 2D materials in future ICs.</sec><sec>The research method involves a detailed analysis of the physical properties of common 2D materials such as graphene, TMDs, and h-BN, with typical application cases explored. 
This paper discusses how to utilize the excellent properties of these materials to fabricate high-performance single-function devices, integrated circuit modules, and 3D integrated chips, especially focusing on solving the challenges related to large-scale growth, device integration, and interface engineering of 2D materials. The comparison of performance and applications between various materials demonstrates the unique advantages of 2D materials in the semiconductor industry and their potential in IC design.</sec><sec>Although 2D materials perform well in laboratory environments, there are still significant challenges in practical applications, especially in large-scale production, device integration, and three-dimensional integration. Achieving high-quality, large-area growth of 2D materials, reducing interface defects, and improving device stability and reliability remain core issues that need to be addressed in research and industry. However, with continuous advancements in 2D material fabrication technology and optimization of integration processes, these challenges are gradually being overcome, and the application prospects of 2D materials are expanding.</sec>
- New
- Research Article
- 10.70695/iaai202504a9
- Dec 31, 2025
- Innovative Applications of AI
- Weixi Huang + 3 more
Currently, the power system and control module of disinfection robots are only loosely coupled, resulting in shortcomings in energy efficiency and reliability. Therefore, this paper proposes an embedded integrated control system based on an ARM controller. First, system-level requirements such as overall weight, battery life, speed, and dosage are specified, and a unified DC bus with a multi-level DC/DC topology is designed. Then, a power-control integrated hardware architecture and an RTOS-based real-time software platform are developed to coordinate motion control, power management, and disinfection execution. Furthermore, a differential motion control algorithm and a dosage constraint strategy are designed. Experimental results show that the system achieves high power efficiency, stable temperature rise, small trajectory tracking error, speed response that meets real-time requirements, uniform disinfection coverage, and a high logarithmic kill rate under various operating conditions. Its robustness is verified through long-term operation and fault-injection tests.
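Differential motion control of this kind rests on standard differential-drive kinematics. A minimal sketch under the usual unicycle model, with illustrative names and values (the paper's actual controller is not reproduced here):

```python
import math

def diff_drive_step(x, y, theta, v_left, v_right, wheel_base, dt):
    """Advance a differential-drive pose (x, y, theta) by one time step.

    Textbook unicycle kinematics; an illustrative sketch, not the
    paper's controller. All parameter names are assumptions.
    """
    v = (v_right + v_left) / 2.0             # chassis linear velocity
    omega = (v_right - v_left) / wheel_base  # chassis angular velocity
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += omega * dt
    return x, y, theta

# Equal wheel speeds produce straight-line motion along the heading.
x, y, th = diff_drive_step(0.0, 0.0, 0.0, 0.5, 0.5, 0.3, 1.0)
```

Unequal wheel speeds yield a nonzero omega, which is how the robot tracks curved disinfection trajectories.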
- New
- Research Article
- 10.1002/smll.202503908
- Dec 30, 2025
- Small (Weinheim an der Bergstrasse, Germany)
- Yuhuan Li + 8 more
2D ferroelectric field-effect transistors (FeFETs) are expected to be a practical solution for next-generation in-memory computing, seamlessly integrating with computer hardware architectures to overcome the von Neumann bottleneck. At present, emerging 2D materials are driving chip miniaturization. Among them, In2Se3 stands out, as both theoretical and experimental studies have confirmed its novel physical properties with practical applications. Notably, monolayer In2Se3 retains ferroelectricity at room temperature and exhibits high carrier field-effect mobility (µFE). Furthermore, it can be synthesized via chemical vapor deposition (CVD) as thin films, demonstrating strong potential for large-area 2D FeFET arrays. Here, we investigate an In2Se3 thin-film FeFET array prepared by CVD, which has a µFE of 151.7 cm^2 V^-1 s^-1 and an on-off ratio of up to 10^6. Inspiringly, the array exhibits stable, non-volatile characteristics, maintaining a high-to-low resistance state window larger than 10^2 even after 1800 s. Furthermore, we explore its potential for neuromorphic computing by applying the array to machine learning tasks across various datasets and neural network architectures, consistently achieving a recognition accuracy greater than 90%.
- New
- Research Article
- 10.22399/ijcesen.4621
- Dec 30, 2025
- International Journal of Computational and Experimental Science and Engineering
- Lalith Lakshmi Chaitanya Kumar Mangalagiri
This article examines the integration of the Rust programming language for secure firmware development and artificial intelligence-driven automation in modern DevOps pipelines. Contemporary firmware engineering faces critical challenges in memory safety assurance and operational scalability across extensive product portfolios. Traditional C/C++ development approaches introduce persistent vulnerabilities through buffer overflows, use-after-free errors, and data race conditions, while manual DevOps configuration impedes developer productivity and time-to-market velocity. The investigation presents a systematic implementation of Rust's ownership model and AI-driven pipeline generation across firmware development workflows. The Rust pipeline architecture encompasses cross-compilation frameworks supporting multiple hardware architectures, comprehensive testing methodologies including hardware simulation environments, and automated artifact generation. AI-driven onboarding systems employ large language models and multi-agent orchestration to automate pipeline configuration, predictive compliance validation, and intelligent failure diagnostics. Key findings demonstrate complete elimination of memory safety vulnerabilities in production firmware, reduction of developer onboarding timelines by over 90% (from hours to minutes), and achievement of full performance parity with optimized C++ implementations. The transformation delivered measurable improvements in security posture, operational efficiency, and system reliability while establishing reproducible patterns for secure systems development at scale. These outcomes validate the technical feasibility and business value of modernizing firmware engineering through language-level safety guarantees and intelligent automation, providing practical frameworks for organizations pursuing similar security-first development transformations.
- New
- Research Article
- 10.1038/s41534-025-01164-0
- Dec 29, 2025
- npj Quantum Information
- Tom Jäger + 11 more
Accurately estimating the performance of quantum hardware is crucial for comparing different platforms and predicting the performance and feasibility of quantum algorithms and applications. In this paper, we tackle the problem of benchmarking a quantum register based on the NV center in diamond operating at room temperature. We define the connectivity map as well as single-qubit performance. Thanks to all-to-all connectivity, the performance of two- and three-qubit gates is promising and competitive with other platforms. We experimentally calibrate the error model for the register and use it to estimate the quantum volume, a metric quantifying the register's quantum computational capabilities, obtaining a value of 8. Our results pave the way towards the unification of different architectures of quantum hardware and the evaluation of joint metrics.
- New
- Research Article
- 10.3390/mi17010044
- Dec 29, 2025
- Micromachines
- Jiqing Wang + 2 more
This paper presents a hardware-implementable joint denoising and demosaicing acceleration system. Firstly, a lightweight network architecture with multi-scale feature extraction based on partial convolution is proposed at the algorithm level. The partial convolution scheme reduces the redundancy of filters and feature maps, thereby reducing memory accesses, and achieves excellent visual quality with a smaller model complexity. In addition, multi-scale extraction expands the receptive field while reducing model parameters. We then apply separable convolution and partial convolution to reduce the parameters of the model. Compared with the standard convolutional solution, the parameters and MACs are reduced by 83.38% and 77.71%, respectively. Moreover, different networks entail different memory-access patterns and computing methods; we therefore introduce a unified, flexibly configurable hardware acceleration platform and implement it on a Xilinx Zynq UltraScale+ FPGA board. Finally, compared with the state-of-the-art neural network solution on the Kodak24 set, the peak signal-to-noise ratio and the structural similarity index measure are improved by approximately 2.36 dB and 0.0806, respectively, and computing efficiency is improved by 2.09×. Furthermore, the hardware architecture supports multiple degrees of parallelism and can adapt to different edge-embedded scenarios. Overall, the image processing solution proposed in this paper offers clear advantages for joint denoising and demosaicing systems.
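The parameter savings from separable convolution can be illustrated by counting weights for a single layer. The paper's exact 83.38% figure depends on its full architecture, so the channel and kernel sizes below are only indicative:

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def separable_params(c_in, c_out, k):
    """Depthwise k x k convolution followed by a 1x1 pointwise convolution."""
    return c_in * k * k + c_in * c_out

# Illustrative layer: 64 -> 64 channels, 3x3 kernel.
std = conv_params(64, 64, 3)         # 64 * 64 * 9
sep = separable_params(64, 64, 3)    # 64 * 9 + 64 * 64
reduction = 1 - sep / std            # fraction of weights eliminated
```

Even for this single made-up layer the reduction is roughly 87%, showing how such factorizations reach the large whole-model savings the paper reports.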
- New
- Research Article
- 10.3390/computers15010010
- Dec 25, 2025
- Computers
- Sabya Shtaiwi + 1 more
With the increasing computational demands of large language models (LLMs), there is a pressing need for more specialized hardware architectures capable of supporting their dynamic and memory-intensive workloads. This paper examines recent studies on hardware acceleration for AI, focusing on three critical aspects: energy efficiency, architectural adaptability, and runtime security. While notable advancements have been made in accelerating convolutional and deep neural networks using ASICs, FPGAs, and compute-in-memory (CIM) approaches, most existing solutions remain inadequate for the scalability and security requirements of LLMs. Our comparative analysis highlights two key limitations: restricted reconfigurability and insufficient support for real-time threat detection. To address these gaps, we propose a novel architectural framework grounded in modular adaptivity, memory-centric processing, and security-by-design principles. The paper concludes with a proposed evaluation roadmap and outlines promising future research directions, including RISC-V-based secure accelerators, neuromorphic co-processors, and hybrid quantum-AI integration.
- New
- Research Article
- 10.3390/electronics15010044
- Dec 23, 2025
- Electronics
- Peter Kolok + 3 more
Modern cryptographic systems increasingly depend on certified hardware modules to guarantee trustworthy key management, tamper resistance, and secure execution across Internet of Things (IoT), embedded, and cloud infrastructures. Although numerous FIPS 140-certified platforms exist, prior studies typically evaluate these solutions in isolation, offering limited insight into their cross-domain suitability and practical deployment trade-offs. This work addresses this gap by proposing a unified, multi-criteria evaluation framework aligned with the FIPS 140 standard family (including both FIPS 140-2 and FIPS 140-3), replacing the earlier formulation that assumed an exclusive FIPS 140-3 evaluation model. The framework systematically compares secure elements (SEs), Trusted Platform Modules (TPMs), embedded Systems-on-Chip (SoCs) with dedicated security coprocessors, enterprise-grade Hardware Security Modules (HSMs), and cloud-based trusted execution environments. It integrates certification analysis, performance normalization, physical-security assessment, integration complexity, and total cost of ownership. Validation is performed using verified CMVP certification records and harmonized performance benchmarks derived from publicly available FIPS datasets. The results reveal pronounced architectural trade-offs: lightweight SEs offer cost-efficient protection for large-scale IoT deployments, while enterprise HSMs and cloud enclaves provide high throughput and Level 3 assurance at the expense of increased operational and integration complexity. Quantitative comparison further shows that secure elements reduce active power consumption by approximately 80–85% compared to TPM 2.0 modules (<20 mW vs. 100–150 mW) but typically require 2–3× higher firmware-integration effort due to middleware dependencies. Likewise, SE050-based architectures deliver roughly 5× higher cryptographic throughput than TPMs (∼500 ops/s vs. ∼100 ops/s), whereas enterprise HSMs outperform all embedded platforms by two orders of magnitude (>10,000 ops/s). Because the evaluated platforms span both FIPS 140-2 and FIPS 140-3 certifications, the comparative analysis interprets their security guarantees in terms of requirements shared across the FIPS 140 standard family, rather than attributing all properties to FIPS 140-3 alone. No single architecture emerges as universally optimal; rather, platform suitability depends on the desired balance between assurance level, scalability, performance, and deployment constraints. The findings offer actionable guidance for engineers and system architects selecting FIPS-validated hardware for secure and compliant digital infrastructures.
- New
- Research Article
- 10.1038/s43588-025-00895-6
- Dec 22, 2025
- Nature computational science
- Hengyun Zhou + 2 more
Quantum error correction provides a route to realizing large-scale quantum computation but incurs substantial resource overheads. Here we highlight recent advances that reduce these overheads by co-designing different levels of the computational stack, including algorithms, quantum-error-correction strategies and hardware architecture. We then discuss opportunities for further optimization such as leveraging flexible qubit connectivity and quantum low-density parity check codes. These strategies can bring useful quantum computation closer to reality as experiments advance in the coming years.
- Research Article
- 10.1142/s0218127426500367
- Dec 11, 2025
- International Journal of Bifurcation and Chaos
- Jiawei Liu + 5 more
Given the extensive applications of deep neural networks in various scenarios, research efforts have increasingly concentrated on deploying neural networks on edge devices to mitigate data transmission costs. However, a significant disparity exists between the hardware capabilities of edge devices and the computational power requirements of deep neural networks. In this work, we propose a memristive neuromorphic network with a reconfigurable spike transformer circuit (SpikeMT). It offers a low-power intelligent solution for edge devices and facilitates the deployment of multipurpose neuromorphic networks. A reconfigurable memristive spiking neuron is specifically designed for spiking transformers, enhancing the biological characteristics of artificial neuronal circuits. Furthermore, the memristive neurons serve as the core operators in the proposed memristive spiking self-attention circuits. They enable the integration of the self-attention mechanism within a spike-driven paradigm without the need for analog-to-digital converters. Notably, heterogeneous data can be processed at the edge without altering the hardware architecture of SpikeMT. Experimental results demonstrate that the proposed SpikeMT is advantageous in computational efficiency and supports vision tasks, promoting the development of edge AI devices.
- Research Article
- 10.48084/etasr.14770
- Dec 8, 2025
- Engineering, Technology & Applied Science Research
- Mohamed Lamine Hamidatou + 5 more
Active contours, or snakes, are widely used in medical image segmentation due to their ability to accurately delineate object boundaries. The Gradient Vector Flow (GVF) model enhances traditional snakes by improving convergence and effectively capturing concave shapes. This paper presents a hardware implementation of the GVF algorithm on a PYNQ-Z2 FPGA using a modular architecture designed to optimize computation and parallelism. The algorithm was first developed and validated in MATLAB to evaluate its accuracy and stability on medical images. This step enabled fine-tuning of key GVF parameters, such as the regularization coefficient and the number of iterations, to ensure reliable convergence and precise contour segmentation. It was then implemented in Vivado HLS and translated into an optimized hardware architecture that leverages FPGA parallelism and pipelining to minimize latency and enhance performance. Experimental results demonstrate real-time operation, high segmentation accuracy, and low latency, confirming the suitability of this approach for embedded medical imaging applications requiring both speed and precision.
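The GVF field is obtained by iteratively diffusing the edge-map gradient, which is what lets the snake converge into concave regions. A one-dimensional analogue of that update rule, sketched here for illustration only (the FPGA design operates on 2-D images with the full vector equation, and the parameter values below are assumptions):

```python
def gvf_1d(f_grad, mu=0.2, dt=0.5, n_iters=200):
    """One-dimensional analogue of gradient vector flow.

    Diffuses an edge-map gradient f_grad so that its influence extends
    into homogeneous regions: v_t = mu * laplacian(v) - (v - f) * f^2.
    Illustrative sketch; mu, dt, and n_iters are assumed values.
    """
    v = list(f_grad)
    n = len(v)
    for _ in range(n_iters):
        new_v = v[:]
        for i in range(1, n - 1):
            laplacian = v[i - 1] - 2 * v[i] + v[i + 1]
            data_term = (v[i] - f_grad[i]) * f_grad[i] ** 2
            new_v[i] = v[i] + dt * (mu * laplacian - data_term)
        v = new_v
    return v

# A single sharp edge response at the centre of a flat profile.
f = [0.0] * 21
f[10] = 1.0
v = gvf_1d(f)
```

After diffusion, the field is nonzero well away from the edge, so a distant contour still feels a pull toward it; the regularization coefficient `mu` plays the role of the coefficient tuned in the MATLAB validation step.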
- Research Article
- 10.1145/3779417
- Dec 6, 2025
- ACM Computing Surveys
- Weibang Dai + 6 more
For decades, memory-based computation has been overshadowed by processor-centric paradigms. However, memory-based computation offers distinct advantages, including high-speed operation and energy efficiency. As a representative and powerful type of memory-based computation, lookup table (LUT)-based computing has seen a resurgence in interest. Recent advancements in memory technologies, particularly cost reduction in memories and the rise of emerging non-volatile memories (NVMs), have spurred widespread adoption of LUT-based approaches. In this paper, we first trace the historical evolution of LUT-based computation, then systematically analyze its modern applications across two domains: (1) software implementations, including LUT-based function evaluation and LUT-based neural networks; and (2) hardware architectures, such as LUT in FPGA and LUT-based processing-in-memory (PIM) systems. Finally, we discuss how NVMs could unlock new opportunities for next-generation LUT-based computing.
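LUT-based function evaluation, the first software application surveyed, amounts to precomputing a function into memory and interpolating at query time. A toy sketch for sin(x) (real PIM and FPGA designs use fixed-point indices and hardware-resident tables; the table size here is arbitrary):

```python
import math

def build_sine_lut(n_entries=256):
    """Precompute sin(x) on [0, 2*pi) -- the 'memory' of a LUT evaluator."""
    return [math.sin(2 * math.pi * i / n_entries) for i in range(n_entries)]

def lut_sin(x, table):
    """Evaluate sin(x) by table lookup with linear interpolation.

    Toy illustration of LUT-based function evaluation: one memory read
    per neighbouring entry plus a cheap blend, no transcendental unit.
    """
    n = len(table)
    pos = (x % (2 * math.pi)) / (2 * math.pi) * n
    i = int(pos)
    frac = pos - i
    return table[i] * (1 - frac) + table[(i + 1) % n] * frac

table = build_sine_lut()
approx = lut_sin(1.0, table)
```

With 256 entries and linear interpolation, the error stays well below 1e-3, illustrating the accuracy-versus-memory trade-off that drives the LUT sizing decisions discussed in the survey.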
- Research Article
- 10.1021/jacs.5c15078
- Dec 5, 2025
- Journal of the American Chemical Society
- Wenbin Li + 13 more
Inspired by the unidirectional convergence of water in a funnel, we propose a vertical diode architecture that integrates field-effect gating, rectification, and memory functionalities. This design enables facile resistive switching of functional molecular monolayers, bidirectional electric field modulation, and low-voltage nondestructive readout. The core of the funnel diode consists of a 3.6 nm thick nanopore-decorated dielectric layer and a 10-30 nm thick unipolar organic semiconductor (e.g., pentacene), sandwiched between two parallel-plate Ohmic electrodes. When majority carriers are injected from the semiconductor side, the current exceeds that from the opposite direction by a rectifying ratio exceeding 10^4, due to the gating effect. By inserting molecular monolayers of Cu(II) stearate or Au25-clusters between the dielectric and pentacene layers, the funnel diode demonstrates both nonvolatile memory performance and synaptic-mimic behavior under ultralow read voltages (e.g., ±0.05 V). This funnel diode architecture fully inherits the advantages of field-effect transistors while circumventing their intrinsic limitations. Moreover, it offers a promising strategy to translate molecular or atomic-scale state changes in two-dimensional materials into resistance changes, paving the way for next-generation hardware architectures in the post-Moore era.
- Research Article
- 10.1103/985g-58gd
- Dec 4, 2025
- PRX Quantum
- Shouzhen Gu + 3 more
Erasure qubits offer a promising avenue toward reducing the overhead of quantum error correction (QEC) protocols. However, they require additional operations, such as erasure checks, that may add extra noise and increase the run-time of QEC protocols. To assess the benefits provided by erasure qubits, we focus on the performance of the surface code as a quantum memory. In particular, we analyze various erasure check schedules, find the correctable regions in the phase space of error parameters, and probe the subthreshold scaling of the logical error rate. We then consider a realization of erasure qubits in superconducting hardware architectures via dual-rail qubits. We use the standard transmon-based implementation of the surface code as the performance benchmark. Our results indicate that QEC protocols with erasure qubits can outperform those with state-of-the-art transmons, even in the absence of precise information about the locations of erasure errors.
- Research Article
- 10.21272/jes.2025.12(2).e4
- Dec 3, 2025
- Journal of Engineering Sciences
- Nagasubhadra D Uppalapati + 2 more
Satellite image denoising is essential for preserving image quality in remote sensing applications, where impulse noise significantly degrades captured data. To address this challenge, this paper proposes an ultra-fast parallelized modified decision-based median filter (PMDBMF). It effectively removes impulse noise while preserving structural details. The proposed approach leverages fixed parallelization to achieve superior noise reduction with minimal computational overhead. Compared to the decision-based median filter (DBMF), the proposed PMDBMF approach achieves an overall improvement of approximately 13%. This result demonstrates the efficiency of PMDBMF in delivering high-quality noise removal while significantly reducing processing time, making it a promising solution for real-time satellite image processing. Additionally, the PMDBMF maintains fine image details while effectively suppressing impulse noise, ensuring superior structural integrity compared to traditional median-based approaches. Its fixed parallelization strategy enhances scalability across various hardware architectures, enabling real-time deployment in resource-constrained environments. This efficiency is highly significant for research-driven domains such as environmental monitoring, disaster assessment, and geospatial analysis, where rapid and reliable image restoration is essential. Experimental analysis confirmed that the proposed PMDBMF framework achieves superior structural integrity, with robust edge and texture preservation, and enhanced noise suppression, as evidenced by notable improvements in the peak signal-to-noise ratio (PSNR), root mean square error (RMSE), structural similarity index metric (SSIM), and computational complexity metrics.
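A decision-based median filter replaces only pixels flagged as impulses, which is what preserves fine detail relative to a plain median filter. A simplified serial sketch of that decision step (the proposed PMDBMF additionally applies a fixed parallelization scheme not reproduced here; the impulse values are the usual salt-and-pepper extremes):

```python
def decision_based_median(img, noise_vals=(0, 255)):
    """Remove salt-and-pepper impulses from a 2-D grayscale image.

    Simplified serial sketch of a decision-based median filter: pixels
    flagged as impulses are replaced by the median of their non-impulse
    3x3 neighbours; clean pixels pass through untouched.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x] not in noise_vals:
                continue  # decision step: keep clean pixels as-is
            neigh = [img[j][i]
                     for j in range(max(0, y - 1), min(h, y + 2))
                     for i in range(max(0, x - 1), min(w, x + 2))
                     if img[j][i] not in noise_vals]
            if neigh:
                neigh.sort()
                out[y][x] = neigh[len(neigh) // 2]
    return out

# A flat patch of value 100 with one salt impulse in the middle.
img = [[100] * 5 for _ in range(5)]
img[2][2] = 255
clean = decision_based_median(img)
```

Because the two pixel loops are independent, the image can be partitioned into fixed tiles processed concurrently, which is the essence of the fixed-parallelization strategy described above.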
- Research Article
- 10.1088/1748-0221/20/12/c12029
- Dec 1, 2025
- Journal of Instrumentation
- E Fröjdh + 31 more
The data rates of hybrid pixel detectors are rapidly increasing, with next-generation systems moving from 10 Gbit/s to 100 Gbit/s readout. For Matterhorn, a new single-photon counting detector under development at PSI, a 16-megapixel configuration would generate data rates of up to 3.2 Tbit/s (400 GB/s). To extract maximum information from the data and potentially apply data reduction there is a need for efficient and flexible software tools. These high data rates are not only a challenge for beamline operation but also complicate laboratory testing. Aare is an open-source library designed to help scientists analyze terabyte-scale datasets from hybrid pixel detectors. It features, for example, cluster finding, interpolation, and detector calibration. The core is implemented in C++ for performance, while low-overhead Python bindings are provided for ease of use. The code is multi-threaded, capitalizing on the parallelizable nature of pixel and frame processing. Development plans include support for heterogeneous hardware architectures (GPU/FPGA) to further enhance performance.
- Research Article
- 10.1109/tcsi.2025.3583920
- Dec 1, 2025
- IEEE Transactions on Circuits and Systems I: Regular Papers
- Aditi Paul + 2 more
Continuous Flow 4096-Point FFT/IFFT Hardware Architecture for 5G Applications
- Research Article
- 10.1088/2634-4386/ae24a5
- Dec 1, 2025
- Neuromorphic Computing and Engineering
- Mireya Zapata + 2 more
Replicating the operation of biological neurons using electronic hardware is of significant interest for engineering and biomedical applications. Spiking neural network (SNN) models are especially suited as they exhibit temporal dynamics and local synaptic plasticity, closely mimicking biological neural function. Biological interaction, real-time response, and the ability to explore and deploy multiple neural models also become necessary. In this work, the Hardware Emulator of Evolving Neural Spiking Systems (HEENS), an efficient, fully digital architecture intended for real-time execution of SNNs, is reported. Based on Single Instruction Multiple Data (SIMD) computation, an array of simple but programmable processing elements is controlled by a sequencer dispatching common instructions. Local distributed memory avoids data bottlenecks and enables parallel parameter updates and interconnect reconfiguration. The address-encoded spikes are decoded by local associative memories, which can be modified on the fly, thus supporting evolvable networks. A synchronous ring topology based on fast point-to-point serial links enables multi-node systems with minimal latency and excellent scalability. A control node configures the ring nodes, drives system execution, and monitors the processed data. The hardware is supported by a user-friendly custom toolset that performs simple and fast compilation of neural/synaptic algorithms and network topology on a host computer. The results of a field-programmable gate array (FPGA) implementation are reported. Multimodel real-time execution of proof-of-concept networks demonstrates the potential of the proposed architecture.
- Research Article
- 10.54097/p5hqsj19
- Nov 27, 2025
- Academic Journal of Science and Technology
- Gary Tan + 1 more
In an era of rapid growth in unmanned aerial vehicles (UAVs) and robotic systems, understanding the fundamental hardware and software architecture of quadcopters is crucial for both academic research and practical applications. This paper provides a comprehensive overview of the main components of a quadcopter, addressing this demand as the use of quadcopters, and robotic systems in general, continues to grow rapidly. It synthesizes information from numerous sources into a top-level survey of the subject. The review systematically introduces the core modules and elucidates their roles in achieving stable flight control. Furthermore, it delves into algorithms such as Kalman filtering and PID control to illustrate how data fusion and feedback mechanisms improve accuracy, response speed, and stability. This paper not only summarizes the structural and functional design of quadcopters but also provides a theoretical foundation for further optimization of UAV autonomy, sensor fusion, and intelligent flight control.
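The PID feedback law discussed in such reviews can be sketched in a few lines. The gains, time step, and toy first-order plant below are illustrative assumptions, not values from any real quadcopter controller:

```python
class PID:
    """Discrete PID controller in textbook form.

    Minimal sketch of the feedback law: output is a weighted sum of the
    error, its running integral, and its finite-difference derivative.
    Gains here are arbitrary illustrative values.
    """
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Drive a simple first-order plant (dy/dt = u - y) toward setpoint 1.0.
pid = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.01)
y = 0.0
for _ in range(3000):
    u = pid.update(1.0, y)
    y += (u - y) * 0.01  # Euler step of the toy plant
```

The integral term removes steady-state error, while the derivative term damps overshoot; in a real quadcopter one such loop runs per controlled axis, fed by the Kalman-filtered attitude estimate.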