- Research Article
- 10.1109/jetcas.2025.3619017
- Mar 1, 2026
- IEEE Journal on Emerging and Selected Topics in Circuits and Systems
- Deniz Najafi + 10 more
Sequence alignment, a cornerstone application in bioinformatics, is critical for enabling personalized medicine and disease diagnostics. However, the rapid growth of genomic data has led to significant computational challenges, including limited throughput, high latency, and excessive data movement in current sequencing solutions. To address these issues, we propose Opto-Aligner, a high-performance and energy-efficient optical near-sensor accelerator framework tailored for multiple genetic tasks, chiefly DNA/RNA pre-alignment filtering in hyperdimensional space. Opto-Aligner combines the efficiency of silicon photonics with the robustness of hyperdimensional computing (HDC) to accelerate genome sequence alignment directly at the sensor level. We develop microarchitectural and circuit-level solutions, including specialized hardware partitioning and mapping strategies, to overcome challenges inherent in photonic computing. Our cross-layer design accounts for photonic device variability and noise, optimizing HDC algorithms for optical hardware constraints. Relative to the best published electronic aligner (BioHD-HAM), Opto-Aligner delivers a 5.7× higher single-die throughput (0.93 Mb s⁻¹ vs. 0.163 Mb s⁻¹) and a 3.0×10⁵-fold reduction in energy-delay product, all with sub-nanosecond comparator latency and seamless scaling to multi-bit precision. Opto-Aligner thus bridges the gap between the computational demands of genome alignment and the limitations of optical hardware.
- Research Article
- 10.1109/jetcas.2025.3640639
- Mar 1, 2026
- IEEE Journal on Emerging and Selected Topics in Circuits and Systems
- Zhigang Xin + 13 more
To meet the growing demands of next-generation data centers for ultra-high capacity, low latency, and flexible deployment, optical-wireless integrated systems combine the high bandwidth of fiber with the agility of wireless links, enabling scalable and high-speed interconnect solutions. Polarization division multiplexing (PDM) single-input single-output (SISO) links have a simplified architecture design and can double the system capacity through the polarization dimension, but they also introduce more crosstalk and impairments to signal recovery. In this paper, an adaptive Bayes-Adam (ABA) multiple-input multiple-output (MIMO) Volterra nonlinear equalizer (VNE) with high precision and fast convergence is proposed, which better compensates for crosstalk and impairments in dual-polarization (DP) SISO terahertz (THz) transmission systems, including I/Q imbalance, polarization crosstalk, and nonlinear distortion. We experimentally demonstrate a DP-SISO fiber-THz integrated system, which adopts an easily integrable zero-intermediate frequency (Zero-IF) receiver front-end scheme to reduce bandwidth requirements and efficiently receive high baud rate signals. By employing the novel equalization algorithm, we successfully achieve up to 520 Gbit/s single-lane single-carrier wireless air interface rate (WAIR/ch/λ) at 220 GHz with a 20% soft-decision forward error correction (SD-FEC) threshold. This work provides an effective signal processing solution for enabling high-speed, scalable fiber-THz interconnects in future data center networks.
- Research Article
- 10.1109/jetcas.2026.3661247
- Jan 1, 2026
- IEEE Journal on Emerging and Selected Topics in Circuits and Systems
- Jan Finkbeiner + 5 more
- Research Article
- 10.1109/jetcas.2026.3668819
- Jan 1, 2026
- IEEE Journal on Emerging and Selected Topics in Circuits and Systems
- Ahmedul Khan + 3 more
This work investigates the feasibility of an energy-efficient resistive random-access memory (RRAM) crossbar array framework for epileptic seizure prediction using the CHB-MIT electroencephalogram (EEG) dataset. Traditional von Neumann architectures face limitations in scalability and energy efficiency for real-time medical applications, motivating the exploration of in-memory computing with RRAM devices. We develop a domain-specific feature extraction methodology tailored to EEG signals and implement a seizure prediction algorithm that can be mapped directly onto a crossbar-based architecture. To evaluate the robustness of the approach, the extracted features were quantized to a 1-bit representation and processed as inputs. Despite the aggressive quantization, the proposed workflow achieved mean prediction accuracies exceeding 75% for hardware inference, demonstrating resilience to reduced precision. Furthermore, the system exhibits extremely low read energy consumption at the picojoule level, enabling performance metrics far surpassing conventional digital platforms. The combination of high accuracy, ultra-low power requirements, and hardware-friendly implementation highlights the promise of RRAM-based in-memory computing as a scalable solution for real-time, patient-specific epilepsy monitoring and intervention in wearable and implantable devices.
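The 1-bit input path described in this abstract can be illustrated with a minimal sketch. It assumes an ideal crossbar in which each column current is the sum of read voltage times conductance over the active rows. The thresholds, conductance values, read voltage, and function names are all hypothetical and are not taken from the paper.

```python
def binarize(features, thresholds):
    """Quantize each feature to 1 bit: 1 if above its threshold, else 0."""
    return [1 if f > t else 0 for f, t in zip(features, thresholds)]


def crossbar_mac(bits, conductances, v_read=0.2):
    """Ideal crossbar read: column current I_j = sum_i (bit_i * v_read * G_ij)."""
    n_cols = len(conductances[0])
    return [
        sum(b * v_read * row[j] for b, row in zip(bits, conductances))
        for j in range(n_cols)
    ]


# Hypothetical example: 3 features, 2 output columns (conductances in siemens).
features = [0.8, 0.1, 0.5]
thresholds = [0.5, 0.5, 0.4]          # assumed per-feature thresholds
conductances = [
    [1e-6, 5e-6],
    [5e-6, 1e-6],
    [1e-6, 5e-6],
]

bits = binarize(features, thresholds)
currents = crossbar_mac(bits, conductances)
predicted = max(range(len(currents)), key=lambda j: currents[j])
```

Here the class decision is simply the column with the largest current; a real design would also have to model device variability and the sensing circuitry, which this sketch ignores.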
- Research Article
- 10.1109/jetcas.2026.3660129
- Jan 1, 2026
- IEEE Journal on Emerging and Selected Topics in Circuits and Systems
- Dongrui Li + 2 more
Efficient on-device execution of transformer-based models (TBMs) requires specialized hardware acceleration. Yet, stringent memory and power constraints hinder the deployment of TBMs on edge platforms, necessitating aggressively quantized models to maintain energy efficiency. FP8 data types (E5M2/E4M3) provide a compelling 8-bit quantization alternative to FP16/BF16/FP32, enabling more efficient MAC computations. However, most existing FP CIM accelerators rely on serialized pre-alignment, incurring significant accuracy degradation and energy overhead. To overcome these limitations, we propose a novel FP8 CIM accelerator that integrates 1) bit-parallel, reconfigurable, Domino-logic-based FP8 multipliers with 3.2× improved energy-delay product to support dual FP8 formats, 2) post-alignment, reconfigurable sign-group aggregation units to enable efficient and precision-preserving computing, and 3) a lossless mantissa mapping scheme to facilitate seamless higher/mixed-precision MAC operations. Fabricated in 40 nm CMOS, our test chip achieves macro-level energy efficiencies of 42.13 TFLOPS/W for E5M2 and 38.23 TFLOPS/W for E4M3. Moreover, it delivers a system energy efficiency of 9.78 TFLOPS/W and a system compute density of 0.047 TOPS/mm², which are 4.3× and up to 1.42× higher, respectively, than those of the state-of-the-art TBM-oriented CIM accelerator. These results demonstrate that the proposed CIM circuitry provides a highly energy-efficient solution for on-chip transformer acceleration while maintaining computational accuracy.
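For readers unfamiliar with the two FP8 formats named above, the following sketch decodes a raw byte into a Python float. It assumes the commonly used conventions: E5M2 with bias 15 and IEEE-style infinities and NaNs, and E4M3 with bias 7, no infinities, and a single NaN mantissa pattern. The abstract does not state which E4M3 variant the chip implements, so that choice and the function names are assumptions.

```python
import math


def decode_fp8(byte, exp_bits, man_bits, bias, ieee_specials):
    """Decode an 8-bit float laid out as [sign | exponent | mantissa]."""
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    exp_max = (1 << exp_bits) - 1
    man_max = (1 << man_bits) - 1

    if ieee_specials and exp == exp_max:
        # IEEE-style: all-ones exponent encodes infinity (man == 0) or NaN.
        return sign * math.inf if man == 0 else math.nan
    if not ieee_specials and exp == exp_max and man == man_max:
        # Assumed E4M3 convention: only the all-ones pattern is NaN.
        return math.nan
    if exp == 0:
        # Subnormal: no implicit leading one.
        return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


def decode_e5m2(byte):
    return decode_fp8(byte, exp_bits=5, man_bits=2, bias=15, ieee_specials=True)


def decode_e4m3(byte):
    return decode_fp8(byte, exp_bits=4, man_bits=3, bias=7, ieee_specials=False)
```

Under these assumptions, E5M2 trades mantissa precision for dynamic range (largest finite value 57344), while E4M3 offers one more mantissa bit with a largest finite value of 448.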
- Research Article
- 10.1109/jetcas.2026.3667611
- Jan 1, 2026
- IEEE Journal on Emerging and Selected Topics in Circuits and Systems
- Md Rubel Sarkar + 6 more
Hyperdimensional Computing (HDC) is a brain-inspired computing paradigm that represents information using high-dimensional vectors, referred to as hypervectors (HVs), enabling inherent robustness to noise and efficient learning on resource-constrained platforms. In this work, we present NMPHDC, a reconfigurable multi-port (MP) SRAM-based logic-near-memory (LnM) HDC architecture that supports on-chip training and inference. The proposed design is configurable across multiple HV dimensions, including 256, 512, and 1024 bits, enabling flexibility across diverse workloads. A lightweight custom LnM block enables LnM execution of key HDC operations, including XOR for binding and majority (MAJ) for bundling, significantly reducing data movement overhead. In addition, we propose a custom on-chip training algorithm and integrate a dedicated on-chip random number generator (RNG) to strengthen training security through locally generated HVs. A centralized HDC controller orchestrates dataflow, computation, training, and testing with seamless configurability. Implemented in GlobalFoundries (GF) 12nm low-power (LP) FinFET technology, NMPHDC occupies 1.25 mm², consumes ∼0.72 mW in standby and ∼19.7 mW at 100 MHz, and achieves classification accuracies of 93.33%, 25.75%, 72.62%, and 75.18% on the ECG5000, CIFAR-10, MNIST, and ISOLET datasets, respectively.
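The two HDC operations named in this abstract, XOR binding and majority bundling, can be sketched in software as follows. This is a generic illustration on binary hypervectors, not the NMPHDC training algorithm; the dimension of 1024 matches one of the configurations listed above, while the seed, tie-breaking rule, and function names are assumptions.

```python
import random


def random_hv(dim, rng):
    """Return a random binary hypervector as a list of 0/1 bits."""
    return [rng.randint(0, 1) for _ in range(dim)]


def bind(a, b):
    """Bind two hypervectors with element-wise XOR."""
    return [x ^ y for x, y in zip(a, b)]


def bundle(hvs):
    """Bundle hypervectors with a bit-wise majority vote (ties resolve to 0)."""
    n = len(hvs)
    return [1 if 2 * sum(bits) > n else 0 for bits in zip(*hvs)]


def hamming(a, b):
    """Count the positions where two hypervectors differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(0)                # fixed seed, chosen only for repeatability
DIM = 1024
key, v1, v2, v3 = (random_hv(DIM, rng) for _ in range(4))

bound = bind(key, v1)
recovered = bind(bound, key)          # XOR is its own inverse
prototype = bundle([v1, v2, v3])
```

Binding with the same key twice recovers the original vector exactly, and the bundled prototype stays measurably closer in Hamming distance to each of its members than to an unrelated hypervector, which is the property HDC classifiers rely on.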
- Research Article
- 10.1109/jetcas.2026.3667550
- Jan 1, 2026
- IEEE Journal on Emerging and Selected Topics in Circuits and Systems
- Harshvardhan Uppaluru + 4 more
- Research Article
- 10.1109/jetcas.2026.3690010
- Jan 1, 2026
- IEEE Journal on Emerging and Selected Topics in Circuits and Systems
- Amirtha Chandrasekaran + 2 more
- Research Article
- 10.1109/jetcas.2026.3678317
- Jan 1, 2026
- IEEE Journal on Emerging and Selected Topics in Circuits and Systems
- Shin-Uk Kang + 4 more
This work presents a compact 6T SRAM-based analog in-memory computing (A-IMC) macro for energy-efficient inference of binary and ternary neural networks (BNNs/TNNs). While conventional A-IMC architectures suffer from significant area and power overheads due to high-resolution analog-to-digital converters (ADCs), we propose an ADC-less computing scheme based on 1-bit sense amplifiers (SAs), supported by a novel training algorithm that mitigates quantization-induced performance degradation. To maintain hardware simplicity, we adopt a compact 6T bitcell structure and introduce dual wordlines (WLs) for efficient ternary input encoding. We further propose impulse Gaussian approximation (IGA), a differentiable quantization surrogate that enables stable and accurate gradient backpropagation in the presence of 1-bit partial sum quantization. Additionally, we introduce an error-aware training (EAT) method that leverages measured partial-sum (PS) error statistics from silicon to inject gradually increasing noise in the training phase, effectively compensating for hardware-induced variations. Fabricated in a 65 nm CMOS GP process, the proposed 128 × 128 A-IMC macro achieves 47.06 TOPS/mm² and CIFAR10 accuracy of 86.89% under 1.2/0.7 V operating voltages with ResNet20.
- Research Article
- 10.1109/jetcas.2026.3657823
- Jan 1, 2026
- IEEE Journal on Emerging and Selected Topics in Circuits and Systems
- Taehyung Park + 2 more
The embedding layer in deep learning recommendation models (DLRM) is highly memory-bound and exhibits skewed, irregular access patterns. These characteristics lead to severe load imbalance and performance bottlenecks in processing in memory (PIM) architectures. We propose TRAM (Two-level Reduction Accelerator for Memory), a heterogeneous accelerator that integrates High Bandwidth Memory based PIM architecture (HBM-PIM) with conventional dual in-line memory modules (DIMMs) to accelerate batched embedding vector reductions. TRAM reduces redundant hot-vector accesses and employs a host-side scheduling mechanism that overlaps bank-PIM operations inside DRAM banks with logic-PIM operations, where processing units are located in the buffer die. This overlap eliminates command-bandwidth stalls and compute-bound delays. In addition, metadata-aware optimizations reduce row/column access overhead by reusing contiguous address patterns within each bank. Evaluation on six recommendation datasets and three embedding dimensions demonstrates that TRAM achieves up to 2.8× speedup and 3.0× energy reduction compared to state-of-the-art heterogeneous memory systems, while preserving full compatibility with the standard DRAM interface.
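To make the workload concrete, the sketch below performs a batched embedding reduction (sum pooling over each bag of indices) and counts how many distinct rows actually need to be fetched when repeated "hot" indices are served from a per-batch cache. This is only an illustration of why skewed access patterns create redundant fetches; it is not TRAM's scheduling or reduction mechanism, and the table, batch, and function name are hypothetical.

```python
def batched_reduce(table, batch):
    """Sum-pool the embedding rows named by each bag of indices.

    Returns the pooled vectors and the number of distinct row fetches,
    assuming each row is fetched at most once per batch (hypothetical cache).
    """
    dim = len(table[0])
    cache = {}
    fetches = 0
    pooled = []
    for bag in batch:
        acc = [0.0] * dim
        for idx in bag:
            if idx not in cache:
                cache[idx] = table[idx]   # first touch: fetch from memory
                fetches += 1
            acc = [a + r for a, r in zip(acc, cache[idx])]
        pooled.append(acc)
    return pooled, fetches


# Hypothetical 3-row table and a batch in which index 1 is "hot".
table = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
batch = [[0, 1], [1, 2], [1, 1]]

pooled, fetches = batched_reduce(table, batch)
total_lookups = sum(len(bag) for bag in batch)
```

In this toy batch, six lookups touch only three distinct rows, so half of the naive fetches are redundant.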