  • Research Article
  • 10.1109/jetcas.2025.3619017
Opto-Aligner: Optical Near-Sensor Architecture for Accelerating DNA Pre-Alignment Filtering
  • Mar 1, 2026
  • IEEE Journal on Emerging and Selected Topics in Circuits and Systems
  • Deniz Najafi + 10 more

Sequence alignment, a cornerstone application in bioinformatics, is critical for enabling personalized medicine and disease diagnostics. However, the rapid growth of genomic data has led to significant computational challenges, including limited throughput, high latency, and excessive data movement in current sequencing solutions. To address these issues, we propose Opto-Aligner, a high-performance and energy-efficient optical near-sensor accelerator framework tailored for multiple genetic tasks, chiefly DNA/RNA pre-alignment filtering in hyperdimensional space. Opto-Aligner harnesses the efficiency of silicon photonics and the robustness of hyperdimensional computing (HDC) to accelerate genome sequence alignment directly at the sensor level. We develop innovative microarchitectural and circuit-level solutions, including specialized hardware partitioning and mapping strategies, to overcome challenges inherent in photonic computing. Our cross-layer design accounts for photonic device variability and noise, optimizing HDC algorithms for optical hardware constraints. Opto-Aligner significantly improves throughput and energy efficiency over leading electronic DNA aligners. Relative to the best published electronic aligner (BioHD-HAM), Opto-Aligner delivers a 5.7× higher single-die throughput (0.93 Mbs⁻¹ vs. 0.163 Mbs⁻¹) and a 3.0×10⁵-fold reduction in energy–delay product, all with sub-nanosecond comparator latency and seamless scaling to multi-bit precision. Opto-Aligner effectively bridges the gap between the computational demands of genome alignment and the limitations of optical hardware.

  • Research Article
  • 10.1109/jetcas.2025.3640639
520-Gbit/s Optical-Terahertz Integration System for Cable-Free Data Centers Utilizing Adaptive Bayes-Adam MIMO Nonlinear Equalizer
  • Mar 1, 2026
  • IEEE Journal on Emerging and Selected Topics in Circuits and Systems
  • Zhigang Xin + 13 more

To meet the growing demands of next-generation data centers for ultra-high capacity, low latency, and flexible deployment, optical-wireless integrated systems combine the high bandwidth of fiber with the agility of wireless links, enabling scalable and high-speed interconnect solutions. Polarization division multiplexing (PDM) single-input single-output (SISO) links have a simplified architecture design and can double the system capacity through the polarization dimension, but they also introduce more crosstalk and impairments to signal recovery. In this paper, an adaptive Bayes-Adam (ABA) multiple-input multiple-output (MIMO) Volterra nonlinear equalizer (VNE) with high precision and fast convergence is proposed, which better compensates for crosstalk and impairments in dual-polarization (DP) SISO terahertz (THz) transmission systems, including I/Q imbalance, polarization crosstalk, and nonlinear distortion. We experimentally demonstrate a DP-SISO fiber-THz integrated system, which adopts an easily integrable zero-intermediate frequency (Zero-IF) receiver front-end scheme to reduce bandwidth requirements and efficiently receive high baud rate signals. By employing the novel equalization algorithm, we successfully achieve up to 520 Gbit/s single-lane single-carrier wireless air interface rate (WAIR/ch/λ) at 220 GHz with a 20% soft-decision forward error correction (SD-FEC) threshold. This work provides an effective signal processing solution for enabling high-speed, scalable fiber-THz interconnects in future data center networks.

  • Open Access
  • Research Article
  • 10.1109/jetcas.2026.3661247
Algorithm-Hardware Implications of Softmax Approximations for In-Memory Computing based LLM Accelerators
  • Jan 1, 2026
  • IEEE Journal on Emerging and Selected Topics in Circuits and Systems
  • Jan Finkbeiner + 5 more

  • Research Article
  • 10.1109/jetcas.2026.3668819
Energy-Efficient Epileptic Seizure Prediction Using RRAM-Based In-Memory Computing
  • Jan 1, 2026
  • IEEE Journal on Emerging and Selected Topics in Circuits and Systems
  • Ahmedul Khan + 3 more

This work investigates the feasibility of an energy-efficient resistive random-access memory (RRAM) crossbar array framework for epileptic seizure prediction using the CHB-MIT electroencephalogram (EEG) dataset. Traditional von Neumann architectures face limitations in scalability and energy efficiency for real-time medical applications, motivating the exploration of in-memory computing with RRAM devices. We develop a domain-specific feature extraction methodology tailored to EEG signals and implement a seizure prediction algorithm that can be mapped directly onto a crossbar-based architecture. To evaluate the robustness of the approach, the extracted features were quantized to a 1-bit representation and processed as inputs. Despite the aggressive quantization, the proposed workflow achieved mean prediction accuracies exceeding 75% for hardware inference, demonstrating resilience to reduced precision. Furthermore, the system exhibits extremely low read energy consumption at the picojoule level, enabling performance metrics far surpassing conventional digital platforms. The combination of high accuracy, ultra-low power requirements, and hardware-friendly implementation highlights the promise of RRAM-based in-memory computing as a scalable solution for real-time, patient-specific epilepsy monitoring and intervention in wearable and implantable devices.

  • Research Article
  • 10.1109/jetcas.2026.3660129
A Lossless, Reconfigurable FP8 Compute-in-Memory Accelerator with Domino Logic-Based In-Memory Multiplication and Sign-Group Aggregation for Transformers
  • Jan 1, 2026
  • IEEE Journal on Emerging and Selected Topics in Circuits and Systems
  • Dongrui Li + 2 more

Efficient on-device execution of transformer-based models (TBMs) requires specialized hardware acceleration. Yet, stringent memory and power constraints hinder the deployment of TBMs on edge platforms, necessitating aggressively quantized models to maintain energy efficiency. FP8 data types (E5M2/E4M3) provide a compelling 8-bit quantization alternative to FP16/BF16/FP32, enabling more efficient MAC computations. However, most existing FP CIM accelerators rely on serialized pre-alignment, incurring significant accuracy degradation and energy overhead. To overcome these limitations, we propose a novel FP8 CIM accelerator that integrates 1) bit-parallel, reconfigurable, Domino-logic-based FP8 multipliers with 3.2× improved energy-delay-product to support dual FP8 formats, 2) post-alignment, reconfigurable sign-group aggregation units to enable efficient and precision-preserving computing, and 3) a lossless mantissa mapping scheme to facilitate seamless higher/mixed-precision MAC operations. Fabricated in 40 nm CMOS, our test chip achieves macro-level energy efficiencies of 42.13 TFLOPS/W for E5M2 and 38.23 TFLOPS/W for E4M3. Moreover, it delivers a system energy efficiency of 9.78 TFLOPS/W and a system compute density of 0.047 TOPS/mm², which are 4.3× and up to 1.42× higher, respectively, than those of the state-of-the-art TBM-oriented CIM accelerator. These results demonstrate that the proposed CIM circuitry provides a highly energy-efficient solution for on-chip transformer acceleration while maintaining computational accuracy.

  • Research Article
  • 10.1109/jetcas.2026.3667611
NMPHDC: A 12 nm Reconfigurable Multi-port SRAM based Near-Memory Hyperdimensional Computing Architecture
  • Jan 1, 2026
  • IEEE Journal on Emerging and Selected Topics in Circuits and Systems
  • Md Rubel Sarkar + 6 more

Hyperdimensional Computing (HDC) is a brain-inspired computing paradigm that represents information using high-dimensional vectors, referred to as hypervectors (HVs), enabling inherent robustness to noise and efficient learning on resource-constrained platforms. In this work, we present NMPHDC, a reconfigurable multi-port (MP) SRAM-based logic-near-memory (LnM) HDC architecture that supports on-chip training and inference. The proposed design is configurable across multiple HV dimensions, including 256, 512, and 1024 bits, enabling flexibility across diverse workloads. A lightweight custom LnM block enables LnM execution of key HDC operations, including XOR for binding and majority (MAJ) for bundling, significantly reducing data movement overhead. In addition, we propose a custom on-chip training algorithm and integrate a dedicated on-chip random number generator (RNG) to strengthen training security through locally generated HVs. A centralized HDC controller orchestrates dataflow, computation, training, and testing with seamless configurability. Implemented in GlobalFoundries (GF) 12 nm low-power (LP) FinFET technology, NMPHDC occupies 1.25 mm², consumes ∼0.72 mW in standby and ∼19.7 mW at 100 MHz, and achieves classification accuracies of 93.33%, 25.75%, 72.62%, and 75.18% on the ECG5000, CIFAR-10, MNIST, and ISOLET datasets, respectively.
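For readers unfamiliar with the two HDC operations named in this abstract, the following is a minimal software sketch of XOR binding and majority (MAJ) bundling on binary hypervectors. It is not the NMPHDC hardware or its training algorithm. The function names, the fixed seed, and the tie-breaking rule are assumptions made for illustration only; the 1024-bit dimension is one of the sizes the abstract lists.

```python
import random

DIM = 1024  # one of the HV dimensions listed in the abstract


def random_hv(rng, dim=DIM):
    """Return a random binary hypervector as a list of 0/1 bits."""
    return [rng.getrandbits(1) for _ in range(dim)]


def bind(a, b):
    """Binding: element-wise XOR of two hypervectors."""
    return [x ^ y for x, y in zip(a, b)]


def bundle(hvs):
    """Bundling: bit-wise majority vote over a list of hypervectors.

    Ties (possible only for an even count) go to 1; that rule is an
    assumption of this sketch, not something the abstract specifies.
    """
    n = len(hvs)
    return [1 if 2 * sum(bits) >= n else 0 for bits in zip(*hvs)]


def hamming(a, b):
    """Number of bit positions in which two hypervectors differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# XOR binding is its own inverse: binding again with the key recovers the value.
key, value = random_hv(rng), random_hv(rng)
recovered = bind(bind(key, value), key)

# A bundle stays closer to its members than to an unrelated hypervector.
a, b, c = random_hv(rng), random_hv(rng), random_hv(rng)
prototype = bundle([a, b, c])
unrelated = random_hv(rng)
```

With three bundled members, each prototype bit disagrees with a given member with probability 1/4, so the expected Hamming distance to a member is about DIM/4, against about DIM/2 for an unrelated hypervector. That gap is what similarity-based HDC classification relies on.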

  • Research Article
  • 10.1109/jetcas.2026.3667550
Performance Analysis and Optimization of Fructose Memristor-Based Neuromorphic Systems
  • Jan 1, 2026
  • IEEE Journal on Emerging and Selected Topics in Circuits and Systems
  • Harshvardhan Uppaluru + 4 more

  • Research Article
  • 10.1109/jetcas.2026.3690010
Near Real-Time Spike-Driven Synaptic Plasticity for CTT-Based Neuromorphic Systems
  • Jan 1, 2026
  • IEEE Journal on Emerging and Selected Topics in Circuits and Systems
  • Amirtha Chandrasekaran + 2 more

  • Research Article
  • 10.1109/jetcas.2026.3678317
IGA-SRAM: A Compact SRAM-based IMC Engine for Binary/Ternary Deep Neural Networks with ADC-Less and Error-Aware Training Algorithm
  • Jan 1, 2026
  • IEEE Journal on Emerging and Selected Topics in Circuits and Systems
  • Shin-Uk Kang + 4 more

This work presents a compact 6T SRAM-based analog in-memory computing (A-IMC) macro for energy-efficient inference of binary and ternary neural networks (BNNs/TNNs). While conventional A-IMC architectures suffer from significant area and power overheads due to high-resolution analog-to-digital converters (ADCs), we propose an ADC-less computing scheme based on 1-bit sense amplifiers (SAs), supported by a novel training algorithm that mitigates quantization-induced performance degradation. To maintain hardware simplicity, we adopt a compact 6T bitcell structure and introduce dual wordlines (WLs) for efficient ternary input encoding. We further propose impulse Gaussian approximation (IGA), a differentiable quantization surrogate that enables stable and accurate gradient backpropagation in the presence of 1-bit partial sum quantization. Additionally, we introduce an error-aware training (EAT) method that leverages measured partial-sum (PS) error statistics from silicon to inject gradually increasing noise in the training phase, effectively compensating for hardware-induced variations. Fabricated in a 65 nm CMOS GP process, the proposed 128 × 128 A-IMC macro achieves 47.06 TOPS/mm² and CIFAR-10 accuracy of 86.89% under 1.2/0.7 V operating voltages with ResNet20.

  • Research Article
  • 10.1109/jetcas.2026.3657823
A Coalesced Tensor Reduction Architecture for Scalable All-Bank PIM Execution
  • Jan 1, 2026
  • IEEE Journal on Emerging and Selected Topics in Circuits and Systems
  • Taehyung Park + 2 more

The embedding layer in deep learning recommendation models (DLRM) is highly memory-bound and exhibits skewed, irregular access patterns. These characteristics lead to severe load imbalance and performance bottlenecks in processing in memory (PIM) architectures. We propose TRAM (Two-level Reduction Accelerator for Memory), a heterogeneous accelerator that integrates High Bandwidth Memory based PIM architecture (HBM-PIM) with conventional dual in-line memory modules (DIMMs) to accelerate batched embedding vector reductions. TRAM reduces redundant hot-vector accesses and employs a host-side scheduling mechanism that overlaps bank-PIM operations inside DRAM banks with logic-PIM operations, where processing units are located in the buffer die. This overlap eliminates command-bandwidth stalls and compute-bound delays. In addition, metadata-aware optimizations reduce row/column access overhead by reusing contiguous address patterns within each bank. Evaluation on six recommendation datasets and three embedding dimensions demonstrates that TRAM achieves up to 2.8× speedup and 3.0× energy reduction compared to state-of-the-art heterogeneous memory systems, while preserving full compatibility with the standard DRAM interface.