- Research Article
- 10.1109/jssc.2025.3609411
- May 1, 2026
- IEEE Journal of Solid-State Circuits
- Jingyi Yuan + 3 more
This article presents a wide-input-range buck converter featuring a conduction-loss-minimized zero-voltage switching (ZVS) technique. The proposed ZVS topology enables accurate ZVS operation across a wide range of input voltage ($V_{\mathrm{IN}}$) and load current ($I_{\mathrm{O}}$). By keeping the auxiliary inductor current pulse in the ZVS branch separate from the main current paths, conduction loss is minimized, thereby enhancing efficiency across the entire load range. In addition, an auxiliary capacitor reduces the voltage across the auxiliary inductor, so a small inductor is sufficient for ZVS. Furthermore, by implementing the ZVS control circuit directly in the high-voltage (HV) domain, propagation delays between HV and low-voltage (LV) domains are avoided, and thus, accurate ZVS operation is achieved. The converter was fabricated in a 0.18-$\mu$m BCD process with all power switches integrated on chip. Measurement results show that the converter achieves peak efficiencies of 91.7% and 92.5% at 42-to-5-V and 24-to-3.3-V conversion, respectively. Efficiency improvement is achieved across the entire 4-A load current range with just a 30-nH auxiliary inductor.
- Research Article
- 10.1109/jssc.2025.3604724
- May 1, 2026
- IEEE Journal of Solid-State Circuits
- Kalhan Koul + 21 more
Onyx is a system-on-chip (SoC) with a coarse-grained reconfigurable array (CGRA) for accelerating sparse and dense tensor algebra and dense image processing and machine learning (ML) applications. To support multiple inputs, multiple dimensions, and fusion in sparse applications, Onyx utilizes composable memory primitives that operate on compressed storage and streams and compute primitives that eliminate unnecessary calculations. Onyx also improves performance on dense applications with application-specialized processing elements (PEs), area-optimized memory tiles, and hybrid clock gating in the global buffer (GLB). Onyx achieves a peak energy efficiency of 756 INT16 GOPS/W, up to $565\times$ better energy-delay product (EDP) for sparse kernels versus CPUs with sparse libraries, and up to 76% and 85% lower EDP for image processing and ML, respectively, versus the state-of-the-art CGRA.
- Research Article
- 10.1109/jssc.2026.3671459
- May 1, 2026
- IEEE Journal of Solid-State Circuits
- Chenghao Zhang + 6 more
This article reports a 40-GS/s 8-bit time-interleaved (TI) time-domain (TD) gated-ring-oscillator analog-to-digital converter (GRO-ADC). An interleaving number of 32 is achieved with a single-channel 8-bit GRO-ADC operating at 1.25 GS/s, leading to a low front-end design complexity compared to recently published work. The sampling front end employs a linearity-enhanced boosted switch that supports short-time and high-common-mode-level sampling, as well as a switched class-AB output buffer for low-power driving. To further enhance the performance of the GRO-ADC, we devise several techniques: 1) the voltage-to-time converter utilizes differential sampling to enhance the swing and reduce the charging time while integrating the pulse-enabled cross-detector to improve power efficiency; 2) an adaptive-operation pulse generator (PG) eliminates the unnecessary pulse of the previous topology and minimizes the pulse generation power; and 3) the multi-channel interleaved architecture frees the GRO-ADC from single-channel calibration and reduces the sub-ADC design complexity. The TI GRO-ADC prototype is fabricated in 28-nm CMOS using dual 1- and 1.8-V power supplies, yielding a compact active area of 0.015 mm$^{2}$. With a Nyquist input, it achieves a measured SNDR of 36.1 dB and an SFDR of 49 dB at the conversion rate of 40 GS/s, corresponding to a Walden figure of merit (FoMw) of 68.6 fJ/conversion-step.
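The reported SNDR, sample rate, and FoMw can be related through the commonly used Walden definition, FoMw = P / (2^ENOB · fs), with ENOB = (SNDR − 1.76) / 6.02. The sketch below is only a back-of-the-envelope check under that assumption; the abstract does not state the power consumption, so the "implied power" it prints is a derived estimate, not a reported figure, and the function names are illustrative.

```python
def enob_from_sndr(sndr_db: float) -> float:
    """Effective number of bits from SNDR, using ENOB = (SNDR - 1.76) / 6.02."""
    return (sndr_db - 1.76) / 6.02


def implied_power_w(fom_j_per_step: float, sndr_db: float, fs_hz: float) -> float:
    """Invert FoMw = P / (2**ENOB * fs) to estimate the power implied by the figures.

    Assumes the FoM was computed from the Nyquist SNDR and the full sample rate.
    """
    return fom_j_per_step * (2.0 ** enob_from_sndr(sndr_db)) * fs_hz


# Figures taken from the abstract: 36.1 dB SNDR, 40 GS/s, 68.6 fJ/conversion-step.
enob = enob_from_sndr(36.1)
power = implied_power_w(68.6e-15, 36.1, 40e9)
print(f"ENOB ~ {enob:.2f} bits, implied power ~ {power * 1e3:.0f} mW")
```

Under these assumptions the numbers work out to roughly 5.7 effective bits and a total power on the order of 140 mW.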
- Research Article
- 10.1109/jssc.2025.3607917
- May 1, 2026
- IEEE Journal of Solid-State Circuits
- Ahmed E Abdelrahman + 5 more
The increasing demand for data throughput in modern data centers has intensified the need for high-speed, energy-efficient optical interconnects. Although traditional intensity-modulation direct detection schemes have served short-reach links well, their scalability to higher data rates is limited. Coherent detection, which leverages the amplitude, phase, and polarization of optical signals, offers significantly higher spectral efficiency but at the cost of increased power consumption due to the reliance on complex digital signal processing (DSP). This article proposes a low-complexity, energy-efficient coherent optical receiver (RX) architecture based on analog signal processing (ASP) techniques, specifically tailored for short-reach data center interconnects (DCIs). A key innovation is the integration of a wide-bandwidth (BW), monolithic analog carrier phase recovery (CPR) loop designed for quadrature phase shift keying (QPSK) modulation. This on-chip CPR loop eliminates the need for external optical phase recovery feedback loops, enhances phase-tracking capability, and simplifies system integration. Fabricated in a 28-nm CMOS process, the QPSK RX operates error-free at 24 Gb/s, with a CPR loop BW of 10–100 MHz and a frequency tracking range of 600 MHz, while maintaining an energy efficiency of 3.2 pJ/bit. These results demonstrate the potential of analog-intensive coherent RX architectures for energy-constrained, short-reach optical links.
- Research Article
- 10.1109/jssc.2026.3681987
- May 1, 2026
- IEEE Journal of Solid-State Circuits
- Research Article
- 10.1109/jssc.2026.3651452
- May 1, 2026
- IEEE Journal of Solid-State Circuits
- Zhengqi Xu + 5 more
This article introduces a hybrid Slepian beamforming receiver architecture with low power and area costs. Traditional large-scale true-time-delay (TTD) beamformers for wideband wireless communication suffer from high power consumption and high hardware costs. As an alternative, the Slepian beamforming approach reduces the number of analog-to-digital conversions (ADCs) and delays for TTD but retains the digital delay advantage of digital beamforming. The proposed architecture further improves energy efficiency by implementing charge-domain complex multiply and accumulate (MAC) inside successive approximation register (SAR) ADCs. The beamformer has eight intermediate frequency (IF) inputs, two simultaneous digital baseband outputs in phased mode, or one wideband output in TTD mode. The prototype is fabricated in 28-nm CMOS, consumes only 17 mW, and occupies 0.035 mm$^{2}$. Measurements show error vector magnitudes (EVMs) better than −30 dB for quadrature amplitude modulation (QAM)-16 and QAM-256. In tests, beam squinting errors are not observed up to a steering angle of 60°.
- Research Article
- 10.1109/jssc.2025.3604246
- May 1, 2026
- IEEE Journal of Solid-State Circuits
- Zhicheng Dong + 8 more
This article presents a 9–21-Gb/s inductor-less frequency-multiplying sub-sampling (FXSS) clock and data recovery (CDR) circuit with an embedded 1:3 demultiplexer (DEMUX) for multi-lane serial interfaces. It features a compact design with linear modeling and theoretical analysis of the FXSS architecture. An inverter-based frequency multiplier (FX) in the clock feedback path enables the ring voltage-controlled oscillator (RVCO) and transmission-gated retimers to operate at one-third rates. The primary–secondary sub-sampling phase detector (SSPD) supports full-rate triple-frequency clocks. The FX can suppress in-band noise, while a ~200-MHz loop bandwidth (BW) mitigates out-of-band RVCO noise. A duty cycle corrector (DCC) ensures fine retiming deskew. Fabricated in 65-nm CMOS technology, the proposed CDR occupies 0.0006 mm$^{2}$ and achieves a best-in-class 348.2-fs$_{\mathrm{rms}}$ integrated jitter and 0.13-pJ/bit energy efficiency at 21 Gb/s. It demonstrates a jitter tolerance (JTOL) of 0.44 UI$_{\mathrm{pp}}$ with a bit error rate (BER) of less than $10^{-12}$.
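As a rough reading of the headline numbers, energy per bit multiplied by data rate gives average power, and a 1:3 demultiplexed datapath suggests a clock at one third of the bit rate. The sketch below works through that arithmetic. Reading "one-third rate" as clock frequency = data rate / 3 is an assumption, and the helper names are illustrative rather than taken from the article.

```python
def power_from_energy_per_bit(energy_j_per_bit: float, data_rate_bps: float) -> float:
    """Average power (W) implied by an energy-per-bit figure at a given data rate."""
    return energy_j_per_bit * data_rate_bps


def sub_rate_clock_hz(data_rate_bps: float, division: int = 3) -> float:
    """Clock frequency for a 1:division sub-rate datapath (assumed: rate / division)."""
    return data_rate_bps / division


# Figures from the abstract: 0.13 pJ/bit at 21 Gb/s, over a 9-21 Gb/s range.
power_w = power_from_energy_per_bit(0.13e-12, 21e9)
clk_max_hz = sub_rate_clock_hz(21e9)
clk_min_hz = sub_rate_clock_hz(9e9)

print(f"Implied power at 21 Gb/s: {power_w * 1e3:.2f} mW")
print(f"One-third-rate clock range: {clk_min_hz / 1e9:.0f}-{clk_max_hz / 1e9:.0f} GHz")
```

Under these assumptions the CDR would draw a few milliwatts at its top rate, with the ring oscillator covering roughly 3 to 7 GHz.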
- Research Article
- 10.1109/jssc.2025.3596162
- May 1, 2026
- IEEE Journal of Solid-State Circuits
- Yu-Cheng Lin + 5 more
Compressive sensing (CS) is a technique that compresses data using fewer samples, enabling efficient data transmission in Internet-of-Things (IoT) applications. However, reconstruction of compressively sensed data is computationally demanding, especially for 2-D images. This article presents the first image CS reconstruction processor for real-time visual streaming. The chip adopts a tailored iterative thresholding algorithm to ensure robust reconstructed image quality. Energy and area efficiencies are improved through algorithm-architecture optimizations. Projection reformulation is performed to reduce both computational complexity and memory usage. The utilization of computing units in the sparse and inverse transform engines is maximized through hardware sharing. The sparsity extractor improves hardware utilization of the thresholding core by identifying non-zero coefficients. The proposed noise estimation strategy eliminates the idle cycles, reducing memory size for the buffered data by 83%. A speculation scheme for thresholding effectively reduces hardware complexity by 74%. The memory controller is designed to support simultaneous data access with a 55% reduction in memory size. Fabricated in a 40-nm CMOS technology, the chip integrates 6.4 M logic gates in an area of 4.24 mm$^{2}$. Targeting real-time streaming at 60 frames/s for VGA images, the proposed processor dissipates 26.4 mW at a clock frequency of 48 MHz from a supply voltage of 0.73 V. The chip achieves over $2000\times$ higher throughput and six orders of magnitude better energy efficiency than a high-end CPU. Compared to prior works for 1-D compressively sensed physiological signals, this work achieves at least $27.2\times$ higher energy efficiency and $19.6\times$ higher area efficiency.
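The real-time target can be put in per-pixel terms with a little arithmetic. The sketch below assumes VGA means 640 × 480 pixels (the abstract does not give the resolution explicitly) and divides the stated clock frequency and power by the resulting pixel rate. The derived values are estimates for orientation, not figures reported in the article.

```python
VGA_WIDTH, VGA_HEIGHT = 640, 480  # assumed VGA resolution
FRAME_RATE_FPS = 60               # from the abstract
CLOCK_HZ = 48e6                   # from the abstract
POWER_W = 26.4e-3                 # from the abstract

# Pixels that must be reconstructed per second at the target frame rate.
pixel_rate = VGA_WIDTH * VGA_HEIGHT * FRAME_RATE_FPS

# Average clock-cycle budget and energy available per reconstructed pixel.
cycles_per_pixel = CLOCK_HZ / pixel_rate
energy_per_frame_j = POWER_W / FRAME_RATE_FPS
energy_per_pixel_j = POWER_W / pixel_rate

print(f"Pixel rate: {pixel_rate / 1e6:.3f} Mpixel/s")
print(f"Cycle budget: {cycles_per_pixel:.2f} cycles/pixel")
print(f"Energy: {energy_per_frame_j * 1e3:.2f} mJ/frame, "
      f"{energy_per_pixel_j * 1e9:.2f} nJ/pixel")
```

With only about 2.6 clock cycles available per pixel on average, an iterative reconstruction has to rely on heavy parallelism and hardware sharing of the kind the abstract describes.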
- Research Article
- 10.1109/jssc.2025.3602461
- May 1, 2026
- IEEE Journal of Solid-State Circuits
- Changmin An + 3 more
This work presents an output-capacitor-less digital-assisted analog low-dropout regulator (DA-ALDO) that employs a seamless digital-to-analog transfer (D2A-TF) technique enabled by a local ground generator (LGG). The proposed architecture combines the strengths of digital LDOs (DLDOs), namely fast transient response and low voltage droop, with that of analog LDOs (ALDOs), namely high power supply rejection ratio (PSRR), to achieve an optimized performance without inheriting their drawbacks. The proposed DA-ALDO is fabricated in a 28-nm CMOS process and achieves a PSRR of −57.6 and −53.7 dB at 10 kHz under load currents of 1 and 100 mA, respectively. It exhibits a voltage droop of only 54 mV and a fast settling time of 667 ns in response to a 99-mA load step while consuming a quiescent current of 338.5 $\mu$A. The regulator occupies a compact active area of 0.032 mm$^{2}$ and achieves a figure of merit (FoM) of 0.029 ps.
- Research Article
- 10.1109/jssc.2026.3668153
- May 1, 2026
- IEEE Journal of Solid-State Circuits
- Jia Zhou + 9 more