Limited Communication Bandwidth Research Articles

Dense matrix multiply (MM) is one of the most heavily used kernels in deep learning applications. To meet the high computational demands of these applications, heterogeneous architectures featuring both FPGA fabric and dedicated ASIC accelerators have emerged as promising platforms. For example, the AMD/Xilinx Versal ACAP architecture combines general-purpose CPU cores and programmable logic with AI Engine processors optimized for AI/ML workloads. An array of 400 AI Engine processors running at 1 GHz can provide up to 6.4 TFLOPS for 32-bit floating-point (FP32) data. However, machine learning models often contain both large and small MM operations. While large MM operations can be parallelized efficiently across many cores, small MM operations typically cannot. We observe that executing some small MM layers from the BERT natural language processing model on a large, monolithic MM accelerator in Versal ACAP achieves less than 5% of the theoretical peak performance. This raises a key question: how can we design accelerators that fully use the abundant computation resources under limited communication bandwidth for end-to-end applications with multiple MM layers of diverse sizes? We identify the biggest system throughput bottleneck as the mismatch between the massive computation resources of one monolithic accelerator and the many small MM layers in the application. To resolve this problem, we propose the CHARM framework, which composes multiple diverse MM accelerator architectures that work concurrently on different layers within one application. CHARM includes analytical models that guide design space exploration to determine accelerator partitions and layer scheduling. To facilitate system design, CHARM automatically generates code, enabling thorough on-board design verification.
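
As a rough illustration of the two numbers in the paragraph above, the following Python sketch recomputes the 6.4 TFLOPS peak, assuming each AI Engine performs 8 FP32 multiply-accumulates per cycle (the per-core rate consistent with the quoted figure), and uses a hypothetical padding model to show why a small layer underuses one large accelerator. The tile shape `BIG_TILE`, the layer sizes, and both function names are illustrative assumptions, not CHARM's configuration, and the padding model is not CHARM's analytical model; it also ignores communication bandwidth, which the text identifies as a further limit.

```python
import math

# Assumption: each AI Engine sustains 8 FP32 multiply-accumulates (MACs) per
# cycle, and each MAC counts as 2 floating-point operations.
FP32_MACS_PER_CYCLE = 8
FLOPS_PER_MAC = 2


def peak_flops(num_cores: int, clock_hz: float) -> float:
    """Theoretical peak FP32 throughput of an AI Engine array, in FLOP/s."""
    return num_cores * FP32_MACS_PER_CYCLE * FLOPS_PER_MAC * clock_hz


def padded_efficiency(m: int, k: int, n: int, tile: tuple) -> float:
    """Fraction of useful work when an M x K x N matrix multiply is padded
    up to whole multiples of a fixed accelerator tile (TM, TK, TN).

    Hypothetical toy model: it ignores off-chip bandwidth and on-chip data
    movement, which in practice lower utilization further."""
    tm, tk, tn = tile
    useful = m * k * n
    padded = (math.ceil(m / tm) * tm) * (math.ceil(k / tk) * tk) * (math.ceil(n / tn) * tn)
    return useful / padded


# Hypothetical tile shape for one large, monolithic MM accelerator.
BIG_TILE = (1536, 128, 1024)

if __name__ == "__main__":
    print(f"peak: {peak_flops(400, 1e9) / 1e12:.1f} TFLOPS")
    # A large layer that fills the tile exactly ...
    print(padded_efficiency(3072, 1024, 1024, BIG_TILE))
    # ... versus a small, attention-sized layer (hypothetical sizes).
    print(padded_efficiency(512, 64, 512, BIG_TILE))
```

Under these assumptions, the 512 × 64 × 512 layer fills only 1/12 of the padded tile, so most of the monolithic accelerator idles on such a layer even before any bandwidth limit applies. Splitting the array into several smaller accelerators that match the layer sizes is the kind of remedy the paragraph describes.
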
We deploy the CHARM framework on four deep learning applications, BERT, ViT, NCF, and MLP, in FP32, INT16, and INT8 data types on the AMD/Xilinx Versal ACAP VCK190 evaluation board. In FP32, CHARM achieves 1.46 TFLOPS, 1.61 TFLOPS, 1.74 TFLOPS, and 2.94 TFLOPS inference throughput for BERT, ViT, NCF, and MLP, respectively, corresponding to 5.29×, 32.51×, 1.00×, and 1.00× throughput gains over one monolithic accelerator. In INT16, CHARM achieves maximum throughputs of 1.91 TOPS, 1.18 TOPS, 4.06 TOPS, and 5.81 TOPS for the four applications; in INT8, the maximum throughputs are 3.65 TOPS, 1.28 TOPS, 10.19 TOPS, and 21.58 TOPS, respectively. We have open-sourced our tools, including detailed step-by-step guides to reproduce all the results presented in this article and to help other users learn and leverage the CHARM framework and tools in their end-to-end systems: https://github.com/arc-research-lab/CHARM
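
The FP32 results can be put in perspective with simple arithmetic: dividing each CHARM throughput by the 6.4 TFLOPS peak quoted for 400 AI Engines at 1 GHz gives the fraction of that peak reached, and dividing it by the reported gain gives the throughput implied for the monolithic baseline. This Python sketch is a back-of-the-envelope check derived only from the numbers in the text; the implied baselines are not separately reported values, and the names `REPORTED` and `summarize` are illustrative.

```python
# FP32 results reported in the text: (CHARM throughput in TFLOPS,
# throughput gain over one monolithic accelerator).
REPORTED = {
    "BERT": (1.46, 5.29),
    "ViT": (1.61, 32.51),
    "NCF": (1.74, 1.00),
    "MLP": (2.94, 1.00),
}
PEAK_TFLOPS = 6.4  # 400 AI Engines at 1 GHz, as quoted in the text


def summarize(reported: dict, peak: float) -> dict:
    """Per application: fraction of the quoted peak reached by CHARM, and the
    monolithic-accelerator throughput implied by the reported gain."""
    rows = {}
    for app, (charm_tflops, gain) in reported.items():
        rows[app] = {
            "fraction_of_peak": charm_tflops / peak,
            "implied_monolithic_tflops": charm_tflops / gain,
        }
    return rows


for app, row in summarize(REPORTED, PEAK_TFLOPS).items():
    print(f"{app:5s} {row['fraction_of_peak']:6.1%} of quoted peak, "
          f"implied monolithic ~{row['implied_monolithic_tflops']:.3f} TFLOPS")
```

For example, BERT reaches about 22.8% of the quoted peak, and the implied monolithic throughput for ViT is roughly 0.05 TFLOPS (1.61 / 32.51), which illustrates how poorly a single large accelerator serves that model's layers.
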

For kinematic relative positioning between two moving platforms, limited communication bandwidth and computing capability usually cannot support real-time transmission of high-rate (≥10 Hz) BeiDou Navigation Satellite System (BDS) data. Ambiguity resolution (AR) performance is also a major challenge in environments with signal blockage and loss of lock. Based on BDS and Inertial Navigation System (INS) data, we develop a novel mode of INS-aided BDS kinematic relative positioning between two moving platforms, aiming to achieve real-time, high-rate, and precise positioning with low-cost communication modules in challenging environments. To achieve this goal, we propose INS-aided high-rate relative kinematic positioning and INS-aided BDS AR re-initialization methods. In this mode, the baseline bias caused by the different position datums is first defined and resolved. Then, high-rate INS data are used to obtain the kinematic relative position when real-time kinematic positioning results are unavailable within 1 second. Once BDS data become available, the predicted relative position is used as an additional constraint to facilitate AR re-initialization and improve kinematic positioning performance. The methods are evaluated in a set of experiments with 1 Hz BDS data and 100 Hz INS data and compared with the conventional method, which transmits raw BDS/INS measurements. The results show that the proposed methods achieve an accuracy of about 5 cm in baseline components and lengths for INS-aided 100 Hz relative positioning, equal to the conventional method, while the volume of transmitted data is reduced by nearly 80% on average. With the INS-predicted baseline constraint, relative positioning performance improves further.
Baseline length errors are less than 4 cm, AR fixing rates remain above 95% throughout the experiment, and wrong-fix rates are reduced to less than 0.5%.
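
A minimal Python sketch of the INS-aided prediction step, under strong simplifying assumptions: both platforms' INS velocities are already time-aligned and expressed in one common local-level frame (for example ENU, in m/s), so the baseline rate is simply the velocity difference, which is integrated at the INS rate (0.01 s at 100 Hz) from the last BDS-derived baseline. The function names, the trapezoidal integration, and the 0.1 m tolerance in `consistent_with_prediction` are hypothetical choices for illustration. The authors' method also handles INS error states, lever arms, and the datum-induced baseline bias, none of which appear here, and its use of the prediction as an AR constraint is more involved than the simple distance check shown.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def propagate_baseline(b0: Vec3, v_base: Sequence[Vec3],
                       v_rover: Sequence[Vec3], dt: float) -> List[Vec3]:
    """Predict the baseline (rover minus base position) at INS rate.

    b0:      baseline at the last BDS epoch, metres
    v_base:  base-platform INS velocities, one per INS sample, m/s
    v_rover: rover-platform INS velocities at the same timestamps, m/s
    dt:      INS sample interval, s (0.01 s for 100 Hz)
    Returns one predicted baseline per sample; the first equals b0.
    """
    if len(v_base) != len(v_rover):
        raise ValueError("velocity streams must be time-aligned")
    # Baseline rate = relative velocity (assumes a common, non-rotating frame).
    dv = [tuple(r - b for r, b in zip(vr, vb)) for vr, vb in zip(v_rover, v_base)]
    out: List[Vec3] = [tuple(b0)]
    # Trapezoidal integration between consecutive INS samples.
    for prev, cur in zip(dv[:-1], dv[1:]):
        last = out[-1]
        out.append(tuple(x + 0.5 * (p + c) * dt for x, p, c in zip(last, prev, cur)))
    return out


def consistent_with_prediction(predicted: Vec3, fixed: Vec3, tol_m: float = 0.1) -> bool:
    """Hypothetical check: accept a newly fixed BDS baseline only if it lies
    within tol_m metres of the INS-predicted baseline."""
    return math.dist(predicted, fixed) <= tol_m


if __name__ == "__main__":
    # A 1 s gap between BDS epochs sampled at 100 Hz gives 101 INS samples.
    b0 = (100.0, 20.0, 0.0)
    v_base = [(10.0, 0.0, 0.0)] * 101
    v_rover = [(10.0, 0.5, 0.0)] * 101
    pred = propagate_baseline(b0, v_base, v_rover, 0.01)
    print(pred[-1])  # close to (100.0, 20.5, 0.0)
    print(consistent_with_prediction(pred[-1], (100.02, 20.49, 0.0)))
```

In this example the rover drifts north at 0.5 m/s relative to the base, so after one second the predicted baseline has grown by about 0.5 m in the north component; a BDS fix that disagrees with that prediction by more than the tolerance would be treated as suspect.
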

Articles published on Limited Communication Bandwidth

Finite-Time Dynamic Event-Triggered Full-State Constraints Consensus Control for Multiagent Systems with Switching Topologies and Mismatched Disturbances

Federated learning: A cutting-edge survey of the latest advancements and applications

CHARM 2.0: Composing Heterogeneous Accelerators for Deep Learning on Versal ACAP Architecture

Event-Triggered Collaborative Fault Diagnosis for UAV–UGV Systems

Master–slaves synchronization of teleoperation systems with the Try‐Once‐Discard protocol under event‐triggered communication

Composite error event-triggered-based finite-time safe formation control of underactuated vessels

Event-Triggered Adaptive Fuzzy Switching Fault-Tolerant Control of Dual-Motor Steer-by-Wire System Considering Load Fluctuation and Limited Communication Bandwidth

Self-triggered formation attitude control for multiple spacecraft with disturbances

Probability‐guaranteed encoding–decoding‐based state estimation for delayed memristive neutral networks with event‐triggered mechanism

A Joint Intensity-Neuromorphic Event Imaging System With Bandwidth-Limited Communication Channel

Research on high precision time-frequency transmission under low available bandwidth based on free space optical communication with special beam

Recent Progress on Digital Twins in Intelligent Connected Vehicles: A Review

A novel mode of INS-aided BDS real-time high-rate and precise kinematic relative positioning between two moving platforms

Event-based filter design for singular positive Markov jump systems with parameter uncertainty and measurement delay

Distributed Control for Nonlinear Time-Delay Multiagent Systems: Hybrid Saturation-Constraint Impulsive Approach

Distributed Event-Triggered Quantized Fault-Tolerant Control of Linear Multiagent Systems With External Disturbances and Parameter Uncertainties

Enhancing lane detection with a lightweight collaborative late fusion model

Quantized iterative learning control of communication-constrained systems with encoding and decoding mechanism

Real-Time Object Localization Using a Fuzzy Controller for a Vision-Based Drone

Secure State Estimation for Artificial Neural Networks With Unknown-But-Bounded Noises: A Homomorphic Encryption Scheme
