Hardware-software Co-design Research Articles

Profile hidden Markov models (pHMMs) are widely employed in various bioinformatics applications to identify similarities between biological sequences, such as DNA or protein sequences. In pHMMs, sequences are represented as graph structures, where states and edges capture modifications (i.e., insertions, deletions, and substitutions) by assigning probabilities to them. These probabilities are subsequently used to compute the similarity score between a sequence and a pHMM graph. The Baum-Welch algorithm, a prevalent and highly accurate method, utilizes these probabilities to optimize and compute similarity scores. Accurate computation of these probabilities is essential for the correct identification of sequence similarities. However, the Baum-Welch algorithm is computationally intensive, and existing solutions offer either software-only or hardware-only approaches with fixed pHMM designs. When we analyze state-of-the-art works, we identify an urgent need for a flexible, high-performance, and energy-efficient hardware-software co-design to address the major inefficiencies in the Baum-Welch algorithm for pHMMs. We introduce ApHMM, the first flexible acceleration framework designed to significantly reduce both computational and energy overheads associated with the Baum-Welch algorithm for pHMMs. ApHMM employs hardware-software co-design to tackle the major inefficiencies in the Baum-Welch algorithm by (1) designing flexible hardware to accommodate various pHMM designs, (2) exploiting predictable data dependency patterns through on-chip memory with memoization techniques, (3) rapidly filtering out unnecessary computations using a hardware-based filter, and (4) minimizing redundant computations. ApHMM achieves substantial speedups of 15.55×–260.03×, 1.83×–5.34×, and 27.97× when compared to CPU, GPU, and FPGA implementations of the Baum-Welch algorithm, respectively. ApHMM outperforms state-of-the-art CPU implementations in three key bioinformatics applications: (1) error correction, (2) protein family search, and (3) multiple sequence alignment, by 1.29×–59.94×, 1.03×–1.75×, and 1.03×–1.95×, respectively, while improving their energy efficiency by 64.24×–115.46×, 1.75×, and 1.96×.
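
To make the idea of scoring a sequence against a probabilistic graph more concrete, the following is a minimal sketch of the forward recursion, one of the building blocks that Baum-Welch relies on. It is an illustration only and not ApHMM's method: it uses a plain two-state HMM rather than a pHMM with match, insert, and delete states, and it omits the backward pass and parameter re-estimation that full Baum-Welch needs. The function name `forward_likelihood` and all probabilities in `INIT`, `TRANS`, and `EMIT` are hypothetical values chosen for the example.

```python
def forward_likelihood(obs, init, trans, emit):
    """Return P(obs | model) for a plain HMM using the forward recursion.

    obs   : string of observed symbols
    init  : init[j] is the probability of starting in state j
    trans : trans[i][j] is the probability of moving from state i to j
    emit  : emit[j][symbol] is the probability that state j emits symbol
    """
    n_states = len(init)
    # alpha[j] = P(symbols seen so far, current state = j)
    alpha = [init[j] * emit[j][obs[0]] for j in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][symbol]
            for j in range(n_states)
        ]
    # Summing over the final state gives the likelihood of the whole sequence.
    return sum(alpha)


# Toy two-state model over the DNA alphabet (all numbers are made up).
INIT = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [
    {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
]

if __name__ == "__main__":
    print(forward_likelihood("ACGT", INIT, TRANS, EMIT))
```

Each step of the loop depends only on the previous `alpha` vector, which is the kind of predictable data dependency the abstract mentions. In practice, implementations usually work in log space or rescale `alpha` to avoid underflow on long sequences.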


Memristor-based neuromorphic computing systems (NCSs) provide a fast, computationally efficient and energy-efficient approach to neural network (NN) training and to solving cognitive problems (pattern recognition, big data processing, prediction, etc.) [1]. Memristors can be organized in large crossbar arrays to perform vector-matrix multiplication (VMM) in a natural one-step way by weighted electrical current summation (according to Ohm's and Kirchhoff's laws) [1]. In contrast, VMM, being the most massively parallel operation in NN learning and inference, is extremely time- and energy-expensive in traditional von Neumann architectures. Owing to this difference, memristor-based NCSs are of high interest. Memristors have already been successfully implemented for diverse NCS realizations, and such schemes as multi-layer perceptron (MLP) [2], long short-term memory and others have been demonstrated. Most of these NCSs are trained by various types of gradient descent learning algorithms, the hardware realization of which is challenging due to unreliable cycle-to-cycle (c2c) and device-to-device (d2d) variations of memristive devices. Several approaches have been proposed to partially mitigate these problems, including reservoir computing [3] and fine feature engineering [4]. The general idea of such approaches is to reduce the number of required weights (i.e. memristors) compared with fully connected NNs. In this respect, such novel architectures as convolutional NN (CNN) and MLP-Mixer are of high interest, as they provide significant weight reduction without a drop in classification efficiency. Although memristor-based CNNs have already been demonstrated, different aspects of their realization (such as hybrid hardware-software co-design) have yet to be studied. MLP-Mixer has been realized only in software. Therefore, in this work we have studied the possibility of hardware realization of CNN and MLP-Mixer networks based on crossbar arrays of memristors.
For this purpose, we studied (Co-Fe-B)x(LiNbO3)100−x nanocomposite (CFB-LNO NC) memristors, which operate through a multifilamentary resistive switching (RS) mechanism, demonstrate high endurance and long retention, and possess multilevel RS [5]. A crossbar array of memristors was fabricated using laser photolithography for patterning electrode buses and ion-beam sputtering on an original facility for active layer deposition (~10 nm thick LiNbO3 and ~290 nm thick CFB-LNO NC with x ≈ 10–25 at.%). Details of the fabrication process can be found elsewhere [5]. I-V curves of the fabricated memristors showed small c2c and d2d variations, plasticity with 16 different resistive states and endurance of more than 10^5 cycles. Using the nanocomposite-based crossbar arrays, we implemented a hybrid CNN consisting of a hardware feature extractor with one or two kernels and a software classifier. Additionally, we have demonstrated in simulation that using the memristors under study in an accurately adapted MLP-Mixer architecture results in high classification accuracy that is resilient to memristive variations and stuck devices.
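
The one-step vector-matrix multiplication described above can be sketched in software as follows. This is a simplified, idealized model under stated assumptions, not the authors' simulation: each memristor is treated as a pure conductance, wire resistance and device variations are ignored, and the conductance range passed to `quantize` is a hypothetical value. Only the count of 16 resistive states comes from the abstract. The function names `quantize` and `crossbar_vmm` are illustrative.

```python
def quantize(g, g_min, g_max, levels=16):
    """Snap a target conductance to the nearest of `levels` evenly spaced states.

    The 16-level default mirrors the 16 resistive states reported in the
    abstract; evenly spaced levels and the range itself are assumptions.
    """
    step = (g_max - g_min) / (levels - 1)
    index = round((g - g_min) / step)
    index = max(0, min(levels - 1, index))  # clamp to the available states
    return g_min + index * step


def crossbar_vmm(voltages, conductances):
    """Idealized crossbar read-out.

    voltages     : V_i applied to row i
    conductances : conductances[i][j] is G_ij at the crossing of row i, column j
    Returns the column currents I_j = sum_i V_i * G_ij, i.e. Ohm's law per
    device and Kirchhoff's current law per column.
    """
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_cols)
    ]


if __name__ == "__main__":
    # Hypothetical 2x2 array; units are arbitrary for illustration.
    weights = [[1.2, 0.0], [0.4, 2.1]]
    stored = [[quantize(g, 0.0, 15.0) for g in row] for row in weights]
    print(crossbar_vmm([1.0, 2.0], stored))
```

In the physical array all column currents appear simultaneously, which is why the operation is described as one-step, whereas this software model still loops over every device.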


Articles published on Hardware-software Co-design

Hardware–Software Co-Design of an Audio Feature Extraction Pipeline for Machine Learning Applications

Acceleration of Graph Neural Network-Based Prediction Models in Chemistry via Co-Design Optimization on Intelligence Processing Units

ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-efficient Genome Analysis

Tailor: Altering Skip Connections for Resource-Efficient Inference

In-memory and in-sensor reservoir computing with memristive devices

IEEE CASS/SSCS/EDS HUST Student Branch Chapter Successfully Organizes a Hardware–Software Codesign Lecture [Chapters

Hopscotch: A Hardware-Software Co-Design for Efficient Cache Resizing on Multi-Core SoCs

SH-GAT: Software-hardware co-design for accelerating graph attention networks on FPGA

Analysis of the Principle and Configuration of AI Chip and the State-of-art Applications

Novel neuromorphic architectures based on crossbar arrays of (Co-Fe-B)x(LiNbO3)100−x nanocomposite memristors

Novel hardware/software co-design approach for Connect6 game-solver

IMGA: Efficient In-Memory Graph Convolution Network Aggregation With Data Flow Optimizations

TSAR-ILP: Tile-Based, Synchronization-AwaRe ILP Allocating Heterogeneous Platforms for Streaming Applications

Hardware–Software Co-Design for Real-Time Latency–Accuracy Navigation in Tiny Machine Learning Applications

Teaching Edge AI at the Undergraduate Level: A Hardware–Software Co-Design Approach

RAPIDx: High-Performance ReRAM Processing In-Memory Accelerator for Sequence Alignment

BitSET: Bit-Serial Early Termination for Computation Reduction in Convolutional Neural Networks

Hardware–software codesign for peer-to-peer energy market resolution

Object Fingerprint Cache for Heterogeneous Memory System

Systemization of Knowledge: Robust Deep Learning using Hardware-software co-design in Centralized and Federated Settings
