Abstract
Matrix multiplication is a critical time-consuming processing step in many machine learning applications. Due to the diversity of practical applications, the matrix dimensions are generally not fixed. However, most matrix calculation methods, based on field programmable gate array (FPGA) currently use fixed matrix dimensions, which limit the flexibility of machine learning algorithms in a FPGA. The bottleneck lies in the limited FPGA resources. Therefore, this paper proposes an accelerator architecture for matrix computing method with changeable dimensions. Multi-matrix synchronous calculation concept allows matrix data to be processed continuously, which improves the parallel computing characteristics of FPGA and optimizes the computational efficiency. This paper tests matrix multiplication using support vector machine (SVM) algorithm to verify the performance of proposed architecture on the ZYNQ platform. The experimental results show that, compared to the software processing method, the proposed architecture increases the performance by 21.18 times with 9947 dimensions. The dimension is changeable with a maximum value of 2,097,151, without changing hardware design. This method is also applicable to matrix multiplication processing with other machine learning algorithms.
Highlights
Field programmable gate array (FPGA)-based data processing has advantages, such as high parallelism, fast processing speed, customizable configuration, and high flexibility [1]; it is widely used in digital signal processing [2], deep learning [3], data compression [4], signal acquisition [5], and other fields
The frequency of processing element (PE) selected in this system is 100 MHz considering the output frequency of internal Phase Locked Loop (PLL)
To evaluate the resource utilization of the accelerated computing architecture, place and route of proposed architecture are shown in Figure 9, which is obtained from Xilinx Implemented Design with timing constraints
Summary
Field programmable gate array (FPGA)-based data processing has advantages, such as high parallelism, fast processing speed, customizable configuration, and high flexibility [1]; it is widely used in digital signal processing [2], deep learning [3], data compression [4], signal acquisition [5], and other fields. The support vector machine (SVM) algorithm is often used in data classification and is prevalent in such fields as pedestrian detection, facial recognition, etc. An accelerated SVM implementation on FPGA has far-reaching significance and has attracted wide attention. One scheme for SVM acceleration was proposed in [11], in which the computer sends data from the main memory to the external memory on a VC707 board through the Peripheral Component Interconnect express (PCIe) interface. The data are cached into the row and column buffer of the SVM acceleration component. The data enter the acceleration component to perform the calculations of the SVM algorithm. This design achieves a 23x speed-up at a 200 MHz clock frequency. To accelerate the SVM algorithm for applications in embedded
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have