Abstract

Quran applications include image/video processing, voice recognition, encrypting/decrypting data, etc., which are based on data parallelism. These applications are characterized by structured and regular computations on large data sets. In this paper, new processor architecture called QrnPro is proposed to accelerate Quran applications. QrnPro exploits data parallelism found in Quran applications by adding the vector processing technique to VLIW architecture. QrnPro uses VLIW architecture for processing multiple independent scalar instructions concurrently on parallel execution units. Moreover, data parallelism is expressed by vector instructions and processed on the same parallel execution units of the VLIW architecture. This combination between VLIW and vector processing makes efficient exploitation of resources even though the percentage of data parallelism is not 100%. Instruction memory of size 256×128-bit stores scalar/vector instructions of Quran applications in the form of 128-bit VLIW. A single register file (8-vector×16-element×32-bit or 128×32-bit registers) is used for storing both multi-scalar/vector elements. The control unit feeds the parallel execution units by the required operands (multi-scalar/vector elements) and can produce up to 4×32-bit results each clock cycle. Scalar/vector loads/stores take place from/to the data memory (512×128-bit) of QrnPro in a rate of 128-bit (4×32-bit elements) per clock cycle. Finally, the writeback stage writes up to four results (4×32-bit) per clock cycle coming from the memory system or from the execution units into the QrnPro register file. The design of our proposed QrnPro is implemented using VHDL targeting the Xilinx FPGA Virtex-5, XC5VLX110T-3FF1136 device and its performance is evaluated.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call