QrnPro: New Processor Architecture for Accelerating Quran Applications

Mostafa I Soliman

doi:10.1109/nooric.2013.89

Publication Date: Dec 1, 2013

Citations: 1

Affiliation: Aswan University, Taibah University

Abstract

Quran applications include image/video processing, voice recognition, and data encryption/decryption, among others, all of which are based on data parallelism. These applications are characterized by structured, regular computations on large data sets. This paper proposes a new processor architecture, called QrnPro, to accelerate Quran applications. QrnPro exploits the data parallelism found in these applications by adding vector processing to a VLIW architecture. The VLIW architecture processes multiple independent scalar instructions concurrently on parallel execution units, while data parallelism is expressed by vector instructions that are processed on the same parallel execution units. This combination of VLIW and vector processing uses resources efficiently even when the percentage of data parallelism is below 100%. An instruction memory of size 256×128-bit stores the scalar/vector instructions of Quran applications in the form of 128-bit VLIW instructions. A single register file (8 vectors × 16 elements × 32 bits, or equivalently 128×32-bit registers) stores both multi-scalar and vector elements. The control unit feeds the parallel execution units with the required operands (multi-scalar/vector elements) so that up to 4×32-bit results can be produced each clock cycle. Scalar/vector loads and stores transfer data from/to the QrnPro data memory (512×128-bit) at a rate of 128 bits (4×32-bit elements) per clock cycle. Finally, the writeback stage writes up to four 32-bit results per clock cycle, coming from the memory system or from the execution units, into the QrnPro register file. The proposed QrnPro design is implemented in VHDL targeting the Xilinx Virtex-5 FPGA (XC5VLX110T-3FF1136 device), and its performance is evaluated.
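
The sizes quoted in the abstract can be made concrete with a small behavioural sketch in Python. It is an illustration only, not the authors' VHDL design: the names `RegisterFile`, `vector_add`, and `memory_bytes` are hypothetical, the row-major mapping of vector elements onto the 128 scalar registers is an assumption, and the cycle count ignores pipeline latency and simply counts groups of four 32-bit results.

```python
# Illustrative model only; not the authors' VHDL implementation.
LANES = 4            # up to 4 x 32-bit results per clock cycle (from the abstract)
NUM_VECTORS = 8      # 8 vector registers (from the abstract)
VECTOR_LENGTH = 16   # 16 elements per vector (from the abstract)
WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1


class RegisterFile:
    """128 x 32-bit registers, also viewed as 8 vectors x 16 elements."""

    def __init__(self):
        self.regs = [0] * (NUM_VECTORS * VECTOR_LENGTH)

    @staticmethod
    def index(vector, element):
        # ASSUMPTION: row-major mapping of vector elements onto scalar registers.
        return vector * VECTOR_LENGTH + element

    def read_vector(self, vector):
        base = self.index(vector, 0)
        return self.regs[base:base + VECTOR_LENGTH]

    def write_vector(self, vector, values):
        base = self.index(vector, 0)
        # Truncate every value to 32 bits before storing it.
        self.regs[base:base + VECTOR_LENGTH] = [v & MASK for v in values]


def vector_add(rf, dst, src_a, src_b):
    """Add two vector registers LANES elements at a time; return groups issued."""
    a = rf.read_vector(src_a)
    b = rf.read_vector(src_b)
    out = []
    cycles = 0
    for start in range(0, VECTOR_LENGTH, LANES):
        # One group of LANES results, wrapped to 32 bits.
        out.extend((a[i] + b[i]) & MASK for i in range(start, start + LANES))
        cycles += 1
    rf.write_vector(dst, out)
    return cycles


def memory_bytes(rows, width_bits):
    """Capacity in bytes of a memory with `rows` entries of `width_bits` each."""
    return rows * width_bits // 8


rf = RegisterFile()
rf.write_vector(0, range(16))
rf.write_vector(1, [10] * 16)
cycles = vector_add(rf, 2, 0, 1)
print("groups of 4 results:", cycles)
print("instruction memory bytes:", memory_bytes(256, 128))
print("data memory bytes:", memory_bytes(512, 128))
```

Under these assumptions, a full 16-element vector operation is issued as four groups of four results, the 256×128-bit instruction memory holds 4096 bytes, and the 512×128-bit data memory holds 8192 bytes.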
