Abstract
Aggressive embedded processors are often equipped with general-purpose cores and special-purpose acceleration logic. In this paper, we consider a reconfigurable processor that consists of very long instruction word (VLIW) cores and coarse-grained reconfigurable arrays (CGRAs). CGRAs are used primarily to improve performance by exploiting loop parallelism, whereas VLIW cores rely on discovering instruction-level parallelism. Time-consuming loops can be accelerated on CGRAs through powerful pipeline scheduling. However, not all loops can be accelerated this way: outer loops and loops containing function calls are not candidates for CGRA acceleration. We therefore adopt instruction extensions that convert code fragments in outer loops, as well as simple functions, into single instructions. With these extended instructions available on the CGRAs, more loops can be accelerated. Our experiment with mpeg2dec from MediaBench shows a 32% performance improvement.
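The eligibility rule above, and how instruction extensions widen it, can be illustrated with a small sketch. This is a toy model under stated assumptions, not the paper's compiler. The `Loop` structure, the `extendable` flag (marking an inner fragment small enough to collapse into one extended instruction), the set of "simple" functions, and all loop and function names are hypothetical. Converting a call or inner fragment into a single extended instruction is modelled simply by removing it from the loop body.

```python
from dataclasses import dataclass, field


@dataclass
class Loop:
    """Hypothetical loop model: a name, callees invoked in the body, and nested loops."""
    name: str
    calls: list[str] = field(default_factory=list)
    inner: list["Loop"] = field(default_factory=list)
    # Assumption: marks an inner fragment small enough to become one extended instruction.
    extendable: bool = False


def is_cgra_candidate(loop: Loop) -> bool:
    """Rule from the text: outer loops and loops with function calls are not candidates."""
    return not loop.inner and not loop.calls


def apply_extensions(loop: Loop, simple_funcs: set[str]) -> Loop:
    """Model instruction extension: calls to simple functions and extendable inner
    fragments each become a single instruction, so they no longer block acceleration."""
    remaining_calls = [c for c in loop.calls if c not in simple_funcs]
    remaining_inner = [
        apply_extensions(sub, simple_funcs) for sub in loop.inner if not sub.extendable
    ]
    return Loop(loop.name, remaining_calls, remaining_inner, loop.extendable)


def candidates(loops: list[Loop]) -> list[str]:
    """Collect the names of all loops (at any nesting depth) eligible for the CGRA."""
    names: list[str] = []
    for loop in loops:
        if is_cgra_candidate(loop):
            names.append(loop.name)
        names.extend(candidates(loop.inner))
    return names


# Hypothetical program: one loop calls a simple helper, one is an outer loop with a
# small inner fragment, one calls a complex function, and one is already eligible.
loops = [
    Loop("loop_a", calls=["clip"]),
    Loop("loop_b", inner=[Loop("inner_b", extendable=True)]),
    Loop("loop_c", calls=["complex_fn"]),
    Loop("loop_d"),
]

before = candidates(loops)
after = candidates([apply_extensions(l, {"clip"}) for l in loops])
print("before:", before)
print("after: ", after)
```

Running it prints `before: ['inner_b', 'loop_d']` and `after:  ['loop_a', 'loop_b', 'loop_d']`. After extension, `loop_a` and the whole outer `loop_b` become candidates, while `loop_c` stays ineligible because its callee is not treated as simple.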