Abstract
The high integration density of today's VLSI chips offers enormous computing power that can be exploited by designing parallel computing hardware. The implementation of computationally intensive algorithms, represented as n-dimensional (n-D) nested loop algorithms, onto a parallel array architecture is termed mapping. The methodologies adopted for mapping these algorithms onto parallel hardware often use heuristic search, which requires substantial computational effort to obtain near-optimal solutions. We propose a new mapping procedure in which a lower-dimensional subspace (of the n-D problem space) of the inner loops is identified; this subspace contains the computational expression that generates the output or outputs of the n-D problem. The array of processing elements (PE array) is assigned to the identified subspace, and the PE array is reused by assigning it to successive subspaces in consecutive clock cycles/periods (CPs) until the computational tasks of the n-D problem are complete. This procedure is used to develop our proposed modified heuristic search, which arrives at an optimal design, and complexity comparisons are given. The MATLAB results of the new search and the design space trade-off analysis using a high-level synthesis tool are presented for two typical computationally intensive nested loop algorithms: the 6D FSBM and the 4D edge detection, alternatively known as the 2D filtering algorithm.
Highlights
The architecture consists of w1 × w2 processing elements (PEs), where w1 × w2 is the size of the window used
The intermediate output is propagated to the successive PEs within a row but must pass through a line buffer when moving between rows of PEs
The search was performed in MATLAB for the PE array assigned to the identified (n − x)-D subspace evolved with the nature of the Computational Trail Vector (CTV)
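As an illustration of the subspace idea for the 2D filtering case, the sketch below writes the filter as a 4D nested loop. The two inner loops span the w1 × w2 window, which is the subspace a w1 × w2 PE array would cover, and each outer-loop iteration stands for one reuse of that array. The function name `filter2d_nested`, the step counter, and the assumption that one window position corresponds to one clock period are hypothetical choices made for this example; the line buffer between PE rows is not modeled, and this is not the paper's MATLAB search.

```python
def filter2d_nested(image, kernel):
    """4D nested-loop form of 2D filtering over the valid region.

    Hypothetical sketch: the inner (k, l) loops model the w1 x w2 PE
    array; each (i, j) iteration models one reuse of that array.
    """
    height, width = len(image), len(image[0])
    w1, w2 = len(kernel), len(kernel[0])
    out = [[0] * (width - w2 + 1) for _ in range(height - w1 + 1)]
    steps = 0  # number of times the PE array is reused (assumed one per CP)

    for i in range(height - w1 + 1):      # outer loops: successive subspaces
        for j in range(width - w2 + 1):
            acc = 0
            for k in range(w1):           # inner 2D subspace: one (k, l) per PE
                for l in range(w2):
                    acc += kernel[k][l] * image[i + k][j + l]
            out[i][j] = acc
            steps += 1

    return out, steps


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[1, 0], [0, 1]]
    result, steps = filter2d_nested(image, kernel)
    print(result, steps)
```

In hardware the inner two loops would run in parallel across the PEs rather than sequentially; the sequential form here only makes the partition between the inner subspace and the outer iteration space explicit.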
Summary
Today’s reconfigurable SoCs (RSoCs) feature processing elements (PEs) with a significant amount of programmable logic fabric on the same die. Managing the complexity of these RSoC architectures and tapping their full potential present many challenges [1]. A large number of heuristic algorithms have been used to develop novel scheduling and mapping algorithms [2,3,4,5]. These approaches face difficulties in dealing with large execution times.