Abstract

Increased chip densities offer massive computational power for fundamental big data operations such as searching and sorting. At the same time, the proliferation of processing elements (PEs) in such multicore chips, together with the use of more aggressive parallel algorithms, causes the space needed for interprocessor communication to dominate the overall chip area, potentially reducing computational efficiency. To overcome this issue, this paper introduces a new architecture that uses simple crosspoint switches to pair PEs instead of a complex interconnection network. The new architecture may be viewed as a ‘quadratic’ array of processors, as it uses O(n^2) PEs rather than the n PEs of linear array processor models. The switches between adjacent PEs are created using a cyclic permutation wiring idea, with as many crosspoints as PEs. We demonstrate the versatility of this parallel architecture by designing fast algorithms to sort and search a list of n elements on it. In particular, we show that finding a minimum, finding a maximum, and searching a list of n elements can all be performed on this architecture in O(1) time with additional elementary logic gates of fan-in n, and in O(log n) time with fan-in 2. We further show that sorting a list of n elements can be carried out in O(1) time on the same architecture using additional elementary logic gates of fan-in n together with threshold logic gates; the sorting time increases to O(log n) if only elementary logic gates of fan-in 2 are used. In addition, we establish how similar queries can be handled within the same order of time complexity. We apply this parallel architecture to sorting and searching big data in three different models. The first model provides an efficient implementation of enumeration sorting and searching for moderately sized big data sets. The second model offers increased parallelism by replicating the architecture, but its hardware complexity likewise limits its use to moderately sized data sets. The third model removes this limitation by introducing a tradeoff parameter between the time and hardware complexity of the overall computation, thereby making optimal use of the available resources within a given chip area.

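The enumeration (rank) sorting principle behind these results can be illustrated with a short sequential sketch: every pair of elements is compared, as the crosspoints of the quadratic array would do in parallel, and each element's output position is the number of comparisons it wins. This is only an illustrative simulation under those assumptions; the function names are ours, and the sketch does not reproduce the paper's hardware design or its gate-level complexity claims.

```python
# Illustrative, sequential simulation of enumeration (rank) sorting on a
# quadratic comparator array.  In the architecture described above, the
# pairwise comparisons would be evaluated in parallel by crosspoint switches;
# here they are simply enumerated.  Names are illustrative, not from the paper.

def enumeration_sort(a):
    n = len(a)
    wins = [0] * n  # wins[i] = number of pairwise comparisons element i wins
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Ties are broken by index so every element gets a distinct rank.
            if (a[j], j) < (a[i], i):
                wins[i] += 1
    out = [None] * n
    for i, r in enumerate(wins):
        out[r] = a[i]  # rank r = number of elements ordered before a[i]
    return out

def find_min(a):
    # An element is the minimum iff it wins no strict comparison; in hardware
    # this reduces each row of comparator outputs with a single OR/NOR stage.
    for i, x in enumerate(a):
        if not any(a[j] < x for j in range(len(a)) if j != i):
            return x

if __name__ == "__main__":
    data = [7, 3, 9, 3, 1]
    print(enumeration_sort(data))  # [1, 3, 3, 7, 9]
    print(find_min(data))          # 1
```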