Abstract

Popcount computations are widely used in areas such as combinatorial search, data processing, statistical analysis, and bio- and chemical informatics. In many practical problems the initial data set is very large, so increasing throughput is important. The paper suggests two types of hardware accelerators that are (1) designed in FPGAs and (2) implemented in Zynq-7000 all programmable systems-on-chip, with the algorithms that use popcounts partitioned between software running on the ARM Cortex-A9 processing system and the advanced programmable logic. A three-level system architecture that includes a general-purpose computer, the problem-specific ARM, and reconfigurable hardware is then proposed. The results of experiments and comparisons with existing benchmarks demonstrate that, although the throughput of popcount computations is increased in FPGA-based designs interacting with general-purpose computers, communication overheads (in experiments with PCI Express) are significant, and actual advantages can be gained only if other types of relevant computations, not just popcount, are also implemented in hardware. The comparison of software/hardware designs for Zynq-7000 all programmable systems-on-chip with pure software implementations in the same Zynq-7000 devices demonstrates a performance increase by a factor ranging from 5 to 19 (taking into account all the communication overheads involved between the programmable logic and the processing system).

Highlights

  • The execution time for popcount computations over vectors has a significant impact on overall performance of systems that use the results of such computations

  • The popcount P(A) of a binary vector A = {a_0, ..., a_{N−1}} is the number of ones in A

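The popcount definition in the highlights can be illustrated with a minimal software sketch. It is not the paper's accelerator design. The function names are hypothetical, and the assumption that the vector is packed into 32-bit words is made only for illustration. The word-level routine uses the well-known parallel-counter (SWAR) bit manipulation, which is one common software baseline.

```python
def popcount_bits(bits):
    """Popcount P(A) of a vector given as a sequence of 0/1 values."""
    return sum(1 for b in bits if b)


def popcount_word32(x):
    """Popcount of one 32-bit word via the classic SWAR reduction.

    Assumption for illustration: the vector is packed into 32-bit words.
    """
    x &= 0xFFFFFFFF                                   # keep only 32 bits
    x = x - ((x >> 1) & 0x55555555)                   # counts per 2-bit field
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)    # counts per 4-bit field
    x = (x + (x >> 4)) & 0x0F0F0F0F                   # counts per byte
    return ((x * 0x01010101) & 0xFFFFFFFF) >> 24      # sum of the four bytes


def popcount_words(words):
    """Popcount of a long vector stored as a sequence of 32-bit words."""
    return sum(popcount_word32(w) for w in words)
```

For example, `popcount_bits([1, 0, 1, 1, 0, 0, 1, 0])` and `popcount_word32(0b10110010)` both count four ones in the same bit pattern.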

Summary

Introduction

The execution time of popcount computations over vectors has a significant impact on the overall performance of systems that use their results. Such computations are needed in many different areas, and we show just a few examples below. Note that more than a hundred thousand such registers are available even in recent low-cost devices. This technique permits all rows and columns to be accessed and processed concurrently, counting the Hamming weight (HW) for all the rows and columns in parallel.
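To make the row and column counting concrete, here is a small sequential sketch of the result such a circuit would produce. It only defines the outputs. The hardware described above would compute all of them concurrently, which this code does not attempt to model. The function name and the list-of-lists matrix representation are assumptions made for illustration.

```python
def row_and_column_weights(matrix):
    """Return (row_weights, column_weights) of a rectangular 0/1 matrix.

    Hypothetical helper: the matrix is a list of equal-length rows of 0/1 values.
    """
    if any(len(row) != len(matrix[0]) for row in matrix):
        raise ValueError("all rows must have the same length")
    row_weights = [sum(row) for row in matrix]
    # zip(*matrix) transposes the matrix, yielding one tuple per column.
    column_weights = [sum(col) for col in zip(*matrix)]
    return row_weights, column_weights


example = [
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 0],
]
rows, cols = row_and_column_weights(example)
```

Here `rows` is `[3, 1, 3]` and `cols` is `[2, 1, 3, 1]`. Both lists sum to 7, the popcount of the whole matrix, which is a convenient consistency check.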

