Abstract
The Hungarian algorithm solves the linear assignment problem in polynomial time. This work proposes a GPU/CUDA implementation of the algorithm that exploits the massively parallel architecture of GPUs. The alternating path search phase, which accounts for a large share of the execution time, is distributed across several blocks in a way that minimizes global device synchronization. Other advanced features are also implemented: parallel graph traversal; parallel detection of multiple alternating paths in a single iteration; a simplified, fast matrix compression that stores the zeros of the slack matrix, resulting in very fast graph traversal; and highly optimized reductions for the initial slack matrix calculation and update. The result is a fast implementation for moderate-size problems.
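To make two of the ideas above concrete, the following is a minimal sequential sketch in Python, not the CUDA implementation described in the abstract. It assumes the common textbook form of the initial slack matrix (subtract each row's minimum, then each column's minimum) and a simple per-row list of zero columns as the compressed form. The function names `initial_slack` and `compress_zeros` and the sample cost matrix are hypothetical and chosen only for illustration.

```python
def initial_slack(cost):
    """Assumed textbook reduction: subtract row minima, then column minima."""
    n = len(cost)
    # Row reduction: each of these minima is an independent reduction,
    # which is the kind of step a GPU can compute in parallel.
    row_min = [min(row) for row in cost]
    slack = [[cost[i][j] - row_min[i] for j in range(n)] for i in range(n)]
    # Column reduction on the row-reduced matrix.
    col_min = [min(slack[i][j] for i in range(n)) for j in range(n)]
    return [[slack[i][j] - col_min[j] for j in range(n)] for i in range(n)]


def compress_zeros(slack):
    """Keep, for each row, only the column indices whose slack is zero."""
    return [[j for j, v in enumerate(row) if v == 0] for row in slack]


# Hypothetical 3x3 cost matrix, for illustration only.
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]

slack = initial_slack(cost)
zeros = compress_zeros(slack)

print(slack)  # [[2, 0, 2], [1, 0, 5], [0, 0, 0]]
print(zeros)  # [[1], [1], [0, 1, 2]]
```

A graph traversal that only follows zero-slack edges can then iterate over `zeros[i]` instead of scanning all n entries of row i, which is the benefit the abstract attributes to storing the zeros of the slack matrix.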