Efficient graphical-processor-unit parallelization algorithm for computing Eigen values

Sofien Ben Sayadi, Mohamed Hedi Bedoui, Yaroub Elloumi, Mohamed Akil

doi:10.1117/1.jei.29.6.063008

Abstract

Several leading-edge applications, such as pathology detection, biometric identification, and face recognition, rely mainly on blob and line detection. Eigen value computing is commonly employed for this task because of its accuracy and robustness. However, it demands heavy computation, intensive memory access, and data overlap, all of which lengthen execution time. To overcome these limitations, we propose a new parallel strategy for implementing Eigen value computing on a graphics processing unit (GPU). Our contributions are (1) optimizing instruction scheduling to reduce computation time, (2) partitioning the processing into blocks efficiently to increase the occupancy of the streaming multiprocessors, (3) splitting the input data efficiently across shared memory to benefit from its lower access time, and (4) a new data management scheme for shared memory that avoids memory access conflicts and reduces memory bank accesses. Experimental results show that the proposed GPU parallel strategy achieves speedups of 27× over a multithreaded implementation, 16× over a predefined function in the OpenCV library, and 8× over a predefined function in the cuBLAS library, all of which were run on a quad-core multi-central-processing-unit platform. We then evaluate the strategy on an Eigen value-based method for retinal thick vessel segmentation, which is essential for detecting ocular pathologies. On images from the Structured Analysis of the Retina (STARE) database, Eigen value computing executes in 0.017 s. Accordingly, we achieve real-time thick retinal vessel segmentation with an average execution time of about 0.039 s.
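
As an illustration of the per-pixel workload summarized above, the following is a minimal sketch and not the implementation evaluated in this paper. It assumes that each pixel carries a 2×2 symmetric matrix [[a, b], [b, c]], as in Hessian-based blob and line detectors; the abstract itself does not state the matrix form. The function names `eigvals_sym2x2` and `eigvals_image` are hypothetical. Because every pixel is processed independently, this kind of computation maps naturally to one GPU thread per pixel.

```python
import math


def eigvals_sym2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]] as (larger, smaller).

    Assumption: a 2x2 symmetric matrix per pixel (e.g. a Hessian); the
    abstract does not specify the matrix that the authors decompose.
    """
    half_trace = 0.5 * (a + c)
    # Distance of both eigenvalues from the half-trace.
    # math.hypot computes sqrt(x*x + y*y) without overflowing on the squares.
    radius = math.hypot(0.5 * (a - c), b)
    return half_trace + radius, half_trace - radius


def eigvals_image(hxx, hxy, hyy):
    """Apply the closed form to every pixel of three equally sized 2-D lists."""
    return [
        [eigvals_sym2x2(a, b, c) for a, b, c in zip(row_xx, row_xy, row_yy)]
        for row_xx, row_xy, row_yy in zip(hxx, hxy, hyy)
    ]
```

For example, `eigvals_sym2x2(1.0, 1.0, 1.0)` evaluates the matrix [[1, 1], [1, 1]], whose eigenvalues are 2 and 0. The closed form avoids an iterative solver, which keeps the per-pixel cost fixed and branch-free.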

Full Text