In this paper, we present a new surfel-based (surface element) multi-view stereo algorithm that runs entirely on the GPU. We combine the flexibility of a surfel-based 3D shape representation with global optimization by graph cuts in a single framework. Unlike previous work, the algorithm is designed for massively parallel processing on the GPU. First, we construct surfel candidates by local stereo matching and voting. After refining the position and orientation of the surfel candidates, we extract the optimal surfels by applying graph cuts under photo-consistency and surfel orientation constraints. In contrast to conventional voxel-based methods, the proposed algorithm uses a more accurate photo-consistency measure and reconstructs the 3D shape to sub-voxel accuracy. The orientation of the constructed surfel candidates imposes an effective constraint that reduces the effect of the minimal surface bias. The entire processing pipeline is implemented on the GPU to significantly speed up the computation. The experimental results show that the proposed approach reconstructs the 3D shape of an object accurately and efficiently, running more than 100 times faster than on a CPU.
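To make the notion of a surfel candidate scored by photo-consistency concrete, the following is a minimal CPU-side sketch, not the GPU implementation described above. The abstract does not state which photo-consistency measure is used, so normalized cross-correlation (NCC) is an assumption here, and the names `Surfel`, `ncc`, and `photo_consistency` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Sequence, Tuple


@dataclass
class Surfel:
    """Hypothetical surfel record: an oriented surface sample with a score."""
    position: Tuple[float, float, float]  # 3D center of the surface element
    normal: Tuple[float, float, float]    # unit orientation vector
    score: float                          # photo-consistency in [-1, 1]


def ncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized cross-correlation of two equally sized intensity patches."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("patches must be non-empty and of equal length")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        # A textureless patch carries no matching evidence.
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def photo_consistency(patches: Sequence[Sequence[float]]) -> float:
    """Mean pairwise NCC over the patches one surfel projects to in each view."""
    scores = [
        ncc(patches[i], patches[j])
        for i in range(len(patches))
        for j in range(i + 1, len(patches))
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Usage: score one candidate from the patches sampled in three views.
views = [[10.0, 20.0, 30.0, 40.0], [12.0, 22.0, 32.0, 42.0], [5.0, 10.0, 15.0, 20.0]]
candidate = Surfel(position=(0.0, 0.0, 1.0), normal=(0.0, 0.0, -1.0),
                   score=photo_consistency(views))
```

Because each candidate's score depends only on its own patches, this kind of per-surfel computation is independent across candidates, which is the property that makes a massively parallel GPU mapping natural.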