Face detection algorithms vary in speed and performance on GPUs. Different algorithms can report different speeds on different GPUs, and these differences are not captured by linear or near-linear approximations. This is due to many factors, such as register file size, GPU occupancy, memory speed, and the speed of the double-precision units. This paper studies the most common face detection algorithms, LBP and Haar-like, and the bottlenecks associated with deploying both algorithms on different GPU architectures. It focuses on the techniques for resolving these bottlenecks based on the specifications of each GPU.