Building detection in high-resolution satellite images has received great attention, as it is important to increase the accuracy of urban planning. The building boundary detection in the desert environment is a real challenge due to the nature of low contrast images in the desert environment. The traditional computer vision algorithms for building boundary detection lack scalability, robustness, and accuracy. On the other hand, deep learning detection algorithms have not been applied to such low contrast satellite images. So, there is a real need to employ deep learning algorithms for building detection tasks in low contrast high-resolution images. In this paper, we propose a novel building detection method based on a single-shot multi-box (SSD) detector. We develop the state-of-the-art SSD detection algorithm based on three approaches. First, we propose data-augmentation techniques to overcome the low contrast images’ appearance. Second, we develop the SSD backbone using a novel saliency visual attention mechanism. Moreover, we investigate several pre-trained networks performance and several fusion functions to increase the performance of the SSD backbone. The third approach is based on optimizing the anchor-boxes sizes which are used in the detection stage to increase the performance of the SSD head. During our experiments, we have prepared a new dataset for buildings inside Riyadh City, Saudi Arabia that consists of 3878 buildings. We have compared our proposed approach vs other approaches in the literature. The proposed system has achieved the highest average precision, recall, F1-score, and IOU performance. Our proposed method has achieved a fast average prediction time with the lowest variance for our testing set. Our experimental results are very promising and can be generalized to other object detection tasks in low contrast images.