Deep neural networks are now widely applied in sustainable smart cities and societies, in areas including smart manufacturing, healthcare, industry, agriculture, surveillance, and other real-life artificial intelligence applications. In this context, human detection has gained notable attention because it is a crucial task in intelligent surveillance. Researchers have applied a variety of computer vision and deep neural network-based techniques to human detection; however, most of this work focuses on the frontal camera view. In this work, we introduce a human detection system for intelligent surveillance in smart cities and societies from a completely different perspective: an overhead view, which provides sufficient visibility and coverage of a scene in congested and obstructed environments. However, detecting humans from such an extreme viewpoint is difficult because their poses and appearances vary significantly. Therefore, we build on a deep neural network-based object detection technique and use the Gaussian YOLOv3 algorithm for human detection. The algorithm estimates bounding box uncertainty by modeling the box coordinates with Gaussian parameters, which improves accuracy and reduces false positives. Gaussian YOLOv3 is combined with channel attention and feature intertwine modules to enhance specific feature maps. The channel attention module is applied to the feature map to learn each channel's weight autonomously, emphasize key features, and strengthen the network's ability to discriminate between humans and background. At the same time, different channels of the feature map are intertwined to obtain more representative features. Finally, the features obtained from the attention and feature intertwine modules are fused to form an improved feature map.
In addition, transfer learning is adopted to further increase the algorithm's detection accuracy. The experimental results show that this training improves the Gaussian YOLOv3 algorithm's human detection performance, yielding an overall detection accuracy of 94%.
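
To make the idea of Gaussian bounding box modeling concrete, the following minimal sketch shows one way it can work. It assumes each box coordinate is predicted as a mean and a standard deviation in the range (0, 1), that training uses a Gaussian negative log-likelihood, and that the final score is scaled by one minus the mean uncertainty. The function names and numeric values are hypothetical illustrations, not the implementation used in this work.

```python
import math


def gaussian_nll(mu, sigma, target, eps=1e-9):
    """Negative log-likelihood of `target` under N(mu, sigma^2).

    Assumed training loss for a single box coordinate: it penalizes
    both a wrong mean and an over- or under-confident sigma.
    """
    var = sigma * sigma + eps
    return 0.5 * math.log(2.0 * math.pi * var) + (target - mu) ** 2 / (2.0 * var)


def detection_score(objectness, class_prob, sigmas):
    """Scale a detection score by (1 - mean localization uncertainty).

    `sigmas` holds the predicted standard deviations of the four box
    coordinates, each assumed to lie in (0, 1).
    """
    mean_uncertainty = sum(sigmas) / len(sigmas)
    return objectness * class_prob * (1.0 - mean_uncertainty)


# Hypothetical predictions: same objectness and class probability,
# but different localization uncertainty.
confident = detection_score(0.9, 0.95, [0.05, 0.05, 0.10, 0.10])
uncertain = detection_score(0.9, 0.95, [0.60, 0.70, 0.80, 0.70])

print(f"confident box score: {confident:.3f}")
print(f"uncertain box score: {uncertain:.3f}")
```

Under these assumptions, a box with poorly localized coordinates receives a lower final score and is more likely to fall below the detection threshold, which is one plausible mechanism for the reduction in false positives.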