Transmission of the contagious coronavirus disease (COVID-19) can be reduced by maintaining physical distancing (also known as social distancing). The World Health Organization (WHO) recommends this measure to prevent COVID-19 from spreading in public areas. However, people do not always maintain the required 2 m physical distance mandated as a safety precaution in shopping malls and other public places. An active monitoring system that measures the distances between people and alerts them may slow the spread of this fatal disease. This paper introduces a deep learning-based system that automatically detects physical distancing violations in video from security cameras. The proposed system uses TH-YOLOv5 for object detection and classification and DeepSORT for tracking the detected people through their bounding boxes across video frames. TH-YOLOv5 adds an extra prediction head to detect objects of varying sizes. The original prediction heads are then replaced with Transformer Heads (TH) to explore the prediction capability of the self-attention mechanism. We also include the convolutional block attention module (CBAM) to identify attention regions in scenes with dense objects. A vectorized pairwise L2 norm is used to build a three-dimensional feature space for tracking physical distances and computing the violation index, which determines the number of individuals who follow the distancing rules. We use the MS COCO, HumanCrowd, CityPersons, and Oxford Town Centre (OTC) datasets for training and testing. Experimental results show that the proposed system achieves a weighted mAP of 89.5% at 29 frames per second (FPS); both are computationally comparable.
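The pairwise L2 step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `pairwise_l2` and `count_violations`, the use of NumPy, and the assumption that each tracked person is already represented by a 3D point expressed in metres (which in practice would require camera calibration) are all hypothetical choices made for illustration. The abstract does not give the exact definition of the violation index, so the sketch only counts violating pairs and individuals.

```python
import numpy as np


def pairwise_l2(points: np.ndarray) -> np.ndarray:
    """Return the (n, n) matrix of Euclidean (L2) distances between rows.

    `points` is an (n, d) array; here d = 3 and units are assumed to be metres.
    Broadcasting computes all pairs at once, without Python loops.
    """
    diff = points[:, None, :] - points[None, :, :]  # shape (n, n, d)
    return np.linalg.norm(diff, axis=-1)            # shape (n, n)


def count_violations(points: np.ndarray, min_distance: float = 2.0):
    """Count distancing violations among tracked people.

    Returns (violating_pairs, violating_people, compliant_people).
    A pair violates the rule when its distance is below `min_distance`.
    """
    dist = pairwise_l2(points)
    close = dist < min_distance
    np.fill_diagonal(close, False)                  # ignore self-distances
    violating_pairs = int(np.triu(close, k=1).sum())  # count each pair once
    violators = close.any(axis=1)                   # person is too close to anyone
    violating_people = int(violators.sum())
    compliant_people = len(points) - violating_people
    return violating_pairs, violating_people, compliant_people


# Hypothetical positions (metres) of five tracked people in one frame.
people = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],    # 1.0 m from person 0
    [5.0, 0.0, 0.0],
    [5.0, 1.5, 0.0],    # 1.5 m from person 2
    [10.0, 10.0, 0.0],  # far from everyone
])
print(count_violations(people))  # (2, 4, 1)
```

In this example two pairs stand closer than 2 m, so four people are flagged and one is compliant. Because the distance matrix is computed per frame from the tracker's output, the same counts can be accumulated over time to summarize how well a scene follows the distancing rule.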