Video surveillance makes extensive use of video-based person detection and tracking technology. Most person detection and classification techniques currently in use struggle with video sequences because of occlusion, ambient lighting, and variations in the position of the human face. This paper proposes an effective deep learning-based person identification and classification system, built on a You Only Look Once version 8 (YOLOv8) detection and classification model, to classify human faces in video sequences accurately. This work also proposes a new staff-detection and classification (S-DEC) dataset for comprehensive performance evaluation. The Visual Tracker Benchmark (VTB) standard database is used for performance comparison with the proposed S-DEC dataset. The proposed technique achieved 98.67% precision. On the S-DEC dataset, the system achieved 94.67% accuracy in identifying facial images from a video sequence of 38 people, addressing the pose-variation and occlusion challenges. Earlier methods achieved results of approximately 85% to 90% and required more execution time. Many existing techniques succeed only in detecting people; identification of the detected person has been addressed in only a limited number of papers. The proposed method uses the CSPDarknet53 model, which is based on cross-stage partial connections, integrated with YOLOv8 to achieve faster results. Training the deep learning model took 35 minutes and testing took 2 minutes. The proposed framework outperformed existing methodologies and successfully identified additional information about the detected person.
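Since both a precision figure and an accuracy figure are reported, the following minimal sketch illustrates how the two metrics differ for a multi-person identification task. It is not the evaluation code used in this work: the function names `accuracy` and `macro_precision`, the choice of macro-averaging, and the identity labels are all assumptions made purely for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of face crops whose predicted identity equals the true identity."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        return 0.0
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_precision(y_true, y_pred):
    """Per-identity precision, averaged over every identity that was predicted.

    Macro-averaging is an assumption here; the source does not state which
    averaging scheme its precision figure uses.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    true_positives = Counter()
    predicted = Counter()
    for t, p in zip(y_true, y_pred):
        predicted[p] += 1
        if t == p:
            true_positives[p] += 1
    if not predicted:
        return 0.0
    per_class = [true_positives[c] / predicted[c] for c in predicted]
    return sum(per_class) / len(per_class)


# Hypothetical labels for six face crops (illustrative only, not S-DEC data).
y_true = ["anna", "anna", "ben", "ben", "cara", "cara"]
y_pred = ["anna", "ben", "ben", "ben", "cara", "cara"]

acc = accuracy(y_true, y_pred)
prec = macro_precision(y_true, y_pred)
print(f"accuracy={acc:.4f}, macro precision={prec:.4f}")
```

In this toy case one crop of "anna" is misread as "ben", so accuracy counts one error out of six crops, while precision penalizes only the identity that received the wrong prediction. The two numbers therefore need not agree, which is why they are best reported separately.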