The collaborative use of camera near-field sensors for monitoring the number and status of tourists is a crucial aspect of smart scenic spot management. This paper proposes a near-field perception technical system that achieves dynamic and accurate detection of tourist targets in mountainous scenic areas, addressing the challenges of real-time passive perception and safety management of tourists. The technical framework involves the following steps: Firstly, real-time video stream signals are collected from multiple cameras to create a distributed perception network. Then, the YOLOX network model is enhanced with the CBAM module and ASFF method to improve the dynamic recognition of preliminary tourist targets in complex scenes. Additionally, the BYTE target dynamic tracking algorithm is employed to address the issue of target occlusion in mountainous scenic areas, thereby enhancing the accuracy of model detection. Finally, the video target monocular spatial positioning algorithm is utilized to determine the actual geographic location of tourists based on the image coordinates. The algorithm was deployed in the Tianmeng Scenic Area of Yimeng Mountain in Shandong Province, and the results demonstrate that this technical system effectively assists in accurately perceiving and spatially positioning tourists in mountainous scenic spots. The system demonstrates an overall accuracy in tourist perception of over 90%, with spatial positioning errors less than 1.0 m and a root mean square error (RMSE) of less than 1.14. This provides auxiliary technical support and effective data support for passive real-time dynamic precise perception and safety management of regional tourist targets in mountainous scenic areas with no/weak satellite navigation signals.