In some security-sensitive locations, phone calls are strictly prohibited because they can create significant safety hazards. To detect and deter this behavior, we implement real-time detection of handheld phones on resource-limited devices and propose a novel object detection network called YOLO-PAI. First, a pruning algorithm is applied to the CSPDarknet53 backbone to reduce the number of model parameters while maintaining detection accuracy. Next, the SE attention mechanism is introduced and an SRBlock structure is designed to focus the network on informative parts of the input and strengthen its feature extraction capability. Then, a new low-dimensional feature extraction branch is added to help the network extract richer and more diverse feature information. Finally, an InceptionV3 structure replaces the original 3 × 3 convolution to further reduce the number of model parameters. The YOLO-PAI network runs in real time on an NVIDIA Jetson TX2 embedded device and is tested under various illumination and obstacle occlusion conditions. Experimental results show that YOLO-PAI reduces the number of network parameters and the computational cost while maintaining accuracy. On the Phonehand_Imgs dataset, the model size is reduced by 190 MB and the accuracy of handheld call detection reaches 94%, which is 1.44% higher than YOLOv4. The detection speed reaches 45 fps, which is 21 fps higher than the original model. On the other two datasets, YOLO-PAI outperforms other popular networks in both detection accuracy and speed. In addition, when running on the NVIDIA Jetson TX2 embedded device, its detection speed is more than 15 fps faster than that of other popular networks.
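The parameter saving from replacing a 3 × 3 convolution with an Inception-style structure can be illustrated with a small calculation. The sketch below assumes the asymmetric factorization popularized by InceptionV3, where a 3 × 3 convolution is split into a 1 × 3 convolution followed by a 3 × 1 convolution. The abstract does not specify the exact layer layout used in YOLO-PAI, so the function names, the channel width of 256, and the omission of bias terms are illustrative assumptions only.

```python
def conv_params(c_in, c_out, kh, kw, bias=False):
    """Number of learnable weights in a standard 2-D convolution layer."""
    return c_in * c_out * kh * kw + (c_out if bias else 0)


def factorized_3x3_params(c_in, c_out, bias=False):
    """Weights in a 1x3 convolution followed by a 3x1 convolution.

    Assumed layout (not taken from the paper): the 1x3 layer maps
    c_in -> c_out channels, and the 3x1 layer maps c_out -> c_out.
    """
    first = conv_params(c_in, c_out, 1, 3, bias)
    second = conv_params(c_out, c_out, 3, 1, bias)
    return first + second


if __name__ == "__main__":
    channels = 256  # hypothetical channel width, chosen for illustration
    full = conv_params(channels, channels, 3, 3)
    split = factorized_3x3_params(channels, channels)
    saved = 1 - split / full
    print(f"3x3: {full} weights, 1x3 + 3x1: {split} weights, saved: {saved:.1%}")
```

Under these assumptions, with equal input and output channel counts the factorized pair needs 6C² weights instead of 9C², about one third fewer, while still covering a 3 × 3 receptive field. The saving in the actual network depends on the channel widths and on how many layers are replaced.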