Accurately extracting the region of interest (ROI) of a palm print is crucial for subsequent palm print recognition. Under unconstrained environmental conditions, however, the user's palm posture and angle, as well as the background and illumination, are uncontrolled, which makes palm print ROI extraction a major challenge. Among existing approaches, traditional ROI extraction methods rely on image segmentation and are difficult to apply across multiple datasets under such interference, while deep learning-based methods typically do not consider the model's computational cost and are difficult to deploy on embedded devices. This article proposes a palm print ROI extraction method based on lightweight networks. First, the YOLOv5-lite network detects and coarsely localizes the palm, eliminating most of the interference from complex backgrounds. Then, an improved UNet performs keypoint detection; compared with the original UNet, this model has fewer parameters, better performance, and faster convergence. Its output combines Gaussian heatmap regression with direct regression, supervised by a proposed joint loss function based on a JS (Jensen-Shannon) loss and an L2 loss. In the experiments, a mixed database assembled from five databases is used to reflect the needs of practical applications. The results show that the proposed method achieves an accuracy of 98.3% on this database with an average detection time of only 28 ms on a GPU, outperforming other mainstream lightweight networks, and the model size is only 831k. In the open-set test, the method achieves a success rate of 93.4% with an average detection time of 5.95 ms on a GPU, far ahead of the latest palm print ROI extraction algorithms, and can be applied in practice.
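A minimal sketch of how such a joint loss could be implemented in PyTorch is given below. The abstract only states that a JS loss supervises the Gaussian heatmaps and an L2 loss supervises the direct coordinate regression; the function names, the softmax normalization of the predicted heatmaps, and the `alpha`/`beta` weights are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def js_divergence(pred_hm, gt_hm, eps=1e-8):
    """Jensen-Shannon divergence between predicted and ground-truth
    heatmaps of shape (batch, keypoints, H, W).

    Assumption: raw network heatmaps are turned into per-keypoint
    probability distributions via softmax; the non-negative Gaussian
    ground-truth maps are normalized by their sum.
    """
    p = F.softmax(pred_hm.flatten(2), dim=-1)          # (B, K, H*W)
    q = gt_hm.flatten(2)
    q = q / (q.sum(dim=-1, keepdim=True) + eps)
    m = 0.5 * (p + q)
    kl_pm = (p * (torch.log(p + eps) - torch.log(m + eps))).sum(dim=-1)
    kl_qm = (q * (torch.log(q + eps) - torch.log(m + eps))).sum(dim=-1)
    return (0.5 * kl_pm + 0.5 * kl_qm).mean()

def joint_loss(pred_hm, gt_hm, pred_xy, gt_xy, alpha=1.0, beta=1.0):
    """Joint supervision: JS loss on the Gaussian heatmap branch plus
    L2 loss on the directly regressed keypoint coordinates.
    alpha and beta are hypothetical weights (not specified in the paper)."""
    l_js = js_divergence(pred_hm, gt_hm)
    l_l2 = F.mse_loss(pred_xy, gt_xy)   # L2 loss on coordinates
    return alpha * l_js + beta * l_l2
```

One plausible motivation for JS over plain KL divergence here is that JS is symmetric and bounded, which tends to give more stable gradients when the predicted and target heatmap distributions barely overlap early in training.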