During ultrasound-guided (US-guided) needle puncture for minimally invasive procedures, automated needle tip localization helps clinicians find small tips in US images easily and precisely by displaying a clear tip indicator on the screen, which increases their confidence during the procedure. However, automated needle tip localization in US images is challenging because of strong interference from many kinds of echoes. We propose a method that localizes needle tips under continuous spatial and temporal constraints in the real-time US frame stream. First, a temporal constraint is acquired by detecting translational tip motion in motion-enhanced US images with a deep learning-based (DL-based) detector. A spatial constraint and candidate tip locations are then obtained by detecting needle shafts and tips in the raw grayscale B-mode images with a second DL-based detector. To keep the constraints continuous, the tip velocity estimated from the acquired temporal constraint is used to predict tip locations in frames where no temporal or spatial constraint is detected. Finally, the tip coordinates are localized among the candidate tips under the spatial and temporal constraints. Experimental results on 1121 US images from porcine organ punctures and 895 images from human thyroid punctures demonstrate that the proposed method is effective and efficient and that it surpasses existing methods. On the porcine organ data, the method achieved 97.2% recall and 91.9% precision for tip detection and a root-mean-square error (RMSE) of 0.88 ± 0.70 mm for tip localization. On the human thyroid data, which was not used in training, it achieved 86.5% recall, 84.3% precision, and an RMSE of 0.92 ± 0.78 mm. The method runs at 14.5 frames per second using only a CPU. The proposed method thus provides a more reliable solution for automated needle tip localization during US-guided needle puncture that is more robust to interference, and its running speed makes it practical for use on real-time US streams.
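
The velocity-based prediction and the constrained selection among candidate tips can be illustrated with a minimal Python sketch. It is not the implementation evaluated here. It assumes a constant tip velocity in pixels per frame, represents the spatial constraint as a shaft line through two points, and uses hypothetical names (`predict_tip`, `select_tip`) and hypothetical distance thresholds (`max_shaft_dist`, `max_pred_dist`) that would have to be tuned for a real system.

```python
import math


def predict_tip(last_tip, velocity, frames_elapsed=1):
    """Extrapolate the tip position, assuming constant velocity (pixels/frame).

    The constant-velocity model is an assumption made for this sketch.
    """
    x, y = last_tip
    vx, vy = velocity
    return (x + vx * frames_elapsed, y + vy * frames_elapsed)


def distance_to_line(point, line_start, line_end):
    """Perpendicular distance from a point to the line through two points."""
    px, py = point
    x1, y1 = line_start
    x2, y2 = line_end
    length = math.hypot(x2 - x1, y2 - y1)
    if length == 0:
        # Degenerate shaft: fall back to point-to-point distance.
        return math.hypot(px - x1, py - y1)
    return abs((x2 - x1) * (y1 - py) - (x1 - px) * (y2 - y1)) / length


def select_tip(candidates, predicted, shaft,
               max_shaft_dist=5.0, max_pred_dist=15.0):
    """Pick the candidate that satisfies both (hypothetical) constraints.

    Spatial constraint: the candidate lies within max_shaft_dist of the shaft.
    Temporal constraint: the candidate lies within max_pred_dist of the
    predicted location; the closest such candidate wins.
    Returns None when no candidate satisfies both constraints.
    """
    best, best_dist = None, max_pred_dist
    for cand in candidates:
        if distance_to_line(cand, shaft[0], shaft[1]) > max_shaft_dist:
            continue
        dist = math.hypot(cand[0] - predicted[0], cand[1] - predicted[1])
        if dist <= best_dist:
            best, best_dist = cand, dist
    return best


# Usage: the tip was last seen at (100, 50) moving (2, 1) pixels per frame,
# and no constraint was detected for three frames.
shaft = ((0.0, 0.0), (200.0, 100.0))
predicted = predict_tip((100.0, 50.0), (2.0, 1.0), frames_elapsed=3)
candidates = [(150.0, 20.0), (108.0, 54.0), (106.0, 54.0)]
tip = select_tip(candidates, predicted, shaft)
```

In this sketch the candidate far from the shaft line is rejected by the spatial constraint, and the remaining candidate nearest to the predicted location is kept.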