Accurate and computationally efficient motion estimation is a critical component of real-time ultrasound strain elastography (USE). With the advent of deep learning, a growing body of work has explored supervised convolutional neural network (CNN)-based optical flow in the framework of USE. However, such supervised learning has often relied on simulated ultrasound data, and the research community has questioned whether CNN models trained on simulated data containing simple motion can reliably track complex in vivo speckle motion. In parallel with efforts by other research groups, this study developed an unsupervised motion estimation neural network (UMEN-Net) for USE by adapting a well-established CNN model, PWC-Net. The network takes a pair of predeformation and postdeformation radio-frequency (RF) echo signals as input and outputs both axial and lateral displacement fields. The loss function combines a correlation term between the predeformation signal and the motion-compensated postdeformation signal, a smoothness term on the displacement fields, and a tissue-incompressibility term. Notably, the globally optimized correspondence (GOCor) volumes module developed by Truong et al. replaced the original correlation (Corr) module to improve the evaluation of signal correlation. The proposed CNN model was tested on simulated, phantom, and in vivo ultrasound data containing biologically confirmed breast lesions, and its performance was compared against other state-of-the-art methods, including two deep-learning-based tracking methods (MPWC-Net++ and ReUSENet) and two conventional tracking methods (GLUE and BRGMT-LPF). Compared with these four benchmark methods, our unsupervised CNN model not only achieved higher signal-to-noise ratios (SNRs) and contrast-to-noise ratios (CNRs) for axial strain estimates but also improved the quality of lateral strain estimates.
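For concreteness, the sketch below illustrates one way an unsupervised loss of this form (correlation data term, displacement smoothness, tissue incompressibility) could be assembled in a PyTorch-style framework. The bilinear warping, the normalized-correlation data term, and the weights `lam_smooth` and `lam_incomp` are illustrative assumptions, not the paper's exact formulation, which in particular uses the GOCor module rather than a plain correlation.

```python
# Minimal sketch of an unsupervised USE loss: correlation + smoothness +
# incompressibility. PyTorch is assumed for illustration; weights and the
# warping scheme are placeholders, not the authors' implementation.
import torch
import torch.nn.functional as F

def warp(post_rf, disp):
    """Warp the postdeformation RF frame with the predicted displacement field.

    post_rf: (B, 1, H, W) RF frame; disp: (B, 2, H, W) axial/lateral
    displacements in pixels. Uses bilinear resampling on a normalized grid.
    """
    b, _, h, w = post_rf.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=post_rf.dtype, device=post_rf.device),
        torch.arange(w, dtype=post_rf.dtype, device=post_rf.device),
        indexing="ij",
    )
    grid_y = ys + disp[:, 0]  # axial displacement
    grid_x = xs + disp[:, 1]  # lateral displacement
    # grid_sample expects (x, y) coordinates normalized to [-1, 1].
    grid = torch.stack(
        (2.0 * grid_x / (w - 1) - 1.0, 2.0 * grid_y / (h - 1) - 1.0), dim=-1
    )
    return F.grid_sample(post_rf, grid, mode="bilinear", align_corners=True)

def unsupervised_loss(pre_rf, post_rf, disp, lam_smooth=0.1, lam_incomp=0.01):
    """Correlation + smoothness + incompressibility loss (illustrative weights)."""
    warped = warp(post_rf, disp)

    # Data term: maximize normalized correlation between the predeformation
    # frame and the motion-compensated postdeformation frame.
    pre_c = pre_rf - pre_rf.mean(dim=(2, 3), keepdim=True)
    wrp_c = warped - warped.mean(dim=(2, 3), keepdim=True)
    ncc = (pre_c * wrp_c).sum(dim=(2, 3)) / (
        pre_c.pow(2).sum(dim=(2, 3)).sqrt()
        * wrp_c.pow(2).sum(dim=(2, 3)).sqrt()
        + 1e-8
    )
    data_term = 1.0 - ncc.mean()

    # Smoothness term: penalize spatial gradients of both displacement fields.
    d_ax = disp[:, :, 1:, :] - disp[:, :, :-1, :]
    d_lat = disp[:, :, :, 1:] - disp[:, :, :, :-1]
    smooth_term = d_ax.abs().mean() + d_lat.abs().mean()

    # Incompressibility term: penalize the divergence of the displacement
    # field, i.e., axial strain plus lateral strain should be near zero.
    div = (disp[:, 0, 1:, 1:] - disp[:, 0, :-1, 1:]) + (
        disp[:, 1, 1:, 1:] - disp[:, 1, 1:, :-1]
    )
    incomp_term = div.abs().mean()

    return data_term + lam_smooth * smooth_term + lam_incomp * incomp_term
```

In a training loop, `disp` would be the displacement field predicted by the network from the RF pair, and the loss would be backpropagated without any ground-truth motion, which is what makes the scheme unsupervised.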