Abstract
Monitoring cow body weight is critical in farm management because body weight is associated with the growth, nutritional status, and health of dairy cows. Collecting body weight records regularly often requires human labor or expensive equipment. Recently, computer vision analysis based on a low-cost 3D depth sensor camera has been proposed to address these challenges. Cow body weight is a repeatedly measured trait; however, past body weight prediction research has used only data collected at a single time point. Furthermore, the utility of deep learning for body weight prediction remains unexplored. Therefore, the objectives of this study were to predict cow body weight from repeatedly measured video data and to compare the performance of thresholding and Mask R-CNN deep learning approaches. We set up an Intel RealSense D435 camera at the Virginia Tech Kentland farm to collect top-view videos of 10 Holstein and 2 Jersey cows twice per day for 28 days. Actual body weight records were collected simultaneously by a walk-over-weighing system. A total of 40,405 depth images and depth map files were obtained. We explored three approaches to segment the cow body from the background: single thresholding, adaptive thresholding, and Mask R-CNN. Four image-derived biometric measurements (length, width, height, and volume) were estimated from the segmented images and used as predictors in ordinary least squares regression. Two cross-validation designs, forecasting and leave-several-cow-out, were used to evaluate prediction performance. The correlation between image-derived biometric measurements and body weight ranged from 0.84 to 0.95. On average, the Mask R-CNN approach resulted in the best prediction coefficient of determination and mean absolute percentage error in the forecasting cross-validation, 0.96 and 3.54%, respectively. The predictive performance of the single and adaptive thresholding approaches was similar.
The Mask R-CNN approach was also the best in the leave-several-cow-out cross-validation, followed by adaptive and single thresholding. The prediction coefficients of determination of Mask R-CNN for the leave-one-cow-out, leave-two-cow-out, leave-three-cow-out, and leave-four-cow-out designs were 0.05, 0.69, 0.85, and 0.90, respectively, while the mean absolute percentage errors were 4.51%, 4.51%, 4.59%, and 4.70%, respectively. Our results suggest that predicting cow body weight from depth video data using deep learning is feasible.
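
The leave-several-cow-out evaluation described above can be illustrated with a minimal sketch. It is not the study's code and rests on several assumptions: the data are synthetic, the function names (`fit_ols`, `leave_k_cows_out`, and so on) are hypothetical, the coefficient of determination is computed as 1 - SS_res/SS_tot (the study may define it differently, for example as a squared correlation), and metrics are averaged over every combination of k held-out cows.

```python
from itertools import combinations

import numpy as np


def fit_ols(X, y):
    """Fit ordinary least squares; returns coefficients, intercept first."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict_ols(beta, X):
    """Predict with coefficients returned by fit_ols."""
    return np.column_stack([np.ones(len(X)), X]) @ beta


def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))


def leave_k_cows_out(X, y, cow_ids, k):
    """Hold out every combination of k cows; return mean R^2 and mean MAPE."""
    r2s, mapes = [], []
    for held_out in combinations(np.unique(cow_ids), k):
        test = np.isin(cow_ids, held_out)
        beta = fit_ols(X[~test], y[~test])  # train on the remaining cows only
        pred = predict_ols(beta, X[test])
        r2s.append(r_squared(y[test], pred))
        mapes.append(mape(y[test], pred))
    return float(np.mean(r2s)), float(np.mean(mapes))


# Synthetic stand-in data: 12 cows, 56 sessions each (28 days, twice per day).
rng = np.random.default_rng(0)
cow_ids = np.repeat(np.arange(12), 56)
cow_size = rng.uniform(0.8, 1.2, 12)[cow_ids]
base = np.array([2.4, 0.6, 1.4, 0.9])  # made-up length, width, height, volume
X = cow_size[:, None] * base + rng.normal(0.0, 0.02, (cow_ids.size, 4))
y = X @ np.array([100.0, 200.0, 50.0, 300.0]) + rng.normal(0.0, 5.0, cow_ids.size)

r2, err = leave_k_cows_out(X, y, cow_ids, k=2)
print(f"leave-two-cow-out: R^2 = {r2:.2f}, MAPE = {err:.2f}%")
```

Splitting by cow rather than by record keeps all images of a held-out animal out of the training set, so the metrics reflect prediction for cows the model has never seen.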