Abstract Accurate pig body weight (BW) measurement is essential for producers because BW is closely related to pig growth, health, and marketing, yet conventional manual weighing is time-consuming and can stress the animals. Although there is a growing trend towards adopting three-dimensional cameras coupled with computer vision techniques for pig BW estimation, their validation using industry-scale data is still limited. Among the prevailing methodologies, semantic segmentation and regression with supervised pre-training are two prominent approaches. Therefore, the objectives of this study were: 1) to estimate pig BW from repeatedly measured video data obtained in a commercial setting, 2) to compare the performance of two image analysis approaches, thresholding segmentation and deep regression, and 3) to evaluate the predictive ability of the resulting BW estimation models. An Intel RealSense D435 camera was installed on a commercial farm to collect top-view videos of 540 pigs biweekly at five time points over three months. Concurrently, manually measured BW records were collected using a digital weighing system. We used an automated video conversion pipeline and a fine-tuned YOLOv8 model to pre-process the raw depth videos, yielding a total of 151,756 depth images and depth map files. Adaptive thresholding was applied to segment the pig body from the background. Four image-derived biometric features (dorsal length, abdominal width, height, and volume) were estimated from the segmented images and used as predictors in ordinary least squares and random forest models. We applied transfer learning by initializing five deep learning models (ResNet50, Xception, EfficientNetV2S, ConvNeXtBase, and Vision Transformer) with weights pre-trained on ImageNet and then fine-tuning them on the pig depth images. The last layer of each model was replaced with a linear regression layer, enabling direct estimation of BW from each image without additional image pre-processing steps. We employed repeated random subsampling cross-validation, assigning 80% of the pigs to training and 20% to testing, to evaluate prediction performance at each time point. The best prediction coefficients of determination and mean absolute percentage errors for the five time points were 0.76, 0.86, 0.90, 0.83, and 0.90, and 4.57%, 3.76%, 3.01%, 3.41%, and 4.84%, respectively. On average, the Xception model achieved the best coefficient of determination (0.90) and mean absolute percentage error (3.01%). Our results suggest that supervised deep learning models improve the prediction of pig BW from industry-scale depth video data.
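To make the deep regression setup summarized above concrete, the following is a minimal sketch, not the authors' exact implementation: it assumes a TensorFlow/Keras workflow with an Xception backbone, an assumed 299x299 three-channel encoding of the depth maps, and illustrative optimizer, loss, and metric choices that are not taken from the paper. The pattern it shows is the one described in the abstract: initialize the backbone with ImageNet weights and replace the classification head with a single linear unit so the network regresses BW directly from each image.

```python
# Minimal sketch (assumptions noted above) of ImageNet transfer learning
# with the final layer adapted to linear regression of body weight.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

IMG_SIZE = (299, 299)  # assumed input resolution; depth maps assumed replicated to 3 channels

# Backbone initialized with ImageNet weights, classification top removed.
backbone = keras.applications.Xception(
    include_top=False, weights="imagenet",
    input_shape=IMG_SIZE + (3,), pooling="avg",
)

inputs = keras.Input(shape=IMG_SIZE + (3,))
x = keras.applications.xception.preprocess_input(inputs)
x = backbone(x)
# Last layer adapted to linear regression: one unit, no activation.
outputs = layers.Dense(1, activation="linear", name="body_weight_kg")(x)
model = keras.Model(inputs, outputs)

# Fine-tune on depth images paired with scale-measured BW (kg);
# hyperparameters here are illustrative only.
model.compile(
    optimizer=keras.optimizers.Adam(1e-4),
    loss="mae",
    metrics=[keras.metrics.MeanAbsolutePercentageError()],
)
# model.fit(train_ds, validation_data=val_ds, epochs=...)  # 80/20 pig-level split per time point
```

In this sketch the 80/20 split would be made at the pig level, as in the abstract's repeated random subsampling scheme, so that no animal contributes images to both the training and test sets within a replicate.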