Net primary productivity (NPP) is an important index for evaluating the carbon absorption capacity of agricultural ecosystems, and timely, accurate information on its spatial–temporal variation plays a significant role in guiding agricultural production. Currently, NPP at the canopy scale is observed primarily with the chamber method. However, upscaling canopy-scale NPP estimates in space and time remains challenging. In this study, maize daytime NPP was measured with the chamber method, and multispectral images of maize canopies were acquired with an unmanned aerial vehicle (UAV) multispectral system. We explored the potential of multispectral images for estimating maize daytime NPP at the canopy scale. Four machine learning algorithms were used to estimate daytime NPP from two independent sets of inputs: ground factors, and vegetation indices (VIs) × photosynthetically active radiation (PAR). With gradient boosting regression (GBR), random forest (RF), and support vector regression (SVR), the models using ground factors outperformed those using VIs × PAR; ridge regression (RR) was the exception. The GBR-based models performed best, and the GBR model using ground factors (R² = 0.958) outperformed the one using VIs × PAR (R² = 0.899). However, the ground factor-based GBR model was complex and required more input parameters, which made it time-consuming, labor-intensive, and highly destructive. Moreover, it could not reflect the variation of NPP within the maize canopy. The VIs × PAR-based GBR model explained 89.9% of the variation in daytime NPP and required only two inputs, VIs and PAR. The VIs × PAR-based GBR model can therefore provide high-resolution, long time-series NPP information nondestructively over large areas. This study shows that maize daytime NPP can be estimated from high-resolution UAV multispectral images and provides a reference for monitoring and upscaling NPP observations.
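
As a rough illustration of the VIs × PAR predictor and the R² score discussed above, the sketch below computes one vegetation index per plot, multiplies it by PAR, and scores a simple fit against measured NPP. Several things here are assumptions rather than the study's method: NDVI is an assumed choice of VI (the text does not name the indices used), all reflectance, PAR, and NPP values are made up for illustration, and an ordinary least-squares line stands in for the GBR, RF, SVR, and RR models. The resulting R² therefore has no relation to the reported values of 0.958 and 0.899.

```python
from statistics import mean


def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance.

    NDVI is an assumed example; the source does not say which VIs were used.
    """
    return (nir - red) / (nir + red)


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept.

    A deliberately simple stand-in for the machine learning models.
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical per-plot values, invented for illustration only.
nir = [0.42, 0.48, 0.55, 0.60, 0.52, 0.45]   # near-infrared reflectance
red = [0.10, 0.08, 0.06, 0.05, 0.07, 0.09]   # red reflectance
par = [900.0, 1200.0, 1500.0, 1800.0, 1400.0, 1000.0]  # PAR, illustrative
npp = [5.1, 8.3, 11.9, 15.2, 10.4, 6.0]      # chamber NPP, arbitrary units

# The single predictor: vegetation index multiplied by PAR.
predictor = [ndvi(n, r) * p for n, r, p in zip(nir, red, par)]

slope, intercept = fit_line(predictor, npp)
predicted = [slope * x + intercept for x in predictor]
r2 = r_squared(npp, predicted)
print(f"R2 of the illustrative VI x PAR fit: {r2:.3f}")
```

An R² of 0.899 in this sense means the model accounts for 89.9% of the variation in measured daytime NPP around its mean. Swapping the linear fit for a gradient boosting regressor would change only the `fit_line` step; the predictor construction and the scoring would stay the same.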