Crop yield estimation is a central task in crop monitoring. It remains particularly challenging in developing countries, where timely and adequate data are often unavailable. Traditional agricultural systems mainly rely on scarce ground-survey data. In contrast, freely available multi-temporal and multi-spectral remote sensing images are excellent tools for supporting these vulnerable systems, because they allow crop yields to be monitored and estimated accurately before harvest. In this context, we introduce the use of Sentinel-2 (S2) imagery, which offers medium spatial, spectral and temporal resolution, to estimate rice crop yields, using Nepal as a case study. First, we build a new large-scale rice crop database (RicePAL) composed of multi-temporal S2 data and climate/soil data from the Terai districts of Nepal. Second, we propose a novel 3D Convolutional Neural Network (CNN) adapted to the intrinsic constraints of these data for accurate rice crop yield estimation. Third, we study how different temporal, climate and soil data configurations affect the performance of the proposed approach and of several state-of-the-art regression and CNN-based yield estimation methods. The extensive experiments conducted in this work demonstrate the suitability of the proposed CNN-based framework for rice crop yield estimation with S2 data in Nepal, a developing country.