Accurate rice panicle detection and growth stage recognition are crucial steps in rice field phenotyping. However, conventional manual characterization of rice panicles is time-consuming and labor-intensive. In this study, RiceRes2Net, a model based on an improved Cascade RCNN (Region-CNN) architecture, was proposed to detect rice panicles and recognize their growth stages in complex field environments. RiceRes2Net first adopted the Res2Net network and a Feature Pyramid Network (FPN) as the backbone to generate and fuse multi-scale feature maps. It then used a cascade RCNN with four IoU thresholds to process the multi-scale feature maps and produce the target class prediction and the bounding box coordinate regression. In addition, soft non-maximum suppression (Soft NMS) and the Generalized Intersection over Union (GIoU) loss function were integrated into RiceRes2Net to better predict the bounding boxes of occluded panicles. The rice panicle datasets were acquired by smartphone in two comprehensive field plot experiments with complex field backgrounds. The panicles differed in genotype, planting density, growing practices, planting season, and growth stage, so the datasets covered a broad range of panicle phenotypes. The results showed that RiceRes2Net outperformed the traditional cascade RCNN in rice panicle detection, with average precision (AP) values of 96.8%, 93.7%, and 82.4% at the booting, heading, and filling stages, respectively. Furthermore, RiceRes2Net had a significant advantage in detecting occluded panicles, which increased detection accuracy. To test the robustness of RiceRes2Net, its panicle counts were compared with manual counts on an independent test set. The root mean square error (RMSE) values at the three growth stages were 1.19, 2.56, and 3.13, respectively. In addition, the performance of RiceRes2Net was compared with that of widely used state-of-the-art deep learning models.
The results showed that RiceRes2Net learned a more representative set of features, which helped it locate rice panicles more accurately at the three growth stages, and thus achieved better detection accuracy than the other deep learning models. In terms of growth stage recognition, RiceRes2Net achieved precision values of 99.83%, 99.34%, and 94.59% for the booting, heading, and filling stages, respectively. The average accuracy of growth stage recognition was 96.42%. The overall results suggest that RiceRes2Net is a promising tool for detecting rice panicles and recognizing their growth stages, and that it has great potential for field applications.
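As an illustration of the two components named above for handling occluded panicles, the following is a minimal sketch of GIoU and Soft NMS in plain Python. It is not the RiceRes2Net implementation: the abstract does not state which Soft NMS variant or parameters were used, so the Gaussian score decay, the `sigma` and `score_thresh` values, the `(x1, y1, x2, y2)` box format, and all function names here are assumptions made for the example.

```python
import math


def iou_and_giou(a, b):
    """Return (IoU, GIoU) for two boxes given as (x1, y1, x2, y2), x1 < x2, y1 < y2."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Overlap rectangle (zero width or height if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # Smallest axis-aligned box enclosing both inputs.
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    # GIoU subtracts the fraction of the enclosing box not covered by the union.
    giou = iou - (hull - union) / hull
    return iou, giou


def giou_loss(pred, target):
    """GIoU loss: 0 for a perfect match, approaching 2 for distant boxes."""
    return 1.0 - iou_and_giou(pred, target)[1]


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft NMS (assumed variant).

    Overlapping boxes have their scores decayed instead of being discarded.
    Returns a list of (box_index, final_score) in selection order.
    """
    remaining = [(s, i) for i, s in enumerate(scores)]
    kept = []
    while remaining:
        remaining.sort(reverse=True)  # highest current score first
        best_score, best_idx = remaining.pop(0)
        kept.append((best_idx, best_score))
        decayed = []
        for s, i in remaining:
            iou, _ = iou_and_giou(boxes[best_idx], boxes[i])
            s *= math.exp(-(iou * iou) / sigma)  # stronger decay for larger overlap
            if s >= score_thresh:
                decayed.append((s, i))
        remaining = decayed
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 2, 2), (1, 0, 3, 2), (10, 10, 12, 12)]
    scores = [0.9, 0.8, 0.7]
    print(soft_nms(boxes, scores))
    print(giou_loss((0, 0, 1, 1), (2, 0, 3, 1)))
```

In this sketch, a box that partly overlaps a higher-scoring detection keeps a reduced score rather than being removed, which is why Soft NMS tends to retain neighbouring detections that hard NMS would suppress. GIoU remains informative for non-overlapping boxes, where plain IoU is zero regardless of the distance between them.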